SCENIC too slow for finding transcriptional regulators? Try scRegClust, a new method published in Nature Communications

Digest 2024-11-12 07:05 Beijing

What can this method do? It finds gene modules and identifies the key regulators behind them, much like SCENIC.

Uncovering how phenotypic plasticity is regulated in neural cancers: new findings from the scRegClust algorithm

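Recently, a joint team from Uppsala University in Sweden and the Dana-Farber Cancer Institute and the Massachusetts Institute of Technology in the US published a study in Nature Communications on the complex regulation of cell states in nervous system cancers. Using a new algorithm, scRegClust, the researchers reconstructed the transcriptional regulatory programs behind multiple cell states in neural cancers, pointing to new directions for anticancer therapy.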

Background: understanding the diversity of neural cancers

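Nervous system cancers such as adult glioblastoma (GBM) and pediatric diffuse midline glioma (DMG) typically show high phenotypic diversity, which is closely tied to the differentiation programs that cells follow during normal development. For example, GBM tumor cells can take on features of neural progenitor cells (NPCs), oligodendrocyte progenitor cells (OPCs) or astrocytes (ACs), while in DMG tumor cells progress from a proliferative OPC-like state toward more mature cell types. These cell states affect not only tumor invasiveness but also resistance to treatment. However, how these states are regulated is still poorly understood.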

Technical innovation: a single-cell regulator-driven clustering algorithm

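To probe how tumor cell states are regulated, the team developed a single-cell regulator-driven clustering algorithm, scRegClust. Working on large single-cell RNA sequencing (scRNA-seq) datasets, it identifies the transcription factors and kinases that shape cell states. According to the authors, scRegClust is computationally more efficient and more predictive than existing gene regulatory network (GRN) inference methods, and can analyze datasets of millions of cells in a short time.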
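
To make "regulator-driven clustering" concrete, here is a toy sketch in base R. It is not scRegClust's algorithm, which uses penalized regression and its own update rules; it only illustrates the general pattern under our own simplifying assumptions: each module keeps the regulator most correlated with its average target profile, each target gene then moves to the module whose regulators explain it best (highest R²), and genes whose best R² falls below a noise threshold are set aside as noise (module 0). The helpers `toy_regclust()` and `r_squared()` are hypothetical.

```r
# Toy illustration of regulator-driven clustering (NOT the scregclust algorithm).
# z: genes x cells matrix; is_reg: logical vector marking the regulator rows.

# R^2 of an ordinary least-squares fit of y on the columns of `predictors`
r_squared <- function(y, predictors) {
  fit <- lm.fit(cbind(1, predictors), y)
  1 - sum(fit$residuals^2) / sum((y - mean(y))^2)
}

toy_regclust <- function(z, is_reg, n_modules = 2L, n_regs = 1L,
                         noise_threshold = 0.05, n_cycles = 20L) {
  regs <- t(z[is_reg, , drop = FALSE])    # cells x regulators
  targets <- z[!is_reg, , drop = FALSE]   # target genes x cells
  # Deterministic round-robin start; module 0 will mean "noise"
  modules <- rep_len(seq_len(n_modules), nrow(targets))
  selected <- vector("list", n_modules)
  for (cycle in seq_len(n_cycles)) {
    # Step 1: each module keeps the n_regs regulators whose absolute
    # correlation with the mean profile of its current members is largest
    selected <- lapply(seq_len(n_modules), function(k) {
      members <- targets[modules == k, , drop = FALSE]
      if (nrow(members) == 0) return(integer(0))
      score <- abs(cor(regs, colMeans(members)))
      order(score, decreasing = TRUE)[seq_len(n_regs)]
    })
    # Step 2: R^2 of every target gene under every module's regulators
    r2 <- sapply(seq_len(n_modules), function(k) {
      if (length(selected[[k]]) == 0) return(rep(-Inf, nrow(targets)))
      apply(targets, 1, r_squared,
            predictors = regs[, selected[[k]], drop = FALSE])
    })
    # Step 3: move each gene to its best module, or to noise (0)
    new_modules <- max.col(r2, ties.method = "first")
    new_modules[apply(r2, 1, max) < noise_threshold] <- 0L
    if (identical(new_modules, modules)) break
    modules <- new_modules
  }
  list(modules = modules, regulators = selected)
}

# Simulated data: two regulators, four driven targets, one pure-noise gene
set.seed(42)
n_cells <- 500
reg_a <- rnorm(n_cells)
reg_b <- rnorm(n_cells)
z_sim <- rbind(
  REG_A = reg_a, REG_B = reg_b,
  T1 = 2 * reg_a + rnorm(n_cells, sd = 0.1),
  T2 = -reg_a + rnorm(n_cells, sd = 0.1),
  T3 = reg_b + rnorm(n_cells, sd = 0.1),
  T4 = 3 * reg_b + rnorm(n_cells, sd = 0.1),
  T5 = rnorm(n_cells)                       # pure noise
)
res <- toy_regclust(z_sim, is_reg = c(TRUE, TRUE, rep(FALSE, 5)))
res$modules  # T1, T2 should share one module, T3, T4 the other, T5 -> 0
```

Even this crude loop shows the key difference from plain co-expression clustering: genes are grouped by which regulators predict them, so every module comes with its regulators attached.
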

Findings: key regulators revealed

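The researchers used scRegClust to analyze datasets from several nervous system cancers. In both GBM and DMG, many transcriptional states of tumor cells overlapped substantially with cell states seen in normal brain development. For example, they found that the transcription factors SPI1 and IRF8 play key roles in regulating an immune-mediated mesenchymal-like state. This suggests that tumor cells may mimic features of immune cells to evade host immune surveillance, thereby becoming more resistant to treatment.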

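In glioblastoma, the team found that inhibiting the kinases PDGFRA, DDR1 and ERBB3 effectively suppressed the transition into the OPC-like cell state and enhanced the effect of temozolomide (TMZ), a commonly used chemotherapy drug. Experiments showed that reducing the expression of these genes with CRISPR/Cas9 markedly increased tumor cell sensitivity to TMZ. The team also found that combining TMZ with a tyrosine kinase inhibitor (such as dasatinib) markedly enhanced the treatment effect, suggesting a new drug-combination strategy for glioblastoma.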

Broader application: regulatory programs across cancer types

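To test how general the algorithm is, the team applied scRegClust to datasets from 15 tumor types, including breast, liver, lung and pancreatic cancer. The analysis revealed regulators shared across cancer types, such as CENPA and HMGB1, which regulate the cell cycle, and YBX1 and JUN, which regulate the cellular stress response. These findings help explain cancer heterogeneity and point to potential molecular targets for new targeted therapies.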

Outlook: paving the way for precision medicine

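Using scRegClust, this study sheds light on how cell states are regulated in neural cancers, providing a new tool for basic research and new hope for clinical cancer treatment. The team says it will further refine the algorithm to support integrated analysis of more data types, such as single-cell ATAC-seq and ChIP-seq data, to build a more complete picture of the multilayered regulatory networks of tumors. The team also plans to apply the algorithm more broadly in cancer research, aiming to find more potential therapeutic targets and accelerate precision medicine.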

Cite this article

Larsson, I., Held, F., Popova, G. et al. Reconstructing the regulatory programs underlying the phenotypic plasticity of neural cancers. Nat Commun 15, 9699 (2024).

The code workflow is fairly simple and easy to follow.

```r
# Installation
# install.packages("devtools")
devtools::install_github("scmethods/scregclust")
```

The methods below are described in our article (since published as Nat Commun 15, 9699 (2024), cited above):

> Larsson I & Held F, et al. (2023) Reconstructing the regulatory programs underlying the phenotypic plasticity of neural cancers. Preprint available at [bioRxiv](https://www.biorxiv.org/content/10.1101/2023.03.10.532041v1); 2023.03.10.532041.
Here we demonstrate the scregclust workflow using the PBMC data from 10X Genomics (available [here](https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0)). Similar PBMC data are used in an [introductory vignette](https://satijalab.org/seurat/articles/pbmc3k_tutorial) for the Seurat package. We use [Seurat](https://satijalab.org/seurat/) for pre-processing of the data.
```{r load-packages, results='hide', message=FALSE}
# Load required packages
library(Seurat)
library(scregclust)
```
# Download the data
We focus here on the filtered feature barcode matrix, available as an HDF5 file from the website linked above. The data can be downloaded manually or from within R.

However you obtain the data, the code below assumes that the HDF5 file is placed in the same folder as this script and named `pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5`.
```{r download-data}
url <- paste0(
  "https://cf.10xgenomics.com/samples/cell-arc/2.0.0/",
  "pbmc_granulocyte_sorted_3k/",
  "pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5"
)
path <- "pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5"

# cacheOK = FALSE bypasses server-side caches; mode = "wb" writes the
# HDF5 file in binary mode (text mode would corrupt it on Windows)
download.file(url, path, cacheOK = FALSE, mode = "wb")
```
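
The `download.file()` call above fetches the file again every time the script is run. If you prefer to reuse a local copy, a small wrapper such as the hypothetical `download_if_missing()` below (our own helper, not part of Seurat or scregclust) skips the download when the file already exists. The downloader is passed in as a parameter only so the logic can be checked offline with a stub.

```r
# Hypothetical helper (not part of Seurat or scregclust): download `url` to
# `path` only if `path` does not exist yet. Returns TRUE if it downloaded.
download_if_missing <- function(url, path, downloader = utils::download.file) {
  if (file.exists(path)) {
    message("Reusing existing file: ", path)
    return(invisible(FALSE))
  }
  # Same options as the chunk above: no server-side cache, binary mode
  downloader(url, path, cacheOK = FALSE, mode = "wb")
  invisible(TRUE)
}

# Usage with the `url` and `path` defined in the chunk above:
# download_if_missing(url, path)
```
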
# Load the data in Seurat and preprocess
For preprocessing, we load the data with Seurat. The file ships with two modalities, "Gene Expression" and "Peaks"; we only use the former.
```{r load-h5}
# Read10X_h5() returns one matrix per modality here; keep only gene expression
pbmc_data <- Read10X_h5(
  "pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5",
  use.names = TRUE,
  unique.features = TRUE
)[["Gene Expression"]]
```
We create a Seurat object and follow the Seurat vignette to subset the cells and features (genes).
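```{r create-seurat-object}
# Keep genes detected in at least 3 cells and cells with at least 200 genes
pbmc <- CreateSeuratObject(
  counts = pbmc_data, min.cells = 3, min.features = 200
)

# Percentage of counts from mitochondrial genes. "." in the regex matches any
# character, so "^MT." also catches genes such as MTOR; the Seurat tutorial
# uses the stricter "^MT-".
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT.")
pbmc <- subset(pbmc, subset = percent.mt < 30 & nFeature_RNA < 6000)
```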
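
`PercentageFeatureSet()` matches gene names against `pattern` as a regular expression and reports, for each cell, the percentage of its counts that come from the matching genes. Because `.` matches any character, `^MT.` also picks up nuclear genes whose symbols merely start with "MT" (for example MTOR or MT2A), whereas `^MT-` only matches the mitochondrial `MT-` genes. The base-R sketch below reproduces that percentage on a made-up count matrix so the two patterns can be compared; `percent_mt()` is a hypothetical helper, not a Seurat function.

```r
# Hypothetical helper: percentage of each cell's counts coming from genes whose
# names match `pattern` (a regular expression), similar in spirit to
# Seurat::PercentageFeatureSet().
percent_mt <- function(counts, pattern) {
  is_match <- grepl(pattern, rownames(counts))
  100 * colSums(counts[is_match, , drop = FALSE]) / colSums(counts)
}

# Made-up counts (genes x cells); each cell has 100 counts in total
mt_counts <- matrix(
  c(10, 0, 5, 85,    # cell1
    30, 10, 0, 60),  # cell2
  nrow = 4,
  dimnames = list(c("MT-CO1", "MTOR", "MT2A", "ACTB"), c("cell1", "cell2"))
)

percent_mt(mt_counts, "^MT.")  # cell1 15, cell2 40: MTOR and MT2A counted too
percent_mt(mt_counts, "^MT-")  # cell1 10, cell2 30: only MT-CO1
```
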
[SCTransform](https://satijalab.org/seurat/articles/sctransform_vignette) is used for variance stabilization of the data, and the Pearson residuals for the 6000 most variable genes are extracted as the matrix `z`.
```{r apply-var-stabilization}
pbmc <- SCTransform(pbmc, variable.features.n = 6000)

# The Pearson residuals live in the "scale.data" layer; extract them directly
z <- GetAssayData(pbmc, layer = "scale.data")
dim(z)
```
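
What ends up in `z` are Pearson residuals: for gene g in cell c, the observed count minus its expected value μ under a negative binomial model, divided by the model's standard deviation, (x − μ) / sqrt(μ + μ²/θ), where θ is the gene's dispersion. SCTransform estimates μ and θ with a regularized per-gene regression and also clips the residuals (see its `clip.range` argument). The sketch below skips that fitting step: as a simplifying assumption it takes μ from a plain "gene share × cell depth" model and uses a fixed θ, only to show how the residuals are formed. `pearson_residuals()` is a hypothetical helper.

```r
# Hypothetical helper: Pearson residuals under a negative binomial model.
# counts, mu: genes x cells matrices; theta: one dispersion value per gene.
pearson_residuals <- function(counts, mu, theta, clip = Inf) {
  stopifnot(all(dim(counts) == dim(mu)), length(theta) == nrow(counts))
  # theta is recycled down each column, so gene g uses theta[g] in every cell
  r <- (counts - mu) / sqrt(mu + mu^2 / theta)
  pmin(pmax(r, -clip), clip)  # symmetric clipping (none when clip = Inf)
}

# Toy counts: 3 genes x 4 cells (each line below is one cell, i.e. one column)
toy_counts <- matrix(c(5, 0, 2,
                       3, 1, 0,
                       8, 2, 4,
                       0, 0, 1),
                     nrow = 3,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Simplifying assumption, NOT SCTransform's regularized fit:
# expected count = (gene's share of all counts) x (cell's total counts)
toy_mu <- outer(rowSums(toy_counts) / sum(toy_counts), colSums(toy_counts))
toy_resid <- pearson_residuals(toy_counts, toy_mu, theta = rep(10, 3))
round(toy_resid, 2)
```

By default SCTransform keeps residuals only for the variable genes (`return.only.var.genes = TRUE`), which is why the rows of `z` correspond to the 6000 variable genes rather than to every detected gene.
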
# Use scregclust for clustering target genes into modules
We then use `scregclust_format`, which extracts gene symbols from the expression matrix and determines which genes are considered regulators. By default, transcription factors are used as regulators; setting `mode` to `"kinase"` uses kinases instead. A list of the regulators used internally is returned by `get_regulator_list()`.
```{r prep-scregclust}
out <- scregclust_format(z, mode = "TF")
```
The output of `scregclust_format` is a list with three elements.
1. `genesymbols` contains the row names of `z`.
2. `sample_assignment` is initialized to a vector of `1`s of length `ncol(z)` and can be filled with a known sample grouping. Here we do not use it and keep it uniform across all cells.
3. `is_regulator` is an indicator vector (elements are 0 or 1) aligned with `genesymbols`, where 1 marks a gene selected as a regulator according to the `mode` of `scregclust_format` (`"TF"` or `"kinase"`) and 0 marks all other genes.
```{r extract-scregclust-arguments}
genesymbols <- out$genesymbols
sample_assignment <- out$sample_assignment
is_regulator <- out$is_regulator
```
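
The indicator vector simply lines up with the row names of `z`. The base-R sketch below uses made-up gene names and a hypothetical three-gene regulator list to show how such a 0/1 vector can be built with `%in%` and then used to split an expression matrix into regulator rows and target rows. It only illustrates the data layout: we have not checked how `scregclust_format()` builds the vector internally, and the real regulator list comes from the package (`get_regulator_list()`). Toy variable names are used so nothing from the workflow above is overwritten.

```r
# Made-up gene symbols and a hypothetical regulator list (illustration only)
toy_genes <- c("SPI1", "CD14", "IRF8", "LYZ", "GAPDH")
toy_regulators <- c("SPI1", "IRF8", "JUN")

# 1 = gene is a regulator, 0 = target gene; same order as toy_genes
toy_is_regulator <- as.integer(toy_genes %in% toy_regulators)

# One sample group for all cells, mirroring the uniform sample_assignment
n_cells <- 4
toy_sample_assignment <- rep(1L, n_cells)

# Toy expression matrix with genes in rows, split using the indicator
set.seed(1)
expr_toy <- matrix(rnorm(length(toy_genes) * n_cells),
                   nrow = length(toy_genes),
                   dimnames = list(toy_genes, NULL))
regulator_rows <- expr_toy[toy_is_regulator == 1L, , drop = FALSE]
target_rows <- expr_toy[toy_is_regulator == 0L, , drop = FALSE]
rownames(regulator_rows)  # "SPI1" "IRF8"
rownames(target_rows)     # "CD14" "LYZ" "GAPDH"
```
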
Run `scregclust` with the number of initial modules set to 10 and test several penalties. The penalties provided to `penalization` are used when selecting the regulators associated with each module; a larger penalty leads to fewer selected regulators. `noise_threshold` sets the minimum $R^2$ a gene has to achieve across modules; genes below it are marked as noise.

The run can be reproduced with the commented-out commands below. For convenience, a pre-fitted model can be downloaded from [GitHub](https://github.com/sven-nelander/scregclust/raw/main/datasets/pbmc_scregclust.rds).
```{r run-scregclust}
# To refit the model yourself, uncomment the lines below:
# set.seed(8374)
# fit <- scregclust(
#   z, genesymbols, is_regulator, penalization = seq(0.1, 0.5, 0.05),
#   n_modules = 10L, n_cycles = 50L, noise_threshold = 0.05
# )
# saveRDS(fit, file = "pbmc_scregclust.rds")

# Otherwise, download and load the pre-fitted model
url <- paste0(
  "https://github.com/sven-nelander/scregclust/raw/main/datasets/",
  "pbmc_scregclust.rds"
)
path <- "pbmc_scregclust.rds"
download.file(url, path)
fit <- readRDS("pbmc_scregclust.rds")
```
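
To see why a larger `penalization` value leaves fewer regulators per module, it helps to look at a sparsity penalty in isolation. scregclust's actual penalty and fitting procedure are not reproduced here; as a stand-in, the sketch below fits a plain lasso by coordinate descent in base R (`lasso_cd()` and `soft_threshold()` are hypothetical helpers) for one simulated target gene and five candidate regulators, and counts how many regulators keep a non-zero coefficient as the penalty grows.

```r
# Soft-thresholding operator used by lasso coordinate descent
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

# Plain lasso: minimise (1 / (2n)) * ||y - X b||^2 + lambda * sum(|b|).
# Assumes y is centred and each column of X is centred with mean(X[, j]^2) == 1.
lasso_cd <- function(X, y, lambda, n_iter = 200L) {
  n <- nrow(X)
  b <- numeric(ncol(X))
  r <- y                          # current residual y - X b (b starts at 0)
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      r <- r + X[, j] * b[j]      # remove predictor j from the fit
      b[j] <- soft_threshold(sum(X[, j] * r) / n, lambda)
      r <- r - X[, j] * b[j]      # add it back with the updated coefficient
    }
  }
  b
}

# Simulated data: 5 candidate regulators, the target depends on the first two
set.seed(7)
n <- 300
X_toy <- matrix(rnorm(n * 5), ncol = 5,
                dimnames = list(NULL, paste0("REG", 1:5)))
y_toy <- 2 * X_toy[, 1] + 0.5 * X_toy[, 2] + rnorm(n)

# Standardise as lasso_cd() assumes
X_toy <- scale(X_toy, scale = FALSE)
X_toy <- sweep(X_toy, 2, sqrt(colMeans(X_toy^2)), "/")
y_toy <- y_toy - mean(y_toy)

# Number of selected (non-zero) regulators for increasing penalties
lambdas <- c(0.01, 0.3, 1, 3)
n_selected <- sapply(lambdas, function(l) sum(lasso_cd(X_toy, y_toy, l) != 0))
data.frame(lambda = lambdas, n_selected = n_selected)
```

The weak regulator (REG2) drops out before the strong one (REG1), and once the penalty exceeds the largest absolute correlation-type score `max(abs(crossprod(X_toy, y_toy))) / n`, every coefficient is exactly zero. Scanning a grid of penalties, as the `seq(0.1, 0.5, 0.05)` call above does, trades model size against fit in the same spirit.
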
# Analysis of results
Results can be visualized easily with built-in functions. Metrics that help in choosing an optimal penalty can be plotted by calling `plot` on the object returned by `scregclust`.
```{r viz-metrics, fig.width=7, fig.height=4, fig.dpi=100}
plot(fit)
```
The results for each penalization parameter are stored in a list, `results`, attached to the `fit` object, so `fit$results[[1]]` contains the results of running `scregclust` with `penalization = 0.1`. For each penalization parameter, the algorithm may end up with multiple optimal configurations. Each configuration describes the module assignments of the target genes and which regulators are associated with which modules. The results for each such configuration are contained in the list `output`, so `fit$results[[1]]$output[[1]]` contains the results for the first final configuration; more than one may be available.
```{r n-configs}
sapply(fit$results, function(r) length(r$output))
```
In this example, at most two final configurations were found for each penalization parameter.

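
Because `fit$results` follows the order of the penalty values (so `fit$results[[1]]` corresponds to 0.1, as described above), it can help to put each penalty next to its number of configurations. The sketch assumes the penalties are the ones from the commented-out `scregclust()` call, `seq(0.1, 0.5, 0.05)`, and relies only on the `results`/`output` nesting described in the text. `summarise_configs()` is a hypothetical helper, shown here on a mock object with made-up counts rather than on the real fit.

```r
# Hypothetical helper: one row per penalization value, with the number of
# final configurations stored in results[[i]]$output
summarise_configs <- function(results, penalties) {
  stopifnot(length(results) == length(penalties))
  data.frame(
    penalization = penalties,
    n_configs = vapply(results, function(r) length(r$output), integer(1))
  )
}

penalties <- seq(0.1, 0.5, 0.05)  # as in the commented-out scregclust() call

# Mock object with the same nesting as fit$results (made-up counts, 1 or 2)
mock_results <- lapply(seq_along(penalties), function(i) {
  list(output = vector("list", if (i %% 2 == 1) 1L else 2L))
})
summarise_configs(mock_results, penalties)

# With the real fit: summarise_configs(fit$results, penalties)
```
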
To plot the regulator network of the first configuration for `penalization = 0.1`, use the function `plot_regulator_network`.
```{r viz-reg-network, fig.width=7, fig.height=7, fig.dpi=100}
plot_regulator_network(fit$results[[1]]$output[[1]])
```

生信钱同学

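PhD student at Peking University, documenting my daily studies and sharing bioinformatics knowledge: single-cell and spatial sequencing, multi-omics analysis, metagenomics, pathomics, radiomics, and machine learning and deep learning for bioinformatics.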
