这个方法能干嘛,发现一些基因模块,找到一些关键的调控因子,跟SCENIC差不多。
揭示神经癌症的表型可塑性调控机制:scRegClust算法带来的新发现
近日,由瑞典乌普萨拉大学、美国Dana-Farber癌症研究院和麻省理工学院联合研究团队在《Nature Communications》上发表了一篇突破性的研究论文,揭示了神经系统癌症中细胞状态的复杂调控机制。通过创新算法“scRegClust”,研究人员成功解码了神经癌症中多种细胞状态的转录调控程序,为抗癌治疗提供了全新的方向。
背景:理解神经癌症的多样性
神经系统癌症,如成人胶质母细胞瘤(GBM)和儿童弥漫性中线胶质瘤(DMG),通常表现出高度的表型多样性,这种多样性与细胞在正常发育过程中的分化机制密切相关。例如,GBM肿瘤细胞会呈现出类似神经祖细胞(NPCs)、少突胶质祖细胞(OPCs)或星形胶质细胞(ACs)的特征,而在DMG中,肿瘤细胞则会从增殖性OPC样状态逐渐向成熟细胞转变。这些细胞状态不仅影响肿瘤的侵袭性,还与治疗耐药性密切相关。然而,当前关于这些细胞状态调控机制的理解仍然十分有限。
技术创新:单细胞调控驱动聚类算法
为了深入探究肿瘤细胞状态的调控机制,研究团队开发了“单细胞调控驱动聚类”算法(scRegClust)。该算法通过整合大量单细胞RNA测序(scRNA-seq)数据,精准识别影响细胞状态的关键转录因子和激酶。与现有的基因调控网络(GRN)构建方法相比,scRegClust具有更高的运算效率和更强的预测能力,可在短时间内分析数百万细胞的数据集。
研究发现:揭示关键调控因子
研究人员利用scRegClust算法对多种神经系统癌症的数据集进行了深入分析。结果显示,在GBM和DMG中,肿瘤细胞的多种转录状态与正常脑发育过程中的细胞状态有显著重叠。例如,他们发现SPI1和IRF8这两个转录因子在调控免疫介导的间质样状态(mesenchymal-like state)中发挥关键作用。这一发现表明,肿瘤细胞可能通过模仿免疫细胞的特征来逃避宿主的免疫监视,从而增强对治疗的抵抗力。
在胶质母细胞瘤的研究中,团队发现通过抑制PDGFRA、DDR1和ERBB3等激酶,可以有效地抑制OPC样细胞状态的转化,进而增强常用化疗药物替莫唑胺(TMZ)的治疗效果。实验验证表明,通过CRISPR/Cas9技术下调这些关键基因的表达,显著提高了肿瘤细胞对TMZ的敏感性。此外,团队还发现,与TMZ联合使用酪氨酸激酶抑制剂(如达沙替尼)能够显著增强治疗效果,为胶质母细胞瘤的药物组合治疗提供了新的策略。
拓展应用:跨癌种的调控程序解析
为了进一步验证算法的普适性,研究团队将scRegClust应用于其他类型的癌症,包括乳腺癌、肝癌、肺癌和胰腺癌等15种肿瘤类型的数据集。分析结果表明,不同癌症类型中存在一些共通的调控因子,如调控细胞周期的CENPA和HMGB1,以及调控细胞应激反应的YBX1和JUN。这些发现不仅有助于理解癌症的异质性,还为开发新的靶向治疗策略提供了潜在的分子靶点。
展望:为精准医学铺路
本次研究通过scRegClust算法揭示了神经癌症细胞状态的调控机制,不仅为基础研究提供了全新的工具,还为癌症的临床治疗带来了新的希望。研究团队表示,未来将进一步优化算法,以支持更多数据类型的整合分析,如单细胞ATAC-seq和CHIP-seq数据,从而更全面地解析肿瘤的多层次调控网络。此外,团队计划将这一算法推广到更广泛的癌症研究中,以期发现更多潜在的治疗靶点,加速精准医学的发展。
Cite this article
Larsson, I., Held, F., Popova, G. et al. Reconstructing the regulatory programs underlying the phenotypic plasticity of neural cancers. Nat Commun 15, 9699 (2024).
代码操作的整个流程比较简单,很容易理解
#安装
# install.packages("devtools")
devtools::install_github("scmethods/scregclust")
---
title: "Demonstration of workflow"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Demonstration of workflow}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The methods below are described in our article
> Larsson I & Held F, et al. (2023) Reconstructing the regulatory programs
> underlying the phenotypic plasticity of neural cancers. Preprint available
> at [bioRxiv](https://www.biorxiv.org/content/10.1101/2023.03.10.532041v1);
> 2023.03.10.532041.
Here we demonstrate the scregclust workflow using the PBMC data from
10X Genomics (available [here](https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0)).
This is the same data used in an [introductory vignette](https://satijalab.org/seurat/articles/pbmc3k_tutorial)
for the Seurat package. We use [Seurat](https://satijalab.org/seurat/) for
pre-processing of the data.
```{r load-packages, results='hide', message=FALSE}
# Load required packages
library(Seurat)
library(scregclust)
```
# Download the data
We are focusing here on the filtered feature barcode matrix available as an
HDF5 file from the website linked above. The data can be downloaded manually
or using R.
However you obtain the data, the code below assumes that the HDF5 file
containing it is placed in the same folder as this script with the name
`pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5`.
```{r download-data}
url <- paste0(
"https://cf.10xgenomics.com/samples/cell-arc/2.0.0/",
"pbmc_granulocyte_sorted_3k/",
"pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5"
)
path <- "pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5"
download.file(url, path, cacheOK = FALSE, mode = "wb")
```
# Load the data in Seurat and preprocess
To perform preprocessing use Seurat to load the data. The file ships with
two modalities, "Gene Expression" and "Peaks". We only use the former.
```{r load-h5}
pbmc_data <- Read10X_h5(
"pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5",
use.names = TRUE,
unique.features = TRUE
)[["Gene Expression"]]
```
We create a Seurat object and follow the Seurat vignette to subset the
cells and features (genes).
```{r create-seurat-object}
pbmc <- CreateSeuratObject(
counts = pbmc_data, min.cells = 3, min.features = 200
)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT.")
pbmc <- subset(pbmc, subset = percent.mt < 30 & nFeature_RNA < 6000)
```
[SCTransform](https://satijalab.org/seurat/articles/sctransform_vignette) is
used for variance stabilization of the data and Pearson residuals for the
6000 most variable genes are extracted as matrix `z`.
```{r apply-var-stabilization}
pbmc <- SCTransform(pbmc, variable.features.n = 6000)
z <- GetAssayData(pbmc, layer = "scale.data")##直接提就行
dim(z)
```
# Use scregclust for clustering target genes into modules
We then use `scregclust_format` which extracts gene symbols from the
expression matrix and determines which genes are considered regulators.
By default, transcription factors are used as regulators. Setting `mode`
to `"kinase"` uses kinases instead of transcription factors. A list of the
regulators used internally is returned by `get_regulator_list()`.
```{r prep-scregclust}
out <- scregclust_format(z, mode = "TF")
```
The output of `scregclust_format` is a list with three elements.
1. `genesymbols` contains the rownames of `z`
2. `sample_assignment` is initialized to be a vector of `1`s of length `ncol(z)`
and can be filled with a known sample grouping. Here, we do not use it and
just keep it uniform across all cells.
3. `is_regulator` is an indicator vector (elements are 0 or 1) corresponding to
the entries of `genesymbols` with 1 marking that the genesymbol is selected
as a regulator according to the model of `scregclust_format` (`"TF"` or
`"kinase"`) and 0 otherwise.
```{r extract-scregclust-arguments}
genesymbols <- out$genesymbols
sample_assignment <- out$sample_assignment
is_regulator <- out$is_regulator
```
Run `scregclust` with number of initial modules set to 10 and test
several penalties. The penalties provided to `penalization` are used during
selection of regulators associated with each module. An increasing penalty
implies the selection of fewer regulators.
`noise_threshold` controls the minimum $R^2$ a gene has to achieve across
modules. Otherwise the gene is marked as noise.
The run can be reproduced with the command below. A pre-fitted model can be
downloaded from [GitHub](https://github.com/sven-nelander/scregclust/raw/main/datasets/pbmc_scregclust.rds)
for convenience.
```{r run-scregclust}
# set.seed(8374)
# fit <- scregclust(
# z, genesymbols, is_regulator, penalization = seq(0.1, 0.5, 0.05),
# n_modules = 10L, n_cycles = 50L, noise_threshold = 0.05
# )
# saveRDS(fit, file = "pbmc_scregclust.rds")
url <- paste0(
"https://github.com/sven-nelander/scregclust/raw/main/datasets/",
"pbmc_scregclust.rds"
)
path <- "pbmc_scregclust.rds"
download.file(url, path)
fit <- readRDS("pbmc_scregclust.rds")
```
# Analysis of results
Results can be visualized easily using built-in functions.
Metrics for helping in choosing an optimal penalty can be plotted by calling
`plot` on the object returned from `scregclust`.
```{r viz-metrics, fig.width=7, fig.height=4, fig.dpi=100}
plot(fit)
```
The results for each penalization parameter are placed in a list, `results`,
attached to the `fit` object. So `fit$results[[1]]` contains the results
of running `scregclust` with `penalization = 0.1`. For each penalization
parameter, the algorithm might end up finding multiple optimal configurations.
Each configuration describes target genes module assignments and which
regulators are associated with which modules.
The results for each such configuration are contained in the list `output`.
This means that `fit$results[[1]]$output[[1]]` contains the results for
the first final configuration. More than one may be available.
```{r n-configs}
sapply(fit$results, function(r) length(r$output))
```
In this example, at most two final configurations were found for each
penalization parameters.
To plot the regulator network of the first configuration for
`penalization = 0.1` the function `plot_regulator_network` can be used.
```{r viz-reg-network, fig.width=7, fig.height=7, fig.dpi=100}
plot_regulator_network(fit$results[[1]]$output[[1]])
```