偷偷问一下,关注了吗?
内容获取
1、购买打包合集(2025KS微信VIP付费合集),价格感人,可以加入微信VIP群(答疑交流群,甚至有小伙伴觉得群比代码更好),可以获取建号以来所有内容,群成员专享视频教程,提前更新,其他更多福利!
2、《KS科研分享与服务》公众号有QQ群,进入门槛是20元(完全是为了防止白嫖党,请理解),请考虑清楚。群里有免费推文的注释代码和示例数据(终身拥有),没有付费内容,群成员福利是购买单个付费内容半价!
需要者详情请联系作者(非需要者勿扰,处理太费时间):
Non-negative matrix factorization(NMF)非负矩阵分解是一种popular的矩阵分解方法,它允许从一组非负数据向量中提取稀疏和有意义的特征。它非常适合分解scRNA-seq数据,有效地将大型复杂矩阵简化为几个可解释的基因程序。GeneNMF是一个实现单细胞矩阵分解和基因程序发现方法的软件包。它可以直接应用于Seurat对象,以降低数据的维数,并在多个样本中检测稳定的基因程序。GeneNMF运行分析很简单,该方法受先前将非负矩阵分解(NMF)应用于单细胞 RNA 测序(scRNA-seq)数据工作的启发。相比于NMF,速度也是很快的。
GeneNMF github:https://github.com/carmonalab/GeneNMF?tab=readme-ov-file
官网提供了详细的教程,可以阅读学习!本文代码及数据已发布微信VIP群!
安装包:
setwd('/home/tq_ziv/data_analysis/NMF_singlecell/')
#===============================================================================
#安装加载R包
# install.packages("GeneNMF")
library(remotes)
remotes::install_github("carmonalab/GeneNMF")
library(GeneNMF)#R>=4.3
library(Seurat)
library(ggplot2)
library(UCell)
library(patchwork)
library(Matrix)
library(RcppML)
library(viridis)
GeneNMF用于降维,这里我们直接演示看看大群的效果,我们的数据是已经降维聚类注释好的数据,只不过用的是TSNE,我们这里用UMAP看看。其实和nmf一样,大群降维,nmf并没有什么明显的优势,seuat默认的降维方式分群已经就可以了!
seu <- FindVariableFeatures(uterus, nfeatures = 3000)
seu <- runNMF(seu, k = 10, assay="RNA")
# seu@reductions[["NMF"]]
# A dimensional reduction object with key NMF_
# Number of dimensions: 10
# Projected dimensional reduction calculated: FALSE
# Jackstraw run: FALSE
# Computed using assay: RNA
#我们这个数据是已经跑完降维并做好细胞注释的数据,只是跑的是TSNE
#我们可以这里跑以下UMAP,分别用pca和nmf降维,看看效果!
seu <- RunUMAP(seu, reduction = "NMF", dims=1:10,
reduction.name = "NMF_UMAP", reduction.key = "nmfUMAP_")
seu <- RunUMAP(seu, reduction = "pca", dims=1:10,
reduction.name = "PCA_UMAP", reduction.key = "pcaUMAP_")
library(ggplot2)
DimPlot(seu, reduction = "PCA_UMAP", group.by = "celltype", label=T) +
theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) + ggtitle("PCA UMAP") + NoLegend()
DimPlot(seu, reduction = "NMF_UMAP", group.by = "celltype", label=T) +
theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) + ggtitle("NMF UMAP") + NoLegend()
GeneNMF除了在降维上替代PCA之外,还可以识别跨多个样本的一致NMF程序。这里我们在亚群上进行演示,对上皮细胞进行亚群聚类:
#提取亚群,重新分群
Epi <- uterus[,Idents(uterus) %in% c("Unciliated epithelial cells","Ciliated epithelial cells")]
DefaultAssay(Epi) <- 'RNA'
Epi <- NormalizeData(Epi, normalization.method = "LogNormalize", scale.factor = 1e4)
Epi <- FindVariableFeatures(Epi, selection.method = 'vst', nfeatures = 3000)
Epi <- ScaleData(Epi, vars.to.regress = c("S.Score", "G2M.Score","percent.mt", "nCount_RNA"), verbose = T)
Epi <- RunPCA(Epi, npcs = 40, verbose = FALSE)
Epi <- RunHarmony(object = Epi,group.by.vars = c('orig.ident'))
Epi <- RunUMAP(Epi,reduction = "harmony",dims = 1:10)
Epi <- FindNeighbors(Epi,reduction = 'harmony', dims = 1:10)
Epi <- FindClusters(Epi,resolution = 0.5, verbose = FALSE)
DimPlot(Epi, label = T, pt.size = 1)
DimPlot(Epi, label = T, pt.size = 1, split.by = 'orig.ident')
k是NMF程序关键的参数,为了获取更加稳定的programs,使用multimf()函数自动对样本列表和多个k值执行NMF。
seu.list <- SplitObject(Epi, split.by = "orig.ident")
geneNMF.programs <- multiNMF(seu.list, assay="RNA",
k=4:10,
min.exp = 0.05,
nfeatures=2000)
#将识别的基因程序合并
geneNMF.metaprograms <- getMetaPrograms(geneNMF.programs,
metric = "cosine",
weight.explained = 0.5,
nMP=9,
max.genes=200)
#最后得到的结果geneNMF.metaprograms是一个list
#check一些参数
geneNMF.metaprograms$metaprograms.metrics
# sampleCoverage silhouette meanSimilarity numberGenes numberPrograms
# MP1 1.0000000 0.59783160 0.704 67 21
# MP2 1.0000000 0.21113850 0.333 98 24
# MP3 1.0000000 0.15745334 0.341 46 18
# MP4 1.0000000 0.04416608 0.220 73 28
# MP5 0.6666667 0.88466129 0.914 32 12
# MP6 0.6666667 0.66182734 0.724 15 12
# MP7 0.6666667 0.41217887 0.480 35 13
# MP8 0.3333333 0.48668043 0.594 34 8
# MP9 0.3333333 0.33003672 0.437 154 11
plot MP heatmap。
ht <- GeneNMF::plotMetaPrograms(geneNMF.metaprograms,similarity.cutoff = c(0.1,1))
这个默认作图也挺好的,但是我们在他的基础上修改修饰以下,让它更接近高分文章中的状态,为了表示对R包作者的尊重,函数还是用这个名称!
pdf('./aaa.pdf', height = 8,width = 9)
plotMetaPrograms(geneNMF.metaprograms,similarity.cutoff = c(0.1,1),
palette = c('white','#bfd3e6','#9ebcda','#8c96c6',
'#8c6bb1','#88419d','#810f7c','#4d004b'))
dev.off()
获取每个MP的基因:做富集分析,便于对基因程序的解释!
MP_genes <- unlist(geneNMF.metaprograms$metaprograms.genes)
MP <- rep(names(geneNMF.metaprograms$metaprograms.genes), lengths(geneNMF.metaprograms$metaprograms.genes))
MP_genes <- data.frame(gene = MP_genes, MPs = MP)
#of cource, geneNMF提供了分析函数,可以直接用他的
library(msigdbr)
library(fgsea)
top_p <- lapply(geneNMF.metaprograms$metaprograms.genes, function(program) {
runGSEA(program, universe=rownames(Epi), category = "C5", subcategory = "GO:BP")
})
save(top_p, file = 'top_p.RData')
得到了MPs的基因特征,可以做基因集分数:
mp.genes <- geneNMF.metaprograms$metaprograms.genes
Epi <- AddModuleScore_UCell(Epi, features = mp.genes, ncores=4, name = "")
VlnPlot(Epi, features=names(mp.genes), group.by = "orig.ident",pt.size = 0, ncol=3)
VlnPlot(Epi, features=names(mp.genes), group.by = "seurat_clusters",pt.size = 0, ncol=3)
#从样本角度看,当然我这个数据演示样本没有意义,有些MP在多个样本中活跃,也有在某些样本中突出活跃的MP
#从亚群cluster的角度看,MP与cluster的活跃性显著相关,例如MP1、MP5与cluster2明显相关
# FeaturePlot(Epi, features = "MP1")
# FeaturePlot(Epi, features = names(mp.genes),ncol=3) &
# scale_color_viridis(option="B") &
# theme(aspect.ratio = 1, axis.text=element_blank(), axis.ticks=element_blank())
integrated to seurat,使用基因程序得分表示单个细胞的特征!
#单个细胞现在可以用它们的基因程序得分来表示。
matrix <- Epi@meta.data[,names(mp.genes)]
#dimred <- scale(matrix)
dimred <- as.matrix(matrix)
colnames(dimred) <- paste0("MP_",seq(1, ncol(dimred)))
#New dim reduction
Epi@reductions[["MPsignatures"]] <- new("DimReduc",
cell.embeddings = dimred,
assay.used = "RNA",
key = "MP_",
global = FALSE)
Epi <- RunUMAP(Epi, reduction="MPsignatures", dims=1:length(Epi@reductions[["MPsignatures"]]),
metric = "euclidean", reduction.name = "umap_MP")
library(viridis)
FeaturePlot(Epi, features = names(mp.genes), reduction = "umap_MP", ncol=4) &
scale_color_viridis(option="B") &
theme(aspect.ratio = 1, axis.text=element_blank(), axis.ticks=element_blank())
DimPlot(Epi, reduction = "umap_MP", group.by = "seurat_clusters", label=T) + theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) + ggtitle("Original cell types") + NoLegend()
关注我们获取精彩内容:
关注不迷路:扫描下面二维码关注公众号!
B站视频号链接:https://space.bilibili.com/471040659?spm_id_from=333.1007.0.0
关注 KS科研分享与服务,
认清正版优质内容和服务!
优质内容持续输出,物超所值!
合作联系:ks_account@163.com