写在前面的话
单细胞测序的价格逐渐逼近万元下,更有国产平台有着更低的价格!但这就对多平台多样本数据整合有着一定的考验,锚点方式容易过拟合,其余批次矫正手段偏弱等问题!
总而言之,秉着解释生物学意义的原则出发吧!
文献:Single-Cell, Single-Nucleus, and Spatial RNA Sequencing of the Human Liver Identifies Cholangiocyte and Mesenchymal Heterogeneity 《Hepatology CommuniCations》
肝脏样本的单细胞转录组测序和单细胞核转录组测序
图片描述:(C)scRNA- seq 和 snRNA- seq 得到的细胞的 UMAP 合并(i),然后在合并前单独Scale(ii)。(D) scRNA- seq和 snRNA- seq细胞的 UMAP 合并前单独Scale,然后使用Harmony进行整合。
确实整合效果不同!!!
融合了大量数据(包括这篇文献) 常规流程进行整合
scRNA <- NormalizeData(scRNA, normalization.method = "LogNormalize", scale.factor = 10000)
scRNA <- FindVariableFeatures(scRNA, selection.method = "vst", nfeatures = 5000)
g2m_genes = cc.genes$g2m.genes
g2m_genes = CaseMatch(search = g2m_genes, match = rownames(scRNA))
s_genes = cc.genes$s.genes
s_genes = CaseMatch(search = s_genes, match = rownames(scRNA))
scRNA <- CellCycleScoring(object=scRNA, g2m.features=g2m_genes, s.features=s_genes)
scRNA$CC.Difference <- scRNA$S.Score - scRNA$G2M.Score
scRNA <- ScaleData(scRNA, vars.to.regress = c("CC.Difference","percent.ribo","percent.mt", "nCount_RNA"), verbose = T)
scRNA <- RunPCA(scRNA, features = VariableFeatures(object = scRNA))
scRNA <- RunHarmony(scRNA, group.by.vars="orig.ident", max.iter.harmony = 10)
scRNA <- RunUMAP(scRNA,reduction = "harmony", dims = 1:20, verbose = FALSE)
DimPlot(scRNA,group.by = "tech")
因整合的数据过多,作者这方法太耗费资源,借鉴采用了将单细胞/核分开取交集高变基因后分别进行scale后赋予merge对象的scale矩阵
scRNA_sc <- subset(scRNA,idents="SC")
scRNA_sn <- subset(scRNA,idents="SN")
scRNA_sc <- NormalizeData(scRNA_sc, normalization.method = "LogNormalize", scale.factor = 10000)
scRNA_sc <- FindVariableFeatures(scRNA_sc, selection.method = "vst", nfeatures = 5000)
scRNA_sn <- NormalizeData(scRNA_sn, normalization.method = "LogNormalize", scale.factor = 10000)
scRNA_sn <- FindVariableFeatures(scRNA_sn, selection.method = "vst", nfeatures = 5000)
hvgs <- c(VariableFeatures(scRNA_sc),VariableFeatures(scRNA_sn))
hvgs <- unique(hvgs[duplicated(hvgs)])
hvgs <- hvgs[!grepl("^MT-", hvgs)]
scRNA_sc <- ScaleData(scRNA_sc,features=hvgs, vars.to.regress = c("percent.ribo","percent.mt", "nCount_RNA"), verbose = T)
scRNA_sn <- ScaleData(scRNA_sn,features=hvgs, vars.to.regress = c("percent.ribo","percent.mt", "nCount_RNA"), verbose = T)
scale_sc <- scRNA_sc@assays$RNA@scale.data
scale_sn <- scRNA_sn@assays$RNA@scale.data
scale <- cbind(scale_sc,scale_sn)
scRNA@assays$RNA@scale.data <- scale
scRNA <- RunPCA(scRNA, features = hvgs)
scRNA <- RunHarmony(scRNA, group.by.vars="orig.ident", max.iter.harmony = 10)
scRNA <- RunUMAP(scRNA,reduction = "harmony", dims = 1:15, verbose = FALSE)
DimPlot(scRNA,group.by = "tech")
好像效果是好了些!
用什么方法进行数据整合,因个人而异,刚好看到和审稿人argue scVI和Harmony比较的文章,特意补充note来解释scVI和Harmony对于解释数据的生物学意义差异!!!!
Human skeletal muscle aging atlas 《Nature Aging》