scATAC联合scRNA之signac分析(四): 单样本分析数据质控及降维聚类流程

学术   2024-12-23 09:03   重庆  

偷偷问一下,关注了吗

内容获取


1、购买打包合集(《KS科研分享与服务》付费内容打包集合),价格感人,可以加入微信VIP群(答疑交流群,甚至有小伙伴觉得群比代码更好),可以获取建号以来所有内容,群成员专享视频教程,提前更新,其他更多福利!


2、《KS科研分享与服务》公众号有QQ群,进入门槛是20元(完全是为了防止白嫖党,请理解),请考虑清楚。群里有免费推文的注释代码和示例数据(终身拥有),没有付费内容,群成员福利是购买单个付费内容半价!


需要者详情请联系作者(非需要者勿扰,处理太费时间):


数据读入之后,分析流程和scRNA有点类似熟悉,也是质控、降维这些内容。这里因为我有三个样本的数据,所以循环分别处理。至于多样本的数据我们后面会说到。流程是类似的。完整内容及视频解说教程已发布微信VIP,自行下载!

添加基因组注释,需要注意添加的基因组版本要和上游分析用到的一致:

granges(HC)# GRanges object with 170827 ranges and 0 metadata columns:#   seqnames        ranges strand# <Rle>     <IRanges>  <Rle>#   [1]       chr1 629474-630381      *#   [2]       chr1 633581-634476      *#   [3]       chr1 778268-779188      *#   [4]       chr1 816903-817798      *#   [5]       chr1 819473-820427      *#   ...        ...           ...    ...# [170823] GL000219.1   50733-51595      *#   [170824] GL000219.1  99273-100155      *#   [170825] GL000219.1 165071-165956      *#   [170826] GL000219.1 168659-169301      *#   [170827] KI270713.1   21470-22359      *#   -------#   seqinfo: 28 sequences from an unspecified genome; no seqlengths
#可以看到,features的基因组范围,前面表示染色体的部分,有些不是标准的chr,也就是EnsDb.Hsapiens.v86种没有#而是,chromosome scaffolds,e.g. (KI270713.1),所以去除一下HC <- HC[as.vector(seqnames(granges(HC)) %in% standardChromosomes(granges(HC))), ]AA <- AA[as.vector(seqnames(granges(AA)) %in% standardChromosomes(granges(AA))), ]SD <- SD[as.vector(seqnames(granges(SD)) %in% standardChromosomes(granges(SD))), ]
#添加基因组注释# extract gene annotations from EnsDb# 基因和基因组注释信息是后续计算TSS富集得分,核小体含量和基因活跃度得分所必需的。# 同样,你得保证这里选择的基因组版本得和你上游数据处理时用到的基因组版本一致。library(AnnotationHub)ah <- AnnotationHub()query(ah, "EnsDb.Hsapiens.v98")#Search for the Ensembl 98 EnsDb for Homo sapiens on AnnotationHubensdb_v98 <- ah[["AH75011"]]#这个版本的注释文件可以保存一下,如果后续其他数据分析还用得到的话# save(ensdb_v98, file = 'ensdb_v98.RData')annotations <- GetGRangesFromEnsDb(ensdb = ensdb_v98)# annotation1 <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)seqlevels(annotations) <- paste0('chr', seqlevels(annotations))genome(annotations) <- "hg38"
# add the gene information to the objectAnnotation(HC) <- annotationsAnnotation(AA) <- annotationsAnnotation(SD) <- annotations

数据质控,scATAC的数据质控,signac的QC指标主要有Nucleosome banding pattern,TSS富集分数,Total number of fragments in peaks,Fraction of fragments in peaks,基因组黑名单区域reads比率。不要纠结于别人设置了什么样的阈值,在我看来,只要是合理的,阈值都是个性化调整的,要针对自己实际的数据情况,不要盲目。signac提供了两个作图,DensityScatter,FragmentHistogram,可以借助设定指控指标,不同的样本可以设置不同的质控标准。

#接下来计算质控指标atac = list(HC,AA,SD)names(atac) <- c("HC","AA","SD")
#计算nucleosome signal score,TSS enrichment score#add fraction of reads in peaks,# add blacklist ratiofor(i in seq_along(atac)){
atac[[i]] <- NucleosomeSignal(object = atac[[i]]) atac[[i]] <- TSSEnrichment(object = atac[[i]]) atac[[i]]$pct_reads_in_peaks <- atac[[i]]$peak_region_fragments / atac[[i]]$passed_filters * 100 atac[[i]]$blacklist_ratio <- FractionCountsInRegion(object = atac[[i]], assay = 'peaks',regions = blacklist_hg38_unified)
}
#plot一下nCount_peaks和TSS.enrichment密度图,并且展示5%,10%,90%,95%百分位上的数值,可以作为质控参考DensityScatter(atac[["HC"]], x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)DensityScatter(atac[["AA"]], x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)DensityScatter(atac[["SD"]], x = 'nCount_peaks', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)

#按照nucleosome_signal高低对细胞进行分组,并plot不同fragment频率分布图,也可以作为质控参考for(i in seq_along(atac)){ atac[[i]]$nucleosome_group <- ifelse(atac[[i]]$nucleosome_signal > 5, 'NS > 5', 'NS <5')}
FragmentHistogram(object = atac[["HC"]], group.by = 'nucleosome_group')FragmentHistogram(object = atac[["AA"]], group.by = 'nucleosome_group')FragmentHistogram(object = atac[["SD"]], group.by = 'nucleosome_group')

#plot VlnplotVlnPlot(object = atac[["SD"]], features = c('peak_region_fragments', 'TSS.enrichment', 'blacklist_ratio', 'nucleosome_signal', 'pct_reads_in_peaks'), pt.size = 0.1, ncol = 5)


#QCfor(i in seq_along(atac)){
atac[[i]] <- subset(x = atac[[i]], subset = peak_region_fragments > 1000 & peak_region_fragments < 20000 & pct_reads_in_peaks > 15 & blacklist_ratio < 0.005 & nucleosome_signal < 4 & TSS.enrichment > 3)
}

质控后,可以展示一下质控后数据的质量,这里我们与ArchR来一次联动,我觉得ArchR的可视化挺好,用它的函数可视化!

#展示质控后结果数据,这里我们与ArchR梦幻联动,主要是我觉得ArchR质控图可视化看着还可以# devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())
#TSS & nCount_peaks 密度图library(ArchR)library(cowplot)
densityplot_list <- list()for (i in seq_along(atac)) {
df <- data.frame("nFrags"=log10(atac[[i]]$nCount_peaks),"TSSEnrichment" = atac[[i]]$TSS.enrichment) p <- ggPoint( x = df[,1], y = df[,2], colorDensity = TRUE, continuousSet = "sambaNight", xlabel = "Log10(Unique Fragments)", ylabel = "TSS Enrichment")+ geom_hline(yintercept = 3, lty = "dashed") + geom_vline(xintercept = log10(min(atac[[i]]$nCount_peaks)), lty = "dashed")+ ggtitle(paste0(names(atac)[i],"\n","Cells Pass Filter =",ncol(atac[[i]])))
densityplot_list[[i]] <- p}

plot_grid(densityplot_list[[1]], densityplot_list[[2]],densityplot_list[[3]],ncol = 3)

#fragment频率分布图fix(FragmentHistogram)FragmentHistogram(object = atac[["HC"]], mycols = "#2488F0")+ggtitle("HC")+FragmentHistogram(object = atac[["AA"]], mycols = "#51127CFF")+ggtitle("AA")+FragmentHistogram(object = atac[["SD"]], mycols = "#B51F29")+ggtitle("SD")

降维聚类:

for(i in seq_along(atac)){    atac[[i]] <- RunTFIDF(atac[[i]])#归一化处理  atac[[i]] <- FindTopFeatures(atac[[i]], min.cutoff = 'q5')#特征值选择,对应scRNA就是高变基因  atac[[i]] <- RunSVD(atac[[i]])  }for(i in seq_along(atac)){    atac[[i]] <- RunUMAP(object = atac[[i]], reduction = 'lsi', dims = 2:30)  atac[[i]] <- FindNeighbors(object = atac[[i]], reduction = 'lsi', dims = 2:30)  atac[[i]] <- FindClusters(object = atac[[i]], verbose = FALSE, algorithm = 3, resolution = 0.8)#resolution也可以多设置,自行调整  }

DimPlot(atac[['HC']], label = T)+ggtitle('HC')+DimPlot(atac[['AA']], label = T)+ggtitle('AA')+DimPlot(atac[['SD']], label = T)+ggtitle('SD')
#add gene activity matrix
gene.activities <- list()for(i in seq_along(atac)){ gene.activities[[i]] <- GeneActivity(atac[[i]])}
#将ene activity matrix添加到seurat,添加一个新的assay RNA。for(i in seq_along(atac)){
atac[[i]][['RNA']] <- CreateAssayObject(counts = gene.activities[[i]]) atac[[i]] <- NormalizeData(object = atac[[i]],assay = 'RNA',normalization.method = 'LogNormalize',scale.factor = median(atac[[i]]$nCount_RNA)) }

接下来就可以对数据进行注释了!

觉得我们分享有些用的,点个赞再走呗!
关注我们获取精彩内容:


关注不迷路:扫描下面二维码关注公众号!
B站视频号链接https://space.bilibili.com/471040659?spm_id_from=333.1007.0.0




关注 KS科研分享与服务,

认清正版优质内容和服务!

优质内容持续输出,物超所值!

合作联系:ks_account@163.com

新的板块-重要通知-双向选择


KS科研分享与服务
科研学习交流于分享,生信学习笔记,科研经历和生活!
 最新文章