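In an earlier post we shared a review that comprehensively describes the tools available for each step of ATAC-seq data analysis (see "Review: A Complete Guide to ATAC-Seq Data Analysis Tools"). This time we introduce another article, which presents an updated and optimized ATAC-seq protocol called Omni-ATAC. The publication details are as follows: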
Title: Chromatin accessibility profiling by ATAC-seq
Published: Nat Protoc. 2022 Apr 27;17(6):1518–1552.
DOI: 10.1038/s41596-022-00692-9
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9189070/
Overview of the Omni-ATAC protocol
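ATAC-seq requires relatively few input cells and does not require prior knowledge of the epigenetic marks or transcription factors that govern the dynamics of a regulatory system. In this article, the authors describe an updated and optimized ATAC-seq protocol, called Omni-ATAC, that is applicable to a wide range of cell and tissue types. The protocol details the steps for generating and sequencing ATAC-seq libraries and gives recommendations for sample preparation and downstream bioinformatic analysis. The ATAC-seq workflow consists of five main steps: sample preparation, transposition, library preparation, sequencing, and data analysis.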
The figure below shows the workflow, including the approximate time required for each step:
Comparison with other techniques
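Many techniques now exist for mapping DNA regulatory elements, which makes it challenging to choose the most appropriate and most informative one for a given application. In the table below, the authors compare technical and experimental aspects of the most commonly used techniques for mapping DNA regulatory elements (ATAC-seq, DNase-seq, MNase-seq, ChIP-seq, and targeted CUT&TAG) to help new users decide which assay best suits their application.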
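Two considerations guide the choice: (i) what type of information is needed to answer the specific research question, and (ii) what type of input material is available.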
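In general, epigenomic profiling is suited to answering "how" or "why" questions about changes in gene regulation that a cell type or tissue may exhibit. For questions that mainly concern "what has changed", the authors recommend starting with RNA-seq.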
Feature | ATAC-seq | DNase-seq | MNase-seq | CUT&TAG or related ChIC techniques |
---|---|---|---|---|
Enzyme | Tn5 | endonuclease | endonuclease and exonuclease | Tn5 conjugated to an antibody via Protein A. |
Sequence bias? | Yes; complex, Tn5 insertion bias, with preference for A/Ts in insertion site and C/Gs flanking [133-135] | Yes; complex, partially dependent on enzyme concentration and on methylation status of CpGs [85,136] | Yes; preferential cutting upstream of A/T compared to G/C [137,138] | Yes; dictated by antibody used to guide Tn5 and by Tn5 bias. |
Input cells/nuclei for a standard assay | 500-50,000 | 1-10 million | 10,000-100,000 | 100,000-500,000 |
Low-input/single-cell methods available? | Yes [86,87]; commercial solutions available. | Yes [67] | Yes [66] | Yes [62,64,139-141] |
Sample types | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues. | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues. Formaldehyde cross-linked or formalin-fixed paraffin-embedded samples. | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues. Formaldehyde cross-linked samples. | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues. |
Library preparation time | ~10 hours for 12 samples (this protocol) | 1-3 days | ~2 days | 1-2 days |
Technical considerations | Library quality is highly dependent on cell viability. Protocol alterations are required for use on fixed cells and data quality is often reduced for those samples. | Enzyme concentration and digestion duration may need to be optimized to sample type. Size of fragments selected affects downstream analysis. [28] | Enzyme concentration and digestion duration may need to be optimized to sample type. Apparent nucleosome occupancy is a function of MNase concentration. | The amount of antibody used must be titrated for the cell type or sample. This will be a function of the strength of the antibody and the abundance of the target protein. The assay is as specific as the primary antibody used. Additionally, this is a targeted technique, so additional libraries must be made of each modification or protein tested. |
Sequencing type | Paired-end | Single-end | Single-end | Single-end or paired-end |
Sequencing depth | Low; 10 million read-pairs per sample with Omni-ATAC. | Medium/high: 20-50 million uniquely mapping reads per sample; 200 million for TF footprinting. | High; 150-200 million reads per sample (human) [142] | Very low; 3 million read-pairs per sample. |
Data output | Tn5-accessible chromatin. | DNase-accessible chromatin; TF footprinting. | Nucleosome positioning, inaccessible chromatin. | Location of target on DNA. |
Main advantages | Links labeling of accessible regions and NGS library preparation, making preparation of library straightforward. | Footprinting analysis. | Method of choice for nucleosome positioning and quantitative nucleosome dynamics. | Enables mapping of specific TF or histone modification in low cell numbers. Some histone modifications, like H3K27ac, can be used to look for active enhancers. |
Comparison with earlier ATAC-seq methods
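Earlier ATAC-seq methods still had several shortcomings. For example, because mitochondrial DNA is not chromatinized, any lysed mitochondria present in the ATAC-seq reaction cause a large share of sequencing reads to map to mitochondrial DNA. In addition, in many cell types and contexts a low signal-to-noise ratio made it difficult or even impossible to apply ATAC-seq to certain experimental systems.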
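To address these issues, the authors previously developed a general, optimized ATAC-seq method called Omni-ATAC, which resolves many of the cell- or context-specific problems that limited the broad application of ATAC-seq.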
Development of the Omni-ATAC protocol
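The Omni-ATAC protocol improves on the original ATAC-seq method by reducing the number of reads that align to mitochondrial DNA and by raising the signal-to-noise ratio across a variety of cell lines, tissues, and frozen samples. These gains come from optimizing cell lysis, nuclei isolation, and the transposition reaction. In particular, adding Tween-20 and digitonin alongside the conventional Nonidet P40 (NP40) makes it possible to lyse many different cell types.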
Experimental Design
1. Preparation of input material
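- Applicable to many mammalian cell and tissue types.
- Works with as few as 500 cells (or nuclei); 50,000 cells give the best results.
- Samples should preferably be fresh or cryopreserved intact cells or nuclei.
- Not suitable for formalin-fixed paraffin-embedded (FFPE) tissue.
- Biological versus technical replicates: when resources are limited, biological replicates are recommended over technical replicates; if biological replicates are hard to obtain, 2-3 technical replicates are advisable.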
2. Quality control of ATAC-seq libraries
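The authors strongly recommend assessing the quality of the final ATAC-seq libraries by low-depth sequencing (50,000 to 100,000 read pairs per sample). Whether an ATAC-seq library was generated successfully depends on four key factors: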
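(i) the enrichment of transposase insertions in known accessible chromatin regions (signal-to-noise ratio); (ii) the total number of unique fragments (library complexity); (iii) the proportion of reads aligning to the nuclear genome versus the mitochondrial genome; and (iv) the size distribution of the library inserts.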
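Factors (ii) to (iv) can be estimated directly from the aligned fragments of a low-depth run. The following is a minimal sketch, not the authors' code: it assumes fragments are already available as `(chrom, start, end)` tuples, that the mitochondrial contig is named `chrM`, and that 150 bp and 300 bp are reasonable (hypothetical) boundaries between sub-, mono-, and multi-nucleosomal fragments.

```python
from collections import Counter


def qc_summary(fragments, mito_name="chrM"):
    """Summarize basic library QC from aligned fragments.

    fragments: iterable of (chrom, start, end), one tuple per aligned read pair.
    mito_name: name of the mitochondrial contig (assumed to be "chrM").
    """
    fragments = list(fragments)
    total = len(fragments)
    if total == 0:
        raise ValueError("no fragments supplied")

    # Factor (iii): share of fragments that map to the mitochondrial genome.
    mito = sum(1 for chrom, _, _ in fragments if chrom == mito_name)

    # Factor (ii): identical coordinates are treated as duplicates.
    unique = len(set(fragments))

    # Factor (iv): coarse size classes; the cutoffs are illustrative assumptions.
    size_classes = Counter()
    for chrom, start, end in fragments:
        if chrom == mito_name:
            continue
        length = end - start
        if length < 150:
            size_classes["sub_nucleosomal"] += 1
        elif length < 300:
            size_classes["mono_nucleosomal"] += 1
        else:
            size_classes["multi_nucleosomal"] += 1

    return {
        "mito_fraction": mito / total,
        "unique_fraction": unique / total,
        "size_classes": dict(size_classes),
    }
```

Factor (i), the signal-to-noise ratio, needs a gene annotation and is therefore not covered by this helper.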
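In the figure below, for example: (e) shows a successful ATAC-seq library, with a high transcription start site (TSS) enrichment score even though nucleosome periodicity is not evident in the Bioanalyzer trace; (f) shows an unsuccessful ATAC-seq library, with a low TSS enrichment score and no clear nucleosome periodicity in the Bioanalyzer trace.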
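A TSS enrichment score is essentially a ratio of signal at transcription start sites to signal in the surrounding background. The sketch below shows one simple way to compute such a ratio; the exact definition differs between pipelines, so the choices here (value at the window centre divided by the mean of the two outer flanks, with a 100 bp flank) are assumptions for illustration only.

```python
def tss_enrichment(profile, flank=100):
    """Compute a simple TSS enrichment score.

    profile: aggregate Tn5 insertion counts per base, summed over all TSSs,
             for a window centred on the TSS (e.g. 2 kb on each side).
    flank:   number of bases at each end of the window used as background
             (an assumed value, not taken from the protocol).
    """
    if len(profile) < 2 * flank + 1:
        raise ValueError("profile is too short for the requested flank size")

    # Background: mean signal in the outermost bases on both sides.
    background = (sum(profile[:flank]) + sum(profile[-flank:])) / (2 * flank)
    if background == 0:
        raise ValueError("background signal is zero; score is undefined")

    # Signal: aggregate count at the centre of the window (the TSS itself).
    centre = profile[len(profile) // 2]
    return centre / background
```

A flat profile gives a score close to 1, while a library with strong enrichment at promoters gives a clearly higher value.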
3. Sequencing parameter guidance
Sequencing application | Insight gained | Minimum read length† | Index length* | Paired- or single-end | Sequencing depth (reads per sample) |
---|---|---|---|---|---|
Gene regulatory landscape profiling | Peaks, differential peaks between samples, motif analysis of peaks | 36 bp | 8 | Paired | 10M |
Genotyping | Gene regulatory landscape + genotype of sample; useful for patient samples and to determine if sequence variants affect a peak. | 100 bp | 8 | Paired | 10M |
Footprinting Analysis | Footprinting of different TFs to determine binding sequence at base-pair resolution | 36 bp | 8 | Paired | 200M |
Nucleosome occupancy | Location of nucleosomes along DNA | 36 bp | 8 | Paired | 60M |
See the original article for more detailed requirements.
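The depth recommendations in the table translate directly into how many samples can share one sequencing run. The following sketch encodes the table's per-sample depths and divides a run's yield by them; the dictionary keys and the example run yield are hypothetical names and numbers chosen for illustration.

```python
# Recommended reads per sample, taken from the table above.
RECOMMENDED_READS = {
    "landscape": 10_000_000,      # gene regulatory landscape profiling
    "genotyping": 10_000_000,
    "footprinting": 200_000_000,
    "nucleosome": 60_000_000,     # nucleosome occupancy
}


def samples_per_run(application, run_yield):
    """Return how many samples fit in one run at the recommended depth.

    application: one of the keys of RECOMMENDED_READS.
    run_yield:   total reads the run is expected to deliver (user-supplied).
    """
    return run_yield // RECOMMENDED_READS[application]
```

For a run that yields 400 million reads (an assumed figure), `samples_per_run("landscape", 400_000_000)` gives 40 samples, whereas footprinting at the same yield allows only 2.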
Data analysis
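Once sequencing is complete, the authors recommend using publicly available analysis pipelines for alignment and downstream analysis, for example:
- PEPATAC: https://pepatac.databio.org/en/latest/
- ENCODE: https://www.encodeproject.org/atac-seq/
- nf-core: https://nf-co.re/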
A comparison of the three pipelines above:
Step/Process | ENCODE ATAC-seq | PEPATAC | nf-core atacseq |
---|---|---|---|
Version compared | v1.10.0 | v0.10.0 | v1.2.1 |
Execution environment | Cromwell/caper | Pypiper | Nextflow |
Adapter trimming, alignment, and deduplication | Cutadapt, bowtie2, Picard | Trimmomatic, skewer, bowtie2, BWA, samblaster, Picard | TrimGalore, BWA, Picard |
Tn5 offset correction | Yes | Yes | No |
Mitochondrial read filtering | Yes | Yes | Yes |
Peak calling method | MACS2 | MACS2 (default), F-seq, Genrich | MACS2 |
Peak-merging method | Based on the irreproducible discovery rate (IDR) for replicates; does not merge for a whole set of samples | Fixed-width, iterative overlap | Raw peak overlap using bedtools [109] merge |
Outputs | BAM files, bigwig files (one representing fold enrichment over expected background and the other representing statistical significance), BED file of peaks for each file and for the merged peak set | QC plots including alignment scoring, TSS scores and library complexity, BED peaks and counts, bam files, bigwig files (nucleotide resolution and smoothed) | QC html report, bam files, normalized bigwig files, BED peaks, annotation of peaks (HOMER), merged peak set, differential accessibility (DESeq2), IGV output. |
Code repository | https://github.com/ENCODE-DCC/atac-seq-pipeline | https://github.com/databio/pepatac | https://github.com/nf-core/atacseq |
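The "Tn5 offset correction" row refers to shifting read positions so that they mark the centre of the transposase binding event rather than the read end. A minimal sketch of the commonly used convention (+4 bp for reads on the plus strand, -5 bp for reads on the minus strand) is shown below; the function name is hypothetical, and individual pipelines may implement the shift with slightly different coordinate conventions.

```python
def tn5_insertion_site(five_prime_pos, strand):
    """Shift the 5' end of a read to the approximate Tn5 insertion centre.

    five_prime_pos: genomic coordinate of the read's 5' end.
    strand:         "+" or "-".
    """
    if strand == "+":
        return five_prime_pos + 4   # plus-strand reads move 4 bp downstream
    if strand == "-":
        return five_prime_pos - 5   # minus-strand reads move 5 bp upstream
    raise ValueError(f"unknown strand: {strand!r}")
```

The correction matters most for base-pair-resolution analyses such as footprinting; for peak calling alone its effect is small.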
Peak-merging strategies for downstream analysis:
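The "fixed-width, iterative overlap" strategy listed in the pipeline comparison can be sketched as follows. This is an illustrative simplification, not the PEPATAC implementation: it assumes peaks have already been resized to a fixed width and carry a comparable significance score, and it keeps the highest-scoring peak among any set of overlapping peaks.

```python
def iterative_overlap_merge(peaks):
    """Select a non-overlapping peak set by iterative overlap removal.

    peaks: list of (chrom, start, end, score) tuples with half-open
           coordinates; peaks are assumed to be fixed-width already.
    Returns the retained peaks sorted by chromosome and start position.
    """
    kept = []
    # Visit peaks from most to least significant.
    for peak in sorted(peaks, key=lambda p: p[3], reverse=True):
        chrom, start, end, _ = peak
        # Drop the peak if it overlaps any stronger peak that was already kept.
        overlaps = any(
            k[0] == chrom and start < k[2] and k[1] < end for k in kept
        )
        if not overlaps:
            kept.append(peak)
    return sorted(kept, key=lambda p: (p[0], p[1]))
```

Unlike a raw `bedtools merge`, which fuses overlapping intervals into ever wider regions, this approach keeps every retained peak at its original width. The pairwise overlap check is quadratic and is only meant for small illustrative inputs.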
Single-cell ATAC-seq
Omni-ATAC is designed specifically for bulk ATAC-seq. For single-cell ATAC-seq, consider established commercial solutions such as 10x Genomics.