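As with upstream scRNA-seq analysis, scATAC-seq data must first be turned into a single-cell chromatin accessibility matrix before any downstream analysis. The difference is that the rows of this matrix are peaks and the columns are cells. This article shows how to build that matrix with Cell Ranger ATAC, the official 10x Genomics software. It covers setting up the scATAC upstream analysis environment, building a reference genome index (for species other than human and mouse) and building the Cell × Peak matrix.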
Contents:
Introduction to Cell Ranger ATAC
Analysis workflows
Downloading and installing Cell Ranger ATAC
Usage (mkfastq and mkref)
Part 1: Introduction to Cell Ranger ATAC
cellranger-atac is the set of analysis pipelines for Chromium Single Cell ATAC-seq data generated on the 10x Genomics platform. Besides mkref, which builds a reference package, it provides four main pipelines for single-cell chromatin accessibility analysis:
cellranger-atac mkfastq: converts the raw base call (BCL) files produced by an Illumina sequencer into FASTQ files. It wraps Illumina's bcl2fastq program.
cellranger-atac count: the main analysis pipeline of cellranger-atac. It performs 1) Read filtering and alignment, 2) Barcode counting, 3) Identification of transposase cut sites, 4) Detection of accessible chromatin peaks, 5) Cell calling, 6) Count matrix generation for peaks and transcription factors, 7) Dimensionality reduction, 8) Cell clustering, 9) Cluster differential accessibility.
cellranger-atac aggr: combines the results of several cellranger-atac count runs (for example, all samples from one experiment). Its steps are 1) Normalization of input runs to the same median fragments per cell (sensitivity), 2) Detection of accessible chromatin peaks, 3) Count matrix generation for peaks and transcription factors for the aggregate data, 4) Dimensionality reduction, 5) Cell clustering, 6) Cluster differential accessibility.
cellranger-atac reanalyze: reruns the secondary analysis on the output of cellranger-atac count or cellranger-atac aggr, letting you adjust some parameters for: 1) Cell calling, 2) Dimensionality reduction, 3) Cell clustering, 4) Cluster differential accessibility.
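cellranger-atac aggr reads a CSV that lists the count runs to combine. The sketch below builds that CSV from finished count output directories. It assumes the 2.x column layout library_id,fragments,cells, with each run's outs/fragments.tsv.gz and outs/singlecell.csv; check the aggr documentation for your version. The helper make_aggr_csv and the run directories /data/runs/sample1 and /data/runs/sample2 are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: make_aggr_csv OUTPUT_CSV RUN_DIR [RUN_DIR ...]
# Assumption: aggr expects the columns library_id,fragments,cells.
make_aggr_csv() {
  local out=$1
  shift
  echo "library_id,fragments,cells" > "$out"
  local run
  for run in "$@"; do
    # library_id is taken from the run directory name (the --id given to count)
    echo "$(basename "$run"),$run/outs/fragments.tsv.gz,$run/outs/singlecell.csv" >> "$out"
  done
}

make_aggr_csv aggr_libraries.csv /data/runs/sample1 /data/runs/sample2
cat aggr_libraries.csv
# Then, for example:
# cellranger-atac aggr --id=AGG --csv=aggr_libraries.csv \
#   --reference=/path/to/refdata-cellranger-arc-GRCh38-2020-A-2.0.0
```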
Part 2: Analysis workflows
One Sample, One GEM Well, One Flowcell
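This is the most basic workflow: there is one biological sample, a single sequencing library is built from one GEM well (a set of partitioned cells from a single 10x Chromium™ Chip channel), and that library is sequenced on a single flowcell. Once the FASTQ files are available, analyze them with the cellranger-atac count pipeline.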
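A minimal sketch of the count call for this layout. --id, --reference, --fastqs and --sample are cellranger-atac count options; every path and name below (sample1, /path/to/...) is a hypothetical placeholder. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical paths and names; replace with your own.
SAMPLE_ID=sample1                                      # output directory name (--id)
REF=/path/to/refdata-cellranger-arc-GRCh38-2020-A-2.0.0
FASTQ_DIR=/path/to/fastq_path                          # directory holding the FASTQ files
SAMPLE_PREFIX=sample1                                  # FASTQ file-name prefix (--sample)

count_cmd=(cellranger-atac count
  --id="$SAMPLE_ID"
  --reference="$REF"
  --fastqs="$FASTQ_DIR"
  --sample="$SAMPLE_PREFIX")

# Print the command by default; run it only when DRY_RUN=0.
if [[ "${DRY_RUN:-1}" == 1 ]]; then
  echo "${count_cmd[*]}"
else
  "${count_cmd[@]}"
fi
```

Keeping the command in a bash array preserves each argument intact even if a path contains spaces.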
One Sample, One GEM Well, Multiple Flowcells
If a single library is sequenced on several flowcells (e.g. to increase sequencing saturation), the reads from all flowcells can be pooled and analyzed together with the cellranger-atac count pipeline.
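When one library is sequenced on several flowcells, the 10x documentation describes passing every FASTQ directory to --fastqs as a comma-separated list. The sketch assumes that behaviour; the flowcell directory names and the join_by_comma helper are hypothetical, and the count command is only printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Join arguments with commas ("$*" uses the first character of IFS).
join_by_comma() {
  local IFS=,
  echo "$*"
}

# One FASTQ directory per flowcell (hypothetical paths).
FASTQS=$(join_by_comma \
  /path/to/flowcell_A/outs/fastq_path \
  /path/to/flowcell_B/outs/fastq_path)

echo cellranger-atac count --id=sample1 \
  --reference=/path/to/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
  --fastqs="$FASTQS" --sample=sample1
```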
Part 3: Downloading and installing Cell Ranger ATAC
System Requirements
https://support.10xgenomics.com/single-cell-atac/software/overview/system-requirements
Hardware
Cell Ranger ATAC pipelines run on Linux systems that meet these minimum requirements:
1)8-core Intel or AMD processor (16 cores recommended)
2)64GB RAM (128GB recommended)
3)1TB free disk space
4)64-bit CentOS/RedHat 6.0 or Ubuntu 12.04
In order to run in cluster mode, the cluster needs to meet these additional minimum requirements:
1)8-core Intel or AMD processor per node
2)6GB RAM per core
3)Shared file system (e.g. NFS)
4)SGE or LSF batch scheduling system
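A quick sketch for checking a Linux machine against the local-mode minimums listed above. It reads the core count with nproc, total memory from /proc/meminfo and free space in the current directory with POSIX df. The check helper and reading 1 TB as 1000 GB are assumptions of this sketch, and it does not cover the cluster-mode requirements.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Local-mode minimums from the list above (1 TB is read as 1000 GB here).
MIN_CORES=8
MIN_MEM_GB=64
MIN_DISK_GB=1000

cores=$(nproc)
# MemTotal is reported in kB; convert to whole GB (GiB).
mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
# Free space (column 4 of POSIX df output, in 1K blocks) for the current directory.
disk_gb=$(df -Pk . | awk 'NR == 2 {print int($4 / 1024 / 1024)}')

check() {  # usage: check NAME ACTUAL MINIMUM
  if (( $2 >= $3 )); then
    echo "OK   $1: $2 (minimum $3)"
  else
    echo "LOW  $1: $2 (minimum $3)"
  fi
}

check cores "$cores" "$MIN_CORES"
check mem_GB "$mem_gb" "$MIN_MEM_GB"
check disk_GB "$disk_gb" "$MIN_DISK_GB"
```

A machine sold as 64GB usually reports slightly less than 64 GiB in MemTotal, so a borderline "LOW" for memory is worth reading with that in mind.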
Software
In order to run cellranger-atac mkfastq, the following software needs to be installed:
1)Illumina® bcl2fastq: bcl2fastq must be version 2.17 or higher. It supports most sequencers running RTA version 1.18.54 or higher. If you are using NovaSeq™, the pipelines require version 2.20 or higher. If your sequencer is running an older version of RTA, then the pipelines require bcl2fastq 1.8.4.
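A sketch for checking the installed bcl2fastq against the rules above (2.17 or higher, 2.20 or higher for NovaSeq). It assumes GNU sort -V for version comparison and assumes the bcl2fastq --version banner contains a string such as v2.20.0.422. The helpers version_ge and required_bcl2fastq and the NOVASEQ variable are hypothetical, and the older-RTA case (bcl2fastq 1.8.4) is not handled.

```shell
#!/usr/bin/env bash
set -euo pipefail

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Minimum for RTA >= 1.18.54 runs: 2.20 on NovaSeq, 2.17 otherwise.
required_bcl2fastq() {  # usage: required_bcl2fastq yes|no  (NovaSeq data?)
  if [ "$1" = yes ]; then echo 2.20; else echo 2.17; fi
}

if command -v bcl2fastq >/dev/null 2>&1; then
  # Assumption: the banner looks like "bcl2fastq v2.20.0.422".
  installed=$(bcl2fastq --version 2>&1 | grep -oE 'v[0-9]+(\.[0-9]+)+' | head -n1 | tr -d v || true)
  need=$(required_bcl2fastq "${NOVASEQ:-no}")
  if [ -n "$installed" ] && version_ge "$installed" "$need"; then
    echo "bcl2fastq $installed is OK (need >= $need)"
  else
    echo "bcl2fastq ${installed:-unknown} is too old or unreadable (need >= $need)"
  fi
else
  echo "bcl2fastq not found in PATH"
fi
```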
Resource Limits
1)By default, Cell Ranger ATAC runs with --jobmode=local, using 90% of available memory and all available cores. To restrict resource usage, see the --localmem and --localcores flags of cellranger-atac count in the 10x Genomics documentation.
2)Many Linux systems have default user limits (ulimits) for maximum open files and maximum user processes as low as 1024 or 4096. Because Cell Ranger ATAC spawns multiple processes per core, jobs that use a large number of cores can exceed these limits. 10x Genomics recommends higher limits.
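A sketch that reports the two ulimits mentioned above for the current shell. WANT_FILES and WANT_PROCS are illustrative thresholds chosen for this sketch, not official 10x values; the limit_ok helper is hypothetical. Raising limits permanently is system-specific (on many distributions via /etc/security/limits.conf).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative thresholds only (not taken from 10x documentation).
WANT_FILES=${WANT_FILES:-16384}
WANT_PROCS=${WANT_PROCS:-16384}

limit_ok() {  # usage: limit_ok CURRENT WANTED   ("unlimited" always passes)
  [ "$1" = unlimited ] || (( $1 >= $2 ))
}

files=$(ulimit -n)   # max open files
procs=$(ulimit -u)   # max user processes

for pair in "open_files:$files:$WANT_FILES" "user_processes:$procs:$WANT_PROCS"; do
  IFS=: read -r name cur want <<< "$pair"
  if limit_ok "$cur" "$want"; then
    echo "OK   $name = $cur"
  else
    echo "LOW  $name = $cur (consider raising to >= $want)"
  fi
done
```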
How CPU and Memory Affect Runtime
1)Effect of memory on runtime: in general, allocating more than the minimum 64GB of memory to cellranger-atac count shortens its walltime, but returns diminish notably beyond 128GB.
2)Effect of the number of CPUs on runtime: cellranger-atac count walltime also shows diminishing returns beyond 32 threads, so if your system has ≫32 logical cores, you may want to run with --localcores=32.
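Following the scaling notes above, a sketch that derives --localcores and --localmem values: cores capped at 32 and memory capped at 128GB, with memory further limited to about 90% of the machine's total (mirroring the default local-mode behaviour described under Resource Limits). The caps come from the text; the min helper and variable names are hypothetical, and reading /proc/meminfo makes this Linux-only.

```shell
#!/usr/bin/env bash
set -euo pipefail

min() {  # print the smaller of two integers
  if (( $1 < $2 )); then echo "$1"; else echo "$2"; fi
}

cores=$(nproc)
mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)

# Little gain beyond 32 threads.
LOCALCORES=$(min "$cores" 32)
# About 90% of total memory, and little gain beyond 128GB.
LOCALMEM=$(min "$(( mem_gb * 9 / 10 ))" 128)

echo "--localcores=$LOCALCORES --localmem=$LOCALMEM"
```

The printed flags can be appended to a cellranger-atac count command.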
Downloading cellranger-atac
Step 1 – Download the Cell Ranger ATAC file.
https://support.10xgenomics.com/single-cell-atac/software/downloads/latest
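Note the --no-check-certificate flag in the wget example below. The download link is signed and time-limited (see its Expires parameter), so copy a fresh link from the download page rather than reusing this one. The latest version at the time of writing is 2.1.0.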
cd 2023/00.software/
wget -O cellranger-atac-2.1.0.tar.gz "https://cf.10xgenomics.com/releases/cell-atac/cellranger-atac-2.1.0.tar.gz?Expires=1717527954&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZi4xMHhnZW5vbWljcy5jb20vcmVsZWFzZXMvY2VsbC1hdGFjL2NlbGxyYW5nZXItYXRhYy0yLjEuMC50YXIuZ3oiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE3MTc1Mjc5NTR9fX1dfQ__&Signature=eB9Jo707qUcZgcsrSHqyQu9S9mocOdYSgApDTpObIZwXPdFjJCB1ps9BAUODrjDuY8vw4ICR~tcdSDICnGW7QQ6CBNJBXN40j4QKHLpvXHZ64GpPkPwOn6QdtXJcPPvIeop37ZexR40ajv3TDlAwk5IKF-UVh92OGAha299GyXW8CpYcDHqhfzxBWbsAm4RWJiHT34QAyV5K1k~n5owbwCqsB-wjs28hyc-nl9aIxYulPp~-ZHY6JxKiqVfs32pich2JHUfFToNeKz8Y7XqdnI4HWuQzoDi5cOa~1SvxKod3tZkUzgxKGsQNelGGBpFVBsXXU0ecFWAaHWsADVz0CQ__&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA" --no-check-certificate
Step 2 – Unpack the Cell Ranger ATAC file.
tar -xzvf cellranger-atac-2.1.0.tar.gz
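A downloaded tarball can be checked before it is unpacked. The sketch below tests the gzip stream first, so a truncated download or an error page saved under the .tar.gz name (for example after the signed link expired) is caught early. unpack_checked is a hypothetical helper, not part of Cell Ranger ATAC.

```shell
#!/usr/bin/env bash
set -euo pipefail

# unpack_checked ARCHIVE: verify the gzip stream, then extract into the current directory.
unpack_checked() {
  local archive=$1
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "error: $archive is not a valid gzip file; download it again" >&2
    return 1
  fi
  tar -xzf "$archive"
}

# Usage:
# unpack_checked cellranger-atac-2.1.0.tar.gz
```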
Step 3 – Download the reference data files.
The official website offers three prebuilt references: GRCh38, mm10 and GRCh38_and_mm10.
For example:
wget https://cf.10xgenomics.com/supp/cell-atac/refdata-cellranger-arc-GRCh38-2020-A-2.0.0.tar.gz
Step 4 – Unpack the reference data files.
tar -xzvf refdata-cellranger-arc-GRCh38-2020-A-2.0.0.tar.gz
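After unpacking, a quick sanity check of the reference directory can save a failed run later. The sketch assumes the package contains reference.json plus fasta/, genes/ and regions/ subdirectories, as in the 10x-provided refdata-cellranger-arc-GRCh38-2020-A-2.0.0; check_reference is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_reference DIR: report any expected top-level entry that is missing.
# Assumption: the expected layout is reference.json, fasta/, genes/, regions/.
check_reference() {
  local ref=$1 missing=0 item
  for item in reference.json fasta genes regions; do
    if [ ! -e "$ref/$item" ]; then
      echo "missing: $ref/$item" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_reference refdata-cellranger-arc-GRCh38-2020-A-2.0.0 && echo "reference looks complete"
```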
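Add cellranger-atac to the PATH environment variable (optional; you can also call the program by its full path). Adjust the directory to wherever you unpacked the 2.1.0 tarball:
export PATH=/opt/cellranger-atac-2.1.0:$PATH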
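The export above only lasts for the current shell. A sketch that also appends the same line to a shell start-up file once, so new shells find cellranger-atac too. add_to_path is a hypothetical helper, ~/.bashrc is assumed as the default start-up file, and /opt/cellranger-atac-2.1.0 is an example location.

```shell
#!/usr/bin/env bash
set -euo pipefail

# add_to_path DIR [RCFILE]: prepend DIR to PATH now and persist it in RCFILE once.
add_to_path() {
  local dir=$1 rc=${2:-$HOME/.bashrc}
  local line="export PATH=$dir:\$PATH"
  case ":$PATH:" in
    *":$dir:"*) ;;               # already on PATH
    *) PATH="$dir:$PATH" ;;
  esac
  touch "$rc"
  # Append only if the exact line is not already there.
  grep -qxF "$line" "$rc" || echo "$line" >> "$rc"
}

# Usage:
# add_to_path /opt/cellranger-atac-2.1.0
# command -v cellranger-atac
```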
Verify Installation
cellranger-atac -h
cellranger-atac cellranger-atac-2.1.0
Process 10x Genomics Chromium Single Cell ATAC data
USAGE:
cellranger-atac <SUBCOMMAND>
OPTIONS:
-h, --help Print help information
-V, --version Print version information
SUBCOMMANDS:
count Count reads from a single Single Cell ATAC library
mkfastq Run bcl2fastq on Single Cell ATAC sequencing data
mkref Create a cellranger-atac-compatible reference package
aggr Aggregate data from multiple `cellranger-atac count` runs
reanalyze Re-run secondary analysis (dimensionality reduction, clustering, etc) on a completed
`cellranger-atac count` or `cellranger-atac aggr` run
testrun Run a tiny cellranger-atac count pipeline to verify software integrity
upload Upload analysis logs to 10x Genomics support
sitecheck Collect linux system configuration information
help Print this message or the help of the given subcommand(s)
Run the test data
cellranger-atac testrun --id=tiny
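testrun writes its results into a directory named after --id. A sketch that checks for an outs/ directory afterwards, assuming testrun, like count, places its results in ID/outs; check_pipestance is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_pipestance ID: confirm the run produced ./ID/outs.
# Assumption: testrun and count both write results to ID/outs.
check_pipestance() {
  local id=$1
  if [ -d "$id/outs" ]; then
    echo "$id: outs/ present"
  else
    echo "$id: no outs/ directory; inspect the log files under $id" >&2
    return 1
  fi
}

# Usage:
# cellranger-atac testrun --id=tiny && check_pipestance tiny
```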