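As the earlier post on publication trends for single-cell multi-omics in plant research showed, single-cell epigenomics has also been applied in plants, alongside single-cell and spatial transcriptomics. One example is the 2023 Genome Biology paper "Cell-specific Clock Controlled Gene Expression Program Regulates Rhythmic Fiber Cell Growth in Cotton", which combined scRNA-seq, LCM-seq, bulk RNA-seq, and scATAC-seq and reported for the first time that the early growth of cotton fiber cells is regulated by the circadian clock. It also identified two important factors downstream of the clock: the small peptide GhRALF1 and the transcription factor GhTCP14. These regulate early fiber cell development by specifically controlling, respectively, auxin signaling and extracellular pH, and the metabolic activity of mitochondria and protein translation in fiber cells. On the data analysis side, two ideas are worth borrowing: identifying fiber cells by jointly analyzing bulk RNA-seq data, and the analysis route that led to the discovery of clock-regulated early growth.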
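Biological question
The cotton ovule epidermis produces fibers, the most important natural source of cellulose for the global textile industry. Because the tissue structure of the cotton ovule is complex and early-stage fiber cells are hard to isolate, research on fiber initiation has been confined to the tissue level. This has severely limited both the analysis of transcriptional features of cells in early fiber development and the dissection of the underlying developmental mechanisms.
In 2016, as single-cell sequencing matured, the research team was among the early groups to explore high-throughput single-cell sequencing in plants. After a year and a half of trials, the team established PTED (Partial Tissue Enzymatic Digestion), an enzymatic digestion protocol for preparing protoplasts from the outer integument of cotton ovules. Using PTED, the team obtained protoplasts from the ovule outer integument of upland cotton Xuzhou 142 and its fuzzless-lintless mutant (fl) around anthesis (-2 to +2 DPA), and performed single-cell transcriptome and chromatin accessibility sequencing.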
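Construction of a single-cell transcriptome atlas of the cotton ovule outer integument
Sequencing
scRNA-seq (BGISEQ-500), scATAC-seq (MGISEQ-2000), and LCM-seq (DNBseq-T7); all three are BGI platforms.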
Main data analysis steps
Upstream: cellranger v3
Downstream: scran
Empty droplets: Empty cells were filtered out by the emptyDrops function in the DropletUtils package with FDR cutoff = 0.01 and unique molecular identifier (UMI) cutoff = 1000 or 4000 for 10× Genomics v2 or v3 libraries.
QC: First, the cells with a total UMI count larger than the median + 2 × MAD (median absolute deviation) value were removed. Second, for each cell, the numbers of detected and expressed genes were required to be in the range of 2.5–97.5% quantiles. The detected and expressed genes were defined as those with UMI counts in one cell ≥ 1 or ≥ 2, respectively.
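The two QC rules quoted above can be sketched in plain Python. This is only an illustration, not the authors' code (the paper used R packages). It assumes a raw MAD without the 1.4826 scaling constant, and it assumes the quantile ranges are computed on the cells that survive the UMI filter; the paper does not spell out either detail. The function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Raw median absolute deviation (assumption: no 1.4826 scaling)."""
    m = median(values)
    return median(abs(v - m) for v in values)


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def qc_keep_mask(counts):
    """counts: one list of per-gene UMI counts per cell.

    Returns one boolean per cell, True if the cell passes both QC rules.
    """
    # Rule 1: drop cells whose total UMI count exceeds median + 2 * MAD.
    total = [sum(c) for c in counts]
    umi_cap = median(total) + 2 * mad(total)
    step1 = [t <= umi_cap for t in total]

    # Rule 2: detected (UMI >= 1) and expressed (UMI >= 2) gene numbers
    # must fall inside the 2.5%-97.5% quantile range.
    detected = [sum(1 for x in c if x >= 1) for c in counts]
    expressed = [sum(1 for x in c if x >= 2) for c in counts]
    keep = list(step1)
    for metric in (detected, expressed):
        # Assumption: quantiles are taken over cells that passed rule 1.
        survivors = sorted(v for v, ok in zip(metric, step1) if ok)
        lo, hi = quantile(survivors, 0.025), quantile(survivors, 0.975)
        keep = [ok and lo <= v <= hi for v, ok in zip(metric, keep)]
    return keep
```

On real data you would apply the returned mask to the columns of the count matrix before normalization.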
Batch correction: batchelor
Differential expression analysis: The candidates of marker genes were predicted using findMarkers functions in scran package with the threshold (FDR < 1E-9 and fold change > 2) for WT, fl, and merged data, separately.
Marker gene identification: For TFs included in PlantTFDB, all of those candidate marker genes were considered marker genes. For a specific cluster, non-TF genes were considered as marker genes when satisfying the following three conditions: (1) The minimum expression ratio of this cluster in WT, fl, and merged data was greater than 0.01; (2) The minimum expression ratio was greater than 1.5 times the maximum expression ratio of other clusters in WT, fl, and merged data; (3) The maximum expression ratio in other clusters was less than 0.1.
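The three conditions for non-TF genes are easy to misread, so here is a small sketch of how they might be checked for one gene. It assumes that "expression ratio" means the fraction of cells in a cluster that express the gene, and that non-TF genes must also be candidates from the findMarkers step; neither is stated explicitly in the quoted text. The function names and the input layout are hypothetical.

```python
def passes_ratio_rules(ratios, cluster):
    """Check the three expression-ratio conditions for one gene.

    ratios: {dataset: {cluster: expression ratio}} with the datasets
    WT, fl, and merged. cluster: the cluster being tested.
    """
    own = [per_cluster[cluster] for per_cluster in ratios.values()]
    others = [
        value
        for per_cluster in ratios.values()
        for name, value in per_cluster.items()
        if name != cluster
    ]
    min_own = min(own)
    max_other = max(others) if others else 0.0
    return (
        min_own > 0.01                  # condition (1)
        and min_own > 1.5 * max_other   # condition (2)
        and max_other < 0.1             # condition (3)
    )


def is_marker(is_candidate, is_tf, ratios, cluster):
    """Candidate TFs are accepted directly; other candidates need the rules.

    Assumption: non-TF genes must also be findMarkers candidates.
    """
    if not is_candidate:
        return False
    return True if is_tf else passes_ratio_rules(ratios, cluster)
```

For example, a gene expressed in at least 30% of C3 cells in all three datasets and in at most 5% of cells of any other cluster would pass for C3, because 0.30 > 0.01, 0.30 > 1.5 × 0.05, and 0.05 < 0.1.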
Pseudo-bulk generation from metacell-like bins, followed by correlation analysis: In comparing scRNA-seq and fiber sequencing data, we divided the UMAP of scRNA-seq data into different bins with a granularity of 0.5 × 0.5, and merged cells of each bin together to generate pseudo-bulk RNA-seq data. The Spearman correlation coefficients between fiber RNA-seq and pseudo-bulk RNA-seq data were computed for all bins separately. Several lines of evidence together identified C3 as the fiber cell cluster.
j,k The gene expression correlation analysis between the cells in the UMAP and 1 DPA fiber cells of LCM-seq (j), and 5 DPA fiber cells of bulk RNA-seq (k) in three replicates. The Spearman correlation coefficients are shown for the cells in the UMAP of scRNA-seq from WT (top) and fl (bottom). The dotted line circle marks the C3 cell cluster from scRNA-seq
(f) UMAP projection of estimated expression time for scRNA-seq cells. (g-k) UMAP projection of the correlation coefficient between scRNA-seq and time-course RNA-seq data.
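The binning-and-correlation idea described above (0.5 × 0.5 UMAP bins, one pseudo-bulk profile per bin, Spearman correlation against the fiber RNA-seq profile) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. It assumes that "merged" means summing counts within a bin; all function names are hypothetical.

```python
import math


def pseudo_bulk_by_bin(umap, counts, bin_size=0.5):
    """Sum per-gene counts of all cells falling in the same UMAP bin.

    umap: list of (x, y) per cell; counts: list of per-gene counts per cell.
    Returns {(bin_x, bin_y): summed per-gene counts}.
    """
    bins = {}
    for (x, y), cell in zip(umap, counts):
        key = (math.floor(x / bin_size), math.floor(y / bin_size))
        current = bins.get(key, [0] * len(cell))
        bins[key] = [a + b for a, b in zip(current, cell)]
    return bins


def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return float("nan")  # constant profile, correlation undefined
    return cov / math.sqrt(va * vb)


def correlate_bins(umap, counts, bulk, bin_size=0.5):
    """Spearman correlation of each bin's pseudo-bulk with a bulk profile."""
    bins = pseudo_bulk_by_bin(umap, counts, bin_size)
    return {key: spearman(profile, bulk) for key, profile in bins.items()}
```

Because Spearman correlation depends only on ranks, summing or averaging the cells in a bin gives the same coefficient. Coloring each cell by the coefficient of its bin gives the kind of UMAP overlay described in the figure captions.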
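Identification of rhythmically expressed genes: differential analysis of bulk RNA-seq data from WT and fl at different time points yielded genes whose expression follows a day-night pattern; these were identified as rhythmically expressed genes.
The small peptide GhRALF1, which is highly expressed in cotton fiber cells, significantly inhibited early fiber growth in in vitro experiments, which caught the researchers' attention. Analysis of rhythmically expressed genes, validated by quantitative PCR, showed that early growth of cotton fiber cells is a finely regulated circadian process. GhRALF1 acts as a "rheostat" for growth rate and may rhythmically control fiber growth by influencing auxin signaling and the dynamics of extracellular pH in fiber cells.
A note on the scATAC-seq analysis: when scATAC-seq data are of poor quality or show no clear clustering, they are often analyzed as if they were bulk ATAC-seq data, which is what happened here: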
Upstream: cellranger-atac program (v1.2.0) with default parameters
Peak calling: The peaks were called on the merged reads from single cells using MACS2 (v2.1.1) with default parameters for WT and fl samples separately.
ACR count matrix: The peaks from WT and fl samples were merged together using BEDTools (v2.29.0) as active chromatin regions (ACRs) and the UMI counts for all the ACRs were computed for individual cells.
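To make the peak-merging step concrete, here is a minimal sketch of taking the union of overlapping intervals from two peak sets. The paper only says that BEDTools was used, so the exact subcommand and options are unknown; this sketch assumes the default behavior of an interval merge, where overlapping and directly adjacent peaks on the same chromosome are fused. The function name and the example chromosome names are hypothetical.

```python
def merge_peaks(*peak_sets):
    """Fuse overlapping or adjacent peaks from several peak sets.

    Each peak set is a list of (chrom, start, end) tuples.
    Returns a sorted list of merged (chrom, start, end) regions.
    """
    merged = []
    all_peaks = sorted(peak for peaks in peak_sets for peak in peaks)
    for chrom, start, end in all_peaks:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical WT and fl peaks.
wt_peaks = [("A01", 100, 200), ("A01", 500, 600)]
fl_peaks = [("A01", 150, 250), ("D05", 10, 20)]
acrs = merge_peaks(wt_peaks, fl_peaks)
```

Each merged region then serves as one row of the ACR count matrix, with per-cell counts as columns.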
Based on the ACR count matrix, we used SCALE (v1.0.2) and Seurat (v3.2.0) with different settings for clustering the cells and found that the cells were not grouped into distinct clusters.
We thus treated scATAC-seq data as one cluster and used the AverageExpression and FindAllMarkers functions in the Seurat library to perform quantitative and differential expression analysis for the ACRs, respectively.
Linking peaks to genes: The peak counts within the gene body and the 1000 bp region upstream of the transcription start site were summed together as the ATAC-seq signals at the gene level.
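The gene-level signal described above can be sketched as a strand-aware window sum. This is an illustration under stated assumptions, not the authors' code: it counts any peak that overlaps the window (strict containment is another possible reading of "within"), it uses half-open coordinates, and the function names and gene IDs are hypothetical.

```python
def gene_window(start, end, strand, upstream=1000):
    """Gene body plus `upstream` bp before the TSS, strand-aware."""
    if strand == "+":
        return max(0, start - upstream), end
    # On the minus strand the TSS is at the gene's end coordinate.
    return start, end + upstream


def gene_level_signal(genes, peaks, upstream=1000):
    """Sum counts of peaks overlapping each gene's window.

    genes: {gene_id: (chrom, start, end, strand)}
    peaks: list of (chrom, start, end, count)
    """
    signal = {}
    for gene_id, (chrom, g_start, g_end, strand) in genes.items():
        w_start, w_end = gene_window(g_start, g_end, strand, upstream)
        signal[gene_id] = sum(
            count
            for p_chrom, p_start, p_end, count in peaks
            if p_chrom == chrom and p_start < w_end and p_end > w_start
        )
    return signal


# Hypothetical genes and peaks.
genes = {"g1": ("A01", 2000, 3000, "+"), "g2": ("A01", 5000, 6000, "-")}
peaks = [
    ("A01", 1200, 1400, 5),   # upstream of g1's TSS
    ("A01", 2500, 2600, 3),   # inside g1's gene body
    ("A01", 6500, 6800, 7),   # upstream of g2's TSS (minus strand)
    ("A01", 100, 300, 11),    # outside both windows
    ("D05", 2500, 2600, 2),   # different chromosome
]
signal = gene_level_signal(genes, peaks)
```

The linear scan over all peaks is fine for a sketch; on a whole genome you would index peaks by chromosome or use an interval tree.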
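Finally, motif analysis identified two important cis-regulatory elements (CREs): the TCP motif and the TCP-like motif. Both can be bound by the transcription factor GhTCP14, which may rhythmically regulate mitochondrial energy metabolism and the protein translation system in cotton fiber cells by controlling nearly one third of the genes highly expressed in fiber cells.
Based on these findings, the team proposed a circadian regulation model for early cotton fiber growth. In cotton fiber cells, the core clock oscillators control genes specifically expressed in fiber cells (fiber-specific clock controlled genes, CCGs). Through these genes they rhythmically regulate several physiological processes, including mitochondrial energy production, ribosomal translation, and auxin response, and thereby control fiber cell growth.
Model of circadian regulation in fiber cells
This study is the first in the field to report circadian clock control of early cotton fiber development. It is also the first to reveal, at the level of the single cotton fiber cell, a "cell-specific" mechanism of plant circadian clock control, which offers a new perspective for circadian rhythm research. It points to a new route for improving fiber traits by specifically engineering the clock of cotton fiber cells, and provides a theoretical basis for new genetic improvement strategies targeting cotton yield and quality.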