MicrobiomeStatPlot | 热图结合柱状图展示组间差异Heatmap bar plot tutorial

学术   2024-11-07 07:02   广东  

简介

在微生物组分析过程中,常常有多组之间的比较,用常用的柱状图、箱线图等不能同时展现多个物种多组的差异,可以用热图展示微生物物种的组间差异显著性,并用不同的颜色确定物种是显著上升还是下降。同时为了配合显著性差异热图,可以用柱状图展示微生物物种在每个分组中的流行率和相对丰度。

标签:#微生物组数据分析  #MicrobiomeStatPlot  #热图结合柱状图展示组间差异  #R语言可视化 #Heatmap bar plot

作者:First draft(初稿):Defeng Bai(白德凤);Proofreading(校对):Ma Chuang(马闯) and Jiani Xun(荀佳妮);Text tutorial(文字教程):Defeng Bai(白德凤)

源代码及测试数据链接:

https://github.com/YongxinLiu/MicrobiomeStatPlot/项目中目录 3.Visualization_and_interpretation/HeatmapBarPlot

或公众号后台回复“MicrobiomeStatPlot”领取


热图结合柱状图展示组间差异分析案例

这是Damian R. Plichta团队2023年发表于Nature Microbiology上的一篇论文,论文题目为:Centenarians have a diverse gut virome with the potential to modulate metabolism and promote healthy lifespan. https://doi.org/10.1038/s41564-023-01370-6

图 3a |  百岁老人 (CE)、老年人和年轻参与者之间按细菌宿主分组的 vOTU 种群相对病毒丰度的变化(在物种水平上)。

热图显示差异丰富的 vOTU 种群(Wilcoxon 秩和检验,双侧,*Padj < 0.05 FDR 校正)。只有在 CE vs 老年人或 CE vs 年轻人群中发现富集或耗尽的 vOTU 种群,例如与 C. scindens 或 F. prausnitzii 相关的种群,才会被纳入最终的热图。色标表示 MaAslin2 中一般线性模型的 beta 系数,表示富集程度(红色)或耗尽程度(蓝色)。vOTU 按其流行程度(水平条形面板)排序,其中与 C. scindens 或 A. muciniphila 相关的病毒在 >40% 的 CE 样本中检测到,在 <10% 的年轻成人样本中检测到。

结果

使用病毒宿主标签,我们在物种水平上鉴定出与 Alistipes、Parabacteroides、Clostridium、Eggerthella、Ruminococcus 和 Akkermansia 相关的多发性(WRST,双侧,Padj < 0.05)病毒群(图 3a)。相反,我们发现百岁老人微生物群中拟杆菌和粪杆菌病毒显著减少(WRST,双侧,Padj < 0.05)(图 3a、c),这与这些先前描述的共生细菌的相对减少相吻合。

R语言实战

源代码及测试数据链接:

https://github.com/YongxinLiu/MicrobiomeStatPlot/

或公众号后台回复“MicrobiomeStatPlot”领取

软件包安装

# 基于CRAN安装R包,检测没有则安装 Installing R packages based on CRAN and installing them if they are not detectedp_list = c("ggplot2", "patchwork", "dplyr", "reshape2", "ggprism", "plyr",           "magrittr","ggfun","cowplot" )for(p in p_list){if (!requireNamespace(p)){install.packages(p)}    library(p, character.only = TRUE, quietly = TRUE, warn.conflicts = FALSE)}
# 加载R包 Loading R packagessuppressWarnings(suppressMessages(library(ggplot2)))suppressWarnings(suppressMessages(library(patchwork)))suppressWarnings(suppressMessages(library(dplyr)))suppressWarnings(suppressMessages(library(reshape2)))suppressWarnings(suppressMessages(library(ggprism)))suppressWarnings(suppressMessages(library(plyr)))suppressWarnings(suppressMessages(library(magrittr)))suppressWarnings(suppressMessages(library(ggfun)))suppressWarnings(suppressMessages(library(cowplot)))

计算物种流行率

A组、B组和C组流行率计算

# 计算流行率# Load datadata_fungi <- read.table("data/species_count_data.txt", sep = "\t", header = TRUE, check.names = FALSE)design_A <- read.table("data/design_A.txt", sep = "\t", header = TRUE, row.names = 1)design_B <- read.table("data/design_B.txt", sep = "\t", header = TRUE, row.names = 1)design_C <- read.table("data/design_C.txt", sep = "\t", header = TRUE, row.names = 1)difference_species <- read.table("data/Difference_species.txt", sep = "\t", header = TRUE, row.names = 1)
# Preprocess datarownames(data_fungi) <- data_fungi$Speciesdata_fungi <- data_fungi[, -1] %>% apply(2, function(x) x / sum(x))data_fungi02 <- as.data.frame(t(data_fungi))
# Function to process each design groupprocess_design <- function(design, data_fungi02, all_counts_value) {  data_fungi_group <- data_fungi02[rownames(data_fungi02) %in% rownames(design), ] %>% t() %>% as.data.frame()    zero_counts <- rowSums(data_fungi_group == 0)  data_species2 <- data_fungi_group %>%    mutate(zero_counts = zero_counts,           sample_percent = round(1 - zero_counts / all_counts_value, 6))
 data_species3 <- data_species2[rownames(data_species2) %in% rownames(difference_species), ]  return(data_species3)}
# Process each design group and write to CSVdata_species3_C <- process_design(design_A, data_fungi02, 30)write.csv(data_species3_C, "data/data_species3_prevalence_A.csv")
data_species3_E <- process_design(design_B, data_fungi02, 35)write.csv(data_species3_E, "data/data_species3_prevalence_B.csv")
data_species3_Y <- process_design(design_C, data_fungi02, 50)write.csv(data_species3_Y, "data/data_species3_prevalence_C.csv")

热图+柱状图

# 载入数据# Load datadata <- read.table("data/Difference_species2.txt",header = TRUE,row.names = 1,sep = "\t")data[which(data$FDR<0.05),'sig'] <- '*'# 排序# Set orderdata$Species2 = factor(data$Species, levels = unique(data$Species))data = data %>%  mutate(Species2 = ordered(Species2,                         levels=c("Species21",                                  "Species20","Species19",                                  "Species18","Species17",                                  "Species16","Species15","Species14",                                  "Species13",                                  "Species12","Species11","Species10",                                  "Species09","Species08","Species07",                                   "Species06","Species05","Species04",                                  "Species03","Species02","Species01"                                  )))# 绘图# Plotp1 <- ggplot(data, aes(Group,Species2)) +    geom_tile(aes(fill=B_coef), color="white") +      geom_text(aes(label=sig), color="black", size=6,vjust = 0.8) +   scale_fill_gradient2(low='#0085c1', high='red',mid = 'white', limit=c(-1.3,1.3),name=paste0("B-coef.")) +  labs(x=NULL,y=NULL) +    theme_classic()+  theme(axis.text.x = element_text(size=10,angle = 45,hjust = 1,color = "black"),                    axis.text.y = element_text(size=10,color = "black"),         axis.line.x = element_blank(),        axis.line.y = element_blank(),        axis.ticks.length = unit(2.0, "mm"),        panel.background=element_blank(),        legend.position = "left")#p1#ggsave(paste("results/age_fheatmap01",".pdf", sep=""), p1, width=89 * 1.5, height=180 * 1.5, unit='mm')# Bar plot prevalence# 柱状图展示每个物种的流行率imp_species <- read.table("data/Prevalence2.txt", header = TRUE, sep = "\t")imp_species <- imp_species %>%  mutate(Species2 = ordered(Species,                            levels = c("Species21", "Species20", "Species19", "Species18", "Species17",                                       "Species16", "Species15", "Species14", "Species13",                                       "Species12", "Species11", "Species10", "Species09",                                       "Species08", "Species07", "Species06", "Species05",                                       "Species04", "Species03", "Species02", "Species01")))# 获取所有唯一的组名groups <- unique(imp_species$Group)# 为每个组指定不同的颜色colors <- c("A" = "#ffad00", "B" = "#5196d5", "C" = "#00fc8d")# 使用 lapply 绘制每个组的图plots <- lapply(seq_along(groups), function(i) {  g <- groups[i]  p <- ggplot(subset(imp_species, Group == g), aes(x = Species2, y = Prevalence)) +    geom_bar(stat = "identity", fill = colors[g]) +    coord_flip() +    theme_classic() +    theme(text = element_text(family = 'sans', size = 14),          panel.grid.major = element_blank(),          panel.grid.minor = element_blank(),          legend.position = "none",          axis.line.x = element_blank(),          axis.line.y = element_blank(),          axis.title.y = element_blank(),          axis.text.x = element_text(size = 10, angle = 45, hjust = 1, color = "black"),          axis.text.y = element_blank()          ) +    geom_hline(yintercept = c(0.25, 0.5, 0.75, 1), colour = 'black', lwd = 0.36, linetype = "dashed") +    #scale_y_continuous(expand = expansion(0, 0), limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1)) +    #labs(title = paste("Prevalence for", g))  facet_set(label=paste(g))    if (i != 1) {    p <- p + theme(axis.text.y = element_blank(),                   axis.ticks.y = element_blank())  }    return(p)})# 使用 plot_grid 将三个图放在一起,并确保宽度一致final_plot <- plot_grid(plotlist = plots, ncol = 3, rel_widths = c(1, 1, 1))# 显示最终图形#print(final_plot)library(cowplot)p2 <- ggdraw() +  draw_plot(p1, 0, 0.01, .35, 0.945)+  draw_plot(final_plot, 0.350, -0.01, .58, 1.002)#p2pdf("results/age_fungi_heatmap_ra_bar1.pdf", height = 7.2, width = 8)p2dev.off()#> png #>   2# Bar plot relative abundance# 柱状图展示每个物种的相对丰度# 如果要按照相对丰度降序排列,请自行调整imp_species <- read.table("data/Prevalence2.txt", header = TRUE, sep = "\t")imp_species <- imp_species %>%  mutate(Species2 = ordered(Species,                            levels = c("Species21", "Species20", "Species19", "Species18", "Species17",                                       "Species16", "Species15", "Species14", "Species13",                                       "Species12", "Species11", "Species10", "Species09",                                       "Species08", "Species07", "Species06", "Species05",                                       "Species04", "Species03", "Species02", "Species01")))# 获取所有唯一的组名groups <- unique(imp_species$Group)# 为每个组指定不同的颜色colors <- c("A" = "#ffad00", "B" = "#5196d5", "C" = "#00fc8d")# 使用 lapply 绘制每个组的图plots <- lapply(seq_along(groups), function(i) {  g <- groups[i]  p <- ggplot(subset(imp_species, Group == g), aes(x = Species2, y = RA)) +    geom_bar(stat = "identity", fill = colors[g]) +    coord_flip() +    theme_classic() +    theme(text = element_text(family = 'sans', size = 14),          panel.grid.major = element_blank(),          panel.grid.minor = element_blank(),          legend.position = "none",          axis.line.x = element_blank(),          axis.line.y = element_blank(),          axis.title.y = element_blank(),          axis.text.x = element_text(size = 10, angle = 45, hjust = 1, color = "black"),          axis.text.y = element_blank()          ) +    geom_hline(yintercept = c(0.05, 0.1, 0.15, 0.20), colour = 'black', lwd = 0.36, linetype = "dashed") +    #scale_y_continuous(expand = expansion(0, 0), limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1)) +    #labs(title = paste("Prevalence for", g))  facet_set(label=paste(g))    if (i != 1) {    p <- p + theme(axis.text.y = element_blank(),                   axis.ticks.y = element_blank())  }    return(p)})# 使用 plot_grid 将三个图放在一起,并确保宽度一致final_plot2 <- plot_grid(plotlist = plots, ncol = 3, rel_widths = c(1, 1, 1))# 显示最终图形#print(final_plot)library(cowplot)p3 <- ggdraw() +  draw_plot(p1, 0, 0.01, .35, 0.945)+  draw_plot(final_plot2, 0.350, -0.01, .58, 1.002)#p2pdf("results/age_fungi_heatmap_ra_bar2.pdf", height = 7.2, width = 8)p3dev.off()#> png #>   2

使用此脚本,请引用下文:

Yong-Xin Liu, Lei Chen, Tengfei Ma, Xiaofang Li, Maosheng Zheng, Xin Zhou, Liang Chen, Xubo Qian, Jiao Xi, Hongye Lu, Huiluo Cao, Xiaoya Ma, Bian Bian, Pengfan Zhang, Jiqiu Wu, Ren-You Gan, Baolei Jia, Linyang Sun, Zhicheng Ju, Yunyun Gao, Tao Wen, Tong Chen. 2023. EasyAmplicon: An easy-to-use, open-source, reproducible, and community-based pipeline for amplicon data analysis in microbiome research. iMeta 2: e83. https://doi.org/10.1002/imt2.83

Copyright 2016-2024 Defeng Bai baidefeng@caas.cn, Chuang Ma 22720765@stu.ahau.edu.cn, Jiani Xun 15231572937@163.com, Yong-Xin Liu liuyongxin@caas.cn

宏基因组推荐
本公众号现全面开放投稿,希望文章作者讲出自己的科研故事,分享论文的精华与亮点。投稿请联系小编(微信号:yongxinliu 或 meta-genomics)

猜你喜欢

iMeta高引文章 fastp 复杂热图 ggtree 绘图imageGP 网络iNAP
iMeta网页工具 代谢组MetOrigin 美吉云乳酸化预测DeepKla
iMeta综述 肠菌菌群 植物菌群 口腔菌群 蛋白质结构预测

10000+:菌群分析 宝宝与猫狗 梅毒狂想曲 提DNA发Nature

系列教程:微生物组入门 Biostar 微生物组  宏基因组

专业技能:学术图表 高分文章 生信宝典 不可或缺的人

一文读懂:宏基因组 寄生虫益处 进化树 必备技能:提问 搜索  Endnote

扩增子分析:图表解读 分析流程 统计绘图

16S功能预测   PICRUSt  FAPROTAX  Bugbase Tax4Fun

生物科普:  肠道细菌 人体上的生命 生命大跃进  细胞暗战 人体奥秘  

写在后面

为鼓励读者交流快速解决科研困难,我们建立了“宏基因组”讨论群,己有国内外6000+ 科研人员加入。请添加主编微信meta-genomics带你入群,务必备注“姓名-单位-研究方向-职称/年级”。高级职称请注明身份,另有海内外微生物PI群供大佬合作交流。技术问题寻求帮助,首先阅读《如何优雅的提问》学习解决问题思路,仍未解决群内讨论,问题不私聊,帮助同行。

点击阅读原文


宏基因组
宏基因组/微生物组是当今世界科研最热门的研究领域之一,为加强本领域的技术交流与传播,推动中国微生物组计划发展,中科院青年科研人员创立“宏基因组”公众号,目标为打造本领域纯干货技术及思想交流平台。
 最新文章