专注收集和自写可发表的科研图形的数据和代码分享,该系列的数据均可从以下链接下载:
百度云盘链接: https://pan.baidu.com/s/1M4vgU1ls0tilt0oSwFbqYQ
提取码: 请关注WX公zhong号 生信学习者 后台发送 科研绘图 获取提取码
介绍
Circos图是一种数据可视化工具,它以圆形布局展示数据,通常用于显示数据之间的关系和模式。这种图表特别适合于展示分层数据或网络关系。Circos图的一些关键特点包括:
圆形布局:数据被组织在一个或多个同心圆中,每个圆可以代表不同的数据维度或层次。
扇区:每个圆被划分为若干扇区,每个扇区可以代表一个数据项或类别。扇区的大小、颜色或标签可以表示数据的不同属性。
链接:数据项之间的连接可以用线条或曲线表示,显示它们之间的关系或相互作用。
交互性:虽然传统的Circos图可能不具备交互性,但现代的实现(如使用R语言的Circos包)可以增加交互功能,允许用户通过鼠标悬停或点击来获取更多信息。
加载R包
knitr::opts_chunk$set(warning = F, message = F)
library(tidyverse)
library(grid)
library(circlize)
library(ComplexHeatmap)
grp <- c("HTN", "Normal")
color_group <- c("#EE2B2B", "#2D6BB4")
color_sample <- c('#8DD3C7', '#FFFFB3', '#BEBADA', '#FB8072',
'#80B1D3', '#FDB462', '#B3DE69', '#FCCDE5',
'#BC80BD', '#CCEBC5', '#FFED6F', '#E41A1C')
color_phylum <- c('#BEAED4', '#FDC086', '#FFFF99', '#386CB0', '#F0027F')
color_otu <- c("#6C326C", "#77A2D1", "#FFD169", "#635F5F", "#D4D09A",
"#993116", "#6798CE", "#146666", "#CE9924", "#6D659D",
"#9F9B27", "#6D659D", "#9F9B27", "#C80b8A", "#2C3A89",
"#C8C5C5", "#90E2BF", "#FDAB4D", "#F4F4E8", "#B054BF",
"#FCE873", "#FFCCDB", "#AFD300", "#B089D8", "#F96E6F",
"#AAD3ED", "#639BCE")
导入数据
数据可从以下链接下载:
百度云盘链接: https://pan.baidu.com/s/1M4vgU1ls0tilt0oSwFbqYQ
提取码: 请关注WX公zhong号 生信学习者 后台发送 科研绘图 获取提取码
phen <- read.csv("./inputdata/46-phenotype.csv")
head(phen)
SampleID | Group | Read_Count |
---|---|---|
HTN1 | HTN | 86275 |
HTN2 | HTN | 96151 |
HTN3 | HTN | 84364 |
HTN4 | HTN | 97406 |
HTN5 | HTN | 85496 |
HTN6 | HTN | 89507 |
phylum <- read.csv("./inputdata/46-phylum_count.csv") %>%
tibble::column_to_rownames("tax")
head(phylum)
HTN1 | HTN2 | Normal1 | Normal2 | |
---|---|---|---|---|
p_Actinobacteriota | 557 | 361 | 176 | 195 |
p_Bacteroidota | 9672 | 21555 | 31353 | 33061 |
p_Campilobacterota | 111 | 438 | 888 | 363 |
p_Cyanobacteria | 34 | 30 | 22 | 13 |
p_Deferribacterota | 0 | 5 | 0 | 0 |
p_Desulfobacterota | 103 | 231 | 168 | 679 |
family <- read.csv("./inputdata/46-family_count.csv") %>%
tibble::column_to_rownames("tax")
head(family)
HTN1 | HTN2 | HTN3 | |
---|---|---|---|
p_Actinobacteriota.c_Actinobacteria.o_Bifidobacteriales.f_Bifidobacteriaceae | 6 | 0 | 30 |
p_Actinobacteriota.c_Actinobacteria.o_Micrococcales.f_Micrococcaceae | 0 | 3 | 0 |
p_Actinobacteriota.c_Coriobacteriia.o_Coriobacteriales.f_Atopobiaceae | 109 | 12 | 25 |
p_Actinobacteriota.c_Coriobacteriia.o_Coriobacteriales.f_Coriobacteriaceae | 92 | 27 | 10 |
p_Actinobacteriota.c_Coriobacteriia.o_Coriobacteriales.f_Eggerthellaceae | 254 | 306 | 185 |
p_Actinobacteriota.c_Coriobacteriia.o_Coriobacteriales.f_uncultured | 96 | 13 | 0 |
数据预处理
最高丰度的门水平物种
num <- 5
merge_phy <- phen %>%
dplyr::select(SampleID, Group) %>%
dplyr::inner_join(phylum %>%
t() %>% data.frame() %>%
tibble::rownames_to_column("SampleID"),
by = "SampleID")
merge_phy_top <- merge_phy %>%
dplyr::select(-c("SampleID", "Group")) %>%
dplyr::summarise_each(mean) %>%
tidyr::gather(key="tax", value="value") %>%
dplyr::arrange(desc(value)) %>%
dplyr::slice(c(1:num)) %>%
dplyr::mutate(tax=as.character(tax))
head(merge_phy_top)
tax | value |
---|---|
p_Firmicutes | 39684.5000 |
p_Bacteroidota | 24314.8333 |
p_Spirochaetota | 4497.5833 |
p_Proteobacteria | 950.1667 |
p_Actinobacteriota | 346.1667 |
分组信息
group_info <- phen %>%
dplyr::select(SampleID, Group)
head(group_info)
SampleID | Group |
---|---|
HTN1 | HTN |
HTN2 | HTN |
HTN3 | HTN |
HTN4 | HTN |
HTN5 | HTN |
HTN6 | HTN |
函数
get_tax_prf
: 获取画图表达谱,该函数主要执行以下步骤:
输入处理:接受三个参数:
prof
(一个数据框或矩阵),num
(一个数字,指定返回的行数),class
(一个数字,指定在分类学层次中选择的级别)。数据转换:将
prof
转换为一个带有行名的tibble,并按行名(分类学名称)分组。提取分类学信息:从每个分类学名称中提取门(phylum)和类(class指定的级别)。
数据清洗:过滤掉包含“uncultured”或以“_”结尾的OTU(操作单元)。
合并数据:将
prof
数据与phen
数据(假设是一个包含样本ID和组信息的数据框)合并,仅保留匹配的样本ID。计算均值:计算每个样本的均值,并按值降序排列,选取前
num
个。最终过滤:根据计算出的均值,过滤出相应的OTU和分类学信息。
输出结果:返回一个包含两个数据框的列表:
otutab
(OTU表)和taxtab
(分类学表)。其中,otutab
包含OTU ID和相应的数据,taxtab
包含详细的分类学信息和OTU ID。
get_tax_prf <- function(
prof,
num,
class) {
prf <- prof %>%
tibble::rownames_to_column("tax") %>%
dplyr::group_by(tax) %>%
dplyr::mutate(phylum = unlist(strsplit(tax, ".", fixed = TRUE))[1],
otu = unlist(strsplit(tax, ".", fixed = TRUE))[class]) %>%
dplyr::ungroup()
prf_cln <- prf %>%
dplyr::filter(phylum%in%merge_phy_top$tax) #%>%
# dplyr::slice(-grep("uncultured", otu)) %>%
# dplyr::slice(-grep("_$", otu))
rm_index <- grep("uncultured|_$", prf_cln$otu)
if (length(rm_index) > 0) {
prf_cln <- prf_cln[-rm_index, ]
}
mdat <- dplyr::inner_join(phen %>%
dplyr::select(SampleID, Group),
prf_cln %>%
dplyr::select(-c("tax", "phylum")) %>%
tibble::column_to_rownames("otu") %>%
t() %>% data.frame() %>%
tibble::rownames_to_column("SampleID"),
by = "SampleID")
mdat_mean <- mdat %>%
dplyr::select(-c("SampleID", "Group")) %>%
dplyr::summarise_each(mean) %>%
tidyr::gather(key="tax", value="value") %>%
dplyr::arrange(desc(value)) %>%
dplyr::slice(c(1:num)) %>%
dplyr::mutate(tax=as.character(tax))
prf_final <- prf_cln %>%
dplyr::filter(otu%in%mdat_mean$tax)
otu_table <- prf_final %>%
dplyr::select(-c("tax", "phylum")) %>%
dplyr::rename("OTU_ID"="otu") %>%
dplyr::mutate(OTU_ID=factor(OTU_ID, levels = mdat_mean$tax))
tax_table <- prf_final %>%
dplyr::select(c("tax", "phylum", "otu")) %>%
setNames(c("detail", "phylum", "OTU_ID")) %>%
dplyr::mutate(detail=gsub("\\.", ";", detail)) %>%
dplyr::mutate(phylum=factor(phylum, levels = merge_phy_top$tax))
res <- list(otutab = otu_table,
taxtab = tax_table)
return(res)
}
circos_fun
: 画图circos,该函数主要执行以下步骤:
输入处理:接受两个参数:
mdat_table
(一个包含OTU表和分类学表的列表),tag
(一个标签,用于生成文件名)。数据准备:从输入的列表中提取OTU表和分类学表,并获取独特的门类和样本组。
数据合并:将OTU表和分类学表合并,并按门类和OTU ID排序。
计算累积值:计算OTU和样本的累积值,用于后续的可视化布局。
初始化Circos:设置Circos绘图参数,如起始角度、间隔大小等。
绘制区域:在Circos上绘制多个区域,用于显示不同的数据和文本。
突出显示:根据门类和样本组,突出显示相应的扇区。
绘制链接:在OTU和样本之间绘制链接,表示它们之间的关系。
绘制矩形:在相应的扇区内绘制矩形,表示OTU和样本的累积值。
绘制图例:在图表的一角绘制OTU的图例,显示详细的分类学信息。
清理:完成绘图后,清除Circos绘图环境。
circos_fun <- function(
mdat_table,
tag) {
taxonomy <- mdat_table$taxtab
otu_table <- mdat_table$otutab
tax_phylum <- unique(taxonomy$phylum)
all_group <- unique(group_info$Group)
# merge table
otutab <- dplyr::inner_join(taxonomy, otu_table, by="OTU_ID") %>%
dplyr::arrange(phylum, OTU_ID) %>%
tibble::column_to_rownames("OTU_ID") %>%
dplyr::select(group_info$SampleID)
# extra parameters
all_otu <- rownames(otutab)
all_sample <- group_info$SampleID
all_ID <- c(all_otu, all_sample)
accum_otu <- rowSums(otutab)
accum_sample <- colSums(otutab)
all_ID_xlim <- cbind(rep(0, length(all_ID)), data.frame(c(accum_otu, accum_sample)))
# inter parameters
otutab$otu_ID <- all_otu
plot_data <- otutab %>%
tidyr::gather(key="SampleID", value="value", -"otu_ID") %>%
dplyr::mutate(otu_ID=factor(otu_ID, levels = all_otu),
SampleID=factor(SampleID, levels = all_sample)) %>%
dplyr::arrange(otu_ID, SampleID)
plot_data <- plot_data[c(2, 1, 3, 3)]
names(color_otu) <- all_otu
names(color_sample) <- all_sample
# name <- paste0(tag, "_circlize_plot.pdf")
# pdf(name, width = 20, height = 8)
circle_size <- unit(1, "snpc")
gap_size <- c(rep(3, length(all_otu) - 1), 6, rep(3, length(all_sample) - 1), 6)
circos.par(cell.padding = c(0, 0, 0, 0), start.degree = 270, gap.degree = gap_size)
circos.initialize(factors = factor(all_ID, levels = all_ID), xlim = all_ID_xlim)
circos.trackPlotRegion(
ylim = c(0, 1), track.height = 0.03, bg.border = NA,
panel.fun = function(x, y) {
sector.index = get.cell.meta.data("sector.index")
xlim = get.cell.meta.data("xlim")
ylim = get.cell.meta.data("ylim")
} )
for (i in 1:length(tax_phylum)) {
tax_OTU <- {subset(taxonomy, phylum == tax_phylum[i])}$OTU_ID
highlight.sector(tax_OTU, track.index = 1, col = color_phylum[i],
text = tax_phylum[i], cex = 0.5, text.col = "black",
niceFacing = FALSE)
}
for (i in 1:length(all_group)) {
group_sample <- {subset(group_info, Group == all_group[i])}$SampleID
highlight.sector(group_sample, track.index = 1, col = color_group[i],
text = all_group[i], cex = 0.7, text.col = 'black',
niceFacing = FALSE)
}
circos.trackPlotRegion(
ylim = c(0, 1), track.height = 0.05, bg.border = NA,
panel.fun = function(x, y) {
sector.index = get.cell.meta.data('sector.index')
xlim = get.cell.meta.data('xlim')
ylim = get.cell.meta.data('ylim')
} )
circos.track(
track.index = 2, bg.border = NA,
panel.fun = function(x, y) {
xlim = get.cell.meta.data('xlim')
ylim = get.cell.meta.data('ylim')
sector.name = get.cell.meta.data('sector.index')
xplot = get.cell.meta.data('xplot')
by = ifelse(abs(xplot[2] - xplot[1]) > 30, 0.25, 1)
for (p in c(0, seq(by, 1, by = by))){
circos.text(p*(xlim[2] - xlim[1]) + xlim[1], mean(ylim) + 0.4,
paste0(p*100, '%'), cex = 0.4, adj = c(0.5, 0), niceFacing = FALSE)
}
circos.lines(xlim, c(mean(ylim), mean(ylim)), lty = 3)
} )
circos.trackPlotRegion(
ylim = c(0, 1), track.height = 0.03,
bg.col = c(color_otu, color_sample),
bg.border = NA, track.margin = c(0, 0.01),
panel.fun = function(x, y) {
xlim = get.cell.meta.data('xlim')
sector.name = get.cell.meta.data('sector.index')
circos.axis(h = 'top', labels.cex = 0.4,
major.tick.percentage = 0.4, labels.niceFacing = FALSE)
circos.text(mean(xlim), 0.2, sector.name, cex = 0.4,
niceFacing = FALSE, adj = c(0.5, 0))
} )
circos.trackPlotRegion(ylim = c(0, 1), track.height = 0.03, track.margin = c(0, 0.01))
for (i in seq_len(nrow(plot_data))) {
circos.link(
plot_data[i,2], c(accum_otu[plot_data[i,2]],
accum_otu[plot_data[i,2]] - plot_data[i,4]),
plot_data[i,1], c(accum_sample[plot_data[i,1]],
accum_sample[plot_data[i,1]] - plot_data[i,3]),
col = paste0(color_otu[plot_data[i,2]], '70'), border = NA )
circos.rect(accum_otu[plot_data[i,2]], 0,
accum_otu[plot_data[i,2]] - plot_data[i,4], 1,
sector.index = plot_data[i,2],
col = color_sample[plot_data[i,1]], border = NA)
circos.rect(accum_sample[plot_data[i,1]], 0,
accum_sample[plot_data[i,1]] - plot_data[i,3], 1,
sector.index = plot_data[i,1],
col = color_otu[plot_data[i,2]], border = NA)
accum_otu[plot_data[i,2]] = accum_otu[plot_data[i,2]] - plot_data[i,4]
accum_sample[plot_data[i,1]] = accum_sample[plot_data[i,1]] - plot_data[i,3]
}
otu_legend <- Legend(
at = all_otu, labels = taxonomy$detail, labels_gp = gpar(fontsize = 8),
grid_height = unit(0.5, 'cm'), grid_width = unit(0.5, 'cm'),
type = 'points', pch = NA, background = color_otu)
pushViewport(viewport(x = 0.85, y = 0.5))
grid.draw(otu_legend)
upViewport()
circos.clear()
# dev.off()
}
运行
family_table <- get_tax_prf(prof = family, num = 15, class = 4)
# pdf("family_circlize_plot.pdf", width = 20, height = 8)
circos_fun(mdat_table = family_table, tag = "family")
# dev.off()
结果:不同分组样本的family水平的分布图(可以通过pdf函数输出结果图,另外circos图不适合太多样本的数据展示)
最外圈表示分组标签和物种phylum水平标签;
次外圈表示样本标签和物种family水平标签;
内圈表示不同样本的物种family水平组成和不同物种family的样本组成;
圈内的连线表示物种和样本之间的组成关系