医学科研绘图:年底了,我还在做火山图返修论文,圣诞老人当场对我敬礼!

文摘   2024-12-28 09:03   荷兰  

1. 前言

我时不时的会收到后台私信,想看看火山图的化教程。我对火山图也很高兴去,医学数据的基因结果可视化常常用到。众所周知,火山图(Volcano Plot)可以同时展示效应大小和统计显著性,直观地揭示差异显著的基因、蛋白质或代谢物。火山图不仅在基因表达分析中应用广泛,还在蛋白质组学、代谢组学、GWAS 和药物研究中使用。

今天的更新我们将介绍火山图的概念,并结合 R 语言代码演示其在医学数据中的具体应用。

2. 医学数据中的应用与代码示例

2.1 差异基因表达分析

情境:比较病例组和对照组的基因表达水平,寻找显著上调或下调的基因。

# 加载必要的库
library(ggplot2)
library(ggrepel)

# 模拟数据
set.seed(123)
n <- 1000

# 非显著基因
non_sig <- data.frame(
Gene = paste0("Gene", 1:(n * 0.7)),
log2FC = rnorm(n * 0.7, mean = 0, sd = 0.5),
pvalue = runif(n * 0.7, min = 0.05, max = 1)
)

# 显著上调基因
upregulated <- data.frame(
Gene = paste0("Gene_up", 1:(n * 0.15)),
log2FC = rnorm(n * 0.15, mean = 2, sd = 0.5),
pvalue = runif(n * 0.15, min = 0, max = 0.01)
)

# 显著下调基因
downregulated <- data.frame(
Gene = paste0("Gene_down", 1:(n * 0.15)),
log2FC = rnorm(n * 0.15, mean = -2, sd = 0.5),
pvalue = runif(n * 0.15, min = 0, max = 0.01)
)

# 合并数据
gene_data <- rbind(non_sig, upregulated, downregulated)

# 添加显著性标签
gene_data$Significance <- with(gene_data, ifelse(
pvalue < 0.01 & abs(log2FC) > 1,
ifelse(log2FC > 0, "Upregulated", "Downregulated"),
"Not Significant"
))

# 绘制优化后的火山图
ggplot(gene_data, aes(x = log2FC, y = -log10(pvalue), color = Significance)) +
geom_point(alpha = 0.8, size = 2) +
scale_color_manual(values = c("Upregulated" = "#E41A1C",
"Downregulated" = "#377EB8",
"Not Significant" = "grey80")) +
theme_minimal(base_size = 15) +
labs(
title = "Differential Gene Expression Volcano Plot",
x = expression(log[2]~Fold~Change),
y = expression(-log[10]~p~value)
) +
theme(
legend.position = "top",
plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
axis.title = element_text(size = 14),
panel.grid.minor = element_blank()
) +
geom_text_repel(
data = subset(gene_data, Significance != "Not Significant"),
aes(label = Gene),
size = 3,
max.overlaps = 10
)

2.2 代谢组学研究

情境:分析代谢产物在不同组间的变化。

# 加载必要的库
library(ggplot2)
library(ggrepel)

# 模拟数据
set.seed(456)
metabolite_data <- data.frame(
Metabolite = paste0("Metabolite", 1:500),
log2FC = rnorm(500, mean = 0, sd = 1.5),
pvalue = runif(500, min = 0, max = 0.05)
)

# 添加显著性标签
metabolite_data$Significance <- with(metabolite_data, ifelse(
pvalue < 0.01 & abs(log2FC) > 1,
ifelse(log2FC > 0, "Increased", "Decreased"),
"Not Significant"
))

# 绘制火山图
ggplot(metabolite_data, aes(x = log2FC, y = -log10(pvalue), color = Significance)) +
geom_point(alpha = 0.9, size = 2.5) +
scale_color_manual(values = c("Increased" = "#FBB4AE",
"Decreased" = "#B3CDE3",
"Not Significant" = "grey70")) +
theme_classic(base_size = 15) +
labs(
title = "Metabolomics Volcano Plot",
x = expression(log[2]~Fold~Change),
y = expression(-log[10]~p~value)
) +
theme(
legend.position = "bottom",
plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
panel.background = element_rect(fill = "#F0F0F0"),
axis.line = element_line(size = 0.8, color = "black")
) +
geom_text_repel(
data = subset(metabolite_data, Significance != "Not Significant"),
aes(label = Metabolite),
size = 3,
max.overlaps = 10
)

2.3 GWAS(基因组关联研究)

情境:筛选与疾病相关的 SNP。

# 加载必要的库
library(ggplot2)
library(ggrepel)

# 模拟数据
set.seed(789)
snp_data <- data.frame(
SNP = paste0("rs", sample(1:10000, 1000)),
log2OR = rnorm(1000, mean = 0, sd = 1),
pvalue = runif(1000, min = 0, max = 0.05)
)

# 添加显著性标签
snp_data$Significance <- with(snp_data, ifelse(
pvalue < 0.001 & abs(log2OR) > 0.8,
ifelse(log2OR > 0, "Risk", "Protective"),
"Not Significant"
))

# 绘制火山图
ggplot(snp_data, aes(x = log2OR, y = -log10(pvalue), color = -log10(pvalue))) +
geom_point(alpha = 0.9, size = 2) +
scale_color_gradientn(colors = c("#0571B0", "#92C5DE", "#F4A582", "#CA0020")) +
theme_minimal(base_size = 15) +
labs(
title = "GWAS Volcano Plot",
x = expression(log[2]~Odds~Ratio),
y = expression(-log[10]~p~value),
color = expression(-log[10]~p~value)
) +
theme(
legend.position = "right",
plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
panel.grid.major = element_line(color = "grey80", size = 0.5)
) +
geom_text_repel(
data = subset(snp_data, Significance != "Not Significant"),
aes(label = SNP),
size = 3,
max.overlaps = 10
)

3. 小结

今天的更新,我们展示了如何使用 R 语言生成美学优化后的火山图。因为我使用的都是模拟数据,所以效果可能不是很直观。但是这些代码的参数是通用的,相信你使用你自己的数据,再结合我的代码,可以绘制出风格简洁、信息清晰,能够满足高质量科研要求的火山图。如果有任何问题,欢迎留言交流。

感谢关注,你的支持是我不懈的动力!

科研代码
专注R和Python的数据分析。
 最新文章