孟德尔随机化分析简介
参考:
https://mp.weixin.qq.com/s/opx5-pGVerVDtEIZzRziEQ
在19世纪,孟德尔用豌豆花作为实验材料,通过对豌豆花颜色、形状等特征的观察和统计分析,发现了遗传的基本规律,这就是孟德尔定律。不过,孟德尔定律只适用于单基因的遗传性状,并且无法解释复杂的多基因遗传疾病。此外,孟德尔定律也无法解释环境因素对基因表达的影响,以及基因与环境的相互作用。为了解决这个问题,著名统计学家Fisher提出了孟德尔随机化的概念。
孟德尔随机化Mendelian randomization, MR是一种基于遗传变异的因果推断方法,其基本原理是利用自然界中的随机分配的基因型对表型的影响来推断生物学因素对疾病的影响。
举个例子: 假设我们了解体重BMI对冠心病发病的影响。但是对冠心病影响的因素太多了,比如高血压、高血糖等,我们怎么才能排除混在因素,确定为体重BMI对冠心病有影响呢?我们先要选定一个工具基因变量M,这个M要和我们研究变量X相关,和我们解决变量和混杂因素无关。最后我们通过MR分析得出M基因对Y有影响,因为M基因对Y没有直接关联,所以M基因是通过影响暴露因素X而后从而对Y产生影响。这一连成线的过程可以看作四类似一个中介效应,暴漏因素X就是中介变量,基因M通过对中介变量X的影响达到对Y的影响。
要做孟德尔随机化要有3个假设前提: 1.基因M要和体重(暴露因素)强相关联。-关联性假设 2.基因M和结局变量冠心病和其他混杂因素没有关联。-独立性假设 3.基因M只能通过影响体重对冠心病造成影响,不能通过其他途径对冠心病造成影响。-排他性假设
目前主流使用的孟德尔随机化的工具为TwoSampleMR包。TwoSampleMR软件包的使用可参考https://mrcieu.github.io/TwoSampleMR/articles/perform_mr.html
标签:#微生物组数据分析 #MicrobiomeStatPlot #孟德尔随机化分析 #R语言可视化 #Mendelian Randomization Analysis
作者:First draft(初稿):Defeng Bai(白德凤);Proofreading(校对):Ma Chuang(马闯) and Jiani Xun(荀佳妮);Text tutorial(文字教程):Defeng Bai(白德凤)
源代码及测试数据链接:
https://github.com/YongxinLiu/MicrobiomeStatPlot/项目中目录 3.Visualization_and_interpretation/MendelianRandomizationAnalysis
或公众号后台回复“MicrobiomeStatPlot”领取
这是来自于深圳华大基因Tao Zhang课题组2022年发表于Nature Genetics上的一篇文章。论文题目为:Mendelian randomization analyses support causal relationships between blood metabolites and the gut microbiome. https://doi.org/10.1038/s41588-021-00968-y
图 5 | 颤杆菌属和 Alistipes 对降低血液甘油三酯浓度的因果影响。
a,MR分析结果的示意图:对颤杆菌较高丰度的遗传易感性与血液甘油三酯浓度降低有关,在较小程度上与降低 BMI 和 WHR 有关。b,森林图表示颤杆菌丰度每增加 1 s.d. 对血液甘油三酯、BMI 和 WHR 的影响,分别使用观察性和 MR分析估计。使用多元线性模型在总共 2,545 个样本(紫色)中执行观察性相关性分析。使用由多达 134 个遗传预测因子构建的 PRS 作为工具变量进行单样本 MR 分析,分别在发现(蓝色)和复制(红色)队列中估计。列出了观察和单样本 MR 分析的 beta 估计值和 95% 置信区间 (CI) 值以及 P值。c、d、森林图分别表示使用单样本 MR 和六种不同的双样本 MR 方法在发现 (蓝色) 和复制 (红色) 队列中估计的 Oscillibacter (c) 和 Alistipes (d) 对甘油三酯含量的因果影响的 MR 估计值和 95% CI 值。还列出了通过每种 MR 方法计算的 P 值。
结果
最显著的因果效应是颤杆菌降低血液甘油三酯浓度(图 5a-c),在较小程度上降低 BMI 和腰臀比 (WHR),而血浆丙氨酸的影响是双向的。使用 134 个遗传变异构建多基因风险评分 (PRS)(134 个遗传变异和构建的 PRS 分别解释了 39.3% 和 49.6% 的表型变异;图 3b 和补充表 10)在发现队列中进行单样本 MR 分析,我们估计每个 1 s.d. Oscillibacter 丰度的增加会导致 WHR 降低 0.261 mmol l-1(P = 2.53 × 10-10)、甘油三酯浓度降低 0.161 kg m-2 和 0.126 比率(P = 2.73 × 10-3)BMI 降低(P = 1.33 × 10-4)。当进行四次双样本 MR 测试时,这种因果关系是稳健的(PGCTA-GSMR(全基因组复杂性状分析-广义总结 MR)Pinverse_variance_weighted = 2.45 × 10-15= 4.34 × 10-11,Pweighted-median = 1.22 × 10-7,和 PMR-Egger = 1.35 × 10-5)(图 5c),并且没有水平多效性的证据(PMR-PRESSOGlobaltest = 0.18;补充表 11)。此外,Alistipes 相对丰度较高也与血液甘油三酯浓度降低有关(图 5d)。
源代码及测试数据链接:
https://github.com/YongxinLiu/MicrobiomeStatPlot/
或公众号后台回复“MicrobiomeStatPlot”领取
软件包安装
# 基于CRAN安装R包,检测没有则安装 Installing R packages based on CRAN and installing them if they are not detected
p_list = c("ggplot2", "reshape2", "dplyr", "cowplot","readxl")
for(p in p_list){if (!requireNamespace(p)){install.packages(p)}
library(p, character.only = TRUE, quietly = TRUE, warn.conflicts = FALSE)}
# 基于github安装
library(remotes)
if(!requireNamespace("TwoSampleMR", quietly = TRUE))
remotes::install_github("MRCIEU/TwoSampleMR")
library(devtools)
if(!requireNamespace("MRPRESSO", quietly = TRUE))
remotes::install_github("rondolab/MR-PRESSO")
# 加载R包 Loading R packages
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(reshape2)))
suppressWarnings(suppressMessages(library(cowplot)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(readxl)))
suppressWarnings(suppressMessages(library(TwoSampleMR)))
suppressWarnings(suppressMessages(library(MRPRESSO)))
实战1
参考:https://mp.weixin.qq.com/s/opx5-pGVerVDtEIZzRziEQ
# 1.获得暴露因素X的SNP数据,包括体重BMI的SNP数据和结局冠心病的SNP数据。通常通过各种GWAS数据库或者GWAS文献找到。或者通过https://gwas.mrcieu.ac.uk/datasets/这个数据库寻找,利用这个数据库可以直接通过TwoSampleMR包下载。进入数据库网站后,在网站Traitcontains选项中里填入体重指数,然后筛选数据,这里分析选择了ieu-a-835这个数据。然后再找冠心病的数据,这里选的是ieu-a-7这个数据。然后,需要下载ieu-a-835和ieu-a-7这两个数据,先使用extract_instruments函数对暴露数据进行下载。如果是已经下载到电脑里面的数据,使用read_exposure_data函数和clump_data函数读取。
# 从数据库中查找用于 MR 分析的数据
library(TwoSampleMR)
exposure_dat <- extract_instruments(outcomes='ieu-a-835',#access_token = NULL,
options(ieugwasr_api = 'gwas-api.mrcieu.ac.uk/'),
force_server = FALSE,
p1 = 5e-08,
clump = TRUE,
p2 = 5e-08,
r2 = 0.001,
kb = 10000)
# 查看exposure_dat的SNP数据
exposure_dat$SNP
#> [1] "rs543874" "rs11165643" "rs17024393" "rs3101336" "rs2820292"
#> [6] "rs657452" "rs6656785" "rs1528435" "rs7599312" "rs10182181"
#> [11] "rs13021737" "rs2121279" "rs12986742" "rs1016287" "rs2365389"
#> [16] "rs3849570" "rs13078960" "rs16851483" "rs6804842" "rs1516725"
#> [21] "rs13107325" "rs11727676" "rs10938397" "rs17001654" "rs2112347"
#> [26] "rs13191362" "rs2033529" "rs9400239" "rs205262" "rs2207139"
#> [31] "rs1167827" "rs2245368" "rs17405819" "rs2033732" "rs10968576"
#> [36] "rs6477694" "rs1928295" "rs4740619" "rs10733682" "rs7903146"
#> [41] "rs7899106" "rs17094222" "rs2176598" "rs4256980" "rs3817334"
#> [46] "rs12286929" "rs11030104" "rs11057405" "rs7138803" "rs12429545"
#> [51] "rs9579083" "rs10132280" "rs7141420" "rs16951275" "rs3736485"
#> [56] "rs879620" "rs758747" "rs9926784" "rs3888190" "rs4889606"
#> [61] "rs1558902" "rs12940622" "rs1000940" "rs1808579" "rs6567160"
#> [66] "rs17066856" "rs29941" "rs17724992" "rs2287019"
# 因为ieu-a-7这个数据是冠心病数据,所以这一步要在ieu-a-7这个数据中找到与上面65个SNP匹配的数据,这样就生成了结果数据
outcome_dat <- extract_outcome_data(snps=exposure_dat$SNP,
outcomes='ieu-a-7',
options(ieugwasr_api = 'gwas-api.mrcieu.ac.uk/'),
proxies = FALSE,
maf_threshold =0.01#,
#access_token =NULL
)
# 2.效应等位与效应量保持统一
dat<- harmonise_data(
exposure_dat = exposure_dat,
outcome_dat = outcome_dat)
#以上2步完成就可以进行MR分析。mr默认使用五种方法(MR Egger, Weighted median, Inverse variance weighted, Simple mode, Weighted mode)
res<- mr(dat)
# 在res数据表中,b是效应值,se是标准误,pval是P值,最重要的就是看Inverse variance weighted这个方法的P值。这里P值小于0.05,表明体重和冠心病解决是有统计意义的。
# 敏感性分析
mr_heterogeneity(dat)
# 水平多效性检验,如果变量工具不通过暴露影响结果,就违反了孟德尔的假设,就是存在多水平效应。
mr_pleiotropy_test(dat)
# Leave-one-out analysis,Leave-one-out analysis是指逐步剔除SNP后观察剩余的稳定性,理想的是剔除后变化不大。
res_loo <- mr_leaveoneout(dat)
#mr_leaveoneout_plot(res_loo)
# 可视化,散点图,可以看出趋势是正相关
p1 <-mr_scatter_plot(res, dat)
# p1
# 森林图
res_single<- mr_singlesnp(dat)
# mr_forest_plot(res_single)
# 漏斗图中的不对称性对于衡量特定 MR 分析的可靠性很有用。可以使用单个 SNP 结果生成漏斗图,如下所示:
# mr_funnel_plot(res_single)
实战2
参考:https://mp.weixin.qq.com/s/_1Zj5hAK8PvCSlW388Uffw
参考论文题目:Mendelian randomization analysis does not reveal a causal influence of mental diseases on osteoporosis.https://doi.org/10.3389/fendo.2023.1125427
library(MRPRESSO)
# 导入数据进行分析
Ins<-data<- read_excel("data/FA_BMD_data.xlsx",1)
# 暴漏数据
aaa<- extract_instruments(outcomes='ieu-a-22',clump=TRUE, r2=0.001,kb=10000)
# 数据不是标准格式,需要修改数据格式
outcome_dat<-format_data(Ins,
type = "outcome", header = TRUE,
phenotype_col ="Phenotype", snp_col = "SNPs",
beta_col ="beta.outcome",
se_col ="se.outcome",
eaf_col = "eaf.outcome",
effect_allele_col ="effect_allele.outcome",
other_allele_col ="other_allele.outcome",
pval_col = "pval.outcome")
# 进行数据效应等位与效应两保持统一,然后进行MR分析
Mydata<- harmonise_data(aaa, outcome_dat)
res<-mr(Mydata,method_list=c("mr_ivw", "mr_ivw_fe",
"mr_two_sample_ml","mr_egger_regression", "mr_weighted_median",
"mr_penalised_weighted_median","mr_simple_mode", "mr_weighted_mode"))
# 进行MR-PRESSO检验,是一个多水平效应检验
mr_presso(BetaOutcome="beta.outcome",
BetaExposure ="beta.exposure", SdOutcome ="se.outcome",
SdExposure = "se.exposure",OUTLIERtest =TRUE,DISTORTIONtest = TRUE,
data =Mydata, NbDistribution = 1000,SignifThreshold = 0.05)
#> $`Main MR results`
#> Exposure MR Analysis Causal Estimate Sd T-stat P-value
#> 1 beta.exposure Raw 0.02720285 0.01820984 1.493855 0.1390517
#> 2 beta.exposure Outlier-corrected NA NA NA NA
#>
#> $`MR-PRESSO results`
#> $`MR-PRESSO results`$`Global Test`
#> $`MR-PRESSO results`$`Global Test`$RSSobs
#> [1] 82.80165
#>
#> $`MR-PRESSO results`$`Global Test`$Pvalue
#> [1] 0.529
# 异质性检验
mr_heterogeneity(Mydata,
method_list=c("mr_egger_regression", "mr_ivw"))
# 多水平校验
pleio<- mr_pleiotropy_test(Mydata)
# Leave-one-out analysis,Leave-one-out analysis是指逐步剔除SNP后观察剩余的稳定性,理想的是剔除后变化不大,这和我们的meta分析剔除法很相似。
single<- mr_leaveoneout(Mydata)
# mr_leaveoneout_plot(single)
# 散点图
# mr_scatter_plot(res,Mydata)
# 绘制森林图
res_single<- mr_singlesnp(Mydata)
# mr_forest_plot(res_single)
# 绘制漏斗图,主要是看蓝线周围的散点是否对称
# mr_funnel_plot(res_single)
# 最后生成OR和置信区间
out<-generate_odds_ratios(res)
绘制孟德尔随机化结果图
# Load data
MR01 <- read.table(file = "data/data_single.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot
# 绘图
Set <- "A"
col.palette <- setNames(c("#762a83"), c(Set))
p1 <- ggplot(MR01, aes(x = Total, y = Causal_effects, color = Set
)) +
geom_point(size = 6, shape = 22, fill = "#945893", stroke = 0.8) + # Larger point size with border
geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set),
width = 0.3, size = 0.8) + # Error bars with adjusted width and size
labs(y = "", x = "Estimate (95% CI)") +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +
scale_color_manual(values = col.palette) +
theme_classic(base_size = 14) + # Use a classic theme for a clean look
theme(
panel.grid.major = element_blank(), # Remove major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
axis.text = element_text(color = 'black', size = 12),
axis.title = element_text(size = 14, face = "bold"),
legend.position = "none",
legend.title = element_text(size = 14, face = "bold"),
legend.text = element_text(size = 12),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
)
# Save the plot
# 图保存
ggsave(filename = "results/Error_bar_plot01.pdf", plot = p1, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)
# 分组对比(双)
MR02 <- read.table(file = "data/data_double.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot
# 绘图
Set1 <- "A"
Set2 <- "B"
col.palette <- setNames(c("#762a83","#5aae61"), c(Set1, Set2))
p2 <- ggplot(MR02, aes(x = Total, y = Causal_effects, fill = Set, color = Set
)) +
geom_point(size = 6, shape = 22, stroke = 0.8, position = position_dodge(0.5)) + # Larger point size with border
geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set), position = position_dodge(0.5),
width = 0.3, size = 0.8) + # Error bars with adjusted width and size
labs(y = "", x = "Estimate (95% CI)") +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +
#geom_hline(yintercept = c(3.5, 6.5, 9.5), linetype = "dashed", color = "black", size = 0.7) +
scale_color_manual(values = col.palette) +
scale_fill_manual(values = col.palette) +
theme_classic(base_size = 14) + # Use a classic theme for a clean look
theme(
panel.grid.major = element_blank(), # Remove major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
axis.text = element_text(color = 'black', size = 12),
axis.title = element_text(size = 14, face = "bold"),
legend.position = "none",
legend.title = element_text(size = 14, face = "bold"),
legend.text = element_text(size = 12),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
)
# Save the plot
# 图保存
ggsave(filename = "results/Error_bar_plot02.pdf", plot = p2, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)
# 分组对比(单+双)
MR03 <- read.table(file = "data/data_single_double.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot
# 绘图
Set1 <- "A"
Set2 <- "B"
Set3 <- "C"
col.palette <- setNames(c("#762a83","#5aae61","#5791c9"), c(Set1, Set2, Set3))
p3 <- ggplot(MR03, aes(x = Total, y = Causal_effects, fill = Set, color = Set
)) +
geom_point(size = 6, shape = 22, stroke = 0.8, position = position_dodge(0.5)) + # Larger point size with border
geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set), position = position_dodge(0.5),
width = 0.3, size = 0.8) + # Error bars with adjusted width and size
labs(y = "", x = "Estimate (95% CI)") +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +
scale_color_manual(values = col.palette) +
scale_fill_manual(values = col.palette) +
theme_classic(base_size = 14) + # Use a classic theme for a clean look
theme(
panel.grid.major = element_blank(), # Remove major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
axis.text = element_text(color = 'black', size = 12),
axis.title = element_text(size = 14, face = "bold"),
legend.position = "none",
legend.title = element_text(size = 14, face = "bold"),
legend.text = element_text(size = 12),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
)
# Save the plot
# 图保存
ggsave(filename = "results/Error_bar_plot03.pdf", plot = p3, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)
# 分组对比(单+双+空白行)
MR04 <- read.table(file = "data/data_single_double2.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot
# 绘图
Set1 <- "A"
Set2 <- "B"
Set3 <- "C"
col.palette <- setNames(c("#762a83","#5aae61","#5791c9"), c(Set1, Set2, Set3))
MR04 = MR04 %>%
mutate(Causal_effects = ordered(Causal_effects,
levels=c("Oscillibacter WHR","Oscillibacter BMI","Oscillibacter triglyceride",
"Observational analysis","Analysis2","TIVM",
"TSLS","Analysis1"
)))
p4 <- ggplot(MR04, aes(x = Total, y = Causal_effects, fill = Set, color = Set
)) +
geom_point(size = 6, shape = 22, stroke = 0.8, position = position_dodge(0.5)) + # Larger point size with border
geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set), position = position_dodge(0.5),
width = 0.3, size = 0.8) + # Error bars with adjusted width and size
labs(y = "", x = "Estimate (95% CI)") +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +
scale_color_manual(values = col.palette) +
scale_fill_manual(values = col.palette) +
theme_classic(base_size = 14) + # Use a classic theme for a clean look
theme(
panel.grid.major = element_blank(), # Remove major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
axis.text = element_text(color = 'black', size = 12),
axis.title = element_text(size = 14, face = "bold"),
legend.position = "none",
legend.title = element_text(size = 14, face = "bold"),
legend.text = element_text(size = 12),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
)
# Save the plot
# 图保存
ggsave(filename = "results/Error_bar_plot04.pdf", plot = p4, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)
排版Combo plots
组合多个子图为发表格式
library(cowplot)
width = 89
height = 59
p0 = plot_grid(p1, p2, p3, p4, labels = c("A", "B", "C", "D"), ncol = 2)
ggsave("results/MR_forest.pdf", p0, width = width * 3, height = height * 3, units = "mm")
使用此脚本,请引用下文:
Yong-Xin Liu, Lei Chen, Tengfei Ma, Xiaofang Li, Maosheng Zheng, Xin Zhou, Liang Chen, Xubo Qian, Jiao Xi, Hongye Lu, Huiluo Cao, Xiaoya Ma, Bian Bian, Pengfan Zhang, Jiqiu Wu, Ren-You Gan, Baolei Jia, Linyang Sun, Zhicheng Ju, Yunyun Gao, Tao Wen, Tong Chen. 2023. EasyAmplicon: An easy-to-use, open-source, reproducible, and community-based pipeline for amplicon data analysis in microbiome research. iMeta 2: e83. https://doi.org/10.1002/imt2.83
Copyright 2016-2024 Defeng Bai baidefeng@caas.cn, Chuang Ma 22720765@stu.ahau.edu.cn, Jiani Xun 15231572937@163.com, Yong-Xin Liu liuyongxin@caas.cn
猜你喜欢
iMeta高引文章 fastp 复杂热图 ggtree 绘图imageGP 网络iNAP
iMeta网页工具 代谢组MetOrigin 美吉云乳酸化预测DeepKla
iMeta综述 肠菌菌群 植物菌群 口腔菌群 蛋白质结构预测
10000+:菌群分析 宝宝与猫狗 梅毒狂想曲 提DNA发Nature
一文读懂:宏基因组 寄生虫益处 进化树 必备技能:提问 搜索 Endnote
16S功能预测 PICRUSt FAPROTAX Bugbase Tax4Fun
生物科普: 肠道细菌 人体上的生命 生命大跃进 细胞暗战 人体奥秘
写在后面
为鼓励读者交流快速解决科研困难,我们建立了“宏基因组”讨论群,己有国内外6000+ 科研人员加入。请添加主编微信meta-genomics带你入群,务必备注“姓名-单位-研究方向-职称/年级”。高级职称请注明身份,另有海内外微生物PI群供大佬合作交流。技术问题寻求帮助,首先阅读《如何优雅的提问》学习解决问题思路,仍未解决群内讨论,问题不私聊,帮助同行。
点击阅读原文