MicrobiomeStatPlot | 孟德尔随机化分析教程Mendelian Randomization Analysis

学术   2024-11-16 07:37   广东  

孟德尔随机化分析简介

参考:

https://mp.weixin.qq.com/s/opx5-pGVerVDtEIZzRziEQ

在19世纪,孟德尔用豌豆花作为实验材料,通过对豌豆花颜色、形状等特征的观察和统计分析,发现了遗传的基本规律,这就是孟德尔定律。不过,孟德尔定律只适用于单基因的遗传性状,并且无法解释复杂的多基因遗传疾病。此外,孟德尔定律也无法解释环境因素对基因表达的影响,以及基因与环境的相互作用。为了解决这个问题,著名统计学家Fisher提出了孟德尔随机化的概念。

孟德尔随机化Mendelian randomization, MR是一种基于遗传变异的因果推断方法,其基本原理是利用自然界中的随机分配的基因型对表型的影响来推断生物学因素对疾病的影响。

举个例子: 假设我们了解体重BMI对冠心病发病的影响。但是对冠心病影响的因素太多了,比如高血压、高血糖等,我们怎么才能排除混在因素,确定为体重BMI对冠心病有影响呢?我们先要选定一个工具基因变量M,这个M要和我们研究变量X相关,和我们解决变量和混杂因素无关。最后我们通过MR分析得出M基因对Y有影响,因为M基因对Y没有直接关联,所以M基因是通过影响暴露因素X而后从而对Y产生影响。这一连成线的过程可以看作四类似一个中介效应,暴漏因素X就是中介变量,基因M通过对中介变量X的影响达到对Y的影响。

要做孟德尔随机化要有3个假设前提: 1.基因M要和体重(暴露因素)强相关联。-关联性假设 2.基因M和结局变量冠心病和其他混杂因素没有关联。-独立性假设 3.基因M只能通过影响体重对冠心病造成影响,不能通过其他途径对冠心病造成影响。-排他性假设

目前主流使用的孟德尔随机化的工具为TwoSampleMR包。TwoSampleMR软件包的使用可参考https://mrcieu.github.io/TwoSampleMR/articles/perform_mr.html

标签:#微生物组数据分析  #MicrobiomeStatPlot  #孟德尔随机化分析  #R语言可视化 #Mendelian Randomization Analysis

作者:First draft(初稿):Defeng Bai(白德凤);Proofreading(校对):Ma Chuang(马闯) and Jiani Xun(荀佳妮);Text tutorial(文字教程):Defeng Bai(白德凤)

源代码及测试数据链接:

https://github.com/YongxinLiu/MicrobiomeStatPlot/项目中目录 3.Visualization_and_interpretation/MendelianRandomizationAnalysis

或公众号后台回复“MicrobiomeStatPlot”领取


孟德尔随机化分析案例

这是来自于深圳华大基因Tao Zhang课题组2022年发表于Nature Genetics上的一篇文章。论文题目为:Mendelian randomization analyses support causal relationships between blood metabolites and the gut microbiome. https://doi.org/10.1038/s41588-021-00968-y

图 5 |  颤杆菌属和 Alistipes 对降低血液甘油三酯浓度的因果影响。

a,MR分析结果的示意图:对颤杆菌较高丰度的遗传易感性与血液甘油三酯浓度降低有关,在较小程度上与降低 BMI 和 WHR 有关。b,森林图表示颤杆菌丰度每增加 1 s.d. 对血液甘油三酯、BMI 和 WHR 的影响,分别使用观察性和 MR分析估计。使用多元线性模型在总共 2,545 个样本(紫色)中执行观察性相关性分析。使用由多达 134 个遗传预测因子构建的 PRS 作为工具变量进行单样本 MR 分析,分别在发现(蓝色)和复制(红色)队列中估计。列出了观察和单样本 MR 分析的 beta 估计值和 95% 置信区间 (CI) 值以及 P值。c、d、森林图分别表示使用单样本 MR 和六种不同的双样本 MR 方法在发现 (蓝色) 和复制 (红色) 队列中估计的 Oscillibacter (c) 和 Alistipes (d) 对甘油三酯含量的因果影响的 MR 估计值和 95% CI 值。还列出了通过每种 MR 方法计算的 P 值。

结果

最显著的因果效应是颤杆菌降低血液甘油三酯浓度(图 5a-c),在较小程度上降低 BMI 和腰臀比 (WHR),而血浆丙氨酸的影响是双向的。使用 134 个遗传变异构建多基因风险评分 (PRS)(134 个遗传变异和构建的 PRS 分别解释了 39.3% 和 49.6% 的表型变异;图 3b 和补充表 10)在发现队列中进行单样本 MR 分析,我们估计每个 1 s.d. Oscillibacter 丰度的增加会导致 WHR 降低 0.261 mmol l-1(P = 2.53 × 10-10)、甘油三酯浓度降低 0.161 kg m-2 和 0.126 比率(P = 2.73 × 10-3)BMI 降低(P = 1.33 × 10-4)。当进行四次双样本 MR 测试时,这种因果关系是稳健的(PGCTA-GSMR(全基因组复杂性状分析-广义总结 MR)Pinverse_variance_weighted = 2.45 × 10-15= 4.34 × 10-11,Pweighted-median = 1.22 × 10-7,和 PMR-Egger = 1.35 × 10-5)(图 5c),并且没有水平多效性的证据(PMR-PRESSOGlobaltest = 0.18;补充表 11)。此外,Alistipes 相对丰度较高也与血液甘油三酯浓度降低有关(图 5d)。

孟德尔随机化分析R语言实战

源代码及测试数据链接:

https://github.com/YongxinLiu/MicrobiomeStatPlot/

或公众号后台回复“MicrobiomeStatPlot”领取

软件包安装

# 基于CRAN安装R包,检测没有则安装 Installing R packages based on CRAN and installing them if they are not detectedp_list = c("ggplot2", "reshape2",  "dplyr", "cowplot","readxl")for(p in p_list){if (!requireNamespace(p)){install.packages(p)}    library(p, character.only = TRUE, quietly = TRUE, warn.conflicts = FALSE)}
# 基于github安装library(remotes)if(!requireNamespace("TwoSampleMR", quietly = TRUE))  remotes::install_github("MRCIEU/TwoSampleMR")
library(devtools)if(!requireNamespace("MRPRESSO", quietly = TRUE))  remotes::install_github("rondolab/MR-PRESSO")
# 加载R包 Loading R packagessuppressWarnings(suppressMessages(library(ggplot2)))suppressWarnings(suppressMessages(library(reshape2)))suppressWarnings(suppressMessages(library(cowplot)))suppressWarnings(suppressMessages(library(dplyr)))suppressWarnings(suppressMessages(library(readxl)))suppressWarnings(suppressMessages(library(TwoSampleMR)))suppressWarnings(suppressMessages(library(MRPRESSO)))

实战1

参考:https://mp.weixin.qq.com/s/opx5-pGVerVDtEIZzRziEQ

# 1.获得暴露因素X的SNP数据,包括体重BMI的SNP数据和结局冠心病的SNP数据。通常通过各种GWAS数据库或者GWAS文献找到。或者通过https://gwas.mrcieu.ac.uk/datasets/这个数据库寻找,利用这个数据库可以直接通过TwoSampleMR包下载。进入数据库网站后,在网站Traitcontains选项中里填入体重指数,然后筛选数据,这里分析选择了ieu-a-835这个数据。然后再找冠心病的数据,这里选的是ieu-a-7这个数据。然后,需要下载ieu-a-835和ieu-a-7这两个数据,先使用extract_instruments函数对暴露数据进行下载。如果是已经下载到电脑里面的数据,使用read_exposure_data函数和clump_data函数读取。
# 从数据库中查找用于 MR 分析的数据library(TwoSampleMR)exposure_dat <- extract_instruments(outcomes='ieu-a-835',#access_token = NULL,                         options(ieugwasr_api = 'gwas-api.mrcieu.ac.uk/'),                         force_server = FALSE,                         p1 = 5e-08,                         clump = TRUE,                         p2 = 5e-08,                         r2 = 0.001,                         kb = 10000)
# 查看exposure_dat的SNP数据exposure_dat$SNP#>  [1] "rs543874"   "rs11165643" "rs17024393" "rs3101336"  "rs2820292" #>  [6] "rs657452"   "rs6656785"  "rs1528435"  "rs7599312"  "rs10182181"#> [11] "rs13021737" "rs2121279"  "rs12986742" "rs1016287"  "rs2365389" #> [16] "rs3849570"  "rs13078960" "rs16851483" "rs6804842"  "rs1516725" #> [21] "rs13107325" "rs11727676" "rs10938397" "rs17001654" "rs2112347" #> [26] "rs13191362" "rs2033529"  "rs9400239"  "rs205262"   "rs2207139" #> [31] "rs1167827"  "rs2245368"  "rs17405819" "rs2033732"  "rs10968576"#> [36] "rs6477694"  "rs1928295"  "rs4740619"  "rs10733682" "rs7903146" #> [41] "rs7899106"  "rs17094222" "rs2176598"  "rs4256980"  "rs3817334" #> [46] "rs12286929" "rs11030104" "rs11057405" "rs7138803"  "rs12429545"#> [51] "rs9579083"  "rs10132280" "rs7141420"  "rs16951275" "rs3736485" #> [56] "rs879620"   "rs758747"   "rs9926784"  "rs3888190"  "rs4889606" #> [61] "rs1558902"  "rs12940622" "rs1000940"  "rs1808579"  "rs6567160" #> [66] "rs17066856" "rs29941"    "rs17724992" "rs2287019"
# 因为ieu-a-7这个数据是冠心病数据,所以这一步要在ieu-a-7这个数据中找到与上面65个SNP匹配的数据,这样就生成了结果数据outcome_dat <- extract_outcome_data(snps=exposure_dat$SNP, outcomes='ieu-a-7', options(ieugwasr_api = 'gwas-api.mrcieu.ac.uk/'), proxies = FALSE, maf_threshold =0.01#, #access_token =NULL )
# 2.效应等位与效应量保持统一dat<- harmonise_data( exposure_dat = exposure_dat, outcome_dat = outcome_dat)
#以上2步完成就可以进行MR分析。mr默认使用五种方法(MR Egger, Weighted median, Inverse variance weighted, Simple mode, Weighted mode)
res<- mr(dat)# 在res数据表中,b是效应值,se是标准误,pval是P值,最重要的就是看Inverse variance weighted这个方法的P值。这里P值小于0.05,表明体重和冠心病解决是有统计意义的。#res
# 敏感性分析mr_heterogeneity(dat)# 水平多效性检验,如果变量工具不通过暴露影响结果,就违反了孟德尔的假设,就是存在多水平效应。mr_pleiotropy_test(dat)# Leave-one-out analysis,Leave-one-out analysis是指逐步剔除SNP后观察剩余的稳定性,理想的是剔除后变化不大。res_loo <- mr_leaveoneout(dat)#mr_leaveoneout_plot(res_loo)
# 可视化,散点图,可以看出趋势是正相关p1 <-mr_scatter_plot(res, dat)# p1
# 森林图res_single<- mr_singlesnp(dat)# mr_forest_plot(res_single)
# 漏斗图中的不对称性对于衡量特定 MR 分析的可靠性很有用。可以使用单个 SNP 结果生成漏斗图,如下所示:# mr_funnel_plot(res_single)

实战2

参考:https://mp.weixin.qq.com/s/_1Zj5hAK8PvCSlW388Uffw

参考论文题目:Mendelian randomization analysis does not reveal a causal influence of mental diseases on osteoporosis.https://doi.org/10.3389/fendo.2023.1125427

library(MRPRESSO)# 导入数据进行分析Ins<-data<- read_excel("data/FA_BMD_data.xlsx",1)
# 暴漏数据aaa<- extract_instruments(outcomes='ieu-a-22',clump=TRUE, r2=0.001,kb=10000)
# 数据不是标准格式,需要修改数据格式outcome_dat<-format_data(Ins,                         type = "outcome", header = TRUE,                         phenotype_col ="Phenotype", snp_col = "SNPs",                         beta_col ="beta.outcome",                         se_col ="se.outcome",                         eaf_col = "eaf.outcome",                         effect_allele_col ="effect_allele.outcome",                         other_allele_col ="other_allele.outcome",                         pval_col = "pval.outcome")
# 进行数据效应等位与效应两保持统一,然后进行MR分析Mydata<- harmonise_data(aaa, outcome_dat)
res<-mr(Mydata,method_list=c("mr_ivw", "mr_ivw_fe","mr_two_sample_ml","mr_egger_regression", "mr_weighted_median","mr_penalised_weighted_median","mr_simple_mode", "mr_weighted_mode"))
# 进行MR-PRESSO检验,是一个多水平效应检验mr_presso(BetaOutcome="beta.outcome", BetaExposure ="beta.exposure", SdOutcome ="se.outcome", SdExposure = "se.exposure",OUTLIERtest =TRUE,DISTORTIONtest = TRUE, data =Mydata, NbDistribution = 1000,SignifThreshold = 0.05)#> $`Main MR results`#>        Exposure       MR Analysis Causal Estimate         Sd   T-stat   P-value#> 1 beta.exposure               Raw      0.02720285 0.01820984 1.493855 0.1390517#> 2 beta.exposure Outlier-corrected              NA         NA       NA        NA#> #> $`MR-PRESSO results`#> $`MR-PRESSO results`$`Global Test`#> $`MR-PRESSO results`$`Global Test`$RSSobs#> [1] 82.80165#> #> $`MR-PRESSO results`$`Global Test`$Pvalue#> [1] 0.529
# 异质性检验mr_heterogeneity(Mydata,method_list=c("mr_egger_regression", "mr_ivw"))# 多水平校验pleio<- mr_pleiotropy_test(Mydata)# pleio
# Leave-one-out analysis,Leave-one-out analysis是指逐步剔除SNP后观察剩余的稳定性,理想的是剔除后变化不大,这和我们的meta分析剔除法很相似。single<- mr_leaveoneout(Mydata)# mr_leaveoneout_plot(single)
# 散点图# mr_scatter_plot(res,Mydata)
# 绘制森林图res_single<- mr_singlesnp(Mydata)# mr_forest_plot(res_single)
# 绘制漏斗图,主要是看蓝线周围的散点是否对称# mr_funnel_plot(res_single)
# 最后生成OR和置信区间out<-generate_odds_ratios(res)

绘制孟德尔随机化结果图

# Load dataMR01 <- read.table(file = "data/data_single.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot# 绘图Set <- "A"col.palette  <- setNames(c("#762a83"), c(Set))p1 <- ggplot(MR01, aes(x = Total, y = Causal_effects, color = Set                       )) +  geom_point(size = 6, shape = 22, fill = "#945893", stroke = 0.8) + # Larger point size with border  geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set),                width = 0.3, size = 0.8) + # Error bars with adjusted width and size  labs(y = "", x = "Estimate (95% CI)") +  geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +  scale_color_manual(values = col.palette) +  theme_classic(base_size = 14) + # Use a classic theme for a clean look  theme(    panel.grid.major = element_blank(), # Remove major grid lines    panel.grid.minor = element_blank(), # Remove minor grid lines    axis.text = element_text(color = 'black', size = 12),    axis.title = element_text(size = 14, face = "bold"),    legend.position = "none",    legend.title = element_text(size = 14, face = "bold"),    legend.text = element_text(size = 12),    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)  )
# Save the plot# 图保存ggsave(filename = "results/Error_bar_plot01.pdf", plot = p1, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)
# 分组对比(双)MR02 <- read.table(file = "data/data_double.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot# 绘图Set1 <- "A"Set2 <- "B"col.palette  <- setNames(c("#762a83","#5aae61"), c(Set1, Set2))p2 <- ggplot(MR02, aes(x = Total, y = Causal_effects, fill = Set, color = Set                       )) +  geom_point(size = 6, shape = 22, stroke = 0.8, position = position_dodge(0.5)) + # Larger point size with border  geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set), position = position_dodge(0.5),                width = 0.3, size = 0.8) + # Error bars with adjusted width and size  labs(y = "", x = "Estimate (95% CI)") +  geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +  #geom_hline(yintercept = c(3.5, 6.5, 9.5), linetype = "dashed", color = "black", size = 0.7) +  scale_color_manual(values = col.palette) +  scale_fill_manual(values = col.palette) +  theme_classic(base_size = 14) + # Use a classic theme for a clean look  theme(    panel.grid.major = element_blank(), # Remove major grid lines    panel.grid.minor = element_blank(), # Remove minor grid lines    axis.text = element_text(color = 'black', size = 12),    axis.title = element_text(size = 14, face = "bold"),    legend.position = "none",    legend.title = element_text(size = 14, face = "bold"),    legend.text = element_text(size = 12),    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)  )
# Save the plot# 图保存ggsave(filename = "results/Error_bar_plot02.pdf", plot = p2, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)

# 分组对比(单+双)MR03 <- read.table(file = "data/data_single_double.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot# 绘图Set1 <- "A"Set2 <- "B"Set3 <- "C"col.palette  <- setNames(c("#762a83","#5aae61","#5791c9"), c(Set1, Set2, Set3))p3 <- ggplot(MR03, aes(x = Total, y = Causal_effects, fill = Set, color = Set                       )) +  geom_point(size = 6, shape = 22, stroke = 0.8, position = position_dodge(0.5)) + # Larger point size with border  geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set), position = position_dodge(0.5),                width = 0.3, size = 0.8) + # Error bars with adjusted width and size  labs(y = "", x = "Estimate (95% CI)") +  geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +  scale_color_manual(values = col.palette) +  scale_fill_manual(values = col.palette) +  theme_classic(base_size = 14) + # Use a classic theme for a clean look  theme(    panel.grid.major = element_blank(), # Remove major grid lines    panel.grid.minor = element_blank(), # Remove minor grid lines    axis.text = element_text(color = 'black', size = 12),    axis.title = element_text(size = 14, face = "bold"),    legend.position = "none",    legend.title = element_text(size = 14, face = "bold"),    legend.text = element_text(size = 12),    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)  )
# Save the plot# 图保存ggsave(filename = "results/Error_bar_plot03.pdf", plot = p3, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)

# 分组对比(单+双+空白行)MR04 <- read.table(file = "data/data_single_double2.txt", sep = "\t", header = TRUE, check.names = FALSE)
# Plot# 绘图Set1 <- "A"Set2 <- "B"Set3 <- "C"col.palette  <- setNames(c("#762a83","#5aae61","#5791c9"), c(Set1, Set2, Set3))
MR04 = MR04 %>%  mutate(Causal_effects = ordered(Causal_effects,                         levels=c("Oscillibacter WHR","Oscillibacter BMI","Oscillibacter triglyceride",                                  "Observational analysis","Analysis2","TIVM",                                  "TSLS","Analysis1"                                  )))p4 <- ggplot(MR04, aes(x = Total, y = Causal_effects, fill = Set, color = Set                       )) +  geom_point(size = 6, shape = 22, stroke = 0.8, position = position_dodge(0.5)) + # Larger point size with border  geom_errorbar(aes(xmin = Total - error_bar, xmax = Total + error_bar, colour = Set), position = position_dodge(0.5),                width = 0.3, size = 0.8) + # Error bars with adjusted width and size  labs(y = "", x = "Estimate (95% CI)") +  geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.7) +  scale_color_manual(values = col.palette) +  scale_fill_manual(values = col.palette) +  theme_classic(base_size = 14) + # Use a classic theme for a clean look  theme(    panel.grid.major = element_blank(), # Remove major grid lines    panel.grid.minor = element_blank(), # Remove minor grid lines    axis.text = element_text(color = 'black', size = 12),    axis.title = element_text(size = 14, face = "bold"),    legend.position = "none",    legend.title = element_text(size = 14, face = "bold"),    legend.text = element_text(size = 12),    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)  )
# Save the plot# 图保存ggsave(filename = "results/Error_bar_plot04.pdf", plot = p4, width = 8, height = 6, useDingbats = FALSE, limitsize = FALSE)

排版Combo plots

组合多个子图为发表格式

library(cowplot)width = 89height = 59p0 = plot_grid(p1, p2, p3, p4, labels = c("A", "B", "C", "D"), ncol = 2)ggsave("results/MR_forest.pdf", p0, width = width * 3, height = height * 3, units = "mm")

使用此脚本,请引用下文:

Yong-Xin Liu, Lei Chen, Tengfei Ma, Xiaofang Li, Maosheng Zheng, Xin Zhou, Liang Chen, Xubo Qian, Jiao Xi, Hongye Lu, Huiluo Cao, Xiaoya Ma, Bian Bian, Pengfan Zhang, Jiqiu Wu, Ren-You Gan, Baolei Jia, Linyang Sun, Zhicheng Ju, Yunyun Gao, Tao Wen, Tong Chen. 2023. EasyAmplicon: An easy-to-use, open-source, reproducible, and community-based pipeline for amplicon data analysis in microbiome research. iMeta 2: e83. https://doi.org/10.1002/imt2.83

Copyright 2016-2024 Defeng Bai baidefeng@caas.cn, Chuang Ma 22720765@stu.ahau.edu.cn, Jiani Xun 15231572937@163.com, Yong-Xin Liu liuyongxin@caas.cn

宏基因组推荐
本公众号现全面开放投稿,希望文章作者讲出自己的科研故事,分享论文的精华与亮点。投稿请联系小编(微信号:yongxinliu 或 meta-genomics)

猜你喜欢

iMeta高引文章 fastp 复杂热图 ggtree 绘图imageGP 网络iNAP
iMeta网页工具 代谢组MetOrigin 美吉云乳酸化预测DeepKla
iMeta综述 肠菌菌群 植物菌群 口腔菌群 蛋白质结构预测

10000+:菌群分析 宝宝与猫狗 梅毒狂想曲 提DNA发Nature

系列教程:微生物组入门 Biostar 微生物组  宏基因组

专业技能:学术图表 高分文章 生信宝典 不可或缺的人

一文读懂:宏基因组 寄生虫益处 进化树 必备技能:提问 搜索  Endnote

扩增子分析:图表解读 分析流程 统计绘图

16S功能预测   PICRUSt  FAPROTAX  Bugbase Tax4Fun

生物科普:  肠道细菌 人体上的生命 生命大跃进  细胞暗战 人体奥秘  

写在后面

为鼓励读者交流快速解决科研困难,我们建立了“宏基因组”讨论群,己有国内外6000+ 科研人员加入。请添加主编微信meta-genomics带你入群,务必备注“姓名-单位-研究方向-职称/年级”。高级职称请注明身份,另有海内外微生物PI群供大佬合作交流。技术问题寻求帮助,首先阅读《如何优雅的提问》学习解决问题思路,仍未解决群内讨论,问题不私聊,帮助同行。

点击阅读原文


宏基因组
宏基因组/微生物组是当今世界科研最热门的研究领域之一,为加强本领域的技术交流与传播,推动中国微生物组计划发展,中科院青年科研人员创立“宏基因组”公众号,目标为打造本领域纯干货技术及思想交流平台。
 最新文章