Downloading complete GWAS Catalog SNP data (for Mendelian randomization analysis)

Digest   2024-12-15 01:39   Jiangsu

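The European Bioinformatics Institute (EBI) is a world-leading bioinformatics research institute located at Hinxton, near Cambridge, UK, and is part of the European Molecular Biology Laboratory (EMBL). EBI provides a wide range of databases and tools that scientists worldwide use to analyse, store, share and retrieve biological data. Its databases cover genomics, proteomics, transcriptomics, metabolomics and other fields, supporting life-science research.


EBI offers several GWAS-related data sets and resources, including dedicated databases and platforms where researchers can obtain GWAS data, analyse GWAS results, and integrate them with other bioinformatics data.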

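The main GWAS-related resources at, or connected with, EBI are:

1.GWAS Catalog (can be used for MR analysis)

  • Function: The GWAS Catalog is a public database hosted by EBI that collects the results of genome-wide association studies from around the world. It contains information on genetic variants (such as single-nucleotide polymorphisms, SNPs) associated with diseases, traits, environmental factors and so on.
  • Use: Researchers can search the GWAS Catalog for variants associated with a disease or trait, view the GWAS results of different studies, and see the corresponding variant-disease associations. Besides variant information, the GWAS Catalog provides detailed data and metadata for each study (such as sample size and sample origin).
  • URL: GWAS Catalog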
https://www.ebi.ac.uk/gwas/downloads/summary-statistics

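Suppose I am interested in Type 1 diabetes mellitus.

Clicking into the entry shows the associated rs IDs.


But how do we download the complete GWAS summary statistics?


Step 1: Go to the GWAS Catalog website (https://www.ebi.ac.uk/gwas/) and click "Summary statistics".

Step 2: Search directly for the disease you are interested in.

Step 3: Click to download the full version of the data.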


wget -c http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90043001-GCST90044000/GCST90043633/harmonised/34737426-GCST90043633-EFO_0001359.h.tsv.gz
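The download URL above appears to follow a pattern: the study accession sits inside a directory that covers a block of 1000 accessions. The sketch below rebuilds that directory path from an accession. The function names `gcst_range_dir` and `gcst_harmonised_url` are hypothetical, and the block-of-1000 rule is an assumption inferred only from the one URL shown here, so check the result in a browser before relying on it.

```shell
# Hypothetical helpers (not part of any GWAS Catalog tool).
# Assumption: range directories cover blocks of 1000 accessions and are named
# GCST<start>-GCST<end>, as in the single URL shown above. Accessions with
# leading zeros after "GCST" are not handled by this sketch.

gcst_range_dir() {
  num="${1#GCST}"                                  # GCST90043633 -> 90043633
  range_start=$(( (num - 1) / 1000 * 1000 + 1 ))   # first accession in the block
  range_end=$(( range_start + 999 ))               # last accession in the block
  printf 'GCST%s-GCST%s\n' "$range_start" "$range_end"
}

gcst_harmonised_url() {
  base='http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics'
  printf '%s/%s/%s/harmonised/\n' "$base" "$(gcst_range_dir "$1")" "$1"
}

# Print the directory that holds the harmonised file(s) for the example study.
gcst_harmonised_url GCST90043633
```

The file name itself also contains the PubMed ID and the EFO trait ID, which cannot be derived from the accession alone. Open the printed directory to find the exact file name, then pass the full URL to `wget -c` as above.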
Read the data in R
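Before loading the full file into R, it can help to cut it down on the command line, for example to the genome-wide significant SNPs that are typically used as instruments in MR. The sketch below is one possible way to do that. The function name `filter_by_pvalue` is hypothetical, and it assumes the file is a gzipped, tab-separated table whose header contains a column named `p_value`; check the real header first with `gzip -dc FILE | head -n 1`.

```shell
# Hypothetical helper: print the header plus all rows whose p-value is below
# a threshold (default 5e-8).
# Assumption: tab-separated input with a header column named "p_value".
filter_by_pvalue() {
  infile="$1"
  threshold="${2:-5e-8}"
  gzip -dc "$infile" | awk -F '\t' -v thr="$threshold" '
    NR == 1 {
      # Locate the p-value column by name instead of by position.
      for (i = 1; i <= NF; i++) if ($i == "p_value") col = i
      if (!col) { print "no p_value column found" > "/dev/stderr"; exit 1 }
      print
      next
    }
    # Skip non-numeric entries such as NA, then apply the threshold.
    $col ~ /^[0-9.eE+-]+$/ && ($col + 0) < (thr + 0)
  '
}

# Tiny synthetic file (made-up values) just to exercise the function.
demo_file="$(mktemp)"
printf 'rsid\tbeta\tp_value\nrs1\t0.10\t3e-9\nrs2\t0.02\t0.4\nrs3\t-0.30\tNA\n' | gzip -c > "$demo_file"
filter_by_pvalue "$demo_file"
rm -f "$demo_file"
```

On the synthetic input only the header and the `rs1` row are printed. For the real download, redirect the output to a smaller file (for example `filter_by_pvalue FILE.h.tsv.gz > significant.tsv`) and read that file in R.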


Note: Curation Process

Each GWAS Catalog study entry comprises one or more samples, designated as “Discovery” or “Replication” samples, depending on the stage of the GWA study in which they were analysed. For each sample, the detailed description is either submitted by authors as free text or extracted by curators from the relevant publication. To generate the controlled description, we selected, from a limited list of terms, the category label noted by the author or the closest match. Genetically assessed methods for defining population groups are given precedence or, if not stated, the category label that best correlates with the detailed description for the same sample. For example, we selected the category label “East Asian” for detailed descriptions containing the descriptor “Han Chinese”. The full list of category group labels and their definitions can be found in Table 1 of Morales et al, 2018.

We rely heavily on author-provided data, giving precedence to information inferred using genomic methods, such as principal component analysis (PCA). In some cases, when the information provided by authors is limited or ambiguous, we consider other sources in order to improve data completeness. These include peer-reviewed population genetics publications to obtain additional information on groups that are not adequately characterized by authors or when samples are solely described using ethno-cultural terms. When the only information provided in the curated publication is the location of recruitment, we consult The United Nations M49 Standard of Geographic Regions and The World Factbook. The latter is a regularly updated compendium of worldwide demographic data, covering all countries and territories of the world.

An internal analysis performed in 2023 suggests that the number of inferences made by curators for recently published studies is small.



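2.NHGRI-EBI GWAS Catalog

  • Background: This is the full name of the GWAS Catalog described in item 1. It is maintained jointly by the US National Human Genome Research Institute (NHGRI) and EBI, and records the significant SNP-trait associations found in GWAS studies. It covers a large number of GWAS results, particularly for common diseases and a wide range of phenotypes.
  • Data content: The data include a detailed description of each study, the SNPs involved, the associated traits (such as diabetes, hypertension and obesity), and the related association statistics.
  • Use: Researchers use it to look up and verify relationships between genetic variants and diseases or traits, supporting follow-up functional studies, genetic research and analysis of disease mechanisms.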


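3.Ensembl (linked to GWAS)

  • Function: Although Ensembl is primarily a database of genome annotation and gene information, it also lets users access variant information linked to GWAS results. By integrating GWAS findings, Ensembl helps researchers connect variants discovered in GWAS with gene function, protein structure and other information.
  • Use: Ensembl provides context for GWAS results, such as the functional consequences of a variant, its genomic position and its annotation, helping researchers interpret disease-associated genetic variation.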


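4.GA4GH (Global Alliance for Genomics and Health)

  • Function: GA4GH is a global collaborative organisation that promotes the sharing and analysis of genomic data through standardisation. It works closely with EBI and promotes cross-platform interoperability of genetic data, including GWAS data.
  • Use: GA4GH provides data-sharing standards and protocols relevant to GWAS research, supporting broader data integration and exchange and helping enable further interdisciplinary research.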


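5.UK Biobank

  • Function: UK Biobank is a resource of genomic and health data that underlies a large number of GWAS. It is an independent project, although EBI offers some support and analysis tools related to UK Biobank data.
  • Use: UK Biobank holds genomic and health data from more than 500,000 participants, making it a rich resource for GWAS research. Researchers can use EBI tools and platforms to access and analyse GWAS results derived from these data.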


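6.PheWeb

  • Function: PheWeb is a tool for displaying and exploring GWAS results. It was developed at the University of Michigan rather than at EBI. It visualises GWAS data as plots, making it easier for researchers to understand relationships between genetic variants and diseases or traits.
  • Use: Users can browse GWAS data sets in PheWeb, view association results for different diseases or traits, explore them interactively, and combine the results with information from other public databases.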



Additional sources of summary statistics

Consortium | Full consortium name | Summary statistics link
ALSKP | ALS Knowledge portal | http://alskp.org/informational/data
CARDIoGRAMplusC4D | Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) plus The Coronary Artery Disease (C4D) Genetics | http://www.cardiogramplusc4d.org/data-downloads/
CDKP/ISGC | Cerebrovascular Disease Knowledge portal/International Stroke Genetics Consortium | https://cd.hugeamp.org/downloads.html
CHARGE | Cohorts for Heart and Aging Research in Genetic Epidemiology | http://www.chargeconsortium.com/main/results
CKDGen | Chronic Kidney Disease Genetics Consortium | http://ckdgen.imbi.uni-freiburg.de
CMDKP | Common Metabolic Diseases Knowledge portal | https://hugeamp.org/downloads.html
CVDKP | Cardiovascular Disease Knowledge portal | https://cvd.hugeamp.org/downloads.html
deCODE | deCODE genetics | https://www.decode.com/summarydata/
Diagram | DIAbetes Genetics Replication And Meta-analysis | http://diagram-consortium.org/downloads.html
EAGLE | EAGLE eczema consortium | http://data.bris.ac.uk/datasets/tar/28uchsdpmub118uex26ylacqm.zip
EGG | Early Growth Genetics Consortium | http://egg-consortium.org/
GEFOS | GEnetic Factors for OSteoporosis Consortium | http://www.gefos.org
GIANT | Genetic Investigation of ANthropometric Traits | http://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files
GLGC | Global Lipids Genetics Consortium | http://csg.sph.umich.edu//abecasis/public/lipids2013/
GRASP | Genome-Wide Repository of Associations Between SNPs and Phenotypes | https://grasp.nhlbi.nih.gov/FullResults.aspx
IBDGenetics | International Inflammatory Bowel Disease Genetics Consortium | https://www.ibdgenetics.org/downloads.html
JENGER | Japanese ENcyclopedia of GEnetic associations by Riken | http://jenger.riken.jp/en/
MAGIC | Meta-Analyses of Glucose and Insulin-related traits Consortium | https://www.magicinvestigators.org/downloads/
MSKKP | Musculoskeletal Knowledge portal | https://msk.hugeamp.org/downloads.html
NIAGADS | National Institute on Aging Genetics of Alzheimer's Disease | https://www.niagads.org/genomics/showXmlDataContent.do?name=XmlQuestions.Documentation#about
PGC | Psychiatric Genomic Consortium | https://www.med.unc.edu/pgc/results-and-downloads
PGRN | Pharmacogenomics Research Network | http://www.pgrn.org/riken-gwas-statistics.html
RGC | Reproductive Genetics Consortium | http://www.reprogen.org/data_download.html
Sleep Disorder KP | Sleep Disorder Knowledge portal | https://sleep.hugeamp.org/downloads.html
T2DKP | Type II Diabetes Knowledge portal | https://t2d.hugeamp.org/downloads.html
UKB | UK Biobank | http://geneatlas.roslin.ed.ac.uk
UKB | UK Biobank | http://www.nealelab.is/uk-biobank
PanUKBB | Pan-ancestry genetic analysis of UK BioBank | https://pan.ukbb.broadinstitute.org
WTCC | Wellcome Trust Case Control Consortium (access by request) | https://www.wtccc.org.uk/ccc1/summary_stats.html
AncestryDNA via EGA | AncestryDNA COVID-19 GWAS with Eight Phenotypes | https://ega-archive.org/studies/EGAS00001005099
PLCO | Prostate Lung Colorectal Ovarian Cancer Screening Study (National Cancer Institute) | https://exploregwas.cancer.gov/plco-atlas/



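Summary

Through the GWAS Catalog and related tools, EBI provides a large body of GWAS-related resources. The GWAS Catalog is one of EBI's core resources. It covers genome-wide association study data for diseases and traits from around the world and helps researchers find genetic variants associated with specific diseases and phenotypes.

In addition, platforms such as Ensembl, UK Biobank and PheWeb also support the analysis and interpretation of GWAS data. EBI is therefore one of the important platforms for scientists worldwide to carry out GWAS research, analysis and data sharing.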

https://www.ebi.ac.uk/gwas/studies/GCST90013445
https://mrcieu.r-universe.dev/TwoSampleMR/doc/manual.html
https://mrcieu.github.io/TwoSampleMR/articles/introduction.html
Full GWAS data: https://www.ebi.ac.uk/gwas/downloads/summary-statistics


生信小博士