Refining Data Visualization Through Imitation: Hand-Building clusterProfiler's GO Enrichment Result Plot
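Refining Data Visualization Through Imitation
In this series, we start from figures in top academic journals: we interpret each paper's plotting approach, imitate its visual style, construct suitable plotting data, and apply the code to our own papers.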
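Why this figure: readers often share figures that are attractive and polished, and I will most likely try to learn from and reproduce them. Everyone's time and energy are limited and precious, so when I do take one on, it is mainly for these reasons:
The figure looks great, and I can't resist trying it myself.
I can use this kind of figure in my own papers, so it is worth keeping on hand.
It keeps me learning continuously.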
[Figure: result from Guangchuang Yu's (Y叔) clusterProfiler]
[Figure: the same result hand-built with ggplot2]
[Figure: combined result]
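A quick comment: the main point here is the legends, that is, how the hand-built version controls their order and appearance. I previously wrote two posts summarizing how to modify legends in ggplot2. I think the summaries turned out quite well, but few people read them and even fewer tried the code. The posts are:
Refining Data Visualization Through Imitation: A Deep Dive into the ggplot2 Legend System (Part 1)
Refining Data Visualization Through Imitation: A Deep Dive into the ggplot2 Legend System (Part 2)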
Here is the code:
Load R packages
rm(list = ls())
####----load R Package----####
library(tidyverse)
library(clusterProfiler)
library(org.Hs.eg.db)
library(patchwork)
Load data
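####----load Data----####
# geneList (shipped with DOSE) is a named numeric vector of fold-change values,
# sorted in decreasing order and named by Entrez Gene ID
data(geneList, package = 'DOSE')
# take the top 300 genes as the "differentially expressed" set
de <- names(geneList)[1:300]
# GO enrichment (Biological Process); the IDs are Entrez IDs,
# which matches enrichGO's default keyType = "ENTREZID"
deg_go <- enrichGO(gene = de,
                   OrgDb = "org.Hs.eg.db",
                   ont = "BP",
                   pvalueCutoff = 0.05,
                   qvalueCutoff = 0.05)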
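In the enrichGO() result, GeneRatio is stored as a character string of the form "k/n" (k input genes annotated to the term, out of n input genes with any GO annotation), and the rows are ordered by p-value. The base-R sketch below mirrors what the tidyverse pipeline in the Visualization step does: convert the ratio to a number, keep the 10 most significant terms, and reorder them by GeneRatio so the largest ratio is drawn at the top of the y-axis. It is only an illustration under assumptions: the table is a small made-up stand-in whose column names follow as.data.frame(deg_go), and the helper `parse_ratio_sketch` is hypothetical, not part of clusterProfiler.

```r
# Hypothetical stand-in for as.data.frame(deg_go): all values are made up
go_df <- data.frame(
  Description = paste("term", 1:12),
  GeneRatio   = c("30/280", "25/280", "40/280", "12/280", "18/280", "22/280",
                  "35/280", "10/280", "15/280", "28/280", "50/280", "9/280"),
  p.adjust    = c(1e-30, 1e-28, 1e-25, 1e-22, 1e-20, 1e-18,
                  1e-16, 1e-14, 1e-12, 1e-10, 1e-08, 1e-06),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "k/n" strings into numeric ratios k / n
parse_ratio_sketch <- function(ratio) {
  parts <- strsplit(ratio, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

go_df$GeneRatio <- parse_ratio_sketch(go_df$GeneRatio)

# Keep the 10 most significant terms; order() makes the selection explicit
# instead of relying on the incoming row order
top10 <- head(go_df[order(go_df$p.adjust), ], 10)

# Sort by GeneRatio (largest first), then reverse the factor levels:
# on a discrete y-axis ggplot2 draws the first level at the bottom,
# so the last level (largest GeneRatio) ends up at the top
top10 <- top10[order(top10$GeneRatio, decreasing = TRUE), ]
top10$Description <- factor(top10$Description, levels = rev(top10$Description))

print(top10[, c("Description", "GeneRatio", "p.adjust")])
```

Note that "term 11" has the largest ratio in the made-up table but is dropped, because the top-10 cut is made on significance before sorting by GeneRatio, which is the same order of operations as slice(1:10) followed by arrange(desc(GeneRatio)).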
Visualization
####----Plot----####
# clusterProfiler's built-in dot plot (shows the top 10 terms by default)
p1 <- dotplot(deg_go)
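# Convert GeneRatio from a "k/n" string to a number, keep the first 10 rows
# (the enrichGO result is ordered by p-value), then sort them by GeneRatio;
# reversed factor levels put the largest GeneRatio at the top of the y-axis
deg_go_df_top10 <- as.data.frame(deg_go) %>%
  tidyr::separate(col = GeneRatio, sep = "/", into = c("n1", "n2"), remove = FALSE) %>%
  dplyr::mutate(GeneRatio = as.numeric(n1) / as.numeric(n2)) %>%
  dplyr::slice(1:10) %>%
  dplyr::arrange(desc(GeneRatio)) %>%
  dplyr::mutate(Description = factor(Description, levels = rev(Description), ordered = TRUE))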
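# Derive the color-scale limits from the data instead of hard-coding printed values,
# which are rounded and can push the extreme points outside the limits
p_range <- range(deg_go_df_top10$p.adjust)
p2 <- ggplot(data = deg_go_df_top10) +
  # shape 21 has separate outline and fill, so p.adjust is mapped to fill
  geom_point(aes(x = GeneRatio, y = Description, size = Count, fill = p.adjust),
             shape = 21, alpha = 0.75) +
  scale_fill_gradient(low = "red", high = "blue",
                      limits = p_range,
                      breaks = seq(p_range[1], p_range[2], length.out = 5),
                      # reverse = TRUE puts the smallest p.adjust (red) at the top;
                      # order = 1 draws the colorbar before the size legend
                      guide = guide_colorbar(reverse = TRUE, order = 1)
  ) +
  # give the size-legend keys a solid black fill
  scale_size(range = c(3, 8),
             guide = guide_legend(override.aes = list(fill = "#000000"))) +
  labs(y = "") +
  theme_bw() +
  theme(
    axis.text = element_text(color = "#000000", size = 12)
  )
p2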
p_combine <- p1 + p2 +
plot_layout(widths = c(2, 1.85)) +
plot_annotation(tag_levels = "A")
ggsave(filename = "GO_enrichment.pdf",
plot = p_combine,
height = 6,
width = 16
)
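A note on the color-scale limits of the hand-built plot: typing them in from the printed output of range() is fragile. R prints numbers to 7 significant digits by default, so a hand-typed limit can land just inside the true range, and ggplot2's continuous scales by default treat values outside the limits as missing (drawn in the scale's na.value colour, grey by default for scale_fill_gradient()). The enrichment p-values themselves can also change with the GO annotation version. The base-R sketch below uses made-up p-values, not the real enrichment numbers, to show the rounding effect and the data-driven alternative.

```r
# Hypothetical smallest adjusted p-value, stored with more digits than print() shows
p_min <- 1.31595584e-33

# print()/format() show 7 significant digits by default (getOption("digits"))
printed <- format(p_min)
hand_typed_limit <- as.numeric(printed)

# The true value lies just below the hand-typed lower limit, so a ggplot2
# continuous scale using that limit would treat the point as out of bounds
out_of_bounds <- p_min < hand_typed_limit
print(out_of_bounds)

# Safer: derive limits and evenly spaced breaks from the data itself
p_values <- c(p_min, 2.5e-25, 8.66459412e-17)  # made-up values
data_limits <- range(p_values)
data_breaks <- seq(data_limits[1], data_limits[2], length.out = 5)
print(data_breaks)
```

Because seq() with length.out starts at `from` and ends at `to`, the outermost breaks coincide with the data range, so every point stays inside the limits and the colorbar labels span the full range.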
Session info
####----sessionInfo----####
sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS 14.6.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] patchwork_1.2.0.9000 org.Hs.eg.db_3.18.0 AnnotationDbi_1.64.1 IRanges_2.36.0
[5] S4Vectors_0.40.2 Biobase_2.62.0 BiocGenerics_0.48.1 clusterProfiler_4.10.0
[9] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[13] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[17] ggplot2_3.5.1 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.7
[4] magrittr_2.0.3 farver_2.1.2 ragg_1.2.6
[7] fs_1.6.4 zlibbioc_1.48.0 vctrs_0.6.5
[10] memoise_2.0.1 RCurl_1.98-1.13 ggtree_3.10.0
[13] htmltools_0.5.7 AnnotationHub_3.10.0 curl_5.1.0
[16] gridGraphics_0.5-1 plyr_1.8.9 cachem_1.1.0
[19] igraph_2.0.3 mime_0.12 lifecycle_1.0.4
[22] pkgconfig_2.0.3 Matrix_1.6-5 R6_2.5.1
[25] fastmap_1.2.0 gson_0.1.0 GenomeInfoDbData_1.2.11
[28] shiny_1.8.0 digest_0.6.36 aplot_0.2.3
[31] enrichplot_1.22.0 colorspace_2.1-1 textshaping_0.3.7
[34] RSQLite_2.3.3 labeling_0.4.3 filelock_1.0.2
[37] timechange_0.2.0 fansi_1.0.6 httr_1.4.7
[40] polyclip_1.10-7 compiler_4.3.0 bit64_4.0.5
[43] withr_3.0.1 BiocParallel_1.36.0 viridis_0.6.4
[46] DBI_1.1.3 ggforce_0.4.2 MASS_7.3-60
[49] rappdirs_0.3.3 HDO.db_0.99.1 tools_4.3.0
[52] ape_5.8 scatterpie_0.2.1 interactiveDisplayBase_1.40.0
[55] httpuv_1.6.12 glue_1.7.0 nlme_3.1-163
[58] GOSemSim_2.28.0 promises_1.2.1 grid_4.3.0
[61] shadowtext_0.1.2 reshape2_1.4.4 fgsea_1.28.0
[64] generics_0.1.3 gtable_0.3.5 tzdb_0.4.0
[67] data.table_1.16.0 hms_1.1.3 tidygraph_1.2.3
[70] utf8_1.2.4 XVector_0.42.0 ggrepel_0.9.6
[73] BiocVersion_3.18.1 pillar_1.9.0 yulab.utils_0.1.5
[76] later_1.3.1 splines_4.3.0 tweenr_2.0.3
[79] BiocFileCache_2.10.1 treeio_1.26.0 lattice_0.22-5
[82] bit_4.0.5 tidyselect_1.2.1 GO.db_3.18.0
[85] Biostrings_2.70.1 gridExtra_2.3 graphlayouts_1.0.2
[88] stringi_1.8.3 lazyeval_0.2.2 ggfun_0.1.5
[91] yaml_2.3.7 codetools_0.2-19 ggraph_2.1.0
[94] qvalue_2.34.0 BiocManager_1.30.22 ggplotify_0.1.2
[97] cli_3.6.3 systemfonts_1.1.0 xtable_1.8-4
[100] munsell_0.5.1 Rcpp_1.0.13 GenomeInfoDb_1.38.1
[103] dbplyr_2.4.0 png_0.1-8 parallel_4.3.0
[106] ellipsis_0.3.2 blob_1.2.4 DOSE_3.28.1
[109] bitops_1.0-7 viridisLite_0.4.2 tidytree_0.4.5
[112] scales_1.3.0 crayon_1.5.2 rlang_1.1.4
[115] cowplot_1.1.3 fastmatch_1.1-4 KEGGREST_1.42.0