一文搞定R语言箱线图添加显著性的方法

科技   科技   2024-09-23 19:54   河南  

大家好,我是邓飞,下面介绍一下箱线图实现显著性添加的方法,类似这种:

「单因素二水平T检验箱线图可视化」

「单因素三水平T检验箱线图可视化」

「单因素三水平柱形图」

「单因素三水平折线图」「二因素柱形图」

「二因素折线图」

1. 单因素二水平

这种试验,比如有两个品种,株高的差异,每个品种调查了10株,就构成了这样的试验数据。

「模拟数据:」

set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
dd = data.frame(Group = rep(c("A","B"),each=10),y = c(y1,y2))
dd
str(dd)
dd$Group = as.factor(dd$Group)

「数据:」

> dd
Group y
1 A 4.439524
2 A 4.769823
3 A 6.558708
4 A 5.070508
5 A 5.129288
6 A 6.715065
7 A 5.460916
8 A 3.734939
9 A 4.313147
10 A 4.554338
11 B 16.224082
12 B 15.359814
13 B 15.400771
14 B 15.110683
15 B 14.444159
16 B 16.786913
17 B 15.497850
18 B 13.033383
19 B 15.701356
20 B 14.527209

这里,使用的是ggpubr包进行绘图:

1.1 绘制箱线图

library(ggplot2)
library(ggpubr)
ggboxplot(dd,x = "Group",y = "y")
在这里插入图片描述

1.2 箱线图添加不同颜色

ggboxplot(dd,x = "Group",y = "y",color = "Group")

1.3 箱线图添加散点图

ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter")

1.4 箱线图+散点图+显著性水平

这里,默认的统计方法是非参数统计Wilcoxon,如果想用t.test,见下面操作

ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") + 
stat_compare_means()

1.5 用t.test作为统计方法


ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test")

1.6 直接输出显著性


ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test",label = "p.signif")

2. 单因素三水平

二个水平可以用T检验,三个水平或者多个水平的数据,如何检验呢?

「模拟数据:」

# 构建三个水平 ANOVA
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
y3 = rnorm(10) + 15

dd = data.frame(Group = rep(c("A","B","C"),each=10),y = c(y1,y2,y3))
dd
str(dd)
dd$Group = as.factor(dd$Group)

「数据如下:」

> dd
Group y
1 A 4.439524
2 A 4.769823
3 A 6.558708
4 A 5.070508
5 A 5.129288
6 A 6.715065
7 A 5.460916
8 A 3.734939
9 A 4.313147
10 A 4.554338
11 B 16.224082
12 B 15.359814
13 B 15.400771
14 B 15.110683
15 B 14.444159
16 B 16.786913
17 B 15.497850
18 B 13.033383
19 B 15.701356
20 B 14.527209
21 C 13.932176
22 C 14.782025
23 C 13.973996
24 C 14.271109
25 C 14.374961
26 C 13.313307
27 C 15.837787
28 C 15.153373
29 C 13.861863
30 C 16.253815

2.1 箱线图+散点图

p = ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") 
p

2.2 箱线图+散点图+显著性

p + stat_compare_means(method = "anova")

2.3 两两之间显著性绘制

my_comparisons = list( c("A", "B"), c("A", "C"), c("B", "C") )
p + stat_compare_means(comparisons = my_comparisons,
# label = "p.signif",
method = "t.test")

2.4 显示显著性

p + stat_compare_means(comparisons = my_comparisons,
label = "p.signif",
method = "t.test")

3. 两因素数据

「模拟数据:」

# 两个因素的数据
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 8
y3 = rnorm(10) + 7
y4 = rnorm(10) + 15
y5 = rnorm(10) + 18
y6 = rnorm(10) + 17

dd = data.frame(Group1 = rep(c("A","B","C"),each=10),
Group2 = rep(c("X","Y"),each=30),
y = c(y1,y2,y3,y4,y5,y6))
dd
str(dd)
dd$Group1 = as.factor(dd$Group1)
dd$Group2 = as.factor(dd$Group2)
str(dd)

「数据预览:」

> dd
Group1 Group2 y
1 A X 4.439524
2 A X 4.769823
3 A X 6.558708
4 A X 5.070508
5 A X 5.129288
6 A X 6.715065
7 A X 5.460916
8 A X 3.734939
9 A X 4.313147
10 A X 4.554338
11 B X 9.224082
12 B X 8.359814
13 B X 8.400771
14 B X 8.110683
15 B X 7.444159
16 B X 9.786913
17 B X 8.497850
18 B X 6.033383
19 B X 8.701356
20 B X 7.527209
21 C X 5.932176
22 C X 6.782025
23 C X 5.973996
24 C X 6.271109
25 C X 6.374961
26 C X 5.313307
27 C X 7.837787
28 C X 7.153373
29 C X 5.861863
30 C X 8.253815
31 A Y 15.426464
32 A Y 14.704929
33 A Y 15.895126
34 A Y 15.878133
35 A Y 15.821581
36 A Y 15.688640
37 A Y 15.553918
38 A Y 14.938088
39 A Y 14.694037
40 A Y 14.619529
41 B Y 17.305293
42 B Y 17.792083
43 B Y 16.734604
44 B Y 20.168956
45 B Y 19.207962
46 B Y 16.876891
47 B Y 17.597115
48 B Y 17.533345
49 B Y 18.779965
50 B Y 17.916631
51 C Y 17.253319
52 C Y 16.971453
53 C Y 16.957130
54 C Y 18.368602
55 C Y 16.774229
56 C Y 18.516471
57 C Y 15.451247
58 C Y 17.584614
59 C Y 17.123854
60 C Y 17.215942

3.1 绘制分组箱线图

p = ggboxplot(dd,x = "Group1",y="y",color = "Group2",
add = "jitter")
p

3.2 增加P值

p + stat_compare_means(aes(group = Group2),method = "t.test")

3.3 修改为显著性结果

p + stat_compare_means(aes(group = Group2),method = "t.test",label = "p.signif")

3.4 将分组数据分开绘制

p = ggboxplot(dd,x = "Group2",y="y",color = "Group1",
add = "jitter",facet.by = "Group1")
p

3.5 分组显示统计检验

p + stat_compare_means(method = "t.test")

3.6 分组显示显著性结果

p + stat_compare_means(method = "t.test",label = "p.signif",label.y = 17)

4. 单因素直方图绘制

直方图+标准误,之前用ggplot2需要很长的代码,这里有更好的方案。

4.1 直方图+标准误

p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group1")
p

4.2 直方图+标准误+显著性

p + stat_compare_means(method = "anova",,label.y = 15)+ 
stat_compare_means(comparisons = my_comparisons)

5. 单因素折线图绘制

5.1 折线图+标准误

p = ggline(dd,x = "Group1",y = "y",add = "mean_se")
p

5.2 折线图+标准误+显著性

p + stat_compare_means(method = "anova",,label.y = 15)+ 
stat_compare_means(comparisons = my_comparisons)

6. 二因素直方图绘制

6.1 直方图+标准误

p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p

6.2 直方图+标准误+显著性

p + stat_compare_means(aes(group=Group2), label = "p.signif")

7. 二因素折线图绘制

7.1 折线图+标准误

p = ggline(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p

7.2 折线图+标准误+显著性

p + stat_compare_means(aes(group=Group2), label = "p.signif")

8. 代码汇总

下面代码是所有代码的汇总,里面包括生成数据,做不同类型的图。只需要将数据整理为这种格式,就可以出图了,对于初学者而言,是最简单最快捷的方法。show you the code!


# > 欢迎关注我的公众号:`育种数据分析之放飞自我`。主要分享R语言,Python,育种数据分析,生物统计,数量遗传学,混合线性模型,GWAS和GS相关的知识。

# 构建两个水平 T-test
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
dd = data.frame(Group = rep(c("A","B"),each=10),y = c(y1,y2))
dd
str(dd)
dd$Group = as.factor(dd$Group)

library(ggplot2)
library(ggpubr)
ggboxplot(dd,x = "Group",y = "y")
ggboxplot(dd,x = "Group",y = "y",color = "Group")
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter")
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means()

ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test")

ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test",label = "p.signif")




# 构建三个水平 ANOVA
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
y3 = rnorm(10) + 15

dd = data.frame(Group = rep(c("A","B","C"),each=10),y = c(y1,y2,y3))
dd
str(dd)
dd$Group = as.factor(dd$Group)

p = ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter")
p
p + stat_compare_means(method = "anova")


# Perorm pairwise comparisons
# compare_means(y ~ Group, data = dd,method = "anova")

my_comparisons = list( c("A", "B"), c("A", "C"), c("B", "C") )
p + stat_compare_means(comparisons = my_comparisons,
# label = "p.signif",
method = "t.test")

p + stat_compare_means(comparisons = my_comparisons,
label = "p.signif",
method = "t.test")

# 两个因素的数据
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 8
y3 = rnorm(10) + 7
y4 = rnorm(10) + 15
y5 = rnorm(10) + 18
y6 = rnorm(10) + 17

dd = data.frame(Group1 = rep(c("A","B","C"),each=10),
Group2 = rep(c("X","Y"),each=30),
y = c(y1,y2,y3,y4,y5,y6))
dd
str(dd)
dd$Group1 = as.factor(dd$Group1)
dd$Group2 = as.factor(dd$Group2)
str(dd)

## 分组查看
p = ggboxplot(dd,x = "Group1",y="y",color = "Group2",
add = "jitter")
p
p + stat_compare_means(aes(group = Group2),method = "t.test")
p + stat_compare_means(aes(group = Group2),method = "t.test",label = "p.signif")

## 分组查看
p = ggboxplot(dd,x = "Group2",y="y",color = "Group1",
add = "jitter",facet.by = "Group1")
p
p + stat_compare_means(method = "t.test")
p + stat_compare_means(method = "t.test",label = "p.signif",label.y = 17)


# 单分组
# 三水平直方图
p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group1")
p
p + stat_compare_means(method = "anova",,label.y = 15)+
stat_compare_means(comparisons = my_comparisons)

# 有误差的折线图
p = ggline(dd,x = "Group1",y = "y",add = "mean_se")
p
p + stat_compare_means(method = "anova",,label.y = 15)+
stat_compare_means(comparisons = my_comparisons)


# 二分组
p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p
p + stat_compare_means(aes(group=Group2), label = "p.signif")


# 有误差的折线图
p = ggline(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p
p + stat_compare_means(aes(group=Group2), label = "p.signif")

欢迎关注我的公众号:育种数据分析之放飞自我。主要分享R语言,Python,育种数据分析,生物统计,数量遗传学,混合线性模型,GWAS和GS相关的知识。

分割线



想要更好的学习和交流,快来加入飞哥的知识星球,这是一个生物统计+数量遗传学+GWAS+GS的社区,在这里你可以向飞哥提问、帮你定学习计划、跟着飞哥一起做实战项目,冲冲冲。点击这里加入吧:飞哥的学习圈子


1,快来领取 | 飞哥的GWAS分析教程


2,飞哥汇总 | 入门数据分析资源推荐


3,数量遗传学,分享几本书的电子版


4,学习R语言这几本电子书就够了!


5,书籍及配套代码领取--统计遗传分析导论



育种数据分析之放飞自我
本公众号主要介绍动植物育种数据分析中的相关问题, 算法及程序代码.
 最新文章