R 语言|ggplot2 绘图并添加表格|回归分析|可视化

文摘   2024-06-28 00:02   江苏  

引言

在数据分析和可视化的时,将统计模型的结果直接嵌入到图中可能是有用的。

加载包

library(tibble)
library(dplyr)
library(janitor)
library(DHARMa)
library(ggplot2)
library(ggpmisc)
library(gridExtra)

数据

使用 iris 数据集作为示例。首先,将其导出为 .csv 文件,然后再次读入,以演示实际运用中情形(往往是读取 .csv 文件)。接着,随机抽取 20 行数据展示,以便对数据有一个初步的了解。

# 以 iris 数据为例
write.csv(
janitor::clean_names(iris),
"dat.csv",
row.names = FALSE
)

# 读入 .csv 数据
dat <- read.csv("dat.csv", header = TRUE)

# 随机显示 20 行
set.seed(1)
dplyr::slice_sample(dat, n = 20)
   sepal_length sepal_width petal_length petal_width    species
1 5.8 2.7 4.1 1.0 versicolor
2 6.4 2.8 5.6 2.1 virginica
3 4.4 3.2 1.3 0.2 setosa
4 4.3 3.0 1.1 0.1 setosa
5 7.0 3.2 4.7 1.4 versicolor
6 5.4 3.0 4.5 1.5 versicolor
7 5.4 3.4 1.7 0.2 setosa
8 7.6 3.0 6.6 2.1 virginica
9 6.1 2.8 4.7 1.2 versicolor
10 4.6 3.4 1.4 0.3 setosa
11 6.3 2.5 4.9 1.5 versicolor
12 6.0 2.9 4.5 1.5 versicolor
13 5.5 3.5 1.3 0.2 setosa
14 6.5 3.0 5.8 2.2 virginica
15 7.2 3.6 6.1 2.5 virginica
16 5.5 4.2 1.4 0.2 setosa
17 5.8 2.7 5.1 1.9 virginica
18 7.2 3.2 6.0 1.8 virginica
19 5.6 3.0 4.1 1.3 versicolor
20 5.2 4.1 1.5 0.1 setosa

数据探索,建立回归模型

使用 R 图形探索数据间的关系,并建立回归模型分析 sepal_length 与 petal_length 之间的关系。

# 设置一种字体
FONT <- "Prompt"

# 数据探索
par(family = FONT, cex = 1.5)
plot(dat$sepal_length ~ dat$petal_length, pch = 16)


# 回归模型
lm_fit <- lm(sepal_length ~ petal_length * species, dat)

# 模型显著性检验表格
anova(lm_fit)
Analysis of Variance Table

Response: sepal_length
Df Sum Sq Mean Sq F value Pr(>F)
petal_length 1 77.643 77.643 685.8998 < 2.2e-16 ***
species 2 7.843 3.922 34.6441 5.206e-13 ***
petal_length:species 2 0.381 0.190 1.6828 0.1895
Residuals 144 16.301 0.113
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

检查残差

为了确保模型的有效性,使用 Q-Q plot 或 DHARMa 包来检查模型的残差。

# 使用 Q-Q plot 检查残差
par(family = FONT, cex = 1.5); car::qqPlot(lm_fit, pch = 16, cex = 0.8)


[1]  15 142
# 使用 DHARMa 包检查残差
simu_resd <- DHARMa::simulateResiduals(lm_fit, plot = FALSE)
par(family = FONT, cex = 1.5)
DHARMa::plotQQunif(simu_resd, pch = 16, cex = 0.8)


DHARMa::plotResiduals(simu_resd, pch = 16, cex = 0.8)


模型表格

把 anova 的结果转换为一个整洁的表格,并将这个表格添加到 ggplot2 图表中。

# 把 anova 结果变成一个表格
lm_aov_tab <- lm_fit |>
anova() |>
as.data.frame() |>
tibble::rownames_to_column(var = "term") |>
dplyr::filter(term != "Residuals") |>
janitor::clean_names() |>
dplyr::mutate(
p_value = ifelse(
pr_f < 0.001,
"<0.001",
round(pr_f, 3)
)
) |>
dplyr::select(-mean_sq, -pr_f) |>
dplyr::mutate_if(
is.numeric,
# 若是数值型,则会在单元格里居中
# ~ round(.x, 2)
# 若是字符型,则会在单元格里左对齐
~ as.character(round(.x, 2))
)
print(lm_aov_tab)
                  term df sum_sq f_value p_value
1 petal_length 1 77.64 685.9 <0.001
2 species 2 7.84 34.64 <0.001
3 petal_length:species 2 0.38 1.68 0.189
# 为了添加到 ggplot2 图中,
# 要设置表格坐标 x、y,
# 表格作为一个 list 存在一个 tibble 中
lm_aov_tib <- tibble(x = 0, y = 9, tab = list(lm_aov_tab))
print(lm_aov_tib)
# A tibble: 1 × 3
x y tab
<dbl> <dbl> <list>
1 0 9 <df [3 × 5]>

定义表格显示的风格

# 定义表格显示的风格(主题)
tab_style <- gridExtra::ttheme_minimal(
base_family = FONT, # Set the base font family to FONT
base_size = 14, # Set the base font size
core = list(
fg_params = list(
hjust = 0, # Horizontally align text to the left
x = 0.01 # Add a small left margin to the text
),
padding = unit(c(5, 5), "pt") # padding for cells to ... (unit) on all sides
),
colhead = list(
fg_params = list(
hjust = 0, # Horizontally align column header text to the left
x = 0.01, # Add a small left margin to the column header text
fontfamily = FONT
),
padding = unit(c(5, 5), "pt") # padding for column headers to (unit) on all sides
)
)

最终绘图

使用 ggplot2 来创建最终的图表,并将之前准备好的表格添加到图表中。

# Plot
fig <- dat |>
ggplot(
aes(
x = petal_length,
y = sepal_length
)
) +
geom_point(
aes(
fill = species,
# 第二个 fill 设为 after_scale 只透明内部填充色,
# 而黑色外圈不透明
fill = after_scale(alpha(fill, 0.6))
),
shape = 21,
size = 3,
stroke = 1.5
) +
stat_smooth(
method = lm,
formula = y ~ x,
color = "grey0",
se = FALSE,
fullrange = TRUE,
show.legend = FALSE
) +
geom_table(
data = lm_aov_tib,
aes(x, y, label = tab),
table.theme = tab_style,
hjust = 0,
vjust = 1
) +
scale_fill_brewer(
palette = "PiYG",
guide = guide_legend(
override.aes = list(size = 4)
)
) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
coord_cartesian(
xlim = c(0, 8),
ylim = c(4, 9),
clip = "off"
) +
theme_bw(
base_family = FONT,
base_size = 22,
base_rect_size = 0.5,
base_line_size = 0.5
) +
theme(
panel.grid = element_blank(),
axis.text = element_text(color = "grey0"),
legend.background = element_blank(),
legend.position = c(1, 0),
legend.justification = c(1, 0),
legend.key.height = unit(6, "pt")
)
Warning: Duplicated aesthetics after name standardisation: fill
fig


# 保存为高分辨率 .png 图片
ggsave(
"ggplot2-add-table.png",
fig,
width = 7,
height = 6,
dpi = 900
)

ecologyR
🚀打赏可获取本公众号代码合集(见置顶文章)📌统计案例📌统计制图📌显著性标记📌结构方程模型可视化工具📌SEM教程与案例📌论文代码复现📌地图可视化
 最新文章