A Hands-On Guide to ggplot2! Brought to You by Durian!

学术健康 2025-01-20 16:00 Shanghai

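The logic underlying ggplot2 resembles agile project management: iterate continuously. It can also be understood as layer management of the kind found in Adobe Illustrator (AI) or Photoshop (PS): in ggplot2 you add one layer after another, and because each layer is relatively independent, you can build up a complex and beautiful figure.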

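Beyond the three basic elements and data transformations, two more components matter. A scale maps data values into the visual space, for example using color, size, or shape to represent different values, thereby turning data into visual properties. A coordinate system determines how the x and y aesthetics combine to position elements in the plot. The default coordinate system is Cartesian; coord_flip() flips it by swapping the x and y axes.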
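As a minimal sketch of these two components, the example below applies a manual fill scale and then flips the coordinate system. The mpg dataset, the class and drv columns, the hex colors, and the object names p_scale and p_flipped are illustrative choices, not taken from the text above.

```r
library(ggplot2)

# Assumed example data: the mpg dataset that ships with ggplot2
p_scale <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  # Scale: map the three drv values to hand-picked colors
  scale_fill_manual(values = c("4" = "#999999", "f" = "#E69F00", "r" = "#56B4E9"))

# Coordinate system: the same plot with the x and y axes swapped
p_flipped <- p_scale + coord_flip()
p_flipped
```

The data and the layer stay the same in both plots; only the scale and the coordinate system decide how the values end up on screen.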

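A layer produces the graphics that a viewer actually perceives in the image. A layer has four parts: the data and aesthetic mappings, a statistical transformation, a geometric object, and a position adjustment. A facet describes how to split the data into subsets, plot each subset, and display the panels together. Faceting is also called conditional plotting or trellis plotting.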
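The four parts of a layer can be spelled out explicitly with ggplot2's layer() function. This is only a sketch: the mpg dataset, its displ, hwy, and drv columns, and the name p_layer are assumptions chosen for illustration.

```r
library(ggplot2)

# Assumed example data: mpg (displ = engine displacement, hwy = highway mpg)
p_layer <- ggplot() +
  layer(
    data = mpg,                         # 1. data ...
    mapping = aes(x = displ, y = hwy),  #    ... and aesthetic mapping
    stat = "identity",                  # 2. statistical transformation (none)
    geom = "point",                     # 3. geometric object
    position = "identity"               # 4. position adjustment (none)
  ) +
  # Facet: one panel per value of drv
  facet_wrap(~ drv)
p_layer
```

In everyday code the same plot is usually written as ggplot(mpg, aes(displ, hwy)) + geom_point() + facet_wrap(~ drv); geom_point() fills in the stat and position defaults.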

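01. ggplot2 in practice, one variable: continuous

library(ggplot2)  # load ggplot2 (install it first if needed)

# Generate a data frame
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
head(wdata)

# Compute the mean weight for each sex
library(plyr)  # install the package yourself if needed
mu <- ddply(wdata, "sex", summarise, grp.mean = mean(weight))

# First create a base plot a, then add layers step by step
a <- ggplot(wdata, aes(x = weight))  # data = wdata; aes() maps weight to the x axis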

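Layers you can add:

For one continuous variable:

Area plot: geom_area()
Density plot: geom_density()
Dot plot: geom_dotplot()
Frequency polygon: geom_freqpoly()
Histogram: geom_histogram()
Empirical cumulative distribution (ECDF) plot: stat_ecdf()
QQ plot: stat_qq()

For one discrete variable:

Bar chart: geom_bar()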

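geom_area(): area plot

a + geom_area(stat = "bin")
# Fill by sex and make the areas semi-transparent
a + geom_area(aes(fill = sex), stat = "bin", alpha = 0.6) + theme_classic()

Note: by default the y axis shows the count of weight values in each bin. To show density on the y axis instead, use the code below. after_stat(density) is the current spelling of the older ..density.. notation and needs ggplot2 3.3.0 or later.

a + geom_area(aes(y = after_stat(density)), stat = "bin")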

geom_density(): density plot

The following functions are used:

geom_density(): draws a density plot
geom_vline(): adds a vertical line
scale_color_manual(): sets colors manually

# Density plot
a + geom_density()
# Color the lines by sex
a + geom_density(aes(color = sex))
# Change the fill color and transparency
a + geom_density(aes(fill = sex), alpha = 0.4)
# Add the group mean lines and set colors manually
a + geom_density(aes(color = sex)) +
  geom_vline(data = mu, aes(xintercept = grp.mean, color = sex), linetype = "dashed") +
  scale_color_manual(values = c("#999999", "#E69F00"))

geom_dotplot(): dot plot

In a dot plot, each dot represents one observation.

# Dot plot
a + geom_dotplot()
# Fill by sex
a + geom_dotplot(aes(fill = sex))
# Set the fill colors manually
a + geom_dotplot(aes(fill = sex)) +
  scale_fill_manual(values = c("#999999", "#E69F00"))

geom_freqpoly(): frequency polygon

# Frequency polygon
a + geom_freqpoly()
# Show density on the y axis and change the theme
a + geom_freqpoly(aes(y = after_stat(density))) + theme_minimal()
# Change color and line type by sex
a + geom_freqpoly(aes(color = sex, linetype = sex)) + theme_minimal()

geom_histogram(): histogram

# Histogram
a + geom_histogram()
# Color the outlines by sex and place the bars side by side
a + geom_histogram(aes(color = sex), fill = "white", position = "dodge")
# Show density on the y axis
a + geom_histogram(aes(y = after_stat(density)))
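The layer list for one continuous variable also names stat_ecdf() and stat_qq(), which are not demonstrated above. The following is a minimal sketch using the same simulated wdata; the object names p_ecdf and p_qq are hypothetical, and stat_qq_line() is an extra reference line not mentioned in the text.

```r
library(ggplot2)

# Recreate the simulated weights used in this section
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# ECDF: proportion of observations at or below each weight, per sex
p_ecdf <- ggplot(wdata, aes(x = weight, color = sex)) +
  stat_ecdf(geom = "step")

# QQ plot: stat_qq() takes the `sample` aesthetic rather than x
p_qq <- ggplot(wdata, aes(sample = weight, color = sex)) +
  stat_qq() +
  stat_qq_line()  # reference line through the first and third quartiles

p_ecdf
p_qq
```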

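02. One variable: discrete

geom_bar() visualizes a single discrete variable. The examples below use the public mpg dataset, which records fuel economy figures (city and highway miles per gallon) together with engine displacement, number of cylinders, fuel type, and other attributes of popular car models. There is no need to worry about the dataset itself; it is built in. Typical use: checking how observations are distributed across categories.

data(mpg)
# Bar chart: count the cars for each fuel type
b <- ggplot(mpg, aes(fl))  # set the dataset and map fl to the x axis
b + geom_bar()  # draw the bar chart
# Change the fill color
b + geom_bar(fill = "steelblue", color = "steelblue") + theme_minimal()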

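stat_count() does the same job as geom_bar(). This is the geom_*/stat_* pairing mentioned earlier: each geom has a default stat, and each stat has a default geom. Swapping the function produces the same plot.

b + stat_count()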
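One way to check this pairing is to inspect the layer objects themselves. This is a sketch that relies on the stat and geom fields that ggplot2 layer objects carry; the variable names are hypothetical.

```r
library(ggplot2)

bar_layer <- geom_bar()
count_layer <- stat_count()

# Each layer object records both its geom and its stat
class(bar_layer$stat)[1]    # default stat of geom_bar()
class(count_layer$geom)[1]  # default geom of stat_count()

# The computed bar heights are the same either way
b <- ggplot(mpg, aes(fl))
counts_geom <- layer_data(b + geom_bar())$count
counts_stat <- layer_data(b + stat_count())$count
identical(counts_geom, counts_stat)
```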

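03. Two variables: X and Y both continuous

The examples below use the mtcars dataset. After a few rounds of practice, the handful of built-in datasets will become familiar.

data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)  # convert to a factor, i.e. a categorical variable
head(mtcars[, c("wt", "mpg", "cyl")])

As before, layers are stacked: first create a base plot b, then add layers one at a time.

b <- ggplot(mtcars, aes(x = wt, y = mpg))  # assumed mapping: wt on x, mpg on y (the columns inspected above)

Layers you can add:

geom_point(): scatter plot
geom_smooth(): smoothed line
geom_quantile(): quantile lines
geom_rug(): marginal rug
geom_jitter(): jittered points to avoid overplotting
geom_text(): text annotations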

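geom_point(): scatter plot

For color schemes, ggplot2 provides scale_colour_manual() and scale_fill_manual() for custom palettes. The former is typically used for scatter plots, the latter for bar charts. The values argument specifies the colors; transparency is set with alpha in the geom. Essential and frequently used.

# Scatter plot
b + geom_point()
# Change point shape and color: both color and shape are mapped to cyl
b + geom_point(aes(color = cyl, shape = cyl))
# Set point colors manually; theme_minimal() is a built-in theme
b + geom_point(aes(color = cyl, shape = cyl)) +
  scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9")) +
  theme_minimal()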

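geom_smooth(): smoothed line

To add a regression line to a scatter plot, use geom_smooth() with method = lm, where lm stands for linear model. Essential: commonly used in correlation analysis.

# Add a regression line
b + geom_smooth(method = lm)
# Scatter plot + regression line, without the confidence band
b + geom_point() + geom_smooth(method = lm, se = FALSE)
# Use the default loess method
b + geom_point() + geom_smooth()
# Color points and lines by cyl (shape applies to the points only)
b + geom_point(aes(color = cyl, shape = cyl)) +
  geom_smooth(aes(color = cyl), method = lm, se = FALSE, fullrange = TRUE)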

stat_smooth() does the same job as geom_smooth(); this is again the geom_*/stat_* pairing mentioned earlier.

b + stat_smooth(method = "lm")
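To see what the regression line represents, the sketch below fits the same simple linear model directly with lm(), computes the Pearson correlation, and draws the fitted line by hand with geom_abline(). It assumes wt on the x axis and mpg on the y axis; the names fit, coefs, r, and p_lm are hypothetical.

```r
library(ggplot2)

data(mtcars)

# Fit the linear model that geom_smooth(method = lm) fits for y ~ x
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(fit)               # intercept and slope
r <- cor(mtcars$wt, mtcars$mpg)  # Pearson correlation

# Draw the fitted line by hand from the coefficients
p_lm <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_abline(intercept = coefs[["(Intercept)"]], slope = coefs[["wt"]],
              color = "blue") +
  labs(subtitle = sprintf("Pearson r = %.2f", r))
p_lm
```

The hand-drawn line should coincide with the one from geom_smooth(method = lm, se = FALSE), which makes the link between the plot and the underlying model explicit.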

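geom_quantile(): quantile lines. Good to know, but rarely used in practice.

# Quantile regression lines (needs the quantreg package)
ggplot(mpg, aes(cty, hwy)) + geom_point() + geom_quantile() + theme_minimal()

An equivalent stat function: stat_quantile()

ggplot(mpg, aes(cty, hwy)) +
  geom_point() +
  stat_quantile(quantiles = c(0.25, 0.5, 0.75))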

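geom_rug(): marginal rug. The example uses the faithful dataset. A marginal rug lets you quickly see how densely the data fall along each axis. You can also reduce overlap among the rug lines by jittering their positions and using a smaller size to thin the lines. Worth understanding: it makes ssGSEA plots much easier to read later.

ggplot(data = faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  geom_rug()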
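The jitter-and-thin adjustment described for the rug is not shown in code above. A minimal sketch follows; the alpha value and the name p_rug are illustrative, and linewidth assumes ggplot2 3.4.0 or later (older versions use size for line width).

```r
library(ggplot2)

p_rug <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  # Jitter the rug positions and draw thin, semi-transparent lines
  # (linewidth assumes ggplot2 >= 3.4.0; older versions use size instead)
  geom_rug(position = "jitter", linewidth = 0.2, alpha = 0.5)
p_rug
```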

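geom_jitter(): avoid overplotting. geom_jitter() is shorthand for geom_point(position = "jitter"). The example below uses the mpg dataset.

p <- ggplot(mpg, aes(displ, hwy))
# Scatter plot
p + geom_point()
# Jitter the points to avoid overplotting
p + geom_jitter(position = position_jitter(width = 0.5, height = 0.5))

Use the width and height arguments of position_jitter() to control the amount of jitter:

width: amount of jitter along the x axis
height: amount of jitter along the y axis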
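As a small sketch of how the two arguments act independently, the example below jitters only along the x axis and fixes the random seed so the plot is reproducible. The seed argument of position_jitter() and the specific values are additions for illustration, not part of the text above.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy))

# Jitter only horizontally: points keep their exact hwy values
p_x_only <- p +
  geom_jitter(position = position_jitter(width = 0.3, height = 0, seed = 123))

# Same seed, same jitter: building the plot twice gives identical positions
x1 <- layer_data(p_x_only)$x
x2 <- layer_data(p_x_only)$x
identical(x1, x2)
p_x_only
```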

geom_text(): text annotations

The label aesthetic specifies the annotation text.

b + geom_text(aes(label = rownames(mtcars)))

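03. Two variables: continuous bivariate distribution

As usual, the dataset comes first: this section uses the built-in diamonds dataset.

data(diamonds)
head(diamonds[, c("carat", "price")])  # as usual, look at the first rows to understand the data structure

# First create a base plot c, then add layers one at a time
c <- ggplot(diamonds, aes(carat, price))

Layers you can add (for reference only; they are rarely needed in practice):

geom_bin2d(): 2D binned heatmap
geom_hex(): hexagonal binning plot
geom_density_2d(): 2D density contour plot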

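geom_bin2d(): 2D binned heatmap

geom_bin2d() divides the plane into rectangular bins, counts the points in each bin, and shows the point density through the intensity of the fill color.

# 2D binned heatmap
c + geom_bin2d()
# Change the number of bins
c + geom_bin2d(bins = 15)

Related stat functions: stat_bin_2d() is the equivalent stat, while stat_summary_2d() summarizes a third variable z within each bin.

c + stat_bin_2d()
c + stat_summary_2d(aes(z = depth))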

geom_hex(): hexagonal binning plot

geom_hex() depends on another R package, hexbin, so install it first if you have not already:

install.packages("hexbin")

geom_hex() is used as follows:

library(hexbin)

# Hexagonal binning plot
c + geom_hex()
# Change the number of bins
c + geom_hex(bins = 10)

Related stat functions: stat_bin_hex(), stat_summary_hex()

c + stat_bin_hex()
c + stat_summary_hex(aes(z = depth))

geom_density_2d(): 2D density contour plot

geom_density_2d() or stat_density_2d() adds 2D density contours to a scatter plot. First create the base plot:

sp <- ggplot(faithful, aes(x = eruptions, y = waiting))
# 2D density contours only
sp + geom_density_2d()
# Scatter plot plus contours
sp + geom_point() + geom_density_2d()
# Draw the contours as filled polygons instead of lines
sp + geom_point() + stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")

An equivalent stat function: stat_density_2d()

sp + stat_density_2d()

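03. Two variables: continuous function (important, commonly used)

This section is mainly about connecting two variables with a line. As usual, the dataset is the built-in economics.

data(economics)
head(economics)

First create a base plot d, then add layers one at a time.

d <- ggplot(economics, aes(x = date, y = unemploy))

Layers you can add:

geom_area(): area plot
geom_line(): line plot
geom_step(): step plot

# Area plot
d + geom_area()
# Line plot
d + geom_line()
# Step plot, drawn from a random sample of 20 rows
set.seed(1234)
ss <- economics[sample(1:nrow(economics), 20), ]
ggplot(ss, aes(x = date, y = unemploy)) + geom_step()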

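04. Two variables: x discrete, y continuous

The examples below use the ToothGrowth dataset, in which len (tooth length) is a continuous variable and dose is a discrete variable.

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)

First create a base plot e, then add layers one at a time.

e <- ggplot(ToothGrowth, aes(x = dose, y = len))

Layers you can add:

geom_boxplot(): box plot
geom_violin(): violin plot
geom_dotplot(): dot plot
geom_jitter(): strip chart
geom_line(): line plot
geom_bar(): bar chart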

geom_boxplot(): box plot

# Box plot
e + geom_boxplot()
# Notched box plot
e + geom_boxplot(notch = TRUE)
# Color the outlines by dose
e + geom_boxplot(aes(color = dose))
# Fill by dose
e + geom_boxplot(aes(fill = dose))

# Grouped box plot
ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot()

An equivalent stat function: stat_boxplot()

# coef sets the whisker length as a multiple of the IQR
e + stat_boxplot(coef = 1.5)

geom_violin(): violin plot

# Violin plot
e + geom_violin(trim = FALSE)
# Add the mean with a range of plus or minus one standard deviation
# (mean_sdl needs the Hmisc package)
e + geom_violin(trim = FALSE) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "pointrange", color = "red")
# Combine with a box plot
e + geom_violin(trim = FALSE) + geom_boxplot(width = 0.2)
# Map dose to color
e + geom_violin(aes(color = dose), trim = FALSE)

An equivalent stat function: stat_ydensity()

e + stat_ydensity(trim = FALSE)

geom_dotplot(): dot plot

# Dot plot
e + geom_dotplot(binaxis = "y", stackdir = "center")
# Add the mean with a range of plus or minus one standard deviation
e + geom_dotplot(binaxis = "y", stackdir = "center") +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "pointrange", color = "red")

# Combine with a box plot
e + geom_boxplot() + geom_dotplot(binaxis = "y", stackdir = "center")
# Combine with a violin plot
e + geom_violin(trim = FALSE) + geom_dotplot(binaxis = "y", stackdir = "center")
# Map dose to both outline color and fill color
e + geom_dotplot(aes(color = dose, fill = dose), binaxis = "y", stackdir = "center")

geom_jitter(): strip chart

A strip chart is a one-dimensional scatter plot; when the sample size is small, it is a good alternative to a box plot.

# Strip chart
e + geom_jitter(position = position_jitter(0.2))
# Add the mean with a range of plus or minus one standard deviation
e + geom_jitter(position = position_jitter(0.2)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "pointrange", color = "red")

# Combine with a dot plot
e + geom_jitter(position = position_jitter(0.2)) +
  geom_dotplot(binaxis = "y", stackdir = "center")
# Combine with a violin plot
e + geom_violin(trim = FALSE) + geom_jitter(position = position_jitter(0.2))
# Map dose to color and shape
e + geom_jitter(aes(color = dose, shape = dose), position = position_jitter(0.2))

geom_line(): line plot

# Build a small dataset
df <- data.frame(
  supp = rep(c("VC", "OJ"), each = 3),
  dose = rep(c("D0.5", "D1", "D2"), 2),
  len = c(6.8, 15, 33, 4.2, 10, 29.5)
)
head(df)

# Map supp to line type
ggplot(df, aes(x = dose, y = len, group = supp)) +
  geom_line(aes(linetype = supp)) +
  geom_point()
# Change line type, point shape, and color together
ggplot(df, aes(x = dose, y = len, group = supp)) +
  geom_line(aes(linetype = supp, color = supp)) +
  geom_point(aes(shape = supp, color = supp))

geom_bar(): bar chart

# Build two small datasets
df <- data.frame(dose = c("D0.5", "D1", "D2"), len = c(4.2, 10, 29.5))
df2 <- data.frame(
  supp = rep(c("VC", "OJ"), each = 3),
  dose = rep(c("D0.5", "D1", "D2"), 2),
  len = c(6.8, 15, 33, 4.2, 10, 29.5)
)
# Create a base plot
f <- ggplot(df, aes(x = dose, y = len))
# Bar chart using the values as given
f + geom_bar(stat = "identity")
# Change the fill color and add value labels
f + geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = len), vjust = -0.3, size = 3.5) +
  theme_minimal()
# Map dose to the bar outline color
f + geom_bar(aes(color = dose), stat = "identity", fill = "white")
# Map dose to the fill color
f + geom_bar(aes(fill = dose), stat = "identity")

g <- ggplot(data = df2, aes(x = dose, y = len, fill = supp))
# Stacked bar chart; position defaults to "stack"
g + geom_bar(stat = "identity")
# Switch position to dodge for side-by-side bars
g + geom_bar(stat = "identity", position = position_dodge())

An equivalent stat function: stat_identity()

# stat_identity() defaults to position = "identity", so stacking is requested explicitly
g + stat_identity(geom = "bar", position = "stack")
g + stat_identity(geom = "bar", position = "dodge")

04. Two variables: X and Y both discrete

This example uses two discrete variables from the diamonds dataset, color and cut.

ggplot(diamonds, aes(cut, color)) +
  geom_jitter(aes(color = cut), size = 0.5)  # jittered points

Two variables: plotting error bars

df <- ToothGrowth
df$dose <- as.factor(df$dose)

The function below computes the mean and standard deviation of each group.

data_summary <- function(data, varname, grps) {
  require(plyr)
  # Return the mean and standard deviation of column `col`
  summary_func <- function(x, col) {
    c(mean = mean(x[[col]], na.rm = TRUE),
      sd = sd(x[[col]], na.rm = TRUE))
  }
  data_sum <- ddply(data, grps, .fun = summary_func, varname)
  # Rename the "mean" column to the variable name
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}
df2 <- data_summary(df, varname = "len", grps = "dose")
# Convert dose to a factor
df2$dose <- as.factor(df2$dose)
# Create a base plot f
f <- ggplot(df2, aes(x = dose, y = len, ymin = len - sd, ymax = len + sd))
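The summary function above returns the standard deviation. If the standard error is wanted instead, it can be derived as sd / sqrt(n). The following is a base R sketch under that assumption; the names grp_mean, grp_sd, grp_n, df_se, and p_se are hypothetical.

```r
library(ggplot2)

df <- ToothGrowth
df$dose <- as.factor(df$dose)

# Per-dose mean, standard deviation, and group size
grp_mean <- tapply(df$len, df$dose, mean)
grp_sd   <- tapply(df$len, df$dose, sd)
grp_n    <- tapply(df$len, df$dose, length)

df_se <- data.frame(
  dose = factor(names(grp_mean)),
  len  = as.numeric(grp_mean),
  sd   = as.numeric(grp_sd),
  se   = as.numeric(grp_sd / sqrt(grp_n))  # standard error of the mean
)

# Error bars spanning mean +/- SE instead of mean +/- SD
p_se <- ggplot(df_se, aes(x = dose, y = len, ymin = len - se, ymax = len + se)) +
  geom_point() +
  geom_errorbar(width = 0.2)
p_se
```

Because each dose group has more than one observation, the SE bars are narrower than the SD bars, so it is worth stating in the figure caption which of the two is shown.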

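Layers you can add:
    geom_crossbar(): hollow box whose top, middle, and bottom lines mark ymax, the mean, and ymin
    geom_errorbar(): error bars
    geom_errorbarh(): horizontal error bars
    geom_linerange(): vertical range lines
    geom_pointrange(): range lines with a point in the middle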

geom_crossbar(): hollow box whose top, middle, and bottom lines mark ymax, the mean, and ymin

# Hollow box
f + geom_crossbar()
# Map dose to color
f + geom_crossbar(aes(color = dose))
# Set the colors manually
f + geom_crossbar(aes(color = dose)) +
  scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9")) +
  theme_minimal()
# Set the fill colors
f + geom_crossbar(aes(fill = dose)) +
  scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9")) +
  theme_minimal()

# Build dataset df3, grouped by both supp and dose
df3 <- data_summary(df, varname = "len", grps = c("supp", "dose"))
f <- ggplot(df3, aes(x = dose, y = len, ymin = len - sd, ymax = len + sd))
# Map supp to color
f + geom_crossbar(aes(color = supp))
# Avoid overlap
f + geom_crossbar(aes(color = supp), position = position_dodge(1))

An alternative to geom_crossbar() is stat_summary(), which computes the mean and standard deviation automatically from the raw data. The mean_sdl summary used here needs the Hmisc package.

f <- ggplot(df, aes(x = dose, y = len, color = supp))
f + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
                 geom = "crossbar", width = 0.6,
                 position = position_dodge(0.8))
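stat_summary() accepts other summary functions as well. As a hedged sketch, the example below uses mean_se(), which ships with ggplot2 itself and returns the mean plus or minus a multiple of the standard error. The choice of error bars plus points, the name p_mean_se, and the fun argument (ggplot2 3.3.0 or later) are assumptions for illustration.

```r
library(ggplot2)

df <- ToothGrowth
df$dose <- as.factor(df$dose)

# mean_se() returns y, ymin, ymax = mean -/+ mult * standard error
p_mean_se <- ggplot(df, aes(x = dose, y = len, color = supp)) +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1),
               geom = "errorbar", width = 0.3,
               position = position_dodge(0.8)) +
  # Mark the group means themselves
  stat_summary(fun = mean, geom = "point",
               position = position_dodge(0.8))
p_mean_se
```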

geom_errorbar(): error bars

# Create a base plot f
f <- ggplot(df2, aes(x = dose, y = len, ymin = len - sd, ymax = len + sd))
# Map dose to color
f + geom_errorbar(aes(color = dose), width = 0.2)
# Combine with a line plot
f + geom_line(aes(group = 1)) + geom_errorbar(width = 0.2)
# Combine with a bar chart and map dose to color
f + geom_bar(aes(color = dose), stat = "identity", fill = "white") +
  geom_errorbar(aes(color = dose), width = 0.2)

geom_errorbarh(): horizontal error bars

# Build dataset df2
df2 <- data_summary(ToothGrowth, varname = "len", grps = "dose")
df2$dose <- as.factor(df2$dose)
head(df2)

# Create a base plot f; xmin and xmax set the ends of the horizontal error bars
f <- ggplot(df2, aes(x = len, y = dose, xmin = len - sd, xmax = len + sd))
f + geom_errorbarh()
# Group by mapping dose to color
f + geom_errorbarh(aes(color = dose))

geom_linerange(): vertical range lines

f <- ggplot(df2, aes(x = dose, y = len, ymin = len - sd, ymax = len + sd))
# Line range
f + geom_linerange()

geom_pointrange(): range lines with a point in the middle

# Point range
f + geom_pointrange()

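Dot plot + error bars

# Create a base plot g
g <- ggplot(df, aes(x = dose, y = len)) +
  geom_dotplot(binaxis = "y", stackdir = "center")
# Add a hollow box (mean_sdl needs the Hmisc package)
g + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
                 geom = "crossbar", width = 0.5)
# Add error bars plus a point at the mean
# (fun replaces the deprecated fun.y argument in ggplot2 >= 3.3.0)
g + stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
                 geom = "errorbar", color = "red", width = 0.2) +
  stat_summary(fun = mean, geom = "point", color = "red")
# Add range lines with a point in the middle
g + stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
                 geom = "pointrange", color = "red")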

References

R for Data Science: https://r4ds.had.co.nz/data-visualisation.html
ggplot2: Elegant Graphics for Data Analysis

芒果师兄
