Customizing and Combining ggplot2 Plots

Digest   Technology   2024-10-18 09:05   Jiangsu

Preface


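We previously covered the basics of ggplot2 (A First Look at Visualization (ggplot2)). This installment shows how to modify axes, colors, themes, fonts, legends, and annotations. For more R tips, see:
R Language Basics Handbook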


18.1 Modifying Scales


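Besides the ggplot2 and dplyr packages, this tutorial requires ISLR and gapminder for data, and ggrepel, showtext, patchwork, and plotly for enhancing plots. The examples also call the scales package, which is installed together with ggplot2.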


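18.1.1 Customizing Axes

In ggplot2, the scale_x_* and scale_y_* functions control the x and y axes, where * specifies the scale type. Commonly used types include continuous, discrete, binned, log10, date, and datetime.

  • 18.1.1.1 Customizing Axes for Continuous Variables

Commonly used options of the scale_*_continuous functions include name, breaks, minor_breaks, n.breaks, labels, limits, and position.

  • 18.1.1.1.1 Example

Using the mtcars dataset, draw a scatter plot of car weight (wt) against fuel efficiency (mpg):

1. Basic version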

library(ggplot2)
ggplot(data=mtcars,aes(x=wt,y=mpg))+
geom_point()+
labs(title = "Fuel efficiency by car weight")

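2. Modified version

Make the following changes:

1. For weight (wt): add the axis label "Weight(1000lbs)", set the scale range to 1.5 to 5.5, request 10 major tick marks, and hide the minor tick marks;

2. For miles per gallon (mpg): add the axis label "Miles per gallon", set the scale range to 10 to 35, place major tick marks at 10, 15, 20, 25, 30, and 35, and draw minor tick marks at intervals of 1 mile per gallon.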

library(ggplot2)
ggplot(data=mtcars,aes(x=wt,y=mpg))+
geom_point()+
labs(title = "Fuel efficiency by car weight")+
scale_x_continuous(name = "Weight(1000lbs)",
n.breaks=10,
limits = c(1.5,5.5),
minor_breaks = NULL)+
scale_y_continuous(name = "Miles per gallon",
limits= c(10,35),
breaks = seq(10,35,5),   # major ticks at 10, 15, ..., 35
minor_breaks=seq(10,35,1))
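Note that n.breaks is a request rather than a guarantee: ggplot2 hands it to a break-finding algorithm that may return a slightly different number of "nice" positions. The following is a minimal sketch, assuming the scales package is available (it is installed as a ggplot2 dependency), that compares the suggested breaks with an explicit breaks vector. The object names are illustrative only.

```r
library(ggplot2)
library(scales)

# Breaks that the extended-breaks algorithm proposes for the x range
# when asked for about 10 of them (the count may differ slightly)
suggested <- breaks_extended(n = 10)(c(1.5, 5.5))

# Explicit positions: passing `breaks` fixes the ticks exactly
explicit <- seq(1.5, 5.5, by = 0.5)

p_explicit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_x_continuous(name = "Weight (1000 lbs)",
                     breaks = explicit,
                     limits = c(1.5, 5.5),
                     minor_breaks = NULL)
```

When the exact tick positions matter, as they do for the y axis in the example above, passing breaks is the safer choice.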

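  • 18.1.1.2 Customizing Axes for Categorical Variables

Commonly used scale_*_discrete options in ggplot2 are as follows:

Parameter | Description
name | Scale name; equivalent to labs(x = , y = )
breaks | Character vector of the category values at which ticks are drawn
limits | Character vector defining the values on the scale and their order
labels | Character vector of labels (must have the same length as breaks); labels = abbreviate shortens long labels
position | Axis position: left or right for y axes, top or bottom for x axes

  • 18.1.1.2.1 Example

This example uses the Wage data frame from the ISLR package, which contains wage and demographic information collected in 2011 on 3,000 male workers in the Mid-Atlantic region of the United States. We plot the relationship between marital status and education level in this sample.

1. Basic plot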

library(ISLR)
library(ggplot2)
data(Wage,package = "ISLR")
ggplot(data=Wage,aes(maritl,fill=education))+
geom_bar(position = "fill")+
labs(title = "Participant Education by Marital Status")

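2. Modified plot

For the x axis: change the axis name to "Maritl" and remove the numeric prefixes from the labels;

For the y axis: change the axis name to "Percent" and display the values in percent format.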

library(ISLR)
library(ggplot2)
library(scales)
head(Wage)
##        year age           maritl     race       education             region
## 231655 2006 18 1. Never Married 1. White 1. < HS Grad 2. Middle Atlantic
## 86582 2004 24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic
## 161300 2003 45 2. Married 1. White 3. Some College 2. Middle Atlantic
## 155159 2003 43 2. Married 3. Asian 4. College Grad 2. Middle Atlantic
## 11443 2005 50 4. Divorced 1. White 2. HS Grad 2. Middle Atlantic
## 376662 2008 54 2. Married 1. White 4. College Grad 2. Middle Atlantic
## jobclass health health_ins logwage wage
## 231655 1. Industrial 1. <=Good 2. No 4.318063 75.04315
## 86582 2. Information 2. >=Very Good 2. No 4.255273 70.47602
## 161300 1. Industrial 1. <=Good 1. Yes 4.875061 130.98218
## 155159 2. Information 2. >=Very Good 1. Yes 5.041393 154.68529
## 11443 2. Information 1. <=Good 1. Yes 4.318063 75.04315
## 376662 2. Information 2. >=Very Good 1. Yes 4.845098 127.11574
data(Wage,package = "ISLR")
ggplot(data=Wage,aes(maritl,fill=education))+
geom_bar(position = "fill")+
labs(title = "Participant Education by Marital Status")+
scale_x_discrete(name="Maritl",
labels=c("Never Married","Married","Widowed","Divorced","Separated"))+
scale_y_continuous(name="Percent",
labels= percent_format(accuracy=1),   # round labels to whole percents
n.breaks=10)

Note: the vertical (y) axis represents a numeric variable, so scale_y_continuous must be used rather than scale_y_discrete.
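Typing the labels by hand works for five categories, but the prefixes can also be stripped programmatically, because labels accepts a function. The sketch below uses a small made-up data frame in place of Wage so that it runs without ISLR; strip_prefix is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: drop a leading "<number>. " from each label
strip_prefix <- function(x) sub("^[0-9]+\\. *", "", x)

# Made-up stand-in for Wage$maritl (illustrative values only)
toy <- data.frame(
  maritl = factor(c("1. Never Married", "2. Married", "2. Married", "4. Divorced"))
)

p_strip <- ggplot(toy, aes(x = maritl)) +
  geom_bar() +
  scale_x_discrete(name = "Maritl", labels = strip_prefix)
```

A function has the advantage that the labels stay correct if a level is added or dropped, whereas a hand-typed vector has to be kept in sync with the factor levels.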

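18.1.2 Customizing Colors

In ggplot2, the scale_color_*() functions set the colors of points, lines, borders, and text, while the scale_fill_*() functions set the fill colors of objects that have an area.

Commonly used functions for setting color scales include scale_color_gradient(), scale_color_steps(), scale_color_viridis_c(), scale_fill_brewer(), scale_fill_viridis_d(), and scale_fill_manual(), all of which appear in the examples that follow.

  • 18.1.2.1 Continuous Palettes

Again using the mtcars dataset, we plot car weight (wt) against fuel efficiency (mpg) and add a third variable by mapping engine displacement (disp) to point color.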

library(ggplot2)
p <- ggplot(mtcars,aes(x=wt,y=mpg,color=disp))+
geom_point(shape=19,size=3)+
scale_x_continuous(name = "Weight (1000 lbs.)",
n.breaks = 10,
minor_breaks = NULL,
limits = c(1.5,5.5))+
scale_y_continuous(name = "Miles per gallon",
breaks = seq(10,35,5),
minor_breaks = seq(10,35,1),
limits = c(10,35))
# Set colors
p+ggtitle("A.Default color gradient")

p+scale_color_gradient(low = "grey",high = "black")+
ggtitle("B.Greyscale gradient")

p+scale_color_gradient(low = "red",high = "blue")+
ggtitle("C.Red-blue color Gradient")

p+scale_color_steps(low = "red",high = "blue")+
ggtitle("D.Red-blue binned color Gradient")

p+scale_color_steps2(low = "red",mid="white",high = "blue")+
ggtitle("E.Red-white-blue binned color Gradient")

p+scale_color_viridis_c(direction = -1)+
ggtitle("F.Viridis color gradient")

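The difference between binned and unbinned scales is visible here: unbinned colors change continuously, whereas binned colors change in discrete steps.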
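One detail worth checking in panel E: the diverging scales (scale_color_gradient2() and scale_color_steps2()) are centered on midpoint = 0 by default, and every disp value in mtcars is positive, so only the white-to-blue half of the palette would be used. The sketch below assumes the median displacement is a sensible center; that choice is an assumption made here for illustration, not something the original example specifies.

```r
library(ggplot2)

# Center the diverging palette on the median displacement instead of 0
mid_disp <- median(mtcars$disp)

p_div <- ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point(shape = 19, size = 3) +
  scale_color_steps2(low = "red", mid = "white", high = "blue",
                     midpoint = mid_disp) +
  ggtitle("Red-white-blue binned gradient centered on the median")
```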

  • 18.1.2.2 Categorical Palettes

This example again uses the Wage data frame from the ISLR package. Here education is a categorical variable and is mapped to discrete colors.

library(ISLR)
library(ggplot2)
library(scales)
b <- ggplot(data=Wage,aes(maritl,fill=education))+
geom_bar(position = "fill")+
labs(title = "Participant Education by Marital Status")+
scale_x_discrete(name="Maritl",
labels=c("Never Married","Married","Widowed","Divorced","Separated"))+
scale_y_continuous(name="Percent",
labels= percent_format(accuracy=1),   # round labels to whole percents
n.breaks=10)
# Set colors
b+ggtitle("A.Default colors")

b+scale_fill_brewer(palette = "Set2")+
ggtitle("B.ColorBrewer Set2 palette")

b+scale_fill_viridis_d()+
ggtitle("C.Viridis color scheme")

b+scale_fill_manual(values = c("gold4","orange2","deepskyblue3","brown2","yellowgreen"))+
ggtitle("D.Manual color scheme")

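Note: education is a factor whose levels carry numeric prefixes (for example "1. < HS Grad"), so it is still a categorical variable. Also, because it is mapped to fill, use scale_fill_* rather than scale_color_*.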
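With scale_fill_manual(), an unnamed values vector is matched to the factor levels by position, so the colors shift silently if the level order changes. Naming the vector ties each color to a level explicitly. The following is a minimal sketch on made-up data; the level names and counts are illustrative and are not taken from Wage.

```r
library(ggplot2)

# Made-up data (illustrative only)
toy <- data.frame(
  group = c("a", "a", "b", "b"),
  education = factor(c("HS Grad", "College Grad", "HS Grad", "College Grad")),
  n = c(3, 1, 2, 2)
)

# Named vector: the names must match the factor levels exactly
edu_colors <- c("HS Grad" = "orange2", "College Grad" = "deepskyblue3")

p_manual <- ggplot(toy, aes(x = group, y = n, fill = education)) +
  geom_col(position = "fill") +
  scale_fill_manual(values = edu_colors)
```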

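18.2 Modifying Themes


In ggplot2, theme() customizes the non-data parts of a plot. See ?theme for the full list of arguments.

Theme elements are specified with helper functions such as element_text(), element_line(), element_rect(), and element_blank().

For example, to set the x and y axis titles to 14-point blue text: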

library(ggplot2)
ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point()+
theme(axis.title = element_text(size = 14,color = "blue"))

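18.2.1 Preset Themes

ggplot2 offers 8 commonly used preset themes:

theme_gray(): the default theme, with a grey background and white gridlines.

theme_bw(): black-and-white theme, with a white background, light grey gridlines, and a black panel border.

theme_minimal(): minimal theme, with no background or border and only light gridlines.

theme_classic(): classic theme, with a white background, black x and y axis lines, and no gridlines.

theme_void(): empty theme, with no background, gridlines, axes, or border; only the data are drawn.

theme_light(): light theme, with a white background and light grey gridlines and border.

theme_dark(): dark theme, with a dark grey background and gridlines.

theme_linedraw(): line-drawing theme, with black lines only on a white background.

Here they are in practice: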

library(ggplot2)
p <- ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point()
p+theme_grey()+ggtitle("theme_grey")

p+theme_bw()+ggtitle("theme_bw")

p+theme_minimal()+ggtitle("theme_minimal")

p+theme_classic()+ggtitle("theme_classic")

p+theme_void()+ggtitle("theme_void")

p+theme_light()+ggtitle("theme_light")

p+theme_dark()+ggtitle("theme_dark")

p+theme_linedraw()+ggtitle("theme_linedraw")
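Each preset theme also accepts base_size and base_family arguments, and theme_set() makes a theme the default for all later plots in the session. A short sketch follows; the 14-point size is an arbitrary choice.

```r
library(ggplot2)

# theme_set() returns the previous default so that it can be restored later
old_theme <- theme_set(theme_bw(base_size = 14))

# This plot picks up theme_bw(base_size = 14) without adding it explicitly
p_default <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Session default theme")

# A preset can still be adjusted per plot with theme()
p_tweaked <- p_default + theme(panel.grid.minor = element_blank())

# Restore the earlier default
theme_set(old_theme)
```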

18.2.2 Customizing Fonts

  • 18.2.2.1 Procedure

a. Obtain a local or Google font

A. Find local font files:

findfont <- function(x){
# Load quietly; both packages must already be installed
suppressMessages(library(showtext))
suppressMessages(library(dplyr))
# Call dplyr:: explicitly: if MASS is attached, MASS::select() masks dplyr::select()
dplyr::filter(font_files(),grepl(x,family,ignore.case=TRUE))%>%
dplyr::select(path,file,family,face)
}
findfont("comic")

Load this file into R under the custom name "comic":

font_add("comic",regular = "comic.ttf",
bold = "comicbd.ttf")

B. Downloading a Google font:

font_add_google("name","family")

name: the name of the Google font; family: a custom name used to refer to the font in later code.

For example:

font_add_google("Schoolbell","bell")

b. Turn on showtext so that it renders the text on graphics devices

showtext_auto()

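c. Specify the font in ggplot2's theme() function

Use element_text() to set the font family, face, size, color, and orientation: theme(* = element_text()), where * is a text-related theme() argument, including:

Parameter | Description
axis.title, axis.title.x, axis.title.y | Axis titles
axis.text, axis.text.x, axis.text.y | Axis tick labels
legend.text, legend.title | Legend item labels and legend title
plot.title, plot.subtitle, plot.caption | Plot title, subtitle, and caption
strip.text, strip.text.x, strip.text.y | Facet strip labels

  • 18.2.2.2 Example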

library(ggplot2)
library(showtext)   # also attaches sysfonts, which provides font_add() and font_add_google()
font_add("comic",regular = "comic.ttf",
bold = "comicbd.ttf")

font_add_google("Open Sans","Sans")

showtext_auto()

p <- ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point()+
labs(title = "Fuel Efficiency by Car Weight")

p+theme(plot.title = element_text(family = "Sans",size=14),
axis.title = element_text(family = "comic"))
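font_add() stops with an error when the font file cannot be found, and a file named comic.ttf is usually present only on Windows systems. The following is a defensive sketch, assuming sysfonts is installed; add_font_if_present is a hypothetical helper and not part of any package.

```r
library(sysfonts)

# Hypothetical helper: register a font only if its file exists in the
# font search paths on this machine; returns TRUE when it was registered
add_font_if_present <- function(family, file) {
  found <- file %in% font_files()$file
  if (found) font_add(family, regular = file)
  found
}

ok <- add_font_if_present("comic", "comic.ttf")
```

When ok is FALSE, the theme can fall back to a family that is always available, such as "sans".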

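18.2.3 Customizing Legends

theme() arguments related to the legend include legend.position, legend.justification, legend.direction, legend.title, legend.text, legend.background, and legend.key.

  • 18.2.3.1 Example

Again using the mtcars dataset, with wt on the x axis and mpg on the y axis, color the points by the number of engine cylinders, then modify the legend as follows:

Place the legend in the upper-right corner of the plot; give the legend the title "Cylinders"; list the legend categories horizontally; set the legend background to light grey and remove the background around the keys (the colored symbols); add a white border to the legend.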

library(ggplot2)
p <- ggplot(mtcars,aes(x=wt,y=mpg,color=factor(cyl)))+
geom_point(size=3)+
scale_color_discrete(name="Cylinders")+
labs(title = "Fuel Efficiency for 32 Automobiles",
x="Weight (1000 lbs)",
y="Miles per gallon")
p

# The legend title comes from scale_color_discrete(name = "Cylinders") above.
# Numeric positions are relative to the plot area; ggplot2 >= 3.5.0 prefers
# legend.position = "inside" together with legend.position.inside = c(0.95, 0.95).
p+theme(legend.position = c(0.95,0.95),
legend.justification = c(1,1),
legend.direction = "horizontal",
legend.background = element_rect(fill ="lightgrey",
color = "white"),
legend.key =element_blank())
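The horizontal layout can also be requested through the guide itself rather than through the theme. The following is a sketch, assuming a single row of keys placed above the plot is acceptable:

```r
library(ggplot2)

p_legend <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  scale_color_discrete(name = "Cylinders") +
  # nrow = 1 lays the legend keys out in a single row
  guides(color = guide_legend(nrow = 1)) +
  theme(legend.position = "top",
        legend.key = element_blank())
```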

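18.2.4 Customizing the Plot Area

theme() arguments can also customize the plot area; they include panel.background, panel.border, panel.grid.major, panel.grid.minor, plot.background, strip.background, and strip.text.

  • 18.2.4.1 Example

In this example, the major gridlines are set to solid light grey lines, the minor gridlines to dashed light grey lines, and the facet strip background to white, and the legend is placed at the top.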

library(ggplot2)
mtcars$am <- factor(mtcars$am,labels = c("Automatic","Manual"))
ggplot(mtcars,aes(x=disp,y=mpg))+
geom_point(aes(color=factor(cyl)),size=2)+
geom_smooth(method = "lm",
formula=y~x+I(x^2),linetype="dotted",se=FALSE)+
facet_wrap(~am,ncol=2)+
theme_bw()+
theme(strip.background = element_rect(fill="white"),
panel.grid.major = element_line(color = "lightgrey"),
panel.grid.minor = element_line(color = "lightgrey",
linetype = "dashed"),
legend.position = "top")


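18.3 Adding Annotations


Functions for adding annotations include geom_text() and geom_label() from ggplot2, and geom_text_repel() and geom_label_repel() from ggrepel.

18.3.1 Labeling Data Points

When observations are drawn as points, it is hard to tell which observation each point represents. The plot of car weight (wt) against miles per gallon (mpg) in the mtcars data is one example.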

library(ggplot2)
ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point(color="steelblue")

Label the points with geom_text() and geom_label() in turn:

library(ggplot2)
ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point(color="steelblue")+
geom_text(label=row.names(mtcars))

library(ggplot2)
ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point(color="steelblue")+
geom_label(label=row.names(mtcars))

The labels overlap badly, so next we add them with geom_text_repel() and geom_label_repel():

library(ggplot2)
library(ggrepel)
ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point(color="steelblue")+
geom_text_repel(label=row.names(mtcars))

library(ggplot2)
ggplot(mtcars,aes(x=wt,y=mpg))+
geom_point(color="steelblue")+
geom_label_repel(label=row.names(mtcars))

The result shows that the overlapping labels have been resolved.
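When only a few points are of interest, labeling a subset keeps the plot readable. The sketch below moves the row names into a column so the label can be mapped inside aes(), then passes a filtered data frame to the label layer. The mpg > 25 threshold and the object names are chosen purely for illustration.

```r
library(ggplot2)
library(ggrepel)

# Move row names into a column so they can be mapped inside aes()
cars <- mtcars
cars$car <- row.names(mtcars)

# Label only the most fuel-efficient cars (threshold chosen for illustration)
efficient <- subset(cars, mpg > 25)

p_subset <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue") +
  geom_text_repel(data = efficient, aes(label = car), size = 3)
```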

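18.3.2 Labeling Bars

1. Standard bar chart

Using the Wage dataset from the ISLR package, plot the percentage of each marital status in the data.

First compute the percentages: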

library(ISLR)
library(dplyr)
plotdata <- Wage %>%
group_by(maritl) %>%
summarise(n=n()) %>%
mutate(pct=n/sum(n),
lbls=scales::percent(pct))

plotdata
## # A tibble: 5 × 4
## maritl n pct lbls
## <fct> <int> <dbl> <chr>
## 1 1. Never Married 648 0.216 21.6%
## 2 2. Married 2074 0.691 69.1%
## 3 3. Widowed 19 0.00633 0.6%
## 4 4. Divorced 204 0.068 6.8%
## 5 5. Separated 55 0.0183 1.8%

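Draw the plot:

Note: stat = "identity" is required here because geom_bar() defaults to stat = "count", which raises an error when a y aesthetic is also supplied. geom_col() is an equivalent shortcut.

For bar charts, the main geom_text() settings are label (the label text), vjust (the vertical position of the label), and size (the text size).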

library(ggplot2)
ggplot(plotdata,aes(x=maritl,y=pct))+
geom_bar(stat = "identity",fill="steelblue")+
geom_text(aes(label=lbls),
vjust=-0.5,size=3)+
theme_bw()
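The same chart can be written with geom_col(), and the y axis can be shown in percent to match the bar labels. The following is a sketch on made-up shares, so that it runs without ISLR; the values and names are illustrative only.

```r
library(ggplot2)
library(scales)

# Made-up shares standing in for the summarised Wage data
plotdata <- data.frame(
  status = c("A", "B", "C"),
  pct = c(0.5, 0.3, 0.2)
)
plotdata$lbls <- label_percent(accuracy = 1)(plotdata$pct)

p_col <- ggplot(plotdata, aes(x = status, y = pct)) +
  geom_col(fill = "steelblue") +   # same as geom_bar(stat = "identity")
  geom_text(aes(label = lbls), vjust = -0.5, size = 3) +
  # Percent-formatted axis, with headroom so the top label is not clipped
  scale_y_continuous(labels = label_percent(accuracy = 1),
                     expand = expansion(mult = c(0, 0.1))) +
  theme_bw()
```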

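18.4 Combining Plots


Plots are combined with the patchwork package. First save each plot as its own named object, then combine them directly with | and /: A|B places plots A and B side by side, and A/B stacks A on top of B. Parentheses create subgroups of plots; for example, (A|B)/(C|D) arranges four plots with A at the top left, B at the top right, C at the bottom left, and D at the bottom right.

Example: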

library(ggplot2)
library(patchwork)
P1 <- ggplot(mtcars,aes(disp,mpg))+
geom_point()

P2 <- ggplot(mtcars,aes(factor(cyl),mpg))+
geom_boxplot()

P3 <- ggplot(mtcars,aes(mpg))+
geom_histogram(bins = 8,color="white")

(P1|P2)/P3
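patchwork also provides plot_layout() and plot_annotation() for controlling relative sizes and adding a shared title and panel tags. The following is a sketch reusing the same three plots; the 2:1 height ratio and the title text are arbitrary choices.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(disp, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 8, color = "white")

combined <- (p1 | p2) / p3 +
  plot_layout(heights = c(2, 1)) +       # top row twice as tall as the bottom row
  plot_annotation(title = "mtcars overview",
                  tag_levels = "A")      # tag the panels A, B, C
```

The combined object can be saved like any single plot, for example with ggsave().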


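Biomamba 生信基地 (Biomamba Bioinformatics Base)
The author is a PhD student. This public account aims to share bioinformatics knowledge along with research experience and reflections. Criticism and corrections from students, teachers, and experts are welcome, as are collaboration and exchange with people from all fields.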