新特性: 等待PostgreSQL 17 -增加增量备份支持

文摘   科技   2024-07-31 06:02   上海  

1 前言

本来依然来自postgresweekly推荐,内容不错。将其内容整理至这里。共享给广大PGer们。感觉内容不错的或者有自己看法的,欢迎文后留言。

2 全文

2.1 补丁

在2023年12月20日,Robert Haas提交了补丁[1]:

Add support for incremental backup.

To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup. You then use BASE_BACKUP with the INCREMENTAL option to take
the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.

An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.

Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com

这是一个巨大的代码变更提交。虽然已经有工具提供增量备份(如pgBackRest),但内部(to pg) pg_basebackup,到目前为止还不允许它搞增备。

它可以用于从服务器上复制,并可选择压缩/tarball它。但目前为止仅此而已。

那么,让我们看看原来的pg_basebackup是如何工作的,以及新功能是做什么的。

2.2 实验验证

为了确保我做了变化,在一些不大的数据库上,我做了一个表:

=# CREATE TABLE just_for_fun (last_updated timestamptz);
=# INSERT INTO just_for_fun (last_updated) VALUES (now());

接着每秒钟都运行:

=# UPDATE just_for_fun SET last_updated = now();

有了这些,我可以运行简单的pg_basebackup:

=$ mkdir /var/tmp/backups; pg_basebackup -D /var/tmp/backups

=$ ls -l /var/tmp/backups/
total 360
-rw------- 1 pgdba pgdba 227 Jan 8 17:16 backup_label
-rw------- 1 pgdba pgdba 226076 Jan 8 17:16 backup_manifest
drwx------ 7 pgdba pgdba 4096 Jan 8 17:16 base/

-rw------- 1 pgdba pgdba 88 Jan 8 17:16 postgresql.auto.conf
-rw------- 1 pgdba pgdba 29806 Jan 8 17:16 postgresql.conf

这个命令基本上将整个PGDATA复制到/var/tmp/backups,如果我愿意的话,我可以使用pg_ctl -D /var/tmp/backups start在那里启动pg。当然,要做一些更改以避免与现有Pg冲突(例如端口号)。

这是普通(plain)备份。还有tar备份:

=$ rm -rf /var/tmp/backups/; mkdir /var/tmp/backups; pg_basebackup -Ft -D /var/tmp/backups

=$ ls -l /var/tmp/backups/
total 56176
-rw------- 1 pgdba pgdba 226218 Jan 8 17:19 backup_manifest
-rw------- 1 pgdba pgdba 40509440 Jan 8 17:19 base.tar
-rw------- 1 pgdba pgdba 16778752 Jan 8 17:19 pg_wal.tar

其内部基本上是相同的东西,但作为tar存档(默认情况下没有压缩)。有趣的是backup_manifest文件。它包含的内容如下:

=$ cat  /var/tmp/backups/backup_manifest  | head -n 10
{ "PostgreSQL-Backup-Manifest-Version": 1,
"Files": [
{ "Path": "backup_label", "Size": 227, "Last-Modified": "2024-01-08 16:21:14 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "f6db08ca" },
{ "Path": "tablespace_map", "Size": 0, "Last-Modified": "2024-01-08 16:21:14 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
{ "Path": "pg_xact/0000", "Size": 8192, "Last-Modified": "2024-01-08 16:21:13 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "c79e44f3" },
{ "Path": "PG_VERSION", "Size": 3, "Last-Modified": "2024-01-08 13:08:53 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "64440205" },
{ "Path": "pg_multixact/offsets/0000", "Size": 8192, "Last-Modified": "2024-01-08 13:09:02 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "23464490" },
{ "Path": "pg_multixact/members/0000", "Size": 8192, "Last-Modified": "2024-01-08 13:08:53 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "23464490" },
{ "Path": "conf.d/depesz.conf", "Size": 512, "Last-Modified": "2024-01-08 13:08:54 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "c6f171e0" },
{ "Path": "pg_ident.conf", "Size": 2640, "Last-Modified": "2024-01-08 13:08:53 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "0ce04d87" },

{ "Path": "base/5/2652", "Size": 16384, "Last-Modified": "2024-01-08 13:08:53 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "259eec8e" },
{ "Path": "pg_logical/replorigin_checkpoint", "Size": 8, "Last-Modified": "2024-01-08 16:21:13 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "c74b6748" },
{ "Path": "current_logfiles", "Size": 44, "Last-Modified": "2024-01-08 13:08:54 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "97357c1c" },
{ "Path": "log/postgresql-2024-01-08_140854.log", "Size": 1021834, "Last-Modified": "2024-01-08 16:21:14 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "d2498fb2" },
{ "Path": "global/pg_control", "Size": 8192, "Last-Modified": "2024-01-08 16:21:14 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "43872087" }
],
"WAL-Ranges": [
{ "Timeline": 1, "Start-LSN": "4/86000028", "End-LSN": "4/86000750" }
],
"Manifest-Checksum": "106517baea81404769cd4deb686ff58b58997308f0d90d9afbfa9d0111a5003d"}

此文件可用于验证备份是否完成。

但它也可以用来查看自备份以来发生了哪些变化

所以,让我们删除这个备份,在稍微复杂一点的路径上创建一个完整的备份:

=$ rm -rf /var/tmp/backups/; mkdir /var/tmp/backups/
=$ pg_basebackup -Ft -D "/var/tmp/backups/$( date +%Y-%m-%d_%H%M%S-FULL )"

=$ ls -l /var/tmp/backups/
total 4
drwx------ 2 pgdba pgdba 4096 Jan 8 17:39 2024-01-08_173902-FULL/

=$ ls -l /var/tmp/backups/2024-01-08_173902-FULL/
total 56356
-rw------- 1 pgdba pgdba 226219 Jan 8 17:39 backup_manifest
-rw------- 1 pgdba pgdba 40691712 Jan 8 17:39 base.tar
-rw------- 1 pgdba pgdba 16778752 Jan 8 17:39 pg_wal.tar

完美。现在,我可以:

=$ pg_basebackup -i /var/tmp/backups/2024-01-08_173902-FULL/backup_manifest -Ft -D "/var/tmp/backups/$( date +%Y-%m-%d_%H%M%S-INCREMENTAL )"
pg_basebackup: error: could NOT initiate base backup: ERROR: incremental backups cannot be taken unless WAL summarization IS enabled
pg_basebackup: removing DATA directory "/var/tmp/backups/2024-01-08_173956-INCREMENTAL"

糟糕。我不能。那么,这个WAL总结是什么呢? pg_basebackup的文档[2]没有提到它。但是PostgreSQL配置文档[3]可以。

于是,很快我便做了如下配置:

$ ALTER system SET summarize_wal = ON;
$ SELECT pg_reload_conf();

现在:

=$ pg_basebackup -i /var/tmp/backups/2024-01-08_173902-FULL/backup_manifest -Ft -D "/var/tmp/backups/$( date +%Y-%m-%d_%H%M%S-INCREMENTAL )"

=$ ls -l /var/tmp/backups/
total 8
drwx------ 2 pgdba pgdba 4096 Jan 8 17:39 2024-01-08_173902-FULL/
drwx------ 2 pgdba pgdba 4096 Jan 8 17:40 2024-01-08_174043-INCREMENTAL/

=$ ls -l /var/tmp/backups/2024-01-08_174043-INCREMENTAL/
total 23860
-rw------- 1 pgdba pgdba 236528 Jan 8 17:40 backup_manifest
-rw------- 1 pgdba pgdba 7413248 Jan 8 17:40 base.tar
-rw------- 1 pgdba pgdba 16778752 Jan 8 17:40 pg_wal.tar

甜的。备份完成后,比之前的备份要小得多(完整备份为40MB,增量备份为7MB)。

有趣的是,清单中的文件数量是相同的:

=$ jq '.Files | length' /var/tmp/backups/2024-01-08_173902-FULL/backup_manifest
1571

=$ jq '.Files | length' /var/tmp/backups/2024-01-08_174043-INCREMENTAL/backup_manifest
1571

快速查看清单显示,增量清单可以包含两种类型的文件:

=$ jq .Files[13] /var/tmp/backups/2024-01-08_174043-INCREMENTAL/backup_manifest
{
"Path": "global/1214_fsm",
"Size": 24576,
"Last-Modified": "2024-01-08 16:10:41 GMT",
"Checksum-Algorithm": "CRC32C",
"Checksum": "722d586a"
}

以及:

=$ jq .Files[12] /var/tmp/backups/2024-01-08_174043-INCREMENTAL/backup_manifest
{
"Path": "global/INCREMENTAL.2695",
"Size": 12,
"Last-Modified": "2024-01-08 13:08:53 GMT",
"Checksum-Algorithm": "CRC32C",
"Checksum": "e34c7d7c"
}

如果文件名没有以INCREMENTAL开头,那么它是普通文件,可以在tar中轻松找到。

但是,如果文件以INCREMENTAL开始,则必须从较早的备份中获取它。

看起来要找到原始文件,路径应该只删除“INCREMENTAL.”:

=$ jq '.Files[] | select(.Path == "global/2695")' /var/tmp/backups/2024-01-08_173902-FULL/backup_manifest
{
"Path": "global/2695",
"Size": 16384,
"Last-Modified": "2024-01-08 13:08:53 GMT",
"Checksum-Algorithm": "CRC32C",
"Checksum": "a3f40679"
}

虽然,它让我感到惊讶的是,校验和不匹配(e34c7d7c vs. a3f40679),但它可能只是因为文件名是生成校验和的数据的一部分。

好的。但是如何恢复这样的备份呢?显然,要做到这一点,我需要处理多个单独的备份,并将它们组合起来。

幸运的是,这里有一个专门用于此的工具:pg_combinebackup[4]

所以,让我们试一下:

=$ pg_combinebackup -o /var/tmp/backups/combined /var/tmp/backups/2024-01-08_173902-FULL/ /var/tmp/backups/2024-01-08_174043-INCREMENTAL/
pg_combinebackup: error: could not open file "/var/tmp/backups/2024-01-08_174043-INCREMENTAL//PG_VERSION": No such file or directory

好的。因此,看起来它需要目录中的PG_VERSION文件,这意味着它必须解压缩。这有点烦人,但我想我可以解决它:

=$ mkdir /var/tmp/backups/FULL
=$ tar -x -C /var/tmp/backups/FULL -f /var/tmp/backups/2024-01-08_173902-FULL/base.tar
=$ tar -x -C /var/tmp/backups/FULL/pg_wal/ -f /var/tmp/backups/2024-01-08_173902-FULL/pg_wal.tar
=$ cp /var/tmp/backups/2024-01-08_173902-FULL/backup_manifest /var/tmp/backups/FULL/

=$ mkdir /var/tmp/backups/INCR
=$ tar -x -C /var/tmp/backups/INCR -f /var/tmp/backups/2024-01-08_174043-INCREMENTAL/base.tar
=$ tar -x -C /var/tmp/backups/INCR/pg_wal -f /var/tmp/backups/2024-01-08_174043-INCREMENTAL/pg_wal.tar
=$ cp /var/tmp/backups/2024-01-08_174043-INCREMENTAL/backup_manifest /var/tmp/backups/INCR/

现在,我可以执行:

=$ pg_combinebackup -o /var/tmp/backups/combined /var/tmp/backups/FULL /var/tmp/backups/INCR

非常酷!

更酷的是,您可以轻松地进行增量备份,增量从“最后一次完整”开始,或从“任何先前的”开始。这可以很容易地用于创建任何增量备份链。

总而言之,我认为这是一个巨大的变化。感谢所有参与其中的人。

参考资料

[1]

Robert Haas提交了补丁: https://git.postgresql.org/pg/commitdiff/dc212340058b4e7ecfc5a7a81ec50e7a207bf288

[2]

pg_basebackup的文档: https://www.postgresql.org/docs/devel/app-pgbasebackup.html

[3]

PostgreSQL配置文档: https://www.postgresql.org/docs/devel/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SUMMARIZATION

[4]

pg_combinebackup: https://www.postgresql.org/docs/devel/app-pgcombinebackup.html


我是【Sean】,  欢迎大家长按关注并加星公众号:数据库杂记。有好资源相送,同时为你提供及时更新。已关注的朋友,发送0、1到7,都有好资源相送。

往期导读: 
1. PostgreSQL中配置单双向SSL连接详解
2. 提升PSQL使用技巧:PostgreSQL中PSQL使用技巧汇集(1)
3. 提升PSQL使用技巧:PostgreSQL中PSQL使用技巧汇集(2)
4. PostgreSQL SQL的基础使用及技巧
5. PostgreSQL开发技术基础:过程与函数
6. PostgreSQL中vacuum 物理文件truncate发生的条件
7. PostgreSQL中表的年龄与Vacuum的实验探索:Vacuum有大用
8. PostgreSQL利用分区表来弥补AutoVacuum的不足
9. 也聊聊PostgreSQL中的空间膨胀与AutoVacuum
10. 正确理解SAP BTP中hyperscaler PG中的IOPS (AWS篇)

数据库杂记
PostgreSQL,SAP HANA,Sybase ASE/ASA,Oracle,MySQL,SQLite各类数据库, SAP BTP云计算技术, 以及陈式太极拳教学倾情分享。
 最新文章