前言
PostgreSQL数据库的空间膨胀 监控及其维护是PGer DBA重点关注的一项工作之一。
而Vaccum Full在生产环境基本上是不太可能用上的。取而代之的,是pg_repack, pgcompacttable以及pg_sequeeze。
在实际的生产环境当中,这三者都有使用场景。论开销,pgcompacttable相对小一些。pg_repack相对大一些。
兼容性方面:
估计它是兼容性最好的。能支持从PG9.2开始的所有版本,所以用户用起来,基本上不用担心它不支持。
与pg_repack的比较,它的优点:
不需要太多的空间
就地处理相关的表 索引会一个个的重建,并且是从最小的索引开始建,直到重建最大的索引,因而重建索引所需要的空间只是最大的索引所占用的空间 表在处理时通过自适应的延迟来避免太重的I/O和复制延迟的峰值(有一个--delay-ration值可以进行定制 指定)。
本文就针对pgcompacttable的实现,尝试进行原理上的解读。
分析
获取pgcompacttable
https://github.com/dataegret/pgcompacttable
从上边的github网址就可以得到。
环境依赖及准备
拿到pgcompacttable以后,我们可以看到, pgcompacttable是用Perl编写的,需要Perl DBI库,显然有PostgreSQL支持模块。依赖项可以很容易地安装:
Debian系的Linux: apt-get install libdbi-perl libdbd-pg-perl
RedHat/Centos系的Linux: yum install perl-Time-HiRes perl-DBI perl-DBD-Pg
当然,还有一层依赖关系,PG环境准备好以后,需要安装内置插件:`pgstattuple`[1], 相关命令是:
create extension if not exists pgstattuple;
运行
pgcompacttable可以在任何操作系统用户上运行(甚至可以在另一台主机上运行,参见——host选项;虽然建议与目标数据库在同一台主机上运行),但PostgreSQL的超级用户访问是必需的。首选的方式是作为PostgreSQL集群所有者运行,通常是postgres。在这种情况下,pgcompacttable可以为PostgreSQL后端pid执行ionice -c 3来降低IO优先级。
pgcompacttable——man打印可用选项列表和使用信息。
pgcompacttable——all——verbose压缩集群中所有数据库中的所有膨胀表,包括索引。打印其他进度信息。
pgcompacttable——dbname billing——exclude-schema pgq压缩计费数据库中的所有膨胀表及其膨胀索引(pgq模式中的表除外)。
pgcompacttable——dbname billing -t operations -f在数据库计费中强制压缩表操作。注意,膨胀小于20%(硬编码的MINIMAL_COMPACT_PERCENT常量)的表和索引被认为是正常的,直到采取option -force才进行处理。
看看pgcompacttable --help:
Name:
pgcompacttable - PostgreSQL bloat reducing tool.
Usage:
pgcompacttable [OPTION...]
General options:
[-?mV] [(-q | -v LEVEL)]
Connection options:
[-h HOST] [-p PORT] [-U USER] [-W PASSWD] [-P PATH]
Targeting options:
(-a | -d DBNAME...) [-n SCHEMA...] [-t TABLE...] [-N SCHEMA...] [-T
TABLE...]
Examples:
Shows usage manual.
pgcompacttable --man
Compacts all the bloated tables in all the database in the cluster plus their bloated indexes. Prints additional progress information.
pgcompacttable --all --verbose info
Compacts all the bloated tables in the billing database and their
bloated indexes excepts ones that are in the pgq schema.
pgcompacttable --dbname billing --exclude-schema pgq
实例:
[08:41:03-postgres@sean-rh1:/iihero/source/pgcompacttable/bin]$ psql mydb -c "create extension if not exists pgstattuple;"
CREATE EXTENSION
[08:41:49-postgres@sean-rh1:/iihero/source/pgcompacttable/bin]$ psql mydb -c "\d"
List of relations
Schema | Name | Type | Owner
--------+-----------------------+-------+----------
public | customer | table | postgres
public | databasechangelog | table | mydb
public | databasechangeloglock | table | mydb
public | distributed_locks | table | mydb
public | sean_test1 | table | mydb
public | t1 | table | mydb
public | t2 | table | postgres
(7 rows)
[08:41:44-postgres@sean-rh1:/iihero/source/pgcompacttable/bin]$ ./pgcompacttable --dbname mydb
[Sat Sep 14 08:41:48 2024] (mydb) Connecting to database
[Sat Sep 14 08:41:48 2024] (mydb) Postgres backend pid: 22427
[Sat Sep 14 08:41:48 2024] (mydb) Handling tables. Attempt 1
[Sat Sep 14 08:41:49 2024] (mydb:public.t1) Statistics: 1966 pages (2521 pages including toasts and indexes), it is expected that ~0.310% (6 pages) can be compacted with the estimated space saving being 49.234KB.
[Sat Sep 14 08:41:49 2024] (mydb:public.t1) Reindex: public.t1_pkey, initial size 551 pages(4.305MB), has been reduced by 0% (0.000B), duration 0 seconds.
[Sat Sep 14 08:41:49 2024] (mydb:public.t1) Processing results: 1966 pages left (2521 pages including toasts and indexes), size reduced by 0.000B (0.000B including toasts and indexes) in total.
[Sat Sep 14 08:41:49 2024] (mydb) Processing complete.
[Sat Sep 14 08:41:49 2024] (mydb) Processing results: size reduced by 0.000B (-128.000KB including toasts and indexes) in total.
[Sat Sep 14 08:41:49 2024] (mydb) Disconnecting from database
[Sat Sep 14 08:41:49 2024] Processing complete: 1 retries to process has been done
[Sat Sep 14 08:41:49 2024] Processing results: size reduced by 0.000B (-128.000KB including toasts and indexes) in total.
实现解读
几个函数:
show_version: 获取数据库的版本
show_help: 显示帮助
show_usage: 显示使用说明
show_man: 显示完整的manual帮助
set_current_db_name: 设置当前的数据库名
set_current_schema_name_table_name:为目标表指定schema名
unset_current_db_name:重置当前数据库名
unset_current_schema_name_table_name:为目标表重置schema值
unset_after_round_statement: 重置变量:_after_round_statement
logger:直接输出日志
nice_size: 输出对人可读的大小
my @sizes = qw/B KB MB GB TB PB/;
DB相关的函数:
_dbh:连接数据库,然后得到连接:$_dbh
_after_round_statement:得到目标查询的statement对象
db_connect: 真正的连接目标数据库, 涉及参数:db_name, host, user, password, port
注意看看源码,它做了一些附加的动作:
set client_min_messages to warning;
set lc_messages TO 'C';
set application_name TO pgcompacttable;
--
SHOW server_version_num
server_version_num
--------------------
140006
(1 row)db_disconnect: 断开连接
get_databases: 获取所有数据库的列表
is_table: 判断目标对象是不是表:用的是SQL:
SELECT exists(
SELECT 1 FROM pg_catalog.pg_tables
WHERE schemaname = ? and tablename = ?
)
示例:
SELECT exists(
SELECT 1 FROM pg_catalog.pg_tables
WHERE schemaname = 'public' and tablename = 'test'
);
exists
--------
t
(1 row)get_database_tables:获取目标数据库的所有表列表:
(带三个参数:table LIKE ..., table_size_min, table_size_max ...) 抛开一些内部schema的表,如temp, toast, catalog下的表。
SELECT schemaname, tablename FROM pg_catalog.pg_tables
WHERE
NOT (schemaname = 'pg_catalog' AND tablename = 'pg_index') AND
schemaname !~ 'pg_(temp|toast|catalog).*' AND
NOT schemaname = 'information_schema'
[AND tablename LIKE 't%'
AND pg_catalog.pg_relation_size(
quote_ident(schemaname) || '.' || quote_ident(tablename)) >= {table_size_min}
AND AND pg_catalog.pg_relation_size(
quote_ident(schemaname) || '.' || quote_ident(tablename)) < {table_size_max}
]
ORDER BY
pg_catalog.pg_relation_size(
quote_ident(schemaname) || '.' || quote_ident(tablename)),
schemaname, tablename
示例:
mydb=# SELECT schemaname, tablename FROM pg_catalog.pg_tables WHERE NOT (schemaname = 'pg_catalog' AND tablename = 'pg_index') AND schemaname !~ 'pg_(temp|toast|catalog).*' AND NOT schemaname = 'information_schema' ORDER BY pg_catalog.pg_relation_size( quote_ident(schemaname) || '.' || quote_ident(tablename)), schemaname, tablename;
schemaname | tablename
------------+-----------------------
public | databasechangelog
public | databasechangeloglock
public | sto_theme_file
public | pgbench_branches
public | pgbench_tellers
public | t
public | pgbench_history
public | pgbench_accounts
(8 rows)get_pgstattuple_schema_name:获取pgstattuple的schema名:
SELECT nspname FROM pg_catalog.pg_proc
JOIN pg_catalog.pg_namespace AS n ON pronamespace = n.oid
WHERE proname = 'pgstattuple' LIMIT 1;
nspname
---------
public
(1 row)
get_size_stats:获取size的统计:
SELECT
size,
total_size,
ceil(size / bs) AS page_count,
ceil(total_size / bs) AS total_page_count
FROM (
SELECT
current_setting('block_size')::integer AS bs,
pg_catalog.pg_relation_size(quote_ident({schema_name})||'.'||quote_ident({table_name})) AS size,
pg_catalog.pg_total_relation_size(quote_ident({schema_name})||'.'||quote_ident({table_name})) AS total_size
) AS sq;
示例:
SELECT
size,
total_size,
ceil(size / bs) AS page_count,
ceil(total_size / bs) AS total_page_count
FROM (
SELECT
current_setting('block_size')::integer AS bs,
pg_catalog.pg_relation_size(quote_ident('public')||'.'||quote_ident('t')) AS size,
pg_catalog.pg_total_relation_size(quote_ident('public')||'.'||quote_ident('t')) AS total_size
) AS sq;
size | total_size | page_count | total_page_count
---------+------------+------------+------------------
8339456 | 8372224 | 1018 | 1022
(1 row)
get_bloat_stats:获取表膨胀的统计。
下边的SQL,我们可以直接用到我们自己的环境当中啊。看看这种实现,还是蛮有用的。
get_bloat_stats的相关SQL:(参数就是三个,pgstattuple的schema, 目标表的schema以及表名)
SELECT
ceil((size - free_space - dead_tuple_len) * 100 / fillfactor / bs) AS effective_page_count,
greatest(round(
(100 * (1 - (100 - free_percent - dead_tuple_percent) / fillfactor))::numeric, 2
),0) AS free_percent,
greatest(ceil(size - (size - free_space - dead_tuple_len) * 100 / fillfactor), 0) AS free_space
FROM (
SELECT
current_setting('block_size')::integer AS bs,
pg_catalog.pg_relation_size(pg_catalog.pg_class.oid) AS size,
coalesce(
(
SELECT (
regexp_matches(
reloptions::text, E'.*fillfactor=(\\\\d+).*'))[1]),
'100')::real AS fillfactor,
pgst.*
FROM pg_catalog.pg_class
CROSS JOIN
" . $pgstattuple_schema_name . ".pgstattuple(
(quote_ident($schema_name) || '.' || quote_ident($table_name))) AS pgst
WHERE pg_catalog.pg_class.oid = (quote_ident($schema_name) || '.' || quote_ident($table_name))::regclass
) AS sq
来一个例子:
SELECT
ceil((size - free_space - dead_tuple_len) * 100 / fillfactor / bs) AS effective_page_count,
greatest(round(
(100 * (1 - (100 - free_percent - dead_tuple_percent) / fillfactor))::numeric, 2
),0) AS free_percent,
greatest(ceil(size - (size - free_space - dead_tuple_len) * 100 / fillfactor), 0) AS free_space
FROM (
SELECT
current_setting('block_size')::integer AS bs,
pg_catalog.pg_relation_size(pg_catalog.pg_class.oid) AS size,
coalesce(
(
SELECT (
regexp_matches(
reloptions::text, E'.*fillfactor=(\\\\d+).*'))[1]),
'100')::real AS fillfactor,
pgst.*
FROM pg_catalog.pg_class
CROSS JOIN
public.pgstattuple(
('public.t')) AS pgst
WHERE pg_catalog.pg_class.oid = ('public.t')::regclass
) AS sq;
effective_page_count | free_percent | free_space
----------------------+--------------+------------
782 | 23.20 | 1934844
(1 row)
get_update_column:获取更新列. 给定schema名和表名,得到该表的更新列的名。
SELECT quote_ident(attname)
FROM pg_catalog.pg_attribute
WHERE
attnum > 0 AND -- neither system
NOT attisdropped AND -- nor dropped
attrelid = (quote_ident(?) || '.' || quote_ident(?))::regclass
ORDER BY
-- Variable legth attributes have lower priority because of the chance
-- of being toasted
(attlen = -1),
-- Preferably not indexed attributes
(
attnum::text IN (
SELECT regexp_split_to_table(indkey::text, ' ')
FROM pg_catalog.pg_index
WHERE indrelid = (quote_ident(?) || '.' || quote_ident(?))::regclass)),
-- Preferably smaller attributes
attlen,
attnum
LIMIT 1;
示例:
SELECT quote_ident(attname)
FROM pg_catalog.pg_attribute
WHERE
attnum > 0 AND -- neither system
NOT attisdropped AND -- nor dropped
attrelid = ('public.t1')::regclass
ORDER BY
-- Variable legth attributes have lower priority because of the chance
-- of being toasted
(attlen = -1),
-- Preferably not indexed attributes
(
attnum::text IN (
SELECT regexp_split_to_table(indkey::text, ' ')
FROM pg_catalog.pg_index
WHERE indrelid = ('public.t1')::regclass)),
-- Preferably smaller attributes
attlen,
attnum
LIMIT 1;
quote_ident
-------------
id
(1 row)get_pages_per_round:得到每轮的页数, 参数:page_count, to_page
get_pages_before_vacuum: 获取vacuum之前的页数, 参数: page_count, expected_page_count
get_max_tupples_per_page: 每页最大元组数
SELECT ceil(current_setting('block_size')::real / sum(attlen))
FROM pg_catalog.pg_attribute
WHERE
attrelid = '$ident_name'::regclass AND
attnum < 0;
");
示例:使用public.t1替代其中的参数:$ident_name
SELECT ceil(current_setting('block_size')::real / sum(attlen))
FROM pg_catalog.pg_attribute
WHERE
attrelid = 'public.t1'::regclass AND
attnum < 0;
");
ceil
------
316
(1 row)has_triggers: 是否有触发器
相关SQL是:
"SELECT count(1) FROM pg_catalog.pg_trigger
WHERE
tgrelid = '$ident_name'::regclass AND
tgenabled IN ('A', 'R') AND
(tgtype & 16)::boolean"try_advisory_lock:尝试获得建议锁
SELECT pg_try_advisory_lock(
'pg_catalog.pg_class'::regclass::integer,
(quote_ident(?)||'.'||quote_ident(?))::regclass::integer)::integer;
"
Vacuum函数
用的SQL是:
PG12以上:
vacuum (ANALYZE, INDEX_CLEANUP ON) <target table>
如果PG12以前,则去掉INDEX_CLEANUP ON选项.
analyze函数
analyze {schema}.{table}
set_session_replication_role函数
set session_replication_role to replica;
create_clean_pages_function PGSQL函数
指定表、列,to_page, page_offset, 每页最大元组数,即可执行该函数。得到要clean的page.
CREATE OR REPLACE FUNCTION public.pgcompact_clean_pages_$$(
i_table_ident text,
i_column_ident text,
i_to_page integer,
i_page_offset integer,
i_max_tupples_per_page integer)
RETURNS integer
LANGUAGE plpgsql AS \$\$
DECLARE
_from_page integer := i_to_page - i_page_offset + 1;
_min_ctid tid;
_max_ctid tid;
_ctid_list tid[];
_next_ctid_list tid[];
_ctid tid;
_loop integer;
_result_page integer;
_update_query text :=
'UPDATE ONLY ' || i_table_ident ||
' SET ' || i_column_ident || ' = ' || i_column_ident ||
' WHERE ctid = ANY(\$1) RETURNING ctid';
BEGIN
-- Check page argument values
IF NOT (
i_page_offset IS NOT NULL AND i_page_offset >= 1 AND
i_to_page IS NOT NULL AND i_to_page >= 1 AND
i_to_page >= i_page_offset)
THEN
RAISE EXCEPTION 'Wrong page arguments specified.';
END IF;
-- Check that session_replication_role is set to replica to
-- prevent triggers firing
IF NOT (
SELECT setting = 'replica'
FROM pg_catalog.pg_settings
WHERE name = 'session_replication_role')
THEN
RAISE EXCEPTION 'The session_replication_role must be set to replica.';
END IF;
-- Define minimal and maximal ctid values of the range
_min_ctid := (_from_page, 1)::text::tid;
_max_ctid := (i_to_page, i_max_tupples_per_page)::text::tid;
-- Build a list of possible ctid values of the range
SELECT array_agg((pi, ti)::text::tid)
INTO _ctid_list
FROM generate_series(_from_page, i_to_page) AS pi
CROSS JOIN generate_series(1, i_max_tupples_per_page) AS ti;
<<_outer_loop>>
FOR _loop IN 1..i_max_tupples_per_page LOOP
_next_ctid_list := array[]::tid[];
-- Update all the tuples in the range
FOR _ctid IN EXECUTE _update_query USING _ctid_list
LOOP
IF _ctid > _max_ctid THEN
_result_page := -1;
EXIT _outer_loop;
ELSIF _ctid >= _min_ctid THEN
-- The tuple is still in the range, more updates are needed
_next_ctid_list := _next_ctid_list || _ctid;
END IF;
END LOOP;
_ctid_list := _next_ctid_list;
-- Finish processing if there are no tupples in the range left
IF coalesce(array_length(_ctid_list, 1), 0) = 0 THEN
_result_page := _from_page - 1;
EXIT _outer_loop;
END IF;
END LOOP;
-- No result
IF _loop = i_max_tupples_per_page AND _result_page IS NULL THEN
RAISE EXCEPTION
'Maximal loops count has been reached with no result.';
END IF;
RETURN _result_page;
END \$\$;
drop_clean_pages_function PGSQL函数
DROP FUNCTION public.pgcompact_clean_pages_$$(text, text,integer, integer, integer)
clean_pages函数
直接调用:pgcompact_clean_pages_$$, 获取结果。
get_index_data_list函数
里边用到了一个复杂的查询, 只需要参数schema和表名:
SELECT
indexname, tablespace, indexdef,
regexp_replace(indexdef, E'.* USING (\\\\w+) .*', E'\\\\1') AS indmethod,
conname,
CASE
WHEN contype = 'p' THEN 'PRIMARY KEY'
WHEN contype = 'u' THEN 'UNIQUE'
ELSE NULL END AS contypedef,
(
SELECT
bool_and(
deptype IN ('n', 'a', 'i') AND
NOT (refobjid = indexoid AND deptype = 'n') AND
NOT (
objid = indexoid AND deptype = 'i'"
. ($_dbh_server_version < 90100 ? " AND contype NOT IN ('p', 'u')":"") . "
))
FROM pg_catalog.pg_depend
LEFT JOIN pg_catalog.pg_constraint ON
pg_catalog.pg_constraint.oid = refobjid
WHERE
(objid = indexoid AND classid = pgclassid) OR
(refobjid = indexoid AND refclassid = pgclassid)
)::integer AS replace_index_possible,
(
SELECT string_to_array(indkey::text, ' ')::int2[] operator(pg_catalog.@>) array[0::int2]
FROM pg_catalog.pg_index
WHERE indexrelid = indexoid
)::integer as is_functional,
condeferrable as is_deferrable,
condeferred as is_deferred,
(contype = 'x') as is_exclude_constraint,
pg_catalog.pg_relation_size(indexoid) as idxsize
FROM (
SELECT i.relname AS indexname,
(SELECT spcname AS tablespace FROM pg_catalog.pg_tablespace WHERE oid = (
case when i.reltablespace != 0 then i.reltablespace else
(SELECT dattablespace
FROM pg_catalog.pg_database
WHERE datname = current_database() AND
spcname != current_setting('default_tablespace'))
end)
) as tablespace,
pg_get_indexdef(i.oid) AS indexdef,
i.oid as indexoid,
'pg_catalog.pg_class'::regclass AS pgclassid
FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class i ON i.oid = x.indexrelid
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'm', 'p') AND i.relkind IN ('i', 'I') AND
n.nspname = ? AND
c.relname = ?
) AS sq
LEFT JOIN pg_catalog.pg_constraint ON
conindid = indexoid AND contype IN ('p', 'u', 'x')
ORDER BY idxsize
get_index_size_statistics函数
统计索引大小:
SELECT size, ceil(size / bs) AS page_count
FROM (
SELECT
pg_catalog.pg_relation_size((quote_ident(?) || '.' || quote_ident(?))::regclass) AS size,
current_setting('block_size')::real AS bs
) AS sq
get_reindex_query函数
得到重建索引的语句
get_alter_index_query函数
生成alter index的查询语句. 中间有个rename过程。
get_straight_reindex_query函数
REINDEX INDEX (...)来生成reindex语句
get_index_bloat_stats函数:
得到索引膨胀的统计结果。相关的重要的SQL语句:
SELECT
CASE
WHEN avg_leaf_density = 'NaN' THEN 0
ELSE
round(
(100 * (1 - avg_leaf_density / fillfactor))::numeric, 2
)
END AS free_percent,
CASE
WHEN avg_leaf_density = 'NaN' THEN 0
ELSE
ceil(
index_size * (1 - avg_leaf_density / fillfactor)
)
END AS free_space
FROM (
SELECT
coalesce(
(
SELECT (
regexp_matches(
reloptions::text, E'.*fillfactor=(\\\\d+).*'))[1]),
'90')::real AS fillfactor,
pgsi.*
FROM pg_catalog.pg_class
CROSS JOIN $pgstattuple_schema_name.pgstatindex(
quote_ident(?) || '.' || quote_ident(?)) AS pgsi
WHERE pg_catalog.pg_class.oid = (quote_ident(?) || '.' || quote_ident(?))::regclass
) AS oq
这个可以直接拿到实际环境当中使用。
drop_temp_index函数
删除临时索引。
DROP INDEX CONCURRENTLY {schema}.{tablename}
get_pg_backend_pid函数
使用SQL:
select pg_backend_pid();
reindex_index_concurrently函数
调用的是:
REINDEX INDEX CONCURRENTLY {schema}.{index_name}
执行完之后,会调用get_index_size_statistics重新统计index信息。
reindex_index_old_replace函数
reindex_table函数
对目标表重建索引, 在统计信息判断,如果目标表需要进行reindex, 则执行此操作。
process函数
这个是主执行函数,具体流程如下:
try_advisory_lock,获取表的建议锁 (轻量级的)
get_size_stats, 拿到表的统计信息
执行一个vacuum
get_size_stats, 再次拿到统计信息
如果表只有一页或者是空页,则直接跳过。
对需要的表进行reindex. get_bloat_stats获取表的膨胀信息
判断表是否需要进行compact:
my $can_be_compacted = ($bloat_stats->{'free_percent'} > 0 && ($table_info->{stats}{page_count} > $bloat_stats->{effective_page_count}));
has_triggers如果表有trigger, 则直接跳过
main函数
基本是对process函数的调用。外围有很多判断。对每个进入处理的表,则依次会调用相应的process函数。
小结
虽然只是一个perl脚本文件,里边包含了很多实用的函数。并且其中用到的SQL脚本都是公开的。我们根据函数名,可以直接拿其中的SQL脚本,用到实际的环境当中。这是完全可以借鉴的。
参考资料
pgstattuple
: https://www.postgresql.org/docs/current/static/pgstattuple.html
往期导读:
1. PostgreSQL中配置单双向SSL连接详解
2. 提升PSQL使用技巧:PostgreSQL中PSQL使用技巧汇集(1)
3. 提升PSQL使用技巧:PostgreSQL中PSQL使用技巧汇集(2)
4. PostgreSQL SQL的基础使用及技巧
5. PostgreSQL开发技术基础:过程与函数
6. PostgreSQL中vacuum 物理文件truncate发生的条件
7. PostgreSQL中表的年龄与Vacuum的实验探索:Vacuum有大用
8. PostgreSQL利用分区表来弥补AutoVacuum的不足
9. 也聊聊PostgreSQL中的空间膨胀与AutoVacuum
10. 正确理解SAP BTP中hyperscaler PG中的IOPS (AWS篇)