pg_duckdb简介
https://github.com/duckdb/pg_duckdb
https://docs.pgduckdb.com/quickstart
https://hub.docker.com/r/pgduckdb/pgduckdb
pg_duckdb 是一个扩展模块,旨在将 PostgreSQL 和 DuckDB 结合使用,提供了一种在 PostgreSQL 中访问 DuckDB 功能的途径。DuckDB 是一种面向分析工作负载优化的嵌入式列存储数据库,其特点是高性能、开箱即用且支持多种分析功能。
pg_duckdb 是一种强大的扩展,特别适合数据分析、外部数据集成以及需要高性能查询的场景。通过它,用户可以在 PostgreSQL 的生态中轻松引入 DuckDB 的能力,进一步扩展数据库的应用范围。
核心功能
嵌入式集成:pg_duckdb 在 PostgreSQL 中嵌入了 DuckDB 的功能,使用户可以利用 DuckDB 的列存分析引擎处理复杂的分析查询。
无缝访问:通过 pg_duckdb,用户可以在 PostgreSQL 中直接查询 DuckDB 表,而无需离开 PostgreSQL 环境。
高性能分析:DuckDB 优化了分析场景中的列式存储和向量化计算,pg_duckdb 可以充分利用这些特性。
多格式支持:DuckDB 支持查询多种外部数据格式(如 Parquet、CSV 等),通过 pg_duckdb,用户可以直接从这些格式中读取数据。
优势
提升查询性能:在分析场景中,利用 DuckDB 的列存优势,可以显著提升查询性能。
节省存储资源:DuckDB 的列存设计在处理高基数列时具有更高的压缩效率。
数据格式兼容性:支持直接加载和查询 Parquet、Arrow 等流行的数据格式,无需额外的转换。
使用场景
数据分析和聚合: pg_duckdb 适用于处理大量的分析查询,特别是涉及复杂聚合和多表联接的场景。
外部文件查询: 通过 pg_duckdb 可以直接查询 CSV、Parquet 等文件,而不需要将文件数据加载到 PostgreSQL 中。
加速 BI 工具查询: 对于与 BI 工具集成的 PostgreSQL 数据库,pg_duckdb 可以显著提高查询响应速度。
示例查询
1-- Remote Parquet scans:
2SELECT count(*) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet') AS (o_orderkey int);
3SELECT avg(c_acctbal) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet') AS (c_acctbal float);
4SELECT count(*)::int as aws_service_cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-iam-data/main/data/parquet/aws_services.parquet') AS (service_id int);
5SELECT code, city, state FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-edge-locations/main/data/aws-edge-locations.parquet') AS (code text, city text, state text);
6SELECT cloud_provider, sum(ip_address_cnt)::int as cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.parquet') AS (cloud_provider text, ip_address_cnt int) GROUP BY cloud_provider;
7
8-- Remote CSV scan
9SELECT * FROM read_csv('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv') AS (cloud_provider text, cidr_block text, ip_address text) limit 30;
pg_duckdb安装
docker安装使用
1docker pull pgduckdb/pgduckdb:15-main
2docker pull pgduckdb/pgduckdb:16-main
3docker pull pgduckdb/pgduckdb:17-main
4
5docker run -d --name pgduckdb17 -h pgduckdb15 -p 5632:5432 -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:15-main
6docker run -d --name pgduckdb17 -h pgduckdb16 -p 5633:5432 -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:16-main
7docker run -d --name pgduckdb17 -h pgduckdb17 -p 5634:5432 -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:17-main
8
9
10psql postgres://postgres:lhr@127.0.0.1:5634/postgres
11
12
13# Or if using docker compose
14git clone https://github.com/duckdb/pg_duckdb && cd pg_duckdb && docker compose up -d
15
16docker compose exec db psql
编译安装
To build pg_duckdb, you need:
Postgres 15-17
Ubuntu 22.04-24.04 or MacOS
Standard set of build tools for building Postgres extensions
Build tools that are required to build DuckDB :
1sudo yum install -y git g++ cmake ninja-build openssl-devel
2
3-- CMake 3.5...3.29 or higher is required
4
5git clone https://github.com/duckdb/pg_duckdb
6cd pg_duckdb
7make install -j 16
8
9
10-- 主要文件
11/pg17/pg17/lib/postgresql/pg_duckdb.so
12/pg17/pg17/lib/postgresql/libduckdb.so
13/pg17/pg17/share/postgresql/extension/pg_duckdb--0.1.0--0.2.0.sql
14/pg17/pg17/share/postgresql/extension/pg_duckdb--0.1.0.sql
15/pg17/pg17/share/postgresql/extension/pg_duckdb.control
16
17
18alter system set shared_preload_libraries = 'pg_duckdb';
19CREATE EXTENSION pg_duckdb;
20SET duckdb.force_execution = true;
21
22
23-- Ubuntu 安装
24curl -sfL https://install.pgx.sh | sh -
25pgxman install pg_duckdb
26shared_preload_libraries = 'pg_duckdb'
27CREATE EXTENSION pg_duckdb;
28SET duckdb.force_execution = true;
大表测试性能
11. 安装duckdb、生成数据、导出数据
2wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip
3unzip duckdb_cli-linux-amd64.zip
4./duckdb
5install tpcds;
6load tpcds;
7-- 生成规模因子为1的测试数据,即1GB左右数据
8call dsdgen(sf=1);
9-- 导出数据为csv格式,并且用“|”分隔
10export database 'public' (format csv, delimiter '|');
11.quit
12
13
142. 导入数据到PG数据库中
15sed -i 's/COPY/\\copy/' public/load.sql
16createdb testduckdb
17psql -d testduckdb -f public/schema.sql
18psql -d testduckdb -f public/load.sql
19
20
21
22testduckdb=# \dt+
23 List of relations
24 Schema | Name | Type | Owner | Persistence | Access method | Size | Description
25--------+------------------------+-------+----------+-------------+---------------+------------+-------------
26 public | call_center | table | postgres | permanent | heap | 16 kB |
27 public | catalog_page | table | postgres | permanent | heap | 1976 kB |
28 public | catalog_returns | table | postgres | permanent | heap | 23 MB |
29 public | catalog_sales | table | postgres | permanent | heap | 293 MB |
30 public | customer | table | postgres | permanent | heap | 15 MB |
31 public | customer_address | table | postgres | permanent | heap | 7400 kB |
32 public | customer_demographics | table | postgres | permanent | heap | 139 MB |
33 public | date_dim | table | postgres | permanent | heap | 11 MB |
34 public | household_demographics | table | postgres | permanent | heap | 440 kB |
35 public | income_band | table | postgres | permanent | heap | 8192 bytes |
36 public | inventory | table | postgres | permanent | heap | 496 MB |
37 public | item | table | postgres | permanent | heap | 5800 kB |
38 public | promotion | table | postgres | permanent | heap | 80 kB |
39 public | reason | table | postgres | permanent | heap | 16 kB |
40 public | ship_mode | table | postgres | permanent | heap | 16 kB |
41 public | store | table | postgres | permanent | heap | 16 kB |
42 public | store_returns | table | postgres | permanent | heap | 38 MB |
43 public | store_sales | table | postgres | permanent | heap | 405 MB |
44 public | time_dim | table | postgres | permanent | heap | 8424 kB |
45 public | warehouse | table | postgres | permanent | heap | 16 kB |
46 public | web_page | table | postgres | permanent | heap | 16 kB |
47 public | web_returns | table | postgres | permanent | heap | 11 MB |
48 public | web_sales | table | postgres | permanent | heap | 147 MB |
49 public | web_site | table | postgres | permanent | heap | 48 kB |
50(24 rows)
做查询操作,测试性能:
不使用pg_duckdb插件的情况下:
1[pg17@lhrpgalloe soft]$ psql -d testduckdb
2psql (17.1)
3Type "help" for help.
4
5testduckdb=# \timing on
6Timing is on.
7testduckdb=# \o /dev/null
8testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/01.sql
9Time: 145934.025 ms (02:25.934)
10testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/02.sql
11Time: 494.014 ms
12testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/03.sql
13Time: 134.083 ms
14testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql
15
16
17^CCancel request sent
18psql:pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql:119: ERROR: canceling statement due to user request
19Time: 2062108.995 ms (34:22.109)
20
21-- ... 00:25:15 未出结果
22testduckdb=#
23testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/05.sql
24Time: 575.717 ms
25testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/06.sql
26Time: 50091.984 ms (00:50.092)
27testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
28Time: 372.611 ms
29testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/08.sql
30Time: 199.187 ms
31testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/09.sql
32Time: 1467.471 ms (00:01.467)
33testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/10.sql
34Time: 5629.187 ms (00:05.629)
35testduckdb=#
36
37
38testduckdb=# explain WITH customer_total_return AS
39testduckdb-# (SELECT sr_customer_sk AS ctr_customer_sk,
40testduckdb(# sr_store_sk AS ctr_store_sk,
41testduckdb(# sum(sr_return_amt) AS ctr_total_return
42testduckdb(# FROM store_returns,
43testduckdb(# date_dim
44testduckdb(# WHERE sr_returned_date_sk = d_date_sk
45testduckdb(# AND d_year = 2000
46testduckdb(# GROUP BY sr_customer_sk,
47testduckdb(# sr_store_sk)
48testduckdb-# SELECT c_customer_id
49testduckdb-# FROM customer_total_return ctr1,
50testduckdb-# store,
51testduckdb-# customer
52testduckdb-# WHERE ctr1.ctr_total_return >
53testduckdb-# (SELECT avg(ctr_total_return)*1.2
54testduckdb(# FROM customer_total_return ctr2
55testduckdb(# WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
56testduckdb-# AND s_store_sk = ctr1.ctr_store_sk
57testduckdb-# AND s_state = 'TN'
58testduckdb-# AND ctr1.ctr_customer_sk = c_customer_sk
59testduckdb-# ORDER BY c_customer_id
60testduckdb-# LIMIT 100;
61 QUERY PLAN
62------------------------------------------------------------------------------------------------------------------------
63 Limit (cost=55885.77..55885.78 rows=2 width=17)
64 CTE customer_total_return
65 -> Finalize GroupAggregate (cost=9458.80..9633.24 rows=1381 width=40)
66 Group Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
67 -> Gather Merge (cost=9458.80..9604.48 rows=1150 width=40)
68 Workers Planned: 2
69 -> Partial GroupAggregate (cost=8458.78..8471.71 rows=575 width=40)
70 Group Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
71 -> Sort (cost=8458.78..8460.21 rows=575 width=14)
72 Sort Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
73 -> Parallel Hash Join (cost=1963.80..8432.42 rows=575 width=14)
74 Hash Cond: (store_returns.sr_returned_date_sk = date_dim.d_date_sk)
75 -> Parallel Seq Scan on store_returns (cost=0.00..6015.45 rows=119945 width=18)
76 -> Parallel Hash (cost=1961.12..1961.12 rows=214 width=4)
77 -> Parallel Seq Scan on date_dim (cost=0.00..1961.12 rows=214 width=4)
78 Filter: (d_year = 2000)
79 -> Sort (cost=46252.54..46252.54 rows=2 width=17)
80 Sort Key: customer.c_customer_id
81 -> Hash Join (cost=43005.51..46252.53 rows=2 width=17)
82 Hash Cond: (customer.c_customer_sk = ctr1.ctr_customer_sk)
83 -> Seq Scan on customer (cost=0.00..2872.00 rows=100000 width=21)
84 -> Hash (cost=43005.48..43005.48 rows=2 width=4)
85 -> Nested Loop (cost=0.00..43005.48 rows=2 width=4)
86 Join Filter: (ctr1.ctr_store_sk = store.s_store_sk)
87 -> CTE Scan on customer_total_return ctr1 (cost=0.00..42987.08 rows=460 width=8)
88 Filter: (ctr_total_return > (SubPlan 2))
89 SubPlan 2
90 -> Aggregate (cost=31.09..31.11 rows=1 width=32)
91 -> CTE Scan on customer_total_return ctr2 (cost=0.00..31.07 rows=7 width=32)
92 Filter: (ctr1.ctr_store_sk = ctr_store_sk)
93 -> Materialize (cost=0.00..11.51 rows=1 width=4)
94 -> Seq Scan on store (cost=0.00..11.50 rows=1 width=4)
95 Filter: ((s_state)::text = 'TN'::text)
96(33 rows)
97
98Time: 26.596 ms
99
使用pg_duckdb插件的情况下:
1[pg17@lhrpgalloe soft]$ psql -d testduckdb
2psql (17.1)
3Type "help" for help.
4
5testduckdb=# \timing on
6Timing is on.
7testduckdb=# \o /dev/null
8testduckdb=# SET duckdb.force_execution = true;
9Time: 0.922 ms
10testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/01.sql
11Time: 217.649 ms
12testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/02.sql
13Time: 824.615 ms
14testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/03.sql
15Time: 808.953 ms
16testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql
17Time: 2618.012 ms (00:02.618)
18testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/05.sql
19Time: 1918.740 ms (00:01.919)
20testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/06.sql
21Time: 450.428 ms
22testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
23Time: 840.963 ms
24testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
25Time: 760.951 ms
26testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/08.sql
27Time: 428.280 ms
28testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/09.sql
29Time: 9708.039 ms (00:09.708)
30testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/10.sql
31Time: 1495.543 ms (00:01.496)
32testduckdb=#
33
34testduckdb=# explain WITH customer_total_return AS
35testduckdb-# (SELECT sr_customer_sk AS ctr_customer_sk,
36testduckdb(# sr_store_sk AS ctr_store_sk,
37testduckdb(# sum(sr_return_amt) AS ctr_total_return
38testduckdb(# FROM store_returns,
39testduckdb(# date_dim
40testduckdb(# WHERE sr_returned_date_sk = d_date_sk
41testduckdb(# AND d_year = 2000
42testduckdb(# GROUP BY sr_customer_sk,
43testduckdb(# sr_store_sk)
44testduckdb-# SELECT c_customer_id
45testduckdb-# FROM customer_total_return ctr1,
46testduckdb-# store,
47testduckdb-# customer
48testduckdb-# WHERE ctr1.ctr_total_return >
49testduckdb-# (SELECT avg(ctr_total_return)*1.2
50testduckdb(# FROM customer_total_return ctr2
51testduckdb(# WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
52testduckdb-# AND s_store_sk = ctr1.ctr_store_sk
53testduckdb-# AND s_state = 'TN'
54testduckdb-# AND ctr1.ctr_customer_sk = c_customer_sk
55testduckdb-# ORDER BY c_customer_id
56testduckdb-# LIMIT 100;
57 QUERY PLAN
58------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
59 Custom Scan (DuckDBScan) (cost=0.00..0.00 rows=0 width=0)
60 DuckDB Execution Plan:
61
62 ┌───────────────────────────┐
63 │ TOP_N │
64 │ ──────────────────── │
65 │ Top: 100 │
66 │ │
67 │ Order By: │
68 │ customer.c_customer_id ASC│
69 └─────────────┬─────────────┘
70 ┌─────────────┴─────────────┐
71 │ PROJECTION │
72 │ ──────────────────── │
73 │ c_customer_id │
74 │ │
75 │ ~28785 Rows │
76 └─────────────┬─────────────┘
77 ┌─────────────┴─────────────┐
78 │ CTE │
79 │ ──────────────────── │
80 │ CTE Name: │
81 │ customer_total_return ├───────────────────────────────────────────┐
82 │ │ │
83 │ Table Index: 0 │ │
84 └─────────────┬─────────────┘ │
85 ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
86 │ HASH_GROUP_BY │ │ FILTER │
87 │ ──────────────────── │ │ ──────────────────── │
88 │ Groups: │ │ (CAST(ctr_total_return AS │
89 │ #0 │ │ DOUBLE) > SUBQUERY) │
90 │ #1 │ │ │
91 │ │ │ │
92 │ Aggregates: sum(#2) │ │ │
93 │ │ │ │
94 │ ~28785 Rows │ │ ~5757 Rows │
95 └─────────────┬─────────────┘ └─────────────┬─────────────┘
96 ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
97 │ PROJECTION │ │ RIGHT_DELIM_JOIN │
98 │ ──────────────────── │ │ ──────────────────── │
99 │ sr_customer_sk │ │ Join Type: RIGHT │
100 │ sr_store_sk │ │ │
101 │ sr_return_amt │ │ Conditions: ├────────────────────────────────────────────────────────────────────────┐
102 │ │ │ ctr_store_sk IS NOT │ │
103 │ │ │ DISTINCT FROM ctr_store_sk│ │
104 │ │ │ │ │
105 │ ~57570 Rows │ │ ~5757 Rows │ │
106 └─────────────┬─────────────┘ └─────────────┬─────────────┘ │
107 ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
108 │ HASH_JOIN │ │ HASH_JOIN │ │ HASH_JOIN │
109 │ ──────────────────── │ │ ──────────────────── │ │ ──────────────────── │
110 │ Join Type: INNER │ │ Join Type: INNER │ │ Join Type: RIGHT │
111 │ │ │ │ │ │
112 │ Conditions: ├──────────────┐ │ Conditions: ├──────────────┐ │ Conditions: ├───────────────────────────────────────────┐
113 │ sr_returned_date_sk = │ │ │ c_customer_sk = │ │ │ ctr_store_sk IS NOT │ │
114 │ d_date_sk │ │ │ ctr_customer_sk │ │ │ DISTINCT FROM ctr_store_sk│ │
115 │ │ │ │ │ │ │ │ │
116 │ ~57570 Rows │ │ │ ~5757 Rows │ │ │ ~5757 Rows │ │
117 └─────────────┬─────────────┘ │ └─────────────┬─────────────┘ │ └─────────────┬─────────────┘ │
118 ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
119 │ POSTGRES_SEQ_SCAN ││ POSTGRES_SEQ_SCAN ││ POSTGRES_SEQ_SCAN ││ HASH_JOIN │ │ PROJECTION │ │ DUMMY_SCAN │
120 │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ ──────────────────── │ │ │
121 │ Function: ││ Function: ││ Function: ││ Join Type: INNER │ │ (avg(ctr_total_return) * 1│ │ │
122 │ POSTGRES_SEQ_SCAN ││ POSTGRES_SEQ_SCAN ││ POSTGRES_SEQ_SCAN ││ │ │ .2) │ │ │
123 │ ││ ││ ││ Conditions: │ │ ctr_store_sk │ │ │
124 │ Projections: ││ Projections: ││ ││ ctr_store_sk = s_store_sk │ │ │ │ │
125 │ sr_returned_date_sk ││ d_date_sk ││ ││ ├──────────────┐ │ │ │ │
126 │ sr_customer_sk ││ ││ ││ │ │ │ │ │ │
127 │ sr_store_sk ││ Filters: ││ ││ │ │ │ │ │ │
128 │ sr_return_amt ││ d_year=2000 AND d_year IS ││ ││ │ │ │ │ │ │
129 │ ││ NOT NULL ││ ││ │ │ │ │ │ │
130 │ ││ ││ ││ │ │ │ │ │ │
131 │ ~287867 Rows ││ ~14609 Rows ││ ~100000 Rows ││ ~5757 Rows │ │ │ ~143 Rows │ │ │
132 └───────────────────────────┘└───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘ │ └─────────────┬─────────────┘ └───────────────────────────┘
133 ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
134 │ CTE_SCAN ││ POSTGRES_SEQ_SCAN ││ HASH_GROUP_BY │
135 │ ──────────────────── ││ ──────────────────── ││ ──────────────────── │
136 │ CTE Index: 0 ││ Function: ││ Groups: #0 │
137 │ ││ POSTGRES_SEQ_SCAN ││ Aggregates: avg(#1) │
138 │ ││ ││ │
139 │ ││ Filters: ││ │
140 │ ││ s_state='TN' AND s_state ││ │
141 │ ││ IS NOT NULL ││ │
142 │ ││ ││ │
143 │ ~28785 Rows ││ ~24 Rows ││ ~143 Rows │
144 └───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘
145 ┌─────────────┴─────────────┐
146 │ PROJECTION │
147 │ ──────────────────── │
148 │ ctr_store_sk │
149 │ ctr_total_return │
150 │ │
151 │ ~287 Rows │
152 └─────────────┬─────────────┘
153 ┌─────────────┴─────────────┐
154 │ HASH_JOIN │
155 │ ──────────────────── │
156 │ Join Type: INNER │
157 │ │
158 │ Conditions: ├──────────────┐
159 │ctr_store_sk = ctr_store_sk│ │
160 │ │ │
161 │ ~287 Rows │ │
162 └─────────────┬─────────────┘ │
163 ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
164 │ CTE_SCAN ││ DELIM_SCAN │
165 │ ──────────────────── ││ ──────────────────── │
166 │ CTE Index: 0 ││ Delim Index: 1 │
167 │ ││ │
168 │ ~28785 Rows ││ ~2878 Rows │
169 └───────────────────────────┘└───────────────────────────┘
170
171
172(113 rows)
173
测试结果
震惊,在未使用pg_duckdb插件的情况下,sql1的时间为145秒,使用了pg_duckdb插件后变为了0.2秒;sql4的时间为33分钟+ 未出结果秒,使用了pg_duckdb插件后变为了2.6秒。
脚本 | 未使用pg_duckdb | 使用了pg_duckdb |
---|---|---|
01.sql | 145934.025 ms | 217.649 ms |
02sql | 494.014 ms | 824.615 ms |
03.sql | 134.083 ms | 808.953 ms |
04.sql | 33分钟+ 未出结果 | 2618.012 ms |
05.sql | 575.717 ms | 1918.740 ms |
06.sql | 50091.984 ms | 450.428 ms |
07.sql | 372.611 ms | 760.951 ms |
08.sql | 199.187 ms | 428.280 ms |
09.sql | 1467.471 ms | 9708.039 ms |
10.sql | 5629.187 ms | 1495.543 ms |
04.sql未跑出结果:
总结
1、目前仅支持Postgres 15-17、Ubuntu 22.04-24.04 or MacOS,或使用docker安装使用较快
2、在openEuler 22.03也可以编译成功(本文使用openEuler 22.03)
3、pg_duckdb编译需要CMake 3.5…3.29 or higher (cmake --version
)
4、可以在会话级别配置当前SQL是否使用pg_duckdb插件:
1\timing on
2\o /dev/null
3
4
5SET duckdb.force_execution = 0;
6\i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/22.sql
7
8
9SET duckdb.force_execution = 1;
10\i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/22.sql
5、在实际使用场景中,可以优先考虑PG的索引只扫描+表并行;若性能依然差,则可以考虑安装使用pg_duckdb
参考
https://mp.weixin.qq.com/s/ljT4Ubt8eEgNaV84Jfv9Dg
https://pigsty.io/zh/blog/pg/pg-duckdb/
https://pgxman.com/x/pg_duckdb
https://cloud.tencent.com/developer/article/2439084