性能突破极限!在 openEuler22上使用 pg_duckdb + PostgreSQL 17 执行速度提升不止1000倍

教育   2024-11-28 20:36   陕西  

pg_duckdb简介

https://github.com/duckdb/pg_duckdb

https://docs.pgduckdb.com/quickstart

https://hub.docker.com/r/pgduckdb/pgduckdb

pg_duckdb 是一个扩展模块,旨在将 PostgreSQL 和 DuckDB 结合使用,提供了一种在 PostgreSQL 中访问 DuckDB 功能的途径。DuckDB 是一种面向分析工作负载优化的嵌入式列存储数据库,其特点是高性能、开箱即用且支持多种分析功能。

pg_duckdb 是一种强大的扩展,特别适合数据分析、外部数据集成以及需要高性能查询的场景。通过它,用户可以在 PostgreSQL 的生态中轻松引入 DuckDB 的能力,进一步扩展数据库的应用范围。

核心功能

  • 嵌入式集成:pg_duckdb 在 PostgreSQL 中嵌入了 DuckDB 的功能,使用户可以利用 DuckDB 的列存分析引擎处理复杂的分析查询。

  • 无缝访问:通过 pg_duckdb,用户可以在 PostgreSQL 中直接查询 DuckDB 表,而无需离开 PostgreSQL 环境。

  • 高性能分析:DuckDB 优化了分析场景中的列式存储和向量化计算,pg_duckdb 可以充分利用这些特性。

  • 多格式支持:DuckDB 支持查询多种外部数据格式(如 Parquet、CSV 等),通过 pg_duckdb,用户可以直接从这些格式中读取数据。

优势

  • 提升查询性能:在分析场景中,利用 DuckDB 的列存优势,可以显著提升查询性能。

  • 节省存储资源:DuckDB 的列存设计在处理高基数列时具有更高的压缩效率。

  • 数据格式兼容性:支持直接加载和查询 Parquet、Arrow 等流行的数据格式,无需额外的转换。

使用场景

  • 数据分析和聚合: pg_duckdb 适用于处理大量的分析查询,特别是涉及复杂聚合和多表联接的场景。

  • 外部文件查询: 通过 pg_duckdb 可以直接查询 CSV、Parquet 等文件,而不需要将文件数据加载到 PostgreSQL 中。

  • 加速 BI 工具查询: 对于与 BI 工具集成的 PostgreSQL 数据库,pg_duckdb 可以显著提高查询响应速度。

示例查询

1-- Remote Parquet scans:
2SELECT count(*) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet'AS (o_orderkey int);
3SELECT avg(c_acctbal) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet'AS (c_acctbal float);
4SELECT count(*)::int as aws_service_cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-iam-data/main/data/parquet/aws_services.parquet'AS (service_id int);
5SELECT code, city, state FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-edge-locations/main/data/aws-edge-locations.parquet'AS (code text, city text, state text);
6SELECT cloud_provider, sum(ip_address_cnt)::int as cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.parquet'AS (cloud_provider text, ip_address_cnt intGROUP BY cloud_provider;
7
8-- Remote CSV scan
9SELECT * FROM read_csv('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv'AS (cloud_provider text, cidr_block text, ip_address textlimit 30;

pg_duckdb安装

docker安装使用

 1docker pull pgduckdb/pgduckdb:15-main
2docker pull pgduckdb/pgduckdb:16-main
3docker pull pgduckdb/pgduckdb:17-main
4
5docker run -d --name pgduckdb17 -h pgduckdb15 -p 5632:5432  -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:15-main
6docker run -d --name pgduckdb17 -h pgduckdb16 -p 5633:5432  -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:16-main
7docker run -d --name pgduckdb17 -h pgduckdb17 -p 5634:5432  -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:17-main
8
9
10psql postgres://postgres:lhr@127.0.0.1:5634/postgres
11
12
13# Or if using docker compose
14git clone https://github.com/duckdb/pg_duckdb && cd pg_duckdb && docker compose up -d
15
16docker compose exec db psql

编译安装

To build pg_duckdb, you need:

  • Postgres 15-17

  • Ubuntu 22.04-24.04 or MacOS

  • Standard set of build tools for building Postgres extensions

  • Build tools that are required to build DuckDB :

 1sudo yum install -y git g++ cmake ninja-build openssl-devel
2
3--  CMake 3.5...3.29 or higher is required
4
5git clone https://github.com/duckdb/pg_duckdb
6cd pg_duckdb
7make install -j 16
8
9
10-- 主要文件
11/pg17/pg17/lib/postgresql/pg_duckdb.so
12/pg17/pg17/lib/postgresql/libduckdb.so
13/pg17/pg17/share/postgresql/extension/pg_duckdb--0.1.0--0.2.0.sql
14/pg17/pg17/share/postgresql/extension/pg_duckdb--0.1.0.sql
15/pg17/pg17/share/postgresql/extension/pg_duckdb.control
16
17
18alter system set shared_preload_libraries = 'pg_duckdb';
19CREATE EXTENSION pg_duckdb;
20SET duckdb.force_execution = true;
21
22
23-- Ubuntu 安装
24curl -sfL https://install.pgx.sh | sh -
25pgxman install pg_duckdb
26shared_preload_libraries = 'pg_duckdb'
27CREATE EXTENSION pg_duckdb;
28SET duckdb.force_execution = true;

大表测试性能

 11. 安装duckdb、生成数据、导出数据
2wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip
3unzip duckdb_cli-linux-amd64.zip
4./duckdb
5install tpcds;
6load tpcds;
7-- 生成规模因子为1的测试数据,即1GB左右数据
8call dsdgen(sf=1);
9-- 导出数据为csv格式,并且用“|”分隔
10export database 'public' (format csv, delimiter '|');
11.quit
12
13
142. 导入数据到PG数据库中
15sed -i 's/COPY/\\copy/' public/load.sql
16createdb testduckdb
17psql -d testduckdb -f public/schema.sql
18psql -d testduckdb -f public/load.sql
19
20
21
22testduckdb=# \dt+
23                                              List of relations
24 Schema |          Name          | Type  |  Owner   | Persistence | Access method |    Size    | Description 
25--------+------------------------+-------+----------+-------------+---------------+------------+-------------
26 public | call_center            | table | postgres | permanent   | heap          | 16 kB      | 
27 public | catalog_page           | table | postgres | permanent   | heap          | 1976 kB    | 
28 public | catalog_returns        | table | postgres | permanent   | heap          | 23 MB      | 
29 public | catalog_sales          | table | postgres | permanent   | heap          | 293 MB     | 
30 public | customer               | table | postgres | permanent   | heap          | 15 MB      | 
31 public | customer_address       | table | postgres | permanent   | heap          | 7400 kB    | 
32 public | customer_demographics  | table | postgres | permanent   | heap          | 139 MB     | 
33 public | date_dim               | table | postgres | permanent   | heap          | 11 MB      | 
34 public | household_demographics | table | postgres | permanent   | heap          | 440 kB     | 
35 public | income_band            | table | postgres | permanent   | heap          | 8192 bytes | 
36 public | inventory              | table | postgres | permanent   | heap          | 496 MB     | 
37 public | item                   | table | postgres | permanent   | heap          | 5800 kB    | 
38 public | promotion              | table | postgres | permanent   | heap          | 80 kB      | 
39 public | reason                 | table | postgres | permanent   | heap          | 16 kB      | 
40 public | ship_mode              | table | postgres | permanent   | heap          | 16 kB      | 
41 public | store                  | table | postgres | permanent   | heap          | 16 kB      | 
42 public | store_returns          | table | postgres | permanent   | heap          | 38 MB      | 
43 public | store_sales            | table | postgres | permanent   | heap          | 405 MB     | 
44 public | time_dim               | table | postgres | permanent   | heap          | 8424 kB    | 
45 public | warehouse              | table | postgres | permanent   | heap          | 16 kB      | 
46 public | web_page               | table | postgres | permanent   | heap          | 16 kB      | 
47 public | web_returns            | table | postgres | permanent   | heap          | 11 MB      | 
48 public | web_sales              | table | postgres | permanent   | heap          | 147 MB     | 
49 public | web_site               | table | postgres | permanent   | heap          | 48 kB      | 
50(24 rows)

做查询操作,测试性能:

不使用pg_duckdb插件的情况下:

 1[pg17@lhrpgalloe soft]$ psql -d testduckdb
2psql (17.1)
3Type "help" for help.
4
5testduckdb=# \timing on
6Timing is on.
7testduckdb=# \o /dev/null
8testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/01.sql
9Time: 145934.025 ms (02:25.934)
10testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/02.sql
11Time: 494.014 ms
12testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/03.sql
13Time: 134.083 ms
14testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql
15
16
17^CCancel request sent
18psql:pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql:119: ERROR:  canceling statement due to user request
19Time: 2062108.995 ms (34:22.109)
20
21-- ... 00:25:15 未出结果
22testduckdb=# 
23testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/05.sql
24Time: 575.717 ms
25testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/06.sql
26Time: 50091.984 ms (00:50.092)
27testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
28Time: 372.611 ms
29testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/08.sql
30Time: 199.187 ms
31testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/09.sql
32Time: 1467.471 ms (00:01.467)
33testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/10.sql
34Time: 5629.187 ms (00:05.629)
35testduckdb=# 
36
37
38testduckdb=# explain WITH customer_total_return AS
39testduckdb-#   (SELECT sr_customer_sk AS ctr_customer_sk,
40testduckdb(#           sr_store_sk AS ctr_store_sk,
41testduckdb(#           sum(sr_return_amt) AS ctr_total_return
42testduckdb(#    FROM store_returns,
43testduckdb(#         date_dim
44testduckdb(#    WHERE sr_returned_date_sk = d_date_sk
45testduckdb(#      AND d_year = 2000
46testduckdb(#    GROUP BY sr_customer_sk,
47testduckdb(#             sr_store_sk)
48testduckdb-# SELECT c_customer_id
49testduckdb-# FROM customer_total_return ctr1,
50testduckdb-#      store,
51testduckdb-#      customer
52testduckdb-# WHERE ctr1.ctr_total_return >
53testduckdb-#     (SELECT avg(ctr_total_return)*1.2
54testduckdb(#      FROM customer_total_return ctr2
55testduckdb(#      WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
56testduckdb-#   AND s_store_sk = ctr1.ctr_store_sk
57testduckdb-#   AND s_state = 'TN'
58testduckdb-#   AND ctr1.ctr_customer_sk = c_customer_sk
59testduckdb-# ORDER BY c_customer_id
60testduckdb-# LIMIT 100;
61                                                       QUERY PLAN                                                       
62------------------------------------------------------------------------------------------------------------------------
63 Limit  (cost=55885.77..55885.78 rows=2 width=17)
64   CTE customer_total_return
65     ->  Finalize GroupAggregate  (cost=9458.80..9633.24 rows=1381 width=40)
66           Group Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
67           ->  Gather Merge  (cost=9458.80..9604.48 rows=1150 width=40)
68                 Workers Planned: 2
69                 ->  Partial GroupAggregate  (cost=8458.78..8471.71 rows=575 width=40)
70                       Group Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
71                       ->  Sort  (cost=8458.78..8460.21 rows=575 width=14)
72                             Sort Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
73                             ->  Parallel Hash Join  (cost=1963.80..8432.42 rows=575 width=14)
74                                   Hash Cond: (store_returns.sr_returned_date_sk = date_dim.d_date_sk)
75                                   ->  Parallel Seq Scan on store_returns  (cost=0.00..6015.45 rows=119945 width=18)
76                                   ->  Parallel Hash  (cost=1961.12..1961.12 rows=214 width=4)
77                                         ->  Parallel Seq Scan on date_dim  (cost=0.00..1961.12 rows=214 width=4)
78                                               Filter: (d_year = 2000)
79   ->  Sort  (cost=46252.54..46252.54 rows=2 width=17)
80         Sort Key: customer.c_customer_id
81         ->  Hash Join  (cost=43005.51..46252.53 rows=2 width=17)
82               Hash Cond: (customer.c_customer_sk = ctr1.ctr_customer_sk)
83               ->  Seq Scan on customer  (cost=0.00..2872.00 rows=100000 width=21)
84               ->  Hash  (cost=43005.48..43005.48 rows=2 width=4)
85                     ->  Nested Loop  (cost=0.00..43005.48 rows=2 width=4)
86                           Join Filter: (ctr1.ctr_store_sk = store.s_store_sk)
87                           ->  CTE Scan on customer_total_return ctr1  (cost=0.00..42987.08 rows=460 width=8)
88                                 Filter: (ctr_total_return > (SubPlan 2))
89                                 SubPlan 2
90                                   ->  Aggregate  (cost=31.09..31.11 rows=1 width=32)
91                                         ->  CTE Scan on customer_total_return ctr2  (cost=0.00..31.07 rows=7 width=32)
92                                               Filter: (ctr1.ctr_store_sk = ctr_store_sk)
93                           ->  Materialize  (cost=0.00..11.51 rows=1 width=4)
94                                 ->  Seq Scan on store  (cost=0.00..11.50 rows=1 width=4)
95                                       Filter: ((s_state)::text = 'TN'::text)
96(33 rows)
97
98Time: 26.596 ms
99

使用pg_duckdb插件的情况下:

  1[pg17@lhrpgalloe soft]$ psql -d testduckdb 
2psql (17.1)
3Type "help" for help.
4
5testduckdb=# \timing on
6Timing is on.
7testduckdb=#  \o /dev/null
8testduckdb=# SET duckdb.force_execution = true;
9Time: 0.922 ms
10testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/01.sql
11Time: 217.649 ms
12testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/02.sql
13Time: 824.615 ms
14testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/03.sql
15Time: 808.953 ms
16testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql
17Time: 2618.012 ms (00:02.618)
18testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/05.sql
19Time: 1918.740 ms (00:01.919)
20testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/06.sql
21Time: 450.428 ms
22testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
23Time: 840.963 ms
24testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
25Time: 760.951 ms
26testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/08.sql
27Time: 428.280 ms
28testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/09.sql
29Time: 9708.039 ms (00:09.708)
30testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/10.sql
31Time: 1495.543 ms (00:01.496)
32testduckdb=# 
33
34testduckdb=# explain WITH customer_total_return AS
35testduckdb-#   (SELECT sr_customer_sk AS ctr_customer_sk,
36testduckdb(#           sr_store_sk AS ctr_store_sk,
37testduckdb(#           sum(sr_return_amt) AS ctr_total_return
38testduckdb(#    FROM store_returns,
39testduckdb(#         date_dim
40testduckdb(#    WHERE sr_returned_date_sk = d_date_sk
41testduckdb(#      AND d_year = 2000
42testduckdb(#    GROUP BY sr_customer_sk,
43testduckdb(#             sr_store_sk)
44testduckdb-# SELECT c_customer_id
45testduckdb-# FROM customer_total_return ctr1,
46testduckdb-#      store,
47testduckdb-#      customer
48testduckdb-# WHERE ctr1.ctr_total_return >
49testduckdb-#     (SELECT avg(ctr_total_return)*1.2
50testduckdb(#      FROM customer_total_return ctr2
51testduckdb(#      WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
52testduckdb-#   AND s_store_sk = ctr1.ctr_store_sk
53testduckdb-#   AND s_state = 'TN'
54testduckdb-#   AND ctr1.ctr_customer_sk = c_customer_sk
55testduckdb-# ORDER BY c_customer_id
56testduckdb-# LIMIT 100;
57                                                                                                                QUERY PLAN                                                                                                                
58------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
59 Custom Scan (DuckDBScan)  (cost=0.00..0.00 rows=0 width=0)
60   DuckDB Execution Plan: 
61
62 ┌───────────────────────────┐
63 │           TOP_N           │
64 │    ────────────────────   │
65 │          Top: 100         │
66 │                           │
67 │         Order By:         │
68 │ customer.c_customer_id ASC│
69 └─────────────┬─────────────┘
70 ┌─────────────┴─────────────┐
71 │         PROJECTION        │
72 │    ────────────────────   │
73 │       c_customer_id       │
74 │                           │
75 │        ~28785 Rows        │
76 └─────────────┬─────────────┘
77 ┌─────────────┴─────────────┐
78 │            CTE            │
79 │    ────────────────────   │
80 │         CTE Name:         │
81 │   customer_total_return   ├───────────────────────────────────────────┐
82 │                           │                                           │
83 │       Table Index: 0      │                                           │
84 └─────────────┬─────────────┘                                           │
85 ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
86 │       HASH_GROUP_BY       │                             │           FILTER          │
87 │    ────────────────────   │                             │    ────────────────────   │
88 │          Groups:          │                             │ (CAST(ctr_total_return AS │
89 │             #0            │                             │     DOUBLE) > SUBQUERY)   │
90 │             #1            │                             │                           │
91 │                           │                             │                           │
92 │    Aggregates: sum(#2)    │                             │                           │
93 │                           │                             │                           │
94 │        ~28785 Rows        │                             │         ~5757 Rows        │
95 └─────────────┬─────────────┘                             └─────────────┬─────────────┘
96 ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
97 │         PROJECTION        │                             │      RIGHT_DELIM_JOIN     │
98 │    ────────────────────   │                             │    ────────────────────   │
99 │       sr_customer_sk      │                             │      Join Type: RIGHT     │
100 │        sr_store_sk        │                             │                           │
101 │       sr_return_amt       │                             │        Conditions:        ├────────────────────────────────────────────────────────────────────────┐
102 │                           │                             │    ctr_store_sk IS NOT    │                                                                        │
103 │                           │                             │ DISTINCT FROM ctr_store_sk│                                                                        │
104 │                           │                             │                           │                                                                        │
105 │        ~57570 Rows        │                             │         ~5757 Rows        │                                                                        │
106 └─────────────┬─────────────┘                             └─────────────┬─────────────┘                                                                        │
107 ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐                                                          ┌─────────────┴─────────────┐
108 │         HASH_JOIN         │                             │         HASH_JOIN         │                                                          │         HASH_JOIN         │
109 │    ────────────────────   │                             │    ────────────────────   │                                                          │    ────────────────────   │
110 │      Join Type: INNER     │                             │      Join Type: INNER     │                                                          │      Join Type: RIGHT     │
111 │                           │                             │                           │                                                          │                           │
112 │        Conditions:        ├──────────────┐              │        Conditions:        ├──────────────┐                                           │        Conditions:        ├───────────────────────────────────────────┐
113 │   sr_returned_date_sk =   │              │              │      c_customer_sk =      │              │                                           │    ctr_store_sk IS NOT    │                                           │
114 │          d_date_sk        │              │              │       ctr_customer_sk     │              │                                           │ DISTINCT FROM ctr_store_sk│                                           │
115 │                           │              │              │                           │              │                                           │                           │                                           │
116 │        ~57570 Rows        │              │              │         ~5757 Rows        │              │                                           │         ~5757 Rows        │                                           │
117 └─────────────┬─────────────┘              │              └─────────────┬─────────────┘              │                                           └─────────────┬─────────────┘                                           │
118 ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
119 │     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││         HASH_JOIN         │                             │         PROJECTION        │                             │         DUMMY_SCAN        │
120 │    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │                             │    ────────────────────   │                             │                           │
121 │         Function:         ││         Function:         ││         Function:         ││      Join Type: INNER     │                             │ (avg(ctr_total_return) * 1│                             │                           │
122 │     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││                           │                             │            .2)            │                             │                           │
123 │                           ││                           ││                           ││        Conditions:        │                             │        ctr_store_sk       │                             │                           │
124 │        Projections:       ││        Projections:       ││                           ││ ctr_store_sk = s_store_sk │                             │                           │                             │                           │
125 │    sr_returned_date_sk    ││         d_date_sk         ││                           ││                           ├──────────────┐              │                           │                             │                           │
126 │       sr_customer_sk      ││                           ││                           ││                           │              │              │                           │                             │                           │
127 │        sr_store_sk        ││          Filters:         ││                           ││                           │              │              │                           │                             │                           │
128 │       sr_return_amt       ││ d_year=2000 AND d_year IS ││                           ││                           │              │              │                           │                             │                           │
129 │                           ││          NOT NULL         ││                           ││                           │              │              │                           │                             │                           │
130 │                           ││                           ││                           ││                           │              │              │                           │                             │                           │
131 │        ~287867 Rows       ││        ~14609 Rows        ││        ~100000 Rows       ││         ~5757 Rows        │              │              │         ~143 Rows         │                             │                           │
132 └───────────────────────────┘└───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘              │              └─────────────┬─────────────┘                             └───────────────────────────┘
133                                                                                        ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
134                                                                                        │          CTE_SCAN         ││     POSTGRES_SEQ_SCAN     ││       HASH_GROUP_BY       │
135                                                                                        │    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
136                                                                                        │        CTE Index: 0       ││         Function:         ││         Groups: #0        │
137                                                                                        │                           ││     POSTGRES_SEQ_SCAN     ││    Aggregates: avg(#1)    │
138                                                                                        │                           ││                           ││                           │
139                                                                                        │                           ││          Filters:         ││                           │
140                                                                                        │                           ││  s_state='TN' AND s_state ││                           │
141                                                                                        │                           ││         IS NOT NULL       ││                           │
142                                                                                        │                           ││                           ││                           │
143                                                                                        │        ~28785 Rows        ││          ~24 Rows         ││         ~143 Rows         │
144                                                                                        └───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘
145                                                                                                                                                  ┌─────────────┴─────────────┐
146                                                                                                                                                  │         PROJECTION        │
147                                                                                                                                                  │    ────────────────────   │
148                                                                                                                                                  │        ctr_store_sk       │
149                                                                                                                                                  │      ctr_total_return     │
150                                                                                                                                                  │                           │
151                                                                                                                                                  │         ~287 Rows         │
152                                                                                                                                                  └─────────────┬─────────────┘
153                                                                                                                                                  ┌─────────────┴─────────────┐
154                                                                                                                                                  │         HASH_JOIN         │
155                                                                                                                                                  │    ────────────────────   │
156                                                                                                                                                  │      Join Type: INNER     │
157                                                                                                                                                  │                           │
158                                                                                                                                                  │        Conditions:        ├──────────────┐
159                                                                                                                                                  │ctr_store_sk = ctr_store_sk│              │
160                                                                                                                                                  │                           │              │
161                                                                                                                                                  │         ~287 Rows         │              │
162                                                                                                                                                  └─────────────┬─────────────┘              │
163                                                                                                                                                  ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
164                                                                                                                                                  │          CTE_SCAN         ││         DELIM_SCAN        │
165                                                                                                                                                  │    ────────────────────   ││    ────────────────────   │
166                                                                                                                                                  │        CTE Index: 0       ││       Delim Index: 1      │
167                                                                                                                                                  │                           ││                           │
168                                                                                                                                                  │        ~28785 Rows        ││         ~2878 Rows        │
169                                                                                                                                                  └───────────────────────────┘└───────────────────────────┘
170
171
172(113 rows)
173

测试结果

震惊,在未使用pg_duckdb插件的情况下,sql1的时间为145秒,使用了pg_duckdb插件后变为了0.2秒;sql4的时间为33分钟+  未出结果秒,使用了pg_duckdb插件后变为了2.6秒。

脚本未使用pg_duckdb使用了pg_duckdb
01.sql145934.025 ms217.649 ms
02sql494.014 ms824.615 ms
03.sql134.083 ms808.953 ms
04.sql33分钟+  未出结果2618.012 ms
05.sql575.717 ms1918.740 ms
06.sql50091.984 ms450.428 ms
07.sql372.611 ms760.951 ms
08.sql199.187 ms428.280 ms
09.sql1467.471 ms9708.039 ms
10.sql5629.187 ms1495.543 ms

04.sql未跑出结果:

总结

1、目前仅支持Postgres 15-17、Ubuntu 22.04-24.04 or MacOS,或使用docker安装使用较快

2、在openEuler 22.03也可以编译成功(本文使用openEuler 22.03)

3、pg_duckdb编译需要CMake 3.5…3.29 or higher (cmake --version)

4、可以在会话级别配置当前SQL是否使用pg_duckdb插件:

 1\timing on
2\o  /dev/null
3
4
5SET duckdb.force_execution = 0;
6\i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/22.sql
7
8
9SET duckdb.force_execution = 1;
10\i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/22.sql

5、在实际使用场景中,可以优先考虑PG的索引只扫描+表并行;若性能依然差,则可以考虑安装使用pg_duckdb

参考

https://mp.weixin.qq.com/s/ljT4Ubt8eEgNaV84Jfv9Dg

https://pigsty.io/zh/blog/pg/pg-duckdb/

https://pgxman.com/x/pg_duckdb

https://cloud.tencent.com/developer/article/2439084



AiDBA
【PostgreSQL培训认证】【Oracle OCP、OCM、高可用(RAC+DG+OGG)培训认证】【MySQL OCP培训认证】【GreenPlum培训】【SQL Server培训】官网:www.dbaup.com,学习不止数据库
 最新文章