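Apache Doris has built-in support for multiple Catalogs, including Hive, Iceberg, Hudi, Paimon, LakeSoul, and JDBC, and provides native, high-performance, stable access to them to meet data lake integration needs. As the number of Apache Doris users grows, so does the demand for connecting new data sources. For this reason, Apache Doris introduced the Trino Connector compatibility framework starting with version 3.0.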
Delta Lake: https://doris.apache.org/zh-CN/docs/lakehouse/datalake-analytics/deltalake
Apache Kudu: https://doris.apache.org/zh-CN/docs/lakehouse/datalake-analytics/kudu
BigQuery: https://doris.apache.org/zh-CN/docs/lakehouse/datalake-analytics/bigquery
TPCH: https://doris.apache.org/zh-CN/docs/lakehouse/datalake-analytics/tpch
TPCDS: https://doris.apache.org/zh-CN/docs/lakehouse/datalake-analytics/tpcds
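The Trino Connector plugin compatibility scheme complements the Apache Doris Catalog feature. It is intended to help users integrate data sources quickly and perform basic data migration, so it may fall short in performance and compatibility; contributions from the community are welcome. For data sources such as Hive, Iceberg, Hudi, and Paimon, we recommend the native Apache Doris Catalogs for the best performance and stability.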
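Usage Guide
01 Environment Preparation
02 Environment Deployment
1. Create the Docker network
docker network create -d bridge trinoconnector-net
2. Start all components
sh start-trinoconnector-compose.sh
3. After startup, use the following script to log in to the Doris command line
sh login-doris.sh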
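03 Create Catalogs
The environment already contains two Catalogs named delta_lake and kudu_catalog (you can view them with SHOW CATALOGS / SHOW CREATE CATALOG ${catalog_name}). The creation statements for these two Catalogs are shown below:
-- Already created; no need to execute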
create catalog delta_lake properties (
"type"="trino-connector",
"trino.connector.name"="delta_lake",
"trino.hive.metastore.uri"="thrift://hive-metastore:9083",
"trino.hive.s3.endpoint"="http://minio:9000",
"trino.hive.s3.region"="us-east-1",
"trino.hive.s3.aws-access-key"="minio",
"trino.hive.s3.aws-secret-key"="minio123",
"trino.hive.s3.path-style-access"="true"
);
CREATE CATALOG `kudu_catalog` PROPERTIES (
"type" = "trino-connector",
"trino.connector.name" = "kudu",
"trino.kudu.authentication.type" = "NONE",
"trino.kudu.client.master-addresses" = "kudu-master-1:7051,kudu-master-2:7151,kudu-master-3:7251"
);
04 Query Data
1. Query Delta Lake table data
mysql> switch delta_lake;
Query OK, 0 rows affected (0.00 sec)
mysql> use default;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from customer limit 10;
+-----------+--------------------+------------------------------------+-------------+-----------------+-----------+--------------+---------------------------------------------------------------------------------------------------------------+
| c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment |
+-----------+--------------------+------------------------------------+-------------+-----------------+-----------+--------------+---------------------------------------------------------------------------------------------------------------+
| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak | 13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely ironic theodolites integrate boldly: caref |
| 34 | Customer#000000034 | Q6G9wZ6dnczmtOx509xgE,M2KV | 15 | 25-344-968-5422 | 8589.70 | HOUSEHOLD | nder against the even, pending accounts. even |
| 66 | Customer#000000066 | XbsEqXH1ETbJYYtA1A | 22 | 32-213-373-5094 | 242.77 | HOUSEHOLD | le slyly accounts. carefully silent packages benea |
| 98 | Customer#000000098 | 7yiheXNSpuEAwbswDW | 12 | 22-885-845-6889 | -551.37 | BUILDING | ages. furiously pending accounts are quickly carefully final foxes: busily pe |
| 130 | Customer#000000130 | RKPx2OfZy0Vn 8wGWZ7F2EAvmMORl1k8iH | 9 | 19-190-993-9281 | 5073.58 | HOUSEHOLD | ix slowly. express packages along the furiously ironic requests integrate daringly deposits. fur |
| 162 | Customer#000000162 | JE398sXZt2QuKXfJd7poNpyQFLFtth | 8 | 18-131-101-2267 | 6268.99 | MACHINERY | accounts along the doggedly special asymptotes boost blithely during the quickly regular theodolites. slyly |
| 194 | Customer#000000194 | mksKhdWuQ1pjbc4yffHp8rRmLOMcJ | 16 | 26-597-636-3003 | 6696.49 | HOUSEHOLD | quickly across the fluffily dogged requests. regular platelets around the ironic, even requests cajole quickl |
| 226 | Customer#000000226 | ToEmqB90fM TkLqyEgX8MJ8T8NkK | 3 | 13-452-318-7709 | 9008.61 | AUTOMOBILE | ic packages. ideas cajole furiously slyly special theodolites: carefully express pinto beans acco |
| 258 | Customer#000000258 | 7VbADek8qYezQYotxNUmnNI | 12 | 22-278-425-9944 | 6022.27 | MACHINERY | about the regular, bold accounts; pending packages use furiously stealthy warhorses. bold accounts sleep fur |
| 290 | Customer#000000290 | 8OlPT9G 8UqVXmVZNbmxVTPO8 | 4 | 14-458-625-5633 | 1811.35 | MACHINERY | sts. blithely pending requests sleep fluffily on the regular excuses. carefully expre |
+-----------+--------------------+------------------------------------+-------------+-----------------+-----------+--------------+---------------------------------------------------------------------------------------------------------------+
10 rows in set (0.12 sec)
2. Query Kudu table data
mysql> switch kudu_catalog;
Query OK, 0 rows affected (0.00 sec)
mysql> use default;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from test_table limit 10;
+------+----------+--------+
| key | value | added |
+------+----------+--------+
| 0 | NULL | 12.345 |
| 4 | NULL | 12.345 |
| 20 | NULL | 12.345 |
| 26 | NULL | 12.345 |
| 29 | value 29 | 12.345 |
| 42 | NULL | 12.345 |
| 50 | NULL | 12.345 |
| 56 | NULL | 12.345 |
| 66 | NULL | 12.345 |
| 74 | NULL | 12.345 |
+------+----------+--------+
10 rows in set (1.49 sec)
3. Federated query
mysql> select * from delta_lake.`default`.customer c join kudu_catalog.`default`.test_table t on c.c_custkey = t.`key` where c.c_custkey < 50;
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+------+----------+--------+
| c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment | key | value | added |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+------+----------+--------+
| 1 | Customer#000000001 | IVhzIApeRb ot,c,E | 15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular platelets. regular, ironic epitaphs nag e | 1 | value 1 | 12.345 |
| 33 | Customer#000000033 | qFSlMuLucBmx9xnn5ib2csWUweg D | 17 | 27-375-391-1280 | -78.56 | AUTOMOBILE | s. slyly regular accounts are furiously. carefully pending requests | 33 | value 33 | 12.345 |
| 3 | Customer#000000003 | MG9kdTD2WBHm | 1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov | 3 | value 3 | 12.345 |
| 35 | Customer#000000035 | TEjWGE4nBzJL2 | 17 | 27-566-888-7431 | 1228.24 | HOUSEHOLD | requests. special, express requests nag slyly furiousl | 35 | value 35 | 12.345 |
| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak | 13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely ironic theodolites integrate boldly: caref | 2 | NULL | 12.345 |
| 34 | Customer#000000034 | Q6G9wZ6dnczmtOx509xgE,M2KV | 15 | 25-344-968-5422 | 8589.70 | HOUSEHOLD | nder against the even, pending accounts. even | 34 | NULL | 12.345 |
| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J | 15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious requests across the e | 32 | NULL | 12.345 |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+------+----------+--------+
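7 rows in set (0.13 sec)
Adapting a New Trino Connector
This section uses the Kafka Connector as an example to show how to adapt a Trino Connector plugin and then access the corresponding data source through Trino-Connector-Catalog. It is excerpted from the official Apache Doris documentation; the full content is available at: https://doris.apache.org/zh-CN/community/how-to-contribute/trino-connector-developer-guide/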
01 Compile the Kafka Connector Plugin
Clone the Trino source code:
$ git clone https://github.com/trinodb/trino.git
Switch Trino to version 435:
$ cd trino && git checkout 435
Enter the Kafka plugin source directory:
$ cd plugin/trino-kafka
Compile the Kafka plugin:
$ mvn clean install -DskipTests
After compilation, a target/trino-kafka-435 directory is generated under trino/plugin/trino-kafka/.
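Note:
Each Connector plugin is a subdirectory, not a JAR file. Because Doris currently uses version 435 of the trino-main package, we recommend compiling version 435 of the Connector plugin. Other versions of the Connector plugin may have compatibility issues. If you run into problems, please report them to the Apache Doris community.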
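Because a plugin must be deployed as a directory rather than a single JAR, a quick sanity check on the build output can catch packaging mistakes before the directory is copied. The sketch below is illustrative only: check_plugin_dir is a hypothetical helper name, and the assumption that a built plugin directory holds at least one .jar file is ours rather than something the Doris documentation states.

```shell
#!/bin/sh
# Hypothetical helper: verify that a path looks like a built Connector plugin
# directory (a directory, not a JAR file).
# Usage: check_plugin_dir <path>
check_plugin_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # Assumption: a built plugin directory holds at least one .jar file.
    for f in "$dir"/*.jar; do
        if [ -e "$f" ]; then
            return 0
        fi
    done
    echo "no JAR files found in: $dir" >&2
    return 1
}
```

For example, check_plugin_dir trino/plugin/trino-kafka/target/trino-kafka-435 returns a zero exit status only when the path is a directory containing JAR files.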
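02 Configure Doris fe.conf / be.conf
Once the Kafka Connector plugin is ready, configure fe.conf and be.conf so that Doris can find the plugin. Assuming the trino-kafka-435 directory has been placed under /path/to/connectors, configure the following:
fe.conf: set trino_connector_plugin_dir=/path/to/connectors in fe.conf (if trino_connector_plugin_dir is not set in fe.conf, the ${Doris_HOME}/fe/connectors directory is used by default)
be.conf: set trino_connector_plugin_dir=/path/to/connectors in be.conf (if trino_connector_plugin_dir is not set in be.conf, the ${Doris_HOME}/be/connectors directory is used by default)
Note: Doris loads Trino Connector plugins lazily. The first time you use the Trino-Connector Catalog feature, there is no need to restart the FE / BE nodes: Doris loads the plugin automatically, and only once. If the plugins under /path/to/connectors/ change, however, the FE / BE nodes must be restarted to load the changed plugins.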
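The configuration step above can be scripted. The following is a minimal sketch, not an official Doris tool: set_plugin_dir is a hypothetical helper that replaces any existing trino_connector_plugin_dir line in a conf file and appends the new value. It assumes the conf file already exists and uses plain key=value lines.

```shell
#!/bin/sh
# Hypothetical helper: point a Doris conf file at the connector plugin directory.
# Usage: set_plugin_dir <conf_file> <plugin_dir>
set_plugin_dir() {
    conf_file=$1
    plugin_dir=$2
    key=trino_connector_plugin_dir
    [ -f "$conf_file" ] || { echo "missing conf file: $conf_file" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Keep every line except an existing setting for the key (if any).
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$conf_file" > "$tmp" || true
    # Append the new setting, then write back in place to keep file permissions.
    printf '%s=%s\n' "$key" "$plugin_dir" >> "$tmp"
    cat "$tmp" > "$conf_file"
    rm -f "$tmp"
}
```

It could be called once per node type, for example set_plugin_dir "${Doris_HOME}/fe/conf/fe.conf" /path/to/connectors (the conf path here is an assumption about the installation layout).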
03 Use the Trino-Connector-Catalog Feature
create catalog kafka_tpch properties (
"type"="trino-connector",
-- The four properties below come from Trino and match the properties in Trino's etc/catalog/kafka.properties.
"trino.connector.name"="kafka",
"trino.kafka.table-names"="tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,tpch.supplier,tpch.nation,tpch.region",
"trino.kafka.nodes"="localhost:9092",
"trino.kafka.table-description-dir" = "/mnt/datadisk1/fangtiewei"
);
The type property of the Catalog must be set to trino-connector. The properties trino.connector.name, trino.kafka.table-names, trino.kafka.nodes, and trino.kafka.table-description-dir all come from Trino; for details, see: https://trino.io/docs/current/connector/kafka.html#configuration
Different Connector plugins require different properties; see the official Trino documentation: https://trino.io/docs/current/connector.html#connector--page-root
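Since the Catalog properties mirror a Trino catalog properties file with a trino. prefix added to each key, the statement can be generated from such a file. The sketch below illustrates that mapping under stated assumptions: props_to_catalog_sql is a hypothetical helper, it handles only simple key=value lines (no line continuations or : separators), and it does not escape quotes inside values.

```shell
#!/bin/sh
# Hypothetical helper: turn a Trino catalog properties file into a
# Doris CREATE CATALOG statement by prefixing each key with "trino.".
# Usage: props_to_catalog_sql <catalog_name> <properties_file>
props_to_catalog_sql() {
    catalog_name=$1
    props_file=$2
    awk -v name="$catalog_name" '
        BEGIN {
            printf "CREATE CATALOG %s PROPERTIES (\n", name
            printf "    \"type\" = \"trino-connector\""
        }
        # Skip blank lines and comment lines.
        /^[ \t]*(#|$)/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            # Trim surrounding whitespace from key and value.
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            printf ",\n    \"trino.%s\" = \"%s\"", key, val
        }
        END { print "\n);" }
    ' "$props_file"
}
```

Running props_to_catalog_sql kafka_tpch etc/catalog/kafka.properties prints a statement that can be reviewed and then pasted into the Doris command line.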
Use the Catalog
After switching to the Catalog with the switch kafka_tpch statement, you can query the data in the Kafka data source.
Conclusion
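We will continue to publish usage guides and methodology for building lakehouse architectures with Apache Doris and other mainstream data lake formats and storage systems, so please stay tuned.
Apache Doris + Apache Hudi Quick Start Guide | Lakehouse Handbook (Part 1)
Apache Doris + Apache Paimon Quick Start Guide | Lakehouse Handbook (Part 2)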