Configuring a Flink Cluster in Dinky and Submitting a Dual-Stream JOIN Job

Digest   2024-09-24 00:24   Chongqing
Yesterday's post covered installing Dinky; today we look at how to configure a Flink cluster in Dinky and submit a job to it.

01

Configuring the Flink Cluster in Dinky

In Dinky's Registration Center, add the Flink cluster and click Confirm.

If the status shows Normal, the cluster was added successfully.


02

Submitting a Dual-Stream Join Job in Dinky

Create a new Flink task.

Write the code:
SET 'execution.checkpointing.interval' = '100s';   -- checkpoint every 100 seconds
SET 'table.exec.state.ttl' = '8640000';            -- state TTL in milliseconds (= 2.4 hours)
SET 'table.exec.mini-batch.enabled' = 'true';      -- buffer records to reduce state access
SET 'table.exec.mini-batch.allow-latency' = '60s'; -- flush a mini-batch at least every 60 s
SET 'table.exec.mini-batch.size' = '10000';        -- or once 10000 records are buffered
SET 'table.local-time-zone' = 'Asia/Shanghai';     -- session time zone
SET 'table.exec.sink.not-null-enforcer' = 'DROP';  -- drop rows that violate NOT NULL columns
SET 'table.exec.sink.upsert-materialize' = 'NONE'; -- disable upsert materialization at the sink
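If you want to confirm which options are in effect, Flink's SQL dialect also supports listing and resetting them. A minimal sketch, assuming your Dinky version forwards these statements to Flink unchanged:

```sql
-- List all configuration options currently set in this session.
SET;

-- Reset a single option back to its default if needed.
RESET 'table.exec.mini-batch.enabled';
```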
CREATE TABLE show_log_table (
    log_id BIGINT,
    show_params STRING,
    row_time AS CURRENT_TIMESTAMP,
    WATERMARK FOR row_time AS row_time
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.show_params.length' = '1',
    'fields.log_id.min' = '1',
    'fields.log_id.max' = '10'
);

CREATE TABLE click_log_table (
    log_id BIGINT,
    click_params STRING,
    row_time AS CURRENT_TIMESTAMP,
    WATERMARK FOR row_time AS row_time
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.click_params.length' = '1',
    'fields.log_id.min' = '1',
    'fields.log_id.max' = '10'
);

CREATE TABLE sink_table (
    s_id BIGINT,
    s_params STRING,
    c_id BIGINT,
    c_params STRING
) WITH (
    'connector' = 'print'
);

INSERT INTO sink_table
SELECT
    show_log_table.log_id AS s_id,
    show_log_table.show_params AS s_params,
    click_log_table.log_id AS c_id,
    click_log_table.click_params AS c_params
FROM show_log_table
INNER JOIN click_log_table
    ON show_log_table.log_id = click_log_table.log_id
    AND show_log_table.row_time
        BETWEEN click_log_table.row_time - INTERVAL '4' HOUR
        AND click_log_table.row_time;
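Before running the job, you can ask the planner how it will execute the join. A minimal sketch using Flink's standard EXPLAIN statement, against the tables declared above:

```sql
-- Show the optimized execution plan for the join.
-- The BETWEEN predicate on the two row_time attributes is what makes
-- the planner choose an interval join instead of a regular stream join.
EXPLAIN
SELECT s.log_id, s.show_params, c.log_id, c.click_params
FROM show_log_table s
INNER JOIN click_log_table c
    ON s.log_id = c.log_id
    AND s.row_time BETWEEN c.row_time - INTERVAL '4' HOUR AND c.row_time;
```

The distinction matters for state size: an interval join only keeps rows within the time bound (4 hours here) in state, whereas a regular stream join keeps all rows, limited only by `table.exec.state.ttl`.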

On the right-hand panel, choose the execution mode and related settings.

Click Save, then Check to validate the statements.

Click Run; the console prints the address of the Flink cluster the job was submitted to.

10

View the Results

View the Flink job's execution result.

View the printed log output.

Stop the job.
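Besides the Stop button in the Dinky UI, newer Flink versions (1.17+) also expose job control as SQL statements. A hedged sketch; `'<job-id>'` is a placeholder to be replaced with the id reported by SHOW JOBS:

```sql
-- List running jobs and their ids (Flink 1.17+).
SHOW JOBS;

-- Stop a job gracefully, taking a savepoint first.
-- '<job-id>' is a placeholder; substitute the real id from SHOW JOBS.
STOP JOB '<job-id>' WITH SAVEPOINT;
```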

11

Get the Code

https://gitee.com/wzylzjtn/data-warehouse-learning

https://github.com/Mrkuhuo/data-warehouse-learning


12

Get the Documentation

13

Join the discussion group by adding the author.
