Company:
Date Published:
Author: Robin Moffatt
Word count: 3780
Language: English
Hacker News points: None

Summary

The blog post walks through building a streaming ETL pipeline on Confluent Cloud with Apache Kafka and Apache Flink to process environmental data from the UK Environment Agency. River level and rainfall readings are extracted from the agency's REST API and streamed into Kafka topics, which are then exposed as Apache Iceberg tables using Tableflow. The pipeline unpacks and cleans the raw data, enriches it with station and measure reference data, and joins the resulting datasets. The enriched output is continuously written to new tables, enabling real-time insight and visualization through tools such as Apache Superset. Throughout, the post emphasizes a shift-left approach: transforming data upstream to reduce latency and keep it consistent for every downstream consumer. Because the transformed data is stored in Iceberg format, it is readily available to analytics and AI applications, demonstrating how Kafka and Flink combine for efficient data processing and visualization.
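The "unpack, clean, enrich, join" steps summarized above can be sketched in plain Python. This is an illustrative stand-in for the Flink SQL the pipeline actually uses, not the post's code; the field names (`measure`, `@id`, `dateTime`, `station_id`, `parameter`, `label`) are assumptions loosely modelled on the Environment Agency flood-monitoring API, not a confirmed schema.

```python
def flatten_reading(raw: dict) -> dict:
    """Unpack one nested reading record into a flat row
    (the 'unpack and clean' step). Field names are assumed."""
    return {
        "measure_id": raw["measure"]["@id"],   # assumed nesting
        "value": float(raw["value"]),          # normalise to float
        "ts": raw["dateTime"],
    }

def enrich(readings, measures, stations):
    """Join flattened readings to measure and station reference
    data, mirroring the pipeline's enrichment joins."""
    for r in readings:
        m = measures.get(r["measure_id"])
        if m is None:
            continue  # drop readings with no matching measure
        s = stations.get(m.get("station_id"), {})
        yield {
            **r,
            "parameter": m.get("parameter"),
            "station_name": s.get("label"),
        }
```

In the real pipeline these transformations run continuously in Flink over Kafka topics; here they are ordinary functions over in-memory dicts, which keeps the join logic visible without any infrastructure.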