Company:
Date Published:
Author: Robin Moffatt
Word count: 3780
Language: English
Hacker News points: None

Summary

The blog post walks through building a streaming ETL pipeline on Confluent Cloud with Apache Kafka and Apache Flink to process environmental data from the UK Environment Agency. River level and rainfall readings are extracted from the agency's REST API and streamed into Kafka topics, which are then exposed as Apache Iceberg tables using Tableflow. The pipeline unpacks and cleans the raw data, enriches it with station and measure reference data, and joins the resulting datasets. The enriched output is continuously written to new tables, enabling real-time insight and visualization through tools such as Apache Superset. Throughout, the post emphasizes a shift-left approach: transforming data upstream to reduce latency and keep it consistent for every downstream consumer. Because the transformed data is stored in Iceberg format, it is readily available to analytics and AI applications, demonstrating how Kafka and Flink combine for efficient data processing and visualization.
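The "unpack, clean, enrich, join" steps summarized above can be sketched in plain Python. This is an illustrative stand-in for the Flink SQL the pipeline actually uses, not the post's code; the field names (`measure`, `@id`, `dateTime`, `station_id`, `parameter`, `label`) are assumptions loosely modelled on the Environment Agency flood-monitoring API, not a confirmed schema.

```python
def flatten_reading(raw: dict) -> dict:
    """Unpack one nested reading record into a flat row
    (the 'unpack and clean' step). Field names are assumed."""
    return {
        "measure_id": raw["measure"]["@id"],   # assumed nesting
        "value": float(raw["value"]),          # normalise to float
        "ts": raw["dateTime"],
    }

def enrich(readings, measures, stations):
    """Join flattened readings to measure and station reference
    data, mirroring the pipeline's enrichment joins."""
    for r in readings:
        m = measures.get(r["measure_id"])
        if m is None:
            continue  # drop readings with no matching measure
        s = stations.get(m.get("station_id"), {})
        yield {
            **r,
            "parameter": m.get("parameter"),
            "station_name": s.get("label"),
        }
```

In the real pipeline these transformations run continuously in Flink over Kafka topics; here they are ordinary functions over in-memory dicts, which keeps the join logic visible without any infrastructure.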