Company:
Date Published:
Author: Michael Noll, Victoria Xia, Wade Waldron
Word count: 2349
Language: English
Hacker News points: None

Summary

This blog post builds an end-to-end streaming application that analyzes real-time Wikipedia updates using Kafka Streams together with Kafka Connect. The goal is to enrich an incoming stream of user click events with the latest geo-region information for each user and then compute aggregations over the enriched stream. The authors argue that the traditional way to implement this use case, querying an external database for every event, scales poorly. Instead, they introduce the concept of stream-table duality and rely on Kafka Streams' built-in support for KTables, which are backed by local state stores. This allows fast local table lookups without network round-trips and decouples the availability of the stream processing application from that of an external database. The authors then show how to implement the use case with a KStream-KTable join that enriches the click events with geo-location side data, followed by aggregations on the enriched stream.
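
The enrich-then-aggregate flow described in the summary can be modeled in plain Java without a Kafka cluster. The sketch below is not the Kafka Streams API and is not the authors' code: the class name `StreamTableJoinSketch`, the methods `materialize` and `clicksPerRegion`, the sample users and regions, and the `UNKNOWN` fallback are all assumptions made for illustration. It also simplifies by building the whole table before processing any clicks, whereas a real stream-table join sees table updates interleaved with stream records over time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, stdlib-only model of a stream-table join followed by an aggregation.
public class StreamTableJoinSketch {

    // Table side: replaying a changelog of (user, region) updates keeps only
    // the latest region per user, which is the stream-table duality in miniature.
    static Map<String, String> materialize(List<Map.Entry<String, String>> regionUpdates) {
        Map<String, String> table = new HashMap<>();
        for (Map.Entry<String, String> update : regionUpdates) {
            table.put(update.getKey(), update.getValue()); // later updates overwrite earlier ones
        }
        return table;
    }

    // Stream side: enrich each (user, clicks) event with a local table lookup,
    // then sum the clicks per region. Users without a region go to "UNKNOWN" (an assumption).
    static Map<String, Long> clicksPerRegion(List<Map.Entry<String, Long>> clicks,
                                             Map<String, String> userRegions) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> click : clicks) {
            String region = userRegions.getOrDefault(click.getKey(), "UNKNOWN");
            totals.merge(region, click.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data: alice moves from asia to europe.
        List<Map.Entry<String, String>> regionUpdates = List.of(
                Map.entry("alice", "asia"),
                Map.entry("bob", "americas"),
                Map.entry("alice", "europe"));
        List<Map.Entry<String, Long>> clicks = List.of(
                Map.entry("alice", 13L),
                Map.entry("bob", 4L),
                Map.entry("chao", 25L),
                Map.entry("bob", 19L),
                Map.entry("alice", 40L));

        Map<String, String> table = materialize(regionUpdates);
        System.out.println(clicksPerRegion(clicks, table));
    }
}
```

With this sample data, alice's update from asia to europe wins, so her clicks count toward europe, while chao, who has no region record, falls into `UNKNOWN`. In Kafka Streams, the role of the in-memory map is played by the KTable's local state store, which is why the lookup needs no network round-trip.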