
Distributed, Real-time Joins and Aggregations on User Activity Events using Kafka Streams

Blog post from Confluent

Post Details
Company: Confluent
Date Published: not listed
Authors: Michael Noll, Victoria Xia, Wade Waldron
Word Count: 2,349
Company Posts That Month: 6
Language: English
Hacker News Points: -
Summary

This blog post uses Kafka Streams, together with Kafka Connect, to build an end-to-end streaming application that analyzes real-time Wikipedia updates. The goal is to enrich an incoming stream of user click events with each user's latest geo-region and then compute aggregations over the enriched stream. The authors argue that traditional approaches to this use case, such as querying an external database for each event, scale poorly. Instead, they introduce the stream-table duality and rely on Kafka Streams' built-in support for KTables, which are backed by local state stores. This gives fast local table lookups without network round-trips and decouples the availability of the stream processing application from that of an external database. Finally, the authors implement the use case with a KStream-KTable join that enriches the click events with geo-region side data, followed by an aggregation over the enriched stream.
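The join-then-aggregate pattern described above can be illustrated with a small plain-Python simulation. This is a sketch of the semantics only, not the Kafka Streams API. The topic names `user-regions` and `user-clicks`, the function `run_pipeline`, and the `UNKNOWN` fallback for users without a region are hypothetical. A dict stands in for the local state store behind a KTable: replaying the region changelog into it is the stream-table duality in action. Each click is enriched by looking up the user's *current* region, with no remote database call.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# A record is (topic, key, value), processed in arrival order as if from a
# single partition. Topic names are hypothetical, chosen for illustration.
Record = Tuple[str, str, object]


def run_pipeline(records: Iterable[Record]) -> Dict[str, int]:
    """Enrich clicks with each user's latest region, then sum clicks per region."""
    # Stands in for the state store backing a KTable: the latest value per key,
    # built by replaying the "user-regions" changelog.
    region_table: Dict[str, str] = {}
    clicks_per_region: Dict[str, int] = defaultdict(int)

    for topic, key, value in records:
        if topic == "user-regions":
            if value is None:
                # Tombstone: a null value deletes the key from the table.
                region_table.pop(key, None)
            else:
                # Upsert: a newer region for the same user replaces the old one.
                region_table[key] = value
        elif topic == "user-clicks":
            # Stream-table left join: local lookup of the table's current value.
            # "UNKNOWN" is an assumed fallback for users with no known region.
            region = region_table.get(key, "UNKNOWN")
            # Aggregate the enriched stream, re-keyed by region.
            clicks_per_region[region] += value

    return dict(clicks_per_region)


sample_events = [
    ("user-regions", "alice", "europe"),
    ("user-regions", "bob", "asia"),
    ("user-clicks", "alice", 13),
    ("user-clicks", "bob", 4),
    ("user-regions", "alice", "asia"),  # alice moves; earlier clicks stay in europe
    ("user-clicks", "alice", 5),
    ("user-clicks", "carol", 2),        # no region known for carol
    ("user-regions", "bob", None),      # tombstone removes bob's region
    ("user-clicks", "bob", 1),
]

totals = run_pipeline(sample_events)
print(totals)  # {'europe': 13, 'asia': 9, 'UNKNOWN': 3}
```

Note that a region change does not rewrite history. Clicks are joined against the table state at the time each click is processed, so Alice's first 13 clicks remain counted under `europe`. In an actual Kafka Streams topology, the table would be a KTable read from a topic, and re-keying the enriched stream by region before aggregating would trigger a repartition through an internal topic. The simulation above omits both details.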

Trends Found in this Post
Trend          Post Mentions  Total Month Mentions  Posts  Companies  MoM Change
Real-time      8              205                   60     26         -27%
Data Pipeline  3              22                    8      4          +214%
RAG            1              5                     5      1          +67%