Enriching Kafka streams for real-time queries

Post Details

Company

Tinybird

Date Published

May 15, 2023

Author

Jorge Sancha

Word Count

1,170

Language

English

Hacker News Points

-

Source URL

www.tinybird.co/blog/enriching-kafka-streams-in-real-time

Summary

Tinybird offers a way to enrich data captured by Kafka in real time, which is particularly useful for e-commerce transaction analysis. Using a modified version of the Star Schema Benchmark dbgen tool, users can generate large quantities of fake data, such as Customers, Suppliers, and Parts, and ingest it through Tinybird's Data Sources API. Tinybird infers the column types during ingestion and creates the dimension tables needed for enrichment. Line orders are then pushed into Kafka, where a consumer reads the events and sends them to Tinybird at a rate of about 20,000 records per second. Enrichment can be done at query time with traditional SQL joins, which may become slow as the data grows, or more efficiently at ingestion time, relying on the columnar database ClickHouse. Tinybird allows the creation of "Ingestion" Pipes that materialize query results into a new Data Source, so the data is denormalized once at ingestion and later queries do not need to join. This approach significantly speeds up query execution, returning results in milliseconds rather than seconds, and supports a high rate of requests per second while keeping results up to date. Tinybird's real-time data processing and enrichment capabilities can be explored using their free Build Plan, with community support available through their Slack channel.
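The difference between joining at query time and enriching at ingestion can be illustrated with a minimal Python sketch. This is not Tinybird's or ClickHouse's API: the `PARTS` table, the field names, and the `enrich`, `ingest`, and `revenue_by_category` helpers are all hypothetical stand-ins, assuming a small dimension table that fits in memory. Each incoming line-order event is denormalized once as it arrives, so the later aggregation reads only the enriched rows and performs no lookup.

```python
from typing import Dict, Iterable, List

# Hypothetical dimension table keyed by part id (a stand-in for a Parts table).
PARTS: Dict[int, Dict[str, str]] = {
    1: {"name": "widget", "category": "MFGR#12"},
    2: {"name": "gadget", "category": "MFGR#13"},
}


def enrich(event: Dict, parts: Dict[int, Dict[str, str]]) -> Dict:
    """Denormalize one line-order event by copying in its part attributes."""
    part = parts.get(event["part_id"], {})
    return {
        **event,
        "part_name": part.get("name", ""),
        "part_category": part.get("category", ""),
    }


def ingest(events: Iterable[Dict], parts: Dict[int, Dict[str, str]]) -> List[Dict]:
    """Enrich every event once, at ingestion time."""
    return [enrich(event, parts) for event in events]


def revenue_by_category(rows: Iterable[Dict]) -> Dict[str, int]:
    """Aggregate over already-enriched rows; no join is needed here."""
    totals: Dict[str, int] = {}
    for row in rows:
        category = row["part_category"]
        totals[category] = totals.get(category, 0) + row["revenue"]
    return totals


# Hypothetical line-order events, as a Kafka consumer might hand them over.
events = [
    {"order_id": 100, "part_id": 1, "revenue": 250},
    {"order_id": 101, "part_id": 2, "revenue": 400},
    {"order_id": 102, "part_id": 1, "revenue": 50},
]

enriched = ingest(events, PARTS)
totals = revenue_by_category(enriched)
```

The trade-off in this sketch is the usual one for denormalization: each stored row is wider and the lookup cost is paid once per event on the write path, in exchange for reads that never touch the dimension table.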