Real-Time Analytics on Apache Iceberg with Tinybird

Blog post from Tinybird

Post Details
Company: Tinybird
Author: Alberto Romeu
Word Count: 1,246
Language: English
Summary

Tinybird has introduced experimental support for Apache Iceberg, a high-performance open table format for large-scale analytics datasets, in a private beta. To test it, the team used the GitHub Archive dataset to analyze GitHub activity in real time. The test showed that Iceberg works well for data-warehouse-style workloads but struggles with real-time analytics because of high latency and the complexity of sorting and partitioning. The solution was a hybrid architecture: Iceberg remains the source of truth, while Tinybird handles data synchronization, transformation, and serving through fast HTTP endpoints. Copy pipes synchronize data from Iceberg, materialized views transform it in real time, and endpoint pipes expose the aggregated data as APIs for low-latency access. The result is a scalable, cost-efficient system that fits developers' workflows: Iceberg stays the durable source of truth, and Tinybird adds interactive speed and automatic API generation. Future plans include handling schema evolution dynamically, merging historic and real-time events, and exploring an event-sourcing architecture with Kafka and Iceberg.
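
The three-stage flow described in the summary (copy from the source of truth, aggregate at ingest time, serve the aggregate) can be illustrated with a small in-memory sketch in Python. This is not Tinybird's or Iceberg's API, and it is not taken from the original post. The names `Event`, `ServingLayer`, `sync`, and `top_repos`, the integer watermark, and the sample rows are all hypothetical and chosen only to show the shape of the pattern.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One GitHub-style event row (hypothetical schema)."""
    event_id: int
    event_type: str
    repo: str


@dataclass
class ServingLayer:
    """Toy stand-in for the fast serving layer; not Tinybird's API."""
    last_synced_id: int = 0  # watermark: highest event_id copied so far
    counts: dict = field(default_factory=lambda: defaultdict(int))

    def sync(self, source: list[Event]) -> int:
        """Copy step: pull only rows newer than the watermark."""
        new_rows = [e for e in source if e.event_id > self.last_synced_id]
        for row in new_rows:
            self._materialize(row)
        if new_rows:
            self.last_synced_id = max(e.event_id for e in new_rows)
        return len(new_rows)

    def _materialize(self, row: Event) -> None:
        """Materialized-view step: update the aggregate as rows arrive."""
        self.counts[(row.repo, row.event_type)] += 1

    def top_repos(self, event_type: str, limit: int = 3) -> list[tuple[str, int]]:
        """Endpoint step: answer from the aggregate, never from the source."""
        rows = [(repo, n) for (repo, t), n in self.counts.items() if t == event_type]
        return sorted(rows, key=lambda r: (-r[1], r[0]))[:limit]


# The list plays the role of the durable source of truth (assumed sample data).
source_of_truth = [
    Event(1, "WatchEvent", "a/x"),
    Event(2, "WatchEvent", "a/x"),
    Event(3, "PushEvent", "b/y"),
]
serving = ServingLayer()
serving.sync(source_of_truth)

# New data lands in the source; the next sync copies only the new row.
source_of_truth.append(Event(4, "WatchEvent", "b/y"))
serving.sync(source_of_truth)

print(serving.top_repos("WatchEvent"))  # [('a/x', 2), ('b/y', 1)]
```

The point of the sketch is the division of labor: reads never touch the source of truth, so serving latency depends only on the size of the pre-aggregated state, while the source stays authoritative and can be re-synced from scratch if the aggregate is lost.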