Real-Time Analytics on Apache Iceberg with Tinybird
Blog post from Tinybird
Tinybird has introduced experimental support for Apache Iceberg, a high-performance open table format for large-scale analytics datasets, in a private beta. To test its capabilities, the GitHub Archive dataset was used to analyze real-time GitHub activity. The test showed that Apache Iceberg is effective for data-warehouse-style workloads but struggles with real-time analytics because of high latency and the complexity of sorting and partitioning. The solution was a hybrid architecture in which Iceberg remains the source of truth and Tinybird handles data synchronization, transformation, and serving through fast HTTP endpoints. In this design, copy pipes synchronize data from Iceberg, materialized views transform the data in real time, and endpoint pipes expose the aggregated data as low-latency APIs. The result is a scalable, real-time, cost-efficient system that fits developers' workflows: Iceberg stays the durable source of truth, while Tinybird provides interactive speed and automatic API generation. Future plans include handling schema evolution dynamically, merging historic and real-time events, and exploring an event-sourcing architecture with Kafka and Iceberg.
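To make the three roles in the hybrid architecture concrete, here is a minimal in-memory sketch in Python. It is not Tinybird's API or syntax and it does not read a real Iceberg table. The names `Event`, `HybridStore`, `sync`, and `top_repos` are hypothetical, and a plain list stands in for the Iceberg table. It only illustrates the pattern: an incremental copy step pulls new rows from the source of truth, aggregates are updated at ingestion time as a materialized view would do, and an endpoint-style function answers from those aggregates without rescanning raw data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified GitHub event row."""
    created_at: int   # epoch seconds
    event_type: str   # for example "PushEvent"
    repo: str


@dataclass
class HybridStore:
    """Toy stand-in for the serving layer; the source table lives elsewhere."""
    landed: list = field(default_factory=list)                      # synced raw rows
    counts: dict = field(default_factory=lambda: defaultdict(int))  # pre-aggregated state
    watermark: int = -1                                             # newest created_at copied so far

    def sync(self, source_rows):
        """Copy step: pull only rows newer than the watermark.

        Simplification: rows that arrive late with a timestamp at or
        below the watermark are skipped.
        """
        new_rows = [r for r in source_rows if r.created_at > self.watermark]
        for row in new_rows:
            self._ingest(row)
        if new_rows:
            self.watermark = max(r.created_at for r in new_rows)
        return len(new_rows)

    def _ingest(self, row):
        """Materialized-view step: update the aggregate as each row lands."""
        self.landed.append(row)
        self.counts[(row.repo, row.event_type)] += 1

    def top_repos(self, event_type, limit=3):
        """Endpoint step: answer from the aggregate, not from raw rows."""
        rows = [(repo, n) for (repo, et), n in self.counts.items() if et == event_type]
        rows.sort(key=lambda pair: (-pair[1], pair[0]))  # most events first, ties by name
        return [{"repo": repo, "events": n} for repo, n in rows[:limit]]


# A list stands in for the Iceberg table (the source of truth).
iceberg_table = [
    Event(100, "PushEvent", "a/x"),
    Event(101, "PushEvent", "b/y"),
    Event(102, "WatchEvent", "a/x"),
]

store = HybridStore()
store.sync(iceberg_table)                             # first run copies all 3 rows
iceberg_table.append(Event(103, "PushEvent", "a/x"))
store.sync(iceberg_table)                             # second run copies only the new row
print(store.top_repos("PushEvent"))
# [{'repo': 'a/x', 'events': 2}, {'repo': 'b/y', 'events': 1}]
```

The design point the sketch tries to capture is that the aggregation cost is paid once at ingestion, so reads touch only the small pre-aggregated state, while the source table is never modified and remains the durable record.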