
Seeking the Perfect Apache Druid Rollup

Blog post from Rill

Post Details
Company: Rill
Author: Neil Buesing
Word Count: 1,186
Language: English
Summary

Apache Druid's rollup feature optimizes storage and improves query performance by collapsing multiple rows into one wherever their dimension values are identical, pre-computing metric aggregates at ingestion time. Using rollups effectively requires understanding several key concepts:

- Rollups combine rows with identical dimension values, so the more general (less granular) the dimensions, the greater the rollup potential.
- Timestamp granularity strongly affects rollup opportunities; Apache Druid offers a range of granularity settings for truncating the event timestamp.
- Not every aggregation can be pre-computed at ingestion. Averages, for example, cannot, but ingesting a sum and a count and dividing at query time works around this.
- An ingestion-time count metric is essential for accurately counting original records, since each rolled-up row may represent many source rows.
- The Apache DataSketches project provides approximate aggregations for high-cardinality dimensions without significantly increasing storage.
- Custom transformations on the __time dimension allow granularity adjustments beyond the built-in settings.
- Real-time ingestion, such as from Kafka, yields only best-effort rollups, since rows arriving across different segments cannot be combined at ingestion time.

Apache Druid's compaction feature further improves rollup efficiency by reducing the number of segments and performing additional rollups, making it a powerful tool for managing large-scale data efficiently.
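The summary's core ideas, truncating the timestamp, collapsing rows with identical dimensions, and deriving averages from a pre-computed sum and count, can be illustrated with a small Python simulation. This is a sketch of the rollup concept only, not Druid code; the event schema (`page`, `latency_ms`) and the hour-level truncation are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (timestamp, page, latency_ms).
events = [
    ("2024-01-01T10:03:21", "/home", 120),
    ("2024-01-01T10:17:45", "/home", 80),
    ("2024-01-01T10:59:02", "/about", 200),
    ("2024-01-01T11:05:10", "/home", 100),
]

def truncate_to_hour(ts: str) -> str:
    """Mimic an hour-level query granularity by truncating __time."""
    return datetime.fromisoformat(ts).strftime("%Y-%m-%dT%H:00:00")

# Roll up: rows sharing (truncated time, dimensions) collapse into one row,
# pre-computing an additive count and sum -- but NOT the average, which is
# not additive and so cannot be pre-computed at ingestion.
rollup = defaultdict(lambda: {"count": 0, "latency_sum": 0})
for ts, page, latency in events:
    key = (truncate_to_hour(ts), page)
    rollup[key]["count"] += 1
    rollup[key]["latency_sum"] += latency

# The average is derived at query time from the stored sum and count.
for (hour, page), m in sorted(rollup.items()):
    avg = m["latency_sum"] / m["count"]
    print(hour, page, m["count"], m["latency_sum"], avg)
```

Coarsening the truncation (say, to day granularity) would merge more rows into each key, which is the sense in which more general dimensions improve rollup potential.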