Seeking the Perfect Apache Druid Rollup
Blog post from Rill
Apache Druid's rollup feature optimizes storage and improves query performance by collapsing multiple input rows into a single stored row wherever their dimension values are identical, pre-aggregating the metrics as it goes. Using rollup effectively requires understanding a handful of key concepts:

- Rollup combines rows whose dimension values match exactly, so the more general (lower-cardinality) your dimensions, the greater the compression.
- Timestamp granularity strongly affects rollup opportunities; Druid offers a range of granularity settings for truncating timestamps at ingestion.
- Not every aggregation can be pre-computed at ingestion. Averages, for example, cannot be rolled up directly, but a workaround exists.
- An ingestion-time count metric is essential for counting input records accurately, since after rollup a plain row count no longer reflects them.
- The Apache DataSketches project provides approximate aggregators that handle high-cardinality problems, such as distinct counts, without significantly increasing storage.
- Custom transformations on the __time dimension allow granularities beyond the built-in settings.
- Real-time ingestion, such as from Kafka, can only guarantee best-effort rollup, since rows with the same dimension values may land in different segments.

Druid's compaction feature further improves rollup efficiency after ingestion: it reduces the number of segments and performs additional rollup, making Druid a powerful tool for managing large-scale data efficiently.
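To sketch where the rollup and granularity settings live, here is the `granularitySpec` portion of a Druid ingestion spec. The field names are Druid's own; the specific granularities chosen here are illustrative:

```json
{
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "rollup": true
  }
}
```

With `queryGranularity: "HOUR"`, all timestamps within an hour are truncated to the top of that hour at ingestion, so rows that also share every dimension value can be rolled into one. A coarser `queryGranularity` generally yields more rollup but less time precision.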
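For the average workaround, the usual pattern is to ingest a sum and a count as separate metrics and divide at query time. A sketch, assuming a hypothetical `latency_ms` input column:

```json
{
  "metricsSpec": [
    { "type": "count", "name": "count" },
    { "type": "longSum", "name": "latency_sum", "fieldName": "latency_ms" }
  ]
}
```

At query time, compute the average as `SUM(latency_sum) / SUM("count")` rather than applying `AVG` to the rolled-up rows; an average of per-row averages would weight each rolled-up row equally regardless of how many input records it represents.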
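For high-cardinality columns such as user IDs, storing the raw values would defeat rollup. Instead, a DataSketches aggregator (from the `druid-datasketches` extension) can build a fixed-size approximate sketch at ingestion. A minimal sketch of such a metric, assuming a hypothetical `user_id` input column:

```json
{
  "metricsSpec": [
    { "type": "count", "name": "count" },
    { "type": "HLLSketchBuild", "name": "user_id_hll", "fieldName": "user_id" }
  ]
}
```

The sketch column mergeable across rows and segments, so approximate distinct counts remain accurate at query time even after aggressive rollup and compaction.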
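When the built-in granularities are not a fit, a `transformSpec` expression can truncate `__time` to a custom period before rollup runs. A sketch using Druid's `timestamp_floor` expression with an illustrative ten-minute bucket (a period Druid's standard granularity settings do not offer):

```json
{
  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "__time",
        "expression": "timestamp_floor(__time, 'PT10M')"
      }
    ]
  }
}
```

Because the transform rewrites the timestamp itself, rows within each ten-minute window become eligible for rollup just as if a built-in query granularity had been used.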