Company
Date Published
Author
Julia Brouillette
Word count
1533
Language
English
Hacker News points
None

Summary

Query from Deep Storage, a feature introduced in Apache Druid 28.0, lets users query data directly in Druid's deep storage layer instead of first preloading all of it onto Druid's data servers, which improves the trade-off between performance and cost. Only the data needed for real-time queries has to stay in memory; older, less frequently accessed data remains in deep storage, where queries run asynchronously so that they do not affect real-time query latency. The result is a simpler, more elastic data architecture. The feature also broadens what Druid can be used for: on-demand historical analysis, exports, downloads, and complex reporting become possible without additional high-performance storage or compute resources, so a single system can serve both real-time and non-real-time workloads. Query-time Broadcast Joins and Shuffle Joins further improve Druid's ability to handle large, complex queries efficiently, making deep storage an integral part of the analytics system rather than only a backup. Deep storage continues to provide data redundancy and fault tolerance, and it simplifies cluster scaling by reducing the amount of data that must move during scaling operations. Together, these capabilities position Druid as a platform for real-time analytics applications that also need extended historical analysis.
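As a rough illustration of the asynchronous pattern described above, the following Python sketch builds and submits a query intended to run against deep storage. It is not taken from the original post. It assumes Druid's SQL statements endpoint at `/druid/v2/sql/statements` and the `executionMode: ASYNC` context parameter; the Router address and the `events_archive` table name are hypothetical, and the endpoint details should be checked against the documentation for the Druid version in use.

```python
import json
import urllib.request

# Assumed path of Druid's asynchronous SQL statements API.
STATEMENTS_PATH = "/druid/v2/sql/statements"


def build_statement_payload(sql: str) -> dict:
    """Build the JSON body for an asynchronous query against deep storage."""
    # Assumption: the ASYNC execution mode tells Druid to run the query
    # asynchronously rather than on the low-latency interactive path.
    return {"query": sql, "context": {"executionMode": "ASYNC"}}


def statement_url(base_url: str, query_id: str = "", results: bool = False) -> str:
    """Return the URL for submitting a statement, or for its status or results."""
    url = base_url.rstrip("/") + STATEMENTS_PATH
    if query_id:
        url += "/" + query_id
        if results:
            url += "/results"
    return url


def submit_statement(base_url: str, sql: str) -> dict:
    """POST the statement and return the parsed JSON response."""
    body = json.dumps(build_statement_payload(sql)).encode("utf-8")
    request = urllib.request.Request(
        statement_url(base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical Router address and table name, for illustration only.
    sql = "SELECT COUNT(*) AS row_count FROM events_archive"
    print(json.dumps(build_statement_payload(sql), indent=2))
    print(statement_url("http://localhost:8888"))
```

Because the query runs asynchronously, a client would typically keep the query ID from the submission response, poll the status URL until the query finishes, and only then fetch from the results URL. That is what keeps long-running historical queries off the real-time query path.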