Company
Date Published
Author
Clint Wylie
Word count
2382
Language
English
Hacker News points
None

Summary

Apache Druid has introduced front-coding, a form of incremental encoding, to reduce the storage footprint of STRING columns in its segments while preserving query performance. Druid already dictionary-encodes string columns, mapping each distinct value to an integer identifier and keeping the distinct values in a sorted dictionary. Front-coding builds on that structure: because neighboring values in a sorted dictionary tend to share a prefix, each value can be stored as the length of the prefix it shares with an earlier value plus the remaining suffix, rather than in full. Smaller segments mean more efficient memory use and fewer disk reads, and the savings are largest for data with long common prefixes, such as URLs. In preliminary tests the approach saved space without compromising read performance, and it may become the default in a future Druid release. The change is part of ongoing work on Druid's segment format, which may also include adaptive encoding strategies chosen from the characteristics of the data. Users are encouraged to experiment with the new encoding and provide feedback to guide future enhancements, but should do so with caution because of compatibility concerns with older Druid versions.
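
The idea of front-coding a sorted dictionary can be illustrated with a minimal Python sketch. It is not Druid's implementation or on-disk format: the function names, the in-memory tuple layout, and the bucket size of 4 are all assumptions made for illustration. Here each bucket stores its first value in full and every other value as a shared-prefix length plus a suffix, so a lookup by dictionary id only needs to touch one bucket.

```python
from typing import List, Tuple

# Hypothetical layout: (first value in full, [(shared prefix length, suffix), ...])
Bucket = Tuple[bytes, List[Tuple[int, bytes]]]


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_values: List[str], bucket_size: int = 4) -> List[Bucket]:
    """Front-code an already sorted list of distinct strings, bucket by bucket."""
    buckets: List[Bucket] = []
    for start in range(0, len(sorted_values), bucket_size):
        chunk = [v.encode("utf-8") for v in sorted_values[start:start + bucket_size]]
        first = chunk[0]
        rest = []
        for value in chunk[1:]:
            # Keep only the prefix length shared with the bucket's first value
            # and the bytes that follow it.
            p = common_prefix_len(first, value)
            rest.append((p, value[p:]))
        buckets.append((first, rest))
    return buckets


def lookup(buckets: List[Bucket], dictionary_id: int, bucket_size: int = 4) -> str:
    """Rebuild the string for a dictionary id by reading a single bucket."""
    first, rest = buckets[dictionary_id // bucket_size]
    offset = dictionary_id % bucket_size
    if offset == 0:
        return first.decode("utf-8")
    p, suffix = rest[offset - 1]
    return (first[:p] + suffix).decode("utf-8")


def encoded_size(buckets: List[Bucket]) -> int:
    """Rough byte count: string bytes plus one byte per stored prefix length."""
    return sum(len(first) + sum(1 + len(s) for _, s in rest) for first, rest in buckets)


# Example data (made up): URLs with long shared prefixes.
urls = sorted([
    "https://druid.apache.org/docs",
    "https://druid.apache.org/downloads",
    "https://druid.apache.org/community",
    "https://imply.io/blog",
    "https://imply.io/blog/front-coding",
])
buckets = front_encode(urls)
raw_size = sum(len(u.encode("utf-8")) for u in urls)
```

Comparing `encoded_size(buckets)` with `raw_size` gives a rough sense of the savings on prefix-heavy data. The bucket size is the main design choice in this sketch: larger buckets store fewer values in full but require scanning more entries per lookup.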