Company
Date Published
Author
-
Word count
961
Language
English
Hacker News points
None

Summary

In 2011, Lucene's document store had no compression, which inflated storage sizes. Lucene 4.0, released in 2012, introduced a codec API that made it easier to experiment with file formats while preserving backward compatibility. This opened the door to significant changes in the index format, including automatic document store compression with LZ4 in Lucene 4.1, which compresses short documents efficiently by grouping them into 16 KB blocks. Lucene 5.0 went further by allowing the use of DEFLATE, which achieves a better compression ratio at the cost of slower stored field access, a trade-off that mainly benefits users with large data volumes. These options also help with managing hot and cold data: older indices can be moved to cheaper machines and stored with the stronger compression to save disk space. In addition, Lucene 5 improved the merging of stored fields, which had previously been CPU-bound, by tracking and managing incomplete compressed blocks. These improvements are set to be available in Elasticsearch 2.0, reflecting the ongoing evolution of store compression in Lucene and Elasticsearch.
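
The benefit of grouping short documents into blocks before compressing them can be illustrated with a small sketch. This is not Lucene's implementation: it uses Python's standard `zlib` module (DEFLATE) as a stand-in for both codecs because LZ4 is not in the standard library, and the document shape, the helper names (`make_docs`, `group_into_blocks`, and so on), and the rule of closing a block once it reaches 16 KB are assumptions made for illustration.

```python
import json
import zlib

BLOCK_SIZE = 16 * 1024  # 16 KB, the block size mentioned in the summary


def make_docs(n):
    """Build n short JSON documents that share field names (hypothetical data)."""
    return [
        json.dumps(
            {
                "id": i,
                "user": f"user{i % 50}",
                "status": "ok",
                "message": f"request {i} served",
            }
        ).encode("utf-8")
        for i in range(n)
    ]


def compress_individually(docs):
    """Total compressed size when every document is compressed on its own."""
    return sum(len(zlib.compress(d)) for d in docs)


def group_into_blocks(docs, block_size=BLOCK_SIZE):
    """Concatenate documents in order, closing a block once it reaches block_size bytes."""
    blocks, current, size = [], [], 0
    for d in docs:
        current.append(d)
        size += len(d)
        if size >= block_size:
            blocks.append(b"".join(current))
            current, size = [], 0
    if current:
        blocks.append(b"".join(current))  # final block may be incomplete
    return blocks


def compress_in_blocks(docs, block_size=BLOCK_SIZE):
    """Total compressed size when documents are compressed one block at a time."""
    return sum(len(zlib.compress(b)) for b in group_into_blocks(docs, block_size))


if __name__ == "__main__":
    docs = make_docs(2000)
    print("raw bytes:             ", sum(len(d) for d in docs))
    print("compressed per document:", compress_individually(docs))
    print("compressed per block:   ", compress_in_blocks(docs))
```

Short documents have too little internal redundancy to compress well in isolation, while a block of many similar documents lets the compressor reuse repeated field names and values. The trade-off is that reading a single document requires decompressing the block that contains it, which is why a larger block or a heavier codec slows down stored field access.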