Company
Date Published
Author
Michael McCandless
Word count
1179
Language
-
Hacker News points
None

Summary

Apache Lucene has changed how document values (doc values) are indexed and accessed, with the goals of improving performance and ensuring that users pay only for what they use. The changes, slated for the upcoming Lucene 7.0 release, replace the random-access API with a more restrictive iterator API, which allows better compression and optimization and particularly benefits sparse cases. Further improvements include a new codec design that removes abstraction layers and implements sparse cases directly, plus a faster advanceExact API for positioning on a specific target document. Together these changes improve search performance and reduce index sizes, as shown by new benchmarks built on the New York City taxi ride data corpus. The benchmarks, which test both sparse and dense documents, show significant performance gains despite initial setbacks during the API transition. Index-time sorting slows indexing but yields notable search speedups, a worthwhile trade-off for many users. This evolution in Lucene underscores the value of automated benchmarks for detecting performance regressions and guiding optimizations.
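
To make the iterator idea concrete, here is a minimal toy sketch in Python. It is not Lucene's codec or its Java API: the class SparseNumericDocValues, the helper sum_values, and the sample data are all hypothetical, and the method names are only loosely modeled on an iterator with next-doc, advance, and advance-exact operations. The point it illustrates is that storage and iteration cost scale with the number of documents that actually have a value, not with the total number of documents.

```python
from bisect import bisect_left

NO_MORE_DOCS = 2**31 - 1  # sentinel: the iterator is exhausted (assumed convention)


class SparseNumericDocValues:
    """Toy forward-only iterator over documents that have a value (hypothetical)."""

    def __init__(self, values_by_doc):
        # Keep only documents that have a value, sorted by doc ID.
        self._docs = sorted(values_by_doc)
        self._values = [values_by_doc[d] for d in self._docs]
        self._idx = 0            # search position inside _docs
        self._doc = -1           # current doc ID; -1 means not positioned yet
        self._has_value = False

    def advance(self, target):
        """Move to the first document >= target that has a value."""
        self._idx = bisect_left(self._docs, target, lo=self._idx)
        if self._idx < len(self._docs):
            self._doc = self._docs[self._idx]
            self._has_value = True
        else:
            self._doc = NO_MORE_DOCS
            self._has_value = False
        return self._doc

    def next_doc(self):
        """Move to the next document that has a value."""
        return self.advance(self._doc + 1)

    def advance_exact(self, target):
        """Position on exactly `target`; return whether it has a value."""
        self._idx = bisect_left(self._docs, target, lo=self._idx)
        self._doc = target
        self._has_value = (
            self._idx < len(self._docs) and self._docs[self._idx] == target
        )
        return self._has_value

    def long_value(self):
        """Return the value of the current document."""
        if not self._has_value:
            raise ValueError("current document has no value")
        return self._values[self._idx]


def sum_values(dv):
    """Visit only the documents that have a value and sum them."""
    total = 0
    while dv.next_doc() != NO_MORE_DOCS:
        total += dv.long_value()
    return total
```

With a field present in only three documents, for example SparseNumericDocValues({3: 10, 7: 25, 42: 5}), sum_values touches three entries regardless of how many documents the index holds, and advance_exact(8) reports that document 8 has no value without scanning forward to the next document that does. Because the iterator only moves forward, targets must be requested in increasing doc ID order, which is the restriction that a random-access design would not impose.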