Author: Adrien Grand
Word count: 759

Summary

Elasticsearch users who try to store large amounts of data per node often run into heap memory limits, which can destabilize a cluster. The root cause is that Lucene, the library underlying Elasticsearch, keeps some data structures in memory to make disk access efficient; one example is the terms index, which maps term prefixes to disk offsets. Historically these structures consumed significant heap.

Elasticsearch 7.7 moves more of these data structures from the JVM heap to disk, relying on the filesystem cache to keep frequently accessed data fast. This sharply reduces heap requirements, letting users store more data per node at lower cost; some datasets saw up to a 100-fold decrease in memory usage. The gains, most visible on datasets such as Geonames and NYC taxis, come from optimizations to how Lucene indices are stored, including moving the terms index for the _id field on disk, which especially benefits users who primarily index logs and metrics. The post encourages users to try Elasticsearch 7.7 and share feedback.
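To make the role of the terms index concrete, here is a minimal, purely illustrative sketch (not Lucene's actual data structures, which use an FST over a block-encoded terms dictionary): a small in-memory index maps the first term of each on-disk block to that block's location, so a lookup reads only one block instead of the whole terms dictionary. The block contents and names below are hypothetical.

```python
import bisect

# Hypothetical on-disk terms dictionary, grouped into sorted blocks.
blocks = [
    ["apple", "apricot", "avocado"],    # block 0
    ["banana", "blueberry", "cherry"],  # block 1
    ["mango", "melon", "peach"],        # block 2
]

# Terms index: first term of each block -> block number (standing in
# for a disk offset). In Lucene this structure lived on the JVM heap;
# since Elasticsearch 7.7 it stays on disk and is served by the
# filesystem cache when accessed frequently.
index = [(b[0], i) for i, b in enumerate(blocks)]
keys = [k for k, _ in index]

def lookup(term):
    """Find the single block that could contain `term`, then scan it."""
    pos = bisect.bisect_right(keys, term) - 1
    if pos < 0:
        return False  # term sorts before every indexed prefix
    return term in blocks[index[pos][1]]

print(lookup("blueberry"))  # True: found in block 1
print(lookup("kiwi"))       # False: block 1 is checked, term absent
```

The point of the sketch is the size asymmetry: the index holds one entry per block while the blocks hold the bulk of the data, which is why keeping only the blocks on disk (and, as of 7.7, the index as well) cuts heap usage so dramatically.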