Company
Date Published
Author
-
Word count
2330
Language
-
Hacker News points
None

Summary

A common problem for users running Elasticsearch at scale is Java heap pressure caused by fielddata, a data structure that uninverts the inverted index (mapping each document to its terms rather than each term to its documents) so that sorting and aggregations can be computed, at the cost of significant memory. Fielddata is loaded on demand, so heap usage keeps growing as new segments are added, which can eventually make a node unstable. The Fielddata Circuit Breaker can block requests that would exceed the available memory, but it does not clear fielddata that has already been loaded. To address this, Elasticsearch recommends doc values, which build the same structure at index time and store it on disk, reducing heap usage and improving performance. Doc values cannot be used on analyzed string fields, but a multifield can index the same string both analyzed (for full-text search) and not analyzed (for sorting and aggregations). Elasticsearch 2.0 is planned to enable doc values by default for all fields except analyzed strings, to mitigate this problem.
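To make the multifield approach concrete, here is a minimal sketch of a mapping that keeps an analyzed string for search alongside a not-analyzed sub-field backed by doc values. It assumes Elasticsearch 1.x-style mapping syntax (`"type": "string"`, `"index": "not_analyzed"`, `"doc_values": true`); the type name `article`, the field name `title`, the sub-field name `raw`, and the helper `build_multifield_mapping` are hypothetical names chosen for illustration, not taken from the original post.

```python
import json


def build_multifield_mapping(doc_type, field):
    """Build a mapping body for one string field with a doc-values sub-field.

    Assumption: Elasticsearch 1.x-style mapping syntax. The sub-field
    name "raw" is an illustrative convention, not a requirement.
    """
    return {
        doc_type: {
            "properties": {
                field: {
                    # Analyzed copy: used for full-text search.
                    # It cannot use doc values.
                    "type": "string",
                    "index": "analyzed",
                    "fields": {
                        "raw": {
                            # Not-analyzed copy: used for sorting and
                            # aggregations, stored on disk as doc values
                            # instead of being loaded into heap fielddata.
                            "type": "string",
                            "index": "not_analyzed",
                            "doc_values": True,
                        }
                    },
                }
            }
        }
    }


# Hypothetical type and field names.
mapping = build_multifield_mapping("article", "title")
print(json.dumps(mapping, indent=2))
```

With a mapping along these lines, full-text queries would target `title`, while sorts and aggregations would target `title.raw`, so the analyzed field never needs to be loaded into fielddata.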