How we stopped memory intensive queries from crashing ElasticSearch
Blog post from Plaid
Post Details
Company
Plaid
Date Published
Author
Angela Zhang
Word Count
1,656
Language
English
Hacker News Points
-
Summary
We investigated repeated ElasticSearch outages at Plaid, in which memory-intensive queries crashed data nodes and brought down the cluster. The root cause was user-written queries that aggregated over a large number of buckets: the per-bucket counters for a single query took up too much memory on each data node. To address this, we configured request memory circuit breakers to cap the memory an individual query can use, and we limited the number of buckets ElasticSearch will create for an aggregation. We also worked with AWS support to update the cluster settings, which helps prevent similar outages in the future.
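A minimal sketch of the two safeguards described above, assuming a self-managed cluster reachable over HTTP. It uses the real Elasticsearch dynamic settings `indices.breaker.request.limit` (request circuit breaker) and `search.max_buckets` (bucket cap), sent through the `_cluster/settings` API. The limit values, the host, and the helper names (`total_buckets`, `cluster_settings_body`, `apply_settings`) are hypothetical illustrations, not the values or code Plaid used. On a managed AWS cluster, some settings may have to be changed through the provider rather than directly, which is consistent with the summary's mention of AWS support.

```python
import json
import urllib.request

# Hypothetical values for illustration only; tune for your own heap size and workload.
REQUEST_BREAKER_LIMIT = "40%"  # share of JVM heap one request's data structures may use
MAX_BUCKETS = 10_000           # max buckets a single search response may create


def total_buckets(cardinalities):
    """Rough upper bound on buckets created by nested terms aggregations.

    Each level can split every parent bucket into up to `cardinality` children,
    so level k contributes c1 * c2 * ... * ck buckets, and the levels add up.
    """
    total, level = 0, 1
    for c in cardinalities:
        level *= c
        total += level
    return total


def cluster_settings_body(breaker_limit, max_buckets):
    """Build the JSON body for PUT _cluster/settings (persistent across restarts)."""
    return {
        "persistent": {
            "indices.breaker.request.limit": breaker_limit,
            "search.max_buckets": max_buckets,
        }
    }


def apply_settings(host, body):
    """Send the settings to a cluster, e.g. host="http://localhost:9200" (hypothetical)."""
    req = urllib.request.Request(
        f"{host}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Hypothetical query: terms on merchant (5,000 values), then terms on day (365 values).
    estimate = total_buckets([5_000, 365])
    print(estimate, "buckets; exceeds cap:", estimate > MAX_BUCKETS)
    print(json.dumps(cluster_settings_body(REQUEST_BREAKER_LIMIT, MAX_BUCKETS), indent=2))
```

The arithmetic shows why nested aggregations are dangerous: two modest-looking levels already yield 1,830,000 buckets, far above a 10,000-bucket cap. The bucket cap rejects such a query up front, while the circuit breaker acts as a backstop on memory for anything the cap does not catch.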