How Grafana Mimir's store-gateway overcame out-of-memory errors

Post Details

Company

Grafana Labs

Date Published

July 6, 2023

Authors

Marco Pracucci and Dimitar Dimitrov

Word Count

2,640

Language

English

Hacker News Points

-

Source URL

grafana.com/blog/breaking-the-memory-barrier-how-grafana-mimirs-store-gateway-overcame-out-of-memory-errors

Summary

Grafana Mimir, an open-source distributed time series database, has addressed a long-standing problem of out-of-memory errors in its store-gateway during complex queries. Historically, memory utilization was unbounded when processing high-cardinality or long-time-range queries, which could lead to crashes. Query sharding mitigates part of the problem by splitting a query so that it executes concurrently across multiple machines, spreading the CPU and memory load. In addition, the store-gateway has moved to a streaming model that loads data in batches and sends them progressively to the querier. This decouples memory usage from the size of the data being queried and reduced out-of-memory errors by 90%. The new approach keeps memory utilization efficient and also lowers latency through a preloading mechanism. Challenges remain, including managing a high number of inflight requests and optimizing series ID selection. The authors plan to cover further optimizations and enhancements to Grafana Mimir in upcoming posts.
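
The batching and preloading idea described in the summary can be illustrated with a minimal sketch. This is not Mimir's implementation (Mimir is written in Go); the function names `stream_batches` and `preload`, the `load_series` callback, and the `batch_size` and `depth` parameters are all hypothetical and chosen only for illustration. The point is that the consumer only ever holds a small, fixed number of batches, no matter how many series the query touches, while a background thread loads the next batch ahead of time.

```python
import queue
import threading
from typing import Callable, Iterator, List, Sequence

_DONE = object()  # sentinel marking the end of the stream


def stream_batches(series_ids: Sequence[int],
                   load_series: Callable[[int], bytes],
                   batch_size: int) -> Iterator[List[bytes]]:
    """Yield loaded series in fixed-size batches instead of all at once.

    `load_series` is a hypothetical stand-in for fetching one series
    from storage.
    """
    for start in range(0, len(series_ids), batch_size):
        batch_ids = series_ids[start:start + batch_size]
        yield [load_series(sid) for sid in batch_ids]


def preload(batches: Iterator[List[bytes]],
            depth: int = 1) -> Iterator[List[bytes]]:
    """Load batches ahead of the consumer in a background thread.

    The bounded queue holds at most `depth` finished batches, so the
    number of batches in memory stays a small constant.
    """
    buf: queue.Queue = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for batch in batches:
                buf.put(batch)  # blocks while the buffer is full
        finally:
            buf.put(_DONE)  # always signal the end, even after an error

    # Daemon thread: a sketch-level shortcut so an abandoned stream
    # does not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = buf.get()
        if item is _DONE:
            return
        yield item


if __name__ == "__main__":
    ids = list(range(10))

    def fake_load(sid: int) -> bytes:
        return f"series-{sid}".encode()

    for batch in preload(stream_batches(ids, fake_load, batch_size=4)):
        print(len(batch), batch[0])
```

In this sketch, `batch_size` trades memory for per-batch overhead, and `depth` controls how far ahead the loader may run. A production version would also need to propagate loader errors to the consumer and stop the worker when the consumer abandons the stream early.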