Optimizing SLO Performance at Scale: How We Solved Pyrra’s Query Performance on Thanos

Post Details

Company

Polar Signals

Date Published

Dec. 30, 2025

Author

Matthias Loibl

Word Count

1,422

Company Posts That Month

4

Language

-

Hacker News Points

-

Post removed?

No

Source URL

www.polarsignals.com/blog/posts/2025/12/30/optimizing-slo-performance-at-scale-how-we-solved-pyrra-s-query-performance-on-thanos

Summary

Polar Signals incurred substantial cross-zone data transfer costs on Google Cloud Platform because of how Thanos evaluated Service Level Objectives (SLOs) defined with Pyrra. The root cause was that each SLO evaluation fetched millions of raw samples across zones, which drove up both network bandwidth and CPU usage. To address this, Polar Signals used Prometheus subqueries to pre-aggregate the data into smaller chunks, so that each evaluation processes a much smaller number of pre-aggregated values instead of raw samples. This reduced cross-zone traffic by 90% and improved query performance, at the cost of a minor 1% accuracy discrepancy. To let users weigh resource efficiency against precision, the optimization is offered as an opt-in feature. The new method is currently being tested in production, with plans to integrate it into Pyrra for broader community use. The post's closing point is that performance should not stand in the way of comprehensive observability.
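
The pre-aggregation idea in the summary can be illustrated with a small Python sketch. It is not Pyrra's actual implementation: the metric name, the 4-week window, the 15-second scrape interval, the 5-minute step, and the exact query shapes are all assumptions chosen for illustration. The sketch only counts how many values per series each approach has to handle and builds the two PromQL strings for comparison.

```python
# Illustrative sketch only. Every value and name below is an assumption,
# not something taken from the post or from Pyrra's source code.
WINDOW_SECONDS = 28 * 24 * 3600  # assumed 4-week SLO window
SCRAPE_INTERVAL_SECONDS = 15     # assumed scrape interval
STEP_SECONDS = 300               # assumed 5-minute subquery resolution


def raw_sample_count(window_s: int, scrape_s: int) -> int:
    """Raw samples per series that a range query over the full window covers."""
    return window_s // scrape_s


def subquery_point_count(window_s: int, step_s: int) -> int:
    """Pre-aggregated points per series that a subquery yields at this step."""
    return window_s // step_s


def direct_query(metric: str, window: str) -> str:
    """PromQL that computes the increase over the whole window in one go."""
    return f"increase({metric}[{window}])"


def subquery_query(metric: str, window: str, step: str) -> str:
    """PromQL that sums short per-step increases via a subquery."""
    return f"sum_over_time(increase({metric}[{step}])[{window}:{step}])"


if __name__ == "__main__":
    raw = raw_sample_count(WINDOW_SECONDS, SCRAPE_INTERVAL_SECONDS)
    pre = subquery_point_count(WINDOW_SECONDS, STEP_SECONDS)
    print(direct_query("http_requests_total", "4w"))
    print(subquery_query("http_requests_total", "4w", "5m"))
    print(f"raw samples per series: {raw}")
    print(f"pre-aggregated points per series: {pre}")
```

Under these assumed values, the outer aggregation works on 8,064 points per series instead of 161,280 raw samples. The real savings depend on the actual scrape interval, window, and step, and the summary reports a 1% accuracy discrepancy for the optimized form, so a rewrite like this is a trade-off rather than a drop-in replacement.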

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	Month-over-Month Change
Observability	1	2,671	527	151	+5%
Real-time	1	7,285	1,202	224	+60%
