Optimizing SLO Performance at Scale: How We Solved Pyrra’s Query Performance on Thanos

Blog post from Polar Signals

Post Details
Company: Polar Signals
Date Published: -
Author: Matthias Loibl
Word Count: 1,422
Language: -
Hacker News Points: -
Summary

Polar Signals was paying substantial cross-zone data transfer costs on Google Cloud Platform because of how Thanos evaluated the Service Level Objective (SLO) queries generated by Pyrra. Each SLO evaluation fetched millions of raw samples across zones, which drove up both network bandwidth and CPU usage. To fix this, Polar Signals rewrote the queries to use Prometheus subqueries, which pre-aggregate the data into smaller chunks so that far fewer pre-aggregated values, rather than raw samples, cross zone boundaries. The change cut cross-zone traffic by 90% and improved query performance, at the cost of an accuracy discrepancy of about 1%. Because it trades precision for resource efficiency, the optimization is offered as an opt-in feature so users can choose the balance that suits them. The new method is currently being tested in production, and Polar Signals plans to integrate it into Pyrra for the wider community, on the principle that performance should not stand in the way of comprehensive observability.
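
To make the subquery idea concrete, here is a minimal Python sketch that builds both query shapes and estimates how many values per series each one has to process. It is an illustration under stated assumptions, not Pyrra's actual generated queries: the metric name `http_requests_total`, the `job="api"` selector, the 4-week window, the 15-second scrape interval, and the 5-minute chunk size are all hypothetical, as are the helper names.

```python
# Illustrative sketch only: metric, selector, window, scrape interval and
# chunk size are assumptions, not values taken from the post or from Pyrra.

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def seconds(duration: str) -> int:
    """Convert a simple single-unit PromQL duration such as '5m' to seconds."""
    return int(duration[:-1]) * _UNIT_SECONDS[duration[-1]]


def raw_query(metric: str, selector: str, window: str) -> str:
    """One range selector over the whole SLO window (reads every raw sample)."""
    return f"sum(increase({metric}{{{selector}}}[{window}]))"


def subquery(metric: str, selector: str, window: str, step: str) -> str:
    """Pre-aggregate into step-sized chunks, then sum the chunks."""
    inner = f"increase({metric}{{{selector}}}[{step}])"
    return f"sum(sum_over_time({inner}[{window}:{step}]))"


def values_per_series(window: str, interval: str) -> int:
    """Number of points per series when the window is sampled at `interval`."""
    return seconds(window) // seconds(interval)


if __name__ == "__main__":
    print(raw_query("http_requests_total", 'job="api"', "4w"))
    print(subquery("http_requests_total", 'job="api"', "4w", "5m"))
    print("raw samples per series:", values_per_series("4w", "15s"))
    print("chunk values per series:", values_per_series("4w", "5m"))
```

With these assumed numbers, the outer aggregation works on one value per 5-minute chunk instead of one per scrape. The ratio here is an arithmetic illustration only and should not be read as the measured traffic reduction reported in the post, which depends on where the inner expression is evaluated and on the real scrape interval and series count.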