Company
Date Published
Author
Harshal Brahmbhatt, Kevin Deems, Nina Giunta, Michael Hoffmann
Word count
1902
Language
English
Hacker News points
None

Summary

Cloudflare's Health Mediated Deployments (HMD) is a data-driven solution that automates software updates across its global network, using Prometheus metrics to determine whether new code should continue to roll out or be reverted. HMD uses Thanos, a system for storing and scaling Prometheus metrics, to query the performance of Cloudflare's services and detect potential issues. If the success rate is unexpectedly decreasing, HMD reverts the change in order to stabilize the system. The solution has improved Thanos' ability to handle high-load queries, reducing batch runtimes by 15x. Additionally, HMD introduces an adaptive priority-based concurrency control mechanism to tackle spiky load patterns and prioritize on-call engineer queries over HMD batch requests. The project also explores optimizing time series storage for object storage using Parquet files.