
How Workers powers our internal maintenance scheduling pipeline

Blog post from Cloudflare

Post Details
Company
Cloudflare
Date Published
Author
Kevin Deems and Michael Hoffmann
Word Count
2,586
Language
English
Summary

Cloudflare's global network, with data centers in over 330 cities, led the company to build an automated maintenance scheduler that handles disruptive maintenance without compromising service reliability. Previously, network operations and infrastructure specialists coordinated maintenance manually, which could not reliably prevent conflicting maintenance events and risked downtime. The new scheduler, built on Cloudflare Workers, enforces safety constraints programmatically, ensuring that critical operations do not overlap in ways that would disrupt services such as Aegis, a Zero Trust product. To gather the data these checks need, the system uses graph processing inspired by Facebook's TAO and leverages Cloudflare's CDN, making targeted API requests and routing them through a middleware layer that handles subrequest issues. It also uses Thanos and converts historical data into Apache Parquet files, which enables efficient real-time and historical analysis with lower latency. The system is a significant step toward balancing network growth with product performance, although further challenges remain as Cloudflare continues to scale.
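The central safety rule in the summary, that critical maintenance operations must not overlap, can be illustrated as an interval-overlap check. The following is a minimal sketch under stated assumptions, not Cloudflare's implementation: the names `Maintenance`, `overlaps`, and `conflicts` are hypothetical, and so is the idea of a "dependency group" (a set of data centers that together serve one product and so must not be taken down at the same time).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Maintenance:
    """Hypothetical disruptive maintenance window in one data center."""
    datacenter: str
    start: datetime
    end: datetime  # exclusive: the window is [start, end)


def overlaps(a: Maintenance, b: Maintenance) -> bool:
    # Half-open intervals overlap iff each one starts before the other ends.
    return a.start < b.end and b.start < a.end


def conflicts(
    windows: list[Maintenance], dependency_group: set[str]
) -> list[tuple[Maintenance, Maintenance]]:
    """Return every pair of windows that touch the same hypothetical
    dependency group and overlap in time."""
    in_group = [w for w in windows if w.datacenter in dependency_group]
    return [(a, b) for a, b in combinations(in_group, 2) if overlaps(a, b)]


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 2, 0)
    plan = [
        Maintenance("dc-a", t0, t0 + timedelta(hours=2)),
        Maintenance("dc-b", t0 + timedelta(hours=1), t0 + timedelta(hours=3)),
        Maintenance("dc-c", t0 + timedelta(hours=1), t0 + timedelta(hours=3)),
    ]
    # Only dc-a and dc-b are in the group, and their windows overlap.
    print([(a.datacenter, b.datacenter) for a, b in conflicts(plan, {"dc-a", "dc-b"})])
    # prints [('dc-a', 'dc-b')]
```

Using half-open intervals means back-to-back windows (one ending exactly when the next starts) are not treated as conflicts. A real scheduler would need to derive the dependency groups from live topology data, which is where the graph and metrics layers described above come in.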