Code Orange: Fail Small is complete. The result is a stronger Cloudflare network

Blog post from Cloudflare

Post Details
Company: Cloudflare
Author: Jeremy Hartman
Word Count: 2,038
Language: English
Summary

Cloudflare has concluded "Code Orange: Fail Small," an engineering initiative to improve the resiliency, security, and reliability of its infrastructure after global outages in November and December 2025. The project had four focus areas: safer configuration changes, smaller failure impact, revised incident management procedures, and better communication. The newly introduced Snapstone provides health-mediated deployment of configuration changes, so that problems are caught early, before they reach a large share of traffic. To limit the impact of failures, systems now fall back to a last known good configuration and are segmented so that an outage in one part does not spread to the rest. The revised incident management procedures give responders broader access to essential tools and pathways during an outage, and drills are run to confirm readiness. A new internal Codex enforces best practices across Cloudflare's development lifecycle and integrates AI reviews to prevent regressions. Cloudflare has also changed how it communicates during incidents, with the aim of giving customers transparent and timely updates until an incident is resolved. The post presents the result as a stronger network and commits to continued work on resilience.
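The combination of health-mediated deployment and last known good fallback described above can be illustrated with a minimal sketch. This is not Snapstone's API or Cloudflare's implementation, which the summary does not describe. The class name `StagedRollout`, the stage fractions, and the `is_healthy` callback are all hypothetical, chosen only to show the general pattern: a candidate configuration is exposed to progressively larger shares of traffic, and the first failed health check stops the rollout and returns the last known good configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Config = Dict[str, str]


@dataclass
class StagedRollout:
    """Hypothetical staged rollout gated by a health check (illustration only)."""

    stages: List[float]                          # share of traffic per stage (assumed values)
    is_healthy: Callable[[Config, float], bool]  # assumed health signal for a config at a stage
    last_known_good: Config                      # configuration to fall back to on failure
    history: List[str] = field(default_factory=list)

    def deploy(self, candidate: Config) -> Config:
        """Return the configuration that should be live after the rollout attempt."""
        for fraction in self.stages:
            if not self.is_healthy(candidate, fraction):
                # Stop at the first unhealthy stage and keep the previous config.
                self.history.append(f"rollback at {fraction:.0%}")
                return self.last_known_good
            self.history.append(f"healthy at {fraction:.0%}")
        # Every stage passed, so the candidate becomes the new last known good.
        self.last_known_good = candidate
        return candidate


if __name__ == "__main__":
    rollout = StagedRollout(
        stages=[0.01, 0.10, 1.0],
        is_healthy=lambda cfg, fraction: cfg.get("valid") == "yes",
        last_known_good={"valid": "yes", "version": "1"},
    )
    live = rollout.deploy({"valid": "no", "version": "2"})
    print(live["version"], rollout.history)
```

In this sketch a bad configuration is rejected at the smallest stage, so the live configuration stays at the previous version. A real system would derive the health signal from production metrics rather than from a field in the configuration itself.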