The Cloudflare outage and why code quality matters more than ever
Blog post from Sonar
On November 18, 2025, Cloudflare experienced a significant outage caused by a small change in database permissions combined with a hard-coded limit in the process that routes its traffic. The incident highlights how hard it is to maintain code quality in complex systems, and why teams need to understand the dependencies between interconnected services. The post argues that the critical question is not who is to blame but whether conversations about failure modes take place and are documented. Code reviews are standard practice, yet they do not catch every issue, so the post recommends automated tools such as static code analyzers to identify potential problems proactively.

The post then frames code quality as governance: maintaining the structural integrity of software so that it keeps operating as intended over time. It warns that automated verification is becoming more necessary with the rise of AI-generated code, and it advocates deterministic static analysis as a safety net that catches logic errors before they reach production, avoiding outages along with their costs and reputational damage.
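
To make the "hard-coded limit" failure mode concrete, here is a minimal sketch in Python. It is not Cloudflare's code, and the post does not show the code involved. The names `MAX_FEATURES`, `validate_features`, and `load_or_keep_last_good`, as well as the limit of 100, are hypothetical and chosen only for illustration. The sketch shows one defensive pattern: when an upstream change makes a generated configuration exceed a fixed limit, the loader rejects it and keeps the last known-good version instead of crashing.

```python
# Hypothetical hard-coded limit; the value is an assumption for illustration.
MAX_FEATURES = 100


class ConfigTooLargeError(ValueError):
    """Raised when a generated config exceeds the hard-coded limit."""


def validate_features(features, limit=MAX_FEATURES):
    """Return the features unchanged, or raise if there are too many."""
    if len(features) > limit:
        raise ConfigTooLargeError(
            f"{len(features)} features exceeds the limit of {limit}"
        )
    return features


def load_or_keep_last_good(new_features, last_good, limit=MAX_FEATURES):
    """Accept the new config if it is valid; otherwise keep the old one.

    The failure is reported rather than allowed to take the process down.
    """
    try:
        return validate_features(new_features, limit)
    except ConfigTooLargeError as exc:
        print(f"rejected new config: {exc}; keeping last known-good config")
        return last_good
```

Whether falling back is the right behavior depends on the system. The point of the sketch is that the decision about what happens at the limit is written down explicitly in code, where a reviewer or a static analyzer can see it.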