Postmortem: Delayed Start Compute Operations

Blog post from Neon

Post Details
Company
Date Published
Author
Em Sharnoff
Word Count
1,000
Language
English
Summary

In May 2025, Neon had two significant outages in its AWS us-east-1 region. Customers could not create new databases or start inactive ones; databases that were already running were unaffected. The outages totaled 5.5 hours and were attributed to IP address allocation problems caused by overloaded clusters and subnet capacity limits. In response, Neon changed its IP allocation strategy, adjusted subnet sizes, and redirected some traffic to other regions. It also began long-term architectural changes to its Kubernetes setup, known as the Cells project, to prevent future scalability issues. Mitigations applied during the incidents included reconfiguring AWS CNI settings, upscaling control plane databases, and enabling rate limiting to manage traffic, although some customers encountered rate limit errors as a result. Neon is still investigating the root causes with AWS Support and plans to publish further updates. In the meantime, it encourages customers with production databases to avoid scale-to-zero configurations to minimize impact.
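
To make the subnet capacity limit concrete, here is a minimal sketch of the arithmetic behind it. The CIDR blocks and the allocation count are illustrative assumptions, not Neon's actual configuration, and the helper names `usable_ips` and `remaining_capacity` are hypothetical. The one fixed fact is that AWS reserves five addresses in every subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return how many addresses in an AWS subnet can be assigned to workloads."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def remaining_capacity(cidr: str, allocated: int) -> int:
    """Return how many more addresses can be handed out before the subnet is full."""
    return usable_ips(cidr) - allocated


if __name__ == "__main__":
    # Hypothetical subnet sizes, chosen only to show how capacity scales.
    for cidr in ("10.0.0.0/20", "10.0.0.0/18"):
        print(cidr, usable_ips(cidr))
```

Each bit removed from the prefix length doubles the address space, so resizing a subnet changes its capacity in large steps. Once allocations approach the usable count, new compute cannot obtain an address even though running compute keeps working.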