On August 14, 2024, the failure of an EC2 instance hosting a Neon pageserver caused an outage affecting approximately 0.4% of customer projects in the us-east-1 region. Neon resolved the incident by migrating projects from the failed pageserver to healthy ones, but recovery took longer than desired, and some customers were without service for up to two hours. The delay was attributed to slow alerting and to the semi-manual process of migrating projects. Neon's new Storage Controller, in use since May 2024, is designed to improve fault tolerance and reduce downtime by migrating projects autonomously when a pageserver node fails, without waiting for an operator. The company is accelerating the transition of all customer projects to this new infrastructure to improve resilience and support critical workloads.
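
To make the idea of autonomous migration concrete, the following is a minimal Python sketch of heartbeat-based failover: a controller treats a node as failed once its heartbeat is older than a timeout, then moves that node's projects to the least-loaded healthy node. It is an illustration only and not Neon's Storage Controller code; the `Node` class, the `reassign_from_failed_nodes` function, the node names, and the timeout value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a pageserver tracked by a controller."""
    name: str
    last_heartbeat: float  # seconds, from any monotonic clock
    projects: set[str] = field(default_factory=set)


def reassign_from_failed_nodes(
    nodes: list[Node], now: float, timeout: float
) -> dict[str, str]:
    """Move projects off nodes whose last heartbeat is older than `timeout`.

    Returns a mapping of project -> new node name for every project moved.
    """
    healthy = [n for n in nodes if now - n.last_heartbeat <= timeout]
    failed = [n for n in nodes if now - n.last_heartbeat > timeout]
    moves: dict[str, str] = {}
    if not healthy:
        return moves  # nowhere to move projects; leave assignments untouched
    for node in failed:
        for project in sorted(node.projects):
            # Pick the least-loaded healthy node (ties broken by name)
            # so the extra load is spread out.
            target = min(healthy, key=lambda n: (len(n.projects), n.name))
            target.projects.add(project)
            moves[project] = target.name
        node.projects.clear()
    return moves


if __name__ == "__main__":
    nodes = [
        Node("ps-1", last_heartbeat=0.0, projects={"a", "b", "c"}),
        Node("ps-2", last_heartbeat=95.0, projects={"d"}),
        Node("ps-3", last_heartbeat=98.0),
    ]
    print(reassign_from_failed_nodes(nodes, now=100.0, timeout=30.0))
```

In this sketch the reassignment runs as soon as the timeout expires, with no human in the loop, which is the property the summary credits with shortening recovery. A real system would also need to handle details this sketch ignores, such as avoiding false positives from transient network issues and actually attaching project data on the target node.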