Company
Date Published
Author
Mihai Bojin
Word count
981
Language
English
Hacker News points
None

Summary

A change in the Postgres execution plan for the Neon Control Plane's Activity Monitor led to a significant outage in the AWS us-east-1 region on May 16, 2025, by causing CPU saturation in the database and preventing the suspension of idle Computes. The Neon Control Plane, which manages starting and suspending Neon Postgres VMs, relies on a Postgres query that experienced a dramatic performance drop due to the use of a suboptimal index, increasing execution times and causing an excess number of concurrently running VMs, leading to IP allocation failures. This issue was confined to the us-east-1 region due to region-isolated databases, and mitigation efforts included refreshing statistics with ANALYZE and using a materialized CTE to improve query performance stability. Future plans involve implementing a more active Garbage Collection strategy and potentially moving historical data to an OLAP system to prevent similar outages.