Incident Report: October 28th, 2025
Blog post from Railway
On October 28, 2025, an outage at Railway impacted its backend API. The dashboard was inaccessible, CLI operations failed, and GitHub-based deployments were delayed. Running deployments and platform-level features remained unaffected, so users who were not relying on the dashboard, CLI, or GitHub deploys during the incident saw no disruption.

The outage was caused by a Postgres schema migration that added a new column, with an index, to a critical table. The migration took an exclusive lock on that table, and API requests that needed it queued behind the lock. Each waiting request held a database connection open, so the queue grew until connection limits were exceeded, turning one locked table into a cascading failure across the API.

Engineers attempted to intervene manually, but those attempts failed because every connection slot was already in use. The migration finished after about 30 minutes and released the lock, after which the queued operations were processed normally.

Railway plans to prevent similar incidents by enforcing the CONCURRENTLY option for index creation in CI, adjusting PgBouncer connection pool limits, and configuring per-user connection limits on the database so that administrators can still connect during an incident.
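The post does not describe how the CI check works. As a minimal sketch, assuming migrations are plain .sql files passed to a script on the command line, the Python below flags any CREATE [UNIQUE] INDEX that is not immediately followed by CONCURRENTLY. Without CONCURRENTLY, PostgreSQL blocks writes to the table for the entire index build; with it, the build runs while writes continue. The function names, the entry point, and the file layout are hypothetical, and the pattern strips SQL comments but does not parse string literals.

```python
import re
import sys
from pathlib import Path

# Matches CREATE [UNIQUE] INDEX only when the next keyword is not CONCURRENTLY.
# In PostgreSQL's grammar, CONCURRENTLY comes directly after INDEX.
BLOCKING_CREATE_INDEX = re.compile(
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)",
    re.IGNORECASE,
)


def strip_comments(sql: str) -> str:
    """Remove SQL comments while keeping line numbers stable."""
    # Block comments: replace each with the newlines it contained.
    sql = re.sub(r"/\*.*?\*/", lambda m: "\n" * m.group(0).count("\n"), sql, flags=re.DOTALL)
    # Line comments: drop everything from -- to the end of the line.
    return re.sub(r"--[^\n]*", "", sql)


def find_blocking_index_builds(sql: str) -> list[int]:
    """Return 1-based line numbers of CREATE INDEX statements missing CONCURRENTLY."""
    cleaned = strip_comments(sql)
    return [cleaned.count("\n", 0, m.start()) + 1 for m in BLOCKING_CREATE_INDEX.finditer(cleaned)]


def main(paths: list[str]) -> int:
    """Hypothetical CI entry point: returns 1 if any migration would block writes."""
    failed = False
    for path in paths:
        for line in find_blocking_index_builds(Path(path).read_text()):
            print(f"{path}:{line}: CREATE INDEX without CONCURRENTLY blocks writes")
            failed = True
    return 1 if failed else 0


# Only act as a CLI when given files, e.g. `python check_migrations.py migrations/*.sql`.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A CI job wired this way should account for one PostgreSQL rule: CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so a migration runner that wraps each file in a transaction needs a way to run these statements outside one. The check also covers only index creation; an ALTER TABLE ... ADD COLUMN in the same migration takes its own table lock, which this sketch does not inspect.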
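The other two measures come down to connection arithmetic: if PgBouncer can open as many server connections as Postgres allows through max_connections, a stall like this one leaves no slot for an operator. The following is a minimal sketch, assuming each pool logs in as one application role and that a pool's ceiling is its per-instance pool size times the number of PgBouncer instances. The role names, pool sizes, and the 10-slot admin reserve are hypothetical values, not Railway's configuration. The sketch checks that the pools leave headroom below max_connections and emits ALTER ROLE ... CONNECTION LIMIT statements so that Postgres itself caps each application role, independently of the PgBouncer settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """A PgBouncer pool and the Postgres role it logs in as (names are hypothetical)."""
    role: str
    pool_size: int      # server connections per PgBouncer instance
    instances: int = 1  # PgBouncer processes serving this pool

    @property
    def max_server_connections(self) -> int:
        return self.pool_size * self.instances


def check_budget(max_connections: int, admin_reserve: int, pools: list[Pool]) -> int:
    """Return connections left once every pool is full; raise if below the admin reserve."""
    used = sum(p.max_server_connections for p in pools)
    headroom = max_connections - used
    if headroom < admin_reserve:
        raise ValueError(
            f"pools can open {used} of {max_connections} connections, "
            f"leaving {headroom}, fewer than the {admin_reserve} reserved for admins"
        )
    return headroom


def role_limits_sql(pools: list[Pool]) -> list[str]:
    """Emit one ALTER ROLE ... CONNECTION LIMIT per role, summed across its pools."""
    per_role: dict[str, int] = {}
    for p in pools:
        per_role[p.role] = per_role.get(p.role, 0) + p.max_server_connections
    return [f"ALTER ROLE {role} CONNECTION LIMIT {limit};" for role, limit in per_role.items()]


if __name__ == "__main__":
    # Hypothetical sizing: 4 PgBouncer instances for the API, 1 for a build service.
    pools = [Pool("api", pool_size=20, instances=4), Pool("builder", pool_size=10)]
    print(check_budget(max_connections=120, admin_reserve=10, pools=pools))  # 30
    for stmt in role_limits_sql(pools):
        print(stmt)
```

The database-level role limits are what keep the reserve usable: even if a pool is misconfigured, the application roles hit their caps first, and an operator connecting as a separate role can still get in.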