Incident Report: October 16th, 2025
Blog post from Railway
Railway experienced an outage affecting Edge Network connectivity, caused by a combination of increased traffic load, high memory utilization, and a routine network service upgrade. Users accessing services through public endpoints saw intermittent connection failures.

The incident began when the routing services hit high memory utilization under a spike in request volume. Health checks started failing, so the load balancers stopped forwarding traffic, and users received HTTP errors and Cloudflare error pages. Private networking remained unaffected, and the service gradually recovered as caching mechanisms mitigated the increased load.

As immediate steps, Railway has increased memory limits and improved monitoring, and it is rewriting its internal routing service to improve memory efficiency and traffic handling. Longer term, it plans to migrate to a multi-region, distributed architecture to isolate failures and significantly boost capacity, in line with its commitment to a stable and resilient cloud infrastructure.
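
The failure mode described above (memory pressure, then failing health checks, then load balancers that stop forwarding traffic) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Railway's actual implementation: the `Backend` and `LoadBalancer` classes, the memory figures, the three-failure threshold, and the 503 status are all hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical routing-service instance with a fixed memory limit."""
    name: str
    memory_limit_mb: int
    memory_used_mb: int = 0

    def health_check(self) -> bool:
        # Assumption: an instance at or above its memory limit
        # can no longer answer its health check.
        return self.memory_used_mb < self.memory_limit_mb


@dataclass
class LoadBalancer:
    """Hypothetical load balancer that only forwards to healthy backends."""
    backends: list[Backend]
    failure_threshold: int = 3  # consecutive failures before removal (assumed)
    _failures: dict[str, int] = field(default_factory=dict)

    def run_health_checks(self) -> None:
        # One round of checks: reset the counter on success, bump it on failure.
        for backend in self.backends:
            if backend.health_check():
                self._failures[backend.name] = 0
            else:
                self._failures[backend.name] = self._failures.get(backend.name, 0) + 1

    def healthy_backends(self) -> list[Backend]:
        return [
            b for b in self.backends
            if self._failures.get(b.name, 0) < self.failure_threshold
        ]

    def route(self) -> int:
        # Return an HTTP-style status: 200 if any backend can take traffic,
        # 503 if every backend has been taken out of rotation.
        return 200 if self.healthy_backends() else 503
```

In this model, pushing every backend's `memory_used_mb` past its limit and running three rounds of `run_health_checks()` makes `route()` return 503. Raising `memory_limit_mb` and running one more round of checks brings the backends back into rotation, which loosely mirrors the "increase memory limits" mitigation mentioned in the report.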