October 2025 incident report

Post Details

Company

Inngest

Date Published

Oct. 24, 2025

Author

Darwin Wu and Dan Farrelly

Word Count

2,368

Language

-

Hacker News Points

-

Source URL

www.inngest.com/blog/2025-10-24-october-incident-report

Summary

Over recent weeks, a series of incidents has affected Inngest's system, including one critical event that resulted in data loss. The company acknowledges that these incidents have affected customers and their businesses. The report details several of them. In one, a disk on a Kafka cluster filled up, causing Event API request timeouts and delays. In another, performance degradation in the application database, hosted on AWS RDS, prompted a migration to PlanetScale, which significantly improved database performance. Further challenges included execution delays due to Kafka throughput issues and high load on the ClickHouse database, which caused dashboard performance problems. The company has implemented several mitigations to improve system performance, including optimizing queries, refactoring execution workers, and isolating database operations. In response to these incidents, the company has committed to strengthening its incident response protocols, increasing transparency, and providing more consistent updates to users. It plans to conduct training and dry runs to improve its incident handling, and it has apologized for the disruption caused to its customers.