Company
Date Published
Author
Brian Scanlan, Jess Connor
Word count
1901
Language
English
Hacker News points
1

Summary

Intercom, a software company, recognized the need to improve its incident response process after a complicated, long-running incident. It defined the stages of an incident (identification, triage, communication, and resolution) and set clear guidelines for escalating an incident from P1 to P0 based on severity. It also formalized roles such as Incident Commander and Business Lead to ensure clear communication and accountability. A set of key principles guides decision-making during incidents, including doing one thing at a time, trying the easy stuff first, and wrapping things up quickly. The company produced training videos and documentation on incident management so that the process runs smoothly, tested its processes with scenarios and real-life incidents, and continues to invest in improvements so it can handle larger and more serious incidents.