Keeping PagerDuty Always On With Remote Incident Response

Blog post from PagerDuty

Post Details
Company: PagerDuty
Date Published:
Author: Dave Bresci
Word Count: 636
Language: English
Hacker News Points: -
Summary

Earlier this month, a router misconfiguration at a widely used service provider caused significant internet disruptions that affected several well-known SaaS companies, and PagerDuty observed a global spike in incidents as a result. In response to the unusual incident volume, PagerDuty initiated a Major Incident Response, using its mobile app to coordinate a remote team of incident commanders, subject-matter experts, and stakeholders, who worked over Slack and Zoom from locations including San Francisco, Toronto, and Atlanta. The process drew on a distributed work culture that let the team acknowledge, react, and coordinate quickly regardless of physical location. PagerDuty's Slack integration served as the central hub for real-time communication and documentation, supporting both the immediate response and the postmortem analysis used to improve future incident management. The incident showed that PagerDuty's platform and practices can maintain operational continuity for customers during major disruptions, and that remote orchestration can effectively mirror in-office incident management.