Outage Post-Mortem – Jan 16, 2014

Blog post from PagerDuty

Post Details
Company: PagerDuty
Author: Tony Albanese
Word Count: 180
Language: English
Summary

PagerDuty emphasizes its commitment to transparency about service outages and advises following its dedicated Twitter account for updates. On January 16th, a minor incident delayed six alerts. The cause was a rare race condition, introduced by work to improve service scalability, that affected operations in Cassandra and ZooKeeper. No alerts were lost; the delayed notifications were three emails, two SMS messages, and one push notification. The issue was resolved quickly, and regression testing was conducted to prevent a recurrence. PagerDuty apologizes for the inconvenience and encourages users to reach out with any questions.