Incident Report: The Missing Trigger Notification Emails
Blog post from Honeycomb
An update to Honeycomb's business intelligence telemetry introduced an undetected defect that prevented approximately 94.1% of trigger notification emails from being sent between November 18 and November 22. The problem went unnoticed until a customer reported it, which exposed a lack of proper instrumentation and testing, particularly around the third-party SDK Honeycomb uses for email delivery. The update added metadata to the API requests sent through that SDK, and the existing error detection system did not handle this change correctly.

To prevent future incidents, Honeycomb has improved its use of the SDK, added more instrumentation points, and incorporated an integration testing interface provided by its email partner into its automated testing. The incident underscored the importance of observability during development, especially when working with third-party APIs, so that errors can be identified and resolved quickly. Honeycomb has apologized to affected customers and remains committed to the reliability of its services.
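The summary does not say exactly how the defect slipped past error detection. One common way this class of bug arises with third-party email APIs is that the SDK call returns without raising while the API has actually rejected the request, so code that only catches exceptions records a success. The sketch below assumes that failure mode purely for illustration. `SendResult`, `check_send`, `record_send`, and `failure_rate` are hypothetical names, not Honeycomb's code or any real SDK's API. It treats a send as failed unless it is positively confirmed, and counts outcomes so a sudden drop in delivered emails would show up in telemetry.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SendResult:
    # Hypothetical shape of what an email SDK call hands back (assumption).
    status_code: int
    body: str = ""


class EmailSendError(Exception):
    pass


def check_send(result: Optional[SendResult], exc: Optional[Exception]) -> None:
    """Raise unless the send is positively confirmed as accepted."""
    if exc is not None:
        raise EmailSendError(f"SDK raised: {exc!r}") from exc
    if result is None:
        raise EmailSendError("SDK returned no result")
    # The key check: a call that "succeeded" can still carry a rejection.
    if not 200 <= result.status_code < 300:
        raise EmailSendError(f"API rejected send: HTTP {result.status_code}: {result.body}")


# Outcome counters stand in for real instrumentation (e.g. a metric or span attribute).
send_outcomes: Counter = Counter()


def record_send(result: Optional[SendResult], exc: Optional[Exception]) -> bool:
    """Classify one send attempt and count it; returns True only on confirmed success."""
    try:
        check_send(result, exc)
    except EmailSendError:
        send_outcomes["failed"] += 1
        return False
    send_outcomes["sent"] += 1
    return True


def failure_rate() -> float:
    """Fraction of recorded sends that failed (0.0 if nothing was recorded)."""
    total = send_outcomes["sent"] + send_outcomes["failed"]
    return send_outcomes["failed"] / total if total else 0.0
```

Counting failures explicitly, rather than only logging exceptions, means a spike like the one in this incident would appear as a measurable failure rate instead of silence.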