
Tracking On-Call Health

Blog post from Honeycomb

Post Details
Company: Honeycomb
Date Published:
Author: Fred Hebert
Word Count: 1,441
Language: English
Hacker News Points: -
Summary

On-call rotations are essential for handling system incidents, but they can take a significant toll on engineers' well-being. This post describes how Honeycomb monitors on-call health. It critiques the traditional approach of counting disruptions, such as alarms and incidents, because those counts emphasize negative outcomes and ignore engineers' subjective experience. Honeycomb instead uses a method inspired by Erik Hollnagel's resilience grid, which focuses on four abilities: to respond, monitor, learn, and anticipate. Engineers regularly fill out a simplified Google Form that captures qualitative feedback on these dimensions. The post acknowledges that the survey cannot capture the full complexity of on-call experience, including factors such as rotation size, alert volume, and organizational trust. The data is imperfect, but it informs operational decisions and is shared anonymously within the organization to improve planning and interventions. Honeycomb continues to refine the methodology so that the feedback stays relevant and useful and supports a healthier on-call environment.
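As a rough illustration of how survey feedback on the four abilities might be rolled up and shared anonymously, here is a minimal Python sketch. It is not Honeycomb's actual tooling: the 1-to-5 numeric scale, the field names, the `summarize` helper, and the minimum-response threshold are all assumptions made for the example, and the sample data is invented.

```python
from statistics import mean

# The four abilities named in the summary above.
DIMENSIONS = ("respond", "monitor", "learn", "anticipate")


def summarize(responses, min_responses=3):
    """Average each dimension across responses, dropping respondent identity.

    Returns None when there are too few responses to report without
    making individual answers easy to guess (hypothetical threshold).
    """
    if len(responses) < min_responses:
        return None
    return {
        dim: round(mean(r["scores"][dim] for r in responses), 2)
        for dim in DIMENSIONS
    }


# Invented example data; the 1-5 scale and field names are assumptions.
responses = [
    {"engineer": "a", "scores": {"respond": 4, "monitor": 3, "learn": 5, "anticipate": 2}},
    {"engineer": "b", "scores": {"respond": 5, "monitor": 3, "learn": 4, "anticipate": 3}},
    {"engineer": "c", "scores": {"respond": 3, "monitor": 3, "learn": 3, "anticipate": 4}},
]

summary = summarize(responses)
print(summary)
```

Only per-dimension averages leave the function, so the shared result carries no respondent names. Averages flatten the qualitative detail the post cares about, so a sketch like this would at best complement the free-text answers.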