Webinar Recap: How to Avoid Being On Call With Under-Instrumented Tools
Blog post from Honeycomb
Paige Cruz, a retired Site Reliability Engineer (SRE), describes how she moved from Application Performance Monitoring (APM) tools to observability after a critical on-call incident exposed the limits of her existing tooling in a modern, distributed system. Despite initial resistance over cost and the learning curve of new tools, Cruz illustrates how observability gives comprehensive insight into system behavior, which enables faster debugging and better-informed incident response. She challenges common misconceptions that observability is expensive and complex by pointing to its advantages over traditional APM tools: it handles high-cardinality data without additional cost, and it gives even novice engineers the information they need to resolve issues efficiently. Drawing on her experience, Cruz argues that observability is not merely an added expense but a crucial part of managing the intricacies of contemporary cloud environments, and that it ultimately makes on-call work more effective and less stressful.