Company
Date Published
Author
Anatoly Mikhaylov, Nick Hefty
Word count
3804
Language
English
Hacker News points
None

Summary

Zendesk engineers Anatoly Mikhaylov and Nick Hefty describe how they reduced the cost of observability data while keeping the visibility needed for troubleshooting. Using Datadog's tools, the team centralized metrics and monitors and adopted Application Performance Monitoring (APM) and Log Management for deeper insight, which improved incident response and performance. They then ran an observability audit guided by the Pareto Principle to identify data that was both valuable and costly, and focused on trace and log usage. For traces, they implemented single-span ingestion and used custom facets to enrich root spans, reducing data volume without losing critical insights. For logs, they experimented with reduction strategies such as batching log entries and applying exclusion filters. These efforts significantly reduced observability expenses and gave a clearer picture of how cost is distributed across services, without disrupting engineering workflows. The initiative showed that targeted optimizations can preserve access to critical operational data while keeping costs under control.
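
To make the Pareto-style audit concrete, here is a minimal sketch of how such a ranking could be computed. It is not the authors' tooling: the function name `pareto_top_contributors`, the source names, and the cost figures are all hypothetical and chosen only for illustration. The idea is to sort data sources by cost and keep the smallest leading group that accounts for a chosen share of the total.

```python
from typing import Dict, List, Tuple


def pareto_top_contributors(
    costs: Dict[str, float], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return the top-cost sources, in descending cost order, up to and
    including the first one whose cumulative share reaches `threshold`.

    Each tuple is (source name, cumulative share of total cost).
    """
    total = sum(costs.values())
    if total <= 0:
        return []

    selected: List[Tuple[str, float]] = []
    running = 0.0
    # Walk sources from most to least expensive.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        running += cost
        selected.append((name, running / total))
        if running / total >= threshold:
            break
    return selected


# Hypothetical monthly cost figures (illustrative only, not Zendesk data).
example_costs = {
    "checkout-traces": 500.0,
    "auth-logs": 300.0,
    "search-logs": 100.0,
    "billing-traces": 60.0,
    "misc-metrics": 40.0,
}

for name, share in pareto_top_contributors(example_costs):
    print(f"{name}: cumulative share {share:.0%}")
```

With these made-up numbers, two of the five sources cover the 80% threshold, which is the kind of short list an audit would examine first for trace or log reduction.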