Company
Date Published
Author
Matt Perpick
Word count
780
Language
English
Hacker News points
None

Summary

Sobotka, a system that processes incoming metrics data, developed a growing backlog on Friday night, prompting the author to cancel dinner plans and work on the problem. The dashboard gave a high-level view: throughput in metrics per second had dropped while the backlog kept growing. Further investigation pointed to the Postgres instance as the likely bottleneck, because of an increased number of queries from Sobotka. As a quick fix, the author increased the cache expiration time, which improved performance and reduced load on the database. The author emphasizes the importance of good telemetry and real-time graphs in debugging operational issues, highlighting Datadog's role in providing such tools for developers and ops teams.
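
To illustrate why a longer cache expiration time lowers database load, here is a minimal sketch of a read-through cache with a time-to-live (TTL). It is not Sobotka's code: the summary does not say what Sobotka caches or which expiration values were used, so `TTLCache`, `count_loads`, the key name, and the 60 and 300 second TTLs are all hypothetical. The loader function stands in for a Postgres query.

```python
import time


class TTLCache:
    """Tiny read-through cache whose entries expire after ttl_seconds.

    Hypothetical illustration only; the loader stands in for a database query.
    """

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # called on a miss or after expiry
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the example can fake time
        self._entries = {}         # key -> (expires_at, value)
        self.loads = 0             # how many lookups reached the loader

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no database round trip
        value = self._loader(key)
        self.loads += 1
        self._entries[key] = (now + self._ttl, value)
        return value


def count_loads(ttl_seconds, lookups=600, step_seconds=1.0):
    """Simulate one lookup per step and return how many reach the loader."""
    fake_now = [0.0]
    cache = TTLCache(lambda key: key.upper(), ttl_seconds,
                     clock=lambda: fake_now[0])
    for _ in range(lookups):
        cache.get("metric.name")   # assumed key, for illustration
        fake_now[0] += step_seconds
    return cache.loads


if __name__ == "__main__":
    for ttl in (60, 300):          # assumed TTLs, not values from the post
        print(f"ttl={ttl}s -> {count_loads(ttl)} loader calls in 600 lookups")
```

In this simulation the same 600 lookups cause fewer loader calls as the TTL grows, which is the effect the quick fix relied on. The trade-off is that cached values can be stale for up to the length of the TTL.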