Company
Date Published
Author
Matthew Ratzke
Word count
605
Language
English
Hacker News points
None

Summary

Apollo has introduced APM dashboard templates for Datadog that improve observability of GraphOS Router performance with minimal setup time. By importing the dashboards into Datadog, platform and SRE teams can quickly gain insight into the health of the GraphOS Router, the supergraph, and its subgraphs. The templates help identify performance issues such as latency spikes and elevated error rates by giving a clear view of the system's operations, including first-class GraphQL error tracking and drill-down from the supergraph to individual subgraphs. They support observability best practices by mapping spans and metrics to stable operation and resource names and by promoting GraphQL errors into APM error views. To keep observability costs under control, Apollo recommends keeping resource-name cardinality low, using tail-based sampling for high-throughput services, and applying Datadog’s Unified Service Tagging consistently. The initiative is designed to streamline monitoring and improve operational insight, letting teams correlate performance shifts with version rollouts and other activities.
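
The cost-related recommendations in the summary (low-cardinality resource names, tail-based sampling, consistent Unified Service Tagging) can be illustrated with a minimal Python sketch. Nothing here is Apollo's or Datadog's actual API: the function names, the `CompletedTrace` type, the `anonymous` fallback, and the 1000 ms and 5% thresholds are all hypothetical choices made for illustration. The only detail taken from Datadog itself is that Unified Service Tagging is built on the `env`, `service`, and `version` tags.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

# The three reserved tags that Datadog's Unified Service Tagging is built on.
UNIFIED_TAGS = ("env", "service", "version")


def resource_name(operation_type: str, operation_name: Optional[str]) -> str:
    """Build a low-cardinality resource name from operation type and name only.

    Raw query text, variables, and IDs are deliberately left out, so the
    number of distinct resource names stays bounded. (Hypothetical helper.)
    """
    name = operation_name.strip() if operation_name else ""
    return f"{operation_type.lower()} {name or 'anonymous'}"


def missing_unified_tags(tags: Dict[str, str]) -> List[str]:
    """Return the unified service tags that are absent or empty."""
    return [tag for tag in UNIFIED_TAGS if not tags.get(tag)]


@dataclass
class CompletedTrace:
    """Summary of a finished trace (hypothetical type for this sketch)."""
    trace_id: str
    duration_ms: float
    has_error: bool


def keep_trace(trace: CompletedTrace,
               slow_ms: float = 1000.0,
               base_rate: float = 0.05) -> bool:
    """Tail-based sampling decision, made after the trace has completed.

    Error and slow traces are always kept; the rest are kept at `base_rate`,
    using a hash of the trace ID so the decision is deterministic per trace.
    """
    if trace.has_error or trace.duration_ms >= slow_ms:
        return True
    digest = hashlib.sha256(trace.trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < base_rate
```

In this sketch, `resource_name("Query", "GetUser")` yields `query GetUser`, and an unnamed operation collapses into a single `query anonymous` bucket instead of producing one resource per query string. In practice the sampling decision would usually live in a collector or agent pipeline rather than in application code; the function only shows the shape of the rule.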