Handling error rate in OpenTelemetry and New Relic
Blog post from New Relic
The blog post examines how errors are defined, captured, and presented in New Relic's application performance monitoring (APM) services versus OpenTelemetry APM services, and explains why users may see different error rate graphs for the same service on the respective summary pages. New Relic defines a transaction as a logical unit of work in a software application and records at most one error per transaction. OpenTelemetry has no transaction concept; it works with spans, and an error is identified by a status code of ERROR on a root span. As a result, the error rate is calculated differently: New Relic counts errored transactions against all transactions, while OpenTelemetry derives the rate from HTTP metrics or from spans. The post stresses that teams moving from one instrumentation method to the other should redefine their error baselines and adjust alert conditions, service level objectives, and dashboards, because the two models are fundamentally different and their error rates cannot be compared directly.
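To make the difference between the two counting models concrete, here is a minimal Python sketch. It is not New Relic's or OpenTelemetry's implementation: the `Transaction` and `Span` classes, both function names, and the sample data are hypothetical and exist only for illustration. The sketch also assumes that an error raised in a child span can be recorded on the enclosing transaction without the root span's status being set to ERROR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    """Hypothetical stand-in for a transaction (one logical unit of work)."""
    name: str
    error_count: int  # errors raised while the transaction ran


@dataclass
class Span:
    """Hypothetical stand-in for a span; parent_id of None marks a root span."""
    span_id: str
    parent_id: Optional[str]
    status: str  # "OK", "ERROR", or "UNSET"


def transaction_error_rate(transactions: List[Transaction]) -> float:
    """Transaction model: a transaction counts once, however many errors it had."""
    if not transactions:
        return 0.0
    errored = sum(1 for t in transactions if t.error_count > 0)
    return errored / len(transactions)


def root_span_error_rate(spans: List[Span]) -> float:
    """Span model: share of root spans whose status is ERROR."""
    roots = [s for s in spans if s.parent_id is None]
    if not roots:
        return 0.0
    errored = sum(1 for s in roots if s.status == "ERROR")
    return errored / len(roots)


# Made-up sample: the same four requests seen through each model.
transactions = [
    Transaction("GET /a", 0),
    Transaction("GET /b", 2),  # two errors, still counted once
    Transaction("GET /c", 1),
    Transaction("GET /d", 0),
]
spans = [
    Span("r1", None, "OK"),
    Span("r2", None, "ERROR"),
    Span("r3", None, "UNSET"),  # root span not marked as an error...
    Span("c3", "r3", "ERROR"),  # ...although a child span failed (assumption)
    Span("r4", None, "OK"),
]

print(transaction_error_rate(transactions))  # 0.5
print(root_span_error_rate(spans))           # 0.25
```

With this made-up data the same four requests yield 50% under the transaction model and 25% under the root-span model, which is the kind of gap that makes a baseline carried over from one instrumentation method misleading under the other.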