Ask Miss O11y: Long-Running Requests

Blog post from Honeycomb

Post Details
Company
Date Published
Author
Liz Fong-Jones
Word Count
396
Language
English
Summary

Setting service-level objectives (SLOs) for long-lived streaming RPC workloads is difficult: there is no clear per-stream "success" event, and a single stream can last several days. The suggested approach is to instrument the workload to report its health at regular intervals by emitting one root span per stream per minute, called a "tick," with a span duration of 60 seconds. Each tick records successful versus failed writes and delay measured against Kafka offsets, producing metric-like data that can feed SLOs. Because every tick carries a stream ID, the data can be aggregated per stream, so each stream's behavior can be monitored without accumulating excessive spans or waiting for the stream to end. SLOs can then be set on the number of successful or failed connections per minute, and even on per-message success rates, giving continuous observability of the streaming workload.
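The tick idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the post's actual instrumentation: `Tick`, `TickAggregator`, `record_write`, `tick_slo`, and `message_slo` are hypothetical names, the `emit` callback stands in for whatever sends a root span to a tracing backend, and `offset_lag` is assumed to be a lag the caller has already computed from Kafka offsets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

TICK_SECONDS = 60  # one root span ("tick") per stream per minute


@dataclass
class Tick:
    """Metric-like record standing in for one 60-second root span (hypothetical shape)."""
    stream_id: str
    start: int                  # tick start in epoch seconds, aligned to the minute
    duration_s: int = TICK_SECONDS
    writes_ok: int = 0
    writes_failed: int = 0
    max_offset_lag: int = 0     # worst lag seen during this tick (assumed precomputed)

    @property
    def healthy(self) -> bool:
        # Assumption: a tick counts as "good" when no write failed.
        return self.writes_failed == 0


class TickAggregator:
    """Accumulates per-stream counters and emits one Tick per stream per minute."""

    def __init__(self, emit: Callable[[Tick], None]):
        self._emit = emit                   # stand-in for sending a span
        self._open: Dict[str, Tick] = {}    # currently open tick per stream ID

    def record_write(self, stream_id: str, now: float, ok: bool, offset_lag: int = 0) -> None:
        start = int(now) - int(now) % TICK_SECONDS
        tick = self._open.get(stream_id)
        if tick is not None and tick.start != start:
            self._emit(tick)                # the minute rolled over: close the old tick
            tick = None
        if tick is None:
            tick = self._open[stream_id] = Tick(stream_id, start)
        if ok:
            tick.writes_ok += 1
        else:
            tick.writes_failed += 1
        tick.max_offset_lag = max(tick.max_offset_lag, offset_lag)

    def flush(self) -> None:
        """Emit all open ticks, e.g. when a stream closes or the process shuts down."""
        for tick in self._open.values():
            self._emit(tick)
        self._open.clear()


def tick_slo(ticks: List[Tick]) -> float:
    """Fraction of per-minute ticks with no failed writes."""
    return sum(t.healthy for t in ticks) / len(ticks)


def message_slo(ticks: List[Tick]) -> float:
    """Fraction of individual writes that succeeded, summed across ticks."""
    ok = sum(t.writes_ok for t in ticks)
    total = ok + sum(t.writes_failed for t in ticks)
    return ok / total


if __name__ == "__main__":
    ticks: List[Tick] = []
    agg = TickAggregator(ticks.append)
    agg.record_write("stream-a", now=0, ok=True, offset_lag=5)
    agg.record_write("stream-a", now=30, ok=False, offset_lag=12)
    agg.record_write("stream-a", now=61, ok=True, offset_lag=3)
    agg.flush()
    print(tick_slo(ticks), message_slo(ticks))
```

One design caveat of this sketch: a tick is only opened when a write arrives, so a stream that is connected but silent for a whole minute produces no tick. A timer-driven emitter would be needed if idle minutes should also count toward the SLO.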