How Honeycomb Uses Honeycomb, Part 3: End-to-End Failures

Blog post from Honeycomb

Post Details
Company: Honeycomb
Author: Christine Yen
Word Count: 539
Language: English
Summary

Honeycomb monitors its own reliability with end-to-end checks: each check writes a single data point and then reads it back within a fixed time window, retrying up to 30 times on failure. One of these checks, on partition 5, showed elevated read durations, which suggested that the problem lay not in the API server or storage but possibly in the Kafka-related processes. The investigation analyzed metrics such as cum_write_time and latency_api_msec to isolate the issue, showing how quickly Honeycomb can slice and compare metrics to narrow down a problem. The post presents the incident as an example of Honeycomb's approach to systems observability, which combines pre-aggregated time series metrics with the flexibility of a log aggregator, and uses it to argue for Honeycomb as a next-generation tool.
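
The write-then-read check described in the summary can be sketched roughly as follows. This is a minimal illustration, not Honeycomb's implementation: the function `end_to_end_check`, the `DelayedStore` stand-in, and the returned field names are all hypothetical, and "retrying up to 30 times" is interpreted here as 30 read attempts in total.

```python
import time
from typing import Callable, Optional


def end_to_end_check(
    write: Callable[[str], None],
    read: Callable[[str], Optional[str]],
    marker: str,
    max_attempts: int = 30,   # assumption: 30 read attempts in total
    retry_delay_s: float = 0.0,
) -> dict:
    """Write one data point, then poll until it can be read back."""
    write_start = time.monotonic()
    write(marker)
    write_time_s = time.monotonic() - write_start

    read_start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        if read(marker) == marker:
            return {
                "ok": True,
                "attempts": attempt,
                "write_time_s": write_time_s,
                "read_time_s": time.monotonic() - read_start,
            }
        time.sleep(retry_delay_s)

    # The data point never became readable within the allowed attempts.
    return {
        "ok": False,
        "attempts": max_attempts,
        "write_time_s": write_time_s,
        "read_time_s": time.monotonic() - read_start,
    }


class DelayedStore:
    """Hypothetical in-memory stand-in for a pipeline with ingest lag.

    A written marker only becomes visible after a fixed number of reads.
    """

    def __init__(self, visible_after_reads: int) -> None:
        self.visible_after_reads = visible_after_reads
        self.read_counts: dict = {}

    def write(self, marker: str) -> None:
        self.read_counts[marker] = 0

    def read(self, marker: str) -> Optional[str]:
        if marker not in self.read_counts:
            return None
        self.read_counts[marker] += 1
        if self.read_counts[marker] > self.visible_after_reads:
            return marker
        return None


healthy = DelayedStore(visible_after_reads=3)
healthy_result = end_to_end_check(healthy.write, healthy.read, "check-healthy")

stuck = DelayedStore(visible_after_reads=100)
stuck_result = end_to_end_check(stuck.write, stuck.read, "check-stuck")
```

Recording write and read durations separately, as this sketch does, is what makes it possible to tell a slow write path from a slow read path when a check degrades.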