How Honeycomb Uses Honeycomb, Part 3: End-to-End Failures

Post Details

Company: Honeycomb

Date Published: Jan. 20, 2017

Author: Christine Yen

Word Count: 539

Company Posts That Month: 2

Language: English

Hacker News Points: -

Post removed?: No

Source URL: www.honeycomb.io/blog/how-honeycomb-uses-honeycomb-part-3-end-to-end-failures

Summary

Honeycomb monitors its own reliability with end-to-end checks: each check writes a single data point and then expects to read it back within a specific time frame, retrying up to 30 times on failure. One of these checks, the one for partition 5, began showing elevated read durations, which suggested that the problem lay not with the API server or storage but potentially with the Kafka-related processes. The investigation compared metrics such as cum_write_time and latency_api_msec to isolate the issue, showing how quickly Honeycomb can slice and compare metrics to narrow down a problem. The incident illustrates Honeycomb's approach to systems observability, which pairs the strengths of pre-aggregated time series metrics with the flexibility of a log aggregator, and the post presents it as a next-generation tool.
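
The write-then-read check described in the summary can be illustrated with a minimal Python sketch. This is not Honeycomb's implementation: `run_e2e_check`, `CheckResult`, the `write_point` and `read_point` callables, and the one-second retry delay are all hypothetical names and values chosen for illustration. The summary also does not say which step is retried, so the sketch assumes the write happens once and the read is retried up to 30 times.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool           # True if the data point was read back
    attempts: int      # number of read attempts made
    write_secs: float  # time spent in the write step
    read_secs: float   # time spent reading, including retries


def run_e2e_check(
    write_point: Callable[[str], None],           # hypothetical: writes one data point
    read_point: Callable[[str], Optional[str]],   # hypothetical: returns the point or None
    marker: str,
    max_attempts: int = 30,          # retry limit taken from the summary
    retry_delay_secs: float = 1.0,   # assumed value, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> CheckResult:
    # Write a single, uniquely identifiable data point and time the write.
    start = time.monotonic()
    write_point(marker)
    write_secs = time.monotonic() - start

    # Try to read the same point back, retrying until the limit is reached.
    read_start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        if read_point(marker) == marker:
            read_secs = time.monotonic() - read_start
            return CheckResult(True, attempt, write_secs, read_secs)
        if attempt < max_attempts:
            sleep(retry_delay_secs)

    read_secs = time.monotonic() - read_start
    return CheckResult(False, max_attempts, write_secs, read_secs)
```

Recording the write and read durations separately mirrors the kind of comparison the summary describes: an elevated read duration alongside a normal write duration points away from the write path and toward whatever sits between the write and the read.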

