OpenTelemetry Best Practices #3: Data Prep and Cleansing

Blog post from Honeycomb

Post Details
Company: Honeycomb
Author: Martin Thwaites
Word Count: 951
Language: English
Summary

Effective observability requires more than collecting telemetry: the data must be curated so that it yields useful insight into production systems. OpenTelemetry auto-instrumentation can generate large amounts of data quickly, so the challenge is refining that data before irrelevant or sensitive information overwhelms it. Processors such as the Transform processor can manipulate attributes by dropping, combining, or hashing them, which protects privacy while preserving the data's utility. Redacting sensitive data is crucial, and processors offer both passive and aggressive modes for identifying and filtering out sensitive patterns such as Social Security Numbers or credit card numbers. To track user interactions without compromising privacy, a pipeline has to preserve cardinality while excluding Personally Identifiable Information (PII); hashing is the usual approach, though it has limitations. Filtering out spans that carry little value, such as those from health checks, further streamlines the data. Building secure and efficient observability pipelines therefore depends on configuring collectors and processors correctly and on managing telemetry data deliberately.
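
The cleansing steps described in the summary (dropping attributes, hashing PII, redacting sensitive patterns, and discarding health-check spans) can be illustrated with a small sketch. This is plain Python operating on dictionaries, not OpenTelemetry Collector configuration or any OpenTelemetry API. The span shape, the helper names, the `user.email` and `http.request.header.authorization` keys, the `/healthz` and `/ready` routes, and the two regular expressions are all assumptions chosen for illustration, and the patterns are far too simple for production use.

```python
import hashlib
import re
from typing import Optional

# Assumed, deliberately simple patterns for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

# Hypothetical policy: which attributes to drop, which to hash,
# and which routes count as health checks.
DROP_KEYS = {"http.request.header.authorization"}
HASH_KEYS = {"user.email"}
HEALTH_CHECK_ROUTES = {"/healthz", "/ready"}


def hash_attribute(value: str) -> str:
    """Replace a value with its SHA-256 digest so equal inputs stay equal."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def redact_value(value: str) -> str:
    """Mask anything that looks like an SSN or a card number."""
    value = SSN_PATTERN.sub("[REDACTED]", value)
    return CARD_PATTERN.sub("[REDACTED]", value)


def clean_span(span: dict) -> Optional[dict]:
    """Return a cleansed copy of the span, or None if it should be dropped."""
    attrs = span.get("attributes", {})
    # Discard spans that only record health checks.
    if attrs.get("http.route") in HEALTH_CHECK_ROUTES:
        return None
    cleaned = {}
    for key, value in attrs.items():
        if key in DROP_KEYS:
            continue  # drop the attribute entirely
        if key in HASH_KEYS:
            cleaned[key] = hash_attribute(str(value))  # keep cardinality, hide the value
        elif isinstance(value, str):
            cleaned[key] = redact_value(value)
        else:
            cleaned[key] = value
    return {**span, "attributes": cleaned}


example = clean_span({
    "name": "POST /checkout",
    "attributes": {
        "http.route": "/checkout",
        "user.email": "someone@example.com",
        "http.request.header.authorization": "Bearer abc",
        "note": "card 4111 1111 1111 1111 declined, ssn 123-45-6789",
    },
})
```

Hashing keeps one distinct value per user, so cardinality survives, but an unsalted hash of a guessable value such as an email address can be reversed by brute force. That is one example of the limitations the summary alludes to.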