OpenTelemetry Best Practices #3: Data Prep and Cleansing
Blog post from Honeycomb
Effective observability requires more than collecting telemetry: the data must be curated so that it yields useful insight into production systems. OpenTelemetry auto-instrumentation can generate large amounts of data quickly, so the challenge is refining that data before irrelevant or sensitive information becomes overwhelming. Processors such as the Transform processor manipulate data attributes, for example by dropping, combining, or hashing them, to maintain privacy while preserving the data's utility. Redacting sensitive data is crucial, and processors offer both passive and aggressive modes for identifying and filtering out sensitive patterns such as Social Security numbers or credit card numbers. Keeping data cardinality while excluding Personally Identifiable Information (PII) is essential for tracking user interactions without compromising privacy. Hashing is the usual way to do this, though it has limitations: a hash of a guessable value, such as an email address, can be matched by hashing candidate inputs. Filtering out spans that carry little value, such as those from health checks, further streamlines the data. Building secure and efficient observability pipelines therefore depends on configuring collectors and processors correctly and on treating telemetry data management as a deliberate practice.
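
The cleansing steps described above (hashing an identifying attribute, redacting sensitive patterns, and dropping health-check spans) can be sketched in plain Python. This is only an illustration of the logic, not the Collector's configuration or any OpenTelemetry API: spans are modeled as ordinary dictionaries, and the attribute names (`user.email`, `http.target`), the health-check routes, the `HASH_KEY` secret, and the two regular expressions are all assumptions made for the example.

```python
import hashlib
import hmac
import re
from typing import Optional

# Hypothetical secret; a real pipeline would load it from a secret store.
HASH_KEY = b"example-key"

# Illustrative shapes only; production-grade detection needs more care.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US Social Security number shape
    re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),      # rough 13-16 digit card-number shape
]

PII_KEYS = {"user.email"}                    # assumed attribute names holding PII
HEALTH_CHECK_PATHS = {"/health", "/healthz"}  # assumed health-check routes


def hash_value(value: str) -> str:
    """Keyed hash: equal inputs give equal outputs, so cardinality is preserved."""
    return hmac.new(HASH_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def redact(value: str) -> str:
    """Replace every substring matching a sensitive pattern with a placeholder."""
    for pattern in SENSITIVE_PATTERNS:
        value = pattern.sub("[REDACTED]", value)
    return value


def clean_span(span: dict) -> Optional[dict]:
    """Return a cleaned copy of the span, or None if the span should be dropped."""
    attrs = span.get("attributes", {})

    # Drop spans produced by health checks entirely.
    if attrs.get("http.target") in HEALTH_CHECK_PATHS:
        return None

    cleaned = {}
    for key, value in attrs.items():
        if key in PII_KEYS:
            cleaned[key] = hash_value(str(value))   # keep distinctness, hide the value
        elif isinstance(value, str):
            cleaned[key] = redact(value)            # scrub sensitive patterns
        else:
            cleaned[key] = value                    # leave non-string values untouched
    return {**span, "attributes": cleaned}
```

Using a keyed hash (HMAC) rather than a bare SHA-256 is a design choice made here to make guessing inputs harder for anyone without the key; it narrows, but does not remove, the limitations of hashing.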