Optimizing the OpenTelemetry Python SDK for LLM Workloads
Blog post from Honeycomb
Like human developers, agents need precise tooling: high-cardinality telemetry and fast feedback loops to explore their code effectively. Instrumentation, however, tends to be resource-intensive. Python's growing popularity, driven by LLM libraries, has highlighted performance issues in the OpenTelemetry Python API and SDK and prompted an investigation into making instrumentation more efficient. Benchmarks of a web service using WSGI and the requests library, focused primarily on tracing, revealed the OpenTelemetry SDK's significant memory footprint. Because OpenTelemetry separates its API from its SDK, configuration is flexible: performance costs can be reduced, and one SDK implementation can be swapped for another. A prototype written in C++ and exposed to Python through Pybind11 was developed to reduce resource usage and address context propagation challenges. It shows promising results, with lower CPU and memory footprints than existing SDKs. The implementation is at an early stage and available on GitHub, and collaborators are invited to help expand its capabilities.
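To make the API/SDK separation concrete, here is a minimal, standard-library-only sketch of the pattern. It is not the OpenTelemetry API: every name in it (`Tracer`, `NoOpTracer`, `RecordingTracer`, `set_tracer`, `get_tracer`, `handle_request`) is hypothetical and chosen only for illustration. The idea is that instrumented code depends on a small interface, the default implementation does nothing, and a heavier or lighter implementation can be installed without touching the instrumented code.

```python
import time
from abc import ABC, abstractmethod
from contextlib import contextmanager


class Tracer(ABC):
    """The 'API' side: the only interface instrumented code depends on."""

    @abstractmethod
    def start_span(self, name):
        """Return a context manager that wraps one unit of work."""


class NoOpTracer(Tracer):
    """Default implementation: records nothing."""

    @contextmanager
    def start_span(self, name):
        yield None


class RecordingTracer(Tracer):
    """One possible 'SDK' side: keeps (name, duration_ns) for each span."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name):
        start = time.perf_counter_ns()
        try:
            yield name
        finally:
            self.finished.append((name, time.perf_counter_ns() - start))


# Module-level registry (hypothetical); starts with the no-op tracer.
_tracer = NoOpTracer()


def set_tracer(tracer):
    """Install a different implementation; instrumented code is unchanged."""
    global _tracer
    _tracer = tracer


def get_tracer():
    return _tracer


def handle_request(path):
    """Instrumented application code: it only ever talks to the interface."""
    with get_tracer().start_span("handle_request"):
        return f"ok {path}"
```

With the default in place, `handle_request("/a")` records nothing. After `set_tracer(RecordingTracer())`, the same call appends one entry to the tracer's `finished` list. Under this sketch's assumptions, swapping in a lower-overhead implementation, such as one backed by native code, would be the same kind of one-line change.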