Observing vLLM with OpenTelemetry and Dash0

Post Details

Company

Dash0

Date Published

May 5, 2026

Author

Julia Furst Morgado

Word Count

2,995

Company Posts That Month

9

Language

English

Hacker News Points

-

Source URL

www.dash0.com/blog/observing-vllm-with-opentelemetry-and-dash0

Summary

vLLM is an inference server with built-in OpenTelemetry instrumentation, but getting useful observability out of it in production takes some specific configuration. Standard Application Performance Monitoring (APM) can tell you that a request was slow. vLLM's inference-specific signals, such as KV cache utilization and queue depth, let you distinguish between distinct causes of that latency, such as KV cache preemptions or decode bottlenecks. The setup described uses the OTel Collector to capture these signals and Dash0 as the observability backend, supporting both capacity planning and latency debugging. The trace and metrics pipeline runs in Docker Compose and consists of a FastAPI RAG app, a vLLM server, and an OTel Collector, which together provide distributed tracing and metrics collection. Unlike the latency of a standard HTTP service, LLM inference latency breaks down into distinct phases (scheduling, prefill, and decode), and each phase calls for a different tuning strategy. In Dash0, metrics for GPU cache usage and queue depth can be monitored to manage capacity proactively and to debug latency issues.
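To make the scheduling, prefill, and decode split concrete, here is a minimal Python sketch, not taken from the post, that decomposes one request's end-to-end latency into those three phases and reports the dominant one. It assumes you already have four per-request timestamps (arrival, admission to a batch, first token, last token). The RequestTiming fields, the function names, and the TUNING_HINTS text are hypothetical and would need to be mapped onto whatever span attributes or metrics your vLLM version actually exports. It also simplifies by assuming each request is admitted only once, so any time lost to preemption and rescheduling is folded into the prefill or decode interval.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps (in seconds) for one inference request.

    Hypothetical field names: map them from the span attributes or
    metrics your vLLM deployment actually exports.
    """
    arrived: float      # request reached the server
    scheduled: float    # scheduler admitted it to a running batch
    first_token: float  # prefill finished, first token emitted
    finished: float     # last output token emitted


def phase_breakdown(t: RequestTiming) -> dict[str, float]:
    """Split end-to-end latency into scheduling, prefill, and decode time.

    Assumes a single admission; preemption time lands in prefill/decode.
    """
    return {
        "scheduling": t.scheduled - t.arrived,
        "prefill": t.first_token - t.scheduled,
        "decode": t.finished - t.first_token,
    }


def dominant_phase(t: RequestTiming) -> str:
    """Return the name of the phase that contributed the most latency."""
    phases = phase_breakdown(t)
    return max(phases, key=phases.get)


# Illustrative, assumed hints for where to look next in each case.
TUNING_HINTS = {
    "scheduling": "queueing: check queue depth, KV cache utilization, preemptions",
    "prefill": "prompt processing: check prompt length and prefill batching",
    "decode": "generation: check output length and decode throughput",
}


def diagnose(t: RequestTiming) -> str:
    """Combine the dominant phase with its (hypothetical) tuning hint."""
    phase = dominant_phase(t)
    return f"{phase}: {TUNING_HINTS[phase]}"


if __name__ == "__main__":
    slow = RequestTiming(arrived=0.0, scheduled=1.8, first_token=2.1, finished=3.0)
    print(phase_breakdown(slow))
    print(diagnose(slow))
```

For the example request above, dominant_phase returns "scheduling": most of the three seconds were spent waiting to be admitted, which points at capacity and KV cache pressure rather than at the model's prefill or decode speed.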

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM Change
OpenTelemetry	35	945	122	49	-21%
RAG	14	2,105	333	83	+124%
Observability	10	3,421	707	180	-24%
LLM	5	9,074	1,640	224	+53%
Real-time	4	5,735	1,391	247	-9%
AI Agents	1	4,942	1,264	250	+12%
Kubernetes	1	1,965	371	106	-15%
Multi-agent systems	1	546	198	78	+19%