How to track LLM token usage (2026): Prompt, completion, context window, and per-step visibility
Blog post from Braintrust
This guide explains how token usage accrues in large language model (LLM) applications and what drives consumption up: prompt bloat, context window pressure, and agent loops. It argues for structured token tracking at three levels (prompt and completion tokens per call, context window utilization, and span-level tracking within agent traces) so that production issues can be identified and resolved. It covers logging token usage through the Braintrust SDK, OpenTelemetry, and auto-instrumentation, which tie token counts to other operational metrics and make it easier to control costs and improve performance. It also stresses detailed token attribution in agent workflows and monitoring of context window utilization to prevent overflow errors. Finally, it discusses how caching and batching affect token usage, and why avoiding common token tracking errors matters for getting data accurate enough to base decisions on.
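The three tracking levels can be illustrated with a minimal sketch in plain Python. It is not the Braintrust SDK or the OpenTelemetry API: `StepUsage`, `tokens_by_step`, `steps_near_limit`, the 8,000-token limit, and the 0.8 threshold are hypothetical names and values chosen for illustration. It also assumes that prompt and completion tokens both count toward the context window.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StepUsage:
    """Token counts for one LLM call (one span) in an agent trace. Hypothetical type."""
    step: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def context_utilization(usage: StepUsage, context_limit: int) -> float:
    """Fraction of the context window used by one call (assumes prompt + completion both count)."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    return usage.total_tokens / context_limit


def tokens_by_step(usages: list[StepUsage]) -> dict[str, int]:
    """Attribute total tokens to each named step, summing repeated steps (agent loops)."""
    totals: dict[str, int] = defaultdict(int)
    for usage in usages:
        totals[usage.step] += usage.total_tokens
    return dict(totals)


def steps_near_limit(usages: list[StepUsage], context_limit: int,
                     threshold: float = 0.8) -> list[str]:
    """Return the step name of every call at or above the utilization threshold."""
    return [u.step for u in usages
            if context_utilization(u, context_limit) >= threshold]


# Illustrative trace: the "search" step runs twice, as it might in an agent loop.
CONTEXT_LIMIT = 8000  # assumed limit, not tied to any particular model
trace = [
    StepUsage("plan", 1200, 150),
    StepUsage("search", 3000, 400),
    StepUsage("search", 6800, 350),
    StepUsage("answer", 7000, 500),
]

per_step = tokens_by_step(trace)
at_risk = steps_near_limit(trace, CONTEXT_LIMIT)
print(per_step)
print(at_risk)
```

Summing by step name shows which part of a workflow dominates spend, while the utilization check flags individual calls that are approaching overflow before they fail. In a real setup these counts would come from the provider's usage fields or from an instrumentation layer rather than being hard-coded.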