
Pricing 101: Token Math & Cost-Per-Completion Explained

Blog post from Deepinfra

Post Details
Company: Deepinfra
Date Published: -
Author: Deep
Word Count: 6,002
Language: English
Hacker News Points: -
Summary

DeepInfra's article on LLM pricing explains how to calculate the cost of using large language models from input and output token counts. Everything sent with a request — the system prompt, conversation history, and tool-call JSON — counts toward input tokens, while the model's responses count toward output tokens. Input and output tokens are billed at separate rates, and cached input can earn a discount that significantly reduces costs for applications that resend the same text. The article then walks through scenarios and cost-management strategies: shortening system prompts, keeping a rolling window of conversation history, and tuning retrieval to return fewer, higher-quality context chunks. It also stresses logging token usage and estimated cost per request so teams can monitor spending and enforce budget guardrails, closing with best practices for keeping a deployment cost-efficient and predictable.
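The billing model described above — input and output tokens priced separately, with a discount on cached input — can be sketched as a small cost function. The prices and the 50% cache discount below are hypothetical placeholders, not actual DeepInfra rates, which vary by model:

```python
# Hypothetical per-million-token prices; real rates vary by model and provider.
INPUT_PRICE = 0.27       # $ per 1M input tokens (assumed)
OUTPUT_PRICE = 0.85      # $ per 1M output tokens (assumed)
CACHED_DISCOUNT = 0.5    # assumed 50% discount on cached input tokens

def completion_cost(input_tokens: int, output_tokens: int,
                    cached_tokens: int = 0) -> float:
    """Dollar cost of one request: fresh input at full rate,
    cached input at the discounted rate, output at its own rate."""
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * INPUT_PRICE
            + cached_tokens * INPUT_PRICE * CACHED_DISCOUNT
            + output_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. a 1,200-token prompt (800 of it served from cache) and a
# 350-token completion
print(f"${completion_cost(1200, 350, cached_tokens=800):.6f}")
```

A function like this is what the article's logging advice points at: compute the estimated cost alongside each request's token counts, record both, and compare the running total against a budget guardrail.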