
Pricing 101: Token Math & Cost-Per-Completion Explained

Blog post from Deepinfra

Post Details
Company: Deepinfra
Date Published: -
Author: Deep
Word Count: 6,002
Language: English
Hacker News Points: -
Summary

DeepInfra's article on LLM pricing explains how to calculate the cost of using large language models from input and output token counts. Everything sent with a request — the system prompt, conversation history, and tool-call JSON — counts toward input tokens, while the model's responses count toward output tokens. Input and output tokens are billed at separate rates, and cached input can earn a discount that significantly reduces costs for applications that resend the same text. The article then walks through scenarios and cost-management strategies: shortening system prompts, keeping a rolling window of conversation history, and tuning retrieval to return fewer, higher-quality context chunks. It also stresses logging token usage and estimated cost per request so teams can monitor spending and enforce budget guardrails, closing with best practices for keeping a deployment cost-efficient and predictable.
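The billing model described above — input and output tokens priced separately, with a discount on cached input — can be sketched as a small cost function. The prices and the 50% cache discount below are hypothetical placeholders, not actual DeepInfra rates, which vary by model:

```python
# Hypothetical per-million-token prices; real rates vary by model and provider.
INPUT_PRICE = 0.27       # $ per 1M input tokens (assumed)
OUTPUT_PRICE = 0.85      # $ per 1M output tokens (assumed)
CACHED_DISCOUNT = 0.5    # assumed 50% discount on cached input tokens

def completion_cost(input_tokens: int, output_tokens: int,
                    cached_tokens: int = 0) -> float:
    """Dollar cost of one request: fresh input at full rate,
    cached input at the discounted rate, output at its own rate."""
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * INPUT_PRICE
            + cached_tokens * INPUT_PRICE * CACHED_DISCOUNT
            + output_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. a 1,200-token prompt (800 of it served from cache) and a
# 350-token completion
print(f"${completion_cost(1200, 350, cached_tokens=800):.6f}")
```

A function like this is what the article's logging advice points at: compute the estimated cost alongside each request's token counts, record both, and compare the running total against a budget guardrail.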