Introducing the Batch API: Run Large Inference Jobs 20% Cheaper

Post Details

Company

Deepinfra

Date Published

June 19, 2026

Author

Vasilije Novakovic

Word Count

1,051

Company Posts That Month

6

Language

English

Hacker News Points

-

Source URL

deepinfra.com/blog/batch-api

Summary

DeepInfra has launched a Batch API that enables users to run large, non-urgent inference jobs at a 20% reduced cost compared to real-time pricing, making it suitable for tasks such as dataset evaluation, generating embeddings, and large-scale classification. The API is compatible with OpenAI's Batch API, allowing users to easily transition their existing workflows by uploading a JSONL file, creating a batch, and polling for completion. This approach is designed for use cases where immediate responses are unnecessary, offering a more cost-effective solution by trading off latency for throughput. The Batch API supports multiple endpoints, including completions and embeddings, and applies automatic discounts to batch requests. Users can start by pointing their OpenAI client at DeepInfra and following the detailed documentation for seamless integration.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	6	5,457	1,338	238	-5%
Vector Search	4	2,091	556	118	-8%
LLM	1	5,172	1,006	220	-43%
RAG	1	885	228	95	-58%