Home / Companies / Deepinfra / Blog / Post Details
Content Deep Dive

Introducing the Batch API: Run Large Inference Jobs 20% Cheaper

Blog post from Deepinfra

Post Details
Company
Date Published
Author
Vasilije Novakovic
Word Count
1,051
Company Posts That Month
6
Language
English
Hacker News Points
-
Summary

DeepInfra has launched a Batch API that enables users to run large, non-urgent inference jobs at a 20% reduced cost compared to real-time pricing, making it suitable for tasks such as dataset evaluation, generating embeddings, and large-scale classification. The API is compatible with OpenAI's Batch API, allowing users to easily transition their existing workflows by uploading a JSONL file, creating a batch, and polling for completion. This approach is designed for use cases where immediate responses are unnecessary, offering a more cost-effective solution by trading off latency for throughput. The Batch API supports multiple endpoints, including completions and embeddings, and applies automatic discounts to batch requests. Users can start by pointing their OpenAI client at DeepInfra and following the detailed documentation for seamless integration.

Trends Found in this Post
Trend Post Mentions Total Month Mentions Posts Companies MoM
Real-time 6 5,457 1,338 238 -5%
Vector Search 4 2,091 556 118 -8%
LLM 1 5,172 1,006 220 -43%
RAG 1 885 228 95 -58%