Introducing the Batch API: Run Large Inference Jobs 20% Cheaper
Blog post from Deepinfra
DeepInfra has launched a Batch API that enables users to run large, non-urgent inference jobs at a 20% reduced cost compared to real-time pricing, making it suitable for tasks such as dataset evaluation, generating embeddings, and large-scale classification. The API is compatible with OpenAI's Batch API, allowing users to easily transition their existing workflows by uploading a JSONL file, creating a batch, and polling for completion. This approach is designed for use cases where immediate responses are unnecessary, offering a more cost-effective solution by trading off latency for throughput. The Batch API supports multiple endpoints, including completions and embeddings, and applies automatic discounts to batch requests. Users can start by pointing their OpenAI client at DeepInfra and following the detailed documentation for seamless integration.
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Real-time | 6 | 5,457 | 1,338 | 238 | -5% |
| Vector Search | 4 | 2,091 | 556 | 118 | -8% |
| LLM | 1 | 5,172 | 1,006 | 220 | -43% |
| RAG | 1 | 885 | 228 | 95 | -58% |