Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase
Blog post from Together AI
The improved Batch Inference API brings a streamlined user interface, support for all serverless models and private deployments, and a rate-limit increase from 10 million to 30 billion enqueued tokens per model per user, a 3000× jump. Batch jobs run at 50% of the cost of the real-time API, making the service simpler, faster, and more economical for processing large-scale datasets. It is particularly well suited to high-throughput workloads such as large-scale text analysis, fraud detection, synthetic data generation, and content moderation, and it has enabled teams like Inception Labs to run massive experiments efficiently. Together, these updates make large-scale inference more accessible and cost-effective, positioning the Batch Inference API as an ideal choice for extensive workloads that do not need real-time responses.
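To make the batch workflow concrete, the sketch below builds a JSONL input file, the format commonly used by batch inference APIs, where each line is one self-contained request. The field names (`custom_id`, `body`, `messages`) and the model name are illustrative assumptions for this sketch, not a statement of Together AI's documented schema; consult the official API reference for the exact format.

```python
import json

def build_batch_file(prompts, model, path):
    """Write one JSON-encoded request per line (JSONL).

    This mirrors the typical batch-API pattern of enqueueing many
    requests in a single file. Field names here are assumptions
    for illustration, not the documented Together AI schema.
    """
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                # Hypothetical per-request ID for matching results later.
                "custom_id": f"req-{i}",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")

# Example: enqueue two classification-style prompts in one batch file.
prompts = [
    "Classify the sentiment of this review: great product!",
    "Classify the sentiment of this review: arrived broken.",
]
build_batch_file(prompts, "example-model", "batch_input.jsonl")
```

In a real workflow, a file like this would be uploaded once and processed asynchronously as a single batch job, which is what allows the service to offer the 50% discount relative to per-request real-time calls.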