
Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase

Blog post from Together AI

Post Details
Author: Rajas Bansal, Mitali Meratwal, Nikitha Suryadevara, Will Van Eaton, Rishabh Bhargava
Word Count: 374
Language: English
Summary

The improved Batch Inference API brings a streamlined user interface, expanded support for all serverless models and private deployments, and a 3000× rate-limit increase, from 10 million to 30 billion enqueued tokens per model per user. Batch processing runs at 50% of the cost of the real-time API, making large-scale dataset processing simpler, faster, and more economical. It is particularly advantageous for high-throughput tasks such as large-scale text analysis, fraud detection, synthetic data generation, and content moderation, and has enabled teams like Inception Labs to run massive experiments efficiently. Together, these updates make large-scale inference more accessible and cost-effective, positioning the Batch Inference API as an ideal fit for extensive workloads that do not require real-time responses.
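As a rough illustration of what a batch workflow looks like, the sketch below builds a JSONL file of enqueued requests, the kind of input batch-inference endpoints typically consume. The field names (`custom_id`, `body`, `messages`) and the model string are assumptions modeled on common batch-API conventions, not Together's documented schema; consult the Batch Inference API docs for the authoritative format.

```python
import json

def build_batch_file(prompts, model, path="batch_input.jsonl"):
    """Write one JSON request per line (JSONL), a common batch-input format.

    NOTE: the request schema here is hypothetical; check Together's
    Batch Inference documentation for the real field names.
    """
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                # caller-chosen ID used to match each output back to its input
                "custom_id": f"request-{i}",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")
    return path

# Example: enqueue two prompts against a placeholder model name.
build_batch_file(
    ["Classify this transaction as fraud or not: ...",
     "Moderate this comment: ..."],
    model="example-org/example-model",
)
```

Because each line is an independent request, a single file can carry millions of prompts, which is where the raised enqueued-token limit matters.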