Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase
Blog post from Together AI
The improved Batch Inference API brings a streamlined user interface, support for all serverless models and private deployments, and a rate-limit increase from 10 million to 30 billion enqueued tokens per model per user, a 3000× jump. Batch jobs run at 50% of the cost of the real-time API, making the service simpler, faster, and more economical for processing large-scale datasets. It is particularly well suited to high-throughput workloads such as large-scale text analysis, fraud detection, synthetic data generation, and content moderation, and it has enabled teams like Inception Labs to run massive experiments efficiently. Together, these updates make large-scale inference more accessible and cost-effective, positioning the Batch Inference API as an ideal choice for extensive workloads that do not need real-time responses.
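To make the batch workflow concrete, the sketch below builds a JSONL input file, the format commonly used by batch inference APIs, where each line is one self-contained request. The field names (`custom_id`, `body`, `messages`) and the model name are illustrative assumptions for this sketch, not a statement of Together AI's documented schema; consult the official API reference for the exact format.

```python
import json

def build_batch_file(prompts, model, path):
    """Write one JSON-encoded request per line (JSONL).

    This mirrors the typical batch-API pattern of enqueueing many
    requests in a single file. Field names here are assumptions
    for illustration, not the documented Together AI schema.
    """
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                # Hypothetical per-request ID for matching results later.
                "custom_id": f"req-{i}",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")

# Example: enqueue two classification-style prompts in one batch file.
prompts = [
    "Classify the sentiment of this review: great product!",
    "Classify the sentiment of this review: arrived broken.",
]
build_batch_file(prompts, "example-model", "batch_input.jsonl")
```

In a real workflow, a file like this would be uploaded once and processed asynchronously as a single batch job, which is what allows the service to offer the 50% discount relative to per-request real-time calls.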