Together AI Delivers Top Speeds for DeepSeek-R1-0528 Inference on NVIDIA Blackwell
Blog post from Together AI
Together AI has added support for NVIDIA Blackwell GPUs to its inference platform, improving the performance of models such as DeepSeek-R1-0528 when deployed on NVIDIA HGX B200 GPUs. The inference stack combines bespoke GPU kernels, a proprietary inference engine, and techniques such as speculative decoding and lossless quantization, reaching speeds of up to 334 tokens per second without compromising model quality. For production workloads, Together AI offers both serverless endpoints and customizable Dedicated Endpoints, giving customers options for efficient, scalable deployment tuned to their needs.
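The post names speculative decoding as one of the techniques behind these speeds. The core idea: a small, fast draft model proposes several tokens ahead, and the large target model verifies them in a single pass, keeping the longest agreeing prefix, so the output is identical to decoding with the target model alone but fewer sequential large-model steps are needed. The sketch below is a toy illustration of that general idea, not Together AI's implementation; the `target_next` and `draft_next` functions are hypothetical stand-ins for real models.

```python
def target_next(prefix):
    # Toy stand-in for the large "target" model's greedy next-token step.
    return (sum(prefix) * 31 + 7) % 50

def draft_next(prefix):
    # Toy stand-in for the small "draft" model: cheaper, and agrees with
    # the target most of the time (here it diverges on multiples of 5).
    t = target_next(prefix)
    return t if t % 5 else (t + 1) % 50

def speculate_step(prefix, k=4):
    """One round of speculative decoding: the draft proposes k tokens,
    the target verifies them, and we keep the longest agreeing prefix
    plus one token from the target, so every round emits >= 1 token."""
    draft, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(prefix)
    for tok in draft:
        if target_next(ctx) == tok:   # target agrees: accept the draft token
            accepted.append(tok)
            ctx.append(tok)
        else:
            break                     # first disagreement: stop accepting
    accepted.append(target_next(ctx)) # target supplies the correction token
    return accepted

def generate(prompt, n_tokens, k=4):
    # Output matches plain greedy decoding with target_next alone,
    # but multiple tokens can be committed per verification round.
    out = list(prompt)
    while len(out) < len(prompt) + n_tokens:
        out.extend(speculate_step(out, k))
    return out[:len(prompt) + n_tokens]
```

The key property, which the toy preserves, is losslessness: because draft tokens are only kept when the target model would have produced them, the generated sequence is bit-identical to target-only decoding; the speedup comes from accepting several draft tokens per large-model verification pass.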