
Together AI Delivers Top Speeds for DeepSeek-R1-0528 Inference on NVIDIA Blackwell

Blog post from Together AI

Post Details

Company: Together AI
Word Count: 1,527
Language: English
Hacker News Points: -
Summary

Together AI has added support for NVIDIA Blackwell GPUs to its inference platform, improving the performance of models such as DeepSeek-R1-0528 when deployed on NVIDIA HGX B200 GPUs. The platform combines bespoke GPU kernels, a proprietary inference engine, and techniques such as speculative decoding and lossless quantization, positioning Together AI among the fastest providers of AI inference. These optimizations deliver notable speed improvements, reaching up to 334 tokens per second, and the company states that the gains come without compromising model quality, thanks to an inference stack built from state-of-the-art components and methodologies. For production workloads, Together AI offers both serverless endpoints and customizable Dedicated Endpoints, enabling efficient and scalable deployment across diverse customer needs.
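As a rough illustration of how a customer would exercise the serverless endpoint described above, the sketch below builds a request against Together AI's OpenAI-compatible chat completions API. The endpoint URL, the model identifier, and the `TOGETHER_API_KEY` environment variable are assumptions based on Together's public API conventions, not details from this post; consult the official documentation before relying on them.

```python
# Hedged sketch: querying a DeepSeek-R1 model on Together AI's
# OpenAI-compatible chat completions endpoint. URL, model name,
# and env-var name below are assumptions, not confirmed by the post.
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"  # assumed endpoint
MODEL = "deepseek-ai/DeepSeek-R1"  # assumed model identifier


def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Construct the JSON payload for a single chat completion call."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query(prompt: str) -> str:
    """Send the payload and return the assistant's reply text."""
    payload = build_request(prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same payload shape works for a Dedicated Endpoint; typically only the base URL changes, while authentication and message format stay the same.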