Together AI Delivers Top Speeds for DeepSeek-R1-0528 Inference on NVIDIA Blackwell
Blog post from Together AI
Together AI has added support for NVIDIA Blackwell GPUs to its inference platform, improving the performance of models such as DeepSeek-R1-0528 when deployed on NVIDIA HGX B200 GPUs. The inference stack combines bespoke GPU kernels, a proprietary inference engine, and techniques such as speculative decoding and lossless quantization, reaching speeds of up to 334 tokens per second without compromising model quality. For production workloads, Together AI offers both serverless endpoints and customizable Dedicated Endpoints, giving customers options for efficient, scalable deployment tuned to their needs.
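The post names speculative decoding as one of the techniques behind these speeds. The core idea: a small, fast draft model proposes several tokens ahead, and the large target model verifies them in a single pass, keeping the longest agreeing prefix, so the output is identical to decoding with the target model alone but fewer sequential large-model steps are needed. The sketch below is a toy illustration of that general idea, not Together AI's implementation; the `target_next` and `draft_next` functions are hypothetical stand-ins for real models.

```python
def target_next(prefix):
    # Toy stand-in for the large "target" model's greedy next-token step.
    return (sum(prefix) * 31 + 7) % 50

def draft_next(prefix):
    # Toy stand-in for the small "draft" model: cheaper, and agrees with
    # the target most of the time (here it diverges on multiples of 5).
    t = target_next(prefix)
    return t if t % 5 else (t + 1) % 50

def speculate_step(prefix, k=4):
    """One round of speculative decoding: the draft proposes k tokens,
    the target verifies them, and we keep the longest agreeing prefix
    plus one token from the target, so every round emits >= 1 token."""
    draft, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(prefix)
    for tok in draft:
        if target_next(ctx) == tok:   # target agrees: accept the draft token
            accepted.append(tok)
            ctx.append(tok)
        else:
            break                     # first disagreement: stop accepting
    accepted.append(target_next(ctx)) # target supplies the correction token
    return accepted

def generate(prompt, n_tokens, k=4):
    # Output matches plain greedy decoding with target_next alone,
    # but multiple tokens can be committed per verification round.
    out = list(prompt)
    while len(out) < len(prompt) + n_tokens:
        out.extend(speculate_step(out, k))
    return out[:len(prompt) + n_tokens]
```

The key property, which the toy preserves, is losslessness: because draft tokens are only kept when the target model would have produced them, the generated sequence is bit-identical to target-only decoding; the speedup comes from accepting several draft tokens per large-model verification pass.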