Content Deep Dive

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Blog post from Together AI

Post Details
Company: Together AI
Date Published:
Author: Dan Fu, Ingrid Xu, Ce Zhang, Cyrus Lalkaka, Sonny Khan
Word Count: 683
Language: English
Hacker News Points: -
Summary

Cursor is an AI-driven coding platform that uses real-time intelligence to speed up code development, predicting edits, refactoring, and managing context as developers work. It partnered with Together AI to build efficient inference infrastructure on the NVIDIA Blackwell architecture, focusing on low-latency inference that stays responsive and delivers predictable latency under concurrent workloads. Through the partnership, Cursor gained early access to NVIDIA Blackwell hardware, running on NVIDIA GB200 NVL72 and HGX B200 systems for higher performance. The engineering work included porting the inference stack to the ARM architecture and developing custom kernels for Blackwell's new Tensor Core instructions to exploit the hardware's parallelism. The stack also uses quantization to balance memory constraints against output quality, which is critical for preserving code accuracy. In production, Cursor's deployment emphasizes throughput and utilization, aiming to improve per-GPU economics with higher-throughput endpoints as demand grows.
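The quantization trade-off mentioned above can be illustrated with a minimal sketch. This is not Cursor's or Together AI's actual scheme (the post does not specify one); it shows the general idea of symmetric 8-bit weight quantization, where a per-tensor scale maps floats to int8, cutting memory roughly in half versus FP16 at the cost of a bounded rounding error per weight:

```python
# Illustrative sketch of symmetric per-tensor int8 quantization.
# The real kernels in the post target Blackwell Tensor Core formats;
# this only demonstrates the memory-vs-accuracy trade-off.

def quantize_int8(weights):
    """Map float weights to int8 values plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0  # largest weight maps to +/-127
    q = [round(w / scale) for w in weights]       # store these as 8-bit ints
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return [x * scale for x in q]

weights = [0.12, -0.98, 0.45, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Lower-precision formats shrink the bound further at the risk of visible quality loss, which is why the post frames quantization as a balance between memory and output quality rather than a free win.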