
The Baseten Inference Stack at NVIDIA Dynamo Day

Blog post from Baseten

Post Details
Company: Baseten
Date Published: -
Author: Rachel Rapp
Word Count: 1,098
Language: English
Hacker News Points: -
Summary

The Baseten Inference Stack combines open-source tools like NVIDIA Dynamo with in-house innovations to serve generative AI workloads at low latency and high throughput. Because Dynamo is framework-agnostic and regularly updated, Baseten can plug in whichever inference engine best fits a given model and use case.

Dynamo enables system-level inference optimizations such as disaggregated serving, KV cache-aware routing, and KV cache offloading, which together cut latency and raise throughput significantly. Baseten's engineers contribute enhancements and new features back to the Dynamo ecosystem, work that was highlighted at NVIDIA's Dynamo Day event. These techniques help Baseten deliver 99.99% reliability for model serving, and extensions to Dynamo also support multimodal model serving for more complex AI workloads.
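To make the KV cache-aware routing idea concrete, below is a minimal sketch of one common approach: hash the prompt in fixed-size token blocks (each hash chained to the previous one, so a hash identifies a full prefix) and send the request to the replica holding the longest matching cached prefix. This is an illustrative assumption, not Baseten's or Dynamo's actual implementation; all names (`CacheAwareRouter`, `block_hashes`, `BLOCK_SIZE`) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 16  # tokens per KV cache block (hypothetical value)


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block of the prompt, chaining each hash to the
    previous one so a block's hash identifies the entire prefix."""
    hashes = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # ignore partial tail block
    for i in range(0, full, block_size):
        block = tokens[i:i + block_size]
        prev = hashlib.sha256(
            (prev + ",".join(map(str, block))).encode()
        ).hexdigest()
        hashes.append(prev)
    return hashes


class CacheAwareRouter:
    """Route each request to the replica with the longest cached prefix."""

    def __init__(self, replicas):
        # Track which block hashes each replica has cached.
        self.cached = {r: set() for r in replicas}

    def route(self, tokens):
        hashes = block_hashes(tokens)

        def overlap(replica):
            # Count consecutive leading blocks already cached there.
            n = 0
            for h in hashes:
                if h not in self.cached[replica]:
                    break
                n += 1
            return n

        best = max(self.cached, key=overlap)
        self.cached[best].update(hashes)  # replica now holds this prefix
        return best


# Two requests sharing a 64-token prefix land on the same replica,
# so the second one reuses the cached KV blocks instead of recomputing them.
router = CacheAwareRouter(["replica-a", "replica-b"])
first = router.route(list(range(64)))
second = router.route(list(range(64)) + [99] * 16)
```

A production router would also weigh replica load (pure prefix affinity concentrates traffic on one replica) and expire cache entries as blocks are evicted; this sketch shows only the prefix-matching core.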