
How DigitalOcean’s Agentic Inference Cloud powered by NVIDIA GPUs Achieved 67% Lower Inference Costs for Workato

Blog post from DigitalOcean

Post Details

Company: DigitalOcean
Author: Rithish Ramesh
Word Count: 2,756
Language: English
Summary

DigitalOcean's collaboration with Workato's AI Research Lab delivered a significant reduction in inference costs and improved performance for Workato's agentic AI automation workloads. By deploying NVIDIA Dynamo with vLLM on DigitalOcean Kubernetes Service (DOKS) and running on NVIDIA H200 GPUs, whose larger memory capacity supports efficient throughput, the team cut inference costs by 67%. The key innovation was KV-aware routing, which steers requests to workers that already hold warm KV caches, minimizing redundant prefill computation, dramatically reducing latency, and increasing throughput. This approach yielded a 67% increase in tokens per second per GPU and reduced the number of GPUs needed by 40%, producing substantial cost savings. The project underscores that efficient inference at scale comes from optimizing the system architecture around the AI models, not merely from adding hardware.
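To make the KV-aware routing idea concrete, here is a minimal sketch of how a router might pick the worker with the warmest cache for an incoming prompt. It is not Workato's or Dynamo's implementation; the block size, chained block hashing (in the style of vLLM's prefix caching), and the `pick_worker` helper are all illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative, paged-attention style)

def block_hashes(token_ids, block_size=BLOCK_SIZE):
    """Hash each full block of the prompt, chaining in the previous hash so
    a block hash uniquely identifies the entire prefix up to that block."""
    hashes, prev = [], b""
    full_len = len(token_ids) - len(token_ids) % block_size
    for i in range(0, full_len, block_size):
        block = token_ids[i:i + block_size]
        h = hashlib.sha256(prev + str(block).encode("utf-8")).digest()
        hashes.append(h)
        prev = h
    return hashes

def pick_worker(prompt_tokens, workers):
    """Route to the worker whose cache shares the longest prefix with the
    prompt; on a cold miss, fall back to the least-loaded worker."""
    hashes = block_hashes(prompt_tokens)

    def prefix_overlap(cache):
        # Count how many leading blocks of the prompt are already cached.
        n = 0
        for h in hashes:
            if h not in cache:
                break
            n += 1
        return n

    best = max(workers, key=lambda w: (prefix_overlap(w["cached"]), -w["load"]))
    # Record the prompt's blocks as now warm on the chosen worker.
    best["cached"].update(hashes)
    best["load"] += 1
    return best["name"]
```

Two requests sharing a long prompt prefix (a common pattern in agentic workloads, where the same system prompt and tool definitions repeat) land on the same worker, so the second request skips prefill for the cached blocks instead of recomputing them on a cold GPU.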