
How we minimized the overhead of Kubernetes in our job system

Blog post from Datadog

Author: Lally Singh, Ashwin Venkatesan
Word count: 2,373
Language: English
Hacker News points: 3
Summary

At Datadog, moving an existing job system onto Kubernetes initially caused a performance regression: jobs consumed more CPU time and completed more slowly. The team addressed this through performance tuning, timing analysis, and changes to the Kubernetes configuration. Early experiments revealed underutilized resources and inefficiencies caused by incorrect deployment setups and node configurations. By adjusting the resource requests for pods and shortening the interval between worker status checks, the team reduced the node count and improved efficiency. A key finding was that Kubernetes itself added minimal memory and CPU overhead, and that further optimization could yield additional gains. Ultimately, the transition produced a more manageable and scalable system that takes advantage of Kubernetes across cloud providers.
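The summary does not give the actual request values or node sizes, so the following is only a minimal sketch, with hypothetical numbers, of why right-sizing pod resource requests can shrink a cluster. The Kubernetes scheduler places pods according to their requests, not their measured usage, so requests larger than a worker really needs leave node capacity idle. The helpers `pods_per_node` and `nodes_needed` are illustrative and are not Datadog's code; they ignore DaemonSets, system pods, and other scheduling constraints.

```python
import math


def pods_per_node(node_cpu_m, node_mem_mib, pod_cpu_m, pod_mem_mib):
    """Count identical pods that fit on one node, judged only by requests.

    Hypothetical helper: whichever resource (CPU or memory) runs out first
    limits how many pods the node can hold.
    """
    return min(node_cpu_m // pod_cpu_m, node_mem_mib // pod_mem_mib)


def nodes_needed(total_pods, node_cpu_m, node_mem_mib, pod_cpu_m, pod_mem_mib):
    """Return the number of nodes required to schedule total_pods pods.

    Assumes every node has the same allocatable capacity and that nothing
    else (DaemonSets, system pods, affinity rules) consumes it.
    """
    per_node = pods_per_node(node_cpu_m, node_mem_mib, pod_cpu_m, pod_mem_mib)
    if per_node == 0:
        raise ValueError("pod request exceeds node allocatable capacity")
    return math.ceil(total_pods / per_node)


if __name__ == "__main__":
    # Hypothetical node: 15,800 millicores and 60,000 MiB allocatable.
    # Oversized request of 2 CPU / 4 GiB per worker: CPU caps it at 7 pods/node.
    print(nodes_needed(100, 15800, 60000, 2000, 4096))  # prints 15
    # Right-sized request of 1 CPU / 2 GiB: CPU caps it at 15 pods/node.
    print(nodes_needed(100, 15800, 60000, 1000, 2048))  # prints 7
```

With these made-up figures, halving each worker's requests roughly halves the node count for the same 100 pods. The real savings depend on how closely the requests track what the workers actually use.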