Company
Date Published
Author
Vinay Suryadevara & Jianfei Hu
Word count
2008
Language
English
Hacker News points
16

Summary

ClickHouse Cloud reduced the cost of its Kubernetes setup on EKS (Elastic Kubernetes Service) by improving pod allocation and raising resource utilization. After analyzing CPU and memory utilization across their EKS cluster nodes, they traced the low utilization to the kube-scheduler's LeastAllocated scoring policy, which favors nodes with the most free resources and therefore spreads pods sparsely across the cluster. They explored alternatives, including tuning the cluster autoscaler and overprovisioning, but ultimately chose to change the kube-scheduler scoring policy from LeastAllocated to MostAllocated. This policy implements bin-packing: it favors nodes with higher utilization ratios, so pods are packed onto fewer nodes and total cost falls. To apply it, they set up a custom scheduler in their Kubernetes cluster that uses the MostAllocated scoring policy, and ensured high availability by running three scheduler pods with leader election enabled. The new setup resulted in a 20-30% increase in EKS cluster resource utilization and significant cost savings on EC2 instances, with an estimated reduction of over $10 million annually.
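
To make the difference between the two scoring policies concrete, here is a minimal Python sketch. It is a simplified model, not the kube-scheduler's actual code: it assumes only CPU and memory are scored, with equal weights, and that the pod's requests are added to the node's existing requests before scoring. The node names, capacities, and pod sizes are hypothetical.

```python
from dataclasses import dataclass

MAX_NODE_SCORE = 100  # scores are normalized to the range 0..100


@dataclass
class Node:
    name: str
    cpu_capacity: int   # millicores
    mem_capacity: int   # MiB
    cpu_requested: int  # millicores already requested by running pods
    mem_requested: int  # MiB already requested by running pods


def least_allocated(requested: int, capacity: int) -> int:
    """Higher score for nodes with more free capacity (spreads pods out)."""
    if capacity == 0 or requested > capacity:
        return 0
    return (capacity - requested) * MAX_NODE_SCORE // capacity


def most_allocated(requested: int, capacity: int) -> int:
    """Higher score for nodes that are already fuller (bin-packing)."""
    if capacity == 0 or requested > capacity:
        return 0
    return requested * MAX_NODE_SCORE // capacity


def score_node(node: Node, pod_cpu: int, pod_mem: int, strategy) -> int:
    # Score each resource as if the pod were already placed on the node,
    # then average CPU and memory (equal weights are an assumption here).
    cpu = strategy(node.cpu_requested + pod_cpu, node.cpu_capacity)
    mem = strategy(node.mem_requested + pod_mem, node.mem_capacity)
    return (cpu + mem) // 2


def pick_node(nodes, pod_cpu: int, pod_mem: int, strategy):
    # Filter out nodes the pod does not fit on, then take the best score.
    feasible = [
        n for n in nodes
        if n.cpu_requested + pod_cpu <= n.cpu_capacity
        and n.mem_requested + pod_mem <= n.mem_capacity
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda n: score_node(n, pod_cpu, pod_mem, strategy))


if __name__ == "__main__":
    nodes = [
        Node("busy", cpu_capacity=4000, mem_capacity=16384,
             cpu_requested=3000, mem_requested=12288),
        Node("idle", cpu_capacity=4000, mem_capacity=16384,
             cpu_requested=500, mem_requested=2048),
    ]
    # A hypothetical pod requesting 500 millicores and 1024 MiB.
    print(pick_node(nodes, 500, 1024, least_allocated).name)  # idle
    print(pick_node(nodes, 500, 1024, most_allocated).name)   # busy
```

Under LeastAllocated the pod lands on the mostly empty node, which keeps both nodes alive; under MostAllocated it lands on the fuller node, leaving the other one closer to empty and therefore a better candidate for scale-down.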