How We Found 7 TiB of Memory Just Sitting Around
Blog post from Render
Brian Stack's article describes a memory optimization in Render's Kubernetes clusters that freed 7 TiB of memory. The problem came from daemonsets, particularly Calico and Vector, which consumed excessive memory due to inefficient handling of namespace list-watching. Because a daemonset runs a pod on every node, any per-pod overhead is multiplied across the whole cluster. By questioning whether namespaces were really needed to reference pod labels, the team found that disabling this feature in Vector could save substantial memory. A new configuration option was introduced that lets operators opt in to skipping the unnecessary namespace list-watch. The article highlights how small observations and incremental changes drive debugging of infrastructure at scale, and it emphasizes collaboration and persistence throughout the process. Beyond the efficiency gain, the lower memory usage also reduced risk during rollouts, in line with Render's goal of providing robust and cost-effective cloud infrastructure.
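
To make the scaling argument concrete, here is a rough back-of-the-envelope sketch of why a per-pod namespace cache becomes expensive in a daemonset. It assumes a simple model in which every daemonset pod holds its own copy of every namespace object. The function names and all input figures are hypothetical and chosen only for illustration; none of them come from the article.

```python
def daemonset_cache_bytes(nodes: int, namespaces: int, bytes_per_namespace: int) -> int:
    """Estimate cluster-wide memory held when every daemonset pod caches every namespace.

    Simplified model (an assumption, not a measurement): one pod per node,
    and each pod keeps a full copy of the namespace list it watches.
    """
    if min(nodes, namespaces, bytes_per_namespace) < 0:
        raise ValueError("inputs must be non-negative")
    return nodes * namespaces * bytes_per_namespace


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    total = daemonset_cache_bytes(nodes=100, namespaces=10_000, bytes_per_namespace=2_048)
    print(f"estimated cache across the cluster: {to_gib(total):.2f} GiB")
```

The point of the model is that the cost grows with the product of node count and namespace count, so dropping the namespace list-watch removes the whole term rather than shaving a constant off each pod.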