Content Deep Dive
Debugging TCP socket leak in a Kubernetes cluster
Blog post from Hasura
Post Details
Company
Date Published
Author
Shahidh K Muhammed
Word Count
996
Language
English
Hacker News Points
-
Summary
The author investigated network connectivity issues in a Kubernetes cluster running on Google Kubernetes Engine (GKE). The symptoms were delayed API responses and connection refused errors, particularly when the response body was large. The investigation traced the problem to one particular node that was running out of TCP stack memory. This led to a discussion of kubelet's responsibility for monitoring node health: it tracks CPU, RAM, and disk usage, but not network health. The author filed an issue with Kubernetes asking that tcp_mem statistics be monitored as well.
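
As a rough illustration of what checking TCP stack memory on a node can look like, the sketch below compares the page count the kernel reports on the `TCP:` line of `/proc/net/sockstat` with the three thresholds in `/proc/sys/net/ipv4/tcp_mem` (low, pressure, high, all in pages). This is not taken from the original post: the function names and the "ok" / "pressure" / "exhausted" labels are hypothetical, and the sample numbers in the usage note are made up for illustration.

```python
from pathlib import Path

SOCKSTAT = Path("/proc/net/sockstat")
TCP_MEM = Path("/proc/sys/net/ipv4/tcp_mem")


def parse_tcp_mem_pages(sockstat_text):
    """Return the 'mem' value (in pages) from the TCP line of sockstat output."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            # The line alternates key and value: "TCP: inuse 7 orphan 0 ... mem 1"
            fields = line.split()[1:]
            stats = dict(zip(fields[0::2], (int(v) for v in fields[1::2])))
            return stats["mem"]
    raise ValueError("no TCP line found in sockstat text")


def parse_tcp_mem_limits(tcp_mem_text):
    """Return the (low, pressure, high) thresholds, in pages."""
    low, pressure, high = (int(v) for v in tcp_mem_text.split())
    return low, pressure, high


def tcp_mem_status(used_pages, limits):
    """Classify usage against the thresholds (labels are hypothetical)."""
    _low, pressure, high = limits
    if used_pages >= high:
        return "exhausted"
    if used_pages >= pressure:
        return "pressure"
    return "ok"


if __name__ == "__main__":
    # These files only exist on Linux; skip quietly elsewhere.
    if SOCKSTAT.exists() and TCP_MEM.exists():
        used = parse_tcp_mem_pages(SOCKSTAT.read_text())
        limits = parse_tcp_mem_limits(TCP_MEM.read_text())
        print(f"TCP pages in use: {used}, limits (low/pressure/high): {limits}")
        print(f"status: {tcp_mem_status(used, limits)}")
```

With made-up thresholds of `4096 8192 12288` pages, a node reporting `mem 12500` would be classified as "exhausted" by this sketch, while `mem 1` would be "ok". A check like this would have to run on each node, since the counters are per-kernel rather than cluster-wide.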