The story of one latency spike

Blog post from Cloudflare

Post Details
Company: Cloudflare
Author: Marek Majkowski
Word Count: 1,462
Language: English
Hacker News Points: 10
Summary

A customer reported slow HTTP responses from CloudFlare's CDN servers. The problem was hard to reproduce and did not show up in the usual monitoring systems. The investigation found occasional latency spikes between a router and a server inside the same datacenter. SystemTap, a tracing and debugging tool for Linux, identified the kernel function responsible for the spikes: tcp_collapse. The issue was resolved by lowering the rmem sysctl (net.ipv4.tcp_rmem), which limits the receive buffer size of TCP sockets. With smaller buffers, the kernel has less queued data to process during this "garbage collection" of socket memory, so each pass takes less time and performance improved.
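The fix amounts to lowering the maximum TCP receive buffer. The sketch below makes a few assumptions. It relies on the standard Linux format of net.ipv4.tcp_rmem, which is three whitespace-separated byte values (min, default, max). It parses the current setting and lowers only the maximum while keeping min <= default <= max. The helper names (TcpRmem, parse_tcp_rmem, cap_tcp_rmem, to_sysctl_value) and the 64 MiB and 4 MiB figures in the usage example are hypothetical illustrations, not the values used in the post.

```python
from typing import NamedTuple


class TcpRmem(NamedTuple):
    """The three byte values of net.ipv4.tcp_rmem (hypothetical helper type)."""
    minimum: int
    default: int
    maximum: int


def parse_tcp_rmem(text: str) -> TcpRmem:
    """Parse the whitespace-separated contents of /proc/sys/net/ipv4/tcp_rmem."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 values, got {len(parts)}: {text!r}")
    lo, default, hi = (int(p) for p in parts)
    return TcpRmem(lo, default, hi)


def read_tcp_rmem(path: str = "/proc/sys/net/ipv4/tcp_rmem") -> TcpRmem:
    """Read the live setting (Linux only)."""
    with open(path) as f:
        return parse_tcp_rmem(f.read())


def cap_tcp_rmem(current: TcpRmem, limit: int) -> TcpRmem:
    """Lower the maximum receive buffer to `limit` bytes.

    The minimum and default are clamped as well so the triple stays
    ordered (min <= default <= max); values already below the limit
    are left untouched.
    """
    new_max = min(current.maximum, limit)
    return TcpRmem(
        min(current.minimum, new_max),
        min(current.default, new_max),
        new_max,
    )


def to_sysctl_value(v: TcpRmem) -> str:
    """Format the triple the way `sysctl -w net.ipv4.tcp_rmem=...` expects."""
    return f"{v.minimum} {v.default} {v.maximum}"


if __name__ == "__main__":
    # Hypothetical starting point: a 64 MiB maximum, capped to 4 MiB.
    current = parse_tcp_rmem("4096\t87380\t67108864")
    capped = cap_tcp_rmem(current, 4 * 1024 * 1024)
    print(to_sysctl_value(capped))  # 4096 87380 4194304
```

The resulting string could then be applied as root with `sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"`. Capping only the maximum leaves small sockets unaffected. It bounds how large any one receive queue can grow, and according to the summary, that is what made each tcp_collapse pass expensive.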