
connect() - why are you so slow?

What's this blog post about?

Cloudflare has developed three solutions to the performance bottleneck of source port selection in TCP connect():

1. "Select, test, repeat": create a socket and try to connect repeatedly with different source ports until a free port is found. This method can be time-consuming.
2. "Select port by random shifting range": generate a random offset within the ephemeral port range and try to bind within that shifted range. If that fails, shift the range randomly again until a free port is found.
3. A kernel patch available in Linux kernel 6.8 and later. It eliminates the need for window shifting. Like "select port by random shifting range", it randomizes the start offset to be even or odd, but it then loops incrementally rather than skipping every other port.

The user space implementations of these solutions perform better than TCP's default behavior. The kernel solution is slightly faster still, due to algorithm improvements and because it can always find a port given the full search space of the range. These solutions can improve connect() latency for workloads with high numbers of unicast egress connections. Other protocols such as UDP and DCCP also benefit from these port selection strategies, although they differ somewhat in how ports are selected and managed. The post recommends exploring and measuring your own systems to determine which strategy works best for your specific needs.
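The first two user space strategies can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the Cloudflare post. The function names, the retry limit, the window width, and the use of SO_REUSEADDR are all assumptions made for the example, and 32768-60999 is simply the usual Linux default for net.ipv4.ip_local_port_range. For the second strategy only the window arithmetic is shown, because the summary does not say how the shifted range is handed to the kernel.

```python
import errno
import random
import socket

# Assumed ephemeral range: the common Linux default for ip_local_port_range.
EPHEMERAL_LO, EPHEMERAL_HI = 32768, 60999


def connect_select_test_repeat(dest, src_ip="127.0.0.1",
                               lo=EPHEMERAL_LO, hi=EPHEMERAL_HI, attempts=16):
    """Hypothetical helper: pick a random source port, bind, connect, and retry on conflict."""
    for _ in range(attempts):
        port = random.randint(lo, hi)  # select
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Assumption: SO_REUSEADDR lets several outgoing sockets share a port
            # as long as the full 4-tuple stays unique.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((src_ip, port))  # test
            sock.connect(dest)
            return sock
        except OSError as exc:
            sock.close()
            # A port or 4-tuple conflict means "try another port"; anything else is fatal.
            if exc.errno not in (errno.EADDRINUSE, errno.EADDRNOTAVAIL):
                raise
    raise OSError(errno.EADDRNOTAVAIL, "no free source port found")


def random_shifted_window(lo=EPHEMERAL_LO, hi=EPHEMERAL_HI, width=1000, rng=random):
    """Hypothetical helper: return (start, end) of a width-port window at a random offset in [lo, hi]."""
    total = hi - lo + 1
    if not 1 <= width <= total:
        raise ValueError("width must fit inside the port range")
    start = lo + rng.randrange(total - width + 1)  # random shift of the window
    return start, start + width - 1
```

A caller would pass the destination as an `(address, port)` tuple, for example `connect_select_test_repeat(("127.0.0.1", 8080))`, and would call `random_shifted_window()` again whenever no port can be obtained inside the current window. Each retry in the first function costs a fresh socket plus bind() and connect() calls, which is one way to see why the summary calls that method time-consuming.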

Company
Cloudflare

Date published
Feb. 8, 2024

Author(s)
Frederick Lawler

Word count
2902

Hacker News points
5

Language
English


By Matt Makai. 2021-2024.