
Implementing a Crash-Resistant Socket.io Service Without Sticky Routing

Blog post from Sigma

Post Details
Company: Sigma
Date Published:
Author: Madison Chamberlain
Word Count: 1,542
Language: English
Hacker News Points: -
Summary

Building a multi-server Socket.io system that survives server crashes requires preserving state efficiently without sticky routing, which can cause unbalanced load and makes it hard to reroute clients when a server goes down. One proposed solution is to share information across servers with the `fetchSockets` method, attaching user data to each socket's `data` attribute. However, the post puts the runtime complexity of this approach at O(n²), which is too expensive for large-scale operation. The alternative is to store session information in Redis: one ZSET per room ID, with user IDs as members and the timestamp of each client's last ping to the server as the score. Adding a user, removing a user, and locating stale users each run in O(log n), plus the number of entries returned or removed for range queries, which significantly reduces the computational load. The ZSET approach also requires a heartbeat that updates the timestamps, together with a cleanup step that removes outdated entries, so the data stays fresh. Performing the stale-data cleanup at regular intervals further reduces its cost. Despite its efficiency, the approach needs care: each interval must be cleared when its socket disconnects, otherwise the timers leak memory and undermine the performance and stability of the WebSocket service across servers.
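The ZSET bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the post's actual code: `SortedSetStore`, `InMemoryStore`, `roomKey`, `touch`, `activeUsers`, and `startHeartbeat` are hypothetical names, the `room:<id>:users` key scheme is invented for the example, and the heartbeat is assumed to be a per-socket timer on the server. The interface only mirrors the Redis commands ZADD, ZREM, ZRANGEBYSCORE, and ZREMRANGEBYSCORE; a real client library will have its own call signatures, so an adapter would be needed.

```typescript
// Subset of sorted-set commands used by this sketch. Method names follow the Redis
// commands; the signatures are an assumption, not any specific client's API.
interface SortedSetStore {
  zadd(key: string, score: number, member: string): Promise<void>;
  zrem(key: string, member: string): Promise<void>;
  zrangebyscore(key: string, min: number, max: number): Promise<string[]>;
  zremrangebyscore(key: string, min: number, max: number): Promise<number>;
}

// In-memory stand-in so the sketch runs without a Redis server. It mimics the
// semantics of the four commands but does NOT have Redis's O(log n) behaviour.
class InMemoryStore implements SortedSetStore {
  private sets = new Map<string, Map<string, number>>();

  private get(key: string): Map<string, number> {
    let set = this.sets.get(key);
    if (!set) {
      set = new Map<string, number>();
      this.sets.set(key, set);
    }
    return set;
  }

  async zadd(key: string, score: number, member: string): Promise<void> {
    this.get(key).set(member, score);
  }

  async zrem(key: string, member: string): Promise<void> {
    this.get(key).delete(member);
  }

  async zrangebyscore(key: string, min: number, max: number): Promise<string[]> {
    const hits: Array<[string, number]> = [];
    this.get(key).forEach((score, member) => {
      if (score >= min && score <= max) hits.push([member, score]);
    });
    // Ascending by score, like a sorted set.
    return hits.sort((a, b) => a[1] - b[1]).map(([member]) => member);
  }

  async zremrangebyscore(key: string, min: number, max: number): Promise<number> {
    const stale = await this.zrangebyscore(key, min, max);
    stale.forEach((member) => this.get(key).delete(member));
    return stale.length;
  }
}

// Hypothetical key scheme: one sorted set per room.
const roomKey = (roomId: string): string => `room:${roomId}:users`;

// Record a ping: the score is the time of the user's last heartbeat.
async function touch(store: SortedSetStore, roomId: string, userId: string, now: number): Promise<void> {
  await store.zadd(roomKey(roomId), now, userId);
}

// Evict users whose last ping is at or before (now - ttlMs), then list who remains.
async function activeUsers(store: SortedSetStore, roomId: string, now: number, ttlMs: number): Promise<string[]> {
  await store.zremrangebyscore(roomKey(roomId), -Infinity, now - ttlMs);
  return store.zrangebyscore(roomKey(roomId), -Infinity, Infinity);
}

// Refresh the user's timestamp every intervalMs. The returned function clears the
// timer and removes the user; it is meant to be called from the socket's
// "disconnect" handler so the interval does not leak.
function startHeartbeat(store: SortedSetStore, roomId: string, userId: string, intervalMs: number): () => Promise<void> {
  const timer = setInterval(() => {
    touch(store, roomId, userId, Date.now()).catch((err) => console.error("heartbeat failed", err));
  }, intervalMs);
  return async () => {
    clearInterval(timer);
    await store.zrem(roomKey(roomId), userId);
  };
}
```

In a Socket.io server, one plausible wiring is to call `startHeartbeat` when a socket joins a room and pass the returned function to `socket.on("disconnect", ...)`. If a server crashes, its timers stop, the scores it was refreshing go stale, and any surviving server that calls `activeUsers` evicts those entries, so no sticky routing is needed to recover the room's membership.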