Work Stealing: Load-balancing for compute-heavy tasks
Blog post from Convex
The post explores "work stealing" as a strategy for distributing resource-intensive tasks, contrasting it with traditional push-based routing. Work stealing suits workloads whose tasks are time-consuming, do not share resources well, and care more about throughput than latency. Instead of a router assigning each task to a specific worker, workers pull tasks from a shared queue whenever they have spare capacity. This improves system utilization, keeps each worker's concurrency consistent, and removes the need for service discovery, because nothing has to track which workers exist: workers simply connect to the queue. Push-based routing, by contrast, commits each task to a specific worker up front, which can leave some workers overloaded while others sit idle. The post likens the pull-based approach to receiving an order number at a restaurant, where tasks are managed dynamically instead of being pre-assigned.

Drawing on the author's experience at Dropbox, the post discusses when each approach fits, and highlights how reactive databases can facilitate work stealing by simplifying data flow and ensuring consistent updates. The overall takeaway is that the right choice depends on the application's needs, how much control you have over the infrastructure, and the type of workload being managed.
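To make the utilization argument concrete, here is a minimal Python simulation (a sketch, not code from the post) comparing the two models. It assumes each worker runs one task at a time, that the push-based router uses round-robin assignment without knowing task durations, and that the durations are hypothetical numbers; the function names `push_makespan` and `pull_makespan` are illustrative. Both return the makespan, the time at which the last task finishes.

```python
import heapq
from collections import deque


def push_makespan(durations, num_workers):
    """Push-based routing (assumed round-robin): task i is assigned to worker
    i % num_workers up front, without knowing how long each task will take."""
    loads = [0] * num_workers
    for i, duration in enumerate(durations):
        loads[i % num_workers] += duration
    # Each worker runs its assigned tasks back to back; the busiest one decides.
    return max(loads)


def pull_makespan(durations, num_workers):
    """Pull-based work stealing: tasks wait in one shared FIFO queue, and a
    worker takes the next task only when it has free capacity."""
    queue = deque(durations)
    # Min-heap holding the time at which each worker next becomes idle.
    free_at = [0] * num_workers
    heapq.heapify(free_at)
    while queue:
        now = heapq.heappop(free_at)   # the earliest idle worker...
        done = now + queue.popleft()   # ...pulls the next task and runs it
        heapq.heappush(free_at, done)
    return max(free_at)


if __name__ == "__main__":
    durations = [10, 1, 1, 1, 10, 1, 1, 1]  # hypothetical task costs
    print(push_makespan(durations, 2))  # 22
    print(pull_makespan(durations, 2))  # 13
```

With two workers, round-robin happens to push both long tasks onto the same worker (makespan 22), while pulling lets each worker take the next task as soon as it is free, so the long tasks land on different workers (makespan 13, the ideal 26 / 2). The gap comes entirely from uneven task costs; with uniform tasks the two models tie.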