Doubling the throughput of data redistribution
Blog post from Vespa
Vespa, a data platform, has doubled the throughput of its data redistribution process, roughly halving the time required to replace a failing content node. The improvements are part of Vespa version 7.528.3 and come from several technical optimizations, including enhanced scheduling semantics, asynchronous operations, and optimized handling of delete bucket operations, which together minimize latency spikes and bottlenecks. The upgraded system now reaches an average throughput of 44 MB/sec during data redistribution, cutting the duration of the process from 3 hours and 50 minutes to about 2 hours. These advancements let data redistribution proceed with minimal disruption to query and write traffic, and they are crucial to maintaining data redundancy and system reliability in Vespa Cloud.
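
As a rough sanity check on the figures above, the short sketch below works out what they imply. It rests on assumptions that are not stated in the post: that "about 2 hours" is exactly 2 hours, that the same volume of data was moved before and after the upgrade, and that MB means 10^6 bytes. The function and variable names are purely illustrative.

```python
def duration_seconds(hours: int, minutes: int = 0) -> int:
    """Convert a duration given in hours and minutes to seconds."""
    return hours * 3600 + minutes * 60


# Figures quoted in the text.
NEW_THROUGHPUT_MB_S = 44.0               # average throughput after the upgrade
new_duration = duration_seconds(2)       # "about 2 hours", assumed exact here
old_duration = duration_seconds(3, 50)   # 3 hours and 50 minutes

# Assumption: both runs moved the same amount of data.
volume_mb = NEW_THROUGHPUT_MB_S * new_duration

# Throughput the old system would have needed to move that volume.
old_throughput_mb_s = volume_mb / old_duration

# Ratio of the two durations, i.e. the implied speedup.
speedup = old_duration / new_duration

print(f"Implied data volume: {volume_mb / 1000:.1f} GB")
print(f"Implied old throughput: {old_throughput_mb_s:.1f} MB/sec")
print(f"Implied speedup: {speedup:.2f}x")
```

Under these assumptions the implied volume is roughly 317 GB and the implied earlier throughput is about 23 MB/sec, which is consistent with the claim that throughput approximately doubled.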