Company
Date Published
Author
Tim Faulkes
Word count
1853
Language
English
Hacker News points
None

Summary

New API calls in Aerospike version 5.6 make it easier to validate that two clusters are synchronized. Two common approaches to checking whether two clusters hold exactly the same set of records are comparing record counts and using primary index queries with batch gets. Both have flaws: comparing record counts can prove that the clusters differ when the counts don't match, but matching counts do not prove that the record sets are identical, while primary index queries may miss records in one direction due to data replication. QueryPartitions, a new method introduced by Aerospike, allows traversing one or more partitions in digest order, so each partition can be treated as a very large sorted list. This makes it possible to implement a cluster comparator that identifies records missing from either cluster by comparing digests and advancing through each cluster's list accordingly. The process can be extended to compare record contents as well if needed. Handling multiple partitions requires concurrency control, along with options such as selecting start and end partitions, last-update-times, or metadata for comparison. An open-source implementation of this algorithm is available.
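
The digest comparison described above is essentially a merge over two sorted lists. The following is a minimal sketch of that idea in Python, not the open-source implementation mentioned in the summary. It does not call the Aerospike client: the function name `compare_sorted_digests` is hypothetical, the two inputs stand in for the digests that a partition traversal would return for each cluster, and plain lexicographic byte ordering is assumed as the sort order, which may differ from the ordering the server actually uses.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def compare_sorted_digests(
    digests_a: Iterable[bytes], digests_b: Iterable[bytes]
) -> Tuple[List[bytes], List[bytes]]:
    """Compare two ascending digest streams (hypothetical helper).

    Returns (missing_from_b, missing_from_a). Assumes both inputs are
    sorted in the same order; here that is Python's byte-wise ordering.
    """
    it_a: Iterator[bytes] = iter(digests_a)
    it_b: Iterator[bytes] = iter(digests_b)
    a: Optional[bytes] = next(it_a, None)
    b: Optional[bytes] = next(it_b, None)
    missing_from_b: List[bytes] = []
    missing_from_a: List[bytes] = []

    while a is not None or b is not None:
        if b is None or (a is not None and a < b):
            # Digest exists only on side A: record it and advance A.
            missing_from_b.append(a)
            a = next(it_a, None)
        elif a is None or b < a:
            # Digest exists only on side B: record it and advance B.
            missing_from_a.append(b)
            b = next(it_b, None)
        else:
            # Same digest on both sides: advance both.
            # A record-content comparison could be added at this point.
            a = next(it_a, None)
            b = next(it_b, None)

    return missing_from_b, missing_from_a


# Toy data standing in for one partition on each cluster.
cluster_a = [bytes([1]), bytes([3]), bytes([5])]
cluster_b = [bytes([1]), bytes([4]), bytes([5])]
only_in_a, only_in_b = compare_sorted_digests(cluster_a, cluster_b)
```

Because each side is read exactly once and only the current digest from each side is held in memory, the same loop could in principle run independently per partition, which is where the concurrency control mentioned above would come in.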