How to scan 475 million partitions 12x faster using efficient full table scan with ScyllaDB 1.6
Blog post from ScyllaDB
The blog post by Tomer Sandler discusses how to significantly speed up full table scans on ScyllaDB, a database known for handling large-scale data. By shifting from a traditional serial full table scan, which processes 42,110 rows per second, to an efficient parallel full table scan, ScyllaDB achieves a remarkable rate of 510,752 rows per second, a 12-fold increase. The post explains how ScyllaDB's architecture can leverage server and core parallelism to optimize performance by utilizing the token function for partitioning data, allowing for multiple concurrent queries across token ranges. The setup described includes a ScyllaDB cluster deployed on Google Compute Engine with specific hardware configurations, and the use of a Golang client with the gocql driver to execute the parallel scans. Key strategies for maximizing parallelism involve calculating the number of parallel queries based on nodes and cores, dividing the token range into many smaller ranges, and randomizing query execution to ensure that all shards are actively engaged. The detailed guide includes a demonstration and suggestions for configuring the scanning process, emphasizing the importance of optimizing queries to improve efficiency and throughput.