Efficient full table scans with ScyllaDB 1.6
Blog post from ScyllaDB
ScyllaDB, a distributed database known for its high performance, supports efficient full table scans, which are essential for data analytics even though they are less common than operations on individual partitions. Traditional full table scans can be slow because they make limited use of server and client parallelism. ScyllaDB 1.6 improves on this by automatically tuning paging and reducing CPU consumption. ScyllaDB uses a token function to distribute data evenly across nodes, so a scan can be parallelized by dividing the token range into sub-ranges and querying those sub-ranges simultaneously from multiple threads. This approach keeps both the server nodes and the client cores busy, though the number of parallel queries should be adjusted to the cluster size to prevent a "processing tail." Message queues can be used to coordinate these parallel scans. Together, these improvements make it easier to query entire tables efficiently. Integrated tools such as Presto and Apache Spark can also run full table scans; they are easier to use for ad-hoc queries, potentially at the cost of efficiency.
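The token-range approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the blog post: it assumes the default Murmur3 partitioner, whose tokens are signed 64-bit integers, and the names `split_token_range`, `build_query`, `scan_table`, and `run_query` are hypothetical. The thread pool's internal work queue stands in for the message queue that hands sub-ranges to workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds under the (assumed) Murmur3 partitioner: signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(num_subranges):
    """Return inclusive (start, end) pairs that cover the whole token range without overlap."""
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    ranges = []
    for i in range(num_subranges):
        start = MIN_TOKEN + (total * i) // num_subranges
        end = MIN_TOKEN + (total * (i + 1)) // num_subranges - 1
        ranges.append((start, end))
    return ranges


def build_query(table, partition_key, start, end):
    """Build a CQL statement that scans one token sub-range."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) >= {start} AND token({partition_key}) <= {end}"
    )


def scan_table(run_query, table, partition_key, num_subranges, parallelism):
    """Run one query per sub-range, with at most `parallelism` queries in flight.

    `run_query` is a hypothetical callable that executes a CQL string and
    returns its rows, for example a thin wrapper around a driver session.
    """
    queries = [
        build_query(table, partition_key, start, end)
        for start, end in split_token_range(num_subranges)
    ]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map preserves the order of the sub-ranges in its results.
        return list(pool.map(run_query, queries))
```

Both `num_subranges` and `parallelism` are tuning knobs here; the values that work well depend on the number of nodes and cores in the cluster, so they are left as parameters rather than fixed constants.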