How to scan 475 million partitions 12x faster using efficient full table scan with ScyllaDB 1.6

Post Details

Company

ScyllaDB

Date Published

March 28, 2017

Author

Tomer Sandler

Word Count

811

Company Posts That Month

6

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.scylladb.com/2017/03/28/parallel-efficient-full-table-scan-scylla

Summary

The blog post by Tomer Sandler discusses how to significantly speed up full table scans on ScyllaDB, a database known for handling large-scale data. By shifting from a traditional serial full table scan, which processes 42,110 rows per second, to an efficient parallel full table scan, ScyllaDB achieves a remarkable rate of 510,752 rows per second, a 12-fold increase. The post explains how ScyllaDB's architecture can leverage server and core parallelism to optimize performance by utilizing the token function for partitioning data, allowing for multiple concurrent queries across token ranges. The setup described includes a ScyllaDB cluster deployed on Google Compute Engine with specific hardware configurations, and the use of a Golang client with the gocql driver to execute the parallel scans. Key strategies for maximizing parallelism involve calculating the number of parallel queries based on nodes and cores, dividing the token range into many smaller ranges, and randomizing query execution to ensure that all shards are actively engaged. The detailed guide includes a demonstration and suggestions for configuring the scanning process, emphasizing the importance of optimizing queries to improve efficiency and throughput.

Trends Found in this Post

No tracked trend matches for this post yet.

Use This Data

Use this post, company, and trend context to find content marketing opportunities, perform competitive analysis, or address product feature gaps via the Plushcap MCP server or the Plushcap API.