ScyllaDB SSTable 3.0 Can Decrease File Sizes 50% or More
Blog post from ScyllaDB
ScyllaDB Open Source 3.0 introduces a new on-disk file format, SSTable 3.0, which stores and retrieves data more efficiently, uses less disk space, and is compatible with Apache Cassandra 3.x. The new format, identified as "mc," addresses two limitations of the older formats: duplicated data and an on-disk layout that did not align well with CQL. It does so by storing data row by row and by separating metadata from the data files. This restructuring allows more efficient binary searching and reduces the need for data compression, which may improve performance. In the examples given, disk space savings range from a few percent to more than 50%. Schemas with wide rows and long column names benefit most, though actual savings depend on the specific use case. To ensure a smooth transition, SSTable 3.0 is not enabled by default in version 3.0; future releases are planned to make it the default and to add further performance improvements.
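To see why wide rows with long column names stand to gain the most, a rough size model can help. The sketch below is a toy calculation, not the actual SSTable byte layout: it assumes an older cell-based layout that repeats the column name in every cell, and a row-based layout that writes each column name once as metadata. The function names, the example schema, and the fixed 8-byte value size are all hypothetical and chosen only for illustration.

```python
# Toy size model only. This is NOT the real SSTable byte layout; it ignores
# timestamps, TTLs, clustering keys, indexes, and compression.

def cell_based_size(column_names, value_size, num_rows):
    """Assumed older layout: every cell carries its column name plus its value."""
    per_row = sum(len(name.encode("utf-8")) + value_size for name in column_names)
    return per_row * num_rows


def row_based_size(column_names, value_size, num_rows):
    """Assumed row-based layout: names written once as metadata, rows hold values."""
    header = sum(len(name.encode("utf-8")) for name in column_names)
    return header + len(column_names) * value_size * num_rows


def savings_percent(old_size, new_size):
    """Relative reduction in size, as a percentage of the old size."""
    return 100.0 * (old_size - new_size) / old_size


# Hypothetical schema: long column names, small fixed-size values.
columns = ["temperature_celsius", "relative_humidity", "wind_speed_kmh"]
old = cell_based_size(columns, value_size=8, num_rows=1000)
new = row_based_size(columns, value_size=8, num_rows=1000)
print(f"cell-based: {old} bytes, row-based: {new} bytes, "
      f"saved: {savings_percent(old, new):.1f}%")
```

In this model the saving grows with the ratio of column-name length to value size, which is consistent with the observation that short names or large values yield only a few percent. Real results also depend on compression and on per-row overhead that the model leaves out.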