Large Partition Support in ScyllaDB
Blog post from ScyllaDB
ScyllaDB 1.3 introduced enhanced support for large partitions, addressing the challenges they pose to data modeling and performance. Previously, large partitions could lead to memory issues, cache eviction, and performance degradation, particularly when partitions exceeded the 2 billion clustering rows limit or when dealing with unpredictable data inputs. The update allows the database engine to handle partitions at a clustering row granularity, enabling processes like SSTable compaction and streaming to operate without needing the entire partition in memory. While the new version improves performance by reading only data relevant to query results, there are still limitations, such as handling partitions larger than 128 kB during data streaming and maintaining cache efficiency for large partitions. Tools like nodetool cfstats and cfhistograms aid in monitoring partition sizes, and future updates are expected to further enhance capabilities, addressing current constraints like reversed queries needing entire partitions in memory.