Troubleshooting Large Rows and Large Cells in ScyllaDB
Blog post from ScyllaDB
ScyllaDB, while adept at managing large partitions, can encounter performance issues with very large rows and cells, because they require allocating substantial contiguous memory and can therefore increase latency. To address this, ScyllaDB introduced tools to detect large rows and cells, alongside updates to the SSTable format in ScyllaDB 3.0 and ScyllaDB Enterprise 2019.1; these tools are enabled by default in ScyllaDB Open Source 3.1 and above.

The detection tools use system tables to store data about large rows and cells, capturing details such as the keyspace and table name, SSTable name, row size, clustering key, and, for cells, the column name. Users can query these tables to troubleshoot performance issues, and warnings are written to the ScyllaDB log whenever the configured thresholds are exceeded.

The system.large_rows and system.large_cells tables store their data with a 30-day Time To Live (TTL) to prevent stale entries from accumulating. This highlights the importance of early data modeling decisions in avoiding performance bottlenecks. To help users, ScyllaDB offers free data modeling courses at ScyllaDB University, aimed at both beginners and advanced users.
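As a sketch of how such troubleshooting might look, the queries below read the two system tables mentioned above. The exact column names shown (keyspace_name, table_name, sstable_name, row_size, clustering_key, column_name) are assumptions based on the details the post says these tables capture; check your ScyllaDB version's schema with DESCRIBE TABLE before relying on them.

```cql
-- Inspect oversized rows recorded by ScyllaDB (column names assumed; verify with DESCRIBE TABLE)
SELECT keyspace_name, table_name, sstable_name, row_size, clustering_key
FROM system.large_rows;

-- Inspect oversized cells; large_cells additionally records the offending column
SELECT keyspace_name, table_name, sstable_name, column_name
FROM system.large_cells;
```

Because entries carry a 30-day TTL, results reflect recent activity only; an empty result set does not rule out large rows written more than 30 days ago.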