Troubleshooting Cassandra column selection to boost database performance

Post Details

Company

Sysdig

Date Published

April 19, 2016

Author

Gianluca Borello

Word Count

2,068

Language

English

Hacker News Points

-

Source URL

www.sysdig.com/blog/column-selection-effects-query-performance

Summary

Gianluca Borello, an engineer at Sysdig, describes a performance issue his team encountered with Cassandra, a database known for its scalability and flexibility, which they used to store and process streams of binary blobs. Queries against large streams showed degraded response times because Cassandra processed every column in a row, even when the query selected only specific columns. Through tests and system tracing with tools like sysdig, Borello found that Cassandra was reading entire data files rather than only the requested data, a consequence of how it handles CQL row semantics. To resolve this, the team refactored their schema to distribute the blobs across multiple rows, which reduced the size of each row and significantly improved query performance. The experience highlights the value of monitoring and troubleshooting at the system level: system call analysis can identify and help solve database performance issues without delving into the application's internal code.
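
The effect of the schema refactoring can be illustrated with a toy model. The sketch below is not Cassandra code and does not reproduce the author's schema. It assumes, as a simplification of the behavior described in the summary, that selecting any column costs a read of the whole row. The column names (`small`, `large`), the blob sizes, and all function names are hypothetical.

```python
# Toy cost model (assumption): selecting one column still reads the full row.
from typing import Dict, List, Tuple

Row = Dict[str, bytes]


def bytes_read_wide(rows: List[Row], wanted: str) -> int:
    """Wide layout: one row per stream entry, one column per blob.

    `wanted` is deliberately ignored: under the assumed model every
    column of every row is processed, whatever the query selects.
    """
    return sum(len(blob) for row in rows for blob in row.values())


def to_narrow(rows: List[Row]) -> Dict[Tuple[int, str], bytes]:
    """Refactored layout: one row per (entry index, blob name)."""
    return {
        (index, name): blob
        for index, row in enumerate(rows)
        for name, blob in row.items()
    }


def bytes_read_narrow(narrow: Dict[Tuple[int, str], bytes], wanted: str) -> int:
    """Only rows holding the requested blob are read."""
    return sum(len(blob) for (_, name), blob in narrow.items() if name == wanted)


# 100 hypothetical entries, each with a 10-byte blob and a 10,000-byte blob.
rows = [{"small": b"x" * 10, "large": b"y" * 10_000} for _ in range(100)]

wide_cost = bytes_read_wide(rows, "small")
narrow_cost = bytes_read_narrow(to_narrow(rows), "small")
print(wide_cost, narrow_cost)  # prints "1001000 1000"
```

In this model, asking for only the small blob costs about a thousand times fewer bytes once each blob lives in its own row. The real gain depends on Cassandra's storage engine and version, so the ratio here is illustrative only.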