
Troubleshooting Cassandra column selection to boost database performance

Blog post from Sysdig

Post Details
Company: Sysdig
Author: Gianluca Borello
Word Count: 2,068
Language: English
Summary

Gianluca Borello, an engineer at Sysdig, describes a performance issue the team hit while using Cassandra, a database known for its scalability and flexibility, to store and process streams of binary blobs. Response times degraded when querying large streams because Cassandra processed every column in a row even when the query selected only specific columns. Using tests and system call tracing with sysdig, Borello found that Cassandra was reading entire data files rather than just the requested data, a consequence of how it handles CQL row semantics. The team fixed the problem by refactoring the schema to distribute blobs across multiple rows, which reduced the size of each row and significantly improved query performance. The experience shows the value of monitoring and troubleshooting at the system level: system call analysis can identify and help solve database performance issues without digging into the application's internal code.
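
The effect of the schema change can be illustrated with a toy model. The sketch below is not Cassandra code and does not reproduce the schema from the post. The function names, the `stream_id` key, and the blob names are all hypothetical, and the "bytes read" figure is a simplification that assumes a whole row is processed whenever any of its columns is requested.

```python
from typing import Dict, Tuple

# Hypothetical types for this sketch only.
WideRow = Dict[str, bytes]                  # column name -> blob, all in one row
NarrowRows = Dict[Tuple[str, str], bytes]   # (stream_id, blob name) -> blob


def wide_row_bytes_read(row: WideRow, wanted: str) -> int:
    """Toy model of the original layout: selecting one column
    still processes every column stored in the same row."""
    if wanted not in row:
        raise KeyError(wanted)
    return sum(len(blob) for blob in row.values())


def split_into_rows(stream_id: str, row: WideRow) -> NarrowRows:
    """Toy model of the refactor: each blob gets its own row,
    keyed by the stream and the blob name."""
    return {(stream_id, name): blob for name, blob in row.items()}


def narrow_row_bytes_read(rows: NarrowRows, stream_id: str, wanted: str) -> int:
    """After the refactor, only the requested row is processed."""
    return len(rows[(stream_id, wanted)])


if __name__ == "__main__":
    # Made-up sizes, chosen only to make the difference visible.
    stream = {"header": b"h" * 10, "payload": b"p" * 1000}
    rows = split_into_rows("stream-1", stream)
    print(wide_row_bytes_read(stream, "header"))
    print(narrow_row_bytes_read(rows, "stream-1", "header"))
```

In this model, the cost of a single-column read in the wide layout grows with the total size of the row, while in the split layout it depends only on the requested blob, which is the intuition behind the improvement the summary describes.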