
Leveled Compaction in Apache Cassandra

Blog post from DataStax

Post Details
Company: DataStax
Date Published:
Author: Jonathan Ellis
Word Count: 568
Company Posts That Month: 4
Language: English
Hacker News Points: -
Summary

Cassandra's log-structured storage engine turns all updates into data files called sstables, which are written sequentially to disk; this design underlies both its performance and features such as application-transparent compression. Over time, multiple versions of a row may exist in different sstables, each with a different set of columns. To keep read speed from deteriorating, compaction runs in the background and merges sstables together. Cassandra's size-tiered compaction strategy is similar to the one described in Google's Bigtable paper: it combines sstables once enough similar-sized ones are present. This approach has problems with update-heavy workloads. Cassandra 1.0 introduces the Leveled Compaction Strategy, based on LevelDB from the Chromium team at Google. This strategy creates fixed-size sstables grouped into levels, guarantees that sstables within a level do not overlap, and makes each level ten times as large as the previous one. This addresses the problems with size-tiered compaction, and it can be enabled by setting the compaction_strategy option to LeveledCompactionStrategy. Leveled compaction performs roughly twice as much I/O as size-tiered compaction. That extra I/O pays off for update-heavy workloads, where many obsolete row versions accumulate; workloads with few obsolete row versions benefit less.
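
The tenfold growth between levels can be made concrete with a little arithmetic. The following is a minimal sketch, not Cassandra's actual implementation: the function names are hypothetical, and the 5 MB sstable size and the choice that the first level holds ten sstables are assumptions made for illustration. Only the factor of ten comes from the summary above.

```python
def level_capacities(num_levels, sstable_size_mb=5, fanout=10):
    """Return the assumed maximum size in MB of levels 1..num_levels.

    Assumption: level 1 holds `fanout` sstables, and every later level
    is `fanout` times as large as the one before it.
    """
    return [sstable_size_mb * fanout ** level
            for level in range(1, num_levels + 1)]


def levels_needed(total_data_mb, sstable_size_mb=5, fanout=10):
    """Return the smallest number of levels whose combined capacity
    can hold `total_data_mb` of data under the assumptions above."""
    levels = 0
    capacity_mb = 0
    while capacity_mb < total_data_mb:
        levels += 1
        capacity_mb += sstable_size_mb * fanout ** levels
    return levels


if __name__ == "__main__":
    print(level_capacities(3))   # [50, 500, 5000]
    print(levels_needed(1000))   # 3
```

Because sstables within a level do not overlap, a given row can appear in at most one sstable per level, so under these assumptions the number of levels also bounds how many leveled sstables a single-row read may need to consult. With tenfold growth, that number increases only logarithmically with the amount of data.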
