The Taming of the B-Trees
Blog post from ScyllaDB
Pavel Emelyanov's blog post discusses the implementation and optimization of B-trees and B+-trees in ScyllaDB, a database designed to handle large volumes of data efficiently. Initially, ScyllaDB used Red-Black trees, but performance issues like high memory consumption and CPU usage led to a reevaluation of data structures, particularly focusing on B-trees for in-memory collections. B-trees were favored for their cache locality, which enhances CPU cache usage, and better branch prediction, reducing CPU pipeline flushes. The blog explains how B+-trees, with their separation of real and separation keys, allow for more efficient scans by focusing searches on leaf nodes. It also delves into the trade-offs of memory overhead and performance, particularly when using strict versus light separation modes for keys, and the challenges posed by tree balancing and memory defragmentation. Emelyanov concludes that implementing abstract mathematical concepts like B-trees in practical programming requires careful consideration of numerous factors, underscoring the complexity of optimizing database performance.