Home / Companies / ScyllaDB / Blog / Post Details
Content Deep Dive

The Taming of Collection Scans

Blog post from ScyllaDB

Post Details
Company
Date Published
Author
Pavel "Xemul" Emelyanov
Word Count
2,740
Language
English
Hacker News Points
-
Summary

In a detailed exploration of collection scanning efficiencies, Pavel "Xemul" Emelyanov analyzes various data structures, including arrays, intrusive lists, arrays of pointers, and a novel "split-list" structure, to optimize hardware utilization during data processing tasks. The study reveals that while arrays typically offer the best performance due to cache efficiency, they lack flexibility in independent object management. Intrusive lists allow independent allocation and destruction of elements but fall short in scanning speed compared to arrays. The array of pointers improves scanning performance but incurs significant memory overhead. The proposed split-list structure balances memory efficiency with scanning performance, offering a viable compromise by organizing data into multiple sub-lists processed in parallel, thus enhancing processor parallelism without additional memory requirements. The analysis further notes varying performance outcomes based on processor architecture and compiler optimizations, emphasizing the context-dependent nature of choosing an optimal data structure for task queues.