Hash tables in ClickHouse and C++ Zero-cost Abstractions

Company

ClickHouse

Date Published

May 16, 2023

Author

Maksim Kita

Word count

4060

Language

English

Hacker News points

URL

clickhouse.com/blog/hash-tables-in-clickhouse-and-zero-cost-abstractions

Summary

Hash tables are the core data structure used in ClickHouse for aggregation and join operations. They provide constant-time insert, lookup, and delete operations on average. The design of a hash table is critical for performance, and the best design depends on factors such as the key type, the number of unique keys, and the load factor. ClickHouse uses a zero-cost C++ framework to generate hash tables tailored to specific use cases. The framework follows a policy-based design that lets users customize the hash function, allocator, cell, grower, and the hash table itself. It also supports features such as zero-value storage and custom resizing policies, and it can serve as the basis for LRU caches. Specialized hash tables are available for various scenarios, including small tables, string hash tables, and two-level hash tables. The framework is optimized for cache locality, making it suitable for large-scale data aggregation and join operations.
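
To make the policy-based design concrete, here is a minimal sketch of an open-addressing hash map whose hash function and grower are template parameters. It is not ClickHouse's actual code: the names `SketchHashMap`, `MixHash`, `PowerOfTwoGrower`, and `countKeys` are hypothetical, and the 50% load-factor limit, linear probing, and `std::vector` storage (in place of an allocator policy) are assumptions chosen for brevity. The sketch also illustrates one way zero-value storage can work: key 0 marks an empty cell, so a real key of 0 is kept outside the cell array.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical grower policy: power-of-two capacity, grow above 50% load.
struct PowerOfTwoGrower {
    std::size_t degree = 4;  // capacity is 2^degree
    std::size_t capacity() const { return std::size_t{1} << degree; }
    std::size_t place(std::size_t hash) const { return hash & (capacity() - 1); }
    std::size_t next(std::size_t pos) const { return (pos + 1) & (capacity() - 1); }
    bool overflow(std::size_t elems) const { return elems > capacity() / 2; }
    void grow() { ++degree; }
};

// Hypothetical hash policy: a simple 64-bit bit mixer.
struct MixHash {
    std::size_t operator()(std::uint64_t x) const {
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return static_cast<std::size_t>(x);
    }
};

// The policies are resolved at compile time, so swapping them adds no
// virtual calls or other runtime indirection.
template <typename Mapped, typename Hash = MixHash, typename Grower = PowerOfTwoGrower>
class SketchHashMap {
    struct Cell {
        std::uint64_t key = 0;  // 0 means "empty cell"
        Mapped value{};
    };

    Grower grower_;
    Hash hash_;
    std::vector<Cell> cells_;
    std::size_t size_ = 0;   // number of non-zero keys stored in cells_
    bool has_zero_ = false;  // zero-value storage: key 0 lives outside cells_
    Mapped zero_value_{};

    // Linear probing: stop at the matching key or at the first empty cell.
    // Terminates because the grower keeps the table at most half full.
    std::size_t probe(const std::vector<Cell> & cells, const Grower & grower,
                      std::uint64_t key) const {
        std::size_t pos = grower.place(hash_(key));
        while (cells[pos].key != 0 && cells[pos].key != key)
            pos = grower.next(pos);
        return pos;
    }

    // Rehash every occupied cell into a larger array.
    void resize() {
        Grower bigger = grower_;
        bigger.grow();
        std::vector<Cell> fresh(bigger.capacity());
        for (Cell & cell : cells_)
            if (cell.key != 0)
                fresh[probe(fresh, bigger, cell.key)] = std::move(cell);
        cells_.swap(fresh);
        grower_ = bigger;
    }

public:
    SketchHashMap() : cells_(grower_.capacity()) {}

    // Returns the mapped value for key, inserting a default value if absent.
    Mapped & operator[](std::uint64_t key) {
        if (key == 0) {
            has_zero_ = true;
            return zero_value_;
        }
        std::size_t pos = probe(cells_, grower_, key);
        if (cells_[pos].key == 0) {
            if (grower_.overflow(size_ + 1)) {
                resize();
                pos = probe(cells_, grower_, key);
            }
            cells_[pos].key = key;
            ++size_;
        }
        return cells_[pos].value;
    }

    // Returns a pointer to the mapped value, or nullptr if key is absent.
    const Mapped * find(std::uint64_t key) const {
        if (key == 0)
            return has_zero_ ? &zero_value_ : nullptr;
        std::size_t pos = probe(cells_, grower_, key);
        return cells_[pos].key == key ? &cells_[pos].value : nullptr;
    }

    std::size_t size() const { return size_ + (has_zero_ ? 1 : 0); }
};

// Usage: count occurrences of each key, as a GROUP BY-style aggregation might.
inline SketchHashMap<std::uint64_t> countKeys(const std::vector<std::uint64_t> & keys) {
    SketchHashMap<std::uint64_t> counts;
    for (std::uint64_t key : keys)
        ++counts[key];
    return counts;
}
```

Because each policy is an ordinary template argument, a different resizing strategy or hash function can be substituted (for example `SketchHashMap<int, MixHash, SomeOtherGrower>`, where `SomeOtherGrower` is any type with the same five member functions) without touching the table logic.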