
Hash tables in ClickHouse and C++ Zero-cost Abstractions

Blog post from ClickHouse

Post Details
Company: ClickHouse
Date Published:
Author: Maksim Kita
Word Count: 4,060
Language: English
Hacker News Points: 3
Summary

Hash tables are the core data structure behind aggregation and join operations in ClickHouse. They provide constant average-time insert, lookup, and delete operations. Hash table design is critical for performance, and the best design depends on factors such as the key type, the number of unique keys, and the load factor. ClickHouse uses a C++ framework built on zero-cost abstractions to generate a hash table suited to each specific use case. The framework follows a policy-based design in which the hash function, allocator, cell, grower, and the hash table itself can each be customized. It also supports zero-value storage, custom resizing policies, and LRU caches. Specialized hash tables cover scenarios such as small tables, string keys, and two-level hash tables. The design is tuned for cache locality, which makes it suitable for large-scale aggregation and join workloads.
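
To make the policy-based design concrete, here is a minimal C++ sketch of an open-addressing hash map whose hash function and grower are template parameters. It is an illustration under stated assumptions, not ClickHouse's actual code: the names SketchHashMap, MultiplicativeHash, PowerOfTwoGrower, and countKeys are hypothetical, keys are fixed to 64-bit integers, and the allocator and cell policies are left out for brevity. The sketch assumes a key of zero marks an empty cell, so the zero key is kept in separate storage, which is one plausible reading of "zero-value storage".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical grower policy: power-of-two capacity, grow above 50% load.
struct PowerOfTwoGrower {
    std::size_t degree = 4;  // capacity is 2^degree
    std::size_t capacity() const { return std::size_t{1} << degree; }
    std::size_t place(std::size_t hash) const { return hash & (capacity() - 1); }
    std::size_t next(std::size_t pos) const { return (pos + 1) & (capacity() - 1); }
    bool overflow(std::size_t elems) const { return elems > capacity() / 2; }
    void grow() { ++degree; }
};

// Hypothetical hash policy: multiplicative hashing, keeping the high bits.
struct MultiplicativeHash {
    std::size_t operator()(std::uint64_t key) const {
        return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ULL) >> 32);
    }
};

// Policies are template parameters, so calls into them resolve at compile
// time and can be inlined (no virtual dispatch).
template <typename Mapped, typename Hash = MultiplicativeHash,
          typename Grower = PowerOfTwoGrower>
class SketchHashMap {
    struct Cell { std::uint64_t key = 0; Mapped value{}; };  // key 0 == empty

    Hash hash_;
    Grower grower_;
    std::vector<Cell> cells_;
    std::size_t size_ = 0;    // number of non-zero keys stored in cells_
    bool has_zero_ = false;   // assumed zero-value storage: key 0 lives here
    Mapped zero_value_{};

    // Linear probing: stop at the matching key or at the first empty cell.
    std::size_t findCell(std::uint64_t key) const {
        std::size_t pos = grower_.place(hash_(key));
        while (cells_[pos].key != 0 && cells_[pos].key != key)
            pos = grower_.next(pos);
        return pos;
    }

    // Grow the cell array and reinsert every occupied cell.
    void resize() {
        std::vector<Cell> old = std::move(cells_);
        grower_.grow();
        cells_.assign(grower_.capacity(), Cell{});
        for (Cell& c : old)
            if (c.key != 0) cells_[findCell(c.key)] = std::move(c);
    }

public:
    SketchHashMap() : cells_(grower_.capacity()) {}

    // Returns a reference to the mapped value, inserting a default if absent.
    Mapped& operator[](std::uint64_t key) {
        if (key == 0) { has_zero_ = true; return zero_value_; }
        std::size_t pos = findCell(key);
        if (cells_[pos].key == 0) {
            cells_[pos].key = key;
            ++size_;
            if (grower_.overflow(size_)) {
                resize();
                pos = findCell(key);  // the cell moved during resize
            }
        }
        return cells_[pos].value;
    }

    // Returns a pointer to the mapped value, or nullptr if the key is absent.
    const Mapped* find(std::uint64_t key) const {
        if (key == 0) return has_zero_ ? &zero_value_ : nullptr;
        const Cell& c = cells_[findCell(key)];
        return c.key == key ? &c.value : nullptr;
    }

    std::size_t size() const { return size_ + (has_zero_ ? 1 : 0); }
    std::size_t capacity() const { return grower_.capacity(); }
};

// Usage sketch: GROUP BY-style counting of key occurrences.
inline SketchHashMap<std::uint64_t> countKeys(const std::vector<std::uint64_t>& keys) {
    SketchHashMap<std::uint64_t> counts;
    for (std::uint64_t k : keys) ++counts[k];
    return counts;
}
```

Swapping in a different grower (for example, one with a smaller initial degree for small tables) or a different hash functor changes the table's behavior without touching the probing logic. Keeping all cells in one contiguous array with linear probing is also what gives this layout its cache locality.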