Index-based pruning in ClickHouse
Blog post from ClickHouse
The blog post explores three index-based pruning techniques that ClickHouse uses to speed up queries by reading less data, with a dataset of UK property sales as the running example. The primary index, a sparse index over the table's primary key, lets ClickHouse skip entire granules whose key range cannot satisfy the filter condition. Lightweight projections offer secondary indexing without duplicating full rows: they store only a sorting key and a pointer to the base table, which helps when a query filters on non-primary-key columns. The minmax index, a type of skip index, records the minimum and maximum values of a column for each granule or block of granules, and it prunes effectively only when the indexed column is correlated with the primary key. The post demonstrates these techniques with real-world queries on a substantial dataset, showing how they reduce both query times and the amount of data processed.
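
The correlation requirement for minmax pruning can be illustrated with a small simulation. This is a minimal sketch in plain Python, not ClickHouse code: the granule size of 4, the price lists, and the helper names `build_minmax` and `granules_to_read` are all hypothetical and chosen only for illustration.

```python
GRANULE_SIZE = 4  # hypothetical toy value; real granules hold thousands of rows


def build_minmax(values, granule_size=GRANULE_SIZE):
    """Return one (min, max) pair per granule of `values`, in storage order."""
    return [
        (min(values[i:i + granule_size]), max(values[i:i + granule_size]))
        for i in range(0, len(values), granule_size)
    ]


def granules_to_read(minmax, low, high):
    """Indices of granules whose [min, max] range overlaps the filter [low, high].

    A granule is skipped only when its range cannot contain a matching row.
    """
    return [
        i for i, (g_min, g_max) in enumerate(minmax)
        if g_max >= low and g_min <= high
    ]


# Assumed data: rows are stored in primary-key order. In the first list the
# price rises along with that order; in the second it is scattered.
prices_correlated = [100, 110, 120, 130, 200, 210, 220, 230, 300, 310, 320, 330]
prices_uncorrelated = [100, 330, 210, 120, 300, 110, 230, 320, 130, 220, 310, 200]

# Filter: price BETWEEN 205 AND 225
selected_correlated = granules_to_read(build_minmax(prices_correlated), 205, 225)
selected_uncorrelated = granules_to_read(build_minmax(prices_uncorrelated), 205, 225)

print(selected_correlated)    # [1]
print(selected_uncorrelated)  # [0, 1, 2]
```

With correlated data only one of three granules has to be read. With scattered data every granule's range spans the filter, so nothing is pruned even though the index exists.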