Streaming secondary indices: incremental, demand-driven index evaluation
Blog post from ClickHouse
ClickHouse 25.9 introduces streaming secondary indices, which interleave index evaluation with data reads instead of scanning the secondary indices in full before query execution begins. Index evaluation becomes incremental and demand-driven, which reduces latency and memory usage. Previously, ClickHouse scanned secondary indices upfront to determine which granules might contain matching rows. That upfront pass could be wasteful, particularly for highly selective queries or queries with a LIMIT clause, because the entire index was evaluated before the first row could be read. With the new approach, index entries are checked concurrently with data reads, and both stop as soon as the query's requirements are met, which eliminates unnecessary work and the startup delay. In demonstrations on large datasets, streaming indices sped up query execution and reduced memory usage, most notably for queries that can terminate early because of a LIMIT.
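The difference between the two strategies can be illustrated with a small conceptual model. The Python sketch below is not ClickHouse code: every name in it (`Stats`, `matching_granules`, `read_with_limit`, `upfront_scan`, `streaming_scan`) is hypothetical, a min/max pair per granule stands in for a secondary index, and a lazy generator stands in for the concurrent index evaluation described above. It only shows why evaluating the index on demand lets a LIMIT query stop early.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Granule = List[int]
MinMax = Tuple[int, int]  # hypothetical stand-in for one secondary index entry


@dataclass
class Stats:
    index_checks: int = 0   # index entries evaluated
    granules_read: int = 0  # granules actually read


def matching_granules(index: List[MinMax], target: int, stats: Stats) -> Iterator[int]:
    """Lazily yield ids of granules whose [min, max] range may contain target."""
    for granule_id, (lo, hi) in enumerate(index):
        stats.index_checks += 1
        if lo <= target <= hi:
            yield granule_id


def read_with_limit(granule_ids: Iterable[int], granules: List[Granule],
                    target: int, limit: int, stats: Stats) -> List[int]:
    """Read candidate granules in order and stop once `limit` rows matched."""
    rows: List[int] = []
    if limit <= 0:
        return rows
    for granule_id in granule_ids:
        stats.granules_read += 1
        for value in granules[granule_id]:
            if value == target:
                rows.append(value)
                if len(rows) == limit:
                    return rows  # stop pulling candidates from the index
    return rows


def upfront_scan(granules, index, target, limit, stats):
    # Old model: evaluate the whole index first, materializing all candidates.
    candidates = list(matching_granules(index, target, stats))
    return read_with_limit(candidates, granules, target, limit, stats)


def streaming_scan(granules, index, target, limit, stats):
    # Streaming model: index entries are evaluated only as the reader asks.
    candidates = matching_granules(index, target, stats)
    return read_with_limit(candidates, granules, target, limit, stats)


# Toy data: 1000 granules of 4 rows each; granule i holds the value i % 5.
GRANULES: List[Granule] = [[i % 5] * 4 for i in range(1000)]
INDEX: List[MinMax] = [(min(g), max(g)) for g in GRANULES]
```

Running both functions with `target=3` and `limit=2` on this toy data returns the same two rows, but the upfront variant evaluates all 1000 index entries and holds the full candidate list in memory, while the streaming variant evaluates only the entries up to the first matching granule. The real system overlaps index checks and reads rather than strictly alternating them, so this single-threaded model captures the early-termination effect, not the concurrency.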