Introducing Inverted Indices in ClickHouse

Blog post from ClickHouse

Post Details
Company: ClickHouse
Author: Robert Schulze
Word Count: 2,372
Language: English
Hacker News Points: 28
Summary

Inverted indexes enable fast search over large collections of text documents by mapping each term to the documents that contain it. ClickHouse, an open-source column-oriented database management system, has recently introduced experimental support for inverted indexes, which can significantly improve query performance when used effectively. An inverted index is stored as three files: a metadata file, a dictionary file, and a posting list file. The dictionary file contains a minimized finite state transducer (FST) that maps terms to posting list offsets, while the posting list file stores a compressed list of row positions for each term. Posting lists are compressed with the Roaring bitmap format, which lets ClickHouse store and retrieve them efficiently and evaluate disjunctions and conjunctions of search terms quickly. The post demonstrates the feature on a real-world dataset, where queries run faster with the index than without it.
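
To make the term-to-posting-list idea concrete, here is a minimal Python sketch. It is not ClickHouse's implementation: a plain dict stands in for the FST dictionary, sorted Python lists stand in for Roaring-compressed posting lists, and the names `tokenize`, `build_index`, `search_all`, and `search_any` are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(rows):
    """Map each term to the sorted list of row positions that contain it."""
    postings = defaultdict(set)
    for row_id, text in enumerate(rows):
        for term in tokenize(text):
            postings[term].add(row_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, terms):
    """Conjunction: rows containing every term (intersection of posting lists)."""
    lists = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*lists)) if lists else []


def search_any(index, terms):
    """Disjunction: rows containing at least one term (union of posting lists)."""
    result = set()
    for t in terms:
        result.update(index.get(t.lower(), ()))
    return sorted(result)


# Toy rows standing in for a text column; not data from the post.
rows = [
    "ClickHouse adds inverted indexes",
    "Roaring bitmaps compress posting lists",
    "Inverted indexes store posting lists",
]
index = build_index(rows)
```

Calling `search_all(index, ["inverted", "posting"])` intersects the two posting lists, while `search_any(index, ["roaring", "clickhouse"])` unions them. These set operations are the ones a compressed bitmap format is meant to make cheap at scale.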