Company
Date Published
Author
Sreekanth Sivasankaran
Word count
710
Language
English
Hacker News points
None

Summary

The Full Text Search (FTS) indexing system in Couchbase has evolved with the introduction of Scorch, an optimized index type that addresses scalability concerns and improves performance. Scorch replaced the previous upside_down index format, which suffered from heavy index size amplification, missed opportunities for data deduplication, and a representation poorly suited to natural language queries. Scorch follows a segment-based index architecture. Each segment is composed of a term dictionary, postings lists, frequency, norm, and location details, stored fields, and docValues. These components are optimized using Finite State Transducers, bitmap representations, varint encoding, compression techniques, and columnar representation. Together, these optimizations reduce index size by up to 4X and improve query performance by up to 20X for many queries.
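
To make the varint encoding mentioned above concrete, here is a minimal Python sketch of how a sorted postings list of document numbers can be stored as variable-length deltas. It is an illustration only and not Scorch's actual on-disk format; the function names (encode_varint, encode_postings, decode_postings) and the sample document numbers are hypothetical.

```python
from typing import List


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a little-endian base-128 varint."""
    if n < 0:
        raise ValueError("varint encoding requires a non-negative integer")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte of this value
            return bytes(out)


def decode_varints(data: bytes) -> List[int]:
    """Decode a byte string made of back-to-back varints."""
    values = []
    current = 0
    shift = 0
    for b in data:
        current |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(current)
            current = 0
            shift = 0
    return values


def encode_postings(doc_ids: List[int]) -> bytes:
    """Store a strictly increasing list of doc numbers as varint deltas."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += encode_varint(doc_id - prev)  # small gaps need fewer bytes
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Rebuild the doc numbers by summing the decoded deltas."""
    doc_ids = []
    total = 0
    for delta in decode_varints(data):
        total += delta
        doc_ids.append(total)
    return doc_ids


postings = [1000, 1003, 1010, 1200, 5000]  # hypothetical doc numbers
encoded = encode_postings(postings)
```

In this sketch the five document numbers occupy 8 bytes, compared with 20 bytes as fixed 32-bit integers, and decode_postings(encoded) returns the original list. The saving comes from the fact that gaps between neighbouring entries in a sorted postings list are usually much smaller than the document numbers themselves.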