Tripling the query performance of lexical search
Blog post from Vespa
To improve the efficiency of lexical search, Vespa has introduced several tunable query-time optimizations that can triple the performance of natural language text search. The optimizations reduce the precision required when matching very common words, filter such words out automatically, and shrink the number of internal result candidates to lower ranking cost, all at the price of a small expected loss in result quality. They are available from Vespa 8.503.27 onward and complement hybrid search setups that combine lexical and vector retrieval to balance specificity and semantics.

The post details how these optimizations reduce the cost of scanning large posting lists and expose three parameters, filter-threshold, stopword-limit, and adjust-target, which can be set in a rank profile or overridden per query. Experiments on datasets from the BEIR benchmark show that the changes improve query performance without significantly affecting result quality, especially when used together with Vespa's weakAnd query operator and a modified mmap madvise setting that makes better use of the I/O subsystem. The outcome is reduced hardware cost, lower query latency, and improved stability, particularly in environments where the index does not fit in physical memory.
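As a sketch of how these knobs might be configured, the rank profile below sets all three parameters. The placement of the settings and the values shown are illustrative assumptions, not taken from the post; consult the Vespa schema reference for the exact syntax and sensible defaults.

```
schema doc {
    rank-profile optimized {
        # Treat very common terms as filters (posting lists without
        # full rank precision) above this document-frequency fraction.
        filter-threshold: 0.05

        weakand {
            # Automatically drop terms more frequent than this fraction
            # of the corpus from weakAnd scoring.
            stopword-limit: 0.6
            # Shrink the internal candidate target to cut ranking cost.
            adjust-target: 0.01
        }
    }
}
```

According to the post, the same settings can also be overridden on a per-query basis rather than fixed in the rank profile; the corresponding query-property names are not given here and should be looked up in the Vespa query API reference.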