
Announcing support for global significance models

Blog post from Vespa

Post Details
Company
Date Published
Author
Gleb Sizov
Word Count
787
Language
English
Summary

Vespa has introduced support for global significance models, which improve ranking accuracy for streaming search and ensure consistent results in multi-node deployments that use indexed mode. Significance measures how rare a term is within a document collection, and it determines the term's weight in ranking functions such as bm25 and nativeRank. Previously, significance values were calculated locally on each content node, which led to non-deterministic results. The new global models apply the same significance values across all content nodes, benefiting both indexed and streaming scenarios. Experiments on the NFCorpus, TREC-COVID, and MS MARCO datasets showed improved ranking quality for streaming search with global models, although slight decreases were observed in indexed mode on the larger, general-domain datasets. For small collections, models built from external data such as Wikipedia work well; for larger collections, it is advisable to generate the model from the documents themselves. The feature is available from Vespa version 8.426.8, incurs no additional performance cost, and is described in more detail in the significance model documentation.
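To illustrate why locally computed significance can differ between content nodes, here is a minimal sketch in Python. It assumes a textbook BM25-style inverse document frequency as a stand-in for significance; this is not Vespa's actual formula or API, and the two-node corpus and all function names are hypothetical.

```python
import math
from collections import Counter


def idf(doc_freq: int, doc_count: int) -> float:
    """Textbook BM25-style IDF, used here as a stand-in for significance.

    Assumption: this is NOT Vespa's exact significance formula.
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def doc_frequencies(docs: list[str]) -> Counter:
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df


def local_significance(term: str, node_docs: list[str]) -> float:
    """Significance computed from the documents on a single node only."""
    return idf(doc_frequencies(node_docs)[term], len(node_docs))


def global_significance(term: str, all_nodes: list[list[str]]) -> float:
    """Significance computed once from the documents on all nodes."""
    docs = [doc for node in all_nodes for doc in node]
    return idf(doc_frequencies(docs)[term], len(docs))


# Hypothetical corpus split unevenly across two content nodes.
node_a = ["covid vaccine trial", "covid symptoms", "covid treatment"]
node_b = ["diet and nutrition", "vitamin d nutrition", "covid nutrition"]

local_a = local_significance("covid", node_a)    # "covid" is common on node A
local_b = local_significance("covid", node_b)    # "covid" is rare on node B
global_sig = global_significance("covid", [node_a, node_b])

print(f"node A: {local_a:.3f}  node B: {local_b:.3f}  global: {global_sig:.3f}")
```

In this toy setup the term "covid" gets a low weight on node A and a high weight on node B, so the same query term contributes differently depending on where a document lives. A single global value removes that discrepancy.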