Result diversification using Vespa result grouping
Blog post from Vespa
Result diversification in Vespa reorders ranked lists of documents so that search and recommendation results show more variety. Vespa executes queries in a distributed fashion: a query is fanned out across content nodes, each node produces a locally ranked list of top hits, and these lists are merged into a globally ranked list. Vespa's grouping language groups, aggregates, and presents query matches, which makes it possible to build diversified result sets. The system supports dense retrieval with nearest neighbor search and offers options to control group ordering and to implement more complex post-processing logic. The post recommends a phased execution: first select candidates efficiently using bucketing, then apply more complex diversity functions to the top-ranked lists. Serving performance depends on the number of matches per node, the number of unique values in the grouped field, and the number of nodes involved, and techniques are available to limit the number of matches and so reduce cost. Vespa supports both single-level and multi-level grouping expressions.
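The merge-then-bucket flow described above can be illustrated with a small Python sketch. This is not Vespa code and does not use any Vespa API; it only mimics the idea under stated assumptions. The function names (`merge_node_results`, `diversify_by_field`), the hit layout (`id`, `score`, `category`), and the caps on groups and hits per group are all hypothetical and chosen for illustration.

```python
import heapq
from itertools import islice


def merge_node_results(node_results, limit):
    """Merge per-node hit lists (each sorted by descending score)
    into one globally ranked list of at most `limit` hits."""
    merged = heapq.merge(*node_results, key=lambda hit: -hit["score"])
    return list(islice(merged, limit))


def diversify_by_field(ranked_hits, field, max_groups, max_hits_per_group):
    """Bucket a globally ranked list by `field`, keeping at most
    `max_groups` buckets and `max_hits_per_group` hits per bucket.
    Buckets appear in the order of their best-scoring hit."""
    groups = {}  # dicts preserve insertion order
    for hit in ranked_hits:
        key = hit[field]
        if key not in groups:
            if len(groups) >= max_groups:
                continue  # bucket limit reached, skip new field values
            groups[key] = []
        if len(groups[key]) < max_hits_per_group:
            groups[key].append(hit)
    return groups


# Hypothetical locally ranked lists from two content nodes.
node_a = [
    {"id": "a1", "score": 0.95, "category": "shoes"},
    {"id": "a2", "score": 0.90, "category": "shoes"},
    {"id": "a3", "score": 0.60, "category": "hats"},
]
node_b = [
    {"id": "b1", "score": 0.92, "category": "shoes"},
    {"id": "b2", "score": 0.85, "category": "bags"},
    {"id": "b3", "score": 0.50, "category": "hats"},
]

global_hits = merge_node_results([node_a, node_b], limit=6)
diverse = diversify_by_field(
    global_hits, "category", max_groups=3, max_hits_per_group=2
)

for category, hits in diverse.items():
    print(category, [h["id"] for h in hits])
```

Without the per-bucket cap, the top three global hits would all come from the same category. A more complex diversity function could then be applied to the small bucketed output rather than to every match, which is the point of the phased approach.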