
Pretrained Transformer Language Models for Search - part 2

Blog post from Vespa

Post Details

Company: Vespa
Date Published:
Author: Jo Kristian Bergum
Word Count: 2,643
Language: English
Hacker News Points: -
Summary

The blog post examines how pretrained transformer language models fit into a multiphase retrieval and ranking pipeline, using Vespa.ai to evaluate the models on the MS Marco Passage Ranking dataset. It shows that miniature transformer models with only 22M parameters can outperform larger ensemble models. The discussion covers efficient candidate retrievers: sparse retrieval using lexical term matching accelerated by the WAND algorithm, and dense retrieval using learned vector representations with approximate nearest neighbor (ANN) search. The post also explores hybrid methods that combine dense and sparse retrieval to improve recall, and its analysis shows that dense retrievers deliver substantial recall improvements over sparse retrievers while requiring fewer hits to be re-ranked. The series promises further exploration of re-ranking with ColBERT and cross-encoders in subsequent posts.
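
To make the two candidate-retrieval strategies concrete, below is a minimal sketch (not code from the original post) of how such queries can be issued against a Vespa application over its HTTP query API. The endpoint URL, document source name ("passage"), field names ("text", "text_embedding"), and rank profile names ("bm25", "dense") are assumptions; adapt them to your own application package, and note that tensor-input parameter names can differ between Vespa versions.

```python
# Sketch only: sparse (weakAnd/WAND) and dense (nearestNeighbor/ANN) candidate
# retrieval against an assumed local Vespa application. Schema, field, and
# rank profile names are placeholders.
import requests

VESPA_ENDPOINT = "http://localhost:8080/search/"  # assumed local Vespa container


def sparse_retrieve(query: str, hits: int = 10) -> dict:
    """Sparse candidate retrieval: lexical term matching with weakAnd (WAND pruning)."""
    body = {
        "yql": "select * from sources passage where userQuery()",
        "query": query,
        "model.defaultIndex": "text",
        "model.type": "weakAnd",      # OR-like semantics accelerated by WAND
        "ranking.profile": "bm25",    # assumed lexical rank profile
        "hits": hits,
    }
    return requests.post(VESPA_ENDPOINT, json=body, timeout=10).json()


def dense_retrieve(query_embedding: list, target_hits: int = 1000, hits: int = 10) -> dict:
    """Dense candidate retrieval: approximate nearest neighbor search over a vector field."""
    body = {
        "yql": (
            "select * from sources passage where "
            f"{{targetHits: {target_hits}}}"
            "nearestNeighbor(text_embedding, query_embedding)"
        ),
        "ranking.profile": "dense",                       # assumed dense rank profile
        "input.query(query_embedding)": query_embedding,  # query-side vector
        "hits": hits,
    }
    return requests.post(VESPA_ENDPOINT, json=body, timeout=10).json()
```

A hybrid retriever of the kind the post evaluates can then be expressed either by combining the two operators in a single YQL where-clause or by fusing the two result sets client-side before re-ranking.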