
Pretrained Transformer Language Models for Search - part 4

Blog post from Vespa

Post Details

Company: Vespa
Date Published: -
Author: Jo Kristian Bergum
Word Count: 2,457
Language: English
Hacker News Points: -
Summary

This post, the fourth in a series on transformer models for search and document ranking, shows how Vespa.ai can implement a multiphase retrieval and ranking pipeline, evaluated on the MS Marco Passage ranking dataset. The series demonstrates that a small transformer model with only 22 million parameters can achieve near state-of-the-art ranking accuracy, outperforming models with billions of parameters.

This final post introduces a cross-encoder model as the last stage of the pipeline. The cross-encoder performs all-to-all interaction between the query and the passage, is based on a 6-layer MiniLM model, and is fine-tuned for binary classification on the MS Marco passage dataset. To accelerate inference, the model is quantized to an int8 representation.

The post details how to configure and deploy the model in Vespa, including integration via the ONNX format, and benchmarks its latency and throughput against other retrieval methods. It also discusses the trade-off between accuracy and performance: the cross-encoder significantly increases computational cost but substantially improves ranking accuracy, making it a viable option for production environments where accuracy is critical.
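To illustrate the all-to-all query-passage interaction described in the summary, here is a minimal reranking sketch in Python using the sentence-transformers library. It assumes the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint, a 6-layer MiniLM fine-tuned on MS Marco passage ranking; whether this is the exact checkpoint used in the post is an assumption, not something the summary states.

```python
# Minimal cross-encoder reranking sketch (not the post's exact code).
# Assumes the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint,
# a 6-layer MiniLM fine-tuned on MS Marco passage ranking.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "what is the boiling point of water"
passages = [
    "Water boils at 100 degrees Celsius at sea level.",
    "The freezing point of water is 0 degrees Celsius.",
]

# The cross-encoder scores each (query, passage) pair jointly, attending
# across all query and passage tokens (the all-to-all interaction).
scores = model.predict([(query, p) for p in passages])

# Rerank passages by descending relevance score.
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {passage}")
```

Because every query-passage pair requires a full forward pass, this stage is only applied to the small candidate set surviving the earlier, cheaper ranking phases.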
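The summary also mentions exporting the model to ONNX and quantizing it to int8 to accelerate inference. Below is a sketch of one way to do this with Hugging Face transformers and onnxruntime; the file names, opset version, and checkpoint are illustrative assumptions, not details taken from the post.

```python
# Sketch: export the cross-encoder to ONNX, then quantize weights to int8.
# File names, opset version, and checkpoint are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from onnxruntime.quantization import quantize_dynamic, QuantType

name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.config.return_dict = False  # export a plain tuple output
model.eval()

# Trace the model with a dummy (query, passage) pair.
inputs = tokenizer("example query", "example passage", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "ranker.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)

# Dynamic quantization rewrites weight tensors as int8, shrinking the
# model file and speeding up CPU inference.
quantize_dynamic("ranker.onnx", "ranker-int8.onnx", weight_type=QuantType.QInt8)
```

The resulting ONNX file is the kind of artifact that can then be packaged into a Vespa application and evaluated inside a ranking expression, as the post describes.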
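Finally, since the post benchmarks latency and throughput, a rough single-threaded timing loop over onnxruntime sessions might look like the following; the batch size, sequence length, and iteration count are arbitrary choices for the sketch, not the post's benchmark setup.

```python
# Sketch: rough latency comparison of the fp32 and int8 ONNX models.
# Assumes the files produced by the export sketch above.
import time
import numpy as np
import onnxruntime as ort

def bench(path, iterations=100, seq_len=128):
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    feed = {
        "input_ids": np.random.randint(0, 30000, (1, seq_len), dtype=np.int64),
        "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    }
    session.run(None, feed)  # warm-up run
    start = time.perf_counter()
    for _ in range(iterations):
        session.run(None, feed)
    return (time.perf_counter() - start) / iterations * 1000.0  # ms per pass

print(f"fp32: {bench('ranker.onnx'):.1f} ms/pair")
print(f"int8: {bench('ranker-int8.onnx'):.1f} ms/pair")
```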