
Pretrained Transformer Language Models for Search - part 4

Blog post from Vespa

Post Details

Company: Vespa
Date Published: -
Author: Jo Kristian Bergum
Word Count: 2,457
Language: English
Hacker News Points: -
Summary

This post, the fourth in a series on transformer models for search and document ranking, shows how Vespa.ai can implement a multiphase retrieval and ranking pipeline, evaluated on the MS Marco Passage ranking dataset. The series demonstrates that a small transformer model with only 22 million parameters can achieve near state-of-the-art ranking accuracy, outperforming models with billions of parameters.

This final post introduces a cross-encoder model as the last stage of the pipeline. The cross-encoder performs all-to-all interaction between the query and the passage, is based on a 6-layer MiniLM model, and is fine-tuned for binary classification on the MS Marco passage dataset. To accelerate inference, the model is quantized to an int8 representation.

The post details how to configure and deploy the model in Vespa, including integration via the ONNX format, and benchmarks its latency and throughput against other retrieval methods. It also discusses the trade-off between accuracy and performance: the cross-encoder significantly increases computational cost but substantially improves ranking accuracy, making it a viable option for production environments where accuracy is critical.
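To illustrate the all-to-all query-passage interaction described in the summary, here is a minimal reranking sketch in Python using the sentence-transformers library. It assumes the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint, a 6-layer MiniLM fine-tuned on MS Marco passage ranking; whether this is the exact checkpoint used in the post is an assumption, not something the summary states.

```python
# Minimal cross-encoder reranking sketch (not the post's exact code).
# Assumes the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint,
# a 6-layer MiniLM fine-tuned on MS Marco passage ranking.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "what is the boiling point of water"
passages = [
    "Water boils at 100 degrees Celsius at sea level.",
    "The freezing point of water is 0 degrees Celsius.",
]

# The cross-encoder scores each (query, passage) pair jointly, attending
# across all query and passage tokens (the all-to-all interaction).
scores = model.predict([(query, p) for p in passages])

# Rerank passages by descending relevance score.
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {passage}")
```

Because every query-passage pair requires a full forward pass, this stage is only applied to the small candidate set surviving the earlier, cheaper ranking phases.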
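The summary also mentions exporting the model to ONNX and quantizing it to int8 to accelerate inference. Below is a sketch of one way to do this with Hugging Face transformers and onnxruntime; the file names, opset version, and checkpoint are illustrative assumptions, not details taken from the post.

```python
# Sketch: export the cross-encoder to ONNX, then quantize weights to int8.
# File names, opset version, and checkpoint are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from onnxruntime.quantization import quantize_dynamic, QuantType

name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.config.return_dict = False  # export a plain tuple output
model.eval()

# Trace the model with a dummy (query, passage) pair.
inputs = tokenizer("example query", "example passage", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "ranker.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)

# Dynamic quantization rewrites weight tensors as int8, shrinking the
# model file and speeding up CPU inference.
quantize_dynamic("ranker.onnx", "ranker-int8.onnx", weight_type=QuantType.QInt8)
```

The resulting ONNX file is the kind of artifact that can then be packaged into a Vespa application and evaluated inside a ranking expression, as the post describes.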
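Finally, since the post benchmarks latency and throughput, a rough single-threaded timing loop over onnxruntime sessions might look like the following; the batch size, sequence length, and iteration count are arbitrary choices for the sketch, not the post's benchmark setup.

```python
# Sketch: rough latency comparison of the fp32 and int8 ONNX models.
# Assumes the files produced by the export sketch above.
import time
import numpy as np
import onnxruntime as ort

def bench(path, iterations=100, seq_len=128):
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    feed = {
        "input_ids": np.random.randint(0, 30000, (1, seq_len), dtype=np.int64),
        "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    }
    session.run(None, feed)  # warm-up run
    start = time.perf_counter()
    for _ in range(iterations):
        session.run(None, feed)
    return (time.perf_counter() - start) / iterations * 1000.0  # ms per pass

print(f"fp32: {bench('ranker.onnx'):.1f} ms/pair")
print(f"int8: {bench('ranker-int8.onnx'):.1f} ms/pair")
```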