Accelerating stateless model evaluation on Vespa

Post Details

Company

Vespa

Date Published

July 5, 2021

Author

Lester Solbakken

Word Count

2,164

Language

English

Hacker News Points

-

Source URL

blog.vespa.ai/stateless-model-evaluation

Summary

Vespa.ai now accelerates stateless model evaluation in its container cluster using ONNX Runtime, so documents and queries can be processed and transformed efficiently before they are stored or executed. This opens up use cases that traditionally ran in the content cluster, such as generating vector representations of natural language text and enabling nearest neighbor retrieval. Vespa supports both stateless and stateful model evaluation. Container nodes handle stateless evaluation, where a model is evaluated once per query or document. Content nodes handle stateful evaluation, where a model is evaluated many times over combined query and document data. Deployment is straightforward: adding models to an application automatically enables a REST API for model discovery and evaluation, and custom request handlers, document processors, and searchers can call machine-learned models to transform documents and process queries. Because model evaluation runs inside the container cluster, Vespa can execute low-latency computations over large datasets without external model servers, which reduces system complexity and improves scalability.
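As a rough illustration of the REST API for model discovery and evaluation mentioned above, the following Python sketch builds the two kinds of request URL a client might use. The host and port, the `/model-evaluation/v1` path prefix, the `eval` path segment, and the model name `my_model` are assumptions made for this sketch and are not stated in the summary; check them against the Vespa documentation before relying on them.

```python
from typing import Mapping
from urllib.parse import quote, urlencode

# Assumptions for illustration only: a container listening locally on port 8080
# and a /model-evaluation/v1 path prefix. Neither is given in the summary.
BASE_URL = "http://localhost:8080"
API_PREFIX = "/model-evaluation/v1"


def discovery_url(base_url: str = BASE_URL) -> str:
    """Build the URL that would list the models deployed to the container."""
    return f"{base_url.rstrip('/')}{API_PREFIX}/"


def eval_url(model: str, inputs: Mapping[str, str], base_url: str = BASE_URL) -> str:
    """Build the URL for one stateless evaluation of `model`.

    Inputs are passed as query parameters, one evaluation per request,
    matching the "single evaluation per query or document" pattern.
    """
    path = f"{API_PREFIX}/{quote(model, safe='')}/eval"
    query = urlencode(inputs)
    url = f"{base_url.rstrip('/')}{path}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(discovery_url())
    # "my_model" and the input name "x" are hypothetical.
    print(eval_url("my_model", {"x": "1"}))
```

The sketch only constructs URLs and sends nothing, so it runs without a Vespa instance. An HTTP client of your choice would issue GET requests against them.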