Stateful model serving: how we accelerate inference using ONNX Runtime

Blog post from Vespa

Post Details
Company: Vespa
Date Published:
Author: Lester Solbakken
Word Count: 3,164
Company Posts That Month: 3
Language: English
Hacker News Points: -
Summary

Vespa.ai, an open-source platform for real-time data processing over large datasets, has integrated ONNX Runtime to improve stateful model serving, particularly for applications that need complex machine-learned models. Unlike stateless model serving, stateful evaluation combines the input data with stored data, which suits tasks such as search and recommendation. Vespa.ai deploys machine-learned models to its stateful content nodes, so models are evaluated where the data is stored and less data has to be moved at query time. Integrating ONNX Runtime has significantly improved Vespa.ai's performance when evaluating large models, such as BERT and other Transformers, through hardware acceleration and model optimizations such as quantization. Because ONNX is an interoperability standard, Vespa.ai can support a wide range of models without vendor lock-in. Vespa.ai initially had difficulty supporting complex models; ONNX Runtime features such as multi-threading control and zero-copy tensor operations have since proven beneficial. Vespa.ai continues to explore further ONNX Runtime capabilities, such as GPU support, to optimize its machine learning applications.
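
The difference between stateful and stateless evaluation described in the summary can be illustrated with a small sketch. This is not Vespa code: the `ContentNode` class, the `search` function, and the dot-product "model" are hypothetical stand-ins, chosen only to show that each node scores the documents it stores and returns just its best hits, so the query is the only input that travels to the data.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ContentNode:
    """Hypothetical stand-in for a stateful content node holding document vectors."""
    docs: Dict[str, List[float]] = field(default_factory=dict)

    def top_k(self, query: List[float], k: int) -> List[Tuple[float, str]]:
        # Score every locally stored document against the query.
        # A dot product stands in for a real machine-learned model.
        scored = (
            (sum(q * d for q, d in zip(query, vec)), doc_id)
            for doc_id, vec in self.docs.items()
        )
        return heapq.nlargest(k, scored)


def search(nodes: List[ContentNode], query: List[float], k: int) -> List[Tuple[float, str]]:
    # Only the query is sent to each node, and only k hits per node come back;
    # the stored document vectors never leave the node that holds them.
    partial = [hit for node in nodes for hit in node.top_k(query, k)]
    return heapq.nlargest(k, partial)


nodes = [
    ContentNode({"a": [1.0, 0.0], "b": [0.0, 1.0]}),
    ContentNode({"c": [0.5, 0.5], "d": [2.0, 0.0]}),
]
results = search(nodes, [1.0, 0.5], k=2)
print(results)
```

A stateless server would instead need every candidate document's features shipped to it with each request, which is the query-time data movement the summary says Vespa.ai avoids.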

Trends Found in this Post
Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM
AI Guardrails | 2 | No monthly metrics for this publish month.
Real-time | 2 | 695 | 222 | 75 | +8%
AI Model Fine-tuning | 1 | No monthly metrics for this publish month.