
Optimizing realtime evaluation of neural net models on Vespa

Blog post from Vespa

Post Details
Company: Vespa
Date Published: -
Author: -
Word Count: 1,114
Language: English
Hacker News Points: -
Summary

In this blog post, the authors describe optimizations to Vespa's tensor framework that speed up real-time evaluation of neural network models by more than 20x. Vespa, an open-source platform for scalable real-time data processing used in applications such as search and recommendation systems, introduced the tensor API to support efficient computation on multi-dimensional data. The post walks through optimizing the evaluation of a two-layer neural network, reducing end-to-end latency from 150-160 ms to 7 ms, chiefly by recognizing vector-matrix multiplications in the ranking expression and dispatching them to hardware-accelerated code. Because such models are expressed through the tensor API, users can implement advanced ranking models and meet their performance requirements without depending on Vespa's core development team, shortening application development time. The approach underscores Vespa's ongoing work to improve performance and support complex machine learning models within its framework.
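The summary's key technical point, recognizing a generic tensor join-and-reduce as a vector-matrix multiplication and dispatching it to hardware-accelerated code, can be illustrated outside Vespa. The sketch below is a hypothetical Python/NumPy analogy, not Vespa's actual C++ implementation; the function names (generic_xw, optimized_xw, two_layer_net), the layer sizes, and the iteration count are invented for illustration. It contrasts a naive cell-by-cell evaluation loop with the same product handed to a BLAS-backed routine, the kind of substitution the post credits for the latency improvement.

```python
import time
import numpy as np

def generic_xw(x, W):
    """Naive join-and-reduce: multiply each (i, j) cell pair, then sum over
    the shared dimension, the way a generic tensor engine might evaluate it."""
    hidden = np.zeros(W.shape[1])
    for j in range(W.shape[1]):
        s = 0.0
        for i in range(W.shape[0]):
            s += x[i] * W[i, j]
        hidden[j] = s
    return hidden

def optimized_xw(x, W):
    """The same vector-matrix product dispatched to hardware-accelerated
    (BLAS-backed) code once the pattern is recognized."""
    return x @ W

def two_layer_net(x, W1, b1, W2, b2, xw):
    """Two-layer feed-forward network: ReLU hidden layer, sigmoid output."""
    hidden = np.maximum(xw(x, W1) + b1, 0.0)
    out = xw(hidden, W2) + b2
    return 1.0 / (1.0 + np.exp(-out))

rng = np.random.default_rng(0)
x = rng.standard_normal(256)                       # per-document feature vector
W1, b1 = rng.standard_normal((256, 128)), rng.standard_normal(128)
W2, b2 = rng.standard_normal((128, 1)), rng.standard_normal(1)

for name, fn in [("generic", generic_xw), ("optimized", optimized_xw)]:
    start = time.perf_counter()
    for _ in range(100):                           # evaluate the model once per "document"
        score = two_layer_net(x, W1, b1, W2, b2, fn)
    print(f"{name}: {time.perf_counter() - start:.3f}s, score={score[0]:.4f}")
```

Running the two variants side by side shows the optimized path completing the same 100 evaluations orders of magnitude faster on typical hardware, which mirrors the idea in the post: the ranking expression stays the same, and only the execution strategy for the recognized pattern changes.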