
Optimizing realtime evaluation of neural net models on Vespa

Blog post from Vespa

Post Details
Company: Vespa
Date Published: -
Author: -
Word Count: 1,114
Language: English
Hacker News Points: -
Summary

In this blog post, the authors describe optimizations to Vespa's tensor framework that speed up real-time evaluation of neural network models by more than 20x. Vespa, an open-source platform for scalable real-time data processing used in applications such as search and recommendation systems, introduced the tensor API to support efficient computation on multi-dimensional data. The post walks through optimizing the evaluation of a two-layer neural network, reducing end-to-end latency from 150-160 ms to 7 ms, chiefly by recognizing vector-matrix multiplications in the ranking expression and dispatching them to hardware-accelerated code. Because such models are expressed through the tensor API, users can implement advanced ranking models and meet their performance requirements without depending on Vespa's core development team, shortening application development time. The approach underscores Vespa's ongoing work to improve performance and support complex machine learning models within its framework.
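The summary's key technical point, recognizing a generic tensor join-and-reduce as a vector-matrix multiplication and dispatching it to hardware-accelerated code, can be illustrated outside Vespa. The sketch below is a hypothetical Python/NumPy analogy, not Vespa's actual C++ implementation; the function names (generic_xw, optimized_xw, two_layer_net), the layer sizes, and the iteration count are invented for illustration. It contrasts a naive cell-by-cell evaluation loop with the same product handed to a BLAS-backed routine, the kind of substitution the post credits for the latency improvement.

```python
import time
import numpy as np

def generic_xw(x, W):
    """Naive join-and-reduce: multiply each (i, j) cell pair, then sum over
    the shared dimension, the way a generic tensor engine might evaluate it."""
    hidden = np.zeros(W.shape[1])
    for j in range(W.shape[1]):
        s = 0.0
        for i in range(W.shape[0]):
            s += x[i] * W[i, j]
        hidden[j] = s
    return hidden

def optimized_xw(x, W):
    """The same vector-matrix product dispatched to hardware-accelerated
    (BLAS-backed) code once the pattern is recognized."""
    return x @ W

def two_layer_net(x, W1, b1, W2, b2, xw):
    """Two-layer feed-forward network: ReLU hidden layer, sigmoid output."""
    hidden = np.maximum(xw(x, W1) + b1, 0.0)
    out = xw(hidden, W2) + b2
    return 1.0 / (1.0 + np.exp(-out))

rng = np.random.default_rng(0)
x = rng.standard_normal(256)                       # per-document feature vector
W1, b1 = rng.standard_normal((256, 128)), rng.standard_normal(128)
W2, b2 = rng.standard_normal((128, 1)), rng.standard_normal(1)

for name, fn in [("generic", generic_xw), ("optimized", optimized_xw)]:
    start = time.perf_counter()
    for _ in range(100):                           # evaluate the model once per "document"
        score = two_layer_net(x, W1, b1, W2, b2, fn)
    print(f"{name}: {time.perf_counter() - start:.3f}s, score={score[0]:.4f}")
```

Running the two variants side by side shows the optimized path completing the same 100 evaluations orders of magnitude faster on typical hardware, which mirrors the idea in the post: the ranking expression stays the same, and only the execution strategy for the recognized pattern changes.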