Enhancing Vespa’s Embedding Management Capabilities
Blog post from Vespa
Vespa has announced upgrades to its embedding management capabilities, extending its support for inference with text embedding models. The release integrates Hugging Face models, including multilingual options, and adds GPU acceleration for faster processing. With these updates, developers can build semantic search applications without operating separate systems for embedding inference and vector search.

Vespa now supports embedding models in ONNX format, enabling streamlined deployment and improved scalability, while the Vespa Model Hub offers a wider selection of state-of-the-art text embedding models for developers to explore. These improvements reduce latency and cost, support cross-lingual applications, and let developers use powerful models with minimal configuration.

Vespa Cloud further simplifies scaling by automatically adapting to changes in inference traffic volume, and GPU acceleration is available for instances provisioned with GPU resources, improving both performance and cost-effectiveness.
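As a rough illustration of the "minimal configuration" this enables, the sketch below wires an ONNX text embedding model into a Vespa application using the `hugging-face-embedder` component type. The component id (`my-embedder`), model URLs, field names, and the vector dimension are illustrative assumptions, not values from the post; consult Vespa's embedding documentation for the exact options.

```xml
<!-- services.xml (sketch): declare a Hugging Face embedder in the container cluster.
     Model and tokenizer URLs below are placeholders for an ONNX model of your choice. -->
<container id="default" version="1.0">
  <component id="my-embedder" type="hugging-face-embedder">
    <transformer-model url="https://huggingface.co/.../model.onnx"/>
    <tokenizer-model url="https://huggingface.co/.../tokenizer.json"/>
  </component>
  <document-api/>
  <search/>
</container>
```

In the schema, the same embedder can then produce vectors at feed time, so no external embedding service is needed (again a sketch; `x[384]` assumes a 384-dimensional model):

```
schema doc {
  document doc {
    field text type string {
      indexing: summary | index
    }
  }
  # Derived field: embed the text with the embedder declared in services.xml
  field embedding type tensor<float>(x[384]) {
    indexing: input text | embed my-embedder | attribute | index
  }
}
```

At query time, the same component can embed the query string, keeping inference and vector search inside one system as the post describes.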