
Harvey partners with Voyage to build custom legal embeddings

Blog post from Voyage AI

Post Details
Company
Date Published
Author
Voyage AI
Word Count: 563
Language: English
Summary

Retrieval-augmented generation (RAG) systems, central to real-world large language model (LLM) applications, depend on embeddings, which allow retrieval based on semantic meaning. However, standard embeddings, trained on general data, often fall short in specialized fields like law, where relevant and irrelevant passages can be hard to tell apart. Voyage AI, led by Stanford's Tengyu Ma, specializes in customized embedding models tailored to specific domains. In collaboration with Harvey, Voyage AI fine-tuned embeddings using voyage-law-2 as the starting point, training on over 20 billion tokens of US case law together with expert annotations. The result is voyage-law-2-harvey, a custom model that reduced irrelevant results by nearly 25% compared to other leading models. Its reduced embedding dimensionality also lowers storage requirements and latency. Harvey plans to continue working with Voyage AI on additional custom embedding models for legal and other domains to further improve enterprise search and RAG systems.
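
To make the two mechanisms in the summary concrete, here is a minimal sketch of embedding-based retrieval and of how embedding dimensionality drives index size. It does not use Voyage AI's API or either model: the three-dimensional vectors, the document names, and the dimensions 1024 and 512 are made-up illustrative values, and `cosine`, `top_k`, and `index_bytes` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


def index_bytes(num_docs, dim, bytes_per_value=4):
    """Raw size of a dense vector index, assuming float32 values."""
    return num_docs * dim * bytes_per_value


# Toy embeddings (assumption: real models emit hundreds or thousands of dims).
DOCS = {
    "contract_breach": [0.9, 0.1, 0.0],
    "tort_negligence": [0.1, 0.9, 0.1],
    "tax_filing": [0.0, 0.2, 0.9],
}
QUERY = [0.8, 0.3, 0.0]

if __name__ == "__main__":
    print(top_k(QUERY, DOCS, k=2))
    # Halving a hypothetical dimension halves the raw index size.
    print(index_bytes(1_000_000, 1024), index_bytes(1_000_000, 512))
```

In this picture, a domain-tuned model aims to place relevant legal passages closer to the query than a general-purpose model would, and a lower-dimensional model shrinks the index and the per-comparison work in proportion to the dimension.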