Domain-Specific Embeddings and Retrieval: Legal Edition (voyage-law-2)
Blog post from Voyage AI
Voyage AI's voyage-law-2 is a newly released domain-specific embedding model optimized for legal document retrieval. It significantly outperforms general-purpose models such as OpenAI v3 large on legal retrieval tasks. Trained on a large corpus of legal documents, it supports a 16K-token context length and performs well on long-context retrieval. Across eight legal retrieval datasets, voyage-law-2 ranked first on seven, with improvements of more than 10% over competing models on LeCaRDv2, LegalQuAD, and GerDaLIR. The model was also trained on data from other domains so that it remains useful outside the legal field: it surpasses OpenAI v3 large on retrieval across 34 datasets in eight categories, including technical documentation, finance, and medicine.
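For readers new to embedding-based retrieval, the following is a minimal sketch of the ranking step that such a model plugs into: documents and a query are mapped to vectors, and documents are ordered by cosine similarity to the query. Everything named here is an assumption made for illustration. `toy_embed` is a hypothetical hashed bag-of-words stand-in, not voyage-law-2, and `rank` is not part of any Voyage AI API. In a real system the `embed` argument would wrap a call to the embedding model.

```python
import hashlib
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_embed(texts: Sequence[str], dim: int = 256) -> List[Vector]:
    """Hypothetical stand-in embedder: hashed bag-of-words, L2-normalized.

    It only captures word overlap; a real embedding model would be used here.
    """
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query: str,
    documents: Sequence[str],
    embed: Callable[[Sequence[str]], List[Vector]] = toy_embed,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    doc_vectors = embed(documents)
    query_vector = embed([query])[0]
    scored = [(i, cosine(query_vector, v)) for i, v in enumerate(doc_vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


documents = [
    "the tenant shall pay rent monthly",
    "the court dismissed the appeal",
    "patent claims must be novel",
]
results = rank("court dismissed appeal", documents)
for index, score in results:
    print(f"{score:.3f}  {documents[index]}")
```

Keeping the embedder as a parameter means the ranking logic stays the same when the toy function is replaced by a domain-specific model. Only the quality of the vectors changes, and that quality is what the benchmark comparisons above measure.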