Cohere's Embed model now supports int8 and binary embeddings, delivering significant memory savings and faster semantic search over large datasets. By shrinking the storage footprint of embeddings, Cohere estimates that the annual cost of hosting a large vector index can drop from roughly $130,000 to $1,300 while search quality stays high. Int8 embeddings cut memory 4x and speed up search by about 30%; binary embeddings cut memory 32x and make search roughly 40x faster.

The Embed v3 model is trained to be compression-friendly, so it retains strong search quality under int8, binary, and product quantization. Cohere reports these results on the MIRACL benchmark and attributes them to training methods that preserve performance across precision levels. The work is part of a broader strategy to strengthen enterprise AI capabilities, with applications across industries such as technology, healthcare, and public-sector security.
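To make the trade-offs concrete, here is a minimal sketch of requesting compressed embeddings and comparing vectors, assuming the Cohere Python SDK's `embed` endpoint with the `embedding_types` parameter as documented for Embed v3 (attribute names such as `embeddings.int8` and `embeddings.ubinary` follow current SDK docs and may differ across versions; the 1024-dimension figure is illustrative):

```python
# Sketch: request int8 and packed-binary embeddings from Cohere Embed v3
# and compare their memory footprints. Assumes `pip install cohere numpy`
# and a COHERE_API_KEY environment variable.
import os

import cohere
import numpy as np

co = cohere.Client(os.environ["COHERE_API_KEY"])

docs = ["Embeddings compress text into vectors for semantic search."]

# Request both compressed representations in a single call.
response = co.embed(
    texts=docs,
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["int8", "ubinary"],
)

int8_vec = np.asarray(response.embeddings.int8[0], dtype=np.int8)      # 1 byte/dim
binary_vec = np.asarray(response.embeddings.ubinary[0], dtype=np.uint8)  # 8 dims/byte

# For a hypothetical 1024-dim vector: float32 = 4096 bytes,
# int8 = 1024 bytes (4x smaller), packed binary = 128 bytes (32x smaller).
print(int8_vec.nbytes, binary_vec.nbytes)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed-binary vectors (XOR + popcount)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```

Binary vectors are compared with Hamming distance, a cheap XOR-and-popcount operation, which is what enables the roughly 40x search speedup; a common pattern is to retrieve candidates with binary vectors and then rescore them at higher precision.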