Binary Quantization - Vector Search, 40x Faster
Blog post from Qdrant
Qdrant's latest innovation, binary quantization (BQ), speeds up vector search by converting each high-dimensional float vector into a vector of binary values. Because every 32-bit component is compressed to a single bit, memory usage drops by a factor of 32, and retrieval speeds improve by up to 40 times, while users can still balance speed against recall accuracy at query time.

The method is particularly beneficial for high-dimensional vectors, such as OpenAI's 1536-dimensional embeddings: the larger the vector, the more the storage savings matter, and distances between binary vectors can be computed with fast boolean (bitwise) operations instead of floating-point arithmetic.

BQ can degrade recall, especially for small embeddings, but it excels at managing large datasets with high recall expectations. It does so by using the binary index to oversample a candidate subset of vectors, which is then rescored with the original full-precision vectors for precise results.

In Qdrant's implementation, the full vectors are stored on disk and the compact binary vectors are kept in RAM, and search parameters such as oversampling and rescoring can be adjusted to trade performance against accuracy. This approach is ideal for scenarios requiring rapid processing of large datasets, but may not be suitable for smaller embeddings due to the potential accuracy loss.