Binary Quantization - Vector Search, 40x Faster
Blog post from Qdrant
Qdrant's latest innovation, binary quantization (BQ), speeds up vector search by converting each high-dimensional float vector into a vector of binary values. Because every 32-bit component is compressed to a single bit, memory usage drops by a factor of 32, and retrieval speeds improve by up to 40 times, while users can still balance speed against recall accuracy at query time.

The method is particularly beneficial for high-dimensional vectors, such as OpenAI's 1536-dimensional embeddings: the larger the vector, the more the storage savings matter, and distances between binary vectors can be computed with fast boolean (bitwise) operations instead of floating-point arithmetic.

BQ can degrade recall, especially for small embeddings, but it excels at managing large datasets with high recall expectations. It does so by using the binary index to oversample a candidate subset of vectors, which is then rescored with the original full-precision vectors for precise results.

In Qdrant's implementation, the full vectors are stored on disk and the compact binary vectors are kept in RAM, and search parameters such as oversampling and rescoring can be adjusted to trade performance against accuracy. This approach is ideal for scenarios requiring rapid processing of large datasets, but may not be suitable for smaller embeddings due to the potential accuracy loss.