Company
Date Published
Author
David Myriel, Yang Cen
Word count
1417
Language
English
Hacker News points
None

Summary

LanceDB has introduced RaBitQ quantization as a complement to its existing IVF_PQ strategy for handling high-dimensional vectors. IVF_PQ has been the default because of its compression and search capabilities, but it requires expensive codebook training and becomes less effective as dimensionality increases. RaBitQ compresses vectors more efficiently, builds indexes faster, and maintains higher recall on high-dimensional and multimodal datasets. It stores each vector as a binary sign pattern plus corrective factors, which reduces the stored size and lets searches compare candidates with fast binary dot products. In tests against IVF_PQ on datasets such as DBpedia and GIST1M, RaBitQ showed higher recall and throughput while requiring less index build time. Because it needs no retraining when the data distribution shifts, it is robust to updates and well suited to large, complex datasets. RaBitQ is now available in LanceDB alongside IVF_PQ, so users can choose the method that best fits their workload.
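The idea of storing a sign pattern plus corrective factors can be illustrated with a small sketch. This is a simplified illustration, not LanceDB's implementation: it omits the random rotation and query quantization that RaBitQ uses in practice, and the function names `encode`, `estimate_dot`, and `binary_dot` are hypothetical. It keeps one sign bit per dimension together with two floats (the vector norm and an alignment factor), then uses them to rescale a sign-based dot product.

```python
import math


def encode(vec):
    """Compress a vector to one sign bit per dimension plus two floats.

    Hypothetical helper: a simplified sketch without RaBitQ's random rotation.
    """
    dim = len(vec)
    norm = math.sqrt(sum(x * x for x in vec))
    bits = 0
    for i, x in enumerate(vec):
        if x >= 0:
            bits |= 1 << i  # bit i is set when component i is non-negative
    if norm == 0.0:
        return bits, 0.0, 1.0  # zero vector: any alignment gives estimate 0
    # Alignment between the unit vector and its sign code (+-1/sqrt(dim) per
    # component). This is the corrective factor that undoes the rounding bias.
    align = sum(abs(x) for x in vec) / (norm * math.sqrt(dim))
    return bits, norm, align


def estimate_dot(code, query):
    """Estimate <vec, query> from the compressed code and a full-precision query."""
    bits, norm, align = code
    dim = len(query)
    # Add the query component where the stored sign is +, subtract where it is -.
    signed_sum = sum(q if (bits >> i) & 1 else -q for i, q in enumerate(query))
    return norm * signed_sum / (math.sqrt(dim) * align)


def binary_dot(bits_a, bits_b, dim):
    """Dot product of two +-1 sign vectors using XOR and a popcount."""
    differing = bin(bits_a ^ bits_b).count("1")
    return dim - 2 * differing


if __name__ == "__main__":
    stored = [0.5, -1.5, 2.0, -0.25]
    query = [0.4, -1.0, 1.8, 0.1]
    exact = sum(a * b for a, b in zip(stored, query))
    print("exact:", exact, "estimated:", estimate_dot(encode(stored), query))
```

In this sketch the estimate is exact when the query equals the stored vector and approximate otherwise. Nothing here depends on a trained codebook, which is consistent with the summary's point that no retraining is needed when the data distribution shifts.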