Product Quantization in Vector Search
Blog post from Qdrant
Qdrant introduced Product Quantization as a new feature in version 1.2.0, following the earlier introduction of Scalar Quantization in version 1.1.0. Where Scalar Quantization reduces memory usage by converting floating-point numbers into integers, Product Quantization compresses vectors further still.

The method divides each vector into subvectors, applies the K-means clustering algorithm to map these chunks to their closest centroids, and then stores only the centroid identifiers, compressing the data and reducing memory requirements (a sketch of this encoding step appears below).

Product Quantization can increase indexing and search times and reduce search precision, but it offers substantial memory savings and can sometimes even reduce search time. This makes it well suited to low-RAM environments, or to setups where disk reads are a tighter bottleneck than vector comparisons.

Benchmarks on the Glove-100 and Arxiv-titles-384-angular-no-filters datasets demonstrate the trade-offs between precision, search time, and memory compression at various chunk dimensions.

Compared to Scalar Quantization, Product Quantization provides a higher compression rate but may sacrifice accuracy and search speed. It is preferable in specific scenarios, such as high-dimensional data or when indexing speed is not critical.
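To make the encoding step concrete, here is a minimal, self-contained sketch of Product Quantization in Python. It illustrates the general technique rather than Qdrant's internal implementation: the function names (`pq_train_encode`, `pq_decode`) and the use of scikit-learn's KMeans are assumptions made for brevity. With 256 centroids per chunk position, each chunk of float32 values compresses to a single byte.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_train_encode(vectors: np.ndarray, n_subvectors: int, n_centroids: int = 256):
    """Split each vector into n_subvectors chunks, cluster each chunk
    position with k-means, and store only the id of the nearest centroid.
    With 256 centroids, every chunk compresses to a single byte."""
    n, dim = vectors.shape
    assert dim % n_subvectors == 0, "vector dimension must divide evenly"
    sub_dim = dim // n_subvectors
    chunks = vectors.reshape(n, n_subvectors, sub_dim)

    codebooks = np.empty((n_subvectors, n_centroids, sub_dim), dtype=np.float32)
    codes = np.empty((n, n_subvectors), dtype=np.uint8)
    for s in range(n_subvectors):
        km = KMeans(n_clusters=n_centroids, n_init="auto").fit(chunks[:, s, :])
        codebooks[s] = km.cluster_centers_
        codes[:, s] = km.labels_  # labels fit in one byte when n_centroids <= 256
    return codebooks, codes

def pq_decode(codebooks: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Reconstruct approximate (lossy) vectors from stored centroid ids."""
    return np.concatenate(
        [codebooks[s][codes[:, s]] for s in range(codes.shape[1])], axis=1
    )

# Example: 10,000 vectors of dimension 128 -> 16 one-byte codes each,
# i.e. 512 bytes of float32 shrink to 16 bytes per vector (32x).
vectors = np.random.rand(10_000, 128).astype(np.float32)
codebooks, codes = pq_train_encode(vectors, n_subvectors=16)
approx = pq_decode(codebooks, codes)
```

The decode step shows why precision drops: every chunk is replaced by its centroid, so the reconstructed vectors only approximate the originals.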
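In Qdrant itself, Product Quantization is enabled per collection via the quantization_config parameter. The snippet below is a sketch using the Python client: the collection name "example_collection", the 1024-dimensional cosine vectors, and the specific compression ratio and always_ram settings are illustrative assumptions, not recommendations from the post.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="example_collection",
    vectors_config=models.VectorParams(
        size=1024,
        distance=models.Distance.COSINE,
    ),
    quantization_config=models.ProductQuantization(
        product=models.ProductQuantizationConfig(
            # x16: every 4-dimensional float32 chunk (16 bytes) is replaced
            # by a single one-byte centroid id.
            compression=models.CompressionRatio.X16,
            # Keep the compressed codes in RAM; original vectors can stay on disk.
            always_ram=True,
        ),
    ),
)
```

Setting always_ram=True reflects the low-RAM scenario described above: the compact codes are served from memory while the full-precision vectors are only read from disk when needed, e.g. for rescoring.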