
Vector Search for Production: A GPU-Powered KNN Ground Truth Dataset Generator

Blog post from DataStax

Post Details
Company: DataStax
Date Published:
Author: Sebastian Estevez
Word Count: 2,338
Language: English
Hacker News Points: 3
Summary

The DataStax team behind vector search in Astra DB and Apache Cassandra has released Neighborhood Watch (nw), a configurable, GPU-powered generator of ground truth KNN (k-nearest neighbors) datasets, to address the limitations of existing KNN datasets. The tool produces ground truth datasets for high-dimensional embedding vectors that are more representative of what people actually use today. It uses GPU acceleration and supports multiple embedding models, both open source and proprietary. Neighborhood Watch can be used to test the quality of Approximate Nearest Neighbor (ANN) search: it supplies a large, representative ground truth KNN dataset against which the results of an ANN index can be compared.
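
To make the idea of testing ANN quality against ground truth concrete, here is a minimal sketch in plain Python. It is not Neighborhood Watch's code or API: the function names (`exact_knn`, `recall_at_k`), the choice of cosine similarity, and the toy vectors are all assumptions made for illustration, and the brute-force scan runs on the CPU rather than a GPU. It computes the exact top-k neighbors of a query and then scores a hypothetical ANN result by recall.

```python
import heapq
import math


def normalize(v):
    """Scale a vector to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def exact_knn(query, corpus, k):
    """Brute-force ground truth: indices of the k corpus vectors with the
    highest cosine similarity to the query, best match first."""
    q = normalize(query)
    scores = []
    for idx, vec in enumerate(corpus):
        v = normalize(vec)
        similarity = sum(a * b for a, b in zip(q, v))
        scores.append((similarity, idx))
    top = heapq.nlargest(k, scores, key=lambda s: s[0])
    return [idx for _, idx in top]


def recall_at_k(ann_ids, truth_ids):
    """Fraction of the true neighbors that the ANN result also returned."""
    truth = set(truth_ids)
    return len(truth.intersection(ann_ids)) / len(truth)


# Toy data (hypothetical): four 2-dimensional "embeddings" and one query.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.05]

truth = exact_knn(query, corpus, k=2)

# Pretend an ANN index returned these ids for the same query.
ann_result = [0, 2]
recall = recall_at_k(ann_result, truth)
```

The exact scan touches every vector for every query, which is why a generator aimed at large, high-dimensional datasets would benefit from GPU acceleration; the recall calculation itself stays the same regardless of how the ground truth was produced.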