
OpenSearch vs LanceDB for Vector Search: Query Cost and Infrastructure

Blog post from LanceDB

Post Details
Company: LanceDB
Date Published:
Author: Justin Miller
Word Count: 3,782
Language: English
Hacker News Points: -
Summary

Choosing between OpenSearch and LanceDB for vector search is a tradeoff between a distributed search service and an embedded library, each with distinct infrastructure and cost implications. OpenSearch runs as a distributed cluster with full-text search, security, and other features; it keeps the vectors and the HNSW graph in RAM backed by EBS, while the images themselves are stored in S3. LanceDB instead stores everything in S3 in a columnar file format and pulls index pages into memory on demand, so its footprint scales with queries per second (QPS) rather than corpus size, which can translate into lower costs.

The benchmark workload is 287,360 images from the COCO 2017 dataset, embedded into 1152-dimensional vectors. For this workload LanceDB is generally more cost-effective, thanks to its reliance on S3 for storage and its ability to scale with demand. The key cost driver is how each system stores and accesses the vector index: OpenSearch's costs scale with RAM usage, while LanceDB's scale with QPS and S3 GET requests.

Operational complexity also differs: OpenSearch offers a broader feature set, while LanceDB focuses on efficient vector search. The choice between them should weigh recall targets, latency requirements, and whether additional features such as full-text search and security are actually needed.
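The RAM-versus-S3 cost driver can be made concrete with a back-of-the-envelope calculation from the numbers in the post: the raw vectors alone for this corpus occupy roughly 1.2 GiB, which OpenSearch must hold in RAM (HNSW graph overhead comes on top of that), while LanceDB leaves the same data in S3 and pages in only what a query touches. A minimal sketch, assuming float32 embeddings and not modeling index-structure overhead:

```python
# Back-of-the-envelope RAM footprint for the raw vectors in this workload.
# Corpus size and dimensionality come from the post; float32 storage is an
# assumption, and HNSW graph overhead is deliberately not modeled.

NUM_VECTORS = 287_360        # COCO 2017 images embedded in the benchmark
DIMS = 1152                  # embedding dimensionality from the post
BYTES_PER_FLOAT32 = 4        # assumed storage per vector component

raw_bytes = NUM_VECTORS * DIMS * BYTES_PER_FLOAT32
raw_gib = raw_bytes / 2**30

print(f"Raw vector storage: {raw_bytes:,} bytes (~{raw_gib:.2f} GiB)")
# → Raw vector storage: 1,324,154,880 bytes (~1.23 GiB)
```

Under OpenSearch's model this ~1.2 GiB (plus graph overhead) is a fixed RAM cost that grows with the corpus; under LanceDB's model it is S3 storage, and the per-query cost is driven by how many index pages a search fetches.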