Vector Databases Are the Wrong Abstraction

Post Details

Company

Timescale

Date Published

Oct. 29, 2024

Author

Matvey Arye

Word Count

2,698

Language

English

Hacker News Points

493

Source URL

www.timescale.com/blog/vector-databases-are-the-wrong-abstraction

Summary

The article argues that vector databases, as they are used today, are built on a flawed abstraction for AI applications. Simple applications and proofs of concept work smoothly, but the flaws surface once engineering teams take these systems into production. The core issue is that vector databases treat embeddings as independent data, divorced from the source data they are created from, when embeddings are really derived data. As a result, developers have to manage multiple databases and keep them synchronized by hand. The author proposes treating embeddings more like database indexes through a new abstraction called a "vectorizer", which automatically keeps embeddings in sync with their source data and eliminates the maintenance costs that plague current implementations. The author also introduces pgai Vectorizer, an open-source tool that implements this abstraction in PostgreSQL and works alongside vector search extensions such as pgvector and pgvectorscale. The article closes by encouraging developers to try pgai Vectorizer to simplify their AI workflows.
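
The idea of embeddings as derived data, maintained the way a database maintains an index, can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: `VectorizedTable`, `toy_embed`, and `in_sync` are hypothetical names, the letter-count "embedding" stands in for a real model, and none of it reflects the actual pgai Vectorizer API, which runs inside PostgreSQL and is not modeled here.

```python
from typing import Callable, Dict, List


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding model: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class VectorizedTable:
    """Source rows plus embeddings that are maintained as derived data.

    Callers only ever write source text. The embedding for a row is
    recomputed or dropped as a side effect of that write, much as an
    index is updated when its table changes.
    """

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._rows: Dict[int, str] = {}
        self._embeddings: Dict[int, List[float]] = {}

    def upsert(self, key: int, text: str) -> None:
        # Writing the source row refreshes its embedding in the same step.
        self._rows[key] = text
        self._embeddings[key] = self._embed(text)

    def delete(self, key: int) -> None:
        # Removing the source row removes the derived embedding as well.
        self._rows.pop(key, None)
        self._embeddings.pop(key, None)

    def embedding(self, key: int) -> List[float]:
        return self._embeddings[key]

    def in_sync(self) -> bool:
        # True when every row has exactly one embedding computed from its current text.
        if self._rows.keys() != self._embeddings.keys():
            return False
        return all(
            self._embed(text) == self._embeddings[key]
            for key, text in self._rows.items()
        )


table = VectorizedTable(toy_embed)
table.upsert(1, "vector databases")
table.upsert(1, "vectorizers")  # the stale embedding is replaced automatically
table.delete(1)                 # no orphaned embedding is left behind
```

The design point is that there is no separate "write the embedding" call for application code to forget. With a standalone vector database, the equivalent of `upsert` and `delete` has to be issued against two systems and kept consistent by hand, which is the synchronization burden the summary describes.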