Larger than RAM Vector Indexes for Relational Databases
Blog post from PlanetScale
Modern embedding models turn data into multi-dimensional vectors, which makes vector indexes an essential feature for relational databases. Adding them is difficult, a problem made harder by a lack of existing research on implementing vector indexes within these databases. The article describes how PlanetScale integrated Hierarchical Navigable Small Worlds (HNSW) into MySQL, and argues that real-world use requires larger-than-RAM indexes because relational databases typically manage more data than fits in memory. HNSW on its own is poorly suited to this setting: it is static and has no transactional support. The design works around both limitations with a hybrid vector index that uses in-memory HNSW for performance and on-disk storage for scalability. A write-ahead log (WAL) gives the hybrid index transactional integrity and crash resilience, and the design is optimized for inserts, updates, and deletes. Background maintenance operations (splits, reassignments, and merges) keep the index efficient and accurate as data changes. The article presents this design as a promising approach to vector indexing in relational databases.
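
The hybrid layout summarized above can be illustrated with a toy sketch. This is not PlanetScale's implementation; every name here (`ToyHybridIndex`, `max_list_size`, `nprobe`) is hypothetical. It assumes a small in-memory structure that routes each vector to one of several larger on-disk lists. To stay short, it swaps HNSW for a brute-force scan over centroids, uses a Python dict in place of on-disk storage and a Python list in place of a durable WAL, and runs the split inline rather than as a background job. Reassignments and merges are left out.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

class ToyHybridIndex:
    def __init__(self, max_list_size=4):
        self.max_list_size = max_list_size
        self.centroids = {}  # in memory: list_id -> centroid (stand-in for HNSW)
        self.lists = {}      # stand-in for disk: list_id -> {row_id: vector}
        self.wal = []        # stand-in for a durable write-ahead log
        self.next_list_id = 0

    def _new_list(self, centroid, rows):
        list_id = self.next_list_id
        self.next_list_id += 1
        self.centroids[list_id] = centroid
        self.lists[list_id] = rows
        return list_id

    def _nearest_lists(self, vector, n):
        # Brute-force scan over centroids; a real index would query HNSW here.
        ranked = sorted(self.centroids,
                        key=lambda i: dist(self.centroids[i], vector))
        return ranked[:n]

    def insert(self, row_id, vector):
        vector = tuple(vector)
        # Record the change in the log before touching the index itself.
        self.wal.append(("insert", row_id, vector))
        if not self.centroids:
            list_id = self._new_list(vector, {})
        else:
            list_id = self._nearest_lists(vector, 1)[0]
        self.lists[list_id][row_id] = vector
        if len(self.lists[list_id]) > self.max_list_size:
            self._split(list_id)

    def _split(self, list_id):
        # Replace one oversized list with up to two smaller ones.
        items = list(self.lists.pop(list_id).items())
        del self.centroids[list_id]
        seed_a = items[0][1]
        seed_b = max(items, key=lambda kv: dist(kv[1], seed_a))[1]
        halves = ({}, {})
        for row_id, vec in items:
            side = 0 if dist(vec, seed_a) <= dist(vec, seed_b) else 1
            halves[side][row_id] = vec
        for half in halves:
            if half:
                self._new_list(mean(list(half.values())), half)

    def search(self, query, k=1, nprobe=2):
        # Probe the nprobe closest lists, then rank their rows exactly.
        query = tuple(query)
        candidates = []
        for list_id in self._nearest_lists(query, nprobe):
            for row_id, vec in self.lists[list_id].items():
                candidates.append((dist(vec, query), row_id))
        candidates.sort()
        return [row_id for _, row_id in candidates[:k]]

index = ToyHybridIndex(max_list_size=3)
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for row_id, point in enumerate(points):
    index.insert(row_id, point)
nearest = index.search((10.2, 10.1), k=1)
```

Only the centroids need to stay in memory, so memory use in this sketch grows with the number of lists and not with the number of rows. That is the property that would let such an index outgrow RAM.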