Lance File Format 2.2: Taming Complex Data
Blog post from LanceDB
Lance file format 2.2 targets the evolving demands of AI/ML workloads: large multimodal datasets, complex data types, and schemas that change over time, along with better storage efficiency and compression. It introduces Blob V2 for managing large files and external media, providing adaptive storage and streaming access without duplicating data. It supports nested schema evolution, so new fields can be added without rewriting existing data, and it adds a native Map type, which simplifies code that works with key-value data. Format 2.2 also extends compression to more data types, yielding significant space savings and better performance, particularly for text, JSON, and sparse features. The upgrade is fully backward-compatible, giving teams flexibility in when and how they adopt it. LanceDB also plans future enhancements, such as native media type support and more advanced encoding algorithms, to further optimize the format for AI/ML applications.
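To make the schema-evolution claim concrete, here is a minimal sketch of why a columnar layout can add a nested field without touching existing data. It is a toy in-memory model, not the Lance API or its on-disk layout: the `ToyColumnStore` class, its methods, and the assumption that each nested leaf (such as `meta.width`) is stored as its own column are all hypothetical and chosen only for illustration.

```python
class ToyColumnStore:
    """Toy in-memory stand-in for a columnar dataset (hypothetical, not the Lance API)."""

    def __init__(self, num_rows):
        self.num_rows = num_rows
        self.columns = {}      # dotted leaf path -> list of values
        self.write_count = {}  # dotted leaf path -> how many times it was written

    def write_column(self, path, values):
        """Write one leaf column; other columns are left untouched."""
        if len(values) != self.num_rows:
            raise ValueError(f"expected {self.num_rows} values, got {len(values)}")
        self.columns[path] = list(values)
        self.write_count[path] = self.write_count.get(path, 0) + 1

    def read_row(self, index):
        """Reassemble one nested row from the independent leaf columns."""
        row = {}
        for path, values in self.columns.items():
            *parents, leaf = path.split(".")
            node = row
            for parent in parents:
                node = node.setdefault(parent, {})
            node[leaf] = values[index]
        return row


# Initial schema: id, meta.source
store = ToyColumnStore(num_rows=2)
store.write_column("id", [1, 2])
store.write_column("meta.source", ["web", "camera"])

# Evolve the nested struct: add meta.width by writing only the new leaf column.
store.write_column("meta.width", [640, 1920])

row = store.read_row(1)
```

After the last write, `row` contains the new nested field, while `store.write_count` shows that `id` and `meta.source` were each written exactly once. In this simplified model, that is the sense in which adding a field does not rewrite existing data.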