Rethinking Table File Paths with Uber: Lance’s Multi-Base Layout

Blog post from LanceDB

Post Details
Company: LanceDB
Author: Jack Ye
Word Count: 3,259
Language: English
Summary

Table formats such as Iceberg, Delta Lake, and Lance each define how a table's files are referenced, and that path management strategy strongly affects how portable the data is and how complex it is to operate at scale. Iceberg originally used absolute paths for file references and is moving toward relative paths so that a table can be relocated without rewriting paths. Delta Lake went the other way: it began with relative paths for zero-rewrite portability and later added absolute paths to support features such as shallow cloning. Lance prioritizes predictability and strict portability by using a fixed directory structure and exclusively relative paths, so a dataset can be copied without modifying its metadata. Lance's multi-base path model extends this design so that a single dataset can span multiple storage locations while preserving portability, a requirement illustrated by Uber's AI infrastructure team, which needed to distribute datasets across multiple S3 buckets. Because base paths are declared explicitly in the manifest, the model supports use cases such as multi-region data distribution, efficient disaster recovery, and AI experimentation workflows, and it simplifies operational tasks like garbage collection and credential management.
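
The idea of declaring base paths in the manifest and keeping every file reference relative can be sketched in a few lines of Python. This is an illustrative model only, not Lance's actual manifest schema or API: the names `Manifest`, `DataFile`, `base_paths`, `base_id`, and `resolve`, as well as the bucket URIs, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataFile:
    # Path relative to whichever base the file belongs to (hypothetical field).
    relative_path: str
    # None means "relative to the dataset root"; otherwise a key in base_paths.
    base_id: Optional[int] = None


@dataclass
class Manifest:
    # Explicitly declared extra storage locations, keyed by a small integer id.
    base_paths: Dict[int, str] = field(default_factory=dict)
    files: List[DataFile] = field(default_factory=list)


def resolve(dataset_root: str, manifest: Manifest, data_file: DataFile) -> str:
    """Turn a relative file reference into a full URI at read time."""
    if data_file.base_id is None:
        base = dataset_root
    else:
        base = manifest.base_paths[data_file.base_id]
    return base.rstrip("/") + "/" + data_file.relative_path.lstrip("/")


# One dataset whose files live in two (made-up) S3 buckets.
manifest = Manifest(
    base_paths={1: "s3://bucket-b/datasets/images"},
    files=[
        DataFile("data/part-0.lance"),             # stored under the dataset root
        DataFile("data/part-1.lance", base_id=1),  # stored under base 1
    ],
)

root = "s3://bucket-a/datasets/images"
resolved = [resolve(root, manifest, f) for f in manifest.files]

# Copying the root elsewhere needs no manifest change: only the root differs.
moved_root = "s3://bucket-c/backup/images"
resolved_after_move = [resolve(moved_root, manifest, f) for f in manifest.files]
```

In this sketch, files under the dataset root follow the root wherever it is copied, while files under a declared base change location only when that one manifest entry changes. Having the full set of locations listed in one place is also what would let a garbage collector or credential resolver enumerate them without scanning every file reference.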