Company
Date Published
Author
Weston Pace
Word count
2265
Language
English
Hacker News points
None

Summary

Nearly a year after the team announced work on version 2.0 of its file format, that version was released and became the default. Version 2.0 eliminated row groups and optimized I/O performance, and its full-scan results often surpassed Parquet's. Even so, the team identified areas for improvement, which led to the development of version 2.1. The new version focuses on structural and compressive encodings to improve compression and I/O efficiency. It also introduces a refined approach to statistics-based pushdown and I/O scheduling, aiming to meet the "1-2 IOP challenge" for accessing data efficiently. As version 2.1 enters beta, the team encourages experimentation and feedback but cautions against using the beta for production data, because files written with it may become unreadable in the future.