Company
Date Published
Author
Alfonso Subiotto Marqués
Word count
1043
Language
-
Hacker News points
None

Summary

Interfaces are essential for data exchange, but they can also cap the performance available to their users, a point Joran made in a talk at Systems Distributed 2025. The team ran into this with the Parquet file format: converting Parquet data into the Arrow format for querying was dominating their query CPU time. Parquet is widely used and was designed for efficient storage, retrieval, and interoperability, but it handled computational queries poorly and required a costly conversion step. The team explored Vortex, a file format optimized for decoding and querying data directly from object storage, and saw a 70% performance improvement along with better storage efficiency. Vortex supports general-purpose compute pushdown and is extensible to future encodings, which made it a better interface for their needs. The takeaway is that choosing an interface that matches the use case can improve performance.
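To make the compute pushdown idea concrete, here is a minimal toy sketch in plain Python. It is not Vortex's API or on-disk layout. The run-length encoding and the helper names (`rle_encode`, `count_decode_then_filter`, `count_pushdown`) are assumptions chosen only for illustration. It contrasts decoding a column before filtering with evaluating the filter directly on the encoded runs.

```python
from typing import List, Tuple

# Hypothetical toy encoding: each run is (value, run_length).
Run = Tuple[int, int]


def rle_encode(values: List[int]) -> List[Run]:
    """Run-length encode a column of integers."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Extend the current run.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Materialize the full column from its runs."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def count_decode_then_filter(runs: List[Run], threshold: int) -> int:
    """Convert first, then query: touches every row."""
    return sum(1 for v in rle_decode(runs) if v > threshold)


def count_pushdown(runs: List[Run], threshold: int) -> int:
    """Push the predicate into the encoding: touches one entry per run."""
    return sum(length for value, length in runs if value > threshold)


column = [3, 3, 3, 3, 9, 9, 9, 1, 1, 9]
runs = rle_encode(column)

# Both strategies agree on the answer; only the amount of work differs.
assert count_pushdown(runs, 5) == count_decode_then_filter(runs, 5)
```

Both functions return the same count, but the pushdown version never materializes the decoded rows. In this toy model, that skipped work stands in for the conversion cost the summary describes.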