Building a Native GPU Iceberg Writer for Apache Iceberg
Blog post from Bodo
Building a distributed execution engine on modern GPUs makes I/O performance critical. Bodo scales GPU DataFrames with a Single Program, Multiple Data (SPMD) architecture that avoids the overhead of traditional task-based engines, so it needs a storage layer that can keep pace with the GPU. Writing to Apache Iceberg raises the bar further, because data files must follow the table's partitioning and carry the file-level metrics that query engines rely on for pruning.

Bodo's GPU-accelerated Iceberg writer is a streaming SPMD pipeline with no central scheduler. It uses a push-based model in which data flows asynchronously through physical operators. The PhysicalGPUWriteIceberg operator acts as a stateful sink: it accumulates data batches before triggering a flush sequence, which avoids the small files problem. The architecture hinges on continuous, asynchronous delivery, zero driver overhead, and collective synchronization, so the physical operators must manage their own state and stream ordering meticulously.

The solution implements Iceberg's partition transforms and metadata extraction directly on the GPU in C++/CUDA, keeping data in device memory to maximize throughput. Integrating these capabilities into Bodo's native execution engine preserves the efficiency of the distributed pipeline and yields a GPU-native Iceberg sink that speeds up Parquet writes without giving up the architectural benefits of device-side computing.
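To make the "stateful sink" idea concrete, here is a minimal CPU-side sketch in Python of a sink that buffers pushed batches and flushes once enough rows have accumulated, recording per-file metrics of the kind Iceberg uses for pruning (row count, per-column lower and upper bounds, null counts). It is not Bodo's implementation, which the post describes as C++/CUDA operating on device memory. The names `BufferingSink`, `FileStats`, `consume_batch`, and `flush_threshold_rows` are hypothetical, and the row-count flush trigger is an assumption, since the post does not say what triggers a flush.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


class FileStats:
    """Metrics recorded for one flushed file (illustrative only)."""

    def __init__(self, row_count: int, lower: Dict[str, int],
                 upper: Dict[str, int], null_count: Dict[str, int]) -> None:
        self.row_count = row_count
        self.lower = lower
        self.upper = upper
        self.null_count = null_count


class BufferingSink:
    """Hypothetical stand-in for a stateful write sink.

    Batches are pushed in by the upstream operator. The sink keeps them
    in a buffer and only "writes a file" once the buffer is large enough,
    so many small batches do not become many small files.
    """

    def __init__(self, flush_threshold_rows: int) -> None:
        # Assumption: flush is triggered by a row count threshold.
        self.flush_threshold_rows = flush_threshold_rows
        self.buffer: List[Row] = []
        self.files: List[FileStats] = []

    def consume_batch(self, batch: List[Row], is_last: bool = False) -> None:
        """Accept one pushed batch; flush if the threshold is reached
        or if this is the final batch and rows are still buffered."""
        self.buffer.extend(batch)
        reached = len(self.buffer) >= self.flush_threshold_rows
        if reached or (is_last and self.buffer):
            self._flush()

    def _flush(self) -> None:
        """Compute per-column metrics for the buffered rows and record
        them as one file. A real writer would also encode and write the
        Parquet data here."""
        columns = sorted({name for row in self.buffer for name in row})
        lower: Dict[str, int] = {}
        upper: Dict[str, int] = {}
        nulls: Dict[str, int] = {}
        for col in columns:
            values = [row.get(col) for row in self.buffer]
            present = [v for v in values if v is not None]
            nulls[col] = len(values) - len(present)
            if present:  # bounds are undefined for all-null columns
                lower[col] = min(present)
                upper[col] = max(present)
        self.files.append(FileStats(len(self.buffer), lower, upper, nulls))
        self.buffer = []


if __name__ == "__main__":
    sink = BufferingSink(flush_threshold_rows=4)
    sink.consume_batch([{"id": 1, "v": 10}, {"id": 2, "v": None}])
    sink.consume_batch([{"id": 3, "v": 7}, {"id": 4, "v": 9}])
    sink.consume_batch([{"id": 5, "v": 1}], is_last=True)
    for f in sink.files:
        print(f.row_count, f.lower, f.upper, f.null_count)
```

In an SPMD setting, every rank would run the same sink logic on its own share of the data. The sketch leaves out the collective synchronization between ranks and the partition transforms that the post also mentions.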