Introducing the Trino spooling protocol
Blog post from Starburst
The Trino spooling protocol is designed to handle large query result sets more efficiently by shifting load from the coordinator to the workers. It complements the direct protocol, which has served for over a decade. The direct protocol excels at low-latency delivery of small result sets, but with large data volumes it can become a bottleneck, since all result data passes through the coordinator. With the spooling protocol, clients retrieve result data in parallel from external storage, which gives higher throughput and allows more space- and CPU-efficient formats. The new protocol is backward and forward compatible with existing clients, and it can switch automatically between direct and spooled transmission based on the size of the result set.

Initial tests, such as those conducted in the Trino Community Broadcast, have shown significant performance improvements: query completion time for large datasets dropped from 35 seconds to 9 seconds. Adopting the spooling protocol requires configuration changes and client updates. In return, it frees up coordinator resources in active clusters and supports extensible encoding schemes.
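
To make the idea more concrete, here is a minimal Python sketch of the two behaviours described above: choosing between inline (direct) and spooled delivery based on size, and fetching spooled segments in parallel on the client side. It is a toy model, not the Trino wire format or client API. The names `Segment`, `emit_segment`, `read_result`, and `INLINE_MAX_BYTES`, the `mem://` URIs, and the JSON-plus-zlib encoding are all assumptions made for illustration, and an in-memory dict stands in for external storage.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical size threshold: payloads at or below it are sent inline.
INLINE_MAX_BYTES = 64


@dataclass
class Segment:
    """One chunk of a result set (illustrative model, not Trino's schema)."""
    kind: str                      # "inline" or "spooled"
    data: Optional[bytes] = None   # encoded rows, set for inline segments
    uri: Optional[str] = None      # storage location, set for spooled segments


def encode_rows(rows: List[list]) -> bytes:
    # Stand-in for a compact encoding: JSON compressed with zlib.
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def decode_rows(payload: bytes) -> List[list]:
    return json.loads(zlib.decompress(payload).decode("utf-8"))


def emit_segment(rows: List[list], storage: Dict[str, bytes], name: str) -> Segment:
    """Server side: inline small payloads, spool large ones to storage."""
    payload = encode_rows(rows)
    if len(payload) <= INLINE_MAX_BYTES:
        return Segment(kind="inline", data=payload)
    uri = f"mem://{name}"          # hypothetical URI scheme for the fake storage
    storage[uri] = payload
    return Segment(kind="spooled", uri=uri)


def read_result(
    segments: List[Segment],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> List[list]:
    """Client side: load all segments concurrently, keeping row order."""
    def load(segment: Segment) -> List[list]:
        payload = segment.data if segment.kind == "inline" else fetch(segment.uri)
        return decode_rows(payload)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(load, segments))  # map() preserves input order
    return [row for chunk in chunks for row in chunk]


# Usage: one small and one large chunk, with a dict acting as object storage.
storage: Dict[str, bytes] = {}
small = [[1, "a"]]
large = [[i, "x" * 20] for i in range(500)]
segments = [
    emit_segment(small, storage, "seg-0"),
    emit_segment(large, storage, "seg-1"),
]
rows = read_result(segments, storage.__getitem__)
print([s.kind for s in segments], len(rows))
```

In this sketch the coordinator's role shrinks to handing out segment descriptors, while the heavy payloads travel between storage and client. That separation is the property the post credits for the higher throughput.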