Home / Companies / Starburst / Blog / Post Details
Content Deep Dive

Introducing the Trino spooling protocol

Blog post from Starburst

Post Details
Company
Date Published
Author
Lester Martin
Word Count
1,275
Language
English
Hacker News Points
-
Summary

The introduction of the Trino spooling protocol aims to enhance the efficiency of handling large query result sets by distributing the load from the coordinator to the workers, contrasting with the direct protocol which has served for over a decade. While the direct protocol excels in low-latency delivery for smaller datasets, it struggles with large data volumes, leading to potential bottlenecks. The spooling protocol addresses this by allowing data to be retrieved in parallel from external storage, thus offering higher throughput and supporting more space and CPU-efficient formats. This new protocol maintains backward and forward compatibility with existing clients and can automatically switch between direct and spooled data transmission based on the result set size. Initial tests, such as those conducted in the Trino Community Broadcast, have shown significant performance improvements, reducing query completion times from 35 seconds to 9 seconds for large datasets. Adoption of the spooling protocol requires configuration changes and client updates, but it promises to optimize data processing in active clusters by freeing up coordinator resources and supporting extensible encoding schemes.