The dangers of the JDBC bottleneck in Trino
Blog post from Starburst
Trino, a distributed SQL query engine for data across many sources, often hits a performance bottleneck when it reads from traditional databases over JDBC (Java Database Connectivity). JDBC was not designed for the large-scale data transfers Trino typically requires, so rows are extracted far more slowly than Trino can process them. A larger Trino cluster does not help by itself: a single JDBC connection is read serially, so data ingress stays capped no matter how many workers are available. Suggested mitigations are opening multiple JDBC connections and partitioning the data so that it can be extracted in parallel, although both depend on the capabilities of the database system and of the Trino distribution in use. The post highlights Starburst, a distribution of Trino, as effective at overcoming these limitations because it takes advantage of partitioned data to maximize parallelism and improve data access performance.
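The idea behind partitioned parallel extraction can be illustrated with a minimal Python sketch. It is not Trino's or Starburst's implementation; it only shows the general technique of splitting a numeric key range into disjoint slices and fetching each slice over its own connection. The function names, the table and column names, and the `run_query` callable (a stand-in for whatever opens a connection and executes one query) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_key_range(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into at most `parts`
    contiguous, non-overlapping sub-ranges of near-equal size."""
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and lo <= hi")
    total = hi - lo + 1
    parts = min(parts, total)          # never create empty slices
    base, extra = divmod(total, parts)
    ranges = []
    start = lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_queries(table, key, lo, hi, parts):
    """Build one range-restricted query per slice (illustrative SQL only;
    identifiers are assumed to be trusted, not user input)."""
    return [
        f"SELECT * FROM {table} WHERE {key} BETWEEN {a} AND {b}"
        for a, b in split_key_range(lo, hi, parts)
    ]


def extract_in_parallel(queries, run_query, max_workers=4):
    """Run each query on its own worker thread and return the results
    in the same order as `queries`. `run_query` is supplied by the
    caller and would normally open its own database connection."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, queries))
```

With a real database, `run_query` would open a separate connection per call so that the slices are transferred concurrently instead of through one serial stream. Whether this helps in practice depends, as noted above, on the source database being able to serve range-restricted reads efficiently.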