A Gentle Introduction to the Hive Connector
Blog post from Starburst
The Hive connector in Starburst Enterprise, which is built on Trino, reads data in object storage that is laid out according to Hive's conventions, without using any of Hive's runtime code. It addresses the slow query turnaround of traditional Hive clusters by replacing the Hive runtime with Trino's engine while keeping the Hive Metastore Service (HMS) to handle metadata. The HMS is a simple service that speaks the Thrift protocol and reads and updates metadata stored in a relational database; that metadata describes the tables whose data lives in storage systems such as AWS S3, Google Cloud Storage, and MinIO. Because only the query engine changes, a team can move from Hive to Trino and get faster interactive queries over large datasets without altering the storage layer or the metadata management components. Starburst's Hive connector is similar to its open-source counterpart but adds features such as Apache Ranger integration for role-based access control.
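To make the architecture a little more concrete, here is a minimal sketch of how a Hive catalog might be pointed at an HMS. It only generates the text of a catalog properties file, using the `connector.name` and `hive.metastore.uri` properties from the Trino Hive connector. The helper name `render_hive_catalog`, the host `metastore.example.net`, and the choice of port 9083 (the conventional HMS Thrift port) are assumptions for illustration, and object storage settings are left out because they vary by deployment and version.

```python
def render_hive_catalog(metastore_host: str, metastore_port: int = 9083) -> str:
    """Return the text of a minimal Hive catalog properties file.

    Hypothetical helper: the host and port are placeholders, and a real
    catalog would also need settings for the object storage in use.
    """
    lines = [
        # Tells Trino to use the Hive connector for this catalog.
        "connector.name=hive",
        # Thrift endpoint of the Hive Metastore Service (HMS).
        f"hive.metastore.uri=thrift://{metastore_host}:{metastore_port}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file contents for an assumed metastore host.
    print(render_hive_catalog("metastore.example.net"), end="")
```

In a typical Trino installation, text like this would be saved as a file such as `etc/catalog/hive.properties`, after which tables registered in the HMS can be queried under the `hive` catalog.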