Comparing Foundational Features of Trino, Hive & Spark

Blog post from Starburst

Post Details
Company: Starburst
Author: Lester Martin
Word Count: 1,271
Language: English
Summary

The post gives a historical overview of three popular open-source frameworks for data lake analytics: Apache Hive, Trino (formerly PrestoSQL), and Apache Spark. It traces how each came to offer three foundational features: SQL support, performance, and durability. Apache Hadoop provided the foundation for Hive, which originated at Facebook as a SQL abstraction layer over Hadoop's Java MapReduce API and became a top-level Apache project in 2010. Trino emerged in 2012, also at Facebook, to run queries faster on its own separate compute cluster, and it offered federated queries across multiple data systems. Apache Spark, which originated at UC Berkeley’s AMPLab and became a top-level Apache project in 2014, focused on performance through in-memory caching and its resource allocation strategies. Over time, Spark added SQL support and Hive introduced the LLAP framework for better performance. In 2022, Trino added fault-tolerant execution, which gave it the durability feature. Each framework eventually gained all three features.