Comparing Foundational Features of Trino, Hive & Spark

Post Details

Company

Starburst

Date Published

Sept. 15, 2025

Author

Lester Martin

Word Count

1,271

Language

English

Hacker News Points

-

Source URL

www.starburst.io/blog/trino-hive-spark-foundational-features

Summary

The text provides a historical overview of three popular open-source frameworks for data lake analytics: Apache Hive, Trino (formerly PrestoSQL), and Apache Spark. It traces how each one arrived at the same three key features: SQL support, performance, and durability. Apache Hadoop provided the foundation for Hive, which originated at Facebook and became a top-level Apache project in 2010, offering a SQL abstraction layer over Hadoop's Java MapReduce API. Trino emerged in 2012, also at Facebook, as a way to run queries faster by maintaining a separate compute cluster, and it supports federated queries across multiple data systems. Apache Spark, which grew out of UC Berkeley’s AMPLab and became a top-level Apache project in 2014, focused on performance through in-memory caching and resource allocation strategies. Over time, Spark added SQL support, and Hive introduced the LLAP framework for enhanced performance. In 2022, Trino added fault-tolerant execution, which gave it the durability feature as well. Each framework ultimately achieved all three core attributes, making them essential tools for data analytics.