Performance and Cost Evaluation of Bodo vs. Spark, Dask, and Ray

Blog post from Bodo

Post Details
Company: Bodo
Date Published:
Author: Ehsan Totoni and Zhuchang Zhan
Word Count: 1,980
Language: English
Hacker News Points: -
Summary

Benchmarks help evaluate computational performance and scalability, but they are imperfect. This post compares Bodo, a compute engine built on High-Performance Computing (HPC) techniques, with the distributed compute technologies Spark, Dask, and Ray on large-scale data processing workloads. Bodo delivered a 22.9x median speedup over Spark with an associated compute cost reduction of more than 95%, and a 148x median speedup over Dask with an associated 99% compute cost reduction. Ray was left out of the results because it could not handle the large-scale data processing workloads. The study used the TPC-H benchmark, which is traditionally aimed at SQL database use cases but provides representative computations for complex data workloads. According to the post, Bodo's HPC-based inferential compiler approach is often orders of magnitude faster than distributed task scheduling libraries such as Spark and Dask, which translates to infrastructure cost savings of over 95%.
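
The link between the reported speedups and the reported cost reductions can be checked with a little arithmetic. The sketch below is not taken from the post: it assumes both engines run on the same cluster at the same hourly price, so that compute cost scales with runtime, and the function name `cost_reduction` is made up for illustration. Under that assumption, a speedup of s cuts cost by 1 - 1/s.

```python
def cost_reduction(speedup: float) -> float:
    """Fraction of compute cost saved, ASSUMING cost is proportional to
    runtime (same cluster, same hourly price). This is an illustrative
    assumption, not the post's stated methodology."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Median speedups quoted in the summary.
for engine, speedup in [("Spark", 22.9), ("Dask", 148.0)]:
    saved = cost_reduction(speedup)
    print(f"{engine}: {speedup}x speedup -> {saved:.1%} lower compute cost")

# Prints:
# Spark: 22.9x speedup -> 95.6% lower compute cost
# Dask: 148.0x speedup -> 99.3% lower compute cost
```

These values line up with the "95%+" and "99%" figures in the summary, though the post's actual cost accounting may differ (for example, if the engines ran on different cluster sizes).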