Company
Date Published
Author
Ehsan Totoni and Zhuchang Zhan
Word count
1980
Language
English
Hacker News points
None

Summary

Benchmarks can help evaluate computational performance and scalability, but they are imperfect. This study compared Bodo, a compute engine that brings High-Performance Computing (HPC)-style performance to data processing, against distributed compute technologies such as Spark, Dask, and Ray on large-scale data processing workloads. Bodo delivered a 22.9x median speedup over Spark, with an associated compute cost reduction of more than 95%, and a 148x median speedup over Dask, with an associated compute cost reduction of 99%. Ray was excluded from the results because it could not handle these large-scale data processing workloads. The study used the TPC-H benchmark, which is traditionally used for SQL databases but provides computations representative of complex data workloads. Bodo's HPC-based inferential compiler approach is often orders of magnitude faster than distributed task-scheduling libraries like Spark and Dask, and these speedups translate into infrastructure cost savings of over 95%.
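The reported cost reductions are consistent with the reported speedups under one simple assumption: that compute cost is hourly price multiplied by runtime, and that both systems run on clusters with a comparable hourly price. The summary does not state the exact cost methodology, so the sketch below is only a sanity check under that assumption. The helper name `cost_reduction` and its `relative_hourly_cost` parameter are hypothetical.

```python
def cost_reduction(speedup: float, relative_hourly_cost: float = 1.0) -> float:
    """Fraction of compute cost saved when a job runs `speedup` times faster.

    Assumption (not from the source): cost = hourly price * runtime.
    `relative_hourly_cost` is the faster system's hourly price divided by the
    baseline's hourly price (1.0 means the same cluster price).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # New cost / old cost = (price ratio) * (new runtime / old runtime)
    #                     = relative_hourly_cost / speedup
    return 1.0 - relative_hourly_cost / speedup


# Median speedups quoted in the summary above.
for name, speedup in [("Spark", 22.9), ("Dask", 148.0)]:
    print(f"vs {name}: {speedup}x faster -> {cost_reduction(speedup):.1%} lower cost")
```

With equal hourly prices, a 22.9x speedup implies roughly a 95.6% cost reduction and a 148x speedup roughly 99.3%, which lines up with the "95%+" and "99%" figures. If the faster system used pricier hardware, `relative_hourly_cost` above 1.0 would shrink the savings accordingly.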