Python DataFrames (Bodo, Daft, Polars, PySpark, Dask, Modin/Ray) Compete for Your NYC Taxi Fare

Post Details

Company

Bodo

Date Published

Sept. 17, 2025

Author

Todd A. Anderson

Word Count

1,761

Source URL

www.bodo.ai/blog/python-dataframes-bodo-daft-polars-pyspark-dask-modin-ray-compete-for-your-nyc-taxi-fare

Summary

The third part of the series on Python DataFrames revisits the NYC Taxi benchmark to evaluate Bodo DataFrames, a scalable, high-performance alternative to Pandas that keeps the familiar Pandas API and requires minimal code changes. Bodo DataFrames combines a C++ backend with the Bodo JIT compiler. Its performance is comparable to that of the Bodo JIT compiler used alone, and it outperforms Daft, Polars, PySpark, Dask, and Modin/Ray by 2x to 250x. The library can process data larger than available memory through streaming and spilling, which makes it an attractive option for large-scale Pandas workloads that would otherwise need extensive code rewrites. This installment highlights that Bodo DataFrames delivers top-tier performance and scales from single-node to multi-node setups while preserving Pandas idioms and minimizing developer effort, making it an efficient option for data engineering pipelines.