R2 SQL: a deep dive into our new distributed query engine

Post Details

Company

Cloudflare

Date Published

Sept. 25, 2025

Author

Yevgen Safronov, Nikita Lapkov, and Jérôme Schneider

Word Count

2,969

Language

English

Hacker News Points

-

Source URL

blog.cloudflare.com/r2-sql-deep-dive

Summary

R2 SQL is a serverless query engine that runs SQL over petabyte-scale data stored in Cloudflare's R2 object storage, with Apache Iceberg as the table format that organizes the data logically. Because it queries Iceberg tables directly, there is no separate service such as Apache Spark or Trino to set up. It handles the I/O and compute challenges of that scale in two phases: the Query Planner prunes data using metadata and statistics, and the Query Execution system distributes the remaining work across Cloudflare's global network for parallel processing. Planning runs as a streaming pipeline that prioritizes data aligned with the query's ORDER BY clause, which minimizes query latency and often lets a query finish early without reading the entire dataset. The architecture uses Apache DataFusion for efficient partition-based query execution, optimizing data access and reducing computational overhead. Planned enhancements include support for complex aggregations and a better developer experience. R2 SQL is currently available in open beta.
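
To make the pruning and early-finish ideas in the summary concrete, here is a minimal sketch in Python. It is not R2 SQL's implementation: the `DataFile` type, the `plan` and `execute` functions, and the in-memory "files" are all hypothetical stand-ins, and the query shape (`WHERE ts >= ? ORDER BY ts DESC LIMIT ?`) is assumed for illustration. The sketch only shows how per-file min/max statistics can rule files out before any data is read, and how visiting files in ORDER BY order lets a top-k query stop early.

```python
import heapq
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical stand-in for a data file plus its column statistics.
    name: str
    min_ts: int
    max_ts: int
    rows: list  # timestamps only; a real file would hold full rows


def plan(files, lower_bound):
    """Drop files that cannot match `ts >= lower_bound`, then order newest-first."""
    survivors = [f for f in files if f.max_ts >= lower_bound]
    return sorted(survivors, key=lambda f: f.max_ts, reverse=True)


def execute(planned, lower_bound, limit):
    """Return the top `limit` timestamps (descending) and the names of files read."""
    top = []  # min-heap of the best `limit` values seen so far
    files_read = []
    for f in planned:
        # Early stop: the heap is full and its smallest value is at least as
        # large as anything this file, or any later file, could contain.
        if len(top) == limit and top[0] >= f.max_ts:
            break
        files_read.append(f.name)
        for ts in f.rows:
            if ts < lower_bound:
                continue
            if len(top) < limit:
                heapq.heappush(top, ts)
            elif ts > top[0]:
                heapq.heapreplace(top, ts)
    return sorted(top, reverse=True), files_read


if __name__ == "__main__":
    files = [
        DataFile("a", 1, 10, [1, 5, 10]),
        DataFile("b", 11, 20, [11, 15, 20]),
        DataFile("c", 21, 30, [21, 25, 30]),
        DataFile("d", 25, 35, [25, 28, 35]),
    ]
    result, read = execute(plan(files, lower_bound=12), lower_bound=12, limit=3)
    print(result, read)  # [35, 30, 28] ['d', 'c']
```

In this toy run, file `a` is pruned by its statistics alone, and file `b` is never read because the three values already collected are all larger than its maximum. Ordering by the upper bound is one simple choice; a real planner working from Iceberg metadata would have more information to draw on.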