
R2 SQL: a deep dive into our new distributed query engine

Blog post from Cloudflare

Post Details
Company: Cloudflare
Date Published:
Author: Yevgen Safronov, Nikita Lapkov, and Jérôme Schneider
Word Count: 2,969
Language: English
Hacker News Points: -
Summary

R2 SQL is a serverless query engine that runs SQL queries over petabyte-scale data stored in Cloudflare's R2 object storage, using the Apache Iceberg table format to organize that data logically. Because it queries Iceberg tables directly, there is no need to set up a separate service such as Apache Spark or Trino. Queries run in two phases, planning and execution, to overcome the I/O and compute challenges of working at this scale. The Query Planner uses metadata and statistics to prune data that cannot match the query, while the Query Execution system distributes the remaining work across Cloudflare's global network for parallel processing. Planning is implemented as a streaming pipeline that prioritizes the data most relevant to the query's ORDER BY clause, which reduces query latency and often lets a query finish early without reading the entire dataset. The architecture incorporates Apache DataFusion for efficient partition-based query execution, optimizing data access and reducing computational overhead. Planned enhancements include support for complex aggregations and an improved developer experience. R2 SQL is currently available in open beta.
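The two ideas in the summary, pruning by statistics and finishing early on an ORDER BY query, can be illustrated with a small sketch. This is not R2 SQL's implementation: the `DataFile` class, the `prune` and `top_k_desc` functions, and the file names are all hypothetical, and a single in-memory list stands in for each file's rows. It assumes every file carries min/max statistics for one timestamp column and answers a query of the shape `WHERE ts BETWEEN lo AND hi ORDER BY ts DESC LIMIT k`.

```python
import heapq
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file and its column statistics."""
    name: str
    min_ts: int
    max_ts: int
    rows: list  # timestamp values stored in the file


def prune(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range can overlap [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


def top_k_desc(files, lo, hi, k):
    """Sketch of: WHERE ts BETWEEN lo AND hi ORDER BY ts DESC LIMIT k."""
    # Visit the files most likely to hold the largest values first.
    candidates = sorted(prune(files, lo, hi), key=lambda f: f.max_ts, reverse=True)
    heap = []        # min-heap holding the best k values seen so far
    files_read = []  # which files were actually scanned
    for f in candidates:
        # Once k results are held and this file's maximum cannot beat the
        # smallest of them, no later file can either, so stop reading.
        if len(heap) == k and f.max_ts <= heap[0]:
            break
        files_read.append(f.name)
        for ts in f.rows:
            if lo <= ts <= hi:
                if len(heap) < k:
                    heapq.heappush(heap, ts)
                elif ts > heap[0]:
                    heapq.heapreplace(heap, ts)
    return sorted(heap, reverse=True), files_read


# Four made-up files, each covering 100 consecutive timestamps.
files = [
    DataFile("a.parquet", 0, 99, list(range(0, 100))),
    DataFile("b.parquet", 100, 199, list(range(100, 200))),
    DataFile("c.parquet", 200, 299, list(range(200, 300))),
    DataFile("d.parquet", 300, 399, list(range(300, 400))),
]

result, files_read = top_k_desc(files, lo=50, hi=350, k=3)
print(result, files_read)
```

In this toy run all four files overlap the filter range, yet only the file with the highest timestamps needs to be scanned before the limit is satisfied. A real engine works with much richer metadata and distributes the scan, but the ordering-plus-cutoff reasoning is the same in spirit.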