
Querying a Billion Rows of AWS Cost Data 100X Faster with DuckDB

Blog post from Vantage

Post Details
Company: Vantage
Author: Vantage Team
Word Count: 1,239
Language: English
Hacker News Points: -
Summary

In a comparison of database technologies for analyzing more than a billion rows of AWS cost data for the Q4 2022 Cloud Costs Report, DuckDB significantly outperformed PostgreSQL, running queries 4 to 200 times faster. The move from Postgres to DuckDB was driven by DuckDB's column-store architecture, which enabled rapid data processing, efficient storage, and real-time data exploration, especially for complex queries and the creation of derivative tables. DuckDB excelled at query performance, data ingestion, and compression, reducing disk usage from 21 GB to 1.7 GB, but it also presented challenges, such as fewer built-in functions than Postgres. Despite these drawbacks, DuckDB's ability to process large datasets quickly made the analysis more efficient and productive, and it contributed to the timely publication of the Q4 report. The experience suggests that Postgres may not have been the best fit for this task, and DuckDB's performance and capabilities have made it a preferred choice for future data analysis needs.
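To put the storage figures in perspective, the short sketch below works out the compression ratio implied by the two disk sizes quoted in the summary. The function name `compression_stats` is hypothetical and used only for illustration; the inputs are the 21 GB and 1.7 GB figures from the text, and nothing here reflects how Vantage measured them.

```python
def compression_stats(before_gb: float, after_gb: float) -> tuple[float, float]:
    """Return (compression ratio, percent of disk space saved)."""
    ratio = before_gb / after_gb                  # how many times smaller the data is
    saved_pct = (1 - after_gb / before_gb) * 100  # share of the original footprint removed
    return ratio, saved_pct


# Disk usage quoted in the summary: Postgres at 21 GB, DuckDB at 1.7 GB.
ratio, saved_pct = compression_stats(21.0, 1.7)
print(f"{ratio:.1f}x smaller, {saved_pct:.0f}% less disk")
# prints "12.4x smaller, 92% less disk"
```

In other words, the same dataset occupied roughly one twelfth of the space once loaded into DuckDB.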