Row vs Columnar Storage for Analytics: Why PostgreSQL Scans Are Slower Than They Should Be

Blog post from Tiger Data

Post Details
Company
Date Published
Author
Nano
Word Count
1,751
Language
English
Summary

The blog post explains why PostgreSQL's row-oriented storage makes analytical queries inefficient when they need only a few columns from a large table. The central concept is read amplification: the database reads more data from disk than the query actually uses, which inflates I/O cost. PostgreSQL stores rows in 8 KB pages, so a scan reads whole rows, with every column, even when the query references only one or two of them. The resulting ratio of bytes read to bytes needed can be high enough to slow queries significantly. Indexes do not fix this, because they narrow down which rows are read rather than reduce how much data is read per row. Columnar storage organizes data by column instead, so a query reads only the columns it needs, which reduces I/O and improves query performance. The post introduces Hypercore, a hybrid storage solution that keeps recent data in row format and converts older data to columnar format in order to optimize both read and write performance. It also describes diagnostic methods for measuring read amplification and suggests Hypercore as a solution when read amplification is high.
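The read amplification idea in the summary can be made concrete with a little arithmetic. The sketch below is not taken from the post: the function names, the 200-byte row width, and the 16 bytes of needed columns are hypothetical values chosen for illustration. It also ignores page headers, tuple headers, TOAST, and compression, so it should be read as a rough estimate rather than a description of how PostgreSQL or Hypercore actually account for I/O.

```python
import math

PAGE_SIZE = 8192  # 8 KB pages, as described in the summary


def row_store_bytes(num_rows: int, row_width: int) -> int:
    """Bytes read by a full scan of a row store (simplified model).

    Assumes fixed-width rows packed into whole pages; every page that
    holds rows is read in full, whichever columns the query needs.
    """
    if not 0 < row_width <= PAGE_SIZE:
        raise ValueError("row_width must be between 1 and PAGE_SIZE bytes")
    rows_per_page = PAGE_SIZE // row_width
    pages = math.ceil(num_rows / rows_per_page)
    return pages * PAGE_SIZE


def column_store_bytes(num_rows: int, needed_widths: list[int]) -> int:
    """Bytes read when only the needed columns are scanned (uncompressed)."""
    return num_rows * sum(needed_widths)


def read_amplification(num_rows: int, row_width: int,
                       needed_widths: list[int]) -> float:
    """Ratio of bytes read from the row store to bytes the query needs."""
    needed = column_store_bytes(num_rows, needed_widths)
    return row_store_bytes(num_rows, row_width) / needed


if __name__ == "__main__":
    # Hypothetical table: 1,000,000 rows, 200 bytes per row, and a query
    # that needs only an 8-byte timestamp and an 8-byte float.
    ratio = read_amplification(1_000_000, 200, [8, 8])
    print(f"read amplification: {ratio:.1f}x")
```

Under these assumptions the row store reads 204,800,000 bytes to serve a query that needs 16,000,000, a ratio of 12.8. An index does not change that ratio for a query that touches most rows, which is the point the summary makes about indexing.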