Inflated data lakehouse costs and latencies? - Blame S3's choice of HTTP/1.1

Blog post from Onehouse

Post Details
Company
Date Published
Author: Rajesh Mahindra
Word Count: 4,396
Language: English
Hacker News Points: -
Summary

The performance of cloud object stores such as Amazon S3 and Google Cloud Storage (GCS) depends heavily on the HTTP version they use: S3 relies on HTTP/1.1, while GCS uses HTTP/2. HTTP/1.1's limitations, notably head-of-line blocking and higher latency, create inefficiencies and raise costs; in practical workloads, S3 showed up to 15 times higher latency than GCS. These inefficiencies arise because HTTP/1.1 lacks HTTP/2's multiplexing and header compression, which results in higher TCP overhead and variability in software development kit (SDK) performance. Onehouse addresses these challenges with optimizations such as byte-range coalescing and smart concurrency management, which improve the cost efficiency and performance of data lake operations. The shift from distributed file systems to object storage has made HTTP behavior a critical factor in managing data lakes, and effective protocol management is needed to reduce compute costs and improve throughput. Onehouse's lakehouse platform applies these insights to connection management and protocol behavior to deliver better performance and cost savings for users.
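
To make byte-range coalescing concrete, here is a minimal Python sketch of the general technique. It is not Onehouse's implementation: the names `coalesce_ranges`, `to_range_header`, and the `max_gap` threshold are hypothetical, chosen only for illustration. The idea is that when a reader needs several nearby byte ranges from one object, merging them into fewer, larger ranges cuts the number of HTTP requests, which matters most when each request occupies its own HTTP/1.1 connection.

```python
from typing import List, Tuple

# A byte range as (start, end) with end exclusive. Hypothetical type alias.
ByteRange = Tuple[int, int]


def coalesce_ranges(ranges: List[ByteRange], max_gap: int = 0) -> List[ByteRange]:
    """Merge ranges that overlap or lie within max_gap bytes of each other.

    max_gap is an assumed tuning knob: a larger value trades some wasted
    bytes for fewer round trips.
    """
    merged: List[ByteRange] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def to_range_header(byte_range: ByteRange) -> str:
    """Format a range as an HTTP Range header value (inclusive end)."""
    start, end = byte_range
    return f"bytes={start}-{end - 1}"


if __name__ == "__main__":
    wanted = [(0, 100), (120, 200), (5000, 5100)]
    for r in coalesce_ranges(wanted, max_gap=64):
        print(to_range_header(r))
```

With a 64-byte gap threshold, the first two ranges collapse into a single request while the distant third range stays separate, so three requests become two. The right threshold depends on the workload: too small and little is merged, too large and the reader downloads bytes it never uses.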