Company
Date Published
Author
Chris Biow
Word count
2147
Language
English
Hacker News points
None

Summary

We set out to build a petabyte-scale database on MongoDB and Amazon Web Services as inexpensively as possible. We chose AWS because it is popular among MongoDB users and bills disk and server resources by the hour. MongoDB write performance depends on storage performance, specifically random seeks (IOPS), which tend to be the limiting factor at scale. We experimented with three types of storage: Elastic Block Store (EBS) Provisioned IOPS (PIOPS) volumes, ephemeral SSD instances, and ephemeral spinning-disk instances. Our approach was to use a large array of spinning disks for petascale storage and to run MongoDB shards on each server to improve write performance. We also customized the YCSB load generation tool to work better with MongoDB and added project-specific enhancements to support meaningful selection criteria and queries. The resulting cluster reached a petabyte in just two hours, demonstrating that an inexpensive petabyte database can be built with MongoDB and AWS.
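
To give a sense of what loading a petabyte in two hours implies, the short sketch below works out the sustained aggregate write rate. It assumes a decimal petabyte (10^15 bytes) and a load window of exactly two hours; the function names and the server count in the usage line are hypothetical and are not taken from the original project.

```python
# Back-of-the-envelope throughput estimate for a two-hour petabyte load.
# Assumptions (not from the original article): decimal units, and a load
# window of exactly two hours.

PETABYTE_BYTES = 10**15        # assumption: 1 PB = 10^15 bytes
LOAD_SECONDS = 2 * 60 * 60     # two hours, as stated in the summary


def required_throughput_gb_per_s(total_bytes=PETABYTE_BYTES,
                                 seconds=LOAD_SECONDS):
    """Aggregate write rate in GB/s needed to load total_bytes in seconds."""
    return total_bytes / seconds / 10**9


def per_server_mb_per_s(servers, total_bytes=PETABYTE_BYTES,
                        seconds=LOAD_SECONDS):
    """Write rate in MB/s each server must sustain, given a server count.

    The server count is a hypothetical input; the summary does not state
    how many servers the cluster used.
    """
    if servers <= 0:
        raise ValueError("servers must be positive")
    return total_bytes / seconds / servers / 10**6


if __name__ == "__main__":
    print(f"Aggregate: {required_throughput_gb_per_s():.1f} GB/s")
    # Hypothetical cluster size, for illustration only.
    print(f"Per server (100 servers): {per_server_mb_per_s(100):.1f} MB/s")
```

Under these assumptions the cluster as a whole has to sustain roughly 139 GB/s of writes, which is why the load must be spread across many servers and many disks rather than pushed through a single fast volume.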