We've built Modal, a serverless GPU cloud platform that makes application development and deployment fast and developer-friendly, particularly for machine learning inference functions. The platform uses lazy loading to improve efficiency and responsiveness: components are initialized only when they are first needed. This lets us load a BERT model, typically around 512 MiB, in about 200 milliseconds from disk cache and about 300 milliseconds over the network. We achieved this through a combination of optimizations: caching, running on large hosts with ample bandwidth, tuning FUSE settings such as read-ahead and request sizes, managing congestion and background threads, and minimizing heap usage.