We've built Modal, a serverless GPU cloud platform that makes application development and deployment fast and developer-friendly, particularly for machine learning inference functions. The platform uses lazy loading to improve efficiency and responsiveness: components are initialized only when they are first needed. This lets us load a BERT model, typically around 512 MiB, in about 200 milliseconds from disk cache and about 300 milliseconds over the network. We achieved this through a combination of optimizations: caching, running on large hosts with ample bandwidth, tuning FUSE settings such as read-ahead and request sizes, managing congestion and background threads, and minimizing heap usage.