Serverless platforms like AWS Lambda can be unexpectedly expensive for AI applications because of their billing model: they charge for the total time a function runs, not the CPU time it actually consumes. That is especially costly for AI workloads, which often spend most of their runtime idle, waiting on responses from large language models (LLMs).

The DBOS serverless compute platform instead charges only for the CPU time used. In one benchmark, processing 10 million requests to OpenAI's gpt-4o-mini model, this made DBOS 53 times more cost-efficient than Lambda.

DBOS achieves this efficiency by sharing execution environments across requests, so a single environment serves many concurrent tasks instead of sitting idle on each one. The same property benefits any application that spends significant time waiting on I/O, not just AI workloads. To stay elastic, DBOS continuously monitors its execution environments and uses Firecracker to quickly create new ones as load grows.
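To see why duration-based billing penalizes I/O-bound AI workloads, a rough back-of-the-envelope comparison helps. The timings and the per-GB-second rate below are illustrative assumptions, not published prices for either platform; only the shape of the calculation matters.

```python
# Illustrative cost comparison: duration billing vs. CPU-time billing.
# All numbers are assumptions for demonstration, not real pricing.

requests = 10_000_000          # the 10M-request benchmark scale
wall_clock_s = 2.0             # assumed seconds per request, mostly LLM waiting
cpu_time_s = 0.04              # assumed seconds of actual CPU work per request
price_per_gb_s = 0.0000166667  # assumed $/GB-second rate
memory_gb = 1.0                # assumed memory allocation

# Duration billing: you pay for the whole wall-clock run, idle waits included.
duration_cost = requests * wall_clock_s * memory_gb * price_per_gb_s

# CPU-time billing: you pay only while the CPU is actually working.
cpu_cost = requests * cpu_time_s * memory_gb * price_per_gb_s

print(f"duration-billed: ${duration_cost:,.2f}")   # ~$333.33
print(f"CPU-time-billed: ${cpu_cost:,.2f}")        # ~$6.67
print(f"ratio: {duration_cost / cpu_cost:.0f}x")   # 50x with these numbers
```

With these invented numbers the gap is 50x; the 53x figure DBOS reports for the gpt-4o-mini benchmark is the same order of magnitude.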
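The mechanism behind that saving is concurrency inside a shared environment: while one request awaits an LLM response, the same process serves others. Here is a minimal sketch of the idea, with the LLM call simulated by `asyncio.sleep` so the snippet is self-contained (a real handler would call gpt-4o-mini at that point):

```python
import asyncio
import time

async def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM request (e.g., to gpt-4o-mini). The point is
    # that the coroutine spends ~2s blocked on network I/O, not on CPU.
    await asyncio.sleep(2.0)
    return f"response to: {prompt}"

async def handle_request(i: int) -> str:
    # Each handler is mostly idle, waiting on call_llm.
    return await call_llm(f"request {i}")

async def main() -> None:
    start = time.monotonic()
    # One shared environment serves 100 requests concurrently: the whole
    # batch finishes in ~2s of wall clock while consuming almost no CPU.
    results = await asyncio.gather(*(handle_request(i) for i in range(100)))
    print(f"{len(results)} requests in {time.monotonic() - start:.1f}s")

asyncio.run(main())
```

The saving comes from the billing meter as much as from the sharing: a duration-billed platform would charge each of those 100 requests for its full two seconds of waiting, while CPU-time billing charges almost nothing for them.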
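The final piece is elasticity: monitoring the shared environments and booting new ones with Firecracker when they run hot. The details of that control loop are not given here, so the sketch below is purely hypothetical; `MicroVM`, `utilization`, and `launch_microvm` are invented placeholders standing in for Firecracker's actual VM-management machinery.

```python
import time

SCALE_UP_THRESHOLD = 0.8   # assumed utilization level that triggers growth
POLL_INTERVAL_S = 1.0      # assumed monitoring period

class MicroVM:
    """Hypothetical handle for a Firecracker microVM running app code."""
    def utilization(self) -> float:
        # Placeholder: a real implementation would sample CPU or
        # connection load from the running microVM.
        raise NotImplementedError

def launch_microvm() -> MicroVM:
    # Placeholder for driving Firecracker to boot a fresh microVM;
    # fast boot times are what make reactive scaling like this viable.
    raise NotImplementedError

def autoscale(fleet: list[MicroVM]) -> None:
    # Hypothetical control loop: watch average utilization across the
    # shared execution environments and add capacity when they run hot.
    while True:
        avg = sum(vm.utilization() for vm in fleet) / len(fleet)
        if avg > SCALE_UP_THRESHOLD:
            fleet.append(launch_microvm())
        time.sleep(POLL_INTERVAL_S)
```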