The artificial intelligence (AI) industry has grown rapidly since 2016, driven by advances in GPU technology and the need for faster model training. The focus has since shifted toward deploying AI models to production and managing the entire AI lifecycle. A critical step in this process is AI serving: deploying a trained model, with inference typically executed by an AI inference engine, so that it can respond to requests. Achieving fast end-to-end inferencing and serving requires addressing several challenges, including optimizing AI processing, running the AI inference platform where the data lives, and using a purpose-built serverless platform. By overcoming these challenges, businesses can benefit from running AI on dedicated inference chipsets and deliver a seamless user experience even when the inference engine itself is a potential bottleneck.
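To make the serving step concrete, below is a minimal sketch of a model deployed behind an HTTP inference endpoint, using only the Python standard library. The /predict route, payload shape, and run_inference stand-in are illustrative assumptions, not a specific product's API; a trivial function plays the role of a real inference engine here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_inference(features):
    # Hypothetical stand-in for a real inference engine session
    # (e.g. ONNX Runtime or TensorRT): sum the inputs and threshold.
    score = sum(features)
    return {"score": score, "label": int(score > 0)}

class InferenceHandler(BaseHTTPRequestHandler):
    """Serves a deployed model behind a simple HTTP endpoint."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read and decode the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = run_inference(payload["features"])
        body = json.dumps(result).encode()
        # Return the prediction as a JSON response.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # A bare HTTPServer keeps the sketch self-contained; in production
    # this endpoint would sit behind a serverless or autoscaling layer.
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

A client could then request a prediction with, for example, curl -X POST http://localhost:8080/predict -d '{"features": [0.2, 0.5]}'. The point of the sketch is the shape of the serving step itself: a deployed model waiting behind an endpoint, with the inference engine doing the work per request.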