LiteRT: Maximum performance, simplified
Blog post from Google Cloud
Over the past decade, the integration of powerful accelerators such as GPUs and NPUs into mobile phones has dramatically improved on-device AI performance, delivering inference up to 25 times faster than CPU execution at as little as one-fifth the power consumption. Despite these gains, developers have struggled with the complexity of interfacing with hardware-specific APIs and vendor-specific SDKs.

To address these challenges, the Google AI Edge team has introduced a set of improvements to LiteRT: a new API that simplifies on-device ML inference, cutting-edge GPU acceleration, and NPU support developed with MediaTek and Qualcomm. These updates feature MLDrift for superior GPU performance, a uniform way to develop and deploy models across different NPUs, and an advanced TensorBuffer API that reduces memory overhead. In addition, asynchronous execution lets independent parts of a workload run in parallel across different processors, making AI applications on mobile devices more responsive and efficient.

Together, these advancements give developers the tools to maximize AI model performance on mobile platforms, with further enhancements and broader support anticipated in the coming year.
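The benefit of asynchronous execution can be illustrated with a generic pipelining sketch. This is plain Python using `concurrent.futures`, not LiteRT's actual API; `preprocess` and `run_inference` are placeholder functions standing in for CPU-side work and accelerator-side work respectively:

```python
"""Conceptual sketch: overlapping independent pipeline stages so that
different processors can work in parallel (not the LiteRT API)."""
import time
from concurrent.futures import ThreadPoolExecutor

def preprocess(frame):
    # Stand-in for CPU-side work (e.g. image decoding and resizing).
    time.sleep(0.05)
    return f"tensor({frame})"

def run_inference(tensor):
    # Stand-in for accelerator-side work (e.g. GPU/NPU inference).
    time.sleep(0.05)
    return f"result({tensor})"

def run_sync(frames):
    # Serial execution: each stage waits for the previous one.
    return [run_inference(preprocess(f)) for f in frames]

def run_pipelined(frames):
    # Overlap preprocessing of frame N+1 with inference on frame N,
    # mimicking asynchronous dispatch across CPU and accelerator.
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = None
        for f in frames:
            tensor = preprocess(f)           # runs while `pending` executes
            if pending is not None:
                results.append(pending.result())
            pending = pool.submit(run_inference, tensor)
        results.append(pending.result())
    return results

frames = list(range(4))

start = time.perf_counter()
sync_out = run_sync(frames)
sync_time = time.perf_counter() - start

start = time.perf_counter()
async_out = run_pipelined(frames)
async_time = time.perf_counter() - start

assert sync_out == async_out
print(f"serial: {sync_time:.2f}s, pipelined: {async_time:.2f}s")
```

With four frames the serial path pays for every stage in sequence (about 0.4 s here), while the pipelined path hides most of the inference latency behind the next frame's preprocessing.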
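The memory-overhead point behind a buffer API can be shown with a generic zero-copy illustration. This is plain Python using `memoryview`, not the LiteRT TensorBuffer API; the idea is simply that a shared view avoids duplicating a large allocation, whereas a copy doubles it:

```python
# Conceptual illustration of zero-copy buffer views (plain Python,
# not the LiteRT TensorBuffer API).
import sys

data = bytearray(16 * 1024 * 1024)  # stand-in for a 16 MiB tensor

copy = bytes(data)       # a copy duplicates the full allocation
view = memoryview(data)  # a view shares the underlying storage

assert sys.getsizeof(copy) >= len(data)  # full-size duplicate
assert sys.getsizeof(view) < 1024        # the view object itself is tiny

view[0] = 42             # writes through the view...
assert data[0] == 42     # ...are visible in the original buffer
```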