How Reducto improved enterprise-scale document processing latency by 3x
Blog post from Modal
Reducto, which turns unstructured documents into structured data for enterprises, improved its document processing latency by 3x after moving to Modal's infrastructure. The company originally ran on manually provisioned EC2 instances and later on Kubernetes, and it faced scaling and latency challenges because its workloads are variable and prone to sharp traffic spikes. Modal gave Reducto the flexibility, GPU availability, and development experience it needed to address these problems: Reducto can now scale each model independently, customize scaling for individual customers, and reduce cold boot times through GPU memory snapshotting. In a large load test, Reducto scaled its ingestion pipeline to over 1,000 GPUs, demonstrating that it can handle demanding workloads. The transition also improved operational efficiency for Reducto's engineers, who now spend less time on the complexity and overhead of infrastructure management and more on developing new AI models. Looking ahead, Reducto plans to expand its use of Modal to deploy new AI models and enhance its document intelligence pipelines.