How Reducto improved enterprise-scale document processing latency by 3x

Post Details

Company

Modal

Date Published

Nov. 19, 2025

Word Count

803

Company Posts That Month

5

Language

English

Source URL

modal.com/blog/reducto-case-study

Summary

Reducto, a company that transforms unstructured documents into structured data for enterprises, improved its document processing latency by 3x after adopting Modal's infrastructure. Reducto originally relied on manually provisioned EC2 instances and later on Kubernetes, but both setups struggled with scaling and latency under variable workloads and sharp traffic spikes. Modal provided the flexibility, GPU availability, and developer experience Reducto needed. With it, Reducto could scale models independently, customize scaling for individual customers, and reduce cold boot times through GPU memory snapshotting. A large-scale load test showed that Reducto could scale its ingestion pipeline to over 1,000 GPUs, confirming that it can handle demanding workloads. The move also made Reducto's engineers more efficient. With less infrastructure complexity and overhead to manage, they can spend more time developing new AI models. Looking ahead, Reducto plans to expand its use of Modal to deploy new AI models and enhance its document intelligence pipelines.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM Change
Kubernetes	5	1,297	225	80	-9%
AI Model Fine-tuning	1	558	140	61	-27%
LLM	1	5,556	752	184	+14%
Real-time	1	4,542	1,005	235	-31%