How Modal uses ClickHouse to power real-time observability for AI workloads
Blog post from ClickHouse
Modal uses ClickHouse Cloud to power real-time observability for AI workloads running across numerous GPUs and containers, after running into scaling problems with data reads and writes. With ClickHouse Cloud, Modal ingests 1-2 million events per minute and manages around 500 billion logs while keeping queries under a second. That speed matters because Modal's infrastructure runs large-scale GPU workloads for training, inference, and batch processing. Developers deploy these workloads through Modal's Python SDK without having to deal with the complexity behind the scenes. On top of ClickHouse, Modal has built several real-time dashboards that give users detailed insight into function performance, such as execution time and latency trends. The dashboards are powered by a single ClickHouse table, which keeps queries efficient and provides full-lifecycle visibility into function calls. As Modal continues to grow, with event ingestion doubling to 2 million per minute, the team is exploring new features such as a billing API and a visual function call graph to further improve user experience and performance.
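To make the single-table idea concrete, here is a minimal sketch in plain Python of how lifecycle events stored as rows of one table could be rolled up into the per-function metrics a dashboard shows. The row layout (`call_id`, function name, event type, timestamp in milliseconds), the event names, and the sample data are all assumptions made for illustration; the post does not describe Modal's actual schema or queries.

```python
from collections import defaultdict
from statistics import median

# Hypothetical event rows, as they might sit in one wide events table:
# (call_id, function_name, event_type, timestamp_ms). Not Modal's real schema.
EVENTS = [
    ("c1", "embed", "created", 0),
    ("c1", "embed", "started", 500),
    ("c1", "embed", "finished", 2500),
    ("c2", "embed", "created", 1000),
    ("c2", "embed", "started", 1200),
    ("c2", "embed", "finished", 5200),
    ("c3", "train", "created", 2000),
    ("c3", "train", "started", 4000),
    ("c3", "train", "finished", 14000),
]

REQUIRED = {"created", "started", "finished"}


def lifecycle(events):
    """Group events by call so each call carries its full lifecycle."""
    calls = defaultdict(dict)
    for call_id, function_name, event_type, ts_ms in events:
        calls[call_id]["function"] = function_name
        calls[call_id][event_type] = ts_ms
    return calls


def summarize(events):
    """Compute per-function call counts and median queue/execution times."""
    durations = defaultdict(lambda: {"queue": [], "exec": []})
    for call in lifecycle(events).values():
        # Skip calls that are still in flight (missing a lifecycle event).
        if not REQUIRED <= call.keys():
            continue
        bucket = durations[call["function"]]
        bucket["queue"].append(call["started"] - call["created"])
        bucket["exec"].append(call["finished"] - call["started"])
    return {
        fn: {
            "calls": len(d["exec"]),
            "median_queue_ms": median(d["queue"]),
            "median_exec_ms": median(d["exec"]),
        }
        for fn, d in durations.items()
    }


summary = summarize(EVENTS)
for fn, stats in sorted(summary.items()):
    print(fn, stats)
```

In a real deployment this grouping would presumably run as an aggregate query inside ClickHouse rather than in application code; the sketch only shows why keeping every lifecycle event in one table makes both latency and execution-time views derivable from the same rows.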
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Real-time | 9 | 4,542 | 1,005 | 235 | -31% |
| Observability | 5 | 2,534 | 521 | 146 | +9% |