
Toto 2.0: Time series forecasting enters the scaling era

Blog post from Datadog

Post Details
Company: Datadog
Date Published:
Authors: Emaad Khwaja, Gerald Woo, Chris Lettieri, Ameet Talwalkar, David Asker
Word Count: 3,054
Language: English
Hacker News Points: -
Summary

Toto 2.0 is a family of open-weight time series forecasting models released on Hugging Face, ranging from 4 million to 2.5 billion parameters. The family demonstrates that scaling improves forecasting performance, as evidenced by its top rankings on the BOOM, GIFT-Eval, and TIME benchmarks. The models do not rely on public forecasting data for pretraining. Compared with Toto 1.0, they improve parameter efficiency and inference speed, in part through techniques such as contiguous patch masking. Toto 2.0 models consistently sit on the Pareto frontier of quality versus size, that is, no compared model is both smaller and more accurate, and they outperform competitive models on metrics such as CRPS and MASE. The release includes model weights and infrastructure for distributed training. The post also highlights the importance of data curation and points to future work on long-horizon stability and multimodal modeling for observability.
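
For readers unfamiliar with the terms in the summary, here is a minimal sketch of the two metrics it names (MASE and CRPS) and of what sitting on a Pareto frontier means. It is not the evaluation code behind BOOM, GIFT-Eval, or TIME. The function names, the sample-based CRPS estimator, and every number in the usage lines are illustrative assumptions, not results from the post.

```python
from statistics import fmean


def mase(history, actual, forecast, season=1):
    """Mean absolute scaled error.

    Forecast MAE divided by the in-sample MAE of a seasonal-naive
    forecast (predict the value observed `season` steps earlier).
    """
    naive_errors = [abs(history[t] - history[t - season])
                    for t in range(season, len(history))]
    scale = fmean(naive_errors)
    if scale == 0:
        raise ValueError("history is constant; MASE is undefined")
    forecast_errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return fmean(forecast_errors) / scale


def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `samples` are draws from the predictive distribution for one time step.
    Lower is better; with a single sample it reduces to absolute error.
    """
    n = len(samples)
    accuracy = fmean([abs(x - observed) for x in samples])
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


def pareto_frontier(models):
    """Return names of models not dominated on (size, error).

    `models` is a list of (name, size, error) tuples, lower is better for
    both. A model is dominated if another is no larger and no worse, and
    strictly better on at least one of the two.
    """
    frontier = []
    for name, size, error in models:
        dominated = any(
            s <= size and e <= error and (s < size or e < error)
            for _, s, e in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical numbers, for illustration only.
history = [1, 2, 3, 4, 5]
print(mase(history, actual=[6, 7], forecast=[5, 9]))
print(crps_from_samples([0, 2], observed=1))
print(pareto_frontier([("a", 4, 0.9), ("b", 100, 0.7),
                       ("c", 100, 0.8), ("d", 2500, 0.6)]))
```

In the last line, the hypothetical model `c` is excluded because `b` has the same size and a lower error; the other three each trade size against error without being beaten on both.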