Toto 2.0: Time series forecasting enters the scaling era
Blog post from Datadog
Toto 2.0 is a family of open-weight time series forecasting models, released on Hugging Face, that ranges from 4 million to 2.5 billion parameters. The family shows that forecasting quality improves with scale, as evidenced by its top rankings on benchmarks such as BOOM, GIFT-Eval, and TIME. The models are pretrained without public forecasting data, and they improve on Toto 1.0 in parameter efficiency and inference speed, notably through techniques such as contiguous patch masking. Toto 2.0 models consistently sit on the Pareto frontier of quality against size, meaning that no compared model is both smaller and more accurate, and they outperform competing models on metrics such as CRPS and MASE. The release includes model weights and infrastructure for distributed training. The post also highlights the importance of data curation and points to future improvements in areas such as long-horizon stability and multimodal modeling for observability.
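To make the Pareto frontier claim concrete, here is a minimal sketch of how such a frontier can be computed from (size, error) pairs. It is an illustration, not the evaluation code behind the post: the `ModelPoint` type, the `dominates` and `pareto_frontier` helpers, the model labels, and every number are hypothetical. The error column stands in for any lower-is-better score such as CRPS or MASE.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str        # hypothetical label, not a real model
    params_m: float  # parameter count in millions (lower is cheaper)
    error: float     # any lower-is-better score, e.g. CRPS or MASE


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.params_m <= b.params_m and a.error <= b.error
    strictly_better = a.params_m < b.params_m or a.error < b.error
    return no_worse and strictly_better


def pareto_frontier(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the non-dominated points, sorted by size."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.params_m)


# Made-up numbers for illustration only; these are not benchmark results.
candidates = [
    ModelPoint("tiny", 4, 0.90),
    ModelPoint("small", 20, 0.80),
    ModelPoint("bloated", 200, 0.85),  # larger and less accurate than "small"
    ModelPoint("large", 1000, 0.70),
]

frontier_names = [p.name for p in pareto_frontier(candidates)]
print(frontier_names)  # ['tiny', 'small', 'large']
```

In this toy data, "bloated" drops out because "small" is both smaller and more accurate. The other three stay on the frontier because each one trades size for accuracy in a way no other candidate beats on both axes.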