
DSGym: A holistic framework for evaluating and training data science agents

Blog post from Together AI

Post Details
Company: Together AI
Author: Fan Nie, Junlin Wang, Harper Hua, Federico Bianchi, Yongchan Kwon, Zhenting Qi, Owen Queen, Shang Zhu, James Zou
Word Count: 1,270
Language: English
Summary

DSGym is a comprehensive framework for evaluating and training large language model (LLM)-based data science agents, addressing the limitations of existing benchmarks that assess isolated skills under inconsistent environments. By integrating diverse data science evaluation suites behind a single API, DSGym standardizes abstractions for datasets, agents, and metrics, enabling fairer comparisons and reducing integration costs. The framework also broadens the evaluation scope with novel scientific analysis tasks and modeling competitions, including 90 bioinformatics tasks and 92 Kaggle competitions.

Beyond evaluation, DSGym supports agent training through trajectory generation and synthetic data pipelines; the authors demonstrate this by training a 4B model on 2,000 generated examples to state-of-the-art performance. Its modular design simplifies adding new tasks and evaluation scripts, and its benchmarks reveal that many models rely on memorization rather than actual data analysis, particularly on general tasks. Through this systematic investigation, DSGym aims to push data science agents to genuinely reason about data rather than merely recall patterns.
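The post does not show DSGym's actual API, but the idea of standardizing abstractions for datasets, agents, and metrics behind one interface can be illustrated with a minimal sketch. Everything below — the names `Task`, `Agent`, `evaluate`, and the trivial `EchoAgent` — is hypothetical and invented for illustration, not DSGym code:

```python
from dataclasses import dataclass
from typing import Callable, Protocol

# Hypothetical abstractions, invented for illustration only.

@dataclass
class Task:
    """One evaluation item: a prompt plus a reference answer."""
    name: str
    prompt: str
    reference: str

class Agent(Protocol):
    """Any agent that maps a task to a string answer fits this interface."""
    def solve(self, task: Task) -> str: ...

# A metric scores a prediction against a reference, returning a float.
Metric = Callable[[str, str], float]

def exact_match(prediction: str, reference: str) -> float:
    return 1.0 if prediction.strip() == reference.strip() else 0.0

def evaluate(agent: Agent, tasks: list[Task], metric: Metric) -> float:
    """Run the agent over every task and average the metric scores."""
    scores = [metric(agent.solve(t), t.reference) for t in tasks]
    return sum(scores) / len(scores) if scores else 0.0

class EchoAgent:
    """Trivial stand-in used to exercise the harness; a real agent
    would call an LLM and execute analysis code here."""
    def solve(self, task: Task) -> str:
        return task.reference

tasks = [Task("demo", "What is 2 + 2?", "4")]
print(evaluate(EchoAgent(), tasks, exact_match))  # 1.0
```

The point of such an interface is the one the post makes: once datasets, agents, and metrics share a common contract, a new benchmark suite or a new agent plugs in without per-pair integration glue, which is what makes comparisons across suites fair and cheap.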