Releasing LiteCoder-Terminal-SFT
Blog post from HuggingFace
LiteCoder-Terminal-SFT has been released. It improves on the previous version and includes a comprehensive training dataset of 11,255 trajectories. The release adds new terminal environments to support reinforcement learning (RL) training and expands the task categories to include coding, scientific/numerical computing, and games, covering a wider range of terminal interactions. To address the problem of missing execution feedback, the developers built a five-stage synthesis pipeline that turns task descriptions into executable environments. The updated training pipeline now integrates trajectories from multiple frameworks, which improves cross-scaffold generalization. Results on Terminal Bench 1.0, 2.0, and Pro improve significantly, most notably for the LiteCoder-30a3b-Terminal model, which reaches 31.5% Pass@1 on Terminal Bench Pro. The release also includes an exploratory dataset for environment state prediction, aimed at the computational cost of real-time terminal interaction, although current models still struggle with state prediction. The data is open-sourced to encourage the community to explore solutions and advance robust world modeling.
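For readers unfamiliar with the Pass@1 metric quoted above, here is a minimal Python sketch of how such a score is commonly computed. The post does not describe its exact evaluation procedure, so this is an assumption: it uses the widely used unbiased pass@k estimator, which for k = 1 reduces to the fraction of attempts that pass, averaged over tasks. The function names and the sample results are hypothetical and are not taken from the release.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task with n attempts, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k sample contains at least one pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-task estimate over (attempts, passes) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-task outcomes (attempts, passes); NOT data from the release.
results = [(4, 4), (4, 1), (4, 0), (4, 2)]
score = benchmark_pass_at_k(results, k=1)
print(f"Pass@1 = {score:.1%}")
```

With a single attempt per task, the same function simply returns the share of tasks solved, which is how a headline figure such as 31.5% would typically be read.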