RedPajama 7B now available; instruct model outperforms all open 7B models on HELM benchmarks
Blog post from Together AI
The RedPajama project has released v1 of its models, including instruction-tuned and chat versions, under the Apache 2.0 license. The RedPajama-INCITE-7B-Instruct model outperforms all other open 7B models on HELM benchmarks by 2 to 9 points, making it a strong choice for a wide range of tasks. The chat model is trained entirely on open-source data and does not use data distilled from closed models such as OpenAI's, so it is clean for use in open or commercial applications. A base model was also trained on the RedPajama dataset. It uses the same architecture as the popular Pythia model suite and scores slightly behind Falcon-7B on HELM.

The project aims to make future open-source models even better through community feedback and improvements on the data side, including balancing the data mixture across each data slice and exploring further data deduplication strategies.