EuroLLM-22B
Blog post from Hugging Face
EuroLLM-22B is a fully open multilingual language model developed in Europe, supporting the 24 official EU languages and 11 additional international languages. Built on EuroHPC infrastructure, it was trained on approximately 4 trillion tokens using 400 Nvidia H100 GPUs on the MareNostrum5 supercomputer. The model excels at machine translation and general benchmarks, outperforming models such as Gemma-3-27B, Qwen-3-32B, and Apertus-70B. Its development involved several European institutions and used a multi-phase training process to ensure high-quality language understanding and generation.

EuroLLM-22B offers a 32K-token context window and handles multi-turn conversations, making it a versatile tool for diverse language tasks. Its creation was supported by grants from EuroHPC, the EU's Horizon Europe Research and Innovation Actions, and the Portuguese Recovery and Resilience Plan, reflecting a collaborative effort across multiple research centers and universities.
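As a rough illustration of the multi-turn chat and translation use cases mentioned above, the sketch below loads the model with the `transformers` library and runs a short conversation through its chat template. The checkpoint name `utter-project/EuroLLM-22B-Instruct` is an assumption based on the naming of earlier EuroLLM releases; check the model card on the Hub for the actual identifier.

```python
# Minimal sketch: multi-turn chat with EuroLLM-22B via Hugging Face transformers.
# The checkpoint ID below is assumed; see the utter-project Hub page for the released name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-22B-Instruct"  # assumed instruction-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# A multi-turn conversation; the chat template takes care of role formatting.
messages = [
    {"role": "user", "content": "Translate to Portuguese: The weather is lovely today."},
    {"role": "assistant", "content": "O tempo está agradável hoje."},
    {"role": "user", "content": "Now translate the same sentence to Polish."},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```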