Vocabulary-Augmented Prompting for Sango — Production African Language AI Without a Parallel Corpus
Blog post from HuggingFace
In May 2026, Google Translate added Sango, the national language of the Central African Republic, to its supported languages, a significant step for zero-resource African languages in AI. The addition also highlights what general-purpose translation often lacks: domain-specific vocabulary and grammar infrastructure. SangoAI was developed to fill that gap using a method called vocabulary-augmented prompting, which pairs a curated lexicon and language-specific prompts with a general-purpose language model, so it needs neither a large parallel corpus nor fine-tuning. Although less theoretically elegant than classical neural machine translation, the approach provides production-quality translations for languages like Sango and can be adapted to other low-resource African languages. The project emphasizes that specialized vocabulary and grammar infrastructure is essential for effective communication in domains such as healthcare and education. SangoAI's success demonstrates a scalable solution for the approximately 2,000 African languages that need similar support, and there are plans to expand to other languages such as Ewondo and Lingala.
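To make the idea of vocabulary-augmented prompting concrete, here is a minimal Python sketch of how a curated lexicon might be folded into a prompt. It is not SangoAI's actual implementation: the lexicon contents, the function names `find_terms` and `build_prompt`, and the prompt wording are all assumptions made for illustration, and the target-side entries are placeholders, not real Sango vocabulary.

```python
import re

# Hypothetical mini-lexicon: English term -> target-language gloss.
# The glosses are placeholders, NOT real Sango words.
LEXICON = {
    "fever": "<SANGO:fever>",
    "clean water": "<SANGO:clean water>",
    "vaccine": "<SANGO:vaccine>",
    "school": "<SANGO:school>",
}


def find_terms(text, lexicon):
    """Return (term, gloss) pairs for lexicon terms found in text.

    Matching is case-insensitive and on whole words; longer terms
    are listed first so multi-word entries appear before single words.
    """
    lowered = text.lower()
    hits = []
    for term in sorted(lexicon, key=len, reverse=True):
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((term, lexicon[term]))
    return hits


def build_prompt(text, lexicon, language="Sango"):
    """Assemble a translation prompt that carries the matched glossary."""
    hits = find_terms(text, lexicon)
    glossary = "\n".join(f"- {src} = {tgt}" for src, tgt in hits)
    if not glossary:
        glossary = "- (no lexicon matches)"
    return (
        f"Translate the following English text into {language}.\n"
        "Use these vetted glossary terms exactly as given:\n"
        f"{glossary}\n\n"
        f"Text: {text}\n"
        "Translation:"
    )


prompt = build_prompt(
    "Give the child clean water if the fever continues.", LEXICON
)
print(prompt)
```

The resulting string would then be sent to whichever general-purpose language model is in use. Only the lexicon entries relevant to the input are included, which keeps the prompt short while still constraining domain terms, and no model weights are changed at any point.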