Alyah ⭐️: Toward Robust Evaluation of Emirati Dialect Capabilities in Arabic LLMs
Blog post from HuggingFace
Alyah is an Emirati-centric benchmark that evaluates how well Arabic large language models (LLMs) understand the Emirati dialect, which carries cultural and linguistic features distinct from Modern Standard Arabic. Existing benchmarks focus primarily on Modern Standard Arabic; Alyah addresses that gap by testing whether models can interpret culturally embedded meanings, idiomatic expressions, and dialect-specific usage, using a dataset of 1,173 samples collected from native speakers. The evaluation uses multiple-choice questions across categories such as greetings, social sensitivity, and poetry, and scores models on semantic correctness rather than literal translation. The study found that instruction-tuned models generally outperformed base models, particularly in categories involving conversational norms and culturally appropriate responses, though implicit meanings and rare expressions remain challenging. The benchmark is intended as a tool to guide model training and adaptation, supporting the development of LLMs that are better attuned to the cultural and linguistic needs of the Emirati community.
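Because the results are reported per category, it may help to see how per-category accuracy on a multiple-choice benchmark is typically computed. The following is a minimal sketch, not Alyah's actual evaluation code: the field names `category` and `answer`, the letter-style choice labels, and the toy samples are all assumptions made for illustration.

```python
from collections import defaultdict


def per_category_accuracy(samples, predictions):
    """Return {category: accuracy} for multiple-choice predictions.

    `samples` is a list of dicts with hypothetical keys "category" and
    "answer" (the gold choice label); `predictions` holds the model's
    chosen label for each sample, in the same order.
    """
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for sample, predicted in zip(samples, predictions):
        category = sample["category"]
        total[category] += 1
        # A prediction counts only if it matches the gold choice exactly.
        if predicted == sample["answer"]:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Toy data for illustration only; these are not real benchmark items or results.
samples = [
    {"category": "greetings", "answer": "B"},
    {"category": "greetings", "answer": "A"},
    {"category": "poetry", "answer": "C"},
]
predictions = ["B", "C", "C"]
scores = per_category_accuracy(samples, predictions)
```

Reporting one score per category, rather than a single overall number, is what makes it possible to see that a model handles conversational norms well while still struggling with implicit meanings or rare expressions.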