SabiYarn is a study of optimization methods for advancing low-resource languages in NLP through efficient pre-training of large language models (LLMs). The work addresses the resource-intensive training processes that hinder the inclusion of languages with limited data, such as Nigerian languages. By applying techniques such as mask-based loss computation, the researchers trained a state-of-the-art multilingual model on a single 24 GB GPU, concentrating compute on task-relevant tokens rather than static prompts. This approach improves task performance and speeds convergence without post-training alignment, which is often infeasible in resource-constrained settings. The work also emphasizes the importance of language-specific tokenizers that better capture the linguistic nuances of African languages, improving the model's efficiency and performance. The study highlights a shift toward building native LLMs that do not inherit cultural biases, offers insights into the training dynamics of African languages, and proposes future exploration of modern LLM architectures and hardware-specific optimizations.
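
To make the mask-based loss idea concrete, the sketch below shows one common way such a loss could be implemented in PyTorch: prompt positions are flagged with a boolean mask and mapped to the `ignore_index` of the cross-entropy, so only task-relevant (completion) tokens contribute to the loss and gradients. The function name `masked_lm_loss` and the tensor layout are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits: torch.Tensor,
                   input_ids: torch.Tensor,
                   prompt_mask: torch.Tensor) -> torch.Tensor:
    """Next-token cross-entropy computed over non-prompt tokens only.

    logits:      (batch, seq_len, vocab) model outputs
    input_ids:   (batch, seq_len) token ids
    prompt_mask: (batch, seq_len) bool, True where the token belongs to the
                 static prompt and should be excluded from the loss
    """
    # Standard causal LM shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    shift_mask = prompt_mask[:, 1:].contiguous()

    # Replace prompt positions with -100 so cross_entropy ignores them,
    # focusing optimization on task-relevant tokens.
    labels = shift_labels.masked_fill(shift_mask, -100)

    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

# Minimal usage example with random tensors standing in for model outputs.
if __name__ == "__main__":
    batch, seq_len, vocab = 2, 8, 1000
    logits = torch.randn(batch, seq_len, vocab)
    input_ids = torch.randint(0, vocab, (batch, seq_len))
    # Pretend the first 4 tokens of each sequence are the static prompt.
    prompt_mask = torch.zeros(batch, seq_len, dtype=torch.bool)
    prompt_mask[:, :4] = True
    print(masked_lm_loss(logits, input_ids, prompt_mask))
```

Because masked positions carry no gradient, the effective batch of supervised tokens shrinks to the completion spans, which is one plausible reason such a scheme converges faster on task data under a tight compute budget.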