Recent developments in language modeling have focused on the potential of small language models (SLMs) such as TinyLlama, Phi-2, Gemma, and StableLM 2, which may offer performance comparable to much larger models while remaining far more manageable and efficient. The work described here explores fine-tuning SLMs to perform specific tasks with the same efficacy as larger models, relying on distributed training techniques such as Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP) to cope with the demands of training on massive datasets.

Ray, a framework for distributed computing, plays a central role in this setup: it orchestrates the distributed training jobs, provides fault tolerance, and helps optimize resource utilization. The training infrastructure is a cluster of nodes with shared volumes for data streaming, using WebDataset for efficient data handling. Despite challenges such as data-loading bottlenecks and streaming interruptions, the setup aims to demonstrate the viability of SLMs in real-world applications, and future releases are planned to showcase models fine-tuned for specific tasks.
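To make the distributed setup more concrete, below is a minimal sketch of what a Ray Train fine-tuning job with FSDP sharding could look like. This is an illustration under assumptions, not the exact pipeline used here: the model name, hyperparameters, step count, and toy batch are placeholders, and a real run would also pass an FSDP auto-wrap policy and a proper data loader.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM, AutoTokenizer

import ray.train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Ray Train initializes torch.distributed and assigns a GPU to each worker
    # before this function runs, so the model can be sharded with FSDP directly.
    tokenizer = AutoTokenizer.from_pretrained(config["model_name"])
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(config["model_name"])
    model = FSDP(model.to("cuda"))  # a real setup would add an auto-wrap policy
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["lr"])

    # Toy batch standing in for a real streaming dataset.
    batch = tokenizer(
        ["Hello, distributed world!"] * 4, return_tensors="pt", padding=True
    ).to("cuda")

    for step in range(config["steps"]):
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        ray.train.report({"step": step, "loss": loss.item()})


trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={
        "model_name": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder SLM
        "lr": 2e-5,
        "steps": 10,
    },
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
```

Because Ray Train owns the worker processes, it can restart failed workers and reschedule the job, which is where the fault-tolerance benefits mentioned above come from.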
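The data side could look like the following WebDataset sketch, which streams tar shards from a shared volume into a PyTorch DataLoader. The mount path, shard pattern, and sample key are assumptions for illustration rather than the actual dataset layout.

```python
import webdataset as wds
from torch.utils.data import DataLoader

# Tar shards on a shared volume; the path and shard range are placeholders.
shards = "/mnt/shared/corpus/shard-{000000..000099}.tar"

dataset = (
    wds.WebDataset(shards, shardshuffle=True)  # shuffle shard order
    .shuffle(1000)                             # small in-memory sample shuffle buffer
    .decode()                                  # decode each component by file extension
    .to_tuple("txt")                           # keep only the text component per sample
)

# batch_size=None disables the DataLoader's own collation,
# so samples stream through unchanged from the tar shards.
loader = DataLoader(dataset, batch_size=None, num_workers=4)

for (text,) in loader:
    # Tokenize `text` and feed it to the training loop from the previous sketch.
    pass
```

Streaming shards this way keeps each node reading sequentially from the shared volume, which is one common way to mitigate the data-loading and streaming interruptions noted above.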