Apriel 5B: ServiceNow’s Enterprise AI Trained and Deployed on Lambda
Blog post from Lambda
Apriel 5B is a compact yet capable large language model (LLM) developed by ServiceNow and built for efficient, cost-effective enterprise deployment. With 4.8 billion parameters and a decoder-only transformer architecture, it is designed to maximize performance while keeping compute costs and latency low, making it well suited to high-throughput inference and fine-tuning.

Trained on a diverse dataset of 4.5 trillion tokens spanning natural language text and programming languages, Apriel 5B targets enterprise applications such as IT service management automation, conversational AI, code generation, and domain-specific natural language processing. Even at this smaller scale, it delivers an inference throughput of approximately 1,250 tokens per second.

Apriel 5B supports mixed-precision quantization as well as pipeline and tensor parallelism, and it is production-ready through ONNX Runtime and Triton Inference Server deployments. Its development was supported by Lambda's infrastructure, which let the team experiment and iterate without being slowed by hardware limits or interruptions.
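To make the inference story concrete, here is a minimal sketch of loading the model and generating a response with Hugging Face Transformers. The repository id `ServiceNow-AI/Apriel-5B-Instruct` and the prompt are assumptions for illustration, not taken from this post; substitute the published checkpoint name and your own input.

```python
# Minimal inference sketch (assumes torch and transformers are installed,
# and that the checkpoint id below matches the released model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ServiceNow-AI/Apriel-5B-Instruct"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # mixed precision keeps memory use and latency low
    device_map="auto",           # place layers on available GPU(s)
)

prompt = "Summarize the open incidents assigned to the network team."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

The same checkpoint can then be exported and served behind ONNX Runtime or Triton Inference Server for production traffic, as described above.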