Content Deep Dive

Kog Laneformer 2B: The Latency-First Model Behind Kog Inference Engine

Blog post from HuggingFace

Post Details
Company
Date Published
Author
Morgan Giraud, Gauthier Tallec, and Gaël Delalleau
Word Count
3,042
Company Posts That Month
90
Language
-
Hacker News Points
-
Summary

Kog, a Paris-based AI infrastructure startup, has released Laneformer 2B, a 2.3-billion-parameter coding model optimized for high-speed decoding, on the Hugging Face Hub. Rather than prioritizing benchmark quality first, Kog targeted inference speed from the outset, designing the model architecture to integrate with the Kog Inference Engine. This latency-first approach led to Delayed Tensor Parallelism (DTP), which defers the cost of inter-GPU communication and so increases decoding speed without compromising model quality. Laneformer 2B, trained on a mixture of open-source data, shows competitive coding capability and scores well on benchmarks such as HumanEval+ and MBPP+. Kog's open-source release includes the model weights, architecture, and documentation, and is intended to encourage community involvement and innovation in latency-oriented model design. Training ran on efficient European infrastructure with high-performance GPUs, with the aim of a robust and repeatable process.

Trends Found in this Post
Trend                 Post Mentions   Total Month Mentions   Posts   Companies   MoM
LLM                   7               5,172                  1,006   220         -43%
Real-time             6               5,457                  1,338   238         -5%
AI Model Fine-tuning  3               694                    169     62          +13%
AI Agents             1               4,874                  1,103   240         -1%
AI Guardrails         1               437                    127     49          +102%
Data Pipeline         1               441                    203     86          -29%