Holotron-12B - High Throughput Computer Use Agent

Blog post from HuggingFace

Post Details
Company: HuggingFace
Author: Pierre-Louis Cedoz, Hamza Benchekroun, Aurélien Lac, delfosse, Tony Wu, Mats L. Richter, Antoine Bonnet, Kai Yuan, Aleix Cambray (H-AI), and Alexandra
Word Count: 868
Summary

Holotron-12B is a multimodal computer-use model developed by H Company. It is post-trained from NVIDIA's Nemotron-Nano-2 VL model and designed for high-throughput serving in interactive environments. Its hybrid architecture combines a State-Space Model (SSM) with attention, which makes inference efficient and scalable, particularly for agentic workloads with long interaction histories and multiple images. According to the post, Holotron-12B outperforms previous iterations such as Holo2-8B, performs strongly on benchmarks such as WebVoyager, and achieves higher throughput and better VRAM utilization on a single H100 GPU. It was trained with supervised fine-tuning on proprietary data, which produced notable gains in localization, grounding, and UI-level interaction. The model is available on Hugging Face under an NVIDIA Open Model License. The post also looks ahead to the upcoming Nemotron 3 Omni, which aims to further improve reasoning and multimodal precision for large-scale autonomous applications.
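The link between the hybrid SSM and attention design and serving throughput can be illustrated with a rough per-sequence memory estimate. In general, an attention layer caches keys and values for every token, so its cache grows linearly with context length, while an SSM layer keeps a fixed-size recurrent state. The sketch below is only illustrative: the layer counts, head sizes, and state dimensions are hypothetical placeholders, not Holotron-12B's actual configuration, and the helper names are invented for this example.

```python
# Back-of-the-envelope cache sizing. All dimensions below are hypothetical
# and are NOT taken from Holotron-12B or Nemotron model configs.


def kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-sequence KV cache: one key and one value vector per cached token."""
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(ssm_layers, d_inner, d_state, bytes_per_elem=2):
    """Per-sequence recurrent state: fixed size, independent of sequence length."""
    return ssm_layers * d_inner * d_state * bytes_per_elem


def per_sequence_cache_bytes(attn_layers, ssm_layers, seq_len,
                             kv_heads=8, head_dim=128,
                             d_inner=8192, d_state=128):
    """Total cache for one sequence in a model mixing attention and SSM layers."""
    kv = kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len)
    ssm = ssm_state_bytes(ssm_layers, d_inner, d_state)
    return kv + ssm


if __name__ == "__main__":
    for seq_len in (4096, 32768, 131072):
        # Hypothetical 40-layer models: attention-only vs. mostly-SSM hybrid.
        dense = per_sequence_cache_bytes(attn_layers=40, ssm_layers=0, seq_len=seq_len)
        hybrid = per_sequence_cache_bytes(attn_layers=6, ssm_layers=34, seq_len=seq_len)
        print(f"seq_len={seq_len}: attention-only {dense / 2**20:.0f} MiB, "
              f"hybrid {hybrid / 2**20:.0f} MiB")
```

Under these assumed dimensions, the hybrid's cache grows far more slowly as the context lengthens, which leaves room for more concurrent sequences in the same VRAM. That is one plausible reason a hybrid model suits agentic workloads with long histories, though real throughput also depends on the serving stack and kernel support.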