
Introduction to Trimming ✂

Blog post from HuggingFace

Post Details
Company: HuggingFace
Author: Loïck BOURDOIS, Tom Aarsen, Bram Vanroy, Woojun Jung, Manuel Romero, and Prithiv Sakthi
Word Count: 19,577
Summary

The blog post introduces "trimming," a technique for reducing the size of machine learning models by modifying or removing model weights, specifically the vocabulary-dependent parts of the architecture (the token embedding matrix and, where present, the output projection over the vocabulary). Unlike pruning, trimming targets the model's vocabulary size to reduce memory usage and improve computational efficiency without retraining. This makes it particularly suitable for multilingual models, whose large shared vocabularies account for a substantial share of their parameters even though an application in a single language uses only a fraction of those tokens. The post reports experiments on various models showing that trimming can maintain or even improve performance while significantly reducing model size. It examines the impact of trimming on different architectures, including text embedding models, encoders, decoders, and vision-language models (VLMs), and emphasizes its advantages over distillation and quantization. Finally, it raises open questions: how many tokens to retain, whether to trim before or after fine-tuning, and how trimming affects biases. The authors suggest that trimming could offer a simple yet effective alternative for model optimization.
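To make the idea of vocabulary trimming concrete, here is a minimal, framework-free sketch. It is not the authors' code or any library's API: the helper names (`select_vocab`, `trim_embedding`, `remap`, `vocab_params`), the selection rule (keep special tokens plus the `top_k` most frequent token ids in a target-language corpus), and the toy token ids are all hypothetical. It shows the three core steps: choose which vocabulary entries to keep, slice the matching rows out of the embedding matrix, and re-encode token ids for the smaller model. A real implementation would also have to update the tokenizer and any untied output head.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # one embedding row per token id


def select_vocab(corpus_ids: Sequence[Sequence[int]], special_ids: Sequence[int], top_k: int) -> List[int]:
    """Hypothetical selection rule: keep every special token plus the top_k
    most frequent ids observed in a tokenized target-language corpus."""
    specials = set(special_ids)
    counts = Counter(tid for seq in corpus_ids for tid in seq)
    frequent = [tid for tid, _ in counts.most_common() if tid not in specials][:top_k]
    # Sorting keeps the surviving tokens in their original relative order.
    return sorted(specials | set(frequent))


def trim_embedding(embedding: Matrix, kept_ids: List[int]) -> Tuple[Matrix, Dict[int, int]]:
    """Copy only the rows of kept ids; return the smaller matrix and the
    old-id -> new-id mapping. No weights are retrained."""
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    trimmed = [list(embedding[old]) for old in kept_ids]
    return trimmed, old_to_new


def remap(ids: Sequence[int], old_to_new: Dict[int, int], unk_old_id: int) -> List[int]:
    """Re-encode token ids for the trimmed model; dropped tokens fall back to UNK."""
    unk_new = old_to_new[unk_old_id]
    return [old_to_new.get(t, unk_new) for t in ids]


def vocab_params(vocab_size: int, hidden_size: int, tied_head: bool = True) -> int:
    """Parameters tied to the vocabulary: the input embedding, plus an output
    head of the same shape when it is not shared with the embedding."""
    return vocab_size * hidden_size * (1 if tied_head else 2)


# Toy example: 8-token vocabulary, 2-dim embeddings; ids 0 and 1 are
# hypothetical special tokens, with 1 acting as UNK.
embedding = [[float(i), float(i) * 10] for i in range(8)]
corpus = [[5, 6, 5, 7], [5, 6, 3]]
kept = select_vocab(corpus, special_ids=[0, 1], top_k=2)
trimmed, mapping = trim_embedding(embedding, kept)
print(kept)                                      # [0, 1, 5, 6]
print(trimmed[2])                                # [5.0, 50.0]
print(remap([5, 7, 6], mapping, unk_old_id=1))   # [2, 1, 3]

# Illustrative sizes only (a 250,002-token vocabulary with 768-dim embeddings,
# as in XLM-RoBERTa-base) versus a hypothetical 30,000-token trimmed vocabulary.
print(vocab_params(250_002, 768))                # 192001536
print(vocab_params(30_000, 768))                 # 23040000
```

The last two lines show why multilingual models benefit most: the embedding matrix scales linearly with vocabulary size, so keeping only the tokens a target language actually uses removes most of those parameters, and the saving doubles when the output head is not tied to the embedding.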