Pruning has emerged as a critical technique for reducing the size of large language models (LLMs) while maintaining their functionality. The article focuses on structured width pruning in models built on the Gated Linear Unit (GLU) architecture, such as LLaMA 3.2, where neurons are selectively removed from the MLP layers to shrink the model while keeping essential capabilities intact. The method described respects the GLU structure, achieving a significant size reduction without compromising performance on tasks like BoolQ, although challenges remain on tasks that require broad context understanding, such as Lambada. The approach assesses neuron importance and carefully adjusts layer dimensions, yielding a model that stays coherent and can be further optimized through techniques like knowledge distillation. The discussion of depth pruning and capability-recovery processes points to future research directions for improving the efficiency and performance of pruned models, making them easier to deploy without extensive infrastructure.
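
To make the GLU-aware width pruning concrete, here is a minimal sketch, not the article's actual code. It assumes a Hugging Face LLaMA-style MLP module with `gate_proj`, `up_proj`, and `down_proj` projections and uses a simple weight-norm importance score; the function name, the `keep_ratio` parameter, and the scoring rule are illustrative assumptions.

```python
# Minimal sketch of GLU-aware width pruning for one LLaMA-style MLP block.
# Assumptions: `mlp` exposes gate_proj / up_proj (hidden -> intermediate)
# and down_proj (intermediate -> hidden), as in Hugging Face's LlamaMLP,
# and neuron importance is approximated by weight norms.
import torch
import torch.nn as nn

def prune_glu_mlp(mlp: nn.Module, keep_ratio: float = 0.7) -> None:
    """Remove the least important intermediate neurons, keeping GLU pairs aligned."""
    gate_w = mlp.gate_proj.weight    # [intermediate, hidden]
    up_w = mlp.up_proj.weight        # [intermediate, hidden]
    down_w = mlp.down_proj.weight    # [hidden, intermediate]

    # Score each intermediate neuron by the combined norm of its gate and up rows.
    importance = gate_w.norm(dim=1) + up_w.norm(dim=1)
    k = int(importance.numel() * keep_ratio)
    keep = torch.topk(importance, k).indices.sort().values

    def _new_linear(weight: torch.Tensor) -> nn.Linear:
        out_f, in_f = weight.shape
        layer = nn.Linear(in_f, out_f, bias=False)
        layer.weight = nn.Parameter(weight.clone())
        return layer

    # Slice the same indices in all three projections so the GLU pairing
    # (gate[i] * up[i]) and the matching down_proj columns stay consistent.
    mlp.gate_proj = _new_linear(gate_w[keep])
    mlp.up_proj = _new_linear(up_w[keep])
    mlp.down_proj = _new_linear(down_w[:, keep])
    mlp.intermediate_size = k  # keep the module's bookkeeping in sync, if present
```

In practice, a routine like this would be applied to every decoder layer's MLP, after which the model config's `intermediate_size` would be updated so the pruned checkpoint loads correctly; the exact importance criterion and pruning ratio are the knobs the article's method tunes.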