The article raises concerns about the current trajectory of deep learning, particularly the trend toward ever-larger models such as NVIDIA's MegatronLM, which deliver impressive results but consume enormous compute and memory, limiting who can train, deploy, and scale them. The author argues for shifting the focus from state-of-the-art accuracy to efficiency, advocating smaller, faster models that can run on a wide range of devices. Techniques such as knowledge distillation, pruning, and quantization are presented as effective ways to shrink models while preserving accuracy, as illustrated by the substantial size reductions reported for well-known models. The article also stresses tailoring models to the hardware capabilities of each target device to ensure consistent performance, concluding that the future of deep learning lies in optimizing models for size and efficiency so they become more broadly applicable and accessible.
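As a rough illustration of two of the techniques named above, the sketch below applies magnitude pruning and dynamic int8 quantization with PyTorch. It is not the article's own code: the two-layer network, the 30% pruning amount, and the use of dynamic quantization are assumptions chosen only to show the mechanics of shrinking a model.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical small network standing in for a much larger model.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 30% of weights with the smallest L1 magnitude
# in each Linear layer, then make the sparsity permanent.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")

# Quantization: replace Linear layers with dynamically quantized versions,
# storing weights as 8-bit integers instead of 32-bit floats at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```

In practice the pruned and quantized model would be re-evaluated on a validation set to confirm that accuracy is preserved, which is the trade-off the article emphasizes.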