Deep Learning Paper Recap - Redundancy Reduction and Sparse MoEs

What's this blog post about?

"Barlow Twins" introduces a novel self-supervised learning (SSL) solution that doesn't require negative instances. Unlike most SSL algorithms based on contrastive learning, Barlow Twins avoids collapse by measuring the cross correlation matrix between outputs of two identical networks fed with distorted versions of a sample, aiming to make it as close to the identity matrix as possible. Additionally, batch normalization of features prior to the Barlow Twins loss is crucial for preventing collapse. This technique has shown competitive performance compared to state-of-the-art contrastive methods like SimCLR. In "Sparse MoEs Meet Efficient Ensembles," the paper explores using sparse Mixtures of Experts (MoE) and model ensembles together. MoE are neural networks that use dynamic routing at the token level to execute subgraphs, allowing for a larger parameter count than dense counterparts while maintaining the same compute requirements. The results show that sparse MoEs and static ensembles can have complementary features and benefits, providing higher accuracy, more robustness, and better calibration when used together. This suggests that even as the number of experts in an MoE increases, there is still additional value added by incorporating more models into a traditional model ensemble.

Company
AssemblyAI

Date published
Aug. 17, 2022

Author(s)
Domenic Donato, Kevin Zhang

Word count
470

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.