DiScoFormer: One transformer for density and score, across distributions
Blog post from HuggingFace
DiScoFormer is a transformer that estimates both the density and the score (the gradient of the log-density) of a data distribution in a single forward pass, with no retraining for each new distribution. This addresses two limitations of existing methods: kernel density estimation (KDE) degrades in high dimensions, and neural score-matching models must be retrained for every new problem. DiScoFormer uses cross-attention and a shared backbone with separate output heads for density and score, so both quantities can be evaluated at any query point. The model is trained on Gaussian mixture models (GMMs), whose density and score are available in closed form and therefore provide exact supervision targets, and which are universal density approximators. DiScoFormer significantly outperforms KDE on both density and score estimation, particularly in high dimensions, and it can adapt to out-of-distribution inputs without requiring ground-truth data. Its promise is as a plug-in estimator that stays accurate across applications such as generative modeling and Bayesian inference, reducing the need to retrain for each new problem.
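To make the "exact targets" point concrete, here is a minimal NumPy sketch of the closed-form log-density and score of a GMM, plus a Gaussian KDE baseline written as a special case of the same formula. This is an illustration under stated assumptions, not DiScoFormer's actual training code: the function names `gmm_log_density_and_score` and `kde_log_density_and_score`, the array shapes, and the fixed scalar `bandwidth` are all hypothetical choices made for this example.

```python
import numpy as np

def gmm_log_density_and_score(x, weights, means, covs):
    """Exact log p(x) and score grad_x log p(x) for a Gaussian mixture.

    Shapes (assumed for this sketch):
      x: (n, d), weights: (k,), means: (k, d), covs: (k, d, d).
    """
    n, d = x.shape
    k = weights.shape[0]
    log_comp = np.empty((n, k))   # log(w_j * N(x | mu_j, Sigma_j))
    grads = np.empty((n, k, d))   # gradient of each component's log-density
    for j in range(k):
        diff = x - means[j]
        prec = np.linalg.inv(covs[j])
        _, logdet = np.linalg.slogdet(covs[j])
        maha = np.einsum("ni,ij,nj->n", diff, prec, diff)
        log_comp[:, j] = np.log(weights[j]) - 0.5 * (
            d * np.log(2.0 * np.pi) + logdet + maha
        )
        grads[:, j, :] = -diff @ prec   # -Sigma_j^{-1} (x - mu_j)
    # Log-sum-exp over components for numerical stability.
    top = log_comp.max(axis=1, keepdims=True)
    log_p = top[:, 0] + np.log(np.exp(log_comp - top).sum(axis=1))
    # The mixture score is the responsibility-weighted average of component scores.
    resp = np.exp(log_comp - log_p[:, None])
    score = np.einsum("nk,nkd->nd", resp, grads)
    return log_p, score

def kde_log_density_and_score(x, samples, bandwidth):
    """Gaussian KDE baseline: an equal-weight GMM centred on the samples."""
    m, d = samples.shape
    weights = np.full(m, 1.0 / m)
    covs = np.broadcast_to(bandwidth**2 * np.eye(d), (m, d, d))
    return gmm_log_density_and_score(x, weights, samples, covs)

# Usage: exact targets vs. a KDE estimate built from 200 samples of the same GMM.
rng = np.random.default_rng(0)
weights = np.array([0.4, 0.6])
means = np.array([[-2.0, 0.0], [2.0, 1.0]])
covs = np.array([[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.3], [0.3, 0.5]]])
labels = rng.choice(2, size=200, p=weights)
samples = np.stack([rng.multivariate_normal(means[c], covs[c]) for c in labels])
query = np.array([[0.0, 0.0], [2.0, 1.0]])
exact_log_p, exact_score = gmm_log_density_and_score(query, weights, means, covs)
kde_log_p, kde_score = kde_log_density_and_score(query, samples, bandwidth=0.5)
print("exact log-density:", exact_log_p, "KDE log-density:", kde_log_p)
```

Because the exact values come from the mixture parameters alone, every sampled GMM supplies noise-free regression targets for both heads. The KDE baseline has to reconstruct the same quantities from the samples and a bandwidth, which is where its accuracy suffers as the dimension grows.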