BERTs that chat: turn any BERT into a chatbot with dLLM
Blog post from Hugging Face
Researchers Zhanhui Zhou and Lingjie Chen show that a standard BERT model can be turned into a conversational chatbot with only a small amount of open-source instruction-following data and the diffusion framework: the adaptation requires only supervised finetuning, not extensive generative pretraining.

They introduce dLLM, an open-source framework that standardizes the training, inference, and evaluation of diffusion language models (DLMs), addressing barriers such as the lack of a unified toolchain and high computational costs. The framework supports easy reproduction of experiments and includes open implementations of previously unavailable algorithms.

Their ModernBERT-Chat model, finetuned on instruction-response pairs, performs close to the Qwen1.5-0.5B baseline model across several benchmarks, suggesting that BERT's original masked language modeling pretraining already imparts sufficient knowledge for diffusion-based generation. The researchers encourage further community contributions to enhance dLLM's capabilities, aiming to make it a comprehensive platform for DLM research.
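The supervised finetuning step described above hinges on a masked-diffusion training objective: the prompt is left intact while a random fraction of response tokens is replaced with a mask token, and the model is trained to recover the originals at exactly those positions. The sketch below illustrates that data-preparation idea in miniature; the function name `diffusion_sft_example`, the string-token representation, and the `[MASK]` literal are illustrative assumptions, not dLLM's actual API.

```python
import random

MASK = "[MASK]"  # placeholder mask token; real tokenizers use an id


def diffusion_sft_example(prompt_tokens, response_tokens, mask_ratio):
    """Build one masked-diffusion SFT training example (toy sketch).

    The prompt is kept visible; each response token is independently
    masked with probability `mask_ratio`. Training loss would be
    computed only at the masked positions, whose original tokens are
    returned as targets.
    """
    masked_response = list(response_tokens)
    targets = {}  # response position -> original token to predict
    for i, tok in enumerate(response_tokens):
        if random.random() < mask_ratio:
            masked_response[i] = MASK
            targets[i] = tok
    return prompt_tokens + masked_response, targets
```

Varying `mask_ratio` across training batches is what distinguishes this from plain BERT-style MLM: the model learns to denoise responses at every corruption level, which is what the diffusion decoding procedure later relies on.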
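At inference time, diffusion language models typically generate by iterative unmasking: start from an all-mask response, let the model score every masked position, commit the most confident predictions, and repeat until no masks remain. This is a minimal sketch of that confidence-based loop, not dLLM's implementation; the `predict` callable (returning a token and a confidence for one masked position) is a hypothetical stand-in for a real model forward pass.

```python
import math

MASK = "[MASK]"  # placeholder mask token


def iterative_unmask(prompt, length, predict, steps):
    """Decode a response of `length` tokens by iterative unmasking.

    `predict(prompt, partial_response, i)` must return a
    (token, confidence) pair for masked position `i`. Each step
    commits roughly 1/steps of the remaining masked positions,
    highest confidence first.
    """
    response = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(response) if t == MASK]
        if not masked:
            break
        scored = [(i, *predict(prompt, response, i)) for i in masked]
        # Unmask enough positions to finish within the remaining steps.
        k = max(1, math.ceil(len(masked) / (steps - step)))
        for i, token, _conf in sorted(scored, key=lambda s: -s[2])[:k]:
            response[i] = token
    return response
```

Because every position can be re-scored against the partially revealed response at each step, this procedure is parallel and order-free, unlike left-to-right autoregressive decoding, which is what lets a bidirectional encoder such as BERT serve as the denoiser.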