Are We Really Making Much Progress in Text Classification? A Comparative Review - Summary
Blog post from Portkey
The paper provides a comprehensive review and comparison of methods for single-label and multi-label text classification, categorizing them into bag-of-words, sequence-based, graph-based, and hierarchical methods. It concludes that pre-trained language models consistently outperform graph-based and hierarchy-based methods, which in turn are sometimes surpassed even by traditional techniques such as a multilayer perceptron trained on bag-of-words features. The study highlights the limited impact of graph-based methods, which often demand more compute and memory, and recommends that future research benchmark against strong bag-of-words baselines as well as state-of-the-art pre-trained models. It also notes that simple methods such as multilayer perceptrons and logistic regression have been overlooked as serious competitors, while sequence-based Transformers remain the leading approach for text classification.
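To make the recommended baseline concrete, here is a minimal sketch of the bag-of-words representation that a logistic regression or multilayer perceptron baseline would consume. The toy corpus and helper names are hypothetical, and real baselines would typically use a library vectorizer (e.g. scikit-learn's `CountVectorizer`) rather than hand-rolled code:

```python
from collections import Counter

def build_vocab(docs):
    """Assign each distinct token in the corpus a column index."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def bow_vector(doc, vocab):
    """Turn one document into a bag-of-words count vector."""
    counts = Counter(doc.lower().split())
    return [counts.get(token, 0) for token in vocab]

# Hypothetical two-document corpus for illustration only.
docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(docs)
vectors = [bow_vector(d, vocab) for d in docs]
```

These count vectors discard word order entirely, which is exactly why the paper treats their competitiveness with graph-based models as a notable finding.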