Speeding Up Text Generation with Non-Autoregressive Language Models
Blog post from Unstructured
Unstructured's team has been working on making Vision Transformer (ViT) based document models fast enough for industrial use, with a focus on converting PDFs and images into structured formats such as JSON. The bottleneck is text generation: traditional autoregressive language models are accurate, but they emit output one token at a time, with each step conditioned on all previously generated tokens, which makes decoding slow and computationally expensive.

To address this, researchers are exploring non-autoregressive models, which predict tokens in parallel rather than waiting on earlier ones, sharply reducing decoding latency. Key innovations include neural conditional random fields (CRFs) to model dependencies between output tokens, and early-exit strategies in models like ELMER and CALM, which let confident predictions exit from intermediate layers instead of running the full network, yielding faster generation with minimal loss of accuracy. Together, these advances aim to make ViT-based document models practical for real-world document preprocessing pipelines.
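To make the speed difference concrete, here is a minimal, illustrative sketch of the autoregressive versus non-autoregressive contrast. It is not Unstructured's implementation or any of the cited models; the `TinyDecoder` module, its dimensions, and the special token ids are hypothetical placeholders. The point is structural: autoregressive decoding needs one forward pass per output token, while a non-autoregressive decoder fills every position in a single pass.

```python
import torch
import torch.nn as nn

VOCAB, HIDDEN, MAX_LEN = 1000, 64, 16  # toy sizes, purely illustrative


class TinyDecoder(nn.Module):
    """Stand-in for a real decoder: maps token ids to per-position logits."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.proj = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        return self.proj(self.embed(tokens))   # logits: (batch, seq_len, VOCAB)


def autoregressive_decode(model, bos_id=1):
    """One forward pass per token: position t depends on tokens 0..t-1."""
    tokens = torch.tensor([[bos_id]])
    for _ in range(MAX_LEN):
        logits = model(tokens)                            # re-run on the prefix
        next_id = logits[:, -1].argmax(-1, keepdim=True)  # pick the next token
        tokens = torch.cat([tokens, next_id], dim=1)
    return tokens  # MAX_LEN sequential forward passes


def non_autoregressive_decode(model, mask_id=0):
    """Single forward pass: all positions are predicted in parallel."""
    placeholder = torch.full((1, MAX_LEN), mask_id)  # all-[MASK] input
    logits = model(placeholder)
    return logits.argmax(-1)  # 1 forward pass for the whole sequence


if __name__ == "__main__":
    model = TinyDecoder()
    print(autoregressive_decode(model).shape)      # sequential: slow
    print(non_autoregressive_decode(model).shape)  # parallel: fast
```

The trade-off is that the parallel decoder no longer sees its own earlier predictions, which is exactly the gap that CRF decoding and early-exit schemes like ELMER's are designed to close.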