easyaligner: Forced alignment of text and audio, made easy
Blog post from Hugging Face
easyaligner is a forced alignment library that matches text transcripts to the audio in which they are spoken, with a focus on ease of use, flexibility, and performance. Typical uses include synchronizing e-texts with audiobooks, aligning podcast transcripts, and making recordings of parliamentary debates more accessible. The library can work at any level of granularity while preserving the formatting of the original text, and it handles long recordings without segmentation.

Alignment runs as a three-stage pipeline: voice activity detection, emission extraction, and forced alignment. The three stages can be run as a single call, with a choice of models such as pyannote or silero. Results are written as JSON with word-level timestamps, which supports interactive applications such as highlighting the text in sync with audio playback. easyaligner also integrates with easytranscriber for automatic speech recognition and with easysearch for querying alignment outputs, which broadens what can be done with audio-text pairs.
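To make the synchronized-highlighting use case concrete, here is a minimal Python sketch of how word-level timestamps could be used to find the word being spoken at a given playback position. The JSON schema (a list of objects with `text`, `start`, and `end` in seconds), the sample data, and the helper names `load_words` and `active_word_index` are assumptions made for illustration. They are not easyaligner's documented output format or API.

```python
import bisect
import json

# Hypothetical alignment output. This schema is an assumption for
# illustration only, not easyaligner's documented JSON format.
ALIGNMENT_JSON = """
[
  {"text": "Forced",    "start": 0.00, "end": 0.42},
  {"text": "alignment", "start": 0.42, "end": 1.05},
  {"text": "made",      "start": 1.30, "end": 1.55},
  {"text": "easy",      "start": 1.55, "end": 2.00}
]
"""


def load_words(raw):
    """Parse the JSON and return the words sorted by start time."""
    return sorted(json.loads(raw), key=lambda w: w["start"])


def active_word_index(words, starts, t):
    """Return the index of the word spoken at time t (seconds).

    Returns None when t falls in a gap between words or outside
    the recording.
    """
    # Last word whose start time is <= t.
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t < words[i]["end"]:
        return i
    return None


words = load_words(ALIGNMENT_JSON)
starts = [w["start"] for w in words]  # precomputed once for binary search

if __name__ == "__main__":
    for t in (0.10, 0.50, 1.20, 1.60):
        i = active_word_index(words, starts, t)
        label = words[i]["text"] if i is not None else "(silence)"
        print(f"{t:.2f}s -> {label}")
```

A player would call `active_word_index` on each time update and highlight the returned word. Binary search keeps each lookup logarithmic in the number of words, which matters for long recordings such as full audiobooks.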