Boosting Transcript Readability with Automatic Punctuation and Casing and ITN

Post Details

Company

AssemblyAI

Date Published

Feb. 7, 2022

Author

Kelsey Foster

Word Count

1,471

Language

English

Hacker News Points

-

Source URL

www.assemblyai.com/blog/boosting-transcript-readability-with-automatic-punctuation-and-casing-and-itn

Summary

AssemblyAI's Speech-to-Text API improves transcript readability by automatically applying punctuation and casing, turning raw, unformatted transcripts into legible text. A deep neural network trained on billions of words predicts the punctuation and casing, with a reported accuracy of 93.5%. The model also performs Inverse Text Normalization (ITN), which converts spoken forms into their written counterparts so that elements such as dates and numbers are formatted properly. Users can adapt the model to specific vocabularies or scenarios through the Word Boost feature. The API supports both real-time and asynchronous transcription, and automatic punctuation and casing can be disabled if desired. Regular updates and new training data keep the model current, and its design trades off model size against prediction speed to maintain performance.
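
To make the idea of Inverse Text Normalization concrete, here is a minimal rule-based sketch that rewrites spoken number words (zero through ninety-nine) as digits. It is only an illustration of the spoken-to-written conversion described above: the function name `toy_itn` and the lookup tables are hypothetical, and AssemblyAI's production model is a neural network, not a word list like this one.

```python
# Toy illustration of Inverse Text Normalization (ITN).
# Assumption: this is NOT AssemblyAI's method; it is a hand-written rule set
# that only handles number words from "zero" to "ninety nine".

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def toy_itn(text: str) -> str:
    """Replace spoken number words with digits, leaving other words as-is."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit from 1 to 9: "twenty" + "five" -> 25.
            if i + 1 < len(tokens):
                unit = UNITS.get(tokens[i + 1].lower())
                if unit is not None and 1 <= unit <= 9:
                    value += unit
                    i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(tokens[i])
        i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(toy_itn("i paid twenty five dollars for three books"))
```

A rule list like this breaks down quickly: it would turn "one of them" into "1 of them" and knows nothing about dates, currencies, or ordinals. Ambiguities of that kind are one reason a learned model is a better fit for production ITN.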