How to Build Custom Deep Learning Based OCR models?

Company

Nanonets

Date Published

May 15, 2022

Author

Anuj Sable

Word count

2857

Language

English

Hacker News points

230

URL

nanonets.com/blog/attention-ocr-for-text-recogntion

Summary

Optical Character Recognition (OCR) uses machine learning and deep learning to recognize text in digital images. It is commonly used for reading bank cheques, ID cards, and street signs, and for extracting data from documents, invoices, and legal forms. Approaches to OCR have evolved over the years from conventional computer vision techniques to deep learning models, including attention mechanisms, Transformers, and visual attention models. These models have improved OCR accuracy by letting the model focus on specific parts of an image, which reduces the impact of variations in the data. The post presents a project called Attention-OCR, which uses a Convolutional Recurrent Neural Network (CRNN) followed by an attention-based decoder to predict text from images. The code for the project is available and can be used to train models on custom datasets. The post also discusses how to use the Nanonets API to build OCR models without writing any code, through a user-friendly interface for data extraction tasks.
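To make the idea of a decoder "focusing on specific parts of an image" concrete, here is a minimal sketch of one attention step in plain Python. It is not the Attention-OCR code: the function names, the toy feature vectors, and the choice of dot-product scoring are all assumptions made for illustration, and a real model would learn the features and the scoring function.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """One attention step (hypothetical helper, dot-product scoring assumed).

    query    -- the decoder state for the current character
    features -- one feature vector per image region
    Returns the attention weights and the weighted-average context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, context


# Toy data: three image regions with 2-dimensional features (made up).
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]  # a decoder state that resembles the first region

weights, context = attend(query, features)
print("attention weights:", weights)
print("context vector:", context)
```

The region whose features best match the query receives the largest weight, so the context vector passed on to the character prediction is dominated by that part of the image.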