
Optical Character Recognition with PyTesseract

Blog post from Voxel51

Post Details
Company: Voxel51
Author: Jacob Marks
Word Count: 2,148
Language: English
Summary

Week five of "Ten Weeks of Plugins", a series dedicated to building FiftyOne plugins, covers Optical Character Recognition (OCR) and Keyword Search. The PyTesseract OCR plugin uses the Tesseract OCR engine to extract text from the samples in a dataset, while the Keyword Search plugin lets users search within the labels generated by the first plugin. Combined, the two plugins make it possible to search documents such as pages of old books, handwritten notes, or resumes by their textual content.
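
The OCR-then-search idea can be sketched in plain Python. This is a minimal illustration and not the plugins' actual code: `TextBlock`, `ocr_blocks`, and `keyword_search` are hypothetical names, and the sketch assumes `pytesseract`, Pillow, and the Tesseract binary are installed for the OCR step. The search step has no such dependency, so it works on any text blocks, however they were produced.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TextBlock:
    """One recognized word and its pixel box (hypothetical helper type)."""
    text: str
    box: Tuple[int, int, int, int]  # (left, top, width, height)


def ocr_blocks(image_path: str, min_conf: float = 0.0) -> List[TextBlock]:
    """Run Tesseract on one image and return word-level text blocks."""
    # Imported lazily so keyword_search below works without the OCR stack.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    blocks = []
    rows = zip(data["text"], data["conf"], data["left"],
               data["top"], data["width"], data["height"])
    for text, conf, left, top, width, height in rows:
        # Structural rows (pages, lines) carry empty text and conf -1; skip them.
        if text.strip() and float(conf) >= min_conf:
            blocks.append(TextBlock(text, (left, top, width, height)))
    return blocks


def keyword_search(samples: Dict[str, List[TextBlock]], query: str) -> List[str]:
    """Return ids of samples with a block containing query (case-insensitive)."""
    needle = query.lower()
    return [
        sample_id
        for sample_id, blocks in samples.items()
        if any(needle in block.text.lower() for block in blocks)
    ]
```

In use, one would build a mapping such as `{path: ocr_blocks(path) for path in image_paths}` and pass it to `keyword_search` with a query string. Matching is a simple case-insensitive substring test on each word-level block, so a multi-word query would need the blocks joined into lines first.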