| Optical character recognition | |
|---|---|
| Name | Optical Character Recognition |
| Caption | A process converting images of text into machine-encoded text. |
| Developer | Various, including Ray Kurzweil, IBM, Google |
| Released | 1974 (commercial systems) |
| Genre | Computer vision, pattern recognition |
| License | Proprietary and open-source (e.g., Tesseract) |
Optical character recognition (OCR) is a field of computer science and artificial intelligence focused on the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text. It is a foundational technology for digitizing printed materials, enabling data entry automation, and making scanned documents searchable and editable. The process typically involves image preprocessing, text detection, character recognition, and post-processing to improve accuracy.
The core function is to analyze a raster image or digital photograph containing text and translate the shapes of glyphs into corresponding characters. Early systems relied on pattern matching techniques, often requiring specific fonts, while modern approaches utilize advanced machine learning and neural network models. This technology is integral to systems developed by companies like Adobe Systems in its Acrobat software and Microsoft in its OneNote application. The output is commonly used for data extraction, enabling further processing by natural language processing systems or storage in databases like those from Oracle Corporation.
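The pattern-matching approach used by early systems can be illustrated with a minimal sketch: each known character is stored as a small binary bitmap (a template), and an unknown glyph is assigned to the template it differs from in the fewest pixels. The 5x5 templates and the noisy input below are hypothetical toy examples, not data from any real OCR engine.

```python
# Toy template-matching classifier: each glyph is a 5x5 bitmap,
# '#' = ink, '.' = background. These templates are illustrative only.
TEMPLATES = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}

def hamming(a, b):
    """Count pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def classify(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))

# A slightly degraded "T" (one pixel missing from the top bar):
noisy_t = ["####.", "..#..", "..#..", "..#..", "..#.."]
print(classify(noisy_t))  # -> T
```

This nearest-template scheme is why early systems worked best with a single known font: any deviation in glyph shape, size, or alignment quickly erodes the pixel-level match, which is the limitation that statistical and neural approaches later addressed.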
The conceptual origins can be traced to early 20th-century inventions like Emanuel Goldberg's statistical machine for searching microfilm archives. The first practical systems emerged in the mid-20th century, with pioneering work at institutions like the Stanford Research Institute and by individuals such as David H. Shepard, who founded Cognitronics Corporation. A significant breakthrough came with the invention of the Kurzweil Reading Machine by Ray Kurzweil in the 1970s, which was capable of recognizing text in multiple fonts and was later acquired by Xerox. Open-source engines subsequently gained prominence, most notably Tesseract, developed at Hewlett-Packard between the mid-1980s and mid-1990s, open-sourced in 2005, and later maintained by Google.
Traditional methods often employed feature extraction and template matching, comparing character images to stored prototypes. The field was revolutionized by the adoption of hidden Markov models and support vector machines for statistical classification. Contemporary state-of-the-art systems are dominated by deep learning architectures, particularly convolutional neural networks for feature detection and recurrent neural networks such as long short-term memory (LSTM) networks for sequence modeling. Preprocessing steps, such as those applied to benchmarks like the MNIST database, include image segmentation, noise reduction, and binarization to improve input quality for these models.
This technology is ubiquitous in enterprise environments for automating data entry from forms, invoices, and receipts, with software solutions provided by companies like ABBYY and Kofax. It enables the digitization of historical archives and libraries, such as projects undertaken by the Internet Archive and Google Books. In the financial sector, it processes cheques for electronic clearing in systems used by the Federal Reserve. Mobile applications, including Google Translate's instant camera translation and Apple's iOS Live Text feature, rely on real-time recognition. It is also critical for automating mail sorting in postal services like USPS and for extracting text from vehicle license plates in automatic number-plate recognition systems.
Accuracy is highly dependent on input quality; high-resolution scans of clean printed text from documents like the Gutenberg Bible can achieve near-perfect results, while poor-quality sources like faxes or historical newspapers pose significant difficulties. Major challenges include recognizing cursive handwriting, degraded print, complex document layout analysis, and multilingual text containing mixed scripts like Latin and Cyrillic. The International Conference on Document Analysis and Recognition serves as a key forum for discussing advances. Ongoing research focuses on improving robustness through models trained on diverse datasets and integrating contextual knowledge from linguistics to resolve ambiguities.
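One simple way to integrate contextual knowledge, as described above, is lexicon-based post-correction: each recognized token is snapped to the nearest dictionary word by edit distance, which resolves common confusions such as `0`/`O` or `1`/`l`. A hedged sketch with a tiny hypothetical lexicon (real systems use large language-specific dictionaries or language models):

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (one-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Illustrative lexicon; a real system would use a full dictionary.
LEXICON = ["recognition", "character", "optical", "machine"]

def correct(token, max_dist=2):
    """Replace a noisy OCR token with the closest lexicon word,
    but only if it is within max_dist edits; otherwise keep it."""
    best = min(LEXICON, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token

print(correct("recogn1tion"))  # -> recognition
print(correct("0ptical"))      # -> optical
```

The `max_dist` cutoff guards against over-correction: tokens far from every dictionary word (proper nouns, numbers) pass through unchanged rather than being forced onto a poor match.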
Category:Computer vision Category:Artificial intelligence Category:Data management