Tesseract OCR Engine
Developers - In Depth
Written by Jenny Curtis   
Friday, 15 January 2010

Tesseract is a free optical character recognition (OCR) engine, originally developed as proprietary software by Hewlett-Packard and later revived by Google. First developed between 1985 and 1994, it is one of the oldest engines of its kind.

In 1995 it was one of the top three engines in the UNLV Accuracy test. In 2005, after almost no work had been done on it for ten years, HP and UNLV released it as open source. Google specialists then began reviving the two-decade-old engine as part of Google's overall goal of organizing and indexing the world's information. With Tesseract freely available, other institutions and engineers can also help digitize information held on paper.
Tesseract is currently released under the Apache License, Version 2.0. It takes a TIFF image of a single column of text as input and produces text; images in other formats must be converted to TIFF before being submitted to Tesseract. It is still considered one of the most accurate OCR engines available. As a raw OCR engine, Tesseract has no graphical user interface, no output formatting, and no document layout analysis, which means that it cannot interpret multi-column text or equations. It is suitable for use as a backend. Although the developers actively test only Windows and Ubuntu Linux, Tesseract can also be used successfully on Mac OS X. The supported languages are English, Spanish, French, German, Italian, Dutch and Brazilian Portuguese, and it can be trained to work with other languages.
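To illustrate the backend use and the TIFF requirement described above, here is a minimal Python sketch. It is not an official recipe. It assumes that the `tesseract` command-line program is installed and on the PATH, that it accepts the form `tesseract <image> <outputbase> -l <lang>` and writes its result to `<outputbase>.txt`, and that the third-party Pillow library is available for the image conversion. The helper names (`needs_tiff_conversion`, `convert_to_tiff`, `build_tesseract_command`, `ocr_file`) are hypothetical and exist only for this example.

```python
import subprocess
import sys
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def needs_tiff_conversion(image_path):
    """Return True when the file extension is not a TIFF extension."""
    return Path(image_path).suffix.lower() not in TIFF_SUFFIXES


def convert_to_tiff(image_path):
    """Save a TIFF copy next to the original and return its path.

    Assumption: Pillow is installed. It is imported lazily so the other
    helpers in this sketch work without it.
    """
    from PIL import Image

    tiff_path = Path(image_path).with_suffix(".tif")
    with Image.open(image_path) as img:
        img.save(tiff_path, format="TIFF")
    return str(tiff_path)


def build_tesseract_command(tiff_path, output_base, lang="eng"):
    """Build the argument list: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(tiff_path), str(output_base), "-l", lang]


def ocr_file(image_path, lang="eng"):
    """Convert to TIFF if needed, run Tesseract, and return the text."""
    if needs_tiff_conversion(image_path):
        image_path = convert_to_tiff(image_path)
    output_base = Path(image_path).with_suffix("")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    # Assumption: Tesseract writes its result to <outputbase>.txt
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python ocr_sketch.py <image>")
    print(ocr_file(sys.argv[1]))
```

Because Tesseract performs no layout analysis, a caller built along these lines would still need to crop multi-column pages into single-column images before passing them in.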

Tesseract can be downloaded from http://code.google.com/p/tesseract-ocr
