Optical Character Recognition System

Optical character recognition (OCR) is the translation of optically scanned bitmaps of printed or written text characters into character codes, such as ASCII. It is an efficient way to turn hard-copy materials into data files that can be edited and otherwise manipulated on a computer. Libraries and government agencies have long used the technology to make lengthy documents quickly available in electronic form, and advances in OCR have spurred its increasing use by enterprises. For many document-input tasks, OCR is the most cost-effective and fastest method available, and each year it frees acres of storage space once given over to file cabinets and boxes full of paper documents.

Before OCR can be used, the source material must be scanned with an optical scanner (and sometimes a specialized circuit board in the PC) to read in the page as a bitmap (a pattern of dots). Software to recognize the images is also required.

Our software package addresses the classification of isolated handwritten characters and digits from the UJI Pen Characters Data Set using neural networks. The data consists of samples of 26 characters and 10 digits written by 11 writers on a tablet PC. The characters (in standard UNIPEN format) are written in both upper and lower case, and each writer provided two complete sets of characters. Each sample is to be assigned to one of 35 classes. The ultimate objective is to build a writer-independent model for each character.

The selection of valuable features is crucial in character recognition. We therefore adopt a new and meaningful set of features, the Uniform Differential Normalized Coordinates (UDNC), introduced by C. Agell. These features have been shown to improve the recognition rate with simple classification algorithms, so they are used to train a neural network, whose performance is then tested on the UJI Pen Characters Data Set.
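To give a rough idea of what features of this kind can look like, the following Python sketch computes a feature vector from a single pen stroke. It is not the package's Matlab code and does not reproduce Agell's exact UDNC definition. It only follows what the name suggests, which is an assumption: the stroke is resampled to points uniformly spaced along its length, differences between consecutive points are taken, and the result is normalized. The function names `resample_uniform` and `differential_features`, the default of 8 points, and the single-stroke input are all hypothetical choices made for illustration.

```python
import math


def resample_uniform(points, n):
    """Resample a polyline to n >= 2 points equally spaced along its arc length."""
    # Cumulative arc length at each original point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:
        return [points[0]] * n  # degenerate stroke (a single dot)

    out, seg = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def differential_features(points, n=8):
    """Flattened differences between resampled points, scaled to unit length.

    Assumption: this mirrors the idea behind the UDNC name only; it is not
    the published definition.
    """
    pts = resample_uniform(points, n)
    diffs = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        diffs.extend((x1 - x0, y1 - y0))
    norm = math.sqrt(sum(d * d for d in diffs))
    return diffs if norm == 0.0 else [d / norm for d in diffs]


# An L-shaped stroke and a shifted, enlarged copy of the same shape.
small = differential_features([(0, 0), (1, 0), (1, 1)], n=5)
large = differential_features([(10, 20), (13, 20), (13, 23)], n=5)
```

Taking differences removes the dependence on where the character was written on the tablet, and the normalization removes the dependence on its size, so `small` and `large` come out the same. A vector of this kind, of fixed length for every sample, is the sort of input a neural network classifier with 35 outputs could be trained on.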
Index Terms: Matlab, source, code, ocr, optical character recognition, scanned text, written text, ascii, isolated character.
License: Free
File Size: 65.29 kB
Version: 1.0
Operating System: Windows Vista, Windows 8, Windows Server 2008, Windows NT, Windows 2000, Windows 2003, Windows Me, Windows, Windows 7, Windows 98, Windows XP
System Requirements: Matlab