IJCSI International Journal Of Computer Science Issues .


IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 4, No 1, July 2011
ISSN (Online): 1694-0814
www.IJCSI.org

Gujarati Script Recognition: A Review

Mamta Maloo (1), Dr. K.V. Kale (2)
(1) PG Dept. of Computer Science and Technology, SGB Amravati University, Amravati (M.S.), India
(2) Department of Computer Science and Information Technology, Dr. BAM University, Aurangabad (M.S.), India

Abstract
This paper surveys the researchers who have worked on recognizing characters from the various regional scripts of India and the approaches they have taken. It is intended as a path for developing recognition tools for Indian scripts, where there is still scope for improving recognition accuracy, and it sets out reasons for researchers to work on the recognition of Indian scripts. The various scripts are described along with their special properties and their feature extraction and recognition techniques. A detailed comparison is also made of the techniques used for, and the recognition of, Gujarati script.

Keywords: Handwritten character recognition; feature extraction techniques; classification; Indian Scripts; Gujarati Script.

1. Introduction

In the age of technological development, every paper document is somewhere in the process of being converted to machine-editable format. The automation of much official work has also prompted researchers to bridge the gap between the common man and technology.

Handwritten character recognition, usually abbreviated as HCR, is the recognition of handwritten text using computers. Characters are recognized in one of two modes: offline and online. Because pen and paper has been the prime means of communication, the motivation here leads to offline character recognition. Applications of offline handwritten character recognition include reading aids for the blind, preservation of old or historical handwritten documents in electronic format, automatic reading for sorting postal mail and bank cheques, and automation of various administrative offices.

A great deal of research has gone into making efficient and inexpensive OCR packages commercially available for recognizing printed text. Printed characters can come in a variety of fonts and point sizes. A large amount of literature is available on the recognition of English, Japanese, Chinese and Arabic characters, whereas comparatively little work has been reported on the recognition of Indian scripts [1, 5, 6, 8, 11, 14, 22, 28, 46]. Handwritten Indian scripts vary in their basic consonants and vowels, in their script-wise representation and in their conjunct forms. Free-hand writing is itself a challenge for recognition.

This survey paper is intended as a path for developing recognition tools for Indian scripts, where there is still scope for improving recognition accuracy. The various scripts are described along with their special properties and their feature extraction and recognition techniques. A detailed comparison of work on Gujarati script is also made in terms of techniques and recognition. The paper is a step towards helping researchers locate the track for recognizing characters from Gujarati script.

2. Steps for HCR

Whenever a document is considered for recognition, numerous factors are involved. First the document is scanned, so that the text on paper becomes an image on the computer. This image is then preprocessed and either converted into machine-editable format, simply recognized as a set of characters, or converted into some other script as a language translation tool.

Preprocessing involves many steps [3], which are meant to raise the recognition rate and reduce errors. The general steps are summarized as under:
- Binarization of scanned image
- Removal of noise from scanned image
- Thinning of binarized image
- Skew detection and correction of scanned image
- Segmentation of image
- Feature extraction techniques
- Recognition on the basis of classifiers

2.1 Binarization

Binarization plays an important role in document processing, because its quality affects both the segmentation of characters and their recognition. Binarization is the separation of the background and foreground of a scanned image. The most popular technique is thresholding [9, 48], in which an optimum threshold is selected and every pixel intensity is converted accordingly to 1 (background) or 0 (foreground).

Figure 1 is a simple scanned image from a Gujarati newspaper; after binarization it appears as in Figure 2.

Figure 1 Scanned image of Gujarati Samachar
Figure 2 Binarized image of Gujarati Samachar (threshold level 0.4)

2.2 Removal of Noise

Digital images are prone to a variety of types of noise. Noise is the result of errors in the image acquisition process that produce pixel values which do not reflect the true intensities of the real scene. Noise can be introduced in several ways, depending on how the image is created. If the image is scanned from a photograph made on film, the film grain is a source of noise. Noise can also result from damage to the film, or be introduced by the scanner itself. If the image is acquired directly in a digital format, the mechanism for gathering the data (such as a CCD detector) can introduce noise. Electronic transmission of image data can also introduce noise.

Whenever an image is created by scanning a document on a flatbed scanner, signals that are not part of the image may intrude. Noise removal is the process of removing these excess signals, and it involves many filtering techniques [26].

Figure 3 is an image from a Gujarati newspaper with noise; after noise removal it appears as in Figure 4.

Figure 3 Gujarat Samachar News paper with noise
Figure 4 Gujarat Samachar News paper with noise removed

2.3 Thinning

Thinning greatly reduces the memory needed to store structural information. A binary digital image can be represented by a matrix in which each element is either zero (white) or one (black); the elements are called pixels. Thinning deletes the unwanted pixels and transforms the image pattern into one that is a single pixel thick. The thinning operation is typically applied repeatedly, leaving only pixel-wide linear representations of the image objects. It halts when no more pixels can be removed, that is, when a pass produces no change and its output is identical to its input [17].

Figure 5 is the binarized form of the scanned image, while Figure 6 displays the result of the thinning operation, which reduces the original objects to single-pixel-wide lines.

Figure 5 Binarized form of Gujarat Samachar News paper
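The thresholding and noise-removal steps of Sections 2.1 and 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any surveyed paper: the image is assumed to be a list of rows of grey levels in the range 0 to 255, the function names binarize and median_filter are hypothetical, the default level of 0.4 merely echoes the threshold quoted in the caption of Figure 2, and a median filter is only one of the many filtering techniques the text alludes to.

```python
from statistics import median_low


def binarize(gray, level=0.4):
    """Global thresholding of a grayscale image (rows of 0..255 integers).

    Uses the convention of Section 2.1: 1 = background (light paper),
    0 = foreground (ink). `level` is an assumed fraction of the full range.
    """
    cutoff = level * 255
    return [[1 if pixel > cutoff else 0 for pixel in row] for row in gray]


def median_filter(image, size=3):
    """Replace each pixel by the (low) median of its size x size neighbourhood.

    Border pixels use only the neighbours that fall inside the image.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(median_low(window))
        out.append(out_row)
    return out
```

Calling median_filter(binarize(gray)) on a scanned page would first separate ink from paper and then suppress isolated specks. A fixed global level is the simplest possible choice; an adaptively selected threshold would replace the constant without changing the rest of the sketch.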

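The iterative thinning of Section 2.3 can be illustrated with the Zhang-Suen algorithm, a widely used thinning procedure. The survey does not say which algorithm produced its thinned figure, so this choice is an assumption, and the function name thin is hypothetical. The sketch follows the convention of Section 2.3 (1 = black ink, 0 = white) and assumes the image has at least a one-pixel white margin, since border pixels are never examined.

```python
def thin(image):
    """Zhang-Suen thinning of a binary image (1 = ink, 0 = background)."""
    img = [row[:] for row in image]          # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # Eight neighbours, clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:                           # halt when a full pass removes nothing
        changed = False
        for step in (0, 1):                  # the two sub-iterations of the algorithm
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    ink = sum(p)
                    # Number of 0 -> 1 transitions around the pixel.
                    transitions = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                                      for i in range(8))
                    if step == 0:
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= ink <= 6 and transitions == 1 and ok:
                        to_clear.append((r, c))
            # Delete only after the whole image has been scanned.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

Pixels are collected first and deleted afterwards so that every decision within a sub-iteration is made on the same image. The loop terminates exactly as the text describes: when a pass leaves the image unchanged.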
Figure 6 Thinned form of Gujarat Samachar News paper

2.4 Skew detection and correction

Any image created by scanning a document involves human intervention. The paper may be fed to the scanner with some tilt in either direction, a slight rotation may be introduced by human error while the scanned image is saved, or the document itself may contain handwriting set down at an angle by the writer.

Figure 7 shows skew in a document and Figure 8 shows the image after skew correction.

Figure 7 Gujarati document having skew
Figure 8 Skew corrected image

The images in Figures 7 and 8 are from the published work of Shah [31].

2.5 Segmentation of Image

The image cannot be handled completely at a glance. It has to be subdivided into parts so that each part of the image is readable. To accomplish this, the image is subdivided at three levels: line-wise segmentation, word-wise segmentation and finally character-wise segmentation.

In line segmentation, the image is divided into lines, so that analysis is restricted to one line at a time; the algorithm only partitions the document image into blocks called lines.

Figure 9 Line wise Segmentation of image from Gujarati Samachar

In word-wise segmentation, each line is further divided into words, so that analysis is restricted to the words within a line; the algorithm only partitions the line image into smaller blocks called words.

Figure 10 Word wise Segmentation of image from Gujarati Samachar

In character-wise segmentation, the image that was divided into lines and then into words is further divided into characters, so that analysis is restricted to the characters within the words of each line; the algorithm only partitions the word image into still smaller blocks called characters.

Figure 11 Character wise Segmentation of image from Gujarati Samachar

Many algorithms have been proposed for all these types of segmentation [8].

2.6 Feature Extraction Techniques

Every character or numeral has some special and distinct parameters that represent and define it. Some of these parameters may, however, coincide between characters. One therefore needs a set of parameters in which each character is discriminated from the others to the maximum extent. Finding a set of parameters that defines a character is called feature extraction, while choosing the subset of parameters that defines a character to the maximum extent is called feature selection.

Feature selection can be carried out on three fronts: statistical features, syntactical/structural features and hybrid features. Statistical features are derived from statistical moments, geometrical moments, etc. Syntactical/structural features are derived as strokes, holes, end points, loops, cross-overs or similar structures in characters. Hybrid features combine statistical and structural features at the necessary level of representation.

2.7 Recognition on the basis of Classifiers

Once the features are selected, the next step is recognition. This process recognizes each individual character and produces output in machine-editable format. Many classifiers are available for this purpose; among the most popular are template matching and distance classifiers such as the Euclidean distance measure. Nowadays nearest neighbor classifiers, fuzzy classifiers and Support Vector Machine classifiers are also reported to give better results. Some researchers use neural networks as classifiers.

3. Properties and Recognition Techniques of various Indian Scripts

Reflecting the diversity of India, the scripts of India show a wide variety in their characters and numerals, and one can trace the complexity among them. Many Indian regional scripts, such as Gujarati and Oriya, do not have a shirorekha. For this study we have taken the following prime scripts on which work has been done:
- Gujarati
- Devanagari
- Kannada
- Gurmukhi
- Oriya
- Tamil
- Telugu
- Bengali

3.1 Gujarati Script

The Gujarati script was adapted from the Devanagari script to write the Gujarati language. The earliest known document in the Gujarati script is a manuscript dating from 1592, and the script first appeared in print in a 1797 advertisement. Until the 19th century it was used mainly for writing letters and keeping accounts, while the Devanagari script was used for literature and academic writings.

Gujarati is an Indo-Aryan language spoken by about 46 million people in the Indian states of Gujarat, Maharashtra, Rajasthan, Karnataka and Madhya Pradesh, and also in Bangladesh, Fiji, Kenya, Malawi, Mauritius, Oman, Pakistan, Réunion, Singapore, South Africa, Tanzania, Uganda, United Kingdom, USA, Zambia and Zimbabwe. The major difference between Gujarati and Devanagari is the lack of the top horizontal bar in Gujarati. Otherwise the two scripts are fairly similar [35].

The basic consonants and vowels of the Gujarati script are shown in Figure 12. Figure 13 shows the ten numerals of the Gujarati script.

Figure 12 Vowels and Consonants of Gujarati Script
Figure 13 Numerals of Gujarati Script

3.2 Devanagari

More than 500 million people around the world speak and write using the Devanagari script. Many languages, such as Hindi and Marathi, use the Devanagari script [32]. Devanagari has 11 vowels and 33 simple consonants. Besides the consonants and the vowels, the other constituent symbols in Devanagari are a set of vowel modifiers called matra (placed to the left, right, above, or at the bottom of a character or conjunct) and pure consonants (also called half-letters), which yield conjuncts when combined with other consonants. A horizontal line called the shirorekha (a header line) runs through the entire span of a word [45].

Figure 14 shows the basic Devanagari alphabet, while Figure 15 shows the numerals of the Devanagari script.

Figure 14 Basic vowels and consonants of Devanagari
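Returning to the segmentation step of Section 2.5, the partitioning of a page into lines and of a line into smaller blocks can be sketched with projection profiles. The survey does not name a specific algorithm, so the projection-profile approach, the function names segment_lines and segment_columns, and the convention 1 = ink are all assumptions made for illustration. The sketch also ignores skew and touching characters, which a real system would have to handle.

```python
def segment_lines(image):
    """Split a binary page (1 = ink, 0 = background) into text lines.

    Returns (top, bottom) row-index pairs, bottom exclusive, one for each
    run of consecutive rows that contain at least one ink pixel.
    """
    profile = [sum(row) for row in image]    # horizontal projection: ink per row
    lines, start = [], None
    for r, count in enumerate(profile):
        if count > 0 and start is None:
            start = r                        # a new line begins
        elif count == 0 and start is not None:
            lines.append((start, r))         # a blank row ends the line
            start = None
    if start is not None:                    # line touching the bottom edge
        lines.append((start, len(image)))
    return lines


def segment_columns(image, top, bottom):
    """Find runs of inked columns inside one line (vertical projection)."""
    band = image[top:bottom]
    transposed = [list(col) for col in zip(*band)]
    return segment_lines(transposed)
```

Applying segment_lines to a page and then segment_columns to each returned line gives column runs separated by blank columns. Distinguishing word gaps from character gaps would need an additional rule, for example a gap-width threshold, which is left out here.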

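The statistical features of Section 2.6 and the Euclidean distance classifier of Section 2.7 can be combined in a small sketch. Zone densities are only one example of a statistical feature, and the function names zone_densities and classify, the 2 x 2 grid and the template labels are hypothetical choices for illustration rather than a description of any surveyed system. The glyph is assumed to be a binary image (1 = ink) at least as large as the zone grid.

```python
import math


def zone_densities(glyph, zones=2):
    """Feature vector: fraction of ink in each cell of a zones x zones grid."""
    rows, cols = len(glyph), len(glyph[0])
    features = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * rows // zones, (zr + 1) * rows // zones
            c0, c1 = zc * cols // zones, (zc + 1) * cols // zones
            cell = [glyph[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell))
    return features


def classify(features, templates):
    """Return the label whose stored feature vector is nearest (Euclidean)."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))
```

In use, templates would map each character label to a feature vector computed from training samples, and classify(zone_densities(glyph), templates) would return the closest label. The same feature vectors could equally be passed to a nearest neighbor, SVM or neural network classifier.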
Figure 15 Numerals in Devanagari

A lot of work has been reported for printed Devanagari text, whereas very little is reported for handwritten Devanagari script. Sethi [34] was the first to work on hand-printed characters, and Sinha and Mahabala [27] worked on typed Devanagari script. Sinha and Bansal [36] achieved 93% performance on individual characters. Recognition of Devanagari text in the Sanskrit manuscript 'Saddharmapundarika' [37] is achieved with an accuracy of 98.09% using structural features and neural networks for classification.

Pal and Chaudhuri attempted OCR for two scripts, Bangla and Devanagari, in [38]. Database evaluation methods are given in [39], and a database of Devanagari numerals collected from mail addresses and job application forms is described in [40]. Machine recognition of online handwritten Devanagari characters has been reported in [33] with 82-85% accuracy. In [41] online Devanagari script recognition is attempted with 86.5% accuracy on a database of 20 writers. A combination of online and offline features has been used in [42]. A binary wavelet transform is used for feature extraction of handwritten Devanagari characters. In [43], a survey of different structural techniques used for feature extraction in OCR of different scripts is given. Recently, in [44], a quadratic classifier based method is proposed with 81% accuracy.

3.3 Kannada Script

The Kannada script is used in the southern Indian state of Karnataka to write the Kannada language. It is derived from the Old Kannada script and is closely related to the Telugu script. The components of the Kannada script are shown in Figure 16 and the numerals in Figure 17.

Figure 16 Components of Kannada Script
Figure 17 Numerals of Kannada Script

In [8] the author split each Kannada segment image into a number of zones in the radial and the angular directions, extracting features that capture the shape of the characters, used a Support Vector Machine as the classifier and achieved 87 to 95% accuracy.

3.4 Gurmukhi Script

The Gurmukhi alphabet consists of 41 consonants and 12 vowels, as shown in Figure 18. It also contains 10 numerals, as shown in Figure 19. Besides these, some characters appear as half characters at the feet of other characters. The writing style is from left to right. The concept of upper/lowercase characters is absent in Gurmukhi. A line of Gurmukhi script may be partitioned into three horizontal zones, the middle zone being the busiest one. The upper and lower zones may contain parts of vowel modifiers and diacritical markers.

Figure 18 Basic alphabets of Gurmukhi Script
Figure 19 Numerals in Gurmukhi Script

Lehal and Singh presented an OCR system for printed Gurmukhi script [46]. The skew angle is determined by calculating horizontal and vertical projections at different angles, at a fixed interval, in the range [0° to 90°]. A recognition rate of 96.6% at a processing speed of 175 characters/second was reported. Lehal and Singh [16] also developed a post processor for Gurmukhi.

3.5 Oriya Script

The Oriya script developed from an early form of the Bengali script, which belongs to the Northern group of South Asian scripts. Oriya is used to write the Oriya language, which is spoken in the modern Indian state of Orissa, located on the east coast of India. While the cursive shapes of the Oriya letters appear to suggest influences from Southern scripts, it is thought that the cursive shape evolved from the need to write with a pointed stylus on palm leaves, which tend to tear if too many straight lines are used. The round form of the letters used in the Oriya script can be seen in Figure 20. Figure 21 shows the numerals used in the Oriya script.

Figure 20 Oriya letters
Figure 21 Oriya numerals

Chaudhari et al. [5] used preprocessing techniques such as skew correction, line segmentation, zone detection, and word and character segmentation, and then used a combination of stroke-based, run-number-based and water-reservoir-based features for classification. They achieved 96.3% accuracy.

3.6 Tamil Script

Tamil is a Dravidian language and one of the oldest languages in the world. It is the official language of the Indian state of Tamil Nadu; it also has official status in Sri Lanka, Malaysia and Singapore. The Tamil script has 10 numerals, 12 vowels, 18 consonants (as shown in Figures 22 and 23) and five grantha letters. The script, however, is syllabic and not alphabetic. The complete script therefore consists of 31 letters in their independent form and an additional 216 combining letters representing every possible combination of a vowel and a consonant.

Figure 22 Tamil Letters
Figure 23 Tamil Numerals

Siromony et al. [28] described a method for recognition of machine-printed letters of the Tamil alphabet using an encoded character string dictionary.

3.7 Telugu Script

Telugu is one of the ancient (5000 years old) languages of India, with a rich cultural heritage. It is the mother tongue of a population of 100 million in the southern part of India. Its script is an abugida from the Brahmic family of scripts. It has a complex orthography with a large number of distinct character shapes composed of simple and compound characters. The basic alphabet of Telugu consists of 16 vowels (called achchus) and 36 consonants (called hallus), totaling 52 symbols, as shown in Figure 24. Each vowel in Telugu is associated with a vowel sign.

Figure 24 Telugu vowels and consonants
Figure 25 Telugu numerals

The first reported work on OCR of Telugu characters is by Rajasekaran and Deekshatulu [25]. Sukhaswami et al. [29] proposed a neural network based system, in which a Hopfield model of neural network working as an associative memory is initially chosen for recognition.

Pujari et al. [24] used wavelet multi-resolution analysis to capture the distinctive characteristics of the Telugu script and an associative memory model to recognize the characters. The authors report a conservative recognition rate across fonts and sizes, varying from 93% to 95%. An OCR for Telugu is reported by Negi et al. [18]. Raw OCR accuracy with no post processing is reported as 92%. Performance across fonts varied from 97.3% for the Hemalatha font to 70.1% for the newspaper font. Non-linear normalization to improve performance was used by Negi et al. [19] by selectively scali
