HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation


2009 10th International Conference on Document Analysis and Recognition

HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation

Yaregal Assabie and Josef Bigun
School of Information Science, Computer and Electrical Engineering
Halmstad University, Halmstad, Sweden
{yaregal.assabie, josef.bigun}@hh.se

Abstract

Amharic is the official language of Ethiopia and uses Ethiopic script for writing. In this paper, we present writer-independent HMM-based Amharic word recognition for offline handwritten text. The underlying units of the recognition system are a set of primitive strokes whose combinations form handwritten Ethiopic characters. For each character, possibly occurring sequences of primitive strokes and their spatial relationships, collectively termed primitive structural features, are stored as a feature list. Hidden Markov models for Amharic words are trained with such sequences of structural features of the characters constituting the words. The recognition phase does not require segmentation of characters; it only requires text line detection and extraction of structural features in each text line. Text lines and primitive structural features are extracted by making use of the direction field tensor. The performance of the recognition system is tested on a database of unconstrained handwritten documents collected from various sources.

1. Introduction

Amharic is the official language of Ethiopia, which has a population of over 80 million at present. It belongs to the Afro-Asiatic language family, and today it has become the second most widely spoken Semitic language in the world, next to Arabic. Along with several other Ethiopian languages, Amharic uses Ethiopic script for writing. The Ethiopic script used by Amharic has 265 characters, including 27 labialized characters (mostly representing two sounds) and 34 base characters with six orders representing the derived vocal sounds of the base character. The alphabet is written in a tabular format of seven columns, where the first column represents the base characters and the others represent their derived vocal sounds. Part of the alphabet is shown in Table 1.

[Table 1. A sample of handwritten Ethiopic characters: base characters and their 2nd-6th orders, the 3rd, 4th, and 5th orders carrying the vowels (i), (a), and (e). Table image not reproduced in this transcription.]

Recognition of handwritten text is one of the most challenging pattern recognition problems due to the varying nature of the data. With respect to this inherent nature, various recognition methods have been proposed. Offline recognition of Latin, Chinese, Indian, and Arabic handwritten text has long been an area of active research and development [3], [4]. However, Ethiopic handwriting recognition in general, and Amharic word recognition in particular, is one of the least investigated problems. The difficulty in automatic recognition of Ethiopic script arises from the relatively large number of characters, their interclass similarity, and their structural complexity. In this paper, we present Amharic word recognition in unconstrained handwritten text using hidden Markov models (HMMs).

978-0-7695-3725-2/09 $25.00 (c) 2009 IEEE

2. Description of the Recognition System

Originally applied to the domain of speech recognition, HMMs have emerged as a powerful paradigm for modeling pattern sequences in areas such as online handwriting recognition. Inspired by the success in such fields, they have more recently attracted growing interest in various computer vision applications, including offline handwriting recognition [3], [4], [5].

2.1. Theoretical background

HMMs are doubly stochastic processes which model time-varying dynamic patterns. The system being modeled is assumed to be a Markov process that is hidden (not observable) but can be observed through another stochastic process that produces the sequence of observations. The hidden process consists of a set of states connected to each other by transitions with probabilities, while the observed process consists of a set of outputs or observations, each of which may be emitted by a state according to some output probability density function. HMMs are characterized by the following parameters [5]:

- N, the number of states in the model. Individual states are denoted S = {S_1, S_2, ..., S_N}, and the state at time t is denoted q_t.
- M, the number of distinct observation symbols per state, denoted V = {v_1, v_2, ..., v_M}.
- A = {a_ij}, the state transition probability distribution, where a_ij = P[q_{t+1} = S_j | q_t = S_i], 1 <= i, j <= N.
- B = {b_j(k)}, the observation symbol probability distribution in state j, where b_j(k) = P[v_k at t | q_t = S_j], 1 <= j <= N, 1 <= k <= M.
- pi = {pi_i}, the initial state distribution, where pi_i = P[q_1 = S_i], 1 <= i <= N.

The above HMM is represented by the compact notation lambda = (A, B, pi). The parameters of an HMM are estimated from the training data samples it is intended to model. Maximum likelihood parameter estimates can be obtained by an iterative procedure over multiple observation sequences; the Baum-Welch algorithm is one of the most powerful tools for estimating the parameters in a probabilistic manner. The most likely sequence of hidden states corresponding to a sequence of observations is obtained with the well-known Viterbi algorithm.

2.2. Feature design

The traditional way of using HMMs for handwritten word recognition is to concatenate the HMMs of the characters constituting the word. Input features are usually extracted by moving a sliding window from left to right to produce a sequence of observations; at each window position, a set of features is extracted after the image passes through normalization procedures. In our proposed system, the feature vectors are instead computed from structural features, i.e. primitive strokes and their spatial relationships, which are extracted in a sequential order based on their spatial arrangement. Primitive strokes are formed from vertical and diagonal lines and the end points of horizontal lines, whereas connectors are defined as horizontal lines between two primitives. A spatial relationship refers to the way two primitives are connected to each other with horizontal lines. Primitives are further classified hierarchically based on their orientation or structure, their relative length within the character, and their relative spatial position. This classification scheme results in 15 types of primitives, summarized in Table 2 with feature values in brackets. A primitive stroke is thus represented by three feature values.

[Table 2. Classification of primitive strokes (table flattened in this transcription). Primitives are classified by structure -- vertical (8), forward slash (9), backslash (7), appendage (6) -- by relative length -- long (9), medium (8), short (7) -- and by relative spatial position -- top (9), top-to-bottom (8), bottom (7), middle (6).]

A primitive can be connected to another at one or more of the following regions: top (1), middle (2), and bottom (3). A connection between two primitives is represented as xy, where x and y are the numbers of the connection regions of the left and right primitives, respectively. Between two primitives there can also be two or three connections, and a total of 18 spatial relationships are identified: 11, 12, 13, 21, 22, 23, 31, 32, 33, 1123, 1132, 1133, 1232, 2123, 2132, 2133, 112232, and 112233 (the illustrating glyphs are not reproduced in this transcription). The classification and definition of these structural features is further exposed in [1]. A spatial relationship between two primitives is defined to have six feature values, with zeros padded at the beginning when the number of connections is two or fewer; for example, the feature values of a spatial relationship of type 13 are {0,0,0,0,1,3}. The sequential order of two primitive strokes A and B is set as AB if A is spatially located to the left of, or above, B. However, if A and B are both connected to the right of another primitive stroke and A is located above B, their sequential order is represented as BA. Each primitive is connected to another one to its left, except the first primitive in each character, which has no primitive to its left; in that case, all six feature values of the spatial relationship are zeros.

2.3. Training and recognition

The goal of the training phase is to estimate the parameter values of the word models from a set of training samples, and the recognition phase decodes a word from its observation sequence. In this work, the Baum-Welch algorithm is used for training and the Viterbi algorithm for recognition. A simplified flowchart of the training and recognition procedures is shown in Fig. 1; the dotted-line box in the flowchart marks tasks repeated for each word.

[Figure 1. Training and recognition flowchart. Training: input word -> generation of sample word features -> model training -> master model file. Recognition: handwritten text -> text line detection and word segmentation -> feature extraction -> model recognition -> output word.]

Training samples for a given word are generated from the stored feature lists of characters, which contain the possibly occurring sample features of each Ethiopic character. A character can have many sample features stored in its character feature list, reflecting variations of writing styles. Suppose the input word W has the sequence of characters C_1, C_2, C_3, ..., C_m, where m is the total number of characters making up the word. Then, sample features of the word are generated as all combinations of the sample features of its characters. Figure 2 shows a sample feature generated from character features: each group in the rectangular box beneath a character represents the sample features of that character, whereas each line represents a feature vector of primitives and their associated spatial relationships.

[Figure 2. Generation of a sample feature for a word from its characters' feature lists (glyphs not reproduced); each character contributes feature vectors such as 000000889, 000031887, and 000021887, which are concatenated across characters.]

After generating the sample features of the input word, the next procedure is HMM initialization, which sets a prototype for the HMM of the word to be trained, including its model topology and its transition and output distribution parameters. A Gaussian probability function consisting of means and variances is used to define the model parameters. The number of states of a word model corresponds to the total number of primitive strokes in the word; the HMM topology of a word with eight primitive strokes is shown in Fig. 3. Once the HMM is trained with the sample features of the word, the model is stored in a master model file which is used later during the recognition phase.

In the recognition phase, the handwritten text image is processed to detect text lines and segment words. For each word in the text image, a sequence of primitive strokes and their spatial relationships is extracted; Figure 4 shows the primitive strokes and spatial relationships identified for a handwritten word.

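Recognition in Section 2.3 decodes the observation sequence with the Viterbi algorithm. The paper uses continuous (Gaussian) emissions via the HTK toolkit; purely for illustration, a minimal Viterbi decoder for a discrete-observation HMM in the notation of Section 2.1 might look like this:

```python
import math

def viterbi(A, B, pi, obs):
    """Most likely state path for a discrete HMM (Section 2.1 notation:
    A[i][j] = P[q_{t+1}=S_j | q_t=S_i], B[j][k] = P[v_k at t | q_t=S_j],
    pi[i] = P[q_1=S_i]).  Log probabilities avoid numerical underflow;
    all probabilities are assumed strictly positive."""
    N = len(pi)
    delta = [math.log(pi[i]) + math.log(B[i][obs[0]]) for i in range(N)]
    backptr = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(N):
            # best predecessor state for ending in state j at this time step
            best_i = max(range(N), key=lambda i: delta[i] + math.log(A[i][j]))
            step.append(best_i)
            new_delta.append(delta[best_i] + math.log(A[best_i][j])
                             + math.log(B[j][o]))
        backptr.append(step)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(range(N), key=lambda i: delta[i])
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    return path[::-1], max(delta)
```

With a two-state model whose emissions strongly favor observation 0 in state 0 and observation 1 in state 1, decoding the sequence [0, 0, 1, 1] recovers the path [0, 0, 1, 1].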
For the word of Fig. 4, the sequence is generated as a nested list with one group of primitives per character, where Greek capital letters denote primitive strokes and lower-case letters denote their associated spatial relationships. Note that the spatial-relationship symbols of the first primitive stroke in each character are not shown in the figure, since such a primitive has no primitive to its left to be connected with. Once the structural features are identified and classified, they are assigned feature values as discussed in Section 2.2. The extracted feature sequences are then treated as observations, which are used by the decoder for recognition.

[Figure 3. HMM topology for a word with eight primitive strokes: a left-to-right model with states 1-8, self-transitions a_ii, next-state transitions a_i,i+1, skip transitions a_i,i+2, and output distributions b_i.]

[Figure 4. Structural features for a handwritten word (glyphs not reproduced).]

3. Feature Extraction

The recognition system requires text lines, words, and pseudo-characters to be segmented for analysis. We developed an algorithm for these segmentation tasks using the direction field image. Pseudo-characters represent two or more physically connected characters, but hereafter we simply refer to them as characters.

3.1. Computation of the direction field image

The direction field tensor S is a 2x2 matrix which encodes the optimal direction of pixels in a local neighborhood of an image f [2]. It is computed as

  S = [ int int (D_x f)^2 dx dy        int int (D_x f)(D_y f) dx dy ;
        int int (D_x f)(D_y f) dx dy   int int (D_y f)^2 dx dy ]          (1)

The integrals are implemented as convolutions with a Gaussian kernel, and D_x and D_y are derivative operators. The local direction vector is the most significant eigenvector modulated by the error difference (the difference of the eigenvalues). This vector field is also known as the linear symmetry (LS) vector field and can be obtained directly by use of complex moments, defined as

  I_mn = int int ((D_x + i D_y) f)^m ((D_x - i D_y) f)^n dx dy            (2)

where m and n are non-negative integers. Among other orders, of interest to us are I_10, I_11, and I_20, derived as

  I_10 = int int ((D_x + i D_y) f) dx dy                                  (3)
  I_11 = int int |(D_x + i D_y) f|^2 dx dy                                (4)
  I_20 = int int ((D_x + i D_y) f)^2 dx dy                                (5)

In a local neighborhood of an image, I_10 computes the ordinary gradient field; I_11 measures gray-value changes (the sum of the eigenvalues of S); and I_20 gives a complex value whose argument is the optimal direction of pixels in double-angle representation and whose magnitude is the local LS strength (the difference of the eigenvalues of S). Pixels with low magnitude are said to lack the LS property. As shown in Fig. 5, the I_10 and I_20 images can be displayed in color, where the hue represents the direction of pixels, with red corresponding to the direction of zero degrees.

[Figure 5. (a) A handwritten character, (b) its I_10 image, (c) its I_20 image, (d) its I_11 image.]

3.2. The segmentation process

Segmentation and text line detection are done on the direction field image (I_20) in two passes. In the first pass, the image is traversed from top to bottom and pixels are grouped into blocked (character) and open (background) regions. A pixel is recursively classified as open if it:

- is in the first row of the direction field image, or
- lacks LS and one of its immediate top and/or sideways neighbors is open.

The remaining pixels are grouped as blocked. The boundaries of blocked regions produce segmented characters. In the second pass, the I_20 image is traversed from left to right, grouping each segmented character into an appropriate text line based on the character's proximity along the global (average direction of the text line) and local (direction at the head of the text line) directions. Segmented characters that do not fit into the existing text lines form a new text line. The directions of a text line help to predict where the next member character is found during traversal.
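The complex moments of Eqs. (2)-(5) can be sketched numerically. The toy implementation below (an illustration, not the authors' code) replaces the Gaussian-weighted integrals with plain central differences and summation, and verifies that an image varying only along x has optimal direction zero degrees and full LS strength, i.e. |I_20| = I_11.

```python
import cmath
import math

def complex_moments(f):
    """Global I_11 and I_20 (Section 3.1, Eqs. (4)-(5)) for a 2-D grayscale
    image f given as a list of rows, using central differences for D_x, D_y
    and summation in place of the integrals.  (The paper applies Gaussian
    smoothing; it is omitted here for brevity.)"""
    h, w = len(f), len(f[0])
    I11, I20 = 0.0, 0 + 0j
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = (f[y][x + 1] - f[y][x - 1]) / 2.0
            dy = (f[y + 1][x] - f[y - 1][x]) / 2.0
            g = complex(dx, dy)   # (D_x + i D_y) f
            I11 += abs(g) ** 2    # sum of eigenvalues of S
            I20 += g * g          # difference of eigenvalues, double angle
    return I11, I20

# A pattern varying only along x (vertical stripes): optimal direction 0 deg.
img = [[math.sin(0.5 * x) for x in range(16)] for _ in range(16)]
I11, I20 = complex_moments(img)
direction = cmath.phase(I20) / 2.0   # halve the double-angle representation
```

Because all gradients here point along x, I_20 sums positive real numbers, so its argument is zero and its magnitude equals I_11 (perfect linear symmetry).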

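The first segmentation pass of Section 3.2 classifies pixels recursively. One way to realize that recursion -- shown here as an illustrative sketch, not the authors' implementation -- is a breadth-first flood fill seeded from the first row:

```python
from collections import deque

def classify_open(lacks_ls):
    """First segmentation pass (Section 3.2): label pixels open (background)
    or blocked (character).  lacks_ls[y][x] is True where the pixel lacks the
    linear-symmetry property in the I_20 image.  A pixel is open if it is in
    the first row, or if it lacks LS and a top or sideways neighbor is open."""
    h, w = len(lacks_ls), len(lacks_ls[0])
    open_px = [[False] * w for _ in range(h)]
    queue = deque()
    for x in range(w):                 # every first-row pixel is open
        open_px[0][x] = True
        queue.append((0, x))
    while queue:
        y, x = queue.popleft()
        # propagate openness downward and sideways to LS-lacking pixels
        for ny, nx in ((y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w \
                    and not open_px[ny][nx] and lacks_ls[ny][nx]:
                open_px[ny][nx] = True
                queue.append((ny, nx))
    return open_px                     # remaining pixels are blocked
```

On a grid where only a small rectangle has the LS property, the rectangle remains blocked while the background is labeled open; the boundaries of such blocked regions yield the segmented characters.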
This prediction is essential especially for skewed documents and non-straight text lines. Figure 6 shows segmentation and text line detection for handwritten Amharic text skewed by 15 degrees.

[Figure 6. Results of (a) character segmentation and (b) text line detection.]

Words are then segmented based on the relative gap R between characters within a text line, defined as R_i = G_i - G_{i-1}, where G_i is the horizontal gap between the i-th character and its predecessor. Although the horizontal gap between consecutive characters varies greatly, the relative gap suppresses those variations, and a threshold segments words fairly well.

Pixels are grouped into parts of primitives and connectors based on their optimal direction. After halving the double angle of I_20, pixels having the LS property and directions in [0, 60] or [120, 180] degrees are set as parts of primitives, whereas those with the LS property and directions in (60, 120) degrees are considered parts of connectors. The extracted linear structures in the I_20 image are mapped onto the I_10 image to classify them into left and right edges of primitives. A primitive is then formed from matching left and right edges. Primitives are further classified using their direction, relative length, and spatial position.

4. Experiment

To test the performance of the system, a database of handwritten Amharic documents collected from 177 writers was developed. The writers were provided with Amharic documents dealing with various real-life issues, and they used ordinary pens and white paper for writing. A total of 307 pages were collected and scanned at a resolution of 300 dpi, from which we extracted 10,932 distinct words to build a word list for training. For filtering operations on the scanned texts, a symmetric Gaussian of 3x3 pixels was used. Training and recognition of HMMs were implemented using the HTK toolkit. Recognition rates vary with the quality of the handwriting and the number of training words: documents are classified as good or poor based on qualities such as readability and the connectivity of characters and words, and the most frequent 10 and 100 words were also used to train and test the system at different vocabulary sizes. The results are summarized in Table 3.

Table 3. Recognition results.

  Quality of text | 10 words | 100 words | 10,932 words
  Good            |   98%    |    93%    |     76%
  Poor            |   85%    |    81%    |     53%

5. Discussion and Conclusion

An HMM-based Amharic word recognition system for handwritten text has been presented, together with script-independent text line detection and character and word segmentation algorithms. Our proposed method generates sample features of training words from a stored feature list of characters. The feature list stores a variety of sample features for each character, reflecting different real-world writing styles. The advantage of this is that realistic sample word features can be produced for any word without collecting sample text, which makes the system writer-independent. It also means that the system can be directly applied to other Ethiopian languages that use Ethiopic script for writing. Since we encode the relative size of primitive strokes, the recognition system does not require size normalization. The recognition results can be further improved by working more on the extraction of structural features and by adding language models to the HMMs. The database we developed can be used as a benchmark resource for further studies on the recognition of Ethiopic script.

References

[1] Y. Assabie and J. Bigun, "Writer-independent offline recognition of handwritten Ethiopic characters", Proc. 11th ICFHR, Montreal, pp. 652-656, 2008.
[2] J. Bigun, Vision with Direction: A Systematic Introduction to Image Processing and Vision, Springer, Heidelberg, 2006.
[3] B. B. Chaudhuri (Ed.), Digital Document Processing: Major Directions and Recent Advances, Springer, London, pp. 165-183, 2007.
[4] R. Plamondon and S. N. Srihari, "On-line and off-line handwriting recognition: A comprehensive survey", IEEE Trans. PAMI, 22(1), pp. 63-84, 2000.
[5] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proc. IEEE, 77(2), pp. 257-286, 1989.
