Fuzzy Stroke Analysis Of Devnagari Handwritten Characters


WSEAS TRANSACTIONS on COMPUTERS, Issue 5, Volume 7, May 2008, ISSN: 1109-2750

PRACHI MUKHERJI (1), PRITI P. REGE (2)
(1) Electronics and Telecommunication Department, Smt. Kashibai Navle College of Engg., Pune 411041, INDIA
(2) Electronics and Telecommunication Department, College of Engineering Pune, Shivajinagar, Pune 411005, INDIA
prachimukherji@rediffmail.com, ppr@coep.extc

Abstract: - Devnagari script is a major script of India, widely used for various languages. In this work, we propose a fuzzy stroke-based technique for analyzing handwritten Devnagari characters. After preprocessing, the character is segmented into strokes using our thinning and segmentation algorithm. We propose Average Compressed Direction Codes (ACDC) for shape description of the segmented strokes. The strokes are classified as left curve, right curve, horizontal stroke, vertical stroke, slanted lines, etc. We assign a fuzzy weight to the strokes according to their circularity to find the similarity between over-segmented strokes and model strokes. The character is divided into nine zones, and the occurrences of strokes in each zone and in combinations of zones contribute to the Zonal Stroke Frequency (ZSF) and the Regional Stroke Frequency (RSF) respectively. The classification space is partitioned on the basis of the number of strokes, the Zonal Stroke Frequency and the Regional Stroke Frequency. Knowledge of the script grammar is applied to classify characters using features such as ACDC-based stroke shape, relative strength, circularity and relative area. A Euclidean distance classifier is applied for unordered stroke matching. The system tolerates a slant of about 10° left and right and a skew of 5° up and down. The system proves to be fast and efficient with regard to space and time, gives high discrimination between similar characters, and achieves a recognition accuracy of 92.8%.

Key-Words: - Devnagari Script, Segmentation, Strokes, Average Compressed Direction Code, Zonal and Regional Stroke Frequency, Euclidean Classifier.

1 Introduction

Over 500 million people all over the world use Devnagari script. It provides the written form of over forty languages [1], including Hindi, Konkani and Marathi. It is a logical composition of its constituent symbols in two dimensions [2]. It has a horizontal line drawn on top of all characters [2]. The character set consists of 35 consonants and 13 vowels. A marked distinction between Devnagari script and scripts of the Roman genre is that a character represents a syllabic sound, complete in itself.

There has been intense research on English, Latin, Chinese, Persian, Tamil and Bangla scripts, for both handwritten and machine-printed text. There are two main approaches to feature extraction for handwritten characters: the statistical and the structural approach. The structural approach has been used successfully for printed [3] and broken [4] Thai character recognition. While most work has been published for printed Devnagari text, very little is reported for handwritten Devnagari script. One of the first attempts for handprinted characters was by Sethi [5], and for typed Devnagari script by Sinha and Mahabala [6]. V. Bansal and Sinha [7] divided the typed word into three strips (top strip, core strip and bottom strip) and achieved 93% performance on individual characters. Pal and Chaudhuri attempted OCR for two scripts, Bangla and Devnagari, in [8]. Machine recognition of online handwritten Devnagari characters has been reported in [2] with 82-85% accuracy. In [9], a survey of the structural techniques used for feature extraction in OCR of different scripts is given, together with the status of other Indian scripts. In [10], Devnagari handwritten character recognition is achieved using connected segments and a neural network. U. Pal et al. [11] extract the directional chain code of the contour points of the characters. The feature length is 64 and they achieved an accuracy of 80.36% using a quadratic classifier. More recently, in [12], U. Pal et al. extracted a 400-element feature vector based on the Roberts filter from isolated, normalized characters and achieved an accuracy of 94%. They consider characters without top modifiers.

In this work, an attempt has been made to recognize 45 handwritten characters in Devnagari script using a structural approach. The system allows for the variations that occur in individual handwriting. The handwriting should be legible and adhere to the structural syntax of the Devnagari script. The processing steps of our OCR system can be summarized concisely. The documents are scanned and filtered. The filtered images are subjected to binarization, as the text is printed as dark points on a light background (or vice versa) and can be mapped to a binary image. The characters are extracted from the form sheet and enclosed in a component box. Skeletonization, topline detection and top modifier segmentation are the next steps. Skew removal and slant removal are an integral part of any optical character recognition (OCR) system, as they increase accuracy by removing the variability of handwritten data. The Average Compressed Direction Coding (ACDC) algorithm codes the strokes of the character and suppresses the variations due to slope and skew, thereby reducing the need for slope and skew normalization. Thus the algorithm also suppresses shape variations. Cognitive scientists report that humans base their thinking on conceptual patterns and mental images rather than on numerical quantities [10]. With this view and the construction of Devnagari script in mind, fifteen basic shapes are defined with a fuzzy membership function for better classification. The classification space is partitioned on the basis of the number of segments and the Zonal and Regional Stroke Frequency (ZSF and RSF). The characters are finally classified using a Euclidean classifier on a feature vector for unordered stroke matching. The features used are the shape codes, circularity, relative area and relative strength of the segmented strokes. The recognition accuracy achieved is 92.8%.

The paper is organized as follows: Section 2 gives the characteristics of Devnagari script. Section 3 describes pre-processing of the character, and Section 4 explains feature extraction and the ACDC algorithm in detail. Section 5 explains stroke and character classification. In Section 6, segmentation, feature extraction and recognition results are discussed. In Section 7, conclusions and directions for future research are given.

2 The Devnagari Character Set

The basic character set of Devnagari script consists of 48 characters. The character set for experimentation is based on present-day usage and is shown in Fig. 1(a) for typed data. Fig. 1(b) shows the handwritten character set for this work. It consists of 12 vowels (first two lines) and 33 consonants. As can be observed, the characters have circular strokes in different directions. This demands a dedicated curve detection method, as the established methods of other scripts may fail. Every character has a horizontal header line, or 'shirorekha', as shown in Fig. 2. This line serves as a reference to divide the character into two distinct portions, Head and Body, if a top modifier is present. As shown in Fig. 2, a Devnagari word may be divided into three zones: Zone 1 contains the top modifier, Zone 2 the body of the word, and Zone 3 is the lower modifier region. For typed characters the basic character width ranges from very small to large, going through many medium sizes, but it is difficult to specify for handwritten data, as the size depends on the individual's stroke, pen width and style of writing. Many characters have a vertibar, which can be present in the middle region or at the end of the character, as shown in Fig. 3. Some characters do not have a vertibar, as shown in Fig. 4. In Fig. 5, characters from the basic set with a top modifier are shown.

Figure 1(a). Basic Character Set (Typed)

Figure 1(b). Basic Character Set (handwritten)

Another special feature of Devnagari script is conjuncts, or structural compositions (Yuktakshar), of two or three consonants and/or vowels. They are formed by simple rules and restrictions of the language of application. Each consonant (Vyanjan) and conjunct (Yuktakshar) can be further modified by a vowel (Matra) modifier. In a manner similar to the formation of conjuncts, combinations of vowels with the nasal sounds give rise to combined forms of vowels as well. There are as many characters in Devnagari script as there are syllables in the spoken language. The proposed work is limited to the recognition of 45 characters with top modifiers only. This research work is nevertheless a step towards the recognition of unconstrained Devnagari script.

Figure 2. Devnagari word with modifiers and Zones (label in figure: Shirorekha)

Figure 3. Character with end vertibar

Figure 4. Characters without vertibar

Figure 5. Characters with top modifier

3 Preprocessing

Each character is preprocessed before it can be recognized. Preprocessing includes filtering by a Gaussian filter, binarization, and thinning. Each handwritten document is scanned and converted into a digitized image using a desktop scanner. Noise removal and image smoothing are important to smoothen the strokes. Gaussian low pass filters can be used efficiently for this purpose [14]. The form of this filter in two dimensions is given by (1).

$$H(u,v) = e^{-D^{2}(u,v)/2D_{0}^{2}} \qquad (1)$$

where D(u,v) is the distance from the origin of the transform and D0 is the cut-off frequency. Binarization is achieved using Otsu's algorithm [15]. The next step, skeletonization [14], is an important aspect of feature extraction in our scheme. The spurs in the image are removed by the spur removal algorithm [15]. The process of cleaning up (or removing) these spurs is called pruning. The slant of a character is detected using the method of slices, which was applied successfully to Devnagari words in [13]. Keeping the topline as reference, a horizontal shear transform [14] is applied only if the slant is more than the set threshold of 10°.

4 Feature Extraction

A feature is a point of human interest in an image, a place where something happens. It could be an intersection between two lines, a corner, or just a dot surrounded by space. These relationships are used for character identification, and hence feature points are exploited for the task of character recognition.

4.1 Top Modifier Features

As shown in Fig. 2, the distinct feature of Devnagari script, as well as of individual characters, is the topline (Shirorekha), which is detected using the Hough Transform [15] and Horizontal Projection [6]. If a top modifier is present, it is recognized on the basis of transitions [6] and shadows [16].

4.2 Segmentation of Character in Strokes

An adaptive thinning algorithm has been developed for separating an image into its constituent strokes. In the first step, the complete neighborhood pattern is mapped by finding and summing the contributions of all eight neighbors of a black pixel. This Sum of Neighbours, s(i, j), is given in (2).

$$s(i,j) = \sum_{m=i-1}^{i+1} \sum_{n=j-1}^{j+1} p(m,n) - 1 \qquad (2)$$

Here p(m,n) denotes the pixel value at (m,n); subtracting 1 removes the contribution of the black centre pixel itself.
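The neighbour map of (2) can be illustrated with a minimal sketch. It assumes the skeleton is stored as a list of rows with 1 for a black pixel and 0 for background, and that pixels outside the image count as background; the function name `sum_of_neighbours` and the test pattern are illustrative and not taken from the paper.

```python
def sum_of_neighbours(image):
    """Return the map s(i, j) of equation (2) for a binary image.

    `image` is a list of rows; 1 marks a black (stroke) pixel, 0 background.
    Assumption: pixels outside the image border count as background.
    """
    rows, cols = len(image), len(image[0])
    s = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if image[i][j] != 1:
                continue  # the map is only evaluated on black pixels
            window = 0
            # Sum the 3x3 window centred on (i, j), clipped at the border.
            for m in range(max(i - 1, 0), min(i + 2, rows)):
                for n in range(max(j - 1, 0), min(j + 2, cols)):
                    window += image[m][n]
            s[i][j] = window - 1  # drop the centre pixel's own contribution
    return s


# A small plus-shaped pattern: the centre pixel touches four black pixels.
plus = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
neighbour_map = sum_of_neighbours(plus)
```

In this toy pattern the centre receives the value 4, while the end pixel of a plain three-pixel line receives 1, so junctions and endpoints can be told apart from the map alone.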

Since the character image has already passed through skeletonization once, the maximum neighborhood is expected to be four pixels, signifying a cross-point as shown in Fig. 6(a). This point is given the value zero, and the map is then checked for values of s(i, j) equal to three. Blindly thresholding these pixels to zero gives rise to over-segmentation.

(a) Cross-point s(i,j)      (b) Two-Neighbor Cross-point
3 3 3                       2 0 2
3 4 3                       0 0 0
3 3 3                       2 0 2

(c) Lower Left Corner       (d) Two-Neighbor Corner
0 3 0                       0 2 0
0 2 3                       0 0 2
0 0 0                       0 0 0

(e) Hole s(i,j)             (f) Two-Neighbor Hole
2 2 2                       1 1 1
2 3 2                       1 0 1
3 2 2                       0 1 1

Figure 6. Steps in Adaptive Thinning Algorithm.

Instead, the combinations shown in Fig. 6(c) and Fig. 6(d)-(f) are used to detect corners and smoothen them. The pruned map is then thresholded, as in (3).

If $\sum_{m=i-1}^{i+1} \sum_{n=j-1}^{j+1} s(m,n) - 1 \;\lozenge\; 3$ (3)
    s(i, j) = 0;
else
    s(i, j) = 1;
end

Here ◊ stands for the comparison operator of (3), which is not legible in the source.

Fig. 7 gives various samples of the character KSHA and their segmentation into strokes. As can be observed, various characters have some basic strokes in common. This knowledge is further exploited in extracting and matching the strokes.

Figure 7. Character 'KSHA' and its strokes.

4.3 Average Compressed Direction Codes

After segmentation of the character into strokes, all segments are labeled from left to right in the image frame. As each stroke (segment) is labeled [15], its row and column indices are also stored. The next step is the coding of these strokes using Freeman's chain codes with a slight modification. Normally, Freeman chain codes define a closed boundary, with directions as shown in Fig. 8. We use them here to indicate the shape of strokes that are only one pixel thick. Each pixel has only two neighbors, except the two endpoints. We code from the higher endpoint, i.e. the endpoint with the lower row index. If the row indices are the same, coding begins from the left endpoint, i.e. the endpoint with the smaller column index. This is done to follow the intuitive left-to-right and top-to-bottom construction of Devnagari script.

3 2 1
4 . 8
5 6 7

Figure 8. Freeman chain code direction

The obtained code is converted back into angles and averaged depending upon the length of the stroke. Each direction step corresponds to an angle of 45°. Averaging reduces minute variations in the strokes and reduces the vector space. Care has been taken when combinations such as (8,1,2) and (7,1,2) are obtained, as direct averaging of the code values gives a wrong result.
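A minimal sketch of why direct averaging fails across the 8-to-1 wrap-around, and of one way around it: the pixel steps of a short code run are summed and the angle of the net displacement is taken. The code-to-direction table below is an assumption read off the layout of Fig. 8 (code 8 pointing right, codes increasing counter-clockwise in 45° steps), and the names `STEP`, `mean_direction` and `naive_mean_code` are illustrative; this is not necessarily the authors' exact procedure.

```python
import math

# Assumed pixel step (dx, dy) per chain code, with dy pointing up the page.
# Read off Fig. 8: 8 = right, 1 = up-right, 2 = up, ..., 6 = down, 7 = down-right.
STEP = {
    8: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
    4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1),
}


def mean_direction(codes):
    """Angle in degrees (0 to 360) of the net displacement of a code run."""
    dx = sum(STEP[c][0] for c in codes)
    dy = sum(STEP[c][1] for c in codes)
    return math.degrees(math.atan2(dy, dx)) % 360.0


def naive_mean_code(codes):
    """Arithmetic mean of the code values; misleading across the 8/1 boundary."""
    return sum(codes) / len(codes)


wrap_run = (8, 1, 2)
angle_wrap = mean_direction(wrap_run)    # net displacement (2, 2)
naive_wrap = naive_mean_code(wrap_run)   # about 3.67, i.e. pointing up-left
angle_811 = mean_direction((8, 1, 1))    # net displacement (3, 2): arctan(2/3)
angle_vertical = mean_direction((6,) * 6)
```

Under these assumptions the run (8,1,2) averages to 45°, whereas the arithmetic mean of the code values lands between codes 3 and 4, on the opposite side of the compass. Re-quantizing the angle back to a code is left out because it depends on Table 1, which is not reproduced here.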

The code per section of the stroke is averaged as a combination of 1, 2, or 3 codes. For example, consider the section of a stroke shown in Fig. 9(a). The direction code is (8,1,1). This code is averaged by converting the code directions into angles; in the angle representation the value is arctan(2/3) ≈ 33.69°. This angle is then re-quantized, as per the relation between the code directions 1-8 and the angle given in Table 1, to a value of 2.

Figure 9. (a) Stroke, code is (8,1,1); (b) Quantized angle, arctan(2/3) ≈ 33.69°, Code 2

If, after averaging using angles, the average code obtained for a vertical line is 666666, this code is run-length coded to obtain a compressed code of (6,6). This Average Compressed Direction Code (ACDC) is then stored as a part of the feature vector. Code 9 is used to represent a change from (8,1) in the averaged code. Code 10 indicates isolated single pixels. Some basic ACDC codes are shown in Fig. 10.

Apart from these directly obtained codes, insertions and deletions are made to accommodate the variations in handwritten strokes. This follows from the perceptual study of how people read Devnagari script. For example, the shape in Fig. 10(a) is equivalent to (4,5,6,7,8), and the basic shape-defining code is (5,7) or (5,6,7), which indicates a left curve. In this way similar curve shapes and curves with distortions have similar ACDC codes and are classified as one type of stroke.

Figure 10. Examples of ACDC: (a) (4,5,7,8) Left Curve; (b) (8,7,5,4) Right Curve; (c) (4,5,7,5,4) S-curve; (d) (6,5,4,3,2) 'U' curve.

4.4 Stroke Features

Seven features are extracted for each stroke k, where k = 1 to Num and Num is the total number of strokes obtained in a character:

1. $L_k = \sum_{i,j} S(i,j)$, the total number of black pixels in stroke k.
2. $R_k = (\sum_i i)/M$, the mean of the row indices of stroke k.
3. $C_k = (\sum_j j)/N$, the mean of the column indices of stroke k.
4. $REL_k = L_k / \max(L_k)$, the length of stroke k divided by the length of the longest stroke.
5. $CIR_k = \sqrt{(R1_k - R2_k)^2 + (C1_k - C2_k)^2} / L_k$, indicating a straight line or a curve, where R1k and R2k, C1k and C2k are the row and column indices of the endpoints of stroke k.
6. $AREA_k = (RE2_k - RE1_k) \cdot (CE2_k - CE1_k)$, the area occupied in pixels by stroke k, where RE2k and RE1k are the row indices of the topmost and lowermost rows, and CE2k and CE1k are the leftmost and rightmost column indices of the stroke.
7. $REL\ AREA_k = AREA_k / \max(AREA_k)$.

Here (M,N) is the size of the image. All seven features for the character 'Ksha' in Fig. 11(a) and (b) are given in Table 2 (last page). The algorithm accurately codes all strokes and assigns the codes as per their perceived directions.
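As a rough illustration of features 1 and 4 to 7, the sketch below computes them for strokes given as ordered pixel lists. The input format (each stroke a list of (row, col) tuples running from one endpoint to the other), the function name `stroke_features` and the two toy strokes are assumptions for this example; R_k and C_k are left out.

```python
import math


def stroke_features(strokes):
    """Compute L, REL, CIR, AREA and REL_AREA for every stroke.

    `strokes` is a list of strokes; each stroke is an ordered list of
    (row, col) pixels from one endpoint to the other (assumed format).
    """
    lengths = [len(s) for s in strokes]
    areas = []
    for s in strokes:
        rows = [r for r, _ in s]
        cols = [c for _, c in s]
        # Bounding-box extent, equal to (RE2 - RE1) * (CE2 - CE1) in feature 6.
        areas.append((max(rows) - min(rows)) * (max(cols) - min(cols)))
    max_len = max(lengths)
    max_area = max(areas)
    features = []
    for s, length, area in zip(strokes, lengths, areas):
        (r1, c1), (r2, c2) = s[0], s[-1]
        features.append({
            "L": length,
            "REL": length / max_len,
            # Endpoint distance over pixel count: high for lines, low for curves.
            "CIR": math.hypot(r1 - r2, c1 - c2) / length,
            "AREA": area,
            # Guard against a character made only of perfectly straight strokes.
            "REL_AREA": area / max_area if max_area else 0.0,
        })
    return features


vertical = [(r, 0) for r in range(5)]              # straight vertical line
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # L-shaped stroke
feats = stroke_features([vertical, corner])
```

For these two toy strokes the straight line has the larger CIR value (0.8 against roughly 0.57), which is the sense in which circularity separates lines from curves of the same length.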

Figure 11. Results of Stroke Extraction of character 'Ksha': (a), (b)

From the values given in Table 2, we can observe that the extracted features as well as the stroke codes show similarity in their values, which demonstrates the representational power of our ACDC algorithm.

4.5 Zoning Segment Frequency

The character is divided into 9 zones as shown in Fig. 12. All character image co-ordinates are normalized to the 0-1 range on both the X and Y axes. The zones are marked as Z1 - Z9. The zone boundaries are 0-0.33, 0.33-0.67 and 0.67-1 on both the X and Y axes. The occurrence of the stroke center of gravity is found from Rk and Ck. The number of strokes in a zone divided by the total number of strokes constitutes the ZSF feature. Zones are combined into regions in four ways, and the regional stroke frequencies (RSF) are calculated as given in (4)-(7).

Z1 Z2 Z3
Z4 Z5 Z6
Z7 Z8 Z9

Figure 12. Zones of Character

REGION 1 = Z1 + Z2 (4)
REGION 2 = Z1 + Z2 + Z4 + Z5 (5)
REGION 3 = Z4 + Z5 + Z7 + Z8 (6)
REGION 4 = Z3 + Z6 + [remaining terms lost in the source] (7)

5 Character Classification

After character segmentation into strokes and feature extraction, classification is the main step for recognition. Classification is performed in three steps. First, strokes are classified according to their ACDC codes and fuzzy weights are assigned. In the second step, the ZSF and RSF features partition the vector space. In the third step, the character is classified by unordered stroke matching based on the Euclidean distance of the stroke features.

5.1 Stroke Classification

The direction codes classify the strokes as one of the 15 primitives listed in Table 3, such as left curve, right curve, u-curve, s-curve and straight lines. Column three of Table 3 gives the equivalent ACDC codes of the respective strokes. As is evident from the codes, the algorithm brings a stroke of arbitrary size into its basic shape-defining form. It can thus be said to emulate human vision. For example, a deep left curve may have the code 45678, whereas a left curve can also be defined by the codes 567 or 57. Similarly, the ACDC codes 6 and 67, 76 are equivalent, depending on the circularity feature CIR k.

Table 3. Stroke Type and ACDC Code (columns: Stroke Type, Segment, ACDC code). Only the stroke types Straight Line, Slant Line 1, Slant Line, 'S' curve and Horizontal Line, and the code digits 64576854876546588,4, survive in the source; the row structure is not recoverable.

Strokes are also accorded a weight according to the fuzzy membership function given in (8). The fixed weight in relation to other strokes is given in Table 4. This helps in case of over-segmentation of the character. The fuzzy stroke feature enhances the recognition capability. The membership function is weighted by the circularity feature CIR k of the stroke.

$$\mu = (1 / CIR_k) \cdot \mu_{fix} \qquad (8)$$

where µfix is a parameter based upon the partial similarity of two segments and is given in Table 4. The matching parameter values range from 1 to 0. A lower value indicates a higher degree of matching. These values were developed intuitively for over-segmented segments.

In Table 4, relationship values for straight lines, curves and corners are given. Only one corner and its relationship value are listed, as similar values are assigned to the remaining three corners. This feature is not useful for circular, u-curve and s-curve shaped strokes, as their shape is already well defined and fully recognizable.

Table 4. Stroke and µfix (rows and columns correspond to the same seven stroke shapes, shown as pictograms in the original)

1     .5    .5    .33   .33   0     .5
.5    1     .5    .33   .33   0     0
.5    .5    1     .33   .33   0     0
.33   .33   .33   1     0     .33   .67
.33   .33   .33   0     1     .33   0
0     0     0     .33   .33   1     .5
.5    0     0     .67   0     .5    1

5.2 Coarse Classification

The feature vector is formed using the following:

1) Number of strokes (NUM)
2) Nine ZSF (Z1-Z9)
3) Four RSF

The length of the vector is 14, for partitioning the vector space into five classes. These classes basically separate the characters into end vertibar, central vertibar, no vertibar, high activity in region R1 and low activity in region R1 characters.

5.3 Fine Classification

Pattern classification using distance measures is one of the earliest concepts in shape and pattern recognition. Dissimilarity, and conversely similarity, can be calculated using a distance metric between two feature vectors. The nearest neighbour rule (NNR) [17], [18] classifies an unknown sample into the class of its nearest neighbour, according to some similarity measure (a distance). Given a distance, it is very simple to build a classifier based on this rule, which is often hard to beat by other types of classifiers. The Euclidean distance, or L2-norm, is the most common measure of dissimilarity and is used by our system. It is defined in (9).

$$d = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \qquad (9)$$

where d is the distance between a_i and b_i, the feature vector elements. The feature vectors are of length n. The NNR is formulated as given in (10).

$$d(S_j, x) = \min_i \, d(S_i, x) \qquad (10)$$

where d is the distance measure, the set of samples {S1, S2, S3, ..., Si, ..., SN} is called the learning set, x is the unknown sample to be classified, and S_j is its nearest neighbour. Two characters are similar if the Euclidean distance of their feature vectors is relatively small.

$$D = [d_{ij}] = \begin{bmatrix} d_{11} & d_{12} & d_{13} & \cdots & d_{1j} \\ d_{21} & d_{22} & d_{23} & \cdots & d_{2j} \\ d_{31} & d_{32} & d_{33} & \cdots & d_{3j} \\ \vdots & \vdots & \vdots & & \vdots \\ d_{i1} & d_{i2} & d_{i3} & \cdots & d_{ij} \end{bmatrix} = \begin{bmatrix} 0 & d_{12} & d_{13} & \cdots & d_{1j} \\ d_{21} & 0 & d_{23} & \cdots & d_{2j} \\ d_{31} & d_{32} & 0 & \cdots & d_{3j} \\ \vdots & \vdots & \vdots & & \vdots \\ d_{i1} & d_{i2} & d_{i3} & \cdots & 0 \end{bmatrix} \qquad (11)$$

Thus, by computing all the distances as given in (11) between the sample and each prototype in the training set in a brute-force manner, the nearest neighbour of a character can be selected by simply looking at the row minimum, i.e., the element in a row closest to zero in the distance matrix. The diagonal zeros arise when the comparison is made between model characters. As hand-drawn characters are considered in this work, a perfect zero on the diagonal is not possible. However, the smallest value lying on the diagonal is evidence of the system's ability to recall and find the closest match.

In total seven features are extracted for each stroke, but not all features are used directly. The feature vector has the stroke type as its main feature. The other supporting features are RELk and REL AREA k. As stroke order is not considered, this can be termed unordered stroke matching, as depicted in Fig. 13.

Figure 13. Unordered stroke matching.

The maximum number of segmented strokes is 15, so the length of the vector is 15 * 3, i.e. 45 (stroke type, RELk, REL AREA k). S1, S2, S3 in Fig. 13 indicate the strokes and their associated features.

6 Experimentation and Results

Character recognition experiments were performed using three databases collected by us. Database I is of 270 trained writers, each writing the 45 Devnagari characters 3 times on a sheet provided with boxes. A reference sheet with standard characters was provided to avoid too many variations in the characters. Database II was collected from 20 persons, each writing the 45 characters 3 times. The writers were from different age groups and social backgrounds. They were given a sheet with boxes and no reference character sheet. Database III is totally unconstrained: 100 writers were instructed orally to write the characters 3-6 times. This data was collected from school children, college students, middle-aged working professionals and the older generation. 60% of the database is used for feature extraction. Feature vectors of three prototypes of each character are stored. The remaining database is used for validation.

The database for the character 'Kha' is shown in Fig. 14 and for the character 'Sha' in Fig. 15. As can be observed, there is variation in the shapes and also in the writing style of the character 'Kha'.

Figure 14. Samples of character 'KHA'

Figure 15. Samples of character 'SHA'

The strokes of the segmented character 'KHA' are shown in Fig. 16. They indicate that the algorithm is independent of character size and free of normalization.

Figure 16. Strokes of one model of Character 'KHA'

Similarly, Fig. 17 shows the strokes of one model of the character 'Sha'. The maximum variation in the character 'Sha' is observed in the short tilted line in between the primary stroke.
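The nearest neighbour rule of (9) and (10) in Section 5.3 can be sketched in a few lines. The prototype labels and the three-element feature vectors below are made-up illustrative values, not data from the paper, and the helper names `euclidean` and `nearest_neighbour` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors, as in (9)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour(learning_set, x):
    """Return (label, distance) of the prototype closest to x, as in (10).

    `learning_set` is a list of (label, feature_vector) pairs.
    """
    scored = ((label, euclidean(vec, x)) for label, vec in learning_set)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical prototypes with made-up three-element feature vectors.
prototypes = [
    ("KHA", [1.0, 0.5, 0.2]),
    ("SHA", [0.0, 1.0, 0.9]),
]
unknown = [0.9, 0.6, 0.1]
label, distance = nearest_neighbour(prototypes, unknown)
```

Scanning every prototype corresponds to the brute-force row minimum over the distance matrix of (11); with 45 characters and a handful of prototypes each, this remains a small search.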

Figure 17. Strokes of one model of Character 'SHA'

The number of strokes extracted may differ slightly, but it is distributed around the mean. The mean for 'Kha' is 7.3 and for 'Sha' is 4.2. Fig. 18 shows the number of strokes on the Y-axis against sample numbers 1-25 on the X-axis for 'Kha' and 'Sha'.

Figure 18. Number of Strokes Vs. Sample No (Y-axis: No. of Strokes; X-axis: Sample No.)

Fig. 19 shows the plot of ZSF against the zones for the characters 'Kha' and 'Sha'. From the graph it can be inferred that every character, or group of similar-looking characters, has a similar ZSF and RSF. For the character 'Sha' the ZSF is zero for zones Z4, Z7 and Z8. This clearly distinguishes it from the character 'Kha', as the human eye would. Similarly, other characters have been segmented into strokes, and two models are stored for each character.

Figure 19. Zonal Stroke Frequency (panel title: PDF of strokes in Zones Z1-Z9; Y-axis: Probability; X-axis: Zones z1-z9; series: 'SHA', 'KHA')

Both the characters shown in Fig. 15 and 16 have a vertibar at the end of the character and thus fall in the same class of characters with end vertibar. The plot in Fig. 20 indicates that they have similar RSF.

Figure 20. Regional Stroke Frequency (Y-axis: Sum of probabilities)

The primary stroke of the character 'Sha' is the long left-curve stroke shown in Fig. 21. The distribution of this stroke over 11 samples is plotted in Fig. 22. This localized information on the primary stroke indicates that, although the Y co-ordinate of the center of gravity of the stroke is confined to a small range of values, its X co-ordinate shows more variation. This is due to the difference in the length of the topline, which changes the starting point of the image frame.

Figure 21. Character 'Sha' and its Primary Stroke
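The zonal stroke frequency discussed in this section can be sketched from the stroke centres of gravity. The sketch assumes centroids already normalized to the 0-1 range, zones numbered row by row as in Fig. 12 (Z1 top-left, Z9 bottom-right), and values lying exactly on a boundary assigned to the higher zone; the function names and sample centroids are illustrative only.

```python
def zone_index(x, y):
    """Map a normalized centroid (x, y) to a zone index 0..8 (Z1..Z9).

    Assumptions: x runs left to right, y runs top to bottom, zones are
    numbered row by row, and boundary values fall into the higher zone.
    """
    def band(v):
        # 0 for [0, 0.33), 1 for [0.33, 0.67), 2 for [0.67, 1]
        return int(v >= 0.33) + int(v >= 0.67)

    return 3 * band(y) + band(x)


def zonal_stroke_frequency(centroids):
    """Fraction of strokes whose centre of gravity falls in each of the 9 zones."""
    if not centroids:
        raise ValueError("at least one stroke centroid is required")
    counts = [0] * 9
    for x, y in centroids:
        counts[zone_index(x, y)] += 1
    return [c / len(centroids) for c in counts]


# Four made-up stroke centroids: one in Z1, two in Z5, one in Z9.
centroids = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9), (0.5, 0.6)]
zsf = zonal_stroke_frequency(centroids)
```

A regional stroke frequency would then be a sum over selected entries of `zsf`; the zone groupings are those of (4)-(7) and are not hard-coded here.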

Figure 22. Primary Stroke Distribution (center of gravity; Y-axis: Normalized Row; X-axis: Normalized Column)

The recognition result depends on the database, varying from 89% to 97% as given in Table 5. This indicates a need for a standard benchmark for Devnagari character recognition. The overall result is 93.67% for training samples and 92% for unknown data. The feature vector size and accuracy of existing schemes and of our algorithm are compared in Table 6. From the values it is evident that our algorithm is efficient and works with the smallest vector size. The time complexity of the algorithm is much lower than that of the schemes in [11], [12], as no size or line density normalization is carried out. In [12] the character is normalized to a size of 72 pixels and then the pixel-based Sobel filter output value is extracted. Our ACDC algorithm itself removes variations and is very fast, as it works on thinned and segmented strokes and stores very few features. Thus it is independent of character size and pen width. The total length of the feature vector is only 45. The time to recognize one character is of the order of 0.01 ms on a Pentium V with 500 Mb RAM using Matlab 7.

The recognition errors can be attributed to confusing characters and to different writing patterns. As India is a multilingual country and Devnagari script is used for different languages, variations in characters are observed when the language changes. In Fig. 23, (a) 'Ma' and (b) 'Bha' represent a group of structurally similar characters. Fig. 24(a)-(b) show the character 'Kha' written differently. Fig. 24(c) and (d) show the character 'Ae' written by a Hindi speaker and a Marathi speaker respectively.

Figure 23. Structurally similar character: (a), (b)

Figure 24. Same characters written differently: (a), (b), (c), (d)

Table 5. Recognition Result

Database type          Recognition for training    Recognition for unknown data
I (Trained writers)    (not recoverable)           (not recoverable)
II (Semi trained)      93.5%                       92.3%
III (Unconstrained)    90.14%                      89.2%
Overall Result         93.67%                      92.0%

Table 6. Comparison with existing Techniques

Technique                     Feature vector size    Accuracy
Directional code histogram    64                     80%
Sobel Filter                  400                    94%
Our ACDC algorithm            45                     92.8%

7 Conclusion

The proposed work presents recognition of Devnagari characters that is free from normalization, thereby giving flexibility and allowing size variation. The database was collected from writers of varied backgrounds. A modified thinning algorithm, needed for separating a character into its constituent strokes, was developed and tested successfully. The Average Compressed Direction Codes (ACDC) algorithm, though simple, works efficiently to detect curves and turning points and emulates human vision. The overall recognition accuracy is 92.8%.

Rejection also plays an important role in system evaluation. Around 1.1% of the samples are rejected as unrecognized. This work is a step towards unconstrained Devnagari script recognition, since in unconstrained text words have to be segmented into their basic characters and modifiers in the order of top modifier, body (characters) and lower modifier. Also, as the system is based on strokes, it can be a step towards conjunct separation and recognition. This is a maiden attempt at analyzing Devnagari characters with a stroke-based model. Further preprocessing may improve the results, and more models may also be stored.
