Assas-band, An Affix-Exception-List Based Urdu Stemmer

2y ago
13 Views
2 Downloads
241.15 KB
8 Pages
Last View : 29d ago
Last Download : 3m ago
Upload by : Ciara Libby
Transcription

Assas-Band, an Affix-Exception-List Based Urdu StemmerQurat-ul-Ain AkramCenter for Research in UrduLanguage ProcessingNUCES, Pakistanainie.akram@nu.edu.pkAsma NaseerSarmad HussainCenter for Research in UrduCenter for Research in UrduLanguage ProcessingLanguage ProcessingNUCES, PakistanNUCES, Pakistanasma.naseer@nu.edu.pk sarmad.hussain@nu.edu.pkDue to urgent need for some applications, an Urdustemmer called Assas-Band1, has been developed.The current work explains the details of Assas-Bandand its enhancements using exceptions lists instead oflexical lookup methods, to improve its accuracy.Finally results are reported and discussed.AbstractBoth Inflectional and derivational morphology leadto multiple surface forms of a word. Stemmingreduces these forms back to its stem or root, and isa very useful tool for many applications. There hasnot been any work reported on Urdu stemming.The current work develops an Urdu stemmer orAssas-Band and improves the performance usingmore precise affix based exception lists, instead ofthe conventional lexical lookup employed fordeveloping stemmers in other languages. Testingshows an accuracy of 91.2%.Furtherenhancements are also suggested.2. Literature ReviewUrdu is rich in both inflectional and derivationalmorphology. Urdu verbs inflect to show agreementfor number, gender, respect and case. In addition tothese factors, verbs in Urdu also have differentinflections for infinitive, past, non-past, habitual andimperative forms. All these forms (twenty in total)for a regular verb are duplicated for transitive andcausative (di-transitive) forms, thus giving a total ofmore than sixty inflected variations. Urdu nouns alsoshow agreement for number, gender and case. Inaddition, they show diminutive and vocativeaffixation. Moreover, the nouns show derivationalchanges into adjectives and nouns. Adjectives showsimilar agreement changes for number, gender andcase. A comprehensive computational analysis ofUrdu morphology is given by Hussain (2004).Stemmers may be developed by using either rulebased or statistical approaches. Rule-based stemmersrequire prior morphological knowledge of thelanguage, while statistical stemmers use corpus tocalculate the occurrences of stems and affixes. Bothrule-based and statistical stemmers have beendeveloped for a variety of languages.A rule-based stemmer is developed for English byKrovetz (1993) using machine-readable dictionaries.Along with a dictionary, rules for inflectional andderivational morphology are defined. Due to highdependency on dictionary the systems lacksconsistency (Croft and Xu 1995). In Porter Stemmer(Porter 1980) the algorithm enforces someterminating conditions of a stem. Until any of theconditions is achieved, it keeps on removing endingsof the word iteratively. Thabet has proposed astemmer that performs stemming of classical Arabic1. IntroductionA stemmer extracts stem from various forms ofwords, for example words actor, acted, and acting allwill reduce to stem act. Stemmers are very useful fora variety of applications which need to acquire rootform instead of inflected or derived forms of words.This is especially true for Information Retrieval tasks,which search for the base forms, instead of inflectedforms. The need of stemmers becomes even morepronounced for languages which are morphologicallyrich, and have a variety of inflected and derivedforms.Urdu is spoken by more than a 100 million people(accessed from http://www.ethnologue.com/showlanguage.asp ?code urd). It is the national languageof Pakistan and a state language of India. It is anIndo-Aryan language, and is morphologically rich.Currently there is no stemmer for Urdu, howeverrecent work has shown that it may have much utilityfor a variety of applications, much wider than someother languages. Due to the morphological richnessof Urdu, its application to information retrieval tasksis quite apparent. However, there are also a few zation for text to speech systems, chunking,word sense disambiguation and statistical machinetranslation.In most of these cases, stemmingaddresses the sparseness of data caused by multiplesurface forms which are caused mostly by inflections,though also applicable to some derivations.1In Urdu Assas means stem and Assas-Band meansstemmer40Proceedings of the 7th Workshop on Asian Language Resources, ACL-IJCNLP 2009, pages 40–47,Suntec, Singapore, 6-7 August 2009. c 2009 ACL and AFNLP

in Quran (Thabet 2004) using stop-word list. Themain algorithm for prefix stemming creates lists ofwords from each surah. If words in the list do notexist in stop-word list then prefixes are removed. Theaccuracy of this algorithm is 99.6% for prefixstemming and 97% for postfix stemming. Aninteresting stemming approach is proposed by Paikand Parui (2008), which presents a general analysis ofIndian languages. With respect to the occurrences ofconsonants and vowels, characters are divided intothree categories. Different equivalence classes aremade of all the words in the lexicon using the matchof prefix of an already defined length. This techniqueis used for Bengali2, Hindi and Marathi languages. Arule-based stemming algorithm is proposed forPersian language by Sharifloo and Shamsfard (2008),which uses bottom up approach for stemming. Thealgorithm identifies substring (core) of words whichare derived from some stem and then reassemblesthese cores with the help of some rules. Morphemeclusters are used in rule matching procedure. An antirule procedure is also employed to enhance theaccuracy. The algorithm gives 90.1 % accuracy.Besides rule-based stemmers there are a number ofstatistical stemmers for different languages. Croft andXu provide two methods for stemming i.e. CorpusSpecific Stemming and Query-Specific Stemming(Croft and Xu 1995). Corpus-Specific Stemminggathers unique words from the corpus, makesequivalence classes, and after some statisticalcalculations and reclassification makes a dictionary.Query-Based Stemming utilizes dictionary that iscreated by Corpus-Based Stemming. Thus the usualprocess of stemming is replaced with dictionarylookup. Kumar and Siddiqui (2008) propose analgorithm for Hindi stemmer which calculates ngrams of the word of length l. These n-grams aretreated as postfixes. The algorithm calculatesprobabilities of stem and postfix. The combination ofstem and postfix with highest probability is selected.The algorithm achieves 89.9% accuracy. Santoshet.al. (2007) presents three statistical techniques forstemming Telugu language. In the first technique theword is divided into prefix and postfix. Then scoresare calculated on the basis of frequency of prefix,length of prefix, frequency of postfix, and length ofpostfix. The accuracy of this approach is 70.8%. Thesecond technique is based on n-grams. Words areclustered using n-grams. Within the cluster a smallestword is declared as the stem of the word. Thealgorithm gives 65.4% accuracy. In the thirdapproach a successive verity is calculated for eachword’s prefix. This approach increases accuracy to74.5%.Looking at various techniques, they can generallybe divided into rule based or statistical methods. Rulebased methods may require cyclical application ofrules. Stem and/or affix look-ups are needed for therules and may be enhanced by maintaining a lexicon.Statistical stemmers are dependent on corpus size,and their performance is influenced by morphologicalfeatures of a language. Morphologically richerlanguages require deeper linguistic analysis for betterstemming. Three different statistical approaches forstemming Telugu (Kumar and Murthy 2007) wordsreveal very low accuracy as the language is rich inmorphology. On the other hand rule-based techniqueswhen applied to morphologically rich languagesreveal accuracy up to 99.6% (Thabet 2004). Likeother South Asian languages, Urdu is alsomorphologically rich. Therefore, the current workuses a rule based approach with a variation fromlexical look-up, to develop a stemmer for Urdu. Thenext sections discuss the details of development andtesting results of this stemmer.3. Corpus CollectionAn important phase of developing Assas-Band iscorpus collection. For this four different lexica andcorpora3: C1 (Sajjad 2007), C24, C3 (Online UrduDictionary, available at www.crulp.org/oud) and C4(Ijaz and Hussain 2007) are used for analysis andtesting. Furthermore, prefix and postfix lists5 are alsoused during the analysis. The summary of each of theresources is given in table 1.Table 1: Corpora Words StatisticsCorpusTotal No. ,486149,477C419,296,84650,0004. MethodologyThe proposed technique uses some conventionsfor the Urdu stemmer Assas-Band. The stem returnedby this system is the meaningful root e.g. the stem of ں larkiyan (girls) is 5 larki (girl) and not the ک larak (boy/girl-hood; not a surface from). It alsomaintains distinction between the masculine and3Available from CRULP (www.crulp.org)Unpublished, internally developed by CRULP5Internally developed at CRULP42Also see Islam et al. (2007) for Bengali stemming41

The algorithm for Assas-Band is given in Figure1 and explained in more detail below.Prefix Extraction: To remove the prefix fromthe word, first it is checked whether the input wordexists in the Prefix-Global-Exceptional-List (PrGEL).If it exists in PrGEL, then it means that the word hasan initial string of letters which matches a prefix butis part of the stem and thus should not be stripped. Ifthe word does not exist in PrGEL, then prefix ruleslist is looked up. If an applicable prefix is found,starting from longest matching prefix to shorterprefix, appropriate rule is applied to separate prefixfrom stem-postfix. Both parts of the word are retainedfor further processing and output.Postfix Extraction: This process separates thepostfix from word and performs the post-processingstep, if required, for generating the surface form.First the remaining Stem-(Postfix) is looked up ina general Postfix-Global-Exceptional-List (PoGEL).If the word exists in the list, then it is marked as thestem. If the word does not exist in this list, it indicatesthat a possible postfix is attached. Postfix matching isthen performed. The candidate postfix rules aresorted in descending order according to the postfixlength. In addition, a Postfix-Rule-Exception-List(PoREL) is also maintained for each postfix. Thefirst applicable postfix from the list is taken and it ischecked if the word to be stemmed exists in PoREL.If the word does not exist in PoREL, then the currentpostfix rule is applied and the Stem and Postfix areextracted. If the word exists in the PoREL then thecurrent postfix rule is not applied and the next postfixrule is considered. This process is repeated for allcandidate postfix rules, until a rule is applied or thelist is exhausted. In both cases the resultant word ismarked as Stem.A complete list of prefixes and postfixes arederived by analyzing various lexica and corpora (andusing grammar books). In addition, complete ruleexception list for each postfix (PoREL), completegeneral exception list for prefixes PrGEL and generalexception list for postfixes PoGEL are developedusing C1, C2, C3 and C4. PrGEL and PoGEL arealso later extended to include all stems generatedthrough this system.After applying prefix and postfix rules, postprocessing is performed to create the surface form ofthe stem. The stem is looked up in the Add-CharacterLists (ACL). There are only five lists, maintained foreach of the following letter(s): ، ی ، ہ ، ت ، ( ا yay-hay,choti-yah, gol-hay, tay, alif), because only these canbe possibly added. If the stem is listed, thecorresponding letter(s) are appended at the end tofeminine forms of the stem. Assas-Band gives thestemlarka (boy) for word ں larkon (boys) andstem 5 larki (girl) for ں larkiyan (girls). Thereason for maintaining the gender difference is itsusability for other tasks in Urdu, e.g. machinetranslation, automatic diacritization etc. The wordcan easily be converted to underlying stem (e.g. ک larak (boy/girl-hood)), if needed.Assas-Band is trained to work with Urdu words,though it can also process foreign words, e.g.Persian, Arabic and English words, to a limitedextent. Proper nouns are considered stems, thoughonly those are handled which appear in the corpora.Figure 1: Flow Chart for the Stemming ProcessAn Urdu word is composed of a sequence ofprefixes, stem and postfixes. A word can be dividedinto (Prefix)-Stem-(Postfix). Assas-Band extractsStem from the given word, and then converts it tosurface form, as per requirement. The algorithm ofthe system is as follows. First the prefix (if it exists)is removed from the word. This returns the Stem(Postfix) sequence. Then postfix (if it exists) isremoved and Stem is extracted. The post-processingstep (if required) is performed at the end to generatethe surface form.However, while applying affix rules for anyword, the algorithm checks for exceptional cases andapplies the affix stripping rules only if the exceptionalcases are not found. This is different from othermethods which first strip and then repair.42

generate the surface form, else the stem is consideredthe surface form.Though the algorithm is straight forward, to thelists have been developed manually after repeatedanalysis, which has been a very difficult task, asexplained in next section. Some sample words inthese lists are given in the Appendices A and B.removed to give invalid stem 5 ndhay. This singlelist is maintained globally for all prefixes.Development of PoGEL: There are also manywords which do not contain any postfix but their finalfew letters may match with one. If they are notidentified and prevented from postfix removalprocess, they may result in erroneous invalid stems.For example,hathi (elephant) may be truncatedtohath (hand), which is incorrect removal of thepostfix ( ی letter choti-yay). All such words are kept inthe PoGEL, and considered as a stem. This single listis maintained globally for all the postfixes.Rule Exceptional Lists: Candidate postfixes areapplied in descending order of length. For example,for the worda ں bastiyan (towns), the followingpostfixes can be applied: ں tiyan, ں yan, اں aanand ں noon-gunna.First, if the maximal length postfix matches, it isstripped. However, there are cases, when there is amatch, but the current postfix should not be detached(a shorter postfix needs to be detached). In this case apostfix specific list is needed to list the exceptions toensure preventing misapplication of the longerpostfix. For this situation PoREL is maintained foreach postfix separately. So for ں bastiyan(towns), first the maximum length postfix ں tiyan ismatched. However, this creates the stembas (bus)which is incorrect. Thus, ں bastiyan (towns) isstored in the PrREL of ں tiyan. Due to this, thispostfix is not extracted and the next longest postfixrule is applied. Even in this case nonsense stembast is generated. Thus, ں bastiyan (towns) isalso stored in the PrREL of postfix ں yan. Next thepostfix اں an is applied. This yields 5 basti (town),which is correct. This checking and PrRELdevelopment process is manually repeated for all thewords in the corpus.Add Character Lists: During second phase theACLs (already developed in the first phase) areupdated against each of the five possible lettersequences, i.e. ، ہ ، ی ، ت ، ا , to generate correct surfaceforms. For example, when postfix 5 gi is removedfrom 5 ز zindagi (life), it creates the stem ز zind,which is not a surface form. The letter ہ hay has to beappended at the end to produce the correct surfaceform ز ہ zinda (alive). So ز zind is stored in theACL of letter ہ . In the same way the lists aredeveloped and maintained for the five lettersseparately. After applying a particular postfix rule on5. Analysis PhaseThe analysis has been divided into two phases. Firstphase involved the extraction of prefixes andpostfixes.The second phase dealt with thedevelopment of Prefix and Postfix Global ExceptionalLists (PrGEL, PoGEL), Postfix Rule ExceptionalLists (PoREL) and Add Character Lists (ACL).These are discussed here.5.1. Extraction of AffixesC1 and C2 are used for the extraction of affixes.These corpora are POS tagged. The analysis isperformed on 11,000 high frequency words. Thedetails of these corpora are given in Table 1. Bylooking at each word, prefixes and postfixes areextracted. Words may only have a prefix e.g. رت bud-surat (ugly), only a postfix, e.g. رات tasawraat (imaginations), or both prefix and postfix, e.g.5 ا bud-ikhlaq-i (bad manners). After analysis, 40prefixes and 300 postfixes are extracted. This list ismerged with an earlier list of available postfixes andprefixes6. A total of 174 prefixes and 712 postfixesare identified. They are listed in Appendix C. In thisphase, the post-processing rules are also extractedseparately.5.2. Extraction of Exception and Word ListsThe following lists are used to improve theaccuracy of Assas-Band.1. Prefix and Postfix Global Exceptional Lists(PrGEL, PoGEL)2. Postfix Rule Exceptional List (PoREL) for eachpostfix3. Add Character List (ACL) for each letter/sequenceThe second phase of analysis is performed togenerate these lists. This analysis is based on C3.Development of PrGEL: The PrGEL containsall those words from which a prefix cannot beextracted. The list contains words with first fewletters which match a prefix but do not contain thisprefix, e.g. 5 bandh-ay (tied). This word exists inPrGEL to ensure that the prefix ba (with) is not6Internally developed at CRULP43

the word, the result is checked in each ACL. If thestring is found in any of the lists then respectivecharacter is attached at the end.For example, the word آ ؤں aansuon (tears) isfound in the Exceptional List during manual analysis,because the postfix ؤں on was not initially identified.Thus, the algorithm applied the postfix ں n, leavingthe incorrect stem آ ؤ aansuo.This was(obviously) not found in OUD dictionary, so it wasplaced in PoGEL. By manually scanning each of thewords in this list, new postfix was found, whichcreated the correct stem آ aansu (tear). ACL isalso updated by this manual analysis.Instead of manually doing all the work, the process isautomated using an online Urdu dictionary (OUD)(available at www.crulp.org/oud) using the followingalgorithm.1. Take a word from corpus.2. Generate all applicable rules.3. Sort all rules in descending order according to themaximum length of each.4. Extract upper- most rule from the rules list.5. Apply extracted rule on the word. Checkremaining word’s existence in the dictionary.a. If remaining word exists in the dictionary, storethat original word in the respective rule’s StemList and stop the loop.b. Otherwise store original word in the RuleExceptional List of the respective rule and go toStep 4 for the next rule.6. Repeat steps 4 and 5 untila. Stop condition (5a) occurs, orb. All the generated rules have been traversed.7. If termination of the loop is due to step 6b, thenthe word is stored in the Global Exceptional Listwhich is universal for all the rules.8. Repeat step 1-7 for all the words in the corpus.7. TestingThe test results are given in this section.Testing Phase 1: The corpora C1 and C2 are usedwhich have combined 11,339 unique words. Thefollowing table summarizes the testing results.Table 2: Initial Testing ResultsTesting ResultsValuesTotal Number of tested words11339Accurately Stemmed7241Incorrect Stemming4098Accuracy Rate64%The above algorithm is first run for prefixes. Once acomplete manual check is performed on the results,the same algorithm is applied for the postfixes.6. Manual CorrectionsManual inspection is needed to fix the errorsgenerated by the automated system. The stem list ismanually scanned to identify real-word errors, i.e. thestemming is incorrect but results in a valid word. Forexample when ری ri postfix is applied to the word ی tokri (basket), the word ک tok (stop) isobtained which exists in the dictionary but is incorrectstemming. The inspection is also needed to ensurethat the distinction between the masculine andfeminine forms of a word is maintained. As discussedthe gender distinction is kept to ensure better use inother applications.Postfix Rule Exceptional List is scannedmanually to check for any missing entries (in case thelexicon contains incomplete information about aword) or spurious entries (in case a word is not in thelexicon). Similarly, the process is also useful inidentifying additional missing prefixes and postfixes.Inaccurate Add CharacterInaccurate Prefix StrippingInaccurate Postfix StrippingErrors due to Foreign Words27875410062107Number of Times Prefix Rules AppliedCorrectIncorrect1656942714Number of Times Postfix Rules AppliedCorrectIncorrect599049841006Number of Times Character AddedCorrectIncorrect819541278The accuracy of 64% is achieved. Some of thestems created are not in the lists and are erroneous.They are created by invalid prefix/postfix removal.Analysis showed that some prefixes and postfixescontributed to this error rate because they werederived from foreign words transliterated in Urdu. Forexample ز z postfix is correctly applied to the English44

As errors from C1 and C2 have been manuallyfixed, testing is again performed by using 10,418 highfrequency Urdu words from C4 (Ijaz and Hussain2007). The summary of testing results is in Table 3.Table 3 shows that removing foreign languageaffixes improves the results significantly. The prefixerror rate is higher than the postfix error rate. Inaddition, the ACL has to be more comprehensive.There are also some errors because some wordsrequire both prefix and postfix to be extracted, butduring stemming, if the prefix is wrongly applied anda faulty stem is generated, then the postfix is alsoincorrectly applied.Testing Phase III: After analyzing test results ofthe second phase, amendments are made in thealgorithm. Following post-processing, the stemgenerated is verified in PoGEL. If it does not exist, itis assumed that wrong rule is applied and thus it isskipped and the next rule is applied. This is repeateduntil the resulting stem is found in PoGEL. Byimplementing this methodology, the accuracy isenhanced from 90.96% to 91.18% for C4 corpusbased word list as shown in Table 4.wordladiez (ladies) yielding the stem ی ladie(lady). But this ز z postfix rule when applied to Urduwords increases the error rate. Similarly Arabic prefix ال al (the), which applies to Arabic words correctlye.g. ا آن al-Quran (the Quran), wrongly applies toUrdu words.Another reason for error in stemming isineffective post-processing due to insufficient wordsin the lists. There are also some other sources oferrors which are not directly associated withstemming but are common for Urdu corpora. Errorsare caused by spelling errors, including spacecharacter related errors (Naseem and Hussain 2007).There are also encoding normalization issues, whichneed to be corrected before string matching. This iscaused by the variation in keyboards.Testing Phase II: On the basis of previous resultanalysis, prefix and postfix rules which are applicableto only foreign words are removed from the rule lists.Such rules create errors in Urdu word stemming,while trying to cater non-essential task of stemmingtransliterated foreign words. The foreign words foundin C1 and C2 are stored in global lists i.e. PrGEL andPoGEL to ensure that they are not processed.Table 4: Test Results after Enhancing AlgorithmTesting ResultsValuesTotal Number of tested words10418Accurately Stemmed9499Incorrect Stemming919Accuracy Rate91.18%Table 3: Test Results after Removing ForeignPrefixes and Postfixes RulesTesting ResultsValuesTotal Number of tested words10418Accurately Stemmed9476Incorrect Stemming942Accuracy Rate90.96%Inaccurate Add CharacterInaccurate Prefix StrippingInaccurate Postfix StrippingErrors due to Foreign Words354734690Number of Times Prefix RulesAppliedCorrectIncorrect660187473Number of Times Postfix RulesAppliedCorrectIncorrect34452976469Number of Times Character AddedCorrectIncorrectInaccurate Add CharacterInaccurate Prefix StrippingInaccurate Postfix StrippingErrors due to Foreign Words354734460Number of Times Prefix Rules AppliedCorrectIncorrect660187473Number of Times Postfix Rules AppliedCorrectIncorrectNumber of Times Character AddedCorrectIncorrect3445299944662659135The methodology does not affect prefix removal andthe process of adding characters. The improvementmade by this methodology is only in the accuracy of6265913545

Kumar, M. S. and Murthy, K. N. 2007. Corpus BasedStatistical Approach for Stemming Telugu. Creation ofLexical Resources for Indian Language Computing andProcessing (LRIL), C-DAC, Mumbai, India.postfixes because this modification is only performedon the second phase i.e. extraction of postfixes.8. ConclusionThe current paper presents work performed todevelop an Urdu stemmer. It first removes the prefix,then the postfix and then adds letter(s) to generate thesurface form of the stem. In the first two steps it usesexception lists if a prefix and/or postfix can beapplied. A successful lookup bypasses the strippingprocess. This is different from lexical or stem lookup in other work which triggers the stripping process.The current stemming accuracy can be furtherimproved by making the lists more comprehensive.ACL should also be maintained against each postfixfor more accuracy.The developed system iscurrently being used for various other applications forUrdu language processing, including automaticdiacritization.Paik, J. H. and Parui, S. K. 2008. A Simple Stemmer forInflectional Languages. Forum for Information RetrievalEvaluation,Islam, M. Z., Uddin, M. N. and Khan, M. 2007. A LightWeight Stemmer for Bengali and Its Use in SpellingChecker. In the Proceedings of 1st Intl. Conf. on DigitalComm. and Computer, Amman, Jordan.Sharifloo, A. A. and Shamsfard, M. 2008. A Bottom upApproach to Persian Stemming. In the Proceedings ofthe Third International Joint Conference on NaturalLanguage Processing. Hyderabad, India.Kumar, A. and Siddiqui, T. 2008. An Unsupervised HindiStemmer with Heuristics Improvements. In Proceedingsof the Second Workshop on Analytics for NoisyUnstructured Text Data.AcknowledgementsThe work has been partially supported by PANLocalization Project grant by InternationalDevelopment Research Center (IDRC) of Canada.ReferencesCroft, W. B. and Xu, J. 1995. Corpus-Specific Stemmingusing Word Form Co-occurrences. In Fourth AnnualSymposium on Document Analysis and InformationRetrieval.Krovetz, R. 1993. View Morphology as an InferenceProcess. In the Proceedings of 5th InternationalConference on Research and Development inInformation Retrieval.Porter, M. 1980. An Algorithm for Suffix Stripping.Program, 14(3): 130-137.Thabet, N. 2004. Stemming the Qur’an. In the Proceedingsof the Workshop on Computational Approaches toArabic Script-based Languages.Hussain, Sara. 2004. Finite-State Morphological Analyzerfor Urdu. Unpublished MS thesis, Center for Researchin Urdu Language Processing, National University ofComputer and Emerging Sciences, Pakistan.Sajjad, H. 2007. Statistical Part-of-Speech for Urdu.Unpublished MS Thesis, Center for Research in UrduLanguage Processing, National University of Computerand Emerging Sciences, Pakistan.Ijaz, M and Hussain, S. 2007. Corpus Based Urdu LexiconDevelopment. In the Proceedings of Conference onLanguage Technology (CLT07), Pakistan.Naseem, T., Hussain, S. 2007. Spelling Error Trends inUrdu. In the Proceedings of Conference on LanguageTechnology (CLT07), Pakistan.46

Appendix AC.2 List of Sample Postfixesa ں آ a ا ں a 5A.1 Postfix Rule Exceptional List SamplesPostfixa ت a5 ا a5aSome Exceptional Wordsa ت a5 , 5 رو , 5 ا , 5 ا a5 , 5 ا , 5, 5 ا , 5 aa , 5 a , 5 aa آ a ا , ا ,, ا ,, ا ا a وی aa ب aaaa ر a ذ a وی a5a5 ر aa5aa 5 ا a ں a5a ب a دار a ور ں a ور ں a راوی a ا ِان a آ ب ی Adda 5 a a ی a a ق aaaaa ر aa ا a ان aa ڈو aa رہ a ل a زود aa 5 ا a ک aaa آ aa ز a5aaaaa زی a اۓ aa ا ر aaa آن aaa دم a آ aa5 ا a د a ق aaa رو a 5 آ a آرام a ز aa ادا a ا a ا a5 رو aa ام a اں a5a ڈی a دل 47a وا a ا a ی a5a 5 دا a 5 و a وری a5 ا a ا a ی ر a از ں a5 وا a از a ں a5a ا وز a ں a5a ا وزوں a ں a ے a ں a aa ں a ں a5 aa وے ا a5 ا a ر ا a 5 ا a ز 5 آ aa ز a 5 و a5a واں a ر ں a د a5a5 ر a را ں C.1 List of Sample Prefixesaaaaa ز a اؤں Appendix Ca ر a ے a ں a آز دہ a a ہ a a آز د a ا a a ہ a a ا a ا دہ a a ہ a a ا د a آز دہ a a ہ a a آز د aa5 ار a ں a ورز ں ہ Adda5a آرا ں a ں a دت a a ت a a د aa a ت a a ش a ُ a a ت a a ُّّa ت a a ت a aa 5 آرا a5a ا ں a ا a a ا a a ا a ا ا aa aa ا a a ا a ا a a ا a a ا a aa a ا a aa و د a اؤں a ں Add Character List Samples ا Add ت Adda زی a ز ں a ں Appendix Ba 5 ا a5 ا a از ں a ں aa5a5a ا aa ں a راں A.3 Prefix Global Exception List Samplesaa ں a ی a دار ں A.2 Postfix Global Exception List Samplesaa ن a را aa وا رز a5a 5 د آ a5a 5 ا a5a 5 aaa داری a5a 5 ا a آ ز a آزاری a ری aa دی a5a از a راز a داز a 5 و a ری a ی a5a 5a آ a5

Urdu is rich in both inflectional and derivational morphology. Urdu verbs inflect to show agreement for number, gender, respect and case. In addition to these factors, verbs in Urdu also have different inflections for infinitive, past, non-past, habitual and imperative forms. All these forms (twenty in total)

Related Documents:

Cope & Stick - 500 Series Veneer Doors Miter Frame - 500 Series. Style Band 1 Band 2 Band 3 Band 4 Band 5 101 Glass 10 12 13 16 18 Material Band 1 Band 2 Band 3 Band 4 Band 5 102/104 Glass 12 14 15 18 20 Paint X Notes - PRICES ARE PER SQ/FT Maple Paint X Minimum Dimensions - Doors 7"W x 7"H Natural Soft Maple X

Affairs Group 75: Records of the Osage Agency - Annuity Payment Rolls, 1880-1907 (Roll 1 of 21) Osage Annuity Payment Roll: 1 st and 2nd Quarters of 1880 o Big Chief Band o Joe's Band o Big Hill Band o White Hair Band o Tall Chief Band o Black Dog Band o Saucy Chief Band o Beaver Band o Strike Axe Band o No-Pa-Walla Band

Le Pôle Langues de l’Université de Paris 2 Panthéon-Assas propose des TD facultatifs de langue et de civilisation espagnoles aux étudiants de toutes les filières pour les trois années de Licence, ainsi qu’en Master. Cette formation permet d’obtenir des points dans le cadre de l’UEC2. Pas de cours niveaux débutant.

CEFR level: IELTS band: C1 IELTS band: 8 IELTS band: 7.5 IELTS band: 7 B2 IELTS band: 6.5 IELTS band: 6 IELTS band: 5.5 IELTS : 4.5 IELTS band: 4 IELTS band: 5 978-X-XXX-XXXXX-X Author Title C M Cullen, French and Jakeman Y K Pantoene XXX STUDENT'S BOOK with DVD-ROM WITH ANSWERS B1-C1 The

WIKUS BAND SAW BLADES Bimetal band saw blades Diamond coated band saw blades Carbide band saw blades Carbon steel band saw blades Sales units: coils in fixed lengths and manufacturing coils up to 120 m, depending on the band width, welded-to-length band saw blades Band widths: 4 to 125 mm Constant tooth pitches: 0,75 to 18 teeth per inch (tpi)

TD-HSDPA/HSUPA: 2.8Mbps DL, 2.2Mbps UL EDGE: Multi Slot Class 12 236.8 kbps DL & UL GPRS: Multi Slot Class 10 85.6 kbps DL & UL Frequency Bands: LTE Band B1 (2100MHz) LTE Band B2 (1900MHz) LTE Band B3 (1800MHz) LTE Band B4 - AWS (1700MHz), LTE Band B5 (850MHz), LTE Band B7 (2600MHz) LTE Band B8 (900MHz) LTE Band B12 (700MHz) LTE

Mar 20, 2006 · B-BAND A3T SIDEMOUNT PREAMP WITH B-BAND UST OR AST TRANSDUCER This is a basic installation manual and tip sheet. For more information, technical support, and pictures of installations about all B-Band products please check the B-Band website at www.b-band.com or contact your B-Band dealer, distributor or B-Band directly. Date: 5th May 2007 .File Size: 435KBPage Count: 16

STM32 and ultra‑low‑power. 4 9 product series – more than 40 product lines . proliferation of hardware IPs and higher‑level programming languages greatly facilitates the work of developers. High‑ performance Cortex‑M STM32 F7 Ultra‑ low‑power Mainstream Cortex‑M3 STM32 F2 STM32 L1 STM32 F1 Cortex‑M STM32 F4 STM32 L4 STM32 F3 Cortex‑M M STM32 L0 STM32 F0 STM32 H7 ST .