Automatic Gazetteer Enrichment With User-geocoded Data

2y ago
8 Views
2 Downloads
828.68 KB
8 Pages
Last View : 27d ago
Last Download : 3m ago
Upload by : Pierre Damon
Transcription

Judith Gelernter
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213 U.S.A.

Gautam Ganesh
Engineering & Computer Science
University of Texas at Dallas
Richardson, TX 75080 U.S.A.

Hamsini Krishnakumar
College of Engineering, Guindy
Anna University
Chennai 600025 India

Wei Zhang
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213 U.S.A.
weizhan1@cs.cmu.edu

ABSTRACT
Geographical knowledge resources or gazetteers that are enriched with local information have the potential to add geographic precision to information retrieval. We have identified sources of novel local gazetteer entries in crowd-sourced OpenStreetMap and Wikimapia geotags that include geo-coordinates. We created a fuzzy match algorithm using machine learning (SVM) that checks both for approximate spelling and approximate geocoding in order to find duplicates between the crowd-sourced tags and the gazetteer, in an effort to absorb those tags that are novel. For each crowd-sourced tag, our algorithm generates candidate matches from the gazetteer and then ranks those candidates based on word form or geographical relations between each tag and gazetteer candidate. We compared a baseline of edit distance for candidate ranking to an SVM-trained candidate ranking model on a city-level location tag match task. Experiment results show that the SVM greatly outperforms the baseline.

Categories and Subject Descriptors
D.2.12 [Interoperability]: Data mapping

General Terms
Algorithms

Keywords
Geographic information retrieval (GIR), gazetteer enrichment, gazetteer expansion, approximate string match, fuzzy match, location, geo-tag

1. INTRODUCTION
Data mining of local place names may be aided by a geo-knowledge resource or gazetteer that includes neighborhoods and landmarks. GeoNames is our core gazetteer for being one of the most complete available, but it is not rich in local entries. This research contributes to enriching a gazetteer. Our research considers how we can automatically create a geo-resource that is more complete on the local level.

Our research questions
a. How do different sources of local geographic information compare? We considered OpenStreetMap and Wikimapia.
b. Can we compare mined local geographic information to the gazetteer such as to allow approximate matching in name spelling and in spatial coordinates?

Others' approach: sources of local entries
Fundamentally, gazetteer entries contain a place name and a geographic footprint.¹ A number of sources have been used for gazetteer entries, including paper maps, systematic on-site collection, data mining of government, postal or tourist websites, and volunteered geographic information sites.

Authoritative sources of toponyms and corresponding coordinates are atlas indexes, national gazetteers, and paper maps that have been digitized. But scanning from digitized map graphics is complicated by image compression, and in many maps, text labels overlap with each other or with other map features and so are unreadable [6]. Nonetheless, substantial work has been done in what is called georeferencing legacy content from maps, to create crosswalks between cartographic and digital media. The National Geographic Maps' database is one such effort [4].

Toponyms have been collected manually by organizations such as the United Nations Group of Experts on Geographical Names, and systematically at local scale by Courage Services, a U.S. company that provides cultural and geographic research.

¹ Additional attributes per entry might be alternate or colloquial names in the original language (endonyms), the same name in other languages (exonyms), place population, and spatial hierarchy.
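To make that entry structure concrete, the following is a minimal sketch of such a record in Python. It is our own illustration, not a GeoNames schema; the field names simply follow the attributes listed in footnote 1.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class GazetteerEntry:
        # Core of an entry: a toponym plus a geographic footprint (here a point).
        name: str
        latitude: float
        longitude: float
        # Optional attributes, per footnote 1 (illustrative, not a real schema).
        endonyms: List[str] = field(default_factory=list)   # local-language names
        exonyms: List[str] = field(default_factory=list)    # names in other languages
        population: Optional[int] = None
        hierarchy: List[str] = field(default_factory=list)  # e.g. [country, state, city]

    entry = GazetteerEntry("Perambur", 13.11, 80.24,
                           hierarchy=["India", "Tamil Nadu", "Chennai"])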

Volunteered Geographic Information is a promising vernacular source of geographic information, in contrast to the more authoritative, official sources [12], [9], [14]. Many crowd-shared resources such as Wikipedia, Wikimapia and YouTube include geographical metadata. Souza et al. collected local references for Brazil from a national mapping agency [30], the postal company and some local government institutions, but even that was insufficient, so they included tourist websites. Others have mined places with their geographical extent from web pages [3], or mined places and determined their geographical extent from references such as postal code [32], or through a specially made tool such as Jeocrowd to search user-generated data sets [19]. Others have asked users to help generate place names by setting up a separate web platform and asking people to add place names directly [31], or asking users to photograph every kilometer of the earth's surface and mining the place name tags indirectly, as in the Geograph project.³

We restrict our attention to resources that include geographical coordinates as well as place names, harvesting example tags for Chennai, India from OpenStreetMap and Wikimapia.⁴

Others' approach: gazetteer creation and merge
Beard [2] outlines characteristics of a gazetteer based on Volunteered Geographic Information. Peng et al. [24] describe an architecture for a digital gazetteer that accepts place names from Web 2.0 sources. Kessler, Janowicz, Bishr et al. [18] proposed that the next-generation gazetteer be built on a "wiki" model so that all entries are user-generated. Their suggestions for a next-generation gazetteer infrastructure, such as the ability to harvest place names in blog posts and the ability to align new data with existing data, are appealing. That is in essence the infrastructure we have built, although we prefer to retain the existing gazetteer with core toponyms hand-generated by experts.

Current methods build or augment gazetteers automatically by mining the web [26], [11], [8]. Others have experimented with gazetteer building by using social network sources [17], [20]. Work has even been done in the field of geo-tagged social media [28]. Our aim is not to try to mine a large number of geographical information sources, as did [25]. Nor is the aim of this research to build a comprehensive gazetteer resource, as did [22] and [1]. Our aim is to extract and compare the place names with a gazetteer, as did [19]. We determine the novelty of each entry for the purpose of integration into a gazetteer, as did [17], although the Kessler team did not consider the spatial precision of each extracted geo-tag. Gazetteer enrichment and evaluation work has been performed by [26], [25].

Others' approach: gazetteer merge and fuzzy match
We enrich an existing gazetteer rather than build our own, so as to retain the core, hand-generated, high-quality entries. Our research is therefore also in distinguishing which mined entries are the same as and which are different from the current gazetteer. Satisfaction of a match between gazetteer entry and toponym has leaned on name spelling, feature type (park, plaza, etc.) and spatial relationship [33], as well as semantic relation similarity and temporal footprint [23].

Our fuzzy match algorithm uses spatial constraints and allows some semantic ambiguity, as does the lingual location search algorithm of [15].

³ www.geograph.org.uk
⁴ We started with Flickr, too, but the Yahoo-owned Flickr uses the geo-database from GeoPlanet rather than raw names entered by users.
The JRC Fuzzy Gazetteer is fuzzy in semantic search [16]. How to merge newly-found information with existing information has been considered by [23], [29], [5]. Examples of merge problems are that the same place may have entirely different names, or different accepted spellings, or spatial relations between places might be unclear [21].

Martins [23] has the same objective of duplicate detection for a gazetteer as we do, and also uses SVM as well as alternating decision trees. His accuracy is quite high. Our data sets are different, however, so comparing our results would be misleading. Our method differs from Martins' in that we automatically generate pairs, while Martins has manually generated pairs and classifies them automatically.

2. USER GEO-CODED DATA
We opted for mining toponyms and coordinates from Web 2.0 mapping applications for efficiency, since tags there are clustered, and for practicality. OpenStreetMap, which started in London, was evaluated in London [13] and found to be within about 6 m of the position recorded by the Ordnance Survey, which is considered to have higher overall quality.

That the tags are added to continually, and occasionally updated by others in applications such as OpenStreetMap, makes them useful despite the lack of an authoritative sponsor. The volume of data may compensate for the reliability that we could count on from a more authoritative source, such as a map issued by a federal government. That more people have reviewed the information adds to reliability, a principle that has been called Linus's Law, after Linus Torvalds, on the principle of open source software [7].

We extract place names that have come with tags as supplied for OpenStreetMap and Wikimapia. OpenStreetMap is a platform for contributing and viewing geographic information. Wikimapia is a multilingual, open-content resource where users drop place names with descriptions, proof links and an optional photograph. Table 1 compares properties of the two applications with GeoNames.

Table 1: Properties of each Web 2.0 application compared with GeoNames

                        OpenStreetMap   Wikimapia      GeoNames
  Language per entry    single          multilingual   multilingual
  Spatial hierarchy     no              no             yes

Tag quality
We are more assured of tag quality when the tags are duplicated between geographic applications. Table 2 gives example tags associated with Chennai.

Tag geographic coordinates
Schockaert [27] allows that spatial boundaries for some place names may be vague. Tag coordinates from our OpenStreetMap and Wikimapia data might be associated with a spot that is not necessarily that region's geographical center. Moreover, the user interfaces of these two applications also introduce imprecision. The input mode for geographic coordinates in OpenStreetMap and Wikimapia is similar in that both provide a map base on which the user marks places.

Table 2: Examples of tag types in Web 2.0 sources

  Tag characteristic               Wikimapia or OpenStreetMap tags
  Alternate languages              Madhavaram Taluk, Thiruvallur District - மாதவரம் வட்டம், திருவள்ளூர் மாவட்டம்
  Alternate levels of specificity  Loyola College Campus; Loyola College
  Formal and colloquial names      Kathipara Flyover or Nehru Circle
  Name only                        CIT Colony; Kumaran Nagar
  Entire address                   Kumaran Nagar 1st Street, GKM colony, CH-600082
  Alternate and shortened forms    Annanagar; Anna Nagar (nagar = neighborhood)
  Noise                            danny thesis

Geographical accuracy
Accuracy is the degree to which information matches true or accepted values. Would a map made with toponyms mined from OpenStreetMap and Wikimapia be accurate? For accepted values, we used Google Maps rather than GeoNames. This was the result of a conversation with Marc Wick, the founder of GeoNames.⁵ To verify that GoogleMaps was complete enough for an experiment, we compared a random sample of 100 Chennai tags from OpenStreetMap and Wikimapia to entries in Google Maps.⁶ We found that 97% were in Google Maps.

To test for geographical accuracy, we randomly selected 100 valid Chennai tags from OpenStreetMap and Wikimapia and compared their coordinates to those in GoogleMaps and also to GeoNames. Surprisingly, we find that not only does OpenStreetMap have a much higher accuracy than does Wikimapia, it also has a higher accuracy than GeoNames in comparison to the Google Maps geo-coordinates for those same places (Fig. 1).

[Figure 1: Average geo-error for 100 random tags compared to GoogleMaps. The chart shows the average N/S and E/W error in feet for 100 randomly-chosen tags from each of OpenStreetMap, Wikimapia, and the GeoNames gazetteer, measured against GoogleMaps coordinates.]

Are the OpenStreetMap coordinates accurate enough for a city-scale map? At 1:24,000 scale, which is city scale, 1/50th of an inch is 40 feet (12.2 meters), which is considered acceptable accuracy.⁷ The Fig. 1 results show that none of the sources, not even OpenStreetMap, have city-scale accuracy if GoogleMaps is considered the gold standard.

An alternative to obtain more precise locations would be to mine addresses with street names, cities and zip codes, as did [11]. However, these addresses will not be rich in vernacular names, as are the sources we use for our study.

⁵ Marc Wick from GeoNames, email to Gelernter, May 21, 2013.
⁶ Chennai, India; Chiclayo, Peru; Alexandria, Egypt.
⁷ United States Geological Survey, Map Accuracy Standards Fact Sheet, 1999.
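For reference, per-tag N/S and E/W error components of the kind plotted in Fig. 1 can be computed from coordinate differences. The following is a rough sketch of our own (not the authors' script), using the common approximation that one degree of latitude spans about 364,000 feet and that a degree of longitude shrinks with the cosine of the latitude:

    import math

    FEET_PER_DEG_LAT = 364_000  # roughly 69 miles per degree of latitude

    def ns_ew_error_feet(lat1, lon1, lat2, lon2):
        """N/S and E/W offset in feet between a tag and a reference coordinate."""
        ns = abs(lat2 - lat1) * FEET_PER_DEG_LAT
        # A degree of longitude spans less ground away from the equator.
        mean_lat = math.radians((lat1 + lat2) / 2)
        ew = abs(lon2 - lon1) * FEET_PER_DEG_LAT * math.cos(mean_lat)
        return ns, ew

    # Example: the Perambur tag vs. its GeoNames entry (coordinates from Table 4)
    ns, ew = ns_ew_error_feet(13.10, 80.24, 13.11, 80.24)
    print(f"N/S error: {ns:.0f} ft, E/W error: {ew:.0f} ft")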
3. METHOD
3.1 Fuzzy-duplicate detect method and data
The objective of the algorithm is to determine duplication among place name, geo-coordinate pairs. Our proposed method finds duplicates among the crowd-sourced data first. Duplicates here are confirmation of reliability, and help to reduce noise. Our method then looks for duplicates between the crowd-sourced data and the gazetteer to determine which entries are novel. The algorithm accounts for the expected user-generated variety in spelling and imprecision in geo-coding in determining whether any two entries match.

3.2 Fuzzy duplicate detect architecture
Fig. 2 charts our experimental system for fuzzy duplicate detection. The algorithm uses Lucene⁸ to index the gazetteer. We created the features manually, although statistical feature selection methods such as Lasso regression could be used instead.

Before constructing the query, we check the synonym set to see whether similar entries can be made. Then a query is made from the original word form, its bigrams, trigrams, and its geo-coordinates. A related query is generated from the word with any synonyms. The weights for the features are determined by a Support Vector Machine process which uses a Support Vector Regression method.⁹ This is labeled in Fig. 2 as "learning weights for the query".

Two separate post-processing pipelines in Fig. 2 are labeled "baseline" and "advanced". The baseline pipeline is language-independent. Even for languages in which there is no training data, and which therefore cannot use the advanced pipeline, we will be able to get reasonable results from the baseline.

⁸ http://lucene.apache.org
⁹ We used the LibSVM package, accessed in August 2013 at http://www.csie.ntu.edu.tw/~cjlin/libsvm
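As a sketch of this query construction (our own reconstruction; the paper does not show its actual Lucene query syntax), each geo-tag expands into word, bigram, and trigram terms plus its coordinates, and synonym substitutions yield the related queries:

    def char_ngrams(text, n):
        """Character n-grams of each token, preserving character order."""
        grams = []
        for token in text.split():
            grams += [token[i:i + n] for i in range(len(token) - n + 1)]
        return grams

    def build_queries(tag, lat, lon, synonyms=None):
        """One query per word-form variant: full words, bigrams, trigrams,
        plus the tag's geo-coordinates to constrain the search region."""
        variants = [tag] + [
            tag.replace(word, syn)
            for word, syns in (synonyms or {}).items() if word in tag
            for syn in syns
        ]
        return [{
            "words": variant.split(),
            "bigrams": char_ngrams(variant, 2),
            "trigrams": char_ngrams(variant, 3),
            "geo": (lat, lon),
        } for variant in variants]

    # "synonyms" here is a hypothetical hand-built dictionary standing in for
    # the paper's manually-generated knowledge base (e.g. brook ~ stream).
    queries = build_queries("adambakkam brook", 12.99, 80.20,
                            synonyms={"brook": ["stream"]})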

The output of the baseline is categorized based on the confidence value generated: match (high confidence), guess match (fairly high confidence), similar (some confidence) and no match (low confidence, with or without containment). The confidence values lie in [0, 1]; the thresholds defining the categories were set by experimentation and are somewhat data-dependent. Matches that the baseline places in the "similar" category are given to a human curator to judge. The output of the advanced SVM method is simply match or no match; those geo-tags which do not match the gazetteer become novel entries.

[Figure 2: Chart of major steps in our experimental procedure to arrive at a fuzzy match algorithm.]

3.3 Overview of the procedure
1. We pre-process the data before running the algorithm. Pre-processing consists of changing all characters in the data and also in the gazetteer to lower case, removing punctuation, and de-accenting the characters. Then we tokenize, and run the match algorithm over the data. The algorithm originated in a mis-spell algorithm which aims to find the best candidate for a mis-spelled word (described in [10]).
2. We added the ability to consider matches only within a certain geographic range (here, we used a -0.5 to +0.5 degree window in both latitude and longitude). This could be refined later by adding the latitude of the city, and also the population density (so that a rural area might have a wider buffer, for example).
3. We ran the fuzzy match algorithm over the extracted Web 2.0 data, with GeoNames as the gazetteer lookup.
   a. We generated exact matches if the word matched a candidate in the gazetteer exactly.
   b. We generated partial string matches (example: Adambakkam Police Station in the Web 2.0 data is a partial match with Adambakkam in GeoNames). If the relationship shows that one is a part of the other, we output "containment". These contained places do not match any entry in the gazetteer.
   c. A manually-generated knowledge base helps reduce the mis-matches caused by synonyms that are semantically related but formally different, such as brook and stream. If a word in the entry is contained in the synonym dictionary, we use each synonym word to form a different query for the entry.
   d. We constructed a weighted feature query to search the gazetteer. The weights are generated by training an SVM on the tagged data. (The tagged data consist of string pairs and match or non-match tags.) The features used to train the SVM are consistent with the features used to construct the query: bigram, trigram and complete word form.
   e. Candidates are generated by selecting the top 10 results returned by the Lucene ranking algorithm, which uses an optimized tf-idf ranking. We experimented with the top 3 results, but decided to use the top 10 as candidates to increase recall. This step acts as a pre-selection for potential matches, which is necessary because there are too many gazetteer entries to examine with the classifier.
   f. We use two methods to rank the gazetteer candidates: (1) edit distance as the baseline method, and (2) candidate re-ranking with SVM.
      (1) Our baseline method re-ranks match candidates using pure string similarity:
          similarity = 1 - (edit distance / target string length)
          The algorithm outputs only those candidates within a threshold, to reduce the load for manual curation (candidates marked "similar").
      (2) For the SVM candidate re-ranking method, we check whether each query tag and gazetteer candidate produced by the baseline is an actual match, by classifying each query-candidate pair as match or no-match with a probability given by the SVM. We rank the match candidates by that probability in descending order, and output only the one with the highest probability.
   g. We attach words from the geo-spatial hierarchy (example: province, country) to entries in preparation for inclusion in the gazetteer.
   h. We generate a new ID if the entry does not yet exist in GeoNames.
4. Manually determine for the baseline method whether tags in the "Similar" category are actually matches; the SVM does not need this step.¹⁰
5. For all manually-verified novel entries, add Country, Province/State, City (from the bounding box used to extract the data). All instances of containment count as new entries if the entry is more specific than an existing gazetteer candidate.

¹⁰ For the baseline, "Match" is an exact match, "Guess match" is confidence x >= 85%, "Similar" is 70% <= x < 85%, and "No match" is x < 70%.
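The baseline can be sketched as follows. This is our reconstruction, not the authors' code: a standard dynamic-programming Levenshtein distance, the similarity formula from step f, and the confidence bands from footnote 10.

    def edit_distance(a, b):
        """Levenshtein distance by dynamic programming."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def baseline_confidence(tag, candidate):
        """Similarity = 1 - edit distance / target string length, in [0, 1]."""
        return max(0.0, 1 - edit_distance(tag, candidate) / max(len(candidate), 1))

    def categorize(conf):
        """Confidence bands from footnote 10 (set experimentally)."""
        if conf == 1.0:
            return "match"
        if conf >= 0.85:
            return "guess match"
        if conf >= 0.70:
            return "similar"        # forwarded to a human curator
        return "no match"

    print(categorize(baseline_confidence("pulianthope", "puliantope")))

On the Pulianthope/Puliantope pair from Table 4, one deletion against a ten-character target gives a confidence of 0.9, which falls in the guess-match band.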

3.4 SVM for query weighting and candidate re-ranking
Both the query weighting step and the candidate re-ranking step require SVM training. These steps correspond to the two left-most sub-processes in Figure 2. The query weighting step uses SVM to help determine the weights of the query features (see section 3.3, step 3d), whereas candidate re-ranking uses a wider feature set to re-rank the candidates using classification.

SVM features for query weighting
The query is the extracted geo-tag. The features for query weighting reflect characteristics of those geo-tags:

F1: word-level similarity. This represents the number of words that are matched in both the geo-tag and the candidate, divided by the number of words in the geo-tag.
F2: the proportion of matched bigrams, divided by the number of bigrams in the geo-tag.
F3: the proportion of matched trigrams, divided by the number of trigrams in the geo-tag.

Each of these features aggregates numerous terms in the feature vector for the query. To determine weights for the terms, we borrowed the notion of the "long query problem" from ad hoc information retrieval. One way to solve that problem is to learn the importance of each term for a specific query term vector. In our problem, however, we want to figure out the weight for each group of terms instead of for a specific single word, bigram or trigram. So we learn through the SVM the weights of F1, F2 and F3, which helps determine the weights for words, bigrams, and trigrams.
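A sketch of these three features as we read their definitions (our own code, not the authors'; n-grams are computed per token, as in the query-construction sketch above):

    def ngrams(text, n):
        """Character n-grams computed per token."""
        return [tok[i:i + n] for tok in text.split()
                for i in range(len(tok) - n + 1)]

    def overlap(query_items, cand_items):
        """Proportion of the geo-tag's items also present in the candidate."""
        if not query_items:
            return 0.0
        cand = set(cand_items)
        return sum(item in cand for item in query_items) / len(query_items)

    def query_weighting_features(tag, candidate):
        """F1: word-level, F2: bigram, F3: trigram similarity,
        each normalized by the geo-tag's own item count."""
        return {
            "F1": overlap(tag.split(), candidate.split()),
            "F2": overlap(ngrams(tag, 2), ngrams(candidate, 2)),
            "F3": overlap(ngrams(tag, 3), ngrams(candidate, 3)),
        }

    print(query_weighting_features("perambur loco works", "perambur locoworks"))

On the Perambur Loco Works / Perambur Locoworks pair of Table 4, F1 is only 1/3 while F2 and F3 are 1.0: the n-gram features tolerate the missing space that defeats whole-word matching.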
The training data include 160 location phrases selected randomly from the Chennai OpenStreetMap and Wikimapia data. We used the pipeline on the right-hand side of Figure 2 (without SVM) to generate the candidates, and manually tagged each pair as match or not. We train the SVM regression model on this data to figure out the coefficients for F1, F2 and F3.

Table 3: Weights learned with SVM regression in LibSVM, with linear kernel, C = 0.1

  Feature   Coefficient
  F1        0.124
  F2        0.607
  F3        -0.614

The coefficients are used to generate weights in the query (see Table 3). The coefficients show that the bigram feature is most helpful in finding a match between candidate term and gazetteer term. The bigram works better than whole-word or trigram matching because it is more tolerant of spelling errors, while it preserves the character order.

SVM for candidate re-ranking
We use the query weighting features along with these additional features to train the SVM model for re-ranking the gazetteer candidates generated from step (f) in section 3.3:

F1, F2, F3: as above.
F4: head-matching proportion of geo-tag and candidate. This treats matching as a string between letters at the beginning of the geo-tag and the beginning of the gazetteer candidate, and finds the longest string match counting from the first character.
F5: average head-matching proportion of geo-tag and candidate. Here we tokenize a geo-tag phrase into words first, and then for each word we find the longest head-matching length among all the words in the candidate, and normalize by word length. Finally we take the average of the normalized head-matching scores.
F6: normalized geographical distance between geo-tag and candidate. The relevant distance is in the range [0, 1]. We normalize by 0.5 * sqrt(2) because we use a bounding box as the geographic range, so the longest possible distance between geo-tag and candidate is not 0.5 but the distance to a corner point of the square.
F7: edit distance between geo-tag and candidate, which is identical to the baseline measurement.
F8: containment, where one entry is contained within another entry.
F9: Soundex code match. Soundex is an algorithm that uses phonetics (sound in speech) to aid in matching. F9 is used to address the vowel mismatch problem. In Indian English, many places use "oo" instead of "u", and "th" instead of "tt". We use Soundex to map those into the same numerical value, to find the equivalence of the sounds.
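Several of the re-ranking features can be sketched as follows. This is again our reconstruction; the Soundex shown is the standard four-character variant, which we assume is close to the one used. F7 reuses the edit distance from the baseline sketch and is omitted here.

    import math

    def head_match(a, b):
        """F4: longest common prefix, normalized by the geo-tag's length."""
        n = 0
        for ca, cb in zip(a, b):
            if ca != cb:
                break
            n += 1
        return n / max(len(a), 1)

    def avg_head_match(tag, candidate):
        """F5: per-word best head match against candidate words, averaged."""
        words, cands = tag.split(), candidate.split() or [""]
        if not words:
            return 0.0
        return sum(max(head_match(w, c) for c in cands) for w in words) / len(words)

    def norm_geo_distance(p, q):
        """F6: Euclidean degree distance over 0.5 * sqrt(2), the farthest a
        candidate can lie inside the +/-0.5 degree bounding box."""
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        return min(d / (0.5 * math.sqrt(2)), 1.0)

    def soundex(word):
        """F9 helper: classic Soundex, so e.g. 'oo'/'u' variants collapse."""
        codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
                 **dict.fromkeys("dt", "3"), "l": "4",
                 **dict.fromkeys("mn", "5"), "r": "6"}
        word = word.lower()
        out, last = word[0].upper(), codes.get(word[0], "")
        for ch in word[1:]:
            code = codes.get(ch, "")
            if code and code != last:
                out += code
            if ch not in "hw":          # h and w do not break a run of one code
                last = code
        return (out + "000")[:4]

    print(soundex("pulianthope"), soundex("puliantope"))  # same code, so F9 fires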

3.5 Evaluation
For the baseline method, we randomly selected 200 geo-tags from OpenStreetMap and Wikimapia. Each entry consists of a location word or phrase plus latitude and longitude, presumably of its centroid. There are four types of output from the baseline: Match, Guess match, Similar, No match. Examples of each are in Table 4, as are the features that we inferred from these and other instances. The algorithm finds a gazetteer match for every tag; the "no match" decision is based on a poor gazetteer match with a given tag.

We calculated precision and recall for the baseline method over the 200 randomly-selected tags from the OpenStreetMap and Wikimapia Chennai data. If the fuzzy match algorithm found a match with GeoNames that it should have found, we considered the precision to be correct. We did not count instances of "guess match" and "similar" in our evaluation because a person will judge these. For the baseline, our precision = .920, recall = .830, and F1 = .875.

From this 200-tag sample, 88.5% are non-matches, that is, novel entries to be added to the gazetteer. Of these, 14% represent containment (a subset of entries already in the gazetteer), and 74.5% are entirely novel.

Table 4: Examples from OpenStreetMap and Wikimapia, the gazetteer candidate, and the baseline judgment

  Crowdsourced tag                  GeoNames entry found              Features                                 Judgment
  Perambur 13.10, 80.24             Perambur 13.11, 80.24             direct match                             Match
  Pulianthope 13.09, 80.26          Puliantope 13.10, 80.26           editing distance, soundex                Guess match
  Purusawalkam 13.08, 80.25         Purasawalkam 13.08, 80.25         editing distance, soundex                Guess match
  Perambur Loco Works 13.10, 80.22  Perambur Locoworks 13.10, 80.22   editing distance, average head matching  Guess match
  Perungalathur 12.90, 80.09        Perunkalattu 12.91, 80.08         head matching, soundex                   Similar
  Pudupakkam 13.24, 80.21           Madipakkam 12.97, 80.20           head matching                            No match
  Agaram Mel 13.03, 80.07           Agaram 13.03, ...                 containment                              No match, Containment

For the advanced SVM classification method, we used 1768 geo-tag/candidate pairs, which contain 65 containments (these will be no matches), 69 matches, and 1634 no matches. Table 5 shows that the features outlined in section 3.4 are effectively encoded in the SVM with the RBF kernel.

[Table 5: SVM with 9 features for gazetteer candidate re-ranking, using a linear kernel vs. an RBF kernel, with per-category results for Match, Containment and No match.]

We can see in Table 5 that the RBF kernel surpasses the linear kernel in the overall F1 statistic; however, the linear kernel gives a higher recall. This recall is important because if there is a duplicate in the gazetteer, we do not want to add a separate, repetitive entry. The containment category is, with the RBF kernel, accurate enough to substitute for high-accuracy heuristics in a practical system. For the no match category, the accuracy is high because unmatched training pairs dominate the training data.

We compared different feature combinations for the candidate re-ranking SVM in Table 6 to test the effectiveness of the separate features.

[Table 6: Precision, recall and F1-statistic for feature combinations for the candidate re-ranking SVM (RBF kernel), for the combinations {1,2,3}, {1,2,3,8}, {1,2,3,8,9}, {1,2,3,7,8,9}, {1,2,3,6,8,9}, {1,2,3,4,5,8,9} and all nine features. Bold numbers in the original indicate the highest precision, recall and F1 per column.]

Table 6 shows that the features outlined in section 3.4, as added one by one, improve our overall system performance, as shown in the F1 value. First, we used the features for query-weighting only. Although the overall accuracy is high, there is not enough information to accurately distinguish the matches from the no matches. What makes things worse is that containment is not recognized. To address this problem, we added containment feature F8 to the model, which greatly boosted the accuracy for containment. The F9 soundex feature alleviates the mis-match problem introduced by transliteration error. Next we added edit distance feature F7, and the score did not seem to improve much; the same was the case for distance feature F6. We then tried the head matching features F4 and F5, which boosted the score a lot for the matches and a little for containment and no matches. Finally, we added all the features together, and got the highest F1 statistic.

To conclude, using all the features combined, our candidate re-ranking SVM achieves a higher F1 statistic than the edit distance baseline. The SVM method alone could be used as an automatic gazetteer expansion method.
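The reported scores follow the usual definitions of precision, recall and F1. A small sketch of our own, with toy labels, of how such a tally might be computed:

    def precision_recall_f1(decisions):
        """decisions: list of (predicted_match, actually_match) booleans.
        'Guess match' and 'similar' outputs are excluded upstream, since a
        person adjudicates those."""
        tp = sum(p and a for p, a in decisions)
        fp = sum(p and not a for p, a in decisions)
        fn = sum(not p and a for p, a in decisions)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # Toy example: four predicted matches (three correct) and one missed match.
    print(precision_recall_f1([(True, True), (True, True), (True, True),
                               (True, False), (False, True)]))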
4. DISCUSSION
The number of novel matches between the extracted geo-tag set and the gazetteer demonstrates the utility of Volunteered Geographic Information for gazetteer enrichment, our research question 1. Our scores demonstrate the effectiveness of our feature set and the SVM with an RBF kernel in declaring whether a geo-tag is a match with the gazetteer, research question 2.

Our algorithm uses features which we have not found in similar research: head matching, average head matching, and Soundex. These help to increase our F1, as shown in Table 6. Our nine-feature advanced method for automatically generating potential gazetteer matches lets the system produce candidates of potentially higher relevance, so there is a higher chance that we will find the correct match. This serves to increase recall.

Preliminary experiments with Arabic geo-tags using our baseline method gave acceptable results. This is because the baseline rests upon editing distance, and editing distance uses string similarity, which is language-independent.

Our evaluation could be reproduced by downloading a set of geo-tags (location name plus geo-coordinates) for Chennai
