
Charles University in Prague
Faculty of Mathematics and Physics

BACHELOR THESIS

Tomáš Knopp

On the Possibility of ESP Data Use in Natural Language Processing

Institute of Formal and Applied Linguistics

Supervisor of the bachelor thesis: Mgr. Barbora Vidová Hladká, Ph.D.
Study programme: Computer Science
Field of study: General Computer Science
2011

I would like to sincerely thank my advisor Barbora Vidová-Hladká for her immense support and advice.

I declare that I wrote my bachelor thesis independently and exclusively using the cited sources, literature, and other professional sources. I acknowledge that my thesis is subject to the rights and obligations arising from Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that Charles University in Prague has the right to conclude a license agreement on the use of this thesis as a school work pursuant to Section 60(1) of the Copyright Act.

In Prague, 05/08/2011
Tomáš Knopp

Contents

Introduction
1 The ESP Game
    1.1 Why ESP Game?
    1.2 ESP Game Session
    1.3 Agreement on a Label by Two Individual Players
2 Coreference Resolution
    2.1 Introduction and Motivation
    2.2 Coreference
    2.3 Data
        2.3.1 Analysis of the English Data
        2.3.2 The Retrieval of Czech Data and Their Analysis
    2.4 Tools
        2.4.1 Label Viewer Application
        2.4.2 Algorithm for Finding the Pseudo-Coreference Chains
        2.4.3 Chains Viewer Application
        2.4.4 Coreferences and Pseudo-coreferences on the Chains Viewer Screenshot
    2.5 Evaluation
        2.5.1 Evaluation Data Description
        2.5.2 Comparison of Czech versus English Data
    2.6 Conclusion
3 Lexical Database WordNet and the ESP Data
    3.1 What is WordNet
    3.2 The Possible Benefits of the ESP Data to the WordNet
    3.3 Introduction to (Czech) WordNet
        3.3.1 Definitions
        3.3.2 Definition of WordNet
        3.3.3 The General Types of Relations between WordNet Synsets according to Wikipedia
    3.4 The Czech WordNet Database Format
        3.4.1 Description of the Database File
        3.4.2 An Excerpt from the Database File
    3.5 WordNet versus ESP Data
        3.5.1 Introduction
        3.5.2 Computing the Synonymity of a Set of Labels
        3.5.3 Definition of Method
        3.5.4 Results of the Czech WordNet versus Czech ESP Data
        3.5.5 Results of the English WordNet versus (English) ESP Data
        3.5.6 Conclusion
4 Conclusion
    4.1 Benefits of the ESP Data to the WordNet Database
    4.2 ESP Data Use for Automatic Coreference Resolution
    4.3 Possible Future Improvement and Alternatives
References

Title: On the Possibility of ESP Data Use in Natural Language Processing
Author: Tomáš Knopp
Department: Institute of Formal and Applied Linguistics
Supervisor: Mgr. Barbora Vidová Hladká, Ph.D.
Supervisor's e-mail address: hladka@ufal.mff.cuni.cz

Abstract: The aim of this bachelor thesis is to explore the image label database produced by the ESP game from the natural language processing (NLP) point of view. The ESP game is an online game in which human players do useful work: they label images. The output of the ESP game is a database of images and their labels. We are interested in whether the data collected in the process of labeling images can be of any use in NLP tasks, specifically in automatic coreference resolution, extension of the lexical database WordNet, idiom detection, and collocation detection. In this bachelor thesis we deal with the first two of them: automatic coreference resolution and the potential benefits of the data to the lexical database WordNet.

Keywords: ESP game, image labels, texts vs. images

Introduction

The database of English labels we work with is the output of the extremely popular online game called the ESP game.[1] Figure 1 and the list of words london, man, driver, red, car, people, england, decker, double decker, transportation, ride, double, tour, road show a sample pair of an image and its labels, representing the output of selected ESP game sessions.

Figure 1: The ESP image 1

Figure 2 and the list of words guy, hat, swing, uniform, man, game, sport, red, hit, bat, ball, houston, black, face, mouth, sports, white, helmet, team, astros, hitter, play, player are another example of an image-label pair from the ESP game.

Figure 2: The ESP image 2

The aim of this bachelor thesis is to explore this image label database from the natural language processing (NLP) point of view. We are interested in whether the data collected in the process of labeling images can be of any use in NLP tasks. Specifically, we are interested in the tasks of coreference resolution, WordNet extension, idiom detection, and collocation detection. In this bachelor thesis we deal with the possible use of the image label database for coreference resolution and WordNet extension.

[1] http://gwap.com

Chapter 1
The ESP Game

1.1 Why ESP Game?

The ESP game is an online game created for generating and harvesting valuable descriptions of general images. Getting good descriptions, i.e. good labels, is a hard task for computers. On the other hand, when a human sees a picture, s/he immediately knows a few expressions that describe it well. Simply put, the game uses human players' intellectual capacity for labeling images. The usual problem is that people have to be paid for performing such laborious work. Thanks to the ESP game, however, people do the work voluntarily and for free, because they enjoy the game.

If we realize that the game actually performs a computation, since through the game the players assign labels to images, we can see the individual players as processors and the specifically designed game as an algorithm. The design of the game guarantees that players "compute" well. The ESP game is thus an example of a human algorithm game, a term coined by Luis von Ahn in (von Ahn, 2005).

1.2 ESP Game Session

A game session is played by two randomly chosen players, and the session has a time limit. During the session both players are presented with the same series of pictures. Both players always see the same image, and for each picture their task is to enter the string that they think their partner is entering. While playing, neither partner can see the other's guesses. The partners do not know each other and cannot communicate, so each of them should try to think as the partner would; "ESP" stands for "extrasensory perception" here.

Figure 1.1: The player's online interface. The players have just agreed on the string "japan".

Figure 1.2: ESP Game session: players agreeing on a string

1.3 Agreement on a Label by Two Individual Players

If the two players enter the same string at some point while the picture is shown, they are given points for this agreement. A string becomes a label for the image only when both players agree on it, since agreement by a pair of independent players implies that the string (and consequently the label) is probably meaningful. It turns out that the task of writing the same string as the partner guarantees (under the game's conditions) that the strings the players agreed upon are meaningful descriptions of their images, i.e. that such strings are true labels of their images. Figure 1.2 illustrates an agreement on a string; it comes from (von Ahn, Dabish, 2004).

Each image is associated with a list of taboo words that the players are not allowed to enter: a string becomes a label when two players agree on it, and the label then becomes a taboo word associated with the image, which is used when the system reuploads the image into a new game session. We are not aware of any other restriction on the strings that players enter during the game besides their length: strings of up to 13 characters are allowed. A label can be a single-word label (london) or a multi-word label (double decker).
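The agreement rule and the taboo list can be illustrated with a minimal sketch. It assumes a hypothetical per-image event log of (time, player, string) tuples with players "A" and "B", and it assumes that matching ignores case and surrounding spaces; neither the log format nor the function names agreed_label and play_image come from the ESP game itself.

```python
from typing import Iterable, Optional, Set, Tuple

MAX_LEN = 13  # strings of up to 13 characters are allowed (as reported above)


def agreed_label(events: Iterable[Tuple[float, str, str]],
                 taboo: Set[str]) -> Optional[str]:
    """Return the first string entered by both players, or None if they never agree.

    `events` is a hypothetical log of (time, player, string) tuples for one image,
    where player is "A" or "B".
    """
    entered = {"A": set(), "B": set()}
    for _time, player, guess in sorted(events):
        guess = guess.strip().lower()  # assumption: case and padding are ignored
        if not guess or len(guess) > MAX_LEN or guess in taboo:
            continue  # the game would not accept this input
        entered[player].add(guess)
        partner = "B" if player == "A" else "A"
        if guess in entered[partner]:
            return guess  # agreement: the string becomes a label for the image
    return None


def play_image(events: Iterable[Tuple[float, str, str]],
               taboo: Set[str]) -> Optional[str]:
    """Play one image; an agreed label is added to the image's taboo list."""
    label = agreed_label(events, taboo)
    if label is not None:
        taboo.add(label)  # later sessions may not enter this label again
    return label
```

With the taboo list {"car"}, a log in which both players type "car" and later "red" yields the label "red", and replaying the same log afterwards yields no new label because "red" has become taboo.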

Chapter 2
Coreference Resolution

2.1 Introduction and Motivation

In this chapter we focus on the task of automatic coreference resolution; the question we tackle is whether the image labels can help with this task. We speculate that among the image labels there may be labels that co-refer, i.e. refer to the same entity. So far we have not found any paper that addresses the same issue, which makes it almost impossible to predict the results before performing experiments.

This chapter is organized as follows: we recall the notion of coreference in Section 2.2. In Subsection 2.3.1 we present statistics on the ESP game image labels. The application designed for user-friendly viewing of the labels and their relations is introduced in Subsection 2.4.1. The algorithm to construct pseudo-coreference chains in a text is described in detail in Subsection 2.4.2. The application for viewing and comparing coreference versus pseudo-coreference chains is described in Subsection 2.4.3. Evaluation of the pseudo-coreference chains against manual annotation is discussed in Section 2.5. We conclude with Section 2.6.

2.2 Coreference

Coreference occurs when several referring expressions in a text refer to the same entity (e.g. a person, thing, or fact). A coreferential pair is a pair of such referring expressions. A sequence of coreferential pairs referring to the same entity in a text forms a coreference chain. In the following passage from (Doyle, 1887) one can find the coreference chain I, I, me, I, me, man; another coreference chain, someone, Stamford, who, dresser, can be seen there as well:

"On the very day that I had come to this conclusion I was standing at the Criterion Bar, when someone tapped me on the shoulder, and turning round I recognized young Stamford, who had been a dresser under me at Bart's. The sight of a friendly face in the great wilderness of London is a pleasant thing indeed to a lonely man."

2.3 Data

2.3.1 Analysis of the English Data

The ESPgame100k data ("data")[1] we inspect throughout our project come from the ESP game (von Ahn, Dabish, 2004)[2], as described in Chapter 1.

As a result of the ESP game sessions, there is a label-set for each image that has already been assigned some labels in the game, and these label-sets are stored in the game's database together with the corresponding images. The data we use as the input for our project is a sample of the whole database consisting of 100,000 image-label pairs.[3] Thus for each image in the data there is a non-empty label-set. A label-set is a set of labels related to the same image, like car, decker, double, double decker, driver, england, london, man, people, red, ride, road, tour, transportation.

The neighbors of a label lab are the set of all labels from all label-sets in which lab occurs. So the neighbors of double decker include all labels from the label-set in Figure 1, including double decker itself, i.e. car, decker, double, double decker, driver, england, london, man, people, red, ride, road, tour, transportation; but they also include labels from other label-sets, for instance cloud, tire, wheels, because each of them co-occurs with double decker in at least one label-set. So we can say, for instance, that wheels is a neighbor of double decker.

$\mathrm{freq}(w_1, w_2)$ stands for the number of label-sets in which the labels $w_1$ and $w_2$ co-occur. We call it the neighborhood frequency. Table 2.1 provides statistics acquired from the data, i.e. from the ESPgame100k sample.

According to the game designers, the game server is equipped with an English dictionary to alert players when they have misspelled a word. However, we discovered that a label is not always a good description of its image.
It can be misspelled, which is frequent, or it may not even be English. Examples of spelling errors are havent or family fued; examples of non-English labels are zeitungen, nuestra or zukunft. We use the open source spell checker Aspell[4] to discover such errors. However, we do not have to handle them in a special manner, since our procedure for constructing pseudo-coreference chains in texts does not take them into account anyway.

[1] http://www.cs.cmu.edu/~biglou/resources/
[2] http://gwap.com
[3] We do not know exactly which good label threshold X was used for the images in the data. Based on the literature (e.g. (von Ahn, 2005)), we suppose X = 1.
[4] http://aspell.net/
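The neighbor relation and the neighborhood frequency defined in Subsection 2.3.1 can be computed with a minimal sketch like the one below. It assumes label-sets are given as Python sets of strings; the function names are hypothetical, and storing each unordered pair as a frozenset is one design choice that makes $\mathrm{freq}(w_1, w_2) = \mathrm{freq}(w_2, w_1)$ hold automatically.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def neighborhood_frequencies(label_sets: Iterable[Set[str]]) -> Counter:
    """Count, for every unordered pair of labels, the label-sets containing both."""
    counts: Counter = Counter()
    for label_set in label_sets:
        for w1, w2 in combinations(sorted(label_set), 2):
            counts[frozenset((w1, w2))] += 1
    return counts


def freq(counts: Counter, w1: str, w2: str) -> int:
    """freq(w1, w2): symmetric, and 0 for labels that never co-occur."""
    return counts[frozenset((w1, w2))]


def neighbors(label: str, label_sets: Iterable[Set[str]]) -> Set[str]:
    """All labels from all label-sets in which `label` occurs (including `label`)."""
    result: Set[str] = set()
    for label_set in label_sets:
        if label in label_set:
            result |= label_set
    return result
```

The pair counts are built in one pass over the data, mirroring the remark in the complexity analysis that the neighborhood relation is initialized once when the program starts; afterwards each lookup is a constant-time dictionary access.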

Table 2.1: The ESPgame100k data: statistics
    # of unique label-sets: …
    average # of labels in a label-set: …
    the biggest label-set size: …
    the smallest label-set size: …
    # of unique labels: …
    # of unique single-word labels: …
    # of unique multi-word labels: …
    # of unique …

We also tagged the labels with the Stanford Log-linear Part-Of-Speech Tagger (Toutanova, Klein, Manning, Singer, 2003) to discover foreign words.

2.3.2 The Retrieval of Czech Data and Their Analysis

We translate the original data from English to Czech using the Czeng Probability Dictionary[5] ("dictionary"). For a given English word and its part-of-speech tag (POS tag), the dictionary contains different Czech words with POS tags, together with probabilities P that the Czech word and its POS tag are a good translation of the English word and its POS tag. We call these probabilities translation probabilities. As Table 2.2 shows, the Czech translations are sorted by the translation probabilities P.

The Algorithm for Translating Label-sets into Czech

English label-sets are translated into Czech label by label.

One English label can carry several morphological meanings, mainly because conversion is common in English ("run" as a verb or a noun). Because the labels are not part of a broader context like a sentence, we cannot say for sure which part of speech the original label is. Since we want to preserve as many meanings of the original label as possible in the translated label-set, but we do not want the translated label-sets to be too large either, we chose to translate the original label together with each of its POS tags.

In the dictionary, we look up the English label lab that we want to translate into Czech. The label may appear in the dictionary with several POS tags, i.e. as different parts of speech. Table 2.3 shows an example of looking up the label twelve.

[5] A dictionary extracted from the parallel corpus CzEng by Zdeněk Žabokrtský at ÚFAL, Charles University in Prague, in 2008.

Table 2.2: Example of the dictionary format

When translating twelve in this example, we get one translation for twelve as a cardinal number and another translation for twelve as a noun.[6]

To translate a combination of a label and a POS tag, we always pick the most probable Czech translation without further investigation. In our case the translations are dvanáct and dvanáctka, and they are highlighted in Table 2.3.

Table 2.3: Looking up the label twelve in the dictionary

When translating a label-set into Czech, we simply substitute each English label with the list of its Czech translations obtained in the previous step. English multi-word labels are currently not translated, unless they appear in the Czeng dictionary as a whole expression.[7]

Example: Translating Label-sets

In this example we illustrate the translation in which the English label-set {asian, background, bag, blue, desk, desktop, face, girl, keyboard, kid, monitor, mouse, pc, picture, red, screen, smile, wallpaper, white, windows, woman} is translated into the Czech label-set {asijský, Asie, pozadí, společenský, pytel, sebrat, modrý, stůl, plocha, čelit, tvář, dívka, holkařit, klávesnice, legrace, zhýčkaný, dítě, sledovat, monitor, pohyb, myš, PC, obrázek, představovat, věrná, červený, methylčerveně, prověrka, obrazovka, usmát se, úsměv, tapetovat, tapeta, bílý, bezvousý, okna, žena}.

Table 2.4 shows the parts of the dictionary used for translating the individual English labels from the label-set into Czech. This excerpt is shortened: for each combination of a label and its POS tag, only the most probable Czech translation (the one actually used for the translation) is shown, and the other translations are omitted.

The first label in the original label-set is asian. The algorithm looks it up in the dictionary and finds multiple translations for asian as an adjective and multiple translations for asian as a noun (only the most probable of them are displayed in Table 2.4). The algorithm takes the most probable translation for asian as an adjective and as a noun, i.e. the two words asijský and Asie. Both become translations of the original label asian.

In the same fashion, background as an adjective is translated into společenský and background as a noun into pozadí; white as an adjective and as a noun is translated into bílý, and white as a verb into bezvousý.[8]

[6] It would also be possible to translate only the one combination of label and POS tag that has the highest translation probability. This would ensure that we always get exactly one translation for a label; in the example it would be dvanáct, because P(dvanáct) > P(dvanáctka). However, it would also mean that the other parts of speech would not be taken into account at all. As we want to keep as many morphological meanings as possible, we decided to translate an English label into several Czech labels rather than lose meanings by choosing only the most probable part of speech.
[7] Another method would be to split the English multi-word labels, translate them word by word and then put them together. But this makes no sense, considering that the algorithms we test on the ESP data in fact work only with single-word labels.
[8] Of course, this is nonsense. It only proves that the dictionary is not flawless, as it was extracted automatically from the CzEng parallel corpus.

Table 2.4: The excerpt from the dictionary used for translating the label-set (shortened)

2.4 Tools

2.4.1 Label Viewer Application

The Label Viewer is an application designed for user-friendly viewing of the English or Czech labels and their relations. It has three main parts, as depicted in Figure 2.1:

- Part 1: list of labels. Labels from the ESP data are displayed here. Each label has its id and frequency, where frequency means the number of label-sets containing the label. The label cathedral is selected in part 1 of the example.
- Part 2: list of label-sets. Selecting a label in part 1 reveals the list of ids of the label-sets in which the label (i.e. cathedral) occurs (part 2, left column). One can click on a label-set id to inspect the label-set's labels (part 2, right column) and the original picture that belongs to the viewed label-set. In Figure 2.1 the label-set with id 493 is shown.
- Part 3: list of neighbors. In part 3, the neighbors of the selected label (i.e. cathedral) are shown in the neighbors column. The frequency here is the number of label-sets in which the selected label (cathedral) and the selected neighbor (spire in the right column of part 3) co-occur, i.e. $\mathrm{freq}(w_1, w_2)$ described in Subsection 2.3.1. In our example, 56 means that the labels cathedral and spire co-occur in 56 different label-sets.

Both the list of labels and the list of neighbors can be re-sorted alphabetically or by frequency. By default, labels are sorted alphabetically and neighbors by frequency.

2.4.2 Algorithm for Finding the Pseudo-Coreference Chains

We take an input text and find the pairs of 'semantically related' ESP labels in this text. We call such pairs of ESP labels in a text pseudo-coreference pairs. Then we construct pseudo-coreference chains from them.

We know how many times two labels co-occur in the data, i.e. how many times they label the same image. We measure the degree to which labels are semantically related by setting a neighborhood threshold Z. Both the pseudo-coreference pairs and the chains depend on the variable neighborhood threshold Z.

For a given Z, we search for the pseudo-coreference chains in the following five steps. The algorithm is illustrated on an example.

1. We have Penn Treebank POS tags[9] for each word in the text. Since we suppose that the candidates for coreference are wh-determiners, wh-pronouns, nouns, personal pronouns and cardinal numbers, the words with the Penn Treebank POS tags WDT, WP, NN (and the other noun tags), PRP and CD, respectively, become candidates for membership in coreference chains; thus we unlock them for the next steps, while we lock the words with other POS tags. In this example of input text, the unlocked words are highlighted.

"There was a modern brown building with an old antique arch above the door. The arch didn't fit to the design of the building so much that I thought the architect who projected it must have been mad to use it here."

2. On the ESP data layer, we iterate through label-sets and take each label-set's labels as nodes V of a graph $G = (V, E)$ in which the edges E are defined as follows: $e = (v_1, v_2) \in E$ if $\mathrm{freq}(v_1, v_2) \ge Z$. We call this graph the neighborhood graph. Table 2.5 lists the label-set's neighborhood frequencies from which we construct the neighborhood graph as described above (see the resulting neighborhood graph in Table 2.6 and in Figure 2.2). Because the neighborhood of labels is a symmetric relation, the neighborhood graph is undirected.

[9] http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Figure 2.1: The screenshot of the label viewer program

Table 2.5: Neighbor frequencies in a label-set (Z = 500)

Table 2.6: Neighborhood graph of a label-set given by a matrix (Z = 500)

Figure 2.2: Neighborhood graph of a label-set

3. For each neighborhood graph we run the algorithm for finding connected components (Hopcroft, Tarjan, 1973). This algorithm computes the neighborhood graph's components $G_1 = (V_1, E_1), \ldots, G_n = (V_n, E_n)$. These components are the constituent units of pseudo-coreference chains. The graph in Figure 2.2 has three components, defined by the vertex sets V1 = {arch}, V2 = {brown, building, church, door, old, wood} and V3 = {cathedral}.

4. We now have to go down from the layer of labels to the layer of the input text. A label can appear in the text $0, \ldots, n_{label}$ times. So for a neighborhood graph we take each of its components $G_i = (V_i, E_i)$ from the previous step and propagate the labels from that component $G_i$ to the textual layer. For each vertex v (i.e. label) we save the relevant occurrences of the label v as a word in the input text. If the label v is not present in the text, we save nothing; if it is present $1, \ldots, n_v$ times, we save only the relevant occurrences (i.e. those that were unlocked in the first step of the algorithm). As a result, for each component $G_i = (V_i, E_i)$ of the neighborhood graph we get a list of occurrences (unlocked words) of the vertices $V_i$ (i.e. labels) in the input text. The resulting lists from the example are List1 = {arch, arch} and List2 = {building, building, door}.

5. We filter out lists of occurrences of size 1. The last step is to re-sort the lists of occurrences so that the pseudo-coreference chains are well ordered. A list of words $l = (w_1, \ldots, w_n)$ is well ordered if $\mathrm{pos}(w_i) < \mathrm{pos}(w_{i+1})$ for all $i \in \{1, \ldots, n-1\}$, where $\mathrm{pos}(w_i)$ is the function that returns the word's position within the input text. The sorted lists of size $\ge 2$ are the pseudo-coreference chains. Both lists from the previous step contain at least two words, and after ordering we get these pseudo-coreference chains: Chain1 = {arch, arch} and Chain2 = {building, door, building}.

Summing up the Algorithm

1. Choose a threshold Z and iterate through label-sets.
2. Take the labels from the current label-set (lset) as nodes of a graph in which two labels (i.e. vertices) lab1, lab2 are adjacent if $\mathrm{freq}(lab_1, lab_2) \ge Z$ (the so-called neighborhood graph).
3. Find the connected components of the neighborhood graph and iterate through them.
4. For the labels of a component c, find the unlocked occurrences of those labels in the input text. These occurrences, well ordered (if there are at least 2 of them), form a pseudo-coreference chain.
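The five steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the POS-tagged text is given as (word, tag) pairs; a label matches a word by case-insensitive string equality (no lemmatization); the edge condition is taken to be freq ≥ Z; connected components are found by breadth-first search instead of the Hopcroft-Tarjan algorithm cited in step 3 (both yield the same components of an undirected graph); and identical chains produced by different label-sets are merged. All function names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple


def is_unlocked(tag: str) -> bool:
    """Step 1: WDT, WP, PRP, CD and the noun tags (NN, NNS, NNP, NNPS) stay unlocked."""
    return tag in {"WDT", "WP", "PRP", "CD"} or tag.startswith("NN")


def connected_components(nodes: Sequence[str],
                         adjacent: Callable[[str, str], bool]) -> List[Set[str]]:
    """Step 3: components of the undirected neighborhood graph, found by BFS."""
    unvisited = set(nodes)
    components: List[Set[str]] = []
    for start in nodes:
        if start not in unvisited:
            continue
        unvisited.discard(start)
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for u in [u for u in unvisited if adjacent(v, u)]:
                unvisited.discard(u)
                component.add(u)
                queue.append(u)
        components.append(component)
    return components


def pseudo_coreference_chains(tagged_text: Sequence[Tuple[str, str]],
                              label_sets: Iterable[Set[str]],
                              freq: Callable[[str, str], int],
                              z: int) -> Set[Tuple[int, ...]]:
    """Return pseudo-coreference chains as tuples of word positions in the text."""
    # Step 1: positions of unlocked words, indexed by the lower-cased word.
    positions: Dict[str, List[int]] = {}
    for i, (word, tag) in enumerate(tagged_text):
        if is_unlocked(tag):
            positions.setdefault(word.lower(), []).append(i)

    # Step 2: two labels are adjacent iff their neighborhood frequency reaches Z.
    def adjacent(a: str, b: str) -> bool:
        return freq(a, b) >= z

    chains: Set[Tuple[int, ...]] = set()
    for label_set in label_sets:
        for component in connected_components(sorted(label_set), adjacent):
            # Step 4: unlocked occurrences of the component's labels in the text.
            occurrences = sorted(p for label in component
                                 for p in positions.get(label, []))
            # Step 5: drop lists of size 1; sorting makes the chain well ordered.
            if len(occurrences) >= 2:
                chains.add(tuple(occurrences))
    return chains
```

On the example sentence, with freq returning a value above 500 for every pair within {brown, building, church, door, old, wood} and a lower value otherwise, the sketch finds the three components described in step 3 and returns two chains whose words are arch, arch and building, door, building, matching Chain1 and Chain2. The adjacency test is evaluated for every pair of labels in a label-set, which corresponds to building the adjacency matrix in step 2.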

Computing the Pseudo-Coreference Chains: Time Complexity

There are n = 100,000 label-sets in the data; let the size of the biggest label-set be max. $\mathrm{freq}(lab_1, lab_2)$ can be determined in O(1) (the whole neighborhood relation is initialized only once, after the program starts). For each label-set we construct the adjacency matrix in $\mathit{max}^2 \cdot O(1) = O(1)$, since max is a constant of the data. The algorithm for finding connected components is linear in the number of vertices, which is bounded by max. So for each label-set we construct the matrix in O(1) and then find the connected components in $O(\mathit{max}) = O(1)$. Finally, we get time linear in the number of label-sets, O(n), to compute all the connected components.

Now, moving to the textual layer, suppose that there are $m \le \mathit{max}$ different labels in a particular label-set that have to be expanded into pseudo-coreference chains on the textual layer. Let the text length be k. Then each label can theoretically appear at k positions, so propagating to the textual layer puts us in a much worse complexity class: $k \cdot k \cdots k = k^{\mathit{max}}$, which yields the bound $O(k^{\mathit{max}})$ for propagating a label, or rather a connected component, from a label-set (measured by the length of the input text; the bound is polynomial in k but exponential in max). The whole time complexity of computing the pseudo-coreference chains for data with n label-sets and maximum label-set size max is thus $O(n \cdot k^{\mathit{max}})$.

This does not matter in this project, because we tested the algorithm on very small input texts, using only excerpts of 100 sentences. The high complexity is due to the lack of any limit on how far from each other two pseudo-coreference words can be; here the domain is the whole text. If our method turned out to be of any use, it would be simple to introduce such a rule and limit the domain to, e.g., a paragraph, whose size is presumably independent of the length of the input text and can be taken as a constant. If we limit the paragraph size in the input text to para, the resulting time complexity is linear: $O(n \cdot \mathit{para}^{\mathit{max}}) = O(n)$.

2.4.3 Chains Viewer Application

The Chains Viewer program is an application designed for computing the pseudo-coreference chains over a sample Czech or English text in

ilustrating an agreement on a string. It comes from (von Ahn, Dabish, 2004). Each image is associated with a list of taboo words that are not allowed to be entered by the players: the string becomes a label when two players agree on it and consequently the label becomes a taboo word associated with the image, which will

Related Documents:

BAB 1 PENDAHULUAN 1.1 Pengenalan

Pengajaran mikro merupakan peluang kepada bakal-bakal guru mengaplikasikan segala prinsip dan teknik serta teori yang perlu dikuasai untuk melaksanakan sesuatu proses pengajaran dan pembelajaran di dalam kelas. Pelajar akan dibimbing dan didedahkan untuk menyediakan perancang

15 Views

3y ago

%07 ,3/0 %/ 2-0 7 &/' '-$ 7 04%.(!7 0%.''-617 )7 7 7 7 7 7

. 0 8 53 &3 , 0 8 8 0."8 8 '3 8 87.,8 8 /ZS JSkqZU[y[SksfX wZkR MsU ,Mo[o RSk Uh XSfRSf 9k[qSk[Sf RSk 2hXkM-Skq -hfrkMQq khhU

2 Views

2y ago

OptiSystem Applications - BER Analysis BPSK RS FEC - Optiwave


