A Path-based Transfer Model For Machine Translation

2y ago
2 Views
2 Downloads
200.08 KB
6 Pages
Last View : 1m ago
Last Download : 3m ago
Upload by : Nora Drum
Transcription

A Path-based Transfer Model for Machine TranslationDekang LinDepartment of Computing ScienceUniversity of AlbertaEdmonton, Alberta, Canadalindek@cs.ualberta.caAbstractWe propose a path-based transfer model formachine translation. The model is trained witha word-aligned parallel corpus where thesource language sentences are parsed. Thetraining algorithm extracts a set of transferrules and their probabilities from the trainingcorpus. A rule translates a path in the sourcelanguage dependency tree into a fragment inthe target dependency tree. The problem offinding the most probable translation becomesa graph-theoretic problem of finding theminimum path covering of the sourcelanguage dependency tree.1IntroductionGiven a source language sentence S, a statisticalmachine translation (SMT) model translates it byfinding the target language sentence T such that theprobability P(T S) is maximized. In word-basedmodels, such as IBM Model 1-5 (Brown et al1993), the probability P(T S) is decomposed intostatistical parameters involving words. There havebeen many recent proposals to improve translationquality by decomposing P(T S) into probabilitiesinvolving phrases.Phrase-based SMT approaches can be classifiedinto two categories. One type of approach workswith parse trees. In (Yamada&Knight 2001), forexample, the translation model applies threeoperations (re-order, insert, and translate) to anEnglish parse tree to produce its Chinesetranslation. A parallel corpus of English parse treesand Chinese sentences are used to obtain theprobabilities of the operations.In the second type of phrase-based SMT models,phrases are defined as a block in a word alignedcorpus such that words within the block are alignedwith words inside the block (Och et al 1999,Marcu&Wong 2002). This definition will treat asphrases many word sequences that are notconstituents in parse trees. This may looklinguistically counter-intuitive. However, (Koehnet al 2003) found that it is actually harmful torestrict phrases to constituents in parse trees,because the restriction would cause the system tomiss many reliable translations, such as thecorrespondence between “there is” in English and“es gibt” (“it gives”) in German.In this paper, we present a path-based transfermodel for machine translation. The model istrained with a word-aligned parallel corpus wherethe source language side consists of dependencytrees. The training algorithm extracts a set of pathsfrom the dependency trees and determines thetranslations of the paths using the word alignments.The result of the training process is a set of rulesfor translating paths in the source language intotree fragments in the target language with certainprobabilities. To translate a sentence, we first parseit and extract a set of paths from its dependencytree S. We then find a set of transfer rules thatcover S and produce a set of tree fragmentsobtained to form a tree T* such that T* argmaxTP(T S). The output sentence can then simply beread off T*.In the remainder of the paper, we first definepaths in dependency trees. We then describe analgorithm for learning transfer rules and theirprobabilities. The translation algorithm ispresented in Section 4. Experimental result ispresented in Section 5. We then discuss relatedwork in Section 6.2Paths in Dependency TreesThe dependency tree of a sentence consists of a setof nodes, each of which corresponds to a word inthe sentence. A link in the tree represents adependency relationship between a pair of words.The links are directed from the head towards themodifier. Except the root of tree, every node hasexactly one incoming link. An exampledependency tree is shown in Fig. 1.subjobjdettodetJohn found a solution to the problem.Figure 1. An example dependency tree

A sequence of nodes n1, , nk, nm and thedependency links between them form a path if thefollowing conditions hold:a. i (1 i k), there is a link from ni 1 to ni.b. i (k i m), there is a link from ni to ni 1.A set of paths is said to cover a dependency treeif the union of the nodes and links in the set ofpaths include all of the nodes and links in thedependency tree.3Acquisition of Transfer RulesA transfer rule specifies how a path in the sourcelanguage dependency tree is translated. We extracttransfer rules automatically from a word-alignedcorpus. For example, Fig. 2(b-g) are some of therules extracted from the word aligned sentence inFig. 2(a). The left hand side of a rule is a path inthe source dependency tree. The right hand side ofa rule is a fragment of a dependency tree in thetarget language. It encodes not only thedependency relations, but also the relative linearorder among the nodes in the fragment. Forexample, the rule in Fig. 2(e) specifies that whenthe path Connect to controller is translatedinto French Branchez precedes (but not necessarilyadjacent to) sur, and sur precedes (but notnecessarily adjacent to) contrôleur.Note that the transfer rules also contain word-toword mapping between the nodes in the source andthe target (obtained from word alignments). Thesemappings are not shown in order not to clutter thediagrams.Connect both power cables to the controllerBranchez les deux câbles d' alimentation sur le contrôleur12 34567 8 9(c)(d)(e)both cablespower cablesConnect cablesConnect to controllerdeux câblescâbles d' alimentationBranchez les câblesBranchez sur contrôleur(f)Connect cables to controllerBranchez les câbles sur contrôleur(g)Connect to the controllerBranchez sur le contrôleurFigure 2. Examples of transfer rules extractedfrom a word-aligned corpusSpansThe rule extraction algorithm makes use of thenotion of spans (Fox 2002, Lin&Cherry 2003).Given a word alignment and a node n in the sourcedependency tree, the spans of n induced by theword alignment are consecutive sequences ofwords in the target sentence. We define two typesof spans:Head span: the word sequence aligned with thenode n.Phrase span: the word sequence from the lowerbound of the head spans of all nodes in thesubtree rooted at n to the upper bound of thesame set of spans.For example, the spans of the nodes in Fig. 2(a) arelisted in Table 1. We used the word-alignmentalgorithm in (Lin&Cherry 2003a), which enforcesa cohesion constraint that guarantees that if twospans overlap one must be fully contained in theother.Table 1. Spans of nodes in Figure )(b)3.1Head Span[1,1][3,3][6,6][4,4][8,8][9,9]Phrase tion AlgorithmFor each word-aligned dependency tree in thetraining corpus, we extract all the paths where allthe nodes are aligned with words in the targetlanguage sentence, except that a preposition in themiddle of a path is allowed to be unaligned. In thedependency tree in Fig. 2(a), we can extract 21such paths, 6 of which are single nodes(degenerated paths).We first consider the translation of simple pathswhich are either a single link or a chain of twolinks with the middle node being an unalignedpreposition. An example of the latter case is thepath Connect to controller in Fig. 2(a). In suchcases, we treat the two dependency link as if it is asingle link (e.g., we call “Connect” the parent of“controller”).Suppose Si is a simple path from node h to nodem. Let h' and m' be target language words alignedwith h and m respectively. Let s be the phrase spanof a sibling of m that is located in between h’ andm’ and is the closest to m’ among all such phrasespans. If m does not have such a sibling, let s bethe head span of h.

The translation Ti of Si consists of the followingnodes and links: Two nodes labeled h' and m', and a link from h'to m'. A node corresponding to each word between sand the phrase span of m and a link from eachof these nodes to m’.Fig. 2(b-e) are example translations constructedthis way. The following table lists the words h' andm' and the span s in these instances:Table 2. Example spansExampleFigure 2(b)Figure 2(c)Figure 2(d)Figure oncâblescontrôleurs[4,4][4,4][1,1][4,6]In general, a path is either a single node, or asimple path, or a chain of simple paths. Thetranslations of single nodes are determined by theword alignments. The translation of a chain ofsimple paths can be obtained by chaining thetranslations of the simple paths. Fig. 2(f) providesan example.Note that even though the target of a rule istypically a path, it is not necessarily the case (e.g.,Fig. 2(g)). Our rule extraction algorithm guaranteesthe following property of target tree fragments: if anode in a target tree fragment is not aligned with anode in the source path, it must be a leaf node inthe tree fragment.3.3Generalization of RulesIn addition to the rules discussed the in theprevious subsection, we also generalize the rulesby replacing one of the end nodes in the path witha wild card and the part of speech of the word. Forexample the rule in Fig. 2(b) can be generalized intwo ways. The generalized versions of the ruleapply to any determiner modifying cable and bothmodifying any noun, respectively.both cablesdeux câblesgeneralize*/Det cablesboth */N*/Det câblesdeux */NFigure 3. Generalization of Transfer rule3.4Translation ProbabilityLet Si be a path in the source language dependencytree and Ti be a tree fragment in the targetlanguage. The translation probability P(Ti Si) canbe computed asP (Ti S i ) c(Ti , Si )c(Si ) Mwhere c(Si) is the count of Si in the training corpus,c(Ti,Si) is the number of times Ti is the translationof Si, and M is a smoothing constant.4Path-based TranslationGiven a source language sentence, it is translatedinto the target language in the following steps:Step 1: Parse the sentence to obtain its dependencystructure.Step 2: Extract all the paths in the dependency treeand retrieve the translations of all the paths.Step 3: Find a set of transfer rules such thata) They cover the whole dependency tree.b) The tree fragments in the rules can beconsistently merged into a target languagedependency tree.c) The merged tree has the highest probabilityamong all the trees satisfying the aboveconditions.Step 4: Output the linear sequence of words in thedependency tree.4.1Merging Tree FragmentsIn Step 3 of our algorithm, we need to merge thetree fragments obtained from a set of transfer rulesinto a single dependency tree. For example, themergers of target tree fragments in Fig. 4(b-d)result in the tree in Fig. 4(e). Since the paths inthese rules cover the dependency tree in Fig. 4(a),Fig. 4(e) is a translation of Fig. 4(a). The merger oftarget tree fragments is constrained by the fact thatif two target nodes in different fragments aremapped to the same source node, they must bemerged into a single node.Proposition 1: The merger of two target treefragments does not contain a loop.Proof: The unaligned nodes in each tree fragmentwill not be merged with another node. They havedegree 1 in the original tree fragment and will stillhave degree 1 after the merger. If there is a loop inthe merged graph, the degree of a node on the loopis at least 2. Therefore, all of the nodes on the loopare aligned nodes. This implies that there is a loopin the source dependency tree, which is clearlyfalse.Proposition 2: If the condition parts of a set oftransfer rules cover the input dependency tree, themerger of the right hand side of the rules is a tree.Proof: To prove it is a tree, we only need to provethat it is connected since Proposition 1 guaranteesthat there is no loop. Consider the condition part ofa rule, which is a path A in the source dependency

tree. Let r be the node in the path that is closest tothe root node of the tree. If r is not the root node ofthe tree, there must exist another path B that coversthe link between r and its parent. The paths A andB map r to the same target language node.Therefore, the target language tree fragments for Aand B are connected. Using mathematicalinduction, we can establish that all the treefragments are connected.The above two propositions establish the factthat the merge the tree fragments form a treestructure.(a)(b)(c)(d)both existing coaxial cablesboth cablesdeux câbles If m1 and m2 are on the same side of h but theirsource language counterpart are on differentsides of h, we will use the word order of theiroriginal in the source language.4.3Conflicts in MergerConflicts may arise when we merge tree fragments.Consider the two rules in Fig. 5. The rule in Fig.5(a) states that when the word same is used tomodify a noun, it is translated as même and appearsafter the noun. The rule in Fig. 5(b) states thatsame physical geometry is translated intogéométrie physique identique. When translating thesentence in Fig. 5(c), both of these rules can beapplied to parts of the tree. However, they cannotbe used at the same time as they translate same todifferent words and place them on differentlocation.(a)existing cablescoaxial cablescâbles existantssame */N(b)same physical geometrycâbles coaxiaux*/N mêmegéométrie physique identique(c)the disks have the same physical geometryFigure 5. Example Conflicts(e)deux câbles coaxiaux existantsFigure 4. Examples of word ordering4.2Node OrderingFor each node in the merged structure, we mustalso determine the ordering of among it and itschildren. If a node is present in only one of theoriginal tree fragments, the ordering between it andits children will be the same as the tree fragment.Suppose a node h is found in two tree fragments.For the children of h that come from the samefragment, their order is already specified. If twochildren m1 and m2 come from different fragments,we determine their order as follows: If m1 and m2 are on different sides of h in theiroriginal fragments, their order can be inferredfrom their positions relative to h. For example,the combination of the rules in Fig. 4(b) andFig. 4(c) translate both existing cables intodeux câbles existants. If m1 and m2 are on the same side of h and theirsource language counterparts are also on thesame side of h, we maintain their relativecloseness to the parent nodes: whichever wordwas closer to the parent in the source remainsto be closer to the parent in the target. Forexample, the combination of the rules in Fig.4(c) and Fig. 4(d) translates existing coaxialcables into câbles coaxiaux existants.4.4Probabilistic ModelOur translation model is a direct translation modelas opposed to a noisy channel model which iscommonly employed in statistical machinetranslation. Given the dependency tree S of asource language sentence, the probability of thetarget dependency tree T, P(T S), is computed bydecomposing it into a set of path translations:P (T S ) max P (Ti Si )CSi Cwhere C is a set of paths covering S; Si’s are pathsin C; Ti’s are possible translations for thecorresponding Si’s and T is the merger of all Ti’s.Note that the paths in C are allowed to overlap.However, no path should be totally contained inanother, as we can always remove the shorter pathto increase the probability without compromisingthe total coverage of C.4.5Graph-theoretic FormulationIf we ignore the conflict in merging tree fragmentsand assign the weight -log P(Ti Si) to the path Si,the problem of finding the most probabletranslation can be formulated as the followinggraph theory problem:Given a tree and a collection of paths in the treewhere each path is assigned a weight. Find asubset of the paths such that they cover all thenodes and edges in the tree and have theminimum total weight.

We call this problem the Minimum PathCovering of Trees. A closely related problem isthe Minimum Set Covering Problem:Given a collection F of subset set of a given setX, find a minimum-cardinality subcollection Cof F such that the union of the subsets in C is X.Somewhat surprisingly, while the Minimum SetCovering Problem is a very well-known NPComplete problem, the problem of Minimum PathCovering of Trees has not previously been studied.It is still an open problem whether this problem isNP-Complete or has a polynomial solution.If we assume that the number of paths coveringany particular node is bounded by a constant, thereexists a dynamic programming algorithm withO(n) complexity where n is the size of the tree(Lin&Lin, 2004). In the machine translation, thisseems to be a reasonable assumption.5Experimental ResultsWe implemented a path-based English-to-FrenchMT system. The training corpus consists of theEnglish-French portion of the 1999 EuropeanParliament Proceedings1 (Koehn 2002). It consistsof 116,889 pairs of sentences (3.4 million words).As in (Koehn, et. al. 2003), 1755 sentences oflength 5-15 were used for testing. We parsed theEnglish side of the corpus with Minipar2 (Lin2002). We then performed word-align on theparsed corpus with the ProAlign system(Cherry&Lin 2003, Lin&Cherry 2003b).From the training corpus, we extracted2,040,565 distinct paths with one or moretranslations. The BLEU score of our system on thetest data is 0.2612. Compared with the English toFrench results in (Koehn et. al. 2003), this ishigher than the IBM Model 4 (0.2555), but lowerthan the phrasal model (0.3149).6language parse tree. While the number of subtreesof a tree is exponential in the size of the tree, thenumber of paths in a tree is quadratic. The reducednumber of possible transfer units makes the dataless sparse.The target parse tree in a transfer-based systemtypically does not include word order information.A separate generation module, which ofteninvolves some target language grammar rules, isused to linearize the words in the target parse tree.In contrast, our transfer rules specify linear orderamong nodes in the rule. The ordering amongnodes in different rules is determined with a coupleof simply heuristics. There is no separategeneration module and we do not need a targetlanguage grammar.6.2Translational DivergenceThe Direct Correspondence Assumption (DCA)states that the dependency tree in source and targetlanguage have isomorphic structures (Hwa et. al.2002). DCA is often violated in the presence oftranslational divergence. It has been shown in(Habash&Dorr 2002) that translational divergencesare quite common (as much as 35% betweenEnglish and Spanish). For example, Fig. 6(a) is aHead Swapping Divergence.Even though we map the dependency tree in thesource language into a dependency tree in thetarget language, we are using a weaker assumptionthan DCA. We induce a target language structureusing a source language structure and the wordalignment. There is no guarantee that this targetlanguage dependency tree is what a target languagelinguist would construct. For example, deriveddependency tree for “X cruzar Y nadando” isshown in Fig. 6(b). Even though it is not a correctdependency tree for Spanish, it does generate thecorrect word order.Related Work and Discussions6.1Transfer-based MTBoth our system and transfer-based MT systemstake a parse tree in the source language andtranslate it into a parse tree in the target languagewith transfer rules. There have been many recentproposals to acquire transfer rules automaticallyfrom word-aligned corpus (Carbonell et al 2002,Lavoie et al 2002, Richardson et al 2001). Thereare two main differences between our system andprevious transfer-based approach: the unit oftransfer and the generation module.The units of transfer in previous transfer basedapproach are usually subtrees in the source12http://www.isi.edu/ koehn/europarl/http://www.cs.ualberta.ca/ lindek/minipar.htmX swim across YX cruzar Y nadandoX cruzar Y nadandoX cross Y swimming(b)(a)Figure 6. Translational Divergence7Conclusion and Future WorkWe proposed a path-based transfer model formachine translation, where the transfer rules areautomatically acquired from a word-alignedparallel corpus. The problem of finding the mostprobable translation is formulated as a graphtheoretic problem of finding the minimum pathcovering of the source language dependency tree.

8AcknowledgementsThis research is supported by NSERC and SunMi

Phrase-based SMT approaches can be classified into two categories. One type of approach works with parse trees. In (Yamada&Knight 2001), for example, the translation model applies three operations (re-order, insert, and translate) to an English parse tree to produce its Chinese trans

Related Documents:

ebay,4life transfer factor eczema,4life transfer factor effectiveness,4life transfer factor en el salvador,4life transfer factor en espanol,4life transfer factor en español,4life transfer factor energy go stix,4life transfer factor enummi,4life transfer factor 4life transfer factor equine,4li

DAC DAC ADC ADC. X19532-062819. Figur e 2: RF-ADC Tile Structure. mX0_axis Data Path ADC VinX0 mX1_axis Data Path ADC VinX1 mX2_axisData Path ADC VinX2 mX3_axis Data Path ADCVinX3 mX3_axis mX1_axis ADC mX0_axis Data Path ADC Data Path ADC VinX_23 VinX_01 Data Path Data Path Dual RF-ADC Tile Quad RF-ADC Tile. X23275-100919. Chapter 2: Overview

Each node maintains distance to destination e.g. 4 hops to network XYZ, . found better E path 4. C is lowest path cost in TENT, place C in PATH, exame C's LSP, found better E path again 5. E is lowest path cost in TENT, place E in PATH, examine E's LSP (no better paths) 6. TENT is empty, terminate . TDC375 Winter 2002 John Kristoff - DePaul University Open shortest path first (OSPF) 27 .

an O. Henry Award for “A Worn Path.” Her award-winning memoir, One Writer’s Beginnings (1984), was a runaway bestseller. Despite being very articulate about her writing, sitting graciously through countless interviews, and receiving large numbers of young fans at her home, Welty remained a very private person who always maintainedFile Size: 1MBPage Count: 16Explore furtherShort Story Project: "A Worn Path" by Eudora Welty - YouTubewww.youtube.com"A Worn Path" by Eudora Welty - 584 Words Essay Exampleivypanda.comA Worn Path: Study Guide SparkNoteswww.sparknotes.comWhat is the meaning of the short story "A Worn Path" by .www.enotes.comShort Story Analysis: A Worn Path by Eudora Welty - The .sittingbee.comRecommended to you b

Linux Command Line: Files and Directories 9 Relative path v.s. Absolute path (full path) Relative path—path related to the present working directory (pwd) If my current directory is /home2/ydu, relative path is: Training pwd will return : /home2/ydu If change directory to home2, relative path becomes ydu/Training pwd will return: /home2

Basic Heat and Mass Transfer complements Heat Transfer,whichispublished concurrently. Basic Heat and Mass Transfer was developed by omitting some of the more advanced heat transfer material fromHeat Transfer and adding a chapter on mass transfer. As a result, Basic Heat and Mass Transfer contains the following chapters and appendixes: 1.

A Heat Transfer Model Based on Finite Difference Method for Grinding A heat transfer model for grinding has been developed based on the finite difference method (FDM). The proposed model can solve transient heat transfer problems in grind-ing, and has the flexibility to deal with different boundary conditions. The model is first

Map Section 15 M-Path Overview The M-Path is a nine-mile paved multi-use path in urban Miami-Dade County. The M-Path was built in 1983 by Miami-Dade Transit (MDT) as part of the original Metrorail construction. The path or trail meanders within Miami-