Introduction To Machine Translation - Anoop Kunchukuttan

Introduction to Machine Translation
Anoop Kunchukuttan
Microsoft Translator, Hyderabad
NLP Course, IIT Hyderabad, 16 May 2020

Outline
- Introduction
- Statistical Machine Translation
- Neural Machine Translation
- Evaluation of Machine Translation
- Multilingual Neural Machine Translation
- Summary

What is Machine Translation?
Automatic conversion of text/speech from one natural language to another.
Example: "Be the change you want to see in the world" -> वह परिवर्तन बनो जो संसार में देखना चाहते हो
Use cases:
- Government: administrative requirements, education, security
- Enterprise: product manuals, customer support
- Social: travel (signboards, food), entertainment (books, movies, videos)
Translation under the hood:
- Cross-lingual Search
- Cross-lingual Summarization
- Building multilingual dictionaries
Any multilingual NLP system will involve some kind of machine translation at some level.

Language Divergence
Word order: SOV (Hindi), SVO (English)
E: Germany won the last World Cup (SVO)
H: जर्मनी ने पिछला विश्व कप जीता था (SOV)
Free (Hindi) vs rigid (English) word order:
- The last World Cup Germany won (correct)
- The last World Cup won Germany (grammatically incorrect / meaning changes)
Language divergence: the great diversity among the languages of the world.
The central problem of MT is to bridge this language divergence.

Why is Machine Translation difficult?
- Ambiguity
  - Same word, multiple meanings: मंत्री (minister or chess piece)
  - Same meaning, multiple words: जल, पानी, नीर (water)
- Word Order
  - Underlying deeper syntactic structure
  - Phrase structure grammar? Computationally intensive
- Morphological Richness
  - Identifying the basic units/internal structure of words
  - घरामागचा = घर + माग + चा: that which is behind the house

Why should you study Machine Translation?
- One of the most challenging problems in Natural Language Processing
- Pushes the boundaries of NLP
- Involves analysis as well as synthesis
- Involves all layers of NLP: morphology, syntax, semantics, pragmatics, discourse
- Theory and techniques in MT are applicable to a wide range of other problems like transliteration, speech recognition and synthesis, and other NLP problems

Approaches to build MT systems
- Knowledge-based, rule-based MT: transfer-based, interlingua-based
- Data-driven, machine learning based MT: example-based, statistical, neural

Outline
- Introduction
- Statistical Machine Translation
- Neural Machine Translation
- Evaluation of Machine Translation
- Multilingual Neural Machine Translation
- Summary

Statistical Machine Translation

Parallel Corpus
A boy is sitting in the kitchen | एक लडका िसोई मे बैठा है
A boy is playing tennis | एक लडका टे ननस खेल िहा है
A boy is sitting on a round table | एक लडका एक गोल मेज पि बैठा है
Some men are watching tennis | कुछ आदमी टे ननस दे ख िहे है
A girl is holding a black book | एक लडकी ने एक काली ककर्ाब पकडी है
Two men are watching a movie | दो आदमी चलचचत्र दे ख िहे है
A woman is reading a book | एक औिर् एक ककर्ाब पढ िही है
A woman is sitting in a red car | एक औिर् एक काले काि मे बैठी है

Let's formalize the translation process
We will model translation using a probabilistic model. Why?
- We would like to have a measure of confidence for the translations we learn
- We would like to model uncertainty in translation
Notation: E is the target language, F the source language; e is a target language sentence, f a source language sentence.
Best translation: e* = argmax_e P(e|f). How do we model this quantity?
Model: a simplified and idealized understanding of a physical process.
We must first explain the process of translation.

We explain translation using the Noisy Channel Model, a very general framework for many NLP problems:
- A target sentence e is generated
- The channel corrupts the target: the source sentence f is a corruption of the target sentence
- Translation is the process of recovering the original signal given the corrupted signal
Why use this counter-intuitive way of explaining translation?
- It makes it easier to mathematically represent translation and learn probabilities: P(e|f) is proportional to P(f|e) * P(e)
- Fidelity (the translation model P(f|e)) and fluency (the language model P(e)) can be modelled separately

Let's assume we know how to learn n-gram language models P(e).
Let's see how to learn the translation model P(f|e).
To learn sentence translation probabilities, we first need to learn word-level translation probabilities.

Key Idea 1: Co-occurrence of translated words
Words which occur together in a parallel sentence pair are likely to be translations (higher P(f|e)).
Parallel Corpus:
A boy is sitting in the kitchen | एक लडका िसोई मे बैठा है
A boy is playing tennis | एक लडका टे निस खेल िहा है
A boy is sitting on a round table | एक लडका एक गोल मेज पि बैठा है
Some men are watching tennis | कुछ आदमी टे निस दे ख रहे है
A girl is holding a black book | एक लडकी ने एक काली ककर्ाब पकडी है
Two men are watching a movie | दो आदमी चलचचत्र दे ख रहे है
A woman is reading a book | एक औिर् एक ककर्ाब पढ िही है
A woman is sitting in a red car | एक औिर् एक काले काि मे बैठा है

Key Idea 2: Constraints
A source word can be aligned to only a small number of target language words in a parallel sentence.

Given a parallel sentence pair, find word-level correspondences.
This set of links for a sentence pair is called an 'ALIGNMENT'.

But there are multiple possible alignments.
With one sentence pair (Sentence 1), we cannot find the correct alignment.

Can we find alignments if we have multiple sentence pairs (Sentence 1, Sentence 2)?
Yes, let's see how to do that.

If we knew the alignments, we could compute P(f|e):
P(f|e) = #(f, e) / #(*, e)
where #(a, b) is the number of times word a is aligned to word b (counted over all sentence pairs), and #(*, e) is the total number of alignment links involving e.
e.g. P(Prof | प्रोफ) can be read off the alignment counts from Sentence 1 and Sentence 2.

But we can find the best alignment only if we know the word translation probabilities.
The best alignment is the one that maximizes the sentence translation probability:
P(f, a | e) = P(a) * prod_{i=1..m} P(f_i | e_{a_i})
a* = argmax_a prod_{i=1..m} P(f_i | e_{a_i})
This is a chicken-and-egg problem! How do we solve this?

We can solve this problem using a two-step, iterative process.
Start with random values for the word translation probabilities.
Step 1: Estimate alignment probabilities using the word translation probabilities.
Step 2: Re-estimate the word translation probabilities.
- We don't know the best alignment, so we consider all alignments while estimating word translation probabilities
- Instead of taking only the best alignment, we weigh the word alignments with the alignment probabilities
P(f|e) = expected #(f, e) / expected #(*, e)
Repeat Steps (1) and (2) till the parameters converge.
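A minimal sketch of this EM loop, in the spirit of IBM Model 1 (the toy corpus, uniform initialization and iteration count below are illustrative assumptions; the NULL word and other refinements are omitted):

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """corpus: list of (source_tokens, target_tokens) pairs.
    Returns word translation probabilities t[e][f] = P(f|e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Start with uniform (effectively "random") translation probabilities
    t = defaultdict(lambda: defaultdict(lambda: 1.0 / len(f_vocab)))
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))  # expected #(f, e)
        total = defaultdict(float)                       # expected #(*, e)
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[e][f] for e in es)          # Step 1: alignment probabilities
                for e in es:
                    delta = t[e][f] / norm               # soft count: all alignments weighed
                    count[e][f] += delta
                    total[e] += delta
        for e in count:                                  # Step 2: re-estimate P(f|e)
            for f in count[e]:
                t[e][f] = count[e][f] / total[e]
    return t

# Toy (source, target) corpus -- made-up example, not from the slides
corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = train_ibm1(corpus)
print(t["the"]["das"])   # converges towards a high value over the iterations
```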

At the end of the process we have learnt word translation probabilities and, as a by-product, word alignments for the example sentence pairs.
Expectation-Maximization Algorithm: guaranteed to converge, though possibly to a local optimum.
Hence we need good initialization and training regimens.

IBM Models
- IBM came up with a series of increasingly complex models, called Models 1 to 5
- They differed in their assumptions about the alignment probability distributions
- Simpler models are used to initialize the more complex models
- This pipelined training helped ensure better solutions

Phrase Based SMT
Why stop at learning word correspondences?
KEY IDEA: use a "phrase" (a sequence of words) as the basic translation unit.
Note: the term 'phrase' is not used in a linguistic sense.
(Learnt from the same parallel corpus as before.)

Examples of phrase pairs
The Prime Minister of India | भािर् के प्रधान मंत्री | bhArata ke pradhAna maMtrI | India of Prime Minister
is running fast | र्ेज भाग िहा है | teja bhAg rahA hai | fast run-continuous is
honoured with | से सम्माननर् ककया | se sammanita kiyA | with honoured did
Rahul lost the match | िाहुल मुकाबला हाि गया | rAhula mukAbalA hAra gayA | Rahul match lost

Benefits of PB-SMT
Local Reordering: intra-phrase re-ordering can be memorized
- The Prime Minister of India | भािर् के प्रधान मंत्री | bhaarat ke pradhaan maMtrI | India of Prime Minister
Sense disambiguation based on local context: neighbouring words help make the choice
- heads towards Pune | पणु े की ओि जा िहे है | pune ki or jaa rahe hai | Pune towards go-continuous is
- heads the committee | सममनर् की अध्यक्षर्ा किर्े है | Samiti kii adhyakshata karte hai | committee of leading verbalizer is

Benefits of PB-SMT (2)
Handling institutionalized expressions: institutionalized expressions and idioms can be learnt as a single unit
- hung assembly | त्रत्रशंकु पवधानसभा | trishanku vidhaansabha
- Home Minister | गहृ मंत्री | gruh mantrii
- Exit poll | चुनाव बाद सवेक्षण | chunav baad sarvekshana
Improved Fluency: the phrases can be arbitrarily long (even entire sentences)

Mathematical Model
Let's revisit the decision rule for the SMT model: e* = argmax_e P(f|e) P(e).
Let's revisit the translation model p(f|e):
- The source sentence can be segmented into I phrases
- Then p(f|e) can be decomposed as:
  p(f|e) = prod_{i=1..I} phi(f_i | e_i) * d(start_i - end_{i-1} - 1)
where phi is the phrase translation probability, d is the distortion probability, start_i is the start position in f of the ith phrase of e, and end_i is its end position.

Learning the Phrase Translation Model
Involves structure and parameter learning:
- Learn the Phrase Table, the central data structure in PB-SMT:
  The Prime Minister of India | भािर् के प्रधान मंत्री
  is running fast | र्ेज भाग िहा है
  the boy | लड़के को
  with the telescope | दिू बीन से
  Rahul lost the match | िाहुल मुकाबला हाि गया
- Learn the Phrase Translation Probabilities:
  Prime Minister of India | भािर् के प्रधान मंत्री (India of Prime Minister) | 0.75
  Prime Minister of India | भािर् के भूर्पूवत प्रधान मंत्री (India of former Prime Minister) | 0.02
  Prime Minister of India | प्रधान मंत्री (Prime Minister) | 0.23

Learning Phrase Tables from Word Alignments
- Start with word alignments
- Word alignment is a reliable input for phrase table learning: high accuracy reported for many language pairs
- Central Idea: a consecutive sequence of aligned words constitutes a "phrase pair"
- Which phrase pairs to include in the phrase table? (see the example and the extraction sketch below)

Source: SMT, Philipp Koehn
Example of consistent phrase pairs extracted from one aligned sentence pair:
Professor CNR | प्रोफेसि सी.एन.आि
Professor CNR Rao | प्रोफेसि सी.एन.आि िाव
Professor CNR Rao was | प्रोफेसि सी.एन.आि िाव को
honoured with the Bharat Ratna | भािर्ित्न से सम्माननर् ककया
honoured with the Bharat Ratna | भािर्ित्न से सम्माननर् ककया गया
honoured with the Bharat Ratna | को भािर्ित्न से सम्माननर् ककया गया
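A sketch of the standard consistency-based phrase-pair extraction (simplified: the usual expansion over unaligned boundary words is omitted, and the toy word alignment below is a made-up illustration):

```python
def extract_phrases(src, tgt, alignment, max_len=7):
    """Extract phrase pairs consistent with a word alignment.
    src, tgt: token lists; alignment: set of (src_idx, tgt_idx) links."""
    phrases = set()
    n = len(src)
    for s1 in range(n):
        for s2 in range(s1, min(n, s1 + max_len)):
            # Target positions linked to the chosen source span
            tgt_pos = [j for (i, j) in alignment if s1 <= i <= s2]
            if not tgt_pos:
                continue
            t1, t2 = min(tgt_pos), max(tgt_pos)
            # Consistency check: no link from inside the target span
            # may point outside the source span
            if any(not (s1 <= i <= s2) for (i, j) in alignment if t1 <= j <= t2):
                continue
            if t2 - t1 < max_len:
                phrases.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return phrases

src = "Prime Minister of India".split()
tgt = "भािर् के प्रधान मंत्री".split()
# Hypothetical word alignment as (source index, target index) pairs
align = {(0, 2), (1, 3), (2, 1), (3, 0)}
for pair in sorted(extract_phrases(src, tgt, align)):
    print(pair)   # includes ("Prime Minister", "प्रधान मंत्री"), ("of India", "भािर् के"), etc.
```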

Discriminative Training of PB-SMT
- Directly model the posterior probability p(e|f)
- Use the Maximum Entropy (log-linear) framework: h_i(f, e) are feature functions, lambda_i are feature weights
- Benefits:
  - Can add arbitrary features to score the translations
  - Can assign a different weight to each feature
  - Assumptions of the generative model may be incorrect
- Feature weights lambda_i are learnt during tuning
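A tiny sketch of the log-linear scoring this framework implies (the feature names and numbers are invented for illustration; in a real system the weights are tuned on a development set, e.g. with MERT):

```python
def loglinear_score(features, weights):
    """Unnormalized log-linear score: sum_i lambda_i * h_i(f, e)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values h_i(f, e) for one candidate translation
features = {"log_phrase_prob": -4.2, "log_lm": -7.1, "distortion": -2.0, "word_penalty": 6.0}
weights  = {"log_phrase_prob": 1.0, "log_lm": 0.5, "distortion": 0.3, "word_penalty": -0.1}
print(loglinear_score(features, weights))
# The decoder searches for the translation e that maximizes this score
```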

Typical SMT Pipeline (diagram)
Parallel corpus -> word alignment -> phrase extraction -> phrase table; distortion modelling and other feature extractors; target-language monolingual corpus -> language model; tuning corpus -> tuning -> model parameters; decoder -> target sentence.

Decoding
Example: Ram ate rice with the spoon -> िाम ने चम्मच से चावल खाये
Decoding is searching for the best translation in the space of all translations.

Decoding is challenging
- We picked the phrase translation that made sense to us; the computer has less intuition
- The phrase table may give many options to translate the input sentence
- There are multiple possible word orders
Example phrase options:
Ram -> िाम / िाम ने / िाम को / िाम से
ate -> खाये / खा मलया / खा मलया है
rice -> धान / चावल
with -> के साा / से
the spoon -> चम्मच / चम्मच से / चम्मच के साा
This is an NP-complete search problem and needs a heuristic search method.

Search Space and Search Organization
- Incremental construction: partial hypotheses are expanded step by step until a final hypothesis covers the whole input
- Each hypothesis is scored using the model
- Promising hypotheses are maintained in a bounded priority queue
- Limit the reordering window for efficiency

We have looked at a basic phrase-based SMT system.
This system can learn word and phrase translations from parallel corpora.
But many important linguistic phenomena need to be handled:
- Divergent word order
- Rich morphology
- Named entities and out-of-vocabulary words

Getting word order right
Phrase-based MT is not good at learning word ordering.
Solution: help PB-SMT with some preprocessing of the input - change the order of words in the input sentence to match the order of the words in the target language.
Bahubali earned more than 1500 crore rupees at the boxoffice
-> Bahubali the boxoffice at 1500 crore rupees earned
-> बाहुबली ने बॉक्सओकफस पि 1500 किोड रुपए कमाए

Parse the sentence to understand its syntactic structure, then apply rules to transform the tree:
VP -> VBD NP PP  becomes  VP -> PP NP VBD
This rule captures the Subject-Verb-Object to Subject-Object-Verb divergence.

Prepositions in English become postpositions in Hindi:
PP -> IN NP  becomes  PP -> NP IN
The new input to the machine translation system is: Bahubali the boxoffice at 1500 crore rupees earned
Now we can translate with little reordering: बाहुबली ने बॉक्सओकफस पि 1500 किोड रुपए कमाए
These rules can be written manually or learnt from parse trees.
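A minimal sketch of applying these two rules to a parse tree with NLTK (the simplified parse below is hand-written for illustration, not the output of an actual parser):

```python
from nltk.tree import Tree

def reorder(tree):
    """Apply two hand-written pre-reordering rules recursively:
    VP -> VBD NP PP  becomes  VP -> PP NP VBD   (verb moves to the end)
    PP -> IN NP      becomes  PP -> NP IN       (preposition becomes postposition)"""
    if not isinstance(tree, Tree):
        return tree
    children = [reorder(c) for c in tree]
    labels = [c.label() for c in children if isinstance(c, Tree)]
    if tree.label() == "VP" and labels == ["VBD", "NP", "PP"]:
        children = list(reversed(children))
    elif tree.label() == "PP" and labels == ["IN", "NP"]:
        children = list(reversed(children))
    return Tree(tree.label(), children)

# Hypothetical, simplified parse of the example sentence
t = Tree.fromstring(
    "(S (NP (NNP Bahubali)) (VP (VBD earned) "
    "(NP (QP (JJR more) (IN than) (CD 1500) (NN crore)) (NNS rupees)) "
    "(PP (IN at) (NP (DT the) (NN boxoffice)))))")
print(" ".join(reorder(t).leaves()))
# Bahubali the boxoffice at more than 1500 crore rupees earned
```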

Addressing Rich Morphology
Inflectional forms of the Marathi word घि (house) express: house, in the house, on the house, below the house, behind the house, of the house, that which is behind the house, in front of the house, in front of the houses, and so on.
Hindi words with the suffix वाद: e.g. जार्ी वाद (casteism), साम्राज्य वाद (imperialism).
The corpus should contain all variants to learn translations. This is infeasible!
Language is very productive; you can combine words and morphemes to generate new words.

Addressing Rich Morphology (continued)
Inflectional forms of the Marathi word घि:
घि | house
घि ाा र् | in the house
घि ाा विर्ी | on the house
घि ाा खाली | below the house
घि ाा मध्ये | in the house
घि ाा मागे | behind the house
घि ाा चा | of the house
घि ाा माग चा | that which is behind the house
घि ाा समोि | in front of the house
घि ाा समोि चा | that which is in front of the house
घि ाा ां समोि | in front of the houses
Hindi words with the suffix वाद: साम्य वाद (communism), समाज वाद (socialism), पंजू ी वाद (capitalism), जार्ी वाद (casteism), साम्राज्य वाद (imperialism)
Solution:
- Break the words into their component morphemes
- Learn translations for the morphemes
- It is far more likely to find the morphemes in the corpus

Handling Names and OOVs
Some words not seen during training will be seen at test time; these are out-of-vocabulary (OOV) words.
Names are one of the most important categories of OOVs: there will always be names not seen during training.
How do we translate names like Sachin Tendulkar to Hindi?
What we want to do is map the Roman characters to Devanagari so that they sound the same when read: सचचन र्ें दलु कि
We call this process 'transliteration'. It can be seen as a simple translation problem at the character level with no re-ordering:
s a c h i n -> स च ि न

Outline
- Introduction
- Statistical Machine Translation
- Neural Machine Translation
- Evaluation of Machine Translation
- Multilingual Neural Machine Translation
- Summary

Neural Machine Translation

Topics
- Why NMT?
- Encoder-Decoder Models
- Attention Mechanism
- Backtranslation
- Subword-level Models

SMT, rule-based MT and example-based MT manipulate symbolic representations of knowledge
- Every word has an atomic representation (e.g. a one-hot vector) which can't be further analyzed
- No notion of similarity or relationship between words: even if we know the translation of home, we can't translate house if it is an OOV
- Difficult to represent new concepts: we cannot say anything about 'mansion' if it comes up at test time; this creates problems for the language model as well (a whole area of smoothing exists to overcome this problem)
- Symbolic representations are discrete representations: generally computationally expensive to work with, e.g. reordering requires evaluation of an exponential number of candidates

Neural network techniques work with distributed representations
- Every word is represented by a vector of numbers (word vectors or embeddings), e.g. tap = [0.2, 0.6, 0.4, ...]
- No single element of the vector represents a particular word; the word can be understood only with all vector elements together (hence 'distributed'), though this is less interpretable
- Can define similarity between words with vector similarity measures like cosine similarity: since the representations of home and house are similar, we may be able to translate house as well
- New concepts can be represented using a vector with different values
- Distributed representations are continuous representations: generally computationally more efficient to work with continuous values, especially for optimization problems
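A small illustration of word-vector similarity with cosine distance (the 4-dimensional vectors below are made up; real embeddings are learned from data and have hundreds of dimensions):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Illustrative embeddings only
home  = np.array([0.9, 0.1, 0.4, 0.2])
house = np.array([0.8, 0.2, 0.5, 0.1])
tap   = np.array([0.1, 0.9, 0.0, 0.7])

print(cosine(home, house))  # high: related words end up close in the vector space
print(cosine(home, tap))    # lower: unrelated words are further apart
```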

Topics
- Why NMT?
- Encoder-Decoder Models
- Attention Mechanism
- Backtranslation
- Subword-level Models

Encode-Decode Paradigm
Pipeline: Input -> Embedding -> Encoder -> Source Representation -> Decoder -> Output
- The entire input sequence is processed before generation starts (in PBSMT, generation was piecewise)
- The input is a sequence of words, processed one at a time
- While processing a word, the network needs to know what it has seen so far in the sequence, i.e. the history of the sequence processing
- This needs a special kind of neural network: a recurrent neural network unit which can keep state information

Encode-Decode Paradigm Explained
Use two RNN networks: the encoder and the decoder.
(1) The encoder processes the input sequence one element at a time (hidden states h0, h1, ..., h4 for the example 'I read the book')
(2) A representation of the sentence is generated
(3) This is used to initialize the decoder state
(4) The decoder generates one element at a time (states s0, s1, ..., s4)
(5) Generation continues till the end-of-sequence tag (EOS) is generated
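A minimal sketch of this paradigm in PyTorch (vocabulary sizes, dimensions and the toy batch are illustrative assumptions; a practical system adds attention, beam search, dropout, etc.):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal RNN encoder-decoder (no attention), for illustration only."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)   # projects decoder state to vocabulary scores

    def forward(self, src_ids, tgt_ids):
        # (2)-(3): encode the full source; the final state initializes the decoder
        _, state = self.encoder(self.src_emb(src_ids))
        # (4): the decoder consumes the gold previous target words (teacher forcing)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)               # one distribution per target position

# Hypothetical batch: one source sentence of 4 token ids, one target of 3 token ids
model = Seq2Seq(src_vocab=10000, tgt_vocab=12000)
logits = model(torch.tensor([[5, 42, 7, 99]]), torch.tensor([[1, 17, 23]]))
print(logits.shape)  # torch.Size([1, 3, 12000])
```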

What is the decoder doing at each time-step?
The RNN/LSTM state captures the source representation and the target history y_<j; it is passed through a feed-forward layer and a softmax to produce a distribution over the next target word y_j.

Training an NMT Model
- Maximum Likelihood Estimation, optimized with Stochastic Gradient Descent or variants like ADAM, in mini-batches
- End-to-end training
- Teacher forcing: the gold-standard previous word is used as decoder input, otherwise performance deteriorates
- This creates a discrepancy between train and test scenarios; solutions: scheduled sampling
- The word-level objective is only an approximation to sentence-level objectives
- The likelihood objective is different from the evaluation metrics
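Continuing the Seq2Seq sketch above, a minimal training step with teacher forcing might look like this (the padding id, learning rate and batch construction are assumptions, not from the slides):

```python
import torch
import torch.nn as nn

# `model` is an instance of the Seq2Seq class sketched earlier.
# tgt_in is the gold target shifted right (fed to the decoder: teacher forcing),
# tgt_out is the gold target shifted left (the words the decoder must predict).
criterion = nn.CrossEntropyLoss(ignore_index=0)            # assume 0 is the padding id
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, as mentioned above

def train_step(src, tgt_in, tgt_out):
    optimizer.zero_grad()
    logits = model(src, tgt_in)                             # (batch, tgt_len, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)),   # word-level cross-entropy
                     tgt_out.reshape(-1))
    loss.backward()                                         # end-to-end gradients
    optimizer.step()
    return loss.item()
```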

Decoding Strategies
- Exhaustive Search: score each and every possible translation - forget it!
- Sampling
- Greedy
- Beam Search

Greedy Decoding: generate one word at a time, selecting the best next word using the distribution P(y_j | y_<j, x).
Sampling Decoding: generate one word at a time, sampling the next word from the distribution P(y_j | y_<j, x).

Greedy Search is not optimal
Example: probability of the sequence w1 w3 = 0.15, probability of the sequence w2 w2 = 0.18. Greedy decoding commits to the locally best first word and can miss the globally better sequence.
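A generic beam-search sketch over a toy step-wise distribution (the toy probabilities are invented to make greedy suboptimal; with beam_size=1 the function reduces to greedy decoding):

```python
import heapq
from math import log

def beam_search(next_probs, beam_size=2, max_len=10, eos="</s>"):
    """next_probs(prefix) -> dict {word: P(word | prefix)}.
    Keeps the beam_size highest-scoring partial hypotheses at each step."""
    beams = [(0.0, [])]                       # (log-probability, prefix)
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            if prefix and prefix[-1] == eos:  # finished hypotheses are carried over
                candidates.append((score, prefix))
                continue
            for w, p in next_probs(prefix).items():
                candidates.append((score + log(p), prefix + [w]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda x: x[0])
        if all(p and p[-1] == eos for _, p in beams):
            break
    return beams[0]

# Toy conditional distributions (only the prefixes reached in this demo are listed)
toy = {
    (): {"w1": 0.5, "w2": 0.4, "w3": 0.1},
    ("w1",): {"w3": 0.3, "w4": 0.3, "w5": 0.4},
    ("w2",): {"w2": 0.6, "w3": 0.4},
    ("w1", "w5"): {"</s>": 1.0},
    ("w2", "w2"): {"</s>": 1.0},
}
print(beam_search(lambda p: toy[tuple(p)], beam_size=1))  # greedy: w1 w5 </s>, P = 0.20
print(beam_search(lambda p: toy[tuple(p)], beam_size=2))  # beam:   w2 w2 </s>, P = 0.24
```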

Topics
- Why NMT?
- Encoder-Decoder Models
- Attention Mechanism
- Backtranslation
- Subword-level Models

The entire sentence is represented by a single vector
Problems:
- A single vector is not sufficient to capture all the syntactic and semantic complexities of a sentence
  - Solution: use a richer representation for the sentences
- Problem of capturing long-term dependencies: the decoder RNN will not be able to make use of the source sentence representation after a few time steps
  - Solution: make source sentence information available when making the next prediction; even better, make RELEVANT source sentence information available
These solutions motivate the next paradigm.

Encode-Attend-Decode Paradigm
- Represent the source sentence by the set of output vectors from the encoder (e1, e2, ..., en); call these 'annotation vectors'
- Each output vector at time t is a contextual representation of the input at time t
- Note: in the plain encode-decode paradigm, we ignore the encoder outputs

How should the decoder use the set of annotation vectors while predicting the next word?
Key Insight:
(1) Not all annotation vectors are equally important for prediction of the next element
(2) The annotation vector to use next depends on what has been generated so far by the decoder
e.g. to generate the 3rd target word, the 3rd annotation vector (hence the 3rd source word) is most important
One way to achieve this: take a weighted average of the annotation vectors, with more weight given to the annotation vectors which need more focus or attention. This averaged context vector is an input to the decoder.

Let's see an example of how the attention mechanism works during decoding.
c_i = sum_{j=1..n} a_ij * e_j
For the generation of the ith output word:
- c_i: context vector
- a_ij: annotation weight for the jth annotation vector
- e_j: jth annotation vector
(For the first output word मैं, the context vector c1 is computed from the annotation vectors e1..e4 with weights a11..a41.)

(The same computation repeats at every decoding step: fresh attention weights give the context vectors c2, c3, c4, c5, ... until the EOS tag is generated.)

How do we find the attention weights?
Let the training data help you decide: pick the attention weights that maximize the overall translation likelihood.
- A scoring function g matches the encoder and decoder states
- g can be a feedforward network or a similarity metric like the dot product
- Normalize the scores to obtain the attention weights
- The final context vector is the weighted average of the encoder outputs
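A small numerical sketch of this computation with dot-product scoring (dimensions and inputs are random placeholders, not values from the slides):

```python
import numpy as np

def attention(decoder_state, annotations):
    """Dot-product attention (one choice for the scoring function g).
    decoder_state: (d,) vector; annotations: (n, d) matrix of encoder outputs e_j.
    Returns the context vector and the attention weights."""
    scores = annotations @ decoder_state            # g(s, e_j) = s . e_j
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax -> attention weights a_j
    context = weights @ annotations                 # weighted average of annotation vectors
    return context, weights

# Illustrative sizes: 4 source positions, hidden size 3
annotations = np.random.randn(4, 3)
s = np.random.randn(3)
c, a = attention(s, annotations)
print(a, a.sum())   # weights are non-negative and sum to 1
print(c.shape)      # (3,) -- fed to the decoder when predicting the next word
```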

Let us revisit what the decoder does at time step t:
the RNN/LSTM state (capturing the history y_<j) and the attention-derived context vector (capturing the source x) are passed through a feed-forward layer and a softmax to predict the next word.

Topics
- Why NMT?
- Encoder-Decoder Models
- Attention Mechanism
- Backtranslation
- Subword-level Models

The models discussed so far do not use monolingual data.
Can monolingual data help improve NMT models?

Backtranslation
- Start with a monolingual target-language corpus T_m
- Decode it with a target-to-source (TGT-SRC) MT system to create a pseudo-parallel, backtranslated corpus (S'_m, T_m)
- Jointly train a new SRC-TGT MT system on the true parallel corpus (S_p, T_p) and the backtranslated corpus
- Need to find the right balance between the true and the backtranslated corpus
Why is backtranslation useful?
- The target-side language model improves (the target side is clean)
- Adaptation to the target-language domain
- Prevents overfitting by exposure to diverse corpora
Particularly useful for low-resource languages.

Self Training
- Train an initial SRC-TGT MT system on the true parallel corpus (S_p, T_p)
- Decode a monolingual source-language corpus S_m with it to create a pseudo-parallel, forward-translated corpus (S_m, T'_m)
- The target side of the pseudo-parallel corpus is noisy:
  - Train the SRC-TGT model on the pseudo-parallel corpus
  - Fine-tune on the true parallel corpus
Why is self-training useful?
- Adaptation to the source-language domain
- Prevents overfitting by exposure to diverse corpora
Works well if the initial model is reasonably good.

Topics
- Why NMT?
- Encoder-Decoder Models
- Attention Mechanism
- Backtranslation
- Subword-level Models

The Vocabulary Problem
- The input & output embedding layers are finite
  - How to handle an open vocabulary?
  - How to translate named entities?
- Softmax computation at the output layer is expensive
  - Proportional to the vocabulary size

Subword-level Translation
Original sentence: प्रयागिाज में 43 ददनों र्क चलने वाला माघ मेला आज से शरूु हो गया है
Possible inputs to the NMT system:
- प्रयाग @@िाज में 43 दद @@नों र्क चल @@ने वाला माघ मेला आज से शरूु हो गया है
- प्र या ग िा ज में 43 दद नों र् क च ल ने वा ला मा घ मे ला आज से शरूु हो गया है
Obvious choices: characters, character n-grams, morphemes - they all have their flaws!
The new subword representations: Byte Pair Encoding, SentencePiece

Subword segmentation pipeline:
- Learn a fixed vocabulary & segmentation model from the training data, e.g. {प्रयाग, िाज, में, दद, नों, र्क, चल, ने, ...}
- Segment the training data based on the vocabulary: प्रयाग @@िाज में 43 दद @@नों र्क चल @@ने वाला माघ मेला आज से शुरू हो गया है
- Train the NMT system on the segmented data
Notes:
- Every word can be expressed as a concatenation of subwords
- A small subword vocabulary has good representative power: 4k to 64k subwords depending on the size of the parallel corpus
- The most frequent words should not be segmented

Byte Pair Encoding
Byte Pair Encoding is a greedy compression technique (Gage, 1994).
Example: vocabulary {A, B, C, D, E, F}, string to encode BADDFADFEEDEADDEEF, number of BPE merge operations = 3, producing merged symbols such as P1 = AD and P2 = EE.
BPE segmentation is inspired by compression theory and the MDL principle (Rissanen, 1978): select the segmentation which maximizes the data likelihood.
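A compact sketch of learning BPE merges from word frequencies (the toy word list and merge count are illustrative; libraries such as subword-nmt or SentencePiece do this at scale):

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merge operations from a word-frequency dict.
    words: {word: frequency}; returns the merges in the order they were learned."""
    # Represent each word as a sequence of symbols (characters to start with)
    vocab = {tuple(w): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent symbol pair
        merges.append(best)
        new_vocab = {}                            # replace the pair with a merged symbol
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Toy corpus with made-up frequencies; real systems learn thousands of merges
print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5))
```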

Problems with subword-level translation
Unwanted splits, e.g. नािाज़ ('angry') -> ना + िाज़ ('no' + 'secret')
The problem is exacerbated for: named entities, rare words, numbers.

We can look at translation as a sequence-to-sequence transformation problem: read the entire input sequence and predict the output sequence (using a function F), e.g. 'I read the book' -> मैंने ककर्ाब पढी
- The length of the output sequence need not be the same as that of the input sequence
- The prediction at any time step t has access to the entire input
- A very general framework

Sequence-to-sequence transformation is a very general framework. Many other problems can be expressed as sequence-to-sequence transformation:
- Summarization: article -> summary
- Question answering: question -> answer
- Transliteration: character sequence -> character sequence
- Image labelling: image -> label
- Speech recognition, TTS, etc.

Note: there is no separate language model.
- Neural MT generates fluent sentences; the quality of word order is better
- No combinatorial search is required for evaluating different word orders: decoding is very efficient compared to PBSMT
- End-to-end training
- Attention acts as a soft associative lookup

Outline
- Introduction
- Statistical Machine Translation
- Neural Machine Translation
- Evaluation of Machine Translation
- Multilingual Neural Machine Translation
- Summary

Evaluation of Machine Translation

Evaluation of MT output
- How do we judge a good translation?
- Can a machine do this? Why should a machine do this?
- Because human evaluation is time-consuming and expensive, and not suitable for rapid iteration of feature improvements.

What is a good translation?
Evaluate the quality with respect to:
- Adequacy: how good the output is in terms of preserving the content of the source text
- Fluency: how good the output is as a well-formed target language entity
For example, for "I am attending a lecture":
- मैं एक व्याख्यान बैठा हूूँ (Main ek vyaakhyan baitha hoon, "I a lecture sit"): adequate but not fluent
- मैं व्याख्यान हूूँ (Main vyakhyan hoon, "I am lecture"): fluent but not adequate

Human Evaluation
Direct Assessment:
- Adequacy: is the meaning translated correctly? (5 All, 4 Most, 3 Much, 2 Little, 1 None)
- Fluency: is the sentence grammatically valid? (5 Flawless, 4 Good, 3 Non-native, 2 Disfluent, 1 Incomprehensible)
Ranking Translations: rank the outputs of different systems.

Automatic Evaluation
Human evaluation is not feasible in the development cycle.
Key idea of automatic evaluation: the closer a machine translation is to a professional human translation, the better it is.
- Given: a corpus of good quality human reference translations
- Output: a numerical "translation closeness" metric
- Given a (ref, sys) pair, compute a score f(ref, sys) in R, where sys (candidate translation) is the translation returned by the MT system and ref (reference translation) is a 'perfect' translation by humans
- Multiple references are better

Some popular automatic evaluation metrics:
- BLEU (Bilingual Evaluation Understudy)
- TER (Translation Edit Rate)
- METEOR (Metric for Evaluation of Translation with Explicit Ordering)
How good is an automatic metric? How well does it correlate with human judgment?
(Figure: metric scores plotted per system for the reference and systems M1, M2.)
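A simplified sentence-level BLEU sketch to make the "closeness" idea concrete (real implementations such as sacreBLEU work at the corpus level and add smoothing; the example sentences are invented):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precisions (n = 1..4),
    combined by a geometric mean and multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())  # clipping
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)   # avoid log(0) in this sketch
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat"
print(bleu("the cat is on the mat", ref))      # 1.0 for an exact match
print(bleu("there is a cat on the mat", ref))  # much lower score for a partial match
```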

Outline
- Introduction
- Statistical Machine Translation
- Neural Machine Translation
- Evaluation of Machine Translation
- Multilingual Neural Machine Translation
- Summary

Multilingual Neural Machine Translation

NMT models involving more than two languages.
Use-cases for Multilingual NMT:
- Massively multi-way NMT systems
- Multi-source translation
- Low-resource NMT using transfer learning
- Unseen language pairs
Raj Dabre, Chenhui Chu, Anoop Kunchukuttan. A Comprehensive Survey of Multilingual Neural Machine Translation. pre-print arxiv: 2001.01115

Diversity of Indian Languages
- Highly multilingual country: Greenberg Diversity Index 0.9
- 4 major language families
- 1600 dialects
- 22 scheduled languages
- 125 million English speakers
- 8 languages in the world's top 20 languages
- 11 languages with more than 25 million speakers
- 30 languages with more than 1 million speakers
Sources: Quora, Wikipedia, Census of India 2011

General Multilingual Neural Translation (Firat et al., 2016)
Separate encoders and decoders per language (e.g. Encoder1 for Hindi, Encoder2 for Bengali, ...) with a shared attention component, trained on parallel corpora such as Hindi-English, Telugu-English and Bengali-German.

Compact Multilingual NMT (Johnson et al., 2017)
A single shared encoder and decoder for all languages (e.g. English, German, Telugu, ...).

Combine corpora from different languages (Nguyen and Chiang, 2017)
Gujarati-English and Marathi-English examples:
- I am going home | હુ ઘરે જવ છૂ (Gujarati)
- It rained last week | છે લ્લા આઠવડિયા મા વર્ાા દ પાિયો (Gujarati)
- It is cold in Pune | पुण्यार् ा ंड आहे (Marathi)
- My home is near the market | माझा घि बाजािाजवळ आहे (Marathi)
Convert the script (Gujarati to Devanagari) and concatenate the corpora:
- I am going home | हु घिे जव छू
- It rained last week | छे ल्ला आठवडडया मा वसातद पाड्यो
- It is cold in Pune | पण्ु यार् ा ंड आहे
- My home is near the market | माझा घि बाजािाजवळ आहे

There is only one decoder; how do we generate multiple languages?
Language Tag Trick: add a special token to the input to indicate the target language.
Original Input: मकि संक्ांनर् भगवान सूयत के म
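A tiny sketch of the tagging (the `<2xx>` token format is one common convention, used here as an assumption; the actual token string can be anything the training data uses consistently):

```python
def add_lang_tag(src_sentence, tgt_lang):
    """Prepend a target-language token so a single shared decoder knows
    which language to generate."""
    return f"<2{tgt_lang}> {src_sentence}"

print(add_lang_tag("I am going home", "hi"))   # <2hi> I am going home
print(add_lang_tag("I am going home", "bn"))   # <2bn> I am going home
```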

