Natural Language Understanding In A Continuous Space


Natural Language Understanding in a Continuous Space
Karl-Moritz Hermann, Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom
phil.blunsom@cs.ox.ac.uk

Features and NLP
Twenty years ago log-linear models freed us from the shackles of simple multinomial parametrisations, but imposed the tyranny of feature engineering.

Features and NLP
Distributed/neural models allow us to learn shallow features for our classifiers, capturing simple correlations between inputs.

Features and NLP
[Figure: a convolutional sentence model over "game's the same, just got more fierce" — projected sentence matrix (s = 7), wide convolution (m = 3), dynamic k-max pooling (k = f(s) = 5), wide convolution (m = 2), folding, k-max pooling (k = 3), fully connected layer.]
Deep learning allows us to learn hierarchical generalisations. Something that is proving rather useful for vision, speech, and now NLP.

Outline
1. Distributed Representations in Compositional Semantics
2. From Vector Space Compositional Semantics to MT

How to Represent Meaning in NLP
We can represent words using a number of approaches:
- Characters
- POS tags
- Grammatical roles
- Named Entity Recognition
- Collocation and distributional representations
- Task-specific features
All of these representations can be encoded in vectors. Some of these representations capture meaning.

A harder problem: paraphrase detection
Q: Do two sentences (roughly) mean the same?
"He enjoys Jazz music" = "He likes listening to Jazz"?
A: Use a distributional representation to find out?
Most representations are not sensible at the sentence level:
- Characters?
- POS tags?
- Grammatical roles?
- Named Entity Recognition?
- Collocation and distributional representations?
- Task-specific features?

Why can't we extract hierarchical features?
The curse of dimensionality: as the dimensionality of a representation increases, learning becomes less and less viable due to sparsity.
Dimensionality for collocation:
- One word per entry: size of the dictionary (small)
- One sentence per entry: number of possible sentences (infinite)
⇒ We need a different method for representing sentences.

What is Deep Learning?
Deep Learning for Language: learning a hierarchy of features, where higher levels of abstraction are derived from lower levels.

A door, a roof, a window: it's a house.
[Figure: vectors for the parts composed into a vector for the whole.]

Composition
Lots of possible ways to compose vectors:
- Addition
- Multiplication
- Kronecker product
- Tensor magic
- Matrix-vector multiplication
- ...
Requirements:
- Not commutative: Mary likes John ≠ John likes Mary
- Encode its parts? Magic carpet ≈ Magic + Carpet
- More than its parts? Memory lane ≠ Memory + Lane

Autoencoders
We want to ensure that the joint representation captures the meaning of its parts. We can achieve this by autoencoding our data at each step.
For this to work, our autoencoder minimizes an objective function over inputs x_i, i ∈ N, and their reconstructions x_i':

    J = (1/N) Σ_i ||x_i' − x_i||²
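As a rough illustration, the sketch below computes this objective with a single-layer tanh encoder and decoder; the parameter names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def autoencode(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an input vector and reconstruct it."""
    h = np.tanh(W_enc @ x + b_enc)       # joint / hidden representation
    x_rec = np.tanh(W_dec @ h + b_dec)   # reconstruction x'
    return h, x_rec

def reconstruction_objective(X, params):
    """J = (1/N) * sum_i ||x_i' - x_i||^2 over the N inputs in X."""
    W_enc, b_enc, W_dec, b_dec = params
    errors = []
    for x in X:
        _, x_rec = autoencode(x, W_enc, b_enc, W_dec, b_dec)
        errors.append(np.sum((x_rec - x) ** 2))
    return np.mean(errors)
```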

Recursive Autoencoders (RAE)
We still want to learn how to represent a full sentence (or house). To do this, we chain autoencoders to create a recursive structure.
We use a composition function g(W · input + bias), where:
- g is a non-linearity (tanh, sigmoid)
- W is a weight matrix
- b is a bias
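A minimal sketch of the composition step, assuming the two child vectors are concatenated before being multiplied by W (W, b and g follow the slide's notation; the left-to-right chaining helper is illustrative, since the slide does not fix a tree structure).

```python
import numpy as np

def compose(left, right, W, b):
    """Parent representation p = g(W [left; right] + b), with g = tanh."""
    child = np.concatenate([left, right])
    return np.tanh(W @ child + b)

def encode_sequence(vectors, W, b):
    """Chain compositions over a sequence of vectors (e.g. a sentence)."""
    node = vectors[0]
    for v in vectors[1:]:
        node = compose(node, v, W, b)
    return node
```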

A different task: paraphrase detection
Q: Do two sentences (roughly) mean the same?
"He enjoys Jazz music" = "He likes listening to Jazz"?
A: Use deep learning to find out!

Other Applications: Stick a label on top
1. Combine label and reconstruction error:

    E(N, l, θ) = Σ_{n ∈ N} Erec(n, θ) + Elbl(v_n, l, θ)
    Erec(n, θ) = ½ ||[x_n; y_n] − r_n||²
    Elbl(v, l, θ) = ½ ||l − v||²

2. Strong results for a number of tasks:
- Sentiment analysis
- Paraphrase detection
- Image search
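A hedged sketch of the per-node error: the slide only gives the label error as a squared distance ½||l − v||², so the softmax label predictor U_lbl used here to obtain v from the node vector is a hypothetical addition.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def node_error(x, y, parent, W_dec, b_dec, U_lbl, label):
    """Reconstruction error plus label error at one tree node."""
    child = np.concatenate([x, y])
    rec = np.tanh(W_dec @ parent + b_dec)   # reconstruction r_n of [x_n; y_n]
    e_rec = 0.5 * np.sum((child - rec) ** 2)
    v = softmax(U_lbl @ parent)             # hypothetical label prediction from the node
    e_lbl = 0.5 * np.sum((label - v) ** 2)  # label given as a one-hot / target vector l
    return e_rec + e_lbl
```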

Convolution Sentence Models
Deep learning is supposed to learn the features for us, so can we do away with all this structural engineering and forget about latent parse trees?

Convolution Sentence Models
[Figure: convolution windows of width m = 2 and m = 3 applied over the sentence "Open the pod bay doors HAL".]

A CSM for Dialogue Act Tagging
A: My favourite show is Masterpiece Theatre. — Statement-Non-Opinion
A: Do you like it by any chance? — Yes-No-Question
B: Oh yes! — Yes-Answers
A: You do! — Declarative Yes-No-Question
B: Yes, very much. — Yes-Answers
A: Well, wouldn't you know. — Exclamation
B: As a matter of fact, I prefer public television. — Statement-Non-Opinion
B: And, uh, I have, particularly enjoy English comedies. — Statement-Non-Opinion

A CSM for Dialogue Act Tagging
Dave: Hello HAL, do you read me HAL?
HAL: Affirmative, Dave, I read you.
Dave: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave, I'm afraid I can't do that.
[Figure: a recurrent network over the CSM sentence vectors s_i, with matrices H^Dave and O^HAL, producing hidden states h_i and predictions p_i.]

    h_i = g(I x_{i−1} + H_{i−1} h_{i−1} + S s_i)
    p_i = softmax(O_i h_i)

State-of-the-art results while allowing online processing of dialogue.
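A sketch of this recurrent tagger over CSM sentence vectors. Indexing H and O by speaker follows the H^Dave / O^HAL labels in the figure, and treating x_{i−1} as the previous sentence vector is an assumption; all names and shapes are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag_dialogue(sent_vecs, speakers, I, H, O, S, x0, h0):
    """sent_vecs: CSM sentence vectors s_i; speakers: speaker id per turn.
    H and O are dicts mapping a speaker id to its recurrence/output matrix."""
    h_prev, x_prev = h0, x0
    predictions = []
    for i, (s_i, spk) in enumerate(zip(sent_vecs, speakers)):
        spk_prev = speakers[i - 1] if i > 0 else spk
        h_i = np.tanh(I @ x_prev + H[spk_prev] @ h_prev + S @ s_i)
        p_i = softmax(O[spk] @ h_i)        # distribution over dialogue acts
        predictions.append(p_i)
        h_prev, x_prev = h_i, s_i          # current sentence feeds the next step
    return predictions
```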

Convolution Sentence Models: Question Answering
?x : have-population-of(vancouver, x)
"What is the population of Vancouver ?"
Competitive with a template-based approach with lots of hand-engineered features.

Convolution Sentence Models
[Figure: the full model built up over "The cat sat on the red mat" — projected sentence matrix (s = 7), wide convolution (m = 3), dynamic k-max pooling (k = f(s) = 5), wide convolution (m = 2), folding, k-max pooling (k = 3), fully connected layer.]
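A sketch of the k-max pooling step used throughout this stack: for each feature dimension, keep the k largest activations in their original left-to-right order. The dynamic pooling width k = f(s) is taken here as max(k_top, ceil((L − l)/L · s)), which is an assumption about the schedule rather than a detail given on the slide.

```python
import numpy as np

def k_max_pool(matrix, k):
    """matrix: (dims, sentence_length) activations from a wide convolution."""
    pooled = []
    for row in matrix:
        idx = np.sort(np.argsort(row)[-k:])  # positions of the k largest values
        pooled.append(row[idx])              # keep their original ordering
    return np.array(pooled)

def dynamic_k(s, L, l, k_top):
    """Assumed pooling width at layer l of L for a sentence of length s."""
    return max(k_top, int(np.ceil((L - l) / L * s)))
```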

Small Sentiment Task
Sentiment prediction on the Stanford movie reviews dataset.

Large Sentiment Task
Accuracy on the larger Twitter sentiment dataset.

Question Classification Task
Six-way question classification on the TREC questions dataset. The second column details the external features used; the first four results are respectively from Li and Roth (2002), Blunsom et al. (2006), Huang et al. (2008) and Silva et al. (2011).

Classifier   Features                                                      Acc. (%)
HIER         unigram, POS, head chunks, NE, semantic relations               91.0
MaxEnt       unigram, bigram, trigram, POS, chunks, NE, supertags,
             CCG parser, WordNet                                             92.6
MaxEnt       unigram, bigram, trigram, POS, wh-word, head word,
             word shape, parser, hypernyms, WordNet                          93.6
SVM          unigram, POS, wh-word, head word, parser, hypernyms,
             WordNet, 60 hand-coded rules                                    95.0
MAX-TDNN     unsupervised vectors                                            84.4
NBoW         unsupervised vectors                                            88.2
DCNN         unsupervised vectors                                            93.0

The DCNN outperforms the other neural models and approaches the heavily hand-engineered classifiers without any external features.

Feature: not only ... but
[Figure: phrases from the data that most strongly activate this feature detector.]

Feature: as ... as ...
[Figure: phrases from the data that most strongly activate this feature detector.]

Feature:
[Figure: a feature detector capturing positive movie-review vocabulary such as "quirky", "fascinating", "impressive", "amusing".]

Outline
1. Distributed Representations in Compositional Semantics
2. From Vector Space Compositional Semantics to MT

Generalisation in MT
Source: 我 一杯白葡萄酒。
Generalisation → (lambda calculus) → Generation
Target: i 'd like a glass of white wine , please .
Formal logical representations are very hard to learn from data. Let us optimistically assume a vector space and see how we go.

Generation
A simple distributed representation language model:

    p_n = C_2 R(w_{n−2}) + C_1 R(w_{n−1})
    p(w_n | w_{n−1}, w_{n−2}) ∝ exp(R(w_n)^T p_n)

This is referred to as a log-bilinear model.

Generation
A simple distributed representation language model:

    p_n = C_2 R(w_{n−2}) + C_1 R(w_{n−1})
    p(w_n | w_{n−1}, w_{n−2}) ∝ exp(R(w_n)^T σ(p_n))

Adding a non-linearity gives a version of what is often called a neural, or continuous space, LM.
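A sketch of this trigram model, with the non-linearity optional so the same function covers the log-bilinear and the neural variant; R, C1 and C2 are illustrative names for the word-embedding and context matrices.

```python
import numpy as np

def next_word_distribution(w_prev1, w_prev2, R, C1, C2, nonlinear=True):
    """R: (vocab, dim) embeddings; C1, C2: (dim, dim) context matrices."""
    p_n = C2 @ R[w_prev2] + C1 @ R[w_prev1]   # combine context representations
    if nonlinear:
        p_n = np.tanh(p_n)                    # neural / continuous-space LM variant
    scores = R @ p_n                          # R(w)^T p_n for every word w
    e = np.exp(scores - scores.max())
    return e / e.sum()                        # p(w_n | w_{n-1}, w_{n-2})
```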

Conditional Generation
[Figure: the target-side trigram context combined with a convolution sentence model (CSM) over the source words s_1 ... s_8.]

    p_n = C_2 R(t_{n−2}) + C_1 R(t_{n−1}) + CSM(n, s)
    p(t_n | t_{n−1}, t_{n−2}, s) ∝ exp(R(t_n)^T σ(p_n))

Conditional Generation: A Naive First Model

    p_n = C_2 R(t_{n−2}) + C_1 R(t_{n−1}) + Σ_{j=1}^{s} S(s_j)
    p(t_n | t_{n−1}, t_{n−2}, s) ∝ exp(R(t_n)^T σ(p_n))
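A sketch of the naive conditional model: the source conditioning is just the sum of the source word vectors added to the target-side trigram context. S, R_t, C1 and C2 are illustrative names, and the shared dimensionality of source and target vectors is an assumption.

```python
import numpy as np

def conditional_distribution(t_prev1, t_prev2, source_ids, R_t, S, C1, C2):
    """R_t: target embeddings (vocab, dim); S: source embeddings; C1, C2: (dim, dim)."""
    source_vec = S[source_ids].sum(axis=0)                 # bag of words: sum_j S(s_j)
    p_n = C2 @ R_t[t_prev2] + C1 @ R_t[t_prev1] + source_vec
    scores = R_t @ np.tanh(p_n)                            # R(t_n)^T sigma(p_n)
    e = np.exp(scores - scores.max())
    return e / e.sum()                                     # p(t_n | t_{n-1}, t_{n-2}, s)
```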

Conditional Generation: A Naive First Model
CLM output: may i have a wake-up call at seven tomorrow morning ?
Source: 明天 早上 七点 叫醒 我 好 ?

Conditional Generation: A Naive First Model
CLM output: where 's the currency exchange office ?
Source: 在 哪里 ?

Conditional Generation: A Naive First Model
CLM output: i 'd like a glass of white wine , please .
Source: 我 一 杯 白 葡萄酒 。

Conditional Generation: A Naive First Model
CLM output: i 'm going to los angeles this afternoon .
Source: 今天 下午 准 去 洛杉 。

Conditional Generation: A Naive First Model
CLM output: i 'd like to have a room under thirty dollars a night .
Source: 我 想 要 一 晚 三十 美元 以下 的房 。

Conditional Generation: A Naive First Model
CLM output: i 'd like to have a room under thirty dollars a night .
Source: 我 想 要 一 晚 三十 美元 以下 的房 。
Rough gloss: I would like a night thirty dollars under room.

Conditional Generation: A Naive First Model
CLM output: i 'd like to have a room under thirty dollars a night .
Source: 我 想 要 一 晚 三十 美元 以下 的房 。
Google Translate: I want a late thirties under 's room.

Conditional Generation: A Naive First Model
CLM output: you have to do something about it .
Source: 不 想想 的 我 会 的。

Conditional Generation: A Naive First Model
CLM output: i can n't urinate .
Source: 不 想想 的 我 会 的。

Conditional Generation: Small test dataset
BLEU score results on a small Chinese → English translation task:

                                                 test 1   test 2
cdec (state-of-the-art MT)                        50.1     58.9
Direct (naive bag-of-words source)                30.8     33.2
Direct (convolution, p(en|zh))                    44.6     50.4
Noisy channel (convolution, p(zh|en) p(en))       50.1     51.8
Noisy channel + Direct                            51.0     55.2

Summary
Advantages:
- Unsupervised feature extraction alleviates domain and language dependencies.
- Very compact models.
- Distributed representations for words naturally include morphological properties.
- The conditional generation framework easily permits additional context, such as dialogue- and domain-level vectors.
Challenges:
- Better conditioning on sentence position for long sentences, and all the other things this model does not capture!
- Handling rare and unknown words.

References
Compositional Morphology for Word Representations and Language Modelling. Botha and Blunsom. ICML 2014.
Multilingual Models for Compositional Distributed Semantics. Hermann and Blunsom. ACL 2014.
A Convolutional Neural Network for Modelling Sentences. Kalchbrenner, Grefenstette, and Blunsom. ACL 2014.
Recurrent Continuous Translation Models. Kalchbrenner and Blunsom. EMNLP 2013.
The Role of Syntax in Vector Space Models of Compositional Semantics. Hermann and Blunsom. ACL 2013.

Computational Linguistics at the University of Oxford
We are growing! Postdoctoral and DPhil studentships are available working in Machine Learning and Computational Linguistics.
http://www.clg.ox.ac.uk

