Natural Language Processing - FIRE


Natural Language Processing and Machine Learning: Synergy or Discord? A Case Study with MT, IR and Sentiment
FIRE 2016
Pushpak Bhattacharyya
IIT Patna and IIT Bombay
pb@cse.iitb.ac.in
9th Dec, 2016

Need for NLP
- Huge amount of language data in electronic form
- Unstructured data (like free-flowing text) will grow to 40 zettabytes (1 zettabyte = 10^21 bytes) by 2020
- How to make sense of this huge data?
- Example-1: e-commerce companies need to know the sentiment of online users, sifting through 1 lakh (100,000) e-opinions per week: this needs NLP
- Example-2: the translation industry is expected to grow to a 37 billion business by 2020

Nature of Machine Learning
- Automatically learning rules and concepts from data
- Example: learning the concept of a table. What is "tableness"?
- Rule: a flat surface with 4 legs (approximate; to be refined gradually)
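The idea of a rule that is "refined gradually" from examples can be made concrete with Find-S. This is a textbook concept-learning algorithm chosen here for illustration; the slides do not name any algorithm. It starts from the first positive example and relaxes only those attributes that later positive examples contradict. The attribute names and the examples below are hypothetical.

```python
WILDCARD = "?"  # "any value is acceptable" for this attribute

# Hypothetical labelled observations: (attributes, is_a_table)
EXAMPLES = [
    ({"surface": "flat", "legs": 4, "back": False}, True),
    ({"surface": "flat", "legs": 4, "back": True}, False),   # a chair
    ({"surface": "flat", "legs": 3, "back": False}, True),   # a three-legged table
]

def find_s(examples):
    """Return the most specific conjunctive rule that covers every positive example."""
    hypothesis = None
    for features, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = dict(features)  # first positive example, taken literally
        else:
            for attr, value in features.items():
                if hypothesis.get(attr) != value:
                    hypothesis[attr] = WILDCARD  # generalise the conflicting attribute
    return hypothesis

def matches(rule, features):
    """True if the object satisfies every (non-wildcard) condition of the rule."""
    return all(v == WILDCARD or features.get(a) == v for a, v in rule.items())

print(find_s(EXAMPLES[:1]))  # initial rule: flat surface, 4 legs, no back
print(find_s(EXAMPLES))      # refined rule: the number of legs no longer matters
```

After the first example the rule is literally "a flat surface with 4 legs". The three-legged table then forces the learner to drop the leg count. The chair remains excluded because of its back.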

Why NLP and ML?
- Impossible for humans (single or a team) to make sense of and analyse humongous text data
- Many processing steps in NLP
- Impossible to give correct, consistent and complete rules covering each and every situation
- Example rule: adjectives precede nouns ("blue sky"), but not in French! ("ciel bleu")

NLP: layered (figure: processing layers from Morphology, POS tagging and Chunking up to Discourse and Coreference, set against Language (e.g., English) and Algorithm (e.g., MEMM) dimensions)

NLP: Ambiguity Processing
- Lexical ambiguity
- Structural ambiguity
- Semantic ambiguity
- Pragmatic ambiguity

Examples
1. (Ellipsis) Amsterdam airport: "Baby Changing Room"
2. (Attachment/grouping) Public demand changes (credit for the phrase: Jayant Haritsa):
(a) Public demand changes, but does anybody listen to them?
(b) Public demand changes, and we companies have to adapt to such changes.
(c) Public demand changes have pushed many companies out of business.
3. (Pragmatics-1) The use of the shin bone is to locate furniture in a dark room.
9 Dec 2016 FIRE16:NLP-ML 7

New words and terms (people are very creative!!)
1. ROFL: rolling on the floor laughing; LOL: laugh out loud
2. facebook: to use Facebook; google: to search
3. communifake: pretending to talk on a mobile phone; Obamacare: medical care system introduced through the mediation of President Obama (portmanteau words)
4. After BREXIT (the UK's exit from the EU), in Mumbai Mirror and on Twitter: We got Brexit. What's next? Grexit. Departugal. Italeave. Fruckoff. Czechout. Oustria. Finish. Slovakout. Latervia. Byegium.

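Inter-layer interaction
Text-1: "I saw the boy with a telescope which he dropped accidentally"
Text-2: "I saw the boy with a telescope which I dropped accidentally"
Dependency parse of Text-1:
nsubj(saw-2, I-1) root(ROOT-0, saw-2) det(boy-4, the-3) dobj(saw-2, boy-4) det(telescope-7, a-6) prep_with(saw-2, telescope-7) dobj(dropped-10, telescope-7) nsubj(dropped-10, he-9) rcmod(telescope-7, dropped-10) advmod(dropped-10, accidentally-11)
Dependency parse of Text-2:
nsubj(saw-2, I-1) root(ROOT-0, saw-2) det(boy-4, the-3) dobj(saw-2, boy-4) det(telescope-7, a-6) prep_with(saw-2, telescope-7) dobj(dropped-10, telescope-7) nsubj(dropped-10, I-9) rcmod(telescope-7, dropped-10) advmod(dropped-10, accidentally-11)
The two parses differ only in the subject of "dropped" (he-9 vs. I-9). In both, "with a telescope" is attached to "saw", although in Text-1 "he dropped" suggests that the telescope was with the boy. Resolving the attachment needs discourse-level information about who "he" is.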
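A quick way to confirm that the parser output barely reacts to the pronoun is to treat each parse as a set of (relation, head, dependent) triples and take set differences. This is a minimal sketch. The relation strings are copied from the two parses, written in Stanford typed-dependency notation (prep_with), and the regular expression assumes exactly that "rel(head, dep)" format.

```python
import re

# Typed dependencies of Text-1; Text-2 differs only in the subject of "dropped".
TEXT1 = ("nsubj(saw-2, I-1) root(ROOT-0, saw-2) det(boy-4, the-3) dobj(saw-2, boy-4) "
         "det(telescope-7, a-6) prep_with(saw-2, telescope-7) dobj(dropped-10, telescope-7) "
         "nsubj(dropped-10, he-9) rcmod(telescope-7, dropped-10) advmod(dropped-10, accidentally-11)")
TEXT2 = TEXT1.replace("nsubj(dropped-10, he-9)", "nsubj(dropped-10, I-9)")

RELATION = re.compile(r"(\w+)\(([^,]+), ([^)]+)\)")

def parse(text):
    """Turn 'rel(head, dep)' strings into a set of (rel, head, dep) triples."""
    return set(RELATION.findall(text))

only_in_1 = parse(TEXT1) - parse(TEXT2)
only_in_2 = parse(TEXT2) - parse(TEXT1)
shared = parse(TEXT1) & parse(TEXT2)
print(only_in_1, only_in_2)
print(("prep_with", "saw-2", "telescope-7") in shared)  # same PP attachment in both parses
```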

NLP: dealing with multilinguality (figure: Language Typology)

Rules: when and when not
- When the phenomenon is understood AND expressed, rules are the way to go: "Do not learn when you know!!"
- When the phenomenon "seems arbitrary" at the current state of knowledge, DATA is the only handle!
  - Why do we say "Many Thanks" and not "Several Thanks"?
  - Impossible to give a rule
- Rely on machine learning to tease truth out of data; the expectation is not always met

Impact of probability: Language modeling
Probabilities computed in the context of corpora:
1. P("The sun rises in the east")
2. P("The sun rise in the east"): less probable because of a grammatical mistake
3. P("The svn rises in the east"): less probable because of a lexical mistake
4. P("The sun rises in the west"): less probable because of a semantic mistake
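The four probabilities above can be reproduced in miniature with a bigram language model. This is a sketch with two assumptions: the three-sentence corpus is invented, and add-one (Laplace) smoothing is chosen for simplicity because the slides do not specify a model. On such a tiny corpus the "semantic" penalty only reflects that "the east" occurred more often than "the west".

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real model would be estimated from a large corpus.
CORPUS = [
    "the sun rises in the east",
    "the sun sets in the west",
    "the moon rises in the east",
]

def train_bigram(sentences):
    """Count bigrams and how often each word serves as the left context."""
    bigrams, history, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token that can be predicted
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return bigrams, history, len(vocab) + 1  # +1 reserves mass for unseen words

def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    bigrams, history, v = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(prev, cur)] + 1) / (history[prev] + v))
               for prev, cur in zip(tokens, tokens[1:]))

model = train_bigram(CORPUS)
for s in ["the sun rises in the east", "the sun rise in the east",
          "the svn rises in the east", "the sun rises in the west"]:
    print(f"{log_prob(s, model):8.3f}  {s}")
```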

Power of Data

Automatic image labeling (Oriol Vinyals, Alexander Toshev, Samy Bengio and Dumitru Erhan, 2014)
Automatically captioned: "Two pizzas sitting on top of a stove top oven"

Automatic image labeling (contd.)

Main methodology
- Object A: extract parts and features
- Object B, which is in correspondence with A: extract parts and features
- LEARN mappings of these features and parts
- Use in NEW situations: called DECODING

Feature correspondence: "I am hungry now"

Linguistics-Computation Interaction
- Need to understand BOTH language phenomena and the data
- An annotation designer has to understand BOTH linguistics and statistics!
(Figure: Linguistics and language phenomena ↔ Annotator ↔ Data and statistical phenomena)

Case Study-1: Machine Translation
Good Linguistics, Good ML
Pushpak Bhattacharyya, Machine Translation, CRC Press, 2015.
Raj Dabre, Fabien Cromieres, Sadao Kurohashi and Pushpak Bhattacharyya, Leveraging Small Multilingual Corpora for SMT Using Many Pivot Languages, NAACL 2015, Denver, Colorado, USA, May 31 - June 5, 2015.

Kinds of MT Systems (point of entry from source to the target text) (Vauquois, 1968)

Simplified Vauquois

RBMT-EBMT-SMT spectrum: from knowledge (rules) intensive to data (learning) intensive: RBMT → EBMT → SMT

Illustration of the difference between RBMT, EBMT and SMT
- Peter has a house
- Peter has a brother
- This hotel has a museum

The tricky case of 'have' translation (English → Marathi)
- Peter has a house → पीटरकडे एक घर आहे / piitar kade ek ghar aahe
- Peter has a brother → पीटरला एक भाऊ आहे / piitar laa ek bhaauu aahe
- This hotel has a museum → ह्या हॉटेलमध्ये एक संग्रहालय आहे / hyaa hotel madhye ek saMgrahaalay aahe

RBMT
- If the syntactic subject is animate AND the syntactic object is owned by the subject, then "have" should translate to "kade aahe"
- If the syntactic subject is animate AND the syntactic object denotes kinship with the subject, then "have" should translate to "laa aahe"
- If the syntactic subject is inanimate, then "have" should translate to "madhye aahe"
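The three rules can be written directly as code. This is a sketch in which the small sets ANIMATE, OWNABLE and KINSHIP are hypothetical stand-ins for the animacy, ownership and kinship features that a real RBMT system would obtain from a parser and a semantically annotated lexicon. When no rule fires, the function returns None, which illustrates how hand-written rules run out of coverage.

```python
# Hypothetical mini-lexicon standing in for semantic features of nouns.
ANIMATE = {"peter"}
OWNABLE = {"house", "car"}
KINSHIP = {"brother", "sister"}

def translate_have(subject, obj):
    """Return the Marathi rendering of 'have' for a (subject, object) pair, or None."""
    subj, obj = subject.lower(), obj.lower()
    if subj in ANIMATE and obj in OWNABLE:
        return "kade aahe"    # possession: Peter has a house
    if subj in ANIMATE and obj in KINSHIP:
        return "laa aahe"     # kinship: Peter has a brother
    if subj not in ANIMATE:
        return "madhye aahe"  # inanimate subject: This hotel has a museum
    return None               # no rule applies: the rule set is incomplete

for s, o in [("Peter", "house"), ("Peter", "brother"), ("hotel", "museum"), ("Peter", "idea")]:
    print(s, o, "->", translate_have(s, o))
```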

EBMT
X have Y → X kade Y aahe / X laa Y aahe / X madhye Y aahe

SMT (phrase pairs with word-by-word glosses; cm = case marker)
- has a house → kade ek ghar aahe (cm one house has)
- has a car → kade ek gaadii aahe (cm one car has)
- has a brother → laa ek bhaau aahe (cm one brother has)
- has a sister → laa ek bahiin aahe (cm one sister has)
- hotel has → hotel madhye aahe (hotel cm has)
- hospital has → haspital madhye aahe (hospital cm has)

SMT: new sentence "This hospital has 100 beds"
- n-grams (n = 1, 2, 3, 4, 5) like the following will be formed:
  - "This", "hospital", ... (unigrams)
  - "This hospital", "hospital has", "has 100", ... (bigrams)
  - "This hospital has", "hospital has 100", ... (trigrams)
- DECODING!!!
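The first step of decoding can be sketched in a few lines. The code forms every n-gram of the new sentence for n = 1 to 5 and looks each one up in the phrase pairs from the SMT slide. This covers phrase matching only. A real decoder would also need entries for "100" and "beds", a reordering model and a language model, none of which this toy table has.

```python
# Phrase pairs from the SMT slide (Marathi in romanized form).
PHRASE_TABLE = {
    "has a house": "kade ek ghar aahe",
    "has a car": "kade ek gaadii aahe",
    "has a brother": "laa ek bhaau aahe",
    "has a sister": "laa ek bahiin aahe",
    "hotel has": "hotel madhye aahe",
    "hospital has": "haspital madhye aahe",
}

def ngrams(tokens, n):
    """All contiguous n-token substrings of a sentence, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "This hospital has 100 beds".split()
all_ngrams = [g for n in range(1, 6) for g in ngrams(sentence, n)]
found = {g: PHRASE_TABLE[g] for g in all_ngrams if g in PHRASE_TABLE}
print(len(all_ngrams), found)
```

Only "hospital has" is covered. The lookup thus reuses the inanimate-subject pattern "madhye aahe", which the RBMT system needed a hand-written rule to express.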

Foundation of SMT
- Data-driven approach
- Goal is to find the English sentence e, given a foreign-language sentence f, whose p(e|f) is maximum
- Translations are generated on the basis of a statistical model
- Parameters are estimated using bilingual parallel corpora
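The goal can be written as a single equation. The usual noisy-channel reading, in the tradition of Brown et al., applies Bayes' rule and drops p(f) because it is fixed for a given input. This splits the search into a translation model p(f|e), estimated from the bilingual parallel corpus, and a language model p(e) of the kind on the language-modeling slide.

```latex
\hat{e} \;=\; \arg\max_{e}\, p(e \mid f)
        \;=\; \arg\max_{e}\, \frac{p(f \mid e)\, p(e)}{p(f)}
        \;=\; \arg\max_{e}\, p(f \mid e)\, p(e)
```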

The all-important word alignment
- The edifice on which the structure of SMT is built (Brown et al., 1990, 1993; Och and Ney, 2003)
- Word alignment → phrase alignment (Koehn et al., 2003)
- Word alignment → tree alignment (Chiang 2005, 200t; Koehn 2010)
- Alignment is at the heart of factor-based SMT too (Koehn and Hoang, 2007)

Word alignment as the crux of Statistical Machine Translation
English ↔ French:
(1) three rabbits (a b) ↔ trois lapins (w x)
(2) rabbits of Grenoble (b c d) ↔ lapins de Grenoble (x y z)

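Initial probabilities (each cell denotes t(a↔w), t(a↔x), etc.; rows are French words, columns English words):
       a     b     c     d
  w   1/4   1/4   1/4   1/4
  x   1/4   1/4   1/4   1/4
  y   1/4   1/4   1/4   1/4
  z   1/4   1/4   1/4   1/4

"Counts" from pair (1), a b ↔ w x:
       a     b     c     d
  w   1/2   1/2   0     0
  x   1/2   1/2   0     0
  y   0     0     0     0
  z   0     0     0     0

"Counts" from pair (2), b c d ↔ x y z:
       a     b     c     d
  w   0     0     0     0
  x   0     1/3   1/3   1/3
  y   0     1/3   1/3   1/3
  z   0     1/3   1/3   1/3

Revised probabilities:
       a     b     c     d
  w   1/2   1/4   0     0
  x   1/2   5/12  1/3   1/3
  y   0     1/6   1/3   1/3
  z   0     1/6   1/3   1/3

"Revised counts" from pair (1), a b ↔ w x:
       a     b     c     d
  w   1/2   3/8   0     0
  x   1/2   5/8   0     0
  y   0     0     0     0
  z   0     0     0     0

"Revised counts" from pair (2), b c d ↔ x y z:
       a     b     c     d
  w   0     0     0     0
  x   0     5/9   1/3   1/3
  y   0     2/9   1/3   1/3
  z   0     2/9   1/3   1/3

Re-revised probabilities:
       a     b       c     d
  w   1/2   3/16    0     0
  x   1/2   85/144  1/3   1/3
  y   0     1/9     1/3   1/3
  z   0     1/9     1/3   1/3

Continue until convergence; notice that the (b, x) binding gets progressively stronger: b = rabbits, x = lapins.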
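The iterations on the two-sentence corpus can be replayed with exact fractions. This is a minimal sketch, not the author's code. Each English word spreads one unit of count over the French words of its own sentence (E-step), and the counts are then renormalised so that the probabilities for each English word sum to 1 (M-step). The symbols a to d and w to z are the ones used in the example.

```python
from collections import defaultdict
from fractions import Fraction

# a = three, b = rabbits, c = of, d = Grenoble; w = trois, x = lapins, y = de, z = Grenoble
PAIRS = [(["a", "b"], ["w", "x"]), (["b", "c", "d"], ["x", "y", "z"])]

def em_alignment(pairs, iterations):
    """EM estimate of t[(e, f)], normalised so that sum over f of t[(e, f)] is 1."""
    english = sorted({e for es, _ in pairs for e in es})
    french = sorted({f for _, fs in pairs for f in fs})
    # Uniform start: every English word maps to every French word with 1/|V_F|.
    t = {(e, f): Fraction(1, len(french)) for e in english for f in french}
    for _ in range(iterations):
        counts = defaultdict(Fraction)  # Fraction() == 0
        # E-step: expected counts, normalised over the French words of the sentence.
        for es, fs in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    counts[(e, f)] += t[(e, f)] / norm
        # M-step: renormalise the counts per English word.
        for e in english:
            total = sum(counts[(e, f)] for f in french)
            for f in french:
                t[(e, f)] = counts[(e, f)] / total
    return t

for n in range(4):
    t = em_alignment(PAIRS, n)
    print(n, t[("b", "x")], float(t[("b", "x")]))
```

For the first three rows, t(x|b) prints as 1/4, 5/12 and 85/144, and it keeps rising in the next iteration. This is the strengthening of the rabbits/lapins binding described above.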

Derivation: Key Notations
English vocabulary: $V_E$
French vocabulary: $V_F$
No. of observations / sentence pairs: $S$
Data $D$, which consists of $S$ observations, looks like:
$(e^1_1, e^1_2, \ldots, e^1_{l_1}) \leftrightarrow (f^1_1, f^1_2, \ldots, f^1_{m_1})$
$(e^2_1, e^2_2, \ldots, e^2_{l_2}) \leftrightarrow (f^2_1, f^2_2, \ldots, f^2_{m_2})$
$\vdots$
$(e^s_1, e^s_2, \ldots, e^s_{l_s}) \leftrightarrow (f^s_1, f^s_2, \ldots, f^s_{m_s})$
$\vdots$
$(e^S_1, e^S_2, \ldots, e^S_{l_S}) \leftrightarrow (f^S_1, f^S_2, \ldots, f^S_{m_S})$
No. of words on the English side in the $s^{th}$ sentence: $l_s$
No. of words on the French side in the $s^{th}$ sentence: $m_s$
$\mathrm{index}_E(e^s_p)$ = index of English word $e^s_p$ in the English vocabulary/dictionary
$\mathrm{index}_F(f^s_q)$ = index of French word $f^s_q$ in the French vocabulary/dictionary
(Thanks to Sachin Pawar for helping with the maths formulae processing)

Modeling: Hidden Variables and Parameters
Hidden variables (Z): total no. of hidden variables = $\sum_{s=1}^{S} l_s\, m_s$, where each hidden variable is as follows:
$z^s_{pq} = 1$ if, in the $s^{th}$ sentence, the $p^{th}$ English word is mapped to the $q^{th}$ French word; $z^s_{pq} = 0$ otherwise.
Parameters (Θ): total no. of parameters = $|V_E| \times |V_F|$, where each parameter is as follows:
$P_{i,j}$ = probability that the $i^{th}$ word in the English vocabulary is mapped to the $j^{th}$ word in the French vocabulary

Likelihoods
- Data likelihood $L(D;\Theta)$
- Data log-likelihood $LL(D;\Theta)$
- Expected value of data log-likelihood $E(LL(D;\Theta))$
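One form of these three quantities, consistent with the notation and the hidden variables $z^s_{pq}$ defined above, is sketched below. It assumes that each English word of a sentence pair is mapped to exactly one French word of that pair, so that $z^s_{pq}$ acts as an exponent selecting which parameter is used. It may differ in detail from the formulas on the original slides.

```latex
L(D;\Theta) = \prod_{s=1}^{S}\prod_{p=1}^{l_s}\prod_{q=1}^{m_s}
  \left(P_{\mathrm{index}_E(e^s_p),\,\mathrm{index}_F(f^s_q)}\right)^{z^s_{pq}}

LL(D;\Theta) = \sum_{s=1}^{S}\sum_{p=1}^{l_s}\sum_{q=1}^{m_s}
  z^s_{pq}\,\log P_{\mathrm{index}_E(e^s_p),\,\mathrm{index}_F(f^s_q)}

E\big(LL(D;\Theta)\big) = \sum_{s=1}^{S}\sum_{p=1}^{l_s}\sum_{q=1}^{m_s}
  E\big(z^s_{pq}\big)\,\log P_{\mathrm{index}_E(e^s_p),\,\mathrm{index}_F(f^s_q)}
```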

Constraint and Lagrangian
Constraint: $\sum_{j=1}^{|V_F|} P_{i,j} = 1, \quad \forall i$

Differentiating w.r.t. $P_{i,j}$

Final E and M Steps
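Under the same assumption (each English word maps to exactly one French word of its sentence pair), the Lagrangian, its derivative and the resulting E and M steps can be worked out as below. This is a reconstruction from the definitions and the constraint $\sum_{j} P_{i,j} = 1$, not a copy of the original slides. Summing the stationarity condition over $j$ and applying the constraint gives $\lambda_i$ as the denominator of the M-step. With uniform initial probabilities, this E-step yields the 1/2 and 1/3 counts of the three rabbits / trois lapins example.

```latex
\mathcal{L}(\Theta,\lambda) = E\big(LL(D;\Theta)\big)
  - \sum_{i=1}^{|V_E|} \lambda_i \Big(\sum_{j=1}^{|V_F|} P_{i,j} - 1\Big)

\frac{\partial \mathcal{L}}{\partial P_{i,j}}
  = \frac{1}{P_{i,j}} \sum_{s=1}^{S}\sum_{p=1}^{l_s}\sum_{q=1}^{m_s}
    \delta_{\mathrm{index}_E(e^s_p),\,i}\;\delta_{\mathrm{index}_F(f^s_q),\,j}\;E\big(z^s_{pq}\big)
    - \lambda_i = 0

\text{M-step:}\quad
P_{i,j} = \frac{\sum_{s=1}^{S}\sum_{p=1}^{l_s}\sum_{q=1}^{m_s}
    \delta_{\mathrm{index}_E(e^s_p),\,i}\;\delta_{\mathrm{index}_F(f^s_q),\,j}\;E\big(z^s_{pq}\big)}
  {\sum_{j'=1}^{|V_F|}\sum_{s=1}^{S}\sum_{p=1}^{l_s}\sum_{q=1}^{m_s}
    \delta_{\mathrm{index}_E(e^s_p),\,i}\;\delta_{\mathrm{index}_F(f^s_q),\,j'}\;E\big(z^s_{pq}\big)}

\text{E-step:}\quad
E\big(z^s_{pq}\big) = \frac{P_{\mathrm{index}_E(e^s_p),\,\mathrm{index}_F(f^s_q)}}
  {\sum_{q'=1}^{m_s} P_{\mathrm{index}_E(e^s_p),\,\mathrm{index}_F(f^s_{q'})}}
```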

Pivot-Based MT (again: language properties and ML)

Pivot for Indian language translation (chart: BLEU at p = 0.1, 0.01 and 0.001; Bengali: 4.48, 5.38, 5.38; …du: 6.51, 7.49, 7.64)

(Chart: BLEU of DIRECT_l alone and of DIRECT_l combined with BRIDGE systems through the pivots BN, GU, KK, ML, MA, PU, TA, TE, UR and PU+UR)

Effect of Multiple Pivots
- Fr-Es translation using 2 pivots (source: Wu & Wang, 2007)
- Hi-Ja translation using 7 pivots (source: Dabre et al., 2015):
  System                  | Ja→Hi      | Hi→Ja
  Direct                  | 33.86      | 37.47
  Direct + best pivot     | 35.74 (es) | 39.49 (ko)
  Direct + best-3 pivots  | 38.22      | 41.09
  Direct + all 7 pivots   | 38.42      | 40.09
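The slides do not say how the pivot systems were combined with the direct one. The sketch below shows one common option: phrase-table triangulation, the kind of pivoting studied by Wu & Wang (2007), followed by linear interpolation with the direct table. All phrases and probabilities are invented toy values (Hindi, English and Japanese in romanized form), and the 0.5/0.5 weights are an assumption. With several pivots, one would triangulate through each pivot and interpolate all the resulting tables.

```python
from collections import defaultdict

def triangulate(src_to_piv, piv_to_tgt):
    """Pivot phrase table: p(t|s) = sum over pivot phrases p of p(t|p) * p(p|s)."""
    table = defaultdict(dict)
    for s, pivots in src_to_piv.items():
        for p, p_given_s in pivots.items():
            for t, t_given_p in piv_to_tgt.get(p, {}).items():
                table[s][t] = table[s].get(t, 0.0) + t_given_p * p_given_s
    return dict(table)

def interpolate(tables, weights):
    """Linear interpolation of several phrase tables p(t|s); weights should sum to 1."""
    mixed = defaultdict(dict)
    for table, w in zip(tables, weights):
        for s, targets in table.items():
            for t, prob in targets.items():
                mixed[s][t] = mixed[s].get(t, 0.0) + w * prob
    return dict(mixed)

# Hypothetical toy tables: Hindi -> English (pivot) -> Japanese.
HI_EN = {"ghar": {"house": 0.8, "home": 0.2}}
EN_JA = {"house": {"ie": 0.7, "uchi": 0.3}, "home": {"uchi": 0.6, "ie": 0.4}}
HI_JA_DIRECT = {"ghar": {"ie": 0.5, "uchi": 0.5}}

pivot = triangulate(HI_EN, EN_JA)
combined = interpolate([HI_JA_DIRECT, pivot], [0.5, 0.5])
print(pivot["ghar"], combined["ghar"])
```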

Multilingual Pseudo-Relevance Feedback: A Way of Query Expansion and Disambiguation
Manoj Chinnakotla, Karthik Raman and Pushpak Bhattacharyya, Multilingual PRF: English Lends a Helping Hand, SIGIR 2010, Geneva, Switzerland, July 2010.
Manoj Chinnakotla, Karthik Raman and Pushpak Bhattacharyya, Multilingual Relevance Feedback: One Language Can Help Another, Conference of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, July 2010.
Arjun Atreya, Ashish Kankaria, Pushpak Bhattacharyya and Ganesh Ramakrishnan, Query Expansion in Resource-Scarce Languages: A Multilingual Framework Utilizing Document Structure, TALLIP (Transactions on Asian and Low-Resource Language Information Processing), 2016.

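Ranking: computing divergence
Document words: d1, d2, d3, d4, …, dn; query words: q1, q2, q3, q4, …, qn
Ranking function: KL divergence
$Score(D) = -KL(R \,\|\, D) \;\stackrel{rank}{=}\; \sum_{w} P(w \mid R)\,\log P(w \mid D)$
where $P(w \mid R)$ is the importance of term $w$ in the query and $\log P(w \mid D)$ its importance in the document. The remaining term of the divergence, $-\sum_w P(w \mid R)\log P(w \mid R)$, is the same for every document and does not affect the ranking.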
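A minimal sketch of this ranking function follows. It makes three assumptions: the query model P(w|R) is the maximum-likelihood distribution of the query words, documents are smoothed with Jelinek-Mercer smoothing (λ = 0.5) so that log P(w|D) is never log 0, and the three documents are invented. The slides do not specify these choices.

```python
import math
from collections import Counter

# Hypothetical toy collection of already-tokenized documents.
DOCS = {
    "d1": "olive oil production in the mediterranean",
    "d2": "olive oil in cooking recipes",
    "d3": "accession to the european union",
}
COLLECTION = Counter(w for text in DOCS.values() for w in text.split())
SIZE = sum(COLLECTION.values())

def doc_prob(word, doc, lam=0.5):
    """Jelinek-Mercer smoothed P(w|D); lam (assumed 0.5) weights the collection model."""
    tf = Counter(DOCS[doc].split())
    return (1 - lam) * tf[word] / sum(tf.values()) + lam * COLLECTION[word] / SIZE

def kl_score(query_model, doc):
    """Rank-equivalent to -KL(R || D): sum_w P(w|R) * log P(w|D).
    Assumes every query word occurs somewhere in the collection (else log(0))."""
    return sum(p * math.log(doc_prob(w, doc)) for w, p in query_model.items())

query = "olive oil mediterranean".split()
query_model = {w: c / len(query) for w, c in Counter(query).items()}  # MLE estimate of P(w|R)
ranking = sorted(DOCS, key=lambda d: kl_score(query_model, d), reverse=True)
print(ranking)
```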

Pseudo-Relevance Feedback (PRF)
1. Query Q is run by the IR engine against the document collection, giving initial results with scores (d1: 2.4, d2: 2.1, d3: 1.8, d4: 0.7, …, dm: 0.01).
2. Assume the top 'k' documents are relevant.
3. Learn a feedback model from these documents and update the query (relevance model).
4. Rerank the corpus with the updated query, giving the final results (d2: 2.3, d1: 2.2, d3: 1.8, d5: 0.6, …, dm: 0.01).
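The PRF loop can be sketched in a self-contained way. This is a simplified version under stated assumptions. The feedback model is the plain average of the term distributions of the top-k documents, whereas a relevance model may weight documents differently. Ranking uses the KL-equivalent score with Jelinek-Mercer smoothing (λ = 0.5). The interpolation weight α = 0.5 and the documents, which loosely echo the "Accession to European Union" example, are invented.

```python
import math
from collections import Counter

# Hypothetical toy collection (stemmed tokens).
DOCS = {
    "d1": "access europe union member country",
    "d2": "europe union member country membership",
    "d3": "access road union strike news",
    "d4": "olive oil cooking recipe",
}
COLLECTION = Counter(w for text in DOCS.values() for w in text.split())
SIZE = sum(COLLECTION.values())

def score(query_model, doc, lam=0.5):
    """sum_w P(w|Q) log P(w|D) with Jelinek-Mercer smoothing (lam is an assumption)."""
    tf = Counter(DOCS[doc].split())
    length = sum(tf.values())
    return sum(p * math.log((1 - lam) * tf[w] / length + lam * COLLECTION[w] / SIZE)
               for w, p in query_model.items())

def rank(query_model):
    return sorted(DOCS, key=lambda d: score(query_model, d), reverse=True)

def prf(query, k=1, alpha=0.5):
    """One PRF round: treat the top-k documents as relevant and expand the query model."""
    q_model = {w: c / len(query) for w, c in Counter(query).items()}
    feedback = Counter()
    for d in rank(q_model)[:k]:
        tf = Counter(DOCS[d].split())
        n = sum(tf.values())
        for w, c in tf.items():
            feedback[w] += c / n / k  # average of the top-k documents' term distributions
    words = set(q_model) | set(feedback)
    updated = {w: alpha * q_model.get(w, 0.0) + (1 - alpha) * feedback[w] for w in words}
    return q_model, updated

q_model, expanded = prf("access europe union".split())
print(rank(q_model), rank(expanded))
print(sorted(expanded.items(), key=lambda kv: -kv[1]))
```

Initially d2 and d3 tie, since each matches two of the three query words. After feedback, the expansion terms "member" and "country" lift d2 above d3.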

Misses related words
Query: "Accession to European Union" (stemmed query: "access europe union")
Relevant documents with terms like "Membership", "Member" and "Country" are not ranked high enough.

Lack of Robustness
Query: "Olive Oil Production in Mediterranean" (stemmed query: "oliv oil mediterranean")
The initially retrieved documents are about cooking, so the final expanded query picks up cooking terms (e.g., "cup"): this causes query drift.

Harness Multilinguality
- Use an assisting language
- An attractive proposition for languages that have poor monolingual performance due to:
  - resource constraints like inadequate coverage
  - morphological complexity

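Multilingual PRF: System Flow
1. Query in L1 → initial retrieval on the L1 index → top 'k' results → get the 'own' feedback model in L1 (θ_L1).
2. Translate the query into L2 → initial retrieval on the L2 index → top 'k' results → get the feedback model in L2 (θ_L2) → translate the feedback model into L1 (θ_L1^Trans).
3. Interpolate the models θ_L1 and θ_L1^Trans → ranking using the final model.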
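The last two stages of this flow, translating the L2 feedback model into L1 and interpolating it with the own L1 model, can be sketched as follows. The sketch makes several assumptions. The translation uses a probabilistic bilingual dictionary p(f|e). The interpolation is a single linear mix with β = 0.5, and the published MultiPRF model may weight its components differently. The feedback models and dictionary entries are invented, although the German stems echo those in the "Birds and Oil Spills" example.

```python
def translate_model(model_l2, dictionary):
    """theta_L1^Trans(f) = sum over e of p(f|e) * theta_L2(e), via a bilingual dictionary."""
    translated = {}
    for e, p_e in model_l2.items():
        for f, p_f_given_e in dictionary.get(e, {}).items():
            translated[f] = translated.get(f, 0.0) + p_f_given_e * p_e
    return translated

def interpolate(own, translated, beta=0.5):
    """Final model = beta * own L1 feedback model + (1 - beta) * translated L2 model."""
    words = set(own) | set(translated)
    return {w: beta * own.get(w, 0.0) + (1 - beta) * translated.get(w, 0.0) for w in words}

THETA_L2 = {"oil": 0.5, "bird": 0.3, "spill": 0.2}          # hypothetical English feedback model
EN_DE = {"oil": {"ol": 1.0},                                # hypothetical p(German | English)
         "bird": {"vogel": 0.8, "vogelart": 0.2},
         "spill": {"olunfall": 0.6, "ol": 0.4}}
THETA_L1 = {"olunfall": 0.4, "vogel": 0.3, "rhein": 0.3}   # hypothetical own German model

final = interpolate(THETA_L1, translate_model(THETA_L2, EN_DE))
for w, p in sorted(final.items(), key=lambda kv: -kv[1]):
    print(f"{w:10s} {p:.3f}")
```

If every dictionary row sums to 1, the translated model is again a probability distribution, so the interpolated model also sums to 1.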

KLD with Augmented Query
Document words: d1, d2, d3, d4, …, dn
Reformulated query words: q1, q2, q3, q4, …, qn, made up of the original query words, the own PRF words and the PRF words from translation

English Lends a Helping Hand!
- English used as the assisting language:
  - good monolingual performance
  - ease of processing
- MultiPRF consistently and significantly outperforms the monolingual PRF baseline

Experimental Setup
- English chosen as the assisting language
- CLEF standard dataset for evaluation:
  - four widely differing source languages: French (Romance family), German (West Germanic), Finnish (Baltic-Finnic), Hungarian (Uralic-Ugric)
  - more than 600 topics (only the Title field)
- Google Translate used for query translation

Example: French query, English as assisting language
- Query in French: "Oscar honorifique pour des réalisateurs italiens"; translated into English: "Honorary Oscar for Italian filmmakers"
- Own (French) feedback terms: italien, président (president), oscar, gouvern (governor), scalfaro, spadolin
- English feedback model terms: …, studio, italian, oscar, honorari, …
- Feedback terms with MultiPRF: italien, oscar, cineast, …
- MAP improves from 0.1238 to 0.4324!

Example: German query, English as assisting language
- Query in German: "Ölunfälle und Vögel"; translated into English: "Birds and Oil Spills"
- Own (German) feedback terms: rhein, olunfall, fluss, ol, auen, erdreich, heizol, tank, lit, folg, oberrhein, teil
- English feedback model terms: oil, state, gallon, …
- Feedback terms with MultiPRF: olunfall, vogel, ol, olverschmutz, mcgrath, olivenol, fluss, tier, vergoss, vogelart (bird species), olkatastroph, olpreis
- MAP improves from 0.0128 to 0.1184!

Can languages other than English help?

Language Typology

MultiPRF with Non-English Assisting Languages

Example: German query, Spanish as assisting language
- Query in German: "Bronchialasthma"; translated into Spanish: "El asma bronquial"
- Own (German) feedback terms: chronisch (chronic), pet, athlet (athlete), erkrank (ill), gesund (healthy), tuberkulos (tuberculosis), patient, reis (rice), person
- Spanish feedback model terms: asthma, bronquial, contamin, ozon, cient, enfermed, alerg, alergi, air
- Feedback terms with MultiPRF: asthma, allergi, krankheit (disease), allerg (allergenic), chronisch, hauterkrank (illness of skin), arzt (doctor), erkrank (ill)
- MAP improves from 0.062 to 0.636!

Results

Dependence on Monolingual Performance (table: monolingual MAP and rank of each assisting language)

More than one assisting language
- Tried parallel composition for two assisting languages
- Uniform interpolation weights used
- Exhaustively tried all 60 combinations
- Improvements reported over the best-performing PRF of L1 or L2

Structure-aware feedback terms (Atreya et al., IJCNLP 2013)
- Title and conclusion are high-importance regions
- In Wikipedia documents, get PRF terms from: title, body, infobox and categories
(Figures: MAP improvement; ablation results)

Cooperative Word Sense Disambiguation
Niladri Dash, Pushpak Bhattacharyya, Jyoti Pawar (eds.), Wordnets of Indian Languages, Springer, ISBN 978-981-10-1909-8, 2016.
Mitesh Khapra, Salil Joshi and Pushpak Bhattacharyya, It Takes Two to Tango: A Bilingual Unsupervised Approach for Estimating Sense Distributions Using Expectation Maximization, 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand, November 2011.

Definition: WSD
- Given a context, get the "meaning"s of a set of words (targeted WSD) or of all words (all-words WSD)
- The "meaning" is usually given by the id of a sense in a sense repository, usually the wordnet

Example: "operation" (from Princeton Wordnet)
- operation, surgery, surgical operation, surgical procedure, surgical process -- (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body; "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery") TOPIC->(noun) surgery#1
- operation, military operation -- (activity by a military or naval force (as a maneuver or campaign); "it was a joint operation of the navy and air force") TOPIC->(noun) military#1, armed forces#1, armed services#1, military machine#1, war machine#1
- mathematical process, mathematical operation, operation -- ((mathematics) calculation by mathematical methods; "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic") TOPIC->(noun) mathematics#1, math#1, maths#1
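To show what "getting the meaning" as a sense id looks like in practice, here is a simplified Lesk disambiguator over the three senses above. This is a sketch under stated assumptions. Simplified Lesk is a standard knowledge-based baseline chosen for illustration; it is not the cooperative WSD method of this section. Only the gloss text quoted above is used (the example sentences are left out). The sense labels, taken from the TOPIC field, and the small stopword list are hypothetical choices.

```python
import re

# Glosses of three noun senses of "operation", as quoted from Princeton Wordnet above.
SENSES = {
    "surgery": "a medical procedure involving an incision with instruments; "
               "performed to repair damage or arrest disease in a living body",
    "military": "activity by a military or naval force (as a maneuver or campaign)",
    "mathematics": "calculation by mathematical methods",
}
STOPWORDS = {"a", "an", "the", "to", "in", "of", "or", "by", "with", "and", "as", "his"}

def content_words(text):
    """Lower-cased alphabetic tokens minus a tiny stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context, senses=SENSES):
    """Return the sense id whose gloss shares the most content words with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

for ctx in ["The doctor made an incision to repair damage in his knee",
            "The navy and air force launched a joint military campaign",
            "The students practised calculation methods in arithmetic"]:
    print(simplified_lesk(ctx), "<-", ctx)
```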

WSD for ALL Indian languages: critical resource, the linked wordnets (figure: Hindi, Marathi, Oriya and other Indian-language wordnets linked with one another and with the English Wordnet)

Synset-Based Multilingual Dictionary
(Figure: a sample entry from the MultiDict)
- Expansion approach for creating wordnets [Mohanty et al., 2008]
- Instead of creating from scratch, link to the synsets of an existing wordnet
- Relations get borrowed from the existing wordnet

Cross Linkages Between Synset Members
- Captures native speakers' intuition
- Wherever the word ladkaa appears in Hindi, one would expect to see the word mulgaa in Marathi
- A few wordnet pairs do not have explicit word linkages within synsets; in that case one assumes every word is linked to all words on the other side

Resources for WSD (wordnet and corpora): 5 scenarios
(Table: Scenarios 1 to 5 differ in the availability of an annotated corpus in L1, aligned wordnets and an annotated corpus in L2; some entries read "Varies" or "Seed")

Unsupervised WSD (no annotated corpora)
Khapra, Joshi and Bhattacharyya, IJCNLP 2011

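Estimating Sense Distributions
If a sense-tagged Marathi corpus were available, we could have estimated the sense distribution of each word W directly by counting:
$P(S_i \mid W) = \frac{\#(S_i, W)}{\sum_j \#(S_j, W)}$
But such a corpus is not available.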

EM for Estimating Sense Distributions (E-step and M-step)

Results & Discussions
(Chart legend: our values; manual cross linkages; probabilistic cross linkages; skyline (self-training data is available); Wordnet first-sense baseline; state-of-the-art knowledge-based approach; state-of-the-art unsupervised approach)
Performance of projection using manual cross linkages i
