Structured Learning


Structured Learning
With Applications to Natural Language Processing

Ryan McDonald

Joint work with: Fernando Pereira, Koby Crammer, Jeff Reynar, Kerry Hannan, Tyler Neylon

Classification

• Goal: learn a function F : X → Y
  – Or learn P(Y | X) or s(X, Y)
  – F(X) = argmax_Y P(Y | X), or F(X) = argmax_Y s(X, Y)
• Supervised setting
  – Assume a training set T = {(x1, y1), (x2, y2), ..., (x|T|, y|T|)}
  – Where each (xi, yi) ∈ X × Y
• Atomic classification
  – Binary: Y = {-1, +1}
  – Multiclass: Y = {0, 1, ..., M}
• Atomic classification is suitable for many problems
  – Text classification and categorization, object recognition, ...
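
As a toy illustration (mine, not from the slides) of the atomic case, prediction is just an argmax of a per-class linear score over the small label set Y; the labels and weights below are made up.

```python
def predict(W, x):
    """F(x) = argmax_y s(x, y) with a per-class linear score s(x, y) = w_y · x.
    W maps each class label to its weight vector; x is a dense feature list."""
    def s(y):
        return sum(wi * xi for wi, xi in zip(W[y], x))
    return max(W, key=s)

# Made-up three-class example: the class with the highest score wins.
W = {"pos": [0.5, -0.2, 1.0], "neg": [-0.3, 0.4, -0.8], "neutral": [0.0, 0.1, 0.0]}
print(predict(W, [1.0, 0.0, 2.0]))   # -> "pos" under these toy weights
```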

Structured Classification

• Some problems require classification of structured outputs
• For example, part-of-speech tagging:
  x = John hit the ball with the stick
  y = N    V   D   N    P    D   N
• Outputs y are a structured set of atomic decisions
• The output space has exponential size relative to the input
• Q: Why not just reduce to atomic classification by predicting each component independently?
• A: This makes an unrealistic independence assumption between the components of the output

Other examples

• Syntactic parsing
  – Input x is a sentence
  – Output y is a syntactic parse tree
• Machine translation
  – Input x is a sentence: "I don't speak French"
  – Output y is another sentence: "Je ne parle pas francais"
• Summarization
  – Input x is a set of documents
  – Output y is a set of relevant sentences
• Many more examples

Hidden Markov Models (DBNs)

• Generative sequence labeling models
  [Figure: HMM graphical model, hidden label sequence y generating observations x]
• Prominent in speech, NLP, vision, etc.
• Easy to train in the supervised setting
• Easy to incorporate hidden (missing) values
• Can be extended beyond sequences (DBNs)
• Relatively good empirical performance

Discriminative Models

• For many text classification problems, discriminative models have superior performance
• Discriminative models optimize a function more correlated with prediction accuracy
  – SVMs: minimize prediction error
  – Logistic regression: maximize likelihood of the conditional distribution
  – Perceptron: minimize error
• Unlike generative models, they do not model the input x
  – No need to make drastic independence assumptions over different properties of x

Discriminative Models

[Figure: discriminative graphical model over label sequence y and input x, modeling P(y1 ... ym | x) directly]

Some Definitions

• x is always an input, y always the output
  – x = x1 x2 ... xn (usually a sequence of words)
  – y = y1 y2 ... ym (output variables; for sequences, n = m)
• Gen(x) is the space of valid outputs for input x
• Φ(x, y) ∈ R^p maps input/output pairs to a high-dimensional feature representation
• w ∈ R^p is a corresponding weight vector
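
To make this notation concrete, here is a minimal sketch (mine, not from the slides) of a sparse feature map Φ(x, y) and the linear score s(x, y) = w · Φ(x, y) for sequence labeling; the feature templates and the dict-based weight vector are illustrative assumptions. The later sketches below reuse these two helpers.

```python
from collections import Counter

def phi(x, y):
    """Sparse Φ(x, y) as a Counter. The templates here (word/tag pairs and
    tag bigrams) are illustrative, not the talk's actual feature set."""
    feats = Counter()
    prev = "<START>"
    for word, tag in zip(x, y):
        feats[f"word={word},tag={tag}"] += 1    # emission-style feature
        feats[f"prev={prev},tag={tag}"] += 1    # transition-style feature
        prev = tag
    return feats

def score(w, x, y):
    """s(x, y) = w · Φ(x, y); w is a dict from feature name to weight."""
    return sum(w.get(f, 0.0) * v for f, v in phi(x, y).items())
```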

Conditional Random Fields (Lafferty et al. '01)

Linear classifiers:
  P(y | x) = Z⁻¹ exp(w · Φ(x, y))
  Z = Σ_{y'} exp(w · Φ(x, y'))

Training: set w to maximize the conditional log-likelihood of the training data (x1, y1), (x2, y2), ...
  w = argmax_w Σ_{(xi, yi)} log P(yi | xi)

Conditional Random Fields (Lafferty et al. '01)

Inference:
  argmax_{y ∈ Gen(x)} P(y | x) = argmax_{y ∈ Gen(x)} Z⁻¹ exp(w · Φ(x, y))
                               = argmax_{y ∈ Gen(x)} w · Φ(x, y)
  Gen(x) is exponential in the size of x

Training:
  w = argmax_w Σ_{(xi, yi)} log P(yi | xi)   (convex)
  ∇L = Σ_{(xi, yi)} [ Φ(xi, yi) − E_{P(Y | xi)} Φ(xi, Y) ]
  E_{P(Y | xi)} Φ(xi, Y) = Σ_{y ∈ Gen(xi)} P(y | xi) Φ(xi, y)
Just like multinomial logistic regression!
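
As a sanity check of this objective and gradient, here is a minimal brute-force sketch (mine, not from the slides) that enumerates Gen(x) directly; it reuses the phi/score helpers sketched above and is only practical when Gen(x) is tiny. Real CRF implementations compute Z and the feature expectations with forward-backward instead.

```python
import math
from collections import Counter

def crf_loglik_and_grad(w, x, y_gold, gen):
    """Conditional log-likelihood log P(y_gold | x) and its gradient
    Φ(x, y_gold) − E_{P(Y|x)}[Φ(x, Y)], by enumerating gen(x)."""
    outputs = gen(x)
    scores = [score(w, x, y) for y in outputs]
    m = max(scores)
    log_Z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log Z
    grad = Counter(phi(x, y_gold))
    for y, s in zip(outputs, scores):
        p = math.exp(s - log_Z)                                  # P(y | x)
        for f, v in phi(x, y).items():
            grad[f] -= p * v
    return score(w, x, y_gold) - log_Z, grad
```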

Markov Random Fields

• P(y | x) factors over clique "potentials" (Hammersley and Clifford '71)
  P(y | x) = Z⁻¹ exp( Σ_{c ∈ y} w · Φ(x, c) )
  [Figure: chain-structured graph over y1, y2, y3, ..., yn, conditioned on x]
• Sequences (1st order):
  P(y | x) = Z⁻¹ exp( Σ_{i=1..n} w · Φ(x, y_{i-1}, y_i) )

CRFs

• The fact that the probability distribution factors by cliques in the graph is useful
• E.g., sequence labeling
  – Inference: Viterbi (like HMMs)
  – Training: forward-backward algorithm (like HMMs)
• In general
  – Junction tree algorithms
  – Trees: belief propagation
  – Approximations: loopy BP, variational methods, MC
• Good overview: Sutton and McCallum 2006
  – http://www.cs.umass.edu/~casutton/publications/crf-tutorial.pdf
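
For the sequence case, here is a minimal Viterbi sketch (mine) for the first-order factorization above; local_score(x, i, prev, tag) stands in for w · Φ(x, y_{i-1}=prev, y_i=tag), and '<START>' is a hypothetical initial state.

```python
def viterbi(x, tags, local_score):
    """Exact argmax over tag sequences under a first-order (chain) factorization.
    Returns (best score, best tag sequence)."""
    best = {"<START>": (0.0, [])}          # tag at position i-1 -> (score, path)
    for i in range(len(x)):
        new_best = {}
        for tag in tags:
            new_best[tag] = max(
                (s + local_score(x, i, prev, tag), path + [tag])
                for prev, (s, path) in best.items()
            )
        best = new_best
    return max(best.values())
```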

Large-Margin Learning

• SVMs, AdaBoost, winnow, and other large-margin classifiers often represent state-of-the-art performance
• Other benefits
  – Easily kernelizable
  – Well-understood generalization error bounds
• What about structured learning problems?

Binary Large-Margin Classification

  min  ½ ||w||²
  such that:
    yi (w · Φ(xi) + b) ≥ 1   ∀ (xi, yi) ∈ T

Large-Margin Multiclass Classification (Crammer and Singer 2001)

  min  ½ ||w||²
  such that:
    s(xi, yi) - s(xi, y) ≥ L(yi, y)   ∀ y, ∀ (xi, yi) ∈ T
  where s(xi, y) = w · Φ(xi, y)

  L(y1, y2) ≥ 0, and L(y1, y2) = 0 iff y1 = y2
  O(|T| |Y|) constraints

Large-Margin Structured Classification (Altun et al. '03, Taskar et al. '03)

  min  ½ ||w||²
  such that:
    s(xi, yi) - s(xi, y) ≥ L(yi, y)   ∀ y ∈ Gen(xi), ∀ (xi, yi) ∈ T

• Exponential number of constraints!! O(|T| |Gen(x)|) constraints
• L(yi, y) depends on the problem
  – Sequence labeling: usually Hamming loss
  – F1, BLEU, ROUGE, labeled P/R, ...
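
To see what these constraints mean operationally, here is a minimal sketch (mine) of the structured hinge loss for one example: find the most violated constraint over Gen(xi). It reuses the score helper from the earlier sketch and enumerates Gen(x) by brute force; in practice this max is computed with loss-augmented inference.

```python
def hamming(y1, y2):
    """Hamming loss for sequence labeling: number of differing positions."""
    return sum(a != b for a, b in zip(y1, y2))

def structured_hinge_loss(w, x, y_gold, gen, loss=hamming):
    """max_{y in Gen(x)} [ L(y_gold, y) + s(x, y) ] - s(x, y_gold), clipped at 0:
    the amount by which the most violated margin constraint is violated."""
    gold = score(w, x, y_gold)
    worst = max(loss(y_gold, y) + score(w, x, y) for y in gen(x))
    return max(0.0, worst - gold)
```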

How do we solve this?

  min  ½ ||w||²
  such that:
    s(xi, yi) - s(xi, y) ≥ L(yi, y)   ∀ y ∈ Gen(xi), ∀ (xi, yi) ∈ T

• Taskar et al. '03 (M3Ns)
• Factor the constraints by the output structure (like CRFs)
• Assumes the loss factors relative to the output
  – Not always true
• Time consuming
  – No real large-scale public implementations (yet)

How do we solve this?

  min  ½ ||w||²
  such that:
    s(xi, yi) - s(xi, y) ≥ L(yi, y)   ∀ y ∈ Gen(xi), ∀ (xi, yi) ∈ T

• Tsochantaridis et al. '04
• Sample constraints smartly (cutting-plane algorithm)
• Assumes the loss factors relative to the output
  – But empirically not that important
• Time consuming
  – A little better than M3Ns

Online Large-Margin Structured Learning

• Basically a large-margin perceptron algorithm
• MIRA (Crammer & Singer '03, McDonald '05)

  T = {(x1, y1), ..., (x|T|, y|T|)}
  w(0) = 0; i = 0
  repeat N times:
    for t = 1 .. |T|:
      consider instance (xt, yt)
      w(i+1) = argmin_{w*} ½ ||w* − w(i)||²
               s.t. s(xt, yt) − s(xt, y) ≥ L(yt, y)  ∀ y ∈ k-Best-Gen(xt)
      i = i + 1
  return w(N·|T|)

• Make a small change to w at each step
• Each QP has just k constraints
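
Here is a minimal sketch (mine) of the k = 1 special case of the update above: with a single constraint the QP has a closed-form solution, a clipped perceptron-style step. It reuses the phi/score helpers from the earlier sketch; the optional cap C is an assumption, not part of the slide.

```python
from collections import Counter

def mira_update_1best(w, x, y_gold, y_pred, loss_value, C=float("inf")):
    """Smallest change to w (in L2) that makes s(x, y_gold) - s(x, y_pred)
    >= loss_value, with the step size clipped at C. Mutates and returns w."""
    if y_pred == y_gold:
        return w
    diff = Counter(phi(x, y_gold))
    diff.subtract(phi(x, y_pred))                 # Φ(x, y_gold) − Φ(x, y_pred)
    margin = score(w, x, y_gold) - score(w, x, y_pred)
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return w
    tau = min(C, max(0.0, (loss_value - margin) / norm_sq))
    for f, v in diff.items():
        w[f] = w.get(f, 0.0) + tau * v
    return w
```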

Online Large-Margin Structured Learning

• Advantages of MIRA
  – Fast (scales better than almost all structured learning algorithms)
  – Relies only on inference (e.g., Viterbi with a first-order Markov factorization)
    Robust to approximate inference (McDonald '06)
  – Easy to implement
  – Arbitrary loss functions
  – Provable convergence and generalization error bounds
• Disadvantages
  – Needs k-best inference
    But approximations work fine
  – Sub-optimal parameter settings

Preliminary Experiments (Seq. Classification)

Review

• Structured learning generalizes multiclass learning
• Use factorizations of the output structure to:
  – Run inference (Viterbi, CKY, junction tree)
  – Learn parameters (CRFs, M3Ns)
• Conditional Random Fields
  – Max likelihood (must compute the normalization Z)
• Large margin (M3Ns, MIRA, cutting plane)
  – Min error
  – Often needs only inference
  – Kernels, slack variables, ... (no time!)

Applications to NLP

• Syntactic dependency parsing
  – 14 languages
• Sentiment analysis
  – Bag-of-words vs. fine-to-coarse models
• Summarization
  – Multi-document summarization
  – Dialogue summarization: meetings, IM, discussions

Syntactic Dependency Parsing

• Given a sentence, construct head-modifier relationships
• A parse is a weakly connected graph G
  – Usually satisfies a tree constraint (single parent)
• Projective parsing (English, Chinese)
• Non-projective parsing (Dutch, Czech, German)

Syntactic Dependency Parsing

• Let x = x1, ..., xn be a sentence
• Let y be a dependency parse (tree)
  – Say (i, j) ∈ y if xi is the head of xj (i.e., xj modifies xi)
• Let s(x, y) be the score of parse y for x
• Factor the score by dependency edges:
  s(x, y) = Σ_{(i,j) ∈ y} s(x, i, j) = Σ_{(i,j) ∈ y} w · Φ(x, i, j)
• Example ("John hit the ball with the bat"):
  s(x, root, hit) + s(x, hit, John) + s(x, hit, ball) + s(x, hit, with)
    + s(x, ball, the) + s(x, with, bat) + s(x, bat, the)

Inference Algorithms

• Non-projective case
  – Maximum r-arborescence
  – Chu-Liu-Edmonds algorithm: O(n²)
  – K-best: O(kn²)
• Projective case
  – Eisner algorithm: O(n³)
  – K-best: O(k log(k) n³)
• Higher-order edge factorization
  – Non-projective: NP-hard (some good approximations)
  – Projective: still remains polynomial

Feature Representation Φ

• Identity of head and modifier
• Part-of-speech of head and modifier
• Part-of-speech of surrounding words
• Part-of-speech of words between head and modifier
• Distance between head and modifier
• Direction of dependency edge
• Prefix, suffix of head and modifier
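
A minimal sketch (mine, not the system's actual templates) of a few such edge features and the edge-factored parse score s(x, y) = Σ_{(h,m) ∈ y} w · Φ(x, h, m); it assumes x is a list of (word, POS) tokens with a dummy root at index 0.

```python
from collections import Counter

def edge_phi(x, h, m):
    """Illustrative edge features Φ(x, h, m) for head index h, modifier index m."""
    hw, hp = x[h]                               # head word, head POS
    mw, mp = x[m]                               # modifier word, modifier POS
    direction = "R" if m > h else "L"
    dist = min(abs(h - m), 5)                   # bucketed distance
    return Counter({
        f"hw={hw},mw={mw}": 1,
        f"hp={hp},mp={mp}": 1,
        f"hp={hp},mp={mp},dir={direction}": 1,
        f"dist={dist},dir={direction}": 1,
    })

def tree_score(w, x, tree):
    """Edge-factored score: sum of w · Φ(x, h, m) over edges (h, m) in the tree."""
    return sum(w.get(f, 0.0) * v
               for (h, m) in tree
               for f, v in edge_phi(x, h, m).items())
```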

Experiments

[Results chart: comparison against the highest-reporting system at CoNLL 2006]

Sentiment Analysis

• Classify sentences as positive, negative, no sentiment, fact, opinion, etc.
• Classify documents, paragraphs, etc. as well
• Useful in many mining problems
  – Translation, summarization, Q&A, ...

Fine-to-Coarse Sentiment Analysis

• Most previous work analyzes sentiment at:
  – Phrase
  – Sentence
  – Paragraph
  – Document
  Mainly bag-of-words, multiclass classification (Pang et al. '03, '04, '06)
• What if we want sentiment on multiple levels?
• Jointly learn sentiment on all levels
  – Higher levels influence lower-level decisions
    "It is a lot harder to use than it looks" - pos or neg?
  – Lower levels help with high-level decisions (Pang '04)
    Ignore neutral sentences; discourse cues (beginning / end)

Sentence-Document Level Model

[Figure: graphical model with a document label node connected to a chain of sentence label nodes over the input sentences]

• The document label is dependent on the sentences
• Sentences are dependent on their neighbors and the document label
• Directly models the relationship between the different levels

Inference

• Let x = s1, ..., sn be the sentences of a document
• Let y = (yd, ys) = (yd, ys_1, ..., ys_n) be a joint document-sentence labeling of sentiment
• Let s(x, y) be the score of a labeling for input x
• Factor the score over cliques:
  s(x, y) = Σ_i s(x, ys_i, ys_{i+1}, yd) = Σ_i w · Φ(x, ys_i, ys_{i+1}, yd)
  y* = argmax_y s(x, y)

Inference

• Note that if we fix the document label yd, then inference reduces to sequence classification
  – I.e., the sentence-label model forms a chain
• Algorithm for the joint case
  – For each yd:
    Find ys = ys_1, ..., ys_n maximizing s(x, (yd, ys)) (with Viterbi)
    If s(x, (yd, ys)) is better than the previous best, record it
• Linear in document labels, quadratic in sentence labels
• K-best is easy (just merge k-best lists)
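
A minimal sketch (mine) of the joint search just described: for each candidate document label, run a Viterbi pass over the sentence-label chain and keep the best combination. clique_score(x, i, prev, cur, yd) stands in for w · Φ(x, ys_{i-1}, ys_i, yd), and '<START>' is a hypothetical initial sentence state.

```python
def joint_inference(x, doc_labels, sent_labels, clique_score):
    """Returns (best score, best document label, best sentence labeling)."""
    best_overall = (float("-inf"), None, None)
    for yd in doc_labels:                      # linear in document labels
        chart = {"<START>": (0.0, [])}
        for i in range(len(x)):                # Viterbi over the sentence chain
            chart = {
                cur: max(
                    (s + clique_score(x, i, prev, cur, yd), path + [cur])
                    for prev, (s, path) in chart.items()
                )
                for cur in sent_labels         # quadratic in sentence labels
            }
        sc, ys = max(chart.values())
        if sc > best_overall[0]:
            best_overall = (sc, yd, ys)
    return best_overall
```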

Beyond Two-Level Models

• E.g., sentence, paragraph, document
• Nested hierarchical models
  – Inference is tractable (bottom-up dynamic programming)
• Can have dependencies across levels
  – But this may make inference hard

Experiments

• Document-Classifier
  – Bag-of-words document classifier
• Sentence-Classifier
  – Bag-of-words sentence classifier
• Sentence-Structured
  – Sentence classifier with sequential structure
• Joint-Structured (the models just discussed)

The End

• CRFs
  – Sutton and McCallum. An Introduction to Conditional Random Fields for Relational Learning. 2006.
  – http://www.inference.phy.cam.ac.uk/hmw26/crf/
• Large margin
  – Ben Taskar. Learning Structured Prediction Models: A Large Margin Approach. Stanford University, CA, December 2004.
• Learning with structured outputs
  – BakIr et al., eds. Predicting Structured Data. The MIT Press, 2006.
