Active Imitation Learning With Noisy Guidance


Active Imitation Learning with Noisy Guidance
Kianté Brantley (1), Amr Sharaf (1), Hal Daumé III (1,2)
(1) University of Maryland, (2) Microsoft Research

Structured Prediction Problems
For example, Named Entity Recognition:

Word        Label
After       O
completing  O
his         O
Ph.D.       O
,           O
...

Expert: $\pi^*$

Structured Prediction Problems
Problem: How can we reduce expert annotation cost for structured prediction problems?

Imitation Learning
Expert demonstrator (annotator): $\pi^*$

Named Entity Recognition
Input: After completing his Ph.D. , Ellis worked at Bell Labs from 1969 to 1972 on probability theory.
Prediction: O

- states: combine input with previous prediction
- actions: O, PER, ORG, MISC, LOC
- training set: $D = \{(\text{state}, \text{actions})\}$ from expert $\pi^*$
- goal: learn agent $\pi_\theta(s) \rightarrow a$
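As a minimal sketch of the state and training-set definitions on this slide: the state layout `(word, previous tag)`, the `<START>` marker, and the function name `expert_demonstrations` are assumptions chosen for illustration, not the authors' implementation.

```python
from typing import List, Tuple

ACTIONS = ["O", "PER", "ORG", "MISC", "LOC"]

# Assumed state layout: the current word combined with the previous prediction.
State = Tuple[str, str]


def expert_demonstrations(words: List[str], expert_tags: List[str]) -> List[Tuple[State, str]]:
    """Build D = {(state, action)} by pairing each state with the expert's tag."""
    dataset = []
    prev_tag = "<START>"  # hypothetical marker for "no previous prediction"
    for word, tag in zip(words, expert_tags):
        dataset.append(((word, prev_tag), tag))
        prev_tag = tag  # the next state sees the tag just emitted
    return dataset


# Illustrative fragment of the slide's sentence with assumed expert tags.
words = ["his", "Ph.D.", ",", "Ellis", "worked"]
tags = ["O", "O", "O", "PER", "O"]
D = expert_demonstrations(words, tags)
```

A classifier trained on `D` plays the role of the agent $\pi_\theta$.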

Imitation Learning using DAgger

Initialize dataset $D$
Initialize $\hat{\pi}_1$
for $i = 1$ to $N$ do
    $\pi_i = \beta_i \pi^* + (1 - \beta_i) \hat{\pi}_i$
    Sample $T$-step trajectory from $\pi_i$
    Get dataset $D_i = \{(s, \pi^*(s))\}$
    Aggregate dataset $D \leftarrow D \cup D_i$
    Train classifier $\hat{\pi}_{i+1}$ on $D$

Pro: The policy is able to learn from its own state distribution.

Named Entity Recognition
Input: After completing his Ph.D., Ellis worked
[Figure: per-token tag sequences (O / PER) for $\pi_i$ and $\pi^*$]

Stéphane Ross, Geoff J. Gordon, and J. Andrew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS.
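The loop on this slide can be sketched in a few lines of Python. This is a toy illustration, not the authors' code: the majority-vote `train` function, the per-step mixing of expert and learner, the geometric decay of $\beta_i$, and the `(word, previous tag)` state are all assumptions.

```python
import random
from collections import Counter, defaultdict


def train(dataset):
    """Toy learner (assumption): majority expert action per state, 'O' if unseen."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda state: table.get(state, "O")


def dagger(words, expert, n_iters=5, beta0=1.0, decay=0.5, seed=0):
    rng = random.Random(seed)
    dataset = []                      # aggregated dataset D
    policy = lambda state: "O"        # initial learned policy
    for i in range(n_iters):
        beta = beta0 * decay ** i     # assumed schedule for beta_i
        prev, visited = "<START>", []
        for word in words:            # sample a T-step trajectory from pi_i
            state = (word, prev)
            # Mixture policy: follow the expert with probability beta_i.
            action = expert(state) if rng.random() < beta else policy(state)
            visited.append(state)
            prev = action
        # D_i labels every visited state with the expert's action.
        dataset.extend((s, expert(s)) for s in visited)
        policy = train(dataset)       # next learned policy, trained on D
    return policy, dataset


words = ["After", "completing", "his", "Ph.D.", ",", "Ellis", "worked"]
expert = lambda state: "PER" if state[0] == "Ellis" else "O"
policy, dataset = dagger(words, expert)
```

Note that the expert is called once per visited state in every iteration, so the number of expert queries grows as `n_iters * len(words)`.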

Imitation Learning using DAgger

Initialize dataset $D$
Initialize $\hat{\pi}_1$
for $i = 1$ to $N$ do
    $\pi_i = \beta_i \pi^* + (1 - \beta_i) \hat{\pi}_i$
    Sample $T$-step trajectory from $\pi_i$
    Get dataset $D_i = \{(s, \pi^*(s))\}$
    Aggregate dataset $D \leftarrow D \cup D_i$
    Train classifier $\hat{\pi}_{i+1}$ on $D$

Con: For every state that we visited, we queried an expert for the optimal action.

Named Entity Recognition
Input: After completing his Ph.D., Ellis worked
[Figure: per-token tag sequences (O / PER) for $\pi_i$ and $\pi^*$]

Active Learning
Key Idea: The learner queries the expert for labels only when it is uncertain.

Formally, for each trial $t = 1, 2, \ldots$
    observe instance $x_t$, a real-valued vector
    set $\hat{p}_t = \pi_\theta(y_t^1 \mid x_t) - \pi_\theta(y_t^2 \mid x_t)$ (margin between the most likely and the second most likely labels)
    predict with $\hat{y}_t = \arg\max(\pi_\theta)$
    draw a Bernoulli variable $Z_t$ of parameter $b / (b + \hat{p}_t)$ (confidence parameter $b$)
    if $Z_t = 1$, query label $y_t$ and perform update

[T. Scheffer, C. Decomain, and S. Wrobel. Active hidden Markov models for information extraction. Proceedings of the International Conference on Advances in Intelligent Data Analysis (CAIDA), 2001.]
[Nicolò Cesa-Bianchi, Claudio Gentile, and Luca Zaniboni. 2006. Worst-case analysis of selective sampling for linear classification. JMLR.]

Leveraging Active Learning
Key Idea: The learner queries the expert for labels only when it is uncertain.

Formally, for each trial $t = 1, 2, \ldots$
    observe instance $x_t$, a real-valued vector
    set $\hat{p}_t = \pi_\theta(y_t^1 \mid x_t) - \pi_\theta(y_t^2 \mid x_t)$ (margin between the most likely and the second most likely labels)
    predict with $\hat{y}_t = \arg\max(\pi_\theta)$
    draw a Bernoulli variable $Z_t$ of parameter $b / (b + \hat{p}_t)$ (confidence parameter $b$)
    if $Z_t = 1$, query label $y_t$ and perform update

Confidence parameter $b$:
- big: increases the probability of requesting a label
- small: decreases the probability of requesting a label
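The query rule above is simple enough to check numerically. The sketch below assumes the policy exposes a list of label probabilities; the function names are hypothetical.

```python
import random


def margin(probs):
    """p_hat: gap between the most likely and second most likely label probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def query_probability(probs, b):
    """Parameter b / (b + p_hat) of the Bernoulli variable Z_t (requires b > 0)."""
    return b / (b + margin(probs))


def should_query(probs, b, rng):
    """Draw Z_t; True means 'ask the expert for this label'."""
    return rng.random() < query_probability(probs, b)


rng = random.Random(0)
uncertain = [0.5, 0.5]      # margin 0: the label is always requested
confident = [0.9, 0.1]      # margin 0.8: the label is rarely requested for small b
print(query_probability(uncertain, b=0.1), query_probability(confident, b=0.1))
```

With a zero margin the query probability is 1 regardless of `b`; for a fixed positive margin, increasing `b` pushes the probability toward 1, which matches the "big" and "small" cases on the slide.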

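Active Learning with DAgger

Initialize dataset $D$
Initialize $\hat{\pi}_1$
for $i = 1$ to $N$ do
    $\pi_i = \beta_i \pi^* + (1 - \beta_i) \hat{\pi}_i$
    Sample $T$-step trajectory from $\pi_i$
    for $t = 1$ to $T$
        set $\hat{p}_t = \pi_\theta(y_t^1 \mid s_t) - \pi_\theta(y_t^2 \mid s_t)$
        draw Bernoulli variable $Z_t$ of parameter $b / (b + \hat{p}_t)$
        if $Z_t = 1$
            Get dataset $D_t = \{(s_t, \pi^*(s_t))\}$
            Aggregate dataset $D \leftarrow D \cup D_t$
    Train classifier $\hat{\pi}_{i+1}$ on $D$

Question: Can we reduce expert queries even further?

Named Entity Recognition
Input: After completing his Ph.D., Ellis worked
[Figure: per-token tag sequences (O / PER) for $\pi_i$ and $\pi^*$]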
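The inner loop over $t$ is the only part that differs from plain DAgger, so the sketch below isolates it. It assumes a hypothetical `policy_probs(state)` that returns a dict from action to probability; nothing here is taken from the authors' code.

```python
import random


def active_labels(states, policy_probs, expert, b, rng):
    """Return expert-labeled pairs only for states where Z_t = 1."""
    new_pairs = []
    for s in states:
        top, second = sorted(policy_probs(s).values(), reverse=True)[:2]
        p_hat = top - second                    # margin of the current policy
        if rng.random() < b / (b + p_hat):      # Z_t = 1: policy is uncertain
            new_pairs.append((s, expert(s)))    # one expert query
    return new_pairs


states = ["After", "completing", "his", "Ph.D.", ",", "Ellis", "worked"]
expert = lambda s: "PER" if s == "Ellis" else "O"

unsure = lambda s: {"O": 0.5, "PER": 0.5}       # zero margin everywhere
sure = lambda s: {"O": 1.0, "PER": 0.0}         # margin 1 everywhere

queried_when_unsure = active_labels(states, unsure, expert, 0.5, random.Random(0))
queried_when_sure = active_labels(states, sure, expert, 1e-12, random.Random(0))
```

The returned pairs would be aggregated into $D$ before retraining; a confident policy with a small `b` leads to almost no expert queries.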

Our Approach: LeaQI (Learning to Query for Imitation)
Key Ideas:
- We assume access to a noisy heuristic function
- Use a disagreement classifier to decide if we should query the expert or the heuristic function
- Train the disagreement classifier using the Apple Tasting framework

Apple Tasting Framework
One-Sided Feedback Problem
- Learner encounters apples one by one
- Goal is to avoid tasting too many bad apples and to avoid throwing away too many good apples (reduce false negative rates)
- Problem is that the learner can only identify the good and bad apples by tasting them
- Learner only gets feedback for apples that it tastes
- Learner does not get feedback for apples that it throws away
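The feedback structure described above can be made concrete with a small simulation. This sketch only models which labels the learner gets to see; it is not an apple tasting learning algorithm, and the function names are hypothetical.

```python
def apple_tasting_round(predict_good, is_good):
    """Return (tasted, feedback); feedback is None when the apple is thrown away."""
    if predict_good:
        return True, is_good    # tasting reveals the true label
    return False, None          # discarded: the true label is never observed


def simulate(predictions, truths):
    observed = []               # (index, true label) pairs visible to the learner
    hidden_misses = 0           # good apples thrown away, invisible to the learner
    for i, (pred, truth) in enumerate(zip(predictions, truths)):
        tasted, feedback = apple_tasting_round(pred, truth)
        if tasted:
            observed.append((i, feedback))
        elif truth:
            hidden_misses += 1
    return observed, hidden_misses


observed, hidden_misses = simulate(
    predictions=[True, False, True, False],
    truths=[True, True, False, False],
)
```

The learner can count its tasted bad apples from `observed`, but `hidden_misses` is known only to the simulation, which is why false negatives are the difficult case.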

One-Sided Feedback Learning

Heuristic function $\pi^h$: noisy, biased and cheap

Named Entity Recognition
Input: After completing his Ph.D. , Ellis worked at Bell Labs from 1969 to 1972 on pro...
[Figure: heuristic and expert tag sequences (O / ORG) for the input]

LeaQI One-Sided Feedback
- Learn a difference classifier to predict when the heuristic and the expert disagree
- Difference classifier only gets feedback when it predicts disagree and we query the expert
- Difference classifier does not get feedback when it predicts agree and we query the heuristic function

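LeaQI

draw Bernoulli variable $Z_t$ of parameter $b / (b + \hat{p}_t)$
if $Z_t = 1$
    Set difference classifier $\hat{d}_i = h_i(s)$
    if AppleTaste($s$, $\pi^h(s)$, $\hat{d}_i$)
        Aggregate dataset $D \leftarrow D \cup \{(s, \pi^h(s))\}$
    else
        Aggregate dataset $D \leftarrow D \cup \{(s, \pi^*(s))\}$
        Aggregate dataset $S \leftarrow S \cup \{(s, \pi^h(s), \hat{d}, d)\}$
Train classifier $\hat{\pi}_{i+1}$ on $D$
Train difference classifier $h_{i+1}$ on $S$

Named Entity Recognition
Input: After completing his Ph.D. , Ellis worked
[Figure: per-token rows for $\pi_i$, the gazetteer heuristic (O / PER tags), and the difference classifier (Y / N)]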
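The per-state decision in LeaQI can be sketched as follows. Several things here are assumptions for illustration: `d_hat = True` is read as "heuristic and expert are predicted to agree", `d` is taken to be the observed agreement, and `apple_taste` is a hypothetical callable that returns True when the heuristic should be trusted. The toy gazetteer and expert are made up.

```python
def leaqi_update(state, z, diff_classifier, apple_taste, heuristic, expert, D, S):
    """Process one visited state. D trains the policy, S the difference classifier."""
    if not z:                                   # Z_t = 0: policy is confident
        return "skip"
    h_action = heuristic(state)
    d_hat = diff_classifier(state)              # predicted agreement (assumption)
    if apple_taste(state, h_action, d_hat):     # trust the cheap heuristic
        D.append((state, h_action))
        return "heuristic"                      # no feedback for the classifier
    e_action = expert(state)                    # costly expert query
    D.append((state, e_action))
    d = e_action == h_action                    # observed agreement (assumption)
    S.append((state, h_action, d_hat, d))       # feedback only on this branch
    return "expert"


# Toy setup: a gazetteer that only knows person names.
heuristic = lambda w: "PER" if w == "Ellis" else "O"
expert = lambda w: {"Ellis": "PER", "Bell": "ORG", "Labs": "ORG"}.get(w, "O")
diff_classifier = lambda w: w != "Bell"         # predicts disagreement only on "Bell"
apple_taste = lambda s, a, d_hat: d_hat         # simplest rule: follow the prediction

D, S = [], []
sources = [
    leaqi_update(w, True, diff_classifier, apple_taste, heuristic, expert, D, S)
    for w in ["Ellis", "worked", "at", "Bell", "Labs"]
]
```

In this toy run the classifier wrongly predicts agreement on "Labs", so the heuristic's "O" enters `D` and no feedback about the mistake is ever recorded in `S`; that missing feedback is the one-sided problem the apple tasting framework is meant to handle.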

Experiment

Task       Language      Dataset                 Heuristic                Heur. Quality
...        ...           ...                     Gazetteer                P 88%, R 27%
Keyphrase  English       SemEval 2017 Task 10    Unsupervised model       P 20%, R 44%
POS        Modern Greek  Universal Dependencies  Dictionary (Wiktionary)  67% acc
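Reading "P" and "R" as precision and recall of the heuristic measured against the expert, the quality numbers can be illustrated with a token-level computation. Token-level scoring and the tags below are assumptions; the evaluation behind the reported figures may differ (for example, it may score whole entities).

```python
def precision_recall(heuristic_tags, expert_tags, outside="O"):
    """Token-level precision and recall of non-'O' heuristic tags."""
    pairs = list(zip(heuristic_tags, expert_tags))
    true_pos = sum(1 for h, e in pairs if h != outside and h == e)
    predicted = sum(1 for h, _ in pairs if h != outside)
    actual = sum(1 for _, e in pairs if e != outside)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# A gazetteer that rarely fires but is right when it does.
heuristic_tags = ["O", "PER", "O", "O"]
expert_tags = ["O", "PER", "ORG", "ORG"]
precision, recall = precision_recall(heuristic_tags, expert_tags)
```

The toy gazetteer shows the same pattern as the first table row: high precision, low recall.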

Experiment Results
Q1: Active vs Passive
Q2: Heuristic as features vs Policy
Q3: Difference Classifier Efficacy
Q4: Apple Tasting Efficacy
Q5: Robustness to a Poor Heuristic

- We showed that the Apple Tasting framework has practical benefits
- We showed a relationship between using a heuristic function and one-sided feedback learning
- We introduced a new algorithm and evaluated it on 3 tasks

Thank you!

