Multi-Task Learning & Transfer Learning Basics


Multi-Task Learning & Transfer Learning Basics
CS 330

Logistics
- Optional homework 0 due Monday 9/27.
- PyTorch review session tomorrow at 6:00 pm PT.
- Project guidelines posted.
- Office hours start today.

Plan for Today
Multi-Task Learning
- Problem statement
- Models, objectives, optimization
- Challenges
- Case study of real-world multi-task learning
Transfer Learning
- Pre-training & fine-tuning

Goals for the end of lecture:
- Know the key design decisions when building multi-task learning systems
- Understand the difference between multi-task learning and transfer learning
- Understand the basics of transfer learning

Multi-Task Learning

Some notation
What is a task? (more formally this time)

A task: 𝒯_i ≜ {p_i(x), p_i(y|x), ℒ_i}, where p_i(x) and p_i(y|x) are the data-generating distributions and ℒ_i is a loss function.

Corresponding datasets: 𝒟_i^tr, 𝒟_i^test (will use 𝒟_i as shorthand for 𝒟_i^tr).

Model: f_θ(y|x), e.g. x = image → y ∈ {cat, lynx, tiger, …}, or x = paper → y = length of paper.

Typical loss: negative log-likelihood
ℒ(θ, 𝒟) = −𝔼_{(x,y)∼𝒟}[log f_θ(y|x)]

Single-task learning [supervised]: given 𝒟 = {(x, y)_k}, solve min_θ ℒ(θ, 𝒟)
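The negative log-likelihood loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not course-provided code; the probabilities and labels are made up for the example.

```python
import numpy as np

def nll_loss(probs, labels):
    """Negative log-likelihood: L(theta, D) = -E_{(x,y)~D}[log f_theta(y|x)].

    probs:  (N, C) array of predicted class probabilities f_theta(y|x)
    labels: (N,)   array of integer class labels y
    """
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Two examples, three classes; the model puts high probability on the true class.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = nll_loss(probs, labels)   # = -(log 0.7 + log 0.8) / 2
```

In practice the model outputs logits and you would use a numerically stable log-softmax (e.g. `torch.nn.functional.cross_entropy`), but the quantity computed is the same.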

Examples of Tasks
A task: 𝒯_i ≜ {p_i(x), p_i(y|x), ℒ_i} (data-generating distributions and loss). Corresponding datasets: 𝒟_i^tr, 𝒟_i^test (will use 𝒟_i as shorthand for 𝒟_i^tr).

Multi-task classification: ℒ_i same across all tasks
- e.g. per-language handwriting recognition
- e.g. personalized spam filter

Multi-label learning: ℒ_i, p_i(x) same across all tasks
- e.g. CelebA attribute recognition
- e.g. scene understanding

When might ℒ_i vary across tasks?
- mixed discrete, continuous labels across tasks
- multiple metrics that you care about

The same input can have different targets per task, e.g. x = paper → y = length of paper, summary of paper, or paper review.

Condition the model on a task descriptor z_i: f_θ(y|x) → f_θ(y|x, z_i)
- e.g. one-hot encoding of the task index
- or whatever meta-data you have:
  - personalization: user features/attributes
  - language description of the task
  - formal specifications of the task

Vanilla MTL objective: min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i)

Decisions on the model, the objective, and the optimization:
- How should we condition on z_i?
- What objective should we use?
- How to optimize our objective?

Model: How should the model be conditioned on z_i? What parameters of the model should be shared?
Objective: How should the objective be formed?
Optimization: How should the objective be optimized?

Conditioning on the task
Let's assume z_i is the one-hot task index.
Question: How should you condition on the task in order to share as little as possible?

Conditioning on the task
Multiplicative gating: y = Σ_j 1(z_i = j) y^j
This is independent training within a single network, with no shared parameters!
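The gating equation above can be sketched in NumPy. This is a hypothetical toy (3 tasks, linear subnetworks) just to show that the one-hot descriptor selects exactly one subnetwork's output:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_out = 3, 4, 2            # hypothetical sizes: 3 tasks

# One independent linear "subnetwork" per task, packed into one model.
W = rng.normal(size=(T, d_out, d_in))

def gated_forward(x, z):
    """y = sum_j 1(z_i = j) * y^j : the one-hot task descriptor z
    selects one subnetwork, so no parameters are shared across tasks."""
    per_task_outputs = np.einsum('toi,i->to', W, x)   # y^j for every task j
    return z @ per_task_outputs                       # zeros out all but task i

x = rng.normal(size=d_in)
z = np.eye(T)[1]                    # one-hot descriptor for task index 1
y = gated_forward(x, z)             # identical to W[1] @ x
```

Since only one subnetwork's gradient is nonzero per example, the T subnetworks train independently even though they live in one model.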

The other extreme
Concatenate z_i with the input and/or activations.
All parameters are shared (except the parameters directly following z_i, if z_i is one-hot).

An Alternative View on the Multi-Task Architecture
Split θ into shared parameters θ^sh and task-specific parameters θ^i.
Then, our objective is:
min_{θ^sh, θ^1, …, θ^T} Σ_{i=1}^T ℒ_i({θ^sh, θ^i}, 𝒟_i)
Choosing how to condition on z_i is equivalent to choosing how & where to share parameters.

Conditioning: Some Common Choices
1. Concatenation-based conditioning (concat z_i, followed by a fully-connected layer)
2. Additive conditioning
These are actually equivalent!
Question: why are they the same thing? (raise your hand)
Diagram sources: distill.pub/2018/feature-wise-transformations/
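The equivalence can be checked numerically: a fully-connected layer applied to the concatenation [x; z] splits column-wise into W_x x + W_z z, which is exactly additive conditioning. A minimal sketch with hypothetical weight shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, d_out = 5, 3, 4           # hypothetical sizes

Wx = rng.normal(size=(d_out, d_x))  # columns of W acting on x
Wz = rng.normal(size=(d_out, d_z))  # columns of W acting on z
x, z = rng.normal(size=d_x), rng.normal(size=d_z)

# 1. Concatenation-based conditioning: one fully-connected layer on [x; z].
concat_out = np.concatenate([Wx, Wz], axis=1) @ np.concatenate([x, z])

# 2. Additive conditioning: separate linear maps on x and z, summed.
additive_out = Wx @ x + Wz @ z
```

The two outputs are identical, so the design choice is purely notational at the level of a single linear layer.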

Conditioning: Some Common Choices
3. Multi-head architecture (Ruder '17)
4. Multiplicative conditioning
Why might multiplicative conditioning be a good idea?
- more expressive per layer
- recall: multiplicative gating
Multiplicative conditioning generalizes independent networks and independent heads.
Diagram sources: distill.pub/2018/feature-wise-transformations/
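A minimal NumPy sketch of the multi-head architecture (hypothetical sizes, linear layers with a ReLU for brevity): a shared bottom computes features, and a per-task head reads them out.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_hidden, d_out = 3, 4, 8, 2     # hypothetical sizes

W_shared = rng.normal(size=(d_hidden, d_in))      # theta^sh: shared bottom
W_heads = rng.normal(size=(T, d_out, d_hidden))   # theta^i: one head per task

def multi_head_forward(x, task_index):
    """Shared trunk feeds a task-specific output head."""
    h = np.maximum(0.0, W_shared @ x)   # shared representation (ReLU)
    return W_heads[task_index] @ h      # task-specific head

x = rng.normal(size=d_in)
y0 = multi_head_forward(x, 0)
y1 = multi_head_forward(x, 1)           # same trunk features, different head
```

Here θ^sh = W_shared is updated by every task's gradient, while each W_heads[i] only sees task i's data, matching the split-parameter objective from the previous slide.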

Conditioning: More Complex Choices
- Cross-Stitch Networks. Misra, Shrivastava, Gupta, Hebert '16
- Multi-Task Attention Network. Liu, Johns, Davison '18
- Deep Relation Networks. Long, Wang '15
- Perceiver IO. Jaegle et al. '21

Conditioning Choices
Unfortunately, these design decisions are like neural network architecture tuning:
- problem dependent
- largely guided by intuition or knowledge of the problem
- currently more of an art than a science

Model: How should the model be conditioned on z_i? What parameters of the model should be shared?
Objective: How should the objective be formed?
Optimization: How should the objective be optimized?

Vanilla MTL objective: min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i)

Often want to weight tasks differently: min_θ Σ_{i=1}^T w_i ℒ_i(θ, 𝒟_i)

How to choose w_i? Manually, based on importance or priority, or dynamically adjusted throughout training:
a. various heuristics, e.g. encourage gradients to have similar magnitudes (Chen et al. GradNorm. ICML 2018)
b. optimize for the worst-case task loss: min_θ max_i ℒ_i(θ, 𝒟_i)
   (e.g. for task robustness, or for fairness)
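The two weighted objectives above reduce to one-liners. A toy sketch with made-up per-task losses and manually chosen weights w_i:

```python
# Hypothetical per-task losses L_i and manually chosen weights w_i.
losses = [0.2, 1.5, 0.6]
weights = [1.0, 2.0, 0.5]

# Weighted objective: sum_i w_i * L_i
weighted_loss = sum(w * l for w, l in zip(weights, losses))

# Worst-case objective: max_i L_i (optimize the hardest task)
worst_case_loss = max(losses)
```

In training, `weighted_loss` (or `worst_case_loss`) would be the scalar you backpropagate; the weights can also be recomputed every step for dynamic schemes like GradNorm.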

Model: How should the model be conditioned on z_i? What parameters of the model should be shared?
Objective: How should the objective be formed?
Optimization: How should the objective be optimized?

Optimizing the objective
Vanilla MTL objective: min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i)
Basic version:
1. Sample a mini-batch of tasks ℬ ∼ {𝒯_i}
2. Sample a mini-batch of datapoints for each task: 𝒟_i^b ∼ 𝒟_i
3. Compute the loss on the mini-batch: ℒ̂(θ, ℬ) = Σ_{𝒯_k ∈ ℬ} ℒ_k(θ, 𝒟_k^b)
4. Backpropagate the loss to compute the gradient ∇_θ ℒ̂
5. Apply the gradient with your favorite neural net optimizer (e.g. Adam)
Note: This ensures that tasks are sampled uniformly, regardless of data quantities.
Tip: For regression problems, make sure your task labels are on the same scale!
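The five steps above can be sketched end-to-end on a toy problem. This is a hypothetical setup (4 linear-regression tasks that happen to share one true regressor, plain SGD instead of Adam), just to make the sampling structure concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4                                # number of tasks
w_true = rng.normal(size=3)          # toy: all tasks share one labeling rule

# Hypothetical per-task datasets (different x draws, small label noise).
datasets = []
for _ in range(T):
    X = rng.normal(size=(50, 3))
    y = X @ w_true + 0.01 * rng.normal(size=50)
    datasets.append((X, y))

def task_loss(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

theta = np.zeros(3)                  # shared parameters
lr, B, b = 0.1, 2, 16                # step size, tasks/batch, points/task

for step in range(200):
    batch = rng.choice(T, size=B, replace=False)         # 1. sample tasks uniformly
    grad = np.zeros_like(theta)
    for k in batch:
        X, y = datasets[k]
        idx = rng.choice(len(X), size=b, replace=False)  # 2. sample datapoints
        Xb, yb = X[idx], y[idx]
        grad += 2 * Xb.T @ (Xb @ theta - yb) / b         # 3-4. mini-batch gradient
    theta -= lr * grad / B                               # 5. SGD update

final_loss = np.mean([task_loss(theta, X, y) for X, y in datasets])
```

Because tasks are drawn uniformly in step 1, a task with 10× more data does not get 10× more gradient updates; that is the point of the note on the slide.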

Challenges

Challenge #1: Negative transfer
Negative transfer: sometimes independent networks work the best.
[Table: Multi-Task CIFAR-100 results comparing multi-head architectures, the cross-stitch architecture, independent training, and recent approaches (Yu et al. Gradient Surgery for Multi-Task Learning. 2020)]
Why?
- optimization challenges
  - caused by cross-task interference
  - tasks may learn at different rates
- limited representational capacity
  - multi-task networks often need to be much larger than their single-task counterparts

If you have negative transfer, share less across tasks.
It's not just a binary decision!
min_{θ^sh, θ^1, …, θ^T} Σ_{i=1}^T ℒ_i({θ^sh, θ^i}, 𝒟_i) + Σ_{t'=1}^T ‖θ^t − θ^{t'}‖
"soft parameter sharing": constrained weights across the per-task networks y^1, …, y^T
- allows for more fluid degrees of parameter sharing
- yet another set of design decisions / hyperparameters
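The soft-sharing penalty term can be sketched directly. A toy example with hypothetical per-task parameter vectors, summing the norm of the difference over task pairs:

```python
import numpy as np

def soft_sharing_penalty(thetas):
    """Sum over task pairs of ||theta^t - theta^t'||: pulls per-task
    parameters toward each other without tying them exactly."""
    T = len(thetas)
    return sum(np.linalg.norm(thetas[t] - thetas[tp])
               for t in range(T) for tp in range(t + 1, T))

# Hypothetical per-task parameter vectors theta^t.
thetas = [np.array([1.0, 2.0]), np.array([1.5, 1.0]), np.array([0.5, 2.5])]
penalty = soft_sharing_penalty(thetas)
```

Adding `penalty` (usually scaled by a tunable coefficient) to the sum of task losses interpolates between fully independent networks (zero weight) and effectively tied parameters (large weight).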

Challenge #2: Overfitting
You may not be sharing enough!
Multi-task learning acts as a form of regularization.
Solution: Share more.

Challenge #3: What if you have a lot of tasks?
Should you train all of them together? Which ones will be complementary?
The bad news: No closed-form solution for measuring task similarity.
The good news: There are ways to approximate it from one training run.
Fifty, Amid, Zhao, Yu, Anil, Finn. Efficiently Identifying Task Groupings for Multi-Task Learning. 2021

Plan for Today
Multi-Task Learning
- Problem statement
- Models, objectives, optimization
- Challenges
- Case study of real-world multi-task learning
Transfer Learning
- Pre-training & fine-tuning

Case study
Goal: Make recommendations for YouTube

Case study
Goal: Make recommendations for YouTube
Conflicting objectives:
- videos that users will rate highly
- videos that users will share
- videos that users will watch
Implicit bias caused by feedback: a user may have watched a video because it was recommended!

Framework Set-Up
Input: what the user is currently watching (query video) + user features
1. Generate a few hundred candidate videos
2. Rank candidates
3. Serve top-ranking videos to the user
Candidate videos: pool videos from multiple candidate generation algorithms
- matching topics of the query video
- videos most frequently watched with the query video
- and others
Ranking: the central topic of this paper

The Ranking Problem
Input: query video, candidate video, user & context features
Model output: engagement and satisfaction with the candidate video
Engagement:
- binary classification tasks like clicks
- regression tasks related to time spent
Satisfaction:
- binary classification tasks like clicking "like"
- regression tasks such as ratings
Ranking score: weighted combination of engagement & satisfaction predictions, with the score weights manually tuned.
Question: Are these objectives reasonable? What are some of the issues that might come up?

The Architecture
Basic option: "Shared-Bottom Model" (i.e. multi-head architecture)
- harms learning when correlation between tasks is low

The Architecture
Instead: use a form of soft parameter sharing, "Multi-gate Mixture-of-Experts (MMoE)".
Allow different parts of the network (expert neural networks) to "specialize":
1. Decide which experts to use for input x and task k (a per-task gating network)
2. Compute features from the selected experts
3. Compute the output
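The three MMoE steps can be sketched in NumPy. This is a heavily simplified toy (linear experts and heads, hypothetical sizes: 4 experts, 2 tasks), not the paper's implementation:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(0)
E, K, d_in, d_h = 4, 2, 6, 8            # hypothetical: 4 experts, 2 tasks

W_experts = rng.normal(size=(E, d_h, d_in))   # shared expert networks
W_gates = rng.normal(size=(K, E, d_in))       # one gating network per task
W_heads = rng.normal(size=(K, 1, d_h))        # task-specific output heads

def mmoe_forward(x, k):
    """Task k's gate softly selects which experts to use for input x,
    then a task-specific head reads out the mixed expert features."""
    expert_outs = np.einsum('ehi,i->eh', W_experts, x)  # 2. all expert features
    gate = softmax(W_gates[k] @ x)                      # 1. task-k mixture weights
    features = gate @ expert_outs                       #    weighted expert combo
    return (W_heads[k] @ features)[0]                   # 3. task-k output

x = rng.normal(size=d_in)
score_engagement = mmoe_forward(x, 0)    # e.g. an engagement prediction
score_satisfaction = mmoe_forward(x, 1)  # same experts, different gate + head
```

Because each task has its own softmax gate, two tasks with low correlation can route to disjoint experts, which is the soft-sharing behavior the slide motivates.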

Experiments
Set-Up:
- Implementation in TensorFlow, TPUs
- Train in temporal order, running training continuously to consume newly arriving data
- Offline AUC & squared error metrics
- Online A/B testing in comparison to the production system
  - live metrics based on time spent, survey responses, rate of dismissals
- Model computational efficiency matters
Results:
- Found a 20% chance of gating polarization during distributed training; use drop-out on the experts

Plan for Today
Multi-Task Learning
- Problem statement
- Models & training
- Challenges
- Case study of real-world multi-task learning
Transfer Learning
- Pre-training & fine-tuning

Multi-Task Learning vs. Transfer Learning
Multi-Task Learning: Solve multiple tasks 𝒯_1, …, 𝒯_T at once: min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i)
Transfer Learning: Solve target task 𝒯_b after solving source task 𝒯_a, by transferring knowledge learned from 𝒯_a.
Key assumption: Cannot access data 𝒟_a during transfer.
Transfer learning is a valid solution to multi-task learning (but not vice versa).
Side note: 𝒯_a may itself include multiple tasks.
Question: In what settings might transfer learning make sense? (answer in chat or raise hand)

Transfer learning via fine-tuning
Start from pre-trained parameters θ and run gradient descent on the new task's training data 𝒟^tr:
φ ← θ − α ∇_θ ℒ(θ, 𝒟^tr)
(typically for many gradient steps)
Where do you get the pre-trained parameters?
- ImageNet classification
- Models trained on large language corpora (BERT, LMs)
- Other unsupervised learning techniques
- Whatever large, diverse dataset you might have
Pre-trained models are often available online.
What makes ImageNet good for transfer learning? Huh, Agrawal, Efros. '16
Some common practices:
- Fine-tune with a smaller learning rate
- Smaller learning rate for earlier layers
- Freeze earlier layers, gradually unfreeze
- Reinitialize the last layer
- Search over hyperparameters via cross-validation
- Architecture choices matter (e.g. ResNets)
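The layer-wise practices above (smaller rates for earlier layers, freezing the earliest ones, fully training the reinitialized last layer) can be sketched as a configuration. The layer names, base rate, and decay factor here are all hypothetical choices for illustration:

```python
# A simple model viewed as an ordered list of named layers (hypothetical names).
layers = ["conv1", "conv2", "conv3", "fc"]

base_lr = 1e-3   # already smaller than a typical from-scratch learning rate
decay = 0.5      # earlier layers get geometrically smaller rates (assumed factor)

# Smaller learning rates for earlier layers; the last (reinitialized) layer
# trains at the full base rate.
lr_per_layer = {name: base_lr * decay ** (len(layers) - 1 - i)
                for i, name in enumerate(layers)}

# Freeze the earliest layers first; they would be unfrozen gradually
# as fine-tuning progresses.
frozen = {name: i < 2 for i, name in enumerate(layers)}
```

In PyTorch this maps onto per-parameter-group learning rates in the optimizer and `requires_grad = False` for frozen layers; the dictionary form just makes the schedule explicit.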

Fine-tuning doesn't work well with small target task datasets.
Universal Language Model Fine-Tuning for Text Classification. Howard, Ruder. '18
Upcoming lectures: few-shot learning via meta-learning

Plan for Today
Multi-Task Learning
- Problem statement
- Models, objectives, optimization
- Challenges
- Case study of real-world multi-task learning
Transfer Learning
- Pre-training & fine-tuning

Goals for the end of lecture:
- Know the key design decisions when building multi-task learning systems
- Understand the difference between multi-task learning and transfer learning
- Understand the basics of transfer learning

Reminders
Next time: Meta-learning problem statement, black-box meta-learning, GPT-3

