Deep Learning - Courses.cs.duke.edu


4/23/21
Deep Learning
Ronald Parr
CompSci 370
With thanks to Kris Hauser for some content

Late 1990s: Neural Networks Hit the Wall
- Recall that a 3-layer network can approximate any function arbitrarily closely (caveat: it might require many, many hidden nodes)
- Q: Why not use big networks for hard problems?
- A: It didn’t work in practice!
  - Vanishing gradients
  - Not enough training data (local optima, variance)
  - Not enough training time (computers were too slow to handle huge data sets, even if such data sets had been available)

Why Deep?
- Deep learning is a family of techniques for building and training large neural networks
- Why deep and not wide?
  - Deep sounds better than wide :)
  - While wide is always possible, deep may require fewer nodes to achieve the same result
  - May be easier to structure with human intuition: think about layers of computation vs. one flat, wide computation

Examples of Deep Learning Today
- Object/face recognition in your phone, your browser, autonomous vehicles, etc.
- Natural language processing (speech to text, parsing, information extraction, machine translation)
- Product recommendations (Netflix, Amazon)
- Fraud detection
- Medical imaging
- Image enhancement or restoration (e.g., Adobe Super Resolution): the-acrteam-super-resolution.html
- Quick Draw: https://quickdraw.withgoogle.com
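The claim that deep may need fewer nodes than wide can be made concrete with a standard construction (an illustration added here, not taken from the slides). The "tent" map 2·relu(x) - 4·relu(x - 0.5) uses two ReLU units, equals 2x on [0, 0.5] and 2 - 2x on [0.5, 1], and composed with itself k times produces a sawtooth with 2^k linear pieces. A single hidden layer of w ReLU units acting on a scalar input is piecewise linear with at most w + 1 pieces, so it needs at least 2^k - 1 units to match. The sketch below counts the pieces numerically; all function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    # Two ReLU units: equals 2x on [0, 0.5] and 2 - 2x on [0.5, 1]
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, depth):
    # Compose `depth` two-unit ReLU layers, each computing the tent map
    for _ in range(depth):
        x = tent(x)
    return x

def count_linear_pieces(depth):
    # Dyadic grid: every breakpoint j / 2**depth lands exactly on a grid point,
    # and all values stay exactly representable in float64 for small depths
    m = 2 ** (depth + 2)
    x = np.arange(m + 1) / m
    y = deep_sawtooth(x, depth)
    slopes = np.diff(y) / np.diff(x)
    # A new linear piece starts wherever the slope changes
    return 1 + int(np.count_nonzero(np.diff(slopes)))

for depth in range(1, 6):
    pieces = count_linear_pieces(depth)
    print(f"depth {depth}: {2 * depth} ReLU units in total, {pieces} linear pieces; "
          f"one hidden layer would need at least {pieces - 1} units")
```

The deep network's unit count grows linearly with depth while the number of pieces doubles per layer, which is the sense in which a wide, shallow network can need exponentially more nodes for the same function.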

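Vanishing Gradients
- Recall the backprop derivation:
  $$\delta_j \equiv \frac{\partial E}{\partial a_j} = \sum_k \frac{\partial E}{\partial a_k}\frac{\partial a_k}{\partial a_j} = h'(a_j)\sum_k w_{kj}\,\delta_k$$
- Activation functions are often bounded (e.g., between -1 and 1), and their derivatives h'(a) are often less than 1 (the sigmoid's derivative is at most 0.25), so each layer of backprop multiplies the error signal by small factors
- The further you get from the output layer, the smaller the gradient gets
- Hard to learn when gradients are noisy and small

Related Problem: Saturation
[Figure: sigmoid-shaped activation function plotted for inputs from -10 to 10]
- Sigmoid gradient goes to 0 at the tails
- Extreme values (saturation) anywhere along the backprop path cause the gradient to vanish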
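To see the recursion δ_j = h'(a_j) Σ_k w_kj δ_k shrink the gradient layer by layer, the sketch below runs a forward and backward pass through a small random stack of sigmoid layers. The layer width, the weight scale, and the choice dE/dz = 1 at every output unit are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    # h'(a) = h(a) * (1 - h(a)); its maximum value is 0.25, at a = 0
    s = sigmoid(a)
    return s * (1.0 - s)

def delta_norms(num_layers, width=10, seed=0):
    """Norm of delta = dE/da at each layer of a random sigmoid network.

    Assumes dE/dz = 1 for every output unit, purely for illustration."""
    rng = np.random.default_rng(seed)
    weights, pre_acts = [], []
    z = rng.normal(size=width)
    # Forward pass: a_l = W_l z_{l-1}, z_l = h(a_l)
    for _ in range(num_layers):
        W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
        a = W @ z
        weights.append(W)
        pre_acts.append(a)
        z = sigmoid(a)
    # Backward pass: delta_j = h'(a_j) * sum_k w_kj * delta_k
    delta = sigmoid_grad(pre_acts[-1])
    norms = [np.linalg.norm(delta)]
    for layer in range(num_layers - 1, 0, -1):
        delta = sigmoid_grad(pre_acts[layer - 1]) * (weights[layer].T @ delta)
        norms.append(np.linalg.norm(delta))
    return norms[::-1]  # index 0 = layer closest to the input

norms = delta_norms(num_layers=10)
for layer, value in enumerate(norms, start=1):
    print(f"layer {layer:2d}: |delta| = {value:.2e}")

# Saturation: the sigmoid gradient is tiny at the tails
print(sigmoid_grad(np.array([0.0, 5.0, 10.0])))
```

Each backward step multiplies by factors of at most 0.25 from h', so the norms fall roughly geometrically toward the input layer; a large |a| anywhere (saturation) makes the corresponding factor nearly zero.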

Summary of the Challenges
- Not enough training data in the 1990s to justify the complexity of big networks (recall the bias-variance tradeoff)
- Slow to train big networks
- Vanishing gradients, saturation

Summary of Changes
- Massive data available
- Massive computation available
- Faster training methods
- Different training methods
- Different network structures
- Different activation functions

Estimating the Gradient Efficiently
- Recall: backpropagation is gradient descent
- Computing the exact gradient of the loss function requires summing over all training samples
- Thought experiment: What if you randomly sample one (or more) data point(s) and compute the gradient?
  - Called online or stochastic gradient
  - Expected value of sampled gradient = true value of gradient
  - Sampled gradient = true gradient + noise
  - As sample size increases, noise decreases and sampled gradient → true gradient
- Practical idea: For massive data sets, estimate the gradient using sampled training points to trade off computation vs. accuracy in the gradient calculation
- Possible pitfalls:
  - What is the right sampling strategy?
  - Does the noise prevent convergence or lead to slower convergence?

Batch/Minibatch Methods
- Find a sweet spot by estimating the gradient using a subset of the samples
- Randomly sample subsets of the training data and sum gradient computations over all samples in the subset
- Take advantage of parallel architectures (multicore/GPU)
- Still requires careful selection of step size and step size adjustment schedule (art vs. science)
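The following sketch illustrates both slides on a hypothetical linear-regression problem with mean squared error: the error of a sampled gradient shrinks as the batch grows, and minibatch gradient descent with per-epoch reshuffling still reaches the true weights. The data set, batch size, step size, and epoch count are assumptions chosen only for illustration; the gradient is averaged (rather than summed) over the batch so the step size does not depend on batch size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: linear regression with mean squared error loss
n, d = 10_000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

def gradient(w, idx):
    # Gradient of (1/|B|) * sum_{i in B} (x_i . w - y_i)^2 over the rows in idx
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

def gradient_noise(w, batch_size, trials=200):
    # Average distance between the sampled gradient and the full-data gradient
    full = gradient(w, np.arange(n))
    errors = [
        np.linalg.norm(gradient(w, rng.choice(n, size=batch_size, replace=False)) - full)
        for _ in range(trials)
    ]
    return float(np.mean(errors))

def minibatch_sgd(batch_size=32, step_size=0.05, epochs=5):
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= step_size * gradient(w, idx)
    return w

w0 = np.zeros(d)
for b in (1, 10, 100, 1000):
    print(f"batch size {b:4d}: average gradient error {gradient_noise(w0, b):.3f}")

w_hat = minibatch_sgd()
print("distance to true weights:", np.linalg.norm(w_hat - true_w))
```

Sampling without replacement within an epoch (via a permutation) is one common sampling strategy; drawing each minibatch independently with replacement is another, and the slides leave that choice open.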

Other Tricks for Speeding Things Up
- Second order methods, e.g., Newton’s method, may be computationally intensive in high dimensions
- Conjugate gradient is more computationally efficient, though not yet widely used
- Momentum: use a combination of previous gradients to smooth out oscillations
- Line search: (binary) search in the gradient direction to find the biggest worthwhile step size
- Some methods try to get the benefits of second order methods without the cost (without computing the full Hessian), e.g., ADMM

Tricks for Breaking Down Problems
- Build up deep networks by training shallow networks, then feeding their output into new layers (may help with vanishing gradient and other problems): a form of “pretraining”
- Train the network to solve “easier” problems first, then train on harder problems: curriculum learning, a form of “shaping”
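A small sketch of the momentum idea, using the common "heavy-ball" update v ← βv - η∇f(w), w ← w + v on a hypothetical ill-conditioned quadratic. The quadratic, step size η = 0.03, and momentum β = 0.7 are illustrative assumptions; setting β = 0 recovers plain gradient descent.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic f(w) = 0.5 * w^T A w, minimized at w = 0
A = np.diag([1.0, 50.0])

def grad(w):
    return A @ w

def iterations_to_converge(step_size, momentum, tol=1e-6, max_iters=10_000):
    w = np.array([1.0, 1.0])
    v = np.zeros_like(w)
    for it in range(1, max_iters + 1):
        # Velocity is an exponentially decaying sum of past gradient steps;
        # momentum = 0 reduces this to plain gradient descent
        v = momentum * v - step_size * grad(w)
        w = w + v
        if np.linalg.norm(w) < tol:
            return it
    return max_iters

gd_iters = iterations_to_converge(step_size=0.03, momentum=0.0)
momentum_iters = iterations_to_converge(step_size=0.03, momentum=0.7)
print(f"gradient descent: {gd_iters} iterations, with momentum: {momentum_iters}")
```

With this step size, plain gradient descent bounces back and forth along the steep direction (curvature 50) and crawls along the shallow one (curvature 1); the velocity term accumulates the consistent shallow-direction component, so the momentum run converges in far fewer iterations.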

Convolutional Neural Networks (CNNs)
- Championed by LeCun (1998)
- Originally used for handwriting recognition
- Now used in state-of-the-art systems in many computer vision applications
- Well-suited to data with a grid-like structure

Convolutions
- What is a convolution?
- Way to combine two functions, e.g., x and w (integral over the entire domain):
  $$s(t) = \int x(a)\,w(t-a)\,da$$
- Discrete version:
  $$s(t) = \sum_a x(a)\,w(t-a)$$
- Example: Suppose s(t) is a decaying average of values of x around t, with w decreasing as a gets further from t
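A direct sketch of the discrete formula s(t) = Σ_a x(a) w(t - a) for finite sequences (treated as zero outside their range). The kernel here is a hypothetical one-sided decaying kernel, so s[t] is a weighted average of x at and just before t with weights that halve per step; NumPy's `np.convolve` computes the same "full" convolution and serves as a check.

```python
import numpy as np

def conv1d(x, w):
    # Full discrete convolution of finite sequences (zero outside their range):
    # s[t] = sum_a x[a] * w[t - a]
    n_out = len(x) + len(w) - 1
    s = np.zeros(n_out)
    for t in range(n_out):
        for a in range(len(x)):
            if 0 <= t - a < len(w):
                s[t] += x[a] * w[t - a]
    return s

# Hypothetical decaying kernel: weights halve for each step further from t, normalized to sum to 1
w = 0.5 ** np.arange(4)
w /= w.sum()

x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
s = conv1d(x, w)
print(np.round(s, 3))

# NumPy's np.convolve computes the same full convolution
assert np.allclose(s, np.convolve(x, w))
```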

Convolution on Grid Example
Input:
  a b c d
  e f g h
  i j k l
Kernel:
  w x
  y z
Output:
  aw+bx+ey+fz  bw+cx+fy+gz  cw+dx+gy+hz
  ew+fx+iy+jz  fw+gx+jy+kz  gw+hx+ky+lz
Figure 9.1: An example of 2-D convolution without kernel-flipping. In this case we restrict the output to only positions where the kernel lies entirely within the image, called “valid” convolution in some contexts. We draw boxes with arrows to indicate how the upper-left element of the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor. (From Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville)

Convolutions on Grids
- For image I and convolution “kernel” K:
  $$S(i,j) = \sum_m \sum_n I(m,n)\,K(i-m,\,j-n) = \sum_m \sum_n I(i-m,\,j-n)\,K(m,n)$$
- Examples:
  - A convolution can blur/smooth/noise-filter an image by averaging neighboring pixels.
  - A convolution can also serve as an edge detector (https://en.wikipedia.org/wiki/Kernel_(image_processing); Figure 9.6 from Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville)
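The sketch below implements the "valid" 2-D operation of Figure 9.1 (no kernel flipping, often called cross-correlation) and the flipped version that matches the S(i, j) formula. It reproduces Figure 9.1 with numbers in place of letters, then applies a hypothetical box-blur kernel and a hypothetical horizontal-difference edge kernel to a synthetic step-edge image. The function names and example kernels are illustrative choices, not from the slides.

```python
import numpy as np

def cross_correlate2d_valid(image, kernel):
    # As in Figure 9.1: no kernel flipping, and outputs only where the kernel fits entirely ("valid")
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_valid(image, kernel):
    # S(i, j) = sum_m sum_n I(i - m, j - n) K(m, n): equivalent to cross-correlation
    # with the kernel flipped along both axes
    return cross_correlate2d_valid(image, kernel[::-1, ::-1])

# Figure 9.1 with numbers: a..l -> 0..11, kernel [[w, x], [y, z]] -> [[1, 2], [3, 4]]
image = np.arange(12, dtype=float).reshape(3, 4)
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])
print(cross_correlate2d_valid(image, kernel))  # 2 x 3; top-left = 0*1 + 1*2 + 4*3 + 5*4 = 34

# Hypothetical step-edge image: dark on the left, bright on the right
step = np.zeros((5, 6))
step[:, 3:] = 1.0
blur = np.ones((3, 3)) / 9.0        # box blur: average of the 3 x 3 neighborhood
edge = np.array([[-1.0, 1.0]])      # horizontal difference: responds to vertical edges
print(cross_correlate2d_valid(step, blur))
print(cross_correlate2d_valid(step, edge))
```

The blur output ramps smoothly from 0 to 1 across the edge, while the difference kernel is nonzero only in the column where the intensity jumps.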

Application to Images & Nets
- Images have a huge input space: 1000x1000 = 1M inputs
- Fully connected layers → huge number of weights, slow training
- Convolutional layers reduce connectivity by connecting only an m x n window around each pixel
- Can use weight sharing to learn a common set of weights so that the same convolution is applied everywhere (or in multiple places)

Advantages of Convolutions with Weight Sharing
- Reduces the number of weights that must be learned
  - Speeds up learning
  - Fewer local optima
  - Less risk of overfitting
- Enforces uniformity in what is learned
- Enforces translation invariance: learns the same thing for all positions in the image
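Some back-of-the-envelope arithmetic for the 1000x1000 example, as a sketch: it compares a fully connected layer (assuming, hypothetically, one hidden unit per pixel), local 3x3 windows with separate weights at each pixel, and a single shared 3x3 kernel. Biases and border effects are ignored for simplicity.

```python
def fully_connected_weights(height, width, hidden_units):
    # every hidden unit connects to every input pixel
    return height * width * hidden_units

def local_unshared_weights(height, width, kh, kw):
    # one hidden unit per pixel, each with its own kh x kw window of weights
    # (border effects ignored for simplicity)
    return height * width * kh * kw

def conv_shared_weights(kh, kw, num_kernels=1):
    # weight sharing: one kh x kw kernel reused at every position
    return kh * kw * num_kernels

H = W = 1000
print(f"fully connected, 1M hidden units: {fully_connected_weights(H, W, H * W):,}")
print(f"local 3x3 windows, unshared:      {local_unshared_weights(H, W, 3, 3):,}")
print(f"one shared 3x3 kernel:            {conv_shared_weights(3, 3):,}")
```

Restricting connectivity takes the count from 10^12 to 9 million, and weight sharing takes it from 9 million to 9 per kernel.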

Additional Stages & Different Activation Functions
- Convolutional stages (may) feed into intermediate stages
- Detectors are nonlinear, e.g., ReLU (Source: Wikipedia)
- Pooling stages summarize upstream nodes, e.g., average (shrinking the image), max (thresholding)

ReLU vs. Sigmoid
- ReLU is faster to compute
- Derivative is trivial
- Only saturates on one side
- Worry about non-differentiability at 0? Can use a sub-gradient
[Figure: ReLU plotted in blue]
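A short sketch of the detector and pooling stages: ReLU with a subgradient (choosing 0 at a = 0, one valid choice from [0, 1]), the sigmoid gradient for comparison, and non-overlapping 2x2 max and average pooling. The pooling window size and example feature map are assumptions for illustration.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def relu_subgradient(a):
    # 1 where a > 0, 0 where a < 0; at a == 0 any value in [0, 1] is a valid
    # subgradient, and this sketch picks 0
    return (a > 0).astype(float)

def sigmoid_grad(a):
    s = 1.0 / (1.0 + np.exp(-a))
    return s * (1.0 - s)

def pool2x2(fmap, mode="max"):
    # Non-overlapping 2x2 pooling; an odd last row or column is dropped
    h, w = fmap.shape
    blocks = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

a = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print("ReLU gradient:   ", relu_subgradient(a))             # 0 for a <= 0, 1 for a > 0
print("sigmoid gradient:", np.round(sigmoid_grad(a), 5))    # tiny at both tails

fmap = np.array([[1.0, 2.0, 5.0, 6.0],
                 [3.0, 4.0, 7.0, 8.0],
                 [-1.0, -2.0, 0.0, 0.0],
                 [-3.0, -4.0, 0.0, 4.0]])
print(pool2x2(fmap, "max"))
print(pool2x2(fmap, "average"))
```

The ReLU gradient stays at 1 for large positive inputs, whereas the sigmoid gradient collapses at both tails, which is the one-sided vs. two-sided saturation contrast on the slide.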

Example Convolutional Network
[Figure: INPUT 28x28 → convolution → feature maps 4@24x24 → subsampling → feature maps 4@12x12 → convolution → feature maps 12@8x8 → ... From “Convolutional Networks for Images, Speech, and Time-Series”, LeCun & Bengio]
- N.B.: Subsampling = averaging
- Weight sharing results in 2600 weights shared over 100,000 connections.

Why This Works
- ConvNets can use weight sharing to reduce the number of parameters learned, which mitigates problems with big networks
- The combination of convolutions with shared weights and subsampling can be interpreted as learning position- and scale-invariant features
- Final layers combine features to learn the target function
- Can be viewed as doing simultaneous feature discovery and classification
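The feature-map sizes in the figure follow from simple arithmetic. The figure as transcribed does not state kernel sizes, so this sketch assumes stride-1 "valid" convolutions with 5x5 kernels and 2x2 subsampling, which reproduce 28 → 24 → 12 → 8. It also counts shared weights vs. connections for the first convolutional layer only (ignoring biases); the 2600-weight and 100,000-connection totals on the slide depend on the rest of the architecture, which is not reproduced here.

```python
KERNEL, POOL = 5, 2  # assumed hyperparameters, inferred from the feature-map sizes

def valid_conv_size(n, k=KERNEL):
    # side length after a stride-1 "valid" convolution with a k x k kernel
    return n - k + 1

def subsample_size(n, p=POOL):
    # side length after non-overlapping p x p averaging
    return n // p

sizes = [28]
sizes.append(valid_conv_size(sizes[-1]))  # feature maps 4@24x24
sizes.append(subsample_size(sizes[-1]))   # feature maps 4@12x12
sizes.append(valid_conv_size(sizes[-1]))  # feature maps 12@8x8
print(sizes)

# First convolutional layer only, ignoring biases
maps, side = 4, sizes[1]
shared_weights = maps * KERNEL * KERNEL              # one 5x5 kernel per feature map
connections = maps * side * side * KERNEL * KERNEL   # every output unit sees a 5x5 window
print(f"{shared_weights} shared weights over {connections:,} connections")
```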

ConvNets in Practice
- Work surprisingly well in many examples, even those that aren’t images
- Number of convolutional layers, form of pooling and detecting units may be application specific: art & science here

Other Tricks
- ConvNets and ReLUs can help with the vanishing gradient problem, but don’t eliminate it
- Residual nets introduce connections across layers, which tends to mitigate the vanishing gradient problem
- Techniques such as image perturbation and dropout reduce overfitting and produce more robust solutions
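A simplified sketch of two of these tricks. The first function tracks the Jacobian d(output)/d(input) through a random stack of sigmoid layers, with and without skip connections z_next = z + F(z): without them the Jacobian is a product of small factors, while each residual layer contributes (I + J), which keeps it from collapsing. The second is "inverted" dropout, one common formulation. Layer count, width, and weight scale are illustrative assumptions, and real residual nets use other layer types.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def input_jacobian_norm(num_layers, residual, width=10, seed=0):
    """Frobenius norm of d(output)/d(input) for a random stack of sigmoid layers."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=width)
    jac = np.eye(width)  # d z_l / d z_0, accumulated by the chain rule
    for _ in range(num_layers):
        W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
        s = sigmoid(W @ z)
        layer_jac = (s * (1.0 - s))[:, None] * W  # d sigmoid(W z) / d z
        if residual:
            z = z + s  # skip connection: z_next = z + F(z)
            jac = (np.eye(width) + layer_jac) @ jac
        else:
            z = s
            jac = layer_jac @ jac
    return np.linalg.norm(jac)

def inverted_dropout(x, p, rng):
    # Zero each unit with probability p during training, and scale the survivors
    # by 1 / (1 - p) so the expected activation is unchanged
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

plain = input_jacobian_norm(20, residual=False)
resnet = input_jacobian_norm(20, residual=True)
print(f"plain: {plain:.2e}   residual: {resnet:.2e}")

rng = np.random.default_rng(1)
dropped = inverted_dropout(np.ones(100_000), p=0.5, rng=rng)
print("mean after dropout:", dropped.mean())
```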

Putting It All Together
- Why is deep learning succeeding now when neural nets lost momentum in the 1990s?
- New architectures (e.g., ConvNets) are better suited to (some) learning tasks and reduce the number of weights
- Smarter algorithms make better use of data and handle noisy gradients better
- Massive amounts of data make overfitting less of a concern (but it is still always a concern)
- Massive amounts of computation make handling massive amounts of data possible
- Large and growing bag of tricks for mitigating overfitting and vanishing gradient issues

Superficial(?) Limitations
- Deep learning results are not easily human-interpretable
- Computationally intensive
- Combination of art, science, rules of thumb
- Can be tricked:
  - “Intriguing properties of neural networks”, Szegedy et al. [2013]

Beyond Classification
- Deep networks (and other techniques) can be used for unsupervised learning
- Example: an autoencoder tries to compress inputs to a lower dimensional representation

Recurrent Networks
- Recurrent networks feed (part of) the output of the network back to the input
- Why?
  - Can learn (hidden) state, e.g., in a hidden Markov model
  - Useful for parsing language
  - Can learn a program
- LSTM: variation on RNN that handles long-term memories better
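A minimal sketch of the recurrence, in one common form (an Elman-style network) where the hidden state from the previous step is fed back in with the next input: h_t = tanh(W_x x_t + W_h h_{t-1} + b). The dimensions and random weights are hypothetical, and an LSTM would replace the single tanh update with gated updates not shown here. Changing only the first input also changes the final state, which is the "learned state" the slide refers to.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b, h0=None):
    # Elman-style recurrence: the hidden state from step t-1 is fed back in at step t
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 3, 4, 5
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

xs = rng.normal(size=(steps, input_dim))
hs = rnn_forward(xs, W_x, W_h, b)

# Change only the first input: the final state changes too, because the state carries it forward
xs_changed = xs.copy()
xs_changed[0] += 1.0
hs_changed = rnn_forward(xs_changed, W_x, W_h, b)
print(hs.shape)
print(np.abs(hs[-1] - hs_changed[-1]))
```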

Deeper Limitations
- We get impressive results, but we don’t always understand why or whether we really need all of the data and computation used
- Hard to explain results and hard to guard against adversarial special cases (“Intriguing properties of neural networks” and “Universal adversarial perturbations”)
- Not clear how logic and high-level reasoning could be incorporated
- Not clear how to incorporate prior knowledge in a principled way
