Tutorial: Learning Deep Architectures


Yoshua Bengio, U. Montreal
Yann LeCun, NYU
ICML Workshop on Learning Feature Hierarchies, June 18th, 2009, Montreal

Deep Motivations
- Brains have a deep architecture
- Humans organize their ideas hierarchically, through composition of simpler ideas
- Insufficiently deep architectures can be exponentially inefficient
- Distributed (possibly sparse) representations are necessary to achieve non-local generalization
- Intermediate representations allow sharing statistical strength

Deep Architecture in the Brain
[Figure: Retina (pixels) → Area V1 (edge detectors) → Area V2 (primitive shape detectors) → Area V4 (higher level visual abstractions)]

Deep Architecture in our Mind
- Humans organize their ideas and concepts hierarchically
- Humans first learn simpler concepts and then compose them to represent more abstract ones
- Engineers break up solutions into multiple levels of abstraction and processing

Architecture Depth
[Figure: example architectures of depth 4 and depth 3]

Good News, Bad News
Theoretical arguments:
- 2 layers of logic gates, formal neurons, or RBF units = universal approximator
- Theorems for all 3 (Hastad et al 86 & 91, Bengio et al 2007): functions representable compactly with k layers may require exponential size with k-1 layers
[Figure: shallow circuit with units 1, 2, 3, …, 2^n vs. deep circuit with units 1, 2, 3, …, n]
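A standard way to see the depth argument (our illustration, not taken from the slide) is the n-bit parity function. Written as a depth-2 OR of ANDs, every AND term must mention all n inputs, since a term that left some input free would also fire on an input of the opposite parity; so 2^(n-1) terms are needed. A deep chain of n-1 two-input XOR gates computes the same function. The sketch below checks both circuits against each other on every input; the function names are hypothetical.

```python
from itertools import product


def parity(bits):
    """Reference definition: 1 if an odd number of bits are set."""
    return sum(bits) % 2


def dnf_terms(n):
    """Depth-2 (OR of ANDs) circuit for n-bit parity: one full-width AND term per odd-parity input."""
    return [bits for bits in product((0, 1), repeat=n) if parity(bits) == 1]


def eval_dnf(terms, x):
    # Each AND term fixes every input literal, so it fires only on an exact match.
    return int(any(term == x for term in terms))


def eval_xor_chain(x):
    """Deep circuit: a chain of n-1 two-input XOR gates."""
    acc = x[0]
    for bit in x[1:]:
        acc ^= bit
    return acc


def compare(n):
    """Check both circuits on all 2^n inputs; return (shallow size, deep size)."""
    terms = dnf_terms(n)
    for x in product((0, 1), repeat=n):
        assert eval_dnf(terms, x) == eval_xor_chain(x) == parity(x)
    return len(terms), n - 1


for n in (2, 4, 8, 10):
    shallow, deep = compare(n)
    print(f"n={n:2d}: depth-2 DNF terms={shallow:5d}, XOR-chain gates={deep}")
```

The shallow size doubles with every extra input (2, 8, 128, 512 terms for n = 2, 4, 8, 10) while the deep circuit grows by one gate.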

The Deep Breakthrough
- Before 2006, training deep architectures was unsuccessful, except for convolutional neural nets
- Hinton, Osindero & Teh « A Fast Learning Algorithm for Deep Belief Nets », Neural Computation, 2006
- Bengio, Lamblin, Popovici, Larochelle « Greedy Layer-Wise Training of Deep Networks », NIPS’2006
- Ranzato, Poultney, Chopra, LeCun « Efficient Learning of Sparse Representations with an Energy-Based Model », NIPS’2006

Greedy Layer-Wise Pre-Training
Stacking Restricted Boltzmann Machines (RBMs) → Deep Belief Network (DBN)

Stacking Auto-Encoders
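The greedy procedure of the two slides above can be sketched with auto-encoders as the building block. This is a minimal pure-Python sketch under our own assumptions: tied encoder/decoder weights, sigmoid units, cross-entropy reconstruction and plain SGD. `AutoEncoder` and `train_stack` are hypothetical names, and the supervised fine-tuning stage that normally follows is omitted. Each layer is trained on the codes produced by the already-trained layers below it.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class AutoEncoder:
    """Tied-weight sigmoid auto-encoder trained by SGD on cross-entropy (illustrative choices)."""

    def __init__(self, n_in, n_hid, rng):
        # W[j][i] connects input i to hidden unit j; the decoder reuses W transposed.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
        self.b_hid = [0.0] * n_hid
        self.b_vis = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(bj + sum(w * xi for w, xi in zip(row, x)))
                for row, bj in zip(self.W, self.b_hid)]

    def decode(self, h):
        return [sigmoid(bi + sum(self.W[j][i] * h[j] for j in range(len(h))))
                for i, bi in enumerate(self.b_vis)]

    def sgd_step(self, x, lr):
        h = self.encode(x)
        r = self.decode(h)
        # Gradient of the cross-entropy w.r.t. the decoder pre-activation is (r - x).
        d_vis = [ri - xi for ri, xi in zip(r, x)]
        # Back-propagate through the (tied) decoder weights into the hidden pre-activation.
        d_hid = [sum(self.W[j][i] * d_vis[i] for i in range(len(x))) * h[j] * (1 - h[j])
                 for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights accumulate the decoder-path and encoder-path gradients.
                self.W[j][i] -= lr * (d_vis[i] * h[j] + d_hid[j] * x[i])
            self.b_hid[j] -= lr * d_hid[j]
        for i in range(len(x)):
            self.b_vis[i] -= lr * d_vis[i]

    def recon_error(self, data):
        """Mean reconstruction cross-entropy over a dataset."""
        total = 0.0
        for x in data:
            r = self.decode(self.encode(x))
            total -= sum(xi * math.log(ri) + (1 - xi) * math.log(1 - ri) for xi, ri in zip(x, r))
        return total / len(data)


def train_stack(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each new auto-encoder is trained on the codes below it."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hid in layer_sizes:
        ae = AutoEncoder(len(inputs[0]), n_hid, rng)
        for _ in range(epochs):
            for x in rng.sample(inputs, len(inputs)):
                ae.sgd_step(x, lr)
        layers.append(ae)
        # Freeze this layer; its codes become the training set of the next one.
        inputs = [ae.encode(x) for x in inputs]
    return layers


# Toy binary data (hypothetical): three non-overlapping two-pixel patterns.
data = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]] * 4
stack = train_stack(data, layer_sizes=[4, 3])
print("layer-1 reconstruction cross-entropy:", round(stack[0].recon_error(data), 3))
```

Swapping `AutoEncoder` for an RBM trained with contrastive divergence gives the DBN variant of the same greedy loop.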

Greedy Layerwise Supervised Training
Generally worse than unsupervised pre-training but better than ordinary training of a deep neural network (Bengio et al. 2007).

Supervised Fine-Tuning is Important
- Greedy layer-wise unsupervised pre-training phase with RBMs or auto-encoders on MNIST
- Supervised phase with or without unsupervised updates, with or without fine-tuning of hidden layers

Denoising Auto-Encoder
- Corrupt the input
- Reconstruct the uncorrupted input
[Figure: raw input → corrupted input → hidden code (representation) → reconstruction, trained with KL(reconstruction ‖ raw input)]
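The denoising criterion can be sketched as follows, assuming masking noise (each input zeroed with probability p) and a cross-entropy cost measured against the uncorrupted input; for binary inputs this equals the KL divergence between the raw input and the reconstruction, both read as Bernoulli means. The two hand-built "models" are hypothetical toys: on data where every bit is duplicated, an identity map merely copies the corruption, while a map that pools each duplicated pair can undo it, so the denoising loss favors the representation that captures the dependency between inputs.

```python
import math
import random

EPS = 1e-6


def corrupt(x, p, rng):
    """Masking noise: each input is set to 0 independently with probability p."""
    return [0.0 if rng.random() < p else xi for xi in x]


def cross_entropy(x, r):
    """Cost against the *uncorrupted* x; for binary x this equals KL(x || r)."""
    r = [min(max(ri, EPS), 1 - EPS) for ri in r]  # clip to avoid log(0)
    return -sum(xi * math.log(ri) + (1 - xi) * math.log(1 - ri) for xi, ri in zip(x, r))


def dae_loss(encode, decode, x, p, rng):
    """One evaluation of the denoising criterion: corrupt, encode, decode, compare with raw x."""
    x_tilde = corrupt(x, p, rng)
    return cross_entropy(x, decode(encode(x_tilde)))


def make_example(rng, k=4):
    """Hypothetical toy data: k random bits, each duplicated once."""
    bits = [float(rng.random() < 0.5) for _ in range(k)]
    return [b for b in bits for _ in range(2)]


# Identity "auto-encoder": perfect on clean inputs, but it copies the corruption.
identity = (lambda x: x, lambda h: h)
# Hand-built denoiser: code = OR of each twin pair, decoder duplicates the code.
pair_or = (lambda x: [max(x[i], x[i + 1]) for i in range(0, len(x), 2)],
           lambda h: [v for v in h for _ in range(2)])


def mean_loss(model, n=2000, p=0.3, seed=0):
    """Average denoising loss; the same seed gives both models identical data and noise."""
    rng = random.Random(seed)
    data = [make_example(rng) for _ in range(n)]
    noise_rng = random.Random(seed + 1)
    return sum(dae_loss(*model, x, p, noise_rng) for x in data) / n


print("identity:", round(mean_loss(identity), 3), "pair-OR:", round(mean_loss(pair_or), 3))
```

In a real denoising auto-encoder the encoder and decoder are learned (for example the tied-weight sigmoid pair of an ordinary auto-encoder) by minimizing this same loss.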

Denoising Auto-Encoder
- Learns a vector field towards higher probability regions
- Minimizes variational lower bound on a generative model
- Similar to pseudo-likelihood
[Figure: corrupted inputs]

Stacked Denoising Auto-Encoders
- No partition function, can measure training criterion
- Encoder & decoder: any parametrization
- Performs as well as or better than stacking RBMs for unsupervised pre-training
[Figure: results on Infinite MNIST]

Deep Architectures and Sharing Statistical Strength, Multi-Task Learning
- Generalizing better to new tasks is crucial to approach AI
- Deep architectures learn good intermediate representations that can be shared across tasks
- A good representation is one that makes sense for many tasks
[Figure: raw input x → shared intermediate representation h → task 1 output y1, task 2 output y2, task 3 output y3]
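A minimal forward-pass sketch of the architecture in the figure, assuming one sigmoid hidden layer as the shared representation h and one linear output head per task; `SharedTrunkNet` and the layer sizes are hypothetical. Training is omitted; during training, each task's loss would be back-propagated into the same trunk weights, which is where statistical strength is shared. The parameter count compares this design with three separate networks of the same shape.

```python
import math
import random


def dense(W, b, x, act):
    """Fully connected layer: act(W x + b), with W[out][in]."""
    return [act(bj + sum(w * xi for w, xi in zip(row, x))) for row, bj in zip(W, b)]


def init(n_out, n_in, rng):
    return [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def n_params(layer):
    W, b = layer
    return len(W) * len(W[0]) + len(b)


class SharedTrunkNet:
    """Hypothetical multi-task net: one shared hidden layer h, one linear head per task."""

    def __init__(self, n_in, n_hid, task_sizes, seed=0):
        rng = random.Random(seed)
        self.trunk = init(n_hid, n_in, rng)
        self.heads = [init(k, n_hid, rng) for k in task_sizes]

    def forward(self, x):
        h = dense(*self.trunk, x, lambda a: 1.0 / (1.0 + math.exp(-a)))
        # Every task reads the same representation h.
        return [dense(W, b, h, lambda a: a) for W, b in self.heads]


net = SharedTrunkNet(n_in=100, n_hid=50, task_sizes=[10, 2, 1])
outs = net.forward([0.5] * 100)
head_params = sum(n_params(h) for h in net.heads)
shared = n_params(net.trunk) + head_params
separate = len(net.heads) * n_params(net.trunk) + head_params
print("shared trunk:", shared, "parameters; three separate nets:", separate)
```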

Why is Unsupervised Pre-Training Working So Well?
- Regularization hypothesis:
  - Unsupervised component forces model close to P(x)
  - Representations good for P(x) are good for P(y|x)
- Optimization hypothesis:
  - Unsupervised initialization near better local minimum of P(y|x)
  - Can reach lower local minimum otherwise not achievable by random initialization
  - Easier to train each layer using a layer-local criterion

Learning Trajectories in Function Space
- Each point: a model in function space
- Color: epoch
- Top: trajectories w/o pre-training
- Each trajectory converges in a different local min.
- No overlap of regions with and w/o pre-training

Unsupervised learning as regularizer
- Adding extra regularization (reducing # hidden units) hurts the pre-trained models more
- Pre-trained models have less variance wrt training sample
- Regularizer = infinite penalty outside of region compatible with unsupervised pre-training

Better optimization of online error
- Both training and online error are smaller with unsupervised pre-training
- As # samples → ∞: training err. = online err. = generalization err.
- Without unsup. pre-training: can’t exploit capacity to capture complexity in target function from training data

Learning Dynamics of Deep Nets
[Figure: panels “Before fine-tuning” and “After fine-tuning”]
- As weights become larger, get trapped in basin of attraction (“quadrant” does not change)
- Initial updates have a crucial influence (“critical period”), explain more of the variance
- Unsupervised pre-training initializes in basin of attraction with good generalization properties

Restricted Boltzmann Machines
- The most popular building block for deep architectures
- Main advantage over auto-encoders: can sample from the model
- Bipartite undirected graphical model: x observed, h hidden
- P(h|x) and P(x|h) factorize: convenient Gibbs sampling x → h → x → h …
- In practice, Gibbs sampling does not always mix well
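The factorization can be checked numerically. The sketch assumes the usual binary RBM energy E(x, h) = -b·x - c·h - h·W·x (the slide's equations were not transcribed); `BinaryRBM` and its methods are hypothetical names. Because there are no hidden-hidden or visible-visible edges, each conditional is a product of per-unit sigmoids, which brute-force enumeration confirms on a tiny model; block Gibbs sampling then alternates the two conditionals.

```python
import math
import random
from itertools import product


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class BinaryRBM:
    """Binary RBM with E(x, h) = -b.x - c.h - h.W.x (W has one row per hidden unit)."""

    def __init__(self, n_vis, n_hid, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0, 0.5) for _ in range(n_vis)] for _ in range(n_hid)]
        self.b = [0.0] * n_vis  # visible biases
        self.c = [0.0] * n_hid  # hidden biases

    def energy(self, x, h):
        hWx = sum(hj * sum(w * xi for w, xi in zip(row, x)) for hj, row in zip(h, self.W))
        bx = sum(bi * xi for bi, xi in zip(self.b, x))
        ch = sum(cj * hj for cj, hj in zip(self.c, h))
        return -bx - ch - hWx

    def p_h_given_x(self, x):
        # No hidden-hidden edges: P(h|x) is a product of independent Bernoullis.
        return [sigmoid(cj + sum(w * xi for w, xi in zip(row, x))) for row, cj in zip(self.W, self.c)]

    def p_x_given_h(self, h):
        # No visible-visible edges either: same structure in the other direction.
        return [sigmoid(bi + sum(self.W[j][i] * h[j] for j in range(len(h))))
                for i, bi in enumerate(self.b)]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def gibbs_chain(self, x, steps):
        """Block Gibbs sampling x -> h -> x -> h ... for the given number of steps."""
        for _ in range(steps):
            h = self.sample(self.p_h_given_x(x))
            x = self.sample(self.p_x_given_h(h))
        return x


def brute_force_p_h_given_x(rbm, x):
    """Exact P(h|x) by enumerating every hidden configuration (tiny models only)."""
    hs = list(product((0, 1), repeat=len(rbm.c)))
    weights = [math.exp(-rbm.energy(x, h)) for h in hs]
    z = sum(weights)
    return {h: w / z for h, w in zip(hs, weights)}


rbm = BinaryRBM(n_vis=3, n_hid=2, seed=1)
x = [1, 0, 1]
p = rbm.p_h_given_x(x)
exact = brute_force_p_h_given_x(rbm, x)
for h, prob in exact.items():
    factorized = math.prod(pj if hj else 1 - pj for pj, hj in zip(p, h))
    print(h, round(prob, 6), round(factorized, 6))
sample = rbm.gibbs_chain(x, steps=10)
```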

Boltzmann Machine Gradient
- Gradient has two components: ‘positive phase’ and ‘negative phase’
- In RBMs, easy to sample or sum over h|x
- Difficult part: sampling from P(x), typically with a Markov chain
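Assuming the standard binary RBM energy (the slide's formulas were not transcribed), the two phases can be written out explicitly. Summing out h analytically gives the free energy F(x); the positive phase only involves the observed x, while the negative phase is an expectation under the model distribution P, which is the part that needs a Markov chain.

```latex
E(x,h) = -b^\top x - c^\top h - h^\top W x,
\qquad P(x) = \frac{1}{Z}\sum_h e^{-E(x,h)} = \frac{e^{-F(x)}}{Z}

F(x) = -b^\top x - \sum_i \log\left(1 + e^{c_i + W_i x}\right)
\quad \text{($W_i$: row $i$ of $W$, one row per hidden unit)}

\frac{\partial \log P(x)}{\partial \theta}
  = \underbrace{-\frac{\partial F(x)}{\partial \theta}}_{\text{positive phase}}
  + \underbrace{\sum_{\tilde x} P(\tilde x)\,\frac{\partial F(\tilde x)}{\partial \theta}}_{\text{negative phase}}

\frac{\partial \log P(x)}{\partial W_{ij}}
  = P(h_i = 1 \mid x)\, x_j
  - \mathbb{E}_{\tilde x \sim P}\left[P(h_i = 1 \mid \tilde x)\, \tilde x_j\right]
```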

Training RBMs
- Contrastive Divergence (CD-k): start negative Gibbs chain at observed x, run k Gibbs steps
- Persistent CD (PCD): run negative Gibbs chain in background while weights slowly change
- Fast PCD: two sets of weights, one with a large learning rate only used for negative phase, quickly exploring modes
- Herding (see Max Welling’s ICML, UAI and workshop talks)
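A minimal sketch of CD-k and PCD for a binary RBM, assuming the standard energy function. The only difference between the two is where the negative chain starts: at the current training example for CD-k, or at the state left by the previous update for PCD. Hidden statistics use the conditional probabilities rather than samples, a common variance-reduction choice. The toy data, learning rate and function names are hypothetical; the exact log-likelihood helper enumerates all 2^n_vis visible vectors, so it is only a sanity check on tiny models. Fast PCD and herding are not shown.

```python
import math
import random
from itertools import product


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def p_h(W, c, x):
    return [sigmoid(cj + sum(w * xi for w, xi in zip(row, x))) for row, cj in zip(W, c)]


def p_x(W, b, h):
    return [sigmoid(bi + sum(W[j][i] * h[j] for j in range(len(h)))) for i, bi in enumerate(b)]


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs(W, b, c, x, k, rng):
    """Run k full Gibbs steps x -> h -> x starting from visible state x."""
    for _ in range(k):
        x = bernoulli(p_x(W, b, bernoulli(p_h(W, c, x), rng)), rng)
    return x


def rbm_update(W, b, c, x, x_neg_start, k, lr, rng):
    """One stochastic gradient step (in place); returns the end of the negative chain.

    CD-k: pass x_neg_start = x (chain restarted at the observed example).
    PCD:  pass the state returned by the previous call (persistent chain).
    """
    x_neg = gibbs(W, b, c, x_neg_start, k, rng)
    ph_pos, ph_neg = p_h(W, c, x), p_h(W, c, x_neg)
    for j in range(len(c)):
        for i in range(len(b)):
            W[j][i] += lr * (ph_pos[j] * x[i] - ph_neg[j] * x_neg[i])
        c[j] += lr * (ph_pos[j] - ph_neg[j])
    for i in range(len(b)):
        b[i] += lr * (x[i] - x_neg[i])
    return x_neg


def exact_mean_log_likelihood(W, b, c, data):
    """Brute-force check via log P(x) = -F(x) - log Z; feasible only for a few visible units."""
    def neg_free_energy(x):
        return sum(bi * xi for bi, xi in zip(b, x)) + sum(
            math.log1p(math.exp(cj + sum(w * xi for w, xi in zip(row, x)))) for row, cj in zip(W, c))
    log_z = math.log(sum(math.exp(neg_free_energy(x)) for x in product((0, 1), repeat=len(b))))
    return sum(neg_free_energy(x) for x in data) / len(data) - log_z


def train(data, n_hid=3, epochs=300, k=1, lr=0.05, persistent=False, seed=0):
    rng = random.Random(seed)
    n_vis = len(data[0])
    W = [[rng.gauss(0, 0.01) for _ in range(n_vis)] for _ in range(n_hid)]
    b, c = [0.0] * n_vis, [0.0] * n_hid
    before = exact_mean_log_likelihood(W, b, c, data)
    chain = list(data[0])  # persistent particle, only used when persistent=True
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            chain = rbm_update(W, b, c, x, chain if persistent else x, k, lr, rng)
    return before, exact_mean_log_likelihood(W, b, c, data)


data = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
cd_before, cd_after = train(data, k=1)
pcd_before, pcd_after = train(data, k=1, persistent=True)
print("CD-1:", round(cd_before, 3), "->", round(cd_after, 3))
print("PCD: ", round(pcd_before, 3), "->", round(pcd_after, 3))
```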

Deep Belief Networks
- Sampling:
  - Sample from top RBM
  - Sample from level k given k+1
- Estimating log-likelihood (not easy) (Salakhutdinov & Murray, ICML’2008, NIPS’2008)
- Training:
  - Variational bound justifies greedy layerwise training of RBMs
  - How to train all levels together?
[Figure: observed x, h1, h2 and h3, with a top-level RBM between h2 and h3]
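The sampling procedure on the slide can be sketched as ancestral sampling. The sketch assumes binary units, a top-level RBM between the two highest layers, and directed sigmoid layers below it; `sample_dbn` is a hypothetical name and the weights in the usage example are random rather than trained. In a DBN built by stacking RBMs, the downward weights would typically be the transposes of the lower RBMs' weight matrices.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def layer_probs(W, bias, v):
    """P(unit = 1 | v) for each unit, with W[unit][input]."""
    return [sigmoid(bj + sum(w * vi for w, vi in zip(row, v))) for row, bj in zip(W, bias)]


def transpose(W):
    return [list(col) for col in zip(*W)]


def sample_dbn(top_rbm, down_layers, gibbs_steps, rng):
    """Ancestral sampling from a DBN.

    top_rbm: (W, b_lower, c_upper) for the top-level RBM, W[upper_unit][lower_unit].
    down_layers: list of (W_down, bias) directed layers from the top RBM's lower layer
                 down to x, with W_down[lower_unit][upper_unit].
    """
    W, b_low, c_up = top_rbm
    v = bernoulli([0.5] * len(b_low), rng)  # arbitrary start for the top-level chain
    for _ in range(gibbs_steps):
        h = bernoulli(layer_probs(W, c_up, v), rng)
        v = bernoulli(layer_probs(transpose(W), b_low, h), rng)
    # Top-down pass: sample level k given level k+1, ending at the observed layer x.
    for W_down, bias in down_layers:
        v = bernoulli(layer_probs(W_down, bias, v), rng)
    return v


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


top = (rand_matrix(2, 3), [0.0] * 3, [0.0] * 2)  # h2 (3 units) <-> h3 (2 units)
down = [(rand_matrix(4, 3), [0.0] * 4),          # h2 -> h1
        (rand_matrix(6, 4), [0.0] * 6)]          # h1 -> x
x = sample_dbn(top, down, gibbs_steps=100, rng=rng)
print("sampled x:", x)
```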

Deep Boltzmann Machines (Salakhutdinov et al, AISTATS 2009, Lee et al, ICML 2009)
- Positive phase: variational approximation (mean-field)
- Negative phase: persistent chain
- Guarantees (Younes 89, 2000; Yuille 2004): if learning rate decreases in 1/t, chain mixes before parameters change too much, chain stays converged when parameters change
- Can (must) initialize from stacked RBMs
- Salakhutdinov et al improved performance on MNIST from 1.2% to 0.95% error
- Can apply AIS with 2 hidden layers
[Figure: observed x and hidden layers h1, h2, h3]

Level-local learning is important
- Initializing each layer of an unsupervised deep Boltzmann machine helps a lot
- Initializing each layer of a supervised neural network as an RBM helps a lot
- Helps most for the layers furthest from the target
- Not just an effect of unsupervised prior
- Jointly training all the levels of a deep architecture is difficult
- Initializing using a level-local learning algorithm (RBMs, auto-encoders, etc.) is a useful trick

Estimating Log-Likelihood
- RBMs: requires estimating partition function
  - Reconstruction error provides a cheap proxy
  - log Z tractable analytically for < 25 binary inputs or hidden
  - Lower-bounded with Annealed Importance Sampling (AIS)
- Deep Belief Networks: extensions of AIS (Salakhutdinov et al 2008)
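For a binary RBM (standard energy function assumed), summing out one layer analytically leaves a sum over 2^n configurations of the other layer, where n is the size of whichever layer is enumerated; this is why log Z stays tractable when either layer is small. The sketch computes log Z both ways, checks it against full enumeration of (x, h) pairs, and also computes the mean-field reconstruction error used as a cheap proxy. Function names and model sizes are hypothetical; AIS is not shown.

```python
import math
import random
from itertools import product


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softplus(a):
    # Numerically stable log(1 + e^a).
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def log_z_enumerate_visibles(W, b, c):
    """log Z = log sum_x exp(-F(x)); each hidden unit is summed out analytically."""
    terms = []
    for x in product((0, 1), repeat=len(b)):
        t = sum(bi * xi for bi, xi in zip(b, x))
        t += sum(softplus(cj + sum(w * xi for w, xi in zip(row, x))) for row, cj in zip(W, c))
        terms.append(t)
    return logsumexp(terms)


def log_z_enumerate_hiddens(W, b, c):
    """Same quantity, enumerating the (possibly smaller) hidden layer instead."""
    terms = []
    for h in product((0, 1), repeat=len(c)):
        t = sum(cj * hj for cj, hj in zip(c, h))
        t += sum(softplus(b[i] + sum(W[j][i] * h[j] for j in range(len(h)))) for i in range(len(b)))
        terms.append(t)
    return logsumexp(terms)


def log_z_brute_force(W, b, c):
    """Reference: enumerate every (x, h) pair; exponential in n_vis + n_hid."""
    terms = []
    for x in product((0, 1), repeat=len(b)):
        for h in product((0, 1), repeat=len(c)):
            e = -sum(bi * xi for bi, xi in zip(b, x)) - sum(cj * hj for cj, hj in zip(c, h))
            e -= sum(h[j] * sum(W[j][i] * x[i] for i in range(len(x))) for j in range(len(h)))
            terms.append(-e)
    return logsumexp(terms)


def reconstruction_error(W, b, c, data):
    """Cheap proxy: mean squared error of the mean-field x -> h -> x reconstruction."""
    total = 0.0
    for x in data:
        h = [sigmoid(cj + sum(w * xi for w, xi in zip(row, x))) for row, cj in zip(W, c)]
        r = [sigmoid(b[i] + sum(W[j][i] * h[j] for j in range(len(h)))) for i in range(len(b))]
        total += sum((ri - xi) ** 2 for ri, xi in zip(r, x))
    return total / len(data)


rng = random.Random(0)
n_vis, n_hid = 5, 3
W = [[rng.gauss(0, 1) for _ in range(n_vis)] for _ in range(n_hid)]
b = [rng.gauss(0, 1) for _ in range(n_vis)]
c = [rng.gauss(0, 1) for _ in range(n_hid)]
lz_v = log_z_enumerate_visibles(W, b, c)  # 2^5 terms
lz_h = log_z_enumerate_hiddens(W, b, c)   # 2^3 terms
lz_full = log_z_brute_force(W, b, c)      # 2^8 terms
print(round(lz_v, 6), round(lz_h, 6), round(lz_full, 6))
```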

Open Problems
- Why is it difficult to train deep architectures?
- What is important in the learning dynamics?
- How to improve joint training of all layers?
- How to sample better from RBMs and deep generative models?
- Monitoring unsupervised learning quality in deep nets?
- Other ways to guide training of intermediate representations?
- Getting rid of learning rates?

THANK YOU! Questions? Comments?
