Convolutional Neural Network Architectures: From LeNet to ResNet

Convolutional Neural Network Architectures: from LeNet to ResNet
Lana Lazebnik
Figure source: A. Karpathy

What happened to my field?
Classification: ImageNet Challenge top-5 error
Figure source: Kaiming He

What happened to my field?
Object detection: PASCAL VOC mean Average Precision (mAP)
[Plot: mAP by year, before deep convnets vs. using deep convnets]
Figure source: Ross Girshick

Actually, it happened a while ago: LeNet-5
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.

Let’s back up even more: The Perceptron
Inputs: x1, x2, x3, ..., xD
Weights: w1, w2, w3, ..., wD
Output: sgn(w · x + b)
Rosenblatt, Frank (1958), The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, Cornell Aeronautical Laboratory, Psychological Review, v65, No. 6, pp. 386–408.
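
As a quick illustration (not part of the original slides), a minimal NumPy sketch of this forward computation; the input, weight, and bias values below are made up for the example.

```python
import numpy as np

def perceptron(x, w, b):
    """Rosenblatt perceptron: sign of the weighted sum plus bias."""
    return np.sign(np.dot(w, x) + b)

# Made-up example with D = 3 inputs
x = np.array([0.5, -1.0, 2.0])   # inputs x1..xD
w = np.array([0.2, 0.4, -0.1])   # weights w1..wD
b = 0.05                         # bias
print(perceptron(x, w, b))       # -1.0 for these values
```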

Let’s back up even more

Two-layer neural network
Can learn nonlinear functions provided each perceptron has a differentiable nonlinearity
Sigmoid: g(t) = 1 / (1 + e^(−t))

Multi-layer neural network

Training of multi-layer networks
Find network weights to minimize the training error between true and estimated labels of training examples, e.g.:
E(w) = Σ_{i=1}^{N} (y_i − f_w(x_i))²
Update weights by gradient descent: w ← w − α ∂E/∂w

Training of multi-layer networks
Find network weights to minimize the training error between true and estimated labels of training examples, e.g.:
E(w) = Σ_{i=1}^{N} (y_i − f_w(x_i))²
Update weights by gradient descent: w ← w − α ∂E/∂w
Back-propagation: gradients are computed in the direction from output to input layers and combined using the chain rule
Stochastic gradient descent: compute the weight update w.r.t. one training example (or a small batch of examples) at a time; cycle through training examples in random order over multiple epochs
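
A hedged NumPy sketch of these ideas for a two-layer network with sigmoid units: squared-error loss E(w) = Σ (y_i − f_w(x_i))², gradients via back-propagation (chain rule), and stochastic gradient descent over shuffled examples. The layer sizes, learning rate, and toy XOR data are illustrative choices, not from the slides.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy data (XOR), just to make the sketch runnable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer
alpha = 0.5                                     # learning rate

for epoch in range(5000):
    for i in rng.permutation(len(X)):           # SGD: one example at a time, random order
        x, y = X[i], Y[i]
        # Forward pass
        h = sigmoid(x @ W1 + b1)                # hidden activations
        f = sigmoid(h @ W2 + b2)                # network output f_w(x)
        # Backward pass (chain rule) for the squared error (y - f)^2
        d_out = 2 * (f - y) * f * (1 - f)       # gradient w.r.t. output pre-activation
        d_hid = (d_out @ W2.T) * h * (1 - h)    # gradient w.r.t. hidden pre-activations
        # Gradient descent update: w <- w - alpha * dE/dw
        W2 -= alpha * np.outer(h, d_out); b2 -= alpha * d_out
        W1 -= alpha * np.outer(x, d_hid); b1 -= alpha * d_hid

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2).ravel())  # ideally close to [0, 1, 1, 0]
```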

Multi-Layer Network Demo
http://playground.tensorflow.org/

From fully connected to convolutional networks
[Figure: image fed into a fully connected layer]

From fully connected to convolutional networks
[Figure: image fed into a convolutional layer]

From fully connected to convolutional networks
[Figure: image, learned weights, and resulting feature map of a convolutional layer]

Convolution as feature extraction
[Figure: input and resulting feature map]

From fully connected to convolutional networks
[Figure: image, convolutional layer, next layer]

Key operations in a CNN
[Figure: input image → convolution (learned) → non-linearity → spatial pooling → feature maps]
Source: R. Fergus, Y. LeCun

Key operations
Non-linearity: Rectified Linear Unit (ReLU)
Source: R. Fergus, Y. LeCun

Key operations
Spatial pooling: max pooling
Source: R. Fergus, Y. LeCun
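
To make the three operations concrete, here is a small PyTorch sketch (an illustration, assuming a single-channel input) chaining a learned convolution, a ReLU non-linearity, and max pooling:

```python
import torch
import torch.nn as nn

# One stage of a CNN: convolution (learned) -> non-linearity -> spatial pooling
stage = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),  # learned filters
    nn.ReLU(),                                                           # rectified linear unit
    nn.MaxPool2d(kernel_size=2),                                         # spatial (max) pooling
)

x = torch.randn(1, 1, 28, 28)   # a dummy 28x28 single-channel image
feature_maps = stage(x)
print(feature_maps.shape)       # torch.Size([1, 8, 14, 14])
```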

LeNet-5
Average pooling
Sigmoid or tanh nonlinearity
Fully connected layers at the end
Trained on MNIST digit dataset with 60K training examples
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
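
A hedged PyTorch sketch of a LeNet-5-style network matching the bullets above (tanh non-linearities, average pooling, fully connected layers at the end). The channel counts follow the 1998 paper's C1/C3/C5 sizes, but details such as the original partial connectivity between S2 and C3 are omitted.

```python
import torch
import torch.nn as nn

# LeNet-5-style network for 32x32 grayscale inputs (MNIST digits padded to 32x32)
lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),    # C1: 6 feature maps, 28x28
    nn.AvgPool2d(2),                              # S2: average pooling -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # C3: 16 feature maps, 10x10
    nn.AvgPool2d(2),                              # S4: average pooling -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),        # C5 (fully connected)
    nn.Linear(120, 84), nn.Tanh(),                # F6
    nn.Linear(84, 10),                            # 10 digit classes
)

print(lenet5(torch.randn(1, 1, 32, 32)).shape)    # torch.Size([1, 10])
```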

Fast forward to the arrival of big visual data
14 million labeled images, 20k classes
Images gathered from the Internet
Human labels via Amazon MTurk
ImageNet Large-Scale Visual Recognition Challenge (ILSVRC): 1.2 million training images, 1000 classes
www.image-net.org/challenges/LSVRC/

AlexNet: ILSVRC 2012 winner
Similar framework to LeNet but:
Max pooling, ReLU nonlinearity
More data and a bigger model (7 hidden layers, 650K units, 60M params)
GPU implementation (50x speedup over CPU); trained on two GPUs for a week
Dropout regularization
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Clarifai: ILSVRC 2013 winner
Refinement of AlexNet
M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014 (Best Paper Award winner)

VGGNet: ILSVRC 2014 2nd place
Sequence of deeper networks trained progressively
Large receptive fields replaced by successive layers of 3x3 convolutions (with ReLU in between)
One 7x7 conv layer with C feature maps needs 49C² weights; three 3x3 conv layers need only 27C² weights
Experimented with 1x1 convolutions
K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
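
The weight-count claim can be checked with a quick PyTorch sketch (an illustrative example assuming C = 64 channels): three stacked 3x3 convolutions cover the same 7x7 receptive field but use roughly 27C² weights versus 49C² for a single 7x7 layer.

```python
import torch.nn as nn

C = 64  # number of feature maps at input and output (illustrative)

one_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
three_3x3 = nn.Sequential(                        # same 7x7 receptive field overall
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(one_7x7), 49 * C * C)     # 200704 = 49*C^2
print(count(three_3x3), 27 * C * C)   # 110592 = 27*C^2
```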

Network in Network
M. Lin, Q. Chen, and S. Yan, Network in network, ICLR 2014

1x1 convolutions
[Figure: standard conv layer]

1x1 convolutions
[Figure: 1x1 conv layer]
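
In code, a 1x1 convolution is simply a per-pixel linear map across channels. A brief PyTorch sketch (illustrative channel counts) shows it changing the number of feature maps without touching spatial resolution:

```python
import torch
import torch.nn as nn

# A 1x1 convolution mixes channels at each spatial position independently
reduce_channels = nn.Conv2d(in_channels=256, out_channels=64, kernel_size=1)

x = torch.randn(1, 256, 28, 28)   # 256 feature maps of size 28x28
y = reduce_channels(x)
print(y.shape)                    # torch.Size([1, 64, 28, 28]): same spatial size, fewer maps
```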

GoogLeNet: ILSVRC 2014 winner
The Inception architecture
C. Szegedy et al., Going deeper with convolutions, CVPR 2015

GoogLeNet: the Inception module
Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps
C. Szegedy et al., Going deeper with convolutions, CVPR 2015

GoogLeNet: the Inception module
Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps
Use 1x1 convolutions for dimensionality reduction before expensive convolutions
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
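
A hedged PyTorch sketch of an Inception-style module in this spirit: parallel 1x1, 3x3, 5x5, and pooling paths, with 1x1 convolutions reducing channel counts before the expensive 3x3 and 5x5 convolutions. The specific channel numbers are illustrative, not the exact values from the GoogLeNet paper.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel paths with different receptive fields, concatenated along channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.path1 = nn.Conv2d(in_ch, 64, 1)                         # 1x1 only
        self.path2 = nn.Sequential(nn.Conv2d(in_ch, 32, 1), nn.ReLU(),
                                   nn.Conv2d(32, 64, 3, padding=1))  # 1x1 reduce, then 3x3
        self.path3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                   nn.Conv2d(16, 32, 5, padding=2))  # 1x1 reduce, then 5x5
        self.path4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                   nn.Conv2d(in_ch, 32, 1))          # pool, then 1x1
    def forward(self, x):
        return torch.cat([self.path1(x), self.path2(x),
                          self.path3(x), self.path4(x)], dim=1)      # stack the feature maps

m = InceptionModule(192)
print(m(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 192, 28, 28]): 64+64+32+32 maps
```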

GoogLeNet
[Figure: full architecture, Inception module highlighted]
C. Szegedy et al., Going deeper with convolutions, CVPR 2015

GoogLeNet
[Figure: full architecture, auxiliary classifier highlighted]
C. Szegedy et al., Going deeper with convolutions, CVPR 2015

GoogLeNet
An alternative view:
C. Szegedy et al., Going deeper with convolutions, CVPR 2015

Inception v2, v3
Regularize training with batch normalization, reducing the importance of auxiliary classifiers
More variants of Inception modules with aggressive factorization of filters
C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016

Inception v2, v3
Regularize training with batch normalization, reducing the importance of auxiliary classifiers
More variants of Inception modules with aggressive factorization of filters
Increase the number of feature maps while decreasing spatial resolution (pooling)
C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
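
As a sketch of two of these ideas (an illustration, not the paper's exact blocks): batch normalization after each convolution, and an nxn convolution factorized into a 1xn followed by an nx1 convolution.

```python
import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch, kernel_size, padding):
    """Convolution followed by batch normalization and ReLU."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False),
                         nn.BatchNorm2d(out_ch), nn.ReLU())

# A 3x3 convolution factorized into 1x3 followed by 3x1 (6C^2 weights instead of 9C^2)
factorized_3x3 = nn.Sequential(
    conv_bn(64, 64, kernel_size=(1, 3), padding=(0, 1)),
    conv_bn(64, 64, kernel_size=(3, 1), padding=(1, 0)),
)

print(factorized_3x3(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```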

ResNet: ILSVRC 2015 winner
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016

ResNet: the residual module
Introduce skip or shortcut connections (existing before in various forms in the literature)
Make it easy for network layers to represent the identity mapping
For some reason, need to skip at least two layers
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
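
A hedged PyTorch sketch of a basic residual block in this spirit: two 3x3 convolutions skipped by an identity shortcut. The batch normalization placement follows the common conv-BN-ReLU pattern, and the channel count is illustrative.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x), where F is two 3x3 conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
    def forward(self, x):
        return torch.relu(self.f(x) + x)   # skip (shortcut) connection adds the input back

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
```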

ResNet: deeper residual module (bottleneck)
Directly performing 3x3 convolutions with 256 feature maps at input and output: 256 x 256 x 3 x 3 ≈ 600K operations
Using 1x1 convolutions to reduce 256 to 64 feature maps, followed by 3x3 convolutions, followed by 1x1 convolutions to expand back to 256 maps:
256 x 64 x 1 x 1 ≈ 16K
64 x 64 x 3 x 3 ≈ 36K
64 x 256 x 1 x 1 ≈ 16K
Total: ≈ 70K
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
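
The bottleneck version, again as an illustrative sketch: 1x1 to reduce 256 maps to 64, a 3x3 at 64 maps, then 1x1 back to 256. The printed parameter counts mirror the slide's 16K + 36K + 16K ≈ 70K arithmetic, versus roughly 600K for a direct 3x3 at 256 maps.

```python
import torch.nn as nn

bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False), nn.ReLU(),            # reduce: 256 -> 64 maps
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False), nn.ReLU(),  # 3x3 at 64 maps
    nn.Conv2d(64, 256, kernel_size=1, bias=False),                       # expand: 64 -> 256 maps
)

direct_3x3 = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(bottleneck))   # 69632  (~70K, matching the slide)
print(count(direct_3x3))   # 589824 (~600K)
# In the full ResNet bottleneck block, the input is also added back via a skip connection.
```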

ResNet
Architectures for ImageNet:
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)

Inception v4
C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016

Summary: ILSVRC 2012-2015
Team | Year | Place | Error (top-5) | External data
SuperVision – Toronto (AlexNet, 7 layers) | 2012 | - | 16.4% | no
SuperVision | 2012 | 1st | 15.3% | ImageNet 22k
Clarifai – NYU (7 layers) | 2013 | - | 11.7% | no
Clarifai | 2013 | 1st | 11.2% | ImageNet 22k
VGG – Oxford (16 layers) | 2014 | 2nd | 7.32% | no
GoogLeNet (19 layers) | 2014 | 1st | 6.67% | no
ResNet (152 layers) | 2015 | 1st | 3.57% | no

Accuracy vs. …
[Figure; source: …06/04/nets.html]

Design principles
Reduce filter sizes (except possibly at the lowest layer); factorize filters aggressively
Use 1x1 convolutions to reduce and expand the number of feature maps judiciously
Use skip connections and/or create multiple paths through the network

What’s missing from the picture?
Training tricks and details: initialization, regularization, normalization
Training data augmentation
Averaging classifier outputs over multiple crops/flips
Ensembles of networks
What about ILSVRC 2016?
No more ImageNet classification
No breakthroughs comparable to ResNet

Reading list
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014
K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
M. Lin, Q. Chen, and S. Yan, Network in network, ICLR 2014
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, CVPR 2016
