Convolutional Neural Network Architectures: from LeNet to ResNet
Lana Lazebnik
Figure source: A. Karpathy
What happened to my field?
Classification: ImageNet Challenge top-5 error
Figure source: Kaiming He
What happened to my field?
Object Detection: PASCAL VOC mean Average Precision (mAP)
[Plot: mAP by year, before deep convnets vs. using deep convnets, through 2014-2016]
Figure source: Ross Girshick
Actually, it happened a while ago: LeNet-5
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
Let's back up even more: The Perceptron
Inputs x1, x2, x3, ..., xD; weights w1, w2, w3, ..., wD
Output: sgn(w · x + b)
Rosenblatt, Frank (1958), The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, Cornell Aeronautical Laboratory, Psychological Review, v65, No. 6, pp. 386–408.
Let’s back up even more
Two-layer neural network
Can learn nonlinear functions provided each perceptron has a differentiable nonlinearity
Sigmoid: g(t) = 1 / (1 + e^(-t))
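To make this concrete, here is a minimal NumPy sketch (not from the slides) of a two-layer network's forward pass with a sigmoid hidden layer; the layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(t):
    # Differentiable nonlinearity: g(t) = 1 / (1 + e^(-t))
    return 1.0 / (1.0 + np.exp(-t))

def two_layer_forward(x, W1, b1, W2, b2):
    # Hidden layer: sigmoid of an affine map; output layer: affine map
    h = sigmoid(W1 @ x + b1)
    return W2 @ h + b2

# Arbitrary sizes for illustration: 3 inputs, 5 hidden units, 1 output
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)
print(two_layer_forward(x, W1, b1, W2, b2))
```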
Multi-layer neural network
Training of multi-layer networks
Find network weights to minimize the training error between true and estimated labels of training examples, e.g.:
E(w) = Σ_{i=1}^{N} (y_i − f_w(x_i))^2
Update weights by gradient descent: w ← w − α ∂E/∂w
[Figure: error surface E(w) over weights w1, w2]
Training of multi-layer networks
Find network weights to minimize the training error between true and estimated labels of training examples, e.g.:
E(w) = Σ_{i=1}^{N} (y_i − f_w(x_i))^2
Update weights by gradient descent: w ← w − α ∂E/∂w
Back-propagation: gradients are computed in the direction from output to input layers and combined using the chain rule
Stochastic gradient descent: compute the weight update w.r.t. one training example (or a small batch of examples) at a time, cycle through training examples in random order over multiple epochs
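The update rule and back-propagation can be sketched in a few lines of NumPy (an illustration, not the lecture's code); the two-layer architecture, learning rate, and XOR toy data are assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sgd_train(X, y, hidden=8, alpha=0.5, epochs=2000, seed=0):
    """Plain SGD with manual back-propagation for a two-layer network
    (sigmoid hidden layer, linear output, squared error)."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    W1, b1 = rng.normal(scale=0.5, size=(hidden, D)), np.zeros(hidden)
    W2, b2 = rng.normal(scale=0.5, size=hidden), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):        # random order each epoch
            x, t = X[i], y[i]
            # forward pass
            h = sigmoid(W1 @ x + b1)
            pred = W2 @ h + b2
            # backward pass: chain rule, output -> input
            d_pred = 2.0 * (pred - t)            # dE/d(pred)
            dW2, db2 = d_pred * h, d_pred
            d_h = d_pred * W2
            d_z = d_h * h * (1.0 - h)            # through the sigmoid
            dW1, db1 = np.outer(d_z, x), d_z
            # gradient descent update: w <- w - alpha * dE/dw
            W1 -= alpha * dW1; b1 -= alpha * db1
            W2 -= alpha * dW2; b2 -= alpha * db2
    return W1, b1, W2, b2

# XOR: a classic problem a single perceptron cannot solve
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])
W1, b1, W2, b2 = sgd_train(X, y)
print([float(W2 @ sigmoid(W1 @ x + b1) + b2) for x in X])
```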
Multi-Layer Network Demo
http://playground.tensorflow.org/
From fully connected to convolutional networks
[Figure: image with a fully connected layer]
From fully connected to convolutional networks
[Figure: image with a convolutional layer]
From fully connected to convolutional networksfeature maplearnedweightsimageConvolutional layer
From fully connected to convolutional networksfeature maplearnedweightsimageConvolutional layer
Convolution as feature extraction
[Figure: input, feature map]
From fully connected to convolutional networksfeature maplearnedweightsimageConvolutional layer
From fully connected to convolutional networks
[Figure: image, convolutional layer, next layer]
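To make the contrast concrete, a sketch (assuming PyTorch; not part of the slides) comparing parameter counts of a fully connected layer and a convolutional layer on a small image; the 32x32x3 input and 100 outputs are arbitrary choices.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a 32x32 RGB image, 100 output units / feature maps
x = torch.randn(1, 3, 32, 32)

fc = nn.Linear(3 * 32 * 32, 100)                      # every pixel connected to every unit
conv = nn.Conv2d(3, 100, kernel_size=3, padding=1)    # 3x3 weights shared across positions

n_params = lambda m: sum(p.numel() for p in m.parameters())
print("fully connected:", n_params(fc))    # 3*32*32*100 + 100 = 307,300
print("convolutional: ", n_params(conv))   # 3*3*3*100 + 100   = 2,800

print(fc(x.flatten(1)).shape)   # torch.Size([1, 100])
print(conv(x).shape)            # torch.Size([1, 100, 32, 32]): one feature map per filter
```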
Key operations in a CNN
Input Image → Convolution (learned) → Non-linearity → Spatial pooling → Feature maps
[This slide highlights convolution: input → feature map]
Source: R. Fergus, Y. LeCun
Key operations
Input Image → Convolution (learned) → Non-linearity → Spatial pooling → Feature maps
[This slide highlights the non-linearity: Rectified Linear Unit (ReLU)]
Source: R. Fergus, Y. LeCun
Key operations
Input Image → Convolution (learned) → Non-linearity → Spatial pooling → Feature maps
[This slide highlights spatial pooling: max pooling]
Source: R. Fergus, Y. LeCun
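The three key operations chain together as below, in an assumed PyTorch sketch (sizes are illustrative, not from the slides).

```python
import torch
import torch.nn as nn

# One round of the key CNN operations on an assumed 32x32 RGB input
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learned convolution -> 16 feature maps
    nn.ReLU(),                                   # non-linearity
    nn.MaxPool2d(kernel_size=2),                 # spatial pooling, halves the resolution
)

x = torch.randn(1, 3, 32, 32)
print(block(x).shape)   # torch.Size([1, 16, 16, 16])
```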
LeNet-5
Average pooling
Sigmoid or tanh nonlinearity
Fully connected layers at the end
Trained on MNIST digit dataset with 60K training examples
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
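A LeNet-5-style sketch in PyTorch (an approximation: modern re-implementations use these layer sizes, but the original paper's connection scheme and subsampling details differ).

```python
import torch
import torch.nn as nn

# LeNet-5-style network for 32x32 grayscale digits (a sketch, not the exact original)
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),    # C1, S2
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # C3, S4
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),                         # C5
    nn.Linear(120, 84), nn.Tanh(),                                 # F6
    nn.Linear(84, 10),                                             # digit scores
)

print(lenet(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 10])
```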
Fast forward to the arrival of big visual data
14 million labeled images, 20k classes
Images gathered from the Internet
Human labels via Amazon MTurk
ImageNet Large-Scale Visual Recognition Challenge (ILSVRC): 1.2 million training images, 1000 classes
www.image-net.org/challenges/LSVRC/
AlexNet: ILSVRC 2012 winner
Similar framework to LeNet but:
Max pooling, ReLU nonlinearity
More data and bigger model (7 hidden layers, 650K units, 60M params)
GPU implementation (50x speedup over CPU); trained on two GPUs for a week
Dropout regularization
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
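For reference, a quick way to inspect AlexNet is the stock torchvision implementation (assuming torchvision is installed; this is not the authors' original two-GPU code).

```python
import torch
from torchvision import models

# Instantiate an untrained AlexNet and count its parameters (~61M in this variant)
alexnet = models.alexnet()
print(sum(p.numel() for p in alexnet.parameters()))

# Forward pass on an ImageNet-sized input: 1000 class scores
print(alexnet(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 1000])
```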
Clarifai: ILSVRC 2013 winner
Refinement of AlexNet
M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014 (Best Paper Award winner)
VGGNet: ILSVRC 2014 2nd place
Sequence of deeper networks trained progressively
Large receptive fields replaced by successive layers of 3x3 convolutions (with ReLU in between)
One 7x7 conv layer with C feature maps needs 49C^2 weights; three 3x3 conv layers need only 27C^2 weights
Experimented with 1x1 convolutions
K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
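A quick check of that weight count (illustration only; C = 256 is an assumed number of feature maps, biases ignored). The three stacked 3x3 layers cover the same 7x7 receptive field.

```python
import torch.nn as nn

C = 256  # assumed number of input and output feature maps

# One 7x7 convolution: 49 * C^2 weights
one_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

# Three stacked 3x3 convolutions (ReLUs between them omitted here): 27 * C^2 weights
three_3x3 = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)
                            for _ in range(3)])

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(one_7x7), "==", 49 * C * C)     # 3,211,264
print(count(three_3x3), "==", 27 * C * C)   # 1,769,472
```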
Network in network
M. Lin, Q. Chen, and S. Yan, Network in network, ICLR 2014
1x1 convolutions
[Figure: a standard conv layer]
1x1 convolutions
[Figure: a 1x1 conv layer]
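A 1x1 convolution is effectively a per-position fully connected layer across channels; a small sketch with assumed sizes shows it reducing the number of feature maps.

```python
import torch
import torch.nn as nn

# Reduce 256 feature maps to 64 with a 1x1 convolution (spatial size unchanged)
reduce = nn.Conv2d(256, 64, kernel_size=1)

x = torch.randn(1, 256, 28, 28)
print(reduce(x).shape)                              # torch.Size([1, 64, 28, 28])
print(sum(p.numel() for p in reduce.parameters()))  # 256*64 + 64 = 16,448
```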
GoogLeNet: ILSVRC 2014 winner
The Inception architecture
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
GoogLeNet: The Inception Module
Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
GoogLeNet: The Inception Module
Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps
Use 1x1 convolutions for dimensionality reduction before expensive convolutions
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
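A sketch of an Inception-style module (assumed PyTorch; the channel sizes follow the paper's inception (3a) block, but this is not the authors' code).

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pooling paths, concatenated along channels.
    1x1 convolutions reduce dimensionality before the expensive 3x3 and 5x5."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.p1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU())
        self.p2 = nn.Sequential(nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(),
                                nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU())
        self.p3 = nn.Sequential(nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(),
                                nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU())
        self.p4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU())

    def forward(self, x):
        return torch.cat([self.p1(x), self.p2(x), self.p3(x), self.p4(x)], dim=1)

# Channel sizes of the paper's inception (3a) block: output has 64+128+32+32 = 256 maps
m = Inception(192, 64, 96, 128, 16, 32, 32)
print(m(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 256, 28, 28])
```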
GoogLeNet
[Figure: full architecture; Inception module highlighted]
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
GoogLeNet
[Figure: full architecture; auxiliary classifier highlighted]
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
GoogLeNet: an alternative view
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
Inception v2, v3
Regularize training with batch normalization, reducing the importance of auxiliary classifiers
More variants of inception modules with aggressive factorization of filters
C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
Inception v2, v3
Regularize training with batch normalization, reducing the importance of auxiliary classifiers
More variants of inception modules with aggressive factorization of filters
Increase the number of feature maps while decreasing spatial resolution (pooling)
C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
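Two of those ideas in a short sketch (assumed PyTorch, illustrative sizes): a conv + batch norm + ReLU building block, and factorizing a 7x7 filter into 1x7 and 7x1 convolutions.

```python
import torch
import torch.nn as nn

# Convolution + batch normalization + ReLU: the basic Inception-v2/v3 building block
def conv_bn(in_ch, out_ch, kernel_size, **kw):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size, bias=False, **kw),
                         nn.BatchNorm2d(out_ch),
                         nn.ReLU())

# Factorize a 7x7 convolution into 1x7 followed by 7x1 (same receptive field, fewer weights)
factorized = nn.Sequential(conv_bn(192, 192, (1, 7), padding=(0, 3)),
                           conv_bn(192, 192, (7, 1), padding=(3, 0)))

x = torch.randn(1, 192, 17, 17)
print(factorized(x).shape)   # torch.Size([1, 192, 17, 17])
```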
ResNet: ILSVRC 2015 winner
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016
ResNet: The residual module
Introduce skip or shortcut connections (previously existing in various forms in the literature)
Make it easy for network layers to represent the identity mapping
For some reason, need to skip at least two layers
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
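A minimal residual block sketch (assumed PyTorch), following the paper's basic two-layer block with an identity shortcut.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 conv layers plus an identity shortcut: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: easy to represent the identity

block = BasicResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
```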
ResNet: Deeper residual module (bottleneck)
Directly performing 3x3 convolutions with 256 feature maps at input and output: 256 x 256 x 3 x 3 ≈ 600K operations
Using 1x1 convolutions to reduce 256 to 64 feature maps, followed by 3x3 convolutions, followed by 1x1 convolutions to expand back to 256 maps:
256 x 64 x 1 x 1 ≈ 16K
64 x 64 x 3 x 3 ≈ 36K
64 x 256 x 1 x 1 ≈ 16K
Total: ≈ 70K
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
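The arithmetic can be checked directly (illustration only; biases, batch norm, and the shortcut are omitted).

```python
import torch.nn as nn

count = lambda m: sum(p.numel() for p in m.parameters())

# Direct 3x3 with 256 maps in and out: 256*256*3*3 = 589,824 weights (~600K)
direct = nn.Conv2d(256, 256, 3, padding=1, bias=False)

# Bottleneck: 1x1 reduce to 64, 3x3 at 64 maps, 1x1 expand back to 256 (~70K)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, 1, bias=False),              # 256*64     = 16,384
    nn.Conv2d(64, 64, 3, padding=1, bias=False),    # 64*64*3*3  = 36,864
    nn.Conv2d(64, 256, 1, bias=False),              # 64*256     = 16,384
)

print(count(direct))       # 589824
print(count(bottleneck))   # 69632
```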
ResNet: Architectures for ImageNet
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
Inception v4
C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016
Summary: ILSVRC 2012-2015
Team | Year | Place | Error (top-5) | External data
SuperVision, Toronto (AlexNet, 7 layers) | 2012 | - | 16.4% | no
SuperVision | 2012 | 1st | 15.3% | ImageNet 22k
Clarifai, NYU (7 layers) | 2013 | - | 11.7% | no
Clarifai | 2013 | 1st | 11.2% | ImageNet 22k
VGG, Oxford (16 layers) | 2014 | 2nd | 7.32% | no
GoogLeNet (19 layers) | 2014 | 1st | 6.67% | no
ResNet (152 layers) | 2015 | 1st | 3.57% | -
(Human top-5 error, shown for comparison, is estimated at roughly 5%.)
Accuracy vs. network complexity
[Figure; source URL fragment: 06/04/nets.html]
Design principles
Reduce filter sizes (except possibly at the lowest layer); factorize filters aggressively
Use 1x1 convolutions to reduce and expand the number of feature maps judiciously
Use skip connections and/or create multiple paths through the network
What's missing from the picture?
Training tricks and details: initialization, regularization, normalization
Training data augmentation
Averaging classifier outputs over multiple crops/flips
Ensembles of networks
What about ILSVRC 2016?
No more ImageNet classification
No breakthroughs comparable to ResNet
Reading list
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014
K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
M. Lin, Q. Chen, and S. Yan, Network in network, ICLR 2014
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, CVPR 2016