Convolutional Neural Nets II Hands On


Convolutional Neural Nets II: Hands On
Oliver Dürr
Datalab-Lunch Seminar Series
Winterthur, April 22nd, 2015

Outline
- Motivation for CNNs (focus here on image classification)
- Frameworks: Caffe / Lasagne
- Recap MLP / Demo MLP
- Recap CNN / Demo CNN
- Some tricks: Dropout, training and test augmentation (learning symmetries)
- Demo CNN with augmentation
Code for demos: https://github.com/oduerr/dl_tutorial

History, milestones of CNNs
- 1980 Kunihiko Fukushima: introduction
- 1998 LeCun: backpropagation
- Many contests won: 2011 & 2014 MNIST handwritten digits, 201X Chinese handwritten characters, 2011 German traffic signs
- ImageNet success story: AlexNet (2012), winning solution of ImageNet

ImageNet 2012, 2013, 2014
Some examples with AlexNet results; 1000 classes.
[Figure: results 2010-2014] 2012 winner "SuperVision" (AlexNet, 7 layers deep); OxfordNet up to 19 layers; GoogLeNet 6.7% error.
Source: http://cs.nyu.edu/~fergus/presentations/nips2013 ...

A really convincing fact
Kaggle Plankton Competition (2015) [leaderboard figure]. "There is another bold one."

Frameworks

Overview of frameworks
Disclaimer: "This is a fast changing field. The list is not exclusive."
Survey from the participants of the Kaggle Challenge (Feb 2015).
Most mentioned:
- lasagne / nolearn: Python, based on Theano, very flexible (the winning team "Deep Sea" used it)
- caffe: C++ based library, python bindings, convenience functions, many existing / pretrained models
Also used:
- Theano (plain vanilla): symbolic computation of gradients and construction of numerical C code
- Torch (Lua)

Caffe vs. Lasagne / nolearn

Caffe:
- C++ library with python bindings
- Settings (network layout and others) via files
- Documentation: poor; feels like flying blind
- Up-to-date components: AlexNet, GoogLeNet
- Input: images or "strange" DBs
- Data augmentation: not possible from python
- Predefined models available

Lasagne / nolearn:
- Lasagne: python, using Theano (a library to define, optimize, and evaluate mathematical expressions on the GPU)
- nolearn is a wrapper around lasagne that provides an interface similar to scikit-learn
- Documentation: poor, but "use the source, Luke"
- Feels like you understand / control the bells and whistles
- Custom components possible (provided Theano has the functionality)
- Input: any numpy arrays
- Data augmentation: easy
- No predefined models (yet)

We will focus on lasagne.

Links for Caffe
We will focus on Lasagne, but here are some links to Caffe for reference (thanks to Gabriel):
- .../python/FaceCaffe
- .../slides/Caffe/caffe_tutorial.pdf
- https://docs.google.com/presentation/...

Recap Neural Nets

Recap Neural Networks: Basic Unit (n-dimensional logistic regression)
Activation function, a.k.a. nonlinearity:
- logistic: f(z) = \exp(z) / (1 + \exp(z))
- ReLU: f(z) = \max(0, z)
with z = x_1 W_1 + x_2 W_2 + W_3 = \theta^T x.
Motivation (figure, source: AlexNet, Krizhevsky et al. 2012): green = logistic regression, red = ReLU; ReLU converges faster.
For a more detailed explanation see: https://home.zhaw.ch/~dueo/bbs/files/ConvNets_17_Dec_1.pdf
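A minimal numpy sketch (not from the slides; weight values are made up) of this basic unit, computing z = theta^T x and applying either nonlinearity:

    import numpy as np

    def logistic(z):
        # f(z) = exp(z) / (1 + exp(z)) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        # f(z) = max(0, z)
        return np.maximum(0.0, z)

    x = np.array([0.5, -1.2, 1.0])       # inputs; the trailing 1.0 acts as the bias input
    theta = np.array([0.8, 0.3, -0.1])   # weights W1, W2, W3 (illustrative values)
    z = theta @ x                        # z = x1*W1 + x2*W2 + W3
    print(logistic(z), relu(z))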

Recap Neural Networks: Stacking things together
Output (softmax): f(z_i) = e^{z_i} / \sum_{j=1}^{N} e^{z_j}
The network contains many weights W^{(l)}_{ij}. It is just a complex function of these weights \theta = { W^{(l)}_{ij} } and the input, predicting the probability of a class given the input image X.
For a more detailed explanation see: https://home.zhaw.ch/~dueo/bbs/files/ConvNets_17_Dec_1.pdf
Figure taken from: .../MultiLayerNeuralNetworks/
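A short numpy sketch (illustrative, not from the slides) of the softmax turning the scores z_i of the last layer into class probabilities:

    import numpy as np

    def softmax(z):
        # subtract the max for numerical stability; the result is unchanged
        e = np.exp(z - np.max(z))
        return e / e.sum()

    z = np.array([2.0, 1.0, 0.1])   # made-up scores for three classes
    p = softmax(z)
    print(p, p.sum())               # class probabilities, summing to 1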

Recap Neural Networks: Training the NN
- Use the training data j = 1, ..., N_train to optimize a cost function J that is sensitive to misclassification.
- Usually a subset of size n (mini-batch) of the training data is taken for one optimization step: J(\theta) = (1/n) \sum_{i=1}^{n} cost of training example X_i (n = mini-batch size).
- The cost function is motivated by maximum likelihood.
- The optimal weights are found over many iterations of gradient descent (\alpha = learning rate): \theta_i \leftarrow \theta_i - \alpha \, \partial J(\theta) / \partial \theta_i
- Backpropagation (a.k.a. the chain rule) is used to calculate the gradient.
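A sketch of one mini-batch gradient-descent step in numpy. The function grad_J (a stand-in for what backpropagation would compute) and the toy quadratic cost are assumptions for illustration:

    import numpy as np

    def sgd_step(theta, batch, grad_J, alpha=0.01):
        # theta_i <- theta_i - alpha * dJ/dtheta_i, evaluated on one mini-batch
        return theta - alpha * grad_J(theta, batch)

    # toy usage with the cost J(theta) = mean((x - theta)^2)
    grad_J = lambda theta, batch: 2 * np.mean(theta - batch)
    theta = 5.0
    batch = np.random.randn(128)    # mini-batch of size n = 128
    theta = sgd_step(theta, batch, grad_J)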

Illustration of Gradient Descent
[Figure: contour plot of J(\theta) over two parameters \theta_1, \theta_2 (just two out of millions), with the update \theta_i \leftarrow \theta_i - \alpha \, \partial J(\theta) / \partial \theta_i.]

Demo MLP

Definition of Data / Network (MLP)
Images of 28x28 pixels give 784 input nodes, followed by hidden layers with 500 and 50 nodes and 10 output nodes (in reality many more nodes are used).
Now for the demo.
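A hedged sketch of how such a 784-500-50-10 MLP might be declared with nolearn/lasagne; the layer sizes are read off the slide, the optimizer settings are illustrative and not the exact demo code from the repository:

    from lasagne import layers
    from lasagne.nonlinearities import softmax
    from nolearn.lasagne import NeuralNet

    mlp = NeuralNet(
        layers=[
            ('input',   layers.InputLayer),
            ('hidden1', layers.DenseLayer),
            ('hidden2', layers.DenseLayer),
            ('output',  layers.DenseLayer),
        ],
        input_shape=(None, 784),          # 28x28 images, flattened
        hidden1_num_units=500,
        hidden2_num_units=50,
        output_num_units=10,              # 10 digit classes
        output_nonlinearity=softmax,
        update_learning_rate=0.01,
        update_momentum=0.9,
        max_epochs=10,
    )
    # mlp.fit(X_train, y_train) with X_train of shape (N, 784) and integer labels 0..9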

Too many weights!
But we want many layers. Remedy:
- weight sharing → convolution
- sparse connectivity → pooling
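A quick back-of-the-envelope comparison (numbers taken from the 28x28 MLP above and the LeNet example below) of why full connectivity blows up and what sharing a small kernel saves:

    # fully connected: each of the 784 input pixels connects to each of 500 hidden units
    fc_weights = 28 * 28 * 500          # = 392,000 weights for a single layer
    # convolutional: 20 kernels of 5x5 weights, shared across the whole image
    conv_weights = 20 * 5 * 5           # = 500 weights (plus biases)
    print(fc_weights, conv_weights)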

The convolutional layer, ingredient I: convolution
What is a convolution? The 9 weights W_ij are called the kernel. The weights are not fixed, they are learned!
Gimp documentation: http://docs.gimp.org/en/plug-in-convmatrix.html

The convolutional layer, ingredient I: convolution
The same weights are slid over the image.
Illustration: http://deeplearning.stanford.edu/tutorial/

Example of a kernel: an edge-enhance filter.
But again: the weights are not fixed, they are learned!
Gimp documentation: http://docs.gimp.org/en/plug-in-convmatrix.html
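A small scipy sketch (illustrative, not from the slides) applying a fixed 3x3 edge-enhancing kernel to an image; this is the same sliding-window operation a conv layer performs, except that in a CNN the nine weights would be learned:

    import numpy as np
    from scipy.signal import convolve2d

    # a hand-crafted edge-enhance style kernel (in a CNN these weights are learned)
    kernel = np.array([[ 0, -1,  0],
                       [-1,  5, -1],
                       [ 0, -1,  0]])

    image = np.random.rand(28, 28)              # stand-in for a grayscale input image
    feature_map = convolve2d(image, kernel, mode='valid')
    print(feature_map.shape)                    # (26, 26): 'valid' convolution shrinks the image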

The convolutional layer, ingredient II: max-pooling
Simply join e.g. 2x2 adjacent pixels into one (there are also sliding-window versions).
Hinton: "The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster."
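A minimal numpy sketch of the simple non-overlapping 2x2 variant described above (not the sliding-window one):

    import numpy as np

    def max_pool_2x2(x):
        # x has shape (H, W) with H, W even; each output pixel is the max of one 2x2 block
        h, w = x.shape
        return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    fm = np.arange(16).reshape(4, 4)    # toy 4x4 feature map
    print(max_pool_2x2(fm))             # [[ 5  7], [13 15]]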

A simple version of the CNN (LeNet5 architecture)
Conv (20 kernels of 5x5 weights to go from one layer to the next) → Max Pool → Conv → Max Pool → Fully Connected → Multinomial Logistic Regression

A typical recent architecture (AlexNet, 2012)
Seminal paper; brought the ImageNet error from 26.2% down to 16.5%. Key ingredients:
- Dropout (see below)
- ReLU instead of sigmoid
- Parallelisation on many GPUs
- Local Response Normalization (not widely used nowadays)
The figure is a bit of a simplification, since AlexNet is built for 2 GPUs and uses normalization. Caffe code from here.

Winning architecture (GoogLeNet, 2014)
The inception module (convolutions and max-pooling in parallel). Few parameters, but quite hard to train.
"Going deeper with convolutions", http://arxiv.org/abs/1409.4842 (its Figure 3 shows the GoogLeNet network with all the bells and whistles); comments see here.

A typical very recent architecture ("Oxford Net"(s), 2014)
- Small pooling
- More than one conv layer before max-pooling
- No strides (stride 1)
- ReLU after conv and FC layers
- More traditional, easier to [...]; more weights than GoogLeNet
- ImageNet Challenge 2014: 2nd place in classification (Caffe [...])

A typical very recent architecture ("Oxford Net"(s), 2014)
Definition of the 16-layer variant taken from: ...7b538e2d8#file-readme-md (gist)

Demo Lasagne II (Convolutional)

Demo: Nolearn/Lasagne (LeNet architecture)
(Conv2DLayer and DenseLayer have the ReLU nonlinearity by default.)

    from lasagne import layers
    from lasagne.nonlinearities import softmax
    from nolearn.lasagne import NeuralNet

    PIXELS = 28  # input images are 28x28

    net = NeuralNet(
        layers=[
            ('input',   layers.InputLayer),
            ('conv1',   layers.Conv2DLayer),
            ('pool1',   layers.MaxPool2DLayer),
            ('conv2',   layers.Conv2DLayer),
            ('pool2',   layers.MaxPool2DLayer),
            ('hidden4', layers.DenseLayer),
            ('output',  layers.DenseLayer),
        ],
        input_shape=(None, 1, PIXELS, PIXELS),
        conv1_num_filters=32, conv1_filter_size=(3, 3), pool1_ds=(2, 2),
        conv2_num_filters=64, conv2_filter_size=(2, 2), pool2_ds=(2, 2),
        hidden4_num_units=500,
        output_num_units=10, output_nonlinearity=softmax,
        # plus training parameters (learning rate, momentum, max_epochs, ...) not shown on the slide
    )

Image taken from: Master Thesis, Christopher Mitchell (.pdf)

(Some) tricks of the trade

Data Augmentation
- Create "new" training data by label-preserving transformations.
- Force invariance under translational symmetries (translation, rotation, ...).
[Figure: original vs. augmented plankton images]
Taken from the winning solution of the plankton challenge: http://benanne.github.io/2015/03/17/plankton.html
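A hedged sketch of such a label-preserving augmentation step in numpy/scipy (random rotation plus an occasional horizontal flip, applied on the fly to a mini-batch); the exact transformations used in the demo and in the plankton solution differ:

    import numpy as np
    from scipy.ndimage import rotate

    def augment(batch, max_angle=20):
        # batch has shape (n, H, W); each image gets a random rotation and maybe a flip
        out = np.empty_like(batch)
        for i, img in enumerate(batch):
            angle = np.random.uniform(-max_angle, max_angle)
            img = rotate(img, angle, reshape=False, mode='nearest')
            if np.random.rand() < 0.5:
                img = np.fliplr(img)     # label-preserving for plankton, not for digits!
            out[i] = img
        return out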

Dropout
At each mini-batch during training, randomly remove ("drop out") nodes; at test time the full network is used.
Idea: averaging over many different configurations (exact in the linear case). Typically a 10% performance increase.
Srivastava et al., Journal of Machine Learning Research 15 (2014) 1929-1958
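A minimal numpy sketch of dropout in the formulation of Srivastava et al.: units are dropped with a random mask at training time and activations are rescaled by the keep probability at test time. Most modern implementations instead rescale during training (the "inverted" variant); this is only an illustration:

    import numpy as np

    def dropout(a, p=0.5, train=True):
        # a: activations of one layer; p: probability of keeping a unit
        if train:
            mask = np.random.rand(*a.shape) < p   # a fresh random mask for every mini-batch
            return a * mask
        return a * p                              # test time: keep all units, rescale by p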

Demo Lasagne III (Convolutional with Training Data Augmentation)

Attic

Taking a closer look: Convolution
The convolutional layer is finished with a nonlinearity. Possible choices: identity (nothing, linear), rectify (default), tanh, softmax (good for the last layer), sigmoid.
Different output sizes, no padding / padding via conv1_border_mode:

    conv1_border_mode='valid'   # (None, 1, 28, 28) -> (None, 32, 26, 26)
    conv1_border_mode='same'    # (None, 1, 28, 28) -> (None, 32, 28, 28)
    conv1_border_mode='full'    # (None, 1, 28, 28) -> (None, 32, 30, 30)
    conv1_strides=(2, 2)        # (None, 1, 28, 28) -> (None, 32, 13, 13); stride (step size) defaults to (1, 1)

Observation: compiling seems to take quite a while.
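The output sizes above follow the usual shape formula; a tiny helper (illustrative, not part of the demo code) makes the arithmetic explicit:

    def conv_output_size(in_size, filter_size, pad=0, stride=1):
        # 'valid': pad=0; 'same': pad=(filter_size-1)//2; 'full': pad=filter_size-1
        return (in_size + 2 * pad - filter_size) // stride + 1

    print(conv_output_size(28, 3, pad=0))            # 26, matches border_mode='valid'
    print(conv_output_size(28, 3, pad=1))            # 28, matches border_mode='same'
    print(conv_output_size(28, 3, pad=2))            # 30, matches border_mode='full'
    print(conv_output_size(28, 3, pad=0, stride=2))  # 13, matches strides=(2, 2)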

Cost Functions: ReLU
Six times faster convergence than the traditional approach (sigmoid/tanh).
Intuition: during backpropagation the ReLU gradient does not saturate (it is 1 for positive inputs), unlike the sigmoid.
Source: Krizhevsky et al. 2012

General Notes on Optimization
- Gradient descent (only first order)
- Newton's method: 2nd-order Taylor expansion, using the Hessian
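For reference, the two update rules side by side (standard textbook forms, not taken from the slides):

    \theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta) \qquad \text{(gradient descent, first order)}

    \theta \leftarrow \theta - H^{-1} \nabla_\theta J(\theta), \qquad H_{ij} = \frac{\partial^2 J}{\partial \theta_i \, \partial \theta_j} \qquad \text{(Newton, second order)}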

Taking a closer look: Learning Rate & Momentum
Nice description: Caffe tutorial. Nice visualization: ...torial/
Problem with (stochastic) gradient descent: valleys. You bounce up and down the walls and don't descend the slope.
Solutions: momentum, Nesterov momentum (NAG). Set mu = 0 and we recover (S)GD.
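A numpy sketch (illustrative) of the classical momentum update described above; with mu = 0 it reduces to plain (stochastic) gradient descent:

    import numpy as np

    def momentum_step(theta, v, grad, lr=0.01, mu=0.9):
        # v accumulates an exponentially decaying average of past gradients
        v = mu * v - lr * grad
        return theta + v, v

    theta = np.zeros(3)
    v = np.zeros(3)
    grad = np.array([0.2, -0.1, 0.05])   # pretend gradient from backprop
    theta, v = momentum_step(theta, v, grad)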

Taking a closer look: Learning Rate & Momentum
- AdaGrad: uses all historic gradient information
- Hessian-free optimization

