Convolutional Neural Networks - Virginia Tech

Convolutional Neural Networks
Computer Vision
Jia-Bin Huang, Virginia Tech

Today’s class
• Overview
• Convolutional Neural Network (CNN)
• Understanding and Visualizing CNN
• Training CNN

Image Categorization: Training
Image Features → Classifier Training → Trained Classifier

Image Categorization: Testing
Test Image → Image Features → Trained Classifier → Prediction: Outdoor

Features are the Keys
• SIFT [Lowe IJCV 04]
• SPM [Lazebnik et al. CVPR 06]
• HOG [Dalal and Triggs CVPR 05]
• DPM [Felzenszwalb et al. PAMI 10]
• Color Descriptor [Van De Sande et al. PAMI 10]

Learning a Hierarchy of Feature Extractors
• Each layer of the hierarchy extracts features from the output of the previous layer
• All the way from pixels → classifier
• Layers have (nearly) the same structure
Image/video → Layer 1 → Layer 2 → Layer 3 → Labels

Biological neuron and Perceptrons
• A biological neuron
• An artificial neuron (Perceptron): a linear classifier

Simple, Complex and Hypercomplex cells
David H. Hubel and Torsten Wiesel suggested a hierarchy of feature detectors in the visual cortex, with higher-level features responding to patterns of activation in lower-level cells, and propagating activation upwards to still higher-level cells.
Source: David Hubel's Eye, Brain, and Vision

Hubel/Wiesel Architecture and Multi-layer Neural Network
• Hubel and Wiesel’s architecture
• Multi-layer Neural Network: a non-linear classifier

Multi-layer Neural Network
• A non-linear classifier
• Training: find network weights $\mathbf{w}$ to minimize the error between true training labels $y_i$ and estimated labels $f_{\mathbf{w}}(\mathbf{x}_i)$
• Minimization can be done by gradient descent provided $f$ is differentiable
• This training method is called back-propagation

Convolutional Neural Networks
• Also known as CNN, ConvNet, DCN
• CNN = a multi-layer neural network with
  1. Local connectivity
  2. Weight sharing

CNN: Local Connectivity
[Figure: input layer and hidden layer, drawn with global connectivity and with local connectivity]
• # input units (neurons): 7
• # hidden units: 3
• Number of parameters
  – Global connectivity: 3 x 7 = 21
  – Local connectivity: 3 x 3 = 9

CNN: Weight Sharing
[Figure: input layer and hidden layer, drawn without weight sharing (weights w1 to w9) and with weight sharing (weights w1 to w3 reused)]
• # input units (neurons): 7
• # hidden units: 3
• Number of parameters
  – Without weight sharing: 3 x 3 = 9
  – With weight sharing: 3 x 1 = 3
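
The parameter counts on the two slides above can be checked with a few lines of Python. This is a minimal sketch: the stride of 2 (so that three windows of width 3 cover the 7 inputs) and the filter values are assumptions made for illustration, not something stated on the slides.

```python
# Parameter counts for the 7-input, 3-hidden-unit example.
n_in, n_hidden, window = 7, 3, 3

global_params = n_hidden * n_in    # every hidden unit sees every input
local_params = n_hidden * window   # each hidden unit sees only `window` inputs
shared_params = window             # one filter reused at every position


def shared_local_layer(x, w, stride=2):
    """Apply one shared filter w to each local window of x.

    The stride of 2 is an assumption chosen so that 7 inputs
    give exactly 3 hidden units.
    """
    return [
        sum(wi * xi for wi, xi in zip(w, x[s:s + len(w)]))
        for s in range(0, len(x) - len(w) + 1, stride)
    ]


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]                     # illustrative filter weights
hidden = shared_local_layer(x, w)

print(global_params, local_params, shared_params)  # 21 9 3
print(hidden)                                       # [-2, -2, -2]
```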

CNN with multiple input channels
[Figure: hidden layer and input layer. Single input channel: filter weights. Multiple input channels (Channel 1, Channel 2): filter weights]

CNN with multiple output maps
[Figure: hidden layer and input layer. Single output map: filter weights. Multiple output maps (Map 1, Map 2): Filter 1 weights, Filter 2 weights]

Putting them together
• Local connectivity
• Weight sharing
• Handling multiple input channels
• Handling multiple output maps
[Figure labels: weight sharing, local connectivity, # input channels, # output (activation) maps]
Image credit: A. Karpathy
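
A rough 1D sketch of how the four ingredients combine in one layer. The function name `conv1d_multi`, the toy inputs, and the filter values are all hypothetical; the layer has no bias, uses stride 1, and does not flip the kernel (cross-correlation, as is common in CNN code).

```python
def conv1d_multi(x, filters):
    """x: list of input channels (equal-length lists).
    filters: list of output maps; each holds one kernel per input channel.
    Returns one output map per filter (valid positions, stride 1, no bias).
    """
    n = len(x[0])
    outputs = []
    for f in filters:
        k = len(f[0])
        out = []
        for s in range(n - k + 1):
            # Sum over input channels and over positions inside the window.
            out.append(sum(f[c][j] * x[c][s + j]
                           for c in range(len(x))
                           for j in range(k)))
        outputs.append(out)
    return outputs


x = [[1, 2, 3, 4],            # input channel 1
     [0, 1, 0, 1]]            # input channel 2
filters = [
    [[1, 1], [1, 1]],         # filter 1: sums the window over both channels
    [[1, -1], [0, 0]],        # filter 2: difference on channel 1 only
]
maps = conv1d_multi(x, filters)

# (# output maps) x (# input channels) x (kernel width) shared weights
n_params = len(filters) * len(x) * 2

print(maps)      # [[4, 6, 8], [-1, -1, -1]]
print(n_params)  # 8
```

The weight count depends only on the number of output maps, the number of input channels, and the kernel width, not on the input length.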

Neocognitron [Fukushima, Biological Cybernetics 1980]
Deformation-resistant recognition
• S-cells (simple): extract local features
• C-cells (complex): allow for positional errors

LeNet [LeCun et al. 1998]
• Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]
• LeNet-1 from 1993

What is a Convolution?
• Weighted moving sum
[Figure: Input → Feature Activation Map]
slide credit: S. Lazebnik
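
The "weighted moving sum" can be written out directly. This is a minimal sketch in plain Python; the image and kernel values are made up, and like most CNN code it does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; at each position take the weighted sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(kernel[i][j] * image[r + i][c + j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]            # responds to a dark-to-bright vertical edge

activation_map = conv2d_valid(image, kernel)
print(activation_map)         # [[0, 2, 0], [0, 2, 0]]
```

The activation map is non-zero only where the window straddles the edge.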

Convolutional Neural Networks
Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps
slide credit: S. Lazebnik

Convolutional Neural Networks
[Figure: the processing pipeline, shown with an input image and its feature map]
slide credit: S. Lazebnik

Convolutional Neural Networks
Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps
• Non-linearity: Rectified Linear Unit (ReLU)
slide credit: S. Lazebnik
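
A minimal sketch of the ReLU non-linearity applied element-wise to a feature map. The feature-map values are illustrative.

```python
def relu(x):
    """Rectified Linear Unit: pass positive values, clamp the rest to zero."""
    return max(0.0, x)


feature_map = [[-1.5, 2.0],
               [0.5, -3.0]]
rectified = [[relu(v) for v in row] for row in feature_map]
print(rectified)   # [[0.0, 2.0], [0.5, 0.0]]
```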

Convolutional Neural Networks
Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps
• Spatial pooling: max pooling
  – Max pooling: a non-linear down-sampling
  – Provides translation invariance
slide credit: S. Lazebnik
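
A sketch of max pooling on a small feature map. The 2x2 window with stride 2 is an assumed, commonly used setting, and the numbers are made up.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Keep the maximum of each size x size window (non-linear down-sampling)."""
    out = []
    for r in range(0, len(fmap) - size + 1, stride):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool2d(fmap)
print(pooled)                  # [[4, 2], [2, 8]]

# A strong response that moves by one pixel inside the same window
# gives the same pooled output.
a = [[9, 0], [0, 0]]
b = [[0, 0], [0, 9]]
print(max_pool2d(a) == max_pool2d(b))   # True
```

The invariance is only to small shifts: a response that moves into a neighbouring window changes the pooled map.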

Convolutional Neural Networks
[Figure: the processing pipeline with the normalization stage: Feature Maps → Feature Maps After [...]]
slide credit: S. Lazebnik

Engineered vs. learned features
• Convolutional filters are trained in a supervised manner by back-propagating classification error
[Figure: two pipelines from Image to Label, one built on engineered feature extraction, the other on learned convolution/pool stages]

SIFT Descriptor
Lowe [IJCV 2004]
Image Pixels → Apply gradient filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector

SIFT Descriptor
Lowe [IJCV 2004]
Image Pixels → Apply oriented filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
slide credit: R. Fergus

Spatial Pyramid Matching
Lazebnik, Schmid, Ponce [CVPR 2006]
SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier
slide credit: R. Fergus

Deformable Part Model
Deformable Part Models are Convolutional Neural Networks [Girshick et al. CVPR 15]

AlexNet
• Similar framework to LeCun’98 but:
  – Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
  – More data (10^6 vs. 10^3 images)
  – GPU implementation (50x speedup over CPU)
  – Trained on two GPUs for a week
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Using CNN for Image Classification
• Fixed input size: 224x224x3
• AlexNet → Fully connected layer Fc7 (d = 4096) → Averaging (d = 4096) → Softmax Layer → “Jia-Bin”
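
A toy sketch of the last two stages, averaging and softmax. It assumes that the averaging step combines several Fc7 feature vectors of the same image (for example from different crops), which the slide does not spell out. The 3-d vectors stand in for the 4096-d features, and the 2-class linear layer is hypothetical.

```python
import math


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


features = [[1.0, 0.0, 2.0],               # toy stand-ins for 4096-d Fc7 vectors
            [3.0, 0.0, 0.0]]
avg = average(features)                    # [2.0, 0.0, 1.0]

weights = [[1.0, 0.0, 0.0],                # hypothetical 2-class linear layer
           [0.0, 1.0, 0.0]]
scores = [sum(w * a for w, a in zip(row, avg)) for row in weights]
probs = softmax(scores)
print(avg, scores, probs)
```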

ImageNet Challenge 2012-2014
Best non-convnet in 2012: 26.2%

Team                              Year  Place  Error (top-5)  External data
SuperVision – Toronto (7 layers)  2012  -      16.4%          no
SuperVision                       2012  1st    15.3%          ImageNet 22k
Clarifai – NYU (7 layers)         2013  -      11.7%          no
Clarifai                          2013  1st    11.2%          ImageNet 22k
VGG – Oxford (16 layers)          2014  2nd    7.32%          no
GoogLeNet (19 layers)             2014  1st    6.67%          no
Human expert*                                  5.1%

Team                            Method                                  Error (top-5)
DeepImage - Baidu               Data augmentation + multi GPU           5.33%
PReLU-nets - MSRA               Parametric ReLU + smart initialization  4.94%
BN-Inception ensemble - Google  Reducing internal covariate shift       4.82%

Beyond classification
• Detection
• Segmentation
• Regression
• Pose estimation
• Matching patches
• Synthesis
• Style transfer
• and many more

R-CNN: Regions with CNN features
• Trained on ImageNet classification
• Finetune CNN on PASCAL
RCNN [Girshick et al. CVPR 2014]

Fast R-CNN
Fast RCNN [Girshick, R 2015]
https://github.com/rbgirshick/fast-rcnn

Labeling Pixels: Semantic Labels
Fully Convolutional Networks for Semantic Segmentation [Long et al. CVPR 2015]

Labeling Pixels: Edge Detection
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection [Bertasius et al. CVPR 2015]

CNN for Regression
DeepPose [Toshev and Szegedy CVPR 2014]

CNN as a Similarity Measure for Matching
• Stereo matching [Zbontar and LeCun CVPR 2015]
• Compare patch [Zagoruyko and Komodakis 2015]
• FlowNet [Fischer et al 2015]
• FaceNet [Schroff et al. 2015]
• Match ground and aerial images [Lin et al. CVPR 2015]

CNN for Online Visual Tracking
Hierarchical Convolutional Features for Visual Tracking [Ma et al. ICCV 2015]

CNN for Image Generation
Learning to Generate Chairs with Convolutional Neural Networks [Dosovitskiy et al. CVPR 2015]

Chair Morphing
Learning to Generate Chairs with Convolutional Neural Networks [Dosovitskiy et al. CVPR 2015]

CNN for Image Restoration/Enhancement
• Super-resolution [Dong et al. ECCV 2014]
• Non-uniform blur estimation [Sun et al. CVPR 2015]
• Non-blind deconvolution [Xu et al. NIPS 2014]

Style Transfer
• Find an output image with
  – similar activations of early layers (low-level) of source image
  – similar activations of later layers (high-level) of target image
[Figure: source image, target image, output (deepart)]
A Neural Algorithm of Artistic Style [Gatys et al. 2015]

Image eepimagesent/

Understanding and Visualizing CNN
• Find images that maximize some class scores
• Individual neuron activation
• Visualize input pattern using deconvnet
• Invert CNN features
• Breaking CNNs

Find images that maximize some class scores
person: HOG template
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps [Simonyan et al. ICLR Workshop 2014]

Individual Neuron Activation
RCNN [Girshick et al. CVPR 2014]

Map activation back to the input pixel space
• What input pattern originally caused a given activation in the feature maps?
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 1
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 2
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 3
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 4 and 5
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Invert CNN features
• Reconstruct an image from CNN features
Understanding deep image representations by inverting them [Mahendran and Vedaldi CVPR 2015]

CNN Reconstruction
• Reconstruction from different layers
• Multiple reconstructions
Understanding deep image representations by inverting them [Mahendran and Vedaldi CVPR 2015]

Breaking CNNs
Intriguing properties of neural networks [Szegedy ICLR 2014]

What is going on?
• Recall gradient descent training: modify the weights to reduce classifier error
  $\mathbf{w} \leftarrow \mathbf{w} - \alpha \frac{\partial E}{\partial \mathbf{w}}$
• Adversarial examples: modify the image to increase classifier error
  $\mathbf{x} \leftarrow \mathbf{x} + \alpha \frac{\partial E}{\partial \mathbf{x}}$
Explaining and Harnessing Adversarial Examples [Goodfellow ICLR ...] ...-convnets/

Fooling a linear classifier
• Perceptron weight update: add a small multiple of the example to the weight vector:
  $\mathbf{w} \leftarrow \mathbf{w} + \alpha \mathbf{x}$
• To fool a linear classifier, add a small multiple of the weight vector to the training example:
  $\mathbf{x} \leftarrow \mathbf{x} + \alpha \mathbf{w}$
Explaining and Harnessing Adversarial Examples [Goodfellow ICLR ...] ...-convnets/
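
A small numeric sketch of the second bullet. The weights, the example, and the step size alpha are made-up values. For a linear score w·x, adding alpha·w to x raises the score by alpha·‖w‖², which here is enough to flip the sign.

```python
def score(w, x):
    """Linear classifier score w . x (positive means class +1, negative means class -1)."""
    return sum(wi * xi for wi, xi in zip(w, x))


w = [0.5, -1.0, 0.25]          # illustrative weight vector
x = [1.0, 1.0, 1.0]            # illustrative example, classified as -1
alpha = 0.2                    # small step size (assumed value)

# x <- x + alpha * w
x_adv = [xi + alpha * wi for wi, xi in zip(w, x)]

before = score(w, x)
after = score(w, x_adv)
expected_shift = alpha * sum(wi * wi for wi in w)

print(before)                  # -0.25
print(after > 0)               # True: the predicted class flips
```

Each coordinate moves by at most 0.2, yet the decision changes, because the perturbation is aligned with the weight vector in every coordinate at once.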

Fooling a linear aking-convnets/

Breaking CNNs
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen et al. CVPR 2015]

Images that both CNN and Human can recognize
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen et al. CVPR 2015]

Direct Encoding
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen et al. CVPR 2015]

Indirect Encoding
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen et al. CVPR 2015]

Training Convolutional Neural Networks
• Backpropagation + stochastic gradient descent with momentum
  – Neural Networks: Tricks of the Trade
• Dropout
• Data augmentation
• Batch normalization
• Initialization
  – Transfer learning
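
A minimal sketch of the momentum update on a one-dimensional quadratic. The learning rate, momentum value, and step count are assumed, and for brevity `grad` returns the exact gradient, where real training would plug in a mini-batch estimate.

```python
def sgd_momentum(grad, w0, lr=0.1, momentum=0.9, steps=200):
    """Gradient descent with a velocity term that accumulates past gradients."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)   # decay old velocity, add new step
        w = w + v
    return w


# Toy objective E(w) = (w - 3)^2 with gradient 2 (w - 3); minimum at w = 3.
w_star = sgd_momentum(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 3))
```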

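Training CNN with gradient descent
• A CNN as composition of functions
  $f_{\mathbf{w}}(\mathbf{x}) = f_L(\ldots f_2(f_1(\mathbf{x}; \mathbf{w}_1); \mathbf{w}_2) \ldots; \mathbf{w}_L)$
• Parameters
  $\mathbf{w} = (\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_L)$
• Empirical loss function
  $L(\mathbf{w}) = \frac{1}{n} \sum_i l(z_i, f_{\mathbf{w}}(\mathbf{x}_i))$
• Gradient descent
  $\mathbf{w}^{t+1} = \mathbf{w}^{t} - \eta_t \frac{\partial L}{\partial \mathbf{w}}(\mathbf{w}^{t})$
  where $\mathbf{w}^{t+1}$ is the new weight, $\mathbf{w}^{t}$ the old weight, $\eta_t$ the learning rate, and $\frac{\partial L}{\partial \mathbf{w}}(\mathbf{w}^{t})$ the gradient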
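
A sketch of the update rule on the simplest possible model: a one-parameter "network" f_w(x) = w·x with squared loss. The data, the learning rate, and the step count are assumptions chosen for illustration.

```python
xs = [1.0, 2.0, 3.0]           # toy inputs x_i
zs = [2.0, 4.0, 6.0]           # toy targets z_i (generated by w = 2)


def loss(w):
    """Empirical loss: mean squared error of f_w(x) = w * x."""
    return sum((z - w * x) ** 2 for x, z in zip(xs, zs)) / len(xs)


def grad(w):
    """Derivative of the empirical loss with respect to w."""
    return sum(-2 * x * (z - w * x) for x, z in zip(xs, zs)) / len(xs)


w = 0.0                        # initial weight
eta = 0.05                     # constant learning rate (assumed)
for t in range(100):
    w = w - eta * grad(w)      # new weight = old weight - learning rate * gradient

print(round(w, 6), loss(w) < 1e-10)
```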

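An Illustrative example
$f(x, y) = xy$, $\frac{\partial f}{\partial x} = y$, $\frac{\partial f}{\partial y} = x$
Example: $x = 4$, $y = 3$ ⇒ $f(x, y) = 12$
Partial derivatives: $\frac{\partial f}{\partial x} = 3$, $\frac{\partial f}{\partial y} = 4$
Gradient: $\nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}]$
Example credit: Andrej Karpathy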
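
The partial derivatives can be sanity-checked with finite differences. This is a sketch; `numerical_grad` and the step `h` are illustrative choices.

```python
def f(x, y):
    return x * y


def numerical_grad(fn, x, y, h=1e-6):
    """Central-difference estimate of [df/dx, df/dy]."""
    dfdx = (fn(x + h, y) - fn(x - h, y)) / (2 * h)
    dfdy = (fn(x, y + h) - fn(x, y - h)) / (2 * h)
    return dfdx, dfdy


x, y = 4.0, 3.0
analytic = (y, x)              # df/dx = y, df/dy = x
numeric = numerical_grad(f, x, y)

print(f(x, y))                 # 12.0
print(analytic)                # (3.0, 4.0)
```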

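$f(x, y, z) = (x + y)z = qz$
$q = x + y$: $\frac{\partial q}{\partial x} = 1$, $\frac{\partial q}{\partial y} = 1$
$f = qz$: $\frac{\partial f}{\partial q} = z$, $\frac{\partial f}{\partial z} = q$
Goal: compute the gradient $\nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}]$
Example credit: Andrej Karpathy

$f(x, y, z) = (x + y)z = qz$
$q = x + y$: $\frac{\partial q}{\partial x} = 1$, $\frac{\partial q}{\partial y} = 1$
$f = qz$: $\frac{\partial f}{\partial q} = z$, $\frac{\partial f}{\partial z} = q$
Chain rule: $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial x}$
Example credit: Andrej Karpathy

Backpropagation (recursive chain rule)
[Figure: a gate with inputs $w_1, w_2, \ldots, w_n$ and output $q$]
$\frac{\partial f}{\partial w_i} = \frac{\partial q}{\partial w_i} \frac{\partial f}{\partial q}$
• Local gradient $\frac{\partial q}{\partial w_i}$: can be computed during forward pass
• Gate gradient $\frac{\partial f}{\partial q}$: the gate receives this during backprop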
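
The forward and backward passes for f(x, y, z) = (x + y)z can be spelled out as follows. The input values are illustrative and do not come from the slides.

```python
def forward_backward(x, y, z):
    """Forward pass, then chain rule from the output back to the inputs."""
    # Forward pass
    q = x + y
    f = q * z

    # Backward pass
    df_dq = z                  # gate gradient arriving at the add gate
    df_dz = q
    df_dx = df_dq * 1.0        # local gradient dq/dx = 1
    df_dy = df_dq * 1.0        # local gradient dq/dy = 1
    return f, (df_dx, df_dy, df_dz)


f, grad = forward_backward(-2.0, 5.0, -4.0)
print(f)      # -12.0
print(grad)   # (-4.0, -4.0, 3.0)
```

Each gate only multiplies the gradient it receives by its own local gradient, which is why the same procedure applies to networks of any depth.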

Dropout
Intuition: successful conspiracies
• 50 people planning a conspiracy
• Strategy A: plan a big conspiracy involving 50 people
  – Likely to fail. 50 people need to play their parts correctly.
• Strategy B: plan 10 conspiracies each involving 5 people
  – Likely to succeed!
Dropout: A simple way to prevent neural networks from overfitting [Srivastava JMLR 2014]

Dropout
Main Idea: approximately combining exponentially many different neural network architectures efficiently
Dropout: A simple way to prevent neural networks from overfitting [Srivastava JMLR 2014]
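
A minimal sketch of dropout on a vector of activations. It uses the "inverted" variant, which rescales the kept units during training so that nothing changes at test time; this is a common implementation choice and not necessarily the formulation in the cited paper. The drop probability of 0.5 and the seed are assumed values.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Randomly zero units during training; identity at test time."""
    if not training:
        return list(activations)
    keep = 1.0 - p_drop
    # Each call samples a different "thinned" network.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                    # fixed seed for reproducibility
acts = [1.0] * 1000
dropped = dropout(acts, p_drop=0.5, rng=rng)
kept_at_test = dropout(acts, training=False)

print(dropped.count(0.0))                 # roughly half of the units
print(sum(dropped) / len(dropped))        # close to 1.0 in expectation
```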

Data Augmentation (Jittering)
• Create virtual training samples
  – Horizontal flip
  – Random crop
  – Color casting
  – Geometric distortion
Deep Image [Wu et al. 2015]
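
A sketch of the first two jittering operations on an image stored as a list of rows. The function names and the 3x3 toy image are illustrative.

```python
import random


def horizontal_flip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def random_crop(img, crop_h, crop_w, rng=random):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(img) - crop_h)       # randint is inclusive on both ends
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
flipped = horizontal_flip(img)
crop = random_crop(img, 2, 2, rng=random.Random(0))

print(flipped)   # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(len(crop), len(crop[0]))   # 2 2
```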

Parametric Rectified Linear Unit
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification [He et al. 2015]
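
A sketch of the parametric ReLU activation: identity for positive inputs, slope `a` for negative ones. In the actual method `a` is learned together with the weights; here it is fixed to an illustrative value.

```python
def prelu(x, a):
    """Parametric ReLU: x if x > 0, otherwise a * x (a is a learnable slope)."""
    return x if x > 0 else a * x


values = [-2.0, 0.0, 3.0]
out = [prelu(v, a=0.25) for v in values]
print(out)                     # [-0.5, 0.0, 3.0]

# With a = 0 the unit reduces to the ordinary ReLU.
print([prelu(v, a=0.0) == max(0.0, v) for v in values])   # [True, True, True]
```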

Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [Ioffe and Szegedy 2015]
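
A sketch of the training-time transform for a single feature: normalize over the mini-batch, then scale and shift. The names `gamma`, `beta`, and `eps` and their default values are illustrative. At inference, running averages of the mean and variance are typically used instead of batch statistics, which this sketch omits.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)

print(round(out_mean, 6), round(out_var, 4))
```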

Transfer Learning
• Improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.
• Weight initialization for CNN
Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks [Oquab et al. CVPR 2014]

Convolutional activation features [Donahue et al. ICML 2013]
CNN Features off-the-shelf: an Astounding Baseline for Recognition [Razavian et al. 2014]

How transferable are features in CNN?
How transferable are features in deep neural networks [Yosinski NIPS 2014]

Deep Rendering Model (DRM)
A Probabilistic Theory of Deep Learning [Patel, Nguyen, and Baraniuk 2015]

CNN as a Max-Sum Inference
A Probabilistic Theory of Deep Learning [Patel, Nguyen, and Baraniuk 2015]

[Figure: Model, Inference, Learning]

Tools

Resources
• http://deeplearning.net/
  – Hub to many other deep learning resources
• [...]eplearning
  – A resource collection for deep learning
• https://github.com/kjw0612/awesome-deep-vision
  – A resource collection for deep learning in computer vision
• http://cs231n.stanford.edu/syllabus.html
  – Nice course on CNN for visual recognition

Things to remember
• Overview
  – Neuroscience, Perceptron, multi-layer neural networks
• Convolutional neural network (CNN)
  – Convolution, nonlinearity, max pooling
  – CNN for classification and beyond
• Understanding and visualizing CNN
  – Find images that maximize some class scores; visualize individual neuron activation, input pattern and images; breaking CNNs
• Training CNN
  – Dropout; data augmentation; batch normalization; transfer learning
• Probabilistic interpretation
  – Deep rendering model; CNN forward-propagation as max-sum inference; training as an EM algorithm
