
CSC411/2515, Fall 2016
Neural Networks Tutorial
Lluís Castrejón, Oct. 2016
Slides adapted from Yujia Li's tutorial and Prof. Zemel's lecture notes.

Overfitting

The training data contains information about the regularities in the mapping from input to output. But it also contains noise:
– The target values may be unreliable.
– There is sampling error: there will be accidental regularities just because of the particular training cases that were chosen.
When we fit the model, it cannot tell which regularities are real and which are caused by sampling error.
– So it fits both kinds of regularity.
– If the model is very flexible, it can model the sampling error really well. This is a disaster.
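
As a concrete (hypothetical) illustration of fitting sampling error, the short NumPy sketch below fits a very flexible polynomial to a handful of noisy samples of a simple underlying regularity; the training error is tiny while the error on fresh samples from the same source is much larger. The function, noise level, and polynomial degree are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)

def regularity(x):
    """The true input-output mapping (an assumed example)."""
    return np.sin(2 * np.pi * x)

# A small noisy training set and a larger fresh test set from the same source.
x_train = np.sort(rng.uniform(0, 1, 12))
y_train = regularity(x_train) + rng.normal(0, 0.2, x_train.shape)
x_test = rng.uniform(0, 1, 500)
y_test = regularity(x_test) + rng.normal(0, 0.2, x_test.shape)

# A very flexible model: a degree-9 polynomial (10 free parameters for 12 points).
coeffs = np.polyfit(x_train, y_train, deg=9)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"train MSE: {train_mse:.4f}")   # near zero: the model has fit the noise too
print(f"test MSE:  {test_mse:.4f}")    # much larger: the fitted noise does not generalize
```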

Overfitting

[Figure illustrating overfitting with polynomial curve fitting. Picture credit: Chris Bishop, Pattern Recognition and Machine Learning, Ch. 1.1.]

Preventing overfitting

Use a model that has the right capacity:
– enough to model the true regularities
– not enough to also model the spurious regularities (assuming they are weaker)
Standard ways to limit the capacity of a neural net:
– Limit the number of hidden units.
– Limit the size of the weights.
– Stop the learning before it has time to overfit.

Limiting the size of the weights

Weight-decay involves adding an extra term to the cost function that penalizes the squared weights.
– Keeps weights small unless they have big error derivatives.

C = E + (λ/2) Σᵢ wᵢ²

∂C/∂wᵢ = ∂E/∂wᵢ + λ wᵢ

When ∂C/∂wᵢ = 0:  wᵢ = −(1/λ) ∂E/∂wᵢ
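
A minimal sketch (not from the slides) of how the penalty changes the gradient: the λwᵢ term is simply added to the error derivative, so a weight only stays large if its error derivative is big enough to pay for it. The learning rate, λ, and the constant error derivatives below are made-up values for illustration.

```python
import numpy as np

def weight_decay_step(w, dE_dw, lr=0.5, lam=0.1):
    """One gradient step on C = E + (lam / 2) * sum(w_i ** 2).

    The penalty contributes lam * w_i to the gradient, so the update pulls every
    weight toward zero unless dE/dw_i pushes back hard enough.
    """
    dC_dw = dE_dw + lam * w
    return w - lr * dC_dw

# Two weights, both starting at 2.0: one with zero error derivative, one with a big one.
w = np.array([2.0, 2.0])
dE_dw = np.array([0.0, -1.0])          # held constant here just to expose the fixed point
for _ in range(200):
    w = weight_decay_step(w, dE_dw)

# At the stationary point dC/dw = 0, so w_i = -(1/lam) * dE/dw_i:
print(w)   # approximately [0.0, 10.0]
```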

The effect of weight-decay

It prevents the network from using weights that it does not need.
– This can often improve generalization a lot.
– It helps to stop it from fitting the sampling error.
– It makes a smoother model in which the output changes more slowly as the input changes.
But if the network has two very similar inputs, it prefers to put half the weight on each rather than all the weight on one → other forms of weight decay?

[Diagram: weights w/2 and w/2 on two similar inputs versus w on one input and 0 on the other.]

Deciding how much to restrict the capacity

How do we decide which limit to use and how strong to make the limit?
– If we use the test data, we get an unfair prediction of the error rate we would get on new test data.
– Suppose we compared a set of models that gave random results: the best one on a particular dataset would do better than chance, but it won't do better than chance on another test set.
So use a separate validation set to do model selection.

Using a validation set

Divide the total dataset into three subsets:
– Training data is used for learning the parameters of the model.
– Validation data is not used for learning, but is used for deciding what type of model and what amount of regularization works best.
– Test data is used to get a final, unbiased estimate of how well the network works. We expect this estimate to be worse than on the validation data.
We could then re-divide the total dataset to get another unbiased estimate of the true error rate.
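
As a hedged sketch of this workflow (not part of the slides), the code below performs the three-way split and uses the validation subset to pick an amount of weight decay, touching the test subset only once at the end. A closed-form ridge-regression model stands in for a neural network, and the data and candidate λ values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression data (shapes and noise level are assumptions for illustration).
X = rng.normal(size=(1000, 16))
true_w = rng.normal(size=16)
y = X @ true_w + rng.normal(scale=0.5, size=1000)

# Shuffle once, then split into training / validation / test subsets.
idx = rng.permutation(len(X))
train, val, test = idx[:700], idx[700:850], idx[850:]

def fit_ridge(X, y, lam):
    """Linear model with weight decay (ridge regression), standing in for a net."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)

# Model selection: choose the amount of weight decay using the validation set only.
best_lam = min([0.0, 0.1, 1.0, 10.0, 100.0],
               key=lambda lam: mse(fit_ridge(X[train], y[train], lam), X[val], y[val]))

# The test set is touched once, to get the final unbiased estimate.
final_w = fit_ridge(X[train], y[train], best_lam)
print("chosen lambda:", best_lam)
print("test MSE:", mse(final_w, X[test], y[test]))
```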

Preventing overfitting by early stopping

If we have lots of data and a big model, it's very expensive to keep re-training it with different amounts of weight decay.
It is much cheaper to start with very small weights and let them grow until the performance on the validation set starts getting worse.
The capacity of the model is limited because the weights have not had time to grow big.
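
Below is a minimal early-stopping loop on a made-up noisy regression problem (not the networks from the slides): the weights start at zero, grow with gradient descent, and training stops once the validation error has failed to improve for a fixed number of epochs; the weights from the best validation epoch are kept.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up noisy regression problem: only the first input matters,
# the other 19 inputs invite overfitting.
X = rng.normal(size=(60, 20))
y = X[:, 0] + rng.normal(scale=1.0, size=60)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

w = np.zeros(20)                    # start with very small (here zero) weights
lr, patience = 0.01, 20
best_w, best_val, bad_epochs = w.copy(), np.inf, 0

for epoch in range(5000):
    grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad                                    # weights slowly grow from zero
    val_mse = np.mean((X_val @ w - y_val) ** 2)
    if val_mse < best_val:                            # validation still improving
        best_val, best_w, bad_epochs = val_mse, w.copy(), 0
    else:                                             # validation getting worse
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"stopped at epoch {epoch}, best validation MSE {best_val:.3f}")
            break

w = best_w    # keep the weights from the epoch with the best validation error
```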

Why early stopping works

When the weights are very small, every hidden unit is in its linear range.
– So a net with a large layer of hidden units is linear.
– It has no more capacity than a linear net in which the inputs are directly connected to the outputs!
As the weights grow, the hidden units start using their non-linear ranges, so the capacity grows.

[Diagram: a net with a hidden layer between inputs and outputs.]
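
A quick numerical check of this claim (an assumed toy example, not from the slides): with very small input-to-hidden weights, a tanh hidden layer barely leaves its linear range, so the whole net is almost exactly the product of its two weight matrices, i.e. a linear net.

```python
import numpy as np

rng = np.random.default_rng(2)

# A one-hidden-layer tanh net with very small input-to-hidden weights.
W1 = 0.001 * rng.normal(size=(5, 100))    # input -> hidden, tiny weights
W2 = rng.normal(size=(100, 3))            # hidden -> output

def net(x):
    return np.tanh(x @ W1) @ W2

# Because tanh(z) ≈ z near 0, the net is approximately the single linear map W1 @ W2.
x = rng.normal(size=(10, 5))
linear_equiv = x @ (W1 @ W2)
print(np.max(np.abs(net(x) - linear_equiv)))   # tiny: the net behaves like a linear net
```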

LeNet

Yann LeCun and others developed a really good recognizer for handwritten digits by using backpropagation in a feedforward net with:
– Many hidden layers
– Many pools of replicated units in each layer
– Averaging of the outputs of nearby replicated units
– A wide net that can cope with several characters at once, even if they overlap
Demo of LeNet
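
The sketch below (an assumed toy example, not LeCun's actual architecture) shows two of these ingredients in miniature: one shared 5x5 kernel replicated across a 16x16 input, followed by averaging of nearby replicated-unit outputs, which keeps only coarse position information.

```python
import numpy as np

rng = np.random.default_rng(3)

image = rng.normal(size=(16, 16))        # stand-in for a 16x16 gray-level input
kernel = rng.normal(size=(5, 5))         # one shared 5x5 feature detector

def conv2d_valid(img, k):
    """Apply the same kernel at every position (weight sharing / replicated units)."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def avg_pool2(x):
    """Average the outputs of nearby replicated units over 2x2 neighbourhoods."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

feature_map = np.tanh(conv2d_valid(image, kernel))   # 12x12 map of replicated-unit outputs
coarse_map = avg_pool2(feature_map)                  # 6x6 map keeping coarse position info
print(feature_map.shape, coarse_map.shape)
```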

Recognizing Digits

Hand-written digit recognition network:
– 7291 training examples, 2007 test examples
– Both contain ambiguous and misclassified examples
– Input pre-processed (segmented, normalized): 16x16 gray levels in [-1, 1], 10 outputs

LeNet: Summary

Main ideas: local → global processing; retain coarse position information
Main technique: weight sharing – units arranged in feature maps
Connections: 1256 units, 64,660 connections, 9760 free parameters
Results: 0.14% error (train), 5.0% error (test)
vs. a 3-layer net with 40 hidden units: 1.6% (train), 8.1% (test)

The 82 errors made by LeNet5

[Figure: the 82 misclassified test digits.]
Notice that most of the errors are cases that people find quite easy. The human error rate is probably 20 to 30 errors.

A brute force approach

LeNet uses knowledge about the invariances to design:
– the network architecture
– or the weight constraints
– or the types of feature
But it's much simpler to incorporate knowledge of invariances by just creating extra training data (see the sketch below):
– for each training image, produce new training data by applying all of the transformations we want to be insensitive to
– then train a large, dumb net on a fast computer
– this works surprisingly well
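
Here is a minimal sketch of the "create extra training data" idea. It assumes SciPy is available for the image transforms, and the particular transformation set (small shifts and rotations) and their ranges are illustrative choices, not the exact ones used for digit recognition.

```python
import numpy as np
from scipy.ndimage import shift, rotate   # assumed available; any image library would do

rng = np.random.default_rng(4)

def augment(image, n_copies=10, max_shift=2.0, max_angle=15.0):
    """Create extra training images by applying transformations we want to be insensitive to.

    The shift and rotation ranges here are assumptions for illustration.
    """
    copies = []
    for _ in range(n_copies):
        dx, dy = rng.uniform(-max_shift, max_shift, size=2)
        angle = rng.uniform(-max_angle, max_angle)
        img = shift(image, (dy, dx), mode="nearest")
        img = rotate(img, angle, reshape=False, mode="nearest")
        copies.append(img)
    return np.stack(copies)

digit = rng.normal(size=(16, 16))            # stand-in for one 16x16 training digit
extra = augment(digit)                       # 10 transformed versions with the same label
print(extra.shape)                           # (10, 16, 16)
```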


Making backpropagation work for recognizing digits

Use the standard viewing transformations, and local deformation fields, to get lots of data.
Use many, globally connected hidden layers and learn for a very long time.
– This requires a GPU board or a large cluster.
Use the appropriate error measure for multi-class categorization.
– Cross-entropy, with softmax activation.
This approach can get 35 errors on MNIST!
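
A small self-contained sketch of the error measure mentioned above: softmax over the class logits followed by cross-entropy against integer labels. The example logits and labels are made up.

```python
import numpy as np

def softmax(logits):
    """Softmax over the last axis, shifted for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Example with 10 classes (digits 0-9); the values are made up for illustration.
rng = np.random.default_rng(5)
logits = rng.normal(size=(4, 10))
labels = np.array([3, 1, 4, 1])
print(cross_entropy(logits, labels))

# Convenient fact: the gradient of this loss w.r.t. the logits is (probs - one_hot) / N.
```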

Fabricating training data

Good generalization requires lots of training data, including examples from all relevant input regions.
The solution improves if good data can be constructed.
Example: ALVINN

ALVINN: simulating training examples

On-the-fly training: the current video camera image is the input, the current steering direction is the target.
But: this over-trains on the same inputs, and gives no experience going off-road.
Method: generate new examples by shifting images (see the sketch below).
Replace 10 low-error and 5 random training examples with 15 new ones.
Key: the relation between input and output is known!
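
The sketch below imitates the idea rather than ALVINN's exact procedure: each real camera frame is shifted sideways to simulate being off-centre, and because the input-output relation is known, the steering target is adjusted accordingly. The shift range and the linear steering correction with `gain` are assumptions for illustration, not ALVINN's actual geometry-based correction; SciPy is assumed available for the image shift.

```python
import numpy as np
from scipy.ndimage import shift   # assumed available for the horizontal image shift

rng = np.random.default_rng(6)

def simulate_examples(camera_image, steering, n_new=14, max_shift=8.0, gain=0.02):
    """Fabricate off-road-looking examples from one real (image, steering) pair.

    Shifting the road image sideways simulates the vehicle being off-centre;
    the steering target is then corrected in proportion to the shift (the
    proportionality constant `gain` is an illustrative assumption).
    """
    images, targets = [], []
    for _ in range(n_new):
        dx = rng.uniform(-max_shift, max_shift)
        images.append(shift(camera_image, (0.0, dx), mode="nearest"))
        targets.append(steering + gain * dx)     # steer back toward the road centre
    return np.stack(images), np.array(targets)

frame = rng.normal(size=(30, 32))                # stand-in for one low-resolution camera frame
new_x, new_y = simulate_examples(frame, steering=0.0)
print(new_x.shape, new_y.shape)                  # (14, 30, 32) (14,)
```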

Neural Net Demos

Digit recognition
Scene recognition - Places MIT
Neural Nets Playground
Neural Style Transfer
