Neural Networks: Basics


Neural Networks: Basics
Emil M. Petriu, Dr. Eng., P. Eng., FIEEE
Professor, School of Information Technology and Engineering (SITE)
Sensing and Modelling Research Laboratory (SMRLab)
University of Ottawa, Ottawa, ON, Canada
http://www.site.uottawa.ca/~petriu/   petriu@site.uottawa.ca

Biological Neurons

[Figure: a biological neuron - dendrites, cell body, axon, and synapse.]

Dendrites carry electrical signals into the neuron body. The neuron body integrates and thresholds the incoming signals. The axon is a single long nerve fiber that carries the signal from the neuron body to other neurons. A synapse is the connection between dendrites of two neurons.

Incoming signals to a dendrite may be inhibitory or excitatory. The strength of any input signal is determined by the strength of its synaptic connection. A neuron sends an impulse down its axon if excitation exceeds inhibition by a critical amount (threshold/offset/bias) within a time window (the period of latent summation).

Memories are formed by the modification of the synaptic strengths, which can change during the entire life of the neural system.

Biological neurons are rather slow (about 10^-3 s) compared with modern electronic circuits, yet the brain is faster than an electronic computer because of its massively parallel structure. The brain has approximately 10^11 highly connected neurons (approx. 10^4 connections per neuron).

Historical Sketch of Neural Networks

1940s
Natural components of mind-like machines are simple abstractions based on the behavior of biological nerve cells, and such machines can be built by interconnecting such elements.

W. McCulloch & W. Pitts (1943): the first theory on the fundamentals of neural computing (neuro-logical networks), "A Logical Calculus of the Ideas Immanent in Nervous Activity" - the McCulloch-Pitts neuron model; (1947) "How We Know Universals" - an essay on networks capable of recognizing spatial patterns invariant of geometric transformations.

Cybernetics: an attempt to combine concepts from biology, psychology, mathematics, and engineering.

D.O. Hebb (1949) "The Organization of Behavior": the first theory of psychology built on conjectures about neural networks (neural networks might learn by constructing internal representations of concepts in the form of "cell-assemblies" - subfamilies of neurons that would learn to support one another's activities). Hebb's learning rule: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

1950s
Cybernetic machines were developed as specific architectures to perform specific functions - "machines that could learn to do things they aren't built to do."

M. Minsky (1951) built a reinforcement-based network learning system.

IRE Symposium "The Design of Machines to Simulate the Behavior of the Human Brain" (1955), with four panel members (W.S. McCulloch, A.G. Oettinger, O.H. Schmitt, N. Rochester), invited questioners (M. Minsky, M. Rubinoff, E.L. Gruenberg, J. Mauchly, M.E. Moran, W. Pitts), and the moderator H.E. Tompkins.

F. Rosenblatt (1958): the first practical Artificial Neural Network (ANN), the perceptron - "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain."

By the end of the 1950s the NN field became dormant because of the new AI advances based on serial processing of symbolic expressions.

1960s
Connectionism (Neural Networks) versus Symbolism (Formal Reasoning)

B. Widrow & M.E. Hoff (1960) "Adaptive Switching Circuits" presents an adaptive perceptron-like network. The weights are adjusted so as to minimize the mean square error between the actual and desired output: the Least Mean Square (LMS) error algorithm. (1961) Widrow and his students, "Generalization and Information Storage in Networks of Adaline Neurons."

M. Minsky & S. Papert (1969) "Perceptrons": a formal analysis of perceptron networks explaining their limitations and indicating directions for overcoming them, i.e. the relationship between the perceptron's architecture and what it can learn - "no machine can learn to recognize X unless it possesses some scheme for representing X."

The limitations of the perceptron networks led to a pessimistic view of the NN field as having no future - no more interest and funds for NN research!

1970s
Memory aspects of Neural Networks.

T. Kohonen (1972) "Correlation Matrix Memories": a mathematically oriented paper proposing a correlation matrix model for associative memory which is trained, using Hebb's rule, to learn associations between input and output vectors.

J.A. Anderson (1972) "A Simple Neural Network Generating an Interactive Memory": a physiologically oriented paper proposing a "linear associator" model for associative memory which also uses Hebb's rule to learn associations between input and output vectors.

S. Grossberg (1976) "Adaptive Pattern Classification and Universal Recoding: I. Parallel Development and Coding of Neural Feature Detectors": describes a self-organizing NN model of the visual system consisting of short-term and long-term memory mechanisms - a continuous-time competitive network that forms a basis for the Adaptive Resonance Theory (ART) networks.

1980s
Revival of learning machines.

[Minsky]: "The marvelous powers of the brain emerge not from any single, uniformly structured connectionist network but from highly evolved arrangements of smaller, specialized networks which are interconnected in very specific ways."

D.E. Rumelhart & J.L. McClelland, eds. (1986) "Parallel Distributed Processing: Explorations in the Microstructure of Cognition" represents a milestone in the resurgence of NN research.

J.A. Anderson & E. Rosenfeld (1988) "Neurocomputing: Foundations of Research" contains over forty seminal papers in the NN field.

DARPA Neural Network Study (1988): a comprehensive review of the theory and applications of Neural Networks.

International Neural Network Society (1988). IEEE Transactions on Neural Networks (1990).

Artificial Neural Networks (ANN)

McCulloch-Pitts model of an artificial neuron: the neuron forms the weighted sum of its inputs p1, ..., pR plus a bias b and passes it through a transfer function f,

z = w1*p1 + ... + wj*pj + ... + wR*pR + b
y = f(z) = f(W*p + b)

where p = (p1, ..., pR)^T is the input column-vector and W = (w1, ..., wR) is the weight row-vector.
*) The bias b can be treated as a weight whose input is always 1.

Some transfer functions f:
- Hard Limit: y = 0 if z < 0, y = 1 if z >= 0
- Symmetrical Hard Limit: y = -1 if z < 0, y = 1 if z >= 0
- Log-Sigmoid: y = 1/(1 + e^(-z))
- Linear: y = z
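As an illustration, the following is a minimal plain-MATLAB sketch (no toolbox calls) of a single McCulloch-Pitts-style neuron with the four transfer functions listed above; the weight, bias, and input values are arbitrary numbers chosen for the example.

    W = [1.5 -0.8 0.2];                    % weight row-vector (R = 3 inputs)
    b = -0.5;                              % bias
    p = [0.3; 1.0; -0.7];                  % input column-vector

    z = W*p + b;                           % net input: weighted sum plus bias

    hardlim  = @(z) double(z >= 0);        % hard limit
    hardlims = @(z) 2*double(z >= 0) - 1;  % symmetrical hard limit
    logsig   = @(z) 1./(1 + exp(-z));      % log-sigmoid
    purelin  = @(z) z;                     % linear

    fprintf('z = %g  hardlim = %g  hardlims = %g  logsig = %g  purelin = %g\n', ...
            z, hardlim(z), hardlims(z), logsig(z), purelin(z));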

The Architecture of an ANN

ANNs map input/stimulus values to output/response values: Y = F(P). The architecture of an ANN is defined by:
- the number of inputs and outputs of the network;
- the number of layers;
- how the layers are connected to each other;
- the transfer function of each layer;
- the number of neurons in each layer.

Intelligent systems generalize: their behavioral repertoires exceed their experience. An intelligent system is said to have a creative behaviour if it provides appropriate responses when faced with new stimuli. Usually the new stimuli P' resemble known stimuli P, and their corresponding responses Y' = F(P') resemble the known/learned responses Y = F(P). A measure of the system F's creativity compares the volume of the "stimuli ball" BP (the neighbourhood of known stimuli that is handled appropriately) with the volume of the corresponding "response ball" BY.

Most mapping functions can be implemented by a two-layer ANN: a sigmoid layer feeding a linear output layer.

ANNs with biases can represent relationships between inputs and outputs more easily than networks without biases.

Feed-forward ANNs cannot implement temporal relationships. Recurrent ANNs have internal feedback paths that allow them to exhibit temporal behaviour.

[Figures: a feed-forward architecture with three layers of neurons N(1,1)...N(1,R1), N(2,1)...N(2,R2), N(3,1)...N(3,R3), and a recurrent (Hopfield) architecture with neurons N(1)...N(R) whose outputs y(1)...y(R) are fed back as inputs.]

In the recurrent (Hopfield) architecture, the ANN is usually supplied with an initial input vector and then the outputs are used as inputs for each succeeding cycle.
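A minimal plain-MATLAB sketch of the two-layer architecture mentioned above (a log-sigmoid layer feeding a linear output layer); the layer sizes and the random weights are illustrative assumptions, not values from the slides.

    logsig = @(z) 1./(1 + exp(-z));

    R  = 2;  S1 = 3;  S2 = 1;              % inputs, hidden neurons, output neurons
    W1 = randn(S1,R);   b1 = randn(S1,1);  % sigmoid layer (random example weights)
    W2 = randn(S2,S1);  b2 = randn(S2,1);  % linear output layer

    p  = [0.5; -1.0];                      % an arbitrary input column-vector
    y1 = logsig(W1*p + b1);                % layer-1 response
    y  = W2*y1 + b2;                       % network output (linear layer)
    disp(y)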

Learning Rules (Training Algorithms)

A learning rule is a procedure/algorithm that adjusts the weights and biases in order for the ANN to perform the desired task.

Supervised Learning
For a given training set of pairs {p(1),t(1)}, ..., {p(n),t(n)}, where p(i) is an instance of the input vector and t(i) is the corresponding target value for the output y, the learning rule calculates the updated values of the neuron weights and bias from the error e = t - y.

Reinforcement Learning
Similar to supervised learning, except that instead of being provided with the correct output value for each given input, the algorithm is only provided with a grade/score as a measure of the ANN's performance.

Unsupervised Learning
The weights and biases are adjusted based on the inputs only. Most algorithms of this type learn to cluster the input patterns into a finite number of classes, e.g. for vector quantization applications.

THE PERCEPTRON
Frank Rosenblatt (1958), Marvin Minsky & Seymour Papert (1969)

[Minsky]: "Perceptrons make decisions / determine whether or not an event fits a certain pattern by adding up evidence obtained from many small experiments."

The perceptron is a neuron with a hard limit transfer function and a weight adjustment mechanism ("learning") that compares the actual and the expected output responses for any given input/stimulus:

y = f(W*p + b), with f the hard limit transfer function.

NB: W is a row-vector and p is a column-vector.

Perceptrons are well suited for pattern classification/recognition. The weight adjustment/training mechanism is called the perceptron learning rule.

Perceptron Learning Rule

Supervised learning: t is the target value and e = t - y is the error. Because of the perceptron's hard limit transfer function, y and t can take only binary values, so e can only be -1, 0, or 1. As before, p = (p1, ..., pR)^T is the input column-vector and W = (w1, ..., wR) is the weight row-vector.

Perceptron learning rule:
- if e = 1, then Wnew = Wold + p^T, bnew = bold + 1;
- if e = -1, then Wnew = Wold - p^T, bnew = bold - 1;
- if e = 0, then Wnew = Wold.

Compactly: Wnew = Wold + e*p^T, bnew = bold + e.
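A from-scratch plain-MATLAB sketch of this rule follows; the toy two-class data, the zero initial weights, and the epoch limit are arbitrary choices made for illustration (Example #1 below shows the same kind of training done with the MATLAB NN Toolbox functions).

    P = [ 2  1 -1 -2;                  % each column is one input vector p(i)
          2  3 -1  0];
    T = [ 1  1  0  0];                 % target values t(i)

    W = [0 0];  b = 0;                 % initial weights and bias (arbitrary)
    hardlim = @(z) double(z >= 0);

    for epoch = 1:100                  % safety limit on the number of epochs
        for i = 1:size(P,2)
            e = T(i) - hardlim(W*P(:,i) + b);   % e = t - y for this vector
            W = W + e*P(:,i)';                  % Wnew = Wold + e*p^T
            b = b + e;                          % bnew = bold + e
        end
        if all(hardlim(W*P + b) == T), break; end   % stop once all vectors are classified
    end
    fprintf('after %d epoch(s): W = [%g %g], b = %g\n', epoch, W(1), W(2), b);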

The hard limit transfer function (threshold function) provides the ability to classify input vectors by deciding whether an input vector belongs to one of two linearly separable classes.

Two-Input Perceptron:
y = hardlim(z) = hardlim{ [w1, w2] * [p1, p2]^T + b }

The two classes (linearly separable regions) in the two-dimensional input space (p1, p2) are separated by the decision boundary, the line of equation z = 0:
w1*p1 + w2*p2 + b = 0

The boundary crosses the p1 axis at -b/w1 and the p2 axis at -b/w2, and it is always orthogonal to the weight vector W. On the side of the boundary containing the origin the output follows the sign of b (since z = b at the origin); on the other side it follows the sign of -b.

Example #1: Teaching a two-input perceptron to classify five input vectors into two classes

p(1) = (0.6, 0.2)^T,   t(1) = 1
p(2) = (-0.2, 0.9)^T,  t(2) = 1
p(3) = (-0.3, 0.4)^T,  t(3) = 0
p(4) = (0.1, 0.1)^T,   t(4) = 0
p(5) = (0.5, -0.6)^T,  t(5) = 0

[Figure: the five input vectors plotted in the (p1, p2) plane.]

The MATLAB solution is:

    P = [0.6 -0.2 -0.3 0.1 0.5; 0.2 0.9 0.4 0.1 -0.6];
    T = [1 1 0 0 0];
    W = [-2 2]; b = -1;
    plotpv(P,T); plotpc(W,b);
    nepoc = 0;
    Y = hardlim(W*P + b);
    while any(Y ~= T)
        Y = hardlim(W*P + b);
        E = T - Y;
        [dW,db] = learnp(P,E);
        W = W + dW;
        b = b + db;
        nepoc = nepoc + 1;
        disp('epochs ='), disp(nepoc), disp(W), disp(b);
        plotpv(P,T); plotpc(W,b);
    end

Example #1: Input Vector Classification

After nepoc = 11 epochs of training, starting from an initial weight vector W = [-2 2] and a bias b = -1, the weights are w1 = 2.4 and w2 = 3.1, and the bias is b = -2.

[Figure: the five input vectors and the final decision boundary in the (p1, p2) plane.]
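A quick plain-MATLAB check (no toolbox functions) that the weights reached above classify all five training vectors of Example #1 correctly:

    P = [ 0.6 -0.2 -0.3 0.1  0.5;
          0.2  0.9  0.4 0.1 -0.6];
    T = [ 1 1 0 0 0 ];
    W = [2.4 3.1];  b = -2;          % values reached after 11 epochs
    Y = double(W*P + b >= 0);        % hardlim(W*p + b) for every column of P
    disp([T; Y])                     % the two rows should be identical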

The larger an input vector p is, the larger is its effect on the weight vector W during the learning process. Long training times can be caused by the presence of an "outlier," i.e. an input vector whose magnitude is much larger, or smaller, than the other input vectors.

With the normalized perceptron learning rule the effect of each input vector on the weights is of the same magnitude:
Wnew = Wold + e*p^T/||p||
bnew = bold + e

Perceptron Networks for Linearly Separable Vectors
The hard limit transfer function of the perceptron provides the ability to classify input vectors by deciding whether an input vector belongs to one of two linearly separable classes.

AND:  p = [0 0 1 1; 0 1 0 1], tAND = [0 0 0 1];  one solution is W = [2 2], b = -3.
OR:   p = [0 0 1 1; 0 1 0 1], tOR  = [0 1 1 1];  one solution is W = [2 2], b = -1.
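A small plain-MATLAB check of the AND and OR perceptrons given above:

    p    = [0 0 1 1; 0 1 0 1];
    tAND = [0 0 0 1];   tOR = [0 1 1 1];
    yAND = double([2 2]*p - 3 >= 0);                 % W = [2 2], b = -3
    yOR  = double([2 2]*p - 1 >= 0);                 % W = [2 2], b = -1
    disp(isequal(yAND, tAND) && isequal(yOR, tOR))   % prints 1 when both match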

Three-Input Perceptron

y = hardlim(z) = hardlim{ [w1, w2, w3] * [p1, p2, p3]^T + b }

The two classes in the 3-dimensional input space (p1, p2, p3) are separated by the plane of equation z = 0.

EXAMPLE
P = [ -1  1  1 -1 -1  1  1 -1;
      -1 -1  1  1 -1 -1  1  1;
      -1 -1 -1 -1  1  1  1  1 ]
T = [ 0 1 0 0 1 1 1 0 ]

[Figure: the eight input vectors at the corners of a cube in the (p1, p2, p3) space, separated into the two classes by a plane.]

One-layer multi-perceptron classification of linearly separable patterns
(Demo P3 in the "MATLAB Neural Network Toolbox - User's Guide")

MATLAB representation of a perceptron layer: y = hardlim(W*p + b), where p is the R x 1 input vector, W is the S x R weight matrix and b is the S x 1 bias vector (R = # inputs, S = # neurons). Here R = 2 inputs and S = 2 neurons, so the two outputs label each input vector with one of the four binary codes 00, 01, 10, 11.

P = [ 0.1 0.7 0.8 0.8 1.0 0.3 0.0 -0.3 -0.5 -1.5;
      1.2 1.8 1.6 0.6 0.8 0.5 0.2  0.8 -1.5 -1.3 ]
T = [ 1 1 1 0 0 1 1 1 0 0;
      0 0 0 0 0 1 1 1 1 1 ]

[Figures: the ten input vectors in the (p1, p2) plane, plotted with a different marker for each of the four target codes, together with the two decision boundaries; and the classification error versus the number of training epochs.]
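A from-scratch plain-MATLAB sketch of training this two-neuron perceptron layer with the perceptron rule (the earlier single-neuron loop generalized to a weight matrix); the zero initial weights and the epoch limit are my own choices.

    P = [ 0.1 0.7 0.8 0.8 1.0 0.3 0.0 -0.3 -0.5 -1.5;
          1.2 1.8 1.6 0.6 0.8 0.5 0.2  0.8 -1.5 -1.3 ];
    T = [ 1 1 1 0 0 1 1 1 0 0;
          0 0 0 0 0 1 1 1 1 1 ];
    W = zeros(2,2);  b = zeros(2,1);           % S = 2 neurons, R = 2 inputs
    hardlim = @(z) double(z >= 0);
    Q = size(P,2);                             % number of training vectors

    for epoch = 1:200
        for i = 1:Q
            e = T(:,i) - hardlim(W*P(:,i) + b);   % error of both neurons for vector i
            W = W + e*P(:,i)';                    % Wnew = Wold + e*p^T
            b = b + e;
        end
        if isequal(hardlim(W*P + b*ones(1,Q)), T), break; end
    end
    disp(W), disp(b)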

Perceptron Networks for Linearly Non-Separable Vectors

XOR: p = [0 0 1 1; 0 1 0 1], tXOR = [0 1 1 0].

If a straight line cannot be drawn between the set of input vectors associated with targets of value 0 and the input vectors associated with targets of value 1, then a perceptron cannot classify these input vectors.

One solution is to use a two-layer architecture: the perceptrons in the first layer are used as preprocessors, producing linearly separable vectors for the second layer. (Alternatively, it is possible to use a linear ANN or back-propagation.)

Weight notation: the row index of a weight indicates the destination neuron of the weight and the column index indicates which source is the input for that weight.

The weights shown on the slide for the two-layer XOR network (both layers built from hard limit neurons) are:
Layer 1:  W1 = [1 1; 1 1],  b1 = [-1.5; -0.5]
Layer 2:  W2 = [-1 1],      b2 = [-0.5]
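A quick plain-MATLAB check that the two-layer network above does implement XOR (layer 1 turns the four inputs into linearly separable codes, layer 2 combines them):

    p  = [0 0 1 1; 0 1 0 1];                 % the four input vectors (columns)
    W1 = [1 1; 1 1];   b1 = [-1.5; -0.5];    % first (preprocessing) layer
    W2 = [-1 1];       b2 = -0.5;            % second (output) layer
    hardlim = @(z) double(z >= 0);
    y1 = hardlim(W1*p + b1*ones(1,4));       % layer-1 outputs for all four vectors
    y  = hardlim(W2*y1 + b2);                % network output
    disp(y)                                  % expected: 0 1 1 0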

LINEAR NEURAL NETWORKS (ADALINE NETWORKS)
(ADALINE = ADAptive LInear NEuron)

MATLAB representation of a linear neuron layer: y = purelin(W*p + b), where p is the R x 1 input vector, W is the S x R weight matrix and b is the S x 1 bias vector (R = # inputs, S = # neurons).

- Linear neurons have a linear transfer function that allows the use of a Least Mean-Square (LMS) procedure - the Widrow-Hoff learning rule - to adjust the weights and biases according to the magnitude of the errors.
- Linear neurons suffer from the same limitation as the perceptron networks: they can only solve linearly separable problems.

Widrow-Hoff Learning Rule (the delta rule)
The LMS algorithm adjusts the ADALINE's weights and biases in such a way as to minimize the mean-square error E[e^2] between the desired response and the network's actual response over all training pairs:

E[e^2] = E[(t - y)^2] = E[(t - (w1 ... wR b) * (p1 ... pR 1)^T)^2] = E[(t - W*p)^2]

(NB: E[ ] denotes the expected value; p is a column vector; in the last expression the bias b is folded into W as an extra weight whose input is always 1.)

Widrow-Hoff algorithm

E[e^2] = E[(t - W*p)^2] = E[t^2] - 2*W*E[t*p] + W*E[p*p^T]*W^T
(for deterministic signals the expectation becomes a time-average)

E[t*p] is the cross-correlation between the input vector and its associated target, and E[p*p^T] is the input correlation matrix. If the input correlation matrix is positive definite, the LMS algorithm will converge, as there will be a unique minimum of the mean square error.

The W-H rule is an iterative algorithm that uses the steepest-descent method to reduce the mean-square error. The key point of the W-H algorithm is that it replaces the estimation of E[e^2] by the squared error of iteration k, e^2(k). At each iteration step k it estimates the gradient of this error with respect to W as the vector of partial derivatives of e^2(k) with respect to each weight (and the bias):

grad(k) = ∂e^2(k)/∂W(k) = [ ∂e^2(k)/∂w1(k), ..., ∂e^2(k)/∂wR(k), ∂e^2(k)/∂b(k) ]

The weight vector is then modified in the direction that decreases the error:

W(k+1) = W(k) - µ*grad(k) = W(k) - µ*∂e^2(k)/∂W(k) = W(k) - 2*µ*e(k)*∂e(k)/∂W(k)

As t(k) and p(k) - both affecting e(k) - are independent of W(k), ∂e(k)/∂W(k) = -p(k)^T, and we obtain the final expression of the Widrow-Hoff learning rule:

W(k+1) = W(k) + 2*µ*e(k)*p(k)^T
b(k+1) = b(k) + 2*µ*e(k)

where µ is the "learning rate" and e(k) = t(k) - y(k) = t(k) - W(k)*p(k).

Widrow-Hoff algorithm - Demo Lin 2 in the "MATLAB Neural Network Toolbox - User's Guide"

P = [ 1.0 -1.2 ]
T = [ 0.5  1.0 ]

A one-neuron, one-input ADALINE, starting from the random values w = -0.96 and b = -0.90 and using the "trainwh" MATLAB NN Toolbox function, reaches the target after 12 epochs with an error e = 0.001. The solution found for the weight and bias is w = -0.2354 and b = 0.7066.

[Figures: the error surface over the (weight w, bias b) plane and the trajectory followed by the Widrow-Hoff iterations toward the minimum.]
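For comparison, here is a from-scratch plain-MATLAB sketch of the same fit using the Widrow-Hoff update derived above; the learning rate µ and the epoch count are my own choices, so the result will differ slightly from the trainwh numbers quoted above.

    P  = [ 1.0 -1.2 ];           % input values p(k)
    T  = [ 0.5  1.0 ];           % target values t(k)
    w  = -0.96;  b = -0.90;      % same initial weight and bias as in Demo Lin 2
    mu = 0.1;                    % learning rate

    for epoch = 1:100
        for k = 1:numel(P)
            e = T(k) - (w*P(k) + b);   % e(k) = t(k) - y(k)
            w = w + 2*mu*e*P(k);       % W(k+1) = W(k) + 2*mu*e(k)*p(k)
            b = b + 2*mu*e;            % b(k+1) = b(k) + 2*mu*e(k)
        end
    end
    fprintf('w = %.4f, b = %.4f\n', w, b);   % approaches the exact fit w = -0.2273, b = 0.7273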

Back-Propagation Learning - The Generalized Delta Rule
P. Werbos (Ph.D. thesis, 1974); D. Parker (1985), Yann Le Cun (1985), D. Rumelhart, G. Hinton, R. Williams (1986)

- Single-layer ANNs are suitable for solving only linearly separable classification problems. Multiple feed-forward layers give an ANN greater freedom: any reasonable function can be modeled by a two-layer architecture, a sigmoid layer feeding a linear output layer.
- Widrow-Hoff learning applies only to single-layer networks; generalizing the W-H algorithm (the delta rule) to multiple layers gives back-propagation learning.
- Back-propagation ANNs often ...
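Since the slide stops short of the algorithm itself, the following is a minimal from-scratch plain-MATLAB sketch of back-propagation for the two-layer architecture mentioned above (log-sigmoid hidden layer, linear output layer); the example target function, layer size, learning rate, and epoch count are my own illustrative assumptions, not values from the slides.

    P  = linspace(-2, 2, 21);            % 1 x Q training inputs
    T  = sin(P);                         % 1 x Q targets (an arbitrary example function)
    S1 = 5;                              % number of hidden (sigmoid) neurons
    W1 = 0.5*randn(S1,1);  b1 = 0.5*randn(S1,1);   % hidden layer
    W2 = 0.5*randn(1,S1);  b2 = 0;                 % linear output layer
    lr = 0.05;                           % learning rate
    logsig = @(z) 1./(1 + exp(-z));

    for epoch = 1:2000
        for k = 1:numel(P)
            a1 = logsig(W1*P(k) + b1);           % hidden-layer output
            y  = W2*a1 + b2;                     % linear output
            e  = T(k) - y;                       % output error
            d2 = e;                              % delta of the linear output layer
            d1 = (W2'*d2) .* a1 .* (1 - a1);     % delta back-propagated through the sigmoid
            W2 = W2 + lr*d2*a1';   b2 = b2 + lr*d2;
            W1 = W1 + lr*d1*P(k);  b1 = b1 + lr*d1;
        end
    end
    Y = W2*logsig(W1*P + b1*ones(1,numel(P))) + b2;    % network response after training
    fprintf('mean squared error after training: %.4f\n', mean((T - Y).^2));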
