Introduction To Neural Networks - Computer Science


Introduction to Neural Networks
CompSci 570, Ronald Parr, Duke University, Department of Computer Science
With thanks to Kris Hauser for some content

Many Applications of Neural Networks
- Used in unsupervised, supervised, and reinforcement learning
- Focus on use for supervised learning here
- Not a different type of learning – just a different type of function

How Most Supervised Learning Algorithms Work
- Main idea: Minimize error on the training set
- How this is done depends on:
  – Hypothesis space
  – Type of data
- Some approaches use a regularizer or prior to trade off training set error vs. hypothesis space complexity (a sketch of such an objective follows below)

Suppose we're in 1 dimension
[Figure: labeled points on the x-axis – easy to find a linear separator]
Copyright 2001, 2003, Andrew W. Moore
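To make the trade-off between training error and hypothesis complexity concrete, here is a minimal sketch of a regularized squared-error objective; the ridge-style penalty and all names and data are illustrative assumptions, not something specified in the slides.

```python
import numpy as np

def regularized_training_loss(w, X, y, lam=0.1):
    """Training-set squared error plus a complexity penalty on the weights.

    lam trades off fit to the training data against hypothesis complexity;
    the ridge-style penalty lam * ||w||^2 is just one illustrative choice.
    """
    residuals = X @ w - y
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# Illustrative use with made-up data: a design matrix with a bias column.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.1, 0.9, 2.1])
print(regularized_training_loss(np.array([0.0, 1.0]), X, y))
```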

Harder 1-dimensional dataset
[Figure: labeled points on the x-axis that no single threshold separates]
What can be done about this?
Copyright 2001, 2003, Andrew W. Moore

Harder 1-dimensional dataset
Remember how permitting non-linear basis functions made linear regression so much nicer?
Let's permit them here too.
Copyright 2001, 2003, Andrew W. Moore

Harder 1-dimensional dataset
[Figure: the same points mapped into a new feature space – now linearly separable]
But what if the right feature set isn't obvious?
Copyright 2001, 2003, Andrew W. Moore

Motivation for Non-linear Classifiers
- Linear methods are "weak"
  – Make strong assumptions
  – Can only express relatively simple functions of inputs
- Coming up with good features can be hard
  – Requires human input
  – Knowledge of the domain
- Role of neural networks
  – Neural networks started as linear models of single neurons
  – Combining them ultimately led to non-linear functions that don't necessarily need careful feature engineering
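To make the basis-function idea concrete, here is a small sketch (the data and the feature map x -> (x, x^2) are made up for illustration): a 1-D dataset that no single threshold can separate becomes linearly separable after the non-linear expansion.

```python
import numpy as np

# Hypothetical 1-D dataset: positives cluster near 0, negatives lie farther out,
# so no single threshold on x separates the classes.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.4, 2.2, 3.1])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Non-linear basis expansion: phi(x) = (x, x^2).
phi = np.column_stack([x, x**2])

# In the (x, x^2) feature space the classes ARE linearly separable,
# e.g. by the rule x^2 < 1, i.e. w . phi(x) + b > 0 with w = (0, -1), b = 1.
w, b = np.array([0.0, -1.0]), 1.0
predictions = np.sign(phi @ w + b)
print(np.array_equal(predictions, y))  # True: the expanded features admit a linear separator
```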

Neural Network Motivation
- Human brains are the only known example of actual intelligence
- Individual neurons are slow, boring
- Brains succeed by using massive parallelism
- Idea: Copy what works
- Raises many issues:
  – Is the computational metaphor suited to the computational hardware?
  – How do we know if we are copying the important part?
  – Are we aiming too low?

Why Neural Networks?
Maybe computers should be more brain-like:

                        Computers                       Brains
  Computational Units   10^10 transistors/CPU           10^11 neurons/brain
  Storage Units         10^11 bits RAM, 10^13 bits HD   10^11 neurons, 10^14 synapses
  Cycle Time            10^-9 s                         10^-3 s
  Bandwidth             10^10 bits/s*                   10^14 bits/s
  Compute Power         10^10 ops/s                     10^14 ops/s

Comments on Summit (world's fastest supercomputer as of 10/19)
- 149 petaflops
- 10^18 ops/s (Summit) vs. 10^14 ops/s (brain)
- 2.4M cores (conflicting reports)
- 2.8 PB RAM (10^17 bits)
- 10 megawatts of power ($10M/year in electricity [my estimate])
- $200M cost
- Note: recently surpassed by Fugaku – 3x more cores, 3x more power, 3x performance, 5x cost

More Comments on Summit
- What is wrong with this picture?
  – Weight
  – Size
  – Power consumption
- What is missing?
  – Still can't replicate human abilities (though it vastly exceeds human abilities in many areas)
  – Are we running the wrong programs?
  – Is the architecture well suited to the programs we might need to run?

Artificial Neural Networks
- Develop an abstraction of the function of actual neurons
- Simulate large, massively parallel artificial neural networks on conventional computers
- Some have tried to build the hardware too
- Try to approximate human learning, robustness to noise, robustness to damage, etc.

Early Uses of Neural Networks
- Trained to pronounce English
  – Training set: sliding window over text, sounds
  – 95% accuracy on the training set
  – 78% accuracy on the test set
- Trained to recognize handwritten digits
  – 99% accuracy
- Trained to drive (Pomerleau et al., no-hands across America, 1995)
  https://www.cs.cmu.edu/~tjochem/nhaa/navlab5_details.html

Neural Network Lore
- Neural nets have been adopted with an almost religious fervor within the AI community – several times
  – First coming: Perceptron
  – Second coming: Multilayer networks
  – Third coming (present): Deep networks
- Sound science behind neural networks: gradient descent
- Unsound social phenomenon behind neural networks: HYPE!

Recall the Humble Perceptron
[Figure: inputs $x_j$ with weights $w_j$ feeding a single node/neuron with output $Y$]
h is a simple step function (sgn)
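As a minimal sketch of the perceptron unit above (the weights and inputs below are illustrative, and no learning rule is shown, only the forward computation):

```python
import numpy as np

def perceptron_output(x, w):
    """Perceptron unit: weighted sum of inputs passed through a hard threshold h = sgn."""
    return np.sign(w @ x)

# Illustrative example with hand-picked weights (one weight w_j per input x_j):
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, 1.0])
print(perceptron_output(x, w))   # +1 or -1, depending on which side of the hyperplane x lies
```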

Observations
- Linear separability is fairly weak
- We have other tricks:
  – Functions that are not linearly separable in one space may be linearly separable in another space
  – As shown earlier, we can engineer our inputs to address this, but it's not easy in general
- Would like a more powerful learning architecture that does more of the work for us

Generalizing the Perceptron
[Figure: inputs $x_j$ with weights $w_{j,i}$ feeding node/neuron $i$, which outputs $z_i$]
$$a_i = h\Big(\sum_j w_{j,i}\, x_j\Big)$$
h can be any function, but usually a smoothed step function

Threshold Functions
[Figure: two threshold functions plotted for x in [-10, 10]:
  h(x) = sgn(x) (perceptron), and
  h(x) = tanh(x) or 1/(1 + exp(-x)) (logistic sigmoid)]

Network Architectures
- Cyclic vs. acyclic
  – Cyclic is tricky, but more biologically plausible
    - Hard to analyze in general
    - May not be stable
    - Need to assume latches to avoid race conditions
  – Hopfield nets: a special type of cyclic net useful for associative memory
  – RNNs, LSTMs: increasingly popular cyclic structures
- Single layer (perceptron)
- Multiple layers
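The threshold functions in the figure are easy to evaluate directly; the short sketch below just compares the hard threshold with the two smoothed versions (np.sign and np.tanh are standard NumPy functions; the logistic sigmoid is written out by hand):

```python
import numpy as np

def logistic(x):
    """Logistic sigmoid 1 / (1 + exp(-x)): a smoothed step with outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-10, 10, 5)
print(np.sign(xs))   # hard threshold h(x) = sgn(x), as in the perceptron
print(np.tanh(xs))   # smoothed step with outputs in (-1, 1)
print(logistic(xs))  # smoothed step with outputs in (0, 1)
```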

Feedforward Networks
- We consider acyclic networks
- One or more computational layers
- The entire network can be viewed as computing a complicated non-linear function
- Typical uses in learning:
  – Classification (usually involving complex patterns)
  – General continuous function approximation
- Many other variations possible

Multilayer Networks
- Once people realized how simple perceptrons were, they lost interest in neural networks for a while
- Multilayer networks turn out to be much more expressive (with a smoothed step function)
  – Use a sigmoid, e.g., $h = \tanh(w^T x)$ or the logistic sigmoid
  – With 2 layers, can represent any continuous function
  – With 3 layers, can represent many discontinuous functions
- Tricky part: How to adjust the weights
- Play with it at: http://playground.tensorflow.org
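Here is a minimal sketch of a feedforward network with one hidden layer, using tanh as the smoothed step function; the layer sizes, random initialization, and linear output unit are illustrative assumptions rather than anything prescribed in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hidden-layer weights: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(1, 4))   # output-layer weights: 4 hidden units -> 1 output

def forward(x):
    a1 = W1 @ x          # hidden pre-activations a_j = sum_i w_ij z_i
    z1 = np.tanh(a1)     # hidden outputs z_j = h(a_j), a smoothed step function
    return W2 @ z1       # linear output unit, suitable for function approximation

print(forward(np.array([0.5, -1.0, 2.0])))   # the whole net is one non-linear function of x
```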

Calculus Reminder
- Chain rule for one variable:
  $$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial x}$$
- Chain rule for $f: \mathbb{R}^n \to \mathbb{R}^k$, $g: \mathbb{R}^m \to \mathbb{R}^n$:
  $$J_x(f \circ g) = J_{g(x)}(f)\, J_x(g) \qquad (k \times n)(n \times m)$$
- For $k = 1$, $m = 1$:
  $$J_x(f \circ g) = \sum_{i=1}^{n} \frac{\partial f}{\partial g(x)_i}\,\frac{\partial g(x)_i}{\partial x}$$

Smoothing Things Out
- Idea: Do gradient descent on a smooth error function
- Error function is the sum of squared errors
- Consider a single training example first:
  $$E = 0.5\,\mathrm{error}(X^{(i)}, w)^2$$
- Chain rule:
  $$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial a_j}\,\frac{\partial a_j}{\partial w_{ij}}$$
- Notation:
  $$\delta_j \equiv \frac{\partial E}{\partial a_j}, \qquad a_j = \sum_i w_{ij} z_i, \qquad z_j = h(a_j)$$
- Calculus:
  $$\frac{\partial a_j}{\partial w_{ij}} = z_i \quad\Longrightarrow\quad \frac{\partial E}{\partial w_{ij}} = \delta_j z_i$$
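The decomposition dE/dw_ij = delta_j * z_i can be checked numerically. The sketch below does so for a single tanh unit; the weights, input, and target are made up, and the finite-difference check is only there to confirm the chain-rule result.

```python
import numpy as np

def error_and_gradient(w, x, t):
    a = w @ x                      # a = sum_i w_i x_i (the inputs play the role of z_i)
    z = np.tanh(a)                 # z = h(a)
    E = 0.5 * (z - t) ** 2         # E = 0.5 * error^2 for a single training example
    delta = (z - t) * (1 - z**2)   # delta = dE/da, using tanh'(a) = 1 - tanh(a)^2
    return E, delta * x            # dE/dw_i = delta * x_i

w, x, t = np.array([0.2, -0.4]), np.array([1.0, 0.5]), 1.0
E, grad = error_and_gradient(w, x, t)

# Finite-difference check on the first weight:
eps = 1e-6
E_eps, _ = error_and_gradient(w + np.array([eps, 0.0]), x, t)
print(grad[0], (E_eps - E) / eps)   # the two numbers should agree closely
```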

Propagating Errors
- Recall:
  $$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial a_j}\,\frac{\partial a_j}{\partial w_{ij}} = \delta_j z_i, \qquad \delta_j \equiv \frac{\partial E}{\partial a_j}, \qquad \frac{\partial a_j}{\partial w_{ij}} = z_i$$
- For output units (assuming no weights on outputs):
  $$\delta_j = \frac{\partial E}{\partial a_j} = y - t$$
- For hidden (internal) units, apply the chain rule over all nodes $k$ fed by $i$:
  $$\delta_i = \frac{\partial E}{\partial a_i} = \sum_k \frac{\partial E}{\partial a_k}\,\frac{\partial a_k}{\partial a_i} = \sum_k w_{ik}\, h'(a_i)\, \delta_k = h'(a_i) \sum_k w_{ik}\, \delta_k$$
  (the sum collects the error gradients of the nodes that receive input from $i$)

Differentiating h
- Recall the logistic sigmoid:
  $$h(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}}, \qquad 1 - h(x) = \frac{1}{1 + e^x} = \frac{e^{-x}}{1 + e^{-x}}$$
- Differentiating:
  $$h'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = h(x)(1 - h(x))$$
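A quick numerical check of the identity h'(x) = h(x)(1 - h(x)) for the logistic sigmoid (the test point is arbitrary):

```python
import numpy as np

def h(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

x, eps = 0.7, 1e-6
finite_diff = (h(x + eps) - h(x - eps)) / (2 * eps)   # numerical derivative of h at x
analytic = h(x) * (1 - h(x))                          # closed form from the slide
print(finite_diff, analytic)                          # the two values agree to ~6 decimal places
```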

Putting It Together
- Apply input x to the network (sum over examples for multiple inputs)
  – Compute all activation levels
  – Compute the final output (forward pass)
- Compute $\delta$ for the output units:
  $$\delta = y - t$$
- Backpropagate the $\delta$s to the hidden units:
  $$\delta_j = h'(a_j) \sum_k w_{jk}\, \delta_k = \sum_k \frac{\partial E}{\partial a_k}\,\frac{\partial a_k}{\partial a_j}$$
- Compute the gradient update:
  $$\frac{\partial E}{\partial w_{ij}} = \delta_j z_i$$

Summary of Gradient Update
- Gradient calculation and parameter updates have a recursive formulation
- Decomposes into:
  – Local message passing
  – No transcendentals: $h'(x) = 1 - h(x)^2$ for $\tanh(x)$; $h'(x) = h(x)(1 - h(x))$ for the logistic sigmoid
- Highly parallelizable
- Biologically plausible(?)
- This is the celebrated backpropagation algorithm
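Putting the forward pass, the delta computation, and the gradient update together, here is a minimal one-step backprop sketch for a tiny two-layer network (tanh hidden layer, linear output unit, squared error); all sizes, values, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(4, 3))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(1, 4))   # hidden -> output weights
x, t, lr = np.array([0.5, -1.0, 2.0]), np.array([1.0]), 0.1

# Forward pass: compute all activation levels and the final output.
a1 = W1 @ x
z1 = np.tanh(a1)
y = W2 @ z1                                # linear output unit

# Delta for the output unit: delta = y - t.
delta2 = y - t

# Backpropagate the deltas to the hidden units:
# delta_j = h'(a_j) * sum_k w_jk * delta_k, with h'(a) = 1 - tanh(a)^2.
delta1 = (1 - z1**2) * (W2.T @ delta2)

# Gradient update: dE/dw_ij = delta_j * z_i, followed by a gradient-descent step.
W2 -= lr * np.outer(delta2, z1)
W1 -= lr * np.outer(delta1, x)
print(0.5 * ((y - t) ** 2).item())         # squared error before the update
```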

Backprop walkthrough (figure sequence):
1. Apply the inputs to the network.
2. Propagate forward, computing activation levels and outputs to the next layer.
3. Compute the output of the final layer.
4. Compute the error ($\delta$) for the final layer.
5. Compute the error $\delta$s and gradient updates for the earlier layers, using $\delta_j = h'(a_j)\sum_k w_{jk}\,\delta_k$ and $\partial E/\partial w_{ij} = \delta_j z_i$.

Complete training for one datum – now repeat for the entire training set (a training-loop sketch follows after the next slide).

Good News
- Can represent any continuous function with two layers (1 hidden)
- Can represent essentially any function with 3 layers
- (But how many hidden nodes?)
- Multilayer nets are a universal approximation architecture with a highly parallelizable training algorithm
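Here is the promised training-loop sketch: the single-example update is simply repeated over the whole training set for several passes. The data, network sizes, and learning rate are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                     # 20 made-up training inputs
T = np.sin(X.sum(axis=1, keepdims=True))         # made-up regression targets
W1 = rng.normal(scale=0.5, size=(4, 3))          # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(1, 4))          # hidden -> output weights
lr = 0.05

for epoch in range(100):
    total_error = 0.0
    for x, t in zip(X, T):                       # repeat the single-datum update for every example
        z1 = np.tanh(W1 @ x)                     # forward pass
        y = W2 @ z1
        delta2 = y - t                           # output delta
        delta1 = (1 - z1**2) * (W2.T @ delta2)   # backpropagated hidden deltas
        W2 -= lr * np.outer(delta2, z1)          # gradient-descent updates
        W1 -= lr * np.outer(delta1, x)
        total_error += 0.5 * ((y - t) ** 2).item()
    if epoch % 20 == 0:
        print(epoch, total_error)                # training error should trend downward
```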

Backprop Issues
- Backprop = gradient descent on an error function
- The function is nonlinear (⇒ powerful)
- The function is nonlinear (⇒ local minima)
- Big nets:
  – Many parameters
    - Many optima
    - Slow gradient descent
    - Risk of overfitting
  – Biological plausibility ≠ electronic plausibility
- Many NN experts became experts in numerical analysis (by necessity)

NN History Through the Second Coming
- The second wave of interest in neural networks lost research momentum in the 1990s – though it still continued to enjoy many practical applications
- Neural network tricks were not sufficient to overcome competing methods:
  – Support vector machines
  – Clever feature selection methods wrapped around simple or linear methods
- 2000–2010 was an era of linear "special sauce"
- What changed?

Deep Networks
- Not a learning algorithm, but a family of techniques
  – Improved training techniques (though still essentially gradient descent)
  – Clever crafting of network structure – convolutional nets
  – Some new activation functions
- Exploit massive computational power
  – Parallel computing
  – GPU computing
  – Very large data sets (can reduce overfitting)

Deep Networks Today
- Still on the upward swing of the hype pendulum
- State-of-the-art performance for many tasks:
  – Speech recognition
  – Object recognition
  – Playing video games
- Controversial but increasingly accepted in practice:
  – Hype, hype, hype! (but it really does work well in many cases!)
  – Theory lags practice
  – A collection of tricks, not entirely a science yet
  – Results are not human-interpretable

Conclusions
- Neural nets are a general function approximation architecture
- Gradient decomposition permits parallelizable training
- Historically wild swings in popularity
- Currently on an upswing due to clever changes in training methods, use of parallel computation, and large data sets
