Lecture Notes for Chapter 4: Artificial Neural Networks


Data Mining: Lecture Notes for Chapter 4, Artificial Neural Networks
Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar

Artificial Neural Networks (ANN)

- Basic idea: a complex non-linear function can be learned as a composition of simple processing units
- An ANN is a collection of simple processing units (nodes) connected by directed links (edges)
  – Every node receives signals from its incoming edges, performs computations, and transmits signals along its outgoing edges
  – Analogous to the human brain, where nodes are neurons and signals are electrical impulses
  – The weight of an edge determines the strength of the connection between the nodes
- Simplest ANN: the perceptron (a single neuron)

Basic Architecture of the Perceptron

[Figure: perceptron architecture, showing input nodes, weighted links, and an output node with its activation function]

- Learns linear decision boundaries
- Related to logistic regression (the activation function is sign instead of sigmoid)

Perceptron Example

X1  X2  X3 |  Y
 1   0   0 | -1
 1   0   1 |  1
 1   1   0 |  1
 1   1   1 |  1
 0   0   1 | -1
 0   1   0 | -1
 0   1   1 |  1
 0   0   0 | -1

Output Y is 1 if at least two of the three inputs are equal to 1.

Perceptron Example

The table above is reproduced by the perceptron

$Y = \mathrm{sign}(0.3\,X_1 + 0.3\,X_2 + 0.3\,X_3 - 0.4)$

where $\mathrm{sign}(x) = +1$ if $x > 0$ and $-1$ if $x < 0$ (see the code check below).

Perceptron Learning Rule

- Initialize the weights $(w_0, w_1, \ldots, w_d)$
- Repeat
  – For each training example $(x_i, y_i)$:
    Compute the prediction $\hat{y}_i^{(k)}$
    Update the weights: $w_j^{(k+1)} = w_j^{(k)} + \lambda\,\big(y_i - \hat{y}_i^{(k)}\big)\, x_{ij}$
- Until stopping condition is met

Here k is the iteration number, $\lambda$ is the learning rate, and $x_{ij}$ is the j-th attribute of instance $x_i$.
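Before looking at learning, it is worth checking the fixed weights from the example above. Below is a minimal Python sketch (Python is our choice throughout; the notes prescribe no language) that evaluates the given perceptron on all eight rows of the truth table; the function and variable names are ours.

```python
# Check that Y = sign(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4) reproduces the truth table.
def sign(x):
    return 1 if x > 0 else -1

# (X1, X2, X3, Y) rows from the example table
rows = [(1, 0, 0, -1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
        (0, 0, 1, -1), (0, 1, 0, -1), (0, 1, 1, 1), (0, 0, 0, -1)]

for x1, x2, x3, y in rows:
    y_hat = sign(0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4)
    assert y_hat == y   # every row is classified correctly
print("all 8 rows match")
```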

Perceptron Learning Rule

- Weight update formula: $w_j^{(k+1)} = w_j^{(k)} + \lambda\,\big(y_i - \hat{y}_i^{(k)}\big)\, x_{ij}$
- Intuition: update the weight based on the error $e = y - \hat{y}$
  – If $y = \hat{y}$, then $e = 0$: no update needed
  – If $y = 1$ and $\hat{y} = -1$, then $e = 2$: the weight must be increased (assuming $x_{ij}$ is positive) so that $\hat{y}$ will increase
  – If $y = -1$ and $\hat{y} = 1$, then $e = -2$: the weight must be decreased (assuming $x_{ij}$ is positive) so that $\hat{y}$ will decrease

Example of Perceptron Learning ($\lambda = 0.1$)

[Table: per-example weight updates over the first epoch; values not recoverable from the transcription]

Weight values at the end of each epoch:

Epoch |  w0  |  w1 |  w2 |  w3
  0   |  0   | 0   | 0   | 0
  1   | -0.2 | 0   | 0.2 | 0.2
  2   | -0.2 | 0   | 0.4 | 0.2
  3   | -0.4 | 0   | 0.4 | 0.2
  4   | -0.4 | 0.2 | 0.4 | 0.4
  5   | -0.6 | 0.2 | 0.4 | 0.2
  6   | -0.6 | 0.4 | 0.4 | 0.2
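The update rule is short enough to implement directly. Below is a minimal sketch of the learning loop; two details the slide leaves implicit are assumptions here: examples are visited in the table's row order, and sign(0) is taken as +1, with the bias w0 folded in as a weight on a constant input of 1. With these choices the end-of-epoch weights reproduce the table above, and the loop stops after an error-free seventh pass.

```python
# Perceptron learning rule: w_j <- w_j + lr * (y - y_hat) * x_j
def sign(x):
    return 1 if x >= 0 else -1   # assumption: sign(0) = +1

def train_perceptron(data, lr=0.1, max_epochs=100):
    d = len(data[0][0])
    w = [0.0] * (d + 1)          # w[0] is the bias weight (constant input x0 = 1)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, y in data:
            xa = [1] + list(x)   # prepend the constant bias input
            y_hat = sign(sum(wj * xj for wj, xj in zip(w, xa)))
            if y_hat != y:
                errors += 1
                w = [wj + lr * (y - y_hat) * xj for wj, xj in zip(w, xa)]
        print(f"epoch {epoch}: w = {[round(wj, 2) for wj in w]}")
        if errors == 0:          # stopping condition: a full pass with no mistakes
            break
    return w

data = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
        ((0, 0, 1), -1), ((0, 1, 0), -1), ((0, 1, 1), 1), ((0, 0, 0), -1)]
train_perceptron(data)
```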

Perceptron Learning

- Since $\hat{y}$ is a linear combination of the input variables, the decision boundary is linear

Nonlinearly Separable Data

- For nonlinearly separable problems, the perceptron learning algorithm will fail because no linear hyperplane can separate the data perfectly
- Example: XOR data, $y = x_1 \oplus x_2$

x1  x2 |  y
 0   0 | -1
 1   0 |  1
 0   1 |  1
 1   1 | -1
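The failure is easy to see by running the same learning loop on the XOR data; a sketch under the same assumptions as before (sign(0) = +1, fixed row order). Since no linear separator exists, no epoch ever finishes with zero misclassifications.

```python
# The perceptron cannot learn XOR: no epoch ever finishes without mistakes.
def sign(x):
    return 1 if x >= 0 else -1   # assumption: sign(0) = +1

xor_data = [((0, 0), -1), ((1, 0), 1), ((0, 1), 1), ((1, 1), -1)]
w = [0.0, 0.0, 0.0]              # bias weight plus one weight per input

for epoch in range(1, 26):
    errors = 0
    for x, y in xor_data:
        xa = [1] + list(x)
        y_hat = sign(sum(wj * xj for wj, xj in zip(w, xa)))
        if y_hat != y:
            errors += 1
            w = [wj + 0.1 * (y - y_hat) * xj for wj, xj in zip(w, xa)]
    print(f"epoch {epoch}: {errors} misclassified")   # never reaches 0
```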

Multi-layer Neural Network

- One or more hidden layers of computing nodes
- Every node in a hidden layer operates on activations from the preceding layer and transmits activations forward to the nodes of the next layer
- Also referred to as "feedforward neural networks"
- Multi-layer neural networks with at least one hidden layer can solve any classification task involving nonlinear decision surfaces

[Figure: a multi-layer network separating the XOR data]
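As an illustration, here is a hand-constructed two-layer network of sign units that computes XOR. These particular weights are one choice of many (ours, not the book's): one hidden unit acts as OR, the other as AND, and the output unit fires exactly when OR holds but AND does not.

```python
# A two-layer network of sign units that solves XOR.
def sign(x):
    return 1 if x > 0 else -1

def two_layer_xor(x1, x2):
    h1 = sign(x1 + x2 - 0.5)     # hidden unit 1: fires (+1) when x1 OR x2
    h2 = sign(x1 + x2 - 1.5)     # hidden unit 2: fires (+1) only when x1 AND x2
    return sign(h1 - h2 - 1)     # output: OR but not AND, i.e. XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", two_layer_xor(x1, x2))
# prints -1, 1, 1, -1: the XOR labels
```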

Why Multiple Hidden Layers?

- Activations at the hidden layers can be viewed as features extracted as functions of the inputs
- Every hidden layer represents a level of abstraction
  – Complex features are compositions of simpler features
- The number of layers is known as the depth of the ANN
  – Deeper networks express a complex hierarchy of features

Multi-Layer Network Architecture

- Activation value at node i at layer l: $a_i^l = f(z_i^l)$, where f is the activation function and $z_i^l = \sum_j w_{ji}^l\, a_j^{l-1} + b_i^l$ is the linear predictor ($w_{ji}^l$ is the weight on the link from node j at layer l - 1 to node i at layer l)
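The per-layer computation above is a few lines of numpy. Here is a minimal sketch of the forward pass with sigmoid activations; the layer sizes and random initialization are illustrative choices, not the book's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Compute activations layer by layer: a^l = f(W^l a^(l-1) + b^l)."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b        # linear predictor at this layer
        a = sigmoid(z)       # activation value at each node
    return a

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]         # input, two hidden layers, output (arbitrary)
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
print(forward(np.array([1.0, 0.0, 1.0]), weights, biases))
```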

Activation Functions

[Figure: common activation functions; see the sketch after the next slide]

Learning Multi-layer Neural Network

- Can we apply the perceptron learning rule to each node, including the hidden nodes?
  – The perceptron learning rule computes the error term $e = y - \hat{y}$ and updates the weights accordingly
  – Problem: how do we determine the true value of y for the hidden nodes?
  – Workaround: approximate the error at the hidden nodes by the error at the output nodes
- Problems with this approach:
  – It is not clear how an adjustment at the hidden nodes affects the overall error
  – There is no guarantee of convergence to an optimal solution
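The activation functions referenced in the figure above are one-liners. A sketch of four common choices follows (the selection is ours, since the figure did not survive transcription):

```python
import numpy as np

def sign(z):    return np.where(z > 0, 1.0, -1.0)   # perceptron activation
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))     # squashes to (0, 1)
def tanh(z):    return np.tanh(z)                    # squashes to (-1, 1)
def relu(z):    return np.maximum(0.0, z)            # max(0, z)

z = np.linspace(-3, 3, 7)
for f in (sign, sigmoid, tanh, relu):
    print(f.__name__, np.round(f(z), 2))
```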

Gradient Descent

- Loss function to measure errors across all training points; e.g., squared loss:
  $E(\mathbf{w}) = \sum_{i=1}^{n} \big( y_i - \hat{y}_i \big)^2$
- Gradient descent: update the parameters in the direction of steepest descent of the loss function across all points:
  $w_j \leftarrow w_j - \lambda\, \frac{\partial E}{\partial w_j}$
  where $\lambda$ is the learning rate
- Stochastic gradient descent (SGD): update the weights for every instance (mini-batch SGD: update over mini-batches of instances)

Computing Gradients

- Using the chain rule of differentiation (on a single instance):
  $\frac{\partial E}{\partial w_{ji}^l} = \frac{\partial E}{\partial z_i^l} \cdot \frac{\partial z_i^l}{\partial w_{ji}^l} = \delta_i^l\, a_j^{l-1}$, where $\delta_i^l = \frac{\partial E}{\partial z_i^l}$
- For a sigmoid activation function: $\frac{\partial a_i^l}{\partial z_i^l} = a_i^l\, (1 - a_i^l)$
- How can we compute $\delta_i^l$ for every layer?
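As a concrete instance of the chain rule above, this sketch computes the squared-loss gradient for a single sigmoid unit on one instance and takes one gradient-descent step; all names and values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid unit on a single instance (x, y), squared loss E = (y - a)^2.
x = np.array([1.0, 0.0, 1.0])
y = 1.0
w, b, lr = np.zeros(3), 0.0, 0.1

a = sigmoid(w @ x + b)                # forward pass
delta = -2.0 * (y - a) * a * (1 - a)  # delta = dE/dz via the chain rule
grad_w = delta * x                    # dE/dw_j = delta * x_j
grad_b = delta                        # dE/db   = delta

w -= lr * grad_w                      # gradient-descent update
b -= lr * grad_b
print("loss before:", (y - a) ** 2, "updated w:", w)
```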

Backpropagation Algorithm

- At the output layer L (for squared loss and a sigmoid output unit):
  $\delta^L = -2\,(y - a^L)\; a^L (1 - a^L)$
- At a hidden layer l (using the chain rule):
  $\delta_i^l = \Big( \sum_k w_{ik}^{l+1}\, \delta_k^{l+1} \Big)\; a_i^l (1 - a_i^l)$
  – Gradients at layer l can be computed using the gradients at layer l + 1
  – Start from layer L and "backpropagate" the gradients to all previous layers
- Use gradient descent to update the weights at every epoch
- For the next epoch, use the updated weights to compute the loss function and its gradient
- Iterate until convergence (the loss does not change)

Design Issues in ANN

- Number of nodes in the input layer
  – One input node per binary/continuous attribute
  – k or log2(k) nodes for each categorical attribute with k values
- Number of nodes in the output layer
  – One output node for a binary class problem
  – k or log2(k) nodes for a k-class problem
- Number of hidden layers and nodes per layer
- Initial weights and biases
- Learning rate, maximum number of epochs, mini-batch size for mini-batch SGD, ...
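Putting the recursion together, below is a minimal backpropagation sketch for a network with one hidden layer, trained with per-instance SGD on the XOR data using squared loss and sigmoid units. The layer sizes, learning rate, seed, and epoch count are illustrative choices, not the book's; the loss should decrease, though how close it gets to zero depends on the initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)      # XOR targets in (0, 1) for a sigmoid output

W1 = rng.normal(size=(4, 2))                 # hidden layer: 4 nodes
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))                 # output layer: 1 node
b2 = np.zeros(1)
lr = 0.5

for epoch in range(5000):
    loss = 0.0
    for x, y in zip(X, Y):                   # SGD: one instance at a time
        # forward pass
        a1 = sigmoid(W1 @ x + b1)
        a2 = sigmoid(W2 @ a1 + b2)
        loss += float((y - a2[0]) ** 2)
        # backward pass: delta^l = dE/dz^l
        d2 = -2.0 * (y - a2) * a2 * (1 - a2)   # output layer
        d1 = (W2.T @ d2) * a1 * (1 - a1)       # backpropagate to the hidden layer
        # gradient-descent updates
        W2 -= lr * np.outer(d2, a1); b2 -= lr * d2
        W1 -= lr * np.outer(d1, x);  b1 -= lr * d1
    if epoch % 1000 == 0:
        print(f"epoch {epoch}: loss {loss:.4f}")

print([round(float(sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)[0]), 2) for x in X])
```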

Characteristics of ANN

- Multi-layer ANNs are universal approximators but can suffer from overfitting if the network is too large
  – They naturally represent a hierarchy of features at multiple levels of abstraction
- Gradient descent may converge to a local minimum
- Model building is compute intensive, but testing is fast
- Can handle redundant and irrelevant attributes, because weights are automatically learned for all attributes
- Sensitive to noise in the training data
  – This issue can be addressed by incorporating model complexity into the loss function
- Difficult to handle missing attributes

Deep Learning Trends

- Training deep neural networks (more than 5-10 layers) has only recently become practical, thanks to:
  – Faster computing resources (GPUs)
  – Larger labeled training sets
- Algorithmic improvements in deep learning:
  – More responsive activation functions (e.g., ReLU)
  – Regularization (e.g., dropout)
  – Supervised pre-training
  – Unsupervised pre-training (autoencoders)
- Specialized ANN architectures:
  – Convolutional neural networks (for image data)
  – Recurrent neural networks (for sequence data)
  – Residual networks (with skip connections)
- Generative models: generative adversarial networks
