Neural Networks Basics - AI Is Math

NN basics

References
– http://cs231n.stanford.edu/index.html
– ctures/lectures.html
– http://www.cs.cmu.edu/~16385/

What will we be able to do? Hopefully, by the end of the course: https://teachablemachine.withgoogle.com/

What is a neural network? Artificial neural networks (ANN / NN) are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with task-specific rules. – [Wikipedia]

What does a NN need?

What can a neural network do?
Image based:
– Object recognition
– Human pose detection
– 3D reconstruction from a single image
– Image captioning
– Style transfer
Non image based:
– Language translation
– Game playing
And much, much more.

Object recognition

Object recognition

Human pose detection

3D reconstruction from a single image

Image captioning

Style transfer

Object recognition challenges As we've seen before, object recognition is hard!

Challenge: variable viewpoint

Challenge: variable illumination (image credit: J. Koenderink)

Challenge: scale

Challenge: deformation

Challenge: occlusion

Challenge: background clutter

Challenge: intra-class variations (credit: Svetlana Lazebnik)

Object recognition challenges
We've already seen that this is a hard problem to tackle with "classic" CV algorithms like SIFT and template matching.
– Template matching does a relatively good job of finding the same template instance in an image.
– SIFT can extend this to finding the instance under changing viewpoint/scale/illumination and rotation.
What happens when we want to find similar objects that are not the same instance?
– NN to the rescue!

History

Perceptron
The basic building block of all NNs. First introduced in 1958 at Cornell Aeronautical Laboratory by Frank Rosenblatt. We will talk more about it in a moment.

MNIST & LeNet-5
MNIST is a large dataset of handwritten digits used in the training of LeNet-5.
LeNet-5 is the first known NN to solve a major computer vision problem:
– Classifies digits; it was applied by several banks to recognize hand-written numbers on checks.
– Used 7 trainable layers with a total of 60K params (sounds like a lot?).
– Yann LeCun et al., 1998, 23,000 citations.

Large Scale Visual Recognition Challenge (ILSVRC)
ImageNet is an image database most known for its ILSVRC challenge, and specifically for the image classification contest:
– 1000 object classes
– 1,431,167 images
– The winner has the minimum mean labeling error out of 5 guesses (top-5 error) for a given unknown test set.

ILSVRC winners (human error: 5%)

The classification problem Let’s first try to solve it with a perceptron.

Perceptron
The perceptron is an algorithm for supervised learning of binary classifiers.
– The perceptron determines a hyperplane separator, which is defined by a set of weights ($W$).
– A feature vector ($x$) is the representation of the object to be classified, which the perceptron receives as input.
The weights ($W$) that determine the separator are what we need to learn in order to optimize the classification.

Hyperplane
Parametrization of a line in 2D: $ax + by + c = 0$
– If $c = 0$: $ax + by = 0 \Leftrightarrow (a,b)\cdot(x,y) = 0 \Leftrightarrow (a,b) \perp (x,y)$
– $(a,b)$ defines the normal to the line.

Hyperplane
Parametrization of a line in 2D: $ax + by + c = 0$
– If $c = 0$: $ax + by = 0 \Leftrightarrow (a,b)\cdot(x,y) = 0 \Leftrightarrow (a,b) \perp (x,y)$; $(a,b)$ defines the normal to the line.
– If $c \neq 0$: this is the bias factor. It defines the distance of $(0,0)$ from the line:
  – Point-line distance: $d = \frac{ax + by + c}{\sqrt{a^2 + b^2}}$
  – $\mathrm{bias} = \frac{c}{\sqrt{a^2 + b^2}}$

Hyperplane
This is the same for the 3D representation of a plane as well: $ax + by + cz + d = 0$
$(a,b,c)$ defines the normal to the plane, and $d$ defines the bias of the plane from $(0,0,0)$.
The same representation can be used for N-D space. The N-D plane is called a hyperplane.

Hyperplane
Writing the hyperplane representation in vector form results in the equation below:
$$\begin{bmatrix} w_1 & \cdots & w_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + b = w^T x + b = 0$$
Points $x$ above the hyperplane (in the direction of the normal) will give $w^T x + b > 0$, and points $x$ below the hyperplane will give $w^T x + b < 0$.

Hyperplane
Another option is to write the hyperplane representation with homogeneous vectors, which results in the (more compact) equation below:
$$\begin{bmatrix} w_1 & \cdots & w_n & b \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \\ 1 \end{bmatrix} = w^T x = 0$$
Points $x$ above the hyperplane (in the direction of the normal) will give $w^T x > 0$, and points $x$ below the hyperplane will give $w^T x < 0$.
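
A small numpy sketch of this homogeneous form (the 2D line and the test points are made-up assumptions, not from the slides): which side of the hyperplane a point falls on is just the sign of $w^T x$.

```python
import numpy as np

# Hypothetical 2D line x + 2y - 4 = 0, written in homogeneous form w = (a, b, c)
w = np.array([1.0, 2.0, -4.0])

def side_of_hyperplane(point, w):
    """Return +1 if the point lies on the normal's side of the hyperplane,
    -1 if it lies on the other side, 0 if it is exactly on it."""
    x_h = np.append(point, 1.0)   # homogeneous coordinates (x, y, 1)
    return np.sign(w @ x_h)       # sign of w^T x

print(side_of_hyperplane(np.array([3.0, 3.0]), w))  # 1.0  (above the line)
print(side_of_hyperplane(np.array([0.0, 0.0]), w))  # -1.0 (below the line)
```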

Activation function
A non-linear function $f(\cdot)$ that is applied to the perceptron's hyperplane equation: $y = f(Wx)$.
If we have a problem of classifying two groups with a single hyperplane, we can use a step activation function:
$$f(x) = \mathrm{step}(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}$$

Activation function
Later we will use more common activation functions. One of them is the rectified linear unit (ReLU) function:
$$f(x) = \max(x, 0) = \begin{cases} 0, & x < 0 \\ x, & x \geq 0 \end{cases}$$
Other known activation functions: sigmoid, tanh, leaky ReLU.
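
A minimal numpy sketch of these activation functions (my own illustration, not code from the course):

```python
import numpy as np

def step(x):
    """Step activation: 0 for x < 0, 1 for x >= 0."""
    return (x >= 0).astype(float)

def relu(x):
    """Rectified linear unit: max(x, 0), applied element-wise."""
    return np.maximum(x, 0)

def sigmoid(x):
    """Sigmoid: squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small slope alpha for negative inputs instead of 0."""
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(x), relu(x), sigmoid(x), leaky_relu(x), sep="\n")
```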

Perceptron: inspiration from biology
Neural nets/perceptrons are loosely inspired by biology. But they certainly are not a proper model of how the brain works, or even how neurons work.

Hyperplanes and image classification In images, the pixels can be the input feature vector.

Hyperplanes and image classification
We want to find a hyperplane in 4D space that puts all cats' vectors on one side of it, and all other images on the other side.
– Let's assume there are 2 more classes, so in total: cats, dogs and ships. Now, $W$ is a matrix rather than a vector.
– Find 3 separating planes, one for each class.
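
To make the matrix form concrete, here is a small numpy sketch (the 4-pixel image and all weight values are made-up assumptions): each row of $W$ is one per-class hyperplane in homogeneous form, and the class scores are just $Wx$.

```python
import numpy as np

# Hypothetical flattened 4-pixel image, with a 1 appended (homogeneous form)
x = np.array([56.0, 231.0, 24.0, 2.0, 1.0])

# One row per class (cat, dog, ship); the last entry of each row is that class's bias
W = np.array([[ 0.2, -0.5,  0.1,  2.0,  1.1],
              [ 1.5,  1.3,  2.1,  0.0,  3.2],
              [ 0.0,  0.25, 0.2, -0.3, -1.2]])

scores = W @ x            # one score per separating hyperplane
print(scores)             # the highest score is the predicted class
```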

Perceptron: template matching interpretation
We can think about the optimized weights as a template in a template matching cross-correlation algorithm.
– We get a strong positive response when the template matches the image area.

Perceptron: template matching interpretation
In our case the template is the size of the image.
We can see examples of templates for different groups: the optimized template can be thought of as the mean of the class.

Perceptron: template matching interpretation

Optimization

Optimizing the weights
We have these results for each possible label. Which is the best result currently? Which should be the best result?

Optimizing the weights - first try
We have these results for each possible label. Which is the best result currently? Which should be the best result?
– Let's use our step activation function from before. The step outputs (e.g. 0, 1, 1) can't tell us which class is better, so this is not good enough.
– We need a way to quantify the results as more/less likely.

Softmax layer
The softmax layer normalizes all the results so that you get a percentage of correctness for each label.
The softmax is usually added as the last layer in a NN to normalize the results, instead of an activation function.
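
Assuming the standard softmax definition, $\mathrm{softmax}(s)_i = e^{s_i} / \sum_j e^{s_j}$, a minimal numpy sketch (the class scores are made up; shifting by the max is a common numerical-stability trick):

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([3.2, 5.1, -1.7])     # hypothetical cat/dog/ship scores
probs = softmax(scores)
print(probs, probs.sum())               # e.g. [0.13 0.87 0.00], sums to 1.0
```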

Cross entropy loss function
We need to define an error between the given probabilities and the correct (wanted) probabilities.
A known loss function for this problem is called the cross entropy loss.

Cross entropy loss + softmax
The cross entropy of the distribution $q$ (output results) relative to a distribution $p$ (wanted results) over a given set is defined as follows:
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
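
A small numpy sketch of this loss for a single example, assuming the wanted distribution $p$ is one-hot at the correct class (the numbers are my own illustrative values):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_x p(x) * log(q(x))."""
    return -np.sum(p * np.log(q + eps))  # eps avoids log(0)

# Wanted distribution: the correct class is "dog" (one-hot)
p = np.array([0.0, 1.0, 0.0])
# Output of the softmax from before (hypothetical values)
q = np.array([0.13, 0.869, 0.001])

print(cross_entropy(p, q))               # ~0.14, small because q agrees with p
```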

Total loss
This $L_i$ is the loss of a single given input image $x_i$. Let's say we have all possible images in the world, so the total loss will be:
$$L = \frac{1}{N} \sum_{i=1}^{N} L_i$$
– A mean of all possible losses, where $N$ is the number of images.
We want to find the best $W$ that minimizes $L$. How do we do this?

Total loss
This $L_i$ is the loss of a single given input image $x_i$. Let's say we have all possible images in the world, so the total loss will be:
$$L = \frac{1}{N} \sum_{i=1}^{N} L_i$$
– A mean of all possible losses, where $N$ is the number of images.
We want to find the best $W$ that minimizes $L$. How do we do this?
– Take the derivative with respect to $W$: $\nabla_W L$

Finding the best W
How do we do this?
– Take the derivative with respect to $W$: $\nabla_W L$
Problems:
– We don't have all the images, and even if we did, it would take forever.
– No one said $L$ is a convex function.
– It's sometimes hard to compute the analytic derivative of the function $L$ for all possible $x$ in order to naively find all extremum points.
An approximate solution for finding the best $W$ is called mini-batch gradient descent.

Mini-batch Gradient descent

Mini-batch
In mini-batch gradient descent we take only a small subset of the images and compute their average loss:
$$\tilde{L} = \frac{1}{\tilde{N}} \sum_{i=1}^{\tilde{N}} L_i$$
– A mean of the subset losses, where $\tilde{N}$ is the size of the image subset.
This approximation of the loss function is faster to compute but less accurate.

What is a gradient?
The gradient describes the direction and magnitude of the fastest increase around a point $x$.
Example: the gradient of a function of 2 variables:
$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

Gradient descent
An iterative algorithm for finding local minima of functions.
It starts at a random point and moves step by step in the direction, and proportionally to the magnitude, of the negative of the gradient at the point it is currently in:
– The "proportional magnitude" is the step size $\eta$.
In "proper use" this algorithm converges to a local minimum, which depends on the starting point.
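
A minimal gradient descent sketch in numpy (illustrative only: the quadratic function, its analytic gradient, and the step size 0.1 are my own assumptions, not taken from the slides):

```python
import numpy as np

def f(w):
    """A simple convex function with a minimum at w = 3."""
    return (w - 3.0) ** 2

def grad_f(w):
    """Analytic gradient of f."""
    return 2.0 * (w - 3.0)

eta = 0.1                      # step size (learning rate)
w = np.random.randn()          # random starting point
for _ in range(100):
    w = w - eta * grad_f(w)    # move against the gradient

print(w, f(w))                 # w ends up close to 3, f(w) close to 0
```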

Gradient descent - step size
Also known as the learning rate. Choosing the right step size is important.
This is known as a hyperparameter: an unknown variable that is configured by the user (unlike the weights $W$, which the system "learns").

Gradient descent - local minima
An iterative algorithm for finding local minima of functions.
We can initiate this procedure several times from several random starting points and take the minimum of all the output minimum points. This way we can get a better result.

Mini-batch gradient descent
Combining the two methods is called mini-batch gradient descent.
It is almost always mis-called stochastic gradient descent (SGD):
– Strictly, that is the name only if the batch size is 1.
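
Putting the pieces together, a sketch of a mini-batch gradient descent loop for a linear softmax classifier (assumptions of mine: random stand-in data, arbitrary hyperparameters, and the standard analytic gradient of softmax + cross entropy, none of which are given on the slides):

```python
import numpy as np

def softmax(s):
    """Row-wise softmax, shifted for numerical stability."""
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def batch_loss_and_grad(W, X_batch, y_batch):
    """Cross-entropy loss of a linear softmax classifier and its gradient dL/dW,
    averaged over the mini-batch."""
    scores = X_batch @ W.T                    # (batch, classes)
    probs = softmax(scores)
    n = len(X_batch)
    loss = -np.log(probs[np.arange(n), y_batch] + 1e-12).mean()
    dscores = probs
    dscores[np.arange(n), y_batch] -= 1.0     # standard softmax + cross-entropy gradient
    grad = dscores.T @ X_batch / n            # (classes, features), same shape as W
    return loss, grad

np.random.seed(0)
X = np.random.randn(1000, 3072)               # hypothetical flattened images
y = np.random.randint(0, 10, size=1000)       # hypothetical labels
W = 0.01 * np.random.randn(10, 3072)          # weights to learn

eta, batch_size = 1e-3, 32                     # hyperparameters chosen by the user
for step in range(200):
    idx = np.random.choice(len(X), batch_size, replace=False)  # sample a mini-batch
    loss, grad = batch_loss_and_grad(W, X[idx], y[idx])
    W -= eta * grad                            # one mini-batch gradient descent step
```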

Testing the results

Testing the results
NN frameworks are built on learning from examples, so the data is important.
Usually we split the data into 3 different datasets:
– Train: used to train the weights.
– Validation: used to compare different NN architectures / changes in hyperparameters, which are not learned.
– Test: used to evaluate the resulting NN, with its chosen architecture, on unseen data.
If we don't have a validation dataset, we will eventually change the architecture/hyperparameters so that they fit the test data, which is basically learning on the unseen dataset. Not good.
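
A small sketch of such a split (a hypothetical helper of mine; the 70/15/15 proportions are a common but arbitrary choice, not from the slides):

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the data and split it into train / validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    n_test = int(len(X) * test_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

X = np.random.randn(1000, 3072)                  # hypothetical data
y = np.random.randint(0, 10, size=1000)
train, val, test = train_val_test_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 700 150 150
```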

Multi-layer perceptron

Multi-layer perceptron
Perceptron plane separation is not enough for all data sets: some are not linearly separable.
A multi-layer perceptron (MLP), or by its more common name, a neural network, is a better approach for handling such data.

CIFAR10 dataset
CIFAR10 (Canadian Institute For Advanced Research) is a known dataset of 10 classes of small images.
32×32×3 = 3072 DOFs in this problem, and the images vary a lot. It is not possible to separate them linearly.

Multi-layer NN: intuition
We can use all the responses to all the "templates" of weights from the first layer to better represent the result.
In this way, instead of one best fit for a template, we can use all the responses to all the templates of the first layer to learn a better classification.
This is also true for any number of layers in a NN.

Multi-layer NN: intuition
Before: humans "hand engineered" features as input into a machine learning (ML) framework.
– Examples of features we've seen: SIFT, HOG, color histograms.
Now: the NN finds the best features.

Multi-layer NN
2-layer NN example: learn 100 different templates in the first layer and input them into a second layer for final classification.
3072-D input vector → (100 × 3072 matrix) → 100-D intermediate vector → (10 × 100 matrix) → 10-D results for final classification.

Multi-layer NN
Total number of weights to learn: 3,072 × 100 + 100 × 10 = 308,200
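
A numpy sketch of this 2-layer forward pass (random made-up weights; a ReLU between the layers is assumed, as in the formula that follows):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(3072)                 # flattened 32x32x3 input image
W1 = 0.01 * np.random.randn(100, 3072)    # first layer: 100 learned "templates"
W2 = 0.01 * np.random.randn(10, 100)      # second layer: combines template responses

h = np.maximum(W1 @ x, 0)                 # 100-D intermediate vector (ReLU activation)
scores = W2 @ h                           # 10-D result, one score per class

print(W1.size + W2.size)                  # 307,200 + 1,000 = 308,200 weights
print(np.argmax(scores))                  # predicted class index
```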

Multi-layer NN
What happens if we remove the non-linear activation?
$$f = W_2 \max(0, W_1 x)$$

Multi-layer NN
What happens if we remove the non-linear activation?
$$f = W_2 \max(0, W_1 x) \;\;\rightarrow\;\; \tilde{f} = W_2 W_1 x = \tilde{W} x$$
We've gotten a linear separator again, which is not good. Remember the activation function!
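
A quick numpy check of this collapse (random matrices, purely illustrative): without the ReLU, the two layers are equivalent to a single matrix $\tilde{W} = W_2 W_1$.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(3072)
W1 = np.random.randn(100, 3072)
W2 = np.random.randn(10, 100)

W_tilde = W2 @ W1                                 # a single 10 x 3072 matrix
print(np.allclose(W2 @ (W1 @ x), W_tilde @ x))    # True: still just a linear map
```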

Neural network architecture
Computation graph for a 2-layer neural network.
– Only count layers with tunable weights (so don't count the input layer).
– Each layer is built from perceptrons: weights + activation function (each node in the graph is one neuron/perceptron).

Neural network architecture
Deep networks typically have many layers and potentially millions of parameters.
A fully connected layer is a layer in which all inputs are multiplied, for each perceptron, with different weights (this is what we've seen until now).

Neural network architecture
Example of a deep NN: the Inception network (Szegedy et al., 2015), 22 layers.

A good fully connected example
https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8,8,8&seed=0.68609&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification&initZero=false&hideText=false
