Gradient Descent - University Of Liverpool


Gradient Descent
Dr. Xiaowei Huang
https://cgi.csc.liv.ac.uk/~xiaowei/

Up to now, three machine learning algorithms have been covered: decision tree learning, k-NN, and linear regression. Only their optimization objectives have been discussed so far, but how do we actually solve them?

Today's Topics
Derivative
Gradient
Directional Derivative
Method of Gradient Descent
Example: Gradient Descent on Linear Regression
Linear Regression: Analytical Solution

Problem Statement: Gradient-Based Optimization
Most ML algorithms involve optimization: minimize or maximize a function f(x) by altering x. This is usually stated as a minimization, e.g., of the loss. Maximization is accomplished by minimizing -f(x). The function f(x) is referred to as the objective function or criterion; in minimization it is also called the loss function, cost function, or error function. Example: linear least squares, as in linear regression. Denote the optimum value by x* = argmin f(x).

Derivative

Derivative of a function
Suppose we have a function y = f(x), where x and y are real numbers. The derivative of the function is denoted f'(x) or dy/dx. The derivative f'(x) gives the slope of f(x) at the point x. It specifies how to scale a small change in the input to obtain the corresponding change in the output: f(x + ε) ≈ f(x) + ε f'(x). It tells you how to make a small change in the input to make a small improvement in y. Recall: what are the derivatives of the following functions? f(x) = x^2, f(x) = e^x.
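The first-order approximation f(x + ε) ≈ f(x) + ε f'(x) can be checked numerically. A minimal sketch, using the two functions from the slide (the helper name and test points are ours, not from the slides):

```python
import math

def check_approximation(f, f_prime, x, eps=1e-4):
    """Compare f(x + eps) with the first-order estimate f(x) + eps * f'(x)."""
    actual = f(x + eps)
    estimate = f(x) + eps * f_prime(x)
    return abs(actual - estimate)

# f(x) = x^2 has derivative f'(x) = 2x; f(x) = e^x is its own derivative.
err_square = check_approximation(lambda x: x**2, lambda x: 2*x, x=3.0)
err_exp = check_approximation(math.exp, math.exp, x=1.0)
print(err_square, err_exp)  # both errors are tiny, of order eps^2
```

The residual error shrinks quadratically in ε, which is exactly what a first-order approximation predicts.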

Calculus in Optimization
Suppose we have a function y = f(x), where x and y are real numbers. We know that f(x - ε sign(f'(x))) < f(x) for small ε. Therefore, we can reduce f(x) by moving x in small steps with the opposite sign of the derivative. (Why opposite?) This technique is called gradient descent (Cauchy, 1847).

Example
Function f(x) = x^2, with f'(x) = 2x and ε = 0.1.
For x = -2: f'(-2) = -4, sign(f'(-2)) = -1, and f(-2 - ε·(-1)) = f(-1.9) < f(-2).
For x = 2: f'(2) = 4, sign(f'(2)) = 1, and f(2 - ε·1) = f(1.9) < f(2).
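The worked example above can be reproduced in a few lines of Python (a sketch; the helper names are ours):

```python
def f(x):
    return x**2

def f_prime(x):
    return 2*x

def sign(v):
    return (v > 0) - (v < 0)

eps = 0.1
results = []
for x in (-2.0, 2.0):
    # Step opposite the sign of the derivative, as on the slide
    x_new = x - eps * sign(f_prime(x))
    results.append(f(x_new) < f(x))
print(results)  # [True, True]: both steps reduce f
```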

Gradient Descent Illustrated
For x < 0, f(x) decreases with x and f'(x) < 0. For x > 0, f(x) increases with x and f'(x) > 0. Use f'(x) to follow the function downhill: reduce f(x) by moving in the direction opposite to the sign of the derivative f'(x).

Stationary points, Local Optima
When f'(x) = 0, the derivative provides no information about which direction to move. Points where f'(x) = 0 are known as stationary or critical points. A local minimum/maximum is a point where f(x) is lower/higher than at all its neighbors. Saddle points are neither maxima nor minima.

Presence of multiple minima
Optimization algorithms may fail to find the global minimum. We generally accept such solutions.

Gradient

Minimizing with multi-dimensional inputs
We often minimize functions with multi-dimensional inputs. For minimization to make sense there must still be only one (scalar) output.

Functions with multiple inputs
The partial derivative ∂f/∂xi measures how f changes as only the variable xi increases at point x. The gradient generalizes the notion of derivative to the case where the derivative is taken with respect to a vector. The gradient of f is the vector containing all of its partial derivatives, denoted ∇f(x).

Example
y = 5x1^5 + 4x2 + x3^2 + 2. What is the exact gradient at the instance (1, 2, 3)? The gradient is (25x1^4, 4, 2x3); at the instance (1, 2, 3), it is (25, 4, 6).
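The analytic gradient can be sanity-checked against central finite differences (a sketch; `numerical_gradient` is our own illustrative helper, not from the slides):

```python
def f(x1, x2, x3):
    # y = 5*x1^5 + 4*x2 + x3^2 + 2, the function from the slide
    return 5*x1**5 + 4*x2 + x3**2 + 2

def analytic_gradient(x1, x2, x3):
    return (25*x1**4, 4.0, 2*x3)

def numerical_gradient(f, point, h=1e-6):
    """Central-difference estimate of each partial derivative."""
    grads = []
    for i in range(len(point)):
        plus = list(point); plus[i] += h
        minus = list(point); minus[i] -= h
        grads.append((f(*plus) - f(*minus)) / (2*h))
    return grads

point = (1.0, 2.0, 3.0)
ana = analytic_gradient(*point)
num = numerical_gradient(f, point)
print(ana)  # (25.0, 4.0, 6.0)
print(num)  # approximately [25.0, 4.0, 6.0]
```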

Functions with multiple inputs
The gradient is the vector containing all of the partial derivatives, denoted ∇f(x). Element i of the gradient is the partial derivative of f with respect to xi. Critical points are where every element of the gradient is equal to zero.

Example
y = 5x1^5 + 4x2 + x3^2 + 2. What are the critical points? The gradient is (25x1^4, 4, 2x3). Setting 25x1^4 = 0 and 2x3 = 0 gives all instances whose x1 and x3 are 0. But the second component is the constant 4 ≠ 0, so there is no critical point.

Directional Derivative

Directional Derivative
The directional derivative in direction u (a unit vector) is the slope of the function in direction u, i.e., the derivative of f(x + αu) with respect to α, evaluated at α = 0. This evaluates to u^T ∇f(x). Example: let u be a unit vector along one of the Cartesian coordinate axes; then the directional derivative is the corresponding partial derivative.

Directional Derivative
To minimize f, find the direction in which f decreases the fastest: minimize u^T ∇f(x) = ||u|| ||∇f(x)|| cos θ over unit vectors u, where θ is the angle between u and the gradient. Substituting ||u|| = 1 and ignoring factors that do not depend on u, this simplifies to minimizing cos θ, which is achieved when u points in the direction opposite to the gradient. In other words, the gradient points directly uphill, and the negative gradient points directly downhill.

Method of Gradient Descent

Method of Gradient Descent
The gradient points directly uphill, and the negative gradient points directly downhill. Thus we can decrease f by moving in the direction of the negative gradient. This is known as the method of steepest descent, or gradient descent. Steepest descent proposes a new point x' = x - ε ∇f(x), where ε is the learning rate, a positive scalar, typically set to a small constant.
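The update rule above can be sketched as a small reusable loop (the example function, step size, and stopping rule are illustrative choices, not from the slides):

```python
def gradient_descent(grad, x0, lr=0.1, tol=1e-6, max_iters=10_000):
    """Repeatedly step opposite the gradient until it (nearly) vanishes."""
    x = list(x0)
    for _ in range(max_iters):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            break
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Minimize f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2; the minimum is at (1, -2).
grad = lambda x: [2*(x[0] - 1), 2*(x[1] + 2)]
result = gradient_descent(grad, [0.0, 0.0])
print(result)  # approximately [1.0, -2.0]
```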

Choosing ε: Line Search
We can choose ε in several different ways. A popular approach is to set ε to a small constant. Another approach is called line search: evaluate f(x - ε ∇f(x)) for several values of ε and choose the one that results in the smallest objective function value.
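A single line-search step might be sketched as follows (the candidate learning rates and test function here are arbitrary illustrative values):

```python
def line_search_step(f, grad, x, candidate_lrs=(1.0, 0.3, 0.1, 0.03, 0.01)):
    """Try several learning rates and keep the one giving the smallest f."""
    g = grad(x)
    best = None
    for lr in candidate_lrs:
        trial = [xi - lr * gi for xi, gi in zip(x, g)]
        value = f(trial)
        if best is None or value < best[0]:
            best = (value, trial)
    return best[1]

f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
grad = lambda x: [2*(x[0] - 1), 2*(x[1] + 2)]
x = line_search_step(f, grad, [0.0, 0.0])
print(x)  # the best of the candidate steps
```

The chosen step is only as good as the candidate set; a finer grid (or an exact line minimization) gives a better step at a higher cost per iteration.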

Example: Gradient Descent on LinearRegression

Example: Gradient Descent on Linear Regression
Linear regression: minimize f(w) = (1/2)||Xw - y||^2. The gradient is ∇f(w) = X^T(Xw - y).

Example: Gradient Descent on Linear Regression
Linear regression: minimize f(w) = (1/2)||Xw - y||^2. The gradient is ∇f(w) = X^T(Xw - y). The gradient descent algorithm is: set the step size ε and tolerance δ to small, positive numbers; while ||∇f(w)|| > δ, do w ← w - ε ∇f(w).
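Assuming the loss f(w) = (1/2)||Xw - y||^2 with gradient X^T(Xw - y), the algorithm can be sketched in NumPy (the synthetic data, step size, and tolerance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                      # noiseless data, so the fit is exact

def grad(w):
    # Gradient of (1/2) * ||Xw - y||^2 with respect to w
    return X.T @ (X @ w - y)

w = np.zeros(3)
eps, delta = 0.005, 1e-8            # step size and tolerance
while np.linalg.norm(grad(w)) > delta:
    w = w - eps * grad(w)

print(w)  # approximately [2.0, -1.0, 0.5]
```

The step size must be small relative to the largest eigenvalue of X^T X, otherwise the iteration diverges.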

Linear Regression: Analyticalsolution

Convergence of Steepest Descent
Steepest descent converges when every element of the gradient is zero (in practice, very close to zero). We may be able to avoid the iterative algorithm and jump directly to the critical point by solving the equation ∇f(x) = 0 for x.

Linear Regression: Analytical solution
Linear regression: minimize f(w) = (1/2)||Xw - y||^2, with gradient ∇f(w) = X^T(Xw - y). Let X^T(Xw - y) = 0. Then we have the normal equations X^T X w = X^T y, so w = (X^T X)^(-1) X^T y.

Linear Regression: Analytical solution
Algebraic view of the minimizer: if X is invertible, just solve Xw = y and get w = X^(-1) y. But typically X is a tall matrix (more examples than features), so Xw = y has no exact solution and we solve the normal equations instead.
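The analytical solution via the normal equations can be sketched in NumPy and compared against a least-squares solver (the data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))        # tall matrix: more rows than columns
true_w = np.array([1.5, -2.0, 0.3])
y = X @ true_w

# Normal equations: X^T X w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, and more numerically stable: a least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal)  # approximately [1.5, -2.0, 0.3]
```

Forming X^T X squares the condition number of the problem, which is why library least-squares routines are usually preferred in practice.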

Generalization to discrete spaces

Generalization to discrete spaces
Gradient descent is limited to continuous spaces. The concept of repeatedly making the best small move can be generalized to discrete spaces. Ascending an objective function of discrete parameters is called hill climbing.

Exercises
Given a function f(x) = e^x / (1 + e^x), how many critical points are there? Given a function f(x1, x2) = 9x1^2 + 3x2 + 4, how many critical points are there? Please write a program to do the following: given any differentiable function (such as the two above), an ε, a starting x, and a target x', determine whether it is possible to reach x' from x. If possible, in how many steps? You can adjust ε to see how the answer changes.

Extended Materials

Beyond Gradient: Jacobian and Hessian matrices
Sometimes we need to find all derivatives of a function whose input and output are both vectors. If we have a function f: R^m → R^n, then the matrix of partial derivatives is known as the Jacobian matrix J, defined as J(i,j) = ∂f_i/∂x_j.

Second derivative
The second derivative is the derivative of a derivative. For a function f: R^n → R, the derivative with respect to x_i of the derivative of f with respect to x_j is denoted ∂²f/∂x_i∂x_j. In a single dimension we can denote ∂²f/∂x² by f''(x). It tells us how the first derivative will change as we vary the input. This is important, as it tells us whether a gradient step will cause as much of an improvement as the gradient alone suggests.

Second derivative measures curvature
The second derivative is the derivative of a derivative. (Figure: quadratic functions with different curvatures.)

Hessian
The Hessian is the second derivative with many dimensions: H(f)(x) is defined as H(i,j) = ∂²f/∂x_i∂x_j. The Hessian is the Jacobian of the gradient. The Hessian matrix is symmetric, i.e., H(i,j) = H(j,i), anywhere that the second partial derivatives are continuous. So the Hessian matrix can be decomposed into a set of real eigenvalues and an orthogonal basis of eigenvectors. The eigenvalues of H are useful to determine the learning rate, as seen on the next two slides.

Role of eigenvalues of Hessian
The second derivative in direction d is d^T H d. If d is an eigenvector, the second derivative in that direction is given by its eigenvalue. For other directions, it is a weighted average of eigenvalues (with weights between 0 and 1, and eigenvectors at a smaller angle with d receiving more weight). The maximum eigenvalue determines the maximum second derivative and the minimum eigenvalue determines the minimum second derivative.
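The claim that d^T H d interpolates between the eigenvalues can be checked numerically (the Hessian here is an illustrative diagonal example, not from the slides):

```python
import numpy as np

H = np.array([[4.0, 0.0],
              [0.0, 1.0]])              # Hessian of f(x) = 2*x1^2 + 0.5*x2^2

eigvals, eigvecs = np.linalg.eigh(H)
print(eigvals)                          # [1., 4.]

# Second derivative in direction d is d^T H d
d = np.array([1.0, 1.0]) / np.sqrt(2)   # halfway between the two eigenvectors
print(d @ H @ d)                        # 2.5, a weighted average of 1 and 4
```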

Learning rate from Hessian
Taylor's series of f(x) around the current point x(0): f(x) ≈ f(x(0)) + (x - x(0))^T g + (1/2)(x - x(0))^T H (x - x(0)), where g is the gradient and H is the Hessian at x(0). If we use learning rate ε, the new point x is given by x(0) - εg. Thus we get f(x(0) - εg) ≈ f(x(0)) - ε g^T g + (1/2) ε² g^T H g. There are three terms: the original value of f, the expected improvement due to the slope, and the correction to be applied due to curvature. Solving for the step size that decreases this expression the most gives ε* = g^T g / (g^T H g).
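For a quadratic with known Hessian, the optimal step size ε* = g^T g / (g^T H g) can be computed directly (the H and starting point below are illustrative):

```python
import numpy as np

# Quadratic f(x) = (1/2) x^T H x, whose gradient at x is g = H x
H = np.array([[4.0, 0.0],
              [0.0, 1.0]])
x0 = np.array([1.0, 1.0])
g = H @ x0

# Optimal step size from the second-order Taylor expansion
eps_star = (g @ g) / (g @ H @ g)
print(eps_star)

x1 = x0 - eps_star * g
f = lambda x: 0.5 * x @ H @ x
print(f(x1) < f(x0))  # True: the step decreases f
```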

Second Derivative Test: Critical Points
At a critical point, f'(x) = 0. When f''(x) > 0, the first derivative f'(x) increases as we move to the right and decreases as we move to the left; we conclude that x is a local minimum. For a local maximum, f'(x) = 0 and f''(x) < 0. When f''(x) = 0 the test is inconclusive: x may be a saddle point or part of a flat region.

Multidimensional Second derivative test In multiple dimensions, we need to examine second derivatives of alldimensions Eigendecomposition generalizes the test Test eigenvalues of Hessian to determine whether critical point is alocal maximum, local minimum or saddle point When H is positive definite (all eigenvalues are positive) the point is alocal minimum Similarly negative definite implies a maximum

Saddle point
A saddle point has both positive and negative curvature. The function is f(x) = x1^2 - x2^2. Along the x1 axis, the function curves upwards: this axis is an eigenvector of H with a positive eigenvalue. Along x2, the function curves downwards; its direction is an eigenvector of H with a negative eigenvalue. At a saddle point the eigenvalues include both positive and negative values.

Inconclusive Second Derivative Test
The multidimensional second derivative test can be inconclusive, just like the univariate case. The test is inconclusive when all non-zero eigenvalues have the same sign but at least one eigenvalue is zero, since the univariate second derivative test is inconclusive in the cross-section corresponding to the zero eigenvalue.

Poor Condition Number
There are different second derivatives in each direction at a single point. The condition number of H, e.g., λmax/λmin, measures how much they differ. Gradient descent performs poorly when H has a poor condition number, because in one direction the derivative increases rapidly while in another direction it increases slowly. The step size must be small to avoid overshooting the minimum, but it will then be too small to make progress in other directions with less curvature.

Gradient Descent without H
Consider an H with condition number 5: the direction of most curvature has five times more curvature than the direction of least curvature. Due to the small step size, gradient descent wastes time. An algorithm based on the Hessian can predict that steepest descent is not promising here.

Newton's method uses Hessian
Newton's method is another second-derivative method. Using Taylor's series of f(x) around the current x(0) and solving for the critical point of this approximation gives x* = x(0) - H^(-1) ∇f(x(0)). When f is a quadratic (positive definite) function, this solution jumps to the minimum of the function directly. When f is not quadratic, apply the solution iteratively. Newton's method can reach a critical point much faster than gradient descent, but it is useful only when the nearby critical point is a minimum.
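For a positive definite quadratic, one Newton step x - H^(-1) ∇f(x) lands exactly on the critical point, which the following sketch verifies (the H and b are illustrative):

```python
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # positive definite Hessian
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ H @ x - b @ x   # gradient is H x - b
grad = lambda x: H @ x - b

x = np.array([10.0, -10.0])             # arbitrary starting point
x_new = x - np.linalg.solve(H, grad(x)) # one Newton step
print(np.allclose(grad(x_new), 0))      # True: critical point in one step
```

Gradient descent from the same start would need many iterations; the price is solving a linear system with H at each step.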

Summary of Gradient Methods
First-order optimization algorithms are those that use only the gradient. Second-order optimization algorithms use the Hessian matrix, such as Newton's method. The family of functions used in ML is complicated, so optimization is more complex than in other fields, and there are no guarantees. Some guarantees can be obtained by using Lipschitz continuous functions, with Lipschitz constant L.

Convex Optimization
Convex optimization is applicable only to convex functions: functions which are well behaved, e.g., lack saddle points, and all of whose local minima are global minima. For such functions, the Hessian is positive semi-definite everywhere. Many ML optimization problems, particularly in deep learning, cannot be expressed as convex optimization.

Constrained Optimization
We may wish to optimize f(x) when the solution x is constrained to lie in a set S; such values of x are called feasible solutions. Often we want a solution that is small, such as ||x|| ≤ 1. A simple approach: modify gradient descent to take the constraint into account (using a Lagrangian formulation).

Ex: Least squares with Lagrangian
We wish to minimize f(x) = (1/2)||Ax - b||^2, subject to the constraint x^T x ≤ 1. We introduce the Lagrangian L(x, λ) = f(x) + λ(x^T x - 1) and solve the resulting problem. For the unconstrained problem (no Lagrangian), the smallest-norm solution is x = A^+ b (using the pseudoinverse). If this solution is not feasible, differentiate the Lagrangian with respect to x to obtain A^T A x - A^T b + 2λx = 0; the solution takes the form x = (A^T A + 2λI)^(-1) A^T b. Choosing λ: continue solving the linear equation and increasing λ until x has the correct norm.
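The λ-search described above can be sketched as follows (synthetic A and b; the λ increment is an arbitrary illustrative choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

# Unconstrained least-squares solution
x = np.linalg.lstsq(A, b, rcond=None)[0]

lam, step = 0.0, 0.1
# If infeasible (x^T x > 1), increase lambda until the constraint holds;
# larger lambda shrinks the solution toward zero.
while x @ x > 1.0:
    lam += step
    x = np.linalg.solve(A.T @ A + 2*lam*np.eye(5), A.T @ b)

print(x @ x <= 1.0)  # True on exit
```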

Generalized Lagrangian: KKT
The Karush-Kuhn-Tucker (KKT) approach is more sophisticated than the Lagrangian and provides a very general solution to constrained optimization. While the Lagrangian allows equality constraints, KKT allows both equality and inequality constraints. To define a generalized Lagrangian we need to describe S in terms of equalities and inequalities.

Generalized Lagrangian
The set S is described in terms of m functions g^(i) and n functions h^(j). The functions g are the equality constraints and the functions h are the inequality constraints. Introduce new variables λi and αj for each constraint (called KKT multipliers), giving the generalized Lagrangian. We can now solve the unconstrained optimization problem.
