
Lecture 2: Linear Regression

Feng Li
Shandong University
fli@sdu.edu.cn

September 14, 2020

Lecture 2: Linear Regression

1. Supervised Learning: Regression and Classification
2. Linear Regression
3. Gradient Descent Algorithm
4. Stochastic Gradient Descent
5. Revisiting Least Square
6. A Probabilistic Interpretation to Linear Regression

Supervised Learning

- Regression: predict a continuous value
- Classification: predict a discrete value, the class

Example (house prices):

Living area (feet^2)    Price (1000$s)
2104                    400
1600                    330
2400                    369
1416                    232
3000                    540
...                     ...

Supervised Learning (Contd.)

- Features: input variables, $x$
- Target: output variable, $y$
- Training example: $(x^{(i)}, y^{(i)})$, $i = 1, 2, 3, \ldots, m$
- Hypothesis: $h : X \to Y$

[Diagram: a training set is fed into a learning algorithm, which outputs a hypothesis $h$; a new $x$ (living area of a house) goes into $h$, which outputs the predicted $y$ (predicted price of the house).]

Linear Regression

Linear hypothesis: $h(x) = \theta_1 x + \theta_0$.
$\theta_i$ ($i = 0, 1$ in the 2D case): parameters to estimate.
How to choose the $\theta_i$'s?

Linear Regression (Contd.)

Input: Training set $(x^{(i)}, y^{(i)}) \in \mathbb{R}^2$ ($i = 1, \ldots, m$)
Goal: Model the relationship between $x$ and $y$ such that we can predict the corresponding target for any given new feature.

Linear Regression (Contd.)

The relationship between $x$ and $y$ is modeled as a linear function. A linear function in the 2D plane is a straight line.
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$ (where $\theta_0$ and $\theta_1$ are parameters)

Linear Regression (Contd.)

Given data $x \in \mathbb{R}^n$, we then have $\theta \in \mathbb{R}^{n+1}$.
Thus $h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$, where $x_0 = 1$.
What is the best choice of $\theta$?
$$\min_\theta J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2$$
where $J(\theta)$ is the so-called cost function.

Linear Regression (Contd.)

$$\min_\theta J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2$$

[Figure: the training data with a fitted line, illustrating the least-squares objective.]
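To make the cost function concrete, here is a minimal NumPy sketch (mine, not from the slides; the variable names are illustrative, and the five data points are the housing examples from the earlier table) that evaluates $h_\theta(x) = \theta^T x$ and $J(\theta)$:

    import numpy as np

    # Design matrix with x0 = 1 prepended to each example (m = 5 housing examples).
    living_area = np.array([2104, 1600, 2400, 1416, 3000], dtype=float)
    price = np.array([400, 330, 369, 232, 540], dtype=float)  # in 1000$s
    X = np.column_stack([np.ones_like(living_area), living_area])  # shape (m, 2)

    def cost(theta, X, y):
        """J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
        residuals = X @ theta - y  # h_theta(x^(i)) - y^(i) for all i at once
        return 0.5 * np.dot(residuals, residuals)

    theta = np.array([0.0, 0.2])  # an arbitrary guess for (theta_0, theta_1)
    print(cost(theta, X, price))  # a lower value means a better fit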

Gradient

Definition (Directional Derivative). The directional derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ in the direction $u \in \mathbb{R}^n$ is
$$\nabla_u f(x) = \lim_{h \to 0} \frac{f(x + hu) - f(x)}{h}$$
$\nabla_u f(x)$ represents the rate at which $f$ increases in the direction $u$. When $u$ is the $i$-th standard unit vector $e_i$, $\nabla_u f(x) = f_i'(x)$, where
$$f_i'(x) = \frac{\partial f(x)}{\partial x_i}$$
is the partial derivative of $f(x)$ w.r.t. $x_i$.

Gradient (Contd.)

Theorem. For any $n$-dimensional vector $u$, the directional derivative of $f$ in the direction of $u$ can be represented as
$$\nabla_u f(x) = \sum_{i=1}^{n} f_i'(x) \cdot u_i$$

Gradient (Contd.)

Proof. Letting $g(h) = f(x + hu)$, we have
$$g'(0) = \lim_{h \to 0} \frac{g(h) - g(0)}{h} = \lim_{h \to 0} \frac{f(x + hu) - f(x)}{h} = \nabla_u f(x) \quad (1)$$
On the other hand, by the chain rule,
$$g'(h) = \sum_{i=1}^{n} f_i'(x + hu) \frac{d(x_i + h u_i)}{dh} = \sum_{i=1}^{n} f_i'(x + hu)\, u_i \quad (2)$$
Letting $h \to 0$ in (2) gives $g'(0) = \sum_{i=1}^{n} f_i'(x) u_i$; substituting this into (1) completes the proof.

Gradient (Contd.)

Definition (Gradient). The gradient of $f$ is the vector function $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ defined by
$$\nabla f(x) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} e_i$$
where $e_i$ is the $i$-th standard unit vector. In another simple form,
$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n} \right)^T$$

Gradient (Contd.)

$$\nabla_u f(x) = \nabla f(x) \cdot u = \|\nabla f(x)\| \, \|u\| \cos a$$
where $a$ is the angle between $\nabla f(x)$ and $u$.
Without loss of generality, assume $u$ is a unit vector; then $\nabla_u f(x) = \|\nabla f(x)\| \cos a$.
When $u \propto \nabla f(x)$ such that $a = 0$ (and thus $\cos a = 1$), we have the maximum directional derivative of $f$, which implies that $\nabla f(x)$ is the direction of steepest ascent of $f$.
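The identity $\nabla_u f(x) = \nabla f(x) \cdot u$ is easy to check numerically. A small sketch (my own; the test function $f$ is an arbitrary choice):

    import numpy as np

    def f(x):
        # An arbitrary smooth test function f : R^2 -> R.
        return x[0] ** 2 + 3.0 * x[0] * x[1]

    def grad_f(x):
        # Its analytic gradient.
        return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

    x = np.array([1.0, 2.0])
    u = np.array([0.6, 0.8])  # a unit vector
    h = 1e-6

    numeric = (f(x + h * u) - f(x)) / h  # finite-difference directional derivative
    analytic = grad_f(x) @ u             # gradient dotted with u
    print(numeric, analytic)             # the two values agree closely (both approx 7.2)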

Gradient Descent (GD) Algorithm

If the multi-variable function $J(\theta)$ is differentiable in a neighborhood of a point $\theta$, then $J(\theta)$ decreases fastest if one goes from $\theta$ in the direction of the negative gradient of $J$ at $\theta$. Gradient descent finds a local minimum of a differentiable function.

Algorithm 1 Gradient Descent
1: Given a starting point $\theta \in \operatorname{dom} J$
2: repeat
3:   Calculate the gradient $\nabla J(\theta)$
4:   Update $\theta \leftarrow \theta - \alpha \nabla J(\theta)$
5: until the convergence criterion is satisfied

$\theta$ is usually initialized randomly; $\alpha$ is the so-called learning rate.
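Algorithm 1 translates directly into Python. A minimal sketch (the gradient-norm stopping rule and the iteration cap anticipate the convergence criteria on the next slide; all names are mine):

    import numpy as np

    def gradient_descent(grad, theta0, alpha=0.1, eps=1e-6, max_iters=10000):
        """Repeat theta <- theta - alpha * grad(theta) until convergence."""
        theta = np.asarray(theta0, dtype=float)
        for _ in range(max_iters):        # fallback stopping criterion
            g = grad(theta)               # step 3: calculate the gradient
            if np.linalg.norm(g) <= eps:  # stop when the gradient is small
                break
            theta = theta - alpha * g     # step 4: move along the negative gradient
        return theta

    # Example: minimize J(theta) = (theta - 3)^2, whose gradient is 2 (theta - 3).
    print(gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=[0.0]))  # approx [3.]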

GD Algorithm (Contd.)

Stopping criteria (i.e., conditions for convergence):
- The gradient has magnitude less than or equal to a predefined threshold (say $\varepsilon$), i.e.,
$$\|\nabla f(x)\|_2 \le \varepsilon$$
where $\|\cdot\|_2$ is the $\ell_2$ norm, such that the values of the objective function differ only slightly in successive iterations.
- Set a fixed value for the maximum number of iterations, such that the algorithm terminates once the number of iterations exceeds the threshold.

GD Algorithm (Contd.)

In more detail, we update each component of $\theta$ according to the following rule:
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta), \quad j = 0, 1, \cdots, n$$
Calculating the gradient for linear regression (writing the inner sum over index $k$ to keep it distinct from $j$):
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2} \sum_{i=1}^{m} (\theta^T x^{(i)} - y^{(i)})^2 = \frac{\partial}{\partial \theta_j} \frac{1}{2} \sum_{i=1}^{m} \Big( \sum_{k=0}^{n} \theta_k x_k^{(i)} - y^{(i)} \Big)^2 = \sum_{i=1}^{m} (\theta^T x^{(i)} - y^{(i)}) \, x_j^{(i)}$$
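Combining this update rule with the housing data gives batch gradient descent for linear regression. A sketch (my own; the feature standardization and the learning rate are choices I added, since raw living-area values make the problem poorly conditioned for a fixed step size):

    import numpy as np

    living_area = np.array([2104, 1600, 2400, 1416, 3000], dtype=float)
    price = np.array([400, 330, 369, 232, 540], dtype=float)

    # Standardize the feature so a simple fixed learning rate behaves well.
    z = (living_area - living_area.mean()) / living_area.std()
    X = np.column_stack([np.ones_like(z), z])  # x0 = 1
    y = price

    theta = np.zeros(2)
    alpha = 0.1
    for _ in range(500):
        # Batch gradient sum_i (theta^T x^(i) - y^(i)) x_j^(i), vectorized as X^T (X theta - y);
        # dividing by m matches the 1/(2m)-scaled objective on a later slide.
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= alpha * grad
    print(theta)  # theta_0 approx mean(price); theta_1 is the slope in standardized units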

GD Algorithm (Contd.)

[Figure: an illustration of the gradient descent algorithm. The objective function decreases fastest along the negative gradient.]

GD Algorithm (Contd.)

Another commonly used form:
$$\min_\theta J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$
What's the difference? The factor $m$ is introduced to scale the objective function, so as to cope with differently sized training sets.

Gradient ascent algorithm:
- Maximize the differentiable function $J(\theta)$.
- The gradient represents the direction along which $J$ increases fastest.
- Therefore, we have
$$\theta_j \leftarrow \theta_j + \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

Convergence under Different Step Sizes

[Figure: objective function value vs. number of iterations for step sizes 0.6, 0.06, 0.07, and 0.071.]

Stochastic Gradient Descent (SGD)

What if the training set is huge? In the above batch gradient descent algorithm, we have to run through the entire training set in each iteration, which induces a considerable computation cost!
Stochastic gradient descent (SGD), also known as incremental gradient descent, is a stochastic approximation of the gradient descent optimization method. In each iteration, the parameters are updated according to the gradient of the error with respect to one training sample only.

Stochastic Gradient Descent (Contd.)

Algorithm 2 Stochastic Gradient Descent for Linear Regression
1: Given a starting point $\theta \in \operatorname{dom} J$
2: repeat
3:   Randomly shuffle the training data
4:   for $i = 1, 2, \cdots, m$ do
5:     $\theta \leftarrow \theta - \alpha \nabla J(\theta; x^{(i)}, y^{(i)})$
6:   end for
7: until the convergence criterion is satisfied
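Algorithm 2 in Python, reusing the standardized housing data from the batch sketch (again my own sketch; the fixed epoch count stands in for a real convergence check, and the per-sample gradient is the earlier derivation with a single term):

    import numpy as np

    rng = np.random.default_rng(0)

    living_area = np.array([2104, 1600, 2400, 1416, 3000], dtype=float)
    price = np.array([400, 330, 369, 232, 540], dtype=float)
    z = (living_area - living_area.mean()) / living_area.std()
    X = np.column_stack([np.ones_like(z), z])
    y = price

    theta = np.zeros(2)
    alpha = 0.05
    for _ in range(200):                 # epochs, in place of a convergence test
        order = rng.permutation(len(y))  # step 3: randomly shuffle the training data
        for i in order:                  # steps 4-6: update on one sample at a time
            grad_i = (X[i] @ theta - y[i]) * X[i]
            theta -= alpha * grad_i
    print(theta)  # close to the batch GD solution, up to SGD noise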

More About SGD

- The objective does not always decrease at each iteration.
- Usually, SGD brings $\theta$ close to the minimum much faster than batch GD.
- SGD may never converge to the minimum, and oscillation may happen.
- A variant: mini-batch SGD picks a small group of samples and averages their gradients, which may accelerate and smooth the convergence.

Matrix Derivatives [1]

For a function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$, the derivative of $f$ with respect to $A$ is defined as
$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix}$$
For an $n \times n$ matrix, its trace is defined as $\operatorname{tr} A = \sum_{i=1}^{n} A_{ii}$.
- $\operatorname{tr} ABCD = \operatorname{tr} DABC = \operatorname{tr} CDAB = \operatorname{tr} BCDA$
- $\operatorname{tr} A = \operatorname{tr} A^T$, $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$, $\operatorname{tr} aA = a \operatorname{tr} A$
- $\nabla_A \operatorname{tr} AB = B^T$, $\nabla_{A^T} f(A) = (\nabla_A f(A))^T$
- $\nabla_A \operatorname{tr} ABA^T C = CAB + C^T A B^T$, $\nabla_A |A| = |A| (A^{-1})^T$
- Funky trace derivative: $\nabla_{A^T} \operatorname{tr} ABA^T C = B^T A^T C^T + B A^T C$

[1] Details can be found in "Properties of the Trace and Matrix Derivatives" by John Duchi.
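These identities are easy to spot-check numerically. A sketch (mine, not from the slides) verifying $\nabla_A \operatorname{tr}(AB) = B^T$ by finite differences:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 4))
    B = rng.standard_normal((4, 3))

    def f(A):
        return np.trace(A @ B)

    # Finite-difference gradient of f with respect to A, entry by entry.
    eps = 1e-6
    num_grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            num_grad[i, j] = (f(A + E) - f(A)) / eps

    print(np.allclose(num_grad, B.T, atol=1e-4))  # True: grad_A tr(AB) = B^T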

Revisiting Least Square

Assume
$$X = \begin{bmatrix} (x^{(1)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad Y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
Therefore, we have
$$X\theta - Y = \begin{bmatrix} (x^{(1)})^T \theta - y^{(1)} \\ \vdots \\ (x^{(m)})^T \theta - y^{(m)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}$$
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 = \frac{1}{2} (X\theta - Y)^T (X\theta - Y)$$

Revisiting Least Square (Contd.)

Minimize $J(\theta) = \frac{1}{2} (Y - X\theta)^T (Y - X\theta)$. Calculate its derivative with respect to $\theta$:
$$\begin{aligned}
\nabla_\theta J(\theta) &= \frac{1}{2} \nabla_\theta (Y - X\theta)^T (Y - X\theta) \\
&= \frac{1}{2} \nabla_\theta (Y^T - \theta^T X^T)(Y - X\theta) \\
&= \frac{1}{2} \nabla_\theta \operatorname{tr}(Y^T Y - Y^T X\theta - \theta^T X^T Y + \theta^T X^T X \theta) \\
&= \frac{1}{2} \nabla_\theta \operatorname{tr}(\theta^T X^T X \theta) - X^T Y \\
&= \frac{1}{2} (X^T X \theta + X^T X \theta) - X^T Y \\
&= X^T X \theta - X^T Y
\end{aligned}$$
Setting $\nabla_\theta J(\theta) = 0$ yields the normal equations $X^T X \theta = X^T Y$.
Tip: Funky trace derivative $\nabla_{A^T} \operatorname{tr} ABA^T C = B^T A^T C^T + B A^T C$.

Revisiting Least Square (Contd.)

Theorem. The matrix $A^T A$ is invertible if and only if the columns of $A$ are linearly independent. In this case, there exists exactly one least-squares solution:
$$\theta = (X^T X)^{-1} X^T Y$$
Prove the above theorem in Problem Set 1.
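The closed-form solution on the housing data (a sketch; solving the normal equations $X^T X \theta = X^T Y$ with `np.linalg.solve` rather than forming the explicit inverse is my choice, a standard numerical practice):

    import numpy as np

    living_area = np.array([2104, 1600, 2400, 1416, 3000], dtype=float)
    price = np.array([400, 330, 369, 232, 540], dtype=float)
    X = np.column_stack([np.ones_like(living_area), living_area])
    Y = price

    theta = np.linalg.solve(X.T @ X, X.T @ Y)  # the unique least-squares solution
    print(theta)      # [theta_0, theta_1]
    print(X @ theta)  # fitted prices h_theta(x^(i))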

Probabilistic Interpretation

The target variables and the inputs are related by
$$y = \theta^T x + \epsilon$$
The $\epsilon$'s denote the errors and are independently and identically distributed (i.i.d.) according to a Gaussian distribution $\mathcal{N}(0, \sigma^2)$. The density of $\epsilon^{(i)}$ is given by
$$f(\epsilon) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\epsilon^2}{2\sigma^2} \right)$$
The conditional probability density function of $y$:
$$y \mid x; \theta \sim \mathcal{N}(\theta^T x, \sigma^2)$$

Probabilistic Interpretation (Contd.)

The training data $\{x^{(i)}, y^{(i)}\}_{i=1,\cdots,m}$ are sampled identically and independently:
$$p(y = y^{(i)} \mid x = x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$$
Likelihood function:
$$L(\theta) = \prod_i p(y^{(i)} \mid x^{(i)}; \theta) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$$

Probabilistic Interpretation (Contd.)

Maximizing the likelihood $L(\theta)$: since $L(\theta)$ is complicated, we maximize an increasing function of $L(\theta)$ instead:
$$\begin{aligned}
\ell(\theta) &= \log L(\theta) \\
&= \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right) \\
&= \sum_{i=1}^{m} \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right) \\
&= m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_i (y^{(i)} - \theta^T x^{(i)})^2
\end{aligned}$$
Apparently, maximizing $L(\theta)$ (and thus $\ell(\theta)$) is equivalent to minimizing
$$\frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$$
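A numerical illustration of this equivalence (my own sketch; the synthetic data and the value of $\sigma$ are arbitrary): for any two parameter vectors, the one with the smaller squared error always has the larger log-likelihood, since $\ell(\theta)$ is a fixed constant minus the squared error divided by $\sigma^2$.

    import numpy as np

    rng = np.random.default_rng(2)
    m = 50
    X = np.column_stack([np.ones(m), rng.standard_normal(m)])
    true_theta = np.array([1.0, 2.0])
    sigma = 0.5
    y = X @ true_theta + rng.normal(0.0, sigma, size=m)  # y = theta^T x + eps

    def squared_error(theta):
        r = y - X @ theta
        return 0.5 * np.dot(r, r)

    def log_likelihood(theta):
        # l(theta) = m log(1 / (sqrt(2 pi) sigma)) - squared_error(theta) / sigma^2
        return m * np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma)) - squared_error(theta) / sigma**2

    for theta in (np.array([0.0, 0.0]), np.array([1.0, 2.0])):
        print(theta, squared_error(theta), log_likelihood(theta))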

Thanks!

Q&A
