ECE595 / STAT598: Machine Learning I, Lecture 01: Linear Regression


ECE595 / STAT598: Machine Learning I
Lecture 01: Linear Regression
Spring 2020
Stanley Chan
School of Electrical and Computer Engineering
Purdue University
(c) Stanley Chan 2020. All Rights Reserved.


Outline

Mathematical Background
  Lecture 1: Linear regression: A basic data analytic tool
  Lecture 2: Regularization: Constraining the solution
  Lecture 3: Kernel Method: Enabling nonlinearity
Lecture 1: Linear Regression
  Linear Regression
    Notation
    Loss Function
    Solving the Regression Problem
  Geometry
    Projection
    Minimum-Norm Solution
    Pseudo-Inverse

Basic Notation

Scalar: $a, b, c \in \mathbb{R}$.
Vector: $a, b, c \in \mathbb{R}^d$.
Matrix: $A, B, C \in \mathbb{R}^{N \times d}$; entries are $a_{ij}$ or $[A]_{ij}$.
Rows and columns:
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_d \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}.$$
$\{a_j\}$: the $j$-th feature (column). $\{x_n\}$: the $n$-th sample (row).
Identity matrix $I$; all-one vector $\mathbf{1}$ and all-zero vector $\mathbf{0}$; standard basis vectors $e_i$.

[Figure, slide 5: "Line". A plot illustrating a fitted line; axis ticks from 0.5 to 1.]

Linear Regression

The problem of regression can be summarized as:
  Given measurements: $y_n$, where $n = 1, \ldots, N$.
  Given inputs: $x_n$.
  Given a model: $g_\theta(x_n)$, parameterized by $\theta$.
  Determine $\theta$ such that $y_n \approx g_\theta(x_n)$.
Linear regression is one type of regression:
  Restrict $g_\theta(\cdot)$ to a line: $g_\theta(x) = x^T \theta$.
  The inputs $x$ and the parameters $\theta$ are $x = [x_1, \ldots, x_d]^T$ and $\theta = [\theta_1, \ldots, \theta_d]^T$.
  This is equivalent to
  $$g_\theta(x) = x^T \theta = \sum_{j=1}^{d} x_j \theta_j.$$

Solving the Regression Problem

The (general) regression problem can be solved via the following logic:
  Define a (squared-error) loss function
  $$J(\theta) \overset{\text{def}}{=} \sum_{n=1}^{N} L\big(g_\theta(x_n), y_n\big) = \sum_{n=1}^{N} \big(g_\theta(x_n) - y_n\big)^2, \qquad \text{e.g., } L(a, b) = (a - b)^2.$$
  Other loss functions can be used, e.g., the absolute error $L(a, b) = |a - b|$.
  The goal is to solve the optimization
  $$\hat{\theta} = \underset{\theta}{\operatorname{argmin}} \; J(\theta).$$
  The prediction for a new input $x_{\text{new}}$ is $y_{\text{new}} = g_{\hat{\theta}}(x_{\text{new}}) = \hat{\theta}^T x_{\text{new}}$.
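
To make the loss and the prediction step concrete, here is a minimal NumPy sketch; the function names squared_error_loss and predict are illustrative (not from the slides), and X stacks the samples $x_n$ as rows.

```python
import numpy as np

def squared_error_loss(theta, X, y):
    # J(theta) = sum_n (g_theta(x_n) - y_n)^2 with the linear model g_theta(x) = x^T theta
    residuals = X @ theta - y          # shape (N,)
    return np.sum(residuals ** 2)

def predict(theta_hat, x_new):
    # y_new = theta_hat^T x_new for a single new input
    return x_new @ theta_hat
```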

Linear Regression Solution

The linear regression problem is a special case which we can solve analytically.
  Restrict $g_\theta(\cdot)$ to a line: $g_\theta(x_n) = \theta^T x_n$.
  Then the loss function becomes
  $$J(\theta) = \sum_{n=1}^{N} (\theta^T x_n - y_n)^2 = \|A\theta - y\|^2.$$
  The matrix and vectors are defined as
  $$A = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}.$$
  $\|\cdot\|^2$ stands for the $\ell_2$-norm squared. See Tutorial on Linear Algebra.

Linear Regression Solution

Theorem. For the linear regression problem
$$\hat{\theta} = \underset{\theta}{\operatorname{argmin}} \; J(\theta) \overset{\text{def}}{=} \|A\theta - y\|^2,$$
the minimizer is
$$\hat{\theta} = (A^T A)^{-1} A^T y.$$
Proof: take the derivative and set it to zero (see Tutorial on "Linear Algebra"):
$$\nabla_\theta J(\theta) = \nabla_\theta \|A\theta - y\|^2 = 2A^T(A\theta - y) = 0.$$
So the solution is $\hat{\theta} = (A^T A)^{-1} A^T y$, assuming $A^T A$ is invertible.
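
A minimal NumPy sketch of this closed-form solution; the function name fit_linear_regression and the synthetic data below are illustrative, and np.linalg.solve is used rather than forming the inverse explicitly.

```python
import numpy as np

def fit_linear_regression(A, y):
    # theta_hat = (A^T A)^{-1} A^T y, assuming A^T A is invertible
    return np.linalg.solve(A.T @ A, A.T @ y)

# Tiny synthetic check: recover a known theta from noisy measurements
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = A @ theta_true + 0.1 * rng.normal(size=100)
theta_hat = fit_linear_regression(A, y)   # close to theta_true
```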

Examples

Example 1: Second-order polynomial fitting, $y_n = a x_n^2 + b x_n + c$:
$$A = \begin{bmatrix} x_1^2 & x_1 & 1 \\ \vdots & \vdots & \vdots \\ x_N^2 & x_N & 1 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}, \qquad \theta = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.$$

Example 2: Auto-regression, $y_n = a y_{n-1} + b y_{n-2}$:
$$A = \begin{bmatrix} y_2 & y_1 \\ y_3 & y_2 \\ \vdots & \vdots \\ y_{N-1} & y_{N-2} \end{bmatrix}, \qquad y = \begin{bmatrix} y_3 \\ y_4 \\ \vdots \\ y_N \end{bmatrix}, \qquad \theta = \begin{bmatrix} a \\ b \end{bmatrix}.$$
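
Both examples reduce to building the right matrix $A$ and then applying the same least-squares formula. A sketch of the two design matrices, with illustrative helper names (poly2_design_matrix, ar2_design_matrix):

```python
import numpy as np

def poly2_design_matrix(x):
    # Rows [x_n^2, x_n, 1] for the fit y_n ~ a x_n^2 + b x_n + c
    x = np.asarray(x, dtype=float)
    return np.stack([x**2, x, np.ones_like(x)], axis=1)

def ar2_design_matrix(y):
    # Rows [y_{n-1}, y_{n-2}] and targets y_n for y_n ~ a y_{n-1} + b y_{n-2}
    y = np.asarray(y, dtype=float)
    A = np.stack([y[1:-1], y[:-2]], axis=1)   # columns: y_2..y_{N-1} and y_1..y_{N-2}
    targets = y[2:]                           # y_3, ..., y_N
    return A, targets
```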

Generalized Linear Regression

Example 1: Fourier series. The feature vector at time $t_n$ is
$$x_n = \begin{bmatrix} x_1^n \\ x_2^n \\ \vdots \\ x_d^n \end{bmatrix} = \begin{bmatrix} \sin(\omega_0 t_n) \\ \sin(2\omega_0 t_n) \\ \vdots \\ \sin(K\omega_0 t_n) \end{bmatrix},$$
so that
$$y_n = \theta^T x_n = \sum_{k=1}^{K} \theta_k \sin(k\omega_0 t_n),$$
where $\theta_k$ is the $k$-th Fourier coefficient and $\sin(k\omega_0 t_n)$ is the $k$-th Fourier basis at time $t_n$.
[Figures: plots of the individual Fourier basis functions over $t \in [0, 10]$.]
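
The regression is still linear in $\theta$; only the features are nonlinear in $t$. A small sketch of this feature map (fourier_design_matrix is an illustrative name):

```python
import numpy as np

def fourier_design_matrix(t, K, omega0):
    # Row n is x_n = [sin(omega0 t_n), sin(2 omega0 t_n), ..., sin(K omega0 t_n)]
    t = np.asarray(t, dtype=float).reshape(-1, 1)   # (N, 1)
    k = np.arange(1, K + 1).reshape(1, -1)          # (1, K)
    return np.sin(k * omega0 * t)                   # broadcasts to (N, K)

# The Fourier coefficients are then an ordinary least-squares fit on this matrix:
# theta_hat = np.linalg.lstsq(fourier_design_matrix(t, K, omega0), y, rcond=None)[0]
```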

Outline (recap). The remainder of the lecture covers the geometry of linear regression: projection, the minimum-norm solution, and the pseudo-inverse.

Linear Span

Given a set of vectors $\{a_1, \ldots, a_d\}$, the span is the set of all possible linear combinations of these vectors:
$$\operatorname{span}\{a_1, \ldots, a_d\} = \Big\{ z \;\Big|\; z = \sum_{j=1}^{d} \alpha_j a_j \Big\}. \tag{1}$$
Which of the following sets of vectors can span $\mathbb{R}^3$?
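
A quick way to answer such questions numerically is a rank check: a set of vectors spans $\mathbb{R}^3$ exactly when the matrix holding them as columns has rank 3. A minimal sketch (spans_R3 is an illustrative name):

```python
import numpy as np

def spans_R3(vectors):
    # A set of vectors spans R^3 iff the matrix with those vectors as columns has rank 3
    V = np.column_stack(vectors)
    return np.linalg.matrix_rank(V) == 3

print(spans_R3([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # True: the standard basis
print(spans_R3([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))   # False: all three lie in one plane
```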

Geometry of Linear Regression

Given $\theta$, the product $A\theta$ can be viewed as
$$A\theta = \begin{bmatrix} a_1 & a_2 & \cdots & a_d \end{bmatrix} \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} = \sum_{j=1}^{d} \theta_j a_j.$$
So the set of all possible $A\theta$'s is equivalent to $\operatorname{span}\{a_1, \ldots, a_d\}$. Define the range of $A$ as $\mathcal{R}(A) = \{\hat{y} \mid \hat{y} = A\theta\}$. Note that in general $y \notin \mathcal{R}(A)$.

Orthogonality Principle

Consider the error $e = y - A\theta$.
For the error to be minimized, it must be orthogonal to $\mathcal{R}(A)$, which is the span of the columns of $A$.
This orthogonality principle means that $a_j^T e = 0$ for all $j = 1, \ldots, d$, which implies $A^T e = 0$.
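
A quick numerical check of the orthogonality principle at the least-squares solution; the random data here is only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 4))
y = rng.normal(size=50)

theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
e = y - A @ theta_hat
print(np.allclose(A.T @ e, 0, atol=1e-10))          # True: residual is orthogonal to every column of A
```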

Normal Equation

The orthogonality principle, which states that $A^T e = 0$, implies that $A^T(y - A\theta) = 0$ by substituting $e = y - A\theta$.
This is called the normal equation:
$$A^T A \theta = A^T y. \tag{2}$$
The predicted value is
$$\hat{y} = A\hat{\theta} = A(A^T A)^{-1} A^T y.$$
The matrix $P \overset{\text{def}}{=} A(A^T A)^{-1} A^T$ is a projection onto the span of $\{a_1, \ldots, a_d\}$, i.e., the range of $A$.
$P$ is called the projection matrix. It holds that $PP = P$.
The error $e = y - \hat{y}$ is
$$e = y - A(A^T A)^{-1} A^T y = (I - P)y.$$
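
A small sketch verifying the stated properties of $P$ numerically (the random matrix is only an illustration; any tall, full-column-rank $A$ works):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 5))                 # tall, full column rank (with probability 1)

P = A @ np.linalg.solve(A.T @ A, A.T)        # P = A (A^T A)^{-1} A^T
print(np.allclose(P @ P, P))                 # True: projecting twice changes nothing
print(np.allclose((np.eye(20) - P) @ A, 0))  # True: the error (I - P) y has no component in R(A)
```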

Over-determined and Under-determined Systems

Assume $A$ has full column rank in the over-determined case (and full row rank in the under-determined case).
  Over-determined $A$ (tall and skinny): $\hat{\theta} = (A^T A)^{-1} A^T y$.
  Under-determined $A$ (fat and short): $\hat{\theta} = A^T (A A^T)^{-1} y$.
If $A$ is under-determined, then there exists a non-trivial null space
$$\mathcal{N}(A) = \{\theta \mid A\theta = 0\}.$$
This implies that if $\hat{\theta}$ is a solution, then $\hat{\theta} + \theta_0$ is also a solution as long as $\theta_0 \in \mathcal{N}(A)$. (Why?)
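
A sketch of the non-uniqueness claim: add any null-space direction to a solution and the equations are still satisfied. The example data is illustrative, and the null-space direction is read off from the SVD.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 6))                      # fat: 3 equations, 6 unknowns
y = rng.normal(size=3)

theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)    # one solution of A theta = y

# Rows of Vt associated with zero singular values span N(A); pick one such direction
_, _, Vt = np.linalg.svd(A)
theta_0 = Vt[-1]                                     # A @ theta_0 is (numerically) zero
print(np.allclose(A @ (theta_hat + 5.0 * theta_0), y))   # True: still a solution
```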

Minimum-Norm Solution

Assume $A$ is fat and has full row rank.
Since $A$ is fat, there exist infinitely many $\hat{\theta}$ such that $A\hat{\theta} = y$. So we need to pick one in order to make the solution unique.
It turns out that $\hat{\theta} = A^T(AA^T)^{-1} y$ is the solution to
$$\hat{\theta} = \underset{\theta}{\operatorname{argmin}} \; \|\theta\|^2 \quad \text{subject to} \quad A\theta = y. \tag{3}$$
(You can solve this problem using a Lagrange multiplier. See Appendix.)
This is called the minimum-norm solution.
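
A minimal sketch of the minimum-norm formula, checked against np.linalg.lstsq, which also returns the minimum-norm solution for under-determined systems; the data is illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 10))                     # fat, full row rank (with probability 1)
y = rng.normal(size=4)

theta_mn = A.T @ np.linalg.solve(A @ A.T, y)     # theta = A^T (A A^T)^{-1} y
print(np.allclose(A @ theta_mn, y))              # True: the constraint A theta = y holds

theta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(theta_mn, theta_lstsq))        # True: same minimum-norm solution
```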

What if $A$ is Rank-Deficient?

If $A$ is rank-deficient, then $A^T A$ is not invertible.
  Approach 1: Regularization. See Lecture 2.
  Approach 2: Pseudo-inverse. Decompose $A = USV^T$, where $U \in \mathbb{R}^{N \times N}$ with $U^T U = I$, and $V \in \mathbb{R}^{d \times d}$ with $V^T V = I$. The diagonal block of $S \in \mathbb{R}^{N \times d}$ is $\operatorname{diag}\{s_1, \ldots, s_r, 0, \ldots, 0\}$.
The resulting solution is called the pseudo-inverse solution:
$$\hat{\theta} = V S^{+} U^T y, \tag{4}$$
where $S^{+} = \operatorname{diag}\{1/s_1, \ldots, 1/s_r, 0, \ldots, 0\}$.
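
A sketch of the pseudo-inverse solution built directly from the SVD and compared against np.linalg.pinv; the helper name pinv_solve, the tolerance, and the rank-deficient example are illustrative.

```python
import numpy as np

def pinv_solve(A, y, tol=1e-10):
    # theta = V S^+ U^T y, inverting only the (numerically) nonzero singular values
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.zeros_like(s)
    nonzero = s > tol * s.max()
    s_plus[nonzero] = 1.0 / s[nonzero]
    return Vt.T @ (s_plus * (U.T @ y))

# Rank-deficient example: the 4th column equals the 1st plus the 2nd
rng = np.random.default_rng(5)
B = rng.normal(size=(8, 3))
A = np.column_stack([B, B[:, 0] + B[:, 1]])
y = rng.normal(size=8)
print(np.allclose(pinv_solve(A, y), np.linalg.pinv(A) @ y))   # True
```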

Reading List

Linear Algebra:
  Gilbert Strang, Linear Algebra and Its Applications, 5th Edition.
  Carl Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000.
  Univ. Waterloo Matrix Cookbook: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf

Linear Regression:
  Stanford CS 229, note on linear algebra (cs229-linalg.pdf).
  The Elements of Statistical Learning, Chapter 3.2: https://web.stanford.edu/~hastie/ElemStatLearn/
  Learning from Data, Chapter 3.2: https://work.caltech.edu/telecourse

Appendix

Solving the Minimum-Norm Problem

Consider the problem
$$\hat{\theta} = \underset{\theta}{\operatorname{argmin}} \; \|\theta\|^2 \quad \text{subject to} \quad A\theta = y. \tag{5}$$
The Lagrangian is
$$L(\theta, \lambda) = \|\theta\|^2 + \lambda^T (A\theta - y).$$
Taking derivatives with respect to $\theta$ and $\lambda$ yields
$$\nabla_\theta L = 2\theta + A^T \lambda = 0, \qquad \nabla_\lambda L = A\theta - y = 0.$$
The first equation gives $\theta = -A^T \lambda / 2$. Substituting into the second equation yields $\lambda = -2(AA^T)^{-1} y$. Therefore, $\theta = A^T (AA^T)^{-1} y$.

