LINEAR REGRESSION
J. Elder
CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Credits
Probability & Bayesian Inference
Some of these slides were sourced and/or modified from: Christopher Bishop, Microsoft UK

Linear Regression Topics
What is linear regression?
Example: polynomial curve fitting
Other basis families
Solving linear regression problems
Regularized regression
Multiple linear regression
Bayesian linear regression

What is Linear Regression?
In classification, we seek to identify the categorical class $C_k$ associated with a given input vector x.
In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
y is called the dependent variable.
x is called the independent variable.
If y is a vector, we call this multiple regression.
We will focus on the case where y is a scalar.
Notation:
y will denote the continuous model of the dependent variable.
t will denote discrete noisy observations of the dependent variable (sometimes called the target variable).

Where is the Linear in Linear Regression?
In regression we assume that y is a function of x. The exact nature of this function is governed by an unknown parameter vector w:
$y = y(\mathbf{x}, \mathbf{w})$
The regression is linear if y is linear in w. In other words, we can express y as
$y = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$
where $\boldsymbol{\phi}(\mathbf{x})$ is some (potentially nonlinear) function of x.

Linear Basis Function Models
Generally
$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$
where $\phi_j(\mathbf{x})$ are known as basis functions.
Typically, $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias.
In the simplest case, we use linear basis functions: $\phi_d(\mathbf{x}) = x_d$.
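
A minimal sketch of the point made so far: the model may be nonlinear in x, yet it is linear in w. The particular basis functions chosen here (a constant, the identity and a sine) and the names `basis` and `y` are illustrative assumptions, not taken from the slides.

```python
import math

# Hypothetical basis: phi_0(x) = 1 (bias), phi_1(x) = x, phi_2(x) = sin(x).
basis = [lambda x: 1.0, lambda x: x, lambda x: math.sin(x)]

def y(x, w):
    """Evaluate y(x, w) = sum_j w_j * phi_j(x) for a scalar input x."""
    return sum(w_j * phi_j(x) for w_j, phi_j in zip(w, basis))

# Linearity in w: y(x, w1 + w2) equals y(x, w1) + y(x, w2),
# even though y is not a linear function of x.
w1 = [1.0, 2.0, 3.0]
w2 = [0.5, -1.0, 2.0]
w_sum = [a + b for a, b in zip(w1, w2)]
x = 0.7
print(y(x, w_sum), y(x, w1) + y(x, w2))
```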

Example: Polynomial Bases
Polynomial basis functions:
$\phi_j(x) = x^j$
These are global:
a small change in x affects all basis functions.
A small change in a basis function affects y for all x.

Example: Polynomial Curve Fitting

Sum-of-Squares Error Function
$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2$

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Regularization
Penalize large coefficient values:
$\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2$
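
As a rough illustration, the sum-of-squares error and its regularized version can be evaluated for a polynomial model as below. The function names, the toy data, and the choice to penalize every coefficient including $w_0$ are assumptions made for this sketch.

```python
def poly(x, w):
    """Polynomial model y(x, w) = sum_j w_j * x**j."""
    return sum(w_j * x ** j for j, w_j in enumerate(w))

def sum_of_squares(w, xs, ts):
    """Data term: half the summed squared deviation between model and targets."""
    return 0.5 * sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))

def regularized_error(w, xs, ts, lam):
    """Data term plus a quadratic penalty (lam / 2) * ||w||^2 on all coefficients."""
    return sum_of_squares(w, xs, ts) + 0.5 * lam * sum(w_j ** 2 for w_j in w)

# Toy data lying exactly on t = 1 + 2x, so the data term vanishes for w = [1, 2]
# and only the penalty remains.
xs = [0.0, 1.0, 2.0]
ts = [1.0, 3.0, 5.0]
w = [1.0, 2.0]
print(sum_of_squares(w, xs, ts), regularized_error(w, xs, ts, lam=0.1))
```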

Regularization
9th Order Polynomial

Regularization
9th Order Polynomial

Regularization
9th Order Polynomial

Probabilistic View of Curve Fitting
Why least squares?
Model noise (deviation of data from model) as Gaussian i.i.d.:
$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\left(t \mid y(x, \mathbf{w}), \beta^{-1}\right)$
where $\beta = 1/\sigma^2$ is the precision of the noise.

Maximum Likelihood
We determine $\mathbf{w}_{ML}$ by minimizing the squared error $E(\mathbf{w})$.
Thus least-squares regression reflects an assumption that the noise is i.i.d. Gaussian.

Maximum Likelihood
We determine $\mathbf{w}_{ML}$ by minimizing the squared error $E(\mathbf{w})$.
Now given $\mathbf{w}_{ML}$, we can estimate the variance of the noise:
$\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}_{ML}) - t_n \right)^2$
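
A small sketch of the two-step recipe, assuming a first-order polynomial so that the least-squares weights have a simple closed form. The maximum likelihood noise variance is then the mean squared residual. The function names and the four data points are made up for illustration.

```python
def fit_line(xs, ts):
    """Least-squares weights (w0, w1) for the model y = w0 + w1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    t_bar = sum(ts) / n
    s_xt = sum((x - x_bar) * (t - t_bar) for x, t in zip(xs, ts))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    w1 = s_xt / s_xx
    w0 = t_bar - w1 * x_bar
    return w0, w1

def noise_variance_ml(xs, ts, w0, w1):
    """Maximum likelihood noise variance 1/beta: the mean squared residual."""
    return sum((w0 + w1 * x - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0]
ts = [0.1, 0.9, 2.1, 2.9]
w0, w1 = fit_line(xs, ts)
variance = noise_variance_ml(xs, ts, w0, w1)
print(w0, w1, variance, 1.0 / variance)  # weights, 1/beta_ML, beta_ML
```

Note that this estimate divides by N rather than by N minus the number of parameters, so it is the maximum likelihood estimate and not the unbiased one.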

Predictive Distribution
[Figure legend: generating function; observed data; maximum likelihood prediction; posterior over t]

MAP: A Step towards Bayes
Prior knowledge about probable values of w can be incorporated into the regression:
$p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} I)$
Now the posterior over w is proportional to the product of the likelihood and the prior:
$p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) \, p(\mathbf{w} \mid \alpha)$
The result is to introduce a new quadratic term in w into the error function to be minimized:
$\frac{\beta}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 + \frac{\alpha}{2} \mathbf{w}^T \mathbf{w}$
Thus regularized (ridge) regression reflects a 0-mean isotropic Gaussian prior on the weights.

Gaussian Bases
Gaussian basis functions:
$\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right)$
Think of these as interpolation functions.
These are local:
a small change in x affects only nearby basis functions.
A small change in a basis function affects y only for nearby x.
$\mu_j$ and s control location and scale (width).
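
The locality of Gaussian bases can be seen numerically. This is a minimal sketch; the five centres and the width s = 0.1 are arbitrary values chosen for illustration.

```python
import math

def gaussian_basis(x, mu, s):
    """Gaussian basis function exp(-(x - mu)^2 / (2 s^2))."""
    return math.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

centres = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed locations mu_j
s = 0.1                                # assumed common width

# At x = 0.5 the basis function centred at 0.5 is fully active, its neighbours
# respond weakly, and the ones centred at 0 and 1 are effectively zero.
for mu in centres:
    print(mu, gaussian_basis(0.5, mu, s))
```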

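Maximum Likelihood and Linear Least Squares
Assume observations from a deterministic function with added Gaussian noise:
$t = y(\mathbf{x}, \mathbf{w}) + \epsilon$, where $p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})$,
which is the same as saying
$p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\left(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}\right)$.
Given observed inputs, $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, and targets, $\mathbf{t} = [t_1, \ldots, t_N]^T$, we obtain the likelihood function
$p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1}\right)$.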

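Maximum Likelihood and Linear Least Squares
Taking the logarithm, we get
$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(\mathbf{w})$
where
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \right)^2$
is the sum-of-squares error.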

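Maximum Likelihood and Least Squares
Computing the gradient and setting it to zero yields
$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \left( t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \right) \boldsymbol{\phi}(\mathbf{x}_n)^T = 0$.
Solving for w, we get
$\mathbf{w}_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t} = \Phi^{\dagger} \mathbf{t}$
where $\Phi$ is the $N \times M$ design matrix with elements $\Phi_{nj} = \phi_j(\mathbf{x}_n)$, and $\Phi^{\dagger} = (\Phi^T \Phi)^{-1} \Phi^T$ is the Moore-Penrose pseudo-inverse.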
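
The maximum likelihood weights solve the normal equations $(\Phi^T \Phi)\,\mathbf{w} = \Phi^T \mathbf{t}$. The sketch below forms these equations for a polynomial basis and solves them with a small Gaussian elimination routine, using only the Python standard library. The helper names (`solve`, `fit_ml`) and the test data are assumptions for illustration; in practice a numerical library's least-squares routine would normally be preferred.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_ml(xs, ts, n_basis):
    """Least-squares weights for the polynomial basis phi_j(x) = x**j."""
    Phi = [[x ** j for j in range(n_basis)] for x in xs]  # design matrix
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(n_basis)]
         for i in range(n_basis)]                         # Phi^T Phi
    b = [sum(row[i] * t for row, t in zip(Phi, ts))
         for i in range(n_basis)]                         # Phi^T t
    return solve(A, b)

# Noise-free samples of t = 1 + 2x - x^2; the fit should recover these weights.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 2.0, 1.0, -2.0]
w_ml = fit_ml(xs, ts, 3)
print(w_ml)
```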

End of Lecture 8

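Regularized Least Squares
Consider the error function:
$E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$ (data term + regularization term)
With the sum-of-squares error function and a quadratic regularizer, we get
$\frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$
which is minimized by
$\mathbf{w} = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$.
λ is called the regularization coefficient. It is added along the diagonal of $\Phi^T \Phi$, thus the name 'ridge regression'.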
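
The closed-form minimizer follows in a few steps from setting the gradient of the regularized error to zero. The derivation below is a sketch in matrix notation, where $\Phi$ denotes the design matrix whose rows are $\boldsymbol{\phi}(\mathbf{x}_n)^T$ and $\mathbf{t}$ the vector of targets.

```latex
\begin{aligned}
E(\mathbf{w}) &= \tfrac{1}{2}\,\|\mathbf{t} - \Phi\mathbf{w}\|^2 + \tfrac{\lambda}{2}\,\mathbf{w}^T\mathbf{w} \\
\nabla_{\mathbf{w}} E &= -\Phi^T(\mathbf{t} - \Phi\mathbf{w}) + \lambda\mathbf{w} = 0 \\
(\lambda I + \Phi^T\Phi)\,\mathbf{w} &= \Phi^T\mathbf{t} \\
\mathbf{w} &= (\lambda I + \Phi^T\Phi)^{-1}\Phi^T\mathbf{t}
\end{aligned}
```

Setting $\lambda = 0$ recovers the unregularized least-squares solution $(\Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$.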

Regularized Least Squares
With a more general regularizer, we have
$\frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q$
q = 1: Lasso (Least absolute shrinkage and selection operator)
q = 2: Quadratic

Regularized Least Squares
Lasso generates sparse solutions.
[Figure: iso-contours of the data term $E_D(\mathbf{w})$ and an iso-contour of the regularization term $E_W(\mathbf{w})$, for the quadratic and lasso regularizers]

Solving Regularized Systems
Quadratic regularization has the advantage that the solution is closed form.
Non-quadratic regularizers generally do not have closed form solutions.
Lasso can be framed as minimizing a quadratic error with linear constraints, and thus represents a convex optimization problem that can be solved by quadratic programming or other convex optimization methods.
We will discuss quadratic programming when we cover SVMs.
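
As one example of the "other convex optimization methods", the sketch below uses cyclic coordinate descent with soft-thresholding. This is a swapped-in technique, not the quadratic programming approach the slides refer to. It minimizes $\frac{1}{2}\sum_n (t_n - \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n))^2 + \frac{\lambda}{2}\sum_j |w_j|$. The function names, the fixed number of sweeps and the toy design matrix are assumptions, and every column of the design matrix is assumed to be non-zero.

```python
def soft_threshold(r, k):
    """Shrink r towards zero by k, returning exactly 0 when |r| <= k."""
    if r > k:
        return r - k
    if r < -k:
        return r + k
    return 0.0

def lasso_cd(Phi, ts, lam, sweeps=200):
    """Cyclic coordinate descent for the lasso objective stated above."""
    n_w = len(Phi[0])
    w = [0.0] * n_w
    for _ in range(sweeps):
        for j in range(n_w):
            # Correlation of column j with the residual that excludes w_j.
            r_j = sum(row[j] * (t - sum(w[k] * row[k] for k in range(n_w) if k != j))
                      for row, t in zip(Phi, ts))
            z_j = sum(row[j] ** 2 for row in Phi)  # squared norm of column j
            w[j] = soft_threshold(r_j, lam / 2.0) / z_j
    return w

# Toy design with two orthogonal columns; least squares gives w = [2.0, 0.1].
Phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
ts = [2.0, 0.1, 2.0, 0.1]
w_ls = lasso_cd(Phi, ts, lam=0.0)
w_lasso = lasso_cd(Phi, ts, lam=1.0)
print(w_ls, w_lasso)
```

With the penalty switched on, the small second weight is set exactly to zero while the large first weight is only shrunk, which is the sparsity behaviour attributed to the lasso.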

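Multiple Outputs
Analogous to the single output case we have:
$p(\mathbf{t} \mid \mathbf{x}, W, \beta) = \mathcal{N}\left(\mathbf{t} \mid \mathbf{y}(\mathbf{x}, W), \beta^{-1} I\right) = \mathcal{N}\left(\mathbf{t} \mid W^T \boldsymbol{\phi}(\mathbf{x}), \beta^{-1} I\right)$
Given observed inputs, $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, and targets, $T = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^T$, we obtain the log likelihood function
$\ln p(T \mid X, W, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}\left(\mathbf{t}_n \mid W^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} I\right) = \frac{NK}{2} \ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2} \sum_{n=1}^{N} \left\| \mathbf{t}_n - W^T \boldsymbol{\phi}(\mathbf{x}_n) \right\|^2$
where K is the number of output dimensions.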

Multiple Outputs
Maximizing with respect to W, we obtain
$W_{ML} = (\Phi^T \Phi)^{-1} \Phi^T T$.
If we consider a single target variable, $t_k$, we see that
$\mathbf{w}_k = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}_k = \Phi^{\dagger} \mathbf{t}_k$
where $\mathbf{t}_k = [t_{1k}, \ldots, t_{Nk}]^T$, which is identical with the single output case.

Some Useful MATLAB Functions
polyfit: Least-squares fit of a polynomial of specified order to given data
regress: More general function that computes linear weights for least-squares fit

Bayesian Linear Regression
Rev. Thomas Bayes, 1702 - 1761

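Bayesian Linear Regression
Define a conjugate prior over w:
$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, S_0)$
Combining this with the likelihood function and using results for marginal and conditional Gaussian distributions, gives the posterior
$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, S_N)$
where
$\mathbf{m}_N = S_N \left( S_0^{-1} \mathbf{m}_0 + \beta \Phi^T \mathbf{t} \right)$
$S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi$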

Bayesian Linear Regression
A common choice for the prior is
$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} I)$
for which
$\mathbf{m}_N = \beta S_N \Phi^T \mathbf{t}$
$S_N^{-1} = \alpha I + \beta \Phi^T \Phi$
Thus $\mathbf{m}_N$ represents the ridge regression solution with $\lambda = \alpha / \beta$.
Next we consider an example.
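
The link to ridge regression can be checked with a short calculation. The sketch below assumes the zero-mean isotropic prior $\mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} I)$, noise precision $\beta$, and the ridge solution written as $(\lambda I + \Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$.

```latex
\begin{aligned}
\mathbf{m}_N &= \beta\,S_N\,\Phi^T\mathbf{t}, \qquad S_N^{-1} = \alpha I + \beta\,\Phi^T\Phi \\
 &= \beta\,(\alpha I + \beta\,\Phi^T\Phi)^{-1}\Phi^T\mathbf{t} \\
 &= \left(\tfrac{\alpha}{\beta}\,I + \Phi^T\Phi\right)^{-1}\Phi^T\mathbf{t}
\end{aligned}
```

Comparing the last line with the ridge solution gives $\lambda = \alpha/\beta$: a tighter prior (larger $\alpha$) or noisier data (smaller $\beta$) both correspond to stronger regularization.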

Bayesian Linear Regression
0 data points observed
[Figure panels: prior; data space]

Bayesian Linear Regression
1 data point observed
[Figure panels: likelihood for $(x_1, t_1)$; posterior; data space]

Bayesian Linear Regression
2 data points observed
[Figure panels: likelihood for $(x_2, t_2)$; posterior; data space]

Bayesian Linear Regression
20 data points observed
[Figure panels: likelihood for $(x_{20}, t_{20})$; posterior; data space]

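Predictive Distribution
Predict t for new values of x by integrating over w:
$p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{w}, \beta) \, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta) \, d\mathbf{w} = \mathcal{N}\left(t \mid \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x})\right)$
where
$\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^T S_N \boldsymbol{\phi}(\mathbf{x})$.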
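
A minimal sketch of the predictive mean and variance for the straight-line model $\boldsymbol{\phi}(x) = [1, x]^T$ with a zero-mean isotropic prior of precision $\alpha$, so that $S_N$ is a 2 x 2 matrix that can be inverted by hand. The function names, the values $\alpha = 2$ and $\beta = 25$, and the three data points are assumptions chosen for illustration.

```python
def posterior(xs, ts, alpha, beta):
    """Posterior N(w | m_N, S_N) for phi(x) = [1, x] and prior N(w | 0, I / alpha)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    st = sum(ts)
    sxt = sum(x * t for x, t in zip(xs, ts))
    # S_N^{-1} = alpha * I + beta * Phi^T Phi, written out element by element.
    a = alpha + beta * n
    b = beta * sx
    d = alpha + beta * sxx
    det = a * d - b * b
    S = [[d / det, -b / det], [-b / det, a / det]]
    # m_N = beta * S_N * Phi^T t
    m = [beta * (S[0][0] * st + S[0][1] * sxt),
         beta * (S[1][0] * st + S[1][1] * sxt)]
    return m, S

def predict(x, m, S, beta):
    """Predictive mean m_N^T phi(x) and variance 1/beta + phi(x)^T S_N phi(x)."""
    mean = m[0] + m[1] * x
    var = 1.0 / beta + S[0][0] + 2.0 * S[0][1] * x + S[1][1] * x * x
    return mean, var

alpha, beta = 2.0, 25.0
m_prior, S_prior = posterior([], [], alpha, beta)        # no data: the prior itself
xs = [-1.0, 0.0, 1.0]
ts = [-0.8, -0.3, 0.2]                                   # samples of t = -0.3 + 0.5 x
m_post, S_post = posterior(xs, ts, alpha, beta)
mean0, var0 = predict(0.0, m_prior, S_prior, beta)
mean1, var1 = predict(0.0, m_post, S_post, beta)
print(var0, var1, 1.0 / beta)
```

The predictive variance shrinks as data arrive but never falls below the noise floor $1/\beta$.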

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 1 data point
Notice how much bigger our uncertainty is relative to the ML method!
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of $y(x, \mathbf{w})$]

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 2 data points
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of $y(x, \mathbf{w})$]

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 4 data points
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of $y(x, \mathbf{w})$]

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 25 data points
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of $y(x, \mathbf{w})$]

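Equivalent Kernel
The predictive mean can be written
$y(\mathbf{x}, \mathbf{m}_N) = \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}) = \beta \boldsymbol{\phi}(\mathbf{x})^T S_N \Phi^T \mathbf{t} = \sum_{n=1}^{N} k(\mathbf{x}, \mathbf{x}_n) \, t_n$
where $k(\mathbf{x}, \mathbf{x}') = \beta \boldsymbol{\phi}(\mathbf{x})^T S_N \boldsymbol{\phi}(\mathbf{x}')$ is the equivalent kernel or smoother matrix.
This is a weighted sum of the training data target values, $t_n$.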
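
The identity between the two forms of the predictive mean can be checked numerically. This sketch again assumes the straight-line basis $\boldsymbol{\phi}(x) = [1, x]^T$ with a zero-mean isotropic prior, and defines the kernel as $k(x, x') = \beta \boldsymbol{\phi}(x)^T S_N \boldsymbol{\phi}(x')$. The names and data are illustrative, and the sketch only verifies the weighted-sum form; it does not illustrate how the weights vary with distance.

```python
def phi(x):
    """Assumed basis vector for a straight-line model."""
    return [1.0, x]

def posterior_cov(xs, alpha, beta):
    """S_N = (alpha * I + beta * Phi^T Phi)^{-1} for the 2-parameter model."""
    a = alpha + beta * len(xs)
    b = beta * sum(xs)
    d = alpha + beta * sum(x * x for x in xs)
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]

def kernel(x, x_prime, S, beta):
    """Equivalent kernel k(x, x') = beta * phi(x)^T S_N phi(x')."""
    p, q = phi(x), phi(x_prime)
    return beta * sum(p[i] * S[i][j] * q[j] for i in range(2) for j in range(2))

def predictive_mean(x, xs, ts, S, beta):
    """Predictive mean as a weighted sum of the training targets t_n."""
    return sum(kernel(x, x_n, S, beta) * t_n for x_n, t_n in zip(xs, ts))

alpha, beta = 2.0, 25.0
xs = [-1.0, -0.2, 0.4, 1.0]
ts = [-0.7, -0.5, -0.1, 0.3]
S = posterior_cov(xs, alpha, beta)

# Direct form: m_N = beta * S_N * Phi^T t, then m_N^T phi(x).
st = sum(ts)
sxt = sum(x * t for x, t in zip(xs, ts))
m = [beta * (S[0][0] * st + S[0][1] * sxt), beta * (S[1][0] * st + S[1][1] * sxt)]
x_new = 0.5
print(predictive_mean(x_new, xs, ts, S, beta), m[0] + m[1] * x_new)
```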

Equivalent Kernel
Weight of $t_n$ depends on distance between x and $x_n$; nearby $x_n$ carry more weight.
