LINEAR REGRESSION - York University


LINEAR REGRESSION
J. Elder
CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Credits
Some of these slides were sourced and/or modified from: Christopher Bishop, Microsoft UK

Linear Regression Topics
What is linear regression?
Example: polynomial curve fitting
Other basis families
Solving linear regression problems
Regularized regression
Multiple linear regression
Bayesian linear regression

What is Linear Regression?
In classification, we seek to identify the categorical class C_k associated with a given input vector x.
In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
y is called the dependent variable.
x is called the independent variable.
If y is a vector, we call this multiple regression.
We will focus on the case where y is a scalar.
Notation:
y will denote the continuous model of the dependent variable.
t will denote discrete noisy observations of the dependent variable (sometimes called the target variable).

Where is the Linear in Linear Regression?
In regression we assume that y is a function of x. The exact nature of this function is governed by an unknown parameter vector w:
$$y = y(x, w)$$
The regression is linear if y is linear in w. In other words, we can express y as
$$y(x) = w^T \phi(x)$$
where $\phi(x)$ is some (potentially nonlinear) function of x.

Linear Basis Function Models
Generally
$$y(x, w) = \sum_{j=0}^{M-1} w_j \phi_j(x) = w^T \phi(x)$$
where $\phi_j(x)$ are known as basis functions.
Typically, $\phi_0(x) = 1$, so that $w_0$ acts as a bias.
In the simplest case, we use linear basis functions: $\phi_d(x) = x_d$.
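As a minimal sketch of the basis-function model, the snippet below stacks the basis functions into a matrix and evaluates y as a weighted sum. The helper name `design_matrix`, the two basis functions, and the weight values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def design_matrix(x, basis_functions):
    """Stack phi_j(x_n) into an N x M matrix (rows: data points, columns: basis functions)."""
    return np.column_stack([phi(x) for phi in basis_functions])

# Hypothetical basis: a constant bias term and the identity, phi_0(x) = 1, phi_1(x) = x.
basis = [lambda x: np.ones_like(x), lambda x: x]

x = np.array([0.0, 0.5, 1.0])
w = np.array([2.0, -1.0])          # illustrative weights: bias w_0 = 2, slope w_1 = -1

Phi = design_matrix(x, basis)      # shape (3, 2)
y = Phi @ w                        # y(x_n, w) = w^T phi(x_n) for every n
```

Swapping in a different list of basis functions changes the model family without changing the fact that y stays linear in w.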

Example: Polynomial Bases
Polynomial basis functions:
$$\phi_j(x) = x^j$$
These are global: a small change in x affects all basis functions, and a small change in a basis function affects y for all x.

Example: Polynomial Curve Fitting

Sum-of-Squares Error Function

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial
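The three fits can be imitated numerically. The sketch below assumes the usual sum-of-squares error, E(w) = 1/2 Σ_n (y(x_n, w) - t_n)^2, and synthetic data from a noisy sinusoid; the data set, noise level, and random seed are assumptions made for illustration, not the values behind the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)   # assumed noisy targets

def sum_of_squares_error(w, x, t):
    # E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2; np.polyval expects the highest power first.
    return 0.5 * np.sum((np.polyval(w, x) - t) ** 2)

train_error = {}
for order in (1, 3, 9):
    w = np.polyfit(x, t, order)          # least-squares polynomial fit
    train_error[order] = sum_of_squares_error(w, x, t)
```

With 10 points, the 9th-order polynomial has as many coefficients as data points, so its training error collapses to roughly zero. That is a symptom of over-fitting rather than of a good model.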

Regularization
Penalize large coefficient values.

Regularization
9th Order Polynomial

Probabilistic View of Curve Fitting
Why least squares?
Model noise (deviation of data from model) as Gaussian i.i.d.:
$$p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1})$$
where $\beta = 1/\sigma^2$ is the precision of the noise.

Maximum Likelihood
We determine $w_{ML}$ by minimizing the squared error E(w).
Thus least-squares regression reflects an assumption that the noise is i.i.d. Gaussian.

Maximum Likelihood
We determine $w_{ML}$ by minimizing the squared error E(w).
Now given $w_{ML}$, we can estimate the variance of the noise:
$$\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \{ y(x_n, w_{ML}) - t_n \}^2$$
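A short numerical sketch of the two-step estimate: fit the weights by least squares, then take the mean squared residual as the maximum-likelihood noise variance (the reciprocal of the precision). The straight-line generating function, the noise level `true_sigma`, and the sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
true_sigma = 0.2                       # assumed noise standard deviation
x = rng.uniform(0.0, 1.0, size=N)
t = 1.5 * x - 0.5 + rng.normal(scale=true_sigma, size=N)

Phi = np.column_stack([np.ones(N), x])          # basis: phi_0 = 1, phi_1 = x
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # minimizes the squared error

residuals = Phi @ w_ml - t
noise_variance_ml = np.mean(residuals ** 2)     # estimate of 1 / beta
beta_ml = 1.0 / noise_variance_ml
```

The estimate divides by N rather than N minus the number of parameters, so it is slightly biased downward; with many data points the difference is negligible.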

Predictive Distribution
[Figure: generating function, observed data, maximum likelihood prediction, and posterior over t]

MAP: A Step towards Bayes
Prior knowledge about probable values of w can be incorporated into the regression:
$$p(w \mid \alpha) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$$
Now the posterior over w is proportional to the product of the likelihood and the prior:
$$p(w \mid x, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid x, w, \beta)\, p(w \mid \alpha)$$
The result is to introduce a new quadratic term in w into the error function to be minimized:
$$\tilde{E}(w) = \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 + \frac{\alpha}{2} w^T w$$
Thus regularized (ridge) regression reflects a 0-mean isotropic Gaussian prior on the weights.

Gaussian Bases
Gaussian basis functions:
$$\phi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\}$$
Think of these as interpolation functions.
These are local: a small change in x affects only nearby basis functions, and a small change in a basis function affects y only for nearby x.
$\mu_j$ and s control location and scale (width).
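The locality property can be checked directly. The sketch below assumes the standard Gaussian form exp(-(x - μ_j)^2 / (2 s^2)); the function name `gaussian_basis`, the nine evenly spaced centres, and the width s = 0.1 are illustrative choices.

```python
import numpy as np

def gaussian_basis(x, centres, s):
    """Return the N x M matrix with entries exp(-(x_n - mu_j)^2 / (2 s^2))."""
    x = np.asarray(x, dtype=float)[:, None]
    centres = np.asarray(centres, dtype=float)[None, :]
    return np.exp(-((x - centres) ** 2) / (2.0 * s ** 2))

centres = np.linspace(0.0, 1.0, 9)   # illustrative choice of mu_j
s = 0.1                              # illustrative width

Phi = gaussian_basis([0.0, 0.5], centres, s)
# Row 0 (x = 0.0) peaks at the first centre; row 1 (x = 0.5) peaks at the middle centre.
# Basis functions centred far from x contribute essentially nothing.
```

Compare this with the polynomial basis, where every power of x responds to a change anywhere in the input range.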

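Maximum Likelihood and Linear Least Squares
Assume observations from a deterministic function with added Gaussian noise:
$$t = y(x, w) + \epsilon, \quad \text{where } p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1}),$$
which is the same as saying
$$p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1}).$$
Given observed inputs $X = \{x_1, \ldots, x_N\}$ and targets $\mathbf{t} = [t_1, \ldots, t_N]^T$, we obtain the likelihood function
$$p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^T \phi(x_n), \beta^{-1}).$$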

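Maximum Likelihood and Linear Least Squares
Taking the logarithm, we get
$$\ln p(\mathbf{t} \mid w, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid w^T \phi(x_n), \beta^{-1}) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(w)$$
where
$$E_D(w) = \frac{1}{2} \sum_{n=1}^{N} \{ t_n - w^T \phi(x_n) \}^2$$
is the sum-of-squares error.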

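Maximum Likelihood and Least Squares
Computing the gradient and setting it to zero yields
$$\nabla_w \ln p(\mathbf{t} \mid w, \beta) = \beta \sum_{n=1}^{N} \{ t_n - w^T \phi(x_n) \} \phi(x_n)^T = 0.$$
Solving for w, we get
$$w_{ML} = \Phi^\dagger \mathbf{t}, \quad \text{where } \Phi^\dagger = (\Phi^T \Phi)^{-1} \Phi^T$$
is the Moore-Penrose pseudo-inverse and $\Phi$ is the N × M design matrix with elements $\Phi_{nj} = \phi_j(x_n)$.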
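A minimal sketch comparing three routes to the same least-squares weights: the normal equations, the Moore-Penrose pseudo-inverse, and a library least-squares solver. The quadratic generating function, noise level, and seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
x = rng.uniform(-1.0, 1.0, size=N)
t = 0.5 + 2.0 * x - 1.0 * x ** 2 + rng.normal(scale=0.1, size=N)

Phi = np.vander(x, 3, increasing=True)    # design matrix with columns 1, x, x^2

# Normal equations: (Phi^T Phi) w = Phi^T t
w_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
# Moore-Penrose pseudo-inverse applied to the targets
w_pinv = np.linalg.pinv(Phi) @ t
# Library least-squares solver, which avoids forming Phi^T Phi explicitly
w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)
```

All three agree on a well-conditioned problem like this one. For ill-conditioned design matrices, forming Phi^T Phi squares the condition number, so the solver-based route is usually the safer choice.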

End of Lecture 8

Regularized Least Squares
Consider the error function:
$$E_D(w) + \lambda E_W(w)$$
(data term + regularization term)
With the sum-of-squares error function and a quadratic regularizer, we get
$$\frac{1}{2} \sum_{n=1}^{N} \{ t_n - w^T \phi(x_n) \}^2 + \frac{\lambda}{2} w^T w$$
which is minimized by
$$w = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T \mathbf{t}.$$
λ is called the regularization coefficient.
Thus the name 'ridge regression'.
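The closed-form minimizer can be sketched as follows, assuming the standard solution w = (λI + Φ^T Φ)^{-1} Φ^T t for the quadratic regularizer. The function name `ridge_fit`, the synthetic sinusoidal data, and the three λ values are illustrative assumptions. Note that this version also penalizes the bias weight w_0.

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Closed-form minimizer: w = (lam * I + Phi^T Phi)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

rng = np.random.default_rng(3)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
Phi = np.vander(x, 10, increasing=True)    # 9th-order polynomial basis

# Norm of the fitted weight vector for increasing regularization strength.
weight_norm = {lam: np.linalg.norm(ridge_fit(Phi, t, lam))
               for lam in (1e-8, 1e-3, 1.0)}
```

Larger λ shrinks the weights toward zero, which is how the penalty tames the wild coefficients of an over-fitted 9th-order polynomial.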

Regularized Least Squares
With a more general regularizer, we have
$$\frac{1}{2} \sum_{n=1}^{N} \{ t_n - w^T \phi(x_n) \}^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q$$
q = 1: Lasso (least absolute shrinkage and selection operator)
q = 2: Quadratic

Regularized Least Squares
Lasso generates sparse solutions.
[Figure: iso-contours of the data term E_D(w) and iso-contour of the regularization term E_W(w), for the quadratic and lasso regularizers]

Solving Regularized Systems
Quadratic regularization has the advantage that the solution is closed form.
Non-quadratic regularizers generally do not have closed-form solutions.
Lasso can be framed as minimizing a quadratic error with linear constraints, and thus represents a convex optimization problem that can be solved by quadratic programming or other convex optimization methods.
We will discuss quadratic programming when we cover SVMs.
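To illustrate sparsity without a quadratic programming solver, the sketch below uses cyclic coordinate descent with soft-thresholding, one of the "other convex optimization methods" rather than the approach named on the slide. It assumes the objective 1/2 ||t - Φw||^2 + (λ/2) Σ_j |w_j|; the function names, synthetic data, and λ = 20 are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(a, k):
    """Shrink a toward zero by k, clipping at zero."""
    return np.sign(a) * np.maximum(np.abs(a) - k, 0.0)

def lasso_coordinate_descent(Phi, t, lam, n_sweeps=200):
    """Minimize 1/2 * ||t - Phi w||^2 + (lam / 2) * sum_j |w_j| by cyclic coordinate descent."""
    N, M = Phi.shape
    w = np.zeros(M)
    col_sq = np.sum(Phi ** 2, axis=0)
    for _ in range(n_sweeps):
        for j in range(M):
            # Residual with feature j's current contribution removed.
            r_j = t - Phi @ w + Phi[:, j] * w[j]
            w[j] = soft_threshold(Phi[:, j] @ r_j, lam / 2.0) / col_sq[j]
    return w

rng = np.random.default_rng(4)
N, M = 100, 8
Phi = rng.normal(size=(N, M))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0])   # only two active features
t = Phi @ w_true + rng.normal(scale=0.1, size=N)

w_lasso = lasso_coordinate_descent(Phi, t, lam=20.0)
w_ridge = np.linalg.solve(20.0 * np.eye(M) + Phi.T @ Phi, Phi.T @ t)
```

The lasso solution sets the irrelevant weights exactly to zero, while the quadratic penalty only shrinks them to small nonzero values.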

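Multiple Outputs
Analogous to the single output case we have:
$$p(\mathbf{t} \mid x, W, \beta) = \mathcal{N}(\mathbf{t} \mid y(W, x), \beta^{-1} I) = \mathcal{N}(\mathbf{t} \mid W^T \phi(x), \beta^{-1} I)$$
Given observed inputs $X = \{x_1, \ldots, x_N\}$ and targets $T = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^T$, we obtain the log likelihood function
$$\ln p(T \mid X, W, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{t}_n \mid W^T \phi(x_n), \beta^{-1} I) = \frac{NK}{2} \ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2} \sum_{n=1}^{N} \| \mathbf{t}_n - W^T \phi(x_n) \|^2$$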

Multiple Outputs
Maximizing with respect to W, we obtain
$$W_{ML} = (\Phi^T \Phi)^{-1} \Phi^T T.$$
If we consider a single target variable, $t_k$, we see that
$$w_k = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}_k = \Phi^\dagger \mathbf{t}_k,$$
where $\mathbf{t}_k = [t_{1k}, \ldots, t_{Nk}]^T$, which is identical with the single output case.
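A small sketch of the decoupling: solving for all outputs at once with the pseudo-inverse gives the same weights as solving each output column separately. The dimensions, random design matrix, and noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, K = 40, 3, 2                      # data points, basis functions, outputs
Phi = np.column_stack([np.ones(N), rng.normal(size=(N, M - 1))])
W_true = rng.normal(size=(M, K))
T = Phi @ W_true + rng.normal(scale=0.05, size=(N, K))

Phi_pinv = np.linalg.pinv(Phi)          # computed once, shared by all outputs
W_ml = Phi_pinv @ T                     # all K outputs at once, shape (M, K)

# Column k of W_ml equals the single-output solution for target column k.
w_single = [Phi_pinv @ T[:, k] for k in range(K)]
```

Because the pseudo-inverse depends only on the inputs, it needs to be computed once regardless of how many output variables there are.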

Some Useful MATLAB Functions
polyfit: Least-squares fit of a polynomial of specified order to given data
regress: More general function that computes linear weights for least-squares fit

Bayesian Linear Regression
Rev. Thomas Bayes, 1702 - 1761

Bayesian Linear Regression
Define a conjugate prior over w:
$$p(w) = \mathcal{N}(w \mid m_0, S_0)$$
Combining this with the likelihood function and using results for marginal and conditional Gaussian distributions gives the posterior
$$p(w \mid \mathbf{t}) = \mathcal{N}(w \mid m_N, S_N)$$
where
$$m_N = S_N (S_0^{-1} m_0 + \beta \Phi^T \mathbf{t}), \qquad S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi.$$

Bayesian Linear Regression
A common choice for the prior is
$$p(w) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$$
for which
$$m_N = \beta S_N \Phi^T \mathbf{t}, \qquad S_N^{-1} = \alpha I + \beta \Phi^T \Phi.$$
Thus $m_N$ represents the ridge regression solution with $\lambda = \alpha / \beta$.
Next we consider an example.
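The link between the posterior mean and ridge regression can be checked numerically. The sketch assumes the zero-mean isotropic prior with precision α and noise precision β, for which S_N^{-1} = αI + βΦ^T Φ and m_N = β S_N Φ^T t; the function name `posterior`, the straight-line data, and the values of α and β are illustrative.

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for the prior N(w | 0, alpha^{-1} I)."""
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

rng = np.random.default_rng(6)
N = 20
x = rng.uniform(-1.0, 1.0, size=N)
alpha, beta = 2.0, 25.0                       # illustrative precisions
t = -0.3 + 0.5 * x + rng.normal(scale=beta ** -0.5, size=N)
Phi = np.column_stack([np.ones(N), x])

m_N, S_N = posterior(Phi, t, alpha, beta)

# Ridge regression with lambda = alpha / beta gives the same weight vector.
lam = alpha / beta
w_ridge = np.linalg.solve(lam * np.eye(2) + Phi.T @ Phi, Phi.T @ t)
```

Unlike the ridge point estimate, the Bayesian treatment also returns S_N, which quantifies how uncertain the weights remain after seeing the data.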

Bayesian Linear Regression
0 data points observed
[Figure panels: prior, data space]

Bayesian Linear Regression
1 data point observed
[Figure panels: likelihood for (x_1, t_1), posterior, data space]

Bayesian Linear Regression
2 data points observed
[Figure panels: likelihood for (x_2, t_2), posterior, data space]

Bayesian Linear Regression
20 data points observed
[Figure panels: likelihood for (x_20, t_20), posterior, data space]

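Predictive Distribution
Predict t for new values of x by integrating over w:
$$p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid w, \beta)\, p(w \mid \mathbf{t}, \alpha, \beta)\, dw = \mathcal{N}(t \mid m_N^T \phi(x), \sigma_N^2(x))$$
where
$$\sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^T S_N \phi(x).$$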

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 1 data point
Notice how much bigger our uncertainty is relative to the ML method!
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of y(x, w)]

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 2 data points
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of y(x, w)]

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 4 data points
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of y(x, w)]

Predictive Distribution
Example: Sinusoidal data, 9 Gaussian basis functions, 25 data points
[Figure panels: $p(t \mid \mathbf{t}, \alpha, \beta)$ with mean $E[t \mid \mathbf{t}, \alpha, \beta]$; samples of y(x, w)]
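The shrinking uncertainty in this example can be imitated with a rough sketch. It assumes the standard Gaussian predictive distribution with mean m_N^T φ(x) and variance 1/β + φ(x)^T S_N φ(x) under a zero-mean isotropic prior; the function names, the values of s, α, and β, and the synthetic sinusoidal data are assumptions, not the settings used for the figures.

```python
import numpy as np

def gaussian_basis(x, centres, s):
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    return np.exp(-((x - centres[None, :]) ** 2) / (2.0 * s ** 2))

def predictive(x_new, x_train, t_train, centres, s, alpha, beta):
    """Return predictive mean m_N^T phi(x) and variance 1/beta + phi(x)^T S_N phi(x)."""
    Phi = gaussian_basis(x_train, centres, s)
    S_N = np.linalg.inv(alpha * np.eye(len(centres)) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t_train
    phi_new = gaussian_basis(x_new, centres, s)
    mean = phi_new @ m_N
    # Quadratic form phi(x)^T S_N phi(x), evaluated row by row.
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi_new, S_N, phi_new)
    return mean, var

rng = np.random.default_rng(7)
centres = np.linspace(0.0, 1.0, 9)           # 9 Gaussian basis functions
s, alpha, beta = 0.1, 2.0, 25.0              # illustrative values
x_all = rng.uniform(0.0, 1.0, size=25)
t_all = np.sin(2 * np.pi * x_all) + rng.normal(scale=beta ** -0.5, size=25)
x_grid = np.linspace(0.0, 1.0, 50)

_, var_1 = predictive(x_grid, x_all[:1], t_all[:1], centres, s, alpha, beta)
_, var_25 = predictive(x_grid, x_all, t_all, centres, s, alpha, beta)
```

The variance never drops below the noise floor 1/β, and adding data points can only reduce the second term, so the 25-point variance is no larger than the 1-point variance anywhere on the grid.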

Equivalent Kernel
The predictive mean can be written
$$y(x, m_N) = m_N^T \phi(x) = \beta \phi(x)^T S_N \Phi^T \mathbf{t} = \sum_{n=1}^{N} \beta \phi(x)^T S_N \phi(x_n)\, t_n = \sum_{n=1}^{N} k(x, x_n)\, t_n,$$
where $k(x, x_n)$ is the equivalent kernel or smoother matrix.
This is a weighted sum of the training data target values, $t_n$.

Equivalent Kernel
Weight of $t_n$ depends on distance between x and $x_n$; nearby $x_n$ carry more weight.
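The weighted-sum view can be verified with a short sketch. It assumes the equivalent kernel takes the standard form k(x, x_n) = β φ(x)^T S_N φ(x_n) under a zero-mean isotropic prior; the Gaussian basis settings, the training grid, and the query point x = 0.5 are illustrative assumptions.

```python
import numpy as np

def gaussian_basis(x, centres, s):
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    return np.exp(-((x - centres[None, :]) ** 2) / (2.0 * s ** 2))

rng = np.random.default_rng(8)
centres = np.linspace(0.0, 1.0, 9)
s, alpha, beta = 0.1, 2.0, 25.0                      # illustrative values
x_train = np.linspace(0.0, 1.0, 30)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=30)

Phi = gaussian_basis(x_train, centres, s)
S_N = np.linalg.inv(alpha * np.eye(9) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t_train

x_query = np.array([0.5])
phi_q = gaussian_basis(x_query, centres, s)          # shape (1, 9)
kernel = beta * phi_q @ S_N @ Phi.T                  # k(x, x_n) for every n, shape (1, 30)

mean_from_kernel = kernel @ t_train                  # sum_n k(x, x_n) t_n
mean_from_weights = phi_q @ m_N                      # m_N^T phi(x)
```

Both routes give the same predictive mean, and the largest kernel weights fall on training inputs close to the query point.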
