Linear Regression And Support Vector Regression


Linear Regression and Support Vector Regression
Paul Paisitkriangkrai
paulp@cs.adelaide.edu.au
The University of Adelaide
24 October 2012

Outlines

- Regression overview
- Linear regression
- Support vector regression
- Machine learning tools available

Regression Overview

- Clustering (group data based on their characteristics): K-means
- Regression, this talk (find a model that can explain the output given the input): Linear Regression, Support Vector Regression
- Classification (separate data based on their labels): Decision Tree, Linear Discriminant Analysis, Neural Networks, Support Vector Machines, Boosting

Data processing flowchart (income prediction)

Raw data → Pre-processing (noise/outlier removal) → Transformed data → Feature extraction and selection → Processed data → Regression

Example raw data:

Sex   Age   Height   Income
M     20    1.70     25,000
F     30    1.65     5,000
            1.80     30,000

Annotations on the slide: one entry is a mis-entry (it should have been 25!) and is removed as an outlier during pre-processing; height and sex seem to be irrelevant and are dropped during feature selection; the final regression plot shows income against age.

Linear Regression

Given data with n-dimensional variables and one real-valued target variable,
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$.

The objective: find a function $f : \mathbb{R}^n \to \mathbb{R}$ that returns the best fit.

Assume that the relationship between x and y is approximately linear. The model can be represented as (w represents the coefficients and b is an intercept):

$f(w_1, \ldots, w_n, b) = y = w \cdot x + b$

Linear Regression

To find the best fit, we minimize the sum of squared errors (least squares estimation):

$\min_{w, b} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{m} \big(y_i - (w \cdot x_i + b)\big)^2$

The solution can be found by taking the derivative of the objective function with respect to w and setting it to zero:

$\hat{w} = (X^T X)^{-1} X^T Y$

In MATLAB, the back-slash operator computes a least squares solution.
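The closed-form solution can be checked in a few lines of NumPy. This is a minimal sketch on synthetic data; the variable names and numbers are illustrative, not from the talk.

```python
# Minimal least-squares sketch with synthetic data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3                                   # m examples, n input dimensions
X = rng.normal(size=(m, n))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 3.0
y = X @ true_w + true_b + 0.1 * rng.normal(size=m)

# Append a column of ones so the intercept b is estimated together with w.
X1 = np.hstack([X, np.ones((m, 1))])

# Closed-form solution of the normal equations: w_hat = (X^T X)^{-1} X^T y.
w_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Numerically preferable equivalent (plays the role of MATLAB's back-slash).
w_lstsq, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(w_hat)        # last entry is the intercept b
print(w_lstsq)
```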

Linear Regression

$\min_{w, b} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{m} \big(y_i - (\hat{w} \cdot x_i + \hat{b})\big)^2$

To avoid over-fitting, a regularization term can be introduced (minimizing the magnitude of w):

- LASSO: $\min_{w, b} \sum_{i=1}^{m} (y_i - w \cdot x_i - b)^2 + C \sum_{j=1}^{n} |w_j|$
- Ridge regression: $\min_{w, b} \sum_{i=1}^{m} (y_i - w \cdot x_i - b)^2 + C \sum_{j=1}^{n} w_j^2$
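As a rough illustration, scikit-learn's Lasso and Ridge estimators implement these two penalties. Note that scikit-learn calls the regularization weight `alpha` and scales the squared-error term slightly differently from the C in the formulas above, so this is a sketch of the idea rather than an exact transcription.

```python
# Minimal LASSO / ridge sketch; scikit-learn's `alpha` plays the role of C.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]                   # only 3 informative features
y = X @ w_true + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)              # L1 penalty: exact zeros
ridge = Ridge(alpha=0.1).fit(X, y)              # L2 penalty: smooth shrinkage

print(np.round(lasso.coef_, 2))                 # most coefficients are 0
print(np.round(ridge.coef_, 2))                 # all small but non-zero
```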

Support Vector Regression

Find a function $f(x)$ with at most $\varepsilon$-deviation from the target y, i.e. $|w \cdot x_i + b - y_i| \le \varepsilon$. The problem can be written as a convex optimization problem:

$\min \tfrac{1}{2} \|w\|^2$
s.t. $y_i - (w \cdot x_i + b) \le \varepsilon$; $(w \cdot x_i + b) - y_i \le \varepsilon$

We do not care about errors as long as they are less than $\varepsilon$. What if the problem is not feasible? We can introduce slack variables (similar to the soft margin loss function), with C trading off the complexity.

(The slide illustrates this with a one-dimensional example: income predicted from age, with an $\varepsilon$-tube around the fitted line.)

Support Vector Regression

Assume a linear parameterization $f(x, w) = w \cdot x + b$.

Only the points outside the $\varepsilon$-region contribute to the final cost:

$L_\varepsilon(y, f(x, w)) = \max\big(|y - f(x, w)| - \varepsilon, 0\big)$

(Figure: the $\varepsilon$-tube around the regression line; points outside it incur slack $\xi$, $\xi^*$.)
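The ε-insensitive loss is simple enough to write out directly; a minimal sketch (the test numbers are made up):

```python
# The epsilon-insensitive loss: errors within the tube cost nothing.
import numpy as np

def epsilon_insensitive_loss(y, f_x, epsilon=0.1):
    """L_eps(y, f(x)) = max(|y - f(x)| - epsilon, 0)."""
    return np.maximum(np.abs(y - f_x) - epsilon, 0.0)

y = np.array([1.0, 2.0, 3.0])
f_x = np.array([1.05, 2.5, 2.0])
print(epsilon_insensitive_loss(y, f_x))         # -> [0.   0.4  0.9]
```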

Soft margin

Given training data $(x_i, y_i)$, $i = 1, \ldots, m$, minimize

$\tfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$

under the constraints

$y_i - (w \cdot x_i) - b \le \varepsilon + \xi_i$
$(w \cdot x_i) + b - y_i \le \varepsilon + \xi_i^*$
$\xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, m$
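In practice this soft-margin problem is solved by an SVR library rather than by hand. A minimal sketch with scikit-learn's SVR using a linear kernel; the synthetic age/income data and the chosen C and epsilon values are illustrative assumptions, not from the talk.

```python
# Minimal linear epsilon-SVR fit; C and epsilon map onto the formulation above.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
age = rng.uniform(20, 60, size=(100, 1))
income = 1000 * age[:, 0] + 5000 + rng.normal(scale=2000, size=100)

model = SVR(kernel="linear", C=1.0, epsilon=500.0).fit(age, income)

print(model.coef_, model.intercept_)            # learned w and b
print(len(model.support_))                      # number of support vectors
```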

How about a non-linear case?

Linear versus Non-linear SVR

- Linear case: $f: \text{age} \to \text{income}$, i.e. $y_i = w_1 x_i + b$.
- Non-linear case: map the data into a higher dimensional space, e.g. $f: (\text{age}, \text{age}^2) \to \text{income}$, i.e. $y_i = w_1 x_i + w_2 x_i^2 + b$.
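A quick way to see the effect of the explicit feature map is to fit a plain linear model before and after mapping age to (age, age²). The data below is synthetic and the quadratic relationship is assumed purely for illustration.

```python
# Linear fit versus fit after the explicit feature map age -> (age, age^2).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
age = rng.uniform(20, 60, size=200)
income = -30.0 * (age - 45) ** 2 + 60000 + rng.normal(scale=2000, size=200)

X_linear = age.reshape(-1, 1)                   # f: age -> income
X_mapped = np.column_stack([age, age ** 2])     # f: (age, age^2) -> income

lin = LinearRegression().fit(X_linear, income)
quad = LinearRegression().fit(X_mapped, income)

print(lin.score(X_linear, income))              # poor R^2 for the linear model
print(quad.score(X_mapped, income))             # much better after the mapping
```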

Dual problem

Primal:

$\min \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$
s.t. $y_i - (w \cdot x_i) - b \le \varepsilon + \xi_i$; $(w \cdot x_i) + b - y_i \le \varepsilon + \xi_i^*$; $\xi_i, \xi_i^* \ge 0$, $i = 1, \ldots, m$

Dual:

$\max_{\alpha, \alpha^*} \; -\tfrac{1}{2} \sum_{i,j=1}^{m} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{m} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{m} y_i (\alpha_i - \alpha_i^*)$
s.t. $\sum_{i=1}^{m} (\alpha_i - \alpha_i^*) = 0$; $0 \le \alpha_i, \alpha_i^* \le C$

Primal variables: one w per feature dimension, so the complexity scales with the dimension of the input space.
Dual variables: $\alpha_i, \alpha_i^*$ per data point, so the complexity scales with the number of support vectors.

Kernel trick

- Linear: $\langle x, y \rangle$
- Non-linear: $\langle \phi(x), \phi(y) \rangle = K(x, y)$

Note: there is no need to compute the mapping function $\phi(\cdot)$ explicitly. Instead, we use the kernel function.

Commonly used kernels:
- Polynomial kernels: $K(x, y) = (x^T y + 1)^d$
- Radial basis function (RBF) kernels: $K(x, y) = \exp\!\left(-\dfrac{\|x - y\|^2}{2\sigma^2}\right)$

Note: for the RBF kernel, $\dim(\phi(\cdot))$ is infinite.
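Both kernels can be computed directly from the inputs, which is the whole point of the trick. A minimal NumPy sketch; the function names and test data are illustrative.

```python
# Kernel values computed directly from the inputs, without forming phi(x).
import numpy as np

def poly_kernel(X, Y, d=2):
    """Polynomial kernel K(x, y) = (x^T y + 1)^d."""
    return (X @ Y.T + 1.0) ** d

def rbf_kernel(X, Y, sigma=1.0):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))
print(poly_kernel(X, X))                        # 5x5 Gram matrix
print(rbf_kernel(X, X))                         # ones on the diagonal
```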

Dual problem for the non-linear case

Primal:

$\min \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$
s.t. $y_i - (w \cdot \phi(x_i)) - b \le \varepsilon + \xi_i$; $(w \cdot \phi(x_i)) + b - y_i \le \varepsilon + \xi_i^*$; $\xi_i, \xi_i^* \ge 0$, $i = 1, \ldots, m$

Dual:

$\max_{\alpha, \alpha^*} \; -\tfrac{1}{2} \sum_{i,j=1}^{m} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle \phi(x_i), \phi(x_j) \rangle - \varepsilon \sum_{i=1}^{m} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{m} y_i (\alpha_i - \alpha_i^*)$
s.t. $\sum_{i=1}^{m} (\alpha_i - \alpha_i^*) = 0$; $0 \le \alpha_i, \alpha_i^* \le C$

The inner product $\langle \phi(x_i), \phi(x_j) \rangle$ is computed through the kernel $K(x_i, x_j)$.

Primal variables: one w per feature dimension, so the complexity scales with the dimension of the (mapped) feature space.
Dual variables: $\alpha_i, \alpha_i^*$ per data point, so the complexity scales with the number of support vectors.
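In a library such as scikit-learn, this kernelized dual is what is actually solved; the fitted model stores only the support vectors and their dual coefficients (α_i − α_i*). A minimal sketch on synthetic data:

```python
# Kernel SVR in practice: only support vectors and their dual coefficients
# (alpha_i - alpha_i*) are kept, which is all prediction needs.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

model = SVR(kernel="rbf", C=1.0, gamma=1.0, epsilon=0.1).fit(X, y)

print(model.support_vectors_.shape)             # stored support vectors
print(model.dual_coef_.shape)                   # one alpha_i - alpha_i* each
```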

SVR Applications

Optical Character Recognition (OCR)

A. J. Smola and B. Scholkopf, A Tutorial on Support Vector Regression, NeuroCOLT Technical Report TR-98-030.

SVR Applications

Stock price prediction

SVR Demo

WEKA and linear regression

- The software can be downloaded from http://www.cs.waikato.ac.nz/ml/weka/
- Data set used in this experiment: computer hardware
- The objective is to predict CPU performance based on these given attributes:
  - Machine cycle time in nanoseconds (MYCT)
  - Minimum main memory in kilobytes (MMIN)
  - Maximum main memory (MMAX)
  - Cache memory in kilobytes (CACH)
  - Minimum channels in units (CHMIN)
  - Maximum channels in units (CHMAX)
- The output is expressed as a linear combination of the attributes; each attribute has a specific weight:
  Output = $w_1 a_1 + w_2 a_2 + \ldots + w_n a_n + b$
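For readers who prefer a script to the WEKA GUI, a rough analogue of this experiment can be sketched with scikit-learn. The attribute names follow the slide, but the data and the weights used to generate it below are made-up placeholders, not the real computer-hardware data set or WEKA's result.

```python
# Rough scikit-learn analogue of the WEKA experiment (synthetic data).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

cols = ["MYCT", "MMIN", "MMAX", "CACH", "CHMIN", "CHMAX"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(0, 1, size=(50, 6)), columns=cols)   # "normalized" attributes
y = X @ np.array([-50.0, 400.0, 350.0, 150.0, 10.0, 250.0]) + 20.0 + rng.normal(scale=5, size=50)

model = LinearRegression().fit(X, y)
print(dict(zip(cols, model.coef_.round(1))))    # weight per attribute
print(round(model.intercept_, 1))               # intercept b
```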

Evaluation

Root mean-square error:

$\sqrt{\dfrac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \ldots + (y_m - \hat{y}_m)^2}{m}}$

Mean absolute error:

$\dfrac{|y_1 - \hat{y}_1| + |y_2 - \hat{y}_2| + \ldots + |y_m - \hat{y}_m|}{m}$
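Both measures are one-liners; a minimal sketch with made-up predictions:

```python
# The two evaluation measures, written out directly.
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 230.0])
print(rmse(y_true, y_pred))                     # ~19.15
print(mae(y_true, y_pred))                      # ~16.67
```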

WEKA

Load the data and normalize each attribute to [0, 1]; visualize the data.
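The same per-attribute normalization to [0, 1] can be done outside WEKA, for instance with scikit-learn's MinMaxScaler; the small matrix below is just an illustration.

```python
# Per-attribute min-max normalization to [0, 1].
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[125.0,  256.0,  6000.0],
              [ 29.0, 8000.0, 32000.0],
              [480.0, 1000.0,  4000.0]])

X_scaled = MinMaxScaler().fit_transform(X)      # each column now spans [0, 1]
print(X_scaled)
```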

WEKA (Linear regression)

WEKA (Linear Regression)

Performance = (72.8 × MYCT) + (484.8 × MMIN) + (355.6 × MMAX) + (161.2 × CACH) + (256.9 × CHMAX) − 53.9

- Main memory plays a more important role in the system performance.
- A large machine cycle time (MYCT) does not indicate the best performance.

WEKA (linear SVR)

Compare to the linear regression model:

Performance = (72.8 × MYCT) + (484.8 × MMIN) + (355.6 × MMAX) + (161.2 × CACH) + (256.9 × CHMAX) − 53.9

WEKA (non-linear SVR)

The output includes a list of support vectors.

WEKA (Performance comparison)

Method                            Mean absolute error    Root mean squared error
Linear regression                 41.1                   69.55
SVR (linear), C = 1.0             35.0                   78.8
SVR (RBF), C = 1.0, gamma = 1.0   28.8                   66.3

The parameter C (for linear SVR) and the parameters C and gamma (for non-linear SVR) need to be cross-validated for better performance.
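A sketch of how that cross-validation might look with scikit-learn's GridSearchCV; the parameter grid, data, and scoring choice are illustrative assumptions, not the setup used in the talk.

```python
# Grid search over C and gamma for an RBF SVR, scored by mean absolute error.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),
    param_grid={"C": [0.1, 1.0, 10.0], "gamma": [0.1, 1.0, 10.0]},
    scoring="neg_mean_absolute_error",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)                        # chosen C and gamma
print(-grid.best_score_)                        # cross-validated MAE
```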

Other Machine Learning tools

- Shogun toolbox (C++): http://www.shogun-toolbox.org/
- Shark Machine Learning library (C++): http://shark-project.sourceforge.net/
- Machine Learning in Python: http://pyml.sourceforge.net/
- Machine Learning in OpenCV2: http://opencv.willowgarage.com/wiki/
- LibSVM, LibLinear, etc.

