Least-Squares Regression


CHAPTER 17
Least-Squares Regression

Where substantial error is associated with data, polynomial interpolation is inappropriate and may yield unsatisfactory results when used to predict intermediate values. Experimental data are often of this type. For example, Fig. 17.1a shows seven experimentally derived data points exhibiting significant variability. Visual inspection of these data suggests a positive relationship between y and x. That is, the overall trend indicates that higher values of y are associated with higher values of x. Now, if a sixth-order interpolating polynomial is fitted to these data (Fig. 17.1b), it will pass exactly through all of the points. However, because of the variability in these data, the curve oscillates widely in the interval between the points. In particular, the interpolated values at x = 1.5 and x = 6.5 appear to be well beyond the range suggested by these data.

A more appropriate strategy for such cases is to derive an approximating function that fits the shape or general trend of the data without necessarily matching the individual points. Figure 17.1c illustrates how a straight line can be used to generally characterize the trend of these data without passing through any particular point.

One way to determine the line in Fig. 17.1c is to visually inspect the plotted data and then sketch a "best" line through the points. Although such "eyeball" approaches have commonsense appeal and are valid for "back-of-the-envelope" calculations, they are deficient because they are arbitrary. That is, unless the points define a perfect straight line (in which case, interpolation would be appropriate), different analysts would draw different lines.

To remove this subjectivity, some criterion must be devised to establish a basis for the fit. One way to do this is to derive a curve that minimizes the discrepancy between the data points and the curve. A technique for accomplishing this objective, called least-squares regression, will be discussed in the present chapter.

17.1 LINEAR REGRESSION

The simplest example of a least-squares approximation is fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n). The mathematical expression for the straight line is

    y = a_0 + a_1 x + e    (17.1)

where a_0 and a_1 are coefficients representing the intercept and the slope, respectively, and e is the error, or residual, between the model and the observations, which can be represented by rearranging Eq. (17.1) as

    e = y - a_0 - a_1 x

Thus, the error, or residual, is the discrepancy between the true value of y and the approximate value, a_0 + a_1 x, predicted by the linear equation.

FIGURE 17.1  (a) Data exhibiting significant error. (b) Polynomial fit oscillating beyond the range of the data. (c) More satisfactory result using the least-squares fit.

17.1.1 Criteria for a "Best" Fit

One strategy for fitting a "best" line through the data would be to minimize the sum of the residual errors for all the available data, as in

    \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)    (17.2)

where n = total number of points. However, this is an inadequate criterion, as illustrated by Fig. 17.2a, which depicts the fit of a straight line to two points.

FIGURE 17.2  Examples of some criteria for "best fit" that are inadequate for regression: (a) minimizes the sum of the residuals, (b) minimizes the sum of the absolute values of the residuals, and (c) minimizes the maximum error of any individual point.

Obviously, the best fit is the line connecting the points. However, any straight line passing through the midpoint of the connecting line (except a perfectly vertical line) results in a minimum value of Eq. (17.2) equal to zero because the errors cancel.

Therefore, another logical criterion might be to minimize the sum of the absolute values of the discrepancies, as in

    \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|

Figure 17.2b demonstrates why this criterion is also inadequate. For the four points shown, any straight line falling within the dashed lines will minimize the sum of the absolute values. Thus, this criterion also does not yield a unique best fit.

A third strategy for fitting a best line is the minimax criterion. In this technique, the line is chosen that minimizes the maximum distance that an individual point falls from the line. As depicted in Fig. 17.2c, this strategy is ill-suited for regression because it gives undue influence to an outlier, that is, a single point with a large error. It should be noted that the minimax principle is sometimes well-suited for fitting a simple function to a complicated function (Carnahan, Luther, and Wilkes, 1969).

A strategy that overcomes the shortcomings of the aforementioned approaches is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model

    S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,measured} - y_{i,model})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2    (17.3)

This criterion has a number of advantages, including the fact that it yields a unique line for a given set of data. Before discussing these properties, we will present a technique for determining the values of a_0 and a_1 that minimize Eq. (17.3).

17.1.2 Least-Squares Fit of a Straight Line

To determine values for a_0 and a_1, Eq. (17.3) is differentiated with respect to each coefficient:

    \partial S_r / \partial a_0 = -2 \sum (y_i - a_0 - a_1 x_i)
    \partial S_r / \partial a_1 = -2 \sum [(y_i - a_0 - a_1 x_i) x_i]

Note that we have simplified the summation symbols; unless otherwise indicated, all summations are from i = 1 to n. Setting these derivatives equal to zero will result in a minimum S_r. If this is done, the equations can be expressed as

    0 = \sum y_i - \sum a_0 - \sum a_1 x_i
    0 = \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2

Now, realizing that \sum a_0 = n a_0, we can express the equations as a set of two simultaneous linear equations with two unknowns (a_0 and a_1):

    n a_0 + (\sum x_i) a_1 = \sum y_i    (17.4)
    (\sum x_i) a_0 + (\sum x_i^2) a_1 = \sum x_i y_i    (17.5)

These are called the normal equations. They can be solved simultaneously

    a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}    (17.6)

This result can then be used in conjunction with Eq. (17.4) to solve for

    a_0 = \bar{y} - a_1 \bar{x}    (17.7)

where \bar{y} and \bar{x} are the means of y and x, respectively.

EXAMPLE 17.1  Linear Regression

Problem Statement. Fit a straight line to the x and y values in the first two columns of Table 17.1.

Solution. The following quantities can be computed:

    n = 7        \sum x_i y_i = 119.5        \sum x_i^2 = 140
    \sum x_i = 28        \bar{x} = 28/7 = 4
    \sum y_i = 24        \bar{y} = 24/7 = 3.428571

Using Eqs. (17.6) and (17.7),

    a_1 = \frac{7(119.5) - 28(24)}{7(140) - (28)^2} = 0.8392857
    a_0 = 3.428571 - 0.8392857(4) = 0.07142857

Therefore, the least-squares fit is

    y = 0.07142857 + 0.8392857 x

The line, along with the data, is shown in Fig. 17.1c.

TABLE 17.1 Computations for an error analysis of the linear fit.

    x_i    y_i    (y_i - \bar{y})^2    (y_i - a_0 - a_1 x_i)^2
    1      0.5         8.5765                0.1687
    2      2.5         0.8622                0.5625
    3      2.0         2.0408                0.3473
    4      4.0         0.3265                0.3265
    5      3.5         0.0051                0.5896
    6      6.0         6.6122                0.7972
    7      5.5         4.2908                0.1993
    Sums   28   24    22.7143                2.9911
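The arithmetic in Example 17.1 is easy to script. The short sketch below is in Python, which this chapter does not itself use, so read it as an illustrative translation of Eqs. (17.6) and (17.7) rather than the book's own program; the function name linear_regression is ours. It reproduces the slope and intercept from the first two columns of Table 17.1.

# Straight-line least-squares fit via the normal-equation solution,
# Eqs. (17.6) and (17.7); data are the first two columns of Table 17.1.

def linear_regression(x, y):
    """Return the intercept a0 and slope a1 of the least-squares line."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # Eq. (17.6)
    a0 = sum_y / n - a1 * sum_x / n                                 # Eq. (17.7)
    return a0, a1

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = linear_regression(x, y)
print(a0, a1)   # about 0.07142857 and 0.8392857, as in Example 17.1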

17.1.3 Quantification of Error of Linear Regression

Any line other than the one computed in Example 17.1 results in a larger sum of the squares of the residuals. Thus, the line is unique and, in terms of our chosen criterion, is a "best" line through the points. A number of additional properties of this fit can be elucidated by examining more closely the way in which residuals were computed. Recall that the sum of the squares is defined as [Eq. (17.3)]

    S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2    (17.8)

Notice the similarity between Eqs. (PT5.3) and (17.8). In the former case, the square of the residual represented the square of the discrepancy between the data and a single estimate of the measure of central tendency, the mean. In Eq. (17.8), the square of the residual represents the square of the vertical distance between the data and another measure of central tendency, the straight line (Fig. 17.3).

FIGURE 17.3  The residual in linear regression represents the vertical distance between a data point and the straight line.

The analogy can be extended further for cases where (1) the spread of the points around the line is of similar magnitude along the entire range of the data and (2) the distribution of these points about the line is normal. It can be demonstrated that if these criteria are met, least-squares regression will provide the best (that is, the most likely) estimates of a_0 and a_1 (Draper and Smith, 1981). This is called the maximum likelihood principle in statistics.

In addition, if these criteria are met, a "standard deviation" for the regression line can be determined as [compare with Eq. (PT5.2)]

    s_{y/x} = \sqrt{\frac{S_r}{n - 2}}    (17.9)

where s_{y/x} is called the standard error of the estimate. The subscript notation "y/x" designates that the error is for a predicted value of y corresponding to a particular value of x. Also, notice that we now divide by n - 2 because two data-derived estimates (a_0 and a_1) were used to compute S_r; thus, we have lost two degrees of freedom. As with our discussion of the standard deviation in PT5.2.1, another justification for dividing by n - 2 is that there is no such thing as the "spread of data" around a straight line connecting two points. Thus, for the case where n = 2, Eq. (17.9) yields a meaningless result of infinity.

Just as was the case with the standard deviation, the standard error of the estimate quantifies the spread of the data. However, s_{y/x} quantifies the spread around the regression line as shown in Fig. 17.4b, in contrast to the original standard deviation s_y that quantified the spread around the mean (Fig. 17.4a).

The above concepts can be used to quantify the "goodness" of our fit. This is particularly useful for comparison of several regressions (Fig. 17.5). To do this, we return to the original data and determine the total sum of the squares around the mean for the dependent variable (in our case, y). As was the case for Eq. (PT5.3), this quantity is designated S_t. This is the magnitude of the residual error associated with the dependent variable prior to regression. After performing the regression, we can compute S_r, the sum of the squares of the residuals around the regression line. This characterizes the residual error that remains after the regression. It is, therefore, sometimes called the unexplained sum of the squares.

FIGURE 17.4  Regression data showing (a) the spread of the data around the mean of the dependent variable and (b) the spread of the data around the best-fit line. The reduction in the spread in going from (a) to (b), as indicated by the bell-shaped curves at the right, represents the improvement due to linear regression.

FIGURE 17.5  Examples of linear regression with (a) small and (b) large residual errors.

EXAMPLE 17.2  Estimation of Errors for the Linear Least-Squares Fit

Problem Statement. Compute the total standard deviation, the standard error of the estimate, and the correlation coefficient for the data in Example 17.1.

Solution. The summations are performed and presented in Table 17.1. The standard deviation is [Eq. (PT5.2)]

    s_y = \sqrt{\frac{22.7143}{7 - 1}} = 1.9457

and the standard error of the estimate is [Eq. (17.9)]

    s_{y/x} = \sqrt{\frac{2.9911}{7 - 2}} = 0.7735
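A minimal Python sketch of the same error analysis, again not from the book: it recomputes S_t and S_r directly from the Table 17.1 data and then evaluates Eq. (17.9). The correlation coefficient requested in the problem statement is obtained here from the usual definition r^2 = (S_t - S_r)/S_t; that formula is not reproduced in this excerpt, so treat it as an assumption about the intended computation.

import math

# Error analysis of the fit from Example 17.1 (data of Table 17.1).
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = 0.07142857, 0.8392857

n = len(x)
y_mean = sum(y) / n

# Total sum of squares around the mean (St) and sum of squares of the
# residuals around the regression line (Sr, Eq. (17.8)).
St = sum((yi - y_mean) ** 2 for yi in y)                     # about 22.7143
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # about 2.9911

sy = math.sqrt(St / (n - 1))    # total standard deviation, about 1.9457
syx = math.sqrt(Sr / (n - 2))   # standard error of the estimate, Eq. (17.9), about 0.7735

# Assumed standard definition of the coefficient of determination;
# the corresponding equation is not shown in this excerpt.
r2 = (St - Sr) / St
r = math.sqrt(r2)               # about 0.932

print(sy, syx, r)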

17.1.5 Linearization of Nonlinear Relationships

Linear regression provides a powerful technique for fitting a best line to data. However, it is predicated on the fact that the relationship between the dependent and independent variables is linear. This is not always the case, and the first step in any regression analysis should be to plot and visually inspect the data to ascertain whether a linear model applies. For example, Fig. 17.8 shows some data that are obviously curvilinear. In some cases, techniques such as polynomial regression, which is described in Sec. 17.2, are appropriate. For others, transformations can be used to express the data in a form that is compatible with linear regression.

FIGURE 17.8  (a) Data that are ill-suited for linear least-squares regression. (b) Indication that a parabola is preferable.

One example is the exponential model

    y = a_1 e^{b_1 x}    (17.12)

where a_1 and b_1 are constants. This model is used in many fields of engineering to characterize quantities that increase (positive b_1) or decrease (negative b_1) at a rate that is directly proportional to their own magnitude. For example, population growth or radioactive decay can exhibit such behavior. As depicted in Fig. 17.9a, the equation represents a nonlinear relationship (for b_1 ≠ 0) between y and x.

FIGURE 17.9  (a) The exponential equation, (b) the power equation, and (c) the saturation-growth-rate equation. Parts (d), (e), and (f) are linearized versions of these equations that result from simple transformations.

Another example of a nonlinear model is the simple power equation

    y = a_2 x^{b_2}    (17.13)

where a_2 and b_2 are constant coefficients. This model has wide applicability in all fields of engineering. As depicted in Fig. 17.9b, the equation (for b_2 ≠ 0 or 1) is nonlinear.

A third example of a nonlinear model is the saturation-growth-rate equation [recall Eq. (E17.3.1)]

    y = a_3 \frac{x}{b_3 + x}    (17.14)

where a_3 and b_3 are constant coefficients. This model, which is particularly well-suited for characterizing population growth rate under limiting conditions, also represents a nonlinear relationship between y and x (Fig. 17.9c) that levels off, or "saturates," as x increases.

Nonlinear regression techniques are available to fit these equations to experimental data directly. (Note that we will discuss nonlinear regression in Sec. 17.5.) However, a simpler alternative is to use mathematical manipulations to transform the equations into a linear form. Then, simple linear regression can be employed to fit the equations to data.

For example, Eq. (17.12) can be linearized by taking its natural logarithm to yield

    ln y = ln a_1 + b_1 x ln e

But because ln e = 1,

    ln y = ln a_1 + b_1 x    (17.15)

Thus, a plot of ln y versus x will yield a straight line with a slope of b_1 and an intercept of ln a_1 (Fig. 17.9d).

Equation (17.13) is linearized by taking its base-10 logarithm to give

    log y = b_2 log x + log a_2    (17.16)

Thus, a plot of log y versus log x will yield a straight line with a slope of b_2 and an intercept of log a_2 (Fig. 17.9e).

Equation (17.14) is linearized by inverting it to give

    \frac{1}{y} = \frac{b_3}{a_3} \frac{1}{x} + \frac{1}{a_3}    (17.17)

Thus, a plot of 1/y versus 1/x will be linear, with a slope of b_3/a_3 and an intercept of 1/a_3 (Fig. 17.9f).

In their transformed forms, these models can be fit with simple linear regression to evaluate the constant coefficients. They can then be transformed back to their original state and used for predictive purposes. Example 17.4 illustrates this procedure for Eq. (17.13). In addition, Sec. 20.1 provides an engineering example of the same sort of computation.
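To make the reciprocal transformation of Eq. (17.17) concrete, here is a small Python sketch that fits the saturation-growth-rate model by regressing 1/y on 1/x and then back-solving for a_3 and b_3 from the intercept and slope. The x-y values are invented purely for illustration; they do not come from the text.

# Saturation-growth-rate fit via the reciprocal transformation, Eq. (17.17):
# 1/y = (b3/a3)(1/x) + 1/a3, so regress 1/y on 1/x.
# The x-y values below are invented for illustration only.

x = [1.0, 2.0, 4.0, 6.0, 10.0]
y = [1.1, 1.8, 2.4, 2.65, 2.9]

X = [1.0 / xi for xi in x]   # transformed abscissa, 1/x
Y = [1.0 / yi for yi in y]   # transformed ordinate, 1/y

n = len(X)
sum_X, sum_Y = sum(X), sum(Y)
sum_XY = sum(Xi * Yi for Xi, Yi in zip(X, Y))
sum_X2 = sum(Xi ** 2 for Xi in X)

slope = (n * sum_XY - sum_X * sum_Y) / (n * sum_X2 - sum_X ** 2)   # b3 / a3
intercept = sum_Y / n - slope * sum_X / n                          # 1 / a3

a3 = 1.0 / intercept
b3 = slope * a3
print(a3, b3)   # coefficients of y = a3 * x / (b3 + x)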

EXAMPLE 17.4  Linearization of a Power Equation

Problem Statement. Fit Eq. (17.13) to the data in Table 17.3 using a logarithmic transformation of the data.

TABLE 17.3 Data to be fit to the power equation (columns: x, y, log x, log y).

Solution. Figure 17.10a is a plot of the original data in its untransformed state. Figure 17.10b shows the plot of the transformed data. A linear regression of the log-transformed data yields the result

    log y = 1.75 log x - 0.300

FIGURE 17.10  (a) Plot of untransformed data with the power equation that fits these data. (b) Plot of transformed data used to determine the coefficients of the power equation.
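The following Python sketch mirrors the procedure of Example 17.4. Because the numeric entries of Table 17.3 are not reproduced in this copy, the points are generated to lie exactly on the quoted fitted line log y = 1.75 log x - 0.300 (equivalently y is about 0.5 x^1.75); with the book's measured data the recovered coefficients would be approximate rather than exact.

import math

# Power-equation fit via the base-10 log transformation, Eq. (17.16).
# The points are generated from the quoted result log y = 1.75 log x - 0.300,
# standing in for the Table 17.3 data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10 ** (-0.300) * xi ** 1.75 for xi in x]

X = [math.log10(xi) for xi in x]   # log x
Y = [math.log10(yi) for yi in y]   # log y

n = len(X)
sum_X, sum_Y = sum(X), sum(Y)
sum_XY = sum(Xi * Yi for Xi, Yi in zip(X, Y))
sum_X2 = sum(Xi ** 2 for Xi in X)

b2 = (n * sum_XY - sum_X * sum_Y) / (n * sum_X2 - sum_X ** 2)   # slope = b2
log_a2 = sum_Y / n - b2 * sum_X / n                             # intercept = log a2
a2 = 10 ** log_a2

print(b2, a2)   # 1.75 and about 0.501; back-transform gives y = a2 * x**b2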
