STA113: Probability And Statistics In Engineering


STA113: Probability and Statistics in Engineering
Linear Regression Analysis - Chapters 12 and 13 in Devore
Artin Armagan
Department of Statistical Science
November 18, 2009

Outline
1. Simple Linear Regression Analysis
   Using simple regression to describe a linear relationship
   Inferences From a Simple Regression Analysis
   Assessing the Fit of the Regression Line
   Prediction with a Sample Linear Regression Equation
2. Multiple Linear Regression
   Using Multiple Linear Regression to Explain a Relationship
   Inferences From a Multiple Regression Analysis
   Assessing the Fit of the Regression Line
   Comparing Two Regression Models
   Multicollinearity

Purpose and Formulation
Regression analysis is a statistical technique used to describe relationships among variables.
In the simplest case, where bivariate data are observed, simple linear regression is used.
The variable we are trying to model is referred to as the dependent variable and is often denoted by y.
The variable we use to explain y is referred to as the independent or explanatory variable and is often denoted by x.
If a linear relationship between y and x is believed to exist, this relationship is expressed through the equation for a line:
$y = b_0 + b_1 x$

Purpose and Formulation (continued)
The equation $y = b_0 + b_1 x$ gives an exact, or deterministic, relationship, meaning that no randomness is involved.
In this case, recall that only two pairs of observations (x, y) with distinct x values suffice to construct the line.
However, many things we observe have a random component, which we try to understand through various probability distributions.

Example
[Figure: two plots of y against x. One panel shows an exact line (label fragments: y, 1, 2x) and the other a fitted line (label fragments: ŷ, 0.2, 2.2x); the operators and signs in the labels were lost in extraction.]

Least Squares Criterion to Fit a Line
We need to specify a method for finding the "best" fitting line to the observed data.
When we pass a line through the observations, there will be differences between the actual observed values and the values predicted by the fitted line. The difference at each x value is called a residual and represents the "error".
It is only sensible to try to minimize the total error we make while fitting the line.
The least squares criterion fits the line by minimizing the sum of squared errors, i.e. $\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.

Least Squares Criterion to Fit a Line (continued)
This is a simple minimization problem and results in the following expressions for $b_0$ and $b_1$:
$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$b_0 = \bar{y} - b_1 \bar{x}$
These are obtained by differentiating $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $\hat{y}_i = b_0 + b_1 x_i$, with respect to $b_0$ and $b_1$ and setting the derivatives equal to zero at the solution, which leaves us with two equations and two unknowns.
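
The closed-form expressions for $b_0$ and $b_1$ can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `fit_line` and the five data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

For these points the sums are $\sum (x_i - \bar{x})(y_i - \bar{y}) = 6$ and $\sum (x_i - \bar{x})^2 = 10$, so the slope is 0.6 and the intercept is $4 - 0.6 \cdot 3 = 2.2$.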

Pricing Communication Nodes
[Figure: scatter plot of COST against NUMPORTS.]

Estimating Residential Real Estate Values
[Figure: scatter plot of VALUE against SIZE.]

Assumptions
It is assumed that there exists a linear deterministic relationship between x and the mean of y, $\mu_{y|x}$:
$\mu_{y|x} = \beta_0 + \beta_1 x$
Since the actual observations deviate from this line, we need to add a noise term, giving
$y_i = \beta_0 + \beta_1 x_i + e_i$.
The expected value of this error term is zero: $E(e_i) = 0$.
The variance of each $e_i$ is equal to $\sigma_e^2$. This assumption implies a constant variance along the regression line.
The $e_i$ are normally distributed.
The $e_i$ are independent.

Inferences about β0 and β1
The point estimates of β0 and β1 are justified by the least squares criterion: b0 and b1 minimize the sum of squared errors for the observed sample.
It should also be noted that, under the assumptions made earlier, the maximum likelihood estimator of β0 and β1 is identical to the least squares estimator.
Recall that a statistic is a function of a sample (which is a realization of a random variable) and is therefore a random variable itself. Hence b0 and b1 have sampling distributions.

Sampling Distribution of b0
$E(b_0) = \beta_0$
$\mathrm{Var}(b_0) = \sigma_e^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)$
The sampling distribution of b0 is normal.

Sampling Distribution of b1
$E(b_1) = \beta_1$
$\mathrm{Var}(b_1) = \frac{\sigma_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
The sampling distribution of b1 is normal.

Properties of b0 and b1
b0 and b1 are unbiased estimators of β0 and β1.
b0 and b1 are consistent estimators of β0 and β1.
b0 and b1 are minimum variance unbiased estimators of β0 and β1. That is, they have smaller sampling variance than any other unbiased estimator of β0 and β1.

Estimating $\sigma_e^2$
The sampling distributions of b0 and b1 are normal when $\sigma_e^2$ is known.
In realistic cases we won't know $\sigma_e^2$.
An unbiased estimate of $\sigma_e^2$ is given by
$s_e^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} = \frac{SSE}{n - 2} = MSE$
where $\hat{y}_i = b_0 + b_1 x_i$.
Substituting $s_e$ for $\sigma_e$ in the variance expressions for b0 and b1, the standard errors $s_{b_0}$ and $s_{b_1}$ can be obtained.
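
As a rough sketch of this substitution, the following Python computes $s_e^2$ and then plugs it into the variance expressions $\sigma_e^2 (1/n + \bar{x}^2 / \sum (x_i - \bar{x})^2)$ and $\sigma_e^2 / \sum (x_i - \bar{x})^2$. The function name and the toy data are hypothetical and only serve to illustrate the calculation.

```python
import math


def error_variance_and_standard_errors(x, y):
    """Return (mse, s_b0, s_b1) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # SSE is the sum of squared residuals; dividing by n - 2 gives s_e^2 (MSE).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    # Replace sigma_e^2 by s_e^2 in the variances of b0 and b1.
    s_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    s_b1 = math.sqrt(mse / sxx)
    return mse, s_b0, s_b1


# Hypothetical toy data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mse, s_b0, s_b1 = error_variance_and_standard_errors(x, y)
print(f"MSE = {mse:.3f}, s_b0 = {s_b0:.3f}, s_b1 = {s_b1:.3f}")
```

Here SSE = 2.4 on n - 2 = 3 degrees of freedom, so $s_e^2 = 0.8$, $s_{b_0} = \sqrt{0.88}$ and $s_{b_1} = \sqrt{0.08}$.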

Constructing Confidence Intervals for β0 and β1
When $\sigma_e$ is not known, the standardized estimates follow t distributions, i.e. $\frac{b_0 - \beta_0}{s_{b_0}} \sim t_{n-2}$ and $\frac{b_1 - \beta_1}{s_{b_1}} \sim t_{n-2}$.
$(1 - \alpha)100\%$ confidence intervals can then be constructed as
$(b_0 - t_{\alpha/2, n-2}\, s_{b_0},\; b_0 + t_{\alpha/2, n-2}\, s_{b_0})$
$(b_1 - t_{\alpha/2, n-2}\, s_{b_1},\; b_1 + t_{\alpha/2, n-2}\, s_{b_1})$.
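
A small sketch of the interval construction follows. The helper name `confidence_interval` and the input numbers are hypothetical; the critical value is passed in by the caller (for example, read from a t table) rather than computed, so that the example needs only the standard library.

```python
import math


def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Hypothetical inputs: a slope estimate b1 = 0.6 with s_b1 = sqrt(0.08)
# from a sample of n = 5 points, so the degrees of freedom are n - 2 = 3.
b1 = 0.6
s_b1 = math.sqrt(0.08)
t_crit = 3.182  # t_{0.025, 3} from a standard t table (95% interval)

lower, upper = confidence_interval(b1, s_b1, t_crit)
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```

With these inputs the interval is roughly (-0.3, 1.5). Because it contains zero, the data in this toy case would not rule out a zero slope at the 5% level.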

Hypothesis tests about β0 and β1
Conducting a hypothesis test is no more involved than constructing a confidence interval. We make use of the same pivotal quantity, $\frac{b_1 - \beta_1}{s_{b_1}}$, which is t distributed.
Since we often include the intercept in our model anyway, a hypothesis test on β0 may be redundant. Our main goal is to see whether there exists a linear relationship between the two variables, which is indicated by the slope, β1.
We first state the null and alternative hypotheses:
$H_0: \beta_1 = (\le, \ge)\ \beta_1^*$
$H_a: \beta_1 \ne (>, <)\ \beta_1^*$

Hypothesis tests about β0 and β1 (continued)
To test this hypothesis, a t statistic is used: $t = \frac{b_1 - \beta_1^*}{s_{b_1}}$.
A significance level, α, is specified to decide whether or not to reject the null hypothesis.
Possible alternative hypotheses and the corresponding decision rules are:

Alternative                      Decision Rule
$H_a: \beta_1 \ne \beta_1^*$     Reject $H_0$ if $|t| > t_{\alpha/2, n-2}$
$H_a: \beta_1 > \beta_1^*$       Reject $H_0$ if $t > t_{\alpha, n-2}$
$H_a: \beta_1 < \beta_1^*$       Reject $H_0$ if $t < -t_{\alpha, n-2}$
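
The standard decision rules for a t test on the slope (two-sided: reject if $|t| > t_{\alpha/2, n-2}$; upper-tailed: reject if $t > t_{\alpha, n-2}$; lower-tailed: reject if $t < -t_{\alpha, n-2}$) can be sketched as follows. The function names, the labels `'two-sided'`, `'greater'` and `'less'`, and the numbers are hypothetical; critical values are supplied by the caller from a t table.

```python
import math


def t_statistic(b1, beta1_star, s_b1):
    """t = (b1 - beta1*) / s_b1."""
    return (b1 - beta1_star) / s_b1


def reject_h0(t_stat, t_crit, alternative):
    """Apply the decision rule for the given alternative hypothesis.

    t_crit must be t_{alpha/2, n-2} for 'two-sided'
    and t_{alpha, n-2} for 'greater' or 'less'.
    """
    if alternative == "two-sided":
        return abs(t_stat) > t_crit
    if alternative == "greater":
        return t_stat > t_crit
    if alternative == "less":
        return t_stat < -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical inputs: b1 = 0.6, s_b1 = sqrt(0.08), n = 5 (3 degrees of freedom).
t_stat = t_statistic(0.6, 0.0, math.sqrt(0.08))
two_sided = reject_h0(t_stat, 3.182, "two-sided")  # t_{0.025, 3} from a table
upper_tail = reject_h0(t_stat, 2.353, "greater")   # t_{0.05, 3} from a table
print(f"t = {t_stat:.3f}, reject two-sided: {two_sided}, reject upper-tail: {upper_tail}")
```

In this toy case t is about 2.12, which falls short of both critical values, so $H_0: \beta_1 = 0$ is not rejected at the 5% level under either alternative.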

Pricing Communication Nodes
In recent years the growth of data communications networks has been amazing. The convenience and capabilities afforded by such networks are appealing to businesses with locations scattered throughout the US and the world. Using networks allows centralization of an information system with access through personal computers at remote locations. The cost of adding a new communications node at a location not currently included in the network was of concern for a major Fort Worth manufacturing company. To try to predict the price of new communications nodes, data were obtained on a sample of existing nodes. The installation cost and the number of ports available for access in each existing node were readily available information. (Applied Regression Analysis by Dielman)

Pricing Communication Nodes
[Two slides whose content (figures or software output) was not captured in the transcription.]

ANOVA
[Slide content was not captured in the transcription.]

The Coefficient of Determination
In an exact or deterministic relationship, SSR = SST and SSE = 0. This would imply that a straight line could be drawn through each observed value.
Since this is not the case in real life, we need a measure of how well the regression line fits the data.
The coefficient of determination gives the proportion of the total variation in the response that is explained by the regression line and is denoted by $R^2$.
$R^2 = \frac{SSR}{SST}$
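
The quantities SST, SSR and SSE are not defined on the surviving slides, so the sketch below assumes their usual meanings: total, regression and error sums of squares, with SST = SSR + SSE for a least squares fit that includes an intercept. The function name and data are hypothetical.

```python
def sums_of_squares(x, y):
    """Return (sst, ssr, sse) for a simple least squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained
    return sst, ssr, sse


# Hypothetical toy data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(x, y)
r_squared = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R^2 = {r_squared:.2f}")
```

For this toy sample SST = 6, SSR = 3.6 and SSE = 2.4, so the line explains 60% of the variation in y.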

The Correlation Coefficient
For simple linear regression the correlation coefficient is $r = \pm\sqrt{R^2}$, taking the sign of the slope $b_1$.
This does not apply to multiple linear regression.
If the sign of r is positive, then the relationship between the variables is direct; otherwise it is inverse.
r ranges between -1 and 1.
A correlation of 0 merely implies no linear relationship.
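
To illustrate the link between r and $R^2$, the sketch below computes the sample correlation directly from its usual definition and compares it with $\sqrt{R^2}$ carrying the sign of the fitted slope. The function names and data are hypothetical, and the sample-correlation formula is the standard one rather than something stated on the slide.

```python
import math


def pearson_r(x, y):
    """Sample correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_from_regression(x, y):
    """sqrt(R^2) with the sign of the least squares slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return math.copysign(math.sqrt(ssr / sst), b1)


# Hypothetical toy data: one increasing and one decreasing relationship.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [5, 4, 5, 4, 2]
r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"r (direct) = {r_up:.4f}, r (inverse) = {r_down:.4f}")
```

Both routes give the same value in each case: a positive r for the increasing data and a negative r for the decreasing data.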

The F Statistic
An additional measure of how well the regression line fits the data is provided by the F statistic, which tests whether the equation $\hat{y} = b_0 + b_1 x$ provides a better fit to the data than the equation $\hat{y} = \bar{y}$.
$F = \frac{MSR}{MSE}$
where $MSR = SSR/1$ and $MSE = SSE/(n - 2)$.
The degrees of freedom corresponding to SSR and SSE add up to the total degrees of freedom, $n - 1$.

The F Statistic (continued)
To formalize the use of the F statistic, consider the hypotheses $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$.
We reject $H_0$ if $F > F_{\alpha, 1, n-2}$.
For simple linear regression, $F = \frac{MSR}{MSE} = t^2$.
Since both a t-test and an F-test will yield the same conclusions, it doesn't matter which one we use.
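
The identity $F = t^2$ for simple linear regression can be checked numerically. The sketch below is only an illustration on hypothetical data; `f_and_t` is an assumed helper name, and t is the statistic for $H_0: \beta_1 = 0$, i.e. $t = b_1 / s_{b_1}$.

```python
import math


def f_and_t(x, y):
    """Return (F, t) for testing H0: beta1 = 0 in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    msr = ssr / 1          # one degree of freedom for the regression
    mse = sse / (n - 2)    # n - 2 degrees of freedom for the error
    f_stat = msr / mse
    t_stat = b1 / math.sqrt(mse / sxx)
    return f_stat, t_stat


# Hypothetical toy data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f_stat, t_stat = f_and_t(x, y)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

For these points MSR = 3.6 and MSE = 0.8, so F = 4.5, which matches $t^2$ up to rounding error.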

Pricing Communication Nodes
[Three slides whose content (figures or software output) was not captured in the transcription.]

What Makes a Prediction Interval Wider?
The difference arises from the difference between the variation in the mean of y and the variation in one individual y value.
Variance for the mean of y at $x_f$:
$V(\bar{y}_f) = \sigma_e^2 \left( \frac{1}{n} + \frac{(x_f - \bar{x})^2}{(n - 1) s_x^2} \right)$
Variance for an individual y value at $x_f$:
$V(y_f) = \sigma_e^2 \left( 1 + \frac{1}{n} + \frac{(x_f - \bar{x})^2}{(n - 1) s_x^2} \right)$
Replace $\sigma_e^2$ by $s_e^2$ when the error variance is not known and is to be estimated.
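
A short numerical sketch makes the gap between the two variances concrete: the individual-value variance exceeds the mean variance by exactly one extra copy of the error variance. The function names and inputs are hypothetical, and `sxx` stands for $(n - 1) s_x^2 = \sum (x_i - \bar{x})^2$.

```python
def variance_of_mean(s2, n, x_f, x_bar, sxx):
    """Variance used for estimating the mean of y at x_f."""
    return s2 * (1 / n + (x_f - x_bar) ** 2 / sxx)


def variance_of_individual(s2, n, x_f, x_bar, sxx):
    """Variance used for predicting one individual y at x_f."""
    return s2 * (1 + 1 / n + (x_f - x_bar) ** 2 / sxx)


# Hypothetical inputs: s_e^2 = 0.8, n = 5, x_bar = 3, (n - 1) * s_x^2 = 10.
s2, n, x_bar, sxx = 0.8, 5, 3.0, 10.0

v_mean = variance_of_mean(s2, n, 4.0, x_bar, sxx)
v_ind = variance_of_individual(s2, n, 4.0, x_bar, sxx)
print(f"at x_f = 4: mean variance = {v_mean:.2f}, individual variance = {v_ind:.2f}")

# Both variances grow as x_f moves away from x_bar.
v_mean_far = variance_of_mean(s2, n, 6.0, x_bar, sxx)
print(f"at x_f = 6: mean variance = {v_mean_far:.2f}")
```

At $x_f = 4$ the two values are 0.24 and 1.04, a difference of 0.8, which is the estimated error variance itself.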

Assessing the Quality of Fit
The mean square deviation is commonly used:
$MSD = \frac{\sum_{i=1}^{n_h} (y_i - \hat{y}_i)^2}{n_h}$
where $n_h$ is the size of the hold-out sample.
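
A minimal sketch of the hold-out calculation, assuming the line has already been fitted on a separate sample: the coefficients, the hold-out points and the function name are all hypothetical.

```python
def mean_square_deviation(b0, b1, x_holdout, y_holdout):
    """Average squared prediction error over the hold-out sample."""
    n_h = len(x_holdout)
    squared_errors = [
        (yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x_holdout, y_holdout)
    ]
    return sum(squared_errors) / n_h


# Hypothetical fitted line and hold-out observations (not from the slides).
b0, b1 = 2.2, 0.6
x_holdout = [6, 7]
y_holdout = [6, 5]

msd = mean_square_deviation(b0, b1, x_holdout, y_holdout)
print(f"MSD = {msd:.2f}")
```

The predictions are 5.8 and 6.4, the squared errors are 0.04 and 1.96, and their average is 1.0. Because the hold-out points were not used to fit the line, this figure reflects prediction quality rather than in-sample fit.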

Formulation
If a linear relationship between y and a set of x's is believed to exist, this relationship is expressed through an equation for a plane:
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_p x_p$
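
The surviving slides do not show how the coefficients of the plane are computed. Assuming the same least squares criterion as in the simple case, one common route is to solve the normal equations $(X^\top X)\, b = X^\top y$, where X is the design matrix with a leading column of ones. The sketch below does this with plain Gaussian elimination; all names and data are hypothetical, and a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_plane(rows, y):
    """rows: one [x1, ..., xp] list per observation. Returns [b0, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_plane(rows, y)
print([round(c, 6) for c in coefs])
```

Because the toy responses lie exactly on a plane, the fit recovers the generating coefficients up to rounding error, which is a convenient sanity check for the solver.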

Meddicorp Sales
Meddicorp Company sells medical supplies to hospitals, clinics, and doctors' offices. The company currently markets in three regions of the United States: the South, the West, and the Midwest. These regions are each divided into many smaller sales territories. Meddicorp's management is concerned with the effectiveness of a new bonus program. This program is overseen by regional sales managers and provides bonuses to salespeople based on performance. Management wants to know if the bonuses paid in 2003 were related to sales. In determining whether this relationship exists, they also want to take into account the effects of advertising. (Applied Regression Analysis by Dielman)

Meddicorp Sales
[Slide content (figure or software output) was not captured in the transcription.]

Assumptions
The assumptions are the same as for the simple linear regression model. Thus the population regression equation is written as
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_p x_{ip} + e_i$

Inferences About the Population Regression Coefficients
[Slide content was not captured in the transcription.]

$R^2$ and $R^2_{adj}$
An $R^2$ value is computed as in the case of simple linear regression. Although it has a nice interpretation, it also has a drawback in the case of multiple linear regression.
Intuitively, $R^2$ will never decrease as we add more independent variables to the model, regardless of the fact that the variables being thrown into the model may explain only an insignificant portion of the variation in y. And as far as we can tell from $R^2$ alone, the closer it is to 1, the better.
This means we have to somehow account for how many variables we include in our model. In other words, we need to somehow "penalize" for the number of variables included in the model.
Always remember that the simpler the model we come up with, the better.
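
The transcription breaks off before the adjusted measure is defined. The sketch below therefore uses the usual textbook definition, $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$ with p independent variables, which is an assumption about what the slides go on to present. The function name and the numbers are hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Usual textbook adjusted R^2 for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical comparison: adding a second variable nudges R^2 up slightly.
one_variable = adjusted_r_squared(0.60, n=5, p=1)
two_variables = adjusted_r_squared(0.61, n=5, p=2)
print(f"p = 1: {one_variable:.3f}, p = 2: {two_variables:.3f}")
```

Here $R^2$ rises from 0.60 to 0.61, yet the adjusted value falls from about 0.467 to 0.22, which is the kind of penalty for extra variables that the text calls for.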
