Pearson Product-moment Correlation


Correlation, Simple Regression, and Low-Order Multiple Regression

Pearson product-moment correlation is what we will usually mean by "correlation". It describes the strength of a linear relationship between x and y.

[Figure: scatterplot of Y vs. X, correlation 0.8]

[Figure: scatterplot of Y vs. X with more scatter, correlation 0.55]

[Figure: scatterplot mixing points with correlation 1 and points with correlation 0; correlation for all 18 points is 0.707, correlation squared is 0.5]

When points having a perfect correlation are mixed with an equal number of points having no correlation, and the two sets have the same mean and variance for X and Y, the correlation is 0.707. Correlation squared ("amount of variance accounted for") is 0.5.

[Figure: scatterplot with one outlier in the upper right, correlation 0.87]

The correlation of 0.87 is due to one outlier in the upper right. If domination by one case is not desired, the Spearman rank correlation (the correlation among ranks instead of actual values) can be used.
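A minimal sketch (made-up data, using scipy) of the effect just described: one extreme point can dominate the Pearson correlation, while the Spearman rank correlation resists it.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)
x = rng.normal(size=20)
y = rng.normal(size=20)          # unrelated to x
x = np.append(x, 8.0)            # one extreme point in the upper right
y = np.append(y, 8.0)

print("Pearson:  %.2f" % pearsonr(x, y)[0])   # inflated by the outlier
print("Spearman: %.2f" % spearmanr(x, y)[0])  # much closer to zero
```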

[Figure: scatterplot with correlation 0 but a strong nonlinear relationship]

The Pearson correlation only detects linear relationships.

[Figure: scatterplot with correlation 0.9 but an exact nonlinear relationship between y and x]

A correlation of 0.9 can occur even when y is an exact nonlinear function of x.

Standardizing X and Y to equalize their units: find the mean and SD of the x data set, then convert each x element to its standardized value (z):

$z_x = \frac{x - \bar{x}}{s_x}$

Then do the same for y. Now their units are on an equal footing, with mean 0 and SD 1.

Covariance and correlation between two variables. The covariance of x with itself (can do the same for y) is its variance:

$\operatorname{cov}(x,x) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = s_x^2$

The covariance of x vs. y:

$\operatorname{cov}(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

Dividing by both standard deviations gives

$r = \frac{\operatorname{cov}(x,y)}{s_x\, s_y}$

This is the Pearson product-moment correlation (the "standard" correlation).
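The quantities above can be checked with a short script; the data here are made up, and numpy's built-in corrcoef serves as the reference.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 6.0, 8.0])
n = len(x)

# Standardized values: mean 0, SD 1
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Covariance and Pearson correlation "by hand"
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Equivalent: the mean product of the z-scores (same n-1 convention)
r_from_z = (zx * zy).sum() / (n - 1)

print(r, r_from_z, np.corrcoef(x, y)[0, 1])   # all three agree
```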

[Figure: correlation skill for NINO3 forecasts (correlation between forecast and obs), showing the Northern Spring barrier. The basis of climate predictability lies in the predictability of ENSO; the skill of the Cane-Zebiak model in predicting SST in the tropical Pacific is good.]

Lead time and forecast skill

[Figure: correlation between temperature and precipitation forecasts and their subsequent corresponding observations, as a function of forecast lead time (days). Weather forecasts (from initial conditions) start near correlation 0.9 and decay quickly; seasonal forecasts (from boundary conditions) retain potential sub-seasonal predictability. Vertical axis: correlation from 0 (zero) through poor, fair, and good, up to 0.9.]

Skill of forecasts at different time ranges:

  1-2 day weather               good
  3-7 day weather               fair
  Second week weather           poor, but not zero
  Third week weather            virtually zero
  Fourth week weather           virtually zero
  1-month climate (day 1-31)    poor to fair
  1-month climate (day 15-45)   poor, but not zero
  3-month climate (day 15-99)   poor to fair

At shorter ranges, forecasts are based on initial conditions, and skill deteriorates quickly with time. Skill gets better at long range for ample time-averaging, due to consistent boundary condition forcing.

Approximate* standard error of a zero correlation coefficient (as would be expected if X and Y are independent random data):

$\sigma_r \approx \frac{1}{\sqrt{n-1}}$

Examples of $\sigma_r$ and critical values for 2-sided significance at the 0.05 level for various sample sizes n:

  n      sigma_r   critical value
  10     0.33      0.65
  20     0.23      0.45
  50     0.14      0.28
  100    0.10      0.20
  400    0.05      0.10

*For small n, true values of $\sigma_r$ are slightly smaller.

Note: For significance of a correlation, the z-distribution is used, rather than the t-distribution, for any sample size.

Confidence intervals for a nonzero correlation (r) are smaller than those for zero correlation, and are asymmetric such that the interval toward lower absolute values of r is larger. For example, for n = 100 and r = 0.35, the 95% confidence interval is 0.17 to 0.51: that is 0.35 minus 0.18, but 0.35 plus 0.16. (For r = 0, it is 0 plus 0.20 and 0 minus 0.20 – a larger span.) The sampling distribution around a population correlation is computed using the Fisher r-to-Z transformation, then finding a symmetric confidence interval in Z, then finally converting back to r.
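A sketch of the Fisher r-to-Z procedure just described, reproducing the n = 100, r = 0.35 example (the lower bound comes out 0.16 rather than the slide's 0.17, presumably rounding or a slightly different SE convention).

```python
import numpy as np

r, n = 0.35, 100
Z = np.arctanh(r)                 # Fisher r-to-Z transformation
se = 1.0 / np.sqrt(n - 3)         # standard error of Z
lo, hi = np.tanh(Z - 1.96 * se), np.tanh(Z + 1.96 * se)
print("95%% CI: %.2f to %.2f" % (lo, hi))   # about 0.16 to 0.51, asymmetric about 0.35
```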

The use of linear correlation for prediction: Simple Linear Regression ("simple" implies just one predictor; with more than one, it is Multiple Linear Regression). The aim is to determine a regression line to fit the points on the x vs. y scatterplot, so that given a value of x, a "best prediction" can be made for y.

A line in the x vs. y coordinate system has the form

$y = a + bx$

where a is the y-intercept and b is the slope. The regression line is defined such that the sum of the squares of the errors (the predicted y vs. the true y) is minimized. Such a line predicts y from x such that, in standardized units,

$\hat{z}_y = r\, z_x$

For example, if r = 0.5, then y will be predicted to be half as many SDs away from its mean as x.

Proof that $b = r$ minimizes the squared errors. That is, proof that the slope (the "b" in $\hat{z}_y = b z_x$) should be set to the correlation coefficient between y and x when y and x are in standardized (z) form, where their means are zero and SDs are 1.

The squared error to be minimized, where i ranges from 1 to n pairs of predicted versus actual values of y, is

$\sum_{i=1}^{n} \left(\hat{z}_{y_i} - z_{y_i}\right)^2$

where $\hat{z}_{y_i}$ refers to the predicted standardized value of y, and $z_{y_i}$ to the actual (observed) standardized value of y. Substituting $b\,z_{x_i}$ for $\hat{z}_{y_i}$ leads to

$\sum_{i=1}^{n} \left(b\,z_{x_i} - z_{y_i}\right)^2$

Expanding the square gives

$\sum_{i=1}^{n} \left( b^2 z_{x_i}^2 - 2b\, z_{x_i} z_{y_i} + z_{y_i}^2 \right)$

Because $\frac{1}{n}\sum z_i^2 = 1$ for any standardized variable, and $\frac{1}{n}\sum z_{x_i} z_{y_i} = r$, the expression to be minimized reduces to

$b^2 - 2br + 1$

To find what value of b minimizes the expression, set its derivative to zero:

$\frac{d}{db}\left(b^2 - 2br + 1\right) = 2b - 2r = 0$

We then see that $b = r$.

[Figure: simple regression prediction of streamflow (×1000 lbs) from July rainfall (cm), and deviations of the observations from the regression predictions (errors).]

Simple regression prediction, standardized units:

$\hat{z}_y = r\, z_x$

If we incorporate the physical units of x and y rather than the standardized (z) version in SD units, we get:

$\hat{y} = \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x})$

The above equation "tailors" the basic z relationship by adjusting for (1) the ratio of the SD of y to the SD of x, and (2) the difference between the mean of y and the mean of x. Here $r\,\frac{s_y}{s_x}$ is the slope (b) of the regression line, and $\bar{y} - b\bar{x}$ is the y-intercept.
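A quick numerical check (made-up data) that the slope and intercept stated above reproduce the least-squares line:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(50.0, 10.0, size=200)
y = 2.0 * x + rng.normal(0.0, 15.0, size=200)

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)   # slope from r and the two SDs
a = y.mean() - b * x.mean()             # intercept from the two means

b_ls, a_ls = np.polyfit(x, y, 1)        # direct least-squares fit for comparison
print(b, b_ls)   # agree
print(a, a_ls)   # agree
```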

Standard error of estimate of regression forecasts. It is the standard deviation of the error distribution, where the errors are $\hat{y}_i - y_i$.

Standard Error of Estimate (of standardized y data, or $z_y$) $= \sqrt{1 - r^2}$

Standard Error of Estimate (of actual y data in physical units) $= s_y \sqrt{1 - r^2}$

When cor = 0, the Standard Error of Estimate is the same as the SD of y. When cor = 1, the Standard Error of Estimate is 0 (all errors are zero).

[Figure: regression line of Y on X in physical units; a vertical arrow shows the 68% confidence interval (±1 SD = standard error of estimate) around the regression prediction.]

The linear regression model can lead to probability forecasts for any result, given the exact prediction and the correlation, and an assumption that the variables are normally distributed.

Correlation vs. Standard Error of Estimate

Standard error of estimate, as a fraction of the SD of the predictand (y): $\sqrt{1 - r^2}$

  Correlation   Standard error of estimate
  0.00          1.00
  0.10          0.99
  0.20          0.98
  0.30          0.95
  0.40          0.92
  0.50          0.87
  0.60          0.80
  0.70          0.71
  0.80          0.60
  0.90          0.44
  0.98          0.20
  0.995         0.10
  1.00          0.00

We need quite a high correlation to get a low standard error of estimate: we need cor = 0.866 to get the SD of the error down to half of the SD of the predicted variable (y).

The standard error of estimate (in standardized units) for the prediction model as a whole (generalized for any possible values of x) is

$\sqrt{1 - r^2}$

But this can be defined more accurately if we know the x value. Let $z_0$ be the standardized value of the predictor (x). Then the standard error of estimate as a function of $z_0$ is

$\sqrt{1 - r^2}\;\sqrt{\frac{1}{n} + \frac{z_0^2}{n-1}}$

If we are dealing with a single case, 1 is added to the content under the second square root term:

$\sqrt{1 - r^2}\;\sqrt{1 + \frac{1}{n} + \frac{z_0^2}{n-1}}$

The standard error is larger when the x value is farther away from its mean. There is also an "unbiasing" adjustment, even if x is at its mean. Both of these effects are smaller when the sample size is larger.
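A sketch of the $z_0$-dependent standard error, assuming the standard simple-regression formula reconstructed above (the slide's exact expression did not survive transcription):

```python
import numpy as np

def se_single_case(r, n, zo):
    """SE of estimate for one new case whose predictor is zo SDs from the mean."""
    return np.sqrt(1.0 - r**2) * np.sqrt(1.0 + 1.0/n + zo**2 / (n - 1))

for zo in (0.0, 1.0, 2.0):
    print(zo, se_single_case(r=0.5, n=30, zo=zo))
# SE grows as zo moves away from the mean; the 1/n and zo^2/(n-1)
# adjustments both shrink as the sample size grows.
```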

Simple Linear Regression Problem: Coupled GCM forecasts for Fiji for next Jan-Feb-Mar

Suppose we know that the correlation between a coupled GCM rainfall forecast for parts of Fiji in Jan-Feb-Mar (made at the beginning of December) and the actual rainfall is 0.52. This does not come as a surprise, because we know that Fiji is sensitive to the ENSO state and that climate models are able to reproduce this relationship to a moderate extent. By early December the ENSO state is usually stable.

Suppose we want to issue a rainfall forecast for the station of Nadi on the north side of the main Fiji island, using the forecast from this model. We have the following historical data:

  Model predictions (JFM):  Mean: 1140 mm   SD: 700 mm
  Observations (JFM):       Mean: 935 mm    SD: 500 mm

If the model forecast for the current year is 1890 mm, what would be our regression-based best forecast for the actual precipitation?

JFM season in Nadi, Fiji:

  Model predictions (JFM):  Mean: 1140 mm   SD: 700 mm
  Observations (JFM):       Mean: 935 mm    SD: 500 mm
  Correl (forecast vs. observations) = 0.52

We use $\hat{z}_y = r\, z_x$. The z value for the predictor (the model predicts 1890 mm) is (1890 − 1140) / 700 = 1.07. Then the z value for the forecast of precip ($\hat{z}_y$) is (0.52)(1.07) = 0.56: the forecast of precip is 0.56 SDs above its mean.

  Forecast of precip = mean of y + (0.56)(SDy)
  Forecast of precip = 935 + 0.56(500) = 935 + 280 = 1215 mm

Standard error of estimate (standardized units) = $\sqrt{1 - 0.52^2}$ = 0.854
Standard error of estimate (physical units) = (0.854)(500) = 427 mm

Since we do not know the sample size used to develop this regression model, we cannot compute the standard error of estimate for this forecast specifically (i.e., as a function of $z_0$).
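The Nadi worked example, written out as a script:

```python
import numpy as np

r = 0.52
x_mean, x_sd = 1140.0, 700.0     # model predictions (JFM), mm
y_mean, y_sd = 935.0, 500.0      # observations (JFM), mm
x_new = 1890.0                   # this year's model forecast, mm

zx = (x_new - x_mean) / x_sd     # 1.07
zy_hat = r * zx                  # about 0.56
forecast = y_mean + zy_hat * y_sd
print("forecast: %.0f mm" % forecast)   # about 1214 mm (slide rounds z to 0.56, giving 1215)

se_std = np.sqrt(1.0 - r**2)                                # 0.854 (standardized units)
print("std error of estimate: %.0f mm" % (se_std * y_sd))   # about 427 mm
```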

What probabilistic forecast does this imply?

[Figure (courtesy Mike Tippett): frequency vs. normalized rainfall, showing the historical (climatological) distribution with tercile probabilities (33.3%, 33.3%, 33.3%) and a shifted, narrowed forecast distribution with probabilities (15%, 32%, 53%).]

Historically, the probabilities of above and below are 0.33. Shifting the mean by one half standard deviation and reducing the variance by 20% changes the probability of below to 0.15 and of above to 0.53. The correlation skill would be 0.45, and the predictor signal strength would be 1.11 SD units.

A "strong" shift of odds in a rainfall forecast for Kenya during El Niño (OND).

[Figure: climatological (black) and forecast (red) distributions; forecast tercile probabilities 13% / 29% / 59%.]

Steps in finding the probabilities of each of the tercile-based categories (below, near, and above normal), as coded in the sketch below:

1. Use regression to make a deterministic (single point) forecast.
2. Determine the standard error of estimate to represent the uncertainty of the deterministic forecast.
3. Use the standard error of estimate to form a forecast distribution (i.e., make the red curve).
4. Find what values of z on the forecast distribution coincide with the tercile boundaries of the climatological distribution (33rd and 67th percentiles on the black curve). Then use a z-table to get the probabilities associated with these z values.
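A sketch of these four steps in code, assuming Gaussian distributions as on the earlier slide; the hypothetical function tercile_probs reproduces the earlier example (a mean shift of 0.5 SD with correlation skill 0.45 gives roughly 15 / 32 / 53).

```python
from scipy.stats import norm

def tercile_probs(f_signal, r):
    """f_signal: deterministic forecast in SDs of the predictand; r: correlation skill."""
    se = (1.0 - r**2) ** 0.5                  # SD of the forecast distribution
    lo, hi = norm.ppf(1/3), norm.ppf(2/3)     # climatological tercile boundaries
    p_below = norm.cdf(lo, loc=f_signal, scale=se)
    p_above = 1.0 - norm.cdf(hi, loc=f_signal, scale=se)
    return p_below, 1.0 - p_below - p_above, p_above

print(tercile_probs(f_signal=0.5, r=0.45))    # about (0.15, 0.32, 0.53)
```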

Tercile probabilities for various correlation skills and predictor signal strengths (in SDs). Assumes a Gaussian probability distribution. Forecast (F) signal = (Predictor Signal) × (Correl Skill). Each cell shows the F signal and the below / near / above probabilities (%).

  Correl   Signal 0.0     Signal 0.5     Signal 1.0     Signal 1.5     Signal 2.0
  0.00     F 0.00         F 0.00         F 0.00         F 0.00         F 0.00
           33/33/33       33/33/33       33/33/33       33/33/33       33/33/33
  0.20     F 0.00         F 0.10         F 0.20         F 0.30         F 0.40
           33/34/33       29/34/37       26/33/41       23/33/45       20/31/49
  0.30     F 0.00         F 0.15         F 0.30         F 0.45         F 0.60
           33/35/33       27/34/38       22/33/45       17/31/51       14/29/57
  0.40     F 0.00         F 0.20         F 0.40         F 0.60         F 0.80
           32/36/32       25/35/40       18/33/49       13/30/57       9/25/65
  0.50     F 0.00         F 0.25         F 0.50         F 0.75         F 1.00
           31/38/31       22/37/42       14/33/53       9/27/64        5/21/74
  0.60     F 0.00         F 0.30         F 0.60         F 0.90         F 1.20
           30/41/30       18/38/44       10/32/58       5/23/72        2/15/83
  0.70     F 0.00         F 0.35         F 0.70         F 1.05         F 1.40
           27/45/27       13/41/46       6/30/65        2/17/81        1/8/91
  0.80     F 0.00         F 0.40         F 0.80         F 1.20         F 1.60
           24/53/24       8/44/48        2/25/73        0*/10/90       0**/3/97

  *0.3   **0.04

[Figure: Niño3.4 SST anomaly predictions from July.]

Raw-scores regression formula:

$\hat{y} = \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x})$

where $\hat{y}$ is the predicted value of y from the regression on x. The slope can be nonzero only if the correlation is nonzero. Therefore, testing whether the slope is significantly different from zero should give the same result (the probability that it is different from zero by chance alone) as testing whether the correlation is significantly different from zero.

Hypothesis test for a correlation value

How can we reject the (null) hypothesis that a correlation value comes from a population having zero correlation? The standard error of the correlation coefficient with respect to zero correlation (approximate; slightly too strict for n < 10) is

$\sigma_r \approx \frac{1}{\sqrt{n-1}}$

See if your sample correlation falls outside of plus or minus 1.96 (for a 2-tailed test at the 5% level) times the above standard error. If not, it could have come from a population with zero correlation. For a 1-sided test (if the sign of the correlation is known or expected in advance of seeing the resulting experimental correlation), see if your sample correlation is greater in magnitude than 1.65 times the above standard error. For correlation, 1-sided tests are common.

Example of a hypothesis test for a correlation

Suppose we test the correlation between malaria incidence following the November – March rainy season in Botswana and the amount of rainfall during that rainy season. We know, before investigating the correlation, that more rainfall (except for extreme flooding conditions) creates a more favorable environment for the vector and thus greater risk for malaria.

Suppose for 10 years of data for rainfall during Nov – March and malaria during March – May, we get a correlation of 0.64. Is this statistically significant in terms of the hypothesis that the true population correlation is zero? That is, could the 0.64 have come about just by chance, due to natural sampling variations, and not due to a physical association between rainfall amount and malaria?

Since the slate is wiped clean for the rainfall – malaria relationship with each new year, we can use 10 as the degrees of freedom. (This might not be true if the cases were not independent, such as for adjacent seasons that have nonzero lag correlation in both rainfall and malaria.)

Example of a hypothesis test for a correlation

Sample size for rainfall and subsequent malaria incidence: n = 10
Correlation between rainfall and malaria incidence: 0.64

We set up a 1-sided z test for the correlation of 0.64. It is 1-sided because we have physical reason to expect a positive correlation rather than a negative correlation.

$z = \frac{0.64 - 0}{1/\sqrt{10-1}} = \frac{0.64}{0.333} = 1.92$

The numerator shows the correlation difference between the sample outcome and a population having zero correlation. Looking at a z table, the chance of equaling or exceeding z = 1.92 is 0.0274. Significance at the 5% level is therefore achieved.
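The same test as a script, using the slide's SE(r) ≈ 1/√(n−1) approximation:

```python
import numpy as np
from scipy.stats import norm

r, n = 0.64, 10
z = r / (1.0 / np.sqrt(n - 1))        # 0.64 * 3 = 1.92
p = norm.sf(z)                        # 1-sided tail probability
print("z = %.2f, p = %.4f" % (z, p))  # z = 1.92, p = 0.0274 -> significant at 5%
```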

But, as mentioned earlier, confidence intervals for a nonzero correlation are smaller than those for zero correlation, and are asymmetric such that the interval toward lower absolute values is larger. Significance tests against populations with nonzero correlation require the Fisher r-to-Z transformation, whose tables are available in many statistics books.

Temporal degrees of freedom (the number of independent time samples) can be less than the number of cases, due to autocorrelation in the data. To assess the effective degrees of freedom (from Livezey and Chen, 1983, Mon. Wea. Rev.), the time between independent samples is estimated from the autocorrelation function:

Integral time $\approx 1 + 2\sum_{\tau=1}^{\infty} \rho(\tau)$, where $\rho(\tau)$ is the autocorrelation at lag $\tau$.

Then the effective degrees of freedom = total period / integral time. For example, if there are 20 years of data and the integral time is 1.4 years, then there are 20/1.4 ≈ 14 degrees of freedom. Monte Carlo techniques can also be used to estimate temporal degrees of freedom, and spatial degrees of freedom as well.
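A sketch of the calculation, assuming the usual integral-time estimate given above (1 plus twice the sum of positive-lag autocorrelations); Livezey and Chen's exact recipe differs in detail.

```python
import numpy as np

def integral_time(x, max_lag=None):
    """Time between effectively independent samples, in time steps."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    max_lag = max_lag or n // 2
    var = np.sum(x * x) / n
    rho = [np.sum(x[:-k] * x[k:]) / (n * var) for k in range(1, max_lag + 1)]
    return 1.0 + 2.0 * np.sum(rho)

# AR(1) series with lag-1 autocorrelation 0.5:
# theoretical integral time is (1 + 0.5) / (1 - 0.5) = 3 steps
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.5 * x[t - 1] + rng.normal()

T0 = integral_time(x, max_lag=20)
print("integral time:", T0, " effective dof:", len(x) / T0)
```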

Standard error of the slope b (depends on b itself, on the correlation, and on the sample size n):

$\mathrm{StError}(b) = \frac{b}{r}\sqrt{\frac{1-r^2}{n-2}}$

See if the confidence interval around your sample slope, reaching about double (for a 2-tailed test at the 5% level) the StError(b) on either side of your sample slope, contains zero slope. If so, it could have come from a population with zero slope (retain the null hypothesis). Again, a significance test on the slope should agree with a significance test on the correlation itself.

Multiple Linear Regression uses 2 or more predictors. General form:

$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$

Let us take the simplest multiple regression case – two predictors:

$\hat{z}_y = b_1 z_{x_1} + b_2 z_{x_2}$

Here, the b's are not simply $r_{y,x_1}$ and $r_{y,x_2}$, unless x1 and x2 have zero correlation with one another. Any correlation between x1 and x2 makes determining the b's less simple. The b's are related to the partial correlation, in which the value of the other predictor(s) is held constant. Holding other predictors constant eliminates the part of the correlation due to the other predictors and not just to the predictor at hand.

Notation: the partial correlation of y with x1, with x2 held constant, is written $r_{y,x_1 . x_2}$.

For 2 (or any n) predictors, there are 2 (or any n) equations in 2 (or any n) unknowns to be solved simultaneously. When n = 3 or more, determinant operations are necessary. For the case of 2 predictors, and using z values (variables standardized by subtracting their mean and then dividing by the standard deviation) for simplicity, the solution can be done by hand. The two equations to be solved simultaneously are:

  b1.2              + b2.1 (cor x1,x2)  =  cor y,x1
  b1.2 (cor x1,x2)  + b2.1              =  cor y,x2

Goal: to find the two coefficients, b1.2 and b2.1 (called simply b1 and b2 in the equation at the top).

Example of a multiple regression problem with two predictors

The number of Atlantic hurricanes between June and November is slightly predictable 6 months in advance (in early December) using several precursor atmospheric and oceanic variables. Two variables used are:

(1) 500 mb geopotential height in November in the polar North Atlantic (67.5N-85N latitude, 10E-50W longitude)
(2) Sea level pressure in November in the north tropical Pacific (7.5N-22.5N latitude, 125-175W longitude)

[Figure: location of the two long-lead Atlantic hurricane predictors (sst/sst.anom.month.gif).]

Physical reasoning behind the two predictors:

(1) 500 millibar geopotential height in November in the polar North Atlantic. High heights are associated with a negative North Atlantic Oscillation (NAO) pattern, tending to associate with a stronger thermohaline circulation, and also tending to be followed by weaker upper-atmospheric westerlies and weaker low-level trade winds in the tropical Atlantic the following hurricane season. All of these favor hurricane activity.

(2) Sea level pressure in November in the north tropical Pacific. High pressure in this region in winter tends to be followed by La Niña conditions in the coming summer and fall, which favors easterly Atlantic wind anomalies aloft, and hurricane activity.

First step: find the "regular" correlations among all the variables (x1, x2, y): cor x1,y; cor x2,y; cor x1,x2.

X1: polar North Atlantic 500 mb height
X2: north tropical Pacific sea level pressure

  cor(x1,y) = 0.20    cor(x2,y) = 0.40    cor(x1,x2) = 0.30 (one predictor vs. the other)

Simultaneous equations to be solved:

  b1.2        + (0.30) b2.1  =  0.20
  (0.30) b1.2 + b2.1         =  0.40

We want to get one of the predictors to cancel out. Solution: multiply the 1st equation by 3.333, then subtract the second equation from the first equation:

  (3.033) b1.2 + 0 = 0.267

Dividing by 3.033: b1.2 = 0.088. Then use this in either equation to find b2.1 = 0.374. The regression equation is then

  z_y = (0.088) z_x1 + (0.374) z_x2

More detail on solving the two simultaneous equations:

  b1.2        + (0.30) b2.1  =  0.20
  (0.30) b1.2 + b2.1         =  0.40

Solution: multiply the 1st equation by 3.333:

  (3.333) b1.2 + (3.333)(0.30) b2.1 = (3.333)(0.20)

so the 1st equation becomes (we want one of the predictors to cancel out; do it with b2.1):

  3.333 b1.2 + (1.0) b2.1 = 0.667

Now subtract the second equation from the new first equation:

  (3.333 - 0.3) b1.2 + (1 - 1) b2.1 = 0.667 - 0.40

Doing the subtraction yields 3.033 b1.2 = 0.267. Then divide both sides by 3.033: b1.2 = 0.267 / 3.033 = 0.088. Use this value of b1.2 in either equation, and get b2.1 = 0.374. The regression equation is then

  z_y = (0.088) z_x1 + (0.374) z_x2
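The same system solved with linear algebra, which is how one would proceed with 3 or more predictors; the multiple correlation R discussed below falls out as well.

```python
import numpy as np

Rxx = np.array([[1.0, 0.30],       # correlations among the predictors
                [0.30, 1.0]])
rxy = np.array([0.20, 0.40])       # correlations of each predictor with y

b = np.linalg.solve(Rxx, rxy)      # standardized regression weights
print(b)                           # [0.088, 0.374], as found by hand

R = np.sqrt(b @ rxy)               # multiple correlation
print(R)                           # about 0.408
```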

Multiple correlation coefficient R = the correlation between the predicted y and the actual y using multiple regression. In the example above, R = 0.408. Note this is only very slightly better than using the second predictor alone in simple regression. This is not surprising, since the first predictor's total correlation with y is only 0.2, and it is correlated 0.3 with the second predictor, so the second predictor already accounts for some of what the first predictor has to offer. A decision would probably be made concerning whether it is worth the effort to include the first predictor for such a small gain. Note: the multiple correlation can never decrease when more predictors are added.

Multiple R is usually inflated somewhat compared with the true relationship, since additional predictors fit the accidental variations found in the data sample. Adjustment (decrease) of R for the existence of multiple predictors gives a less biased estimate of R:

Adjusted $R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$

where n = sample size and k = number of predictors.
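A small illustration of how the adjustment shrinks R, using the formula above (the usual adjusted-R² expression, assumed here since the slide's exact formula did not survive transcription) with an assumed sample size:

```python
import numpy as np

def adjusted_R(R, n, k):
    """Adjusted multiple correlation for n cases and k predictors."""
    R2_adj = 1.0 - (1.0 - R**2) * (n - 1) / (n - k - 1)
    return np.sqrt(max(R2_adj, 0.0))

print(adjusted_R(R=0.408, n=30, k=2))   # noticeably smaller than 0.408
```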

Sampling variability of a simple (x, y) correlation coefficient around zero, when the population correlation is zero, is approximately

$\sigma_r \approx \frac{1}{\sqrt{n-1}}$

In multiple regression the same approximate relationship holds, except that n must be further decreased, depending on the number of predictors additional to the first one. If the number of predictors (x's) is denoted by k, then the sampling variability of R around zero, when there is no true relationship with any of the predictors, is given by

$\sigma_R \approx \frac{1}{\sqrt{n-k}}$

It becomes easier to get a given multiple correlation by chance as the number of predictors increases.

Hypothesis test for a multiple correlation value

How can we reject the (null) hypothesis that a multiple correlation value comes from a population having zero correlation? The standard error of the correlation coefficient with respect to zero correlation (approximate; slightly too strict for n < 10) is

$\sigma_R \approx \frac{1}{\sqrt{n-k}}$

See if your sample correlation (R) equals or exceeds 1.96 (for a 2-sided test at the 5% level) times the above standard error, or 1.65 (for a 1-sided test at the 5% level) times it. If not, it could have come from a population with zero correlation. For multiple correlations, a 1-sided test can be used only when the signs of the correlations between each individual predictor and the predictand (y) are anticipated before the experiment, and when the results confirm those expected correlation signs. (Note: R is always positive.)

Example of a hypothesis test for a multiple correlation

As a follow-up to the hypothesis test of the positive rainfall vs. malaria correlation in Botswana presented in the section on simple regression, suppose we now use both rainfall and temperature as predictors of malaria incidence. We expect greater rainfall to result in greater malaria incidence, but also expect higher temperature to increase incidence, so we use both as predictors in multiple regression.

Suppose for 10 years of data for rainfall during Nov – March and malaria during the following March – May, using a correlation of 0.64 for rainfall vs. malaria, 0.46 for temperature vs. malaria, and 0.35 for rainfall vs. temperature, we get a multiple correlation of 0.69. Is this statistically significant in terms of the null hypothesis that the true population multiple correlation is zero? (Could the 0.69 have come about just by chance, due to natural sampling variations among x1, x2, and y, and not due to a physical association involving the combined predictive effects of rainfall and temperature on malaria?)

Example of a hypothesis test for a multiple correlation

Sample size for rainfall, temperature, and malaria incidence: n = 10
Multiple correlation between (rain, temp) and malaria incidence: 0.69; here k = 2

We do a 1-sided z test for the 0.69 correlation. A 1-sided test is justified, given that the correlations between malaria and both climate variables are positive, as expected on the basis of malaria knowledge.

$z = \frac{0.69 - 0}{1/\sqrt{10-2}} = \frac{0.69}{0.354} = 1.95$

The numerator shows the correlation difference between the sample outcome and a population having zero correlation. Looking at the z table, the chance of equaling or exceeding 1.95 is 0.5 − 0.4744 = 0.0256. Significance at the 5% level is achieved.
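The same test as a script, using SE(R) ≈ 1/√(n−k):

```python
import numpy as np
from scipy.stats import norm

R, n, k = 0.69, 10, 2
z = R / (1.0 / np.sqrt(n - k))        # 0.69 * sqrt(8) = 1.95
print("z = %.2f, p = %.4f" % (z, norm.sf(z)))   # p about 0.0255 (slide: 0.0256 from the z table)
```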

Partial correlation is the correlation between y and x1 when a variable x2 is not allowed to vary. In terms of the simple correlations,

$r_{y,x_1.x_2} = \frac{r_{y,x_1} - r_{y,x_2}\,r_{x_1,x_2}}{\sqrt{(1 - r_{y,x_2}^2)(1 - r_{x_1,x_2}^2)}}$

and the standardized regression weight for the first predictor is

$b_1 = \frac{r_{y,x_1} - r_{y,x_2}\,r_{x_1,x_2}}{1 - r_{x_1,x_2}^2}$

A similar set of equations exists for b2 (the second predictor). Example: in an elementary school, reading ability (y) is well correlated with the child's weight (x1). But both y and x1 are really caused by something else: the child's age (call it x2). What would the correlation be between weight and reading ability if age were held constant? (Would it drop down to zero?)

Suppose the three correlations in a school study are:

  reading vs. weight:
  reading vs. age:
  weight vs. age:

The two partial correlations come out to be:

Finally, the two regression weights, for standardized variables, turn out to be:

Body weight is seen to be a minor factor compared with age, as its regression weight is near zero.

Suppose a group of people observes an increase in global temperature but does not believe it is due to greenhouse gas increases. Instead, they believe that the warming is due to the simple passage of time, as stipulated by their religious doctrine. To try to judge whether global warming can be attributed more to increases in greenhouse gas concentrations or to the march of time, we do a 2-predictor multiple regression:

  x1 = CO2 concentration (annual average)
  x2 = the year number
  y  = global mean temperature (annual average)

The correlations among x1, x2 and y:

  CO2 vs. global temperature:   0.89  (x1,y)
  year vs. global temperature:  0.85  (x2,y)
  CO2 vs. year:                 0.96  (x1,x2)

From these, the two partial correlations, and then the two regression weights for standardized variables, can be computed (see the sketch below). CO2 concentration is seen to be the dominant predictor.
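The slide's computed partial correlations and weights did not survive transcription; this sketch recomputes them from the three correlations given above, using the partial-correlation formula from the earlier slide.

```python
import numpy as np

ry1, ry2, r12 = 0.89, 0.85, 0.96   # CO2 vs. T, year vs. T, CO2 vs. year

def partial(ra, rb, rab):
    """Correlation of y with predictor a, holding predictor b constant."""
    return (ra - rb * rab) / np.sqrt((1 - rb**2) * (1 - rab**2))

print("partial r(y,x1.x2):", partial(ry1, ry2, r12))   # about 0.50
print("partial r(y,x2.x1):", partial(ry2, ry1, r12))   # about -0.03

b = np.linalg.solve([[1.0, r12], [r12, 1.0]], [ry1, ry2])
print("weights:", b)               # about [0.94, -0.06]: CO2 dominates
```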

Suppose the CO2 vs. year correlation is even higher:

  CO2 vs. global temperature:   0.89  (x1,y)
  year vs. global temperature:  0.85  (x2,y)
  CO2 vs. year:                 0.98  (x1,x2)

The two partial correlations, and then the two regression weights, again follow from these. When two predictors are correlated with one another and r(x2,y) < r(x1,y) · r(x1,x2), the weight for x2 becomes signed opposite to r(x2,y). (Here it becomes negative instead of positive.) In extreme cases the weights can take on very high magnitudes, and the regression can become unstable and even incalculable.

Two-predictor multiple regression: some examples of the behavior of the regression weights when x1, x2 and y are all standardized to equalize their units.

[Venn diagram: circles for Y, X1 and X2; region a = the part of Y's variance shared only with X1, b = the part shared only with X2, c = the part shared with both, e = the part of Y unexplained by either. r(y,x1.x2) is r(x1,y) when x2 is held constant.]

Squared correlations are additive; correlations are not.

Zero-order terms:
  r2(x1,y) = a + c
  r2(x2,y) = b + c
  R2(y, x1 & x2) = a + b + c

Semipartials:
  r2(y,x1.x2) = a
  r2(y,x2.x1) = b

Partials:
  r2(y,x1.x2) = a / (a + e)
  r2(y,x2.x1) = b / (b + e)

In the following 2-predictor examples, colors are used as follows:

Black: Independence of predictors: the information provided by each is unique.

Blue: Partial redundancy among predictors: part, but not all, of what x2 offers is already provided by x1. Both coefficients retain their original sign.

Green: Maximum redundancy among predictors: x2 adds nothing beyond what is provided by x1, so x2 is useless and has a coefficient of zero.

Purple: Redundancy among predictors, but r(x2,y) is low (or even zero), and x2 beneficially suppresses a part of x1 that is unrelated to y. The coefficient of x2 becomes opposite in sign to its simple correlation with y.

Red: One form of this condition is when the redundancy is less than the beneficial suppression, causing R to exceed that expected for independent predictors. A variation of this is when x1 and x2 have negative redundancy: e.g. r(x1,y) > 0, r(x2,y) > 0, r(x1,x2) < 0.

[Table: effect of the inter-predictor correlation r(x1,x2) on the weights (w1, w2) and the multiple correlation (R), for fixed r(x1,y) = .50. As r(x1,x2) increases, redundancy increases, decreasing the benefit from using both predictors instead of one.]

Effect of the inter-predictor correlation on the weights (w) and the multiple correlation (R), for r(x1,y) = .50 and r(x2,y) = .50:

  r(x1,x2)    Rmult
   0.20       0.645
   0.00       0.707
  -0.20       0.791

Redundancy occurs when r(x1,x2) is of the same sign as that of [r(x1,y)]·[r(x2,y)]. Enhancement occurs when r(x1,x2) is of sign opposite that of [r(x1,y)]·[r(x2,y)].

Effect of the inter-predictor correlation on the weights (w) and the multiple correlation (R), for r(x1,y) = .54. [Table: as r(x1,x2) increases, the cases run from redundancy (ry2 > (ry1)(r12)) through total redundancy* (ry2 = (ry1)(r12)) to suppression (ry2 < (ry1)(r12)).]

*When r(x1,x2) = .926, all the information about y provided by x2 is totally redundant with that carried by x1 (the weight of x2 becomes zero). Notation: ry1 is r(x1,y), ry2 is r(x2,y), r12 is r(x1,x2).

Effect of the inter-predictor correlation on the weights (w) and the multiple correlation (R), for r(x1,y) = .70 and r(x2,y) = .20. [Table: for example, at r(x1,x2) = .60, w1 = .91, w2 = -.34, R = .752 (suppression); the enhancement and redundancy regimes follow the sign of ry2 versus (ry1)(r12) as before.]

*When r(x1,x2) = .286, all the information about y provided by x2 is totally redundant with that carried by x1. Notation: ry1 is r(x1,y), ry2 is r(x2,y), r12 is r(x1,x2).

Effect of the inter-predictor correlation on the weights (w) and the multiple correlation (R), for r(x1,y) = .50 and r(x2,y) = .00. [Table: with r(x1,x2) = 0 ("independence"), x2 gets zero weight and R = .50; at r(x1,x2) = .4, R = 0.546; at r(x1,x2) = .6, w2 = -.47 and R = 0.625.]

X2 plays a role even though its correlation with y is zero: it suppresses the part of x1 that is unrelated to y.
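A small helper (a sketch, not from the slides) that reproduces rows of these tables from the three input correlations:

```python
import numpy as np

def two_predictor_fit(ry1, ry2, r12):
    """Standardized weights and multiple R for two predictors."""
    b = np.linalg.solve([[1.0, r12], [r12, 1.0]], [ry1, ry2])
    R = np.sqrt(b @ np.array([ry1, ry2]))
    return b[0], b[1], R

# Suppression example above: r(x1,y)=.70, r(x2,y)=.20, r(x1,x2)=.60
print(two_predictor_fit(0.70, 0.20, 0.60))   # about (0.91, -0.34, 0.752)

# Enhancement: r(x1,x2) opposite in sign to r(x1,y)*r(x2,y)
print(two_predictor_fit(0.50, 0.50, -0.20))  # R about 0.791, exceeding 0.707
```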


