Chapter 14
Correlation and Regression
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences, Eighth Edition
by Frederick J. Gravetter and Larry B. Wallnau
Chapter 14 Learning Outcomes
1. Understand Pearson r as a measure of the relationship between two variables
2. Compute Pearson r using the definitional or computational formula
3. Use and interpret Pearson r; understand its assumptions and limitations
4. Test a hypothesis about a population correlation (ρ) with a sample r
5. Understand the concept of a partial correlation
Chapter 14 Learning Outcomes (continued)
6. Explain/compute the Spearman correlation coefficient (ranks)
7. Explain/compute the point-biserial correlation coefficient (one dichotomous variable)
8. Explain/compute the phi-coefficient for two dichotomous variables
9. Explain/compute the linear regression equation to predict Y values
10. Evaluate the significance of a regression equation
Tools You Will Need
• Sum of squares (SS) (Chapter 4)
– Computational formula
– Definitional formula
• z-Scores (Chapter 5)
• Hypothesis testing (Chapter 8)
• Analysis of Variance (Chapter 12)
– MS values and F-ratios
14.1 Introduction to Correlation
• Measures and describes the relationship between two variables
• Characteristics of relationships
– Direction (negative or positive; indicated by the sign, + or –, of the correlation coefficient)
– Form (linear is most common)
– Strength or consistency (varies from 0 to 1)
• These characteristics are all independent of one another
Figure 14.1 Scatterplot for Correlational Data
Figure 14.2 Positive and Negative Relationships
Figure 14.3 Different Linear Relationship Values
14.2 The Pearson Correlation
• Measures the degree and the direction of the linear relationship between two variables
• Perfect linear relationship
– Every change in X has a corresponding change in Y
– Correlation will be –1.00 or +1.00

r = (covariability of X and Y) / (variability of X and Y separately)
Sum of Products (SP)
• Similar to SS (sum of squared deviations)
• Measures the amount of covariability between two variables
• SP definitional formula:

SP = Σ(X – MX)(Y – MY)
SP – Computational Formula
• The definitional formula emphasizes SP as the sum of products of two difference scores
• The computational formula results in easier calculations
• SP computational formula:

SP = ΣXY – (ΣX)(ΣY)/n
Pearson Correlation Calculation
• A ratio comparing the covariability of X and Y (numerator) with the variability of X and Y separately (denominator)

r = SP / √(SSX · SSY)
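The SP and r formulas above can be sketched in code. This is an illustrative sketch, not from the slides; the data values are hypothetical, chosen only to show that the definitional and computational formulas for SP agree:

```python
# Sketch of SP and Pearson r, assuming hypothetical data values.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Definitional formula: SP = sum of (X - Mx)(Y - My)
    sp_def = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Computational formula: SP = sum(XY) - (sum X)(sum Y)/n
    sp_comp = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    assert abs(sp_def - sp_comp) < 1e-9  # both formulas give the same SP
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    # r = SP / sqrt(SS_X * SS_Y)
    return sp_def / (ss_x * ss_y) ** 0.5

print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]), 4))
```

For these made-up scores, SP = 8, SSX = 10, SSY = 8.8, so r = 8/√88 ≈ 0.85, a strong positive correlation.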
Figure 14.4 Example 14.3 Scatterplot
Pearson Correlation and z-Scores
• The Pearson correlation formula can be expressed as a relationship of z-scores.

Sample: r = Σ(zX zY) / (n – 1)
Population: ρ = Σ(zX zY) / N
Learning Check
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
A. 0.75
B. 0.35
C. –0.75
D. –0.35
Learning Check - Answer
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
A. 0.75
B. 0.35
C. –0.75
D. –0.35 (correct: the downward slope makes the correlation negative, and the loose fit makes it weak)
Learning Check
• Decide if each of the following statements is True or False
T/F: A set of n = 10 pairs of X and Y scores has ΣX = ΣY = ΣXY = 20. For this set of scores, SP = –20
T/F: If the Y variable decreases when the X variable decreases, their correlation is negative
Learning Check - Answers
True: SP = 20 – (20)(20)/10 = 20 – 40 = –20
False: When Y decreases as X decreases, the variables change in the same direction, so the correlation is positive
14.3 Using and Interpreting the Pearson Correlation
• Correlations are used for:
– Prediction
– Validity
– Reliability
– Theory verification
Interpreting Correlations
• A correlation describes a relationship but does not demonstrate causation
• Establishing causation requires an experiment in which one variable is manipulated and others are carefully controlled
• Example 14.4 (and Figure 14.5) demonstrates the fallacy of attributing causation after observing a correlation
Figure 14.5 Correlation: Churches and Serious Crimes
Correlations and Restricted Range of Scores
• The value (size) of a correlation coefficient is affected by the range of scores in the data
• A severely restricted range may produce a very different correlation than a broader range of scores would
• To be safe, never generalize a correlation beyond the sample range of data
Figure 14.6 Restricted Score Range Influences Correlation
Correlations and Outliers
• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatterplot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient
Figure 14.7 Outlier Influences Size of Correlation
Correlations and the Strength of the Relationship
• A correlation coefficient measures the degree of relationship on a scale from 0 to 1.00
• It is easy to mistakenly interpret this decimal number as a percent or proportion
• Correlation is not a proportion
• The squared correlation may be interpreted as the proportion of shared variability
• The squared correlation is called the coefficient of determination
Coefficient of Determination
• The coefficient of determination measures the proportion of variability in one variable that can be determined from its relationship with the other variable (shared variability)

Coefficient of Determination = r²
Figure 14.8 Three Amounts of Linear Relationship Example
14.4 Hypothesis Tests with the Pearson Correlation
• The Pearson correlation is usually computed for sample data, but used to test hypotheses about the relationship in the population
• The population correlation is shown by the Greek letter rho (ρ)
• Non-directional: H0: ρ = 0 and H1: ρ ≠ 0
• Directional: H0: ρ ≤ 0 and H1: ρ > 0, or H0: ρ ≥ 0 and H1: ρ < 0
Figure 14.9 Correlation in Sample vs. Population
Correlation Hypothesis Test
• The sample correlation r is used to test the population ρ
• Degrees of freedom (df) = n – 2
• The hypothesis test can be computed using either t or F; only t is shown in this chapter
• Use the t table to find the critical value with df = n – 2

t = r / √((1 – r²)/(n – 2))
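As a sketch, the t statistic above can be computed directly from r and n; the sample values below are hypothetical, used only for illustration:

```python
import math

def correlation_t(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), tested with df = n - 2."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical sample: r = 0.50 with n = 27 (df = 25)
t = correlation_t(0.50, 27)
print(round(t, 3))  # compare against the critical t value for df = 25
```

For these numbers, t ≈ 2.89, which would exceed the two-tailed critical value for df = 25 at α = .05.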
In the Literature
• Report
– Whether the result is statistically significant
• Concise test results
– Value of the correlation
– Sample size
– p-value or alpha level
– Type of test (one- or two-tailed)
• E.g., r = –0.76, n = 48, p < .01, two tails
Partial Correlation
• A partial correlation measures the relationship between two variables while mathematically controlling the influence of a third variable by holding it constant

rxy·z = (rxy – rxz ryz) / √((1 – rxz²)(1 – ryz²))
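A minimal sketch of the partial-correlation formula; the three correlation values below are hypothetical:

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """r_xy.z: the X-Y relationship with Z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical: X and Y correlate .50, but both also correlate with Z
# (.40 and .30), so part of the X-Y relationship is carried by Z.
print(round(partial_correlation(0.50, 0.40, 0.30), 3))
```

Holding Z constant shrinks the correlation from .50 to about .43 in this made-up case, showing how a third variable can inflate an observed relationship.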
Figure 14.10 Controlling the Impact of a Third Variable
14.5 Alternatives to the Pearson Correlation
• The Pearson correlation has been developed
– For data having linear relationships
– With data from interval or ratio measurement scales
• Other correlations have been developed
– For data having non-linear relationships
– With data from nominal or ordinal measurement scales
Spearman Correlation
• The Spearman correlation (rs) formula is used with data from an ordinal scale (ranks)
– Used when both variables are measured on an ordinal scale
– May also be used when the measurement scale is interval or ratio and the relationship is consistently directional but may not be linear
Figure 14.11 Consistent Nonlinear Positive Relationship
Figure 14.12 Scatterplot Showing Scores and Ranks
Ranking Tied Scores
• Tied scores need ranks for the Spearman correlation
• Method for assigning ranks
– List the scores in order from smallest to largest
– Assign a rank to each position in the list
– When two (or more) scores are tied, compute the mean of their ranked positions and assign this mean value as the final rank for each score
Special Formula for the Spearman Correlation
• The ranks for the scores are simply integers
• Calculations can be simplified
– Use D, the difference between the X rank and the Y rank for each individual, to compute the rs statistic

rs = 1 – 6ΣD² / (n(n² – 1))
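The ranking procedure and the special formula can be sketched together; the scores below are hypothetical, and note that the special formula is exact only when there are no ties (with ties it is an approximation):

```python
def rank_scores(scores):
    """Assign ranks 1..n; tied scores share the mean of their positions."""
    ordered = sorted(scores)
    return [sum(i + 1 for i, s in enumerate(ordered) if s == x) / ordered.count(x)
            for x in scores]

def spearman(x, y):
    """Special formula: r_s = 1 - 6*sum(D^2) / (n(n^2 - 1))."""
    rx, ry = rank_scores(x), rank_scores(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # D = X rank - Y rank
    n = len(x)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical scores with no ties; the ranks are 1-5 on each variable
print(spearman([2, 8, 9, 13, 20], [5, 1, 12, 9, 16]))  # -> 0.8
```

Here ΣD² = 4, so rs = 1 – 24/120 = 0.8, a strong positive ordinal relationship.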
Point-Biserial Correlation
• Measures the relationship between two variables
– One variable has only two values (called a dichotomous or binomial variable)
• Effect size for the independent-samples t test in Chapter 10 can be measured by r²
– Point-biserial r² has the same value as the r² computed from the t statistic
– The t statistic tests the significance of the mean difference
– The r statistic measures the size of the correlation
Point-Biserial Correlation
• Applicable in the same situation as the independent-measures t test in Chapter 10
– Code one group 0 and the other 1 (or any two digits) as the Y score
– The t statistic evaluates the significance of the mean difference
– Point-biserial r measures the magnitude of the correlation
– r² quantifies the effect size
Phi-Coefficient
• Both variables (X and Y) are dichotomous
– Both variables are re-coded to values 0 and 1 (or any two digits)
– The regular Pearson formula is used to calculate r
– r² (coefficient of determination) measures effect size (the proportion of variability in one score predicted by the other)
Learning Check
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
A. Pearson correlation
B. Spearman correlation
C. Point-biserial correlation
D. Phi-coefficient
Learning Check - Answer
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
A. Pearson correlation
B. Spearman correlation
C. Point-biserial correlation (correct: one variable is dichotomous and the other is a numerical score)
D. Phi-coefficient
Learning Check
• Decide if each of the following statements is True or False
T/F: The Spearman correlation is used with dichotomous data
T/F: In a non-directional significance test of a correlation, the null hypothesis states that the population correlation is zero
Learning Check - Answers
False: The Spearman correlation uses ordinal (ranked) data
True: The null hypothesis assumes no relationship; ρ = 0 indicates no relationship in the population
14.6 Introduction to Linear Equations and Regression
• The Pearson correlation measures a linear relationship between two variables
• Figure 14.13 makes the relationship obvious
• The line through the data
– Makes the relationship easier to see
– Shows the central tendency of the relationship
– Can be used for prediction
• Regression analysis precisely defines the line
Figure 14.13 Regression line
Linear Equations
• General equation for a line
– Equation: Y = bX + a
– X and Y are variables
– a and b are fixed constants
Figure 14.14 Linear Equation Graph
Regression
• Regression is a method of finding the equation that describes the best-fitting line for a set of data
• How do we define the “best-fitting” straight line when there are many possible straight lines?
• The answer: the line that minimizes the prediction errors for the actual data
Regression
• Ŷ is the value of Y predicted by the regression equation (regression line) for each value of X
• (Y – Ŷ) is the distance of each data point from the regression line: the error of prediction
• The regression procedure produces the line that minimizes the total squared error of prediction
• This method is called the least-squared-error solution
Figure 14.15 Y – Ŷ Distance: Actual Data Point Minus Predicted Point
Regression Equations
• Regression line equation: Ŷ = bX + a
• The slope of the line, b, can be calculated as

b = SP / SSX   or   b = r (sY / sX)

• The line goes through (MX, MY), therefore

a = MY – bMX
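The slope and intercept formulas above can be sketched directly; the data values are hypothetical:

```python
def regression_line(x, y):
    """Least-squares line: b = SP / SS_X and a = M_Y - b * M_X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x          # slope
    a = my - b * mx        # intercept: the line passes through (M_X, M_Y)
    return b, a

b, a = regression_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
print(b, a)  # then Y-hat = b*X + a for any X
```

For these made-up scores, SP = 8 and SSX = 10, so b = 0.8 and a = 4.2 – 0.8(3) = 1.8.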
Figure 14.16 Data Points and Regression Line: Example 14.13
Standard Error of Estimate
• The regression equation makes a prediction
• The precision of the prediction is measured by the standard error of estimate (SEoE)

SEoE = √(SSresidual / df) = √( Σ(Y – Ŷ)² / (n – 2) )
Figure 14.17 Regression Lines: Perfectly Fit vs. Example 14.13
Relationship Between Correlation and Standard Error of Estimate
• As r goes from 0 to 1, the SEoE decreases to 0
• Predicted variability in the Y scores: SSregression = r² SSY
• Unpredicted variability in the Y scores: SSresidual = (1 – r²) SSY
• Standard error of estimate based on r:

SEoE = √(SSresidual / df) = √( (1 – r²) SSY / (n – 2) )
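The r-based form of the standard error of estimate can be sketched as follows; the r, SSY, and n values below are hypothetical:

```python
import math

def seoe_from_r(r, ss_y, n):
    """Standard error of estimate: sqrt((1 - r^2) * SS_Y / (n - 2))."""
    return math.sqrt((1 - r ** 2) * ss_y / (n - 2))

# Hypothetical: r = 0.60, SS_Y = 100, n = 27
print(round(seoe_from_r(0.60, 100, 27), 2))
```

Here SSresidual = (1 – 0.36)(100) = 64, so SEoE = √(64/25) = 1.6; a larger r would leave less residual variability and a smaller SEoE.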
Testing Regression Significance
• Analysis of regression
– Similar to analysis of variance
– Uses an F-ratio of two mean square (MS) values
– Each MS is an SS divided by its df
• H0: the slope of the regression line (b or beta) is zero
Mean Squares and F-Ratio

MSregression = SSregression / dfregression
MSresidual = SSresidual / dfresidual
F = MSregression / MSresidual
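Using SSregression = r² SSY and SSresidual = (1 – r²) SSY from the previous slide, the F-ratio for a single-predictor regression can be sketched as below; the r, SSY, and n values are hypothetical, and with one predictor dfregression = 1 and dfresidual = n – 2:

```python
def regression_f(r, ss_y, n):
    """F = MS_regression / MS_residual for a one-predictor regression.
    df_regression = 1, df_residual = n - 2."""
    ss_reg = r ** 2 * ss_y          # predicted variability
    ss_res = (1 - r ** 2) * ss_y    # unpredicted variability
    ms_reg = ss_reg / 1
    ms_res = ss_res / (n - 2)
    return ms_reg / ms_res

# Hypothetical: r = 0.60, SS_Y = 100, n = 27
print(round(regression_f(0.60, 100, 27), 4))
```

With one predictor this F equals the square of the t statistic for the same correlation: t = 0.60/√(0.64/25) = 3.75, and 3.75² = 14.0625, matching the F computed here.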
Figure 14.18 Partitioning SS and df in Regression Analysis
Learning Check
• A linear regression has b = 3 and a = 4. What is the predicted Y (Ŷ) for X = 7?
A. 14
B. 25
C. 31
D. Cannot be determined
Learning Check - Answer
• A linear regression has b = 3 and a = 4. What is the predicted Y for X = 7?
A. 14
B. 25 (correct: Ŷ = bX + a = 3(7) + 4 = 25)
C. 31
D. Cannot be determined
Learning Check
• Decide if each of the following statements is True or False
T/F: It is possible for the regression equation to place none of the actual data points on the regression line
T/F: If r = 0.58, the linear regression equation predicts about one third of the variance in the Y scores
Learning Check - Answers
True: The line estimates where the points should be, but there are almost always prediction errors
True: When r = .58, r² = .336 (≈ 1/3)
Figure 14.19 SPSS Output for Example 14.13
Figure 14.20 SPSS Output for Examples 14.13–14.15
Figure 14.21 Scatterplot for Data of Demonstration 14.1
Equations? Concepts? Any Questions?