Chapter 14 Correlation And Regression


Chapter 14: Correlation and Regression
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences, Eighth Edition
by Frederick J. Gravetter and Larry B. Wallnau

Chapter 14 Learning Outcomes
1. Understand Pearson r as a measure of the relationship between variables
2. Compute Pearson r using the definitional or computational formula
3. Use and interpret Pearson r; understand its assumptions and limitations
4. Test hypotheses about a population correlation (ρ) with a sample r
5. Understand the concept of a partial correlation

Chapter 14 Learning Outcomes (continued)
6. Explain/compute the Spearman correlation coefficient (ranks)
7. Explain/compute the point-biserial correlation coefficient (one dichotomous variable)
8. Explain/compute the phi-coefficient for two dichotomous variables
9. Explain/compute the linear regression equation to predict Y values
10. Evaluate the significance of a regression equation

Tools You Will Need
• Sum of squares (SS) (Chapter 4)
  – Computational formula
  – Definitional formula
• z-Scores (Chapter 5)
• Hypothesis testing (Chapter 8)
• Analysis of Variance (Chapter 12)
  – MS values and F-ratios

14.1 Introduction to Correlation
• Measures and describes the relationship between two variables
• Characteristics of relationships
  – Direction (negative or positive; indicated by the sign, + or –, of the correlation coefficient)
  – Form (linear is most common)
  – Strength or consistency (varies from 0 to 1)
• These characteristics are all independent of one another

Figure 14.1 Scatterplot for Correlational Data

Figure 14.2 Positive and Negative Relationships

Figure 14.3 Different Linear Relationship Values

14.2 The Pearson Correlation
• Measures the degree and the direction of the linear relationship between two variables
• Perfect linear relationship
  – Every change in X has a corresponding change in Y
  – Correlation will be –1.00 or +1.00

r = covariability of X and Y / variability of X and Y separately

Sum of Products (SP)
• Similar to SS (sum of squared deviations)
• Measures the amount of covariability between two variables
• SP definitional formula:

SP = Σ(X – M_X)(Y – M_Y)

SP – Computational Formula
• The definitional formula emphasizes SP as the sum of the products of two difference scores
• The computational formula results in easier calculations
• SP computational formula:

SP = ΣXY – (ΣX)(ΣY) / n

Pearson Correlation Calculation
• Ratio comparing the covariability of X and Y (numerator) with the variability of X and Y separately (denominator)

r = SP / √(SS_X · SS_Y)
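As a rough illustration of the computational formulas above, here is a minimal Python sketch. The helper names (sum_of_products, sum_of_squares, pearson_r) and the small x and y lists are made up for this example; they are not from the slides.

```python
# Minimal sketch: Pearson r from SP and the two SS values,
# using the computational formulas. Data values are illustrative only.

def sum_of_products(x, y):
    n = len(x)
    return sum(xi * yi for xi, yi in zip(x, y)) - (sum(x) * sum(y)) / n

def sum_of_squares(scores):
    n = len(scores)
    return sum(s ** 2 for s in scores) - (sum(scores) ** 2) / n

def pearson_r(x, y):
    return sum_of_products(x, y) / (sum_of_squares(x) * sum_of_squares(y)) ** 0.5

x = [1, 3, 4, 6, 7]
y = [2, 3, 5, 7, 8]
print(round(pearson_r(x, y), 3))   # close to +1: strong positive linear relationship
```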

Figure 14.4 Example 14.3 Scatterplot

Pearson Correlation and z-Scores
• The Pearson correlation formula can be expressed as a relationship of z-scores.

Sample: r = Σ(z_X z_Y) / (n – 1)
Population: ρ = Σ(z_X z_Y) / N
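A small sketch of the sample z-score form is below, assuming sample standard deviations (n – 1 in the denominator). The function names and data are illustrative, and the result should match the SP/SS version of r.

```python
# Sketch of the z-score form of the sample correlation: r = Σ(z_X z_Y) / (n - 1).
# x and y are arbitrary illustration values.

def z_scores(scores):
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [(s - mean) / sd for s in scores]

def pearson_r_from_z(x, y):
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

x = [1, 3, 4, 6, 7]
y = [2, 3, 5, 7, 8]
print(round(pearson_r_from_z(x, y), 3))   # same value as the SP/SS formula gives
```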

Learning Check
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
A. 0.75
B. 0.35
C. –0.75
D. –0.35

Learning Check - Answer
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
D. –0.35 (the loose fit means the correlation is weak, and the downward slope makes it negative)

Learning Check
• Decide if each of the following statements is True or False
T/F: A set of n = 10 pairs of X and Y scores has ΣX = ΣY = ΣXY = 20. For this set of scores, SP = –20
T/F: If the Y variable decreases when the X variable decreases, their correlation is negative

Learning Check - Answers
True: SP = ΣXY – (ΣX)(ΣY)/n = 20 – (20)(20)/10 = 20 – 40 = –20
False: When X and Y decrease together they change in the same direction, so the correlation is positive

14.3 Using and Interpreting the Pearson Correlation
• Correlations are used for:
  – Prediction
  – Validity
  – Reliability
  – Theory verification

Interpreting Correlations
• Correlation describes a relationship but does not demonstrate causation
• Establishing causation requires an experiment in which one variable is manipulated and others are carefully controlled
• Example 14.4 (and Figure 14.5) demonstrates the fallacy of attributing causation after observing a correlation

Figure 14.5 Correlation: Churches and Serious Crimes

Correlations and Restricted Range of Scores
• The correlation coefficient value (size) will be affected by the range of scores in the data
• A severely restricted range may produce a very different correlation than a broader range of scores would
• To be safe, never generalize a correlation beyond the sample range of data

Figure 14.6 Restricted Score Range Influences Correlation

Correlations and Outliers
• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatterplot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient

Figure 14.7 Outlier Influences Size of Correlation

Correlations and the Strength of the Relationship
• A correlation coefficient measures the degree of relationship on a scale from 0 to 1.00
• It is easy to mistakenly interpret this decimal number as a percent or proportion
• Correlation is not a proportion
• The squared correlation may be interpreted as the proportion of shared variability
• The squared correlation is called the coefficient of determination

Coefficient of Determination
• The coefficient of determination measures the proportion of variability in one variable that can be determined from the relationship with the other variable (shared variability)

Coefficient of determination = r²
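A brief sketch of this interpretation is below; the pearson_r helper and the data values are illustrative assumptions, not taken from the text.

```python
# Sketch: r squared as the proportion of Y variability shared with X.

def pearson_r(x, y):
    n = len(x)
    sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ssx = sum(a ** 2 for a in x) - sum(x) ** 2 / n
    ssy = sum(b ** 2 for b in y) - sum(y) ** 2 / n
    return sp / (ssx * ssy) ** 0.5

x = [1, 3, 4, 6, 7]
y = [2, 3, 5, 7, 8]
r = pearson_r(x, y)
print(f"r = {r:.3f}, coefficient of determination r^2 = {r**2:.3f}")
# e.g. r = 0.80 would give r^2 = 0.64: 64% of the Y variability is shared with X
```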

Figure 14.8 Three Amounts of Linear Relationship Example

14.4 Hypothesis Tests with the Pearson Correlation
• The Pearson correlation is usually computed for sample data, but used to test hypotheses about the relationship in the population
• The population correlation is shown by the Greek letter rho (ρ)
• Non-directional: H₀: ρ = 0 and H₁: ρ ≠ 0
• Directional: H₀: ρ ≤ 0 and H₁: ρ > 0, or
• Directional: H₀: ρ ≥ 0 and H₁: ρ < 0

Figure 14.9 Correlation in Sample vs. Population

Correlation Hypothesis Test
• The sample correlation r is used to test the population ρ
• Degrees of freedom (df) = n – 2
• The hypothesis test can be computed using either t or F; only t is shown in this chapter
• Use the t table to find the critical value with df = n – 2

t = r / √[(1 – r²) / (n – 2)]
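A minimal sketch of this t statistic follows; the r and n values are illustrative, and the resulting t would still be compared to a critical value from a t table with df = n – 2.

```python
# Sketch of the t statistic for testing H0: rho = 0, with df = n - 2.

def correlation_t(r, n):
    return r / (((1 - r ** 2) / (n - 2)) ** 0.5)

r, n = 0.56, 30                       # illustrative sample correlation and sample size
t = correlation_t(r, n)
print(f"t = {t:.2f} with df = {n - 2}")   # compare |t| to the critical value from a t table
```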

In the Literature
• Report
  – Whether it is statistically significant
• Concise test results
  – Value of the correlation
  – Sample size
  – p-value or alpha level
  – Type of test (one- or two-tailed)
• E.g., r = –0.76, n = 48, p < .01, two tails

Partial Correlation
• A partial correlation measures the relationship between two variables while mathematically controlling the influence of a third variable by holding it constant

r_xy·z = (r_xy – r_xz r_yz) / √[(1 – r²_xz)(1 – r²_yz)]
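Here is a small sketch of that formula. The three pairwise correlations passed in are illustrative numbers, not values from the text.

```python
# Sketch of the partial correlation r_xy.z: the X-Y relationship with Z held constant.

def partial_correlation(r_xy, r_xz, r_yz):
    numerator = r_xy - r_xz * r_yz
    denominator = ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
    return numerator / denominator

# illustrative pairwise Pearson correlations
print(round(partial_correlation(r_xy=0.60, r_xz=0.50, r_yz=0.40), 3))
# the X-Y correlation that remains after controlling for Z
```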

Figure 14.10 Controlling the Impact of a Third Variable

14.5 Alternatives to the Pearson Correlation
• The Pearson correlation has been developed
  – For data having linear relationships
  – For data from interval or ratio measurement scales
• Other correlations have been developed
  – For data having non-linear relationships
  – For data from nominal or ordinal measurement scales

Spearman Correlation
• The Spearman correlation (r_s) formula is used with data from an ordinal scale (ranks)
  – Used when both variables are measured on an ordinal scale
  – May also be used if the measurement scale is interval or ratio when the relationship is consistently directional but may not be linear

Figure 14.11 Consistent Nonlinear Positive Relationship

Figure 14.12 Scatterplot Showing Scores and Ranks

Ranking Tied Scores
• Tied scores need ranks for the Spearman correlation
• Method for assigning ranks
  – List the scores in order from smallest to largest
  – Assign a rank to each position in the list
  – When two (or more) scores are tied, compute the mean of their ranked positions, and assign this mean value as the final rank for each score

Special Formula for the Spearman Correlation
• The ranks for the scores are simply integers
• Calculations can be simplified
  – Use D as the difference between the X rank and the Y rank for each individual to compute the r_s statistic

r_s = 1 – 6ΣD² / [n(n² – 1)]
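A sketch combining the tie-ranking procedure with the simplified formula is below. The data are illustrative; note that the ΣD² shortcut is exact only when there are no ties, so with many ties the safer route is to apply the Pearson formula directly to the ranks.

```python
# Sketch: rank both variables (averaging ranks for ties), then apply
# r_s = 1 - 6*sum(D^2) / (n(n^2 - 1)). Data values are illustrative only.

def ranks(scores):
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1                          # extend over a run of tied scores
        mean_rank = (i + j) / 2 + 1         # average of the tied rank positions
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [12, 15, 15, 20, 25]    # note the tie at 15: both get rank 2.5
y = [3, 5, 4, 9, 12]
print(round(spearman_rs(x, y), 3))   # 0.975 for these illustrative scores
```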

Point-Biserial Correlation
• Measures the relationship between two variables
  – One variable has only two values (called a dichotomous or binomial variable)
• The effect size for the independent-samples t test in Chapter 10 can be measured by r²
  – The point-biserial r² has the same value as the r² computed from the t-statistic
  – The t-statistic tests the significance of the mean difference
  – The r statistic measures the correlation size

Point-Biserial Correlation
• Applicable in the same situation as the independent-measures t test in Chapter 10
  – Code one group 0 and the other 1 (or any two digits) as the Y score
  – The t-statistic evaluates the significance of the mean difference
  – The point-biserial r measures the correlation magnitude
  – r² quantifies the effect size
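The following sketch shows the coding idea: the point-biserial correlation is just the Pearson r after coding the dichotomous variable 0/1. The group codes and scores are made-up values for illustration.

```python
# Sketch: point-biserial r = Pearson r with the dichotomous variable coded 0/1.

def pearson_r(x, y):
    n = len(x)
    sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ssx = sum(a ** 2 for a in x) - sum(x) ** 2 / n
    ssy = sum(b ** 2 for b in y) - sum(y) ** 2 / n
    return sp / (ssx * ssy) ** 0.5

group = [0, 0, 0, 1, 1, 1]          # dichotomous variable coded 0/1 (illustrative)
scores = [4, 5, 6, 8, 9, 10]        # interval/ratio scores (illustrative)
r_pb = pearson_r(group, scores)
print(f"point-biserial r = {r_pb:.3f}, effect size r^2 = {r_pb**2:.3f}")
```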

Phi-Coefficient
• Both variables (X and Y) are dichotomous
  – Both variables are re-coded to values 0 and 1 (or any two digits)
  – The regular Pearson formula is used to calculate r
  – r² (coefficient of determination) measures the effect size (proportion of variability in one score predicted by the other)
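A minimal sketch of the phi-coefficient, assuming both variables have already been coded 0/1; the paired observations below are illustrative only.

```python
# Sketch: the phi coefficient is Pearson r with both variables coded 0/1.

def pearson_r(x, y):
    n = len(x)
    sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ssx = sum(a ** 2 for a in x) - sum(x) ** 2 / n
    ssy = sum(b ** 2 for b in y) - sum(y) ** 2 / n
    return sp / (ssx * ssy) ** 0.5

x = [0, 0, 1, 1, 0, 1, 1, 0]    # first dichotomous variable (illustrative)
y = [0, 1, 1, 1, 0, 1, 0, 0]    # second dichotomous variable (illustrative)
print(round(pearson_r(x, y), 3))   # this value is the phi coefficient
```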

Learning Check
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
A. Pearson correlation
B. Spearman correlation
C. Point-biserial correlation
D. Phi-coefficient

Learning Check - Answer
• Participants were classified as “morning people” or “evening people” then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
C. Point-biserial correlation (one variable is dichotomous, the other is an interval/ratio score)

Learning Check
• Decide if each of the following statements is True or False
T/F: The Spearman correlation is used with dichotomous data
T/F: In a non-directional significance test of a correlation, the null hypothesis states that the population correlation is zero

Learning Check - Answers
False: The Spearman correlation uses ordinal (ranked) data
True: The null hypothesis assumes no relationship; ρ = 0 indicates no relationship in the population

14.6 Introduction to Linear Equations and Regression
• The Pearson correlation measures a linear relationship between two variables
• Figure 14.13 makes the relationship obvious
• The line through the data
  – Makes the relationship easier to see
  – Shows the central tendency of the relationship
  – Can be used for prediction
• Regression analysis precisely defines the line

Figure 14.13 Regression line

Linear Equations
• General equation for a line
  – Equation: Y = bX + a
  – X and Y are variables
  – a and b are fixed constants

Figure 14.14 Linear Equation Graph

Regression
• Regression is a method of finding an equation describing the best-fitting line for a set of data
• How do we define a “best-fitting” straight line when there are many possible straight lines?
• The answer: the best-fitting line is the one that minimizes prediction error for the actual data

Regression
• Ŷ is the value of Y predicted by the regression equation (regression line) for each value of X
• (Y – Ŷ) is the distance each data point is from the regression line: the error of prediction
• The regression procedure produces a line that minimizes the total squared error of prediction
• This method is called the least-squared-error solution

Figure 14.15 Y – Ŷ Distance: Actual Data Point Minus Predicted Point

Regression Equations
• Regression line equation: Ŷ = bX + a
• The slope of the line, b, can be calculated as

b = SP / SS_X   or   b = r (s_Y / s_X)

• The line goes through (M_X, M_Y), therefore

a = M_Y – b M_X
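A minimal sketch of the least-squares line from these formulas follows; the function name and the x, y data are illustrative, not from the text.

```python
# Sketch of the least-squares regression line: b = SP / SS_X, a = M_Y - b * M_X.

def regression_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x                 # slope
    a = my - b * mx               # intercept: the line passes through (M_X, M_Y)
    return b, a

x = [1, 3, 4, 6, 7]
y = [2, 3, 5, 7, 8]
b, a = regression_line(x, y)
print(f"Y-hat = {b:.2f}X + {a:.2f}")
print("prediction for X = 5:", round(b * 5 + a, 2))
```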

Figure 14.16 Data Points and Regression Line: Example 14.13

Standard Error of Estimate
• The regression equation makes a prediction
• The precision of the estimate is measured by the standard error of estimate (SEoE)

SEoE = √(SS_residual / df) = √[Σ(Y – Ŷ)² / (n – 2)]
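A sketch of this calculation is below, reusing an illustrative least-squares helper; the data values are assumptions for the example.

```python
# Sketch: standard error of estimate = sqrt(SS_residual / (n - 2)),
# where SS_residual = sum of (Y - Y-hat)^2 around the regression line.

def regression_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x
    return b, my - b * mx

def standard_error_of_estimate(x, y):
    b, a = regression_line(x, y)
    ss_residual = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return (ss_residual / (len(x) - 2)) ** 0.5

x = [1, 3, 4, 6, 7]
y = [2, 3, 5, 7, 8]
print(round(standard_error_of_estimate(x, y), 3))   # small value: predictions are precise
```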

Figure 14.17 Regression Lines: Perfectly Fit vs. Example 14.13

Relationship Between Correlation and Standard Error of Estimate
• As r goes from 0 to 1, the SEoE decreases to 0
• Predicted variability in Y scores: SS_regression = r² SS_Y
• Unpredicted variability in Y scores: SS_residual = (1 – r²) SS_Y
• Standard error of estimate based on r:

SEoE = √(SS_residual / df) = √[(1 – r²) SS_Y / (n – 2)]

Testing Regression Significance
• Analysis of regression
  – Similar to analysis of variance
  – Uses an F-ratio of two mean square (MS) values
  – Each MS is a SS divided by its df
• H₀: the slope of the regression line (b or beta) is zero

Mean Squares and F-ratio

MS_regression = SS_regression / df_regression
MS_residual = SS_residual / df_residual
F = MS_regression / MS_residual
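A sketch of the analysis of regression follows, using SS_regression = r² SS_Y with df = 1 and SS_residual = (1 – r²) SS_Y with df = n – 2. The r, n, and SS_Y values are illustrative numbers, not values from the text.

```python
# Sketch: F-ratio for testing the significance of the regression equation.

def regression_f(r, n, ss_y):
    ss_regression = r ** 2 * ss_y          # predicted variability
    ss_residual = (1 - r ** 2) * ss_y      # unpredicted variability
    df_regression, df_residual = 1, n - 2
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual

print(round(regression_f(r=0.60, n=20, ss_y=100.0), 2))
# compare to the critical F with df = (1, n - 2)
```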

Figure 14.18 Partitioning SS and df in Regression Analysis

Learning Check
• A linear regression has b = 3 and a = 4. What is the predicted Y (Ŷ) for X = 7?
A. 14
B. 25
C. 31
D. Cannot be determined

Learning Check - Answer
• A linear regression has b = 3 and a = 4. What is the predicted Y for X = 7?
B. 25 (Ŷ = bX + a = 3(7) + 4 = 25)

Learning Check
• Decide if each of the following statements is True or False
T/F: It is possible for the regression equation to place none of the actual data points on the regression line
T/F: If r = 0.58, the linear regression equation predicts about one third of the variance in the Y scores

Learning Check - Answers
True: The line estimates where the points should be, but there are almost always prediction errors
True: When r = .58, r² = .336 (≈ 1/3)

Figure 14.19 SPSS Output for Example 14.13

Figure 14.20 SPSS Output for Examples 14.13–14.15

Figure 14.21 Scatter Plot for Data of Demonstration 14.1

Equations? Concepts? Any questions?
