Regression Analysis and Linear Models
Concepts, Applications, and Implementation
Richard B. Darlington and Andrew F. Hayes
1. Statistical Control and Linear Models
1.1 Statistical Control
1.1.1 The Need for Control
1.1.2 Five Methods of Control
1.1.3 Examples of Statistical Control
1.2 An Overview of Linear Models
1.2.1 What You Should Know Already
1.2.2 Statistical Software for Linear Modeling and Statistical Control
1.2.3 About Formulas
1.2.4 On Symbolic Representations
1.3 Chapter Summary
2. The Simple Regression Model
2.1 Scatterplots and Conditional Distributions
2.1.1 Scatterplots
2.1.2 A Line through Conditional Means
2.1.3 Errors of Estimate
2.2 The Simple Regression Model
2.2.1 The Regression Line
2.2.2 Variance, Covariance, and Correlation
2.2.3 Finding the Regression Line
2.2.4 Example Computations
2.2.5 Linear Regression Analysis by Computer
2.3 The Regression Coefficient versus the Correlation Coefficient
2.3.1 Properties of the Regression and Correlation Coefficients
2.3.2 Uses of the Regression and Correlation Coefficients
2.4 Residuals
2.4.1 The Three Components of Y
2.4.2 Algebraic Properties of Residuals
2.4.3 Residuals as Y Adjusted for Differences in X
2.4.4 Residual Analysis
2.5 Chapter Summary
3. Partial Relationship and the Multiple Regression Model
3.1 Regression Analysis with More Than One Predictor Variable
3.1.1 An Example
3.1.2 Regressors
3.1.3 Models
3.1.4 Representing a Model Geometrically
3.1.5 Model Errors
3.1.6 An Alternative View of the Model
3.2 The Best-Fitting Model
3.2.1 Model Estimation with Computer Software
3.2.2 Partial Regression Coefficients
3.2.3 The Regression Constant
3.2.4 Problems with Three or More Regressors
3.2.5 The Multiple Correlation R
3.3 Scale-Free Measures of Partial Association
3.3.1 Semipartial Correlation
3.3.2 Partial Correlation
3.3.3 The Standardized Regression Coefficient
3.4 Some Relations among Statistics
3.4.1 Relations among Simple, Multiple, Partial, and Semipartial Correlations
3.4.2 Venn Diagrams
3.4.3 Partial Relationships and Simple Relationships May Have Different Signs
3.4.4 How Covariates Affect Regression Coefficients
3.4.5 Formulas for bⱼ, prⱼ, srⱼ, and R
3.5 Chapter Summary
4. Statistical Inference in Regression
4.1 Concepts in Statistical Inference
4.1.1 Statistics and Parameters
4.1.2 Assumptions for Proper Inference
4.1.3 Expected Values and Unbiased Estimation
4.2 The ANOVA Summary Table
4.2.1 Data = Model + Error
4.2.2 Total and Regression Sums of Squares
4.2.3 Degrees of Freedom
4.2.4 Mean Squares
4.3 Inference about the Multiple Correlation
4.3.1 Biased and Less Biased Estimation of ₜR²
4.3.2 Testing a Hypothesis about ₜR
4.4 The Distribution of and Inference about a Partial Regression Coefficient
4.4.1 Testing a Null Hypothesis about ₜbⱼ
4.4.2 Interval Estimates for ₜbⱼ
4.4.3 Factors Affecting the Standard Error of bⱼ
4.4.4 Tolerance
4.5 Inferences about Partial Correlations
4.5.1 Testing a Null Hypothesis about ₜprⱼ and ₜsrⱼ
4.5.2 Other Inferences about Partial Correlations
4.6 Inferences about Conditional Means
4.7 Miscellaneous Issues in Inference
4.7.1 How Great a Drawback Is Collinearity?
4.7.2 Contradicting Inferences
4.7.3 Sample Size and Nonsignificant Covariates
4.7.4 Inference in Simple Regression (When k = 1)
4.8 Chapter Summary
5. Extending Regression Analysis Principles
5.1 Dichotomous Regressors
5.1.1 Indicator or Dummy Variables
5.1.2 Ŷ Is a Group Mean
5.1.3 The Regression Coefficient for an Indicator Is a Difference
5.1.4 A Graphic Representation
5.1.5 A Caution about Standardized Regression Coefficients for Dichotomous Regressors
5.1.6 Artificial Categorization of Numerical Variables
5.2 Regression to the Mean
5.2.1 How Regression Got Its Name
5.2.2 The Phenomenon
5.2.3 Versions of the Phenomenon
5.2.4 Misconceptions and Mistakes Fostered by Regression to the Mean
5.2.5 Accounting for Regression to the Mean Using Linear Models
5.3 Multidimensional Sets
5.3.1 The Partial and Semipartial Multiple Correlation
5.3.2 What It Means If PR = 0 or SR = 0
5.3.3 Inference Concerning Sets of Variables
5.4 A Glance at the Big Picture
5.4.1 Further Extensions of Regression
5.4.2 Some Difficulties and Limitations
5.5 Chapter Summary
6. Statistical versus Experimental Control
6.1 Why Random Assignment?
6.1.1 Limitations of Statistical Control
6.1.2 The Advantage of Random Assignment
6.1.3 The Meaning of Random Assignment
6.2 Limitations of Random Assignment
6.2.1 Limitations Common to Statistical Control and Random Assignment
6.2.2 Limitations Specific to Random Assignment
6.2.3 Correlation and Causation
6.3 Supplementing Random Assignment with Statistical Control
6.3.1 Increased Precision and Power
6.3.2 Invulnerability to Chance Differences between Groups
6.3.3 Quantifying and Assessing Indirect Effects
6.4 Chapter Summary
7. Regression for Prediction
7.1 Mechanical Prediction and Regression
7.1.1 The Advantages of Mechanical Prediction
7.1.2 Regression as a Mechanical Prediction Method
7.1.3 A Focus on R Rather Than the Regression Weights
7.2 Estimating True Validity
7.2.1 Shrunken versus Adjusted R
7.2.2 Estimating ₜRₛ
7.2.3 Shrunken R Using Statistical Software
7.3 Selecting Predictor Variables
7.3.1 Stepwise Regression
7.3.2 All Subsets Regression
7.3.3 How Do Variable Selection Methods Perform?
7.4 Predictor Variable Configurations
7.4.1 Partial Redundancy (the Standard Configuration)
7.4.2 Complete Redundancy
7.4.3 Independence
7.4.4 Complementarity
7.4.5 Suppression
7.4.6 How These Configurations Relate to the Correlation between Predictors
7.4.7 Configurations of Three or More Predictors
7.5 Revisiting the Value of Human Judgment
7.6 Chapter Summary
8. Assessing the Importance of Regressors
8.1 What Does It Mean for a Variable to Be Important?
8.1.1 Variable Importance in Substantive or Applied Terms
8.1.2 Variable Importance in Statistical Terms
8.2 Should Correlations Be Squared?
8.2.1 Decision Theory
8.2.2 Small Squared Correlations Can Reflect Noteworthy Effects
8.2.3 Pearson’s r as the Ratio of a Regression Coefficient to Its Maximum Possible Value
8.2.4 Proportional Reduction in Estimation Error
8.2.5 When the Standard Is Perfection
8.2.6 Summary
8.3 Determining the Relative Importance of Regressors in a Single Regression Model
8.3.1 The Limitations of the Standardized Regression Coefficient
8.3.2 The Advantage of the Semipartial Correlation
8.3.3 Some Equivalences among Measures
8.3.4 Cohen’s f²
8.3.5 Comparing Two Regression Coefficients in the Same Model
8.4 Dominance Analysis
8.4.1 Complete and Partial Dominance
8.4.2 Example Computations
8.4.3 Dominance Analysis Using a Regression Program
8.5 Chapter Summary
9. Multicategorical Regressors
9.1 Multicategorical Variables as Sets
9.1.1 Indicator (Dummy) Coding
9.1.2 Constructing Indicator Variables
9.1.3 The Reference Category
9.1.4 Testing the Equality of Several Means
9.1.5 Parallels with Analysis of Variance
9.1.6 Interpreting Estimated Y and the Regression Coefficients
9.2 Multicategorical Regressors as or with Covariates
9.2.1 Multicategorical Variables as Covariates
9.2.2 Comparing Groups and Statistical Control
9.2.3 Interpretation of Regression Coefficients
9.2.4 Adjusted Means
9.2.5 Parallels with ANCOVA
9.2.6 More Than One Covariate
9.3 Chapter Summary
10. More on Multicategorical Regressors
10.1 Alternative Coding Systems
10.1.1 Sequential (Adjacent or Repeated Categories) Coding
10.1.2 Helmert Coding
10.1.3 Effect Coding
10.2 Comparisons and Contrasts
10.2.1 Contrasts
10.2.2 Computing the Standard Error of a Contrast
10.2.3 Contrasts Using Statistical Software
10.2.4 Covariates and the Comparison of Adjusted Means
10.3 Weighted Group Coding and Contrasts
10.3.1 Weighted Effect Coding
10.3.2 Weighted Helmert Coding
10.3.3 Weighted Contrasts
10.3.4 Application to Adjusted Means
10.4 Chapter Summary
11. Multiple Tests
11.1 The Multiple-Test Problem
11.1.1 An Illustration through Simulation
11.1.2 The Problem Defined
11.1.3 The Role of Sample Size
11.1.4 The Generality of the Problem
11.1.5 Do Omnibus Tests Offer “Protection”?
11.1.6 Should You Be Concerned about the Multiple-Test Problem?
11.2 The Bonferroni Method
11.2.1 Independent Tests
11.2.2 The Bonferroni Method for Nonindependent Tests
11.2.3 Revisiting the Illustration
11.2.4 Bonferroni Layering
11.2.5 Finding an “Exact” p-Value
11.2.6 Nonsense Values
11.2.7 Flexibility of the Bonferroni Method
11.2.8 Power of the Bonferroni Method
11.3 Some Basic Issues Surrounding Multiple Tests
11.3.1 Why Correct for Multiple Tests at All?
11.3.2 Why Not Correct for the Whole History of Science?
11.3.3 Plausibility and Logical Independence of Hypotheses
11.3.4 Planned versus Unplanned Tests
11.4 Chapter Summary
12. Nonlinear Relationships
12.1 Linear Regression Can Model Nonlinear Relationships
12.1.1 When Must Curves Be Fitted?
12.1.2 The Graphical Display of Curvilinearity
12.2 Polynomial Regression
12.2.1 Basic Principles
12.2.2 An Example
12.2.3 The Meaning of the Regression Coefficients for Lower-Order Regressors
12.2.4 Centering Variables in Polynomial Regression
12.2.5 Finding a Parabola’s Maximum or Minimum
12.3 Spline Regression
12.3.1 Linear Spline Regression
12.3.2 Implementation in Statistical Software
12.3.3 Polynomial Spline Regression
12.3.4 Covariates, Weak Curvilinearity, and Choosing Joints
12.4 Transformations of Dependent Variables or Regressors
12.4.1 Logarithmic Transformation
12.4.2 The Box–Cox Transformation
12.5 Chapter Summary
13. Linear Interaction
13.1 Interaction Fundamentals
13.1.1 Interaction as a Difference in Slope
13.1.2 Interaction between Two Numerical Regressors
13.1.3 Interaction versus Intercorrelation
13.1.4 Simple Linear Interaction
13.1.5 Representing Simple Linear Interaction with a Cross-Product
13.1.6 The Symmetry of Interaction
13.1.7 Interaction as a Warped Surface
13.1.8 Covariates in a Regression Model with an Interaction
13.1.9 The Meaning of the Regression Coefficients
13.1.10 An Example with Estimation Using Statistical Software
13.2 Interaction Involving a Categorical Regressor
13.2.1 Interaction between a Dichotomous and a Numerical Regressor
13.2.2 The Meaning of the Regression Coefficients
13.2.3 Interaction Involving a Multicategorical and a Numerical Regressor
13.2.4 Inference When Interaction Requires More Than One Regression Coefficient
13.2.5 A Substantive Example
13.2.6 Interpretation of the Regression Coefficients
13.3 Interaction between Two Categorical Regressors
13.3.1 The 2 × 2 Design
13.3.2 Interaction between a Dichotomous and a Multicategorical Regressor
13.3.3 Interaction between Two Multicategorical Regressors
13.4 Chapter Summary
14. Probing Interactions and Various Complexities
14.1 Conditional Effects as Functions
14.1.1 When the Interaction Involves Dichotomous or Numerical Variables
14.1.2 When the Interaction Involves a Multicategorical Variable
14.2 Inference about a Conditional Effect
14.2.1 When the Focal Predictor and Moderator Are Numerical or Dichotomous
14.2.2 When the Focal Predictor or Moderator Is Multicategorical
14.3 Probing an Interaction
14.3.1 Examining Conditional Effects at Various Values of the Moderator
14.3.2 The Johnson–Neyman Technique
14.3.3 Testing versus Probing an Interaction
14.3.4 Comparing Conditional Effects
14.4 Complications and Confusions in the Study of Interactions
14.4.1 The Difficulty of Detecting Interactions
14.4.2 Confusing Interaction with Curvilinearity
14.4.3 How the Scaling of Y Affects Interaction
14.4.4 The Interpretation of Lower-Order Regression Coefficients When a Cross-Product Is Present
14.4.5 Some Myths about Testing Interaction
14.4.6 Interaction and Nonsignificant Linear Terms
14.4.7 Homogeneity of Regression in ANCOVA
14.4.8 Multiple, Higher-Order, and Curvilinear Interactions
14.4.9 Artificial Categorization of Continua
14.5 Organizing Tests on Interaction
14.5.1 Three Approaches to Managing Complications
14.5.2 Broad versus Narrow Tests
14.6 Chapter Summary
15. Mediation and Path Analysis
15.1 Path Analysis and Linear Regression
15.1.1 Direct, Indirect, and Total Effects
15.1.2 The Regression Algebra of Path Analysis
15.1.3 Covariates
15.1.4 Inference about the Total and Direct Effects
15.1.5 Inference about the Indirect Effect
15.1.6 Implementation in Statistical Software
15.2 Multiple Mediator Models
15.2.1 Path Analysis for a Parallel Mediation Model
15.2.2 Path Analysis for a Serial Mediation Model
15.3 Extensions, Complications, and Miscellaneous Issues
15.3.1 Causality and Causal Order
15.3.2 The Causal Steps Approach
15.3.3 Mediation of a Nonsignificant Total Effect
15.3.4 Multicategorical Independent Variables
15.3.5 Fixing Direct Effects to Zero
15.3.6 Nonlinear Effects
15.3.7 Moderated Mediation
15.4 Chapter Summary
16. Detecting and Managing Irregularities
16.1 Regression Diagnostics
16.1.1 Shortcomings of Eyeballing the Data
16.1.2 Types of Extreme Cases
16.1.3 Quantifying Leverage, Distance, and Influence
16.1.4 Using Diagnostic Statistics
16.1.5 Generating Regression Diagnostics with Computer Software
16.2 Detecting Assumption Violations
16.2.1 Detecting Nonlinearity
16.2.2 Detecting Non-Normality
16.2.3 Detecting Heteroscedasticity
16.2.4 Testing Assumptions as a Set
16.2.5 What about Nonindependence?
16.3 Dealing with Irregularities
16.3.1 Heteroscedasticity-Consistent Standard Errors
16.3.2 The Jackknife
16.3.3 Bootstrapping
16.3.4 Permutation Tests
16.4 Inference without Random Sampling
16.5 Keeping the Diagnostic Analysis Manageable
16.6 Chapter Summary
17. Power, Measurement Error, and Various Miscellaneous Topics
17.1 Power and Precision of Estimation
17.1.1 Factors Determining Desirable Sample Size
17.1.2 Revisiting the Standard Error of a Regression Coefficient
17.1.3 On the Effect of Unnecessary Covariates
17.2 Measurement Error
17.2.1 What Is Measurement Error?
17.2.2 Measurement Error in Y
17.2.3 Measurement Error in Independent Variables
17.2.4 The Biggest Weakness of Regression: Measurement Error in Covariates
17.2.5 Summary: The Effects of Measurement Error
17.2.6 Managing Measurement Error
17.3 An Assortment of Problems
17.3.1 Violations of the Basic Assumptions
17.3.2 Collinearity
17.3.3 Singularity
17.3.4 Specification Error and Overcontrol
17.3.5 Noninterval Scaling
17.3.6 Missing Data
17.3.7 Rounding Error
17.4 Chapter Summary
18. Logistic Regression and Other Linear Models
18.1 Logistic Regression
18.1.1 Measuring a Model’s Fit to Data
18.1.2 Odds and Logits
18.1.3 The Logistic Regression Equation
18.1.4 An Example with a Single Regressor
18.1.5 Interpretation of and Inference about the Regression Coefficients
18.1.6 Multiple Logistic Regression and Implementation in Computing Software
18.1.7 Measuring and Testing the Fit of the Model
18.1.8 Further Extensions
18.1.9 Discriminant Function Analysis
18.1.10 Using OLS Regression with a Dichotomous Y
18.2 Other Linear Modeling Methods
18.2.1 Ordered Logistic and Probit Regression
18.2.2 Poisson Regression and Related Models of Count Outcomes
18.2.3 Time Series Analysis
18.2.4 Survival Analysis
18.2.5 Structural Equation Modeling
18.2.6 Multilevel Modeling
18.2.7 Other Resources
18.3 Chapter Summary
Appendices
A. The RLM Macro for SPSS and SAS
B. Linear Regression Analysis Using R
C. Statistical Tables
D. The Matrix Algebra of Linear Regression Analysis
References
Author Index
Subject Index
About the Authors