Linear Regression: Complete Guide with SPSS Implementation
What is Linear Regression?
Linear Regression is a statistical method that models the relationship between a dependent variable (y) and one or more independent variables (x) by fitting a linear equation to observed data.
- Simple Linear Regression: One independent variable (e.g., study hours vs exam scores)
- Multiple Linear Regression: Multiple independent variables (e.g., study hours, sleep hours vs exam scores)
The regression line represents the best-fit straight line through the data points:
\[ y = \beta_0 + \beta_1x + \epsilon \]
Where:
- y = Dependent variable
- x = Independent variable
- β₀ = Intercept
- β₁ = Slope
- ε = Error term
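For the multiple linear regression case listed above, the same model extends to several independent variables. A sketch of the general form follows; the count p and the subscripted symbols are notation introduced here for illustration, not taken from the original text:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon \]

With study hours as x₁ and sleep hours as x₂, this would be the p = 2 case.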
When to Use Linear Regression?
Situation | Example |
---|---|
Linear Relationship between variables | Temperature vs Ice cream sales |
Continuous Dependent Variable | House prices, exam scores |
Prediction of outcomes | Predicting sales from advertising budget |
Relationship Analysis between variables | Effect of study time on exam performance |
When Not to Use Linear Regression?
Linear Regression is not appropriate when:
Situation | Alternative |
---|---|
Non-linear relationships | Polynomial regression |
Categorical dependent variable | Logistic regression |
Violation of assumptions | Data transformation or other models |
Time-series data | ARIMA models |
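To illustrate the first row of the table, here is a minimal Python sketch (Python is used only for illustration; the guide itself works in SPSS). The data are hypothetical and follow y = x² exactly, so the relationship is perfect but not linear. The fit uses the standard least-squares formulas for slope, intercept, and R².

```python
from statistics import mean

# Hypothetical data (not from the guide): y depends on x perfectly, but not linearly
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# R-squared: share of variance in y explained by the fitted line
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(slope, intercept, r2)  # 0.0 2.0 0.0
```

The straight line explains none of the variance here even though y is fully determined by x, which is why a curved relationship calls for a different model.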
Linear Regression Formulas
Simple Linear Regression (fitted line, where ŷ is the predicted value of y):
\[ \hat{y} = \beta_0 + \beta_1x \]
Slope (β₁):
\[ \beta_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2} \]
Intercept (β₀):
\[ \beta_0 = \bar{y} - \beta_1\bar{x} \]
R-Squared (Goodness of Fit):
\[ R^2 = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2} \]
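The three formulas above translate almost line for line into code. The following is a minimal Python sketch, not part of the SPSS workflow; the function names `fit_simple_linear` and `r_squared` and the demo data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Return (intercept, slope) computed with the least-squares formulas."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Return R-squared: 1 minus residual sum of squares over total sum of squares."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Illustrative data (not from the guide): points lying exactly on y = 1 + 2x
x_demo = [0, 1, 2, 3]
y_demo = [1, 3, 5, 7]

b0, b1 = fit_simple_linear(x_demo, y_demo)
print(b0, b1, r_squared(x_demo, y_demo, b0, b1))  # 1.0 2.0 1.0
```

Because the demo points sit exactly on a line, the fit recovers the intercept and slope and R² equals 1.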
Step-by-Step Example with Calculations
Study Hours vs Exam Scores:
Study Hours (x) | Exam Score (y) |
---|---|
2 | 50 |
3 | 60 |
4 | 65 |
5 | 70 |
6 | 80 |
Calculations:
\[ \bar{x} = 4, \bar{y} = 65 \]
\[ \beta_1 = \frac{70}{10} = 7 \]
\[ \beta_0 = 65 - 7 \times 4 = 37 \]
\[ \text{Regression Equation: } y = 37 + 7x \]
\[ R^2 = 0.98 \]
Interpretation: Each additional study hour is associated with a 7-point increase in the predicted exam score, and the model explains 98% of the variance in exam scores (R² = 0.98).
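The values in the calculations above come from the following intermediate sums, written out from the five data points and the formulas for the slope and R²:

\[ \sum(x_i - \bar{x})(y_i - \bar{y}) = (-2)(-15) + (-1)(-5) + (0)(0) + (1)(5) + (2)(15) = 70 \]
\[ \sum(x_i - \bar{x})^2 = 4 + 1 + 0 + 1 + 4 = 10 \]
\[ \hat{y}_i = 37 + 7x_i = 51,\ 58,\ 65,\ 72,\ 79 \]
\[ \sum(y_i - \hat{y}_i)^2 = (-1)^2 + 2^2 + 0^2 + (-2)^2 + 1^2 = 10 \]
\[ \sum(y_i - \bar{y})^2 = 225 + 25 + 0 + 25 + 225 = 500 \]
\[ R^2 = 1 - \frac{10}{500} = 0.98 \]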
SPSS Implementation Guide
Step 1: Enter Data in SPSS
Create two variables in SPSS:
- Study_Hours (numeric)
- Exam_Score (numeric)
Step 2: Run Linear Regression
- Go to Analyze > Regression > Linear
- Add Exam_Score as Dependent
- Add Study_Hours as Independent
Step 3: Interpret SPSS Output
Model Summary:
R | R Square | Adjusted R Square |
---|---|---|
0.990 | 0.980 | 0.973 |
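The Model Summary values can be cross-checked outside SPSS. The sketch below is an illustrative Python calculation, not SPSS's own procedure; it uses the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), with p = 1 predictor, and takes R as the square root of R², which holds for a single predictor.

```python
from math import sqrt
from statistics import mean

# Data from the worked example
study_hours = [2, 3, 4, 5, 6]
exam_score = [50, 60, 65, 70, 80]
n = len(study_hours)
p = 1  # number of independent variables

x_bar, y_bar = mean(study_hours), mean(exam_score)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_hours, exam_score))
s_xx = sum((x - x_bar) ** 2 for x in study_hours)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(study_hours, exam_score))
ss_tot = sum((y - y_bar) ** 2 for y in exam_score)

r_square = 1 - ss_res / ss_tot
r = sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - p - 1)

print(f"{r:.3f} {r_square:.3f} {adjusted_r_square:.3f}")  # 0.990 0.980 0.973
```

Adjusted R² is lower than R² because it penalizes the model for each predictor relative to the sample size, which matters with only five observations.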
Coefficients:
Model | Unstandardized Coefficients (B) | Sig. |
---|---|---|
(Constant) | 37.000 | 0.001 |
Study_Hours | 7.000 | 0.001 |
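The Sig. column is the two-sided p-value of a t statistic for each coefficient, evaluated on n - 2 = 3 degrees of freedom. As a hedged illustration of where those t statistics come from, the Python sketch below computes the standard errors and t values from the example data using the textbook formulas for simple regression; it is a cross-check, not a reproduction of SPSS internals.

```python
from math import sqrt
from statistics import mean

# Data from the worked example
study_hours = [2, 3, 4, 5, 6]
exam_score = [50, 60, 65, 70, 80]
n = len(study_hours)

x_bar, y_bar = mean(study_hours), mean(exam_score)
s_xx = sum((x - x_bar) ** 2 for x in study_hours)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_hours, exam_score))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual variance with n - 2 degrees of freedom (two estimated coefficients)
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(study_hours, exam_score))
mse = ss_res / (n - 2)

# Standard errors of the slope and the intercept
se_slope = sqrt(mse / s_xx)
se_intercept = sqrt(mse * (1 / n + x_bar ** 2 / s_xx))

# t statistic = coefficient divided by its standard error
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"slope: SE={se_slope:.3f}, t={t_slope:.1f}")
print(f"intercept: SE={se_intercept:.3f}, t={t_intercept:.1f}")
```

The t values come out at roughly 12.1 for the slope and 15.1 for the intercept, both large for 3 degrees of freedom, which is consistent with the small Sig. values in the table.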
Final Equation: Exam_Score = 37 + 7×Study_Hours
The model explains 98% of the variance in exam scores, and the Study_Hours coefficient is statistically significant (p < 0.05).
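Once the final equation is known, a prediction is a one-line calculation. A minimal Python sketch, where the function name `predict_exam_score` is hypothetical:

```python
def predict_exam_score(study_hours: float) -> float:
    """Predicted exam score from the fitted equation Exam_Score = 37 + 7 * Study_Hours."""
    return 37 + 7 * study_hours


print(predict_exam_score(4.5))  # 68.5
```

The example data cover only 2 to 6 study hours, so predictions outside that range are extrapolations and should be treated with caution.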
Summary
- Linear regression models relationships between continuous variables
- Requires a linear relationship, independent errors, constant error variance (homoscedasticity), and approximately normal residuals
- SPSS provides comprehensive regression analysis tools
- Our example showed a strong relationship (R² = 0.98) between study hours and exam scores
- Always check assumptions and consider model limitations