Chi-Square Contingency Table Calculations: 5 Different Examples with a Step-by-Step Guide
The Chi-Square Test of Independence assesses whether two categorical variables are associated, using the counts in a contingency table. This article works through the calculation step by step for 5 different contingency tables. For each example, we give the observed frequencies, expected frequencies, Chi-Square statistic, degrees of freedom, p-value, and an interpretation.
Chi-Square Formula:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
Where:
- \(O_i\): Observed frequency.
- \(E_i\): Expected frequency.
- \(\sum\): Sum across all cells.
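Expected Frequency Formula:
\[ E_i = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
Degrees of Freedom (df):
\[ df = (r - 1) \times (c - 1) \]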
Where \( r \) = number of rows, \( c \) = number of columns.
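The three formulas above translate almost directly into code. The following is a minimal sketch in plain Python, not a reference implementation: the function names (`expected_frequencies`, `chi_square_statistic`, `degrees_of_freedom`) are hypothetical, and the table is assumed to be a list of rows containing only the observed counts (no totals). The counts used are those of Example 1 below.

```python
def expected_frequencies(observed):
    """Return the table of expected counts under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = (row total * column total) / grand total, for every cell
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Observed counts only: rows are Male/Female, columns are Smoker/Non-Smoker.
table = [[20, 30], [10, 40]]
print(expected_frequencies(table))
print(round(chi_square_statistic(table), 3), degrees_of_freedom(table))
```

Keeping the totals out of the input avoids double counting, since the row and column totals are recomputed from the cells.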
Example 1: 2×2 Table – Gender and Smoking Status
Problem: Is there an association between gender and smoking status?
Observed Contingency Table:
Gender | Smoker | Non-Smoker | Total |
---|---|---|---|
Male | 20 | 30 | 50 |
Female | 10 | 40 | 50 |
Total | 30 | 70 | 100 |
Hypothesis:
- H₀: Gender and smoking status are independent.
- H₁: There is an association between gender and smoking status.
Step 1: Calculate Expected Frequencies
\[ E_i = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
- Male, Smoker: \( \frac{50 \times 30}{100} = 15 \)
- Male, Non-Smoker: \( \frac{50 \times 70}{100} = 35 \)
- Female, Smoker: \( \frac{50 \times 30}{100} = 15 \)
- Female, Non-Smoker: \( \frac{50 \times 70}{100} = 35 \)
Expected Frequencies Table:
Gender | Smoker | Non-Smoker |
---|---|---|
Male | 15 | 35 |
Female | 15 | 35 |
Step 2: Calculate Chi-Square Statistic
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
- Male, Smoker: \( \frac{(20 - 15)^2}{15} = \frac{25}{15} \approx 1.667 \)
- Male, Non-Smoker: \( \frac{(30 - 35)^2}{35} = \frac{25}{35} \approx 0.714 \)
- Female, Smoker: \( \frac{(10 - 15)^2}{15} = \frac{25}{15} \approx 1.667 \)
- Female, Non-Smoker: \( \frac{(40 - 35)^2}{35} = \frac{25}{35} \approx 0.714 \)
Total: \[ \chi^2 = 1.667 + 0.714 + 1.667 + 0.714 = 4.762 \]
Step 3: Degrees of Freedom
\[ df = (r - 1) \times (c - 1) = (2 - 1) \times (2 - 1) = 1 \]
Step 4: P-Value and Interpretation
For \( \chi^2 = 4.762 \), df = 1, p-value ≈ 0.029 (from Chi-Square table or software).
Interpretation:
- p-value = 0.029: Less than 0.05, so we reject H₀.
- Conclusion: There is a significant association between gender and smoking status.
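When no Chi-Square table is at hand, the p-value for df = 1 can be obtained from the standard library alone, because a Chi-Square variable with one degree of freedom is a squared standard normal. The sketch below relies on that identity; the helper name `p_value_df1` is hypothetical and the function is only valid for df = 1.

```python
import math


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # With df = 1 the statistic is Z^2 for a standard normal Z, so
    # P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p = p_value_df1(4.762)
print(f"p = {p:.3f}, reject H0 at 0.05: {p < 0.05}")
```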
Step 5: Check Assumptions
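All expected frequencies are ≥ 5 (the smallest is 15), and the data are categorical counts. The test also assumes that observations are independent, i.e. each individual is counted in exactly one cell.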
Example 2: 3×2 Table – Education Level and Job Satisfaction
Problem: Is there an association between education level and job satisfaction?
Observed Contingency Table:
Education Level | Satisfied | Not Satisfied | Total |
---|---|---|---|
High School | 40 | 60 | 100 |
Bachelor’s | 70 | 30 | 100 |
Master’s | 80 | 20 | 100 |
Total | 190 | 110 | 300 |
Hypothesis:
- H₀: Education level and job satisfaction are independent.
- H₁: There is an association.
Step 1: Calculate Expected Frequencies
- High School, Satisfied: \( \frac{100 \times 190}{300} = 63.33 \)
- High School, Not Satisfied: \( \frac{100 \times 110}{300} = 36.67 \)
- Bachelor’s, Satisfied: \( \frac{100 \times 190}{300} = 63.33 \)
- Bachelor’s, Not Satisfied: \( \frac{100 \times 110}{300} = 36.67 \)
- Master’s, Satisfied: \( \frac{100 \times 190}{300} = 63.33 \)
- Master’s, Not Satisfied: \( \frac{100 \times 110}{300} = 36.67 \)
Expected Frequencies Table:
Education Level | Satisfied | Not Satisfied |
---|---|---|
High School | 63.33 | 36.67 |
Bachelor’s | 63.33 | 36.67 |
Master’s | 63.33 | 36.67 |
Step 2: Calculate Chi-Square Statistic
- High School, Satisfied: \( \frac{(40 - 63.33)^2}{63.33} \approx 8.596 \)
- High School, Not Satisfied: \( \frac{(60 - 36.67)^2}{36.67} \approx 14.848 \)
- Bachelor’s, Satisfied: \( \frac{(70 - 63.33)^2}{63.33} \approx 0.702 \)
- Bachelor’s, Not Satisfied: \( \frac{(30 - 36.67)^2}{36.67} \approx 1.212 \)
- Master’s, Satisfied: \( \frac{(80 - 63.33)^2}{63.33} \approx 4.386 \)
- Master’s, Not Satisfied: \( \frac{(20 - 36.67)^2}{36.67} \approx 7.576 \)
Total: \[ \chi^2 \approx 8.596 + 14.848 + 0.702 + 1.212 + 4.386 + 7.576 \approx 37.321 \]
(Each term is computed from the unrounded expected frequencies, and the total is rounded last.)
Step 3: Degrees of Freedom
\[ df = (3 - 1) \times (2 - 1) = 2 \]
Step 4: P-Value and Interpretation
For \( \chi^2 = 37.321 \), df = 2, p-value < 0.001.
Interpretation:
- p-value < 0.001: Less than 0.05, so we reject H₀.
- Conclusion: There is a significant association between education level and job satisfaction.
Step 5: Check Assumptions
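All expected frequencies are ≥ 5 (the smallest is 36.67), and the data are categorical counts. The test also assumes that observations are independent, i.e. each individual is counted in exactly one cell.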
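With six cells and non-integer expected frequencies, the hand arithmetic in Step 2 is easy to get wrong, so it is worth recomputing the per-cell contributions. The following sketch does that for this example's observed table; the variable names are illustrative, and the printout is meant only as a cross-check of the manual steps.

```python
# Observed counts per education level: (Satisfied, Not Satisfied)
observed = {
    "High School": (40, 60),
    "Bachelor's": (70, 30),
    "Master's": (80, 20),
}
columns = ("Satisfied", "Not Satisfied")

row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(len(columns))]
grand_total = sum(row_totals.values())

chi2 = 0.0
for label, counts in observed.items():
    for j, o in enumerate(counts):
        e = row_totals[label] * col_totals[j] / grand_total  # expected count
        contribution = (o - e) ** 2 / e                      # this cell's share of chi-square
        chi2 += contribution
        print(f"{label}, {columns[j]}: E = {e:.2f}, contribution = {contribution:.3f}")

print(f"chi-square = {chi2:.3f}")
```

Listing the contributions also shows which cells drive the result: here the High School row accounts for the largest share of the statistic.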
Example 3: 3×3 Table – Region and Preferred Transport
Problem: Is there an association between region and preferred transport mode?
Observed Contingency Table:
Region | Car | Bus | Train | Total |
---|---|---|---|---|
Urban | 50 | 30 | 20 | 100 |
Suburban | 40 | 40 | 20 | 100 |
Rural | 20 | 50 | 30 | 100 |
Total | 110 | 120 | 70 | 300 |
Hypothesis:
- H₀: Region and transport mode are independent.
- H₁: There is an association.
Step 1: Calculate Expected Frequencies
- Urban, Car: \( \frac{100 \times 110}{300} = 36.67 \)
- Urban, Bus: \( \frac{100 \times 120}{300} = 40 \)
- Urban, Train: \( \frac{100 \times 70}{300} = 23.33 \)
- Suburban, Car: \( \frac{100 \times 110}{300} = 36.67 \)
- Suburban, Bus: \( \frac{100 \times 120}{300} = 40 \)
- Suburban, Train: \( \frac{100 \times 70}{300} = 23.33 \)
- Rural, Car: \( \frac{100 \times 110}{300} = 36.67 \)
- Rural, Bus: \( \frac{100 \times 120}{300} = 40 \)
- Rural, Train: \( \frac{100 \times 70}{300} = 23.33 \)
Expected Frequencies Table:
Region | Car | Bus | Train |
---|---|---|---|
Urban | 36.67 | 40 | 23.33 |
Suburban | 36.67 | 40 | 23.33 |
Rural | 36.67 | 40 | 23.33 |
Step 2: Calculate Chi-Square Statistic
- Urban, Car: \( \frac{(50 - 36.67)^2}{36.67} \approx 4.848 \)
- Urban, Bus: \( \frac{(30 - 40)^2}{40} = 2.5 \)
- Urban, Train: \( \frac{(20 - 23.33)^2}{23.33} \approx 0.476 \)
- Suburban, Car: \( \frac{(40 - 36.67)^2}{36.67} \approx 0.303 \)
- Suburban, Bus: \( \frac{(40 - 40)^2}{40} = 0 \)
- Suburban, Train: \( \frac{(20 - 23.33)^2}{23.33} \approx 0.476 \)
- Rural, Car: \( \frac{(20 - 36.67)^2}{36.67} \approx 7.576 \)
- Rural, Bus: \( \frac{(50 - 40)^2}{40} = 2.5 \)
- Rural, Train: \( \frac{(30 - 23.33)^2}{23.33} \approx 1.905 \)
Total: \[ \chi^2 = 4.848 + 2.5 + 0.476 + 0.303 + 0 + 0.476 + 7.576 + 2.5 + 1.905 = 20.584 \]
Step 3: Degrees of Freedom
\[ df = (3 - 1) \times (3 - 1) = 4 \]
Step 4: P-Value and Interpretation
For \( \chi^2 = 20.584 \), df = 4, p-value ≈ 0.0004.
Interpretation:
- p-value ≈ 0.0004: Less than 0.05, so we reject H₀.
- Conclusion: There is a significant association between region and preferred transport mode.
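For an even number of degrees of freedom the upper-tail probability has a closed form, so the p-value in Step 4 can be checked by hand. A short worked version for df = 4, where \( x \) stands for the Chi-Square statistic rounded to two decimals (20.58):

\[ p = P(\chi^2_4 > x) = e^{-x/2}\left(1 + \frac{x}{2}\right) \approx e^{-10.29} \times 11.29 \approx 3.8 \times 10^{-4} \]

This agrees with the p-value of about 0.0004 quoted in Step 4. For odd df the expression involves the error function, so a table or software is the practical route.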
Step 5: Check Assumptions
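All expected frequencies are ≥ 5 (the smallest is 23.33), and the data are categorical counts. The test also assumes that observations are independent, i.e. each individual is counted in exactly one cell.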
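Because the expected frequencies in this example are rounded to two decimals (36.67, 23.33), the last digit of a hand-computed total can drift. One way to rule out rounding as a source of error is exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` for that purpose; it is a checking aid under that assumption, not part of the test itself.

```python
from fractions import Fraction

observed = [
    [50, 30, 20],  # Urban: Car, Bus, Train
    [40, 40, 20],  # Suburban
    [20, 50, 30],  # Rural
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2_exact = Fraction(0)
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Exact expected count, e.g. 110/3 instead of 36.67
        e = Fraction(row_totals[i] * col_totals[j], grand_total)
        chi2_exact += (o - e) ** 2 / e

# The statistic as an exact fraction, then as a decimal rounded once at the end.
print(chi2_exact, f"{float(chi2_exact):.3f}")
```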
Example 4: 4×2 Table – Age Group and Voting Preference
Problem: Is there an association between age group and voting preference?
Observed Contingency Table:
Age Group | Party A | Party B | Total |
---|---|---|---|
18-25 | 30 | 20 | 50 |
26-35 | 40 | 30 | 70 |
36-50 | 50 | 40 | 90 |
51+ | 60 | 30 | 90 |
Total | 180 | 120 | 300 |
Hypothesis:
- H₀: Age group and voting preference are independent.
- H₁: There is an association.
Step 1: Calculate Expected Frequencies
- 18-25, Party A: \( \frac{50 \times 180}{300} = 30 \)
- 18-25, Party B: \( \frac{50 \times 120}{300} = 20 \)
- 26-35, Party A: \( \frac{70 \times 180}{300} = 42 \)
- 26-35, Party B: \( \frac{70 \times 120}{300} = 28 \)
- 36-50, Party A: \( \frac{90 \times 180}{300} = 54 \)
- 36-50, Party B: \( \frac{90 \times 120}{300} = 36 \)
- 51+, Party A: \( \frac{90 \times 180}{300} = 54 \)
- 51+, Party B: \( \frac{90 \times 120}{300} = 36 \)
Expected Frequencies Table:
Age Group | Party A | Party B |
---|---|---|
18-25 | 30 | 20 |
26-35 | 42 | 28 |
36-50 | 54 | 36 |
51+ | 54 | 36 |
Step 2: Calculate Chi-Square Statistic
- 18-25, Party A: \( \frac{(30 - 30)^2}{30} = 0 \)
- 18-25, Party B: \( \frac{(20 - 20)^2}{20} = 0 \)
- 26-35, Party A: \( \frac{(40 - 42)^2}{42} \approx 0.095 \)
- 26-35, Party B: \( \frac{(30 - 28)^2}{28} \approx 0.143 \)
- 36-50, Party A: \( \frac{(50 - 54)^2}{54} \approx 0.296 \)
- 36-50, Party B: \( \frac{(40 - 36)^2}{36} \approx 0.444 \)
- 51+, Party A: \( \frac{(60 - 54)^2}{54} \approx 0.667 \)
- 51+, Party B: \( \frac{(30 - 36)^2}{36} = 1.000 \)
Total: \[ \chi^2 = 0 + 0 + 0.095 + 0.143 + 0.296 + 0.444 + 0.667 + 1.000 = 2.645 \]
Step 3: Degrees of Freedom
\[ df = (4 - 1) \times (2 - 1) = 3 \]
Step 4: P-Value and Interpretation
For \( \chi^2 = 2.645 \), df = 3, p-value ≈ 0.450.
Interpretation:
- p-value ≈ 0.450: Greater than 0.05, so we fail to reject H₀.
- Conclusion: The data do not show a significant association between age group and voting preference at the 0.05 level.
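The p-value for any whole-number df can be computed without a statistics package by starting from the known df = 1 or df = 2 tail probability and stepping up two degrees of freedom at a time. The sketch below uses the standard recurrence for the regularized upper incomplete gamma function; the name `chi2_sf` is a hypothetical helper, and the code assumes an integer df ≥ 1 and a statistic x ≥ 0.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df >= 1 and x >= 0."""
    half_x = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half_x)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half_x))   # exact tail for df = 1
    # Recurrence: Q(a + 1, x/2) = Q(a, x/2) + (x/2)^a * e^(-x/2) / Gamma(a + 1),
    # where a = df / 2. Each pass raises df by 2.
    while 2 * a < df:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1)
        a += 1
    return q


p = chi2_sf(2.645, 3)
print(f"p = {p:.3f}, reject H0 at 0.05: {p < 0.05}")
```

For very large df this loop is not the most numerically robust choice; for contingency tables of the size shown here it is adequate as a check.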
Step 5: Check Assumptions
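All expected frequencies are ≥ 5 (the smallest is 20), and the data are categorical counts. The test also assumes that observations are independent, i.e. each individual is counted in exactly one cell.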
Example 5: 2×3 Table – Gender and Product Preference
Problem: Is there an association between gender and product preference?
Observed Contingency Table:
Gender | Product X | Product Y | Product Z | Total |
---|---|---|---|---|
Male | 40 | 30 | 20 | 90 |
Female | 20 | 40 | 50 | 110 |
Total | 60 | 70 | 70 | 200 |
Hypothesis:
- H₀: Gender and product preference are independent.
- H₁: There is an association.
Step 1: Calculate Expected Frequencies
- Male, Product X: \( \frac{90 \times 60}{200} = 27 \)
- Male, Product Y: \( \frac{90 \times 70}{200} = 31.5 \)
- Male, Product Z: \( \frac{90 \times 70}{200} = 31.5 \)
- Female, Product X: \( \frac{110 \times 60}{200} = 33 \)
- Female, Product Y: \( \frac{110 \times 70}{200} = 38.5 \)
- Female, Product Z: \( \frac{110 \times 70}{200} = 38.5 \)
Expected Frequencies Table:
Gender | Product X | Product Y | Product Z |
---|---|---|---|
Male | 27 | 31.5 | 31.5 |
Female | 33 | 38.5 | 38.5 |
Step 2: Calculate Chi-Square Statistic
- Male, Product X: \( \frac{(40 - 27)^2}{27} \approx 6.259 \)
- Male, Product Y: \( \frac{(30 - 31.5)^2}{31.5} \approx 0.071 \)
- Male, Product Z: \( \frac{(20 - 31.5)^2}{31.5} \approx 4.198 \)
- Female, Product X: \( \frac{(20 - 33)^2}{33} \approx 5.121 \)
- Female, Product Y: \( \frac{(40 - 38.5)^2}{38.5} \approx 0.058 \)
- Female, Product Z: \( \frac{(50 - 38.5)^2}{38.5} \approx 3.435 \)
Total: \[ \chi^2 \approx 6.259 + 0.071 + 4.198 + 5.121 + 0.058 + 3.435 \approx 19.144 \]
(The total is computed from unrounded terms; the rounded terms shown add to 19.142.)
Step 3: Degrees of Freedom
\[ df = (2 - 1) \times (3 - 1) = 2 \]
Step 4: P-Value and Interpretation
For \( \chi^2 = 19.144 \), df = 2, p-value < 0.001.
Interpretation:
- p-value < 0.001: Less than 0.05, so we reject H₀.
- Conclusion: There is a significant association between gender and product preference.
Step 5: Check Assumptions
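All expected frequencies are ≥ 5 (the smallest is 27), and the data are categorical counts. The test also assumes that observations are independent, i.e. each individual is counted in exactly one cell.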
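The expected-frequency condition can be checked without building the full expected table, because the smallest expected count always comes from the smallest row total paired with the smallest column total. The sketch below applies that shortcut to this example; the function name `smallest_expected_count` and the `threshold` variable are illustrative, with 5 being the rule of thumb used throughout this article.

```python
def smallest_expected_count(observed):
    """Smallest expected cell count under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = row total * column total / grand total, so the minimum over all
    # cells is reached at the smallest row total and smallest column total.
    return min(row_totals) * min(col_totals) / grand_total


observed = [[40, 30, 20], [20, 40, 50]]  # Male, Female by Product X, Y, Z
threshold = 5
smallest = smallest_expected_count(observed)
print(smallest, smallest >= threshold)
```

The independence of observations cannot be checked from the counts alone; it depends on how the data were collected.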
Summary
- Chi-Square Test: Tests the association between two categorical variables.
- Examples and Results:
  - 2×2 (Gender and Smoking): \( \chi^2 = 4.762 \), df = 1, p ≈ 0.029 (significant).
  - 3×2 (Education and Job Satisfaction): \( \chi^2 = 37.321 \), df = 2, p < 0.001 (significant).
  - 3×3 (Region and Transport): \( \chi^2 = 20.584 \), df = 4, p ≈ 0.0004 (significant).
  - 4×2 (Age and Voting): \( \chi^2 = 2.645 \), df = 3, p ≈ 0.450 (non-significant).
  - 2×3 (Gender and Product): \( \chi^2 = 19.144 \), df = 2, p < 0.001 (significant).
- Formulas:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, \quad E_i = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
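To reproduce the whole summary in one pass, the two formulas can be wrapped in a single function and applied to all five observed tables. This is a sketch in plain Python under stated assumptions: `chi_square_test` and `chi2_sf` are hypothetical helper names, the p-value routine assumes an integer df ≥ 1, and 0.05 is the significance level used in this article.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (recurrence from df = 1 or 2)."""
    half_x = x / 2
    a, q = (1.0, math.exp(-half_x)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(half_x)))
    while 2 * a < df:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1)
        a += 1
    return q


def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom, p-value) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = sum(
        (o - r * c / n) ** 2 / (r * c / n)  # (O - E)^2 / E with E = r * c / n
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, chi2_sf(chi2, df)


tables = {
    "Gender and Smoking": [[20, 30], [10, 40]],
    "Education and Job Satisfaction": [[40, 60], [70, 30], [80, 20]],
    "Region and Transport": [[50, 30, 20], [40, 40, 20], [20, 50, 30]],
    "Age and Voting": [[30, 20], [40, 30], [50, 40], [60, 30]],
    "Gender and Product": [[40, 30, 20], [20, 40, 50]],
}

results = {name: chi_square_test(table) for name, table in tables.items()}
for name, (chi2, df, p) in results.items():
    verdict = "significant" if p < 0.05 else "non-significant"
    print(f"{name}: chi2 = {chi2:.3f}, df = {df}, p = {p:.4f} ({verdict})")
```

Since the code rounds only when printing, its third decimal can differ slightly from totals obtained by adding rounded per-cell terms.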
This guide will help you understand and apply Chi-Square Contingency Table calculations. If you have further questions, please leave a comment!