Chi-Square Contingency Table Calculations: 5 Different Examples 

The Chi-Square Test of Independence checks whether two categorical variables are associated, using the counts in a contingency table. This article works through the calculation step by step for 5 contingency tables of different sizes. For each example, we give the observed frequencies, expected frequencies, Chi-Square statistic, degrees of freedom, p-value, and interpretation.

Chi-Square Formula:

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Where:

  • \(O_i\): Observed frequency.
  • \(E_i\): Expected frequency.
  • \(\sum\): Sum across all cells.

Expected Frequency Formula:

\[ E_i = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]

Degrees of Freedom (df):

\[ df = (r - 1) \times (c - 1) \]

Where \( r \) = number of rows, \( c \) = number of columns.
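
The three formulas above translate directly into code. The following is a minimal Python sketch using only the standard library; the function names (`expected_frequencies`, `chi_square_statistic`, `degrees_of_freedom`) are illustrative choices rather than part of any library, and the sample counts are the 2×2 table from Example 1.

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E across all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Counts from Example 1 (rows: Male, Female; columns: Smoker, Non-Smoker).
observed = [[20, 30], [10, 40]]
expected = expected_frequencies(observed)
statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)

print(expected)
print(round(statistic, 3), df)
```

The table is passed as a list of rows without the "Total" row or column, since the totals are recomputed from the cells.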

Example 1: 2×2 Table – Gender and Smoking Status

Problem: Is there an association between gender and smoking status?

Observed Contingency Table:

| Gender | Smoker | Non-Smoker | Total |
|---|---|---|---|
| Male | 20 | 30 | 50 |
| Female | 10 | 40 | 50 |
| Total | 30 | 70 | 100 |

Hypothesis:

  • H₀: Gender and smoking status are independent.
  • H₁: There is an association between gender and smoking status.

Step 1: Calculate Expected Frequencies

\[ E_i = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]

  • Male, Smoker: \( \frac{50 \times 30}{100} = 15 \)
  • Male, Non-Smoker: \( \frac{50 \times 70}{100} = 35 \)
  • Female, Smoker: \( \frac{50 \times 30}{100} = 15 \)
  • Female, Non-Smoker: \( \frac{50 \times 70}{100} = 35 \)

Expected Frequencies Table:

| Gender | Smoker | Non-Smoker |
|---|---|---|
| Male | 15 | 35 |
| Female | 15 | 35 |
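
One quick sanity check on an expected table is that it reproduces the observed row and column totals. The sketch below rebuilds the expected counts for this example and confirms that property; the variable names are illustrative.

```python
import math

# Observed counts for Example 1 (rows: Male, Female; columns: Smoker, Non-Smoker).
observed = [[20, 30], [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = row total * column total / grand total, cell by cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# The expected table should have the same margins as the observed table.
expected_row_totals = [sum(row) for row in expected]
expected_col_totals = [sum(col) for col in zip(*expected)]
margins_match = all(
    math.isclose(a, b)
    for a, b in zip(expected_row_totals + expected_col_totals,
                    row_totals + col_totals)
)

print(expected)
print(margins_match)
```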

Step 2: Calculate Chi-Square Statistic

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

  • Male, Smoker: \( \frac{(20 - 15)^2}{15} = \frac{25}{15} \approx 1.667 \)
  • Male, Non-Smoker: \( \frac{(30 - 35)^2}{35} = \frac{25}{35} \approx 0.714 \)
  • Female, Smoker: \( \frac{(10 - 15)^2}{15} = \frac{25}{15} \approx 1.667 \)
  • Female, Non-Smoker: \( \frac{(40 - 35)^2}{35} = \frac{25}{35} \approx 0.714 \)

Total: \[ \chi^2 = 1.667 + 0.714 + 1.667 + 0.714 = 4.762 \]
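
For a 2×2 table, the same statistic can be cross-checked with the well-known shortcut \( \chi^2 = N(ad - bc)^2 / (R_1 R_2 C_1 C_2) \), where a, b, c, d are the four cell counts and R, C are the row and column totals. The sketch below compares both routes; these symbols and variable names are introduced here for illustration and are not used elsewhere in this article.

```python
# Example 1 cell counts: a, b on the Male row; c, d on the Female row.
a, b = 20, 30
c, d = 10, 40

n = a + b + c + d
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d

# Route 1: cell-by-cell sum of (O - E)^2 / E.
cells = [(a, row1, col1), (b, row1, col2), (c, row2, col1), (d, row2, col2)]
cell_sum = sum((o - r * k / n) ** 2 / (r * k / n) for o, r, k in cells)

# Route 2: 2x2 shortcut formula (no continuity correction).
shortcut = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

print(round(cell_sum, 3), round(shortcut, 3))
```

Neither route applies a continuity correction. Some software does so by default for 2×2 tables (SciPy's `chi2_contingency`, for instance, applies Yates' correction unless `correction=False` is passed), so a library result may differ from the hand calculation.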

Step 3: Degrees of Freedom

\[ df = (r - 1) \times (c - 1) = (2 - 1) \times (2 - 1) = 1 \]

Step 4: P-Value and Interpretation

For \( \chi^2 = 4.762 \), df = 1, p-value ≈ 0.029 (from Chi-Square table or software).
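
When no table or statistics package is at hand, the p-value for df = 1 can be computed with the standard library alone, because a Chi-Square variable with one degree of freedom is the square of a standard normal variable. The following sketch relies on that identity; `p_value_df1` is a hypothetical helper name, and the shortcut does not apply to other degrees of freedom.

```python
import math


def p_value_df1(statistic):
    """Upper-tail probability of a Chi-Square distribution with df = 1.

    Uses P(X > x) = erfc(sqrt(x / 2)), which holds only for df = 1.
    """
    return math.erfc(math.sqrt(statistic / 2))


p_value = p_value_df1(4.762)
print(round(p_value, 3))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```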

Interpretation:

  • p-value = 0.029: Less than 0.05, so we reject H₀.
  • Conclusion: There is a significant association between gender and smoking status.

Step 5: Check Assumptions

All expected frequencies are ≥ 5 (the smallest is 15). The test also assumes that the observations are independent and that both variables are categorical.
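
The expected-frequency condition is the one assumption that can be checked mechanically from the table; independence of observations depends on how the data were collected. A minimal sketch, with `smallest_expected` as a hypothetical helper and 5 as the threshold used in this article:

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def smallest_expected(observed):
    """Smallest expected count in the table."""
    return min(min(row) for row in expected_frequencies(observed))


MIN_EXPECTED = 5  # threshold used throughout this article

# Observed counts for Example 1 (rows: Male, Female; columns: Smoker, Non-Smoker).
observed = [[20, 30], [10, 40]]
minimum = smallest_expected(observed)

print(minimum, minimum >= MIN_EXPECTED)
```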

Example 2: 3×2 Table – Education Level and Job Satisfaction

Problem: Is there an association between education level and job satisfaction?

Observed Contingency Table:

| Education Level | Satisfied | Not Satisfied | Total |
|---|---|---|---|
| High School | 40 | 60 | 100 |
| Bachelor’s | 70 | 30 | 100 |
| Master’s | 80 | 20 | 100 |
| Total | 190 | 110 | 300 |

Hypothesis:

  • H₀: Education level and job satisfaction are independent.
  • H₁: There is an association.

Step 1: Calculate Expected Frequencies

  • High School, Satisfied: \( \frac{100 \times 190}{300} = 63.33 \)
  • High School, Not Satisfied: \( \frac{100 \times 110}{300} = 36.67 \)
  • Bachelor’s, Satisfied: \( \frac{100 \times 190}{300} = 63.33 \)
  • Bachelor’s, Not Satisfied: \( \frac{100 \times 110}{300} = 36.67 \)
  • Master’s, Satisfied: \( \frac{100 \times 190}{300} = 63.33 \)
  • Master’s, Not Satisfied: \( \frac{100 \times 110}{300} = 36.67 \)

Expected Frequencies Table:

| Education Level | Satisfied | Not Satisfied |
|---|---|---|
| High School | 63.33 | 36.67 |
| Bachelor’s | 63.33 | 36.67 |
| Master’s | 63.33 | 36.67 |

Step 2: Calculate Chi-Square Statistic

The values below are computed from the unrounded expected frequencies (190/3 and 110/3).

  • High School, Satisfied: \( \frac{(40 - 63.33)^2}{63.33} \approx 8.596 \)
  • High School, Not Satisfied: \( \frac{(60 - 36.67)^2}{36.67} \approx 14.848 \)
  • Bachelor’s, Satisfied: \( \frac{(70 - 63.33)^2}{63.33} \approx 0.702 \)
  • Bachelor’s, Not Satisfied: \( \frac{(30 - 36.67)^2}{36.67} \approx 1.212 \)
  • Master’s, Satisfied: \( \frac{(80 - 63.33)^2}{63.33} \approx 4.386 \)
  • Master’s, Not Satisfied: \( \frac{(20 - 36.67)^2}{36.67} \approx 7.576 \)

Total: \[ \chi^2 = 8.596 + 14.848 + 0.702 + 1.212 + 4.386 + 7.576 \approx 37.32 \]

Step 3: Degrees of Freedom

\[ df = (3 - 1) \times (2 - 1) = 2 \]

Step 4: P-Value and Interpretation

For \( \chi^2 \approx 37.32 \), df = 2, p-value < 0.001.

Interpretation:

  • p-value < 0.001: Less than 0.05, so we reject H₀.
  • Conclusion: There is a significant association between education level and job satisfaction.
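
The whole calculation for this example can be reproduced from the observed counts in a few lines, which is a useful check on the hand arithmetic. The sketch below uses unrounded expected frequencies and relies on the fact that, for df = 2 only, the p-value has the closed form \( e^{-\chi^2/2} \); variable names are illustrative.

```python
import math

# Observed counts for Example 2
# (rows: High School, Bachelor's, Master's; columns: Satisfied, Not Satisfied).
observed = [[40, 60], [70, 30], [80, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Accumulate (O - E)^2 / E over every cell.
statistic = 0.0
for row, r in zip(observed, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / grand_total
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(round(statistic, 2), df, p_value < 0.001)
```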

Step 5: Check Assumptions

All expected frequencies are ≥ 5 (the smallest is 36.67). The test also assumes that the observations are independent and that both variables are categorical.

Example 3: 3×3 Table – Region and Preferred Transport

Problem: Is there an association between region and preferred transport mode?

Observed Contingency Table:

| Region | Car | Bus | Train | Total |
|---|---|---|---|---|
| Urban | 50 | 30 | 20 | 100 |
| Suburban | 40 | 40 | 20 | 100 |
| Rural | 20 | 50 | 30 | 100 |
| Total | 110 | 120 | 70 | 300 |

Hypothesis:

  • H₀: Region and transport mode are independent.
  • H₁: There is an association.

Step 1: Calculate Expected Frequencies

  • Urban, Car: \( \frac{100 \times 110}{300} = 36.67 \)
  • Urban, Bus: \( \frac{100 \times 120}{300} = 40 \)
  • Urban, Train: \( \frac{100 \times 70}{300} = 23.33 \)
  • Suburban, Car: \( \frac{100 \times 110}{300} = 36.67 \)
  • Suburban, Bus: \( \frac{100 \times 120}{300} = 40 \)
  • Suburban, Train: \( \frac{100 \times 70}{300} = 23.33 \)
  • Rural, Car: \( \frac{100 \times 110}{300} = 36.67 \)
  • Rural, Bus: \( \frac{100 \times 120}{300} = 40 \)
  • Rural, Train: \( \frac{100 \times 70}{300} = 23.33 \)

Expected Frequencies Table:

| Region | Car | Bus | Train |
|---|---|---|---|
| Urban | 36.67 | 40 | 23.33 |
| Suburban | 36.67 | 40 | 23.33 |
| Rural | 36.67 | 40 | 23.33 |

Step 2: Calculate Chi-Square Statistic

  • Urban, Car: \( \frac{(50 - 36.67)^2}{36.67} \approx 4.848 \)
  • Urban, Bus: \( \frac{(30 - 40)^2}{40} = 2.5 \)
  • Urban, Train: \( \frac{(20 - 23.33)^2}{23.33} \approx 0.475 \)
  • Suburban, Car: \( \frac{(40 - 36.67)^2}{36.67} \approx 0.303 \)
  • Suburban, Bus: \( \frac{(40 - 40)^2}{40} = 0 \)
  • Suburban, Train: \( \frac{(20 - 23.33)^2}{23.33} \approx 0.475 \)
  • Rural, Car: \( \frac{(20 - 36.67)^2}{36.67} \approx 7.576 \)
  • Rural, Bus: \( \frac{(50 - 40)^2}{40} = 2.5 \)
  • Rural, Train: \( \frac{(30 - 23.33)^2}{23.33} \approx 1.905 \)

Total: \[ \chi^2 = 4.848 + 2.5 + 0.475 + 0.303 + 0 + 0.475 + 7.576 + 2.5 + 1.905 = 20.582 \]

Step 3: Degrees of Freedom

\[ df = (3 - 1) \times (3 - 1) = 4 \]

Step 4: P-Value and Interpretation

For \( \chi^2 = 20.582 \), df = 4, p-value ≈ 0.0004.

Interpretation:

  • p-value ≈ 0.0004: Less than 0.05, so we reject H₀.
  • Conclusion: There is a significant association between region and preferred transport mode.
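
With nine cells it helps to keep the labels attached to each contribution, so the cell that drives the result is easy to spot. The sketch below does that for this example and uses the closed form \( e^{-x/2}(1 + x/2) \) for the p-value, which is valid for df = 4 only. Names are illustrative, and because the expected frequencies are not rounded, the last digit of the total can differ slightly from the hand calculation.

```python
import math

regions = ["Urban", "Suburban", "Rural"]
modes = ["Car", "Bus", "Train"]
observed = [[50, 30, 20], [40, 40, 20], [20, 50, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Map (region, mode) -> that cell's (O - E)^2 / E contribution.
contributions = {}
for region, row, r in zip(regions, observed, row_totals):
    for mode, o, c in zip(modes, row, col_totals):
        e = r * c / grand_total
        contributions[(region, mode)] = (o - e) ** 2 / e

statistic = sum(contributions.values())
largest_cell = max(contributions, key=contributions.get)

# Upper-tail probability for df = 4: exp(-x/2) * (1 + x/2).
half = statistic / 2
p_value = math.exp(-half) * (1 + half)

print(round(statistic, 3), largest_cell, round(p_value, 4))
```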

Step 5: Check Assumptions

All expected frequencies are ≥ 5 (the smallest is 23.33). The test also assumes that the observations are independent and that both variables are categorical.

Example 4: 4×2 Table – Age Group and Voting Preference

Problem: Is there an association between age group and voting preference?

Observed Contingency Table:

| Age Group | Party A | Party B | Total |
|---|---|---|---|
| 18-25 | 30 | 20 | 50 |
| 26-35 | 40 | 30 | 70 |
| 36-50 | 50 | 40 | 90 |
| 51+ | 60 | 30 | 90 |
| Total | 180 | 120 | 300 |

Hypothesis:

  • H₀: Age group and voting preference are independent.
  • H₁: There is an association.

Step 1: Calculate Expected Frequencies

  • 18-25, Party A: \( \frac{50 \times 180}{300} = 30 \)
  • 18-25, Party B: \( \frac{50 \times 120}{300} = 20 \)
  • 26-35, Party A: \( \frac{70 \times 180}{300} = 42 \)
  • 26-35, Party B: \( \frac{70 \times 120}{300} = 28 \)
  • 36-50, Party A: \( \frac{90 \times 180}{300} = 54 \)
  • 36-50, Party B: \( \frac{90 \times 120}{300} = 36 \)
  • 51+, Party A: \( \frac{90 \times 180}{300} = 54 \)
  • 51+, Party B: \( \frac{90 \times 120}{300} = 36 \)

Expected Frequencies Table:

| Age Group | Party A | Party B |
|---|---|---|
| 18-25 | 30 | 20 |
| 26-35 | 42 | 28 |
| 36-50 | 54 | 36 |
| 51+ | 54 | 36 |

Step 2: Calculate Chi-Square Statistic

  • 18-25, Party A: \( \frac{(30 - 30)^2}{30} = 0 \)
  • 18-25, Party B: \( \frac{(20 - 20)^2}{20} = 0 \)
  • 26-35, Party A: \( \frac{(40 - 42)^2}{42} \approx 0.095 \)
  • 26-35, Party B: \( \frac{(30 - 28)^2}{28} \approx 0.143 \)
  • 36-50, Party A: \( \frac{(50 - 54)^2}{54} \approx 0.296 \)
  • 36-50, Party B: \( \frac{(40 - 36)^2}{36} \approx 0.444 \)
  • 51+, Party A: \( \frac{(60 - 54)^2}{54} \approx 0.667 \)
  • 51+, Party B: \( \frac{(30 - 36)^2}{36} \approx 1.000 \)

Total: \[ \chi^2 = 0 + 0 + 0.095 + 0.143 + 0.296 + 0.444 + 0.667 + 1.000 = 2.645 \]

Step 3: Degrees of Freedom

\[ df = (4 - 1) \times (2 - 1) = 3 \]

Step 4: P-Value and Interpretation

For \( \chi^2 = 2.645 \), df = 3, p-value ≈ 0.450.

Interpretation:

  • p-value ≈ 0.450: Greater than 0.05, so we do not reject H₀.
  • Conclusion: The data do not show a significant association between age group and voting preference.
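
The same decision can be reached by comparing the statistic with a tabulated critical value instead of computing a p-value. The sketch below does both for this example: it uses 7.815, the standard 5% critical value for df = 3, and a closed-form p-value that is valid for df = 3 only. Variable names are illustrative.

```python
import math

# Observed counts for Example 4
# (rows: 18-25, 26-35, 36-50, 51+; columns: Party A, Party B).
observed = [[30, 20], [40, 30], [50, 40], [60, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

statistic = sum(
    (o - r * c / grand_total) ** 2 / (r * c / grand_total)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)

# Decision by critical value: reject H0 only if the statistic exceeds it.
CRITICAL_VALUE_DF3 = 7.815  # tabulated 5% critical value for df = 3
reject = statistic > CRITICAL_VALUE_DF3

# Upper-tail probability for df = 3:
# erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
p_value = (
    math.erfc(math.sqrt(statistic / 2))
    + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2)
)

print(round(statistic, 3), reject, round(p_value, 2))
```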

Step 5: Check Assumptions

All expected frequencies are ≥ 5 (the smallest is 20). The test also assumes that the observations are independent and that both variables are categorical.

Example 5: 2×3 Table – Gender and Product Preference

Problem: Is there an association between gender and product preference?

Observed Contingency Table:

| Gender | Product X | Product Y | Product Z | Total |
|---|---|---|---|---|
| Male | 40 | 30 | 20 | 90 |
| Female | 20 | 40 | 50 | 110 |
| Total | 60 | 70 | 70 | 200 |

Hypothesis:

  • H₀: Gender and product preference are independent.
  • H₁: There is an association.

Step 1: Calculate Expected Frequencies

  • Male, Product X: \( \frac{90 \times 60}{200} = 27 \)
  • Male, Product Y: \( \frac{90 \times 70}{200} = 31.5 \)
  • Male, Product Z: \( \frac{90 \times 70}{200} = 31.5 \)
  • Female, Product X: \( \frac{110 \times 60}{200} = 33 \)
  • Female, Product Y: \( \frac{110 \times 70}{200} = 38.5 \)
  • Female, Product Z: \( \frac{110 \times 70}{200} = 38.5 \)

Expected Frequencies Table:

| Gender | Product X | Product Y | Product Z |
|---|---|---|---|
| Male | 27 | 31.5 | 31.5 |
| Female | 33 | 38.5 | 38.5 |

Step 2: Calculate Chi-Square Statistic

  • Male, Product X: \( \frac{(40 - 27)^2}{27} \approx 6.259 \)
  • Male, Product Y: \( \frac{(30 - 31.5)^2}{31.5} \approx 0.071 \)
  • Male, Product Z: \( \frac{(20 - 31.5)^2}{31.5} \approx 4.198 \)
  • Female, Product X: \( \frac{(20 - 33)^2}{33} \approx 5.121 \)
  • Female, Product Y: \( \frac{(40 - 38.5)^2}{38.5} \approx 0.058 \)
  • Female, Product Z: \( \frac{(50 - 38.5)^2}{38.5} \approx 3.435 \)

Total: \[ \chi^2 = 6.259 + 0.071 + 4.198 + 5.121 + 0.058 + 3.435 \approx 19.14 \]

Step 3: Degrees of Freedom

\[ df = (2 - 1) \times (3 - 1) = 2 \]

Step 4: P-Value and Interpretation

For \( \chi^2 \approx 19.14 \), df = 2, p-value < 0.001.

Interpretation:

  • p-value < 0.001: Less than 0.05, so we reject H₀.
  • Conclusion: There is a significant association between gender and product preference.
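
Small rounding differences in the individual cells add up in the total, so it can be worth recomputing the statistic without any rounding. The sketch below does this for the present example with exact rational arithmetic from Python's `fractions` module and converts to a decimal only at the end; the p-value uses the closed form \( e^{-x/2} \), valid for df = 2 only. Variable names are illustrative.

```python
import math
from fractions import Fraction

# Observed counts for Example 5
# (rows: Male, Female; columns: Product X, Product Y, Product Z).
observed = [[40, 30, 20], [20, 40, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Exact (O - E)^2 / E per cell, kept as fractions to avoid rounding.
statistic = Fraction(0)
for row, r in zip(observed, row_totals):
    for o, c in zip(row, col_totals):
        e = Fraction(r * c, grand_total)
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-float(statistic) / 2)

print(statistic, round(float(statistic), 3), df, p_value < 0.001)
```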

Step 5: Check Assumptions

All expected frequencies are ≥ 5 (the smallest is 27). The test also assumes that the observations are independent and that both variables are categorical.

Summary

  • Chi-Square Test: Tests the association between two categorical variables.
  • Examples and Results:
    • 2×2 (Gender and Smoking): \( \chi^2 = 4.762 \), df = 1, p ≈ 0.029 (significant).
    • 3×2 (Education and Job Satisfaction): \( \chi^2 \approx 37.32 \), df = 2, p < 0.001 (significant).
    • 3×3 (Region and Transport): \( \chi^2 = 20.582 \), df = 4, p ≈ 0.0004 (significant).
    • 4×2 (Age and Voting): \( \chi^2 = 2.645 \), df = 3, p ≈ 0.450 (non-significant).
    • 2×3 (Gender and Product): \( \chi^2 \approx 19.14 \), df = 2, p < 0.001 (significant).
  • Formulas:
    \[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, \quad E_i = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
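
All five examples can be recomputed in one pass as a final check. The sketch below is a standard-library-only illustration: `chi_square_test` and `chi_square_p_value` are hypothetical helper names, and the p-value function uses the closed-form series for the Chi-Square upper tail, which applies to positive integer degrees of freedom. Because the expected frequencies are not rounded, the last digit of a statistic can differ slightly from the hand-calculated values.

```python
import math


def chi_square_test(observed):
    """Return (statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = sum(
        (o - r * c / grand_total) ** 2 / (r * c / grand_total)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def chi_square_p_value(statistic, df):
    """Upper-tail probability for a positive integer df."""
    half = statistic / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        series = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) plus exp(-x/2) times
    # the sum over i = 1..(df-1)/2 of (x/2)^(i - 1/2) / Gamma(i + 1/2).
    series = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


tables = {
    "Gender and Smoking": [[20, 30], [10, 40]],
    "Education and Job Satisfaction": [[40, 60], [70, 30], [80, 20]],
    "Region and Transport": [[50, 30, 20], [40, 40, 20], [20, 50, 30]],
    "Age and Voting": [[30, 20], [40, 30], [50, 40], [60, 30]],
    "Gender and Product": [[40, 30, 20], [20, 40, 50]],
}

results = {}
for name, observed in tables.items():
    statistic, df = chi_square_test(observed)
    p_value = chi_square_p_value(statistic, df)
    results[name] = (statistic, df, p_value)
    verdict = "significant" if p_value < 0.05 else "non-significant"
    print(f"{name}: chi2 = {statistic:.3f}, df = {df}, p = {p_value:.4f} ({verdict})")
```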

This guide will help you understand and apply Chi-Square Contingency Table calculations. If you have further questions, please leave a comment!
