Chi-Square Goodness of Fit Test: When to Use, When Not to Use, with Example in English (2025 Guide)

The Chi-Square Goodness of Fit Test is a statistical method used to determine whether the observed frequency distribution of a categorical variable matches a specified expected frequency distribution. This test is applied to a single categorical variable to check if the data follows a theoretical or expected pattern.

In this 2025 guide, we explain what the Chi-Square Goodness of Fit Test is, when to use it, and when to avoid it, and we work through a step-by-step example with data. The guide is written for students, researchers, and professionals working with categorical data.

When to Use the Chi-Square Goodness of Fit Test

The Chi-Square Goodness of Fit Test is appropriate when:

  • Single Categorical Variable: The data pertains to one categorical variable, such as colors (Red, Blue, Green) or preferences (Yes, No).
  • Known Expected Distribution: You have a theoretical or expected frequency distribution, such as a uniform distribution or historical data.
  • Adequate Sample Size: Each expected frequency should generally be at least 5.
  • Random Sampling: The data must be collected randomly.
  • Examples:
    • Is a die fair? (Expected: Each number has a 1/6 probability).
    • Are customer product preferences uniformly distributed?
    • Do voting patterns align with historical trends?

When Not to Use the Chi-Square Goodness of Fit Test

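  • Continuous Data: For continuous data (e.g., height, weight), use a test built for continuous distributions, such as Kolmogorov-Smirnov (fit to a fully specified distribution) or Shapiro-Wilk (normality only).
  • Small Expected Frequencies: If expected frequencies fall below 5, the chi-square approximation may be unreliable. Consider an exact multinomial test, or combine categories so that each expected count is large enough.
  • Multiple Variables: To test whether two categorical variables are associated, use the Chi-Square Test of Independence.
  • Non-Random Sampling: Non-random data collection can lead to biased results.
  • Ordinal Data with Order Importance: The chi-square statistic ignores the order of the categories, so it can miss ordered patterns. When order matters, consider a method that uses the ranking (for example, the Mann-Whitney U test when comparing two groups on an ordinal outcome).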
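When expected counts are small, aggregation usually means merging categories. The sketch below is a minimal illustration, assuming the categories have a natural order (such as counts 0, 1, 2, ...) so that merging neighbours is meaningful; for unordered categories, merges should be chosen on subject-matter grounds instead. The helper `pool_adjacent` and the counts in the usage line are hypothetical. After pooling, \( k \) becomes the number of pooled groups, so the degrees of freedom shrink accordingly.

```python
def pool_adjacent(observed, expected, minimum=5):
    """Merge neighbouring categories until each pooled expected count >= minimum.

    Hypothetical helper (not a library API). Assumes the categories are listed
    in a meaningful order, so merging neighbours makes sense.
    Returns (pooled_observed, pooled_expected).
    """
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            # Close the current group once it is large enough
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    if acc_obs > 0 or acc_exp > 0:
        # Leftover tail is too small on its own: fold it into the last group
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# Hypothetical ordered counts whose last categories have small expected values
pooled_o, pooled_e = pool_adjacent([10, 14, 9, 4, 2, 1], [11.0, 13.0, 9.5, 4.0, 1.8, 0.7])
print(pooled_o, [round(e, 2) for e in pooled_e])  # [10, 14, 9, 7] [11.0, 13.0, 9.5, 6.5]
```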

Chi-Square Goodness of Fit Test Formula

The Chi-Square statistic is calculated as:

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Where:

  • \(O_i\): Observed frequency.
  • \(E_i\): Expected frequency.
  • \(\sum\): Sum across all categories.

Degrees of freedom (df) formula:

\[ df = k - 1 \]

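where \( k \) is the number of categories. This holds when the expected distribution is fully specified in advance; if \( m \) parameters of the expected distribution are estimated from the data, use \( df = k - 1 - m \).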
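The formula and the degrees-of-freedom rule translate directly into code. Below is a minimal Python sketch, assuming the expected frequencies are fully specified before looking at the data (so \( df = k - 1 \)); `chi_square_gof` is a hypothetical helper name, not a library function, and the usage numbers are illustrative only.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom).

    Hypothetical helper for illustration; not a library API.
    Assumes the expected distribution was fixed in advance, so df = k - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    # Sum of (O_i - E_i)^2 / E_i over all categories
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Three illustrative categories (made-up numbers)
stat, df = chi_square_gof([30, 50, 20], [40, 40, 20])
print(stat, df)  # 5.0 2
```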

Step-by-Step Example

Problem: A researcher wants to determine if a 6-sided die is fair. For a fair die, each number (1 to 6) should have a 1/6 probability. The researcher rolled the die 120 times and collected the following data.

Observed frequencies:

Number                1    2    3    4    5    6    Total
Observed frequency   15   25   22   18   20   20      120

Hypotheses:

  • Null Hypothesis (H₀): The die is fair (each number has probability 1/6).
  • Alternative Hypothesis (H₁): The die is not fair (at least one number has a probability different from 1/6).

Step 1: Calculate Expected Frequencies

For a fair die, each number has a 1/6 probability. Total rolls = 120, so the expected frequency for each category is:

\[ E_i = \frac{\text{Total Rolls}}{\text{Number of Categories}} = \frac{120}{6} = 20 \]

Expected Frequencies:

Number                1    2    3    4    5    6
Expected frequency   20   20   20   20   20   20
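Dividing the total by the number of categories works here because the fair-die hypothesis is uniform. More generally, each expected frequency is the total number of observations \( N \) times the hypothesized probability of that category, \( E_i = N p_i \). The minimal sketch below covers both the uniform case and a non-uniform one; the helper name `expected_frequencies` and the 50/30/20 shares are hypothetical.

```python
from fractions import Fraction
import math


def expected_frequencies(total, probabilities):
    """Expected count E_i = N * p_i for each category (hypothetical helper)."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Fractions keep 1/6 exact; convert to float only for readable output
    return [float(total * p) for p in probabilities]


# Fair die (uniform hypothesis): each face has probability 1/6
print(expected_frequencies(120, [Fraction(1, 6)] * 6))
# [20.0, 20.0, 20.0, 20.0, 20.0, 20.0]

# Hypothetical non-uniform hypothesis, e.g. historical shares of 50%, 30%, 20%
print(expected_frequencies(200, [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]))
# [100.0, 60.0, 40.0]
```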

Step 2: Calculate Chi-Square Statistic

Formula:

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Calculations:

  • Number 1: \( \frac{(15 - 20)^2}{20} = \frac{25}{20} = 1.25 \)
  • Number 2: \( \frac{(25 - 20)^2}{20} = \frac{25}{20} = 1.25 \)
  • Number 3: \( \frac{(22 - 20)^2}{20} = \frac{4}{20} = 0.2 \)
  • Number 4: \( \frac{(18 - 20)^2}{20} = \frac{4}{20} = 0.2 \)
  • Number 5: \( \frac{(20 - 20)^2}{20} = 0 \)
  • Number 6: \( \frac{(20 - 20)^2}{20} = 0 \)

Total Chi-Square statistic:

\[ \chi^2 = 1.25 + 1.25 + 0.2 + 0.2 + 0 + 0 = 2.9 \]
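The hand calculation can be checked with a short script that also prints each face's contribution, which shows where the discrepancy comes from (faces 1 and 2 account for 2.5 of the 2.9). This is a minimal sketch that uses only the article's data and the uniform expected frequency of 20.

```python
# Observed counts from the 120 die rolls in this example
observed = {1: 15, 2: 25, 3: 22, 4: 18, 5: 20, 6: 20}
total = sum(observed.values())      # 120 rolls
expected = total / len(observed)    # 20 per face under H0 (fair die)

# Contribution (O_i - E_i)^2 / E_i of each face to the statistic
contributions = {
    face: (count - expected) ** 2 / expected
    for face, count in observed.items()
}
for face, value in contributions.items():
    print(f"Face {face}: {value:.2f}")

chi_square = sum(contributions.values())
print(f"Chi-square = {chi_square:.2f}")  # Chi-square = 2.90
```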

Step 3: Calculate Degrees of Freedom (df)

Degrees of freedom formula:

\[ df = k - 1 \]

Here \( k = 6 \) (categories), so:

\[ df = 6 - 1 = 5 \]

Step 4: Determine the P-Value and Significance

A printed Chi-Square table only brackets the p-value, so software (e.g., SPSS or R) is used for a precise value: for \( \chi^2 = 2.9 \) with df = 5, the p-value is approximately 0.715.
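The p-value is the upper-tail probability of the chi-square distribution at the observed statistic. Because a goodness of fit test always has whole-number degrees of freedom, this tail probability has a closed form that needs only Python's standard library (`math.exp`, `math.erfc`). The sketch below is an illustration under that assumption, not a replacement for a tested statistics package; `chi2_sf` is a hypothetical name. With SciPy installed, `scipy.stats.chisquare(observed, expected)` returns the statistic and p-value in one call.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Hypothetical helper; assumes df is a positive integer, which is always
    the case in a goodness of fit test (df = k - 1).
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = math.exp(-half)
        total = term
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return total
    # Odd df: 2 * (1 - Phi(sqrt(x))) plus a finite series in sqrt(x)
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * math.exp(-half) * root  # 2 * phi(sqrt(x)) * sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


p_value = chi2_sf(2.9, df=5)
print(f"p-value = {p_value:.3f}")  # p-value = 0.715
```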

Interpretation:

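  • p-value ≈ 0.715: This is greater than the conventional significance level of 0.05, so we fail to reject the null hypothesis.
  • Conclusion: The observed frequencies do not differ significantly from the expected (uniform) frequencies, so the data give no evidence that the die is unfair. This does not prove the die is fair; it only means these 120 rolls are consistent with fairness.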

Step 5: Check Assumptions

Assumptions for the Chi-Square Goodness of Fit Test:

  • Data must be categorical (the die faces 1-6 are treated as categories).
  • Observations must be independent (die rolls are independent).
  • Expected frequencies should be at least 5. Here every expected frequency is 20, which is ≥ 5.
  • Data must be collected randomly (assumed here).

All assumptions are satisfied, with random sampling taken as given.
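The expected-frequency check in Step 5 is easy to automate. The sketch below flags every category whose expected count falls below a threshold; the default of 5 follows the rule of thumb used in this guide, the function name `small_expected_counts` is hypothetical, and the sparse example counts are made up for illustration. Positions are counted from 0.

```python
def small_expected_counts(expected, minimum=5):
    """Return positions (0-based) of categories with expected frequency below `minimum`.

    Hypothetical helper for illustration; not a library API.
    """
    return [i for i, e in enumerate(expected) if e < minimum]


# Die example: every expected frequency is 20
flagged = small_expected_counts([20] * 6)
print(flagged or "All expected frequencies are at least 5")
# All expected frequencies are at least 5

# Hypothetical sparse case: the last two categories are too small
print(small_expected_counts([12.0, 6.5, 3.0, 1.5]))  # [2, 3]
```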

Summary

  • Chi-Square Goodness of Fit Test: Compares the observed distribution of a categorical variable to an expected distribution.
  • When to Use: Single categorical variable, known expected distribution, sufficient sample size.
  • When Not to Use: Continuous data, small expected frequencies, multiple variables.
  • Example: Die roll data yielded \( \chi^2 = 2.9 \), df = 5, p-value ≈ 0.715, so there is no significant evidence that the die is unfair.
  • Formula:
    \[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

This 2025 guide will help you understand and apply the Chi-Square Goodness of Fit Test effectively. For further questions, please leave a comment!
