Chi-Square Goodness of Fit Test: When to Use, When Not to Use, with Example in English (2025 Guide)
The Chi-Square Goodness of Fit Test is a statistical method used to determine whether the observed frequency distribution of a categorical variable matches a specified expected frequency distribution. This test is applied to a single categorical variable to check if the data follows a theoretical or expected pattern.
In this 2025 guide, we explain the Chi-Square Goodness of Fit Test, when to use it, and when to avoid it, and we work through a step-by-step example with data. The guide is designed for students, researchers, and professionals interested in categorical data analysis.
When to Use the Chi-Square Goodness of Fit Test
The Chi-Square Goodness of Fit Test is appropriate when:
- Single Categorical Variable: The data pertains to one categorical variable, such as colors (Red, Blue, Green) or preferences (Yes, No).
- Known Expected Distribution: You have a theoretical or expected frequency distribution, such as a uniform distribution or historical data.
- Adequate Sample Size: As a common rule of thumb, the expected frequency in each category should be at least 5.
- Random Sampling: The data must be collected randomly.
- Examples:
  - Is a die fair? (Expected: Each number has a 1/6 probability).
  - Are customer product preferences uniformly distributed?
  - Do voting patterns align with historical trends?
When Not to Use the Chi-Square Goodness of Fit Test
- Continuous Data: For continuous data (e.g., height, weight), use tests like Kolmogorov-Smirnov, or Shapiro-Wilk when checking for normality.
- Small Expected Frequencies: If expected frequencies are less than 5, the test may not be reliable. Consider exact tests or data aggregation.
- Multiple Variables: To test associations between two or more variables, use the Chi-Square Test of Independence.
- Non-Random Sampling: Non-random data collection can lead to biased results.
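- Ordinal Data with Order Importance: The Chi-Square statistic ignores the order of categories, so for ordinal data where order matters, consider a method designed for ordered data (for example, Mann-Whitney U when comparing two groups).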
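The data aggregation mentioned under Small Expected Frequencies can be sketched in Python. This is only an illustration: the function name `pool_sparse_categories`, the threshold parameter, and the counts are all hypothetical, and in practice categories should only be merged when the combined category still makes sense for the question being asked.

```python
def pool_sparse_categories(observed, expected, min_expected=5.0):
    """Merge every category whose expected count is below min_expected
    into a single pooled category appended at the end."""
    kept_obs, kept_exp = [], []
    pooled_obs, pooled_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        if e < min_expected:
            pooled_obs += o
            pooled_exp += e
        else:
            kept_obs.append(o)
            kept_exp.append(e)
    if pooled_exp > 0:
        kept_obs.append(pooled_obs)
        kept_exp.append(pooled_exp)
    return kept_obs, kept_exp


# Hypothetical counts, invented for illustration only.
observed = [30, 14, 3, 3]
expected = [28.0, 15.0, 4.0, 3.0]

pooled_observed, pooled_expected = pool_sparse_categories(observed, expected)
print(pooled_observed)
print(pooled_expected)
```

Pooling reduces the number of categories, so the degrees of freedom drop accordingly. The pooled category can itself still fall below the threshold, in which case an exact test may be the better choice.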
Chi-Square Goodness of Fit Test Formula
The Chi-Square statistic is calculated as:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
Where:
- \(O_i\): Observed frequency.
- \(E_i\): Expected frequency.
- \(\sum\): Sum across all categories.
Degrees of freedom (df) formula:
\[ df = k - 1 \]
Where \( k \) = number of categories.
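The statistic and the degrees of freedom translate almost directly into code. The following is a minimal Python sketch rather than a reference implementation; the function names and the coin-flip counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_categories):
    """df = k - 1 when the expected distribution is fully specified."""
    return num_categories - 1


# Hypothetical data, invented for illustration: 100 flips of a coin assumed fair.
observed = [58, 42]
expected = [50, 50]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))
print(f"chi-square = {statistic:.2f}, df = {df}")
```

Each category contributes its squared deviation scaled by its expected count, so a gap of 8 matters more when only 50 observations were expected than it would if 500 were expected.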
Step-by-Step Example
Problem: A researcher wants to determine if a 6-sided die is fair. For a fair die, each number (1 to 6) should have a 1/6 probability. The researcher rolled the die 120 times and collected the following data.
Data: Observed frequencies:
Number | 1 | 2 | 3 | 4 | 5 | 6 | Total |
---|---|---|---|---|---|---|---|
Observed Frequency | 15 | 25 | 22 | 18 | 20 | 20 | 120 |
Hypothesis:
- Null Hypothesis (H₀): The die is fair (each number has a probability of 1/6).
- Alternative Hypothesis (H₁): The die is not fair (at least one number has a probability other than 1/6).
Step 1: Calculate Expected Frequencies
For a fair die, each number has a 1/6 probability. Total rolls = 120, so the expected frequency for each category is:
\[ E_i = \frac{\text{Total Rolls}}{\text{Number of Categories}} = \frac{120}{6} = 20 \]
Expected Frequencies:
Number | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Expected Frequency | 20 | 20 | 20 | 20 | 20 | 20 |
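The same step works for any hypothesized distribution: multiply the total count by each category's probability. Below is a small Python sketch of that idea; `expected_counts` is a hypothetical helper name, and `Fraction` is used only so the check that the probabilities sum to 1 is exact.

```python
from fractions import Fraction


def expected_counts(total, probabilities):
    """Expected frequency for each category: total * probability."""
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return [total * p for p in probabilities]


# Fair six-sided die rolled 120 times, as in the example.
fair_die = [Fraction(1, 6)] * 6
expected = expected_counts(120, fair_die)
print([int(e) for e in expected])
```

For a non-uniform hypothesis, such as historical proportions, only the list of probabilities changes.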
Step 2: Calculate Chi-Square Statistic
Formula:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
Calculations:
- Number 1: \( \frac{(15 - 20)^2}{20} = \frac{25}{20} = 1.25 \)
- Number 2: \( \frac{(25 - 20)^2}{20} = \frac{25}{20} = 1.25 \)
- Number 3: \( \frac{(22 - 20)^2}{20} = \frac{4}{20} = 0.2 \)
- Number 4: \( \frac{(18 - 20)^2}{20} = \frac{4}{20} = 0.2 \)
- Number 5: \( \frac{(20 - 20)^2}{20} = 0 \)
- Number 6: \( \frac{(20 - 20)^2}{20} = 0 \)
Total Chi-Square statistic:
\[ \chi^2 = 1.25 + 1.25 + 0.2 + 0.2 + 0 + 0 = 2.9 \]
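The hand calculation can be double-checked with a few lines of Python. This sketch uses the observed counts from the table; the variable names are illustrative.

```python
# Observed counts for each face of the die, from the example data.
observed = {1: 15, 2: 25, 3: 22, 4: 18, 5: 20, 6: 20}

# Under a fair die every face has the same expected count.
expected_per_face = sum(observed.values()) / len(observed)

# Contribution of each face: (O - E)^2 / E.
contributions = {
    face: (count - expected_per_face) ** 2 / expected_per_face
    for face, count in observed.items()
}
chi_square = sum(contributions.values())

for face, value in contributions.items():
    print(f"Face {face}: {value:.2f}")
print(f"Chi-square = {chi_square:.2f}")
```

Listing the contributions separately shows where the statistic comes from: faces 1 and 2 account for most of it, while faces 5 and 6 match their expected counts exactly.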
Step 3: Calculate Degrees of Freedom (df)
Degrees of freedom formula:
\[ df = k - 1 \]
Where \( k \) = 6 (categories).
\[ df = 6 - 1 = 5 \]
Step 4: Determine P-Value and Significance
Using a Chi-Square table or software (e.g., SPSS or R), the p-value for \( \chi^2 = 2.9 \), df = 5 is approximately 0.715.
Interpretation:
- p-value ≈ 0.715: This is greater than the conventional significance level of 0.05, so we do not reject the null hypothesis.
- Conclusion: The data provide no significant evidence that the die is unfair; the observed frequencies do not differ significantly from the expected (uniform) frequencies.
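For readers without a table or statistics package at hand, the p-value can be approximated with the Python standard library alone. The sketch below is one possible approach, not the method SPSS or R uses internally: `chi2_sf` is a hypothetical helper name, and it evaluates the upper-tail probability through a series expansion of the regularized incomplete gamma function, which is adequate for moderate values such as those in this example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable
    with df degrees of freedom (series approximation)."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a * e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


p_value = chi2_sf(2.9, 5)
print(f"p-value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

As a sanity check, the helper returns roughly 0.05 for 11.070 at df = 5, the familiar critical value from Chi-Square tables.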
Step 5: Check Assumptions
Assumptions for the Chi-Square Goodness of Fit Test:
- Data must be categorical (the die faces 1 to 6 are treated as six categories).
- Observations must be independent (die rolls are independent).
- Expected frequencies should be at least 5. Here, all expected frequencies = 20, which is ≥ 5.
- Data must be randomly collected (assumed).
All assumptions are satisfied.
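The numeric part of this checklist is easy to automate. The snippet below is a small sketch with a hypothetical helper name, `check_expected_counts`; independence and random sampling cannot be verified from the counts and still have to be judged from how the data were collected.

```python
def check_expected_counts(expected, minimum=5):
    """Return the indices of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Expected counts from the die example.
expected = [20] * 6
too_small = check_expected_counts(expected)

if too_small:
    print(f"Expected count below 5 in categories: {too_small}")
else:
    print("All expected counts are at least 5")
```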
Summary
- Chi-Square Goodness of Fit Test: Compares the observed distribution of a categorical variable to an expected distribution.
- When to Use: Single categorical variable, known expected distribution, sufficient sample size.
- When Not to Use: Continuous data, small expected frequencies, multiple variables.
- Example: Die roll data yielded \( \chi^2 = 2.9 \), df = 5, p-value ≈ 0.715, giving no significant evidence that the die is unfair.
- Formula:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
This 2025 guide will help you understand and apply the Chi-Square Goodness of Fit Test effectively. For further questions, please leave a comment!