AP Statistics 8.2 Setting Up a Chi-Square Goodness of Fit Test Study Notes

LEARNING OBJECTIVE

The chi-square distribution may be used to model variation.

Key Concepts:

Chi-Square Distributions
Null and Alternative Hypotheses for a Categorical Distribution
Testing a Distribution of Proportions in Categorical Data
Calculating Expected Counts for a Chi-Square Goodness-of-Fit Test
Conditions for a Chi-Square Goodness-of-Fit Test

Chi-Square Distributions

A chi-square (\(\chi^2\)) distribution is the probability distribution of a sum of squares of independent standard normal random variables. It is commonly used to model the variation between observed and expected counts in categorical data.

Key Features:

The chi-square distribution is always non-negative (\(\chi^2 \ge 0\)) because it is based on squared values.
It is right-skewed, but the skew decreases as the degrees of freedom increase.
The shape of the chi-square distribution depends on the degrees of freedom (df).
- Fewer degrees of freedom → more skewed.
- More degrees of freedom → more symmetric and bell-shaped.
The mean of a chi-square distribution is equal to the degrees of freedom: \(\mu = df\).
The variance is \( \sigma^2 = 2 \cdot df \).
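A quick simulation can make these features concrete. The sketch below uses only the Python standard library; the function name `simulate_chi_square`, the 50,000 draws, and the seed are illustrative choices, not part of these notes. It builds chi-square values straight from the definition, as sums of \(df\) squared standard normal draws, and compares the simulated mean and variance with \(df\) and \(2 \cdot df\).

```python
import random
import statistics

def simulate_chi_square(df: int, n_draws: int = 50_000, seed: int = 1) -> list:
    """Return n_draws simulated values, each a sum of df squared independent N(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_draws)]

for df in (1, 3, 10):
    draws = simulate_chi_square(df)
    # A median below the mean is a quick numeric sign of right skew.
    print(
        f"df={df:>2}: min={min(draws):.3f}  "
        f"mean={statistics.fmean(draws):.2f} (theory {df})  "
        f"median={statistics.median(draws):.2f}  "
        f"variance={statistics.variance(draws):.2f} (theory {2 * df})"
    )
```

Because the draws are random, the printed means and variances land close to, not exactly on, the theoretical values. The minimum never drops below 0, and the median sits below the mean in each case, which reflects the right skew.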

Applications in Statistics:

Goodness-of-fit tests: Comparing observed counts to expected counts for a single categorical variable.
Tests of independence: Determining whether two categorical variables are associated in a contingency table.
Homogeneity tests: Comparing distributions across multiple populations.

Example:

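Suppose a categorical variable has 5 categories. For each category we square the difference between the observed and expected counts and divide by the expected count, then add up the results. When the expected proportions are correct and the expected counts are large enough, that sum approximately follows a chi-square distribution with \(df = 5 - 1 = 4\).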
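One way to see why this sum behaves like a chi-square variable is to simulate it. The sketch below is a hypothetical illustration: it assumes 5 equally likely categories and samples of 100 observations (neither value comes from these notes), draws many samples under those proportions, and computes the statistic each time. The average should sit near \(5 - 1 = 4\), and roughly 5% of the values should exceed 9.488, the standard table cutoff for the upper 5% of the \(df = 4\) distribution.

```python
import random
import statistics

def gof_statistic(observed, expected):
    """Chi-square statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_null_statistics(probs, n, reps=5_000, seed=2):
    """Draw `reps` samples of size n from the hypothesized proportions; return each statistic."""
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    results = []
    for _ in range(reps):
        counts = [0] * k
        # Assign each of the n observations to a category using the null proportions.
        for category in rng.choices(range(k), weights=probs, k=n):
            counts[category] += 1
        results.append(gof_statistic(counts, expected))
    return results

values = simulate_null_statistics([0.2] * 5, n=100)  # 5 equally likely categories
print(f"average statistic: {statistics.fmean(values):.2f} (chi-square mean: df = 4)")
# 9.488 is the usual table value cutting off the upper 5% of the df = 4 distribution.
share = sum(v > 9.488 for v in values) / len(values)
print(f"share above 9.488: {share:.3f} (about 0.05 expected)")
```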

Example

A small company wants to see if the distribution of employees across four departments matches what they expect based on company policy. Expected distribution: 25% in each department. There are 40 employees:

Observed counts: Department A = 8, B = 12, C = 10, D = 10
Expected counts: 40 × 0.25 = 10 per department

Describe the chi-square distribution that would be used to assess whether the observed counts differ from expected counts.

Answer / Explanation

Step 1 — Degrees of freedom

For a goodness-of-fit test with \(k = 4\) categories: \(\displaystyle df = k - 1 = 4 - 1 = 3.\)

Step 2 — Shape of the distribution

With only \(df = 3\), the chi-square distribution is strongly right-skewed.
It starts at 0 on the horizontal axis and has a long tail to the right.
The mean is \(\mu = df = 3\) and the variance is \(\sigma^2 = 2 \cdot df = 6\).

Step 3 — Interpretation

If we compute the chi-square statistic for the observed vs. expected counts:

\(\displaystyle \chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{(8-10)^2}{10} + \dfrac{(12-10)^2}{10} + \dfrac{(10-10)^2}{10} + \dfrac{(10-10)^2}{10} = 0.8\)

This value is compared to the chi-square distribution with \(df = 3\) to judge whether the observed variation is unexpectedly large. Here \(\chi^2 = 0.8\) is well below the distribution's mean of 3, so the variation is small and consistent with random chance.
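The arithmetic in Steps 1 and 3 can be checked with a few lines of Python. This is a minimal sketch; `chi_square_statistic` is a hypothetical helper name, and the 25% policy comes from the example.

```python
def chi_square_statistic(observed, expected):
    """Sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("need one expected count per observed count")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 10, 10]                # departments A, B, C, D
n = sum(observed)                         # 40 employees
expected = [n * 0.25] * len(observed)     # 25% policy gives 10 per department
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1                    # k - 1

print(f"chi-square = {chi2:.2f}, df = {df}")  # chi-square = 0.80, df = 3
print(chi2 < df)                              # True: below the mean of the df = 3 curve
```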

Null and Alternative Hypotheses for a Categorical Distribution

When testing whether a categorical variable follows a specified distribution of proportions, we define hypotheses as follows:

Null hypothesis (\(H_0\)): The observed categorical data follow the specified distribution.

Example: \(H_0:\) The proportions of students in each grade (freshman, sophomore, junior, senior) are 25%, 25%, 25%, 25%.

Alternative hypothesis (\(H_a\)): The observed categorical data do not follow the specified distribution.

Example: \(H_a:\) The proportions of students in the grades are not all equal to 25%.

Key Points:

The null hypothesis is often stated with exact proportions based on theoretical expectations or claims.
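The alternative hypothesis is non-directional: it says only that the distribution differs from the claim in at least one category, without specifying which category or in which direction. (The p-value still comes from the upper tail of the chi-square distribution, because any departure from \(H_0\) tends to make \(\chi^2\) larger.)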
This setup is used in goodness-of-fit tests, where the goal is to compare observed counts to expected counts under \(H_0\).
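One way to keep the hypotheses precise is to write \(H_0\) down as a complete set of proportions. The sketch below is illustrative only (the helper names `equal_null` and `validate_null` are hypothetical): it builds the equal-proportions null for the grade example and checks that the proportions are positive and sum to 1, since a null hypothesis must describe a full distribution. \(H_a\) is simply the negation, so it needs no numbers of its own.

```python
import math

def equal_null(categories):
    """H0 for an 'equally distributed' claim: each of the k categories gets proportion 1/k."""
    k = len(categories)
    return {category: 1 / k for category in categories}

def validate_null(null_props):
    """Check that H0 describes a complete distribution over two or more categories."""
    if len(null_props) < 2:
        raise ValueError("a goodness-of-fit test needs at least two categories")
    # A proportion of 0 would give an expected count of 0, and (O - E)^2 / E would be undefined.
    if any(p <= 0 for p in null_props.values()):
        raise ValueError("each hypothesized proportion must be positive")
    if not math.isclose(sum(null_props.values()), 1.0):
        raise ValueError("hypothesized proportions must sum to 1")
    return null_props

h0 = validate_null(equal_null(["freshman", "sophomore", "junior", "senior"]))
for grade, p in h0.items():
    print(f"H0: p_{grade} = {p:.2f}")
print("Ha: at least one p_i differs from its value under H0")
```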

Example: A candy company claims that their 4-color candy pack contains equal numbers of red, green, blue, and yellow candies.

\(H_0:\) Proportions are 25% each.

\(H_a:\) At least one color’s proportion is different from 25%.

Example

A school claims that students’ favorite lunch choices are equally distributed among four options: Pizza, Sandwich, Salad, and Pasta. A survey of 80 students yields the following observed counts:

Pizza: 22
Sandwich: 18
Salad: 20
Pasta: 20

Identify the null and alternative hypotheses for testing if the observed counts match the expected distribution.

Answer / Explanation

Step 1 — Define null hypothesis (\(H_0\))

\(H_0:\) The proportions of students choosing each lunch option are equal, i.e., 25% for Pizza, 25% for Sandwich, 25% for Salad, and 25% for Pasta.

Step 2 — Define alternative hypothesis (\(H_a\))

\(H_a:\) At least one lunch option has a proportion different from 25%. In other words, the observed distribution of favorite lunch choices does not match the claimed equal distribution.

Step 3 — Contextual interpretation

If we proceed with a goodness-of-fit test, we will compare the observed counts to expected counts (80 × 0.25 = 20 for each option). The hypotheses guide the test and help determine if the differences are due to random variation or indicate a real discrepancy from the claimed distribution.

Testing a Distribution of Proportions in Categorical Data

We want to determine whether observed counts in categorical data match a claimed or expected distribution of proportions.

Appropriate Testing Method:

Goodness-of-Fit Test using the Chi-Square (\(\chi^2\)) statistic

Used when:
- There is one categorical variable with \(k\) categories.
- We have specified expected proportions for each category under a null hypothesis \(H_0\).
- We want to compare observed counts to expected counts to see if deviations are larger than can be explained by random chance.

Steps for the Test:

Define the null hypothesis \(H_0\): The observed proportions match the expected proportions.
Define the alternative hypothesis \(H_a\): At least one category has a proportion different from the expected.
Calculate expected counts: \(E_i = N \times \text{expected proportion}_i\), where \(N\) is the total sample size.
Compute the chi-square statistic: \(\displaystyle \chi^2 = \sum_{i=1}^k \dfrac{(O_i - E_i)^2}{E_i}\), where \(O_i\) is the observed count in category \(i\).
Determine degrees of freedom: \(df = k - 1\).
Compare the statistic to the \(\chi^2\) distribution with \(k - 1\) degrees of freedom: the p-value is the area to the right of the observed \(\chi^2\) value. Use it to draw a conclusion in context.
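The steps above can be combined into one small function. This sketch assumes integer degrees of freedom and computes the upper-tail area with a standard closed-form recurrence for the chi-square distribution, so it needs only Python's `math` module. The function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chisquare` would usually be used instead. It is applied here to the candy counts used in this section's example.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for X ~ chi-square(df), valid for integer df >= 1.

    Starts from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2) and applies
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        # Work on the log scale so large x or df do not overflow.
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q

def goodness_of_fit_test(observed, null_props):
    """Return (chi-square statistic, df, p-value) for observed counts vs. hypothesized proportions."""
    n = sum(observed)                          # N, the total sample size
    expected = [n * p for p in null_props]     # E_i = N * p_i
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                     # k - 1
    return chi2, df, chi2_sf(chi2, df)

# Candy example: H0 says 25% of each color.
chi2, df, p = goodness_of_fit_test([18, 22, 20, 20], [0.25] * 4)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p:.3f}")
# chi-square = 0.40, df = 3, p-value = 0.940
```

A p-value near 0.94 means differences this small would be very common if the 25% claim were true.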

Notes:

This method assesses whether the observed variation in counts is unusually large compared to what we would expect under the null hypothesis.
It is suitable for any categorical variable with two or more categories, as long as expected counts are sufficiently large (usually ≥5 for each category).

Example

A candy company claims that their four-color candy packs contain equal numbers of red, green, blue, and yellow candies. A random sample of 80 candies yields the following counts:

Red: 18
Green: 22
Blue: 20
Yellow: 20

What is the appropriate test to determine if the observed distribution of candy colors matches the company’s claim?

Answer / Explanation

Step 1 — Identify the test

The appropriate test is a chi-square goodness-of-fit test, because we are comparing observed counts in one categorical variable (candy color) to expected counts based on specified proportions (25% each color).

Step 2 — Expected counts

Expected count for each color: \(E = 80 \times 0.25 = 20\).

Step 3 — Interpretation

The chi-square test would compare the observed counts (18, 22, 20, 20) to the expected counts (20, 20, 20, 20) to see if the differences are unusually large. This determines whether the observed variation could plausibly be due to random chance or indicates a departure from the claimed distribution.

Calculating Expected Counts for a Chi-Square Goodness-of-Fit Test

Expected counts are the counts we would anticipate in each category if the null hypothesis \(H_0\) is true. They provide the baseline to compare with observed counts.

Formula:

\(\displaystyle E_i = N \times p_i\)

\(E_i\) = expected count for category \(i\)
\(N\) = total sample size
\(p_i\) = hypothesized proportion for category \(i\) under \(H_0\)

Key Points:

All expected counts should generally be at least 5 for the chi-square test to be valid.
Expected counts are calculated before comparing to observed counts.
They reflect the distribution claimed by the null hypothesis, not the actual data.

Example

A school claims that students’ favorite subjects are equally distributed among Math, Science, English, and History. A survey of 60 students yields observed counts:

Math: 14
Science: 18
English: 16
History: 12

Calculate the expected counts for each subject under the null hypothesis that all subjects are equally preferred.

Answer / Explanation

Step 1 — Identify total sample size

Total students: \(N = 60\).

Step 2 — Hypothesized proportions

Equal preference implies \(p_i = 0.25\) for each subject.

Step 3 — Calculate expected counts

Math: \(E = 60 \times 0.25 = 15\)
Science: \(E = 60 \times 0.25 = 15\)
English: \(E = 60 \times 0.25 = 15\)
History: \(E = 60 \times 0.25 = 15\)

Step 4 — Interpretation

These expected counts (15 for each subject) are what we would expect if students truly have no preference. Observed counts can now be compared to these expected counts to assess whether the variation is larger than expected by chance.
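The Step 3 calculation can be sketched in code. The helper name `expected_counts` is hypothetical. Only the total \(N\) is taken from the observed data, which matches the point that expected counts reflect \(H_0\), not the actual data.

```python
import math

def expected_counts(n, null_props):
    """E_i = N * p_i for every category; only the total N comes from the data."""
    if not math.isclose(sum(null_props.values()), 1.0):
        raise ValueError("hypothesized proportions must sum to 1")
    return {category: n * p for category, p in null_props.items()}

observed = {"Math": 14, "Science": 18, "English": 16, "History": 12}
n = sum(observed.values())                                  # N = 60
expected = expected_counts(n, {subject: 0.25 for subject in observed})

print(expected)   # {'Math': 15.0, 'Science': 15.0, 'English': 15.0, 'History': 15.0}
print(all(e >= 5 for e in expected.values()))  # True: every expected count is at least 5
```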

Conditions for a Chi-Square Goodness-of-Fit Test

Before performing a chi-square test for goodness-of-fit, we must verify that the data meet the necessary conditions to ensure valid inferences:

1. Random: The data come from a random sample or a randomized experiment.

This ensures that the sample is representative of the population.

2. Expected Counts (Large Enough Sample): Each expected count \(E_i\) should generally be at least 5.

This ensures that the chi-square approximation to the sampling distribution is valid.

3. Independence: The observations must be independent of each other.

If sampling without replacement, the 10% condition should be satisfied: the sample size \(n\) should be at most 10% of the population size, \(n \le 0.1 \times (\text{population size})\).

Notes:

If any expected count is less than 5, consider combining categories or using a different test.
Randomness, independence, and sufficiently large expected counts together ensure that the chi-square test statistic follows the chi-square distribution approximately under the null hypothesis.

Example

A candy company claims that their four-color candy packs contain equal numbers of red, green, blue, and yellow candies. A sample of 80 candies yields:

Observed: Red = 18, Green = 22, Blue = 20, Yellow = 20
Expected (if proportions are equal): 20 each

Verify that conditions are met for a chi-square goodness-of-fit test.

Answer / Explanation

Step 1 — Random: The sample is assumed to be randomly selected from the candy packs.

Step 2 — Expected counts: Each expected count \(E_i = 80 \times 0.25 = 20\). All expected counts ≥ 5.

Step 3 — Independence: Each candy is counted independently, and 80 candies is far less than 10% of all the company's candies, so the 10% condition is satisfied.

Conclusion: All conditions are satisfied. It is appropriate to proceed with the chi-square goodness-of-fit test to evaluate whether the observed counts differ from the expected distribution.
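The three checks in this example can be written as a small helper. This is a sketch under stated assumptions: `check_gof_conditions` is a hypothetical name, randomness has to be supplied from the study design because it cannot be verified from counts, and the population size of 100,000 candies is a hypothetical stand-in for "all of the company's candies", used only to illustrate the 10% condition.

```python
def check_gof_conditions(expected, n, random_sample, population_size=None):
    """Report each chi-square goodness-of-fit condition as True/False.

    random_sample: taken from the study design; it cannot be checked from the counts.
    population_size: only needed when sampling without replacement (10% condition).
    """
    checks = {
        "random": random_sample,
        "expected counts >= 5": all(e >= 5 for e in expected),
    }
    if population_size is not None:
        checks["10% condition"] = n <= 0.1 * population_size
    return checks

# Candy example: 80 candies, expected 20 per color.
# population_size=100_000 is a hypothetical value, not given in the example.
print(check_gof_conditions([20, 20, 20, 20], n=80, random_sample=True, population_size=100_000))
# {'random': True, 'expected counts >= 5': True, '10% condition': True}
```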
