SECTION II
Part A
Question 1
A large exercise center has several thousand members from age \(18\) to \(55\) years and several thousand members age \(56\) and older. The manager of the center is considering offering online fitness classes. The manager is investigating whether members’ opinions of taking online fitness classes differ by age. The manager selected a random sample of \(170\) exercise center members ages \(18\) to \(55\) years and a second random sample of \(230\) exercise center members ages \(56\) years and older. Each sampled member was asked whether they would be interested in taking online fitness classes. The manager found that \(51\) of the \(170\) sampled members ages \(18\) to \(55\) years and that \(79\) of the \(230\) sampled members ages \(56\) years and older said they would be interested in taking online fitness classes.
At a significance level of \(\alpha=0.05\), do the data provide convincing statistical evidence of a difference in the proportion of all exercise center members ages \(18\) to \(55\) years who would be interested in taking online fitness classes and the proportion of all exercise center members ages \(56\) years and older who would be interested in taking online fitness classes? Complete the appropriate inference procedure to justify your response.
Most-appropriate topic codes (CED):
• TOPIC 6.11: Carrying Out a Test for the Difference of Two Population Proportions
Answer/Explanation
State:
We will perform a two-sample z-test for a difference in proportions. Let \(p_{younger}\) be the true proportion of members ages \(18\) to \(55\) interested in online classes, and \(p_{older}\) be the true proportion for members ages \(56\) and older. The significance level is \(\alpha=0.05\).
The hypotheses are:
\(H_0: p_{younger} - p_{older} = 0\)
\(H_a: p_{younger} - p_{older} \ne 0\)
Plan:
We check the conditions for inference.
1. Random: The data come from two independent random samples.
2. Independence (10% condition): Each age group has several thousand members, so the samples of \(170\) and \(230\) are each less than \(10\%\) of all members in their respective age groups.
3. Normality (Large Counts):
– \(\hat{p}_{younger} = \frac{51}{170} = 0.3\). Successes = \(51\), Failures = \(119\). Both are \(\ge 10\).
– \(\hat{p}_{older} = \frac{79}{230} \approx 0.343\). Successes = \(79\), Failures = \(151\). Both are \(\ge 10\).
All conditions are met.
Do:
First, calculate the pooled proportion, \(\hat{p}_c\):
\(\hat{p}_c = \frac{51+79}{170+230} = \frac{130}{400} = 0.325\)
Next, calculate the z-test statistic:
\(z = \frac{(\hat{p}_{younger} - \hat{p}_{older}) - 0}{\sqrt{\hat{p}_c(1-\hat{p}_c)(\frac{1}{n_{younger}} + \frac{1}{n_{older}})}}\)
\(z = \frac{(0.3 - 0.3435)}{\sqrt{(0.325)(0.675)(\frac{1}{170} + \frac{1}{230})}} \approx -0.918\)
Finally, find the two-sided p-value:
p-value = \(2 \times P(Z < -0.918) \approx 0.359\)
Conclude:
Since the p-value (\(0.359\)) is greater than the significance level (\(\alpha = 0.05\)), we fail to reject the null hypothesis.
There is not convincing statistical evidence to conclude that there is a difference in the proportion of all younger members and all older members at this exercise center who would be interested in taking online fitness classes.
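The arithmetic in the Do step can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `two_proportion_z_test` is a hypothetical helper introduced for illustration, not part of the original solution.

```python
from math import sqrt, erf


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for H0: p1 - p2 = 0 (two-sided)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion, used because H0 assumes the two proportions are equal
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Standard normal CDF evaluated at -|z|, written with the error function
    lower_tail = 0.5 * (1 + erf(-abs(z) / sqrt(2)))
    p_value = 2 * lower_tail
    return z, p_value


# Counts from the problem: 51 of 170 younger members, 79 of 230 older members
z, p_value = two_proportion_z_test(51, 170, 79, 230)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

The printed values should agree with the hand calculation above to three decimal places.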
Question 2
A local elementary school decided to sell bottles printed with the school district’s logo as a fund-raiser. The students in the elementary school were asked to sell bottles in three different sizes (small, medium, and large). The relative frequencies of the number of bottles sold for each size by the elementary school were \(0.5\) for small bottles, \(0.3\) for medium bottles, and \(0.2\) for large bottles.
A local middle school also decided to sell bottles as a fund-raiser, using the same three sizes (small, medium, and large). The middle school students sold three times the number of bottles that the elementary school students sold. For the middle school students, the proportion of bottles sold was equal for all three sizes.
Most-appropriate topic codes (CED):
• TOPIC 2.2: Representing Two Categorical Variables
• TOPIC 2.3: Statistics for Two Categorical Variables
Answer/Explanation
(a)
The completed segmented bar graphs are described below.
– For the Elementary School, the segments are partitioned at \(0.5\) (for Small), \(0.8\) (for Medium, \(0.5+0.3\)), and \(1.0\) (for Large, \(0.8+0.2\)).
– For the Middle School, the proportions are equal for all three sizes, so each size represents \(\frac{1}{3}\) of the total. The segments are partitioned at \(\frac{1}{3} \approx 0.33\) and \(\frac{2}{3} \approx 0.67\).
(b)
No, the elementary school administrator’s conclusion is incorrect.
Explanation:
Although the proportion of small bottles sold by the elementary school (\(0.5\)) is greater than the proportion sold by the middle school (\(\approx 0.33\)), the middle school sold three times as many total bottles. Let \(N\) be the total number of bottles sold by the elementary school. Then the middle school sold \(3N\) bottles.
– Number of small bottles sold by Elementary: \(0.5 \times N\)
– Number of small bottles sold by Middle: \(\frac{1}{3} \times (3N) = N\)
Since \(N > 0.5N\), the middle school sold more small bottles.
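The comparison holds for any positive elementary school total. A small sketch, assuming a few hypothetical totals (the problem does not state the actual number of bottles sold), makes this concrete:

```python
from fractions import Fraction


def small_bottle_counts(elementary_total):
    """Return (elementary small count, middle school small count)."""
    elementary_small = Fraction(1, 2) * elementary_total
    middle_total = 3 * elementary_total  # middle school sold three times as many
    middle_small = Fraction(1, 3) * middle_total
    return elementary_small, middle_small


# Hypothetical totals, chosen only for illustration
for n in (60, 300, 1200):
    elementary_small, middle_small = small_bottle_counts(n)
    print(n, elementary_small, middle_small, middle_small > elementary_small)
```

In every case the middle school count is exactly twice the elementary school count.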
(c)
(i) High School A sold a greater proportion of large bottles.
Justification: The proportion is represented by the height of the segment. The segment for “Large bottles” is taller for High School A (from \(0.7\) to \(1.0\), a height of \(0.3\)) than for High School B (from \(0.6\) to \(0.8\), a height of \(0.2\)).
(ii) High School B sold a greater number of large bottles.
Justification: The number of bottles is represented by the area of the rectangle (height \(\times\) width). The rectangle for large bottles at High School B is visibly wider than the rectangle for High School A. Although it is shorter, its greater width gives it a larger overall area, representing a greater number of large bottles sold.
Question 3
Most-appropriate topic codes (CED):
• TOPIC 3.6: Selecting an Experimental Design
• TOPIC 3.3: Random Sampling and Data Collection
Answer/Explanation
(a)
This is an observational study. The researchers are collecting data by asking car owners to estimate their mileage without imposing any treatments. The owners are not randomly assigned a car model to drive.
(b)
James can number the days of the experiment from \(1\) to \(70\). Then, using a random number generator, he can generate \(35\) unique integers from \(1\) to \(70\). The days corresponding to these \(35\) numbers will be assigned to the “drive with autopilot” treatment. The remaining \(35\) days will be assigned to the “drive without autopilot” treatment.
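The random assignment described above could be carried out as in the following sketch. The function name `assign_days` and the `seed` parameter are assumptions added for illustration and reproducibility; they are not part of the original solution.

```python
import random


def assign_days(n_days=70, n_autopilot=35, seed=None):
    """Randomly split days 1..n_days into two treatment groups."""
    rng = random.Random(seed)
    days = range(1, n_days + 1)
    # Sampling without replacement guarantees unique day numbers
    autopilot_days = sorted(rng.sample(days, n_autopilot))
    chosen = set(autopilot_days)
    no_autopilot_days = [d for d in days if d not in chosen]
    return autopilot_days, no_autopilot_days


with_autopilot, without_autopilot = assign_days(seed=2024)
print("Drive with autopilot:", with_autopilot)
print("Drive without autopilot:", without_autopilot)
```

Sampling without replacement plays the role of "generate 35 unique integers", so no repeated numbers need to be discarded.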
(c)
James’s experiment only used his own car, so the results can only be generalized to his specific car and driving conditions. To generalize the findings to all Model D cars in his club, he would need to conduct a new study. In the new study, he must randomly select a sample of Model D cars from the club’s members. He would then need to carry out a similar experiment using the randomly selected cars.
Question 4
(i) Calculate the mean of the distribution of the number of geodes Sarah will open until a red crystal is found. Show your work.
(ii) Calculate the standard deviation of the distribution of the number of geodes Sarah will open until a red crystal is found. Show your work.
(b) Another player, Conrad, decides to play the game and will stop opening geodes after finding a red crystal or when \(4\) geodes have been opened, whichever comes first. Let \(Y=\) the number of geodes Conrad will open. The table shows the partially completed probability distribution for the random variable Y.
| Number of geodes Conrad will open, y | \(1\) | \(2\) | \(3\) | \(4\) |
|---|---|---|---|---|
| Probability, P(Y=y) | \(0.08\) | \(0.0736\) | | |
(i) Calculate \(P(Y=3)\). Show your work.
(ii) Calculate \(P(Y=4)\). Show your work.
(i) Calculate the mean of the distribution of the number of geodes Conrad will open. Show your work.
(ii) Interpret the mean of the distribution of the number of geodes Conrad will open, which was calculated in part (c-i).
Most-appropriate topic codes (CED):
• TOPIC 4.7: Introduction to Random Variables and Probability Distributions
• TOPIC 4.8: Mean and Standard Deviation of Random Variables
Answer/Explanation
(a)
The number of geodes Sarah opens, G, follows a geometric distribution with probability of success \(p=0.08\).
(i) The mean of a geometric distribution is \(\mu = \frac{1}{p}\).
\(\mu = \frac{1}{0.08} = 12.5\) geodes.
(ii) The standard deviation of a geometric distribution is \(\sigma = \frac{\sqrt{1-p}}{p}\).
\(\sigma = \frac{\sqrt{1-0.08}}{0.08} = \frac{\sqrt{0.92}}{0.08} \approx 11.99\) geodes.
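As a quick check of both formulas, here is a minimal sketch; `geometric_mean_sd` is a hypothetical helper name.

```python
from math import sqrt


def geometric_mean_sd(p):
    """Mean and standard deviation of the number of trials until the first success."""
    mean = 1 / p
    sd = sqrt(1 - p) / p
    return mean, sd


mean_g, sd_g = geometric_mean_sd(0.08)
print(f"mean = {mean_g:.2f} geodes, sd = {sd_g:.2f} geodes")
```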
(b)
(i) \(P(Y=3)\) is the probability that the first red crystal is found on the third geode. This means the first two are not red and the third is red.
\(P(Y=3) = (1-0.08)^2 (0.08) = (0.92)^2 (0.08) \approx 0.0677\).
(ii) Conrad opens \(4\) geodes exactly when none of the first three geodes contains a red crystal; the outcome of the fourth geode does not matter. Directly, this probability is \((0.92)^3 \approx 0.7787\). The same value follows from the complement rule, because the probabilities for \(Y=1, 2, 3, 4\) must sum to \(1\).
\(P(Y=4) = 1 - [P(Y=1) + P(Y=2) + P(Y=3)]\)
\(P(Y=4) \approx 1 - (0.08 + 0.0736 + 0.0677) = 1 - 0.2213 = 0.7787\).
(c)
(i) The mean of the discrete random variable Y is calculated as \(E(Y) = \sum y \cdot P(Y=y)\).
\(E(Y) \approx (1)(0.08) + (2)(0.0736) + (3)(0.0677) + (4)(0.7787)\)
\(E(Y) \approx 0.08 + 0.1472 + 0.2031 + 3.1148 \approx 3.545\) geodes.
(ii) The mean of approximately \(3.545\) geodes is the long-run average number of geodes Conrad would open per attempt if he were to repeat this process many, many times.
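The full distribution of \(Y\) and its mean can be reproduced with a short sketch. The helper name `capped_geometric_pmf` is an assumption introduced here; it models "stop at the first red crystal or after `cap` geodes, whichever comes first."

```python
def capped_geometric_pmf(p, cap):
    """PMF of the number of trials when stopping at the first success or at `cap` trials."""
    # For y < cap: the first y - 1 trials fail and trial y succeeds
    pmf = {y: (1 - p) ** (y - 1) * p for y in range(1, cap)}
    # For y = cap: the first cap - 1 trials all fail; the last outcome is irrelevant
    pmf[cap] = (1 - p) ** (cap - 1)
    return pmf


pmf_y = capped_geometric_pmf(0.08, 4)
mean_y = sum(y * prob for y, prob in pmf_y.items())

for y, prob in pmf_y.items():
    print(f"P(Y={y}) = {prob:.4f}")
print(f"E(Y) = {mean_y:.3f}")
```

Computing \(P(Y=4)\) directly as \((0.92)^3\) gives the same value as the complement rule used in part (b).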
Question 5
| | Fewer Than 6 Months | 6-10 Months | 11-15 Months | 16-20 Months | 21 or More Months | Total |
|---|---|---|---|---|---|---|
| Has a Majority of Regular Baseball Cards | \(80\) | \(84\) | \(71\) | \(76\) | \(112\) | \(423\) |
| Has a Majority of Rare Baseball Cards | \(11\) | \(16\) | \(9\) | \(6\) | \(35\) | \(77\) |
| Total | \(91\) | \(100\) | \(80\) | \(82\) | \(147\) | \(500\) |
(i) Name the hypothesis test Michelle should use to investigate her belief. Do not perform the hypothesis test.
(ii) State the appropriate null and alternative hypotheses for the hypothesis test you identified in (c-i). Do not perform the hypothesis test.
Most-appropriate topic codes (CED):
• TOPIC 4.5: Conditional Probability
• TOPIC 8.5: Setting Up a Chi-Square Test for Homogeneity or Independence
• TOPIC 8.6: Carrying Out a Chi-Square Test for Homogeneity or Independence
Answer/Explanation
(a)
The number of collectors who have been collecting for \(11\) or more months and have a majority of regular cards is the sum of the counts in the corresponding cells: \(71 + 76 + 112 = 259\).
The total number of collectors in the sample is \(500\).
The probability is \(\frac{259}{500} = 0.518\).
(b)
This is a conditional probability. The sample is restricted to the \(91\) collectors who have been collecting for fewer than \(6\) months. Of these, \(80\) have a majority of regular cards.
The probability is \(\frac{80}{91} \approx 0.879\).
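Both probabilities in parts (a) and (b) come straight from the two-way table. The sketch below stores the table as dictionaries; the short duration labels such as `"<6"` are shorthand introduced here for the column headings.

```python
durations = ["<6", "6-10", "11-15", "16-20", "21+"]
regular = dict(zip(durations, [80, 84, 71, 76, 112]))
rare = dict(zip(durations, [11, 16, 9, 6, 35]))

total = sum(regular.values()) + sum(rare.values())

# (a) P(collecting 11 or more months AND majority regular cards)
p_a = sum(regular[d] for d in ["11-15", "16-20", "21+"]) / total

# (b) P(majority regular cards | collecting fewer than 6 months)
p_b = regular["<6"] / (regular["<6"] + rare["<6"])

print(f"(a) {p_a:.3f}")
print(f"(b) {p_b:.3f}")
```

Part (a) divides by the full sample of \(500\), while part (b) divides only by the \(91\) collectors in the conditioning column.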
(c)
(i) The appropriate test is a chi-square test for independence (or association).
(ii) The hypotheses are:
– \(H_0\): There is no association between the number of months spent collecting baseball cards and the majority type of card in the collection for all baseball card collectors at the convention.
– \(H_a\): There is an association between the number of months spent collecting baseball cards and the majority type of card in the collection for all baseball card collectors at the convention.
(d)
Since the p-value of \(0.0075\) is less than any common significance level (e.g., \(\alpha=0.05\)), Michelle should reject the null hypothesis.
There is convincing statistical evidence of an association between the number of months spent collecting baseball cards and the majority type of card in the collection for all baseball card collectors at the convention.
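For readers who want to see where a p-value of this size comes from, the following sketch computes the chi-square statistic from the table and its p-value. It assumes \(df = (2-1)(5-1) = 4\) and uses the closed-form upper-tail probability for a chi-square distribution with \(4\) degrees of freedom, \(e^{-x/2}(1 + x/2)\), so no external library is needed. The function names are hypothetical.

```python
from math import exp

observed = [
    [80, 84, 71, 76, 112],  # majority regular cards
    [11, 16, 9, 6, 35],     # majority rare cards
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def chi_square_sf_df4(x):
    """Upper-tail probability for a chi-square distribution with 4 degrees of freedom."""
    return exp(-x / 2) * (1 + x / 2)


chi2 = chi_square_statistic(observed)
p_val = chi_square_sf_df4(chi2)
print(f"chi-square = {chi2:.2f}, p-value = {p_val:.4f}")
```

The closed form applies only to \(df = 4\); a table with different dimensions would need a general chi-square tail function.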
SECTION II
Part B
Question 6
| Sample Size | Mean | Std Dev | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \(20\) | \(5.12\) | \(0.743\) | \(4.25\) | \(4.51\) | \(4.885\) | \(5.475\) | \(6.58\) |
(i) Describe the shape of the distribution of the sample of whistle prices. Justify your response using appropriate values from the summary statistics table.
(ii) Using the \(1.5 \times IQR\) rule, determine whether there are any outliers in the sample of whistle prices. Justify your response.
(ii) Indicate the value of the Pearson’s coefficient of skewness you calculated in part (c-i) for the appropriate sample size by marking it with an “X” on the preceding graph.
(i) What should you conclude about the shape of the distribution of the sample of whistle prices? Justify your response.
- The sample size is greater than or equal to \(30\).
- If the sample size is less than \(30\), the distribution of the sample data is not strongly skewed and does not have outliers.
Most-appropriate topic codes (CED):
• TOPIC 1.6: Describing the Distribution of a Quantitative Variable
• TOPIC 1.7: Summary Statistics for a Quantitative Variable
Answer/Explanation
(a)
(i) The appropriate inference procedure is a one-sample t-interval for a population mean.
(ii) The parameter of interest is \(\mu\), the true mean price, in dollars, of this type of whistle at all stores that sell the whistle.
(b)
(i) The distribution of the sample of whistle prices appears to be slightly skewed to the right. This is because the sample mean (\(5.12\)) is greater than the sample median (\(4.885\)).
(ii) First, calculate the IQR: \(IQR = Q_3 - Q_1 = 5.475 - 4.51 = 0.965\).
Next, calculate the outlier fences:
– Lower Fence: \(Q_1 - 1.5(IQR) = 4.51 - 1.5(0.965) = 3.0625\)
– Upper Fence: \(Q_3 + 1.5(IQR) = 5.475 + 1.5(0.965) = 6.9225\)
Since the minimum value (\(4.25\)) is greater than the lower fence and the maximum value (\(6.58\)) is less than the upper fence, there are **no outliers** in the sample.
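The fence calculation can be sketched as follows, using only the summary statistics from the table. The function name `iqr_fences` is a hypothetical helper; because the raw data are not given, the check compares only the minimum and maximum against the fences.

```python
def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) under the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


minimum, q1, q3, maximum = 4.25, 4.51, 5.475, 6.58
lower_fence, upper_fence = iqr_fences(q1, q3)

# If the extremes lie inside the fences, no observation can be an outlier
has_outliers = minimum < lower_fence or maximum > upper_fence
print(f"fences: ({lower_fence:.4f}, {upper_fence:.4f}), outliers: {has_outliers}")
```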
(c)
(i) Pearson’s Coefficient of Skewness = \(\frac{3(\bar{x} - m)}{s} = \frac{3(5.12 - 4.885)}{0.743} \approx 0.949\).
(ii) An “X” is marked on the graph at the coordinate (\(0.949, 20\)).
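The coefficient in part (c-i) is a one-line computation. A minimal sketch, with `pearson_skewness` as a hypothetical helper name:

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


skew = pearson_skewness(mean=5.12, median=4.885, sd=0.743)
print(f"Pearson's coefficient of skewness = {skew:.3f}")
```

A positive value is consistent with the right skew noted in part (b-i), since the mean exceeds the median.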
(d)
(i) Based on the graph, the point (\(0.949, 20\)) falls into the region labeled “The distribution of sample data is considered strongly skewed.”
(ii) No, the normality condition is not satisfied. The sample size (\(n=20\)) is not greater than or equal to \(30\), and the analysis in part (d-i) shows that the distribution of the sample data is strongly skewed.
