AP Statistics 2024 FRQ

SECTION II
Part A

Question 1

A large exercise center has several thousand members from age \(18\) to \(55\) years and several thousand members age \(56\) and older. The manager of the center is considering offering online fitness classes. The manager is investigating whether members’ opinions of taking online fitness classes differ by age. The manager selected a random sample of \(170\) exercise center members ages \(18\) to \(55\) years and a second random sample of \(230\) exercise center members ages \(56\) years and older. Each sampled member was asked whether they would be interested in taking online fitness classes. The manager found that \(51\) of the \(170\) sampled members ages \(18\) to \(55\) years and that \(79\) of the \(230\) sampled members ages \(56\) years and older said they would be interested in taking online fitness classes.

At a significance level of \(\alpha=0.05\), do the data provide convincing statistical evidence of a difference in the proportion of all exercise center members ages \(18\) to \(55\) years who would be interested in taking online fitness classes and the proportion of all exercise center members ages \(56\) years and older who would be interested in taking online fitness classes? Complete the appropriate inference procedure to justify your response.

Most-appropriate topic codes (CED):

TOPIC 6.10: Setting Up a Test for the Difference of Two Population Proportions
TOPIC 6.11: Carrying Out a Test for the Difference of Two Population Proportions
▶️ Answer/Explanation
Detailed solution

State:
We will perform a two-sample z-test for a difference in proportions. Let \(p_{younger}\) be the true proportion of members ages \(18\) to \(55\) interested in online classes, and \(p_{older}\) be the true proportion for members ages \(56\) and older. The significance level is \(\alpha=0.05\).
The hypotheses are:
\(H_0: p_{younger} - p_{older} = 0\)
\(H_a: p_{younger} - p_{older} \ne 0\)

Plan:
We check the conditions for inference.
1. Random: The data come from two independent random samples.
2. Independence (10% condition): The exercise center has several thousand members in each age group, so it is reasonable to assume that \(170\) and \(230\) are each less than \(10\%\) of all members in their respective age groups.
3. Normality (Large Counts):
– \(\hat{p}_{younger} = \frac{51}{170} = 0.3\). Successes = \(51\), Failures = \(119\). Both are \(\ge 10\).
– \(\hat{p}_{older} = \frac{79}{230} \approx 0.343\). Successes = \(79\), Failures = \(151\). Both are \(\ge 10\).
All conditions are met.

Do:
First, calculate the pooled proportion, \(\hat{p}_c\):
\(\hat{p}_c = \frac{51+79}{170+230} = \frac{130}{400} = 0.325\)

Next, calculate the z-test statistic:
\(z = \frac{(\hat{p}_{younger} - \hat{p}_{older}) - 0}{\sqrt{\hat{p}_c(1-\hat{p}_c)(\frac{1}{n_{younger}} + \frac{1}{n_{older}})}}\)
\(z = \frac{(0.3 - 0.3435)}{\sqrt{(0.325)(0.675)(\frac{1}{170} + \frac{1}{230})}} \approx -0.918\)

Finally, find the two-sided p-value:
p-value = \(2 \times P(Z < -0.918) \approx 0.359\)
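
The arithmetic in the Do step can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `two_prop_z_test` is an illustrative choice, not part of the exam or its scoring guidelines.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test of H0: p1 - p2 = 0 against a two-sided alternative."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled (combined) proportion, used because H0 assumes p1 = p2.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value: twice the lower-tail area beyond -|z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = two_prop_z_test(51, 170, 79, 230)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")  # z = -0.918, p-value = 0.359
```

Working from the raw counts avoids the small rounding differences that appear when \(\hat{p}_{older}\) is rounded before it is substituted.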

Conclude:
Since the p-value (\(0.359\)) is greater than the significance level (\(\alpha = 0.05\)), we fail to reject the null hypothesis.

There is not convincing statistical evidence to conclude that there is a difference in the proportion of all younger members and all older members at this exercise center who would be interested in taking online fitness classes.

Question 2

A local elementary school decided to sell bottles printed with the school district’s logo as a fund-raiser. The students in the elementary school were asked to sell bottles in three different sizes (small, medium, and large). The relative frequencies of the number of bottles sold for each size by the elementary school were \(0.5\) for small bottles, \(0.3\) for medium bottles, and \(0.2\) for large bottles.

A local middle school also decided to sell bottles as a fund-raiser, using the same three sizes (small, medium, and large). The middle school students sold three times the number of bottles that the elementary school students sold. For the middle school students, the proportion of bottles sold was equal for all three sizes.

(a) Complete the segmented bar graphs representing the relative frequencies of the number of bottles sold for each size by students at each school.
(b) An administrator at the elementary school concluded that the elementary school students sold more small bottles than the middle school students did. Is the elementary school administrator’s conclusion correct? Explain your response.
Two high schools are also selling the bottles and are competing to see which one sold more large bottles.
(c) A mosaic plot for the distribution of the number of bottles sold by each of the high schools is shown here.
(i) Which of the two high schools sold a greater proportion of large bottles? Justify your answer.
(ii) Which of the two high schools sold a greater number of large bottles? Justify your answer.

Most-appropriate topic codes (CED):

TOPIC 1.4: Representing a Categorical Variable with Graphs
TOPIC 2.2: Representing Two Categorical Variables
TOPIC 2.3: Statistics for Two Categorical Variables
▶️ Answer/Explanation
Detailed solution

(a)
The completed segmented bar graphs have the following segment boundaries.

– For the Elementary School, the segments are partitioned at \(0.5\) (for Small), \(0.8\) (for Medium, \(0.5+0.3\)), and \(1.0\) (for Large, \(0.8+0.2\)).
– For the Middle School, the proportions are equal for all three sizes, so each size represents \(\frac{1}{3}\) of the total. The segments are partitioned at \(\frac{1}{3} \approx 0.33\) and \(\frac{2}{3} \approx 0.67\).

(b)
No, the elementary school administrator’s conclusion is incorrect.

Explanation:
Although the proportion of small bottles sold by the elementary school (\(0.5\)) is greater than the proportion sold by the middle school (\(\approx 0.33\)), the middle school sold three times as many total bottles. Let \(N\) be the total number of bottles sold by the elementary school. Then the middle school sold \(3N\) bottles.
– Number of small bottles sold by Elementary: \(0.5 \times N\)
– Number of small bottles sold by Middle: \(\frac{1}{3} \times (3N) = N\)
Since \(N > 0.5N\), the middle school sold more small bottles.
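
The comparison does not depend on the actual value of \(N\). The sketch below tries a few hypothetical totals (the values 60, 300, and 1200 are assumptions for illustration and do not come from the problem) and uses exact fractions to avoid rounding \(\frac{1}{3}\).

```python
from fractions import Fraction


def small_bottle_counts(elementary_total):
    """Return (elementary, middle) counts of small bottles for a given elementary total N."""
    middle_total = 3 * elementary_total               # middle school sold three times as many
    elementary_small = Fraction(1, 2) * elementary_total
    middle_small = Fraction(1, 3) * middle_total      # equal share for each of three sizes
    return elementary_small, middle_small


for n in (60, 300, 1200):  # hypothetical totals, not from the problem
    elem, mid = small_bottle_counts(n)
    print(f"N = {n}: elementary = {elem}, middle = {mid}, middle sold more: {mid > elem}")
```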

(c)
(i) High School A sold a greater proportion of large bottles.
Justification: The proportion is represented by the height of the segment. The segment for “Large bottles” is taller for High School A (from \(0.7\) to \(1.0\), a height of \(0.3\)) than for High School B (from \(0.6\) to \(0.8\), a height of \(0.2\)).

(ii) High School B sold a greater number of large bottles.
Justification: The number of bottles is represented by the area of the rectangle (height \(\times\) width). The rectangle for large bottles at High School B is visibly wider than the rectangle for High School A. Although it is shorter, its greater width gives it a larger overall area, representing a greater number of large bottles sold.

Question 3

A car maker produces four different models of cars: A, B, C, and D. A group of researchers is investigating which model of car has the longest distance traveled per gallon of gas (mileage). Higher mileage is considered better than lower mileage. The researchers will conduct a study in which they contact several owners of each model of car and ask them to estimate their mileage.
(a) Is this an observational study or an experiment? Justify your answer in context.
Model D has an autopilot feature, in which the car controls its own motion with human supervision. James owns a Model D car and will investigate whether using the autopilot feature results in higher mileage than not using the autopilot. James will drive his car on \(70\) different days to and from work, using the same route at the same time each day. James will record the mileage each day.
(b) James will use a completely randomized design to conduct his investigation. Describe an appropriate method James could use to randomly assign the two treatments, driving using the autopilot feature and driving without using the autopilot feature, to \(35\) days each.
(c) After the investigation was completed, James verified that the conditions for inference were met and conducted a hypothesis test. He discovered the mean mileage when using the autopilot feature was significantly higher than the mean mileage when not using the autopilot feature. James is a member of a Model D club with thousands of members who all drive Model D cars. He will give a presentation at a Model D club members’ meeting later this year and would like to state that the results of his hypothesis test apply to all Model D cars in his club. Another member of the club who is a statistician tells James his findings do not apply to all Model D cars in the club. What change would James need to make to his original study to be able to generalize to all Model D cars in the club?

Most-appropriate topic codes (CED):

TOPIC 3.2: Introduction to Planning a Study
TOPIC 3.6: Selecting an Experimental Design
TOPIC 3.3: Random Sampling and Data Collection
▶️ Answer/Explanation
Detailed solution

(a)
This is an observational study. The researchers are collecting data by asking car owners to estimate their mileage without imposing any treatments. The owners are not randomly assigned a car model to drive.

(b)
James can number the days of the experiment from \(1\) to \(70\). Then, using a random number generator, he can generate \(35\) unique integers from \(1\) to \(70\). The days corresponding to these \(35\) numbers will be assigned to the “drive with autopilot” treatment. The remaining \(35\) days will be assigned to the “drive without autopilot” treatment.
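
The assignment procedure described above could be carried out as in the following sketch. The function name `assign_treatments` and the seed value are hypothetical; `random.sample` draws without replacement, which guarantees \(35\) unique day numbers.

```python
import random


def assign_treatments(n_days=70, n_autopilot=35, seed=None):
    """Randomly split days 1..n_days into an autopilot group and a no-autopilot group."""
    rng = random.Random(seed)
    days = list(range(1, n_days + 1))
    # Sampling without replacement yields unique day numbers.
    autopilot_days = sorted(rng.sample(days, n_autopilot))
    chosen = set(autopilot_days)
    manual_days = [d for d in days if d not in chosen]
    return autopilot_days, manual_days


# The seed is fixed here only so the example is reproducible.
autopilot_days, manual_days = assign_treatments(seed=2024)
print("Autopilot days:   ", autopilot_days)
print("No-autopilot days:", manual_days)
```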

(c)
James’s experiment only used his own car, so the results can only be generalized to his specific car and driving conditions. To generalize the findings to all Model D cars in his club, he would need to conduct a new study. In the new study, he must randomly select a sample of Model D cars from the club’s members. He would then need to carry out a similar experiment using the randomly selected cars.

Question 4

In an online game, players move through a virtual world collecting geodes, a type of hollow rock. When broken open, these geodes contain crystals of different colors that are useful in the game. A red crystal is the most useful crystal in the game. The color of the crystal in each geode is independent and the probability that a geode contains a red crystal is \(0.08\).
(a) Sarah, a player, will collect and open geodes until a red crystal is found.
(i) Calculate the mean of the distribution of the number of geodes Sarah will open until a red crystal is found. Show your work.
(ii) Calculate the standard deviation of the distribution of the number of geodes Sarah will open until a red crystal is found. Show your work.

(b) Another player, Conrad, decides to play the game and will stop opening geodes after finding a red crystal or when \(4\) geodes have been opened, whichever comes first. Let \(Y=\) the number of geodes Conrad will open. The table shows the partially completed probability distribution for the random variable Y.

| Number of geodes Conrad will open, y | \(1\) | \(2\) | \(3\) | \(4\) |
| --- | --- | --- | --- | --- |
| Probability, P(Y=y) | \(0.08\) | \(0.0736\) | | |

(i) Calculate \(P(Y=3)\). Show your work.
(ii) Calculate \(P(Y=4)\). Show your work.

(c) Consider the table and your results from part (b).
(i) Calculate the mean of the distribution of the number of geodes Conrad will open. Show your work.
(ii) Interpret the mean of the distribution of the number of geodes Conrad will open, which was calculated in part (c-i).

Most-appropriate topic codes (CED):

TOPIC 4.12: The Geometric Distribution
TOPIC 4.7: Introduction to Random Variables and Probability Distributions
TOPIC 4.8: Mean and Standard Deviation of Random Variables
▶️ Answer/Explanation
Detailed solution

(a)
The number of geodes Sarah opens, G, follows a geometric distribution with probability of success \(p=0.08\).
(i) The mean of a geometric distribution is \(\mu = \frac{1}{p}\).
\(\mu = \frac{1}{0.08} = 12.5\) geodes.
(ii) The standard deviation of a geometric distribution is \(\sigma = \frac{\sqrt{1-p}}{p}\).
\(\sigma = \frac{\sqrt{1-0.08}}{0.08} = \frac{\sqrt{0.92}}{0.08} \approx 11.99\) geodes.
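
As a quick check of the two formulas, here is a minimal sketch; the helper name `geometric_mean_sd` is illustrative.

```python
from math import sqrt


def geometric_mean_sd(p):
    """Mean and standard deviation of the number of trials until the first success."""
    mean = 1 / p
    sd = sqrt(1 - p) / p
    return mean, sd


mean, sd = geometric_mean_sd(0.08)
print(f"mean = {mean:.1f}, sd = {sd:.2f}")  # mean = 12.5, sd = 11.99
```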

(b)
(i) \(P(Y=3)\) is the probability that the first red crystal is found on the third geode. This means the first two are not red and the third is red.
\(P(Y=3) = (1-0.08)^2 (0.08) = (0.92)^2 (0.08) \approx 0.0677\).
(ii) Conrad opens \(4\) geodes exactly when none of the first three geodes contains a red crystal. The outcome of the fourth geode does not matter, because he stops after it either way. One way to calculate this probability is the complement rule, since the probabilities for \(Y=1, 2, 3, 4\) must sum to \(1\). Equivalently, \(P(Y=4) = (0.92)^3 \approx 0.7787\).
\(P(Y=4) = 1 - [P(Y=1) + P(Y=2) + P(Y=3)]\)
\(P(Y=4) \approx 1 - (0.08 + 0.0736 + 0.0677) = 1 - 0.2213 = 0.7787\).
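
The full distribution of \(Y\) can be built in a few lines, which also confirms that the probabilities sum to \(1\). This is a sketch; `stopped_geometric_pmf` is a hypothetical helper name.

```python
def stopped_geometric_pmf(p, max_trials):
    """PMF of the number of trials when stopping at the first success or at max_trials."""
    q = 1 - p
    # For y < max_trials: the first y - 1 trials fail and trial y succeeds.
    pmf = {y: q ** (y - 1) * p for y in range(1, max_trials)}
    # Reaching max_trials only requires the first max_trials - 1 trials to fail.
    pmf[max_trials] = q ** (max_trials - 1)
    return pmf


pmf = stopped_geometric_pmf(0.08, 4)
for y, prob in pmf.items():
    print(f"P(Y={y}) = {prob:.4f}")
print(f"total = {sum(pmf.values()):.4f}")
```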

(c)
(i) The mean of the discrete random variable Y is calculated as \(E(Y) = \sum y \cdot P(Y=y)\).
\(E(Y) \approx (1)(0.08) + (2)(0.0736) + (3)(0.0677) + (4)(0.7787)\)
\(E(Y) \approx 0.08 + 0.1472 + 0.2031 + 3.1148 \approx 3.545\) geodes.
(ii) The mean of approximately \(3.545\) geodes is the long-run average number of geodes Conrad would open per attempt if he were to repeat this process many, many times.
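
The long-run interpretation can be illustrated by simulation. The sketch below compares the exact mean with the average over many simulated rounds; the constant and function names, the number of rounds, and the seed are all assumptions made for this example.

```python
import random

P_RED = 0.08      # probability a geode contains a red crystal (from the problem)
MAX_GEODES = 4    # Conrad stops after at most 4 geodes


def exact_mean(p=P_RED, max_trials=MAX_GEODES):
    """E(Y) = sum of y * P(Y = y) for the stopped process."""
    q = 1 - p
    mean = sum(y * q ** (y - 1) * p for y in range(1, max_trials))
    return mean + max_trials * q ** (max_trials - 1)


def simulate_one(rng, p=P_RED, max_trials=MAX_GEODES):
    """Open geodes until a red crystal appears or max_trials geodes are opened."""
    for opened in range(1, max_trials + 1):
        if rng.random() < p:
            return opened
    return max_trials


def simulated_mean(n_rounds=100_000, seed=1):
    rng = random.Random(seed)
    return sum(simulate_one(rng) for _ in range(n_rounds)) / n_rounds


print(f"exact mean     = {exact_mean():.3f}")
print(f"simulated mean = {simulated_mean():.3f}")
```

With a large number of rounds the simulated average should settle close to the exact value, which is what "long-run average" means here.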

Question 5

Michelle is at a national baseball card collector’s convention with approximately \(20,000\) attendees. She notices that some collectors have both regular cards, which are easily obtained, and rare cards, which are harder to obtain. Michelle believes that there is a relationship between the number of months a collector has been collecting baseball cards and whether the majority of the cards (cards appearing more often) in their collection are regular or rare. She obtains information from a random sample of \(500\) baseball card collectors at the convention and records how many full months they have been collecting baseball cards and whether the majority of the cards in their card collection are regular or rare. Her results are displayed in a two-way table.
Majority Type of Baseball Cards and Months of Collecting Baseball Cards
| | Fewer Than 6 Months | 6-10 Months | 11-15 Months | 16-20 Months | 21 or More Months | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Has a Majority of Regular Baseball Cards | \(80\) | \(84\) | \(71\) | \(76\) | \(112\) | \(423\) |
| Has a Majority of Rare Baseball Cards | \(11\) | \(16\) | \(9\) | \(6\) | \(35\) | \(77\) |
| Total | \(91\) | \(100\) | \(80\) | \(82\) | \(147\) | \(500\) |
(a) If one collector from the sample is selected at random, what is the probability that the collector has been collecting baseball cards for \(11\) or more months and has a majority of regular baseball cards? Show your work.
(b) Given that a randomly selected collector from the sample has been collecting baseball cards for fewer than \(6\) months, what is the probability the collector has a majority of regular baseball cards? Show your work.
(c) Michelle believes there is a relationship between the number of months spent collecting baseball cards and which type of card is the majority in the collection (regular or rare).
(i) Name the hypothesis test Michelle should use to investigate her belief. Do not perform the hypothesis test.
(ii) State the appropriate null and alternative hypotheses for the hypothesis test you identified in (c-i). Do not perform the hypothesis test.
(d) After completing the hypothesis test described in part (c), Michelle obtains a p-value of \(0.0075\). Assuming the conditions for inference are met, what conclusion should Michelle make about her belief? Justify your response.

Most-appropriate topic codes (CED):

TOPIC 4.3: Introduction to Probability
TOPIC 4.5: Conditional Probability
TOPIC 8.5: Setting Up a Chi-Square Test for Homogeneity or Independence
TOPIC 8.6: Carrying Out a Chi-Square Test for Homogeneity or Independence
▶️ Answer/Explanation
Detailed solution

(a)
The number of collectors who have been collecting for \(11\) or more months and have a majority of regular cards is the sum of the counts in the corresponding cells: \(71 + 76 + 112 = 259\).
The total number of collectors in the sample is \(500\).
The probability is \(\frac{259}{500} = 0.518\).

(b)
This is a conditional probability. The sample is restricted to the \(91\) collectors who have been collecting for fewer than \(6\) months. Of these, \(80\) have a majority of regular cards.
The probability is \(\frac{80}{91} \approx 0.879\).
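
Both probabilities in parts (a) and (b) come straight from the two-way table. A minimal sketch, with list names chosen for illustration:

```python
# Counts from the two-way table, ordered by months of collecting:
# fewer than 6, 6-10, 11-15, 16-20, 21 or more.
regular = [80, 84, 71, 76, 112]
rare = [11, 16, 9, 6, 35]

total = sum(regular) + sum(rare)

# (a) Joint probability: 11 or more months AND majority regular.
p_a = sum(regular[2:]) / total

# (b) Conditional probability: majority regular GIVEN fewer than 6 months.
p_b = regular[0] / (regular[0] + rare[0])

print(f"(a) {p_a:.3f}")  # (a) 0.518
print(f"(b) {p_b:.3f}")  # (b) 0.879
```

The difference between the two parts is the denominator: part (a) divides by the whole sample, while part (b) divides only by the column for the given condition.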

(c)
(i) The appropriate test is a chi-square test for independence (or association).
(ii) The hypotheses are:
\(H_0\): There is no association between the number of months spent collecting baseball cards and the majority type of card in the collection for all baseball card collectors at the convention.
\(H_a\): There is an association between the number of months spent collecting baseball cards and the majority type of card in the collection for all baseball card collectors at the convention.

(d)
Since the p-value of \(0.0075\) is less than any common significance level (e.g., \(\alpha=0.05\)), Michelle should reject the null hypothesis.

There is convincing statistical evidence of an association between the number of months spent collecting baseball cards and the majority type of card in the collection for all baseball card collectors at the convention.
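
Although the question does not ask for the test to be carried out, the stated p-value can be reproduced from the two-way table. The sketch below uses only the standard library; it relies on the closed form for the upper-tail area of a chi-square distribution with \(4\) degrees of freedom, \(e^{-x/2}(1 + x/2)\), so the helper `chi_square_sf_df4` applies to this table size only. All names are illustrative.

```python
from math import exp

# Observed counts: rows are majority regular / majority rare,
# columns are the five months-of-collecting categories.
observed = [
    [80, 84, 71, 76, 112],
    [11, 16, 9, 6, 35],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


def chi_square_sf_df4(x):
    """Upper-tail probability for a chi-square distribution with exactly 4 df."""
    return exp(-x / 2) * (1 + x / 2)


chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi_square_sf_df4(chi2)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```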

SECTION II
Part B

Question 6

A company sells a certain type of whistle. The price of the whistle varies from store to store. Julio, a statistician at the company, wants to estimate the mean price, in dollars (\(\$\)), of this type of whistle at all stores that sell the whistle.
(a) (i) Identify the appropriate inference procedure for Julio to use.
(ii) Describe the parameter for the inference procedure you identified in part (a-i) in context.
Julio called the managers of \(20\) randomly selected stores that sell the whistle and recorded the price of the whistle at each store. Following is a dotplot of Julio’s data.
The summary statistics for Julio’s data are shown in the following table.
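| Sample Size | Mean | Std Dev | Min | \(Q_1\) | Median | \(Q_3\) | Max |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(20\) | \(5.12\) | \(0.743\) | \(4.25\) | \(4.51\) | \(4.885\) | \(5.475\) | \(6.58\) |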
(b) Julio wants to examine some characteristics of the distribution of the sample of whistle prices.
(i) Describe the shape of the distribution of the sample of whistle prices. Justify your response using appropriate values from the summary statistics table.
(ii) Using the \(1.5 \times IQR\) rule, determine whether there are any outliers in the sample of whistle prices. Justify your response.
It can often be difficult to determine whether the distribution of sample data is skewed by looking at a graph of the data and the summary statistics, particularly when the sample size is small. Thus, statisticians sometimes measure how skewed a data set is. One such measure is Pearson’s coefficient of skewness, which is calculated using the following formula.
\[ \text{Pearson’s Coefficient of Skewness} \;=\; \frac{3\!\left(\bar{x}-m\right)}{s} \]
In the formula, \(\bar{x}\) is the sample mean, \(m\) is the sample median, and \(s\) is the sample standard deviation.
(c) (i) Calculate Pearson’s coefficient of skewness for Julio’s sample of \(20\) whistle prices. Show your work.
(ii) Indicate the value of the Pearson’s coefficient of skewness you calculated in part (c-i) for the appropriate sample size by marking it with an “X” on the preceding graph.
(d) Consider your work in part (c).
(i) What should you conclude about the shape of the distribution of the sample of whistle prices? Justify your response.
Julio’s inference procedure in part \((a\text{-}i)\) needs one of the following requirements to be satisfied to verify the normality condition.
  • The sample size is greater than or equal to \(30\).
  • If the sample size is less than \(30\), the distribution of the sample data is not strongly skewed and does not have outliers.
(ii) Using your response to (d-i) and the preceding requirements, is the normality condition satisfied for Julio’s data? Explain your response.

Most-appropriate topic codes (CED):

TOPIC 7.2: Constructing a Confidence Interval for a Population Mean
TOPIC 1.6: Describing the Distribution of a Quantitative Variable
TOPIC 1.7: Summary Statistics for a Quantitative Variable
▶️ Answer/Explanation
Detailed solution

(a)
(i) The appropriate inference procedure is a one-sample t-interval for a population mean.
(ii) The parameter of interest is \(\mu\), the true mean price, in dollars, of this type of whistle at all stores that sell the whistle.

(b)
(i) The distribution of the sample of whistle prices appears to be slightly skewed to the right. This is because the sample mean (\(5.12\)) is greater than the sample median (\(4.885\)).
(ii) First, calculate the IQR: \(IQR = Q_3 - Q_1 = 5.475 - 4.51 = 0.965\).
Next, calculate the outlier fences:
– Lower Fence: \(Q_1 - 1.5(IQR) = 4.51 - 1.5(0.965) = 3.0625\)
– Upper Fence: \(Q_3 + 1.5(IQR) = 5.475 + 1.5(0.965) = 6.9225\)
Since the minimum value (\(4.25\)) is greater than the lower fence and the maximum value (\(6.58\)) is less than the upper fence, there are **no outliers** in the sample.
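
The fence calculation can be sketched as follows. Because only summary statistics are available, it is enough to compare the minimum and maximum against the fences; `iqr_fences` is a hypothetical helper name.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = iqr_fences(4.51, 5.475)
minimum, maximum = 4.25, 6.58

# If the extremes are inside the fences, no data value can be outside them.
has_outliers = minimum < lower or maximum > upper
print(f"fences: ({lower:.4f}, {upper:.4f}), outliers present: {has_outliers}")
```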

(c)
(i) Pearson’s Coefficient of Skewness = \(\frac{3(\bar{x} - m)}{s} = \frac{3(5.12 - 4.885)}{0.743} \approx 0.949\).
(ii) An “X” is marked on the graph at the coordinate (\(0.949, 20\)).
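
The coefficient in part (c-i) follows directly from the formula given in the question. A minimal sketch, with an illustrative function name:

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


skew = pearson_skewness(mean=5.12, median=4.885, sd=0.743)
print(f"skewness = {skew:.3f}")  # skewness = 0.949
```

A positive value indicates that the mean exceeds the median, which is consistent with the right skew noted in part (b-i).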

(d)
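(i) The distribution of the sample of whistle prices should be considered strongly skewed (to the right, because the coefficient is positive). On the graph, the point (\(0.949, 20\)) falls into the region labeled “The distribution of sample data is considered strongly skewed.”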
(ii) No, the normality condition is not satisfied. The sample size (\(n=20\)) is not greater than or equal to \(30\), and the analysis in part (d-i) shows that the distribution of the sample data is strongly skewed.
