AP Statistics 2023 FRQ

Question 1

As part of a study on the chemistry of Alaskan streams, researchers took water samples from many streams with temperatures colder than \( 8^\circ \mathrm{C} \) and from many streams with temperatures warmer than \( 8^\circ \mathrm{C} \). For each sample, the researchers measured the dissolved oxygen concentration, in milligrams per liter \( (\mathrm{mg}/\ell) \).
(a) The researchers constructed the histogram shown for the dissolved oxygen concentration in streams from the sample with water temperatures colder than \( 8^\circ \mathrm{C} \). Based on the histogram, describe the distribution of dissolved oxygen concentration in streams with water temperatures colder than \( 8^\circ \mathrm{C} \).
(b) The researchers computed the summary statistics shown in the table for the dissolved oxygen concentration in streams from the sample with water temperatures warmer than \( 8^\circ \mathrm{C} \). Use the summary statistics to construct a box plot for the dissolved oxygen concentration in streams with water temperatures warmer than \( 8^\circ \mathrm{C} \). Do not indicate outliers.

\[
\begin{array}{ccccccc}
\text{Min} & Q_1 & \text{Median} & Q_3 & \text{Max} & \text{Mean} & \text{Std. Dev.} \\
\hline
2.10 & 4.39 & 5.43 & 6.12 & 13.45 & 5.54 & 1.64
\end{array}
\]

(c) The researchers believe that streams with higher dissolved oxygen concentration are generally healthier for wildlife. Which streams are generally healthier for wildlife, those with water temperature colder than \( 8^\circ \mathrm{C} \) or those with water temperature warmer than \( 8^\circ \mathrm{C} \)? Using characteristics of the distribution of dissolved oxygen concentration for temperatures colder than \( 8^\circ \mathrm{C} \) and characteristics of the distribution of dissolved oxygen concentration for temperatures warmer than \( 8^\circ \mathrm{C} \), justify your answer.

Most-appropriate topic codes (CED):

Topic 1.6 — Describing the Distribution of a Quantitative Variable — part (a)
Topic 1.8 — Graphical Representations of Summary Statistics (five-number summary & boxplots) — part (b)
Topic 1.9 — Comparing Distributions of a Quantitative Variable — part (c)
Answer/Explanation
Detailed solution

(a)
From the histogram for streams colder than \( 8^\circ \mathrm{C} \):
Shape: Unimodal and skewed left (longer tail toward lower values \( \approx 2\text{–}6 \,\mathrm{mg}/\ell \)).
Center: Median between \( 11 \) and \( 12 \,\mathrm{mg}/\ell \) (tallest bars around \( 10\text{–}12 \)).
Spread: Approximate range \( \approx 14 - 2 = 12 \,\mathrm{mg}/\ell \). Quartiles appear near \( Q_1 \in (10,11) \) and \( Q_3 \in (12,13) \), so \( \mathrm{IQR} \approx 2 \,\mathrm{mg}/\ell \).
Unusual features: Several potential low outliers in the \( 2\text{–}3 \), \( 4\text{–}5 \), and \( 5\text{–}6 \) bins because these are far below \( Q_1 - 1.5\,\mathrm{IQR} \approx 10 - 1.5(2) = 7 \,\mathrm{mg}/\ell \).
Therefore, the distribution is described as unimodal, skewed left, with median \( 11\text{–}12 \,\mathrm{mg}/\ell \), IQR \( \approx 2 \,\mathrm{mg}/\ell \), and possible low outliers.

(b)
Use the five-number summary to draw the boxplot for warmer than \( 8^\circ \mathrm{C} \):
• Minimum \( = 2.10 \), \( Q_1 = 4.39 \), Median \( = 5.43 \), \( Q_3 = 6.12 \), Maximum \( = 13.45 \).
• Compute \( \mathrm{IQR} = Q_3 - Q_1 = 6.12 - 4.39 = 1.73 \).
• Fences for outliers (not drawn, but for reference): lower \( Q_1 - 1.5\,\mathrm{IQR} = 4.39 - 1.5(1.73) = 1.795 \) (so \( 2.10 \) is not beyond the fence); upper \( Q_3 + 1.5\,\mathrm{IQR} = 6.12 + 1.5(1.73) = 8.715 \) (the maximum, \( 13.45 \), lies above this and would be flagged, but the instruction says not to indicate outliers).
• The box spans \( 4.39 \) to \( 6.12 \) with a median line at \( 5.43 \); whiskers extend to \( 2.10 \) (left) and \( 13.45 \) (right).
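The fence arithmetic above can be double-checked with a short Python sketch. The variable names are illustrative only; the numbers are the five-number summary from the table.

```python
# Five-number summary for the warmer-than-8°C sample (values from the table).
minimum, q1, median, q3, maximum = 2.10, 4.39, 5.43, 6.12, 13.45

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this would be flagged as outliers
upper_fence = q3 + 1.5 * iqr   # values above this would be flagged as outliers

print(f"IQR = {iqr:.2f}")
print(f"lower fence = {lower_fence:.3f}, upper fence = {upper_fence:.3f}")
print("minimum below lower fence:", minimum < lower_fence)
print("maximum above upper fence:", maximum > upper_fence)
```

Because the question says not to indicate outliers, the whiskers still run to the minimum and maximum even though the maximum lies beyond the upper fence.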

(c)
If higher dissolved oxygen implies healthier streams, then colder streams are generally healthier because:
Center comparison: Colder streams have a larger center (median between \( 11 \) and \( 12 \,\mathrm{mg}/\ell \)) than warmer streams (median \( 5.43 \,\mathrm{mg}/\ell \)).
Shape: Colder distribution is skewed left (most values high with a few small ones), whereas the warmer distribution is right-skewed given the very long upper whisker to \( 13.45 \).
Spread: Both show similar middle-spread (colder \( \mathrm{IQR} \approx 2 \,\mathrm{mg}/\ell \); warmer \( \mathrm{IQR} = 1.73 \,\mathrm{mg}/\ell \)), but this does not overturn the much higher center for colder streams.
Hence, using center (and acknowledging shape and spread), colder streams better satisfy the researchers’ criterion.

Question 2

A developer wants to know whether adding fibers to concrete used in paving driveways will reduce the severity of cracking, because any driveway with severe cracks will have to be repaired by the developer. The developer conducts a completely randomized experiment with \(60\) new homes that need driveways. Thirty of the driveways will be randomly assigned to receive concrete that contains fibers, and the other \(30\) driveways will receive concrete that does not contain fibers. After one year, the developer will record the severity of cracks in each driveway on a scale of \(0\) to \(10\), with \(0\) representing not cracked at all and \(10\) representing severely cracked.
(a) Based on the information provided about the developer’s experiment, identify each of the following.
• Experimental units
• Treatments
• Response variable
(b) Describe an appropriate method the developer could use to randomly assign concrete that contains fibers and concrete that does not contain fibers to the \(60\) driveways.
Suppose the developer finds that there is a statistically significant reduction in the mean severity of cracks in driveways using the concrete that contains fibers compared to the driveways using concrete that does not contain fibers.
(c) In terms of the developer’s conclusion, what is the benefit of randomly assigning the driveways to either the concrete that contains fibers or the concrete that does not contain fibers?

Most-appropriate topic codes (CED):

TOPIC 3.5: Introduction to Experimental Design
TOPIC 3.6: Selecting an Experimental Design
TOPIC 3.7: Inference and Experiments
Answer/Explanation
Detailed solution

(a)
Experimental units: The \(60\) individual driveways.
Treatments: Concrete with fibers and concrete without fibers.
Response variable: The severity of cracks recorded after one year on a scale of \(0\) to \(10\).

(b)
First, number each of the \(60\) driveways from \(1\) to \(60\). Then, use a random number generator to select \(30\) unique integers from \(1\) to \(60\). The driveways corresponding to these \(30\) numbers will be assigned to receive concrete with fibers. The remaining \(30\) driveways will receive concrete without fibers.
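The procedure just described could be carried out in software along the following lines. This is a minimal sketch: the function name `assign_treatments` and the seed value are hypothetical choices for illustration, and any method that picks 30 distinct labels at random without replacement would serve equally well.

```python
import random

def assign_treatments(n_units=60, n_treated=30, seed=None):
    """Randomly split driveways 1..n_units into two treatment groups."""
    rng = random.Random(seed)              # seed only makes the run repeatable
    units = list(range(1, n_units + 1))    # driveway labels 1 to 60
    fibers = sorted(rng.sample(units, n_treated))   # 30 unique labels, no repeats
    no_fibers = sorted(set(units) - set(fibers))    # the remaining 30 driveways
    return fibers, no_fibers

fibers, no_fibers = assign_treatments(seed=2023)
print("Concrete with fibers:   ", fibers)
print("Concrete without fibers:", no_fibers)
```

Sampling without replacement guarantees that no driveway is chosen twice, which matches the requirement of 30 unique integers.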

(c)
The benefit of randomly assigning the treatments is that it allows the developer to draw a cause-and-effect conclusion. Since the driveways were randomly assigned, it helps to ensure that the two treatment groups are roughly balanced on all other potential confounding variables at the beginning of the experiment. Therefore, if there is a statistically significant difference in the severity of cracks, the developer can conclude that it was the type of concrete (the treatment) that caused the difference.

Question 3

Bath fizzies are mineral tablets that dissolve and create bubbles when added to bathwater. In order to increase sales, the Fizzy Bath Company has produced a new line of bath fizzies that have a cash prize in every bath fizzy. Let the random variable, \( X \), represent the dollar value of the cash prize in a bath fizzy. The probability distribution of \( X \) is shown in the table.
\[
\begin{array}{l|cccccc}
\text{Cash prize, } x & \$1 & \$5 & \$10 & \$20 & \$50 & \$100 \\
\hline
\text{Probability of cash prize, } P(X = x) & P(X = \$1) & 0.2 & 0.05 & 0.05 & 0.01 & 0.01
\end{array}
\]
(a) Based on the probability distribution of \( X \), answer the following. Show your work.
(i) Calculate the proportion of bath fizzies that contain \( \$1 \).
(ii) Calculate the proportion of bath fizzies that contain at least \( \$10 \).
(b) Based on the probability distribution of \( X \), calculate the probability that a randomly selected bath fizzy contains \( \$100 \), given that it contains at least \( \$10 \). Show your work.
(c) Based on the probability distribution of \( X \), calculate and interpret the expected value of the distribution of the cash prize in the bath fizzies. Show your work.
(d) The Fizzy Bath Company would like to sell the bath fizzies in France, where the currency is euros. Suppose the conversion rate for dollars to euros is \( 1 \) dollar = \( 0.89 \) euros. Using your expected value from part (c), calculate the expected value, in euros, of the distribution of the cash prize in the bath fizzies. Show your work.

Most-appropriate topic codes (CED):

TOPIC 4.7: Introduction to Random Variables and Probability Distributions — parts (a), (b)
TOPIC 4.8: Mean and Standard Deviation of Random Variables — parts (c), (d)
TOPIC 4.5: Conditional Probability — part (b)
Answer/Explanation
Detailed solution

(a)
(i) The sum of all probabilities must equal \( 1 \). Therefore:
\( P(X = \$1) = 1 - (0.2 + 0.05 + 0.05 + 0.01 + 0.01) = 1 - 0.32 = 0.68 \)
The proportion of bath fizzies that contain \( \$1 \) is \( \boxed{0.68} \).

(ii) The proportion of bath fizzies that contain at least \( \$10 \) is:
\( P(X \geq \$10) = P(X = \$10) + P(X = \$20) + P(X = \$50) + P(X = \$100) \)
\( P(X \geq \$10) = 0.05 + 0.05 + 0.01 + 0.01 = 0.12 \)
The proportion is \( \boxed{0.12} \).

(b) Using the conditional probability formula:
\( P(X = \$100 \mid X \geq \$10) = \frac{P(X = \$100 \text{ and } X \geq \$10)}{P(X \geq \$10)} = \frac{P(X = \$100)}{P(X \geq \$10)} \)
\( P(X = \$100 \mid X \geq \$10) = \frac{0.01}{0.12} = \frac{1}{12} \approx 0.0833 \)
The probability is \( \boxed{\frac{1}{12}} \) or approximately \( \boxed{0.0833} \).
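As a check on parts (a) and (b), the same steps can be sketched in Python with exact fractions. The dictionary layout and variable names are illustrative; the probabilities are those given in the table.

```python
from fractions import Fraction

# Known probabilities from the table; P(X = $1) is the missing entry.
known = {
    5: Fraction(20, 100),
    10: Fraction(5, 100),
    20: Fraction(5, 100),
    50: Fraction(1, 100),
    100: Fraction(1, 100),
}

# (a)(i) The probabilities must sum to 1.
p_one = 1 - sum(known.values())
dist = {1: p_one, **known}

# (a)(ii) Add the probabilities of every prize of at least $10.
p_at_least_10 = sum(p for x, p in dist.items() if x >= 10)

# (b) Conditional probability: P(X = 100 | X >= 10).
p_100_given_10 = dist[100] / p_at_least_10

print(float(p_one), float(p_at_least_10), p_100_given_10)
```

Using `Fraction` avoids floating-point rounding, so the conditional probability comes out as an exact ratio.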

(c) The expected value is calculated as:
\( E(X) = \sum x \cdot P(X = x) \)
\( E(X) = 1(0.68) + 5(0.2) + 10(0.05) + 20(0.05) + 50(0.01) + 100(0.01) \)
\( E(X) = 0.68 + 1.00 + 0.50 + 1.00 + 0.50 + 1.00 = 4.68 \)
The expected value is \( \boxed{\$4.68} \).
Interpretation: If many bath fizzies are selected, the average cash prize per bath fizzy will be approximately \( \$4.68 \).

(d) Convert the expected value from dollars to euros:
\( E(X)_{\text{euros}} = 4.68 \times 0.89 = 4.1652 \)
Rounded to two decimal places, the expected value in euros is \( \boxed{4.17} \) euros.
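Parts (c) and (d) can be verified the same way. The sketch below assumes \( P(X = \$1) = 0.68 \) from part (a) and relies on the fact that multiplying a random variable by a constant multiplies its expected value by that constant; the names are illustrative.

```python
from fractions import Fraction

# Full distribution of X in dollars, with P(X = $1) = 0.68 from part (a).
dist = {
    1: Fraction(68, 100),
    5: Fraction(20, 100),
    10: Fraction(5, 100),
    20: Fraction(5, 100),
    50: Fraction(1, 100),
    100: Fraction(1, 100),
}

# (c) E(X) is the probability-weighted sum of the prize values.
expected_dollars = sum(x * p for x, p in dist.items())

# (d) E(0.89 X) = 0.89 E(X): the conversion rate scales the expected value.
euros_per_dollar = Fraction(89, 100)
expected_euros = euros_per_dollar * expected_dollars

print(f"E(X) = ${float(expected_dollars):.2f}")
print(f"E(X) in euros = {float(expected_euros):.4f}")
```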

Question 4

A medical researcher completed a study comparing an omega-3 fatty acids supplement to a placebo in the treatment of irritability in patients with a certain medical condition. Nineteen patients with the medical condition volunteered to participate in the study. The study was conducted using the following weekly schedule.
• Week 1: Each patient took a randomly assigned treatment, omega-3 supplement or placebo.
• Week 2: The patients did not take either the omega-3 supplement or the placebo. This was necessary to reduce the possibility of any carryover effect from the assigned treatment taken during week 1.
• Week 3: Each patient took the treatment, omega-3 supplement or placebo, that they did not take during week 1.
At the end of week 1 and week 3, each patient’s irritability was given a score on a scale of 0 to 10, with 0 representing no irritability and 10 representing the highest level of irritability.
For each patient, the two irritability scores and the difference in their scores (placebo minus omega-3) were recorded. The results are summarized in the table and boxplots.
\[
\begin{array}{l|ccc}
 & n & \text{Mean} & \text{Standard Deviation} \\
\hline
\text{Placebo} & 19 & 5.421 & 2.987 \\
\text{Omega-3} & 19 & 3.632 & 1.739 \\
\text{Difference (placebo minus omega-3)} & 19 & 1.789 & 2.485
\end{array}
\]
The researcher claims the omega-3 supplement will decrease the mean irritability score of all patients with the medical condition similar to the volunteers who participated in the study. Is there convincing statistical evidence to support the researcher’s claim at a significance level of \( \alpha = 0.05 \)? Complete the appropriate inference procedure to support your answer.

Most-appropriate topic codes (CED):

TOPIC 7.5: Carrying Out a Test for a Population Mean — sections 1-3
TOPIC 7.4: Setting Up a Test for a Population Mean — section 1
TOPIC 7.7: Justifying a Claim About a Population Mean Based on a Confidence Interval — alternative approach
Answer/Explanation
Detailed solution

State: We will conduct a paired t-test for a population mean difference.
Let \( \mu_d \) = true mean difference (placebo minus omega-3) in irritability scores for all patients with this medical condition.
\( H_0: \mu_d = 0 \)
\( H_a: \mu_d > 0 \)
\( \alpha = 0.05 \)

Plan: We verify the conditions:
Random: Treatments were randomly assigned to weeks for each patient.
10% Condition: Not needed since this is an experiment.
Normal/Large Sample: The sample size (\( n = 19 \)) is less than 30, but the boxplot of differences shows an approximately symmetric distribution with no outliers, so the sampling distribution of \( \bar{x}_d \) should be approximately normal.

Do: Test statistic:
\( t = \frac{\bar{x}_d - \mu_0}{s_d/\sqrt{n}} = \frac{1.789 - 0}{2.485/\sqrt{19}} \approx \frac{1.789}{0.570} \approx 3.138 \)
Degrees of freedom: \( df = 19 - 1 = 18 \)
p-value: \( P(t > 3.138) \approx 0.0028 \)

Conclude: Since p-value \( = 0.0028 < \alpha = 0.05 \), we reject \( H_0 \). There is convincing statistical evidence that the true mean difference (placebo minus omega-3) in irritability scores for all patients with this medical condition is greater than zero. This supports the researcher’s claim that the omega-3 supplement decreases the mean irritability score.
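The test statistic and p-value can be reproduced from the summary statistics alone. The sketch below is one possible check, not the method expected on the exam: in place of a t-table or calculator, it integrates the t density numerically with Simpson's rule, and the helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=10_000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # half the area lies above 0 by symmetry

# Summary statistics for the differences (placebo minus omega-3).
n, mean_d, sd_d = 19, 1.789, 2.485

standard_error = sd_d / math.sqrt(n)
t_stat = (mean_d - 0) / standard_error   # H0: mu_d = 0
df = n - 1
p_value = t_upper_tail(t_stat, df)       # one-sided, Ha: mu_d > 0

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

With the raw differences in hand, a statistics package's paired t-test would be the more usual route; only the summary table is available here.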

Question 5

Wildlife biologists are interested in the health of tule elk, a species of deer found in California. An important measurement of tule elk health is their weight. The weight of a tule elk is difficult to measure in the wild. However, chest circumference, which is believed to be related to the weight of a tule elk, can easily be measured from a safe distance using a harmless laser. A study was done to investigate whether chest circumference, in centimeters (cm), could be used to accurately estimate the weight, in kilograms (kg), of male tule elk. For the study, wildlife biologists captured \(30\) male tule elk, measured their chest circumference and weight, and then released the elk. The data for the \(30\) male tule elk are shown in the scatterplot.
(a) Describe the relationship between chest circumference and weight of male tule elk in context.
Following is the equation of the least-squares regression line relating chest circumference and weight for male tule elk.
predicted weight \( = -350.3 + 3.7455(\text{chest circumference})\)
(b) The weight of one male tule elk with a chest circumference of \(145.9\) cm is \(204.3\) kg.
(i) Using the equation of the least-squares regression line, calculate the predicted weight for this male tule elk. Show your work.
(ii) Calculate the residual for this male tule elk. Show your work.
The equation of the least-squares regression line relating chest circumference and weight for male tule elk is repeated here.
predicted weight \( = -350.3 + 3.7455(\text{chest circumference})\)
(c) Interpret the slope of the least-squares regression line in context.
(d) The sambar, another species of deer, is similar in size to the tule elk. The slope of the population regression line relating chest circumference and weight for all male sambars is \(4.5\) kilograms per centimeter. A wildlife biologist wants to determine whether the slope of the population regression line for male tule elk is different than that for male sambars. Let \(\beta\) represent the slope of the population regression line for male tule elk. The wildlife biologist conducted a test of the following hypotheses using the sample of \(30\) tule elk.
\[
\begin{aligned}
H_0 &: \beta = 4.5 \\
H_a &: \beta \ne 4.5
\end{aligned}
\]
The test statistic was calculated to be \(3.408\). Assume all conditions for inference were met.
(i) Determine the p-value of the test.
(ii) At a significance level of \(\alpha=0.05\), what conclusion should the wildlife biologist make regarding the slope of the population regression line for male tule elk? Justify your response.

Most-appropriate topic codes (CED):

TOPIC 2.4: Representing the Relationship Between Two Quantitative Variables
TOPIC 2.7: Residuals
TOPIC 2.8: Least Squares Regression
TOPIC 9.5: Carrying Out a Test for the Slope of a Regression Model
Answer/Explanation
Detailed solution

(a)
There is a strong, positive, and roughly linear relationship between the chest circumference and weight of male tule elk. There are no obvious outliers or influential points that deviate from the linear pattern.

(b)
(i) Predicted weight \( = -350.3 + 3.7455(145.9) \approx -350.3 + 546.47 \approx 196.17\) kg.
\(\boxed{\text{Predicted weight} \approx 196.17 \text{ kg}}\)
(ii) \( \text{Residual} = \text{Actual} - \text{Predicted} \)
Residual \( = 204.3 - 196.17 = 8.13\) kg.
\(\boxed{\text{Residual} \approx 8.13 \text{ kg}}\)
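The prediction and residual follow directly from the regression equation, as this short sketch shows. The function name `predicted_weight` is illustrative; the coefficients are those of the least-squares line given in the question.

```python
def predicted_weight(chest_cm):
    """Least-squares prediction of weight (kg) from chest circumference (cm)."""
    return -350.3 + 3.7455 * chest_cm

actual = 204.3                        # observed weight in kg
predicted = predicted_weight(145.9)   # predicted weight for this elk
residual = actual - predicted         # residual = actual - predicted

print(f"predicted = {predicted:.2f} kg, residual = {residual:.2f} kg")
```

A positive residual means the line underestimates the weight of this particular elk.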

(c)
For each additional centimeter of chest circumference, the predicted weight of a male tule elk increases by approximately \(3.7455\) kilograms.

(d)
(i) We need to find the p-value for a t-test statistic of \(3.408\) with degrees of freedom \(df = n-2 = 30-2 = 28\). Since the alternative hypothesis is two-sided (\(H_a: \beta \ne 4.5\)), the p-value is \(2 \times P(t_{28} > 3.408)\).
Using a t-table or calculator, \(P(t_{28} > 3.408) \approx 0.001\), so the p-value is approximately \(2(0.001) = 0.002\).
\(\boxed{\text{p-value} \approx 0.002}\)
(ii) Because the p-value (\(\approx 0.002\)) is less than the significance level (\(\alpha=0.05\)), the wildlife biologist should reject the null hypothesis. There is convincing statistical evidence that the slope of the population regression line for male tule elk is different from \(4.5\) kg/cm.
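The two-sided p-value in part (d) can be approximated without a table. As a sketch under stated assumptions, the code below integrates the t density with Simpson's rule (a numerical stand-in for a calculator's t-distribution function); the helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=10_000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

test_statistic = 3.408
df = 30 - 2                                     # n - 2 for a regression slope
p_value = 2 * t_upper_tail(abs(test_statistic), df)   # two-sided alternative
alpha = 0.05

print(f"p-value = {p_value:.4f}; reject H0: {p_value < alpha}")
```

Taking the absolute value of the test statistic before doubling the tail area keeps the calculation valid whichever sign the statistic carries.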

Section II
Part B

Question 6

A jewelry company uses a machine to apply a coating of gold on a certain style of necklace. The amount of gold applied to a necklace is approximately normally distributed. When the machine is working properly, the amount of gold applied to a necklace has a mean of \(300\) milligrams (mg) and standard deviation of \(5\) mg.
(a) A necklace is randomly selected from the necklaces produced by the machine. Assuming that the machine is working properly, calculate the probability that the amount of gold applied to the necklace is between \(296\) mg and \(304\) mg.
The jewelry company wants to make sure the machine is working properly. Each day, Cleo, a statistician at the jewelry company, will take a random sample of the necklaces produced that day. Each selected necklace will be melted down and the amount of the gold applied to that necklace will be determined. Because a necklace must be destroyed to determine the amount of gold that was applied, Cleo will use random samples of size \(n=2\) necklaces.
Cleo starts by considering the mean amount of gold being applied to the necklaces. After Cleo takes a random sample of \(n=2\) necklaces, she computes the sample mean amount of gold applied to the two necklaces.
(b) Suppose the machine is working properly with a population mean amount of gold being applied of \(300\) mg and a population standard deviation of \(5\) mg.
(i) Calculate the probability that the sample mean amount of gold applied to a random sample of \(n=2\) necklaces will be greater than \(303\) mg.
(ii) Suppose Cleo took a random sample of \(n=2\) necklaces that resulted in a sample mean amount of gold applied of \(303\) mg. Would that result indicate that the population mean amount of gold being applied by the machine is different from \(300\) mg? Justify your answer without performing an inference procedure.
Now, Cleo will consider the variation in the amount of gold the machine applies to the necklaces. Because of the small sample size, \(n=2\), Cleo will use the sample range of the data for the two randomly selected necklaces, rather than the sample standard deviation.
Cleo will investigate the behavior of the range for samples of size \(n=2\). She will simulate the sampling distribution of the range of the amount of gold applied to two randomly sampled necklaces. Cleo generates \(100,000\) random samples of size \(n=2\) independent values from a normal distribution with mean \(\mu=300\) and standard deviation \(\sigma=5\). The range is calculated for the two observations in each sample. The simulated sampling distribution of the range is shown in Graph I. This process is repeated using \(\sigma=8\), as shown in Graph II, and again using \(\sigma=12\), as shown in Graph III.
(c) Use the information in the graphs to complete the following.
(i) Describe the sampling distribution of the sample range for random samples of size \(n=2\) from a normal distribution with standard deviation \(\sigma=5\), as shown in Graph I.
(ii) Describe how the sampling distribution of the sample range for samples of size \(n=2\) changes as the value of the population standard deviation increases.
Recall that Cleo needs to consider both the mean and standard deviation of the amount of gold applied to necklaces to determine whether the machine is working properly. Suppose that one month later, Cleo is again checking the machine to make sure it is working properly. Cleo takes a random sample of \(2\) necklaces and calculates the sample mean amount of gold applied as \(303\) mg and the sample range as \(10\) mg.
(d) Recall that the machine is working properly if the amount of gold applied to the necklaces has a mean of \(300\) mg and standard deviation of \(5\) mg.
(i) Consider Cleo’s range of \(10\) mg from the sample of size \(n=2\). If the machine is working properly with a standard deviation of \(5\) mg, is a sample range of \(10\) mg unusual? Justify your answer.
(ii) Do Cleo’s sample mean of \(303\) mg and range of \(10\) mg indicate that the machine is not working properly? Explain your answer.

Most-appropriate topic codes (CED):

TOPIC 1.10 — The Normal Distribution: (a)
TOPIC 5.7 — Sampling Distributions for Sample Means: (b)(i), (b)(ii), (d)(ii)
TOPIC 5.1 — Introducing Statistics: Why Is My Sample Not Like Yours?: (c)(i), (c)(ii), (d)(i), (d)(ii)
Answer/Explanation
Detailed solution

(a)
We are looking for \(P(296 < X < 304)\) for a normal distribution with \(\mu=300\) and \(\sigma=5\).
– Z-score for \(296\): \(z = \frac{296-300}{5} = -0.8\)
– Z-score for \(304\): \(z = \frac{304-300}{5} = 0.8\)
\(P(-0.8 < Z < 0.8) = P(Z < 0.8) - P(Z < -0.8) \approx 0.7881 - 0.2119 = 0.5762\).
\(\boxed{P \approx 0.576}\)
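The normal probability in part (a) can be confirmed with the `NormalDist` class from Python's standard library. This is only a check on the table lookup, and the variable names are illustrative.

```python
from statistics import NormalDist

# Amount of gold per necklace when the machine works properly.
gold = NormalDist(mu=300, sigma=5)

# P(296 < X < 304) is the difference of two cumulative probabilities.
p_between = gold.cdf(304) - gold.cdf(296)

print(f"P(296 < X < 304) = {p_between:.4f}")
```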

(b)
(i) The sampling distribution of \(\bar{x}\) for \(n=2\) is normal with \(\mu_{\bar{x}}=300\) and \(\sigma_{\bar{x}} = \frac{5}{\sqrt{2}} \approx 3.536\).
We need \(P(\bar{x} > 303)\).
\(z = \frac{303-300}{3.536} \approx 0.848\).
\(P(Z > 0.848) \approx 0.198\).
(ii) No, this result would not provide convincing evidence. A sample mean of \(303\) mg is not unusual because the probability of observing a sample mean this far or farther from \(300\) mg (\(P(\bar{x} \ge 303)\) or \(P(\bar{x} \le 297)\)) is large (\(2 \times 0.198 = 0.396\)).
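For part (b), the same approach applies once the standard deviation is replaced by the standard deviation of the sample mean, \( \sigma/\sqrt{n} \). The sketch below uses illustrative names and assumes, as the question does, that the population is normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 300, 5, 2

# Sampling distribution of the sample mean for n = 2.
xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_greater = 1 - xbar.cdf(303)    # (b)(i): P(sample mean > 303)
p_two_sided = 2 * p_greater      # (b)(ii): at least 3 mg from 300 in either direction

print(f"P(xbar > 303) = {p_greater:.3f}")
print(f"P(|xbar - 300| >= 3) = {p_two_sided:.3f}")
```

A two-sided probability this large shows that a sample mean of 303 mg is well within ordinary sampling variability.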

(c)
(i) The sampling distribution of the sample range shown in Graph I is skewed to the right. The center is approximately \(6\) mg, and the values are spread from \(0\) mg to about \(25\) mg.
(ii) As the population standard deviation (\(\sigma\)) increases, the sampling distribution of the sample range becomes more spread out and its center (mean) increases.

(d)
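(i) No, a sample range of \(10\) mg is not unusual. According to Graph I (\(\sigma=5\)), a notable proportion of the distribution lies at or above a sample range of \(10\) mg (roughly \(16\%\)), so a value this large occurs fairly often by chance. As a check, for \(n=2\) the range is the absolute difference of two independent values, which has standard deviation \(5\sqrt{2} \approx 7.07\), so \(P(\text{range} \ge 10) = 2P(Z \ge 1.414) \approx 0.157\).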
(ii) No, Cleo’s results do not indicate the machine is not working properly. As shown in part (b), a sample mean of \(303\) mg is not unusual. As shown in part (d-i), a sample range of \(10\) mg is also not unusual. Since neither the sample mean nor the sample range is an unusual result, there is no convincing evidence that the machine is not working properly.
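Cleo's simulation in parts (c) and (d) can be imitated with a short script. This is a sketch under assumptions: the function name `simulate_ranges` and the seed are hypothetical, and the output approximates, rather than reproduces, Graphs I to III.

```python
import random

def simulate_ranges(sigma, n_samples=100_000, mu=300, seed=1):
    """Simulate the range of n = 2 independent normal values, n_samples times."""
    rng = random.Random(seed)
    # For two values the range is simply the absolute difference.
    return [abs(rng.gauss(mu, sigma) - rng.gauss(mu, sigma))
            for _ in range(n_samples)]

# Center of the simulated sampling distribution for each sigma in Graphs I to III.
mean_range = {}
for sigma in (5, 8, 12):
    ranges = simulate_ranges(sigma)
    mean_range[sigma] = sum(ranges) / len(ranges)
    print(f"sigma = {sigma:2d}: mean range = {mean_range[sigma]:.2f} mg")

# Proportion of simulated ranges at least 10 mg when sigma = 5.
ranges_5 = simulate_ranges(5)
prop_at_least_10 = sum(r >= 10 for r in ranges_5) / len(ranges_5)
print(f"P(range >= 10 | sigma = 5) is about {prop_at_least_10:.3f}")
```

The mean range grows with \(\sigma\), and a range of 10 mg or more turns up in a sizable share of samples when \(\sigma = 5\), consistent with the answers to (c)(ii) and (d)(i).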
