AP Statistics 2017 FRQ

Question 1

Researchers studying a pack of gray wolves in North America collected data on the length x, in meters, from nose to tip of tail, and the weight y, in kilograms, of the wolves. A scatterplot of weight versus length revealed a relationship between the two variables described as positive, linear, and strong.
(a) For the situation described above, explain what is meant by each of the following words.
(i) Positive:
(ii) Linear:
(iii) Strong:

The data collected from the wolves were used to create the least-squares equation \(\hat{y}=-16.46+35.02x.\)
(b) Interpret the meaning of the slope of the least-squares regression line in context.
(c) One wolf in the pack with a length of \(1.4\) meters had a residual of \(-9.67\) kilograms. What was the weight of the wolf?

Most-appropriate topic codes (CED):

TOPIC 2.4: Representing the Relationship Between Two Quantitative Variables
TOPIC 2.8: Least Squares Regression
TOPIC 2.7: Residuals
▶️ Answer/Explanation
Detailed solution

(a)
(i) Positive: This means that as the length of a wolf increases, its weight also tends to increase.
(ii) Linear: This means that the data points on the scatterplot tend to follow a straight-line pattern. For a constant increase in length, there is a roughly constant increase in weight.
(iii) Strong: This means that the data points are tightly clustered around the linear trend, indicating that the linear model is a good fit for the data.

(b)
The slope of \(35.02\) means that for each additional meter in a wolf’s length, the predicted weight of the wolf increases by approximately \(35.02\) kilograms.

(c)
First, calculate the predicted weight for the wolf using the regression equation.
Predicted weight (\(\hat{y}\)) \( = -16.46 + 35.02(1.4) = -16.46 + 49.028 = 32.568\) kg.

Next, use the formula for a residual: \(\text{Residual} = \text{Actual weight} - \text{Predicted weight}\).
\(-9.67 = \text{Actual weight} - 32.568\)
Actual weight \( = 32.568 - 9.67 = 22.898\) kg.
\(\boxed{\text{Actual weight} \approx 22.9 \text{ kg}}\)
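
The arithmetic in part (c) can be checked with a short Python sketch. The variable names are illustrative and not part of the original problem; the coefficients come from the regression equation given above.

```python
# Least-squares line from the problem: y_hat = -16.46 + 35.02 * x
intercept = -16.46   # kilograms
slope = 35.02        # kilograms per meter

length = 1.4         # meters
residual = -9.67     # kilograms (actual minus predicted)

predicted_weight = intercept + slope * length
# residual = actual - predicted, so actual = predicted + residual
actual_weight = predicted_weight + residual

print(round(predicted_weight, 3))  # 32.568
print(round(actual_weight, 3))     # 22.898
```

A negative residual means the wolf weighs less than the line predicts for its length, which matches the result: 22.9 kg is below the predicted 32.6 kg.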

Question 2

The manager of a local fast-food restaurant is concerned about customers who ask for a water cup when placing an order but fill the cup with a soft drink from the beverage fountain instead of filling the cup with water. The manager selected a random sample of \(80\) customers who asked for a water cup when placing an order and found that \(23\) of those customers filled the cup with a soft drink from the beverage fountain.
(a) Construct and interpret a \(95\) percent confidence interval for the proportion of all customers who, having asked for a water cup when placing an order, will fill the cup with a soft drink from the beverage fountain.
(b) The manager estimates that each customer who asks for a water cup but fills it with a soft drink costs the restaurant \(\$0.25\). Suppose that in the month of June \(3,000\) customers ask for a water cup when placing an order. Use the confidence interval constructed in part (a) to give an interval estimate for the cost to the restaurant for the month of June from the customers who ask for a water cup but fill the cup with a soft drink.

Most-appropriate topic codes (CED):

TOPIC 6.2: Constructing a Confidence Interval for a Population Proportion — part (a)
TOPIC 6.3: Justifying a Claim Based on a Confidence Interval for a Population Proportion — part (b)
▶️ Answer/Explanation
Detailed solution

(a)

State: We want to construct a \(95\%\) confidence interval for \(p\), the true proportion of all customers who ask for a water cup and fill it with a soft drink.

Plan: The appropriate procedure is a one-sample z-interval for a population proportion. We must check the conditions:
1. Random: The problem states that the manager selected a random sample of \(80\) customers. This condition is met.
2. Large Counts (for Normality): We must check that the numbers of successes and failures are both at least \(10\).

  • Number of successes = \(n\hat{p} = 23\). This is \( \ge 10 \).
  • Number of failures = \(n(1-\hat{p}) = 80 - 23 = 57\). This is \( \ge 10 \).

Since both conditions are met, we can proceed with constructing the interval.

Do: The sample proportion is \(\hat{p} = \frac{23}{80} = 0.2875\). For a \(95\%\) confidence level, the critical value is \(z^* = 1.96\).
The confidence interval is calculated as:
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
\[ 0.2875 \pm 1.96 \sqrt{\frac{0.2875(1-0.2875)}{80}} \]
\[ 0.2875 \pm 1.96 \sqrt{\frac{(0.2875)(0.7125)}{80}} \]
\[ 0.2875 \pm 1.96(0.0506) \]
\[ 0.2875 \pm 0.0992 \]
The interval is \((0.1883, 0.3867)\).

\(\boxed{(0.188, 0.387)}\)

Conclude: We are \(95\%\) confident that the interval from \(0.1883\) to \(0.3867\) captures the true proportion of all customers who, having asked for a water cup, will fill the cup with a soft drink from the beverage fountain.
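
As a check on the "Do" step, the interval can be recomputed with a minimal Python sketch. The variable names are assumptions made for illustration; the critical value 1.96 is the one used in the solution above.

```python
import math

successes = 23
n = 80
z_star = 1.96  # critical value for 95% confidence, as used in the solution

p_hat = successes / n
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * standard_error

lower = p_hat - margin
upper = p_hat + margin
print(f"({lower:.4f}, {upper:.4f})")  # (0.1883, 0.3867)
```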

(b)
To create an interval estimate for the total cost, we multiply the endpoints of the confidence interval for the proportion by the total number of customers who ask for a water cup (\(3,000\)) and the cost per customer (\(\$0.25\)).

Lower bound of cost estimate:
\[ 0.1883 \times 3{,}000 \times \$0.25 = \$141.225 \]
Upper bound of cost estimate:
\[ 0.3867 \times 3{,}000 \times \$0.25 = \$290.025 \]
Thus, a \(95\%\) confidence interval for the cost to the restaurant in June is from \(\$141.23\) to \(\$290.03\).

\(\boxed{(\$141.23, \$290.03)}\)
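
The scaling in part (b) is a linear transformation of the interval endpoints, which the following Python sketch illustrates. It assumes the rounded endpoints from part (a); the variable names are hypothetical.

```python
# Rounded endpoints of the 95% interval for the proportion, from part (a)
ci_lower, ci_upper = 0.1883, 0.3867

customers = 3000          # customers asking for a water cup in June
cost_per_customer = 0.25  # dollars lost per customer who takes a soft drink

# Multiply each endpoint by the same positive constant (3000 * 0.25 = 750)
cost_lower = ci_lower * customers * cost_per_customer
cost_upper = ci_upper * customers * cost_per_customer

print(f"{cost_lower:.3f} to {cost_upper:.3f} dollars")
```

Because both endpoints are multiplied by the same positive constant, the confidence level attached to the interval is unchanged.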

Question 3

A grocery store purchases melons from two distributors, J and K. Distributor J provides melons from organic farms. The distribution of the diameters of the melons from Distributor J is approximately normal with mean 133 millimeters (mm) and standard deviation 5 mm.
(a) For a melon selected at random from Distributor J, what is the probability that the melon will have a diameter greater than 137 mm?
Distributor K provides melons from nonorganic farms. The probability is 0.8413 that a melon selected at random from Distributor K will have a diameter greater than 137 mm. For all the melons at the grocery store, 70 percent of the melons are provided by Distributor J and 30 percent are provided by Distributor K.
(b) For a melon selected at random from the grocery store, what is the probability that the melon will have a diameter greater than 137 mm?
(c) Given that a melon selected at random from the grocery store has a diameter greater than 137 mm, what is the probability that the melon will be from Distributor J?

Most-appropriate topic codes (CED):

TOPIC 4.3: Introduction to Probability — parts (a), (b), (c)
TOPIC 4.5: Conditional Probability — part (c)
TOPIC 4.7: Introduction to Random Variables and Probability Distributions — part (a)
▶️ Answer/Explanation
Detailed solution

(a)
Let \( X \) represent the diameter of a melon from Distributor J.
\( X \sim N(133, 5) \)
We want \( P(X > 137) \).
\( z = \frac{137 - 133}{5} = \frac{4}{5} = 0.8 \)
\( P(z > 0.8) = 1 - P(z < 0.8) = 1 - 0.7881 = 0.2119 \)
The probability is \( \boxed{0.2119} \).
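
The table lookup in part (a) can be cross-checked with Python's standard library. This is a minimal sketch using `statistics.NormalDist` (available in Python 3.8 and later); the variable names are illustrative.

```python
from statistics import NormalDist

# Diameter of a Distributor J melon: approximately normal, mean 133 mm, SD 5 mm
diameter_j = NormalDist(mu=133, sigma=5)

# P(X > 137) is the complement of the cumulative probability at 137
p_greater_137 = 1 - diameter_j.cdf(137)
print(round(p_greater_137, 4))  # 0.2119
```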

(b)
Let \( G \) be the event that a melon has diameter greater than 137 mm.
From part (a): \( P(G \mid J) = 0.2119 \)
Given: \( P(G \mid K) = 0.8413 \)
\( P(J) = 0.7 \), \( P(K) = 0.3 \)
Using the law of total probability:
\( P(G) = P(G \mid J)P(J) + P(G \mid K)P(K) \)
\( P(G) = (0.2119)(0.7) + (0.8413)(0.3) = 0.14833 + 0.25239 = 0.40072 \)
The probability is \( \boxed{0.4007} \).

Alternatively, using a tree diagram: \( P(G)=P(G \text{ and } J)+P(G \text{ and } K)=0.1483+0.2524=0.4007 \).

(c)
We want \( P(J \mid G) \).
Using Bayes’ theorem:
\( P(J \mid G) = \frac{P(G \mid J)P(J)}{P(G)} = \frac{(0.2119)(0.7)}{0.4007} = \frac{0.14833}{0.4007} \approx 0.370 \)
The probability is approximately \( \boxed{0.370} \).
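
Parts (b) and (c) chain together naturally, so a short Python sketch can verify both. It assumes the rounded value 0.2119 from part (a); the variable names are hypothetical.

```python
# Share of the store's melons from each distributor
p_j, p_k = 0.7, 0.3

# P(diameter > 137 mm | distributor)
p_g_given_j = 0.2119  # from part (a), rounded
p_g_given_k = 0.8413  # given in the problem

# Law of total probability
p_g = p_g_given_j * p_j + p_g_given_k * p_k

# Bayes' theorem
p_j_given_g = p_g_given_j * p_j / p_g

print(f"P(G)   = {p_g:.4f}")          # P(G)   = 0.4007
print(f"P(J|G) = {p_j_given_g:.3f}")  # P(J|G) = 0.370
```

Although 70 percent of the melons come from Distributor J, only about 37 percent of the large melons do, because large melons are much more common from Distributor K.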

Question 4

The chemicals in clay used to make pottery can differ depending on the geographical region where the clay originated. Sometimes, archaeologists use a chemical analysis of clay to help identify where a piece of pottery originated. Such an analysis measures the amount of a chemical in the clay as a percent of the total weight of the piece of pottery. The boxplots below summarize analyses done for three chemicals—X, Y, and Z—on pieces of pottery that originated at one of three sites: I, II, or III.
(a) For chemical Z, describe how the percents found in the pieces of pottery are similar and how they differ among the three sites.
(b) Consider a piece of pottery known to have originated at one of the three sites, but the actual site is not known.
(i) Suppose an analysis of the clay reveals that the sum of the percents of the three chemicals X, Y, and Z is 20.5%. Based on the boxplots, which site—I, II, or III—is the most likely site where the piece of pottery originated? Justify your choice.
(ii) Suppose only one chemical could be analyzed in the piece of pottery. Which chemical—X, Y, or Z—would be the most useful in identifying the site where the piece of pottery originated? Justify your choice.

Most-appropriate topic codes (CED):

TOPIC 1.9: Comparing Distributions of a Quantitative Variable — part (a)
TOPIC 1.6: Describing the Distribution of a Quantitative Variable — part (b-i)
TOPIC 2.3: Statistics for Two Categorical Variables — part (b-ii)
▶️ Answer/Explanation
Detailed solution

(a)
For chemical Z, the median percentage is similar across all three sites, approximately \( 7\% \) for each site. However, the variability differs substantially among the sites. Site II has the smallest range (approximately \( 6\% \) to \( 8\% \)), Site I has a moderate range (approximately \( 4\% \) to \( 10\% \)), and Site III has the largest range (approximately \( 3\% \) to \( 11\% \)).

(b)
(i) The piece most likely originated from \( \boxed{\text{Site III}} \).
Calculating the approximate sum ranges for each site:
• Site I: Minimum sum ≈ \( 6 + 11 + 4 = 21\% \), Maximum sum ≈ \( 8 + 15 + 10 = 33\% \)
• Site II: Minimum sum ≈ \( 5 + 1.9 + 6 = 12.9\% \), Maximum sum ≈ \( 7 + 4 + 8 = 19\% \)
• Site III: Minimum sum ≈ \( 5 + 6 + 3 = 14\% \), Maximum sum ≈ \( 7.5 + 8 + 11 = 26.5\% \)
Only Site III has a range (14% to 26.5%) that includes 20.5%.

Chemical Concentration Ranges (by Site)

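| Chemical | Site I Min | Site I Max | Site II Min | Site II Max | Site III Min | Site III Max |
|---|---|---|---|---|---|---|
| X | \(6\) | \(8\) | \(5\) | \(7\) | \(5\) | \(7.5\) |
| Y | \(11\) | \(15\) | \(1.9\) | \(4\) | \(6\) | \(8\) |
| Z | \(4\) | \(10\) | \(6\) | \(8\) | \(3\) | \(11\) |
| Sum | \(21\) | \(33\) | \(12.9\) | \(19\) | \(14\) | \(26.5\) |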
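
The range check in part (b)(i) can be sketched in Python. The minimum and maximum values below are the approximate readings from the boxplots used in this solution, not exact data, and the dictionary layout is an assumption made for illustration.

```python
# Approximate (min, max) percents read from the boxplots, by site and chemical
ranges = {
    "Site I":   {"X": (6, 8),   "Y": (11, 15), "Z": (4, 10)},
    "Site II":  {"X": (5, 7),   "Y": (1.9, 4), "Z": (6, 8)},
    "Site III": {"X": (5, 7.5), "Y": (6, 8),   "Z": (3, 11)},
}
observed_sum = 20.5

candidates = []
for site, chemicals in ranges.items():
    # Smallest and largest possible totals for X + Y + Z at this site
    min_sum = sum(low for low, _ in chemicals.values())
    max_sum = sum(high for _, high in chemicals.values())
    if min_sum <= observed_sum <= max_sum:
        candidates.append(site)

print(candidates)  # ['Site III']
```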

(ii) \( \boxed{\text{Chemical Y}} \) would be most useful for identifying the site.
Chemical Y shows the clearest distinction among the three sites with no overlap in the boxplots. Site I has high percentages (approximately 11% to 15%), Site II has low percentages (approximately 1.9% to 4%), and Site III has moderate percentages (approximately 6% to 8%). In contrast, chemicals X and Z show substantial overlap among the sites, making them less useful for distinguishing the origin.

Question 5

The table and the bar chart below summarize the age at diagnosis, in years, for a random sample of 207 men and women currently being treated for schizophrenia.
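| Age-Group (years) | 20 to 29 | 30 to 39 | 40 to 49 | 50 to 59 | Total |
|---|---|---|---|---|---|
| Women | 46 | 40 | 21 | 12 | 119 |
| Men | 53 | 23 | 9 | 3 | 88 |
| Total | 99 | 63 | 30 | 15 | 207 |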
Do the data provide convincing statistical evidence of an association between age-group and gender in the diagnosis of schizophrenia?

Most-appropriate topic codes (CED):

TOPIC 8.4: Expected Counts in Two-Way Tables — part of hypothesis test
TOPIC 8.5: Setting Up a Chi-Square Test for Homogeneity or Independence (hypotheses and conditions)
TOPIC 8.6: Carrying Out a Chi-Square Test for Homogeneity or Independence (test statistic, p-value, and conclusion)
▶️ Answer/Explanation
Detailed solution

We will conduct a chi-square test for independence using the four-step process.

State:
\( H_0 \): There is no association between age-group and gender in the diagnosis of schizophrenia.
\( H_a \): There is an association between age-group and gender in the diagnosis of schizophrenia.

Plan:
We will use a chi-square test for independence.
Conditions:
1. Random sample: Satisfied – stated in the problem
2. Expected counts: All expected counts should be at least 5

| Age-Group | 20-29 | 30-39 | 40-49 | 50-59 |
|---|---|---|---|---|
| Women (Expected) | 56.91 | 36.22 | 17.25 | 8.62 |
| Men (Expected) | 42.09 | 26.78 | 12.75 | 6.38 |

All expected counts are greater than 5, so conditions are satisfied.
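
Each expected count is (row total × column total) / grand total. The following Python sketch recomputes the table under that formula; the data structure and names are illustrative.

```python
# Observed counts by gender across the four age groups (20-29, ..., 50-59)
observed = {
    "Women": [46, 40, 21, 12],
    "Men":   [53, 23, 9, 3],
}

column_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(column_totals)

# Expected count = row total * column total / grand total
expected = {
    group: [sum(row) * col_total / grand_total for col_total in column_totals]
    for group, row in observed.items()
}

for group, row in expected.items():
    print(group, [round(e, 2) for e in row])
# Women [56.91, 36.22, 17.25, 8.62]
# Men [42.09, 26.78, 12.75, 6.38]

all_at_least_5 = all(e >= 5 for row in expected.values() for e in row)
print(all_at_least_5)  # True
```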

Do:
Test statistic: \( \chi^2 = \sum \frac{(O - E)^2}{E} \)
Calculations:
Women 20-29: \( \frac{(46 - 56.91)^2}{56.91} = 2.093 \)
Women 30-39: \( \frac{(40 - 36.22)^2}{36.22} = 0.395 \)
Women 40-49: \( \frac{(21 - 17.25)^2}{17.25} = 0.817 \)
Women 50-59: \( \frac{(12 - 8.62)^2}{8.62} = 1.322 \)
Men 20-29: \( \frac{(53 - 42.09)^2}{42.09} = 2.830 \)
Men 30-39: \( \frac{(23 - 26.78)^2}{26.78} = 0.534 \)
Men 40-49: \( \frac{(9 - 12.75)^2}{12.75} = 1.105 \)
Men 50-59: \( \frac{(3 - 6.38)^2}{6.38} = 1.788 \)
Total: \( \chi^2 = 2.093 + 0.395 + 0.817 + 1.322 + 2.830 + 0.534 + 1.105 + 1.788 = 10.884 \)
Degrees of freedom: \( (4 - 1) \times (2 - 1) = 3 \)
p-value: \( P(\chi^2 \geq 10.884) = 0.012 \)
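
The test statistic and p-value can be reproduced with the Python standard library alone. This is a sketch under one stated assumption: the helper `chi2_sf_df3` is a hypothetical name for a closed-form upper-tail probability that is valid only for 3 degrees of freedom. A statistics package would normally be used for general degrees of freedom.

```python
import math

observed = [
    [46, 40, 21, 12],  # Women
    [53, 23, 9, 3],    # Men
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Sum of (O - E)^2 / E over all eight cells, using unrounded expected counts
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / total
        chi_sq += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(chi_sq)
print(df, round(chi_sq, 2), round(p_value, 3))
```

The code uses unrounded expected counts, so its statistic can differ from the hand calculation in the third decimal place; the p-value still rounds to 0.012.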

Conclude:
Since the p-value (0.012) is less than \( \alpha = 0.05 \), we reject the null hypothesis. There is convincing statistical evidence of an association between age-group and gender in the diagnosis of schizophrenia.

Question 6

Consider an experiment in which two men and two women will be randomly assigned to either a treatment group or a control group in such a way that each group has two people. The people are identified as Man 1, Man 2, Woman 1, and Woman 2. The six possible arrangements are shown below.
Arrangement A
Treatment: Man 1, Man 2
Control: Woman 1, Woman 2
Arrangement B
Treatment: Man 1, Woman 1
Control: Man 2, Woman 2
Arrangement C
Treatment: Man 1, Woman 2
Control: Man 2, Woman 1
Arrangement D
Treatment: Woman 1, Woman 2
Control: Man 1, Man 2
Arrangement E
Treatment: Man 2, Woman 1
Control: Man 1, Woman 2
Arrangement F
Treatment: Man 2, Woman 2
Control: Man 1, Woman 1
Two possible methods of assignment are being considered: the sequential coin flip method, as described in part (a), and the chip method, as described in part (b). For each method, the order of the assignment will be Man 1, Man 2, Woman 1, Woman 2.
(a) For the sequential coin flip method, a fair coin is flipped until one group has two people. An outcome of tails assigns the person to the treatment group, and an outcome of heads assigns the person to the control group. As soon as one group has two people, the remaining people are automatically assigned to the other group.
(i) Complete the table below by calculating the probability of each arrangement occurring if the sequential coin flip method is used.
| Arrangement | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Probability | | | | | | |
(ii) For the sequential coin flip method, what is the probability that Man 1 and Man 2 are assigned to the same group?
(b) For the chip method, two chips are marked “treatment” and two chips are marked “control.” Each person selects one chip at random without replacement.
(i) Complete the table below by calculating the probability of each arrangement occurring if the chip method is used.
| Arrangement | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Probability | | | | | | |
(ii) For the chip method, what is the probability that Man 1 and Man 2 are assigned to the same group?
(c) Sixteen participants consisting of 10 students and 6 teachers at an elementary school will be used for an experiment to determine lunch preference for the school population of students and teachers. As the participants enter the school cafeteria for lunch, they will be randomly assigned to receive one of two lunches so that 8 will receive a salad, and 8 will receive a grilled cheese sandwich. The students will enter the cafeteria first, and the teachers will enter next. Which method, the sequential coin flip method or the chip method, should be used to assign the treatments? Justify your choice.

Most-appropriate topic codes (CED):

TOPIC 4.2: Estimating Probabilities Using Simulation — parts (a), (b)
TOPIC 4.3: Introduction to Probability — parts (a), (b)
TOPIC 3.6: Selecting an Experimental Design — part (c)
▶️ Answer/Explanation
Detailed solution

(a) Sequential Coin Flip Method

(i) The probabilities for each arrangement:

Arrangements and Probabilities for Coin Outcomes

| Arrangement | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Coin outcomes | TT | THT | THH | HH | HTT | HTH |
| Calculation | \(\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\!\left(\tfrac{1}{2}\right)\) |
| Probability | \(\tfrac{1}{4}\) | \(\tfrac{1}{8}\) | \(\tfrac{1}{8}\) | \(\tfrac{1}{4}\) | \(\tfrac{1}{8}\) | \(\tfrac{1}{8}\) |

Justification: Arrangements A and D occur when the first two people get the same assignment (TT or HH), with probability \( \left(\frac{1}{2}\right)^2 = \frac{1}{4} \) each. Arrangements B, C, E, and F occur when the first two people get different assignments, requiring a third flip to determine the final assignment, with probability \( \frac{1}{8} \) each.

(ii) Man 1 and Man 2 are assigned to the same group in Arrangements A and D.
Probability = \( P(A) + P(D) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2} \).
\( \boxed{\frac{1}{2}} \)
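
The coin flip probabilities can be confirmed by exact enumeration. The sketch below lists all eight equally likely sequences of three flips (three is the most that can ever be needed) and applies the sequential rule to each; flips left unused when a group fills early are ignored. The function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

PEOPLE = ["Man 1", "Man 2", "Woman 1", "Woman 2"]

def treatment_group(flips):
    """Return the treatment group produced by a sequence of coin flips."""
    treatment, control = [], []
    flips = iter(flips)
    for person in PEOPLE:
        if len(treatment) == 2:      # treatment full: the rest go to control
            control.append(person)
        elif len(control) == 2:      # control full: the rest go to treatment
            treatment.append(person)
        elif next(flips) == "T":     # tails assigns to treatment
            treatment.append(person)
        else:                        # heads assigns to control
            control.append(person)
    return tuple(treatment)

coin_probs = {}
for flips in product("TH", repeat=3):  # 8 equally likely sequences
    group = treatment_group(flips)
    coin_probs[group] = coin_probs.get(group, Fraction(0)) + Fraction(1, 8)

for group, prob in coin_probs.items():
    print(group, prob)

# Arrangements A and D: both men in treatment, or both men in control
men_together = coin_probs[("Man 1", "Man 2")] + coin_probs[("Woman 1", "Woman 2")]
print(men_together)  # 1/2
```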

(b) Chip Method

(i) The probabilities for each arrangement:

Arrangements and Probabilities for Chip Outcomes

| Arrangement | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Chip outcomes | TT | TCT | TCC | CC | CTT | CTC |
| Calculation | \(\left(\tfrac{2}{4}\right)\!\left(\tfrac{1}{3}\right)\) | \(\left(\tfrac{2}{4}\right)\!\left(\tfrac{2}{3}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{2}{4}\right)\!\left(\tfrac{2}{3}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{2}{4}\right)\!\left(\tfrac{1}{3}\right)\) | \(\left(\tfrac{2}{4}\right)\!\left(\tfrac{2}{3}\right)\!\left(\tfrac{1}{2}\right)\) | \(\left(\tfrac{2}{4}\right)\!\left(\tfrac{2}{3}\right)\!\left(\tfrac{1}{2}\right)\) |
| Probability | \(\tfrac{1}{6}\) | \(\tfrac{1}{6}\) | \(\tfrac{1}{6}\) | \(\tfrac{1}{6}\) | \(\tfrac{1}{6}\) | \(\tfrac{1}{6}\) |

Justification: There are \( \binom{4}{2} = 6 \) equally likely ways to choose which two people get the treatment group, and each arrangement corresponds to one of these choices.

(ii) Man 1 and Man 2 are assigned to the same group in Arrangements A and D.
Probability = \( P(A) + P(D) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3} \).
\( \boxed{\frac{1}{3}} \)
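
The chip method can be checked the same way. This sketch treats the four physical chips as distinct and enumerates all 24 equally likely ways to hand them out in the order Man 1, Man 2, Woman 1, Woman 2. The names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

PEOPLE = ["Man 1", "Man 2", "Woman 1", "Woman 2"]
CHIPS = ["T", "T", "C", "C"]  # two "treatment" chips and two "control" chips

# Each ordering of the four distinct chips is equally likely
orderings = list(permutations(range(len(CHIPS))))

chip_probs = {}
for order in orderings:
    # Person k receives the chip at position order[k]
    group = tuple(
        person for person, chip_index in zip(PEOPLE, order)
        if CHIPS[chip_index] == "T"
    )
    chip_probs[group] = chip_probs.get(group, Fraction(0)) + Fraction(1, len(orderings))

for group, prob in chip_probs.items():
    print(group, prob)

# Arrangements A and D: both men in treatment, or both men in control
men_together = chip_probs[("Man 1", "Man 2")] + chip_probs[("Woman 1", "Woman 2")]
print(men_together)  # 1/3
```

Comparing the two enumerations shows the key difference: the chip method treats all six arrangements alike, while the coin flip method favors the two arrangements that keep the first two people together.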

(c)
The \( \boxed{\text{chip method}} \) should be used.
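Justification: In the four-person example, the chip method made all six arrangements equally likely (\( \frac{1}{6} \) each), while the sequential coin flip method gave a higher probability (\( \frac{1}{4} \) each) to the two arrangements in which the first two people assigned ended up in the same group. The same pattern applies here: because the 10 students enter the cafeteria before the 6 teachers, the sequential coin flip method makes it more likely that the students are concentrated in one lunch group, leaving the groups imbalanced with respect to students and teachers. Since students and teachers may have different food preferences, this imbalance could confound the results, making it difficult to determine whether differences in lunch preference are due to the lunch received or to the role (student vs. teacher). The chip method, with 8 chips marked for each lunch, makes every split of the 16 participants into two groups of 8 equally likely.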
