AP Statistics 2011 FRQ

Question 1

A professional sports team evaluates potential players for a certain position based on two main characteristics, speed and strength.

(a) Speed is measured by the time required to run a distance of 40 yards, with smaller times indicating more desirable (faster) speeds. From previous speed data for all players in this position, the times to run 40 yards have a mean of \(4.60\) seconds and a standard deviation of \(0.15\) seconds, with a minimum time of \(4.40\) seconds, as shown in the table below.

Variable | Mean | Standard Deviation | Minimum
Time to run 40 yards | 4.60 seconds | 0.15 seconds | 4.40 seconds

Based on the relationship between the mean, standard deviation, and minimum time, is it reasonable to believe that the distribution of 40-yard running times is approximately normal? Explain.

(b) Strength is measured by the amount of weight lifted, with more weight indicating more desirable (greater) strength. From previous strength data for all players in this position, the amount of weight lifted has a mean of \(310\) pounds and a standard deviation of \(25\) pounds, as shown in the table below.

Variable | Mean | Standard Deviation
Amount of weight lifted | 310 pounds | 25 pounds

Calculate and interpret the z-score for a player in this position who can lift a weight of \(370\) pounds.

(c) The characteristics of speed and strength are considered to be of equal importance to the team in selecting a player for the position. Based on the information about the means and standard deviations of the speed and strength data for all players and the measurements listed in the table below for Players A and B, which player should the team select if the team can only select one of the two players? Justify your answer.

Measurement | Player A | Player B
Time to run 40 yards | 4.42 seconds | 4.57 seconds
Amount of weight lifted | 370 pounds | 375 pounds

Most-appropriate topic codes (CED):

TOPIC 1.10: The Normal Distribution — part (a)
TOPIC 1.7: Summary Statistics for a Quantitative Variable — part (b)
TOPIC 4.8: Mean and Standard Deviation of Random Variables — part (c)
Detailed solution

(a)
No, it is not reasonable to believe the distribution is approximately normal.
The minimum time of \(4.40\) seconds has a z-score of \(z = \frac{4.40 - 4.60}{0.15} \approx -1.33\); that is, it lies only \(1.33\) standard deviations below the mean.
In a normal distribution, about \(9.2\%\) of values fall below \(z = -1.33\). But \(4.40\) seconds is the minimum, so no times fall below it, which contradicts what a normal distribution would predict. The short left tail suggests the distribution is skewed to the right.
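The normal-model proportion below the observed minimum can be checked with Python's standard library. This is a minimal sketch that uses `statistics.NormalDist` in place of a z-table; the variable names are our own.

```python
from statistics import NormalDist

mean, sd, minimum = 4.60, 0.15, 4.40     # 40-yard times in seconds, from the table
z_min = (minimum - mean) / sd            # standardized minimum, about -1.33
share_below = NormalDist().cdf(z_min)    # proportion a normal model places below 4.40 s

print(f"z = {z_min:.2f}; a normal model puts {share_below:.1%} of times below {minimum:.2f} s")
```

Using the unrounded \(z \approx -1.333\) gives about \(9.1\%\); rounding to \(z = -1.33\) first, as in a table lookup, gives about \(9.2\%\). Either way, a normal model predicts that roughly \(9\%\) of times would fall below the observed minimum.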

(b)
\(z = \frac{370 - 310}{25} = 2.4\)
Interpretation: A lift of \(370\) pounds is \(2.4\) standard deviations above the mean of \(310\) pounds for all players in this position.
Answer: \(\boxed{2.4}\)

(c)
Calculate z-scores for both players:
Speed (lower is better):
Player A: \(z = \frac{4.42 - 4.60}{0.15} = -1.2\)
Player B: \(z = \frac{4.57 - 4.60}{0.15} = -0.2\)

Strength (higher is better):
Player A: \(z = \frac{370 - 310}{25} = 2.4\)
Player B: \(z = \frac{375 - 310}{25} = 2.6\)

Player A is much faster (\(1.0\) standard deviation better) and only slightly weaker (\(0.2\) standard deviation worse). Since speed and strength are equally important, the team should select Player A.
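The equal-importance comparison can be made explicit by adding each player's two standardized scores, after flipping the sign of the speed z-score so that larger means better for both characteristics. Summing with equal weights is one reasonable way to encode "equal importance", not the only one; the function name `combined_score` is hypothetical.

```python
# Means and standard deviations for all players at this position (from the tables).
SPEED_MEAN, SPEED_SD = 4.60, 0.15        # seconds; lower is better
STRENGTH_MEAN, STRENGTH_SD = 310, 25     # pounds; higher is better

def combined_score(time_s: float, weight_lb: float) -> float:
    """Sum of z-scores, with the speed z-score negated so that higher is always better."""
    speed_z = (time_s - SPEED_MEAN) / SPEED_SD
    strength_z = (weight_lb - STRENGTH_MEAN) / STRENGTH_SD
    return -speed_z + strength_z

players = {"A": (4.42, 370), "B": (4.57, 375)}
scores = {name: combined_score(*vals) for name, vals in players.items()}
best = max(scores, key=scores.get)
print(scores, "-> select Player", best)
```

Player A's combined standardized score is \(1.2 + 2.4 = 3.6\), compared with \(0.2 + 2.6 = 2.8\) for Player B.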

Question 2

The table below shows the political party registration by gender of all \(500\) registered voters in Franklin Township.

PARTY REGISTRATION—FRANKLIN TOWNSHIP

Gender | Party W | Party X | Party Y | Total
Female | 60 | 120 | 120 | 300
Male | 28 | 124 | 48 | 200
Total | 88 | 244 | 168 | 500

(a) Given that a randomly selected registered voter is a male, what is the probability that he is registered for Party Y?

(b) Among the registered voters of Franklin Township, are the events “is a male” and “is registered for Party Y” independent? Justify your answer based on probabilities calculated from the table above.

(c) One way to display the data in the table is to use a segmented bar graph. The following segmented bar graph, constructed from the data in the party registration—Franklin Township table, shows party-registration distributions for males and females in Franklin Township.

In Lawrence Township, the proportions of all registered voters for Parties W, X, and Y are the same as for Franklin Township, and party registration is independent of gender. Complete the graph below to show the distributions of party registration by gender in Lawrence Township.

Most-appropriate topic codes (CED):

TOPIC 4.5: Conditional Probability — part (a)
TOPIC 4.6: Independent Events — part (b)
TOPIC 2.2: Representing Two Categorical Variables — part (c)
Detailed solution

(a)
This is a conditional probability. We are given that the voter is male, so we look only at the “Male” row. There are \(200\) males in total, and \(48\) of them are registered for Party Y. \[ P(\text{Party Y} \mid \text{Male}) = \frac{\text{Number of Males in Party Y}}{\text{Total Number of Males}} = \frac{48}{200} = 0.24 \]

(b)
No, the events are not independent. Two events A and B are independent if \(P(A \mid B) = P(A)\).

From part (a), we know: \[ P(\text{Party Y} \mid \text{Male}) = 0.24 \] Now, we calculate the overall probability of being registered for Party Y: \[ P(\text{Party Y}) = \frac{\text{Total Registered for Party Y}}{\text{Total Registered Voters}} = \frac{168}{500} = 0.336 \] Since \(P(\text{Party Y} \mid \text{Male}) \neq P(\text{Party Y})\) ( \(0.24 \neq 0.336\) ), the events “is a male” and “is registered for Party Y” are not independent.
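Exact fractions avoid any rounding question when comparing the two probabilities. A minimal sketch; the nested dictionary is just one convenient encoding of the Franklin Township table.

```python
from fractions import Fraction

# Franklin Township counts from the table: {gender: {party: count}}
counts = {
    "Female": {"W": 60, "X": 120, "Y": 120},
    "Male":   {"W": 28, "X": 124, "Y": 48},
}

total = sum(sum(row.values()) for row in counts.values())   # all registered voters
males = sum(counts["Male"].values())                        # row total for males
party_y = sum(row["Y"] for row in counts.values())          # column total for Party Y

p_y_given_male = Fraction(counts["Male"]["Y"], males)       # P(Party Y | Male)
p_y = Fraction(party_y, total)                              # P(Party Y)

# Independence would require P(Party Y | Male) == P(Party Y).
independent = p_y_given_male == p_y
print(float(p_y_given_male), float(p_y), independent)
```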

(c)
If party registration is independent of gender, the distribution of parties must be the same for males, females, and the overall population. We use the marginal (overall) proportions from the “Total” row of the Franklin Township table.

  • Party W: \(\frac{88}{500} = 0.176\)
  • Party X: \(\frac{244}{500} = 0.488\)
  • Party Y: \(\frac{168}{500} = 0.336\)

Both the “Male” and “Female” bars in the graph must be segmented identically using these proportions.

  • Party W segment: from \(0.0\) to \(0.176\)
  • Party X segment: from \(0.176\) to \(0.176 + 0.488 = 0.664\)
  • Party Y segment: from \(0.664\) to \(0.664 + 0.336 = 1.0\)
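The segment boundaries above are cumulative sums of the marginal proportions. A short sketch of that computation; the dictionary of party totals is transcribed from the "Total" row.

```python
from itertools import accumulate

# Overall party totals for Franklin Township (the "Total" row of the table).
party_totals = {"W": 88, "X": 244, "Y": 168}
n = sum(party_totals.values())

# Under independence, every gender's bar uses these same marginal proportions.
proportions = {party: count / n for party, count in party_totals.items()}

# Cumulative upper edges of the stacked segments, in the order W, X, Y.
upper_edges = list(accumulate(proportions.values()))
for party, top in zip(proportions, upper_edges):
    print(f"Party {party}: segment ends at {top:.3f}")
```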

Completed Graph:

Question 3

An apartment building has nine floors and each floor has four apartments. The building owner wants to install new carpeting in eight apartments to see how well it wears before she decides whether to replace the carpet in the entire building.

The figure below shows the floors of apartments in the building with their apartment numbers. Only the nine apartments indicated with an asterisk (*) have children in the apartment.

Floor | Apartments
1st Floor | 11*, 12, 13, 14
2nd Floor | 21, 22*, 23*, 24
3rd Floor | 31, 32, 33, 34
4th Floor | 41, 42, 43, 44
5th Floor | 51*, 52, 53, 54
6th Floor | 61, 62, 63, 64
7th Floor | 71, 72, 73*, 74*
8th Floor | 81, 82, 83, 84*
9th Floor | 91, 92*, 93*, 94
* = Children in the apartment

(a) For convenience, the apartment building owner wants to use a cluster sampling method, in which the floors are clusters, to select the eight apartments. Describe a process for randomly selecting eight different apartments using this method.

(b) An alternative sampling method would be to select a stratified random sample of eight apartments, where the strata are apartments with children and apartments with no children. A stratified random sample of size eight might include two randomly selected apartments with children and six randomly selected apartments with no children. In the context of this situation, give one statistical advantage of selecting such a stratified sample as opposed to a cluster sample of eight apartments using the floors as clusters.

Most-appropriate topic codes (CED):

TOPIC 3.3: Random Sampling and Data Collection — part (a)
TOPIC 3.3: Random Sampling and Data Collection — part (b)
Detailed solution

(a)
To get a sample of \(8\) apartments using floors as clusters, the owner must select \(2\) floors (since each floor has \(4\) apartments).

Step 1: Label the nine floors with unique integers from \(1\) to \(9\).
Step 2: Use a random number generator to produce integers from \(1\) to \(9\), ignoring any repeats, until two different floors have been selected.
Step 3: The sample will consist of all \(4\) apartments on each of the \(2\) randomly selected floors, for a total of \(8\) apartments.
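The three steps can be sketched in Python, where `random.sample` draws without replacement and so plays the role of "ignore repeats". The function name and the seed are arbitrary choices for illustration.

```python
import random

FLOORS = range(1, 10)           # floors labeled 1 through 9
APARTMENTS_PER_FLOOR = 4        # apartment numbers are 10 * floor + 1..4 (e.g., 11-14 on floor 1)

def cluster_sample(rng: random.Random) -> list[int]:
    """Pick 2 distinct floors at random and return all 4 apartments on each (8 total)."""
    floors = rng.sample(FLOORS, 2)   # without replacement, so no floor is chosen twice
    return [10 * f + j for f in sorted(floors) for j in range(1, APARTMENTS_PER_FLOOR + 1)]

print(cluster_sample(random.Random(2011)))
```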

(b)
The primary statistical advantage of the stratified sample is that it guarantees representation from both strata (apartments with children and apartments without children).

The amount of wear on the carpet is the variable being studied, and the presence of children is very likely to affect how the carpet wears. The cluster sample described in part (a) could, by chance, select two floors with no children (any two of Floors \(3\), \(4\), and \(6\)). Such a sample would provide no information about carpet wear in apartments with children, making the results unrepresentative.

The stratified sample ensures that both types of apartments are included, providing a more representative sample and more precise information about carpet wear for both groups.
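How likely is the cluster sample to miss the apartments with children entirely? Enumerating all equally likely pairs of floors gives the exact chance. A sketch; the set `CHILDREN` is transcribed from the building figure.

```python
from fractions import Fraction
from itertools import combinations

# Apartments with children, read from the building figure.
CHILDREN = {11, 22, 23, 51, 73, 74, 84, 92, 93}

def floor_has_children(floor: int) -> bool:
    """True if any apartment on this floor has children."""
    return any(apt // 10 == floor for apt in CHILDREN)

pairs = list(combinations(range(1, 10), 2))   # all equally likely pairs of floors
no_children = [p for p in pairs if not any(map(floor_has_children, p))]
prob = Fraction(len(no_children), len(pairs))
print(no_children, prob)
```

Only \(3\) of the \(36\) possible floor pairs contain no children, so about \(1\) in \(12\) cluster samples would miss that group completely, whereas the stratified design rules this out.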

Question 4

High cholesterol levels in people can be reduced by exercise, diet, and medication. Twenty middle-aged males with cholesterol readings between \(220\) and \(240\) milligrams per deciliter (mg/dL) of blood were randomly selected from the population of such male patients at a large local hospital. Ten of the \(20\) males were randomly assigned to group A, advised on appropriate exercise and diet, and also received a placebo. The other \(10\) males were assigned to group B, received the same advice on appropriate exercise and diet, but received a drug intended to reduce cholesterol instead of a placebo. After three months, posttreatment cholesterol readings were taken for all \(20\) males and compared to pretreatment cholesterol readings. The tables below give the reduction in cholesterol level (pretreatment reading minus posttreatment reading) for each male in the study.
Group A (placebo)
Reductions: 2, 19, 8, 4, 12, 8, 17, 7, 24, 1
Mean Reduction: \(10.20\) | Standard Deviation of Reductions: \(7.66\)
Group B (cholesterol drug)
Reductions: 30, 19, 18, 17, 20, -4, 23, 10, 9, 22
Mean Reduction: \(16.40\) | Standard Deviation of Reductions: \(9.40\)
Do the data provide convincing evidence, at the \(\alpha = 0.01\) level, that the cholesterol drug is effective in producing a reduction in mean cholesterol level beyond that produced by exercise and diet?

Most-appropriate topic codes (CED):

TOPIC 7.9: Carrying Out a Test for the Difference of Two Population Means — part (State)
TOPIC 7.9: Carrying Out a Test for the Difference of Two Population Means — part (Plan)
TOPIC 7.9: Carrying Out a Test for the Difference of Two Population Means — part (Do)
TOPIC 7.9: Carrying Out a Test for the Difference of Two Population Means — part (Conclude)
Detailed solution

State:
Let \( \mu_A \) = mean cholesterol reduction for all such patients receiving placebo
Let \( \mu_B \) = mean cholesterol reduction for all such patients receiving drug
\( H_0: \mu_A = \mu_B \)
\( H_a: \mu_A < \mu_B \)

Plan:
Two-sample \( t \)-test for difference in means
Conditions:
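1. Random: The \(20\) males were randomly selected from the hospital's population of such patients and were randomly assigned to groups A and B.
2. Independence: Random assignment makes the two groups independent, and it is reasonable to assume there are more than \(200\) such patients at a large hospital, so observations within each group can be treated as independent.
3. Normality: Sample sizes are small (\(n = 10\) each), so we check the data for strong skewness or outliers.

The dotplots of the two groups reveal slight skewness and a possible outlier (\(-4\)) in group B, but no extreme outliers, so it appears reasonable to proceed with the two-sample t-test.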

Do:
\[t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} = \frac{10.20 - 16.40}{\sqrt{\frac{7.66^2}{10} + \frac{9.40^2}{10}}} = \frac{-6.20}{\sqrt{5.8676 + 8.836}} = \frac{-6.20}{\sqrt{14.7036}} = \frac{-6.20}{3.834} \approx -1.62\]
Degrees of freedom ≈ \(17.3\) (Welch approximation)
\( p \)-value (one-sided) ≈ \(0.062\)
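The Do step can be reproduced from the raw reductions with the standard library alone. This is a minimal sketch: it uses the unpooled standard error and the Welch-Satterthwaite degrees of freedom, and because the standard library has no t-distribution CDF, the hypothetical helpers `t_pdf` and `t_cdf` integrate the t density numerically with Simpson's rule. In practice, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False, alternative="less")` would be the usual choice.

```python
import math
from statistics import mean, variance

group_a = [2, 19, 8, 4, 12, 8, 17, 7, 24, 1]        # placebo
group_b = [30, 19, 18, 17, 20, -4, 23, 10, 9, 22]   # cholesterol drug

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution (df may be non-integer)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|] plus the symmetry of the t density."""
    h = abs(t) / steps
    area = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

va = variance(group_a) / len(group_a)   # s_A^2 / n_A
vb = variance(group_b) / len(group_b)   # s_B^2 / n_B
t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
# Welch-Satterthwaite approximation for the degrees of freedom
df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
p_value = t_cdf(t_stat, df)             # one-sided, Ha: mu_A < mu_B

print(f"t = {t_stat:.2f}, df = {df:.1f}, p = {p_value:.3f}")
```

Working from the raw data rather than the rounded standard deviations changes \(t\) only in the third decimal place; both versions round to \(-1.62\).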

Conclude:
Since the \( p \)-value (\(0.062\)) is greater than \( \alpha = 0.01 \), we fail to reject \( H_0 \). There is not convincing evidence that the cholesterol drug produces a greater mean reduction than exercise and diet alone.

Question 5

Windmills generate electricity by transferring energy from wind to a turbine. A study was conducted to examine the relationship between wind velocity in miles per hour (mph) and electricity production in amperes for one particular windmill. For the windmill, measurements were taken on twenty-five randomly selected days, and the computer output for the regression analysis for predicting electricity production based on wind velocity is given below. The regression model assumptions were checked and determined to be reasonable over the interval of wind speeds represented in the data, which were from \(10\) miles per hour to \(40\) miles per hour.
Predictor | Coef | SE Coef | T | P
Constant | 0.137 | 0.126 | 1.09 | 0.289
Wind velocity | 0.240 | 0.019 | 12.63 | 0.000
S = 0.237 | R-Sq = 0.873 | R-Sq(adj) = 0.868

(a) Use the computer output above to determine the equation of the least squares regression line. Identify all variables used in the equation.

(b) How much more electricity would the windmill be expected to produce on a day when the wind velocity is \(25\) mph than on a day when the wind velocity is \(15\) mph? Show how you arrived at your answer.

(c) What proportion of the variation in electricity production is explained by its linear relationship with wind velocity?

(d) Is there statistically convincing evidence that electricity production by the windmill is related to wind velocity? Explain.

Most-appropriate topic codes (CED):

TOPIC 2.8: Least Squares Regression — part (a), (b)
TOPIC 2.6: Linear Regression Models — part (c)
TOPIC 9.5: Carrying Out a Test for the Slope of a Regression Model — part (d)
Detailed solution

(a)
The equation is found using the “Coef” (coefficient) column from the output.

  • The intercept (Constant) is \(0.137\).
  • The slope (Wind velocity) is \(0.240\).

The equation is: \[ \hat{y} = 0.137 + 0.240x \] Where:

  • \(\hat{y}\) is the predicted electricity production in amperes.
  • \(x\) is the wind velocity in mph.

(Alternatively: Predicted Production = \(0.137 + 0.240 \times\) Wind Velocity)

(b)
The slope of the line, \(0.240\), represents the expected increase in electricity production (in amperes) for each additional \(1\) mph of wind velocity.

The question asks for the expected difference between \(25\) mph and \(15\) mph, which is a difference of \(10\) mph.

Calculation: \[ (10 \text{ mph}) \times (0.240 \frac{\text{amperes}}{\text{mph}}) = 2.4 \text{ amperes} \] The windmill is expected to produce \(2.4\) amperes more electricity on a day with \(25\) mph wind velocity than on a day with \(15\) mph wind velocity.
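Evaluating the fitted line at both wind speeds shows that the intercept cancels and only the slope times the \(10\) mph gap remains. A sketch; the range check that refuses wind speeds outside \(10\) to \(40\) mph is our own addition, reflecting the note that the model assumptions were checked only over that interval.

```python
INTERCEPT, SLOPE = 0.137, 0.240   # "Coef" column of the regression output

def predicted_amperes(wind_mph: float) -> float:
    """Predicted electricity production from the least-squares line y-hat = 0.137 + 0.240x."""
    if not 10 <= wind_mph <= 40:
        raise ValueError("outside the 10-40 mph range of the data; this would be extrapolation")
    return INTERCEPT + SLOPE * wind_mph

difference = predicted_amperes(25) - predicted_amperes(15)
print(round(predicted_amperes(25), 3), round(predicted_amperes(15), 3), round(difference, 3))
```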

(c)
The proportion of variation in the response variable (electricity production) that is explained by the linear relationship with the explanatory variable (wind velocity) is given by the coefficient of determination, \(r^2\), labeled R-Sq in the output.

From the computer output, R-Sq = \(0.873\).

Therefore, \(0.873\) (or \(87.3\%\)) of the variation in electricity production is explained by its linear relationship with wind velocity.

(d)
Yes, there is statistically convincing evidence that electricity production is related to wind velocity.

Explanation: We perform a hypothesis test for the slope of the regression line, \(\beta_1\).

  • Hypotheses: \(H_0: \beta_1 = 0\) (There is no linear relationship) vs. \(H_a: \beta_1 \neq 0\) (There is a linear relationship).
  • Test Statistic and p-value: From the “Wind velocity” row of the output, the t-statistic is \(t = 12.63\) (with \(n - 2 = 23\) degrees of freedom), and the p-value is reported as \(0.000\), meaning it is less than \(0.0005\).
  • Conclusion: Assuming a standard significance level (e.g., \(\alpha = 0.05\)), our p-value (less than \(0.0005\)) is smaller than \(\alpha\), so we reject the null hypothesis.

Because the p-value is so small, we have strong evidence that electricity production is linearly related to wind velocity; since the slope is positive, higher wind velocities are associated with greater electricity production.
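The T column in the output is each coefficient divided by its standard error, and the slope test uses \(n - 2\) degrees of freedom. A quick arithmetic check; the dictionary is transcribed from the regression output.

```python
# Rows of the regression output: (coefficient, standard error of the coefficient)
rows = {"Constant": (0.137, 0.126), "Wind velocity": (0.240, 0.019)}
n = 25          # number of days measured
df = n - 2      # degrees of freedom for the slope test in simple linear regression

# Test statistic for H0: parameter = 0 is estimate / standard error.
t_values = {name: coef / se for name, (coef, se) in rows.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f} on {df} df")
```

With \(t = 12.63\) on \(23\) degrees of freedom, the p-value is far below \(0.0005\), which is why the output reports it as \(0.000\).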

Question 6

Every year, each student in a nationally representative sample is given tests in various subjects. Recently, a random sample of \(9,600\) twelfth-grade students from the United States were administered a multiple-choice United States history exam. One of the multiple-choice questions is below. (The correct answer is C.)

In 1935 and 1936 the Supreme Court declared that important parts of the New Deal were unconstitutional. President Roosevelt responded by threatening to
(A) impeach several Supreme Court justices
(B) eliminate the Supreme Court
(C) appoint additional Supreme Court justices who shared his views
(D) override the Supreme Court’s decisions by gaining three-fourths majorities in both houses of Congress

Of the \(9,600\) students, \(28\) percent answered the multiple-choice question correctly.

(a) Let \(p\) be the proportion of all United States twelfth-grade students who would answer the question correctly. Construct and interpret a \(99\) percent confidence interval for \(p\).

Assume that students who actually know the correct answer have a \(100\) percent chance of answering the question correctly, and students who do not know the correct answer to the question guess completely at random from among the four options.

Let \(k\) represent the proportion of all United States twelfth-grade students who actually know the correct answer to the question.

(b) A tree diagram of the possible outcomes for a randomly selected twelfth-grade student is provided below. Write the correct probability in each of the five empty boxes. Some of the probabilities may be expressions in terms of \(k\).
(c) Based on the completed tree diagram, express the probability, in terms of \(k\), that a randomly selected twelfth-grade student would correctly answer the history question.
(d) Using your interval from part (a) and your answer to part (c), calculate and interpret a \(99\%\) confidence interval for \(k\), the proportion of all United States twelfth-grade students who actually know the answer to the history question. You may assume that the conditions for the confidence interval have been checked and verified.

Most-appropriate topic codes (CED):

TOPIC 6.2: Constructing a Confidence Interval for a Population Proportion — part (a)
TOPIC 4.3: Introduction to Probability — part (b)
TOPIC 4.5: Conditional Probability — part (c)
TOPIC 6.3: Justifying a Claim Based on a Confidence Interval for a Population Proportion — part (d)
Detailed solution

(a)
State:
We construct a one-sample z-interval for a population proportion \(p\).

Plan:
Conditions:
1. Random: Random sample stated
2. Large Counts: \(n\hat{p} = 9,600 \times 0.28 = 2,688 \geq 10\) and \(n(1-\hat{p}) = 9,600 \times 0.72 = 6,912 \geq 10\)
3. 10%: \(9,600\) is less than \(10\%\) of all U.S. twelfth-grade students.

Do:
\(\hat{p} = 0.28\), \(n = 9,600\), \(z^* = 2.576\)
Standard error: \(\sqrt{\frac{0.28 \times 0.72}{9,600}} \approx 0.00458\)
Margin of error: \(2.576 \times 0.00459 \approx 0.0118\)
Interval: \(0.28 \pm 0.0118 = (0.2682, 0.2918)\)
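The same interval can be computed with the standard library, where `statistics.NormalDist` supplies \(z^*\) instead of a table. A minimal sketch of the one-sample z-interval.

```python
import math
from statistics import NormalDist

n, p_hat, confidence = 9600, 0.28, 0.99

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.576 for 99%
se = math.sqrt(p_hat * (1 - p_hat) / n)                    # standard error of p-hat
margin = z_star * se
low, high = p_hat - margin, p_hat + margin
print(f"z* = {z_star:.3f}, SE = {se:.5f}, ME = {margin:.4f}, CI = ({low:.4f}, {high:.4f})")
```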

Conclude:
We are \(99\%\) confident that the true proportion of all U.S. twelfth-grade students who would answer this question correctly is between \(0.268\) and \(0.292\).

(b)
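Completed tree diagram (branch probabilities):

  • Knows the answer: \(k\); then answers correctly: \(1\), answers incorrectly: \(0\)
  • Does not know the answer: \(1 - k\); then guesses correctly: \(0.25\), guesses incorrectly: \(0.75\)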

(c)
\(P(\text{correct}) = P(\text{knows and correct}) + P(\text{doesn’t know and guesses correctly})\)
\(= k \times 1 + (1-k) \times 0.25 = k + 0.25 - 0.25k = 0.25 + 0.75k\)

(d)
From part (c): \(P(\text{correct}) = 0.25 + 0.75k\)
From part (a): the \(99\%\) confidence interval for \(P(\text{correct})\) is \(0.268\) to \(0.292\)

Solve for \(k\):
\(0.25 + 0.75k = 0.268 \Rightarrow 0.75k = 0.018 \Rightarrow k = 0.024\)
\(0.25 + 0.75k = 0.292 \Rightarrow 0.75k = 0.042 \Rightarrow k = 0.056\)

We are \(99\%\) confident that the true proportion of all U.S. twelfth-grade students who actually know the correct answer is between \(0.024\) and \(0.056\).
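Because \(P(\text{correct}) = 0.25 + 0.75k\) is an increasing linear function of \(k\), solving it at each endpoint of the interval for \(p\) yields an interval for \(k\) with the same \(99\%\) confidence. A minimal sketch; the function names are hypothetical.

```python
def p_correct(k: float) -> float:
    """P(correct answer) when a fraction k knows it and everyone else guesses among 4 options."""
    return k * 1.0 + (1 - k) * 0.25      # simplifies to 0.25 + 0.75k

def k_from_p(p: float) -> float:
    """Invert p = 0.25 + 0.75k; the map is increasing, so endpoints map to endpoints."""
    return (p - 0.25) / 0.75

p_interval = (0.268, 0.292)              # rounded 99% interval for p from part (a)
k_interval = tuple(k_from_p(p) for p in p_interval)
print([round(k, 3) for k in k_interval])
```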
