AP Statistics 2014 FRQ

Question 1

An administrator at a large university is interested in determining whether the residential status of a student is associated with level of participation in extracurricular activities. Residential status is categorized as on campus for students living in university housing and off campus otherwise. A simple random sample of \(100\) students in the university was taken, and each student was asked the following two questions.
• Are you an on campus student or an off campus student?
• In how many extracurricular activities do you participate?
The responses of the \(100\) students are summarized in the frequency table shown.
| Level of Participation in Extracurricular Activities | On campus | Off campus | Total |
|---|---|---|---|
| No activities | \(9\) | \(30\) | \(39\) |
| One activity | \(17\) | \(25\) | \(42\) |
| Two or more activities | \(7\) | \(12\) | \(19\) |
| Total | \(33\) | \(67\) | \(100\) |
(a) Calculate the proportion of on campus students in the sample who participate in at least one extracurricular activity and the proportion of off campus students in the sample who participate in at least one extracurricular activity.

The responses of the \(100\) students are summarized in the segmented bar graph shown.
(b) Write a few sentences summarizing what the graph reveals about the association between residential status and level of participation in extracurricular activities among the \(100\) students in the sample.
(c) After verifying that the conditions for inference were satisfied, the administrator performed a chi-square test of the following hypotheses.
\(H_0\): There is no association between residential status and level of participation in extracurricular activities among the students at the university.
\(H_a\): There is an association between residential status and level of participation in extracurricular activities among the students at the university.
The test resulted in a p-value of \(0.23\). Based on the p-value, what conclusion should the administrator make?

Most-appropriate topic codes (CED):

TOPIC 2.3: Statistics for Two Categorical Variables — part (a)
TOPIC 2.2: Representing Two Categorical Variables — part (b)
TOPIC 8.6: Carrying Out a Chi-Square Test for Homogeneity or Independence — part (c)
Answer/Explanation
Detailed solution

(a)
On campus proportion: \(\frac{17+7}{33} = \frac{24}{33} \approx 0.727\)
Off campus proportion: \(\frac{25+12}{67} = \frac{37}{67} \approx 0.552\)
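As a quick check on the arithmetic, the two conditional proportions can be recomputed from the table counts. This is a minimal Python sketch; the dictionary layout and the variable names are illustrative choices, not part of the original question.

```python
# Counts from the frequency table, keyed by residential status.
counts = {
    "on campus": {"none": 9, "one": 17, "two_or_more": 7},
    "off campus": {"none": 30, "one": 25, "two_or_more": 12},
}

at_least_one = {}
for status, row in counts.items():
    total = sum(row.values())  # column total: 33 or 67
    at_least_one[status] = (row["one"] + row["two_or_more"]) / total
    print(f"{status}: {at_least_one[status]:.3f}")
```

Each proportion is conditioned on residential status, so the denominators are the column totals (\(33\) and \(67\)), not the overall sample size of \(100\).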

(b)
The graph reveals an association between residential status and participation among the \(100\) students in the sample. On-campus students are more likely to participate in extracurricular activities than off-campus students. A larger proportion of off-campus students (\(\approx 45\%\)) participate in no activities compared to on-campus students (\(\approx 27\%\)). Conversely, a larger proportion of on-campus students participate in one activity (\(\approx 52\%\)) compared to off-campus students (\(\approx 37\%\)). The proportions for two or more activities are similar (\(\approx 21\%\) on campus versus \(\approx 18\%\) off campus).
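The percentages behind a segmented bar graph are the conditional distributions of participation level within each residential group. The sketch below recomputes them from the table counts; the helper name `percentages` is a hypothetical convenience, and the graph itself is not reproduced.

```python
levels = ["No activities", "One activity", "Two or more activities"]
on_campus = [9, 17, 7]     # column total 33
off_campus = [30, 25, 12]  # column total 67

def percentages(column):
    """Convert one column of counts to percentages of that column's total."""
    total = sum(column)
    return [100 * count / total for count in column]

on_pct = percentages(on_campus)
off_pct = percentages(off_campus)
for level, on, off in zip(levels, on_pct, off_pct):
    print(f"{level}: on campus {on:.1f}%, off campus {off:.1f}%")
```

If the two groups had identical conditional distributions, the segments of the two bars would line up, which would indicate no association in the sample.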

(c)
We compare the p-value to a standard significance level, such as \(\alpha = 0.05\). Since the p-value (\(0.23\)) is greater than \(\alpha\) (\(0.05\)), the administrator should **fail to reject the null hypothesis**. There is not convincing statistical evidence to conclude that an association exists between residential status and level of participation in extracurricular activities for all students at the university.
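The question supplies the p-value, but it can be reproduced from the table as a check. The sketch below computes the chi-square statistic by hand and uses the fact that, for \(2\) degrees of freedom, the upper-tail probability of a chi-square distribution is \(e^{-x/2}\). The variable names are illustrative, and this is only a verification aid, not a step the exam requires.

```python
import math

# Observed counts: rows are participation levels, columns are (on campus, off campus).
observed = [
    [9, 30],   # no activities
    [17, 25],  # one activity
    [7, 12],   # two or more activities
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (2 - 1) = 2
# Shortcut valid only for df = 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)
print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3f}")
```

The result agrees with the stated p-value of \(0.23\) to two decimal places.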

Question 2

Nine sales representatives, \(6\) men and \(3\) women, at a small company wanted to attend a national convention. There were only enough travel funds to send \(3\) people. The manager selected \(3\) people to attend and stated that the people were selected at random. The \(3\) people selected were women. There were concerns that no men were selected to attend the convention.

(a) Calculate the probability that randomly selecting \(3\) people from a group of \(6\) men and \(3\) women will result in selecting \(3\) women.

(b) Based on your answer to part (a), is there reason to doubt the manager’s claim that the \(3\) people were selected at random? Explain.

(c) An alternative to calculating the exact probability is to conduct a simulation to estimate the probability. A proposed simulation process is described below.

Each trial in the simulation consists of rolling three fair, six-sided dice, one die for each of the convention attendees. For each die, rolling a \(1\), \(2\), \(3\), or \(4\) represents selecting a man; rolling a \(5\) or \(6\) represents selecting a woman. After \(1,000\) trials, the number of times the dice indicate selecting \(3\) women is recorded.

Does the proposed process correctly simulate the random selection of \(3\) women from a group of \(9\) people consisting of \(6\) men and \(3\) women? Explain why or why not.

Most-appropriate topic codes (CED):

TOPIC 4.4: Mutually Exclusive Events — part (a)
TOPIC 4.3: Introduction to Probability — part (b)
TOPIC 4.2: Estimating Probabilities Using Simulation — part (c)
Answer/Explanation
Detailed solution

(a)
\(P(\text{3 women}) = \frac{3}{9} \times \frac{2}{8} \times \frac{1}{7} = \frac{6}{504} = \frac{1}{84} \approx 0.0119\)
Answer: \(\boxed{0.012}\)
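The same probability can be obtained by counting combinations: there is one way to choose all \(3\) women and \(\binom{9}{3} = 84\) ways to choose any \(3\) of the \(9\) people. A minimal Python check, with illustrative variable names:

```python
from fractions import Fraction
from math import comb

favorable = comb(3, 3) * comb(6, 0)  # choose all 3 women and 0 of the 6 men
possible = comb(9, 3)                # choose any 3 of the 9 representatives
p_three_women = Fraction(favorable, possible)
print(p_three_women, round(float(p_three_women), 4))
```

Both approaches give \(\frac{1}{84}\), because the product \(\frac{3}{9} \times \frac{2}{8} \times \frac{1}{7}\) is the sequential form of the same count.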

(b)
Yes, there is reason to doubt the manager’s claim. The probability of selecting \(3\) women by random chance is only about \(1.2\%\), which is very small. This suggests it’s unlikely this outcome occurred purely by chance.

(c)
No, the proposed process does not correctly simulate the random selection. The dice simulation assumes:
• Independent selections (dice rolls are independent)
• Constant probability of selecting a woman (\( \frac{2}{6} = \frac{1}{3} \))

However, the actual selection is:
• Dependent selections (sampling without replacement)
• Changing probabilities (the probability of selecting a woman falls from \(\frac{3}{9}\) to \(\frac{2}{8}\) to \(\frac{1}{7}\) as each woman is selected)

The dice simulation represents sampling with replacement, not the actual without-replacement scenario.
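To see how far apart the two processes are, the sketch below simulates both: the proposed dice process and a draw of \(3\) people without replacement. It assumes \(100{,}000\) trials rather than the \(1{,}000\) in the question so that the estimates are stable, and the seed and function names are arbitrary choices made for this illustration.

```python
import random

random.seed(2014)  # arbitrary seed so the run is repeatable
TRIALS = 100_000
group = ["M"] * 6 + ["W"] * 3

def dice_trial():
    # Proposed process: three independent dice, where 5 or 6 means "woman".
    return all(random.randint(1, 6) >= 5 for _ in range(3))

def draw_trial():
    # Actual situation: draw 3 of the 9 people without replacement.
    return random.sample(group, 3) == ["W", "W", "W"]

dice_estimate = sum(dice_trial() for _ in range(TRIALS)) / TRIALS
draw_estimate = sum(draw_trial() for _ in range(TRIALS)) / TRIALS
print(f"dice (with replacement):    {dice_estimate:.4f}")
print(f"draw (without replacement): {draw_estimate:.4f}")
```

The dice estimate should settle near \(\left(\frac{1}{3}\right)^3 = \frac{1}{27} \approx 0.037\), roughly three times the correct value of \(\frac{1}{84} \approx 0.012\), so the proposed process would overstate how often \(3\) women are selected.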

Question 3

Schools in a certain state receive funding based on the number of students who attend the school. To determine the number of students who attend a school, one school day is selected at random and the number of students in attendance that day is counted and used for funding purposes. The daily number of absences at High School A in the state is approximately normally distributed with mean of \(120\) students and standard deviation of \(10.5\) students.

(a) If more than \(140\) students are absent on the day the attendance count is taken for funding purposes, the school will lose some of its state funding in the subsequent year. Approximately what is the probability that High School A will lose some state funding?

(b) The principals’ association in the state suggests that instead of choosing one day at random, the state should choose \(3\) days at random. With the suggested plan, High School A would lose some of its state funding in the subsequent year if the mean number of students absent for the \(3\) days is greater than \(140\). Would High School A be more likely, less likely, or equally likely to lose funding using the suggested plan compared to the plan described in part (a)? Justify your choice.

(c) A typical school week consists of the days Monday, Tuesday, Wednesday, Thursday, and Friday. The principal at High School A believes that the number of absences tends to be greater on Mondays and Fridays, and there is concern that the school will lose state funding if the attendance count occurs on a Monday or Friday. If one school day is chosen at random from each of \(3\) typical school weeks, what is the probability that none of the \(3\) days chosen is a Tuesday, Wednesday, or Thursday?

Most-appropriate topic codes (CED):

TOPIC 4.7: Introduction to Random Variables and Probability Distributions — part (a)
TOPIC 5.7: Sampling Distributions for Sample Means — part (b)
TOPIC 4.6: Independent Events and Unions of Events — part (c)
Answer/Explanation
Detailed solution

(a)
\(z = \frac{140 - 120}{10.5} \approx 1.90\)
\(P(Z > 1.90) = 1 - 0.9713 = 0.0287\)
The probability is approximately \(0.029\).
Answer: \(\boxed{0.029}\)
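The table lookup can be checked with Python's standard-library `statistics.NormalDist`. This sketch uses the unrounded z-score, so it gives a value slightly below the table-based \(0.0287\); the variable names are illustrative.

```python
from statistics import NormalDist

# Daily absences at High School A: approximately normal, mean 120, SD 10.5.
absences = NormalDist(mu=120, sigma=10.5)

z = (140 - 120) / 10.5
p_lose_funding = 1 - absences.cdf(140)  # P(absences > 140)
print(f"z = {z:.3f}, P(X > 140) = {p_lose_funding:.4f}")
```

The small difference from \(0.0287\) comes only from rounding \(z\) to \(1.90\) before using the table; either value rounds to about \(0.03\).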

(b)
High School A would be less likely to lose funding using the suggested plan.
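• With \(3\) days: \(\sigma_{\bar{x}} = \frac{10.5}{\sqrt{3}} \approx 6.062\)
• \(z = \frac{140 - 120}{6.062} \approx 3.30\)
• \(P(Z > 3.30) \approx 0.0005\)
Because daily absences are approximately normally distributed, the mean of \(3\) randomly chosen days is also approximately normal, with the same mean of \(120\) but a smaller standard deviation. The probability of losing funding therefore decreases from \(0.0287\) to about \(0.0005\).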
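The comparison between the one-day plan and the three-day plan can be sketched with `statistics.NormalDist`, assuming (as the solution does) that the three days are chosen independently so the standard deviation of the mean is \(\sigma / \sqrt{n}\). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n_days = 120, 10.5, 3

sigma_mean = sigma / sqrt(n_days)  # standard deviation of the 3-day mean
p_one_day = 1 - NormalDist(mu, sigma).cdf(140)
p_three_day_mean = 1 - NormalDist(mu, sigma_mean).cdf(140)

print(f"SD of the mean: {sigma_mean:.3f}")
print(f"P(one day > 140):      {p_one_day:.4f}")
print(f"P(3-day mean > 140):   {p_three_day_mean:.4f}")
```

Averaging over more days shrinks the spread around \(120\), so a mean above \(140\) becomes far less likely than a single day above \(140\).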

(c)
Probability of selecting Monday or Friday in one week: \(\frac{2}{5} = 0.4\)
Since weeks are independent: \((0.4)^3 = 0.064\)
The probability that none of the \(3\) days is Tuesday, Wednesday, or Thursday is \(0.064\).
Answer: \(\boxed{0.064}\)
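Because each week offers \(5\) equally likely days, the result can also be confirmed by listing all \(5^3 = 125\) equally likely combinations and counting those made up only of Mondays and Fridays. A short sketch, with illustrative names:

```python
from fractions import Fraction
from itertools import product

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# One day chosen from each of 3 weeks: every ordered triple is equally likely.
outcomes = list(product(days, repeat=3))
favorable = [o for o in outcomes if all(day in ("Mon", "Fri") for day in o)]

probability = Fraction(len(favorable), len(outcomes))
print(probability, float(probability))
```

The count is \(2^3 = 8\) favorable outcomes out of \(125\), which matches \((0.4)^3\).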

Question 4

As part of its twenty-fifth reunion celebration, the class of \(1988\) (students who graduated in \(1988\)) at a state university held a reception on campus. In an informal survey, the director of alumni development asked \(50\) of the attendees about their incomes. The director computed the mean income of the \(50\) attendees to be \( \$189,952 \). In a news release, the director announced, “The members of our class of \(1988\) enjoyed resounding success. Last year’s mean income of its members was \( \$189,952 \)!”
(a) What would be a statistical advantage of using the median of the reported incomes, rather than the mean, as the estimate of the typical income?
(b) The director felt the members who attended the reception may be different from the class as a whole. A more detailed survey of the class was planned to find a better estimate of the income as well as other facts about the alumni. The staff developed two methods based on the available funds to carry out the survey.
Method 1: Send out an e-mail to all \(6,826\) members of the class asking them to complete an online form. The staff estimates that at least \(600\) members will respond.
Method 2: Select a simple random sample of members of the class and contact the selected members directly by phone. Follow up to ensure that all responses are obtained. Because method \(2\) will require more time than method \(1\), the staff estimates that only \(100\) members of the class could be contacted using method \(2\).
Which of the two methods would you select for estimating the average yearly income of all \(6,826\) members of the class of \(1988\)? Explain your reasoning by comparing the two methods and the effect of each method on the estimate.

Most-appropriate topic codes (CED):

TOPIC 1.7: Summary Statistics for a Quantitative Variable — part (a)
TOPIC 3.2: Introduction to Planning a Study — part (b)
TOPIC 3.4: Potential Problems with Sampling — part (b)
Answer/Explanation
Detailed solution

(a)
The median is resistant to skewness and outliers, while the mean is not. Income distributions are typically right-skewed, with a small number of very high incomes. These high incomes would pull the mean upward but have little effect on the median. Therefore, the median would provide a better estimate of a typical income than the mean.
\(\boxed{\text{Median is resistant to skewness and outliers}}\)

(b)
Method 2 is better. Method 1 uses voluntary response, which is likely to produce a biased sample. People with higher incomes may be more likely to respond, leading to an overestimate of the mean income. Method 2 uses a random sample with follow-up to ensure responses, which should produce a more representative sample and a better estimate of the population mean income, despite the smaller sample size.
\(\boxed{\text{Method 2}}\)

Question 5

A researcher conducted a study to investigate whether local car dealers tend to charge women more than men for the same car model. Using information from the county tax collector’s records, the researcher randomly selected one man and one woman from among everyone who had purchased the same model of an identically equipped car from the same dealer. The process was repeated for a total of \(8\) randomly selected car models.

The purchase prices and the differences (woman – man) are shown in the table below. Summary statistics are also shown.

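| Car model | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Women | \(20,100\) | \(17,400\) | \(22,300\) | \(32,500\) | \(17,710\) | \(21,500\) | \(29,600\) | \(46,300\) |
| Men | \(19,580\) | \(17,500\) | \(21,400\) | \(32,300\) | \(17,720\) | \(20,300\) | \(28,300\) | \(45,630\) |
| Difference | \(520\) | \(-100\) | \(900\) | \(200\) | \(-10\) | \(1,200\) | \(1,300\) | \(670\) |

| | Mean | Standard Deviation |
|---|---|---|
| Women | \(25,926.25\) | \(9,846.61\) |
| Men | \(25,341.25\) | \(9,728.60\) |
| Difference | \(585.00\) | \(530.71\) |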

Dotplots of the data and the differences are shown below.

Do the data provide convincing evidence that, on average, women pay more than men in the county for the same car model?

Most-appropriate topic codes (CED):

TOPIC 7.5: Carrying Out a Test for a Population Mean
TOPIC 7.4: Setting Up a Test for a Population Mean
TOPIC 7.9: Carrying Out a Test for the Difference of Two Population Means
Answer/Explanation
Detailed solution

State:
We will perform a paired t-test at significance level \(\alpha = 0.05\).
Let \(\mu_{\text{diff}}\) be the population mean difference in purchase price (woman – man).
\(H_0: \mu_{\text{diff}} = 0\)
\(H_a: \mu_{\text{diff}} > 0\)


Plan:
We check the conditions for inference:
1. Random: The problem states that the car models and the buyers were randomly selected.
2. Normality: Sample size (\(n = 8\)) is small, but the dotplot of differences shows no strong skewness or outliers.


Do:
Test statistic: \(t = \frac{\bar{x}_{\text{diff}} - 0}{s_{\text{diff}}/\sqrt{n}} = \frac{585 - 0}{530.71/\sqrt{8}} \approx 3.12\)
Degrees of freedom: \(df = n - 1 = 7\)
p-value: \(P(t > 3.12) \approx 0.008\)


Conclude:
Since p-value (\(0.008\)) < \(\alpha\) (\(0.05\)), we reject \(H_0\).

There is convincing statistical evidence that, on average, women pay more than men in the county for the same car model.
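The test statistic and p-value can be recomputed from the raw prices. The sketch below uses only the standard library, so the upper-tail area of the t distribution is approximated by integrating the t density with Simpson's rule; the helpers `t_density` and `upper_tail` are written for this illustration and are not a library API. A statistics package or calculator would normally be used instead.

```python
import math
from statistics import mean, stdev

women = [20100, 17400, 22300, 32500, 17710, 21500, 29600, 46300]
men = [19580, 17500, 21400, 32300, 17720, 20300, 28300, 45630]
diffs = [w - m for w, m in zip(women, men)]  # woman - man for each car model

n = len(diffs)
df = n - 1
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t > 0: 0.5 minus the Simpson's-rule area on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3

p_value = upper_tail(t_stat, df)
print(f"mean diff = {mean(diffs):.2f}, sd = {stdev(diffs):.2f}")
print(f"t = {t_stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The test is run on the eight differences rather than on the two price columns separately because each woman's price is paired with a man's price for the same car model.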

Question 6

Jamal is researching the characteristics of a car that might be useful in predicting the fuel consumption rate (FCR); that is, the number of gallons of gasoline that the car requires to travel \(100\) miles under conditions of typical city driving. The length of a car is one explanatory variable that can be used to predict FCR. Graph I is a scatterplot showing the lengths of \(66\) cars plotted with the corresponding FCR. One point on the graph is labeled A.
Jamal examined the scatterplot and determined that a linear model would be a reasonable way to express the relationship between FCR and length. A computer output from a linear regression is shown below.
Linear Fit
\[ \text{FCR} = -1.595789 + 0.0372614 \times \text{Length} \]
Summary of Fit
RSquare: \(0.250401\)
Root Mean Square Error: \(0.902382\)
Observations: \(66\)
(a) The point on the graph labeled A represents one car of length \(175\) inches and an FCR of \(5.88\). Calculate and interpret the residual for the car relative to the least squares regression line.
Jamal knows that it is possible to predict a response variable using more than one explanatory variable. He wants to see if he can improve the original model of predicting FCR from length by including a second explanatory variable in addition to length. He is considering including engine size, in liters, or wheel base (the length between axles), in inches. Graph II is a scatterplot showing the engine size of the \(66\) cars plotted with the corresponding residuals from the regression of FCR on length. Graph III is a scatterplot showing the wheel base of the \(66\) cars plotted with the corresponding residuals from the regression of FCR on length.
(b) In graph II, the point labeled A corresponds to the same car whose point was labeled A in graph I. The measurements for the car represented by point A are given below.
| FCR | Length (inches) | Engine Size (liters) | Wheel Base (inches) |
|---|---|---|---|
| \(5.88\) | \(175\) | \(3.6\) | \(93\) |
(i) Circle the point on graph III that corresponds to the car represented by point A on graphs I and II.
(ii) There is a point on graph III labeled B. It is very close to the horizontal line at \(0\). What does that indicate about the FCR of the car represented by point B?
(c) Write a few sentences to compare the association between the variables in graph II with the association between the variables in graph III.
(d) Jamal wants to predict FCR using length and one of the other variables, engine size or wheel base. Based on your response to part (c), which variable, engine size or wheel base, should Jamal use in addition to length if he wants to improve the prediction? Explain why you chose that variable.

Most-appropriate topic codes (CED):

TOPIC 2.7: Residuals — part (a)
TOPIC 2.7: Residuals — part (b)
TOPIC 2.4: Representing the Relationship Between Two Quantitative Variables — part (c)
TOPIC 2.8: Least Squares Regression — part (d)
Answer/Explanation
Detailed solution

(a)
Predicted FCR = \(-1.595789 + 0.0372614 \times 175 = 4.925\)
Residual = actual - predicted = \(5.88 - 4.925 = 0.955\)
Interpretation: The car’s actual FCR is \(0.955\) gallons per \(100\) miles greater than predicted for a car of its length.
\(\boxed{0.955}\)
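The residual calculation can be checked directly from the fitted equation in the computer output. In this sketch the function name `predict_fcr` is an illustrative choice; the coefficients are the ones reported in the Linear Fit.

```python
# Coefficients from the regression output: FCR = -1.595789 + 0.0372614 * Length
intercept = -1.595789
slope = 0.0372614

def predict_fcr(length_inches):
    """Predicted fuel consumption rate (gallons per 100 miles) for a given length."""
    return intercept + slope * length_inches

predicted = predict_fcr(175)  # point A has length 175 inches
residual = 5.88 - predicted   # actual FCR minus predicted FCR
print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
```

A positive residual means the point lies above the regression line, so the model underpredicts this car's fuel consumption.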

(b i)
The point on graph III with wheel base \(93\) inches and residual approximately \(0.96\) should be circled.

(b ii)
Point B being close to the horizontal line at \(0\) indicates that the car’s actual FCR is very close to the FCR predicted by the regression model using length alone.

(c)
Graph II shows a moderate positive linear association between engine size and residuals from the FCR-length regression. Graph III shows a weak association with no clear pattern between wheel base and these residuals. The association is stronger in graph II than in graph III.

(d)
Jamal should use engine size. The stronger association in graph II indicates that engine size explains more of the variation in FCR that remains after accounting for length, which would improve the prediction model.
\(\boxed{\text{Engine size}}\)
