Question 1
• Are you an on campus student or an off campus student?
• In how many extracurricular activities do you participate?
The responses of the \(100\) students are summarized in the frequency table shown.
| Level of Participation in Extracurricular Activities | On campus | Off campus | Total |
|---|---|---|---|
| No activities | \(9\) | \(30\) | \(39\) |
| One activity | \(17\) | \(25\) | \(42\) |
| Two or more activities | \(7\) | \(12\) | \(19\) |
| Total | \(33\) | \(67\) | \(100\) |
The responses of the \(100\) students are summarized in the segmented bar graph shown.
\(H_0\): There is no association between residential status and level of participation in extracurricular activities among the students at the university.
\(H_a\): There is an association between residential status and level of participation in extracurricular activities among the students at the university.
The test resulted in a p-value of \(0.23\). Based on the p-value, what conclusion should the administrator make?
Most-appropriate topic codes (CED):
• TOPIC 2.2: Representing Two Categorical Variables — part (b)
• TOPIC 8.6: Carrying Out a Chi-Square Test for Homogeneity or Independence — part (c)
Answer/Explanation
(a)
On campus proportion: \(\frac{17+7}{33} = \frac{24}{33} \approx 0.727\)
Off campus proportion: \(\frac{25+12}{67} = \frac{37}{67} \approx 0.552\)
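The part (a) arithmetic can be checked with a short Python sketch. It uses the standard `fractions` module so the proportions stay exact. The `(none, one, two_plus)` tuple layout and the helper name `prop_at_least_one` are illustrative choices, not part of the question.

```python
from fractions import Fraction

# Counts from the frequency table, ordered as (no activities, one activity, two or more).
on_campus = (9, 17, 7)
off_campus = (30, 25, 12)


def prop_at_least_one(counts):
    """Proportion of a group that participates in one or more activities."""
    none, one, two_plus = counts
    return Fraction(one + two_plus, sum(counts))


p_on = prop_at_least_one(on_campus)
p_off = prop_at_least_one(off_campus)
print(p_on, round(float(p_on), 3))
print(p_off, round(float(p_off), 3))
```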
(b)
The graph suggests an association between residential status and participation: on-campus students tend to participate in more extracurricular activities than off-campus students. A larger proportion of off-campus students (\(\approx 45\%\)) participate in no activities compared to on-campus students (\(\approx 27\%\)). In addition, a larger proportion of on-campus students participate in one activity (\(\approx 52\%\)) compared to off-campus students (\(\approx 37\%\)). The proportions for two or more activities are similar (\(\approx 21\%\) on campus and \(\approx 18\%\) off campus).
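Each bar of a segmented bar graph shows the conditional distribution of participation within one residential group. The sketch below computes those distributions from the frequency table; the dictionary layout is an assumption made for illustration.

```python
# Counts from the frequency table, keyed by residential status.
counts = {
    "On campus": {"No activities": 9, "One activity": 17, "Two or more": 7},
    "Off campus": {"No activities": 30, "One activity": 25, "Two or more": 12},
}

# Conditional distribution of participation within each residential group:
# divide each count by that group's column total.
conditional = {
    group: {level: n / sum(levels.values()) for level, n in levels.items()}
    for group, levels in counts.items()
}

for group, dist in conditional.items():
    print(group, {level: f"{p:.1%}" for level, p in dist.items()})
```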
(c)
We compare the p-value to a standard significance level, such as \(\alpha = 0.05\). Since the p-value (\(0.23\)) is greater than \(\alpha\) (\(0.05\)), the administrator should **fail to reject the null hypothesis**. There is not convincing statistical evidence to conclude that an association exists between residential status and level of participation in extracurricular activities for all students at the university.
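The reported p-value of \(0.23\) can be reproduced from the frequency table. This is a sketch that assumes the test is a chi-square test with \((3-1)(2-1) = 2\) degrees of freedom (the statistic is the same for a test of independence or homogeneity). With \(df = 2\), the upper-tail chi-square probability has the closed form \(e^{-x/2}\), so only the standard library is needed.

```python
import math

# Observed counts: rows are participation levels, columns are (on campus, off campus).
observed = [
    [9, 30],   # No activities
    [17, 25],  # One activity
    [7, 12],   # Two or more activities
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
expected = []
for i, row in enumerate(observed):
    exp_row = []
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand  # (row total)(column total)/(table total)
        exp_row.append(exp)
        chi2 += (obs - exp) ** 2 / exp
    expected.append(exp_row)

df = (len(observed) - 1) * (len(observed[0]) - 1)
if df != 2:
    raise ValueError("the closed-form p-value below only applies when df = 2")
p_value = math.exp(-chi2 / 2)  # P(chi-square with 2 df > chi2)

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.3f}")
print("smallest expected count:", round(min(min(r) for r in expected), 2))
```

The smallest expected count is about \(6.27\), so every expected count is at least \(5\) and the chi-square approximation is reasonable.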
Question 2
(a) Calculate the probability that randomly selecting \(3\) people from a group of \(6\) men and \(3\) women will result in selecting \(3\) women.
(b) Based on your answer to part (a), is there reason to doubt the manager’s claim that the \(3\) people were selected at random? Explain.
(c) An alternative to calculating the exact probability is to conduct a simulation to estimate the probability. A proposed simulation process is described below.
Each trial in the simulation consists of rolling three fair, six-sided dice, one die for each of the convention attendees. For each die, rolling a \(1\), \(2\), \(3\), or \(4\) represents selecting a man; rolling a \(5\) or \(6\) represents selecting a woman. After \(1,000\) trials, the number of times the dice indicate selecting \(3\) women is recorded.
Does the proposed process correctly simulate the random selection of \(3\) women from a group of \(9\) people consisting of \(6\) men and \(3\) women? Explain why or why not.
Most-appropriate topic codes (CED):
• TOPIC 4.3: Introduction to Probability — part (b)
• TOPIC 4.2: Estimating Probabilities Using Simulation — part (c)
Answer/Explanation
(a)
\(P(\text{3 women}) = \frac{3}{9} \times \frac{2}{8} \times \frac{1}{7} = \frac{6}{504} = \frac{1}{84} \approx 0.0119\)
Answer: \(\boxed{0.012}\)
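The same probability follows from two equivalent approaches: multiplying the sequential without-replacement probabilities, or counting \(\binom{3}{3}/\binom{9}{3}\). A minimal Python check with exact fractions:

```python
from fractions import Fraction
from math import comb

# Sequential selection without replacement: 3/9 * 2/8 * 1/7.
p_seq = Fraction(3, 9) * Fraction(2, 8) * Fraction(1, 7)

# Counting: ways to choose all 3 women divided by ways to choose any 3 of the 9 people.
p_count = Fraction(comb(3, 3), comb(9, 3))

print(p_seq, p_count, round(float(p_seq), 4))
```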
(b)
Yes, there is reason to doubt the manager’s claim. The probability of selecting \(3\) women by random chance is only about \(1.2\%\), which is very small. Such a small probability suggests that this outcome is unlikely to have occurred by chance alone, so the claim of random selection is questionable.
(c)
No, the proposed process does not correctly simulate the random selection. The dice simulation assumes:
• Independent selections (dice rolls are independent)
• Constant probability of selecting a woman (\( \frac{2}{6} = \frac{1}{3} \))
However, the actual selection is:
• Dependent selections (sampling without replacement)
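• Changing probabilities (the probability of selecting a woman changes after each selection; for example, it drops from \(\frac{3}{9}\) to \(\frac{2}{8}\) after a woman is selected)
The dice simulation represents sampling with replacement (in effect, the same person could be selected more than once), not the actual without-replacement scenario.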
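The difference can be seen by simulating both processes. This is a sketch using Python's `random` module with a fixed seed; the function names and the \(100{,}000\)-trial count are illustrative assumptions. The dice process estimates \((1/3)^3 = 1/27 \approx 0.037\), roughly three times the correct without-replacement probability \(1/84 \approx 0.012\).

```python
import random


def dice_trial(rng):
    # Proposed process: three independent die rolls, where 5 or 6 counts as a woman.
    return all(rng.randint(1, 6) >= 5 for _ in range(3))


def draw_trial(rng):
    # Actual process: choose 3 distinct people from 6 men and 3 women (no replacement).
    people = ["M"] * 6 + ["W"] * 3
    return all(p == "W" for p in rng.sample(people, 3))


def estimate(trial, n_trials=100_000, seed=1):
    """Proportion of trials in which all 3 selected people are women."""
    rng = random.Random(seed)
    return sum(trial(rng) for _ in range(n_trials)) / n_trials


dice_est = estimate(dice_trial)
draw_est = estimate(draw_trial)
print(f"dice process: {dice_est:.4f} (exact 1/27 = {1 / 27:.4f})")
print(f"without replacement: {draw_est:.4f} (exact 1/84 = {1 / 84:.4f})")
```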
Question 3
(a) If more than \(140\) students are absent on the day the attendance count is taken for funding purposes, the school will lose some of its state funding in the subsequent year. Approximately what is the probability that High School A will lose some state funding?
(b) The principals’ association in the state suggests that instead of choosing one day at random, the state should choose \(3\) days at random. With the suggested plan, High School A would lose some of its state funding in the subsequent year if the mean number of students absent for the \(3\) days is greater than \(140\). Would High School A be more likely, less likely, or equally likely to lose funding using the suggested plan compared to the plan described in part (a)? Justify your choice.
(c) A typical school week consists of the days Monday, Tuesday, Wednesday, Thursday, and Friday. The principal at High School A believes that the number of absences tends to be greater on Mondays and Fridays, and there is concern that the school will lose state funding if the attendance count occurs on a Monday or Friday. If one school day is chosen at random from each of \(3\) typical school weeks, what is the probability that none of the \(3\) days chosen is a Tuesday, Wednesday, or Thursday?
Most-appropriate topic codes (CED):
• TOPIC 5.7: Sampling Distributions for Sample Means — part (b)
• TOPIC 4.6: Independent Events and Unions of Events — part (c)
Answer/Explanation
(a)
Using a normal model for daily absences with mean \(120\) and standard deviation \(10.5\):
\(z = \frac{140 - 120}{10.5} \approx 1.90\)
\(P(Z > 1.90) = 1 - 0.9713 = 0.0287\)
The probability is approximately \(0.029\).
Answer: \(\boxed{0.029}\)
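The tail probability can be checked with the standard library's `statistics.NormalDist`. This sketch assumes, as the calculation above does, that daily absences follow a normal distribution with mean \(120\) and standard deviation \(10.5\).

```python
from statistics import NormalDist

# Assumed model (from the values used above): daily absences ~ Normal(120, 10.5).
absences = NormalDist(mu=120, sigma=10.5)

z = (140 - absences.mean) / absences.stdev
p_lose = 1 - absences.cdf(140)  # P(X > 140)
print(f"z = {z:.4f}, P(X > 140) = {p_lose:.4f}")
```

Because \(z\) is not rounded to \(1.90\), the result (about \(0.028\)) is slightly smaller than the table-based \(0.0287\); both agree with the approximate answer.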
(b)
High School A would be less likely to lose funding using the suggested plan.
• With \(3\) days: \(\sigma_{\bar{x}} = \frac{10.5}{\sqrt{3}} \approx 6.062\)
• \(z = \frac{140 - 120}{6.062} \approx 3.30\)
• \(P(Z > 3.30) \approx 0.0005\)
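The probability decreases from about \(0.0287\) to about \(0.0005\). The sampling distribution of the mean number of absences for \(3\) days has the same center (\(120\)) but a smaller standard deviation (about \(6.062\) instead of \(10.5\)), so a mean greater than \(140\) lies much farther out in the tail.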
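A side-by-side comparison of the two plans, sketched with `statistics.NormalDist`. It assumes the absence counts on the \(3\) randomly chosen days are independent and each follows the Normal(\(120\), \(10.5\)) model, so the \(3\)-day mean is normal with standard deviation \(10.5/\sqrt{3}\).

```python
import math
from statistics import NormalDist

mu, sigma = 120, 10.5
n_days = 3

# Standard deviation of the sample mean for n independent days.
sd_mean = sigma / math.sqrt(n_days)

one_day = 1 - NormalDist(mu, sigma).cdf(140)           # plan in part (a)
three_day_mean = 1 - NormalDist(mu, sd_mean).cdf(140)  # suggested plan

print(f"SD of 3-day mean = {sd_mean:.3f}")
print(f"one day: {one_day:.4f}, mean of 3 days: {three_day_mean:.5f}")
```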
(c)
Probability of selecting Monday or Friday in one week: \(\frac{2}{5} = 0.4\)
Since weeks are independent: \((0.4)^3 = 0.064\)
The probability that none of the \(3\) days is Tuesday, Wednesday, or Thursday is \(0.064\).
Answer: \(\boxed{0.064}\)
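Because one day is chosen at random from each week, all \(5^3 = 125\) combinations are equally likely. Enumerating them with `itertools.product` confirms the multiplication-rule answer; the day abbreviations are illustrative.

```python
from fractions import Fraction
from itertools import product

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Every equally likely way to pick one day from each of 3 weeks.
outcomes = list(product(days, repeat=3))

# Keep the outcomes where every chosen day is a Monday or a Friday.
favorable = [o for o in outcomes if all(d in ("Mon", "Fri") for d in o)]

p = Fraction(len(favorable), len(outcomes))
print(len(favorable), len(outcomes), p, float(p))
```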
Question 4
Most-appropriate topic codes (CED):
• TOPIC 3.2: Introduction to Planning a Study — part (b)
• TOPIC 3.4: Potential Problems with Sampling — part (b)
Answer/Explanation
(a)
The median is resistant to skewness and outliers, while the mean is not. Income distributions are typically right-skewed, with a small number of very high incomes. These high incomes would pull the mean upward but have little effect on the median. Therefore, the median would provide a better estimate of a typical income than the mean.
\(\boxed{\text{Median is resistant to skewness and outliers}}\)
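The resistance of the median can be illustrated with a small example. The incomes below are **hypothetical values chosen only for illustration**; they are not data from the question. Adding one very large income, as happens in a right-skewed distribution, moves the mean by more than \(\$100{,}000\) but moves the median only slightly.

```python
import statistics

# Hypothetical, illustrative incomes in dollars (NOT data from the question).
incomes = [32_000, 38_000, 41_000, 45_000, 52_000, 58_000, 61_000]
mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

# Add one very high income, producing a long right tail.
with_high = incomes + [950_000]
mean_after = statistics.mean(with_high)
median_after = statistics.median(with_high)

print(f"before: mean = {mean_before:,.0f}, median = {median_before:,.0f}")
print(f"after:  mean = {mean_after:,.0f}, median = {median_after:,.0f}")
```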
(b)
Method 2 is better. Method 1 uses voluntary response, which is likely to produce a biased sample. People with higher incomes may be more likely to respond, leading to an overestimate of the mean income. Method 2 uses a random sample with follow-up to ensure responses, which should produce a more representative sample and a better estimate of the population mean income, despite the smaller sample size.
\(\boxed{\text{Method 2}}\)
Question 5
A researcher conducted a study to investigate whether local car dealers tend to charge women more than men for the same car model. Using information from the county tax collector’s records, the researcher randomly selected one man and one woman from among everyone who had purchased the same model of an identically equipped car from the same dealer. The process was repeated for a total of \(8\) randomly selected car models.
The purchase prices and the differences (woman – man) are shown in the table below. Summary statistics are also shown.
| Car model | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Women | \(20,100\) | \(17,400\) | \(22,300\) | \(32,500\) | \(17,710\) | \(21,500\) | \(29,600\) | \(46,300\) |
| Men | \(19,580\) | \(17,500\) | \(21,400\) | \(32,300\) | \(17,720\) | \(20,300\) | \(28,300\) | \(45,630\) |
| Difference | \(520\) | \(-100\) | \(900\) | \(200\) | \(-10\) | \(1,200\) | \(1,300\) | \(670\) |
| | Mean (dollars) | Standard Deviation (dollars) |
|---|---|---|
| Women | \(25,926.25\) | \(9,846.61\) |
| Men | \(25,341.25\) | \(9,728.60\) |
| Difference | \(585.00\) | \(530.71\) |
Dotplots of the data and the differences are shown below.
Do the data provide convincing evidence that, on average, women pay more than men in the county for the same car model?
Most-appropriate topic codes (CED):
• TOPIC 7.4: Setting Up a Test for a Population Mean
• TOPIC 7.5: Carrying Out a Test for a Population Mean
Answer/Explanation
State:
We will perform a paired t-test at significance level \(\alpha = 0.05\).
Let \(\mu_{\text{diff}}\) be the population mean difference in purchase price (woman – man).
\(H_0: \mu_{\text{diff}} = 0\)
\(H_a: \mu_{\text{diff}} > 0\)
Plan:
We check the conditions for inference:
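1. Random: The car models were randomly selected, and one man and one woman were randomly selected for each model.
2. Normality: The sample size (\(n = 8\)) is small, but the dotplot of the differences shows no strong skewness or outliers, so it is reasonable to use a t-procedure.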
Do:
Test statistic: \(t = \frac{\bar{x}_{\text{diff}} - 0}{s_{\text{diff}}/\sqrt{n}} = \frac{585 - 0}{530.71/\sqrt{8}} \approx 3.12\)
Degrees of freedom: \(df = 7\)
p-value: \(P(t > 3.12) \approx 0.008\)
Conclude:
Since the p-value (\(0.008\)) is less than \(\alpha = 0.05\), we reject \(H_0\).
There is convincing statistical evidence that, on average, women pay more than men in the county for the same car model.
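The Do step can be reproduced from the raw price table. This sketch uses only the Python standard library. Since the standard library has no t-distribution function, the hypothetical helper `t_upper_tail` approximates the one-sided p-value by integrating the t density with Simpson's rule; that numerical method is an assumption of this sketch, not how a calculator computes it.

```python
import math
import statistics

# Purchase prices from the table (dollars), car models 1-8.
women = [20100, 17400, 22300, 32500, 17710, 21500, 29600, 46300]
men = [19580, 17500, 21400, 32300, 17720, 20300, 28300, 45630]

# Paired differences (woman - man) for each car model.
diffs = [w - m for w, m in zip(women, men)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
t_stat = (mean_d - 0) / (sd_d / math.sqrt(n))
df = n - 1


def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_upper_tail(t: float, nu: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on the density over [0, t]."""
    h = t / steps
    total = t_pdf(0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, nu)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # the density is symmetric, so P(T > 0) = 0.5


p_value = t_upper_tail(t_stat, df)
print(diffs)
print(f"mean = {mean_d:.2f}, sd = {sd_d:.2f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

The result matches the hand calculation: \(t \approx 3.12\) with \(7\) degrees of freedom and a one-sided p-value of about \(0.008\).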
Question 6
\[ \text{FCR} = -1.595789 + 0.0372614 \times \text{Length} \]
RSquare: \(0.250401\)
Root Mean Square Error: \(0.902382\)
Observations: \(66\)
| FCR | Length (inches) | Engine Size (liters) | Wheel Base (inches) |
|---|---|---|---|
| \(5.88\) | \(175\) | \(3.6\) | \(93\) |
Most-appropriate topic codes (CED):
• TOPIC 2.7: Residuals — part (b)
• TOPIC 2.4: Representing the Relationship Between Two Quantitative Variables — part (c)
• TOPIC 2.8: Least Squares Regression — part (d)
Answer/Explanation
(a)
Predicted FCR = \(-1.595789 + 0.0372614 \times 175 = 4.925\)
Residual = actual - predicted = \(5.88 - 4.925 = 0.955\)
Interpretation: The car’s actual FCR is \(0.955\) gallons per \(100\) miles greater than predicted for a car of its length.
\(\boxed{0.955}\)
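A quick check of the prediction and residual, using the coefficients from the regression output above:

```python
# Coefficients from the regression output: FCR = intercept + slope * Length.
intercept = -1.595789
slope = 0.0372614

length = 175        # inches
actual_fcr = 5.88   # gallons per 100 miles

predicted = intercept + slope * length
residual = actual_fcr - predicted  # residual = actual - predicted
print(round(predicted, 3), round(residual, 3))
```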
(b i)
The point on graph III with wheel base \(93\) inches and residual approximately \(0.96\) should be circled.
(b ii)
Because point B lies close to the horizontal line at \(0\), its residual is near zero: the car’s actual FCR is very close to the FCR predicted by the regression model using length alone.
(c)
Graph II shows a moderate positive linear association between engine size and residuals from the FCR-length regression. Graph III shows a weak association with no clear pattern between wheel base and these residuals. The association is stronger in graph II than in graph III.
(d)
Jamal should use engine size. The stronger association in graph II indicates that engine size explains more of the variation in FCR that remains after accounting for length, which would improve the prediction model.
\(\boxed{\text{Engine size}}\)
