Question 1
Lead Levels
| Stem | Leaves |
|---|---|
| 2 | 8 |
| 3 | 0 |
| 3 | 5 8 8 |
| 4 | 1 1 2 |
| 4 | 6 8 8 |
| 5 | 0 1 2 2 3 4 |
| 5 | 9 9 |
| 6 | 3 4 |
| 6 | 6 8 |

Key: stem 2, leaf 8 represents \(2.8\) ppm.
(a) What proportion of crows in the sample had lead levels that are classified by the biologist as unhealthy?
(b) The mean lead level of the \(23\) crows in the sample was \(4.90\) ppm and the standard deviation was \(1.12\) ppm. Construct and interpret a \(95\%\) confidence interval for the mean lead level of crows in the region.
Most-appropriate topic codes (CED):
• TOPIC 7.2: Constructing a Confidence Interval for a Population Mean — part (b)
▶️ Answer/Explanation
(a)
From the stemplot, crows with lead levels > \(6.0\) ppm are: \(6.3\), \(6.4\), \(6.6\), \(6.8\).
Number of unhealthy crows: \(4\)
Total sample size: \(23\)
Proportion: \(\frac{4}{23} \approx 0.174\)
Answer: \(\boxed{0.174}\)
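The count in part (a) can be double-checked with a short Python sketch. The list below is a transcription of the stemplot (assuming each stem is the ones digit and each leaf the tenths digit), and the variable names are illustrative only.

```python
# Lead levels (ppm) transcribed from the stemplot, e.g. stem 2, leaf 8 -> 2.8
lead_levels = [
    2.8, 3.0, 3.5, 3.8, 3.8, 4.1, 4.1, 4.2, 4.6, 4.8, 4.8,
    5.0, 5.1, 5.2, 5.2, 5.3, 5.4, 5.9, 5.9, 6.3, 6.4, 6.6, 6.8,
]
threshold = 6.0  # ppm; levels above this are counted as unhealthy in the answer above

unhealthy = [x for x in lead_levels if x > threshold]
proportion = len(unhealthy) / len(lead_levels)

print(len(unhealthy), len(lead_levels), round(proportion, 3))  # 4 23 0.174
```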
(b)
Step 1: Identify procedure and check conditions
We use a one-sample t-interval for a population mean.
• Random: Random sample stated
• Normality: The sample size \(n = 23\) is less than \(30\), but the stemplot shows no strong skewness or outliers, so a t-interval is appropriate
Step 2: Calculate the interval
Formula: \(\bar{x} \pm t^* \frac{s}{\sqrt{n}}\)
\(\bar{x} = 4.90\), \(s = 1.12\), \(n = 23\)
Degrees of freedom: \(df = 22\)
Critical value: \(t^* = 2.074\)
Margin of error: \(2.074 \times \frac{1.12}{\sqrt{23}} \approx 0.484\)
Interval: \(4.90 \pm 0.484 = (4.416, 5.384)\)
Step 3: Interpretation
We are \(95\%\) confident that the true population mean lead level of all crows in this region is between \(4.416\) ppm and \(5.384\) ppm.
Answer: \(\boxed{(4.416, 5.384)}\)
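As a quick arithmetic check of Step 2, the interval can be recomputed in Python. This is a minimal sketch: the critical value \(t^* = 2.074\) is taken from the solution above rather than computed, and the variable names are illustrative.

```python
import math

x_bar, s, n = 4.90, 1.12, 23
t_star = 2.074  # critical value for 95% confidence with df = 22, as given above

standard_error = s / math.sqrt(n)
margin = t_star * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 3), round(lower, 3), round(upper, 3))  # 0.484 4.416 5.384
```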
Question 2
An administrator at a large university wants to conduct a survey to estimate the proportion of students who are satisfied with the appearance of the university buildings and grounds. The administrator is considering three methods of obtaining a sample of \(500\) students from the \(70,000\) students at the university.
(a) Because of financial constraints, the first method the administrator is considering consists of taking a convenience sample to keep the expenses low. A very large number of students will attend the first football game of the season, and the first \(500\) students who enter the football stadium could be used as a sample. Why might such a sampling method be biased in producing an estimate of the proportion of students who are satisfied with the appearance of the buildings and grounds?
(b) Because of the large number of students at the university, the second method the administrator is considering consists of using a computer with a random number generator to select a simple random sample of \(500\) students from a list of \(70,000\) student names. Describe how to implement such a method.
(c) Because stratification can often provide a more precise estimate than a simple random sample, the third method the administrator is considering consists of selecting a stratified random sample of \(500\) students. The university has two campuses with male and female students at each campus. Under what circumstance(s) would stratification by campus provide a more precise estimate of the proportion of students who are satisfied with the appearance of the university buildings and grounds than stratification by gender?
Most-appropriate topic codes (CED):
• TOPIC 3.3: Random Sampling and Data Collection — part (b)
• TOPIC 3.2: Introduction to Planning a Study — part (c)
▶️ Answer/Explanation
(a)
This is a convenience sample. The first \(500\) students arriving at a football game are not likely to be representative of the entire student population. These students might have more school pride than the average student. This pride could be associated with having a more positive opinion about the university’s appearance. If so, the sample proportion of satisfied students would likely be higher than the true population proportion, leading to a biased estimate.
(b)
A simple random sample (SRS) can be implemented as follows:
- Obtain a complete list of all \(70,000\) students at the university.
- Assign a unique identification number to each student, from \(1\) to \(70,000\).
- Use a computer’s random number generator to select \(500\) unique integers between \(1\) and \(70,000\) (sampling without replacement).
- The \(500\) students corresponding to the selected numbers will form the sample.
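The steps above could be carried out with Python's standard `random` module, for example. This is only a sketch: the seed is arbitrary (used here for reproducibility), and the mapping from ID numbers back to student names is assumed to exist elsewhere.

```python
import random

population_size = 70_000
sample_size = 500

rng = random.Random(2024)  # arbitrary seed, only so the selection can be reproduced

# random.sample draws without replacement, so all selected IDs are distinct
selected_ids = rng.sample(range(1, population_size + 1), sample_size)

print(len(selected_ids), len(set(selected_ids)))  # 500 500
```

Because `sample` never repeats a value, there is no need to discard duplicates and redraw.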
(c)
Stratification provides a more precise estimate when the variability of responses within each stratum is low, and the variability between strata is high.
Therefore, stratification by campus would be more precise if the opinions on appearance differ significantly between the two campuses, but are similar within each campus. This strategy would be advantageous if the difference in opinions between the two campuses is greater than the difference in opinions between genders.
Question 3
Each full carton of Grade A eggs consists of \(1\) randomly selected empty cardboard container and \(12\) randomly selected eggs. The weights of such full cartons are approximately normally distributed with a mean of \(840\) grams and a standard deviation of \(7.9\) grams.
(a) What is the probability that a randomly selected full carton of Grade A eggs will weigh more than \(850\) grams?
The weights of the empty cardboard containers have a mean of \(20\) grams and a standard deviation of \(1.7\) grams. It is reasonable to assume independence between the weights of the empty cardboard containers and the weights of the eggs. It is also reasonable to assume independence among the weights of the \(12\) eggs that are randomly selected for a full carton.
Let the random variable \(X\) be the weight of a single randomly selected Grade A egg.
(b)
i) What is the mean of \(X\)?
ii) What is the standard deviation of \(X\)?
Most-appropriate topic codes (CED):
• TOPIC 4.9: Combining Random Variables — part (b)
▶️ Answer/Explanation
(a)
Let \(W\) be the weight of a randomly selected full carton. We are given that \(W\) follows a Normal distribution with a mean \(\mu_W = 840\) grams and a standard deviation \(\sigma_W = 7.9\) grams. We want to find \(P(W > 850)\).
1. Find the z-score: \[ z = \frac{W - \mu_W}{\sigma_W} = \frac{850 - 840}{7.9} \approx 1.27 \]
2. Find the probability: Using a standard normal table or calculator: \[ P(W > 850) = P(Z > 1.27) \approx 1 - 0.8980 = 0.1020 \] The probability that a randomly selected carton weighs more than \(850\) grams is approximately \(0.1020\).
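The same probability can be checked with `statistics.NormalDist` from the Python standard library. Note that software uses the unrounded z-score (about \(1.266\)), so it returns roughly \(0.103\) rather than the table value \(0.1020\) obtained from \(z = 1.27\). Variable names here are illustrative.

```python
from statistics import NormalDist

carton = NormalDist(mu=840, sigma=7.9)  # full-carton weight in grams

z = (850 - carton.mean) / carton.stdev
p_over_850 = 1 - carton.cdf(850)  # upper-tail probability P(W > 850)

print(round(z, 2), round(p_over_850, 3))
```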
(b)
Let \(P\) be the weight of the packaging and \(X_i\) be the weight of the \(i\)-th egg. The total weight of the carton is \(W = P + X_1 + X_2 + \dots + X_{12}\).
i) Mean of \(X\): The rule for means is \(E(A+B) = E(A) + E(B)\). \[ E(W) = E(P + X_1 + \dots + X_{12}) \] \[ E(W) = E(P) + E(X_1) + \dots + E(X_{12}) \] Since all eggs have the same mean, \(E(X_i) = E(X)\): \[ E(W) = E(P) + 12 \times E(X) \] We are given \(E(W) = 840\) and \(E(P) = 20\). \[ 840 = 20 + 12 \times E(X) \] \[ 820 = 12 \times E(X) \] \[ E(X) = \frac{820}{12} \approx 68.33 \text{ grams} \]
ii) Standard deviation of \(X\): The rule for variances of independent variables is \(Var(A+B) = Var(A) + Var(B)\). \[ Var(W) = Var(P + X_1 + \dots + X_{12}) \] \[ Var(W) = Var(P) + Var(X_1) + \dots + Var(X_{12}) \] Since all eggs have the same variance, \(Var(X_i) = Var(X)\): \[ Var(W) = Var(P) + 12 \times Var(X) \] We must use variances (standard deviation squared). We are given \(\sigma_W = 7.9\) and \(\sigma_P = 1.7\). \[ Var(W) = (7.9)^2 = 62.41 \] \[ Var(P) = (1.7)^2 = 2.89 \] Substitute these values into the equation: \[ 62.41 = 2.89 + 12 \times Var(X) \] \[ 59.52 = 12 \times Var(X) \] \[ Var(X) = \frac{59.52}{12} = 4.96 \] The standard deviation is the square root of the variance: \[ SD(X) = \sqrt{Var(X)} = \sqrt{4.96} \approx 2.23 \text{ grams} \]
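Both parts of (b) can be verified numerically. The sketch below simply rearranges \(E(W) = E(P) + 12E(X)\) and \(Var(W) = Var(P) + 12Var(X)\); the variable names are illustrative.

```python
import math

mean_carton, sd_carton = 840, 7.9       # full carton (grams)
mean_container, sd_container = 20, 1.7  # empty container (grams)
n_eggs = 12

# Means add directly; variances add because the weights are independent
mean_egg = (mean_carton - mean_container) / n_eggs
var_egg = (sd_carton**2 - sd_container**2) / n_eggs
sd_egg = math.sqrt(var_egg)

print(round(mean_egg, 2), round(var_egg, 2), round(sd_egg, 2))  # 68.33 4.96 2.23
```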
Question 4
The Behavioral Risk Factor Surveillance System is an ongoing health survey system that tracks health conditions and risk behaviors in the United States. In one of their studies, a random sample of \(8,866\) adults answered the question “Do you consume five or more servings of fruits and vegetables per day?” The data are summarized by response and by age-group in the frequency table below.
| Age-Group (years) | Yes | No | Total |
|---|---|---|---|
| 18–34 | 231 | 741 | 972 |
| 35–54 | 669 | 2,242 | 2,911 |
| 55 or older | 1,291 | 3,692 | 4,983 |
| Total | 2,191 | 6,675 | 8,866 |
Do the data provide convincing statistical evidence that there is an association between age-group and whether or not a person consumes five or more servings of fruits and vegetables per day for adults in the United States?
Most-appropriate topic codes (CED):
• TOPIC 8.5: Setting Up a Chi-Square Test for Independence or Homogeneity
• TOPIC 8.6: Carrying Out a Chi-Square Test for Independence or Homogeneity
▶️ Answer/Explanation
State:
We will perform a chi-square test for independence at significance level \(\alpha = 0.05\).
\(H_0\): There is no association between age-group and fruit/vegetable consumption.
\(H_a\): There is an association between age-group and fruit/vegetable consumption.
Plan:
We check the conditions for inference:
1. Random: Random sample stated.
2. Large Counts: All expected counts ≥ 5.
Expected counts table:
| Age-Group (years) | Yes (Expected) | No (Expected) |
|---|---|---|
| 18–34 | 240.2 | 731.8 |
| 35–54 | 719.4 | 2,191.6 |
| 55 or older | 1,231.4 | 3,751.6 |
All expected counts are greater than \(5\) (the smallest is \(240.2\)), so the condition is met.
Do:
Calculate test statistic:
\(\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(231-240.2)^2}{240.2} + \frac{(741-731.8)^2}{731.8} + \frac{(669-719.4)^2}{719.4} + \frac{(2242-2191.6)^2}{2191.6} + \frac{(1291-1231.4)^2}{1231.4} + \frac{(3692-3751.6)^2}{3751.6} \approx 8.983\)
Degrees of freedom: \((3-1)(2-1) = 2\)
p-value: \(P(\chi^2 \geq 8.983) \approx 0.011\)
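The expected counts, test statistic, and p-value can be reproduced with standard-library Python. This sketch relies on the fact that for \(df = 2\) the chi-square upper-tail probability has the closed form \(e^{-x/2}\); for other degrees of freedom a statistics package would be needed. Names are illustrative.

```python
import math

# Observed counts from the table: (Yes, No) for each age-group
observed = {
    "18-34": (231, 741),
    "35-54": (669, 2242),
    "55 or older": (1291, 3692),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

chi_square = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        expected = row_total * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (2 - 1)
p_value = math.exp(-chi_square / 2)  # exact upper tail only because df == 2

print(round(chi_square, 3), df, round(p_value, 3))
```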
Conclude:
Since p-value (\(0.011\)) < \(\alpha\) (\(0.05\)), we reject \(H_0\).
There is convincing statistical evidence that there is an association between age-group and fruit/vegetable consumption for U.S. adults.
Question 5
Psychologists interested in the relationship between meditation and health conducted a study with a random sample of \(28\) men who live in a large retirement community. Of the men in the sample, \(11\) reported that they participate in daily meditation and \(17\) reported that they do not participate in daily meditation.
The researchers wanted to perform a hypothesis test of
\[H_{0}:p_{m}-p_{c}=0\] \[H_{a}:p_{m}-p_{c}<0,\] where \(p_{m}\) is the proportion of men with high blood pressure among all the men in the retirement community who participate in daily meditation and \(p_{c}\) is the proportion of men with high blood pressure among all the men in the retirement community who do not participate in daily meditation.
(a) If the study were to provide significant evidence against \(H_{0}\) in favor of \(H_{a}\), would it be reasonable for the psychologists to conclude that daily meditation causes a reduction in blood pressure for men in the retirement community? Explain why or why not.
The psychologists found that of the \(11\) men in the study who participate in daily meditation, \(0\) had high blood pressure. Of the \(17\) men who do not participate in daily meditation, \(8\) had high blood pressure.
(b) Let \(\hat{p}_{m}\) represent the proportion of men with high blood pressure among those in a random sample of \(11\) who meditate daily, and let \(\hat{p}_{c}\) represent the proportion of men with high blood pressure among those in a random sample of \(17\) who do not meditate daily. Why is it not reasonable to use a normal approximation for the sampling distribution of \(\hat{p}_{m}-\hat{p}_{c}\)?
(c) Although a normal approximation cannot be used, it is possible to simulate the distribution of \(\hat{p}_{m}-\hat{p}_{c}\). Under the assumption that the null hypothesis is true, \(10,000\) values of \(\hat{p}_{m}-\hat{p}_{c}\) were simulated. The histogram below shows the results of the simulation.
![]()
Based on the results of the simulation, what can be concluded about the relationship between blood pressure and meditation among men in the retirement community?
Most-appropriate topic codes (CED):
• TOPIC 6.11: Carrying Out a Test for the Difference of Two Population Proportions — part (b)
• TOPIC 4.2: Estimating Probabilities Using Simulation — part (c)
• TOPIC 6.11: Carrying Out a Test for the Difference of Two Population Proportions — part (c)
▶️ Answer/Explanation
(a)
No, it would not be reasonable to conclude causation. Because this is an observational study and not a randomized experiment, a cause-and-effect relationship cannot be established. The men self-selected whether to meditate. It is possible that men who choose to meditate are different in other lifestyle ways (e.g., diet, exercise) that also reduce blood pressure. These other factors are potential confounding variables.
(b)
It is not reasonable because the Large Counts Condition for normality is not met. For a hypothesis test, we check the expected counts using the pooled proportion of successes, \(\hat{p}_{pooled}\).
1. Find pooled proportion: \[\hat{p}_{pooled} = \frac{\text{total successes}}{\text{total } n} = \frac{0 + 8}{11 + 17} = \frac{8}{28} \approx 0.286\]
2. Check expected counts: We must check that all four expected counts (successes and failures for both groups) are at least \(10\).
- Expected successes (meditators): \(n_m \hat{p}_{pooled} = 11\left(\frac{8}{28}\right) \approx 3.14\)
- Expected successes (non-meditators): \(n_c \hat{p}_{pooled} = 17\left(\frac{8}{28}\right) \approx 4.86\)
Since \(3.14 < 10\) and \(4.86 < 10\), the condition is not met. (Note: The number of observed successes in the meditation group, \(0\), is also less than \(10\), which is another valid check).
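For completeness, all four expected counts can be listed with a few lines of Python. This is a sketch of the check described above, with illustrative names; here "success" means having high blood pressure.

```python
n_m, n_c = 11, 17            # meditators, non-meditators
high_bp_m, high_bp_c = 0, 8  # observed counts with high blood pressure

p_pooled = (high_bp_m + high_bp_c) / (n_m + n_c)

expected = {
    "meditators, high BP": n_m * p_pooled,
    "meditators, not high BP": n_m * (1 - p_pooled),
    "non-meditators, high BP": n_c * p_pooled,
    "non-meditators, not high BP": n_c * (1 - p_pooled),
}
condition_met = all(count >= 10 for count in expected.values())

for label, count in expected.items():
    print(f"{label}: {count:.2f}")
print("Large Counts condition met:", condition_met)  # False
```

Three of the four expected counts fall below \(10\), so the condition fails regardless of which group is examined first.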
(c)
1. Calculate observed statistic: The observed difference in proportions is: \[\hat{p}_m - \hat{p}_c = \frac{0}{11} - \frac{8}{17} \approx -0.4706\]
2. Find p-value from simulation: The simulation was run \(10,000\) times assuming the null hypothesis (\(H_0: p_m - p_c = 0\)) is true. We need to find the probability of getting a result as or more extreme than our observed statistic (\(-0.4706\)). According to the histogram, the value \(-0.47\) (which represents the bin for \(\le -0.47\)) occurred \(76\) times.
The approximate p-value is \(P(\hat{p}_m - \hat{p}_c \le -0.47) = \frac{76}{10000} = 0.0076\).
3. Conclude: Because the p-value of \(0.0076\) is very small (e.g., smaller than \(\alpha = 0.05\)), we reject \(H_0\). There is convincing statistical evidence that men in this retirement community who meditate are less likely to have high blood pressure than men who do not meditate. However, as stated in part (a), this is an observational study, so we can only conclude an association, not a causal relationship.
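The question does not say how the \(10,000\) values were simulated. One common approach, assumed here purely for illustration, is to keep the \(8\) high-blood-pressure outcomes fixed and randomly reassign them to groups of \(11\) and \(17\). Under that assumption the sketch below gives a simulated p-value close to the \(0.0076\) read from the histogram, along with the exact probability for comparison.

```python
import math
import random

n_m, n_c = 11, 17
high_bp_total = 8
observed_diff = 0 / n_m - 8 / n_c

# 1 = high blood pressure, 0 = not; group labels are reshuffled under H0
outcomes = [1] * high_bp_total + [0] * (n_m + n_c - high_bp_total)

rng = random.Random(1)  # arbitrary seed for reproducibility
trials = 10_000
extreme = 0
for _ in range(trials):
    rng.shuffle(outcomes)
    m_count = sum(outcomes[:n_m])  # high-BP men landing in the "meditation" group
    diff = m_count / n_m - (high_bp_total - m_count) / n_c
    if diff <= observed_diff + 1e-12:  # small tolerance for floating-point comparison
        extreme += 1
simulated_p = extreme / trials

# Exact probability that none of the 8 high-BP men land in the group of 11
exact_p = math.comb(n_c, high_bp_total) / math.comb(n_m + n_c, high_bp_total)

print(simulated_p, round(exact_p, 4))
```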
Question 6
Tropical storms in the Pacific Ocean with sustained winds that exceed 74 miles per hour are called typhoons. Graph A below displays the number of recorded typhoons in two regions of the Pacific Ocean—the Eastern Pacific and the Western Pacific—for the years from 1997 to 2010.
Graph A: Yearly Frequency of Typhoons (1997-2010)
(a) Compare the distributions of yearly frequencies of typhoons for the two regions of the Pacific Ocean for the years from 1997 to 2010.
(b) For each region, describe how the yearly frequencies changed over the time period from 1997 to 2010.
A moving average for data collected at regular time increments is the average of data values for two or more consecutive increments. The 4-year moving averages for the typhoon data are provided in the table below.![]()
(c) Show how to calculate the 4-year moving average for the year 2010 in the Western Pacific. Write your value in the appropriate place in the table.
(d) Graph B below shows both yearly frequencies (connected by dashed lines) and the respective 4-year moving averages (connected by solid lines). Use your answer in part (c) to complete the graph.
(e) Consider graph B.
i) What information is more apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons?
ii) What information is less apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons?
Most-appropriate topic codes (CED):
• TOPIC 2.4: Representing the Relationship Between Two Quantitative Variables — part (b)
• TOPIC 7.1: Introducing Statistics: Should I Worry About Error? — parts (c)-(e)
▶️ Answer/Explanation
(a)
1. Center: The Western Pacific had more typhoons than the Eastern Pacific in almost all years. The average was about \(31\) typhoons/year for Western Pacific vs. about \(19\) typhoons/year for Eastern Pacific.
2. Variability: The Western Pacific showed more variability (range ≈ \(21\) typhoons) than the Eastern Pacific (range ≈ \(10\) typhoons).
3. Context: The Western Pacific region consistently experiences more typhoons annually with greater year-to-year fluctuation.
(b)
• Western Pacific: Showed a clear decreasing trend over the time period, especially from around \(2001\) to \(2010\).
• Eastern Pacific: Remained relatively consistent over the time period, with a slight increasing trend in the later years (\(2005\)-\(2010\)).
(c)
The 4-year moving average for Western Pacific in \(2010\) uses data from \(2007\), \(2008\), \(2009\), and \(2010\):
\[ \frac{28 + 27 + 28 + 18}{4} = \frac{101}{4} = 25.25 \]
The value \(25.25\) should be written in the table for Western Pacific 4-year moving average for \(2010\).
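The calculation in part (c) amounts to averaging the four most recent yearly counts. A minimal Python sketch, using the 2007 to 2010 Western Pacific values from the solution above (the function name is illustrative):

```python
def moving_average(values, window=4):
    """Average of the last `window` values in the sequence."""
    recent = values[-window:]
    return sum(recent) / len(recent)

western_2007_2010 = [28, 27, 28, 18]  # yearly typhoon counts, 2007 through 2010
avg_2010 = moving_average(western_2007_2010)

print(avg_2010)  # 25.25
```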
(d)
On Graph B, plot the point for Western Pacific 4-year moving average at year \(2010\) and frequency \(25.25\), then connect this point to the previous moving average point at \(2009\).
(e)
i) More apparent: The overall long-term trends are more visible. The moving averages clearly show the decreasing trend in Western Pacific and the slight increasing trend in Eastern Pacific.
ii) Less apparent: The year-to-year variability is smoothed out, making individual yearly fluctuations less noticeable.
