Question 1
(a) Topic: 6.4 Sampling and estimation
(b) Topic: 6.4 Sampling and estimation
The heights of a certain species of deer are known to have standard deviation 0.35 m. A zoologist takes a random sample of 150 of these deer and finds that the mean height of the deer in the sample is 1.42 m.
(a) Calculate a 96% confidence interval for the population mean height.
(b) Bubay says that 96% of deer of this species are likely to have heights that are within this confidence interval. Explain briefly whether Bubay is correct.
Solution :-
Part (a): Calculating the 96% Confidence Interval
Given:
- Sample mean (\(\bar{x}\)) = 1.42 m
- Population standard deviation (\(\sigma\)) = 0.35 m
- Sample size (n) = 150
- Confidence level = 96%
For a 96% confidence interval, 2% of the standard normal distribution lies in each tail, so the required z-value is \(\Phi^{-1}(0.98) \approx 2.054\) (2.055 is also accepted). (Accept 3 sf if nothing better seen (2.05 or 2.06).)
The formula for the confidence interval is:
\(\bar{x} \pm z \frac{\sigma}{\sqrt{n}}\)
Plugging in the values:
\(1.42 \pm z \frac{0.35}{\sqrt{150}}\) (Must be a z value.)
\(1.42 \pm 2.054 \times \frac{0.35}{\sqrt{150}}\)
\(1.42 \pm 2.054 \times \frac{0.35}{12.247}\)
\(1.42 \pm 2.054 \times 0.02857\)
\(1.42 \pm 0.0587\)
The confidence interval is:
(1.42 – 0.0587, 1.42 + 0.0587)
(1.3613, 1.4787)
Therefore, the 96% confidence interval for the population mean height is approximately (1.36 m, 1.48 m) (3 sf). (Correct working only. Must be an interval.)
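The calculation in part (a) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_confidence_interval` is an illustrative choice, not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, level):
    """Two-sided z-interval for a population mean when sigma is known."""
    # For a 96% interval, 2% lies in each tail, so the z-value is the
    # 98th percentile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


lower, upper = z_confidence_interval(mean=1.42, sigma=0.35, n=150, level=0.96)
print(f"({lower:.4f}, {upper:.4f})")
```

To 3 significant figures the two limits agree with the interval (1.36, 1.48) found above.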
Part (b): Evaluating Bubay’s Statement
Bubay’s statement is incorrect. (Or similar. Need both.)
Explanation:
A confidence interval estimates the range within which the population mean is likely to fall, not the range within which individual data points (deer heights) are likely to fall. The 96% confidence interval means that we are 96% confident that the true population mean height lies within the calculated interval. It does not mean that 96% of individual deer heights are within this interval. Individual deer heights will vary according to the distribution of heights, which is characterized by the standard deviation.
In short, a confidence interval is a statement about the population mean, not about individual values.
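To see how far Bubay's claim is from the truth, the sketch below estimates the proportion of individual deer whose heights fall inside the interval. It relies on an assumption that the question does not state: that individual heights are normally distributed with mean 1.42 m and standard deviation 0.35 m.

```python
from statistics import NormalDist

# Assumption for illustration only: individual heights are normal with
# mean 1.42 m (the sample mean) and standard deviation 0.35 m.
heights = NormalDist(mu=1.42, sigma=0.35)

# Confidence limits from part (a).
lower, upper = 1.3613, 1.4787

# Proportion of individual deer with heights inside the interval.
share_inside = heights.cdf(upper) - heights.cdf(lower)
print(f"{share_inside:.3f}")
```

Under this assumption only a small fraction of deer (well under 20%) have heights inside the interval, nowhere near 96%.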
Question 2
Topic: 6.2 Linear combinations of random variables
The masses, in kilograms, of small and large bags of wheat have the independent distributions N(16.0, 0.4) and N(51.0, 0.9) respectively.
Find the probability that the total mass of 3 randomly chosen small bags is greater than the mass of one randomly chosen large bag.
Solution :-
Let \(S\) be the mass of a small bag and \(L\) be the mass of a large bag.
Given distributions:
- \(S \sim N(16.0, 0.4)\)
- \(L \sim N(51.0, 0.9)\)
Let \(T\) be the total mass of 3 small bags, with individual masses \(S_1\), \(S_2\) and \(S_3\). Then:
- \(T = S_1 + S_2 + S_3\)
- \(E(T) = 3 \times E(S) = 3 \times 16.0 = 48.0\)
- \(Var(T) = 3 \times Var(S) = 3 \times 0.4 = 1.2\) (variances add because the bags are independent)
- \(T \sim N(48.0, 1.2)\)
We want to find \(P(T > L)\), which is equivalent to \(P(T - L > 0)\).
Let \(D = T - L\). Then:
- \(E(D) = E(T) - E(L) = 48.0 - 51.0 = -3.0\) (Oe, using \(L-(S_{1}+S_{2}+S_{3})\))
- \(Var(D) = Var(T) + Var(L) = 1.2 + 0.9 = 2.1\)
- \(D \sim N(-3.0, 2.1)\)
We need to find \(P(D > 0)\). To do this, we standardize \(D\) using the z-score:
\(z = \frac{D - E(D)}{\sqrt{Var(D)}} = \frac{0 - (-3.0)}{\sqrt{2.1}} = \frac{3.0}{\sqrt{2.1}} \approx \frac{3.0}{1.449} \approx 2.070\) (For standardising with their values.)
Now we find \(P(Z > 2.070)\) using the standard normal distribution table or calculator:
\(P(Z > 2.070) = 1 - P(Z \le 2.070)\) (For area consistent with their values.)
From the standard normal table, \(P(Z \le 2.070) \approx 0.9808\)
\(P(Z > 2.070) = 1 - 0.9808 = 0.0192\) (3 sf)
Therefore, the probability that the total mass of 3 randomly chosen small bags is greater than the mass of one randomly chosen large bag is approximately 0.0192.
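The same result can be reproduced in a few lines. This is a sketch using the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

small_mean, small_var = 16.0, 0.4   # one small bag
large_mean, large_var = 51.0, 0.9   # one large bag
n_small = 3

# Difference D = (total of 3 small bags) - (one large bag).
# Means combine linearly; variances of independent variables add.
diff_mean = n_small * small_mean - large_mean
diff_var = n_small * small_var + large_var

difference = NormalDist(mu=diff_mean, sigma=sqrt(diff_var))
prob_small_heavier = 1 - difference.cdf(0)
print(f"{prob_small_heavier:.4f}")
```

Note that `NormalDist` takes the standard deviation, so the variance 2.1 must be square-rooted before it is passed in.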
Question 3
(a) Topic: 6.4 Sampling and estimation
(b) Topic: 6.4 Sampling and estimation
The times, \(T\) minutes, taken by a random sample of 75 students to complete a test were noted. The results were summarised by \(\sum t = 230\) and \(\sum t^2 = 930\).
(a) Calculate unbiased estimates of the population mean and variance of \(T\).
You should now assume that your estimates from part (a) are the true values of the population mean and variance of \(T\).
(b) The times taken by another random sample of 75 students were noted, and the sample mean, \(\bar{T}\), was found. Find the value of \(a\) such that \(P(\bar{T} > a) = 0.234\).
Solution :-
Part (a): Unbiased Estimates
Given:
- Sample size \(n = 75\)
- \(\sum t = 230\)
- \(\sum t^2 = 930\)
Unbiased estimate of the population mean \(\mu\):
\(\bar{t} = \frac{\sum t}{n} = \frac{230}{75} = 3.0666… \text{ or } 3.07 \text{ (3 sf)} \text{ or } \frac{46}{15}\)
Unbiased estimate of the population variance \(\sigma^2\):
\(s^2 = \frac{1}{n-1} \left[ \sum t^2 - \frac{(\sum t)^2}{n} \right] = \frac{1}{74} \left[ 930 - \frac{230^2}{75} \right]\) (Use of correct formula.)
\(s^2 = 3.0360… \text{ or } 3.04 \text{ (3 sf)} \text{ or } \frac{337}{111}\)
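As a check on part (a), the summary statistics can be turned into the two estimates directly. This is a minimal sketch; `unbiased_estimates` is a hypothetical helper name.

```python
def unbiased_estimates(n, sum_t, sum_t_sq):
    """Unbiased estimates of the population mean and variance from summary totals."""
    mean = sum_t / n
    # Dividing by n - 1 rather than n is what makes the variance estimate unbiased.
    variance = (sum_t_sq - sum_t ** 2 / n) / (n - 1)
    return mean, variance


mean_hat, var_hat = unbiased_estimates(n=75, sum_t=230, sum_t_sq=930)
print(f"mean = {mean_hat:.4f}, variance = {var_hat:.4f}")
```

The results agree with the exact fractions \(\frac{46}{15}\) and \(\frac{337}{111}\) quoted above.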
Part (b): Finding \(a\)
Given: \(P(\bar{T} > a) = 0.234\), \(n = 75\), \(\mu = 3.0667\), \(\sigma^2 = 3.04\)
\(\Phi^{-1}(1 - 0.234) = \Phi^{-1}(0.766) = 0.726\)
Using the formula \(z = \frac{a - \mu}{\sigma_{\bar{T}}}\):
\(\frac{a - 3.0667}{\sqrt{\frac{3.04}{75}}} = 0.726\) (Ft their 0.726 but must be a z value. Note using 0.766 is M0. Must have sqrt 75.)
\(a = 3.21 \text{ (3 sf)}\) (CWO)
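Part (b) amounts to finding a percentile of the distribution of the sample mean. The sketch below uses the exact estimates from part (a) and assumes, as the solution does, that \(\bar{T}\) is approximately normal with variance \(\sigma^2/n\).

```python
from math import sqrt
from statistics import NormalDist

mu = 46 / 15        # estimate of the population mean from part (a)
var = 337 / 111     # estimate of the population variance from part (a)
n = 75

# Distribution of the sample mean: N(mu, var / n).
sample_mean = NormalDist(mu=mu, sigma=sqrt(var / n))

# P(T_bar > a) = 0.234 is the same as P(T_bar <= a) = 0.766.
a = sample_mean.inv_cdf(1 - 0.234)
print(f"a = {a:.3f}")
```

The value rounds to 3.21, matching the answer above.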
Question 4
(a) Topic: 6.3 Continuous random variables
(b) Topic: 6.3 Continuous random variables
A random variable \(X\) has probability density function \(f\) defined by
\[f(x)=\begin{cases}\frac{a}{x^{2}}-\frac{18}{x^{3}} & 2\le x\le 3,\\ 0 & \text{otherwise,}\end{cases}\]
where \(a\) is a constant.
(a) Show that \(a=\frac{27}{2}\).
(b) Show that \(E(X)=\frac{27}{2}\ln\frac{3}{2}-3\).
Solution :-
Part (a): Showing \(a = \frac{27}{2}\)
For \(f(x)\) to be a valid probability density function, the integral over its domain must equal 1:
\(\int_{2}^{3} \left(\frac{a}{x^{2}}-\frac{18}{x^{3}}\right) dx = 1\) (Attempt integrate \(f(x)\) ignore limits and \(=1^{*}\))
\(\left[-\frac{a}{x} + \frac{9}{x^{2}}\right]_{2}^{3} = 1\) (OE Correct integration and limits.)
\(-\frac{a}{3} + 1 + \frac{a}{2} - \frac{9}{4} = 1 \implies \frac{a}{6} = \frac{9}{4} \implies a = \frac{27}{2}\) (AG) (Must see correct substitution of limits. Correct working no errors seen.)
Part (b): Showing \(E(X) = \frac{27}{2}\ln\frac{3}{2}-3\)
\(E(X) = \int_{2}^{3} xf(x) dx\)
\(E(X) = \int_{2}^{3} x\left(\frac{27}{2x^{2}}-\frac{18}{x^{3}}\right) dx\)
\(E(X) = \int_{2}^{3} \left(\frac{27}{2x}-\frac{18}{x^{2}}\right) dx\) (Attempt to integrate \(xf(x),\) ignore limits.)
\(E(X) = \left[\frac{27}{2}\ln|x| + \frac{18}{x}\right]_{2}^{3}\) or \(\left[\frac{27}{2}\ln|2x| + \frac{18}{x}\right]_{2}^{3}\) (Correct integration and limits. OE e.g. using ln 2x.)
\(E(X) = \left(\frac{27}{2}\ln 3 + \frac{18}{3}\right) - \left(\frac{27}{2}\ln 2 + \frac{18}{2}\right)\)
\(E(X) = \frac{27}{2}\ln 3 + 6 - \frac{27}{2}\ln 2 - 9\)
\(E(X) = \frac{27}{2}(\ln 3 - \ln 2) - 3\)
\(E(X) = \frac{27}{2}\ln\frac{3}{2} - 3\) (AG) (Must see correct substitution of limits. Correct working no errors seen.)
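Both "show that" results can be sanity-checked numerically. The sketch below uses a simple midpoint rule (chosen here only for illustration; it is not part of the required working) to integrate \(f(x)\) and \(xf(x)\) over \([2, 3]\) with \(a = \frac{27}{2}\).

```python
from math import log


def f(x, a=27 / 2):
    """Probability density function from the question."""
    return a / x**2 - 18 / x**3 if 2 <= x <= 3 else 0.0


def midpoint_integral(g, lo, hi, steps=100_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width


total_probability = midpoint_integral(f, 2, 3)
expected_value = midpoint_integral(lambda x: x * f(x), 2, 3)
exact_expected_value = 27 / 2 * log(3 / 2) - 3

print(f"total probability = {total_probability:.6f}")
print(f"E(X) numeric = {expected_value:.6f}, exact = {exact_expected_value:.6f}")
```

A numerical check like this does not replace the algebraic proof the question asks for, but it confirms that the given value of \(a\) makes the total probability 1 and that the stated \(E(X)\) is consistent.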
Question 5
(a) Topic: 6.5 Hypothesis tests
(b) Topic: 6.5 Hypothesis tests
The lengths, in centimetres, of worms of a certain kind are normally distributed with mean \(\mu\) and standard deviation 2.3. An article in a magazine states that the value of \(\mu\) is 12.7. A scientist wishes to test whether this value is correct. He measures the lengths, \(x\) cm, of a random sample of 50 worms of this kind and finds that \(\sum x = 597.1\). He plans to carry out a test, at the 1% significance level, of whether the true value of \(\mu\) is different from 12.7.
(a) State, with a reason, whether he should use a one-tailed or a two-tailed test.
(b) Carry out the test.
Solution :-
Part (a): One-tailed or Two-tailed Test
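Two-tailed, because the scientist is testing whether \(\mu\) is different from 12.7 in either direction, not whether it has changed in one specified direction.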
Part (b): Carrying out the Test
Given:
- Population standard deviation \(\sigma = 2.3\)
- Sample size \(n = 50\)
- Sample sum \(\sum x = 597.1\)
- Sample mean \(\bar{x} = \frac{\sum x}{n} = \frac{597.1}{50} = 11.942\)
- Hypothesized mean \(\mu_0 = 12.7\)
- Significance level \(\alpha = 1\% = 0.01\)
Hypotheses:
- \(H_0: \mu = 12.7\) (No ft from part (a).)
- \(H_1: \mu \neq 12.7\) (No ft from part (a).)
Test statistic (z-score):
\(z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}} = \frac{11.942 - 12.7}{\frac{2.3}{\sqrt{50}}}\)
\(z = \frac{\frac{597.1}{50} - 12.7}{\frac{2.3}{\sqrt{50}}}\)
\(z = -2.330\) (or 0.00989 or 0.0099. Accept 2.336 or 2.337 or 0.0097 if area comparison used.)
Critical value for a two-tailed test at 1% significance level:
\(z_{\alpha/2} = z_{0.005} \approx \pm 2.576\) (Accept 2.574 to 2.579.)
Decision:
\(-2.330 > -2.576\) or \(2.330 < 2.576\) (Or 0.00989 > 0.005 or 0.0097 > 0.005)
Alternatively, compare \(\bar{x}\) with the critical value: \(12.7 - 2.576 \times (2.3 / \sqrt{50}) = 11.862\) and \(11.942 > 11.862\).
Conclusion:
[Not reject \(H_0\)] There is insufficient evidence to suggest that \(\mu\) is not 12.7. (FT their \(z\)-value. In context, not definite, e.g. not ‘\(\mu = 12.7\)’. No contradictions.)
SC use of 1 tailed test can score B0M1A1M1 for comparison with 0.01 A0 max 3/5.
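The two-tailed test can be sketched as follows, using only the Python standard library. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, mu_0, alpha = 2.3, 50, 12.7, 0.01
x_bar = 597.1 / n

# Test statistic for a z-test with known population standard deviation.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-tailed test: alpha is split equally between the two tails.
critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * NormalDist().cdf(-abs(z))

reject_h0 = abs(z) > critical
print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p_value:.4f}, reject = {reject_h0}")
```

Comparing \(|z|\) with the critical value and comparing the two-tailed p-value with \(\alpha\) are equivalent; both lead to not rejecting \(H_0\) here.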
Question 6
(a) Topic: 6.1 The Poisson distribution
(b) Topic: 6.1 The Poisson distribution
(c) Topic: 6.1 The Poisson distribution
The numbers of customers arriving at service desks \(A\) and \(B\) during a 10-minute period have the independent distributions Po(1.8) and Po(2.1) respectively.
(a) Find the probability that during a randomly chosen 15-minute period more than 2 customers will arrive at desk \(A\).
(b) Find the probability that during a randomly chosen 5-minute period the total number of customers arriving at both desks is less than 4.
(c) An inspector waits at desk \(B\). She wants to wait long enough to be 90% certain of seeing at least one customer arrive at the desk. Find the minimum time for which she should wait, giving your answer correct to the nearest minute.
Solution :-
Part (a): Probability for Desk A in 15 minutes
Desk \(A\) has a rate of 1.8 customers per 10 minutes. For a 15-minute period, the rate is:
\(\lambda_A = 1.8 \times \frac{15}{10} = 2.7\)
\(X_A \sim Po(2.7)\)
We want to find \(P(X_A > 2)\), which is \(1 - P(X_A \le 2)\).
\(P(X_A \le 2) = P(X_A = 0) + P(X_A = 1) + P(X_A = 2)\)
\(P(X_A = k) = \frac{e^{-\lambda_A} \lambda_A^k}{k!}\)
\(P(X_A = 0) = \frac{e^{-2.7} 2.7^0}{0!} = e^{-2.7} \approx 0.0672\)
\(P(X_A = 1) = \frac{e^{-2.7} 2.7^1}{1!} = 2.7e^{-2.7} \approx 0.1815\)
\(P(X_A = 2) = \frac{e^{-2.7} 2.7^2}{2!} = \frac{7.29e^{-2.7}}{2} \approx 0.2450\)
\(P(X_A \le 2) = 0.0672 + 0.1815 + 0.2450 = 0.4937\)
\(P(X_A > 2) = 1 - 0.4937 = 0.5063\)
\([\lambda=2.7] 1 - e^{-2.7} (1 + 2.7 + \frac{2.7^2}{2})\) or \(1 - e^{-2.7} (1 + 2.7 + 3.645)\) or \(1 - (0.06721 + 0.1815 + 0.2450)\) (Any \(\lambda\). Allow one end error. Must see expression.)
\(= 0.506 \text{ (3 sf)}\) (SC unsupported answer 0.506 scores B1.)
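Part (a) can be checked with a small Poisson helper. This is a sketch; `poisson_cdf` is a hypothetical helper written here, not a library function.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), summed term by term."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


# Scale the 10-minute rate for desk A to a 15-minute period.
lam_15_min = 1.8 * 15 / 10

p_more_than_2 = 1 - poisson_cdf(2, lam_15_min)
print(f"{p_more_than_2:.4f}")
```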
Part (b): Probability for Both Desks in 5 minutes
Desk \(A\) has a rate of 1.8 customers per 10 minutes. For a 5-minute period, the rate is:
\(\lambda_A = 1.8 \times \frac{5}{10} = 0.9\)
Desk \(B\) has a rate of 2.1 customers per 10 minutes. For a 5-minute period, the rate is:
\(\lambda_B = 2.1 \times \frac{5}{10} = 1.05\)
The combined rate is \(\lambda_{A+B} = \lambda_A + \lambda_B = 0.9 + 1.05 = 1.95\)
\(X_{A+B} \sim Po(1.95)\)
We want to find \(P(X_{A+B} < 4) = P(X_{A+B} \le 3)\)
\(P(X_{A+B} \le 3) = P(X_{A+B} = 0) + P(X_{A+B} = 1) + P(X_{A+B} = 2) + P(X_{A+B} = 3)\)
\(P(X_{A+B} = 0) = e^{-1.95} \approx 0.1423\)
\(P(X_{A+B} = 1) = 1.95e^{-1.95} \approx 0.2774\)
\(P(X_{A+B} = 2) = \frac{1.95^2 e^{-1.95}}{2} \approx 0.2705\)
\(P(X_{A+B} = 3) = \frac{1.95^3 e^{-1.95}}{6} \approx 0.1758\)
\(P(X_{A+B} \le 3) = 0.1423 + 0.2774 + 0.2705 + 0.1758 = 0.8660\)
\(\lambda = 1.95\)
\(e^{-1.95} (1 + 1.95 + \frac{1.95^2}{2} + \frac{1.95^3}{6})\) or \(e^{-1.95} (1 + 1.95 + 1.90125 + 1.2358)\) or \(0.1423 + 0.2774 + 0.2705 + 0.1758\) (Any \(\lambda\). Allow one end error. Must see expression.)
\(= 0.866\) (SC unsupported answer 0.866 scores B1B1.)
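For part (b), the key step is that the sum of independent Poisson variables is Poisson with the rates added. A minimal sketch, with a hypothetical `poisson_cdf` helper:

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), summed term by term."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


# Combined 10-minute rate for both desks, scaled to a 5-minute period.
lam_total = (1.8 + 2.1) * 5 / 10

# "Less than 4" customers means 3 or fewer.
p_fewer_than_4 = poisson_cdf(3, lam_total)
print(f"{p_fewer_than_4:.4f}")
```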
Part (c): Minimum Waiting Time for Desk B
Desk \(B\) has a rate of 2.1 customers per 10 minutes. Let \(t\) be the waiting time in minutes.
\(\lambda_B = 2.1 \times \frac{t}{10} = 0.21t\)
\(X_B \sim Po(0.21t)\)
We want to find \(t\) such that \(P(X_B \ge 1) \ge 0.9\)
\(1 - P(X_B = 0) \ge 0.9\)
\(P(X_B = 0) \le 0.1\)
\(e^{-0.21t} \le 0.1\)
\(-0.21t \le \ln(0.1)\)
\(0.21t \ge -\ln(0.1)\)
\(t \ge \frac{-\ln(0.1)}{0.21}\)
\(t \ge \frac{2.3026}{0.21}\)
\(t \ge 10.9648\)
Rounding to the nearest minute, the minimum waiting time is 11 minutes.
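The inequality in part (c) can be solved directly and then checked against whole numbers of minutes. This sketch rounds up with `ceil`, on the reasoning that any shorter whole number of minutes would fall below the 90% target.

```python
from math import ceil, exp, log

rate_per_minute = 2.1 / 10   # desk B: 2.1 customers per 10 minutes
target = 0.9                 # required probability of at least one arrival

# Solve 1 - exp(-rate * t) >= target for t.
exact_time = -log(1 - target) / rate_per_minute
minimum_minutes = ceil(exact_time)

print(f"exact = {exact_time:.4f} minutes, minimum whole minutes = {minimum_minutes}")
```

Waiting 10 minutes gives a probability just below 0.9, while 11 minutes gives a probability just above it.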
Question 7
(a) Topic: 6.1 Poisson Distribution and 6.5 Hypothesis tests
(b) Topic: 6.1 Poisson Distribution and 6.5 Hypothesis tests
(c) Topic: 6.1 Poisson Distribution and 6.5 Hypothesis tests
(d) Topic: 6.1 Poisson Distribution and 6.5 Hypothesis tests
The number of accidents per year on a certain road has the distribution Po(\(\lambda\)). In the past the value of \(\lambda\) was 3.3. Recently, a new speed limit was imposed and the council wishes to test whether the value of \(\lambda\) has decreased. The council notes the total number, \(X\), of accidents during two randomly chosen years after the speed limit was introduced and carries out a test at the 5% significance level.
(a) Calculate the probability of a Type I error.
(b) Given that \(X = 2\), carry out the test.
(c) The council decides to carry out another similar test at the 5% significance level using the same hypotheses and two different randomly chosen years. Given that the true value of \(\lambda\) is 0.6, calculate the probability of a Type II error.
(d) Using \(\lambda = 0.6\) and a suitable approximating distribution, find the probability that there will be more than 19 accidents in 50 years.
Solution :-
Part (a): Probability of Type I Error
Null hypothesis \(H_0: \lambda = 3.3\)
Alternative hypothesis \(H_1: \lambda < 3.3\)
Significance level \(\alpha = 5\% = 0.05\)
Two years: \(\lambda = 3.3 \times 2 = 6.6\)
\(X \sim Po(6.6)\)
We need to find the critical region. Let \(c\) be the critical value.
\(P(X \le c) \le 0.05\)
Using Poisson distribution tables or calculator:
\(P(X \le 2) = 0.0400\) (less than 0.05)
\(P(X \le 3) = 0.105\) (greater than 0.05)
Critical region: \(X \le 2\)
Probability of Type I error is \(P(X \le 2)\) when \(\lambda = 6.6\).
\(P(X \le 2) = P(X=0) + P(X=1) + P(X=2)\)
\(P(X=0) = e^{-6.6} \approx 0.001360\)
\(P(X=1) = 6.6e^{-6.6} \approx 0.008978\)
\(P(X=2) = \frac{6.6^2 e^{-6.6}}{2} \approx 0.02963\)
\(P(X \le 2) = 0.001360 + 0.008978 + 0.02963 = 0.03997\)
Probability of Type I error is approximately 0.0400.
\(\lambda = 6.6\)
\(P(X \le 2) = e^{-6.6} (1 + 6.6 + \frac{6.6^2}{2}) [= 0.0400] [< 0.05]\) (Expression must be seen. No end errors. Allow use of 3.3 here.)
or \(e^{-6.6} (1 + 6.6 + 21.78)\)
or \(0.001360 + 0.008978 + 0.02963\)
\(P(X \le 3) = e^{-6.6} (1 + 6.6 + \frac{6.6^2}{2} + \frac{6.6^3}{6})\) or \(0.0400 + … = 0.105 [> 0.05]\) (Condone unsupported 0.105.)
\(P(\text{Type I error}) = 0.0400 \text{ (3 sf)}\) (Allow 0.040 or 0.04 AWRT. SC unsupported ans of 0.0400 can score max B1B1B1.)
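The search for the critical region in part (a) can be sketched as a short loop. Both helper functions are hypothetical names written for this illustration.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), summed term by term."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def lower_tail_critical_value(lam, alpha):
    """Largest c with P(X <= c) <= alpha, or None if even P(X = 0) > alpha."""
    c = None
    k = 0
    while poisson_cdf(k, lam) <= alpha:
        c = k
        k += 1
    return c


lam_two_years = 3.3 * 2
c = lower_tail_critical_value(lam_two_years, alpha=0.05)

# The Type I error probability is the actual size of the critical region under H0.
p_type_1 = poisson_cdf(c, lam_two_years)
print(f"critical region: X <= {c}, P(Type I error) = {p_type_1:.4f}")
```

Because the Poisson distribution is discrete, the Type I error probability is the actual tail probability of the critical region, which is below the nominal 5% level.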
Part (b): Carry out the test
Given \(X = 2\), which is in the critical region \(X \le 2\).
Therefore, we reject \(H_0\).
Conclusion: There is sufficient evidence at the 5% significance level to conclude that the value of \(\lambda\) has decreased.
\(H_0: \lambda = 6.6, H_1: \lambda < 6.6\) (May be seen in part (a) and award B1 mark here. Accept \(\mu\) or \(\lambda\). Accept 3.3 or 6.6.)
\([P(X \le 2) = 0.0400] 0.04 < 0.05\) (For comparing their \(P(X \le 2)\) any \(\lambda\) with 0.05.)
[\(\text{Reject } H_0\)] There is evidence to suggest that mean number of accidents has decreased. (In context, not definite. No contradictions. CWO.)
Part (c): Probability of Type II Error
\(H_0: \lambda = 3.3\)
\(H_1: \lambda < 3.3\)
Critical region: \(X \le 2\) when \(\lambda = 6.6\) (from part a).
True value of \(\lambda = 0.6\), so for two years, \(\lambda = 0.6 \times 2 = 1.2\).
\(X \sim Po(1.2)\)
Type II error is failing to reject \(H_0\) when \(H_1\) is true.
We fail to reject \(H_0\) when \(X > 2\).
Probability of Type II error is \(P(X > 2)\) when \(\lambda = 1.2\).
\(P(X > 2) = 1 - P(X \le 2)\)
\(P(X \le 2) = P(X=0) + P(X=1) + P(X=2)\)
\(P(X=0) = e^{-1.2} \approx 0.3012\)
\(P(X=1) = 1.2e^{-1.2} \approx 0.3614\)
\(P(X=2) = \frac{1.2^2 e^{-1.2}}{2} \approx 0.2169\)
\(P(X \le 2) = 0.3012 + 0.3614 + 0.2169 = 0.8795\)
\(P(X > 2) = 1 - 0.8795 = 0.1205\)
Probability of Type II error is approximately 0.1205.
\(P(X > 2)\) attempted, with any \(\lambda\).
\(P(X > 2) = 1 - e^{-1.2} (1 + 1.2 + \frac{1.2^2}{2})\) (Expression must be seen. Correct \(\lambda\). No end errors.)
or \(= 1 - e^{-1.2} (1 + 1.2 + 0.72)\)
or \(= 1 - (0.3012 + 0.3614 + 0.2169)\)
\(= 0.121 \text{ (3 sf)} \text{ or } 0.120\) (SC unsupported answer scores B2.)
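The Type II error probability is the chance of landing outside the critical region when the true two-year mean is 1.2. A minimal sketch, with a hypothetical `poisson_cdf` helper:

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), summed term by term."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


critical_value = 2      # critical region X <= 2, from part (a)
lam_true = 0.6 * 2      # true mean number of accidents over two years

# H0 is (wrongly) not rejected whenever X falls outside the critical region.
p_type_2 = 1 - poisson_cdf(critical_value, lam_true)
print(f"{p_type_2:.4f}")
```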
Part (d): Normal Approximation
\(\lambda = 0.6\) per year, so for 50 years, \(\lambda = 0.6 \times 50 = 30\).
\(X \sim Po(30)\)
For large \(\lambda\), Poisson can be approximated by Normal distribution.
\(X \sim N(\mu, \sigma^2)\) approximately, where \(\mu = \lambda = 30\) and \(\sigma^2 = \lambda = 30\).
\(\sigma = \sqrt{30} \approx 5.477\)
We want \(P(X > 19)\). Using continuity correction, \(P(X > 19.5)\).
\(Z = \frac{X - \mu}{\sigma} = \frac{19.5 - 30}{5.477} = \frac{-10.5}{5.477} \approx -1.917\)
\(P(Z > -1.917) = 1 - P(Z < -1.917)\)
From standard normal tables, \(P(Z < -1.917) \approx 0.0276\)
\(P(Z > -1.917) = 1 - 0.0276 = 0.9724\)
Probability is approximately 0.9724.
\(N(30, 30)\) seen or implied.
\(\frac{19.5 - 30}{\sqrt{30}} [= -1.917]\) (Allow with no or incorrect continuity correction. Their 30.)
\(P(Z > -1.917) = \Phi(1.917)\) (ft their standardised value. Area consistent with their values.)
\(= 0.972 \text{ (3 sf)}\)
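The normal approximation with a continuity correction can be sketched as below, using the figures stated in the question for part (d) (50 years, more than 19 accidents). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

lam = 0.6 * 50   # mean number of accidents in 50 years

# Po(lam) is approximated by N(lam, lam) when lam is large.
approx = NormalDist(mu=lam, sigma=sqrt(lam))

# Continuity correction: "more than 19" for a discrete count becomes X > 19.5.
p_more_than_19 = 1 - approx.cdf(19.5)
print(f"{p_more_than_19:.4f}")
```

The continuity correction matters here because the count is discrete: the event "more than 19" starts at 20, so the boundary is placed halfway between 19 and 20.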