Edexcel IAL - Statistics 3- 3.8 Large Sample Tests with Unknown Variances- Study notes - New syllabus
Key Concepts:
- 3.8 Large Sample Tests with Unknown Variances
Large Sample Test for the Difference Between Two Means (Variances Unknown)
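When comparing the means of two populations, the population variances are often unknown. If the sample sizes are sufficiently large, the Central Limit Theorem means that \( \mathrm{\bar{X} - \bar{Y}} \) is approximately normally distributed, and the sample variances are reliable enough estimates to be used in place of the unknown population variances. Hypothesis tests can therefore be carried out using the normal distribution.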
In this syllabus, the emphasis is on practical application. A knowledge of the \( \mathrm{t} \)-distribution is not required.
Conditions for Use
The following conditions must be satisfied:
- The two samples are random and independent
- Sample sizes \( \mathrm{n_x} \) and \( \mathrm{n_y} \) are sufficiently large (as a rule of thumb, both at least 30)
- Each population has finite variance
The populations themselves do not need to be normally distributed.
Sample Statistics
Let:
\( \mathrm{\bar{X}} \), \( \mathrm{S_x^2} \), \( \mathrm{n_x} \) be the sample mean, sample variance, and sample size from population X
\( \mathrm{\bar{Y}} \), \( \mathrm{S_y^2} \), \( \mathrm{n_y} \) be the sample mean, sample variance, and sample size from population Y
The difference between the sample means is \( \mathrm{\bar{X} - \bar{Y}} \).
Hypotheses
The null hypothesis is usually
\( \mathrm{H_0:\mu_x - \mu_y = d_0} \)
where \( \mathrm{d_0} \) is a specified value, often 0.
The alternative hypothesis may be two-tailed or one-tailed:
Two-tailed: \( \mathrm{H_1:\mu_x - \mu_y \neq d_0} \)
Upper-tailed: \( \mathrm{H_1:\mu_x - \mu_y > d_0} \)
Lower-tailed: \( \mathrm{H_1:\mu_x - \mu_y < d_0} \)
Test Statistic
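For large samples, the test statistic used is
\( \mathrm{Z = \dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{S_x^2}{n_x} + \dfrac{S_y^2}{n_y}}}} \)
where, under \( \mathrm{H_0} \), \( \mathrm{\mu_x - \mu_y} \) is replaced by \( \mathrm{d_0} \). If \( \mathrm{H_0} \) is true, this statistic can be treated as having, approximately, the standard normal distribution:
\( \mathrm{Z \sim N(0,1)} \) (approximately)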
No use of the \( \mathrm{t} \)-distribution is required.
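As an illustration only, the test statistic can be computed from summary statistics with a short Python sketch. The function name `two_sample_z` and its argument order are hypothetical choices for this note; `sx` and `sy` are sample standard deviations, so they are squared before dividing by the sample sizes.

```python
from math import sqrt


def two_sample_z(xbar, sx, nx, ybar, sy, ny, d0=0.0):
    """Large-sample z statistic for H0: mu_x - mu_y = d0 (hypothetical helper).

    sx and sy are sample standard deviations; their squares stand in
    for the unknown population variances.
    """
    # Estimated standard error of Xbar - Ybar
    standard_error = sqrt(sx**2 / nx + sy**2 / ny)
    # Under H0, mu_x - mu_y is replaced by d0
    return ((xbar - ybar) - d0) / standard_error
```

For instance, `two_sample_z(72, 10, 64, 68, 14, 49)` gives about 1.696 for the summary statistics used in the first example below.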
Decision Rule
Using the critical value method:
Two-tailed 5% test: reject \( \mathrm{H_0} \) if \( \mathrm{|Z| > 1.96} \)
Upper-tailed 5% test: reject \( \mathrm{H_0} \) if \( \mathrm{Z > 1.645} \)
Lower-tailed 5% test: reject \( \mathrm{H_0} \) if \( \mathrm{Z < -1.645} \)
Alternatively, a p-value approach may be used: reject \( \mathrm{H_0} \) if the p-value is less than the significance level.
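The critical values and p-values above come from the standard normal distribution. A minimal sketch, assuming Python 3.8+ for the standard-library `statistics.NormalDist` class; the helper names and the tail labels `"two"`, `"upper"`, `"lower"` are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def critical_value(alpha, tail):
    """Critical z value at significance level alpha.

    For "two" the positive value c is returned (reject H0 if |Z| > c).
    For "lower" the returned value is negative.
    """
    if tail == "two":
        return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return STANDARD_NORMAL.inv_cdf(1 - alpha)
    if tail == "lower":
        return STANDARD_NORMAL.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


def p_value(z, tail):
    """Probability under H0 of a value at least as extreme as z."""
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    if tail == "upper":
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "lower":
        return STANDARD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


def reject_h0(z, alpha, tail):
    """Critical value method, mirroring the decision rules above."""
    c = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > c
    if tail == "upper":
        return z > c
    return z < c  # lower tail: c is negative
```

`critical_value(0.05, "two")` is about 1.960 and `critical_value(0.05, "upper")` about 1.645, matching the values quoted. Apart from values falling exactly on the boundary, `reject_h0(z, alpha, tail)` agrees with `p_value(z, tail) < alpha`, so the two methods lead to the same decision.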
Interpretation
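Rejecting \( \mathrm{H_0} \) provides evidence, at the chosen significance level, of a difference between the population means (in the direction stated by \( \mathrm{H_1} \) for a one-tailed test)
Not rejecting \( \mathrm{H_0} \) indicates insufficient evidence of a difference; it does not prove that the population means are equal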
Points to Remember
- Large samples justify normal approximation
- Sample variances replace population variances
- No \( \mathrm{t} \)-distribution is needed
Example:
Two independent populations X and Y have unknown variances.
A random sample from X of size \( \mathrm{n_x = 64} \) has mean \( \mathrm{\bar{x} = 72} \) and sample standard deviation \( \mathrm{S_x = 10} \).
A random sample from Y of size \( \mathrm{n_y = 49} \) has mean \( \mathrm{\bar{y} = 68} \) and sample standard deviation \( \mathrm{S_y = 14} \).
Test at the 5% significance level whether the population means are different.
Answer/Explanation
Hypotheses
\( \mathrm{H_0:\mu_x - \mu_y = 0} \)
\( \mathrm{H_1:\mu_x - \mu_y \neq 0} \)
Test statistic
\( \mathrm{Z = \dfrac{72-68}{\sqrt{\dfrac{10^2}{64}+\dfrac{14^2}{49}}} = \dfrac{4}{\sqrt{1.5625+4}} = \dfrac{4}{2.3585} = 1.70} \) (3 s.f.)
Decision
Two-tailed 5% test: reject \( \mathrm{H_0} \) if \( \mathrm{|Z|>1.96} \).
Conclusion
Since \( \mathrm{1.70<1.96} \), do not reject \( \mathrm{H_0} \). There is insufficient evidence at the 5% level of a difference between the population means.
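To check the arithmetic, the example can be recomputed in Python without rounding the intermediate square root. This is a sketch only; it assumes Python 3.8+ for `statistics.NormalDist`, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics for X and Y from this example
xbar, sx, nx = 72, 10, 64
ybar, sy, ny = 68, 14, 49

se = sqrt(sx**2 / nx + sy**2 / ny)      # sqrt(1.5625 + 4) = 2.3585...
z = (xbar - ybar - 0) / se              # d0 = 0 under H0; about 1.696
crit = NormalDist().inv_cdf(0.975)      # two-tailed 5% critical value, about 1.96
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

print(f"z = {z:.3f}, critical value = {crit:.3f}, p-value = {p:.3f}")
print("reject H0" if abs(z) > crit else "do not reject H0")
```

Carrying full precision gives \( \mathrm{z \approx 1.696} \), which still lies inside \( \mathrm{\pm 1.96} \), so the decision is unchanged.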
Example:
A company compares the mean daily output of two factories.
Factory A: \( \mathrm{n_x = 100,\; \bar{x} = 540,\; S_x = 30} \)
Factory B: \( \mathrm{n_y = 81,\; \bar{y} = 520,\; S_y = 27} \)
Test at the 1% significance level whether Factory A has a higher mean output.
Answer/Explanation
Hypotheses
\( \mathrm{H_0:\mu_x - \mu_y = 0} \)
\( \mathrm{H_1:\mu_x - \mu_y > 0} \)
Test statistic
\( \mathrm{Z = \dfrac{540-520}{\sqrt{\dfrac{30^2}{100}+\dfrac{27^2}{81}}} = \dfrac{20}{\sqrt{9+9}} = \dfrac{20}{4.2426} = 4.71} \) (3 s.f.)
Decision
Upper-tailed 1% test: reject if \( \mathrm{Z>2.33} \).
Conclusion
Since \( \mathrm{4.71>2.33} \), reject \( \mathrm{H_0} \). There is strong evidence, at the 1% level, that Factory A has a higher mean output.
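The same check for this upper-tailed test, as a hedged sketch assuming Python 3.8+ (`statistics.NormalDist`); X is Factory A and Y is Factory B, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Factory A (X) and Factory B (Y)
xbar, sx, nx = 540, 30, 100
ybar, sy, ny = 520, 27, 81

se = sqrt(sx**2 / nx + sy**2 / ny)  # sqrt(9 + 9) = 4.2426...
z = (xbar - ybar) / se              # about 4.714
crit = NormalDist().inv_cdf(0.99)   # upper-tailed 1% critical value, about 2.326
p = 1 - NormalDist().cdf(z)         # upper-tail p-value

print(f"z = {z:.3f}, critical value = {crit:.3f}, p-value = {p:.2e}")
print("reject H0" if z > crit else "do not reject H0")
```

Only the upper tail is used because \( \mathrm{H_1} \) states \( \mathrm{\mu_x - \mu_y > 0} \).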
Example:
Two treatments are compared for recovery time.
Treatment X: \( \mathrm{n_x = 50,\; \bar{x} = 18.6,\; S_x = 4.5} \)
Treatment Y: \( \mathrm{n_y = 45,\; \bar{y} = 19.8,\; S_y = 5.1} \)
Test at the 5% significance level whether Treatment X gives a shorter mean recovery time.
Answer/Explanation
Hypotheses
\( \mathrm{H_0:\mu_x - \mu_y = 0} \)
\( \mathrm{H_1:\mu_x - \mu_y < 0} \)
Test statistic
\( \mathrm{Z = \dfrac{18.6-19.8}{\sqrt{\dfrac{4.5^2}{50}+\dfrac{5.1^2}{45}}} = \dfrac{-1.2}{\sqrt{0.405+0.578}} = \dfrac{-1.2}{0.99} = -1.21} \)
Decision
Lower-tailed 5% test: reject if \( \mathrm{Z<-1.645} \).
Conclusion
Since \( \mathrm{-1.21>-1.645} \), do not reject \( \mathrm{H_0} \). There is insufficient evidence that Treatment X gives a shorter mean recovery time.
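For the lower-tailed case the critical value is negative and the p-value is the lower-tail probability. A minimal sketch, again assuming Python 3.8+ for `statistics.NormalDist`, with illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

# Treatment X and Treatment Y recovery times
xbar, sx, nx = 18.6, 4.5, 50
ybar, sy, ny = 19.8, 5.1, 45

se = sqrt(sx**2 / nx + sy**2 / ny)  # sqrt(0.405 + 0.578) = 0.9915...
z = (xbar - ybar) / se              # about -1.210
crit = NormalDist().inv_cdf(0.05)   # lower-tailed 5% critical value, about -1.645
p = NormalDist().cdf(z)             # lower-tail p-value

print(f"z = {z:.3f}, critical value = {crit:.3f}, p-value = {p:.3f}")
print("reject H0" if z < crit else "do not reject H0")
```

Here \( \mathrm{H_0} \) would be rejected only if \( \mathrm{z} \) fell below the negative critical value.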
