Edexcel IAL Statistics 3: 3.8 Large Sample Tests with Unknown Variances (Study Notes, New Syllabus)

Key Concepts:

  • 3.8 Large Sample Tests with Unknown Variances

Large Sample Test for the Difference Between Two Means (Variances Unknown)

When comparing the means of two populations, the population variances are often unknown. If the sample sizes are sufficiently large, large sample results together with the Central Limit Theorem allow us to carry out hypothesis tests using the normal distribution.

In this syllabus, the emphasis is on practical application. A knowledge of the \( \mathrm{t} \)-distribution is not required.

Conditions for Use

The following conditions must be satisfied:

  • The two samples are random and independent
  • Sample sizes \( \mathrm{n_x} \) and \( \mathrm{n_y} \) are both sufficiently large (typically \( \mathrm{n_x \geq 30} \) and \( \mathrm{n_y \geq 30} \))
  • Each population has finite variance

The populations themselves do not need to be normally distributed.

Sample Statistics

Let:

\( \mathrm{\bar{X}} \), \( \mathrm{S_x^2} \), \( \mathrm{n_x} \) be the sample mean, sample variance, and sample size from population X

\( \mathrm{\bar{Y}} \), \( \mathrm{S_y^2} \), \( \mathrm{n_y} \) be the sample mean, sample variance, and sample size from population Y

The difference between the sample means is \( \mathrm{\bar{X} - \bar{Y}} \).

Hypotheses

The null hypothesis is usually

\( \mathrm{H_0:\mu_x - \mu_y = d_0} \)

where \( \mathrm{d_0} \) is a specified value, often 0.

The alternative hypothesis may be two-tailed or one-tailed:

Two-tailed: \( \mathrm{H_1:\mu_x - \mu_y \neq d_0} \)

Upper-tailed: \( \mathrm{H_1:\mu_x - \mu_y > d_0} \)

Lower-tailed: \( \mathrm{H_1:\mu_x - \mu_y < d_0} \)

Test Statistic

For large samples, the test statistic used is

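\( \mathrm{Z = \dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{S_x^2}{n_x} + \dfrac{S_y^2}{n_y}}}} \)

When the test is carried out, \( \mathrm{\mu_x - \mu_y} \) is replaced by its value under \( \mathrm{H_0} \), namely \( \mathrm{d_0} \).

For large samples, this statistic can be treated as having approximately the standard normal distribution: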

\( \mathrm{Z \sim N(0,1)} \)

No use of the \( \mathrm{t} \)-distribution is required.
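The formula above can be sketched in a few lines of Python. This is only an illustrative helper: the function name `two_sample_z` and its parameter names are assumptions made for this sketch, and the figures passed in are those of the first worked example in these notes.

```python
from math import sqrt


def two_sample_z(mean_x, s_x, n_x, mean_y, s_y, n_y, d0=0.0):
    """Large-sample z statistic for H0: mu_x - mu_y = d0.

    s_x and s_y are sample standard deviations, used in place of the
    unknown population standard deviations.
    """
    # Estimated standard error of the difference between the sample means
    standard_error = sqrt(s_x**2 / n_x + s_y**2 / n_y)
    return ((mean_x - mean_y) - d0) / standard_error


# Sample X: n = 64, mean 72, s = 10; sample Y: n = 49, mean 68, s = 14
z = two_sample_z(72, 10, 64, 68, 14, 49)
print(f"z = {z:.2f}")
```

Keeping `d0` as a parameter with default 0 mirrors the notes: the hypothesised difference is often, but not always, zero.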

Decision Rule

Using the critical value method:

Two-tailed 5% test: reject \( \mathrm{H_0} \) if \( \mathrm{|Z| > 1.96} \)

Upper-tailed 5% test: reject \( \mathrm{H_0} \) if \( \mathrm{Z > 1.645} \)

Lower-tailed 5% test: reject \( \mathrm{H_0} \) if \( \mathrm{Z < -1.645} \)

Alternatively, a p-value approach may be used.
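The critical values quoted above come from the inverse of the standard normal distribution function. A minimal sketch, using `NormalDist` from Python's standard `statistics` module, is shown below. The function names `critical_value` and `reject_h0` and the tail labels `"two"`, `"upper"` and `"lower"` are assumptions made for this sketch.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # N(0, 1)


def critical_value(alpha, tail):
    """Critical value of N(0, 1) for a test at significance level alpha."""
    if tail == "two":
        return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return STANDARD_NORMAL.inv_cdf(1 - alpha)
    if tail == "lower":
        return STANDARD_NORMAL.inv_cdf(alpha)  # a negative number
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


def reject_h0(z, alpha, tail):
    """Apply the decision rule to an observed value z of the statistic."""
    c = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > c
    if tail == "upper":
        return z > c
    return z < c


for tail in ("two", "upper", "lower"):
    print(tail, round(critical_value(0.05, tail), 3))
```

In an examination the critical values are read from the statistical tables; the code is only a way of checking them.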

Interpretation

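Rejecting \( \mathrm{H_0} \) means there is evidence, at the chosen significance level, in favour of \( \mathrm{H_1} \) (for example, evidence of a difference between the population means when \( \mathrm{d_0 = 0} \)).

Not rejecting \( \mathrm{H_0} \) means there is insufficient evidence for \( \mathrm{H_1} \). It does not prove that \( \mathrm{H_0} \) is true.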

Points to Remember

  • Large samples justify normal approximation
  • Sample variances replace population variances
  • No \( \mathrm{t} \)-distribution is needed
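As a rough illustration of why the populations need not be normal, the sketch below simulates many pairs of large samples from a strongly skewed population with equal means, so that \( \mathrm{H_0} \) is true, and counts how often a two-tailed 5% test rejects. The exponential population, the sample sizes 50 and 60, the number of trials and the seed are all assumptions chosen for this illustration; if the normal approximation is reasonable, the rejection rate should be close to 0.05.

```python
import random
from math import sqrt
from statistics import mean, variance


def z_statistic(xs, ys):
    """Large-sample z statistic for H0: mu_x - mu_y = 0."""
    # statistics.variance uses the n - 1 divisor (sample variance)
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


def rejection_rate(n_x=50, n_y=60, trials=2000, seed=1):
    """Proportion of simulated two-tailed 5% tests that reject a true H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same skewed (exponential) population
        xs = [rng.expovariate(1.0) for _ in range(n_x)]
        ys = [rng.expovariate(1.0) for _ in range(n_y)]
        if abs(z_statistic(xs, ys)) > 1.96:
            rejections += 1
    return rejections / trials


print(rejection_rate())
```

A simulation like this is not part of the syllabus; it simply makes the role of the Central Limit Theorem visible.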

Example:

Two independent populations X and Y have unknown variances.

A random sample from X of size \( \mathrm{n_x = 64} \) has mean \( \mathrm{\bar{x} = 72} \) and sample standard deviation \( \mathrm{S_x = 10} \).

A random sample from Y of size \( \mathrm{n_y = 49} \) has mean \( \mathrm{\bar{y} = 68} \) and sample standard deviation \( \mathrm{S_y = 14} \).

Test at the 5% significance level whether the population means are different.

Answer/Explanation

Hypotheses

\( \mathrm{H_0:\mu_x - \mu_y = 0} \)

\( \mathrm{H_1:\mu_x - \mu_y \neq 0} \)

Test statistic

\( \mathrm{Z = \dfrac{(72-68)}{\sqrt{\dfrac{10^2}{64}+\dfrac{14^2}{49}}} = \dfrac{4}{\sqrt{1.5625+4}} = \dfrac{4}{2.358} = 1.70} \)

Decision

At the 5% level, reject \( \mathrm{H_0} \) if \( \mathrm{|Z|>1.96} \).

Conclusion

Since \( \mathrm{1.70<1.96} \), do not reject \( \mathrm{H_0} \). There is insufficient evidence of a difference between the population means.

Example:

A company compares the mean daily output of two factories.

Factory A: \( \mathrm{n_x = 100,\; \bar{x} = 540,\; S_x = 30} \)

Factory B: \( \mathrm{n_y = 81,\; \bar{y} = 520,\; S_y = 27} \)

Test at the 1% significance level whether Factory A has a higher mean output.

Answer/Explanation

Hypotheses

\( \mathrm{H_0:\mu_x - \mu_y = 0} \)

\( \mathrm{H_1:\mu_x - \mu_y > 0} \)

Test statistic

\( \mathrm{Z = \dfrac{540-520}{\sqrt{\dfrac{30^2}{100}+\dfrac{27^2}{81}}} = \dfrac{20}{\sqrt{9+9}} = \dfrac{20}{4.243} = 4.71} \)

Decision

Upper-tailed 1% test: reject \( \mathrm{H_0} \) if \( \mathrm{Z>2.33} \).

Conclusion

Since \( \mathrm{4.71>2.33} \), reject \( \mathrm{H_0} \). There is strong evidence that Factory A has a higher mean output.

Example:

Two treatments are compared for recovery time.

Treatment X: \( \mathrm{n_x = 50,\; \bar{x} = 18.6,\; S_x = 4.5} \)

Treatment Y: \( \mathrm{n_y = 45,\; \bar{y} = 19.8,\; S_y = 5.1} \)

Test at the 5% significance level whether Treatment X gives a shorter mean recovery time.

Answer/Explanation

Hypotheses

\( \mathrm{H_0:\mu_x - \mu_y = 0} \)

\( \mathrm{H_1:\mu_x - \mu_y < 0} \)

Test statistic

\( \mathrm{Z = \dfrac{18.6-19.8}{\sqrt{\dfrac{4.5^2}{50}+\dfrac{5.1^2}{45}}} = \dfrac{-1.2}{\sqrt{0.405+0.578}} = \dfrac{-1.2}{0.99} = -1.21} \)

Decision

Lower-tailed 5% test: reject if \( \mathrm{Z<-1.645} \).

Conclusion

Since \( \mathrm{-1.21>-1.645} \), do not reject \( \mathrm{H_0} \). There is insufficient evidence that Treatment X gives a shorter mean recovery time.
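The notes mention that a p-value approach may be used instead of critical values. The sketch below applies it to the three worked examples, as a check on the conclusions reached above. The helper `p_value`, the tail labels and the descriptive labels given to each example are assumptions made for this sketch; the sample figures and significance levels are taken from the examples.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def p_value(z, tail):
    """p-value for an observed z; tail is "two", "upper" or "lower"."""
    if tail == "two":
        return 2 * (1 - PHI(abs(z)))
    if tail == "upper":
        return 1 - PHI(z)
    if tail == "lower":
        return PHI(z)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


# (label, mean_x, s_x, n_x, mean_y, s_y, n_y, tail, significance level)
examples = [
    ("Populations X and Y", 72, 10, 64, 68, 14, 49, "two", 0.05),
    ("Factories A and B", 540, 30, 100, 520, 27, 81, "upper", 0.01),
    ("Treatments X and Y", 18.6, 4.5, 50, 19.8, 5.1, 45, "lower", 0.05),
]

results = []
for label, mx, sx, nx, my, sy, ny, tail, alpha in examples:
    z = (mx - my) / sqrt(sx**2 / nx + sy**2 / ny)
    p = p_value(z, tail)
    # Reject H0 when the p-value is below the significance level
    results.append((label, z, p, p < alpha))
    print(f"{label}: z = {z:.2f}, p = {p:.4f}, reject H0: {p < alpha}")
```

The two methods always give the same decision: the p-value falls below the significance level exactly when the observed value of Z lies in the critical region.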
