# IB DP Maths Topic 7.4 The central limit theorem HL Paper 3

## Question

(a)     A random variable, X, has probability density function defined by

$f(x) = \left\{ {\begin{array}{*{20}{l}} {100,}&{{\text{for }} -0.005 \leqslant x < 0.005} \\ {0,}&{{\text{otherwise}}{\text{.}}} \end{array}} \right.$

Determine E(X) and Var(X).

(b)     When a real number is rounded to two decimal places, an error is made.

Show that this error can be modelled by the random variable X.

(c)     A list contains 20 real numbers, each of which has been given to two decimal places. The numbers are then added together.

(i)     Write down bounds for the resulting error in this sum.

(ii)     Using the central limit theorem, estimate to two decimal places the probability that the absolute value of the error exceeds 0.01.

(iii)     State clearly any assumptions you have made in your calculation.

## Markscheme

(a)     f(x) is even (symmetrical about the origin)     (M1)

$${\text{E}}(X) = 0$$     A1

$${\text{Var}}(X) = {\text{E}}({X^2}) = \int_{ -0.005}^{0.005} {100{x^2}{\text{d}}x}$$     (M1)(A1)

$$= 8.33 \times {10^{ -6}}\left( {{\text{accept }}0.83 \times {{10}^{ -5}}{\text{ or }}\frac{1}{{120\,000}}} \right)$$     A1

[5 marks]

(b)     rounding errors to 2 decimal places are uniformly distributed     R1

and lie within the interval $$-0.005 \leqslant x < 0.005.$$     R1

this defines X     AG

[2 marks]

(c)     (i)     using the symbol y to denote the error in the sum of 20 real numbers each rounded to 2 decimal places

$$-0.1 \leqslant y( = 20 \times x) < 0.1$$     A1

(ii)     $$Y \approx {\text{N}}(20 \times 0,{\text{ }}20 \times 8.33 \times {10^{ -6}}) = {\text{N}}(0,{\text{ }}0.000167)$$     (M1)(A1)

$${\text{P}}\left( {\left| Y \right| > 0.01} \right) = 2\left( {1 - {\text{P}}(Y < 0.01)} \right)$$     (M1)(A1)

$$= 2\left( {1 - {\text{P}}\left( {Z < \frac{{0.01}}{{0.0129}}} \right)} \right)$$

$$= 0.44$$ to 2 decimal places     A1     N4

(iii)     it is assumed that the errors in rounding the 20 numbers are independent     R1

and, by the central limit theorem, the sum of the errors can be modelled approximately by a normal distribution     R1

[8 marks]

Total [15 marks]
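
The arithmetic in parts (a) and (c)(ii) can be checked numerically. The following is a minimal sketch using Python's standard library, assuming the 20 rounding errors are independent and uniformly distributed as in part (b). The variable names are illustrative and not part of the markscheme.

```python
from statistics import NormalDist

# Uniform rounding error on [-0.005, 0.005): mean 0, variance (b - a)^2 / 12
a, b = -0.005, 0.005
var_x = (b - a) ** 2 / 12      # equals 1/120000

n = 20
var_sum = n * var_x            # variance of the sum of n independent errors
sd_sum = var_sum ** 0.5

# Central limit theorem: Y is approximately N(0, var_sum)
y = NormalDist(mu=0.0, sigma=sd_sum)
p_exceeds = 2 * (1 - y.cdf(0.01))   # P(|Y| > 0.01), by symmetry

print(f"Var(X)        = {var_x:.3e}")
print(f"sd(Y)         = {sd_sum:.4f}")
print(f"P(|Y| > 0.01) = {p_exceeds:.2f}")
```

Carrying the unrounded variance through the calculation avoids the small discrepancies that appear when intermediate values are rounded.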

## Examiners report

This was the only question on the paper with a conceptually ‘hard’ final part. Part (a) was generally well done, either by integration or by use of the standard formulae for a uniform distribution. Many candidates were not able to provide convincing reasoning in parts (b) and (c)(iii). Part (c)(ii), the application of the central limit theorem, was only very rarely tackled competently.

## Question

The continuous random variable X has probability density function f given by

$f(x) = \left\{ {\begin{array}{*{20}{c}} {\frac{{3{x^2} + 2x}}{{10}},}&{{\text{for }}1 \leqslant x \leqslant 2} \\ {0,}&{{\text{otherwise}}{\text{.}}} \end{array}} \right.$

(i)     Determine an expression for $$F(x)$$, valid for $$1 \leqslant x \leqslant 2$$, where F denotes the cumulative distribution function of X.

(ii)     Hence, or otherwise, determine the median of X.

[6]
a.

(i)     State the central limit theorem.

(ii)     A random sample of 150 observations is taken from the distribution of X and $$\bar X$$ denotes the sample mean. Use the central limit theorem to find, approximately, the probability that $$\bar X$$ is greater than 1.6.

[8]
b.

## Markscheme

(i)     $$F(x) = \int_1^x {\frac{{3{u^2} + 2u}}{{10}}{\text{d}}u}$$     (M1)

$$= \left[ {\frac{{{u^3} + {u^2}}}{{10}}} \right]_1^x$$     A1

Note: Do not penalise missing or wrong limits at this stage.

Accept the use of x in the integrand.

$$= \frac{{{x^3} + {x^2} - 2}}{{10}}$$     A1

(ii)     the median m satisfies the equation $$F(m) = \frac{1}{2}$$ so     (M1)

$${m^3} + {m^2} - 7 = 0$$     (A1)

Note: Do not FT from an incorrect $$F(x)$$.

$$m = 1.63$$     A1

Note: Accept any answer that rounds to 1.6.

[6 marks]

a.
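
The median equation has no convenient closed form, so it is normally solved on a calculator. As a rough illustration, the sketch below solves $$F(m) = \frac{1}{2}$$ by bisection; the function names and the tolerance are assumptions made for this example.

```python
def F(x: float) -> float:
    """CDF of X for 1 <= x <= 2, as derived in part (i)."""
    return (x**3 + x**2 - 2) / 10


def bisect_median(lo: float = 1.0, hi: float = 2.0, tol: float = 1e-10) -> float:
    """Solve F(m) = 0.5 on [lo, hi]; F is increasing with F(1) = 0 and F(2) = 1."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < 0.5:
            lo = mid   # the median lies to the right of mid
        else:
            hi = mid   # the median lies at or to the left of mid
    return (lo + hi) / 2


median = bisect_median()
print(f"median = {median:.2f}")
```

Bisection is used here only because it is easy to verify by hand; any root-finding method applied to $$m^3 + m^2 - 7 = 0$$ on the interval from 1 to 2 should give the same value.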

(i)     the mean of a large sample from any distribution is approximately normal     A1

Note: This is the minimum acceptable explanation.

(ii)     we require the mean $$\mu$$ and variance $${\sigma ^2}$$ of X

$$\mu = \int_1^2 {\left( {\frac{{3{x^3} + 2{x^2}}}{{10}}} \right){\text{d}}x}$$     (M1)

$$= \frac{{191}}{{120}}{\text{ }}(1.591666 \ldots )$$     A1

$${\sigma ^2} = \int_1^2 {\left( {\frac{{3{x^4} + 2{x^3}}}{{10}}} \right){\text{d}}x} - {\mu ^2}$$     (M1)

$$= 0.07659722 \ldots$$     A1

the central limit theorem states that

$$\bar X \approx N\left( {\mu ,\frac{{{\sigma ^2}}}{n}} \right),$$ i.e. $$N(1.591666 \ldots ,{\text{ }}0.0005106481 \ldots )$$     M1A1

$${\text{P}}(\bar X > 1.6) = 0.356$$     A1

Note: Accept any answer that rounds to 0.36.

[8 marks]

b.
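
Because the variance is a small difference of two larger numbers, exact arithmetic is a convenient way to check it. The sketch below evaluates the moments of X with `fractions.Fraction` and then applies the normal approximation for the sample mean. The helper `moment` is hypothetical and written only for this example.

```python
from fractions import Fraction
from statistics import NormalDist


def moment(k: int) -> Fraction:
    """E(X^k): integral from 1 to 2 of x^k (3x^2 + 2x)/10 dx, evaluated exactly."""

    def antiderivative(x: int) -> Fraction:
        # 3 x^(k+3) / (10 (k+3)) + 2 x^(k+2) / (10 (k+2))
        return Fraction(3 * x ** (k + 3), 10 * (k + 3)) + Fraction(
            2 * x ** (k + 2), 10 * (k + 2)
        )

    return antiderivative(2) - antiderivative(1)


mu = moment(1)               # exact mean
var = moment(2) - mu**2      # exact variance, no rounding of the mean
n = 150

# Central limit theorem: the sample mean is approximately N(mu, var / n)
xbar = NormalDist(mu=float(mu), sigma=float(var / n) ** 0.5)
p = 1 - xbar.cdf(1.6)

print(f"mu  = {mu} = {float(mu):.6f}")
print(f"var = {var} = {float(var):.8f}")
print(f"P(Xbar > 1.6) = {p:.3f}")
```

Repeating the calculation with the mean rounded to 1.59 shows how much the variance shifts, which is the pitfall noted in the examiners report.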

## Examiners report

Solutions to (a)(i) were disappointing in general, suggesting that many candidates are unfamiliar with the concept of the cumulative distribution function. Many candidates knew that it was something to do with the integral of the probability density function but some thought it was $$\int\limits_1^2 {f(x){\text{d}}x}$$ which they then evaluated as $$1$$ while others thought it was just $$\int {f(x){\text{d}}x} = \frac{{\left( {{x^2} + {x^3}} \right)}}{{10}}$$ which is not, in general, a valid method. However, most candidates solved (a)(ii) correctly, usually by integrating the probability density function from $$1$$ to $$m$$.

a.

In (b)(i), the statement of the central limit theorem was often quite dreadful. The term ‘sample mean’ was often not mentioned and a common misconception appears to be that the actual distribution rather than the sample mean tends to normality as the sample size increases. Solutions to (b)(ii) often failed to go beyond finding the mean and variance of $$X$$. In calculating the variance, some candidates rounded the mean from $$1.5916666 \ldots$$ to $$1.59$$, which resulted in an incorrect value for the variance. It is important to note that calculating a variance usually involves a small difference of two large numbers so that full accuracy must be maintained.

b.

## Question

The random variable X has probability distribution Po(8).

(i)     Find $${\text{P}}(X = 6)$$.

(ii)     Find $${\text{P}}(X = 6|5 \leqslant X \leqslant 8)$$.

[5]
a.

$$\bar X$$ denotes the sample mean of $$n > 1$$ independent observations from $$X$$.

(i)     Write down $${\text{E}}(\bar X)$$ and $${\text{Var}}(\bar X)$$.

(ii)     Hence, give a reason why $$\bar X$$ is not a Poisson distribution.

[3]
b.

A random sample of $$40$$ observations is taken from the distribution for $$X$$.

(i)     Find $${\text{P}}(7.1 < \bar X < 8.5)$$.

(ii)     Given that $${\text{P}}\left( {\left| {\bar X - 8} \right| \leqslant k} \right) = 0.95$$, find the value of $$k$$.

[6]
c.

## Markscheme

(i)     $${\text{P}}(X = 6) = 0.122$$     (M1)A1

(ii)     $${\text{P}}(X = 6|5 \leqslant X \leqslant 8) = \frac{{{\text{P}}(X = 6)}}{{{\text{P}}(5 \leqslant X \leqslant 8)}} = \frac{{0.122 \ldots }}{{0.592 \ldots - 0.0996 \ldots }}$$     (M1)(A1)

$$= 0.248$$     A1

[5 marks]

a.
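
These Poisson probabilities are usually read from a calculator. As a check, the sketch below computes them directly from the probability mass function; `poisson_pmf` is an illustrative helper, not a library function.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 8
p6 = poisson_pmf(6, lam)
# P(5 <= X <= 8) = P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8)
p5_to_8 = sum(poisson_pmf(k, lam) for k in range(5, 9))
p_cond = p6 / p5_to_8   # P(X = 6 | 5 <= X <= 8)

print(f"P(X = 6)               = {p6:.3f}")
print(f"P(5 <= X <= 8)         = {p5_to_8:.3f}")
print(f"P(X = 6 | 5 <= X <= 8) = {p_cond:.3f}")
```

The denominator here is summed term by term, which is equivalent to the difference of cumulative probabilities used in the markscheme.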

(i)     $${\text{E}}(\bar X) = 8$$     A1

$${\text{Var}}(\bar X) = \frac{8}{n}$$     A1

(ii)     $${\text{E}}(\bar X) \ne {\text{Var}}(\bar X)$$   $${\text{(for }}n > 1)$$     R1

Note:     Only award the R1 if the two expressions in (b)(i) are different.

[3 marks]

b.

(i)     EITHER

$$\bar X \sim {\text{N(8, 0.2)}}$$     (M1)A1

Note:     M1 for normality, A1 for parameters.

$${\text{P}}(7.1 < \bar X < 8.5) = 0.846$$     A1

OR

The expression is equivalent to

$${\text{P}}\left( {283 \leqslant \sum X \leqslant 339} \right)$$ where $$\sum X$$ is $${\text{Po(320)}}$$     M1A1

$$= 0.840$$     A1

Note:     Accept 284, 340 instead of 283, 339

Accept any answer that rounds correctly to 0.84 or 0.85.

(ii)     EITHER

$$k = 1.96\frac{\sigma }{{\sqrt n }}$$ or $$1.96{\text{ std}}(\bar X)$$     (M1)(A1)

$$k = 0.877$$ or $$1.96\sqrt {0.2}$$     A1

OR

The expression is equivalent to

$${\text{P}}\left( {320 - 40k \leqslant \sum X \leqslant 320 + 40k} \right) = 0.95$$     (M1)

$$k = 0.875$$     A2

Note:     Accept any answer that rounds to 0.87 or 0.88.

Award M1A0 if modulus sign ignored and answer obtained rounds to 0.74 or 0.75

[6 marks]

c.
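
The normal-approximation route (the first EITHER branch in each part) can be reproduced with `statistics.NormalDist`. This is a sketch of that branch only, and it does not cover the exact Poisson alternative.

```python
from statistics import NormalDist

lam, n = 8, 40

# Central limit theorem: Xbar is approximately N(lam, lam / n) = N(8, 0.2)
xbar = NormalDist(mu=lam, sigma=(lam / n) ** 0.5)

# (i) P(7.1 < Xbar < 8.5)
p = xbar.cdf(8.5) - xbar.cdf(7.1)

# (ii) P(|Xbar - 8| <= k) = 0.95, so k is the 97.5% z-value times sd(Xbar)
k = NormalDist().inv_cdf(0.975) * xbar.stdev

print(f"P(7.1 < Xbar < 8.5) = {p:.3f}")
print(f"k = {k:.3f}")
```

The value of k uses the two-sided critical value, since 2.5% of the probability lies in each tail.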

## Question

John rings a church bell 120 times. The time interval, $${T_i}$$, between two successive rings is a random variable with mean of 2 seconds and variance of $$\frac{1}{9}{\text{ second}}{{\text{s}}^2}$$.

Each time interval, $${T_i}$$, is independent of the other time intervals. Let $$X = \sum\limits_{i = 1}^{119} {{T_i}}$$ be the total time between the first ring and the last ring.

The church vicar subsequently becomes suspicious that John has stopped coming to ring the bell and that he is letting his friend Ray do it. When Ray rings the bell the time interval, $${T_i}$$ has a mean of 2 seconds and variance of $$\frac{1}{{25}}{\text{ second}}{{\text{s}}^2}$$.

The church vicar makes the following hypotheses:

$${H_0}$$: Ray is ringing the bell; $${H_1}$$: John is ringing the bell.

He records four values of $$X$$. He decides on the following decision rule:

If $$236 \leqslant X \leqslant 240$$ for all four values of $$X$$ he accepts $${H_0}$$, otherwise he accepts $${H_1}$$.

Find

(i)     $${\text{E}}(X)$$;

(ii)     $${\text{Var}}(X)$$.

[3]
a.

Explain why a normal distribution can be used to give an approximate model for $$X$$.

[2]
b.

Use this model to find the values of $$A$$ and $$B$$ such that $${\text{P}}(A < X < B) = 0.9$$, where $$A$$ and $$B$$ are symmetrical about the mean of $$X$$.

[7]
c.

Calculate the probability that he makes a Type II error.

[5]
d.

## Markscheme

(i)     $${\text{mean}} = 119 \times 2 = 238$$     A1

(ii)     $${\text{variance}} = 119 \times \frac{1}{9} = \frac{{119}}{9}{\text{ }}( = 13.2)$$     (M1)A1

Note: If 120 is used instead of 119 award A0(M1)A0 for part (a) and apply follow through for parts (b)-(d). (b) is unaffected and in (c) the interval becomes $$(234,{\text{ }}246)$$. In (d) the first 2 A1 marks are for $$0.3633 \ldots$$ and $$0.0174 \ldots$$ so the final answer will round to 0.017.

[3 marks]

a.

justified by the Central Limit Theorem     R1

since $$n$$ is large     A1

Note: Accept $$n > 30$$.

[2 marks]

b.

$$X \sim N\left( {238,{\text{ }}\frac{{119}}{9}} \right)$$

$$Z = \frac{{X - 238}}{{\frac{{\sqrt {119} }}{3}}} \sim N(0,{\text{ }}1)$$     (M1)(A1)

$${\text{P}}(Z < q) = 0.95 \Rightarrow q = 1.644 \ldots$$    (A1)

so $${\text{P}}( -1.644 \ldots < Z < 1.644 \ldots ) = 0.9$$     (R1)

$${\text{P}}\left( { -1.644 \ldots < \frac{{X - 238}}{{\frac{{\sqrt {119} }}{3}}} < 1.644 \ldots } \right) = 0.9$$    (M1)

interval is $$232 < X < 244{\text{ }}({\text{3sf}}){\text{ }}(A = 232,{\text{ }}B = 244)$$     A1A1

Notes: Accept the use of inverse normal applied to the distribution of $$X$$.

Alternative is to use the GDC to find a pretend $$Z$$ confidence interval for a mean and then convert by multiplying by 119.

Either $$A$$ or $$B$$ correct implies the five implied marks.

Accept any numbers that round to these 3sf numbers.

[7 marks]

c.
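
The markscheme accepts inverse normal applied directly to the distribution of $$X$$. A minimal sketch of that approach, assuming the normal model from part (b):

```python
from statistics import NormalDist

n_intervals = 119                  # 120 rings give 119 intervals
mean_x = n_intervals * 2           # E(X)
var_x = n_intervals / 9            # Var(X), with John ringing

x = NormalDist(mu=mean_x, sigma=var_x ** 0.5)

# A symmetric 90% interval leaves 5% in each tail
A = x.inv_cdf(0.05)
B = x.inv_cdf(0.95)

print(f"A = {A:.1f}, B = {B:.1f}")
```

By symmetry, A and B are equidistant from the mean of 238.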

under $${{\text{H}}_1},{\text{ }}X \sim N\left( {238,{\text{ }}\frac{{119}}{9}} \right)$$     (M1)

$${\text{P}}(236 \leqslant X \leqslant 240) = 0.41769 \ldots$$    (A1)

probability that all 4 values of $$X$$ lie in this interval is

$${(0.41769 \ldots )^4} = 0.030439 \ldots$$     (M1)(A1)

so probability of a Type II error is 0.0304 (3sf)     A1

Note: Accept any answer that rounds to 0.030.

[5 marks]

d.
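
A Type II error occurs here when all four recorded values fall in the acceptance region even though John is ringing the bell. The sketch below reproduces that calculation, assuming the four values of $$X$$ are independent; the variable names are illustrative.

```python
from statistics import NormalDist

# Under H1 (John ringing): X is approximately N(238, 119/9)
x_john = NormalDist(mu=238, sigma=(119 / 9) ** 0.5)

# Probability that a single value of X falls in the acceptance region [236, 240]
p_one = x_john.cdf(240) - x_john.cdf(236)

# H0 is accepted only if all four independent values fall in the region
p_type2 = p_one**4

print(f"P(236 <= X <= 240)  = {p_one:.4f}")
print(f"P(Type II error)    = {p_type2:.4f}")
```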
