IB Mathematics SL 4.2 Presentation of data AI HL Paper 2- Exam Style Questions- New Syllabus

Question

The heights, \( h \), of 200 university students are documented in the table below.

| Height (cm) | Frequency |
| --- | --- |
| \(140 \leq h < 160\) | 11 |
| \(160 \leq h < 170\) | 51 |
| \(170 \leq h < 180\) | 68 |
| \(180 \leq h < 190\) | 47 |
| \(190 \leq h < 210\) | 23 |

(a) (i) Provide the mid-interval value for the range \( 140 \leq h < 160 \).

(ii) Compute an approximate mean height for the 200 students. [3]

(b) Utilize the cumulative frequency graph to approximate the interquartile range. [2]

Elena is a student included in the dataset, and her height is 204 cm.

(c) Employ your result from part (b) to assess whether Elena’s height qualifies as an outlier for this data. Provide a justification for your assessment. [3]

It is hypothesized that the heights of university students conform to a normal distribution with a mean of 176 cm and a standard deviation of 13.5 cm.

A \(\chi^2\) goodness of fit test is planned to evaluate whether this sample of 200 students could reasonably be derived from an underlying distribution \( N(176, 13.5^2) \).

(d) State the null and alternative hypotheses for the test. [2]

As part of the test, the following table is prepared.

| Height of student (cm) | Observed frequency | Expected frequency |
| --- | --- | --- |
| \(h < 160\) | 11 | 23.6 |
| \(160 \leq h < 170\) | 51 | 42.1 |
| \(170 \leq h < 180\) | 68 | \(a\) |
| \(180 \leq h < 190\) | 47 | 46.7 |
| \(190 \leq h\) | 23 | \(b\) |

(e) (i) Determine the values of \( a \) and \( b \).

(ii) Consequently, conduct the test at a 5% significance level, clearly stating the conclusion in context. [8]

Markscheme

(a)

(i) The mid-interval value is calculated as the average of the lower and upper bounds of the range \( 140 \leq h < 160 \).

150 (cm)   A1

(ii) Apply the mean formula: multiply each mid-interval value by its corresponding frequency, sum the products, and divide by the total number of students.

Mid-interval values: 150, 165, 175, 185, 200 (for ranges 140-160, 160-170, 170-180, 180-190, 190-210).

Frequencies: 11, 51, 68, 47, 23.

\[ \text{mean} = \frac{(150 \times 11) + (165 \times 51) + (175 \times 68) + (185 \times 47) + (200 \times 23)}{200} \]

\[ = \frac{1650 + 8415 + 11900 + 8695 + 4600}{200} \]

\[ = \frac{35260}{200} = 176.3 \]

Approximating, \( \text{mean} = 176 \ (176.3) \ (\text{cm}) \)

A1

[3 marks]
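The arithmetic in part (a) can be checked with a short Python sketch. The variable names (`intervals`, `frequencies`, `mean_estimate`) are illustrative only, and the upper class is assumed to end at 210 cm, as in the question's table.

```python
# Estimate the mean of grouped data from mid-interval values.
# Class boundaries and frequencies are taken from the question's table.
intervals = [(140, 160), (160, 170), (170, 180), (180, 190), (190, 210)]
frequencies = [11, 51, 68, 47, 23]

# Mid-interval value = average of the lower and upper class boundaries.
midpoints = [(low + high) / 2 for low, high in intervals]

total = sum(frequencies)
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
mean_estimate = weighted_sum / total

print(midpoints)
print(total, weighted_sum, mean_estimate)
```

This is only an estimate of the mean, because every student in a class is treated as if their height were the mid-interval value.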

(b)

Using the cumulative frequency curve, identify the 25th percentile (Q1) and 75th percentile (Q3) values.

183 OR 168 seen (representing Q3 and Q1 respectively) A1

Note: These values may be seen in the working for part (c).

\[ \text{IQR} = 183 - 168 = 15 \ (\text{cm}) \] A1

[2 marks]

(c)

Calculating the upper bound for outliers using the interquartile range: upper bound = Q3 + 1.5 × IQR.

\[ \text{upper bound} = 183 + 1.5 \times 15 = 205.5 \] seen A1

Comparing Elena’s height (204 cm) with the upper bound: \( 205.5 > 204 \), or \( 204 - 183 < 22.5 \), or \( 204 - 22.5 < 183 \).   R1

Elena’s height is not an outlier   A1

Note: Do not award R0A1.

[3 marks]
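The outlier check in part (c) can be sketched in Python as follows. The quartiles are the values read from the cumulative frequency graph in part (b). The names `lower_fence` and `upper_fence` are illustrative, and the lower fence is included only for completeness, since the markscheme needs only the upper bound.

```python
# Outlier check using the 1.5 x IQR rule.
q1, q3 = 168, 183          # quartiles read from the graph in part (b)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

elena = 204                # Elena's height in cm
is_outlier = elena < lower_fence or elena > upper_fence

print(iqr, lower_fence, upper_fence, is_outlier)
```

Since 204 lies below the upper fence of 205.5, `is_outlier` is `False`, in agreement with the conclusion above.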

(d)

Defining the hypotheses for the \(\chi^2\) goodness of fit test.

\(H_0\): The heights of the students can be modelled by \(N(176, 13.5^2)\)

\(H_1\): The heights of the students cannot be modelled by \(N(176, 13.5^2)\)

A1A1

Award A1 for each correct hypothesis that includes a reference to normal distribution with a mean of 176 and a standard deviation of 13.5 (or variance of \(13.5^2\)). “Correlation”, “independence”, “association”, and “relationship” are incorrect.

Award at most A0A1 for correctly worded hypotheses that include a reference to a normal distribution but omit the distribution’s parameters in one or both hypotheses. Award A0A1 for correct hypotheses that are reversed.

[2 marks]

(e)

(i) Assuming the heights follow \( h \sim N(176, 13.5^2) \), calculate the expected frequencies by determining the probabilities for each range and multiplying by 200.

Attempt to find normal probability for \( 170 \leq h < 180 \) or \( h \geq 190 \) M1

Using standard normal distribution tables or calculations:

\( P(170 \leq h < 180) = P\left(\frac{170 - 176}{13.5} \leq Z < \frac{180 - 176}{13.5}\right) = P(-0.444 \leq Z < 0.296) \)

\( \approx 0.6165 - 0.3284 = 0.2881 \), so \( a \approx 0.2881 \times 200 \approx 57.6 \)

\( P(h \geq 190) = P\left(Z \geq \frac{190 - 176}{13.5}\right) = P(Z \geq 1.037) \)

\( \approx 1 - 0.8501 = 0.1499 \), so \( b \approx 0.1499 \times 200 \approx 30.0 \)

Rounding to one decimal place, \( a = 57.6 \ (57.6274\ldots), \quad b = 30.0 \ (29.9718\ldots) \) A1A1
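As a check on part (e)(i), the expected frequencies can be computed without tables using `statistics.NormalDist` from the Python standard library. This is a minimal sketch; the names `model`, `n`, `a` and `b` are chosen to mirror the question and are not part of the markscheme.

```python
from statistics import NormalDist

# Hypothesised distribution N(176, 13.5^2) and sample size from the question.
model = NormalDist(mu=176, sigma=13.5)
n = 200

# Expected frequency = sample size x probability of the height interval.
p_a = model.cdf(180) - model.cdf(170)   # P(170 <= h < 180)
p_b = 1 - model.cdf(190)                # P(h >= 190)

a = n * p_a
b = n * p_b

print(round(a, 1), round(b, 1))
```

The two values should agree with the markscheme's \( a = 57.6 \) and \( b = 30.0 \) to one decimal place.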

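(ii) The degrees of freedom (df) equal the number of intervals minus 1, since the mean and standard deviation are specified in the hypothesis rather than estimated from the sample: \( 5 - 1 = 4 \).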

\( df = 4 \)

Calculating the p-value from the \(\chi^2\) test statistic, \( \chi^2 = \sum \frac{(O - E)^2}{E} \), using the observed and expected frequencies.

\[ p = 0.0166 \ (0.0166282\ldots) \] A1

Comparing the p-value to the 5% significance level R1

\( 0.0166 < 0.05 \)

Note: Accept p-value of 0.0165 ( = 0.0164903…) from using \( a \) and \( b \) to 3 sf.

Reject \(H_0\). There is sufficient evidence to say that the data has not been drawn from the \(N(176, 13.5^2)\) distribution.

A1

Note: Do not award R0A1.

The conclusion to part (e)(ii) MUST follow through from their hypotheses seen in part (d); if hypotheses are incorrect/reversed etc., the answer to part (e)(ii) must reflect this in order for the A1 to be credited.

[8 marks]
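The test in part (e)(ii) would normally be run on a graphic display calculator. The sketch below reproduces it in plain Python under stated assumptions: the three expected frequencies given in the table are used as printed, \( a \) and \( b \) are recomputed from \( N(176, 13.5^2) \), and the p-value uses the closed form of the \(\chi^2\) upper-tail probability for 4 degrees of freedom, \( e^{-x/2}(1 + x/2) \). That closed form applies only to df = 4 and is not a general routine.

```python
from math import exp
from statistics import NormalDist

model = NormalDist(mu=176, sigma=13.5)
n = 200

# Expected frequencies: a and b are computed, the rest are the table values.
a = n * (model.cdf(180) - model.cdf(170))
b = n * (1 - model.cdf(190))

observed = [11, 51, 68, 47, 23]
expected = [23.6, 42.1, a, 46.7, b]

# Chi-squared statistic: sum of (O - E)^2 / E over the five intervals.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability of chi-squared with df = 4 (closed form for df = 4 only).
half = chi2 / 2
p_value = exp(-half) * (1 + half)

reject_h0 = p_value < 0.05
print(round(chi2, 2), round(p_value, 4), reject_h0)
```

The statistic comes out near 12.1 and the p-value near 0.0166, so \(H_0\) is rejected at the 5% level, consistent with the markscheme.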

[Total: 18 marks]
