IB Mathematics AI HL: SL 4.2 Presentation of Data, Paper 2 Exam Style Questions (New Syllabus)
The heights, \( h \), of 200 university students are documented in the table below.
Height (cm) | Frequency |
---|---|
\(140 \leq h < 160\) | 11 |
\(160 \leq h < 170\) | 51 |
\(170 \leq h < 180\) | 68 |
\(180 \leq h < 190\) | 47 |
\(190 \leq h < 210\) | 23 |
(a) (i) Provide the mid-interval value for the range \( 140 \leq h < 160 \).
(ii) Compute an approximate mean height for the 200 students. [3]
(b) Utilize the cumulative frequency graph to approximate the interquartile range. [2]
Elena is a student included in the dataset, and her height is 204 cm.
(c) Employ your result from part (b) to assess whether Elena’s height qualifies as an outlier for this data. Provide a justification for your assessment. [3]
It is hypothesized that the heights of university students follow a normal distribution with a mean of 176 cm and a standard deviation of 13.5 cm.
A \(\chi^2\) goodness of fit test is planned to evaluate whether this sample of 200 students could reasonably be derived from an underlying distribution \( N(176, 13.5^2) \).
(d) State the null and alternative hypotheses for the test. [2]
As part of the test, the following table is prepared.
Height of student (cm) | Observed frequency | Expected frequency |
---|---|---|
\(h < 160\) | 11 | 23.6 |
\(160 \leq h < 170\) | 51 | 42.1 |
\(170 \leq h < 180\) | 68 | \(a\) |
\(180 \leq h < 190\) | 47 | 46.7 |
\(190 \leq h\) | 23 | \(b\) |
(e) (i) Determine the values of \( a \) and \( b \).
(ii) Consequently, conduct the test at a 5% significance level, clearly stating the conclusion in context. [8]
Answer/Explanation
(a)
(i) The mid-interval value is calculated as the average of the lower and upper bounds of the range \( 140 \leq h < 160 \).
150 (cm) A1
(ii) Multiply each mid-interval value by its corresponding frequency, sum the products, and divide by the total number of students (200).
Mid-interval values: 150, 165, 175, 185, 200 (for ranges 140-160, 160-170, 170-180, 180-190, 190-210).
Frequencies: 11, 51, 68, 47, 23.
\[ \text{mean} = \frac{(150 \times 11) + (165 \times 51) + (175 \times 68) + (185 \times 47) + (200 \times 23)}{200} \]
\[ = \frac{1650 + 8415 + 11900 + 8695 + 4600}{200} \]
\[ = \frac{35260}{200} = 176.3 \]
Approximating, \( \text{mean} = 176 \ (176.3) \ (\text{cm}) \)
A1 [3 marks]
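The grouped-mean calculation above can be checked with a short script. This is a minimal sketch in Python; the variable names are illustrative and not part of the mark scheme.

```python
# Grouped-data mean: class boundaries and frequencies from the height table.
intervals = [(140, 160), (160, 170), (170, 180), (180, 190), (190, 210)]
frequencies = [11, 51, 68, 47, 23]

# Mid-interval value = average of the lower and upper class boundaries.
midpoints = [(lo + hi) / 2 for lo, hi in intervals]

# Estimated mean = sum(midpoint * frequency) / total frequency.
total = sum(frequencies)
mean_height = sum(m * f for m, f in zip(midpoints, frequencies)) / total

print(midpoints)
print(total, mean_height)
```

The result is only an estimate of the true mean, because every student in a class is treated as having the mid-interval height.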
(b)
Using the cumulative frequency curve, identify the 25th percentile (Q1) and 75th percentile (Q3) values.
183 OR 168 seen (representing Q3 and Q1 respectively) A1
\[ \text{IQR} = 183 - 168 = 15 \ (\text{cm}) \] A1
[2 marks]
(c)
Calculating the upper bound for outliers using the interquartile range: upper bound = Q3 + 1.5 × IQR.
\[ \text{upper bound} = 183 + 1.5 \times 15 = 205.5 \] seen A1
Comparing Elena’s height (204 cm) with the upper bound: \( 205.5 > 204 \), or \( 204 - 183 < 22.5 \), or \( 204 - 22.5 < 183 \).
Elena’s height is not an outlier A1
[3 marks]
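The outlier check in part (c) can be sketched in Python as follows. The function name `upper_fence` is a hypothetical helper introduced here for illustration; the quartile values are the ones read from the graph in part (b).

```python
def upper_fence(q1, q3, k=1.5):
    """Upper boundary for outliers: Q3 + k * IQR (k = 1.5 by convention)."""
    return q3 + k * (q3 - q1)

# Quartiles read from the cumulative frequency graph in part (b).
q1, q3 = 168, 183
fence = upper_fence(q1, q3)

# A value is an outlier on the high side only if it exceeds the fence.
elena = 204
is_outlier = elena > fence

print(fence, is_outlier)
```

Because the quartiles are graph readings, slightly different readings would shift the fence, so a value this close to the boundary depends on the accuracy of part (b).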
(d)
Defining the hypotheses for the \(\chi^2\) goodness of fit test.
\(H_0\): The heights of the students can be modelled by \(N(176, 13.5^2)\)
\(H_1\): The heights of the students cannot be modelled by \(N(176, 13.5^2)\)
A1A1
Award A1 for each correct hypothesis that includes a reference to normal distribution with a mean of 176 and a standard deviation of 13.5 (or variance of \(13.5^2\)). “Correlation”, “independence”, “association”, and “relationship” are incorrect.
Award at most A0A1 for correctly worded hypotheses that include a reference to a normal distribution but omit the distribution’s parameters in one or both hypotheses. Award A0A1 for correct hypotheses that are reversed.
[2 marks]
(e)
(i) Assuming the heights follow \( h \sim N(176, 13.5^2) \), calculate the expected frequencies by determining the probabilities for each range and multiplying by 200.
Attempt to find normal probability for \( 170 \leq h < 180 \) or \( h \geq 190 \) M1
Using standard normal distribution tables or calculations:
\( P(170 \leq h < 180) = P\left(\frac{170 - 176}{13.5} \leq Z < \frac{180 - 176}{13.5}\right) = P(-0.4444 \leq Z < 0.2963) \)
\( \approx 0.6165 - 0.3284 = 0.2881 \), so \( a = 0.2881 \times 200 \approx 57.6 \)
\( P(h \geq 190) = P\left(Z \geq \frac{190 - 176}{13.5}\right) = P(Z \geq 1.0370) \)
\( \approx 1 - 0.8501 = 0.1499 \), so \( b = 0.1499 \times 200 \approx 30.0 \)
Rounding to one decimal place, \( a = 57.6 \ (57.6274\ldots), \quad b = 30.0 \ (29.9718\ldots) \) A1A1
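As a cross-check on \( a \) and \( b \), the expected frequencies can be computed directly from the normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); in the exam itself a GDC would be used, so treat this as one possible way to verify the values rather than the intended method.

```python
from statistics import NormalDist

# Hypothesized model for the heights: N(176, 13.5^2).
heights = NormalDist(mu=176, sigma=13.5)
n = 200  # sample size

# Expected frequency = sample size * probability of the class.
a = n * (heights.cdf(180) - heights.cdf(170))  # 170 <= h < 180
b = n * (1 - heights.cdf(190))                 # h >= 190

print(round(a, 1), round(b, 1))
```

Working with unrounded z-values matters here: rounding \( z \) to two decimal places before looking up probabilities can shift the expected frequencies by more than 0.1.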
(ii) Determining degrees of freedom (df) as the number of intervals minus 1 (5 - 1 = 4).
\( df = 4 \)
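Calculating the p-value from the \(\chi^2\) test statistic, \( \chi^2 = \sum \frac{(O - E)^2}{E} \approx 12.1 \), where \( O \) and \( E \) are the observed and expected frequencies.
\[ p = 0.0166 \ (0.0166282\ldots) \] A1
Comparing the p-value to the 5% significance level:
\( 0.0166 < 0.05 \) R1
There is sufficient evidence to reject \(H_0\): the heights of the students cannot be modelled by \(N(176, 13.5^2)\).
A1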
Note: Do not award R0A1.
The conclusion to part (e)(ii) MUST follow through from their hypotheses seen in part (d); if hypotheses are incorrect/reversed etc., the answer to part (e)(ii) must reflect this in order for the A1 to be credited.
[8 marks]
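The test in part (e)(ii) can be reproduced approximately with the standard library alone. This sketch assumes the expected frequencies rounded to one decimal place as they appear in the table, so the p-value lands near, but not exactly on, the quoted 0.0166. The helper `chi2_sf_even_df` is a hypothetical name; it uses the closed-form upper-tail probability of the \(\chi^2\) distribution, which is valid only when the degrees of freedom are even.

```python
import math

observed = [11, 51, 68, 47, 23]
# Expected frequencies from the table, with a and b rounded to 1 d.p. (assumption).
expected = [23.6, 42.1, 57.6, 46.7, 30.0]

# Test statistic: sum of (O - E)^2 / E over all classes.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # no parameters were estimated from the sample

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of df."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

p_value = chi2_sf_even_df(chi2, df)
reject_h0 = p_value < 0.05  # 5% significance level

print(round(chi2, 2), df, round(p_value, 4), reject_h0)
```

Since the p-value falls below 0.05, the script reaches the same decision as the worked solution: reject \(H_0\).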
[Total: 18 marks]