IB Mathematics SL 4.4 Linear correlation of bivariate data AA SL Paper 2 - Exam Style Questions - New Syllabus
A class is given two tests, Test A and Test B. Each test is scored out of a total of 100 marks. The scores of the students are shown in the following table.
Let \( x \) be the score on Test A and \( y \) be the score on Test B.
The teacher finds that the equation of the regression line of \( y \) on \( x \) for these scores is \( y = 0.822x + 18.4 \).
a) Find the value of Pearson’s product-moment correlation coefficient, \( r \) [5]
Giovanni was absent for Test A and Paulo was absent for Test B. The teacher uses the regression line of \( y \) on \( x \) to estimate the missing scores. Paulo scored 10 on Test A. The teacher estimated his score on Test B to be 27 to the nearest integer using the following calculation:
\( y = 0.822 \times 10 + 18.4 \approx 27 \)
b) Give a reason why this method is not appropriate for Paulo [3]
Giovanni scored 90 on Test B. The teacher estimated his score on Test A to be 87 to the nearest integer using the following calculation:
\( 90 = 0.822x + 18.4 \), so \( x = \frac{90 - 18.4}{0.822} \approx 87 \)
ci) Give a reason why this method is not appropriate for Giovanni [3]
cii) Use an appropriate method to show that the estimated Test A score for Giovanni is 86 to the nearest integer [5]
Answer/Explanation
a) Use the data pairs for 10 students: (52, 58), (71, 80), (100, 92), (93, 98), (81, 90), (80, 82), (88, 100), (100, 100), (70, 65), (61, 74).
Calculate means:
\( \sum x = 52 + 71 + 100 + 93 + 81 + 80 + 88 + 100 + 70 + 61 = 796 \)
\( \bar{x} = \frac{796}{10} = 79.6 \)
\( \sum y = 58 + 80 + 92 + 98 + 90 + 82 + 100 + 100 + 65 + 74 = 839 \)
\( \bar{y} = \frac{839}{10} = 83.9 \)
Calculate standard deviations:
Deviations \( x_i - \bar{x} \): -27.6, -8.6, 20.4, 13.4, 1.4, 0.4, 8.4, 20.4, -9.6, -18.6
Squared deviations: 761.76, 73.96, 416.16, 179.56, 1.96, 0.16, 70.56, 416.16, 92.16, 345.96
\( \sum (x_i - \bar{x})^2 = 2358.4 \)
\( s_x = \sqrt{\frac{2358.4}{10}} = \sqrt{235.84} \approx 15.36 \)
Deviations \( y_i - \bar{y} \): -25.9, -3.9, 8.1, 14.1, 6.1, -1.9, 16.1, 16.1, -18.9, -9.9
Squared deviations: 670.81, 15.21, 65.61, 198.81, 37.21, 3.61, 259.21, 259.21, 357.21, 98.01
\( \sum (y_i - \bar{y})^2 = 1964.9 \)
\( s_y = \sqrt{\frac{1964.9}{10}} = \sqrt{196.49} \approx 14.02 \)
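As a cross-check on the hand arithmetic, the means and standard deviations can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the list names `test_a` and `test_b` are illustrative, and `pstdev` is chosen because the working above divides by \( n = 10 \) rather than \( n - 1 \).

```python
from statistics import fmean, pstdev

# Scores read from the data pairs listed in part a).
test_a = [52, 71, 100, 93, 81, 80, 88, 100, 70, 61]   # x
test_b = [58, 80, 92, 98, 90, 82, 100, 100, 65, 74]   # y

mean_x = fmean(test_a)
mean_y = fmean(test_b)

# Population standard deviation (divides by n), matching the hand working.
sd_x = pstdev(test_a)
sd_y = pstdev(test_b)

print(f"mean_x = {mean_x:.1f}, mean_y = {mean_y:.1f}")  # mean_x = 79.6, mean_y = 83.9
print(f"sd_x = {sd_x:.2f}, sd_y = {sd_y:.2f}")          # sd_x = 15.36, sd_y = 14.02
```

Either divisor gives the same \( r \), since only the ratio \( s_y / s_x \) enters the slope formula.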
Using slope \( b_{y|x} = 0.822 \):
\( 0.822 = r \times \frac{s_y}{s_x} = r \times \frac{14.02}{15.36} \)
\( \frac{14.02}{15.36} \approx 0.9128 \)
\( r = \frac{0.822}{0.9128} \approx 0.901 \)
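To three significant figures, the accuracy IB expects unless told otherwise: \( r \approx 0.901 \) (a GDC applied directly to the ten data pairs gives the same value) [5]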
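The value of \( r \) can also be obtained directly from the data, without passing through the rounded slope 0.822. The sketch below uses the sums of squared deviations and cross-products; the variable names (`s_xx`, `s_yy`, `s_xy`) are labels chosen for this sketch, not notation from the question.

```python
from math import sqrt

# Scores read from the data pairs listed in part a).
test_a = [52, 71, 100, 93, 81, 80, 88, 100, 70, 61]   # x
test_b = [58, 80, 92, 98, 90, 82, 100, 100, 65, 74]   # y

n = len(test_a)
mean_x = sum(test_a) / n
mean_y = sum(test_b) / n

# Sums of squared deviations and of cross-products about the means.
s_xx = sum((x - mean_x) ** 2 for x in test_a)
s_yy = sum((y - mean_y) ** 2 for y in test_b)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test_a, test_b))

r = s_xy / sqrt(s_xx * s_yy)   # Pearson's product-moment correlation
slope_y_on_x = s_xy / s_xx     # slope of the regression line of y on x

print(round(r, 3))             # 0.901
print(round(slope_y_on_x, 3))  # 0.822
```

The recovered slope agrees with the teacher's regression line, which confirms that the data pairs and the given equation are consistent.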
b) Paulo’s Test A score of 10 is far below the data range (52 to 100), requiring extrapolation.
The calculation \( y = 0.822 \times 10 + 18.4 = 26.62 \approx 27 \) assumes the linear relationship continues to hold outside the observed range. The data cannot confirm this: the relationship may be non-linear there, or unmodeled factors may apply [3]
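The range check behind this argument is simple enough to automate. The helper below is hypothetical (the name `is_extrapolation` is invented for this sketch) and only tests whether a new Test A score lies outside the observed scores.

```python
# Test A scores read from the data pairs listed in part a).
test_a = [52, 71, 100, 93, 81, 80, 88, 100, 70, 61]

def is_extrapolation(x_new, observed):
    """Return True when x_new lies outside the range of the observed values."""
    return x_new < min(observed) or x_new > max(observed)

print(min(test_a), max(test_a))      # 52 100
print(is_extrapolation(10, test_a))  # True: Paulo's score is outside the data
print(is_extrapolation(75, test_a))  # False: 75 would be interpolation
```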
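ci) The regression line \( y = 0.822x + 18.4 \) is the line of \( y \) on \( x \): it is fitted to predict \( y \) from \( x \), but Giovanni’s estimate uses it to predict \( x \) from \( y = 90 \).
Solving \( 90 = 0.822x + 18.4 \) to get \( x = \frac{90 - 18.4}{0.822} \approx 87 \) runs the model backwards. The line of \( y \) on \( x \) minimizes the squared errors in \( y \), not in \( x \), so rearranging it does not give the best estimate of \( x \); the regression line of \( x \) on \( y \) is needed instead [3]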
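One way to see why the rearranged line is not appropriate is to write down what each regression line minimizes. In the block below, \( c \) is a symbol introduced here for the intercept of the line of \( y \) on \( x \), \( a \) is the intercept of the line of \( x \) on \( y \), and the sums run over the \( n \) observed data pairs.

```latex
% Line of y on x: chosen to minimize the squared errors in y
\min_{b_{y|x},\, c} \; \sum_{i=1}^{n} \bigl( y_i - (b_{y|x}\, x_i + c) \bigr)^2

% Line of x on y: chosen to minimize the squared errors in x
\min_{b_{x|y},\, a} \; \sum_{i=1}^{n} \bigl( x_i - (b_{x|y}\, y_i + a) \bigr)^2
```

The two criteria generally produce different lines; they coincide only when the points are perfectly collinear (\( r = \pm 1 \)).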
cii) Use the regression line of \( x \) on \( y \). Slope \( b_{x|y} = r \times \frac{s_x}{s_y} \):
\( b_{x|y} = 0.901 \times \frac{15.36}{14.02} \approx 0.901 \times 1.0956 \approx 0.9871 \)
Intercept \( a \) using \( \bar{x} = b_{x|y} \bar{y} + a \):
\( 79.6 = 0.9871 \times 83.9 + a \)
\( 0.9871 \times 83.9 \approx 82.8177 \)
\( a \approx 79.6 - 82.8177 \approx -3.2177 \)
Regression line: \( x \approx 0.9871 \times y - 3.2177 \)
For \( y = 90 \):
\( x \approx 0.9871 \times 90 - 3.2177 \)
\( 0.9871 \times 90 \approx 88.839 \)
\( x \approx 88.839 - 3.2177 \approx 85.6213 \approx 86 \) [5]
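The same estimate can be reproduced from the raw data by fitting the line of \( x \) on \( y \) directly. This is a sketch under the assumption that the ten data pairs in part a) are the full data set; the function name `estimate_test_a` is hypothetical.

```python
# Scores read from the data pairs listed in part a).
test_a = [52, 71, 100, 93, 81, 80, 88, 100, 70, 61]   # x
test_b = [58, 80, 92, 98, 90, 82, 100, 100, 65, 74]   # y

n = len(test_a)
mean_x = sum(test_a) / n
mean_y = sum(test_b) / n

# For the line of x on y, the slope divides by the spread in y, not in x.
s_yy = sum((y - mean_y) ** 2 for y in test_b)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test_a, test_b))

slope_x_on_y = s_xy / s_yy
intercept_x_on_y = mean_x - slope_x_on_y * mean_y  # line passes through the mean point

def estimate_test_a(test_b_score):
    """Estimate a Test A score from a Test B score using the line of x on y."""
    return slope_x_on_y * test_b_score + intercept_x_on_y

giovanni = estimate_test_a(90)
print(f"{giovanni:.2f}")  # 85.62
print(round(giovanni))    # 86
```

Because the sketch keeps full precision, its intermediate values differ slightly in the later decimal places from the rounded hand working, but the estimate still rounds to 86.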
Alternative: \( b_{x|y} = \frac{r^2}{b_{y|x}} \)
\( r^2 \approx 0.8118 \), \( b_{x|y} \approx \frac{0.8118}{0.822} \approx 0.9876 \)
\( 79.6 = 0.9876 \times 83.9 + a \)
\( 0.9876 \times 83.9 \approx 82.8596 \)
\( a \approx -3.2596 \)
\( x \approx 0.9876 \times 90 - 3.2596 \approx 85.6244 \approx 86 \) [5]
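The alternative rests on the identity \( b_{y|x} \, b_{x|y} = r^2 \). A short derivation follows, writing \( S_{xx} \), \( S_{yy} \), and \( S_{xy} \) for the sums of squared deviations and of cross-products about the means (notation introduced here, not taken from the question).

```latex
b_{y|x} = \frac{S_{xy}}{S_{xx}}, \qquad
b_{x|y} = \frac{S_{xy}}{S_{yy}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}

b_{y|x} \, b_{x|y} = \frac{S_{xy}^{2}}{S_{xx}\, S_{yy}} = r^{2}
\quad\Longrightarrow\quad
b_{x|y} = \frac{r^{2}}{b_{y|x}}
```

This route needs only \( r \) and the given slope 0.822, so it avoids recomputing the standard deviations.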