IBDP Maths SL 4.4 Linear correlation of bivariate data AA HL Paper 2 - Exam Style Questions - New Syllabus
A botanist is conducting an experiment which studies the growth of plants. The heights of the plants are measured on seven different days. The following table shows the number of days, \( d \), that the experiment has been running and the average height, \( h \) cm, of the plants on each of those days.
Number of days (\( d \)) | 2 | 5 | 13 | 24 | 33 | 37 | 42 |
---|---|---|---|---|---|---|---|
Average height (\( h \)) | 10 | 16 | 30 | 59 | 76 | 79 | 82 |
The value of Pearson’s product-moment correlation coefficient, \( r \), for this data is \( 0.991 \), correct to three significant figures.
(a) The regression line of \( h \) on \( d \) for this data can be written in the form \( h = ad + b \). Find the value of \( a \) and the value of \( b \). [2]
(b) Use your regression line to estimate the average height of the plants when the experiment has been running for 20 days. [2]
Answer/Explanation
(a) [2 marks]
Regression line: \( h = ad + b \).
Slope: \( a = r \cdot \frac{s_h}{s_d} \), where \( r = 0.991 \), \( s_d \approx 14.73 \), \( s_h \approx 28.73 \) (population standard deviations).
\( a = 0.991 \cdot \frac{28.73}{14.73} \approx 1.93 \); the unrounded value is \( a = 1.93258\ldots \) (A1).
Intercept: \( b = \bar{h} - a \bar{d} \), where \( \bar{d} \approx 22.286 \), \( \bar{h} \approx 50.286 \).
\( b \approx 50.286 - 1.93258 \cdot 22.286 \approx 7.22 \); the unrounded value is \( b = 7.21662\ldots \) (A1).
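As a cross-check, the coefficients can be recomputed directly from the table using the least-squares formulas \( a = S_{dh}/S_{dd} \) and \( b = \bar{h} - a\bar{d} \). The following is a minimal Python sketch; the variable names (`s_dd`, `s_dh`) are illustrative and not part of the markscheme.

```python
# Least-squares regression of h on d, using the data from the question.
d = [2, 5, 13, 24, 33, 37, 42]
h = [10, 16, 30, 59, 76, 79, 82]

n = len(d)
d_bar = sum(d) / n
h_bar = sum(h) / n

# Sum of squares of d and sum of cross-products, both about the means.
s_dd = sum((di - d_bar) ** 2 for di in d)
s_dh = sum((di - d_bar) * (hi - h_bar) for di, hi in zip(d, h))

a = s_dh / s_dd        # slope of h on d
b = h_bar - a * d_bar  # intercept

print(f"a = {a:.3g}, b = {b:.3g}")  # prints: a = 1.93, b = 7.22
```

In an exam these values would normally be read from the calculator's linear regression function rather than computed by hand.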
(b) [2 marks]
Use regression line: \( h = 1.93d + 7.22 \) (M1).
For \( d = 20 \), using the unrounded coefficients: \( h = 1.93258 \cdot 20 + 7.21662 \approx 45.868 \approx 45.9 \) cm (A1).
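The estimate is sensitive to when the rounding is done. The sketch below, which assumes a small hypothetical helper `fit_line` written for this illustration, compares substituting the unrounded coefficients with substituting the 3 s.f. values \( 1.93 \) and \( 7.22 \).

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


d = [2, 5, 13, 24, 33, 37, 42]
h = [10, 16, 30, 59, 76, 79, 82]
a, b = fit_line(d, h)

# Estimate at d = 20 with full-precision coefficients.
full = a * 20 + b
# Same estimate after rounding the coefficients first.
rounded = round(a, 2) * 20 + round(b, 2)

print(f"unrounded coefficients: {full:.1f} cm")     # prints 45.9 cm
print(f"rounded coefficients:   {rounded:.2f} cm")  # prints 45.82 cm
```

Only the unrounded coefficients reproduce the markscheme value of \( 45.9 \) cm; rounding first gives \( 45.82 \), which would round to \( 45.8 \).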
Markscheme Answers:
(a) \( a = 1.93 \), \( b = 7.22 \) (A1A1)
(b) \( 45.9 \) cm (M1A1)
Total [4 marks]
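A class is given two tests, Test A and Test B. Each test is scored out of a total of 100 marks. The scores of the students are shown in the following table.
Test A | 52 | 71 | 100 | 93 | 81 | 80 | 88 | 100 | 70 | 61 |
---|---|---|---|---|---|---|---|---|---|---|
Test B | 58 | 80 | 92 | 98 | 90 | 82 | 100 | 100 | 65 | 74 |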
Let \( x \) be the score on Test A and \( y \) be the score on Test B.
The teacher finds that the equation of the regression line of \( y \) on \( x \) for these scores is \( y = 0.822x + 18.4 \).
(a) Find the value of Pearson’s product-moment correlation coefficient, \( r \).
Giovanni was absent for Test A and Paulo was absent for Test B.
The teacher uses the regression line of \( y \) on \( x \) to estimate the missing scores.
Paulo scored 10 on Test A.
The teacher estimated his score on Test B to be 27 to the nearest integer using the following calculation:
\[ y = 0.822(10) + 18.4 \approx 27 \]
(b) Give a reason why this method is not appropriate for Paulo.
Giovanni scored 90 on Test B.
The teacher estimated his score on Test A to be 87 to the nearest integer using the following calculation:
\[ 90 = 0.822x + 18.4, \text{ so } x = \frac{90 - 18.4}{0.822} \approx 87 \]
(c)(i) Give a reason why this method is not appropriate for Giovanni.
(c)(ii) Use an appropriate method to show that the estimated Test A score for Giovanni is 86 to the nearest integer.
Answer/Explanation
(a) [2 marks]
Using data: \( (x, y) \): (52, 58), (71, 80), (100, 92), (93, 98), (81, 90), (80, 82), (88, 100), (100, 100), (70, 65), (61, 74).
Means: \( \bar{x} = \frac{796}{10} = 79.6 \), \( \bar{y} = \frac{839}{10} = 83.9 \) (M1).
Standard deviations: \( s_x = \sqrt{\frac{2358.4}{10}} \approx 15.36 \), \( s_y = \sqrt{\frac{1964.9}{10}} \approx 14.02 \).
Regression slope: \( b_{y|x} = 0.822 = r \cdot \frac{s_y}{s_x} = r \cdot \frac{14.02}{15.36} \approx r \cdot 0.9128 \).
\( r = \frac{0.822}{0.9128} \approx 0.901 \), correct to three significant figures (A1).
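The value of \( r \) can also be computed directly from the ten data pairs, which avoids relying on the rounded slope. A minimal Python sketch, with illustrative variable names, using \( r = S_{xy} / \sqrt{S_{xx} S_{yy}} \):

```python
from math import sqrt

# (x, y) = (Test A, Test B) scores for the ten students with both results.
x = [52, 71, 100, 93, 81, 80, 88, 100, 70, 61]
y = [58, 80, 92, 98, 90, 82, 100, 100, 65, 74]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / sqrt(s_xx * s_yy)  # Pearson's product-moment correlation
slope = s_xy / s_xx           # slope of y on x, for comparison with 0.822

print(f"r = {r:.3f}, slope of y on x = {slope:.3f}")
# prints: r = 0.901, slope of y on x = 0.822
```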
(b) [1 mark]
Paulo’s Test A score (\( x = 10 \)) is outside the data range (52 to 100). Extrapolation using \( y = 0.822x + 18.4 \) is unreliable as the linear relationship may not hold (A1).
(c)(i) [1 mark]
The regression line \( y = 0.822x + 18.4 \) predicts \( y \) from \( x \), not \( x \) from \( y \). Using it to estimate Giovanni’s Test A score from Test B (\( y = 90 \)) is incorrect (A1).
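To see why rearranging the line is not equivalent, compare the two gradients, written with the same symbols \( r \), \( s_x \), \( s_y \) as in part (a):

\[ \text{rearranged } y \text{ on } x: \ \frac{1}{b_{y|x}} = \frac{s_x}{r \, s_y}, \qquad x \text{ on } y: \ b_{x|y} = r \cdot \frac{s_x}{s_y} \]

The two agree only when \( r^2 = 1 \). Here \( r \approx 0.901 \), so the rearranged line has gradient \( \frac{1}{0.822} \approx 1.217 \), whereas the line of \( x \) on \( y \) has gradient \( \approx 0.987 \).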
(c)(ii) [2 marks]
Use regression line of \( x \) on \( y \): \( b_{x|y} = r \cdot \frac{s_x}{s_y} = 0.901 \cdot \frac{15.36}{14.02} \approx 0.987 \) (M1).
Line: \( x = 0.987y - 3.22 \).
For \( y = 90 \): \( x = 0.987 \cdot 90 - 3.22 \approx 85.61 \approx 86 \) (A1).
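The line of \( x \) on \( y \) can be checked from the data by swapping the roles of the two variables in the least-squares formulas. A minimal Python sketch, with illustrative variable names:

```python
# (x, y) = (Test A, Test B) scores for the ten students with both results.
x = [52, 71, 100, 93, 81, 80, 88, 100, 70, 61]
y = [58, 80, 92, 98, 90, 82, 100, 100, 65, 74]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# For x on y, divide the cross-products by the sum of squares of y.
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope_xy = s_xy / s_yy
intercept_xy = x_bar - slope_xy * y_bar

# Estimated Test A score for a Test B score of 90.
estimate = slope_xy * 90 + intercept_xy

print(f"x = {slope_xy:.3f}y + ({intercept_xy:.2f})")  # prints: x = 0.987y + (-3.22)
print(f"estimate at y = 90: {estimate:.2f}")          # prints: estimate at y = 90: 85.62
```

The unrounded estimate of about \( 85.62 \) differs slightly from the \( 85.61 \) obtained with rounded coefficients, but both give \( 86 \) to the nearest integer.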
Markscheme Answers:
(a) \( r = 0.901 \) (M1A1)
(b) Should not extrapolate as score is outside data range (A1)
(c)(i) Should not use line of \( y \) on \( x \) to predict \( x \) from \( y \) (A1)
(c)(ii) \( x = 86 \) (M1A1)
Total [6 marks]