IB Mathematics SL 4.4 Linear correlation of bivariate data AI HL Paper 1- Exam Style Questions- New Syllabus

Question

Bill is investigating whether or not there is a positive association between the heights and weights of boys of a certain age. He defines the hypotheses \[{{\rm{H}}_0}:\rho = 0;\quad {{\rm{H}}_1}:\rho > 0,\] where \(\rho \) denotes the population correlation coefficient between heights and weights of boys of this age. He measures the height, \(h\) cm, and weight, \(w\) kg, of each of a random sample of \(20\) boys of this age and he calculates the following statistics. \[\sum w = 340,\quad \sum h = 2002,\quad \sum w^2 = 5830,\quad \sum h^2 = 201124,\quad \sum hw = 34150\]

a. (i) Calculate the correlation coefficient for this sample.

(ii) Calculate the \(p\)-value of your result and interpret it at the \(1\%\) level of significance. [8]

b. (i) Calculate the equation of the least squares regression line of \(w\) on \(h\).

(ii) The height of a randomly selected boy of this age is \(90\) cm. Estimate his weight. [3]

Answer/Explanation
Solution a(i)

To calculate Pearson’s correlation coefficient (r):

\[r = \frac{n\sum hw - (\sum h)(\sum w)}{\sqrt{[n\sum h^2 - (\sum h)^2][n\sum w^2 - (\sum w)^2]}}\]

Plugging in the given values (n=20):

Numerator = \(20 \times 34150 - 2002 \times 340 = 683000 - 680680 = 2320\)

Denominator part 1 = \(20 \times 201124 - 2002^2 = 4022480 - 4008004 = 14476\)

Denominator part 2 = \(20 \times 5830 - 340^2 = 116600 - 115600 = 1000\)

\[r = \frac{2320}{\sqrt{14476 \times 1000}} = \frac{2320}{3804.74} = 0.6098\]

Thus, the sample correlation coefficient is \(\boxed{0.610}\) (to 3 significant figures).
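
The arithmetic above can be checked with a short Python sketch. It uses only the summary statistics quoted in the question; the function name `pearson_r` and its parameter names are illustrative choices, not part of the original solution.

```python
import math

# Summary statistics quoted in the question (n = 20 boys).
n = 20
sum_w, sum_h = 340, 2002
sum_w2, sum_h2, sum_hw = 5830, 201124, 34150

def pearson_r(n, sx, sy, sxx, syy, sxy):
    """Pearson's r from sums, using the same formula as the worked solution."""
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# x is height (h), y is weight (w); r is symmetric in the two variables.
r = pearson_r(n, sum_h, sum_w, sum_h2, sum_w2, sum_hw)
print(round(r, 4))  # 0.6098
```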

Solution a(ii)

To test the significance of the correlation:

Test statistic: \( t = r\sqrt{\frac{n-2}{1-r^2}} \) with \( n-2 = 18 \) degrees of freedom

\[ t = 0.6098\sqrt{\frac{18}{1-0.6098^2}} = 0.6098\sqrt{\frac{18}{0.6281}} = 0.6098 \times 5.353 = 3.264 \]

For a one-tailed t-test (since H₁: ρ > 0) with df=18:

p-value = P(T > 3.264) ≈ 0.00215 (from t-distribution tables or calculator)

Since 0.00215 < 0.01 (1% significance level), we reject H₀.

Conclusion: There is statistically significant evidence at the 1% level to support a positive correlation between height and weight.
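
In an exam the p-value comes from a calculator, but it can also be reproduced with the Python sketch below. To stay within the standard library, the sketch integrates the Student's t density numerically with Simpson's rule; this is a substitute technique chosen for illustration, and the helper names `t_pdf` and `upper_tail` are assumptions rather than part of the original solution.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral on [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n = 20
r = 2320 / math.sqrt(14476 * 1000)  # sample correlation from part a(i)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
p_value = upper_tail(t_stat, n - 2)

print(round(t_stat, 3))  # 3.264
print(p_value < 0.01)    # True means H0 is rejected at the 1% level
```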

Solution b(i)

For the regression line \( w = a + bh \):

Slope (b): \( b = r\frac{s_w}{s_h} = \frac{n\sum hw - \sum h\sum w}{n\sum h^2 - (\sum h)^2} \)

\[ b = \frac{2320}{14476} = 0.1603 \text{ kg/cm} \]

Intercept (a): \( a = \bar{w} - b\bar{h} = \frac{340}{20} - 0.16027 \times \frac{2002}{20} = 17 - 16.04 = 0.96 \text{ kg} \)

Thus, the regression equation is:

\(\boxed{w = 0.96 + 0.160h}\)

Interpretation: For each additional cm in height, weight increases by about 0.160 kg on average.
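
The slope and intercept can be recomputed from the summary statistics with the Python sketch below. The variable names are illustrative, and the unrounded values are kept so that the effect of rounding on later steps is visible.

```python
# Summary statistics quoted in the question (n = 20 boys).
n = 20
sum_w, sum_h = 340, 2002
sum_h2, sum_hw = 201124, 34150

# Least squares regression of w on h: w = intercept + slope * h.
slope = (n * sum_hw - sum_h * sum_w) / (n * sum_h2 - sum_h ** 2)
intercept = sum_w / n - slope * sum_h / n

print(round(slope, 4), round(intercept, 3))  # 0.1603 0.957
```

Carrying the unrounded slope gives an intercept of about 0.957, which agrees with 0.96 to 2 decimal places.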

Solution b(ii)

Using the regression equation to estimate weight when h = 90 cm:

\[ w = 0.96 + 0.160 \times 90 = 0.96 + 14.4 = 15.36 \text{ kg} \]

\(\boxed{15.4 \text{ kg}}\) (to 3 significant figures)

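Note: The question gives only summary statistics, so the observed range of heights is unknown. The sample mean height is 100.1 cm with a standard deviation of about 6 cm, so 90 cm lies roughly 1.7 standard deviations below the mean and is plausibly within the range of the data. The estimate would be less reliable for heights far outside the sample range.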
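
As a check on rounding, the Python sketch below compares the estimate from the rounded coefficients with the one from the unrounded regression line. The function name `predict_weight` is a hypothetical helper introduced for illustration.

```python
slope = 2320 / 14476                      # b from part b(i), unrounded
intercept = 340 / 20 - slope * 2002 / 20  # a from part b(i), unrounded

def predict_weight(height_cm):
    """Estimated weight in kg from the fitted line w = a + b*h."""
    return intercept + slope * height_cm

rounded = 0.96 + 0.160 * 90  # coefficients rounded as in the boxed equation
exact = predict_weight(90)   # coefficients carried at full precision
print(round(rounded, 2), round(exact, 2))  # 15.36 15.38
```

Both values round to 15.4 kg at 3 significant figures, so the rounding of the coefficients does not change the final answer here.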
