Question
Observations on 12 pairs of values of the random variables X, Y yielded the following results.
Σx = 76.3, Σx² = 563.7, Σy = 72.2, Σy² = 460.1, Σxy = 495.4
a (i) Calculate the value of r, the product moment correlation coefficient of the sample.
(ii) Assuming that the distribution of X, Y is bivariate normal with product moment correlation coefficient ρ, calculate the p-value of your result when testing the hypotheses H0 : ρ = 0; H1 : ρ > 0.
(iii) State whether your p-value suggests that X and Y are independent. [7]
b Given a further value x = 5.2 from the distribution of X, Y, predict the corresponding value of y. Give your answer to one decimal place. [3]
▶️Answer/Explanation
Ans:
(a)
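(i) use of \(r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}\) with \(n = 12\) gives \(r = 0.809\) (3 s.f.)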
(ii) \(t = 0.80856 \ldots \times \sqrt{\frac{10}{1 - 0.80856 \ldots^2}}\)
\(= 4.345 \ldots\)
p-value \(= 7.27 \times 10^{-4}\)
(iii) this value indicates that X, Y are not independent
(b)
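use of the regression line of y on x, \(y - \bar y = \frac{\sum xy - n\bar x \bar y}{\sum x^2 - n{\bar x}^2}(x - \bar x)\)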
putting x = 5.2 gives y = 5.5
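The formulae used in (a)(i) and (b) can be sketched in Python as a check on the arithmetic. This is only an illustration working from the summary totals in the question; the function names `corrected_sums`, `correlation` and `predict_y` are hypothetical and are not part of the mark scheme.

```python
import math

def corrected_sums(n, sx, sxx, sy, syy, sxy):
    """Return (Sxx, Syy, Sxy) computed from the raw totals."""
    s_xx = sxx - sx * sx / n
    s_yy = syy - sy * sy / n
    s_xy = sxy - sx * sy / n
    return s_xx, s_yy, s_xy

def correlation(n, sx, sxx, sy, syy, sxy):
    """Sample product moment correlation coefficient r."""
    s_xx, s_yy, s_xy = corrected_sums(n, sx, sxx, sy, syy, sxy)
    return s_xy / math.sqrt(s_xx * s_yy)

def predict_y(x, n, sx, sxx, sy, syy, sxy):
    """Predict y from the least squares regression line of y on x."""
    s_xx, _, s_xy = corrected_sums(n, sx, sxx, sy, syy, sxy)
    slope = s_xy / s_xx
    # The regression line passes through the mean point (x_bar, y_bar).
    return sy / n + slope * (x - sx / n)

# Totals quoted in the question (n = 12 pairs).
sums = dict(n=12, sx=76.3, sxx=563.7, sy=72.2, syy=460.1, sxy=495.4)
r = correlation(**sums)
y_hat = predict_y(5.2, **sums)
print(round(r, 3), round(y_hat, 1))
```

Working from the totals rather than from rounded intermediate values avoids rounding drift in the final decimal place.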
Question
Jim is investigating the relationship between height and foot length in teenage boys.
A sample of 13 boys is taken and the height and foot length of each boy are measured.
The results are shown in the table.
You may assume that this is a random sample from a bivariate normal distribution.
Jim wishes to determine whether or not there is a positive association between height and foot length.
a. Calculate the product moment correlation coefficient. [2]
b. Find the \(p\)-value. [2]
c. Interpret the \(p\)-value in the context of the question. [1]
d. Find the equation of the regression line of \(y\) on \(x\). [2]
e. Estimate the foot length of a boy of height 170 cm. [2]
▶️Answer/Explanation
Markscheme
Note: In all parts accept answers which round to the correct 2sf answer.
a. \(r = 0.806\) A2
b. \(4.38 \times 10^{-4}\) A2
c. \(p\)-value represents strong evidence to indicate a (positive) association between height and foot length A1
Note: FT the \(p\)-value
d. \(y = 0.103x + 12.3\) A2
e. attempted substitution of \(x = 170\) (M1)
\(y = 29.7\) A1
Note: Accept \(y = 29.8\)
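As a check on the note above, substituting \(x = 170\) into the rounded regression equation gives

\[y = 0.103 \times 170 + 12.3 = 29.81 \approx 29.8,\]

which is presumably why \(29.8\) is accepted. The value \(29.7\) would then come from using unrounded coefficients, an assumption that cannot be verified here because the data table is not reproduced.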
Question
Bill is investigating whether or not there is a positive association between the heights and weights of boys of a certain age. He defines the hypotheses \[{{\rm{H}}_0}:\rho = 0;\ {{\rm{H}}_1}:\rho > 0,\] where \(\rho \) denotes the population correlation coefficient between heights and weights of boys of this age. He measures the height, \(h\) cm, and weight, \(w\) kg, of each of a random sample of \(20\) boys of this age and he calculates the following statistics. \[\sum w = 340,\ \sum h = 2002,\ \sum {w^2} = 5830,\ \sum {h^2} = 201124,\ \sum hw = 34150\]
a. (i) Calculate the correlation coefficient for this sample.
(ii) Calculate the \(p\)-value of your result and interpret it at the \(1\% \) level of significance. [8]
b. (i) Calculate the equation of the least squares regression line of \(w\) on \(h\).
(ii) The height of a randomly selected boy of this age is \(90\) cm. Estimate his weight. [3]
▶️Answer/Explanation
Markscheme
(i) \(r = \frac{{34150 - 340 \times \frac{{2002}}{{20}}}}{{\sqrt {\left( {5830 - \frac{{{{340}^2}}}{{20}}} \right)\left( {201124 - \frac{{{{2002}^2}}}{{20}}} \right)} }}\) (M1)(A1)
Note: Accept equivalent formula.
\( = 0.610\) A1
(ii) (\(T = R \times \sqrt {\frac{{n - 2}}{{1 - {R^2}}}} \) has the t-distribution with \(n - 2\) degrees of freedom)
\(t = 0.6097666 \ldots \times \sqrt {\frac{{18}}{{1 - 0.6097666{ \ldots ^2}}}} \) M1
\( = 3.2640 \ldots \) A1
\({\rm{DF}} = 18\) A1
\(p{\text{-value}} = 0.00215 \ldots \) A1
this is less than \(0.01\), so we conclude that there is a positive association between heights and weights of boys of this age R1
[8 marks]
(i) the equation of the regression line of \(w\) on \(h\) is
\(w - \frac{{340}}{{20}} = \left( {\frac{{20 \times 34150 - 340 \times 2002}}{{20 \times 201124 - {{2002}^2}}}} \right)\left( {h - \frac{{2002}}{{20}}} \right)\) M1
\(w = 0.160h + 0.957\) A1
(ii) putting \(h = 90\) , \(w = 15.4\) (kg) A1
Note: Award M0A0A0 for calculation of \(h\) on \(w\).
[3 marks]
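The one-tailed test in part a(ii) can be reproduced without a calculator's built-in t-distribution. The sketch below is an assumption-laden illustration, not the mark scheme's method: it integrates the t density numerically with Simpson's rule, and the names `t_pdf`, `upper_tail` and `correlation_test` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t), using composite Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry P(T > 0) = 0.5, so subtract the area between 0 and t.
    return 0.5 - total * h / 3

def correlation_test(r, n):
    """t statistic, degrees of freedom and p-value for H0: rho = 0 vs H1: rho > 0."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df, upper_tail(t, df)

# Sample correlation and sample size from the mark scheme above.
t, df, p = correlation_test(0.6097666, 20)
print(f"t = {t:.4f}, df = {df}, p = {p:.5f}")
print("reject H0 at the 1% level" if p < 0.01 else "do not reject H0 at the 1% level")
```

Because the alternative hypothesis is one-sided, only the upper tail is used; a two-sided alternative would double this probability.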
Question
The random variables \(X\), \(Y\) follow a bivariate normal distribution with product moment correlation coefficient \(\rho \). The following table gives a random sample from this distribution.
(a) Determine the value of \(r\), the product moment correlation coefficient of this sample.
(b) (i) Write down hypotheses in terms of \(\rho \) which would enable you to test whether or not \(X\) and \(Y\) are independent.
(ii) Determine the p-value of the above sample and state your conclusion at the 5% significance level. Justify your answer.
(c) (i) Determine the equation of the regression line of \(y\) on \(x\).
(ii) State whether or not this equation can be used to obtain an accurate prediction of the value of \(y\) for a given value of \(x\). Give a reason for your answer.
▶️Answer/Explanation
Markscheme
(a) \(r = -0.163\) A2
[2 marks]
(b) (i) \({{\text{H}}_0}:\rho = 0;\ {{\text{H}}_1}:\rho \ne 0\) A1
(ii) \(t = r\sqrt {\frac{{n - 2}}{{1 - {r^2}}}} = -0.468 \ldots \) (A1)
\({\text{DF}} = 8\) (A1)
\(p{\text{-value}} = 2 \times 0.326 \ldots = 0.652\) A1
since \(0.652 > 0.05\), we accept \({{\text{H}}_0}\) R1
Note: Award (A1)(A1)A0 if the p-value is given as \(0.326\) without prior working.
Note: Follow through their p-value for the R1.
[5 marks]
(c) (i) \(y = -0.257x + 5.22\) A1
Note: Accept answers which round to \(-0.26\) and \(5.2\).
(ii) no, because \(X\) and \(Y\) have been shown to be independent (or equivalent) A1
[2 marks]
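The doubling of the tail probability in (b)(ii) can be checked with a short Python sketch. It relies on a standard closed form for the t-distribution that holds only when the number of degrees of freedom is even (here 8), which is a substitute for the calculator function the mark scheme presumably expects. The function name `two_tailed_p_even_df` is hypothetical, and the rounded value \(r = -0.163\) is used, so the last digits may differ slightly from the mark scheme.

```python
import math

def two_tailed_p_even_df(t, df):
    """Two-tailed p-value P(|T| > |t|) for Student's t with an EVEN df."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form needs an even df >= 2")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos_sq = math.cos(theta) ** 2
    coeff, power, series = 1.0, 1.0, 1.0
    # Series 1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ... up to cos^(df-2).
    for k in range(1, df // 2):
        coeff *= (2 * k - 1) / (2 * k)
        power *= cos_sq
        series += coeff * power
    central = math.sin(theta) * series  # P(|T| <= |t|)
    return 1.0 - central

r, n = -0.163, 10          # rounded sample correlation and sample size
df = n - 2
t = r * math.sqrt(df / (1 - r * r))
p = two_tailed_p_even_df(t, df)
print(f"t = {t:.3f}, df = {df}, two-tailed p = {p:.3f}")
```

The absolute value of t is used because the alternative \(\rho \ne 0\) counts departures in both directions.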
Question
[Maximum mark: 6]
Consider the following data
The regression line for y on x is y = 2.2x – 0.5
(a) Solve the equation above for x to find an expression in the form x = ay+b [2]
(b) Find the equation x = cy+d of the regression line for x on y. [2]
(c) Describe the advantage of the linear equation in (b). [2]
▶️Answer/Explanation
Ans:
(a) y = 2.2x – 0.5 ⇔ y + 0.5 = 2.2x ⇔ x = 0.455 y + 0.227
(b) x = 0.423y + 0.385
(c) The relation in (a) is simply the inverse function of the line y = 2.2x – 0.5, which was fitted to predict y from x.
If y is given, the answer in (b) gives a more reliable estimate of x.
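The two lines in (a) and (b) differ because the data are not perfectly correlated. A short sketch, using only the rounded coefficients quoted above (so the results are approximate), recovers the correlation coefficient and the mean point that both regression lines pass through. The variable names are illustrative only.

```python
import math

# Regression line of y on x: y = b_yx * x + a_yx
b_yx, a_yx = 2.2, -0.5
# Regression line of x on y: x = b_xy * y + a_xy
b_xy, a_xy = 0.423, 0.385

# The product of the two regression slopes equals r^2;
# both slopes are positive here, so r is positive.
r = math.sqrt(b_yx * b_xy)

# Both regression lines pass through (x_bar, y_bar), so solve them simultaneously:
# x = b_xy * (b_yx * x + a_yx) + a_xy
x_bar = (b_xy * a_yx + a_xy) / (1 - b_xy * b_yx)
y_bar = b_yx * x_bar + a_yx

print(f"r is about {r:.3f}, mean point is about ({x_bar:.2f}, {y_bar:.2f})")
```

Only when r = ±1 would the slope in (b) equal 1/2.2, making the inverted line in (a) and the regression line in (b) coincide.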