Question
After taking a mathematics test, Fatima wonders how many more marks she would have achieved if she had spent an extra 1.5 hours studying.
To find out, she randomly selects five students from her class who took the same test and asks them how many hours ($t$) they spent studying for the test and the marks ($m$) they achieved. Their responses are shown in the following table.
(a) (i) Find the Pearson’s product moment correlation coefficient, $r$, for this data.
(ii) Find the least squares regression line of $m$ on $t$ for this data.
(b) According to her model, find how many more marks Fatima would have achieved if she spent an extra 1.5 hours studying.
(c) State one reason why the value obtained in part (b) might not be valid.
▶️Answer/Explanation
Detailed Solution
Given Data:
Hours studied, $t$ | 0 | 1.2 | 1.6 | 2.5 | 4
---|---|---|---|---|---
Marks obtained, $m$ | 45 | 54 | 61 | 72 | 86
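(a) (i) Find Pearson’s product-moment correlation coefficient
The formula for Pearson’s correlation coefficient is:
$$r = \frac{n\sum tm - \sum t \sum m}{\sqrt{\left(n\sum t^2 - \left(\sum t\right)^2\right)\left(n\sum m^2 - \left(\sum m\right)^2\right)}}$$
where:
- $t$ and $m$ are the given values,
- $n = 5$ (number of data points).
Let’s calculate $r$. From the table, $\sum t = 9.3$, $\sum m = 318$, $\sum tm = 686.4$, $\sum t^2 = 26.25$ and $\sum m^2 = 21242$, so
$$r = \frac{5(686.4) - (9.3)(318)}{\sqrt{\left(5(26.25) - 9.3^2\right)\left(5(21242) - 318^2\right)}} = \frac{474.6}{\sqrt{44.76 \times 5086}} \approx 0.9947$$
The Pearson’s product-moment correlation coefficient $r$ is approximately 0.995, indicating a very strong positive correlation between study time and marks obtained.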
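The value of $r$ can be checked numerically. The following is a minimal sketch that applies the sum form of Pearson's formula to the five data points from the table; the variable names are illustrative and not part of the original solution.

```python
import math

# Data from the table: hours studied (t) and marks obtained (m).
t = [0, 1.2, 1.6, 2.5, 4]
m = [45, 54, 61, 72, 86]
n = len(t)

# Sums needed by the formula.
sum_t = sum(t)
sum_m = sum(m)
sum_tm = sum(ti * mi for ti, mi in zip(t, m))
sum_t2 = sum(ti ** 2 for ti in t)
sum_m2 = sum(mi ** 2 for mi in m)

numerator = n * sum_tm - sum_t * sum_m
denominator = math.sqrt((n * sum_t2 - sum_t ** 2) * (n * sum_m2 - sum_m ** 2))
r = numerator / denominator

print(round(r, 3))  # 0.995
```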
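(a) (ii) Find the least squares regression line of $m$ on $t$
The least squares regression line is given by:
$$m = at + b$$
where:
- $a = \frac{n\sum tm - \sum t \sum m}{n\sum t^2 - \left(\sum t\right)^2}$ (slope)
- $b = \bar{m} - a\bar{t}$ (intercept)
Let’s calculate $a$ and $b$: $a = \frac{474.6}{44.76} \approx 10.6032$ and $b = 63.6 - 10.6032 \times 1.86 \approx 43.878$.
The least squares regression line of $m$ on $t$ is:
$$m = 10.6t + 43.9$$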
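As a cross-check on the regression line, here is a short sketch that computes the slope and intercept from deviations about the means. This is an equivalent form of the least squares formulas, and the variable names are assumptions made for illustration.

```python
# Data from the table: hours studied (t) and marks obtained (m).
t = [0, 1.2, 1.6, 2.5, 4]
m = [45, 54, 61, 72, 86]
n = len(t)

t_bar = sum(t) / n
m_bar = sum(m) / n

# Sums of products and squares of deviations from the means.
s_tm = sum((ti - t_bar) * (mi - m_bar) for ti, mi in zip(t, m))
s_tt = sum((ti - t_bar) ** 2 for ti in t)

slope = s_tm / s_tt
intercept = m_bar - slope * t_bar

print(f"m = {slope:.1f}t + {intercept:.1f}")  # m = 10.6t + 43.9
```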
(b) Predicting Additional Marks for Fatima
If Fatima studied an extra 1.5 hours, the increase in marks can be estimated using the slope of our regression equation, $10.6032\ldots$:
$$10.6032\ldots \times 1.5 \approx 15.9$$
If Fatima had studied an extra 1.5 hours, she would have achieved approximately 15.9 more marks.
(c) One reason the value 15.9 might not be valid is that the linear regression model assumes a constant increase in marks per hour of study. In reality, additional study time might yield diminishing returns because of fatigue or saturation, and a model fitted to five other students may not reflect Fatima’s own study effectiveness.
Markscheme
Solution: –
(a) (i) r = 0.995 (0.994705…)
(ii) m = 10.6t + 43.9 (10.6032…t + 43.8780…)
(b) EITHER
$10.6032…\times1.5$
OR
(10.6032…(t+1.5)+43.8780…)-(10.6032…(t)+43.8780…)
THEN
15.9 (marks) (15.9048…)
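The EITHER and OR routes agree because the intercept cancels when two predictions are subtracted, so the result does not depend on how long Fatima originally studied. A small sketch follows, using the coefficients truncated as they appear in the markscheme; the function name `predicted_marks` is hypothetical.

```python
# Coefficients truncated from the markscheme (10.6032... and 43.8780...).
slope = 10.6032
intercept = 43.8780

def predicted_marks(hours):
    """Marks predicted by the regression line for a given study time."""
    return slope * hours + intercept

# EITHER: multiply the slope by the extra study time.
direct = slope * 1.5

# OR: subtract two predictions; the starting time does not matter.
for hours in (0, 2, 3.5):
    difference = predicted_marks(hours + 1.5) - predicted_marks(hours)
    assert abs(difference - direct) < 1e-9

print(round(direct, 1))  # 15.9
```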
(c) Accept any valid reason
e.g:
The students in the sample might not be of equal ability / she has not controlled for ability. She might have originally obtained close to full marks so an extra 15.9 would not be possible.
Question
Observations on 12 pairs of values of the random variables X, Y yielded the following results.
Σx = 76.3, Σx² = 563.7, Σy = 72.2, Σy² = 460.1, Σxy = 495.4
(a) (i) Calculate the value of r, the product moment correlation coefficient of the sample.
(ii) Assuming that the distribution of X, Y is bivariate normal with product moment correlation coefficient ρ, calculate the p-value of your result when testing the hypotheses H0 : ρ = 0; H1 : ρ > 0.
(iii) State whether your p-value suggests that X and Y are independent. [7]
(b) Given a further value x = 5.2 from the distribution of X, Y, predict the corresponding value of y. Give your answer to one decimal place. [3]
▶️Answer/Explanation
Ans:
(a)
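(i) use of \(r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}\)
r = 0.809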
(ii) \(t = 0.80856\ldots \times \sqrt{\frac{10}{1-0.80856\ldots^2}}\)
= 4.345…
p-value = \(7.27 \times 10^{-4}\)
(iii) this value indicates that X,Y are not independent
(b)
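use of \(y - \bar{y} = \left(\frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}\right)(x - \bar{x})\)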
putting x = 5.2 gives y = 5.5
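When only summary sums are given, the same quantities can be computed without the raw data. The sketch below works from the five sums in the question to obtain r, the t-statistic and the prediction at x = 5.2; the variable names are illustrative.

```python
import math

# Summary statistics given in the question (n = 12 pairs).
n = 12
sum_x, sum_x2 = 76.3, 563.7
sum_y, sum_y2 = 72.2, 460.1
sum_xy = 495.4

# Corrected sums of squares and products.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

# Sample correlation and the t-statistic with n - 2 degrees of freedom.
r = s_xy / math.sqrt(s_xx * s_yy)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Regression line of y on x, evaluated at x = 5.2.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n
y_pred = intercept + slope * 5.2

print(round(r, 3), round(t_stat, 2), round(y_pred, 1))  # 0.809 4.35 5.5
```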
Question
Jim is investigating the relationship between height and foot length in teenage boys.
A sample of 13 boys is taken and the height and foot length of each boy are measured.
The results are shown in the table.
You may assume that this is a random sample from a bivariate normal distribution.
Jim wishes to determine whether or not there is a positive association between height and foot length.
a. Calculate the product moment correlation coefficient. [2]
b. Find the \(p\)-value. [2]
c. Interpret the \(p\)-value in the context of the question. [1]
d. Find the equation of the regression line of \(y\) on \(x\). [2]
e. Estimate the foot length of a boy of height 170 cm. [2]
▶️Answer/Explanation
Markscheme
Note: In all parts accept answers which round to the correct 2sf answer.
a. \(r = 0.806\) A2
b. \(4.38 \times {10^{-4}}\) A2
c. \(p\)-value represents strong evidence to indicate a (positive) association between height and foot length A1
Note: FT the \(p\)-value
d. \(y = 0.103x + 12.3\) A2
e. attempted substitution of \(x = 170\) (M1)
\(y = 29.7\) A1
Note: Accept \(y = 29.8\)
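The substitution step can be sketched with the rounded coefficients from the markscheme. With those values the result rounds to 29.8. The unrounded coefficients are not available here, so the explanation that 29.7 comes from carrying them at full precision is an assumption.

```python
# Rounded regression coefficients from the markscheme: y = 0.103x + 12.3.
slope, intercept = 0.103, 12.3

height_cm = 170
foot_length = slope * height_cm + intercept

print(round(foot_length, 1))  # 29.8
```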
Question
Bill is investigating whether or not there is a positive association between the heights and weights of boys of a certain age. He defines the hypotheses \[{{\rm{H}}_0}:\rho = 0;\quad {{\rm{H}}_1}:\rho > 0,\] where \(\rho \) denotes the population correlation coefficient between heights and weights of boys of this age. He measures the height, \(h\) cm, and weight, \(w\) kg, of each of a random sample of \(20\) boys of this age and he calculates the following statistics. \[\sum w = 340,\quad \sum h = 2002,\quad \sum {w^2} = 5830,\quad \sum {h^2} = 201124,\quad \sum hw = 34150\]
a. (i) Calculate the correlation coefficient for this sample.
(ii) Calculate the \(p\)-value of your result and interpret it at the \(1\% \) level of significance. [8]
b. (i) Calculate the equation of the least squares regression line of \(w\) on \(h\).
(ii) The height of a randomly selected boy of this age is \(90\) cm. Estimate his weight. [3]
▶️Answer/Explanation
Markscheme
(i) \(r = \frac{{34150 - 340 \times \frac{{2002}}{{20}}}}{{\sqrt {\left( {5830 - \frac{{{{340}^2}}}{{20}}} \right)\left( {201124 - \frac{{{{2002}^2}}}{{20}}} \right)} }}\) (M1)(A1)
Note: Accept equivalent formula.
\( = 0.610\) A1
(ii) (\(T = R \times \sqrt {\frac{{n - 2}}{{1 - {R^2}}}} \) has the t-distribution with \(n - 2\) degrees of freedom)
\(t = 0.6097666 \ldots \times \sqrt {\frac{{18}}{{1 - 0.6097666{ \ldots ^2}}}} \) M1
\( = 3.2640 \ldots \) A1
\({\rm{DF}} = 18\) A1
\(p\text{-value} = 0.00215 \ldots \) A1
this is less than \(0.01\), so we conclude that there is a positive association between heights and weights of boys of this age R1
[8 marks]
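The p-value is normally read from a calculator's t-distribution function. As a rough standard-library check, the sketch below instead integrates the t density numerically with Simpson's rule. This is a swapped-in technique rather than the markscheme's method, and the helper names `t_pdf` and `upper_tail` are hypothetical.

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + u * u / df) ** (-(df + 1) / 2)

def upper_tail(t_value, df, steps=2000):
    """P(T > t_value) for t_value >= 0, using Simpson's rule on [0, t_value].

    steps must be even for Simpson's rule.
    """
    h = t_value / steps
    total = t_pdf(0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so the area to the right of 0 is 0.5.
    return 0.5 - total * h / 3

n = 20
r = 0.6097666  # sample correlation from part (i), truncated
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
p_value = upper_tail(t_stat, n - 2)

print(round(t_stat, 3), p_value < 0.01)  # 3.264 True
```

Because the test is one-tailed (H1: ρ > 0), only the upper tail is used; a two-tailed test would double this value.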
(i) the equation of the regression line of \(w\) on \(h\) is
\(w - \frac{{340}}{{20}} = \left( {\frac{{20 \times 34150 - 340 \times 2002}}{{20 \times 201124 - {{2002}^2}}}} \right)\left( {h - \frac{{2002}}{{20}}} \right)\) M1
\(w = 0.160h + 0.957\) A1
(ii) putting \(h = 90\) , \(w = 15.4\) (kg) A1
Note: Award M0A0A0 for calculation of \(h\) on \(w\).
[3 marks]
Question
The random variables \(X\), \(Y\) follow a bivariate normal distribution with product moment correlation coefficient \(\rho \). The following table gives a random sample from this distribution.
(a) Determine the value of \(r\), the product moment correlation coefficient of this sample.
(b) (i) Write down hypotheses in terms of \(\rho \) which would enable you to test whether or not \(X\) and \(Y\) are independent.
(ii) Determine the p-value of the above sample and state your conclusion at the 5% significance level. Justify your answer.
(c) (i) Determine the equation of the regression line of \(y\) on \(x\).
(ii) State whether or not this equation can be used to obtain an accurate prediction of the value of \(y\) for a given value of \(x\). Give a reason for your answer.
▶️Answer/Explanation
Markscheme
(a) \(r = -0.163\) A2
[2 marks]
(b) (i) \({{\text{H}}_0}:\rho = 0;\ {{\text{H}}_1}:\rho \ne 0\) A1
(ii) \(t = r\sqrt {\frac{{n - 2}}{{1 - {r^2}}}} = -0.468 \ldots \) (A1)
\({\text{DF}} = 8\) (A1)
\(p{\text{-value}} = 2 \times 0.326 \ldots = 0.652\) A1
since \(0.652 > 0.05\), we accept \({{\text{H}}_0}\) R1
Note: Award (A1)(A1)A0 if the p-value is given as \(0.326\) without prior working.
Note: Follow through their p-value for the R1.
[5 marks]
(c) (i) \(y = -0.257x + 5.22\) A1
Note: Accept answers which round to \(-0.26\) and \(5.2\).
(ii) no, because \(X\) and \(Y\) have been shown to be independent (or equivalent) A1
[2 marks]
Question
[Maximum mark: 6]
Consider the following data
The regression line for y on x is y = 2.2x – 0.5
(a) Solve the equation above for x to find an expression in the form x = ay+b [2]
(b) Find the equation x = cy+d of the regression line for x on y. [2]
(c) Describe the advantage of the linear equation in (b). [2]
▶️Answer/Explanation
Ans:
(a) y = 2.2x – 0.5 ⇔ y + 0.5 = 2.2x ⇔ x = 0.455 y + 0.227
(b) x = 0.423y + 0.385
(c) The relation in (a) is simply the inverse function of the line y = 2.2x – 0.5, which was fitted to predict y from x.
If y is given, the answer in (b) gives a more reliable estimate of x.
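The gap between the answers to (a) and (b) can be quantified. The product of the two regression slopes (y on x, and x on y) equals r², so the two lines coincide only when the correlation is perfect. The sketch below inverts the given line and estimates r² from the rounded slopes, so the result is approximate; the variable names are illustrative.

```python
# Regression line of y on x, as given in the question: y = 2.2x - 0.5.
slope_yx, intercept_yx = 2.2, -0.5

# Part (a): algebraic inverse, x = a*y + b.
a = 1 / slope_yx
b = -intercept_yx / slope_yx

# Part (b): slope of the regression line of x on y, from the answer (rounded).
slope_xy = 0.423

# The product of the two regression slopes equals r squared.
r_squared = slope_yx * slope_xy

print(round(a, 3), round(b, 3), round(r_squared, 3))  # 0.455 0.227 0.931
```

Since r² is below 1 here, the inverted line in (a) is steeper in x than the fitted line in (b), which is why the two answers differ.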