IBDP Maths AA: Topic: SL 4.4: Linear correlation of bivariate data: IB style Questions HL Paper 1

Question

Observations on 12 pairs of values of the random variables X, Y yielded the following results.

Σx = 76.3, Σx² = 563.7, Σy = 72.2, Σy² = 460.1, Σxy = 495.4

(a) (i) Calculate the value of r, the product moment correlation coefficient of the sample.

(ii) Assuming that the distribution of X, Y is bivariate normal with product moment correlation coefficient ρ, calculate the p-value of your result when testing the hypotheses H0 : ρ = 0; H1 : ρ > 0.

(iii) State whether your p-value suggests that X and Y are independent. [7]

(b) Given a further value x = 5.2 from the distribution of X, Y, predict the corresponding value of y. Give your answer to one decimal place. [3]
▶️Answer/Explanation

Ans:

(a)

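(i) use of \(r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}\) gives \(r = 0.809\)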

(ii)

\(t = 0.80856\ldots\sqrt{\frac{10}{1-0.80856\ldots^2}}\)

= 4.345…

p-value = 7.27 × 10⁻⁴

(iii) this value indicates that X,Y are not independent

(b)

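use of the regression line of \(y\) on \(x\), \(y - \bar{y} = b(x - \bar{x})\) with \(b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}\)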

putting x = 5.2 gives y = 5.5
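The steps in this answer can be reproduced numerically from the five sums. The following is a minimal sketch, assuming the standard summary-statistic formulae for r, the t statistic and the regression line of y on x; the function name `summary_regression` and its parameter names are illustrative and do not come from the markscheme.

```python
import math

def summary_regression(n, sx, sxx, sy, syy, sxy):
    """Return (r, t, slope, intercept) from sums of x, x^2, y, y^2 and xy."""
    # Corrected sums of squares and products
    s_xy = sxy - sx * sy / n
    s_xx = sxx - sx ** 2 / n
    s_yy = syy - sy ** 2 / n
    # Product moment correlation coefficient
    r = s_xy / math.sqrt(s_xx * s_yy)
    # Test statistic for H0: rho = 0, with n - 2 degrees of freedom
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    # Least squares regression line of y on x
    slope = s_xy / s_xx
    intercept = sy / n - slope * sx / n
    return r, t, slope, intercept

# Sums given in the question (n = 12 pairs)
r, t, slope, intercept = summary_regression(12, 76.3, 563.7, 72.2, 460.1, 495.4)
y_hat = slope * 5.2 + intercept  # prediction for x = 5.2
print(round(r, 3), round(t, 3), round(y_hat, 1))  # 0.809 4.345 5.5
```

The p-value in part (a)(ii) is then the upper tail probability of the t-distribution with 10 degrees of freedom at this value of t.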

Question

Jim is investigating the relationship between height and foot length in teenage boys.

A sample of 13 boys is taken and the height and foot length of each boy are measured.

The results are shown in the table.

You may assume that this is a random sample from a bivariate normal distribution.

Jim wishes to determine whether or not there is a positive association between height and foot length.

a. Calculate the product moment correlation coefficient. [2]

b. Find the \(p\)-value. [2]

c. Interpret the \(p\)-value in the context of the question. [1]

d. Find the equation of the regression line of \(y\) on \(x\). [2]

e. Estimate the foot length of a boy of height 170 cm. [2]

▶️Answer/Explanation

Markscheme

a. Note: In all parts accept answers which round to the correct 2 s.f. answer.

\(r = 0.806\)     A2

b.

\(4.38 \times 10^{-4}\)     A2

c.

\(p\)-value represents strong evidence to indicate a (positive) association between height and foot length     A1

Note: FT the \(p\)-value

d.

\(y = 0.103x + 12.3\)     A2

e.

attempted substitution of \(x = 170\)     (M1)

\(y = 29.7\)     A1

Note: Accept \(y = 29.8\)
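As a quick check of part e, substituting the rounded coefficients from part d gives a value that rounds to 29.8. This is presumably why both answers are accepted, with 29.7 coming from unrounded calculator coefficients; that is an assumption, since the data table is not reproduced here.

\[y = 0.103 \times 170 + 12.3 = 17.51 + 12.3 = 29.81 \approx 29.8\]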

 

Question

Bill is investigating whether or not there is a positive association between the heights and weights of boys of a certain age. He defines the hypotheses\[{\rm{H}}_0:\rho = 0;\ {\rm{H}}_1:\rho > 0,\]where \(\rho\) denotes the population correlation coefficient between heights and weights of boys of this age. He measures the height, \(h\) cm, and weight, \(w\) kg, of each of a random sample of \(20\) boys of this age and he calculates the following statistics.\[\sum w = 340,\ \sum h = 2002,\ \sum w^2 = 5830,\ \sum h^2 = 201124,\ \sum hw = 34150\]

a. (i)     Calculate the correlation coefficient for this sample.

(ii)     Calculate the \(p\)-value of your result and interpret it at the \(1\% \) level of significance.[8]

b. (i)     Calculate the equation of the least squares regression line of \(w\) on \(h\) .

(ii)     The height of a randomly selected boy of this age is \(90\) cm. Estimate his weight.[3]

▶️Answer/Explanation

Markscheme

(i)     \(r = \frac{34150 - 340 \times \frac{2002}{20}}{\sqrt{\left(5830 - \frac{340^2}{20}\right)\left(201124 - \frac{2002^2}{20}\right)}}\)     (M1)(A1)

Note: Accept equivalent formula.

 

\( = 0.610\)     A1

 

(ii)     (\(T = R \times \sqrt {\frac{{n - 2}}{{1 - {R^2}}}} \) has the t-distribution with \(n - 2\) degrees of freedom)

\(t = 0.6097666 \ldots \sqrt {\frac{{18}}{{1 - 0.6097666{ \ldots ^2}}}} \)     M1

\( = 3.2640 \ldots \)     A1

\({\rm{DF}} = 18\)     A1

\(p{\text{-value}} = 0.00215 \ldots \)     A1

this is less than \(0.01\), so we conclude that there is a positive association between heights and weights of boys of this age     R1
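The sums in part a(i) can be evaluated step by step. The shorthand \(S_{hw}\), \(S_{ww}\), \(S_{hh}\) is introduced here for convenience and is not used in the markscheme.

\[S_{hw} = 34150 - \frac{340 \times 2002}{20} = 116,\quad S_{ww} = 5830 - \frac{340^2}{20} = 50,\quad S_{hh} = 201124 - \frac{2002^2}{20} = 723.8\]

\[r = \frac{116}{\sqrt{50 \times 723.8}} = 0.6097\ldots\]

The markscheme for part b is not reproduced above. As a sketch only, using the standard least squares formulae with the same sums, \(\bar{w} = 17\) and \(\bar{h} = 100.1\):

\[w - 17 = \frac{116}{723.8}(h - 100.1) \Rightarrow w \approx 0.160h + 0.957,\qquad h = 90 \Rightarrow w \approx 15.4\]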

Question

The random variables \(X\), \(Y\) follow a bivariate normal distribution with product moment correlation coefficient \(\rho \). The following table gives a random sample from this distribution.

(a)     Determine the value of \(r\), the product moment correlation coefficient of this sample.

(b)     (i)     Write down hypotheses in terms of \(\rho \) which would enable you to test whether or not \(X\) and \(Y\) are independent.

(ii)     Determine the p-value of the above sample and state your conclusion at the 5% significance level. Justify your answer.

(c)     (i)     Determine the equation of the regression line of \(y\) on \(x\).

(ii)     State whether or not this equation can be used to obtain an accurate prediction of the value of \(y\) for a given value of \(x\). Give a reason for your answer.

▶️Answer/Explanation

Markscheme

(a)     \(r = -0.163\)     A2

[2 marks]

 

(b)     (i)     \({{\text{H}}_0}:\rho  = 0;\ {{\text{H}}_1}:\rho  \ne 0\)     A1

(ii)     \(t = r\sqrt {\frac{{n - 2}}{{1 - {r^2}}}}  =  - 0.468 \ldots \)     (A1)

\({\text{DF}} = 8\)     (A1)

\(p{\text{-value}} = 2 \times 0.326 \ldots  = 0.652\)   A1

since \(0.652 > 0.05\), we accept \({{\text{H}}_0}\)     R1

Note: Award (A1)(A1)A0 if the p-value is given as \(0.326\) without prior working.

Note: Follow through their p-value for the R1.

[5 marks]

 

(c)     (i)     \(y =  - 0.257x + 5.22\)     A1

Note: Accept answers which round to \(-0.26\) and \(5.2\).

(ii)     no, because \(X\) and \(Y\) have been shown to be independent (or equivalent)     A1

[2 marks]
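The p-values quoted in the markschemes above would normally be read from a GDC. As a rough cross-check, the sketch below integrates the Student's t density with Simpson's rule using only the Python standard library. The helper names `t_pdf`, `upper_tail` and `p_value` are hypothetical, and the rounded values of r are taken from the answers above.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3  # P(T > |t|), by symmetry about 0
    return tail if t >= 0 else 1.0 - tail

def p_value(r, n, two_tailed=False):
    """p-value for H0: rho = 0 from a sample correlation r of n pairs."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    if two_tailed:
        return 2 * upper_tail(abs(t), df)
    return upper_tail(t, df)  # one-tailed test with H1: rho > 0

print(p_value(0.80856, 12))                  # markscheme: 7.27 x 10^-4
print(p_value(0.6097666, 20))                # markscheme: 0.00215...
print(p_value(-0.163, 10, two_tailed=True))  # markscheme: 0.652
```

The two-tailed case doubles the tail area, which matches the factor of 2 in the markscheme for the test of \(\rho \ne 0\).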

MAA SL 4.4 LINEAR REGRESSION [concise]-lala

Question

[Maximum mark: 10] [with GDC]
Consider the following data

(a) Find the correlation coefficient \(r\).
(b) Describe the relation between \(x\) and \(y\).
(c) Find the equation \(y = ax+b\) of the regression line for \(y\) on \(x\).
(d) Find the equation \(x = cy+d\) of the regression line for \(x\) on \(y\).
(e) Find the inverse of the function in part (c). Is it the function in part (d)?

▶️Answer/Explanation

Ans.

(a) 0.965
(b) strong positive
(c) y = 2.2x – 0.5
(d) x = 0.423y + 0.385
(e) y = 2.2x – 0.5 ⇔ y + 0.5 = 2.2x ⇔ x = 0.455 y + 0.227. They are different
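The comparison in (e) can be checked with a few lines of code. This is a small sketch using the rounded coefficients from the answers above; the function name `invert_line` is illustrative. It also checks the general identity that the product of the two regression slopes equals \(r^2\), which is why the two lines coincide only when \(r = \pm 1\).

```python
def invert_line(a, b):
    """Invert y = a*x + b; returns (c, d) such that x = c*y + d."""
    return 1 / a, -b / a

# Rounded values from answers (a), (c) and (d)
r = 0.965
a, b = 2.2, -0.5              # regression line of y on x
c_reg, d_reg = 0.423, 0.385   # regression line of x on y

c_inv, d_inv = invert_line(a, b)
print(round(c_inv, 3), round(d_inv, 3))       # 0.455 0.227
print(round(a * c_reg, 3), round(r ** 2, 3))  # 0.931 0.931
```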

Question

[Maximum mark: 4] [without GDC]
Statements I, II, III, IV, V represent descriptions of the correlation between two variables
I      High positive linear correlation
II      Low positive linear correlation
III    No correlation
IV    Low negative linear correlation
V     High negative linear correlation
Which statement best represents the relationship between the two variables shown in
each of the scatter diagrams below?

▶️Answer/Explanation

Ans.

(a) II
(b) V
(c) III
(d) I

Question

[Maximum mark: 7] [without GDC]
The sketches below represent scatter diagrams for the way in which variables \(x, y\) and
\(z\) change over time, \(t\), in a given chemical experiment.

(a) State which of the diagrams indicate that the pair of variables
(i) is not correlated.        (ii) shows strong linear correlation.
(b) A student is given a piece of paper with five numbers written on it. She is told that
three of these numbers are the product moment correlation coefficients for the
three pairs of variables shown above. The five numbers are
0.9,    –0.85,    –0.20,    0.04,    1.60
(i) For each sketch above state which of these five numbers is the most
appropriate value for the correlation coefficient.
(ii) For the two remaining numbers, state why you reject them for this
experiment.

▶️Answer/Explanation

Ans.

Question

[Maximum mark: 4] [without GDC]
The length and width of 10 leaves are shown on the scatter diagram below.

(a) Plot the point \(M\)(97, 43) which represents the mean length and the mean width.
(b) Draw a suitable line of best fit.
(c) Write a sentence describing the relationship between leaf length and leaf width for
this sample.

▶️Answer/Explanation

Ans.

(a) (see diagram)

(b)

(c) leaf length and leaf width are positively correlated

Question

[Maximum mark: 5] [with GDC]
Ten students were asked for their average grade at the end of their last year of high
school and their average grade at the end of their last year at university. The results
were put into a table as follows:

(a) Find the correlation coefficient \(r\).
(b) Describe the correlation between the high school and the university grades.
(c) Find the equation of the regression line for \(y\) on \(x\).

▶️Answer/Explanation

Ans.

(a) \(r\) = 0.76
(b) There is a fairly strong positive correlation between high school
grades and university grades.
(c) \(y\) = 0.052x – 1.29 (3 s.f.)

Question

[Maximum mark: 5] [without GDC]
The diagram below shows the marks scored by pupils in a French test and a German
test. The mean score on the French test is 29 marks and on the German test is 31
marks.

(a) Describe the relationship between the scores.
(b) On the graph mark the point M which represents the mean of the distribution.
(c) Draw a suitable line of best fit.
(d) Idris scored 32 marks on the French test. Use your graph to estimate the mark
Idris scored on the German test.

▶️Answer/Explanation

Ans.

(a) High positive or high or positive or good correlation etc.
(b) Correct point M(29, 31)
(c) Suitable line should pass through M and have nearly as many crosses (plotted points)
below it as above it.
(d) Accept only value (including non-integers) obtained using line of best fit.

Question

[Maximum mark: 5] [without GDC]
A group of 15 students was given a test on mathematics. The students then played a
computer game. The diagram below shows the scores on the test and the game.

The point M corresponding to the means has coordinates (56.9, 45.9).
(a) Describe the relationship between the two sets of scores.
(b) On the diagram draw the straight line of best fit given that it passes through the
point (0, 69).
Jane took the tests late and scored 45 at mathematics.
(c) Using your graph or otherwise, estimate the score Jane expects on the computer
game, giving your answer to the nearest whole number.

▶️Answer/Explanation

Ans.

(a) The scores are negatively correlated
(b)

Line must be drawn straight. It must pass through (0, 69).
It must pass through the mean point M = (56.9, 45.9).

(c) 51 is closest. (allow 50 or 52)
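For part (c), the "or otherwise" route can be sketched by taking the line through (0, 69) and M(56.9, 45.9) and substituting \(x = 45\):

\[m = \frac{45.9 - 69}{56.9 - 0} \approx -0.406,\qquad y \approx 69 - 0.406 \times 45 \approx 50.7 \approx 51\]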

Question

[Maximum mark: 6] [with GDC]
The following table gives the heights and weights of five sixteen-year-old boys.

(a) Find
(i) the mean height;
(ii) the mean weight.
(b) Plot the above data on the grid below and draw the line of best fit.

▶️Answer/Explanation

Ans.

(a)   (i) \(\frac{182+173+162+178+190}{5}\)= 177 cm

(ii)  \(\frac{73+68+60+66+75}{5}\)= 68.4 kg

Question

[Maximum mark: 6] [with GDC]
Eight students in Mr. O’Neil’s Physical Education class did pushups and situps. Their
results are shown in the following table.

The graph below shows the results for the first seven students.

(a) Plot the results for the eighth student on the graph.
(b) Find \(\bar{x}\)  and \(\bar{y}\) , and draw a line of best fit on the graph.
(c) A student can do 60 pushups. How many situps can the student be expected to
do?

▶️Answer/Explanation

Ans.

(a) on the graph
(b) \(\bar{x}\) = 34 and \(\bar{y}\)= 40

Notes: a line through (34, 40), a reasonable line.
(c) 50 situps (allow ±2)

Question

[Maximum mark: 4] [with GDC]
Ten students were given two tests, one on Mathematics and one on English.
The table shows the results of the tests for each of the ten students.

(a) Find correct to two decimal places, the correlation coefficient (r).
(b) Use your result from part (a) to comment on the statement:
‘Those who do well in Mathematics also do well in English.’

▶️Answer/Explanation

Ans.

(a) \(r\) = 0.6399706… ≈ 0.64 (2 d.p.)
(b) There is a degree of positive correlation between scores in Maths and scores in English.
Therefore those who do well in Mathematics are likely to do well in English also. (Or
equivalent statements.)

Question

[Maximum mark: 6] [with GDC]
The following table gives the amount of fuel in a car’s fuel tank, and the number of
kilometres travelled after filling the tank.

(a) On the scatter diagram below, plot the remaining points.

The mean distance travelled is 421 km \((\bar{x})\), and the mean amount of fuel in the tank is
28 litres \((\bar{y})\). This point is plotted on the scatter diagram.
(b) Sketch the line of best fit.
(c) A car travelled 350km. Use your line above to estimate the amount of fuel left in
the tank.

▶️Answer/Explanation

Ans.

(a)

(b) Straight line with negative gradient passing through the mean point;
intercept on the y-axis between 50 and 55.
(c) 32 (read answer from candidate’s line)

Question

[Maximum mark: 7] [with GDC]
It is decided to take a random sample of 10 students to see if there is any linear
relationship between height and shoe size. The results are given in the table below.

(a) Write down the equation of the regression line of shoe size \((y)\) on height \((x)\),
giving your answer in the form \(y = mx + c\).
(b) Use your equation in part (a) to predict the shoe size of a student who is 162 cm
in height.
(c) Write down the correlation coefficient.
(d) Describe the correlation between height and shoe size.

▶️Answer/Explanation

Ans.

(a) \(y\) = 0.070x – 3.22
(b) \(y\) = 0.070 × 162 – 3.22 = 8.12. Therefore shoe size 8 or 9 (8.12).
(c) \(r\) = 0.681
(d) Moderately strong, positive correlation.

Question

[Maximum mark: 8] [with GDC]
The Type Fast secretarial training agency has a new computer software spreadsheet
package. The agency investigates the number of hours it takes people of varying ages
to reach a level of proficiency using this package. Fifteen individuals are tested and the
results are summarised in the table below.

(a) (i) Find the correlation coefficient \(r\) for this data.
(ii) What does the value of the correlation coefficient suggest about the
relationship between the two variables?
(b) Write down the equation of the regression line for \(y\) on \(x\) in the form \(y = ax + b\).
(c) Use your equation for the regression line to predict
(i) the time that it would take a 30 year old person to reach proficiency, giving
your answer correct to the nearest hour;
(ii) the age of a person who would take 8 hours to reach proficiency, giving
your answer correct to the nearest year.

▶️Answer/Explanation

Ans.

(a) (i)  \(r\) = 0.935 (3 s.f.)
(ii) it suggests a strong positive correlation between the two variables.
(b) \(y\) = 0.291x + 1.56
(c) (i) \(y\) = 0.291 × 30 + 1.56 = 10.29 ≈ 10 hours
(ii)    8 = 0.291x + 1.56
\(x\) = 22.13 ≈ 22 years

Question

[Maximum mark: 12] [with GDC]
The heights and weights of 10 students selected at random are shown in the table
below.

(a) Plot this information on a scatter graph. Use a scale of 1 cm to represent 20 cm
on the x-axis and 1 cm to represent 10 kg on the y-axis.
(b) Calculate the mean height.
(c) Calculate the mean weight.
(d) (i) Find the equation of the line of best fit.
(ii) Draw the line of best fit on your graph.
(e) Use your line to estimate
(i) the weight of a student of height 190 cm;
(ii) the height of a student of weight 72 kg.
(f) It is decided to remove the data for student number 10 from all calculations.
Explain briefly what effect this will have on the line of best fit.

▶️Answer/Explanation

Ans.

(a)

(b) Mean height = 166.1 ≈ 166 (3 s.f.)
(c) Mean weight = 74.9 (3 s.f.)
(d) (i) \(y\) = 0.276x + 29.1
(ii) Line on graph.      Note: y-intercept at 29.1, straight line through (166, 74.9).
(e) (i) \(y\) = 0.276 × 190 + 29.1= 81.5 kg
(ii) 72 = 0.276x + 29.1 ⇒ x=\(\frac{72-29.1}{0.276}\)= 155 cm.
OR  From the graph     (i) \(y\) = 81 (±1)             (ii) \(x\) = 155 (±1)

(f) The ‘line of best fit’ becomes closer to the remaining points.
OR Gradient becomes steeper and the line is a more accurate ‘best fit’.
OR Any reasonable explanation. (Line becomes \(y\) = 1.10x – 113)

Question

[Maximum mark: 12] [without GDC]
A shopkeeper wanted to investigate whether or not there was a correlation between the
prices of food 10 years ago in 1992, with their prices today. He chose 8 everyday items
and the prices are given in the table below.

(a) Calculate the mean and the standard deviation of the prices
(i) in 1992;
(ii) in 2002.
(b) (i) Find the correlation coefficient.
(ii) Comment on the relationship between the prices.
(c) Find the equation of the line of best fit in the form \(y = mx + c\).
(d) What would you expect to pay now for an item costing $2.60 in 1992?
(e) Which item would you omit to increase the correlation coefficient?

▶️Answer/Explanation

Ans.

(a) (i)    1992 mean = \(\$\)1.59,  Sd = \(\$\)0.727 (or 0.73)             (accept 0.777 or 0.78)
(ii)    2002 mean = \(\$\)1.98,  Sd = \(\$\)0.635 (or 0.64)           (accept 0.679 or 0.68)
(b) (i) r = 0.672
(ii) There is a weak positive correlation
(c) \(y\) = 0.588x + 1.05
(d) \(y\) = 0.582 × 2.60 + 1.05 = $2.56
OR   \(y\) = 0.587 × 2.60 + 1.05 = $2.58
OR   \(y\) = 0.588 × 2.60 + 1.05 = $2.58
(e) Coffee – because it is the only item to go down in price.
OR
Rolls – because the price increased significantly.
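The two accepted standard deviations in (a) appear to be the population value (divisor n) and the sample value (divisor n − 1). This is an inference from the numbers, not something the markscheme states. A short sketch of the conversion for n = 8 items, with an illustrative function name:

```python
import math

def sample_sd_from_population_sd(sd_pop, n):
    """Convert a divisor-n standard deviation to the divisor-(n - 1) version."""
    return sd_pop * math.sqrt(n / (n - 1))

print(round(sample_sd_from_population_sd(0.727, 8), 3))  # 0.777
print(round(sample_sd_from_population_sd(0.635, 8), 3))  # 0.679
```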

Question

[Maximum mark: 12] [without GDC]
The following are the results of a survey of the scores of 10 people on both a
mathematics \((x)\) and a science \((y)\) aptitude test:

(a) Plot this information on a scatter graph. Use a scale of 1 cm to represent 10 units
on the x-axis and 1 cm to represent 10 units on the y-axis.
(b) Find and plot the point M \(( \bar{x} , \bar{y} )\) on the graph.
(c) Find the equation of the regression line of y on x in the form \(y = ax + b\).
(d) Graph this line on the above graph.
(e) Given that a student receives an 88 on the mathematics test, what would you
expect this student’s science score to be? Show how you arrived at your result.

▶️Answer/Explanation

Ans.

(a)

(b) Point M(73,78) plotted correctly.
(c) \(y\) = 0.359x + 51.8
(d) Reasonable line of best fit. Note: going through M, y intercept anywhere from 50 to 54
(e) \(y\) = 0.359 × 88 + 51.8 = 83
OR
\(y\) = 83 (±2) if read from the graph and method is shown.
