IBDP Maths AI: Topic: SL 4.4: Linear correlation: IB style Questions SL Paper 2

Question 1 [Maximum mark: 18]

As part of his mathematics exploration about classic books, Jason investigated the time taken by students in his school to read the book The Old Man and the Sea. He collected his data by stopping and asking students in the school corridor until he reached his target of 10 students from each of the literature classes in his school.

State which of the two sampling methods, systematic or quota, Jason has used. [1]

Jason constructed the following box and whisker diagram to show the number of hours students in the sample took to read this book.

Write down the median time to read the book. [1]

Calculate the interquartile range. [2]

Mackenzie, a member of the sample, took 25 hours to read the novel. Jason believes Mackenzie’s time is not an outlier.

Determine whether Jason is correct. Support your reasoning. [4]

For each student interviewed, Jason recorded the time taken to read The Old Man and the Sea (x), measured in hours, and paired this with their percentage score on the final exam (y).

These data are represented on the scatter diagram.

Describe the correlation. [1]

Jason correctly calculates the equation of the regression line y on x for these students to be y = −1.54x + 98.8.

He uses the equation to estimate the percentage score on the final exam for a student who read the book in 1.5 hours.

Find the percentage score calculated by Jason. [2]

State whether it is valid to use the regression line y on x for Jason’s estimate. Give a reason for your answer. [2]

Jason found a website that rated the ‘top 50’ classic books. He randomly chose eight of these classic books and recorded the number of pages. For example, Book H is rated 44th and has 281 pages. These data are shown in the table.

Jason intends to analyse the data using Spearman’s rank correlation coefficient, \(r_s\).

Copy and complete the information in the following table. [2]

(i) Calculate the value of \(r_s\).

(ii) Interpret your result. [3]

Answer/Explanation

(a) Quota sampling

(b) 10 (hours)

(c) 15 − 7 = 8

(d) Indication of a valid attempt to find the upper fence: 15 + 1.5 × 8 = 27

25 < 27 (accept equivalent answer in words), so Jason is correct.

(e) “negative” seen

(f) Correct substitution: y = −1.54 × 1.5 + 98.8 = 96.5 (%) (96.49)
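The arithmetic behind parts (c), (d) and (f) can be checked with a short Python sketch. It assumes the quartiles 7 and 15 used in the markscheme and applies the usual 1.5 × IQR rule for outliers; the function names are illustrative only.

```python
# Question 1: outlier check (1.5 x IQR rule) and regression estimate.
# Quartiles Q1 = 7 and Q3 = 15 are taken from the markscheme, part (c).


def upper_fence(q1: float, q3: float) -> float:
    """Upper outlier boundary Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q3 + 1.5 * iqr


def is_upper_outlier(value: float, q1: float, q3: float) -> bool:
    """A value is an outlier on the high side if it exceeds the upper fence."""
    return value > upper_fence(q1, q3)


def regression_estimate(x: float, m: float = -1.54, c: float = 98.8) -> float:
    """Evaluate Jason's regression line y = m*x + c."""
    return m * x + c


if __name__ == "__main__":
    print(15 - 7)                              # IQR: 8
    print(upper_fence(7, 15))                  # 27.0
    print(is_upper_outlier(25, 7, 15))         # False, so Jason is correct
    print(round(regression_estimate(1.5), 2))  # 96.49
```

Since 25 is above Q3, only the upper fence needs checking here.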

(g) Not valid (not reliable), since this is extrapolation OR 1.5 hours is outside the given range of the data.

(h)

(i) (i) 0.714 (0.714285…)

(ii) EITHER
there is a (strong/moderate) positive association between the number of pages and the top 50 rating
OR
there is a (strong/moderate) agreement between the rank order of the number of pages and the rank order of the top 50 rating
OR
there is a (strong/moderate) positive (linear) correlation between the rank order of the number of pages and the rank order of the top 50 rating.
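A minimal Python sketch of Spearman’s rank correlation coefficient for part (i), computed as Pearson’s r applied to the ranks, with tied values receiving the mean of the ranks they span. Jason’s table is not reproduced here, so the function is shown without his data; the helper names are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Rank from 1 (smallest) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's r_s: Pearson's r computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))
```

Ranking both variables in the same direction (both ascending or both descending) gives the same \(r_s\). When there are no tied ranks this agrees with \(r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}\); for n = 8 and no ties, the markscheme value 0.714285… would correspond to \(\sum d^2 = 24\).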

Question


    Boris recorded the number of daylight hours on the first day of each month in a northern
    hemisphere town.
    This data was plotted onto a scatter diagram. The points were then joined by a smooth curve,
    with minimum point (0 , 8) and maximum point (6 , 16) as shown in the following diagram.

                     

     Let the curve in the diagram be y = f (t), where t is the time, measured in months,
     since Boris first recorded these values.
     Boris thinks that f (t) might be modelled by a quadratic function.
        (a) Write down one reason why a quadratic function would not be a good model for
               the number of hours of daylight per day, across a number of years.                                   [1]

     Paula thinks that a better model is f (t) = a cos(bt) + d, t ≥ 0, for specific values of
     a, b and d.
         (b) For Paula’s model, use the diagram to write down
                (i) the amplitude.
                (ii) the period.
                (iii) the equation of the principal axis.                                                                                      [4]
         (c) Hence or otherwise find the equation of this model in the form:                                        [3]
                                  f (t) = a cos (bt) + d
         (d) For the first year of the model, find the length of time when there are more than
                10 hours and 30 minutes of daylight per day.                                                                         [4]
                The true maximum number of daylight hours was 16 hours and 14 minutes.
          (e) Calculate the percentage error in the maximum number of daylight hours Boris
                 recorded in the diagram.                                                                                                            [3]

Answer/Explanation

Ans

1. (a) EITHER
          annual cycle for daylight length     R1
          OR
          there is a minimum length for daylight (cannot be negative)     R1
          OR
          a quadratic cannot have both a maximum and a minimum, or equivalent     R1

Note: Do not accept “Paula’s model is better”.     [1 mark]
   (b) (i) 4     A1
         (ii) 12     A1
         (iii) y = 12     A1A1

Note: Award A1 for “y = (a constant)” and A1 for that constant being 12.     [4 marks]
   (c) f(t) = −4cos(30t) + 12     OR     f(t) = −4cos(−30t) + 12     A1A1A1

Note: Award A1 for b = 30 (or b = −30), A1 for a = −4, and A1 for d = 12. Award at most
A1A1A0 if extra terms are seen or the form is incorrect. Award at most A1A1A0 if x is
used instead of t.     [3 marks]
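The parameters in parts (b) and (c) follow directly from the two extreme points. A short Python sketch, assuming (as the question states) a minimum at (0, 8) and a maximum at (6, 16), and measuring b in degrees per month as the markscheme does; the function names are illustrative.

```python
import math


def model_from_extrema(y_min, t_max, y_max):
    """Return (a, b, d, period) for f(t) = a*cos(b*t) + d, with b in degrees.

    Assumes a minimum at t = 0 and the next maximum at t = t_max.
    """
    amplitude = (y_max - y_min) / 2   # part (b)(i)
    d = (y_max + y_min) / 2           # principal axis y = d, part (b)(iii)
    period = 2 * t_max                # minimum to maximum is half a cycle
    b = 360 / period                  # degrees per month
    a = -amplitude                    # cos(0) = 1 is a maximum, so flip the sign
    return a, b, d, period


def f(t, a=-4.0, b=30.0, d=12.0):
    """Paula's model with the angle b*t measured in degrees."""
    return a * math.cos(math.radians(b * t)) + d


print(model_from_extrema(8, 6, 16))    # (-4.0, 30.0, 12.0, 12)
print(round(f(0), 6), round(f(6), 6))  # 8.0 16.0
```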

  (d) 10.5 = −4cos(30t) + 12     (M1)
         EITHER
         \(t_{1} = 2.26585\ldots\), \(t_{2} = 9.73414\ldots\)     (A1)(A1)
         OR
         \(t_{1}=\frac{1}{30}\cos ^{-1}\frac{3}{8}\)     (A1)
         \(t_{2} = 12 - t_{1}\)     (A1)
         THEN
         \(t_{2} - t_{1} = 9.73414\ldots - 2.26585\ldots\)
         = 7.47 (7.46828…) months (0.622356… years)     A1

Note: Award M1A1A1A0 for an unsupported answer of 7.46. If there is only one
intersection point, award M1A1A0A0.     [4 marks]
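The two crossing times in part (d) can be reproduced numerically. This Python sketch assumes the part (c) model with the angle 30t in degrees, and uses the symmetry of the curve about its maximum at t = 6 to get the second solution.

```python
import math

A, B, D = -4.0, 30.0, 12.0   # model from part (c); B*t is in degrees
THRESHOLD = 10.5             # 10 hours 30 minutes
PERIOD = 12.0                # months


def f(t):
    return A * math.cos(math.radians(B * t)) + D


def crossing_times(level):
    """Both solutions of f(t) = level in the first period."""
    t1 = math.degrees(math.acos((level - D) / A)) / B
    t2 = PERIOD - t1   # the curve is symmetric about its maximum at t = 6
    return t1, t2


t1, t2 = crossing_times(THRESHOLD)
print(round(t1, 3), round(t2, 3))                   # 2.266 9.734
print(round(t2 - t1, 3), round((t2 - t1) / 12, 3))  # 7.468 0.622
```

Because f(6) = 16 > 10.5, the daylight exceeds 10 hours 30 minutes between the two crossings rather than outside them.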

     (e) \(\left|\frac{16-\left(16+\frac{14}{60}\right)}{16+\frac{14}{60}}\right|\times 100\%\)     (M1)(M1)

Note: Award M1 for correct values and absolute value signs, M1 for ×100.
               = 1.44% (1.43737…%)     A1     [3 marks]

[Total 15 marks]
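Part (e) as a small Python function (a sketch; the function name is illustrative):

```python
def percentage_error(approx, exact):
    """Percentage error |approx - exact| / |exact| * 100."""
    return abs(approx - exact) / abs(exact) * 100


true_max = 16 + 14 / 60   # 16 hours 14 minutes, in hours
print(round(percentage_error(16, true_max), 2))   # 1.44
```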

Question

Don took part in a project investigating wind speed, x km h⁻¹, and the time, y minutes, to fully charge a solar-powered robot.

The investigation was carried out six times. The results are recorded in the table.

| Wind speed, x (km h⁻¹) | 6 | 10 | 16 | 24 | 28 | 30 |
| Time, y (minutes) | 28 | 26 | 30 | 33 | 38 | 37 |

    (a) On graph paper, draw a scatter diagram to show the results of Don’s investigation. Use a scale of 1 cm to represent 2 units on the x-axis, and 1 cm to represent 5 units on the y-axis. [4]

    (b) Calculate

      (i) \(\bar{x}\), the mean wind speed;

      (ii) \(\bar{y}\), the mean time to fully charge the robot. [2]

    M is the point with coordinates \((\bar{x}, \bar{y})\).

    (c) Plot and label the point M on your scatter diagram. [2]

    (d) (i) Calculate r, Pearson’s product–moment correlation coefficient.

      (ii) Describe the correlation between the wind speed and the time to fully charge the robot. [4]

    (e) (i) Write down the equation of the regression line y on x, in the form y = mx + c.

      (ii) Draw this regression line on your scatter diagram.

      (iii) Hence or otherwise estimate the charging time when the wind speed is 27 km h⁻¹. [6]

    Don concluded from his investigation: “There is no causation between wind speed and the time to fully charge the robot”.

    (f) In the context of the question, briefly explain the meaning of “no causation”. [1]

    Answer/Explanation

    Ans:

    (a)

    (b)

    (i) 19 (km h⁻¹)

    (ii) 32 (minutes)

    (c) point in correct position, labelled M

    Note: Award (A1)(ft) for point plotted in correct position, (A1) for point labelled M. Follow through from their part (b).

    (d)

    (i) (r =) 0.944 (0.943733…)

    (ii) (very) strong positive correlation

    (e)

    (i) y= 0.465x+23.2 (y= 0.465020…x+ 23.1646…)

    (ii) regression line through their M (ft)

    regression line through their (0, 23.2)

    (iii) (y =) 0.465020…(27) + 23.1646…

    Note: Award (M1) for correct substitution into their regression equation.

    = 35.7 (minutes) (35.7201…)

    OR

    an attempt to use their regression line to find the y value at x = 27

    Note: Award (M1) for an indication of using their regression line. This must be illustrated by vertical and horizontal lines or marks at the correct place(s) on their scatter diagram.

    35.7 (minutes)
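The values in parts (b), (d) and (e) can be reproduced from the table with a short Python sketch using the standard least-squares formulas. It is a check on the GDC output, not part of the markscheme; the function name is illustrative.

```python
from math import sqrt

# Data from the table in this question.
wind_speed = [6, 10, 16, 24, 28, 30]     # x, km/h
charge_time = [28, 26, 30, 33, 38, 37]   # y, minutes


def summary(x, y):
    """Means, Pearson's r, and the least-squares line y = m*x + c."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    m = sxy / sxx       # gradient of the regression line y on x
    c = my - m * mx     # the line passes through M(x_bar, y_bar)
    return mx, my, r, m, c


mx, my, r, m, c = summary(wind_speed, charge_time)
print(mx, my)                                  # 19.0 32.0
print(round(r, 3), round(m, 3), round(c, 1))   # 0.944 0.465 23.2
print(round(m * 27 + c, 1))                    # 35.7
```

Because c is computed as \(\bar{y} - m\bar{x}\), the regression line always passes through M, which is why part (e)(ii) can be drawn through M and (0, 23.2).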

    (f) wind speed does not cause a change in the time to charge (the robot)

    Question

    The figure below shows the lengths in centimetres of fish found in the net of a small trawler.

    Find the total number of fish in the net.[2]

    a.

    Find (i) the modal length interval,

    (ii) the interval containing the median length,

    (iii) an estimate of the mean length.[5]

    b.

    (i) Write down an estimate for the standard deviation of the lengths.

    (ii) How many fish (if any) have length greater than three standard deviations above the mean?[3]

    c.

    The fishing company must pay a fine if more than 10% of the catch have lengths less than 40cm.

    Do a calculation to decide whether the company is fined.[2]

    d.

    A sample of 15 of the fish was weighed. The weight, W, was plotted against length, L, as shown below.

    Exactly two of the following statements about the plot could be correct. Identify the two correct statements.

    Note: You do not need to enter data in a GDC or to calculate r exactly.

    (i) The value of r, the correlation coefficient, is approximately 0.871.

    (ii) There is an exact linear relation between W and L.

    (iii) The line of regression of W on L has equation W = 0.012L + 0.008 .

    (iv) There is negative correlation between the length and weight.

    (v) The value of r, the correlation coefficient, is approximately 0.998.

    (vi) The line of regression of W on L has equation W = 63.5L + 16.5.[2]

    e.
    Answer/Explanation

    Markscheme

    Total = 2 + 3 + 5 + 7 + 11 + 5 + 6 + 9 + 2 + 1     (M1)

    (M1) is for a sum of frequencies.

    = 51     (A1)(G2)[2 marks]

    a.

    Unit penalty (UP) is applicable where indicated in the left hand column.

    (i) modal interval is 60 – 70     (A1)

    Award (A0) for 65.

    (ii) median is length of fish no. 26,     (M1)(A1)

    also 60 – 70     (G2)

    Can award (A1)(ft) or (G2)(ft) for 65 if (A0) was awarded for 65 in part (i).

    (iii) mean is \(\frac{{2 \times 25 + 3 \times 35 + 5 \times 45 + 7 \times 55 + …}}{{51}}\)     (M1)

    (UP) = 69.5 cm (3sf)     (A1)(ft)(G1)

    Note: (M1) is for a sum of (frequencies multiplied by midpoint values) divided by candidate’s answer from part (a). Accept mid-points 25.5, 35.5 etc or 24.5, 34.5 etc, leading to answers 70.0 or 69.0 (3sf) respectively. Answers of 69.0, 69.5 or 70.0 (3sf) with no working can be awarded (G1).[5 marks]

    b.

    Unit penalty (UP) is applicable where indicated in the left hand column.

    (UP) (i) standard deviation is 21.8 cm     (G1)

    For any other answer without working, award (G0). If working is present then (G0)(AP) is possible.

    (ii) \(69.5 + 3 \times 21.8 = 134.9 > 120\)     (M1)

    no fish     (A1)(ft)(G1)

    For ‘no fish’ without working, award (G1) regardless of answer to (c)(i). Follow through from (c)(i) only if method is shown. [3 marks]

    c.

    5 fish are less than 40 cm in length,     (M1)

    Award (M1) for any of \(\frac{5}{51}\), \(\frac{46}{51}\), 0.098 or 9.8%, 0.902, 90.2% or 5.1 seen.

    hence no fine.     (A1)(ft)

    Note: There is no G mark here and (M0)(A1) is never allowed. The follow-through is from answer in part (a).[2 marks]

    d.

    (i) and (iii) are correct.     (A1)(A1)[2 marks]

    e.
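The grouped-data calculations in parts (a) to (d) can be checked with the Python sketch below. It uses the frequencies from the part (a) markscheme and assumes 10 cm classes from 20 cm to 120 cm, which is consistent with the midpoints in part (b)(iii) and the 120 cm limit in part (c)(ii). The standard deviation is the population value σ, which matches the markscheme’s 21.8.

```python
from math import sqrt

# Frequencies from the part (a) markscheme; classes assumed to be
# 20-30, 30-40, ..., 110-120 cm (midpoints 25, 35, ..., 115).
freq = [2, 3, 5, 7, 11, 5, 6, 9, 2, 1]
lower = list(range(20, 120, 10))
mid = [lo + 5 for lo in lower]

n = sum(freq)
mean_len = sum(f * m for f, m in zip(freq, mid)) / n
# Population standard deviation of the grouped data.
sd_len = sqrt(sum(f * (m - mean_len) ** 2 for f, m in zip(freq, mid)) / n)

# Modal class: the class with the highest frequency.
top = freq.index(max(freq))
modal_class = (lower[top], lower[top] + 10)

# Median class: first class whose cumulative frequency reaches (n + 1) / 2.
median_class = None
cumulative = 0
for lo, f in zip(lower, freq):
    cumulative += f
    if cumulative >= (n + 1) / 2:
        median_class = (lo, lo + 10)
        break

# Fish shorter than 40 cm: the classes whose upper boundary is at most 40.
under_40 = sum(f for lo, f in zip(lower, freq) if lo + 10 <= 40)

print(n, modal_class, median_class)            # 51 (60, 70) (60, 70)
print(round(mean_len, 1), round(sd_len, 1))    # 69.5 21.8
print(mean_len + 3 * sd_len > 120)             # True, so no fish qualify
print(under_40, round(100 * under_40 / n, 1))  # 5 9.8, so no fine
```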

    Question

    The number of bottles of water, \(n\), sold at a railway station on each day and the temperature, \(T\), on that day are given in the following table.

    Write down

    (i)     the mean temperature;

    (ii)    the standard deviation of the temperatures.[2]

    a.

    Write down the correlation coefficient, \(r\), for the variables \(n\) and \(T\).[1]

    b.

    Comment on your value for \(r\).[2]

    c.

    The equation of the line of regression for \(n\) on \(T\) is \(n = dT - 100\).

    (i)     Write down the value of \(d\).

    (ii)    Estimate how many bottles of water will be sold when the temperature is \({19.6^ \circ }\).[2]

    d.

    On a day when the temperature was \({36^ \circ }\) Peter calculates that \(314\) bottles would be sold. Give one reason why his answer might be unreliable.[1]

    e.
    Answer/Explanation

    Markscheme

    (i)     19.2     (G1)

    (ii)    1.45     (G1)[2 marks]

    a.

    \(r = 0.942\)     (G1)[1 mark]

    b.

    Strong, positive correlation.     (A1)(ft)(A1)(ft)[2 marks]

    c.

    (i)     \(d = 11.5\)     (G1)

    (ii)    \(n = 11.5 \times 19.6 - 100\)

    \( = 125\) (accept \(126\))     (A1)(ft)

    Note: Answer must be a whole number.[2 marks]

    d.

    It is unreliable to extrapolate outside the values given (outlier).     (R1)[1 mark]

    e.
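A small Python sketch of parts (d) and (e), using d = 11.5 from the markscheme. The table is not reproduced here, so as a rough measure of how far 36° lies from the data it uses the mean and standard deviation of the temperatures from part (a).

```python
D_SLOPE = 11.5             # d from part (d) of the markscheme
MEAN_T, SD_T = 19.2, 1.45  # temperature summary from part (a)


def bottles(temperature):
    """Estimate n = 11.5T - 100, rounded to a whole number of bottles."""
    return round(D_SLOPE * temperature - 100)


print(bottles(19.6))                    # 125
print(bottles(36))                      # 314
# Distance of 36 degrees from the mean temperature, in standard deviations.
print(round((36 - MEAN_T) / SD_T, 1))   # 11.6
```

A temperature more than eleven standard deviations above the mean is far outside the data used to fit the line, which is why Peter’s estimate of 314 bottles is an unreliable extrapolation.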

    Question

    In a mountain region there appears to be a relationship between the number of trees growing in the region and the depth of snow in winter. A set of 10 areas was chosen, and in each area the number of trees was counted and the depth of snow measured. The results are given in the table below.

    In a study on \(100\) students there seemed to be a difference between males and females in their choice of favourite car colour. The results are given in the table below. A \(\chi^2\) test was conducted.

    Use your graphic display calculator to find the mean number of trees.[1]

    A, a, i.

    Use your graphic display calculator to find the mean depth of snow.[1]

    A, a, iii.

    Use your graphic display calculator to find the standard deviation of the depth of snow.[1]

    A, a, iv.

    The covariance \(S_{xy} = 188.5\).

    Write down the product-moment correlation coefficient, r.[2]

    A, b.

    Write down the equation of the regression line of y on x.[2]

    A, c.

    If the number of trees in an area is 55, estimate the depth of snow.[2]

    A, d.

    Use the equation of the regression line to estimate the depth of snow in an area with 100 trees.[1]

    A, e, i.

    Decide whether the answer in (e)(i) is a valid estimate of the depth of snow in the area. Give a reason for your answer.[2]

    A, e, ii.

    Write down the total number of male students.[1]

    B, a.

    Show that the expected frequency for males whose favourite car colour is blue is 12.6.[2]

    B, b.

    The calculated value of \({\chi ^2}\) is \(1.367\) and the critical value of \({\chi ^2}\) is \(5.99\) at the \(5\%\) significance level.

    Write down the null hypothesis for this test.[1]

    B, c, i.

    Write down the number of degrees of freedom.[1]

    B, c, ii.

    Determine whether the null hypothesis should be accepted at the \(5\%\) significance level. Give a reason for your answer.[2]

    B, c, iv.
    Answer/Explanation

    Markscheme

    50     (G1)[1 mark]

    A, a, i.

    30.5     (G1)[1 mark]

    A, a, iii.

    12.3     (G1)

    Note: Award (A1)(ft) for 13.0 in (iv) but only if 17.7 seen in (a)(ii).[1 mark]

    A, a, iv.

    \(r = \frac{{188.5}}{{(16.79 \times 12.33)}}\)     (M1)

    Note: Award (M1) for using their values in the correct formula.

    = 0.911 (accept 0.912, 0.910)     (A1)(ft)(G2)[2 marks]

    A, b.

    y = 0.669x − 2.95     (G1)(G1)

    Note: Award (G1) for 0.669x, (G1) for −2.95. If the answer is not in the form of an equation, award at most (G1)(G0).[2 marks]

    A, c.

    Depth = 0.669 × 55 − 2.95     (M1)

    = 33.8     (A1)(ft)(G2)(ft)

    Note: Follow through from their (c) even if no working seen.[2 marks]

    A, d.

    64.0 (accept 63.95, 63.9)     (A1)(ft)(G1)(ft)

    Note: Follow through from their (c) even if no working seen.[1 mark]

    A, e, i.

    It is not valid. It lies too far outside the values that are given. Or equivalent.     (A1)(R1)

    Note: Do not award (A1)(R0).[2 marks]

    A, e, ii.
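Part A can be checked from the summary values alone. This Python sketch uses the standard deviations and covariance that appear in the markscheme (x = number of trees, y = depth of snow); the estimates in parts (d) and (e) use the markscheme’s rounded line y = 0.669x − 2.95. The function name is illustrative.

```python
# Summary values quoted in the markscheme.
s_x, s_y, s_xy = 16.79, 12.33, 188.5

r = s_xy / (s_x * s_y)        # part (b)
gradient = s_xy / s_x ** 2    # gradient of the regression line of y on x


def depth(trees, m=0.669, c=-2.95):
    """Markscheme regression line y = 0.669x - 2.95."""
    return m * trees + c


print(round(r, 3))            # 0.911
print(round(gradient, 3))     # 0.669
print(round(depth(55), 1))    # 33.8
print(round(depth(100), 2))   # 63.95
```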

    28     (A1)[1 mark]

    B, a.

    \(\frac{{28 \times 45}}{{100}}\left( {\frac{{28}}{{100}} \times \frac{{45}}{{100}} \times 100} \right)\)     (M1)(A1)(ft)

    Note: Award (M1) for correct formula, (A1) for correct substitution.

    = 12.6     (AG)

    Note: Do not award (A1) unless 12.6 seen.[2 marks]

    B, b.

    the favourite car colour is independent of gender.     (A1)

    Note: Accept there is no association between gender and favourite car colour.

    Do not accept ‘not related’ or ‘not correlated’.[1 mark]

    B, c, i.

    \(2\)     (A1)[1 mark]

    B, c, ii.

    Accept the null hypothesis since \(1.367 < 5.991\)     (A1)(ft)(R1)

    Note: Allow “Do not reject”. Follow through from their null hypothesis and their critical value.

    Full credit for use of \(p\)-values from GDC [\(p = 0.505\)].

    Do not award (A1)(R0). Award (R1) for valid comparison.[2 marks]

    B, c, iv.
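The part B calculations as a Python sketch. The degrees of freedom assume a 2 × 3 contingency table (two genders, three colours), which is what the markscheme’s answer of 2 implies; the p-value uses the fact that for exactly 2 degrees of freedom the \(\chi^2\) upper-tail probability is \(e^{-x/2}\). The function names are illustrative.

```python
from math import exp


def expected_frequency(row_total, col_total, grand_total):
    """Expected count under independence: row total x column total / grand total."""
    return row_total * col_total / grand_total


def degrees_of_freedom(rows, cols):
    return (rows - 1) * (cols - 1)


def chi2_upper_tail_df2(statistic):
    """P(chi-squared > statistic) for exactly 2 degrees of freedom."""
    return exp(-statistic / 2)


print(expected_frequency(28, 45, 100))   # 12.6
print(degrees_of_freedom(2, 3))          # 2
p = chi2_upper_tail_df2(1.367)
print(round(p, 3))                       # 0.505
print("do not reject H0" if 1.367 < 5.991 else "reject H0")
```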

    Question

    In an environmental study of plant diversity around a lake, a biologist collected data about the number of different plant species (y) that were growing at different distances (x) in metres from the lake shore.

    Draw a scatter diagram to show the data. Use a scale of 2 cm to represent 10 metres on the x-axis and 2 cm to represent 10 plant species on the y-axis.[4]

    a.

    Using your scatter diagram, describe the correlation between the number of different plant species and the distance from the lake shore.[1]

    b.

    Use your graphic display calculator to write down \(\bar x\), the mean of the distances from the lake shore.[1]

    c.i.

    Use your graphic display calculator to write down \(\bar y\), the mean number of plant species.[1]

    c.ii.

    Plot the point (\(\bar x\), \(\bar y\)) on your scatter diagram. Label this point M.[2]

    d.

    Write down the equation of the regression line y on x for the above data.[2]

    e.

    Draw the regression line y on x on your scatter diagram.[2]

    f.

    Estimate the number of plant species growing 30 metres from the lake shore.[2]

    g.
    Answer/Explanation

    Markscheme

         (A1)(A3)

    Notes: Award (A1) for scales and labels (accept x/y).

    Award (A3) for all points correct.

    Award (A2) for 7 or 8 points correct.

    Award (A1) for 5 or 6 points correct.

    Award at most (A1)(A2) if points are joined up.

    If axes are reversed award at most (A0)(A3)(ft).[4 marks]

    a.

    Negative     (A1)[1 mark]

    b.

    17     (G1)[1 mark]

    c.i.

    23     (G1)[1 mark]

    c.ii.

    Point correctly placed and labelled M     (A1)(ft)(A1) Note: Accept an error of ±0.5.[2 marks]

    d.

    y = –0.708x + 35.0     (G1)(G1)

    Note: Award at most (G1)(G0) if y = not seen. Accept 35.[2 marks]

    e.

    Regression line drawn that passes through M and (0, 35)     (A1)(ft)(A1)(ft)

    Note: Award (A1) for straight line that passes through M, (A1) for line (extrapolated if necessary) that passes through (0, 35) (accept error of ±1).

    If ruler not used, award a maximum of (A1)(A0).[2 marks]

    f.

    y = –0.708(30) + 35.0     (M1)
    = 14 (Accept 13)     (A1)(ft)(G2)

    OR

    Using graph: (M1) for some indication on graph of point, (A1)(ft) for answers. Final answer must be consistent with their graph.     (M1)(A1)(ft)(G2)

    Note: The final answer must be an integer.[2 marks]

    g.
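A quick Python check of parts (e) and (g) using the markscheme line. A least-squares line passes through \((\bar x, \bar y)\), so evaluating it at x = 17 should give roughly 23; the function name is illustrative.

```python
def species(distance_m, m=-0.708, c=35.0):
    """Markscheme regression line y = -0.708x + 35.0."""
    return m * distance_m + c


print(round(species(17), 1))   # 23.0, so the line passes (to rounding) through M(17, 23)
print(round(species(30)))      # 14
```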