IB Mathematics SL 4.10 Spearman’s rank correlation coefficient AI SL Paper 1 - Exam Style Questions - New Syllabus
Question
At a running club, Aarav conducts a study to find if there is any association between an athlete’s age and their best time taken to run \(100\ \text{m}\). Eight athletes are selected at random, and their details are listed below.
Variable | A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|---|
Age (years) | 13 | 17 | 22 | 18 | 19 | 25 | 11 | 36 |
Time (seconds) | 13.4 | 14.6 | 13.4 | 12.9 | 12.0 | 11.8 | 17.0 | 13.1 |
Aarav decides to calculate the Spearman’s rank correlation coefficient for his set of data.
(a) Complete the table of ranks. [2]
Rank | A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|---|
Age rank | | | 3 | | | | | |
Time rank | | | | | | | 1 | |
(b) Calculate the Spearman’s rank correlation coefficient, \(r_s\). [2]
(c) Interpret this value of \(r_s\) in the context of the question. [1]
(d) Suggest a mathematical reason why Aarav may have decided not to use Pearson’s product–moment correlation coefficient with his data from the original table. [1]
Answer/Explanation
Markscheme
(a)
Use descending ranks (largest value \(=\) rank \(1\)); average any ties. The completed ranks are:
Rank | A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|---|
Age rank | 7 | 6 | 3 | 5 | 4 | 2 | 8 | 1 |
Time rank | 3.5 | 2 | 3.5 | 6 | 7 | 8 | 1 | 5 |
A1 A1
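The ranking rule above (largest value gets rank 1, tied values share the mean of the positions they occupy) can be checked with a short Python sketch. The helper name `average_ranks` is illustrative only and is not part of the markscheme.

```python
def average_ranks(values, descending=True):
    """Rank values; tied values share the mean of the positions they occupy."""
    order = sorted(values, reverse=descending)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first 1-based position taken by v
        count = order.count(v)       # how many entries are tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks

# Data from the question, athletes A to H
ages = [13, 17, 22, 18, 19, 25, 11, 36]
times = [13.4, 14.6, 13.4, 12.9, 12.0, 11.8, 17.0, 13.1]

age_ranks = average_ranks(ages)
time_ranks = average_ranks(times)
print(age_ranks)   # [7.0, 6.0, 3.0, 5.0, 4.0, 2.0, 8.0, 1.0]
print(time_ranks)  # [3.5, 2.0, 3.5, 6.0, 7.0, 8.0, 1.0, 5.0]
```

Athletes A and C both ran \(13.4\ \text{s}\) and occupy positions 3 and 4, so each receives rank \(3.5\).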
(b)
Compute \(r_s\) as the Pearson correlation of the two rank lists (ties averaged). With \(n=8\), \(\bar R_x=\bar R_y=\dfrac{1+2+\cdots+8}{8}=4.5\).
Using the completed ranks: \[ \sum (R_x-\bar R_x)^2=42,\quad \sum (R_y-\bar R_y)^2=41.5,\quad \sum (R_x-\bar R_x)(R_y-\bar R_y)=-28. \] Hence \[ r_s \;=\; \frac{\sum (R_x-\bar R_x)(R_y-\bar R_y)}{\sqrt{\sum (R_x-\bar R_x)^2\,\sum (R_y-\bar R_y)^2}} \;=\; \frac{-28}{\sqrt{42\times 41.5}} \;=\; -0.670670\ldots \approx \boxed{-0.671}. \] A2
GDC method: enter the two rank lists (with ties averaged) and use the correlation function to obtain \(r_s\approx -0.671\).
Note: The shortcut \(1-\dfrac{6\sum d_i^2}{n(n^2-1)}\) is exact only when there are no ties; here there is a tie at \(13.4\ \text{s}\), so the ranked-Pearson (or technology) method is appropriate.
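As a rough check of the calculation and of the note on ties, the following Python sketch computes both the Pearson correlation of the rank lists and the shortcut formula. It assumes the ranks from part (a); the function names `pearson` and `shortcut` are illustrative and not part of the markscheme.

```python
from math import sqrt

# Ranks from part (a), athletes A to H
age_ranks = [7, 6, 3, 5, 4, 2, 8, 1]
time_ranks = [3.5, 2, 3.5, 6, 7, 8, 1, 5]

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def shortcut(x, y):
    """1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

r_s = pearson(age_ranks, time_ranks)
approx = shortcut(age_ranks, time_ranks)
print(round(r_s, 3))     # -0.671
print(round(approx, 3))  # -0.661
```

The two values differ in the second decimal place, which is why the shortcut is not used here.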
(c)
\(\boxed{\text{There is a negative correlation between age and best \(100\ \text{m}\) time in this sample.}}\) In context, the older athletes in the sample tend to have the lower (faster) best times. R1
(d)
A valid reason: the relationship may not be linear / the data need not be bivariate normal / Pearson’s \(r\) is sensitive to outliers / presence of equal values (ties). R1
Total Marks: 6
Question
The decathlon is a contest where athletes compete in ten events. Two of those events are long jump and high jump. In both events, a greater distance implies a better ranking.
The table lists results for these two events at the World Championships.
Athlete’s Country | Long Jump (m) | High Jump (m) | Long Jump Rank | High Jump Rank |
---|---|---|---|---|
Germany | 7.64 | 2.11 | 1 | |
France | 7.52 | 2.08 | 2 | |
Estonia | 7.49 | 1.84 | 3 | |
Canada | 7.44 | 2.02 | 4 | |
Netherlands | 7.33 | 2.05 | 5 | |
Ukraine | 7.28 | 2.02 | 6 | |
Algeria | 7.22 | 1.90 | 7 | |
Austria | 7.11 | 1.87 | 8 | |
Grenada | 6.98 | 1.99 | 9 | |
Japan | 6.64 | 1.96 | 10 |
The Spearman’s rank correlation coefficient is used to assess whether there is a linear correlation between an athlete’s ranking in long jump and their ranking in high jump.
(a) Complete the table to show the athletes’ rankings in high jump. [2]
(b) Find the value of the Spearman’s rank correlation coefficient \( r_s \). [2]
The following guide is used by the trainer to determine the strength of the correlation between the ranks for long jump and high jump.
\(|r_s|\) | Strength |
---|---|
0.000 to 0.199 | Very weak |
0.200 to 0.399 | Weak |
0.400 to 0.599 | Moderate |
0.600 to 0.799 | Strong |
0.800 to 1.000 | Very strong |
(c) State the strength of the correlation between the rankings as indicated by the table and interpret this in context. [2]
Answer/Explanation
Markscheme
(a)
Rank the high-jump heights from largest (rank 1) to smallest. Ties share the average rank. The completed table is:
Country | Long Jump (m) | High Jump (m) | Long Jump Rank | High Jump Rank |
---|---|---|---|---|
Germany | 7.64 | 2.11 | 1 | 1 |
France | 7.52 | 2.08 | 2 | 2 |
Estonia | 7.49 | 1.84 | 3 | 10 |
Canada | 7.44 | 2.02 | 4 | 4.5 |
Netherlands | 7.33 | 2.05 | 5 | 3 |
Ukraine | 7.28 | 2.02 | 6 | 4.5 |
Algeria | 7.22 | 1.90 | 7 | 8 |
Austria | 7.11 | 1.87 | 8 | 9 |
Grenada | 6.98 | 1.99 | 9 | 6 |
Japan | 6.64 | 1.96 | 10 | 7 |
A1 A1
[2 marks]
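As a worked check of the tied entry: Canada and Ukraine both cleared \(2.02\ \text{m}\) and occupy the 4th and 5th positions when the heights are listed from largest to smallest, so each receives \[ \frac{4+5}{2}=4.5. \] The ranks must still add up to the same total as \(1+2+\cdots+10\): \[ 1+2+10+4.5+3+4.5+8+9+6+7=55=\frac{10\times 11}{2}. \]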
(b)
Let \(n=10\). Using Spearman’s formula \[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)},\quad d_i=(\text{LJ rank})-(\text{HJ rank}). \] Compute differences and squares (in LJ order):
Germany \(0,0\); France \(0,0\); Estonia \(-7,49\); Canada \(-0.5,0.25\); Netherlands \(2,4\); Ukraine \(1.5,2.25\); Algeria \(-1,1\); Austria \(-1,1\); Grenada \(3,9\); Japan \(3,9\).
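Hence \(\sum d_i^2=75.5\) and \[ r_s=1-\frac{6(75.5)}{10(10^2-1)} = 1-\frac{453}{990}\approx 0.542. \] Because Canada and Ukraine tie at \(2.02\ \text{m}\), this shortcut value is only approximate; the Pearson correlation of the two rank lists (the GDC method) gives \(r_s\approx \mathbf{0.541}\) (3 s.f.). A2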
[2 marks]
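The small gap between \(0.542\) and \(0.541\) can be reproduced with a Python sketch that evaluates the shortcut formula and the Pearson correlation of the two rank lists side by side. It assumes the ranks from part (a); the variable names are illustrative only.

```python
from math import sqrt

# Ranks in long-jump order: Germany, France, Estonia, Canada, Netherlands,
# Ukraine, Algeria, Austria, Grenada, Japan
lj = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hj = [1, 2, 10, 4.5, 3, 4.5, 8, 9, 6, 7]
n = len(lj)

# Shortcut formula based on squared rank differences
sum_d2 = sum((a - b) ** 2 for a, b in zip(lj, hj))
r_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Pearson correlation of the rank lists (handles the tie exactly)
mean_lj, mean_hj = sum(lj) / n, sum(hj) / n
sxy = sum((a - mean_lj) * (b - mean_hj) for a, b in zip(lj, hj))
sxx = sum((a - mean_lj) ** 2 for a in lj)
syy = sum((b - mean_hj) ** 2 for b in hj)
r_ranked = sxy / sqrt(sxx * syy)

print(sum_d2)                # 75.5
print(round(r_shortcut, 3))  # 0.542
print(round(r_ranked, 3))    # 0.541
```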
(c)
Since \(|r_s|=0.541\) lies in \(0.400\)–\(0.599\), the correlation is moderate.
Interpretation: athletes who place well in long jump tend to place fairly well in high jump (and vice versa), though the relationship is not extremely strong. A1 A1
[2 marks]
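The trainer's guide can be written as a small lookup. This is only a sketch: the function name `strength` is hypothetical, and it assumes each band starts at its lower bound (so, for example, any \(|r_s|\) from \(0.4\) up to but not including \(0.6\) counts as moderate).

```python
def strength(r_s):
    """Map a Spearman coefficient to the trainer's strength label."""
    size = abs(r_s)
    if size > 1:
        raise ValueError("r_s must lie between -1 and 1")
    # (upper bound of band, label), checked in increasing order
    bands = [(0.2, "Very weak"), (0.4, "Weak"), (0.6, "Moderate"), (0.8, "Strong")]
    for upper, label in bands:
        if size < upper:
            return label
    return "Very strong"

print(strength(0.541))  # Moderate
```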
Total Marks: 6