IB Mathematics AI SL Spearman's Rank MAI Study Notes - New Syllabus
LEARNING OBJECTIVE
- Spearman’s rank correlation coefficient, $r_s$
Key Concepts:
- Spearman’s rank correlation coefficient, $r_s$
Spearman’s Rank Correlation Coefficient (rₛ)
Spearman’s rank correlation coefficient (denoted $r_s$) is a non-parametric measure of correlation that assesses how well the relationship between two variables can be described by a monotonic function (i.e., as one variable increases, the other consistently increases, or consistently decreases, though not necessarily at a constant rate).
$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
$d_i$ = difference between the two ranks of the $i$th data pair (rank of $x_i$ minus rank of $y_i$)
$n$ = number of data pairs
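As a minimal sketch, the formula can be written as a short Python function. It assumes the two lists already hold ranks for the same $n$ items and that there are no tied ranks; the name `spearman_from_ranks` is our own, not a library routine.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Return r_s = 1 - 6*sum(d_i^2) / (n(n^2 - 1)) for two lists of ranks.

    Assumes both lists rank the same n items and contain no tied ranks,
    which is when this shortcut formula is exact.
    """
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of the same length n >= 2")
    # d_i is the difference between the two ranks of the i-th data pair
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Math and English ranks of students A to F (the worked example below)
print(round(spearman_from_ranks([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 5, 6]), 4))  # 0.8857
```

Identical rankings give $\sum d_i^2 = 0$ and so $r_s = 1$; a completely reversed ranking gives $r_s = -1$.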
Interpreting Spearman’s rₛ
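The value of Spearman’s rank correlation coefficient ranges from $-1$ to $+1$, and its interpretation is based on the strength and direction of a monotonic relationship: $r_s = +1$ indicates a perfect increasing monotonic relationship, $r_s = -1$ a perfect decreasing one, and values close to $0$ indicate little or no monotonic relationship. The closer $|r_s|$ is to 1, the stronger the relationship.
Note: a scatter diagram helps confirm that the relationship really is monotonic, and a p-value (where available) indicates whether the correlation is statistically significant.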
The table below shows the rankings of 6 students in two subjects, Math and English. Calculate Spearman’s rank correlation coefficient and interpret the result.
Student | Math Rank (X) | English Rank (Y) | d = X − Y | d² |
---|---|---|---|---|
A | 1 | 2 | -1 | 1 |
B | 2 | 1 | 1 | 1 |
C | 3 | 4 | -1 | 1 |
D | 4 | 3 | 1 | 1 |
E | 5 | 5 | 0 | 0 |
F | 6 | 6 | 0 | 0 |
Answer/Explanation:
Calculate the sum of squared differences:
\( \sum d^2 = 1 + 1 + 1 + 1 + 0 + 0 = 4 \)
Number of data pairs: \( n = 6 \)
Spearman’s rank correlation coefficient:
\( r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{6(36 - 1)} = 1 - \frac{24}{210} \approx 1 - 0.1143 = 0.8857 \)
Interpretation: Since \( r_s \approx 0.89 \), there is a strong positive correlation between the students’ rankings in Math and English.
Data: The table below shows two variables, X and Y, representing the performance of 7 students in two different subjects. Use a GDC to find Spearman’s rank correlation coefficient.
Student | X (Math Score) | Y (Physics Score) |
---|---|---|
A | 85 | 82 |
B | 78 | 75 |
C | 92 | 88 |
D | 70 | 74 |
E | 88 | 90 |
F | 76 | 70 |
G | 80 | 78 |
Answer/Explanation:
1. Enter the X values into L1 and the Y values into L2.
2. Use a ranking tool or sort to create ranked lists (L3 = rank(L1), L4 = rank(L2)). On Casio, use the Statistics mode to rank data directly.
3. Calculate d = L3 − L4 and then d² in another list.
4. Use the formula:
\( r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)} \)
On TI-Nspire or Casio ClassPad, there is a built-in Spearman rank correlation feature:
MENU → Statistics → Regression → Spearman Rank
Choose your two lists and press OK.
Result on GDC:
\( \sum d^2 = 4 \), so \( r_s = 1 - \dfrac{6 \times 4}{7(49 - 1)} = 1 - \dfrac{24}{336} \approx 0.929 \) → strong positive correlation between Math and Physics scores.
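The list steps above (L3 = rank(L1), L4 = rank(L2), then d and d²) can be mirrored in Python as a sketch. The helper `rank_no_ties` is hypothetical and assumes every score in a list is different, which holds for this data.

```python
def rank_no_ties(values):
    """Rank from 1 (smallest) to n (largest); assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks instead")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = [85, 78, 92, 70, 88, 76, 80]  # Math scores, students A to G (list L1)
y = [82, 75, 88, 74, 90, 70, 78]  # Physics scores (list L2)

rx, ry = rank_no_ties(x), rank_no_ties(y)    # lists L3 and L4
d2 = [(a - b) ** 2 for a, b in zip(rx, ry)]  # d² for each student
n = len(x)
r_s = 1 - 6 * sum(d2) / (n * (n ** 2 - 1))

print(rx, ry)         # [5, 3, 7, 1, 6, 2, 4] [5, 3, 6, 2, 7, 1, 4]
print(sum(d2))        # 4
print(round(r_s, 3))  # 0.929
```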
Handling Tied Ranks
Tied Ranks Problem:
When two or more values are the same, assigning a unique rank to each becomes problematic.
Solution: give each tied value the average of the rank positions that the tied values occupy.
Example data: \(x = [10, 10, 30]\). Ranks before tie handling: 1, 2, 3. The two 10s are tied at rank positions 1 and 2. What are the final ranks?
Answer/Explanation:
$\text{Average rank} = \frac{1 + 2}{2} = 1.5$
Final ranks: \([1.5, 1.5, 3]\)
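A minimal sketch of the average-rank rule in Python. The function name `average_ranks` is our own; ranks run from 1 for the smallest value upward, matching the example.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values the mean of
    the rank positions they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of equal values that begins at 'start'
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # 0-based positions start..end are rank positions start+1..end+1
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


print(average_ranks([10, 10, 30]))  # [1.5, 1.5, 3.0]
```

Three tied values at positions 2, 3 and 4 would all receive $(2 + 3 + 4)/3 = 3$, which is the same as the mean of the first and last positions.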
Adjustment in Formula:
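With only a few ties, the shortcut formula $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ is still a close approximation, but it is no longer exact. If many ties occur, especially in large datasets, use the general definition, which is Pearson’s correlation coefficient applied to the ranks:
$$
r_s = \frac{\text{Cov}(R_X, R_Y)}{\sigma_{R_X} \cdot \sigma_{R_Y}}
$$
where $R_X$ and $R_Y$ are the ranks of the two variables (with tied values given average ranks), $\text{Cov}$ denotes covariance and $\sigma$ denotes standard deviation.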
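A minimal sketch of this general definition, assuming tied values receive average ranks: it applies Pearson’s formula to the ranks. The helper names are our own. For the tied data $x = [10, 10, 30]$ paired with a hypothetical $y = [1, 2, 3]$, it gives $\sqrt{3}/2 \approx 0.866$, whereas the shortcut formula would give $1 - \frac{6 \times 0.5}{3 \times 8} = 0.875$.

```python
from math import sqrt


def average_ranks(values):
    """Average ranks: each value gets the mean of the 1-based positions its
    copies occupy in the sorted list (O(n^2), fine for small data sets)."""
    s = sorted(values)
    n = len(s)
    # first position of v is s.index(v) + 1; last is n - (index of v in reversed s)
    return [(s.index(v) + 1 + n - s[::-1].index(v)) / 2 for v in values]


def pearson(a, b):
    """Pearson's r = Cov(a, b) / (sigma_a * sigma_b); the 1/n factors cancel."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sqrt(sum((p - ma) ** 2 for p in a))
    sb = sqrt(sum((q - mb) ** 2 for q in b))
    return cov / (sa * sb)


def spearman(x, y):
    """r_s as Pearson's r applied to the (average) ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


print(round(spearman([10, 10, 30], [1, 2, 3]), 3))  # 0.866
```

When there are no ties, this agrees exactly with the shortcut formula.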
Comparison with Pearson’s Correlation Coefficient
Feature | Pearson’s r | Spearman’s rₛ |
---|---|---|
Type of Relationship | Linear | Monotonic (not necessarily linear) |
Data Type | Interval/Ratio | Ordinal, Interval, Ratio |
Outlier Sensitivity | High | Low |
Assumes Normality (for significance tests) | Yes | No |
Rank-based | No | Yes |
Example:
If $x = [1, 2, 3, 4, 100]$ and $y = [1, 2, 3, 4, 5]$, the extreme value (100) pulls Pearson’s r down to about 0.725, but Spearman’s rₛ = 1 because the ranks of x and y agree exactly, so it still captures the perfect monotonic trend.
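A short Python sketch makes the contrast concrete. The helpers `pearson` and `ranks` are our own, and `ranks` assumes all values are distinct, which is true here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson's product-moment correlation coefficient r."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def ranks(values):
    """Ranks 1..n; assumes the values are all distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = [1, 2, 3, 4, 100]
y = [1, 2, 3, 4, 5]
print(round(pearson(x, y), 3))                # 0.725
print(round(pearson(ranks(x), ranks(y)), 3))  # 1.0
```

Replacing 100 by 5 would make Pearson’s r equal 1 as well, so the drop to about 0.725 is caused entirely by the size of the outlier, which the ranks ignore.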
Data: The following data shows the time spent studying (in hours) and the test scores of 6 students. Calculate Pearson’s r and Spearman’s rₛ and compare them.
Student | Hours Studied (X) | Test Score (Y) |
---|---|---|
A | 1 | 52 |
B | 2 | 60 |
C | 3 | 70 |
D | 4 | 65 |
E | 5 | 80 |
F | 6 | 85 |
Answer/Explanation:
Using GDC or spreadsheet software:
Pearson’s r ≈ 0.952
This indicates a strong positive linear correlation.
Spearman’s Rank Correlation Coefficient (rₛ)
Convert data to ranks:
- Hours Studied: already in order → ranks 1 to 6
- Test Scores: ranks → 1, 2, 4, 3, 5, 6
$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(0^2 + 0^2 + (-1)^2 + 1^2 + 0^2 + 0^2)}{6(36 - 1)} = 1 - \frac{12}{210} \approx 0.943 $
Spearman’s rₛ ≈ 0.943
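As a check, a short Python sketch (standard library only; the `pearson` helper is our own) computes both coefficients for this data set. Neither list contains ties, so the shortcut formula for rₛ is exact here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson's product-moment correlation coefficient r."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 70, 65, 80, 85]

r = pearson(hours, scores)

# Ranks 1..6 (no ties), then the shortcut formula for r_s
rank_h = [sorted(hours).index(v) + 1 for v in hours]
rank_s = [sorted(scores).index(v) + 1 for v in scores]
n = len(hours)
r_s = 1 - 6 * sum((a - b) ** 2 for a, b in zip(rank_h, rank_s)) / (n * (n ** 2 - 1))

print(round(r, 3), round(r_s, 3))  # 0.952 0.943
```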
Conclusion:
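Pearson’s r is slightly higher because the dip for Student D (65, after Student C’s 70) is small compared with the overall upward linear trend.
Spearman’s rₛ is slightly lower because that dip swaps the ranks of Students C and D, and rₛ counts the swap regardless of how small the score difference is.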
Both suggest a strong positive relationship, but Pearson measures linearity and Spearman measures monotonicity.
Outlier
An outlier is a data point that lies unusually far from the rest of the data set. More generally, an outlier is an individual that differs markedly from the norm in some respect.
Causes:
- Measurement error
- Natural variation
- Data entry mistakes
Example:
Dataset: $5, 6, 7, 8, 100$ → 100 is an outlier.
Effect of Outliers
Method | Effect of Outliers |
---|---|
Pearson’s r | Strongly affected |
Spearman’s rₛ | Less affected (uses ranks) |
Outliers change the magnitude of Pearson’s r by pulling the regression line toward themselves.
Spearman’s rₛ, using ranks, ignores raw values and only considers ordering, making it more robust.
Appropriateness and Limitations
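Use Spearman’s rₛ when: | Limitations: |
---|---|
The data are ordinal (ranked) | Detects only monotonic relationships |
The relationship is monotonic but not linear | Ignores the size of the differences between values |
Outliers are present | The shortcut formula is exact only when there are no tied ranks |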
Choosing the Right Correlation Method
Situation | Use |
---|---|
Linear relationship, no outliers | Pearson’s r |
Monotonic, non-linear | Spearman’s rₛ |
Ordinal data | Spearman’s rₛ |
Outliers present | Spearman’s rₛ preferred |
Non-monotonic | Neither (consider other methods) |