IB Mathematics AI SL Measures of central tendency MAI Study Notes - New Syllabus

LEARNING OBJECTIVE

  • Measures of central tendency (mean, median and mode).

Key Concepts: 

  • Outliers
     
  • Univariate Data
     
  • Interpreting Data

MEASURES OF CENTRAL TENDENCY AND SPREAD

Consider the following numerical data:
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
The total number of entries is \( n = 11 \).

To describe these data, we use:
3 measures of central tendency
3 measures of spread

The measures of central tendency each give a single representative value for the data, while the measures of spread indicate whether the data are close to each other or dispersed.

MEASURES OF CENTRAL TENDENCY (The 3 M’s)

A) MEAN = The sum of all values divided by \( n \).
Here,
$
\text{mean} = \frac{10 + 20 + 20 + 20 + 30 + 30 + 40 + 50 + 70 + 70 + 80}{11} = 40
$

B) MODE = The most frequent value.
Here,
$
\text{mode} = 20
$

C) MEDIAN = The value in the middle (provided the data are placed in ascending order).
Here, it is the sixth number in the list:
$
\text{median} = 30
$

NOTICE
 For the data 10, 20, 30:
$
\text{Median} = 20
$
For the data 10, 20, 30, 40:
$
\text{Median} = 25 \quad (\text{mean of the two middle values})
$
For an even number of data, the median is the mean of the two middle values.

The median is not the \(\frac{n}{2}\)-th entry as one might expect. It is the \(\frac{n+1}{2}\)-th entry.

For example:
If \( n = 11 \), \(\frac{n+1}{2} = 6\), so the median is the 6th entry.
If \( n = 10 \), \(\frac{n+1}{2} = 5.5\), so the median is the mean of the 5th and 6th entries.

The median is also denoted by \( Q_2 \).
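
The position rule above can be sketched in a few lines of Python. This is only an illustration of the \(\frac{n+1}{2}\) rule; the function name `median_by_position` is made up for this sketch.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position, may end in .5
    if position.is_integer():
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]   # entry just below the .5 position
    upper = ordered[int(position)]       # entry just above the .5 position
    return (lower + upper) / 2


data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]
print(median_by_position(data))              # n = 11: the 6th entry
print(median_by_position([10, 20, 30, 40]))  # n = 4: mean of the 2nd and 3rd entries
```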

The mean is denoted by \( \mu \) (for the whole population) or \( \bar{x} \) (for a sample).
If the data are \( x_1, x_2, \ldots, x_n \), the mean is:
$
\mu = \frac{x_1 + x_2 + \cdots + x_n}{n}
$
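
As a quick check of the three measures for the data set above, here is a minimal sketch using Python's standard `statistics` module. The variable names are chosen for this illustration only.

```python
import statistics

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]

mean = sum(data) / len(data)       # (x_1 + ... + x_n) / n
mode = statistics.mode(data)       # most frequent value
median = statistics.median(data)   # middle value of the sorted data

print(mean, mode, median)
```

The three results should agree with the values 40, 20 and 30 found by hand.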

EXAMPLE 1
Find:
a) The integers \( a \leq b \leq c \), given that mean = 4, mode = 5, median = 5.
Solution:
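The median implies \( b = 5 \). The mode implies that 5 occurs at least twice, so \( a = 5 \) or \( c = 5 \). Since \( a = 5 \) would give a sum of at least 15 (a mean of at least 5), we must have \( c = 5 \).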
Then:
$
\frac{a + 5 + 5}{3} = 4 \implies a + 10 = 12 \implies a = 2
$
Thus, the numbers are 2, 5, 5.

b) The integers \( a \leq b \leq c \leq d \), given that mean = 5, mode = 7, median = 6.
Solution:
The median implies \( \frac{b + c}{2} = 6 \), so \( b + c = 12 \) with \( b \leq 6 \leq c \).
Since the mode is 7, the value 7 must occur at least twice. As \( a \leq b \leq 6 \), this forces \( c = d = 7 \), and hence \( b = 5 \).
Then:
$
\frac{a + 5 + 7 + 7}{4} = 5 \implies a + 19 = 20 \implies a = 1
$
Thus, the numbers are 1, 5, 7, 7.
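
Both answers can be double-checked with a brute-force search. The sketch below is not part of the original solution: the helper `search` and the search range 0 to 20 are assumptions made for illustration, and a single mode is required.

```python
from itertools import combinations_with_replacement
from statistics import median, multimode


def search(size, mean, mode, med, low=0, high=20):
    """Return all non-decreasing integer tuples matching the three measures."""
    matches = []
    # combinations_with_replacement yields tuples in non-decreasing order
    for candidate in combinations_with_replacement(range(low, high + 1), size):
        if sum(candidate) != mean * size:
            continue
        if median(candidate) != med:
            continue
        if multimode(candidate) != [mode]:   # require one single mode
            continue
        matches.append(candidate)
    return matches


print(search(3, mean=4, mode=5, med=5))   # part a)
print(search(4, mean=5, mode=7, med=6))   # part b)
```

Within the assumed range, each search returns exactly one tuple, matching the solutions above.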

MEASURES OF SPREAD

 

We use the same set of data:
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

A) STANDARD DEVIATION
The standard deviation measures how far the entries are from the mean. It can be found using a GDC.
For this example, the GDC gives \( \sigma = 22.96 \).

B) RANGE = (maximum value) - (minimum value).
Here:
$
\text{range} = 80 - 10 = 70
$

C) INTERQUARTILE RANGE (IQR) = \( Q_3 - Q_1 \), where:
\( Q_1 \) = Lower quartile = median of the values before \( Q_2 \).
\( Q_3 \) = Upper quartile = median of the values after \( Q_2 \).

Here:
Before the median \( Q_2 = 30 \), we have 5 numbers. Thus:
$
Q_1 = 20 \quad (\text{3rd entry})
$
After the median, we again have 5 numbers, so the upper quartile is:
$
Q_3 = 70 \quad (\text{3rd entry from the end})
$
Therefore:
$
\text{IQR} = 70 - 20 = 50
$

EXAMPLE 2
Estimating \( Q_1, Q_2, Q_3 \):
a) For \( n = 7 \) entries: 10, 20, 30, 40, 50, 60, 70
Median \( Q_2 = 40 \) (4th entry).
\( Q_1 = 20 \), \( Q_3 = 60 \).

b) For \( n = 8 \) entries: 10, 20, 30, 40, 50, 60, 70, 80
Median \( Q_2 = 45 \) (mean of 4th and 5th entries).
\( Q_1 = 25 \), \( Q_3 = 65 \).

c) For \( n = 9 \) entries: 10, 20, 30, 40, 50, 60, 70, 80, 90
Median \( Q_2 = 50 \) (5th entry).
\( Q_1 = 25 \), \( Q_3 = 75 \).

d) For \( n = 10 \) entries: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
Median \( Q_2 = 55 \) (mean of 5th and 6th entries).
\( Q_1 = 30 \), \( Q_3 = 80 \).
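
The quartile rule used in these notes (the median of the values before and after \( Q_2 \)) can be sketched as follows. The function name `quartiles` is an assumption for this illustration. Note that other software may use a different quartile convention and give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule from the notes."""
    ordered = sorted(values)
    half = len(ordered) // 2                 # size of each half, middle entry excluded when n is odd
    lower = ordered[:half]                   # values before Q2
    upper = ordered[len(ordered) - half:]    # values after Q2
    return median(lower), median(ordered), median(upper)


# The four cases of Example 2: n = 7, 8, 9, 10
for n in range(7, 11):
    entries = list(range(10, 10 * n + 1, 10))
    print(n, quartiles(entries))

# The 11-entry data set used earlier in the notes
print(quartiles([10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]))
```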

NOTICE
The square of the standard deviation is called variance:
$
\text{variance} = \sigma^2 \quad \text{or} \quad s_n^2
$
For the example, \( \sigma^2 = 22.96^2 = 527.27 \).

USE OF GDC
We can use the GDC to obtain all these measures. For Casio CFX:
 MENU → STAT → Enter data in List 1 → CALC → 1VAR → Obtain statistics.

FORMULAS FOR VARIANCE AND STANDARD DEVIATION

If the data are \( x_1, x_2, \ldots, x_n \):
$
\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}
$
$
\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}
$
An alternative formula for variance:
$
\sigma^2 = \frac{\sum x_i^2}{n} - \mu^2
$
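
To see that the two variance formulas agree, here is a short sketch applied to the 11-entry data set from the start of the notes. The variable names are chosen for illustration only.

```python
import math

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]
n = len(data)
mu = sum(data) / n

# Definition: mean of the squared deviations from the mean
variance_definition = sum((x - mu) ** 2 for x in data) / n
# Alternative formula: mean of the squares minus the square of the mean
variance_shortcut = sum(x ** 2 for x in data) / n - mu ** 2

sigma = math.sqrt(variance_definition)
print(round(variance_definition, 2), round(variance_shortcut, 2), round(sigma, 2))
```

Both formulas should give a variance of about 527.27 and a standard deviation of about 22.96, matching the GDC values quoted above.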

EXAMPLE 3

For the data (in ascending order): 10, 20, 20, 30, 40, 50, 50, 70, 80
The box plot marks:
Min = 10
\( Q_1 = 20 \)
\( Q_2 = 40 \)
\( Q_3 = 60 \)
Max = 80

1) Percentiles:
\( Q_1 \): 25th percentile
\( Q_2 \): 50th percentile
\( Q_3 \): 75th percentile

2) Outliers:
An outlier is any value:
Below \( Q_1 - 1.5 \times \text{IQR} \)
Above \( Q_3 + 1.5 \times \text{IQR} \)
For the example, \( \text{IQR} = 60 - 20 = 40 \), so:
$
Q_1 - 1.5 \times \text{IQR} = 20 - 1.5 \times 40 = -40
$
$
Q_3 + 1.5 \times \text{IQR} = 60 + 1.5 \times 40 = 120
$
All values lie between these bounds. Thus, there are no outliers.
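
The outlier rule can be sketched as a small function. In this illustration it is applied to the 11-entry data set used earlier in the notes, with \( Q_1 = 20 \) and \( Q_3 = 70 \) passed in explicitly. The function name `outliers` and the extra value 200 are hypothetical, added only to show a value being flagged.

```python
def outliers(values, q1, q3):
    """Return the two fences and the values lying outside them."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in values if x < low_fence or x > high_fence]
    return low_fence, high_fence, flagged


data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]
print(outliers(data, q1=20, q3=70))           # no value lies outside the fences

# Hypothetical extra entry, checked against the same fences
print(outliers(data + [200], q1=20, q3=70))   # 200 lies above the upper fence
```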

FREQUENCY TABLES – GROUPED DATA

Consider the numerical data again:
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
The total number of entries is \( n = 11 \).

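An alternative presentation is the frequency table:
$
\begin{array}{c|ccccccc}
\text{Value } x_i & 10 & 20 & 30 & 40 & 50 & 70 & 80 \\ \hline
\text{Frequency } f_i & 1 & 3 & 2 & 1 & 1 & 2 & 1
\end{array}
$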

MEASURES OF CENTRAL TENDENCY

A) MEAN:
$
\text{mean} = \frac{1 \times 10 + 3 \times 20 + 2 \times 30 + 1 \times 40 + 1 \times 50 + 2 \times 70 + 1 \times 80}{11} = 40
$
In general:
$
\mu = \frac{\sum f_i x_i}{n}
$
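
The weighted-sum formula can be sketched directly from the values and their frequencies. The list names `values` and `freqs` are chosen for this illustration.

```python
values = [10, 20, 30, 40, 50, 70, 80]   # distinct entries x_i
freqs = [1, 3, 2, 1, 1, 2, 1]           # their frequencies f_i

n = sum(freqs)                                         # total number of entries
mean = sum(f * x for f, x in zip(freqs, values)) / n   # sum of f_i * x_i, divided by n

print(n, mean)
```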

B) MODE:
The entry with the highest frequency:
$
\text{mode} = 20
$

C) MEDIAN:
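The median is the entry in position \( \frac{n+1}{2} = 6 \). Using cumulative frequencies:
$
\begin{array}{c|ccccccc}
x_i & 10 & 20 & 30 & 40 & 50 & 70 & 80 \\ \hline
\text{Cumulative frequency} & 1 & 4 & 6 & 7 & 8 & 10 & 11
\end{array}
$
The cumulative frequency first reaches 6 at the value 30, so the 6th entry is 30: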
$
\text{median} = 30
$
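
Locating the median through cumulative frequencies can be sketched as below. This is a minimal illustration that assumes \( n \) is odd, as it is here; for even \( n \) the two middle positions would have to be averaged.

```python
from itertools import accumulate

values = [10, 20, 30, 40, 50, 70, 80]
freqs = [1, 3, 2, 1, 1, 2, 1]

cumulative = list(accumulate(freqs))   # running totals of the frequencies
n = cumulative[-1]
position = (n + 1) // 2                # 6th entry, since n = 11 is odd

# First value whose cumulative frequency reaches the required position
median = next(x for x, c in zip(values, cumulative) if c >= position)

print(cumulative, position, median)
```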

MEASURES OF SPREAD

A) STANDARD DEVIATION:
From the GDC: \( \sigma = 22.96 \).

B) RANGE:
$
\text{range} = 80 - 10 = 70
$

C) INTERQUARTILE RANGE (IQR):
Using cumulative frequencies:
\( Q_1 \): The median of the first 5 entries (position 3) = 20.
\( Q_3 \): The median of the last 5 entries (position 9) = 70.
Thus:
$
\text{IQR} = 70 - 20 = 50
$

USE OF GDC
For Casio CFX:
MENU → STAT → Enter data in List 1 and frequencies in List 2 → CALC → SET → 1VAR → Obtain statistics.

GROUPED DATA
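Suppose 100 students took an exam with scores from 1 to 60:
$
\begin{array}{c|cccccc}
\text{Score } x & 0 < x \leq 10 & 10 < x \leq 20 & 20 < x \leq 30 & 30 < x \leq 40 & 40 < x \leq 50 & 50 < x \leq 60 \\ \hline
\text{Frequency} & 8 & 12 & 10 & 25 & 35 & 10
\end{array}
$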
Mean and Standard Deviation:
Using midpoints:
$
\mu = \frac{8 \times 5 + 12 \times 15 + 10 \times 25 + 25 \times 35 + 35 \times 45 + 10 \times 55}{100} = 34.7
$
From the GDC:
$
\mu = 34.7, \quad \sigma = 14.31
$
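
The midpoint calculation can be reproduced with a short sketch. The midpoints and frequencies are taken from the sum above; the variable names are chosen for illustration.

```python
import math

midpoints = [5, 15, 25, 35, 45, 55]    # class midpoints used in the sum above
freqs = [8, 12, 10, 25, 35, 10]        # number of students in each class

n = sum(freqs)
mu = sum(f * m for f, m in zip(freqs, midpoints)) / n
variance = sum(f * (m - mu) ** 2 for f, m in zip(freqs, midpoints)) / n
sigma = math.sqrt(variance)

print(n, round(mu, 1), round(sigma, 2))
```

Because every score in a class is replaced by the class midpoint, these values are estimates rather than exact statistics of the raw scores.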

Modal Group:
The interval with the highest frequency:
$
40 < x \leq 50
$

Median and Quartiles:

 
Using the cumulative frequency diagram:
\( Q_1 \): 25th percentile ≈ 25
\( Q_2 \): 50th percentile ≈ 38
\( Q_3 \): 75th percentile ≈ 46
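
Reading a percentile off a cumulative frequency diagram can be approximated by linear interpolation within the class that contains it. The sketch below rests on two assumptions not stated explicitly in the notes: the classes are \( 0 < x \leq 10, \ldots, 50 < x \leq 60 \) (inferred from the midpoints and the modal group), and the diagram joins its points with straight lines. The function name `percentile_estimate` is hypothetical.

```python
def percentile_estimate(upper_bounds, freqs, p, lower_bound=0):
    """Estimate the p-th percentile by linear interpolation inside its class."""
    total = sum(freqs)
    target = p / 100 * total       # cumulative frequency to reach
    cumulative = 0                 # cumulative frequency at the left class boundary
    left = lower_bound
    for bound, f in zip(upper_bounds, freqs):
        if cumulative + f >= target:
            # fraction of the way through this class, scaled by the class width
            return left + (target - cumulative) / f * (bound - left)
        cumulative += f
        left = bound
    return upper_bounds[-1]


upper_bounds = [10, 20, 30, 40, 50, 60]   # assumed upper class boundaries
freqs = [8, 12, 10, 25, 35, 10]

q1 = percentile_estimate(upper_bounds, freqs, 25)
q2 = percentile_estimate(upper_bounds, freqs, 50)
q3 = percentile_estimate(upper_bounds, freqs, 75)
print(round(q1, 1), round(q2, 1), round(q3, 1))
```

Under these assumptions the estimates come out close to the values read from the diagram: about 25, 38 and 46.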

Box and Whisker Plot:

Outliers:
$
\text{IQR} = 46 - 25 = 21
$
$
Q_1 - 1.5 \times \text{IQR} = 25 - 1.5 \times 21 = -6.5
$
$
Q_3 + 1.5 \times \text{IQR} = 46 + 1.5 \times 21 = 77.5
$
No outliers exist.
