IB Mathematics AA Equation of the regression line Study Notes

These IB Mathematics AA notes give a clear explanation of the equation of the regression line, including the relevant formulas and rules, with exam-style questions as examples. The worked examples and common problem types provided here cover the topic of the equation of the regression line.

Equation of the Regression Line of \( x \) on \( y \)

The regression line of \( x \) on \( y \) is used to predict \( x \) values given \( y \). It has the form:

\( x = a y + b \)

How to calculate the coefficients:

The slope \( a \) is:

\( a = r \frac{\sigma_x}{\sigma_y} \)

where:

    • \( r \) is the correlation coefficient
    • \( \sigma_x \) is the standard deviation of \( x \)
    • \( \sigma_y \) is the standard deviation of \( y \)

The intercept \( b \) is:

\( b = \bar{x} - a \bar{y} \)

where \( \bar{x} \) and \( \bar{y} \) are the means of \( x \) and \( y \).
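The two formulas above translate directly into a short function. This is a minimal Python sketch: the function name `regression_x_on_y` and its parameter names are our own (hypothetical), not from any library. The usage lines reproduce the first example below.

```python
def regression_x_on_y(mean_x, mean_y, sd_x, sd_y, r):
    """Return (a, b) for the regression line x = a*y + b of x on y.

    Hypothetical helper for illustration; inputs are summary statistics.
    """
    if sd_y == 0:
        raise ValueError("sd_y must be non-zero")
    a = r * sd_x / sd_y          # slope: a = r * sigma_x / sigma_y
    b = mean_x - a * mean_y      # intercept: b = x_bar - a * y_bar
    return a, b


# Summary statistics from the first example
a, b = regression_x_on_y(mean_x=10, mean_y=5, sd_x=4, sd_y=2, r=0.75)
print(f"x = {a}y + {b}")   # x = 1.5y + 2.5
```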

This line minimizes the sum of the squared horizontal distances (errors in \( x \)) from the data points to the line, when \( x \) is plotted on the horizontal axis.
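The least-squares property can be checked numerically. The sketch below fits \( x = ay + b \) directly from raw paired data by minimizing the sum of squared errors in \( x \), which gives the slope \( S_{xy}/S_{yy} \) with \( S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \) and \( S_{yy} = \sum (y_i - \bar{y})^2 \), and compares it with \( r \frac{\sigma_x}{\sigma_y} \). The data values are hypothetical, and the helper names `fit_x_on_y` and `summary_stats` are illustrative. Population standard deviations (dividing by \( n \)) are used; dividing by \( n - 1 \) throughout gives the same slope because the factor cancels.

```python
from math import sqrt


def fit_x_on_y(xs, ys):
    """Least-squares line x = a*y + b, minimizing sum of (x_i - (a*y_i + b))**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    a = s_xy / s_yy               # slope that minimizes squared errors in x
    b = mean_x - a * mean_y       # line passes through (x_bar, y_bar)
    return a, b


def summary_stats(xs, ys):
    """Population means, standard deviations and Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)
    return mean_x, mean_y, sd_x, sd_y, r


# Hypothetical paired data, for illustration only
xs = [2, 4, 5, 7, 9]
ys = [1, 2, 2, 3, 5]

a, b = fit_x_on_y(xs, ys)
mean_x, mean_y, sd_x, sd_y, r = summary_stats(xs, ys)
print(a, r * sd_x / sd_y)                        # slopes agree up to rounding
print(b, mean_x - (r * sd_x / sd_y) * mean_y)    # intercepts agree too
```

Perturbing \( a \) or \( b \) away from these values increases the sum of squared errors in \( x \), which is what "least squares" means here.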

Example:

The variables \( x \) and \( y \) have means \( \bar{x} = 10 \), \( \bar{y} = 5 \), standard deviations \( \sigma_x = 4 \), \( \sigma_y = 2 \), and correlation coefficient \( r = 0.75 \). Find the regression line of \( x \) on \( y \).

Answer/Explanation:

\( a = 0.75 \times \frac{4}{2} = 0.75 \times 2 = 1.5 \)

\( b = 10 - 1.5 \times 5 = 10 - 7.5 = 2.5 \)

Final regression line:

\( x = 1.5 y + 2.5 \)

Example:

In a biology experiment, the lengths \( x \) (in cm) and body masses \( y \) (in grams) of insects of one species are recorded. The summary statistics are:

  • \( \bar{x} = 8.5 \)
  • \( \bar{y} = 2.0 \)
  • \( \sigma_x = 1.2 \)
  • \( \sigma_y = 0.5 \)
  • \( r = 0.80 \)

Find the equation of the regression line of \( x \) on \( y \), and use it to predict the length \( x \) when \( y = 2.5 \).

Answer/Explanation:

\( a = r \frac{\sigma_x}{\sigma_y} = 0.80 \times \frac{1.2}{0.5} = 0.80 \times 2.4 = 1.92 \)

\( b = \bar{x} - a \bar{y} = 8.5 - 1.92 \times 2.0 = 8.5 - 3.84 = 4.66 \)

\( x = 1.92 y + 4.66 \)

Predict \( x \) when \( y = 2.5 \)

\( x = 1.92 \times 2.5 + 4.66 = 4.8 + 4.66 = 9.46 \)

Conclusion: The predicted length is approximately 9.46 cm when the mass is 2.5 g.
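The same calculation can be reproduced in a few lines of Python as a check on the arithmetic. This is a sketch using only the summary statistics given in the example; the variable names are our own.

```python
# Summary statistics from the insect example
mean_x, mean_y = 8.5, 2.0     # length (cm), mass (g)
sd_x, sd_y = 1.2, 0.5
r = 0.80

a = r * sd_x / sd_y            # slope of the regression line of x on y
b = mean_x - a * mean_y        # intercept
x_pred = a * 2.5 + b           # predicted length when y = 2.5 g

print(f"x = {a:.2f}y + {b:.2f}")              # x = 1.92y + 4.66
print(f"Predicted length: {x_pred:.2f} cm")   # Predicted length: 9.46 cm
```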

Use of the Equation for Prediction Purposes

Once the regression equation is found (either \( y \) on \( x \) or \( x \) on \( y \)), it can be used to predict the dependent variable from a given value of the independent variable.

  • For the regression of \( y \) on \( x \): use it to predict \( y \) from \( x \).
  • For the regression of \( x \) on \( y \): use it to predict \( x \) from \( y \).

Important caution: Students should be aware that they cannot reliably use an \( x \) on \( y \) regression line to predict \( y \) from \( x \), because the line is derived to minimize errors in predicting \( x \), not \( y \).
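To see why the two lines differ, write both in terms of the means. This derivation assumes the standard slope \( r \frac{\sigma_y}{\sigma_x} \) for the regression line of \( y \) on \( x \) (the counterpart of the formula for \( a \) given earlier) and \( r \neq 0 \).

Regression line of \( y \) on \( x \), rearranged to make \( x \) the subject:

\( y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \quad \Rightarrow \quad x - \bar{x} = \frac{1}{r} \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \)

Regression line of \( x \) on \( y \) (from \( x = a y + b \) with \( b = \bar{x} - a \bar{y} \)):

\( x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \)

Both lines pass through \( (\bar{x}, \bar{y}) \), but their slopes agree only when \( \frac{1}{r} = r \), that is \( r = \pm 1 \). With the values from the first example (\( r = 0.75 \), \( \sigma_x = 4 \), \( \sigma_y = 2 \)), the rearranged line has slope \( \frac{1}{0.75} \times 2 = \frac{8}{3} \approx 2.67 \), compared with \( 1.5 \) for the regression of \( x \) on \( y \).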

Also, predictions outside the range of the data (extrapolation) can be unreliable because the linear relationship may not hold beyond the observed values.
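A simple safeguard is to check whether an input value lies inside the observed range before making a prediction. The helper `is_extrapolation` below is a hypothetical illustration, and the list of revision times is made-up data, not taken from the study in the next example.

```python
def is_extrapolation(value, observed):
    """Return True if value lies outside the range of the observed data (hypothetical helper)."""
    return value < min(observed) or value > max(observed)


# Hypothetical revision times (hours), for illustration only
hours = [2, 3, 5, 6, 8, 10]

for h in (8, 20):
    note = "extrapolation: treat with caution" if is_extrapolation(h, hours) else "within data range"
    print(h, "->", note)
# 8 -> within data range
# 20 -> extrapolation: treat with caution
```

A value inside the range can still be a poor prediction if the relationship is weak; the check only flags the extrapolation risk described above.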

Example:

A study relates students’ revision time \( x \) (in hours) to their exam grade \( y \). The regression line of \( y \) on \( x \) is found to be:

\( y = 5x + 40 \)

Predict the grade for a student who revised 8 hours. Discuss whether it is appropriate to use this line to predict revision time from a grade.

Answer/Explanation:

Predict the grade

\( y = 5 \times 8 + 40 = 40 + 40 = 80 \)

The predicted grade is 80.

The given regression is \( y \) on \( x \), suitable for predicting \( y \) (grade) from \( x \) (hours). It is not appropriate to use this line to predict revision hours from a grade, because the line minimizes errors in \( y \), not \( x \).

For predicting revision hours from grade, the regression line of \( x \) on \( y \) should be used.
