IB Mathematics AI AHL Non-linear regression MAI Study Notes - New Syllabus
LEARNING OBJECTIVE
- Non-linear regression.
Key Concepts:
- Evaluation of least squares regression curves
- Sum of square residuals
- The coefficient of determination
NON-LINEAR REGRESSION
Non-linear regression models describe relationships between variables that cannot be captured with a straight line. These models are especially useful in AI and data science where real-world data often exhibits non-linear behavior.
Types of Non-Linear Regression Models
Quadratic Regression
Formula: $y = ax^2 + bx + c$
Used when data forms a parabolic curve, such as in projectile motion.
Cubic Regression
Formula: $y = ax^3 + bx^2 + cx + d$
Useful when data has more than one curve or inflection point.
Exponential Regression
Formula: $y = ab^x$ or $y = ae^{bx}$
Appropriate for growth or decay processes, such as population growth or bacterial growth.
Power Regression
Formula: $y = ax^b$
Used when one variable changes at a rate proportional to a power of another, often found in physics or engineering.
Sine Regression
Formula: $y = a\sin(bx + c) + d$
Best for periodic or oscillatory data, such as seasonal trends or sound waves.
Example Applications
Exponential Model: Predicting population growth.
Sine Model: Modeling temperature cycles or light intensity.
Selecting the Appropriate Model
The choice of model depends on data patterns:
- Quadratic: One curve, either concave up or down.
- Cubic: Multiple bends or inflection points.
- Exponential: Rapid growth or decay.
- Power: Steady increase/decrease at a variable rate.
- Sine: Repetitive, periodic trends.
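As an illustrative sketch of fitting one of these model types (using hypothetical data invented for this example, not taken from the notes), a quadratic model $y = ax^2 + bx + c$ can be fitted by least squares with NumPy's `polyfit`:

```python
import numpy as np

# Hypothetical parabolic data, generated from y = 2x^2 + 1 (assumed for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 + 1

# Fit y = ax^2 + bx + c by least squares; polyfit returns coefficients
# in order of decreasing degree: [a, b, c]
a, b, c = np.polyfit(x, y, deg=2)
print(round(a, 3), round(b, 3), round(c, 3))
```

Because the data here lie exactly on a parabola, the fitted coefficients recover $a = 2$, $b = 0$, $c = 1$ up to floating-point precision; with real data they would minimize the sum of squared residuals instead.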
Least Squares Method – Linear Regression
The least squares method is used to find the line of best fit (regression line) for a set of data. This line minimizes the sum of the squares of the vertical distances (errors) between the observed values and the values predicted by the line.
Line of best fit: \( y = mx + c \)
Formulas:
- Slope: \( m = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \)
- Intercept: \( c = \dfrac{\sum y - m\sum x}{n} \)
Example
Given the following data, find the least squares regression line:
| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
| 4 | 4 |
| 5 | 6 |
▶️ Answer/Explanation
\[ \sum x = 1+2+3+4+5 = 15,\quad \sum y = 2+3+5+4+6 = 20 \]
\[ \sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 55,\quad \sum xy = (1)(2) + (2)(3) + (3)(5) + (4)(4) + (5)(6) = 69 \]
Number of data points: \( n = 5 \)
\[ m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{5(69) - (15)(20)}{5(55) - 15^2} = \frac{345 - 300}{275 - 225} = \frac{45}{50} = 0.9 \]
\[ c = \frac{\sum y - m\sum x}{n} = \frac{20 - 0.9(15)}{5} = \frac{6.5}{5} = 1.3 \]
The least squares regression line is: \[ \boxed{y = 0.9x + 1.3} \]
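The worked example above can be checked with a short script; this sketch applies the slope and intercept formulas directly to the same five data points:

```python
import numpy as np

# Data from the worked example above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

n = len(x)
# Slope: m = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
# Intercept: c = (sum(y) - m*sum(x)) / n
c = (np.sum(y) - m * np.sum(x)) / n
print(round(m, 2), round(c, 2))  # prints: 0.9 1.3
```

The same result can be obtained with `np.polyfit(x, y, deg=1)`, which uses the least squares criterion internally.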
In the Context of AI
In AI and machine learning, regression techniques help model relationships and make predictions. Non-linear regression is essential for capturing complex patterns that linear models miss.
Exam Tip
Use graphing calculators or software like Desmos or Excel to fit different regression models and assess which best suits the data. Visual pattern recognition is key.
SUM OF SQUARE RESIDUALS (\(\text{SS}_{\text{res}}\))
The Sum of Square Residuals (\(\text{SS}_{\text{res}}\)) measures the total deviation of the observed values from the predicted values by the regression model. It is used to assess the fit of the model.
The formula for \(\text{SS}_{\text{res}}\) is:
$ \text{SS}_{\text{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $
Where:
- \(y_i\) is the observed value
- \(\hat{y}_i\) is the predicted value from the regression model
A smaller \(\text{SS}_{\text{res}}\) indicates a better fit of the model to the data, meaning the predictions are closer to the observed values.
Example
The observed and predicted values for a regression model are given below. Find the sum of squared residuals.
- At \( x = 1 \), observed \( y = 2 \), predicted \( \hat{y} = 2.1 \)
- At \( x = 2 \), observed \( y = 3 \), predicted \( \hat{y} = 2.8 \)
- At \( x = 3 \), observed \( y = 5 \), predicted \( \hat{y} = 4.9 \)
▶️ Answer/Explanation
Use the formula for the sum of squared residuals:
$ \text{SS}_{\text{res}} = \sum (y_i - \hat{y}_i)^2 = (2 - 2.1)^2 + (3 - 2.8)^2 + (5 - 4.9)^2 = 0.01 + 0.04 + 0.01 = 0.06 $
Conclusion: The total sum of squared residuals is \( \text{SS}_{\text{res}} = 0.06 \), indicating that the model predictions are close to the observed values.
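The same calculation can be sketched in Python, using the observed and predicted values from the example above:

```python
import numpy as np

# Observed and predicted values from the example above
y_obs = np.array([2.0, 3.0, 5.0])
y_hat = np.array([2.1, 2.8, 4.9])

# Sum of squared residuals: sum of (observed - predicted)^2
ss_res = np.sum((y_obs - y_hat) ** 2)
print(round(ss_res, 2))  # prints: 0.06
```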
COEFFICIENT OF DETERMINATION (R²)
The coefficient of determination (R²) is a statistical measure that explains how well the regression model accounts for the variance in the dependent variable.
The formula for \(R^2\) is:
$R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}}$
Where:
- \(\text{SS}_{\text{res}}\) is the sum of squared residuals (as defined above)
- \(\text{SS}_{\text{tot}}\) is the total sum of squares, defined as:
$\text{SS}_{\text{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Where:
- \(y_i\) is the observed value
- \(\bar{y}\) is the mean of the observed values
Note: \(R^2\) values range from 0 to 1. An \(R^2\) of 1 indicates a perfect fit (\(\text{SS}_{\text{res}} = 0\)), while an \(R^2\) of 0 means that the model does not explain any of the variability in the data.
Interpreting \(R^2\)
An \(R^2\) of 0.95 means that 95% of the variance in the dependent variable can be predicted from the independent variable(s). For linear models, \(R^2\) is the square of Pearson’s correlation coefficient.
Note:
It’s a common mistake to believe that a higher \(R^2\) automatically means a better model. This isn’t necessarily true, particularly when the models being compared differ in complexity or type.
Limitations of \(R^2\)
While \(R^2\) is helpful, it has limitations:
- \(R^2\) never decreases when more variables are added to the model, even if they are irrelevant.
- It does not indicate whether the coefficients are biased.
- It does not assess the appropriateness of the model type.
Model Evaluation Beyond \(R^2\)
To fully evaluate a regression model, consider:
- Residual plots: Look for patterns that might indicate a poor fit.
- Domain knowledge: Does the model make sense given the problem context?
- Simplicity: Prefer simpler models when performance is comparable (Occam’s Razor).
- Predictive power: Test the model on new data to assess its generalizability.
Example
Consider data on bacterial colony growth over time:
- Linear Model: \( y = 2x + 5, \, R^2 = 0.85 \)
- Exponential Model: \( y = 5e^{0.2x}, \, R^2 = 0.98 \)
While the exponential model has a higher \(R^2\), we should evaluate whether exponential growth is appropriate for the biological process (e.g., bacterial growth). An overly complex model might lead to overfitting.
Example
A regression model is used to predict values of \( y \). The total sum of squares is \( \text{SS}_{\text{tot}} = 5.00 \), and the sum of squared residuals is \( \text{SS}_{\text{res}} = 0.75 \). Find the coefficient of determination.
▶️ Answer/Explanation
The coefficient of determination is given by the formula:
$ R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}} $
$ R^2 = 1 - \frac{0.75}{5.00} = 1 - 0.15 = \boxed{0.85} $
Interpretation: The model explains 85% of the variation in the dependent variable.
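In code this is a one-line computation; a minimal sketch using the values from the example above:

```python
# Values from the example above
ss_res = 0.75
ss_tot = 5.00

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 2))  # prints: 0.85
```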
TECHNOLOGY IN NON-LINEAR REGRESSION
Modern tools like R, Python, and statistical software can efficiently perform non-linear regression:
- Automated model fitting for different functions
- Calculation of \(R^2\) and other fit measures
- Visualization of data and fitted models
- Residual analysis
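As one sketch of such automated fitting (with hypothetical data invented for this example), an exponential model \( y = ae^{bx} \) can be fitted by log-linearizing, \( \ln y = \ln a + bx \), and then applying ordinary least squares with NumPy:

```python
import numpy as np

# Hypothetical data generated from y = 5 e^{0.2x} (assumed for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 5 * np.exp(0.2 * x)

# Log-linearize: ln y = ln a + b x, then fit a straight line by least squares
b, ln_a = np.polyfit(x, np.log(y), deg=1)
a = np.exp(ln_a)
print(round(a, 3), round(b, 3))  # prints: 5.0 0.2
```

Note that log-linearization minimizes squared errors in \(\ln y\) rather than in \(y\) itself; dedicated non-linear fitters (e.g. SciPy's `curve_fit`) minimize residuals on the original scale and can give slightly different parameters on noisy data.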
Exam Technique
When determining the regression equation:
- Plot the Data: Visualize the relationship with a scatter plot.
- Choose a Model: Based on the trend, decide on the regression type.
- Use Least Squares: Fit the curve by minimizing \(\text{SS}_{\text{res}}\).
- Evaluate Fit: Compute \(R^2\) for accuracy.
- Validate Predictions: Check residual plots for patterns.
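The steps above can be sketched end-to-end in Python (with hypothetical, roughly quadratic data assumed for illustration):

```python
import numpy as np

# Step 1: hypothetical data (assumed for illustration), roughly y = x^2 with noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 4.1, 8.9, 16.2, 24.8, 36.1])

# Steps 2-3: choose a quadratic model and fit it by least squares
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)

# Step 4: evaluate the fit with R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2 = 1 - ss_res / ss_tot

# Step 5: inspect residuals; systematic patterns would suggest a poor model choice
residuals = y - y_hat
print(round(r2, 4))
print(np.round(residuals, 2))
```

Because the data follow a quadratic trend closely, \(R^2\) here is very close to 1 and the residuals show no obvious pattern.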
Tip: Familiarize yourself with at least one statistical software or programming language to perform non-linear regression analysis effectively.
Understanding Overfitting in Non-Linear Regression
Overfitting occurs when the model becomes too complex, fitting the noise in the data instead of the actual trend. This results in high \(R^2\) values for training data but poor predictive performance on new data.
Signs of overfitting:
- The model includes unnecessary higher-degree terms.
- The regression curve passes through almost all data points but fails to generalize.
- Residual plots show erratic or systematic patterns instead of random scatter.
To prevent overfitting:
- Use cross-validation to test the model on unseen data.
- Prefer simpler models when performance is similar (Occam’s Razor).
- Consider the context: Real-world relationships are often simpler than overly complex models.
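A minimal sketch of overfitting (with synthetic data assumed for illustration): fitting a degree-7 polynomial to 8 noisy, roughly linear training points interpolates them exactly, so the training \(R^2\) is essentially 1, yet the curve typically predicts points in between worse than a simple straight line does:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy linear data: y = 2x + 1 plus noise (assumed for illustration)
x_train = np.linspace(0, 5, 8)
y_train = 2 * x_train + 1 + rng.normal(0, 0.5, size=x_train.size)
# Unseen test points lying between the training points
x_test = np.linspace(0.25, 4.75, 8)
y_test = 2 * x_test + 1 + rng.normal(0, 0.5, size=x_test.size)

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Compare a simple line (degree 1) against a degree-7 interpolating polynomial
for deg in (1, 7):
    coeffs = np.polyfit(x_train, y_train, deg)
    train_r2 = r_squared(y_train, np.polyval(coeffs, x_train))
    test_r2 = r_squared(y_test, np.polyval(coeffs, x_test))
    print(deg, round(train_r2, 3), round(test_r2, 3))
```

The degree-7 model "wins" on the training data by fitting the noise, which is exactly the behavior cross-validation is designed to expose.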