AP Statistics 2.8 Least Squares Regression Study Notes

LEARNING OBJECTIVE

  • Regression models may allow us to predict responses to changes in an explanatory variable.

Key Concepts:

  • Estimating Parameters for the Least-Squares Regression Line
  • The Coefficient of Determination (\( r^2 \))

Estimating Parameters for the Least-Squares Regression Line

The least-squares regression line is the line that minimizes the sum of squared residuals, where a residual is the observed response minus the predicted response, \( y - \hat{y} \).

The model is written as:

\( \displaystyle \hat{y} = a + b x \)

  • \( \hat{y} \): predicted response
  • \( a \): intercept
  • \( b \): slope

Formulas for Estimating Parameters:

\( \displaystyle b = r \cdot \dfrac{s_y}{s_x}, \quad a = \bar{y} - b\bar{x} \)

  • \(r\): correlation between \(x\) and \(y\)
  • \(s_x, s_y\): standard deviations of \(x\) and \(y\)
  • \(\bar{x}, \bar{y}\): means of \(x\) and \(y\)
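
The formulas above can be applied to raw data as in the following minimal sketch. The data values and the function name `least_squares_line` are hypothetical, invented only for illustration; the sketch assumes sample standard deviations and a sample correlation, both using \(n - 1\) in the denominator.

```python
import statistics

# Hypothetical data, invented for illustration only
hours = [2, 3, 5, 6, 9]
scores = [55, 62, 68, 75, 90]

def least_squares_line(x, y):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x and a = y-bar - b * x-bar."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample standard deviations
    # Sample correlation: sum of products of z-scores, divided by n - 1
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
            for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

a, b = least_squares_line(hours, scores)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

Because \( a = \bar{y} - b\bar{x} \), the fitted line always passes through the point \( (\bar{x}, \bar{y}) \).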

Interpreting Coefficients:

  • Slope \(b\): The predicted change in the response variable \(y\) for a one-unit increase in the explanatory variable \(x\).
  • Intercept \(a\): The predicted value of \(y\) when \(x=0\). This may or may not be meaningful, depending on the context.

Cautions:

  • Interpret slope only within the observed range of data.
  • Do not extrapolate predictions for \(x\)-values far outside the data.
  • The regression line summarizes association, not causation.
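
One way to make the extrapolation caution concrete is to flag any prediction whose \(x\)-value falls outside the observed range. The sketch below is only an illustration: the function name `predict_with_range_check`, the coefficients, and the observed range are all hypothetical.

```python
def predict_with_range_check(x, a, b, x_min, x_max):
    """Return (predicted y, True if x lies outside the observed x-range)."""
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical line and observed x-range, chosen only for illustration
a, b = 10.0, 2.0
x_min, x_max = 1.0, 8.0

inside = predict_with_range_check(4.0, a, b, x_min, x_max)
outside = predict_with_range_check(20.0, a, b, x_min, x_max)
print(inside)   # prediction made within the observed range
print(outside)  # prediction flagged as an extrapolation
```

The flagged prediction is still computed, but it should be treated with suspicion because the data give no evidence that the linear pattern continues that far.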

Example

A dataset records students’ hours studied (\(x\)) and exam scores (\(y\)).

Summary statistics:

  • \( \bar{x} = 5,\; s_x = 2 \)
  • \( \bar{y} = 70,\; s_y = 10 \)
  • Correlation \( r = 0.8 \)

Find the least-squares regression line equation.

▶️ Answer / Explanation

\( b = r \cdot \dfrac{s_y}{s_x} = 0.8 \cdot \dfrac{10}{2} = 4.0 \).

\( a = \bar{y} - b\bar{x} = 70 - 4(5) = 50 \).

Regression Equation: \( \hat{y} = 50 + 4x \).
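
The same calculation can be checked with a short sketch that uses only the summary statistics given above. The function name `line_from_summary` is hypothetical.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

# Summary statistics from the example
a, b = line_from_summary(x_bar=5, s_x=2, y_bar=70, s_y=10, r=0.8)
print(f"y-hat = {a:.1f} + {b:.1f}x")

# The line passes through (x-bar, y-bar): plugging in x = 5 returns 70
print(a + b * 5)
```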

Example

Suppose the regression line is \( \hat{y} = 50 + 4x \), where \(x\) = hours studied and \(y\) = exam score.

Interpret the slope and intercept in context.

▶️ Answer / Explanation

Slope (4): For each additional hour of study, the model predicts an average increase of 4 exam points.

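Intercept (50): When \(x=0\) (a student does not study at all), the model predicts an exam score of 50. This interpretation is meaningful in context, since studying 0 hours is possible. It is trustworthy only if the data include students who studied close to 0 hours; otherwise it is an extrapolation.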

The Coefficient of Determination (\( r^2 \))

In simple linear regression, \( r^2 \) is the square of the correlation coefficient \( r \). It is called the coefficient of determination.

Formula:

\( \displaystyle r^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}} \)

Interpretation:

\( r^2 \) represents the proportion of variation in the response variable (\(y\)) that is explained by the explanatory variable (\(x\)) in the regression model.

  • Values of \( r^2 \) range from 0 to 1:
    • \( r^2 = 0 \): The model explains none of the variation.
    • \( r^2 = 1 \): The model explains all of the variation.
  • Intermediate values: The higher \( r^2 \) is, the larger the share of the variation in \(y\) that the linear model accounts for.
  • \( r^2 \) alone does not indicate whether the model is appropriate (check residual plots).
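
The ratio of explained to total variation can be computed directly from sums of squares, as in the sketch below. The data are hypothetical, invented for illustration; the slope is computed as \( S_{xy} / S_{xx} \), which is algebraically equivalent to \( r \cdot s_y / s_x \).

```python
# Hypothetical data, invented for illustration only
hours = [2, 3, 5, 6, 9]
scores = [55, 62, 68, 75, 90]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Sums of squared deviations and cross-products
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in scores)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))

# Least-squares line y-hat = a + b*x
b = s_xy / s_xx
a = y_bar - b * x_bar
predicted = [a + b * x for x in hours]

total_variation = s_yy                                         # variation of y about its mean
explained_variation = sum((p - y_bar) ** 2 for p in predicted) # variation captured by the line
unexplained_variation = sum((y - p) ** 2 for y, p in zip(scores, predicted))  # squared residuals

r_squared = explained_variation / total_variation
r = s_xy / (s_xx * s_yy) ** 0.5  # correlation coefficient

print(round(r_squared, 4), round(r ** 2, 4))
```

The two printed values agree, which illustrates that the ratio of explained to total variation equals the square of the correlation. The same value is obtained as 1 minus the ratio of unexplained to total variation.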

Example 

Suppose the correlation between hours studied (\(x\)) and exam scores (\(y\)) is \( r = 0.8 \).

What does \( r^2 \) tell us in this context?

▶️ Answer / Explanation

Step 1: Compute \( r^2 \):

\( r^2 = (0.8)^2 = 0.64 \).

Step 2: Interpret:

About 64% of the variation in exam scores can be explained by a linear relationship with hours studied. The remaining 36% of the variation is due to other factors (or random variation).

Note: A high \( r^2 \) does not prove causation. Other variables could also influence exam scores.