AP Statistics - Unit 9: Inference for Quantitative Data: Slopes Summary Notes

Inference for Quantitative Data: Slopes
Finally, 2-5\% of the questions on your AP Statistics exam will cover the topic Inference for Quantitative Data: Slopes.

Recall that the model \(\hat{y}=a+b x\) describes a linear relationship between the two variables \(x\) and \(y\). The values of \(a\) and \(b\) are calculated based on a sample, so they are statistics. The parameter that \(b\) estimates is the true value of the slope when the entire population is considered and is denoted \(\beta\). Just like we construct confidence intervals and carry out hypothesis tests for other parameters (such as proportions and means), we can also do this for \(\beta\).
There are several conditions that must be checked in these situations:

  •  \(x\) and \(y\) should have a linear relationship. This can be checked using residual plots, as shown in an earlier section.
  • \(\sigma_y\) should be the same for all \(x\). That is, the standard deviation of all the \(y\)-values associated with any given \(x\)-value should be the same regardless of the \(x\)-value. This is difficult to check in practice, but again, a residual plot can be useful.
  •  Data should come from a random sample.
  •  The \(y\)-values associated with any given \(x\) should be approximately normal.
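The linearity and equal-spread conditions above are usually judged from the residuals. The following is a minimal sketch, assuming made-up data with repeated \(x\)-values; the helper name `residuals_by_x` and the numbers are hypothetical and only illustrate the idea. It fits the least-squares line, groups the residuals by \(x\)-value, and prints the mean and standard deviation of each group so they can be compared by eye.

```python
import statistics
from collections import defaultdict


def residuals_by_x(x, y):
    """Fit the least-squares line y-hat = a + b*x and group residuals by x-value."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi - (a + b * xi))  # residual = observed - predicted
    return dict(groups)


# Hypothetical data: three observations at each of three x-values.
x = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [1.0, 1.4, 0.9, 3.1, 2.8, 3.3, 5.2, 4.7, 5.0]

groups = residuals_by_x(x, y)
for x_value, res in sorted(groups.items()):
    # Group means near 0 support linearity; similar SDs support equal spread.
    print(x_value, round(statistics.mean(res), 3), round(statistics.stdev(res), 3))
```

This is only a rough numerical companion to a residual plot, not a formal test of the conditions.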

Confidence Intervals for Slopes
If the response variable \(y\) has standard deviation \(\sigma_y\) about the true regression line, then \(\sigma_y\) can be estimated by the standard deviation of the residuals, \(s=\sqrt{\frac{1}{n-2} \sum\left(y_i-\hat{y}_i\right)^2}\). The standard error of the slope \(b\) is then \(S E_b=\frac{s}{s_x \sqrt{n-1}}\), where \(s_x\) is the sample standard deviation of the \(x\)-values. The critical value \(t^*\) comes from a \(t\)-distribution with \(d f=n-2\). The point estimate, of course, is the slope \(b\) calculated from the sample.

As usual, the margin of error is \(M E=t^* \cdot S E_b\), and the confidence interval is \((b-M E, b+M E)\).
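The formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the data are invented, the function name `slope_confidence_interval` is hypothetical, and \(t^*\) is supplied by hand from a \(t\)-table (here 3.182 for 95% confidence with \(d f=3\)).

```python
import math


def slope_confidence_interval(x, y, t_star):
    """Return (b, SE_b, (lower, upper)) for the least-squares slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # sample slope
    a = y_bar - b * x_bar              # sample intercept
    # s: standard deviation of the residuals, with n - 2 degrees of freedom
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    s_x = math.sqrt(sxx / (n - 1))     # sample standard deviation of x
    se_b = s / (s_x * math.sqrt(n - 1))
    me = t_star * se_b                 # margin of error
    return b, se_b, (b - me, b + me)


# Hypothetical data with n = 5, so df = n - 2 = 3.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_star = 3.182  # from a t-table: 95% confidence, df = 3

b, se_b, (lower, upper) = slope_confidence_interval(x, y, t_star)
print(f"b = {b:.3f}, SE_b = {se_b:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For these made-up numbers the interval lies entirely above 0, so the sample gives evidence of a positive slope.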

Free Response Tip
If a confidence interval for the slope contains both positive and negative values, then in particular it contains 0. This means that, at the specified confidence level, a slope of 0 is plausible, so the data do not give convincing evidence of a linear relationship between \(x\) and \(y\). If this happens, be sure to mention it in your conclusion.

Hypothesis Tests for Slopes
The null hypothesis for a test of slope is \(H_0: \beta=\beta_0\), where \(\beta_0\) is the hypothesized slope. The alternative hypothesis is one of \(H_a: \beta<\beta_0\), \(H_a: \beta>\beta_0\), or \(H_a: \beta \neq \beta_0\). Commonly, the null hypothesis will be \(H_0: \beta=0\). If this is rejected, there is convincing evidence of a linear relationship between \(x\) and \(y\).

The test statistic is \(t=\frac{b-\beta_0}{S E_b}\), where \(S E_b=\frac{s}{s_x \sqrt{n-1}}\) as described previously. When the null hypothesis is true, this statistic follows a \(t\)-distribution with \(d f=n-2\). A \(p\)-value is obtained from that \(t\)-distribution as usual and is compared with the chosen \(\alpha\).
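A minimal sketch of this test, assuming hypothetical summary values (\(b=0.8\), \(S E_b=0.3\), \(n=12\)) and using only the Python standard library. Because the standard library has no \(t\)-distribution, the tail area is approximated by integrating the \(t\) density with Simpson's rule; the helper names `t_pdf`, `t_upper_tail`, and `slope_t_test` are illustrative, not part of any library.

```python
import math


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # the density is symmetric, so P(T > 0) = 0.5


def slope_t_test(b, se_b, n, beta_0=0.0):
    """Two-sided t test of H0: beta = beta_0 against Ha: beta != beta_0."""
    df = n - 2
    t_stat = (b - beta_0) / se_b
    p_value = 2 * t_upper_tail(abs(t_stat), df)
    return t_stat, df, p_value


# Hypothetical summary statistics, not taken from any data set in these notes.
t_stat, df, p_value = slope_t_test(b=0.8, se_b=0.3, n=12)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p-value = {p_value:.4f}")
```

For a one-sided alternative, the tail area would be used once rather than doubled, taking the direction of the alternative into account.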

Suggested Reading

  • Starnes \& Tabor. The Practice of Statistics. \(6^{\text {th }}\) edition. Chapter 12. New York, NY: Macmillan.
  •  Larson \& Farber. Elementary Statistics: Picturing the World. \(7^{\text {th }}\) edition. Chapter 9. New York, NY: Pearson.
  •  Bock, Velleman, De Veaux, \& Bullard. Stats: Modeling the World. \(5^{\text {th }}\) edition. Chapter 26. New York, NY: Pearson.
  • Sullivan. Statistics: Informed Decisions Using Data. \(5^{\text {th }}\) edition. Chapter 14. New York, NY: Pearson.
  •  Peck, Short, \& Olsen. Introduction to Statistics and Data Analysis. \(6^{\text {th }}\) edition. Chapter 13. Boston, MA: Cengage Learning.

Sample Inference for Quantitative Data: Slopes Questions
Use the information below to answer the two questions that follow.

Growth hormones are often used to increase the weight gain of turkeys. In an experiment involving 25 turkeys, five different doses of growth hormone (0, 0.25, 0.5, 0.75, and 1.0 mg/kg) were injected into turkeys (five for each dose) and the subsequent weight gain recorded. A linear relationship appears to hold for the data. The following output regarding the regression line was obtained:

The best fit regression line and 95\% confidence interval for its slope are given by which of the following?
A. Regression Line: \(y=5.3180+4.2153 x\)
95\% CI for slope: \(4.2153 \pm 1.714(1.2210)\)
B. Regression Line: \(y=5.3180 x+4.2153\)
95\% CI for slope: \(5.3180 \pm 2.069(1.634)\)
C. Regression Line: \(y=5.3180 x\)
95\% CI for slope: \(5.3180 \pm 1.96(1.634)\)
D. Regression Line: \(y=5.3180+4.2153 x\)
95\% CI for slope: \(4.2153 \pm 2.069(1.2210)\)
E. Regression Line: \(y=5.3180 x+4.2153\)
95\% CI for slope: \(5.3180 \pm 1.714(1.6340)\)

Answer/Explanation

Explanation:
The correct answer is B. You correctly identified the slope and intercept for the regression line from the output, and the \(95\%\) CI for the slope has the form slope \(\pm\ t_{0.05/2} \cdot S E_{\text{slope}}\). Here, the cut-off \(t_{0.05/2}\) with \(\mathrm{df}=23\) is 2.069 (from the \(t\)-table). Choice A is incorrect because you interchanged the slope and intercept in the equation of the regression line, and you used the cut-off \(t_{0.05}\) with \(\mathrm{df}=23\) (which is 1.714) instead of \(t_{0.05/2}\) with \(\mathrm{df}=23\) when constructing the CI. Choice C is incorrect because you forgot to include the intercept in the equation of the regression line. A less critical error, but an error nonetheless, is that you used the cut-off \(z_{0.05/2}=1.96\) when constructing the CI instead of \(t_{0.05/2}\) with \(\mathrm{df}=23\), which is 2.069. Choice D is incorrect because you interchanged the slope and intercept in the equation of the regression line. Choice E is incorrect because you used the cut-off \(t_{0.05}\) with \(\mathrm{df}=23\) instead of \(t_{0.05/2}\) with \(\mathrm{df}=23\) (which is 2.069 from the \(t\)-table) when constructing the CI.
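As a quick arithmetic check, the interval in choice B can be evaluated directly. The sketch below only plugs in the three numbers printed in that choice (slope 5.3180, standard error 1.634, and \(t^*=2.069\)); the variable names are illustrative.

```python
# Values as printed in choice B of the question above.
slope = 5.3180
se_slope = 1.634
t_star = 2.069  # t-table cut-off for 95% confidence, df = 23

margin = t_star * se_slope
ci = (slope - margin, slope + margin)
contains_zero = ci[0] <= 0 <= ci[1]

print(f"margin of error = {margin:.4f}")
print(f"95% CI for slope: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"interval contains 0: {contains_zero}")
```

The interval works out to roughly (1.94, 8.70). It does not contain 0, so by the free response tip earlier in these notes it points to a positive linear relationship.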

Which of the following lists appropriate null and alternative hypotheses to test the slope, the test statistic, and \(p\)-value?
A. Hypotheses: \(\mathrm{H}_0: \beta_1=0\) versus \(\mathrm{H}_{\mathrm{A}}: \beta_1 \neq 0\)
Test Statistic: \(\mathrm{T}=3.18\)
\(p\)-value: \(0.20<p<0.50\)
B. Hypotheses: \(\mathrm{H}_0: \beta_0=0\) versus \(\mathrm{H}_{\mathrm{A}}: \beta_0 \neq 0\)
Test Statistic: \(\mathrm{T}=3.18\)
\(p\)-value: \(0.20<p<0.50\)
C. Hypotheses: \(\mathrm{H}_0: \beta_1 \neq 0\) versus \(\mathrm{H}_{\mathrm{A}}: \beta_1=0\)
Test Statistic: \(\mathrm{T}=3.18\)
\(p\)-value: \(0.20<p<0.50\)
D. Hypotheses: \(\mathrm{H}_0: \beta_0=0\) versus \(\mathrm{H}_{\mathrm{A}}: \beta_0>0\)
Test Statistic: \(\mathrm{T}=3.77\)
\(p\)-value: \(p<0.01\)
E. Hypotheses: \(\mathrm{H}_0: b_1=0\) versus \(\mathrm{H}_{\mathrm{A}}: b_1>0\)
Test Statistic: \(\mathrm{T}=3.18\)
\(p\)-value: \(0.20<p<0.50\)

Answer/Explanation

Explanation:
The correct answer is A. The null and alternative hypotheses for the slope are standard for a linear regression model. The test statistic is the one associated with dose, and the \(p\)-value is 0.0215, which satisfies this inequality. Choice B is incorrect because the hypotheses are listed for the intercept, not the slope. Choice C is incorrect because the null and alternative hypotheses are reversed. Choice D is incorrect because the stated hypotheses concern the intercept, not the slope. In addition, the alternative hypothesis should be two-sided ("not equal to") rather than one-sided, since we are trying to assess the existence of any trend, positive or negative. Choice E is incorrect because its alternative hypothesis is one-sided, when it should be "not equal to" for the same reason.
