IBDP MAI: Topic 4: Statistics and probability - AHL 4.13 - Sum of square residuals $(SS_{res})$ as a measure of fit for a model. Exam Style Questions Paper 3

Question

A systems analyst defines the following variables in a model:
– $t$ is the number of days since the first computer was infected by the virus.
– $Q(t)$ is the total number of computers that have been infected up to and including day $t$.

The following data were collected:

A model for the early stage of the spread of the computer virus suggests that
$$
Q^{\prime}(t)=\beta N Q(t)
$$
where $N$ is the total number of computers in a city and $\beta$ is a measure of how easily the virus is spreading between computers. Both $N$ and $\beta$ are assumed to be constant.

The data above are taken from city $\mathrm{X}$ which is estimated to have 2.6 million computers.
The analyst looks at data for another city, Y. These data indicate a value of $\beta=9.64 \times 10^{-8}$.

An estimate for $Q^{\prime}(t), t \geq 5$, can be found by using the formula:
$$
Q^{\prime}(t) \approx \frac{Q(t+5)-Q(t-5)}{10} .
$$
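The estimate above is a central difference with a step of 5 days. A minimal sketch of how it could be evaluated is given below; the dictionary `q_by_day`, the helper name `central_difference` and the sample values are hypothetical (they are not the exam data), chosen only so that the estimate can be compared with a known derivative.

```python
import math

def central_difference(q_by_day, t, h=5):
    """Estimate Q'(t) as (Q(t + h) - Q(t - h)) / (2h)."""
    return (q_by_day[t + h] - q_by_day[t - h]) / (2 * h)

# Hypothetical counts (NOT the exam table): Q(t) = 2 * e^(0.1 t), every 5 days.
q_by_day = {t: 2 * math.exp(0.1 * t) for t in range(0, 45, 5)}

estimate = central_difference(q_by_day, 20)
exact = 0.2 * math.exp(0.1 * 20)  # true derivative of the hypothetical Q at t = 20

print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

With a step as wide as 5 days the estimate is only approximate for a curved graph, which is why the question calls it an estimate.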

The following table shows estimates of $Q^{\prime}(t)$ for city $\mathrm{X}$ at different values of $t$.

An improved model for $Q(t)$, which is valid for large values of $t$, is the logistic differential equation
$$
Q^{\prime}(t)=k Q(t)\left(1-\frac{Q(t)}{L}\right)
$$
where $k$ and $L$ are constants.
Based on this differential equation, the graph of $\frac{Q^{\prime}(t)}{Q(t)}$ against $Q(t)$ is predicted to be a straight line.
a.i. Find the equation of the regression line of $Q(t)$ on $t$.
a.ii. Write down the value of $r$, Pearson’s product-moment correlation coefficient.
a.iii. Explain why it would not be appropriate to conduct a hypothesis test on the value of $r$ found in (a)(ii).
b.i. Find the general solution of the differential equation $Q^{\prime}(t)=\beta N Q(t)$.
b.ii. Using the data in the table, write down the equation for an appropriate non-linear regression model.
b.iii. Write down the value of $R^2$ for this model.
b.iv. Hence comment on the suitability of the model from (b)(ii) in comparison with the linear model found in part (a).
b.v. By considering large values of $t$, write down one criticism of the model found in (b)(ii).
c. Use your answer from part (b)(ii) to estimate the time taken for the number of infected computers to double.
d. Find in which city, $\mathrm{X}$ or $\mathrm{Y}$, the computer virus is spreading more easily. Justify your answer using your results from part (b).
e. Determine the value of $a$ and of $b$. Give your answers correct to one decimal place.
f.i. Use linear regression to estimate the value of $k$ and of $L$.
f.ii. The solution to the differential equation is given by
$$
Q(t)=\frac{L}{1+C \mathrm{e}^{-k t}}
$$
where $C$ is a constant.
Using your answer to part (f)(i), estimate the percentage of computers in city $X$ that are expected to have been infected by the virus over a long period of time.

▶️Answer/Explanation

a.i. $Q(t)=3090 t-54000 \quad(3094.27 \ldots t-54042.3 \ldots)$
A1A1
Note: Award at most A1A0 if answer is not an equation. Award A1A0 for an answer including either $x$ or $y$.
[2 marks]
a.ii. $0.755 \quad(0.754741 \ldots)$
A1
[1 mark]
a.iii. $t$ is not a random variable OR it is not a (bivariate) normal distribution
OR data is not a sample from a population
OR data appears nonlinear
OR $r$ only measures linear correlation
R1
Note: Do not accept “$r$ is not large enough”.
[1 mark]
b.i. attempt to separate variables
(M1)
$$
\begin{aligned}
& \int \frac{1}{Q} \mathrm{~d} Q=\int \beta N \mathrm{~d} t \\
& \ln |Q|=\beta N t+c
\end{aligned}
$$

A1A1A1

Note: Award $\boldsymbol{A 1}$ for LHS, $\boldsymbol{A 1}$ for $\beta N t$, and $\boldsymbol{A 1}$ for $+c$.
Award full marks for $Q=\mathrm{e}^{\beta N t+c}$ OR $Q=A \mathrm{e}^{\beta N t}$.
Award M1A1A1A0 for $Q=\mathrm{e}^{\beta N t}$
[4 marks]
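The accepted alternative forms follow from the logarithmic form by exponentiating both sides. A short worked version, where the constant $A$ is introduced here as shorthand for $\pm \mathrm{e}^{c}$:

$$
\begin{aligned}
\ln |Q| & =\beta N t+c \\
|Q| & =\mathrm{e}^{c} \mathrm{e}^{\beta N t} \\
Q & =A \mathrm{e}^{\beta N t}, \quad A=\pm \mathrm{e}^{c}
\end{aligned}
$$

Since $Q$ counts infected computers, $Q>0$ and so $A=\mathrm{e}^{c}>0$. Differentiating gives $Q^{\prime}=\beta N A \mathrm{e}^{\beta N t}=\beta N Q$, which confirms the solution.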

b.ii. attempt at exponential regression
(M1)
$$
Q=1.15 \mathrm{e}^{0.292 t}\left(Q=1.14864 \ldots \mathrm{e}^{0.292055 \ldots t}\right)
$$
A1
OR
attempt at exponential regression
(M1)
$$
Q=1.15 \times 1.34^t \quad\left(Q=1.14864 \ldots \times 1.33917 \ldots^t\right)
$$
A1
Note: Condone answers involving $y$ or $x$. Condone absence of “$Q=$”. Award M1A0 for an incorrect answer in correct format.
[2 marks]
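One common way to carry out an exponential regression is to fit a straight line to $\ln Q$ against $t$. The sketch below assumes that approach, which may differ from what a particular calculator does internally. The helper names and the data points are hypothetical: the points are generated from the reported model rather than taken from the exam table, so the fit simply recovers the reported coefficients.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares line y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def fit_exponential(ts, qs):
    """Fit q = a * exp(b * t) via a straight-line fit of ln(q) on t."""
    b, ln_a = linear_fit(ts, [math.log(q) for q in qs])
    return math.exp(ln_a), b

# Hypothetical data (NOT the exam table), generated from the reported model.
ts = [10, 15, 20, 25, 30]
qs = [1.14864 * math.exp(0.292055 * t) for t in ts]

a, b = fit_exponential(ts, qs)
base = math.exp(b)  # the same model written as a * base**t
print(f"Q = {a:.3g} e^({b:.3g} t)  or  Q = {a:.3g} * {base:.3g}^t")
```

The two accepted answer forms are the same model, since $\mathrm{e}^{0.292055 \ldots}=1.33917 \ldots$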
b.iii. $0.999 \quad(0.999431 \ldots)$
A1
[1 mark]
b.iv. comparing something to do with $R^2$ and something to do with $r$
M1
Note: Examples of where the $\boldsymbol{M 1}$ should be awarded:
$$
\begin{aligned}
& R^2>r \\
& R>r \\
& 0.999>0.755 \\
& 0.999>0.755^2 \quad(=0.563)
\end{aligned}
$$

The “correlation coefficient” in the exponential model is larger.
Model B has a larger $R^2$
Examples of where the $\boldsymbol{M 1}$ should not be awarded:
The exponential model shows better correlation (since not clear how it is being measured)
Model 2 has a better fit
Model 2 is more correlated
an unambiguous comparison between $R^2$ and $r^2$ or $R$ and $r$ leading to the conclusion that the model in part (b) is more suitable / better

Note: Condone candidates claiming that $R$ is the “correlation coefficient” for the non-linear model.
[2 marks]
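The comparison rests on $R^2=1-\frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of square residuals and $SS_{tot}$ is the sum of squared deviations from the mean. For a least-squares straight line this $R^2$ equals $r^2$, which is why $0.999$ can be compared with $0.755^2$. The sketch below illustrates the calculation on hypothetical observations (not the exam table); a calculator may compute $R^2$ for an exponential model on transformed data, so its value need not match this definition exactly.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observations for illustration only (NOT the exam table).
ts = [10, 15, 20, 25, 30]
observed = [20, 90, 400, 1700, 7300]

# Predictions from the exponential model reported in (b)(ii).
exp_pred = [1.14864 * math.exp(0.292055 * t) for t in ts]

# Least-squares straight line through the same hypothetical points.
n = len(ts)
mean_t = sum(ts) / n
mean_q = sum(observed) / n
slope = (sum((t - mean_t) * (q - mean_q) for t, q in zip(ts, observed))
         / sum((t - mean_t) ** 2 for t in ts))
lin_pred = [mean_q + slope * (t - mean_t) for t in ts]

r2_exp = r_squared(observed, exp_pred)
r2_lin = r_squared(observed, lin_pred)
print(f"R^2 exponential = {r2_exp:.5f}, R^2 linear = {r2_lin:.3f}")
```

A smaller $SS_{res}$ relative to $SS_{tot}$ gives an $R^2$ closer to 1, which is the sense in which the exponential model fits better.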

b.v. it suggests that there will be more infected computers than the entire population
R1
Note: Accept any response that recognizes unlimited growth.
[1 mark]
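To make the criticism concrete, the exponential model can be solved for the time at which its prediction reaches the 2.6 million computers estimated for city X. This is a rough sketch using the unrounded coefficients quoted in (b)(ii); the variable names are illustrative only.

```python
import math

A = 1.14864      # coefficient from the model in (b)(ii)
K = 0.292055     # growth rate from the model in (b)(ii)
N = 2.6e6        # estimated number of computers in city X

# Solve A * e^(K t) = N for t.
t_exceed = math.log(N / A) / K
print(f"model exceeds N after about {t_exceed:.0f} days")
```

Beyond this time (roughly 50 days) the model predicts more infected computers than exist in the city.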
c. $1.15 \mathrm{e}^{0.292 t}=2.3$ OR $1.15 \times 1.34^t=2.3$ OR $t=\frac{\ln 2}{0.292}$ OR using the model to find two specific times with values of $Q(t)$ which double
M1
$t=2.37$ (days)
A1
Note: Do not $\boldsymbol{F T}$ from a model which is not exponential. Award M0A0 for an answer of 2.13 which comes from using (10, 20) from the data or any other answer which finds a doubling time from figures given in the table.
[2 marks]
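For an exponential model the doubling time does not depend on the starting value: $A \mathrm{e}^{k(t+T)}=2 A \mathrm{e}^{k t}$ gives $T=\frac{\ln 2}{k}$. A minimal check of the arithmetic, using the unrounded coefficients from (b)(ii) (the function name is illustrative):

```python
import math

def doubling_time(growth_rate):
    """Time T with e^(growth_rate * T) = 2."""
    return math.log(2) / growth_rate

t_double = doubling_time(0.292055)            # from Q = A e^(0.292055 t)
t_double_base = doubling_time(math.log(1.33917))  # from Q = A * 1.33917^t
print(f"{t_double:.3g} days")
```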
d. an attempt to calculate $\beta$ for city $\mathrm{X}$
(M1)
$$
\begin{aligned}
\beta & =\frac{0.292055 \ldots}{2.6 \times 10^6} \text { OR } \beta=\frac{\ln 1.33917 \ldots}{2.6 \times 10^6} \\
& =1.12328 \ldots \times 10^{-7} \quad \text { A1 }
\end{aligned}
$$
this is larger than $9.64 \times 10^{-8}$ so the virus spreads more easily in city $\mathrm{X}$
R1
Note: It is possible to award M1A0R1.
Condone “so the virus spreads faster in city X” for the final $\boldsymbol{R 1}$.
[3 marks]
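The calculation compares the exponent of the fitted model with $\beta N$ from the general solution in (b)(i), so that $\beta=\frac{0.292055 \ldots}{N}$. A short sketch of the arithmetic, with illustrative variable names:

```python
GROWTH_RATE_X = 0.292055   # exponent coefficient from (b)(ii), equal to beta * N
N_X = 2.6e6                # estimated number of computers in city X
BETA_Y = 9.64e-8           # value given for city Y

beta_x = GROWTH_RATE_X / N_X
spreads_more_easily_in_x = beta_x > BETA_Y
print(f"beta_X = {beta_x:.3e}, larger than beta_Y: {spreads_more_easily_in_x}")
```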
e. $a=38.3, b=3086.1$
A1A1
Note: Award A1A0 if values are correct but not to $1 \mathrm{dp}$.
[2 marks]

f.i. $\frac{Q^{\prime}}{Q}=0.42228-2.5561 \times 10^{-6} Q$
(A1)(A1)
Note: Award A1 for each coefficient seen – not necessarily in the equation. Do not penalize seeing in the context of $y$ and $x$.
identifying that the constant is $k$ OR that the gradient is $-\frac{k}{L}$
(M1)
therefore $k=0.422 \quad(0.42228 \ldots)$
A1
$$
\begin{aligned}
& \frac{k}{L}=2.5561 \times 10^{-6} \\
& L=165000 \quad(165205)
\end{aligned}
$$
A1
Note: Accept a value of $L$ of 164843 from use of $3 \mathrm{sf}$ value of $k$, or any other value from plausible pre-rounding.
Allow follow-through within the question part, from the equation of their line to the final two $\boldsymbol{A 1}$ marks.
[5 marks]
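Dividing the logistic equation by $Q(t)$ gives $\frac{Q^{\prime}}{Q}=k-\frac{k}{L} Q$, so the intercept of the regression line estimates $k$ and the gradient estimates $-\frac{k}{L}$. The sketch below reads $k$ and $L$ off the coefficients quoted above; the variable names are illustrative.

```python
INTERCEPT = 0.42228        # constant term of the regression line
GRADIENT = -2.5561e-6      # coefficient of Q in the regression line

k = INTERCEPT              # Q'/Q = k - (k/L) Q, so the intercept is k
L = -k / GRADIENT          # the gradient is -k/L, so L = -k / gradient
print(f"k = {k:.3g}, L = {L:.0f}")
```

Using 3 sf values instead, $\frac{0.422}{2.56 \times 10^{-6}}$, gives the alternative accepted value of about 164843.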
f.ii. recognizing that their $L$ is the eventual number of infected
(M1)
$$
\frac{165205 \ldots}{2600000}=6.35 \% \quad(6.35403 \ldots \%)
$$
A1

Note: Accept any final answer consistent with their answer to part (f)(i) unless their $L$ is less than 120146 in which case award at most M1A0.
[2 marks]
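The reasoning behind the method mark can be written out as a limit, assuming $k>0$ as estimated in part (f)(i), so that $\mathrm{e}^{-k t} \rightarrow 0$:

$$
\lim _{t \rightarrow \infty} Q(t)=\lim _{t \rightarrow \infty} \frac{L}{1+C \mathrm{e}^{-k t}}=\frac{L}{1+0}=L
$$

The long-term proportion infected is therefore $\frac{L}{N}$, with $N=2600000$ for city $\mathrm{X}$.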

 
