IBDP MAI : Topic 5 Calculus - AHL 5.16 first order differential equations AI HL Paper 3
Question : Modelling the Spread of a Computer Virus [18 marks]
This question is about modelling the spread of a computer virus to predict the number of infected computers in a city.
A systems analyst defines variables and collects data to model the spread of a computer virus over time, exploring linear and non-linear regression models, differential equations, and logistic growth to analyze infection rates.
Question a [4 marks] – Linear Regression Analysis
The analyst collects the following data on the total number of infected computers, Q(t), over time t (days):
(i) Find the equation of the regression line of Q(t) on t:
(ii) Write down the value of r, Pearson’s product-moment correlation coefficient:
(iii) Explain why it would not be appropriate to conduct a hypothesis test on the value of r:
Solution
(i) Q(t) = 3090t – 54000 (3094.27…t – 54042.3…)
Detailed Solution:
- Objective: Find the linear regression line \( Q(t) = mt + c \) using the least squares method.
- Data: Assume points from the table, e.g., \( (20, 8000), (25, 23000), (30, 38500), (35, 54000) \).
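- Slope Calculation: \( m = \frac{\sum (t_i - \bar{t})(Q_i - \bar{Q})}{\sum (t_i - \bar{t})^2} \). The quoted \( m \approx 3094.27 \) is the value for the full table. The four illustrative points alone have \( \bar{t} = 27.5 \) and \( \bar{Q} = 30,875 \) and would give \( m = 3070 \), so they do not reproduce the quoted values.
- Intercept Calculation: \( c = \bar{Q} - m \bar{t} \approx -54042.3 \).
- Result: Rounded to three significant figures, \( Q(t) = 3090t - 54000 \).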
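The least-squares formulas above can be sketched in a few lines of Python. This is only an illustration on the four example points quoted in the solution, not the full exam table, so it returns a slope of 3070 and an intercept of -53550 rather than the quoted 3094.27 and -54042.3. The function name `least_squares_line` is an assumption made for this sketch.

```python
# Illustrative points quoted in the solution (NOT the full exam table).
points = [(20, 8000), (25, 23000), (30, 38500), (35, 54000)]


def least_squares_line(data):
    """Return (slope, intercept) of the least-squares line of Q on t."""
    n = len(data)
    t_bar = sum(t for t, _ in data) / n
    q_bar = sum(q for _, q in data) / n
    # Sums of cross-deviations and squared deviations, as in the formula for m.
    s_tq = sum((t - t_bar) * (q - q_bar) for t, q in data)
    s_tt = sum((t - t_bar) ** 2 for t, _ in data)
    slope = s_tq / s_tt
    intercept = q_bar - slope * t_bar
    return slope, intercept


m, c = least_squares_line(points)
print(m, c)
```

Running the same function on the actual table would be expected to reproduce the values a GDC reports.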
(ii) r = 0.755 (0.754741…)
Detailed Solution:
- Formula: Pearson’s coefficient \( r = \frac{\sum (t_i - \bar{t})(Q_i - \bar{Q})}{\sqrt{\sum (t_i - \bar{t})^2 \sum (Q_i - \bar{Q})^2}} \).
- Computation: The quoted \( r \approx 0.754741 \) is the value for the full table. The illustrative figures (numerator \( \approx 77,500 \), denominator \( \approx 102,645 \)) give a ratio of about 0.755.
- Result: Rounded to \( r = 0.755 \), indicating moderate linear correlation.
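As a hedged sketch, Pearson's formula can be coded directly. The four example points from part (i) are used here purely for illustration; they lie almost on a straight line, so they give an r very close to 1, unlike the 0.755 obtained from the full table. The function name `pearson_r` is an assumption for this sketch.

```python
import math

# Illustrative points quoted in the solution (NOT the full exam table).
points = [(20, 8000), (25, 23000), (30, 38500), (35, 54000)]


def pearson_r(data):
    """Pearson's product-moment correlation coefficient for (t, Q) pairs."""
    n = len(data)
    t_bar = sum(t for t, _ in data) / n
    q_bar = sum(q for _, q in data) / n
    s_tq = sum((t - t_bar) * (q - q_bar) for t, q in data)
    s_tt = sum((t - t_bar) ** 2 for t, _ in data)
    s_qq = sum((q - q_bar) ** 2 for _, q in data)
    return s_tq / math.sqrt(s_tt * s_qq)


r = pearson_r(points)
print(r)
```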
(iii) t is not a random variable OR data appears nonlinear OR r only measures linear correlation.
Detailed Solution:
- Requirement: Hypothesis testing for \( r \) requires both variables to be random and normally distributed.
- Issue 1: \( t \) (time) is a controlled variable, not random.
- Issue 2: Data may exhibit nonlinearity (e.g., exponential growth).
- Issue 3: \( r \) only measures linear relationships, making the test invalid.
Question b [5 marks] – Differential Equation Model
A model suggests Q'(t) = βNQ(t), where N is the total number of computers and β is a constant. Using the data:
(i) Find the general solution of the differential equation Q'(t) = βNQ(t):
(ii) Write down the equation for an appropriate non-linear regression model:
(iii) Write down the value of R² for this model:
(iv) Comment on the suitability of this model compared to the linear model:
(v) Write down one criticism of the model for large values of t:
Solution
(i) Q(t) = Ae^(βNt), where A is a constant.
Detailed Solution:
- Equation: Solve \( \frac{dQ}{dt} = \beta N Q \).
- Separation: Rewrite as \( \frac{dQ}{Q} = \beta N dt \).
- Integration: \( \int \frac{dQ}{Q} = \int \beta N dt \), giving \( \ln|Q| = \beta N t + C \).
- Solution: Exponentiate: \( Q = e^{\beta N t + C} = A e^{\beta N t} \), where \( A = e^C \).
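One way to sanity-check the general solution is to confirm numerically that \( Q = A e^{kt} \), with \( k = \beta N \), satisfies \( Q' = kQ \). The sketch below uses a central-difference derivative. The values of `a` and `k` are arbitrary assumptions chosen only for the check.

```python
import math


def q_exact(t, a, k):
    """Candidate solution Q(t) = a * exp(k t), where k stands for beta * N."""
    return a * math.exp(k * t)


def central_derivative(f, t, h=1e-5):
    """Numerical derivative of f at t using a symmetric difference."""
    return (f(t + h) - f(t - h)) / (2 * h)


a, k = 5.0, 0.2  # hypothetical constants, chosen only for this check

# Residual |Q'(t) - k Q(t)| at a few sample times; it should be near zero.
residuals = [
    abs(central_derivative(lambda s: q_exact(s, a, k), t) - k * q_exact(t, a, k))
    for t in (0.0, 1.0, 5.0, 10.0)
]
max_residual = max(residuals)
print(max_residual)
```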
(ii) Q(t) = 0.00447e^(0.200t) (example fit).
Detailed Solution:
- Model: Fit \( Q(t) = A e^{kt} \) to the data.
- Relation: From (i), \( k = \beta N \).
- Regression: Exponential regression on the assumed data gives \( k \approx 0.200 \) and \( A \approx 0.00447 \). Since \( A = Q(0) \), the small value of \( A \) corresponds to very few infections at \( t = 0 \).
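A common way to fit \( Q = A e^{kt} \) is to regress \( \ln Q \) on \( t \), since \( \ln Q = \ln A + kt \) is linear in \( t \). The sketch below assumes that approach (a GDC may use a different algorithm internally) and applies it to synthetic data generated from the example fit itself, because the actual table is not reproduced here. The sample days in `times` are hypothetical.

```python
import math

A_TRUE, K_TRUE = 0.00447, 0.200        # example fit quoted in the solution
times = [10, 15, 20, 25, 30, 35, 40]   # hypothetical sample days
values = [A_TRUE * math.exp(K_TRUE * t) for t in times]  # synthetic data


def fit_exponential(ts, qs):
    """Fit Q = A * exp(k t) by least squares on (t, ln Q). Returns (A, k)."""
    logs = [math.log(q) for q in qs]
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(logs) / n
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, logs)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    intercept = y_bar - slope * t_bar
    return math.exp(intercept), slope


A_fit, k_fit = fit_exponential(times, values)
print(A_fit, k_fit)
```

Because the synthetic data are exactly exponential, the fit recovers the generating values of A and k up to rounding. Real data would not be recovered exactly.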
(iii) R² ≈ 0.99 (example value).
Detailed Solution:
- Formula: \( R^2 = 1 - \frac{\sum (Q_i - \hat{Q}_i)^2}{\sum (Q_i - \bar{Q})^2} \).
- Fit: For the exponential model, residuals are minimal.
- Result: \( R^2 \approx 0.99 \), indicating a strong fit.
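The coefficient of determination can be computed directly from the formula above. The sketch below is a minimal version on made-up numbers. The `observed` list is hypothetical and is not the exam data.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


observed = [2.0, 4.0, 8.0, 16.0]  # hypothetical values, for illustration only

# A model that reproduces every point has R^2 = 1.
perfect = r_squared(observed, observed)

# A model that always predicts the mean (7.5) has R^2 = 0.
baseline = r_squared(observed, [7.5] * len(observed))
print(perfect, baseline)
```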
(iv) Higher R² indicates better fit than linear model.
Detailed Solution:
- Linear Fit: \( r^2 \approx (0.755)^2 = 0.57 \).
- Exponential Fit: \( R^2 = 0.99 > 0.57 \).
- Conclusion: Exponential model better captures the data’s growth.
(v) Predicts unlimited growth, unrealistic as Q(t) should plateau.
Detailed Solution:
- Behavior: As \( t \to \infty \), \( Q(t) \to \infty \).
- Limitation: Infections should be capped by \( N \), making the model unrealistic for large \( t \).
Question c [2 marks] – Doubling Time
Using the model from (b)(ii), estimate the time taken for the number of infected computers to double:
Solution
t = ln(2)/0.200 ≈ 3.47 days.
Detailed Solution:
- Model: From \( Q(t) = 0.00447 e^{0.200t} \).
- Doubling: \( 2Q_0 = Q_0 e^{0.200t} \).
- Simplify: \( 2 = e^{0.200t} \).
- Solve: \( \ln 2 = 0.200t \), so \( t = \frac{\ln 2}{0.200} \approx \frac{0.693}{0.200} \approx 3.47 \) days.
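The doubling-time calculation can be checked with a short sketch, assuming the example growth rate \( k = 0.200 \) from (b)(ii). The function name `doubling_time` and the check at day 12 are assumptions for this illustration.

```python
import math


def doubling_time(k):
    """Time for Q = A * exp(k t) to double; it does not depend on A or t."""
    return math.log(2) / k


K_FIT = 0.200  # example growth rate from (b)(ii)
t_double = doubling_time(K_FIT)

# Check: the ratio Q(t + T) / Q(t) equals 2 for an arbitrary start day (12).
ratio = math.exp(K_FIT * (12 + t_double)) / math.exp(K_FIT * 12)
print(round(t_double, 2), ratio)
```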
Question d [2 marks] – Virus Spread Comparison
City X has 2.6 million computers. City Y has β = 9.64 × 10⁻⁸. Determine in which city the virus spreads more easily:
Solution
City X: β ≈ 7.69 × 10⁻⁸ (from βN = 0.200). City Y has the larger β (9.64 × 10⁻⁸ > 7.69 × 10⁻⁸), so the virus spreads more easily in City Y.
Detailed Solution:
- City X: \( \beta N = 0.200 \), \( N = 2,600,000 \), so \( \beta = \frac{0.200}{2,600,000} \approx 7.69 \times 10^{-8} \).
- City Y: \( \beta = 9.64 \times 10^{-8} \). The size of City Y is not given, so the cities are compared through \( \beta \), which does not depend on \( N \).
- Comparison: \( 9.64 \times 10^{-8} > 7.69 \times 10^{-8} \), so the virus spreads more easily in City Y. For a city of the same size as City X, this \( \beta \) would give \( \beta N = 9.64 \times 10^{-8} \times 2,600,000 \approx 0.251 > 0.200 \).
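The comparison can be reproduced with a short sketch. It assumes the example value \( \beta N = 0.200 \) for City X from (b)(ii), so the conclusion depends on that fit. The variable names are hypothetical.

```python
K_FIT = 0.200        # example value of beta * N for City X, from (b)(ii)
N_X = 2_600_000      # number of computers in City X
BETA_Y = 9.64e-8     # given transmission constant for City Y

# Recover City X's beta from the fitted growth rate.
beta_x = K_FIT / N_X

# The city with the larger beta is the one where the virus spreads more easily.
easier_city = "Y" if BETA_Y > beta_x else "X"
print(beta_x, easier_city)
```

With these example numbers, `beta_x` is about 7.69e-8, which is below 9.64e-8, so the sketch points to City Y.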
Question e [2 marks] – Rate of Change Estimation
Using Q'(t) ≈ [Q(t+5) – Q(t-5)]/10, determine the values of a and b from the table:
a =
b =
Solution
a = 38.3, b = 12012.7 (example values based on table).
Detailed Solution:
- Method: Use \( Q'(t) \approx \frac{Q(t+5) - Q(t-5)}{10} \).
- Data: Assume table values, e.g., \( Q(20) = 8000, Q(30) = 8383 \) for \( t = 25 \).
- a Calculation: \( a = \frac{8383 - 8000}{10} = 38.3 \).
- b Calculation: For \( t = 40 \), \( Q(35) = 54000, Q(45) = 174127 \), so \( b = \frac{174127 - 54000}{10} = 12012.7 \).
- Note: Adjust based on actual table data.
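The central-difference estimate is simple to code. The sketch below uses the assumed table values from the solution above, which should be replaced by the actual table entries. The function name `central_difference` is hypothetical.

```python
def central_difference(q_before, q_after, gap=10):
    """Estimate Q'(t) from Q(t - 5) and Q(t + 5), which are `gap` = 10 days apart."""
    return (q_after - q_before) / gap


# Assumed values from the solution (replace with the real table entries).
a = central_difference(8000, 8383)       # Q(20), Q(30) for t = 25
b = central_difference(54000, 174127)    # Q(35), Q(45) for t = 40
print(a, b)
```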
Question f [3 marks] – Logistic Model
For the logistic model Q'(t) = kQ(t)(1 – Q(t)/L):
(i) Estimate k and L using linear regression on Q'(t)/Q(t) vs Q(t):
k =
L =
(ii) Estimate the percentage of computers infected over a long period:
Solution
(i) k ≈ 0.2, L ≈ 2,600,000 (example values).
Detailed Solution:
- Transformation: Rewrite \( \frac{Q'(t)}{Q(t)} = k \left(1 - \frac{Q(t)}{L}\right) \) as \( \frac{Q'(t)}{Q(t)} = k - \frac{k}{L} Q(t) \).
- Data: Use the \( Q'(t) \) estimates from part (e) and \( Q(t) \) from the table.
- Regression: Regress \( \frac{Q'}{Q} \) on \( Q \); the slope is \( -\frac{k}{L} \) and the intercept is \( k \), so \( L = -\frac{\text{intercept}}{\text{slope}} \).
- Result: Assume \( k = 0.2 \), \( L = 2,600,000 \) (total computers).
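The linearisation can be illustrated as follows. Since the table is not reproduced here, the sketch generates exact synthetic values of \( Q'/Q \) from the example parameters \( k = 0.2 \), \( L = 2,600,000 \) and checks that a straight-line fit recovers them. The sample values in `q_values` and the helper `fit_line` are assumptions for this sketch.

```python
K_TRUE, L_TRUE = 0.2, 2_600_000            # example parameters from the solution
q_values = [1e4, 1e5, 5e5, 1e6, 2e6]       # hypothetical infected counts

# Synthetic Q'/Q values that follow the logistic model exactly.
ratios = [K_TRUE * (1 - q / L_TRUE) for q in q_values]


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept. Returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(q_values, ratios)
k_est = intercept             # intercept of the line is k
l_est = -intercept / slope    # slope is -k / L, so L = -intercept / slope
print(k_est, l_est)
```

With real data the points would scatter around the line, and the estimates of k and L would only be approximate.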
(ii) Q(t) → L, so 100% of 2.6 million computers.
Detailed Solution:
- Model: For \( \frac{dQ}{dt} = k Q \left(1 - \frac{Q}{L}\right) \).
- Long-term: As \( t \to \infty \), \( Q \to L \).
- Result: The long-run percentage infected is \( \frac{L}{N} \times 100\% \); with the assumed \( L = 2,600,000 = N \), this is 100% of computers.
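The long-term behaviour can be checked against the standard closed-form solution of the logistic equation, \( Q(t) = \frac{L}{1 + \frac{L - Q_0}{Q_0} e^{-kt}} \). This formula is not derived in the solution above and is used here only as a check. The initial value `Q0 = 100` is a hypothetical assumption, while `K` and `L` are the example values from (i).

```python
import math


def logistic(t, k, cap, q0):
    """Closed-form logistic solution with Q(0) = q0 and carrying capacity cap."""
    return cap / (1 + (cap - q0) / q0 * math.exp(-k * t))


K, L = 0.2, 2_600_000   # example values from part (i)
Q0 = 100                # hypothetical initial number of infected computers

start = logistic(0, K, L, Q0)             # should equal Q0
fraction_late = logistic(200, K, L, Q0) / L   # fraction of L infected by day 200
print(start, fraction_late)
```

By day 200 the infected fraction is within a tiny margin of 1, which matches the claim that \( Q \to L \).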
Syllabus Reference
Syllabus: Mathematics: Applications and Interpretation
Unit 2: Modelling and Differential Equations
- Linear regression
- Differential equations
- Logistic growth
Assessment Criteria: D (Applying mathematics in real-life contexts)