IB Mathematics AI HL Paper 3: AHL 4.12 Design of valid data collection methods, Exam Style Questions (New Syllabus)

Question

Amira, a sociologist, is studying whether income influences happiness among doctors. This question requires you to evaluate Amira’s methodology and conclusions.

Amira acquired a list of email addresses for doctors in her city and invited them to complete an anonymous survey. Participants were asked to report their annual income and answer questions that generated a happiness score out of 100. Out of 415 doctors contacted, 11 responded.

(a)

(i) Suggest one method Amira could use to enhance the reliability of her study. [1]

(ii) Identify one critique of the validity of Amira’s study. [1]

Amira’s findings are presented in the table below:

Response | Annual income ($) | Happiness score
A | 65 000 | 60
B | 63 000 | 52
C | 40 000 | 31
D | 125 000 | 81
E | 100 000 | 48
F | 245 000 | 61
G | 48 000 | 42
H | 39 000 | 40
I | 85 000 | 57
J | 92 000 | 53
K | 123 456 789 | 56

(b) Amira identifies response K as an outlier and excludes it from the data. Provide one possible reason for her decision to remove it. [1]

(c) For the remaining ten responses, Amira calculates the mean happiness score to be 52.5.

(i) Calculate the mean annual income for these remaining responses. [2]

(ii) Determine the value of \( r \), Pearson’s product-moment correlation coefficient, for these remaining responses. [2]

Amira conducts a hypothesis test on the correlation coefficient to explore whether higher annual income is associated with greater happiness.

(d)

(i) Explain why the hypothesis test should be one-tailed. [1]

(ii) State the null and alternative hypotheses for this test. [2]

The critical value for this test, at the 5% significance level, is 0.549. Amira assumes the population is bivariate normal.

(iii) Determine whether there is significant evidence of a positive correlation between annual income and happiness. Justify your answer. [2]

(e) Amira aims to develop a model to predict how changes in annual income affect happiness scores, treating annual income in dollars, \( X \), as the independent variable and happiness score, \( Y \), as the dependent variable.

She first explores a linear model of the form:

\( Y = aX + b. \)

(i) Using Amira’s data, calculate the values of \( a \) and \( b \). [1]

(ii) Explain, in the context of income and happiness, what the value of \( a \) signifies. [1]

Amira then considers a quadratic model of the form:

\( Y = cX^2 + dX + e. \)

(iii) Determine the values of \( c \), \( d \), and \( e \). [1]

(iv) Calculate the coefficient of determination for both models. [2]

(v) Compare the two models based on the results. [1]

(vi) Comment on the validity of her model selection. [1]

After presenting her findings, a colleague questions whether Amira’s sample represents all doctors in the city.

A report indicates that the average annual income of doctors in the city is $80,000. Amira conducts a statistical test to assess whether her sample could reasonably come from a population with a mean income of $80,000.

(f)

(i) Name the statistical test Amira should use. [1]

(ii) State the null and alternative hypotheses for this test. [1]

(iii) Conduct the test at a 5% significance level and state the conclusion in context. [3]

Markscheme

(a) (i) Any one from:

Increase the sample size or response rate, repeat the survey, ensure the sample is representative, use stratified sampling, or employ random sampling. A1

[1 mark]

(ii) Any one from:

Non-random sampling may lead to bias, self-reported happiness may not reflect true happiness, happiness is difficult to quantify, income may include external sources, the sample is limited to one city, or correlation does not imply causation. A1

[1 mark]

(b) Response K’s income is implausible or significantly deviates from other data points. A1

[1 mark]

(c) (i) Sum of incomes (excluding K): \( 65000 + 63000 + 40000 + 125000 + 100000 + 245000 + 48000 + 39000 + 85000 + 92000 = 902000 \).

Mean: \( \frac{902000}{10} = 90200 \). A1 A1

[2 marks]

(ii) Pearson’s correlation coefficient: \( r \approx 0.558 \). A1 A1

[2 marks]
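The values in part (c) can be checked without a calculator's statistics mode. The following is a minimal sketch, assuming the ten remaining responses A to J from the table; the variable names are illustrative only. It computes \( r \) directly from the sums of squares and cross-products about the means.

```python
from math import sqrt

# Responses A-J from the table (response K excluded as an outlier).
incomes = [65000, 63000, 40000, 125000, 100000, 245000, 48000, 39000, 85000, 92000]
happiness = [60, 52, 31, 81, 48, 61, 42, 40, 57, 53]

n = len(incomes)
mean_income = sum(incomes) / n
mean_happiness = sum(happiness) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - mean_income) * (y - mean_happiness)
           for x, y in zip(incomes, happiness))
s_xx = sum((x - mean_income) ** 2 for x in incomes)
s_yy = sum((y - mean_happiness) ** 2 for y in happiness)

# Pearson's product-moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print(f"mean income    = {mean_income:.0f}")
print(f"mean happiness = {mean_happiness:.1f}")
print(f"r              = {r:.3f}")
```

The printed values should agree with the markscheme to the accuracy quoted there.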

(d) (i) The test is one-tailed because Amira is only investigating whether increased income is associated with greater happiness (positive correlation). A1

[1 mark]

(ii) Null hypothesis: \( H_0 : \rho = 0 \).

Alternative hypothesis: \( H_1 : \rho > 0 \). A1 A1

[2 marks]

(iii) Compare \( r \approx 0.558 \) to critical value 0.549. Since \( 0.558 > 0.549 \), or p-value \( \approx 0.0469 < 0.05 \), there is significant evidence of a positive correlation between income and happiness. A1 A1

[2 marks]
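The p-value quoted in (d)(iii) can be reproduced from \( r \) alone. The sketch below assumes the standard result that, under \( H_0 \) and bivariate normality, \( t = r\sqrt{n-2}/\sqrt{1-r^2} \) follows a \( t \)-distribution with \( n-2 \) degrees of freedom. The helper names `t_pdf` and `upper_tail` are hypothetical, and the tail area is found by simple numerical integration rather than a statistics library. The table value 1.860 for the one-tailed 5% point with 8 degrees of freedom is not given in the question; it is included only to show where a critical value near 0.549 could come from.

```python
from math import sqrt, gamma, pi

r = 0.558      # sample correlation from part (c)(ii), as rounded in the markscheme
n = 10         # number of responses after removing K
df = n - 2

# Test statistic for H0: rho = 0 against H1: rho > 0.
t_stat = r * sqrt(df) / sqrt(1 - r ** 2)

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_value = upper_tail(t_stat, df)   # one-tailed p-value

# Assumed table value (not from the question): one-tailed 5% point of t with 8 df.
t_crit = 1.860
r_crit = t_crit / sqrt(t_crit ** 2 + df)

print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
print(f"critical r = {r_crit:.3f}")
```

Both routes lead to the same decision: \( r \) exceeds the critical value exactly when the one-tailed p-value is below 0.05.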

(e) (i) For linear model \( Y = aX + b \):

\( a \approx 0.000126 \), \( b \approx 41.1 \). A1

[1 mark]
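As a check on (e)(i), the least-squares line can be computed from the usual formulas \( a = S_{xy}/S_{xx} \) and \( b = \bar{y} - a\bar{x} \). This is a minimal sketch under the assumption that responses A to J are used; the names `s_xy` and `s_xx` are illustrative.

```python
# Responses A-J from the table (response K excluded).
incomes = [65000, 63000, 40000, 125000, 100000, 245000, 48000, 39000, 85000, 92000]
happiness = [60, 52, 31, 81, 48, 61, 42, 40, 57, 53]

n = len(incomes)
x_bar = sum(incomes) / n
y_bar = sum(happiness) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(incomes, happiness))
s_xx = sum((x - x_bar) ** 2 for x in incomes)

a = s_xy / s_xx          # gradient: happiness points per dollar
b = y_bar - a * x_bar    # intercept: predicted score at zero income

print(f"a = {a:.6f}")
print(f"b = {b:.1f}")
```

Because \( a \) is measured per dollar it is very small; multiplying by 10 000 gives roughly 1.26 happiness points per extra $10 000 of income.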

(ii) The value of \( a \) is the predicted (average) increase in happiness score for each additional dollar of annual income. A1

[1 mark]

(iii) For quadratic model \( Y = cX^2 + dX + e \):

\( c \approx -2.06 \times 10^{-9} \), \( d \approx 7.05 \times 10^{-4} \), \( e \approx 12.6 \). A1

[1 mark]

(iv) Coefficient of determination:

Quadratic model: \( R^2 \approx 0.659 \).

Linear model: \( R^2 \approx 0.311 \). A1 A1

[2 marks]
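The quadratic coefficients and both coefficients of determination can be reproduced with a small least-squares routine. The sketch below is one possible approach, not the method a calculator necessarily uses: it solves the normal equations by Gaussian elimination, with income rescaled to units of $100 000 (an assumption made here purely to keep the arithmetic well conditioned). The helpers `solve`, `polyfit` and `r_squared` are hypothetical names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ..."""
    size = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(A, b)

def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = sum(ys) / len(ys)
    fitted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Responses A-J from the table (response K excluded).
incomes = [65000, 63000, 40000, 125000, 100000, 245000, 48000, 39000, 85000, 92000]
happiness = [60, 52, 31, 81, 48, 61, 42, 40, 57, 53]

scale = 1e5                              # work in units of $100 000
xs = [x / scale for x in incomes]

lin = polyfit(xs, happiness, 1)
quad = polyfit(xs, happiness, 2)

# Convert the quadratic coefficients back to dollars.
e = quad[0]
d = quad[1] / scale
c = quad[2] / scale ** 2

r2_linear = r_squared(xs, happiness, lin)
r2_quadratic = r_squared(xs, happiness, quad)

print(f"c = {c:.3e}, d = {d:.3e}, e = {e:.1f}")
print(f"R^2 linear = {r2_linear:.3f}, R^2 quadratic = {r2_quadratic:.3f}")
```

Rescaling the income does not change \( R^2 \), since the fitted values are the same in either unit.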

(v) The quadratic model is a better fit, as it explains a higher proportion of the variance in happiness score (\( R^2 \approx 0.659 \) compared with \( R^2 \approx 0.311 \)). A1

[1 mark]

(vi) Not valid. Either: \( R^2 \) is not suitable for comparing models with different numbers of parameters; or: the quadratic model will always fit at least as well as the linear model because it has an extra parameter. A1

[1 mark]

(f) (i) Single sample \( t \)-test. A1

[1 mark]

(ii) Null hypothesis: \( H_0 : \mu = 80000 \).

Alternative hypothesis: \( H_1 : \mu \neq 80000 \). A1

[1 mark]

(iii) Sample mean: \( \bar{x} = 90200 \). Sample standard deviation: \( s \approx 61100 \). The \( t \)-test gives \( t \approx 0.528 \) with 9 degrees of freedom and \( p \approx 0.610 \). Since \( 0.610 > 0.05 \), do not reject \( H_0 \): there is no significant evidence that the mean income of the population from which the sample was drawn differs from $80,000, so the sample could plausibly come from a population with that mean. A1 A1 A1

[3 marks]
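The test in part (f) can be sketched in the same spirit. The code below assumes a two-tailed one-sample \( t \)-test with \( n - 1 = 9 \) degrees of freedom and finds the p-value by numerical integration of the \( t \) density; `t_pdf` and `upper_tail` are hypothetical helper names, not part of any library.

```python
from math import sqrt, gamma, pi
from statistics import mean, stdev

# Incomes for responses A-J (response K excluded).
incomes = [65000, 63000, 40000, 125000, 100000, 245000, 48000, 39000, 85000, 92000]
mu_0 = 80000                    # hypothesised population mean from the report

n = len(incomes)
df = n - 1
x_bar = mean(incomes)
s = stdev(incomes)              # sample standard deviation (divisor n - 1)

t_stat = (x_bar - mu_0) / (s / sqrt(n))

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_value = 2 * upper_tail(abs(t_stat), df)   # two-tailed p-value

print(f"x_bar = {x_bar}, s = {s:.0f}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

The large standard deviation, driven mainly by response F, is what makes a sample mean of 90 200 compatible with a population mean of 80 000.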
