IBDP MAI : Topic 4 Statistics and probability - AHL 4.12 Design of valid data collection methods AI HL Paper 3
Question : Validity and Reliability in Recruitment [28 marks]
A firm wishes to review its recruitment processes. This question considers the validity and reliability of the methods used.
Every year an accountancy firm recruits new employees for a trial period of one year from a large group of applicants. At the start, all applicants are interviewed and given a rating. Those with a rating of either Excellent, Very good or Good are recruited for the trial period. At the end of this period, some of the new employees will stay with the firm. It is decided to test how valid the interview rating is as a way of predicting which of the new employees will stay with the firm. Data is collected and recorded in a contingency table.
The next year’s group of applicants are asked to complete a written assessment which is then analysed. From those recruited as new employees, a random sample of size 18 is selected. The sample is stratified by department. Of the 91 new employees recruited that year, 55 were placed in the national department and 36 in the international department. At the end of their first year, the level of performance of each of the 18 employees in the sample is assessed by their department manager. They are awarded a score between 1 (low performance) and 10 (high performance). The marks in the written assessment and the scores given by the managers are shown in both the table and the scatter diagram.
The same seven employees (those from the international department in the sample) are given the written assessment a second time, at the end of the first year, to measure its reliability. Their marks are shown in the table below.
Question a [6 marks] – Chi-Squared Test
Use an appropriate test, at the 5 % significance level, to determine whether a new employee staying with the firm is independent of their interview rating. State the null and alternative hypotheses, the p-value and the conclusion of the test:
Ans: (a) Use of \( \chi^2 \) test for independence. \( H_0 \): Staying (or leaving) the firm and interview rating are independent. \( H_1 \): Staying (or leaving) the firm and interview rating are not independent. p-value = 0.487 (0.487221…). Since 0.487 > 0.05 (the result is not significant at the 5% level), there is insufficient evidence to reject \( H_0 \) (or “accept \( H_0 \)”).
Detailed Solution: A \( \chi^2 \) test for independence is applied to the contingency table of interview rating (Excellent, Very good, Good) against outcome (stayed, left).
Expected frequency for each cell: \( E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \).
\( \chi^2 = \sum \frac{(O - E)^2}{E} \approx 1.44 \),
df = (3 − 1)(2 − 1) = 2, \( p \)-value = 0.487221…
Since \( 0.487 > 0.05 \), fail to reject \( H_0 \): there is insufficient evidence at the 5% level that staying with the firm depends on interview rating.
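The steps above can be sketched in Python using only the standard library. The function names are illustrative, not from any library. As a check that the reported figures fit together, the sketch recovers the p-value from the statistic: with 2 degrees of freedom the \( \chi^2 \) upper-tail probability is exactly \( e^{-\chi^2/2} \), so \( \chi^2 \approx 1.44 \) gives \( p \approx 0.487 \).

```python
import math


def expected_counts(observed):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1) for a test of independence."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom.

    Uses the power series for the regularised lower incomplete gamma function,
    which is adequate for the small statistics met in exam questions.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = 1 / math.gamma(s + 1)  # n = 0 term: z^0 / Gamma(s + 1)
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (s + n)  # now z^n / Gamma(s + n + 1)
        total += term
    lower = math.exp(s * math.log(z) - z) * total
    return 1 - lower


# With df = 2 the tail is exactly exp(-x / 2), so the reported
# statistic chi^2 ≈ 1.44 reproduces the reported p-value.
print(round(chi2_sf(1.44, 2), 3))  # 0.487
```

Given the observed table as a list of rows, `chi2_sf(chi_squared_statistic(table), degrees_of_freedom(table))` gives the p-value; `scipy.stats.chi2_contingency` does the same job if SciPy is available.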
Question b [2 marks] – Stratified Sampling
Show that 11 employees are selected for the sample from the national department:
(b) \( \frac{55}{91} \times 18 = 10.9 \) (10.8791…) ≈ 11
Detailed Solution: In stratified sampling, each stratum's share of the sample matches its share of the population. Total = 91, National = 55, International = 36, sample size = 18. National proportion = \( \frac{55}{91} \approx 0.6044 \), so National sample = \( \frac{55}{91} \times 18 \approx 10.8791 \), which rounds to 11. International sample = \( \frac{36}{91} \times 18 \approx 7.1209 \), which rounds to 7, and 11 + 7 = 18.
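The proportional allocation can be sketched as below; the function name is illustrative. It rounds each stratum to the nearest integer, as the working above does. In other problems, simple rounding can give sizes that do not add up to the sample size, so the sketch checks the total.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Stratum sample sizes proportional to stratum sizes, rounded to the nearest integer.

    Simple rounding matches the working in the question; in other cases the
    rounded sizes may not add up to sample_size and would need adjusting.
    """
    population = sum(strata_sizes.values())
    return {
        name: round(size / population * sample_size)
        for name, size in strata_sizes.items()
    }


allocation = proportional_allocation({"national": 55, "international": 36}, 18)
print(allocation)  # {'national': 11, 'international': 7}
assert sum(allocation.values()) == 18
```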
Question c [8 marks] – Spearman’s Correlation
(i) Without calculation, explain why it might not be appropriate to calculate a correlation coefficient for the whole sample of 18 employees:
(ii) Find \( r_s \) for the seven employees working in the international department:
(iii) Hence comment on the validity of the written assessment as a measure of the level of performance of employees in this department:
(c) (i) There seems to be a difference between the two departments: the international department manager seems to be less generous than the national department manager.
(ii) \( r_s = 0.909 \) (0.909241…)
(iii) EITHER there is a (strong) association between the written assessment mark and the manager scores. OR there is a (strong) agreement in the rank order of the written assessment marks and the rank order of the manager scores. OR there is a (strong linear) correlation between the rank order of the written assessment marks and the rank order of the manager scores. THEN the written assessment is likely to be a valid measure (of the level of employee performance)
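Detailed Solution: (i) The scatter diagram shows the two departments as distinct clusters, with the international manager's scores generally lower than the national manager's, so pooling all 18 employees would mix two different scoring standards and distort a combined \( r_s \). (ii) The ranks of the international department's marks (57-78) and manager scores agree closely, so \( \sum d^2 \) is small. Some values are tied, so \( r_s \) is found as Pearson's correlation coefficient of the (averaged) ranks, giving \( r_s \approx 0.909241 \); the shortcut \( r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \), here with \( n(n^2 - 1) = 7 \cdot 48 \), is exact only when there are no ties. (iii) A high \( r_s \) indicates strong agreement between the two rank orders, suggesting that the written assessment is a valid measure of performance in the international department.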
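A minimal sketch of Spearman's \( r_s \) that handles tied values, assuming the usual convention of giving tied values the mean of the ranks they occupy and then applying Pearson's formula to the ranks. Ranking from smallest or from largest gives the same \( r_s \), provided both variables are ranked the same way. The seven international pairs are in the question's table, which is not reproduced here, so no values from it are used; the function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s: Pearson's r applied to the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman(marks, scores)` on the seven international pairs would give \( r_s \). When there are no ties, the result agrees exactly with \( 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \).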
Question d [6 marks] – Reliability Test
(i) State the name of this type of test for reliability:
(ii) For the data in this table, test the null hypothesis, H0 : ρ = 0 against the alternative hypothesis, H1 : ρ > 0, at the 5 % significance level. You may assume that all the requirements for carrying out the test have been met:
(iii) Hence comment on the reliability of the written assessment:
(d) (i) test-retest
(ii) p-value = 0.00209 (0.0020939…). Since 0.00209 < 0.05 (the result is significant at the 5% level), there is sufficient evidence to reject H0.
(iii) the test seems reliable
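Detailed Solution: (i) Giving the same assessment to the same group twice and comparing the two sets of results is the test-retest method.
(ii) The hypotheses concern \( \rho \), the population product-moment correlation coefficient, so the test is a one-tailed t-test on Pearson's \( r \) for the seven pairs of marks (Test 1: 57-78, Test 2: 64-77).
Test statistic: \( t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \) with \( n - 2 = 5 \) degrees of freedom.
The one-tailed \( p \)-value is 0.00209, and \( 0.00209 < 0.05 \), so reject \( H_0 \).
(iii) A significant positive correlation between the two sittings indicates consistent results, suggesting that the written assessment is reliable.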
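The one-tailed test of \( H_0: \rho = 0 \) against \( H_1: \rho > 0 \) can be sketched with the standard library only. As an assumption, the t-distribution tail is found by integrating its density with Simpson's rule; `scipy.stats.t.sf(t, df)` would normally be used instead. The seven pairs of marks are not reproduced here, so the sketch is not run on them, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_upper_tail(t, df, steps=20_000):
    """P(T > t) for Student's t with df degrees of freedom.

    Integrates the t density over [0, |t|] with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution about 0.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = density(0) + density(a)
    area += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def correlation_t_test_greater(x, y):
    """One-tailed test of H0: rho = 0 against H1: rho > 0.

    Returns (r, t, p) with t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees
    of freedom. Assumes bivariate normal data and |r| < 1.
    """
    n = len(x)
    r = pearson_r(x, y)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t, t_upper_tail(t, n - 2)
```

For the seven employees, `correlation_t_test_greater(test1, test2)` would return \( r \), \( t \) on 5 degrees of freedom and the one-tailed p-value to compare with 0.05.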
Question e [6 marks] – Multiple Tests
(i) Write down the number of tests they carry out:
(ii) The tests are performed at the 5 % significance level. Assuming that:
- there is no correlation between the marks in any of the sections and the scores in any of the attributes, and
- the outcome of each hypothesis test is independent of the outcomes of the other hypothesis tests,
find the probability that at least one of the tests will be significant:
(iii) The firm obtains a significant result when comparing section 2 of the written assessment and attribute X. Interpret this result:
(e) (i) 25
(ii) The probability of a significant result given no correlation is 0.05, so the probability of at least one significant result in 25 tests is \( 1 - 0.95^{25} = 0.723 \) (0.722610…)
(iii) (though the result is significant) it is very likely that one significant result would be achieved by chance, so it should be disregarded or further evidence sought
Detailed Solution: (i) 5 sections × 5 attributes = 25 tests.
(ii) \( P(\text{not significant}) = 0.95 \),
\( P(\text{all not significant}) = 0.95^{25} \approx 0.27739 \),
\( P(\text{at least one}) = 1 – 0.27739 \approx 0.72261 \approx 0.723 \).
(iii) With a 72.3% chance of at least one false positive across 25 tests, the significant result for section 2 and attribute X may well be a Type I error and should not be relied on unless it is confirmed by further evidence.
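A short Python check of parts (e)(i) and (e)(ii). Like the question, it assumes that every null hypothesis is true and that the tests are independent. The function name is illustrative.

```python
def prob_at_least_one_significant(alpha, num_tests):
    """P(at least one significant result) when all null hypotheses are true
    and the tests are independent: 1 - (1 - alpha)^num_tests."""
    return 1 - (1 - alpha) ** num_tests


num_tests = 5 * 5  # 5 sections x 5 attributes
print(num_tests, round(prob_at_least_one_significant(0.05, num_tests), 3))  # 25 0.723
```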
Syllabus Reference
Syllabus: Mathematics: Applications and Interpretation
Topic 4: Statistics and Probability
- Chi-squared test
- Spearman’s correlation
- Hypothesis testing
- Binomial probability
Assessment Criteria: D (Applying mathematics in real-life contexts)