2025 Computer Science Paper 2 TZ2 HL

Question 1

Alpha Hospital is situated in a large city and has over 1000 staff. It stores its data in a relational database. Some of the tables in this relational database contain data about the hospital staff, their roles, and the hospital departments. Staff roles include doctor, nurse, pharmacist, radiologist, and support staff. Staff can only hold one role. Departments include Accident and Emergency, Critical Care, Medical, General Surgery, Orthopaedics, and Ophthalmology. Staff can work in several departments.
The ROLE table, STAFF table, and DEPARTMENT table are shown in Figure 1.
(a)(i) State the primary key in the STAFF table.
(a)(ii) State a foreign key in the STAFF table.
(b) Describe the relationships between the three tables in Figure 1.
(c) Outline what a query is used for in a database.
(d) Identify the steps to create a query to list the staff with the surname Waters who are on pay grade 17. The query must display only FirstName, Surname, and PayGrade.
(e) Outline why an integer is an appropriate data type for the PayGrade field.
(f) Explain two ways in which the database administrator can ensure the privacy of the hospital’s staff data.

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.2 — The relational database model (Parts (a)(i), (a)(ii), (b), (c), (d), (e))
• Topic A.3 — Further aspects of database management (Part (f))

▶️ Answer/Explanation

(a)(i)
For the correct answer:
StaffID

In any well-designed relational database table, the primary key serves as the unique identifier for each record, meaning every staff member will have their own distinct ID that will never repeat. So, looking at the structure of a STAFF table, the StaffID field is the most logical and standard choice to guarantee that each employee is uniquely identifiable.

(a)(ii)
For the correct answer:
RoleID or DepartmentID

A foreign key is a field in one table that links to the primary key in another table, creating a relationship between the two. In the STAFF table, the RoleID is needed to link each staff member to their specific job title in the ROLE table, and if there’s a direct link, a DepartmentID could similarly connect them to a department, making either a valid foreign key.

(b)
For the correct answer:
ROLE and STAFF: one to many
STAFF and DEPARTMENT: many to many

The relationship between ROLE and STAFF is one-to-many because a single role, like ‘nurse’, can be assigned to dozens of different staff members, but each staff member can hold only one role at a time. For STAFF and DEPARTMENT, it’s a many-to-many relationship since any staff member, such as a surgeon, can work in multiple departments, and each department will naturally contain many different staff members.
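A many-to-many relationship is normally implemented with a linking table that holds one row per pairing. The sketch below uses Python's built-in `sqlite3` module. Figure 1 is not reproduced here, so the `STAFF_DEPARTMENT` table name, the reduced column lists, and every sample row are assumptions made for illustration only.

```python
import sqlite3

def build_linking_demo():
    """Create an in-memory database with hypothetical sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE STAFF (StaffID INTEGER PRIMARY KEY, Surname TEXT);
        CREATE TABLE DEPARTMENT (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
        -- Linking table (assumed name): one row per staff/department pairing.
        CREATE TABLE STAFF_DEPARTMENT (
            StaffID INTEGER REFERENCES STAFF(StaffID),
            DepartmentID INTEGER REFERENCES DEPARTMENT(DepartmentID),
            PRIMARY KEY (StaffID, DepartmentID)
        );
        INSERT INTO STAFF VALUES (1, 'Waters'), (2, 'Khan');
        INSERT INTO DEPARTMENT VALUES (10, 'Critical Care'), (20, 'General Surgery');
        INSERT INTO STAFF_DEPARTMENT VALUES (1, 10), (1, 20), (2, 10);
    """)
    return con

def departments_of(con, staff_id):
    """All departments one staff member works in."""
    rows = con.execute(
        "SELECT d.DepartmentName FROM DEPARTMENT d "
        "JOIN STAFF_DEPARTMENT sd ON sd.DepartmentID = d.DepartmentID "
        "WHERE sd.StaffID = ? ORDER BY d.DepartmentName", (staff_id,))
    return [name for (name,) in rows]

def staff_in(con, department_id):
    """All staff members who work in one department."""
    rows = con.execute(
        "SELECT s.Surname FROM STAFF s "
        "JOIN STAFF_DEPARTMENT sd ON sd.StaffID = s.StaffID "
        "WHERE sd.DepartmentID = ? ORDER BY s.Surname", (department_id,))
    return [name for (name,) in rows]
```

Reading the linking table in either direction gives the "many" on both sides: one staff member maps to several departments, and one department maps to several staff.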

(c)
For the correct answer:
A query provides a virtual representation/filtered view of the database based on criteria set in the query.

A query is essentially a question you ask the database to retrieve specific information without seeing all the clutter. It allows you to filter, sort, and combine data dynamically, so instead of browsing thousands of records, the system instantly shows you only the names of staff matching a specific surname and pay grade, just like you requested.

(d)
For the correct answer:
SELECT STAFF.FirstName, STAFF.Surname, ROLE.PayGrade FROM STAFF INNER JOIN ROLE ON STAFF.RoleID = ROLE.RoleID WHERE STAFF.Surname = 'Waters' AND ROLE.PayGrade = 17

To get this result, you must first tell the database which columns you want (FirstName, Surname, PayGrade) by selecting them and then join the STAFF and ROLE tables together using the common key RoleID, because PayGrade is in the ROLE table. Finally, you set the filtering conditions to look for exactly ‘Waters’ in the Surname column and the number 17 in the PayGrade column, which narrows down the results.
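The query can be tried out against a small in-memory database. This is a minimal sketch using Python's built-in `sqlite3` module; the column lists are reduced to what the query needs, and the sample staff and roles are invented for illustration rather than taken from Figure 1.

```python
import sqlite3

def build_staff_demo():
    """In-memory ROLE and STAFF tables with hypothetical rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ROLE (RoleID INTEGER PRIMARY KEY, RoleName TEXT, PayGrade INTEGER);
        CREATE TABLE STAFF (
            StaffID INTEGER PRIMARY KEY,
            FirstName TEXT,
            Surname TEXT,
            RoleID INTEGER REFERENCES ROLE(RoleID)
        );
        INSERT INTO ROLE VALUES (1, 'Nurse', 17), (2, 'Doctor', 21);
        INSERT INTO STAFF VALUES
            (100, 'Ana', 'Waters', 1),
            (101, 'Ben', 'Waters', 2),
            (102, 'Cy', 'Lopez', 1);
    """)
    return con

def find_staff(con, surname, pay_grade):
    """Return (FirstName, Surname, PayGrade) rows matching both criteria."""
    query = """
        SELECT STAFF.FirstName, STAFF.Surname, ROLE.PayGrade
        FROM STAFF
        INNER JOIN ROLE ON STAFF.RoleID = ROLE.RoleID
        WHERE STAFF.Surname = ? AND ROLE.PayGrade = ?
    """
    return con.execute(query, (surname, pay_grade)).fetchall()
```

The `?` placeholders pass the criteria as parameters instead of pasting them into the SQL string, which is the usual way to avoid quoting mistakes.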

(e)
For the correct answer:
The pay grade values are whole numbers.

An integer data type is a perfect fit here because pay grades like 17, 18, or 19 are always whole numbers without any fractions or decimals. Using an integer also keeps the database efficient, as it takes up less storage space than text and makes mathematical comparisons and sorting faster and more reliable.

(f)
For the correct answer:
The use of different levels of access/authorization, so that only the minimum number of people have access to this data. For example, password protect/lock/restrict access to the ROLE table.
Encrypting the stored data in the database. The data is scrambled/converted to cipher text (and cannot be understood without a key). For example, only the employees with the key can decrypt the data.

One effective way is implementing strict user authorization levels, which means designing the system so that a general administrative assistant can see basic contact details but is completely blocked from viewing sensitive salary or pay grade information, with access granted only to top-level HR managers. A second crucial method is encryption, where the database scrambles private data like pay grades and medical records into unreadable cipher text, so even if a hacker physically steals the hard drive, the information stays completely useless without the decryption key held by the hospital’s security team.

Question 2

Database design is a complex process that takes place in a range of phases. Different phases of database design use different schemas.
(a) Describe the difference between a conceptual schema and a logical schema.
(b) Explain the importance of a data definition language in implementing a data model.
(c) Explain why data modelling is used during the development of a database.
(d) Explain why both data validation and data verification are required to ensure the correctness of the data within a database.
(e) Outline how data integrity is maintained during a database transaction.
(f) Outline the role of referential integrity in maintaining data consistency within a database.

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.2 — The relational database model (Parts (a), (b), (c), (d), (e), (f))

▶️ Answer/Explanation

(a)
For the correct answer:
The conceptual schema is a high-level/least detailed representation of the database, identifying entities and relationships. The logical schema is more detailed, showing field names and developed from the conceptual schema.

The conceptual schema is like a rough sketch focusing on “what” data the system needs, identifying high-level entities like ‘Patient’ and ‘Doctor’ and their basic relationships without caring about database software. The logical schema takes that rough idea and formalises it into a precise blueprint, specifying every single field name, data type, and primary key—it’s the “how” stage that database developers can directly translate into a working system.

(b)
For the correct answer:
A DDL is used to specify the schema of a database, allowing you to define the tables, fields, and set datatypes (e.g., CREATE TABLE).

A Data Definition Language is crucial because it provides the actual coded commands to physically build the database structure you designed on paper into the computer. Without DDL commands like CREATE TABLE, your carefully planned data model would remain just an idea, as these statements are the only way to tell the database software to allocate space, create columns with specific data types, and enforce the relationships between your tables.
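As an illustration, the DDL below defines two tables and then reads the structure back from the database catalogue. It is a sketch using Python's built-in `sqlite3` module; the column lists and constraints are assumptions, not the hospital's actual schema.

```python
import sqlite3

# DDL statements: they define structure and store no data themselves.
DDL = """
CREATE TABLE ROLE (
    RoleID   INTEGER PRIMARY KEY,
    RoleName TEXT NOT NULL,
    PayGrade INTEGER NOT NULL
);
CREATE TABLE STAFF (
    StaffID   INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    Surname   TEXT NOT NULL,
    RoleID    INTEGER NOT NULL REFERENCES ROLE(RoleID)
);
"""

def create_schema():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    return con

def column_types(con, table):
    """Map column name to declared type, as recorded by the database."""
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    return {row[1]: row[2] for row in con.execute(f"PRAGMA table_info({table})")}
```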

(c)
For the correct answer:
It helps to identify the entities/tables in the database to ensure they support the database’s purpose. Normalization during data modelling reduces data duplication, which reduces data anomalies and saves storage space.

Data modelling is used early on to visualise and organise all the information before any code is written, preventing a chaotic mess of redundant data. By carefully identifying entities and their attributes and then applying normalization rules, you eliminate harmful duplication—so a patient’s address gets stored once, not a hundred times—which prevents update anomalies where changing a piece of data in one place would leave old, incorrect versions in another.

(d)
For the correct answer:
Data validation is an automated process that ensures input meets data entry rules. Data verification is the checking of data to ensure it is the input intended. Using both techniques provides the optimal solution.

Validation alone isn’t enough because it only checks if data looks plausible, like a date of birth being in the past, but it can’t catch a user accidentally typing a correct-looking but wrong surname. Verification handles this human error, often by asking a person to double-check their input, so together, a system can automatically block impossible values while also catching slips of the finger, leading to truly accurate records for a hospital.
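The two checks can be sketched side by side. The function names and the double-entry approach to verification are illustrative assumptions; real systems may verify in other ways, such as proofreading against a source document.

```python
from datetime import date

def validate_date_of_birth(dob, today):
    """Validation: an automated rule check. A date of birth cannot be in the future."""
    return dob <= today

def verify_double_entry(first_entry, second_entry):
    """Verification: the value is typed twice and both entries must match."""
    return first_entry.strip().lower() == second_entry.strip().lower()
```

A mistyped surname such as "Walters" would pass any plausibility rule, yet it fails the double-entry comparison against "Waters", which is why both checks are needed.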

(e)
For the correct answer:
Integrity is maintained by no changes being made until the transaction is complete. If the transaction cannot be completed it is rolled back to the original state (Atomicity).

Consider transferring a patient’s record between departments—this isn’t a single action but a multi-step process, and the database must treat it as one unbreakable unit using atomicity. If any step fails—say the system crashes right after removing the patient from the old department but before adding them to the new one—the whole transaction must be rolled back automatically, returning the database to its initial, consistent state and preventing the patient’s record from being lost or orphaned.
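The rollback behaviour can be demonstrated with Python's built-in `sqlite3` module. This is a sketch only: the `PLACEMENT` table is a hypothetical stand-in for wherever the hospital records a patient's department, and the failure is simulated by violating a `NOT NULL` constraint on the second step.

```python
import sqlite3

def make_placement_db():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE PLACEMENT (PatientID INTEGER, DepartmentID INTEGER NOT NULL);
        INSERT INTO PLACEMENT VALUES (1, 10);
    """)
    con.commit()
    return con

def transfer(con, patient_id, new_department):
    """Move a patient as one atomic unit: both steps commit, or neither does."""
    try:
        with con:  # commits on success, rolls back if an exception is raised
            con.execute("DELETE FROM PLACEMENT WHERE PatientID = ?", (patient_id,))
            con.execute("INSERT INTO PLACEMENT VALUES (?, ?)",
                        (patient_id, new_department))
        return True
    except sqlite3.IntegrityError:
        return False

def placements(con):
    return con.execute("SELECT PatientID, DepartmentID FROM PLACEMENT").fetchall()
```

When the insert fails, the earlier delete is undone as well, so the patient is still listed in the original department.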

(f)
For the correct answer:
Referential integrity is maintained by the connection between the primary key in one table and the foreign key in another, ensuring records are appropriately updated and preventing orphan records.

Referential integrity acts as a strict rulebook that keeps the links between tables valid, for example by making it impossible to assign a staff member to a DepartmentID that doesn’t exist in the DEPARTMENT table. It can also manage cascading updates: if a department’s ID code changes, the system automatically updates it for every staff member linked to it, ensuring that all connections remain valid and that no one ends up assigned to a non-existent department.
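Both behaviours can be observed with Python's built-in `sqlite3` module. The sketch assumes, as the explanation does, that STAFF carries a DepartmentID foreign key; the sample rows are invented. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def make_hospital_db():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    con.executescript("""
        CREATE TABLE DEPARTMENT (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
        CREATE TABLE STAFF (
            StaffID INTEGER PRIMARY KEY,
            Surname TEXT,
            DepartmentID INTEGER
                REFERENCES DEPARTMENT(DepartmentID) ON UPDATE CASCADE
        );
        INSERT INTO DEPARTMENT VALUES (10, 'Critical Care');
        INSERT INTO STAFF VALUES (1, 'Waters', 10);
    """)
    return con

def try_assign(con, staff_id, surname, department_id):
    """Insert a staff row; return False if the foreign key check rejects it."""
    try:
        with con:
            con.execute("INSERT INTO STAFF VALUES (?, ?, ?)",
                        (staff_id, surname, department_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

Assigning a staff member to department 99 is rejected because no such department exists, while changing department 10 to 11 in DEPARTMENT is propagated to every linked STAFF row.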

Question 3

Alpha Hospital uses a database application to book appointments. Information can be shown in different formats.
Figure 2 shows an example of a patient’s appointments displayed in the application.
The Age field is a derived field.
(a)(i) Outline one reason why a derived field would be used in a database.
(a)(ii) Describe how the derived field Age would be calculated.
Patient information can be represented in the following format:
PATIENT (PatientID, FirstName, Surname, PreferredName, DateOfBirth)
(b) Outline the difference between first normal form (1NF) and second normal form (2NF).
(c) Construct a database in third normal form (3NF) for all of the data shown in Figure 2. You should use database notation as shown in the PATIENT table.

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.2 — The relational database model (Parts (a)(i), (a)(ii), (b), (c))

▶️ Answer/Explanation

(a)(i)
For the correct answer:
A derived field is calculated using data that exists within the database, meaning the data does not have to be input or take up storage space.

Using a derived field like Age is a smart design choice because age changes constantly while a date of birth is fixed, so storing the age directly would require relentless manual updates. Instead, the database simply calculates it on the fly from the stored DateOfBirth whenever you run a query, guaranteeing the displayed age is always perfectly current without wasting any disk space.

(a)(ii)
For the correct answer:
Select Date of Birth from the patient table and subtract the year of the birth date from the current year to get the Age.

To dynamically compute a patient’s age, the system retrieves their stored DateOfBirth and finds the difference between that and today’s date. In SQL this usually involves an expression such as `YEAR(CURDATE()) - YEAR(DateOfBirth)`, which outputs the age as a whole number. Subtracting the years alone overstates the age by one for patients whose birthday has not yet occurred this year, so a more precise calculation also compares the month and day.
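A minimal sketch of the calculation, including the adjustment for a birthday that has not yet occurred this year. The function name `age_on` is illustrative, and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Whole years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

In an application, `age_on(patient_dob, date.today())` would be evaluated each time the appointment screen is displayed, so the Age field never needs to be stored.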

(b)
For the correct answer:
The prerequisite for 2NF is 1NF. The focus of 1NF is to eliminate repeating groups and ensure atomicity of data values, while the focus of 2NF is to ensure full functional dependency by removing partial dependencies.

First normal form is the foundational level where you make sure every cell contains just one value and there are no repeating columns, essentially cleaning up the messy spreadsheet layout into a proper table. Second normal form then builds on this by tackling only tables with composite primary keys, ensuring that every non-key attribute depends on the entire composite key, not just a part of it, which further reduces redundancy.

(c)
For the correct answer:
PATIENT (PatientID, FirstName, Surname, PreferredName, DateOfBirth)
DOCTOR (DoctorID, FirstName, Surname)
DEPARTMENT (DepartmentID, departmentName)
APPOINTMENT (PatientID*, DoctorID*, date, time, DepartmentID*)

To achieve a clean 3NF design, I need to split the appointment data into distinct entities because a patient and a doctor are separate real-world things with their own properties, and each department name should only be stored once. The central APPOINTMENT table ties everything together using foreign keys, so one appointment record simply points to the correct PatientID, DoctorID, and DepartmentID, while the date and time fields are fully dependent on that unique appointment event, eliminating any transitive dependencies.

Question 4

Alpha Hospital is part of a wider health authority that manages several hospitals in the region. This health authority is planning to integrate the databases held by each hospital into a single database.
The health authority is considering a range of database models for this integration. One model is a relational database.
(a) Identify two other database models that could be used to integrate the databases held by each hospital into a single database.
The National Health Authority wants to coordinate all data records in a data warehouse. Each regional health authority will provide the data that will be mined.
(b) Define the term data warehouse.
(c) Explain why the extract, transform, load (ETL) process is used to prepare data for data warehousing.
(d) Outline the importance of timestamping in a data warehouse.
(e) Describe how data in a data warehouse is updated in real time.
(f) Describe the process of deviation detection.
Data mining techniques are being used in the data warehouse to assist the detection of patterns in the health data. The pattern detection approaches being used include cluster analysis and association rules.
(g) Compare and contrast cluster analysis and association rules as methods of identifying patterns in data mining.

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.4 — Further database models and database analysis (All parts (a) to (g))

▶️ Answer/Explanation

(a)
For the correct answer:
object-oriented; network

Apart from the classic relational model, an object-oriented database could be a powerful alternative because it stores data as objects, making it very intuitive for applications built with object-oriented programming languages. A network model is another option, which creates a more flexible web of connections between records, although it can become complex to navigate and manage compared to simpler relational tables.

(b)
For the correct answer:
Subject-oriented, integrated, time-variant, and non-volatile collection of data used in support of management decision making.

A data warehouse isn’t built for daily transaction processing like booking an appointment, but is a massive, centralised repository where data from multiple different systems is copied and organised for high-level analysis. It’s designed to be read-only, holding historical snapshots over long periods, so that managers can spot long-term trends without slowing down the operational databases.

(c)
For the correct answer:
The data required for a data warehouse may require the aggregation of data from multiple sources. The format of the data needs to be revised so it is in a consistent/standardized format before it is loaded into the data warehouse.

An ETL process is essential because each hospital in a region might have patient records in wildly different structures, with different codes for the same treatment or incompatible date formats. The ETL pipeline first extracts all this messy raw data, then transforms it by cleaning up errors, standardising naming conventions, and resolving duplicates, and finally loads this uniform, reliable dataset into the warehouse where it becomes truly useful for cross-hospital analytics.
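The transform step can be sketched as follows. The two source layouts, the field names, and the records are hypothetical; they stand in for hospitals that export the same information in incompatible formats.

```python
from datetime import datetime

# Hypothetical extracts: each hospital uses its own field names and date format.
HOSPITAL_A = [{"patient": "P1", "admitted": "03/02/2024"}]   # DD/MM/YYYY
HOSPITAL_B = [{"pid": "P2", "admit_date": "2024-02-05"}]     # YYYY-MM-DD

def transform(record, id_field, date_field, date_format, source):
    """Map one source record onto a single standard layout."""
    admitted = datetime.strptime(record[date_field], date_format).date()
    return {
        "patient_id": record[id_field],
        "admitted": admitted.isoformat(),  # one consistent date format
        "source": source,
    }

def run_etl():
    warehouse = []  # stands in for the warehouse's load target
    for rec in HOSPITAL_A:  # extract from source A, transform, load
        warehouse.append(transform(rec, "patient", "admitted", "%d/%m/%Y", "A"))
    for rec in HOSPITAL_B:  # extract from source B, transform, load
        warehouse.append(transform(rec, "pid", "admit_date", "%Y-%m-%d", "B"))
    return warehouse
```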

(d)
For the correct answer:
The timestamping of data provides a recorded representation of data at a particular date/time. This allows data from different times to be compared and enables trends to be established.

Timestamping is what makes a data warehouse time-variant, essentially giving every piece of data a historical context so you know exactly when it was valid. Without it, comparing last year’s flu admissions to this year’s would be meaningless, but with a timestamp, analysts can slice the data by any period to build accurate trend lines and track seasonal patterns.

(e)
For the correct answer:
Data is extracted in real time from transaction systems, transformed and standardised, and then the system executes transactions in short intervals to load the data into the data warehouse.

For near-real-time updates, modern systems often use change data capture, where the moment a new record hits the operational database, a lightweight trigger grabs just that change and streams it through a mini-ETL pipeline. This freshly formatted data is then committed to the warehouse almost immediately, ensuring a dashboard showing current bed occupancy is only seconds behind the live hospital system.

(f)
For the correct answer:
Data is collected, standardized and transformed into numeric values, and statistical methods are used to build a model of normal behaviour to determine outliers.

Deviation detection is like a digital watchdog that learns what ‘normal’ patient activity looks like, from average wait times to typical prescription amounts, and then sounds the alarm when something falls outside these learned thresholds. For instance, if a single pharmacy suddenly orders ten times its usual quantity of a controlled drug, the system flags this outlier for immediate investigation by the health authority.
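One simple statistical model of "normal" is the mean and standard deviation of past values. The sketch below flags anything more than three standard deviations from the mean; the threshold and the sample order quantities are assumptions chosen for illustration, and real systems may use more robust models.

```python
from statistics import mean, stdev

def find_outliers(baseline, new_values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)       # model of normal behaviour: centre
    sigma = stdev(baseline)   # model of normal behaviour: spread
    return [v for v in new_values if abs(v - mu) > threshold * sigma]
```

With a baseline of weekly orders around 100 units, an order of 1000 units is flagged while orders of 97 or 103 are treated as normal variation.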

(g)
For the correct answer:
Cluster analysis groups abstract objects into classes of similar objects, while association determines relationships based on the co-occurrence of large sets of data items. Both are unsupervised learning techniques.

Cluster analysis is like a sorting machine that puts patients into distinct groups based on measurable similarities, such as segmenting a population by common symptom patterns without any prior labels. Association rules work differently by mining for “if-then” patterns, like discovering that a diagnosis of diabetes very frequently occurs alongside a specific heart condition, and both are similar in that the computer finds these hidden patterns on its own without being told what to look for.
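The association-rule half of the comparison can be made concrete with the two standard measures, support and confidence. The sketch below covers association rules only, not clustering, and the patient records are invented for illustration.

```python
def support(transactions, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)
```

For four hypothetical records in which three patients have diabetes and two of those also have a heart condition, the rule "diabetes implies heart condition" has support 0.5 and confidence 2/3.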

Question 5

Global warming can be measured over time using mean temperatures. Figure 3 shows the mean daily maximum temperatures from 1970 to 2020 for Nauru, an island in the Pacific Ocean.
 
(a) State two variables that are necessary for the mean daily maximum temperature to be calculated.
(b) Identify the steps that would be used to create the diagram exactly as shown in Figure 3 using the data in a spreadsheet.
(c) Identify two reasons why the mean daily maximum temperature data is presented in graphical form.
An automated system is used to collect the temperature data for Nauru once an hour for one year.
(d) Outline one reason why the temperature data is collected once an hour rather than at shorter intervals, such as once a minute.
(e) Describe the steps that should be used to store the data collected for one day (24 hours) in suitable parallel one-dimensional (1D) arrays.
The time the data was collected must be easy to identify. Do not write a pseudocode algorithm.
The temperature data is downloaded every day and collated into a master file.
The data from the master file is loaded into a suitable array for that 24-hour period. The following statistics are calculated:
  • Maximum temperature
  • Minimum temperature
  • Mean temperature
As Nauru is very close to the equator, the length of the day changes very little throughout the year. For the purposes of part (f), the lengths of its day and night are:
  • Day: 07:00 to 18:59 inclusive (12 hours)
  • Night: 19:00 to 06:59 inclusive (12 hours)
(f) Construct a pseudocode algorithm to calculate the:
  • maximum temperature
  • minimum temperature
  • mean temperature
  • mean night-time temperature.
Assume the arrays for the time of day of the reading and hourly temperature readings have already been set up and populated as parallel 1D arrays.

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.1 — The basic model (Parts (a), (b), (d), (e))
• Topic B.3 — Visualization (Part (c))
• Topic B.2 — Simulations (Part (f))

▶️ Answer/Explanation

(a)
To calculate the mean daily maximum temperature, you need a variable to hold each day’s highest reading, such as `DailyMaximumTemperature`. You also need an accumulator variable, such as `TotalTemperature`, to sum these daily maximums before dividing by the count.

(b)
The steps to create the diagram would start with selecting the data cells for the mean daily maximum temperatures and the corresponding years. Next, you would choose the line graph chart type from the spreadsheet’s insert menu. Then, you would ensure the chart has the correct heading and finally label both axes appropriately, such as ‘Temperature (°C)’ for the y-axis and ‘Year’ for the x-axis.

(c)
The data is presented in graphical form firstly because displaying data visually allows trendlines to be identified, and this particular trend clearly displays the increase in mean daily temperatures over the time period shown. Secondly, complicated data like a long sequence of temperature readings is simply much easier for a person to understand and interpret quickly when it is displayed visually rather than as a table of raw numbers.

(d)
Collecting data once an hour, which results in 24 recordings over a day rather than the 1440 if collected every minute, will require significantly less data processing and less storage space. Furthermore, the minor temperature changes that might happen within each hour are negligible and should not affect the overall study of the climate, so hourly readings provide a sufficient and efficient dataset.

(e)
To store the data for a single day, you would first create two parallel one-dimensional arrays of size 24, perhaps named `TIME` and `TEMP`. You could then initialise the time array with the 24-hour clock values 00:00, 01:00, and so on, up to 23:00. Using a loop counter `N`, you would loop 24 times, and during each iteration you would display the current time element `TIME[N]` before inputting the corresponding temperature reading and storing it in `TEMP[N]`, so that the same index always identifies both a reading and the time it was taken.

(f)
The pseudocode algorithm would look like this:
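TOTAL = 0
TOTAL_NIGHT = 0
MIN = 1000      // higher than any possible reading
MAX = -1000     // lower than any possible reading
loop T from 0 to 23
    NEXT = TEMP[T]
    TOTAL = TOTAL + NEXT
    // Night runs from 19:00 to 06:59, so it spans midnight: use OR, not AND
    if TIME[T] >= 19:00 OR TIME[T] < 07:00 then
        TOTAL_NIGHT = TOTAL_NIGHT + NEXT
    end if
    if NEXT < MIN then
        MIN = NEXT
    end if
    if NEXT > MAX then
        MAX = NEXT
    end if
end loop
MEAN_TEMP = TOTAL / 24
MEAN_NIGHT_TEMP = TOTAL_NIGHT / 12   // 12 night-time readings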
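For comparison, the same algorithm can be written as a runnable Python function. This is a sketch under stated assumptions: the times are stored as strings such as "19:00", and the night-time count is tallied rather than fixed at 12, so the function also works if a reading is missing.

```python
def daily_statistics(times, temps):
    """times: strings like 'HH:00'; temps: readings in the same order (parallel lists)."""
    total = 0.0
    night_total = 0.0
    night_count = 0
    minimum = temps[0]
    maximum = temps[0]
    for time_of_day, reading in zip(times, temps):
        total += reading
        hour = int(time_of_day[:2])
        if hour >= 19 or hour < 7:   # night: 19:00 to 06:59, spanning midnight
            night_total += reading
            night_count += 1
        minimum = min(minimum, reading)
        maximum = max(maximum, reading)
    return {
        "max": maximum,
        "min": minimum,
        "mean": total / len(temps),
        "night_mean": night_total / night_count,
    }
```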

Question 6

The global increase in mean temperatures is causing concern, and governments are using computer models to determine potential future changes.
Many islands in the Pacific Ocean are close to or below sea level and have observed an increased incidence of coastal flooding. This suggests there is a relationship between the increase in mean temperatures and the increase in mean sea levels.
Figure 4 shows the mean sea level of Nauru from 1994 to 2015.
If the mean sea level rises, the probability of coastal flooding will also increase. This will put many of the inhabitants of Nauru at risk.
It has been proposed that a simulation be developed to show the effects of rising temperatures on the extent and frequency of coastal flooding.
(a) Distinguish between a model and a simulation.
(b) Describe how to identify the rules required to create a simulation from the mean temperature and mean sea level data.
(c) Evaluate how test cases could be used to effectively validate the accuracy of this proposed simulation.
(d) Discuss the advantages and disadvantages of using a simulation for decision making in the coastal areas of islands in the Pacific Ocean.

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.1 — The basic model (Part (a))
• Topic B.2 — Simulations (Parts (b), (c), (d))

▶️ Answer/Explanation

(a)
For the correct answer:
A model is a mathematical representation/abstraction of a real-life situation. A simulation is the running of a mathematical model over time on a computer.

Think of a model as a static blueprint or a set of equations that describes the relationship between temperature and sea level—it’s the design on paper. A simulation breathes life into that model by executing it on a computer, stepping through time to show what actually might happen to Nauru’s coastline each year as temperatures rise.

(b)
For the correct answer:
Collect / analyse data for any pair of temperature, sea level, and flooding; Suggest rules for the relationship; Check the suggested rules with actual data.

First, I would plot the historical temperature data against the sea level data to look for a mathematical correlation, perhaps a linear or exponential trend, that can be turned into a formula or a rule connecting the two. Once I have a candidate rule, I would feed historical temperature data into it and see if the predicted sea levels match what was actually observed, tweaking the rule until the output realistically mirrors the real-world records.
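If the relationship looks roughly linear, a candidate rule can be fitted by ordinary least squares. The sketch below is one possible approach, not the method prescribed by the mark scheme, and any numbers passed to it here are placeholders rather than Nauru's measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(rule, x):
    """Apply a fitted (slope, intercept) rule to a new x value."""
    slope, intercept = rule
    return slope * x + intercept
```

The fitted rule would then be checked by calling `predict` on temperatures from years that were held back from the fit and comparing the results with the recorded sea levels.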

(c)
For the correct answer:
Find historical data not impacted by external factors; Input the observed data; Verify the simulation against the known results; Modify the model if necessary. This use of test cases will help to improve the model and therefore make the simulation more accurate.

An effective way to validate is to use a technique called back-testing, where I would run the simulation on a past decade of data that we already know the outcome for, like the years 2000–2010, and check if the simulation’s predicted flooding matches the historical records. If the simulation’s forecast for that period aligns closely with the real documented floods, I gain confidence in its reliability for predicting future scenarios, but if it’s off, I must go back and refine the underlying rules and algorithms until the test cases pass.

(d)
For the correct answer:
Advantages: It can produce measurable visual results and help predict flooding so communities can be prepared. Disadvantages: It is a crude solution only testing one or two factors and does not consider social or cultural factors.

A simulation gives government planners a powerful, risk-free sandbox to visualise “what-if” scenarios, such as testing whether a proposed sea wall will save a village under the worst-case temperature rise by 2100, all without spending a dollar on concrete. On the downside, a simulation is a radical simplification of reality and cannot model the deep cultural trauma of relocating a community from their ancestral land. A flawed simulation might nevertheless be the only tool available, leading to decisions that work on paper but are socially devastating.

Question 7

Organizations such as Earth.Org have raised concerns about the rate of sea level increase for Nauru. They stated, “Sea level rise is 2 to 3 times faster around Nauru than the global average, putting its freshwater supplies and crops at risk of saltwater contamination. Already reliant on economic aid, Nauru’s basic resource needs may have to be acquired externally for life to be sustained on the island.”
The 2D visualisation in Figure 5 shows the projected impact of mild and extreme sea level increases on Nauru by 2100.
(a) Define the term visualization.
There are proposals to develop a 3D visualization of the impact of rising sea levels on Nauru.
(b) Outline the relationship between images stored in memory and 3D visualizations.
(c) Discuss the time and memory considerations of 3D animation in the proposed 3D visualization for Nauru.

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.3 — Visualization (All parts (a) to (c))

▶️ Answer/Explanation

(a)
For the correct answer:
Visualization is a graphical representation of data.

A visualization transforms abstract numbers, like projected sea levels, into a graphical format that our eyes and brain can instantly process. Instead of reading a table of elevation figures, a colour-coded map of Nauru showing blue where the water will encroach makes the existential threat immediately obvious to anyone.

(b)
For the correct answer:
The image in memory is stored as a mathematical model. Images in memory are rendered to create a 3D visualization.

Inside the computer’s memory, a 3D scene of a flooded Nauru isn’t stored as a photograph but as a mathematical wireframe model defining the precise coordinates of every building and coastline. The graphics processor then takes that mathematical description and performs rendering, calculating lighting, shadows, and textures to convert it into the final 2D image on the screen that looks three-dimensional to us.
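The core of that conversion from 3D coordinates to a 2D image is a projection. The sketch below shows a simple perspective projection under stated assumptions: the camera sits at the origin looking along the positive z axis, and lighting, shading, and textures are ignored.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (x, y, z) in camera space onto the 2D image plane.

    Assumes the camera looks along +z and that z > 0.
    """
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)
```

Dividing by z is what makes distant objects appear smaller: the same building corner drawn twice as far away lands half as far from the centre of the image.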

(c)
For the correct answer:
3D animation is very complex in terms of programming and requires a lot of time for processing/rendering. Rendering different layers and transitions requires a lot of RAM and may also require the use of secondary memory/GPU, which has the issue of a different processing speed. Therefore, 3D animation requires sufficient, fast primary and secondary memory.

Creating a smooth fly-through animation of Nauru’s future flooding is extremely demanding. Each frame showing water creeping inland could take minutes or even hours to render, so a full five-minute animation might tie up a powerful computer for days. On the memory side, the system must hold the 3D mesh of the island’s terrain and all the high-resolution textures for sand, vegetation, and buildings in RAM simultaneously. If working memory runs out, the process spills over onto much slower secondary storage and rendering slows dramatically, so the project needs plenty of fast RAM and GPU memory.
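The time and memory estimates can be checked with simple arithmetic. The figures used here (24 frames per second, one minute of rendering per frame, 4096 by 4096 textures at 4 bytes per pixel) are assumptions chosen for illustration, not values from the question.

```python
def render_hours(duration_minutes, frames_per_second, minutes_per_frame):
    """Return (number of frames, total rendering time in hours)."""
    frames = duration_minutes * 60 * frames_per_second
    return frames, frames * minutes_per_frame / 60

def texture_mib(width, height, bytes_per_pixel=4):
    """Uncompressed size of one texture in MiB."""
    return width * height * bytes_per_pixel / (1024 * 1024)
```

Under these assumptions a five-minute animation has 7200 frames and needs 120 hours, or five days, of rendering, and each uncompressed texture occupies 64 MiB.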

Question 8

Genetic algorithms are based on Darwin’s theory of natural selection. The process selects the fittest individuals.
(a) Identify three of the phases in a genetic algorithm.
Developments in machine learning and neural networks have led to simulations that are able to beat the leading human players of games. For example, Deep Blue beat Garry Kasparov in chess, and AlphaGo beat Fan Hui in go.
The neural network used to defeat Fan Hui used the go board (see Figure 6) as an input device.
Within the neural network there is a policy layer that selects the next move. There is also a value network that predicts the winner of the game.
(b) State two components of a neural network.
(c) Identify the steps that could be used to train the neural network used for the simulation of go to recognize patterns of play.
(d) Explain how supervised learning and unsupervised learning could lead to different outputs from the neural network.
Neural networks can be used in a variety of contexts, such as for predicting outcomes of board games like go and chess or for natural language processing.
(e) Explain how advances in natural language processing have improved the accuracy of the predictions of neural networks.

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.4 — Communication modelling and simulation (Parts (a), (b), (c), (d), (e))

▶️ Answer/Explanation

(a)
For the correct answer:
Selection; Crossover; Mutation

A genetic algorithm starts by evaluating the current population against a fitness function and then selects the strongest candidates to be parents for the next generation, much like natural selection favouring the fittest. These selected solutions then undergo crossover, where parts of two good solutions are combined to create a potentially even better offspring, and finally, random mutation tweaks a small element to maintain diversity and prevent the algorithm from getting stuck at a local maximum.
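
The three phases can be sketched in Java as follows. This is only an illustrative sketch, not part of the mark scheme: the bit-string chromosome, the "count the 1 bits" fitness function, tournament selection, single-point crossover, and the population size and mutation rate are all assumptions chosen to keep the example short.

```java
import java.util.Random;

/** Minimal genetic algorithm sketch (hypothetical task: maximise the number of 1 bits). */
final class GeneticSketch {
    static final int GENES = 16;              // assumed chromosome length
    static final int POPULATION = 20;         // assumed population size
    static final double MUTATION_RATE = 0.05; // assumed per-gene mutation probability

    // Fitness function: the number of genes set to true.
    static int fitness(boolean[] chromosome) {
        int score = 0;
        for (boolean gene : chromosome) {
            if (gene) score++;
        }
        return score;
    }

    // Selection: pick two random individuals and keep the fitter one (tournament).
    static boolean[] select(boolean[][] population, Random rng) {
        boolean[] a = population[rng.nextInt(population.length)];
        boolean[] b = population[rng.nextInt(population.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    // Crossover: genes before a random cut point come from parent 1, the rest from parent 2.
    static boolean[] crossover(boolean[] p1, boolean[] p2, Random rng) {
        int point = rng.nextInt(GENES);
        boolean[] child = new boolean[GENES];
        for (int i = 0; i < GENES; i++) {
            child[i] = i < point ? p1[i] : p2[i];
        }
        return child;
    }

    // Mutation: flip each gene with a small probability to keep diversity.
    static void mutate(boolean[] chromosome, Random rng) {
        for (int i = 0; i < chromosome.length; i++) {
            if (rng.nextDouble() < MUTATION_RATE) chromosome[i] = !chromosome[i];
        }
    }

    // Runs the loop for a number of generations and returns the best fitness found at the end.
    static int run(int generations, long seed) {
        Random rng = new Random(seed);
        boolean[][] population = new boolean[POPULATION][GENES];
        for (boolean[] c : population) {
            for (int i = 0; i < GENES; i++) c[i] = rng.nextBoolean();
        }
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[POPULATION][];
            for (int k = 0; k < POPULATION; k++) {
                boolean[] child = crossover(select(population, rng), select(population, rng), rng);
                mutate(child, rng);
                next[k] = child;
            }
            population = next;
        }
        int best = 0;
        for (boolean[] c : population) best = Math.max(best, fitness(c));
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Best fitness after 50 generations: " + run(50, 42L));
    }
}
```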

(b)
For the correct answer:
Input layer; Hidden layer(s)

At its simplest, a neural network has an input layer that receives the raw data, which in the case of Go would be the state of every position on the board. Between the input and the final output, there are one or more hidden layers where the actual deep processing and pattern recognition magic happens through weighted connections.
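
To make the idea of layers and weighted connections concrete, here is a minimal Java sketch of a single fully connected layer. It is an illustration only: the sigmoid activation, the layer sizes, and the weight values are assumptions, and a real Go network would be far larger and structured differently.

```java
/** Minimal sketch of one fully connected neural network layer (all values hypothetical). */
final class TinyNetwork {
    // Sigmoid activation squashes any weighted sum into the range (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Computes one layer: each output neuron takes a weighted sum of all inputs plus a bias.
    // weights[j][i] is the connection strength from input i to output neuron j.
    static double[] layer(double[] inputs, double[][] weights, double[] biases) {
        double[] outputs = new double[biases.length];
        for (int j = 0; j < outputs.length; j++) {
            double sum = biases[j];
            for (int i = 0; i < inputs.length; i++) {
                sum += inputs[i] * weights[j][i];
            }
            outputs[j] = sigmoid(sum);
        }
        return outputs;
    }

    public static void main(String[] args) {
        double[] input = {1.0, 0.0, -1.0};                       // input layer (e.g. three board points)
        double[][] hiddenWeights = {{0.5, -0.2, 0.1}, {0.3, 0.8, -0.5}};
        double[] hidden = layer(input, hiddenWeights, new double[]{0.0, 0.1}); // hidden layer
        double[] output = layer(hidden, new double[][]{{1.0, -1.0}}, new double[]{0.0});
        System.out.println("Network output: " + output[0]);
    }
}
```

Training would consist of adjusting the weight and bias values so that the outputs move closer to the desired results.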

(c)
For the correct answer:
Teach the rules for moves allowed in the game; Allow the neural network to predict moves based on previous moves; Play against human players of differing abilities; Play against itself; Use feedback loops to modify its decision making; Link back to the value network to judge how successful it has been.

Training begins by feeding the network thousands of historical professional Go games so it can learn fundamental strategies and common patterns of stone placement by imitation. The most powerful next step is self-play, where the AI plays millions of games against a copy of itself, and after each match, the value network judges who won, and the policy network updates its weights to make moves that led to victory more likely in the future, creating a continuous feedback loop of improvement.

(d)
For the correct answer:
Supervised learning uses labelled data to teach known outcomes based on given inputs, which can make the results more predictable. Unsupervised learning uses unlabelled data and the AI learns by looking for patterns and relationships on its own, which can lead to unpredictable outcomes.

If a Go neural network is trained using supervised learning on labelled historical games, where each position is paired with the move an expert actually played, its output tends to be predictable and to mimic strong human players. Unsupervised learning, in contrast, gives the network unlabelled data and leaves it to find statistical structure on its own, so the output is less predictable and may reflect patterns no human labeller would have chosen. AlphaGo’s famous “move 37” is often cited as an example of such novel play, although it arose from reinforcement learning through self-play rather than from unsupervised learning in the strict sense.

(e)
For the correct answer:
Pre-trained language models are pre-trained on large data sets and can be fine-tuned for specific tasks with a smaller set of labelled data. Contextual embedding represents words with a context, which greatly enhances semantic understanding and reduces ambiguity.

Advances like Transformer architectures and contextual embeddings mean a neural network no longer treats a word like “bank” as a single context-free token; it uses the surrounding text to determine whether the word refers to a financial institution or a riverbank, resolving ambiguity that defeated older systems. This richer representation of meaning allows predictions in tasks like translation or question answering to be more accurate, because the model conditions on context rather than matching words in isolation.

Question 9

Alexia and Jay are discussing their use of online resources to prepare for their IB examinations. They often visit websites such as BBC Bitesize.
Alexia refers to accessing the websites as “surfing the internet” and Jay refers to the resources as being “on the web”.
(a) Distinguish between the internet and the World Wide Web.
The uniform resource locator (URL) for the BBC Bitesize website is as follows: https://www.bbc.co.uk/bytesize/index.htm
(b) Identify two characteristics of a URL.
Jay wants to transfer a file to Alexia and suggests they use file transfer protocol (FTP).
(c) Identify two characteristics of file transfer protocol.
(d) Explain how a web browser functions.

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.1 — Creating the web (All parts (a) to (d))

▶️ Answer/Explanation

(a)
For the correct answer:
The internet is a global network of inter-connected computers using internet protocols. The World Wide Web is a service on the internet, a collection of information and resources accessed via the internet using web browsers and protocols like HTTP.

The internet is the physical hardware infrastructure, a massive global network of cables, routers, and servers that connects billions of devices, while the World Wide Web is just one of the many services that run on top of that network: the system of hyperlinked documents and resources accessed through a web browser using HTTP.

(b)
For the correct answer:
Protocol – https; Host – www.bbc.co.uk

A URL is packed with identifying information, starting with the protocol which tells the browser how to communicate with the server, which in this case is the secure HTTPS. Following that, the host or domain name uniquely identifies which specific computer on the vast internet hosts the files you want, here directing you to the BBC’s server.
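
The parts of the URL from the question can be pulled apart with Java's standard `java.net.URI` class. This is a small sketch for illustration; the class name `UrlParts` is hypothetical, and the URL is copied exactly as it appears in the question.

```java
import java.net.URI;

/** Splits the URL from the question into its components. */
final class UrlParts {
    public static void main(String[] args) {
        URI uri = URI.create("https://www.bbc.co.uk/bytesize/index.htm");
        System.out.println("Protocol: " + uri.getScheme()); // https
        System.out.println("Host:     " + uri.getHost());   // www.bbc.co.uk
        System.out.println("Path:     " + uri.getPath());   // /bytesize/index.htm
    }
}
```

The path (`/bytesize/index.htm`) is a third characteristic that could also be identified: it names the resource on the host.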

(c)
For the correct answer:
FTP establishes two connections, one for control and one for data transfer; FTP facilitates the transfer of files efficiently between a client and a server.

FTP uses a dual-channel approach: one connection, conventionally on port 21, carries the commands and login credentials, while a separate connection (port 20 in classic active mode) is dedicated to transferring the actual file data. This separation means that even while a large file is being transferred, control commands such as abort can still be sent without interfering with the data stream.

(d)
For the correct answer:
Applies the appropriate protocols to enable communication with the web server; Provides a way to navigate to, access and fetch web pages using HTTP Request and Response; The data, typically written in a markup language such as HTML, is rendered to display web pages properly.

When you type a web address, the browser first contacts a DNS server to resolve the domain name in the URL into an IP address, then sends a structured HTTP GET request to the web server. Once the server replies with the HTML, CSS, and JavaScript files, the browser acts as an interpreter, parsing the code and laying out the text, images, and interactive elements on your screen in a process called rendering.

Question 10

Alexia and Jay are researching the web development language PHP. They type the phrase “PHP” directly into a web browser (see Figure 7).
The web browser redirects them to a popular search engine, which executes a search.
(a) Define the term search engine.
The operation of a search engine can be divided into three steps (see Figure 8).
(b) Describe how a web crawler functions.
(c) Outline why keywords are important for web indexing.
(d) Discuss whether an organization should use black hat search engine optimization (SEO) techniques to improve the ranking of its website.
Search engines return a very large number of results, but many of the web pages are not useful. The search needs to be refined.
Alexia and Jay’s teacher recommended that they use an online database that accesses the deep web.
(e) Distinguish between the surface web and the deep web.
One of the online databases provides Alexia and Jay with the following code:
<?php
if(isset($_FILES['CV'])){
    $errors = array();
    $file_name = $_FILES['CV']['name'];
    $file_size = $_FILES['CV']['size'];
    $file_tmp = $_FILES['CV']['tmp_name'];
    $file_type = $_FILES['CV']['type'];
    $file_ext = strtolower(end(explode('.',$_FILES['CV']['name'])));
    $extensions = array("pdf","doc","docx");
    if(in_array($file_ext,$extensions) === false){
        $errors[] = "This file extension not allowed";
    }
    if($file_size > 2097152){
        $errors[] = "File size must be under 2 Mb";
    }
    if(empty($errors) == true){
        move_uploaded_file($file_tmp,"CV/".$file_name);
        echo "Success";
    }else{
        print_r($errors);
    }
}
?>
<html>
<body>
<h1>Curriculum Vitae</h1>
<form action = "" method = "POST" enctype = "multipart/form-data">
<input type = "file" name = "CV"/>
<input type = "submit"/>
</form>
</body>
</html>
(f) Identify four steps that take place during the processing of this PHP code.
The PHP code is processed on the server.
(g) Explain why an organization would choose to use server-side processing rather than client-side processing when delivering content to the client.

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.2 — Searching the web (Parts (a), (b), (c), (d), (e))
• Topic C.1 — Creating the web (Parts (f), (g))

▶️ Answer/Explanation

(a)
A search engine is a software system, program, or application that searches the World Wide Web or a database for keywords that match the user’s specification. It can also apply filters such as date, usage rights, size, and currency if used.

(b)
A web crawler, also called a bot or spider, starts at a designated “seed” or starting page and reviews and categorises web pages based on criteria for the information searched for. It looks for keywords, content, hyperlinks, and meta tags, and then follows hyperlinks from page to page. The crawler can move through a site either depth-first or breadth-first, often copying part or all of the content of a visited page. This review can be stopped by rules set in a site’s robots.txt file, and the crawler continues this cycle, constantly updating the index with new or changed content.
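
The crawl cycle described above can be sketched in Java. This is a simplified illustration under stated assumptions: the "web" is a small in-memory map of hypothetical page names rather than real HTTP requests, the traversal is breadth-first, and the `disallowed` set stands in for rules read from robots.txt.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Breadth-first crawler sketch over an in-memory link map (all page names hypothetical). */
final class CrawlerSketch {
    // Returns the pages visited, in order, starting from the seed page.
    static List<String> crawl(Map<String, List<String>> links, String seed, Set<String> disallowed) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();        // pages already queued, to avoid loops
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty()) {
            String page = frontier.poll();         // FIFO queue gives breadth-first order
            if (disallowed.contains(page)) continue; // respect the robots.txt stand-in
            visited.add(page);                     // a real crawler would index the content here
            for (String target : links.getOrDefault(page, List.of())) {
                if (seen.add(target)) frontier.add(target); // follow each new hyperlink once
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "/home", List.of("/news", "/about"),
            "/news", List.of("/home", "/private"),
            "/about", List.of("/news"),
            "/private", List.of("/secret"));
        // Prints [/home, /news, /about]; /private is skipped, so /secret is never discovered.
        System.out.println(crawl(links, "/home", Set.of("/private")));
    }
}
```

Replacing the queue with a stack would turn the same loop into a depth-first crawl.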

(c)
Web crawlers look for keywords in a page’s meta tags (such as the meta keywords and meta description), its title, and potentially its URL. They then determine how many times the keywords appear in the body of the page. This information is used by the ranking algorithm as part of the ranking process, making keywords critically important for web indexing and supporting Search Engine Optimization (SEO).
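
Counting how often a keyword appears in the body of a page can be sketched in a few lines of Java. This is an illustration only: the class name, the sample text, and the simple "split on non-word characters" tokenisation are assumptions, and real indexers use far more sophisticated text processing.

```java
import java.util.Locale;

/** Counts case-insensitive whole-word occurrences of a keyword in page text. */
final class KeywordCount {
    static int count(String body, String keyword) {
        String wanted = keyword.toLowerCase(Locale.ROOT);
        int hits = 0;
        // Split on runs of non-word characters so punctuation does not hide a match.
        for (String word : body.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.equals(wanted)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String body = "PHP is a language. Learn PHP; php rocks!";
        System.out.println(count(body, "php")); // prints 3
    }
}
```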

(d)
An organization must weigh the short-term benefits against the long-term consequences. Black hat SEO involves manipulating search engine rules to gain a higher ranking, which can bring increased traffic, more visitors, and potentially increased revenue, directing users to content the developer wants them to see. Techniques include keyword stuffing, poor quality or duplicated content, hidden keywords, paid links, link farming, and cloaking. However, the disadvantages are severe: if detected, a search engine can penalize the site, resulting in a lower score, blacklisting, or being flagged as an unsafe site, causing reputational damage. Although initial ranking may improve, the long-term score can drop significantly. Furthermore, there are ethical issues concerning the provision of inaccurate or unreliable content. Ultimately, the long-term risk of being de-indexed and suffering reputational harm outweighs the temporary gain in traffic, making it an unsound strategy.

(e)
The deep web is a part of the World Wide Web that is not indexed by search engines and therefore is not discoverable by normal search engines; this includes databases and dynamic pages that require authentication. The surface web, in contrast, is indexed by common search engines and is therefore accessible to most users, being searchable by normal or common search engines like Google or Bing.

(f)
(Output of the PHP page, for reference: a “Curriculum Vitae” heading followed by a file-selection input and a submit button.)

Four steps during the processing of this PHP code are: first, a user selects a file and clicks the submit button. Second, data is extracted from the file, including file size, file type, and name. Third, the file is checked against criteria in conditional statements for file extension (which must be pdf, doc, or docx) and for file size (which must not be greater than 2,097,152 bytes). Fourth, if there are errors, they are added to the error array; if the error array is empty, the file is uploaded and a success message is printed; otherwise, the errors from the array are printed.
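
The validation performed in the third and fourth steps can be mirrored in Java to make the logic easier to trace. This is a sketch for illustration, not a translation of the whole script: the class and method names are hypothetical, and only the extension check, the size check, and the error list are reproduced (the actual file move is omitted).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Mirrors the extension and size checks from the PHP upload script. */
final class UploadCheck {
    static final Set<String> ALLOWED = Set.of("pdf", "doc", "docx");
    static final long MAX_BYTES = 2_097_152L; // 2 MiB, the limit used in the PHP code

    // Returns the list of errors; an empty list means the upload would proceed.
    static List<String> validate(String fileName, long sizeBytes) {
        List<String> errors = new ArrayList<>();
        // Text after the last '.', lower-cased (the whole name if there is no '.').
        String ext = fileName.substring(fileName.lastIndexOf('.') + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED.contains(ext)) {
            errors.add("This file extension not allowed");
        }
        if (sizeBytes > MAX_BYTES) {
            errors.add("File size must be under 2 Mb");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("cv.PDF", 150_000));   // prints []
        System.out.println(validate("cv.exe", 3_000_000)); // prints both error messages
    }
}
```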

(g)
When an organization chooses server-side processing, the processing of the script occurs on the web server rather than on the browser of the client. This results in a consistent result regardless of the processing capacity of the client device, and only the processed result is seen by the user. Crucially, the processing and underlying data are secure on the server, offering a more consistent experience for the end user and greater control for the site owner.

Question 11

A social media file-sharing website allows users to upload their original video content.
The site provides an upload module that uses lossless compression.
(a) Outline how lossless compression maintains the quality of a media file.
When a file is decompressed, the content is rendered to fit a standard set of sizes, aspect ratios, and frame rates. Standard video formats, such as MP4, are used for the output of media.
The file format MP4 or MPEG-4 is an open standard for media files.
(b) Identify two characteristics of an open standard.
Social media sites use a distributed system where the data is stored in many countries.
In the European Union, Argentina, and the Philippines the right to be forgotten has been established as law. The right to be forgotten means a person has the right to have private information removed from databases and applications so it cannot be found in an internet search.
(c) Discuss the impact of the decentralized web on an individual’s right to privacy.
Xero, a small software company, was established in 2006 in Wellington, New Zealand. It produced an easy-to-use accountancy service for small to medium enterprises. This product was based on the software-as-a-service (SaaS) model and is sold in a subscription format all over the world. Customer data is securely stored online as part of the subscription cost.
(d) Explain how developments in the web have enabled small companies such as Xero to have a global reach.

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.3 — Distributed approaches to the web (Parts (a), (b))
• Topic C.4 — The evolving web (Parts (c), (d))

▶️ Answer/Explanation

(a)
For the correct answer:
Lossless compression allows the compressed media to be reconstructed perfectly and completely, with no data deleted during the compression. It uses a shorthand version to replace repeating elements.

Instead of discarding any information like lossy compression does, lossless algorithms find and encode statistical patterns, such as replacing a long string of identical colour pixels with a short code saying “repeat this pixel value 500 times.” When the file is decompressed, every single bit of the original data is perfectly restored, so the video frame is mathematically identical to what was first captured.
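
Run-length encoding is one simple lossless scheme that matches the "repeat this value N times" description. The Java sketch below is an illustration, not the method any particular video codec uses; it assumes the input contains no digit characters, since digits are used to store the run lengths.

```java
/** Run-length encoding sketch: lossless because decode(encode(x)) always equals x. */
final class RunLength {
    // Replaces each run of identical characters with "<count><character>".
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int run = 1;
            while (i + run < input.length() && input.charAt(i + run) == c) run++;
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    // Rebuilds the original text by expanding every "<count><character>" pair.
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        for (char c : encoded.toCharArray()) {
            if (Character.isDigit(c)) {
                count = count * 10 + (c - '0'); // accumulate multi-digit counts
            } else {
                out.append(String.valueOf(c).repeat(count));
                count = 0;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pixels = "RRRRGGB";                 // hypothetical row of pixel colours
        String packed = encode(pixels);
        System.out.println(packed);                // prints 4R2G1B
        System.out.println(decode(packed).equals(pixels)); // prints true
    }
}
```

Note that the scheme only saves space when the data contains long runs; a row with no repeats would actually grow.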

(b)
For the correct answer:
A standard that is openly accessible and usable by anyone; Not owned by any governing body or private entity; Designed to ensure interoperability between systems and platforms.

An open standard like MP4 is essentially a publicly documented recipe that any software developer can read and implement without needing permission from a single controlling corporation. This openness promotes interoperability: a file written by one conforming program should play on other devices and platforms, which reduces the fragmentation and lock-in associated with proprietary formats.

(c)
For the correct answer:
The decentralised web uses peer-to-peer networks where no single entity has control. Positive impacts: individual control means the user can allow or restrict information sharing, and ownership of the data remains with the user. Negative impacts: since there is no overall control, there is the ability to publish any information by anyone, making it harder for an individual to take down a page. Because data is stored across multiple nodes, there is a greater surface area for potential breaches or leaks. The right to be forgotten becomes nearly impossible to enforce on a truly decentralised system.

On the one hand, a decentralised architecture liberates individuals from surveillance by giant tech companies, giving them true ownership and cryptographic control over their personal data without a middleman mining it for profit. However, this same lack of a central authority becomes a privacy nightmare when harmful content or personal secrets are leaked, because there is no company headquarters to send a takedown request to; the data is replicated across countless independent nodes worldwide, making a right to be forgotten practically unenforceable under current law.

(d)
For the correct answer:
Economic and operational efficiency: no need to develop physical software packages or large infrastructure; cloud storage protects data from loss. Accessibility and scalability: users from any region can access the platform without physical distribution; widespread smartphone and internet access increases the user base.

The advent of cloud computing and SaaS meant a tiny startup like Xero in New Zealand didn’t need to build a global network of offices and distribution channels; it could simply launch a website and instantly offer its accounting service to anyone with a web browser. Combined with digital marketing and secure online payment gateways, the web effectively erased the geographical and logistical barriers that would have buried a small company just two decades earlier, allowing them to compete on a level playing field with global giants.

Question 12

The bowtie structure of the web is described as a web graph.
Figure 9 shows the bowtie structure of the web based on two web crawls – one in 2000 and one in 2012.
(a)(i) Define the term node.
(a)(ii) Define the term edge.
(b) Define the term strongly connected core (SCC).
Compare the 2000 and 2012 structures in Figure 9. The diameter (relative importance) of the SCC has changed.
In 1980, Robert Metcalfe proposed that the influence or effect of a telecommunications network is influenced by the number of connections in that network.
This can be expressed as a mathematical equation:
\[\text{effect} = \frac{n(n - 1)}{2}\]
Where:
  • n is the number of nodes
  • effect is the number of edges.
(c) With reference to the bowtie structure of the web, explain whether Metcalfe’s law is a useful measure in determining the connectedness of the web.

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.5 — Analysing the web (All parts (a) to (c))

▶️ Answer/Explanation

(a)(i)
For the correct answer:
A node is a point in the network where connections intersect or branch out, usually represented as a website or page, sometimes called a vertex.

In the web graph, each individual webpage like a news article or a blog post is represented as a single node, forming the fundamental dot in the bowtie diagram that hyperlinks can point to or originate from.

(a)(ii)
For the correct answer:
An edge is the link or connection between nodes, representing a hyperlink from one page to another.

Every clickable hyperlink on a webpage is modelled as a directed edge in the graph, connecting the source node to the destination node and capturing the directional nature of web navigation.

(b)
For the correct answer:
The Strongly Connected Core (SCC) is a subset of nodes where each node is connected to every other node through one or more direct links.

The SCC sits at the heart of the bowtie, representing the dense central hub of the web where you can navigate from any page to any other just by clicking links, and comparing the 2000 and 2012 crawls shows the diameter of this core grew significantly, indicating the central hub became more tightly interconnected over that decade.

(c)
For the correct answer:
Metcalfe’s Law assumes full connectivity between all nodes, but the bowtie structure shows only the SCC is fully connected while the IN and OUT components are not connected to each other or the entire network. Therefore, Metcalfe’s Law overestimates overall connectedness because it does not account for directionality or the disconnected parts of the web.

Using Metcalfe’s formula \(n(n-1)/2\) to model web connectedness would assume that every web page is directly connected to every other. The bowtie model shows instead that large IN and OUT components are only linked to the core in one direction, so the law greatly overestimates how navigable the web actually is.
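
The gap between Metcalfe’s count and real navigability can be illustrated with a tiny directed graph. The Java sketch below is an assumption-laden toy, not data from Figure 9: it uses four hypothetical nodes (one IN page, two SCC pages, one OUT page) and counts how many pairs of pages can reach each other in both directions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Compares Metcalfe's n(n-1)/2 with mutual reachability in a toy bowtie graph. */
final class BowtieSketch {
    // Metcalfe's law: number of possible connections between n nodes.
    static long metcalfe(long n) {
        return n * (n - 1) / 2;
    }

    // All nodes reachable from start by following directed edges (includes start itself).
    static Set<Integer> reachable(List<List<Integer>> adj, int start) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        seen.add(start);
        while (!stack.isEmpty()) {
            int u = stack.pop();
            for (int v : adj.get(u)) {
                if (seen.add(v)) stack.push(v);
            }
        }
        return seen;
    }

    // Counts unordered pairs {u, v} where u can reach v and v can reach u.
    static int mutuallyReachablePairs(List<List<Integer>> adj) {
        int n = adj.size();
        List<Set<Integer>> reach = new ArrayList<>();
        for (int u = 0; u < n; u++) reach.add(reachable(adj, u));
        int pairs = 0;
        for (int u = 0; u < n; u++) {
            for (int v = u + 1; v < n; v++) {
                if (reach.get(u).contains(v) && reach.get(v).contains(u)) pairs++;
            }
        }
        return pairs;
    }

    // Toy bowtie: node 0 = IN, nodes 1 and 2 = SCC, node 3 = OUT.
    static List<List<Integer>> tinyBowtie() {
        return List.of(List.of(1), List.of(2, 3), List.of(1), List.of());
    }

    public static void main(String[] args) {
        System.out.println("Metcalfe predicts: " + metcalfe(4));                         // 6
        System.out.println("Mutually reachable: " + mutuallyReachablePairs(tinyBowtie())); // 1
    }
}
```

In this toy graph the formula counts 6 possible connections, but only the two SCC pages can reach each other in both directions.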

Question 13

The evolution of the web has been described as taking place in three stages: its beginnings as Web 1.0, the static web; Web 2.0, the “social or read-write” web; and finally Web 3.0, the Semantic web.
(a) Describe the aims of the Semantic Web.
Ontologies are a technology associated with the Semantic Web.
(b) Outline two reasons why ontologies rather than folksonomies will enable computer systems like search engines to work in cooperation with people.
Connected devices like Amazon’s Alexa or Apple’s HomePod are often described as examples of ambient intelligence, while some have described search engines as collective intelligence.
(c) Distinguish between ambient intelligence and collective intelligence.
(d) To what extent does the use of collective intelligence contribute to the development and evolution of search engines?

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.6 — The intelligent web (All parts (a) to (d))

▶️ Answer/Explanation

(a)
For the correct answer:
The semantic web is developed using well-structured data and tags/metadata in such a way that it can be read directly by computers. The Semantic Web aims to allow different systems, platforms, and applications to understand and share data meaningfully.

The core vision of the Semantic Web is to move from a web of simple linked documents to a web of linked data with rich, machine-readable meaning, where a computer can understand that a string of numbers on a page isn’t just text but a specific date, price, or location. This enables intelligent agents to autonomously integrate information across different websites, so your calendar could automatically cross-reference a concert listing with your schedule and book tickets without you lifting a finger.

(b)
For the correct answer:
Ontologies use a common formal language, which allows a more consistent and deeper understanding of the relationships. The formal structure better suits computer systems than the informal structure of folksonomy, reducing ambiguity and improving search.

An ontology defines a strict, shared vocabulary with explicit rules for how concepts relate to each other, so a machine knows with certainty that “Canis lupus” and “gray wolf” refer to the exact same biological entity. A folksonomy, like user-generated hashtags on social media, is inherently chaotic and ambiguous—the tag #apple could refer to the fruit, the technology company, or a record label—and this informal messiness makes it impossible for a search engine to reliably infer precise meaning and cooperate with a human’s true intent.

(c)
For the correct answer:
Ambient intelligence is an electronic system that is sensitive and responsive to the presence of people. Collective intelligence is group intelligence where collaborations and consensus contribute to understanding, such as social media or crowdsourcing.

Ambient intelligence is all about the environment becoming smart and adaptive around an individual, like a room that dims the lights and plays your favourite music the moment you walk in based on biometric sensors and learned preferences, focusing entirely on that one person’s immediate context. Collective intelligence is the opposite concept scaled up, where the aggregated, messy contributions of thousands or millions of people—such as the combined editing history of a Wikipedia article or the aggregated clickstream data of all Google users—create an intellectual output that is far smarter and more accurate than any single expert could produce alone.

(d)
For the correct answer:
Collective intelligence contributes fundamentally and extensively to the evolution of search engines. Modern search engines operate by harvesting the collective intelligence of all users through implicit feedback loops.

Google’s foundational PageRank algorithm is itself a form of collective intelligence, determining a page’s authority not by a human editor’s judgment but by mathematically analysing the collective linking behaviour of millions of independent webmasters, which is a form of crowd-sourced voting. Beyond this, every time a user types a query and then clicks one result while ignoring the other nine blue links, that action feeds back into the machine learning models, making the search engine smarter for everyone else, and techniques like analysing aggregated search logs to correct spelling mistakes or predict trending queries mean the system is continuously refined by the hive mind of its entire global user base on a minute-by-minute basis.
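
The idea of links acting as votes can be illustrated with the simplified textbook form of PageRank. The Java sketch below is an assumption-laden illustration, not Google's production system: it uses a hypothetical three-page graph, a damping factor of 0.85, a fixed number of iterations, and assumes every page has at least one outgoing link.

```java
import java.util.Arrays;

/** Simplified PageRank by repeated redistribution of rank along links (textbook form). */
final class PageRankSketch {
    // outLinks[p] lists the pages that page p links to; every page must link somewhere.
    static double[] rank(int[][] outLinks, int iterations, double damping) {
        int n = outLinks.length;
        double[] ranks = new double[n];
        Arrays.fill(ranks, 1.0 / n);                 // start with equal rank everywhere
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);  // small baseline share for every page
            for (int page = 0; page < n; page++) {
                for (int target : outLinks[page]) {
                    // Each page splits its rank equally among the pages it links to.
                    next[target] += damping * ranks[page] / outLinks[page].length;
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Hypothetical graph: page 0 -> 2, page 1 -> 2, page 2 -> 0.
        int[][] links = {{2}, {2}, {0}};
        System.out.println(Arrays.toString(rank(links, 50, 0.85)));
    }
}
```

Page 2 receives links from both other pages and ends up with the highest rank, while page 1, which nobody links to, ends up with the lowest.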

Question 14

A company sells healthcare products. Each product type is associated with a particular brand and brand price.
A system is designed to manage product sales, customers, and suppliers.
The Product class keeps details of a product. The following shows part of the code for this class:

public class Product {

private String prodCode; // eg X123
private String prodType; // eg Sunscreen
private String prodDescription; // about the product
private Brand prodBrand; // an object of type Brand
private int prodSale; // number of units sold

// Constructor
// code missing for constructor method

public int getProdSale() {
return prodSale;
}

public Brand getProdBrand() {
return prodBrand;
}

// all accessor and mutator methods are present but not shown

} // end of Product class

The Brand class keeps details of a particular brand. The following shows part of the code for this class:

public class Brand {

private String brandName; // eg Safesun
private float brandPrice; // price of the product of this brand

public Brand(String brandName, float brandPrice) {
this.brandName = brandName;
this.brandPrice = brandPrice;
}

public float getBrandPrice() {
return brandPrice;
}

// all accessor and mutator methods are present but not shown

} // end of Brand class

(a) (i) Define the term private.
(a) (ii) Define the term accessor method.
(a) (iii) Construct an accessor method in the Product class that returns the description for a product.
(b) (i) Construct a UML diagram for the given Product class.
(b) (ii) Construct the code for the constructor method for the Product class that initializes all attributes for a new product.
(c) (i) Construct the code to create an instance of the Brand class that has the brand name Safesun and a brand price of 2.17.
(c) (ii) Describe the difference between a class and an instance of a class.
(d) Identify two features of modern programming languages.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.3 — Program development (Parts (a)(i), (a)(ii), (a)(iii), (b)(ii))
• Topic D.1 — Objects as a programming concept (Part (b)(i))
• Topic D.3 — Program development (Parts (c)(i), (c)(ii), (d))

▶️ Answer/Explanation

(a)(i)
The term `private` is an access modifier or specifier used for attributes and methods. It makes them only accessible within the class in which they are defined, meaning private variables or methods cannot be accessed from outside the class.

(a)(ii)
An accessor method is a method that returns the value of a private or protected variable of an instance. It allows for accessing these private or protected variables from outside the class in a controlled manner.

(a)(iii)
The accessor method to return the product description would be written as:
public String getProdDescription() {
    return prodDescription;
}

(b)(i)
The UML diagram for the Product class consists of a box divided into three sections. The top section contains the class name `Product`. The middle section lists the private attributes: `- prodCode: String`, `- prodType: String`, `- prodDescription: String`, `- prodBrand: Brand`, and `- prodSale: int`. The bottom section lists the public methods, which include `+ getProdSale(): int` and `+ getProdBrand(): Brand`.

(b)(ii)
The constructor code to initialise all attributes for a new `Product` object would be:
public Product(String prodCode, String prodType, String prodDescription, Brand prodBrand, int prodSale) {
    this.prodCode = prodCode;
    this.prodType = prodType;
    this.prodDescription = prodDescription;
    this.prodBrand = prodBrand;
    this.prodSale = prodSale;
}

(c)(i)
To create an instance of the `Brand` class named `b` with the specified arguments, the code would be:
Brand b = new Brand("Safesun", 2.17f);

(c)(ii)
A class is a blueprint or template that defines the attributes and behaviours of its instances, without necessarily allocating memory itself. An instance of a class is an actual object created from that blueprint, which holds the specific values for its attributes and gets a memory allocation for storing those values.

(d)
Two features of modern programming languages are:
1. Libraries and frameworks of pre-written code, which allow for code reuse and faster development.
2. Exception handling, which provides a structured way to manage runtime errors, making programs more robust and stable.

Question 15

(a) (i) Define the term primitive data type.
(a) (ii) Outline one advantage of using a primitive data type, such as int.
The ProductManagement class has the main method and other methods to generate the information required:

public class ProductManagement {
private Product[] allProducts = new Product[25];

public void sortProducts()
// sort in descending order of prodSale
{
// code missing
}
} // end of ProductManagement class

(b) Construct code for the method sortProducts() to sort the allProducts[] array in descending order of prodSale.
You must make use of the selection sort algorithm.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.3 — Program development (Parts (a)(i), (a)(ii), (b))

▶️ Answer/Explanation

(a)(i)
A primitive data type is a data type that is pre-defined or fundamental in the programming language. It serves as one of the basic building blocks of composite data types or classes and is always assigned a value in memory.

(a)(ii)
One advantage of using a primitive type like `int` is memory efficiency. This is because primitive data types take up less memory than their corresponding object types, which makes them more efficient when working with large data sets or in memory-constrained environments.

(b)
The following code implements the `sortProducts()` method using a selection sort algorithm to sort in descending order:
public void sortProducts() {
    int n = allProducts.length; // accept 25
    for (int i = 0; i < n - 1; i++) {
        int maxIndex = i;
        for (int j = i + 1; j < n; j++) {
            if (allProducts[j].getProdSale() > allProducts[maxIndex].getProdSale()) {
                maxIndex = j;
            }
        }
        // Swap the elements
        Product temp = allProducts[maxIndex];
        allProducts[maxIndex] = allProducts[i];
        allProducts[i] = temp;
    }
}

Question 16

(a) (i) Outline one advantage of polymorphism.
(a) (ii) Outline one advantage of encapsulation.
(a) (iii) Outline one disadvantage of inheritance.
An invoice is created every time a customer purchases one or more products.
The Invoice class keeps details of each invoice. The following shows part of the code for this class:

public class Invoice {

private String invoiceID; // identifies a unique invoice
private static Product[] products = new Product[20]; // list of products purchased
private static int[] prodQuantity = new int[20]; // number of items of a particular product purchased
private boolean qualifiesForDiscount; // default value is false
private int numOfProducts; // how many products in this invoice

// constructor is defined, code not shown

public String getInvoiceID(){
return invoiceID;
}

// all accessor and mutator methods are present but not shown

public void addProduct(Product product, int quantity) {
// code missing
}

public void setQualifiesForDiscount(){
// if total value of purchases is more than 3000,
// qualifiesForDiscount value is set to true
// code missing
}

} // end of Invoice class

(b) Describe how encapsulation has been used in this code.
(c) Construct the method addProduct (Product product, int quantity) that will update an invoice.
The method should:
  • Update the products array
  • Update the prodQuantity array
  • Increment the numOfProducts.
If the total value of the purchases is greater than 3000, the invoice qualifies for a discount.
(d) Construct code for the method setQualifiesForDiscount() to change the status of an invoice.
The method should:
  • calculate the total value of an invoice
  • change the value of qualifiesForDiscount if needed.
(e) Outline one advantage of using modularity in program development.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.2 — Features of OOP (Part (a)(i), (a)(ii), (a)(iii), (b), (e))
• Topic D.3 — Program development (Part (c), (d))

▶️ Answer/Explanation

(a)(i)
One key advantage of polymorphism is code reusability. It allows developers to write more general and reusable code because a single function or method can work with objects of different classes that share a common interface or base class, reducing the need for duplicate logic.
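
As an illustration of this point, the sketch below uses hypothetical classes (`Figure`, `Oblong`, and `Quad` are assumptions for the example, not part of the exam code). A single method, `totalArea`, works for any subclass because the call to `area()` is resolved at run time:

```java
// Hypothetical base class: every figure can report its area.
abstract class Figure {
    abstract double area();
}

class Oblong extends Figure {
    private final double width;
    private final double height;

    Oblong(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    double area() { return width * height; }
}

class Quad extends Figure {
    private final double side;

    Quad(double side) { this.side = side; }

    @Override
    double area() { return side * side; }
}

class PolymorphismSketch {
    // One method handles every Figure subclass; no duplicate logic per type.
    static double totalArea(Figure[] figures) {
        double total = 0;
        for (Figure f : figures) {
            total += f.area();
        }
        return total;
    }
}
```

Adding a new subclass of `Figure` later would not require any change to `totalArea`.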

(a)(ii)
Encapsulation provides data hiding and protection. By keeping sensitive data hidden from outside interference and misuse, and ensuring only authorized methods can access or modify the internal state, it significantly reduces the risk of unexpected behavior and protects the integrity of the object’s data.
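
A minimal sketch of this idea, using a hypothetical `StockLevel` class (an assumption for illustration, not part of the exam code): the field is private, and the only way to change it is through a method that rejects invalid values.

```java
// Hypothetical class showing data hiding with controlled access.
class StockLevel {
    private int quantity; // cannot be read or changed directly from outside

    int getQuantity() {
        return quantity;
    }

    // The only way to change quantity; a negative value is rejected,
    // so the object can never hold an invalid state.
    boolean setQuantity(int newQuantity) {
        if (newQuantity < 0) {
            return false;
        }
        quantity = newQuantity;
        return true;
    }
}
```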

(a)(iii)
A major disadvantage of inheritance is tight coupling: a child class depends on the implementation of its parent class, so any change made to the parent can unintentionally affect the child, potentially introducing subtle and hard-to-find bugs.

(b)
Encapsulation has been used in this code by bundling or wrapping the variables and methods that operate on the data into one unit, the `Invoice` class. Furthermore, the variables of the `Invoice` class, such as `invoiceID` and `qualifiesForDiscount`, are declared as `private`. This prevents direct access from outside the class. Controlled access is provided through public methods, like the getter `getInvoiceID()` and the setter `setQualifiesForDiscount()`, to read and modify the private data safely.

(c)
The method to add a product to the invoice and update the arrays would be:
public void addProduct(Product product, int quantity) {
    products[numOfProducts] = product;
    prodQuantity[numOfProducts] = quantity;
    numOfProducts += 1;
}

(d)
The code for the `setQualifiesForDiscount` method would be:
public void setQualifiesForDiscount() {
    float totalValue = 0;
    for (int i = 0; i < numOfProducts; i++) {
        float price = products[i].getProdBrand().getBrandPrice();
        float amount = price * prodQuantity[i];
        totalValue = totalValue + amount;
    }
    if (totalValue > 3000) {
        qualifiesForDiscount = true;
    }
}
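
The answers to (c) and (d) rely on the `Product` class and its `getProdBrand().getBrandPrice()` chain, which are not shown in this question. The sketch below puts both methods into one self-contained class under stated assumptions: `PricedItem` is a hypothetical stand-in for `Product`, the arrays are instance fields, and `addProduct` returns `false` when the invoice is already full (the exam answer does not check for this).

```java
// Hypothetical stand-in for Product; the real class reaches the price
// through getProdBrand().getBrandPrice().
class PricedItem {
    private final double price;

    PricedItem(double price) { this.price = price; }

    double getPrice() { return price; }
}

class InvoiceSketch {
    private final PricedItem[] products = new PricedItem[20];
    private final int[] prodQuantity = new int[20];
    private int numOfProducts = 0;
    private boolean qualifiesForDiscount = false;

    // Assumption: a full invoice rejects the product instead of
    // throwing ArrayIndexOutOfBoundsException.
    boolean addProduct(PricedItem product, int quantity) {
        if (numOfProducts >= products.length) {
            return false;
        }
        products[numOfProducts] = product;
        prodQuantity[numOfProducts] = quantity;
        numOfProducts++;
        return true;
    }

    // Totals price * quantity over the products added so far.
    void setQualifiesForDiscount() {
        double totalValue = 0;
        for (int i = 0; i < numOfProducts; i++) {
            totalValue += products[i].getPrice() * prodQuantity[i];
        }
        if (totalValue > 3000) {
            qualifiesForDiscount = true;
        }
    }

    boolean getQualifiesForDiscount() { return qualifiesForDiscount; }
}
```

Note that a total of exactly 3000 does not qualify, since the condition in the question is "greater than 3000".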

(e)
One significant advantage of modularity is that it makes debugging easier and faster. Because a large program is broken down into smaller, independent modules, each module can be tested on its own and there is far less code to search through when something fails, allowing a developer to quickly isolate and fix an error.

Question 17

The Supplier class gives the details of suppliers. The following shows part of the code for this class:

import java.util.LinkedList;

public class Supplier {

private String supplierName;
private String supplierCountry;
private String[] productNames = new String[10];

public Supplier(String supplierName, String supplierCountry, String[] productNames) {
this.supplierName = supplierName;
this.supplierCountry = supplierCountry;
this.productNames = productNames;
}

public String getSupplierName(){
return supplierName;
}

public String getSupplierCountry(){
return supplierCountry;
}

// all accessor and mutator methods are present but not shown

public void displayData(){
System.out.println("Supplier: " + supplierName + ", Country: " + supplierCountry);
}

} // end of Supplier class

The SupplierManager class has methods to generate the information required. The following shows part of the code for this class:

public class SupplierManager {

LinkedList<Supplier> supplierList;

public SupplierManager() {
supplierList = new LinkedList<>();
}

public void addSupplier(Supplier newSupplier) {
// code missing
}

public void displayList() {
// code missing
}

public static int countOfSuppliers(LinkedList<Supplier> supplierList, String country, int n) {
// code missing
}

} // end of SupplierManager class

(a) (i) Outline one advantage of using library collections when developing a program.
(a) (ii) Construct the method displayList() that uses the displayData() method and collections method to output the details of all the suppliers in the linked list.
A linked list, supplierList, is used to store Supplier objects.
(b) Construct the method addSupplier(Supplier newSupplier) that will add new suppliers to a linked list ordered by supplier name.
(c) (i) Define the term recursion.
Many suppliers come from different countries.
A method exists that gives the size, n, of the supplier linked list.
(c) (ii) Construct the recursive method countOfSuppliers (LinkedList<Supplier> supplierList, String country, int n) that returns how many suppliers are from this country.
(c) (iii) Outline two reasons why a recursive algorithm is not the most appropriate algorithm for the scenario described in part (c) (ii).
The programmer has made use of naming conventions throughout the code.
(d) Outline one advantage of using naming conventions.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.4 — Advanced program development (Part (a)(i) to (d))

▶️ Answer/Explanation

(a)(i)
One significant advantage is that using library collections simplifies data handling and leads to faster development time. Classes like `LinkedList` come with powerful, pre-tested built-in methods to add, remove, and access elements. This reduces the amount of code a developer has to write manually and makes the process far less prone to errors.

(a)(ii)
The `displayList` method would be constructed as follows to iterate through the list and call each supplier’s `displayData` method:
public void displayList() {
    for(int i = 0; i < supplierList.size(); i++) {
        supplierList.get(i).displayData();
    }
}
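
An alternative worth knowing is the enhanced for loop, which walks the list through its iterator. On a `LinkedList`, `get(i)` has to traverse the list to reach index `i` on every call, whereas the iterator simply moves to the next node. The sketch below is self-contained and uses hypothetical stand-ins (`Vendor` and `VendorListSketch` are assumptions for illustration, not the exam classes):

```java
import java.util.LinkedList;

// Hypothetical stand-in for Supplier with only what displayData() needs.
class Vendor {
    private final String name;
    private final String country;

    Vendor(String name, String country) {
        this.name = name;
        this.country = country;
    }

    void displayData() {
        System.out.println("Supplier: " + name + ", Country: " + country);
    }
}

class VendorListSketch {
    private final LinkedList<Vendor> vendors = new LinkedList<>();

    void add(Vendor vendor) {
        vendors.add(vendor);
    }

    // The enhanced for loop uses the list's iterator, so each element is
    // reached by stepping to the next node rather than by index lookup.
    void displayList() {
        for (Vendor vendor : vendors) {
            vendor.displayData();
        }
    }
}
```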

(b)
The following code constructs the `addSupplier` method to insert suppliers into the correct position to keep the linked list ordered by supplier name:
public void addSupplier(Supplier newSupplier) {
    int index = 0;
    if (supplierList.size() == 0) {
        supplierList.add(newSupplier);
    } else {
        while (index < supplierList.size() && newSupplier.getSupplierName().compareTo(supplierList.get(index).getSupplierName()) > 0) {
            index++;
        }
        supplierList.add(index, newSupplier);
    }
}
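
The same ordered insertion can be sketched with a `ListIterator`, which avoids repeated `get(index)` calls on the linked list. To keep the example self-contained, the list below holds names as plain `String` values rather than `Supplier` objects; that simplification, and the names `OrderedInsertSketch` and `insertOrdered`, are assumptions for illustration.

```java
import java.util.LinkedList;
import java.util.ListIterator;

class OrderedInsertSketch {
    // Inserts newName so that the list stays in ascending order.
    // compareTo is case-sensitive, as in the answer above.
    static void insertOrdered(LinkedList<String> names, String newName) {
        ListIterator<String> it = names.listIterator();
        while (it.hasNext()) {
            if (it.next().compareTo(newName) > 0) {
                // Found the first larger name: step back so the new
                // name is inserted in front of it.
                it.previous();
                break;
            }
        }
        // Inserts at the iterator's position (the end if no larger name exists).
        it.add(newName);
    }
}
```

An empty list needs no special case here: the loop body never runs and `it.add` simply appends the first name.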

(c)(i)
Recursion is a technique or process where a method calls itself to solve a problem. Each subsequent call involves updated parameters that move the execution closer to a base case, which is the condition that stops the recursive calls.

(c)(ii)
The recursive method to count suppliers from a specific country would be:
public static int countOfSuppliers (LinkedList<Supplier> supplierList, String country, int n) {
    // Base case: size of list = 0
    if (n == 0) {
        return 0;
    } else if (supplierList.get(n-1).getSupplierCountry().equals(country)) {
        return 1 + countOfSuppliers(supplierList, country, n-1);
    } else {
        return countOfSuppliers(supplierList, country, n-1);
    }
}

(c)(iii)
Two reasons a recursive algorithm is not the most appropriate for this scenario are memory usage and processing overhead. First, every recursive call adds a frame to the call stack, so counting through a long list consumes memory in proportion to its length and risks a stack overflow. Second, each call carries the overhead of invoking the method and handling its return, so an iterative loop does the same work at a lower runtime cost.
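
For comparison, an iterative version of the count might look like the sketch below. To keep it self-contained, the list holds country names as plain `String` values instead of `Supplier` objects; that simplification, and the names `CountSketch` and `countFrom`, are assumptions for illustration.

```java
import java.util.LinkedList;

class CountSketch {
    // Iterative count: a single pass over the list with one counter,
    // so no extra call-stack frames are created.
    static int countFrom(LinkedList<String> countries, String country) {
        int count = 0;
        for (String c : countries) {
            if (c.equals(country)) {
                count++;
            }
        }
        return count;
    }
}
```

The loop also removes the need for the size parameter `n`, since the iterator stops at the end of the list on its own.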

(d)
An important advantage of using naming conventions is that it greatly improves code readability and maintainability. When developers follow consistent, meaningful naming for variables, methods, and classes, the code becomes much easier for others to understand, which saves time, money, and effort, especially when working in programming teams.
