2024 May Computer Science Paper 2 (HL)

Question 1

Environmental systems and societies (ESS) students are collecting data about the plant species found on sand dunes as part of their internal assessment. The data is collected from 10 sites using a paper form (Figure 1).
The form shown in Figure 1 is used to input data into the Environment database.
(a) State the data type for:
(i) Species
(ii) Gradient
(b) Outline one way that data validation could be carried out on the gradient attribute.
Three of the tables in the Environment database are shown in Figure 2:
(c) Construct an entity relationship diagram (ERD) for the Plant, Site and Distribution tables.
(d) Outline why a composite primary key is used for the Distribution table.
(e) Identify the steps to create a query to calculate the total number of sites where gorse has been found from the samples carried out on 14 October 2019.
(f) Explain how data consistency can be maintained in the Environment database.

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.1 — Basic concepts (Part (f))
• Topic A.2 — The relational database model (Parts (a), (b), (c), (d), (e))

Answer/Explanation

(a)(i)
For the correct answer:
String / Text (accept also Varchar, Alphanumeric)

The Species field stores the name of a plant, such as “Marram grass” or “Gorse”. Since these are textual descriptions composed of letters and possibly spaces, the most natural and flexible storage type is a string or text data type, which can hold any sequence of characters without needing mathematical operations on it.

(a)(ii)
For the correct answer:
Real (accept also Integer, Float, Number, Numeric)

The Gradient attribute records a numerical measurement like $8.2$ or $7.9$, representing the steepness of a sand dune slope. Because these values include a decimal point and are not whole numbers, a real (or float) data type is required to store the fractional part accurately rather than truncating it.

(b)
For the correct answer (any one):
Range check — to ensure input is between $-90$ and $+90$; Presence check — to ensure a value has been input; Type check — to ensure the input is numeric.

One practical way to validate the gradient attribute is to apply a range check. Since the steepest possible slope on earth cannot exceed $90^\circ$, setting a rule that the gradient value must logically fall between $-90$ and $+90$ will immediately flag any obviously erroneous entries, preventing nonsensical data from ever entering the database.
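
The checks listed above could be sketched as follows. This is a minimal illustration, not part of the mark scheme: the function name `validate_gradient` and the choice to raise `ValueError` are assumptions, and only the $-90$ to $+90$ limits come from the answer itself.

```python
def validate_gradient(raw_value, low=-90.0, high=90.0):
    """Return the gradient as a float, or raise ValueError if validation fails."""
    # Presence check: reject missing or empty input.
    if raw_value is None or str(raw_value).strip() == "":
        raise ValueError("gradient is required")
    # Type check: the input must be numeric.
    try:
        gradient = float(raw_value)
    except (TypeError, ValueError):
        raise ValueError("gradient must be a number")
    # Range check: a slope cannot be steeper than 90 degrees in either direction.
    if not (low <= gradient <= high):
        raise ValueError(f"gradient must be between {low} and {high}")
    return gradient
```

A value such as `"8.2"` passes all three checks, while `"120"`, `"steep"` or an empty field is rejected before it reaches the database.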

(c)
For the correct answer:

The ERD must depict three distinct rectangular boxes representing the Plant, Site, and Distribution tables. The line connecting Plant to Distribution should show a one-to-many relationship (one plant species can appear in many distribution records), and similarly the line from Site to Distribution should also show a one-to-many relationship, since each site can host multiple plant distribution entries over time.

(d)
For the correct answer:
Plant_ID and Site_ID can be repeated / not unique — no single attribute in Distribution table can uniquely identify a record; using multiple fields uniquely identifies a tuple/record and removes the need to add a new ID field as Primary Key (which would waste storage space).

In the Distribution table, neither Plant_ID alone nor Site_ID alone can uniquely identify each row because the same plant can be recorded at multiple sites and the same site can host multiple plants. By combining both fields into a composite primary key, every pair becomes unique without needing to invent a separate artificial ID column, keeping the design efficient and semantically meaningful.
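
A small sketch of the idea, using Python's built-in `sqlite3` module as a stand-in for whatever DBMS the school uses. The `Cover` column and the sample rows are hypothetical; only `Plant_ID`, `Site_ID` and the table name come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Distribution (
        Plant_ID INTEGER NOT NULL,
        Site_ID  INTEGER NOT NULL,
        Cover    REAL,               -- hypothetical measurement column
        PRIMARY KEY (Plant_ID, Site_ID)
    )
    """
)
# The same plant at two sites, and two plants at the same site, are all allowed.
conn.executemany(
    "INSERT INTO Distribution VALUES (?, ?, ?)",
    [(1, 1, 12.5), (1, 2, 30.0), (2, 1, 4.0)],
)
# Repeating an existing (Plant_ID, Site_ID) pair violates the composite key.
try:
    conn.execute("INSERT INTO Distribution VALUES (1, 1, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Each ID repeats on its own, but the pair never does, which is why the pair can serve as the key.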

(e)
For the correct answer:
Use Plant and Distribution tables; use COUNT on Site_ID; correct species and date conditions; correct condition for connecting the two tables (Plant.Plant_ID = Distribution.Plant_ID).

You would first need to join or link the Plant and Distribution tables using their shared Plant_ID field. Then filter the combined records by setting the Species to “Gorse” and the Date to '14/10/2019'. Finally, apply the COUNT function on the Site_ID column to tally how many distinct site entries match those criteria, giving the total number of sites where gorse was sampled on that particular day.
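
The steps above can be tried out with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the sample rows are invented, the `Date` column is assumed to live in the Distribution table, and dates are stored here in ISO form (`2019-10-14`) rather than the day-first form a user would type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Plant (Plant_ID INTEGER PRIMARY KEY, Species TEXT);
    CREATE TABLE Distribution (Plant_ID INTEGER, Site_ID INTEGER, Date TEXT);
    INSERT INTO Plant VALUES (1, 'Gorse'), (2, 'Marram grass');
    INSERT INTO Distribution VALUES
        (1, 1, '2019-10-14'), (1, 4, '2019-10-14'),
        (2, 1, '2019-10-14'), (1, 7, '2019-10-15');
    """
)
query = """
    SELECT COUNT(Distribution.Site_ID)
    FROM Plant
    JOIN Distribution ON Plant.Plant_ID = Distribution.Plant_ID
    WHERE Plant.Species = 'Gorse'
      AND Distribution.Date = '2019-10-14'
"""
gorse_sites = conn.execute(query).fetchone()[0]
print(gorse_sites)  # 2 for the sample rows above
```

The join condition, the two filters and the COUNT correspond one-to-one with the four marking points.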

(f)
For the correct answer (any three):
Any data written must be valid according to all defined rules; Referential integrity — all foreign keys should have a primary key; Cascades — changing data in one table should be carried through to related tables; Normalise the database to remove/reduce redundant data; Triggers to automatically invoke updates; Validation checks; Verification checks; Data updates use transactions to ensure automatic rollback on failure.

Data consistency means that the information stored across related tables never contradicts itself. This can be maintained by enforcing referential integrity so every foreign key points to an existing primary key, using cascading updates so that a change made in one table automatically propagates to all dependent records, and wrapping critical multi-table operations inside transactions that will roll back entirely if any step fails, leaving the database in a clean, coherent state.
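
Referential integrity and cascading updates can be demonstrated with a short sketch. SQLite (through Python's `sqlite3` module) is used purely as an example DBMS, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE Plant (Plant_ID INTEGER PRIMARY KEY, Species TEXT);
    CREATE TABLE Distribution (
        Plant_ID INTEGER REFERENCES Plant(Plant_ID) ON UPDATE CASCADE,
        Site_ID  INTEGER
    );
    INSERT INTO Plant VALUES (1, 'Gorse');
    INSERT INTO Distribution VALUES (1, 3);
    """
)

# Referential integrity: a Distribution row cannot point at a missing plant.
try:
    conn.execute("INSERT INTO Distribution VALUES (99, 3)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Cascade: renumbering the plant updates the dependent Distribution row too.
conn.execute("UPDATE Plant SET Plant_ID = 10 WHERE Plant_ID = 1")
child_ids = [row[0] for row in conn.execute("SELECT Plant_ID FROM Distribution")]
```

After the update, `child_ids` holds the new key, so the two tables never disagree about which plant the record refers to.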

Question 2

The Bucharesti School website allows parents to login and select school transportation for their children. If they select the school bus, they will have to pay for this service at the end of the month.
(a) Identify the steps that take place in a transaction when a parent attempts to pay the school bus at the end of a month.
(b) Explain how the database management system (DBMS) prevents a record being updated by two parents simultaneously.
(c) Identify two roles of the database administrator at Bucharesti School.
(d) Outline two ways that a database management system (DBMS) can be used to ensure the students’ personal data remains secure.
(e) Explain how the developers of the Bucharesti School database can ensure that it has been designed ethically.

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.1 — Basic concepts (Parts (a), (b))
• Topic A.3 — Further aspects of database management (Parts (c), (d), (e))

Answer/Explanation

(a)
For the correct answer:
Parent authenticated by the DBMS; Outstanding amount/bill is calculated/displayed; Transaction is initiated for the transport payment; Payment details added / entered for transport; If payment details and payment can be processed by DBMS, then School transport Account is credited and parent’s account is debited; Transaction is committed; Else Transaction is rolled back; Notification sent to the parent.

A payment transaction is a carefully ordered sequence. First the system authenticates the parent, then it calculates the outstanding bus fee for the month. Once the parent confirms payment, the system attempts to debit their account and credit the school’s transport account simultaneously — if both halves succeed, the entire transaction is committed and a receipt is generated; if either half fails, everything is rolled back so no money is lost or double-charged.
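
The commit-or-rollback behaviour can be sketched with Python's `sqlite3` module. The `Account` table, the account names and the balances are all hypothetical; a CHECK constraint stands in for "the payment cannot be processed".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Account (
        Name    TEXT PRIMARY KEY,
        Balance REAL NOT NULL CHECK (Balance >= 0)
    );
    INSERT INTO Account VALUES ('parent', 50.0), ('school_transport', 0.0);
    """
)

def pay_bus_fee(conn, amount):
    """Credit the school and debit the parent as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = 'school_transport'",
                (amount,),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = 'parent'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so neither update persists
```

With a starting balance of 50, a first payment of 30 commits. A second payment of 30 would overdraw the parent's account, so the whole transaction is rolled back, including the credit that had already been applied to the school's account.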

(b)
For the correct answer:
Record locking / Isolation / Data Locking / Row locking — ensures exclusive editing, done in isolation to prevent the same data item from being changed by two different transactions. OR Optimistic Concurrency Control (OCC) / Multi-version concurrency control (MVCC) — allows multiple users to access the unmodified version of data at the same time; on update request, a check is done to see if the existing data has been modified by another user since it was initially read to prevent lost updates.

When two parents might try to book the last seat on the same bus simultaneously, the DBMS uses record locking to give the first transaction exclusive write access to that specific row. The second transaction must wait until the first completes and releases the lock, ensuring that only one parent can successfully update the available seat count, preventing a double-booking scenario.
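
The effect of record locking can be imitated in a few lines. In this sketch a `threading.Lock` stands in for the DBMS's row lock and two threads stand in for the two parents' transactions; the class and variable names are hypothetical.

```python
import threading

class SeatRecord:
    """One bus record guarded by its own lock, standing in for a DBMS row lock."""

    def __init__(self, seats_available):
        self.seats_available = seats_available
        self._lock = threading.Lock()

    def book_seat(self):
        # Only one transaction at a time may hold the lock; others wait here.
        with self._lock:
            if self.seats_available > 0:
                self.seats_available -= 1
                return True
            return False  # the seat was taken while this transaction waited

bus = SeatRecord(seats_available=1)
results = []
parents = [threading.Thread(target=lambda: results.append(bus.book_seat())) for _ in range(2)]
for t in parents:
    t.start()
for t in parents:
    t.join()
print(sorted(results), bus.seats_available)  # [False, True] 0
```

Whichever thread acquires the lock first gets the seat; the other sees the updated count and fails cleanly, so the seat count never drops below zero.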

(c)
For the correct answer (any two):
Approving Data Access / Managing user accounts; Monitoring Performance / Performance tuning; Backup and Recovery; Implementing security; Upgrading/updating the database / Maintenance.

The database administrator at the school would be responsible for managing who gets access to the system by creating and approving user accounts for parents and staff members. They would also be in charge of backup and recovery, ensuring that if the server crashes, all student transport records and payment histories can be restored without losing critical information.

(d)
For the correct answer (two ways, each with outline):
Access controls — effective restrictions so end users can access only the data or programs for which they have legitimate privileges. User accounts — different users have usernames and passwords/biometrics to enable unique login experiences. Data Encryption — nullifies the potential value of data interception and ensures the confidentiality of data.

One strong method is implementing strict access controls so that a parent logging in can only view their own child’s records and never another student’s data. Another essential measure is encrypting all stored personal data, which means even if someone gained unauthorized access to the raw database files, the information would appear as indecipherable gibberish without the correct decryption keys.

(e)
For the correct answer (up to 6 marks):
Privacy considerations must be ensured — prevention of unauthorized access to private data, providing security measures e.g. logins; Encryption of data to ensure it is unusable to unauthorized users; Especially as many of the data subjects will be under 18; Ensuring that the inappropriate use of data cannot take place, for example sharing the data with third parties without the consent of data subjects; Measures to ensure accuracy and completeness when collecting data; Availability of data content, and the data subject’s legal right to access; Ownership rights to inspect, update or correct these data; The data is not available to the vast majority of its users / Views are used to access specific data instead of giving access to the whole table / Redaction; The database is designed to conform with data protection legislation; Data must not be kept for longer than it is required; Only relevant data must be collected and stored; Keeping the data secure from loss or damage e.g. backup.

Designing ethically means putting the students’ rights at the centre of every technical decision. The developers must ensure the database only collects information that is genuinely necessary (data minimisation), stores it securely with encryption and strict access controls, and gives parents the ability to review and correct their children’s records. Since many students are minors, extra care must be taken, and the system should automatically purge records after a legally defined retention period rather than holding onto them indefinitely.

Question 3

The ATHLETICS table contains information about athletics events.

(a) Outline one reason why databases are normalized.
(b) Outline why the data type for the Olympic Record attribute (OlymRec) cannot be an integer.
The table can also be represented as:
ATHLETICS
(Event, Type, SubType, Gender, OlymRec, WldRec)
(c) Construct the 2nd Normal Form (2NF) of the unnormalized ATHLETICS relation shown above.
(d) Outline why databases are normalized from 2nd normal form (2NF) to 3rd normal form (3NF).

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.2 — The relational database model (Parts (a), (b), (c), (d))

Answer/Explanation

(a)
For the correct answer (any one, outlined):
Reduces duplicated/Redundant data — which reduces wastage of storage space / reduces/eliminates data inconsistencies and improves query processing time. Improves data security/privacy — as more granular access control can be implemented on individual tables. Improves data integrity/consistency — as integrity constraints can be set to ensure changes follow allowed rules, eliminating update/insert/delete anomalies. Improves query performance / makes querying data easier — as data is stored in a structured manner.

One compelling reason to normalize is to eliminate update anomalies. Imagine if the Type “Track” was misspelled in one row — without normalization, you would have to hunt through every occurrence and fix it individually, risking inconsistency. By separating the event description into its own table, you store each piece of information exactly once, so a correction needs to be made in only one place.

(b)
For the correct answer:
The value in OlymRec has a decimal point / is not a whole number — so it would not be accurate; a float/double/real data type is required. The timing of athletes will vary in just milliseconds and so the minute will barely change, so a decimal point is required for the accurate measurement.

An Olympic record like $9.63$ seconds for the $100\text{m}$ sprint is a fractional value where the hundredths of a second are critically important. If you stored this as an integer, the record would be truncated to $9$ or rounded to $10$, completely destroying the precision needed to distinguish between a gold-medal-winning time and a runner-up finish.
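
The loss of precision is easy to see in any language with separate integer and real types. A two-line Python illustration, using the $9.63$ value quoted above:

```python
record_seconds = 9.63
print(int(record_seconds))    # 9: int() truncates the fractional part
print(round(record_seconds))  # 10: rounding to a whole number loses it too
```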

(c)
For the correct answer:
EVENTS (Event, Type, Subtype) and ATHLETICS (Event*, Gender, OlymRec, WldRec); OR EVENTS (EventID, Event, Type, Subtype) and ATHLETICS (EventID*, Gender, OlymRec, WldRec).

In 2NF, you identify that Type and Subtype depend only on the Event (partial dependency) and not on the combination of Event and Gender. So you split the table in two: an EVENTS table storing each event’s Type and Subtype once, keyed by Event (or an EventID surrogate), and an ATHLETICS table that keeps the Event (as a foreign key), Gender, OlymRec, and WldRec — eliminating the repetition of “Track / Run” for every row of the same event.
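
The split can be mimicked with plain Python data structures. The rows below are illustrative sample values, not the contents of the exam's table, and the variable names are assumptions.

```python
# Sample rows in the order (Event, Type, SubType, Gender, OlymRec, WldRec).
unnormalized = [
    ("100m", "Track", "Run", "M", 9.63, 9.58),
    ("100m", "Track", "Run", "F", 10.61, 10.49),
    ("High jump", "Field", "Jump", "M", 2.39, 2.45),
]

# EVENTS: Type and SubType depend on Event alone, so store them once per event.
events = {event: (etype, subtype) for event, etype, subtype, *_ in unnormalized}

# ATHLETICS: the records depend on the full key (Event, Gender).
athletics = {
    (event, gender): (olym_rec, wld_rec)
    for event, _, _, gender, olym_rec, wld_rec in unnormalized
}
```

Three original rows produce only two `events` entries, because the repeated “Track / Run” description for the 100m is now stored once.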

(d)
For the correct answer:
To remove transitive (functional)/non-key dependencies — where non-key attribute(s) depend on another non-key attribute; to reduce insert/update/delete anomalies and further reduce data redundancy.

Moving from 2NF to 3NF means hunting down transitive dependencies — situations where a non-key column depends on another non-key column rather than directly on the primary key. For instance, if SubType determined Type, that would be a transitive dependency. Eliminating these ensures you cannot end up with conflicting data (like two subtypes mapping to different types) and further slims down redundancy.

Question 4

Books are sold in physical bookshops and by online retailers. Each retailer maintains a unique database of books sold.
Online retailers can have a much greater range of books available. Books that are very rarely sold are not stocked and can be printed on demand.
Information from each retailer’s database is loaded into a data warehouse where data analytics take place.
(a) Outline one advantage of using a data warehouse.
(b) Explain why the data from the individual retailers is transformed before it is loaded into a data warehouse.
(c) Explain why data warehouses use timestamping.
(d) Distinguish between decision trees and neural network learning algorithms.
(e) Outline why deviation detection is used in data analytics.
(f) Compare cluster analysis and forecasting as techniques to understand and predict data in data mining.

Most-appropriate topic codes (IB Computer Science HL):

• Topic A.4 — Further database models and database analysis (Parts (a), (b), (c), (d), (e), (f))

Answer/Explanation

(a)
For the correct answer (any one):
They enhance business intelligence — since they integrate huge amounts of data from different sources, managers and executives can make more informed decisions based on diverse data. They reduce decision-making time / Timely Access to data — users can quickly access critical data all in one place. Enhances Data Quality and Consistency — data transformation / standardisation / cleansing is enforced. Provides Historical Intelligence — they store large amounts of historical data that allow analysis of different time periods and trends. Provides more efficient access to data — as data is optimised for reporting through de-normalisation / removal of constraints.

A major advantage of a data warehouse is that it brings together sales figures from every book retailer into a single, consistent repository. Instead of running separate reports against each shop’s unique database and then painstakingly merging them in a spreadsheet, analysts can query the warehouse directly and get a unified picture of which books are selling across the entire market in one go.

(b)
For the correct answer:
When the data is collected from different book retailers, each will have their own formats or standards; data must be standardized to conform with the structure of the data warehouse; the data is transformed to select the data that is useful for analysis; the data is optimised for retrieval; aggregation of data / summarizing and consolidating large sets of detailed data into a more concise and manageable data set — improves query performances and reduces storage requirements.

Each retailer might store dates differently (one uses DD/MM/YYYY while another uses MM-DD-YYYY), categorise book genres under different naming conventions, or even use different currency formats. Before this chaotic mix can be usefully analysed together, it must go through an ETL (Extract, Transform, Load) process that standardises everything — converting all dates to a uniform format, mapping disparate genre labels to a common taxonomy, and cleaning out any incomplete or duplicate records.
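
The date part of the transform step might look like the sketch below. The retailer names and the mapping of each retailer to a format are hypothetical; the two formats are the ones mentioned above.

```python
from datetime import datetime

# Hypothetical per-retailer date formats, matching the two styles named above.
RETAILER_DATE_FORMATS = {
    "retailer_a": "%d/%m/%Y",  # DD/MM/YYYY
    "retailer_b": "%m-%d-%Y",  # MM-DD-YYYY
}

def to_iso_date(retailer, raw_date):
    """Transform a retailer-specific date string into ISO 8601 (YYYY-MM-DD)."""
    parsed = datetime.strptime(raw_date, RETAILER_DATE_FORMATS[retailer])
    return parsed.date().isoformat()

print(to_iso_date("retailer_a", "14/10/2019"))  # 2019-10-14
print(to_iso_date("retailer_b", "10-14-2019"))  # 2019-10-14
```

Once both sources produce the same representation, the warehouse can compare and aggregate their records directly.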

(c)
For the correct answer:
Data warehouse update happens in real time / Data in a data warehouse is time dependent / Data is only valid for a period; The user’s needs (data required) change with time; A record update on a transaction is unique with the timestamp; Since a data warehouse contains both current and historical data — timestamps are required to compare data from different times / helps to know the last update of a record; The Retention Limitation Obligation prohibits organisations from holding personal data indefinitely — therefore timestamping can be used as a tool to help ensure data is current.

Data in a warehouse is time-variant, meaning it is only valid for a specific period. By attaching a timestamp to every record, analysts can reconstruct what the sales picture looked like at any given moment in the past — for example, comparing December 2023’s figures with December 2024’s — and the system can automatically purge records that exceed legal retention periods, keeping the warehouse compliant and manageable.

(d)
For the correct answer:
Decision trees — construction of decision tree classifiers does not require any domain knowledge or parameter setting; generally easy to assimilate/understand by humans; easy to understand the logic behind the decision-making process; good accuracy with smaller sets of structured/tabular data; uses estimates and probabilities to calculate likely outcomes. Neural Networks — needs domain knowledge and parameter setting; more complex to assimilate/understand by humans; superior for tasks involving complex patterns and high-dimensional data (e.g., image and speech recognition); high tolerance of noisy data; requires significant computational resources and time; potential for high accuracy with large datasets; uses activation functions and link weights.

The key distinction is transparency versus power. A decision tree produces a series of if-then rules that a human can read and understand — you can trace exactly why a certain book was classified as a bestseller candidate. A neural network, by contrast, learns patterns through layers of weighted connections that act like a black box; it can capture far more subtle and complex relationships in massive datasets, but explaining why it made a particular prediction is notoriously difficult.

(e)
For the correct answer (with link to scenario):
To provide a statistical technique/method to direct appropriate marketing to certain types of books; which helps to detect outlying data that does not fit the assumed model; For example, to detect books with abnormally high or low sales; It can be used to predict the trends and patterns of demand for certain books/genres in the future; It can help discover new opportunities for product development and marketing strategies; As anomalies can sometimes reveal hidden patterns or trends that weren’t previously known; For example, very high book sales in a particular month can reveal a new market niche for that book.

Deviation detection acts like an early warning system. If a normally slow-selling academic textbook suddenly spikes in sales, the anomaly might indicate that a popular online course just added it to their reading list — a golden opportunity to increase stock before demand outstrips supply. Conversely, detecting an unusual drop in sales for a normally reliable title could flag a data entry error or even fraudulent activity that needs investigation.
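
One simple way to flag such anomalies is a standard-score test, sketched below. This is just one possible technique, chosen for illustration; the function name, the threshold of two standard deviations and the monthly sales figures are all assumptions.

```python
from statistics import mean, stdev

def find_deviations(monthly_sales, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    average = mean(monthly_sales)
    spread = stdev(monthly_sales)
    if spread == 0:
        return []  # every month is identical, so nothing deviates
    return [
        (i, value)
        for i, value in enumerate(monthly_sales)
        if abs(value - average) / spread > threshold
    ]

sales = [12, 15, 11, 14, 13, 12, 95, 14, 13, 12, 15, 11]
print(find_deviations(sales))  # [(6, 95)]
```

The seventh month stands out against an otherwise steady title, which is the kind of record an analyst would then investigate.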

(f)
For the correct answer (3 marks for each technique):
Cluster analysis — employs unsupervised learning strategy / there are no pre-existing data labels; to discover groups/clusters of data points based on some similarities/differences within the data; the main focus is to find hidden patterns in the data points and segment them for further analysis; it is applied in anomaly detection, target marketing, fraud detection analysis, market segmentation etc. / accept one example relating to the scenario. Forecasting — is a supervised learning strategy / usually uses historical data with labelled target variables; to predict (future) values based on historical data / to predict the value of a dependent variable based on the value(s) of independent variable(s); the main focus is predicting (future) trends or events; it is applied in call volume prediction, weather forecasting, demand forecasting, stock market prediction etc. / accept one example relating to the scenario.

Cluster analysis is like sorting a messy pile of customer receipts into natural piles without any preconceived categories — you might discover that readers naturally fall into groups like “buys literary fiction and poetry” versus “buys only technical manuals,” revealing market segments you hadn’t consciously defined. Forecasting, on the other hand, starts with labelled historical data (sales figures from the past five years) and uses that to project forward, answering questions like “how many copies of this genre will we sell next quarter?” The former is exploratory and descriptive; the latter is predictive and extrapolative.
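
The contrast can be made concrete with two tiny sketches: a one-dimensional two-group k-means for clustering and a least-squares trend line for forecasting. Both functions and all of the numbers are hypothetical simplifications, and `two_means_1d` assumes the input contains at least two distinct values.

```python
def two_means_1d(values, iterations=10):
    """Cluster analysis sketch: split unlabelled numbers into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

def linear_forecast(history):
    """Forecasting sketch: fit a least-squares line to past values and extend it one step."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return y_mean + slope * (n - x_mean)

books_per_customer = [2, 3, 4, 40, 42, 45]   # no labels supplied
print(two_means_1d(books_per_customer))      # ([2, 3, 4], [40, 42, 45])
quarterly_sales = [100, 110, 120, 130]       # known past outcomes
print(linear_forecast(quarterly_sales))      # 140.0
```

The clustering function is never told which customers are "light" or "heavy" buyers and discovers the two groups itself, whereas the forecast depends entirely on known past values to project the next one.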

Question 5

A company designs new kitchens for customers. It has a shop that shows examples of the kitchen cabinets, sinks, wall tiles and floor tiles that can be included in the new kitchen.
When customers have chosen the items they would like for the new kitchen, a simulation is set up to show how these items would look.
(a) State three variables that could be used for this simulation.
(b) Outline two rules that would need to be applied for this simulation to be created within the constraints of the customer’s kitchen.
(c) Outline two factors that would impact on the reliability of this simulation.
(d) Discuss the advantages and disadvantages of using simulation to design a fitted kitchen.

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.1 — The basic model (Parts (a), (b), (c))
• Topic B.2 — Simulations (Part (d))

Answer/Explanation

(a)
For the correct answer (any three):
NUMBER_OF_WALLS, WALL1_HEIGHT, WALL1_WIDTH, TILE_HEIGHT, TILE_WIDTH, etc. (accept ‘dimensions’ but not vague ‘size’).

To build a meaningful kitchen simulation, the software needs measurable inputs it can mathematically manipulate. Variables like the width and height of each wall let the system calculate how many tiles will fit, while the dimensions of individual cabinets determine whether they can be placed side by side along a given run of wall without overlapping or leaving awkward gaps.

(b)
For the correct answer (two rules, each outlined):
The cabinet widths when added together must be less than the wall width — in order for the cabinets to fit the available space. The width of the room and cabinet depths must be taken into account when fitting cabinets to opposite walls — to make sure there is enough room for a person to comfortably walk/work between them. (Accept other reasonable answers.)

One essential rule is a fit constraint: the sum of all cabinet widths placed along a wall must be less than or equal to that wall’s total length, otherwise the simulation would allow an impossible design. Another critical rule concerns human ergonomics: if cabinets are placed on two opposing walls, the gap between them must exceed a minimum threshold (say $1.2\text{m}$) so that a person can actually stand at the counter, open drawers, and move around without feeling cramped.
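
Both rules translate directly into checks the simulation could run before drawing a layout. The function names are hypothetical, and the $1.2\text{m}$ clearance is the illustrative figure from the explanation rather than a building standard.

```python
MIN_WALKWAY_M = 1.2  # illustrative clearance, not an official figure

def cabinets_fit_wall(cabinet_widths_m, wall_width_m):
    """Rule 1: the cabinets along one wall must not be wider in total than the wall."""
    return sum(cabinet_widths_m) <= wall_width_m

def walkway_is_wide_enough(room_width_m, depth_wall_a_m, depth_wall_b_m):
    """Rule 2: cabinets on opposite walls must leave enough floor space between them."""
    return room_width_m - depth_wall_a_m - depth_wall_b_m > MIN_WALKWAY_M
```

For example, three cabinets of 0.6, 0.6 and 0.8 metres fit a 3.15 metre wall, while two 0.6 metre deep runs in a 2 metre wide room leave only 0.8 metres of floor and fail the second rule.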

(c)
For the correct answer (two factors, each outlined):
The accuracy of the measurements/shape of the room — if the room was not measured correctly, the resulting simulation will not be correct. The dimensions of the cabinets/tiles used in the simulation — the cabinets/tiles may prove to be too big for the available space. The quality of the representation of colours/designs/patterns of the products — may lead to customer disappointment. If it’s not a VR simulation — it may be difficult to visualize how much space there is around a person when the cabinets are fitted. (Do not accept hardware limitations.)

Reliability hinges on input fidelity and perceptual accuracy. If the surveyor’s laser measure was slightly off and recorded a wall as $3.2\text{m}$ when it is actually $3.15\text{m}$, a cabinet run that appeared to fit perfectly on screen might be physically impossible to install. Beyond measurements, the simulation’s colour rendering matters hugely — a tile that looks like a warm cream on a calibrated monitor might arrive as a cold beige in reality, undermining the entire design consultation.

(d)
For the correct answer (2 advantages, 2 disadvantages, 1 conclusion):
Advantages — Allows the customer to try out different designs until it meets their requirements; The simulation is a tool that may help to avoid making expensive mistakes / may help to ensure better customer satisfaction. Disadvantages — The simulation will take time to set up and it may not be timely enough to satisfy the customer; The simulation depends on measurements taken and it may not be accurate enough / the kitchen may not fit in the actual space; The usefulness depends on the hardware — it may take too much time to render. Conclusion — A balanced final judgement on whether simulation is worthwhile.

Simulation gives customers an extraordinary preview: they can swap cabinet colours, rearrange the layout, and instantly see the result without a single physical prototype being built, potentially saving thousands in costly mid-installation changes. However, the simulation is only as trustworthy as the measurements fed into it, and a low-resolution render on a slow computer might look nothing like the finished 3D kitchen, risking disappointment. On balance, the benefits of visual experimentation far outweigh the drawbacks, provided the designer validates every critical dimension with a physical site survey.

Question 6

A real estate agent makes use of electronic brochures to send to potential house buyers. These brochures contain details of the properties, including sets of photographs of the rooms and the different views from the property.
(a) Outline the impact in terms of memory requirements on the potential house buyer’s device when viewing a brochure.
The real estate agent decides to improve their brochures by using animated ‘walk-throughs’.
(b) State the name of the process that relates the original photographs of the properties to the animated ‘walk-throughs’.
(c) Explain how ray tracing may be beneficial to the production of the real estate agent’s animations.
(d) Explain the ethical considerations for the use of animated ‘walk-throughs’ in the new brochures.

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.3 — Visualization (Parts (a), (b), (c))
• Topic B.2 — Simulations (Part (d))

Answer/Explanation

(a)
For the correct answer:
The images provided may be high resolution; the device will therefore need to have sufficient amount of RAM so that images can load quickly / correctly. Standard memory requirements should be enough since only text and images are being displayed.

Electronic brochures packed with high-resolution property photographs can consume substantial memory when opened. Each uncompressed image might occupy several megabytes in RAM during viewing, so a device with limited memory could struggle, causing slow scrolling, lagging transitions, or even crashes if too many large images are rendered simultaneously on screen.
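
A rough estimate of the memory one decoded image needs is width × height × bytes per pixel. The sketch below assumes a hypothetical $1920 \times 1080$ photograph in 24-bit colour (3 bytes per pixel); real brochure images may be larger.

```python
def uncompressed_image_bytes(width_px, height_px, bytes_per_pixel=3):
    """Memory needed to hold one decoded image: one colour value per pixel."""
    return width_px * height_px * bytes_per_pixel

size_bytes = uncompressed_image_bytes(1920, 1080)
print(size_bytes / 1_000_000)  # 6.2208 (megabytes)
```

A brochure that keeps twenty such photographs decoded at once would therefore need roughly 124 MB of RAM for the images alone.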

(b)
For the correct answer:
(3D) Rendering

The process that transforms a set of static photographs into a seamless animated walk-through is called rendering. It involves the computer calculating how each scene should appear from successive viewpoints, generating the intermediate frames that create the illusion of smoothly moving through the property.

(c)
For the correct answer:
Ray tracing renders the image by computing the path of light between the objects in the room and different viewpoints; allows the objects in the room to be simulated and shown as they would look from different positions and angles; allowing a more realistic experience while operating the virtual walk-through.

Ray tracing works by simulating the actual physics of light — tracing rays from a virtual camera through each pixel, bouncing them off surfaces, and calculating how they interact with materials. This means reflections on a polished wooden floor, soft shadows cast by furniture near a window, and the subtle way light diffuses through curtains can all be rendered with near-photographic accuracy, making the walk-through feel genuinely immersive rather than like a cartoon approximation.
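
The core operation behind this is an intersection test between a ray and an object. The sketch below shows only that first step for a sphere; the function name is hypothetical, the direction is assumed to be a unit vector, and the camera is assumed to be outside the sphere. A full ray tracer repeats the same test for reflected and shadow rays.

```python
import math

def ray_sphere_distance(origin, direction, centre, radius):
    """Distance along a ray to the nearest hit on a sphere, or None if it misses.

    `direction` is assumed to be a unit vector.
    """
    # Vector from the sphere centre to the ray origin.
    oc = [o - c for o, c in zip(origin, centre)]
    # Solve |origin + t*direction - centre|^2 = radius^2 for t (a quadratic in t).
    b = 2 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius ** 2
    discriminant = b * b - 4 * c
    if discriminant < 0:
        return None  # the ray passes beside the sphere
    t = (-b - math.sqrt(discriminant)) / 2
    return t if t > 0 else None  # ignore hits behind the camera

camera = (0.0, 0.0, 0.0)
print(ray_sphere_distance(camera, (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0))  # 4.0
print(ray_sphere_distance(camera, (0.0, 1.0, 0.0), (0.0, 0.0, 5.0), 1.0))  # None
```

The first ray points straight at a sphere five units away with radius one, so it hits the surface at distance four; the second points sideways and misses.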

(d)
For the correct answer (two considerations, each explained):
Privacy — the animation may accidentally capture images that the homeowner would rather were not made public. Security — the animation may show security flaws so that a potential burglar may enter the property. Reliability — the animation may make the property seem better than it is, so that the potential buyer may feel cheated. Anonymity — items in the animation may identify the seller of the property, which could be illegal / have negative impact on the seller.

One pressing ethical concern is privacy: a 360-degree walk-through might inadvertently reveal personal items like family photographs on the wall or sensitive documents on a desk, exposing details about the current occupants that they never consented to share publicly. Another is representational honesty — if the animation digitally enhances room sizes, hides structural defects, or uses unrealistic lighting to make a dark basement look bright and airy, the buyer is being deceived, which breaches ethical standards of fair dealing in real estate marketing.

Question 7

A supermarket has set up a spreadsheet model to compare its sales for each quarter during the financial year 2020 to 2021.
This model, for each of the eight departments, shows the:
  • quantity of units sold each quarter
  • average units sold per quarter
  • highest quarterly sales
  • lowest quarterly sales.
The manager of the supermarket plans to use this model in meetings with the eight department heads so that they can set targets for future sales.

(a) Identify the functions or formulas that could be used in the cells:

(i) F3 
(ii) G3 
(iii) H3 
(iv) I3
This model needs to be developed to set targets for increasing the sales over the next financial year for the bakery department. The target percentage increase can be changed within the model.
(b) Design a spreadsheet model that will calculate the target sales for the bakery department.
The model will display the updated sales targets for each quarter, the whole year and the average per quarter. The initial sales target is an increase of $7\%$.
(c) Describe one limitation of this model for predicting future profits.
The supermarket uses a second model to predict future sales increases based on previous performance. The spreadsheet in Figure 5 is part of that model. For the year 2020 to 2021, it shows the:
  • revenue for sales taken by each department in each quarter
  • cost of purchasing the stock for the supermarket
  • utility costs of running the store
  • staff costs.
All values have been rounded to the nearest dollar.
(d) Identify the formulas used in the cells:
(i) B30
(ii) B32
The names of the departments have been stored in a one-dimensional array, DEPARTMENT[]. It has been decided to use a number of parallel one-dimensional arrays to store the quarterly figures and the annual totals for each department.
(e) Construct the pseudocode required to enter the data for each department for each separate quarter, calculate the annual totals and store the data into suitably named arrays. [6]

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.1 — The basic model (Parts (a), (b), (c), (d))
• Topic B.2 — Simulations (Part (e))

▶️ Answer/Explanation

(a)(i)
For the correct answer:
=SUM(B3:E3) // =B3+C3+D3+E3

Cell F3 sits in the “Whole year” column and needs to total all four quarterly sales figures for the Bakery row. The simplest approach is to use the SUM function across the range B3 to E3, which adds the four quarterly values to give the annual total of $44.4$.

(a)(ii)
For the correct answer:
=AVERAGE(B3:E3) // =(B3+C3+D3+E3)/4 // =F3/4

The quarter average in G3 represents the mean sales per quarter. You can either directly average the four quarterly cells, or more elegantly divide the already-calculated whole-year figure in F3 by $4$, which yields $44.4 / 4 = 11.1$.

(a)(iii)
For the correct answer:
=MAX(B3:E3)

To identify the highest-performing quarter, the MAX function scans through cells B3 to E3 and returns the largest value. For Bakery, this would select $14.7$ from the Jan-Mar column.

(a)(iv)
For the correct answer:
=MIN(B3:E3)

Similarly, the MIN function picks out the smallest quarterly value, which for Bakery is $9.4$ from both the Jul-Sep and Apr-Jun quarters.
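
The four spreadsheet functions in part (a) have direct equivalents in most programming languages. The sketch below is illustrative only: the quarterly figures are hypothetical placeholders rather than the values from the exam figure, and the variable names are assumptions.

```python
# Hypothetical quarterly sales for one department (cells B3:E3 in the model).
# These numbers are placeholders, not the figures from the exam spreadsheet.
quarters = [10.0, 12.0, 15.0, 11.0]

whole_year = sum(quarters)              # F3: =SUM(B3:E3)
average = whole_year / len(quarters)    # G3: =F3/4 or =AVERAGE(B3:E3)
highest = max(quarters)                 # H3: =MAX(B3:E3)
lowest = min(quarters)                  # I3: =MIN(B3:E3)

print(whole_year, average, highest, lowest)
```

Dividing the stored total by the number of quarters mirrors the =F3/4 option, which reuses a value the model has already calculated.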

(b)
For the correct answer (5 marks):
Inclusion of $7$ or $1.07$ anywhere in the answer (either in a cell or in formula); Use of absolute cell reference for common target $\%$ in cell B16; Use correct formula in cell B16, either $=B3 * 1.07$ or $=B3 * (1 + \$C\$12 / 100)$; Use correctly adapted formulas in cells C16, D16, E16; Use correct formulas in F16 and G16, either similar to columns B-E or just directly calculating the sum and average on B16..E16.

The model should place the target percentage ($7\%$) in a separate cell with an absolute reference so it can be changed later without rewriting every formula. Then each quarter’s target is calculated by multiplying the original sales figure by $1.07$ (or equivalently by $(1 + \text{target\%}/100)$). The target whole-year and target average can then be computed from these four new target quarterly values using SUM and AVERAGE, mirroring the structure of the original data.
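
The same calculation can be sketched in Python to show why keeping the percentage in one place matters. The function name `target_sales` and the bakery figures are hypothetical; only the $7\%$ initial target comes from the question. Results are rounded to two decimal places purely for display.

```python
def target_sales(quarterly_sales, target_percent):
    """Return (quarter targets, whole-year target, average target)."""
    factor = 1 + target_percent / 100    # same role as (1 + $C$12/100)
    targets = [round(q * factor, 2) for q in quarterly_sales]
    whole_year = round(sum(targets), 2)
    average = round(whole_year / len(targets), 2)
    return targets, whole_year, average

# Hypothetical bakery figures; 7 is the initial target from the question.
bakery = [10.0, 12.0, 15.0, 11.0]
targets, year_total, quarter_avg = target_sales(bakery, 7)
print(targets, year_total, quarter_avg)
```

Changing the second argument plays the same role as editing the single percentage cell: every target updates without touching the formula itself.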

(c)
For the correct answer (any one):
The model only shows data for one year — it may not be an accurate basis for predicting next year’s outcome. The model assumes there is the same target $\%$ for each department — this may not be an accurate reflection of the way the business is developing. The future sales cannot be predicted based on a desired $\%$ increase — for a prediction you need trends over time.

One fundamental weakness is that the model extrapolates from a single year’s data. Sales figures for 2020–2021 might have been unusually high or low due to one-off events (like a pandemic lockdown boosting grocery sales or a supply chain disruption depressing deli figures). Using just one data point to forecast the future ignores seasonal trends, economic cycles, and competitor activity, making the $7\%$ target potentially unrealistic or, conversely, far too conservative.

(d)(i)
For the correct answer:
=B24+B28 // =B24+B26+B27 // =SUM(B24, B28)

Cell B30 sits in the “Total costs” row and must combine the wholesale costs subtotal (row 24) with the other costs subtotal (row 28). A straightforward addition formula $=B24+B28$ achieves this correctly.

(d)(ii)
For the correct answer:
=B13-B30 // =B13-B24-B28 // =B13-B24-B26-B27

Cell B32 represents the profit, which is defined as total revenues minus total costs. Looking at the spreadsheet layout, row 13 holds the total revenues and row 30 holds the total costs, so the formula $=B13-B30$ computes the difference, yielding the profit figure of $22.2$ for the Apr-Jun quarter.

(e)
For the correct answer (6 marks):
Use of a loop with correct parameters; Prompts to let user know for which department to enter data/which quarter; Appropriately named array for at least one quarter; All arrays appropriately named; Correct formula to add together annual data; All input/calculated data assigned to correct array elements.

loop COUNT from 0 to 7
    output "Enter data for ", DEPARTMENT[COUNT], " Department"
    output "Apr - Jun: "
    input APR_JUN[COUNT]
    output "Jul - Sep: "
    input JUL_SEP[COUNT]
    output "Oct - Dec: "
    input OCT_DEC[COUNT]
    output "Jan - Mar: "
    input JAN_MAR[COUNT]
    WHOLEYEAR[COUNT] = APR_JUN[COUNT] + JUL_SEP[COUNT] + OCT_DEC[COUNT] + JAN_MAR[COUNT]
end loop

The pseudocode uses a loop that iterates through each of the eight departments (index 0 to 7). For each iteration, it prompts the user with the department name, collects four quarterly sales values into parallel arrays, and then computes the annual total by summing those four inputs and storing the result in a separate WHOLEYEAR array at the same index position.
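
For readers who want to run the logic, here is one possible Python rendering of the pseudocode. It is a sketch, not part of the mark scheme: the function name `enter_sales` and the `read_value` parameter (which lets canned answers replace keyboard input) are assumptions, and the two department names and eight figures in the usage lines are made up.

```python
def enter_sales(departments, read_value=input):
    """Collect quarterly figures per department and compute annual totals."""
    apr_jun, jul_sep, oct_dec, jan_mar, whole_year = [], [], [], [], []
    quarters = [("Apr - Jun: ", apr_jun), ("Jul - Sep: ", jul_sep),
                ("Oct - Dec: ", oct_dec), ("Jan - Mar: ", jan_mar)]
    for name in departments:
        print("Enter data for", name, "Department")
        for prompt, array in quarters:
            array.append(float(read_value(prompt)))
        # Parallel arrays: the newest entry of each belongs to this department.
        whole_year.append(apr_jun[-1] + jul_sep[-1] + oct_dec[-1] + jan_mar[-1])
    return apr_jun, jul_sep, oct_dec, jan_mar, whole_year

# Usage with canned answers instead of typing (hypothetical data).
canned = iter(["1", "2", "3", "4", "5", "6", "7", "8"])
result = enter_sales(["Bakery", "Deli"], read_value=lambda prompt: next(canned))
print(result[4])
```

As in the pseudocode, the same index in every list refers to the same department, which is what makes the arrays parallel.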

Question 8

Artificial neural networks are designed to mimic the functioning of biological networks.
(a) Draw a block diagram to represent the interaction between the different parts of an artificial neural network.
A number of applications involving artificial neural networks are related to communication. This includes: speech recognition, optical character recognition and natural language processing.
(b) Explain how speech recognition and natural language processing are used to facilitate communication.
Some hotels are starting to use robots. Robots are used as receptionists to welcome and register guests. Others take verbal orders for drinks and make them.
(c) Outline two key structures of natural language that may make it difficult for robots to understand what the hotel guests are saying.
(d) Outline two developments in the use of modern machine text translators.
Using machine learning enables the trained robot to work independently. Two examples of machine learning are supervised learning and unsupervised learning.
(e) Compare supervised and unsupervised learning in relation to human–computer interaction through artificial neural networks.

Most-appropriate topic codes (IB Computer Science HL):

• Topic B.4 — Communication modelling and simulation (Parts (a), (b), (c), (d), (e))

▶️ Answer/Explanation

(a)
For the correct answer:

The block diagram should show three distinct columns of nodes: an input layer on the left that receives raw data, one or more hidden layers in the middle where weighted computations occur, and an output layer on the right that produces the final result. Arrows connecting each node to nodes in the next layer represent the weighted pathways through which information flows, mimicking how biological neurons pass signals across synapses.

(b)
For the correct answer:
Speech recognition requires the input of an acoustic signal; input to speech recognition systems must be precise; the neural network tries to identify the words that were spoken using deep learning; the output is given based on probability, where the system offers the highest probability ‘suggestion’ as the required output. Natural language processing is the ability of the computer system to understand human language; designed to understand the input from a user as though they were interacting with another person; takes the data from speech recognition as its input and attempts to work out what is required.

Speech recognition acts as the ears — it captures raw audio, breaks it into phonemes, and uses a trained neural network to map those sound patterns onto probable words. Natural language processing then acts as the brain — it takes that transcribed text and parses its grammar, intent, and context to determine what the speaker actually wants, turning “What’s the weather like tomorrow?” from a string of words into a database query that returns a meaningful forecast.

(c)
For the correct answer (two structures, each outlined):
Syntax: spoken natural language often uses sentences that are not semantically complete, e.g. “My case please”, which means something like “Get me my case please”. Semantics: natural language includes many referents that need to be resolved, e.g. words such as “there” and “it” in human speech like “My case is over there. Bring it to my room”. Pragmatics: natural language uses implied understanding, e.g. a person might say “Can you take my case to my room?” to which “Yes, I can” would not be a correct answer.

One major hurdle is semantics — the meaning of words in context. When a guest says “My case is over there,” the robot must resolve the referent “there” by physically identifying which object in the room is being pointed at, something humans do instinctively but machines find computationally challenging. Another is pragmatics — understanding intent beyond literal meaning. If a guest asks “Can you bring me a towel?”, a robot that answers “Yes, I can” and does nothing has failed, because the question was not a capability inquiry but a polite request for action.

(d)
For the correct answer (two developments, each outlined):
Hybrid machine translation — makes use of the strengths of statistical and rule-based translation techniques. Neural machine translation — makes use of a deep learning-based approach.

Neural machine translation represents a dramatic leap forward: instead of translating word-by-word or phrase-by-phrase, a deep neural network reads the entire source sentence, encodes its meaning into a dense vector representation, and then generates a fluent target-language sentence that captures the original intent. Hybrid systems combine this with rule-based grammar checking, using statistical probabilities for fluidity while falling back on explicit linguistic rules to handle edge cases that pure neural approaches might mangle.

(e)
For the correct answer:
Supervised learning — uses rules to follow to match output to known/given input; makes use of training data to learn the link between the input and output; has access to labels and prior knowledge of the datasets; the training data can be generalized and the resulting model accurately used on new data. Unsupervised learning — does not use output data / has no labelled prior knowledge; the algorithm is left to discover and present the hidden structure in the data; is able to use its own prior knowledge/learning to develop further; which allows it to perform more complex tasks over time.

The fundamental difference is whether the training data comes with answer keys. In supervised learning, you show the network thousands of labelled examples — pictures of cats labelled “cat” and dogs labelled “dog” — and it learns to map inputs to correct outputs by minimizing its prediction errors against those known labels. Unsupervised learning receives no such guidance; it is thrown into a sea of unlabelled data and must independently discover clusters, patterns, or anomalies, much like a human exploring a new city without a map and gradually forming mental neighbourhoods based on observed similarities.
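
The contrast can be made concrete with a deliberately tiny sketch. Note the substitution: this is not a neural network. A nearest-mean classifier stands in for supervised learning and a two-group clustering loop stands in for unsupervised learning, simply to keep the example short. The weights (in kg) and all function names are hypothetical, and the clustering assumes at least two distinct values.

```python
def train_supervised(samples):
    """Learn one mean per label from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, value):
    """Return the label whose learned mean is closest to value."""
    return min(model, key=lambda label: abs(model[label] - value))

def cluster_unsupervised(values, rounds=10):
    """Split unlabelled values into two groups with a tiny 2-means loop."""
    low, high = min(values), max(values)    # initial cluster centres
    for _ in range(rounds):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)

# Supervised: every training weight comes with its answer key.
model = train_supervised([(4.0, "cat"), (5.0, "cat"), (20.0, "dog"), (25.0, "dog")])
guess = predict(model, 6.0)

# Unsupervised: the same weights with no labels; only groups are discovered.
groups = cluster_unsupervised([4.0, 5.0, 20.0, 25.0])
print(guess, groups)
```

The supervised model can name its answer because it was given names during training; the unsupervised routine finds the same two groups but has no idea what to call them.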

Question 9

The web browser shown in Figure 6 includes a feature that enables the user to inspect the source code.
(a) Outline why the URL in Figure 6 is a “Full URL”. 
(b) Sketch the output of the code in Figure 6. 
(c) Outline why the web page in Figure 6 is a static web page. 
The web page in Figure 6 uses JavaScript elements.
(d) Explain why the support of client-side scripting languages is a key function of web browsers. 
(e) Distinguish between a protocol and a standard.
A user wants to access another website and enters its URL into the address bar. 
(f) Describe how the domain name service (DNS) enables the user to access the new site.
A user wishes to download a video resource from a web-based host to their smartphone. The site offers a lossy download option and lossless download option. It was recommended that the user uses the lossy compression option for this download.
(g) Explain why lossy compression is used in mobile computing. 

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.1 — Creating the web (Parts (a), (b), (c), (d), (e), (f))
• Topic C.3 — Distributed approaches to the web (Part (g))

▶️ Answer/Explanation

(a)
For the correct answer:
The URL contains the protocol/scheme, domain, sub domain and Top-level domain (path) and the file resource (Query/fragment). For reference: Scheme/Protocol: https://, Subdomain: www, Domain Name: educationalsite.org, Top level domain: org, Path: /assets/home.html

A full URL leaves no ambiguity about how to reach the resource. It explicitly states the protocol (https://), the subdomain (www), the registered domain (educationalsite.org), the top-level domain (.org), and the exact file path (/assets/home.html) — every component the browser needs to locate and retrieve that specific page is present in the address.
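
The components named above can be pulled apart with Python's standard `urllib.parse` module. The URL string is assembled from the parts listed in the answer; the final split of the host name is a naive assumption that only works for simple three-part hosts.

```python
from urllib.parse import urlsplit

# URL assembled from the components listed in the answer above.
url = "https://www.educationalsite.org/assets/home.html"
parts = urlsplit(url)

scheme = parts.scheme      # protocol
host = parts.hostname      # subdomain + domain + top-level domain
path = parts.path          # location of the file on the server

# Naive split of the host name; assumes a simple sub.domain.tld shape,
# which does not hold for every real host name (e.g. "example.co.uk").
subdomain, domain, tld = host.split(".")
print(scheme, subdomain, domain, tld, path)
```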

(b)
For the correct answer:

The HTML renders a heading reading “Mother Tongue Languages” followed by a table with a single header row listing eight language columns. However, the table body contains no actual data rows, so below the header the table appears empty — just the column titles sitting above blank space, which would look incomplete to a viewer.

(c)
For the correct answer:
The web page does not display different content each time it is viewed; it doesn’t change dependent on either user input, time of day etc.; A static web page requires the source code to be rewritten/edited/modified to add new content; A static page is one where the construction of the page is controlled by the browser on the client device and does not link to an external data source; whereas a dynamic web page is linked to an external data source.

A static page is essentially a fixed document sitting on a server. Every visitor who navigates to that URL sees exactly the same HTML file — the content doesn’t adapt based on who is logged in, what time it is, or any database query. To change anything on the page, someone must physically edit the HTML source code and re-upload it, unlike dynamic pages that assemble content on the fly from databases or APIs.

(d)
For the correct answer:
Client-side scripting languages would include JavaScript, jQuery, etc.; These are commonly used languages that add functionality and interaction to pages; There is broad implementation of the languages and they are accepted as a standard; Client-side languages are rendered in the browser rather than on the server; Failure to correctly support them may reduce functionality or cause errors in appearance or interaction; Broad support of client-side scripting languages allows similar functionality and therefore a similar browsing experience independent of browser type.

Modern web pages are not just static documents — they are interactive applications. Client-side scripting (primarily JavaScript) enables everything from form validation to animated menus to real-time content updates without reloading the page. Browsers must interpret and execute these scripts locally on the user’s device; without this capability, the vast majority of today’s websites would break, becoming inert, unresponsive shells of their intended selves.

(e)
For the correct answer:
A protocol is a set of rules which enable network communication that must be followed; A standard is a set of rules which have broad support and should be adhered to and provide a framework for development.

A protocol is a precise agreement on how data is formatted and transmitted — like HTTP defining that a request must start with a method (GET, POST) followed by headers and a body. A standard is a wider, often industry-backed specification that multiple protocols and technologies conform to — HTML5 is a standard because all major browser vendors have agreed to implement it, ensuring web pages render consistently regardless of which browser you use.

(f)
For the correct answer:
The DNS server or Domain Name Service server translates the Domain names into an IP Address; The browser first checks its own cache to see if it has a recent DNS record for the domain. If found, it uses this information to directly connect to the website’s IP address. If the required DNS record is not found locally, the DNS query is sent to the configured DNS resolver until the top-level Domain (TLD). Top-level domain servers are the ultimate authority for the domain and hold the master list of sites for the domain. Address resolution occurs in the application layer of TCP/IP. The DNS resolver sends the IP address back to the user’s web browser and information is displayed. If it is not found, an error message is sent back to the client’s browser.

DNS acts like the internet’s phonebook. When you type a human-readable address like “example.com” into your browser, the DNS system works through a hierarchy — first checking your local cache, then querying your ISP’s resolver, which may in turn ask the .com top-level domain server, ultimately returning the numerical IP address (like $93.184.216.34$) that your computer needs to actually connect to the correct server and load the website.
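
The lookup order described above (cache first, then resolver, then an error if nothing is found) can be simulated without touching the network. This is a simplified model under stated assumptions: the dictionaries stand in for real DNS data, the function names are hypothetical, and real resolution involves more layers (root, TLD and authoritative name servers) than shown here.

```python
# All records below are hypothetical stand-ins for real DNS data.
BROWSER_CACHE = {}
TLD_SERVERS = {"com": {"example.com": "93.184.216.34"}}

def resolver_lookup(domain):
    """Stand-in for the ISP resolver: ask the TLD server for the domain."""
    tld = domain.rsplit(".", 1)[-1]
    return TLD_SERVERS.get(tld, {}).get(domain)

def resolve(domain):
    """Return (ip, source), or (None, 'error') if the name is unknown."""
    if domain in BROWSER_CACHE:
        return BROWSER_CACHE[domain], "cache"
    ip = resolver_lookup(domain)
    if ip is None:
        return None, "error"
    BROWSER_CACHE[domain] = ip    # remember the answer for next time
    return ip, "resolver"

first = resolve("example.com")     # answered by the resolver
second = resolve("example.com")    # answered from the cache
missing = resolve("missing.org")   # no record anywhere
print(first, second, missing)
```

The second call never reaches the resolver, which is the point of caching: repeat visits skip the slower hierarchical lookup.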

(g)
For the correct answer (up to 4 marks, from clusters):
File Size — Lossy compression allows the file size to be reduced more than lossless compression; this means it can be used to transfer information between devices more rapidly. Quality — Although there will be some reduction in picture quality or sound, the advantages of greater speed of transfer will outweigh the loss of image quality as the screen size of the mobile device may not be sufficiently large for the imperfections to be apparent. Cost — Lossy compression resulting in smaller file size would require less data usage, conserving bandwidth and reducing costs for mobile users.

On a mobile device, every megabyte counts. Lossy compression aggressively shrinks video files by discarding perceptual details the human eye barely notices — slight colour variations in shadow areas or audio frequencies outside normal hearing range. The result might be a $50\text{MB}$ video becoming $10\text{MB}$, downloading five times faster over a cellular connection, consuming a fraction of the user’s monthly data allowance, and still looking perfectly acceptable on a palm-sized smartphone screen where minor compression artefacts are invisible.
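
The "five times faster" claim follows directly from the file sizes. The short calculation below assumes a constant transfer speed of 2 MB per second, which is a made-up figure; latency and protocol overhead are ignored.

```python
def download_seconds(size_mb, speed_mb_per_s):
    """Time to transfer a file at a constant speed (ignores latency)."""
    return size_mb / speed_mb_per_s

speed = 2.0                                    # assumed cellular speed, MB/s
original_time = download_seconds(50, speed)    # uncompressed example size
lossy_time = download_seconds(10, speed)       # after lossy compression
speedup = original_time / lossy_time
print(original_time, lossy_time, speedup)
```

The ratio does not depend on the assumed speed: whatever the connection, one fifth of the data takes one fifth of the time.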

Question 10

While working on an assignment task for History of the Americas, Brooke enters a question into a search engine (Figure 7).
The search returns 173000 results in 0.043 seconds.
Another student indicated that Brooke would obtain better results using keywords rather than a search phrase.
(a) Outline why keywords would be used in a search rather than a phrase.
Web crawling indexes webpages in the search engine’s database (Figure 8). The two web crawling methods used are a breadth-first crawl and a depth-first crawl.
(b) In Figure 8, A has not been previously visited. State the first three webpages visited in a breadth-first search.
(c) Outline one reason why search engines use a breadth-first search.
As the web crawler traverses the pages in a website it collects data. This is used to form the metrics data for search rankings.
(d) Identify two features of the PageRank algorithm.
Many web developers attempt to optimize the search results for their site.
(e) Explain the impact for DP History students such as Brooke if the web developer uses black hat search engine optimization techniques.

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.2 — Searching the web (Parts (a), (b), (c), (d), (e))

▶️ Answer/Explanation

(a)
For the correct answer:
Keywords are often matched with the metatag keywords and description as well as the page content increasing the accuracy of the search; Use of keywords helps to clarify the searcher’s thinking; Using a phrase adds additional terms that are either ignored by the search engine or could provide false positive results; Can reduce contextual differences derived from use of language in the results.

When Brooke types a full natural-language question like “What were the causes of the American Civil War?”, the search engine has to parse irrelevant filler words (“what”, “were”, “the”, “of”) that dilute the query’s focus. By reducing the query to keywords — “causes American Civil War” — every term carries analytical weight, matching more precisely against indexed page content and metadata, producing results that are more relevant and less cluttered by pages that happen to contain the phrase “what were the” in some unrelated context.
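
Reducing a question to keywords can be sketched as filtering out filler words. The stop-word list here is a small illustrative assumption, not the list any real search engine uses, and `to_keywords` is a hypothetical helper.

```python
# Small illustrative stop-word list; real search engines use far larger ones.
STOP_WORDS = {"what", "were", "the", "of", "a", "an", "is", "in"}

def to_keywords(question):
    """Strip the question mark and stop words, keeping the other terms in order."""
    words = question.replace("?", "").split()
    return [word for word in words if word.lower() not in STOP_WORDS]

keywords = to_keywords("What were the causes of the American Civil War?")
print(" ".join(keywords))
```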

(b)
For the correct answer:
A-B-E

In a breadth-first crawl starting from node A, the crawler visits A first, then explores all of A’s immediate neighbours at the same depth level before going deeper. A links to B and to E, and B is discovered first, so the sequence is A, then B, then E: the shallowest layer is covered before the crawler descends.
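
A queue-based sketch shows why the order comes out this way. The link structure is hypothetical: only the fact that A links to B and E is taken from the explanation above, and the deeper pages (C, D, F) are invented to show that they are visited later.

```python
from collections import deque

def breadth_first_order(links, start):
    """Return pages in the order a breadth-first crawl would visit them."""
    visited = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()              # oldest discovered page first
        for neighbour in links.get(page, []):
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)
    return visited

# Hypothetical link structure; the real one is shown in Figure 8.
links = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F"]}
order = breadth_first_order(links, "A")
print(order[:3])
```

Swapping the queue for a stack (taking the newest discovered page first) would turn this into a depth-first crawl, which follows one chain of links to its end before backtracking.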

(c)
For the correct answer (any one):
The breadth first crawl ensures that all pages at a certain depth are indexed before moving deeper, giving a thorough snapshot of the web. The breadth first crawl completes/visits all the links on the page before moving to the next page, whereas the depth first crawl follows the line of one series of links from page to page until it ends — potentially this means the initial pages may not be completely indexed if the stack is large. Prioritizes more important, higher-level pages, which are more likely to contain significant information and links to other relevant content.

Breadth-first crawling is like systematically mapping a city by first cataloguing every building on Main Street before venturing down any side alleys. This ensures the crawler builds a comprehensive index of the most prominent, highly-linked pages early in the crawl, avoiding the risk of getting lost down an infinitely deep chain of obscure, low-value pages and missing the important hub content altogether.

(d)
For the correct answer (any two):
Page rank is a numerical value that represents the importance of a page; A damping factor is applied to the calculation; Page rank algorithm is recursive; Page rank uses the page rank of incoming / outgoing links.

PageRank fundamentally treats links as votes of confidence. One key feature is its recursive nature: a page’s rank depends on the ranks of the pages linking to it, which in turn depend on their own incoming links, requiring iterative computation until values converge. Another feature is the damping factor (typically around $0.85$), which models the probability that a random surfer will continue clicking links rather than jumping to an entirely new page, preventing rank from being trapped in circular link loops.
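
Both features, the iterative recomputation and the factor of $0.85$, are visible in a minimal sketch. This is a simplified textbook formulation rather than the algorithm any search engine actually runs; the three-page web is hypothetical, and the code assumes every linked page also appears as a key in the dictionary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small base share from the "random jump".
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # Pages without outgoing links share their rank with every page.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: A and B both link to C, C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page C ends up highest because two pages vote for it, A benefits from C's vote, and B, with no incoming links, keeps only the base share.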

(e)
For the correct answer (up to 4 marks, with reference to DP History):
Information inaccuracy — As the Black hat SEO distorts search rankings (using keyword stuffing), it provides false/inaccurate information; the students may potentially have unintended access to inappropriate material; the validity of material is questionable using unethical techniques. Time and Resources — This may lead to wastage of bandwidth visiting unintended websites; the time wasted in visiting the site. Security — including exposure to malware and other security issues; potential damage from this.

For Brooke researching History of the Americas, black hat SEO poses a serious academic threat. A site using keyword stuffing might invisibly pack its pages with terms like “American Revolution primary sources,” tricking the search engine into ranking it highly even though its actual content is plagiarised, factually wrong, or completely unrelated — perhaps a shopping site selling revolutionary-war-themed merchandise. Brooke could waste precious research time sifting through these deceptive results, and worse, might unknowingly cite inaccurate information in her assignment, undermining her academic integrity and grade.

Question 11

ARPANET was developed as a project by the American military. It became the technical foundation for the internet. Figure 9 is a representation of ARPANET in 1974.
(a) Outline one reason why ARPANET was developed as a distributed network.
The original ARPANET used cable networks within the US. When linking to Hawaii and the United Kingdom it used a satellite link. The network consisted of connected mainframe computers hosting servers that had a number of connected terminals (clients).
(b) Outline one advantage of using a client-server architecture.
The nature of computing has evolved from client-server architecture to peer-2-peer and cloud computing.
(c) Compare peer-2-peer and cloud computing.
Decentralization of the web is partly a result of open standards, interoperability and distributed networks.
(d) To what extent have open standards and interoperability supported the decentralization of the web?

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.3 — Distributed approaches to the web (Parts (a), (d))
• Topic C.4 — The evolving web (Parts (b), (c))

▶️ Answer/Explanation

(a)
For the correct answer (any one):
Connection and sharing between nodes — enabling better resource utilization / collaboration / distributed computing power / storage. Reliability of the network in the event of failure of one or more nodes (fault tolerance) — providing redundancy and replication to ensure data durability and availability. Distributed networks are more scalable, especially in scenarios where the network is distributed among large geographical areas — they leverage localized resources and reduce latency.

The military motivation was survivability. In a centralized network, destroying the single central hub would collapse all communication. ARPANET’s distributed design meant messages could dynamically reroute through any available path — if one node was knocked out (say, by a Soviet attack), traffic would simply flow around the damage through surviving nodes, keeping command and control operational.

(b)
For the correct answer (any one):
Has structured hierarchy/centralized control — the functionality of the network will be centrally controlled, giving more systematic and structured access. Enhanced security — as the server is responsible for security, it facilitates better implementation of monitoring. Enables management of resources, storage, applications and users leading to cost saving. Client devices don’t need to have as much processing power, memory, storage, etc. because the applications are kept and run on the server.

In a client-server architecture, the server acts as a gatekeeper. All critical data, authentication, and business logic reside on a central machine that can be secured, backed up, and patched by administrators. Thin client terminals — even old, low-powered machines — can access sophisticated applications because all the heavy computational lifting happens server-side, reducing hardware costs across the organisation.

(c)
For the correct answer (2 marks for similarities/contrasts, up to 4 marks):
In cloud computing resources are managed and provided by centralized data centres and accessed via the internet; it is often provided as a service (PaaS, SaaS, private cloud, community cloud, public cloud); it is generally scalable / adaptable / distributed. Peer to peer computing is distributed / decentralized; peers directly share resources such as files or processing power; level of access and control is generally set by the users; it is focused on sharing rather than supplying a specific service; there is equality between the peers in the network.

Cloud computing centralizes resources in vast, professionally managed data centres — when you use Google Drive, your files sit on Google’s servers, and you access them through a client. Peer-to-peer, in contrast, has no central authority: every participant is simultaneously a consumer and a provider, sharing their own disk space and bandwidth directly with others. Cloud offers reliability and convenience at the cost of control; P2P offers resilience and freedom from corporate oversight but depends on the goodwill of individual peers staying online.

(d)
For the correct answer (3 marks for open standards, 3 for interoperability):
Decentralization means that resources are distributed rather than localized, ensuring that diverse systems and devices can communicate seamlessly. Open standards (HTTP, TCP/IP, HTML, etc.) shift the control from government (and corporate) control to a broad selection including individuals, like-minded groups and organizations; this enables freedom of expression, sharing, cooperation, communication and collaboration. Interoperability is a property of a product or system whose interfaces are understood by and work with other systems; there are no restrictions, enabling ease of processing, manipulation and transfer of data; interoperability enables information systems (particularly databases) to exchange information without significant modification or the use of third-party agents; interoperability is based on open standards.

Open standards have been the bedrock of web decentralisation. Because TCP/IP, HTTP, and HTML are publicly documented and freely implementable, no single company or government can gatekeep who builds a web server or browser. Anyone can create a website or a new web service that interoperates with everything else, which is why we have millions of independent sites rather than a single monopolistic platform. Interoperability amplifies this — your email from a tiny self-hosted server can still reach someone on Gmail because both follow the SMTP standard, meaning power remains distributed across countless independent operators rather than concentrated in a few walled gardens.

Question 12

Figure 10 shows a web sub-graph. The adjacency matrix in Figure 11 quantifies the edges connecting the nodes.
(a) Distinguish between a web graph and a web sub-graph.
(b) With reference to Figure 10, which nodes would be part of the:
(i) in component
(ii) strongly connected core
(iii) out component
Search engines use web graphs as a factor in the ranking of websites in a search.
(c) With reference to the unweighted directed web sub-graph in Figure 10 and adjacency matrix in Figure 11, explain why the search engine ranking based on this information may not always be accurate.

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.5 — Analysing the web (Parts (a), (b), (c))

▶️ Answer/Explanation

(a)
For the correct answer:
A web graph describes the directed links between pages (entire network of web pages) on the world wide web; A web sub graph looks at a subset of the world wide web.

A web graph is the theoretical representation of the entire World Wide Web as a giant directed graph — every webpage is a node and every hyperlink between pages is a directed edge. A web sub-graph zooms in on just a tiny fraction of this — perhaps all pages related to a single topic or all pages within one domain — making it manageable to analyse but necessarily incomplete in representing the full connectivity picture.

(b)(i)
For the correct answer:
B & I

The in-component consists of nodes that can reach the strongly connected core but cannot be reached from it. Looking at the graph structure, nodes B and I have outgoing links that eventually lead into the central cluster, but no paths exist from the core back to them, placing them in the in-component.

(b)(ii)
For the correct answer:
E, D, C, G & H

The strongly connected core is the set of nodes where every node can reach every other node via directed paths. Nodes E, D, C, G, and H form this mutually reachable cluster — from any one of them you can navigate to any other by following the directed edges, creating a densely interconnected centre.

(b)(iii)
For the correct answer:
F & A

The out-component contains nodes that can be reached from the core but cannot reach back into it. Nodes F and A sit downstream — you can get to them from the core nodes, but there is no directed path leading back, making them terminal sinks in the graph structure.
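The three definitions used in part (b) can be checked mechanically with reachability tests. The sketch below is only an illustration: the edge list in `sample()` is hypothetical (it is not the edge set of Figure 10, which is not reproduced here), and the names `BowTie`, `classify` and `sample` are invented for the example. A node is in the core if it can both reach and be reached from a known core node, in the in-component if it can only reach the core, and in the out-component if it can only be reached from it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BowTie {
    // Hypothetical sub-graph: B -> C, C -> D, D -> E, E -> C, E -> A.
    static Map<String, List<String>> sample() {
        return Map.of(
            "B", List.of("C"),
            "C", List.of("D"),
            "D", List.of("E"),
            "E", List.of("C", "A"));
    }

    // Returns every node reachable from start by following directed edges.
    static Set<String> reachable(Map<String, List<String>> graph, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (seen.add(node)) {
                for (String next : graph.getOrDefault(node, List.of())) {
                    stack.push(next);
                }
            }
        }
        return seen;
    }

    // Builds the same graph with every edge pointing the other way.
    static Map<String, List<String>> reverse(Map<String, List<String>> graph) {
        Map<String, List<String>> rev = new HashMap<>();
        for (Map.Entry<String, List<String>> e : graph.entrySet()) {
            for (String to : e.getValue()) {
                rev.computeIfAbsent(to, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return rev;
    }

    // Classifies node relative to coreNode, which is assumed to lie in the core.
    static String classify(Map<String, List<String>> graph, String coreNode, String node) {
        boolean reachedFromCore = reachable(graph, coreNode).contains(node);
        boolean reachesCore = reachable(reverse(graph), coreNode).contains(node);
        if (reachedFromCore && reachesCore) return "core";
        if (reachesCore) return "in";
        if (reachedFromCore) return "out";
        return "other";
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = sample();
        for (String n : List.of("A", "B", "C", "D", "E")) {
            System.out.println(n + ": " + classify(g, "C", n));
        }
    }
}
```

Applying the same two reachability tests to the edges of Figure 10 by hand is one way to confirm the groupings given in the mark scheme.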

(c)
For the correct answer (up to 4 marks):
A directed graph is one where the edges have a direction; each edge is represented by an arrow. A weighted graph is one where the vertices or edges of the graph have been assigned a weight or value. This web sub-graph has edges that are directed, but since no values are added to the graph it is unweighted; The adjacency matrix shows the number of connections made but this is not represented in the graph; With only partial information provided, the significance is harder to judge. Search engine algorithms like PageRank make use of the incoming and outgoing links and the ranks of the other pages. A node’s PageRank score is developed from factors including the volume of incoming and outgoing links, which is only partly reflected in the web graph and adjacency matrix; Search engines potentially value sites with high levels of connections (authority and hub scores) and use this to provide a higher ranking. The ranking is not determined by these links alone, and different search engine algorithms may place different weightings on the incoming and outgoing edges.

The fundamental problem is that counting links alone — as a simple adjacency matrix does — treats every link as equal. In reality, a link from the BBC homepage carries vastly more authority than a link from an obscure personal blog. The unweighted graph cannot distinguish the quality, context, or relevance of links. Furthermore, modern search ranking incorporates hundreds of signals beyond link structure: content relevance, user engagement metrics, page load speed, and mobile-friendliness. Reducing ranking to just the topological structure of a sub-graph would produce results that are easily manipulated and frequently irrelevant to actual user queries.
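The gap between the unweighted graph and the adjacency matrix can be made concrete with a small sketch. The matrix in `sample()` is hypothetical (it is not Figure 11), and neither method is a real search engine's ranking formula; the point is only that counting arrows and summing matrix entries can order the same pages differently.

```java
class LinkCounts {
    // Hypothetical matrix: entry [i][j] is the number of links from page i to page j.
    static int[][] sample() {
        return new int[][] {
            {0, 3, 1},
            {0, 0, 1},
            {1, 0, 0}
        };
    }

    // Unweighted view: one arrow per linking page, however many links it contains.
    static int[] inDegree(int[][] adjacency) {
        int[] result = new int[adjacency.length];
        for (int i = 0; i < adjacency.length; i++) {
            for (int j = 0; j < adjacency.length; j++) {
                if (adjacency[i][j] > 0) {
                    result[j]++;
                }
            }
        }
        return result;
    }

    // Counted view: sums the entries of each column, so repeated links add up.
    static int[] inLinkTotal(int[][] adjacency) {
        int[] result = new int[adjacency.length];
        for (int i = 0; i < adjacency.length; i++) {
            for (int j = 0; j < adjacency.length; j++) {
                result[j] += adjacency[i][j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(inDegree(sample())));
        System.out.println(java.util.Arrays.toString(inLinkTotal(sample())));
    }
}
```

With this sample, page 2 has the most incoming arrows while page 1 has the most incoming links, so the two views disagree about which page looks most important.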

Question 13

The Genome Reference Consortium (GRC) is an organization developed from the Human Genome Project. The GRC is attempting to map human genetic structure to develop a deeper understanding of our genes. The GRC maintains open online databases of genetic information such as genetic materials for a number of different species including humans, chickens, zebra fish and mice. It also provides the names of the scientists who have contributed to the research.
(a) With reference to the GRC describe why ontologies are a key component of this project. 
(b) Describe how projects like the GRC can be used to develop collective intelligence.
In contrast to the GRC, social media uses folksonomies based on the use of hashtags such as #metoo to identify a particular cause or issue.
(c) Discuss whether the use of folksonomies in social media supports the democratization of the web.

Most-appropriate topic codes (IB Computer Science HL):

• Topic C.6 — The intelligent web (Parts (a), (b), (c))

▶️ Answer/Explanation

(a)
For the correct answer:
Ontology is a representation, formal naming and definition of the categories, properties and relations/interrelationships that exist for a particular field or knowledge domain — in this case the analysis and sequencing of genetic materials in the genomes of several different organisms. Ontologies provide a common language for describing the genetic data / a common framework to enable use and reuse of data / searchable keywords and phrases / formal language for the knowledge domain. Ontologies enable data from various sources to be brought together — for example the data from the human genome project and research projects around other species. Ontologies can use if-then rules to make logical inferences.

Without a shared ontology, a gene labelled “BRCA1” in the human database and the chicken database would be incomparable — researchers wouldn’t know if they referred to homologous sequences or entirely different genetic features. Ontologies provide that common vocabulary and relational framework, defining precisely what each term means and how concepts interrelate, so that cross-species genomic comparisons, automated reasoning, and data integration become possible across disparate research groups worldwide.

(b)
For the correct answer:
The open structure of the project is a collaborative work which enables researchers and organizations to share, collate and present their research and findings in an accessible medium; Data and information are not restricted to specific journals or publications, which are often subscription based; Researchers can comment and discuss the findings applying the capabilities of multiple researchers and scientists; New research is presented and shared enabling other researchers to utilize the data; Data is available to be downloaded and used, reducing the time and cost of developing the datasets.

Collective intelligence arises when the GRC’s open platform allows thousands of geneticists worldwide to contribute their individual discoveries to a shared repository. A researcher in Tokyo might sequence a particular gene variant, upload it to the GRC database, and a team in Berlin analysing zebra fish genomes can immediately cross-reference it against their own findings, spotting a homologous sequence that would have taken months to discover in isolation — the whole becomes demonstrably smarter than the sum of its parts.

(c)
For the correct answer (balanced discussion, up to 6 marks):
Arguments for democratization: Folksonomies are a system for classifying information using user-generated keywords often in the form of hashtags; social media platforms support the use of #hashtags and this extends to their search engines and popularity rating systems; social media is widely used by a broad selection of the population potentially giving a significant audience to tagged articles and posts; the simplicity of and ease of access to social media means that large numbers of people can participate in and contribute to popular issues; the international nature of social media makes restriction of the use of folksonomies by controlling interests difficult. Arguments against: Inaccurate information and “fake news” can be propagated using folksonomies and the distribution magnified by inclusion in popular threads; organizations can make use of or create and promote popular tags to promote their own agendas, opinions and bias; the use of folksonomies by interest groups and lack of moderation may make some viewers sceptical; non-popular tags or misspelt tags or tags not conveying exact meaning may deprive access to content due to its incorrect nature.

On one hand, folksonomies — especially hashtags — have genuinely lowered the barriers to public discourse. A grassroots movement can coordinate around #MeToo or #BlackLivesMatter without needing permission from any central authority, allowing marginalized voices to reach a global audience and influence real-world political change. On the other hand, the same lack of gatekeeping means that #Pizzagate-style conspiracy theories can spread just as virally, and well-funded organisations can artificially inflate their chosen hashtags through bots and coordinated campaigns, drowning out authentic grassroots expression. So folksonomies are a double-edged sword: they bypass traditional elites and enable participation, but without curation or verification mechanisms, they can also be weaponised to manipulate rather than democratise public discourse.

Question 14

A car rental company has offices in cities in Spain and Portugal. It manages its cars as a large, unsorted collection of rental objects that is accessed by a Java program.
The following UML diagram describes the current main Rental class. Fuel type and transmission type were chosen to be Boolean because they have two choices: petrol or diesel for fuel type, and manual or automatic for transmission type.
The brand and the model of the car are stored together as one string brandModel.
Typically the company has many cars of the same brand and model.
(a) Outline the general nature of an object.
(b) State one mutator method to be included in the class Rental.
(c) Construct the code for the accessor method getBrandModel().
(d) Outline one purpose of a default constructor.
The company is buying new electric cars and hybrid cars.
(e) Outline one change that needs to be made to class Rental due to this development (company buying new electric cars and hybrid cars).
Based on this Rental class, the program defines several other classes: Car, Bus and Van, each with their own characteristics. For example, the class Car adds the attribute numberOfDoors to the class Rental.
(f) State the relationship between Rental and Car.
(g) Construct the code for the class Car without having to duplicate all the attributes and methods from the class Rental. The default constructor of the class Rental should be overridden to also assign the value 4 to numberOfDoors. No other constructors are required.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.1 — Objects as a programming concept (Parts (a), (b), (c), (d), (e), (f), (g))
• Topic D.2 — Features of OOP (Part (g))
• Topic D.3 — Program development (Parts (b), (c), (d), (g))

▶️ Answer/Explanation

(a)
For the correct answer:
An object is an abstract entity; consists of data/attributes/properties; has methods/behaviour/actions on that data; An object occupies memory / has a lifecycle; An object is an instance of a class.

An object is a self-contained bundle that combines state and behaviour. Think of a Rental object — it holds concrete data like a specific number plate and price per day (its attributes), and it can perform actions like returning its brand model or updating its rental class (its methods). The object exists in memory as a real instance created from the blueprint defined by its class.

(b)
For the correct answer (any one):
setNumberPlate(String numberPlate); setPricePerDay(double pricePerDay); setRentalClass(char rentalClass); setYear(int year); setBrandModel(String brandModel); setFuelType(boolean fuelType); setTransmissionType(boolean transmissionType);

A mutator (or setter) method allows external code to safely modify an object’s private attribute. For example, setPricePerDay(double newPrice) would update the rental price while potentially including validation logic — ensuring the new price isn’t negative — before actually changing the internal variable.
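A minimal sketch of such a setter is shown below. Only the `pricePerDay` attribute is included, and the rule that negative prices are rejected with an `IllegalArgumentException` is an assumption made for illustration; the question does not require any validation.

```java
class Rental {
    private double pricePerDay;

    public double getPricePerDay() {
        return this.pricePerDay;
    }

    // Mutator: updates the private field only if the new value passes validation.
    public void setPricePerDay(double pricePerDay) {
        if (pricePerDay < 0) {
            // Assumed validation rule: a rental price cannot be negative.
            throw new IllegalArgumentException("pricePerDay must not be negative");
        }
        this.pricePerDay = pricePerDay;
    }
}
```

Because the field is private, every change has to pass through this method, so an invalid price never reaches the object's state.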

(c)
For the correct answer (3 marks):
public method; return type; correct return.

public String getBrandModel() {
    return this.brandModel;
}

An accessor method grants read-only access to a private field. The method is declared public so external classes can call it, has a return type of String matching the brandModel field’s type, and simply returns the current value using the return keyword — the this keyword is optional but clarifies we’re referring to the instance variable.

(d)
For the correct answer:
A default constructor instantiates an object of a class with null or default values for the instance variables/attributes without using any parameter.

A default constructor provides a no-argument way to create an object when you don’t yet have all the specifics. When new Rental() is executed, memory is allocated for the object and the constructor leaves or sets every field at a default value (in Java, numeric fields default to 0, booleans to false and object references such as Strings to null), giving you a blank but usable object that can be populated later through mutator methods.
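The default values can be seen in a cut-down sketch. Only three of the attributes are reproduced, and the getter names are assumed to follow the usual naming pattern.

```java
class Rental {
    private String numberPlate;
    private double pricePerDay;
    private boolean fuelType;

    // Default constructor: takes no parameters and assigns nothing explicitly,
    // so each field keeps the default value Java gives it.
    public Rental() {
    }

    public String getNumberPlate() {
        return this.numberPlate;
    }

    public double getPricePerDay() {
        return this.pricePerDay;
    }

    public boolean getFuelType() {
        return this.fuelType;
    }

    public static void main(String[] args) {
        Rental blank = new Rental();
        System.out.println(blank.getNumberPlate());
        System.out.println(blank.getPricePerDay());
        System.out.println(blank.getFuelType());
    }
}
```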

(e)
For the correct answer:
fuelType can no longer be boolean; but could be another datatype such as int/char/String (or similar) to represent the distinct values for 4 different types of fuel.

With four fuel options — petrol, diesel, electric, and hybrid — a simple true/false boolean is mathematically insufficient (2 states cannot represent 4 possibilities). The fuelType attribute must be changed to a wider data type: a String could hold “electric” or “hybrid”, an int could use codes (0=petrol, 1=diesel, 2=electric, 3=hybrid), or a char could use initials (‘P’, ‘D’, ‘E’, ‘H’).
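One further option, not named in the mark scheme but arguably covered by its "or similar", is a Java enum. The sketch below is an assumption about how that might look; the `FuelType` name and the choice of `PETROL` as the initial value are invented for the example.

```java
// Four named constants replace the two states of the old boolean.
enum FuelType { PETROL, DIESEL, ELECTRIC, HYBRID }

class Rental {
    // Initial value chosen arbitrarily for this sketch.
    private FuelType fuelType = FuelType.PETROL;

    public FuelType getFuelType() {
        return this.fuelType;
    }

    public void setFuelType(FuelType fuelType) {
        this.fuelType = fuelType;
    }
}
```

Compared with an int or char code, an enum lets the compiler reject values outside the four listed, at the cost of changing the getter and setter signatures.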

(f)
For the correct answer:
Car inherits Rental (allow Car ‘is a’ Rental or Car extends Rental or Car is a subclass of Rental).

The Car class extends the Rental class, establishing an inheritance (“is-a”) relationship. A Car is a specific kind of Rental — it inherits all the generic rental properties like number plate and price per day, while adding its own specialised attribute (numberOfDoors) that is unique to cars but not relevant to buses or vans.

(g)
For the correct answer (4 marks):
Award [1] for (public) class Car extends Rental; Award [1] for declaring numberOfDoors; Award [1] for numberOfDoors being set to 4 within the constructor; Award [1] for correct getter / setter method.

public class Car extends Rental {
    private int numberOfDoors;

    public Car() {
        super();      // calls Rental's default constructor
        this.numberOfDoors = 4;
    }

    public int getNumberOfDoors() {
        return this.numberOfDoors;
    }

    public void setNumberOfDoors(int n) {
        this.numberOfDoors = n;
    }
}

By using the extends keyword, Car automatically inherits all of Rental’s fields and methods without re-declaring them. The overridden default constructor first calls super() to invoke the Rental constructor (which initialises the inherited attributes), then sets numberOfDoors to 4. The getter and setter provide controlled access to this new, Car-specific attribute, following the encapsulation principle.

Question 15

(a) Identify the OOP feature that was used to declare the Car class.
(b) Explain the benefits of the feature identified in part (a).
(c) Identify the two other features of OOP.
(d) Describe one advantage of modularity in program development.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.2 — Features of OOP (Parts (a), (b), (c), (d))

▶️ Answer/Explanation

(a)
For the correct answer:
Inheritance

Declaring Car using extends Rental is the classic syntax of inheritance — the defining OOP mechanism that allows a new class to absorb the properties and behaviours of an existing class.

(b)
For the correct answer (up to 3 marks):
Because the parent class holds common attributes and methods, inheritance will enhance reuse of code and reduce maintenance costs. Faster development time — as the existing code (base class) is already tested and less code needs to be written and debugged. Child classes may add new functionality (Extensibility) — extending the parent’s action and data without redefining them. Child class redefines the base class methods (Overriding) — to provide different functionality to existing method of the parent class. Easier to maintain — as the changes in the parent class are automatically reflected in the child class.

Inheritance means you write and test the generic rental logic once in the Rental class, and every specialised vehicle type — Car, Bus, Van — inherits it for free. If a bug is found in how the price per day is calculated, fixing it in Rental automatically cascades the correction to all child classes, eliminating the nightmare of hunting down and repairing duplicated code in multiple places.

(c)
For the correct answer (any two):
Encapsulation; Polymorphism; Abstraction

Beyond inheritance, OOP is defined by encapsulation (bundling data with the methods that operate on it and restricting direct external access), and polymorphism (the ability of different object types to respond to the same method call in their own specific way — for instance, both Car and Bus might have a calculateRentalCost() method that behaves differently).
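The polymorphism example can be sketched as follows. The `calculateRentalCost()` method is hypothetical, as in the explanation above, and the flat surcharge of 50 for a bus is an invented figure used only to make the two behaviours differ.

```java
class Rental {
    protected double pricePerDay;

    Rental(double pricePerDay) {
        this.pricePerDay = pricePerDay;
    }

    // Generic behaviour shared by every rental type.
    double calculateRentalCost(int days) {
        return pricePerDay * days;
    }
}

class Car extends Rental {
    Car(double pricePerDay) {
        super(pricePerDay);
    }
    // Inherits calculateRentalCost() unchanged.
}

class Bus extends Rental {
    Bus(double pricePerDay) {
        super(pricePerDay);
    }

    // Overrides the inherited method: same call, different behaviour.
    @Override
    double calculateRentalCost(int days) {
        return pricePerDay * days + 50.0; // hypothetical flat surcharge
    }
}
```

A variable declared as `Rental` can hold either object, and the version of the method that runs is chosen by the actual type at run time.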

(d)
For the correct answer (any one):
Easier / faster to debug/test — because there are far fewer mistakes in the smaller/individual modules. Speedier / faster completion of the project — because different teams work on different modules. Facilitates reusability of the code — as the existing modules can be reused across other modules. Improves code readability / organisation — smaller manageable modules leading to better logical organisation.

Modularity breaks a large, intimidating programming problem into small, independent chunks. When each module has a clear, narrow responsibility, you can test it in isolation — if the Car class works perfectly on its own, you know any integration bugs lie elsewhere. This compartmentalisation means multiple developers can work in parallel on different modules without stepping on each other’s toes, dramatically accelerating project completion.

Question 16

All Car objects have been read into a large unsorted array called allCars.
A method is needed to show customers the range of available cars.
This method should take the array allCars as a parameter and select Car objects from allCars so that every available brandModel is presented only once.
You may assume that there are never more than 100 different types of cars (as identified by the variable brandModel).
(a) Define the term parameter variable.
(b) Construct the code for the method findBrandModels() that will take the array allCars as a parameter. It must return a Car array that contains every brandModel that is available without duplication.
A customer wants to see which different types of cars are available. The criteria are it must be a petrol car with automatic transmission and cost less than 35 euros per day.
(c) Without writing code, outline the steps needed for a method to perform this query and present the results to the customer. 

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.3 — Program development (Parts (a), (b), (c))

▶️ Answer/Explanation

(a)
For the correct answer:
The value/variable passed when the function/method is called; passed as a value or as a reference; is found in the parameter list of the method definition/signature.

A parameter variable is the named placeholder in a method’s signature that receives an argument when the method is invoked. When you call findBrandModels(allCars), the array reference allCars is the argument, and it gets bound to the parameter variable declared inside the parentheses of the method definition, making the passed data accessible by that local name within the method body.
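The difference between the argument and the parameter variable can be illustrated with a short sketch. The `clearFirst` method and the car names are invented for the example. In Java the parameter receives a copy of the array reference, so changes made through it to the array are visible to the caller, while rebinding the parameter itself is not.

```java
class ParameterDemo {
    // 'cars' is the parameter variable; it exists only inside this method.
    static void clearFirst(String[] cars) {
        cars[0] = null;        // modifies the array that the caller also refers to
        cars = new String[0];  // rebinds the local parameter only; the caller is unaffected
    }

    public static void main(String[] args) {
        String[] allCars = {"Seat Ibiza", "Renault Clio"}; // hypothetical data
        clearFirst(allCars);   // 'allCars' is the argument
        System.out.println(allCars[0]);
        System.out.println(allCars.length);
    }
}
```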

(b)
For the correct answer (8 marks):
Correct method signature (excluding return type); instantiating a Car array (result) of size 100; loop through allCars with length condition; setting and resetting a variable (found or similar) inside the outer loop; loop that checks uniqueness; checking for a null pointer exception in at least one loop; correct test (use of equals() and ‘==’); correctly adding the Car when not found in result; returning the correct result array.

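public Car[] findBrandModels(Car[] allCars) {
    Car[] result = new Car[100];   // at most 100 distinct brandModels are assumed
    int count = 0;                 // number of distinct brandModels stored so far
    // Stop at the end of the array or at the first empty slot.
    for (int i = 0; i < allCars.length && allCars[i] != null; i++) {
        boolean found = false;     // reset for every car examined
        // Check whether this brandModel is already in result.
        for (int j = 0; j < count; j++) {
            if (result[j] != null && result[j].getBrandModel().equals(allCars[i].getBrandModel())) {
                found = true;
                break;
            }
        }
        if (!found) {
            result[count] = allCars[i];
            count++;
        }
    }
    return result;                 // unused positions remain null
}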

The algorithm uses a nested loop structure. The outer loop walks through every Car in allCars; for each one, the inner loop scans the result array to see if a Car with the same brandModel has already been stored. The found boolean flag tracks whether a match is discovered. Only if found remains false after the inner scan is the current Car added to result at the count position, ensuring the returned array contains each distinct brandModel exactly once, in the order they were first encountered.

(c)
For the correct answer (5 marks, outline only):
Create a result array to store the return value of findBrandModels(); iterate through the result array to check individual Car object; if the Car object does not fulfil all three criteria then remove this Car object from result / make this Car object null; iterate (or sort/search) through the result array to output the Car objects that are not null / return the result array. OR: Create a desiredCars array; iterate through the result of findBrandModels(); if an object fulfils the three conditions, copy the car object into desiredCars; output / return the desiredCars array.

You would first call findBrandModels() to get the de-duplicated list of all available car types. Then, loop through that result array and, for each non-null entry, check three conditions: is the fuel type petrol (getFuelType() == true, assuming true means petrol), is the transmission automatic (getTransmissionType() == true, assuming true means automatic), and is the daily price less than 35 euros (getPricePerDay() < 35.0). Cars that pass all three tests are either added to a new filtered array or printed directly to the customer. Finally, present this filtered list, for example by iterating through it and outputting each matching brandModel and its price.
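Although part (c) asks for an outline only, the second approach in the mark scheme (copying matches into a `desiredCars` array) can be sketched in code. The cut-down `Car` class, its constructor and the `CarQuery.filter` method are assumptions made so the example is self-contained, and the input array is assumed to be the output of findBrandModels(). As above, true is taken to mean petrol and automatic.

```java
class Car {
    private final String brandModel;
    private final boolean fuelType;         // assumed: true = petrol
    private final boolean transmissionType; // assumed: true = automatic
    private final double pricePerDay;

    Car(String brandModel, boolean fuelType, boolean transmissionType, double pricePerDay) {
        this.brandModel = brandModel;
        this.fuelType = fuelType;
        this.transmissionType = transmissionType;
        this.pricePerDay = pricePerDay;
    }

    String getBrandModel() { return brandModel; }
    boolean getFuelType() { return fuelType; }
    boolean getTransmissionType() { return transmissionType; }
    double getPricePerDay() { return pricePerDay; }
}

class CarQuery {
    // Copies every car that meets all three criteria into a new array.
    static Car[] filter(Car[] distinctCars) {
        Car[] desiredCars = new Car[distinctCars.length];
        int count = 0;
        for (Car car : distinctCars) {
            if (car != null
                    && car.getFuelType()
                    && car.getTransmissionType()
                    && car.getPricePerDay() < 35.0) {
                desiredCars[count] = car;
                count++;
            }
        }
        return desiredCars; // unused positions remain null
    }
}
```

The caller would then loop over the returned array and output each non-null entry.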

Question 17

The car rental company also has a database of customers. For each customer it stores an object with personal data such as their ID, name and address.
This Customer object includes the history of the cars they have rented and the car they are currently renting (if any).
(a) Draw the relationship between Customer and Car objects.
A suggestion has been made to modify the Rental class to include customerID.
The intention is to make it easier to find the customer who has a certain car.
(b) Describe in terms of dependencies why the suggestion to modify the Rental class to include customerID is inappropriate.
(c) Explain the ethical obligations for programmers when developing a customer database.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.1 — Objects as a programming concept (Parts (a), (b))
• Topic D.3 — Program development (Part (c))

▶️ Answer/Explanation

(a)
For the correct answer:

The UML-style line should connect Customer to Car with an aggregation or association arrow. Typically, a Customer object contains a collection (like a CarList) of Car objects representing rental history, plus an optional reference to a single Car for the current rental — this is a “has-a” relationship where the Customer holds references to multiple Cars over time.

(b)
For the correct answer:
The problem becomes that Car ‘has a’ Customer and Customer ‘has a’ car; This is a circular / duplicate / redundant relationship which may cause inconsistencies; It increases dependencies and causes more overhead when changes need to be made.

Adding customerID to Rental would create a circular dependency: Customer already maintains a list of Cars they’ve rented, so if Car also stores a reference back to Customer, you have two independent sources of “truth” about the same relationship. If a customer returns a car and the customerID in the Car object isn’t cleared simultaneously with its removal from the Customer’s history, the database enters an inconsistent state — a classic maintenance nightmare that violates the principle of keeping dependencies unidirectional and minimal.

(c)
For the correct answer (5 marks):
Obligation to respect privacy — only relevant data should be stored that helps the customer, to limit the impact on privacy. Obligation to provide data security — programmers should incorporate safeguards such as encryption, to limit the chance of personal data being misused. Obligation to protect data against corruption — programmers should incorporate data validation / verification, to limit the chance of incorrect personal data being stored. (Accept any other valid obligation.)

Programmers handling customer databases shoulder serious ethical weight. They must design systems that collect only necessary data (minimisation) — storing medical information or political affiliations when all you need is a driving licence number is indefensible. They are obligated to implement robust security: encrypting stored personal data, hashing passwords, and using parameterised queries to prevent SQL injection attacks that could leak entire customer tables. Furthermore, they must ensure data accuracy through validation rules and provide mechanisms for customers to access and correct their own records, honouring the principle that people have a right to control information about themselves.

Question 18

The car rental company stores details of its customers as objects of a Customer class, as follows.
public class Customer
{
    private String customerID;
    private String name;
    // more personal data
    private CarList history;
    private String level;
    // constructors
    // all getter and setter methods
    public void updateHistory(Car newCar)
    {
        // missing code
    }
}
Where the class CarList is declared as:
public class CarList
{
    private CarNode root;
    // default constructor
    public void addToFront(Car newCar)
    {
        // missing code
    }
    public boolean isEmpty()
    {
        return (root == null);
    }
    public int count()  // returns the number of Cars in history
    {
        // missing code
    }
    … more methods …
}
The class CarNode is declared as:

public class CarNode
{
    private Car aCar;
    private CarNode next;

    public CarNode(Car newCar)
    {
        this.aCar = newCar;
        this.next = null;
    }

    // getter and setter methods
}

(a) By using object references, construct the method addToFront in class CarList that allows a new car to be added to the front of the list.
The company has a loyalty program with 4 levels (basic, silver, gold and diamond). The more cars you rent, the higher your level and the bigger your benefits.
When a customer returns a car, the program will add the Car object to the start of their history list. Then it will count the cars in the history list and finally it will determine and save the new status of the customer.
(b) Construct the method updateHistory(Car newCar) in the Customer class, that will perform the following tasks:
• add newCar to the start of the carList history
• count the number of cars in the carList history
• update the customer’s status.
You can use any previously developed methods.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.4 — Advanced program development (Parts (a), (b))
• Topic D.3 — Program development (Part (b))

▶️ Answer/Explanation

(a)
For the correct answer (3 marks):
Instantiating a new node with newCar as parameter; checking if the root is already null; assigning newNode.next to point to the start of list using setNext(); assigning root to point to newNode.

public void addToFront(Car newCar)
{
    CarNode newNode = new CarNode(newCar);
    if (root == null)
    {
        root = newNode;
    }
    else
    {
        newNode.setNext(root);
        root = newNode;
    }
}

Adding to the front of a linked list is a constant-time operation. The method first wraps the incoming Car object inside a freshly constructed CarNode. If the list is completely empty (root is null), this new node becomes the sole entry, the root. Otherwise, we set the new node’s next pointer to point at whatever is currently at the front (the existing root), and then reassign root to this new node. The old root and the entire rest of the chain remain intact behind it: two pointer updates, completed in the same time regardless of list size.
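The question leaves count() as missing code. One possible implementation, offered only as a sketch, walks the list from the root and counts nodes. The `Car` class is reduced to an empty placeholder, and the getter and setter names on `CarNode` are assumed to follow the usual pattern.

```java
class Car {
    // Placeholder: the attributes of Car are not needed for this sketch.
}

class CarNode {
    private Car aCar;
    private CarNode next;

    CarNode(Car newCar) {
        this.aCar = newCar;
        this.next = null;
    }

    Car getCar() { return aCar; }
    CarNode getNext() { return next; }
    void setNext(CarNode next) { this.next = next; }
}

class CarList {
    private CarNode root;

    // Shorter variant of addToFront: when the list is empty, root is null,
    // so setNext(root) simply stores null and no separate branch is needed.
    public void addToFront(Car newCar) {
        CarNode newNode = new CarNode(newCar);
        newNode.setNext(root);
        root = newNode;
    }

    // Follows the next references from the root until the end of the list.
    public int count() {
        int total = 0;
        CarNode current = root;
        while (current != null) {
            total++;
            current = current.getNext();
        }
        return total;
    }
}
```

Unlike addToFront, count() has to visit every node, so its running time grows with the length of the history.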

(b)
For the correct answer (4 marks):
Correct use of history.addToFront(); correct use of history.count(); attempting to update the level; correctly updating the level.

public void updateHistory(Car newCar)
{
    history.addToFront(newCar);
    int count = history.count();
    this.level = "Basic";
    if (count > 2)
    {
        this.level = "Silver";
    }
    if (count > 9)
    {
        this.level = "Gold";
    }
    if (count > 19)
    {
        this.level = "Diamond";
    }
}

The method performs three sequential tasks. First, it pushes the newly returned Car onto the front of the customer’s rental history list by calling history.addToFront(newCar). Second, it obtains the total number of Cars now in history using history.count(). Third, it determines the loyalty level: starting with a baseline of “Basic”, it uses a series of independent if statements (not else-if) to progressively override the level based on the count. The thresholds used here (more than 2, 9 and 19 rentals) are illustrative, since the question text reproduced above does not state them. The cascading structure matters: a count of 25 passes all three thresholds and correctly ends as “Diamond”, because the conditions are tested in ascending order of prestige and each true condition overwrites the previous assignment.
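The same mapping from count to level can also be written as an else-if chain that tests the highest threshold first. The sketch below is an equivalent alternative, not the mark scheme's method; `Loyalty.levelFor` is an invented helper, and the thresholds are the same illustrative values as in the answer above.

```java
class Loyalty {
    // Tests the highest threshold first, so the first match is the final level.
    static String levelFor(int count) {
        if (count > 19) {
            return "Diamond";
        } else if (count > 9) {
            return "Gold";
        } else if (count > 2) {
            return "Silver";
        } else {
            return "Basic";
        }
    }
}
```

Here the order of the tests matters in the opposite direction: checking count > 2 first would stop at "Silver" for every larger count.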

Question 19

The extensive customer database of the car rental company is saved in a collection and needs to be read into an abstract data structure to ensure that searches can be done quickly.
(a) Outline why a linked list is slower to search than a binary tree.
The following class TNode has been defined to store Customer objects in a binary search tree.

public class TNode
{
    private TNode left;
    private Customer data;
    private TNode right;

    public TNode(Customer newCustomer)
    {
        left = null;
        data = newCustomer;
        right = null;
    }

    // getter and setter methods
}

Figure 12 shows a representation of the way customers are stored in the binary search tree. The nodes only show the customerID. However, each node stores a full Customer object.
Consider the following recursive algorithm.
public void print(TNode node)
{
    if (node != null)
    {
        print(node.left);
        output(node.data.getCustomerID());
        print(node.right);
    }
}
(b) State the output for the call print(root) given in Figure 12.
The output of the call print(root) is used to construct a new binary search tree.
(c) Sketch the resulting binary search tree using your output from part (b) as sequential input to create the new tree.
A new method is required to add the Customer objects from the binary tree to a sequential collection called myCollection. The order of the Customer objects in the collection must enable the tree to be restored in its original binary search tree form when the customers are read in a sequential order from myCollection.
(d) Construct the recursive code for a method storeBST() that will allow the binary tree of customers to be stored appropriately in myCollection. You may assume a method myCollection.add(Customer aCustomer) exists that appends aCustomer to the end of the sequential collection of customers.

Most-appropriate topic codes (IB Computer Science HL):

• Topic D.4 — Advanced program development (Parts (a), (b), (c), (d))

▶️ Answer/Explanation

(a)
For the correct answer:
A linked list is a sequential data structure; which in the worst case means that all elements have to be searched; a binary search tree supports a binary search; every comparison in a BST (ideally) halves the number of nodes left to search; in the worst case a balanced BST holding about $2^n$ items needs about $n$ comparisons.

Searching a linked list is a linear process—you must start at the head and examine each node one by one until you find the target or reach the end. In contrast, a balanced binary search tree organizes data so that each decision (left or right) eliminates roughly half of the remaining nodes. This turns a search that could take $N$ steps in a list into one that takes only about $\log_2 N$ comparisons, making trees far faster for large customer databases.
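As a rough worked example (the figure of one million customers is hypothetical, and the tree is assumed to be balanced):

$$\log_2 1\,000\,000 = 6 \log_2 10 \approx 6 \times 3.32 \approx 19.9$$

so a search needs about 20 comparisons in the tree, against up to 1 000 000 node visits in the linked list.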

(b)
For the correct answer:
101, 337, 451, 519, 612, 788, 999

The print method performs an inorder traversal: it recursively visits the left subtree, then processes the current node (outputting its customer ID), then recursively visits the right subtree. Starting from the root 519, the left subtree (337–101–451) is output in ascending order: 101, 337, 451, then 519, then the right subtree (788–612–999) gives 612, 788, 999. The combined sequence is strictly increasing.

(c)
For the correct answer:

Because the input sequence is already fully sorted in ascending order, each new value is larger than all existing nodes. When inserted into a standard binary search tree, each new node becomes the right child of the previously inserted node. The result is an extremely unbalanced, degenerate tree that is essentially a linked list—every node points only to the right, and the height of the tree equals the number of nodes.
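The effect of sorted input can be checked with a small sketch. Integer keys stand in for the Customer objects, and the `BstNode` and `SortedInsertDemo` names are invented for the example. Height is counted here in nodes, so an empty tree has height 0.

```java
class BstNode {
    int key;
    BstNode left;
    BstNode right;

    BstNode(int key) {
        this.key = key;
    }
}

class SortedInsertDemo {
    // Standard BST insertion: smaller keys go left, all others go right.
    static BstNode insert(BstNode node, int key) {
        if (node == null) {
            return new BstNode(key);
        }
        if (key < node.key) {
            node.left = insert(node.left, key);
        } else {
            node.right = insert(node.right, key);
        }
        return node;
    }

    // Inserts the keys one by one, in the order given.
    static BstNode build(int[] keys) {
        BstNode root = null;
        for (int key : keys) {
            root = insert(root, key);
        }
        return root;
    }

    // Number of nodes on the longest path from this node down to a leaf.
    static int height(BstNode node) {
        if (node == null) {
            return 0;
        }
        return 1 + Math.max(height(node.left), height(node.right));
    }

    public static void main(String[] args) {
        int[] sorted = {101, 337, 451, 519, 612, 788, 999};
        System.out.println(height(build(sorted)));
    }
}
```

With the seven sorted IDs from part (b), every node ends up as a right child, so the height equals the number of nodes.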

(d)
For the correct answer (5 marks):
Correct method signature; correct test for null; correct add of current data before recursive calls; correct recursive call to left child; correct recursive call to right child, all in a preorder sequence (root, left, right).

public void storeBST(TNode current)
{
    if (current != null)
    {
        myCollection.add(current.getData());
        storeBST(current.getLeft());
        storeBST(current.getRight());
    }
}

The method uses a preorder traversal (root, left subtree, right subtree). For each non-null node, it first adds the node’s Customer object to the collection using myCollection.add(), then recursively stores the entire left subtree, and finally the right subtree. This guarantees that when the sequence is later read back and inserted into an empty BST in the same order, the original tree structure is perfectly reconstructed—each parent is stored before its children, so the insertion process recreates the exact shape.
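The round trip can be tested with a sketch that uses integer IDs in place of Customer objects and a `List<Integer>` in place of myCollection. The `Node` and `PreorderDemo` names are invented for the example. The IDs are listed in the preorder sequence of the tree described in part (b): 519 at the root, 337 and 788 as its children.

```java
import java.util.ArrayList;
import java.util.List;

class Node {
    int id;
    Node left;
    Node right;

    Node(int id) {
        this.id = id;
    }
}

class PreorderDemo {
    // Standard BST insertion: smaller IDs go left, all others go right.
    static Node insert(Node node, int id) {
        if (node == null) {
            return new Node(id);
        }
        if (id < node.id) {
            node.left = insert(node.left, id);
        } else {
            node.right = insert(node.right, id);
        }
        return node;
    }

    // Rebuilds a tree by inserting the IDs in the order they were stored.
    static Node build(List<Integer> ids) {
        Node root = null;
        for (int id : ids) {
            root = insert(root, id);
        }
        return root;
    }

    // Same preorder pattern as storeBST(): node first, then left, then right.
    static void store(Node current, List<Integer> out) {
        if (current != null) {
            out.add(current.id);
            store(current.left, out);
            store(current.right, out);
        }
    }

    // Builds a tree from the sequence, then stores it again in preorder.
    static List<Integer> roundTrip(List<Integer> ids) {
        List<Integer> out = new ArrayList<>();
        store(build(ids), out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> stored = List.of(519, 337, 101, 451, 788, 612, 999);
        System.out.println(roundTrip(stored).equals(stored));
    }
}
```

If rebuilding from the stored sequence and storing again gives back the same sequence, the rebuilt tree has the same shape as the original.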
