Question 1

Alpha Hospital is situated in a large city and has over 1000 staff. It stores its data in a relational database. Some of the tables in this relational database contain data about the hospital staff, their roles, and the hospital departments. Staff roles include doctor, nurse, pharmacist, radiologist, and support staff. Staff can only hold one role. Departments include Accident and Emergency, Critical Care, Medical, General Surgery, Orthopaedics, and Ophthalmology. Staff can work in several departments.

The ROLE table, STAFF table, and DEPARTMENT table are shown in Figure 1.

(a)(i) State the primary key in the STAFF table.

(a)(ii) State a foreign key in the STAFF table.

(b) Describe the relationships between the three tables in Figure 1.

(c) Outline what a query is used for in a database.

(d) Identify the steps to create a query to list the staff with the surname Waters who are on pay grade 17. The query must display only FirstName, Surname, and PayGrade.

(e) Outline why an integer is an appropriate data type for the PayGrade field.

(f) Explain two ways in which the database administrator can ensure the privacy of the hospital’s staff data.

Most-appropriate topic codes (IB Computer Science SL):

• Topic A.2 — The relational database model (Parts (a)(i), (a)(ii), (b), (c), (d), (e))
• Topic A.3 — Further aspects of database management (Part (f))

Answer/Explanation

(a)(i)
For the correct answer:
StaffID

In a relational database table, the primary key uniquely identifies each record, so every staff member has their own distinct ID that is never repeated. In the STAFF table, StaffID is the field that serves this purpose, guaranteeing that each employee can be uniquely identified.

(a)(ii)
For the correct answer:
RoleID or DepartmentID

A foreign key is a field in one table that refers to the primary key of another table, creating a relationship between the two. In the STAFF table, RoleID links each staff member to their job title in the ROLE table. If the STAFF table in Figure 1 also contains a DepartmentID field, it links staff to the DEPARTMENT table in the same way, so either is a valid foreign key.

(b)
For the correct answer:
ROLE and STAFF: one to many
STAFF and DEPARTMENT: many to many

The relationship between ROLE and STAFF is one-to-many because a single role, like ‘nurse’, can be assigned to dozens of different staff members, but each staff member can hold only one role at a time. For STAFF and DEPARTMENT, it’s a many-to-many relationship since any staff member, such as a surgeon, can work in multiple departments, and each department will naturally contain many different staff members.

(c)
For the correct answer:
A query provides a virtual representation/filtered view of the database based on criteria set in the query.

A query is essentially a question you ask the database to retrieve only the information you need. It lets you filter, sort, and combine data from one or more tables, so instead of browsing thousands of records, you see only the results that match your criteria, such as the staff with a particular surname and pay grade.

(d)
For the correct answer:
SELECT STAFF.FirstName, STAFF.Surname, ROLE.PayGrade FROM STAFF INNER JOIN ROLE ON STAFF.RoleID = ROLE.RoleID WHERE STAFF.Surname = 'Waters' AND ROLE.PayGrade = 17

To get this result, you first specify the columns to display (FirstName, Surname, and PayGrade). Because PayGrade is stored in the ROLE table rather than the STAFF table, you then join STAFF and ROLE on their common field, RoleID. Finally, you set the criteria: Surname must equal ‘Waters’ and PayGrade must equal 17, which narrows the results to the matching staff.
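As a minimal sketch, the query can be tried with Python’s built-in sqlite3 module. Figure 1 is not reproduced here, so the table layouts (including PayGrade as a column of ROLE, which is what the answer assumes) and every sample row below are hypothetical.

```python
import sqlite3

def waters_on_grade_17(conn: sqlite3.Connection) -> list[tuple]:
    """Run the part (d) query, returning only FirstName, Surname and PayGrade."""
    sql = """
        SELECT STAFF.FirstName, STAFF.Surname, ROLE.PayGrade
        FROM STAFF
        INNER JOIN ROLE ON STAFF.RoleID = ROLE.RoleID
        WHERE STAFF.Surname = 'Waters' AND ROLE.PayGrade = 17
    """
    return conn.execute(sql).fetchall()

# Hypothetical tables and rows standing in for Figure 1
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ROLE (RoleID INTEGER PRIMARY KEY, RoleName TEXT, PayGrade INTEGER);
    CREATE TABLE STAFF (StaffID INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT,
                        RoleID INTEGER REFERENCES ROLE(RoleID));
    INSERT INTO ROLE VALUES (1, 'Nurse', 17), (2, 'Doctor', 20);
    INSERT INTO STAFF VALUES (1, 'Ana', 'Waters', 1), (2, 'Ben', 'Waters', 2),
                             (3, 'Cleo', 'Smith', 1);
""")
print(waters_on_grade_17(conn))  # [('Ana', 'Waters', 17)]
```

Ben Waters is excluded by the pay grade condition and Cleo Smith by the surname condition, so both criteria in the WHERE clause are doing work.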

(e)
For the correct answer:
The pay grade values are whole numbers.

An integer data type is a perfect fit here because pay grades like 17, 18, or 19 are always whole numbers without any fractions or decimals. Using an integer also keeps the database efficient, as it takes up less storage space than text and makes mathematical comparisons and sorting faster and more reliable.
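A quick Python check with invented pay grade values shows why text would be a poorer choice: text is compared character by character, so grade ‘9’ sorts after grade ‘17’.

```python
# Invented pay grades, stored once as text and once as integers
grades_as_text = ["9", "17", "18", "2"]
grades_as_int = [9, 17, 18, 2]

print(sorted(grades_as_text))  # ['17', '18', '2', '9']  (character-by-character order)
print(sorted(grades_as_int))   # [2, 9, 17, 18]          (numeric order)
print("9" > "17", 9 > 17)      # True False
```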

(f)
For the correct answer:
The use of different levels of access/authorization, so that only the minimum number of people have access to this data. For example, password-protect/lock/restrict access to the ROLE table.
Encrypting the stored data in the database. The data is scrambled/converted to cipher text (and cannot be understood without a key). For example, only the employees with the key can decrypt the data.

One effective way is to set strict user authorization levels. For example, the system could let an administrative assistant see basic contact details while blocking them from viewing salary or pay grade information, which only senior HR managers can access. A second method is encryption: the database scrambles private data such as pay grades and personal contact details into unreadable cipher text, so even if a hacker physically steals the hard drive, the information is useless without the decryption key held by the hospital’s security team.
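Authorization levels can be sketched as a deny-by-default lookup from user role to permitted tables. The role names and table assignments here are hypothetical, and a real DBMS would enforce this with its own accounts and privileges (for example SQL GRANT statements) rather than application code.

```python
# Hypothetical user roles mapped to the tables each is allowed to read
ACCESS_RIGHTS: dict[str, set[str]] = {
    "admin_assistant": {"STAFF", "DEPARTMENT"},
    "hr_manager": {"STAFF", "DEPARTMENT", "ROLE"},
}

def can_read(user_role: str, table: str) -> bool:
    """Deny by default: an unknown role gets an empty set of tables."""
    return table in ACCESS_RIGHTS.get(user_role, set())

print(can_read("admin_assistant", "ROLE"))  # False: pay grades stay hidden
print(can_read("hr_manager", "ROLE"))       # True
print(can_read("visitor", "STAFF"))         # False: unknown role
```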

Question 2

Database design is a complex process that takes place in a range of phases. Different phases of database design use different schemas.

(a) Describe the difference between a conceptual schema and a logical schema.

(b) Explain the importance of a data definition language in implementing a data model.

(c) Explain why data modelling is used during the development of a database.

(d) Explain why both data validation and data verification are required to ensure the correctness of the data within a database.

(e) Outline how data integrity is maintained during a database transaction.

(f) Outline the role of relational integrity in maintaining data consistency within a database.

Most-appropriate topic codes (IB Computer Science SL):

• Topic A.2 — The relational database model (Parts (a), (b), (c), (d), (e), (f))

Answer/Explanation

(a)
For the correct answer:
The conceptual schema is a high-level/least detailed representation of the database, identifying entities and relationships. The logical schema is developed from the conceptual schema and is more detailed, showing field names.

The conceptual schema is like a rough sketch focusing on “what” data the system needs: it identifies high-level entities such as ‘Patient’ and ‘Doctor’ and their basic relationships without considering any particular database software. The logical schema formalises that sketch into a precise blueprint, specifying every field name, data type, and primary key; it is the “how” stage that database developers can translate directly into a working system.

(b)
For the correct answer:
A DDL is used to specify the schema of a database, allowing you to define the tables and fields and set their data types (e.g., CREATE TABLE).

A data definition language is crucial because it provides the commands that turn the structure you designed on paper into a real database. Without DDL statements such as CREATE TABLE, the data model would remain just a plan; these statements tell the database software to create the tables, define columns with specific data types, and enforce the keys and relationships between tables.
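As an illustration, the following sketch sends DDL statements to an in-memory SQLite database through Python’s sqlite3 module and then reads back the structure that was created. The ROLE and STAFF columns are assumptions based on Question 1, not the actual Figure 1 design.

```python
import sqlite3

# DDL turning a (hypothetical) logical design into real tables
DDL = """
CREATE TABLE ROLE (
    RoleID   INTEGER PRIMARY KEY,
    RoleName TEXT NOT NULL,
    PayGrade INTEGER NOT NULL
);
CREATE TABLE STAFF (
    StaffID   INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    Surname   TEXT NOT NULL,
    RoleID    INTEGER REFERENCES ROLE(RoleID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk)
for _, name, col_type, *_ in conn.execute("PRAGMA table_info(STAFF)"):
    print(name, col_type)
```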

(c)
For the correct answer:
It helps to identify the entities/tables in the database to ensure they support the database’s purpose. Normalization during data modelling reduces data duplication, which reduces data anomalies and saves storage space.

Data modelling is used early on to visualise and organise all the information before any code is written, preventing a chaotic mess of redundant data. By identifying entities and their attributes and then applying normalization rules, you eliminate harmful duplication, so a patient’s address is stored once rather than a hundred times. This prevents update anomalies, where changing a value in one place would leave old, incorrect copies elsewhere.
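The update anomaly can be shown with a few lines of Python using invented data: when the address is copied onto every appointment, changing one copy leaves the records contradicting each other, whereas storing it once in a PATIENT record avoids the problem.

```python
# Unnormalised: the patient's address is repeated on every appointment row
flat = [
    {"PatientID": 7, "Address": "1 Old Road", "Date": "2024-01-05"},
    {"PatientID": 7, "Address": "1 Old Road", "Date": "2024-02-09"},
]
flat[0]["Address"] = "2 New Street"  # only one copy is updated
print(len({row["Address"] for row in flat}))  # 2: the rows now disagree

# Normalised: the address is stored once; appointments refer to PatientID
patients = {7: {"Address": "1 Old Road"}}
appointments = [
    {"PatientID": 7, "Date": "2024-01-05"},
    {"PatientID": 7, "Date": "2024-02-09"},
]
patients[7]["Address"] = "2 New Street"  # one update covers every appointment
print({patients[a["PatientID"]]["Address"] for a in appointments})  # {'2 New Street'}
```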

(d)
For the correct answer:
Data validation is an automated process that ensures input meets data entry rules. Data verification is the checking of data to ensure it is the input intended. Using both techniques provides the optimal solution.

Validation alone isn’t enough because it only checks whether data is reasonable, such as a date of birth being in the past; it cannot catch a user typing a plausible but wrong surname. Verification addresses this human error, for example by asking the user to check their input visually or to enter it twice. Together, the system automatically rejects impossible values and also catches typing slips, giving more accurate hospital records.
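A minimal sketch of the two checks, with hypothetical function names: validation is an automatic rule (here a range check on date of birth), while verification compares the input with what the user intended (here by double entry, one common method alongside visual checking).

```python
from datetime import date

def validate_dob(dob: date, today: date) -> bool:
    """Validation: automatic range check, the date of birth must be in the past."""
    return dob < today

def verify_by_double_entry(first_entry: str, second_entry: str) -> bool:
    """Verification: the value is typed twice and both entries must match."""
    return first_entry == second_entry

today = date(2024, 6, 1)
print(validate_dob(date(2030, 1, 1), today))        # False: impossible value rejected
print(validate_dob(date(1990, 3, 14), today))       # True: plausible, so accepted
print(verify_by_double_entry("Waters", "Walters"))  # False: typing slip caught
```

Note that a surname like ‘Walters’ would pass any sensible validation rule; only the verification step notices that it differs from what was meant.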

(e)
For the correct answer:
Integrity is maintained by no changes being made until the transaction is complete. If the transaction cannot be completed it is rolled back to the original state (Atomicity).

Consider transferring a patient’s record between departments. This is a multi-step process rather than a single action, so the database must treat it as one indivisible unit (atomicity). If any step fails, for example the system crashes right after removing the patient from the old department but before adding them to the new one, the whole transaction is rolled back automatically. This returns the database to its original, consistent state and prevents the patient’s record from being lost or orphaned.
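Atomicity can be demonstrated with Python’s sqlite3 module, whose connection object commits a transaction when a `with` block completes normally and rolls it back if an exception escapes. The ASSIGNMENT table and the simulated crash are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ASSIGNMENT (PatientID INTEGER, DepartmentID INTEGER);
    INSERT INTO ASSIGNMENT VALUES (42, 1);
""")

def transfer(conn: sqlite3.Connection, patient_id: int, new_dept: int,
             fail_midway: bool = False) -> None:
    """Move a patient in one transaction: both steps happen, or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("DELETE FROM ASSIGNMENT WHERE PatientID = ?", (patient_id,))
            if fail_midway:
                raise RuntimeError("simulated crash between the two steps")
            conn.execute("INSERT INTO ASSIGNMENT VALUES (?, ?)", (patient_id, new_dept))
    except RuntimeError:
        pass  # the rollback has already restored the previous state

transfer(conn, 42, 3, fail_midway=True)
print(conn.execute("SELECT * FROM ASSIGNMENT").fetchall())  # [(42, 1)]
transfer(conn, 42, 3)
print(conn.execute("SELECT * FROM ASSIGNMENT").fetchall())  # [(42, 3)]
```

After the simulated crash the patient is still in department 1 rather than missing from both departments.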

(f)
For the correct answer:
Referential integrity is maintained by the connection between the primary key in one table and the foreign key in another, ensuring records are appropriately updated and preventing orphan records.

Relational integrity acts as a strict rulebook that keeps the database consistent; for example, it makes it impossible to assign a staff member to a DepartmentID that doesn’t exist in the DEPARTMENT table. When cascading updates are enabled, it also propagates key changes: if a department’s ID changes, the new value is automatically applied to every staff record linked to it, so all links remain valid and no one ends up assigned to a non-existent department.
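A sketch of both behaviours in SQLite via Python’s sqlite3 module. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and cascading must be requested with `ON UPDATE CASCADE`. The STAFF_DEPARTMENT link table (one way to resolve the many-to-many relationship from Question 1) and the department codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (DepartmentID TEXT PRIMARY KEY, DepartmentName TEXT);
    CREATE TABLE STAFF_DEPARTMENT (
        StaffID      INTEGER,
        DepartmentID TEXT REFERENCES DEPARTMENT(DepartmentID) ON UPDATE CASCADE,
        PRIMARY KEY (StaffID, DepartmentID)
    );
    INSERT INTO DEPARTMENT VALUES ('AE', 'Accident and Emergency');
    INSERT INTO STAFF_DEPARTMENT VALUES (1, 'AE');
""")

# 1) A link to a department that does not exist is rejected
try:
    conn.execute("INSERT INTO STAFF_DEPARTMENT VALUES (2, 'XX')")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)

# 2) Changing a department's key cascades to every linked staff row
conn.execute("UPDATE DEPARTMENT SET DepartmentID = 'ED' WHERE DepartmentID = 'AE'")
print(conn.execute("SELECT * FROM STAFF_DEPARTMENT").fetchall())  # [(1, 'ED')]
```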

Question 3

Alpha Hospital uses a database application to book appointments. Information can be shown in different formats.

Figure 2 shows an example of a patient’s appointments displayed in the application.

The Age field is a derived field.

(a)(i) Outline one reason why a derived field would be used in a database.

(a)(ii) Describe how the derived field Age would be calculated.

Patient information can be represented in the following format:

PATIENT (PatientID, FirstName, Surname, PreferredName, DateOfBirth)

(b) Outline the difference between first normal form (1NF) and second normal form (2NF).

(c) Construct a database in third normal form (3NF) for all of the data shown in Figure 2. You should use database notation as shown in the PATIENT table.

Most-appropriate topic codes (IB Computer Science SL):

• Topic A.2 — The relational database model (Parts (a)(i), (a)(ii), (b), (c))

Answer/Explanation

(a)(i)
For the correct answer:
A derived field is calculated using data that exists within the database, meaning the data does not have to be input or take up storage space.

Using a derived field like Age is a smart design choice because age changes constantly while a date of birth is fixed, so storing the age directly would require relentless manual updates. Instead, the database simply calculates it on the fly from the stored DateOfBirth whenever you run a query, guaranteeing the displayed age is always perfectly current without wasting any disk space.

(a)(ii)
For the correct answer:
Select Date of Birth from the patient table and subtract the year of the birth date from the current year to get the Age.

To compute a patient’s age, the system retrieves the stored DateOfBirth and finds the difference between it and today’s date. The simple version in the answer above subtracts the birth year from the current year (in MySQL, `YEAR(CURDATE()) - YEAR(DateOfBirth)`). However, this overstates the age by one until the patient’s birthday has passed in the current year, so a full calculation also compares the month and day.
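A sketch in Python of the simple year subtraction and a corrected version that also checks whether this year’s birthday has been reached. The dates are invented, and `today` is passed in so the result is reproducible.

```python
from datetime import date

def age_years_only(dob: date, today: date) -> int:
    """Simple rule: current year minus birth year."""
    return today.year - dob.year

def age(dob: date, today: date) -> int:
    """Subtract one more year if this year's birthday has not happened yet."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(before_birthday)

dob = date(2000, 9, 15)
today = date(2024, 6, 1)
print(age_years_only(dob, today))  # 24 (one too high: birthday not reached yet)
print(age(dob, today))             # 23
```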

(b)
For the correct answer:
The prerequisite for 2NF is 1NF. The focus of 1NF is to eliminate repeating groups and ensure atomicity of data values, while the focus of 2NF is to ensure full functional dependency by removing partial dependencies.

First normal form is the foundational level: every cell contains a single (atomic) value and there are no repeating groups, which turns a messy spreadsheet-style layout into a proper table. Second normal form builds on this by removing partial dependencies, so that every non-key attribute depends on the whole primary key, not just part of it. Partial dependencies can only arise when the primary key is composite, so a 1NF table with a single-field key is already in 2NF; this step further reduces redundancy.

(c)
For the correct answer:
PATIENT (PatientID, FirstName, Surname, PreferredName, DateOfBirth)
DOCTOR (DoctorID, FirstName, Surname)
DEPARTMENT (DepartmentID, DepartmentName)
APPOINTMENT (PatientID*, DoctorID*, Date, Time, DepartmentID*)

To achieve a clean 3NF design, I need to split the appointment data into distinct entities because a patient and a doctor are separate real-world things with their own properties, and each department name should only be stored once. The central APPOINTMENT table ties everything together using foreign keys, so one appointment record simply points to the correct PatientID, DoctorID, and DepartmentID, while the date and time fields are fully dependent on that unique appointment event, eliminating any transitive dependencies.

Question 4

Global warming can be measured over time using mean temperatures. Figure 3 shows the mean daily maximum temperatures from 1970 to 2020 for Nauru, an island in the Pacific Ocean.

(a) State two variables that are necessary for the mean daily maximum temperature to be calculated.

(b) Identify the steps that would be used to create the diagram exactly as shown in Figure 3 using the data in a spreadsheet.

(c) Identify two reasons why the mean daily maximum temperature data is presented in graphical form.

An automated system is used to collect the temperature data for Nauru once an hour for one year.

(d) Outline one reason why the temperature data is collected once an hour rather than at shorter intervals, such as once a minute.

(e) Describe the steps that should be used to store the data collected for one day (24 hours) in suitable parallel one-dimensional (1D) arrays.

The time the data was collected must be easy to identify. Do not write a pseudocode algorithm.

The temperature data is downloaded every day and collated into a master file.

The data from the master file is loaded into a suitable array for that 24-hour period. The following statistics are calculated:

Maximum temperature
Minimum temperature
Mean temperature

As Nauru is very close to the equator, the length of the day changes very little throughout the year. For the purposes of part (f), the lengths of its day and night are:

Day: 07:00 to 18:59 inclusive (12 hours)
Night: 19:00 to 06:59 inclusive (12 hours)

(f) Construct a pseudocode algorithm to calculate the:

maximum temperature
minimum temperature
mean temperature
mean night-time temperature.

Assume the arrays for the time of day of the reading and hourly temperature readings have already been set up and populated as parallel 1D arrays.

Most-appropriate topic codes (IB Computer Science SL):

• Topic B.1 — The basic model (Part (a) (b) (d) (e))
• Topic B.3 — Visualization (Part (c))
• Topic B.2 — Simulations (Part (f))

Answer/Explanation

(a)
To calculate the mean daily maximum temperature, you need a variable to hold each day’s highest reading, such as `DailyMaximumTemperature`. You also need an accumulator variable, such as `TotalTemperature`, to sum these daily maximums before dividing by the count.

(b)
The steps to create the diagram would start with selecting the data cells for the mean daily maximum temperatures and the corresponding years. Next, you would choose the line graph chart type from the spreadsheet’s insert menu. Then, you would ensure the chart has the correct heading and finally label both axes appropriately, such as ‘Temperature (°C)’ for the y-axis and ‘Year’ for the x-axis.

(c)
The data is presented in graphical form, firstly, because a visual display allows trends to be identified; here the trend clearly shows the increase in mean daily maximum temperatures over the period. Secondly, a long sequence of temperature readings is much easier for a person to understand and interpret quickly as a graph than as a table of raw numbers.

(d)
Collecting data once an hour, which results in 24 recordings over a day rather than the 1440 if collected every minute, will require significantly less data processing and less storage space. Furthermore, the minor temperature changes that might happen within each hour are negligible and should not affect the overall study of the climate, so hourly readings provide a sufficient and efficient dataset.

(e)
To store the data for a single day, you would first create two parallel one-dimensional arrays of size 24, perhaps named `TIME` and `TEMP`. You could then initialise the time array with the 24-hour clock format from 00:00, 01:00, and so on, up to 23:00 for the full day. Using a loop counter, you would loop 24 times, and during each iteration, you would store or display the current time element `TIME(N)` before inputting and storing the corresponding temperature reading in `TEMP(N)`.
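The layout can be sketched in Python. A list of 24 readings stands in for the values that the automated system would input, which is an assumption; the key point is that index N in TIME and index N in TEMP always describe the same hour.

```python
HOURS_PER_DAY = 24

# Parallel 1D arrays: TIME[n] and TEMP[n] always describe the same hour
TIME = [f"{hour:02d}:00" for hour in range(HOURS_PER_DAY)]  # "00:00" to "23:00"
TEMP = [0.0] * HOURS_PER_DAY

def store_day(readings: list[float]) -> None:
    """Store each hourly reading at the same index as its time."""
    for n in range(HOURS_PER_DAY):
        TEMP[n] = readings[n]

# Invented readings, used only to show the layout
store_day([26.0 + (n % 12) * 0.5 for n in range(HOURS_PER_DAY)])
print(TIME[13], TEMP[13])  # 13:00 26.5
```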

(f)
The pseudocode algorithm would look like this:
TOTAL = 0
NIGHT_TOTAL = 0
MIN = TEMP[0]
MAX = TEMP[0]
loop T from 0 to 23
    READING = TEMP[T]
    TOTAL = TOTAL + READING
    // Night (19:00 to 06:59) wraps past midnight, so OR is needed, not AND
    if TIME[T] >= "19:00" OR TIME[T] < "07:00" then
        NIGHT_TOTAL = NIGHT_TOTAL + READING
    end if
    if READING < MIN then
        MIN = READING
    end if
    if READING > MAX then
        MAX = READING
    end if
end loop
MEAN_TEMP = TOTAL / 24
MEAN_NIGHT_TEMP = NIGHT_TOTAL / 12
output MAX, MIN, MEAN_TEMP, MEAN_NIGHT_TEMP
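The same logic as a runnable Python sketch, assuming times are stored as zero-padded “HH:MM” strings so that text comparison puts them in time order. The night period wraps past midnight, so a reading is night-time if its time is at or after 19:00 OR before 07:00; no time satisfies both conditions at once. The temperatures in the usage example are invented.

```python
def day_statistics(times: list[str], temps: list[float]) -> dict[str, float]:
    """Max, min, mean and mean night-time temperature from parallel arrays."""
    total = 0.0
    night_total = 0.0
    night_count = 0
    minimum = maximum = temps[0]
    for t in range(len(temps)):
        reading = temps[t]
        total += reading
        # Night runs 19:00-06:59 and wraps past midnight, hence "or"
        if times[t] >= "19:00" or times[t] < "07:00":
            night_total += reading
            night_count += 1
        minimum = min(minimum, reading)
        maximum = max(maximum, reading)
    return {
        "max": maximum,
        "min": minimum,
        "mean": total / len(temps),
        "mean_night": night_total / night_count,
    }

TIME = [f"{h:02d}:00" for h in range(24)]
TEMP = [25.0 if (h >= 19 or h < 7) else 30.0 for h in range(24)]
print(day_statistics(TIME, TEMP))
# {'max': 30.0, 'min': 25.0, 'mean': 27.5, 'mean_night': 25.0}
```

Counting night readings, rather than hard-coding 12, keeps the mean correct if a reading is ever missing.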

Question 5

The global increase in mean temperatures is causing concern, and governments are using computer models to determine potential future changes.

Many islands in the Pacific Ocean are close to or below sea level and have observed an increased incidence of coastal flooding. This suggests there is a relationship between the increase in mean temperatures and the increase in mean sea levels.

Figure 4 shows the mean sea level of Nauru from 1994 to 2015.

If the mean sea level rises, the probability of coastal flooding will also increase. This will put many of the inhabitants of Nauru at risk.

It has been proposed that a simulation be developed to show the effects of rising temperatures on the extent and frequency of coastal flooding.

(a) Distinguish between a model and a simulation.

(b) Describe how to identify the rules required to create a simulation from the mean temperature and mean sea level data.

(c) Evaluate how test cases could be used to effectively validate the accuracy of this proposed simulation.

(d) Discuss the advantages and disadvantages of using a simulation for decision making in the coastal areas of islands in the Pacific Ocean.

Most-appropriate topic codes (IB Computer Science SL):

• Topic B.1 — The basic model (Part (a))
• Topic B.2 — Simulations (Parts (b), (c), (d))

Answer/Explanation

(a)
For the correct answer:
A model is a mathematical representation/abstraction of a real-life situation. A simulation is the running of a mathematical model over time on a computer.

Think of a model as a static blueprint or a set of equations that describes the relationship between temperature and sea level—it’s the design on paper. A simulation breathes life into that model by executing it on a computer, stepping through time to show what actually might happen to Nauru’s coastline each year as temperatures rise.

(b)
For the correct answer:
Collect / analyse data for any pair of temperature, sea level, and flooding; Suggest rules for the relationship; Check the suggested rules with actual data.

First, I would plot the historical temperature data against the sea level data to look for a mathematical correlation, perhaps a linear or exponential trend, that can be turned into a formula or a rule connecting the two. Once I have a candidate rule, I would feed historical temperature data into it and see if the predicted sea levels match what was actually observed, tweaking the rule until the output realistically mirrors the real-world records.
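The rule-finding step can be sketched as a least-squares straight-line fit in plain Python, followed by a check of how far the rule’s predictions are from the data. The numbers are invented for illustration and are not the Figure 4 data, and a linear rule is only one possible choice.

```python
def fit_linear_rule(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares straight line: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented pairs: temperature rise above a baseline (°C) and mean sea level (mm)
anomalies = [0.0, 0.2, 0.4, 0.6]
sea_levels = [100.0, 104.0, 108.0, 112.0]
slope, intercept = fit_linear_rule(anomalies, sea_levels)
print(f"sea level ≈ {slope:.1f} × anomaly + {intercept:.1f}")  # sea level ≈ 20.0 × anomaly + 100.0

# Check the candidate rule against the data it should explain
residuals = [y - (slope * x + intercept) for x, y in zip(anomalies, sea_levels)]
print(max(abs(r) for r in residuals))  # close to 0 for this perfectly linear data
```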

(c)
For the correct answer:
Find historical data not impacted by external factors; Input the observed data; Verify the simulation against the known results; Modify the model if necessary. This use of test cases will help to improve the model and therefore make the simulation more accurate.

An effective way to validate is to use a technique called back-testing, where I would run the simulation on a past decade of data that we already know the outcome for, like the years 2000–2010, and check if the simulation’s predicted flooding matches the historical records. If the simulation’s forecast for that period aligns closely with the real documented floods, I gain confidence in its reliability for predicting future scenarios, but if it’s off, I must go back and refine the underlying rules and algorithms until the test cases pass.
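A minimal back-testing sketch: run a candidate rule on past inputs, measure the mean absolute error against the observed values, and accept the rule only if the error is within a chosen tolerance. The rule, the data, and the tolerance are all hypothetical.

```python
from collections.abc import Callable

def back_test(rule: Callable[[float], float], past_inputs: list[float],
              observed: list[float], tolerance: float) -> tuple[bool, float]:
    """Compare the rule's predictions for past inputs with what actually happened."""
    predictions = [rule(x) for x in past_inputs]
    errors = [abs(p - o) for p, o in zip(predictions, observed)]
    mean_abs_error = sum(errors) / len(errors)
    return mean_abs_error <= tolerance, mean_abs_error

# Hypothetical rule and historical records (sea level in mm)
def rule(anomaly: float) -> float:
    return 20.0 * anomaly + 100.0

passed, error = back_test(rule, [0.1, 0.3, 0.5], [102.5, 105.0, 111.0], tolerance=1.0)
print(passed, round(error, 3))  # True 0.833
```

If the test fails, the rule is refined and the same back-test is run again, which matches the “modify the model if necessary” step in the answer.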

(d)
For the correct answer:
Advantages: It can produce measurable visual results and help predict flooding so communities can be prepared. Disadvantages: It is a crude solution only testing one or two factors and does not consider social or cultural factors.

A simulation gives government planners a powerful, risk-free sandbox to explore “what-if” scenarios, such as testing whether a proposed sea wall would protect a village under the worst-case temperature rise by 2100, without spending anything on construction. On the downside, a simulation is a radical simplification of reality. It cannot model the deep cultural trauma of relocating a community from its ancestral land, yet a flawed simulation may be the only tool available, so decisions that work on paper could still be socially devastating.

Question 6

Organizations such as Earth.Org have raised concerns about the rate of sea level increase for Nauru. They stated, “Sea level rise is 2 to 3 times faster around Nauru than the global average, putting its freshwater supplies and crops at risk of saltwater contamination. Already reliant on economic aid, Nauru’s basic resource needs may have to be acquired externally for life to be sustained on the island.”

The 2D visualisation in Figure 5 shows the projected impact of mild and extreme sea level increases on Nauru by 2100.

(a) Define the term visualization.

There are proposals to develop a 3D visualization of the impact of rising sea levels on Nauru.

(b) Outline the relationship between images stored in memory and 3D visualizations.

(c) Discuss the time and memory considerations of 3D animation in the proposed 3D visualization for Nauru.

Most-appropriate topic codes (IB Computer Science SL):

• Topic B.3 — Visualization (All parts (a) to (c))

Answer/Explanation

(a)
For the correct answer:
Visualization is a graphical representation of data.

A visualization transforms abstract numbers, like projected sea levels, into a graphical format that our eyes and brain can instantly process. Instead of reading a table of elevation figures, a colour-coded map of Nauru showing blue where the water will encroach makes the existential threat immediately obvious to anyone.

(b)
For the correct answer:
The image in memory is stored as a mathematical model. Images in memory are rendered to create a 3D visualization.

Inside the computer’s memory, a 3D scene of a flooded Nauru isn’t stored as a photograph but as a mathematical wireframe model defining the precise coordinates of every building and coastline. The graphics processor then takes that mathematical description and performs rendering, calculating lighting, shadows, and textures to convert it into the final 2D image on the screen that looks three-dimensional to us.
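Rendering involves lighting, shading, and texturing, but its geometric core is projecting 3D coordinates onto the 2D screen. The following sketch shows only a simple pinhole perspective projection (screen x = f·x/z, screen y = f·y/z); the points and focal length are invented.

```python
def project(point: tuple[float, float, float],
            focal_length: float = 1.0) -> tuple[float, float]:
    """Pinhole perspective projection of a 3D point (z > 0) onto the screen."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

# The same corner of a hypothetical building, once near and once twice as far away
near = project((2.0, 1.0, 4.0))
far = project((2.0, 1.0, 8.0))
print(near, far)  # (0.5, 0.25) (0.25, 0.125)
```

Doubling the distance halves the on-screen size, which is what makes the flat image look three-dimensional.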

(c)
For the correct answer:
3D animation is very complex in terms of programming and requires a lot of time for processing/rendering. Rendering different layers and transitions requires a lot of RAM and may also require the use of secondary storage or GPU memory, which operate at different speeds from main memory. Therefore, 3D animation requires sufficient, fast primary and secondary memory.

Creating a smooth fly-through animation of Nauru’s future flooding is extremely demanding in time: each frame showing the water creeping inland could take minutes or even hours to render, so a five-minute animation might tie up a powerful computer for days. It is also demanding in memory: the system must hold the detailed 3D mesh of the island’s terrain and the high-resolution textures for sand, vegetation, and buildings in RAM at the same time. If working memory runs out, data spills over onto much slower secondary storage and rendering slows to a crawl, so plenty of fast RAM and GPU memory is essential for this kind of project.
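The scale of the problem can be estimated with simple arithmetic. The figures below (24 frames per second, 60 seconds to render each frame, a 1920 × 1080 frame with 4 bytes per pixel) are assumptions chosen for illustration.

```python
def render_hours(minutes: float, fps: int, seconds_per_frame: float) -> float:
    """Total render time in hours for an animation of the given length."""
    frames = minutes * 60 * fps
    return frames * seconds_per_frame / 3600

def frame_buffer_mib(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Memory for one uncompressed frame, in mebibytes."""
    return width * height * bytes_per_pixel / (1024 * 1024)

# Assumed: 5 minutes at 24 frames per second, 60 s to render each frame
print(render_hours(5, 24, 60))                 # 120.0 hours, i.e. 5 days
print(round(frame_buffer_mib(1920, 1080), 2))  # 7.91 MiB per finished frame
```

The finished frame is small compared with the mesh and textures that must be held in memory while it is rendered, which is why RAM and GPU memory, not just storage for the output, become the bottleneck.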

Question 7

Alexia and Jay are discussing their use of online resources to prepare for their IB examinations. They often visit websites such as BBC Bitesize.

Alexia refers to accessing the websites as “surfing the internet” and Jay refers to the resources as being “on the web”.

(a) Distinguish between the internet and the World Wide Web.

The uniform resource locator (URL) for the BBC Bitesize website is as follows: https://www.bbc.co.uk/bytesize/index.htm

(b) Identify two characteristics of a URL.

Jay wants to transfer a file to Alexia and suggests they use file transfer protocol (FTP).

(c) Identify two characteristics of file transfer protocol.

(d) Explain how a web browser functions.

Most-appropriate topic codes (IB Computer Science SL):

• Topic C.1 — Creating the web (All parts (a) to (d))

Answer/Explanation

(a)
For the correct answer:
The internet is a global network of inter-connected computers using internet protocols. The World Wide Web is a service on the internet, a collection of information and resources accessed via the internet using web browsers and protocols like HTTP.

The internet is the underlying infrastructure: a massive global network of cables, routers, and servers that connects billions of devices and communicates using protocols such as TCP/IP. The World Wide Web is one of many services that run on top of that network, specifically the system of hyperlinked documents and resources accessed through a web browser using HTTP.

(b)
For the correct answer:
Protocol – https; Host – www.bbc.co.uk

A URL is packed with identifying information. It starts with the protocol, which tells the browser how to communicate with the server; in this case it is the secure HTTPS. The host (domain) name follows and identifies the server on the internet that holds the requested files, here the BBC’s web server.
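Python’s standard `urllib.parse.urlparse` function splits a URL into these components, which gives a quick way to check them:

```python
from urllib.parse import urlparse

parts = urlparse("https://www.bbc.co.uk/bytesize/index.htm")
print(parts.scheme)  # https  (the protocol)
print(parts.netloc)  # www.bbc.co.uk  (the host)
print(parts.path)    # /bytesize/index.htm  (the path to the resource)
```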

(c)
For the correct answer:
FTP establishes two connections, one for control and one for data transfer; FTP facilitates the transfer of files efficiently between a client and a server.

FTP uses a clever dual-channel approach. The control connection, on port 21, handles commands and login credentials, while a separate data connection carries the actual file contents (using port 20 in active mode, or a port negotiated with the server in passive mode). This separation means that even while a large file is being transferred, you can still send control commands, such as aborting the transfer, without interrupting the data stream.

(d)
For the correct answer:
Applies the appropriate protocols to enable communication with the web server; Provides a way to navigate to, access and fetch web pages using HTTP Request and Response; The data, typically written in a markup language such as HTML, is rendered to display web pages properly.

When you type a web address, the browser first extracts the host name from the URL and asks a DNS server to translate it into an IP address. It then connects to that server and sends a structured HTTP GET request (inside an encrypted TLS connection when the URL uses HTTPS). Once the server replies with the HTML, CSS, and JavaScript files, the browser parses the code and lays out the text, images, and interactive elements on the screen, a process called rendering.
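The request step can be sketched by building the text of a minimal HTTP/1.1 GET request from a URL. Real browsers send many more headers, and for an https URL this text travels inside an encrypted TLS connection; the function name is hypothetical.

```python
from urllib.parse import urlparse

def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for the URL."""
    parts = urlparse(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"  # a blank line ends the request headers
    )

print(build_get_request("https://www.bbc.co.uk/bytesize/index.htm"))
```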

Question 8

Alexia and Jay are researching the web development language PHP. They type the phrase “PHP” directly into a web browser (see Figure 7).

The web browser redirects them to a popular search engine, which executes a search.

(a) Define the term search engine.

The operation of a search engine can be divided into three steps (see Figure 8).

(b) Describe how a web crawler functions.

(c) Outline why keywords are important for web indexing.

(d) Discuss whether an organization should use black hat search engine optimization (SEO) techniques to improve the ranking of its website.

Search engines return a very large number of results, but many of the web pages are not useful. The search needs to be refined.

Alexia and Jay’s teacher recommended that they use an online database that accesses the deep web.

(e) Distinguish between the surface web and the deep web.

One of the online databases provides Alexia and Jay with the following code:

<?php
// Runs only after the form has been submitted with a file in the "CV" field
if (isset($_FILES['CV'])) {
    $errors = array();
    $file_name = $_FILES['CV']['name'];
    $file_size = $_FILES['CV']['size'];      // size in bytes
    $file_tmp  = $_FILES['CV']['tmp_name'];  // temporary path on the server
    $file_type = $_FILES['CV']['type'];      // MIME type reported by the browser
    // end() takes its argument by reference, so store the exploded parts first
    $name_parts = explode('.', $file_name);
    $file_ext = strtolower(end($name_parts));
    $extensions = array("pdf", "doc", "docx");
    if (in_array($file_ext, $extensions) === false) {
        $errors[] = "This file extension is not allowed";
    }
    if ($file_size > 2097152) {              // 2097152 bytes = 2 MB
        $errors[] = "File size must not exceed 2 MB";
    }
    if (empty($errors)) {
        // Move the upload from its temporary location into the CV/ folder
        move_uploaded_file($file_tmp, "CV/" . $file_name);
        echo "Success";
    } else {
        print_r($errors);
    }
}
?>
<html>
<body>
<h1>Curriculum Vitae</h1>
<form action="" method="POST" enctype="multipart/form-data">
    <input type="file" name="CV"/>
    <input type="submit"/>
</form>
</body>
</html>

(f) Identify four steps that take place during the processing of this PHP code.

The PHP code is processed on the server.

(g) Explain why an organization would choose to use server-side processing rather than client-side processing when delivering content to the client.

Most-appropriate topic codes (IB Computer Science SL):

• Topic C.2 — Searching the web (Part (a), (b), (c), (d), (e))
• Topic C.1 — Creating the web (Parts (f), (g))

Answer/Explanation

(a)
A search engine is a software system, program, or application that searches the World Wide Web or a database for pages matching the keywords the user specifies. It can also apply optional filters such as date, usage rights, size, and currency.

(b)
A web crawler, also called a bot or spider, starts at one or more designated “seed” pages. On each page it records content such as keywords, meta tags, and hyperlinks, often copying part or all of the page for indexing, and then follows the hyperlinks to further pages, working either depth-first or breadth-first. A site’s robots.txt file can ask crawlers not to visit certain pages, and well-behaved crawlers respect it. The crawler repeats this cycle continuously, keeping the index up to date with new or changed content.
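A breadth-first crawl can be sketched over an in-memory link graph, with a set of disallowed pages standing in for robots.txt rules. The pages and links are hypothetical, and a real crawler would also fetch, parse, and index each page.

```python
from collections import deque

def crawl(seed: str, links: dict[str, list[str]], disallowed: set[str],
          limit: int = 100) -> list[str]:
    """Breadth-first crawl from a seed page; returns pages in visiting order."""
    visited: list[str] = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < limit:
        page = queue.popleft()
        if page in disallowed:
            continue  # a polite crawler skips pages robots.txt asks it to avoid
        visited.append(page)  # a real crawler would fetch and index the page here
        for linked in links.get(page, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return visited

# Hypothetical mini-web: each page maps to the pages it links to
web = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/private"], "/private": ["/d"]}
print(crawl("/", web, disallowed={"/private"}))  # ['/', '/a', '/b', '/c']
```

Because the disallowed page is never visited, its link to "/d" is never discovered either.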

(c)
Web crawlers extract keywords from a page’s meta tags (such as the meta keywords and meta description), its title, and potentially its URL, and count how many times those keywords appear in the body of the page. The ranking algorithm uses this information as part of the ranking process, which makes keywords critically important for web indexing and for Search Engine Optimization (SEO).
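The counting step can be sketched as a small inverted index mapping each keyword to how often it appears on each page. Real ranking algorithms weigh many more signals than raw counts; the pages here are invented.

```python
import re
from collections import Counter

def index_pages(pages: dict[str, str]) -> dict[str, Counter]:
    """Inverted index: keyword -> Counter of occurrences per page URL."""
    index: dict[str, Counter] = {}
    for url, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, Counter())[url] += 1
    return index

pages = {
    "/php-tutorial": "PHP tutorial: learn PHP forms and PHP sessions",
    "/html-basics": "HTML basics with a short PHP note",
}
index = index_pages(pages)
print(index["php"].most_common())  # [('/php-tutorial', 3), ('/html-basics', 1)]
```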

(d)
An organization must weigh the short-term benefits against the long-term consequences. Black hat SEO uses techniques that break search engine guidelines to gain a higher ranking, which can bring increased traffic, more visitors, and potentially increased revenue by directing users to content the developer wants them to see. Techniques include keyword stuffing, poor-quality or duplicated content, hidden keywords, paid links, link farming, and cloaking. However, the disadvantages are severe: if the techniques are detected, a search engine can penalize the site, resulting in a lower score, blacklisting, or the site being flagged as unsafe, causing reputational damage. Although the initial ranking may improve, the long-term score can drop significantly. Furthermore, there are ethical issues concerning the provision of inaccurate or unreliable content. Ultimately, the long-term risk of being de-indexed and suffering reputational harm outweighs the temporary gain in traffic, making it an unsound strategy.

(e)
The deep web is the part of the World Wide Web that is not indexed by search engines and therefore cannot be found through a normal search; it includes databases and dynamic pages that require authentication. The surface web, in contrast, is indexed by common search engines such as Google or Bing and can therefore be found and accessed by most users.

(f) (Output of the PHP code, for reference purposes.)

Curriculum Vitae

Four steps during the processing of this PHP code are: first, the user selects a file and clicks the submit button, which sends the form data and the file to the server in a POST request. Second, details of the uploaded file, including its name, size, and type, are extracted. Third, the file is checked by conditional statements against the criteria for its extension (which must be pdf, doc, or docx) and its size (which must not be greater than 2,097,152 bytes, that is, 2 MB). Fourth, any failed check adds a message to the errors array; if the array is empty, the file is moved into the CV folder and "Success" is printed, otherwise the contents of the errors array are printed.
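The checks in the third and fourth steps can be mirrored in Java to show the control flow. This is a hedged sketch, not a translation of the full PHP script (its earlier lines are not shown here): the method name `validateCv`, the error messages, and the decision to lower-case the extension are all assumptions, while the allowed extensions and the 2,097,152-byte limit come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CvUploadCheck {
    static final long MAX_BYTES = 2_097_152L; // 2 * 1024 * 1024 bytes
    static final Set<String> ALLOWED = Set.of("pdf", "doc", "docx");

    // Returns the list of errors; an empty list means the file may be saved to the CV folder.
    static List<String> validateCv(String fileName, long sizeInBytes) {
        List<String> errors = new ArrayList<>();
        int dot = fileName.lastIndexOf('.');
        String ext = dot >= 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        if (!ALLOWED.contains(ext)) {
            errors.add("Extension not allowed: use pdf, doc or docx"); // hypothetical message
        }
        if (sizeInBytes > MAX_BYTES) {
            errors.add("File must not be larger than 2 MB"); // hypothetical message
        }
        return errors;
    }
}
```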

(g)
When an organization chooses server-side processing, the script runs on the web server rather than in the client's browser. The result is therefore the same regardless of the processing power of the client device, and the user sees only the processed output. Crucially, the source code and underlying data stay on the server, where users cannot view or tamper with them; this gives the end user a more consistent experience and the site owner greater control and security.

Question 9

A social media file-sharing website allows users to upload their original video content.

The site provides an upload module that uses lossless compression.

(a) Outline how lossless compression maintains the quality of a media file.

When a file is decompressed, the content is rendered to fit a standard set of sizes, aspect ratios, and frame rates. Standard video formats, such as MP4, are used for the output of media.

The file format MP4 or MPEG-4 is an open standard for media files.

(b) Identify two characteristics of an open standard.

Social media sites use a distributed system where the data is stored in many countries.

In the European Union, Argentina, and the Philippines the right to be forgotten has been established as law. The right to be forgotten means a person has the right to have private information removed from databases and applications so it cannot be found in an internet search.

(c) Discuss the impact of the decentralized web on an individual’s right to privacy.

Xero, a small software company, was established in 2006 in Wellington, New Zealand. It produced an easy-to-use accountancy service for small to medium enterprises. This product is based on the software-as-a-service (SaaS) model and is sold on a subscription basis all over the world. Customer data is securely stored online as part of the subscription cost.

(d) Explain how developments in the web have enabled small companies such as Xero to have a global reach.

Most-appropriate topic codes (IB Computer Science SL):

• Topic C.3 — Distributed approaches to the web (Parts (a), (b))
• Topic C.4 — The evolving web (Parts (c), (d))

▶️ Answer/Explanation

(a)
For the correct answer:
Lossless compression allows the compressed media to be reconstructed perfectly and completely, because no data is discarded during compression. It replaces repeating elements with a shorter encoding.

Instead of discarding information as lossy compression does, lossless algorithms find and encode statistical patterns, such as replacing a long run of identical pixel values with a short code meaning "repeat this pixel value 500 times." When the file is decompressed, every bit of the original data is restored, so each video frame is bit-for-bit identical to the original file.
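The "repeat this pixel value" idea is run-length encoding (RLE), one simple lossless technique; real video codecs use far more sophisticated methods. The sketch below assumes pixel values are stored in an `int[]` (the class name `RunLength` is hypothetical). It encodes each run as a (value, count) pair, and decoding restores the original array exactly.

```java
import java.util.ArrayList;
import java.util.List;

class RunLength {
    // Encodes the array as a list of (value, runLength) pairs.
    static List<int[]> encode(int[] pixels) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < pixels.length) {
            int value = pixels[i];
            int count = 1;
            while (i + count < pixels.length && pixels[i + count] == value) {
                count++;
            }
            runs.add(new int[] {value, count});
            i += count;
        }
        return runs;
    }

    // Expands each (value, runLength) pair back into the original sequence.
    static int[] decode(List<int[]> runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run[1];
        }
        int[] pixels = new int[total];
        int pos = 0;
        for (int[] run : runs) {
            for (int k = 0; k < run[1]; k++) {
                pixels[pos++] = run[0];
            }
        }
        return pixels;
    }
}
```

Nothing is approximated at any point, which is why the round trip is exact; RLE only saves space when the data actually contains long runs.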

(b)
For the correct answer:
A standard that is openly accessible and usable by anyone; not controlled by any single company or private entity; designed to ensure interoperability between systems and platforms.

An open standard like MP4 is essentially a public recipe book that any software developer can read and implement without asking permission from a single controlling corporation. This openness helps ensure that files created by one program can be used on other compliant devices and platforms, reducing the fragmentation and lock-in that can occur with proprietary formats.

(c)
For the correct answer:
The decentralised web uses peer-to-peer networks in which no single entity has control. Positive impacts: individual control means the user can allow or restrict information sharing, and ownership of the data remains with the user. Negative impacts: since there is no overall control, anyone can publish any information, making it harder for an individual to have content taken down. Because data is stored across multiple nodes, there is a greater surface area for potential breaches or leaks. The right to be forgotten becomes nearly impossible to enforce on a truly decentralised system.

On the one hand, a decentralised architecture can free individuals from surveillance by large tech companies, giving them ownership and cryptographic control over their personal data without a middleman mining it for profit. On the other hand, the same lack of a central authority becomes a privacy nightmare when harmful content or personal secrets are leaked: there is no company headquarters to send a takedown request to, and the data is replicated across many independent nodes worldwide, making the right to be forgotten practically unenforceable under current law.

(d)
For the correct answer:
Economic and operational efficiency: no need to develop physical software packages or large infrastructure; cloud storage protects data from loss. Accessibility and scalability: users from any region can access the platform without physical distribution; widespread smartphone and internet access increases the user base.

The advent of cloud computing and SaaS meant that a small startup like Xero in New Zealand did not need to build a global network of offices and distribution channels; it could launch a website and offer its accounting service to anyone with a web browser. Combined with digital marketing and secure online payment gateways, the web greatly reduced the geographical and logistical barriers that would once have held back a small company, allowing it to compete with much larger international rivals.

Question 10

A company sells healthcare products. Each product type is associated with a particular brand and brand price.

A system is designed to manage product sales, customers, and suppliers.

The Product class keeps details of a product. The following shows part of the code for this class:

public class Product {
    private String prodCode;        // eg X123
    private String prodType;        // eg Sunscreen
    private String prodDescription; // about the product
    private Brand prodBrand;        // an object of type Brand
    private int prodSale;           // number of units sold
    // Constructor
    // code missing for constructor method
    public int getProdSale() {
        return prodSale;
    }
    public Brand getProdBrand() {
        return prodBrand;
    }
    // all accessor and mutator methods are present but not shown
} // end of Product class

The Brand class keeps details of a particular brand. The following shows part of the code for this class:

public class Brand {
    private String brandName; // eg Safesun
    private float brandPrice; // price of the product of this brand
    public Brand(String brandName, float brandPrice) {
        this.brandName = brandName;
        this.brandPrice = brandPrice;
    }
    public float getBrandPrice() {
        return brandPrice;
    }
    // all accessor and mutator methods are present but not shown
} // end of Brand class

(a) (i) Define the term private.

(a) (ii) Define the term accessor method.

(a) (iii) Construct an accessor method in the Product class that returns the description for a product.

(b) (i) Construct a UML diagram for the given Product class.

(b) (ii) Construct the code for the constructor method for the Product class that initializes all attributes for a new product.

(c) (i) Construct the code to create an instance of the Brand class that has the brand name Safesun and a brand price of 2.17.

(c) (ii) Describe the difference between a class and an instance of a class.

(d) Identify two features of modern programming languages.

Most-appropriate topic codes (IB Computer Science SL):

• Topic D.3 — Program development (Parts (a)(i), (a)(ii), (a)(iii), (b)(ii), (c)(i), (c)(ii), (d))
• Topic D.1 — Objects as a programming concept (Part (b)(i))

▶️ Answer/Explanation

(a)(i)
The term `private` is an access modifier (or access specifier) used for attributes and methods. It makes them accessible only within the class in which they are defined, so private variables or methods cannot be accessed from outside the class.

(a)(ii)
An accessor method is a method that returns the value of a private or protected variable of an instance. It lets code outside the class read these private or protected variables in a controlled manner.

(a)(iii)
The accessor method to return the product description would be written as:
public String getProdDescription() {
    return prodDescription;
}

(b)(i)
The UML diagram for the Product class consists of a box divided into three sections. The top section contains the class name `Product`. The middle section lists the private attributes: `- prodCode: String`, `- prodType: String`, `- prodDescription: String`, `- prodBrand: Brand`, and `- prodSale: int`. The bottom section lists the public methods, which include `+ getProdSale(): int` and `+ getProdBrand(): Brand`; the constructor and the other accessor and mutator methods (for example `+ getProdDescription(): String`) are listed in this section in the same way.

(b)(ii)
The constructor code to initialise all attributes for a new `Product` object would be:
public Product(String prodCode, String prodType, String prodDescription, Brand prodBrand, int prodSale) {
    this.prodCode = prodCode;
    this.prodType = prodType;
    this.prodDescription = prodDescription;
    this.prodBrand = prodBrand;
    this.prodSale = prodSale;
}

(c)(i)
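To create an instance of the `Brand` class named `b` with the specified arguments, the code would be as follows (the `f` suffix makes 2.17 a `float` literal, matching the constructor's `float` parameter; without it, 2.17 is a `double` and the call does not compile):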
Brand b = new Brand("Safesun", 2.17f);

(c)(ii)
A class is a blueprint or template that defines the attributes and behaviours of its instances, without necessarily allocating memory itself. An instance of a class is an actual object created from that blueprint, which holds the specific values for its attributes and gets a memory allocation for storing those values.
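To make the class/instance distinction concrete, the sketch below reuses the `Brand` class from the question and creates two separate instances from it. The `getBrandName()` accessor is assumed (the question says all accessors exist but does not show them), and the second brand's name and price are made up for illustration.

```java
class Brand {
    private String brandName; // eg Safesun
    private float brandPrice; // price of the product of this brand

    public Brand(String brandName, float brandPrice) {
        this.brandName = brandName;
        this.brandPrice = brandPrice;
    }

    public String getBrandName() { // assumed accessor, not shown in the question
        return brandName;
    }

    public float getBrandPrice() {
        return brandPrice;
    }
}

class BrandDemo {
    public static void main(String[] args) {
        // One class (the blueprint), two instances: each object stores its own attribute values.
        Brand b = new Brand("Safesun", 2.17f);
        Brand other = new Brand("ExampleBrand", 5.50f); // hypothetical second brand
        System.out.println(b.getBrandName() + " " + b.getBrandPrice());
        System.out.println(other.getBrandName() + " " + other.getBrandPrice());
    }
}
```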

(d)
Two features of modern programming languages are:
1. Libraries and frameworks of pre-written code, which allow for code reuse and faster development.
2. Exception handling, which provides a structured way to manage runtime errors, making programs more robust and stable.
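As a small illustration of exception handling, the hypothetical method below parses a price entered by a user. If the text is not a valid number, `Float.parseFloat` throws a `NumberFormatException`, which is caught and handled instead of crashing the program. Returning -1 to signal invalid input is an arbitrary choice made for this sketch.

```java
class PriceInput {
    // Returns the parsed price, or -1 if the text is not a valid number.
    static float parsePrice(String text) {
        try {
            return Float.parseFloat(text.trim());
        } catch (NumberFormatException e) {
            // The runtime error is handled in a controlled way rather than stopping the program.
            return -1f;
        }
    }
}
```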

Question 11

(a) (i) Define the term primitive data type.

(a) (ii) Outline one advantage of using a primitive data type, such as int.

The ProductManagement class has the main method and other methods to generate the information required:

public class ProductManagement {
    private Product[] allProducts = new Product[25];
    // sort in descending order of prodSale
    public void sortProducts() {
        // code missing
    }
} // end of ProductManagement class

(b) Construct code for the method sortProducts() to sort the allProducts[] array in descending order of prodSale.

You must make use of the selection sort algorithm.

Most-appropriate topic codes (IB Computer Science SL):

• Topic D.3 — Program development (Parts (a)(i), (a)(ii), (b))

▶️ Answer/Explanation

(a)(i)
A primitive data type is a basic data type that is predefined by the programming language. Primitive types are the building blocks from which composite data types and classes are built, and a variable of a primitive type holds its value directly in memory rather than a reference to an object.

(a)(ii)
One advantage of using a primitive type like `int` is memory efficiency. Primitive data types take up less memory than their corresponding wrapper objects (such as `Integer` for `int`), which makes them more efficient when working with large data sets or in memory-constrained environments.

(b)
The following code implements the `sortProducts()` method using a selection sort algorithm to sort in descending order:
public void sortProducts() {
    int n = allProducts.length; // accept 25
    for (int i = 0; i < n - 1; i++) {
        int maxIndex = i;
        for (int j = i + 1; j < n; j++) {
            if (allProducts[j].getProdSale() > allProducts[maxIndex].getProdSale()) {
                maxIndex = j;
            }
        }
        // Swap the elements
        Product temp = allProducts[maxIndex];
        allProducts[maxIndex] = allProducts[i];
        allProducts[i] = temp;
    }
}
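The method above assumes every one of the 25 slots holds a `Product`; if any slot is `null`, calling `getProdSale()` on it throws a `NullPointerException`. The self-contained sketch below relaxes that assumption: a hypothetical `count` parameter gives how many slots (from index 0) are filled, and a stripped-down `Product` holding only `prodSale` stands in for the full class so the code can run on its own.

```java
class Product {
    private int prodSale; // number of units sold (other attributes omitted in this sketch)

    Product(int prodSale) {
        this.prodSale = prodSale;
    }

    public int getProdSale() {
        return prodSale;
    }
}

class SelectionSortDemo {
    // Selection sort in descending order of prodSale over the first 'count' elements only.
    static void sortProducts(Product[] allProducts, int count) {
        for (int i = 0; i < count - 1; i++) {
            int maxIndex = i; // position of the largest prodSale found so far
            for (int j = i + 1; j < count; j++) {
                if (allProducts[j].getProdSale() > allProducts[maxIndex].getProdSale()) {
                    maxIndex = j;
                }
            }
            // Swap the largest remaining element into position i.
            Product temp = allProducts[maxIndex];
            allProducts[maxIndex] = allProducts[i];
            allProducts[i] = temp;
        }
    }
}
```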

Question 12

(a) (i) Outline one advantage of polymorphism.

(a) (ii) Outline one advantage of encapsulation.

(a) (iii) Outline one disadvantage of inheritance.

An invoice is created every time a customer purchases one or more products.

The Invoice class keeps details of each invoice. The following shows part of the code for this class:

public class Invoice {
    private String invoiceID; // identifies a unique invoice
    private static Product[] products = new Product[20]; // list of products purchased
    private static int[] prodQuantity = new int[20]; // number of items of a particular product purchased
    private boolean qualifiesForDiscount; // default value is false
    private int numOfProducts; // how many products in this invoice
    // constructor is defined, code not shown
    public String getInvoiceID() {
        return invoiceID;
    }
    // all accessor and mutator methods are present but not shown
    public void addProduct(Product product, int quantity) {
        // code missing
    }
    public void setQualifiesForDiscount() {
        // if total value of purchases is more than 3000,
        // qualifiesForDiscount value is set to true
        // code missing
    }
} // end of Invoice class

(b) Describe how encapsulation has been used in this code.

(c) Construct the method addProduct (Product product, int quantity) that will update an invoice.

The method should:

Update the products array
Update the prodQuantity array
Increment the numOfProducts.

If the total value of the purchases is greater than 3000, the invoice qualifies for a discount.

(d) Construct code for the method setQualifiesForDiscount() to change the status of an invoice.

The method should:

calculate the total value of an invoice
change the value of qualifiesForDiscount if needed.

(e) Outline one advantage of using modularity in program development.

Most-appropriate topic codes (IB Computer Science SL):

• Topic D.2 — Features of OOP (Parts (a)(i), (a)(ii), (a)(iii), (b), (e))
• Topic D.3 — Program development (Parts (c), (d))

▶️ Answer/Explanation

(a)(i)
One key advantage of polymorphism is code reusability. It allows developers to write more general and reusable code because a single function or method can work with objects of different classes that share a common interface or base class, reducing the need for duplicate logic.
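A minimal sketch of this reuse, using a hypothetical `Priced` interface and two hypothetical implementing classes (none of these appear in the question). The single method `total` works with any object that implements the interface, and each call to `price()` runs the version belonging to the object's actual class.

```java
import java.util.List;

interface Priced {
    float price();
}

class SingleItem implements Priced {
    private final float unitPrice;

    SingleItem(float unitPrice) {
        this.unitPrice = unitPrice;
    }

    public float price() {
        return unitPrice;
    }
}

class MultiPack implements Priced {
    private final float unitPrice;
    private final int units;

    MultiPack(float unitPrice, int units) {
        this.unitPrice = unitPrice;
        this.units = units;
    }

    public float price() {
        return unitPrice * units;
    }
}

class PolymorphismDemo {
    // One method handles every Priced type; no per-class branching is needed.
    static float total(List<Priced> items) {
        float sum = 0;
        for (Priced item : items) {
            sum += item.price(); // dynamic dispatch selects the right price() at run time
        }
        return sum;
    }
}
```

Adding a new kind of product later only requires a new class implementing `Priced`; `total` itself does not change.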

(a)(ii)
Encapsulation provides data hiding and protection. By keeping sensitive data hidden from outside interference and misuse, and ensuring only authorized methods can access or modify the internal state, it significantly reduces the risk of unexpected behavior and protects the integrity of the object’s data.
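A short sketch of this protection, assuming a hypothetical `ProductSales` class and a hypothetical rule that sales can only be recorded in positive amounts. Because `prodSale` is private, the only way to change it is through the mutator, which can reject values that would corrupt the object's state.

```java
class ProductSales {
    private int prodSale; // hidden: code outside the class cannot change it directly

    public int getProdSale() {
        return prodSale;
    }

    // Controlled mutator: rejects invalid input instead of storing it.
    public void recordSale(int units) {
        if (units <= 0) {
            throw new IllegalArgumentException("units must be positive");
        }
        prodSale += units;
    }
}
```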

(a)(iii)
A major disadvantage of inheritance is tight coupling. This occurs because a child class becomes tightly coupled to its parent class, meaning any change made to the parent can unintentionally affect the child, potentially introducing subtle and hard-to-find bugs.
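The hypothetical classes below sketch how this coupling can cause a subtle bug. `CountingCart` overrides two methods of its parent and assumes that the parent's `addAll` adds items without calling `add`. Because the parent's `addAll` does call `add`, every item is counted twice. The child's correctness depends on an internal detail of the parent, so changing that detail in the parent would silently change the child's behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class Cart {
    protected final List<String> items = new ArrayList<>();

    public void add(String item) {
        items.add(item);
    }

    // Implementation detail: addAll reuses add() for each item.
    public void addAll(List<String> newItems) {
        for (String item : newItems) {
            add(item);
        }
    }
}

class CountingCart extends Cart {
    private int addCount = 0; // intended: total number of items ever added

    @Override
    public void add(String item) {
        addCount++;
        super.add(item);
    }

    @Override
    public void addAll(List<String> newItems) {
        addCount += newItems.size(); // assumes the parent's addAll does not call add()
        super.addAll(newItems);      // but it does, so each item is counted a second time
    }

    public int getAddCount() {
        return addCount;
    }
}
```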

(b)
Encapsulation has been used in this code by bundling or wrapping the variables and methods that operate on the data into one unit, the `Invoice` class. Furthermore, the variables of the `Invoice` class, such as `invoiceID` and `qualifiesForDiscount`, are declared as `private`. This prevents direct access from outside the class. Controlled access is provided through public methods, like the getter `getInvoiceID()` and the setter `setQualifiesForDiscount()`, to read and modify the private data safely.

(c)
The method to add a product to the invoice and update the arrays would be:
public void addProduct(Product product, int quantity) {
    products[numOfProducts] = product;
    prodQuantity[numOfProducts] = quantity;
    numOfProducts += 1;
}

(d)
The code for the `setQualifiesForDiscount` method would be:
public void setQualifiesForDiscount() {
    float totalValue = 0;
    for (int i = 0; i < numOfProducts; i++) {
        float price = products[i].getProdBrand().getBrandPrice();
        float amount = price * prodQuantity[i];
        totalValue = totalValue + amount;
    }
    if (totalValue > 3000) {
        qualifiesForDiscount = true;
    }
}
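The answers to (c) and (d) can be checked together in a runnable sketch. It assumes minimal versions of `Brand` and `Product` that hold only what the methods need, a simple constructor taking the invoice ID (the original constructor is not shown), and accessor names `getQualifiesForDiscount()` and `getNumOfProducts()`, which are hypothetical. Two deliberate differences from the question's code: `products` and `prodQuantity` are declared without `static`, because a static array is shared by every `Invoice` object, so one invoice's products would overwrite another's; and `addProduct` rejects a 21st product instead of causing an `ArrayIndexOutOfBoundsException`.

```java
class Brand {
    private final String brandName;
    private final float brandPrice;

    Brand(String brandName, float brandPrice) {
        this.brandName = brandName;
        this.brandPrice = brandPrice;
    }

    public float getBrandPrice() {
        return brandPrice;
    }
}

class Product {
    private final Brand prodBrand; // other attributes omitted in this sketch

    Product(Brand prodBrand) {
        this.prodBrand = prodBrand;
    }

    public Brand getProdBrand() {
        return prodBrand;
    }
}

class Invoice {
    private final String invoiceID;
    private final Product[] products = new Product[20]; // instance field: one array per invoice
    private final int[] prodQuantity = new int[20];     // instance field: one array per invoice
    private boolean qualifiesForDiscount;               // defaults to false
    private int numOfProducts;

    Invoice(String invoiceID) { // assumed constructor; the original is not shown
        this.invoiceID = invoiceID;
    }

    public void addProduct(Product product, int quantity) {
        if (numOfProducts >= products.length) {
            throw new IllegalStateException("invoice " + invoiceID + " is full");
        }
        products[numOfProducts] = product;
        prodQuantity[numOfProducts] = quantity;
        numOfProducts += 1;
    }

    public void setQualifiesForDiscount() {
        float totalValue = 0;
        for (int i = 0; i < numOfProducts; i++) {
            totalValue += products[i].getProdBrand().getBrandPrice() * prodQuantity[i];
        }
        if (totalValue > 3000) {
            qualifiesForDiscount = true;
        }
    }

    public boolean getQualifiesForDiscount() {
        return qualifiesForDiscount;
    }

    public int getNumOfProducts() {
        return numOfProducts;
    }
}
```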

(e)
One significant advantage of modularity is that debugging becomes easier and faster. Because a large program is broken down into smaller, independent modules, each module can be tested on its own, so an error can be traced to a small section of code and fixed quickly.
