IB DP Computer Science Option C: Web science: C.2 – Searching the web SL Paper 2

Question

The internet and World Wide Web are often considered to be the same, or the terms are used in the wrong context.
Many organizations produce computer-based solutions that implement open standards.
A search engine is software that allows a user to search for information. The most commonly used search algorithms are the PageRank and HITS algorithms.
a. Distinguish between the internet and the World Wide Web.[2]
b. Outline two advantages of using open standards.[4]
c. Outline why a search engine using the HITS algorithm might produce different page ranking from one using the PageRank algorithm.[2]
d. Web crawlers browse the World Wide Web.
Explain how data stored in a meta-tag is used by a web crawler.[3]

Answer/Explanation

Ans:

a. )
The internet is a global network of interconnected computers / a network of networks;
The World Wide Web is software / a service that runs on the hardware of the internet and provides access to content / a collection of pages that can be accessed through hyperlinks / a way of accessing and sharing the information that is held on the internet in webpages;
The World Wide Web uses HTTP, which is only one of the many protocols used on the internet;
E-mail, File Transfer Protocol (FTP), and instant messaging services are part of the internet but not of the web;

b. )
Open standards provide a publicly available specification for a specified task;
This is an agreed set of parameters that enable interoperability and/or compatibility to occur;
Using open standards means that you are not subject to a governing body with its own agenda/self-interest; thus, you can be confident that you won’t be subject to fees/bias;
Open standards promote interoperability;
This enables the various devices to communicate with each other;
Advocates of open standards also argue that openness encourages better and more secure systems;
This is because more people are able to analyse the standards and the resulting software, and no one has a proprietary interest in suppressing knowledge of problems to keep sales up.

c. )
The HITS algorithm ranks a page based on a combination of its importance as a hub and as an authority;
The PageRank algorithm ranks a page by counting the number and quality of links to it to determine the relative importance of the website;
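The difference between the two marking points can be illustrated with a minimal sketch. The link graph, the page names (h1 to h4, a1, a2, x), the damping factor of 0.85 and the iteration count are all assumptions made for illustration; this is not the implementation used by any real search engine.

```python
# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1", "a2"],
    "h4": ["a1"],
    "a1": ["x"],
    "a2": ["x"],
    "x": ["h1"],
}


def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated redistribution of rank.

    Assumes every page has at least one out-link (true for LINKS).
    """
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in links}
        for page, targets in links.items():
            # A page passes its rank on, split equally over its out-links.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def hits(links, iterations=100):
    """Simplified HITS: returns (hub scores, authority scores)."""
    hub = {page: 1.0 for page in links}
    auth = {page: 1.0 for page in links}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {page: 0.0 for page in links}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {page: sum(auth[t] for t in targets)
               for page, targets in links.items()}
        # Normalise so the scores do not grow without bound.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5
        hub_norm = sum(v * v for v in hub.values()) ** 0.5
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}
    return hub, auth


def top(scores):
    """Return the page with the highest score."""
    return max(scores, key=scores.get)


ranks = pagerank(LINKS)
hubs, authorities = hits(LINKS)
print("PageRank top page:", top(ranks))
print("HITS top authority:", top(authorities))
```

On this graph PageRank places x first, because a1 and a2 pass all of their rank to it, whereas HITS gives a1 the highest authority score, because it is the page that the most hub pages link to. The same set of links therefore produces two different orderings.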

d.)
Meta tags are included in the head of a web page, where they are available to a web crawler and give information about the page that the crawler can make use of;

When the web-page is crawled, a copy of the HTML is replicated in the search engine database;
When a user enters text into a search, the search engine retrieves the data indexed from the web page;
The search engine then ranks and displays the content (in order of relevance);
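How a crawler might read meta tags can be sketched with Python's standard html.parser module. The sample page, the class name MetaTagParser and the helper functions are hypothetical, and how a real search engine weights each tag is not covered here; the sketch only shows the tags being extracted and the robots directive being interpreted.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page, invented for this example.
PAGE = """<html><head>
<title>Primary and secondary data</title>
<meta name="description" content="Guide to primary and secondary data sources.">
<meta name="keywords" content="primary data, secondary data, research">
<meta name="robots" content="index, nofollow">
</head><body><a href="/next">Next</a></body></html>"""


def read_meta(html):
    """Return a dict of the meta tags found in the HTML text."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


def crawl_decision(meta):
    """Interpret the robots directive: may the page be indexed, its links followed?"""
    directives = {d.strip().lower() for d in meta.get("robots", "").split(",")}
    return {
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
    }


meta = read_meta(PAGE)
print(meta["description"])
print(crawl_decision(meta))
```

Here the description and keywords would be stored with the indexed copy of the page, and the robots directive tells the crawler that it may index the page but should not follow its links.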

Question

The IB Coordinator of AB World Academy introduces the Extended Essay to the Grade 11 students in January by researching the difference between primary and secondary data on the internet.

Some of the students used the Google search engine (Google.com) and others used the Ask search engine (Ask.com). These search engines gave results.
Google uses the PageRank algorithm and Ask uses the HITS algorithm.
The IB coordinator uploaded the assignments onto a cloud-based Learning Management Platform.
As part of their research, students downloaded images from the internet. Most of the downloaded JPG images were compressed using lossy compression.
a. Define the term search engine.[1]

b. Distinguish between the principles of these two algorithms.[4]

c. Describe the difference between cloud computing and local client-server architecture.

d.i. State the alternative type of compression to lossy.

d.ii. Evaluate the advantages and disadvantages for students of using compressed images in their IB Coursework.[4]

e. A PNG image uses open standards.
Distinguish between interoperability and open standards.[2]

Answer/Explanation

Ans:

a.)
Software that interrogates a database of web pages;
b. )
PageRank algorithm [2 max]
PageRank works by counting the number and quality of in-links of a page to determine a rough estimate of how important the website is;
The assumption is that more important websites are likely to receive more links from other websites;
Pages are given a score (rank) / counts links per page;
HITS algorithm [2 max]
Based on authorities and hubs;
Authorities: A page is called an authority if it contains valuable information and is truly relevant to the search query. It is assumed that such a page has a high number of in-links;
Hubs: These are pages that are relevant for finding authorities. They contain useful links towards them. It is therefore assumed that these pages have a high number of out-links;
c. )
Local client-server architecture [1 max]
The server is the central communicator between clients (e.g. email/chat server)/allows different clients to access and manipulate data;
Cloud computing [1 max]
Cloud computing puts the focus on sharing computing resources over the internet;
Differences between the two [1 max]
Cloud computing is often offered as a service to individuals and companies, whereas local client-server architecture is based at an organizational level;
Cloud computing can scale up or down depending on current demands more easily than local client-server networks;
Client-server networks are more secure than the cloud as the data transmission is carried out locally;

d.i. )
Lossless (Compression);
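The distinction between the two types can be sketched as follows. The pixel data is a made-up 64 x 64 greyscale pattern, and dropping the low four bits of each pixel is only a stand-in for "discarding detail"; JPEG itself uses a different, transform-based method. zlib (DEFLATE) is used as the lossless example.

```python
import zlib

# Hypothetical 8-bit greyscale "image", one byte per pixel (64 x 64).
pixels = bytes((x * 7 + y * 3 + (x * y) % 5) % 256
               for y in range(64) for x in range(64))

# Lossless: decompressing gives back exactly the original bytes.
lossless = zlib.compress(pixels)
restored = zlib.decompress(lossless)

# Lossy (illustrative only): throw away the low 4 bits of every pixel
# before compressing. The discarded detail cannot be recovered.
quantised = bytes(p & 0xF0 for p in pixels)
lossy = zlib.compress(quantised)
approximate = zlib.decompress(lossy)

print("original size:", len(pixels))
print("lossless size:", len(lossless), "identical:", restored == pixels)
print("lossy size:", len(lossy), "identical:", approximate == pixels)
print("largest pixel error:", max(abs(a - b) for a, b in zip(pixels, approximate)))
```

The lossless round trip reproduces every byte, while the lossy version trades a bounded error in each pixel for data that is typically easier to compress.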
d.ii.)
Advantages of using compression [2 max]
Allows for more rapid upload/download as the compressed file is smaller than the original;
Scrolling through the coursework may be quicker;
Disadvantages of using compression [2 max]
This may mean that the overall quality of the work is reduced;
This may be an issue when printing the work, or when the quality of the image is a key contributor to the overall quality of the work;
Overall comment [1 max]
The impact of using compression may be dependent on the context in which the work is compressed;
e.) Award [2 max].
Interoperability means a computer program can communicate and exchange information across a range of platforms;
Open standards are publicly available, free standards that enable interoperability;
