Web Search Engines – Working and Page Ranking Algorithms
Web search engines are the main way people reach the information available on the World Wide Web. As smartphone use grows and more people come online, reliance on search engines has grown with it.
Google alone reportedly handles at least 2 trillion searches per year, which shows how heavily users rely on search engines to find any kind of information. The number of websites on the web is also growing significantly, which makes the search engines’ task extremely complicated.
It is therefore essential for a search engine to return the best possible result set: a list of the websites that are most relevant to the search query.
The Working of Search Engines – An Overview
A search engine is the medium through which users reach the knowledge available on the World Wide Web. It is worth understanding how a search engine builds its results, choosing from the millions of web pages that match the user query. Search engines carry out many tasks in order to deliver relevant, high-quality results to users.
A search engine performs three main functions: crawling, indexing, and retrieval and ranking.
Crawling is performed by a web crawler, a program that collects data about a website. This data includes titles, images, keywords and linked pages, along with other details such as page layout. The crawler works through page after page by following the link structure: each newly discovered link is added to a list of pages to be visited next.
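The "list of pages to be visited next" can be sketched as a queue. The example below is a minimal illustration, not a production crawler: `TOY_WEB`, `crawl` and the `max_pages` limit are hypothetical names, and the "web" is an in-memory dictionary so that no network access is needed. A real crawler would fetch and parse each page over HTTP.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/"],
    "c.example/": [],
}

def crawl(seed, web, max_pages=100):
    """Visit pages breadth-first by following links; return the visit order."""
    frontier = deque([seed])  # pages still to be visited
    seen = {seed}             # pages already queued, so none is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

print(crawl("a.example/", TOY_WEB))
# ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

The `seen` set is what stops the crawler from looping forever when two pages link to each other.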
Indexing involves storing the data collected by the crawler in a structure that allows fast and precise retrieval. The index also supplies the data needed to measure the importance of a page relative to other similar web pages.
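One common structure for this purpose is an inverted index, which maps each word to the pages that contain it. The article does not say which structure search engines use, so the sketch below is only an assumption for illustration; `build_index` and `PAGES` are hypothetical names.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

# Hypothetical crawled data: page id -> page text.
PAGES = {
    "p1": "search engines crawl the web",
    "p2": "ranking algorithms order search results",
    "p3": "the web is large",
}

index = build_index(PAGES)
print(sorted(index["search"]))  # ['p1', 'p2']
print(sorted(index["web"]))     # ['p1', 'p3']
```

Looking up a word is then a single dictionary access instead of a scan over every stored page.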
Without accurate and effective crawlers and indexers, the search engine has poor data to work with for retrieval and ranking.
Retrieval and Ranking
Retrieval is the process of fetching the pages that are relevant to the user’s search query.
Ranking orders those pages so that the most pertinent, highest-quality information comes first. The ranking is based on signals such as the keywords, the quality of inbound links, etc.
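The two steps can be sketched together as follows. The scoring formula (keyword matches plus a weighted count of inbound links) and the names `retrieve_and_rank`, `link_weight`, `PAGES` and `INBOUND` are assumptions made for illustration only; real search engines combine many more signals in ways the article does not describe.

```python
def retrieve_and_rank(query, pages, inbound_links, link_weight=0.5):
    """Return ids of pages matching the query, best-scoring first."""
    terms = query.lower().split()
    scored = []
    for page_id, text in pages.items():
        words = text.lower().split()
        keyword_hits = sum(words.count(term) for term in terms)
        if keyword_hits == 0:
            continue  # retrieval: drop pages that match no query term
        # ranking signal: keyword matches plus a bonus per inbound link
        score = keyword_hits + link_weight * inbound_links.get(page_id, 0)
        scored.append((score, page_id))
    # highest score first; ties broken by page id so the order is stable
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for _, page_id in scored]

# Hypothetical data: page texts and inbound link counts.
PAGES = {
    "p1": "search engines crawl the web",
    "p2": "search ranking for search engines",
    "p3": "cooking recipes",
}
INBOUND = {"p1": 4, "p2": 1, "p3": 9}

print(retrieve_and_rank("search engines", PAGES, INBOUND))
# ['p1', 'p2']
```

Page `p3` has the most inbound links but never appears, because it fails the retrieval step. Page `p2` has more keyword matches than `p1`, yet `p1` ranks first because of its inbound links.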
Search engine behaviour keeps changing, and search engines now give more priority to the authority of the linking sites when ranking pages.
Page Ranking Algorithms
Search engines use different ranking algorithms to return a list of the most useful and relevant websites to the user.
Here are three such page ranking algorithms:
PageRank
The PageRank algorithm was developed by Larry Page and Sergey Brin in 1996. Google uses it to rank the web pages on its search results page.
The algorithm ranks web pages by their link structure. It works on the principle that a page is important if important pages link to it (inlinks), and that such a page in turn passes importance along its own links to other pages (outlinks).
The most crucial parameter PageRank considers is the backlink: an incoming hyperlink from one website to another, which acts as a reference or a citation.
Note that the ranking score is calculated when the pages are indexed, not at query time. This is also a disadvantage, since the score does not depend on the search query.
Nevertheless, the algorithm is popular. Its success comes from its ability to rank pages by their importance and so give quality results to the user.
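The link-based scoring described above can be sketched as a simple iterative computation. This is a minimal illustration of the general PageRank idea, not Google's production implementation. The damping factor of 0.85, the fixed iteration count, the handling of pages without outlinks and the names `pagerank` and `LINKS` are all assumptions that do not come from this article. The sketch also assumes every linked page appears as a key in the dictionary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute a PageRank-style score for each page.

    links maps each page to the list of pages it links to (its outlinks).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # every page keeps a small base score
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # a page shares its score equally among its outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # assumption: a page with no outlinks shares with all pages
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: C receives the most backlinks.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
print(sorted(ranks, key=ranks.get, reverse=True))
# ['C', 'A', 'B', 'D']
```

Page A has only one backlink, but it comes from the highly ranked page C, so A ends up above B. Page D has no backlinks at all and keeps only the base score.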
Hyperlink-Induced Topic Search (HITS)
The HITS algorithm, also known as hubs and authorities, was developed by Jon Kleinberg. It is a link analysis algorithm. It first generates the result set relevant to the search query and then computes scores for the pages in that set.
The algorithm involves two types of score:
- Authority score – This score estimates the quality and value of the content of a web page. It is the sum of the hub scores of the pages that point to it.
- Hub score – This score estimates the value of a page’s links to other pages. It is the sum of the authority scores of the pages it points to.
A good authority is pointed to by many good hubs, and a good hub points to many good authorities.
HITS thus returns content that is both important and relevant. However, it is prone to topic drift, where densely linked but off-topic pages in the result set pull the top results away from the original query.
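The mutual reinforcement between hubs and authorities can be sketched as a repeated update of the two scores. This is a minimal illustration that assumes the query-specific result set has already been built and is passed in as a link graph. The names `hits` and `LINKS`, the fixed iteration count and the choice of normalisation are assumptions, not details given in this article.

```python
import math

def hits(links, iterations=20):
    """Compute hub and authority scores for a small link graph.

    links maps each page to the list of pages it links to; every linked
    page is assumed to appear as a key.
    """
    pages = list(links)
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # authority score: sum of the hub scores of pages pointing to it
        auth = {page: 0.0 for page in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                auth[target] += hub[page]
        # hub score: sum of the authority scores of pages it points to
        hub = {page: sum(auth[t] for t in links[page]) for page in pages}
        # normalise so the scores stay bounded across iterations
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}
    return hub, auth

# Hypothetical result set: three link pages pointing at two content pages.
LINKS = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
    "auth1": [],
    "auth2": [],
}

hub, auth = hits(LINKS)
print(max(auth, key=auth.get))  # auth1
```

In this graph `auth1` is the best authority because all three hubs point to it, and `hub1` and `hub2` score higher as hubs than `hub3` because they point to both authorities.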
Ontology Page Ranking Algorithms
Today’s Web is a vast store of knowledge and information, made up of huge amounts of data. Ontologies help to handle this data by defining the concepts and relationships within a specific domain, which makes relevant content easier to retrieve and access.
The use of ontologies has significantly improved the relevance of the results returned to the user for a search query.
By organising knowledge in this way, ontologies also reduce information overload.
For the Semantic Web, ontologies are the backbone of data representation, helping to place the most relevant and most important pages at the top of the search results.
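The article does not specify a particular ontology-based ranking algorithm, so the following is only one hedged illustration of how domain knowledge might improve retrieval: expanding a query term with the narrower concepts beneath it. The toy `ONTOLOGY`, the `PAGES` data and the functions `expand_term` and `search` are all hypothetical.

```python
# Hypothetical domain ontology: each concept maps to its narrower concepts.
ONTOLOGY = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "hatchback"],
}

def expand_term(term, ontology):
    """Return the term together with every narrower concept below it."""
    expanded = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue  # already handled; also guards against cycles
        expanded.add(current)
        stack.extend(ontology.get(current, []))
    return expanded

def search(term, pages, ontology):
    """Return ids of pages that mention the term or any narrower concept."""
    terms = expand_term(term, ontology)
    return sorted(
        page_id
        for page_id, text in pages.items()
        if terms & set(text.lower().split())
    )

# Hypothetical page texts.
PAGES = {
    "p1": "used sedan for sale",
    "p2": "bicycle repair guide",
    "p3": "fresh fruit market",
}

print(search("vehicle", PAGES, ONTOLOGY))  # ['p1', 'p2']
print(search("car", PAGES, ONTOLOGY))      # ['p1']
```

Neither `p1` nor `p2` contains the word "vehicle", so plain keyword matching would miss both; the ontology supplies the link between the query and the page content.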
Search engines thus keep changing existing algorithms and developing new ones, so that users can find the most precise and relevant information among the millions of websites available on the web.