We use Internet services every day, and search tools in particular, when looking for information. Search results are commonly called hits and are presented as a list. The results may include web pages, images, data and other types of files. Some search engines also gather information available in databases or open directories. Unlike Internet directories, which are maintained by human editors, search engines operate algorithmically or combine algorithmic and human input.
Web search engines work by storing information about a huge number of web pages, which they retrieve from the Internet. These pages are retrieved by a web crawler, also called a spider: an automated web browser that follows every link it discovers. The content of each page is then analyzed to determine how it should be indexed. Words, for example, are extracted from titles, headings or special fields called meta tags. Data about the web pages is stored in an index for use in later queries. Some search tools, such as Google, store all or part of the source page (referred to as a cache) as well as data about the page, whereas others, such as AltaVista, store every word of every page they find. The cached page always contains the actual search text, since it is the version that was indexed. Consequently, it can be very helpful when the content of the live page has changed and the search words no longer appear in it.
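The crawl-and-index process described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "web" is a small hypothetical in-memory dictionary (PAGES) rather than real HTTP pages, the function name crawl_and_index is invented for this example, and the index is a plain word-to-URL mapping, which is far simpler than what a production search engine would use.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch these pages over HTTP instead.
PAGES = {
    "a.example": ("Search engines index web pages", ["b.example", "c.example"]),
    "b.example": ("A crawler follows every link", ["a.example"]),
    "c.example": ("Cached pages keep the indexed text", []),
}

def crawl_and_index(start, pages):
    """Crawl breadth-first from `start` and return (index, cache)."""
    index = defaultdict(set)  # word -> set of URLs that contain it
    cache = {}                # URL -> copy of the text as it was indexed
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # dead link: nothing to index
        text, links = pages[url]
        cache[url] = text
        # Split the page into lowercase words and record where each occurs.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
        # Follow every link that has not been visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index, cache

index, cache = crawl_and_index("a.example", PAGES)
print(sorted(index["pages"]))  # ['a.example', 'c.example']
```

Because the cache keeps the text exactly as it was indexed, a word found through the index is guaranteed to appear in the cached copy even if the live page changes later.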
As soon as a user types keywords into the search field, the engine checks its index and returns a list of the best-matching web pages according to its criteria, usually with the document's title, a short summary and sometimes extracts from the text. Some search tools offer an advanced option called proximity search, which lets users specify the maximum distance allowed between keywords.
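A proximity search can be illustrated with a short Python sketch. It assumes that distance is measured in whole words and that documents are short enough to scan directly; the function names tokenize and proximity_match and the sample documents are hypothetical, and a real engine would more likely store word positions in its index than rescan the text.

```python
import re

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def proximity_match(text, word_a, word_b, max_distance):
    """Return True if word_a and word_b occur at most max_distance words apart."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

# Hypothetical sample documents.
docs = {
    "doc1": "The web crawler follows every link it discovers",
    "doc2": "A crawler is a program; much later it may store a link",
}

hits = [name for name, text in docs.items()
        if proximity_match(text, "crawler", "link", 4)]
print(hits)  # ['doc1']
```

Both documents contain both keywords, but only in doc1 do they appear within four words of each other, so only doc1 is returned.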
The usefulness of a search engine depends on the relevance of the results it returns. Millions of web pages may contain a given word or phrase, but some of them are more relevant than others. Most search engines therefore apply ranking methods so that the "best" results are listed first.
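To make the idea of ranking concrete, the following Python sketch orders pages by how often the query words occur in them. This is an assumption chosen for simplicity, not the method of any particular search engine; the functions score and rank and the sample pages are hypothetical.

```python
import re
from collections import Counter

def score(text, query):
    """Count how many times the query words occur in the text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[word] for word in query.lower().split())

def rank(docs, query):
    """Return names of matching documents, highest score first."""
    scored = [(score(text, query), name) for name, text in docs.items()]
    # Sort by descending score, then by name so that ties are stable.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for s, name in ordered if s > 0]

# Hypothetical sample pages.
docs = {
    "p1": "search engines rank search results",
    "p2": "results of a search",
    "p3": "cooking recipes",
}

print(rank(docs, "search results"))  # ['p1', 'p2']
```

Here p1 scores 3 and p2 scores 2, while p3 contains neither query word and is left out of the list entirely.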
The way results are ordered and displayed is specific to each search engine. The methods also change over time, as the use of the Internet changes and new techniques are developed.