Popular Search Engines
A search engine is a software system designed to search for information on the World Wide Web. The results are generally presented as a list, often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also keep their information up to date automatically by running algorithms on data gathered by a web crawler.
1) Web Crawling
Matthew Gray’s World Wide Web Wanderer (1993) was one of the first efforts to automate the discovery of web pages. Gray’s web crawler would download a web page, examine it for links to other pages, and continue downloading the links it discovered until there were no more links left to discover. This is how web crawlers, also called spiders, generally operate today.
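The download, extract links, repeat loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Gray's actual program: the names crawl, LinkExtractor, and PAGES are hypothetical, and the "web" here is a small in-memory dictionary of made-up pages so the example runs without network access. A real crawler would replace the fetch function with an HTTP download and would also need politeness rules and limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Visit every page reachable from start_url and return the URLs in visit order.

    fetch(url) is supplied by the caller (an assumption of this sketch);
    it returns the page's HTML as a string, or None if the page is unavailable.
    """
    seen = {start_url}          # URLs already queued, so no page is fetched twice
    queue = deque([start_url])  # URLs waiting to be downloaded
    visited = []
    while queue:                # stop when there are no more links to discover
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical three-page site held in memory instead of on the network.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

The set of seen URLs is what lets the loop terminate even though the pages link back to one another.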
2) Indexing and Ranking
When a web crawler has downloaded a web page, the search engine will index its content. Stop words, which are words that occur very frequently such as a, and, the, and to, are often ignored. Other words might be stemmed. Stemming is a technique that removes suffixes from a word so that related word forms share a single entry in the index. For example, eating, eats, and eaten may all be stemmed to eat so that a search for eat will match all its variants.
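As a rough illustration of stop-word removal and stemming, the following Python sketch builds a small inverted index (a mapping from each term to the documents containing it). Everything here is an assumption made for the example: the stop-word set and the three-suffix stemmer are deliberately crude toys, not the rules any real search engine uses, and the documents are invented. Production systems typically rely on an established stemming algorithm instead.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "and", "the", "to"}  # toy list taken from the examples above
SUFFIXES = ("ing", "en", "s")           # toy suffix list, checked in this order


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical documents for demonstration only.
DOCS = {
    1: "The cat eats fish",
    2: "She was eating and reading",
    3: "He has eaten the apple",
    4: "A trip to the market",
}

index = build_index(DOCS)
matches = search(index, "eat")
print(sorted(matches))
```

Because the query passes through the same tokenize step as the documents, a search for eat, eats, or eating reaches the same index entry.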
3) Rank Optimization
Search engines guard their weighting formulas as trade secrets, since the formulas differentiate their service from other search engines and they do not want content producers (the members of the public who produce web pages) to “unfairly” manipulate their rankings. However, many companies rely heavily on search engines for recommendations and customers, so their ranking on a search engine results page (SERP) is very important. Most search engine users examine only the first screen of results, and they view the first few results more often than the results at the bottom of the page. This naturally puts content producers in an adversarial role against search engines, since the producers have an economic incentive to rank highly in SERPs. Competition for certain terms (e.g., Hawaii vacation and flight to New York) is particularly fierce. Because of this, most search engines provide paid-inclusion or sponsored results along with regular (organic) results. This allows companies to purchase space on a SERP for certain terms.