Web Crawler

Web crawlers are programs that travel through the web in a methodical manner and read data from websites, specifically to index pages for search engines. These programs parse a page's metadata, its content, and its links in order to identify the purpose of the page, where its links go, and what those linked pages contain. Because of this behaviour, they have earned many nicknames, including Web Spiders, Crawlers, and Web Robots.

Web crawlers play an important role in search engine effectiveness and in the development of the semantic web. As they scan pages, they record information about each page and return that data to the search engine, where it is processed and analyzed. Once the pages have been analyzed, the search engine can more accurately determine whether a given page matches a particular search query.
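
The basic cycle described above can be illustrated with a short sketch: fetch a page, parse out its title and outgoing links, and hand the result back for indexing. The following is a minimal, simplified illustration using only the Python standard library; the seed URL and the record format are placeholders rather than any particular search engine's design.

<syntaxhighlight lang="python">
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and all outgoing links from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl_page(url):
    """Fetch one page and return the record a crawler would pass to the indexer."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = LinkAndTitleParser(url)
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "links": parser.links}


if __name__ == "__main__":
    record = crawl_page("https://example.com/")  # seed URL is a placeholder
    print(record["title"], len(record["links"]), "links found")
</syntaxhighlight>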

== Types ==

=== Focused Web Crawler ===
These crawlers collect and categorize only pages that relate to a specific topic, which makes them economical with respect to hardware and network resources.
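
As a rough illustration of how a focused crawler decides which pages are worth keeping and which links are worth following, the sketch below scores a page against a fixed keyword set. Real focused crawlers typically rely on trained topic classifiers, so the keywords and threshold here are purely illustrative assumptions.

<syntaxhighlight lang="python">
# Hypothetical topic vocabulary for a crawl focused on web crawling itself.
TOPIC_KEYWORDS = {"crawler", "indexing", "search", "spider"}


def relevance_score(page_text):
    """Fraction of topic keywords that appear in the page text."""
    words = set(page_text.lower().split())
    return len(TOPIC_KEYWORDS & words) / len(TOPIC_KEYWORDS)


def should_enqueue(page_text, threshold=0.5):
    """Only pages scoring above the threshold have their links followed."""
    return relevance_score(page_text) >= threshold
</syntaxhighlight>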

=== Incremental Web Crawler ===
An incremental crawler frequently revisits pages to refresh their index entries. Additionally, pages that have become more relevant replace pages that have become irrelevant, ensuring that the most important results are returned to the user.
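
One way to picture the revisiting behaviour is as a priority queue of due times, where pages that change more often come due sooner. The sketch below assumes each page carries an estimated change interval; the intervals and URLs are illustrative, not drawn from any real system.

<syntaxhighlight lang="python">
import heapq
import time


def next_visit(last_visit, change_interval):
    """Schedule a revisit one estimated change interval after the last visit."""
    return last_visit + change_interval


# (next_visit_time, url) pairs: the soonest-due page is always popped first.
revisit_queue = []
heapq.heappush(revisit_queue, (next_visit(time.time(), 3600), "https://example.com/news"))
heapq.heappush(revisit_queue, (next_visit(time.time(), 86400), "https://example.com/about"))

due_at, url = heapq.heappop(revisit_queue)  # the hourly-changing news page comes due first
</syntaxhighlight>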

=== Distributed Web Crawler ===
This technique uses many crawlers, typically spread across many physical locations, to achieve the widest possible coverage of the web. That geographic spread also makes the system resilient to network attacks.
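
A common way to split the work among distributed crawlers is to assign each host to exactly one crawler node, for example by hashing the host name so that no two nodes fetch the same site. The sketch below illustrates that idea; the node names and node count are assumptions made for the example.

<syntaxhighlight lang="python">
import hashlib
from urllib.parse import urlparse

# Hypothetical crawler nodes at different physical locations.
CRAWLER_NODES = ["node-a", "node-b", "node-c"]


def assign_node(url):
    """Deterministically map a URL's host to one of the crawler nodes."""
    host = urlparse(url).netloc
    digest = hashlib.sha1(host.encode("utf-8")).hexdigest()
    return CRAWLER_NODES[int(digest, 16) % len(CRAWLER_NODES)]


print(assign_node("https://example.org/page"))  # the same host always lands on the same node
</syntaxhighlight>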

=== Parallel Web Crawler ===
As the name suggests, this is the use of several crawlers running in parallel. Parallel crawlers are used increasingly often because of the ever-growing size of the web; running several crawlers simultaneously maximizes the rate at which pages can be downloaded.
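
To show how running several fetches at once raises download throughput, the sketch below retrieves a handful of pages concurrently with a thread pool. The seed URLs and worker count are placeholders; a production parallel crawler would also coordinate politeness limits and avoid fetching the same URL twice.

<syntaxhighlight lang="python">
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download one page and report its size in bytes."""
    with urlopen(url) as response:
        return url, len(response.read())


seed_urls = ["https://example.com/", "https://example.org/", "https://example.net/"]

# Several fetches run at once, so the crawl is not limited by one page's response time.
with ThreadPoolExecutor(max_workers=3) as pool:
    for url, size in pool.map(fetch, seed_urls):
        print(url, size, "bytes")
</syntaxhighlight>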