THE education site for computer science and ICT

2. Web crawlers and indices

Search engines have two main tasks:

Finding web pages
Building an index to allow users to find the web pages

Crawlers

Search engines use automated software tools called 'crawlers' or 'spiders' for the first task. A crawler accesses an active web domain (such as teach-ict.com). It then follows every link on every page on that website and reports the results of its exploration back to the search engine. Eventually, this builds up a map of the website.
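The idea of following every link can be sketched in a few lines of Python. This is only an illustration, not how any real search engine's crawler is written. The three-page site (FAKE_SITE), the example.test address and the names LinkParser and crawl are all made up for this example, and the pages are held in memory rather than downloaded from the web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical three-page website held in memory (an assumption for this
# sketch); a real crawler would download each page over the network.
FAKE_SITE = {
    "http://example.test/": '<a href="/about.html">About</a> <a href="/news.html">News</a>',
    "http://example.test/about.html": '<a href="/">Home</a> <a href="http://other.test/">Elsewhere</a>',
    "http://example.test/news.html": '<a href="/about.html">About</a>',
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Visit every reachable page on the starting domain.

    Returns a map of the site: each visited page and the links found on it.
    """
    domain = urlparse(start_url).netloc
    site_map = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched, so skip it
            continue
        parser = LinkParser()
        parser.feed(html)
        # Turn relative links such as "/about.html" into full addresses.
        links = [urljoin(url, href) for href in parser.links]
        site_map[url] = links
        for link in links:
            # Only follow links that stay on the same website.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return site_map


site_map = crawl("http://example.test/", FAKE_SITE.get)
for page, links in site_map.items():
    print(page, "links to", links)
```

The crawler keeps a queue of pages still to visit and a set of pages it has already seen, so it never visits the same page twice. The link to other.test is recorded in the map but not followed, because it leads to a different website.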

Index

The search engine examines the content of every page found by the crawlers. It then adds a summary of the content to an index. When someone uses the search engine, their query is compared to the search engine's index. The most relevant results are presented to the user.
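A very small index can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the three pages in PAGES are invented, the "summary" of each page is just a count of the words on it, and relevance is judged only by how often the query words appear. Real search engines use far more sophisticated summaries and ranking rules.

```python
import re
from collections import defaultdict

# Hypothetical page contents (an assumption for this sketch); in practice
# these would be the pages found by the crawlers.
PAGES = {
    "page1": "Crawlers follow links between web pages",
    "page2": "An index lets users find web pages quickly",
    "page3": "Search engines rank results by relevance",
}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the pages containing it, with a count per page."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Return pages that match the query, most relevant first.

    Relevance here is simply the number of times the query words
    appear on the page (a deliberately crude assumption).
    """
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by page name.
    return sorted(scores, key=lambda u: (-scores[u], u))


index = build_index(PAGES)
print(search(index, "index pages"))
```

The key point is that the query is never compared against the pages themselves. It is looked up in the index, which was built in advance, and that is what makes the search fast.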

Each search engine uses different criteria and algorithms to work out which index results are most relevant to user queries.

As there are billions of web pages, these indices are enormous. They require data centres around the world, each containing thousands of servers. Search engine users expect an almost instant response to their search terms, so the search engine has to be able to search its index extremely quickly.

Challenge: see if you can find out one extra fact on this topic that we haven't already told you.

Click on this link: What is a search engine