Use Robots.txt to disallow spiders from specific pages or sections
Robots.txt is a file in the root directory of your server that tells search engine crawlers which parts of your site they should not crawl or index. It can tell certain search engines to ignore certain pages, or tell all engines to ignore your site altogether. Even when optimizing, you might want to hide certain parts of your site from search engines. For example, if your site has a “terms and conditions” page that resembles most such pages on other sites and serves no search purpose, if you don’t want bots to crawl your cgi-bin directory, or if you have other directories or pages with duplicate content, you can use this file to tell search engines to ignore them.
A robots.txt file looks like this:
User-Agent: [Bot or Spider name]
Disallow: [File or Directory name]
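
To make the format concrete, here is a minimal sketch that fills in the two fields and checks the result with Python's standard urllib.robotparser module. The domain www.example.com and the paths /cgi-bin/ and /terms.html are made-up examples, and the helper name allowed() is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of two example locations.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /terms.html
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Example URLs only; none of these refer to a real site layout.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/cgi-bin/search.cgi",
        "http://www.example.com/terms.html",
    ):
        print(url, allowed(ROBOTS_TXT, "Googlebot", url))
```

Because the rules name only Googlebot, a crawler with a different name (Scooter, for instance) is not restricted by this file.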

A list of the names of various search engine spiders:

Spider Name, Search Engine, Status

AbachoBOT, Abacho, -
Acoon, Acoon, -
AESOP_com_SpiderMan, Aesop, -
ah-ha.com crawler, Ah-ha, -
appie, Walhello, -
Arachnoidea, Euroseek, active
ArchitextSpider, Excite, inactive
Atomz, Atomz, -
DeepIndex, DeepIndex (www.en.deepindex.com), -
ESISmartSpider, Travel Finder, -
EZResult, EZResults, -
FAST-WebCrawler, AlltheWeb, active
Fido, PlanetSearch, -
Fluffy the spider, SearchHippo, active
Googlebot, Google, active
Gigabot, Gigablast, active
Gulliver, Northernlight, inactive
Gulper, Yuntis, active
HenryTheMiragoRobot, Mirago, -
ia_archiver, Alexa, active
KIT-Fireball/2.0, Fireball (German SE at www.fireball.de), -
LNSpiderguy, Lexis-Nexis, -
Lycos_Spider_(T-Rex), Lycos, inactive
MantraAgent, LookSmart, active
MSN, Microsoft Prototype Crawler, -
NationalDirectory-SuperSpider, National Directory, -
Nazilla, Websmostlinked, -
Openbot, Openfind, -
Openfind, piranhaShark Openfind, -
Scooter, AltaVista, active
Scrubby, Scrub The Web, active
Slurp/3.0, Inktomi, active
Tarantula, AltaVista, inactive
Teoma_agent1, Teoma, active
UK Searcher Spider, UKSearcher, -
WebCrawler, WebCrawler, -

Setting the User-Agent field to an asterisk (*) applies the Disallow rules that follow it to all crawlers. Leaving the Disallow field empty means that spiders are free to access and index your entire site. No matter how you use it, the robots.txt file gives you more control over your site and its optimization.
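
The asterisk and the empty Disallow field can be tried out with a short sketch, again using Python's standard urllib.robotparser module. The three robots.txt texts, the /private/ path, the www.example.com domain and the helper name is_allowed() are all illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Asterisk: the rule applies to every crawler, for one section only.
BLOCK_SECTION_FOR_ALL = """\
User-agent: *
Disallow: /private/
"""

# Empty Disallow field: nothing is off limits.
ALLOW_EVERYTHING = """\
User-agent: *
Disallow:
"""

# A single slash: the whole site is off limits to every crawler.
BLOCK_WHOLE_SITE = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/private/page.html"  # example URL
    for name, text in (
        ("block section", BLOCK_SECTION_FOR_ALL),
        ("allow everything", ALLOW_EVERYTHING),
        ("block whole site", BLOCK_WHOLE_SITE),
    ):
        print(name, is_allowed(text, "Googlebot", page))
```

Keep in mind that robots.txt is a request that well-behaved crawlers honor; it does not technically prevent access.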
