Use Robots.txt to disallow spiders
Posted in: Blog by admin on November 12, 2009
Use Robots.txt to disallow spiders from specific pages or sections
Robots.txt is a plain-text file in the root directory of your server that tells search engine crawlers which parts of your site they should not crawl. It can tell certain search engines to ignore certain pages, or tell all engines to ignore your site altogether. Even when optimizing for search, you might want to hide certain parts of your site from search engines. For example, if your site has a “terms and conditions” page that is similar to most such pages on other sites and serves no search purpose, or you don’t want bots to crawl your cgi-bin directory, or you have other directories or pages with duplicate content, you can use this file to tell search engines to ignore them.
A robots.txt file looks like this:
User-Agent: [Bot or Spider name]
Disallow: [File or Directory name]
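To make the template concrete, here is a minimal sketch that fills it in and checks the result with Python's standard `urllib.robotparser` module. The paths and the bot name `ExampleBot` are made up for illustration and are not from any real site. Well-behaved crawlers apply rules along these lines, although individual search engines may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the paths and the bot name "ExampleBot" are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /terms-and-conditions.html

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Any crawler without its own section falls under the "*" rules.
print(parser.can_fetch("SomeCrawler", "/index.html"))
print(parser.can_fetch("SomeCrawler", "/cgi-bin/script.pl"))
print(parser.can_fetch("SomeCrawler", "/terms-and-conditions.html"))

# ExampleBot has its own section that disallows the whole site.
print(parser.can_fetch("ExampleBot", "/index.html"))
```

The first section applies to every crawler and hides only the two listed paths. The second section names one bot and blocks it from the entire site with `Disallow: /`.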