Web Development Guide – My Personal Library of Tutorials and Scripts

Create a robots.txt file

April 25
Robots.txt is a simple text file that tells search engine robots which parts of a website to visit and which to skip. It is useful because you can block certain folders and files from search engines, such as folders of images or JavaScript validation files. A website that holds sensitive data can also ask search engines to stay away altogether, although the file is only a request to well-behaved robots and not a form of access control.

You can create this file in any text editor. It should be an ASCII-encoded text file, not an HTML file, and the filename should be lowercase: robots.txt.
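As a rough illustration, the file can also be written from a script instead of a text editor. The sketch below uses Python; the helper name write_robots_txt and the throwaway target directory are assumptions made for the example, not part of the original tutorial. Passing encoding="ascii" makes Python raise an error if a non-ASCII character slips in, and the filename is spelled in lowercase as robots.txt, which is the name crawlers request.

```python
import tempfile
from pathlib import Path


def write_robots_txt(directory, rules):
    """Write `rules` to <directory>/robots.txt as plain ASCII text.

    Hypothetical helper, for illustration only.
    """
    path = Path(directory) / "robots.txt"  # lowercase filename
    # encoding="ascii" raises UnicodeEncodeError on any non-ASCII character.
    path.write_text(rules, encoding="ascii")
    return path


# Usage: write a one-entry file into a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = write_robots_txt(demo_dir, "User-agent: *\nDisallow: /admin/\n")
print(demo_path.name)
```

On a real site the file would go in the web root rather than a temporary directory.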

Summary: an XML sitemap generator can create a sitemap plus a robots.txt file in one click.
GSiteCrawler is one such generator.

Quoted from the Google website:

Syntax
The simplest robots.txt file uses two rules:

  • User-Agent: the robot the following rule applies to
  • Disallow: the pages you want to block

These two lines are considered a single entry in the file. You can include as many entries as you want. You can include multiple Disallow lines and multiple User-Agents in one entry.
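To make the idea of an entry concrete, here is a small Python sketch that assembles entries from lists of user agents and paths. The function name build_entry and the sample paths /drafts/ and /tmp.html are made up for the illustration. Entries are joined with a blank line, which is the usual convention for separating them.

```python
def build_entry(user_agents, disallowed_paths):
    """Return one robots.txt entry: User-agent lines, then Disallow lines."""
    lines = [f"User-agent: {agent}" for agent in user_agents]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines)


# Two entries, separated by a blank line. The second entry lists two
# user agents and two Disallow lines.
entries = [
    build_entry(["*"], ["/private_directory/"]),
    build_entry(["Googlebot", "Googlebot-Image"], ["/drafts/", "/tmp.html"]),
]
robots_txt = "\n\n".join(entries) + "\n"
print(robots_txt)
```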

What should be listed on the User-Agent line?
A user-agent is a specific search engine robot. The Web Robots Database lists many common bots. You can set an entry to apply to a specific bot (by listing the name) or you can set it to apply to all bots (by listing an asterisk). An entry that applies to all bots looks like this:

User-Agent: *

Google uses several different bots (user agents). The bot we use for our web search is Googlebot. Our other bots like Googlebot-Mobile and Googlebot-Image follow rules you set up for Googlebot, but you can set up additional rules for these specific bots as well.
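One way to see how the User-Agent line selects an entry is to feed a small file to Python's standard urllib.robotparser module. This is only a sketch: the bot name OtherBot, the paths, and the example.com URLs are invented for the demonstration, and the matching reflects Python's parser, which may differ in detail from how Google's crawlers behave.

```python
from urllib.robotparser import RobotFileParser

# One entry for a named bot and one for every other bot (asterisk).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /no-google/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot, path):
    """True if `bot` may fetch `path` on the hypothetical example site."""
    return parser.can_fetch(bot, "http://www.example.com" + path)


print(allowed("Googlebot", "/no-google/page.html"))  # named entry applies
print(allowed("OtherBot", "/no-google/page.html"))   # falls back to the * entry
print(allowed("OtherBot", "/private/page.html"))     # covered by the * entry
```

With this parser, the named entry blocks /no-google/ for Googlebot only, while any bot without its own entry falls through to the asterisk entry.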

What should be listed on the Disallow line?
The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).

  • To block the entire site, use a forward slash.

    Disallow: /
  • To block a directory and everything in it, follow the directory name with a forward slash.

    Disallow: /private_directory/
  • To block a page, list the page.

    Disallow: /private_file.html

URLs are case-sensitive. For instance, Disallow: /private_file.html would block http://www.example.com/private_file.html, but would allow http://www.example.com/Private_File.html.
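The Disallow forms above and the case-sensitivity rule can be tried out with Python's standard urllib.robotparser module. This is a minimal sketch; the helper is_blocked is hypothetical, and the results reflect Python's parser rather than any particular search engine.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(disallow_lines, url, bot="*"):
    """Parse an entry for all bots and report whether `url` is blocked."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + disallow_lines)
    return not parser.can_fetch(bot, url)


site = "http://www.example.com"

# A single slash blocks the whole site.
print(is_blocked(["Disallow: /"], site + "/anything.html"))
# A directory rule blocks everything inside that directory.
print(is_blocked(["Disallow: /private_directory/"], site + "/private_directory/a.html"))
# A page rule is case-sensitive: only the exact spelling is blocked.
print(is_blocked(["Disallow: /private_file.html"], site + "/private_file.html"))
print(is_blocked(["Disallow: /private_file.html"], site + "/Private_File.html"))
```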

Example of robots.txt:

# robots.txt file for http://www.templatesetc.com/
# 3/5/2007 12:23

User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /download/
Disallow: /images/
Disallow: /mail/
Disallow: /newsletter/
Disallow: /gen_validator1.js
Disallow: /config.php
Disallow: /style.css

# end of file
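A file like this can be sanity-checked before uploading. The sketch below runs a condensed copy of the example through Python's standard urllib.robotparser module; the test paths such as /admin/login.php and /index.php are assumptions chosen for the demonstration. Some parsers, Python's among them, treat a blank line as the end of an entry, so the User-agent line and its Disallow lines are kept together here.

```python
from urllib.robotparser import RobotFileParser

# Condensed copy of the example; lines starting with # are comments.
EXAMPLE = """\
# robots.txt file for http://www.templatesetc.com/
User-agent: *
Disallow: /admin/
Disallow: /images/
Disallow: /config.php
# end of file
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

base = "http://www.templatesetc.com"
for path in ["/admin/login.php", "/images/logo.png", "/config.php", "/index.php"]:
    verdict = "allowed" if parser.can_fetch("*", base + path) else "blocked"
    print(path, verdict)
```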


Note: I have attached this file in the download section. Download it to your PC, make sure it is saved as robots.txt, and upload it to the root folder of your website.

posted under SEO
