Writing Your First robots.txt
You may have heard about the file “robots.txt” but still be unsure what it’s really for. Robots.txt is a file that tells search engine crawlers which pages or directories of your site they should not crawl. This may alarm some of us since, as we know, the primary goal of all that SEO hassle is to get “all” our pages indexed by the search engines and to keep them indexed all the time. So why prevent our pages from being crawled? Well, the primary reason is that there are some pages (or directories) we don’t really want search engines to crawl.
Take my case as an example: I own several sites and most of them are dynamic. To simplify the work of updating these sites, I use a separate part of each site where I can update my content from time to time. I usually place this in a separate folder, say ‘admin’. Since ‘admin’ is a private folder, I don’t want Google crawling its contents, for security reasons. Think of what would happen if people suddenly found a page such as http://www.mysite.com/deletearticles.php in their search results! That would certainly mean disaster.
In this case, I need to ban search engines from crawling my admin folder and keep my admin pages out of the search results. I can easily accomplish this by creating a robots.txt file in the root of my site. A typical robots.txt that serves my need at this point looks like the one below:
User-agent: *
Disallow: /admin/
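This asks all search engines to stay out of my admin folder, which also keeps the files under it from appearing in the search results. But beware: only disallow folders that you really do not want indexed. Even experts such as Matt can sometimes make simple mistakes using robots.txt. Also keep in mind that robots.txt is only a request that well-behaved crawlers honor, and the file itself is publicly readable, so truly private pages still need real access protection.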
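If you want to double-check rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. It feeds the two lines above to the parser and asks whether a given URL may be crawled. The helper name is_allowed and the two sample URLs are made up for illustration; they are not pages from my actual site.

```python
from urllib.robotparser import RobotFileParser

# The same two rules shown in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(url, user_agent="*"):
    """Return True if the rules in ROBOTS_TXT let user_agent crawl url."""
    parser = RobotFileParser()
    # parse() takes the file's content as a list of lines,
    # so nothing is fetched over the network here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rules.
    for url in (
        "http://www.mysite.com/admin/deletearticles.php",
        "http://www.mysite.com/index.html",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Note that the rule only covers paths that start with /admin/, so a page that lives outside that folder stays crawlable.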
Till next time!

