With the recent addition of the Parameter handling section in Google Webmasters, Google prefers that you use this feature instead of blocking parameterized URLs in your robots.txt file. Last week I wrote an article on how to write a robots.txt file and mentioned at the end that you should only disallow parameters for MSN’s bots. I’ll explain that further here.

What the hell are these parameters you’re talking about?
I’m talking about the extra parameters your website may be creating: URLs with a “?” and a query string at the end, where removing that part gives you the exact same page (the version without the extra parameters is the canonical URL). Here are some examples:

  • www.mysite.com/index.php?ref_id=123
  • www.mysite.com/product?id=jeans&session=123
  • www.mysite.com/?utm_source=

How do I use this new feature?
Log into your Google Webmasters account and go to Site configuration > Settings, and you’ll see Parameter handling at the bottom. Google will give you a list of parameters it found, but you can also add your own. Don’t ignore every parameter listed without checking it first; make sure each one is actually something you want Googlebot to ignore. Enter the parameter names with no ? or = in them, as Google will find those automatically.
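If you want your own list of candidates before looking at Google’s, here is a minimal Python sketch that counts the parameter names appearing in a set of URLs. It is only an illustration: the function name count_parameter_names is made up for this post, and I added http:// to the example URLs so they parse as full URLs. The names it prints are already in the form Google expects, with no ? or =.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_parameter_names(urls):
    """Count how often each query parameter name appears across the URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so an empty value like "utm_source=" still counts
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


# The example URLs from this post (scheme added as an assumption)
urls = [
    "http://www.mysite.com/index.php?ref_id=123",
    "http://www.mysite.com/product?id=jeans&session=123",
    "http://www.mysite.com/?utm_source=",
]

print(sorted(count_parameter_names(urls)))
# ['id', 'ref_id', 'session', 'utm_source']
```

You could feed it URLs pulled from your server logs or a crawl of your own site, then decide name by name which ones are safe to ignore.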

What do I ignore and what don’t I ignore?
As always, ignore parameters that create pages with absolutely no value to your readers. I’m talking about duplicate content, useless dynamic pages, or pages with no value that have parameters. But there are certain parameters that you will not want to ignore. Look at one of the examples above (www.mysite.com/product?id=jeans&session=123): only the “session” part should be ignored. Why? Because “product?id=jeans” is probably a page you want Google to crawl and index, while the whole string, “product?id=jeans&session=123”, is probably the exact same page.
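To make the distinction concrete, here is a rough Python sketch of what “ignoring” a parameter amounts to: drop the ignored names and keep the rest. The set IGNORED_PARAMS and the function strip_ignored_params are hypothetical names chosen for this example, and this is not a description of how Googlebot works internally.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list: the parameters you decided to ignore
IGNORED_PARAMS = {"session", "ref_id", "utm_source"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    # urlunsplit leaves out the "?" when no parameters remain
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_ignored_params("http://www.mysite.com/product?id=jeans&session=123"))
# http://www.mysite.com/product?id=jeans
print(strip_ignored_params("http://www.mysite.com/index.php?ref_id=123"))
# http://www.mysite.com/index.php
```

Notice that “id” survives because it is not in the ignored set, which is exactly the behavior you want for the jeans page.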

Now that I’ve done that, what changes does my robots.txt need?
Take every parameter you set to ignore in Google Webmasters and disallow it in robots.txt for msnbot only.


User-agent: msnbot/2.0b
User-agent: msnbot/1.1
# Examples
Disallow: /?ref_id=*
Disallow: /product?id=jeans&*
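Before relying on rules like these, it helps to check which URLs they actually cover. Below is a small Python sketch of wildcard matching, written under the assumption that the crawler treats * as “any run of characters”, a trailing $ as “end of URL”, and otherwise matches by prefix. The function rule_matches is my own illustration, not any crawler’s real implementation. (Python’s built-in urllib.robotparser does not handle * wildcards, so I am not using it here.)

```python
import re


def rule_matches(pattern, path):
    """Return True if a Disallow pattern matches a URL path plus query string.

    Assumption: '*' matches any run of characters, a trailing '$' anchors
    the end of the URL, and everything else is a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


print(rule_matches("/?ref_id=*", "/?ref_id=123"))                            # True
print(rule_matches("/?ref_id=*", "/index.php?ref_id=123"))                   # False
print(rule_matches("/*?ref_id=", "/index.php?ref_id=123"))                   # True
print(rule_matches("/product?id=jeans&*", "/product?id=jeans&session=123"))  # True
print(rule_matches("/product?id=jeans&*", "/product?id=jeans"))              # False
```

Under these assumptions, a pattern anchored at the site root such as /?ref_id=* only covers the home page, so a pattern like /*?ref_id= is what you would need to cover index.php as well. The last two lines show the jeans rule blocking the session version while leaving the plain product page crawlable.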

You might be asking, why only msnbot? What about Yahoo? The reason I didn’t include Yahoo! is that Yahoo! Site Explorer offers features similar to Google’s: Dynamic URLs and Delete URLs. While these don’t work as well as Google’s, they do (almost) the same thing. If you don’t want to do the extra work in Site Explorer, you can simply add Yahoo’s bot alongside MSN’s.


User-agent: Slurp
User-agent: msnbot/2.0b
User-agent: msnbot/1.1

Any questions, comments, or suggestions? Feel free to comment or share the word!