I don’t know how I missed Matt Cutts’ video on the Webmaster Central Blog about removing your content from Google. Anyway, it’s not too late to give a summary of the talk, especially since it is very useful.
There are several options for removing your content from Google, although not all of them are very good. Dan, at the Doublespark-SEO blog, recently discussed two options, the robots.txt file and the nofollow attribute. To learn more about both, read the posts there. What struck me about Matt Cutts’ talk is that I used to prefer the robots.txt method, but after listening to him it simply makes sense to use the two methods he favors.
- .htaccess – Use this to password-protect a directory. This method is really good for preventing a page or directory from being indexed, since Googlebot won’t attempt to guess the password and so won’t be able to crawl the page at all.
- URL Removal Tool – The URL removal tool in Google’s Webmaster Tools is the best option for removing pages that are already indexed. If you make a mistake (like removing your entire domain) you can simply revoke the request instead of having to email Google about it.
- robots.txt – An OK option, but sometimes blocked pages still get referenced in Google’s results. Make sure the robots.txt file is error-free by testing it in Webmaster Tools first.
- noindex – Good, but if Google sees a link to the page before it has crawled the page, it may list the URL anyway, since it hasn’t yet read the noindex meta tag and doesn’t know the page shouldn’t be indexed. Once crawled, the page will not be indexed at all. However, MSN and Yahoo still reference those pages.
- nofollow – Not a good option. Unless you can be sure that every link to the page carries a nofollow attribute, bots can still follow other links to the page and index it that way.
- not linking to the page at all – Very poor option. Even if you don’t link to the page, someone else might link to it. Though a remote possibility, it can happen. Aside from that, if someone visits the page and then clicks through to another site with referrers enabled, the URL can leak out through that site’s referrer logs, and a bot can find the page from there.
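For the .htaccess method, here is a minimal sketch of basic password protection on Apache. The paths, realm name, and directory are hypothetical; you would create the password file yourself with the `htpasswd` utility.

```
# .htaccess placed inside the directory you want to protect
# (assumes Apache with basic authentication enabled)
AuthType Basic
AuthName "Private area"
# Hypothetical path -- point this at your own htpasswd file,
# kept outside the web root
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Because every request now gets a 401 challenge, Googlebot never sees the page content, which is why Matt Cutts rates this method so highly.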
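For the robots.txt method, a minimal sketch looks like this; `/private/` is a made-up path standing in for whatever you want to block. The file must live at the root of your domain.

```
# robots.txt at the site root
User-agent: *
Disallow: /private/
```

Remember the caveat above: this only asks crawlers not to fetch those URLs; Google may still show a bare listing for them if other sites link there.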
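And for the noindex method, the tag goes in the `<head>` of each page you want kept out of the index:

```
<!-- In the <head> of the page to exclude -->
<meta name="robots" content="noindex">
```

Note that a bot has to actually crawl the page to see this tag, which is exactly why an as-yet-uncrawled URL can still show up as a bare link in the results.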