TIPS and TRICKS
Fixing Internal Duplicate Content with a Non-www Site Redirect
SEO Expert: Jerry West
Updated: December 29, 2007
One thing we all know as SEOs and Webmasters: search engine bots are stupid. I mean, real stupid. How else can you explain why a search engine would not only index TWO versions of a web site (the www version and the non-www version) but also PENALIZE the site for duplicate content, when there isn't duplicate content - there is just ONE PAGE?!
How it Happens
Let's say someone links to CNN without the "www" like this: http://cnn.com/. When GoogleBot hits that link, it indexes the page without the "www". If the site's internal linking structure uses relative links instead of absolute links, well, the problem has just become site-wide instead of being limited to that one page, because every relative link the bot follows resolves against the non-www host it arrived on. The result is that Google now has two copies of the site in the index and views them as separate sites.
The "Penalty"
Google hates duplicate content because it takes up room on its servers, so it will "downgrade" pages that are essentially "duplicates" of others it has already indexed. "So what?" you say. "Google still has one version of my site that is okay."
Not so fast.
Even though Google looks at the two versions as different sites, they originate from the same domain, so Google discounts both versions and rankings suffer because of it. In fact, some have called this the "Slow Site Death", which is painful to watch if it is your own site.
Fortunately, there is an easy fix to this. First things first, let me show you the problem and why this problem is probably affecting you right now. Don't worry; this isn't one of those "ramble on forever" videos. The entire process is covered in less than three minutes. I did this example for StomperNet back in May 2007.
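The fix itself is a server-side 301 redirect from the non-www host to the www host. Here is a minimal sketch of what such a rule typically looks like in an Apache .htaccess file. It assumes mod_rewrite is enabled, and example.com is a placeholder domain of my choosing, not the domain from the original example.

```apache
# Begin non-www page protection
# Assumption: Apache with mod_rewrite enabled; example.com is a placeholder.
RewriteEngine On
# Match requests whose Host header is the bare (non-www) domain, ignoring case.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Send a permanent (301) redirect to the www host, keeping the requested path.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# End non-www page protection
```

With a rule like this in place, a request for http://example.com/page.html should get a 301 response pointing to http://www.example.com/page.html, so bots and visitors end up on a single version of each page.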
The Internal Duplicate Content Solution
# Begin non-www page protection #
Here are the steps you will need to take to edit the .htaccess file on your server. Of course, I am assuming you have an Apache server; if you are on a Windows server, scroll to the bottom of this page for the details you will need.
1) Find the .htaccess file on your server. It is in the root of your domain, which is the same folder as your home page. On most servers this is the "public_html" folder.
2) Save a copy of the file to your local machine.
3) Edit the file. I prefer to edit the .htaccess file right on the server; if you make a mistake, you have the backup from step 2. The reason I prefer this method is that a program such as Notepad often corrupts the .htaccess file, or Webmasters doing this for the first time save it with a "txt" extension and can't figure out why it doesn't work.
4) Of course, I will assume here that you are smarter than a search engine bot and know to swap out your domain for mine above. Otherwise you will just redirect all of your visitors to my site ... which, let's be honest, won't benefit them, as that's not where they wanted to go.
5) Save the changes and type in "yourdomain.com" without the "www". It should redirect to www.yourdomain.com.
If you have problems, here is a brief troubleshooting guide:
1) If you can't locate the .htaccess file, try placing a "-a" in the remote filter. To get there, just right-click in the window showing the files on your server and choose "Filter". Find the area designated as "Remote Filter" and add the "-a".
2) Call your web host and make sure the Apache Rewrite Module is turned on. Most web hosts, for whatever reason, have this turned off by default.
Wait, can't you just skip this and go into your Google Webmaster account and change it there to recognize only the "www" version of your domain? You mean this screen?
This screen is found under Dashboard > Tools > Set preferred domain.
Now, do you see where I underlined in red? Read that out loud. A lot of Webmasters think this fixes the problem, but it only applies to how the domain shows up visually in the SERPs. It has nothing to do with how your server serves your domain to a browser. Sadly, a presenter at the recent Webmaster World gave this as a tip on how to easily fix the non-www problem on a Webmaster's site. That's just bad advice.
Now you know how to fix it, and if you have just average experience as a webmaster, you should have this fixed in about 8 minutes.
Solution for Windows Servers