How to Recognize and Fix Duplicate Content on Your Website

Google defines duplicate content as blocks of content, within a single domain or across domains, that match other content exactly or are very similar. Matt Cutts, who led Google’s webspam team, has said you shouldn’t stress over duplicate content unless it is spammy. Even so, it can hurt your organic search ranking and dilute your link popularity.

It is important to eliminate duplicate content from your site because when people link to different versions of the same content, search engines can’t tell which version to show. Google also says that if it believes duplicate content is being used to manipulate rankings, it will adjust the ranking and indexing of your site. In this article, you will discover how to identify the different sources of duplicate content and fix them.

1. Understand the Common Sources

Duplicate content usually has a technical cause. Once you understand the most common sources, you will be able to locate existing duplicates and prevent future ones. The usual sources are:

The use of session IDs in URLs
Printer-only versions of web pages
Store items that are displayed or linked through several distinct URLs
Generation of normal or stripped-down web pages for mobile browsers
Content scrapers and syndicators
Inconsistencies in the display of URL parameters by content management systems
Mixing URLs that have the “www.” prefix with URLs that don’t
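
Several of these sources (session IDs, parameter order, and the “www.” prefix) can be spotted by reducing every URL to one normalized form and seeing which ones collide. Here is a minimal Python sketch of that idea. The session parameter names and the preferred host are assumptions for illustration; swap in whatever your own platform uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session-ID parameter names; adjust to match your platform.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def normalize_url(url, preferred_host="www.yoursite.com"):
    """Reduce a URL to one form so that duplicates compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Treat the bare domain and the "www." variant as the same site.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # Drop session-ID parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    return urlunsplit((parts.scheme, host, parts.path, urlencode(query), ""))


def group_duplicates(urls):
    """Map each normalized URL to the original URLs that collapse into it."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    # Keep only the groups that actually contain more than one URL.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Run group_duplicates over a list of URLs from your server logs or sitemap. Every group it returns is a set of addresses that probably serve the same page.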

2. Use Consistent Internal Linking

Structure all your internal links the same way. For instance, if you want to link to a page named index.htm in the page directory of your site, you have several options. Any of the following will work, but choose one and stick to it:

http://www.yoursite.com/page/
http://www.yoursite.com/page
http://www.yoursite.com/page/index.htm
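
If you generate links from templates or scripts, a small helper can enforce whichever form you pick. The sketch below assumes you chose the trailing-slash form and that index.htm and index.html are your server’s default documents. Both are assumptions, not requirements.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names for directory URLs.
INDEX_FILES = ("index.htm", "index.html")


def canonical_internal_link(url):
    """Rewrite /page and /page/index.htm to the single form /page/."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")

    if last in INDEX_FILES:
        # /page/index.htm -> /page/
        path = head + "/"
    elif last and "." not in last:
        # /page -> /page/ (segments with a dot are treated as files and left alone)
        path = path + "/"

    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

All three example URLs above come out of this function as http://www.yoursite.com/page/.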

3. Syndicate Content With Care

If you need to syndicate your content on several sites, most search engines will display the version they presume is appropriate for searchers. But this may not be the version you want.

Therefore, make sure every page where the content appears includes a link back to the original article. You should also ask the sites that republish your content to add the noindex robots meta tag to their copies, so that search engines do not index them.
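
You can check whether a partner site is following both rules by inspecting the HTML of its copy. The following is a rough sketch using only Python’s standard library. It assumes you have already downloaded the page’s HTML as a string, and it only looks for an exact match on the original URL.

```python
from html.parser import HTMLParser


class SyndicationCheck(HTMLParser):
    """Looks for a noindex robots meta tag and a link to the original article."""

    def __init__(self, original_url):
        super().__init__()
        self.original_url = original_url
        self.has_noindex = False
        self.links_to_original = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # The content attribute may list several directives, e.g. "noindex, follow".
            directives = (attrs.get("content") or "").lower()
            if "noindex" in directives:
                self.has_noindex = True
        elif tag == "a" and attrs.get("href") == self.original_url:
            self.links_to_original = True


def check_syndicated_page(html, original_url):
    """Return (has_noindex, links_to_original) for one syndicated copy."""
    checker = SyndicationCheck(original_url)
    checker.feed(html)
    return checker.has_noindex, checker.links_to_original
```

A result of (True, True) means the copy is both hidden from indexing and credited to your original.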

4. Use Webmaster Tools from Google

Use Google’s Webmaster Tools to identify duplicate descriptions or page titles. Depending on which version of the tool you see, either look under Diagnostics and click HTML Suggestions, or click HTML Improvements under the Search Appearance tab.

5. Search for Page Titles or Snippets

You can also search for duplicate content manually with Google Search. For example, to bring up all the URLs on your website that carry your article on Keyword Y, type this search phrase into the search box:

site:yoursite.com intitle:"Keyword Y"

The results will show all the pages on yoursite.com that have that keyword in their title. Note that there must be no space after the colon in site: or intitle:. To make it easier to locate and remove all duplicate instances of a particular article, make the intitle portion as specific as possible.

Besides locating duplicated content within your website, you can use this method to find other sites that are using your content. For instance, if you wrote an article titled “Keyword Y – All You Need to Know”, you can type this search phrase to locate all pages that have the same title:

intitle:"Keyword Y – all you need to know"
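
If you have many article titles to check, you can build these search phrases in a script instead of typing them. This is only a convenience sketch: the function names are made up for illustration, and it builds the query text and a search link without fetching any results.

```python
from urllib.parse import quote_plus


def title_query(title, site=None):
    """Build an intitle: query, optionally limited to one site."""
    # No space after the colon, and the title is wrapped in straight quotes.
    query = f'intitle:"{title}"'
    if site:
        query = f"site:{site} {query}"
    return query


def search_link(query):
    """Turn a query into a link you can open in your browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

Calling title_query("Keyword Y", site="yoursite.com") gives the first search phrase shown above. Leave out the site argument to search the whole web.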

6. Redirect Duplicate Content

Use 301 (permanent) redirects to send both users and search engine spiders from pages with duplicate content to the version you want to keep. If you use the Apache web server, you can do this in an .htaccess file. On Internet Information Services (IIS), you can set it up through the administrative console.
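
To show what a 301 redirect looks like on the wire, here is a small Python WSGI sketch. It is not Apache or IIS configuration, just an illustration of the same behavior. The paths in the redirect table are assumed examples, and any path not in the table is served as a placeholder page.

```python
# Assumed mapping from duplicate paths to the one page you want to keep.
REDIRECTS = {
    "/page": "/page/",
    "/page/index.htm": "/page/",
}


def app(environ, start_response):
    """WSGI app that answers duplicate paths with a permanent redirect."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)

    if target is not None:
        # 301 tells browsers and spiders that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]

    # Placeholder response for any path that is not a known duplicate.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page\n"]
```

To try it locally, you could serve the app with wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library, then request /page and watch the browser land on /page/.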

7. Use a Canonical Link Element

If you don’t want to delete a duplicated version of a page or article, you can add the rel="canonical" link element. Place it within the <head> section of the web page. For example:

<link rel="canonical" href="http://www.yoursite.com/page/" />

The href portion of the link should contain the canonical URL for the article, that is, the one version you want indexed. When Google sees this link, it will transfer the link benefits gathered by this page to the canonical page.

This takes effect more slowly than a 301 redirect, so use it only when a 301 redirect is not feasible.
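
Once the element is in place, it is worth confirming that each duplicate really points at the page you intended. A canonical link is a link tag whose rel attribute is canonical and whose href holds the preferred URL. The sketch below pulls that href out of a page’s HTML using Python’s standard library. It assumes the HTML is already available as a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first rel="canonical" link element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attrs.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

If find_canonical returns None for a duplicate page, or returns a different URL than you expected, the element is missing or misconfigured.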