SEO - Dealing With Duplicate Content

By: Daniel Imbellino
Updated: Nov 30, 2015

Duplicate content comes in many forms on the web, and can be caused by both intentional and unintentional means. Either way, duplicate content in any form can have drastic consequences for a given website. Here we're going to highlight some of the causes of duplicate content, as well as their solutions.

For starters, some not so savvy people on the web will scrape content and place it on their own site as if it were their own. Regardless of what type of website you are creating, or what type of content it contains, don't copy content from other places on the internet and simply post it in your own articles. Google will catch this fairly quickly, and they will drop your rank in SERPs (Search Engine Result Pages) as a result. In some cases Google may ban you from their index entirely. Create content that is original to your site's theme and you can avoid this problem altogether.

Another issue that falls under duplicate content, while not being deceptive in nature, is duplicate URLs that point to essentially the same page. Say you have one webpage named "pctechauthority.com/web-design/web-design.html", and multiple sites are providing links to this same page using different URLs. What problems could this cause?

Now let's say that one site links to your page using "http://pctechauthority.com/web-design/web-design.html", while another site links to that same page using "http://www.pctechauthority.com/web-design/web-design.html". Both are the same page, but they have 2 different URLs, one with "www" and the other without. Search engines generally see these as two completely different pages and may treat them as duplicate content, which can affect your rank by dropping your site further down the list in SERPs.

Here's an example of a home page with multiple URLs, which could lead to them being identified as duplicate content by search engines:

<a href="http://your-site.com">Your site as non "WWW"</a>
<a href="http://www.your-site.com">Your site as WWW</a>

You see, these are 2 completely different URLs, and matters get worse when you take into account the home page URL ending with "/index.html" like so:

<a href="http://your-site.com/index.html">Ending in .HTML</a>

All these different URLs pointing to the same place confuse search engines, as they aren't sure which one to rank the highest, so they may end up dividing your page rank between all the URLs, effectively giving you a lower rank in SERPs as a result.
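To make the problem concrete, here is a minimal Python sketch that collapses the three home page variants above into one preferred URL. The function name normalize and the choice of the www form as the preferred host are assumptions made for illustration; search engines do not necessarily apply these exact rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the www form is the preferred host.
PREFERRED_HOST = "www.your-site.com"


def normalize(url: str) -> str:
    """Map the www, non-www, and /index.html variants to one preferred URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www subdomain as the same site.
    if host in ("your-site.com", "www.your-site.com"):
        host = PREFERRED_HOST
    # An empty path and "/" both name the home page.
    path = parts.path or "/"
    # A trailing index.html names the same document as its directory.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://your-site.com",
    "http://www.your-site.com",
    "http://your-site.com/index.html",
]

# All three variants collapse to a single entry in the set.
unique = {normalize(u) for u in variants}
print(unique)
```

Three different strings go in and one URL comes out, which is the outcome you want search engines to arrive at as well.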

How do I fix the issue with duplicate URLs?

In some cases you can simply redirect all the separate versions shown above to the preferred version of the page using 301 redirects on your server. GoDaddy users can do this by launching their web hosting console and selecting URL redirects from the list of options.
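The decision a server makes when applying such a redirect can be sketched in a few lines of Python. This is only an illustration of the logic, not the configuration of any particular web server or hosting console; the function name redirect_for and the preferred host are hypothetical.

```python
from typing import Optional, Tuple

# Assumption for this sketch: the www form is the preferred host.
PREFERRED_HOST = "www.your-site.com"


def redirect_for(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target URL) if the request is not the preferred form, else None."""
    target_path = path
    # Fold /index.html into its directory URL.
    if target_path.endswith("/index.html"):
        target_path = target_path[: -len("index.html")]
    # Already on the preferred host and path: serve the page normally.
    if host.lower() == PREFERRED_HOST and target_path == path:
        return None
    # Any other host or path variant is sent permanently to the preferred URL.
    return 301, f"http://{PREFERRED_HOST}{target_path}"


print(redirect_for("your-site.com", "/"))
print(redirect_for("www.your-site.com", "/index.html"))
print(redirect_for("www.your-site.com", "/"))
```

The 301 status tells browsers and search engines that the move is permanent, which is why it is the redirect type used for this purpose.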

If you are unable to implement 301 redirects, you can place a canonical link declaration such as <link rel="canonical" href="http://www.example.com" /> in the head section of your webpages. To put it simply, a canonical URL is the preferred URL for a given page or set of pages. If we set the canonical link declaration of our web design home page as <link rel="canonical" href="http://www.pctechauthority.com/web-design/web-design.html" />, then whenever another website links to this page, with or without the "www" included, search engines will resolve both URLs to the version you prefer (here, the www version).
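If your pages are generated from templates, the canonical declaration can be produced by a small helper. The following is a minimal sketch; the function name canonical_link_tag is hypothetical, and it assumes you already know the preferred URL for the page.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Build a canonical link declaration for a page's head section."""
    # Escape &, <, > and quotes so the URL is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


print(canonical_link_tag("http://www.example.com/"))
```

Every URL variant of the page should emit the same tag, since the point is that they all name one preferred address.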

Also, Google gives its Webmaster Tools users a way to specify a preferred URL for their sites in search, either www or non-www. To set a preferred URL in Google Webmaster Tools, add both URLs to your webmaster account (the www and the non-www versions); each must be added separately. Then pick the URL you prefer from your dashboard. Now click Configuration, then Settings, and under Preferred domain select the version you wish to use. If you set your preferred version to "example.com", then all your pages will be set to resolve without the www, and vice versa.

Another issue that can cause duplicate content unintentionally is when an E-commerce website, for instance, displays multiple pages of the same content in different variations. Let's say a sportswear website decides to make several pages for one type of Nike shoes, displaying each page with the same shoe but in a different color or slightly different style. Search engines like Google, by default, will most likely notice the duplicate content and choose one page to be included in SERPs. The problem is, this may not be the page you intended to show up in SERPs. By declaring a preferred page to search engines with a canonical link, you can have the correct page appear in search results. You do this by placing a link rel="canonical" declaration in the head section of each document that you want to resolve to the preferred page. The link rel="next" and link rel="prev" declarations serve a related purpose for pages that form a series.

For example:

If you had 5 pages of shoes and you wanted only the first page to appear in search results, you could set a <link rel="canonical" href="http://example.com/page-one.html" /> declaration in the head section of those 5 pages, so that only the first page appears in search results. If you have a multi-page article and you only want the first page to appear in search results, but you want search engines to know these pages are directly related to each other, you can use the rel="prev" and rel="next" links in your documents' head sections to specify this. On the second page of a three-part article, for example, place links in the head section like so:
<link rel="prev" href="part-one.html" /> <link rel="next" href="part-three.html" />

So this would be considered the 2nd page, since the links point to the 1st and the 3rd pages of your article. Multi-page articles and duplicate E-commerce pages fall into the category of what we call "paginated content." Here's a full article explaining exactly what paginated content is and how to deal with it: SEO - Working With Paginated Content.
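The same pattern extends to every page in a series: the first page has only a "next" link, the last page has only a "prev" link, and the pages in between have both. Here is a minimal Python sketch of that rule; the function name pagination_links and the file name part-two.html are assumptions made for illustration.

```python
from typing import List


def pagination_links(pages: List[str], index: int) -> List[str]:
    """Build the rel="prev" / rel="next" tags for pages[index] in an ordered series."""
    tags = []
    # Every page except the first points back to its predecessor.
    if index > 0:
        tags.append(f'<link rel="prev" href="{pages[index - 1]}" />')
    # Every page except the last points forward to its successor.
    if index < len(pages) - 1:
        tags.append(f'<link rel="next" href="{pages[index + 1]}" />')
    return tags


# part-two.html is an assumed name for the middle page of the article.
series = ["part-one.html", "part-two.html", "part-three.html"]

for tag in pagination_links(series, 1):
    print(tag)
```

Calling the function with index 1 gives the two tags for the 2nd page described above.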

Remember, the canonical link is the "preferred" version of a URL for a given page. If you can't implement redirects, then definitely make use of canonical links so that search engines like Google can properly rank your site without any added hiccups.

There seems to be a big argument as to whether or not multiple URLs are actually seen as duplicate content by search engines to begin with. The answer: they won't count against you in terms of playing by the rules. However, having separate URLs pointing to the same page can hurt your page rank, since the search engines see both URLs as being completely separate webpages. Google defines page rank as being a direct measure of the quality and relevance of links pointing to your website. Basically, the more quality and relevant sites linking to yours, the higher your page rank with search engines, or at least with Google in this case. If you have 2 URLs pointing to the same webpage, and each URL has 500 quality links, then each URL is credited with only 500 links instead of the page being credited with all 1,000, and you are losing rank in search results! If you combine both of those URLs with the canonical declaration, then both sets of links are counted together, effectively raising your rank in organic search results.
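The arithmetic behind that example can be sketched in Python. The URLs and link counts below are hypothetical, and real ranking involves far more than a simple link tally; the sketch only shows how a canonical mapping combines two separate counts into one.

```python
# Hypothetical inbound link counts for two URLs that serve the same page.
inbound_links = {
    "http://example.com/page.html": 500,
    "http://www.example.com/page.html": 500,
}

# Assumed canonical mapping: both variants resolve to the www version.
canonical_of = {
    "http://example.com/page.html": "http://www.example.com/page.html",
    "http://www.example.com/page.html": "http://www.example.com/page.html",
}

# Add each variant's links to the total for its canonical URL.
consolidated = {}
for url, count in inbound_links.items():
    target = canonical_of[url]
    consolidated[target] = consolidated.get(target, 0) + count

print(consolidated)
```

Without the mapping, the page competes as two entries of 500 links each; with it, a single entry carries all 1,000.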

You can use absolute or relative links when specifying your canonical links. An absolute link is a link that specifies a full URL, such as "http://www.example.com/example-file.html" or "http://example.com/example-file.html". A relative link is a link from one page of a website to another, or to another document or resource located in the same directory, like this: <a href="your-directory/example.html" target="_self">Example relative link</a>. Most web designers tend to use relative links when building a website. Using relative links, you can see how your webpages function while working offline, like say on your desktop PC. Absolute links, on the other hand, point at the live server and cannot be followed offline the way relative links can, making them harder to work with. You can use either really. All major search engines automatically convert relative links to absolute links when presenting internet users with SERPs (Search Engine Result Pages).
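The conversion from a relative link to an absolute one follows standard URL resolution rules, which Python exposes through urllib.parse.urljoin. In this short sketch the page URL is an assumed location for the document containing the link.

```python
from urllib.parse import urljoin

# Assumed address of the page that contains the relative link.
page_url = "http://www.example.com/web-design/web-design.html"
relative_href = "your-directory/example.html"

# The relative path is resolved against the directory of the containing page.
absolute = urljoin(page_url, relative_href)
print(absolute)
```

The result depends on where the containing page lives, which is why the same relative link can resolve to different absolute URLs on different pages.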

Other Forms of Duplicate Content:

In some cases people will use the same images across their site, and this should be expected for a site banner or logo, but some get carried away with using the same graphics on too many pages. It's great to include graphics in your content, but you should use each graphic only once, except for maybe a banner. If the graphics you implement are original to your site specifically, meaning they aren't used elsewhere, then they count as original content, a big plus with all major search engines.