Duplicate content (SEO foundation)

This post is part of the SEO foundation course series and covers duplicate content.

In this section, you’ll learn how to identify and mitigate this major problem by implementing technical solutions in the code of your website.

By the end of this post, you will be able to:

Define duplicate content
Explain why duplicate content can create problems
Identify common instances of duplicate content
List three potential solutions for duplicate content

What is duplicate content?

Well, first of all, it’s the most common problem you’re going to run into as an SEO.

Here’s the basic idea.

You have two pages or two pieces of content that are identical.

So, when a search engine comes to those two pieces of content, it has no idea which one came first.

In fact, it doesn’t have enough data to figure out which one is the original and should rank well in search results.

Therefore, it may not rank either version, or it may arbitrarily pick one, which you don’t want as an SEO because you have no influence over that choice.

What does duplicate content look like?

There are a couple of examples of this.

One of the more common is when a blog post is published and then another blog, or a scammer blog, copies the same content and posts it on its own website.

So, now you have two articles that are exactly the same, word for word, and the search engines have to figure out who posted it first and who owns that content.

For a lot of reasons, that can be very difficult.

Another example is duplicate content on your own website, created by the site’s programming or by tracking parameters.

This happens when you publish a blog post or a new product page, for example, and then link to it from an email and also from social media, with each link carrying its own tracking parameters.

The URLs might be slightly different, but from the page’s perspective and the search engine’s perspective, the content is completely identical.

Again, you run into the problem where the search engine doesn’t know which is the most important or the original page.

So, the search engine ranks either one version or neither.

These are both big problems and extremely common.
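
To make the tracking-parameter case concrete, here is a minimal Python sketch. It assumes the email and social links differ only in `utm_*` query parameters; the parameter list, the `strip_tracking` helper, and the example.com URLs are illustrative assumptions, not something every site uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of tracking parameters; adjust to whatever your campaigns use.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}

def strip_tracking(url: str) -> str:
    """Return the URL with the assumed tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Rebuild the URL without the fragment and without the dropped parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Hypothetical links to the same page, one from an email and one from social media.
email_link = "https://example.com/blog/new-product?utm_source=newsletter&utm_medium=email"
social_link = "https://example.com/blog/new-product?utm_source=twitter&utm_medium=social"

# Both links reduce to the same underlying page URL.
print(strip_tracking(email_link) == strip_tracking(social_link))
```

The two URLs look different to a crawler, yet they reduce to the same address once the tracking parameters are set aside, which is exactly why the content behind them is identical.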

Now, many times a CMS (content management system) can produce duplicate content.

You see, when you create a page, it’s added to a database and then it’s published through instructions that pull from that database.

These instructions can cause the same page to be published in different areas of the website.

This is a problem when the same content is accessible at two or more different URLs.

Here’s why. Firstly, the search engine does not know which page is the primary page.

It chooses which to rank and which to ignore.

Secondly, if both pages are being linked to from outside sources, they now split the link benefit between them.

Ideally, you want all of the links going to a single page so that the content gains relevancy.
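
One rough way to spot the same content living at several URLs is to fingerprint each page's text and group the URLs that share a fingerprint. The sketch below assumes you already have the body text of each page in memory; the `pages` dictionary, its URLs, and the helper names are hypothetical, and an exact hash only catches copies that match after whitespace and case are normalized.

```python
import hashlib
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages: dict) -> list:
    """Return groups of URLs whose body text has the same fingerprint."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical pages: URL -> extracted body text.
pages = {
    "https://example.com/blog/new-product": "Meet our new product.",
    "https://example.com/author/jane/new-product": "Meet our  new product.",
    "https://example.com/about": "About our company.",
}

duplicate_groups = find_duplicates(pages)
print(duplicate_groups)
```

Each group in the result is a set of URLs competing for the same content, so each group is a candidate for one of the fixes discussed later in this post.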

Common instances

Now that we have some understanding of what duplicate content is, let’s look at where it surfaces.

Common instances: WordPress

One of the common examples of this is with WordPress.

WordPress is a content management system that is used for a large number of websites on the internet.

The problem is that the default settings in WordPress create a lot of duplicate content.

Say you write a blog post that advertises your new product or service, using the default settings of WordPress.

You’ll see that post on the tag page, on archive pages, on author pages, and on the home page, in addition to the blog post page itself.

Therefore, it’s going to exist in several different places.

So, when a search engine comes to it, it’s going to have no idea which is the most important version. So again, it may just arbitrarily pick.

So the page that shows up in search results can sometimes be the author section or the category section of your own website, rather than the post itself.

This all can get very confusing very fast.

This also happens with sorting and filtering pages, like those on an e-commerce site.

You have a list of cars, and a visitor filters it to show only the cars that are red. What they get is a subset of the master list.

Now, it has only the cars that are red, but all of the information describing the cars on that page is identical to the page before, just with less data.

So, from a search engine optimization perspective, you have all the same information again, expressed in a slightly different way and accessible from a different page.

Now, functionally it’s correct and it shows the users the information they need but it can work against you with search engines and how they determine and deal with duplicated content.
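
The filtered-list case can be sketched in a few lines of Python. The catalogue, the car names, and the `/cars` paths in the comments are made up for illustration; the point is only that everything on the filtered page already appears on the master page.

```python
# Hypothetical catalogue; names and descriptions are made up for illustration.
cars = [
    {"name": "Roadster", "color": "red", "description": "Two-seat convertible."},
    {"name": "Wagon", "color": "blue", "description": "Family estate car."},
    {"name": "Coupe", "color": "red", "description": "Compact two-door."},
]

def page_text(items):
    """Return the set of description strings a listing page would render."""
    return {item["description"] for item in items}

master_page = page_text(cars)                                   # e.g. /cars
red_page = page_text([c for c in cars if c["color"] == "red"])  # e.g. /cars?color=red

# Every description on the filtered page already appears on the master page.
print(red_page.issubset(master_page))
print(len(red_page), "of", len(master_page), "descriptions repeated")
```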

Fixing duplicate content

Now that we’ve identified how difficult and common a problem duplicate content is, let’s talk about some of the solutions.

Now, I need to warn you.

Some of these are quite technical.

Once you identify the problem, you’re going to have to work with an IT team or a skilled programmer to fix it.

Now, the first method is to remove and block.

So, let’s take the WordPress example.

Say you’ve just launched a blog post you’re excited about, and as you move along you realize it exists in several different places.

It’s also on your author page, category page, archive page, and probably more.

So, what you’re going to want to do is remove it from those other pages or block those pages entirely from being spidered (crawled) by the search engine.

This can be accomplished in a few ways.

Firstly, depending upon the content management system, you can talk to your IT team or provider about this issue.

Secondly, you can block access to the author pages or other duplicated content.

If you’re using WordPress, you can also get plugins to help manage them.

Blocking access is usually done through the robots.txt file.

You can add the directories that contain duplicated content to the Disallow instructions.
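
As a rough illustration of what those Disallow instructions look like, the Python sketch below holds a small robots.txt in a string and checks it with the standard library's `urllib.robotparser`. The `/author/` and `/tag/` directories, the example.com host, and the `is_crawlable` helper are assumptions for the example; use the paths your own CMS actually generates.

```python
from urllib.robotparser import RobotFileParser

# Assumed directory names; replace them with the paths your own CMS generates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_crawlable(path: str) -> bool:
    """Check whether a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "https://example.com" + path)

print(is_crawlable("/blog/new-product"))  # the primary post stays open to crawlers
print(is_crawlable("/author/jane/"))      # the duplicate listing is blocked
```

Checking the rules this way before publishing them is a cheap safeguard against accidentally blocking the page you actually want to rank.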

Now, the next fix is a code instruction called rel="canonical".

This is an instruction that the search engines created to help remedy this exact problem.

If you have duplicate content, you can use this instruction to tell the search engine that the page it is reading is not the original, and to give it a link to the primary, original page.
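
In practice the instruction is a `<link>` element placed in the `<head>` of the duplicate page. The Python sketch below builds that element and then reads it back the way an audit script might; the `canonical_tag` helper, the `CanonicalFinder` class, and the example.com URL are hypothetical names used only for illustration.

```python
from html import escape
from html.parser import HTMLParser

def canonical_tag(primary_url: str) -> str:
    """Build the <link> element that goes in the <head> of a duplicate page."""
    return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'

class CanonicalFinder(HTMLParser):
    """Collect the href of the first rel="canonical" link in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")

# Hypothetical duplicate page that points back to the primary blog post.
tag = canonical_tag("https://example.com/blog/new-product")
duplicate_html = f"<html><head>{tag}</head><body>Post text</body></html>"

finder = CanonicalFinder()
finder.feed(duplicate_html)
print(tag)
print(finder.canonical)
```

The same finder can be pointed at any fetched page to confirm that each duplicate really does name the primary URL.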

The rel="canonical" approach is a better solution than the next option, which is called noindex, follow.

Noindex, follow is a code snippet that, again, your IT team or a programmer would have to implement.

The general idea is that you add this code to say that this page is not important.

So: don’t index the page, but do follow any links on it, as you still want those to count.

Again, this is a code addition.

So, don’t just do it if you’re not confident.
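
For reference, noindex, follow is usually written as a robots meta tag in the page's `<head>`. The Python sketch below shows the tag and a small checker that reads the directives back out of a page; the `RobotsMetaFinder` class and the sample page are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# The tag itself, as it would appear in the <head> of the duplicate page.
NOINDEX_FOLLOW = '<meta name="robots" content="noindex, follow">'

class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

# Hypothetical archive page that should stay out of the index.
page = f"<html><head>{NOINDEX_FOLLOW}</head><body>Archive listing</body></html>"

finder = RobotsMetaFinder()
finder.feed(page)
print(sorted(finder.directives))
```

Keep in mind that a crawler has to be able to fetch the page to see this tag, so a page carrying it should not also be blocked in robots.txt.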

The general idea here is to let you know that there are solutions to this issue, but because the problems are created by technical programming, they require just as much technical programming expertise to remedy.

Key takeaways

Duplicate content is identical content at different URLs.
Search engines find it difficult to distinguish the correct version that must rank from the duplicate ones.
It can be an article or blog post that is available on many different URLs (homepage, author page, category page).
Solutions for duplicate content:

Remove and block
rel="canonical"
noindex, follow
