HubSpot offers an “All-in-one Marketing Software” service that is very popular with businesses both small and large.
HubSpot supplies custom, fully featured software for creating your website’s content, all hosted on its own servers. The idea is solid, and the software is of good quality and far better than its competitors.
The general consensus seems to be positive, but I recently noticed a small omission in its blogging platform that could lead to degraded rankings or a penalty from Google.
This article is about duplicate content, so let me be clear from the start: HubSpot is a great set of tools that, for the most part, works well. I just want to highlight how a missing tag or redirect can affect the performance of a client’s website.
That said, the system runs on a dynamically generated backend, so a problem found on one site is likely to appear on others. So, what is duplicate content?
Google (other search engines are available… I think) gives more weight to original content. As posts, articles, tutorials, and other content get distributed across the internet, Google tries to give the highest priority to the originating webpage and lower value to subsequent copies.
This sounds great: it helps prevent spam and gives credit to the original content creators. However, there are a few situations in which duplication can be handled incorrectly and seen in a bad light by Google.
Google refers to content duplication across multiple domains several times, for example:
However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic.
The word that should stand out in the quote above is “deliberately”. Because Google is essentially an algorithm with a nice interface, it sometimes needs a nudge to help it think like a human.
There are many situations in which a website can be configured to serve the same content from multiple locations. Our wonderfully large and logical brains have no trouble telling spam apart from a development server, a test webpage, or something similar.
Google, on the other hand, tends to need a few signals to point it in the right direction. The two main signals are redirects and canonical tags. A 301 (permanent) or 302 (temporary) redirect does what it says on the tin and sends the visitor to the correct page. The less commonly used signal is a canonical tag, placed in the head section of a webpage to identify the originating source.
So, what’s the problem?
Content created with HubSpot’s software is hosted on HubSpot’s servers and duplicated on the client’s website or blog, usually by setting a DNS “A record”. I stumbled across an example of this by Googling the following:
The results page looks a little odd.
Every number from web2 to web12 seems to return results, with the same content hosted at two locations, so let’s take a closer look by visiting one of the indexed URLs (this was a random choice):
A quick glance around the page reveals a footer giving the URL of the original site as www.dmhub.com. We can compare the two websites, as Google sees them, by searching for the following:
site:dmhub.com and... site:dmhub.web11.hubspot.com
Both return very similar results, about five pages each, so Google must think both have some value. I suspect Google sees both websites as relevant and noteworthy and is ranking them accordingly. Fairly straightforward.
We should widen our search to see whether this affects normal searches and, possibly, targeted keywords. We could try Googling a term that only the originating website uses, to see whether it is competing with the mirrored page:
The first result is our main website. Yay!
In 6th position, however, is the duplicate content on HubSpot’s website, so we could be competing with ourselves.
Next, we should narrow the results even further by searching for a unique sentence on the target page (including the quotes):
"defined by real strategy and actions with a measurable return"
This returns a single result, our main webpage, until we click the “repeat the search with the omitted results included” link. Now the HubSpot mirror ranks above the original version.
It looks like Google may be getting mixed signals and could use a hand… Or a denominator divisible by 0 (bad geek joke, sorry).
Neither page redirects to the original content, but both pages do have canonical tags claiming priority for themselves:
HubSpot copy: <link href="HUBSPOT DOMAIN" rel="canonical" />
Main website: <link href="MAIN DOMAIN" rel="canonical" />
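A quick way to confirm this kind of conflict is to pull the canonical URL out of each copy’s HTML and compare the two. The Python sketch below does that with the standard library’s html.parser. The function names are illustrative. The example.com and example.web11.hubspot.com addresses are hypothetical stand-ins for the two copies. Fetching the live pages is left out so that the sketch runs offline.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.canonicals.append(href.strip())


def canonical_url(html: str) -> Optional[str]:
    """Return the first canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


def canonicals_agree(original_html: str, mirror_html: str) -> bool:
    """True only when both copies declare the same canonical URL."""
    original = canonical_url(original_html)
    return original is not None and original == canonical_url(mirror_html)


# Hypothetical placeholder pages standing in for the two copies.
main_page = '<head><link href="https://www.example.com/about" rel="canonical" /></head>'
mirror_page = '<head><link href="https://example.web11.hubspot.com/about" rel="canonical" /></head>'

print(canonical_url(main_page))    # https://www.example.com/about
print(canonical_url(mirror_page))  # https://example.web11.hubspot.com/about
print(canonicals_agree(main_page, mirror_page))  # False: each copy claims priority
```

The two URLs could also differ only by a trailing slash or scheme. A real check would probably normalise them before comparing.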
I may be overthinking the significance of this, but my SEO spidey sense started tingling. It seems like a clear case of Google getting mixed signals from duplicate content.
Can we fix it?
Yes we can! Children’s TV references aside, the fix should be simple enough:
The best solution would be a 301 redirect match that forwards the entire *.webN.hubspot.com subdomain network to the corresponding pages on each client’s website.
The easiest way to give Google a pointer would be to change the canonical tags on HubSpot’s mirrored pages so that they point to the client websites.
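Both fixes would have to be made on HubSpot’s side, but the logic behind them is small. The Python sketch below makes two assumptions. First, it uses a hypothetical lookup table, MIRROR_TO_CLIENT, that maps each mirrored hostname to its client’s domain. Second, it assumes the client sites are served over HTTPS.

The sketch shows a redirect match that turns a request for any page on a *.webN.hubspot.com mirror into a 301 pointing at the same path on the client site. It also shows the canonical tag a mirrored page could carry instead. The table, function names and domains are illustrative, not HubSpot’s actual configuration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from mirrored HubSpot hostnames to client domains.
MIRROR_TO_CLIENT = {
    "example.web11.hubspot.com": "www.example.com",
}

# Redirect match for hosts such as "anything.web2.hubspot.com" or "anything.web12.hubspot.com".
MIRROR_HOST = re.compile(r"^[a-z0-9-]+\.web\d+\.hubspot\.com$")


def client_url(mirror_url: str) -> Optional[str]:
    """Return the client-site URL for a mirrored URL, or None if it is not a known mirror."""
    parts = urlsplit(mirror_url)
    host = (parts.hostname or "").lower()
    if not MIRROR_HOST.match(host) or host not in MIRROR_TO_CLIENT:
        return None
    # Keep path and query so each page maps to its own counterpart.
    # Assumption: the client site is served over HTTPS.
    return urlunsplit(("https", MIRROR_TO_CLIENT[host], parts.path, parts.query, ""))


def redirect_for(mirror_url: str) -> Optional[tuple[int, dict[str, str]]]:
    """Fix 1: a permanent (301) redirect to the matching client page."""
    target = client_url(mirror_url)
    if target is None:
        return None
    return 301, {"Location": target}


def canonical_tag_for(mirror_url: str) -> Optional[str]:
    """Fix 2: the canonical tag a mirrored page could carry instead."""
    target = client_url(mirror_url)
    if target is None:
        return None
    return f'<link href="{target}" rel="canonical" />'


print(redirect_for("http://example.web11.hubspot.com/blog/post?ref=1"))
# (301, {'Location': 'https://www.example.com/blog/post?ref=1'})
print(canonical_tag_for("http://example.web11.hubspot.com/about"))
# <link href="https://www.example.com/about" rel="canonical" />
```

Keeping the path and query string means each mirrored page forwards to its own counterpart rather than to the client’s homepage. That keeps the one-to-one relationship between copies that Google is trying to work out.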
Either way, it is a simple fix, and this post should be redundant within a week. If the issue persists after a week or so, it may be worth contacting HubSpot or your SEO geek to have it rectified.