Posted on May 22, 2012 by

I have always been interested in analysing search engine robot activity to gain insight into crawler behaviour and to identify the characteristics of different crawlers on a site-by-site basis. Most of us have a rough idea of how crawlers behave in general terms, but we have little idea how they treat an individual site.

Crawler activity analysis is one of the most overlooked areas of search engine optimisation. Understanding robot activity is important because it enables us to:

  1. Estimate the impact of search engine robot activity on the workload and performance of websites
  2. Discover and compare the crawl patterns of different search engines
  3. Gain a better understanding of robot behaviour on a site by site basis
  4. Determine content popularity and crawl concentration

I love WordPress, which is why I thought it would be great to create a plugin that lets webmasters track and analyse robot activity. So without further ado, here is SEO Crawlytics: a free, no-strings-attached WordPress plugin that you can download and install on your site straight away!

The graphs in SEO Crawlytics help you gain a better understanding of the behaviour of search bots on your website. Below you will find a description of each graph and how to read the information it contains.

Hourly/Daily Visits

This is the main graph on the WordPress dashboard and also on the plugin dashboard. The X-axis shows the date and time of each visit, spanning from two weeks ago to the current time. The Y-axis shows the total number of visits by each robot during that hour; each robot has its own colour, as defined in the legend at the top right. Grouping visits by hour makes it easy to see which days were the most active, and which hours of those days were busiest.

Because this graph can become quite busy if your site receives a lot of crawler visits, you can select a region to zoom into and read more precise values. To look at just one day, click and drag from the 15th to the 16th, for example, and you'll see the visits per hour during that day, with the graph rescaled to fit the values. Below the main graph is a small timeline graph where you can select a new, wider area if you wish to zoom out.

The Daily Visits graph is similar to the Hourly Visits graph, but it groups all results by the day they occurred so that you can easily see which days are the busiest. Like the hourly graph, it covers a rolling two-week window.
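As a rough illustration of the bucketing behind the Hourly and Daily Visits graphs, the Python sketch below counts visits per robot per hour (or per day) inside a rolling two-week window. The post does not show the plugin's own code or storage format, so the `(bot_name, datetime)` log and the `bucket_visits` function are hypothetical, chosen only to make the idea concrete.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_visits(visits, now, granularity="hour", window=timedelta(days=14)):
    """Count visits per (bot, time bucket) inside a rolling window.

    visits: iterable of (bot_name, datetime) pairs (hypothetical log format).
    granularity: "hour" for an hourly view, "day" for a daily view.
    """
    start = now - window
    counts = Counter()
    for bot, ts in visits:
        if not start <= ts <= now:
            continue  # outside the rolling two-week window
        if granularity == "hour":
            bucket = ts.replace(minute=0, second=0, microsecond=0)
        elif granularity == "day":
            bucket = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError("granularity must be 'hour' or 'day'")
        counts[(bot, bucket)] += 1
    return counts


now = datetime(2012, 5, 22, 12, 0)
log = [
    ("Googlebot", datetime(2012, 5, 22, 9, 15)),
    ("Googlebot", datetime(2012, 5, 22, 9, 40)),
    ("bingbot", datetime(2012, 5, 22, 10, 5)),
    ("Googlebot", datetime(2012, 4, 1, 8, 0)),  # older than 14 days: ignored
]
hourly = bucket_visits(log, now, "hour")
daily = bucket_visits(log, now, "day")
print(hourly[("Googlebot", datetime(2012, 5, 22, 9, 0))])  # 2
print(daily[("Googlebot", datetime(2012, 5, 22))])  # 2
```

Each robot's line on the graph would then be the buckets carrying that robot's name, which is what lets each bot have its own colour.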

Top Crawled Pages/Categories

Each time a search bot accesses your site, SEO Crawlytics determines which post is being viewed and which categories that post belongs to. With this information, it graphs the posts and categories most commonly visited by search bots. These graphs have no historical limit: they include all data captured since you started using the plugin. Additionally, because some blogs have hundreds of posts, any post that accounts for less than 10% of the views is grouped under “Other”.
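The 10% "Other" rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the plugin's code: `top_pages` and its input (one page identifier per bot visit) are hypothetical, and the same function could equally be fed category names.

```python
from collections import Counter


def top_pages(page_views, threshold=0.10):
    """Tally views per page, folding pages below `threshold` share into "Other".

    page_views: iterable of page identifiers, one entry per bot visit
    (hypothetical format).
    """
    counts = Counter(page_views)
    total = sum(counts.values())
    grouped = Counter()
    for page, n in counts.items():
        if n / total >= threshold:
            grouped[page] += n  # 10% or more of all views: shown on its own
        else:
            grouped["Other"] += n  # too small to plot separately
    return grouped


views = ["/home/"] * 10 + ["/about/"] * 6 + ["/a/", "/b/", "/c/", "/d/"]
print(dict(top_pages(views)))
# {'/home/': 10, '/about/': 6, 'Other': 4}
```

Because the threshold is relative to all views recorded so far, a post can move in or out of "Other" as data accumulates.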

Peak Crawl Times

Whilst the Hourly Visits graph gives an hour-by-hour breakdown of bot activity, this graph shows which hours of the day are most active for search bots overall. It takes every bot visit to the site, groups it by the hour of the day in which it occurred, and counts the visits in each group. The result is 24 hourly values, each a cumulative total across every day the plugin has been active. Using this graph, you can easily see at which hours the bots are most likely to crawl your site and schedule your post updates around those times for optimal search engine presence.
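To make the difference from the Hourly Visits graph concrete, here is a minimal Python sketch that folds visits from every day into 24 hour-of-day counters. The function name `peak_crawl_hours` and the list-of-timestamps input are assumptions for illustration only.

```python
from datetime import datetime


def peak_crawl_hours(timestamps):
    """Return 24 counts: total bot visits in each hour of the day (0-23),
    accumulated over every day in the log."""
    hours = [0] * 24
    for ts in timestamps:
        hours[ts.hour] += 1  # the date is ignored, only the hour matters
    return hours


visits = [
    datetime(2012, 5, 20, 9, 15),
    datetime(2012, 5, 21, 9, 50),
    datetime(2012, 5, 22, 9, 5),
    datetime(2012, 5, 22, 23, 30),
]
hours = peak_crawl_hours(visits)
busiest = max(range(24), key=lambda h: hours[h])
print(busiest, hours[busiest])  # 9 3
```

Note that "hour of the day" depends on the time zone the timestamps are stored in, so converting them to the site's local time first keeps the peaks meaningful when scheduling posts.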

Crawl Spread

This pie chart shows each of the most active robots' share of all robot visits to your website, as a percentage.
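The percentages in a chart like this can be computed as in the Python sketch below. The cut-off of five robots (`top_n=5`) and the `crawl_spread` name are assumptions; the post does not say how many robots the chart shows.

```python
from collections import Counter


def crawl_spread(bot_names, top_n=5):
    """Percentage of all robot visits made by each of the `top_n` busiest bots."""
    counts = Counter(bot_names)
    total = sum(counts.values())
    if total == 0:
        return []  # no visits recorded yet
    return [(bot, round(100 * n / total, 1)) for bot, n in counts.most_common(top_n)]


visits = ["Googlebot"] * 6 + ["bingbot"] * 3 + ["YandexBot"]
print(crawl_spread(visits))
# [('Googlebot', 60.0), ('bingbot', 30.0), ('YandexBot', 10.0)]
```

Since the percentages are taken against all visits, they add up to less than 100 when `top_n` leaves some robots out; a chart would typically add a remaining slice for those.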

Robot Detection & Verification

In the configuration panel you can add new robots to the list. To ensure that your data is not diluted by spoofed user-agents, you can also enable reverse DNS verification. If you don't use reverse DNS, your data is very likely not an accurate reflection of actual robot visits.
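A common way to do reverse DNS verification is a forward-confirmed reverse lookup: resolve the visitor's IP to a hostname, check that the hostname belongs to the search engine's domain, then resolve that hostname forward and confirm it maps back to the same IP. The Python sketch below shows that technique; it is not necessarily how SEO Crawlytics implements it. The `TRUSTED_SUFFIXES` table and `verify_bot` are illustrative, and the resolver functions are parameters so the check can be tested without network access.

```python
import socket

# Example hostname suffixes; check each search engine's own documentation
# for the current list before relying on these.
TRUSTED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def verify_bot(ip, claimed_bot, reverse=socket.gethostbyaddr,
               forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a visitor claiming to be a bot.

    reverse/forward default to the real resolver calls but can be swapped
    for stubs in tests. gethostbyname_ex only returns IPv4 addresses.
    """
    suffixes = TRUSTED_SUFFIXES.get(claimed_bot)
    if suffixes is None:
        return False  # no verification rule for this robot
    try:
        host = reverse(ip)[0]  # step 1: IP -> hostname (PTR record)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    host = host.lower().rstrip(".")
    if not host.endswith(suffixes):
        return False  # hostname is outside the search engine's domain
    try:
        addresses = forward(host)[2]  # step 2: hostname -> IP addresses
    except OSError:
        return False
    return ip in addresses  # step 3: the hostname must resolve back to the IP
```

The forward step matters because whoever controls an IP range can set its reverse (PTR) record to any name, including one ending in `googlebot.com`; only the search engine's own DNS can make that name resolve back to the IP. DNS lookups are slow, so caching results per IP is sensible if this runs on every request.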

Email Notifications and Integration

In the configuration section you can enable email notifications. You can choose to be notified instantly as soon as a robot is detected, or to receive daily reports. Several variables can be used to customise notifications: {bot}, {mask}, {url}, {time}, {refer} and {ipaddress}.
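Template variables like these are typically filled in by a simple substitution pass. The Python sketch below shows one way to do it; `render_notification` and the `visit` dictionary are hypothetical, and the post does not explain what {mask} contains, so it is treated as an opaque value here.

```python
import re

# The six variables listed for notification templates.
_VARIABLE = re.compile(r"\{(bot|mask|url|time|refer|ipaddress)\}")


def render_notification(template, visit):
    """Substitute {bot}, {url}, ... in a single pass.

    visit: dict of values for one detected visit (hypothetical structure);
    missing variables become empty strings, other text is left untouched.
    """
    return _VARIABLE.sub(lambda m: str(visit.get(m.group(1), "")), template)


template = "{bot} crawled {url} at {time} from {ipaddress}"
visit = {
    "bot": "Googlebot",
    "url": "/hello-world/",
    "time": "2012-05-22 09:15",
    "ipaddress": "66.249.66.1",
}
print(render_notification(template, visit))
# Googlebot crawled /hello-world/ at 2012-05-22 09:15 from 66.249.66.1
```

A single regex pass means a value that happens to contain text such as `{url}` is not expanded again, and unrelated braces in the template are left alone rather than raising an error as `str.format` would.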

Finally, if parts of your website are static or use a different CMS, you can use the integration option to track the entire site. All you need to do is add the tracking snippet provided in the configuration section to the footer of those pages.

Requirements

  • WordPress 3.3.2+
  • PHP 5.2+
  • A curious mind

Go ahead and download SEO Crawlytics now. If you have any questions or would like to suggest features, give me a nudge on Twitter: I am @ysekand. This is a free plugin; I am not asking for anything in return, and I am not adding cheeky links to your footer or anything of that sort.

The code is released under the GNU General Public License v2.

Comments

  1. Christophe says:

    It looks like a great tool, guys. Do you have it as a module for Joomla?

  2. Stephen says:

    Hi,

    I have used your plugin for some time now. However, I find that when I update a post and then ping a search engine to crawl it, the crawl shows up in the stats but Google does not reflect the change in its index. Do you know why this is?

    I used a tool to check when the page was last visited, and the result does not match what SEO Crawlytics shows. For example, the tool shows that Googlebot last visited a post I recently updated about 15 days ago, but the SEO Crawlytics stats show that Google crawled the post today. I did ping Google to let it know the post had been updated today.

    What concerns me most is that even though Google's bot supposedly crawled my site today, the changes have not shown up in Google's index. Does Google wait before updating its index even after the bot has crawled my changed post? Thanks.