Posted on October 31, 2012

Hurricane Sandy crashed Google’s party, forcing it to shut down its big Android event, but it looks like Sandy has caused more damage than that: parts of Google’s indexing mechanism appear to have been affected. This is an assumption, but I do have some circumstantial evidence to support the claim.

Basically, in the last 24 hours I have noticed that new pages are being crawled by Google but are nowhere to be seen in the index as quickly as they should be. So I assume that while the crawlers are doing their job, the index refresh is affected, and the only thing that comes to my mind is Sandy. Bear in mind that Google Search is an immensely large distributed system: its infrastructure spans all continents, and a team of people called Site Reliability Engineers (SREs) focuses on keeping Google Search and other services running – more on this here. However, with all the planning and technology that drives data centers, they are still vulnerable to natural disasters to some degree.

Let’s take a quick look at a typical information retrieval system architecture: you have a crawler, a central repository, an indexer, and then a ranking mechanism. In a distributed system any of these could be affected, and because they are separate entities to some extent, one can have an outage without taking down the overall system. So in this case I think the indexing stage is probably the one affected.
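To make that decoupling concrete, here is a toy Python sketch of the pipeline. All names are illustrative and this bears no resemblance to Google’s actual implementation; the point is simply that the crawler can keep writing pages into the repository while the index refresh lags behind.

```python
from collections import defaultdict

class Repository:
    """Central document store sitting between the crawler and the indexer."""
    def __init__(self):
        self.docs = {}  # url -> page text

class Indexer:
    """Builds an inverted index from whatever is in the repository."""
    def __init__(self):
        self.inverted = defaultdict(set)  # term -> set of urls

    def refresh(self, repo):
        # If this stage stalls, crawled pages sit in the repository
        # without ever becoming searchable -- the failure mode this
        # post speculates about.
        for url, text in repo.docs.items():
            for term in text.lower().split():
                self.inverted[term].add(url)

def crawl(repo, url, text):
    # The crawler writes to the repository; the index may lag behind.
    repo.docs[url] = text

def search(indexer, term):
    # Ranking is stubbed out; a real system would order these results.
    return sorted(indexer.inverted.get(term.lower(), set()))

repo, idx = Repository(), Indexer()
crawl(repo, "example.com/seo-nightmares", "13 seo nightmares post")
print(search(idx, "seo"))   # [] -- crawled, but the index has not refreshed
idx.refresh(repo)
print(search(idx, "seo"))   # ['example.com/seo-nightmares']
```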

Now to show some “evidence”. My colleague Dom Calisto posted a blog post titled “13 SEO nightmares that will keep you up at night!” on Oct 29, 2012 @ 11:16; see the screenshot below:

[Screenshot: publish timestamp showing Oct 29, 2012 @ 11:16]

A while ago I created SEO Crawlytics, a WordPress plugin that tracks search engine robot visits very accurately. Using the plugin we can see the exact timestamp of each robot visit, and Dom’s post was crawled within 15 minutes.
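SEO Crawlytics itself is a WordPress (PHP) plugin, but the core idea fits in a few lines. Here is an illustrative Python sketch of the same technique, not the plugin’s actual code: match the User-Agent of each request against known crawler signatures and record a timestamp on a hit.

```python
import re
from datetime import datetime, timezone

# User-agent patterns for a couple of well-known crawlers.
BOT_PATTERNS = {
    "Googlebot": re.compile(r"Googlebot", re.I),
    "Bingbot": re.compile(r"bingbot", re.I),
}

visits = []  # (utc timestamp, bot name, url); a real plugin writes to a DB table

def log_if_bot(user_agent, url):
    """Record a timestamped visit if the request comes from a known bot."""
    for bot, pattern in BOT_PATTERNS.items():
        if pattern.search(user_agent or ""):
            visits.append((datetime.now(timezone.utc), bot, url))
            return bot
    return None

log_if_bot(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "/13-seo-nightmares/",
)
print(visits)
```

User-agent strings can be spoofed, so a production tracker would typically also verify that a claimed Googlebot visit really comes from Google, for example via a reverse DNS lookup.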

Historically, new pages on our site get crawled within 15 minutes at most and are then usually indexed within 40 minutes. This is something we monitor constantly, so we know the benchmarks very accurately. But this time around, it took Google around 48 hours to push the page into its index.
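Measuring that benchmark is simple in principle: take the crawl timestamp recorded by the plugin, then poll until the page shows up in the index and compute the gap. A rough sketch of that measurement loop, with the index check passed in as a hypothetical callable (in practice we poll something like a site: query for the URL):

```python
import time
from datetime import datetime, timezone

def crawl_to_index_latency(url, crawl_time, is_indexed,
                           poll_every=300, give_up_after=3 * 86400):
    """Poll `is_indexed(url)` until it returns True; return the crawl-to-index
    latency in seconds, or None if the page never appears within the window.

    `crawl_time` should be a timezone-aware datetime taken from the crawl log;
    `is_indexed` is a stand-in for whatever index-status check you use.
    """
    waited = 0
    while waited <= give_up_after:
        if is_indexed(url):
            return (datetime.now(timezone.utc) - crawl_time).total_seconds()
        time.sleep(poll_every)
        waited += poll_every
    return None
```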

I understand this is not rigorous empirical data, but I have noticed the same slowdown on a few of our sites, and the only explanation I can think of is Sandy. So is Google’s infrastructure affected by it?


Leave a Comment


  1. Interesting theory. If this was happening on a large scale, I’d expect to see a reduction in SERP volatility because new links are being picked up more slowly and so the search landscape’s changing less. Checking the Mozcast and Serpmetrics, volatility’s down only very slightly though. Their main data-centers are also a fair bit south of the brunt of Sandy (http://www.google.com/about/datacenters/inside/locations/). Have you seen this slow indexing on any other sites? Would be very interesting if true as it speaks to the resilience (or not) of their networks.

    • I have noticed it on a few sites that I am very familiar with. The data centers are not in the hotspot area; however, Google also uses co-location facilities, mainly Equinix, so parts of its indexing mechanism could be in the affected areas.

  2. Royal says:

    Very nice post. I just stumbled upon your weblog and wished to say that I’ve truly enjoyed browsing your blog posts. In any case I’ll be subscribing to your rss feed and I hope you write again soon!