We spend a lot of our lives using Google, looking at their ads, reading their guidelines, buying their products – but what have they done for you lately? Not a lot right? Maybe, if you’re lucky, they’ve crippled your incoming traffic with a Manual Action Penalty? Cheers.
So here’s a handy little trick using Google Docs to bend the almighty G to your will for once.
We’ve all had to do long, arduous manual data collection tasks in our time. A necessary evil you say? Well no longer!
Let’s imagine you’ve got some data, spread across multiple pages, that you’d like to stay on top of. Perhaps a selection of stock prices, the number of reviews on multiple Google Places pages or anything else you can think of that fits the bill. Now you could just check them all every time you need an update, but who’s got time for that?
Fire up a new Google Docs Spreadsheet and let’s get cracking.
For our example scenario we’re going to image that, as culinary connoisseurs of the Crawley area, we’d like to keep track of the average rating on Google+ of our favourite local dining establishments.
To start with we’ll find the Google Places listing of our first venue here (a favoured haunt of team RocketMill’s):
Now once I’ve gotten over my initial disgust at them not having specified a cover photo on their Places listing, I’m pleasantly surprised to see that they’ve got an average rating of 4.2. Hooray! That means we probably won’t all get food poisoning!
Click on the information that you’d like to extract, in this case the rating, and select, “Inspect Element” from the drop down menu to open up what I like to call the element inspector (mainly just because it sounds like a TV show about a detective who solves crimes with the power of science! But I digress…).
This will select the area of source code from the page that represents the data that you’re interested in.
But we’re not just here for the raw numbers; we want to know the value here at any time, even if it changes! So what we’re really focused on here is the reference to the point in the page that holds our rating, called the “Xpath”.
Right click on the code itself, exactly on the element that you’re trying to track, and select, “Copy XPath”.
Now we’re going to hop back to our Google Spreadsheet.
Enter the URL of the targeted page into one cell and your copied XPath into another. It should look something like this:
Now in a third cell we’re going to create a formula, just like the ones you use in Excel.
The formula in this case is a function called import XML and takes two arguments. The first is the URL of the target page and the second is the specific location of the required information (Xpath), so your formula should look like this:
=importXML(<URL Here>;<XPath Here)
As we’ve stored our URL and Xpath in cells already, that becomes more like this:
=importXML(<Cell Containing URL>;<Cell Containing XPath)
Which in the case of my example is:
Once you’ve created your formula and entered it into the cell, press enter to execute it.
Voila! Internet magic! The function should now retrieve the data you were after.
If each different page of the website you’re collecting your information from is organised in a uniform fashion (like Google Places for example) in that each different page follows the same structure, then the information that you’re looking for should always be in the same place on every page (meaning the XPath will be the same).
So if we wanted to compare our original restaurant with another one, just enter the URL of the second restaurant into your Spreadsheet and use the same XPath in your formula:
Just to ruin the effectiveness of my example, the average rating for our second restaurant is exactly the same as the first one, 4.2. But I promise that it has worked successfully!
This should work in theory for any website that structures each page in the same way.
Nifty eh! Once you’ve taken the initial time to work out what information you’d like to track and assembled your list of targets, this method can be a huge time saver. Simply refresh the document and you’ve got freshly up to date values.