How to get Google Analytics data in Power Query

How to get Google Analytics data in Power Query

Microsoft Power Query is an Excel add-in that helps you gather, transform and analyse data from various sources. It’s a very powerful tool and a must have for anyone interested in BI, no matter how experienced. Microsoft also releases frequent updates of this add-in, including new source connections.

Google Analytics is a powerful tool for monitoring and analyzing your website traffic. It helps you know who is visiting, what they are looking for and how they are getting there.
As a BI consultant at Kohera, we’ve had requests to have Google Analytics as a data source in Power Query. While this is possible very well with the Google Analytics Core Reporting API, it’s not as straight forward to setup the OAUTH2 authentication for the first time. In a first step this authentication is explained and further on you will find out how to retrieve Google Analytics results and process them in Power Query.

OAUTH 2.0

First of all, you need access to a Google Analytics View. You probably wouldn’t be reading this if you had none. In Google Analytics, go to `Admin` and then navigate to `View Settings`:

150131b

Copy the ID you find there for later use.

150131c

If not yet done, go to https://console.developers.google.com/project to create a new project (this is registering your application with Google). Once created, go to the section with the APIs and enable the `Analytics API` (change the status from `OFF` to `ON`).

150131d

Now go to Credentials to create a new Client ID (for native application):

150131e

150131f

Copy the Client ID and Secret for later use. The Client Secret should be kept confidential.

150131g

With all codes available the user can now grant access, so the application can consume the service. At first, an authentication code has to be requested with the following URL (use your Client ID):

https://accounts.google.com/client_id=YourClientID
&response_type=code
&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob
&approval_prompt=force
&scope=https%3a%2f%2fwww.googleapis.com%2fauth%2fanalytics.readonly
&access_type=offline

Set the scope to what is wanted. In this case it was set to readonly access to Google Analytics. Visit the OAuth 2.0 Playground to browse the available scopes:

150131h

Entering the URL in a browser will make it possible to grant access (you need to URL encode the URL like changing `/` to `%2f` – this is already done in the url above):

150131i

When you accept, the first token (authentication code) will be returned:

150131j

Save this code, we’ll need it in a few moments.

From Authentication Code to Access Token

With the authentication code, you can now request the access and refresh token. This is a POST request, so you can not just build the url and launch it from your browser. There are several ways to do this. One method is using PowerShell. Since V3 it has a command named `Invoke-RestMethod`. This is some example code (replace the first 3 parameters with the proper information):

$TokenUrl = "https://accounts.google.com/o/oauth2/token";
  
$body = @{
  code="the authentication code from the previous step";
  client_id="your client id";
  client_secret="your client secret";
  redirect_uri="urn:ietf:wg:oauth:2.0:oob";
  grant_type="authorization_code"; # fixed value
};

$json = Invoke-RestMethod -Uri $TokenUrl -Method POST -Body $body;

The json object contains an access token and also the refresh key. The access token is valid for one hour (or check the expires_in property of the json object, that will normally be 3600).

A sortlike method can be followed to obtain a new access code, as every access token is only valid for one hour.

$TokenUrl = "https://accounts.google.com/o/oauth2/token";
  
$body = @{
  client_id="your client id";
  client_secret="your client secret";
  refresh_token="refresh key from the previous step";
  grant_type="refresh_token"; # fixed value
};

$json = Invoke-RestMethod -Uri $TokenUrl -Method POST -Body $body;

 

Getting Information from Google Analytics

Now that the authentication has been set, it’s time to write the URL that will return the needed information. As an example we want to return information about the Operating System that was used to visit the website. The URL looks like this:

https://www.googleapis.com/analytics/v3/data/ga?access_token=123456789&ids=ga:33784873&start-date=2014-10-01&end-date=2014-12-31&dimensions=ga:operatingSystem&metrics=ga:sessions

  • https://www.googleapis.com/analytics/v3/data/ga is the main part
  • access_token=123456789 is the Access Token (a token that is not yet expired)
  • ids=ga:33784873 is the Google Analytics view that we want to query
  • start-date=2014-10-01 is the start date (format: YYYY-MM-DD)
  • end-date=2014-12-31 is the end date (format: YYYY-MM-DD)
  • dimensions=ga:operatingSystem is to let the system know that we want to include the Operating System as a dimension
  • metrics=ga:sessions is the number of sessions against the dimension

As the access code is only valid for one hour, it needs to be set dynamically. We will take care of that in a moment.

Check this site for general information about the Google Analytics API. Available dimensions and references can be found here or play around with them here. Some common queries are listed here.

Power Query

Now that the URL is known, the returned information needs to get in Excel. For this, we use Power Query, as it can parse json. One of the things that needs to be part of the URL is the access_token parameter. We assume that it’s in an Excel table with the name `TblToken`. It has one column, named Access Token.

150131k

Power Query can read the token like this:

Excel.CurrentWorkbook(){[Name="TblToken"]}[Content]{0}[Access Token]

Combined with some VBA we can update the token in an easy way. Alternatively it’s possible to put the token in a text file and have Power Query read this (using `Lines.FromBinary` and `File.Contents`).

This can be done with the PowerShell code from above that we run every five minutes. If the token is about to expire we run the code and save the result to a file that Power Query can read.

The VBA and PowerShell code will be explained in a moment, but first, let’s look at the Power Query code. You can access the code in Power Query, via the `View` > `Advanced Editor`:

150131l

let
 analytics = "https://www.googleapis.com/analytics/v3/data/ga?access_token=",
 addAccessToken = analytics & Excel.CurrentWorkbook(){[Name="TblToken"]}[Content]{0}[Access Token],
 addViewId = addAccessToken & "&ids=ga:33784873",
 addStartDate = addViewId & "&start-date=2014-10-01",
 addEndDate = addStartDate & "&end-date=2014-12-31",
 addDimensions = addEndDate & "&dimensions=ga:operatingSystem",
 url = addDimensions & "&metrics=ga:sessions",
 GaJson = Json.Document(Web.Contents(url)),
 GetRows = GaJson[rows],
 TableFromList = Table.FromList(GetRows, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
 ColOS = Table.AddColumn(TableFromList, "OS", each [Column1]{0}),
 ColCount = Table.AddColumn(ColOS, "Count", each [Column1]{1}),
 RemoveFirstColumn = Table.RemoveColumns(ColCount,{"Column1"})
in
 RemoveFirstColumn

The first lines build the URL. Most parameters are hard-coded, but we could put these values in Excel, just like we did with the Access Token. The part that contains `Json.Document` gets the return value of the posted URL and tries to parse it into a JSon object. As a list was returned, we first convert it to a table. This however results in a table with a list in each cell:

150131m

That is why we add two columns, to extract the values from these lists and present it in a cleaner way. The last line removes the first column, as it’s not needed any more. The final result looks like this:

150131n

And when we add this as a source in Power View (ordered and with some images) it could look like this:

150131o

Refresh the access token using VBA

Kyle Beachill has created a VBA Class Module that handles OAuth2 authentication, including a method to renew access tokens. I tried this module with success. If you’re working with Office 64 bit, the ScriptControl does not work, and so the GetProp function does not work. But you can change it with this code:

Private Function GetProp(strPropName As String, Optional strJSObject As String = "") As String

  Dim regEx As New RegExp
  Dim mc As MatchCollection

  If Len(strJSObject) > 0 Then strResponseText = strJSObject
  
  regEx.MultiLine = True
  regEx.Global = True
  
  regEx.Pattern = "\""" & strPropName & "\""" & "\s+:\s+\""(.+)\"""
  Set mc = regEx.Execute(strResponseText)
  
  If mc.Count > 0 Then
    GetProp = mc(0).SubMatches(0)
    Exit Function
  End If
  
  ' maybe we're looking for digits instead of a string
  regEx.Pattern = "\""" & strPropName & "\""" & "\s+:\s+(\d+)"
  Set mc = regEx.Execute(strResponseText)
  
  If mc.Count > 0 Then
    GetProp = CStr(mc(0).SubMatches(0))
    Exit Function
  End If
    
End Function

You need to add a few references in order to use the VBA Module:

150131p

Refresh the access token using PowerShell

This code was already shown above. We could extend the script to have the access token being saved in a (shared) location. The script could then run every hour using Windows Task Scheduler. Using Power Query it’s no problem to read the first line of a text file:

Table.FromColumns({Lines.FromBinary(File.Contents("C:\...\access_token.txt"))}){0}[Column1]

Or when saved on the (intra)net:

Table.FromColumns({Lines.FromBinary(Web.Contents("http://.../access_token.txt"))}){0}[Column1]

Refresh the access token using Power Query

In case you want to keep everything in Power Query, we can use the Web.Contents function with the `Content` variable. This will submit a POST request instead of a GET request (which is the default). This should return a json object with a valid access token. The code looks like this:

let
  clientId = "client_id=YourClientId",
  addSecret = clientId & "&client_secret=YourClientSecret",
  addRefreshToken = addSecret & "&refresh_token=YourRefreshToken",
  url = addRefreshToken & "&grant_type=refresh_token",
  GetJson = Web.Contents("https://accounts.google.com/o/oauth2/token",
    [
      Headers = [#"Content-Type"="application/x-www-form-urlencoded"],
      Content = Text.ToBinary(url)
    ]
  ),
    FormatAsJson = Json.Document(GetJson),
    access_token = FormatAsJson[access_token]
in
  access_token

The result of this query could then be used in the solution above (instead of getting the code from a cell in the worksheet you get it from the code above – if you reference another query in the same workbook, don’t forget to enable the Fast Combine workbook setting). You could even include further logic to only run the Web.Contents when the current code has expired, but then you need to save the expiration date and access_token in a worksheet and only refresh the token when the current date has passed that date.