Microsoft Power Query is an Excel add-in that helps you gather, transform and analyse data from various sources. It’s a very powerful tool and a must have for anyone interested in BI, no matter how experienced. Microsoft also releases frequent updates of this add-in, including new source connections.
Google Analytics is a powerful tool for monitoring and analyzing your website traffic. It helps you know who is visiting, what they are looking for and how they are getting there.
As a BI consultant at Kohera, we’ve had requests to have Google Analytics as a data source in Power Query. While this is possible very well with the Google Analytics Core Reporting API, it’s not as straight forward to setup the OAUTH2 authentication for the first time. In a first step this authentication is explained and further on you will find out how to retrieve Google Analytics results and process them in Power Query.
First of all, you need access to a Google Analytics View. You probably wouldn’t be reading this if you had none. In Google Analytics, go to `Admin` and then navigate to `View Settings`:
Copy the ID you find there for later use.
If not yet done, go to https://console.developers.google.com/project to create a new project (this is registering your application with Google). Once created, go to the section with the APIs and enable the `Analytics API` (change the status from `OFF` to `ON`).
Now go to Credentials to create a new Client ID (for native application):
Copy the Client ID and Secret for later use. The Client Secret should be kept confidential.
With all codes available the user can now grant access, so the application can consume the service. At first, an authentication code has to be requested with the following URL (use your Client ID):
https://accounts.google.com/client_id=YourClientID
&response_type=code
&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob
&approval_prompt=force
&scope=https%3a%2f%2fwww.googleapis.com%2fauth%2fanalytics.readonly
&access_type=offline
Set the scope to what is wanted. In this case it was set to readonly access to Google Analytics. Visit the OAuth 2.0 Playground to browse the available scopes:
Entering the URL in a browser will make it possible to grant access (you need to URL encode the URL like changing `/` to `%2f` – this is already done in the url above):
When you accept, the first token (authentication code) will be returned:
Save this code, we’ll need it in a few moments.
With the authentication code, you can now request the access and refresh token. This is a POST request, so you can not just build the url and launch it from your browser. There are several ways to do this. One method is using PowerShell. Since V3 it has a command named `Invoke-RestMethod`. This is some example code (replace the first 3 parameters with the proper information):
$TokenUrl = "https://accounts.google.com/o/oauth2/token"; $body = @{ code="the authentication code from the previous step"; client_id="your client id"; client_secret="your client secret"; redirect_uri="urn:ietf:wg:oauth:2.0:oob"; grant_type="authorization_code"; # fixed value }; $json = Invoke-RestMethod -Uri $TokenUrl -Method POST -Body $body;
The json object contains an access token and also the refresh key. The access token is valid for one hour (or check the expires_in property of the json object, that will normally be 3600).
A sortlike method can be followed to obtain a new access code, as every access token is only valid for one hour.
$TokenUrl = "https://accounts.google.com/o/oauth2/token"; $body = @{ client_id="your client id"; client_secret="your client secret"; refresh_token="refresh key from the previous step"; grant_type="refresh_token"; # fixed value }; $json = Invoke-RestMethod -Uri $TokenUrl -Method POST -Body $body;
Now that the authentication has been set, it’s time to write the URL that will return the needed information. As an example we want to return information about the Operating System that was used to visit the website. The URL looks like this:
https://www.googleapis.com/analytics/v3/data/ga?access_token=123456789&ids=ga:33784873&start-date=2014-10-01&end-date=2014-12-31&dimensions=ga:operatingSystem&metrics=ga:sessions
As the access code is only valid for one hour, it needs to be set dynamically. We will take care of that in a moment.
Check this site for general information about the Google Analytics API. Available dimensions and references can be found here or play around with them here. Some common queries are listed here.
Now that the URL is known, the returned information needs to get in Excel. For this, we use Power Query, as it can parse json. One of the things that needs to be part of the URL is the access_token parameter. We assume that it’s in an Excel table with the name `TblToken`. It has one column, named Access Token.
Power Query can read the token like this:
Excel.CurrentWorkbook(){[Name="TblToken"]}[Content]{0}[Access Token]
Combined with some VBA we can update the token in an easy way. Alternatively it’s possible to put the token in a text file and have Power Query read this (using `Lines.FromBinary` and `File.Contents`).
This can be done with the PowerShell code from above that we run every five minutes. If the token is about to expire we run the code and save the result to a file that Power Query can read.
The VBA and PowerShell code will be explained in a moment, but first, let’s look at the Power Query code. You can access the code in Power Query, via the `View` > `Advanced Editor`:
let
analytics = "https://www.googleapis.com/analytics/v3/data/ga?access_token=",
addAccessToken = analytics & Excel.CurrentWorkbook(){[Name="TblToken"]}[Content]{0}[Access Token],
addViewId = addAccessToken & "&ids=ga:33784873",
addStartDate = addViewId & "&start-date=2014-10-01",
addEndDate = addStartDate & "&end-date=2014-12-31",
addDimensions = addEndDate & "&dimensions=ga:operatingSystem",
url = addDimensions & "&metrics=ga:sessions",
GaJson = Json.Document(Web.Contents(url)),
GetRows = GaJson[rows],
TableFromList = Table.FromList(GetRows, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
ColOS = Table.AddColumn(TableFromList, "OS", each [Column1]{0}),
ColCount = Table.AddColumn(ColOS, "Count", each [Column1]{1}),
RemoveFirstColumn = Table.RemoveColumns(ColCount,{"Column1"})
in
RemoveFirstColumn
The first lines build the URL. Most parameters are hard-coded, but we could put these values in Excel, just like we did with the Access Token. The part that contains `Json.Document` gets the return value of the posted URL and tries to parse it into a JSon object. As a list was returned, we first convert it to a table. This however results in a table with a list in each cell:
That is why we add two columns, to extract the values from these lists and present it in a cleaner way. The last line removes the first column, as it’s not needed any more. The final result looks like this:
And when we add this as a source in Power View (ordered and with some images) it could look like this:
Kyle Beachill has created a VBA Class Module that handles OAuth2 authentication, including a method to renew access tokens. I tried this module with success. If you’re working with Office 64 bit, the ScriptControl does not work, and so the GetProp function does not work. But you can change it with this code:
Private Function GetProp(strPropName As String, Optional strJSObject As String = "") As String Dim regEx As New RegExp Dim mc As MatchCollection If Len(strJSObject) > 0 Then strResponseText = strJSObject regEx.MultiLine = True regEx.Global = True regEx.Pattern = "\""" & strPropName & "\""" & "\s+:\s+\""(.+)\""" Set mc = regEx.Execute(strResponseText) If mc.Count > 0 Then GetProp = mc(0).SubMatches(0) Exit Function End If ' maybe we're looking for digits instead of a string regEx.Pattern = "\""" & strPropName & "\""" & "\s+:\s+(\d+)" Set mc = regEx.Execute(strResponseText) If mc.Count > 0 Then GetProp = CStr(mc(0).SubMatches(0)) Exit Function End If End Function
You need to add a few references in order to use the VBA Module:
This code was already shown above. We could extend the script to have the access token being saved in a (shared) location. The script could then run every hour using Windows Task Scheduler. Using Power Query it’s no problem to read the first line of a text file:
Table.FromColumns({Lines.FromBinary(File.Contents("C:\...\access_token.txt"))}){0}[Column1]
Or when saved on the (intra)net:
Table.FromColumns({Lines.FromBinary(Web.Contents("http://.../access_token.txt"))}){0}[Column1]
In case you want to keep everything in Power Query, we can use the Web.Contents function with the `Content` variable. This will submit a POST request instead of a GET request (which is the default). This should return a json object with a valid access token. The code looks like this:
let
clientId = "client_id=YourClientId",
addSecret = clientId & "&client_secret=YourClientSecret",
addRefreshToken = addSecret & "&refresh_token=YourRefreshToken",
url = addRefreshToken & "&grant_type=refresh_token",
GetJson = Web.Contents("https://accounts.google.com/o/oauth2/token",
[
Headers = [#"Content-Type"="application/x-www-form-urlencoded"],
Content = Text.ToBinary(url)
]
),
FormatAsJson = Json.Document(GetJson),
access_token = FormatAsJson[access_token]
in
access_token
The result of this query could then be used in the solution above (instead of getting the code from a cell in the worksheet you get it from the code above – if you reference another query in the same workbook, don’t forget to enable the Fast Combine workbook setting). You could even include further logic to only run the Web.Contents when the current code has expired, but then you need to save the expiration date and access_token in a worksheet and only refresh the token when the current date has passed that date.
© 2022 Kohera
Crafted by
© 2022 Kohera
Crafted by