The hidden gems of the function Web.Contents()

Some of my customers want me to import their data from a standard on-premises or cloud-based (Azure) SQL-database, but others want me to import their data from a REST API. If ever, you’ll have to import data from a REST API, you’ll surely encounter some obstacles. That’s why I write this article to prevent you from this. Based on a real-life example, I’ll talk about the most common problems when using REST API’s, where for some of them still no solution has been found by Microsoft itself (although not that I’m aware of).

A little bit of context to my example

My customer wants me to import data in Power BI, coming from the Zendesk API. Zendesk is known as a support ticketing system, which is mainly used by the IT helpdesk where tickets are created and updated when IT-related problems arise in a company. To give you a little example: an employee of a company sees that the printer doesn’t work anymore, so he sends a request to the IT helpdesk. The IT helpdesk creates a ticket with all the information needed and will try to solve the problem or send another more skilled employee to solve this problem. In the meantime, the ticket has to be updated with all the information, such as the name of the requester, the name of the agent (IT helpdesk employee), a description of the problem, when the ticket has been created, when the ticket has been solved and so on.

To summarize briefly what I had to do, is to import data about all the agents in the company, as well as all the tickets that have been created, so my customer can analyze this data afterwards.

Optional information: A big plus about Zendesk is that there is a lot of documentation (which can be found here if interested) of how to use the REST API’s, which is in general a good start of importing the desired data in a correct way to Power BI. Unfortunately, this still doesn’t solve the problems that occur with the function Web.Contents() and Power BI.

 

When the problems occur

  • What’s the problem?

It all starts in Power Query M (Power BI) where I want to call REST API’s from the Zendesk API with the function Web.Contents(). This function is very useful and has a lot of parameters that can be used to call simple web pages such as Wikipedia, as creating new data with a REST API. If you don’t have any experience with calling REST API’s or how this all works internally, talking about HTTP-headers, GET, POST and such, then you’d struggle with the function Web.Contents() because it doesn’t have the essential information to use it correctly in Power Query M. It’s still a very powerful function if you use it correctly and can bypass it in certain ways.

The first problem that I crossed, is when I published my Power BI report (with the usage of Web.Contents() in it) from desktop to the service, to schedule an automatic refresh. Underneath is a screenshot that appeared before I wanted to schedule a refresh in Power BI service.

When I saw this screenshot, it told me nothing about how to solve this problem…

 

  • Starting from the bottom

To find out how the problem occurred, I started from the bottom with a simple connection to the REST API. The following code in Power Query M consists of the main function Web.Contents(). It requires one parameter, namely the URL you want to call, as in my case: https://example.zendesk.com/api/v2/incremental/tickets.json?include=metric_sets&start_time=0. This URL will call the REST API to get all the tickets that have been created. You’ll also see that there are extra parameters added after the question mark, these “include=metric_sets” and “start_time=0”.

Important to notice is that I’ve hardcoded my URL in the function (I’ll come back to it later). As an extra, I’ve added an optional parameter to the function in which I create a HTTP-header with an access token added to it. To get the data from the Zendesk API, it is obligated to use an access token. The result of this query will give me the first 1.000 tickets created starting from the beginning.

 

Optional information: “include=metric_sets” will give me some extra data about the metrics of a ticket, and “start_time=0” is a UNIX-based value and is linked to the datetime of the first created ticket. If changing this to another UNIX-based value like 1292834834 (which is Monday 20 December 2010 08:47:14 as datetime) it will give me all the created tickets equal and after this UNIX-based value. More information about the REST API can be found here.

When running this query for the first time, it will ask me how I want to authenticate. As I will authenticate as anonymous, because my credentials are the access token stored in the header. Secondly, it asks me on which level I want to set my authentication. If I’m setting anonymous as authentication on the root level https://example.zendesk.com, it means that all the API calls underneath will be as anonymous.

 

 

It is also possible to set the authentication on the level of the API call, for example, if you make a call to https://example.zendesk.com/api/v2/incremental/tickets.json?include=metric_sets&start_time=0 with basic authentication, you can set different credentials then when you call the API https://example.zendesk.com/api/v2/users.json?page=1 (I won’t give a detailed explanation about this, but it is good to know that these settings have a role in the usage of the function Web.Contents(). I’ll give some examples in the chapters).

When the credentials are all set, everything works fine in Power BI desktop, I get the desired data. When publishing my Power BI report to the service, it gives me my first error as seen in the following picture. Do you notice that it is different than the first error? This is because the URL that I’m trying to call is a valid URL, but the credentials aren’t valid to get the desired data. Though why doesn’t it raise an error in Power BI desktop and in the service it does?

 

 

Many questions left unanswered… All of these problems have to do with the fact that Power BI desktop (in which refreshes succeed) uses a different method of analyzing the code in Power Query M, to determine the right data sources and the corresponding credentials, than in Power BI service. In Power BI service, static analysis is being used. A difficult term, isn’t it? Based on my example, I’ll explain what it does:

  • Static analysis in Power BI service will evaluate the first parameter in the function Web.Contents() to determine what the data source is, which in my example is https://example.zendesk.com/api/v2/incremental/tickets.json and based on the given credentials (which is anonymous because my access token is being called in the header) it will check if those are correct to call the URL to the API of Zendesk. An easy way to check if the evaluation is correct, is by opening the URL in a web browser and if it is valid or it gives you the status code 200 which is OK, then it’s OK, correct, valid… In my example, it raises an error with status code 401, which is UNAUTHORIZED. But why does it raise an error if I’m using an access token in my HTTP-header to call the API? There’s the catch. The problem is that with static analysis it only evaluates the first parameter and if it raises an error, it will not execute the rest of the function. Basically, I’m calling the URL to the API of Zendesk authenticated as anonymous, in which as result it raises an error (because valid credentials are needed) and stops the execution of the rest of the function Web.Contents() including my headers where my access token is stored. Only when the URL of the first parameter in the function Web.Contents() is recognized and the credentials used to call the URL are valid, then the rest of parameters of that function be executed too.

 

  • How to solve this?

There’s a way to bypass this with a feature introduced by Microsoft which is called skip test connection. If you look back at the picture with the error message saying ‘invalid credentials’, at the bottom you’ll see a checkbox with the name skip test connection.

  • As in my example, the evaluation of the static analysis raises an error (or you can also say that the test connection failed), but when I check ‘skip test connection’, it will ignore the error given by the static analysis evaluation, therefore executing the rest of the function Web.Contents(). (In my case there’s always an extra step for making sure the right credentials are being used to call the API of Zendesk, this with my access token in the HTTP-header. Say I’ve hardcoded the access token in the HTTP-header and on this day, it works in Power BI desktop as in Power BI service but on the next day, the access token has been changed. Even when ‘skip test connection’ is checked on, resulting to a succeeded evaluation, still the full execution will lead to an error because I still use the wrong (hardcoded) access token.)

There are also cases in which this feature can be very useful, but here in my example it is not relevant to write about it. Though, there is a guy called Chris Webb whom is a Power Query M guru that speaks about the feature in more detail. (I must confess that this article is also based on the articles that he has been writing about the function Web.Contents(), but by using a real live example, it is easier to understand and getting more in detail about it.)

If you’ve been given the job to import data from public web contents such as the following URL: https://ckan.publishing.service.gov.uk/api/3/action/package_search?q=apples then you don’t have to worry about invalid credentials. When opening the URL in a web browser, it will give you the status code 200, which is OK and as expected also the right data. Therefore, as authentication, you can use anonymous and a scheduled refresh will succeed. You can try this by following the steps below:

 

 

As configured in Power BI destop, in Power BI service, the authentication method is also set to anonymous.

 

Instead of ignoring the test connection as discussed as the first problem solver, I could also add extra properties in the second parameter of the function Web.Contents(). In the example that raised the error in Power BI service, I used the full URL like shown below and because I used anonymous as authentication to call the API of Zendesk where credentials are actually needed, the test connection failed.

 

Because only the first parameter of the function Web.Contents() is being evaluated during the test connection or static analysis, I only need to give a valid URL as the first parameter, which in my case in the following example is https://example.zendesk.com. By opening this URL in a web browser, it gives me a valid result without giving me the error UNAUTHORIZED. Though, this doesn’t give me the API call to Zendesk… In the code, the call to the API of Zendesk as the extra parameters are given as extra properties in the second parameter of the function Web.Contents(). Only when the evaluation during the test connection or static analysis succeeds, then the rest of the function Web.Contents() will be executed. In my case, only when https://example.zendesk.com is valid as URL, then my properties in the second parameter of the function Web.Contents() will be executed.

 

 

To understand what is happening is that:

  1. In Power BI service, a test connection will be executed or static analysis will evaluate the URL https://example.zendesk.com with as authentication anonymous. Because no credentials are needed, this URL will give a valid result or a status code 200. The evaluation succeeds.
  2. The extra properties in the second parameter of the function Web.Contents will be appended to https://example.zendesk.com which means the rest of the URL in RelativePath and the parameters of the URL in Query will be appended to the root URL which results in the full URL https://example.zendesk.com/api/v2/incremental/tickets.json?include=metric_sets&start_time=0.
  3. The access token that is stored in the headers which are also part of the properties in the second parameter of the function Web.Contents() makes sure that the right credentials are given to make the call to the API of Zendesk, or here the full URL that has been appended.
  4. A scheduled refresh succeeds!

 

If possible, always use a different authentication method than anonymous. If for example, as in my case, the access token is stored hardcoded in the header, then it’s not secure. Next to using an access token, it is also possible to use basic authentication (other method than anonymous). To call the API of Zendesk, the agent of the IT helpdesk can use its login credentials, and in Power BI, it is also possible to use it as authentication method. In the following example, I’ll call the same URL like I did in the first example but instead of using anonymous I’ll use basic authentication. Giving the right credentials in Power BI, will lead to a successful scheduled refresh afterwards.

 

 

 

An extra remark: if I’m using different credentials than anonymous as authentication method in the user interface, it is not possible to combine it with the property ‘Authorization’ which is used in the second parameter of the function Web.Contents(), otherwise I get this error like this one.

 

That’s because the credentials that are passed during the authentication in the user interface are internally passed in the ‘Authorization’-property, so if I’m entering my credentials using basic authentication in the user interface, Power BI will convert these in base64 and pass them in the ‘Authorization’-property.

In the first code, I’m using basic authentication as authentication method, with the usage of the second parameter but without the ‘Authorization’-property.

In the second code, the username and password are parameters that the user can fill in. Here I’ll use anonymous as authentication method because the parameters are a replacement for the basic authentication method in the user interface.

If you have read the part of the test connection or static analysis, you’ll probably know how the piece of codes will be evaluated and executed.

 

  • How to solve the main problem?

Now I know how the function Web.Contents() works and I managed to resolve the problem with the invalid credentials. In my previous example, I always worked with static URL’s. But what if I want to work with parameters like in the following piece of codes. What if I want to include another set than the metric set I included in my previous examples? This can be done by parameters. In the following piece of codes which are functions, I can add an URL or a parameter, which will give me the wanted data. When I publish my Power BI report to the service, then I’ll get THE ERROR, the one that causes problems.

 

 

The problem has to do with the fact that Power BI service doesn’t recognize the data source. There’s not much information about why Power BI service falls over that, because if you give the exact URL as a parameter this should be the same as hardcoding the URL in the function Web.Contents()? I guess its logic that if you try to make it dynamic based on functions and parameters, that this can always be a variable value. In Power BI service, the test connection or static analysis will not know what the data source will be so it can’t be evaluated. In Power BI desktop, it also states that the used data source is not listed because the usage of a function or parameter. If Power BI desktop doesn’t recognize the data source, then Power BI service will certainly not recognize it.

 

The following code will finally solve the error. It is important that the root URL stays static if possible. In my previous experiences I know that the URL’s as the parameters of the API’s can change but that the root URL normally stays the same. For example, if I want to get the tickets of the company ‘example’ then I’ll use another path than I want to get the agents. If the test connection or static analysis succeeded with the root URL, then the rest of the function Web.Contents() will be executed anyway and then you can use as much parameters you want.

 

In the second part of the article I’ll show the full real-life case wherein functions as pagination will be used to call the API of Zendesk.