Blog

World Cup Russia dashboard in Power BI: importing the teams from multiple web pages

While working on our World Cup dashboard my colleague Philippe Van Rillaer and I wanted to import data from tables spread out over multiple Wikipedia pages. One example was importing players that are currently selected for the teams that qualified for the upcoming World Cup. We could navigate to each country’s page separately, but of course Power BI offers more powerful options than this.

 

Creating our function

Usually this mass importing is quite easy to do quickly, following the correct steps:

We start with a new source (web) and enter the link (E.G. https://en.wikipedia.org/wiki/Belgium_national_football_team). After selecting the table we want to import, we have the data in our query editor. The steps we took to get to this point are on the right, under ‘Query Settings’. We can go one level deeper though, by opening our advanced editor (under ‘Home’ in the ribbon). We’re met with the code behind the steps, which is in the language ‘M’:

Next, we want to transform the importing of this single table into a function and apply it to a list of the qualified countries. In order to do this, we first create a parameter that will hold the country name: under manage parameters pick ‘New Parameter’. I fill out ‘Country’ for its name, set the type to ‘Text’ and add ‘Belgium’ as current value to start out.

Returning to the advanced editor of the table we already imported, you can now replace ‘Belgium’ and reference the parameter instead. The entire link then becomes:

https://en.wikipedia.org/wiki/” & Country & “_national_football_team

At this point we right click the table and choose ‘Create function’. This leaves us with a function we can then apply to another table, effectively giving us a method of importing the same table for a list of countries.

Next, we go to a table that enumerates the qualified countries. In this table we invoke our custom function (you can find this under ‘Add Column’). I named mine ‘GetCurrent’, so this is the function query I pick from the list. Power BI also asks us which column we want to use for the parameter Country, an easy choice given that it is a single column table.

The result is the table above. Clicking on the two arrows in the top right corner gives us a choice of which columns to import from our data source, after which each of the countries is expanded to cover multiple rows in our table, one for each player in their respective current squads.

Referencing by table name

Or at least, this was wat we expected. In reality the numbers of columns to choose from was overly large and the tables didn’t contain the data we expected to find.

What happened? It turns out that we didn’t reference the table by name (‘Current squad[edit]’ for the Belgian page), but by number instead. You can see this in the code in the advanced editor, it reads “Source{1}”. Other Wikipedia pages had different numbers and so we imported an amalgam of tables containing all sorts of information we didn’t need.

The solution seemed simple enough: reference by table name instead of number. In our custom function “Source{1}” was replaced by “Source{[Caption=”Current squad[edit]”]}” in the advanced editor and all seemed well for a few seconds.

Exception handling

Reapplying the custom function to our list of participating countries resulted in a few error lines however. Not only do the webpages have different numbering, it appears that there are also variations in the caption of this table.

For Belgium and a few other countries it is ‘Current Squad[edit]’. A variation however was simply ‘Current’ (looking at you, Peru). Once again we went back to the advanced editor of our custom function.

This time we added some logic that checks for the alternative caption of the table:

Third time’s the charm, we were greeted by a beautiful table filled with the data we wanted. Sometimes it’s the simple pleasures in life.