Creating maps with R and Power BI

By: Kevin Naels

The possibilities are endless when it comes to creating custom visuals in Power BI. As long as you have creativity and knowledge of the right programming language, you can let your imagination run free. I’ll talk about languages – specifically about R – in more detail later. One interesting application of those endless possibilities is making maps in Microsoft’s visualization tool. Let’s have a look at how to go about that.

Why use the programming language R

For this tutorial, I used R as my coding language. The reasons I chose it are:

  • The amount of documentation on how each package and function works.
  • The good integration between Power BI and R. There are a lot of packages, both default and custom, that you can use to create your own custom visual.
  • The ease of debugging your R code. It’s a good idea to work with R Studio.
  • The creativity of the people who create new R packages. Maybe you’ll find ready-made visuals in line with the customer’s wishes.

It’s also possible to create custom visuals in JavaScript/TypeScript, which is the default language for designing made-to-measure visualizations in Power BI. Personally, I find it more difficult to work with: it requires more knowledge to understand the language, and there’s less detailed documentation about what the functions specifically do.

Custom map with shapes and tooltips

As far as I know, there are already ready-to-go visuals in AppSource for adding maps with shapes and the built-in tooltip functionality of Power BI. Nevertheless, the purpose of this blog is to give an example of how to start creating your own custom map with R. Moreover, it can set readers on their way to expand their visualization possibilities and create custom imagery that doesn’t yet exist in AppSource.

The United States of America

 

The sample data I used

The custom map I created with R shows all the US states and the evolution of their population over the years. I used three different data sources in Power BI to get the desired data:

  • A KML file that contains the coordinates for each state, used to create the shapes on the map. These types of files can contain a lot of information that can be used to add more detail to your shapes, like a specified fill color or border color. Be aware that when importing a long string of coordinates like in the image below, Power Query M truncates it if it exceeds 32,766 characters. To work around this problem, I refer you to this interesting article by Chris Webb. (If you’d rather inspect the KML shapes in R first, see the sketch after this list.)

  • An Excel file that contains the evolution of the population throughout the years.
  • An XML file containing URLs for each state. I used this file to get the images of the flags.
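As an aside (a sketch only, not part of the Power Query pipeline above): the sf package can read a KML directly if you want to inspect the shapes in R Studio before deciding on your transformations. The file name here is hypothetical:

library(sf)

# GDAL, which sf wraps, understands KML, so the state shapes can be
# inspected directly.
states <- st_read("us_states.kml")   # hypothetical file name
print(states)                        # one row per feature, with a geometry column
plot(st_geometry(states))            # quick visual sanity check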

I won’t go into detail about the transformations I did in Power Query M, but I molded the imported data into the following form.

The data model and measure(s)

Before getting to the most interesting part, here is a brief explanation of the data model I used.

There are only a few fields that are sent to the R custom visual to process:

  • Each state can consist of multiple shapes, and each shape has a primary key to iterate over.
  • Each state has the evolution of its population from 2010 to 2019, which is used for the chart in the tooltip.
  • Each state has its own flag, which is shown in the upper-right corner of the tooltip.

In the image below, you can see the rows that are sent to the R custom visual for processing.

The ‘Population concatenation’ field is a measure that concatenates all the desired data for the chart in the tooltip. I created a measure for this because otherwise Power BI can’t determine the relationship between the two unrelated tables ‘States_Population’ and ‘States_Shapes’.

Population concatenation =
CALCULATE(
    CONCATENATEX(
        FILTER( States_Population, States_Population[Population LY] <> BLANK() ),
        States_Population[State] & ";" & States_Population[Year] & ";" & States_Population[Population LY] & ";"
            & CALCULATE( SUM( States_Population[Population] ) ) & ";" & States_Population[Population YOY] & ";" & States_Population[Color Threshold] & "\n"
    ),
    CROSSFILTER ( States[State], States_Shapes[State], BOTH )
)
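To make that format concrete, here is a minimal sketch, with made-up numbers, of what the measure emits for one state and how it can be parsed back into a table (the same fread approach the R script uses later):

library(data.table)

# Hypothetical measure output for one state: one record per year, fields
# separated by ";" and records by a literal "\n" (backslash plus n).
Population <- "Alabama;2011;4785437;4799069;0.0028;Green\\nAlabama;2012;4799069;4815588;0.0034;Green"
Population <- gsub(Population, pattern = "\\n", replacement = "\n", fixed = TRUE)

DataTablePopulation <- fread(text = Population, sep = ";",
                             col.names = c("State", "Year", "Population LY", "Population CY",
                                           "Population YOY", "Color Threshold"))
print(DataTablePopulation)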

 

You won’t always need to do the same, because it depends on what your data and data model look like.

Writing some R

To let your local Power BI Desktop use R, I refer you to this article on what to install first. To create my custom map in R, I use two environments: Visual Studio Code to create the actual custom visual, and R Studio to install the required packages and to test my R code before compiling it.

The first step

First, I create a new empty custom visual (typically generated with the pbiviz tooling’s R HTML template), which contains the folders and files that make sure the custom visual renders correctly in Power BI. The most important file here is ‘script.r’, where all the code will reside.

The precious code

The first line of code is there by default, because the script relies on helper functions defined in ‘./r_files/flatten_HTML.r’. These are used to install the required packages and to save the output of the script as an HTML widget that Power BI can use.

The second line is a global option I use so that R does not convert strings into factor variables. Based on my experience, I’ve had issues in the past rendering my custom visual without this option, because the strings in my data frame were converted to factor variables by default. There has been a lot of debate about this option, which you can read about in this article.

source('./r_files/flatten_HTML.r')
options(stringsAsFactors = FALSE)
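A quick illustration of what that option prevents; this matters because the Power BI service still runs an R version older than 4.0, where the default was TRUE:

# Without the option, character columns silently become factors
# (the default behavior before R 4.0):
df <- data.frame(State = c("Alabama", "Alaska"), stringsAsFactors = TRUE)
class(df$State)   # "factor": comparisons and joins can behave unexpectedly

df <- data.frame(State = c("Alabama", "Alaska"), stringsAsFactors = FALSE)
class(df$State)   # "character": what the script relies on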

 

The next important piece of code is the declaration of the required packages.

############### Library declarations ###############
libraryRequireInstall("leaflet");      # interactive map widget
libraryRequireInstall("sf");           # polygon geometry (st_polygon, st_sfc)
libraryRequireInstall("htmlwidgets");  # saving the map as an HTML widget
libraryRequireInstall("dplyr");        # data wrangling (%>%, filter)
libraryRequireInstall("stringr");      # string helpers (str_trim)
libraryRequireInstall("ggplot2");      # the population chart in the tooltip
libraryRequireInstall("data.table");   # fread/melt for the measure output
libraryRequireInstall("cowplot");      # draw_image to overlay the flag

 

It’s good to know that the R version and the package versions can differ between your local R Studio (and Power BI Desktop) and R in the Power BI service. Locally, in my case, the R version is 4.1.1, while the Power BI service runs 3.4.4. To know more about the packages and their versions that are installed in the Power BI service, you can find more information here.

I’ve had cases where my custom visual worked perfectly in Power BI Desktop locally, but threw an error in the Power BI service.
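When that happens, a quick way to check what you’re developing against locally is:

# Print the local R version and the versions of the packages this visual uses,
# then compare them with the versions published for the Power BI service.
R.version.string
packageVersion("leaflet")
packageVersion("sf")
packageVersion("ggplot2")
packageVersion("data.table")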

With the following piece of code, I initiate an empty map and add specific tiles to it to get the desired rendering of the states: for example, whether you want to see the roads, the railways, or the relief in detail on the map.

############### Create an empty leaflet map ###############
LeafletMap <- leaflet()
LeafletMap <- addProviderTiles(LeafletMap, providers$CartoDB)
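Swapping the tile provider is all it takes to change the base imagery. A few alternatives from the standard leaflet providers list (use one of these instead of providers$CartoDB):

# Alternative base maps; pick the one that fits your use case:
LeafletMap <- addProviderTiles(LeafletMap, providers$OpenStreetMap)   # roads
LeafletMap <- addProviderTiles(LeafletMap, providers$OpenTopoMap)     # relief
# Overlays can also be stacked on top of a base map, e.g. railways:
LeafletMap <- addProviderTiles(LeafletMap, providers$OpenRailwayMap)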

 

The tooltips I’m creating, which show a chart with the population over the years, will be saved as SVG files and stored in a temporary folder. When running the script locally, I can find this folder on my machine, but I’ve always wondered where this directory is located when the script runs in the Power BI service.

############### Create a temporary folder to store the SVG files for the tooltips ###############
TemporaryFolder <- tempfile()   # generates a unique path, but does not create it
dir.create(TemporaryFolder)     # actually create the directory

 

The following piece of code iterates over each state and executes the two functions I’ll discuss in the following sections. As stated before when discussing the data model, each state can contain multiple shapes, so the function ‘CreatePolygonFunction’ is executed once for each shape, based on the ‘Primary key’ column. The function ‘CreatePlotFunction’, on the other hand, is executed only once per state, not once per shape. Finally, the function ‘addPolygons’, which is part of the leaflet package, adds each shape with its corresponding tooltip to the map.

if (!(nrow(Values) == 0)) {
  for (state in unique(Values$State)) {
    Data <- Values %>% filter(State == state)

    # One tooltip chart per state.
    DataForPopulation <- Data[ , c("State", "Flags", "Population concatenation")]
    DataForPopulation <- unique(DataForPopulation)
    Plot <- CreatePlotFunction(DataForPopulation)

    # One polygon per shape (primary key) of the state.
    for (primary_key in Data$`Primary key`) {
      DataForPolygons <- Data %>% filter(`Primary key` == primary_key)
      DataForPolygons <- DataForPolygons[ , c("Primary key", "State", "Color", "Coordinates concatenation", "Max Color Threshold")]
      DataForPolygons <- unique(DataForPolygons)

      Polygon <- CreatePolygonFunction(DataForPolygons)
      LeafletMap <- addPolygons(LeafletMap, data = Polygon, label = Polygon$State, color = Polygon$Color,
                                fillColor = Polygon$`Max Color Threshold`, popup = Plot, weight = 0.8, fillOpacity = 0.3)
    }
  }
}

 

The internal function ‘internalSaveWidget’ saves the map as an HTML widget, which makes sure it renders in Power BI.

############### Create and save widget ###############
internalSaveWidget(LeafletMap, 'out.html')

 

I won’t discuss the code of the function ‘CreatePlotFunction’ in detail, but four main parts are executed when running the visual:

    • With the function ‘fread’ from the data.table package, I store the output of my measure (from Power BI) in a table.
    • The function ‘ggplot’ from the ggplot2 package is a well-known function to create different kinds of charts. As you can see, there’s a lot going on, and that’s because I want to show the year-over-year evolution of the population, so I’m using two y-axes. Since ggplot2 only supports a secondary axis as a transformation of the primary one, the YOY percentages are rescaled linearly onto the population axis (the constants a and b in the code; see the quick illustration right after this list). Other than this, I also use dynamic coloring for the data points: if there’s a positive year-over-year evolution of the population, the data points are colored green, otherwise red.
    • The ‘draw_image’ function of the cowplot package is used to add an image to an existing chart.
    • Last but not least, the chart is saved as an SVG file and stored in the previously created directory.
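Before the full function, here is a quick numeric illustration of that rescaling, with made-up ranges:

# b maps the YOY range onto the population range; a shifts it so the minima line up.
ylim.primary   <- c(4.0e6, 5.0e6)               # population axis range
ylim.secondary <- c(-0.01, 0.03)                # YOY axis range
b <- diff(ylim.primary) / diff(ylim.secondary)  # 2.5e7
a <- ylim.primary[1] - b * ylim.secondary[1]    # 4.25e6
# Forward:  a + yoy * b  maps -0.01 to 4.0e6 and 0.03 to 5.0e6.
# Inverse:  (y - a) / b  is exactly the sec_axis(~ (. - a) / b) transform below.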
############### Create a plot for each state ###############
CreatePlotFunction <- function(p_state) {

  tryCatch(
    {
      # Turn the literal "\n" separators from the measure into real newlines,
      # then parse the concatenated string into a table.
      Population <- gsub(p_state$`Population concatenation`, pattern = "\\n", replacement = "\n", fixed = TRUE)
      ColumnsPopulation <- c("State", "Year", "Population LY", "Population CY", "Population YOY", "Color Threshold")
      DataTablePopulation <- fread(text = Population, col.names = ColumnsPopulation, encoding = "UTF-8")
      DataTablePopulation <- DataTablePopulation[order(DataTablePopulation$Year),]

      # Unpivot last year's and the current year's population into one column for the dodged bars.
      NewDataTablePopulation <- melt(DataTablePopulation, id.vars = c('State', 'Year', 'Population YOY', 'Color Threshold'),
                                     measure.vars = c('Population LY', 'Population CY'), variable.name = "Population",
                                     value.name = "Sum of Population")

      # Linear mapping between the population axis and the YOY axis (see the illustration above).
      ylim.primary <- c(min(NewDataTablePopulation$`Sum of Population`), max(NewDataTablePopulation$`Sum of Population`))
      ylim.secondary <- c(min(NewDataTablePopulation$`Population YOY`), max(NewDataTablePopulation$`Population YOY`))
      b <- diff(ylim.primary) / diff(ylim.secondary)
      a <- ylim.primary[1] - b * ylim.secondary[1]

      CreatePlot <- ggplot(NewDataTablePopulation, aes(x = Year)) +
        geom_col(aes(y = `Sum of Population`, fill = Population), position = "dodge") +
        geom_point(aes(y = a + `Population YOY` * b, color = `Color Threshold`), size = 2) +
        geom_line(aes(y = a + `Population YOY` * b), size = 0.5) +
        scale_x_continuous("By Year", labels = as.character(seq(min(NewDataTablePopulation$Year), max(NewDataTablePopulation$Year), by = 1)),
                           breaks = seq(min(NewDataTablePopulation$Year), max(NewDataTablePopulation$Year), by = 1), expand = c(0, 0)) +
        scale_y_continuous(name = "Sum of Population", sec.axis = sec_axis(~ (. - a) / b, labels = scales::percent, name = "Population Threshold YOY"),
                           limits = ylim.primary, oob = scales::rescale_none, labels = scales::unit_format(unit = "M", scale = 1e-6)) +
        scale_colour_manual("Population Threshold YOY", values = c("Red" = "red", "Green" = "green"), labels = c("Red" = "Under Threshold", "Green" = "Above Threshold")) +
        scale_fill_manual("Sum of Population", values = c("light grey", "dark grey")) +
        ggtitle(NewDataTablePopulation$State) + theme(axis.title = element_text(size = 8), legend.title = element_text(size = 6), legend.text = element_text(size = 6),
                                                      legend.position = "right", legend.box = "vertical", plot.margin = margin(0, 0, 0, 0, "cm"))

      # Overlay the state flag in the upper-right corner, if a URL is available.
      if (!(str_trim(p_state$Flags) == "" | is.na(p_state$Flags))) {
        CreatePlot <- ggdraw(CreatePlot) +
          draw_image(image = p_state$Flags, x = 1, y = 1, hjust = 1, vjust = 1, halign = 1, valign = 1, scale = .2)
      }

      # Save the chart as an SVG in the temporary folder and return its markup as the popup.
      PathName <- paste(TemporaryFolder, paste0(p_state$State, ".svg"), sep = "/")
      svg(filename = PathName, width = 460 * 0.01334, height = 220 * 0.01334)
      print(CreatePlot)
      dev.off()

      StatePopup <- paste(readLines(PathName), collapse = "")
      return(StatePopup)
    },
    error = function(e) {
      stop(paste("Failed to draw a plot", e, "."))
    }
  )
}

 

To create the shapes, I use the ‘st_polygon’ function from the sf package. Before doing this, it’s important to convert my long string of coordinates into a matrix with two columns for the coordinate pairs. Besides that, it’s important to use the desired coordinate reference system (CRS) and to convert your shape (polygon) into a Spatial object, which is what ‘addPolygons’ consumes here.

############### Create a polygon for each state and primary key ###############

CreatePolygonFunction <- function(p_state_primary_key) {

  tryCatch(
    {
      # Split the concatenated coordinate string and reshape it into an n x 2 matrix.
      ConvertPolygonCoordinatesToNumeric <- as.numeric(unlist(strsplit(as.character(p_state_primary_key$`Coordinates concatenation`), split = ",")))
      ConvertPolygonCoordinatesToMatrix <- matrix(ConvertPolygonCoordinatesToNumeric, ncol = 2, byrow = TRUE)

      # Build the polygon, assign the coordinate reference system, and convert it
      # to a Spatial object with the state's attributes attached.
      CreatePolygon <- list(ConvertPolygonCoordinatesToMatrix)
      CreatePolygon <- st_polygon(CreatePolygon) %>% st_sfc(crs = 32611)
      SpatialPolygons <- as(CreatePolygon, 'Spatial')
      SpatialPolygons$State <- p_state_primary_key$State
      SpatialPolygons$Color <- p_state_primary_key$Color
      SpatialPolygons$`Max Color Threshold` <- p_state_primary_key$`Max Color Threshold`

      return(SpatialPolygons)
    },
    error = function(e) {
      stop(paste("Failed to draw a polygon on the map", e, "."))
    }
  )
}
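To see those steps in isolation, here is a minimal standalone sketch with made-up coordinates. Note that ‘st_polygon’ expects a closed ring, so the first coordinate pair is repeated at the end:

library(sf)

# A rough rectangle of (longitude, latitude) pairs; the ring is closed by
# repeating the first point as the last one.
coords <- matrix(c(-87.5, 30.2,
                   -87.5, 35.0,
                   -85.0, 35.0,
                   -85.0, 30.2,
                   -87.5, 30.2), ncol = 2, byrow = TRUE)

polygon <- st_sfc(st_polygon(list(coords)), crs = 4326)  # WGS84 here, for illustration
spatial <- as(polygon, "Spatial")                        # what addPolygons consumes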

 

Conclusion

The combination of custom visuals created in R and Power BI still needs improvement. Power BI typically uses the JavaScript/TypeScript SDKs, which are more reliable to implement and perform better when rendering your custom visuals. The following question/answer on the Power BI community explains why that is.

Whether R meshes well with Power BI really depends on how your R script is written. Are your functions complex? How many for loops do you use? Are you following best practices? If your code already performs slowly in R Studio when you test it, then it will surely be slow in Power BI.

Personally, I find R easier to comprehend, and R Studio is a great tool to debug your script. Regarding performance, it’s best to do all your calculations beforehand, in your data source or in Power BI, and use your R custom visual only to create the visual itself. If your script is not too complex, performance shouldn’t be a problem.
