Power BI: Transform Data with R

#

A common use case for R is its ability to reshape, wrangle and clean data sets before analysis. It's particularly good for automating this process by scripting it, especially when you have a lot of files to work with. But I still believe there are more convenient tools for ad hoc data manipulation - Excel and Power Query in particular are stand out tools for their ease of use and their strengths in munging multiple data sources and applying transformations along the way.

Despite being predominantly a tool for data science and analysis, Microsoft has now incorporated R inside Power BI for its data wrangling capability, and I've recently had the chance to use and test it out in a real work capacity.

What it can and can’t do

I was pleased with how easy some of the basics R capabilities were to find and how well things worked in general. I wanted to investigate how well R integrates into Power BI currently and experiment how much of my normal data manipulation I can do with R without having to leave Power BI.

At a high level, it can:

  • Allow you to use an R script as an input – Get Data has an R script option.
  • Run R visualisation functions on data – R Custom Visuals allow you to use R code to visualise instead of Power BI’s visualisation options.
  • Run scripts on existing data – transform data with an R script embedded in Power Query.

It can’t:

  • Output complex objects like lists – the output of R scripts must be data frames, which will form the tables that Power BI needs to create reports and dashboards.
  • Use any R library on the web service – you will need to test which libraries are compatible before publishing to the web.
  • Handle more than 150,000 rows. When it encounters larger data frames, it will take the top 150,000 rows and send a warning message to the user.

 

Importing from an R Script

Make sure you have the latest updates from Power BI: 

  • Go to Get Data -> More… and select the R Script options.
  • R scripts that you copy and paste in will be embedded in the Power BI report.
  • You may want to consider using source() for particularly long scripts, however be warned that you will not be able to publish the report with a reference to your local machine.

 GetData

You can specify an input script referencing a flat file or an RData file with data frames. The output must include a data frame - these provide the tabular data that Power BI needs. In this example, I simply referenced the dataset cats from the MASS library.

  Navigator

Creating a custom R Visual

If you want to create an R visualisation that is not yet available in Power BI:

  • Visualisations -> R script visual
  • Select the columns you want – these will be included in an input data frame called dataset that you can reference in your R script.
  • Write whatever plot functions you want to use.

 Plot

The output is a static image that gets embedded into your report. Therefore, you won’t be able to scroll over to view tooltips and drill down into the data on the tile just yet. However, you can change the values in the Filters dropdowns and regenerate the PNG image with each change. It’s a good idea to make sure that your R code runs relatively quickly.

Transforming Datasets with an R Script

You might rather perform a series of steps on some preloaded data using an R script. This could mean cleaning a data set loaded in via Excel, creating new columns from a SQL database source and pivoting a dataset. Since R is now integrated into Power Query, you can create an Execute R Script as a separate step at the beginning, middle or end of a data cleaning process.

  • Edit Queries -> Transform tab -> Run R Script
  • Copy and paste the R code, making sure to reference your dataset.
  • If the script returns a single data frame, it will take the last data frame and insert it into your Power BI model as a data set.
  • If you have a choice of data frames, then you can edit the query in the Advanced Editor to retrieve the data frame you want:

Script

The M language creates objects at each stage that you can reference and pull data from. Using the syntax {[Name="cats"]}[Value] I can reference a specific data frame created by an R script and use it in the next stage of my process.

 Cover

Gotchas

  • When trying to transform data with an R script, you might run into an issue with permissions (known as privacy levels in Power BI.) Go to Edit Queries -> Data source settings and adjust.
  • You must connect directly to a data source in your Power Query code. That means you can’t use R scripts with references, or you will return an error like the following:
“Formula.Firewall: Query 'IncurredData (2)' (step 'Run R Script') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.”
  • Power BI does not have its own version of R included. It will look for a version of R on your local machine.
  • Publishing – you won’t be able to publish local references to Power BI service. Make your code maintainable with headers and copy and paste the whole script into the R code editor.

Conclusions

Considering that visualisation in R isn’t actually my strongest skill, I can compensate by reading (or building) new datasets using R scripts and then by creating normal Power BI reports and dashboards over the top of them. This functionality is also particularly useful to me when I have to clean up or reshape datasets on the fly – instead of moving data files around externally, I can perform all of the manipulations in my environment without having to leave Power BI.

There are still some inefficiencies in the deployment of the R tool. Small changes at the visualisation level cause the R script to be rerun, which are fine so long as the script is fast, which sadly is not always going to be the case. Something a bit cleaner and less costly would be preferable, but it will do for now.

Overall, I am surprised at how well this works, considering that other Microsoft R tools have been a bit buggy or incomplete. I am looking forward to seeing if and how Microsoft will improve this functionality in future versions of Power BI.

References

https://powerbi.microsoft.com/en-us/blog/data-cleansing-with-r-in-power-bi/ 

https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-r-in-query-editor/

https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-privacy-levels/

 

Speak to an expert

Head of Innovation and Joint founder

Gordon Meyer

Contact Gordon

Latest from this author