I have recently been analysing some of the Bank of England (BOE) NMG household survey data with R. To do that, I first had to learn what R is and how it works.
This blog is split into the following parts:
- What is R?
- BOE Data
What is R?
I have been using the RStudio development environment for approximately a month and I think it's a great tool. It gives me access to a lot of R libraries that I can use to plot the data in different ways, each giving a different visual view of the data. These libraries allow me to do something as simple as a bar chart or something more complex, like a plotted map or a motion chart. I am sure there are more complicated libraries in R which I haven't come across yet!
I think if anyone has data and wants to plot it in a specific way, RStudio is the software to use, not least because it is free.
Not to put you off the software, but as with any software I did have difficulty understanding some of the library concepts and dealing with errors. It isn't the best at telling you exactly what the error is when you make one.
To read more about R in a formal fashion, you can visit https://www.r-project.org/about.html.
Note: R is the language, and RStudio is the development environment in which you write R code and view its output.
BOE Data
The Bank of England published 6 datasets, from which I chose the NMG household survey data. The data spans 11 years, from 2004 to 2014.
This survey was carried out throughout the UK. The data is very diverse: it covers all regions and takes input from different types of people, i.e. people with different education levels, marital status, age range, job status, and much more. There are approximately 280 different variables recorded.
I have used the following data to test the statistical powers of R:
- Age group
- Region – Comparing the data across regions to see how it varies from one region to another.
- Job Status – Employed, unemployed and retired people throughout the UK.
- Tenure – Property tenure.
The data is stored in SQL Server, and I load it from there into a data frame in R so that I can perform analytics on it.
The data that I have mentioned above will be used for different analytical purposes. Comparing the percentages of employed, unemployed and retired people lets us understand how healthy employment is across the UK. The main visualisation method I use is plotted maps to show the data.
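As a rough idea of what that load step can look like, here is a minimal sketch using the DBI and odbc packages. This is an assumption on my part rather than a record of the exact code used: the server name, database name, driver string and view name are all hypothetical placeholders that you would replace with your own.

```r
# Minimal sketch: read a SQL Server view into an R data frame.
# Assumes the DBI and odbc packages are installed and that a SQL Server
# ODBC driver is available. All connection details below are hypothetical.
load_nmg_view <- function(server   = "my-sql-server",   # hypothetical
                          database = "NMG",             # hypothetical
                          query    = "SELECT * FROM dbo.vw_RegionJobStatus") {
  con <- DBI::dbConnect(
    odbc::odbc(),
    Driver             = "SQL Server",
    Server             = server,
    Database           = database,
    Trusted_Connection = "Yes"   # Windows authentication
  )
  # Always close the connection, even if the query fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # dbGetQuery() returns the result set as a data frame
  DBI::dbGetQuery(con, query)
}

# Usage (needs a reachable server, so it is left commented out):
# region_jobs <- load_nmg_view()
# str(region_jobs)
```

Wrapping the connection in a function keeps the connect, query and disconnect steps together, so the connection is not left open if something goes wrong.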
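To illustrate how percentages like these can be worked out in R, here is a small sketch in base R. The data frame is a toy example with invented rows and made-up region names, not the NMG data, and the column names are my own assumptions.

```r
# Toy respondent-level data: one row per person.
# These rows are invented for illustration only.
survey <- data.frame(
  region = c(rep("Region A", 4), rep("Region B", 5)),
  status = c("Employed", "Employed", "Unemployed", "Retired",
             "Employed", "Employed", "Employed", "Retired", "Retired")
)

# Count respondents by region and job status
counts <- table(survey$region, survey$status)

# Convert counts to percentages within each region (each row sums to 100)
pct <- round(prop.table(counts, margin = 1) * 100, 1)
print(pct)
```

Setting `margin = 1` makes each region's percentages add up to 100, which is what makes the regions comparable with each other regardless of how many people were surveyed in each.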
Below is an example of how the map will be plotted:
Firstly, I decided to use the regions and job status data to plot on a map.
To do this, I first had to make some views which would help me combine the data and present it in one data frame. Then I loaded the data into R and plotted as below.
When you click on a point, it displays the data for that region, as you can see on the map.
The following are the stats of the other regions:
| Region | Employed % | Unemployed % | Retired % |
|---|---|---|---|
| Yorkshire & Humberside | 62.1% | 8.9% | 28.9% |
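A clickable map like the one described above can be sketched with the leaflet package. Treat this as one possible approach under my own assumptions rather than the exact code behind the map: the choice of leaflet, the column names and the coordinates are all mine. The percentages are the Yorkshire & Humberside figures from the table, and the latitude and longitude are only approximate, chosen to place the marker.

```r
library(leaflet)

# One row per region. Percentages come from the table above;
# the coordinates are approximate and only used to position the marker.
regions <- data.frame(
  region     = "Yorkshire & Humberside",
  lat        = 53.9,
  lng        = -1.3,
  employed   = 62.1,
  unemployed = 8.9,
  retired    = 28.9
)

# Build the HTML text shown when a point is clicked
regions$popup <- sprintf(
  "<b>%s</b><br/>Employed: %.1f%%<br/>Unemployed: %.1f%%<br/>Retired: %.1f%%",
  regions$region, regions$employed, regions$unemployed, regions$retired
)

# Base map tiles plus one clickable circle marker per region
region_map <- leaflet(regions) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat, popup = ~popup)

# print(region_map)  # uncomment to show the map in the RStudio Viewer
```

Adding more regions is just a matter of adding more rows to the data frame, since leaflet draws one marker per row.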
After I made the map, I decided to make a motion chart using googleVis functions.
The image above is a motion chart. Notice that there is a year slider at the bottom and the ability to select various regions.
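For anyone who wants to try this, here is a minimal sketch of a googleVis motion chart. The data frame holds placeholder values with made-up region names, invented purely to show the shape the function expects (one row per region per year), and the column names are my own assumptions. As far as I am aware, Google's motion chart relied on Flash, so it may not render in current browsers.

```r
library(googleVis)

# Placeholder values, invented only to show the expected data shape:
# one row per region per year.
employment <- data.frame(
  region         = rep(c("Region A", "Region B"), each = 3),
  year           = rep(2012:2014, times = 2),
  employed_pct   = c(60, 61, 62, 55, 56, 58),
  unemployed_pct = c(10, 9, 8, 12, 11, 10)
)

# idvar identifies each bubble (the region); timevar drives the year slider
motion <- gvisMotionChart(employment, idvar = "region", timevar = "year")

# plot(motion)  # uncomment to open the chart in a browser
```

The `idvar` column is what gives you the region selector, and the `timevar` column is what the year slider moves through.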