We found out slightly too late about the Bank of England Visualisation Competition to investigate the data and find some interesting results, interesting enough to present a visualisation for the competition. But in any case, it is a very rich set of data, that is termed the ‘One Bank Research Datasets’.
The data itself comprised 7 sources
- This dataset comprises individual companies perception on past and future economics based on 11 questions. Covering 2008-2012 by quarter.
- This dataset contains the Bank of England annual balance sheet information from 1696 to 2014.
- NMG – Nmgface and nmgonline. This dataset contains survey results performed online and face-to-face to individuals asking questions relating to demographic and household financial matters, with some questions around the perception of the future economic situation. Covering 2004 to 2014 by year.
- A dataset examining individuals’ attitudes to future inflation levels. Data collected quarterly from 2001 Q1.
- A dataset of quantitative easing history from 2009 to 2014.
- ‘Three Centuries’. This is a very rich dataset incorporating a diverse set of annual and quarterly/monthly data of GDP, employment and various economic data from 1688 to 2014.
As I started browsing the data in Excel, I found various missing items, discrepancies and coding differences. I guess this was always going to be the state of the data, considering firstly how old the data is and secondly that some data were generated through surveys that changed over time. I needed to have the data in a more robust source that could be easily manipulated, so my tools of choice are SQL Server 2014, Excel with PowerPivot and XLCubed. My colleague, Dafydd loaded the data largely as it was, into a SQL Server 2014 database. Realising that the diversity of the data and volume of the data, although not large with 132,000 rows and 50Mb, was fairly complex, I had to start narrowing my search for something ‘interesting’. I decided that I wanted to narrow my focus and look at the perception of price rises one year ahead, and analyse them by various demographic factors.
I chose the following tools to sift the data
- SQL Server. I chose a standard SQL database over a NoSQL flavour since the data were both fairly simple in structure and relatively low in volume.
- As a number crunching tool, this is fantastic to be able to create calculated columns on the fly and bring in other data from different sources. The data will be read from SQL Server into PowerPivot.
- There are options of using Excel’s ‘web sources’ to gather extra data to merge into the nop data. Also I often use PowerPivot and XLCubed addins, which have an interesting set of features. The final analysis is done in Excel with a pivot tabled linked to PowerPivot.
I created a SQL Server view to clean the nop data. Then created a PowerPivot model to categorise the various demographic variables into a meaningful set of analytic groups. I needed a set of actual CPI data one year in advance of the dates the questions were asked. I was hoping to use Excel’s web source, but unfortunately in this case the technology let me down, so I sourced the data from the ONS and cut/pasted it into an Excel tab.
I will go through some of my thinking for how I got to my final analysis template. I wanted to work with a median value of perception, so constructed an excel pivot table as below
The full range of the data is from 2001 Q 1 to 2014 Q 4, but in this heat map I noticed a discontinuity that occurred at the end of 2008, when there appeared to be more range of choices. Detailed investigation showed that the questions changed over time. This is not very unusual with long term surveys that the original research question is found to be slightly short sighted. One can see that the percentage of people who responded at the 5.5% level was over 40% for two of the four quarters before they changed, which illustrated that there was an increased perception that prices were going to rise by 5.5% or more in the next 12 months. Nothing could be done about this problem and we work with the data we can find. You will notice there are discrepancies between the median 12 month forecast and the actual value of the CPI 12 months ahead. I decided to utilise a MSE (mean squared error) methodology to determine how well the population could forecast CPI one year in advance. Below is an extension to the table showing the error and error squared.
Then the MSE is calculated summing the error squared values.
Finally I added a set of slicers to then show a subset slice of the data.
So as I changed values on my slicers, it recalculated the MSE of the 15 years of forecast data on the fly. The lower the MSE, the better that subgroup of the population were in estimating CPI, the higher, the worse.
- I played with various combinations of the demographic data supplied: age, region, income, education, house tenure, social class, work, sex. I found that the MSE ranged from 56 through to over 200. Obviously as I exercised the slicers, the number of samples decreased and the smaller the number of samples the higher the degree of variability.
- Women (MSE=81) were marginally better than men (MSE=83)
- The younger the age group, the better predictors they were. 15-24 y.o (MSE=56), 25-44 (MSE=75), 35-44 (MSE=88), 45-54 (MSE=80), 55-64 (MSE=86), 65+ (MSE=93)
- Generally more education gave a better prediction. GCSE (105). A Level (MSE=81), Degree (MSE=75). Unknown education (where the respondent refused to answer or it was not available) gave a very high MSE=211. There were only 3700 rows which as a sample size is quite small for this type of analysis.
- With regard to class, the higher the class the better the prediction. AB (MSE=75), C1 (MSE=81), C2 (MSE=85), DE (MSE=100)
- For income, the lower the income the worse the prediction. <£9500 (MSE=90), £9500-17499 (MSE=81), £18000-24999 (MSE=79), >£25000 (MSE=76)
Broadly an upper middle class, well educated, high income, young people are the best predictors of inflation one year into the future. It would be interesting for a more robust follow up piece of research to be performed that targets specific demographic groups as a crowd-sourced inflation prediction system.