Multi-Dimensional Data Part I: Building a Cube

This is an introduction to building cubes for those with little to no experience of doing so; a guide through the basic process and some best practices that I have picked up from colleagues and mentors.

Software: SQL Server 2014, SSMS, SSAS, SSDT for Visual Studio 2013.

Note: SSAS has not changed much since 2008 R2, so using any other help that applies to 2008 will be applicable to your cube.

If you are looking for a whistle-stop tour of building a cube in SQL Server Data Tools (SSDT), then you have come to the right place. I will guide you through building and adjusting some views in a database engine that will serve as your data source, then into creating a Data Source View (DSV) from these views, and finally the dimension and fact tables that will make up your cube. We’ll be doing this on a small scale to ease you into the process. As a forewarning, the tables that I build the views off are not called Dims and Facts, but rather ‘Entities’ and ‘Links’, as the Warehouse I use is a Link Model, a variant of Kimball and Data Vault, taking the best features from each.

Building the Views

NSCubePt1.1.Base Tables
Fig.1 – Base Tables

As you can see from Fig.1, I have a set of tables in the [base] schema of an ANA (Analysis) database that are created and populated by a stored procedure from the DWH (Data Warehouse) database. In Fig.2 I have drilled down to view the columns – we will remove the row related information of dates and load, keeping the necessary Primary keys that link to what will become our fact table ([base].[LinkVisit]), and the corresponding data. E.g. kBrowser, Browser and BrowserVersion.

 

NSCubePt1.2.Base Table Columns
Fig.2 – Table Columns

Depending on the data you are using and how you want the end user to use the cube, you can be as simplistic or extravagant with your views as you like. Best practice would dictate that you should do as much of the data manipulation as possible at the database level. For this exercise we will be keeping things as simple as possible. In fig.3 You will see a CREATE VIEW statement that simply selects the three columns we want from the corresponding [base] table. It is good practice to create a new schema for your views to inform any other users of the database of what these are used for; I have called mine [cube].

Repeat this for each of your [base] tables until you have a set of basic views. You must also create a view of your fact table as this will provide your measures in the cube. Be careful in making sure that each dimension you create has a key column that is referenced in the fact table; if there is no link between the two tables then it is not worth having in the cube.

 

NSCubePt1.3.CreateDimBrowser
Fig.3 – CREATE VIEW

 

Setting Up Your SSDT Project

Let’s get started on building your cube. Open up SSDT. On the left-hand side of the window there should be a New Project option, which will open up a new window to enter the project properties. We will be creating an Analysis Services Multidimensional and Data Mining Project as highlighted in fig.4 – don’t worry about the data mining aspect of this as we will not be including this as part of our cube. Give your project a name and a location (file path), and we are ready to go.

NSCubePt1.4.NewProjectWindow
Fig.4 – New Project Window

Provide a Data Source

NSCubePt1.5.DataSourceWizard
Fig.5 – Data Source Wizard

NSCubePt1.6.DataSourceConnection
Fig.6 – Connection Manager

SSDT makes much of the beginning of the process quite simple, in that the objects in the solution explorer are almost in order of which should be created first. The Data Source comes first, followed by the DSV. Right click on Data Sources and choose New Data Source to bring up the wizard. Create a connection to the database where your views are stored. Select New to create a new connection (fig.5), remember to select the database at the bottom of the Connection Manager window, as seen in fig.6. The penultimate window is the Impersonation Information. These are the Windows credentials Analysis Services will use to connect to the Data Source. If you are doing this locally, then select “Use the credentials of the current user”.

Finally, give your Data Source a name. It’ll help other users/editors of the cube – if there are any – to know where the data is coming from. Hit Finish and you now have a data source.

 

Create Your Data Source View

NSCubePt1.7.DSVWizardwithAdvanced
Fig.7 – DSV Wizard with Advanced Options

As with Data Source in the solution explorer, right click on Data Source View and select New… to bring up the wizard. As you can see in fig.7, it is possible to create a new data source from this window, but it will just take you to the same Data Source Wizard as before. If you need to restrict the schemas in the source you can in the Advanced … settings also in this window. Select your data source and move to the next window.

The wizard should now ask if you want to create logical relationships between the tables. By setting foreign keys to match primary keys, the DSV will do some of the hard work for you.

NSCubePt1.8.DSVSelectViews
Fig.8 – View Selection

NSCubePt1.9.DSVComplete
Fig.9 – Completing the DSV

If you have created a schema unique to your data source and restricted the project in the advanced setting, selecting your views will be easy. As in Fig.8, you can select all views. In the Final window, Fig.9, give your DSV a name and check the views that will be visible.

NSCubePt1.10.DSVLogicalPrimaryKey
Fig.10 – Logical Primary Keys

Once the wizard is complete you should now be looking at group of boxes. If you look back at fig.3, I hadn’t provided a primary key for my views. This is not a problem as they can be created in this view of the DSV by clicking and holding on the k in the vwFact table and dragging across to the matching k in the corresponding vwDim. You will be asked if you want to create a logical Primary Key, as in Fig.10. Click Yes.

The Cube

NSCubePt1.11.FinalDSV
Fig.11 – Final DSV View

Now that your tables are all linked in the DSV (fig.11), it is time to create your cube. There are two options: 1. Create your cube now, creating your dimensions in the process… or 2. Create your dimensions first, then add them to a new cube. We’re going with option 1. Right click Cubes in the solution explorer and choose New. Here is your third and final wizard.

NSCubePt1.12.CubeWizard
Fig.12 – The Cube Wizard

As we have already created a data source and populated the DSV you will want to choose Use existing tables as the creation method (fig.12).

The next window asks you to choose your Measure group – in this case vwFactVisit – followed by a window in which to choose your distinct measures. Select the primary key of your fact table as the measure to keep it simple.

NSCubePt1.13.CubeComplete
Fig.13 – Cube Wizard Completion


To complete the wizard, select the dimensions you wish to use from the list of tables. Click 
Next and you should be on the final window where the last option is changing the name of the cube (fig.13). Hit Finish and your star schema cube will appear, hopefully looking a bit like fig.14. This is all well and good, but you’re not quite done yet. As you can also see in fig.14 I’ve done a bit of tidying up. In the Properties window on the left you can see the name-change of the IP Address Dimension, as well as all the others under each Dimensions tab. These are just aesthetic changes to make everything cleaner and easier to read – you will notice that the ID property of any object or attribute will remain the same as it originally was.

 

NSCubePt1.14.CubeView 1024x485
Fig.14 – Cube View

Congratulations, you now have a basic cube. But you’re not over the finish line yet. If you right click and open any of the dimensions in the Solution Explorer, you will find very little. In order to make your dimensions useful a few things need to happen:

  1. Create your Attribute Relationships (fig.15)
    NSCubePt1.15.AttributeRelationships

    Fig.15 – Attribute Relationships

  2. Create your Hierarchies
    • Natural (fig.16)
      NSCubePt1.16.DateHierarchy

      Fig.16 – Natural Hierarchy for Date Dimension

    • Unnatural
      • Not created in this cube as unnecessary for data
      • See here for examples

  3. Hierarchies do not have to use all attributes, but all attributes that are used as KeyValue, or Name Columns should be included in the attribute list

  4. Clean up what will/will not be visible to the end user
    • Keys should not be visible. In this example, the easy way to get around this is by using the FullDate  attribute to mask the key as the lowest level of the hierarchy (fig.17 and 18)
      NSCubePt1.17.KeyAttribute

      Fig.17 – Key Attribute

      NSCubePt1.18.KeyMask

      Fig.18 – Masking the Key Column using Properties

  5. Test everything. In your Cube window use the Browser tab to test your measures against your dimensions (fig.19). If you do not get the desired results, check your dimensions/attributes’ properties
NSCubePt1.19.CubeBrowser
Fig.19 – The Cube Browser

The good thing about building a small cube with your first attempt is that the small data set means very little processing time. My measure group was a little over 1000 distinct values with a few dimensions containing no more that 100 distinct values. In building mine, I must have reprocessed the entire thing more than a hundred times, but if you are thinking of doing this with a larger database, then consider creating a sample set to build your cube from. Do this in the views within the database, rather than trying to shrink it in SSDT.

For more in-depth tutorials on each section I have touched on, the videos here provide some great information, even if they focus on SSAS 2005 and 2008 – many of the principles are the same. And, as always, MSDN has all the information you could want from here onwards.

Latest from this author