If you've worked in any technology-related field, you've probably heard the term cube thrown around, but most traditional DBAs and database developers haven't worked with them. Cubes are powerful data constructs for rapidly aggregating multi-dimensional data. If your organization wants to perform data analysis on large volumes of data, a cube is the ideal solution.
What Is a Cube?
Relational databases were designed to support thousands of concurrent transactions while maintaining performance and data integrity. By their very design, relational databases fall short in large volume data aggregation and retrieval. To aggregate and return large volumes of data, a relational database must receive a set-based query that asks for a set of data to be aggregated on the fly. These relational queries are very costly due to their reliance on multiple joins and aggregations, so relational aggregation queries perform poorly when operating on large data sets.
Cubes are multidimensional entities that address this weakness in relational databases. With a cube, you can provide users with a data structure that facilitates rapid responsiveness for large-volume aggregation queries. Cubes perform this aggregation magic by pre-aggregating data (measures) across multiple dimensions. The cube's pre-aggregation (normally) takes place when a cube is being processed. When you process a cube, you're creating pre-calculated aggregations of data that are stored in binary form on disk.
A cube is the central data construct of an OLAP system such as SQL Server Analysis Services (SSAS). Cubes are (usually) constructed from an underlying relational database called a dimensional model, but they're separate technical entities. Logically, a cube is a data repository that is composed of dimensions and measures. Dimensions contain descriptive attributes and hierarchies while measures are the facts you are describing with dimensions. Measures are combined into logical groupings called measure groups. You tie dimensions to measure groups based on a granularity attribute.
In the file system, a cube is implemented as a series of related binary files. The binary architecture of a cube facilitates its fast retrieval of large volumes of multi-dimensional data.
I mentioned that cubes are constructed from an underlying relational database called a dimensional model. A dimensional model contains relational tables (fact and dimension) that correlate nicely to a cube's entities. Fact tables contain measurements such as the quantity of a product sold. Dimension tables store descriptive attributes such as product names, dates, and employee names. Generally, fact tables are related to dimension tables through primary-foreign key constraints (this relational join correlates to the cube's granularity attribute I spoke of earlier), with the foreign keys residing in the fact table. When dimension tables are directly related to a fact table, a star schema is formed. When dimension tables aren't directly related to a fact table, a snowflake schema is produced.
Note that dimensional models are categorized according to their scope. A data mart is a dimensional model designed for a single business process, such as sales or inventory. A data warehouse is a dimensional model designed to encompass multiple business processes, and thus facilitates cross-business process analytics.
Be Prepared: Software Requirements
Now that you have a basic understanding of what cubes are and why they're important, I'll switch gears and take you on a step-by-step tour of building your first cube using SSAS. There are some basic software components you'll need in place before building your first cube, so make sure your system meets these requirements before proceeding.
My sample Internet Sales cube will be built from the AdventureWorksDW 2005 sample database. I'll be building the sample cube from a subset of the tables found in the sample database that are useful for analyzing Internet sales data. Figure 1 shows these tables in a basic database diagram. Because I'm using the 2005, you can follow along with my directions using either SQL Server 2005 or SQL Server 2008. The AdventureWorksDW 2005 sample database can be found on the CodePlex website at msftdbprodsamples.codeplex.com.
 |
Figure 1: Subset of the AdventureWorks Internet Sales data mart. Click to expand. |
As mentioned above, you need access to an instance of SQL Server 2008 or 2005, including the SSAS and Business Intelligence Development Studio (BIDS) components. I'll be using SQL Server 2008, so you might see a few subtle differences if you're using SQL Server 2005.