Use counts to analyze textual information
When people think about OLAP, they think about analyzing numbersfinancial data or sales data, for example. However, many OLAP solutions exist outside traditional numeric applications. Some of the most valuable analysis solutions don't start with numeric information. You can analyze any databasewith or without numeric informationby using counts. A count can measure the quantity of transactions such as sales, calls in a support center, or defects in a product. Count analysis can be a very revealing exercise because it can show trends over time or relationships between numbers (e.g., the count of sales in Europe versus the count of sales in the United States). Let's examine count analysis in detail, then look at an example of how you can analyze data that has no numeric information.
You can easily aggregate textual information as counts, and you can use counts to derive trends or percentages. For example, say you have a simple text database that includes an entry for each article in SQL Server Magazine. The database would include fields containing such information as title, author, topic, applicable SQL Server version, and date published. Each of these fields is a candidate to become a dimension. The primary measure might be number of articles. Another dimension could be SQL Server version, in which the count would tally articles about SQL Server 2000 within the SQL Server 2000 dimension member and SQL Server 7.0 articles within the SQL Server 7.0 dimension member. To analyze the mix of content in the magazine, you could derive a new dimension member (a calculated member) in the SQL Server version dimension that measures the ratio of SQL Server 2000 articles to SQL Server 7.0 articles. Charting this new dimension member over time would reveal how the ratio of SQL Server 2000 articles to SQL Server 7.0 articles has changed. I suspect you'd see a spike of SQL Server 2000 content around the time of the initial release of SQL Server 2000, then a drop followed by gradual growth as the SQL Server 2000 product matures.
The previous example demonstrates how you might use a text database analysis; count analysis is also appropriate for databases that contain numeric information. For example, Analysis Services' FoodMart database includes a measure called Sales Count, which is a simple count of all the sales transactions in the database and is useful for determining statistical averages such as Sales Average (Store Sales divided by Sales Count). Sometimes a count measure isn't useful, but the average (or other calculated measures such as ratio) derived from the count is. In such a situation, you can create a count measure but mark it in Analysis Manager as hidden. A hidden count measure is available only for creating calculated measuresit isn't available for use directly.
Count measures can be easy or complex. In the FoodMart database, setting up the Sales Count measure is simple. You just create a new measure based on the fact table's primary key and set the measure's type to count. But let's look at a more complex example. Say a table called fact contains line items (entries) for invoices. An invoice can contain one or more entries. So you probably want a count measure that counts invoices, not entries. To count invoices, you want to count only the groups of entries that make up an invoice.
One way to solve this problem is to create a distinct count measure based on the fact table's invoice number column. This measure will give you the invoice count values you want, but distinct count measures have two serious limitations. First, each cube can contain only one distinct count measure. Second, Analysis Services doesn't aggregate distinct counts through dimension levels as it does other measures. The distinct count values are predetermined and stored at each cell with other measures, so you can't create new aggregations during an MDX query's execution. In an MDX query or a calculated member definition, using a function such as
Aggregate( [USA], [Mexico] )
won't work with a distinct count measure selected; determining the result of the function would require rescanning the fact table because the cube doesn't contain enough information to determine the function's result. Analysis Services can't rescan the source table, but even if it could, the process would be prohibitively slow. The effect of this restriction is that distinct count measures don't generally work well with other calculated members or sets.
A second solution is to create an extra column in the source table to store an invoice count. Fill one entry for each invoice with a value of 1; fill all other instances of the invoice count field with values of 0. You can then create an Invoice Count measure that's a sum of this invoice count column. This solution works as long as you select in the cube a cell that encompasses a group of entries that make up complete invoices. If your cell includes only some of the entries in an invoice, the invoice count column might not include the entry that contains the 1 value and thus would produce a sum of 0 instead of 1 for that invoice.
A third solution is to use a combination of the two approaches. Create an invoice distinct count measure, an invoice sum count measure, and an invoice count calculated measure that helps you determine which of the other two measures to use based on the cell that's selected. The invoice distinct count measure will return the correct answer when only some of the entries in an invoice are selected, and the invoice sum count will work in all other situations. The invoice sum count also gives you the benefit of working when custom combinations of members are selected. This invoice count example shows that, in real-world situations, count measures can get complicated because the count might depend on a distinct combination of a group of fact table columns.
However, count-based analysis can be valuable. Consider the following solution, which was created for the prosecuting attorneys office of a county criminal justice department. Such an organization frequently has to respond to requests for crime statistics. The department maintains the criminal case filings in a custom application that uses DB2 on an AS/400 system, but the application wasn't designed for analysis, so it can only look up individual cases or report about a series of cases according to simple criteria.
Prev. page  
[1]
2
next page