DOWNLOAD THE CODE:
Download the Code 24020.zip

Hide confidential information by suppressing measure values within a dimension

Editor's Note: Joining Russ Whitney as coauthor this month is Paul Goldy (pgoldy@proclarity.com), senior consultant with ProClarity. Paul plans and implements OLAP solutions with SQL Server 2000 Analysis Services.

Most OLAP implementations exist to provide summarized information for decision makers. When decision makers discover summarized information in the dimension's upper layers that leads to further questions, they can drill down into more detailed information. The added detail can help the decision makers determine what business factors contributed to the summarized values. However, in some OLAP implementations you must limit the detailed information that a business analyst can see either for confidentiality or for legal reasons. For example, in a corporate financial analysis, you might need to make detailed expense information available to the decision-making managers of your company, but you don't want all individual salaries available because of their confidential nature. Or secondary-education administrators might use students' standardized test scores to analyze the effectiveness of certain schools, teachers, and curricula, but because of certain laws governing confidentiality, the administrators can't view individual students' test scores. Analysis of a local criminal-justice system might provide local crime statistics to the media, but you'd need to avoid revealing detailed data about individual judges, prosecutors, and defendants.

This kind of data-hiding problem occurs frequently, so we wanted to show you the process we used to create one real-life solution. We created this analysis solution for a data brokerage that collects and sells point-of-sale retail data. Retail outlets such as Best Buy, CompUSA, and Wal-Mart track scanned data and sell the data to product manufacturers such as Hewlett-Packard (HP), Lexmark International, and Canon. The manufacturers analyze the scanned data and use the analysis to optimize their marketing and sales strategies.

If competing manufacturers will likely view the data, you need to hide certain data to prevent unfair competition between manufacturers. A manufacturer can see trends or summarized information about a competitor but not the exact pricing and volume of specific brands within individual retail chains. To help enforce this concept, data brokers implement data-suppression rules that indicate which detailed information in the OLAP system needs to be suppressed. For example, if a brand-name beer from the FoodMart 2000 Sales cube (e.g., Good, Pearl, Portsmouth) sells in two or fewer Store Names (in the Store dimension), the broker's data-suppression rules specify that the brand's measure values must be suppressed. The suppressed value doesn't disappear but falls into the ALL OTHER BRAND member.

Another requirement of this suppression rule is that individual store names from the Store Name level of the Store dimension can't be visible to the analyst. Yet, the analyst still has to check against the members of that level to count the number of retailers reporting information for the brand. So selecting the exact data to suppress can be a complex task.

Figure 1 and Figure 2 show what the results look like for queries that don't use suppression compared with queries that do use suppression, respectively. Figure 1 shows the Unit Sales of beer brands for some cities in Washington and the store names within those cities. Figure 1 also shows the member ALL OTHER BRAND, which is unpopulated because you're seeing Unit Sales as it typically comes out of the Sales cube. Contrast this level of detail with the level of detail in Figure 2. The query that generates Figure 2's results is nearly the same, but no store names are available for the cities, and using the measure Suppressed Unit Sales enforces the suppression rule that requires that a brand be reported in two or more stores within each city. Thus, in Figure 2, the ALL OTHER BRAND member is populated and the brand name values are empty. The results show only one store in each city (because each city actually has only one store), so all the Unit Sales values fall into the ALL OTHER BRAND member.

We could best illustrate this problem by showing a typical cube from a data broker. However, these cubes aren't publicly available. So, we modified a copy of the FoodMart 2000 Sales cube to show how the suppression rule is implemented. First, we made a copy of the Store dimension, renamed it Hidden Store, and set the Dimension Visible property to false. Second, we made the Store Name level of the Store dimension non-visible by setting the Level Visible property to false. Third, we added an ALL OTHER BRAND record to the product and product_class tables in the FoodMart 2000 Access database, as Figure 3 shows. (Note that the product and product_class tables are related by the product_class_id column.) After we made all our changes, we rebuilt the Product dimension and reprocessed the Sales cube.

The solution to this suppression problem has three parts. First, we hid the Store dimension's Store Name level. Hiding this level is tricky because the level must still be available to the calculated member so that the MDX that enforces the suppression rule can determine which brands to suppress. Second, we created a new Product dimension member called ALL OTHER BRAND that will display the Unit Sales values that are suppressed. We created this dimension member by adding another record in the Product dimension tables. Finally, we used a calculated measure to suppress the Unit Sales values for brands that sell into two or fewer stores. This measure calculates the Unit Sales value for ALL OTHER BRAND.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.