Subscribe to SQL Server Magazine | See More Database Articles Here | Reprints | Or get the Monthly Online Pass—only $5.95 a month!
Main Article    Improving Analysis Services Query Performance
DOWNLOAD THE CODE:
Download the Code 41231.zip

To figure out which partition designs work best to reduce the number of sequential scans for Listing 2 and Listing 3 in the main article, I created six cubes, structured identically but with different partition designs. The idea is to partition along the Time and Occupant_Severity dimension members (which Figure 2 in the main article shows) that the queries in Listing 2 and Listing 3 are looking for so that they can get to those members directly without a scan. To create a new partition, you can use the Partition Wizard in Analysis Manager as Figure A shows and specify the data slice (i.e., tuple). (For more information about using the Partition Wizard, see SQL Server Books Online-BOL.) Table 1 in the main article summarizes the partitioning data slices of the six test cubes. I named the cubes by combining the names of their partitioning dimensions and the total number of partitions in the cube. For example, each of the 31 partitions of cube Year_Severity31 slices along the Time and Occupant_Severity tuples ([Partition Year].[1]), ([Partition Year].[2].[1997], [Fatal]), ([Partition Year].[2].[1997], [Injury A, Incapacitating]), and so on.

To study the effects of the number and speed of CPUs, disk I/O, and physical memory on the partitioned cubes, I executed Listing 2 and Listing 3 on six different servers. Table 2 in the main article gives the specifications of these servers ranging from the highest end to the lowest end in hardware resources. All the tests measured the response times for Listing 2 and Listing 3. Listing A shows a code snippet from the Visual Basic (VB) application that I used to drive the tests. Listing A's code uses an ADO recordset to send the query and return the result. The code computes the response time by calling the Win32 API GetLocalTime function immediately before the open and after the close of the recordset. During this window, the code retrieves one record at a time until it reaches the end of the recordset and displays each record in a graphical grid called MSFlexGrid.

Because Listing 2 and Listing 3 reference a large dimension level-[Street].[Street_List], which contains 27,691 members-and involve the (FILTER) function, they must both always execute on the server. This requirement means that we can observe the performance of both client and server. Listing 2 and Listing 3 have only two main differences. Listing 2 contains six calculated members, whereas Listing 3 involves none. And Listing 2 references three year members (1998, 1999, and 2000) in each of its six calculated members, but Listing 3 references only the [All Time] member.

To study the effects of server (Analysis Services) cache and client (PivotTable Service) cache and compare those effects with the effect of partitioning, I maintained both cold and warm caches in my testing. According to BOL, if you don't set the PivotTable Service property Default Isolation Mode, the cursor type that the Recordset requests determines whether PivotTable Service is in isolation or non-isolation mode. If PivotTable Service is in isolation mode, it never refreshes its cache. For my test, I set the cursor type to adOpenForwardOnly, which caused the PivotTable Service to keep the client cache warm between queries-in other words, PivotTable Service didn't flush the client cache, so I could study the performance effects of a warm client cache. Table A shows the test results I got for cold and warm caches. Here's how I achieved cold and warm caches for my tests:

  • Cold Caches: Neither the server nor client caches contain the queried data. I simulated cold caches by restarting Analysis Services.
  • Warm Server Cache: Only the server cache contains the results of the test query. I simulated warm-server and cold-client caches by restarting the client application so that the client cache was flushed but the server cache remained warm.
  • Warm Server and Client Caches: Both server and client caches contain the queried data. I simulated warm server and client caches by keeping the client connection and immediately resending the same query.

Performance Results
Figure 3 in the main article shows the performance of Listing 2's query on all the servers, and Figure B shows the performance of Listing 3's query. Surprisingly, for both queries, the performance is exactly in reverse proportion to server power. For example, in Figure 3, you can see that for the Default1 and Severity6 cubes, the high-end servers are twice as slow as the low-end servers. For the other cubes, which all have partitions in the Time dimension, the gap decreases, but the low-end servers are still faster. Figure B shows that for Listing 3, the high-end servers still perform significantly worse than the low-end servers. Partitioning doesn't reduce the performance gap between high- and low-end servers for Listing 3 on any cube. Note that these results are consistent regardless of how I change memory and thread settings. For example, I changed the settings on High1 from the default 16 threads to 2 threads to mimic the low-end servers. I also tried setting a higher value for minimum allocated memory-from the default 511MB to 2GB. The changes made no difference in High1's performance.

From these results, we can draw an important conclusion: Partitioning along queried data slices leads to the best performance. For example, the partitions in the Year_Severity31 cube and the PartitionYear_Severity7 cube produced the best performance for Listing 2. The Severity6 cube produced the best performance for Listing 3's query (which doesn't ask for Time members).

   Prev. page   [1] 2     next page
 
 

ADS BY GOOGLE