Where Bottlenecks Occur
The Snapshot and Distribution agents perform their tasks according to design, but a couple of these tasks can cause serious bottlenecks that compromise performance. Let's evaluate the agents' performance by applying a standard set of metrics.

Setting up performance criteria. To quantify an application's execution time, you can either empirically measure its execution or mathematically evaluate its algorithm. In this section, I do both an empirical measurement and a mathematical evaluation of the agents' steps to compare the results from these different methods. To quantify the execution time mathematically, you represent the execution time of the algorithm by the order of magnitude the time grows when the input-data size increases. This growing trend in execution time shows how well an algorithm can scale with the load. For example, if the time taken to process data of size n grows linearly with the data size (e.g., processing 1000 records takes 2 seconds and processing 10,000 records takes 20 seconds), the algorithm's execution time grows in linear order with the data size. This growing trend as a function of the data size is called the time complexity of the algorithm and is denoted as O(n) (or order n) for linear-order time complexity. An algorithm that always finishes in a constant amount of time no matter how much data you input has a constant-time complexity, which I denote as O(c).

Now, let's use these formulas to evaluate the Snapshot Agent's performance. Regardless of the number of subscriptions that you need to copy data to, snapshot.exe executes only once for each publication at each scheduled time. Generating .sch and .idx script files in this command takes a constant time (complexity O(c)), independent of the number of records (denoted as n) in the publication. The time that snapshot.exe takes to generate .bcp files, however, is directly proportional to the size of the publication (time complexity O(n)). My empirical measurement on a 500MB publication that has 6 million records shows that snapshot.exe takes 7 seconds to generate the .sch and .idx files, but takes 2 minutes and 38 seconds to generate .bcp files. These results confirm that the speed of generating .bcp files depends on the size of the publication. Compared to the Distribution Agent's command (distrib.exe), which I evaluate next, the Snapshot Agent's command (snapshot.exe) is still very fast, even in generating large .bcp files. Its performance is free of bottlenecks.

Because the Snapshot Agent doesn't slow down the snapshot replication process, the Distribution Agent must be the culprit. Applying the formulas to this agent's steps shows you where the bottlenecks occur.

First bottleneck: deleting target records. Distrib.exe executes once for each subscription. The command's functionality is exactly the opposite of snapshot.exe's functionality; the command applies (in order) all the .sch, .bcp, and .idx files of a publication to a target subscription database. In doing so, the command carries the overhead of logging all the DELETE statements in the sch step and indexing all the target tables in the idx step. A DELETE statement in the default sch step deletes all the records (assuming no filter has been specified) in a target table; that process has a complexity of O(n), where n is the number of records in that target table. In addition, each deletion records each deleted row in the transaction log and deletes the corresponding row from the clustered index and nonclustered index B-trees. According to the execution plan that Figure 9 shows, the DELETE statement copies each row to tempdb (Table Spool) once for each nonclustered index and deletes each row from each index B-tree. Each copy or delete in this execution plan costs about 10 percent of the overall cost of the DELETE statement. Therefore, for a table with m indexes, each index contributes roughly 1/m of the total processing cost. The overall complexity of the DELETE statement in the sch step thus is O(m * n). In the example I provided earlier (using a 4-CPU development server—with 1GB of RAM and RAID 5—to process a 500MB publication that has 6 million records), the typical duration for deleting all records in the target tables is 58 minutes. You can see that deleting your target records creates a bottleneck that grows with a publication's size and number of indexes.

Second bottleneck: bulk copying the source data. After the sch step is complete, the bcp step bulk-inserts all records in the .bcp files into respective indexed target tables. You can tune a Distribution Agent's execution by setting parameters in its profile. (To access the profile, right-click a Distribution Agent—CGIS1:pubs in the right pane of Figure 2—then select Agent Profile.) For example, raising the value of BcpBatchSize (the number of records copied per bcp batch) in the profile can reduce distribution history logging. Or you can adjust the value of MaxBcpThreads (the number of concurrent bcp threads in the bcp step) to match the number of CPUs on the distribution server. However, I didn't find the profile helpful for improving performance and scalability during the distribution of my test data. Moreover, distrib.exe lacks a useful bcp parameter—the ORDER hint. When you set it, the ORDER hint causes bcp to sort the data according to the clustered index (if any) on the target table before copying the data into the target table. In effect, the data is replicated in the clustered index order, and records copied later never cause SQL Server to enforce the clustered index order by rearranging earlier records. Consequently, ordered bcp is significantly faster than dropping the clustered index before and recreating it after an unordered bcp.

Let's mathematically evaluate the best-case scenario for the bcp step by assuming it has the ORDER hint set. The bcp step in this scenario essentially involves inserting n sorted records (i.e., records ordered by the clustered index) into each target table and inserting each row of this table's indexed columns into each of the m index B-trees (assuming m indexes exist). The complexity is O(m * n) + O(n * log(n)), where O(n * log(n)) is a typical time complexity of sorting and O(m * n) is the time complexity of inserting n records and maintaining m indexes for each of the n inserted records. Keep in mind that this is the best-case scenario time complexity for the bcp step. Using the earlier example of a 500MB, 6 million­record publication running on a 4-CPU development server with 1GB RAM and RAID 5, the bcp step takes 55 minutes, nearly the same as the time that the sch step takes. This result confirms the similarity in complexity between the sch and bcp steps. Like the sch step, the bcp step creates a bottleneck that grows with the publication's size and number of indexes.

Optimizing Performance
Clearly, the bottlenecks that form when you try to copy massive amounts of data can render large-scale snapshot replication impractical. However, you can take steps to avoid these bottlenecks.

Solving the DELETE bottleneck. You can easily eliminate the bottleneck that DELETE statements cause in the sch step by selecting the Delete all data in the existing table (using TRUNCATE) option in the Snapshot tab of an article's Properties dialog box, which Figure 8 shows. The result is a TRUNCATE <table> statement instead of a DELETE FROM <table> statement in the .sch file. TRUNCATE TABLE deallocates all pages associated with the table in one operation and logs only the deallocation operation. Therefore, it always takes a constant time—or complexity O(c)—which will scale up to any number of records and indexes in a publication. Testing in the same publication example shows that when you select the TRUNCATE option, the sch step takes less than a second, instead of 58 minutes. Using the TRUNCATE option is only possible, however, under one condition: The target table can't contain a foreign key constraint. If the target table is referenced by a foreign key constraint, the corresponding TRUNCATE statement in the sch step will fail. To fully optimize its sch step, the Distribution Agent has to drop all foreign key constraints associated with all target tables before taking the sch step, then recreate them after the sch step.

Solving the bcp bottleneck. Unfortunately, you can't fix the bcp bottleneck through configuration the way you can configure the TRUNCATE statement. The only way to optimize the bcp step is to drop all indexes and key constraints before the bcp step and recreate them after the bcp step. Dropping and recreating the indexes and key constraints reduces the complexity of bcp to a linear time proportional to the size of the publication—O(n)—because bcp no longer has to do sorting and indexing. This optimized bcp step will scale up to any number of indexes in a publication. The publication in my example comprises 10 clustered indexes, 29 nonclustered indexes, and 1 foreign key constraint. Dropping all indexes and keys takes 2 minutes, 8 seconds; recreating all indexes and keys takes 10 minutes, 36 seconds, and completing the bcp step takes 28 minutes—a total of 41 minutes in contrast to 55 minutes. These tests confirm that the optimization result of the bcp step isn't as dramatic as that of the sch step because of the bcp step's linear time complexity. The overall benefit of eliminating both the sch and bcp bottlenecks in the example is a savings of 72 minutes.

Failed shortcut to optimization. To implement these optimizations for the sch and bcp steps, you might be tempted to try a shortcut. For example, suppose you set the @creation_script parameter of the sp_addarticle stored procedure to a custom script file when you add each article to the publication. Such a custom script file would drop all indexes and keys in each target table. (For information about creating a custom script file, see the Web sidebar "Custom Scripting Feature in SQL Server 2000," InstantDoc ID 24661.) Remember that the Distribution Agent applies sch, bcp, and idx steps sequentially. The goal is to replace the .sch file with a custom script file while retaining the original .bcp and .idx script files. The task looks easy because replacing the .sch file requires no significant implementation except preparing the custom script files.

Unfortunately, this idea doesn't work for two reasons. First, you might expect to use the .idx file that the Snapshot Agent created to recreate foreign key constraints and indexes that the custom script dropped. However, the .idx file doesn't include primary and unique key constraints because in the .idx file, the Snapshot Agent captures these key constraints as indexes instead. For example, in Figure 6, UNIQUE CLUSTERED INDEX [PK_ADDRESS] comes from a primary key constraint named [PK_ADDRESS] but survives as an index only. The second and more significant reason is that although the @schema_option parameter of the sp_addarticle stored procedure lets you disable the Snapshot Agent's scripting and use the provided @creation_script parameter, the @creation_script parameter's setting disables the generation of .sch and .idx script files as well. Therefore, no .idx file will be available for recreating the indexes that the @creation_script file dropped. Setting the @creation_script parameter to a custom script file doesn't achieve the goal.

Customizing Snapshot Replication
Now, let's sketch out conceptually the steps for customizing—thereby optimizing—snapshot replication. I'll detail the full implementation of these steps in the second article of this series. Remember that each Distribution Agent runs within a job, as Figure 3 shows. The job's Run agent step is a command that applies to the whole publication; you can't interrupt this step between articles to customize it. However, Enterprise Manager lets you customize a job by inserting new steps and modifying the flow of the steps. You can insert a step—which I call the drop step—before the Run agent step to drop all indexes and keys and insert another step—called the recreate step—to recreate all indexes and keys. Because you insert these steps, the bcp substep within the Run agent step (which executes distrib.exe) never sees indexes and keys and hence is optimized to a linear time complexity. Further, because the inserted recreate step will be responsible for recreating all the indexes and keys, you won't need the .idx file anymore. So, on the Snapshot tab of an article's Properties dialog box, which Figure 8 shows, clear all Copy indexes options to empty the .idx file and select the Delete all data in the existing table (using TRUNCATE) option to customize the .sch file. Because the drop step dropped all foreign key constraints, the sch substep within the Run agent step can run TRUNCATE statements successfully and therefore is fully optimized to a constant time complexity. So, the optimized steps occur in the following order: drop, sch (TRUNCATE), bcp, idx (empty), and recreate. Figure 10 illustrates the steps of a customized Distribution Agent job.

Scale to Your Environment
The only drawback to the optimization is that between the drop and recreate steps of a Distribution Agent job, the drop step has dropped all indexes and keys of all target tables at the subscriber. The dropped indexes and keys can cause concurrent queries to execute more slowly and might compromise key constraints for the duration of that window. However, for a mostly read-only data warehouse server farm, the main requirement is to synchronize all data warehouses and data marts quickly within limited off-peak operating hours while keeping concurrent queries to a minimum. The optimization lends itself to this environment very well.

End of Article

Prev. page     1 2 [3]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.