DOWNLOAD THE CODE:
Download the Code 25538.zip

Adding the Recreate Step
The Distribution Agent remotely calls the create_<publication_name>_indexes stored procedure (in this case, create_addressing_truncate_indexes) during the recreate step of the Distribution Agent job to create all the indexes and key constraints for target tables at the subscriber, as Figure 4 shows. To customize the recreate step, you first insert the step after Run agent, then in the New Job Step dialog box, General tab, enter in the Command window the statement that Figure 4 shows. In this statement, CGIS1 is the remote subscriber, hub4_publish_db is the subscription database, and addressing_truncate is the publication name. When you include this complete information, you avoid conflicts with similar procedures of other publications. You can create and configure this custom step the same way you configured the drop step.

Putting Optimization to the Test
How does a snapshot replication perform after you've optimized it by customizing the Snapshot Agent and Distribution Agent jobs? Let's look at the optimized replication's performance during my tests. Figure 5 shows the elements of the test environment and Figure 6, page 38, describes the characteristics of each test. Note that each test result is a typical representative of five to six samplings of the same test. I performed all tests during the 10:00 pm to 6:00 am off-peak hours to make sure that other applications didn't introduce an irrelevant load.

Because the sp_start_job stored procedure starts a job and returns immediately without waiting for the job to finish, it's essentially an asynchronous stored procedure (i.e., the completion and return of this stored procedure doesn't signify the completion of the job that this stored procedure starts). The start distributors step in a Snapshot Agent job (see the Command window in Figure 2) therefore essentially starts all the Distribution Agent jobs concurrently. To achieve serial distributions (in Tests 2 and 4), first select any schedule-type option for all Distribution Agent jobs except Start automatically when SQL Server Agent starts, then disable the schedules. Choosing a schedule-type option prevents the Run agent step (i.e., the distrib.exe command) from running continuously. Second, start only one Distribution Agent job at the Snapshot Agent job's start distributors step, then insert another start distributors step before the last step of the first started Distribution Agent job. This sequence starts another Distribution Agent job, and so on, serially. Table 1 summarizes the results of these tests.

The most significant performance difference between concurrent and serial optimized distributions (i.e., between Test 1 and Test 2) occurs at the drop and recreate steps. The reasons are clear: The drop and recreate steps take place on each subscriber independently, without the intervention of the distributor. For concurrent distributions, only one drop and one recreate step appear to take place—a complexity of O(c) (constant-time complexity), regardless of the number of subscribers—whereas for serial distributions to n subscribers, n drop and recreate steps will occur—a complexity of O(n) (linear-time complexity). For more information about quantifying the execution time mathematically, see "Bottleneck Blues." The bcp step takes place between one distributor and n subscribers. Beyond some threshold t, the number of concurrent bcp processes start overloading the distributor and lose the advantage of concurrency. To further optimize, a more sophisticated customization would run t sp_start_job commands against t Distribution Agent jobs concurrently at the Snapshot Agent job's start distributors step. Assuming these jobs will finish roughly at the same time, pick an arbitrary Distribution Agent job and insert another start distributors step before the recreate step to start another batch of Distribution Agent jobs. The effect is serial batches of concurrent t distributions. Because the start distributors step is asynchronous, it essentially overlaps with its next step, the recreate step. In effect, your job would have n/t batches of drop and bcp steps, but only one recreate step (that of the last batch).

The following comparisons further highlight the advantage of optimized concurrent distribution and high-end hardware based on the test results:

  • o versus u: The optimized concurrent distribution (Test 1) is 63 minutes faster than its unoptimized counterpart (Test 3). The savings, as mentioned before, is largely from the sch step, and to some extent from the bcp step.
  • concurrent versus serial: Two concurrent distributions didn't overload the distributor; therefore, the threshold t apparently is more than two (t > 2).
  • high end versus low end: The I/O is much slower on subscriber CGISTEST1 than on CGIS1 because the latter is equipped with a RAID 5 of 6 fibre channel SCSI drives. However, in optimized distributions (Tests 1 and 2), the low-end subscriber performs no worse than the high-end one. The optimized-distribution result doesn't hold for unoptimized distributions (Tests 3 and 4), in which the low-end subscriber falls slightly behind. This slippage occurs because the sch and bcp substeps are I/O-intensive operations that can flood a low-end disk drive with I/O requests much faster than a high-end drive.

Manual Customization Speeds Performance
Customizing the Snapshot Agent job and the Distribution Agent jobs is crucial for optimizing the performance of a snapshot replication. My test results show a significant performance gain in sch steps and better scalability in bcp steps for an optimized snapshot distribution. Enterprise Manager provides a friendly interface for manually customizing these replication agent jobs. However, in a large-scale environment that has many servers and might be reconfigured frequently, manual customization becomes impractical. You need a faster, more reliable, and reproducible way of performing the customization. In a follow-up article, I'll show you how to automate some of these customization tasks by implementing them through SQL-DMO.

End of Article

Prev. page     1 2 3 [4]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

According to the article, the “Download the code” would contain the complete listings. All they contain are the snippets of the codes. Did I miss where to go to get the complete download? I am very much interested in trying this out.

George Wilson

How do you replicate 90 GB database to remote site(production server)

sam moayedi

Can SQLServerAgent servica be started on Windows 98?

Seemanth