DOWNLOAD THE CODE:
Download the Code 25521.zip

You can perform each test with or without the TABLOCK hint that I commented out in the code. I won't say much more about locking, but for understanding BULK INSERT operations and Listing 2, it's useful to know what the TABLOCK hint does. Be aware that another form of the TABLOCK hint is used in SELECT statements, but that form isn't the same as the TABLOCK hint you use with BULK INSERT operations. When you use it with BULK INSERT or the bcp utility, the TABLOCK hint instructs SQL Server to lock the whole table for the entire batch operation. Without this hint, each inserted row would acquire a row lock, so if you're inserting tens of thousands of rows, SQL Server has to manage tens of thousands of locks. The TABLOCK hint acquires just one lock for the operation. So TABLOCK can be a good thing—unless you don't want the entire table to be unavailable while the data is being loaded. Without the TABLOCK hint, other processes can access and modify any rows in the table that aren't affected by the INSERT operation. With the TABLOCK hint, no other processes can read or modify any rows in the table, with one exception: Other processes that are also performing BULK INSERT or bcp commands can simultaneously access the table when they also use the TABLOCK hint. This feature lets multiple processes insert into the table concurrently; in some cases, this concurrent insertion ability can give you a great performance boost.

I ran the script in Listing 2 four times. First, I created the od table with just over half a million rows and ran the three tests with TABLOCK, then I ran the same tests without TABLOCK. I then created a copy of the od table with more than a million rows and ran the tests again, with and without TABLOCK. Table 1 shows the time required in seconds for each of my tests.

This set of tests didn't show that building all the indexes before loading the data was faster, but my goal here is not to convince you that it is. I'm merely suggesting types of tests to run. You could run many other tests, and many factors could be different on another system, yielding vastly different results. For example, you could try running the tests with 5 million or 10 million rows, or try more or fewer indexes. You could build the indexes on different columns or create composite indexes.

You should base your tests on the data you'll actually be loading, including the types of indexes you expect to build. Other systems could give different results if they have better I/O systems, especially for the drive that contains tempdb, where most sorting happens. I performed the tests for this article on my Toshiba Tecra 8000 laptop, which has a 10GB hard drive. I wouldn't make any recommendations to my clients based on these results, other than my usual advice to thoroughly test whether dropping some of your indexes before executing batch operations will improve performance. Without thorough testing, it's almost impossible to figure out whether you should build one, none, or all of the indexes before loading the data.

Query Plans for Batch Operations
One limitation in trying to analyze my results for these BULK INSERT tests is that there's no way to see a query plan that shows whether SQL Server used row-at-a-time or index-at-a-time insertion. For a BULK INSERT operation, the plan tells you only that SQL Server performed a bulk insertion.

For batch UPDATE operations, however, you can determine whether they occurred at the table level (row-at-a-time) or at the index level by inspecting the query plan output. If SQL Server performs the update at the index level, the optimizer produces a plan that contains an UPDATE operator for each of the affected indexes. If SQL Server performs the update at the table level, the plan shows only one UPDATE operator.

If you ran the tests to create the load_od table and you still have the load_od table with all five indexes, you can demonstrate the difference for yourself. Copy the following query into Query Analyzer, then select the toolbar button to display the query plan (or press Ctrl-L):

UPDATE load_od
SET Quantity = Quantity + 1, Discount = Discount * 1.01 
— WHERE counter < 250

If the WHERE clause is commented out, the optimizer needs to find the best plan for updating a large number of rows—all the rows in the table. The graphical plan output will show two branches. The first branch updates the clustered index (i.e., the data), then sorts the data and updates the nonclustered index on the Quantity column; the second branch sorts the data, then updates the nonclustered index on the Discount column. This plan shows index-level updating. If you remove the comment marker from the WHERE clause so that only a fraction of the rows in the table are updated, you'll get a much simpler query plan. This plan has only one INDEX UPDATE operator for the clustered index (the table data). As it updates each row in the table, SQL Server makes the corresponding changes to the appropriate nonclustered indexes. So this plan shows table-level updating.

When you need to modify thousands or millions of rows in one operation, finding an efficient strategy is essential. However, the most efficient strategy isn't always obvious. Knowing what options SQL Server has available and what changes you can make to affect performance can help you create tests that reveal the best strategy for your applications when they're processing your data.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE