I was finally ready to import the sorted data back into SQL Server. Luckily, I did a BULK INSERT test on 10 million rows to make sure my log wouldn’t explode. It did. As it turns out, by default, BCP loads the entire input file into the destination table as a single transaction, which means the entire table is fully logged. In my case, the log space required was 10 times the size of my data.

With a bit more reading, I discovered I could use BULK INSERT’s BATCHSIZE parameter to fix the log-space problem. This parameter tells BULK INSERT how many records it should commit at a time. I picked 100,000 and ran my test again. This time the log never exceeded 39MB. A distinct improvement over the 2GB it had ballooned to the first time.

Satisfied that everything was in place and fully tested, I started the import operation again and settled in for the expected 50-hour wait. You can imagine my excitement when I returned from a long weekend to discover the import had finished successfully. The resulting coordinate table was the same physical size as before because a clustered index takes virtually no extra space.

Next, I created a test location table that consisted of 200,000 latitude and longitude coordinates. I added new LatIndex and LonIndex fields in the location table to represent the integer versions of the original floating-point latitude and longitude coordinates. All that remained was to join the location table to the coordinate table on the indexed fields and return the results. Forty-five seconds later I was staring at 181,458 beautiful rows of data representing a join between 200,000 locations and 16 billion coordinates. I had watched the CPU graph on the server while the SELECT statement ran and was gratified to see it pinned at 100 percent for the entire time, which means that disk I/O was clearly not a bottleneck. SQL Server is one lean, mean row-slinging machine.

Here are some distilled tips from my experience:

  • The more data you have in a table with a defined clustered index, the slower it becomes to import unsorted records into it. At some point, it becomes too slow to be practical.
  • If you want to export your table to the smallest possible file, make it native format. This works best with tables containing mostly numeric columns because they’re more compactly represented in binary fields than character data. If all your data is alphanumeric, you won’t gain much by exporting it in native format. Not allowing nulls in the numeric fields can further compact the data. If you allow a field to be nullable, the field’s binary representation will contain a 1-byte prefix indicating how many bytes of data will follow.
  • You can’t use BCP for more than 2,147,483,647 records because the BCP counter variable is a 4-byte integer. I wasn’t able to find any reference to this on MSDN or the Internet. If your table consists of more than 2,147,483,647 records, you’ll have to export it in chunks or write your own export routine.
  • Defining a clustered index on a prepopulated table takes a lot of disk space. In my test, my log exploded to 10 times the original table size before completion.
  • When importing a large number of records using the BULK INSERT statement, include the BATCHSIZE parameter and specify how many records to commit at a time. If you don’t include this parameter, your entire file is imported as a single transaction, which requires a lot of log space.
  • The fastest way of getting data into a table with a clustered index is to presort the data first. You can then import it using the BULK INSERT statement with the ORDER parameter.

—Dmitry Mnushkin, application developer,
Renaissance Reinsurance

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

What a great article and a good tip to know about BCP with the 4-Byte Integer. But the best part was the persistant dealing with the problem and the willingness to persit in waiting for the results. I have been through these "nail biters" before and it takes nerves of steal to pull them off. Great Job.

Curt Spanburgh. MVP Dynamics CRM.

SCG

Article Rating 5 out of 5

 
 

ADS BY GOOGLE