When you trace the results in SQL Server Profiler, you get the SQL Trace output that Figure 2 shows, which indicates that the INSERT statement is fed from an IRowset. To run the same test when you're using BCP, run the following statement from the command prompt:
BCP.EXE testdb.dbo.sales IN
f:\data\sales.char -S
(local)\dev -T -c -b 10000
Because BCP sends the rowset remotely over TDS, you get the result that Figure 3 shows. The BULK INSERT statement results in the SQL:BatchCompleted event INSERT FROM {IRowset 0xE0808A05 }, whereas the BCP.EXE statement results in the SQL:BatchCompleted event INSERT BULK <column list with size and type information> with (<options>).
Besides the difference in implementation (in-process versus remote) between BULK INSERT and BCP, the two techniques differ in scale. The number of BULK INSERT sessions scales linear to the number of available CPUs in the machine running SQL Server. Because BULK INSERT relates to one thread per session in SQL Server, it never scales beyond the number of CPUs in the machine. So on an eight-CPU machine, you can have eight parallel BULK INSERT sessions (assuming you're using a separate User Mode SchedulerUMS for each session); if you initiate a ninth session, it won't start until a currently running session finishes. BCP doesn't have this limitation but is limited to network or IPC bandwidth.
Optimizing Performance by Limiting Logging
After you've decided which bulk-loading tool you want to use, you can optimize the bulk load's performance. The best way to improve your bulk-load performance is to minimize transaction logging. Typically, when SQL Server is loading data, the transaction log records each row insert, each page allocation, and each extent allocation. During a high-volume data load, recording all these transactions in the transaction log can seriously impede performance because the transaction log is the throughput bottleneck of your database. If you're adhering to best practices, you've already made sure that the SQL Server transaction log file for the database you're loading into resides on its own spindle or set of spindles and isn't shared with data files or load files. To further optimize performance, you can make SQL Server minimize the number of transaction log records it needs to write by making the inserts non-logged operations.
The term non-logged operation is a bit misleadingSQL Server always writes certain transaction log records so that database changes are durable and recoverable. But SQL Server can minimize the number of transaction log records by writing a record only for each extent allocation. (An extent is eight 8KB pages, or 64KB.) This practice reduces contention on the transaction log and increases overall throughput of the data load.
To best optimize your data-loading performance, when you create your data-loading strategy, be sure you adhere to the following five minimal-logging operations rules:
- Set the database recovery model to Bulk-Logged or Simple.
- Don't replicate the target table; doing so will create a larger-than-normal transaction log record because the log would need more details to reconstruct each operation when replicating it. The increase in record size will decrease the overall throughput of the transaction log and decrease data-load capacity.
- Be sure the target table doesn't contain any triggers because SQL Server uses transaction log information to create the Inserted and Deleted virtual tables inside a trigger. If you have triggers, SQL Server will log inserts as separate transaction log entries; you can't have triggers and have non-logged operations.
- Be sure the target table contains no rows or has no indexes associated with it. When a table contains rows but no indexes, SQL Server loads the data on a newly created extent so that it can improve efficiency by writing a transaction log record only for extent allocations. This way, in the case of a failure in which SQL Server needs to roll back a transaction, SQL Server can simply undo the extent allocations without destroying rows that already existed in the table when the load started. When the table contains both rows and indexes, the strategy of loading the data on a new extent makes optimization too complex. When the table doesn't contain rows, SQL Server can lock both the table and the index structures without harming concurrencynobody will be operating against rows that don't exist.
- Use a TABLOCK hint with the bulk-load operation.
Prev. page
1
[2]
3
next page