Let’s take this concept further by looking at another scenario. Let’s say the database becomes corrupt at 12:15 A.M., in the middle of the batch operation. You know that the tail of the transaction log isn’t accessible because you’re in the process of a bulk operation in the Bulk_Logged recovery model. As we just saw, your data loss is everything past 12:00 A.M. But you could have prevented some—and possibly all—of this data loss. Performing a transaction-log backup at 12:10 A.M., when the database was accessible (right before the bulk operation began), would have at least brought you up to 12:10 A.M., the moment before the bulk operation. If the bulk operation were the only operation occurring from 12:10 A.M. to 12:15 A.M. (when the database became corrupt), you could use the transaction-log backup to bring the database up to 12:10 A.M. Once recovered to 12:10 A.M., you could execute the bulk operation again to bring the database up to the time of the failure and continue it moving forward.
It’s crucial that you back up your transaction log immediately before performing a batch operation and immediately after performing a batch operation. Doing both minimizes the overall potential for data loss in the event of a failure. Remember that if the database is set to the Bulk_Logged recovery model and you’ve performed a bulk operation, you cannot back up the tail of the log even if the transaction-log file is accessible. If you haven’t performed a bulk operation, you can back up the log. Because log backups are possible in the Bulk_Logged recovery model, some people might consider always running in Bulk_Logged mode. However, always running in Bulk_Logged mode can be dangerous because you’re no longer entirely in control of the recovery. Performing bulk operations isn’t necessarily limited to DBAs or systems administrators. Anyone who owns a table can create or rebuild indexes on their tables, anyone with Create Table permissions can use SELECT INTO to create a permanent table, and anyone who has access to text data can manipulate it with WRITETEXT and UPDATETEXT.
Because you don’t control who performs bulk operations or when, it’s important to know and limit when operations are fully or minimally logged. If you’re responsible for data recovery and your environment can’t afford data loss, the only way to minimize data loss is by running in the Full recovery model and controlling changes through the Bulk_Logged recovery model. And you should switch to Bulk_Logged only if and when it’s appropriate for your environment. In some environments, switching might never be appropriate.
If your databases aren’t processing transactions around the clock or if you’re willing to have work-loss exposure to achieve better performance of your batch operations, you might consider a temporary change to the Bulk_Logged recovery model. The best practice, if you determine that you can periodically change to the Bulk_Logged recovery model, is to change recovery models within batch processes, which usually occur after normal working hours. This practice ensures that the window of potential work loss is limited to the safest times of the day. When you change to Bulk_ Logged during the batch operation, perform a log backup before the switch, then change back to the Full recovery model when the operation is complete, performing a log backup after the switch. To see how to programmatically and optimally change recovery models during a bulk operation, download the script DB Alter for Batch Operation.sql by clicking Download the Code.
In addition, you might consider breaking large or complex batch operations into smaller, simpler batch operations to keep the transaction log from growing excessively large. To minimize the potential for data loss (because you can’t back up the tail of the log if the database becomes suspect), consider performing log backups during the batch process and between some steps of the bulk operations. Breaking down large or complex operations and performing log backups between the larger steps gives you more recovery options.
A Recovery-Model Test
To see how your choice of recovery model can affect the database’s transaction-log size (the amount of space required to log the operation), the transaction-log backup size, and the operation’s speed let’s look at a simple test that uses the SELECT INTO operation to create a new table called TestTable, based on a table called charge from another database. The charge table has 800,000 rows, and the data is roughly 40MB in size.
As Table 2 shows, the Simple and Bulk_Logged recovery models seem to provide the same performance and the same active log size. However, recovery models don’t affect all operations. In a second test, I performed an UPDATE operation against all 800,000 rows in TestTable. As Table 3 shows, compared to the SELECT INTO operation, the UPDATE operation caused the transaction log for all three databases to grow significantly to handle the modification, but the UPDATE operation’s duration and the size of the transaction-log backup (where a transaction-log backup was permitted) were the same for all the recovery models.
In looking at the test results in Tables 2 and 3, you might think that Bulk_Logged is the best recovery model to use because it seems to allow transaction-log backups and because the operations affected by recovery models run faster than with the Full recovery model. However, remember that the transaction log isn’t always available for a transaction-log backup when you’re running the Bulk_Logged recovery model. If the device on which the data resides isn’t available when SQL Server attempts a transaction-log backup, SQL Server can’t perform the transaction-log backup, resulting in data loss. Thus, up-to-the-minute recovery isn’t always possible with Bulk_Logged. Take some time to get familiar with the different recovery models and their trade-offs and determine out how they affect speed, logging, and recovery for your production databases.