Data files in the buffer cache that have been modified but the modifications haven't yet been written to disk are called dirty data pages. Dirty data pages are later written, or flushed, to disk by an asynchronous write process, as Figure 2 shows. There are three types of asynchronous write processes:
- Eager writing process. This process writes dirty data pages associated with nonlogged (e.g., BULK INSERT) operations to disk.
- Lazy writing process. This process wakes up periodically to write dirty data pages to disk and remove infrequently used pages from the buffer cache.
- Checkpoint process. The checkpoint process writes dirty data pages to disk, including those pages whose transactions haven't been committed yet. Its primary purpose is to keep the number of dirty data pages to a minimum.
Data can be modified several times in the buffer cache before the dirty data pages are flushed to disk.
Upon SQL Server startup or a database restore, the database undergoes a recovery process. During recovery:
- Log records of data modifications not flushed to disk are rolled forward (they're replayed). For example, suppose a modification was made to a data file in the buffer cache and that modification was recorded in the transaction log as required by write-ahead logging, but the data file wasn't written to disk because the SQL Server service failed. When the SQL Server service restarts, the data file will be reloaded into the buffer cache and updated with the modification based on the information found in the transaction log.
- Log records of data modifications associated with incomplete transactions are rolled back (they're undone). For example, suppose a modification was made to a data file in the buffer cache, the modification was recorded in the transaction log, and the modification was written to disk by the checkpoint process. However, the transaction remained open and did not commit when the SQL Server service failed. When the SQL Server service restarts, the incorrect data file will be read into the buffer cache and corrected with the information found in the transaction log.
Exploring the Transaction Log Further
The transaction log is logically divided into virtual log files. VLFs determine the boundaries of transaction log reuse and truncation. A transaction log is a "wrap around" file in that inactive portions are overwritten with new log records. As Figure 3 shows, the active portion of the transaction log is from the minimum recovery log sequence number (MinLSN) to the last written log record. MinLSN is set by the checkpoint process. Log records preceding MinLSN can be truncated. Log records after MinLSN are active and can't be truncated.
The checkpoint process first marks a VLF as reusable. The log truncation process then marks the VLF as logically truncated. Truncated VLFs can be overwritten. Note that the physical transaction log file isn't shrunk by the log truncation process.
For databases with the full recovery model, log truncation usually occurs after a log backup and requires a checkpoint beforehand. For databases with the simple recovery model, log truncation usually occurs after a checkpoint. Log truncation won't occur if a data backup is running, if there's an active transaction, or if transactional replication is waiting for a transaction to be delivered to the distribution database. If the log truncation process fails, the transaction log will grow until it fills up the disk it resides on.
If you'd like more information about how the transaction log, data modification operation, or data retrieval operation works, check out these articles: