Although the SQL Server database engine is the core service for processing data, it's often misunderstood. However, it helps to know how the database engine performs the fundamental operations of retrieving and modifying data, including the transaction log's role. This information can help you architect disaster recovery strategies and understand the role of memory and disk on database performance.
Retrieving Data
The database engine has two major components: the storage engine and the query processor. The storage engine writes data to and retrieves data from stable media (e.g., disks). The query processor accepts, parses, and executes SQL commands.
When a user submits a SELECT statement, he or she is asking the query processor to retrieve data that resides in data files on the disk. These files usually have an .mdf or .ndf file extension. In addition to the data, the data files contain indexes and other objects. The indexes are copied to a server's RAM, as Figure 1 shows. This memory has several names, including buffer cache, buffer pool, and bpool. If the indexes are already in the buffer cache, the database engine doesn't need to copy the indexes from the disk again.
Once in the buffer cache, the query processor uses the indexes to find the desired data rows in the data files. To simplify this discussion, let's assume the table queried with a SELECT statement has a clustered index. (Tables without a clustered index are called heaps and have a different structure.) The data files reside on the leaf level of the clustered index. A table can have only one clustered index. The other indexes are called nonclustered indexes. Nonclustered indexes contain row locators that point to the data row in the clustered index. After the query processor uses the available clustered and nonclustered indexes to retrieve the data, it returns the result set.
Modifying Data
When a user submits an INSERT, UPDATE, or DELETE statement to modify data, the transaction log gets involved because data modification statements are always enclosed in a transaction, even if they're not coded as such. When you enclose a data modification statement in the BEGIN TRANSACTION and COMMIT TRANSACTION statements, it's considered an explicit transaction. When you don't use those statements, it's considered an implicit transaction. Whether implicit or explicit, committing the transaction makes all data modifications in that transaction a permanent part of the database.
A transaction log sequentially records all of a database's modifications and other types of transactions. (Microsoft recommends having only one transaction log per database.) A transaction log typically has an .ldf file extension.
The transaction log supports the A in ACID. That is, it helps transactions achieve atomicity, which is one of the four properties (Atomicity, Consistency, Isolation, and Durability) that transactions should possess if they're going to be processed reliably. Atomicity simply refers to the idea that a transaction needs to be processed completely or not at all. In other words, if a transaction fails at any point in the process, the entire transaction must be rolled back.
The process leading to a committed transaction is a complicated one. When a user submits a data modification statement, SQL Server reads the data files affected by the modification into the buffer cache (assuming they aren't already present in the buffer cache from a previous operation), updates the values in the buffer cache, then records the modification in the transaction log. Recording the modification is a vital step because, for a transaction to commit, the change information must be written to the transaction log. This requirement is called write-ahead logging.