SideBar    The Logical Puzzle, Beyond MERGE’s Basics
DOWNLOAD THE CODE:
Download the Code 97963.zip

Executive Summary:

Some data modification tasks involve merging source data into a target table. Before Microsoft SQL Server 2008 you had to carry out such tasks with multiple modification statements, requiring you to access both the source and the target multiple times, and causing complexities in terms of handling the modifications as an atomic unit. Microsoft SQL Server 2008 adds the standard MERGE statement to simplify and optimize these tasks.


Some data modification tasks involve merging source data into a target table. Such tasks are common in both OLTP and data warehouse environments. A simple example from an OLTP environment is modifying a target Customers table with more recent customer information from a source file. You’d need to overwrite the attributes of existing target customers with the newer information, as well as insert into the target any customers in the source that don’t already exist in the target. Examples from a data warehouse environment include handling slowly changing dimensions and merging fact additions and changes into a target fact table as part of your daily extraction, transformation, and loading (ETL) process.

Before SQL Server 2008 you had to carry out such tasks with multiple modification statements, requiring you to access both the source and the target multiple times, and causing complexities in terms of handling the modifications as an atomic unit. SQL Server 2008 adds the standard MERGE statement to simplify and optimize these tasks.

MERGE Fundamentals
I’ll use a simple example to help familiarize you with the basic syntax and concepts of the MERGE statement. Suppose you have a target Customers table in your OLTP environment, and you want to merge more recent customer info from a staging table called CustomersStage into the Customers table. Run the code in Web Listing 1 (www.sqlmag.com, InstantDoc ID 97963) to create the Customers and CustomersStage tables and populate them with sample data. This code creates the tables in the tempdb database, for demonstration purposes only. (Under normal circumstances, these tables are in a user database.)

Table 1 and Table 2 show the current contents of the Customers and CustomersStage tables. I’ll refer to the Customers table as the target and the CustomersStage table as the source. Updating the Customers table with info from the CustomersStage table involves two actions:

  1. Identifying customers that appear in both the source and the target (based on a match between the source and target custid values), then updating target customers by overwriting all their attributes (except custid values) with the corresponding source customer attributes.
  2. Identifying customers in the source that don’t exist in the target (based on the same condition), and inserting those customer rows into the target.

To achieve this task in SQL Server versions prior to 2008, you’d probably use an UPDATE statement based on an INNER JOIN followed by an INSERT statement with a NOT EXISTS predicate or an OUTER JOIN filtering only outer rows. Such a solution would require you to access the tables twice. Also, such a solution would complicate things in terms of handling the operation as an atomic one—more specifically, in avoiding conflicts with other INSERT operations to the target that take place between your UPDATE and INSERT operations.

In SQL Server 2008 you can use the new standard MERGE statement, as the code in Listing 1 shows, to achieve this task simply and efficiently. The semantics of the MERGE statement are very similar to those of an OUTER JOIN, with sections defining the target, the source, and the action to take when a match occurs or doesn’t occur.

The MERGE INTO clause defines the target table (e.g., Customers) and optionally assigns it an alias (e.g., TGT). The USING clause defines the source table (e.g., CustomersStage) and optionally assigns it an alias (e.g., SRC). The ON clause defines the MERGE condition.

In the WHEN MATCHED THEN clause, you define an action to take against the target when the source row has a matching target row based on the ON predicate. In our example, the action is UPDATE— that is, update existing target customers (2, 3, and 5) with the more recent customer info from the source. Notice the syntax of the UPDATE statement is very similar to that of a plain UPDATE statement, with one difference: You omit the name of the target table because you already defined it in the MERGE INTO clause. You can think of the action in this clause as single row operation—one UPDATE per each source row that finds a match in the target based on the ON clause. However, in practice, the MERGE statement is applied as a single set-based operation for all rows.

In the WHEN NOT MATCHED THEN clause, you define an action to take against the target when the source row doesn’t find a matching target row based on the ON predicate. Going back to JOIN semantics, you can think of the operation as source LEFT OUTER JOIN target. The WHEN MATCHED THEN clause defines the action to take in case of matches—namely, with inner rows; the WHEN NOT MATCHED THEN clause defines the action to take in case of non-matches— namely, with outer rows. In our example, the action is INSERT—that is, insert into the target source customer rows that don’t exist in the target (6 and 7). The statement is similar to a plain INSERT statement, with one difference: Similar to the WHEN MATCHED THEN clause, you don’t need to specify the target table name because you already defined it in the MERGE INTO clause.

In order to always start with the same sample data, run the MERGE statement in each example in the context of an explicit transaction, then roll it back when finished (after querying the Customers table to see the changes). For example, run MERGE statements as follows:

BEGIN TRAN;
  
  MERGE ...;
  SELECT * FROM dbo.Customers;
ROLLBACK TRAN;
Continue on Page 2
   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I have always wanted a SQL statement that did this! Great!!!

DWalker59

Article Rating 5 out of 5

Very useful ;)

cemuney79

Article Rating 5 out of 5

 

  Related Articles

Can’t Wait for SQL Server 2008! Grouping Sets, Part 2 Grouping Sets, Part 1 IS NOT DISTINCT FROM

  Related Whitepapers

Get Started with Oracle on Windows DVD StoreVault SnapManagers for Microsoft Exchange and SQL Server Introducing System Center Operations Manager 2007

  Related Events

Power Up! With Virtualization Online Conference Microsoft TechEd Developers Microsoft Belgium Developer & IT Pro Days 2006

  Related eBooks

Making SQL Server Perform Backup and Recovery Survival Guide HA Solutions for Windows, SQL, and Exchange Servers

  Related Essential Guides

The Essential Guide to Business Intelligence Reporting: Choosing the Right Tool for the Right Job Virtualization of SQL Server 2008 The Essential Guide to Reporting Services Tips & Tricks

  Related Resources

Buy One Get One Order SQL Server Magazine and get Windows IT Pro Magazine FREE!! Instant Gratification - Only $5.95!! Instant online access to thousands of SQL Server Magazine articles! Get It All - Order Windows It Pro VIP Today! Online access to 26,000+ articles. A $500+ value for only $279!!   Email Newsletters

  vLabs Links

SQL Server 2000 SQL Server 2005 Upgrade