Suppose your company has just established a purchasing relationship with a new computer-books supplier. The supplier periodically sends you a large XML document (between one and several hundred megabytes) that contains a catalog of publications. As a database professional, you have to load the XML catalog into your company's online catalog application, which uses a SQL Server database, so that employees can search the data.
Scenarios like this one are becoming more commonplace. Microsoft anticipated the need for loading large XML documents into SQL Server and provided an excellent utility, XML Bulk Load, that accomplishes that job. XML Bulk Load is a standalone COM object that reads data from an XML document by using a stream rather than reading the entire XML document into memory, which is impractical, if not impossible, for large documents. Using a stream conserves memory and increases the processing speed for large documents. But streams can present a challenge when you're loading data into a normalized database that uses identity columns for primary keys.
Identity columns contain values that SQL Server automatically generates when a row is inserted into a table. The values in an identity column uniquely identify the row, so they're good candidates for foreign keys that reference the row's data. But say you're using XML Bulk Load to store data in two related tables; an identity column is the primary key in the first table, and a foreign key in the second table references that column. In that case, you need to write a program to match rows from the two tables and correctly set the foreign key values.
Besides adding more work to your task, writing a program also proves problematic for cases when a secondary key doesn't exist in the first table because you have no way other than the identity column to uniquely identify rows. But now you can avoid programming altogether. With the release of SQLXML 3.0 Service Pack 2 (SP2) Beta 1, which you can download at
http://www.microsoft.com/downloads/details.aspx?FamilyId=4C8033A9-CF10-4E22-8004-477098A407AC&displaylang=en, there's a solution to this problem in XML Bulk Load. Let's look at how it works.
XML Bulk Load and Identity Columns
Before jumping into an example that illustrates how to use XML Bulk Load to properly store the value of a foreign key, let me explain how XML Bulk Load operates. The XML Bulk Load COM object provides properties that you use to set parameters and options. The available parameters include the connection string, the ADO Command object, and the location of an error log file. Options include the ability to lock tables and execute within the context of a transaction. XML Bulk Load uses an XML View to load data into SQL Server. A mapping schema defines the XML View and specifies how data from the XML document is stored in the tables in your database. The mapping schema also specifies how the relationships between elements in the XML document—most commonly expressed by nesting one element inside another—are modeled in your normalized relational database. Correctly constructing the mapping schema is the crucial part of successfully inserting foreign keys that use identity-column values.
To understand how you construct an appropriate mapping schema, first consider the XML document that Figure 1, page 33, shows. This simple XML document represents the supplier's catalog I described earlier. Each publication listed in the document includes the publication's author. The goal is to bulk-load the data from Figure 1's document into the tables that the T-SQL code in Listing 1 defines. Note that each table contains an identity column as the primary key (PubId and AuthId), and Titles contains a foreign key column that references the author of the publication in Authors.
Prev. page  
[1]
2
next page