DOWNLOAD THE CODE:
Download the Code 25572.zip

How to prepare for OpenXML in a real-world application

If you're writing an application that uses XML as a data-interchange format, you need a way to extract the data from the XML document and store that data in your database. T-SQL's OpenXML keyword is a convenient way to insert, delete, and update data in SQL Server 2000. OpenXML creates one or more relational views (or rowsets) of the XML document within a T-SQL stored procedure. You can access the data in those views to perform relational operations on your database. Although OpenXML is fairly easy to work with, you need to overcome a few hurdles before you can use it in real-world applications. I've discussed the largest stumbling block—the difficulty in passing an XML document to a stored procedure—in my Exploring XML columns in July 2001, InstantDoc ID 21077, and August 2001, InstantDoc ID 21259. Now let's look at two other common OpenXML obstacles you might encounter.

Declaring Namespace Prefixes
Real-world XML documents use namespace-qualified elements to ensure that the application that processes the data interprets the document correctly. If your application uses OpenXML to extract data from an XML document, at some point you'll likely need to specify namespace-qualified elements as XPath queries for the row and column patterns that OpenXML uses to construct the relational view. Row patterns use an XPath query to select elements from the XML document that correspond to rows in the relational view. Column patterns use XPath queries to select columns for each of the rows. Specifying namespace-qualified elements or attributes might be a challenge, especially if you don't know which prefixes will be used in the XML. A namespace prefix provides an abbreviated way to specify that an element or attribute belongs to a particular namespace defined by a Uniform Resource Identifier (URI). You can't depend on a prefix being associated with a particular namespace because the namespace specification lets you choose prefixes arbitrarily, even letting different prefixes correspond to the same namespace. (For more information about namespace specification, see the Namespaces in XML specification at http://www.w3.org/TR/1999/REC-xml-names-19990114/.) However, you can overcome the problem of undeclared namespace prefixes by using a system stored procedure.

Sp_xml_preparedocument is a system-supplied stored procedure that T-SQL uses to parse an XML document into an in-memory representation and return a numeric handle to the in-memory document. OpenXML uses this handle to obtain the in-memory representation of the XML document and build a rowset incorporating the data the document contains. If you've used sp_xml_preparedocument before, you're familiar with the first parameter (the handle to the in-memory representation of the XML document) and the second parameter (the XML document that is parsed into memory). But you might be surprised to know that you can pass a third, optional parameter to sp_xml_preparedocument. This parameter contains an XML document that declares prefixes for namespaces that you can use within the OpenXML syntax. You simply write a root element for the document, then include namespace declarations on it. For example, the <root xmlns:p="urn:myPerson" /> XML document declares the prefix p for the namespace associated with the "urn:myPerson" URI. Then, you can use the prefixes from these namespace declarations in your XPath queries to build a rowset that contains data from the namespace-qualified attributes or elements. Let's look at a practical example.

Listing 1, page 42, shows a simple XML document that contains information about a book, including the title of the book and the title of the book's author. These specifications create ambiguity because you're using the <title> tag to specify both the book and author titles. The two <title> tags would be indistinguishable except that each belongs to a different namespace. Suppose I want to use OpenXML to extract the book's title and the author's title, first name, and last name. Listing 2 shows the T-SQL code for the stored procedure containing an OpenXML statement that extracts the book and author data. Callout A shows the call that executes sp_xml_preparedocument. Notice the last parameter. This XML document declares the namespaces that will be used in the OpenXML statement. Callout B shows the OpenXML statement. Notice the use of the prefixes declared at callout A. OpenXML properly matches the namespaces associated with the prefixes.

Listing 3 shows the T-SQL code that executes Listing 2's stored procedure. Notice that the document passed to the stored procedure uses different prefixes (bk and pers) than those in the third parameter to sp_xml_preparedocument. Namespace prefixes are just placeholders for the URIs that define the namespaces (in this case, myBooks and myPerson). When the OpenXML statement executes, it matches the namespace URI associated with the prefix rather than the prefix itself. Therefore, the stored procedure correctly selects the book and author titles. Figure 1 shows the results of executing the stored procedure.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.