Creating the New Tables
In the case of these keyword arrays, we're looking at breaking up a string of terms into its component words. If you don't have access to the application code, or if the application code is written so that it's impossible to tell what its doing with the arrays, it's better to proceed conservatively with the data transfer. To determine the length of the longest array string in ASSET.keyword, execute the following code:
SELECT max(len(keywords)) FROM dbo.ASSET
If the longest array string is 55 bytes, for example, you'll create KEYWORD.KeywordDscr as a varchar(55).
Now, let's look at the three Web listings you downloaded. Web Listing 1 creates the original ASSET table and loads it with sample data, including the keyword array for each record. The data set is atypically smallonly 33 rows. Production tables like this one typically contain many thousands of rows. Notice the comments scattered through the three code listings. When you're working on a data-transfer project and porting data from one database schema to another, you must document everything you do.
The code in Web Listing 2 creates the three new normalized tables, IMAGE, KEYWORD, and IMAGEKEYWORD. I set the length of KeywordDscr in the KEYWORD table to 40 characters based on the result I got from the SELECT max(len(keywords)) FROM dbo.ASSET query. I created on the KeywordDscr column a unique clustered index; it's indexed for fast searches and unique to avoid duplicate keywords.
Time to Transfer
The scripts for Web Listing 3, which are the most complex of the three listings we're working with, transfer data from ASSET into IMAGE, KEYWORD, and IMAGEKEYWORD. The first part of Web Listing 3 is a simple, direct transfer of data from field to corresponding field. I wrote the transfer script from the point of view of the target table (i.e., where we're transferring the data). But remember that you should always document where the data is coming from, as the comment at callout A in Web Listing 3 shows.
In addition, row counts are necessary to ensure that your transfer scripts are working correctly. Execute SELECT statements throughout a transfer script so that you maintain control over the data. No rule specifies how often to check rowcounts, but you should check in at logical intervals. For example, the SELECT statement at callout B in Web Listing 3 shows a rowcount of 33; by checking this count periodically, you can be sure you're not losing or adding rows. If you're doing a simple, direct transfer, generally the row count at the end of the script will equal the row count at the beginning. The more complicated the data transfer, the less true this statement. In this article's example, breaking apart the array string into a list of component words will vastly increase the row count from the source to the target. There's no way to retain absolute control over the source-to-target row count in this case, but checking it gives you a sense of whether you're on the right track.
Use the SET IDENTITY_INSERT <tablename> ON function (as the line at callout C in Web Listing 3 shows) when you need to transfer identity values from the old database to the new one. As I mentioned, in the case of the IMAGE table, you must carry forward the existing identity values.
Transferring data from the old ASSET table into the new KEYWORD table is complicated. The lines at callout D in Web Listing 3 show the start of the section of code that will break up the keyword array from the ASSET table and transfer the individual component words into the KEYWORD table. The first step in this section, at callout E, checks for the longest string in the source table to be sure the KeywordDscr column in the KEYWORD table is long enough to accommodate the longest string in the Keywords column of the ASSET table.
When I do data transfers, I use "working tables" to store intermediate results. In this example, the code at callout F creates the working table _TEMP_Keywords_Catalog to hold the component words from the keywords array. The code at callout G creates the worktable _TEMP_KeywordArray to hold a copy of the keywords array from the ASSET table, along with the unique identifier (UID) for each record. In this temporary table, you need just the columns from the ASSET.Keywords fieldnot the entire recordfor data manipulation.
The next section of code is where the real work starts. If the code at callout H finds a space in a keywords array record, the code writes the record's unique ID and all characters up to the space into the _TEMP_KeywordsCatalog holding table. Then, the code removes from the keywords array everything it just copied up to and including the space. If the remainder of the array includes any leading spaces (which might happen if the original array record included two or more spaces between words), the code truncates those leading spaces. The code repeats the process until there's nothing left in the original keywords array record, then moves on to the next ASSET table record.
Prev. page
1
[2]
3
next page