In last month's column ("Put Your Database to the Test," InstantDoc ID 93533),
I introduced the concept of formalized unit testing of database objects. In
that article, I showed how Visual Studio Team Edition for Database Professionals
(Team Data) can provide a rich infrastructure that lets you create a full set
of unit tests for your database. However, database testing requires more than
just the unit tests; in fact, one of the most important aspects of database
testing is a good set of test data. Without good test data, how can you be sure
that the tests you run find all the problems? In most cases, unless you test
against the real production data, it's hard to know that the tests you run are
meaningful. Team Data can help you create realistic test data so that you won't
need to deal with unit-testing problems.
The Problem with Data
In many situations, you can't test against production data because you aren't
allowed to see it. Real production data can contain credit card numbers, social
security numbers, salary information, and many other types of personal information
that you don't want everyone to see. So how do you get around this problem?
You could use an ETL tool to take a cut of the production data and obfuscate
the sensitive data. But this solution can lead to a random, inaccurate representation
of your production systems and requires a significant amount of work to set
up.
Another solution is to create a random set of test data by using union queries that will generate the required quantity of data. Unfortunately, this solution can result in repeated data and doesn't accurately represent the production system either.You can also use SQL Server 2005's TABLESAMPLE option to sample production environment data and copy it to the test environment. However, this solution also has limitations, such as difficulty with handling data that's related through foreign keys and production data exposure.
Even if you get past the first problem of generating adequate test data, you'll still run into a second problem: How do you ensure that every time you run a unit test— potentially many times—you're testing against exactly the same data each time? In many cases, the tests you run alter the data so that the next time you run the test, the data is different. In these cases, the test results might be different—or at the best inconclusive.
Start with a Plan
Team Data addresses these test-data problems by letting you create something
called a Data Generation Plan, which defines how to create a set of test data.
A functionality called DataGenerator runs this Data Generation Plan to create
the test data.To create a meaningful plan, you must understand the database's
types and ranges of data, the shape of the data, and the normal data distribution
between tables. All of these things are adjustable in the DataGenerator interface,
which Web Figure 1 shows. The interface lets you define relationships between
tables and define relationship ratios. For instance, if you looked at the shape
of the data and determined that, on average, a customer had 10 orders, you can
define this ratio to create data that matches that definition.
In addition, the Data Generation Plan lets you define exactly what data is
generated into each field for any particular table when the Plan runs.You can
define this data by using either one of the nine built-in generators or by creating
custom generators. The basic data generators let you create data for all the
standard data types and include a regular expression generator that lets you
use regular expressions (i.e., RegEX) to define how data is generated. You can
use RegEX to define the format, then let the generator create random values
for you.This functionality is useful for generating order numbers, zip codes,
phone numbers, and the like.
The DataGenerator functionality also avoids the danger of changing the data
every time you run a test by assigning a seed value to all the generated data.This
seed value ensures that every time you run a given Data Generation plan, the
Data-Generator will create exactly the same test data because it is based on
the seed.
You can run these Data Generation Plans manually or combine them with the unit
test functionality I explained in "PutYour Database to the Test." You can assign
a Data Generation Plan to a particular unit test so that every time you run
the unit test, the Data Generation plan runs and deploys the same test data
to the database that the test is running against. Now that we've defined a baseline,
created a set of tests to support that baseline, and ensured our data's reliability,
next I'll look at how Team Data features can make it easier to change the database.
End of Article