• subscribe
May 01, 2002 12:00 AM

Statistical Process Control

SQL Server Pro
InstantDoc ID #24814
Downloads
24814.zip

T-SQL provides a method for calculating a useful trend marker

Many industries use statistical process control (SPC) to monitor the quality or consistency of a product. Technicians or monitoring devices test the product at regular intervals, and data analysts process the resulting sample data to identify variations and trends. Historically, analysts manually transcribed data onto control charts such as the one that Figure 1 shows, performed statistical calculations, and made visual inspections. Today, process control can be highly automated; analysts use database management systems (DBMSs), either alone or with specialized software, to manage and analyze the data.

Suppose you're monitoring the concentration of antimony in Manhattan drinking water by testing water from various locations every hour. Fifteen sequential measurements from two locations might produce the data that Figure 1 shows. Patterns in this data might suggest that the antimony concentration is increasing and that water department officials should perform additional testing or take corrective action.

You expect to find some fluctuation in the sample measurements, but certain patterns will rarely occur unless an underlying trend exists. One such pattern, or trend marker, is the occurrence of seven consecutive above-average measurements; another is two consecutive measurements in the top 5 percent of recorded measurements. SPC analysis uses these and other criteria to identify potential trends. Commercial SPC software packages provide a variety of trend markers to choose from, and which ones you use depends on various aspects of your process. For example, some trend markers are better indicators than others when you know your measurements are prone to error; other trend markers are better at detecting or ignoring periodic variations. However, commercial software packages don't typically include one useful trend marker—the presence of an increasing subsequence of a given length. To make this helpful SPC measure available to a SQL Server application, I developed some T-SQL code to query data for the length of the longest increasing subsequence.

To understand what an increasing subsequence is and why it's a good trend marker, try to answer the following question: In what month of the year do you have the best chance of being able to take a noontime walk outside nine times during the month on days that the outside temperature is warmer than it was during the previous walk? If you live in New York City, you might say March or April. If you live in Sydney, you might say October. And if you live in Guam, you might say the question is impossible to answer because the noontime temperature is between 84 and 86 degrees Fahrenheit almost every day of the year.

Noontime temperature is a physical quantity that can be measured, and noontime temperatures for a one-month period are a sequence of measurements. The noontime temperatures on the days you walked form a subsequence of those measurements. The question above asked you to find an increasing subsequence of nine measurements; this challenge shows how trends in a physical quantity can reveal an increasing subsequence of measurements. Thus, a long increasing subsequence is a good trend marker.

One increasing subsequence in the data that Figure 1 shows comprises the Midtown water readings numbered 3, 4, 6, and 9, which have values of 3.939, 4.050, 4.140, and 4.143, respectively. Can you find the length of the longest increasing subsequence for Midtown? For Uptown? The answers will say something about possible trends in the data; later I describe a general solution in T-SQL.

The Problem
The original problem and solution resulted from a question that John Winterbottom posted to the Microsoft newsgroup microsoft.public.sqlserver.programming. Winterbottom wanted to process sequences of seven measurements to identify which sequences contained increasing subsequences of length 4. He asked how to enumerate the subsequences of length 4 within a sequence of length 7 so that he could inspect each subsequence to see whether the values were increasing. A sequence of length 7 contains only 35 subsequences of length 4, so generating those 35 subsequences and inspecting each one is a practical approach. Listing 1 shows the T-SQL code that generates the increasing subsequences of length 4 that exist in a sequence of seven measurements.

I proposed one improvement to Winterbottom's solution before I discovered a more efficient algorithm that I describe in a moment. I observed that the existence of an increasing length-4 subsequence depended only on the relative order of the seven measurements. For example, if the seven measurements, in the order they were made, are M1, M2, M3, M4, M5, M6, and M7 and of these values, if M1 is the third largest, M2 sixth largest, M3 fifth largest, M4 second largest, M5 seventh largest, M6 fourth largest, and M7 first largest, then M2, M3, M6, M7 is one increasing subsequence of length 4, regardless of the exact measurements, because these values are in the order sixth, fifth, fourth, and first largest of all seven.

With seven measurements, you can rank the measurements (first through seventh) in only 5040 ways, so it's possible to calculate the particular rank orderings that contain increasing subsequences of length 4 one time and store the results in a permanent table. Then, the next time you need to process a sequence of seven measurements, you can compute the rank order for that sequence and use the permanent table to look up the order. Creating the permanent table is a tedious process, but this method provides a reasonable solution to the problem, and being able to reuse the information in the table makes the process simpler. (If duplicate measurements exist among the seven, the earlier one must be ranked higher for this method to give correct results.)



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here