• subscribe
June 23, 2010 01:33 PM

Programming SQL in a Set-Based Way

Pull yourself out of your comfort zone and think in a new way
SQL Server Pro
InstantDoc ID #125198
Downloads
SurvanceListings.zip

As T-SQL programmers, we always hear that the SQL language is optimized for set-based solutions rather than procedural solutions, but we seldom see examples from that perspective. Consequently, many beginning SQL programmers don’t have a clear understanding of what set-based means in terms of the code they need to write to solve a specific problem.

Even for those who understand the concept, there are many programming problems for which a set-based solution seems impossible. Sometimes that's true. It's not always possible to find a set-based solution, but most of the time we can find one by using a little creative thinking. A good SQL programmer must develop the mental discipline to explore set-based possibilities thoroughly before falling back on the intuitive procedural solution.

In this article, I provide a relatively simple example that illustrates how to think in a set-based way about a common type of problem that also has an intuitive procedural solution.

 

The Business Case

When you visit the doctor’s office, the first thing the nurse does is put you on a scale, record your weight, and check your height. Checking your weight makes sense from a medical point of view, but have you ever wondered why the nurse records your height each time? Unless you're very young, your height hasn’t changed since your last visit and isn't likely to change again.

The reason the nurse checks your height is to guard against identity theft. Health care providers want to make sure that the services they provide are going to the person who gets the bill—not to an imposter with a forged identity card.

This kind of identity theft happens more frequently than you might think. HIPAA regulations now require an audit of changes in permanent physical characteristics in a patient’s history that might suggest identity theft.

Querying this kind of information provides a good example for comparing procedural thinking and set-based thinking when programming in SQL.

 

The Problem Statement

The generic programming problem is that the solution depends on the order of rows and requires the comparison of current row values with values in previous rows. This is a type of problem in which the procedural solution is intuitive, but the set-based solution isn't so obvious.

In this particular problem, we're looking for rows where a previous visit for the same patient has a height value that's different from the height on the current record. We want to return the patient’s unique medical record number, the date the change occurred, what the height was changed from, and what the height was changed to. We don't want to return any records that don't mark a change in height.

Listing 1 gives you the code to create and populate the tables in this example, if you'd like to run the example yourself.

Listing 1: Creating and populating the tables

/*
We use the AdventureWorks sample database to create tables for
our test but you may use another database by changing
the USE statements in all 3 listings.
*/

USE AdventureWorks;

SET NOCOUNT ON;

CREATE TABLE Dates
(ID int, VisitDate datetime);

--populate table with 20 visit dates
DECLARE @i int, @startdate datetime;
SET @i = 1;
SET @startdate = GETDATE();

WHILE @i <= 20
BEGIN
    INSERT Dates
    (ID, VisitDate)
    VALUES (@i, @startdate);
    
    SET @startdate = DATEADD(dd,7, @startdate);
    SET @i = @i+1;
END

CREATE TABLE PatientHeight
(PatientID  int not null
,Height int);

-- populate table with 1000 patientids with heights between 59 and 74 inches
SET @i = 1;

WHILE @i <= 10000
BEGIN
    INSERT PatientHeight
    (PatientID, Height)
    VALUES (@i, @i % 16 + 59);
    
    SET @i = @i+1;
END

ALTER TABLE PatientHeight ADD CONSTRAINT PK_PatientHeight
    PRIMARY KEY(PatientID);

-- cartesian join produces 200,000 PatientVisit records

SELECT
    ISNULL(PatientID, -1) AS PatientID,
    ISNULL(VisitDate, '19000101') AS VisitDate,
    Height
INTO PatientVisit
FROM PatientHeight
CROSS JOIN Dates;

ALTER TABLE PatientVisit ADD CONSTRAINT PK_PatientVisit
    PRIMARY KEY(PatientID, VisitDate);

-- create changes of height
SET @i = 3;

WHILE @i < 10000
BEGIN
    UPDATE pv
    SET Height = Height +2
    FROM PatientVisit pv
    WHERE PatientID = @i
    AND pv.VisitDate =
    (SELECT TOP 1 VisitDate
    FROM Dates
    where id = ABS(CHECKSUM(@i)) % 19);
    
SET @i = @i + 7;
END

/*
-- return AdventureWorks to its previous state when you are finished
-- with this example.

DROP TABLE Dates;
DROP TABLE PatientHeight;
DROP TABLE PatientVisit;
*/

A Procedural Approach

The intuitive, procedural way to attack this problem is to order the records by patient and visit date, then loop through the records for each patient one row at a time. We query the first record for the patient and save the patient’s original height in a variable. Then, we loop through subsequent records for the patient, comparing height values. If we find that the height is different on a subsequent record, we write an audit record, update the height variable with the current value, and continue looping through the rows. Then we move to the next patient.

Listing 2 contains the code for the cursor-based solution. The cursor method works, but it's very inefficient. It could pose a serious performance problem when working with a large number of rows. How can we do this in a set-based and presumably more efficient way?

Listing 2: The cursor-based solution
USE AdventureWorks;

CREATE TABLE #Changes
( PatientID int
, VisitDate    datetime
, BeginHeight smallint
, CurrentHeight    smallint);

DECLARE @PatientID        int
,        @CurrentID        int
,        @BeginHeight    smallint
,        @CurrentHeight    smallint
,        @VisitDate        datetime;

SET @PatientID = 0;

DECLARE Patient_cur CURSOR FAST_FORWARD FOR
SELECT PatientID
, VisitDate
, Height
FROM PatientVisit
ORDER BY PatientID
,VisitDate;

OPEN Patient_cur;

FETCH NEXT FROM Patient_cur INTO @CurrentID, @VisitDate, @CurrentHeight;

WHILE @@FETCH_STATUS = 0
BEGIN
-- first record for this patient
IF @PatientID <> @CurrentID
BEGIN
    SET @PatientID = @CurrentID;
    SET @BeginHeight = @CurrentHeight;
END

IF @BeginHeight <> @CurrentHeight
BEGIN
INSERT #Changes ( PatientID
, VisitDate
, BeginHeight
, CurrentHeight)
VALUES
(@PatientID
, @VisitDate
, @BeginHeight
, @CurrentHeight);

SET @BeginHeight = @CurrentHeight;

END

FETCH NEXT FROM Patient_cur INTO @CurrentID, @VisitDate, @CurrentHeight;

END

CLOSE Patient_cur;
DEALLOCATE Patient_cur;

SELECT * FROM #Changes

DROP TABLE #Changes

 



ARTICLE TOOLS

Comments
  • SURVANCE
    2 years ago
    Oct 05, 2010

    AMY,

    that solution gets the same results but relies on data that you won't have in the real world, ie a correct height for each patient in the patientheight table. that table was an itermediate step in creating the sample data but does not figure in the solution.

  • Amy
    2 years ago
    Sep 30, 2010

    After populating the tables, I tried to write my own solution b4 looking at yours. I came up with:

    SELECT ph.PatientID, ph.Height as BeginningHeight, VisitDate, pv.Height as ChangedHeight
    FROM PatientHeight ph INNER JOIN PatientVisit pv
    ON ph.PatientID=pv.PatientID
    WHERE ph.Height <> pv.Height
    Order by ph.PatientID

    This actually seems to work better than the CTE solution. It would never have occurred to me to try a cursor solution.

  • SURVANCE
    2 years ago
    Aug 30, 2010

    the 19 in ABS(CHECKSUM(@i)) % 19 as well as the 7 in SET @i = @i + 7; are prime numbers that were used to make the selection of visitdates and patients psuedo random. It didn't work out very random but it was enough for our test purposes.

  • Kehoe
    2 years ago
    Aug 19, 2010

    I don't see why 'ABS(CHECKSUM(@i)) % 19' was used in the UPDATE statement in Listing 1. Should it have been '1 + ABS(CHECKSUM(@i)) % 19' so that height changes would include all 20 dates and 19 possible date differences?

You must log on before posting a comment.

Are you a new visitor? Register Here