As T-SQL programmers, we always hear that the SQL language is optimized for set-based solutions rather than procedural solutions, but we seldom see examples from that perspective. Consequently, many beginning SQL programmers don’t have a clear understanding of what set-based means in terms of the code they need to write to solve a specific problem.
Even for those who understand the concept, there are many programming problems for which a set-based solution seems impossible. Sometimes that's true. It's not always possible to find a set-based solution, but most of the time we can find one by using a little creative thinking. A good SQL programmer must develop the mental discipline to explore set-based possibilities thoroughly before falling back on the intuitive procedural solution.
In this article, I provide a relatively simple example that illustrates how to think in a set-based way about a common type of problem that also has an intuitive procedural solution.
The Business Case
When you visit the doctor’s office, the first thing the nurse does is put you on a scale, record your weight, and check your height. Checking your weight makes sense from a medical point of view, but have you ever wondered why the nurse records your height each time? Unless you're very young, your height hasn’t changed since your last visit and isn't likely to change again.
The reason the nurse checks your height is to guard against identity theft. Health care providers want to make sure that the services they provide are going to the person who gets the bill—not to an imposter with a forged identity card.
This kind of identity theft happens more frequently than you might think. HIPAA regulations now require an audit of changes in permanent physical characteristics in a patient’s history that might suggest identity theft.
Querying this kind of information provides a good example for comparing procedural thinking and set-based thinking when programming in SQL.
The Problem Statement
The generic programming problem is that the solution depends on the order of rows and requires the comparison of current row values with values in previous rows. This is a type of problem in which the procedural solution is intuitive, but the set-based solution isn't so obvious.
In this particular problem, we're looking for rows where a previous visit for the same patient has a height value that's different from the height on the current record. We want to return the patient’s unique medical record number, the date the change occurred, what the height was changed from, and what the height was changed to. We don't want to return any records that don't mark a change in height.
Listing 1 gives you the code to create and populate the tables in this example, if you'd like to run the example yourself.
Listing 1: Creating and populating the tables
/*
We use the AdventureWorks sample database to create tables for
our test but you may use another database by changing
the USE statements in all 3 listings.
*/
USE AdventureWorks;
SET NOCOUNT ON;
CREATE TABLE Dates
(ID int, VisitDate datetime);
--populate table with 20 visit dates
DECLARE @i int, @startdate datetime;
SET @i = 1;
SET @startdate = GETDATE();
WHILE @i <= 20
BEGIN
INSERT Dates
(ID, VisitDate)
VALUES (@i, @startdate);
SET @startdate = DATEADD(dd,7, @startdate);
SET @i = @i+1;
END
CREATE TABLE PatientHeight
(PatientID int not null
,Height int);
-- populate table with 1000 patientids with heights between 59 and 74 inches
SET @i = 1;
WHILE @i <= 10000
BEGIN
INSERT PatientHeight
(PatientID, Height)
VALUES (@i, @i % 16 + 59);
SET @i = @i+1;
END
ALTER TABLE PatientHeight ADD CONSTRAINT PK_PatientHeight
PRIMARY KEY(PatientID);
-- cartesian join produces 200,000 PatientVisit records
SELECT
ISNULL(PatientID, -1) AS PatientID,
ISNULL(VisitDate, '19000101') AS VisitDate,
Height
INTO PatientVisit
FROM PatientHeight
CROSS JOIN Dates;
ALTER TABLE PatientVisit ADD CONSTRAINT PK_PatientVisit
PRIMARY KEY(PatientID, VisitDate);
-- create changes of height
SET @i = 3;
WHILE @i < 10000
BEGIN
UPDATE pv
SET Height = Height +2
FROM PatientVisit pv
WHERE PatientID = @i
AND pv.VisitDate =
(SELECT TOP 1 VisitDate
FROM Dates
where id = ABS(CHECKSUM(@i)) % 19);
SET @i = @i + 7;
END
/*
-- return AdventureWorks to its previous state when you are finished
-- with this example.
DROP TABLE Dates;
DROP TABLE PatientHeight;
DROP TABLE PatientVisit;
*/
A Procedural Approach
The intuitive, procedural way to attack this problem is to order the records by patient and visit date, then loop through the records for each patient one row at a time. We query the first record for the patient and save the patient’s original height in a variable. Then, we loop through subsequent records for the patient, comparing height values. If we find that the height is different on a subsequent record, we write an audit record, update the height variable with the current value, and continue looping through the rows. Then we move to the next patient.
Listing 2 contains the code for the cursor-based solution. The cursor method works, but it's very inefficient. It could pose a serious performance problem when working with a large number of rows. How can we do this in a set-based and presumably more efficient way?
Listing 2: The cursor-based solution
USE AdventureWorks;
CREATE TABLE #Changes
( PatientID int
, VisitDate datetime
, BeginHeight smallint
, CurrentHeight smallint);
DECLARE @PatientID int
, @CurrentID int
, @BeginHeight smallint
, @CurrentHeight smallint
, @VisitDate datetime;
SET @PatientID = 0;
DECLARE Patient_cur CURSOR FAST_FORWARD FOR
SELECT PatientID
, VisitDate
, Height
FROM PatientVisit
ORDER BY PatientID
,VisitDate;
OPEN Patient_cur;
FETCH NEXT FROM Patient_cur INTO @CurrentID, @VisitDate, @CurrentHeight;
WHILE @@FETCH_STATUS = 0
BEGIN
-- first record for this patient
IF @PatientID <> @CurrentID
BEGIN
SET @PatientID = @CurrentID;
SET @BeginHeight = @CurrentHeight;
END
IF @BeginHeight <> @CurrentHeight
BEGIN
INSERT #Changes ( PatientID
, VisitDate
, BeginHeight
, CurrentHeight)
VALUES
(@PatientID
, @VisitDate
, @BeginHeight
, @CurrentHeight);
SET @BeginHeight = @CurrentHeight;
END
FETCH NEXT FROM Patient_cur INTO @CurrentID, @VisitDate, @CurrentHeight;
END
CLOSE Patient_cur;
DEALLOCATE Patient_cur;
SELECT * FROM #Changes
DROP TABLE #Changes