Test your T-SQL to solve the unsolvable puzzle
Imagine you're at a job interview for an SQL programming position when the interviewer challenges you to find the solution to a puzzle. You give it your best shot, attacking the problem from several angles, and when the time is up, you find yourself right where you startedwith no solution. Years later, after giving more thought to the problem, you decide that the puzzle probably doesn't have a solution after all! That's what happened to Isaac Blank, a talented SQL programmer who kindly gave me permission to share his experience with you.
Here's the puzzle Isaac faced. Given the table that the code in Listing 1 creates, find one SELECT statement that retrieves all the missing numbers in the table. The output should be in the format that Figure 1 shows. Note that you're not allowed to create or use any additional tables, and the SELECT statement should be static (i.e., not built as an SQL string and run dynamically).
Isaac's interviewer claimed that the puzzle solution was a single SELECT query that involved a complex (five-way or so) self-join. Isaac thinks now that no such solution exists. I gave the puzzle my best shot, and after trying several solutions, I realized that they're all limited in some way. I also think that the puzzle probably doesn't have a solution, but I might be missing something. Let's look at how I went about trying to solve the puzzle. You can try to solve it first on your own or read on to see my conclusions. If you think that Isaac and I missed something and that you have a solution, I'd love to hear it.
Laying the Groundwork
Before you start a puzzle like this, you need to make sure that you have all the information required to solve it. It's important not to leave any room for ambiguity because a missing piece of information, however subtle, might determine the solution's complexity or even whether the puzzle is solvable. This puzzle is missing several pieces of information. First, must the solution be ANSI-compliant, or can it be any valid T-SQL query? In this case, the solution must be ANSI-compliant.
Second, the puzzle doesn't specify the lower and upper boundaries. Are they 1 and 1000? Or 1 and the maximum value in column a? Or the minimum and maximum values in column a? Let's presume just the minimum and maximum values in a, which happen to be 1 and 1000 in this case. The minimum and maximum values might be different next time you run your query if the table undergoes modifications in the meantime. For example, if the row with a value of 1 in column a is deleted and a new row with a value of 2000 is inserted, the minimum and maximum values become 2 and 2000. Your query needs to work correctly without your knowing what the minimum and maximum values will be. Furthermore, let's deal only with natural numbers (positive integers).
Third, what's the highest possible upper boundary? A certain specific value? The maximum integer SQL Server 2000 allows (the bigint data type allows a maximum of 263-1)? Or maybe any theoretical natural number that's not constrained by the limitations of the existing data types in the product you're implementing the solution on? Any theoretical natural number means that the solution would work without any changes even in future releases of SQL Server that might support larger integer data types. Isaac and I decided to assume any theoretical nonfinite natural number.
Now that you have all the missing pieces of information, you can start working on a solution. Note that you're not allowed to revise any of the requirements. For example, my initial thought was to provide the output not in the exact format specified but as ranges of missing numbers. You can try this approach as an exercise (I'll even get to it later in the article), but be aware that you're solving a different puzzle. Before you continue reading, I urge you to first give the puzzle some thought and try to find a solution.
Digging In
One approach to solving this puzzle is to first isolate the primary obstacles that prevent you from providing a solution. By ignoring certain requirements, you can create a simple solution, then focus on finding an alternative that meets the puzzle's requirements. The two most obvious obstacles are that you're allowed to refer to only Test1 and that the upper boundary can theoretically be any natural number. Let's remove these two obstacles for now by assuming that the upper boundary is finitesay 1000and that you can create and use tables other than Test1 in your query. At this point, the solution is simple: Use an auxiliary table that contains all integer numbers in the range 1 to 1000. You can use the script in Listing 2 to create and populate a table called Nums that has one integer column called num.
The following query is a solution to the simplified puzzle (make sure you first run the code in Listings 1 and 2 to create the tables):
SELECT num
FROM Nums
WHERE num BETWEEN (SELECT MIN(a) FROM Test1)
AND (SELECT MAX(a) FROM Test1)
AND NOT EXISTS(SELECT * FROM Test1
WHERE a = num)
ORDER BY num
The query returns from the Nums table all numbers that are between the minimum and maximum values in Test1 but that don't exist in Test1in other words, all the missing numbers you're looking for.