I've received a lot of feedback during the past few months from readers suggesting solutions to puzzles I've presented in my column. So this month and next, I'll share some of the best solutions. I received many creative solutions and had a tough time deciding which to discuss. Thanks to all who sent in your ideas, and keep trying to solve the puzzles and improve your T-SQL skills. Sometimes, working on a puzzle can be its own reward.
August Relational-Division Puzzle
This month, I share two set-based solutions to last month's relational-division puzzle (which I had posted in a private SQL Server MCT forum) from Leroy Tuttle and Dieter Nöth. The puzzle scenario involved a training facility that uses three tables—Courses, Materials, and CourseMaterials—to track information about courses and course materials. You can run the code from August's Listing 1 to create and populate these tables. (To read last month's column, "Survive the (Relational) Divide," or download its code listings, enter InstantDoc ID 39300 at http://www.sqlmag.com.) The puzzle's object was to return, for each course, the minimum course ID of all the courses that use the same set of course materials. The desired solution appeared in August's Figure 1.
For benchmarking, August's Listing A in the sidebar "Benchmarking Relational-Division Solutions" (InstantDoc ID 39301) provided code that populates the tables with large sets of data. When I populated the tables with 10,000 courses, 10,000 materials, and about 55,000 course materials, my set-based solutions ran for several minutes on my laptop. In contrast, Tuttle's solution runs for about 30 seconds, and Nöth's solution runs for less than 1 second.
Tuttle's Solution
Tuttle's solution correlates courses with each other based on a match in both the total count of materials per course and the individual materials themselves. I slightly revised the original solution Tuttle sent me to make it purely set-based, but the principles are the same. Let's first look at a modular approach that uses views to solve the puzzle one step at a time; then, I'll present a one-step solution that uses derived tables.
The first step in the modular approach is to create the CoursesXYCnt view by running the code that Listing 1 shows. The view query joins the CourseMaterials table to itself, matching rows that have the same courseid and materialid, and returns a row for each unique combination of course IDs that have matching rows. I call the two instances of CourseMaterials X and Y. The view returns the courseid from X (courseidX), the courseid from Y (courseidY), and the count of matching rows (cnt). So we're matching the materials that are the same for both courses and returning the count of matches. Note that the count of matching rows can be smaller than the count of course materials in X or Y if courses X and Y don't have equal sets of course materials. Figure 1 shows the results of a SELECT * query against CoursesXYCnt.
The second step in this solution is to create the CoursesCnt view by running the code that Listing 2 shows. This view contains a simple query that returns the number of course materials for each course. Figure 2 shows the results of a SELECT * query against CoursesCnt.
The third and final step is to join the CoursesXYCnt view to two instances of CoursesCnt—one for Course X and one for Course Y. The join ensures that you retrieve only those rows from CoursesXYCnt that have the same count as the number of course materials in course X and course Y. Only the courses whose course materials match exactly meet the join criteria. Now all you need to do to get the desired output is to group the result by courseidX and return the minimum courseidY, as the query in Listing 3 shows.
I used views to simplify the solution's explanation, solving the problem a step at a time, but you might prefer to use derived tables instead of views, as the code in Listing 4 shows. Derived tables are handy if you'd rather not create persistent objects such as views in your database—for example, because of lack of permissions. In terms of performance, both solutions perform the same. Views are often handy when your code becomes too long or complex to maintain, but the code in Listing 4 is short even when you use derived tables, so there's no practical reason to break it into views. As I said earlier, Tuttle's solution runs for 30 seconds on my laptop against a table containing about 55,000 course materials.