• subscribe
August 19, 2003 12:00 AM

Try, Try Again

Readers offer alternative solutions to the relational-division puzzle
SQL Server Pro
InstantDoc ID #39604
Downloads
39604.zip

I've received a lot of feedback during the past few months from readers suggesting solutions to puzzles I've presented in my column. So this month and next, I'll share some of the best solutions. I received many creative solutions and had a tough time deciding which to discuss. Thanks to all who sent in your ideas, and keep trying to solve the puzzles and improve your T-SQL skills. Sometimes, working on a puzzle can be its own reward.

August Relational-Division Puzzle
This month, I share two set-based solutions to last month's relational-division puzzle (which I had posted in a private SQL Server MCT forum) from Leroy Tuttle and Dieter Nöth. The puzzle scenario involved a training facility that uses three tables—Courses, Materials, and CourseMaterials—to track information about courses and course materials. You can run the code from August's Listing 1 to create and populate these tables. (To read last month's column, "Survive the (Relational) Divide," or download its code listings, enter InstantDoc ID 39300 at http://www.sqlmag.com.) The puzzle's object was to return, for each course, the minimum course ID of all the courses that use the same set of course materials. The desired solution appeared in August's Figure 1.

For benchmarking, August's Listing A in the sidebar "Benchmarking Relational-Division Solutions" (InstantDoc ID 39301) provided code that populates the tables with large sets of data. When I populated the tables with 10,000 courses, 10,000 materials, and about 55,000 course materials, my set-based solutions ran for several minutes on my laptop. In contrast, Tuttle's solution runs for about 30 seconds, and Nöth's solution runs for less than 1 second.

Tuttle's Solution
Tuttle's solution correlates courses with each other based on a match in both the total count of materials per course and the individual materials themselves. I slightly revised the original solution Tuttle sent me to make it purely set-based, but the principles are the same. Let's first look at a modular approach that uses views to solve the puzzle one step at a time; then, I'll present a one-step solution that uses derived tables.

The first step in the modular approach is to create the CoursesXYCnt view by running the code that Listing 1 shows. The view query joins the CourseMaterials table to itself, matching rows that have the same courseid and materialid, and returns a row for each unique combination of course IDs that have matching rows. I call the two instances of CourseMaterials X and Y. The view returns the courseid from X (courseidX), the courseid from Y (courseidY), and the count of matching rows (cnt). So we're matching the materials that are the same for both courses and returning the count of matches. Note that the count of matching rows can be smaller than the count of course materials in X or Y if courses X and Y don't have equal sets of course materials. Figure 1 shows the results of a SELECT * query against CoursesXYCnt.

The second step in this solution is to create the CoursesCnt view by running the code that Listing 2 shows. This view contains a simple query that returns the number of course materials for each course. Figure 2 shows the results of a SELECT * query against CoursesCnt.

The third and final step is to join the CoursesXYCnt view to two instances of CoursesCnt—one for Course X and one for Course Y. The join ensures that you retrieve only those rows from CoursesXYCnt that have the same count as the number of course materials in course X and course Y. Only the courses whose course materials match exactly meet the join criteria. Now all you need to do to get the desired output is to group the result by courseidX and return the minimum courseidY, as the query in Listing 3 shows.

I used views to simplify the solution's explanation, solving the problem a step at a time, but you might prefer to use derived tables instead of views, as the code in Listing 4 shows. Derived tables are handy if you'd rather not create persistent objects such as views in your database—for example, because of lack of permissions. In terms of performance, both solutions perform the same. Views are often handy when your code becomes too long or complex to maintain, but the code in Listing 4 is short even when you use derived tables, so there's no practical reason to break it into views. As I said earlier, Tuttle's solution runs for 30 seconds on my laptop against a table containing about 55,000 course materials.



ARTICLE TOOLS

Comments
  • Itzik Ben-Gan
    9 years ago
    Sep 21, 2003

    Hi, Alejandro,

    It's encouraging to see that people don't give up on pure set-based solutions! Your solution is interesting and runs for a minute on my laptop against ~55,000 rows. I've got a few more pure set-based solutions from other readers, and one of them even performs better than all the solutions that I presented in the September column. Stay tuned until the October column, where I present a couple of those solutions. Thanks for sharing your solution.
    --Itzik

  • Alejandro Mesa
    9 years ago
    Sep 05, 2003

    I really enjoy every single article on this Web site. Here's another try at solving the relational-division puzzle:

    select c.courseid,
    case
    when coalesce(min(a.c2), c.courseid) > c.courseid then c.courseid
    else coalesce(min(a.c2), c.courseid)
    end as min_courseid
    from courses as c
    left join
    (
    select cm1.courseid,
    cm2.courseid,
    count(*)
    from courseMaterials as cm1 inner join courseMaterials as cm2
    on cm1.materialid = cm2.materialid
    where cm1.courseid <> cm2.courseid
    group by cm1.courseid, cm2.courseid
    having 2 * count(*) = (
    select count(*)
    from courseMaterials as cm
    where cm.courseid in (cm1.courseid, cm2.courseid)
    )
    ) as a(c1, c2, mats)
    on c.courseid = a.c1
    group by c.courseid

    I'm not totally proficient in English yet (still learning), but here's my explanation of the solution. Joining the table courseMaterials to itself by materialid and grouping on both course ids (cm1.courseid, cm2.courseid) having its count(*) multiplied by 2 equal to the number of materials used by both course ids will give us the pairs of courses that use the same materials.

    If we left-join the table courses to the previous result and group by courseid, we can have the minimum between the courseid and the ones that match.

    Regards,
    Alejandro Mesa

You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...