One of the most practical models available in SQL Server to represent trees and hierarchies, such as an Employees hierarchy, is known as the enumerated path model. In this model you add two attributes to the table. One attribute holds an enumerated path of all node IDs leading to the current node, and another attribute holds the level, or distance, from the root node. To implement this model, you can create your own custom solution in any version of SQL Server, or you can use the built-in HIERARCHYID data type introduced in SQL Server 2008. I covered both approaches in detail in previous articles. I covered the custom solution in “Maintaining Hierarchies” and the solution based on the HIERARCHYID data type in “HierarchyID” and “SQL Server 2008’s HierarchyID, Part 2.”
The queries you would write to address common requests against trees and hierarchies implemented based on the enumerated path model are quite intuitive and straightforward. The intuitive queries you would write to address some of the requests, such as returning a subtree (e.g., returning all subordinates of a given manager), run very efficiently if you create the correct indexes. However, the intuitive queries you would write to address the common path request (e.g., returning the chain of management leading to a given employee) are optimized very inefficiently. The path request is the focus of this article; I explain why the intuitive queries addressing this request are optimized inefficiently, and I provide alternative solutions that perform quite well.
The sample data that I use in my queries are two tables called Tree1 and Tree2, which you create and populate by running the code in Listing 1. Note that the code takes a few minutes to run because it populates each of the tables with more than a million rows. The code in Listing 1 first creates an auxiliary table of numbers called Nums with a column called n; the code later uses the table to populate Tree1 and Tree2.
Tree1 is implemented based on the custom enumerated path model. It has the following attributes: nodeid (integer node ID such as an employee ID), filler (character 200-byte filler representing various attributes you would normally have, such as employee name or salary), mpath (character materialized enumerated path—e.g., '.1.3.7.9.' for node 9, which is a child of 7, which is a child of 3, which is a child of 1, which is the root node), and lvl (integer level in the tree, with 0 assigned to the root node, 1 to the children of the root node, etc.). Tree2 is implemented using the HIERARCHYID data type in SQL Server 2008. It has the attributes nodeid, filler, hid (of the HIERARCHYID data type), and lvl.
The code in Listing 1 also creates indexes to support various queries against the trees. The clustered index is created on the path column. This index will potentially support requests that are handled in a depth-first manner. One nonclustered index is created on (lvl, path) to potentially support requests that are handled in a breadth-first manner. Another nonclustered index is created on the nodeid attribute to enforce the primary key and to support requests for a particular node. Note that to follow best practices you would typically want to define UNIQUE constraints instead of explicitly creating unique indexes like I did. Of course, the UNIQUE constraints will be enforced behind the scenes with unique indexes, and in terms of performance discussions, which is our focus, it makes no difference.
Querying the Trees
Next, I demonstrate queries against trees and discuss their performance. To measure I/O cost, CPU time, and elapsed time, you need to turn on STATISTICS IO and STATISTICS TIME in your session by running the following code:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
Also, to make the comparisons between the solutions fair, clear the data cache before running each solution by running the following code:
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
Efficient Handling of the Subtree Request
In this section I demonstrate requests for which the intuitive queries are actually handled very efficiently. Consider the subtree request in which you are given the ID of a node, and you need to return that node and all of its descendants. An example of such a request against an Employees hierarchy would be returning all subordinates—direct and indirect—of a given manager. With our sample data, you are supposed to write queries against Tree1 and Tree2 that return the subtree of some node (e.g., the one with node ID 11111). The desired outputs of the queries against Tree1 and Tree2 are shown in abbreviated forms in Figure 1 and Figure 2, respectively.