| Executive Summary: SQL Server 2008’s new HIERARCHYID data type lets you maintain and query hierarchical data. Learn how to define and index the table that will hold the hierarchy data, how to insert new nodes into the table, and how to query the hierarchy. |
SQL Server 2008 introduces the
HIERARCHYID data type,
which is designed to support
storing and manipulating data representing a hierarchy.
Examples of hierarchies that you might need to
represent in a SQL Server database include employee
organizational charts, folder lists, forum discussions, and
so on. This article is the first in a two-part series about
the new HIERARCHYID data type. This month I’ll
introduce you to the new data type, and I’ll explain how
to define and index the table that will hold the hierarchy
data, how to insert new nodes into the table, and how
to query the hierarchy. Next month I’ll explain how to
handle changes in the hierarchy (e.g., reparenting nodes)
and how to convert a traditional parent-child hierarchy
(aka an adjacency list) to a representation with the new
HIERARCHYID data type.
Creating and Populating the Hierarchy
The HIERARCHYID data type provides the means
to store and manipulate data representing a hierarchy.
HIERARCHYID is a system-supplied CLR type with
a set of methods that allow manipulation of the type.
The methods supported by the type include GetLevel,
GetRoot, GetDescendant, ToString, Parse, IsDescendantOf,
GetAncestor, GetReparentedValue, Read, and
Write. I’ll describe some of the methods this month
and some next month.
The type internally holds a binary value as large as
892 bytes that represents the path of ancestors leading
to the current node. The value positions the current
node among other nodes in the hierarchy—both in
terms of parent-child positioning, and in terms of
positioning among siblings. For example, if you use the
HIERARCHYID data type to represent a hierarchy
of employees, the value holds the path of all managers
leading to the current node; the value represents the
correct manager-subordinate relationships, and even
the correct position among direct subordinates of the
same manager, in case there’s a need for it.
The HIERARCHYID type provides topological
sorting—a term taken from graph theory—meaning
that the value of a node is greater than the values of
all ancestors of the node and less than the values of
all descendants of the node. The fact that the type provides
topological sorting means that when sorting by
HIERARCHYID values, a child would always appear
in the output after the parent.
I’ll use a hierarchy of employees in my examples. Run
the code in Web Listing 1 (www.sqlmag.com, InstantDoc
ID 99036) to create the Employees table, as well as
indexes on the table to support querying the hierarchy.
The hid column in the Employees table (see Web Table 1)
is of a HIERARCHYID data type, and it represents the
position of the current employee in the hierarchy. Notice
the computed persisted column called lvl that is defined
by a call to the GetLevel method of the hid column.
The GetLevel method returns the level of the node in
the hierarchy, starting with 0 for the root, 1 for the direct
subordinates of the root, and so on.
Besides creating the Employees table,
Web Listing 1 also contains statements that
create two indexes to support querying the
hierarchy; the first is a unique clustered index
created on the hid column and the second is
a unique nonclustered index created on the lvl and hid
columns. These indexes implement two indexing strategies
called depth-first and breadth-first, respectively.
The depth-first index organizes all members of a subtree
(manager and all subordinates in all levels) close to
each other; therefore it will support requests to return
a whole subtree. This index will also support requests
to sort the hierarchy by the hid column (in topological
order). The breadth-first index organizes all members
of the same level close to each other; therefore it will
support requests for siblings from the same level (e.g.,
a request for all direct subordinates of a manager). Later in the article, I’ll demonstrate such requests and
the way they are optimized.
To insert a new row into the Employees table representing
a new employee in the hierarchy, you need to
generate a HIERARCHYID value for the hid column.
The value should represent the correct position of
the employee in the hierarchy. You can use methods
of the type to help in this task. For the root of the
hierarchy—the big boss or the CEO—you can use the
HIERARCHYID::GetRoot static method. This method
simply produces a HIERARCHYID value that is internally
an empty binary string representing the root of the
tree. If the new node you are adding is not the root, you
can produce the HIERARCHYID value for the new
node by invoking the method GetDescendant on the
hid value of the manager of the new node. This method
accepts two HIERARCHYID inputs; I’ll call them
@lft and @rgt. If both values are not NULL, the
method produces a new
value positioned between
the two. If both are
NULL, the method produces
a value simply below
the manager at hand. If
@lft is not NULL and
@rgt is NULL, the method
produces a value greater
than @lft. Finally, if
@lft is NULL and @rgt
is not NULL, the method
produces a value smaller
than @rgt.
Run the code in Web
Listing 2 to create the usp_
AddEmp procedure that is
in charge of adding a new
node into the hierarchy. If
the new employee is the
root employee, the procedure
uses the HIERARCHYID::
GetRoot static
method to produce the
HIERARCHYID value
for the new employee. If
the new employee is not
the root, the procedure
first obtains the hid value
of the new employee’s
manager, as well as the
maximum hid value
among the direct subordinates
of that manager. The
procedure then uses the
GetDescendant method
to produce the hid value
of the new employee as
the last node under the given manager.
To populate the Employees table with sample data,
run the code in Web Listing 3. As long as the hierarchy
doesn’t contain a large number of levels, the HIERARCHYID
values are quite economical.
The HIERARCHYID data type also exposes a
method called ToString that produces a canonical
string representation of the hierarchical value as a
path using a slash as a separator between levels. Unlike
the raw binary representation of a HIERARCHYID
value, the string representation produced by ToString
is more meaningful to the naked eye. For example, the
ToString method returns the string ‘/2/1/1/3/’ for the
HIERARCHYID value 0x6AD6F0. You can easily
see in the string representation of the value that the
node is positioned at level 4, and that it’s the child of
the parent node with the path ‘/2/1/1/’. To convert a
given canonical string representation of a hierarchical
value to HIERARCHYID, you can either use the
static method HIERARCHYID::Parse, or simply use
the CAST or CONVERT functions. For example, the
expression CAST('/2/1/1/3/' AS HIERARCHYID)
returns the HIERARCHYID value 0x6AD6F0.
To return the current contents of the Employees
table, along with the logical string representations of
the hid values, run the following query:
SELECT hid, hid.ToString() AS path, lvl, empid, empname,
salary
FROM dbo.Employees
ORDER BY hid;
This query returns the output shown in Web Table 1.
Querying the Hierarchy
Now that I’ve covered how to create and populate the
hierarchy, I’ll explain how to query the hierarchy to
answer common requests such as sorting the hierarchy,
and returning a subtree, a path, direct subordinates,
and so on.
Remember that earlier I mentioned that the HIERARCHYID
type provides topological sorting. So if you
want to sort the employees such that a subordinate will be
returned after a manager, you can simply sort by the hid
column. You can use the lvl column to produce indentation,
replicate some string lvl times, and concatenate the
employee name. The following query demonstrates both
sorting topologically and producing indentation:
SELECT empid, REPLICATE(‘ | ‘, lvl) + empname AS emp
FROM dbo.Employees
ORDER BY hid;
Continue on Page 2
Prev. page  
[1]
2
next page