• subscribe
October 29, 2008 12:00 AM

Normalizing HIERARCHYID Values

SQL Server Pro
InstantDoc ID #100646

When using the HIERARCHYID datatype to represent graphs, in certain cases the values can become long. With very deep graphs this is natural, since the HIERARCHYID value represents a path of all nodes leading to the current starting with the root node. However, in certain cases even when the graph is not very deep, the path can become long. First I’ll explain the circumstances in which this can happen, and then I’ll provide a solution to normalizing the values, making them shorter.

HIERARCHYID values can become long when you keep adding new nodes in between two existing nodes that have consecutive values in the last numeric element in their canonical path. As an example, say you have two nodes with the following canonical paths: /1/ and /2/, and you add a node between them. You get a new value whose canonical path is /1.1/. Now add a value between /1.1/ and /2/ and you get /1.2/. Now add a value between /1.1/ and /1.2/ and you get /1.1.1/. As you can realize, if you keep adding nodes in between two existing nodes in this manner, you can get lengthy paths that represent lengthy HIERARCHYID values even when the graph is not deep.

If order among siblings is not important, you can add child nodes always after the last child, or always before the first, and this way, the paths will be more economic. But when order among siblings matters, you can’t control this. If you frequently add new nodes between existing ones you may end up with very long HIERARCHYID values. In such a case, you can periodically run a procedure that I will provide here that normalizes the HIERARCHYID values for the whole graph, making them shorter.

Uses the following code to create a table called Employees and populate it with sample data:

USE tempdb;

CREATE TABLE dbo.Employees

(

  empid   INT NOT NULL,

  hid     HIERARCHYID NOT NULL,

  lvl AS hid.GetLevel() PERSISTED,

  empname VARCHAR(25) NOT NULL

);

 

CREATE UNIQUE CLUSTERED INDEX idx_depth_first ON dbo.Employees(hid);

CREATE UNIQUE INDEX idx_breadth_first ON dbo.Employees(lvl, hid);

CREATE UNIQUE INDEX idx_empid ON dbo.Employees(empid);

GO

 

CREATE PROC dbo.AddEmp

  @empid      AS INT,

  @mgrid      AS INT,

  @leftempid  AS INT,

  @rightempid AS INT, 

  @empname    AS VARCHAR(25)

AS

 

DECLARE @hid AS HIERARCHYID;

 

IF @mgrid IS NULL

  SET @hid = HIERARCHYID::GetRoot();

ELSE

  SET @hid = (SELECT hid FROM dbo.Employees WHERE empid = @mgrid).GetDescendant

    ( (SELECT hid FROM dbo.Employees WHERE empid = @leftempid),

      (SELECT hid FROM dbo.Employees WHERE empid = @rightempid) );

 

INSERT INTO dbo.Employees(empid, hid, empname)

  VALUES(@empid, @hid, @empname);

GO

 

EXEC dbo.AddEmp @empid =  1, @mgrid = NULL, @leftempid = NULL, @rightempid = NULL, @empname = 'A';

EXEC dbo.AddEmp @empid =  2, @mgrid =    1, @leftempid = NULL, @rightempid = NULL, @empname = 'B';

EXEC dbo.AddEmp @empid =  3, @mgrid =    1, @leftempid =    2, @rightempid = NULL, @empname = 'C';

EXEC dbo.AddEmp @empid =  4, @mgrid =    1, @leftempid =    2, @rightempid =    3, @empname = 'D';

EXEC dbo.AddEmp @empid =  5, @mgrid =    1, @leftempid =    4, @rightempid =    3, @empname = 'E';

EXEC dbo.AddEmp @empid =  6, @mgrid =    1, @leftempid =    4, @rightempid =    5, @empname = 'F';

EXEC dbo.AddEmp @empid =  7, @mgrid =    1, @leftempid =    6, @rightempid =    5, @empname = 'G';

EXEC dbo.AddEmp @empid =  8, @mgrid =    1, @leftempid =    6, @rightempid =    7, @empname = 'H';

EXEC dbo.AddEmp @empid =  9, @mgrid =    8, @leftempid = NULL, @rightempid = NULL, @empname = 'I';

EXEC dbo.AddEmp @empid = 10, @mgrid =    8, @leftempid =    9, @rightempid = NULL, @empname = 'J';

EXEC dbo.AddEmp @empid = 11, @mgrid =    8, @leftempid =    9, @rightempid =   10, @empname = 'K';

EXEC dbo.AddEmp @empid = 12, @mgrid =    8, @leftempid =   11, @rightempid =   10, @empname = 'J';

EXEC dbo.AddEmp @empid = 13, @mgrid =    8, @leftempid =   11, @rightempid =   12, @empname = 'L';

EXEC dbo.AddEmp @empid = 14, @mgrid =    8, @leftempid =   13, @rightempid =   12, @empname = 'M';

EXEC dbo.AddEmp @empid = 15, @mgrid =    8, @leftempid =   13, @rightempid =   14, @empname = 'N';

EXEC dbo.AddEmp @empid = 16, @mgrid =    8, @leftempid =   15, @rightempid =   14, @empname = 'O';

EXEC dbo.AddEmp @empid = 17, @mgrid =    8, @leftempid =   15, @rightempid =   16, @empname = 'P';

EXEC dbo.AddEmp @empid = 18, @mgrid =    8, @leftempid =   17, @rightempid =   16, @empname = 'Q';

EXEC dbo.AddEmp @empid = 19, @mgrid =    8, @leftempid =   17, @rightempid =   18, @empname = 'E';

EXEC dbo.AddEmp @empid = 20, @mgrid =    8, @leftempid =   19, @rightempid =   18, @empname = 'S';

EXEC dbo.AddEmp @empid = 21, @mgrid =    8, @leftempid =   19, @rightempid =   20, @empname = 'T';

Then run the following code to show the current HIERARCHYID values and their canonical paths:

SELECT

  empid,

  REPLICATE(' | ', lvl) + empname AS emp,

  hid,

  hid.ToString() AS path

FROM dbo.Employees

ORDER BY hid;

You get the following output:

empid  emp      hid               path                  

------ -------- ----------------- -----------------------

1      A        0x                /

2       | B     0x58              /1/

4       | D     0x62C0            /1.1/

6       | F     0x6316            /1.1.1/

8       | H     0x6318B0          /1.1.1.1/

9       |  | I  0x6318B580        /1.1.1.1/1/

11      |  | K  0x6318B62C        /1.1.1.1/1.1/

13      |  | L  0x6318B63160      /1.1.1.1/1.1.1/

15      |  | N  0x6318B6318B      /1.1.1.1/1.1.1.1/

17      |  | P  0x6318B6318C58    /1.1.1.1/1.1.1.1.1/

19      |  | E  0x6318B6318C62C0  /1.1.1.1/1.1.1.1.1.1/

21      |  | T  0x6318B6318C6316  /1.1.1.1/1.1.1.1.1.1.1/

20      |  | S  0x6318B6318C6340  /1.1.1.1/1.1.1.1.1.2/

18      |  | Q  0x6318B6318C68    /1.1.1.1/1.1.1.1.2/

16      |  | O  0x6318B6318D      /1.1.1.1/1.1.1.2/

As you can see, even though the tree is only three levels deep, some of the HIERARCHYID values became quite long due to the insertion order of siblings.

The solution that normalizes the values involves the following steps:

·         Define a CTE called EmpsRN that calculates for each node a row number partitioned by parent ordered by current hid value.

·         Define a recursive CTE called EmpsPath that iterates through the levels of the tree, starting with the root node, and proceeding to the next level of children in each iteration. Use this CTE to construct a new canonical path for the nodes. The root should be assigned with ‘/’, and for each node in the next level concatenate the parent’s path, the current node’s row number that represents its position under the parent, and a ‘/’.

·         Join the Employees table with the EmpsPath CTE, and overwrite the existing hid values with the new ones that are converted from the new canonical paths.

Here’s the code that performs this normalization process:

WITH EmpsRN AS

(

  SELECT

    empid,

    hid,

    ROW_NUMBER() OVER(PARTITION BY hid.GetAncestor(1) ORDER BY hid) AS rownum

  FROM dbo.Employees

),

EmpsPath AS

(

  SELECT empid, hid, CAST('/' AS VARCHAR(900)) AS path

  FROM dbo.Employees

  WHERE hid = HIERARCHYID::GetRoot()

 

  UNION ALL

 

  SELECT C.empid, C.hid,

    CAST(P.path + CAST(C.rownum AS VARCHAR(10)) + '/' AS VARCHAR(900))

  FROM EmpsPath AS P

    JOIN EmpsRN AS C

      ON C.hid.GetAncestor(1) = P.hid

)

UPDATE E

  SET hid = CAST(EP.path AS HIERARCHYID)

FROM dbo.Employees AS E

  JOIN EmpsPath AS EP

    ON E.empid = EP.empid;

Now query the data again after normalization:

SELECT

  empid,

  REPLICATE(' | ', lvl) + empname AS emp,

  hid,

  hid.ToString() AS path

FROM dbo.Employees

ORDER BY hid;

And you get nice and compact values:

empid  emp      hid     path  

------ -------- ------- ------

1      A        0x      /

2       | B     0x58    /1/

4       | D     0x68    /2/

6       | F     0x78    /3/

8       | H     0x84    /4/

9       |  | I  0x8560  /4/1/

11      |  | K  0x85A0  /4/2/

13      |  | L  0x85E0  /4/3/

15      |  | N  0x8610  /4/4/

17      |  | P  0x8630  /4/5/

19      |  | E  0x8650  /4/6/

21      |  | T  0x8670  /4/7/

20      |  | S  0x8688  /4/8/

18      |  | Q  0x8698  /4/9/

16      |  | O  0x86A8  /4/10/

Cheers,

BG

 



ARTICLE TOOLS

Comments
  • meganbearly
    3 years ago
    Feb 19, 2009

    Thanks for your question Balaji!

    I passed your question on to Itzik Ben-Gan. You can read his response below:

    "Hi Balaji,

    One thing that comes to mind that can help is to produce values with gaps when normalizing. That is, instead of using consecutive row numbers, multiply the row numbers by some factor.

    This would allow producing values in between existing ones, so it would mitigate the problem in between normalization processes.


    Cheers,

    Itzik"


    Megan Keller

    Associate Editor, SQL Server Magazine

    mkeller@sqlmag.com

  • Balaji
    3 years ago
    Jan 31, 2009

    An Eye opener in optimizing hierarchyid usage!

    But, isn't it going suffer the flexibility of inserting members inbetween e.g., EXEC dbo.AddEmp @empid = 22, @mgrid = 8, @leftempid = 19, @rightempid = 21, @empname = 'X';
    would add the following which doesn't disturb any other path below it
    empid emp hid path
    22 | | X 0x866580 /4/6.1/

    But if it were to follow the structure post normalization all other rows needs to be updated. Is there any remedy to this?

    Thanks
    Balaji

You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...