Recursive stored procedures populate a tree from a SQL Server table
Storing and representing data in a hierarchical, or tree, form is a common requirement in database development and an elegant solution for many real-world programming problems. T-SQL provides a powerful way of rendering the rows of a resultset in tree form (i.e., indented to resemble a tree). Relational database management systems (RDBMSs) such as SQL Server inherently produce resultsets as a list of rows, but you can produce the same list as a treeif you set up the correct schema and relationships. Here, we develop three stored procedures to build a tree from a SQL Server table. The client application calls one of the stored procedures, which uses the other two as needed. Then, an Active Server Pages (ASP) or HTML page uses the output from the stored procedures to produce an HTML list box without requiring any client processing.
Our T-SQL tree solution uses stored procedures, cursors, and global temporary tables (the names of which begin with two pound symbols##). One of our examples also uses ASP or HTML to interface with the SQL Server output, although a Visual Basic (VB) or Visual C++ (VC++) client can use the same output. We used SQL Server Enterprise Manager to design the database schema and Microsoft Visual InterDev 6.0 to develop the stored procedures and the ASP page. (SQL Server Query Analyzer and Notepad are possible alternatives to Enterprise Manager and Visual InterDev, respectively.)
SQL Server features let you provide robust solutions to many complex problems. Tree generation is only one example of what you can do. We use the following SQL Server features in this article:
- storing hierarchical data in a table structure or schema
- using recursion in T-SQL stored procedures
- using a global temporary table
- making a solution data-tier scalable by using SQL Server's system process ID (SPID)
- employing CURSOR OUTPUT VARYING to use the output resultset from one stored procedure in the parent stored procedure
- using the sp_executesql system stored procedure to execute queries constructed on the fly
Data Structure Must Support a Tree
To present a resultset as a tree, your schema or table structure must let you process the data in hierarchical, or tree, form. The key to creating a tree is to define the parent-child relationship with no limit on the number of child, or nested, levels possible. For example, let's look at a simple project plan: A project contains n main activities, main activities have subactivities, and subactivities have further subactivities (or sub-subactivities). Consider the schema diagram in Figure 1. The following code creates the project activity table:
CREATE TABLE project_activity_schedule (
activity_code VARCHAR(10) NOT NULL PRIMARY KEY,
activity_description VARCHAR(255) NOT NULL,
planned_start_date datetime NULL,
parent_activity_code VARCHAR(10) NULL
)
A unique code in the activity_code column identifies each activity in the project. The activity_description and planned_start_date columns provide more information about each activity. But the column that converts the table's structure from a simple list-type structure to a hierarchical structure is parent_activity_code. This column's type and width are the same as those of activity_code.
The parent_activity_code column uses a foreign key to reference the activity_code column in the same table; thus, activity_code and parent_activity_code have a recursive relationship, which the pipe in the upper left corner of Figure 1 depicts. Therefore, each activity that has a parent activity has that parent activity's code in the activity's parent_activity_code column. Activities without a parent have NULL in the parent_activity_code column. You can use Enterprise Manager's Database Designer to create this recursive relationship, or you can use the following statements after you've created the table:
ALTER TABLE project_activity_schedule
ADD CONSTRAINT fk_parent_child_activity
FOREIGN KEY (parent_activity_code) REFERENCES
project_activity_schedule(activity_code)
The activity_code's data type depends on the project requirements. For simplicity, we stored only two attributes for each activitythe description and the start datebut you can use more attributes if you want.
Let's determine whether the structure of the data set in Table 1 satisfies the requirement for a tree with unlimited levels. (Although the schema provides unlimited nesting levels, SQL Server imposes a limit of 32 levels for nested stored procedures.) Of the three root-level activities, or nodes, A1 has no subactivities, A2 contains three subactivities, and A3 has two subactivities. These subactivities have A2 and A3, respectively, as their parent_activity_code. You can further extend this structure to 32 levels by specifying the appropriate parent code. Now, let's create the three stored procedures required to build the tree.
The Three Stored Procedures
The client application calls the topmost stored procedure, tsp_prj_activity_tree_list (Tree List), to create a unique, global temporary table. Web Listing 1 shows the Tree List procedure (for download instructions, see the More on the Web box, page 56). Then, the Tree List procedure calls the tsp_prj_activity_tree_build procedure (Build), which actually builds the tree by using recursion and feeds the tree into the temporary table. The Build procedure calls the tsp_prj_activity_child_list procedure (Child List), which issues a query to select rows from the table and returns the rows to the Build procedure. (Figure 2 shows the relationships among the three stored procedures.) Finally, the Tree List procedure extracts the tree from the temporary table and drops the table.
Why create the table by using sp_executesql rather than the CREATE TABLE statement? The answer lies in the word unique. Because you can invoke multiple instances of this procedure at one time, using only one table would lead to errors. All procedures would try to create the same table and feed data into it. Either the different procedures would fail at creation because the table already existed, or when one procedure terminated, it would drop the table, making all the other simultaneous procedures fail. Also, the data fed into the table wouldn't make any sense if the data came from various procedures because its context would be missing. Thus, each concurrent instance of the procedure needs its own unique table.
You can create unique tables if you add @@SPID to the static table namein this case, ##prj_activity_tree_tableso that the final tables' names are similar to ##prj_activity_tree_table1 and ##prj_activity_tree_table2. SQL Server assigns a unique SPID to each logged-on session. You can use the @@SPID global variable to obtain this ID's value, which is an integer. Therefore, for each user who makes a connection and invokes the procedure, we add this SPID to the table name, making the name unique. Because you don't know this part of the table name beforehand, you must construct the query on the fly as a Unicode string, then use sp_executesql to execute the query.
Using a global temporary table and SPID lets the table persist across stored procedures and support simultaneous multiple instances in each stored procedure. (For information about the differences between local and global temporary tables, see the sidebar "Temporary Tables: Local vs. Global.") To get a list of all current connections and their associated SPIDs, you can run the sp_who system stored procedure. (For more information about SPIDs, see SQL Server Books OnlineBOL.)
You use @gtemp_table_name to store the table name and @sql_statement_string to store the queries that you use to create the table, select records from it, and drop it. You enclose the entire operation between the BEGIN TRANSACTION and COMMIT TRANSACTION statements to ensure that the operation remains independent. Because the operation spans stored procedures and involves multiple WRITE operations, such as CREATE TABLE and DROP TABLE, you should ensure that WRITE operations either succeed or fail completely (i.e., that the procedure leaves no operation incomplete because of errors). Transaction control isn't mandatory here because SQL Server has an automatic mechanism for cleaning up global and local temporary tables. However, good programming practice dictates that you ensure completion in such contexts, and when you extend this example to build an expanding-collapsing tree, you must also maintain state, which involves a WRITE operation.