• subscribe
March 21, 2001 12:00 AM

Sorting Techniques

SQL Server Pro
InstantDoc ID #19737
Downloads
19737.zip

NULLs, UNION ALL, and random sorting

Editor's Note:Send your experts-only T-SQL tips to Itzik Ben-Gan at blackbelt@sqlmag.com. If we use your tip in the magazine, you'll receive $100 and an exclusive T-SQL Black Belt shirt.

Sorting isn't always as simple as it seems. From time to time, T-SQL programmers receive requests for output sorted in a unique or unusual way and must handle both the logical aspects and the performance implications of those requests. For example, whenever you provide a sorting solution that involves NULLs, you must determine how to handle them. The sorting ideas of Alexey Ruban—a software engineer, developer, and DBA from Ukraine—and SQL Server Most Valuable Professionals (MVPs) Fernando G. Guerrero and Tibor Karaszi motivated me to write this article.

Getting NULLs to Sort High
NULLs are special, no doubt. Like some children, they seem to require special care and attention. Sorting output by columns that allow NULLs is no exception. NULL stands for either an unknown or an irrelevant value. So, in general, you can't assume that one NULL is the same as another. For example, to find rows that have NULLs in a specific column, you might issue the following query:

SELECT * FROM T1 WHERE colwithnulls = NULL

However, according to the ANSI standard, you won't get any rows in the output, even if the colwithnulls column contains some NULLs. To get all rows that contain NULLs in a certain column, you must use the IS NULL clause instead of = NULL. Or, you can turn off the ANSI_NULLS session option so that SQL Server treats NULL comparisons that use an equals sign as if you had specified IS NULL.

Regardless of the ANSI_NULLS setting, certain operations (e.g., the UNIQUE constraint in SQL Server, GROUP BY, ORDER BY) treat NULLs as if they were equal to one another. The ANSI standard specifies that NULLs should sort into one group as if they were equal, but it doesn't specify whether NULLs should sort lower or higher than actual values (i.e., whether they should appear at the beginning or the end of the output). Therefore, different ANSI-compliant relational database management systems (RDBMSs) implement NULL sorting differently—some placing NULLS at the end and some placing them at the beginning.

SQL Server sorts NULLs as if they're the lowest value. For example, let's select all rows from the Publishers table in the Pubs database, ordering by state. Then, use a new SQL Server 2000 optimization technique, which lets you create indexes on computed columns, to put an index on the state column:

CREATE INDEX idx_nci_state ON Publishers(state)
SELECT * FROM Publishers
ORDER BY state

Figure 1 shows this query's execution plan. The query performs an ordered index scan on the idx_nci_state index you created. This scan, which improves the query's performance, prevents an explicit sort operation from taking place because the index is already sorted by the state column. Note that the Publishers table is very small. If the table were bigger, the optimizer might choose not to use an index. Using an index involves visiting a page to look up the data row for each key in the index, and the I/O involved might cost more than simply scanning the table and performing an explicit sort. Nevertheless, for this demonstration, the Publishers table's size is convenient.

Suppose you want to sort NULLs high. You can use a CASE expression in the ORDER BY clause:

SELECT * FROM Publishers
ORDER BY CASE
           WHEN state IS NULL THEN 1
		   ELSE 0
		   END, state

The first value in the ORDER BY clause will be either 1 (when state is NULL) or 0 (when state has an actual value); this value determines the output's primary order. The values in the state column determine the output's secondary order so that you get the desired order for the rows in the output. Note that the ANSI-92 standard specifies that the ORDER BY clause can use only expressions that appear in the SELECT list, but ANSI-99 removed this limitation. SQL Server has never had that restriction.

Figure 2 shows the execution plan for the above query. Notice that a performance problem exists. The query performed an explicit sort operation based on the 1 and 0 values that the compute scalar operation calculated. You can't use the state column index here because the state column doesn't determine the output's primary order; a computation based on the state column does. If you use SQL Server 2000, you can handle this problem elegantly. Add the following computed column to the Publishers table:

ALTER TABLE Publishers
 ADD statenullslast AS CASE
 	WHEN state IS NULL THEN 1
	ELSE 0
	END

Create a composite index on the statenullslast and state columns:

CREATE INDEX idx_nci_statenullslast ON Publishers(statenullslast, state)

Now, try the following query:

SELECT pub_id, pub_name, city, state, country
FROM Publishers
ORDER BY statenullslast, state

Figure 3 shows the execution plan for this query. Note that the plan uses the new index instead of performing an explicit sort operation.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...