In the Reader to Reader article "T-SQL Statement Tracks Transaction-Less Dates," Saravanan Radhakrisnan presented a challenging task that I refer to as reporting on non-existent data. He had to write a T-SQL query that would determine which stores didn't have any transactions during a one-week period, but the table being queried included only the stores' IDs and the dates on which each store had a transaction. Listing 1 shows his solution.
Listing 1: The Original Query
SELECT st1.Store_ID, st2.NoTransactionDate
FROM (SELECT DISTINCT Store_ID FROM MySalesTable (NOLOCK)) AS st1,
(SELECT DISTINCT TransactionDate AS NoTransactionDate
FROM MySalesTable (NOLOCK)) AS st2
WHERE st2.NoTransactionDate NOT IN
(SELECT DISTINCT st3.TransactionDate FROM MySalesTable st3
(NOLOCK) WHERE st3.store_id = st1.store_id)
ORDER BY st1.store_id, st2.NoTransactionDate
GO
Although this query works, it has some shortcomings:
- As Radhakrisnan mentions, if none of the stores have a transaction on a certain day, the query won't return any results for that particular day for all the stores. So, for example, if all the stores were closed on a national holiday and therefore didn't have any sales, that day won't appear in the results.
- If a store doesn't have any sales for all the days in the specified period, that store won't appear in the results.
- The query uses T-SQL functionality that isn't recommended because of the poor performance it can cause. Specifically, the query uses three derived queries with the DISTINCT clause and the NOT IN construction. You can encounter performance problems when derived tables get too large to use indexes for query optimization.
I'd like to call attention to several different ways to work around these shortcomings.