If you look at the data closely, you'll see that some orders occur on the same day. We needed to have only one order per customer per day because we wanted to measure the average number of days between orders. (You might choose to do this differently depending on the business problem and rules.) So, as callout B shows, I used the GROUP BY statement to get only one order per customer per day. I also eliminated the time portion from the datetime field. I then used a SELECT statement to get the results into a temporary table named #orders. At this point, the results looked like that in Figure 2.
Figure 2: Sample results from the code that makes sure there isn't more than one order from the same customer on any given day |
 |
The next step was to join the table to itself (i.e., join Copy1 of the #orders table to Copy2 of the #orders table) in such a way that consecutive orders could get matched, thus making date comparisons simple. The idea was to join the first order to the second order, the second order to the third order, and so on for each customer, as Figure 3 shows. The trick that turned this idea into reality was adding 1 to the RowNumber of Copy1 before joining it to Copy2. That way, a customer's first order is joined to his or her second order.
Figure 3: Matching consecutive orders to make date comparisons simple (click to enlarge) |
 |
With the consecutive orders matched, I used the DATEDIFF function to measure the time between them. The beauty of this approach is that you don't have to worry about the endpoints because they take care of themselves. The last order for a customer isn't joined to any order (e.g., customer 11000 order 3 in Copy1 isn't joined to any order in Copy2) and is therefore automatically dropped from the result set.
Callout C in Listing 1 highlights the code that joins the #orders table to itself, matches corresponding orders, and measures the days between those orders. As you can see, the resulting data set is put into the new #BtwnOrders table. If you were to select the records from #BtwnOrders by adding
SELECT * FROM #BtwnOrders
at the end of the code in callout C, you'd see results like that in Figure 4.
Figure 4: Sample results from running DaysBetweenOrders.sql in its entirety (click to enlarge) |
 |
The final step was to use the AVG function in a SELECT statement to calculate the average number of days between orders for each customer. Callout D in Listing 1 shows this code. When you run DaysBetweenOrders.sql in its entirety, you'll get results such as
| CustomerKey |
Avg |
| 11000 |
417 |
| 11001 |
530 |
I found that this solution took less than one second to run against the FactInternetSales table, which is a vast improvement over the four minutes required by the looping algorithm. When I ran this solution against 12 million records for 120,000 customers, it completed in just over two minutes.
You can download DaysBetweenOrders.sql by clicking the 101924.zip hotlink at the top of the page. Using DaysBetweenOrders.sql as a template, you can create your own solution that determines the average number of days between important events, such as customer orders, server failures, support calls, or website visits.