• subscribe
May 14, 2009 12:00 AM

How to Determine the Average Number of Days Between Orders or Other Important Events

Use the ROW_NUMBER function for fast queries
SQL Server Pro
InstantDoc ID #101924
Downloads
101924.zip

If you look at the data closely, you'll see that some orders occur on the same day. We needed to have only one order per customer per day because we wanted to measure the average number of days between orders. (You might choose to do this differently depending on the business problem and rules.) So, as callout B shows, I used the GROUP BY statement to get only one order per customer per day. I also eliminated the time portion from the datetime field. I then used a SELECT statement to get the results into a temporary table named #orders. At this point, the results looked like that in Figure 2.


Figure 2: Sample results from the code that makes sure there isn't more than one order from the same customer on any given day


The next step was to join the table to itself (i.e., join Copy1 of the #orders table to Copy2 of the #orders table) in such a way that consecutive orders could get matched, thus making date comparisons simple. The idea was to join the first order to the second order, the second order to the third order, and so on for each customer, as Figure 3 shows. The trick that turned this idea into reality was adding 1 to the RowNumber of Copy1 before joining it to Copy2. That way, a customer's first order is joined to his or her second order.


Figure 3: Matching consecutive orders to make date comparisons simple (click to enlarge)


With the consecutive orders matched, I used the DATEDIFF function to measure the time between them. The beauty of this approach is that you don't have to worry about the endpoints because they take care of themselves. The last order for a customer isn't joined to any order (e.g., customer 11000 order 3 in Copy1 isn't joined to any order in Copy2) and is therefore automatically dropped from the result set.

Callout C in Listing 1 highlights the code that joins the #orders table to itself, matches corresponding orders, and measures the days between those orders. As you can see, the resulting data set is put into the new #BtwnOrders table. If you were to select the records from #BtwnOrders by adding

SELECT * FROM #BtwnOrders

at the end of the code in callout C, you'd see results like that in Figure 4.


Figure 4: Sample results from running DaysBetweenOrders.sql in its entirety (click to enlarge)


The final step was to use the AVG function in a SELECT statement to calculate the average number of days between orders for each customer. Callout D in Listing 1 shows this code. When you run DaysBetweenOrders.sql in its entirety, you'll get results such as

CustomerKey Avg
11000 417
11001 530

I found that this solution took less than one second to run against the FactInternetSales table, which is a vast improvement over the four minutes required by the looping algorithm. When I ran this solution against 12 million records for 120,000 customers, it completed in just over two minutes.

You can download DaysBetweenOrders.sql by clicking the 101924.zip hotlink at the top of the page. Using DaysBetweenOrders.sql as a template, you can create your own solution that determines the average number of days between important events, such as customer orders, server failures, support calls, or website visits.



ARTICLE TOOLS

Comments
  • RICHARD
    3 years ago
    Jul 09, 2009

    The algorithm can be simplified significantly. All you need is the number of orders and the first and last order dates:

    AvgDaysBetweenOrders = (LastOrderDate - FirstOrderDate)/(NumberOfOrders - 1)

    This is very similar to what jcelko has proposed.

  • Joe
    3 years ago
    Jun 28, 2009

    Why not pick a window (start_date end_date) and use (size of window in days) / (# of orders in window) = (avg days between orders)? Ex: If you look at a 30 day window and you had 5 orders ANY TIME in the window, then you averaged one order every 6 days. Draw a time line on paper, represent your orders as counters on the line and it is easier to see.

  • Mike
    3 years ago
    May 28, 2009

    Awesome Article....I will be using this solution over and over again....Thanks

You must log on before posting a comment.

Are you a new visitor? Register Here