I've been working on a large B2B Customer Relationship Management (CRM) program. One important Key Performance Indicator (KPI) we wanted to develop for the program was the average number of days between orders. We wanted this KPI to determine the customers' usual purchase patterns so we could answer questions such as "Do some customers purchase more often than the average customer?" and "Is the average time between purchases increasing?" (An increase could be a sign that the customer is at risk for leaving the program.)
Determining the average number of days between orders involves finding two consecutive orders from the same customer and calculating the number of days between them. The first solution I tried used a looping algorithm that worked something like this:
- Find the first order for a customer and save the order date in a temporary variable.
- Find the next order and save that order date in a second temporary variable.
- Find the difference between the two dates and save that value.
- Move the second temporary variable's value into the first temporary variable, retrieve the next record, and save that record's order date in the second temporary variable. Find the difference between the two dates and save that value.
- Continue looping through all the records for a customer, comparing the dates from consecutive orders until you encounter the next customer's records.
Not only were the algorithm queries complicated to write, they didn't perform very well. When I tested the algorithm against the FactInternetSales table, it took more than four minutes to complete on my server running SQL Server 2005 SP2. FactInternetSales has only around 60,000 records and 19,000 customers. I knew that the performance of this algorithm would only get worse as these numbers got larger.
I decided to try a different approach. I began by using the ROW_NUMBER function with an OVER clause to get a chronological list of each customer's orders. After I saved the output to a temporary table, I used an INNER JOIN operation to bring adjacent records together. Callout A in Listing 1 shows this query.
Listing 1: DaysBetweenOrders.sql |
 |
As Figure 1 shows, the query produces a list of orders sorted by date for each customer and an additional column that specifies the sequence of each order in each customer list.
Figure 1: Sample results from the query that produces a chronological list of each customer's orders |
 |