• subscribe

TSQL Challenge – Reoccurring Visits


Posted @ 6/29/2011 9:44 AM By Itzik Ben-Gan

 

This is a nice little puzzle that I got some time ago from a friend and colleague, Eladio Rincón. It’s not a very difficult challenge, but I find it to be a fun one to work on, so I hope you will enjoy it too.

Suppose that you record data about people’s daily visits to a website in a table called DailyVisits. Here’s code to create the DailyVisits table and populate it with some sample data:

-- DDL and sample data for DailyVisits table

SET NOCOUNT ON;

USE tempdb;

IF OBJECT_ID('dbo.DailyVisits', 'U') IS NOT NULL

  DROP TABLE dbo.DailyVisits;

GO

CREATE TABLE dbo.DailyVisits

(

  dt      DATE        NOT NULL,

  visitor VARCHAR(10) NOT NULL,

  CONSTRAINT PK_DailyVisits PRIMARY KEY(dt, visitor)

);

INSERT INTO dbo.DailyVisits(dt, visitor) VALUES

     ('20110601', 'A'),

     ('20110601', 'B'),

     ('20110601', 'C'),

--

     ('20110602', 'A'),

     ('20110602', 'C'),

--

     ('20110603', 'A'),

     ('20110603', 'D'),

--

--

     ('20110607', 'A'),

     ('20110607', 'D'),

--

     ('20110608', 'D'),

     ('20110608', 'E'),

     ('20110608', 'F');

SELECT *

FROM dbo.DailyVisits;

dt         visitor

---------- ----------

2011-06-01 A

2011-06-01 B

2011-06-01 C

2011-06-02 A

2011-06-02 C

2011-06-03 A

2011-06-03 D

2011-06-07 A

2011-06-07 D

2011-06-08 D

2011-06-08 E

2011-06-08 F

 

Observe that there’s a key defined on dt and visitor, meaning that you store only one occurrence at most per visitor and day. Also note that there is a possibility that during some days there would be no visitors.

The challenge is to write a query that works with two input arguments @from and @to holding the start and end dates of a date range, and calculates, for each day, various statistics in respect to the previous day. Specifically, how many visitors visited the site that day, how many new visitors were added compared to the previous day, how many visitors were removed compared to the previous day, and how many remained. For the given sample data, the desired result should look like this:

dt         numvisits   added       removed     remained

---------- ----------- ----------- ----------- -----------

2011-06-01 3           3           0           0

2011-06-02 2           0           1           2

2011-06-03 2           1           1           1

2011-06-04 0           0           2           0

2011-06-05 0           0           0           0

2011-06-06 0           0           0           0

2011-06-07 2           2           0           0

2011-06-08 3           2           1           1

 

Feel free to post your solution as a comment here. I’ll post an entry next week with my solution.

Good luck!

BG

Related Content:

Comments

Add A Comment
  • Posted @ July 06, 2011 11:47 AM by SwePeso

    Here is my stats with 100,000+ row sample data using same date interval as original

    (8 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0.
    Table 'DailyVisits'. Scan count 2, logical reads 10.

  • Posted @ July 06, 2011 09:48 AM by simran

    I totally spaced out Pesomannen teaser which definitely is challenging. Studiomoderna's first solution's Statistics IO show's Table 'DailyVisits'. Scan count 3, logical reads 6. We can bring it down to Table 'DailyVisits'. Scan count 2, logical reads 4 by changing the logic in the CTE to not use DailyVisits table. All other alternative solutions that I have tried have Scan count 2, logical reads 4 as I am selecting from DailyVisits atleast twice. Having said that, I am very curious to see your solution Pesomannen. It would be some good learning. Thanks

  • Posted @ July 06, 2011 12:51 AM by SwePeso

    Studiomoderna, give it a new try.
    Think about it, is there a way to minimize the number of joins?
    Is there another way to get rid of the distinct aggregate?

  • Posted @ July 05, 2011 04:52 AM by studiomoderna

    two solutions:

    -- SOLUTION 1 Tomaz Kastrun (tomaz.kastrun@studio-moderna.com)
    ;with dt_cte as
    -- insert empty/missing dates
    (select
    min(dt) as dt_
    ,max(dt) as dt_m
    from dailyvisits
    union all
    select
    dt_+1 as dt_
    ,dt_m
    from dt_cte
    where
    dt_+1 <= dt_m)
    -- calculating added, removed and remained users
    select
    date_.dt_ as date
    ,count(distinct dv.visitor) as numvisits
    ,count(distinct dv.visitor)-(sum( case when (dv1.visitor = dv.visitor and dv.visitor is not null and dv1.visitor is not null) then 1 else 0 end)) as added
    ,count(distinct dv1.visitor) - sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as removed
    ,sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as remained

    from
    dt_cte as date_
    left hash join dailyvisits as dv
    on date_.dt_ = dv.dt
    left hash join dailyvisits as dv1
    on date_.dt_ = dateadd(day,1,dv1.dt)
    group by date_.dt_



    -- SOLUTION 2 Tomaz Kastrun
    -- calculating added, removed and remained users
    select
    date_.dt_ as date
    ,count(distinct dv.visitor) as numvisits
    ,count(distinct dv.visitor)-(sum( case when (dv1.visitor = dv.visitor and dv.visitor is not null and dv1.visitor is not null) then 1 else 0 end)) as added
    ,count(distinct dv1.visitor)-sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as removed
    ,sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as remained

    from
    -- insert empty/missing dates
    (
    select
    (select min(dt) from dailyvisits)+number as dt_
    from master..spt_values as b with (nolock readuncommitted)
    where
    [type] = 'P'
    and number between 1 and (select datediff(day,(select min(dt) from dailyvisits), (select max(dt) from dailyvisits)))
    ) as date_
    left hash join dailyvisits as dv
    on date_.dt_ = dv.dt
    left hash join dailyvisits as dv1
    on date_.dt_ = dateadd(day,1,dv1.dt)
    group by date_.dt_


    but I still can not beat Pesomannen teaser :)

  • Posted @ July 03, 2011 02:41 AM by simran

    another version with recursive CTEs, MAXRECURSION can be modified if the date range spans across more than 100 days

    use SimTest
    go

    Declare @Start date
    Declare @End date
    Set @Start = '2011-06-01'
    Set @End = '2011-06-08';

    Declare @DailyVisits table
    (
    dt date,
    visitor char(1)
    )
    insert into @DailyVisits
    Select * from DailyVisits

    ;with DailyVisist_CTW(dt) as
    (
    select @Start as dt

    union all

    select
    dateadd(DD, 1, dt) as dt
    from DailyVisist_CTW
    where dt < @End
    )

    Select
    dt,
    Visitor = (select count(visitor) from @DailyVisits where dt = d1.dt),
    Added = (
    Select COUNT(*) from
    (
    select visitor
    from @DailyVisits where dt = d1.dt
    and visitor not in
    (
    select visitor
    from @DailyVisits where dt = DATEADD(dd, -1, d1.dt)
    )
    ) a
    ),
    Removed = (
    Select COUNT(*) from
    (
    select visitor
    from @DailyVisits where dt = DATEADD(dd, -1, d1.dt)
    EXCEPT
    select visitor
    from @DailyVisits where dt = d1.dt
    ) a
    ),
    Remained = (
    Select COUNT(*) from
    (
    select visitor
    from @DailyVisits where dt = d1.dt
    INTERSECT
    select visitor
    from @DailyVisits where dt = DATEADD(dd, -1, d1.dt)
    ) a
    )
    from
    DailyVisist_CTW d1


You must log on before posting a comment.

Are you a new visitor? Register Here