Historical User Cohort Analysis

Hi all,

I am trying to figure out a flexible week-by-week user cohort analysis, but I am having trouble getting my head around it, so I could use a nudge in the right direction :wink:

Let’s say I have defined a user cohort based on two aspects:

  1. The duration since the user account was created
  2. The duration since last user transaction
    In this case I want to know the number of users for whom 1 is above 3 months and 2 is below 3 months.

Calculating the number of users in this cohort based on all the data is no problem. However, I would like to know how this group evolved over time, i.e. compare the status today to last week, to the week before, and so on. In those previous weeks the duration calculation would not compare against the current day but against the end date defined in the date dimension, which also limits the data used to calculate the group size. And I would like to see all weeks next to each other at the same time, in one table that shows the number of users in the group for each value of that date dimension.

For example: my account was registered on 1-1-2019. Based on last Sunday (10-2), the duration between creation and query date is 40 days; based on the Sunday prior to that it would be 33 days. That would be two rows in a table, one with the date dimension and the other with the duration. If I wanted to know how many users have 40+ days since account creation, my account would be counted this week, but not the previous week.
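To make that duration arithmetic concrete, here is a minimal Python sketch using the example dates above (the 40-day threshold is just the illustrative filter from the example, not a real cohort definition):

```python
from datetime import date

creation = date(2019, 1, 1)  # account registration date from the example

# The two Sundays mentioned above, oldest first
for query_date in (date(2019, 2, 3), date(2019, 2, 10)):
    duration = (query_date - creation).days
    in_cohort = duration >= 40  # illustrative "40+ days since creation" filter
    print(query_date, duration, in_cohort)
```

The key point is that `duration` is computed against each historical query date, not against today, which is exactly what the week-by-week view needs.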

I hope this makes sense. Any ideas on where to start?


The only way I can think of to do this would be the following:

  1. use a pre-existing date dimension table, or create one using a derived table (for instance, in BigQuery you can use the GENERATE_DATE_ARRAY function, in Redshift there is a GENERATE_SERIES function, and other databases have their own options). The output should be a table that contains every week as a row. I'll assume from now on that you've called this view date_dim and the column week
  2. cross join this table to your table with the cohort information (use type: cross in the explore's join definition)
  3. filter down to ${date_dim.week} > ${account_creation_date}
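As a sketch of what step 1 should produce (using the assumed date_dim.week naming from above), here is a hedged Python equivalent of a weekly date generator, mirroring what a call like GENERATE_DATE_ARRAY(start, end, INTERVAL 7 DAY) returns:

```python
from datetime import date, timedelta

def generate_weeks(start: date, end: date):
    """Yield one row per week, start inclusive, like a weekly date array."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=7)

# date_dim.week: one row per week (dates here are just a small example range)
weeks = list(generate_weeks(date(2019, 1, 6), date(2019, 2, 10)))
print(weeks)  # six consecutive Sundays
```

Whatever mechanism you use, the important property is one row per week so that the cross join in step 2 fans each account out across every week.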

This now gives you, for every week, the list of accounts that were created before that week started. You can then use the difference between the account creation date and that week to build the filter you're looking for, and calculate the number of accounts with a count_distinct type measure.
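The whole pattern (cross join, filter relative to each week, count distinct) can be sketched in plain SQL. Here is a self-contained example run against SQLite; the table and column names and the sample rows are hypothetical, the two 3-month thresholds are approximated as 90 days, and note that storing only each user's latest transaction date makes condition 2 an approximation for historical weeks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# date_dim: one row per week (here, three example Sundays)
cur.execute("CREATE TABLE date_dim (week TEXT)")
cur.executemany("INSERT INTO date_dim VALUES (?)",
                [("2019-01-27",), ("2019-02-03",), ("2019-02-10",)])

# users: creation date and last transaction date (hypothetical sample data)
cur.execute("CREATE TABLE users (id INTEGER, created TEXT, last_txn TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "2018-10-01", "2019-02-01"),  # old account, recent transaction
    (2, "2018-10-01", "2018-10-15"),  # old account, stale transaction
    (3, "2019-01-20", "2019-02-01"),  # young account
])

# Cross join every user with every week, keep only accounts that existed
# before the week started, then apply both duration filters relative to
# the week rather than to today.
cur.execute("""
    SELECT d.week, COUNT(DISTINCT u.id) AS cohort_size
    FROM date_dim d
    CROSS JOIN users u
    WHERE u.created < d.week
      AND julianday(d.week) - julianday(u.created)  >= 90  -- 1: created 3+ months before the week
      AND julianday(d.week) - julianday(u.last_txn) <= 90  -- 2: transacted within 3 months of the week
      AND u.last_txn <= d.week                             -- ignore transactions after the week
    GROUP BY d.week
    ORDER BY d.week
""")
rows = cur.fetchall()
print(rows)
```

One caveat: weeks in which nobody qualifies simply drop out of this result, so if you need zero rows to appear you would make the date table the base of the explore and count on the joined side instead.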

The explore you build would then use the week from the date table as the dimension for each row, together with the measure calculated above. This gives you a week-by-week count of the number of accounts created within a given timeframe of each week. You can extend this pattern to include the number of transactions within a given timeframe too.

You may need to use PDTs to pre-compute the cross join, however, as it will be costly in terms of performance to create at query time. In the worst case you may even need to pre-compute the measures you want on the explore too, because even storing a table of length number_of_weeks * number_of_accounts (or transactions) is pretty intensive.

Let me know if anything above is unclear


Thanks Andy.

I was already working in that direction, and through trial and error I managed to come up with something that works, but which is indeed very heavy to query. I am still working out some kinks, but for now I can move on.

If you get the time, you should post your pattern so that we may bask in the glory of it! :sunny: