Trying to create mixpanel style customisable funnel visualisation using liquid parameters for defining steps

The company I work for keeps Mixpanel for easy ad hoc reports & funnels and I’m trying to create as flexible funnel builder.

Here’s my two-step sample query for the derived table using MySQL 8:

with 
  events as (
    select id as user_id, 'Account Created' as label, created_at as ts, id from users
    union
    select user_id, 'Order Completed' as label, completed_at as ts, id from orders where state = 'complete'
    union
    select user_id, 'User Referred' as label, created_at as ts, id from referrals
  ),
  step1 as (
    select user_id, min(ts) as step1_ts
    from events  
/*{% if test._parameter_value == 'yes' %}*/ /*inner join for filtering?*/ /*{% endif %}*/
    where label = {% parameter step_1_action %} /*$event_step_1*/
      and ts >= {% date_start funnel_activation_timeframe %}
      and ts <= {% date_end funnel_activation_timeframe %}/*$timeframes_step_1*/ 
    group by user_id
  ),
  step2 as (
    select events.user_id, min(ts) as step2_ts
    from events inner join step1 
      on step1.user_id = events.user_id 
      and timestampdiff(minute, ts, step1.step1_ts) > 0
      and timestampdiff(day, ts, step1.step1_ts) <= {% parameter days_to_complete_step_2 %} /*$time_to_complete_step_1*/
    where label = {% parameter step_2_action %} /*$event_step_2*/
    group by user_id
  )
select step1.*, step2.step2_ts from step1 left join step2 on step2.user_id = step1.user_id

So there are two problems I can see, but I can’t really figure out a solution:

  1. The commented line in step1 query is something I wanted to use for filtering on event properties (like items bought, geo data, etc.), but it doesn’t seem to work this way (it looks like it just ignores the if condition statement and executes regardless). I wanted to make a conditional inner join to retrieve the data from other tables (something like [if ‘condition’] inner join order_items oi on oi.order_id = events.id and oi.category = shoes ). Is there a way I could handle joins using liquid parameters (or whatever)? Since I have different event types, I only want to join tables if the filter is chosen.

  2. Selecting the lowest timestamp in step1 might miss someone who, for example, performed a funnel activation event twice and, after doing this second time, then completed the funnel (when it took to long from the first time to the second step).

Has anybody done flexible funnels in Looker and could help me there?
Thanks a lot.

  1. To get the full flexibility of Looker’s filters in subqueries you’ll probably want to use the {% condition %} tag. (In addition, you can declare the fields as view>filter fields, and specify their sql clause as TRUE to prevent the filter from affecting the outer where clause.) See more here: https://docs.looker.com/data-modeling/learning-lookml/templated-filters

  2. This will be trickier to specify. I would use window functions. MySQL’s window function support has traditionally been spotty , but looking at the docs, it seems the latest version should offer good support. It would look something like LAG(CASE WHEN <condition to match events> AND <timestamp recency requirements> THEN TRUE ELSE NULL END) IGNORE NULLS OVER (PARTITION BY user_id ORDER BY timestamp)

1 Like

Thank you very much for your answer, but I’m not sure how your answer on point 2 could work in 4-step funnel for example.

Imagine we wanted to build quadruple order completed funnel with max 10 day gaps between and we have a user with a purchase history like in this example (the number sequence corresponds to days he orders, where 0 is his first purchase; assume all orders are within funnel activation time frames):

0, 15, 20, 100, 101, 101, 102, 200, 201.

So he completes the funnel in days 100-102, but I don’t see how to use the lag function to be able to spot it.

I’ve been thinking about querying multiple timestamp sequences for each user with user ids each step and then just counting distinct user ids as a measure. But if there’s a more efficient way, I would really like to implement it.
Edit: No, it’s so slow it can’t be considered a solution.

PS Since you’re part of Looker Staff, I have to mention that {% condition %} tags work in SQL queries even if commented, which is pretty inconvenient. Is there a reason for that?

Re: the condition tags - unless I’m misunderstanding, it’s because all liquid processing is a pre-processing layer used to generate some SQL text, and the liquid does not particularly know what the containing SQL does. In fact in some cases, this can be used to conditionally insert & manipulate SQL comments.

So, on this dataset (I’m just showing one user’s data) :

Event Date
A 1
B 5
C 25
A 30
B 31
C 32

You could do

CASE WHEN
   event = C
   AND 
       LAG(CASE WHEN event = B THEN date END) 
       IGNORE NULLS OVER (PARTITION BY user_id ORDER BY date)
       > date-10
   AND
       LAG(CASE WHEN event = A THEN date END)
       IGNORE NULLS OVER (PARTITION BY user_id ORDER BY date)
       > date-10
    THEN ' Qualifying C event!' END
FROM events
-- WHERE <optional date filter using {% date_start %} and {% date_end %} to hit a MySQL date index if available}

This says that any event C, for which the latest B event was within 10 days of the C event, and for which the latest A event was within 10 days of the C event, then mark that C event as a qualified event.

You could repeat this for a number of different types of events, and put that in a derived table, and then your final query would count these different progress events