Modeling Klaviyo events in LookML

bigquery
lookml

(Akash Agrawal) #1

So we’re bringing in Klaviyo data to Looker. The events table contains a list of all events associated with Klaviyo campaigns, for example “Received Email”, “Opened Email”, “Clicked Email” etc.

We want to find the time between any two events for the same person. So essential datediff(‘timestamp of email received’, ‘timestamp of email opened’), but since the events are recorded in separate rows, I wasn’t sure of the best way to do it.

We could always create a derived table that flattens the events table to create separate columns for “Email Received Timestamp”, “Email Opened Timestamp” etc, but I was wondering if there’s a more elegant way to go about it in LookML?


(peter.whitehead) #2

Hi @akash!

My initial thoughts are that using a derived table and going from there would be the best solution for this scenario. Another way, and not as elegant, to do this could be with table calculations and using the offset function to calculate the datediffs in the explore. This doc here discusses positional functions and also has some links to examples: https://docs.looker.com/exploring-data/creating-looker-expressions/looker-functions-and-operators#positional_functions. Let us know if you have any other questions!


(lloyd tabb) #3

You’ve tagged this with BigQuery as well as LookML so I’m going to answer it in a way that is unique to BigQuery. With nested tables, you can turn it into sessions with nested events. You don’t have to persist this and BigQuery is smart enough to only pull in what you need to do it.

Here is an example below for StackOverflow posts and comments. I’m defining duration of a post as the difference between the first and last comment.

view: posts combines the posts table and comments table into a single nested derived table with the comments nested inside each post. The query here is pretty generic and can be modified to nest any two tables.

Once the tables are nested we can build a dimension inside the post the examines the nested data with a subselect, in this case looking at the difference in time between the first and last comment. Since this operates on a single row (the row of a single post), it is super efficient.

 (SELECT TIMESTAMP_DIFF( MAX(creation_date), MIN(creation_date),second) as duration FROM UNNEST(${comments}) );

The explore unpacks the nested comments so you can explore them further. This technique probably warrants an more elaborate post. It can be used in any situation with a one_to_many join.

connection: "bigquery_publicdata_standard_sql"

explore: posts {
  join: comment {
    sql: LEFT JOIN UNNEST(${posts.comments} comment ;;
    relationship: one_to_many
  }
}

view: posts {
  derived_table: {
    sql: 
    SELECT p.*, comments
      FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`  p
      LEFT JOIN (
        select post_id
          , ARRAY_AGG((SELECT AS STRUCT c.*)) comments
        FROM `bigquery-public-data.stackoverflow.comments` c
        GROUP BY post_id 
        ) c ON p.id=c.post_id
    ;;
  }
  
  dimension: id {primary_key: yes}
  dimension: title {}
  dimension: comments {hidden: yes}
  dimension: post_duration {
    sql: (SELECT TIMESTAMP_DIFF( MAX(creation_date), MIN(creation_date),second) as duration FROM UNNEST(${comments}) );;
  }
  dimension: post_duration_tiered {
    type: tier
    sql: ${post_duration} ;;
    tiers: [0,10,100,1000,10000,100000]
  }
  measure: count {type:count}
}

view: comment {
  dimension: id {primary_key:yes}
  dimension: user_id {}
  measure: count {type:count}
}

And some output.


(Akash Agrawal) #4

Thanks for your suggestions! I did end up flattening the events table in a derived table, though with a slightly different query that doesn’t involve joins. The query is below.

SELECT campaign_id, person_id, event_type, ts,
FIRST_VALUE(IF(event_type=‘Received Email’, ts, NULL) IGNORE NULLS) OVER(win) receive_ts,
FIRST_VALUE(IF(event_type=‘Opened Email’, ts, NULL) IGNORE NULLS) OVER(win) open_ts,
FIRST_VALUE(IF(event_type=‘Clicked Email’, ts, NULL) IGNORE NULLS) OVER(win) click_ts
FROM events
WINDOW win AS (PARTITION BY campaign_id, person_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)