'Point In Time' actions modeling

This is a bit long-winded but please bear with me on this one.

I’m not sure our modeling journey has been as straightforward and seamless as we would have liked.

Our use case is as below.

We have 6 separate tables that contain information from actions that our company takes on a regular basis.

Abandonments
Sends
Opens
Clicks
Conversions
Direct Sales

These tables contain the date each action took place, along with a session_id to identify the session. The date columns look like this:

utc_date_abandoned
utc_date_sent
utc_date_opened

etc.

The way our model works, these numbers need to be reported side by side as a reflection of our ‘Point in Time’ performance. To be more specific, the tables don’t share any common identifiers other than the date the actions took place, so a typical join (session_id = session_id) won’t work in this instance.

Our current solution is to combine these tables via UNION ALL into a PDT using standard SQL, giving each table an ‘Event’ type. See the rough mockup below:

SELECT * FROM
(
SELECT
	utc_date_abandoned AS "Activity Date"
	,client_number AS "Client"
	,'Abandoned' AS "Event"
	,COUNT(DISTINCT session_id) AS "Number of Abandonments"
	,NULL AS "Number of Sends"
	,NULL AS "Number of Opens"

FROM data.abandonments

	WHERE utc_date_abandoned >= '2019-10-10'
		AND utc_date_abandoned < '2019-10-15'
		AND client_number = 12345

GROUP BY utc_date_abandoned, client_number

UNION ALL

SELECT
	utc_date_sent AS "Activity Date"
	,client_number AS "Client"
	,'Sends' AS "Event"
	,NULL AS "Number of Abandonments"
	,COUNT(DISTINCT session_id) AS "Number of Sends"
	,NULL AS "Number of Opens"

FROM data.sends

	WHERE utc_date_sent >= '2019-10-10'
		AND utc_date_sent < '2019-10-15'
		AND client_number = 12345

GROUP BY utc_date_sent, client_number

-- etc. for the remaining four event tables
) AS events;

The final results give us the total number of Abandonments, Sends, and every other metric we decide on; adding a metric is just a matter of adding another table to the union.
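To make the pattern concrete, here is a minimal sketch using SQLite and made-up sample data (the table and column names mirror the mockup above, but the data and the conditional-SUM pivot on top are purely illustrative, not our real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data: two of the six event tables.
cur.execute("CREATE TABLE abandonments (utc_date_abandoned TEXT, client_number INTEGER, session_id TEXT)")
cur.execute("CREATE TABLE sends (utc_date_sent TEXT, client_number INTEGER, session_id TEXT)")
cur.executemany("INSERT INTO abandonments VALUES (?, ?, ?)",
                [("2019-10-10", 12345, "s1"), ("2019-10-10", 12345, "s2")])
cur.executemany("INSERT INTO sends VALUES (?, ?, ?)",
                [("2019-10-10", 12345, "s3")])

# The UNION ALL produces one row per (date, event type), with the other
# metric columns padded with NULL; summing over the union lines the
# metrics up side by side per date.
rows = cur.execute("""
    SELECT activity_date,
           SUM(abandonments) AS abandonments,
           SUM(sends)        AS sends
    FROM (
        SELECT utc_date_abandoned AS activity_date,
               COUNT(DISTINCT session_id) AS abandonments,
               NULL AS sends
        FROM abandonments
        GROUP BY utc_date_abandoned

        UNION ALL

        SELECT utc_date_sent AS activity_date,
               NULL AS abandonments,
               COUNT(DISTINCT session_id) AS sends
        FROM sends
        GROUP BY utc_date_sent
    )
    GROUP BY activity_date
""").fetchall()
print(rows)  # [('2019-10-10', 2, 1)]
```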

This solution works well for us, but it has a few drawbacks that we have discovered over time.

1.) We initially ran the UNION in the PDT without date restrictions overnight and cached the results.
- This gave really quick query times, but it was heavy on Snowflake usage and used far too many credits.

2.) We then added restrictions to the WHERE clause, which reduced the volume of data but meant the query ran at request time and took almost 2 minutes.

My initial suspicion is that the route we’ve gone down is not the best solution for either performance or cost.

Has anyone had a similar use case or does anyone have any ideas on how we might be able to do this in a more efficient manner?

Hey @Paul_Walton - I would probably use UNION ALL too. But to save on Snowflake credits you could implement an ‘incremental’ rebuild of the PDT, take a look at How to: Simple incremental PDTs using create_process

This’ll allow you to process only the new events on each rebuild, whilst maintaining a complete view in the ‘cached’ table.
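The incremental idea, sketched with SQLite purely for illustration (hypothetical table names; the linked how-to covers the actual LookML create_process mechanics):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source event table and the persistent "cached" union table.
cur.execute("CREATE TABLE abandonments (utc_date_abandoned TEXT, client_number INTEGER, session_id TEXT)")
cur.execute("""CREATE TABLE IF NOT EXISTS events_pdt
               (activity_date TEXT, client INTEGER, event TEXT, n INTEGER)""")

cur.executemany("INSERT INTO abandonments VALUES (?, ?, ?)",
                [("2019-10-10", 12345, "s1"), ("2019-10-11", 12345, "s2")])

def rebuild_increment():
    # Only aggregate dates newer than what the cached table already
    # holds, instead of scanning the whole history on every rebuild.
    cur.execute("""
        INSERT INTO events_pdt
        SELECT utc_date_abandoned, client_number, 'Abandoned',
               COUNT(DISTINCT session_id)
        FROM abandonments
        WHERE utc_date_abandoned >
              COALESCE((SELECT MAX(activity_date) FROM events_pdt), '1970-01-01')
        GROUP BY utc_date_abandoned, client_number
    """)

rebuild_increment()  # first run loads both existing days
cur.execute("INSERT INTO abandonments VALUES ('2019-10-12', 12345, 's3')")
rebuild_increment()  # second run only adds the new day
rows = cur.execute("SELECT * FROM events_pdt ORDER BY activity_date").fetchall()
print(len(rows))  # 3
```

The second rebuild touches only one day of data, which is where the Snowflake credit saving comes from.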


Hi Paul,

This approach may be useful if you want to avoid a PDT altogether and let users hit the raw data, performing the “union” at query time: https://help.looker.com/hc/en-us/articles/360023686574-A-Cure-For-the-one-to-many-Blues