Update and insert PDT instead of reload

(Mark Brown) #1

It would be great to be able to have PDTs that could be inserted and updated on trigger instead of a truncate and reload.
The main reason would be for creating slowly changing dimensions with Looker instead of other solutions. This would require multiple queries: an update of the old row's end date and an insert of a new row.
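The update-then-insert step for a type-2 slowly changing dimension could be sketched in SQL roughly like this (all table and column names here — `dim_customer`, `staging_customer`, `attributes_hash`, `valid_to` — are hypothetical):

```sql
-- Close out the old row: end-date the current row for any customer whose attributes changed.
UPDATE dim_customer d
SET valid_to = current_date
FROM staging_customer s
WHERE d.customer_id = s.customer_id
  AND d.valid_to IS NULL                     -- the current row
  AND d.attributes_hash <> s.attributes_hash; -- attributes have changed

-- Insert a new current row for each changed or brand-new customer.
INSERT INTO dim_customer (customer_id, attributes_hash, valid_from, valid_to)
SELECT s.customer_id, s.attributes_hash, current_date, NULL
FROM staging_customer s
LEFT JOIN dim_customer d
  ON d.customer_id = s.customer_id
 AND d.valid_to IS NULL
WHERE d.customer_id IS NULL
   OR d.attributes_hash <> s.attributes_hash;
```

A truncate-and-reload PDT can only replace the whole table, which is why this two-statement pattern doesn't fit today's PDT model.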
A secondary reason would be for large tables, incrementally inserting new rows. We do this currently with multiple PDTs, one triggered yearly (< this year), one monthly (>= this year and < this month) and one daily or hourly (>= this month) with a union of the three.
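The three-tier pattern described above might look roughly like this in LookML (view names, table names, and date column are illustrative, not from a real model):

```lookml
# Hypothetical sketch: three PDTs over disjoint date ranges, unioned in a fourth view.
view: events_yearly {
  derived_table: {
    sql: SELECT * FROM schema.events
         WHERE event_date < date_trunc('year', current_date) ;;
    sql_trigger_value: SELECT date_trunc('year', current_date) ;;
  }
}

view: events_monthly {
  derived_table: {
    sql: SELECT * FROM schema.events
         WHERE event_date >= date_trunc('year', current_date)
           AND event_date < date_trunc('month', current_date) ;;
    sql_trigger_value: SELECT date_trunc('month', current_date) ;;
  }
}

view: events_daily {
  derived_table: {
    sql: SELECT * FROM schema.events
         WHERE event_date >= date_trunc('month', current_date) ;;
    sql_trigger_value: SELECT current_date ;;
  }
}

view: events_all {
  derived_table: {
    sql: SELECT * FROM ${events_yearly.SQL_TABLE_NAME}
         UNION ALL SELECT * FROM ${events_monthly.SQL_TABLE_NAME}
         UNION ALL SELECT * FROM ${events_daily.SQL_TABLE_NAME} ;;
  }
}
```

Each tier is rebuilt only when its trigger fires, so the expensive historical data is reloaded at most once a year while the small daily slice rebuilds frequently.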
Does anyone else have this need or a solution for it using Looker?

(Dirty Looker) #2

Hey @Mark_Brown,

This is something that we’ve heard a lot about.

We’ve had customers request this in the past for event or log data, because that data is immutable and no changes occur in the calculated results. Today, PDTs are not designed to handle this use case, so we recommend formalizing these jobs in your ETL architecture. We have looked into what it would take to build a solution for this, but recognize that existing tools and techniques for it are already very flexible and powerful.

We also definitely use PDTs to prototype or massage migrated data as a temporary solution. The PDTs are super transparent and easy to migrate.

(Mark Brown) #3

That would work. Currently I am trying not to build any of the normal ETL for facts and dimensions like I have for the past 10 years.
Using only Fivetran and Looker's LookML, I have gotten really far and am doing more with one person than I previously accomplished with a team of three.
Having PDTs that could run a query that mutates the table, either by insert only or by insert and update, would let me keep going with just LookML and Fivetran.

(Tristan Handy) #4

Mark! I love this conversation so much. Data archival of SCDs is a big deal, as are incremental data models. I’m a contributor to an open source project called data build tool (dbt) that provides exactly the functionality you’re describing: if you plug in the table you want to archive as well as the unique key and the date, it will maintain a copy of that table and log all updates to that table as new rows with new field values. You don’t have to write any SQL at all.

Learn more about dbt here:

And learn more about the just-released archival functionality here:

We got this ask from folks at SeatGeek who are just in the process of deploying dbt within their data team and thought it was a lovely feature to implement. dbt’s primary purpose is to perform in-database modeling using SQL, and one of its primary benefits is that it can handle incremental data models (insert/select instead of create table as).
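An incremental dbt model implementing the insert/select pattern Tristan describes might look roughly like this (dbt's configuration syntax has changed over time, so this is a sketch in a later style; the model, source, and column names are hypothetical):

```sql
-- models/events_incremental.sql (hypothetical model name)
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT event_id, user_id, event_time, payload
FROM {{ source('app', 'raw_events') }}

{% if is_incremental() %}
  -- On incremental runs, only pull rows newer than what's already in the target table.
  WHERE event_time > (SELECT max(event_time) FROM {{ this }})
{% endif %}
```

On the first run dbt creates the table from the full query; on later runs the `is_incremental()` branch restricts the select and dbt inserts only the new rows.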

(Leland) #5

I just discovered that you can hack around this in Postgres (probably other dialects too) with CTEs. You can embed an INSERT statement in a CTE within a PDT's SQL to do whatever incremental work you want. You are inserting into a table in your database that you must create beforehand, but this lets you use Looker to add incremental values. Here is my LookML implementation:

view: foo {
  derived_table: {
    sql:
      WITH foo AS
        (INSERT INTO schema.bar SELECT...lots of SQL FROM schema.table WHERE date > current_date - 1 RETURNING *)
      SELECT * FROM foo ;;
    sql_trigger_value: select current_date ;;
    indexes: ["fubar"]
  }
}

Then you should be able to set up your trigger value to correspond to whatever increment you define in your SQL.

(Josiah Vincent) #6

Hey @Leland_Robbins

I am new to Looker, but wouldn't this trigger just cause the PDT to be recreated, containing only the newly inserted values?