How to: Simple incremental PDTs using create_process

Something we’ve been wanting to achieve in Looker for a while is the ability to incrementally build PDTs rather than doing a DROP and CREATE nightly.

I think I’ve found a reasonable solution, using create_process. I’m sharing my findings here as I’ve seen a lot of topics trying to solve this issue, so hopefully this can help some people.

The table I’m building incrementally is quite simple: I’m pulling rows from one table (that fit a criterion), modifying them a bit, then inserting them into another table. But I’m sure you can use the same solution for more complex processes.

  1. First you need to create the table you wish to make incremental in your warehouse, outside of the looker_scratch schema. I won’t go into the DDL process but feel free to comment if you need help.
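For example, the DDL might look something like this (a sketch only — the column names and types are assumptions based on the example below; match them to your source table):

```sql
-- Permanent table that will be built up incrementally,
-- living outside the looker_scratch schema.
CREATE TABLE schema.incremental_table_name (
  col1   VARCHAR(256),  -- assumed unique key, used for de-duplication later
  tstamp TIMESTAMP      -- used to find new rows on each run
);
```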

  2. You then need to grant SELECT and INSERT on this table to your Looker user in your warehouse. You can achieve this with GRANT SELECT, INSERT ON TABLE schema.incremental_table_name TO looker_username;

  3. In LookML, create a new view containing an incremental PDT like so. The first sql_step builds the batch of new rows in looker_scratch, the second inserts that batch into the permanent table:

```lookml
view: incremental_table {
  derived_table: {
    create_process: {
      sql_step:
        CREATE TABLE ${SQL_TABLE_NAME} AS
        SELECT e.*
        FROM existing_table AS e
        -- Only select new rows
        WHERE e.tstamp > (SELECT MAX(tstamp) FROM schema.incremental_table_name) ;;
      sql_step:
        INSERT INTO schema.incremental_table_name
        SELECT * FROM ${SQL_TABLE_NAME}
        -- Optional de-duplication, assuming col1 is unique
        WHERE ${SQL_TABLE_NAME}.col1 NOT IN (SELECT col1 FROM schema.incremental_table_name) ;;
    }
  }
}
```
  4. Inside the derived_table, set the sql_trigger_value to something that indicates the underlying source table has been updated, e.g. sql_trigger_value: SELECT MAX(tstamp) FROM existing_table ;;
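Putting it together, the derived_table ends up holding both the create_process and the trigger (a sketch; existing_table is the assumed source table name from the example above):

```lookml
derived_table: {
  create_process: {
    # ... the two sql_steps from above ...
  }
  # Rebuild whenever the source table receives newer rows
  sql_trigger_value: SELECT MAX(tstamp) FROM existing_table ;;
}
```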

  5. You may want to backfill the incremental table with historical rows before you set the incremental PDT running.
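A one-off backfill could look something like this (a sketch; existing_table and the column layout are assumptions carried over from the example above):

```sql
-- One-off backfill: load all historical rows into the permanent table
-- before the incremental PDT takes over.
INSERT INTO schema.incremental_table_name
SELECT e.*
FROM existing_table AS e;
```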

Basically, the first sql_step stores ONLY the new rows in the looker_scratch table, then the second sql_step inserts those new rows into an existing table outside of your looker_scratch schema.

As I mentioned, this is a very simple example and will only really work in cases where you want to extract and process rows in a simple way, then load them into a separate table.

Interested to see what other use cases people can come up with. I’m going to see if we can use this method to make our page_views table incremental from the raw events data we collect with Snowplow.