Update PDT when both tables are updated

snowflake

(Menashe Hamm) #1

We load data into certain Snowflake tables daily. I want to update a PDT only when both tables have been loaded for the day. To update the PDT when either table has been loaded, I use

sql_trigger_value:
  select sum(c)from(
    select count(*) c from TABLE1
    union all
    select count(*) from TABLE2
  )
;;

(And other solutions exist.) But how do I update the PDT when both tables have been loaded? Assume that at least one of the tables has no column whose value can be used for this purpose (like a timestamp).


(Menashe Hamm) #2
sql_trigger_value:
  select min(last_load_time)
  from information_schema.load_history
  where schema_name='...'
  and table_name in('TABLE1','TABLE2')
;;

(Elliot Park) #3

Hey @menashe,

Good question! There are at least two ways that we could get a PDT trigger that only triggers after two different tables in the DB have been updated.

  1. If last_load_time (or load times in general) are available, we should be able to use that query in your last comment or a query like below to trigger a rebuild only after both tables have been updated:
-- Note: this min(timestamp) will only change when the most recent upload times for both
-- Table A and Table B have changed
select min(timestamp) from 
(select max(timestamp) as timestamp from Table_A union select max(timestamp) from Table_B) 

Alternatively, we could a setup like below if Table load times are not available to us:

# sql_trigger_value will trigger a rebuild every day
view: pdt_one {
  derived_table:
     sql: .... ;;
  sql_trigger_value: SELECT current_date;;
}

# sql_trigger_value will only force a rebuild when the master table updates
view: pdt_one_prime {
   derived_table:
      sql: SELECT count(*) as num FROM ${pdt_one.SQL_TABLE_NAME};;
   sql_trigger_value: SELECT count(*) from ${master_table.SQL_TABLE_NAME}
}

# sql_trigger_value will trigger a rebuild every day
view: pdt_two {
  derived_table:
    sql: .... ;;
  sql_trigger_value: SELECT current_date;;
}

# sql_trigger_value will only force a rebuild when the master table updates
view: pdt_two_prime {
  derived_table:
    sql: SELECT count(*) as num FROM ${pdt_one.SQL_TABLE_NAME};;
  sql_trigger_value: SELECT count(*) from ${master_table.SQL_TABLE_NAME}
}

# sql_trigger_if will rebuild when query returns "True" (i.e. when pdt_one is larger than the pdt_one_prime snapshot table AND when pdt_two is larger than the pdt_two_prime snapshot table)
view: master_table {
   derived_table: 
      sql: ... ;;
   sql_trigger_if: SELECT CASE WHEN pdt_one.num > pdt_one_prime.num AND pdt_two.num > pdt_prime.num THEN TRUE ELSE FALSE  FROM (SELECT count(*) as num FROM ${pdt_one.SQL_TABLE_NAME}) as pdt_one, (SELECT count(*) as num FROM ${pdt_two.SQL_TABLE_NAME}) as pdt_two, ${pdt_one_prime.SQL_TABLE_NAME} as pdt_one_prime, ${pdt_two_prime.SQL_TABLE_NAME} as pdt_two_prime
}

To break this down, the master_table represents your main PDT (i.e. the PDT you’d like to trigger only after two subordinate tables have been updated). The idea is that if pdt_one and pdt_two both become larger than prime_one_prime and pdt_two_prime, which are something like “snapshot” tables in this case, then we force a rebuild of the master_table. Note that pdt_one and pdt_two will rebuild at the rate you specify and in this example, they’re rebuilt everyday using SELECT current_date. The master table being rebuilt will then trigger to the prime tables being rebuilt, which then gets our loop started back up.

Hope that help provides some options regarding getting this PDT triggered accordingly!
Elliot