Hi @Dan! This is a great question that we run into very frequently. @George_Fraser--thanks for the shout out and giving a bit of a preview of my answer
I'm one of the maintainers of an open source product called dbt (data build tool). dbt is very similar to Looker's Persistent Derived Table functionality, but is more sophisticated in a couple of key areas. The one that's relevant to your question is that dbt natively supports incremental table rebuilds: it's baked into the core of the tool. As a result, it only requires a couple of lines of code to go from full table rebuilds to incremental rebuilds in most cases.
I linked the relevant section of the docs below, but the short version is that you need to do two things to make a table incremental in dbt:
- Write a dynamic
where condition in the model that performs date limiting. This
where condition should limit records to only those that are new or changed since your prior table load.
- Specify a table unique key.
That's it: dbt takes care of the rest. It uses the same conceptual approach that @George_Fraser outlines, where updated records are deleted and new data is inserted, but dbt writes all of the inserts and deletes for you. This keeps your code simple, maintainable, and extremely performant (and very much resembles a PDT!).
dbt is starting to see quite a lot of usage in the Looker community; there are over 200 companies using dbt in production as of this writing and many of them are Looker customers. dbt models present a very natural transition for when you're ready to take the next step beyond PDTs, whether that's to improve performance with incremental rebuilds, build an explicit DAG, or much more. You can transition PDTs over to dbt models 1:1 just using copy/paste, and then refactor from there. EZPZ.
Docs on incremental dbt models are here. If you run into any trouble, just ping us in Slack.