Create View/Model that exposes only latest updated record


(Benjamin Lutz) #1

Hello,

I have a database table made up of records that have an ID, a payload, and a timestamp. These are streamed into the database using only inserts, so there are multiple records with the same ID but different timestamps. I'd like to provide a view (and models) of this data that behave like a database where every record was upserted: only one row per ID, carrying the payload of the record with the most recent timestamp.
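To make the desired semantics concrete, here is a minimal in-memory sketch. It assumes each record is an `(id, payload, timestamp)` tuple; the helper name `collapse_to_latest` and the sample rows are hypothetical. Replaying the insert-only log and keeping, per ID, the record with the greatest timestamp gives what an upsert-based table would contain.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[int, str, int]  # (id, payload, timestamp); assumed layout


def collapse_to_latest(records: Iterable[Record]) -> Dict[int, Record]:
    """Keep only the most recent record for each id (hypothetical helper)."""
    latest: Dict[int, Record] = {}
    for rec in records:
        rec_id, _payload, ts = rec
        current = latest.get(rec_id)
        # Replace the stored record only if this one is strictly newer,
        # so on a timestamp tie the first record seen wins.
        if current is None or ts > current[2]:
            latest[rec_id] = rec
    return latest


log = [
    (1, "a", 100),
    (2, "x", 105),
    (1, "b", 110),
    (2, "y", 103),  # older than (2, "x", 105), so it is ignored
]
print(sorted(collapse_to_latest(log).values()))
# [(1, 'b', 110), (2, 'x', 105)]
```

The SQL answers further down in the thread compute the same result at query time instead of in application code.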

We could accomplish this by creating another data source that does use upserts, but since we expect many more upserts than queries on the data, it seems more economical to collapse the data set at query time than at insertion time.

I've found articles covering similar topics, but they only refer to a measure and don't provide the user experience I'm looking for. I'd like the fact that the database contains more than one record per ID to be completely hidden from the user.


Could anyone point me in the right direction? Please let me know if I can provide any more information.


(fabio) #2

Hi Benjamin,

In my article that you linked to, the part before the section "using it in lookml" gives a SQL query that implements that transformation, without any measures or dynamic granularity.

If your "payload" has a lot of columns and you want to avoid writing a max aggregate for each of them, you can use a SELECT * style query structured like this:

WITH latest as (SELECT id, max(timestamp) as timestamp FROM log GROUP BY id)
SELECT * FROM log WHERE (id, timestamp) IN (SELECT id, timestamp FROM latest)
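As a quick way to try this pattern, the sketch below runs it against an in-memory SQLite database through Python's standard `sqlite3` module. The table name `log`, its columns, and the sample rows are assumptions for illustration. SQLite (3.15 or later) supports row-value comparisons, but the right-hand side of a row-value `IN` must be a subquery, so the CTE is read through `SELECT ... FROM latest` rather than named directly.

```python
import sqlite3

# Hypothetical insert-only table mirroring the thread: id, payload, timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER, payload TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [(1, "a", 100), (1, "b", 110), (2, "x", 105), (2, "y", 103)],
)

# Latest timestamp per id, then keep only the log rows matching those pairs.
LATEST_SQL = """
WITH latest AS (
    SELECT id, MAX(timestamp) AS timestamp FROM log GROUP BY id
)
SELECT * FROM log
WHERE (id, timestamp) IN (SELECT id, timestamp FROM latest)
ORDER BY id
"""
rows = conn.execute(LATEST_SQL).fetchall()
print(rows)  # [(1, 'b', 110), (2, 'x', 105)]
```

Because the outer query is `SELECT *`, new payload columns flow through without touching the aggregate, which is the point of structuring it this way.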

This only works correctly if there are no duplicate timestamps per id, though, so confirm that before implementing it this way.
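One way to act on that caveat, sketched under the same assumed `log` schema with hypothetical sample rows: first look for ids that have several rows at the same timestamp. If duplicates can occur, a different technique, not the one from this thread, is to number the rows per id with the `ROW_NUMBER()` window function and keep only the first. This needs window-function support (SQLite 3.25 or later), and the `payload DESC` tie-breaker is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER, payload TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    # id 1 has two rows sharing its latest timestamp (110).
    [(1, "a", 100), (1, "b", 110), (1, "c", 110), (2, "x", 105)],
)

# 1) Check: which ids have more than one row at the same timestamp?
DUPES_SQL = """
SELECT id, timestamp, COUNT(*) AS n
FROM log
GROUP BY id, timestamp
HAVING COUNT(*) > 1
"""
dupes = conn.execute(DUPES_SQL).fetchall()
print(dupes)  # [(1, 110, 2)]

# 2) Alternative: number rows per id, newest first, and keep row 1 only.
#    The payload tie-breaker is arbitrary; use whatever column makes sense.
RANKED_SQL = """
SELECT id, payload, timestamp FROM (
    SELECT id, payload, timestamp,
           ROW_NUMBER() OVER (
               PARTITION BY id ORDER BY timestamp DESC, payload DESC
           ) AS rn
    FROM log
) AS ranked
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(RANKED_SQL).fetchall()
print(latest)  # [(1, 'c', 110), (2, 'x', 105)]
```

On this data the max-timestamp query would return both the `b` and `c` rows for id 1, while the `ROW_NUMBER()` version always returns exactly one row per id.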


(Benjamin Lutz) #3

Nice! We’ll take another look at this and update with the outcome!


(Benjamin Lutz) #4

Worked out great. For anyone following up on this, it's a pretty simple idea: get your primary key and timestamp, find the max timestamp for each primary key, and then left join that back to the same view, keeping only rows where the primary keys match and the timestamps match.
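A minimal sketch of that self-join, again using SQLite through Python's `sqlite3` with an assumed `log` table and made-up rows. The per-key maximum is computed in a subquery (called `latest` here, a hypothetical name) and joined back to the table on both the key and the timestamp. Since every `(id, max timestamp)` pair exists in `log`, the `LEFT JOIN` returns the same rows an `INNER JOIN` would. The duplicate-timestamp caveat from earlier in the thread applies here too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER, payload TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [(1, "a", 100), (1, "b", 110), (2, "x", 105), (2, "y", 103)],
)

# Max timestamp per primary key, joined back to the same table on both
# the key and the timestamp, so only the newest row per key survives.
JOIN_SQL = """
SELECT log.id, log.payload, log.timestamp
FROM (
    SELECT id, MAX(timestamp) AS max_ts FROM log GROUP BY id
) AS latest
LEFT JOIN log
  ON log.id = latest.id
 AND log.timestamp = latest.max_ts
ORDER BY log.id
"""
rows = conn.execute(JOIN_SQL).fetchall()
print(rows)  # [(1, 'b', 110), (2, 'x', 105)]
```

In Looker, a query like this would typically be placed in a derived table's `sql` so that explores built on it only ever expose one row per ID; that LookML wiring is not shown here.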