How-to: Reading only the latest record in a time period using 'deltas only' ETL

How-to: Reporting on end-of-period records only using ‘deltas only’ ETL

Originally authored for Looker 6.20.19

Business Requirement:

Business users have requested the ability to show only the ‘end of period’ value for inventory. Looker does not yet fully support ‘ending value only’ reporting as an out-of-the-box feature so a custom code solution is required.

Example: Users would like to enter the Inventory explore, select a single day in the reporting interface, and have Looker retreive the latest value for each unique combination of location and UPC. Users routinely report on inventory in the past few days/weeks so a hard-coded ‘yesterday’ or ‘now’ view is not an acceptable solution.

Technical Pre-Requisites/Assumptions:

This write up is intended for a technical audience. The write up assumes a strong understanding of LookML as well as write privileges for your instance. All SQL is written for BigQuery.

This approach is built for a database in which a single table stores all inventory records in an append-only orientation. The database picks up inventory deltas as they occur (any time a unit is sold/transferred/returned/etc., a new inventory record is appended to the database). If more than one row inventory record is generated for a given date/location/UPC combination, the Looker application needs to be intelligent enough to select which record was the last to be appended. If no row exists for a given date/location/UPC, the last record to be provided for that given location/UPC is the most accurate.

The database picks up inventory deltas as they occur (any time a unit is sold/transferred/returned/etc., a new inventory record is appended to the database).

Step 1: Defining which values are to be held in Inventory PDT

Begin by defining the logic necessary to capture only the latest value for each day/location/UPC. These values will be held in a stand-alone persistent derived table (PDT). Isolating the correct values can be accomplished using query/sub-query. Defining the correct query requires careful review of your production data. A sample view header is provided below as a framework.

view: inventoryhistorydata {
  derived_table: {
    datagroup_trigger: daily_etl_complete
    sql:
    SELECT
      cast(marco.justdate as date) as inventorydate,
      polo.location_id,
      polo.product_id,
      polo.ats  --ats is an abbreviation for available-to-sell
    FROM
      inventoryhistory AS polo
    JOIN (
      SELECT
        SUBSTR(CAST(inventory_ts AS string), 0, 10) AS justdate,
        MAX(inventory_ts) AS maxtimestamp,
        location_id,
        product_id
      FROM
        inventoryhistory
      GROUP BY
        1,  3,  4
      ORDER BY
        1 ASC) AS marco
    ON
      polo.location_id = marco.location_id
      AND polo.product_id = marco.product_id
      AND polo.inventory_ts = marco.maxtimestamp
    GROUP BY
      1,  2,  3,  4 ;;
  }

Step 2: Complete the inventorydata view definition

Complete the inventorydata view by declaring a robust primary key at the day/location/UPC level and defining the required dimensions. All objects in this view are hidden. This approach will become clear in later steps.

  dimension: inventorypk {
    primary_key: yes
    hidden: yes
    type: string
    sql: concat(cast(${inventorydate_date} as string)
          , cast(${inventory_location_id} as string)
          , cast(${product_id} as string)) ;;
  }

  dimension_group: inventorydate {
    hidden: yes
    type: time
    datatype: date
    timeframes: [date]
    sql:  ${TABLE}.inventorydate ;;
    convert_tz: no
  }

  dimension: inventory_location_id {
    hidden: yes
    type: string
    sql: ${TABLE}.location_id ;;
  }

  dimension: product_id {
    hidden: yes
    label: "UPC"
    type: string
    sql: ${TABLE}.product_id ;;
  }

  dimension: dimensionats {
    hidden: yes
    type: number
    sql: ${TABLE}.ats ;;
  }

Step 3: Construct a new view for the inventory reporting objects

Construct a stand-alone non-persistent derived table for inventory reporting. If you’re unfamiliar with how Looker uses persistent derived tables versus derived tables, please read this page carefully. The derived table is defined using a Common Table Expression (sometimes referred to as a CTE) and allows the reporting derived table to reference the inventoryhistorydata PDT.

Include liquid markup in the derived table definition to make the results of the query reactive to the date parameter. If you’re unfamiliar with how Looker interprets Liquid Variables, please read the Looker documentation here. The liquid allows the end user to select date at run time. The Looker architecture will then look back in time and select the most recently uploaded date/location/UPC record. This allows maximum flexibility in the reporting as well as a simple interface for your end users.

view: inventoryhistoryreport {
  derived_table: {
    sql:
    WITH x AS
      (
      SELECT
        product_id
        , location_id
        , MAX(inventorydate) as latest_inventory_date
      FROM ${dtcinventoryhistory.SQL_TABLE_NAME}
      WHERE inventorydate <= CAST({% parameter inventory_date %} AS DATE)
      AND {% condition product_id %} product_id {% endcondition %}
      AND {% condition location_id %} location_id {% endcondition %}
      GROUP BY 1,2
      )

   SELECT
     x.*
     , y.ats
   FROM x
   LEFT JOIN ${dtcinventoryhistory.SQL_TABLE_NAME} AS y
   ON x.product_id = y.product_id
   AND x.location_id = y.location_id
   AND x.latest_inventory_date = y.inventorydate ;;
  }

parameter: inventory_date {
  type: date
}

dimension: product_id {
  type: string
  sql: ${TABLE}.product_id ;;
}

dimension: location_id {
  type: string
  sql: ${TABLE}.location_id ;;
}

dimension: ats {
  hidden: yes
  type: number
  sql: ${TABLE}.ats ;;
}

dimension: latest_inventory_date {
  hidden: yes
  can_filter: no
  type: date
  sql: ${TABLE}.latest_inventory_date ;;
}

measure:  eop_ats_total {
  type: sum
  sql: ${ats} ;;
}

Step 4: Define the inventory explore

Declare the explore in the model file. Note the explore will begin in the inventoryhistoryreport view. This starting point is important for join behavior. This is an opportune moment to join out to any other master data tables that were previously defined. A simple example is included below.

explore: inventoryhistoryreport {
  label: "Inventory"
  view_label: "Inventory History"
  always_filter: {
    filters: {
      field: inventory_date
      value: ""
    }
  }

  join: itemmaster {
    view_label: "Product Dimensions"
    relationship: many_to_one
    type: left_outer
    sql_on: ${inventoryhistoryreport.product_id} = ${product_master.upc} ;;
  }

The always_filter clause ensures the date parameter will appear in every inventory report

Step 4: IT Testing, UAT, Deployment to Production, etc.

Step 5: Change Management & End-User Training

Good luck!

3 Likes

Super cool! I’ve already shared this with a few people, so it’s definitely getting use