DATEDIFF in PostgreSQL

datediff
lookml
postgresql

(Max Reid) #1

Update: New in Looker 6.0, you can use dimension_group with type: duration to do this automatically.

DATEDIFF in PostgreSQL

Calculating the difference between timestamps in PostgreSQL can be a challenge, due to its lack of a native DATEDIFF function. This article will provide some pre-written LookML to address these difficulties. While some transformations are straightforward such as day difference, we’ll need to rely on some handy transformations to deal with minutes, seconds, and hours by converting to EPOCH time, and week and month have to rely on first using the AGE function to get the interval, then extracting the subfields from there.

The Situation

While you can use DATE_PART to extract certain subfields from a timestamp, like hour or month, calculating the difference between timestamps is challenging without a DATEDIFF function - especially when the two timestamps span multiple date parts (for example, trying to get the number of hours between two timestamps that span multiple days, or trying to get the number of months between two timestamps that span multiple years.)

Using PostgreSQL’s built in functions, we might try to calculate the difference between 2018-01-01 and 2017-11-01 like this:

SELECT DATE_PART('month', TIMESTAMP '2018-01-01 00:00:00') - DATE_PART('month', TIMESTAMP '2017-11-01 00:00:00')

While we expected the result to be 2 months, this returns 01 - 11 = -10, because using DATE_PART makes us lose valuable information - in this case, that the timestamps span multiple years.

We might next try the AGE function, which gives an interval - however, this is limited if the timestamps span multiple date parts - for example, if I got the AGE between two timestamps:

SELECT AGE(TIMESTAMP '2018-01-01 00:00:00', TIMESTAMP '2017-11-01 00:00:00')

Our result will be
0 years 2 mons 0 days 0 hours 0 mins 0.00 secs

Trying then to get the hours between the two timestamps, we would not be able to use PostgreSQL’s EXTRACT function:
SELECT EXTRACT(HOUR from AGE(TIMESTAMP '2018-01-01 00:00:00', TIMESTAMP '2017-11-01 00:00:00')) gives us 0 hours between those two timestamps, when the actual answer is 1464.

The Solution

To remedy this, we can use the following pattern in LookML that, based on the subfield we want (second, minute, hour, day, week, month, year), will write the correct SQL in PostgreSQL to calculate that difference. In this situation, we are calculating the difference between timestamp_one and timestamp_two, with timestamp_two occurring after timestamp_one.

To implement this yourself, all you have to do is:

  1. Change the sql:in timestamp_one and timestamp_two to match whichever timestamps you want to use
  2. Use the Find and Replace functionality (see the " To Search and Replace Across a Single File" section) to Replace Alltimestamp_one_and_timestamp_two to whatever you want to label it as (for example, shipped_and_created)
  3. Use Find and Replace All to change Timestamp One and Timestamp Two to however you want it to appear in the group_label (for example, Shipped and Created).
   dimension_group: timestamp_one {
    type: time
    hidden: yes
    timeframes: [raw, date, week, month, year]
#### Change the sql: below to match whatever timestamp you want - this is only used for the calculation
    sql: ${TABLE}.timestamp_one ;;
  }
  
  
  dimension_group: timestamp_two {
    type: time
    hidden: yes
    timeframes: [raw, date, week, month, year]
#### Change the sql: below to match whatever timestamp you want - this is only used for the calculation
    sql: ${TABLE}.timestamp_two ;;
  }

  dimension: age_between_timestamp_one_and_timestamp_two {
#This is a dimension that gives an output of ‘A years B mons C days D years E mons F days G hours H mins X.XX secs –
#it is used in some of the below calculations but not displayed in the Explore UI
    hidden: yes
    type: string
    sql: AGE(${timestamp_two_raw},${timestamp_one_raw}) ;;
  }

  dimension: seconds_between_timestamp_one_and_timestamp_two {
### NOTE: used FLOOR to round it down to nearest whole number
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: number
    sql: FLOOR(EXTRACT(EPOCH FROM ${timestamp_two_raw}) - EXTRACT(EPOCH FROM ${timestamp_one_raw}))  ;;
  } 

  dimension: minutes_between_timestamp_one_and_timestamp_two {
### NOTE: used FLOOR to round it down to nearest whole number
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: number
    sql: FLOOR((EXTRACT(EPOCH FROM ${timestamp_two_raw}) - EXTRACT(EPOCH FROM ${timestamp_one_raw}))/60)  ;;
}

  dimension: hours_between_timestamp_one_and_timestamp_two {
  ### NOTE: used FLOOR to round it down to nearest whole number
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: number
    sql: FLOOR((EXTRACT(EPOCH FROM ${timestamp_two_raw}) - EXTRACT(EPOCH FROM ${timestamp_one_raw}))/3600) ;;
  }

  dimension: days_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: number
  ### NOTE: we are just subtracting the _date fields, as opposed to _raw
    sql: ${timestamp_two_date} - ${timestamp_one_date} ;;
  }

  dimension: weeks_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: number
  ### NOTE: we are just subtracting the _date fields, as opposed to _raw
    sql: FLOOR((1.0*(${timestamp_two_date} - ${timestamp_one_date}))/7) ;;
  }

  dimension: months_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: number
    sql: EXTRACT(YEAR from ${age_between_timestamp_one_and_timestamp_two})*12 + EXTRACT(MONTH from ${age_between_timestamp_one_and_timestamp_two}) ;;
  }
  
  dimension: years_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: number
    sql: EXTRACT(YEAR from ${age_between_timestamp_one_and_timestamp_two}) ;;
  }

Once we have the below dimensions built, we can build additional measures on top of them for averages, sums, etc.


  measure: average_seconds_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: average
    sql: ${seconds_between_timestamp_one_and_timestamp_two}  ;;
  }

  measure: average_minutes_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: average
    sql: ${minutes_between_timestamp_one_and_timestamp_two}  ;;
  }

  measure: average_hours_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: average
    sql: ${hours_between_timestamp_one_and_timestamp_two}  ;;
  }

  measure: average_days_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: average
    sql: ${days_between_timestamp_one_and_timestamp_two}  ;;
  }

  measure: average_weeks_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: average
    sql: ${weeks_between_timestamp_one_and_timestamp_two}  ;;
  }

  measure: average_months_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: average
    sql: ${months_between_timestamp_one_and_timestamp_two}  ;;
  }

  measure: average_years_between_timestamp_one_and_timestamp_two {
    group_label: "Time Between Timestamp One and Timestamp Two"
    type: average
    sql: ${years_between_timestamp_one_and_timestamp_two}  ;;
  }

Additionally, by adding a group label “Time Between Timestamp One and Timestamp Two”, it will all appear under one drop-down as in the screenshot below.