How to zero fill dates in Looker


#1

A daily_usage table can be useful for summarizing usage per day. But what if there is a day without any usage information? Having missing dates when there was no usage could make analysis misleading or confusing. Thus in this case, we will want to have one row for each date, regardless of whether there is usage information on that date.

This can be done in a derived table using a generated date series as the base table. Lets create a simple daily_usage derived table that counts total events per day:

Old LookML

- view: daily_active_users_new
  derived_table:
    sql: |
      SELECT general_date.date AS general_date
           , COUNT(*) AS event_count
       FROM
          events

New LookML

view: daily_active_users_new {
  derived_table: {
    sql: SELECT general_date.date AS general_date
           , COUNT(*) AS event_count
       FROM
          events
       ;;
  }
}

However, this table may be missing rows for dates without events. To ensure that all dates are included in this daily_usage table, we can put a generated date series in the FROM clause, and then join events to that date series:

Note: the generated date series will vary based on the dialect of your database.

PostgreSQL

- view: daily_active_users_new
  derived_table:
    sql: |
      SELECT general_date.date AS date
           , COUNT(*) AS event_count
       FROM (SELECT GENERATE_SERIES('2012-10-12', CURRENT_DATE, '1 day')::DATE AS date) AS general_date
       LEFT JOIN events
       ON events.created_date = general_date.date

fields:

  - dimension_group: event
    type: time
    timeframes: [time, date, month, year]
    sql: ${TABLE}.date

  - dimension: event_count
    type: number
    sql: ${TABLE}.event_count

MySQL and Redshift

- view: daily_active_users_new
  derived_table:
    sql: |
       SELECT general_date.date AS date
           , COUNT(*) AS event_count
       FROM (
            SELECT
                DATE_ADD('2001-01-01', INTERVAL numbers.number DAY) AS date
                --DATEADD(day,number::int,'2001-01-01') AS date  ##redshift syntax
            FROM 
                (SELECT 
                    p0.n 
                    + p1.n*2 
                    + p2.n * POWER(2,2) 
                    + p3.n * POWER(2,3)
                    + p4.n * POWER(2,4)
                    + p5.n * POWER(2,5)
                    + p6.n * POWER(2,6)
                    + p7.n * POWER(2,7) 
                    as number
                  FROM 
                    (SELECT 0 as n UNION SELECT 1) p0,
                    (SELECT 0 as n UNION SELECT 1) p1,
                    (SELECT 0 as n UNION SELECT 1) p2,
                    (SELECT 0 as n UNION SELECT 1) p3,
                    (SELECT 0 as n UNION SELECT 1) p4,
                    (SELECT 0 as n UNION SELECT 1) p5,
                    (SELECT 0 as n UNION SELECT 1) p6,
                    (SELECT 0 as n UNION SELECT 1) p7) as numbers) as general_date
                   LEFT JOIN events
                   ON events.created_date = general_date.date

fields:

  - dimension_group: event
    type: time
    timeframes: [time, date, month, year]
    sql: ${TABLE}.date

  - dimension: event_count
    type: number
    sql: ${TABLE}.event_count

Alternative MySQL Approach

- view: daily_active_users_new
  derived_table:
    sql: |
      SELECT general_date.date AS date
           , COUNT(*) AS event_count
       FROM (
           SELECT date 
           FROM (
              SELECT curdate() - interval (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) + (10000 * e.a)) day as date
              FROM (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
              CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
              CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
              CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
              CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as e
               ) dates
           WHERE date >= '2012-01-01'
           ORDER BY date) as general_date
       LEFT JOIN events
       ON events.created_date = general_date.date

fields:

  - dimension_group: event
    type: time
    timeframes: [time, date, month, year]
    sql: ${TABLE}.date

  - dimension: event_count
    type: number
    sql: ${TABLE}.event_count

Looker 3.44 Release Notes
(Rex Gibson) #2

There are many data dimension generators out there. Look for the the Kimball Group Generator as one example. These date and time dimension generators allow for adding columns for more complex time groupings than what looker provides out of the box.


(Rex Gibson) #3

I just remembered about this generator which is very handy. Remember that one of the things about a date dimension is that it give you the power to do exotic things with grouping. Things like adding a column for third of a day or same day of week last quarter without complicated sql. I highly recommend you read up on the Data Warehouse Toolkit’s Date Dimension sections in Kimbal’s Book. They are the standard.


(Rex Gibson) #4

Just wanted to drop a quick reminder that now Looker can fill in gaps in dates without a date dimension. Thanks @lindsey!


(vincent) #5

Hi @Rex,

Correct, with Looker 4.0+, we have introduced Dimension Fill that will do this for you. More on that here. With Looker 4.2+, Dimension Fill now be a parameter allow_fill for relevant dimensions. Learn more about that here.

Cheers,
Vince


(jeffrey.martinez) #6

If you’re doing this using BigQuery, this article from SO may be helpful to create a range of dates.

Skipping straight to the dessert, to get a list of date values between two dates (2015-06-1 until the current date in this example):


SELECT day FROM 
UNNEST(GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)) AS day