Flexible Retail Year over Retail Year

lookml

(Russell) #1

Flexible Retail Calendar over Retail Calendar Analysis

What Do You Get?

Many ecommerce platforms, when reviewing their sales history, want to be able to look back on their retail calendars to do comparisons. It’d additionally be nice to be able to do that in a way that plays nice with Looker and its date filters, so that it can compare any window of time to any matching window of time in the previous retail year.

Benefits

  • Flexible window that behaves just like typical looker date fields
    • Past 28 complete days
    • Past 3 weeks
    • A date range
    • etc.
  • Ability to handle restated years with a defaulted filter
    • Can change filter when necessary
    • End user typically doesn’t need to touch it
    • Can specify several if looking back very far
  • Select an arbitrary number of retail years in the past
    • This year and last year
    • This year, last year, five years ago, and ten years ago
    • Only limited by the size of your retail calendar table
  • Base explore can be joined to any other explore through a date field
    • Use extends to keep those default filters
    • Uses simple one-to-many join
    • Plugging in to use is actually fairly trivial
  • Uses dates from current periods to group data
    • Allows you to keep context of past years associated with that date
    • Allows you to specify time frames
    • Can also just use Day In Period

Drawbacks

  • Have to make an explore for each date you want to perform analysis on
    • Really only a few lines of code, but still can add up
  • Not sure how to optimize
    • PDTs may not be very applicable because this is designed to be run-time flexible
    • Maybe you’ll come up with something
  • The provided view is non-trivial to understand
    • But I already wrote it!
    • If you find a bug, please let me know so we can tweak it!
  • Alignment breaks or behaves unexpectedly when looking back over a period that should be restated
    • We don’t have a use case for doing these comparisons without restatement, so I didn’t develop it, but I suspect it can be done
    • At least it’s visually obvious when it happens!

Examples

To protect our data, all my examples use a table that I generated randomly, so actual trends will likely look neater than the below graphs.

In the above, the retail dates of the past years are lined up with the date of the current retail year’s. That is, with 2017 restated, July 23 of 2018 is on the same line of data as July 24 of 2017, which is on the same line as July 18 of 2016.

Expand to see these three retail years as separate looks that don't use the model to verify that the alignment is done correctly.

2018

2017

2016

How Do You Use It?

  • Get a SQL retail calendar table with the following columns:
    • Has the “actual” or Gregorian date
    • Has an ID (or you can alter the sql query further)
    • Has retail calendar year, month, and day
  • Take the com_pop_calendar.view.lookml provided below
    • Modify the sql so that it looks at your data table with your columns
    • Ours have rc in front of most of them
  • Take the common.model.lookml provided below
    • Include it in the model of the view you want to plug in
    • Follow the snippet instructions to attach it

I sincerely recommend using the com_pop_calendar view as the base of the explore, making it extend the com_pop_calendar explore so it gets the default values for the filters, and then joining the target explore into it by the date dimension you want to analyze.

How Did I Do It?

… Uhh. Oh man. Like. Okay. So.

Retail Score

I felt that, in order to make handling much of the logic simpler, it would make sense to translate retail dates to a “retail score”. This squishes each retail date into one unique value. Generally, it’s a number that looks like year.day.

The Unadjusted Retail Score

f(retail_year, retail_day) = retail_year + retail_day/1000

Examples:

f(2017, 5) = 2017.005
f(2017, 370) = 2017.370
f(2018, 5) = 2018.005

The Adjusted Retail Score

However, retail calendars notoriously encounter cases where they need to be restated. This means we need a small adjustment to exclude certain dates and shift certain dates up.

  • If in restated year
    • If in the first 7 days, the score is flipped to be negative (multiplied by -1).
    • Otherwise, their score needs to move up 7 days (subtract 0.007).
  • Otherwise, score is the same as the unadjusted score.

Examples:

f(2017, 5, true) = -2017.005
f(2017, 370, true) = 2017.363
f(2018, 5, false) = 2018.005

I’ll explain the “Current Adjusted Retail Score” later, but it basically maps the adjusted retail score to the score it would’ve had in the current retail year (so it can be grouped with the current retail year’s adjusted retail score).

Adjusted Retail Score as a SQL Function

I admit that this is ugly, and I have a deep disdain for redshift’s syntax.

$1: The first numeric is the retail year.
$2: The second numeric is the retail day.
$3: The boolean is true if the retail year is a restated year (false if it’s not a restated year).

CREATE OR REPLACE FUNCTION common.f_retail_date_score(numeric, numeric, boolean)
RETURNS numeric AS
$BODY$
	select
		case when $3
			then case when $2 < 8
then -($1 + $2/1000.0)
				else ($1 + $2/1000.0) - 0.007
			end
			else ($1 + $2/1000.0)
		end
$BODY$
LANGUAGE sql IMMUTABLE;

Derived Table

You need a datatable that contains all of your retail dates:

  • actualdate: the Gregorian, “regular” non-retail date
  • rcyear: the retail calendar year
  • rcweek: the retail calendar week
  • rcday: the retail calendar day
  • rcid: an id for the retail calendar for the purpose of joins (can be actualdate, but isn’t for us)

You need a few conditions:

  • restated_years: Years that need to be restated
  • date_range: The date range that will be compared against its previous years
  • selected_periods_past: The previous retail calendars to include
    • 0 would be the year specified in the date_range
    • 1 would be the year before the one in the date_range
    • 2 would be 2 years before, and so on.

SQL:

  with
    pop_start as (
      select
        actualdate::date  as actualdate,
        rcyear            as year,
        rcday             as day,
        common.f_retail_date_score(rcyear, rcday, {% condition restated_years %} rcyear {% endcondition %}) as score,
        {% condition restated_years %} rcyear {% endcondition %} as is_in_restated_year
      from common.retailcalendar
      where actualdate::date = {% date_start date_range %}
    ),

    pop_end as (
      select
        actualdate::date  as actualdate,
        rcyear            as year,
        rcday             as day,
        common.f_retail_date_score(rcyear, rcday, {% condition restated_years %} rcyear {% endcondition %}) as score
      from common.retailcalendar
      where actualdate::date = {% date_end date_range %}
    ),

    rc_score as (
      select
        rcid,
        common.f_retail_date_score(rcyear, rcday, {% condition restated_years %} rcyear {% endcondition %}) as score,
        {% condition restated_years %} rcyear {% endcondition %} as is_in_restated_year
      from common.retailcalendar
    ),

  pop_calendar as (
    select
      pop_start.actualdate          as pop_start,
      pop_start.score               as start_score,
      pop_start.is_in_restated_year as start_is_in_restated_year,
      pop_end.actualdate            as pop_end,
      pop_end.score                 as end_score,
      rc.actualdate,
      rc_score.score,
      rc_score.is_in_restated_year,
      floor(pop_end.score - rc_score.score) as pop_periods_past,
      rc.rcyear,
      case when {% condition restated_years %} rc.rcyear {% endcondition %} then rc.rcweek - 1 else rc.rcweek end as rcweek,
      rc.rcday
    from common.retailcalendar as rc
    join pop_start on true
    join pop_end on true
    join rc_score on rc_score.rcid = rc.rcid
    where
      {% condition selected_periods_past %} floor(pop_end.score - rc_score.score) {% endcondition %}
      and
        common.f_retail_date_score(rc.rcyear + floor(pop_end.score - rc_score.score), rc.rcday, rc_score.is_in_restated_year)
        between pop_start.score and pop_end.score - 0.001 -- want exclusive in past periods
  ),

  leap_week as (
    select max(rcweek) > 52 as has_leap_week
    from pop_calendar
  )

  select *
  from pop_calendar
  join leap_week on 1=1
;;

Derived Table Comments

  • pop_start grabs necessary fields from the first date in the range
  • pop_end grabs necessary fields from the last date in the range
  • rc_score calculates the adjusted retail score for each retail calendar day in the table
  • pop_calendar actually forms the bulk of the derived table
  • leap_week is used to handle when a year has a leap week and isn’t restated

That Where Clause

I admit that’s confusing.

Basically, it maps the adjusted retail score to the score it would’ve had in the current retail year (so it can be grouped with the current retail year’s adjusted retail score).

First, we want to make sure that dates returned are in the selected_periods_past that the user selected. You can calculate how many periods past any arbitrary date is by rounding down the difference between the last date’s adjusted retail score and that date’s adjusted retail score. Or…

pop_periods_past = floor(pop_end.score - rc_score.score)

Secondly, we need a “Current” Adjusted Retail Score. Now that we have every date’s adjusted retail score, we need to map those onto what values they would have in the specified date range. For example, if we want to look at week 3 of this year compared to the previous year, we have to map the previous year’s adjusted retail scores to what they would be in the current year’s adjusted retail scores. This is done by adding the pop_periods_past from above to the retail calendar year when calculating the Adjusted Retail Score.

Putting those two together, we get that the Current Adjusted Retail Score is the same as adding pop_periods_past to retail year into the function that calculates the Adjusted Retail Score. That is…

f(year + pop_periods_past, day, is_in_restated_year)

Putting it all together, we have that where clause. It makes sure the date is in the selected past periods and that the Current Adjusted Retail Score is in the date range the user specified.

The Whole View’s LookML

For this, you mostly need to copy and paste it into one of your own files. The only modifications you need to make are to the column names and table names in the derived table. Double check the convert_tz: no to see if they’re appropriate for you.

com_pop_calendar.view.lookml

view: com_pop_calendar {
  derived_table: {
    sql:
      with
        pop_start as (
          select
            actualdate::date  as actualdate,
            rcyear            as year,
            rcday             as day,
            common.f_retail_date_score(rcyear, rcday, {% condition restated_years %} rcyear {% endcondition %}) as score,
            {% condition restated_years %} rcyear {% endcondition %} as is_in_restated_year
          from common.retailcalendar
          where actualdate::date = {% date_start date_range %}
        ),

        pop_end as (
          select
            actualdate::date  as actualdate,
            rcyear            as year,
            rcday             as day,
            common.f_retail_date_score(rcyear, rcday, {% condition restated_years %} rcyear {% endcondition %}) as score
          from common.retailcalendar
          where actualdate::date = {% date_end date_range %}
        ),

        rc_score as (
          select
            rcid,
            common.f_retail_date_score(rcyear, rcday, {% condition restated_years %} rcyear {% endcondition %}) as score,
            {% condition restated_years %} rcyear {% endcondition %} as is_in_restated_year
          from common.retailcalendar
        ),

      pop_calendar as (
        select
          pop_start.actualdate          as pop_start,
          pop_start.score               as start_score,
          pop_start.is_in_restated_year as start_is_in_restated_year,
          pop_end.actualdate            as pop_end,
          pop_end.score                 as end_score,
          rc.actualdate,
          rc_score.score,
          rc_score.is_in_restated_year,
          floor(pop_end.score - rc_score.score) as pop_periods_past,
          rc.rcyear,
          case when {% condition restated_years %} rc.rcyear {% endcondition %} then rc.rcweek - 1 else rc.rcweek end as rcweek,
          rc.rcday
        from common.retailcalendar as rc
        join pop_start on true
        join pop_end on true
        join rc_score on rc_score.rcid = rc.rcid
        where
          {% condition selected_periods_past %} floor(pop_end.score - rc_score.score) {% endcondition %}
          and
            common.f_retail_date_score(rc.rcyear + floor(pop_end.score - rc_score.score), rc.rcday, rc_score.is_in_restated_year)
            between pop_start.score and pop_end.score - 0.001 -- want exclusive in past periods
      ),

      leap_week as (
        select max(rcweek) > 52 as has_leap_week
        from pop_calendar
      )

      select *
      from pop_calendar
      join leap_week on 1=1
    ;;
  }

  # Retail over Retail Filters
  filter: date_range {
    description: "The range of included dates for the first period you want to examine"
    hidden: yes
    type: date
    convert_tz: no
  }

  filter: selected_periods_past {
    description: "0 to look at this year. 1 to look at the previous retail year. Any whole number for that many retail years prior to the specified Date Range."
    hidden: yes
    label: "Selected Retail Years Ago"
    type: number
    default_value: "0, 1"
  }

  filter: restated_years {
    description: "Enter years that should be restated here. If you do not want to restate any years, enter 0."
    hidden: yes
    type: number
    default_value: "2017"
  }

  # Dimensions for users
  dimension: periods_ago {
    description: "Pivot this. The number of retail years ago this date was from the selected date range."
    type: string
    sql:
      case
        when ${_periods_ago} = 0 then 'Current Retail Year'
        when ${_periods_ago} = 1 then 'Last Retail Year'
        else ${_periods_ago} || ' Retail Years Ago'
      end ;;
    order_by_field: _periods_ago
  }

  dimension: day_in_period {
    type: number
    sql:
      (case when ${_day_in_period_score} > .629
        then
          case when ${_has_leap_week}
            then ${_day_in_period_score} - .629
            else ${_day_in_period_score} - .636
          end
        else ${_day_in_period_score}
      end) * 1000 + 1 ;;
  }

  dimension_group: current_period {
    type: time
    timeframes: [raw, date, week, month]
    sql: dateadd(day, ${day_in_period}::int, ${TABLE}.pop_start) ;;
  }

  # Dimensions to support the ones exposed to the user or help with debugging.
  dimension: _periods_ago {
    group_label: "_Debug"
    hidden: yes
    sql: ${TABLE}.pop_periods_past ;;
  }

  dimension: date {
    primary_key: yes
    group_label: "_Debug"
    hidden: yes
    type: date
    sql: ${TABLE}.actualdate ;;
    convert_tz: no
  }

  dimension: days_in_date_range {
    group_label: "_Debug"
    hidden: yes
    type: number
    sql: datediff(day, ${TABLE}.pop_start, ${TABLE}.pop_end) ;;
  }

  dimension: _day_in_period_score {
    group_label: "_Debug"
    hidden: yes
    type: number
    value_format_name: decimal_3
    sql: ${_adjusted_day_score} - ${_pop_start_score} ;;
  }

  dimension: _adjusted_day_score {
    group_label: "_Debug"
    hidden: yes
    type: number
    value_format_name: decimal_3
    sql:  common.f_retail_date_score(${TABLE}.rcyear + ${_periods_ago}, ${TABLE}.rcday, ${TABLE}.is_in_restated_year) ;;
  }

  dimension: _day_score {
    group_label: "_Debug"
    hidden: yes
    type: number
    value_format_name: decimal_3
    sql: ${TABLE}.score ;;
  }

  dimension: _pop_start_score {
    group_label: "_Debug"
    hidden: yes
    type: number
    value_format_name: decimal_3
    sql: ${TABLE}.start_score ;;
  }

  dimension: _pop_end_score {
    group_label: "_Debug"
    hidden: yes
    type: number
    value_format_name: decimal_3
    sql: ${TABLE}.end_score ;;
  }

  dimension: _is_in_restated_year {
    group_label: "_Debug"
    hidden: yes
    type: yesno
    sql: ${TABLE}.is_in_restated_year ;;
  }

  dimension: _has_leap_week {
    group_label: "_Debug"
    hidden: yes
    type: yesno
    sql: ${TABLE}.has_leap_week ;;
  }

  measure: _future_test {
    group_label: "_Debug"
    description: "When day_in_period or current_period is used as a dimension, this tells you what retail calendar day this is. It helps you troubleshoot and double check issues for future dates where you might not have data."
    hidden: yes
    type: sum
    sql: ${TABLE}.rcday ;;
  }

  set: debug {
    fields: [_day_score, _pop_end_score, _is_in_restated_year, days_in_date_range, _future_test]
  }

  set: filters {
    fields: [selected_periods_past, restated_years, date_range]
  }

  set: dimension_helpers {
    fields: [_periods_ago, _day_in_period_score, _adjusted_day_score, _pop_start_score, _has_leap_week]
  }

  set: joins {
    fields: [date]
  }

  set: dimensions {
    fields: [periods_ago, day_in_period, current_period_date, current_period_month, current_period_raw, current_period_week]
  }
}

View Comments

I put some sets in there to help give context. The debug set are fields that can be deleted, but are helpful for debugging issues as a developer. The dimension_helpers set are fields that are hidden, but are used in non-hidden dimensions. The dimensions set are exposed to the end user. The filters set are public filters that are ideally set with default values at the explore level. The joins set is simply the date, which is used to join to others.

The Whole Model’s LookML

common.model.lookml

include: "com_pop_calendar.view"

explore: com_pop_calendar {
  extension: required
  always_filter: {
    filters: {
      field: com_pop_calendar.selected_periods_past
      value: "0, 1"
    }
    filters: {
      field: com_pop_calendar.restated_years
      value: "2017"
    }
    filters: {
      field: com_pop_calendar.date_range
      value: "28 days ago for 28 days"
    }
  }

  # snippet for being base explore
  # explore: pop_sessions {
  #   from: com_pop_calendar
  #   view_name: com_pop_calendar
  #   extends: [com_pop_calendar]
  #   ...
  # }

  # snippet to join into explore
  # join: com_pop_calendar {
  #   view_label: "Period over Period YourJoinDimensionDisplayName"
  #   type: inner
  #   sql_on: ${YourExplore.YourJoinDimension_date} = ${com_pop_calendar.date} ;;
  #   relationship: YourRelationship
  #   fields: [com_pop_calendar.filters*, com_pop_calendar.dimensions*, com_pop_calendar.dimension_helpers*, com_pop_calendar.joins*]
  # }
}

Model Comments

This is used to be able to define and pass the same explore-level defaults across any explores that extend it. Pretty useful to avoid repeating that code everywhere, especially when it needs to change.

Example Explore LookML

It really is as simple as the snippets say to join with another explore.

orders

explore: pop_orders {
  from: com_pop_calendar
  view_name: com_pop_calendar
  extends: [com_pop_calendar]

  join: orders {
    type: inner
    sql_on: ${com_pop_calendar.date} = ${orders.order_date} ;;
    relationship: one_to_many
  }
}

Explore Comments

While you can also use the other snippet to join in the com_pop_calendar view, I couldn’t figure out how to get it to inherit the default, explore-level filters that are gained by using the com_pop_calendar view as the base and joining in the other explores instead.

Conclusion

I hope this helps. I did what I could in spare time to explain what I could so that the view isn’t entirely a black box.

PS: Also, I’m vaguely suspicious that the adjusted retail scores could be solved with manipulation of year and day separately instead of together, but they made sense to me, so I kept using them.