Rolled-up averages with dynamic timeframes

(Josh Temple) #1

I often need to take transactional/event data, roll it up to some less granular timeframe (say, a daily level), and compute an average of the results for some set of dimensions and filters.

For example, take a table of orders across retail stores. I might want to know the average sales for Wednesdays over the past 8 weeks. To do that, I need to sum sales by day, filter to Wednesdays, and average.
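In plain SQL that two-step aggregation might look like this (a sketch only: the `orders` table and its columns are made up, and the day-of-week expression is dialect-specific):

```sql
with daily_sales as (
  select
      date(created_at)  as order_date
    , sum(sale_price)   as daily_total
  from orders
  group by 1
)
select avg(daily_total) as avg_wednesday_sales
from daily_sales
where extract(dow from order_date) = 3  -- Wednesday (Postgres numbering)
  and order_date >= current_date - interval '8 weeks'
```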

Since my source data is at the order level, I can’t just slap an average measure on it (that would compute the average sales of each order, not the average of each day). In the past, I’ve used a derived table to sum to the daily level, then created an average measure in the derived table’s view. Then I’ve joined the derived table back to the original view using the date dimension.
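As a sketch, that pattern looks something like this in LookML (all table, view, and field names here are hypothetical):

```lookml
view: daily_sales {
  derived_table: {
    sql:
      select
          date(created_at) as order_date
        , sum(sale_price)  as daily_total
      from orders
      group by 1 ;;
  }

  dimension: order_date {
    type: date
    sql: ${TABLE}.order_date ;;
  }

  measure: average_daily_sales {
    type: average
    sql: ${TABLE}.daily_total ;;
  }
}

# In the model, join the derived table back on the date:
explore: orders {
  join: daily_sales {
    sql_on: date(${orders.created_at}) = ${daily_sales.order_date} ;;
    relationship: many_to_one
  }
}
```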

The problem with this approach is that I can’t anticipate other ways that a user might want to slice this derived table average measure. What if you wanted to see average daily sales for a specific product on Wednesdays? In order for that to work, the product name field needs to be a dimension in the derived table and needs to exist in the join from the derived table to the original view.

Is there a best practice for creating measures that can be dynamically aggregated to a specified granularity? Curious to know if I can do something clever with Liquid or parameters to accomplish this.


(Carter Moar) #2

I ran into a similar situation the other day @joshtemple. I’m not in love with my solution and would be hesitant to say “hey do this, definitely best practice!”, but it’s been working ok for me for the past week or two.

I basically took my daily roll up and used parameters to allow the user to pick the thing they want to slice by. I came up with a fairly exhaustive list (in this case only 4 or 5 things made sense) and baked it into the derived table. To keep things “simple” and reduce interpretation costs, I limited the users’ slice options to a single choice, created my content, and then hid all of the fields so that people don’t confuse themselves by aggregating over invalid groupings (after confusing myself several times).
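Concretely, the user-facing switch can be a LookML `parameter` with a fixed list of `allowed_value`s (labels and values here are placeholders):

```lookml
parameter: arbitrary_grouping {
  type: unquoted
  allowed_value: { label: "Thing 0" value: "thing0" }
  allowed_value: { label: "Thing 1" value: "thing1" }
  allowed_value: { label: "Thing 2" value: "thing2" }
  default_value: "thing0"
}
```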

I’ve oversimplified the example to try and make the concepts easy to read, but the derived table’s SQL ended up looking something like this:

with count_finder as (
    select
        the_date_field
      , {% if arbitrary_grouping._parameter_value == 'thing0' %}
          thing_people_might_use as grouper
        {% elsif arbitrary_grouping._parameter_value == 'thing1' %}
          other_thing_people_might_use as grouper
        {% elsif arbitrary_grouping._parameter_value == 'thing2' %}
          other_other_thing_people_might_use as grouper
        {% else %}
          whatever_makes_sense_as_the_default as grouper
        {% endif %}
      , count(*) as grouping_count
    from the_base_thing
    left join some_more_stuff
    -- a few of the grouping options required additional joins
    {% if arbitrary_grouping._parameter_value == 'thing1' %}
      left join some_stuff
    {% endif %}
    where [whatever you need to filter on]
    group by 1, 2
)
select * from count_finder

I then made sure the joins in the model all mapped up, using the parameters as necessary, and did some label-name substitution to make things legible to end users in my blessed content.
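One way to do that label substitution (using the `arbitrary_grouping` parameter from the SQL above; the dimension name is hypothetical) is LookML’s `label_from_parameter`, which relabels the field to match whichever allowed value the user picked:

```lookml
dimension: grouper {
  label_from_parameter: arbitrary_grouping
  sql: ${TABLE}.grouper ;;
}
```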

It’s more brittle than I’d like, but has been working ok for me so far.
