How can I create filterable charts that show all relevant dimension values even when there are no associated facts?

visualizations
lookml
open
low_priority
filter
(Ian Terrell) #1

I’m using a Kimball-lite data schema and have run into an issue with building charts according to how our business users would like to see them.

The issue has been dubbed “the zero problem” internally, and it has to do with reporting on things that didn’t happen. Although this pattern repeats itself across our system, the simplest use case to illustrate the point is our question and answer modeling.

In our app, users can answer multiple choice questions. I’ve modeled it generally like this:

erd

For a given question, we want to show a chart of its answers and how many responses each one got. Crucially, we want to show all answer options even if they got no responses!

For data, let’s assume the only user metadata field is something called is_foodie, and we have two multiple choice questions:

  1. What is your favorite food?
    a. Salad
    b. Pizza
    c. Tacos
    d. Coq au vin
  2. What is your favorite color?
    a. Blue
    b. Purple
    c. Red

We also have a smattering of responses to these questions.

Data Tables
surrogate_key name is_foodie
1 Ian false
2 John true
3 Sally true
4 Peter false
5 Jeff false
surrogate_key question_key question_body choice_body
1 1 What is your favorite food? Salad
2 1 What is your favorite food? Pizza
3 1 What is your favorite food? Tacos
4 1 What is your favorite food? Coq au vin
5 2 What is your favorite color? Blue
6 2 What is your favorite color? Purple
7 2 What is your favorite color? Red
surrogate_key dim_user_key dim_question_key
1 1 2
2 2 4
3 3 4
4 4 3
5 5 3
6 1 5
7 2 6
8 4 7

SQL Fiddle Schema

Setting this up in the straightforward way in Looker will yield queries roughly like this in order to get a count of responses per question, grouped by choice:

  SELECT 
      dim_questions.choice_body
    , COUNT(fact_responses.surrogate_key)
 FROM fact_responses
 JOIN dim_questions 
   ON fact_responses.dim_question_key = dim_questions.surrogate_key
WHERE dim_questions.question_key = 1
GROUP BY 1
ORDER BY 1

With our data, that yields:

choice_body count
Coq au vin 2
Pizza 1
Tacos 2

Unfortunately, no one responded that they liked salad so that’s not reported in the chart! This is a nonstarter for our use case. Writing SQL manually I would solve this with a right join.

  SELECT 
      dim_questions.choice_body
    , COUNT(fact_responses.surrogate_key)
 FROM fact_responses
RIGHT JOIN dim_questions 
   ON fact_responses.dim_question_key = dim_questions.surrogate_key
WHERE dim_questions.question_key = 1
GROUP BY 1
ORDER BY 1

And that yields what we want:

choice_body count
Coq au vin 2
Pizza 1
Salad 0
Tacos 2

All good so far, and I can do the above in Looker by modeling my explore with right join.

But now let’s try to slice and dice the data. In this case, let’s look only at what the foodies answered. Simply adding the filter to the chart in the straightforward way yields SQL like this:

  SELECT 
      dim_questions.choice_body
    , COUNT(fact_responses.surrogate_key)
 FROM fact_responses
 JOIN dim_users
   ON fact_responses.dim_user_key = dim_users.surrogate_key
RIGHT JOIN dim_questions 
   ON fact_responses.dim_question_key = dim_questions.surrogate_key
WHERE dim_questions.question_key = 1
  AND dim_users.is_foodie
GROUP BY 1
ORDER BY 1
choice_body count
Coq au vin 2

Uh oh, the zeros are missing again! That’s because the WHERE filtration occurs after the join, and the values are either null or false. I think the only way to solve this is to use the join’s ON clause for filtration rather than the WHERE clause (and then to make sure that the right joined dimension is joined last).

  SELECT 
      dim_questions.choice_body
    , COUNT(fact_responses.surrogate_key)
 FROM fact_responses
 JOIN dim_users
   ON fact_responses.dim_user_key = dim_users.surrogate_key
  AND dim_users.is_foodie
RIGHT JOIN dim_questions 
   ON fact_responses.dim_question_key = dim_questions.surrogate_key
WHERE dim_questions.question_key = 1
GROUP BY 1
ORDER BY 1
choice_body count
Coq au vin 2
Pizza 0
Salad 0
Tacos 0

So far I’ve only figured out how to do that in Looker by using parameters, with LookML similar to this:

join: dim_users {
  type: inner
  sql_on:
    ${fact_responses.dim_user_key} = ${dim_users.surrogate_key}
    {% if dim_users.is_foodie_parameter._in_query %}
    and dim_users.is_foodie = {% parameter dim_users.is_foodie_parameter %}
    {% endif %}
    ;;
  relationship: many_to_one
}

This works in simple cases, but has a few downsides:

  1. A parameter must be set up that duplicates the dimension for all possible filters
  2. The join query must be amended for each possible filter
  3. Looker’s filter support is much more robust that this supports

The latter point is the one I’m most hung up on. I’d like to expose looks built this way on dashboards with other looks built in the straightforward way, and allow filters to work on all of them simultaneously. I can do this by mapping the dashboard’s filter to the chart’s parameter, but this only works in simple cases, like a single string value.

My main question is:

Is it possible to support this use case and take advantage of Looker’s filtration options?

That includes being able to filter on dates by range (e.g. “last 7 days” on another dimension), including multiple values (e.g. user name in ('Bob', 'Sally')), etc.

Has anyone else encountered this? What are other solutions? What am I missing?

Thanks!

0 Likes

The Podium — May 22nd, 2019
(Pan Sun) #2
{% if fact_responses.fact_date._in_query %}
              AND  {% condition fact_responses.fact_date%}  
                       ${dim_user.user_date}
                   {% endcondition %}
{% endif %}
0 Likes

(Ben Silverstein) #3

Hey Ian, welcome! If I’m understanding correctly, why not just do a full join between fact_responses and dim_questions on your explore to replicate the effects of a right_join? Since your fact_responses are limited to the questions from dim_questions, the practical effect of a full_join and a right_join should be the same, except for solving your “show me when there’s no data” problem with a full_join.

Base = fact_responses, Full Join to dim_questions with SQL_ON fact_responses.dim_question_key = dim_questions.question_key, Full Join to dim_users with SQL_ON fact_responses.dim_user_key = dim_users.surrogate_key. Thinking through each of your scenarios, I believe the results would be what you’re expecting.

0 Likes

(Ian Terrell) #4

Pan_Sun — Thanks! I was unaware of the condition tag. I think this would get me further along with using filters in custom join logic.

bens — I’m not sure a full join solves the issue appropriately, as filters are still generally applied after the join, thus filtering out the “zero” cases (e.g. if no one responded with an answer, then filtering on just foodies can’t let the data through, since “no one” can’t be a foodie). But I might be misunderstanding!


After speaking with Looker support, the solution I ended up going with was using a native derived table with bind filters. This ultimately allows the filters to be applied before the right join, thus preserving both filtering logic and showing zero data cases.

The native derived table generates a common table expression, with SQL for the query similar to this:

WITH native_derived_table AS (
    SELECT 
        fact_responses.surrogate_key
      , fact_responses.dim_question_key
   FROM fact_responses
   JOIN dim_users
     ON fact_responses.dim_user_key = dim_users.surrogate_key
  WHERE dim_users.is_foodie
)

  SELECT 
      dim_questions.choice_body
    , COUNT(native_derived_table.surrogate_key)
 FROM native_derived_table
RIGHT JOIN dim_questions 
   ON native_derived_table.dim_question_key = dim_questions.surrogate_key
WHERE dim_questions.question_key = 1
GROUP BY 1
ORDER BY 1

Since the common table expression is filtered on creation, then the right join still preserves the empty rows afterward.

Assuming a straightforward base explore of fact_responses, the LookML for the derived table is something like this:

view: zero_fact_responses {
  derived_table: {
    explore_source: fact_responses {
      column: surrogate_key { field: fact_responses.surrogate_key }
      column: dim_question_key { field: fact_responses.dim_question_key }

      bind_filters: {
        from_field: zero_fact_responses.user_is_foodie
        to_field: dim_users.is_foodie
      }
    }
  }

  filter: user_is_foodie {
    type: yesno
  }

  measure: count {
    type: count_distinct
    sql: ${TABLE}.surrogate_key ;;
  }
}

That table sets up all of the filterable options to be used as part of creation of the derived table, so they will be applied to the common table expression. Then in our explore we can connect this derived table to the dimension with a right join very simply:

explore: zero_fact_responses {
  always_join: [
    dim_question_choices,
  ]

  join: dim_questions {
    sql:
      right join schema.dim_questions
      on zero_fact_responses.dim_question_key = dim_questions.surrogate_key
    ;;
    relationship: many_to_one
  }
}

Although I still need to set up each filter condition manually, this solves our needs and lets our users apply filters via Looker in the way that they expect.

Thanks everyone!

2 Likes