Symmetric aggregates that are only needed some of the time

TLDR: We need symmetric aggregates for a small number of records in our db, but they’re causing performance issues for the records that don’t need them. Any suggestions?

At Zearn, we use Looker to report on usage of our web application. We’re an educational nonprofit, so our customers/users are students, teachers, and administrators (principals etc.) at elementary schools and school districts.

One quirk of our data model is that some users can belong to multiple schools (our “customers”). Specifically, we have a specific user role (“school administrator”) who can belong to multiple schools. School admins who actually belong to multiple schools make up probably <1% of our userbase.

Our main usage Explore looks something like:

explore: fact_daily_usage {
    join: dim_zearn_users {
        relationship: one_to_one
        sql_on: ${fact_daily_usage.user_id} = ${dim_zearn_users.id}
    }
    join: dim_schools {
        relationship: many_to_many #THE PROBLEM
        sql_on: ${dim_zearn_users.school_id} = ${dim_schools.id}
    }
    # ... etc
} 

Because the join to schools is many-to-many for these multi-school admins, it can cause fanouts, which Looker helpfully solves with symmetric aggregates. However, we have some Looks from this explore that only deal with other types of users and don’t need symmetric aggregates, but are slowed down a ton (e.g. 2 minutes runtime becomes 40 mins) by Looker inserting symmetric aggregates (after all, we told Looker it was many_to_many!).

Does anyone have ideas of how to solve? One thing we could do is make a separate explore just for admin usage, but I’m worried that would be confusing to our business users.

Just spitballing here to get the conversation rolling…

I agree that creating a separate explore and using symmetric_aggregates: no would be confusing. However, since you can “turn off” sym aggs at the field level by using type: number for the relevant measure and specifying the aggregation in the sql: parameter, you may be able to do something a bit more fine-grain.

What if you had two measures:

measure: sum_course_reg_sym_agg {
type: sum
hidden: yes
sql: ${TABLE}.course_reg ;;
}
measure: sum_course_reg_no_sym_agg {
type: number
hidden: yes
sql: SUM(${TABLE}.course_reg ;;
}

and then a third that will be the actual measure used in the Look:

measure: sum_course_reg {
type: number
label: "Sum Course Registration"
sql: {% if big_bad_join_view._in_query %}
       ${sum_course_reg_sym_agg}
       {% else %}
        ${sum_course_reg_no_sym_agg}
       {% endif %} ;;
}

The liquid if logic would check to see if the relevant join is in the current query, and if not, use the non-symmetric aggregate measure.

I haven’t tested this out. Anyone think it would/wouldn’t work? The part I’m not certain about is if symmetric aggregates will still kick in if the measure is referenced in another measure of type: number.

It works! I got bored and tested it.

I think that honestly doesn’t sound like that bad of a system. Sure, it requires 3 measures for each measure, which isn’t great, but from a business user ease-of-use standpoint, it seems like the best option.

Awesome, thanks for that suggestion @izzy! To be clear, we will be joining to the dim_zearn_users view in most/all of these cases - it’s just that sometimes (e.g. when we’re filtering only for teachers and/or students) we don’t need symmetric aggregates from that join, and sometimes (when school administrators are included) we do.

Is there a way to modify your code to account for that?

Hi Michael,

I’ve written a lot about avoiding symmetric aggregates altogether:


However, if you do want to keep them in some cases, maybe you could implement two separate joins for your users table, relabel one of the joins as “Administrators”, and then apply an additional constraint in your two join conditions that checks for role= or role<> administrator. That way, which version of the field is chosen by your explore user both determines which join will be used, but also which set of rows from the users table will be used

2 Likes

Whoa, I hadn’t seen those posts before! My mind is blown. Thanks for sharing them. Looks like it’ll be a bit of work, but I suspect one of those approaches will solve our problem here. Stay tuned!

And Fabio comes through with the actual analytics best practices instead of the janky Izzy workaround! :pray:

1 Like