How to Identify and Fix Issues to Support Symmetric Aggregates


(Ken Cunanan) #1

Why Do We Need Looker’s Symmetric Aggregate Functionality?

SQL enables you to join two tables together on columns containing similar values. However, when joining these two tables together using SQL, the relationships between these tables can potentially cause some problems for aggregate functions in SQL.

As discussed in our more detailed explanation of symmetric aggregates, there are actually two types of aggregate functions in SQL: symmetric aggregates, and non-symmetric aggregates.

Symmetric Aggregates are aggregate functions that will be computed correctly in SQL no matter how the tables are joined. These functions are:

COUNT(DISTINCT)
MAX()
MIN()

Non-Symmetric Aggregates are aggregate functions that depend on how the tables are joined together to be computed correctly in SQL. These functions are:

SUM()
AVERAGE()
COUNT()

As explained in this doc about non-symmetric aggregate functions, if the two tables joined together have a one-to-one relationship, non-symmetric aggregates will work in both parts of the join in SQL. If the two tables joined together have a one-to-many relationship, non-symmetric aggregates will work on the primary table, but may not work on the joined tables if they’re fanned out. However, if the two tables joined together have a many-to-one, or many-to-many relationship, the aggregate functions could provide inaccurate data in both tables in the SQL.

Looker solves this problem for you, by translating measures of type: sum, type: avg, and type: count into SQL code that is symmetric. This means that these measures will return accurate results for all joins, regardless of the relationships between the tables.

To do this, Looker relies on the developer to specify a primary key for the table. This primary key can be a composite of multiple table columns but the key must be distinct – each row’s primary key must be unique.

The issues that developers often encounter for symmetric aggregates—both related to the required primary key.

Issue 1: Measures Not Appearing in the Field Picker

Occasionally, you’ll declare measures on a table and join that table into an explore, only to find that the measures don’t appear in the field picker in the Explore!

A disappearing measure is usually caused by the lack of a primary key in the view containing the measure. Symmetric aggregates depend on the primary key of the view to disambiguate the individual component values.

To fix this issue, you want to make sure that you declare a primary key on a dimension in the view that contains those measures and the view on the other side of the join.

Issue 2: Non-Unique Primary Key

When bringing measures through a join, Looker needs a distinct primary key (all values unique) in order to properly aggregate the values in the table.

If you declared a primary key for a dimension but that field has duplicate values, you can see this error:

Non-Unique value/primary key (or sql_distinct_key), value overflow or collision when computing sum
To verify the cause of this error, query the primary key dimension(s) and a regular count dimension together, with a sort on the the count dimension in descending order. If we see any instances where the count is greater than 1, this field(s) is not unique and cannot be used as the primary key.

The way to resolve this would be to create a primary key by concatenating the original field with another field to create a compound primary key.