When is it okay to use measure type: number? (ERROR: Column must appear in the GROUP BY clause or be used in an aggregate function)

low_priority
done
reply
(Nicole Beyer) #1

Measures of type: number are meant to be used to perform simple transformations on other measures. If you use a measure of type: number without aggregating any data Looker will throw errors, or worse, fail to throw errors but return bad data.

If Looker does throw an error it will probably be of the form:
A field must appear in the group by clause or be used in an aggregate function.

For example, say we defined some dimensions and measures in the following manner:

dimension: taxi_out_time {
  type: number
  sql: ${TABLE}.taxi_out ;;
}

dimension: taxi_in_time {
  type: number
  sql: ${TABLE}.taxi_in ;;
}

measure: type_number_to_break  {
  # bad measure!
  type: number
  sql: ${taxi_out_time} - ${taxi_in_time} ;;
}

Looker will throw the following error if we attempt to create an query that includes taxi_in_time and our bad measure type_number_to_break:

This error occurs because of the way dimensions and measures map to columns in SQL. A Looker dimension tells SQL that this column should appear in the GROUP BY clause, whereas a measure is meant to be aggregate and thus should not appear in the GROUP BY clause.

So, when we declare a measure type: number without actually performing an aggregation, SQL excludes this column from the GROUP BY clause when it in fact should be in the GROUP BY clause

A Dangerous Edgecase

We note that Looker will not always throw an error. For example, if we run a query with all three of the fields we defined above Looker will not throw an error because we happened to include both dimensions that the bad measure is based on, thus everything is included in the GROUP BY clause. However, we note that the data is still not being aggregated, possibly leading to inaccurate data.

3 Likes

(sam) #2

Measures should always perform aggregations. Given that, you’ll want to solve this error by either:

  1. Using a dimension instead of a measure. This would work in the example above, where the two inputs are just dimensions.
  2. Turn the dimensions into measures before using a measure of type number. You can do this with a dummy measure of type max, for example.
1 Like

(sam) #3

Another Dangerous Edge Case: MySQL

MySQL won’t throw an error like this, and will allow arithmetic between aggregates and non-aggregates. However, this behavior should still be avoided.

Take for example this query which selects all the “years” in my video game database. Year isn’t a primary key, and we aren’t grouping by year, so we see several duplicates. This seems expected.

|650xauto

The problem arises when we also select a COUNT(*) without any grouping. As expected, in the below query we get back one row with the full count of records. But we also see the “year” here as well, with a random value chosen. Most databases would have errored here instead of showing us a year.

|650xauto

While this may seem purely cosmetic, this can lead to bad data fast once you start doing arithmetic between the aggregated count and unaggregated year. Since you don’t know which year will be used, your result in the below query is probably unexpected and incorrect.

​|650xauto

0 Likes

(Izzy) #4

What do you mean by “random”? Is it actually just throwing out a random value, or is that a count that’s just inaccurate due to the grouping (or lack thereof) ?

Also, thanks for sharing! sneaky.

0 Likes

(Pan Sun) #5
measure: type_number_to_break  {
 type: sum
 sql: ${taxi_out_time} - ${taxi_in_time} ;;
 }
0 Likes