Error: Non-unique value/primary key error


(Anika Kuesters Smith) #1

Meaning of Error

In Looker, you might come across an error of this form:

Non-Unique value/primary key (or sql_distinct_key), value overflow or collision when computing sum

This error occurs when Symmetric Aggregates (formerly Joined Sums and Averages) is used to bring measures through a join. It indicates that a field defined as primary_key: true in one of the views involved in the join is not completely unique for the values being calculated, and therefore cannot safely be used to make distinct aggregates. It is usually caused by creating a calculation based on a dimension from another view further down in the join hierarchy, or that your primary key is not in fact unique.

Resolving this Error

Make sure that any field you have defined as primary_key: true is unique in its table. One way to test this is to create an explore for the view you want to test, then make a Look with that primary key and a count of the table. Sort on the count, and look for instances where count is greater than 1; this indicates a non-unique field.


How to Identify and Fix Issues to Support Symmetric Aggregates
Why aren't my measures coming through a join?
(jake) #2

If you don’t have a unique primary key in a table, you can build a concatenated primary key in LookML or, in a derived table, define one.

Even when you have defined a primary_key in each view (and it is in fact unique), you may see the “non-unique value” error when running a query. One case where you might see this is if you have multiple one_to_many joins in an explore and there is a measure that aggregates a dimension that combines values from two of the joined in tables. In this case, the combined dimension may no longer be associated with a unique primary_key from the point of view of the base table.

The Solution: for the measure that is giving the error, you can add the parameter sql_distinct_key and concatenate the primary keys from the views that are being combined.

This is not easy to understand, so let’s consider a concrete example:
[NOTE: This may NOT the best way to model the following data. But is used as an example of how this error may occur. You may also want to consider if there is a better way to model your data if you see this error]

Consider

explore: aircraft {
  join: accidents {
    type: left_outer
    relationship: one_to_many
    sql_on: ${aircraft.tail_number} = ${accidents.registration_number}
  }
  join: flights {
    type: left_outer
    relationship: one_to_many
    sql_on: ${flights.tail_num} = ${aircraft.tail_number}
  }
}

I have a unique, primary_key defined for each of the views involved. But if I try to build a metric like:

view: accidents {
  dimension: morning_injuries {
    sql:  CASE WHEN ${flights.depart_hour_of_day}::int < 12 THEN ${number_injured}
            ELSE 0
            END;
  }
  measure: total_morning_injuries {
    type: sum
    sql: ${morning_injuries}
  }
}

I will get the eponymous error when I include total_morning_injuries in a query. Let’s take a look at one plane to see why - the case of N371UA. In our data set, the aircraft N371UA has flown over 11,000 times and had 2 accidents (one with 0 injuries, one with 15). To pull in both the flights and the accidents data for my morning_injuries dimension - Looker needs to join both of those on aircrafts. Consider the table that results if I query the primary keys for flights and accidents and morning_injuries for N371UA:

If I want to aggregate over morning_injuries, I only want to count 15 once and 0 once. But neither of the primary keys from flights or accidents can uniquely identify the morning_injuries dimension. So in order to aggregate correctly I need to write:

measure: total_morning_injuries {
    type: sum
    sql: ${morning_injuries}
    sql_distinct_key: ${flights.primary_key}||${accidents.accident_number}
  }

The concatenation of the primary keys DOES identify that there is one flight/accident combo with 15 injuries and one flight/accident combo with 0 injuries. And with the concatenated sql_distinct_key, Looker can perform the aggregates correctly!