Using row_number to generate a primary key for a derived table


(Ian Ross) #1

The Problem

When creating a derived table, you often want to join it to another view. However, if the derived table does not have a primary key defined, its measures may not show up through the join. More info on why this occurs here.

The Solution

You can use the row_number() window function in Postgres and Redshift to create a unique field that can serve as a primary key. In MySQL, you can achieve the same effect with a user-defined variable that increments on every row.

Postgres and Redshift

view: derived_table_name {
  derived_table: {
    sql:
      SELECT
        row_number() OVER (ORDER BY created_at) AS prim_key,
        *
      FROM orders ;;
  }

  dimension: prim_key {
    type: number
    primary_key: yes
    sql: ${TABLE}.prim_key ;;
  }
}

MySQL

view: derived_table_name {
  derived_table: {
    sql:
      SELECT
        CAST(@rownum := @rownum + 1 AS UNSIGNED) AS prim_key,
        t.*
      FROM orders t,
        (SELECT @rownum := 0) r ;;
  }

  dimension: prim_key {
    type: number
    primary_key: yes
    sql: ${TABLE}.prim_key ;;
  }
}
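
As an aside, MySQL 8.0 and later support window functions, so on a recent MySQL version the row_number() approach from the Postgres/Redshift example above should work directly; a minimal sketch:

SELECT
  -- window functions are available in MySQL 8.0+
  ROW_NUMBER() OVER (ORDER BY created_at) AS prim_key,
  t.*
FROM orders t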

♥ Ian


#2

FYI, I found that in my version of MySQL, the * needs to be t.* in order for this to work without an error. Like so:

SELECT
  CAST(@rownum := @rownum + 1 AS UNSIGNED) AS prim_key,
  t.*
FROM orders t,
  (SELECT @rownum := 0) r


(Ian Ross) #3

Nice, thank you for pointing that out; I will update the syntax above.

Thanks,
Ian


#4

You can also generate primary keys in views that lack them by concatenating the values of two or more dimensions whose combination is unique for every row in the table. We have examples of doing this here.
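
For instance, a minimal sketch of such a dimension; user_id and order_date are hypothetical columns standing in for whatever combination of fields is unique per row in your table:

dimension: composite_pk {
  type: string
  primary_key: yes
  # user_id and order_date are placeholders; substitute any set of
  # fields whose combined values are unique for every row
  sql: CONCAT(${TABLE}.user_id, '-', ${TABLE}.order_date) ;;
}

The '-' separator guards against collisions where different value pairs concatenate to the same string, such as ('ab', 'c') and ('a', 'bc'). CONCAT here is the MySQL/Postgres variadic form; on Redshift, chain the || operator instead.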


(Tristan Handy) #5

Quick heads up: if you're building a table that needs a composite primary key, it's better to use a function like:

md5(field1 || field2)

This will generate a unique composite key that is cleaner to deal with in subsequent usage; we do this very frequently in our data models.
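
In LookML, that might look something like this sketch, where field1 and field2 are the placeholders from the snippet above, assumed to be text columns (CAST non-text columns first):

dimension: composite_pk {
  type: string
  primary_key: yes
  # md5 collapses the concatenated fields into one fixed-width key.
  # || is Postgres/Redshift string concatenation; the separator avoids
  # collisions, and COALESCE guards against NULLs, since NULL || x is NULL.
  sql: md5(COALESCE(${TABLE}.field1, '') || '-' || COALESCE(${TABLE}.field2, '')) ;;
}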