Community

LookML: Nested Records in BigQuery

New LookML is coming to help deal with nested records in BigQuery. This feature will land in 3.34 and is available now on learnbeta.

Consider the BigQuery Table:

The data set is about people, but there are two subrecords.

The phoneNumber subrecord is simple, not repeated and really not any different from a regular field.

The subrecord of ‘citiesLived’ is REPEATED and each person can live in multiple cities.

The design goal we had in mind was to make it simple as possible to explore this data and to build a model.

The first rule, for dimensions and measures that are based in repeated records, declare

fanout_on: record_name

Where record_name is the name of the subrecord in the table. We chose the words fanout_on because that’s what happens when you use the field: the data in the person table fans out. Any SUM or AVERAGE can’t be computed when you included fanned out fields in your query, and will be removed with a warning when the query is run.

BigQuery specifically, allows aggregates on subrecords (COUNT WITHIN RECORD xxx). Unfortunately, they aren’t dimensions, as you might think. If you want to use them, place them in a derived table or use sql_table_name.

See the full example below and play with it on learnbeta

- explore: persons_living
- view: persons_living
  sql_table_name: |
    (SELECT *
      , COUNT(citiesLived.place) WITHIN RECORD as num_cities
    FROM [bigquery-samples:nested.persons_living])
  fields:
  - dimension: kind

  - dimension: full_name
    sql: ${TABLE}.fullName

  - dimension: age
    type: number

  - dimension: gender

  - dimension: phone_number
    sql: ${TABLE}.phoneNumber.number

  - dimension: phone_area_code
    sql: ${TABLE}.phoneNumber.area_Code
    
  - dimension: age_tier
    type: tier
    tiers: [20,30,40]
    sql: ${age}
    style: integer
    
  - dimension: num_cities
    type: number
    
  - measure: average_num_cities
    type: average
    sql: ${num_cities}
    
  - measure: count
    type: count_distinct
    sql: ${full_name}
    drill_fields: [full_name, kind, age]

  - measure: average_age
    type: average
    sql: ${age}
    decimals: 2
    
  - dimension: city_lived
    sql: ${TABLE}.citiesLived.place
    fanout_on: citiesLived
    
  - dimension: years_lived
    sql: ${TABLE}.numberOfYears
    fanout_on: citiesLived
    
  - measure: cities_count
    type: count_distinct
    sql: ${city_lived}
    drill_fields: [city_lived]
    fanout_on: citiesLived
    
  - measure: city_list
    type: list
    list_field: city_lived
    fanout_on: citiesLived

An alternative approach using extends

In this example, there are really two Explores, the person Explore and the cities lived Explore. You can model these in LookML directly. Creating two Explores forces the end user to choose the right Explore to ask the question. Looker will write the same SQL queries with either mechanism.

To learn more about LookML extensions, Looker’s code reusability method, see this article.
Play with it on Learnbeta

- connection: thelook_bigquery

- view: persons_living_base
  fields:
  - dimension: kind
  - dimension: full_name
    sql: ${TABLE}.fullName
  - dimension: age
    type: number
  - dimension: gender
  - dimension: phone_number
    sql: ${TABLE}.phoneNumber.number
  - dimension: phone_area_code
    sql: ${TABLE}.phoneNumber.area_Code
    
  - dimension: age_tier
    type: tier
    tiers: [20,30,40]
    sql: ${age}
    style: integer
    
  - measure: count
    type: count_distinct
    sql: ${full_name}
    drill_fields: [full_name, kind, age]
    
- explore: persons_living

- view: persons_living
  extends: persons_living_base
  derived_table:
    sql: |
      SELECT *
        , COUNT(citiesLived.place) WITHIN RECORD as num_cities
      FROM [bigquery-samples:nested.persons_living]
  fields:
  - dimension: num_cities
  
  - measure: average_age
    type: average
    sql: ${age}
    decimals: 2

- explore: persons_living_with_cities_lived

- view: persons_living_with_cities_lived
  extends: persons_living_base
  derived_table:
    sql: |
      SELECT *
      FROM flatten([bigquery-samples:nested.persons_living], citiesLived)
  fields:
  - dimension: city_lived
    sql: ${TABLE}.citiesLived.place
  - dimension: years_lived
    sql: ${TABLE}.numberOfYears
    
  - measure: cities_count
    type: count_distinct
    sql: ${city_lived}
    drill_fields: [city_lived]
    
  - measure: city_list
    type: list
    list_field: city_lived

Is this for BQ Legacy SQL only?