Modeling Nested Data in Snowflake

Snowflake offers a variety of built in functions to effectively query semi structured data. The good news is that LookML allows for effective modeling/ad hoc exploration of tables with nested fields without requiring pre-flattening.

Sample Data Structure:

Below is a sample table structure of a typical event stream of session level records where hit level detail as well as totals summarizing the session are stored as variants (something you would likely see in a GA 360 data export).

Non repeated object extraction:

The totals variant column in the sample data is just a simple JSON structure with key-value pairs where there is no repetition(i.e. multiple totals objects). See below for the sample totals column value:

Totals json structure:

{ "hits": 2, "pageviews": 2, "bounces": 1, "visits": 1 }

Extraction with <column>:<key>::<datatypecast> notation is relatively simple to handle in lookML. Mapping of the visits metric key to lookML would look like:

view: sessions {
    dimension: visits {sql: ${TABLE}.totals:visits::integer ;; hidden: true}
    measure: sum_of_visits { type: sum sql: ${visits} ;;}
}

Repeated object extraction:

The hits column looks a bit more daunting as there is repetition of both hits as well as index and values within the customDimensions array. Below is the formatted hits variant column:

[
  {
    "hitNumber": 1,
    "time": 0,
    "hour": 10,
    "minute": 10,
    "type": "page",
    "page": {
      "pagePath": "/path1/",
      "hostname": "www.mydomain.com",
      "pageTitle": "My Title",
    },
    "customDimensions": [
      {
        "index": 1,
        "value": "a"
      },
      {
        "index": 2,
        "value": "b"
      },
      {
        "index": 3,
        "value": "c"
      }
    ]
  },
    {
    "hitNumber": 2,
    "time": 0,
    "hour": 10,
    "minute": 11,
    "type": "page",
    "page": {
      "pagePath": "/path2/",
      "hostname": "www.mydomain.com",
      "pageTitle": "My Title",
    },
    "customDimensions": [
      {
        "index": 1,
        "value": "a"
      },
      {
        "index": 2,
        "value": "b"
      },
      {
        "index": 3,
        "value": "c"
      }
    ]
  }
]

There is a lateral flatten snowflake function to handle repeated fields by exploding the compound values within the variant in lookml joins by writing your own join relationships with the sql: clause rather than the sql_on: clause.

You will also notice that all the key references are prefixed with ${TABLE}.value.<keys> syntax to leverage the appropriate exploded column output from the lateral flattening.

You will see in the above example we had an array for customDimensions with repetition of index,value properties. The same type of lateral flattening will happen here, but you need to include a required_joins field as the unnesting of customDimensions is dependent on hits being unnested as well.

Explore definition:

explore: sessions {
  join: hits {
    sql: ,lateral flatten(input=>hits) hits ;;
    relationship: one_to_many
  }
  join: hits_customdimensions {
    sql: ,lateral flatten(input => hits.value:customDimensions) hits_customdimensions ;;
    required_joins: [hits]
    relationship: one_to_many
}
}

Views:

view: hits {
  dimension: page_path {
    type: string
    view_label: "Hits:Page"
    sql: ${TABLE}.value:page:pagePath ;;
  }
  dimension: hit_number {
    type: number
    sql: ${TABLE}.value:hitNumber ;;
  }
}

view: hits_customdimensions {
  view_label: "Hits:Custom Dimensions"
  dimension: hits_cd_index {
    type: string
    sql: ${TABLE}.value:index ;;
  }
  dimension: hits_cd_value {
    type: string
    sql: ${TABLE}.value:value ;;
  }
  }

The lookml pattern above will help to only lateral flatten as needed and is a good starting point to begin thinking about how you want to model nested data on Snowflake.

2 Likes

Hi @brett_g,

I’m trying to follow your discourse. but I’m only getting nulls for my all my column names.

The JSON file I’m trying to parse is below. This is just one “cell” I have a column full of these json snippets - one for each order id.

{ “active”: false, “android”: [], “checkpoints”: [ { “checkpoint_time”: “2019-05-08T18:06:46”, “city”: null, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-10T02:23:06+00:00”, “location”: “UNITED STATES”, “message”: “ELECTRONIC NOTIFICATION RECEIVED: YOUR ORDER HAS BEEN PROCESSED AND TRACKING WILL BE UPDATED SOON”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InfoReceived_001”, “subtag_message”: “Info Received”, “tag”: “InfoReceived”, “zip”: null }, { “checkpoint_time”: “2019-05-08T18:21:07”, “city”: null, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-10T02:23:06+00:00”, “location”: “UNITED STATES”, “message”: “EN ROUTE TO DHL ECOMMERCE”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_001”, “subtag_message”: “In Transit”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-09T20:26:36”, “city”: “Avenel”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-10T20:36:19+00:00”, “location”: “Avenel, NJ, US”, “message”: “ARRIVAL AT DHL ECOMMERCE DISTRIBUTION CENTER”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_003”, “subtag_message”: “Arrival scan”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-10T13:21:44”, “city”: “Avenel”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-10T20:36:19+00:00”, “location”: “Avenel, NJ, US”, “message”: “PROCESSED”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_001”, “subtag_message”: “In Transit”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-10T14:00:17”, “city”: “Avenel”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-10T20:36:19+00:00”, “location”: “Avenel, NJ, US”, “message”: “DEPARTURE ORIGIN DHL ECOMMERCE FACILITY”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_007”, “subtag_message”: “Departure Scan”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-12T18:52:04”, “city”: “Orlando”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-13T03:09:49+00:00”, “location”: “Orlando, FL, US”, “message”: “ARRIVAL DESTINATION DHL ECOMMERCE FACILITY”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_003”, “subtag_message”: “Arrival scan”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-13T04:32:37”, “city”: “Orlando”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-13T09:09:50+00:00”, “location”: “Orlando, FL, US”, “message”: “TENDERED TO DELIVERY SERVICE PROVIDER”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_002”, “subtag_message”: “Acceptance scan”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-14T04:51:00”, “city”: “Miami Beach”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-14T09:22:58+00:00”, “location”: “Miami Beach, FL, US”, “message”: “ARRIVAL AT POST OFFICE”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_003”, “subtag_message”: “Arrival scan”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-14T03:36:00”, “city”: “Miami Beach”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-14T15:24:16+00:00”, “location”: “Miami Beach, FL, US”, “message”: “ARRIVED USPS SORT FACILITY”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_002”, “subtag_message”: “Acceptance scan”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-14T08:45:00”, “city”: “Miami Beach”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-14T15:24:16+00:00”, “location”: “Miami Beach, FL, US”, “message”: “SORTING COMPLETE”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “InTransit_001”, “subtag_message”: “In Transit”, “tag”: “InTransit”, “zip”: null }, { “checkpoint_time”: “2019-05-14T08:55:00”, “city”: “Miami Beach”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-14T15:24:16+00:00”, “location”: “Miami Beach, FL, US”, “message”: “OUT FOR DELIVERY”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “OutForDelivery_001”, “subtag_message”: “Out for Delivery”, “tag”: “OutForDelivery”, “zip”: null }, { “checkpoint_time”: “2019-05-14T13:44:00”, “city”: “Miami Beach”, “coordinates”: [], “country_iso3”: “USA”, “country_name”: “United States”, “created_at”: “2019-05-14T18:54:56+00:00”, “location”: “Miami Beach, FL, US”, “message”: “DELIVERED [IN/AT MAILBOX]”, “slug”: “dhl-global-mail”, “state”: null, “subtag”: “Delivered_001”, “subtag_message”: “Delivered”, “tag”: “Delivered”, “zip”: null } ], “courier_destination_country_iso3”: “USA”, “created_at”: “2019-05-10T02:19:28+00:00”, “custom_fields”: { “item_names”: “Compact plastic applicator tampons - Custom / Box of 18 x 1” }, “customer_name”: “Stephanie Denault”, “delivery_time”: 6, “delivery_type”: null, “destination_country_iso3”: “USA”, “emails”: [ "stephaniedenault@me.com" ], “expected_delivery”: “2019-05-16”, “id”: “5cd4dfb04dfc673c10bea053”, “ios”: [], “language”: null, “last_mile_tracking_supported”: null, “last_updated_at”: “2019-05-14T18:54:56+00:00”, “note”: null, “order_id”: “1001004826670”, “order_id_path”: null, “order_promised_delivery_date”: null, “origin_country_iso3”: “USA”, “path”: “deprecated”, “pickup_location”: null, “pickup_note”: null, “return_to_sender”: false, “shipment_delivery_date”: “2019-05-14T13:44:00”, “shipment_package_count”: 1, “shipment_pickup_date”: “2019-05-08T18:06:46”, “shipment_type”: “DHL SM Parcel Ground”, “shipment_weight”: 0.408, “shipment_weight_unit”: “lb”, “signed_by”: null, “slug”: “dhl-global-mail”, “smses”: [], “source”: “shopify-private”, “subscribed_emails”: [], “subscribed_smses”: [], “subtag”: “Delivered_001”, “subtag_message”: “Delivered”, “tag”: “Delivered”, “title”: “L-2547090”, “tracked_count”: 22, “tracking_account_number”: null, “tracking_destination_country”: “USA”, “tracking_key”: null, “tracking_number”: “9274899999898086172176”, “tracking_origin_country”: null, “tracking_postal_code”: “33141”, “tracking_ship_date”: “20190510”, “tracking_state”: null, “unique_token”: “deprecated”, “updated_at”: “2019-05-14T18:54:56+00:00” }

The explore in the model file :

explore: aftership_data {
join: msg {
sql: ,lateral flatten(input => msg) msg ;;
relationship: one_to_many
}
join: msg_checkpoints {
sql: ,lateral flatten(input => msg.value:checkpoints) msg_checkpoints ;;
relationship: one_to_many
required_joins: [msg]
}
}

Please find the views below:

view: aftership_data {
sql_table_name: AFTERSHIP.DATA ;;

dimension: __sdc_primary_key {
type: string
sql: ${TABLE}."__SDC_PRIMARY_KEY" ;;
}

view: msg {
dimension: order_id {
type: string
sql: ${TABLE}.value:order_id;;
}
}

view: msg_checkpoints {
dimension: checkpoints__message {
type: string
sql: ${TABLE}.value:checkpoints:message::string ;;
}
}

I’m only getting nulls for Order ID and Checkpoints Message. Can you tell me what I’m missing on?

Here’s the sql for the explore:

SELECT
aftership_data."__SDC_PRIMARY_KEY" AS “aftership_data.__sdc_primary_key”,
msg.value:order_id AS “msg.order_id”,
msg_checkpoints.value:checkpoints:message::string AS “msg_checkpoints.checkpoints__message”
FROM AFTERSHIP.DATA AS aftership_data
,lateral flatten(input => msg) msg
,lateral flatten(input => msg.value:checkpoints) msg_checkpoints

GROUP BY 1,2,3
ORDER BY 1
LIMIT 500

Hey, this is a pretty amazing walk through, thanks for that.
Though I ran into an issue and I thought it might be useful to others.

In your explore definition you say

join: hits_customdimensions {
sql: ,lateral flatten(input => hits.value:customDimensions) hits_customdimensions ;;
required_joins: [hits]
relationship: one_to_many
}

Though it came out that the comma created the following problem:
Any table joined to the base table (say the table hits in your example) after joining hits_customdimensions would trigger the following error:

Lateral view cannot be on the left side of JOIN

Solution for that is to use the notation CROSS JOIN instead of a comma

1 Like