BigQuery - How to use filters to search for multiple matches in a nested field

low_priority
done
reply
bigquery
#1

We’re in the process of modeling our BigQuery data warehouse in Looker. While most analyses are working as expected, we are experiencing an issue that I am not sure how to solve. Basically, we have a table that models individuals customer purchases with the following example structure:

---------------------------------------------------------
|  purchase_id | user_id |    date    | purchased_items |
---------------------------------------------------------
|        1     |    32   | 2018-01-01 |     pants       |
---------------------------------------------------------
|              |         |            |    t-shirt      |
---------------------------------------------------------
|              |         |            |     shoes       |
---------------------------------------------------------
|        2     |    62   | 2018-01-01 |     pants       |
---------------------------------------------------------
|              |         |            |    t-shirt      |
---------------------------------------------------------
|              |         |            |     shoes       |
---------------------------------------------------------
|        3     |    14   | 2018-01-01 |     pants       |
---------------------------------------------------------
|              |         |            |    t-shirt      |
---------------------------------------------------------
|        4     |    87   | 2018-01-01 |     pants       |
---------------------------------------------------------
|              |         |            |    t-shirt      |
---------------------------------------------------------
|              |         |            |     shoes       |
---------------------------------------------------------
|        5     |    47   | 2018-01-01 |     pants       |
---------------------------------------------------------
|              |         |            |      belt       |
---------------------------------------------------------
|              |         |            |     socks       | 
---------------------------------------------------------

We have a primary view called purchases that looks like the following:

view: purchases {
    dimension: purchase_id {
        description: "The ID of the purchase"
        type: number
        sql: ${TABLE}.purchase_id ;; 
    }

   dimension: user_id {
        description: "The ID of the user"
        type: number
        sql: ${TABLE}.purchase_id ;;
    }

   dimension_group: session {
        description: "The time that the session occurred as a date"
        hidden: no
        type: time
        timeframes: [
           raw,
           date,
           week,
           month,
           quarter,
           year
        ]
       convert_tz: no
       datatype: date
       sql: ${TABLE}.date ;;
  }
}

view: purchase_items {
    dimension: items {
        description: "The names of the items that are purchased"
        type: string
        sql: purchase_items ;;
    }
}

Then in the explore we have the following:

explore: purchases {
    join: purchase_items {
        view_label: "Purchase Items"
        relationship: many_to_one
        sql: LEFT JOIN UNNEST(purchase_items) AS purchase_items ;;
    }
}

The problem that we are trying to solve is that when a user goes to the explore page and wants to search for purchases where certain combinations of items occur. For example, they may want to retrieve the count of purchases where the customer bought both a t-shirt and shoes. In BigQuery this can be done with the following:

SELECT purchase_id
FROM purchases p
WHERE 
('t-shirt', 'socks')  IN UNNEST(p.purchase_items);

I’ve tried to use both templated filters and parameters in the dimensions but neither result in the desired outcome. Does anyone know how I can form the above query or solve this problem?

0 Likes

(Izzy) #2

Nested/repeated fields in general are always tough to work with.

If you could turn purchase_items into a list, using LIST_AGG, then you could do a simple IN, I think. Would that work?

0 Likes