Announcing Date/Time Partitioned PDTs for Google BigQuery

(conrad) #1

Announcing Partitioned PDTs in BigQuery
This past Friday, Google announced that BigQuery has added support for date/time column-based partitioning. Looker works closely with the BigQuery team and is pleased to announce that, as of Looker version 5.6, PDTs can take advantage of this exciting enhancement to BigQuery. It’s quite simple and highly recommended for large PDTs that are regularly filtered by a date/time column when queried.

Why use partition columns?
Queries filtered on the partition column scan only the partitions selected by the date filter, rather than the entire table. Because BigQuery processes a smaller subset of the table, this can significantly reduce the cost of querying large PDTs when an appropriate filter on the partition column is specified.
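As a rough illustration (the dataset, table, and column names here are hypothetical), a query like this would only scan the partitions covered by the date filter, because the `WHERE` clause filters on the partition column:

```sql
-- Hypothetical example: only the last 7 days of partitions are scanned,
-- since created_at is the column the PDT is partitioned by.
SELECT COUNT(*)
FROM my_dataset.partitioned_pdt
WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
```

Without the filter on `created_at`, the same query would scan every partition in the table and be billed accordingly.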

How do I declare a partition column?
To declare the partition column, simply add partition_keys: ["datetime_column_to_partition_by"] to your persistent derived table definition.

view: partitioned {
  derived_table: {
    partition_keys: ["created_at"]
    sql: SELECT some_stuff, created_at FROM facts_source... ;;
    datagroup_trigger: daily_datagroup
  }
  dimension: some_stuff {
    sql: ${TABLE}.some_stuff ;;
  }
}

Known Caveats

  • The column you are partitioning by must be a date/time column.
  • Even though the LookML keyword partition_keys is plural and takes a list of columns, BigQuery will only partition by a single column.


(Rogier Werschkull) #4

Nice, this is going to help!

However, this was already possible: you could time-partition the data using the _PARTITIONTIME pseudo-column, then put a view on top of the PDT that renames the pseudo-column to the input column you partitioned on. This leads to the same result, plus two additional advantages:

  1. Ingestion-time partitioning is not in beta.
  2. With _PARTITIONTIME you can already implement TWO-level partitioning by also using BigQuery’s _TABLE_SUFFIX wildcard system. In the PDT view you would then expose _TABLE_SUFFIX as the column used for the second level of partitioning.
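The renaming step described above might be sketched like this (dataset, table, and column names are hypothetical; the underlying PDT is assumed to be an ingestion-time partitioned table):

```sql
-- Hypothetical view exposing the _PARTITIONTIME pseudo-column under the
-- name of the original time column, so downstream queries can filter on it.
CREATE VIEW my_dataset.partitioned_pdt_view AS
SELECT
  some_stuff,
  _PARTITIONTIME AS created_at
FROM my_dataset.partitioned_pdt;
```

Queries against the view that filter on `created_at` then still prune partitions, because the filter resolves to `_PARTITIONTIME` on the underlying table.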