"Resources exceeded during query execution" when building derived table in BigQuery

bigquery

(lloyd tabb) #1

In some cases you will get an “Resources exceeded during query execution” error when trying to build a large derived table in BigQuery.

Why this happens

BigQuery is a distributed database. This means most queries execute on multiple nodes. But there are certain operations that require all the data exist on a single node. When that data can no longer fit on that node, you get the “Resources exceeded during query execution” error and the whole query fails.

Which Operations Operate on a Single Node?

Operations that need to see all the data in the resulting table at once have to operate on a single node. Un-partitioned window functions like RANK() OVER() or ROW_NUMBER() OVER() will operate on a single node.

When will it matter?

Often you will want to generate a primary key for a derived table. A common way to do this is to use ROW_NUMBER() OVER() as id to generate a sequence number for a each row in your table. If the table is big we fail with “Resources exceeded” because this function tries to operate on a single node.

Luckily, there’s an easy way around this. You can PARTITION the the window function by date and build a string as the primary key. For example:

CONCAT(CAST(ROW_NUMBER() OVER(PARTITION BY event_date) AS STRING),
     '|',(CAST(event_date AS STRING)) as id

ORDER BY

If your query contains an ORDER BY clause, all the data is still going to be passed to a single node and then sorted. If it doesn’t fit on the node then it fails. Since BigQuery is a scanning database (it scans the whole column for each query) sorting the data is unnecessary. So don’t use ORDER BY in ordering derived table results.


(Dillon Morrison) #2

To add more detail from a slack convo:

Whatever queries you’re putting your window functions in, should also contain the minimum number of columns. You can then rejoin that back onto a final PDT to achieve the desired functionality.


(ernesto ongaro) #3

A quick tip from @kabalan.gaspard was a life saver for me:

if you’re using OVER() to generate a simple primary key, try using GENERATE_UUID() instead