In some cases you will get an “Resources exceeded during query execution” error when trying to build a large derived table in BigQuery.
BigQuery is a distributed database. This means most queries execute on multiple nodes. But there are certain operations that require all the data exist on a single node. When that data can no longer fit on that node, you get the “Resources exceeded during query execution” error and the whole query fails.
Operations that need to see all the data in the resulting table at once have to operate on a single node. Un-partitioned window functions like RANK() OVER()
or ROW_NUMBER() OVER()
will operate on a single node.
Often you will want to generate a primary key for a derived table. A common way to do this is to use ROW_NUMBER() OVER() as id
to generate a sequence number for a each row in your table. If the table is big we fail with “Resources exceeded” because this function tries to operate on a single node.
Luckily, there’s an easy way around this. You can PARTITION
the the window function by date and build a string as the primary key. For example:
CONCAT(CAST(ROW_NUMBER() OVER(PARTITION BY event_date) AS STRING),
'|',(CAST(event_date AS STRING)) as id
ORDER BY
If your query contains an ORDER BY
clause, all the data is still going to be passed to a single node and then sorted. If it doesn’t fit on the node then it fails. Since BigQuery is a scanning database (it scans the whole column for each query) sorting the data is unnecessary. So don’t use ORDER BY in ordering derived table results.
To add more detail from a slack convo:
Whatever queries you’re putting your window functions in, should also contain the minimum number of columns. You can then rejoin that back onto a final PDT to achieve the desired functionality.
A quick tip from @kabalan_gaspard was a life saver for me:
if you’re using OVER() to generate a simple primary key, try using GENERATE_UUID() instead
A post was split to a new topic: Resources exceeded during query execution in BigQuery with window function