How Looker treats weeks (Bigquery, Non-UTC Timezone)

on-hold
low_priority
reply
(vfx) #1

It’s interesting how Looker deals with weeks.

Suppose we have a dimension of type timestamp. We place it on the row, and apply a filter on the last complete week.


This is what the generated SQL looks like:

SELECT 
	FORMAT_TIMESTAMP('%F %T', TIMESTAMP(FORMAT_TIMESTAMP('%F %T', test_datetime.ts , 'Asia/Kolkata'))) 
  AS test_datetime_time_stamp
FROM test_datetime

WHERE 
	(((test_datetime.ts ) >= ((TIMESTAMP(FORMAT_TIMESTAMP('%F %T', TIMESTAMP_ADD(
  TIMESTAMP_ADD(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(TIMESTAMP(FORMAT_TIMESTAMP(
  '%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY) AS TIMESTAMP), DAY), 
  INTERVAL (0 - CAST((CASE WHEN (EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7 < 0 
  THEN -1 * (ABS((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(FORMAT_TIMESTAMP(
  '%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7) - (ABS(7)
  * CAST(FLOOR(ABS(((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(FORMAT_TIMESTAMP(
  '%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7) / (7))) 
  AS INT64))) ELSE ABS((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7)
  - (ABS(7) * CAST(FLOOR(ABS(((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7)
  / (7))) AS INT64)) END) AS INT64)) DAY), INTERVAL (-1 * 7) DAY)), 'Asia/Kolkata')))
  AND (test_datetime.ts ) < ((TIMESTAMP(FORMAT_TIMESTAMP('%F %T', TIMESTAMP_ADD(
  TIMESTAMP_ADD(TIMESTAMP_ADD(TIMESTAMP_TRUNC(CAST(TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY) AS TIMESTAMP)
  , DAY), INTERVAL (0 - CAST((CASE WHEN (EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(
  TIMESTAMP(FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1)
  - 1 + 7 < 0 THEN -1 * (ABS((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7)
  - (ABS(7) * CAST(FLOOR(ABS(((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7)
  / (7))) AS INT64))) ELSE ABS((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7)
  - (ABS(7) * CAST(FLOOR(ABS(((EXTRACT(DAYOFWEEK FROM TIMESTAMP_TRUNC(TIMESTAMP(
  FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP(), 'Asia/Kolkata')), DAY)) - 1) - 1 + 7)
  / (7))) AS INT64)) END) AS INT64)) DAY), INTERVAL (-1 * 7) DAY), 
  INTERVAL (1 * 7) DAY)), 'Asia/Kolkata')))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500

While it undoubtedly does the job, one may wonder why it has to be this complicated.

It starts with the definition of timestamp which goes like this FORMAT_TIMESTAMP('%F %T', TIMESTAMP(FORMAT_TIMESTAMP('%F %T', test_datetime.ts , 'Asia/Kolkata'))) AS test_datetime_time_stamp and then turns into what we see above. It hits the BQ performance as this calculation happens for every row in the set.

Beyond performance penalty, one of the many Looker’s goodies is that the SQL is actually readable by humans too. So I was curious isn’t there a simpler way?

0 Likes

(Izzy) #2

That SQL is quite the witches brew of date formatting! I don’t know why we do every part of that, but some of it is to correctly compensate for week_start_day, which has to be done twice since it’s a comparison between 2 dates.

I grabbed the attention of an engineer who works on the model to see if they can find anywhere it could be simplified, because you’re right that it’s a lot.

Also, it looks kinda like a bowl of berries the way Discourse formatted the code.
Image result for bowl of berries

0 Likes

(Izzy) #3

Also, since this is just the latest in a string of many interesting time/date related Looker questions you’ve posted, I’ve granted you the Father Time badge— Wear it wisely!

1 Like

(vfx) #4

Wow!

Thanks a lot for the badge! I’ll wear it proudly!

But the fact of life is - there’s rarely a product you can just love. Looker is that product. Starting from the concept, through the execution - and to the amazing community. For me - coming from the mixed-taste world of Tableau this is a major, major experience upgrade.

If names cannot be named, I’ll edit the post :slight_smile:

1 Like

(Izzy) #5

:heart: for that whole post!

With regards to the original question— It is some crazy looking SQL, but I don’t think performance takes a hit from it, since the function would evaluate to a constant on the database and not have to be re-run for every row.

Word from the engineers is that some of the complexity definitely seems like it could be simplified, specifically around how we truncate to weeks, and I’ve filed that with them, but given the lack of big performance concerns I think there’s other things first in line on their plate (Like those other bigquery date-time oddities you found!). Thanks for asking about it, I learned something too.

0 Likes

(vfx) #6

Yeah,

I realize now my explanation was not very clear.

There were two concerns - performance penalty and readability of SQL.

About the readability - obviously not a very big issue, rather a nice-to-have one, mostly because it helps to audit the SQL and be sure it does what is expected. And while stuff like symmetric aggregates is tough to make look nice, this one seems eventually possible.

About the performance I’m also convinced constant expressions are evaluated just once, so the thing to consider here is the timestamp transformation which seams redundant. While this transformation serves a purpose in the current scheme of things (needed by the Looker query the way it is designed) it is in fact redundant and I think can be replaced by a simpler transformation as shown in the Human SQL query below. Here’s the original SELECT statement which produces the week and is not a constant calculation:

FORMAT_TIMESTAMP('%F',TIMESTAMP_TRUNC(TIMESTAMP_ADD(
TIMESTAMP_TRUNC(CAST(TIMESTAMP(FORMAT_TIMESTAMP(
'%F %T', h8_cdr.CreatedDate , 'Asia/Kolkata')) AS TIMESTAMP), DAY), 
INTERVAL (0 - CAST((CASE WHEN (EXTRACT(DAYOFWEEK FROM 
TIMESTAMP(FORMAT_TIMESTAMP('%F %T', h8_cdr.CreatedDate , 
'Asia/Kolkata'))) - 1) - 1 + 7 < 0 THEN -1 * (ABS((EXTRACT(DAYOFWEEK FROM 
TIMESTAMP(FORMAT_TIMESTAMP('%F %T', h8_cdr.CreatedDate , 
'Asia/Kolkata'))) - 1) - 1 + 7) - (ABS(7) * CAST(FLOOR(ABS(((EXTRACT(
DAYOFWEEK FROM TIMESTAMP(FORMAT_TIMESTAMP('%F %T', 
h8_cdr.CreatedDate , 'Asia/Kolkata'))) - 1) - 1 + 7) / (7))) AS INT64))) ELSE 
ABS((EXTRACT(DAYOFWEEK FROM TIMESTAMP(FORMAT_TIMESTAMP('%F %T', 
h8_cdr.CreatedDate , 'Asia/Kolkata'))) - 1) - 1 + 7) - (ABS(7) * 
CAST(FLOOR(ABS(((EXTRACT(DAYOFWEEK FROM TIMESTAMP(
FORMAT_TIMESTAMP('%F %T', h8_cdr.CreatedDate , 'Asia/Kolkata')))
 - 1) - 1 + 7) / (7))) AS INT64)) END) AS INT64)) DAY), DAY))

And the Human SQL version of the SELECT:

FORMAT_TIMESTAMP( "%F", 
TIMESTAMP_TRUNC(CreatedDate, WEEK(MONDAY), "+5:30"),  "+5:30")

So I ran some tests over a half-billion records table, not partitioned on the tested field:

I ran two tests - with SQL generated by Looker, and a simpler - human readable SQL, both with cached results off to make sure we are comparing apples with apples.

Looker SQL (10.1 sec):

Human SQL (3.5 sec):

Having in mind I’m pretty far away from this BQ, it will take about 1 sec overhead in both cases so the actual performance penalty is about 3 times. I also think it comes from the timestamp shuffling, not the constants evaluation.

Also if we move from timestamp to datetime it is getting even slower as it adds one more conversion instead of dealing with datetime directly. If interesting I can provide some more test results…

I’m not going to take any more time on this topic as data warehouses are pretty powerful and can manage such small inefficiencies. Still hope the above helps in case performance of the queries is something Looker wants to optimize…

2 Likes