Median using BigQuery connection error on measure

median

(Camila Fracaro) #1

I’ve found a possible error when you create a measure type median using a BigQuery connection. I contacted Looker Support, but I’m also sharing with community in case someone is using this type of measure.

I created a measure like below to substitute a calculated field on a table I was using, but the values were different between them and I had a bit of digging to understand it.

measure: median_days_since {
type: median
sql: ${days_since} ;;
}

You can check on the screenshot below that the calculated field (2nd column) is calculating right the median as Looker’s documentation, but the total for column 3 is not. Last column is only to show that there was only one value for each row in “Days since”.


(bernard.kavanagh) #2

Hi Camila,

Thanks for posting this :slight_smile:

Looking at the example you posted above, it seems that the expected output in the total calculation is a median of the returned results. It is expected that table calculations perform this type of operation over the returned result set, and totals operate over the entire database table.

We can see here that totals are calculated by creating a separate SQL query that aggregates over the entire database table:

If we look at the examples below, we can see that the median(${orders.days_since_first_order}) table calculation returns 1,425.5 based on the returned rows, whereas the total shows 425, this is aggregated over the entire table.

To test this further, I have created measures of type Average and Percentile (50) to see what results are returned:

Because this query does not contain any dimensions, we can expect an Average, Median and Percentile (50) of the entire table, and likewise the total column illustrates that the values calculated here, are from the same query. Adding Percentile (50) is a second mechanism to test the returned result of Median.

To summarise, table calculations operate over the returned results, Totals operate over the entire backend table.

Hoping this helps!

Bernard.


(Camila Fracaro) #3

Hi Bernard,
Thanks for your response. I still think it’s not calculating it right, I added the 4th column so people could see that I only had 1 row for each one of the days since, so both values (median as a calculated field and total median) should be the same. I only have 4 rows for this database that I’ve just rechecked it.
I re-created below the query with data that I can share using code given by looker on my median measure:
WITH db AS (
SELECT CAST(“2018-12-01” AS TIMESTAMP) AS last_time
UNION ALL SELECT CAST(“2018-11-09” AS TIMESTAMP) AS last_time
UNION ALL SELECT CAST(“2018-09-11” AS TIMESTAMP) AS last_time
UNION ALL SELECT CAST(“2018-08-29” AS TIMESTAMP) AS last_time
),
checking_days_since AS
(SELECT , DATE_DIFF(current_date(), (CAST(TIMESTAMP(FORMAT_TIMESTAMP(’%F %T’, db.last_time , ‘Europe/Amsterdam’)) AS DATE)), DAY) AS days_since
FROM db
)
SELECT
COUNT(
) AS count,
CASE WHEN COUNT((DATE_DIFF(current_date(), (CAST(TIMESTAMP(FORMAT_TIMESTAMP(’%F %T’, db.last_time , ‘Europe/Amsterdam’)) AS DATE)), DAY)) ) <= 2 THEN AVG((DATE_DIFF(current_date(), (CAST(TIMESTAMP(FORMAT_TIMESTAMP(’%F %T’, db.last_time , ‘Europe/Amsterdam’)) AS DATE)), DAY)) ) ELSE APPROX_QUANTILES((DATE_DIFF(current_date(), (CAST(TIMESTAMP(FORMAT_TIMESTAMP(’%F %T’, db.last_time , ‘Europe/Amsterdam’)) AS DATE)), DAY)) ,1000)[OFFSET(500)] END AS db_median_days_since
FROM db AS db
LIMIT 1


(bernard.kavanagh) #4

Camila,

Thanks for coming back so quickly. I will build a repro of this using a BigQuery connection and reply with the results :slight_smile:

Bernard