Linear Extrapolation in Table Calcs (Forecasting)

low_priority
done
reply
(Levi Davis) #1

Running a regression on dates and then using the estimated parameters to produce a forecast presents two challenges: (1) you can’t do math on dates, and (2) it’s not straight forward to add future dates to a model.

Here’s how to do it in Looker:

Step 1:

Right-join a calendar view to the Explore.

- join: mr_calendar_dates
  sql: RIGHT OUTER JOIN mr_calendar_dates ON calendar_dates.date_id = order.created_at

Step 2

Create two dimensions in the calendar view that transform dates into a series of numbers. One of them should go into the past and have NULL values for the future (for training the model), and the other should go into the past and the future (for making predictions). I did it like this:

Please note that this SQL will need to be modified according to your SQL dialect.

  - dimension: is_past
    type: number
    sql: CASE WHEN ${date} > CURRENT_DATE THEN NULL ELSE 1 END
    hidden: true
    
  - dimension: days_from_today
    type: number
    sql: DATEDIFF(${date}, current_date)
    
  - dimension: days_from_today_past
    type: number
    sql: DATEDIFF(${date}, current_date)*${is_past}

Step 3

We are now ready to go to the Explore. Select the two time dimensions we just made (x) and whatever measure you want to model (y). Take the ln(y) if you want to assume exponential growth. Filter on the date, selecting the amount of time in the past you want to train the model on, and the amount of time in the future you want to predict.

Step 4

Use these formulae to estimate intercept (a) and slope (b) using the time-series with NULL values for the future:

Or, in words, you need sums and sum^2 of the time axis, counts of numbers in one of your columns, and then the sum and sum_product of the predicting set. Then apply the estimated parameters to the time-series that has future values (and wrap the whole thing in exp() if you were regressing on ln(y)).

Step 5

Make the plot deselecting everything but y and predicted y. The x-axis label won’t be perfect because we have multiple dimensions, and if you want to clean it up a bit, you can turn Axis Value Labels off.

16 Likes

Predictive Element for Dashboard
Anyone used table calculations to visualize error bars yet?
(Carter Moar) #2

Way cool :snowman:

0 Likes

(Alexander Burch) #3

The screenshot of the table calculations doesn’t show the full calculations. Could anyone please (maybe @Levi_Davis1) include the full table calculations so I can see how they are implemented?

0 Likes

(Levi Davis) #4

This has gotten a lot easier since Looker introduced statistical functions in table calculations. Now Looker has functions that calculate directly the slope and intercept of a linear regression. Check out the intercept() and slope() functions in table calculations. That’s a lot easier than doing all that math!

0 Likes

(Alexander Burch) #5

@Levi_Davis1 Thanks so much. This helps, but I am still getting lost in the weeds of it and not seeing results that look right. Is there any chance you or a kind Looker support member could post a full example of this using intercept() and slope() ?

0 Likes

(Levi Davis) #6

It looks like I can’t post images in comments, but I can give you the code. This assumes that you’ve already done the first part of creating the dates in the future.

For the slope term: slope(${y}, ${x})

For the intercept: intercept(${y}, ${x})

Regression line: intercept + slope * ${x}

Where y is the variable you want to predict and x is the predictor variable (time in this case).

If you want to make an exponential model like in the example, do this.

For the slope term: slope(ln(${y}), ${x})

For the intercept: intercept(ln(${y}), ${x})

Regression line: exp(intercept + slope * ${x})

0 Likes

(Brecht Vermeire) #7

This isn’t always necessary, Looker can do the date filling at the query level. Just make sure you filter on a specific timeframe with an end date in the future. The result is the same:

0 Likes

(Riley Peterson) #8

Hello @brecht I was wondering if you could help me with a similar situation.

I am trying to add a vertical reference line to a graph for a date which is not within the min and max of dates plotted on the x-axis.

I have seen your post here about creating a reference line. However, in my case the if condition is never met because the x axis value for the reference line I want to add is always greater than the x axis dates.

I have read this^ post, but I’m unclear on ways I could add the future date to the sql result (x axis). My instinct is to just UNION the date to the end of the result… but I have not successfully managed to do this.

Some specifics:
I have the following lookml model:

explore: event {
  label: "Events, Offers & Zones"
  join: offer {
    type: inner
    sql_on: ${offer.event_id} = ${event.id} ;;
    relationship: one_to_many
  }
  join: offer_quantity {
    type: inner
    sql_on: ${offer_quantity.offer_id} = ${offer.id} ;;
    relationship: one_to_many

I am making visualizations that plot offer_quantity.datetime on the x axis and offer_quantity.qty_available on the y axis. They look like this (multiple lines due to different zones and price points which have been pivoted):

I want to make this plot extend out as far as event.datetime (event.datetime is greater than max(offer_quantity.datetime)) and add a reference line there.

I have tried to do this by UNIONing event.datetime to offer_quantity.datetime. I’m trying to trick it into thinking there is one more date in the future with value max(qty_available)), but have run into some lookml syntax errors. Do you have any advice or clever ways to achieve this result?

Riley

0 Likes

(Noel Wigdor) #9

@Levi_Davis1 - thanks so much for your clear explanation of how to do linear and exponential projections. If I begged, could you tell me how to do a logarithmic projection?

Many thanks!

0 Likes

(Levi Davis) #10

You wouldn’t even have to beg!

This should be easier than the exponential model. You just need to take the log of your x axis and put that into the regression formulas.

So it would just be:

For the slope term: slope(${y}, ln(${x}))

For the intercept: intercept(${y}, ln(${x}))

Regression line: intercept + slope * ln(${x})

The only trick then would be that you’d want to display x on the x axis and not ln(x), so just hide the ln(x) field with the “Hide from Visualization” option.

1 Like

#11

Where exactly are you implementing Steps 1 & 2?

0 Likes

(Levi Davis) #12

This is using the “Calculations” feature. See the picture in the original posting. But, instead of doing all that math, though, you can just use the slope() and intercept() functions.

0 Likes