Unable to schedule "All Results" to be delivered with the option “and results changed since last run”

schedule
etl

(Peter McCabe) #1

Unfortunately, we are not able to use Datagroups to schedule data as there is no option to control the time that a report’s issued when new data is loaded and this means that our report period filter script will not have updated before the schedule issues the report, thus the report’s are sent with old data. The dates we add data to the portal for certain clients are not always the same so we can’t just pick a set date or time to schedule data and need to rely on selecting the option “and results changed since last run”, however, when you select this option you can’t select “All Results” and are thus limited to the row limit of 5,000.
We urgently need a solution to this as we are now having to manually send these reports when we should be able to automate this and some of our clients are unhappy as they have scheduled reports that were missing data.


(bernard.kavanagh) #2

Hi Peter,

It should still be possible to automate this process using data groups as we can set the datagroup to check when data has been added by using a function like Select MAX(id). This will detect when new data has been added and should negate the other advanced options such as “and results changed since last run”.

What I would like to know before I set a test is what is the SQL of the datagroup you are currently using? We can alter the logic to suit your use case

You mentioned that “The dates we add data to the portal for certain clients are not always the same” can you confirm if you add data manually?

Bernard Kavanagh


(Peter McCabe) #3

Hi Bernard,

We have already tried that option but the problem is that there is no control over when the Select MAX(id) trigger will run and then deliver the report. This, in turn, means that the report runs before the report period filter script has run and so the report is sent out with out of date data and not the new data.

Thanks,

Peter


(Peter McCabe) #4

We have a process that uploads data to the database on a monthly or 4 weekly basis. The date this is completed changes from one month to the next due to various reasons. As a result, we can’t pick a date and schedule the report for then.


(Peter McCabe) #5

We need a datagroup trigger to update a user attribute for the report period filters, this needs to run first and then the second datagroup trigger to run e.g.

+datagroup: usage_loaded_trigger {

  • sql_trigger: SELECT count(*) FROM ${totals.SQL_TABLE_NAME} ;;
    +}

+datagroup: report_period_updated_trigger {

  • sql_trigger: SELECT {{ _user_attributes[‘default_report_period_1_month’] } ;;
    +}

(William Lane) #6

Hey Peter,

I might have a suggestion for you that I know has worked for others.

Fact of life with triggers and Datagroups is that they’re simple SQL statements that run every five mins by default and if the result changes, then the trigger happens/schedule runs. Triggers and scheduled Looks that watch the very same tables for the data for the Look itself AND for triggering will often trigger before the data is finished loading. This is just because the SELECT count(*) or SELECT MAX(id) or SELECT MAX(date) are run and results change as the table is loaded. That causes the Look to do a run during the data load. Sorry if that’s a bit unclear, I can try to explain in greater detail if you need but let me suggest the solution first as this might help clarify things.

Create a new table in your database. Lets call it: SCRIPTFINISHEDLOAD (terrible naming I know). This new table would have two columns:

TABLECALLED TIMELOADFINISHED
totals 2018-09-25 02:49:00

If you change your load script to update the above table’s TIMELOADFINISHED column for the totals table, you’ll have a record of when the load is finished.

You can then point your schedule datagroup’s trigger at this table:

 sql_trigger: SELECT TIMELOADFINISHED FROM SCRIPTFINISHEDLOAD WHERE TABLECALLED='totals' ;;

It’ll cause the schedule to only run when the load is finished because the result of the above SQL will only change when the script is done.


(Peter McCabe) #7

Hi William,

I have checked this with our database engineer and unfortunately this raises security concerns and is a non starter.

Is there a way to have a datagroup check the value of a user attribute as this would fix our issue?

Thanks,

Peter


(William Lane) #8

Super interested to hear where the security concern is with the approach I suggested. Could you give me more detail via support@looker.com?

I can certainly check that out. Have you tested it? I’ll need to put together a quick test some time today.

I know that the liquid variable _user_attributes is allowed in sql: parts of LookML so it might be possible.


(Peter McCabe) #9

Hi William,

I’ll ask what the concern was. We’ve tried to have a datagroup check the value of a user attribute but we couldn’t get this to work.

Thanks,

Peter


(Xin Bao) #10

Hi Peter,

Liquid ( including _user_attribute) is only allowed in sql, sql_on and sql_table_name based on our documentation: https://docs.looker.com/reference/liquid-variables#liquid_variable_definitions

Also, another article for your reference: User attributes and derived tables causes regenerator errors

We have a feature request to allow liquid in general (including _user_attribute) everywhere. I have +1 for you.

Meanwhile, I’m wondering if you have got any feedback from the data engineers regarding the security concerns using William’s approach?

Best,
Xin