Your company's data team organization

etl

(Lawrence) #1

Earlier this year I asked a question surveying everyone here about their company’s data stack, and I thought it was a fruitful conversation.

As my company, Payoff, has matured, I’ve realized more and more that something just as important, and possibly more so, is how the data team is structured across the company. When I say data team, I’m referring to everyone who works on the full data stack: the people building out the infrastructure, building data integrations and, of course, analyzing the data itself.

So how is your data team structured? Does it have a clear delineation between data engineers (who generate the data in the application, aka the website), DBAs (who work on the data pipes or ETL) and data analysts/scientists (who derive business value from the data)? Or do you have an organization where the line between DBAs and data scientists is blurred, kind of like Stitch Fix does it? Their motto is “engineers shouldn’t write ETL.”

To kick things off I thought I’d share how my company is structured. We have:

  • Data Engineers: Backend engineers who are creating production data
  • DBAs: Those who are only responsible for building out data integrations and bringing first-party and third-party data into a centralized data repository
  • Data Analysts/Scientists: End-users of the data who run analysis, build Looks/Dashboards, create markdown output, automate reporting, and build models

Pros

  • Clear delineation of responsibilities. People generally do what they are good at.
  • Data scientists and analysts do not have to worry about ETL. They can focus their energies on analysis.

Cons

  • When the ETL breaks or there is some issue with it, it’s hard for the data scientists to pinpoint where the issue is coming from, since in this setup the ETL is essentially a black box to them
  • ETL can be somewhat dry work, as the DBAs are often not informed of the output or end result of the data they are working with. In this structure, they function kind of like a middleman (albeit a very important one!), taking data and passing it along
  • The DBA often does not have enough context, either upstream about the production data or downstream about which data is important for the business

I’m curious how other companies divvy up the data work across these different roles and manage the overlap between responsibilities.


(Devin) #2

@segahm helped build and design part of our ETL. He did a great job understanding our end-reporting needs in Looker and designing the correct ETL processes to meet those needs.


(vivek) #3

Since Looker has an import feature, we too have implemented end-to-end automation combining ETL and Looker, for end-to-end delivery of the product.


(Aron Clymer) #4

I love this topic - thanks for posting it, Lawrence. I was Head of Data at PopSugar for 1.5 years (I recently left and am now advising & consulting), and prior to that I led a 20-person data team at Salesforce. At both companies I found it very hard to find people who could develop across the full data stack. If you have such rock stars working at your company, a lot of the “cons” that you pointed out do indeed go away. However, most companies should focus more on creating a collaborative process that fosters alignment across all data professionals. The business goals should be clear to everyone involved. The data lead should have projects scoped out and hold meetings with all team members prior to execution. I’m a big fan of minimal documentation, so I try to create one-page “tear sheets” for any decent-sized project. I encourage cross-training during this process so that you can move towards that goal of full-stack capability: people who can model data, build ETL, code in LookML, and do the analysis. To a certain degree, I like to keep Data Scientists with advanced statistical modeling skills focused on building models, with data engineers supporting them with some of the heavy lifting. Hope this helps - happy to discuss further anytime.