ETL Tool Recommendations


(Lucas Thelosen) #1

I am thinking about getting a commercial ETL tool. We have a bunch of data sources (Facebook, Salesforce, NetSuite etc.). They are important and right now we have jobs scheduled in all kinds of places that are owned by multiple employees. In an effort to operationalize this, I am looking at some ETL tools.

Do you guys have any recommendations? Does anyone know of “the Looker of ETL tools?”

Thanks,
Lucas


(Chris) #2

We’ve had very good experiences with datavirtuality(.com), which in your case offers pre-built connectors to Facebook, Salesforce, and many others. They’ve become a native Looker connector (still a bit beta though).


(Kevin Marr) #3

If you want it to go into PostgreSQL, MySQL, or Redshift, check out Fivetran. They’re a hot new player in the space. I’ve been really impressed with their speed and effectiveness.

Another name that comes to mind is Xplenty, although I know less about them.

I’ve also heard good things about Datavirtuality.


(Kevin Marr) #4

Alooma is also a great option; check them out: https://www.alooma.io/


(Daniel Weitzenfeld) #5

I’ve had a great experience with Fivetran. They pair well with Looker because they focus primarily on the E and L steps of the ETL, leaving the T step to Looker, which makes it easy with PDTs.


(Gruen) #6

@weitzenfeld: Once you got set up, were there any issues or failures that required your attention?


(Daniel Weitzenfeld) #7

@gruen: nope, it has been very smooth sailing.


#8

@kevin or @weitzenfeld - have you seen any options similar to Fivetran that work with Oracle on AWS RDS, moving data to Redshift? I was looking at Attunity but am curious if you know of other options.


(Daniel Weitzenfeld) #9

@zhill I haven’t - I imagine one of the bigger players like Informatica would work. FWIW - Fivetran is going to support Oracle in 3-6ish months, if you can wait that long.


(Kevin Marr) #10

@zhill Xplenty might support that, although their website doesn’t seem to suggest anything. And as @weitzenfeld mentioned, Informatica (Cloud) might also be a viable option.


(Jerome Myers) #11

We are pretty dependent on RDS Oracle instances, so we decided to build our own tool to pull out the data. We are in the testing phase of our ETL app and it’s looking promising.


(Nouras Haddad) #12

RJMetrics launched a new product, RJMetrics Pipeline, into open beta, and we are partnering with Looker.

It is a self-serve product, and you can set up your first sync in 5 minutes. The first 5 million rows/events per month are free, forever.

We would love it if you tried it out and gave us your thoughts.

Thanks,
Nouras


(Bridge ) #13

@kevin or @weitzenfeld Are you all still in the “I would recommend Fivetran” camp? We’ve tested them out for some of our other tools, such as HubSpot, and it seems okay (some tables missing/partial loads), but we’d like faster refresh speeds.


(Bridge ) #14

Another player is Segment, which has a Salesforce pipeline in beta, with intentions to add many more integrations soon. We’re testing them out, so I can’t speak to functionality yet.


(Lawrence) #15

What integrations does Segment allow you to ETL into Redshift, @Bridge?


(Corey Maher) #16

I’m in the same boat - In addition to the ones mentioned above we’re looking at:

Alooma.com looks the coolest so far by a long shot, but they are very new and quite expensive.


(Daniel Weitzenfeld) #17

Yes, I would still recommend Fivetran, but I haven’t tried any of the (growing number of) alternatives.
Our most time-sensitive integration is with MySQL, and Fivetran refreshes it every 10-15 minutes. For some integrations, the limiting factor is the third party - e.g. I wouldn’t mind if Salesforce synced more frequently, but SFDC limits your daily API hits, so it’s not really an option.
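
To make the API-limit point concrete, here is a rough sketch of the arithmetic. The quota and calls-per-sync figures are made-up placeholders, not actual Salesforce or Fivetran numbers, and the function name is hypothetical.

```python
import math


def min_sync_interval_minutes(daily_api_limit: int,
                              calls_per_sync: int,
                              reserved_fraction: float = 0.0) -> int:
    """Smallest whole-minute gap between syncs that stays inside a daily API quota.

    All inputs are assumptions you would need to look up for your own account:
    daily_api_limit   -- API calls the third party allows per day
    calls_per_sync    -- calls one full sync consumes
    reserved_fraction -- share of the quota kept free for other integrations
    """
    if daily_api_limit <= 0 or calls_per_sync <= 0:
        raise ValueError("limits must be positive")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")

    usable_calls = int(daily_api_limit * (1.0 - reserved_fraction))
    syncs_per_day = usable_calls // calls_per_sync
    if syncs_per_day == 0:
        raise ValueError("quota too small for even one sync per day")

    # Spread the affordable syncs evenly over 24 hours, rounding up to be safe.
    return math.ceil(24 * 60 / syncs_per_day)


# Hypothetical numbers: 15,000 calls/day, 200 calls per sync.
print(min_sync_interval_minutes(15_000, 200))
```

If other tools share the same quota, passing something like `reserved_fraction=0.5` shows how quickly the achievable refresh rate drops.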


(Bridge ) #18

Just to chime in, when we compared Segment to Fivetran for Salesforce, the amount of data available (based on number of tables) was far more comprehensive in Fivetran than in Segment.


(Brad Ruderman) #19

If anyone needs scripts for replicating Salesforce to Postgres (easily transferable to other environments), let me know. They aren’t the best coded but they do the job.


(Segah A. Mir) #20

A lot of names are thrown around here, so I feel this thread needs a bit more structure.

First of all, ETL solutions can roughly be categorized as:

  • connectors
  • job workflow management

Most of what has been discussed above refers to connectors. In the job workflow category there are a lot more free open-source solutions - these are the solutions that focus on scheduling, dynamic interdependencies between data flows, and complex streaming requirements. Some paid solutions offer both job flows and connectors. And, of course, behind most connectors there is some implementation of job workflow management - it is just not exposed to the end user.
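
As a toy illustration of what "interdependencies between data flows" means in practice, the sketch below orders a handful of jobs so that each runs only after the jobs it depends on. It uses Python's standard `graphlib` module (Python 3.9+); the job names are invented for the example, and a real workflow manager also handles scheduling, retries, and monitoring.

```python
from graphlib import TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order for the given jobs.

    `dependencies` maps each job name to the set of jobs that must finish
    before it starts. Raises graphlib.CycleError if the jobs depend on each
    other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: two extracts feed one load, which feeds a report build.
jobs = {
    "extract_salesforce": set(),
    "extract_facebook": set(),
    "load_warehouse": {"extract_salesforce", "extract_facebook"},
    "build_report_tables": {"load_warehouse"},
}

for job in run_order(jobs):
    print(job)
```

The two extracts have no dependency on each other, so either may come first; only the relative order of dependent jobs is guaranteed.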

Second, the devil is in the details. It’s typically not enough to just have the data appear in your data warehouse initially. You want to test a number of scenarios. So learn about the solution’s ability to handle:

  • updates
  • streaming data
  • schema changes
  • datatype changes
  • lost records (just because you don’t see it does not mean it does not happen)
  • etc
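
Two of the checks above (lost records, and schema or datatype changes) are easy to spot-check yourself. The following is a minimal sketch, assuming you can pull a list of primary keys and a column-to-type mapping from both the source and the warehouse; the function names and sample data are hypothetical.

```python
def find_lost_keys(source_keys, warehouse_keys) -> set:
    """Primary keys present in the source but missing from the warehouse."""
    return set(source_keys) - set(warehouse_keys)


def diff_schema(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column: datatype} snapshots of the same table."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Columns present in both snapshots whose datatype differs.
        "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }


# Hypothetical sample data.
lost = find_lost_keys(source_keys=[1, 2, 3, 4], warehouse_keys=[1, 2, 4])
print(lost)

changes = diff_schema(
    old={"id": "integer", "amount": "integer", "notes": "text"},
    new={"id": "integer", "amount": "numeric", "created_at": "timestamp"},
)
print(changes)
```

Running checks like these on a schedule, rather than once at setup, is what catches the silent failures.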

As a Looker employee, it would be unethical for me to recommend any one solution. That said, this is a very popular question, so I have previously led some research internally to understand the landscape. The criteria that we identified and that might be useful to others were:

  • Time to “Ready Data”
  • Data Source Diversity
  • Transparency / Monitoring Capability
  • Support for Redshift features or other Data Warehouses (BigQuery, Vertica)
  • Affordability
  • Simplicity

Finally, in some cases, you might be interested in HIPAA compliance and other stuff like that. Here you will likely be limiting your cloud vendors to only those who host a separate instance for you (just as you do with Looker).

Of course, I am happy to provide guidance in individual cases where I understand the data. So @brad @Bridge @Lucas, PM me if you have further questions specific to your datasets.