Segment vs. Snowplow

etl

(Richard) #1

Does anyone have an opinion on the best way to track user behaviour on a website (i.e. JavaScript events)? I’m considering both Segment and Snowplow at the moment. Segment appears to be very well integrated with Looker and has a beautifully slick UI, but the price I’ve been offered is, to quote my boss, “a bit punchy”. This has forced me to consider going the open-source route with Snowplow, but I don’t want to make life too hard for myself.

Any advice appreciated as once we pick a technology I doubt we’ll be able to switch any time soon.


(Brett Sauve) #2

@Rich1000 I can’t speak to having used Segment or Snowplow, but Snowplow does have some Looker blocks available, if that makes the decision any easier:


(Erin Franz) #3

@Rich1000 We partner with both Segment and Snowplow and have Looker blocks for each. Snowplow’s open source option is great if you have the resources and technical expertise. They also offer services to get you up and running. Which tool is right for you depends on your needs. You can reach out to your Looker CSM for more details.


(George Fraser) #4

@Rich1000 One of the most important points about Snowplow is that it’s open-source and unlike Segment, it has a multi-vendor ecosystem:

We at Fivetran have recently been doing a bunch of work to align more closely with the open-source Snowplow collector, and Fishtown Analytics has come up with a great strategy where you can use a CNAME record to point your own domain snowplow.yoursite.com to Fivetran’s Snowplow collector. This way, you preserve the option to change to a different vendor or operate your own collector in the future, just by changing the CNAME.


(Tristan Handy) #5

Hey @Rich1000! I’m a couple of days late to this but hopefully it’s still relevant. @George_Fraser, thanks for calling us out in your comment, good stuff.

I’m the founder and CEO of Fishtown Analytics. We do a ton of work in the Looker / Redshift world, and event analytics is frequently a part of our projects. Over the past year we’ve done ~ 15 Snowplow installs. We’ve also worked with Segment a bunch because it’s fairly common that customers will have it installed prior to engaging with us. So, we’ve seen a lot of both.

We always, always recommend Snowplow over Segment. The reasoning behind this is simple. Your event data pipeline is critically important to your business—data about every single aspect of your business will flow through it, and your demands on it will grow supralinearly with the growth of your user base. Code triggering events will be in your website, your server-side applications, and your mobile apps. Vendor lock-in here is a nightmare. Changing data warehouses (say, moving from Redshift to BigQuery) is actually far easier than changing event data pipelines.

If you use Snowplow, you “own” it. It’s open source, and you can do whatever you want with the code. We almost always use Fivetran to host the Snowplow collector for the clients we work with because it’s a very cost-effective solution for early in a startup’s life. However, as you scale, you have the choice to host the infrastructure yourself at any point.

I have seen multiple companies happily use Mixpanel / Heap / Segment prior to having their growth really explode, only to realize that as they 10x in users, their event tracking bill also 10xs. This is an untenable position, from my perspective, and outweighs any other considerations when choosing your event data pipeline. The companies that I referred to above all had to undertake very significant (and costly) migrations to replace a deeply embedded tool exactly when they needed their engineering resources to support user growth.

Aside from cost / ownership considerations, we just really like Snowplow and find it to be a superior tool. This is for a lot of really small reasons.

  • Snowplow is append-only, which makes its writes to your warehouse very efficient.
  • Snowplow has a single events table, which is far more convenient for analysis than Segment’s strategy of creating a table for each individual event.
  • Snowplow goes deeper in its web analytics functionality than Segment does; it’s a drop-in replacement not only for Segment, but also for something like Google Analytics Premium. It comes out of the box with auto-track, page pings (to measure time on site), and lots of built-in functionality to resolve referring URLs to channels and sources in the way you would see in the GA interface.
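
To make the single-events-table and page-ping points above a bit more concrete, here is a rough Python sketch. It is only an illustration: the field names (`event`, `page_url`, `user_id`), the 10-second heartbeat, and both helper functions are assumptions made for this example, not Snowplow's actual schema or API. The idea is that with every event in one table, questions like "how many of each event?" and "how long did people stay on each page?" are each a single pass over the same rows.

```python
from collections import Counter, defaultdict

# Assumed interval between page pings; in a real tracker this is configurable.
HEARTBEAT_SECONDS = 10


def count_by_event(events):
    """Count rows per event type in a single events table."""
    return dict(Counter(e["event"] for e in events))


def engaged_seconds_by_page(events, heartbeat=HEARTBEAT_SECONDS):
    """Estimate engaged time per page as (number of page pings) * heartbeat."""
    totals = defaultdict(int)
    for e in events:
        if e["event"] == "page_ping":
            totals[e["page_url"]] += heartbeat
    return dict(totals)


# Hypothetical rows standing in for one events table.
events = [
    {"event": "page_view", "user_id": "u1", "page_url": "/pricing"},
    {"event": "page_ping", "user_id": "u1", "page_url": "/pricing"},
    {"event": "page_ping", "user_id": "u1", "page_url": "/pricing"},
    {"event": "page_view", "user_id": "u2", "page_url": "/docs"},
    {"event": "page_ping", "user_id": "u2", "page_url": "/docs"},
]

event_counts = count_by_event(events)
time_on_page = engaged_seconds_by_page(events)
```

With a table per event type, the first question would need a union across tables before any counting could start.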

Always happy to talk more about this if you’re curious, engagement or no.

The only other thing I’d add is that, while the Looker blocks mentioned above are a nice convenience, they’re really not production-ready. This is because Looker’s PDTs do not support incremental rebuilds. When you have millions or billions of events, you can’t afford to re-sessionize all of your events every time you want a refresh. This absolutely crushes your warehouse and can take many hours, by which point it’s time to start all over again. Whenever we install Snowplow, we use an open-source tool called dbt, which is similar to Looker PDT functionality but supports incremental model rebuilds (among other useful features). We’ve published open source code that sessionizes Snowplow events incrementally here. Feel free to steal it :slight_smile:
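
For anyone wondering what "incremental" sessionization means in practice, here is a minimal Python sketch of the idea. It is not how dbt or the published models do it (those work in SQL inside the warehouse); the function name, the `(user_id, timestamp)` event shape, the `state` dictionary, and the 30-minute inactivity timeout are all assumptions for illustration. The point is that each run only touches new events and carries a small amount of state forward, instead of reprocessing the full history.

```python
# Assumed inactivity timeout: a gap longer than this starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def sessionize_incremental(new_events, state, gap=SESSION_GAP_SECONDS):
    """Assign session ids to new events only.

    new_events: iterable of (user_id, unix_timestamp) tuples from this run.
    state: dict mapping user_id -> (last_event_ts, session_number). It is
           updated in place and carried over between runs, so earlier
           events never need to be reprocessed.
    """
    out = []
    # Process in timestamp order so gaps are measured between neighbours.
    for user_id, ts in sorted(new_events, key=lambda e: e[1]):
        last_ts, session_no = state.get(user_id, (None, 0))
        if last_ts is None or ts - last_ts > gap:
            session_no += 1  # first event for this user, or gap exceeded
        state[user_id] = (ts, session_no)
        out.append((user_id, ts, f"{user_id}-{session_no}"))
    return out


state = {}
# First refresh: three events across two users.
first_run = sessionize_incremental([("u1", 0), ("u1", 600), ("u2", 100)], state)
# Second refresh: only the new events are passed in.
second_run = sessionize_incremental([("u1", 900), ("u1", 5000)], state)
```

Note that this sketch assumes events arrive in order across runs. Late-arriving events would need extra handling, such as reprocessing a recent window.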