Hey @Rich1000! I’m a couple of days late to this but hopefully it’s still relevant. @George_Fraser, thanks for calling us out in your comment, good stuff.
I’m the founder and CEO of Fishtown Analytics. We do a ton of work in the Looker / Redshift world, and event analytics is frequently a part of our projects. Over the past year we’ve done ~15 Snowplow installs. We’ve also worked with Segment a bunch, because it’s fairly common that customers will have it installed prior to engaging with us. So, we’ve seen a lot of both.
We always, always recommend Snowplow over Segment. The reasoning behind this is simple. Your event data pipeline is critically important to your business—data about every single aspect of your business will flow through it, and your demands on it will grow supralinearly with the growth of your user base. Code triggering events will be in your website, your server-side applications, and your mobile apps. Vendor lock-in here is a nightmare. Changing data warehouses (say, moving from Redshift to BigQuery) is actually far easier than changing event data pipelines.
If you use Snowplow, you “own” it. It’s open source, and you can do whatever you want with the code. We almost always use Fivetran to host the Snowplow collector for the clients we work with because it’s a very cost-effective solution for early in a startup’s life. However, as you scale, you have the choice to host the infrastructure yourself at any point.
I have seen multiple companies happily use Mixpanel / Heap / Segment prior to having their growth really explode, only to realize that as their users 10x, their event tracking bill also 10xs. This is an untenable position, and from my perspective that risk outweighs any other consideration when choosing your event data pipeline. The companies I referred to above all had to undertake very significant (and costly) migrations to replace a deeply embedded tool exactly when they needed their engineering resources to support user growth.
Aside from cost / ownership considerations, we just really like Snowplow and find it to be a superior tool, for a lot of small reasons:
- Snowplow is append-only, which makes its writes to your warehouse very efficient.
- Snowplow has a single events table, which is far more convenient for analysis than Segment’s strategy of creating a table for each individual event.
- Snowplow goes deeper in its web analytics functionality than Segment does; it’s a drop-in replacement not only for Segment, but also for something like Google Analytics Premium. It comes out of the box with auto-track, page pings (to measure time on site), and lots of built-in functionality to resolve referring URLs to channels and sources in the way you would see in the GA interface.
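To make the single-table point concrete, here’s a rough sketch of what analysis looks like against one unified events table (column names follow Snowplow’s canonical event model, but treat the specifics as illustrative):

```sql
-- One query over Snowplow's single atomic.events table covers
-- every event type at once.
select
    event,                                          -- event type, e.g. 'page_view'
    date_trunc('day', collector_tstamp) as event_day,
    count(*) as event_count,
    count(distinct domain_userid) as unique_users
from atomic.events
group by 1, 2
order by 2, 1;

-- With a table-per-event layout, the equivalent query means
-- UNION ALLing every individual event table you care about,
-- and updating the query each time a new event is added.
```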
Always happy to talk more about this if you’re curious, engagement or no.
The only other thing I’d add is that, while the Looker blocks mentioned above are a nice convenience, they’re really not production-ready. This is because Looker’s PDTs do not support incremental rebuilds. When you have millions or billions of events, you can’t afford to re-sessionize all of your events every time you want a refresh. This absolutely crushes your warehouse and can take many hours, at which point you’re ready to start all over again. Whenever we install Snowplow, we use an open-source tool called dbt, which is similar to Looker’s PDT functionality but supports incremental model rebuilds (among other useful features). We’ve published open source code that sessionizes Snowplow events incrementally here. Feel free to steal it.
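To give a feel for what “incremental rebuild” means in dbt terms, here’s a minimal sketch of an incremental model (the column names assume Snowplow’s canonical event schema, and this is deliberately far simpler than our published sessionization logic):

```sql
-- A minimal dbt incremental model: on each run, only events that
-- arrived since the last run are scanned and merged in, instead of
-- rebuilding the whole table from scratch.
{{
    config(
        materialized='incremental',
        unique_key='event_id'
    )
}}

select
    event_id,
    domain_userid,
    domain_sessionidx,
    collector_tstamp,
    page_urlpath
from atomic.events

{% if is_incremental() %}
  -- {{ this }} refers to the already-built target table
  where collector_tstamp > (select max(collector_tstamp) from {{ this }})
{% endif %}
```

That `is_incremental()` branch is the piece Looker PDTs are missing: a PDT always rebuilds the full result set, while dbt can restrict each refresh to the new slice of events.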