Parsing Logs with Spark Streaming and Looker

spark
etl

(Scott Hoover) #1

In a recent webinar with Daniel @Mintz, I stepped through a streaming pipeline that ingests raw logs with Flume, processes them in Spark Streaming, and persists the processed data as Parquet in HDFS. Daniel then demonstrated how Looker can connect to Spark SQL and query these data via JDBC.
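
To give a rough idea of what the "process the raw logs" step involves, here is a minimal sketch in plain Python of turning one raw log line into a typed record. It is an illustration, not the code from the webinar: the Apache Common Log Format, the `LOG_PATTERN` regex, and the `parse_line` helper are all assumptions, since the post does not say which log format the pipeline handles.

```python
import re
from datetime import datetime

# Assumption: lines follow the Apache Common Log Format. The actual
# pipeline may parse a different format.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Hypothetical helper: return a dict of typed fields, or None if
    the line does not match the assumed format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert string captures into typed values so they map cleanly
    # onto columns in a columnar format such as Parquet.
    fields["ts"] = datetime.strptime(fields["ts"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_line(sample)
```

In a streaming job, a function like this would typically be mapped over each incoming batch, with lines that return `None` filtered out before the records are written out.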

I thought it might be useful to open this code up to people who might be investigating scalable event-data pipelines and SQL-on-Hadoop solutions. Take a look. Feel free to clone, contribute, or comment!