Revature 200413

Data Engineering with Java & Apache Spark

Project 2

A data pipeline with several modular components, including but not limited to:

The pipeline is intended to be a series of applications that run in succession: reading a data source from an S3 bucket, deploying a Spark job on a cluster for analysis, and saving the results to a SQL database. Some recommendations for organizing and extending the project:
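
To illustrate the idea of stages that run in succession, here is a minimal sketch in plain Java. It is not the project's implementation: the `Stage` interface, the class name `PipelineSketch`, and the `extract`, `analyze`, and `load` methods are all hypothetical names chosen for this example. The stages use in-memory stand-ins (a string instead of an S3 object, a word count instead of a Spark job, and formatted rows instead of SQL inserts), so the sketch runs without any external services.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class PipelineSketch {

    /** One pipeline step: consumes the previous step's output and produces its own. */
    interface Stage<I, O> extends Function<I, O> {
        /** Chains this stage with the next one so they run in succession. */
        default <R> Stage<I, R> then(Stage<O, R> next) {
            return input -> next.apply(this.apply(input));
        }
    }

    // Assumption: stands in for reading a data source from S3.
    // Here the "source" is just a string holding newline-separated records.
    static Stage<String, List<String>> extract() {
        return source -> List.of(source.split("\n"));
    }

    // Assumption: stands in for the Spark analysis job.
    // Counts case-insensitive word occurrences, sorted by word.
    static Stage<List<String>, Map<String, Integer>> analyze() {
        return lines -> {
            Map<String, Integer> counts = new TreeMap<>();
            for (String line : lines) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word.toLowerCase(), 1, Integer::sum);
                    }
                }
            }
            return counts;
        };
    }

    // Assumption: stands in for saving results to a SQL database.
    // Produces one "word,count" row per result instead of issuing inserts.
    static Stage<Map<String, Integer>, List<String>> load() {
        return counts -> {
            List<String> rows = new ArrayList<>();
            counts.forEach((word, n) -> rows.add(word + "," + n));
            return rows;
        };
    }

    /** Runs the three stages in order: extract, then analyze, then load. */
    static List<String> run(String source) {
        return extract().then(analyze()).then(load()).apply(source);
    }

    public static void main(String[] args) {
        System.out.println(run("spark java\njava sql"));
    }
}
```

Because each stage only depends on the type of its input and output, a stand-in can in principle be swapped for a real component (an S3 reader, a Spark job, a JDBC writer) without changing the other stages, which is one way to keep the components modular.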

Features

Tech Stack

Presentation