Data Engineering with Java & Apache Spark
Welcome to the docs repository for Revature’s 200413 Big Data/Spark cohort. Here you will find weekly topics, useful resources, and project requirements.
Every week, we will focus on a particular technology or theme to add to our repertoire of competencies. These topics feature heavily in the weekly assessments and QC meetings, so self-study and hands-on exploration will be necessary.
Each week may include a list of topic-based questions, which you should study and be prepared to answer in an assessment, whether in a meeting or a quiz. Associates are expected to answer at least five of these questions on the weekly discussion board and to respond to other posts with suggestions that improve or clarify them.
Google Doc - Contains our standard schedule, QC assessments overview and links, and a list of important contacts.
This cohort will prioritize individual and group-based project work:
Each project has a list of required features, both functional and operational. We strongly suggest finishing your MVP (minimum viable product) as early as possible and then iterating on it with new features. Plan ahead, reach out whenever you need guidance, and offer your own to those in need.
To maximize resources and minimize troubleshooting, please perform a clean install or refresh of your operating system. Update your system, enable VT-x in the BIOS if possible, and uninstall all unnecessary programs. Set up your development environment for Java, Git, and Maven as soon as possible. In later weeks we will also require PostgreSQL, Docker, SSH, curl, and, of course, Apache Spark. Refer to this README or the links in each week's topics and resources document to stay up to date on the tools and programs needed for project work. You are responsible for maintaining your environment throughout the program.
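To see at a glance which of these tools are already on your PATH, a small Java program can search the PATH directories for each command name instead of running every tool by hand. This is a hypothetical helper, not part of the course materials: the class name ToolCheck and the executable names it looks for (for example psql for PostgreSQL) are assumptions, and finding a file only shows that it is on the PATH, not that it is configured correctly.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// ToolCheck.java: hypothetical helper that looks for the tools named in this README
// on your PATH without running any of them.
public class ToolCheck {

    // Extensions to try: Windows lists executable extensions in PATHEXT
    // (for example ".COM;.EXE;.BAT;.CMD"); Unix-like systems use none.
    static List<String> executableExtensions(String pathext) {
        List<String> exts = new ArrayList<>();
        if (pathext == null || pathext.isEmpty()) {
            exts.add("");
        } else {
            for (String e : pathext.split(";")) {
                if (!e.isEmpty()) {
                    exts.add(e);
                }
            }
        }
        return exts;
    }

    // Returns the first executable file named tool + extension found in the
    // directories of the given PATH string, or null if there is none.
    static File findOnPath(String tool, String path, List<String> extensions) {
        if (path == null) {
            return null;
        }
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String ext : extensions) {
                File candidate = new File(dir, tool + ext);
                if (candidate.isFile() && candidate.canExecute()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed command names for the tools this README mentions.
        String[] tools = {"git", "java", "javac", "mvn", "psql", "docker", "ssh", "curl"};
        String path = System.getenv("PATH");
        List<String> exts = executableExtensions(System.getenv("PATHEXT"));
        for (String tool : tools) {
            File found = findOnPath(tool, path, exts);
            System.out.printf("%-7s %s%n", tool, found == null ? "NOT FOUND" : found.getPath());
        }
    }
}
```

Compile and run it with javac ToolCheck.java followed by java ToolCheck. A tool reported as NOT FOUND is either not installed yet or installed in a directory that is missing from PATH; a new terminal window may be needed to pick up PATH changes made by an installer.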
Install Chocolatey: open PowerShell as an administrator and run the following commands:
Set-ExecutionPolicy AllSigned
Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
Then open a new PowerShell window as an administrator and run the following commands:
choco install git
choco install adoptopenjdk8
choco install maven
choco install vscode
choco install eclipse
choco install intellijidea-community
To confirm all tools are properly installed and configured, be sure the following commands return no errors:
git --version
java -version
javac -version
mvn -v
Both java and javac should report Java 1.8 and no other version.
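The commands above show which java and javac your shell finds. As an extra check, the hypothetical VersionCheck class below (an illustration, not part of the course materials) prints the version of the JVM that actually runs it. It relies on the standard java.specification.version system property, which is "1.8" on Java 8 and "9", "11", "17", and so on for later releases.

```java
// VersionCheck.java: hypothetical sketch that reports which Java version the
// `java` launcher on your PATH actually runs.
public class VersionCheck {

    // Java 8 reports its specification version as "1.8"; Java 9 and later
    // report "9", "10", "11", and so on.
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        String full = System.getProperty("java.version");
        System.out.println("java.specification.version = " + spec);
        System.out.println("java.version               = " + full);
        System.out.println(isJava8(spec)
                ? "OK: running on Java 8."
                : "Warning: expected Java 1.8; check your PATH and JAVA_HOME.");
    }
}
```

Java 8 cannot run a .java source file directly, so compile first with javac VersionCheck.java and then run java VersionCheck. If javac on your PATH is newer than Java 8 while java is Java 8, the second step fails with UnsupportedClassVersionError, which is itself a sign that the two are mismatched.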
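For convenience, the core tools (Git, JDK 8, Maven, and VS Code) can be installed at once with the following command; append eclipse or intellijidea-community if you also want those IDEs: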
choco install -y git adoptopenjdk8 maven vscode