Revature 200413


Data Engineering with Java & Apache Spark

Java is an object-oriented programming language and platform originally developed by Sun Microsystems (since acquired by Oracle). Following the principle of WORA (Write Once, Run Anywhere), a compiled Java application can run on any platform that has a Java runtime. Flexible, popular, and well-supported, Java helps developers write scalable client-server web applications, desktop and mobile applications, and frameworks and libraries.


Programming and Compiling

Running most Java applications only requires the JRE (Java Runtime Environment), but writing and compiling them requires the JDK (Java Development Kit). The JRE provides Java’s standard libraries as well as a JVM (Java Virtual Machine); the JDK provides all of that plus javac, the compiler. Java source code is written in text files with the .java extension. javac compiles it into bytecode stored in .class files, and the JVM then executes that bytecode, translating it into low-level instructions for the operating system.

Since Java 6, all Java programs not run inside a container (such as a Servlet Web Container) start and end with the main method. The class containing the main method can have any name, but the method itself should always be named main:

class Example {
    public static void main(String[] args) {
        System.out.println("Num args:" + args.length);
    }
}

We can compile this code into a .class file of the same name:

javac Example.java


And to run the resulting Example.class file:

java Example

The javac command takes the path to a source file, while the java command takes a fully qualified class name that it resolves against the class path. If Example declares package com.demo; on its first line, place the source file in a com/demo/ directory and then run, from the directory above it:

javac com/demo/

java com.demo.Example

From here we can add packages and imports, expanding the application into a set of interacting objects. By default, the compiler implicitly imports only the java.lang package from the standard library. The -help flag displays the available options. For example, the following compiles using UTF-8 source encoding while targeting Java 8 (1.8) language features and bytecode:

javac -encoding UTF-8 -source 8 -target 8

Object-Oriented Programming

Although Java accommodates several paradigms, OOP is the foundation for most applications. In OOP, a program is organized into objects that encapsulate related fields (representing their state) and methods (usually to control that state or perform related functions). To define objects, Java reserves the keyword class (not to be confused with the .class file extension), which serves as their blueprint. An object in Java is an instance of a class in memory, and every class implicitly inherits from the Object superclass, which provides useful convenience methods such as equals() and toString(). Java popularized several ‘Pillars’ of OOP design theory. While the number varies between OOP languages, Java focuses on four: abstraction, encapsulation, inheritance, and polymorphism.


A value is stored and identified in memory by a variable. Variables have a name that makes it possible to access the value, and a type that defines what sort of value it stores.

int variableName = 64;
String txtVar = "Hello World";

Primitive data types

Java handles two kinds of datatypes: primitives and references. Primitives are variables that store simple values directly. There are eight in Java: byte, short, int, long, float, double, char, and boolean.
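The practical difference between the two kinds of datatypes shows up on assignment. The following is a minimal sketch, using a hypothetical ValueVsReference class that is not part of the original notes: assigning a primitive copies its value, while assigning a reference type (here an array) copies only the reference.

```java
// Hypothetical demo class; names are illustrative only
class ValueVsReference {
    static int[] demo() {
        int a = 5;
        int b = a;              // copies the value
        b++;                    // a is unaffected and stays 5

        int[] first = {1, 2, 3};
        int[] second = first;   // copies the reference, not the array
        second[0] = 99;         // the change is visible through both variables

        return new int[] {a, b, first[0]};
    }
}
```

Calling ValueVsReference.demo() returns the values of a, b, and first[0] so the effect of each assignment can be inspected.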

Reference types

Reference types store the memory address of more complex data held in the heap. Reference types include classes, interfaces, enums, and arrays.

Naming variables

Scopes of a variable

A variable’s reference will only exist within the context of its declared scope, which is based on the location of its declaration.

Be aware of shadowing: when two variables in different scopes share a name, the one in the innermost scope hides the other.
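Scope and shadowing can be sketched as follows. ScopeDemo and its members are hypothetical names chosen for illustration: a local variable shadows a field of the same name, and a loop variable ceases to exist once its block ends.

```java
// Hypothetical demo class; names are illustrative only
class ScopeDemo {
    private int count = 10;           // instance scope: lives as long as the object

    int shadowed() {
        int count = 1;                // local variable shadows the field
        return count;                 // refers to the local variable
    }

    int field() {
        return this.count;            // 'this' reaches the field explicitly
    }

    static int blockScope() {
        int total = 0;
        for (int i = 0; i < 3; i++) { // i exists only inside the loop
            total += i;
        }
        // Referencing i here would not compile: it is out of scope
        return total;
    }
}
```

Qualifying a field with this is the usual way to avoid accidental shadowing, which is why constructors commonly write this.count = count.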


Methods accept a list of arguments known as parameters and can return a value. They are used to implement repeatable, consistent actions on variable input, much like math functions.

public int myMethod(int a, int b);
public int myMethod(int a);
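The two signatures above share a name but differ in their parameter lists, which is known as overloading. A minimal sketch, assuming a hypothetical OverloadDemo class with a sum method: the compiler picks the overload from the number and types of the arguments.

```java
// Hypothetical demo class; names are illustrative only
class OverloadDemo {
    // Two int parameters
    static int sum(int a, int b) {
        return a + b;
    }

    // One int parameter: delegates to the two-argument version
    static int sum(int a) {
        return sum(a, 0);
    }

    // Same name, different parameter types
    static double sum(double a, double b) {
        return a + b;
    }
}
```

Note that overloads cannot differ by return type alone; the parameter list must differ.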


Classes define not only an object’s fields and methods, but also how it is instantiated, through special methods called constructors. A constructor has no return type and shares its name with its class. If a class declares no constructor, Java automatically provides a no-argument (default) constructor. However, as soon as you define any constructor, the automatically provided one is no longer generated.

While a constructor may be private (as used for singletons), it may not be final, static, or abstract.
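These constructor rules can be sketched with two hypothetical classes, Coordinate and Registry, neither of which appears in the original notes. Coordinate must declare its own no-argument constructor because it already defines another one; Registry uses a private constructor to implement a simple singleton.

```java
// Hypothetical classes; names are illustrative only
class Coordinate {
    final int x;
    final int y;

    Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Because a constructor is defined above, the compiler no longer
    // supplies a no-argument one; it has to be written explicitly.
    Coordinate() {
        this(0, 0);   // chain to the other constructor
    }
}

final class Registry {
    private static final Registry INSTANCE = new Registry();

    // Private constructor: no code outside this class can call new Registry()
    private Registry() { }

    static Registry getInstance() {
        return INSTANCE;
    }
}
```

Every call to Registry.getInstance() returns the same object, which is the defining property of a singleton.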

Access modifiers

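Top-level classes can only be public or default (package-private). Access levels do not cascade from a class to its members: any field or method without an explicit modifier is default. When a subclass overrides an inherited method, it can only keep the same access level or make it less restrictive.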



Java arrays are special reference types that store elements of a single type in sequence. A pair of brackets defines an array of some data type and can be written anywhere after the type:

// One-Dimensional Arrays
int[] arrayOne;
int []arrayTwo;
int arrayThree[];

// Two-Dimensional Arrays (identifiers cannot start with a digit)
int[][] array2DOne;
int array2DTwo[][];
int []array2DThree[];


Unlike some other languages, Java does not allow direct memory access to its arrays. Arrays are also of fixed size once created, either with the new keyword or with an array literal.

// One-Dimensional Arrays
int[] instancedArray = new int[3];
int[] literalArray = {1, 2, 3};

// Two-Dimensional Arrays
int[][] instanced2DArray = new int[3][4];
int[][] literal2DArray = { { 1, 2 }, { 3, 4 }, { 5, 6 } };


Java for loops can iterate through arrays like most other languages:

// One-Dimensional Arrays
for (int i = 0; i < arrayOne.length; i++) {
    System.out.println(arrayOne[i]);
}

// Two-Dimensional Arrays
for (int i = 0; i < array2DOne.length; i++) {
    for (int j = 0; j < array2DOne[i].length; j++) {
        System.out.println(array2DOne[i][j]);
    }
}

// Foreach loops
for (int i : arrayOne) {
    System.out.println(i);
}


The java.util.Arrays class provides various methods for manipulating arrays.

int[] messyArray = {234, 5346, 3, 64};
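A few of the java.util.Arrays methods can be sketched on the array above. The ArraysDemo class and its method names are hypothetical; the Arrays calls themselves (sort, toString, binarySearch, copyOfRange) are standard library methods.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only
class ArraysDemo {
    static String sortedText() {
        int[] messyArray = {234, 5346, 3, 64};
        Arrays.sort(messyArray);                 // sorts in place, ascending
        return Arrays.toString(messyArray);      // readable form of the contents
    }

    static int indexOf64() {
        int[] sorted = {3, 64, 234, 5346};
        return Arrays.binarySearch(sorted, 64);  // only valid on a sorted array
    }

    static int[] firstTwo() {
        int[] sorted = {3, 64, 234, 5346};
        return Arrays.copyOfRange(sorted, 0, 2); // copies indices 0 (inclusive) to 2 (exclusive)
    }
}
```

Printing an array directly shows only its type and hash code, so Arrays.toString is the usual way to inspect its contents.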


Varargs is a special parameter that accepts any number of arguments of the same type, which Java collects into an array at the call site. It is denoted by an ellipsis (...) after the type instead of brackets. A varargs parameter must be the last or only parameter in a method signature.

varArgDemo("m", 1, 2, 5, 35, 346, 345, 4634);


public static void varArgDemo(String m, int... intArgs) {
    for (int i : intArgs) {
        System.out.println(m + i);
    }
}

Exception Handling

When something goes wrong during execution, the current method throws an exception. If the exception is not handled there, it propagates up the call stack to be handled elsewhere; if nothing handles it, the thread terminates and the program typically crashes. Good exception handling helps a program continue execution. Common problems that raise exceptions or errors include stack or heap memory exhaustion, an array index going out of bounds, or an interrupted stream or thread.


Exception and error objects extend Throwable and are either checked or unchecked.

  1. Checked exceptions
  2. Unchecked exceptions / Runtime exceptions
  3. Errors

Runtime and unchecked exceptions refer to the same thing, so the terms are often used interchangeably.

Checked exceptions are verified at compile time: the compiler will not build the code until each one is either caught or declared with throws, as with IOException. Unchecked exceptions occur at runtime, so the compiler cannot predict them and does not force them to be handled. Unchecked exceptions extend RuntimeException, such as NullPointerException. Errors are serious problems that an application should generally not try to handle, such as StackOverflowError.


The throws keyword declares that a method may pass an exception up the stack to the method that called it. If a checked exception is declared all the way up and never handled, the compiler will be satisfied, but the exception will still terminate the program when it occurs.

public void methodThatThrows() throws IOException {
    // throw (singular) will throw a new exception every time.
    throw new IOException();
}

public void methodThatCalls() throws IOException {
    methodThatThrows(); // IOException must be handled here, or (as done here) methodThatCalls() must use throws as well
}


The most basic form of exception handling is the try-catch:

public void methodThatThrows() {
    try {
        throw new IOException();
    } catch (IOException exception) {
        // Do something with the exception
        logger.warn("IOException thrown");
    }
}

A try block must be followed by at least one catch (or finally) block, but there can be any number of catch blocks for specific (or broad) exceptions. Catch blocks must be ordered from most specific to least specific exception type; otherwise a later catch block for a subclass of an exception already caught above it would be unreachable, which the compiler rejects.
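The ordering rule can be sketched with a hypothetical CatchOrderDemo class (the class, method, and mode parameter are illustrative assumptions). FileNotFoundException is a subclass of IOException, so its catch block has to come first.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical demo class; names are illustrative only
class CatchOrderDemo {
    static String classify(int mode) {
        try {
            if (mode == 0) throw new FileNotFoundException("missing");
            if (mode == 1) throw new IOException("io");
            if (mode == 2) throw new IllegalStateException("unchecked");
            return "ok";
        } catch (FileNotFoundException e) {   // most specific first
            return "file not found";
        } catch (IOException e) {             // superclass of the type above
            return "io";
        } catch (RuntimeException e) {        // unchecked exceptions
            return "runtime";
        }
    }
}
```

Swapping the first two catch blocks would make the FileNotFoundException handler unreachable, and the class would no longer compile.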

Multiple exceptions can also be handled in one catch block:

public void methodThatThrows() {
    try {
        processRequest(); // hypothetical method declared to throw IOException and ServletException
    } catch (IOException | ServletException ex) {
        // One handler and one variable for both exception types
        logger.warn("IOException or ServletException thrown");
    }
}


A try block can be followed by a single finally block, which can either take the place of the otherwise mandatory catch block or follow one or more catch blocks. A finally block executes whether or not an exception is thrown (barring events such as System.exit() or a JVM crash), which makes it useful for closing resources that would otherwise be left open when a thrown exception interrupts normal flow.

public void methodThatThrows() throws IOException {
    try {
        throw new IOException();
    } finally {
        System.out.println("Will always run");
    }
}
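The order in which try, finally, and an outer catch execute can be traced with a small sketch. FinallyDemo and its trace method are hypothetical names; the method records each step so the sequence can be inspected with and without an exception.

```java
// Hypothetical demo class; names are illustrative only
class FinallyDemo {
    static String trace(boolean fail) {
        StringBuilder log = new StringBuilder();
        try {
            try {
                log.append("try ");
                if (fail) {
                    throw new IllegalStateException("boom");
                }
                log.append("done ");
            } finally {
                log.append("finally ");   // runs on both paths
            }
        } catch (IllegalStateException e) {
            log.append("caught");         // runs after the inner finally
        }
        return log.toString().trim();
    }
}
```

In the failing case the finally block runs before the exception reaches the outer catch, so cleanup happens even though the inner try never completes.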


Declaring and initializing a resource (any object that implements AutoCloseable) within a pair of parentheses after the try keyword removes the need for a finally block to close that resource.

public void methodThatThrows() {
    try (FileReader fr = new FileReader("file.txt")) {
        throw new IOException();
    } catch (IOException exception) {
        // fr has already been closed by the time this block runs
        logger.warn("IOException thrown");
    }
}
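To make the closing behavior visible without touching the file system, the sketch below uses a hypothetical Resource class that implements AutoCloseable and records events in a list. All names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only
class ResourceDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static class Resource implements AutoCloseable {
        private final String name;

        Resource(String name) {
            this.name = name;
            EVENTS.add("open " + name);
        }

        @Override
        public void close() {
            EVENTS.add("close " + name);
        }
    }

    static List<String> run() {
        EVENTS.clear();
        // Multiple resources are separated by semicolons
        try (Resource a = new Resource("a"); Resource b = new Resource("b")) {
            EVENTS.add("body");
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            EVENTS.add("catch");
        }
        return new ArrayList<>(EVENTS);
    }
}
```

Resources are closed in the reverse order of their declaration, and closing happens before the catch block runs.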


InputStream/OutputStream -> BufferedReader/BufferedWriter

The JVM can connect to external data sources such as files or network ports. InputStream/OutputStream and their implementations stream this data as bytes, whereas Reader/Writer and their implementations stream data as characters; InputStreamReader/OutputStreamWriter bridge the two by wrapping a byte stream. BufferedReader/BufferedWriter wrap a Reader/Writer to transfer several characters at a time, minimizing the number of I/O operations needed.

// Overloaded BufferedReader constructors
BufferedReader sbr = new BufferedReader(new StringReader("Bufferedreader vs Console vs Scanner in Java"));
BufferedReader fbr = new BufferedReader(new FileReader("file.txt"));
BufferedReader isbr = new BufferedReader(new InputStreamReader(System.in)); // assuming standard input as the source

// AutoCloseable, works with try-with-resources syntax
try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
	return readAllLines(reader);
}

// Prone to throwing IOExceptions
public String readAllLines(BufferedReader reader) throws IOException {
	StringBuilder content = new StringBuilder();
	String line;
	while ((line = reader.readLine()) != null) {
		content.append(line).append(System.lineSeparator());
	}
	return content.toString();
}
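The wrapping chain described above (bytes, then characters, then buffered characters) can be sketched end to end with an in-memory byte array standing in for a file or socket. StreamChainDemo and firstLine are hypothetical names, and the UTF-8 charset is an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only
class StreamChainDemo {
    static String firstLine(byte[] bytes) {
        InputStream byteStream = new ByteArrayInputStream(bytes);                   // raw bytes
        Reader charStream = new InputStreamReader(byteStream, StandardCharsets.UTF_8); // bytes to chars
        // Closing the outermost wrapper closes the streams it wraps
        try (BufferedReader buffered = new BufferedReader(charStream)) {            // buffered chars
            return buffered.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding.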


BufferedReader provides convenient methods for reading text, such as readLine(). Scanner can read the same sources but, unlike BufferedReader, it is not synchronized and so is not thread-safe. It can, however, parse primitive types and Strings using regular expressions. Scanner also has a buffer, but it is smaller than BufferedReader’s by default (1,024 versus 8,192 characters). BufferedReader requires handling IOException while Scanner does not. Thus, Scanner is best used for parsing input into tokenized Strings.

Scanner sc = new Scanner(new File("input.txt"));
Scanner issc = new Scanner(new FileInputStream("input.txt"));
Scanner csc = new Scanner(System.in); // assuming standard input as the source
Scanner strsc = new Scanner("A B C D");
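Token parsing with Scanner can be sketched as follows. ScannerDemo and sumInts are hypothetical names; the method reads whitespace-separated tokens from a String and adds up only those that parse as integers.

```java
import java.util.Scanner;

// Hypothetical demo class; names are illustrative only
class ScannerDemo {
    static int sumInts(String input) {
        int sum = 0;
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sum += sc.nextInt();   // parse the token as an int
                } else {
                    sc.next();             // skip a non-integer token
                }
            }
        }
        return sum;
    }
}
```

Checking hasNextInt() before nextInt() avoids an InputMismatchException on tokens that are not numbers.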


The Properties class loads key-value pairs from a file:

Properties props = new Properties();
props.load(new FileInputStream(""));
String value = props.getProperty("key", "defaultValue");
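The same loading logic can be tried without a file by reading from a String. This is a minimal sketch; PropertiesDemo, parse, and the keys used in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; names are illustrative only
class PropertiesDemo {
    static Properties parse(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);   // same format as a .properties file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

For input such as "db.user=admin" followed by a "# comment" line, getProperty("db.user") returns the stored value, comment lines are ignored, and the two-argument getProperty falls back to its default when a key is absent.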


When passing objects into methods and data structures, a developer can overload or extend for each specific type, or cast the object up and down its inheritance hierarchy. In contrast, a generic type improves code reuse and type safety, allowing methods and data structures to accept any type without risking ClassCastException at runtime. Generic type parameters act as placeholders in a class or method signature, while type arguments in angle brackets tell the compiler which type to enforce at compile time (the empty diamond <> lets it infer the arguments from the declaration):

ArrayList<String> list = new ArrayList<>();

public <T> String genericToString(T a) {
    return a.toString();
}

public <T, E> String genericToStringCat(T a, E b) {
    return a.toString() + b.toString();
}

At each call site the compiler infers T and E from the arguments and checks them; in the compiled bytecode, type erasure replaces the type parameters with Object:

String s1 = genericToString(1);
String s2 = genericToStringCat("Hello", 3.5);
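Type parameters can also be bounded and can appear on classes as well as methods. The sketch below uses a hypothetical GenericsDemo class: Pair is a generic container, and max accepts only element types that implement Comparable, so the compiler rejects lists of types that cannot be compared.

```java
import java.util.List;

// Hypothetical demo class; names are illustrative only
class GenericsDemo {
    // Generic class: A and B are fixed per instance
    static class Pair<A, B> {
        final A first;
        final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Bounded type parameter: T must be comparable to itself
    static <T extends Comparable<T>> T max(List<T> items) {
        T best = items.get(0);   // assumes a non-empty list
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }
}
```

No cast is needed on the result of max: the compiler knows the return type from the list's type argument.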

Collections Framework

Java’s collections framework provides an API and reference implementations for common data structures.

Comparable vs Comparator

Comparable is a functional interface used to define a ‘natural ordering’ between instances of a class, commonly used by sorted collections such as TreeSet.

Comparator is another functional interface used in a dedicated utility class that can compare two different instances passed to it. It can be passed to a sort method, such as Collections.sort() or Arrays.sort(), or to sorted collections.

For example, to automatically sort a TreeSet of type Person according to age, we can either make the Person class comparable or pass the TreeSet constructor a comparator.


class Person implements Comparable<Person> {
	int age;
	Person(int age) {
		this.age = age;
	}
	@Override
	public int compareTo(Person o) {
		// Natural ordering: oldest first (descending by age)
		return o.age - this.age;
	}
}
public static void main(String[] args) {
	TreeSet<Person> persons = new TreeSet<Person>();
	persons.add(new Person(43));
	persons.add(new Person(25));
	persons.add(new Person(111));
}


class Person {
	int age;
	Person(int age) {
		this.age = age;
	}
}
class PersonAgeComparator implements Comparator<Person> {
	@Override
	public int compare(Person a, Person b) {
		// Youngest first (ascending by age)
		return a.age - b.age;
	}
}
public static void main(String[] args) {
	TreeSet<Person> persons = new TreeSet<Person>(new PersonAgeComparator());
	persons.add(new Person(43));
	persons.add(new Person(25));
	persons.add(new Person(111));
}
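Since Java 8 a comparator can also be built from a lambda instead of a dedicated class. The following is a sketch under that assumption; ComparatorDemo and its nested Person class are hypothetical and kept separate from the classes above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; names are illustrative only
class ComparatorDemo {
    static class Person {
        final int age;

        Person(int age) {
            this.age = age;
        }
    }

    static List<Integer> agesAscending() {
        // comparingInt builds a Comparator from a key-extracting function
        Comparator<Person> byAge = Comparator.comparingInt((Person p) -> p.age);
        TreeSet<Person> persons = new TreeSet<>(byAge);
        persons.add(new Person(43));
        persons.add(new Person(25));
        persons.add(new Person(111));

        List<Integer> ages = new ArrayList<>();
        for (Person p : persons) {   // iteration follows the comparator's order
            ages.add(p.age);
        }
        return ages;
    }
}
```

One design point to keep in mind: a TreeSet treats two elements as duplicates when the comparator returns 0, so two Person objects with the same age would collapse into one entry unless the comparator also compares another field.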


Reflection allows a program to examine or modify its own behavior at runtime. Java’s Reflection API is mostly used for introspection of structure; modification is largely limited to bypassing the access modifiers of methods and fields. Many frameworks such as JUnit, Application/Servlet Containers, and Spring use reflection to examine class fields, construct objects, and invoke methods at runtime.

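Class<?> c = Class.forName("classpath.and.classname");
// Class.newInstance() is deprecated since Java 9; go through the constructor instead
Object o = c.getDeclaredConstructor().newInstance();
Method m = c.getDeclaredMethod("aMethod");
m.invoke(o); // call aMethod on the new instance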
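A runnable variant of the same steps, using java.util.ArrayList as a stand-in target class so that no custom class is required. ReflectionDemo and callSize are hypothetical names; the reflective calls are standard java.lang.reflect API.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; names are illustrative only
class ReflectionDemo {
    static Object callSize() {
        try {
            // Look the class up by name at runtime
            Class<?> c = Class.forName("java.util.ArrayList");
            // Construct an instance through its no-argument constructor
            Object list = c.getDeclaredConstructor().newInstance();
            // Find and invoke add(Object), then size()
            Method add = c.getMethod("add", Object.class);
            add.invoke(list, "x");
            Method size = c.getMethod("size");
            return size.invoke(list);   // primitive results come back boxed
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every reflective step can fail at runtime (missing class, missing method, inaccessible member), which is why all of them throw checked exceptions that share the ReflectiveOperationException superclass.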


A thread is a unit of program execution that runs independently from other threads. Java programs can consist of multiple threads of execution that behave as if they were running on independent CPUs.

Besides the main thread, developers can create their own threads in two ways:

  1. A custom class extends the Thread class
  2. A Thread is passed an implemented Runnable instance

Override the run() method in both cases to define the program to be run concurrently in the new thread, then call the Thread instance’s start() method.

Extend Thread

public class CustomThread extends Thread {
    @Override
    public void run() {
        // Do something
    }

    public static void main(String[] args) {
        new CustomThread().start();
    }
}

Implement Runnable

public class CustomRunnable implements Runnable {
    @Override
    public void run() {
        // Do something
    }

    public static void main(String[] args) {
        new Thread(new CustomRunnable()).start();
    }
}

Anonymous Runnable Class

new Thread(new Runnable() {
        public void run() {
            // Do something
        }
}).start();

Runnable Lambda

new Thread(
    () -> { /* Do something */ }
).start();
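Starting a thread returns immediately, so code that needs a thread's result has to wait for it. A minimal sketch, assuming a hypothetical ThreadDemo class: two lambda-based threads each write to their own array slot, and join() blocks until both have finished.

```java
// Hypothetical demo class; names are illustrative only
class ThreadDemo {
    static int runWorkers() {
        int[] results = new int[2];

        // Each thread writes to its own slot, so the threads do not interfere
        Thread t1 = new Thread(() -> results[0] = 1);
        Thread t2 = new Thread(() -> results[1] = 2);
        t1.start();
        t2.start();

        try {
            t1.join();   // wait for t1 to finish
            t2.join();   // wait for t2 to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // restore the interrupt flag
            throw new IllegalStateException(e);
        }
        return results[0] + results[1];
    }
}
```

Calling run() directly instead of start() would execute the code on the calling thread and create no new thread at all.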

Password Security & Authentication

Storing plain-text passwords is a very bad idea. Security breaches often leak spreadsheets and database dumps revealing user account information, including passwords, which compromises not only those accounts but also accounts on other sites where users have reused the same passwords. A basic solution stores a hash of each password, for example with SHA, but fast unsalted hashes are easy to attack with precomputed tables or brute force. A better approach adds a few random bytes of data known as a ‘salt’ to each password before hashing, so identical passwords produce different hashes. An even better solution is to use a deliberately slow password-hashing function such as PBKDF2 or BCrypt.

String passwordToHash = "p4ssw0rd";

// Generate random salt
SecureRandom random = new SecureRandom();
byte[] salt = new byte[16];
random.nextBytes(salt); // fill the array with random bytes

// Option 1: Hash password using SHA + salt
MessageDigest md = MessageDigest.getInstance("SHA-512");
md.update(salt); // mix the salt in before the password
byte[] shaHash = md.digest(passwordToHash.getBytes(StandardCharsets.UTF_8));

// Option 2: Hash password using PBKDF2 + salt
KeySpec spec = new PBEKeySpec(passwordToHash.toCharArray(), salt, 131072, 128);
SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
byte[] pbkdf2Hash = factory.generateSecret(spec).getEncoded();

Using either option, both the hashed password and the salt should be saved in a database for future authentication.

public String hashingMethod(String password, byte[] salt) throws NoSuchAlgorithmException {
    // return hash using SHA or PBKDF2 + salt
}

public void createUser(String user, byte[] salt, byte[] hashedPassword) {
    String sql = "insert into users (username, salt, hash) values (?, ?, ?)";

    try (PreparedStatement pstmt = connection.prepareStatement(sql)) {
        pstmt.setString(1, user);
        pstmt.setBytes(2, salt);
        pstmt.setBytes(3, hashedPassword);
        pstmt.executeUpdate(); // actually run the insert
    } catch (SQLException ex) {
        // log errors
    }
}

public boolean authenticateUser(String user, String password) {
    String sql = "select salt, hash from users where username = ?";
    try (PreparedStatement pstmt = connection.prepareStatement(sql)) {
        pstmt.setString(1, user);
        try (ResultSet resultSet = pstmt.executeQuery()) {
            if (!resultSet.next()) {
                return false; // unknown user
            }
            byte[] salt = resultSet.getBytes("salt");
            String hashedPassword = new String(resultSet.getBytes("hash"), StandardCharsets.UTF_8);
            return hashedPassword.equals(hashingMethod(password, salt));
        }
    } catch (NoSuchAlgorithmException | SQLException ex) {
        return false;
    }
}
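The hashing and verification steps can be sketched on their own, without a database. PasswordHasher and its method names are hypothetical; the algorithm name, iteration count, and key length are taken from the PBKDF2 option above. The comparison uses MessageDigest.isEqual, an addition not in the notes, because it compares byte arrays in constant time.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper class; names are illustrative only
class PasswordHasher {
    // A fresh random salt should be generated for every user
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // PBKDF2 with the parameters used in the notes: 131072 iterations, 128-bit output
    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, 131072, 128);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-hash the attempt with the stored salt and compare against the stored hash
    static boolean verify(char[] attempt, byte[] salt, byte[] expected) {
        return MessageDigest.isEqual(hash(attempt, salt), expected);
    }
}
```

At registration, store the output of newSalt() and hash() together; at login, load both and pass them to verify() along with the submitted password.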