Revature 200413
Data Engineering with Java & Apache Spark
Week 4 - Big Data
Concepts
Big Data: Batch vs stream processing
Java SE 8: Streams, Functional Interfaces, Lambdas
MapReduce
Hadoop: Ecosystem, HDFS
Apache Spark:
Local vs cluster mode
RDD
Key/Value pairs
Transformations
Actions
Shared variables
Accumulators
org.apache.spark.api.java
JavaSparkContext
JavaRDD
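
Example (sketch)
Several of the concepts above (lambdas, functional interfaces, streams, key/value pairs, and the map-then-reduce pattern) can be illustrated with a word count written against plain Java SE 8 streams. This is only a minimal sketch and not course material: the class name WordCountSketch, the helper SPLIT_WORDS, and the sample input are hypothetical. It runs on a single JVM with no Spark dependency. A Spark version would express the same steps as transformations and actions on a JavaRDD, which is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name, used for illustration only.
public class WordCountSketch {

    // Function<T, R> is a functional interface, so a lambda can implement it.
    // This is the "map" step: turn one line into a list of lowercase words.
    static final Function<String, List<String>> SPLIT_WORDS =
            line -> Arrays.asList(line.toLowerCase().split("\\s+"));

    // Counts occurrences of each word, producing (word, count) key/value pairs.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .map(SPLIT_WORDS)                 // line -> list of words
                .flatMap(List::stream)            // flatten into one stream of words
                .filter(word -> !word.isEmpty())  // drop empties from leading whitespace
                .collect(Collectors.groupingBy(   // the "reduce" step: group by word and count
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big ideas", "data streams");
        // A TreeMap keeps keys sorted, so the output order is deterministic.
        System.out.println(countWords(lines)); // prints {big=2, data=2, ideas=1, streams=1}
    }
}
```

The stream pipeline is lazy until collect is called, which loosely parallels how Spark defers transformations until an action runs.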