Upcoming Batch - Date and Time

Customize the Apache Spark and Scala training course according to your requirements

Course Content


  • Introduction
  • Objectives
  • Evolution of Distributed Systems
  • Need of New Generation Distributed Systems
  • Limitations of MapReduce in Hadoop
  • Batch vs. Real-Time Processing
  • PairRDD Methods-Others
  • Application of In-Memory Processing
  • Introduction to Apache Spark
  • Components of a Spark Project
  • History of Spark
  • Language Flexibility in Spark
  • Spark Execution Architecture
  • Automatic Parallelization of Complex Flows
  • Automatic Parallelization of Complex Flows-Important Points
  • APIs That Match User Goals
  • Apache Spark-A Unified Platform of Big Data Apps
  • Running Spark in Different Modes
  • Installing Spark as a Standalone Cluster-Configurations
  • Overview of Spark on a Cluster
  • Tasks of Spark on a Cluster
  • Hadoop Ecosystem vs. Apache Spark
  • Introduction to Scala
  • Features of Scala
  • Basic Data Types and Literals
  • Introduction to Operators
  • Types of Operators
  • Use Basic Literals and the Arithmetic Operator
  • Use the Logical Operator
  • Introduction to Type Inference
  • Type Inference for Recursive Methods
  • Type Inference for Polymorphic Methods and Generic Classes
  • Unreliability of the Type Inference Mechanism
  • Mutable Collection vs. Immutable Collection
  • Functions and Anonymous Functions
  • Objects and Classes
  • Traits as Interfaces and Examples
  • Collections and Types of Collections
  • Lists and Perform Operations on Lists
  • Maps and Maps-Operations
  • Pattern Matching
  • Implicits
  • Streams
  • Use Data Structures
  • Question-Answer Session
  • Introduction to RDDs
  • RDD API
  • Features of RDDs
  • Creating RDDs
  • Creating RDDs-Referencing an External Dataset
  • Referencing an External Dataset-Text Files
  • Referencing an External Dataset-Sequence Files
  • Referencing an External Dataset-Other Hadoop Input Formats
  • Creating RDDs-Important Points
  • RDD Operations
  • RDD Operations-Transformations
  • Features of RDD Persistence
  • Storage Levels of RDD Persistence
  • Choosing the Correct RDD Persistence Storage Level
  • Invoking the Spark Shell
  • Importing Spark Classes
  • Creating the SparkContext
  • Loading a File in Shell
  • Packaging a Spark Project with SBT
  • Running a Spark Project With SBT
  • Build a Scala Project
  • Build a Spark Java Project
  • Shared Variables-Broadcast Variables and Accumulators
  • Write and Run a Scala Application
  • Write a Scala Application Reading the Hadoop Data
  • Run a Scala Application Reading the Hadoop Data
  • Scala RDD Extensions
  • DoubleRDD Methods
  • PairRDD Methods-Join
  • Java PairRDD Methods
  • General RDD Methods
  • Java RDD Methods and Common Java RDD Methods
  • Spark Java Function Classes
  • Method for Combining JavaPairRDD Functions
  • Transformations in RDD
  • Actions in RDD
  • Key-Value Pair RDD in Scala and Java
  • Using MapReduce and Pair RDD Operations
  • Reading and Writing Text Files in HDFS
  • Reading and Writing Sequence Files in HDFS
  • Using GroupBy
  • Run a Scala Application Performing GroupBy Operation
  • Run a Scala Application Using the Scala Shell
  • Write and Run a Java Application
  • Question-Answer Session
  • Importance of Spark SQL
  • Benefits of Spark SQL
  • DataFrames
  • SQLContext
  • Creating a DataFrame
  • Using DataFrame Operations
  • Run Spark SQL with a DataFrame
  • Interoperating with RDDs
  • Using the Reflection-Based Approach
  • Using the Programmatic Approach
  • Run Spark SQL Programmatically
  • Data Sources
  • Save Modes
  • Parquet Files
  • Partition Discovery
  • Schema Merging
  • JSON Data
  • Hive Table
  • DML Operation-Hive Queries
  • Run Hive Queries Using Spark SQL
  • JDBC to Other Databases
  • Supported Hive Features
  • Supported Hive Data Types
  • Case Classes
  • Introduction to Spark Streaming
  • Working of Spark Streaming
  • Features of Spark Streaming
  • Streaming Word Count
  • Micro Batch
  • DStreams
  • Input DStreams and Receivers
  • Basic Sources
  • Advanced Sources
  • Advanced Sources-Twitter
  • Transformations on DStreams
  • Output Operations on DStreams
  • Design Patterns for Using ForeachRDD
  • DataFrame and SQL Operations
  • Checkpointing and Enabling Checkpointing
  • Socket Stream and File Stream
  • Stateful Operations and Window Operations
  • Types of Window Operations
  • Join Operations-Stream-Dataset Joins
  • Join Operations-Stream-Stream Joins
  • Monitoring Spark Streaming Application
  • Performance Tuning-High Level
  • Performance Tuning-Detail Level
  • Capture and Process the Netcat Data
  • Capture and Process the Flume Data
  • Capture the Twitter Data
  • Question-Answer Session
  • Introduction to Spark ML Programming
  • Introduction to Machine Learning
  • Common Terminologies in Machine Learning
  • Applications of Machine Learning
  • Machine Learning in Spark
  • Spark ML API
  • DataFrames
  • Transformers and Estimators
  • Pipeline
  • Working of a Pipeline
  • DAG Pipelines
  • Runtime Checking
  • Parameter Passing
  • General Machine Learning Pipeline-Example
  • Model Selection via Cross-Validation
  • Supported Types, Algorithms, and Utilities
  • Data Types
  • Feature Extraction and Basic Statistics
  • Clustering
  • K-Means
  • Perform Clustering Using K-Means
  • Gaussian Mixture
  • Power Iteration Clustering (PIC)
  • Latent Dirichlet Allocation (LDA)
  • Collaborative Filtering
  • Classification
  • Regression and its example
  • Perform Classification Using Linear Regression
  • Run Linear Regression
  • Perform Recommendation Using Collaborative Filtering
  • Run Recommendation System
  • Introduction to Graph-Parallel System
  • Limitations of Graph-Parallel System
  • Introduction to GraphX
  • Importing GraphX
  • The Property Graph
  • Features of the Property Graph
  • Creating a Graph
  • Create a Graph Using GraphX
  • Triplet View
  • Graph Operators
  • List of Operators
  • Property Operators and Structural Operators
  • Subgraphs
  • Join Operators
  • Perform Graph Operations Using GraphX
  • Perform Subgraph Operations
  • Neighborhood Aggregation
  • mapReduceTriplets
  • Perform MapReduce Operations
  • Counting Degree of Vertex
  • Collecting Neighbors
  • Caching and Uncaching
  • Graph Builders
  • Vertex and Edge RDDs
  • Graph System Optimizations
  • Built-in Algorithms
  • Question-Answer Session

What People Say

Nagmani Solanki

Digital Marketing Expert

The Edugators platform is the best place to learn through live classes and live projects, which make the material easy to understand, and it has excellent customer service.

Saurabh Arya

Software Developer

It was a very good experience. Edugators and the instructor worked with us through the whole process to ensure we received the best training solution for our needs.

Shyam Kumar

Graphic Designer

I would definitely recommend taking courses from Edugators. The instructors are very knowledgeable, receptive to questions and willing to go out of the way to help you.

Need To Train Your Team?

Customized corporate training programs that develop skills for business success.
