Apache Spark and Scala Live Class

4.9 out of 5 (178 Ratings)

Edugators Apache Spark and Scala training helps you master the essential skills of the open-source Apache Spark framework and the Scala programming language, including Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark shell scripting. You will also understand how Spark overcomes the limitations of MapReduce.

Apache Spark and Scala Overview

Apache Spark and Scala Course Description

Apache Spark is an open-source data processing framework that can process very large data sets and distribute the work across multiple computers, either on its own or together with other distributed computing tools. Spark is one of the fastest-growing and most widely used tools for big data and analytics. Companies across many domains around the globe have adopted it, so it offers promising career opportunities. To take advantage of these opportunities, you need structured training that is aligned with the Cloudera Hadoop and Spark Developer Certification (CCA175) and with current industry requirements and best practices.

Scala is a general-purpose, compiled, multi-paradigm programming language. It combines functional programming with object-oriented programming.
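
To make the multi-paradigm idea concrete, here is a minimal sketch of how the two styles can sit side by side in one small Scala program. It is only an illustration and not part of the course material: the names Shape, Circle, Rectangle, MultiParadigmDemo, totalArea and describe are hypothetical and chosen for this example.

```scala
// Illustrative sketch only: all names here are hypothetical examples.

// Object-oriented side: a trait works like an interface,
// and case classes provide concrete implementations of it.
sealed trait Shape {
  def area: Double
}

final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}

final case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object MultiParadigmDemo {
  // Functional side: an immutable List is transformed with a
  // higher-order function (map) and then reduced with sum.
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum

  // Pattern matching takes the case classes apart by their fields.
  def describe(shape: Shape): String = shape match {
    case Circle(r)       => s"circle of radius $r"
    case Rectangle(w, h) => s"rectangle $w x $h"
  }

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Rectangle(1.0, 4.0))
    println(totalArea(shapes))
    shapes.map(describe).foreach(println)
  }
}
```

The trait and case classes are the object-oriented part, while totalArea and describe are written in a functional style with no mutable state. Spark's Scala API leans on the same mix, for example by passing functions to operations such as map on a distributed collection.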

Who should go for this training?

Developers and Architects
BI/ETL/DW Professionals
Senior IT Professionals
Testing Professionals
Mainframe Professionals
Freshers
Big Data Enthusiasts
Software Architects, Engineers and Developers
Data Scientists and Analytics Professionals

Requirements

A computer, laptop, or smartphone with a high-speed internet connection
There are no prerequisites for our Spark and Scala Certification Training. Prior knowledge of Java programming and SQL is helpful but not mandatory.

Apache Spark and Scala Course Syllabus

Introduction to Spark

Introduction
Objectives
Evolution of Distributed Systems
Need for New-Generation Distributed Systems
Limitations of MapReduce in Hadoop
Batch vs. Real-Time Processing
PairRDD Methods-Others
Application of In-Memory Processing
Introduction to Apache Spark
Components of a Spark Project
History of Spark
Language Flexibility in Spark
Spark Execution Architecture
Automatic Parallelization of Complex Flows
Automatic Parallelization of Complex Flows-Important Points
APIs That Match User Goals
Apache Spark-A Unified Platform of Big Data Apps
Running Spark in Different Modes
Installing Spark as a Standalone Cluster-Configurations
Overview of Spark on a Cluster
Tasks of Spark on a Cluster
Hadoop Ecosystem vs. Apache Spark

Programming in Scala

Introduction to Scala
Features of Scala
Basic Data Types and Literals
Introduction to Operators
Types of Operators
Use Basic Literals and the Arithmetic Operator
Use the Logical Operator
Introduction to Type Inference
Type Inference for Recursive Methods
Type Inference for Polymorphic Methods and Generic Classes
Unreliability of the Type Inference Mechanism
Mutable Collection vs. Immutable Collection
Functions and Anonymous Functions
Objects and Classes
Traits as Interfaces and examples
Collections and Types of Collections
Lists and Perform Operations on Lists
Maps and Maps-Operations
Pattern Matching
Implicits
Streams
Use Data Structures
Question-Answer Session

Using RDD for Creating Applications in Spark

Introduction to RDDs
RDD API
Features of RDDs
Creating RDDs
Creating RDDs-Referencing an External Dataset
Referencing an External Dataset-Text Files
Referencing an External Dataset-Sequence Files
Referencing an External Dataset-Other Hadoop Input Formats
Creating RDDs-Important Points
RDD Operations
RDD Operations-Transformations
Features of RDD Persistence
Storage Levels of RDD Persistence
Choosing the Correct RDD Persistence Storage Level
Invoking the Spark Shell
Importing Spark Classes
Creating the SparkContext
Loading a File in Shell
Packaging a Spark Project with SBT
Running a Spark Project With SBT
Build a Scala Project
Build a Spark Java Project
Shared Variables-Broadcast Variables and Accumulators
Write and Run a Scala Application
Write a Scala Application Reading the Hadoop Data
Run a Scala Application Reading the Hadoop Data
Scala RDD Extensions
DoubleRDD Methods
PairRDD Methods-Join
Java PairRDD Methods
General RDD Methods
Java RDD Methods and Common Java RDD Methods
Spark Java Function Classes
Method for Combining JavaPairRDD Functions
Transformations in RDD
Actions in RDD
Key-Value Pair RDD in Scala and Java
Using MapReduce and Pair RDD Operations
Reading and Writing Text Files in HDFS
Reading and Writing Sequence Files in HDFS
Using GroupBy
Run a Scala Application Performing GroupBy Operation
Run a Scala Application Using the Scala Shell
Write and Run a Java Application
Question-Answer Session

Running SQL Queries Using Spark SQL

Importance of Spark SQL
Benefits of Spark SQL
DataFrames
SQLContext
Creating a DataFrame
Using DataFrame Operations
Run Spark SQL with a DataFrame
Interoperating with RDDs
Using the Reflection-Based Approach
Using the Programmatic Approach
Run Spark SQL Programmatically
Data Sources
Save Modes
Parquet Files
Partition Discovery
Schema Merging
JSON Data
Hive Table
DML Operation-Hive Queries
Run Hive Queries Using Spark SQL
JDBC to Other Databases
Supported Hive Features
Supported Hive Data Types
Case Classes

Spark Streaming

Introduction to Spark Streaming
Working of Spark Streaming
Features of Spark Streaming
Streaming Word Count
Micro Batch
DStreams
Input DStreams and Receivers
Basic Sources
Advanced Sources
Advanced Sources-Twitter
Transformations on DStreams
Output Operations on DStreams
Design Patterns for Using foreachRDD
DataFrame and SQL Operations
Checkpointing and Enabling Checkpointing
Socket Stream and File Stream
Stateful Operations and Window Operations
Types of Window Operations
Join Operations-Stream-Dataset Joins
Join Operations-Stream-Stream Joins
Monitoring Spark Streaming Application
Performance Tuning-High Level
Performance Tuning-Detail Level
Capture and Process the Netcat Data
Capture and Process the Flume Data
Capture the Twitter Data
Question-Answer Session

Spark ML Programming

Introduction to Spark ML Programming
Introduction to Machine Learning
Common Terminologies in Machine Learning
Applications of Machine Learning
Machine Learning in Spark
Spark ML API
DataFrames
Transformers and Estimators
Pipeline
Working of a Pipeline
DAG Pipelines
Runtime Checking
Parameter Passing
General Machine Learning Pipeline-Example
Model Selection via Cross-Validation
Supported Types, Algorithms, and Utilities
Data Types
Feature Extraction and Basic Statistics
Clustering
K-Means
Perform Clustering Using K-Means
Gaussian Mixture
Power Iteration Clustering (PIC)
Latent Dirichlet Allocation (LDA)
Collaborative Filtering
Classification
Regression and its example
Perform Classification Using Linear Regression
Run Linear Regression
Perform Recommendation Using Collaborative Filtering
Run Recommendation System

Spark GraphX Programming

Introduction to Graph-Parallel System
Limitations of Graph-Parallel System
Introduction to GraphX
Importing GraphX
The Property Graph
Features of the Property Graph
Creating a Graph
Create a Graph Using GraphX
Triplet View
Graph Operators
List of Operators
Property Operators and Structural Operators
Subgraphs
Join Operators
Perform Graph Operations Using GraphX
Perform Subgraph Operations
Neighborhood Aggregation
mapReduceTriplets
Perform MapReduce Operations
Counting Degree of Vertex
Collecting Neighbors
Caching and Uncaching
Graph Builders
Vertex and Edge RDDs
Graph System Optimizations
Built-in Algorithms
Question-Answer Session

499 18,999

Call us for course-related queries

+91 9311746545

Course Features

Sessions: 19
Assignment: Yes
Duration: 2 hours per class
Language: English / Hindi

What People Say

Nagmani Solanki

Digital Marketing

Edugators platform is the best place to learn live classes, and live projects by which you can understand easily and have excellent customer service.

Saurabh Arya

Full Stack Developer

It was a very good experience. Edugators and the instructor worked with us through the whole process to ensure we received the best training solution for our needs.

Praveen Madhukar

Web Design

I would definitely recommend taking courses from Edugators. The instructors are very knowledgeable, receptive to questions and willing to go out of the way to help you.