Get a customized Apache Spark and Scala course tailored to your requirements

Overview

Course Description

Apache Spark is an open-source data processing framework that can perform processing tasks on very large data sets and distribute those tasks across multiple computers, either on its own or alongside other distributed computing tools. Spark is one of the fastest-growing and most widely used tools for Big Data and analytics. Companies across many domains worldwide have adopted it, so it offers promising career opportunities. To take advantage of these opportunities, you need structured training that is aligned with the Cloudera CCA Spark and Hadoop Developer certification (CCA175) and with current industry requirements and best practices.

Scala is a general-purpose, compiled, multi-paradigm programming language. It combines functional programming with object-oriented programming.
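To give a feel for how the two paradigms combine, here is a minimal sketch. It is not taken from the course material, and the names `Shape`, `Circle`, `Rectangle` and `ShapeDemo` are made up for illustration. A trait and case classes model the data in an object-oriented way, while higher-order functions and pattern matching operate on that data functionally.

```scala
// Object-oriented side: a trait defines the interface,
// and case classes provide concrete implementations.
sealed trait Shape {
  def area: Double
}

final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}

final case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object ShapeDemo {
  // Functional side: a pure function composed from higher-order
  // collection methods (map, filter); no mutable state is involved.
  def totalAreaOfLargeShapes(shapes: List[Shape], minArea: Double): Double =
    shapes.map(_.area).filter(_ >= minArea).sum

  // Pattern matching deconstructs the case classes by shape type.
  def describe(shape: Shape): String = shape match {
    case Circle(r)       => s"circle of radius $r"
    case Rectangle(w, h) => s"rectangle $w x $h"
  }

  def main(args: Array[String]): Unit = {
    val shapes = List(Circle(1.0), Rectangle(2.0, 3.0), Rectangle(0.5, 0.5))
    shapes.foreach(s => println(describe(s)))
    // Only shapes with an area of at least 1.0 contribute to the total.
    println(totalAreaOfLargeShapes(shapes, minArea = 1.0))
  }
}
```

The same collection style (`map`, `filter`, and similar methods) carries over to Spark's APIs, which is one reason Scala is commonly taught alongside Spark.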

Who should go for this training?

  • Developers and Architects
  • BI/ETL/DW Professionals
  • Senior IT Professionals
  • Testing Professionals
  • Mainframe Professionals
  • Freshers
  • Big Data Enthusiasts
  • Software Architects, Engineers and Developers
  • Data Scientists and Analytics Professionals

Requirements

  • Computer, laptop, or smartphone with a high-speed internet connection

  • There are no prerequisites for our Spark and Scala Certification Training. Prior knowledge of Java programming and SQL is helpful but not mandatory.

Course Syllabus

  • Introduction
  • Objectives
  • Evolution of Distributed Systems
  • Need of New Generation Distributed Systems
  • Limitations of MapReduce in Hadoop
  • Batch vs. Real-Time Processing
  • PairRDD Methods-Others
  • Application of In-Memory Processing
  • Introduction to Apache Spark
  • Components of a Spark Project
  • History of Spark
  • Language Flexibility in Spark
  • Spark Execution Architecture
  • Automatic Parallelization of Complex Flows
  • Automatic Parallelization of Complex Flows-Important Points
  • APIs That Match User Goals
  • Apache Spark-A Unified Platform of Big Data Apps
  • Running Spark in Different Modes
  • Installing Spark as a Standalone Cluster-Configurations
  • Overview of Spark on a Cluster
  • Tasks of Spark on a Cluster
  • Hadoop Ecosystem vs. Apache Spark
  • Introduction to Scala
  • Features of Scala
  • Basic Data Types and Literals
  • Introduction to Operators
  • Types of Operators
  • Use Basic Literals and the Arithmetic Operator
  • Use the Logical Operator
  • Introduction to Type Inference
  • Type Inference for Recursive Methods
  • Type Inference for Polymorphic Methods and Generic Classes
  • Unreliability of the Type Inference Mechanism
  • Mutable Collection vs. Immutable Collection
  • Functions and Anonymous Functions
  • Objects and Classes
  • Traits as Interfaces and examples
  • Collections and Types of Collections
  • Lists and Perform Operations on Lists
  • Maps and Maps-Operations
  • Pattern Matching
  • Implicits
  • Streams
  • Use Data Structures
  • Question-Answer Session
  • Introduction to RDDs
  • RDDs API
  • Features of RDDs
  • Creating RDDs
  • Creating RDDs-Referencing an External Dataset
  • Referencing an External Dataset-Text Files
  • Referencing an External Dataset-Sequence Files
  • Referencing an External Dataset-Other Hadoop Input Formats
  • Creating RDDs-Important Points
  • RDD Operations
  • RDD Operations-Transformations
  • Features of RDD Persistence
  • Storage Levels Of RDD Persistence
  • Choosing The Correct RDD Persistence Storage Level
  • Invoking the Spark Shell
  • Importing Spark Classes
  • Creating the SparkContext
  • Loading a File in Shell
  • Packaging a Spark Project with SBT
  • Running a Spark Project With SBT
  • Build a Scala Project
  • Build a Spark Java Project
  • Shared Variables-Broadcast Variables and Accumulators
  • Write and Run a Scala Application
  • Write a Scala Application Reading the Hadoop Data
  • Run a Scala Application Reading the Hadoop Data
  • Scala RDD Extensions
  • DoubleRDD Methods
  • PairRDD Methods-Join
  • Java PairRDD Methods
  • General RDD Methods
  • Java RDD Methods and Common Java RDD Methods
  • Spark Java Function Classes
  • Method for Combining JavaPairRDD Functions
  • Transformations in RDD
  • Actions in RDD
  • Key-Value Pair RDD in Scala and Java
  • Using MapReduce and Pair RDD Operations
  • Reading and Writing Text Files on HDFS
  • Reading and Writing Sequence Files on HDFS
  • Using GroupBy
  • Run a Scala Application Performing GroupBy Operation
  • Run a Scala Application Using the Scala Shell
  • Write and Run a Java Application
  • Question-Answer Session
  • Importance of Spark SQL
  • Benefits of Spark SQL
  • DataFrames
  • SQLContext
  • Creating a DataFrame
  • Using DataFrame Operations
  • Run Spark SQL with a DataFrame
  • Interoperating with RDDs
  • Using the Reflection-Based Approach
  • Using the Programmatic Approach
  • Run Spark SQL Programmatically
  • Data Sources
  • Save Modes
  • Parquet Files
  • Partition Discovery
  • Schema Merging
  • JSON Data
  • Hive Table
  • DML Operation-Hive Queries
  • Run Hive Queries Using Spark SQL
  • JDBC to Other Databases
  • Supported Hive Features
  • Supported Hive Data Types
  • Case Classes
  • Introduction to Spark Streaming
  • Working of Spark Streaming
  • Features of Spark Streaming
  • Streaming Word Count
  • Micro Batch
  • DStreams
  • Input DStreams and Receivers
  • Basic Sources
  • Advanced Sources
  • Advanced Sources-Twitter
  • Transformations on DStreams
  • Output Operations on DStreams
  • Design Patterns for Using ForeachRDD
  • DataFrame and SQL Operations
  • Checkpointing and Enabling Checkpointing
  • Socket Stream and File Stream
  • Stateful Operations and Window Operations
  • Types of Window Operations
  • Join Operations-Stream-Dataset Joins
  • Join Operations-Stream-Stream Joins
  • Monitoring Spark Streaming Application
  • Performance Tuning-High Level
  • Performance Tuning-Detail Level
  • Capture and Process the Netcat Data
  • Capture and Process the Flume Data
  • Capture the Twitter Data
  • Question-Answer Session
  • Introduction to Spark ML Programming
  • Introduction to Machine Learning
  • Common Terminologies in Machine Learning
  • Applications of Machine Learning
  • Machine Learning in Spark
  • Spark ML API
  • DataFrames
  • Transformers and Estimators
  • Pipeline
  • Working of a Pipeline
  • DAG Pipelines
  • Runtime Checking
  • Parameter Passing
  • General Machine Learning Pipeline-Example
  • Model Selection via Cross-Validation
  • Supported Types, Algorithms, and Utilities
  • Data Types
  • Feature Extraction and Basic Statistics
  • Clustering
  • K-Means
  • Perform Clustering Using K-Means
  • Gaussian Mixture
  • Power Iteration Clustering (PIC)
  • Latent Dirichlet Allocation (LDA)
  • Collaborative Filtering
  • Classification
  • Regression and its example
  • Perform Classification Using Linear Regression
  • Run Linear Regression
  • Perform Recommendation Using Collaborative Filtering
  • Run Recommendation System
  • Introduction to Graph-Parallel System
  • Limitations of Graph-Parallel System
  • Introduction to GraphX
  • Importing GraphX
  • The Property Graph
  • Features of the Property Graph
  • Creating a Graph
  • Create a Graph Using GraphX
  • Triplet View
  • Graph Operators
  • List of Operators
  • Property Operators and Structural Operators
  • Subgraphs
  • Join Operators
  • Perform Graph Operations Using GraphX
  • Perform Subgraph Operations
  • Neighborhood Aggregation
  • mapReduceTriplets
  • Perform MapReduce Operations
  • Counting Degree of Vertex
  • Collecting Neighbors
  • Caching and Uncaching
  • Graph Builders
  • Vertex and Edge RDDs
  • Graph System Optimizations
  • Built-in Algorithms
  • Question-Answer Session

What People Say

Nagmani Solanki

Digital Marketing

The Edugators platform is the best place to learn through live classes and live projects, which make the material easy to understand, and it has excellent customer service.

Saurabh Arya

Full Stack Developer

It was a very good experience. Edugators and the instructor worked with us through the whole process to ensure we received the best training solution for our needs.

Praveen Madhukar

Web Design

I would definitely recommend taking courses from Edugators. The instructors are very knowledgeable, receptive to questions and willing to go out of the way to help you.

Need To Train Your Corporate Team?

Customized Corporate Training Programs and Developing Skills For Project Success.
