Python Spark Certification Training Course

BY
Edureka

To acquire expertise in Python and prepare for the Cloudera Hadoop and Spark Developer certification exam (CCA175).

Mode

Online

Duration

6 Weeks

Fees

₹ 19,795 (discounted from ₹ 21,995)

Important Dates

Course Commencement Date

28 Dec, 2024

Quick Facts

Particulars              Details
Medium of instruction    English
Mode of learning         Self-study, Virtual Classroom
Mode of delivery         Video and Text Based
Frequency of classes     Weekends

Course overview

Through the Edureka Python Spark Certification Training using PySpark, you will develop an extensive understanding of Python. In addition, the certification training course familiarizes you with the fundamentals of Hadoop and Big Data.

The Python Spark Certification using PySpark certification course primarily prepares you for the CCA Spark and Hadoop Developer Exam; you will gain core skills and knowledge of Spark and the Spark Ecosystem. The course curriculum also covers Spark SQL and Spark Resilient Distributed Datasets for streamlined processing. 

Moreover, the programme also covers Flume, Spark Streaming, Kafka, and other Spark Ecosystem components. With access to the Edureka Cloud Lab, you can look forward to gaining practical experience in the field. 

In fact, the course curriculum is crafted by industry professionals to ensure the certification exam training is in line with the latest trends and industry use cases. When you complete the Python Spark Certification Training Using PySpark programme satisfactorily, Edureka will certify you as an Apache Spark and Scala Developer Using Python.

The highlights

  • 36 hours live instructor-led training
  • Apache Spark and Scala Developer Using Python certification from Edureka
  • Live industry application projects
  • Community forum
  • 60 days of access to Cloud Lab
  • Real-world case studies for vivid and engaging learning
  • 24x7 expert help
  • Class recordings over LMS

Program offerings

  • Instructor-led online training
  • Apache spark and scala developer using python certification
  • Live projects
  • Hands-on experience
  • Cloud lab

Course and certificate fees

Fees information
₹ 19,795 (discounted from ₹ 21,995)

The Python Spark Certification Training Using PySpark course fee has two components: the course fee and GST. You can pay the fee in either a lump sum or zero-interest EMI.  

Python Spark Certification Training Using PySpark Fee Details

Head                Amount
Original Price      Rs. 21,995
Discounted Price    Rs. 19,795

*EMI Starts at Rs. 6,599 / month

certificate availability

Yes

certificate providing authority

Edureka

Who it is for

The following professionals can benefit from taking the Edureka Python Spark Certification Training Using PySpark programme:

  • Engineers 
  • Big Data Developers
  • Data Architects
  • Business Intelligence Professionals
  • Mainframe Professionals
  • Data Warehouse Professionals
  • IT Professionals
  • University freshers
  • Data Scientists 
  • Big Data Analytics Architects 
  • Analytics Professionals

Eligibility criteria

There are no specific prerequisites for the Python Spark Certification Training Using PySpark course by Edureka. Prior experience with Python programming and SQL is useful but not necessary.

Furthermore, candidates need to complete a certification project. The project consists of implementing the concepts that they learned in the training course. Students have two weeks from the date of course completion to mail the project to the support team of Edureka. Experts at Edureka will evaluate the project based on performance and provide a grade and certification. 

Certificate Qualifying Details

Edureka will certify you as an Apache Spark and Scala Developer Using Python if you complete the project successfully. 

What you will learn

SQL knowledge, knowledge of Python, knowledge of Big Data, knowledge of Kafka

The Edureka Python Spark Certification Training Using PySpark programme will help you become adept in the following:

  • Acquire a comprehensive understanding of Big Data, the limitations of existing data analytics architectures, and how Hadoop helps solve those challenges.
  • Gain proficiency in the core concepts of Python language and its applications.
  • Learn to create and run various Spark tools and applications. 
  • Learn to leverage Sqoop for ingesting data and Spark RDDs for implementing business logic through actions and transformations.
  • Develop an in-depth understanding of the advantages of machine learning.
  • Acquire proficiency in Spark MLlib for implementing ML algorithms such as Decision Tree and Linear Regression.
  • Be proficient in processing structured data using SQL Queries and different SQL operations, and integration of Hive and Spark.
  • Master the creation of an Apache Spark Streaming application drawing on your knowledge of various data streaming sources. 
  • Gain knowledge of the ingestion of data through Apache Flume while the data is being streamed.
  • Learn the configuration of multiple kinds of Apache Kafka Clusters and master the integration of Kafka and Flume for processing events.  

The syllabus

Introduction to Big Data Hadoop and Spark

Topics
  • What is Big Data?
  • Big Data Customer Scenarios
  • Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
  • How Hadoop Solves the Big Data Problem?
  • What is Hadoop?
  • Hadoop’s Key Characteristics
  • Hadoop Ecosystem and HDFS
  • Hadoop Core Components
  • Rack Awareness and Block Replication
  • YARN and its Advantage
  • Hadoop Cluster and its Architecture
  • Hadoop: Different Cluster Modes
  • Big Data Analytics with Batch & Real-Time Processing
  • Why Spark is Needed?
  • What is Spark?
  • How Spark Differs from its Competitors?
  • Spark at eBay
  • Spark’s Place in Hadoop Ecosystem
Hands-On
  • Hadoop terminal commands

Introduction to Python for Apache Spark

Topics
  • Overview of Python
  • Different Applications where Python is Used
  • Values, Types, Variables
  • Operands and Expressions
  • Conditional Statements
  • Loops
  • Command Line Arguments
  • Writing to the Screen
  • Python files I/O Functions
  • Numbers
  • Strings and related operations
  • Tuples and related operations
  • Lists and related operations
  • Dictionaries and related operations
  • Sets and related operations
Hands-On
  • Creating “Hello World” code
  • Demonstrating Conditional Statements
  • Demonstrating Loops
  • Tuple - properties, related operations, compared with list
  • List - properties, related operations
  • Dictionary - properties, related operations
  • Set - properties, related operations
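
The collection types listed above can be sketched in a few lines of plain Python; the values used here (course fees, language names) are illustrative only:

```python
# Sketch of the core Python collection types covered in this module.

# Tuple: immutable, ordered
point = (3, 4)

# List: mutable, ordered
langs = ["Python", "Scala"]
langs.append("Java")

# Dict: key-value mapping
fees = {"original": 21995, "discounted": 19795}
fees["emi"] = 6599

# Set: unordered, duplicates collapse
tags = {"spark", "hadoop", "spark"}

print(point[0] + point[1])   # 7
print(len(langs))            # 3
print(fees["discounted"])    # 19795
print(len(tags))             # 2
```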

Functions, OOPs, and Modules in Python

Topics
  • Functions
  • Function Parameters
  • Global Variables
  • Variable Scope and Returning Values
  • Lambda Functions
  • Object-Oriented Concepts
  • Standard Libraries
  • Modules Used in Python
  • The Import Statements
  • Module Search Path
  • Package Installation Ways
Hands-On
  • Functions - Syntax, Arguments, Keyword Arguments, Return Values
  • Lambda - Features, Syntax, Options, Compared with the Functions
  • Sorting - Sequences, Dictionaries, Limitations of Sorting
  • Errors and Exceptions - Types of Issues, Remediation
  • Packages and Module - Modules, Import Options, sys Path
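
The hands-on items above (functions, lambdas, sorting, exception handling) can be illustrated with a short self-contained sketch; the `discount` helper and the fee figures are hypothetical examples, not part of the course material:

```python
# Functions: parameters, default values, return values
def discount(price, rate=0.10):
    """Return the price after applying a discount rate."""
    return round(price * (1 - rate))

# Lambda: an anonymous single-expression function
to_inr = lambda amount: f"Rs. {amount:,}"

# Sorting a dictionary's items by value with a lambda key
fees = {"original": 21995, "discounted": 19795}
cheapest = sorted(fees.items(), key=lambda kv: kv[1])[0]

# Errors and exceptions: remediation with try/except
try:
    emi = fees["monthly"]
except KeyError:
    emi = None

print(discount(20000))   # 18000
print(to_inr(19795))     # Rs. 19,795
print(cheapest[0])       # discounted
```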

Deep Dive into Apache Spark Framework

Topics
  • Spark Components & its Architecture
  • Spark Deployment Modes
  • Introduction to PySpark Shell
  • Submitting PySpark Job
  • Spark Web UI
  • Writing your first PySpark Job Using Jupyter Notebook
  • Data Ingestion using Sqoop
Hands-On
  • Building and Running Spark Application
  • Spark Application Web UI
  • Understanding different Spark Properties

Playing with Spark RDDs

Topics
  • Challenges in Existing Computing Methods
  • Probable Solution & How RDD Solves the Problem
  • What is RDD, Its Operations, Transformations & Actions
  • Data Loading and Saving Through RDDs
  • Key-Value Pair RDDs
  • Other Pair RDDs, Two Pair RDDs
  • RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How it Helps Achieve Parallelization
  • Passing Functions to Spark
Hands-On
  • Loading data in RDDs
  • Saving data through RDDs
  • RDD Transformations
  • RDD Actions and Functions
  • RDD Partitions
  • WordCount through RDDs
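
The WordCount exercise above can be sketched without a Spark cluster: the pure-Python pipeline below mirrors the flatMap → map → reduceByKey stages that the PySpark version (shown in the comment) distributes across partitions. The sample lines are made up for illustration:

```python
# Pure-Python analogue of the classic RDD WordCount pipeline.
# In PySpark this would be roughly:
#   sc.textFile(path).flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
lines = ["spark makes big data simple", "big data needs spark"]

# flatMap: one line -> many words
words = [w for line in lines for w in line.split()]

# map: word -> (word, 1) pair
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts per key
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts["spark"])   # 2
print(counts["big"])     # 2
print(counts["simple"])  # 1
```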

DataFrames and Spark SQL

Topics
  • Need for Spark SQL
  • What is Spark SQL
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • Schema RDDs
  • User Defined Functions
  • Data Frames & Datasets
  • Interoperating with RDDs
  • JSON and Parquet File Formats
  • Loading Data through Different Sources
  • Spark-Hive Integration
Hands-On
  • Spark SQL – Creating data frames
  • Loading and transforming data through different sources
  • Stock Market Analysis
  • Spark-Hive Integration
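
In Spark SQL, queries like the Stock Market Analysis exercise run over DataFrames via `spark.sql(...)`. As a self-contained illustration of the SQL itself, the same GROUP BY aggregation can be run with Python's built-in sqlite3 (the table name and prices here are invented for the example):

```python
import sqlite3

# Stand-in for a Spark SQL query over a DataFrame: the SQL statement
# (projection, aggregation, grouping) is what this module teaches;
# spark.sql("...") would run the same statement against a temp view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?)",
    [("INFY", 1500.0), ("TCS", 3600.0), ("INFY", 1550.0)],
)

# Average price per symbol, like a GROUP BY in Spark SQL
rows = conn.execute(
    "SELECT symbol, AVG(price) FROM stocks GROUP BY symbol ORDER BY symbol"
).fetchall()
print(rows)  # [('INFY', 1525.0), ('TCS', 3600.0)]
```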

Machine Learning using Spark MLlib

Topics
  • Why Machine Learning
  • What is Machine Learning
  • Where Machine Learning is used
  • Face Detection: USE CASE
  • Different Types of Machine Learning Techniques
  • Introduction to MLlib
  • Features of MLlib and MLlib Tools
  • Various ML algorithms supported by MLlib
Hands-On
  • Face detection use case

Deep Dive into Spark MLlib

Topics
  • Supervised Learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest
  • Unsupervised Learning: K-Means Clustering & How It Works with MLlib
  • Analysis of US Election Data using MLlib (K-Means)
Hands-On
  • K-Means Clustering
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest
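
As an illustration of what the K-Means exercise above computes, here is a minimal pure-Python k-means on one-dimensional points: the same assign/update loop that MLlib distributes across a cluster. The data and starting centers are invented for the sketch:

```python
# Minimal 1-D k-means: the iterative assignment/update loop
# that Spark MLlib's KMeans runs in a distributed fashion.
def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to its cluster's mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.5, 0.5, 9.0, 9.5, 8.5]
print(kmeans_1d(data, centers=[0.0, 5.0]))  # [1.0, 9.0]
```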

Understanding Apache Kafka and Apache Flume

Topics
  • Need for Kafka
  • What is Kafka
  • Core Concepts of Kafka
  • Kafka Architecture
  • Where is Kafka Used
  • Understanding the Components of Kafka Cluster
  • Configuring Kafka Cluster
  • Kafka Producer and Consumer Java API
  • Need of Apache Flume
  • What is Apache Flume
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration
  • Integrating Apache Flume and Apache Kafka
Hands-On
  • Configuring Single Node Single Broker Cluster
  • Configuring Single Node Multi-Broker Cluster
  • Producing and consuming messages through Kafka Java API
  • Flume Commands
  • Setting up Flume Agent
  • Streaming Twitter Data into HDFS
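
Kafka's core abstraction, covered in the topics above, is an append-only log per topic partition with each consumer tracking its own read offset. The toy in-memory model below illustrates just that idea; it is not the real Kafka producer/consumer API:

```python
# Toy model of a Kafka topic: an append-only log, with consumers
# keeping their own offsets. Concept illustration only; real
# clients use the Kafka Producer and Consumer APIs.
class TopicLog:
    def __init__(self):
        self.messages = []  # the append-only log

    def produce(self, msg):
        self.messages.append(msg)
        return len(self.messages) - 1  # offset of the new record

    def consume(self, offset):
        """Read all records from `offset`; return (records, next_offset)."""
        return self.messages[offset:], len(self.messages)

topic = TopicLog()
topic.produce("event-1")
topic.produce("event-2")

records, next_offset = topic.consume(0)
print(records)      # ['event-1', 'event-2']
print(next_offset)  # 2

# A second, independent consumer resumes from its own offset
late, _ = topic.consume(1)
print(late)         # ['event-2']
```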

Apache Spark Streaming - Processing Multiple Batches

Topics
  • Drawbacks in Existing Computing Methods
  • Why Streaming is Necessary
  • What is Spark Streaming
  • Spark Streaming Features
  • Spark Streaming Workflow
  • How Uber Uses Streaming Data
  • Streaming Context & DStreams
  • Transformations on DStreams
  • Windowed Operators and Why They are Useful
  • Important Windowed Operators
  • Slice, Window and ReduceByWindow Operators
  • Stateful Operators
Hands-On
  • WordCount Program using Spark Streaming
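
The windowed operators listed above aggregate over the last N micro-batches of a stream. A pure-Python sketch of a sliding-window word count conveys the idea (in PySpark, `reduceByKeyAndWindow` performs this over DStreams); the sample batches are invented:

```python
from collections import Counter, deque

# Sketch of a windowed streaming count: keep the last `window`
# micro-batches and aggregate over them, as Spark Streaming's
# windowed operators do over a DStream.
def windowed_counts(batches, window=2):
    recent = deque(maxlen=window)  # sliding window of batch counters
    results = []
    for batch in batches:
        recent.append(Counter(batch.split()))
        results.append(sum(recent, Counter()))  # aggregate the window
    return results

stream = ["spark spark streaming", "streaming data", "data data data"]
out = windowed_counts(stream, window=2)
print(out[0]["spark"])      # 2  (first batch only)
print(out[1]["streaming"])  # 2  (batches 1 and 2 in the window)
print(out[2]["spark"])      # 0  (batch 1 has slid out of the window)
```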

Apache Spark Streaming - Data Sources

Topics
  • Apache Spark Streaming: Data Sources
  • Streaming Data Source Overview
  • Apache Flume and Apache Kafka Data Sources
  • Example: Using a Kafka Direct Data Source
Hands-On
  • Various Spark Streaming Data Sources

Implementing an End-to-End Project

Topics
  • Project 1- Domain: Finance
  • Project 2- Domain: Media and Entertainment
Hands-On
  • Implementing an End-to-End Project

Spark GraphX (Self-Paced)

Topics
  • Introduction to Spark GraphX
  • Information about a Graph
  • GraphX Basic APIs and Operations
  • Spark GraphX Algorithm - PageRank, Personalized PageRank, Triangle Count, Shortest Paths, Connected Components, Strongly Connected Components, Label Propagation
Hands-On
  • The Traveling Salesman problem
  • Minimum Spanning Trees
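
The PageRank algorithm named above can be sketched as a plain power iteration; GraphX runs the distributed equivalent over graph partitions. The tiny three-node graph below is invented for the sketch:

```python
# Minimal PageRank power iteration on a small directed graph,
# the same computation GraphX's PageRank distributes.
def pagerank(links, damping=0.85, iterations=50):
    nodes = list(links)
    ranks = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            # Sum contributions from every node that links to n
            incoming = sum(ranks[m] / len(links[m])
                           for m in nodes if n in links[m])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        ranks = new
    return ranks

# a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
best = max(ranks, key=ranks.get)
print(best)  # c receives the most rank
```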

Admission details

To enrol in the Python Spark Certification Training Using PySpark course by Edureka, follow the steps listed below:

Step 1– Visit the official website of Edureka: https://www.edureka.co/

Step 2– Look for the "Python Spark Certification Training Using PySpark" course.

Step 3– Scroll to find the “Enroll now” button at the top of the webpage.

Step 4– Click on the "Enroll now" button and fill in the form with your details.

Step 5– To apply for the course, fill in the application form and provide your contact information – email address and phone number.

Step 6– You will be redirected to a webpage where you need to select a batch as per your choice.

Step 7– Lastly, you have to choose the mode of payment and finish paying the net applicable course fee. Download the receipt for future reference.


Filling the form

You need to fill out an online application form on Edureka's official website to register for the Python Spark Certification Training using PySpark programme. As requested in the online application, you must provide your contact information and select a course. You can choose the preferred mode of payment and process the payment to confirm your application.

How it helps

With the Python Spark Certification Training Using PySpark programme, you will gain a thorough understanding of the Big Data and Hadoop ecosystem which will be beneficial for you as a Spark Developer. You will be solving real-world challenges which will enhance your decision-making and logic building skills.

The course curriculum covers all essential concepts and techniques, from Spark Streaming to machine learning with MLlib. When you complete the training course, Edureka will certify you as an Apache Spark and Scala Developer Using Python.

Moreover, the Big Data landscape is expanding tremendously. Upon completing the Edureka Python Spark Certification Training Using PySpark course, you can find plenty of lucrative job opportunities as a certified Spark Developer.

FAQs

Why should I learn PySpark?

PySpark is the Python API for Apache Spark; it lets you leverage the simplicity of Python while using Apache Spark to conquer Big Data. Besides, numerous job opportunities are available for certified Apache Spark Developers using Python in the growing Big Data Analytics market.

What is the duration of the Edureka Python Spark Certification Training Using PySpark?

The Python Spark Certification Training Using PySpark course by Edureka is a 6-week programme delivered through online weekend classes. 

What skills will I learn by pursuing the Python Spark Certification Training Using PySpark programme?

You will acquire skills such as Sqoop data-loading techniques, implementation of Apache Spark operations, and integration of Apache Kafka with Apache Flume. You will also learn to apply clustering algorithms with the Spark MLlib APIs, and more, with the Edureka Python Spark Certification Training Using PySpark course. 

Am I eligible to attend a demo session?

To maintain the quality of classes, only a limited number of learners are allowed in every demo session, so you cannot attend a demo before you enrol in the course. However, you can view the sample recordings to get clarity before you enrol in the certification training. 

What is the mode of learning for the programme?

The Edureka Python Spark Certification Training Using PySpark course is provided via instructor-led online sessions. Candidates can take either weekday or weekend classes, based on convenience.
