Learning PySpark
Gain a thorough understanding of the strategies and methodologies used with Python and Apache Spark to develop and deploy Spark applications.
Online
₹499 (discounted from ₹1,799)
Quick Facts
| Particular | Details |
|---|---|
| Medium of instructions | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course overview
PySpark is the Python interface for Apache Spark. Along with offering the PySpark shell for interactive data analysis in a distributed environment, PySpark lets users build Spark applications using Python APIs. PySpark supports most of Spark's features, including Spark SQL, DataFrames, Streaming, MLlib, and Spark Core. The Learning PySpark online certification is designed by Packt Publishing, a publisher offering certifications that help applicants improve their knowledge, and is delivered by Udemy.
Learning PySpark online training is a short-term program comprising 2.5 hours of video lectures along with downloadable resources. It begins by giving applicants a solid understanding of Apache Spark and of how to set up a Python environment for Spark. Through the Learning PySpark online classes, applicants will learn about Spark SQL, Spark DataFrames and their APIs, lazy execution, data sorting, data aggregation, and data transformation, and will learn how to process data using Spark DataFrames and master data collection methods through distributed data processing.
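The workflow the course covers can be summed up in a few lines of PySpark. The sketch below is illustrative only (the application name, column names, and data are assumptions, not course material); it creates a SparkSession, builds a DataFrame, and queries it with both the DataFrame API and Spark SQL:

```python
# A minimal sketch of the PySpark workflow; names and data are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("learning-pyspark").getOrCreate()

# Build a small DataFrame and filter it with the DataFrame API.
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])
df.filter(df.age > 30).show()

# Register a temporary view and run the same query with Spark SQL.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()
```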
The highlights
- Certificate of completion
- Self-paced course
- 2.5 hours of pre-recorded video content
- 1 downloadable resource
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information
₹499 (discounted from ₹1,799)
Certificate availability
Yes
Certificate providing authority
Udemy
Who it is for
What you will learn
After completing the Learning PySpark certification course, applicants will have a solid understanding of the fundamentals of PySpark, along with knowledge of the functionalities of Apache Spark and Spark 2.0 for application development and deployment. In this PySpark course, applicants will explore Spark SQL, Spark DataFrames, and their APIs, and will come to understand schemas, transformations, lazy execution, and resilient distributed datasets. Applicants will also learn the methodologies involved in data aggregation, data transformation, and data sorting.
The syllabus
A Brief Primer on PySpark
- The Course Overview
- Brief Introduction to Spark
- Apache Spark Stack
- Spark Execution Process
- Newest Capabilities of PySpark 2.0+
- Cloning GitHub Repository
Resilient Distributed Datasets
- Brief Introduction to RDDs
- Creating RDDs
- Schema of an RDD
- Understanding Lazy Execution
- Introducing Transformations – .map(…)
- Introducing Transformations – .filter(…)
- Introducing Transformations – .flatMap(…)
- Introducing Transformations – .distinct(…)
- Introducing Transformations – .sample(…)
- Introducing Transformations – .join(…)
- Introducing Transformations – .repartition(…)
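For orientation, here is a minimal sketch of the transformations this module introduces. The input data is an illustrative assumption, not taken from the course; note that transformations stay lazy until an action such as .collect() runs:

```python
# Illustrative sketch of RDD transformations; input data is assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-transformations").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([1, 2, 2, 3, 4, 5])

# Transformations are lazy: nothing executes until an action runs.
doubled = rdd.map(lambda x: x * 2)             # .map(...)
evens = rdd.filter(lambda x: x % 2 == 0)       # .filter(...)
expanded = rdd.flatMap(lambda x: (x, x * 10))  # .flatMap(...)
unique = rdd.distinct()                        # .distinct()
sampled = rdd.sample(False, 0.5, seed=42)      # .sample(...) w/o replacement

# .join(...) operates on pair RDDs of (key, value).
left = sc.parallelize([("a", 1), ("b", 2)])
right = sc.parallelize([("a", "x"), ("b", "y")])
joined = left.join(right)                      # ('a', (1, 'x')), ('b', (2, 'y'))

# .repartition(...) changes the partition count (causes a shuffle).
repartitioned = doubled.repartition(4)

print(joined.collect())  # the action finally triggers execution
spark.stop()
```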
Resilient Distributed Datasets and Actions
- Introducing Actions – .take(…)
- Introducing Actions – .collect(…)
- Introducing Actions – .reduce(…) and .reduceByKey(…)
- Introducing Actions – .count()
- Introducing Actions – .foreach(…)
- Introducing Actions – .aggregate(…) and .aggregateByKey(…)
- Introducing Actions – .coalesce(…)
- Introducing Actions – .combineByKey(…)
- Introducing Actions – .histogram(…)
- Introducing Actions – .sortBy(…)
- Introducing Actions – Saving Data
- Introducing Actions – Descriptive Statistics
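A minimal sketch of the actions this module lists appears below; the input values and the output path are illustrative assumptions, not course material:

```python
# Illustrative sketch of RDD actions; data and output path are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-actions").getOrCreate()
sc = spark.sparkContext

nums = sc.parallelize([5, 3, 1, 4, 2])
pairs = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])

print(nums.take(3))                                      # first 3 elements
print(nums.collect())                                    # all elements to driver
print(nums.reduce(lambda a, b: a + b))                   # 15
print(pairs.reduceByKey(lambda a, b: a + b).collect())   # [('a', 3), ('b', 3)]
print(nums.count())                                      # 5
nums.foreach(print)                                      # runs on the executors

# .aggregate(...) takes a zero value, a within-partition function,
# and a cross-partition combiner.
print(nums.aggregate(0, lambda acc, x: acc + x, lambda a, b: a + b))  # 15
print(pairs.aggregateByKey(0, lambda acc, v: acc + v,
                           lambda a, b: a + b).collect())

print(nums.coalesce(1).getNumPartitions())               # merge partitions
print(pairs.combineByKey(lambda v: (v, 1),
                         lambda acc, v: (acc[0] + v, acc[1] + 1),
                         lambda a, b: (a[0] + b[0], a[1] + b[1])).collect())

print(nums.histogram([0, 3, 6]))                         # (bucket bounds, counts)
print(nums.sortBy(lambda x: x).collect())                # [1, 2, 3, 4, 5]
nums.saveAsTextFile("/tmp/nums_out")                     # saving data
print(nums.stats())                                      # count, mean, stdev, ...

spark.stop()
```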
DataFrames and Transformations
- Introduction
- Creating DataFrames
- Specifying Schema of a DataFrame
- Interacting with DataFrames
- The .agg(…) Transformation
- The .sql(…) Transformation
- Creating Temporary Tables
- Joining Two DataFrames
- Performing Statistical Transformations
- The .distinct(…) Transformation
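The sketch below walks through the DataFrame operations this module covers, under assumed data (the employee and department rows are illustrative, not from the course):

```python
# Illustrative sketch of DataFrame creation, schemas, .agg, .sql, temp
# tables, joins, statistics, and .distinct; all data here is assumed.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("dataframes").getOrCreate()

# Specify a schema explicitly instead of letting Spark infer it.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("dept", StringType(), True),
    StructField("salary", IntegerType(), True),
])
df = spark.createDataFrame(
    [("Alice", "eng", 100), ("Bob", "eng", 90), ("Cara", "hr", 80)], schema)

# .agg(...) aggregates over the whole frame or per group.
df.groupBy("dept").agg(F.avg("salary").alias("avg_salary")).show()

# Create a temporary table/view, then query it with .sql(...).
df.createOrReplaceTempView("employees")
spark.sql("SELECT dept, COUNT(*) AS n FROM employees GROUP BY dept").show()

# Joining two DataFrames on a shared column.
depts = spark.createDataFrame([("eng", "Engineering"), ("hr", "HR")],
                              ["dept", "dept_name"])
df.join(depts, on="dept").show()

# Statistical summaries and .distinct().
df.describe("salary").show()
df.select("dept").distinct().show()

spark.stop()
```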
Data Processing with Spark DataFrames
- Schema Changes
- Filtering Data
- Aggregating Data
- Selecting Data
- Transforming Data
- Presenting Data
- Sorting DataFrames
- Saving DataFrames
- Pitfalls of UDFs
- Repartitioning Data
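A minimal sketch of the data-processing steps this final module names is given below; the column names, values, and output path are illustrative assumptions:

```python
# Illustrative sketch of schema changes, filtering, aggregating, sorting,
# saving, the UDF pitfall, and repartitioning; all data here is assumed.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("df-processing").getOrCreate()
df = spark.createDataFrame(
    [("Alice", "eng", 100), ("Bob", "eng", 90), ("Cara", "hr", 80)],
    ["name", "dept", "salary"])

# Schema changes: add, rename, and drop columns.
df2 = (df.withColumn("salary_k", F.col("salary") / 1000)
         .withColumnRenamed("dept", "department")
         .drop("salary"))

# Filtering, selecting, sorting, and presenting data.
(df2.filter(F.col("salary_k") > 0.08)
    .select("name", "salary_k")
    .orderBy(F.col("salary_k").desc())
    .show())

# Aggregating data per group.
df2.groupBy("department").agg(F.sum("salary_k").alias("total_k")).show()

# Pitfall of UDFs: a Python UDF ships each row to a Python worker and back,
# so built-in functions such as F.upper are usually much faster.
shout = udf(lambda s: s.upper(), StringType())
df2.select(shout("name")).show()      # slow: Python UDF
df2.select(F.upper("name")).show()    # fast: built-in function

# Repartitioning, then saving the DataFrame.
df2.repartition(4).write.mode("overwrite").parquet("/tmp/employees.parquet")

spark.stop()
```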