- Introduction
- Course Overview
- Frequently Asked Questions
- What is Spark? Why Python?
Spark and Python for Big Data with PySpark
Learn the skills for using Python, Spark Streaming, Machine Learning, and Spark 2.0 DataFrames with Spark and Python
Online
₹ 599 4099
Quick Facts
particular | details
---|---
Medium of instructions | English
Mode of learning | Self study
Mode of Delivery | Video and Text Based
Course overview
The Spark and Python for Big Data with PySpark online certification is designed to develop one of the most in-demand and valuable skills in technology: big data analytics with Apache Spark, which is widely used at technology firms including Google, Meta, Netflix, Amazon, and Airbnb. Spark can be up to 100 times faster than Hadoop MapReduce, which has driven strong growth in the demand for skilled professionals. Because the Spark 2.0 DataFrame framework is relatively new, individuals skilled in it are considered among the most valuable professionals in the industry.
The Spark and Python for Big Data with PySpark online course is a short-term programme developed by Jose Portilla, Head of Data Science at Pierian Data Inc., and offered by Udemy Inc., a US-based online learning platform that provides courses for both amateurs and professionals.
The Spark and Python for Big Data with PySpark syllabus includes topics such as the fundamentals of Spark DataFrames and the optimization of Spark 2.0 syntax. The course also covers the MLlib machine learning library with DataFrame syntax, Spark SQL, Spark Streaming, and Gradient Boosted Trees, which learners study through 10+ hours of video content, articles, and downloadable materials.
The highlights
- Certificate of completion
- Self-paced course
- English videos with multi-language subtitles
- 10.5 hours of pre-recorded video content
- Online course
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Program offerings
- Certificate of completion
- Self-paced course
- English videos with multi-language subtitles
- 10.5 hours of pre-recorded video content
- 4 articles
- 4 downloadable resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information
certificate availability | certificate providing authority
---|---
Yes | Udemy
Who it is for
What you will learn
After completing the Spark and Python for Big Data with PySpark certification course, learners will know how to use Python and Spark to analyse big data, work with Spark 2.0 DataFrame syntax, perform classification with Random Forests in Spark, and use Logistic Regression to classify customer churn. They will also learn to develop machine learning models with Spark’s MLlib, set up Amazon Web Services and use the AWS Elastic MapReduce service to analyse big data, and build spam filters using Natural Language Processing and Spark.
The syllabus
Introduction to Course
Setting up Python with Spark
- Set-up Overview
- Note on Installation Sections
Databricks Setup
- Recommended Setup
- Databricks Setup
Local Installation VirtualBox
- Local Installation VirtualBox Part 1
- Local Installation VirtualBox Part 2
- Setting up PySpark
AWS EC2 PySpark Set-up
- AWS EC2 Set-up Guide
- Creating the EC2 Instance
- SSH with Mac or Linux
- Installations on EC2
AWS EMR Cluster Setup
- AWS EMR Setup
Python Crash Course
- Introduction to Python Crash Course
- Jupyter Notebook Overview
- Python Crash Course Part One
- Python Crash Course Part Two
- Python Crash Course Part Three
- Python Crash Course Exercises
- Python Crash Course Exercise Solutions
Spark DataFrame Basics
- Introduction to Spark DataFrames
- Spark DataFrame Basics
- Spark DataFrame Basics Part Two
- Spark DataFrame Basic Operations
- Groupby and Aggregate Operations
- Missing Data
- Dates and Timestamps
Spark DataFrame Project Exercise
- DataFrame Project Exercise
- DataFrame Project Exercise Solutions
Introduction to Machine Learning with MLlib
- Introduction to Machine Learning and ISLR
- Machine Learning with Spark and Python with MLlib
Linear Regression
- Linear Regression Theory and Reading
- Linear Regression Documentation Example
- Regression Evaluation
- Linear Regression Example Code Along
- Linear Regression Consulting Project
- Linear Regression Consulting Project Solutions
Logistic Regression
- Logistic Regression Theory and Reading
- Logistic Regression Example Code Along
- Logistic Regression Code Along
- Logistic Regression Consulting Project
- Logistic Regression Consulting Project Solutions
Decision Trees and Random Forests
- Tree Methods Theory and Reading
- Tree Methods Documentation Examples
- Decision Trees and Random Forest Code Along Examples
- Random Forest - Classification Consulting Project
- Random Forest Classification Consulting Project Solutions
K-means Clustering
- K-means Clustering Theory and Reading
- KMeans Clustering Documentation Example
- Clustering Example Code Along
- Clustering Consulting Project
- Clustering Consulting Project Solutions
Collaborative Filtering for Recommender Systems
- Introduction to Recommender Systems
- Recommender System - Code Along Project
Natural Language Processing
- Introduction to Natural Language Processing
- NLP Tools Part One
- NLP Tools Part Two
- Natural Language Processing Code Along Project
Spark Streaming with Python
- Introduction to Streaming with Spark!
- Spark Streaming Documentation Example
- Spark Streaming Twitter Project - Part One
- Spark Streaming Twitter Project - Part Two
- Spark Streaming Twitter Project - Part Three
Bonus
- Bonus Lecture