PySpark Essentials for Data Scientists (Big Data + Python)

BY
Udemy

Learn the fundamentals of PySpark to manage big data for Python-based machine learning.

Mode

Online

Fees

₹499 (regular price ₹799)

Quick Facts

Medium of instruction: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course overview

The PySpark Essentials for Data Scientists (Big Data + Python) online certification course, created by Layla AI (Data Scientist Consultant & Instructor), is offered by Udemy for participants who want to learn the practical PySpark concepts that data scientists and data engineers use day to day.

The course comprises more than 17.5 hours of video lessons, along with 28 articles and 139 downloadable resources, which provide a strong foundation in big data concepts using Python. Participants will learn about a variety of subjects, including data wrangling, machine learning, natural language processing, data streaming with Spark Structured Streaming, application development, hyperparameter tuning, cross-validation, topic modeling, and Gaussian mixture modeling.

The highlights

  • Certificate of completion
  • Self-paced course
  • 17.5 hours of pre-recorded video content
  • 28 articles 
  • 139 downloadable resources

Program offerings

  • Online course
  • Learning resources
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Course and certificate fees

Fees information: ₹499 (regular price ₹799)
Certificate availability: Yes
Certificate providing authority: Udemy

What you will learn

  • Data science knowledge
  • Knowledge of big data
  • Knowledge of Python
  • Machine learning
  • Natural language processing
  • Web application development skills

After completing the PySpark Essentials for Data Scientists (Big Data + Python) certification course, participants will understand the fundamentals of PySpark for data science and will know how to use Python with big data on a distributed framework such as Apache Spark. The course covers data streaming with Spark Structured Streaming, Spark machine learning (MLlib), natural language processing, topic modeling, Gaussian mixture modeling, cluster analysis, k-means clustering, cross-validation, hyperparameter tuning, frequent pattern mining, data wrangling, classification, regression, and SQL queries. Participants will also learn to develop applications that use machine learning and to manipulate, join, and aggregate DataFrames in Spark with Python.

The syllabus

Course Introduction

  • Frequently Asked Questions
  • Course Introduction
  • Course Orientation
  • Course Materials Bulk Download
  • Resources for Setting up PySpark
  • Python Cheat Sheet Resources
  • Introduction to PySpark
  • Transitioning from Python to PySpark Concept Review
  • Transitioning from Python to PySpark Code Along Activity

Data frame Essentials: Read, Write, Validate & Explore

  • Data frame Essentials Concept Review
  • Data frame Essentials Concept Review Quiz
  • A little something to keep you going....
  • Read, Write and Validate Data frames Code Along Activity
  • Read, Write and Validate Data HW
  • Read, Write and Validate Data HW Solutions Code Review
  • A little something to keep you going....
  • Search and Filter Data frames Code Along Activity
  • Search and Filter Data frames HW
  • Search and Filter Data frames HW Solution Code Review
  • A little something to keep you going....
  • SQL Options in Spark/PySpark Code Along Activity
  • SQL Options in Spark/PySpark HW
  • SQL Options in Spark/PySpark HW Solutions
  • A little something to keep you going....

Data frame Essentials: Clean, Manipulate, Join, Aggregate

  • Manipulating Data Frames Code Along Activity
  • Manipulating Dataframes HW
  • Manipulating Data Frames HW Solution
  • A little something to keep you going....
  • Aggregating Data in Dataframes Code Along Activity
  • Aggregating Data in Data Frames HW
  • Aggregating Data in Data Frames HW Solution
  • A little something to keep you going....
  • Joining and Appending Dataframes Code Along Activity
  • Joining and Appending Dataframes HW
  • Joining and Appending Dataframes HW Solution Code Review
  • A little something to keep you going....
  • Handling Missing Data in Dataframes Code Along Activity
  • Handling Missing Data in Data Frames HW
  • Handling Missing Data in Data Frames HW Solution
  • Dataframe Essentials Coding Master Review
  • A little something to keep you going....

Introduction to Spark MLlib

  • Introduction to Machine Learning Concept Review
  • Introduction to Machine Learning Quiz
  • Introduction to MLlib Concept Review
  • Model Selection and Tuning in MLlib Concept Review
  • Model Selection and Tuning in MLlib Quiz
  • Two Links to Bookmark
  • A little something to keep you going....

Classification in MLlib

  • Introduction to Classification in MLlib Concept Review
  • Classification in MLlib Quiz
  • A little something to keep you going....
  • Classification in MLlib Code Along Part 1: Data Formatting and Transformations
  • Classification in MLlib Code Review Part 2.0: Train and Evaluate Models [Intro]
  • Classification in MLlib Code Review Part 2.1: Train & Test Models [Logistic]
  • Classification in MLlib Code Review Part 2.2: Train & Test Models [1 vs Rest]
  • A little something to keep you going....
  • Classification in MLlib Code Review Part 2.3: Train & Test Models[Multilayer PC]
  • Classification in MLlib Code Review Part 2.4: Train & Test Models [Naive Bayes]
  • Classification in MLlib Code Review Part 2.5: Train & Test Models [Linear SVM]
  • Classification in MLlib Code Review Part 2.6: Train & Test Models[Decision Tree]
  • Classification in MLlib Code Review Part 2.7: Train & Test Models[Random Forest]
  • Classification in MLlib Code Review Part 2.8: Train & Test Models [GBT]
  • A little something to keep you going....
  • Bonus: Add loop functions to your training and evaluation script
  • Bonus: Leverage MLflow to better track and manage your results
  • Classification Project
  • Remember to be creative with this project!
  • Classification Project Solution

Natural Language Processing in MLlib

  • Introduction to Natural Language Processing
  • Introduction to Natural Language Processing Quiz
  • Natural Language Processing Concept Review [Part 1: Feature Transformers]
  • Natural Language Processing Concept Review [Part 2: Feature Extractors]
  • Natural Language Processing Feature Extractors Quiz
  • A little something to keep you going....
  • Natural Language Processing Code Along Activity Part 1: Data Prep
  • Natural Language Processing Code Along Activity Part 2: Vectorize, Train & Eval
  • Natural Language Processing Project
  • Natural Language Processing Project Solution
  • A little something to keep you going....

Regression in MLlib

  • Regression in MLlib Concept Review
  • Regression in PySpark's MLlib
  • Regression in MLlib Code Review Introduction
  • Regression in MLlib Code Review Part 1: Data Prep
  • Regression in MLlib Code Review Part 2.0: Linear Regression
  • A little something to keep you going....
  • Regression in MLlib Code Review Part 2.1: Decision Tree Regression
  • Regression in MLlib Code Review Part 2.2: Random Forest Regression
  • Regression in MLlib Code Review Part 2.3: Gradient Boosted Tree Regression
  • A little something to keep you going....
  • Bonus: Add loop functions to your regression training and evaluation script
  • Regression Project
  • And finally... have fun with this project and love what you do!
  • Regression Project Solution Code Along Activity

Clustering in PySpark

  • Intro to Clustering in MLlib Concept Review
  • Clustering Concept Review Quiz
  • K-Means & Bisecting K-Means in MLlib Code Along Activity
  • Latent Dirichlet Allocation in MLlib Code Along Activity
  • A little something to keep you going....
  • Gaussian Mixture Modeling in MLlib Code Along Activity
  • Clustering Project Introduction
  • Clustering Project Solution Code Review
  • A little something to keep you going....

Frequent Pattern Mining in MLlib

  • Frequent Pattern Mining in MLlib Concept Review
  • Frequent Pattern Mining Concept Quiz
  • Frequent Pattern Mining Code Along Activity [Part 1: FP-Growth]
  • Frequent Pattern Mining Code Along Activity [Part 2: PrefixSpan]
  • A little something to keep you going....
  • Frequent Pattern Mining Project Introduction
  • Frequent Pattern Mining Project Solution Code Review

Spark Structured Streaming

  • Intro to Spark Structured Streaming
  • Intro to Streaming Data Using Sockets
  • Twitter Structured Streaming Project Setup and Intro
  • Twitter Project Tweet Listener Setup
  • Twitter Project Structured Stream Setup and Implementation
  • Additional Spark Structured Streaming Resources

Course Wrap-up

  • Closing Remarks
  • Tips for success moving forward
  • And finally... remember to set your goals High!

Instructors

Ms Layla AI
Data Scientist Consultant
Udemy
