PySpark & AWS: Master Big Data With PySpark and AWS

BY
Udemy

Mode

Online

Fees

₹ 599 ₹4,099

Quick Facts

Particulars: Details
Medium of instruction: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course and certificate fees

Fees information: ₹ 599 ₹4,099
Certificate availability: Yes
Certificate providing authority: Udemy

The syllabus

Introduction

  • Why Big Data
  • Applications of PySpark
  • Introduction to Instructor
  • Introduction to Course
  • Projects Overview
  • Request for Your Honest Review
  • Links for the Course's Materials and Codes

Introduction to Hadoop, Spark EcoSystems and Architectures

  • Links for the Course's Materials and Codes
  • Why Spark
  • Hadoop EcoSystem
  • Spark Architecture and EcoSystem
  • Databricks Sign-Up
  • Create Databricks Notebook
  • Download Spark and Dependencies
  • Java Setup on Windows
  • Python Setup on Windows
  • Spark Setup on Windows
  • Hadoop Setup on Windows
  • Running Spark on Windows
  • Java Download on Mac
  • Installing JDK on Mac
  • Setting Java Home on Mac
  • Java Check on Mac
  • Installing Python on Mac
  • Setup Spark on Mac
  • Which of the following statements is true?
  • Which of the following is not a part of spark ecosystem?

Spark RDDs

  • Links for the Course's Materials and Codes
  • Spark RDDs
  • Creating Spark RDD
  • Running Spark Code Locally
  • RDD stands for:
  • RDD is created by using:
  • RDD Map (Lambda)
  • RDD Map (Simple Function)
  • Quiz (Map)
  • Solution 1 (Map)
  • Solution 2 (Map)
  • RDD FlatMap
  • RDD Filter
  • Quiz (Filter)
  • Solution (Filter)
  • RDD Distinct
  • RDD GroupByKey
  • RDD ReduceByKey
  • Quiz (Word Count)
  • Solution (Word Count)
  • RDD (Count and CountByValue)
  • RDD (saveAsTextFile)
  • RDD (Partition)
  • Finding Average-1
  • Finding Average-2
  • Quiz (Average)
  • Solution (Average)
  • Finding Min and Max
  • Quiz (Min and Max)
  • Solution (Min and Max)
  • Project Overview
  • Total Students
  • Total Marks by Male and Female Student
  • Total Passed and Failed Students
  • Total Enrollments per Course
  • Total Marks per Course
  • Average marks per Course
  • Finding Minimum and Maximum marks
  • Average Age of Male and Female Students

Spark DFs

  • Links for the Course's Materials and Codes
  • Introduction to Spark DFs
  • Creating Spark DFs
  • DF stands for:
  • DF is created by using:
  • Spark Infer Schema
  • Spark Provide Schema
  • Create DF from RDD
  • Rectifying the Error
  • Select DF Columns
  • Spark DF withColumn
  • Spark DF withColumnRenamed and Alias
  • Spark DF Filter rows
  • Quiz (select, withColumn, filter)
  • Solution (select, withColumn, filter)
  • Spark DF (Count, Distinct, Duplicate)
  • Quiz (Distinct, Duplicate)
  • Solution (Distinct, Duplicate)
  • Spark DF (sort, orderBy)
  • Quiz (sort, orderBy)
  • Solution (sort, orderBy)
  • Spark DF (Group By)
  • Spark DF (Group By - Multiple Columns and Aggregations)
  • Spark DF (Group By - Visualization)
  • Spark DF (Group By - Filtering)
  • Quiz (Group By)
  • Solution (Group By)
  • Quiz (Word Count)
  • Solution (Word Count)
  • Spark DF (UDFs)
  • Quiz (UDFs)
  • Solution (UDFs)
  • Solution (Cache and Persist)
  • Spark DF (DF to RDD)
  • Spark DF (Spark SQL)
  • Spark DF (Write DF)
  • Project Overview
  • Project (Count and Select)
  • Project (Group By)
  • Project (Group By, Aggregations and Order By)
  • Project (Filtering)
  • Project (UDF and WithColumn)
  • Project (Write)

Collaborative filtering

  • Links for the Course's Materials and Codes
  • Collaborative filtering
  • Utility Matrix
  • Explicit and Implicit Ratings
  • Expected Results
  • Dataset
  • Joining Dataframes
  • Train and Test Data
  • ALS model
  • Hyperparameter tuning and cross validation
  • Best model and evaluate predictions
  • Recommendations

Spark Streaming

  • Links for the Course's Materials and Codes
  • Introduction to Spark Streaming
  • Spark Streaming with RDD
  • Spark streaming is used to:
  • Spark Streaming Context
  • Spark Streaming Reading Data
  • Spark Streaming Cluster Restart
  • Spark Streaming RDD Transformations
  • Which statement is true about SparkContext and StreamingContext:
  • Spark Streaming DF
  • Spark Streaming Display
  • Spark Streaming DF Aggregations

ETL Pipeline

  • Links for the Course's Materials and Codes
  • Introduction to ETL
  • We can perform ETL using PySpark:
  • ETL stands for:
  • ETL pipeline Flow
  • Data set
  • Extracting Data
  • Transforming Data
  • Loading Data (Creating RDS-I)
  • Loading Data (Creating RDS-II)
  • RDS Networking
  • Downloading Postgres
  • Installing Postgres
  • Connect to RDS through pgAdmin
  • Loading Data

Project - Change Data Capture / Ongoing Replication

  • Links for the Course's Materials and Codes
  • Introduction to Project
  • Project Architecture
  • In this project we are going to implement:
  • The cloud service DMS will be used to:
  • Creating RDS MySQL Instance
  • Creating S3 Bucket
  • Creating DMS Source Endpoint
  • Creating DMS Destination Endpoint
  • Creating DMS Instance
  • MySQL Workbench
  • Connecting with RDS and Dumping Data
  • Querying RDS
  • DMS Full Load
  • DMS Replication Ongoing
  • Stopping Instances
  • Glue Job (Full Load)
  • Glue Job (Change Capture)
  • Glue Job (CDC)
  • Creating Lambda Function and Adding Trigger
  • Checking Trigger
  • Getting S3 file name in Lambda
  • Creating Glue Job
  • Adding Invoke for Glue Job
  • Testing Invoke
  • Writing Glue Shell Job
  • Full Load Pipeline
  • Change Data Capture Pipeline
