Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB
Quick Facts
| Particular | Details |
| --- | --- |
| Medium of instruction | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course overview
Big data refers to collections of data that grow rapidly over time and become so large and complex that traditional data processing tools cannot store or process them efficiently. The Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB online certification is developed by Navdeep Kaur, instructor and founder of TechnoAvengers.com, and is offered by Udemy.
The Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB online course includes more than 11.5 hours of pre-recorded lessons, 5 articles, and 21 downloadable resources. It begins with an explanation of the Hadoop Distributed File System (HDFS) and the most common Hadoop commands needed to work with it. The training also covers topics such as data migration, data ingestion, CRUD operations, data frames, and the functionality of tools such as MySQL, Cassandra, Sqoop, Flume, Hive, and MongoDB.
The highlights
- Certificate of completion
- Self-paced course
- 11.5 hours of pre-recorded video content
- 5 articles
- 21 downloadable resources
- Assignments
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information
Certificate availability
Yes
Certificate providing authority
Udemy
Who it is for
What you will learn
After completing the Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB certification course, candidates will understand the principles of big data analytics using Hadoop, Sqoop, Hive, Spark, Flume, and MongoDB. Candidates will explore the fundamentals of data frames, data migration, data ingestion, GUI tools, and CRUD operations. They will learn how to migrate data from Hive to MySQL and from HDFS to MySQL, and will gain working knowledge of Cassandra, the IntelliJ IDE, and Netcat for big data operations.
The syllabus
Big Data Introduction
- Course Intro
- Big Data Intro
- Understanding Big Data Ecosystem
Environment Setup
- Cloudera VM Setup
- GCP Cluster Fixes
- Cluster Setup on Google Cloud
- Environment Update
Hadoop & Yarn
- HDFS and Hadoop Commands
- Yarn Cluster Overview
Sqoop Import
- Sqoop Introduction
- Managing Target Directories
- Working with Parquet File Format
- Working with Avro File Format
- Working with Different Compressions
- Conditional Imports
- Split-by and Boundary Queries
- Field delimiters
- Incremental Appends
- Sqoop-Hive Cluster Fix
- Sqoop Hive Import
- Sqoop List Tables/Database
- Sqoop Assignment 1
- Sqoop Assignment 2
- Sqoop Import Practice 1
- Sqoop Import Practice 2
Sqoop Export
- Export from HDFS to MySQL
- Export from Hive to MySQL
- Export Avro Compressed to MySQL
- Bonus Lecture: Sqoop with Airflow
Apache Flume
- Flume Introduction & Architecture
- Exec Source and Logger Sink
- Moving data from Twitter to HDFS
- Moving data from NetCat to HDFS
- Flume Interceptors
- Flume Interceptor Example
- Flume Multi-Agent Flow
- Flume Consolidation
Apache Hive
- Hive Introduction
- Hive Database
- Hive Managed Tables
- Hive External Tables
- Hive Inserts
- Hive Analytics
- Working with Parquet
- Compressing Parquet
- Working with Fixed File Format
- Alter Command
- Hive String Functions
- Hive Date Functions
- Hive Partitioning
- Hive Bucketing
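To give a flavour of the Hive topics listed above (databases, managed tables, partitioning), here is a minimal Scala sketch that issues HiveQL through Spark's Hive support, which the later lesson "Working with Hive Tables in Spark" also touches on. The database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read and write tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // A managed, partitioned table (all names are illustrative only)
    spark.sql("CREATE DATABASE IF NOT EXISTS retail")
    spark.sql(
      """CREATE TABLE IF NOT EXISTS retail.orders (
        |  order_id INT,
        |  amount DOUBLE
        |) PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    spark.sql("SHOW PARTITIONS retail.orders").show()
    spark.stop()
  }
}
```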
Spark with Yarn & HDFS
- What is Apache Spark
- Understanding Cluster Manager (Yarn)
- Understanding Distributed Storage (HDFS)
- Running Spark on Yarn/HDFS
- Understanding Deploy Modes
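As a minimal sketch of running Spark against HDFS, the following Scala program reads from a hypothetical HDFS path; on a real cluster the YARN master and deploy mode would normally be supplied through spark-submit rather than hard-coded.

```scala
import org.apache.spark.sql.SparkSession

object HdfsReadSketch {
  def main(args: Array[String]): Unit = {
    // On a cluster, the master ("yarn") and the deploy mode (client/cluster)
    // are usually passed with spark-submit, not set in code.
    val spark = SparkSession.builder().appName("hdfs-read-sketch").getOrCreate()

    // Hypothetical HDFS path
    val lines = spark.sparkContext.textFile("hdfs:///data/logs/app.log")
    println(s"Line count: ${lines.count()}")

    spark.stop()
  }
}
```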
GCS Cluster
- Spark on GCS Cluster
Spark Internals
- Drivers & Executors
- RDDs & Dataframes
- Transformation & Actions
- Wide & Narrow Transformations
- Understanding Execution Plan
- Different Plans by Driver
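The following short Scala sketch illustrates the internals topics above: a narrow transformation, a wide transformation that forces a shuffle, and how lineage and execution plans can be inspected. All data is made up inline.

```scala
import org.apache.spark.sql.SparkSession

object ExecutionPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("plan-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val words  = sc.parallelize(Seq("a", "b", "a", "c"))
    val narrow = words.map(w => (w, 1))        // narrow: each partition processed independently
    val wide   = narrow.reduceByKey(_ + _)     // wide: triggers a shuffle across partitions

    // The lineage shows the stages the driver will build
    println(wide.toDebugString)

    // DataFrames expose logical and physical plans through explain()
    import spark.implicits._
    val df = Seq(("a", 1), ("b", 2)).toDF("word", "n")
    df.groupBy("word").sum("n").explain(true)

    spark.stop()
  }
}
```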
Spark RDD : Transformation & Actions
- Map/FlatMap Transformation
- Filter/Intersection
- Union/Distinct Transformation
- GroupByKey / Group people based on birthday months
- ReduceByKey / Total number of students in each subject
- SortByKey / Sort students based on their roll number
- MapPartition / MapPartitionWithIndex
- Change number of Partitions
- Join / Join email addresses based on customer name
- Spark Actions
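A minimal Scala sketch of the RDD transformations and actions listed above, using a made-up (subject, student) dataset to mirror the "total number of students in each subject" example:

```scala
import org.apache.spark.sql.SparkSession

object RddTransformationsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (subject, student) records
    val enrolments = sc.parallelize(Seq(("maths", "asha"), ("physics", "ravi"), ("maths", "john")))

    // reduceByKey: total number of students in each subject
    val perSubject = enrolments.mapValues(_ => 1).reduceByKey(_ + _)

    // sortByKey: order subjects alphabetically
    val sorted = perSubject.sortByKey()

    // Actions (collect, foreach) trigger execution of the lazy transformations above
    sorted.collect().foreach { case (subject, count) => println(s"$subject -> $count") }

    spark.stop()
  }
}
```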
Spark RDD Practice
- Scala Tuples
- Filter Error Logs
- Frequency of words in a text file
- Population of each city
- Orders placed by customers
- Average rating of movies
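As a sketch of the word-frequency practice exercise above, assuming a hypothetical local text file path:

```scala
import org.apache.spark.sql.SparkSession

object WordFrequencySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("word-frequency").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input file; any plain-text file will do
    val lines = sc.textFile("data/sample.txt")

    val counts = lines
      .flatMap(_.split("\\s+"))        // split lines into words
      .filter(_.nonEmpty)
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)              // frequency of each word

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```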
Spark Dataframes & Spark SQL
- Dataframe Intro
- Dataframe from JSON Files
- Dataframe from Parquet Files
- Dataframe from CSV Files
- Dataframe from Avro File
- Working with XML
- Working with Columns
- Working with String
- Working with Dates
- Dataframe Filter API
- DataFrame API Part1
- DataFrame API Part2
- Spark SQL
- Working with Hive Tables in Spark
- Datasets versus Dataframe
- User Defined Functions (UDFS)
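A short Scala sketch touching several of the DataFrame and Spark SQL topics above: reading a hypothetical JSON file, filtering with the column API, querying through a temporary view, and applying a simple user-defined function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object DataframeSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("df-sql-sketch").master("local[*]").getOrCreate()

    // Hypothetical JSON file with {"name": ..., "age": ...} records
    val people = spark.read.json("data/people.json")

    // Column and filter APIs
    val adults = people.filter(people("age") >= 18).select("name", "age")

    // Spark SQL over a temporary view
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 18").show()

    // A simple user-defined function applied as a new column
    val shout = udf((s: String) => s.toUpperCase)
    adults.withColumn("name_upper", shout(adults("name"))).show()

    spark.stop()
  }
}
```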
Using IntelliJ IDE
- IntelliJ Setup
- Project Setup
- Writing the first Spark program in the IDE
- Understanding Spark configuration
- Adding Actions/Transformations
- Understanding Execution Plan
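A minimal "first Spark program" of the kind this section builds in the IDE; the app name and the spark.sql.shuffle.partitions setting are illustrative, not prescribed by the course.

```scala
import org.apache.spark.sql.SparkSession

object FirstSparkApp {
  def main(args: Array[String]): Unit = {
    // Configuration set in code for local runs; on a cluster these values
    // usually come from spark-submit or spark-defaults.conf
    val spark = SparkSession.builder()
      .appName("first-spark-app")
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    val numbers = spark.sparkContext.parallelize(1 to 100)
    println(s"Sum: ${numbers.sum()}")   // an action that triggers execution

    spark.stop()
  }
}
```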
Running Spark on EMR (AWS Cloud)
- EMR Cluster Overview
- Cluster Setup
- Setting Spark Code for EMR
- Using Spark-submit
- Running Spark on EMR Cluster
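A sketch of a Spark job shaped for EMR, reading from and writing to hypothetical S3 locations; the bucket names and paths are placeholders, and the YARN master is expected to come from spark-submit.

```scala
import org.apache.spark.sql.SparkSession

object EmrJobSketch {
  def main(args: Array[String]): Unit = {
    // On EMR the master is YARN and is supplied by spark-submit,
    // so no .master(...) is hard-coded here
    val spark = SparkSession.builder().appName("emr-job-sketch").getOrCreate()

    // Hypothetical S3 bucket and prefixes; EMR resolves s3:// paths through EMRFS
    val orders = spark.read.option("header", "true").csv("s3://my-bucket/input/orders/")
    orders.groupBy("status").count()
      .write.mode("overwrite")
      .parquet("s3://my-bucket/output/order_counts/")

    spark.stop()
  }
}
```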
Spark with Cassandra
- Cassandra Course
- Creating Spark RDD from Cassandra Table
- Processing Cassandra data in Spark
- Cassandra Rows to Case Class
- Saving Spark RDD to Cassandra
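A hedged Scala sketch of the Spark-Cassandra integration covered above, assuming the DataStax spark-cassandra-connector dependency and a hypothetical keyspace, table, and case class:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._   // brings in cassandraTable and saveToCassandra

// Hypothetical row layout; field names must match the Cassandra column names
case class Movie(id: Int, title: String, rating: Double)

object CassandraRddSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-rdd-sketch")
      .setMaster("local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical Cassandra host

    val sc = new SparkContext(conf)

    // Read Cassandra rows into a case class, process them, and write results back
    val movies = sc.cassandraTable[Movie]("media", "movies")
    val highlyRated = movies.filter(_.rating >= 4.0)
    highlyRated.saveToCassandra("media", "top_movies", SomeColumns("id", "title", "rating"))

    sc.stop()
  }
}
```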
Getting Started with MongoDB
- MongoDB Intro
- MongoDB Use Cases & Limitations
- MongoDB Installation
CRUD Operations
- Find
- Find With Filter
- Insert
- Update
- Update Continued
- Projections
- Delete
Working with Operators
- In / not in Operators
- gte / lte Operators
- and / or Operators
- regex Operator
MongoDB Compass
- Working with GUI
Advanced Mongo
- Validation/Schema
- Working with Indexes
Spark with Mongo
- Spark Mongo Integration
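A hedged sketch of Spark-MongoDB integration, assuming the mongo-spark-connector dependency (2.x/3.x style configuration) and a hypothetical local MongoDB database and collections:

```scala
import org.apache.spark.sql.SparkSession
import com.mongodb.spark.MongoSpark   // from the mongo-spark-connector dependency

object SparkMongoSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local MongoDB deployment, database "shop", collections "orders" and "order_summary"
    val spark = SparkSession.builder()
      .appName("spark-mongo-sketch")
      .master("local[*]")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/shop.orders")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/shop.order_summary")
      .getOrCreate()

    // Load the collection as a DataFrame, aggregate, and write the result back to MongoDB
    val orders = MongoSpark.load(spark)
    val summary = orders.groupBy("status").count()
    MongoSpark.save(summary.write.mode("overwrite"))

    spark.stop()
  }
}
```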