Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB

BY
Udemy

Learn the fundamentals of big data using tools such as Hadoop, Sqoop, Spark, Flume, Hive, and others.

Mode

Online

Fees

₹449 (discounted from ₹2,899)

Quick Facts

Particular               Details
Medium of instruction    English
Mode of learning         Self-study
Mode of delivery         Video and text based

Course overview

Big data refers to data sets that grow rapidly over time and are so large and complex that traditional data processing tools cannot store or process them efficiently. The Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB online certification is developed by Navdeep Kaur, instructor and founder of TechnoAvengers.com, and is offered by Udemy.

The Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB online course comprises more than 11.5 hours of pre-recorded lessons, 5 articles, and 21 downloadable resources. It begins with an explanation of the Hadoop Distributed File System (HDFS) and the most common commands required to work with it. The training also covers topics such as data migration, data ingestion, CRUD operations, and data frames, along with the functionality of tools such as MySQL, Cassandra, Sqoop, Flume, Hive, and MongoDB.

The highlights

  • Certificate of completion
  • Self-paced course
  • 11.5 hours of pre-recorded video content
  • 5 articles
  • 21 downloadable resources
  • Assignments

Program offerings

  • Online course
  • Learning resources
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Course and certificate fees

Fees information
₹449 (discounted from ₹2,899)
Certificate availability

Yes

Certificate providing authority

Udemy

What you will learn

  • Knowledge of big data
  • Knowledge of MongoDB
  • Knowledge of Apache Spark

After completing the Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB certification course, candidates will understand the principles of big data analytics using Hadoop, Sqoop, Hive, Spark, Flume, and MongoDB. Candidates will explore the fundamentals of data frames, data migration, data ingestion, GUIs, and CRUD operations. They will learn to migrate data from Hive to MySQL and from HDFS to MySQL, and will gain working knowledge of Cassandra, the IntelliJ IDE, and Netcat for big data operations.

The syllabus

Big Data Introduction

  • Course Intro
  • Big Data Intro
  • Understanding Big Data Ecosystem

Environment Setup

  • Cloudera vm setup
  • GCP Cluster Fixes
  • Cluster Setup on Google Cloud
  • Environment Update

Hadoop & Yarn

  • HDFS and Hadoop Commands
  • Yarn Cluster Overview

Sqoop Import

  • Sqoop Introduction
  • Managing Target Directories
  • Working with Parquet File Format
  • Working with Avro File Format
  • Working with Different Compressions
  • Conditional Imports
  • Split-by and Boundary Queries
  • Field delimiters
  • Incremental Appends
  • Sqoop-Hive Cluster Fix
  • Sqoop Hive Import
  • Sqoop List Tables/Database
  • Sqoop Assignment1
  • Sqoop Assignment2
  • Sqoop Import Practice1
  • Sqoop Import Practice2
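One idea from the list above worth making concrete is split-by with boundary queries: Sqoop runs a boundary query (roughly SELECT MIN(col), MAX(col)) on the split column and divides that range across its mappers. A minimal plain-Python sketch of that range-partitioning logic, assuming an integer split column (this is an illustration of the idea, not Sqoop's actual implementation):

```python
# Hypothetical sketch of how a --split-by column's [MIN, MAX] range
# could be cut into one closed-open interval per mapper, as Sqoop
# does after its boundary query. Not Sqoop's real code.

def split_ranges(lo, hi, num_mappers):
    """Partition the inclusive range [lo, hi] into num_mappers slices."""
    size = (hi - lo + 1) / num_mappers
    ranges = []
    for i in range(num_mappers):
        start = int(lo + i * size)
        # last mapper takes everything up to hi (inclusive)
        end = int(lo + (i + 1) * size) if i < num_mappers - 1 else hi + 1
        ranges.append((start, end))  # rows with start <= id < end
    return ranges

print(split_ranges(1, 100, 4))  # [(1, 26), (26, 51), (51, 76), (76, 101)]
```

Each mapper then imports only the rows whose split-column value falls in its interval, which is why a skewed split column leads to unevenly loaded mappers.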

Sqoop Export

  • Export from HDFS to MySQL
  • Export from Hive to MySQL
  • Export Avro Compressed to MySQL
  • Bonus Lecture: Sqoop with Airflow

Apache Flume

  • Flume Introduction & Architecture
  • Exec Source and Logger Sink
  • Moving data from Twitter to HDFS
  • Moving data from NetCat to HDFS
  • Flume Interceptors
  • Flume Interceptor Example
  • Flume Multi-Agent Flow
  • Flume Consolidation

Apache Hive

  • Hive Introduction
  • Hive Database
  • Hive Managed Tables
  • Hive External Tables
  • Hive Inserts
  • Hive Analytics
  • Working with Parquet
  • Compressing Parquet
  • Working with Fixed File Format
  • Alter Command
  • Hive String Functions
  • Hive Date Functions
  • Hive Partitioning
  • Hive Bucketing
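Of the topics above, bucketing has a simple core rule: Hive assigns each row to a bucket by hashing the bucketing column and taking the result modulo the number of buckets. A hedged plain-Python illustration of that assignment rule, assuming an integer bucketing column (Hive's actual hash is its Java implementation; for integers the hash is the value itself):

```python
# Illustrative sketch of Hive bucketing: bucket = hash(column) % num_buckets.
# Assumes an integer column, where the hash is the value itself.

def bucket_for(user_id: int, num_buckets: int) -> int:
    return user_id % num_buckets

rows = [101, 205, 333, 404]
by_bucket = {}
for uid in rows:
    by_bucket.setdefault(bucket_for(uid, 4), []).append(uid)

print(by_bucket)  # {1: [101, 205, 333], 0: [404]}
```

Because the bucket is a pure function of the column value, two bucketed tables with the same bucket count can be joined bucket-by-bucket, which is what makes bucketed joins efficient.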

Spark with Yarn & HDFS

  • What is Apache Spark
  • Understanding Cluster Manager (Yarn)
  • Understanding Distributed Storage (HDFS)
  • Running Spark on Yarn/HDFS
  • Understanding Deploy Modes

GCS Cluster

  • Spark on GCS Cluster

Spark Internals

  • Drivers & Executors
  • RDDs & Dataframes
  • Transformation & Actions
  • Wide & Narrow Transformations
  • Understanding Execution Plan
  • Different Plans by Driver

Spark RDD : Transformation & Actions

  • Map/FlatMap Transformation
  • Filter/Intersection
  • Union/Distinct Transformation
  • GroupByKey/ Group people based on Birthday months
  • ReduceByKey / Total Number of students in each Subject
  • SortByKey / Sort students based on their rollno
  • MapPartition / MapPartitionWithIndex
  • Change number of Partitions
  • Join / join email address based on customer name
  • Spark Actions
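The ReduceByKey exercise above ("Total Number of students in each Subject") can be sketched in plain Python, without a Spark cluster, to show what the map, reduceByKey, and sortByKey steps each contribute (the subject/student data below is invented for illustration):

```python
# Plain-Python sketch of the RDD pipeline:
# map (subject, student) -> (subject, 1), then reduceByKey(+), then sortByKey.
from collections import defaultdict

enrollments = [("Math", "Asha"), ("Math", "Ben"), ("Physics", "Chen")]

# map: keep the key, emit a count of 1 per record
pairs = [(subject, 1) for subject, _student in enrollments]

# reduceByKey(+): sum the 1s per subject
counts = defaultdict(int)
for subject, n in pairs:
    counts[subject] += n

# sortByKey: order results by subject name
result = sorted(counts.items())
print(result)  # [('Math', 2), ('Physics', 1)]
```

In Spark the same shape is `rdd.map(lambda r: (r[0], 1)).reduceByKey(add).sortByKey()`, with the reduce happening in parallel across partitions.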

Spark RDD Practice

  • Scala Tuples
  • Filter Error Logs
  • Frequency of word in Text File
  • Population of each city
  • Orders placed by Customers
  • Average rating of a movie
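The "Frequency of word in Text File" exercise above is the classic first RDD program. A minimal plain-Python sketch of its flatMap/map/reduceByKey shape (the input lines here are invented for illustration):

```python
# Word count without a cluster: flatMap(split) -> count per word,
# mirroring rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add).
from collections import Counter

lines = ["spark makes big data simple", "big data big wins"]

words = [w for line in lines for w in line.split()]  # flatMap
freq = Counter(words)                                # reduceByKey(+)

print(freq["big"])  # 3
```

flatMap is the step beginners most often confuse with map: it flattens the per-line lists of words into one stream, so the downstream count sees individual words rather than lists.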

Spark Dataframes & Spark SQL

  • Dataframe Intro
  • Dataframe from JSON Files
  • Dataframe from Parquet Files
  • Dataframe from CSV Files
  • Dataframe from Avro File
  • Working with XML
  • Working with Columns
  • Working with String
  • Working with Dates
  • Dataframe Filter API
  • DataFrame API Part1
  • DataFrame API Part2
  • Spark SQL
  • Working with Hive Tables in Spark
  • Datasets versus Dataframe
  • User Defined Functions (UDFS)
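The filter/select pattern that runs through the DataFrame lectures above can be previewed without a Spark session by treating a list of dicts as the rows (the names and ages below are invented; in Spark this would be `df.filter(col("age") > 30).select("name")`):

```python
# Hedged sketch of DataFrame filter + select semantics over plain dicts,
# standing in for Rows; no SparkSession required.
rows = [
    {"name": "Asha", "age": 31, "city": "Pune"},
    {"name": "Ben",  "age": 24, "city": "Delhi"},
    {"name": "Chen", "age": 45, "city": "Pune"},
]

# filter(col("age") > 30) then select("name")
names = [r["name"] for r in rows if r["age"] > 30]
print(names)  # ['Asha', 'Chen']
```

The point of the real DataFrame API is that the same expression is not executed row-by-row in Python: Spark turns it into an execution plan and pushes the filter down before the data is materialized.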

Using Intellij IDE

  • Intellij Setup
  • Project Setup
  • Writing first Spark program on IDE
  • Understanding spark configuration
  • Adding Actions/Transformations
  • Understanding Execution Plan

Running Spark on EMR (AWS Cloud)

  • EMR Cluster Overview
  • Cluster Setup
  • Setting Spark Code for EMR
  • Using Spark-submit
  • Running Spark on EMR Cluster

Spark with Cassandra

  • Cassandra Course
  • Creating Spark RDD from Cassandra Table
  • Processing Cassandra data in Spark
  • Cassandra Rows to Case Class
  • Saving Spark RDD to Cassandra

Getting Started with MongoDB

  • MongoDB Intro
  • MongoDB Usecase & Limitations
  • MongoDB Installation

Crud Operations

  • Find
  • Find With Filter
  • Insert
  • Update
  • Update Continues
  • Projections
  • Delete
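The four CRUD verbs listed above map onto simple operations against a collection of documents. A hedged sketch using an in-memory list of dicts as the "collection" (the documents are invented; real code would go through a pymongo client):

```python
# In-memory stand-in for a Mongo collection, mirroring
# insertOne / find(filter) / updateOne($set) / deleteOne.
collection = []

# insert
collection.append({"_id": 1, "name": "Asha", "role": "admin"})
collection.append({"_id": 2, "name": "Ben",  "role": "user"})

# find with filter: {"role": "user"}
users = [d["name"] for d in collection if d["role"] == "user"]

# update: {"_id": 2} -> {"$set": {"role": "admin"}}
for d in collection:
    if d["_id"] == 2:
        d["role"] = "admin"

# delete: {"_id": 1}
collection = [d for d in collection if d["_id"] != 1]

print(users, collection)
```

In MongoDB each of these is a single call (`insert_one`, `find`, `update_one`, `delete_one`), and the filter document plays the role of the Python conditions above.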

Working with Operators

  • In / not in Operators
  • gte / lte Operators
  • and / or operators
  • regex operator
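The operators above ($in, $gte/$lte, $and/$or, $regex) are just predicates over document fields. A plain-Python illustration of what each filter document matches (the documents below are invented for illustration):

```python
# Evaluating Mongo-style filter operators over an in-memory document list.
import re

docs = [
    {"name": "alpha", "qty": 5},
    {"name": "beta",  "qty": 15},
    {"name": "gamma", "qty": 25},
]

# {"qty": {"$gte": 10, "$lte": 20}}
mid = [d["name"] for d in docs if 10 <= d["qty"] <= 20]

# {"name": {"$in": ["alpha", "gamma"]}}
picked = [d["name"] for d in docs if d["name"] in ["alpha", "gamma"]]

# {"name": {"$regex": "^a"}}
a_names = [d["name"] for d in docs if re.match("^a", d["name"])]

print(mid, picked, a_names)  # ['beta'] ['alpha', 'gamma'] ['alpha']
```

Combining conditions on different fields in one filter document behaves like the implicit and shown in the range example; $or must be spelled out explicitly.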

MongoDB Compass

  • Working with GUI

Advanced Mongo

  • Validation/Schema
  • Working with Indexes

Spark with Mongo

  • Spark Mongo Integration

Instructors

Ms Navdeep Kaur
Instructor
Udemy
