Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB
Quick Facts
| Particular | Details |
| --- | --- |
| Medium of instruction | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course overview
Big data refers to collections of data that grow rapidly over time and become so large and complex that traditional data processing tools cannot store or process them efficiently. The Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB online certification is developed by Navdeep Kaur, instructor and founder of TechnoAvengers.com, and is offered by Udemy.
The Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB online course includes more than 11.5 hours of pre-recorded lessons, 5 articles, and 21 downloadable resources. It begins with an explanation of the Hadoop Distributed File System (HDFS) and the most common Hadoop commands needed to work with it. The training also covers topics such as data migration, data ingestion, CRUD operations, data frames, and the functionality of tools such as MySQL, Cassandra, Sqoop, Flume, Hive, and MongoDB.
The highlights
- Certificate of completion
- Self-paced course
- 11.5 hours of pre-recorded video content
- 5 articles
- 21 downloadable resources
- Assignments
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information
Certificate availability
Yes
Certificate providing authority
Udemy
Who it is for
What you will learn
After completing the Data Engineering Master Course: Spark/Hadoop/Kafka/MongoDB certification course, candidates will understand the principles of big data analytics using Hadoop, Sqoop, Hive, Spark, Flume, and MongoDB. Candidates will explore the fundamentals of data frames, data migration, data ingestion, GUI tools, and CRUD operations. They will learn how to migrate data from Hive to MySQL and from HDFS to MySQL, and will gain working knowledge of Cassandra, the IntelliJ IDE, and Netcat for big data operations.
The syllabus
Big Data Introduction
- Course Intro
- Big Data Intro
- Understanding Big Data Ecosystem
Environment Setup
- Cloudera VM Setup
- GCP Cluster Fixes
- Cluster Setup on Google Cloud
- Environment Update
Hadoop & Yarn
- HDFS and Hadoop Commands
- Yarn Cluster Overview
Sqoop Import
- Sqoop Introduction
- Managing Target Directories
- Working with Parquet File Format
- Working with Avro File Format
- Working with Different Compressions
- Conditional Imports
- Split-by and Boundary Queries
- Field delimiters
- Incremental Appends
- Sqoop-Hive Cluster Fix
- Sqoop Hive Import
- Sqoop List Tables/Database
- Sqoop Assignment 1
- Sqoop Assignment 2
- Sqoop Import Practice 1
- Sqoop Import Practice 2
Sqoop Export
- Export from HDFS to MySQL
- Export from Hive to MySQL
- Export Avro Compressed to MySQL
- Bonus Lecture: Sqoop with Airflow
Apache Flume
- Flume Introduction & Architecture
- Exec Source and Logger Sink
- Moving data from Twitter to HDFS
- Moving data from NetCat to HDFS
- Flume Interceptors
- Flume Interceptor Example
- Flume Multi-Agent Flow
- Flume Consolidation
Apache Hive
- Hive Introduction
- Hive Database
- Hive Managed Tables
- Hive External Tables
- Hive Inserts
- Hive Analytics
- Working with Parquet
- Compressing Parquet
- Working with Fixed File Format
- Alter Command
- Hive String Functions
- Hive Date Functions
- Hive Partitioning
- Hive Bucketing
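To give a flavour of the Hive topics listed above (databases, managed tables, partitioning), here is a minimal Scala sketch that issues HiveQL through Spark's Hive support, which the later lesson "Working with Hive Tables in Spark" also touches on. The database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read and write tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // A managed, partitioned table (all names are illustrative only)
    spark.sql("CREATE DATABASE IF NOT EXISTS retail")
    spark.sql(
      """CREATE TABLE IF NOT EXISTS retail.orders (
        |  order_id INT,
        |  amount DOUBLE
        |) PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    spark.sql("SHOW PARTITIONS retail.orders").show()
    spark.stop()
  }
}
```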
Spark with Yarn & HDFS
- What is Apache Spark
- Understanding Cluster Manager (Yarn)
- Understanding Distributed Storage (HDFS)
- Running Spark on Yarn/HDFS
- Understanding Deploy Modes
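As a minimal sketch of running Spark against HDFS, the following Scala program reads from a hypothetical HDFS path; on a real cluster the YARN master and deploy mode would normally be supplied through spark-submit rather than hard-coded.

```scala
import org.apache.spark.sql.SparkSession

object HdfsReadSketch {
  def main(args: Array[String]): Unit = {
    // On a cluster, the master ("yarn") and the deploy mode (client/cluster)
    // are usually passed with spark-submit, not set in code.
    val spark = SparkSession.builder().appName("hdfs-read-sketch").getOrCreate()

    // Hypothetical HDFS path
    val lines = spark.sparkContext.textFile("hdfs:///data/logs/app.log")
    println(s"Line count: ${lines.count()}")

    spark.stop()
  }
}
```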
GCS Cluster
- Spark on GCS Cluster
Spark Internals
- Drivers & Executors
- RDDs & Dataframes
- Transformation & Actions
- Wide & Narrow Transformations
- Understanding Execution Plan
- Different Plans by Driver
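The following short Scala sketch illustrates the internals topics above: a narrow transformation, a wide transformation that forces a shuffle, and how lineage and execution plans can be inspected. All data is made up inline.

```scala
import org.apache.spark.sql.SparkSession

object ExecutionPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("plan-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val words  = sc.parallelize(Seq("a", "b", "a", "c"))
    val narrow = words.map(w => (w, 1))        // narrow: each partition processed independently
    val wide   = narrow.reduceByKey(_ + _)     // wide: triggers a shuffle across partitions

    // The lineage shows the stages the driver will build
    println(wide.toDebugString)

    // DataFrames expose logical and physical plans through explain()
    import spark.implicits._
    val df = Seq(("a", 1), ("b", 2)).toDF("word", "n")
    df.groupBy("word").sum("n").explain(true)

    spark.stop()
  }
}
```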
Spark RDD : Transformation & Actions
- Map/FlatMap Transformation
- Filter/Intersection
- Union/Distinct Transformation
- GroupByKey / Group people based on birthday months
- ReduceByKey / Total number of students in each subject
- SortByKey / Sort students based on their roll number
- MapPartition / MapPartitionWithIndex
- Change number of Partitions
- Join / Join email addresses based on customer name
- Spark Actions
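A minimal Scala sketch of the RDD transformations and actions listed above, using a made-up (subject, student) dataset to mirror the "total number of students in each subject" example:

```scala
import org.apache.spark.sql.SparkSession

object RddTransformationsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (subject, student) records
    val enrolments = sc.parallelize(Seq(("maths", "asha"), ("physics", "ravi"), ("maths", "john")))

    // reduceByKey: total number of students in each subject
    val perSubject = enrolments.mapValues(_ => 1).reduceByKey(_ + _)

    // sortByKey: order subjects alphabetically
    val sorted = perSubject.sortByKey()

    // Actions (collect, foreach) trigger execution of the lazy transformations above
    sorted.collect().foreach { case (subject, count) => println(s"$subject -> $count") }

    spark.stop()
  }
}
```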
Spark RDD Practice
- Scala Tuples
- Filter Error Logs
- Frequency of words in a text file
- Population of each city
- Orders placed by customers
- Average rating of movies
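As a sketch of the word-frequency practice exercise above, assuming a hypothetical local text file path:

```scala
import org.apache.spark.sql.SparkSession

object WordFrequencySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("word-frequency").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input file; any plain-text file will do
    val lines = sc.textFile("data/sample.txt")

    val counts = lines
      .flatMap(_.split("\\s+"))        // split lines into words
      .filter(_.nonEmpty)
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)              // frequency of each word

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```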
Spark Dataframes & Spark SQL
- Dataframe Intro
- Dataframe from JSON Files
- Dataframe from Parquet Files
- Dataframe from CSV Files
- Dataframe from Avro File
- Working with XML
- Working with Columns
- Working with String
- Working with Dates
- Dataframe Filter API
- DataFrame API Part1
- DataFrame API Part2
- Spark SQL
- Working with Hive Tables in Spark
- Datasets versus Dataframe
- User Defined Functions (UDFS)
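A short Scala sketch touching several of the DataFrame and Spark SQL topics above: reading a hypothetical JSON file, filtering with the column API, querying through a temporary view, and applying a simple user-defined function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object DataframeSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("df-sql-sketch").master("local[*]").getOrCreate()

    // Hypothetical JSON file with {"name": ..., "age": ...} records
    val people = spark.read.json("data/people.json")

    // Column and filter APIs
    val adults = people.filter(people("age") >= 18).select("name", "age")

    // Spark SQL over a temporary view
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 18").show()

    // A simple user-defined function applied as a new column
    val shout = udf((s: String) => s.toUpperCase)
    adults.withColumn("name_upper", shout(adults("name"))).show()

    spark.stop()
  }
}
```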
Using IntelliJ IDE
- IntelliJ Setup
- Project Setup
- Writing the first Spark program in the IDE
- Understanding Spark configuration
- Adding Actions/Transformations
- Understanding Execution Plan
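A minimal "first Spark program" of the kind this section builds in the IDE; the app name and the spark.sql.shuffle.partitions setting are illustrative, not prescribed by the course.

```scala
import org.apache.spark.sql.SparkSession

object FirstSparkApp {
  def main(args: Array[String]): Unit = {
    // Configuration set in code for local runs; on a cluster these values
    // usually come from spark-submit or spark-defaults.conf
    val spark = SparkSession.builder()
      .appName("first-spark-app")
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    val numbers = spark.sparkContext.parallelize(1 to 100)
    println(s"Sum: ${numbers.sum()}")   // an action that triggers execution

    spark.stop()
  }
}
```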
Running Spark on EMR (AWS Cloud)
- EMR Cluster Overview
- Cluster Setup
- Setting Spark Code for EMR
- Using Spark-submit
- Running Spark on EMR Cluster
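A sketch of a Spark job shaped for EMR, reading from and writing to hypothetical S3 locations; the bucket names and paths are placeholders, and the YARN master is expected to come from spark-submit.

```scala
import org.apache.spark.sql.SparkSession

object EmrJobSketch {
  def main(args: Array[String]): Unit = {
    // On EMR the master is YARN and is supplied by spark-submit,
    // so no .master(...) is hard-coded here
    val spark = SparkSession.builder().appName("emr-job-sketch").getOrCreate()

    // Hypothetical S3 bucket and prefixes; EMR resolves s3:// paths through EMRFS
    val orders = spark.read.option("header", "true").csv("s3://my-bucket/input/orders/")
    orders.groupBy("status").count()
      .write.mode("overwrite")
      .parquet("s3://my-bucket/output/order_counts/")

    spark.stop()
  }
}
```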
Spark with Cassandra
- Cassandra Course
- Creating Spark RDD from Cassandra Table
- Processing Cassandra data in Spark
- Cassandra Rows to Case Class
- Saving Spark RDD to Cassandra
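A hedged Scala sketch of the Spark-Cassandra integration covered above, assuming the DataStax spark-cassandra-connector dependency and a hypothetical keyspace, table, and case class:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._   // brings in cassandraTable and saveToCassandra

// Hypothetical row layout; field names must match the Cassandra column names
case class Movie(id: Int, title: String, rating: Double)

object CassandraRddSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-rdd-sketch")
      .setMaster("local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical Cassandra host

    val sc = new SparkContext(conf)

    // Read Cassandra rows into a case class, process them, and write results back
    val movies = sc.cassandraTable[Movie]("media", "movies")
    val highlyRated = movies.filter(_.rating >= 4.0)
    highlyRated.saveToCassandra("media", "top_movies", SomeColumns("id", "title", "rating"))

    sc.stop()
  }
}
```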
Getting Started with MongoDB
- MongoDB Intro
- MongoDB Use Cases & Limitations
- MongoDB Installation
CRUD Operations
- Find
- Find With Filter
- Insert
- Update
- Update Continued
- Projections
- Delete
Working with Operators
- In / not in Operators
- gte / lte Operators
- and / or Operators
- regex Operator
MongoDB Compass
- Working with GUI
Advanced Mongo
- Validation/Schema
- Working with Indexes
Spark with Mongo
- Spark Mongo Integration
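A hedged sketch of Spark-MongoDB integration, assuming the mongo-spark-connector dependency (2.x/3.x style configuration) and a hypothetical local MongoDB database and collections:

```scala
import org.apache.spark.sql.SparkSession
import com.mongodb.spark.MongoSpark   // from the mongo-spark-connector dependency

object SparkMongoSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local MongoDB deployment, database "shop", collections "orders" and "order_summary"
    val spark = SparkSession.builder()
      .appName("spark-mongo-sketch")
      .master("local[*]")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/shop.orders")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/shop.order_summary")
      .getOrCreate()

    // Load the collection as a DataFrame, aggregate, and write the result back to MongoDB
    val orders = MongoSpark.load(spark)
    val summary = orders.groupBy("status").count()
    MongoSpark.save(summary.write.mode("overwrite"))

    spark.stop()
  }
}
```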