Certification Course on Big Data Engineering with Hadoop and Spark

By CloudxLab

Mode: Online
Duration: 3 Months
Fees: ₹1,499 (discounted from ₹2,998)

Quick Facts

Medium of instruction: English
Mode of learning: Self study, Virtual Classroom
Mode of delivery: Video and text based

Course and certificate fees

Fees information: ₹1,499 (discounted from ₹2,998)
Certificate availability: Yes
Certificate providing authority: CloudxLab

The syllabus

Course 1: Big Data with Hadoop

Introduction
  • Big Data Introduction
  • Distributed systems
  • Big Data Use Cases
  • Various Solutions
  • Overview of Hadoop Ecosystem
  • Spark Ecosystem Walkthrough
  • Quiz
Foundation & Environment
  • Understanding CloudxLab
  • CloudxLab Hands-on
  • Hadoop & Spark Hands-on
  • Quiz and Assessment
  • Basics of Linux - Quick Hands-on
  • Understanding Regular Expressions
  • Quiz and Assessment
  • Setting up VM (optional)
ZooKeeper
  • ZooKeeper - Race Condition
  • ZooKeeper - Deadlock
  • Hands-On
  • Quiz & Assessment
  • How does election happen - Paxos Algorithm?
  • Use cases
  • When not to use
  • Quiz & Assessment
HDFS
  • Why HDFS or Why not existing file systems?
  • HDFS - NameNode & DataNodes
  • Quiz
  • Advanced HDFS Concepts (HA, Federation)
  • Quiz
  • Hands-on with HDFS (Upload, Download, SetRep)
  • Quiz & Assessment
  • Data Locality (Rack Awareness)
YARN
  • YARN - Why not existing tools?
  • YARN - Evolution from MapReduce 1.0
  • Resource Management: YARN Architecture
  • Advanced Concepts - Speculative Execution
  • Quiz
MapReduce Basics
  • MapReduce - Understanding Sorting
  • MapReduce - Overview & Quiz
  • Example 0 - Word Frequency Problem - Without MR
  • Example 1 - Only Mapper - Image Resizing
  • Example 2 - Word Frequency Problem
  • Example 3 - Temperature Problem
  • Example 4 - Multiple Reducer
  • Example 5 - Java MapReduce Walkthrough & Quiz
MapReduce Advanced
  • Writing MapReduce Code Using Java
  • Building MapReduce project using Apache Ant
  • Concept - Associative & Commutative
  • Quiz
  • Example 8 - Combiner
  • Example 9 - Hadoop Streaming
  • Example 10 - Adv. Problem Solving - Anagrams
  • Example 11 - Adv. Problem Solving - Same DNA
  • Example 12 - Adv. Problem Solving - Similar DNA
  • Example 12 - Joins - Voting
  • Limitations of MapReduce
  • Quiz
Analyzing Data with Pig
  • Pig - Introduction
  • Pig - Modes
  • Getting Started
  • Example - NYSE Stock Exchange
  • Concept - Lazy Evaluation
Processing Data with Hive
  • Hive - Introduction
  • Hive - Data Types
  • Getting Started
  • Loading Data in Hive (Tables)
  • Example: Movielens Data Processing
  • Advanced Concepts: Views
  • Connecting Tableau and HiveServer 2
  • Connecting Microsoft Excel and HiveServer 2
  • Project: Sentiment Analysis of Twitter Data
  • Advanced - Partition Tables
  • Understanding HCatalog & Impala
  • Quiz
NoSQL and HBase
  • NoSQL - Scaling Out / Up
  • NoSQL - ACID Properties and RDBMS Story
  • CAP Theorem
  • HBase Architecture - Region Servers, etc.
  • HBase Data Model - Column Family Orientation
  • Getting Started - Creating a Table, Adding Data
  • Adv. Example - Google Links Storage
  • Concept - Bloom Filter
  • Comparison of NoSQL Databases
  • Quiz
Importing Data with Sqoop, Flume, and Oozie
  • Sqoop - Introduction
  • Sqoop Import - MySQL to HDFS
  • Exporting to MySQL from HDFS
  • Concept - Unbounded Dataset Processing or Stream Processing
  • Flume Overview: Agents - Source, Sink, Channel
  • Example 1 - Data from Local network service into HDFS
  • Example 2 - Extracting Twitter Data
  • Quiz
  • Example 3 - Creating workflow with Oozie

Course 2: Big Data with Spark

Introduction
  • Apache Spark ecosystem walkthrough
  • Spark Introduction - Why Spark?
  • Quiz
Scala Basics
  • Scala - Quick Introduction - Access Scala on CloudxLab
  • Scala - Quick Introduction - Variables and Methods
  • Getting Started: Interactive, Compilation, SBT
  • Types, Variables & Values
  • Functions
  • Collections
  • Classes
  • Parameters
  • More Features
  • Quiz and Assessment
Spark Basics
  • Apache Spark ecosystem walkthrough
  • Spark Introduction - Why Spark?
  • Using the Spark Shell on CloudxLab
  • Example 1 - Performing Word Count
  • Understanding Spark Cluster Modes on YARN
  • RDDs (Resilient Distributed Datasets)
  • General RDD Operations: Transformations & Actions
  • RDD lineage
  • RDD Persistence Overview
  • Distributed Persistence
Writing and Deploying Spark Applications
  • Creating the SparkContext
  • Building a Spark Application (Scala, Java, Python)
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Running Spark on Cluster
  • RDD Partitions
  • Executing Parallel Operations
  • Stages and Tasks
Common Patterns in Spark Data Processing
  • Common Spark Use Cases
  • Example 1 - Data Cleaning (Movielens)
  • Example 2 - Understanding Spark Streaming
  • Understanding Kafka
  • Example 3 - Spark Streaming from Kafka
  • Iterative Algorithms in Spark
  • Project: Real-time analytics of orders in an e-commerce company
Data Formats and Management
  • InputFormat and InputSplit
  • JSON
  • XML
  • Avro
  • How to store many small files - SequenceFile?
  • Parquet
  • Protocol Buffers
  • Comparing Compressions
  • Understanding Row Oriented and Column Oriented Formats - RCFile?
DataFrames and Spark SQL
  • Spark SQL - Introduction
  • Spark SQL - Dataframe Introduction
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • DataFrames and RDDs
  • Comparing Spark SQL, Impala, and Hive-on-Spark
Machine Learning with Spark
  • Machine Learning Introduction
  • Applications Of Machine Learning
  • MLlib Example: k-means
  • SparkR Example

Projects

Sentiment analysis
  • Sentiment analysis of the movie "Iron Man 3" using Hive, with the sentiment data visualized in BI tools such as Tableau
Process the NSE
  • Process the NSE (National Stock Exchange) data using Hive for various insights
MovieLens Project
  • Analyze MovieLens data using Hive
Spark MLlib
  • Generate movie recommendations using Spark MLlib
Churn the logs
  • Churn the logs of the NASA Kennedy Space Center WWW server using Spark to extract useful business and DevOps metrics
Spark application
  • Write an end-to-end Spark application, from writing code on your local machine to deploying it to the cluster
Analytics Dashboard
  • Real-time analytics dashboard for an e-commerce company using Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and Highcharts

Instructors

Mr Sandeep Giri
Founder
CloudxLab

Mr Abhinav Singh
Co-Founder
CloudxLab
