Learn By Example: Hadoop, MapReduce for Big Data problems

BY
Udemy

Learn about the principles of parallel thinking to master Hadoop and MapReduce functionality.

Mode

Online

Fees

₹ 4,099

Quick Facts

Medium of instruction: English
Mode of learning: Self-study
Mode of delivery: Video and text based

Course overview

The Learn By Example: Hadoop, MapReduce for Big Data Problems certification course is designed by Loony Corn, a global e-learning platform founded by ex-Google, Stanford, and Flipkart team members, and is made available by Udemy for individuals who want to build sophisticated distributed computing applications that process large amounts of data using Hadoop and MapReduce. The course aims to give participants a hands-on introduction to Hadoop from the very beginning.

The Learn By Example: Hadoop, MapReduce for Big Data Problems online classes include more than 13.5 hours of video-based lessons accompanied by 112 downloadable study materials and articles. These cover topics such as parallel thinking, performance tuning, natural language processing, cluster management, serial computing, distributed computing, collaborative filtering, and k-means clustering, and also teach techniques for using VMs and the cloud to build Hadoop clusters.

The highlights

  • Certificate of completion
  • Self-paced course
  • 13.5 hours of pre-recorded video content
  • 1 article
  • 112 downloadable resources

Program offerings

  • Online course
  • Downloadable learning resources
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Course and certificate fees

Fees information
₹ 4,099
Certificate availability

Yes

Certificate providing authority

Udemy

What you will learn

  • Natural language processing
  • Knowledge of big data
  • SQL knowledge

After completing the Learn By Example: Hadoop, MapReduce for Big Data Problems online certification, participants will have been introduced to the methodologies and techniques of MapReduce and Hadoop for big data operations. Participants will explore how YARN, MapReduce, and HDFS interact, and will acquire the principles of parallel thinking. They will learn concepts involved in performance tuning, collaborative filtering, natural language processing, k-means clustering, serial computing, distributed computing, and cluster management. Additionally, participants will learn how to express SQL SELECT and GROUP BY operations with MapReduce, as well as how to build inverted indices.

The syllabus

Introduction

  • You, this course and Us

Why is Big Data a Big Deal

  • The Big Data Paradigm
  • Serial vs Distributed Computing
  • What is Hadoop?
  • HDFS or the Hadoop Distributed File System
  • MapReduce Introduced
  • YARN or Yet Another Resource Negotiator

Installing Hadoop in a Local Environment

  • Hadoop Install Modes
  • Hadoop Standalone mode Install
  • Hadoop Pseudo-Distributed mode Install

The MapReduce "Hello World"

  • The basic philosophy underlying MapReduce
  • MapReduce - Visualized And Explained
  • MapReduce - Digging a little deeper at every step
  • "Hello World" in MapReduce
  • The Mapper
  • The Reducer
  • The Job
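
To illustrate the Mapper/Reducer/Job pattern this module teaches, here is a minimal word-count sketch in plain Python (a hedged illustration of the concepts, not the course's actual Hadoop Java code). The `run_job` helper simulates the shuffle-and-sort phase that Hadoop performs between the map and reduce steps:

```python
from collections import defaultdict

def mapper(line):
    # Map step: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce step: sum all counts shuffled to this key.
    return word, sum(counts)

def run_job(lines):
    # Simulated shuffle-and-sort: group all mapped values by key.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))

print(run_job(["Hello World", "hello MapReduce"]))
# → {'hello': 2, 'mapreduce': 1, 'world': 1}
```

In real Hadoop the grouping happens across machines; the logic of the mapper and reducer, however, is exactly this shape.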

Run a MapReduce Job

  • Get comfortable with HDFS
  • Run your first MapReduce Job

Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API

  • Parallelize the reduce phase - use the Combiner
  • Not all Reducers are Combiners
  • How many mappers and reducers does your MapReduce have?
  • Parallelizing reduce using Shuffle And Sort
  • MapReduce is not limited to the Java language - Introducing the Streaming API
  • Python for MapReduce
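
The combiner idea from this module can be sketched in Python as a local pre-reduce inside the mapper (a hedged illustration, not course code): because addition is associative and commutative, partial sums can be computed before the shuffle, cutting network traffic. This is also why "not all reducers are combiners" — an operation like computing a mean cannot simply be applied twice:

```python
from collections import Counter

def mapper_with_combiner(lines):
    # Combiner: pre-aggregate (word, 1) pairs locally so only one
    # pair per distinct word leaves this mapper for the shuffle.
    local = Counter()
    for line in lines:
        for word in line.split():
            local[word] += 1
    return dict(local)

# Without a combiner this split emits 6 raw pairs; with it, only 4.
print(mapper_with_combiner(["the cat sat", "the cat ran"]))
# → {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```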

HDFS and Yarn

  • HDFS - Protecting against data loss using replication
  • HDFS - Name nodes and why they're critical
  • HDFS - Checkpointing to backup name node information
  • Yarn - Basic components
  • Yarn - Submitting a job to Yarn
  • Yarn - Plug in scheduling policies
  • Yarn - Configure the scheduler
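
Plugging in a scheduling policy, as the last two lessons cover, is done through YARN's configuration file. A minimal `yarn-site.xml` fragment selecting the Capacity Scheduler (the property name below is the standard Hadoop one; the exact class to use depends on the policy you choose) might look like:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
```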

MapReduce Customizations For Finer Grained Control

  • Setting up your MapReduce to accept command line arguments
  • The Tool, ToolRunner and GenericOptionsParser
  • Configuring properties of the Job object
  • Customizing the Partitioner, Sort Comparator, and Group Comparator

The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!

  • The heart of search engines - The Inverted Index
  • Generating the inverted index using MapReduce
  • Custom data types for keys - The Writable Interface
  • Represent a Bigram using a WritableComparable
  • MapReduce to count the Bigrams in input text
  • Setting up your Hadoop project
  • Test your MapReduce job using MRUnit
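
The inverted index this module builds maps each word to the documents containing it. A minimal Python sketch of the same map/reduce logic (illustrative only; the course implements this in Java with custom Writable key types):

```python
from collections import defaultdict

def mapper(doc_id, text):
    # Map step: emit (word, doc_id) for every word in the document.
    for word in text.lower().split():
        yield word, doc_id

def reducer(word, doc_ids):
    # Reduce step: the posting list is the sorted set of documents.
    return word, sorted(set(doc_ids))

def build_index(docs):
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, d in mapper(doc_id, text):
            grouped[word].append(d)
    return {w: reducer(w, ids)[1] for w, ids in grouped.items()}

index = build_index({"d1": "big data", "d2": "big deal"})
print(index["big"])  # → ['d1', 'd2']
```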

Input and Output Formats and Customized Partitioning

  • Introducing the File Input Format
  • Text And Sequence File Formats
  • Data partitioning using a custom partitioner
  • Make the custom partitioner real in code
  • Total Order Partitioning
  • Input Sampling, Distribution, Partitioning and configuring these
  • Secondary Sort
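
A partitioner decides which reducer receives each key. A Python sketch of Hadoop's default hash-based scheme (custom partitioners replace exactly this function; the helper names here are illustrative):

```python
def hash_partitioner(key, num_reducers):
    # Mirrors Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (hash(key) & 0x7FFFFFFF) % num_reducers

def partition(pairs, num_reducers):
    # Route each (key, value) pair to its reducer's bucket.
    # All pairs sharing a key always land in the same bucket.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash_partitioner(key, num_reducers)].append((key, value))
    return buckets
```

A custom partitioner would replace `hash_partitioner` with domain logic, e.g. routing keys by range for total order partitioning.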

Recommendation Systems using Collaborative Filtering

  • Introduction to Collaborative Filtering
  • Friend recommendations using chained MR jobs
  • Get common friends for every pair of users - the first MapReduce
  • Top 10 friend recommendation for every user - the second MapReduce
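
The first of the two chained jobs above can be sketched in Python as follows (a hedged sketch of the idea, not course code): each user's mapper emits every pair of that user's friends, since those two people share the user as a mutual friend; the reducer then counts mutual friends per pair.

```python
from collections import defaultdict
from itertools import combinations

def mapper(user, friends):
    # Every pair among this user's friends shares `user` in common.
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), user

def common_friends(friend_lists):
    grouped = defaultdict(list)
    for user, friends in friend_lists.items():
        for pair, common in mapper(user, friends):
            grouped[pair].append(common)
    # Reduce: number of mutual friends for each candidate pair.
    return {pair: len(c) for pair, c in grouped.items()}

graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(common_friends(graph))
# → {('B', 'C'): 1, ('A', 'C'): 1, ('A', 'B'): 1}
```

The second job would then filter out pairs who are already friends and keep the top 10 counts per user.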

Hadoop as a Database

  • Structured data in Hadoop
  • Running an SQL Select with MapReduce
  • Running an SQL Group By with MapReduce
  • A MapReduce Join - The Map Side
  • A MapReduce Join - The Reduce Side
  • A MapReduce Join - Sorting and Partitioning
  • A MapReduce Join - Putting it all together
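
The SQL-on-Hadoop idea in this module maps naturally onto map/reduce: the grouping column becomes the map key, and the reducer performs the aggregate. A minimal Python sketch of `SELECT dept, SUM(salary) ... GROUP BY dept` (illustrative, with hypothetical column names):

```python
from collections import defaultdict

def mapper(row):
    # Map step: emit the GROUP BY column as key, the aggregated
    # column as value.
    yield row["dept"], row["salary"]

def group_by_sum(rows):
    grouped = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            grouped[key].append(value)
    # Reduce step: apply the aggregate function per group.
    return {dept: sum(v) for dept, v in grouped.items()}

rows = [{"dept": "eng", "salary": 100}, {"dept": "hr", "salary": 80},
        {"dept": "eng", "salary": 120}]
print(group_by_sum(rows))  # → {'eng': 220, 'hr': 80}
```

A SELECT with a WHERE clause is even simpler: a map-only job that emits only matching rows.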

K-Means Clustering

  • What is K-Means Clustering?
  • A MapReduce job for K-Means Clustering
  • K-Means Clustering - Measuring the distance between points
  • K-Means Clustering - Custom Writables for Input/Output
  • K-Means Clustering - Configuring the Job
  • K-Means Clustering - The Mapper and Reducer
  • K-Means Clustering : The Iterative MapReduce Job
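
One iteration of the iterative k-means job above can be sketched in Python (a conceptual sketch, not the course's Java implementation with custom Writables): the map step assigns each point to its nearest centroid, and the reduce step averages each cluster to produce the next centroids.

```python
import math

def nearest(point, centroids):
    # Map step: index of the closest centroid to this point.
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def kmeans_iteration(points, centroids):
    # Shuffle: group points by assigned centroid.
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        clusters[nearest(p, centroids)].append(p)
    # Reduce: each new centroid is the mean of its cluster
    # (empty clusters keep their old centroid).
    return [
        tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[i]
        for i, pts in clusters.items()
    ]

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans_iteration(points, [(0, 0), (10, 10)]))
# → [(0.0, 0.5), (10.0, 10.5)]
```

The driver repeats this job until the centroids stop moving, which is what makes it an iterative MapReduce job.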

Setting up a Hadoop Cluster

  • Manually configuring a Hadoop cluster (Linux VMs)
  • Getting started with Amazon Web Services
  • Start a Hadoop Cluster with Cloudera Manager on AWS

Appendix

  • Setup a Virtual Linux Instance (For Windows users)
  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables

Instructors

Janani Ravi
Instructor
Udemy

Vitthal Srinivasan
Instructor
Udemy
