Data Engineering using Kafka and Spark Structured Streaming

By Udemy

Learn about the components of data engineering and how to construct streaming pipelines using Kafka and Spark Structured Streaming.

Mode

Online

Fees

₹549 (discounted from ₹2,899)

Quick Facts

Particulars                Details
Medium of instruction      English
Mode of learning           Self-study
Mode of delivery           Video- and text-based

Course overview

Spark Structured Streaming is a stream processing engine built on Spark SQL that processes data incrementally and continuously updates the final result as new streaming data arrives. The Data Engineering using Kafka and Spark Structured Streaming online certification was created by Durga Viswanatha Raju Gadiraju, CEO at ITVersity and CTO at Analytiqs, Inc, and is offered by Udemy for candidates who want to master the concepts and strategies involved in data engineering using Apache Kafka and Spark Structured Streaming.

Data Engineering using Kafka and Spark Structured Streaming online classes incorporate more than 9.5 hours of detailed learning resources along with 3 articles that help candidates learn strategies for building streaming pipelines by integrating Kafka and Spark Structured Streaming. The online training walks candidates through various data engineering topics such as data processing, incremental data processing, and data ingestion, as well as the features of tools such as Hadoop, GCP, Hive, YARN, and HDFS.

The highlights

  • Certificate of completion
  • Self-paced course
  • 9.5 hours of pre-recorded video content
  • 3 articles 
  • Learning resources

Program offerings

  • Online course
  • Learning resources
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Course and certificate fees

Fees information
₹549 (discounted from ₹2,899)
Certificate availability

Yes

Certificate providing authority

Udemy

What you will learn

Knowledge of big data, Kafka, and Apache Spark

After completing the Data Engineering using Kafka and Spark Structured Streaming certification course, candidates will be introduced to the concepts involved in using Apache Kafka, Apache Spark, and Spark Structured Streaming for data engineering operations. In this data engineering certification, candidates will learn the functionalities of various data engineering tools such as YARN, HDFS, Hadoop, Hive, and GCP, along with file sources and file sinks, and will acquire an understanding of the techniques for working with a Kafka cluster and Kafka Connect. Candidates will also learn strategies associated with data ingestion, data processing, and incremental data processing, and will develop the skills to build streaming pipelines.

The syllabus

Introduction

  • Introduction to Data Engineering using Kafka and Spark Structured Streaming
  • Important Note for first-time Data Engineering Customers
  • Important Note for Data Engineering Essentials (Python and Spark) Customers
  • How to get 30 days of complimentary lab access?
  • How to access the material used for this course?

Getting Started with Kafka

  • Overview of Kafka
  • Managing Topics using Kafka CLI
  • Produce and Consume Messages using CLI
  • Validate Generation of Web Server Logs
  • Create a Web Server using nc
  • Produce retail logs to Kafka Topic
  • Consume retail logs from Kafka Topic
  • Clean up Kafka CLI Sessions to produce and consume messages
  • Define Kafka Connect to produce
  • Validate Kafka Connect to produce
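The CLI exercises above generate retail web-server log lines and pipe them into a Kafka topic with the console producer. A minimal Python sketch of such a log generator (the field layout, department names, and byte count are illustrative, not the course's exact gen_logs output):

```python
import random
from datetime import datetime, timezone

# Hypothetical department catalog; the course's log generator uses its own data.
DEPARTMENTS = ["apparel", "footwear", "fitness", "golf", "outdoors"]

def retail_log_line(ip: str, department: str, when: datetime) -> str:
    """Format one Apache-style access-log line for a department page view."""
    ts = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{ip} - - [{ts}] "GET /department/{department}/products HTTP/1.1" 200 1345'

line = retail_log_line("192.168.0.10", random.choice(DEPARTMENTS),
                       datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc))
print(line)
```

In the CLI sessions covered in this section, each emitted line would then be piped into a console producer for the retail topic, and read back with the matching console consumer.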

Data Ingestion using Kafka Connect

  • Overview of Kafka Connect
  • Define Kafka Connect to Produce Messages
  • Validate Kafka Connect to produce messages
  • Cleanup Kafka Connect to produce messages
  • Write Data to HDFS using Kafka Connect
  • Setup HDFS 3 Sink Connector Plugin
  • Overview of Kafka Consumer Groups
  • Configure HDFS 3 Sink Properties
  • Run and Validate HDFS 3 Sink
  • Cleanup Kafka Connect to consume messages
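The HDFS 3 Sink lectures configure a Confluent connector that drains a Kafka topic into files in HDFS. A hedged sketch of the key properties as a Python dict (property names follow the Confluent HDFS 3 Sink connector; the topic name, HDFS URL, and flush size are placeholders for a single-node lab cluster, not the course's exact configuration):

```python
# Illustrative HDFS 3 Sink connector properties; values are placeholders.
hdfs3_sink_config = {
    "name": "retail-hdfs-sink",
    "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
    "tasks.max": "1",
    "topics": "retail",                   # topic(s) to drain into HDFS
    "hdfs.url": "hdfs://localhost:9000",  # NameNode URL of the lab cluster
    "flush.size": "1000",                 # records written per output file
}

# Kafka Connect standalone mode expects these as a .properties file:
properties_text = "\n".join(f"{k}={v}" for k, v in hdfs3_sink_config.items())
print(properties_text)
```

Once the connector runs, validating the sink amounts to listing the target HDFS directory and confirming new files appear as messages accumulate.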

Overview of Spark Structured Streaming

  • Understanding Streaming Context
  • Validate Log Data for Streaming
  • Push log messages to Netcat Webserver
  • Overview of built-in Input Sources
  • Reading Web Server logs using Spark Structured Streaming
  • Overview of Output Modes
  • Using append as Output Mode
  • Using complete as Output Mode
  • Using update as Output Mode
  • Overview of Triggers in Spark Structured Streaming
  • Overview of built-in Output Sinks
  • Previewing the Streaming Data
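The three output modes differ in what reaches the sink after each micro-batch. A toy pure-Python simulation of a running word count makes the difference concrete (no Spark required; the batch contents are made up):

```python
from collections import Counter

# Two made-up micro-batches of words arriving on the stream.
batches = [["spark", "kafka"], ["kafka"]]

totals = Counter()
for i, batch in enumerate(batches, start=1):
    before = dict(totals)
    totals.update(batch)
    complete = dict(totals)  # complete mode: the whole result table, every trigger
    update = {w: c for w, c in totals.items() if before.get(w) != c}  # update: changed rows only
    print(f"batch {i}: complete={complete} update={update}")

# Append mode would be rejected for this query: aggregated rows keep changing,
# and append only emits rows that will never change again (e.g. plain projections,
# or aggregations bounded by an event-time watermark).
```

After the second batch, complete mode re-emits both counts while update mode emits only the kafka row, which is why update is usually the cheaper choice for unbounded aggregations.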

Kafka and Spark Structured Streaming Integration

  • Create Kafka Topic
  • Read Data from Kafka Topic
  • Preview data using console
  • Preview data using memory
  • Transform Data using Spark APIs
  • Write Data to HDFS using Spark
  • Validate Data in HDFS using Spark
  • Write Data to HDFS using Spark using Header
  • Cleanup Kafka Connect and Files in HDFS
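When Spark reads from a Kafka topic, each record's key and value arrive as raw bytes, so the first transformation is typically a cast to string followed by parsing out fields. A plain-Python sketch of that decode-and-split step (the log format is illustrative):

```python
# Simulated Kafka record values (bytes), as a Kafka source would deliver them.
raw_values = [
    b'192.168.0.10 - - [15/Jan/2024:12:00:00 +0000] '
    b'"GET /department/golf/products HTTP/1.1" 200 1345',
]

def to_row(value: bytes) -> dict:
    """Equivalent of CAST(value AS STRING) plus extracting a couple of fields."""
    line = value.decode("utf-8")
    ip = line.split(" ")[0]
    endpoint = line.split('"')[1].split(" ")[1]  # request path between the quotes
    return {"ip": ip, "endpoint": endpoint}

rows = [to_row(v) for v in raw_values]
print(rows)
```

In the actual course pipeline, the same cast-and-parse logic is expressed with Spark DataFrame APIs before the stream is written out to HDFS.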

Incremental Loads using Spark Structured Streaming

  • Overview of Spark Structured Streaming Triggers
  • Steps for Incremental Data Processing
  • Create Working Directory in HDFS
  • Logic to Upload GHArchive Files
  • Upload GHArchive Files to HDFS
  • Add new GHActivity JSON Files
  • Read JSON Data using Spark Structured Streaming
  • Write in Parquet File Format
  • Analyze GHArchive Data in Parquet files using Spark
  • Add New GHActivity JSON files
  • Load Data Incrementally to Target Table
  • Validate Incremental Load
  • Add New GHActivity JSON files
  • Using maxFilesPerTrigger and latestFirst
  • Validate Incremental Load
  • Add New GHActivity JSON files
  • Incremental Load using Archival Process
  • Validate Incremental Load

Setting up environment using AWS Cloud9

  • Getting Started with Cloud9
  • Cleaning Cloud9 Environment
  • Warming up with Cloud9 IDE
  • Overview of EC2 related to Cloud9
  • Opening ports for Cloud9 Instance
  • Associating Elastic IPs to Cloud9 Instance
  • Increase EBS Volume Size of Cloud9 Instance
  • Setup Jupyter Lab on Cloud9
  • [Commands] Setup Jupyter Lab on Cloud9

Setting up Environment - Overview of GCP and Provision Ubuntu VM

  • Signing up for GCP
  • Overview of GCP Web Console
  • Overview of GCP Pricing
  • Provision Ubuntu VM from GCP
  • Setup Docker
  • Validating Python
  • Setup Jupyter Lab
  • Setup Jupyter Lab locally on Mac

Setup Single Node Hadoop Cluster

  • Introduction to Single Node Hadoop Cluster
  • Material related to setting up the environment
  • Setup Prerequisites
  • Setup Passwordless Login
  • Download and Install Hadoop
  • Configure Hadoop HDFS
  • Start and Validate HDFS
  • Configure Hadoop YARN
  • Start and Validate YARN
  • Managing Single Node Hadoop

Setup Hive and Spark

  • Setup Data Sets for Practice
  • Download and Install Hive
  • Setup Database for Hive Metastore
  • Configure and Setup Hive Metastore
  • Launch and Validate Hive
  • Scripts to Manage Single Node Cluster
  • Download and Install Spark 2
  • Configure Spark 2
  • Validate Spark 2 using CLIs
  • Validate Jupyter Lab Setup
  • Integrate Spark 2 with Jupyter Lab
  • Download and Install Spark 3
  • Configure Spark 3
  • Validate Spark 3 using CLIs
  • Integrate Spark 3 with Jupyter Lab

Setup Single Node Kafka Cluster

  • Download and Install Kafka
  • Configure and Start Zookeeper
  • Configure and Start Kafka Broker
  • Scripts to manage single node cluster
  • Overview of Kafka CLI
  • Setup Retail log Generator
  • Redirecting logs to Kafka

Instructors

Mr Durga Viswanatha Raju Gadiraju
Technology Adviser
Freelancer
