Data Engineering using AWS Analytics

By Udemy

Master advanced data engineering techniques to conduct data analysis on AWS.

Mode: Online

Fees: ₹499 (original price ₹3,499)

Quick Facts

Medium of instructions: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course overview

Data engineering is a discipline whose skills fall somewhere between software development and programming on the one hand and the advanced analytics skills required of data scientists on the other. The Data Engineering using AWS Data Analytics online certification was designed by Durga Viswanatha Raju Gadiraju (CEO at ITVersity and CTO at Analytiqs, Inc), Asasri Manthena (Marketing & Sales Manager at ITVersity, Inc), and Perraju Vegiraju (Certified Instructor), and is delivered through Udemy.

The Data Engineering using AWS Data Analytics online training incorporates more than 25.5 hours of pre-recorded lessons, 112 articles, and 18 downloadable resources for applicants who want to master AWS analytics services for data engineering activities. The Data Engineering using AWS Data Analytics online classes walk applicants through building data engineering pipelines using the AWS data analytics stack and explain concepts such as data pipelines, web server streaming, data processing, CRUD operations, and more.

The highlights

  • Certificate of completion
  • Self-paced course
  • 25.5 hours of pre-recorded video content
  • 112 articles
  • 18 downloadable resources

Program offerings

  • Online course
  • Learning resources
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Course and certificate fees

Fees information: ₹499 (original price ₹3,499)

Certificate availability: Yes

Certificate providing authority: Udemy

What you will learn

  • Knowledge of AWS technology
  • Knowledge of cloud computing

After completing the Data Engineering using AWS Data Analytics certification course, applicants will have a comprehensive understanding of the fundamentals of AWS for data engineering operations such as data analytics, along with the strategies involved in cloud computing and data ingestion. In this data engineering course, applicants will explore the functionality of various tools on the AWS platform, including AWS S3, AWS EC2, AWS IAM, AWS Lambda, AWS Elastic MapReduce (EMR), AWS Kinesis, AWS Athena, AWS EventBridge, and the AWS Glue Catalog. Applicants will also learn strategies for data processing, web server streaming, CRUD operations, virtual machines, Python boto3, and batch data pipelines.

The syllabus

Introduction to the course

  • Introduction to Data Engineering using AWS Analytics Services
  • Video Lectures and Reference Materials
  • Taking the Udemy course for new Udemy users
  • Additional costs for AWS infrastructure for hands-on practice
  • Signup for AWS account
  • Logging into AWS account
  • Overview of AWS Billing Dashboard - Cost Explorer and Budget

Setup Local Development Environment for AWS on Windows 10 and 11

  • Setup Local Environment on Windows for AWS
  • Overview of Powershell on Windows 10 or Windows 11
  • Setup Ubuntu VM on Windows 10 or 11 using wsl
  • Setup Ubuntu VM on Windows 10 or 11 using wsl - Contd...
  • Setup Python venv and pip on Ubuntu
  • Setup AWS CLI on Windows and Ubuntu using Pip
  • Create AWS IAM User and Download Credentials
  • Configure AWS CLI on Windows
  • Create Python Virtual Environment for AWS Projects
  • Setup Boto3 as part of Python Virtual Environment
  • Setup Jupyter Lab and Validate boto3
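
Before moving on, it helps to confirm that boto3 inside the virtual environment can actually reach AWS with the credentials configured above. A minimal validation sketch using inexpensive, read-only calls:

    import boto3

    # Uses the default profile/credentials configured with the AWS CLI above.
    session = boto3.session.Session()

    # Listing buckets is a cheap call that proves S3 access works from the venv.
    s3 = session.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])

    # The caller identity shows which IAM user these credentials belong to.
    sts = session.client("sts")
    print(sts.get_caller_identity()["Arn"])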

Setup Local Development Environment for Mac

  • Setup Local Environment for AWS on Mac
  • Setup AWS CLI on Mac
  • Setup AWS IAM User to configure AWS CLI
  • Configure AWS CLI using IAM User Credentials
  • Setup Python Virtual Environment on Mac using Python 3
  • Setup Boto3 as part of Python Virtual Environment
  • Setup Jupyter Lab and Validate boto3

Setup environment for practice using Cloud9

  • Introduction to Cloud9
  • Setup Cloud9
  • Overview of Cloud9 IDE
  • Docker and AWS CLI on Cloud9
  • Cloud9 and EC2
  • Accessing Web Applications
  • Allocate and Assign Static IP
  • Changing Permissions using IAM Policies
  • Increasing Size of EBS Volume
  • Opening ports for Cloud9 Instance
  • Setup Jupyter lab on Cloud9 Instance
  • Open SSH Port for Cloud9 EC2 Instance
  • Connect to Cloud9 EC2 Instance using SSH

AWS Getting Started with S3, IAM and CLI

  • Introduction - AWS Getting Started
  • [Instructions] Introduction - AWS Getting Started
  • Create s3 Bucket
  • [Instructions] Create s3 Bucket
  • Create IAM Group and User
  • [Instructions] Create IAM Group and User
  • Overview of Roles
  • [Instructions] Overview of Roles
  • Create and Attach Custom Policy
  • [Instructions and Code] Create and Attach Custom Policy
  • Configure and Validate AWS CLI
  • [Instructions and Code] Configure and Validate AWS CLI

Storage - Deep Dive into AWS Simple Storage Service aka S3

  • Getting Started with AWS Simple Storage aka S3
  • [Instructions] Getting Started with S3
  • Setup Data Set locally
  • [Instructions] Setup Data Set locally
  • Adding S3 Buckets and Objects
  • [Instructions] Adding S3 Buckets and Objects
  • Version Control in S3
  • [Instructions] Version Control in S3
  • Cross-Region Replication
  • [Instructions] Cross-Region Replication
  • Overview of S3 Storage Classes
  • [Instructions] Overview of S3 Storage Classes
  • Overview of Glacier
  • [Instructions] Overview of Glacier
  • Managing S3 using AWS CLI
  • [Instructions and Commands] Managing S3 using AWS CLI
  • Managing Objects in S3 using CLI - Lab
  • [Instructions] Managing Objects in S3 using CLI - Lab
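
The lab above manages buckets and objects with the AWS CLI (aws s3 and aws s3api). For reference, a rough boto3 equivalent of the same operations is sketched below; the bucket name and file paths are placeholders, not values used in the course:

    import boto3

    s3 = boto3.client("s3")

    # Create a bucket (no LocationConstraint is needed in us-east-1).
    s3.create_bucket(Bucket="itv-demo-bucket")

    # Upload a local file as an object, then list objects under a prefix.
    s3.upload_file("data/sample.json", "itv-demo-bucket", "landing/sample.json")
    for obj in s3.list_objects_v2(Bucket="itv-demo-bucket", Prefix="landing/").get("Contents", []):
        print(obj["Key"], obj["Size"])

    # Download and then delete the object.
    s3.download_file("itv-demo-bucket", "landing/sample.json", "downloaded.json")
    s3.delete_object(Bucket="itv-demo-bucket", Key="landing/sample.json")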

AWS Security using IAM - Managing AWS Users, Roles and Policies using AWS IAM

  • Creating AWS IAM Users with Programmatic and Web Console Access
  • [Instructions] Creating IAM Users
  • Logging into AWS Management Console using IAM User
  • [Instructions] Logging into AWS Management Console using IAM User
  • Validate Programmatic Access to IAM User
  • [Instructions and Commands] Validate Programmatic Access to IAM User
  • IAM Identity-based Policies
  • [Instructions and Commands] IAM Identity-based Policies
  • Managing IAM Groups
  • [Instructions and Commands] Managing IAM Groups
  • Managing IAM Roles
  • [Instructions and Commands] Managing IAM Roles
  • Overview of Custom Policies
  • [Instructions and Commands] Overview of Custom Policies
  • Managing IAM using AWS CLI
  • [Instructions and Commands] Managing IAM using AWS CLI
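
As a quick illustration of the IAM operations covered in this module, here is a hedged boto3 sketch that creates a group and user, attaches a managed identity-based policy, and generates access keys for programmatic access; all names are examples only:

    import boto3

    iam = boto3.client("iam")

    # Group and user used purely for illustration.
    iam.create_group(GroupName="analytics-developers")
    iam.create_user(UserName="itv-demo-user")
    iam.add_user_to_group(GroupName="analytics-developers", UserName="itv-demo-user")

    # Attach an AWS managed, identity-based policy to the group.
    iam.attach_group_policy(
        GroupName="analytics-developers",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )

    # Programmatic access requires an access key pair for the user.
    keys = iam.create_access_key(UserName="itv-demo-user")["AccessKey"]
    print(keys["AccessKeyId"])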

Infrastructure - Getting Started with AWS Elastic Cloud Compute aka EC2

  • Getting Started with AWS Elastic Cloud Compute aka EC2
  • [Instructions] Getting Started with EC2
  • Create EC2 Key Pair
  • [Instructions] Create EC2 Key Pair
  • Launch EC2 Instance
  • [Instructions] Launch EC2 Instance
  • Connecting to EC2 Instance
  • [Instructions and Commands] Connecting to EC2 Instance
  • Security Groups Basics
  • [Instructions and Commands] Security Groups Basics
  • Public and Private IP Addresses
  • [Instructions] Public and Private IP Addresses
  • EC2 Life Cycle
  • [Instructions] EC2 Life Cycle
  • Allocating and Assigning Elastic IP Address
  • [Instructions] Allocating and Assigning Elastic IP Addresses
  • Managing EC2 Using AWS CLI
  • [Instructions and Commands] Managing EC2 Using AWS CLI
  • Upgrade or Downgrade EC2 Instances
  • [Instructions and Commands] Upgrade or Downgrade EC2 Instances
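
The EC2 life cycle covered above (launch, connect, stop, terminate) can also be driven from boto3. A minimal sketch follows; the AMI id, key pair and instance type are placeholders that vary by account and region:

    import boto3

    ec2 = boto3.client("ec2")

    # Launch a single small instance from a placeholder AMI.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI id
        InstanceType="t2.micro",
        KeyName="my-key-pair",             # assumes this key pair already exists
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # Wait until the instance is running, then read its public IP address.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    print(desc["Reservations"][0]["Instances"][0].get("PublicIpAddress"))

    # Stop and terminate map to the EC2 life-cycle lectures above.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.terminate_instances(InstanceIds=[instance_id])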

Infrastructure - AWS EC2 advanced

  • Understanding AWS EC2 Instance or Virtual Machine Metadata 
  • [Instructions and Commands] Understanding EC2 Metadata
  • Querying on EC2 Metadata
  • [Instructions and Commands] Querying on EC2 Metadata
  • Filtering on EC2 Metadata
  • [Instructions and Commands] Filtering on EC2 Metadata
  • Using Bootstrapping Scripts
  • [Instructions and Commands] Using Bootstrapping Scripts
  • Create an AMI
  • [Instructions and Commands] Create an AMI
  • Validate AMI - Lab
  • [Instructions and Commands] Validate AMI - Lab

Data Ingestion using Lambda Functions

  • Hello World using AWS Lambda
  • [Instructions] Hello World using AWS Lambda
  • Setup Project for local development
  • [Instructions and Code] Setup Project for local development
  • Deploy Project to AWS Lambda console
  • [Instructions and Code] Deploy Project to AWS Lambda console
  • Develop download functionality using requests
  • [Instructions and Code] Develop download functionality using requests
  • Using 3rd party libraries in AWS Lambda
  • [Instructions and Code] Using 3rd party libraries in AWS Lambda
  • Validating s3 access for local development
  • [Instructions and Code] Validating s3 access for local development
  • Develop upload functionality to s3
  • [Instructions and Code] Develop upload functionality to s3
  • Validating using AWS Lambda Console
  • [Instructions and Code] Validating using AWS Lambda Console
  • Run using AWS Lambda Console
  • [Instructions] Run using AWS Lambda Console
  • Validating files incrementally
  • [Instructions and Code] Validating files incrementally
  • Reading and Writing Bookmark using s3
  • [Instructions and Code] Reading and Writing Bookmark using s3
  • Maintaining Bookmark using s3
  • [Instructions and Code] Maintaining Bookmark using s3
  • Review the incremental upload logic
  • Deploying lambda function
  • [Instructions and Source Code] - activity-downloader Lambda Function
  • Schedule Lambda Function using AWS EventBridge
  • [Instructions] Schedule Lambda Function using AWS EventBridge
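
To give a feel for what this module builds, below is a much-simplified sketch of such a data ingestion Lambda function: it reads a bookmark object from S3, downloads the next file with requests, uploads it to a landing area, and writes the bookmark back so the next scheduled (EventBridge) run continues from there. The bucket, prefix, and GH Archive style URL are assumptions for illustration, not the course's exact code:

    import json
    from datetime import datetime, timedelta

    import boto3
    import requests

    s3 = boto3.client("s3")
    BUCKET = "itv-demo-landing"                 # placeholder bucket
    BOOKMARK_KEY = "bookmarks/ghactivity.json"  # placeholder bookmark object

    def next_file_name(last_file):
        # Naive next-hour computation for file names like "2024-01-01-0.json.gz".
        stamp = datetime.strptime(last_file.replace(".json.gz", ""), "%Y-%m-%d-%H")
        nxt = stamp + timedelta(hours=1)
        return nxt.strftime("%Y-%m-%d-") + str(nxt.hour) + ".json.gz"

    def lambda_handler(event, context):
        # Read the bookmark (last file processed); fall back to a starting point.
        try:
            body = s3.get_object(Bucket=BUCKET, Key=BOOKMARK_KEY)["Body"].read()
            last_file = json.loads(body)["last_file"]
        except s3.exceptions.NoSuchKey:
            last_file = "2024-01-01-0.json.gz"

        target = next_file_name(last_file)
        payload = requests.get(f"https://data.gharchive.org/{target}").content

        # Upload the raw file to the landing area, then persist the new bookmark.
        s3.put_object(Bucket=BUCKET, Key=f"landing/ghactivity/{target}", Body=payload)
        s3.put_object(Bucket=BUCKET, Key=BOOKMARK_KEY,
                      Body=json.dumps({"last_file": target}))
        return {"processed": target}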

Overview of Glue Components

  • Introduction - Overview of Glue Components
  • [Instructions] Overview of Glue Components
  • Create Crawler and Catalog Table
  • [Instructions] Create Crawler and Catalog Table
  • Analyze Data using Athena
  • [Instructions] Analyze Data using Athena
  • Creating S3 Bucket and Role
  • [Instructions and Code] Creating S3 Bucket and Role
  • Create and Run the Glue Job
  • [Instructions] Create and Run the Glue Job
  • Validate using Glue Catalog Table and Athena
  • [Instructions and Code] Validate using Glue Catalog Table and Athena
  • Create and Run Glue Trigger
  • [Instructions and Code] Create and Run Glue Trigger
  • Create Glue Workflow
  • [Instructions] Create Glue Workflow
  • Run Glue Workflow and Validate
  • [Instructions] Run Glue Workflow and Validate
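
A hedged boto3 sketch of how the Glue pieces in this module fit together: a crawler that catalogs files landed in S3, and a job started on demand. The role ARN, database, bucket and job names below are placeholders:

    import boto3

    glue = boto3.client("glue")

    # Crawler that builds or updates catalog tables from an S3 prefix.
    glue.create_crawler(
        Name="itv-demo-crawler",
        Role="arn:aws:iam::123456789012:role/itv-glue-role",   # placeholder role
        DatabaseName="itv_demo_db",
        Targets={"S3Targets": [{"Path": "s3://itv-demo-landing/landing/"}]},
    )
    glue.start_crawler(Name="itv-demo-crawler")

    # Start a Glue job (the job script itself is authored separately) and check its state.
    run = glue.start_job_run(JobName="itv-demo-job")
    state = glue.get_job_run(JobName="itv-demo-job", RunId=run["JobRunId"])
    print(state["JobRun"]["JobRunState"])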

Setup Spark History Server for Glue Jobs

  • Introduction - Spark History Server for Glue
  • Setup Spark History Server on AWS
  • Clone AWS Glue Samples repository
  • [Instructions and Code] Clone AWS Glue Samples repository
  • Build Glue Spark UI Container
  • [Instructions and Code] Build Glue Spark UI Container
  • Update IAM Policy Permissions
  • Start Glue Spark UI Container
  • [Instructions and Code] Start Glue Spark UI Container

Deep Dive into Glue Catalog

  • Prerequisites for Glue Catalog Tables
  • [Instructions] Prerequisites for Glue Catalog Tables
  • Steps for Creating Catalog Tables
  • [Instructions] Steps for Creating Catalog Tables
  • Download Data Set
  • [Instructions and Code] Download Data Set
  • Upload data to s3
  • [Instructions and Code] Upload data to s3
  • Create Glue Catalog Database - itvghlandingdb
  • [Instructions] Create Glue Catalog Database - itvghlandingdb
  • Create Glue Catalog Table - ghactivity
  • [Instructions] Create Glue Catalog Table - ghactivity
  • Running Queries using Athena - ghactivity
  • [Instructions and Code] Running Queries using Athena - ghactivity
  • Crawling Multiple Folders
  • [Instructions] Crawling Multiple Folders
  • Managing Glue Catalog using AWS CLI
  • [Instructions and Commands] Managing Glue Catalog using AWS CLI
  • Managing Glue Catalog using Python Boto3
  • [Instructions and Code] Managing Glue Catalog using Python Boto3
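
For the lectures on managing the Glue Catalog programmatically, a small boto3 sketch: list the catalog databases, then list the tables in the database created in this module along with their partition keys (the database name is taken from the lectures above; adjust to your own setup):

    import boto3

    glue = boto3.client("glue")

    # List all catalog databases in the account/region.
    for db in glue.get_databases()["DatabaseList"]:
        print(db["Name"])

    # List tables and partition keys in the database created in this module.
    tables = glue.get_tables(DatabaseName="itvghlandingdb")["TableList"]
    for table in tables:
        print(table["Name"], [col["Name"] for col in table.get("PartitionKeys", [])])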

Exploring Glue Job APIs

  • Update IAM Role for Glue Job
  • [Instructions and Code] Update IAM Role for Glue Job
  • Generate baseline Glue Job
  • [Instructions and Code] Generate baseline Glue Job
  • Running baseline Glue Job
  • [Instructions] Running baseline Glue Job
  • Glue Script for Partitioning Data
  • [Instructions and Code] Glue Script for Partitioning Data
  • Validating using Athena
  • [Instructions and Code] Validating using Athena

Glue Job Bookmarks

  • Introduction to Glue Job Bookmarks
  • Cleaning up the data
  • [Instructions and Code] Cleaning up the data
  • Overview of AWS Glue CLI
  • [Instructions and Code] Overview of AWS Glue CLI
  • Run Job using Bookmark
  • [Instructions and Code] Run Job using Bookmark
  • Validate Bookmark using AWS CLI
  • [Instructions and Code] Validate Bookmark using AWS CLI
  • Add new data to landing
  • [Instructions and Code] Add new data to landing
  • Rerun Glue Job using Bookmark
  • [Instructions and Code] Rerun Glue Job using Bookmark
  • Validate Job Bookmark and Files for Incremental run
  • [Instructions and Code] Validate Job Bookmark and Files for Incremental run
  • Recrawl the Glue Catalog Table using CLI
  • [Instructions and Code] Recrawl the Glue Catalog Table using CLI
  • Run Athena Queries for Data Validation
  • [Instructions and Code] Run Athena Queries for Data Validation

Getting Started with AWS EMR

  • Planning of EMR Cluster
  • Create EC2 Key Pair
  • Setup EMR Cluster with Spark
  • Understanding Summary of AWS EMR Cluster
  • Review EMR Cluster Application User Interfaces
  • Review EMR Cluster Monitoring
  • Review EMR Cluster Hardware and Cluster Scaling Policy
  • Review EMR Cluster Configurations
  • Review EMR Cluster Events
  • Review EMR Cluster Steps
  • Review EMR Cluster Bootstrap Actions
  • Connecting to EMR Master Node using SSH
  • Disabling Termination Protection and Terminating the Cluster
  • Clone and Create New Cluster
  • Listing AWS S3 Buckets and Objects using AWS CLI on EMR Cluster
  • Listing AWS S3 Buckets and Objects using HDFS CLI on EMR Cluster
  • Managing Files in AWS s3 using HDFS CLI on EMR Cluster

Development Lifecycle for Pyspark

  • Setup Virtual Environment and Install Pyspark
  • [Commands] - Setup Virtual Environment and Install Pyspark
  • Getting Started with Pycharm
  • [Code and Instructions] - Getting Started with Pycharm
  • Passing Run Time Arguments
  • Accessing OS Environment Variables
  • Getting Started with Spark
  • Create Function for Spark Session
  • [Code and Instructions] - Create Function for Spark Session
  • Setup Sample Data
  • Read data from files
  • [Code and Instructions] - Read data from files
  • Process data using Spark APIs
  • [Code and Instructions] - Process data using Spark APIs
  • Write data to files
  • [Code and Instructions] - Write data to files
  • Validating Writing Data to Files
  • Productionizing the Code
  • [Code and Instructions] - Productionizing the code
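
The shape of the application developed in this module looks roughly like the sketch below: a reusable SparkSession factory, paths driven by environment variables, and a read, process, write flow. The paths, environment variable names and the column used in the transformation are assumptions for illustration:

    import os
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import count, lit

    def get_spark_session(env, app_name):
        # Run with a local master during development; on EMR the cluster provides it.
        builder = SparkSession.builder.appName(app_name)
        if env == "DEV":
            builder = builder.master("local[*]")
        return builder.getOrCreate()

    def main():
        env = os.environ.get("ENVIRON", "DEV")
        src = os.environ.get("SRC_DIR", "data/ghactivity")    # placeholder paths
        tgt = os.environ.get("TGT_DIR", "output/ghactivity")

        spark = get_spark_session(env, "GHActivity Processor")
        df = spark.read.json(src)

        # Example transformation: count events per type and write partitioned Parquet.
        agg = df.groupBy("type").agg(count(lit(1)).alias("event_count"))
        agg.write.mode("overwrite").partitionBy("type").parquet(tgt)

    if __name__ == "__main__":
        main()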

Deploying Spark Applications using AWS EMR

  • Deploying Applications using AWS EMR - Introduction
  • Setup EMR Cluster to deploy applications
  • Validate SSH Connectivity to Master node of AWS EMR Cluster
  • Setup Jupyter Notebook Environment on EMR Cluster
  • Create required AWS s3 Bucket
  • Upload Ghactivity Data to s3
  • Validate Application using AWS EMR Compatible Versions
  • Deploy Application to AWS EMR Master Node
  • Create user space for ec2-user on AWS EMR Cluster
  • Run Spark Application using spark-submit on AWS EMR Master Node
  • Validate Data using Jupyter Notebooks on AWS EMR Cluster
  • Clone and Start Auto Terminated AWS EMR Cluster
  • Delete Data Populated by GHActivity Application using AWS EMR Cluster
  • Differences between Spark Client and Cluster Deployment Modes
  • Running Spark Application using Cluster Mode on AWS EMR Cluster
  • Overview of Adding Pyspark Application as Step to AWS EMR Cluster
  • Deploy Spark Application to AWS S3
  • Running Spark Applications as AWS EMR Steps in client mode
  • Running Spark Applications as AWS EMR Steps in cluster mode
  • Validate AWS EMR Step Execution of Spark Application
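
Submitting the packaged PySpark application as an EMR step (the last few lectures above) can also be scripted with boto3. A minimal sketch, assuming the application has already been deployed to S3; the cluster id and script path are placeholders:

    import boto3

    emr = boto3.client("emr")

    step = {
        "Name": "GHActivity Spark step",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",        # or "client", as compared above
                "s3://itv-demo-code/app.py",       # placeholder script location
            ],
        },
    }

    # JobFlowId is a placeholder EMR cluster id.
    resp = emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
    print(resp["StepIds"])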

Streaming Pipeline using Kinesis

  • Building Streaming Pipeline using Kinesis
  • Rotating Logs
  • Setup Kinesis Firehose Agent
  • Create Kinesis Firehose Delivery Stream
  • Planning the Pipeline
  • Create IAM Group and User
  • Granting Permissions to IAM User using Policy
  • Configure Kinesis Firehose Agent
  • Start and Validate Agent
  • Conclusion - Building Simple Streaming Pipeline

Consuming data from s3 using boto3

  • Customizing s3 folder using Kinesis Delivery Stream
  • Create Policy to read from s3 Bucket
  • Validate s3 access using AWS CLI
  • Setup Python Virtual Environment to explore boto3
  • Validating access to s3 using Python boto3
  • Read Content from s3 object
  • Read multiple s3 Objects
  • Get number of s3 Objects using Marker
  • Get size of s3 Objects using Marker
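
A minimal boto3 sketch for this module: page through the objects under a prefix to count them and total their size (the marker-based lectures above), then read one object's content. The bucket, prefix and object key are placeholders:

    import boto3

    s3 = boto3.client("s3")
    bucket, prefix = "itv-demo-landing", "logs/"   # placeholders

    # Paginate through all objects under the prefix, counting objects and bytes.
    total_objects, total_size = 0, 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            total_objects += 1
            total_size += obj["Size"]
    print(total_objects, total_size)

    # Read the content of a single object as text.
    body = s3.get_object(Bucket=bucket, Key=prefix + "sample.txt")["Body"].read()
    print(body.decode("utf-8"))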

Populating GitHub Data to Dynamodb

  • Install required libraries
  • Understanding GitHub APIs
  • Setting up GitHub API Token
  • Understanding GitHub Rate Limit
  • Create New Repository for since
  • Extracting Required Information
  • Processing Data
  • Grant Permissions to create dynamodb tables using boto3
  • Create Dynamodb Tables
  • Dynamodb CRUD Operations
  • Populate Dynamodb Table
  • Dynamodb Batch Operations
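
A hedged boto3 sketch of the DynamoDB work in this module: create a table, run basic CRUD operations, and load several items with a batch writer. The table name and attributes are examples, not the course's exact schema:

    import boto3

    dynamodb = boto3.resource("dynamodb")

    # Create an on-demand table keyed by repository name (placeholder schema).
    table = dynamodb.create_table(
        TableName="github_repos",
        KeySchema=[{"AttributeName": "name", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "name", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    table.wait_until_exists()

    # CRUD operations against the table.
    table.put_item(Item={"name": "itversity/demo-repo", "stars": 100})
    print(table.get_item(Key={"name": "itversity/demo-repo"}).get("Item"))
    table.update_item(
        Key={"name": "itversity/demo-repo"},
        UpdateExpression="SET stars = :s",
        ExpressionAttributeValues={":s": 120},
    )
    table.delete_item(Key={"name": "itversity/demo-repo"})

    # Batch operations: the batch writer buffers and flushes put requests.
    with table.batch_writer() as batch:
        for i in range(5):
            batch.put_item(Item={"name": f"demo/repo-{i}", "stars": i})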

Overview of Amazon Athena

  • Getting Started with Amazon Athena
  • Quick Recap of Glue Catalog Databases and Tables
  • Access Glue Catalog Databases and Tables using Athena Query Editor
  • Create a Database and Table using Athena
  • Populate Data into Table using Athena
  • Using CTAS to create tables using Athena
  • Overview of Amazon Athena Architecture
  • Amazon Athena Resources and relationship with Hive
  • Create a Partitioned Table using Athena
  • Develop Query for Partitioned Column
  • Insert into Partitioned Tables using Athena
  • Validate Data Partitioning using Athena
  • Drop Athena Tables and Delete Data Files
  • Drop Partitioned Table using Athena
  • Data Partitioning in Athena using CTAS

Amazon Athena using AWS CLI

  • Amazon Athena using AWS CLI - Introduction
  • Get help and list Athena databases using AWS CLI
  • [Commands] Get help and list Athena databases using AWS CLI
  • Managing Athena Workgroups using AWS CLI
  • [Commands] Managing Athena Workgroups using AWS CLI
  • Run Athena Queries using AWS CLI
  • [Commands] Run Athena Queries using AWS CLI
  • Get Athena Table Metadata using AWS CLI
  • [Commands] Get Athena Table Metadata using AWS CLI
  • Run Athena Queries with a custom location using AWS CLI
  • [Commands] Run Athena Queries with a custom location
  • Drop Athena table using AWS CLI
  • [Commands] Drop Athena table using AWS CLI
  • Run CTAS under Athena using AWS CLI
  • [Commands] Run CTAS under Athena using AWS CLI

Amazon Athena using Python boto3

  • Amazon Athena using Python boto3 - Introduction
  • Getting Started with Managing Athena using Python boto3
  • [Code] Getting Started with Managing Athena using Python boto3
  • List Amazon Athena Databases using Python boto3
  • [Code] List Amazon Athena Databases using Python boto3
  • List Amazon Athena Tables using Python boto3
  • [Code] List Amazon Athena Tables using Python boto3
  • Run Amazon Athena Queries using Python boto3
  • [Code] Run Amazon Athena Queries using Python boto3
  • Review Athena Query Results using boto3
  • [Code] Review Athena Query Results using Python boto3
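
Tying the boto3 lectures together, a minimal sketch that runs an Athena query, waits for it to finish and prints the results. The database and table names come from the Glue Catalog module earlier; the S3 output location is a placeholder:

    import time
    import boto3

    athena = boto3.client("athena")

    run = athena.start_query_execution(
        QueryString="SELECT count(1) AS cnt FROM ghactivity",
        QueryExecutionContext={"Database": "itvghlandingdb"},
        ResultConfiguration={"OutputLocation": "s3://itv-demo-athena-results/"},
    )
    query_id = run["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    # Print the result rows (the first row holds the column headers).
    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        for row in rows:
            print([col.get("VarCharValue") for col in row["Data"]])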

Getting Started with Amazon Redshift

  • Getting Started with Amazon Redshift - Introduction
  • Create Redshift Cluster using Free Trial
  • Connecting to Database using Redshift Query Editor
  • Get the list of tables querying information schema
  • [Queries] - Get the list of tables querying information schema
  • Run Queries against Redshift Tables using Query Editor
  • [Queries] - Validate user's data using Query Editor
  • Create Redshift Table using Primary Key
  • [Queries] - Create a Redshift Table
  • [Consolidated Queries] - CRUD Operations
  • Insert Data into Redshift Tables
  • Update Data in Redshift Tables
  • Delete data from Redshift tables
  • Redshift Saved Queries using Query Editor
  • Deleting Redshift Cluster
  • Restore Redshift Cluster from Snapshot

Copy Data from s3 to Redshift Tables

  • Copy Data from s3 to Redshift - Introduction
  • Setup Data in s3 for Redshift Copy
  • Copy Database and Table for Redshift Copy Command
  • Create IAM User with full access on s3 for Redshift Copy
  • Run Copy Command to copy data from s3 to Redshift Table
  • Troubleshoot Errors related to Redshift Copy Command
  • Run Copy Command to copy from s3 to Redshift table
  • Validate using queries against Redshift Table
  • Overview of Redshift Copy Command
  • Create IAM Role for Redshift to access s3
  • Copy Data from s3 to Redshift table using IAM Role
  • Setup JSON Dataset in s3 for Redshift Copy Command
  • Copy JSON Data from s3 to Redshift table using IAM Role

Develop application using Redshift Cluster

  • Develop application using Redshift Cluster - Introduction
  • Allocate Elastic IP for Redshift Cluster
  • Enable Public Accessibility for Redshift Cluster
  • Update Inbound Rules in Security Group to access Redshift Cluster
  • Create Database and User in Redshift Cluster
  • Connect to database in Redshift using psql
  • Change Owner on Redshift Tables
  • Download Redshift JDBC Jar file
  • Connect to Redshift Databases using IDEs such as SQL Workbench
  • Setup Python Virtual Environment for Redshift
  • Run Simple Query against Redshift Database Table using Python
  • Truncate Redshift Table using Python
  • Create IAM User to copy from s3 to Redshift Tables
  • Validate Access of IAM User using Boto3
  • Run Redshift Copy Command using Python
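
The application built in this module connects to the cluster from Python. A hedged sketch using psycopg2: run a simple query, truncate a table, and reload it from S3 with the COPY command and an IAM role. The endpoint, credentials, table, S3 path and role ARN are all placeholders:

    import psycopg2

    conn = psycopg2.connect(
        host="itv-demo.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
        port=5439,
        dbname="dev",
        user="demo_user",
        password="********",
    )
    conn.autocommit = True
    cur = conn.cursor()

    # Simple validation query against a Redshift table.
    cur.execute("SELECT count(1) FROM public.orders")
    print(cur.fetchone())

    # Truncate and reload the table from S3 using COPY with an IAM role.
    cur.execute("TRUNCATE TABLE public.orders")
    cur.execute("""
        COPY public.orders
        FROM 's3://itv-demo-landing/retail/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/itv-redshift-role'
        FORMAT AS CSV
        IGNOREHEADER 1
    """)

    cur.close()
    conn.close()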

Redshift Tables with Distkeys and Sortkeys

  • Redshift Tables with Distkeys and Sortkeys - Introduction
  • Quick Review of Redshift Architecture
  • Create a multi-node Redshift Cluster
  • Connect to Redshift Cluster using Query Editor
  • Create Redshift Database
  • Create Redshift Database User
  • Create Redshift Database Schema
  • Default Distribution Style of Redshift Table
  • Grant Select Permissions on Catalog to Redshift Database User
  • Update Search Path to query Redshift system tables
  • Validate table with DISTSTYLE AUTO
  • Create Cluster from Snapshot to the original state
  • Overview of Node Slices in Redshift Cluster
  • Overview of Distribution Styles
  • Distribution Strategies for retail tables in Redshift
  • Create Redshift tables with distribution styles all
  • Troubleshoot and Fix Load or Copy Errors
  • Create Redshift Table with Distribution Style Auto
  • Create Redshift Tables using Distribution Style Key
  • Delete Cluster with the manual snapshot
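
To illustrate the distribution and sort key choices discussed above, a short sketch issuing example DDL through the same psycopg2 connection pattern shown earlier; the retail table definitions and key choices are examples, not the course's exact DDL:

    import psycopg2

    conn = psycopg2.connect(
        host="itv-demo.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
        port=5439, dbname="dev", user="demo_user", password="********",
    )
    conn.autocommit = True
    cur = conn.cursor()

    # Small dimension table replicated to every node slice (DISTSTYLE ALL).
    cur.execute("""
        CREATE TABLE retail.departments (
            department_id INT PRIMARY KEY,
            department_name VARCHAR(45)
        ) DISTSTYLE ALL
    """)

    # Large fact table distributed and sorted on the join/filter column (DISTSTYLE KEY).
    cur.execute("""
        CREATE TABLE retail.order_items (
            order_item_id INT PRIMARY KEY,
            order_item_order_id INT,
            order_item_subtotal FLOAT
        ) DISTSTYLE KEY DISTKEY (order_item_order_id) SORTKEY (order_item_order_id)
    """)

    cur.close()
    conn.close()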

Redshift Federated Queries and Spectrum

  • Redshift Federated Queries and Spectrum - Introduction
  • Overview of integrating RDS and Redshift for Federated Queries
  • Create IAM Role for Redshift Cluster
  • Setup Postgres Database Server for Redshift Federated Queries
  • Create tables in Postgres Database for Redshift Federated Queries
  • Creating Secret using Secrets Manager for Postgres Database
  • Accessing Secret Details using Python Boto3
  • Reading JSON Data to Dataframe using Pandas
  • Write JSON Data to Database Tables using Pandas
  • Create IAM Policy for Secret and associate with Redshift Role
  • Create Redshift Cluster using IAM Role with permissions on a secret
  • Create Redshift External Schema to Postgres Database
  • Update Redshift Cluster Network Settings for Federated Queries
  • Performing ETL using Redshift Federated Queries
  • Clean up resources added for Redshift Federated Queries
  • Grant Access on Glue Data Catalog to Redshift Cluster for Spectrum
  • Setup Redshift Clusters to run queries using Spectrum
  • Quick Recap of Glue Catalog Database and Tables for Redshift Spectrum
  • Create External Schema using Redshift Spectrum
  • Run Queries using Redshift Spectrum
  • Cleanup the Redshift Cluster

Instructors

Mr Durga Viswanatha Raju Gadiraju
Technology Adviser
Freelancer
