Data Engineering on Google Cloud Platform

By Udemy

Acquire a thorough understanding of the features and concepts involved in data engineering on Google Cloud Platform.

Mode: Online

Fees: ₹ 549 (regular price ₹ 2,699)

Quick Facts

Medium of instruction: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course overview

The Data Engineering on Google Cloud Platform online certification was developed by Cloud Resident, an education platform that offers courses in cloud, data engineering, analytics, and architecture, and is offered through Udemy. It is intended for candidates looking for a thorough training program to help them master the concepts and methods of data engineering on Google Cloud Platform (GCP).

The Data Engineering on Google Cloud Platform online classes form a short-term program containing 10 hours of hands-on study material supported by 42 downloadable resources, aimed at offering practical solutions to real-world data engineering scenarios on the cloud. The online training also covers PySpark structured streaming, real-time event data streaming, event time data processing, automation, data transformation, data ingestion, and more.

The highlights

  • Certificate of completion
  • Self-paced course
  • 10 hours of pre-recorded video content
  • 42 downloadable resources

Program offerings

  • Online course
  • Learning resources
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Course and certificate fees

Fees information: ₹ 549 (regular price ₹ 2,699)
Certificate availability: Yes
Certificate providing authority: Udemy

What you will learn

  • Knowledge of cloud computing
  • Automation skills

In the Data Engineering on Google Cloud Platform certification course, candidates are introduced to the fundamentals of Google Cloud Platform (GCP) for data engineering operations and acquire knowledge of the concepts involved in cloud computing, ETL, and data warehousing. Candidates explore the features of Apache Airflow, HiveSQL, SparkSQL, Cloud SQL, BigQuery, Hive tables, PySpark, Dataproc, and ad hoc queries. They also learn strategies for automation, event time data processing, real-time data streaming, PySpark structured streaming, data ingestion, and data transformation.

The syllabus

Introduction and Overview

  • Course and Tutor Introduction
  • Course Overview and Objectives
  • How to make the most out of the course?

Batch Processing and ETL using BigQuery, Spark and Airflow / Google Composer

  • Introduction to BigQuery as a Data warehousing tool on GCP
  • Practical - Partitioned tables & Loading Data
  • Introduction to Dataproc Clusters
  • Practical - Create Dataproc Clusters
  • Practical - Problem Statement | Write PySpark ETL job using Jupyter notebooks
  • Practical - Submit PySpark Job and load data into BigQuery tables
  • Introduction To Google Workflow Template
  • Practical - Write a Google workflow to submit PySpark applications
  • Introduction to Apache Airflow / Google Composer
  • Practical - Write Airflow script in Python for creating DAG and dependencies
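
As a rough illustration of what "DAG and dependencies" means in the last practical above, the sketch below models a batch pipeline as a directed acyclic graph and derives a valid run order. It is a plain-Python stand-in that uses only the standard library, not Airflow's API, and the task names are hypothetical, chosen only to echo the topics in this section.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (assumptions, not taken from the course material).
# Each key maps to the set of tasks that must finish before it can start.
dependencies = {
    "create_cluster": set(),
    "submit_pyspark_job": {"create_cluster"},
    "load_to_bigquery": {"submit_pyspark_job"},
    "delete_cluster": {"load_to_bigquery"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

A scheduler such as Airflow adds retries, scheduling, and operators on top of this idea. The sketch shows only the dependency ordering.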

Batch Data ingestion using Apache Sqoop and Apache Airflow / Google Composer

  • Introduction to Apache Sqoop
  • Practical - Set up Sqoop dependencies and Cloud SQL database
  • Practical - Create Dataproc Cluster | Sqoop Commands/simple imports to GCS
  • Practical - Sqoop - Incremental Imports from Cloud SQL MySQL Database
  • Practical - Sqoop Boundary Query / Imports with no Primary keys
  • Practical - Sqoop import using Apache Airflow / Google Composer
  • Practical - Sqoop incremental imports using Apache Airflow / Google Composer

Kafka Crash Course

  • Kafka Introduction
  • Kafka - Topics, partitions and brokers
  • Kafka - Replications
  • Kafka - Role of Zookeeper
  • Kafka - Practice Commands on Dataproc

Real-Time Streaming and Analytics using Spark Structured Streaming with Kafka

  • Real-Time Streaming - Section Overview
  • Understanding Spark streaming APIs - DStreams and Structured Streaming
  • Introduction to Spark Structured Streaming
  • Practical - Create Dataproc Clusters - With Initialization Actions
  • Practical - Dataproc Cluster Setup and prerequisites for streaming application
  • Practical - PySpark Structured Streaming - Testing streaming data and aggregates
  • Practical - Problem Statement | Late Data handling and Streaming Aggregations
  • Practical - Write background Cloud Functions to load transformed data to BigQuery
  • Practical - Get the most visited categories in microbatches
  • Problem Statement | Raw Data Streaming
  • Practical - Raw Data Streaming | Hive external tables | Microbatching using Airflow
  • Practical - Write GCS-triggered Cloud Functions to load data into BigQuery

Real-Time Streaming with streaming files as a source of data with IoT sensor data

  • Understanding Streaming files as a source of data | IoT sensor data
  • Understanding the Problem Statement
  • Practical - Data generator Python script setup
  • Practical - Stateful Aggregations | foreachBatch sink | GCS-triggered Cloud Functions
  • Practical - Cloud Functions & loading data into BigQuery
  • Problem Statement | Handling high-consumption IoT device alerts

Update - BigQuery / Cloud SQL - Federated Queries

  • Update - BigQuery / Cloud SQL - Federated Queries

Instructors

Mr Sid Raghunath
Data Architect
Udemy
