Spark SQL and Spark 3 using Scala (Formerly CCA175)

By: Udemy

Mode: Online

Fees: ₹ 3,299

Quick Facts

  • Medium of instruction: English
  • Mode of learning: Self-study
  • Mode of delivery: Video and text based

Course and certificate fees

  • Fees: ₹ 3,299
  • Certificate availability: Yes
  • Certificate providing authority: Udemy

Syllabus

Introduction

  • CCA 175 Spark and Hadoop Developer - Curriculum

Setting up Environment using AWS Cloud9

  • Getting Started with Cloud9
  • Creating Cloud9 Environment
  • Warming up with Cloud9 IDE
  • Overview of EC2 related to Cloud9
  • Opening ports for Cloud9 Instance
  • Associating Elastic IPs to Cloud9 Instance
  • Increase EBS Volume Size of Cloud9 Instance
  • Setup Jupyter Lab on Cloud9
  • [Commands] Setup Jupyter Lab on Cloud9

Setting up Environment - Overview of GCP and Provision Ubuntu VM

  • Signing up for GCP
  • Overview of GCP Web Console
  • Overview of GCP Pricing
  • Provision Ubuntu VM from GCP
  • Setup Docker
  • Why are we setting up Python and Jupyter Lab for a Scala-based course?
  • Validating Python
  • Setup Jupyter Lab

Setup Hadoop on Single Node Cluster

  • Introduction to Single Node Hadoop Cluster
  • Setup Prerequisites
  • [Commands] - Setup Prerequisites
  • Setup Passwordless Login
  • [Commands] - Setup Passwordless Login
  • Download and Install Hadoop
  • [Commands] - Download and Install Hadoop
  • Configure Hadoop HDFS
  • [Commands] - Configure Hadoop HDFS
  • Start and Validate HDFS
  • [Commands] - Start and Validate HDFS
  • Configure Hadoop YARN
  • [Commands] - Configure Hadoop YARN
  • Start and Validate YARN
  • [Commands] - Start and Validate YARN
  • Managing Single Node Hadoop
  • [Commands] - Managing Single Node Hadoop

Setup Hive and Spark on Single Node Cluster

  • Setup Data Sets for Practice
  • [Commands] - Setup Data Sets for Practice
  • Download and Install Hive
  • [Commands] - Download and Install Hive
  • Setup Database for Hive Metastore
  • [Commands] - Setup Database for Hive Metastore
  • Configure and Setup Hive Metastore
  • [Commands] - Configure and Setup Hive Metastore
  • Launch and Validate Hive
  • [Commands] - Launch and Validate Hive
  • Scripts to Manage Single Node Cluster
  • [Commands] - Scripts to Manage Single Node Cluster
  • Download and Install Spark 2
  • [Commands] - Download and Install Spark 2
  • Configure Spark 2
  • [Commands] - Configure Spark 2
  • Validate Spark 2 using CLIs
  • [Commands] - Validate Spark 2 using CLIs
  • Validate Jupyter Lab Setup
  • [Commands] - Validate Jupyter Lab Setup
  • Integrate Spark 2 with Jupyter Lab
  • [Commands] - Integrate Spark 2 with Jupyter Lab
  • Download and Install Spark 3
  • [Commands] - Download and Install Spark 3
  • Configure Spark 3
  • [Commands] - Configure Spark 3
  • Validate Spark 3 using CLIs
  • [Commands] - Validate Spark 3 using CLIs
  • Integrate Spark 3 with Jupyter Lab
  • [Commands] - Integrate Spark 3 with Jupyter Lab

Scala Fundamentals

  • Introduction and Setting up of Scala
  • Setup Scala on Windows
  • Basic Programming Constructs
  • Functions
  • Object Oriented Concepts - Classes
  • Object Oriented Concepts - Objects
  • Object Oriented Concepts - Case Classes
  • Collections - Seq, Set and Map
  • Basic Map Reduce Operations
  • Setting up Data Sets for Basic I/O Operations
  • Basic I/O Operations and using Scala Collections APIs
  • Tuples
  • Development Cycle - Create Program File
  • Development Cycle - Compile source code to jar using SBT
  • Development Cycle - Setup SBT on Windows
  • Development Cycle - Compile changes and run jar with arguments
  • Development Cycle - Setup IntelliJ with Scala
  • Development Cycle - Develop Scala application using SBT in IntelliJ
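As a taste of what this module covers, here is a minimal, self-contained sketch of case classes, collections, map-reduce style operations, and tuples (the Order type and its data are made up for illustration):

    // Case class: immutable data holder with built-in equality and toString
    case class Order(id: Int, status: String, amount: Double)

    object ScalaBasics {
      def main(args: Array[String]): Unit = {
        val orders = Seq(
          Order(1, "COMPLETE", 100.0),
          Order(2, "PENDING", 250.0),
          Order(3, "COMPLETE", 75.5)
        )
        // Map-reduce style: filter, project, then aggregate
        val total = orders
          .filter(_.status == "COMPLETE")
          .map(_.amount)
          .sum
        // Tuples group values without defining a dedicated class
        val summary: (Int, Double) = (orders.size, total)
        println(s"orders=${summary._1}, completedTotal=${summary._2}")
      }
    }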

Overview of Hadoop HDFS Commands

  • Getting help or usage of HDFS Commands
  • Listing HDFS Files
  • Managing HDFS Directories
  • Copying files from local to HDFS
  • Copying files from HDFS to local
  • Getting File Metadata
  • Previewing Data in HDFS File
  • HDFS Block Size
  • HDFS Replication Factor
  • Getting HDFS Storage Usage
  • Using HDFS Stat Commands
  • HDFS File Permissions
  • Overriding Properties
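The lectures here use the hdfs dfs CLI. As a rough programmatic counterpart, the sketch below uses Hadoop's FileSystem API from Scala; this is an illustration, not part of the course, and the /user/demo paths are hypothetical:

    // Hedged sketch: FileSystem-API equivalents of -mkdir, -put and -ls,
    // assuming Hadoop client jars on the classpath and a configured HDFS.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsDemo {
      def main(args: Array[String]): Unit = {
        val fs = FileSystem.get(new Configuration()) // reads core-site.xml
        val dir = new Path("/user/demo/retail_db")
        fs.mkdirs(dir)                                          // hdfs dfs -mkdir -p
        fs.copyFromLocalFile(new Path("/data/orders.csv"), dir) // hdfs dfs -put
        fs.listStatus(dir).foreach { st =>                      // hdfs dfs -ls
          println(s"${st.getPermission} ${st.getLen} ${st.getPath}")
        }
        fs.close()
      }
    }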

Apache Spark 2 using Scala - Data Processing - Overview

  • Introduction to the module
  • Starting Spark Context using spark-shell
  • Overview of Spark read APIs
  • Previewing Schema and Data using Spark APIs
  • Overview of Spark Data Frame APIs
  • Overview of Functions to Manipulate Data in Spark Data Frames
  • Overview of Spark Write APIs
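The read-inspect-write cycle this module walks through looks roughly as follows; the input path, CSV options, and output location are assumptions for illustration:

    import org.apache.spark.sql.SparkSession

    object ReadWriteOverview {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("overview").getOrCreate()
        // Read API: format-specific reader with options
        val orders = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/public/retail_db/orders")
        orders.printSchema() // preview schema
        orders.show(5)       // preview data
        // Write API: save mode plus target format
        orders.write.mode("overwrite").parquet("/user/demo/orders_parquet")
        spark.stop()
      }
    }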

Apache Spark 2 using Scala - Processing Column Data using Pre-defined Functions

  • Introduction to Pre-defined Functions
  • Creating Spark Session Object in Notebook
  • Create Dummy Data Frames for Practice
  • Categories of Functions on Spark Data Frame Columns
  • Using Spark Special Functions - col
  • Using Spark Special Functions - lit
  • Manipulating String Columns using Spark Functions - Case Conversion and Length
  • Manipulating String Columns using Spark Functions - substring
  • Manipulating String Columns using Spark Functions - split
  • Manipulating String Columns using Spark Functions - Concatenating Strings
  • Manipulating String Columns using Spark Functions - Padding Strings
  • Manipulating String Columns using Spark Functions - Trimming unwanted characters
  • Date and Time Functions in Spark - Overview
  • Date and Time Functions in Spark - Date Arithmetic
  • Date and Time Functions in Spark - Using trunc and date_trunc
  • Date and Time Functions in Spark - Using date_format and other functions
  • Date and Time Functions in Spark - dealing with unix timestamp
  • Pre-defined Functions in Spark - Conclusion
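A hedged sketch applying several of the functions named above to a dummy Data Frame (the data and column names are made up, in the spirit of the module's practice frames):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ColumnFunctions {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("functions").getOrCreate()
        import spark.implicits._
        val df = Seq(("1", "  donald duck  ", "2023-06-15"))
          .toDF("id", "name", "joined")
        df.select(
          col("id"),                                      // special function: col
          lit("emp").alias("tag"),                        // special function: lit
          upper(trim($"name")).alias("name_uc"),          // trimming + case conversion
          substring(trim($"name"), 1, 6).alias("first6"), // substring
          lpad($"id", 5, "0").alias("padded_id"),         // padding
          date_format($"joined".cast("date"), "yyyyMM").alias("month"),
          trunc($"joined".cast("date"), "MM").alias("month_begin"),
          unix_timestamp($"joined", "yyyy-MM-dd").alias("epoch")
        ).show(false)
        spark.stop()
      }
    }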

Apache Spark 2 using Scala - Basic Transformations using Data Frame

  • Introduction to Basic Transformations using Data Frame APIs
  • Starting Spark Context
  • Overview of Filtering using Spark Data Frame APIs
  • Filtering Data from Spark Data Frames - Reading Data and Understanding Schema
  • Filtering Data from Spark Data Frames - Task 1 - Equal Operator
  • Filtering Data from Spark Data Frames - Task 2 - Comparison Operators
  • Filtering Data from Spark Data Frames - Task 3 - Boolean AND
  • Filtering Data from Spark Data Frames - Task 4 - IN Operator
  • Filtering Data from Spark Data Frames - Task 5 - Between and Like
  • Filtering Data from Spark Data Frames - Task 6 - Using functions in Filter
  • Overview of Aggregations using Spark Data Frame APIs
  • Overview of Sorting using Spark Data Frame APIs
  • Solution - Get Delayed Counts using Spark Data Frame APIs - Part 1
  • Solution - Get Delayed Counts using Spark Data Frame APIs - Part 2
  • Solution - Getting Delayed Counts By Date using Spark Data Frame APIs
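The filtering, aggregation, and sorting steps above combine as in this hedged sketch; the airlines path and column names (Cancelled, DepDelay, Origin, FlightDate) are assumptions for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object BasicTransformations {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("transform").getOrCreate()
        import spark.implicits._
        val flights = spark.read.parquet("/public/airlines_all/airlines")
        val delayedCounts = flights
          .filter($"Cancelled" === 0 && $"DepDelay" > 15) // equality, comparison, AND
          .filter($"Origin".isin("JFK", "ORD", "SFO"))    // IN operator
          .groupBy($"FlightDate")                         // aggregation
          .agg(count(lit(1)).alias("delayed_count"))
          .orderBy($"delayed_count".desc)                 // sorting
        delayedCounts.show(10)
        spark.stop()
      }
    }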

Apache Spark 2 using Scala - Joining Data Sets

  • Prepare and Validate Data Sets
  • Starting Spark Session or Spark Context
  • Analyze Data Sets for Joins using Spark Data Frame APIs
  • Eliminate Duplicate records from Data Frame using Spark Data Frame APIs
  • Recap of Basic Transformations using Spark Data Frame APIs
  • Joining Data Sets using Spark Data Frame APIs - Problem Statements
  • Overview of Joins using Spark Data Frame APIs
  • Inner Join using Spark Data Frame APIs - Get number of flights departed from US airports
  • Inner Join using Spark Data Frame APIs - Get number of flights departed from US states
  • Outer Join using Spark Data Frame APIs - Get Airports Never Used
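A hedged sketch of the join patterns in this module; the data set layouts (IATA and Origin columns) are assumptions for illustration:

    import org.apache.spark.sql.SparkSession

    object Joins {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("joins").getOrCreate()
        val airports = spark.read.option("header", "true")
          .csv("/public/airtraffic_all/airport-codes")
          .dropDuplicates("IATA")              // eliminate duplicate records
        val flights = spark.read.parquet("/public/airlines_all/airlines")
        // Inner join: flights departing from known airports
        val departed = flights.join(airports, flights("Origin") === airports("IATA"))
        println(departed.count())
        // Left outer join plus null check: airports never used as an origin
        val neverUsed = airports
          .join(flights, airports("IATA") === flights("Origin"), "left_outer")
          .filter(flights("Origin").isNull)
        println(neverUsed.count())
        spark.stop()
      }
    }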

Apache Spark 2 using SQL - Getting Started

  • Getting Started with Spark SQL - Overview
  • Overview of Spark Documentation
  • Launching and using Spark SQL CLI
  • Overview of Spark SQL Properties
  • Running OS Commands using Spark SQL
  • Understanding Spark Metastore Warehouse Directory
  • Managing Spark Metastore Databases
  • Managing Spark Metastore Tables
  • Retrieve Metadata of Spark Metastore Tables
  • Role of Spark Metastore or Hive Metastore
  • Exercise - Getting Started with Spark SQL
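The lectures drive these steps through the spark-sql CLI; the same statements can be issued from Scala via spark.sql, as in this hedged sketch (the database name and property value are examples):

    import org.apache.spark.sql.SparkSession

    object MetastoreBasics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("metastore")
          .enableHiveSupport() // back Spark SQL with a Hive-compatible metastore
          .getOrCreate()
        spark.sql("SET spark.sql.shuffle.partitions=2")    // Spark SQL property
        spark.sql("CREATE DATABASE IF NOT EXISTS demo_db") // manage databases
        spark.sql("USE demo_db")
        spark.sql("SHOW TABLES").show()                    // manage tables
        spark.sql("DESCRIBE DATABASE demo_db").show(false) // retrieve metadata
        spark.stop()
      }
    }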

Apache Spark 2 using SQL - Basic Transformations

  • Basic Transformation using Spark SQL - Introduction
  • Spark SQL - Overview
  • Define Problem Statement for Basic Transformations using Spark SQL
  • Prepare or Create Tables using Spark SQL
  • Projecting or Selecting Data using Spark SQL
  • Filtering Data using Spark SQL
  • Joining Tables using Spark SQL - Inner
  • Joining Tables using Spark SQL - Outer
  • Aggregating Data using Spark SQL
  • Sorting Data using Spark SQL
  • Conclusion - Final Solution using Spark SQL
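The steps above converge on a single query. A hedged sketch, runnable in spark-shell (where spark is predefined) and assuming retail_db-style orders and order_items tables already exist:

    spark.sql("""
      SELECT o.order_date,
             round(sum(oi.order_item_subtotal), 2) AS revenue -- aggregating
      FROM orders o
      JOIN order_items oi                                     -- inner join
        ON o.order_id = oi.order_item_order_id
      WHERE o.order_status IN ('COMPLETE', 'CLOSED')          -- filtering
      GROUP BY o.order_date
      ORDER BY o.order_date                                   -- sorting
    """).show()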

Apache Spark 2 using SQL - Basic DDL and DML

  • Introduction to Basic DDL and DML using Spark SQL
  • Create Spark Metastore Tables using Spark SQL
  • Overview of Data Types for Spark Metastore Table Columns
  • Adding Comments to Spark Metastore Tables using Spark SQL
  • Loading Data Into Spark Metastore Tables using Spark SQL - Local
  • Loading Data Into Spark Metastore Tables using Spark SQL - HDFS
  • Loading Data into Spark Metastore Tables using Spark SQL - Append and Overwrite
  • Creating External Tables in Spark Metastore using Spark SQL
  • Managed Spark Metastore Tables vs External Spark Metastore Tables
  • Overview of Spark Metastore Table File Formats
  • Drop Spark Metastore Tables and Databases
  • Truncating Spark Metastore Tables
  • Exercise - Managed Spark Metastore Tables
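A hedged DDL and DML sketch for spark-shell with Hive support (paths, delimiters, and the table layout are examples, not the course's exact definitions):

    spark.sql("""
      CREATE TABLE IF NOT EXISTS orders (
        order_id INT COMMENT 'surrogate key',
        order_date STRING,
        order_customer_id INT,
        order_status STRING
      ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    """)
    // Load from the local file system, replacing existing data
    spark.sql("LOAD DATA LOCAL INPATH '/data/retail_db/orders' " +
      "OVERWRITE INTO TABLE orders")
    // External table: dropping it leaves the files at LOCATION intact
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
        order_id INT, order_date STRING,
        order_customer_id INT, order_status STRING
      ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/user/demo/external/orders'
    """)
    spark.sql("TRUNCATE TABLE orders") // managed tables only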

Apache Spark 2 using SQL - DML and Partitioning

  • Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
  • Introduction to Partitioning of Spark Metastore Tables using Spark SQL
  • Creating Spark Metastore Tables using Parquet File Format
  • Load vs. Insert into Spark Metastore Tables using Spark SQL
  • Inserting Data using Stage Spark Metastore Table using Spark SQL
  • Creating Partitioned Spark Metastore Tables using Spark SQL
  • Adding Partitions to Spark Metastore Tables using Spark SQL
  • Loading Data into Partitioned Spark Metastore Tables using Spark SQL
  • Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
  • Using Dynamic Partition Mode to insert data into Spark Metastore Tables
  • Exercise - Partitioned Spark Metastore Tables using Spark SQL
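A hedged sketch of partitioned tables and dynamic partition inserts, reusing the hypothetical orders table from the previous sketch:

    spark.sql("""
      CREATE TABLE IF NOT EXISTS orders_part (
        order_id INT, order_customer_id INT, order_status STRING
      ) PARTITIONED BY (order_month STRING)
      STORED AS PARQUET
    """)
    // Dynamic partition mode: partition values come from the query itself
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE orders_part PARTITION (order_month)
      SELECT order_id, order_customer_id, order_status,
             date_format(order_date, 'yyyyMM') AS order_month
      FROM orders
    """)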

Apache Spark 2 using SQL - Pre-defined Functions

  • Introduction - Overview of Spark SQL Functions
  • Overview of Pre-defined Functions using Spark SQL
  • Validating Functions using Spark SQL
  • String Manipulation Functions using Spark SQL
  • Date Manipulation Functions using Spark SQL
  • Overview of Numeric Functions using Spark SQL
  • Data Type Conversion using Spark SQL
  • Dealing with Nulls using Spark SQL
  • Using CASE and WHEN using Spark SQL
  • Query Example - Word Count using Spark SQL
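One hedged query touching each function category above, again assuming the hypothetical orders table:

    spark.sql("""
      SELECT order_id,
             substr(order_date, 1, 7) AS order_month,     -- string manipulation
             date_format(order_date, 'EEEE') AS day_name, -- date manipulation
             cast(order_id AS STRING) AS id_str,          -- data type conversion
             nvl(order_status, 'UNKNOWN') AS status,      -- dealing with nulls
             CASE WHEN order_status IN ('COMPLETE', 'CLOSED')
                  THEN 'DONE' ELSE 'OPEN'
             END AS status_flag                           -- CASE and WHEN
      FROM orders
    """).show()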

Apache Spark 2 using SQL - Pre-defined Functions - Exercises

  • Prepare Users Table using Spark SQL
  • Exercise 1 - Get number of users created per year
  • Exercise 2 - Get the day name of each user's birthday
  • Exercise 3 - Get the names and email ids of users added in the year 2019
  • Exercise 4 - Get the number of users by gender
  • Exercise 5 - Get last 4 digits of unique ids
  • Exercise 6 - Get the count of users based on country code
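As a shape hint for these exercises, one possible form of Exercise 1, assuming the users table exposes a created_ts timestamp column (the actual layout is defined in the lectures):

    spark.sql("""
      SELECT year(created_ts) AS created_year,
             count(1) AS user_count
      FROM users
      GROUP BY year(created_ts)
      ORDER BY created_year
    """).show()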

Apache Spark 2 using SQL - Windowing Functions

  • Introduction to Windowing Functions using Spark SQL
  • Prepare HR Database in Spark Metastore using Spark SQL
  • Overview of Windowing Functions using Spark SQL
  • Aggregations using Windowing Functions using Spark SQL
  • LEAD or LAG Functions using Spark SQL
  • Getting first and last values using Spark SQL
  • Ranking using Windowing Functions in Spark SQL
  • Order of execution of Spark SQL Queries
  • Overview of Subqueries using Spark SQL
  • Filtering Window Function Results using Spark SQL
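A hedged sketch of the windowing constructs above, assuming an HR-style employees table with department_id and salary columns; note that window results must be filtered through a subquery, which is the order-of-execution point the module makes:

    spark.sql("""
      SELECT * FROM (
        SELECT employee_id, department_id, salary,
               sum(salary) OVER (PARTITION BY department_id) AS dept_total, -- aggregation
               lag(salary) OVER (
                 PARTITION BY department_id ORDER BY salary DESC
               ) AS higher_salary,                                          -- LEAD/LAG
               rank() OVER (
                 PARTITION BY department_id ORDER BY salary DESC
               ) AS salary_rank                                             -- ranking
        FROM employees
      ) q
      WHERE q.salary_rank <= 3 -- filtering window function results
    """).show()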

Sample scenarios with solutions

  • Introduction to Sample Scenarios and Solutions
  • Problem Statements - General Guidelines
  • Initializing the job - General Guidelines
  • Getting crime count per type per month - Understanding Data
  • Getting crime count per type per month - Implementing the logic - Core API
  • Getting crime count per type per month - Implementing the logic - Data Frames
  • Getting crime count per type per month - Validating Output
  • Get inactive customers - using Core Spark API (leftOuterJoin)
  • Get inactive customers - using Data Frames and SQL
  • Get top 3 crimes in RESIDENCE - using Core Spark API
  • Get top 3 crimes in RESIDENCE - using Data Frame and SQL
  • Convert NYSE data from text file format to parquet file format
  • Get word count - with custom control arguments, num keys and file format
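A hedged Data Frame sketch of the first scenario (crime count per type per month), runnable in spark-shell; the input path plus the Chicago crime column names (Date, Primary Type) and timestamp format are assumptions based on the public data set:

    import org.apache.spark.sql.functions._

    val crimes = spark.read
      .option("header", "true")
      .csv("/public/crime/csv")
    val counts = crimes
      .withColumn("crime_month",
        date_format(to_date(col("Date"), "MM/dd/yyyy hh:mm:ss a"), "yyyyMM"))
      .groupBy(col("crime_month"), col("Primary Type")) // per month, per type
      .agg(count(lit(1)).alias("crime_count"))
      .orderBy(col("crime_month"), col("crime_count").desc)
    counts.write.mode("overwrite").parquet("/user/demo/crimes_by_month")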

Instructors

Mr Durga Viswanatha Raju Gadiraju
Technology Adviser
Freelancer
