Taming Big Data with Apache Spark and Python - Hands On!

BY
Udemy

Learn the skills and techniques that professionals use to analyse large data sets with the Taming Big Data with Apache Spark and Python - Hands On course.

Mode

Online

Fees

₹699 (regular price ₹4,099)

Quick Facts

Medium of instruction: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course overview

Taming Big Data with Apache Spark and Python - Hands On is an online certification course developed by Sundog Education, an online learning platform founded by machine learning professional Frank Kane. It is offered by Udemy Inc., a US-based online learning platform, to help individuals boost their careers.

The Taming Big Data with Apache Spark and Python - Hands On course is highly interactive: learners spend most of their time working alongside the instructor to build, evaluate, and run real code, both locally and in the cloud using Amazon's Elastic MapReduce service. The course provides 20+ downloadable resources and 7 hours of pre-recorded lectures. It also covers other Spark-based technologies such as Spark SQL, Spark Streaming, and GraphX. Interested candidates can enrol by paying online through any supported payment method, which gives them lifetime access to the course.

The highlights

  • Certificate of completion
  • Self-paced course
  • English videos with multi-language subtitles
  • 7 hours of pre-recorded video content
  • Online course
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Program offerings

  • Certificate of completion
  • Self-paced course
  • English videos with multi-language subtitles
  • 7 hours of pre-recorded video content
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV
  • 4 articles
  • 26 downloadable resources

Course and certificate fees

Fees information: ₹699 (regular price ₹4,099)

Certificate availability: Yes

Certificate providing authority: Udemy

What you will learn

  • Knowledge of Python
  • Knowledge of big data
  • Knowledge of Apache Spark

After completing the Taming Big Data with Apache Spark and Python - Hands On certification course, learners will understand:

  • DataFrames and Resilient Distributed Datasets (RDDs) in Spark
  • How Python lets you quickly create and run Spark jobs
  • How to frame complex analysis problems as iterative or multi-stage Spark scripts
  • How to scale up to larger data sets using Amazon's Elastic MapReduce
  • How to work with clusters and share data between nodes using broadcast variables and accumulators
  • How Hadoop YARN distributes Spark across clusters of computers
  • Other Spark technologies such as Spark SQL, Spark Streaming, and GraphX

The syllabus

Getting Started with Spark

  • Introduction
  • How to Use This Course
  • Udemy 101: Getting the Most From This Course
  • IMPORTANT! DO NOT USE JAVA 16 WITH THIS COURSE
  • [Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies.
  • Alternate MovieLens download location
  • [Activity] Installing the MovieLens Movie Rating Dataset
  • [Activity] Run your first Spark program! Ratings histogram example.

Spark Basics and the RDD Interface

  • What's new in Spark 3?
  • Introduction to Spark
  • The Resilient Distributed Dataset (RDD)
  • Ratings Histogram Walkthrough
  • Key/Value RDD's and the Average Friends by Age Example
  • [Activity] Running the Average Friends by Age Example
  • Filtering RDD's, and the Minimum Temperature by Location Example
  • [Activity] Running the Minimum Temperature Example, and Modifying it for Maximums
  • [Activity] Running the Maximum Temperature by Location Example
  • [Activity] Counting Word Occurrences using flatmap()
  • [Activity] Improving the Word Count Script with Regular Expressions
  • [Activity] Sorting the Word Count Results
  • [Exercise] Find the Total Amount Spent by Customer
  • [Exercise] Check your Results, and Now Sort them by Total Amount Spent.
  • Check Your Sorted Implementation and Results Against Mine.

Spark SQL, DataFrames, and DataSets

  • Introducing SparkSQL
  • [Activity] Executing SQL commands and SQL-style functions on a DataFrame
  • Using DataFrames instead of RDD's
  • [Exercise] Friends by Age, with DataFrames    
  • Exercise Solution: Friends by Age, with DataFrames    
  • [Activity] Word Count, with DataFrames
  • [Activity] Minimum Temperature, with DataFrames (using a custom schema)    
  • [Exercise] Implement Total Spent by Customer with DataFrames
  • Exercise Solution: Total Spent by Customer, with DataFrames

Advanced Examples of Spark Programs

  • [Activity] Find the Most Popular Movie
  • [Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
  • Find the Most Popular Superhero in a Social Graph
  • [Activity] Run the Script - Discover Who the Most Popular Superhero is!
  • [Exercise] Find the Most Obscure Superheroes
  • Exercise Solution: Most Obscure Superheroes    
  • Superhero Degrees of Separation: Introducing Breadth-First Search
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
  • [Activity] Superhero Degrees of Separation: Review the Code and Run it
  • Item-Based Collaborative Filtering in Spark, cache(), and persist()
  • [Activity] Running the Similar Movies Script using Spark's Cluster Manager
  • [Exercise] Improve the Quality of Similar Movies

Running Spark on a Cluster

  • Introducing Elastic MapReduce
  • [Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
  • Partitioning
  • Create Similar Movies from One Million Ratings - Part 1
  • [Activity] Create Similar Movies from One Million Ratings - Part 2
  • Create Similar Movies from One Million Ratings - Part 3
  • Troubleshooting Spark on a Cluster
  • More Troubleshooting, and Managing Dependencies

Machine Learning with Spark ML

  • Introducing MLLib
  • [Activity] Using Spark ML to Produce Movie Recommendations
  • Analyzing the ALS Recommendations Results
  • [Activity] Linear Regression with Spark ML 
  • [Exercise] Using Decision Trees in Spark ML to Predict Real Estate Prices
  • Exercise Solution: Decision Trees with Spark

Spark Streaming, Structured Streaming, and GraphX

  • Spark Streaming
  • [Activity] Structured Streaming in Python
  • [Exercise] Use Windows with Structured Streaming to Track Most-Viewed URL's
  • Exercise Solution: Using Structured Streaming with Windows
  • GraphX

You Made It! Where to Go from Here

  • Learning More about Spark and Data Science
  • Bonus Lecture: More courses to explore!

Instructors

Mr Frank Kane
Founder
Freelancer
