Apache Spark with Scala - Hands On with Big Data!

BY
Udemy

Gain a practical understanding of the concepts and techniques for processing big data with Apache Spark and Scala.

Mode

Online

Fees

₹ 599 (original price ₹ 4,099)

Quick Facts

Medium of instruction: English
Mode of learning: Self-study
Mode of delivery: Video and text based

Course overview

Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, along with an optimized engine that supports general execution graphs. The Apache Spark with Scala - Hands On with Big Data online certification course is designed by Sundog Education, an educational platform that teaches professional skills in big data, data science, and machine learning, together with its founder, Frank Kane, and is offered on Udemy.

The Apache Spark with Scala - Hands On with Big Data online course offers 9 hours of hands-on lectures and 3 articles. It teaches candidates to frame data analysis problems as Spark problems through more than 20 hands-on examples, and then to scale those solutions up to run on cloud computing services. The classes cover big data analysis, machine learning, data streaming, caching, partitioning, graph structures, structured data, DataFrames, Datasets, Hadoop clusters, and more.
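
To give a rough idea of what framing a data analysis problem as a Spark problem can look like, here is a minimal sketch of the word-count exercise listed in the syllabus. It is not taken from the course materials. The object name WordCountSketch and the sample sentences are hypothetical, and plain Scala collections stand in for a Spark RDD so that the sketch runs without a Spark installation. In Spark itself, the same pipeline would typically be written against an RDD using flatMap, map, and reduceByKey.

```scala
// Illustrative sketch only: plain Scala collections stand in for a Spark RDD
// (an assumption made so the example runs without Spark on the classpath).
object WordCountSketch {

  // Split each line on runs of non-word characters, normalize to lowercase,
  // drop empty tokens, count occurrences per word, and sort by descending
  // count (ties broken alphabetically).
  def wordCount(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split("\\W+"))   // one line -> many words
      .map(_.toLowerCase)
      .filter(_.nonEmpty)
      .groupBy(identity)          // word -> all of its occurrences
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input; a real job would read a text file instead.
    val sample = Seq("Spark is fast", "Scala and Spark")
    wordCount(sample).foreach { case (word, count) => println(s"$word: $count") }
  }
}
```

The structure mirrors the steps named in the syllabus: flatMap() to break lines into words, a regular expression to clean up the tokens, and a final sort of the results.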

The highlights

  • Certificate of completion
  • Self-paced course
  • 9 hours of pre-recorded video content
  • 3 articles
  • Learning resources

Program offerings

  • Online course
  • Learning resources
  • 30-day money-back guarantee
  • Unlimited access
  • Accessible on mobile devices and TV

Course and certificate fees

Fees: ₹ 599 (original price ₹ 4,099)
Certificate availability: Yes
Certificate-providing authority: Udemy

What you will learn

  • Machine learning
  • Knowledge of big data
  • Knowledge of Apache Spark

After completing the Apache Spark with Scala - Hands On with Big Data certification course, candidates will know how to use Apache Spark with Scala for big data work, including big data analytics. The course covers MLlib, Spark Streaming, caching, partitioning, resilient distributed datasets (RDDs), graph structures, and structured data, along with techniques for transforming structured data using Datasets, DataFrames, and SparkSQL. Candidates will also learn to develop, deploy, and manage Spark scripts on Hadoop clusters, to traverse and analyze graph structures with GraphX, and to analyze large data sets using machine learning on Spark.

The syllabus

Getting Started

  • Udemy 101: Getting the Most From This Course
  • Alternate download link for the ml-100k dataset
  • WARNING: DO NOT INSTALL JAVA 16 IN THE NEXT LECTURE
  • Introduction, and installing the course materials, IntelliJ, and Scala
  • Introduction to Apache Spark
  • Spark Basics
  • What's New in Spark 3?

Scala Crash Course [Optional]

  • [Activity] Scala Basics
  • [Exercise] Flow Control in Scala
  • [Exercise] Functions in Scala
  • [Exercise] Data Structures in Scala

Using Resilient Distributed Datasets (RDDs)

  • The Resilient Distributed Dataset
  • Ratings Histogram Example
  • Spark Internals
  • Key / Value RDD's, and the Average Friends by Age example
  • [Activity] Running the Average Friends by Age Example
  • Filtering RDD's, and the Minimum Temperature by Location Example
  • [Activity] Running the Minimum Temperature Example, and Modifying it for Maximum
  • [Activity] Counting Word Occurrences using Flatmap()
  • [Activity] Improving the Word Count Script with Regular Expressions
  • [Activity] Sorting the Word Count Results
  • [Exercise] Find the Total Amount Spent by Customer
  • [Exercise] Check your Results, and Sort Them by Total Amount Spent
  • Check Your Results and Implementation Against Mine

SparkSQL, DataFrames, and DataSets

  • Introduction to SparkSQL
  • [Activity] Using SparkSQL
  • [Activity] Using DataSets
  • [Exercise] Implement the "Friends by Age" example using DataSets
  • Exercise Solution: Friends by Age, with Datasets.
  • [Activity] Word Count example, using Datasets
  • [Activity] Revisiting the Minimum Temperature example, with Datasets
  • [Exercise] Implement the "Total Spent by Customer" problem with Datasets
  • Exercise Solution: Total Spent by Customer with Datasets

Advanced Examples of Spark Programs

  • [Activity] Find the Most Popular Movie
  • [Activity] Use Broadcast Variables to Display Movie Names
  • [Activity] Find the Most Popular Superhero in a Social Graph
  • [Exercise] Find the Most Obscure Superheroes
  • Exercise Solution: Find the Most Obscure Superheroes
  • Superhero Degrees of Separation: Introducing Breadth-First Search
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
  • [Activity] Superhero Degrees of Separation: Review the code, and run it!
  • Item-Based Collaborative Filtering in Spark, cache(), and persist()
  • [Activity] Running the Similar Movies Script using Spark's Cluster Manager
  • [Exercise] Improve the Quality of Similar Movies

Running Spark on a Cluster

  • [Activity] Using spark-submit to run Spark driver scripts
  • [Activity] Packaging driver scripts with SBT
  • [Exercise] Package a Script with SBT and Run it Locally with spark-submit
  • Exercise solution: Using SBT and spark-submit
  • Introducing Amazon Elastic MapReduce
  • Creating Similar Movies from One Million Ratings on EMR
  • Partitioning
  • Best Practices for Running on a Cluster
  • Troubleshooting, and Managing Dependencies

Machine Learning with Spark ML

  • Introducing MLLib
  • [Activity] Using MLLib to Produce Movie Recommendations
  • Linear Regression with MLLib
  • [Activity] Running a Linear Regression with Spark
  • [Exercise] Predict Real Estate Values with Decision Trees in Spark
  • Exercise Solution: Predicting Real Estate with Decision Trees in Spark

Intro to Spark Streaming

  • The DStream API for Spark Streaming
  • [Activity] Real-time Monitoring of the Most Popular Hashtags on Twitter
  • Structured Streaming
  • [Activity] Using Structured Streaming for real-time log analysis
  • [Exercise] Windowed Operations with Structured Streaming
  • Exercise Solution: Top URL's in a 30-second Window

Intro to GraphX

  • GraphX, Pregel, and Breadth-First-Search with Pregel.
  • Using the Pregel API with Spark GraphX
  • [Activity] Superhero Degrees of Separation using GraphX

You Made It! Where to Go from Here.

  • Learning More, and Career Tips
  • Bonus Lecture: More courses to explore!

Instructors

Mr Frank Kane
Founder
Freelancer
