Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames

BY
Yandex via Coursera

Master Spark SQL, GraphFrames, DataFrames and Hive with the Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames certification.

Level

Expert

Mode

Online

Duration

6 Weeks

Fees

Free

Quick Facts

Particulars: Details
Medium of instruction: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course overview

We live in a digitally powered era in which technology is advancing rapidly. Computer systems can now derive precise information and useful results by analysing the structured and unstructured data that together make up ‘big data’. Such analysis is highly relevant and informative for large organisations, businesses and professionals looking to optimize their performance.

Considered an information asset, big data enables effective decision making, process optimisation and cost effectiveness in large, medium and small organisations alike. However, to reap the benefits of this high-volume, high-velocity and high-variety data, it must be analysed using appropriate tools and techniques. This is where knowledge of Hive, Spark SQL, DataFrames and GraphFrames comes in handy. Analysts skilled in big data analysis are in high demand, and with these tools they can analyse big data efficiently to support important decision making and process optimisation in their organisations.

The Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames online programme, offered by Yandex via Coursera, imparts key skills in using big data analysis tools and helps participants pursue careers in big data analysis.

The highlights

  • Offered by Yandex via Coursera platform
  • 100% online learning mode
  • About 39 hours of course content
  • Flexible learning schedule and assignment deadlines
  • Shareable certificate upon course completion
  • Insights from industry experts

Program offerings

  • Videos
  • Readings
  • Practice exercises
  • Quizzes

Course and certificate fees

Type of course

Free

  • Coursera offers this course via Purchase Course and Audit-Only Options.
  • The price of purchasing the course is Rs. 2,159.
  • There are no charges for Audit Mode. However, participants will not be able to gain access to graded assignments required to earn the certificate.

Fee details for Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames Course

Course Purchase Fees

Rs. 2,159 (includes full access to course material and graded assignments)

Audit Only

Free access to course material except graded assignments

Financial Aid

Available on application

Certificate availability

Yes

Certificate providing authority

Coursera

Certificate fees

₹2,152

Eligibility criteria

Certification Qualifying Details 

To earn the certificate of completion, participants of the Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames certificate course must complete and pass all the graded items attached to the course. These consist of quizzes and other assignments (if applicable). Once participants pass them, Coursera issues an electronic certificate of completion, which is automatically added to the participant's accomplishments page. The certificate can be shared online via URL and can also be printed.

What you will learn

Knowledge of Apache Spark

Upon completion of the Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames course, participants will be able to:

  • Process graphs using Spark GraphFrames
  • Construct and use Spark DataFrames
  • Write ad-hoc analytical jobs using Spark DataFrames
  • Debug Spark applications and optimize their performance
  • Use Hive, Spark DataFrames and Spark SQL to efficiently warehouse your data
  • Make effective use of networks and social graphs
  • Write and execute queries using Hive and Spark SQL

The syllabus

Welcome to the Second Course: Big Data Analysis

Videos
  • Meet Alexey Dral
  • Meet Natalia Pritykovskaya
  • Meet Pavel Klemenkov
  • Meet Pavel Mezentsev
  • What is BigData Analysis?
  • Tools for BigData Analysis
  • Graph Data Analysis
  • Computations Optimization
Readings
  • Slack Channel is the quickest way to get answers to your questions

Big Data SQL: Hive

Videos
  • Hive Data Definition Language (DDL)
  • Hive Data Manipulation Language (DML)
  • Hive Analytics: RegexSerDe, Views
  • Hive Streaming
  • Hive Optimization: Data Skew
  • Analytics: Business Use Cases
  • Business Use Cases: Solution with Hive
  • HTTP Web Service: Access Log Format
  • Hive Analytics: UDF, UDAF, UDTF
  • Hive Optimization: Partitioning, Bucketing and Sampling
  • Hive PTF (Window Functions)
  • Hive Map-Side Joins: Plain, Bucket, Sort-Merge
  • Hive Optimization: Row-Columnar File Formats, Compression
  • (optional) Regular Expressions, Likbez
  • (optional) SQL: likbez

Big Data SQL: Hive (practice week)

Videos
  • How to Install Docker on Windows 7, 8, 10
  • How to submit your first Hive assignment
  • How to submit your first assignment
Readings
  • Docker Installation Guide
  • Hive assignment. Intro and instructions
  • Assignments. General requirements
  • Grading System: Instructions and Common Problems

Spark SQL and Spark Dataframe

Videos
  • How to process a DataFrame as SQL
  • Advantages of Spark SQL
  • Working with Hive
  • What is Pandas DataFrame and how to create it
  • RDD vs. DF vs. SQL
  • Aggregates
  • User Defined Functions
  • Functions
  • Projection and Filtering
  • Reading and Writing Files
  • Time Processing
  • Window Functions
  • Two-Dimensional Distributions
  • Join

Graph Analysis from Big Data Perspective

Videos
  • Counting common friends. Part I
  • Graph representation
  • Counting common friends. Part II
  • Graph examples
  • GraphFrames: Introduction
  • Motif Finding: DSL
  • Counting common friends. Part III
  • Motif Finding: Counting Mutual Friends
  • Triangles Count: Introduction
  • Motif Finding: Under The Hood. Part 1
  • Triangles Count: Edge Lists
  • Triangles Count: GraphFrame
  • Motif Finding: Under The Hood. Part 2

PageRank and Recent Advances

Videos
  • Introduction
  • GraphFrames
  • Algorithm
  • Taste Graph. Part I
  • Page Rank Algorithm
  • GraphFrames API
  • Taste Graph. Part II
  • RDD Implementation
  • Taste Graph. Part III
  • Random Walk
Readings
  • Graph based Music Recommender

Spark Internals and Optimization

Videos
  • Welcome
  • Shuffle. Where to send data?
  • Shuffle. How to send data?
  • Spark Execution Model
  • PageRank Optimization
  • Optimizing Functions
  • Catalyst
  • Spark SQL. Motivation
  • UDF Optimization
  • Joins
  • Optimizing Joins
  • Catalyst Optimization Example
  • Resource Allocation
  • Memory Management
  • Speculative Execution
  • Persistence and Checkpointing
  • Dynamic Allocation
Readings
  • Deployment of the environment

Admission details

To enrol in the Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames online certificate course, participants should follow the steps below.

Step 1: Visit the course page.

Step 2: Click on the “Enroll for Free” box.

Step 3: Log in or sign up using your Google or Email credentials.

Step 4: You will have the option of “Purchase Course” and “Audit Only”. “Purchase Course” will enable the applicants to receive full access to all course material and graded assignments. Through the “Audit Only” option, applicants will only receive access to course content, and not the graded items.

Step 5: Choose your preferred option and, if you are purchasing the course, proceed to make the payment.

Please note that applicants who choose the “Audit Only” option will be able to access the course content but not the graded assignments, which are required to earn the course completion certificate from Coursera.

Scholarship Details

Coursera offers financial aid/sponsorship support for this course. Through this, participants will be able to access the entire course content including the graded assignments in order to earn the course completion certificate.

To apply for financial aid/scholarship, follow these steps.

Step 1: Click on the ‘Financial aid available’ button on the course homepage. Enter your desired log-in credentials to proceed.

Step 2: Fill in the application form with your basic information and complete all required fields. To avoid rejection, ensure that the application is more than 150 words long.

Step 3: While your application is being reviewed, you can begin the course through audit mode. Please note that the review process can take up to 15 days.

Step 4: After the review, Coursera will notify you by email whether your application has been accepted or rejected. If the application is accepted, you will be enrolled in the course directly.

Step 5: Participants will have 2 weeks’ time to unenroll from the course, once the application has been accepted.

Evaluation process

Participants must complete and pass all the required graded assignments of the course to earn the shareable course completion certificate from Coursera. These consist mainly of quizzes and any other applicable assignments set by the instructors. Participants can opt for flexible deadlines and can save their progress to pick up later.

How it helps

In this highly digitised era, technology keeps advancing to new heights. Digitalisation is simplifying processes and optimizing performance, allowing businesses to run more efficiently. Much of this is made possible by useful, informative and critical analysis of ‘big data’. Participants of this online course will gain the knowledge and skills to use various tools for building highly dynamic and well-organised big data workflows.

Through this online certification course, participants will gain useful insights into and working knowledge of Hive, Spark SQL, DataFrames and GraphFrames, enabling them to warehouse their data efficiently, write and execute queries, and work with social graphs and networks. They will also learn about topics such as Pandas DataFrame, Aggregates, PageRank Optimization and Memory Management, among other aspects of big data analysis. Participants will receive a shareable certificate from Coursera upon completion of the course.

Globally, there is a surge in demand for big data analysts and business analysts who can deal with huge volumes of data. By completing this course, participants will gain exposure to the tools and techniques used in analysing big data and strengthen the skills they need to assist their organisations in better decision making, forecasting, cost reduction and process optimisation. Participants will also be taught by leading instructors from the big data analysis field.

FAQs

Is there a prerequisite to enrol for The Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames course?

This is an advanced-level course designed to teach the use of big data analysis tools such as Hive, Spark SQL, DataFrames and GraphFrames. It is open to participants irrespective of their personal skill levels.

How does the Audit Only option work?

Through the Audit Only option, participants will be able to access only the course material consisting of videos and readings, and not the graded assignments like quizzes. Participants do not have to pay any fees or charges to ‘audit’ the course.

Who can become eligible for certification in The Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames?

Only those participants who have completed all the course work and have passed with sufficient grades will become eligible for Coursera certification.

How does the shareable course certificate work?

You will receive your certificate once you have passed all the graded assignments. From the accomplishments page, you can share the certificate online, for example on LinkedIn or in your CV, and you can also print it.

What is the Purchase Course method?

By purchasing the course, participants get access to the full contents of the course, including quizzes and other graded assignments required to earn certification.

Similar Courses

Big Data Capstone Project

The University of Adelaide, Adelaide via Edx

6 Weeks Online
Expert
Free

Post Graduate Program in Big Data

Belhaven University, Mississippi via Intellipaat

7 Months Online
Expert
₹ 75,012

Big Data Applications Machine Learning at Scale

Yandex via Coursera

5 Weeks Online
Expert
Free

Data Architect

Udacity

4 Months Online
Expert
₹41,820 ₹49,200

Big Data and Education

Penn via Edx

8 Weeks Online
Expert
Free

Big Data Analytics using Spark

UC San Diego via Edx

10 Weeks Online
Expert
Free
