Information Retrieval and Mining Massive Data Sets

BY
Udemy

Explore and learn the strategies for creating a Google-scale information retrieval system with the ‘Information Retrieval & Mining Massive Data Sets’ course.

Mode

Online

Fees

₹549 (original price ₹2,799)

Quick Facts

Particulars: Details
Medium of instruction: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course overview

The ‘Information Retrieval and Mining Massive Data Sets’ online course teaches the methodologies and strategies required to build an information retrieval system. It is offered on the online education platform Udemy. The curriculum consists of thirty-nine hours of videos and lectures, which students can work through independently at their own pace.

The course is taught with English subtitles, which aid understanding of the concepts. Students work through both the theory and the practical application of information retrieval and mining large data sets, guided by the course instructors, Omkar Deshpande and the Mentors Net.

The ‘Information Retrieval and Mining Massive Data Sets’ training program gives students access to the course materials at any time, and students receive a course completion certificate after finishing the course.

The highlights

  • Online mode
  • 39 hours of Video
  • Expert guidance
  • Full-time course access
  • Mobile access 
  • TV access
  • Course Certificate

Program offerings

  • Videos
  • Lectures
  • Articles
  • Quizzes
  • Exercises
  • Subtitles
  • Full-time access
  • Mobile access
  • TV access
  • Course completion certificate

Course and certificate fees

Fees information
₹549 (discounted from ₹2,799)

The ‘Information Retrieval and Mining Massive Data Sets’ classes are available to the learners after the payment of the course fee.

Information Retrieval and Mining Massive Data Sets fee structure

Head: Amount
Original price: Rs. 2,799
Discounted price: Rs. 549

Certificate availability

Yes

Certificate providing authority

Udemy

Who it is for

The ‘Information Retrieval and Mining Massive Data Sets’ online certification program is designed for big data enthusiasts and data scientists to improve their knowledge and professional skills in information retrieval and data mining.

Eligibility criteria

Students who wish to apply for the ‘Information Retrieval and Mining Massive Data Sets’ online course are required to have prior knowledge of probability and linear algebra, expertise in graduate-level algorithms, and an understanding of programming languages such as C, Java, and Python.

Certificate qualifying details

The students of the ‘Information Retrieval and Mining Massive Data Sets’ online program are eligible to receive the course certificate after completing the course.

What you will learn

  • Statistical skills
  • Database management
  • Knowledge of data mining

The ‘Information Retrieval and Mining Massive Data Sets’ curriculum is structured to teach students data mining algorithms for making sense of large amounts of data, covering the following aspects:

  • Creating information retrieval systems.
  • Drawing recurring patterns and associations.
  • The concepts of web mining.
  • Tasks such as classification and clustering of data.
  • Understanding recommendation systems.

The syllabus

Introduction to Boolean Search Engine

  • What is data mining
  • Structured Data, Unstructured Data, and Information Retrieval
  • Term-Document Incidence Matrix (1)
  • Term-Document Incidence Matrix (2)
  • Inverted Index
  • Tradeoffs in implementing an Inverted Index
  • Processing AND, OR, NOT queries
  • Overview of Index Construction Pipeline
  • Query optimization using Document Frequency (1)
  • Query Optimization Using Document Frequency (2)
  • Boolean Retrieval Model
  • Example of a Boolean Retrieval Model
  • Limitations of Boolean Retrieval Model
  • How to evaluate the performance of an IR System
  • Google Zeitgeist

Dictionary Data Structure, Tolerant retrieval

  • Parsing Documents and Issues Associated with it
  • Tokenization Process in an IR System
  • Normalization to Terms
  • Faster Postings Merges With Skip Pointers
  • How to Handle Phrase Query
  • Phrase Query Using Positional Index
  • How to handle proximity query
  • Discussion on Positional Index Size

Index Construction. Postings size estimation, sort-based indexing, dynamic indexing

  • Dictionary Data Structure Implementation
  • Wild card queries
  • Questions on Wild Card Queries
  • Wild Card Query Handling Using Permuterm Index
  • Wild Card Query Handling Using K-Gram Index
  • Soundex Algorithm
  • Spelling Correction Techniques in an IR System
  • Question On Soundex Algorithm
  • Spelling Correction (Part 2)
  • Introduction To Dynamic Programming
  • How To Calculate Edit Distance Between Two Strings
  • Spelling Correction Using Weighted Edit Distance
  • Spelling Correction Using N-gram Overlap Technique
  • Calculating Jaccard Coefficient (An Example)
  • Context-Sensitive Spell Correction

Dictionary Compression, Posting Compression

  • Introduction to Index Construction
  • Index Construction Using In-Memory Sorting
  • Index Construction Using BSBI Algorithm
  • Index Construction Using SPIMI Algorithm
  • Introduction To Distributed Indexing
  • How To build distributed indexes
  • Q & A on Distributed Index
  • Map Reduce
  • Dynamic indexing using a naive approach
  • Dynamic indexing using logarithmic merge
  • Issues With Multiple Indexes

Scoring, term weighting, and the vector space model

  • Why do we compress indexes
  • Important Statistics about RCV Collection
  • Various Dictionary Compression Techniques
  • Various Dictionary Compression Techniques Part 2
  • Various Posting Compression Techniques

Efficient vector space scoring, Nearest neighbor techniques

  • Ranked retrieval model
  • Jaccard Score
  • Term Frequency Weighting And Bag Of Words Model
  • Inverse Document Frequency
  • TF-IDF Score
  • Documents As TF-IDF Vectors
  • Length Normalization
  • Cosine Similarity Example
  • Computing Cosine Scores On Index
  • Variants of TF-IDF Weights

Evaluating search engines, User Happiness, Precision, Recall, F-measure

  • Term at a Time Scoring
  • Efficient Cosine Ranking
  • Generic Approach For Speeding up Cosine Similarity
  • Index Elimination
  • Champion Lists
  • Static Quality Score
  • High And Low Lists
  • Impact Ordered Posting
  • Cluster Pruning
  • Parametric Zone Tiered Index
  • Query Term Proximity And Query Parsing
  • How A Search Engine Works

Advertisement system, Google Adsense, Search Engine Optimization

  • Performance of a Search Engine Part 1
  • Performance of a Search Engine Part 2
  • Performance of a Search Engine Part 3
  • Performance of a Search Engine Part 4
  • Performance of a Search Engine Part 5

Supervised learning, Text Classification. Naive-Bayes Text Classification

  • ECommerce Vs. Traditional Businesses
  • Pricing Models For Online Advertisement
  • AdWords and AdSense
  • SEM And SEO

Link Analysis. Web as a graph. PageRank

  • Classification System
  • Document Classification
  • Manual Classification Methods
  • Naive Bayes Classifiers
  • Bayes Rules Of Text Classification
  • Various Classification Methods
  • Example of Multivariate Bernoulli Model
  • Second Version of Naive Bayes
  • Example of Second Version of Naive Bayes

Clustering. Introduction to the problem. Partitioning methods: K-means clustering

  • Reputation system
  • Examples of Reputation System
  • Limitations of Reputation System
  • Page Rank Calculation

Web crawler

  • What is Clustering
  • Applications of Clustering in IR Systems
  • Issues For Clustering
  • Introduction to Clustering Algorithms
  • K-Means Clustering Algorithms
  • Rocchio Algorithms
  • K Nearest Neighbor Algorithms
  • Discussion on K Nearest Neighbor
  • Proof of Rocchio's Algorithm as a linear classifier
  • Worked out Example On Rocchio Algorithms
  • Examples On Bigram Index

Association rules. Market Basket Models and Frequent Item Sets. A Priori Algorithm

  • How a web crawler works
  • Complications in Crawling
  • Advanced Crawler Architecture
  • URL Frontier
  • Association Rule Introduction
  • Market Basket Model and Frequent Item Sets
  • A formal approach to Association Rules
  • How to find association Rules
  • Storage Considerations for Market Basket
  • Memory Bottleneck in Storage of Market Basket
  • A Naive Algorithm to discover Association Rules Part1
  • A Naive Algorithm to discover Association Rules Part2
  • A Priori Algorithm
  • Extension of A Priori Algorithm

Admission details

The course on ‘Information Retrieval and Mining Massive Data Sets’ by Udemy can be accessed by registering for the course online through the official website.

Step 1: Go to the course page using the following link

https://www.udemy.com/course/information-retrieval-and-mining-massive-data-sets/?couponCode=ST9MT71624

Step 2: Select the ‘Buy Now’ option

Step 3: Fill in the required details and complete the registration.


Filling the form

Students who wish to enroll in the ‘Information Retrieval and Mining Massive Data Sets’ program should enter their name, email address, and Udemy account password to sign in.

How it helps

The ‘Information Retrieval and Mining Massive Data Sets’ certification is designed to help students understand data mining algorithms and find techniques to solve big data problems. The course helps students develop expertise in building information retrieval systems. Candidates receive a course certificate that can support their careers.

Instructors

Mr Omkar Deshpande
Data Scientist
Walmart Inc.

Other Bachelors, Other Masters, Ph.D

FAQs

Which online education platform offers the course on ‘Information Retrieval and Mining Massive Data Sets’?

The course is provided by Udemy.

What is the duration of the ‘Information Retrieval and Mining Massive Data Sets’ online course?

The course curriculum consists of thirty-nine hours of videos and other content that can be completed at any time.

What are the prerequisites to join the ‘Information Retrieval and Mining Massive Data Sets’ training program?

Students are expected to have prior knowledge of probability, linear algebra, graduate-level algorithms, and programming languages such as C, Java, and Python.

Does the ‘Information Retrieval and Mining Massive Data Sets’ course include a certificate?

Yes, students are issued a course certificate after completing the training.
