
Quick Facts

Medium of Instructions: English
Mode of Learning: Self Study, Virtual Classroom
Mode of Delivery: Video and Text Based

Course Overview

The Georgia Institute of Technology (Georgia Tech), USA, offers the Reinforcement Learning online programme through Udacity. This four-month course aims to teach you the key concepts of Reinforcement Learning (RL) through a blend of recent papers and classic work in the area.

Throughout the Reinforcement Learning online training, you will learn about automated decision-making from the point of view of Computer Science. The course offers in-depth learning content taught by industry professionals. You will learn via interactive quizzes, video lectures and practical exercises.

The Reinforcement Learning syllabus covers a wide range of RL topics, including convergence, generalisation, game theory, the Bellman equations and Markov Decision Processes (MDPs). You will also explore efficient algorithms and both single-agent and multi-agent planning.
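
For context, the Bellman optimality equation mentioned above relates the value of a state in an MDP to the values of its successor states. In the usual textbook notation (the course may use different symbols), it reads:

```latex
V^*(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]
```

Here R(s, a) is the immediate reward, P(s' | s, a) the transition probability and γ the discount factor.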

The Reinforcement Learning course by Udacity also covers Temporal Difference (TD) learning and related concepts. At the end of this advanced online programme, you will replicate a result from a published RL paper.
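
To make the TD idea concrete, the sketch below shows a minimal tabular TD(0) evaluation loop in Python. It is an illustration only, not code from the course: the env and policy objects (with reset/step methods returning the next state, reward and a done flag) are hypothetical stand-ins.

```python
from collections import defaultdict

def td0_evaluate(env, policy, episodes=1000, alpha=0.1, gamma=0.99):
    """Estimate state values V(s) for a fixed policy using the TD(0) rule."""
    V = defaultdict(float)  # value estimates, default 0 for unseen states
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # TD(0): move V(s) toward the one-step bootstrapped target
            # r + gamma * V(s'), with step size alpha.
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```

TD(Lambda), also covered in the syllabus, generalises this one-step target by blending k-step estimators.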

The Highlights

  • Free access
  • Self-paced learning
  • Online course
  • Four-month programme
  • Offered by Georgia Tech, USA
  • Advanced-level course

Programme Offerings

  • Online Learning Platform
  • Practical exercises
  • Four-month training
  • Free programme access
  • Exhaustive curriculum
  • Video lectures
  • Self-paced learning
  • An offering of Georgia Tech
  • Industry expert instructors

Courses and Certificate Fees

Certificate Availability: No

The course is free to access; there is no registration or course fee, and no certificate is offered on completion.

Eligibility Criteria

The Reinforcement Learning training has a few prerequisites. You must be familiar with Java programming, must have completed a graduate-level machine learning course, and should have some prior exposure to RL.

What you will learn

Machine Learning, Application of ML Algorithms

By the end of the Reinforcement Learning programme, you will have an understanding of:

  • Basic RL concepts
  • The theoretical perspective of Machine Learning (ML)
  • Algorithms and procedures to learn near-optimal decisions from experience
  • RL topics like Temporal Difference (TD), generalisation, convergence, Bellman equations, etc.

Admission Details

Step 1 – Reach the Reinforcement Learning course page by clicking here: https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893.

Step 2 – Hit the ‘Start Free Course’ button to open a registration page.

Step 3 – Fill in some basic details and click ‘Sign Up’ to create a new Udacity account. Alternatively, you can sign in with an existing Facebook or Google account.

Step 4 – That’s it. You will be enrolled in Udacity’s Reinforcement Learning programme once you log in.

Application Details

It is easy to enrol in the Reinforcement Learning course; there is no application form to fill in. Just log on to Udacity’s web portal, go to the course page and create an account to gain admission. You only need to enter your full name, email address and password when creating the account.

The Syllabus

  • Let’s do the time warp again
  • Introduction
  • Decision making and Reinforcement Learning
  • The world – 1 
  • The world – 2
  • Markov Decision Process – 1
  • Markov Decision Process – 2
  • Markov Decision Process – 3
  • Markov Decision Process – 4
  • More about rewards – 1
  • More about rewards – 2
  • More about rewards – 3
  • A sequence of rewards – 1  
  • A sequence of rewards – 2
  • A sequence of rewards – 3
  • A sequence of rewards – 4
  • Assumptions
  • Policies – 1
  • Policies – 2
  • Finding policies – 1
  • Finding policies – 2
  • Finding policies – 3
  • Finding policies – 4
  • Back to the future
  • The Bellman Equations – 1
  • The Bellman Equations – 2
  • The Bellman Equations – 3
  • Bellman Equations relations
  • The third Bellman equation
  • What have we learned?

  • Mystery game – 1
  • Mystery game – 2
  • Behaviour structures – 1
  • Behaviour structures – 2
  • Evaluating a policy
  • Evaluating a learner
  • What have we learned?

  • Temporal difference learning
  • RL context – 1
  • RL context – 2
  • TD Lambda
  • Value computation example
  • Estimating from data
  • Computing estimates incrementally
  • Properties of learning rates
  • Selecting learning rates
  • TD(1) rule
  • TD(1) example – 1
  • TD(1) example – 2
  • TD(1) example – 3
  • Why TD(1) is “Wrong”
  • TD(0) rule
  • TD(Lambda) rule
  • K-step estimators
  • K-step estimators and TD(Lambda)
  • TD(Lambda) empirical performance
  • What have we learned?

  • Convergence: TD with control
  • Bellman equations
  • Bellman equations with actions
  • Bellman operator – 1
  • Bellman operator – 2
  • Contraction mappings
  • Contraction mapping quiz
  • Contraction properties
  • The Bellman operator contracts – 1
  • The Bellman operator contracts – 2
  • Max is a non-expansion
  • Proof that max is a non-expansion – 1
  • Proof that max is a non-expansion – 2
  • Convergence – 1
  • Convergence – 2
  • Convergence theorem explained – 1
  • Convergence theorem explained – 2
  • Generalised MDPs
  • Generalised MDPs – Solutions – 1
  • Generalised MDPs – Solutions – 2
  • Generalised MDPs – Solutions – 3
  • What have you learned?

  • More on value iteration – 1
  • More on value iteration – 2
  • More on value iteration – 3
  • Linear programming – 1
  • Linear programming – 2
  • Linear programming – 3
  • Policy iteration
  • Domination
  • Why does policy iteration work?
  • B_2 is monotonic
  • Another property in policy iteration – 1 
  • Policy iteration proof
  • Another property in policy iteration – 2 
  • What have we learned?

  • Changing the reward function
  • Multiplying by a scalar
  • Adding a scalar
  • Reward shaping
  • Shaping in RL
  • Potential-based shaping in RL
  • State-based bonuses
  • Potential-based shaping – 1
  • Potential-based shaping – 2
  • Q-learning with potentials – 1
  • Q-learning with potentials – 2
  • What have we learned?

  • K-armed bandits – 1
  • K-armed bandits – 2
  • Confidence-based exploration – 1
  • Confidence-based exploration – 2
  • Metrics for bandits – 1 
  • Metrics for bandits – 2
  • Metrics for bandits – 3
  • Metrics for bandits – 4
  • Find best implies few mistakes
  • Few mistakes imply do well – 1
  • Few mistakes imply do well – 2
  • Do well implies find the best
  • Putting it together
  • Hoeffding
  • Combining arm info – 1
  • Combining arm info – 2
  • Combining arm info – 3
  • How many samples? – 1
  • How many samples? – 2
  • Exploring deterministic MDPs – 1 
  • MDP optimisation criteria
  • Exploring deterministic MDPs – 2
  • Exploring deterministic MDPs – 3
  • Rmax analysis – 1
  • Rmax analysis – 2
  • Rmax analysis – 3
  • Lower bound
  • General stochastic MDPs
  • General Rmax
  • Simulation lemma – 1
  • Simulation lemma – 2
  • Explore-or-exploit lemma
  • What have we learned?

  • Example: Taxi
  • Generalisation idea
  • Basic update rule
  • Linear value function approximation
  • Calculus
  • Does it work? – 1
  • Does it work? – 2
  • Does it work? – 3
  • Baird’s counterexample – 1
  • Baird’s counterexample – 2
  • Bad update sequence – 1
  • Bad update sequence – 2
  • Bad update sequence – 3
  • Bad update sequence – 4
  • Averagers – 1
  • Averagers – 2
  • Averagers – 3
  • Connection to MDPs
  • What have we learned? – 1
  • What have we learned? – 2

  • POMDPs
  • POMDPs generalise MDPs
  • POMDP example – 1
  • POMDP example – 2
  • State estimation – 1
  • State estimation – 2
  • Value iteration in POMDPs – 1
  • Value iteration in POMDPs – 2
  • Piecewise-linear and convex – 1
  • Piecewise-linear and convex – 2
  • Piecewise-linear and convex – 3
  • Piecewise-linear and convex – 4
  • Algorithmic approach
  • Domination
  • RL for POMDPs – 1
  • RL for POMDPs – 2
  • Learning a POMDP
  • Learning memoryless policies – 1
  • Learning memoryless policies – 2
  • Learning memoryless policies – 3
  • Bayesian RL – 1 
  • Bayesian RL – 2
  • Bayesian RL – 3
  • Predictive state representation
  • PSR example – 1
  • PSR example – 2
  • PSR theorem
  • What have we learned? – 1
  • What have we learned? – 2

  • Generalising generalising
  • What makes RL hard?
  • Temporal Abstraction – 1
  • Temporal Abstraction – 2
  • Temporal Abstraction – 3
  • Temporal abstraction options – 1
  • Temporal abstraction options – 2
  • Temporal abstraction option function – 1
  • Temporal abstraction option function – 2
  • Temporal abstraction option function – 3
  • Temporal abstraction option function – 4
  • Temporal abstraction option function – 5
  • Pac-man problems – 1
  • Pac-man problems – 2
  • Pac-man problems – 3
  • Pac-man problems – 4
  • How it comes together – 1
  • How it comes together – 2
  • Goal abstraction – 1
  • Goal abstraction – 2
  • Goal abstraction – 3
  • Goal abstraction – 4
  • Goal abstraction – 5
  • Monte Carlo tree search – 1
  • Monte Carlo tree search – 2
  • Monte Carlo tree search – 3
  • Monte Carlo tree search – 4
  • Monte Carlo tree search – 5
  • Monte Carlo tree properties – 1
  • Monte Carlo tree properties – 2
  • What have we learned? – 1
  • What have we learned? – 2

  • Scooby Dooby Doo!
  • Game theory
  • What is game theory?
  • A simple game – 1
  • A simple game – 2
  • A simple game – 3
  • Minimax
  • Fundamental result
  • Game tree – 1
  • Game tree – 2
  • Von Neumann
  • Mini poker 
  • Mini poker tree
  • Mixed strategy
  • Lines
  • Centre game
  • Snitch – 1
  • Snitch – 2
  • Snitch – 3
  • A beautiful equilibrium – 1
  • A beautiful equilibrium – 2
  • A beautiful equilibrium – 3
  • The two-step
  • 2Step2Furious
  • What have we learned?

  • The sequencing
  • Iterated prisoner’s dilemma
  • Uncertain end
  • Tit-for-tat – 1
  • Tit-for-tat – 2
  • Facing TFT
  • Finite-state strategy
  • The best response in IPD
  • Folk theorem
  • Repeated games – 1
  • Repeated games – 2
  • Minmax profile
  • Security level profile
  • Folksy theorem
  • Grim trigger
  • Implausible threats
  • TFT versus TFT
  • Pavlov
  • Pavlov vs Pavlov
  • Pavlov is subgame perfect
  • Computational folk theorem
  • Stochastic games and multiagent RL
  • Stochastic games
  • Models and stochastic games
  • Zero-sum stochastic games – 1
  • Zero-sum stochastic games – 2
  • General-sum games
  • Lots of ideas
  • What have we learned?

  • Solution concepts
  • General Tso chicken – 1
  • General Tso chicken – 2
  • General Tso chicken – 3
  • Correlated GTC – 1
  • Correlated GTC – 2
  • Correlated GTC – 3
  • Correlated facts
  • Solution concepts revisited
  • Coco values – 1
  • Coco values – 2
  • Coco definition
  • Coco example
  • Coco properties
  • Mechanism design
  • Peer teaching – 1
  • Peer teaching – 2 
  • Peer teaching – 3
  • Peer teaching – 4
  • Peer teaching – 5
  • King Solomon – 1
  • King Solomon – 2
  • King Solomon – 3
  • King Solomon – 4
  • King Solomon – 5
  • King Solomon – 6
  • King Solomon – 7
  • What have we learned? – 1
  • What have we learned? – 2

  • Coordination and communicating
  • DEC-POMDP
  • DEC-POMDP properties
  • DEC-POMDP example
  • Communicating and coaching
  • Inverse Reinforcement Learning – 1
  • Inverse Reinforcement Learning – 2 
  • Output of MLIRL
  • What have we learned (or have we?)
  • Curly, beam me up
  • What we will have learned
  • Not reward shaping
  • Policy shaping – 1
  • Policy shaping – 2
  • Policy shaping – 3
  • Policy shaping – 4
  • Policy shaping – 5
  • Policy shaping – 6
  • Policy shaping – 7
  • Multiple sources – 1
  • Multiple sources – 2
  • Multiple sources – 3
  • Drama management
  • Drama management – 2
  • Trajectories as MDPs
  • Trajectories as TTD MDPs – 1
  • Trajectories as TTD MDPs – 2 
  • What have we learned?

  • Outroduction – part 1
  • Outroduction – part 2

Instructors

Charles Isbell, Michael Littman and Chris Pryby

Georgia Tech Frequently Asked Questions (FAQs)

1: Which institute offers the Reinforcement Learning programme?

The Georgia Institute of Technology, USA, offers this online course.

2: Do I have to be familiar with programming?

Yes, you need experience with Java programming to join the Reinforcement Learning course.

3: What is the duration of the Reinforcement Learning programme?

The programme will take about four months to complete.

4: Does the Reinforcement Learning course require any registration fee?

Joining the course requires no registration or course fee.

5: Who are the instructors for the Reinforcement Learning course?

Chris Pryby, Michael Littman and Charles Isbell are the expert instructors for this online programme.
