REINFORCEMENT LEARNING SUBJECT

Subject: Reinforcement Learning
Lectures: 7 (Intro to RL → Exploration and Exploitation)
Status: Complete

OVERVIEW

Covers the foundations of reinforcement learning: Markov Decision Processes, Dynamic Programming, Model-Free methods (Monte-Carlo, TD Learning), Q-Learning, Sarsa, Value Function Approximation, and Exploration strategies.

STRUCTURE

reinforcement-learning/
├── README.md                          # Subject index (all 7 lectures combined)
├── lecture-1.md through lecture-7.md  # Individual lecture files
└── assets/                            # Lecture images (hashed filenames)

WHERE TO LOOK

| Lecture | Topic | Key Concepts |
|---|---|---|
| Lecture 1 | Intro to RL | Agent, Environment, Policy, Value Function, Model, Exploration vs Exploitation |
| Lecture 2 | MDP | Markov Process, Bellman Equation, Optimal Value Function, Optimal Policy |
| Lecture 3 | DP | Policy Evaluation, Policy Iteration, Value Iteration, Gridworld Example |
| Lecture 4 | Model-Free Prediction | Monte-Carlo, Temporal-Difference (TD), TD(λ), Bootstrapping |
| Lecture 5 | Model-Free Control | Q-Learning, Sarsa, On-Policy vs Off-Policy, Importance Sampling |
| Lecture 6 | Value Approximation | Incremental Methods, Batch Methods, Gradient Descent, Deep RL foundations |
| Lecture 7 | Exploration | Multi-Armed Bandits, Regret, ε-greedy, UCB, Thompson Sampling |

CONVENTIONS

File Naming:

  • Lectures numbered sequentially: lecture-1.md, lecture-2.md, etc.
  • No zero-padding used (e.g., lecture-7.md, not lecture-07.md)
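The naming rule above can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper names `is_valid_lecture_name` and `lecture_number` are hypothetical, and the pattern assumes that `lecture-N.md` with a positive, unpadded integer is the only allowed form.

```python
import re

# Assumed pattern: "lecture-" + positive integer without leading zeros + ".md".
LECTURE_NAME = re.compile(r"lecture-([1-9][0-9]*)\.md")


def is_valid_lecture_name(name: str) -> bool:
    """Return True if name follows the lecture-N.md convention (no zero-padding)."""
    return LECTURE_NAME.fullmatch(name) is not None


def lecture_number(name: str) -> int:
    """Extract N from lecture-N.md; raise ValueError for non-conforming names."""
    match = LECTURE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a lecture file name: {name!r}")
    return int(match.group(1))
```

Sorting files with `sorted(names, key=lecture_number)` keeps them in numeric order, which matters once a subject grows past nine lectures because plain string sorting would place `lecture-10.md` before `lecture-2.md`.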

Image References:

  • Paths use absolute references from docs root: /assets/reinforcement-learning/[filename].[ext]
  • Images have SHA256-style hashed filenames, apparently Base64-encoded (e.g., lksIJhrPs0l3bx6melO3GRytcEX52dXEGmw8ITx99nw=.png)
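A small check for the image path convention might look like the sketch below. The function name `is_valid_image_path` is hypothetical, and the rule that images sit directly under the asset root (no nested folders) is an assumption, not something this document states.

```python
import re

# Root taken from the convention above.
ASSET_ROOT = "/assets/reinforcement-learning/"

# Assumption: "<name>.<ext>" directly under the root, with no subdirectories.
FILENAME = re.compile(r"[^/]+\.[A-Za-z0-9]+")


def is_valid_image_path(path: str) -> bool:
    """Return True if path is absolute from the docs root and names a file under ASSET_ROOT."""
    if not path.startswith(ASSET_ROOT):
        return False
    filename = path[len(ASSET_ROOT):]
    return FILENAME.fullmatch(filename) is not None
```

Relative references such as `assets/reinforcement-learning/x.png` or `../assets/x.png` are rejected by this check, which is the main failure mode the absolute-path rule is meant to prevent.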

Content Organization:

  • README.md contains ALL lectures sequentially (main reading flow)
  • Individual lecture-N.md files exist for sidebar navigation
  • Keep both versions in sync: if you update README.md, update the corresponding lecture-N.md file as well
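Drift between README.md and the individual lecture files can be detected with a script. The sketch below rests on a strong assumption that may not hold for this repository: each lecture-N.md body appears verbatim (ignoring trailing whitespace) inside README.md. The function names are hypothetical.

```python
from pathlib import Path


def normalize(text: str) -> str:
    """Strip trailing whitespace per line so cosmetic differences are ignored."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())


def out_of_sync(readme_text: str, lectures: dict[str, str]) -> list[str]:
    """Return the names of lecture files whose body is not found verbatim in the README."""
    readme = normalize(readme_text)
    return sorted(
        name for name, body in lectures.items() if normalize(body) not in readme
    )


def check_subject(subject_dir: str) -> list[str]:
    """Load README.md and lecture-*.md from subject_dir and report drifted files."""
    root = Path(subject_dir)
    lectures = {
        p.name: p.read_text(encoding="utf-8") for p in root.glob("lecture-*.md")
    }
    readme_text = (root / "README.md").read_text(encoding="utf-8")
    return out_of_sync(readme_text, lectures)
```

Calling `check_subject("reinforcement-learning")` from the docs root would return an empty list when every lecture file matches its README counterpart. If the README adjusts heading levels or adds separators when combining lectures, the verbatim comparison needs to be relaxed accordingly.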

ANTI-PATTERNS (THIS SUBJECT)

None documented.

NOTES

Focus Areas (from subject header):

  • Dynamic Programming: Priority
  • Q-Learning: Priority
  • Model-Free: Priority
  • Value-Based: Priority (discrete spaces)
  • Policy-Based: Can be skipped

Math Syntax: Use KaTeX with $...$ for inline or $$...$$ for block equations.
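As an illustration of the two delimiters, the snippet below shows one inline expression and one block equation. The Q-learning update is used only as a familiar example in its standard textbook form; the symbols ($\alpha$ for step size, $\gamma$ for discount factor) are an assumption and may differ from the notation in the lecture files.

```latex
% Inline: wrap the expression in single dollar signs.
The discount factor $\gamma \in [0, 1]$ weights future rewards.

% Block: wrap the expression in double dollar signs on its own lines.
$$
Q(S, A) \leftarrow Q(S, A) + \alpha \left( R + \gamma \max_{a'} Q(S', a') - Q(S, A) \right)
$$
```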