Appearance
REINFORCEMENT LEARNING SUBJECT β
Subject: Reinforcement Learning Lectures: 7 (Intro to RL β Exploration and Exploitation) Status: Complete
OVERVIEW β
Foundations of reinforcement learning including Markov Decision Processes, Dynamic Programming, Model-Free methods (Monte-Carlo, TD Learning), Q-Learning, Sarsa, Value Function Approximation, and Exploration strategies.
STRUCTURE β
reinforcement-learning/
βββ README.md # Subject index (all 7 lectures combined)
βββ lecture-1.md through lecture-7.md # Individual lecture files
βββ assets/ # Lecture images (hashed filenames)WHERE TO LOOK β
| Lecture | Topic | Key Concepts |
|---|---|---|
| Lecture 1 | Intro to RL | Agent, Environment, Policy, Value Function, Model, Exploration vs Exploitation |
| Lecture 2 | MDP | Markov Process, Bellman Equation, Optimal Value Function, Optimal Policy |
| Lecture 3 | DP | Policy Evaluation, Policy Iteration, Value Iteration, Gridworld Example |
| Lecture 4 | Model-Free Prediction | Monte-Carlo, Temporal-Difference (TD), TD(Ξ»), Bootstrapping |
| Lecture 5 | Model-Free Control | Q-Learning, Sarsa, On-Policy vs Off-Policy, Importance Sampling |
| Lecture 6 | Value Approximation | Incremental Methods, Batch Methods, Gradient Descent, Deep RL foundations |
| Lecture 7 | Exploration | Multi-Armed Bandits, Regret, Ξ΅-greedy, UCB, Thompson Sampling |
CONVENTIONS β
File Naming:
- Lectures numbered sequentially:
lecture-1.md,lecture-2.md, etc. - No zero-padding used (e.g.,
lecture-7.md, notlecture-07.md)
Image References:
- Paths use absolute references from docs root:
/assets/reinforcement-learning/[filename].[ext] - Images have SHA256-style hashed filenames (e.g.,
lksIJhrPs0l3bx6melO3GRytcEX52dXEGmw8ITx99nw=.png)
Content Organization:
README.mdcontains ALL lectures sequentially (main reading flow)- Individual
lecture-N.mdfiles exist for sidebar navigation - Both versions kept in sync - if you update
README.md, update corresponding lecture file
ANTI-PATTERNS (THIS SUBJECT) β
None documented.
NOTES β
Focus Areas (from subject header):
- Dynamic Programming β Priority
- Q Learning β Priority
- Model Free β Priority
- Value Based β Priority (discrete spaces)
- Policy Based β Can be skipped
Math Syntax: Use KaTeX with $...$ for inline or $$...$$ for block equations.