== Reinforcement Learning ==
=== Lecture 4: Model-Free Prediction ===

=== Lecture 5: Model-Free Control ===
Video URL: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
 * On-policy vs. off-policy
 * Policy iteration: alternate these two steps
  1. Policy evaluation
   * Evaluate the value function for the given policy π
  1. Policy improvement
   * Update the policy using the current state s, current action a, reward r, next state s', and next action a' -> Sarsa
 * Greedy policy improvement
 * ε-Greedy policy improvement
  * Take the greedy action with probability 1-ε
  * Take a random action with probability ε
 * GLIE: Greedy in the Limit with Infinite Exploration
  * If ε decays (fades out) as 1/k at step k, the ε-greedy policy is GLIE
 * Sarsa
  * One-step TD update of Q(s, a), with the policy improved at every step
  * On-policy
  * Sarsa converges to the optimal action-value function under the following conditions
   1. GLIE sequence of policies
   1. Robbins-Monro sequence of step sizes α_t (Σ α_t = ∞ and Σ α_t² < ∞)
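
The pieces above (ε-greedy improvement, a GLIE schedule, and the Sarsa update) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the 5-state corridor environment, the `step` function, the reward of -1 per move, and all names in the code are hypothetical and not taken from the lecture. It uses ε_k = 1/k per episode as the GLIE schedule and α = 1/N(s, a) as one example of a step-size sequence that satisfies the Robbins-Monro conditions.

```python
import random

# Hypothetical toy environment (an assumption for illustration, not from the lecture):
# a corridor of states 0..4, where state 4 is terminal and every move costs -1.
N_STATES = 5
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    return nxt, -1.0, nxt == N_STATES - 1


def glie_epsilon(k):
    """epsilon_k = 1/k: exploration fades out as the episode index k grows."""
    return 1.0 / k


def epsilon_greedy(q, state, eps, rng):
    """Random action with probability eps, greedy action with probability 1 - eps."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def sarsa(episodes=5000, gamma=1.0, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q(s, a), terminal row stays 0
    visits = [[0, 0] for _ in range(N_STATES)]   # N(s, a), used for the step size
    for k in range(1, episodes + 1):
        eps = glie_epsilon(k)
        s = 0
        a = epsilon_greedy(q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(q, s2, eps, rng)  # on-policy: a' comes from the same policy
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]            # one Robbins-Monro step-size choice
            target = r if done else r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])  # Sarsa update from (s, a, r, s', a')
            s, a = s2, a2
    return q


q = sarsa()
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Because the action a' used in the target is the action the agent actually takes next, the update evaluates the same ε-greedy policy that generates the data, which is what makes Sarsa on-policy. A constant step size is also common in practice, but it does not satisfy the Robbins-Monro conditions.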