Lecture 5: Model-Free Control
Video: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
- on-policy vs off-policy
 
- Policy Iteration: iterate these two steps
 - Policy evaluation: evaluate the value function under the given policy π
 - Policy improvement: update the policy using the current state s, current action a, current reward r, next state s', and next action a'
- Policy evaluation
- Greedy policy improvement
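The greedy improvement step above can be sketched in a few lines. A minimal Python sketch, assuming the Q-table is a plain dict keyed by `(state, action)`; the state and action names are hypothetical, not from the lecture:

```python
def greedy_improvement(Q, states, actions):
    """Greedy policy improvement: pi'(s) = argmax_a Q(s, a)."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}

# Hypothetical Q-table for a two-state, two-action problem.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.5,
     ("s1", "left"): 0.9, ("s1", "right"): 0.1}
pi = greedy_improvement(Q, ["s0", "s1"], ["left", "right"])
print(pi)  # {'s0': 'right', 's1': 'left'}
```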
 
- ε-Greedy policy improvement
 - greedy action with probability 1-ε
 - random action with probability ε
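The two-case rule above can be written directly. A minimal Python sketch, again assuming a dict-based Q-table with hypothetical state/action names:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit

# Hypothetical Q-table; with epsilon=0.1 this is usually "right", sometimes random.
Q = {("s0", "left"): 0.1, ("s0", "right"): 0.7}
action = epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.1)
```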
- GLIE: Greedy in the Limit with Infinite Exploration
 - if ε gradually decays (fades out), e.g. ε = 1/k at step k, the resulting policy sequence is GLIE
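The 1/k schedule mentioned above is easy to illustrate; a small sketch (the schedule shape is from the lecture, the helper name is mine):

```python
def glie_epsilon(k):
    """GLIE-style schedule: epsilon_k = 1/k for episode k >= 1.

    epsilon_k -> 0, so the policy becomes greedy in the limit, while
    the series sum of 1/k diverges, so exploration never fully stops.
    """
    return 1.0 / k

print([glie_epsilon(k) for k in (1, 2, 4, 10)])  # [1.0, 0.5, 0.25, 0.1]
```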
- Sarsa
 - one-step TD update of Q(s, a)? (on-policy)
 - Sarsa converges under the following conditions:
  - GLIE sequence of policies
  - Robbins-Monro sequence of step sizes
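The one-step Sarsa update uses the tuple (s, a, r, s', a') that gives the algorithm its name: Q(s,a) ← Q(s,a) + α(r + γQ(s',a') − Q(s,a)). A minimal sketch, assuming a defaultdict Q-table initialized to zero and hypothetical states/actions:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One-step Sarsa: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

Q = defaultdict(float)          # unseen (state, action) pairs start at 0.0
sarsa_update(Q, "s0", "right", 1.0, "s1", "left")
print(Q[("s0", "right")])       # 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```

Note that a' is the action actually chosen by the current (e.g. ε-greedy) policy in s', which is what makes Sarsa on-policy.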













