== Reinforcement Learning ==
=== Lecture 4: Model Free Prediction ===
=== Lecture 5: Model Free Control ===
Video: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
 * on-policy vs. off-policy
 * Policy Iteration: iterate these two steps
  1. Policy evaluation
   * evaluate the value function of the given policy π
  1. Policy improvement
   * update the policy using the current state s, current action a, and reward r, then the next state s' and next action a'; the initials S, A, R, S', A' give Sarsa its name
   * Greedy policy improvement
   * ε-greedy policy improvement (sketch below)
    * with probability 1-ε, take the greedy action
    * with probability ε, take a random action
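A minimal sketch of ε-greedy action selection over a tabular Q function; the `epsilon_greedy` name and the (n_states, n_actions) array shape are illustrative assumptions, not from the lecture:
{{{
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng=None):
    """Q is assumed to be a (n_states, n_actions) NumPy array."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:                # with probability ε: explore,
        return int(rng.integers(Q.shape[1]))  # uniform random action
    return int(np.argmax(Q[state]))           # with probability 1-ε: greedy action
}}}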
 * GLIE: Greedy in the Limit with Infinite Exploration
  * if ε decays as 1/k at step k (fading out), the policy sequence is GLIE
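For instance, decaying ε over episodes (a sketch; `num_episodes` and the episode index k are illustrative):
{{{
for k in range(1, num_episodes + 1):
    epsilon = 1.0 / k  # exploration never stops (Σ 1/k = ∞), yet ε → 0,
                       # so the policy becomes greedy in the limit: the GLIE property
}}}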
 * Sarsa
  * a one-step, on-policy TD update: Q(s,a) ← Q(s,a) + α(r + γQ(s',a') - Q(s,a))
  * Sarsa converges under the following conditions (sketch below):
   1. a GLIE sequence of policies
   1. a Robbins-Monro sequence of step sizes: Σ α_t = ∞ and Σ α_t² < ∞
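Putting the pieces together, a minimal tabular Sarsa sketch reusing the `epsilon_greedy` helper above; the env interface (`reset() -> s`, `step(a) -> (s2, r, done)`) is an assumption, and α = 1/N(s,a) is one Robbins-Monro choice:
{{{
import numpy as np

def sarsa(env, n_states, n_actions, episodes=1000, gamma=0.99):
    """Tabular Sarsa: on-policy, one-step TD control.
    Assumes env.reset() -> s and env.step(a) -> (s2, r, done)."""
    Q = np.zeros((n_states, n_actions))
    N = np.zeros((n_states, n_actions))    # visit counts per (s, a)
    for k in range(1, episodes + 1):
        epsilon = 1.0 / k                  # GLIE schedule from the note above
        s = env.reset()
        a = epsilon_greedy(Q, s, epsilon)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, epsilon)
            N[s, a] += 1
            alpha = 1.0 / N[s, a]          # Robbins-Monro: Σ α = ∞, Σ α² < ∞
            target = r if done else r + gamma * Q[s2, a2]
            Q[s, a] += alpha * (target - Q[s, a])  # one-step Sarsa update
            s, a = s2, a2
    return Q
}}}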