== Reinforcement Learning ==
=== Lecture 4: Model Free Prediction ===
=== Lecture 5: Model Free Control ===
Video: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
 * on-policy vs. off-policy
 * Policy Iteration: iterate these two steps
  1. Policy evaluation
   * evaluate the value function of the given policy π
  1. Policy improvement
   * update the policy using the current state s, current action a, and reward r, then the next state s' and next action a'; the initials S, A, R, S', A' give Sarsa its name
   * Greedy policy improvement
   * ε-greedy policy improvement (sketch below)
    * with probability 1-ε, take the greedy action
    * with probability ε, take a random action
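A minimal sketch of ε-greedy action selection over a tabular Q function; the `epsilon_greedy` name and the (n_states, n_actions) array shape are illustrative assumptions, not from the lecture:
{{{
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng=None):
    """Q is assumed to be a (n_states, n_actions) NumPy array."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:                # with probability ε: explore,
        return int(rng.integers(Q.shape[1]))  # uniform random action
    return int(np.argmax(Q[state]))           # with probability 1-ε: greedy action
}}}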
 * GLIE: Greedy in the Limit with Infinite Exploration
  * if ε decays as 1/k at step k (fading out), the policy sequence is GLIE
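For instance, decaying ε over episodes (a sketch; `num_episodes` and the episode index k are illustrative):
{{{
for k in range(1, num_episodes + 1):
    epsilon = 1.0 / k  # exploration never stops (Σ 1/k = ∞), yet ε → 0,
                       # so the policy becomes greedy in the limit: the GLIE property
}}}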
 * Sarsa
  * a one-step, on-policy TD update: Q(s,a) ← Q(s,a) + α(r + γQ(s',a') - Q(s,a))
  * Sarsa converges under the following conditions (sketch below):
   1. a GLIE sequence of policies
   1. a Robbins-Monro sequence of step sizes: Σ α_t = ∞ and Σ α_t² < ∞
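Putting the pieces together, a minimal tabular Sarsa sketch reusing the `epsilon_greedy` helper above; the env interface (`reset() -> s`, `step(a) -> (s2, r, done)`) is an assumption, and α = 1/N(s,a) is one Robbins-Monro choice:
{{{
import numpy as np

def sarsa(env, n_states, n_actions, episodes=1000, gamma=0.99):
    """Tabular Sarsa: on-policy, one-step TD control.
    Assumes env.reset() -> s and env.step(a) -> (s2, r, done)."""
    Q = np.zeros((n_states, n_actions))
    N = np.zeros((n_states, n_actions))    # visit counts per (s, a)
    for k in range(1, episodes + 1):
        epsilon = 1.0 / k                  # GLIE schedule from the note above
        s = env.reset()
        a = epsilon_greedy(Q, s, epsilon)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, epsilon)
            N[s, a] += 1
            alpha = 1.0 / N[s, a]          # Robbins-Monro: Σ α = ∞, Σ α² < ∞
            target = r if done else r + gamma * Q[s2, a2]
            Q[s, a] += alpha * (target - Q[s, a])  # one-step Sarsa update
            s, a = s2, a2
    return Q
}}}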