Reinforcement Learning

Lecture 4: Model-Free Prediction


Lecture 5: Model-Free Control

Video: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
  • on-policy vs. off-policy learning
  • Policy Iteration: iterate these two steps
    1. Policy evaluation
      • Evaluate the value function of the given policy π
    2. Policy Improvement
      • Update the policy using the current state s, current action a, current reward r, next state s', and next action a' -> Sarsa
  • Greedy policy improvement
  • ε-Greedy policy improvement (see the ε-greedy sketch after this list)
    • with probability 1-ε, take the greedy action
    • with probability ε, take a random action
  • GLIE: Greedy in the Limit with Infinite Exploration
    • if ε decays as 1/k at step k (fades out), the resulting ε-greedy policy sequence is GLIE
  • Sarsa
    • one-step on-policy TD control: update Q(s, a) after every step
    • on-policy: it evaluates and improves the same ε-greedy policy it follows (see the Sarsa sketch after this list)
    • Sarsa converges to the optimal action-value function under the following conditions:
      1. a GLIE sequence of policies π_t(a|s)
      2. a Robbins-Monro sequence of step sizes α_t: Σ_t α_t = ∞ and Σ_t α_t² < ∞
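
Below, a minimal sketch of the ε-greedy selection rule with the GLIE 1/k decay, assuming a tabular Q indexed as Q[state][action]; the helper names epsilon_greedy and glie_epsilon are illustrative, not from the lecture.

{{{
import numpy as np

def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Return an action chosen ε-greedily from the tabular values Q[state]."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # with probability ε: random (exploratory) action
    return int(np.argmax(Q[state]))           # with probability 1-ε: greedy action

def glie_epsilon(k):
    """GLIE schedule: ε_k = 1/k keeps exploring forever but fades out."""
    return 1.0 / k
}}}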
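
And a sketch of the full Sarsa control loop (policy evaluation and ε-greedy improvement interleaved, one TD step at a time), reusing the helpers above. It assumes a Gymnasium-style discrete environment, and uses a per-episode 1/k step size as a simple stand-in for a Robbins-Monro schedule; in practice the step size is usually tied to per-(s, a) visit counts.

{{{
from collections import defaultdict
import numpy as np

def sarsa(env, n_episodes, gamma=0.99, seed=0):
    """Tabular Sarsa: evaluate and improve the ε-greedy policy it follows."""
    rng = np.random.default_rng(seed)
    n_actions = env.action_space.n
    Q = defaultdict(lambda: np.zeros(n_actions))   # Q[s] -> value of each action in s

    for k in range(1, n_episodes + 1):
        eps = glie_epsilon(k)   # GLIE: ε_k = 1/k
        alpha = 1.0 / k         # Robbins-Monro-style: Σα = ∞, Σα² < ∞
        s, _ = env.reset()
        a = epsilon_greedy(Q, s, n_actions, eps, rng)
        done = False
        while not done:
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            a2 = epsilon_greedy(Q, s2, n_actions, eps, rng)
            # one-step TD update from the (s, a, r, s', a') tuple
            td_target = r + (0.0 if done else gamma * Q[s2][a2])
            Q[s][a] += alpha * (td_target - Q[s][a])
            s, a = s2, a2
    return Q
}}}

For example, Q = sarsa(gymnasium.make("FrozenLake-v1"), n_episodes=50000) learns a tabular Q whose greedy action in state s is np.argmax(Q[s]).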