Difference between r1.6 and the current
@@ -1,8 +1,17 @@
[[pagelist(^(머신러닝스터디/2017/))]]
* Reinforcement learning, game play
=== Reinforcement Learning ===
https://en.wikipedia.org/wiki/Bellman_equation
Planning vs Learning
Planning
* Requires a known model
* Dynamic Programming
Learning
* Model free
* Monte Carlo method, Temporal Difference learning
==== Monte-Carlo Reinforcement Learning ====
* Learns directly from experience
- CNN, Artistic style
- Reinforcement learning, game play
Reinforcement Learning
https://en.wikipedia.org/wiki/Bellman_equation
Planning vs Learning
Planning
- Requires a known model
- Dynamic Programming
Learning
- Model-free
- Monte Carlo method, Temporal Difference learning
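To make the planning side concrete: when the model (transition probabilities and rewards) is known, dynamic programming can apply the Bellman optimality backup directly, with no sampled experience. The sketch below runs value iteration on a tiny two-state MDP. The MDP itself and the names `P`, `value_iteration`, `gamma` and `tol` are assumptions made up for illustration and do not come from these notes.

```python
# Hypothetical 2-state, 2-action MDP (invented for illustration).
# P[s][a] is a list of (probability, next_state, reward) triples.
# State 1 is absorbing with zero reward, so it behaves like a terminal state.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 0.0)], 1: [(1.0, 1, 0.0)]},
}


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Expected one-step return of each action, then take the best one.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


V = value_iteration(P)
print(V)
```

The learning methods in these notes (Monte Carlo, Temporal Difference) estimate the same kind of value function, but from sampled experience instead of from `P`.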
Monte-Carlo Reinforcement Learning
- Learns directly from experience
- Model-free: no knowledge of the MDP transitions or rewards is required
- Learns from complete (terminated) episodes
- Applies only to episodic MDPs
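A minimal sketch of the points above, assuming first-visit Monte Carlo prediction: each state's value is estimated as the average return observed after the first visit to that state in each complete episode. The episode format (a list of `(state, reward)` pairs, where `reward` is received on leaving `state`), the sample data, and the name `mc_first_visit` are assumptions for illustration only.

```python
from collections import defaultdict


def mc_first_visit(episodes, gamma=1.0):
    """Estimate V(s) as the mean first-visit return over complete episodes."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Scan backwards to compute the return G_t that follows every step.
        G = 0.0
        returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()
        # Count only the first visit to each state within this episode.
        seen = set()
        for state, G in returns:
            if state in seen:
                continue
            seen.add(state)
            returns_sum[state] += G
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Two made-up episodes, both of which terminate.
sample_episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("B", 0.0)],
]
mc_values = mc_first_visit(sample_episodes)
print(mc_values)
```

The return can only be computed once an episode has ended, which is why this approach is limited to episodic problems.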
Temporal-Difference Learning
- Learns from experience
- Model-free
- Can also learn from incomplete episodes (bootstrapping)
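A minimal sketch of bootstrapping, assuming the simplest TD method, TD(0): after a single transition the value of the current state is moved toward `reward + gamma * V(next_state)`, so an update can happen before the episode ends. The function name `td0_update`, the step size `alpha`, and the sample values are assumptions for illustration only.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update to the value table V, in place, and return it."""
    # Bootstrapped target: observed reward plus the current estimate for the
    # next state (taken as 0 when the transition ends the episode).
    next_value = 0.0 if terminal else V.get(next_state, 0.0)
    target = reward + gamma * next_value
    current = V.get(state, 0.0)
    V[state] = current + alpha * (target - current)
    return V


# One made-up transition A -> B with reward 0, using existing estimates.
td_values = {"A": 0.0, "B": 0.5}
td0_update(td_values, "A", 0.0, "B", alpha=0.1)
print(td_values)
```

Unlike the Monte Carlo approach, the target here relies on the current estimate of the next state's value instead of the full observed return.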