== Reinforcement Learning ==

=== Lecture 4: Model Free Prediction ===

=== Lecture 5: Model Free Control ===

Video URL: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s

 * On-policy vs. off-policy
 * ε-greedy
 * Policy iteration
   * Iterate these two steps:
     1. Policy evaluation
     1. Policy improvement
 * Sarsa
   * One-step TD policy update?
   * On-policy
   * Sarsa converges under the following conditions:
     1. GLIE (Greedy in the Limit with Infinite Exploration) sequence of policies
     1. Robbins-Monro sequence of step sizes α_t (Σ α_t = ∞ and Σ α_t² < ∞)
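
The Sarsa items above can be sketched as a small tabular example. This is only an illustration under stated assumptions, not code from the lecture: the five-state corridor environment, the function names (`step`, `epsilon_greedy`, `sarsa`), and the `max_steps` safety cap are all hypothetical. The schedule ε_k = 1/k is one common way to get a GLIE sequence of policies, and α = 1/N(s, a) is one step-size choice that satisfies the Robbins-Monro conditions.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic dynamics (an assumption): reward -1 per move."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, -1.0, done


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties between equally good actions at random.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def sarsa(episodes=200, gamma=1.0, max_steps=10_000, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    visits = [[0 for _ in ACTIONS] for _ in range(N_STATES)]
    for k in range(1, episodes + 1):
        epsilon = 1.0 / k  # GLIE schedule: exploration decays to zero
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(max_steps):  # safety cap, not part of Sarsa itself
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            visits[state][action] += 1
            alpha = 1.0 / visits[state][action]  # Robbins-Monro step size
            # On-policy target: uses the action the policy actually picked.
            target = reward + (0.0 if done else gamma * q[next_state][next_action])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q = sarsa()
    for s in range(N_STATES - 1):
        print(s, q[s], "greedy:", "left" if q[s][0] > q[s][1] else "right")
```

The update is on-policy because the target uses `next_action`, the action the ε-greedy policy actually selected, and that same action is then executed on the next step. Evaluation (the Q update) and improvement (acting ε-greedily on the new Q) alternate at every time step, which is the policy iteration loop from the notes at its finest granularity.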