== Reinforcement Learning ==
=== Lecture 4: Model Free Prediction ===
=== Lecture 5: Model Free Control ===
Video link: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
 * On-policy vs. off-policy
 * Policy iteration: iterate these two steps
  1. Policy evaluation
   * Evaluate the value function for the given policy π
  1. Policy improvement
   * Update using the current state s, current action a, and current reward r, together with the next state s' and next action a' -> Sarsa
   * Greedy policy improvement
   * ε-greedy policy improvement
    * With probability 1-ε, take the greedy action
    * With probability ε, take a random action
 * Sarsa
  * One-step TD update of the policy?
  * On-policy
  * Sarsa converges under the following conditions
   1. GLIE (Greedy in the Limit with Infinite Exploration) sequence of policies
   1. Robbins-Monro sequence of step sizes
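The Sarsa notes above can be illustrated with a minimal tabular sketch. It applies the update Q(s,a) ← Q(s,a) + α (r + γ Q(s',a') − Q(s,a)) using ε-greedy action selection. Everything not in the notes is an assumption made for illustration: the five-state corridor environment (`corridor_step`), the schedule ε = 1/k as one possible GLIE choice, the step size α = 1/N(s,a) as one possible Robbins-Monro choice, the discount γ = 0.9, and the `max_steps` safety cap.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """With probability epsilon take a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def corridor_step(state, action, n_states=5):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Entering the rightmost state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def sarsa(n_episodes=200, n_states=5, n_actions=2, gamma=0.9,
          max_steps=1000, seed=0):
    """Tabular Sarsa; returns the learned action-value table Q."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
    visits = {key: 0 for key in q}

    for k in range(1, n_episodes + 1):
        epsilon = 1.0 / k  # assumed GLIE schedule: exploration decays to zero
        state = 0
        action = epsilon_greedy(q, state, n_actions, epsilon, rng)

        for _ in range(max_steps):  # safety cap on episode length
            next_state, reward, done = corridor_step(state, action, n_states)
            # On-policy: a' is drawn from the same epsilon-greedy policy.
            next_action = epsilon_greedy(q, next_state, n_actions, epsilon, rng)

            visits[(state, action)] += 1
            alpha = 1.0 / visits[(state, action)]  # assumed Robbins-Monro step size

            # Terminal states contribute no bootstrap value.
            target = reward + (0.0 if done else gamma * q[(next_state, next_action)])
            q[(state, action)] += alpha * (target - q[(state, action)])

            if done:
                break
            state, action = next_state, next_action

    return q


if __name__ == "__main__":
    learned_q = sarsa()
    for s in range(4):
        print(s, [round(learned_q[(s, a)], 3) for a in range(2)])
```

The target uses the action a' that the agent will actually take next, which is what makes this on-policy. Replacing Q(s',a') with the maximum over actions in s' would turn it into an off-policy update instead.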