Reinforcement Learning: an Introduction

Before we delve into solving MDPs with dynamic programming, let’s review the key concepts of an MDP!

Markov Decision Process
We can use an MDP to describe our problem. It consists of states, actions, and rewards.

Markov Property
The next state and reward depend only on the current state and action, not on the rest of the history:
$$P(s_t, r_t \mid s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_t, r_t \mid s_{t-1}, a_{t-1})$$

Reward Hypothesis
What we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (reward)...
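To make these ideas concrete, here is a minimal Python sketch built on a small, entirely hypothetical MDP. The states `"low"`/`"high"`, the actions `"wait"`/`"work"`, the table `P`, and the helpers `step`, `rollout`, and `cumulative_reward` are all made up for illustration. The Markov property is built into the design: `step` samples the next state and reward from `P[(state, action)]` alone, so no earlier history is needed. `cumulative_reward` computes the sum of rewards for one sampled trajectory. The reward hypothesis concerns the *expected* value of this sum, which you could estimate by averaging over many rollouts.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy MDP: the states, actions, and reward values are made up for illustration.
# P[(state, action)] lists the possible outcomes as (probability, next_state, reward).
# Outcomes depend only on (state, action), which is exactly the Markov property.
Transition = Tuple[float, str, float]
P: Dict[Tuple[str, str], List[Transition]] = {
    ("low", "wait"): [(1.0, "low", 0.0)],
    ("low", "work"): [(0.7, "high", 1.0), (0.3, "low", 0.0)],
    ("high", "wait"): [(0.8, "high", 1.0), (0.2, "low", 1.0)],
    ("high", "work"): [(1.0, "high", 2.0)],
}


def step(state: str, action: str, rng: random.Random) -> Tuple[str, float]:
    """Sample (next_state, reward) using only the current state and action."""
    outcomes = P[(state, action)]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def cumulative_reward(rewards: List[float], gamma: float = 1.0) -> float:
    """Return sum_k gamma**k * r_k; gamma = 1.0 gives the plain cumulative sum."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


def rollout(start: str, policy: Dict[str, str], horizon: int, seed: int = 0) -> List[float]:
    """Follow a fixed deterministic policy for `horizon` steps and collect the rewards."""
    rng = random.Random(seed)
    state, rewards = start, []
    for _ in range(horizon):
        state, reward = step(state, policy[state], rng)
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    # Sanity check: every (state, action) outcome distribution must sum to 1.
    for key, outcomes in P.items():
        assert abs(sum(p for p, _, _ in outcomes) - 1.0) < 1e-9, key
    rewards = rollout("low", {"low": "work", "high": "work"}, horizon=5)
    print(rewards, cumulative_reward(rewards))
```

The `gamma` parameter is an optional extra that is not part of the text above. Leaving it at `1.0` gives the undiscounted cumulative sum the reward hypothesis refers to.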


How to make decisions in a bandit game?

Dynamic Programming for MDP