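To make better use of collected data, we can use experience replay: each transition is stored and reused for several updates instead of being discarded after one, which increases data efficiency.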

Experience replay

We store $(s, a, s', r)$ tuples in a replay buffer and update $Q$ using minibatches sampled from it. An update computed from a single transition is noisy, so we average the update over several samples to reduce that noise. (That's why we use minibatches.)
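A minimal sketch in Python of the buffer and a minibatch update follows. It makes several assumptions. $Q$ is tabular, stored as a list of per-state lists of action values. The task is continuing, because the $(s, a, s', r)$ tuple carries no terminal flag. The names `ReplayBuffer` and `minibatch_q_update` are hypothetical. The update is one semi-gradient step on the mean squared TD error over the batch.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size FIFO buffer of (s, a, s_next, r) transitions."""

    def __init__(self, capacity):
        # Once capacity is reached, appending drops the oldest transition.
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, s_next, r):
        self.buffer.append((s, a, s_next, r))

    def sample(self, batch_size):
        # Uniform sampling (without replacement) also breaks up the
        # correlation between consecutively collected transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def minibatch_q_update(Q, batch, lr=0.1, gamma=0.99):
    """One semi-gradient step on the mean squared TD error over `batch`.

    Q[s][a] is a tabular action value. Assumes a continuing task
    (no terminal flag), so every target bootstraps from max_a' Q[s'][a'].
    """
    # Sum TD errors per (s, a); all targets use the Q from before this step.
    td_sum = {}
    for s, a, s_next, r in batch:
        target = r + gamma * max(Q[s_next])
        td_sum[(s, a)] = td_sum.get((s, a), 0.0) + (target - Q[s][a])
    # Dividing by the batch size averages the update over the samples.
    for (s, a), total in td_sum.items():
        Q[s][a] += lr * total / len(batch)
    return Q
```

In a training loop, one would `add` each transition as it is collected. Once the buffer holds at least `batch_size` items, one would call `minibatch_q_update(Q, buf.sample(batch_size))`. Dividing the summed TD errors by `len(batch)` is what turns individual noisy updates into the averaged update.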

Infer-Collect framework