Harry Klopf’s Hedonistic Hypothesis

Neurons will maximize the local analog of pleasure and minimize the local analog of pain, i.e. it is a RL agent.

specific hypothesis

When a neuron fires an action potential, all the contributing synapses become eligible to undergo changes in their efficacies, or weights.

If the action potential is followed within an appropriate time period by an increase in reward, an efficacies of all eligible synapses increase (or decrease in the case of punishment).

Klopf proposed an eligibility trace graph:

In RL, eligibility trace is an exponential decrease graph:

Contingent eligibility: depends on pre- and post- synaptic activity. Lead to three factor learning. Non-contingent eligibility: depends only on pre-synaptic activity. Lead to two factor learning.

In actor-critic algorithm, we can regard critic as non-contingent eligibility, and actor as contingent eligibility.