(The term reward is used here in a neutral fashion and does not imply any pleasure, hedonic impact, or other psychological interpretation.) In general, we follow Marr's approach (Marr et al 1982, later re-introduced by Gurney et al 2004) by introducing different levels: the algorithmic, the mechanistic, and the implementation level.
The best-studied case is when RL can be formulated as a class of Markov Decision Problems (MDP).
If the model of the process (the transition function T and the reward function r) is not known in advance, then we are truly in the domain of RL, where the optimal value function and/or the optimal policy has to be learned by an adaptive process.
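To make the model-free setting concrete, here is a minimal sketch of learning from sampled transitions alone. It uses tabular Q-learning as one well-known example; the text above does not name a specific algorithm at this point, so that choice is an assumption. The two-state environment `toy_step`, the learning rate `alpha`, the discount factor `gamma`, and the exploration rate `epsilon` are likewise hypothetical and chosen only for illustration.

```python
import random


def q_learning(step, n_states, n_actions, n_steps=20_000,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state, reward)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration: mostly exploit, sometimes act randomly.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[s][b])
        s_next, reward = step(s, a)
        # Move Q(s, a) toward the sampled one-step target.
        target = reward + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


def toy_step(s, a):
    """Hypothetical two-state environment; the learner never sees T or r directly."""
    if s == 0:
        return (1, 1.0) if a == 1 else (0, 0.0)
    return (1, 2.0) if a == 0 else (0, 0.0)


Q = q_learning(toy_step, n_states=2, n_actions=2)
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

The learner only ever calls `toy_step`, so T and r stay hidden from it. With these settings the greedy policy derived from `Q` is expected to choose action 1 in state 0 and action 0 in state 1.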
The most influential algorithms will be described below. Early on, we note that the state-action space formalism used in reinforcement learning (RL) can also be translated into an equivalent neuronal network formalism, as will be discussed below.
Among neuroscientists, reinforcement learning (RL) algorithms are often seen as a realistic alternative: neurons can randomly introduce change, and use unspecific feedback signals to observe their effect on the cost and thus approximate their gradient. Each neuron uses an RL-type strategy to learn how to approximate the gradients that backpropagation would provide; in this way it learns to learn.
However, the convergence rate of such learning scales poorly with the number of involved neurons. We provide proof that our approach converges to the true gradient for certain classes of networks.
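The basic perturbation idea described above (inject random changes, observe a scalar feedback signal, and correlate the two) can be sketched as follows. This is not the approach the text refers to as "our approach"; it only illustrates the plain perturbation-based estimate. The quadratic `cost`, the Gaussian noise, the noise scale `sigma`, and all function names are assumptions made for the example.

```python
import random


def cost(w, target):
    """Quadratic cost; stands in for any scalar feedback signal."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def perturbation_gradient(w, target, n_samples=50_000, sigma=0.01, seed=0):
    """Estimate d(cost)/dw from random perturbations and scalar feedback only."""
    rng = random.Random(seed)
    base = cost(w, target)
    grad = [0.0] * len(w)
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, 1.0) for _ in w]
        perturbed = [wi + sigma * x for wi, x in zip(w, xi)]
        # Correlate the change in cost with the noise that caused it.
        delta = (cost(perturbed, target) - base) / sigma
        for i, x in enumerate(xi):
            grad[i] += delta * x / n_samples
    return grad


w = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]
estimate = perturbation_gradient(w, target)
true_grad = [2.0 * (wi - ti) for wi, ti in zip(w, target)]
```

For this estimator, the variance of each component grows with the squared norm of the full gradient, so more perturbed parameters generally require more samples. That is one way to see why such schemes scale poorly with the number of units involved.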
They can be used to acquire the optimal value function and/or the optimal policy.
Most notably, Value-Iteration and Policy-Iteration are used here, both of which have their origins in the field of Dynamic Programming (Bellman 1957) and are therefore, strictly speaking, not RL algorithms (see Kaelbling et al 1996 for a discussion).
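When the model is fully known, Value-Iteration can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: the data layout (`T[s][a]` as a list of `(next_state, probability)` pairs, `r[s][a]` as the immediate reward), the discount factor `gamma`, and the two-state MDP are all assumptions introduced here.

```python
def value_iteration(T, r, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Return (V, policy) for a finite MDP with known transitions T and rewards r."""
    n_states = len(T)

    def backup(s, a, V):
        # Expected one-step return of taking action a in state s.
        return r[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])

    V = [0.0] * n_states
    for _ in range(max_iter):
        delta = 0.0
        for s in range(n_states):
            best = max(backup(s, a, V) for a in range(len(T[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = [
        max(range(len(T[s])), key=lambda a: backup(s, a, V))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical two-state MDP with deterministic transitions.
T = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1, 1.0)], [(0, 1.0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
r = [
    [0.0, 1.0],
    [2.0, 0.0],
]
V, policy = value_iteration(T, r)
```

For this toy problem the values converge to approximately V = [19, 20], and the greedy policy is [1, 0]: move to state 1, then stay there. Because the sweep requires T and r explicitly, it is a planning method and not a learning method, which is the distinction drawn above.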
Furthermore, RL is necessarily linked to biophysics and the theory of synaptic plasticity.
RL methods are used in a wide range of applications, mostly in academic research but also, in fewer cases, in industry.