Mesocorticostriatal dopamine projections are crucial for value learning, motivational control, and cognitive functions. However, while dopamine's role in value learning as a reward-prediction-error (RPE) signal is now well characterized, its precise roles in motivational control and cognitive functions remain more elusive. Computationally, this corresponds to the fact that while the operation of mesostriatal dopamine can be minimally described by simple reinforcement learning (RL) models with a one-dimensional reward/RPE and a fixed state representation, such simple models cannot capture (i) how reward-specific motivational control can be achieved through heterogeneous dopamine responses, or (ii) how sophisticated cortical state representations can be formed through mesocortical dopamine. To address both at once, we combined recent models for each: the "Reward Bases (RB)" model, which achieves reward-specific motivational control through a multi-dimensional RPE (but with a fixed cortical representation), and the "online value-recurrent-neural-network (OVRNN)" model, which achieves state-representation learning by training an RNN with an RPE (but a one-dimensional one). We show that the combined model can achieve both functions simultaneously via double 'feedback alignments' of the cortical and striatal downstream connections to the mesocorticostriatal dopamine projections. Crucially, cortical inhibition-dominance is key for successful learning: excessive excitation leads to aberrant persistent activity, which disrupts the alignments and impairs reward-specific motivational control and credit assignment. This suggests how negative and positive symptoms of schizophrenia could emerge from excitation-inhibition imbalance, and we show how our model could explain altered brain activations in patients.
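The multi-dimensional RPE underlying reward-specific motivational control can be illustrated with a minimal sketch, assuming a tabular setting (this is an illustrative toy, not the paper's full model; all sizes, reward identities, and variable names here are assumed for exposition):

```python
import numpy as np

# Illustrative sketch of the "Reward Bases" idea: one value component, and
# hence one RPE channel, per reward identity, with net value read out as a
# motivation-weighted sum. Sizes and names are assumptions, not the model.
n_states, n_identities = 2, 2            # e.g. identities: food, water
V = np.zeros((n_identities, n_states))   # reward-specific value components
alpha, gamma = 0.1, 0.9

def td_step(s, s_next, r_vec):
    """One TD step with a separate RPE per reward identity (vector RPE)."""
    for i in range(n_identities):
        rpe = r_vec[i] + gamma * V[i, s_next] - V[i, s]  # per-identity RPE
        V[i, s] += alpha * rpe

# Transition 0 -> 1 delivers one unit of identity-0 reward (e.g. food).
for _ in range(200):
    td_step(0, 1, np.array([1.0, 0.0]))
    td_step(1, 1, np.array([0.0, 0.0]))

# A change of motivational state re-weights the learned components
# instantly, with no relearning of the components themselves:
value_when_hungry  = np.array([1.0, 0.0]) @ V[:, 0]
value_when_thirsty = np.array([0.0, 1.0]) @ V[:, 0]
```

The point of the decomposition is visible in the last two lines: the same learned components yield different net values under different motivational weightings, which a single scalar value function cannot do.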
Our model thus provides an integrated computational account of dopamine's functions, with implications for how its dysfunctions link to schizophrenia.

<b>Significance statement</b> Dopamine has been suggested to play crucial roles in value learning, motivational control, and cognitive functions, and attempts have been made to understand these roles within the reinforcement learning (RL) framework. However, existing RL models have two limitations: reward identity/diversity is ignored, and state/action representations are handcrafted. Recent studies addressed each of these limitations, but only separately. We combine these separate models and demonstrate that reward-specific values and state representations can be simultaneously learned through double operations of "feedback alignment", a biologically plausible alternative to the dominant machine-learning algorithm (error backpropagation). Crucially, inhibition-dominance is key for successful learning. Persistent activity induced by excessive excitation disturbs the alignments and impairs motivational control and credit assignment, suggesting how excitation-inhibition imbalance could lead to negative and positive symptoms of schizophrenia.
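Feedback alignment itself can be sketched in a few lines. The following toy (a generic two-layer regression network, not the paper's corticostriatal architecture; all sizes, the teacher task, and variable names are assumptions) shows the defining move: the backward pass routes the output error through a fixed random matrix B rather than the transpose of the forward weights, yet the loss still decreases because the forward weights gradually align with B:

```python
import numpy as np

# Illustrative feedback-alignment sketch (Lillicrap et al. style), not the
# paper's actual network: error is fed back through a FIXED random matrix B
# instead of W2.T. Sizes, learning rate, and the teacher task are assumed.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 16, 1
W1 = rng.normal(0.0, 0.5, (n_hid, n_in))   # input -> hidden
W2 = rng.normal(0.0, 0.5, (n_out, n_hid))  # hidden -> output
B  = rng.normal(0.0, 0.5, (n_hid, n_out))  # fixed random feedback pathway
lr = 0.05

X = rng.normal(size=(64, n_in))
Y = X @ rng.normal(size=(n_out, n_in)).T   # linear teacher task

def mse():
    return float(np.mean((np.tanh(X @ W1.T) @ W2.T - Y) ** 2))

mse_before = mse()
for _ in range(500):
    H = np.tanh(X @ W1.T)                  # forward pass
    e = H @ W2.T - Y                       # output error
    dH = (e @ B.T) * (1.0 - H ** 2)        # error routed via B, not W2.T
    W2 -= lr * e.T @ H / len(X)
    W1 -= lr * dH.T @ X / len(X)
mse_after = mse()
```

Because B is fixed and random, no weight transport between forward and backward pathways is required, which is what makes the rule a biologically plausible stand-in for backpropagation.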