Chapter 3: From elementary learning to cognitive control: a neurocomputational perspective
3.1 From classical conditioning to cognitive control
As was pointed out in Chapter 2, Pfc has been shown to
be involved in those tasks that use a delayed-response paradigm and in which
performance is linked to the ability of the animal to maintain information over
some delay in order to release a response at a later point in time. The model
presented here has shown how a field of Pfc neurons can store a given STM representation
and at the same time face the presentation of several distractors interleaved
between the to-be-stored cue and the release of the action. However, this
model, like most of the models described in the literature, does not take
into account how the organism learns to maintain relevant cues in STM,
namely how cognitive control develops as an emergent property of a biological
system. In particular, the relevant questions are:
1) How does an animal learn the DAT (or another task in which a STM load is required) in the first place?
2) How is the DA response calibrated in order to allow robust STM maintenance in Pfc?
3) How does Pfc select the relevant cues and the relevant responses that are associated with the reward?
In order to address these questions, let’s consider in
detail how a DAT is structured, as an example of a rather complex task that
involves “cognitive control”, STM, response selection and behavioral inhibition,
among others. When a monkey first has to learn a DAT, it engages in
exploratory behavior which, thanks to the intervention of the experimenter, is
constrained to the apparatus and cues relevant for the task. In the DAT, the monkey
has to hold a lever for a given time (5 seconds in the example). At the end of
this delay period, a go-signal is presented (both lights of the panel are
turned on), and the monkey should press the panel opposite to the one it
selected in the preceding trial (Figure 3.1).

Figure 3.1. Structure
of the delayed alternation task. Successive go-signals (simultaneous
appearance of the two circles, indicated by arrows) require alternating between
two responses (right and left) separated by a delay of 5 s. For the
task to be performed correctly, a representation (or “trace”) of the previous response
must be preserved until the next response is selected.
This go-signal is not informative of the required
response, since it is identical for the two different responses. The only way
the monkey can perform the task correctly is to hold the previous response in STM,
and to use this information in order to select the appropriate panel.
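The reward rule of the DAT can be stated compactly in code. The sketch below is only an illustration: the function name and the convention that the very first response is rewarded are assumptions, not part of the task description above.

```python
def dat_rewards(responses):
    """Return one reward flag per trial of a delayed alternation task.

    A response is rewarded only if it differs from the previous response.
    Assumption: the first trial has no predecessor and is rewarded.
    """
    rewards = []
    previous = None  # the "trace" that must be held in STM across the delay
    for response in responses:
        rewards.append(previous is None or response != previous)
        previous = response
    return rewards


# Hypothetical session: left, right, right (error), left
session = dat_rewards(["L", "R", "R", "L"])
```

Note that the go-signal does not appear in the function at all: the only variable that determines the reward is the stored previous response, which is exactly what makes the STM trace indispensable.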
From the ecological point of view, this task is rather
complex and incorporates several hard problems. First of all, at the time the
reward is delivered after the monkey presses the correct key, several stimuli
are present in the environment, and several responses have been emitted by the
animal. How does the animal learn the contingency between stimuli, its
behavior, and the reward? How does the animal learn that the reward is a
function of a) the previous response and b) holding the bar for a given delay?
Once the proper set of cues/responses is found to be
in a causal relationship with the reward, how does the animal learn how to use
the cue in order to control its behavior and guide it toward obtaining
the reward? This task is even harder to perform when a delay is interposed
between cue, behavioral response and the reward, and other stimuli intervene
between the time the cue is presented and the reward is delivered.
As we have seen in the simulation, if Pfc activation
is continuously updated by bottom-up (BU) input and buffered from distortion
(STM storage) only when DAergic activation occurs, then the problem of how Pfc
ever learns a trace of previous cues/responses arises. If, as discussed
in the previous chapter, Pfc is normally vulnerable to interference unless
DAergic gating occurs, then Pfc could only preserve the most recent patterns of
activation, namely the ones that were present at the time the unconditioned
stimulus (US) – or reward – occurred. But in the DAT the significant response
(the action of selecting the opposite target) is produced several seconds
before the reward is delivered, and many interleaved task-irrelevant
stimuli/responses can potentially be interposed between the relevant cue/action
and the reward. How does the animal learn what is relevant in obtaining the
reward? How does the brain solve the paradox of being unable to store
Pfc representations unless these are followed by a DAergic response, and at the
same time learn contingencies between stimuli that were present (and probably
decayed) long before the reward was generated?
As can be seen, the DAT is rather complex, and
involves both classical (cues) and operant (response) conditioning. For the
purpose of developing a real-time, ecological model of how cognitive control
can develop, we will take into account a simpler task.
3.2 The linkage between cognitive control and elementary learning
In order to understand
more complex forms of conditioning, experimental settings will be taken into
account here which incorporate a) a delay between a stimulus, a response and a
reward and b) the control of otherwise preponderant response patterns by higher
order areas intended to be the neural substrate of “cognitive control”.
Figure 3.2 illustrates a typical experimental setting
in which a mixture of operant and classical conditioning paradigms is used. In
this hypothetical experiment, a reward is delivered whenever a response (e.g., a lever press)
follows the presentation of a cue. In this simple example, the presentation of
the cue (CS2) must be followed by a response (RESP1), and reward is delivered
if the rat presses the lever within a given, long interval following the onset
of the cue. The temporal relationship between cue and response is important,
since in this protocol no reward is delivered if the response precedes the cue
(RESP1 occurs before CS2).
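The contingency of this hypothetical protocol can be written as a single predicate. This is a minimal sketch: the function name, the time units and the window length used in the example are assumptions made only for illustration.

```python
def reward_delivered(cue_onset, response_time, window):
    """True if the response (RESP1) follows the cue (CS2) within `window`.

    A response that precedes the cue, or that comes after the window has
    elapsed, is not rewarded. All times are in the same (arbitrary) units.
    """
    return cue_onset < response_time <= cue_onset + window


# Hypothetical trial: cue at t = 10 s, 5 s response window
in_time = reward_delivered(10.0, 12.0, 5.0)      # response after the cue
too_early = reward_delivered(10.0, 8.0, 5.0)     # response precedes the cue
too_late = reward_delivered(10.0, 16.0, 5.0)     # response after the window
```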

Figure 3.2. In an
ecological setting, as well as in some experimental situations, the animal is
exposed to several stimuli (CSn in the figure), and typically emits some sort
of response during its normal activity (exploratory behavior, etc.). When a
reward is unexpectedly delivered following a given cue (CS, classical
conditioning), a response (operant conditioning) or a mixture of
cue and response, the animal has to solve a very difficult task in order to obtain
the reward again, namely to causally link cue, behavior and reward.
Even in this
apparently simple example, several problems have to be “solved” by the animal
in order to learn that a given cue, followed by a given response, is the
crucial contingency to be learned. How does the animal discard all other cues
and responses, and finally select the ones which are in a causal relationship
with the reward? How does the animal learn that the temporal order of
cue and response is crucial in order to obtain the reward? Finally, how does the
organism withhold the premature
release of the action, which would prevent it from obtaining the reward, and release the response only after an
appropriate interval?
In order to
reduce even further the complexity of the “ecological” task shown above, we will
consider a variation of trace conditioning, which is the simplest case of
classical conditioning in which STM storage of a cue is required. The task used
will actually be a hybrid between trace conditioning and operant conditioning,
since a motor output is required of the animal. Furthermore, this response
should happen in a given time window following the CS, therefore requiring an
adaptively timed calibration of the motor output.
In trace
conditioning, a cue is followed by a

Figure 3.3. Conditioning paradigms: delay,
trace, simultaneous, and backward conditioning.
3.3 Bridging the temporal gap between representations
Let’s imagine a simple system (Figure 3.4) in which we
have a “posterior” (input), an “anterior” (control) and a motor (output) fields
of neurons, which are intended to mimic a sensory, a prefrontal and a motor
cortical area. Let’s imagine that a fourth, subcortical area receives
homeostatic signals from the simulated organism, which is a different source of
input not related to the environment but rather to internal physiological
variables. This is the

Figure 3.4. The model.
The variables described in the equations are shown in the model. The motor
cortex corresponds to the output stage of the model, and an anterior cortex is
interposed between the posterior (input) cortex and the output. In this
simplified model, the
The field of anterior cortical neurons is the same as
the one simulated in the first set of experiments, which is proposed to be the
analogue of Pfc, and includes two neural species (excitatory pyramidal cells and
inhibitory interneurons). As opposed to the first set of simulations, the
external input is now modeled by another population of posterior units, which
also includes two neural species (pyramidal cells and inhibitory interneurons).
Anterior units receive projections from the dopaminergic neuron, and are
equipped with self-excitatory connections in their pyramidal field. The DA
unit, in turn, receives primary reward signals, and projects to the anterior
system. In the model, this field of neurons corresponds to the VTA. A more
detailed explanation of the neurobiological foundation of the model is given
later in this chapter, when the final model is described, but is omitted
here in order to highlight how computational, behavioral and theoretical
constraints, rather than the need to match biological data, are the main guide
in the specification of the characteristics of the model (Swanson, 1982; Oades
and Halliday, 1997; Floresco and Grace,
2003).
The following equations define the activation of
posterior (x), anterior (y) and motor (w) pyramidal
neurons, posterior (m), anterior (n) and motor (o)
inhibitory interneurons, VTA units (r), and the signal function f(h)
used in the recurrent portion of the activation. Pyramidal neurons of the
posterior cortex (xi) project to the anterior cortex (yj)
through adaptive, modifiable connections zij:
[Equations (8) to (15)]
where Ii is the bottom-up input to the
cell,
is the self-excitatory input,
is the recurrent
excitatory and inhibitory inputs, A, B and C are the decay rate, the excitatory saturation point and
the inhibitory saturation point, respectively, and f(h) is the feedback
function defined by Equation (7), where h is the argument of the
function and F is a constant. In Equation (5), 0 ≤ DA ≤ 1.
In the above equations, all terms are like the ones
used in the first set of simulations, with the exception of the term REWARD,
which is 1 only when the reward is delivered and 0 otherwise. The unit r
has leaky-integrator dynamics (a differential equation with constant
increments and variable leakage) and broadcasts r to Pfc neurons, where it
replaces the DA term of the previous simulations. The adaptive connections from the
posterior to the anterior system adopt the outstar learning rule (Grossberg, 1982),
which is basically a variant of Hebbian learning with a self-normalizing decay
term. The outstar learning rule has been demonstrated to keep synaptic
weights bounded and to converge to a solution in which the pattern of synaptic
weights tracks the post-synaptic activation (Grossberg, 1982).
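The two properties attributed to the outstar rule (bounded weights, and weights that come to track the post-synaptic pattern) can be seen in a small numerical sketch. The form used here, dz_ij/dt = rate * x_i * (y_j - z_ij), is a common textbook statement of the outstar rule and is an assumption: it is not necessarily the exact equation used in the model, and the learning rate, step size and activation values are illustrative only.

```python
def outstar_step(z, x, y, rate=0.1, dt=1.0):
    """One Euler step of an outstar rule: dz_ij/dt = rate * x_i * (y_j - z_ij).

    z: weights z[i][j] from posterior unit i to anterior unit j
    x: pre-synaptic (posterior) activations
    y: post-synaptic (anterior) activations
    Learning is gated by x_i; the term -z_ij acts as the decay that keeps
    the weights bounded by the post-synaptic activation.
    """
    return [
        [z_ij + dt * rate * x_i * (y_j - z_ij) for z_ij, y_j in zip(row, y)]
        for row, x_i in zip(z, x)
    ]


z = [[0.0, 0.0], [0.5, 0.5]]
x = [1.0, 0.0]   # only posterior unit 0 is active (hypothetical values)
y = [0.8, 0.2]   # anterior activation pattern (hypothetical values)
for _ in range(200):
    z = outstar_step(z, x, y)
```

After the loop, the weights of the active posterior unit have converged to the anterior pattern y, while the weights of the silent unit are unchanged, since both growth and decay are gated by pre-synaptic activity.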

Figure 3.5. The system
depicted in Figure 3.4 cannot learn a trace-conditioning paradigm. Note
that the trace zij between posterior and anterior systems can
be reinforced up to a certain extent, but no STM maintenance will survive, since
the Pfc activity would have already decayed by the time the DA activation that
would allow Pfc reverberation is delivered.
The system described by these equations is not
able to learn a task in which a sufficiently long delay is interposed between
CS and US, as in trace conditioning. In fact, no sensory, motor or prefrontal
trace will be available to be paired with the
Another issue with the model of Figure 3.4 is that the
system does not allow the motor output to be inhibited from releasing a
prepotent, sensory driven response at the time the sensory cue is presented.
Again, this basic property can be considered one of the main features of
cognitive control. Summarizing, the system of Figure 3.4 is insufficient for
explaining the basic target phenomena of trace conditioning. The failure of
this model to cope with the target behavior justifies the burden of
expanding the complexity of the model by several orders of magnitude.
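The first of the two failures discussed above, the loss of the trace across the CS-US interval, can be illustrated numerically. The sketch assumes a purely passive exponential decay of the activity trace and a simple Hebbian product at the time of the US; the decay rate, learning rate and delays are hypothetical values chosen only to make the point.

```python
import math


def trace_at(delay, decay_rate, initial=1.0):
    """Activity left after `delay` seconds of passive decay (dx/dt = -decay_rate * x)."""
    return initial * math.exp(-decay_rate * delay)


def hebbian_increment(delay, decay_rate, us=1.0, rate=0.1):
    """Weight change available when the US arrives: rate * residual trace * US."""
    return rate * trace_at(delay, decay_rate) * us


short_gap = hebbian_increment(delay=0.1, decay_rate=5.0)  # CS and US nearly overlap
long_gap = hebbian_increment(delay=5.0, decay_rate=5.0)   # trace-conditioning gap
```

With these values the increment available after the long gap is many orders of magnitude smaller than after the short one, which is the sense in which no trace is left to be paired with the reward.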
In the following section, a candidate model will be
presented in order to cope with an elementary task which involves cognitive
control. Before presenting the outline of the model, a fundamental issue should
be further investigated, namely how the temporal gap between the CS and the
3.4 Synchronizing asynchronous events: the role of hippocampus in learning
There is consistent evidence for the involvement of the hippocampus
in learning and memory in general, and conditioning in particular (for recent
reviews, see O'Reilly and Norman, 2002; Sander, Wiltgen and Fanselow, 2003;
Knierim, 2003). Importantly, the involvement of the hippocampus is limited to trace conditioning and does not extend to delay conditioning,
which emphasizes the importance of the hippocampus in those experimental
paradigms where a STM representation of the stimulus is required (Huerta et
al., 2000; McEchron et al., 1998; Anderson and Steinmetz, 1994; Solomon et al.,
1986). Lesioning the hippocampus and the amygdala produced memory deficits in
the delayed non-matching to sample task in non-human primates (Mishkin, 1978),
a task in which cognitive control (selecting the non-preponderant response) and
trace conditioning (STM storage of activation) are required.
The hippocampal pathway begins in the entorhinal
cortex (EC), passes first to the dentate gyrus via the perforant pathway (PP),
then along the mossy fibers to area CA3 (Figure 3.6). From CA3, projections run to
area CA1 via the Schaffer collaterals, then to the subiculum, and finally back
out to the EC, which forms the majority of connections to and from the cortex.
The information that reaches the hippocampus through perirhinal cortex and EC
comes from the highest integrative cortices, namely secondary and associative
areas of posterior and anterior neocortex. EC neurons respond to stimuli with
highly differentiated, phasic patterns. Direct stimulation of the perforant
path is more effective in CA1 than in CA3. Repeated PP stimulation leads
to an increase in the efficacy of electric stimulation, a phenomenon that Vinogradova
(see Vinogradova, 2001 for a review) named “chronic potentiation” and that was
later renamed LTP. CA3 neurons exert their actions locally in the hippocampus through
their Schaffer collaterals, as well as by regulating the activity of
diencephalic brain-stem structures, like the reticular formation (RF) and
the Nucleus Accumbens (NAc), through the lateral septal nucleus relay (LS). CA1
exerts its influence on neocortex through a circuit that consists of these major
stations: CA1 → Subiculum → postcommissural fornix →
mammillary bodies → anterior thalamic nucleus → prefrontal and
cingulate cortex.

Figure 3.6. The hippocampal complex. a) The hippocampus
is located in the depth of the temporal cortex (in the figure, a mouse brain is
shown). b) Detail of a), with CA3 and
CA1 shown. c) The Papez circuit. d) Cortical and subcortical structures
involved in the Papez circuit. e) Detail
of hippocampus cell morphology and connectivity.
From these gross anatomical considerations, it appears
that the information flow in the hippocampus is mainly unidirectional, although
we will see how recurrency and, therefore, feedback is a typical feature of the hippocampus.
Hippocampal lesions have been extensively studied both in neuropsychological
(Squire et al., 2001; Holscher, 2003; Suzuki, 2003) and neurophysiological (see Vinogradova, 2001 for a review) settings. The deficits can be
grouped in two main classes:
- Deficits in memory: these impairments are selective, involving the consolidation of explicit, declarative, episodic memory. Implicit, procedural and motor memory are usually preserved.
- Deficits in selective attention: attention is unstable and highly vulnerable to irrelevant stimulation, but at the same time also rigid, generating difficulties in shifting from one item to the other.
The
involvement of the hippocampus in classical conditioning has been shown in the
context of the Nictitating Membrane Response (NMR) in rabbits (Mauk and Thompson, 1987). Rabbits possess a nictitating membrane (a third eyelid) which has been
shown to be conditionable in a classical conditioning paradigm. In NMR
classical conditioning a neutral stimulus (CS), such as a tone, is presented
just before an unconditioned stimulus (US), such as a mild puff of air to the
eye. After repeated pairings of the CS and the
The
conditioned eyeblink is an example of an aversively conditioned somatic motor
response. The response is a highly specific motor movement that becomes
adaptively timed to the presentation of the
The hippocampus has also been proposed to be involved
in spatial navigation and sequence learning (Lisman, 1999; Nathe and Frank, 2003;
Bingman et al., 2003). A strong supporter of the latter argument is Lisman
(see Lisman, 1999 for a review). The work by Lisman is important because it
is an attempt to discuss issues like spatial navigation, adaptive timing, and
hetero- and auto-associative networks in the light of hippocampal anatomy and physiology.
Lisman does not focus specifically on the involvement of the
hippocampus in spatial memory, thereby not limiting the breadth of the theory
to a single subset of behaviors. The emphasis is on the recall of memory sequences
instead of simple “spatial location”, a
position that is more general than the canonical view of the hippocampus
as a “position detector” (see discussion on place cells, O'Keefe et al., 1998; O'Keefe and Burgess, 1999; Nathe and Frank, 2003; Bingman et al., 2003). The role of the hippocampus
is then to store and recall “sequences”, like spatial positions or episodes in
a complex situation, and to detect a match/mismatch between these predicted
sequences and the sensory data.

Figure 3.7.
Diagram of the main intra-hippocampal wiring. From Lisman, 1999.

Figure 3.8 (Figure caption from Lisman, 1999, p. 235). The Phase-Advance of Hippocampal Place Cells May Reflect the Recall of
Sequences Organized by Theta (5–10 Hz) and Gamma (~40 Hz) Oscillations
(a) A rat
moves through a sequence of positions (A–G), causing the firing of a place cell
over this entire region. The firing of the G cell occurs with an earlier and
earlier phase of theta cycles as the animal moves along this well known path, a
phenomenon known as the phase-advance. Successive theta cycles are labeled 1–7.
This can be explained (Jensen and Lisman, 1996a) as follows: the G cell
represents position G, a region much smaller than the entire place field (A–G),
but fires at positions A through F as part of a sequence recall process. This
process is initiated at the beginning of each theta cycle by a cue signifying
the current position of the animal. The cells encoding this position become
active in the first gamma cycle and in turn activate cells encoding the next position
in the sequence in the next gamma cycle. This sequence prediction can go on
until the last gamma cycle of a theta cycle. As the animal is moving, the cue
at each successive theta cycle is further along the path.
(b) Diagram
showing how on each theta cycle, the firing of the G cell occurs earlier in the
predicted sequence, i.e., at an earlier gamma cycle within a theta cycle.
(c)
Illustration of how multiple memory items in a sequence can be active in
different gamma cycles (which have different phase relative to a theta cycle).
This is what is meant by a phase code. Note that each memory (a place or event)
is represented by the subset of cells that fires in the same gamma cycle
(yellow indicates firing). Phase coding may occur when the hippocampus is in
recall mode (as in [a] and [b]), but also when it is in learning mode. In the
latter case, it acts as a “multiplexing buffer,” as follows: a memory item is
inserted into the buffer and fires in a given gamma cycle on many successive
theta cycles; when the next item is presented, it is also maintained by the
buffer, but in a different (later) gamma cycle. The biophysical processes
required for a multiplexing buffer are as follows. First, the firing of
pyramidal cells activates intrinsic conductances that produce a positive going
ramp critical for the reactivation of memories on subsequent theta cycles.
Second, rapid feedback inhibition onto pyramidal cells generates 40 Hz
oscillations and organizes a winner-take-all process in which only the most
excitable cells (encoding the next item in the sequence) fire in a given gamma
cycle. Third, a recurrent autoassociational network with weights encoding each
item make the cells that encode an item fire as a group, thereby imparting
resistance to noise (see simulations of 1–3 in Jensen and Lisman, 1996b,
1996c).
The view of the hippocampus as a mere feedforward
network involving cerebral cortex - dentate gyrus - CA3 - CA1 - cerebral cortex
has been progressively challenged. The first models incorporated the idea that
CA3 was an autoassociative network that somehow stored memories for later
retrieval (Marr, 1971). This proposal was based on the observation that CA3
presents massive recurrency and shows LTP and Hebbian learning. However,
CA3 is not the only recurrent network in the hippocampus: CA1 and the
Dentate Gyrus also show a strong degree of recurrency. In particular, granule cells
(see Figure 3.7) make strong connections onto dentate mossy cells, which create a
recurrent network by projecting back to the granule cells. Lisman (1999) is, in
his own words, the first to propose a functional role for these two distinct
recurrent networks. First of all, Lisman emphasizes that the hippocampus is
only involved in episodic memories, i.e. memories that can be formed during a
single episode. Lisman suggests that the hippocampus has a somewhat coarser,
higher level representation of episodes that can then recall more detailed
cortical representations. Lisman stresses the fact that the hippocampus is
especially important in learning sequences of events.
One
important observation is that hippocampectomized rats do orient to completely novel stimuli, but do not orient when the familiar sequence
on which they have been trained is altered (Honey et al., 1998). Secondly, place cells tend to fire during sleep in the same sequence
in which they were observed firing in the awake state (Skaggs and McNaughton,
1996). A typical physiological feature of place cells is the
so-called “phase advance” (O’Keefe and Recce, 1993): the hippocampus of a rat
that moves through its environment is characterized by theta frequency oscillations
(4-10 Hz). The progressive approach of the rat towards the place field of the
cell causes that cell to fire earlier in the theta cycle. The theta cycle is in
fact divided into faster gamma cycles, in which the shift of activation is
visible (Figure 3.8). This sequence is time compressed, since the theta cycle
unfolds at a faster rate than the physical movement
of the rat through the environment. The hippocampus, in this account, is actually a
key “instrument” for predicting environmental events, a feature that
constitutes a key evolutionary advantage.
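The phase advance described above follows directly from the sequence-recall account summarized in the caption of Figure 3.8. The sketch below is a toy illustration of that account under stated assumptions: a path of seven positions (A to G), exactly seven gamma cycles per theta cycle, and recall that advances one position per gamma cycle starting from the current position. The names and numbers are illustrative only.

```python
PATH = ["A", "B", "C", "D", "E", "F", "G"]
GAMMA_PER_THETA = 7  # assumed number of gamma cycles within one theta cycle


def gamma_slot(cell, current_position):
    """Gamma cycle (0 = first) in which `cell` fires during one theta cycle.

    Recall starts from the current position in gamma cycle 0 and advances
    one position per gamma cycle. Returns None if the cell is not reached
    (it lies behind the animal or beyond the last gamma cycle).
    """
    offset = PATH.index(cell) - PATH.index(current_position)
    return offset if 0 <= offset < GAMMA_PER_THETA else None


# Slot in which the G cell fires as the animal walks from A to G
slots = [gamma_slot("G", position) for position in PATH]
```

As the animal moves along the path, the G cell fires in progressively earlier gamma cycles, that is, at an earlier phase of theta.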
How
can CA3 store sequences? Lisman (1999) proposes that this property depends on
NMDA receptors present at the recurrent synapses of CA3. These channels are
implicated in LTP, and are the biophysical substrate of the Hebbian learning
observed in CA3. An important observation is that NMDA channel activation in
CA1 and CA3 leads to LTP even when the post-synaptic activity lags by up to 100 ms.
This observation is interesting and puzzling at the same time: if a given event
A is not followed by an event B within a 100 ms gap, Hebbian learning is
virtually impossible. Lisman addresses this point by commenting that “The
mechanism described in the previous paragraph could lead to the encoding of
memory sequences in which sequential events have a temporal separation of <
100 ms, but what about the more common situation in which the temporal
separation is much larger? The encoding of such sequences may depend on a short
term memory buffer that can extend the period of active firing for many
seconds. Because hippocampal neurons tend to fire for many seconds after a
brief stimulus (Vinogradova, 1984; Hampson et al., 1993;
Colombo and Gross, 1994), the hippocampus must either itself be a buffer or be
driven by a network that has buffering ability. Such persistent firing allows a
single brief presentation to be synaptically encoded by an LTP-type process
that requires repetitive firing to produce synaptic modification.” (Lisman, 1999, p. 235)

Figure 3.9 (Figure caption from Lisman, 1999, p. 236).
Reciprocally Interacting Heteroassociative and
Autoassociative Networks Produce More Accurate Sequence Recall than a Single
Heteroassociative Network (a) In the simplest heteroassociative network,
the cells that encode one memory are selectively connected to the cells that
encode the next memory in a sequence. With each successive step in the sequence
recall process, the memory becomes more degraded, as indicated by the number of
primes. A single network can accurately recall sequences if there is a high
degree of correlation between successive memories, but this will not work in
the general case. (b) An autoassociative network that stores the
associations that constitute each memory item is capable of producing the
correct version of any item (e.g., B) when presented with a degraded version
(e.g., B’). (c) Accurate sequence
prediction through the reciprocal interactions of two networks. One network is
heteroassociative. When the next item in the sequence is produced, it is sent
to the autoassociative network, which is able to correct it. This corrected
version is then sent back to the heteroassociative network, where it serves as
a basis for the next step in the predictive process. Not enough information is
available for a detailed simulation of how this could be carried out by CA3 and
dentate networks, but the following is an example of how some of the key
problems might be dealt with. A cycle begins when memory A cells of CA3 excite
memory B cells of CA3 through recurrent connections, causing single spikes in
these cells and pattern B’. The
spikes are transmitted to the dentate network, where the correct granule cells
for the item B are excited (because of direct input from CA3 or indirect input
through mossy cells). These “correct” granule cells then fire the “correct” CA3
cells. This causes a burst and initiates the next cycle. If a CA3 cell
representing B did not fire because of recurrent input (a false negative), it
will fire because of mossy fiber input. A CA3 cell that is a false positive
will fire only a single spike (since it will not get mossy fiber input). If
only bursts are effectively transmitted to other CA3 cells by the facilitating
recurrent synapses (Lisman, 1997), false positives will have little impact. (d)
Complexities of sequence storage and recall. First, psychophysical evidence
indicates that sequence memory is not strictly a pairwise process between
memories n and n-1. The dashed arrow indicates that connections between
memories n-2 and n may also contribute (see Jensen and Lisman, 1996c for how a
multiplexing buffer makes this possible). Second, studies of human memory
(Howard and Kahana, 1998) and nerve network simulations (Levy, 1996) suggest
that sequence items can be autoassociated with a preexisting sequence that can
be thought of as a sequence of time steps (t1, t2, etc.). Heteroassociation may
therefore not be obligatory for sequence learning.
Lisman
observes that phase advance is also observed in the Dentate Gyrus, and that this
area receives feedback connections from CA3. Lisman proposes that the
functional role of the coupled recurrent networks is the following (Figures 3.9
and 3.10). Heteroassociative recurrent networks carry the problem of noise in
their prediction. A small perturbation at a given stage in the sequential step
can lead to a progressively deteriorating recall of information. Lisman
proposes that the Dentate is an autoassociative recurrent network that, given a
specific input (feedback) from CA3, reconstructs an undegraded pattern from the
one generated by the CA3 “hypothesis” and broadcasts it back to CA3. How this
fine mechanism could be implemented in CA3 and Dentate is, to me, not clear.
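Although the biological implementation is unclear, the computational division of labor proposed above is easy to sketch. The example below is a generic Hopfield-style illustration, not a model of CA3 or the dentate: binary (+1/-1) patterns, Hebbian outer-product weights, a sign threshold, the network sizes and the explicit noise injection are all assumptions made for the sake of the demonstration.

```python
import random

rng = random.Random(0)
N, P = 100, 3  # assumed sizes: 100 binary units, 3 stored items (A, B, C)
items = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]


def sign(value):
    return 1 if value >= 0 else -1


def recall(weights, pattern):
    """Thresholded linear recall: sign(W . pattern)."""
    return [sign(sum(w * p for w, p in zip(row, pattern))) for row in weights]


def outer_sum(targets, sources):
    """Hebbian weights: sum over items of outer(target, source)."""
    return [[sum(t[i] * s[j] for t, s in zip(targets, sources)) for j in range(N)]
            for i in range(N)]


hetero = outer_sum(items[1:], items[:-1])  # heteroassociative: item k -> item k+1
auto = outer_sum(items, items)             # autoassociative: item k -> item k


def degrade(pattern, flips=10):
    """Flip a few units to mimic the noisy prediction B'."""
    noisy = list(pattern)
    for i in rng.sample(range(N), flips):
        noisy[i] = -noisy[i]
    return noisy


b_prime = degrade(recall(hetero, items[0]))  # degraded prediction of B from A
b_clean = recall(auto, b_prime)              # autoassociative correction of B'
```

The heteroassociative weights predict the next item, the injected noise stands in for the degradation accumulated along the sequence, and a single pass through the autoassociative weights restores the stored pattern, which can then seed the next prediction step.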

Figure 3.10. (from
Lisman, 1999) The Role of Dentate Synapses in Filtering Out Context and the
Role of the Perforant Path to CA3 in Transmitting Context (a) At the medial
perforant path input to dentate granule cells, contextual information that is
steadily firing (horizontal red arrows) is not transmitted because of
low-frequency depression. Rapid increases in firing (upward arrows) due to
salient information is transmitted. Note that in the dentate, the features
Jerry and Sad are represented by the same cell, whereas this is not the case
for the cortical input cells. This is what is meant by a change in
representation. (b) The same perforant path axons that provide input to the
dentate also provide input to CA3. Even constant “contextual” items produce a
subthreshold depolarizing bias in CA3. This bias enables a single powerful
mossy fiber input (representing event information) to detonate a CA3 cell. In
this way, an item is represented in context, even though context itself does
not cause firing (as observed). (For altogether different models for encoding
context, see Samsonovich and McNaughton, 1997; Minai and Best, 1998.)
What is then the role
of the Perforant Path (PP)? Lisman suggests that the PP provides both Dentate and
CA3 with contextual information, the use of which appears to be affected by hippocampal
damage (hippocampectomized animals have difficulties in selecting between different contexts that lead to
different rewards). Lisman notes that “there
are no cells in the hippocampus that fire continuously in a particular context.
One explanation is that contextual input to the hippocampus is itself
subthreshold. Such a subthreshold depolarization could,
however, have important consequences in enabling context-appropriate cells to
be triggered by other inputs” (Lisman, 1999, p. 237). Lisman proposes that PP
information is filtered in the dentate cells, in such a way that only
relevant information is transmitted to CA3. The same PP input also excites CA3, always
below threshold, but here the mossy fiber input from the Dentate can trigger firing
of the cells because it coincides with the PP depolarization. This reasoning
is somewhat problematic, since it leaves open the problem of how the dentate knows
what information is relevant (and thus not to be filtered out). Finally, how to
relate the autoassociative-heteroassociative role of Dentate-CA3 with this new
function of context representation is another important, unresolved issue.
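The coincidence logic of this proposal (context alone is subthreshold, but it enables firing when event information arrives) can be made explicit with a toy threshold unit. All numerical values below are hypothetical and chosen only so that neither input reaches threshold on its own; the sketch does not address the open questions raised above.

```python
THRESHOLD = 1.0      # assumed firing threshold (arbitrary units)
CONTEXT_BIAS = 0.4   # assumed subthreshold depolarization from the PP
MOSSY_INPUT = 0.8    # assumed strength of a single mossy fiber input


def ca3_fires(in_context, mossy_active):
    """A toy CA3 cell fires only when its summed input reaches threshold."""
    drive = CONTEXT_BIAS * in_context + MOSSY_INPUT * mossy_active
    return drive >= THRESHOLD


context_only = ca3_fires(True, False)   # context alone never causes firing
event_only = ca3_fires(False, True)     # event outside the appropriate context
event_in_context = ca3_fires(True, True)
```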
Projections
from CA3 fan out to CA1, a fact that Lisman sees as the signature of a change
of representation back to cortical standards, whereas point-to-point connections
stand for a relatively constant mapping between areas that have a similar
representation. Lisman proposes that CA1 and cortex use the same representation,
whereas CA3 and dentate use different representations.
In partial agreement with
Vinogradova (Vinogradova, 2001), Lisman proposes that CA1 might compute a
match/mismatch between cortical input (through EC) and the prediction originating
from CA3. This idea, which goes back to the original proposal by Sokolov
(Sokolov, 1963) of a brain that forms a representation of the world based on
past events and continuously compares predictions and reality, is incorporated
in many models (Grossberg, 1982; Lynch and Ranger, 1992; Hasselmo and Schnell,
1994; Blum and Abbott, 1996; Levy, 1996). Cells in the mammillary body
(receiving one of the output pathways from the hippocampus, namely from CA1)
fire in exact registration with the expected onset of a repetitive stimulus that
has been omitted (Vinogradova, 2001). Other experiments show a habituative
response of hippocampus to repetitive stimulation, followed by a dishabituative
response when an unexpected stimulus is presented (Vinogradova, 2001).

Figure 3.11.
From (Nakazawa et al., 2002). (A) shows the general organization of
the hippocampus and the related Entorhinal cortex. Red arrows show the pathways
studied by Nakazawa et al. EC, Entorhinal cortex; DG, dentate
gyrus; RC, recurrent collaterals; SC, Schaffer collaterals; MF, mossy fibers;
PP, perforant path. (B) to (E) show the basic wiring of CA3
and CA1, illustrating the proposed mechanisms for pattern completion. In
control (B) and mutant (D), full cue input (downward arrows) is
provided to CA3 from DG or EC and to CA1 from EC. In control (C) and
mutant (E), a fraction of the original input is provided to activate the
memory trace during recall. Red dots, CA3 RC synapses or SC-CA1 synapses
participating in memory trace formation; red circles, memory traces that are
activated during recall; red dots without red circles, memory trace not
activated during recall; red triangles and lines, CA3 pyramidal cell activity
resulting from pattern completion through recurrent collateral firing; green
triangles and lines, CA3 pyramidal cell response to external cue information;
open triangles and black lines, silent CA3 pyramidal cells and inactive
outputs; blue triangles, CA1 pyramidal cells.

In a recent paper, Nakazawa et al. (Nakazawa et al.,
2002) have studied the involvement of hippocampal CA3 NMDA receptors in
associative memory recall. The paper is consistent with the
general view that sees the hippocampus
involved in pattern completion. The ability to retrieve
complete memories on the basis of incomplete sets of cues is a crucial function
of biological memory systems. The authors suggest that pattern completion is
mediated principally by the extensive recurrent connectivity of the CA3 area of
the hippocampus. The authors have tested this hypothesis by generating and analyzing
a genetically engineered mouse strain in which the NMDA receptor gene is
ablated selectively in the CA3 pyramidal cells. The mutant mice normally
acquired and retrieved spatial reference memory in the Morris water maze, but
they were impaired in retrieving this memory when presented with a fraction
of the original cues. These results are explained by a qualitative model
shown in Figure 3.11. The model emphasizes how CA3, due to its recurrent
connectivity, is involved in storing and retrieving relationships between
patterns. Damage to CA3 would be evident in those situations in which only a
partial version of the pattern is provided. In these situations, the
performance of the system relies on the ability to retrieve the whole pattern
(in the example, the set of cues) from a partial version.
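
The pattern-completion argument can be illustrated with a minimal sketch. It is not the model of Nakazawa et al. (2002): it assumes a Hopfield-style autoassociative network with bipolar (+1/-1) units, and the function names `store` and `complete` are hypothetical. Setting the recurrent weights to zero stands in, very loosely, for the loss of CA3 recurrent plasticity.

```python
import numpy as np

def store(patterns):
    """Hebbian outer-product weights for +1/-1 patterns, without self-connections."""
    patterns = np.asarray(patterns, dtype=float)
    n_units = patterns.shape[1]
    w = patterns.T @ patterns / n_units
    np.fill_diagonal(w, 0.0)
    return w

def complete(w, cue, steps=10):
    """Relax the recurrent network from a partial cue (0 marks a missing element).

    The external cue stays on during relaxation, so a network without
    recurrent weights simply reproduces the cue it was given.
    """
    cue = np.asarray(cue, dtype=float)
    state = cue.copy()
    for _ in range(steps):
        state = np.sign(w @ state + cue)
    return state

memory = np.array([1, -1, 1, 1, -1, -1, 1, -1])
w = store([memory])

partial_cue = memory.astype(float)
partial_cue[4:] = 0.0                      # only half of the original cues are presented

recalled = complete(w, partial_cue)                  # intact recurrent weights
lesioned = complete(np.zeros_like(w), partial_cue)   # recurrent weights removed
```

With intact recurrent weights the full pattern is recovered from half of the cues, whereas without them the missing elements stay silent, which is the qualitative dissociation described above.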
Summarizing,
there is sufficient evidence that the hippocampus is
involved in learning and memory in general, and in conditioning in particular.
Furthermore, the tasks in which a temporal gap is introduced are the ones
most affected by hippocampal impairment. The following section will review the
models of hippocampus which have incorporated the notion of a trace between
stimuli that bridges the gap between temporally disjoint representations.
3.5 Models of timing in hippocampus
The work of Nakazawa
et al. (2002) is a good exemplar of the “stream” of papers proposing some form
of relationship between a recurrent network, memory storage, pattern
completion, hippocampal architecture, and deficits following hippocampal
alterations (Marr, 1971; Gardner-Medwin, 1976; McNaughton and Morris, 1987;
Rolls, 1989; Hasselmo et al., 1995).
None of
these views, however, emphasizes a salient characteristic of the behavioral
constraints an animal faces in an ecological setting, namely that not
all cues that should be associated with a given reward or in a given task
co-occur in time. This is a crucial observation, and is directly related to
the argument discussed in the context of trace conditioning and cognitive
control. These models wrongly assume that all cues that should be associated
are available at the same time to the associative mechanism in CA3. This is an
unjustified assumption, and a further mechanism should be invoked to bridge
the temporal gap between different cues, whose representations arise and vanish
in a continuously varying environment. An autoassociative recurrent or
heteroassociative network, such as the one depicted in Figure 3.12, can store a
pattern through a Hebbian-like LTM mechanism, with the proviso that maximum learning is obtained when the activation
patterns co-occur in time. The question then arises of how two representations which
are disjoint in time can ever be correlated and mutually reinforced.

Figure 3.12. Diagram representing either
an autoassociative recurrent or heteroassociative network, depending on whether the two sets of units
represent the same population (autoassociative) or different populations
(heteroassociative).
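
The difficulty can be made concrete with a small sketch. It assumes the simplest possible Hebbian rule, in which the weight change is proportional to the product of pre- and post-synaptic activity at each time step; the function name `hebbian_weight` and all numerical values are illustrative assumptions, not part of the models cited above.

```python
import numpy as np

def hebbian_weight(pre, post, rate=0.1):
    """Accumulate dw = rate * pre(t) * post(t) over one trial."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return rate * float(np.sum(pre * post))

t = np.arange(100)
cs = ((t >= 10) & (t < 20)).astype(float)           # cue active for 10 steps
us_overlap = ((t >= 15) & (t < 25)).astype(float)   # event overlapping the cue
us_trace = ((t >= 60) & (t < 70)).astype(float)     # event after a temporal gap

w_overlap = hebbian_weight(cs, us_overlap)   # 5 coincident steps contribute
w_trace = hebbian_weight(cs, us_trace)       # no coincident step at all
```

When the two activations overlap the weight grows, but when they are separated by a gap the product is zero at every time step and nothing is learned, however many trials are run.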
The relationship between timing,
representations of stimulus traces and the hippocampus has been proposed by
several authors (Zipser, 1986; Grossberg and Schmajuk, 1989; Grossberg and
Merrill, 1992, 1996). Zipser (1986) proposes that a chain of neurons exists in
the hippocampus which acts as a delay line. In Figure 3.13, a conditioned
stimulus, CS(t), consists of a short block pulse derived from the onset
of the CS and injected into the delay line. CS(t) then slowly propagates
down the delay line, activating each neuron after a 50 ms delay.

Figure 3.13.
Basic structure of the hippocampal delay line model of adaptive timing as
proposed by Zipser (1986).
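
The delay-line idea can be sketched as a shift register. This is only an illustration under stated assumptions: the 50 ms delay per neuron comes from the description above, while the 10 ms time step, the number of taps and the function name `delay_line` are hypothetical choices.

```python
import numpy as np

def delay_line(cs, n_taps, tap_delay_steps):
    """Return taps[k, t] = cs[t - (k + 1) * tap_delay_steps], zero before that."""
    cs = np.asarray(cs, dtype=float)
    taps = np.zeros((n_taps, cs.size))
    for k in range(n_taps):
        shift = (k + 1) * tap_delay_steps
        if shift < cs.size:
            taps[k, shift:] = cs[:cs.size - shift]
    return taps

dt_ms = 10                                  # assumed simulation step
cs = np.zeros(100)
cs[0:2] = 1.0                               # short block pulse at CS onset

taps = delay_line(cs, n_taps=5, tap_delay_steps=50 // dt_ms)
onsets_ms = [int(np.argmax(row > 0)) * dt_ms for row in taps]

us_step = 150 // dt_ms                      # an event occurring 150 ms after CS onset
coincident_tap = int(np.argmax(taps[:, us_step]))
```

Each neuron of the chain carries a copy of the CS pulse at a fixed lag, so an event occurring 150 ms after CS onset coincides with the activity of exactly one neuron (the third one here), and a Hebbian rule can associate the two.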
In Figure 3.13, NM(t) is the nictitating membrane response produced by the
![]()
Rewriting this as a differential equation, this
learning law can be seen to be of the form of outstar learning (Grossberg,
1969a).
![]()
In these
equations, LTM is gated by the CS at a given delay along the delay line, namely
the CS representation (t-d) time steps ago. An active representation at
a given delay is then multiplied with the
Grossberg
and Schmajuk (1989) and Grossberg and Merrill (1992) proposed a model of timing
in the hippocampus, which they called spectral timing. The spectral
timing mechanism differs from the delay line model of Zipser in several ways.
Instead of a short block pulse CS, Grossberg and Merrill hypothesized an STM
storage site s(t) which converts both trace and delay CSs into step
signals. The delay line of Zipser is replaced with a two-step process involving
a set of independent activities x_j(t) with a spectrum of
different reaction rates a_j, j = 1, 2, ..., n. These activities are
then gated by habituative transmitter gates y_j(t). The gated
activities f(x_j)y_j(t) all begin to grow at the
onset of a CS, but reach their peak values at a range of distinct times,
after which they decay. These gated activities trigger learning by long term
memory (LTM) traces, z_j, in their respective pathways. Some
gated activities are large at the particular ISI when the

Figure 3.14. This
figure illustrates how an independent spectrum of activations of
chemical concentrations can help to a) bridge the gap between CS_i
and US and b) create the substrate for differential learning of timed signals.
In the figure, a few “spectra” intersect the US presentation. Reinforcement
and temporal summation of those spectra allow the development of an adaptively
timed signal (inset).
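
The way a spectrum of rates produces gated signals that peak at different times can be sketched numerically. This is a simplified illustration in the spirit of the spectral timing model, not a reproduction of the published equations: the exact form of the activation and transmitter equations, the Hill-type signal function and every parameter value below are assumptions.

```python
import numpy as np

def signal(x, half=0.25, power=8):
    """Assumed Hill-type signal function f(x) = x^n / (half^n + x^n), x >= 0."""
    xn = np.power(x, power)
    return xn / (half ** power + xn)

def gated_spectrum(rates, n_steps=800, dt=0.5, recover=0.001, deplete=0.05):
    """Euler integration of activities x_j, transmitter gates y_j and f(x_j) * y_j."""
    rates = np.asarray(rates, dtype=float)
    x = np.zeros_like(rates)          # activities, one per reaction rate a_j
    y = np.ones_like(rates)           # transmitter gates start fully accumulated
    gated = np.zeros((rates.size, n_steps))
    s = 1.0                           # CS stored as a step signal s(t)
    for t in range(n_steps):
        fx = signal(x)
        gated[:, t] = fx * y
        x = x + dt * rates * (-x + (1.0 - x) * s)
        y = y + dt * (recover * (1.0 - y) - deplete * fx * y)
    return gated

rates = np.geomspace(0.2, 0.005, 6)   # from the fastest to the slowest rate
gated = gated_spectrum(rates)
peak_steps = gated.argmax(axis=1)     # time step at which each gated signal peaks
```

Fast units peak shortly after CS onset and slow units peak later, so whatever the CS-US interval, some gated signals are large when the US arrives. An LTM rule gated by the US can then select those components.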
An idea
similar to the one proposed by Zipser and Grossberg was already incorporated in
the model proposed by Sutton and Barto (Sutton and Barto, 1981), in which a
“non-stimulating trace” was postulated in order to bridge the temporal gap
between CS and US. The distinction between stimulating and non-stimulating
traces was intended to stress the difference between two operating
modes of the neural assembly, namely the “normal” mode in which activation
propagates between nodes, and an “offline” mode in which traces are generated
which do not excite/inhibit other neurons unless an event (US) occurs. This idea of stimulus
traces is very similar to the Ca++ spectra which are proposed to be
the physiological substrate of spectral timing in Grossberg’s model, and is
a further sign of converging consensus on the idea that some sort of mechanism
should exist in order to bridge the temporal gap between two events disjoint
in time.
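
A non-stimulating trace can be sketched as a slowly decaying memory of the CS that has no effect on its own but gates learning when the US arrives. The code below is a generic eligibility-trace illustration, not the equations of Sutton and Barto (1981); the function name `trace_learning` and the parameter values are assumptions.

```python
import numpy as np

def trace_learning(cs, us, decay, rate=0.5):
    """Return the CS-US weight after one trial of trace-gated learning."""
    trace, weight = 0.0, 0.0
    for c, u in zip(cs, us):
        trace = decay * trace + (1.0 - decay) * c   # slowly decaying memory of the CS
        weight += rate * trace * u                  # learning only when the US occurs
    return float(weight)

t = np.arange(60)
cs = ((t >= 5) & (t < 10)).astype(float)    # CS ends at step 10
us = (t == 30).astype(float)                # US arrives 20 steps later

w_with_trace = trace_learning(cs, us, decay=0.9)
w_no_trace = trace_learning(cs, us, decay=0.0)   # the trace vanishes with the CS
```

With a decaying trace a residue of the CS is still present when the US occurs, so the weight grows; without it the two events never coincide and the weight stays at zero.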
The models proposed by Zipser, by Grossberg and coworkers, and by Sutton and
Barto are a partial response to the problem of trace conditioning, and do not
assume any relationship between trace conditioning and higher order functions,
like cognitive control, which is a major assumption of this work. For this
reason, the hippocampal mechanism, despite its appealing properties, is not
embedded in a more global architecture, which is necessary to explain how higher
functions, like STM and therefore cognitive control, develop from this basic
building block. Furthermore, the role of CA3, dentate gyrus, CA1 and Pfc is not
developed, with a resulting fuzziness in defining the biological grounding of
the models. This work aims to show, among other things, how embedding the
hippocampal complex into a general framework of cognitive control can help bridge
the gap between behavioral, anatomical and physiological data.
3.6 A unifying model for trace
conditioning: cerebral cortex, hippocampus and basal ganglia interplay in
cognitive control
In this section a qualitative, as well as a
quantitative, characterization of the model is presented. The brain areas
modeled are shown in Figures 3.15 and 3.16, whereas the schematic of the model
is depicted in Figure 3.17. The areas modeled, as well as the granularity of
the representation, constitute the minimal architecture capable of explaining
the target dataset, and are consistent with the anatomical and
neurophysiological evidence described in the previous sections.
Due to the extent of the areas modeled and the complex
interactions between them, the presentation of the model is organized in two
separate sections:
1)
The first section includes the areas
that are responsible for learning the correlation between CS and US, and their
temporal relationship. This section of the model includes anterior and
posterior cortex, VTA, Nucleus Accumbens, the hippocampal complex, and the Entorhinal
cortex. The equations that define this first set of areas are further split into
two functional subsets, defining a positive and a negative phase.
2)
The second section includes Premotor
and Motor cortices, and the Basal Ganglia complex. This section of the model
accounts for the development of an adaptively-timed release of an action. This
is a crucial feature of cognitive control, and allows the system to release a
motor output at a time that is appropriate to obtain the reward.
Although
the model is intended to be a minimal architecture for explaining the target
dataset, the complexity of the interactions between the various components is
a clear obstacle to the understanding of its dynamics. It must be kept in mind,
however, that the larger the set of biological and behavioural data a system is
designed to explain, the more complex the system will be. Nonetheless, the
biological systems that the model aims to simulate are several orders of
magnitude more complex than the present model. Finally, it must not be
forgotten that the discretization of brain functions that is often found in
psychology, neuropsychology and, in general, in the neuroscience community, is
a rather arbitrary process. Therefore, a reader who tries to attach a
“label” to the function of a given cortical or subcortical area described in
this model will be confused by the massive amount of feedback present in the
architecture. This high degree of recurrency, a characteristic shared by the
nervous system, is a concrete obstacle to an explicit, verbal labelling of the
components of the network in terms of the aspect of the behavioural performance
they control. Given this proviso, a frame of reference for the main
mechanisms of the model, as well as an approximate description of the
behavioural variable that each model component controls, is given below.
Sensory input enters the system through the
posterior cortex, the sensory interface of the system. From this simplified,
3-node structure, the signal travels in two directions: along a cortical pathway to
the anterior cortex (Pfc), and along a subcortical pathway to the hippocampal
complex. The hippocampal complex is the first subcortical stage of the system,
which is involved in bridging the temporal gap between stimuli that occur
disjointly in time.
The
cortical route, which in the model is intended to mimic the fast
cortico-cortical pathways that are believed to be the main “drivers” of
behaviour, proceeds from Pfc to the premotor cortex (PM) and, from here, to the
motor cortex (M), the final output stage of the system. From PM the second
major subcortical projection arises: PM projects to the basal ganglia complex
(BG), which in turn exerts a powerful control over
the thalamic nucleus that gates the motor output. One of the main
characteristics of the model is the interplay between cortical and subcortical
structures in the control of the behavioural output. The hippocampal complex
provides the neural substrate for bridging the temporal gap between
environmental stimuli (CSs) and relevant events (USs) which the organism should
learn to approach/avoid. This is achieved by a complex mechanism, which
includes the Entorhinal cortex (EC), which receives the sensory input and stores it in a
reverberatory network, and the Dentate Gyrus (D), which produces
a spectrum of activations for each sensory representation. When an unexpected US
is delivered (for instance, a reward), a DAergic spike is produced through an innate
pathway in VTA, which triggers learning of the signals generated by the
presentation of the CS in the hippocampus. A few repetitions of the CS would
allow the D to produce an output whose peak would be approximately timed with
the time of delivery of the
The
second subcortical pathway includes the Basal Ganglia complex and the thalamus
(ventral-anterior/ventro-medial nucleus, THAL). This pathway allows the system
to refrain from executing a learned motor output, e.g. reaching for the reward
before the reward is administered. This adaptively-timed inhibition, a major
feature of cognitive control, is learned through the BG complex. The PM cortex
is the target of the activation of Pfc, the area which “stores” the behavioural
plan to be executed. This in turn triggers activation of several PM
representations over time, which are in turn broadcast to the BG and to motor
cortex (M). M produces a motor output whenever a) it receives input from PM and
b) the THAL loop is closed. Many of the motor outputs produced in this way
would be badly timed, so that the
A
detailed implementation of the mechanisms summarized above is given
below.
1- First section
The following equations define the activation of posterior (x)
and anterior (y) pyramidal neurons, VTA (VTA), Nucleus Accumbens
(NAC), Entorhinal cortex (EC), Dentate Gyrus (D), CA3 (CA3),
the neurotransmitter from Dentate to CA3 (q), Dentate to CA3 LTM weights,
the LTM connections between posterior
and anterior cortex, and CA3 to CA3 LTM weights (wCA3).

Figure 3.15. The brain areas included in the model (lateral view).

Figure 3.16. The brain areas included in the model (medial view).

Figure 3.17. A diagram of
the main components of the model. Variable names are indicated in italic.
(18)
![]()
![]()
In the equation, [w]+ and [w]-
stand for the positive and negative rectified values of w, respectively. The function f(w)
is defined as follows:
where
= 1 and n varies according to the equation. Appendix II has a detailed
description of the parameters used in the equations. The function f(w)
is a generic sigmoid function, whose parameters are defined by
and n.
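
Since the rectification operators and the sigmoid are used in all the equations that follow, a small sketch may help fix ideas. The rectifications follow directly from the definition above. The sigmoid, however, is an assumption: a Hill-type function with a half-saturation constant (here called `beta`, set to 1) and an exponent `n` is one common choice for a generic sigmoid with two parameters, and is not necessarily the form used in the original equation.

```python
import numpy as np

def pos(w):
    """[w]+ : the positive rectified value, max(w, 0)."""
    return np.maximum(w, 0.0)

def neg(w):
    """[w]- : the magnitude of the negative part, max(-w, 0)."""
    return np.maximum(-w, 0.0)

def f(w, beta=1.0, n=2):
    """Assumed sigmoid f(w) = [w]+^n / (beta^n + [w]+^n), bounded in [0, 1)."""
    wn = pos(w) ** n
    return wn / (beta ** n + wn)

samples = f(np.linspace(0.0, 5.0, 50), n=4)   # rises monotonically towards 1
```

Under this assumption, larger values of `n` make the transition around `beta` steeper, which is how a single function can serve several equations with different parameters.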
The equations describing the model are listed below and are grouped in
two functional stages:
-
“Positive” phase: in this stage, the
system works in the conventional feedforward fashion. The sensory input enters
the posterior cortex, travels through the Pfc (cortical pathway) and reaches the
subcortical structures through the EC (posterior cortex to EC) and the Pfc (Pfc
to VTA). The adaptive timing mechanism in the hippocampus ensures that LTM
connections are formed in this area when a DAergic spike (caused by a US or by
a learned CS) is generated in VTA. The adaptive timing ensures that
associations between events that occur disjointly in time can be formed. These
associations influence cortical stages in the “negative” phase, when
consolidation of memory occurs.
-
“Negative” phase: this phase is
interleaved between positive phases, and occurs when no input is delivered to
the system. In this stage, CA3 broadcasts back the learned associations formed
during the positive phase. This allows cortico-cortical
connections to be strengthened, and allows the system to rely on fast,
feedforward cortico-cortical connections to perform the task.

Figure 3.18. Positive phase. LTM
consolidation is indicated in red. Learning of CS-CS and CS-US contingencies is
modulated by the VTA DAergic activity, such that LTM weights of units
co-activated in time are strengthened. The hippocampal complex would ensure
that temporally disjoint events can be associated. Pfc (anterior cortex) is
involved in the positive phase, but in this diagram the subcortical aspects are
highlighted.

Figure 3.19. Negative
phase. a) The associations learned in the hippocampal complex during the
positive phase are “broadcast” back to cortex for LTM cortical consolidation.
b) The relaxation process in CA3 allows recovery of stored associations through
recurrent circulation of activation (relaxation).

Figure 3.20. Positive
(performance) and negative (consolidation) phases.
Alternatively, these phases can be considered analogous
to a “waking” (or performance) and a “sleeping” (consolidation) phase, or can
be thought of as different functional stages enforced by some oscillatory carrier.
The idea is that these two different functional stages are necessary in order
to allow the system to perform two functions (performance and memory
consolidation) which are incompatible in the same set of equations, but become
consistent when disjoint in time. The parameters of the equations are listed in
Appendix I. When necessary, equations will be complemented by diagrams in
order to clarify the “anatomy” and the functional relationships between model
components.
- POSITIVE PHASE
- Posterior cortex (x):

Pyramidal neurons of the posterior cortex (xi) project
to the anterior cortex (yj) through adaptive, modifiable
connections zij.
(19)
![]()
The activation of the cell is normalized by lateral
inhibition, and is dragged towards –x
through lateral inhibition and the reset signal. This reset signal is produced
at the end of the trial in order to reset the system. This expedient
greatly reduces the computational time of the simulations, since
multiple trials can be stacked in a short time frame. In this model, only 3 units are used in
the posterior, anterior, premotor and motor cortex.
- Anterior cortex – Pfc (y):

(20)
![]()
(21)

where
is the weighted input from the posterior
cortex,
is the reverberatory, STM component of the
network (see first part of the thesis),
is the gating function of DA, and
is lateral inhibition. The positive (peak) and
negative (dip) DAergic component are defined as:
(22)

The equations above
state that the level of DA in Pfc
is equal to a tonic level when the activation of VTA is positive, or equal to
when VTA is inhibited, where
is a constant (see Appendix II). See Dreher
and Burnod (2002) for
a discussion of the time course of DA washout in Pfc.
- Posterior-to-anterior cortex LTM weights:
(23)


(24)
![]()
where
is a VTA-gated hebbian learning, and
is a VTA and pre-synaptically gated decay. Synaptic
weights therefore grow towards Bw
through
whenever the quantity defined by
is positive. In the equation above and in all
other equations, VTA+ and VTA- are defined as:
(25)
![]()
where [VTA]+ and [VTA]- stand for the absolute positive
and negative rectified values of VTA.
According to Equations (24) and (25), the activation of VTA gives rise to a
spike or a dip in VTA depending on whether VTA activation is above or below zero.
VTA spikes (VTA+) and dips
(VTA-)
have different, opposite effects on STM maintenance and synaptic plasticity. In
Equation (23), the LTM weights are dragged towards Bw when a)
pre- and post-synaptic cells are active and b) a VTA spike has occurred. LTM
weights are reduced when the pre-synaptic cell is active and a VTA dip has
occurred. Notice, in fact, that a depressed VTA would result in a positive
quantity VTA-, which will
drag the weights towards - ![]()
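
The qualitative behaviour of this learning law can be sketched as follows. This is not Equation (23), whose exact terms are not reproduced here: the update below only implements the two verbal conditions stated above, and it assumes that the decay term drags the weight towards zero. The function name and the values of `b_w` and `rate` are hypothetical.

```python
def vta_gated_update(z, pre, post, vta, b_w=1.0, rate=0.1, dt=1.0):
    """One Euler step of an assumed VTA-gated learning law for a weight z."""
    vta_pos = max(vta, 0.0)     # spike component, [VTA]+
    vta_neg = max(-vta, 0.0)    # dip component, [VTA]-, positive when VTA is depressed
    growth = vta_pos * pre * post * (b_w - z)   # towards b_w: spike, pre and post active
    decay = vta_neg * pre * z                   # assumed decay towards 0: dip, pre active
    return z + dt * rate * (growth - decay)

z0 = 0.33
z_spike = vta_gated_update(z0, pre=1.0, post=1.0, vta=1.0)    # weight increases
z_dip = vta_gated_update(z0, pre=1.0, post=0.0, vta=-1.0)     # weight decreases
z_rest = vta_gated_update(z0, pre=1.0, post=1.0, vta=0.0)     # no DA signal, no change
z_silent = vta_gated_update(z0, pre=0.0, post=1.0, vta=-1.0)  # silent pre-synaptic cell
```

The point of the sketch is the gating: coincident pre- and post-synaptic activity alone leaves the weight unchanged, and the sign of the change is decided by whether VTA produces a spike or a dip.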
-
Ventral Tegmental Area - VTA (VTA):

(26)
![]()
where
describes the fact that VTA is excited by a
primary
). The LTM weight between Pfc and
VTA is defined by the following equation:
(27)
![]()
The equation above states that the Pfc-to-VTA weight
increases with coincident activation of Pfc
and VTA, and decreases both passively and actively when a VTA dip occurs, through
the term
.
- DA in hippocampus
(Now-Print signal, N; inhibitory
interneuron, l):
(28)
![]()
(29)
![]()
The Adaptive Timing (AT, or spectral timing) model has been described by
Grossberg and coworkers in several works (Grossberg and Schmajuk, 1989; Grossberg and Merrill, 1992, 1996). The present model uses
as a building block a modified version of the AT wherever a temporal gap
between cell activations should be bridged. Learning in the Adaptive Timing
mechanism is modulated by a Now-Print signal N (Grossberg and Merrill, 1996). This neuron-like element is the
rectified difference between VTA activation and a slowly-varying interneuron l, and
is a constant. The net result of this
2-component-interaction is a signal that varies sharply whenever a variation in
VTA activation occurs. Therefore, these equations can be considered as a
neuron-like implementation of a differentiator. Further details on this model can
be found in Grossberg and Merrill (1992, 1996).
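
The differentiator-like behaviour of the Now-Print signal can be illustrated with a short sketch. It assumes that the interneuron l is a leaky integrator of VTA activation with time constant `tau`, and that the constant mentioned above acts as a threshold; both choices, and the parameter values, are assumptions rather than the equations of the model.

```python
import numpy as np

def now_print(vta, tau=20.0, threshold=0.1, dt=1.0):
    """N(t) = [VTA(t) - l(t) - threshold]+ with l slowly tracking VTA."""
    vta = np.asarray(vta, dtype=float)
    l = 0.0
    n = np.zeros_like(vta)
    for t, v in enumerate(vta):
        n[t] = max(v - l - threshold, 0.0)   # rectified difference
        l += dt * (v - l) / tau              # slowly varying interneuron
    return n

vta = np.zeros(300)
vta[50:200] = 1.0        # sustained VTA activation between steps 50 and 199
n = now_print(vta)
```

Although VTA stays active for 150 steps, N is large only right after the onset and returns to zero once l has caught up. The offset of VTA activation produces no Now-Print signal, because the difference is then negative and is removed by the rectification.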

Figure 3.21. The
EC-Hippocampus complex.
-
Entorhinal
cortex (EC):

(30)
![]()
The EC is a recurrent network excited by the posterior cortex. The
highly recurrent architecture results in a great
convergence of information at this cortical stage, as confirmed by anatomical and
physiological data (see Hasselmo 1995 for a review). EC constitutes the source
of input for the hippocampal complex and, therefore, for the AT mechanism.
- Entorhinal cortex LTM:
(31)
![]()
where
is a Hebbian learning component gated by N. In Equation (31), the synaptic weight
grows when the product
is positive, namely when the EC neurons are active and when the
Now-print signal N is positive. The
LTM weights allow learning of stimulus contingencies in EC when a Now-print
signal is generated.
- The spectral timing mechanism
in the hippocampus: Dentate Gyrus, depletable neurotransmitter, CA3, and LTM
weights from Dentate to CA3
The figure below describes the main dynamics
of the AT mechanism. In particular, the output of the system to a CS (whose
activation is kept high through EC reverberatory activity) that has been paired
to the

Figure
3.22. The
Adaptive timing mechanism. From top-left to bottom-right: CS; US; activation of
first cellular stage (x, in the model
Dentate Gyrus); depletable neurotransmitter (y, in the model the neurotransmitter between Dentate and CA3); the
product of the activation of Dentate and the neurotransmitter; the gated
product of the activation of Dentate, the neurotransmitter and the adaptive LTM
weights between Dentate and CA3; final output of the system. Notice that after
1000 iterations the output peaks at the time of the
- Dentate Gyrus - (D):
(32)

![]()
where
is the nth variable that scales the
activation in Dn,
determining the rate at which Dn
grows/decays after stimulation from EC.
For each n, where n ={
takes on a value
between 0.00001 and 0.2 (see Appendix II for the whole parameter listing). Each
therefore scales
the magnitude of the nth activation of D,
thereby creating a spectrum of activation functions, in
which the activation of different cells will peak at different times. The Dentate
Gyrus is stimulated by the EC, and in turn projects to CA3. The Dentate has
been documented as a structure where a “code expansion” might take place due to
the higher number of Dentate cells with respect to EC and CA3 (Berger, Berry
and Thompson, 1986). The suggestion of this model is that this code expansion
occurs in the time domain, through the creation of a spectrum of cellular
activations that can be adaptively timed to a rewarding signal.
- Neurotransmitter from Dentate to CA3 - (q):
(33)
![]()
where
is the accumulation process of the
neurotransmitter towards 1, and
is the depletion of the neurotransmitter which
is proportional to the pre-synaptic activity gated by the transmitter itself. In
Equation (33), a sigmoidal function of the nth Dentate nucleus activation
is multiplied by the available
neurotransmitter qn to
determine the depletion of neurotransmitter, which is therefore proportional to
the strength of the signal and to the available neurotransmitter.
- Dentate to CA3 LTM weights -
(
):
(34)
![]()
where
is a hebbian term which increases the LTM
weight when the Dentate is active and a DAergic spike occurs and decreases when
a dip occurs. These LTM weights ensure learning of the appropriate AT signal
when a CS enters in association with a
- CA3 - (CA3+):
(35)
![]()
The activation of the 3 CA3 cells consists, in the positive (active) phase, of the
sum of the dentate activation gated by the neurotransmitter levels and the LTM
weights.
- CA3 LTM
weights (wCA3):
(36)
![]()
where
is a Hebbian
learning component scaled by
which is positive
when VTA+ is positive and
the CA3 units are coactive. The term
is analogous to the previous term, but in this
case the change is gated by a DA dip through the term VTA-, and the sign of the synaptic change is negative.
CA3 to CA3 weights are stored through Hebbian learning in this phase. These
associations will drive cortical learning in the negative (passive) phase.
- NAc - (NAC):
(37) (38)

(39)
![]()
The Nucleus Accumbens receives projections from
the hippocampus through the subiculum, the major output pathway of the hippocampus
(for a review, Zahm, 2000). Equations (37,38) are analogous to Equations (28,29).
In fact, the variables h and u are governed by the same dynamics
that were used for DA in the hippocampus (Now-print signal). These equations
force the NAC to emit an output which is a time-derivative of its own
activation. This property enables the NAC to emit an output only when a
substantial, above-threshold variation in its activation occurs. In Equations
(38,39),
and
are two constants.
- NEGATIVE PHASE
In the negative phase, the associations that have been learned in CA3
during the positive phase are “broadcast” to the cortical stages, where they
influence cortico-cortical connectivity. The idea behind this stage is that
associations between events which occurred disjointly in time can now be stored
as associations between cortical activations. Cortico-cortical dynamics are
fast, and occur during normal, on-line performance. Therefore an association
that has been temporarily stored in the hippocampus should be transferred to cortex
in order for it to influence behavior.
This
process occurs in the negative phase. In this model negative phases occur
between positive phases in an interleaved fashion, in particular between
trials. This strategy is used in order to reduce the computational time, and an
alternative scheme can be proposed in which negative phases occur with a
different rhythm. The following equations describe the dynamics of the negative
phase.
- CA3 activation (CA3):
(40)
![]()
where i≠j, i≠k. After setting the activation of all
CA3 neurons equal to 1, the activation of the recurrent network in CA3 is
relaxed for 100 epochs. This procedure allows associations stored in the
recurrent LTM connectivity in CA3 to be recovered in the negative (passive) phase.
(42) (41)
- Cortico-cortical learning (from posterior
to anterior cortex):
(43)

After setting the activation of xi
= yi = CA3i, learning of
cortico-cortical connections is enabled based on
this cortical activation. Equation (41) defines the activation of posterior and
anterior cortex as the rectified average of CA3 activation. The LTM weights
from posterior to anterior cortex
are also normalized in Equation
(43). Initial cortico-cortical weights wc
= 0.33.
2- Second section
The
second section of the model describes the areas which are involved in the
generation and the control of the motor output. After a plan has been
stored in Pfc, the excitatory connections between Pfc and Premotor cortex
enable the execution of the motor plan, gaining access to the final stage of
the system, motor cortex. However, remember that the behavioral task (Figure 3.23)
requires a delay to be interposed between the presentation of the CS and the
actual release of action. The main question is now how the system, which has
learned that a CS is associated to a

Figure 3.23. The task. a) shows the general setting of the task. In b), a reward is
delivered when the response falls within the admissible delay, whereas in c) and d)
a premature or excessively delayed response does not lead to reward.

Figure 3.24. The
areas modeled in the second section.

The
main target of this second set of simulations is to test whether the Basal
Ganglia (BG) complex can help the system cope with the task.
The BG is a collection of subcortical nuclei that are involved in the control
of movement. This term was once used to describe all the large nuclear masses
in the midbrain, including the thalamus, but has become restricted to describe
five of these nuclei that share a similar functional architecture: the caudate,
putamen, globus pallidus, subthalamic nucleus, and substantia nigra. The BG
receive no direct sensory input and send little direct output to the spinal
cord. Rather, their primary input comes from the cerebral cortex and is sent
back to the cortex
via the thalamus, forming various parallel loops. The importance of the BG in the
control of movement first became apparent in clinical studies of patients with
a specific set of movement disorders, which have become known as Parkinson’s
disease and Huntington’s disease. These syndromes rarely included the loss of a
specific motor function, such as the movement of one’s hand, which may be
caused by localized damage in the primary motor cortex. Rather, they appeared
to involve mainly deficits in the general control and initiation of movement.
While early theories viewed the basal ganglia as having only a modulatory effect
on motor control, more recent research has implicated the basal ganglia as
having many important roles in the contextual
analysis of the environment and the use of this information for the formation
and execution of motor programs and other aspects of intelligent behavior
(Houk, 1995). Some of the roles hypothesized include: sensory-motor associative
learning, operant or instrumental conditioning, reinforcement learning,
procedural learning, adaptive timing, temporal order learning, the formation of
temporal sequences of actions, choosing between competing actions, the initiation
of voluntary movement, planning, working memory, and even volition (see Heyder et al. 2004 for a recent review).
Figure 3.24 shows the main cortical and subcortical areas modeled in this second
section. In particular, two cortical areas (Motor and Premotor) were simulated,
in which the Premotor cortex is simply a copy of the activation of Pfc obtained
in the first experiment. It is assumed that positive and negative reward
signals were generated in the first set of experiments (described in Results,
3.7), when the action was performed within the appropriate delay (see Results
for more details on the simulation procedure). The Premotor cortex (PM)
projects both to Motor cortex (M) and to the Basal Ganglia complex (BG), in
particular to the Striatum (Str), the input structure of the BG. The BG
simulated in the model includes three families of inhibitory interneurons: Str
and the internal and external segments of the Globus Pallidus (GPi and GPe,
respectively). GPe, the output nucleus of the BG, projects through inhibitory
connections to the thalamic nuclei that relay to motor cortex (Ventral
Anterior/Ventral Lateral Thalamus, THAL).
A motor output can
be generated by M only when the
PM activation is complemented by THAL, which will be the result of the balance
of inhibition and disinhibition coming from the BG. Since all neurons modeled
in the BG are inhibitory interneurons, the net effect of GPe (the output stage
of the BG) on the THAL will depend on the relative balance of inhibition in the
BG circuit, which can be shifted due to learning.
The BG complex, the key structure proposed here to implement the concept of
control, is characterized by two main pathways which converge on the GPe and
determine the level of inhibition of the THAL and, therefore, the release of
action following a PM input to the striatum (Figure 3.24):
1) The direct pathway (PM → Str → GPe → THAL → M), through which PM excites
Str, which in turn inhibits the GPe and releases the THAL. This pathway’s net
effect is to allow the release of a motor output.
2) The indirect pathway (PM → Str → GPi → GPe → THAL → M), in which the extra
inhibition provided by the GPi inverts the sign of GPe output on THAL. This
pathway’s net effect is to prevent the release of a motor output.
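The net effect of each pathway can be checked by multiplying the signs of its links. The following Python fragment is only an illustrative sketch and not part of the simulated model: the sign table encodes the connections named above, and the excitatory sign given to the THAL → M link is an assumption made here for the illustration.

```python
# Sign of each connection: +1 excitatory, -1 inhibitory.
# PM -> Str is excitatory and all BG projections are inhibitory, as in the text.
# THAL -> M is ASSUMED excitatory for this illustration.
LINK_SIGN = {
    ("PM", "Str"): +1,
    ("Str", "GPe"): -1,
    ("Str", "GPi"): -1,
    ("GPi", "GPe"): -1,
    ("GPe", "THAL"): -1,
    ("THAL", "M"): +1,
}

def net_sign(path):
    """Multiply the signs of consecutive links along a pathway."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= LINK_SIGN[(source, target)]
    return sign

direct = ["PM", "Str", "GPe", "THAL", "M"]
indirect = ["PM", "Str", "GPi", "GPe", "THAL", "M"]

direct_sign = net_sign(direct)      # two inhibitory links: net facilitation
indirect_sign = net_sign(indirect)  # three inhibitory links: net suppression
```

An even number of inhibitory links yields disinhibition (release of the motor output), whereas an odd number yields net inhibition (suppression of the motor output).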
How does the BG complex control the adaptively timed (AT) release of an action
by dynamically shifting the balance between direct and indirect pathways? How
is the AT response shaped by rewards and punishments, and finely tuned so that
the motor output occurs when needed?
The proposed mechanism is analogous to the AT mechanism implemented in the
hippocampus. The Str implements a spectrum of delays triggered by PM,
analogously to what the Dentate did in the hippocampus when stimulated by EC.
PM is implemented as a population of cells which send motor plans to motor
cortex after Pfc activation.
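The idea of a spectrum of delays can be illustrated with a small numerical sketch. The equation used below, dx_n/dt = r_n(-x_n + (1 - x_n) I), is an assumed, generic spectral-timing form chosen for illustration; it is not claimed to be the equation used in the model. The rate values, the threshold and the function names are likewise hypothetical.

```python
import numpy as np

def striatal_spectrum(pm_input, rates, dt=1.0):
    """Euler integration of dx_n/dt = r_n * (-x_n + (1 - x_n) * I(t)).

    ASSUMED form: every cell receives the same PM input I(t) but
    integrates it at its own rate r_n.
    """
    rates = np.asarray(rates, dtype=float)
    x = np.zeros_like(rates)
    trace = np.zeros((len(pm_input), len(rates)))
    for t, inp in enumerate(pm_input):
        x = x + dt * rates * (-x + (1.0 - x) * inp)
        trace[t] = x
    return trace

def threshold_latencies(trace, threshold=0.25):
    """First time step at which each cell reaches the threshold (-1 if never)."""
    crossed = trace >= threshold
    return np.where(crossed.any(axis=0), crossed.argmax(axis=0), -1)

pm_input = np.ones(200)            # sustained PM input (hypothetical)
rates = [0.2, 0.05, 0.0125]        # hypothetical fast, medium and slow cells
latencies = threshold_latencies(striatal_spectrum(pm_input, rates))
```

Cells with smaller rates reach the threshold later, so a single PM event is translated into a set of responses spread over time, one of which can be selected by learning.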
In response to a given Pfc plan, PM generates different outputs at different
times. Some of these actions are appropriately timed to obtain a reward, some
are not. The main functional role of the BG is therefore to inhibit those motor
plans generated in PM which lead to non-reward, while at the same time allowing
those that are appropriately timed. The BG achieves this result by selectively
closing the THAL gate from PM to M through inhibition.
PM projects to the BG through the Str. The output of the Str is then conveyed
to the GPi and the GPe. When an action is generated, the outcome can be either
positive (reward) or negative (punishment). Reward and punishment are coded by
a DAergic spike and dip, respectively. These signals have opposite effects on
the direct and indirect pathways:
a) A DAergic spike causes LTM strengthening of Str → GPe inhibition, and the
consequent release of the THAL, with release of motor output.
b) A DAergic dip causes LTM strengthening of Str → GPi inhibition, which will
result in a GPe disinhibition and a net inhibition of the THAL, with
suppression of the motor output.
A physiological justification of the differential effects of DA on synaptic
plasticity in the BG can be found in Brown, Bullock and Grossberg (1999).
The basic mechanism of the BG would therefore be to inhibit those motor plans
generated in PM which lead to punishment (or non-reward). Below, the model
components are explained in detail.
For simplicity, the PM stage is coded by a population of cells that are active
at different times after a Pfc stimulation (a plan) has been generated.

Figure 3.25. The
relationship between Pfc and PM.
PM activation is then broadcast to M and Str.
(44)
The Str is composed of two subsections, which can be considered analogous to
the “patch” and the “matrix” of the Striatum (Brown, Bullock and Grossberg,
1999). The section of the Str projecting to GPe is tonically active (= 0.9),
providing a “default” inhibition to the GPe, which in turn disinhibits the THAL
and allows a motor output to be generated. The section of the Str projecting to
the GPi is instead governed by an AT mechanism of the sort described in the
Dentate of the hippocampus. The activation of the Str obeys AT dynamics:
![]()
(45)
where
is the nth variable that scales the activation in Str_n (analogously to
Equation 32), determining the rate at which Str_n grows and decays after
stimulation from PM. The Str is stimulated by the PM, and in turn projects to
GPi. The neurotransmitter from Str to GPi is modeled as:
![]()
(46)
![]()
where
is the accumulation process of the
neurotransmitter towards 1, and
is the depletion of the neurotransmitter which
is proportional to the pre-synaptic activity gated by the transmitter itself.
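The transmitter dynamics described in words above can be sketched numerically. The fragment below assumes the form dz/dt = a(1 - z) - b S z, which matches the verbal description (accumulation towards 1, depletion proportional to the pre-synaptic activity S gated by the transmitter z); the rate constants a and b and the input are hypothetical values chosen only for illustration.

```python
def transmitter_trace(signal, recovery=0.01, depletion=0.1, dt=1.0):
    """Euler integration of dz/dt = recovery*(1 - z) - depletion*S*z.

    The rate constants are ASSUMED values, not the parameters of the model.
    """
    z = 1.0  # transmitter starts fully accumulated
    trace = []
    for s in signal:
        z += dt * (recovery * (1.0 - z) - depletion * s * z)
        trace.append(z)
    return trace

# Pre-synaptic activity on for 100 steps, then off for 300 steps.
signal = [1.0] * 100 + [0.0] * 300
trace = transmitter_trace(signal)
```

While the pre-synaptic cell is active the transmitter is depleted, and it slowly recovers once the activity stops, so that the gated signal S·z is transient even when S is sustained.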
(47)
where
is a Hebbian term which increases the LTM weight when the Str is active and a
DAergic spike occurs, and decreases it when a dip occurs. Finally, the
activation of GPi obeys the following equation:
![]()
This set of equations ensures that a DAergic spike (dip) occurring at an
arbitrary time after a Str neuron was excited by PM reinforces (weakens) the
Str → GPi pathway, with the net result of opening (closing) the THAL gate and
allowing the release (blockage) of motor output.
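The logic of the dopamine-gated Hebbian term can be illustrated with a minimal sketch. The update rule below is an assumption made for illustration: the DA signal is expressed relative to a baseline, so that a spike is positive and a dip is negative, and the learning rate, bounds and function name are hypothetical.

```python
def update_ltm(weight, str_signal, da, baseline=0.0, lr=0.05, w_max=1.0):
    """ASSUMED rule: dw = lr * str_signal * (da - baseline), clipped to [0, w_max].

    str_signal is the (transmitter-gated) Str activity; da above baseline is a
    DAergic spike, da below baseline is a dip.
    """
    new_weight = weight + lr * str_signal * (da - baseline)
    return min(max(new_weight, 0.0), w_max)

w_after_spike = update_ltm(0.5, str_signal=1.0, da=1.0)    # weight grows
w_after_dip = update_ltm(0.5, str_signal=1.0, da=-1.0)     # weight shrinks
w_inactive = update_ltm(0.5, str_signal=0.0, da=1.0)       # no change
```

Because the change is proportional to the Str signal, only those Str cells that are still active when the DAergic event arrives have their weights modified.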
(48)
![]()
The GPe, the output nucleus of the BG, is tonically active, tonically
inhibited by
, and phasically inhibited by GPi.
Thalamic activation is simply defined by:
(49)
![]()
where
=0.1. THAL will be normally open
unless GPe is excited, which will in turn depend on internal BG dynamics.
Finally, motor cortex, the output stage of the system, is equal to:
(50)
![]()
This simple equation states that M will be active whenever 1) a PM
activity is generated and 2) the THAL
gate is open.
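The gating role of the thalamus can be sketched as follows. The functional forms below (a rectified difference for THAL and a product for M) are assumptions made for illustration and are not a reconstruction of the equations of the model; the tonic level and the function names are hypothetical.

```python
def rectify(x):
    """Half-wave rectification: negative values are set to zero."""
    return max(x, 0.0)

def thal_gate(gpe, tonic=1.0):
    """ASSUMED form: THAL is open by default and closed by GPe inhibition."""
    return rectify(tonic - gpe)

def motor_output(pm, thal):
    """ASSUMED form: M is active only if PM is active AND the gate is open."""
    return pm * thal

m_open = motor_output(1.0, thal_gate(0.0))     # PM active, gate open
m_closed = motor_output(1.0, thal_gate(1.5))   # PM active, gate closed by GPe
m_no_plan = motor_output(0.0, thal_gate(0.0))  # gate open, but no PM activity
```

A multiplicative gate is one simple way of expressing the two conditions stated above: either a silent PM or a closed THAL gate is sufficient to suppress the motor output.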
3.7 Simulation results
The
following set of simulations is aimed at testing the ability of the network
described in the previous paragraphs to autonomously
learn a contingency between a CS and a

Figure 3.26: The network is presented
with 20 CS-US pairings, followed by 30 CS-alone presentations (extinction trials).
In the training trials, the CS is on from 1-50 msec, US on from 150 to 200 msec
(ITI = 400 msec). In the extinction trials, the CS is presented for 50 msec,
with ITI = 550 msec. Each trial therefore lasts 600 msec in both learning and
extinction trials. Left column, from top to bottom: CS (blue) and US (red)
presentation; weight from Pfc to VTA; VTA (blue) and NAC (red) activation, NAc
activation alone (notice different scale); right column, from top to bottom:
posterior cortex activation (blue = CS, red = US), anterior cortex (CS1 = blue,
CS2 = magenta, US = red), weight between anterior and posterior cortex (only
weights from x1→y1, x1→y3 and x3→y3 are shown), CA3
activation, CA3 weights. In the figure, W 1→1, 1→3, 3→1 and 3→3 represent
synaptic weights between the respective CA3 neurons.
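The stimulation protocol described in the caption of Figure 3.26 can be written down as input time series. The fragment below is a sketch under stated assumptions: a resolution of one sample per msec, half-open sample intervals (samples 0-49 for the CS and 150-199 for the US), and hypothetical function names.

```python
import numpy as np

TRIAL_MS = 600  # each trial lasts 600 msec in both learning and extinction

def make_trial(with_us):
    """Return (cs, us) input traces for one trial at 1 sample per msec."""
    cs = np.zeros(TRIAL_MS)
    us = np.zeros(TRIAL_MS)
    cs[0:50] = 1.0          # CS on for the first 50 msec
    if with_us:
        us[150:200] = 1.0   # US on from 150 to 200 msec, then ITI of 400 msec
    return cs, us

def make_protocol(n_pairings=20, n_extinction=30):
    """20 CS-US pairings followed by 30 CS-alone (extinction) trials."""
    trials = [make_trial(True)] * n_pairings + [make_trial(False)] * n_extinction
    cs = np.concatenate([trial[0] for trial in trials])
    us = np.concatenate([trial[1] for trial in trials])
    return cs, us

cs, us = make_protocol()
```

With these values the whole session spans 50 trials of 600 msec each, and the US trace is silent throughout the extinction phase.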
The
experimental protocol is split into two sets of simulations. First, the network is
presented with 20 CS-US pairings, followed by 30 CS-alone presentations
(extinction trials). In the training trials, the CS is presented from 1-50
msec, and the
As can be seen from the activation
in anterior cortex, STM activation gradually develops as the learning trials
proceed, analogously to what has been shown in the model presented in Chapter 2.
The difference with that model is that, this time, the STM maintenance is
autonomously achieved. In the previous model, instead, the VTA activation was
artificially shaped in order to achieve STM storage and update. The activation
of anterior cortex is shown in Figure 3.27.

Figure 3.27: From top to bottom: input,
y1, y2 and y3 activation (anterior cortex, or Pfc). Notice that y2, corresponding
to the (non-presented) CS2, is silent.
STM storage is achieved as a
function of the dynamics of VTA. Due to the highly recurrent architecture of
the system, which parallels the characteristics of biological neural systems, it
is difficult to isolate what specific system or subsystem is “responsible” for
a given functional property.

Figure 3.28: From top to
bottom: input, VTA activation, NAC activation, net (rectified) VTA output.
Figure 3.26
shows how the LTM weights from Pfc to VTA grow during CS-US pairings. This
plasticity allows the CS to learn to excite Pfc when the CS itself is presented.
This is due to the fact that synaptic plasticity is originally triggered by the
STM is both learned (Figure 3.29 and
3.30) and extinguished (Figure 3.31). The mechanism of extinction can be easily
related to the influence of the GABAergic projection of NAc to VTA. The NAc
receives the AT output of hippocampus which is timed to the
In extinction, the output of the NAc
is unopposed by the

Figure 3.29: Typical run in early
training. From top to bottom: input (CS1 = blue, US = red), posterior cortex
activation (x1(CS1) = blue, x2(US) = red), anterior cortex activation (y1(CS1)
= blue, y2(US) = red), VTA activation (blue), NAC activation (red). Notice the
STM activity in anterior cortex, and the
learning.
Notice, however, that with the experimental protocol and the parameter choice
used here the LTM weight from Pfc to VTA does not decay completely. This
observation may be consistent with the introspective experience that, once a
stimulus has been conditioned to a given US (pleasurable/noxious stimulus), its
“psychological” status is permanently changed, even though the output of the
system does not necessarily reflect this permanent change.
Figure 3.32 shows an interesting
comparison between the model VTA and the DAergic cell discharge profile
obtained by Schultz et al. (1997) from VTA. When a

Figure 3.30: Details of learning trial.
From top to bottom: input (CS1 = blue, US = red), posterior cortex activation
(x1(CS1) = blue, x2(US) = red), anterior cortex activation, (y1(CS1) = blue,
y2(US) = red), VTA activation (blue),
NAC activation (red). Notice the STM maintenance in anterior cortex and the
adaptively timed activation of NAC that provides inhibition of VTA at the time
of

Figure 3.31: Late extinction trials. From
top to bottom: input (CS1 = blue), posterior cortex activation (x1(CS1) =
blue), anterior cortex activation, (y1(CS1) = blue, y2(US) = red), VTA activation (blue), NAC activation (red).
STM maintenance is greatly suppressed (except in one trial), as well as VTA activation.

Figure 3.32: Comparison of the model VTA activation with DAergic cell discharge profile
obtained by Schultz et al. (1997). When a
Interestingly,
in trials when only the CS is presented (i.e., extinction trials), a dip in the
The second set of experiments tests how the motor-BG aspects of the model are
intimately related to the concept of cognitive control. In the first set of
simulations we have seen how a plan can be stored in Pfc. As a consequence, an
action can now be triggered by frontal cortex through the connections from Pfc
to primary and secondary Motor cortex. Remember, however, that the behavioral
task (Figure 3.23) requires an adaptively timed release of action. The main
rationale behind this second set of experiments is illustrated in Figure 3.34.
The main idea is that activation of a Pfc plan triggers a spectrum of PM
activations (not explicitly modeled in the system) at different times after Pfc
fires. Learning through the BG selectively weakens the output of those PM cells
which are not appropriately timed with the “admissible” temporal range for
performing an action, while allowing the performance of the actions which are
appropriately timed. Figure 3.35 shows an example of how the BG circuit can
selectively weaken a PM representation which is not appropriately timed in
order to obtain a reward. In Figure 3.34 the negative reward (DAergic dip),
generated in the way seen in the first set of experiments, weakens the motor
output consequent to that specific PM activation, as shown by the dip in Str
activation. Figure 3.35 shows the behavior of the main areas involved. This
situation should be contrasted with the one described in Figures 3.36 and 3.37,
in which a reinforcement of the pathway leading to the correct response is
shown. Figure 3.37 shows a PM response that is adaptively timed to the reward.
This causes M to release the action, and LTM changes that allow opening of the
THAL gate for this specific PM activation.

Figure 3.34. a) Activation of a Pfc plan triggers PM activation at different times. b)
Learning through the BG circuit selectively weakens the output of those PM cells
which are not appropriately timed within the “admissible” range for performing an
action.

Figure 3.34. From top-left to bottom-right: PM activation; reward; activation of
spectra in Str; transmitter from Str to GPi; Str*transmitter gated signal;
Str*transmitter*LTM from Str to GPi doubly gated signal; net output from Str to
GPi.

Figure 3.35. Left column, from top to bottom: M activation; Str activation; GPe; Right
column, from top to bottom: THAL activation; GPi activation; PM activation. Motor
cortex is inactive, and the action is not released.

Figure 3.36. From top-left to bottom-right: PM activation; reward; activation of
spectra in Str; transmitter from Str to GPi; Str*transmitter gated signal;
Str*transmitter*LTM from Str to GPi doubly gated signal; net output from Str to
GPi.

Figure 3.37. Left column, from top to bottom: M activation; Str activation; GPe; Right
column, from top to bottom: THAL activation; GPi activation; PM activation. Motor
cortex is active, and the action is released.
These simulations show how a mechanism similar to the one implemented in the
hippocampus, namely AT, can be used in a context fairly different from
learning contingencies between CS and US. In this instantiation of the AT
mechanism, the contingencies to be learned are between a) a motor response and
b) the consequence of the motor response. Since the consequence of a motor
response can occur at an undetermined future moment after the motor action has
been produced, a mechanism for bridging this temporal gap is required. The AT
mechanism can therefore be seen as a general purpose machinery to solve a
correspondingly general problem, namely bridging temporal gaps between neural
representations.