Chapter 3: From elementary learning to cognitive control: a neurocomputational perspective

 

3.1  From classical conditioning to cognitive control

 

As was pointed out in Chapter 2, Pfc has been shown to be involved in tasks that use a delayed-response paradigm, in which performance depends on the animal's ability to maintain information over a delay in order to release a response at a later point in time. The model presented so far has shown how a field of Pfc neurons can store a given STM representation while facing several distractors interleaved between the to-be-stored cue and the release of the action. However, this model, like most of the models described in the literature, does not address how the organism learns to maintain relevant cues in STM, namely how cognitive control develops as an emergent property of a biological system. In particular, the relevant questions are:

1)      How does an animal learn the DAT (or another task that imposes an STM load) in the first place?

2)      How is the DA response calibrated in order to allow robust STM maintenance in Pfc?

3)      How does Pfc select the relevant cues and the relevant responses that are associated with the reward?

In order to address these questions, let's consider in detail how a DAT is structured, as an example of a rather complex task that involves, among other things, “cognitive control”, STM, response selection and behavioral inhibition. When a monkey first has to learn a DAT, it engages in exploratory behavior, which the experimenter constrains to the apparatus and cues relevant for the task. In the DAT, the monkey has to hold a lever for a given time (5 seconds in the example). At the end of this delay period, a go-signal is turned on (both lights of the panel are turned on), and the monkey should press the panel opposite to the one it selected in the preceding trial (Figure 3.1).

 

 

Figure 3.1. Structure of the delayed alternation task. Successive go-signals (simultaneous appearance of the two circles, indicated by arrows) require alternating between two responses (right and left) separated by a delay of 5 s. To perform the task, a representation (or “trace”) of the previous response must be preserved so that the next response can be selected correctly.

This go-signal is not informative of the required response, since it is identical for both responses. The only way the monkey can perform the task correctly is to hold the previous response in STM and to use this information to select the appropriate panel.
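The claim that an uninformative go-signal forces the animal to rely on STM can be checked with a toy simulation. The following is a minimal sketch under stated assumptions: trials are discrete, the reward rule is simply "choose the side opposite to the previous choice", and the names `run_dat`, `memoryless` and `with_stm` are hypothetical, not part of the model developed in this chapter.

```python
import random


def run_dat(policy, n_trials=1000, seed=0):
    """Fraction of rewarded trials in a toy delayed alternation task (DAT).

    The go-signal is identical on every trial, so the only information that
    distinguishes the correct panel is the response emitted on the previous
    trial. `policy(previous, rng)` returns "L" or "R".
    """
    rng = random.Random(seed)
    previous = rng.choice("LR")  # response on an initial, unscored trial
    correct = 0
    for _ in range(n_trials):
        target = "R" if previous == "L" else "L"  # reward requires alternation
        choice = policy(previous, rng)
        correct += choice == target
        previous = choice  # the trace that must survive the 5 s delay
    return correct / n_trials


def memoryless(previous, rng):
    # Ignores the trace: the go-signal alone cannot disambiguate the response.
    return rng.choice("LR")


def with_stm(previous, rng):
    # Holds the previous response in STM and selects the opposite panel.
    return "R" if previous == "L" else "L"


print(run_dat(with_stm))    # every trial rewarded
print(run_dat(memoryless))  # close to chance
```

Because the go-signal carries no information, any policy that ignores the stored response cannot do better than chance on average; only the policy that reads the trace solves the task.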

From the ecological point of view, this task is rather complex and incorporates several hard problems. First of all, at the time the reward is delivered after the monkey presses the correct key, several stimuli are present in the environment, and several responses have been emitted by the animal. How does the animal learn the contingency between stimuli, its behavior, and the reward? How does the animal learn that the reward is a function of a) the previous response and b) holding the bar for a given delay?

Once the proper set of cues and responses has been found to be causally related to the reward, how does the animal learn to use the cue to control its behavior and guide it toward obtaining the reward? The task is even harder when a delay is interposed between cue, behavioral response and reward, and when other stimuli intervene between the time the cue is presented and the time the reward is delivered.

As we have seen in the simulation, if Pfc activation is continuously updated by bottom-up (BU) input and buffered from distortion (STM storage) only when DAergic activation occurs, then the question arises of how Pfc ever learns a trace of previous cues and responses. If, as discussed in the previous chapter, Pfc is normally vulnerable to interference unless DAergic gating occurs, then Pfc could only preserve the most recent patterns of activation, namely those present at the time the unconditioned stimulus (US), or reward, occurred. But in the DAT the significant response (selecting the opposite target) is produced several seconds before the reward is delivered, and many task-irrelevant stimuli and responses can be interposed between the relevant cue/action and the reward. How does the animal learn what is relevant for obtaining the reward? How does the brain solve the paradox of being unable to store Pfc representations unless they are followed by a DAergic response, while at the same time learning contingencies between stimuli that were present (and have probably decayed) long before the reward was delivered?
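The paradox can be made concrete with a deliberately simplified gated buffer. In this hypothetical sketch (the function `gated_pfc`, the discrete time steps and the pattern labels are assumptions, not the thesis model), Pfc activity is overwritten by bottom-up input at every step unless a DA signal locks it; if DA arrives only with the reward, the buffer ends up holding whatever was present at reward time rather than the cue.

```python
def gated_pfc(inputs, da):
    """Content of a toy Pfc buffer at the end of a trial.

    inputs: bottom-up pattern present at each time step.
    da:     whether a DAergic gating signal occurs at each step.
    Activity follows the input until the first DA signal, which buffers
    (locks) the pattern present at that moment.
    """
    state = None
    locked = False
    for pattern, gate in zip(inputs, da):
        if not locked:
            state = pattern  # unprotected activity is overwritten by BU input
        if gate:
            locked = True    # DA protects the current pattern from interference
    return state


trial = ["cue", "distractor", "lever press", "reward context"]
print(gated_pfc(trial, [False, False, False, True]))  # -> reward context
print(gated_pfc(trial, [True, False, False, False]))  # -> cue
```

The second call shows what would be required for the cue to be stored: a DA response already at cue onset, which is exactly what a naive animal does not yet have.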

As can be seen, the DAT is rather complex and involves both classical (cues) and operant (responses) conditioning. In order to build a real-time, ecological model of how cognitive control can develop, we will consider a simpler task.

 

 

 

 

 

3.2  The linkage between cognitive control and elementary learning

 

            In order to understand more complex forms of conditioning, we will consider experimental settings that incorporate a) a delay between a stimulus, a response and a reward, and b) the control of an otherwise prepotent response pattern by higher-order areas, proposed to be the neural substrate of “cognitive control”.

Figure 3.2 illustrates a typical experimental setting that mixes operant and classical conditioning. In this hypothetical experiment, a reward is delivered whenever a response (e.g., a lever press) follows the presentation of a cue. In this simple example, the presentation of the cue (CS2) must be followed by a response (RESP1), and the reward is delivered if the rat presses the lever within a given, long interval following cue onset. The temporal relationship between cue and response is important, since in this protocol no reward is delivered if the response precedes the cue (RESP1 occurs before CS2).

Figure 3.2. In an ecological setting, as well as in some experimental situations, the animal is exposed to several stimuli (CSn in the figure), and typically emits some sort of response during its normal activity (exploratory behavior, etc.). When a reward is unexpectedly delivered following a given cue (CS, classical conditioning), a response (operant conditioning), or a combination of cue and response, the animal has to solve a very difficult problem in order to obtain the reward again, namely causally linking cue, behavior and reward.

Even in this apparently simple example, the animal has to “solve” several problems in order to learn that a given cue, followed by a given response, constitutes the crucial contingency. How does the animal discard all other cues and responses and finally select the ones that are causally related to the reward? How does the animal learn that the temporal order of cue and response is crucial for obtaining the reward? Finally, how does the organism prevent the premature release of the action, which would prevent it from obtaining the reward, and instead release the response after an appropriate interval?

To reduce the complexity of the “ecological” task shown above even further, we will consider a variation of trace conditioning, which is the simplest case of classical conditioning in which STM storage of a cue is required. The task used will actually be a hybrid between trace conditioning and operant conditioning, since a motor output will be required of the animal. Furthermore, this response should occur within a given time window following the CS, therefore requiring an adaptively timed calibration of the motor output.
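The reward contingencies described above (a response that must follow the cue, and in the hybrid task must also fall within a time window) can be written as a single rule. This is a hypothetical sketch: the function name `rewarded` and the `min_lag`/`max_lag` values are illustrative assumptions, not parameters taken from any specific protocol.

```python
def rewarded(cue_onset, response_time, min_lag=0.0, max_lag=10.0):
    """Toy reward contingency for a cue-response protocol.

    A response is rewarded only if it follows the cue, no earlier than
    `min_lag` and no later than `max_lag` seconds after cue onset.
    The default values are hypothetical.
    """
    lag = response_time - cue_onset
    return min_lag <= lag <= max_lag


print(rewarded(cue_onset=2.0, response_time=5.0))  # True: response follows cue
print(rewarded(cue_onset=5.0, response_time=2.0))  # False: response precedes cue
print(rewarded(2.0, 3.0, min_lag=2.0, max_lag=4.0))  # False: premature release
```

With `min_lag` left at zero the rule corresponds to the Figure 3.2 protocol; a positive `min_lag` adds the requirement of withholding the response, which is where the inhibition of premature actions becomes necessary.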

In trace conditioning, a cue is followed by a US, but a delay is interposed between the offset of the cue and the onset of the US. This temporal pattern requires some sort of STM, or internal trace, to bridge the temporal gap. Trace conditioning is one of the best-studied conditioning paradigms; it is shown together with related paradigms in Figure 3.3. Conditioning is usually weaker in trace conditioning than in delay conditioning, in which CS and US partially overlap in time. We'll see how even this apparently simple behavioral paradigm requires unexpectedly complicated neural machinery.

Figure 3.3. Conditioning paradigms: delay, trace, simultaneous, and backward conditioning.
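The four paradigms of Figure 3.3 differ only in the relative timing of CS and US. The sketch below encodes hypothetical timings for each (the values and the names `PARADIGMS`, `overlap` and `trace_gap` are illustrative assumptions) and computes how long the two stimuli co-occur and how long an internal trace would have to bridge between CS offset and US onset.

```python
# Hypothetical CS/US timings in seconds: (cs_on, cs_off, us_on, us_off).
PARADIGMS = {
    "delay": (0.0, 1.0, 0.8, 1.0),         # US overlaps and co-terminates with CS
    "trace": (0.0, 0.5, 1.0, 1.1),         # empty interval between CS and US
    "simultaneous": (0.0, 0.5, 0.0, 0.5),  # CS and US on and off together
    "backward": (1.0, 1.5, 0.0, 0.5),      # US precedes CS
}


def overlap(cs_on, cs_off, us_on, us_off):
    """Time during which CS and US are physically present together."""
    return max(0.0, min(cs_off, us_off) - max(cs_on, us_on))


def trace_gap(cs_on, cs_off, us_on, us_off):
    """Empty interval between CS offset and a later US onset (0 if none)."""
    return max(0.0, us_on - cs_off)


for name, timing in PARADIGMS.items():
    print(f"{name:12s} overlap={overlap(*timing):.2f} s  gap={trace_gap(*timing):.2f} s")
```

Only the trace paradigm combines zero overlap with a positive gap. In backward conditioning the stimuli do not overlap either, but the US comes first, so there is no forward interval for a CS trace to bridge.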

3.3  Bridging the temporal gap between representations

 

Let's imagine a simple system (Figure 3.4) with three fields of neurons: a “posterior” (input), an “anterior” (control) and a motor (output) field, intended to mimic a sensory, a prefrontal and a motor cortical area, respectively. Let's also imagine that a fourth, subcortical area receives homeostatic signals from the simulated organism; this is a different source of input, related not to the environment but to internal physiological variables. This is the US pathway of the model: it is active whenever a primary US is delivered (food, shock, etc.), and in this first simplified version of the model it corresponds to the VTA, the main source of DA for Pfc, which receives afferents from the amygdala and hypothalamus, two areas linked to the autonomic nervous system.

 

Figure 3.4. The model. The variables described in the equations are shown in the figure. The motor cortex corresponds to the output stage of the model, and an anterior cortex is interposed between the posterior (input) cortex and the output. In this simplified model, the US reaches the subcortical stage (VTA) directly from homeostatic receptors, and a signal r (DA) is delivered to Pfc through modulatory projections.

 

The field of anterior cortical neurons is the same as the one simulated in the first set of experiments, which is proposed to be the analogue of Pfc, and includes two neural species (excitatory pyramidal cells and inhibitory interneurons). Unlike the first set of simulations, the external input is now modeled by another population of posterior units, which also includes two neural species (pyramidal cells and inhibitory interneurons). Anterior units receive projections from the dopaminergic neuron and are equipped with self-excitatory connections in their pyramidal field. The DA unit, in turn, receives primary reward signals and projects to the anterior system. In the model, this field of neurons corresponds to the VTA. A more detailed explanation of the neurobiological foundation of the model will be given later in this chapter, when the final model is described; it is omitted here in order to highlight how computational, behavioral and theoretical constraints, rather than the need to match biological data, are the main guide in specifying the characteristics of the model (Swanson, 1982; Oades and Halliday, 1997; Floresco and Grace, 2003).

The following equations define the activation of posterior (x), anterior (y) and motor (w) pyramidal neurons, posterior (m), anterior (n) and motor (o) inhibitory interneurons, VTA units (r), and the signal function f(h) used in the recurrent portion of the activation. Pyramidal neurons of the posterior cortex (x_i) project to the anterior cortex (y_j) through adaptive, modifiable connections z_ij:

(9)

 

(10)

 

(11)

 

(8)

 

 

(12)

 

(13)

 

(14)

 

(15)

 

 


where I_i is the bottom-up input to the cell, is the self-excitatory input,  are the recurrent excitatory and inhibitory inputs, A, B and C are the decay rate, the excitatory saturation point and the inhibitory saturation point, respectively, and f(h) is the feedback function defined by Equation (7), where h is the argument of the function and F is a constant. In Equation (5), 0 ≤ DA ≤ 1.
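The where-clause above describes the standard shunting form, with a decay rate A, an excitatory bound B and an inhibitory bound C. Without relying on the specific form of Equations (8)-(15), the following is a generic sketch of a single shunting unit with sigmoid self-excitation, integrated with the Euler method. The sigmoid f(h) = h²/(F + h²) and all parameter values are assumptions, not the equations of this model. The sketch illustrates two properties relevant here: activity stays between -C and B, and a sufficiently strong transient input leaves persistent (STM) activity after the input is removed, whereas a weak input does not.

```python
def f(h, F=0.25):
    """Hypothetical sigmoid feedback signal (assumed form, not Equation (7))."""
    h = max(h, 0.0)
    return h * h / (F + h * h)


def shunting_step(x, excitation, inhibition, A=1.0, B=1.0, C=0.5, dt=0.01):
    """One Euler step of a generic shunting equation:
        dx/dt = -A*x + (B - x)*excitation - (C + x)*inhibition
    With non-negative inputs and a small dt, x stays within [-C, B].
    """
    dx = -A * x + (B - x) * excitation - (C + x) * inhibition
    return x + dt * dx


def simulate(I, w_self=2.0, inhibition=0.0, steps=2000):
    """Drive the unit with input I for the first quarter of the run,
    then remove it and let recurrent self-excitation act alone."""
    x, trace = 0.0, []
    for t in range(steps):
        bottom_up = I if t < steps // 4 else 0.0
        x = shunting_step(x, bottom_up + w_self * f(x), inhibition)
        trace.append(x)
    return trace


strong, weak = simulate(I=1.0), simulate(I=0.02)
print(round(strong[-1], 3))  # reverberation persists after input offset
print(round(weak[-1], 3))    # activity decays back to rest
```

With these hypothetical parameters the self-excitatory loop has a stable resting state at 0, an unstable threshold near 1/6 and a stable reverberating state at 0.5; only inputs that push x past the threshold are retained.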

In the above equations, all terms are the same as those used in the first set of simulations, with the exception of the term REWARD, which is 1 only when the reward is delivered and 0 otherwise. The unit r has leaky-integrator dynamics (a differential equation with constant increments and variable leakage) and broadcasts r to Pfc neurons, where it replaces the DA term of the previous simulations. The adaptive connections from the posterior to the anterior system follow the outstar learning rule (Grossberg, 1982), which is basically a variant of Hebbian learning with a decay, self-normalizing term. The outstar learning rule has been shown to keep synaptic weights bounded and to converge to a solution in which the pattern of synaptic weights tracks the postsynaptic activation (Grossberg, 1982).
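Since the learning equation is not reproduced in the text, the following sketch uses a commonly cited discrete-time form of the outstar rule, in which the change of each weight z_ij is gated by presynaptic activity x_i and proportional to (y_j - z_ij). The learning rate `eps`, the function name `outstar_step` and the activity patterns are hypothetical. The sketch shows the two properties mentioned above: the weights stay bounded and converge to the postsynaptic pattern, and no learning occurs while the presynaptic cell is silent.

```python
def outstar_step(z, x_i, y, eps=0.1):
    """One discrete-time step of an outstar-style learning rule for the
    weights z[j] from source (presynaptic) cell i to target cells j:

        dz_ij = eps * x_i * (y_j - z_ij)

    Learning is gated by presynaptic activity x_i, and the -z_ij decay term
    keeps the weights bounded while they track the postsynaptic pattern y.
    """
    return [z_j + eps * x_i * (y_j - z_j) for z_j, y_j in zip(z, y)]


y = [0.2, 0.5, 0.3]   # hypothetical postsynaptic (anterior) activity pattern
z = [0.0, 0.0, 0.0]
for _ in range(200):
    z = outstar_step(z, x_i=1.0, y=y)
print([round(w, 3) for w in z])   # weights approach y

z_silent = outstar_step([0.9, 0.1, 0.4], x_i=0.0, y=y)
print(z_silent)                   # no presynaptic activity: weights unchanged
```

As long as `eps * x_i` does not exceed 1, each weight moves a fraction of the way toward its target on every step, so it never overshoots the range spanned by its initial value and the postsynaptic activity.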

Figure 3.5. The system depicted in Figure 3.4 cannot learn a trace-conditioning paradigm. Note that the trace z_ij between the posterior and anterior systems can be reinforced up to a certain extent, but no STM maintenance will survive, since Pfc activity will already have decayed by the time the DA activation that would allow Pfc reverberation is delivered.

 

The system described by these equations is not able to learn a task in which a sufficiently long delay is interposed between CS and US, as in trace conditioning. In fact, no sensory, motor or prefrontal trace will be available to be paired with the US activation, which in the model enters the system through the VTA alone. This feature of the model is illustrated in Figure 3.5. As can be seen, long delays prevent a temporal overlap between the sensory and Pfc traces. Pfc would therefore not learn to store an activation in STM and would not activate the motor output. Furthermore, this system, as designed, would not account for the fact that, in certain experimental paradigms, a response must be timed to the US, or is instrumental in obtaining or avoiding the US. A possible way to overcome this problem is to use an STM, recurrent stage, a methodology employed by Grossberg in many models (Grossberg, 1982; Grossberg and Schmajuk, 1989; Grossberg and Merrill, 1992, 1996). This strategy, however, has the same problem as the model presented in the first part of this thesis, namely a chicken-and-egg problem: STM storage is postulated to be a characteristic of learned CSs, yet a CS can only be learned if its representation is stored in STM before it has ever been paired with a US.
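A toy calculation makes the failure illustrated in Figure 3.5 explicit. The sketch assumes simple exponential (leaky) dynamics for a single Pfc unit y and for the VTA signal r, with hypothetical time constants and durations; it is not a reproduction of Equations (8)-(15). The function name `peak_gating` is illustrative. With no recurrent STM, the Pfc activity still available for DA gating when the reward arrives falls off exponentially with the trace interval.

```python
def peak_gating(trace_interval, cs_duration=0.5, us_duration=0.1,
                tau_pfc=0.3, tau_da=0.2, dt=0.001, t_max=10.0):
    """Peak of the product y*r between a leaky Pfc unit y and a DA signal r.

    Toy version of the Figure 3.4 architecture without recurrent STM: y is
    driven only while the CS is on and otherwise decays; r is a leaky
    integrator driven by REWARD, which is 1 only during the US.
    A near-zero peak means that no Pfc activity is left for DA to gate.
    """
    us_on = cs_duration + trace_interval
    y = r = peak = 0.0
    for step in range(int(t_max / dt)):
        t = step * dt
        cs = 1.0 if t < cs_duration else 0.0
        reward = 1.0 if us_on <= t < us_on + us_duration else 0.0
        y += dt * (cs - y) / tau_pfc      # relaxes toward the CS input
        r += dt * (reward - r) / tau_da   # relaxes toward REWARD
        peak = max(peak, y * r)
    return peak


for gap in (0.0, 0.5, 1.0, 3.0):
    print(f"trace interval {gap:.1f} s -> peak y*r = {peak_gating(gap):.5f}")
```

Lengthening the Pfc time constant only postpones the problem; bridging intervals of several seconds requires an active maintenance mechanism, which is precisely what the untrained system cannot yet recruit.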

Another issue with the model of Figure 3.4 is that it does not allow the motor output to be prevented from releasing a prepotent, sensory-driven response at the time the sensory cue is presented. Again, this basic property can be considered one of the main features of cognitive control. In summary, the system of Figure 3.4 is insufficient to explain the basic target phenomena of trace conditioning. The failure of this model to account for the target behavior justifies the burden of expanding the complexity of the model by several orders of magnitude.

In the following section, a candidate model will be presented that copes with an elementary task involving cognitive control. Before presenting the outline of the model, a fundamental issue should be further investigated, namely how the temporal gap between the CS and the US can be bridged. In particular, the candidate biological structures will be explored.

 

3.4  Synchronizing asynchronous events: the role of hippocampus in learning

 

There is consistent evidence for the involvement of the hippocampus in learning and memory in general, and in conditioning in particular (for recent reviews, see O'Reilly and Norman, 2002; Sander, Wiltgen and Fanselow, 2003; Knierim, 2003). Importantly, the hippocampus is involved in trace but not in delay conditioning, which emphasizes its importance in those experimental paradigms where an STM representation of the stimulus is required (Huerta et al., 2000; McEchron et al., 1998; Anderson and Steinmetz, 1994; Solomon et al., 1986). In non-human primates, lesions of the hippocampus and the amygdala produced memory deficits in the delayed non-matching-to-sample task (Mishkin, 1978), a task that requires both cognitive control (selecting the non-preponderant response) and trace conditioning (STM storage of activation).

The hippocampal pathway begins in the entorhinal cortex (EC) and passes first to the dentate gyrus via the perforant pathway (PP), then along the mossy fibers to area CA3 (Figure 3.6). From CA3, projections reach area CA1 via the Schaffer collaterals, then the subiculum, and finally return to the EC, which forms the majority of connections to and from the cortex. The information that reaches the hippocampus through the perirhinal cortex and EC comes from the highest integrative cortices, namely secondary and associative areas of the posterior and anterior neocortex. EC neurons respond to stimuli with highly differentiated, phasic patterns. Direct stimulation of the perforant path (PP) is more effective in CA1 than in CA3. Repeated PP stimulation leads to an increase in the efficacy of electric stimulation, a phenomenon that Vinogradova (see Vinogradova, 2001 for a review) named “chronic potentiation” and that was later renamed LTP. CA3 neurons exert their actions locally in the hippocampus through their Schaffer collaterals, as well as by regulating, through the lateral septal nucleus (LS) relay, the activity of diencephalic and brain-stem structures, like the reticular formation (RF) and the nucleus accumbens (NAc). CA1 exerts its influence on the neocortex through a circuit that consists of these major stations: CA1 → subiculum → postcommissural fornix → mammillary bodies → anterior thalamic nucleus → prefrontal and cingulate cortex.

Figure 3.6. The hippocampal complex. a) The hippocampus is located deep in the temporal cortex (a mouse brain is shown in the figure). b) Detail of a), with CA3 and CA1 shown. c) The Papez circuit. d) Cortical and subcortical structures involved in the Papez circuit. e) Detail of hippocampal cell morphology and connectivity.

From these gross anatomical considerations, the information flow in the hippocampus appears to be mainly unidirectional, although we will see that recurrent connectivity, and therefore feedback, is a typical feature of the hippocampus. Hippocampal lesions have been extensively studied in both neuropsychological (Squire et al., 2001; Holscher, 2003; Suzuki, 2003) and neurophysiological (see Vinogradova, 2001 for a review) settings. The deficits can be grouped into two main classes:

 

- Deficits in memory: these impairments are selective, involving the consolidation of explicit, declarative, episodic memory. Implicit, procedural and motor memory are usually preserved.

- Deficits in selective attention: attention is unstable and highly vulnerable to irrelevant stimulation, but at the same time rigid, making it difficult to shift from one item to another.

 

The involvement of the hippocampus in classical conditioning has been shown in the context of the Nictitating Membrane Response (NMR) in rabbits (Mauk and Thompson, 1987). Rabbits possess a nictitating membrane (a third eyelid) whose closure can be classically conditioned. In NMR classical conditioning, a neutral stimulus (CS), such as a tone, is presented just before an unconditioned stimulus (US), such as a mild puff of air to the eye. After repeated pairings of the CS and the US, the CS elicits a learned or conditioned NMR response (CR) in advance of the US. The two most commonly studied forms of eyeblink conditioning are delay and trace conditioning. In delay conditioning, the CS is presented and remains on until the US is presented, so that the two stimuli overlap and co-terminate. In trace conditioning, an “empty” (or trace) interval separates the CS and US.

The conditioned eyeblink is an example of an aversively conditioned somatic motor response. The response is a highly specific motor movement that becomes adaptively timed to the presentation of the US. Work with rabbits first demonstrated a clear distinction between delay and trace eyeblink conditioning. The acquisition and retention of delay eyeblink conditioning require an intact cerebellum and associated brainstem structures (Mauk and Thompson, 1987). Like delay conditioning, successful trace eyeblink conditioning requires an intact cerebellum (Woodruff-Pak et al., 1985). However, trace conditioning differs from delay conditioning in that it also requires the contribution of hippocampal and neocortical structures. Thus, acquisition and retention of trace conditioning are severely disrupted in rats and rabbits with hippocampal lesions (Moyer et al., 1990; Kim et al., 1995). Notably, trace conditioning in rabbits is also disrupted by Pfc lesions (Kim et al., 1995). Another distinctive feature of trace conditioning is that the involvement of the hippocampus is time-limited. When hippocampal lesions are made in rabbits 1 day after acquisition, trace conditioning is abolished, whereas lesions made 30 days after acquisition have no effect (Kim et al., 1995).

The hippocampus has also been proposed to be involved in spatial navigation and sequence learning (Lisman, 1999; Nathe and Frank, 2003; Bingman et al., 2003). A strong supporter of the latter view is Lisman (see Lisman, 1999 for a review). Lisman's work is important because it attempts to discuss issues such as spatial navigation, adaptive timing, and hetero- and autoassociative networks in the light of hippocampal anatomy and physiology.

Lisman does not specifically discuss the involvement of the hippocampus in spatial memory, and thereby does not limit the breadth of the theory to a single subset of behaviors. The emphasis is on the recall of memory sequences rather than simple “spatial location”, a position that is more general than the canonical view of the hippocampus as a “position detector” (see the discussion on place cells in O'Keefe et al., 1998; O'Keefe and Burgess, 1999; Nathe and Frank, 2003; Bingman et al., 2003). In this view, the role of the hippocampus is to store and recall “sequences”, such as spatial positions or episodes in a complex situation, and to detect a match or mismatch between these predicted sequences and the sensory data.

             

Figure 3.7. Diagram of the main intra-hippocampal wiring. From Lisman, 1999.

Figure 3.8 (Figure caption from Lisman, 1999, p. 235). The Phase-Advance of Hippocampal Place Cells May Reflect the Recall of Sequences Organized by Theta (5–10 Hz) and Gamma (~40 Hz) Oscillations

(a) A rat moves through a sequence of positions (A–G), causing the firing of a place cell over this entire region. The firing of the G cell occurs with an earlier and earlier phase of theta cycles as the animal moves along this well known path, a phenomenon known as the phase-advance. Successive theta cycles are labeled 1–7. This can be explained (Jensen and Lisman, 1996a) as follows: the G cell represents position G, a region much smaller than the entire place field (A–G), but fires at positions A through F as part of a sequence recall process. This process is initiated at the beginning of each theta cycle by a cue signifying the current position of the animal. The cells encoding this position become active in the first gamma cycle and in turn activate cells encoding the next position in the sequence in the next gamma cycle. This sequence prediction can go on until the last gamma cycle of a theta cycle. As the animal is moving, the cue at each successive theta cycle is further along the path.

(b) Diagram showing how on each theta cycle, the firing of the G cell occurs earlier in the predicted sequence, i.e., at an earlier gamma cycle within a theta cycle.

(c) Illustration of how multiple memory items in a sequence can be active in different gamma cycles (which have different phase relative to a theta cycle). This is what is meant by a phase code. Note that each memory (a place or event) is represented by the subset of cells that fires in the same gamma cycle (yellow indicates firing). Phase coding may occur when the hippocampus is in recall mode (as in [a] and [b]), but also when it is in learning mode. In the latter case, it acts as a “multiplexing buffer,” as follows: a memory item is inserted into the buffer and fires in a given gamma cycle on many successive theta cycles; when the next item is presented, it is also maintained by the buffer, but in a different (later) gamma cycle. The biophysical processes required for a multiplexing buffer are as follows. First, the firing of pyramidal cells activates intrinsic conductances that produce a positive going ramp critical for the reactivation of memories on subsequent theta cycles. Second, rapid feedback inhibition onto pyramidal cells generates 40 Hz oscillations and organizes a winner-take-all process in which only the most excitable cells (encoding the next item in the sequence) fire in a given gamma cycle. Third, a recurrent autoassociational network with weights encoding each item make the cells that encode an item fire as a group, thereby imparting resistance to noise (see simulations of 1–3 in Jensen and Lisman, 1996b, 1996c).

 

The view of the hippocampus as a mere feedforward network (cerebral cortex → dentate gyrus → CA3 → CA1 → cerebral cortex) has been progressively challenged. The first models incorporated the idea that CA3 is an autoassociative network that stores memories for later retrieval (Marr, 1971). This proposal was based on the observation that CA3 presents massive recurrent connectivity, shows LTP, and supports Hebbian learning. However, CA3 is not the only recurrent network in the hippocampus: CA1 and the dentate gyrus also show a strong degree of recurrence. In particular, granule cells (see Figure 3.7) make strong connections onto dentate mossy cells, which form a recurrent network by projecting back to the granule cells. Lisman (1999) is, in his own words, the first to propose a functional role for these two distinct recurrent networks. First of all, Lisman emphasizes that the hippocampus is involved only in episodic memories, i.e., memories that can be formed during a single episode. Lisman suggests that the hippocampus holds a coarser, higher-level representation of episodes that can then recall more detailed cortical representations. Lisman also stresses that the hippocampus is especially important in learning sequences of events.
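Marr's proposal that CA3 acts as an autoassociative memory can be illustrated with a standard Hopfield-type network, used here as a generic stand-in rather than as Marr's or Lisman's specific model: binary (±1) units, Hebbian outer-product weights without self-connections, and asynchronous sign updates. The patterns are mutually orthogonal Walsh patterns (an assumption chosen so that recall from a cue with 5 of 64 units flipped is guaranteed); all names are illustrative.

```python
def walsh_pattern(mask, n=64):
    """±1 pattern whose sign at unit i is the parity of (i & mask).
    Distinct non-zero masks give mutually orthogonal patterns."""
    return [1 - 2 * (bin(i & mask).count("1") % 2) for i in range(n)]


def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights with no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, cue, sweeps=5):
    """Asynchronous sign updates starting from a (possibly degraded) cue."""
    s = list(cue)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w_ij * s_j for w_ij, s_j in zip(w[i], s))
            s[i] = 1 if h >= 0 else -1
    return s


memories = [walsh_pattern(m) for m in (1, 2, 4)]
w = hebbian_weights(memories)
cue = list(memories[0])
for i in (0, 9, 20, 33, 50):   # degrade the cue: flip 5 of 64 units
    cue[i] = -cue[i]
print(recall(w, cue) == memories[0])   # True: the stored memory is completed
```

The recurrent weights pull a partial or corrupted input back to the nearest stored pattern, which is the pattern-completion property that motivated the autoassociative interpretation of CA3.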

            One important observation is that hippocampectomized rats do orient to completely novel stimuli, but do not orient when the familiar sequence on which they have been trained is altered (Honey et al., 1998). Secondly, place cells tend to fire during sleep in the same sequence in which they fire in the awake state (Skaggs and McNaughton, 1996). A typical physiological feature of place cells is the so-called “phase advance” (O'Keefe and Recce, 1993): the hippocampus of a rat moving through its environment is characterized by theta-frequency oscillations (4–10 Hz), and the progressive approach of the rat towards the place field of a cell causes that cell to fire earlier and earlier in the theta cycle. The theta cycle is in fact divided into faster gamma cycles, in which the shift of activation is visible (Figure 3.8). This sequence is time-compressed, since recall within a theta cycle proceeds at a much faster rate than the physical movement of the rat through the environment. In this account, the hippocampus is actually a key “instrument” for predicting environmental events, a feature that constitutes a key evolutionary advantage.
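The sequence-recall account of phase advance summarized in the Figure 3.8 caption (Jensen and Lisman, 1996a) can be reduced to a few lines. In this toy sketch, the path A–G, the assumption that the animal advances one position per theta cycle, and the assumption of seven gamma cycles per theta cycle are all hypothetical simplifications; `firing_slot` and `firing_phase_deg` are illustrative names. At the start of each theta cycle the current position is the cue, and recall then steps one position ahead per gamma cycle, so the cell coding position G fires in an earlier gamma cycle on each successive theta cycle.

```python
POSITIONS = "ABCDEFG"   # hypothetical path, one position per theta cycle


def firing_slot(cell, theta_cycle, n_gamma=7):
    """Gamma cycle (0 = earliest phase) in which a place cell fires.

    Toy version of the sequence-recall account: at the start of theta cycle k
    the current position POSITIONS[k] is the cue; recall then steps one
    position ahead per gamma cycle. Returns None if the cell is not reached
    within the n_gamma gamma cycles of that theta cycle.
    """
    steps_ahead = POSITIONS.index(cell) - theta_cycle
    return steps_ahead if 0 <= steps_ahead < n_gamma else None


def firing_phase_deg(slot, n_gamma=7):
    """Centre of the gamma slot expressed as a theta phase in degrees."""
    return 360.0 * (slot + 0.5) / n_gamma


for k in range(len(POSITIONS)):
    slot = firing_slot("G", k)
    print(POSITIONS[k], slot, round(firing_phase_deg(slot), 1))
```

On the first theta cycle (cue at A), positions A through G occupy gamma cycles 0 to 6, so the whole path, which takes seven theta cycles to traverse physically, is replayed within a single theta cycle: this is the time compression described in the text.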

            How can CA3 store sequences? Lisman (1999) proposes that this property depends on the NMDA receptors present at the recurrent synapses of CA3. These channels are implicated in LTP and are the biophysical substrate of the Hebbian learning observed in CA3. An important observation is that NMDA channel activation in CA1 and CA3 leads to LTP even when postsynaptic activity lags presynaptic activity by up to 100 ms. This observation is interesting and puzzling at the same time: if a given event A is not followed by an event B within a 100 ms gap, Hebbian learning is virtually impossible. Lisman addresses this point by commenting that “The mechanism described in the previous paragraph could lead to the encoding of memory sequences in which sequential events have a temporal separation of < 100 ms, but what about the more common situation in which the temporal separation is much larger? The encoding of such sequences may depend on a short term memory buffer that can extend the period of active firing for many seconds. Because hippocampal neurons tend to fire for many seconds after a brief stimulus (Vinogradova, 1984; Hampson et al., 1993; Colombo and Gross, 1994), the hippocampus must either itself be a buffer or be driven by a network that has buffering ability. Such persistent firing allows a single brief presentation to be synaptically encoded by an LTP-type process that requires repetitive firing to produce synaptic modification.” (Lisman, 1999, p. 235)

Figure 3.9 (Figure caption from Lisman, 1999, p. 236). Reciprocally Interacting Heteroassociative and Autoassociative Networks Produce More Accurate Sequence Recall than a Single Heteroassociative Network (a) In the simplest heteroassociative network, the cells that encode one memory are selectively connected to the cells that encode the next memory in a sequence. With each successive step in the sequence recall process, the memory becomes more degraded, as indicated by the number of primes. A single network can accurately recall sequences if there is a high degree of correlation between successive memories, but this will not work in the general case. (b) An autoassociative network that stores the associations that constitute each memory item is capable of producing the correct version of any item (e.g., B) when presented with a degraded version (e.g., B′). (c) Accurate sequence prediction through the reciprocal interactions of two networks. One network is heteroassociative. When the next item in the sequence is produced, it is sent to the autoassociative network, which is able to correct it. This corrected version is then sent back to the heteroassociative network, where it serves as a basis for the next step in the predictive process. Not enough information is available for a detailed simulation of how this could be carried out by CA3 and dentate networks, but the following is an example of how some of the key problems might be dealt with. A cycle begins when memory A cells of CA3 excite memory B cells of CA3 through recurrent connections, causing single spikes in these cells and pattern B. The spikes are transmitted to the dentate network, where the correct granule cells for the item B are excited (because of direct input from CA3 or indirect input through mossy cells). These “correct” granule cells then fire the “correct” CA3 cells. This causes a burst and initiates the next cycle.
If a CA3 cell representing B did not fire because of recurrent input (a false negative), it will fire because of mossy fiber input. A CA3 cell that is a false positive will fire only a single spike (since it will not get mossy fiber input). If only bursts are effectively transmitted to other CA3 cells by the facilitating recurrent synapses (Lisman, 1997), false positives will have little impact. (d) Complexities of sequence storage and recall. First, psychophysical evidence indicates that sequence memory is not strictly a pairwise process between memories n and n-1. The dashed arrow indicates that connections between memories n-2 and n may also contribute (see Jensen and Lisman, 1996c for how a multiplexing buffer makes this possible). Second, studies of human memory (Howard and Kahana, 1998) and nerve network simulations (Levy, 1996) suggest that sequence items can be autoassociated with a preexisting sequence that can be thought of as a sequence of time steps (t1, t2, etc.). Heteroassociation may therefore not be obligatory for sequence learning.
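The role of a buffer in Lisman's argument can be illustrated by counting spike pairs that fall inside the ~100 ms coincidence window. This is a crude sketch, not a biophysical LTP model: spike trains are regular, the firing rate and durations are hypothetical, and `spike_train` and `ltp_pairings` are illustrative names. Event B occurs 2 s after event A; if A fires only briefly, no pre/post pair falls within the window, whereas if A's firing is extended by an STM buffer, some pairs do.

```python
def spike_train(onset, duration, rate_hz=20.0):
    """Regularly spaced spike times (s) from onset for the given duration."""
    n = int(round(duration * rate_hz))
    return [onset + k / rate_hz for k in range(n)]


def ltp_pairings(pre, post, window=0.1):
    """Number of pre/post spike pairs in which the postsynaptic spike follows
    the presynaptic one by at most `window` seconds (the ~100 ms
    NMDA-dependent coincidence window described in the text)."""
    return sum(1 for t_pre in pre for t_post in post if 0.0 <= t_post - t_pre <= window)


event_b = spike_train(onset=2.0, duration=0.05)    # event B, 2 s after A
brief_a = spike_train(onset=0.0, duration=0.05)    # A fires only briefly
buffered_a = spike_train(onset=0.0, duration=3.0)  # A held by an STM buffer
print(ltp_pairings(brief_a, event_b))     # 0: no pair falls within the window
print(ltp_pairings(buffered_a, event_b))  # > 0: persistent firing bridges the gap
```

The coincidence window itself is unchanged in both cases; what makes the association possible is that the buffered representation of A is still active when B arrives.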

Lisman notes that phase advance also occurs in the Dentate Gyrus, an area that receives feedback connections from CA3. He proposes the following functional role for the coupled recurrent networks (Figures 3.9 and 3.10). Heteroassociative recurrent networks suffer from noise in their predictions: a small perturbation at one step of the sequence can lead to progressively deteriorating recall. Lisman proposes that the Dentate is an autoassociative recurrent network that, given a specific (feedback) input from CA3, reconstructs an undegraded pattern from the one generated by the CA3 “hypothesis” and broadcasts it back to CA3. How this fine-grained mechanism could be implemented in CA3 and the Dentate is, to me, not clear.

Figure 3.10. (from Lisman, 1999) The Role of Dentate Synapses in Filtering Out Context and the Role of the Perforant Path to CA3 in Transmitting Context. (a) At the medial perforant path input to dentate granule cells, contextual information that is steadily firing (horizontal red arrows) is not transmitted because of low-frequency depression. Rapid increases in firing (upward arrows) due to salient information are transmitted. Note that in the dentate, the features Jerry and Sad are represented by the same cell, whereas this is not the case for the cortical input cells. This is what is meant by a change in representation. (b) The same perforant path axons that provide input to the dentate also provide input to CA3. Even constant “contextual” items produce a subthreshold depolarizing bias in CA3. This bias enables a single powerful mossy fiber input (representing event information) to detonate a CA3 cell. In this way, an item is represented in context, even though context itself does not cause firing (as observed). (For altogether different models for encoding context, see Samsonovich and McNaughton, 1997; Minai and Best, 1998.)

 

            What, then, is the role of the Perforant Path (PP)? Lisman suggests that the PP provides both the Dentate and CA3 with contextual information, whose processing appears to be affected by hippocampal damage (hippocampectomized animals have difficulty selecting between different contexts that lead to different rewards). Lisman notes that “there are no cells in the hippocampus that fire continuously in a particular context. One explanation is that contextual input to the hippocampus is itself subthreshold. Such a subthreshold depolarization could, however, have important consequences in enabling context-appropriate cells to be triggered by other inputs” (Lisman 1999, p. 237). Lisman proposes that PP information is filtered in the Dentate, so that only relevant information is transmitted to CA3. The same PP input also excites CA3, again subthreshold, but here mossy fiber input from the Dentate can trigger firing because it coincides with the PP-induced depolarization. This reasoning is somewhat problematic, since it leaves open how the Dentate knows which information is relevant (and thus not to be filtered out). Finally, how to reconcile the autoassociative-heteroassociative role of the Dentate-CA3 system with this additional function of context representation is another important, unresolved issue.

            Projections from CA3 fan out to CA1, a fact that Lisman sees as the signature of a change of representation back to cortical standards, whereas point-to-point connections indicate a relatively constant mapping between areas that share a similar representation. Lisman proposes that CA1 and cortex use the same representation, whereas CA3 and the Dentate use different representations.

            In partial agreement with Vinogradova (Vinogradova, 2001), Lisman proposes that CA1 might compute a match/mismatch between cortical input (through EC) and the prediction originating from CA3. This idea, which goes back to Sokolov's original proposal (Sokolov, 1963) of a brain that forms a representation of the world based on past events and continuously compares predictions with reality, is incorporated in many models (Grossberg, 1982; Lynch and Ranger, 1992; Hasselmo and Schnell, 1994; Blum and Abbott, 1996; Levy, 1996). Cells in the mammillary body (which receives one of the output pathways of the hippocampus, namely from CA1) fire in exact registration with the expected onset of a repetitive stimulus that has been omitted (Vinogradova, 2001). Other experiments show a habituating response of the hippocampus to repetitive stimulation, followed by a dishabituating response when an unexpected stimulus is presented (Vinogradova, 2001).

Figure 3.11. From (Nakazawa et al., 2002). (A) shows the general organization of the hippocampus and the related Entorhinal cortex. Red arrows show the pathways studied by Nakazawa et al. EC, Entorhinal cortex; DG, dentate gyrus; RC, recurrent collaterals; SC, Schaffer collaterals; MF, mossy fibers; PP, perforant path. Panels B to E show the basic wiring of CA3 and CA1, illustrating the proposed mechanisms for pattern completion. In control (B) and mutant (D), full cue input (downward arrows) is provided to CA3 from DG or EC and to CA1 from EC. In control (C) and mutant (E), a fraction of the original input is provided to activate the memory trace during recall. Red dots, CA3 RC synapses or SC-CA1 synapses participating in memory trace formation; red circles, memory traces that are activated during recall; red dots without red circles, memory traces not activated during recall; red triangles and lines, CA3 pyramidal cell activity resulting from pattern completion through recurrent collateral firing; green triangles and lines, CA3 pyramidal cell response to external cue information; open triangles and black lines, silent CA3 pyramidal cells and inactive outputs; blue triangles, CA1 pyramidal cells.

In a recent paper, Nakazawa et al. (2002) studied the involvement of hippocampal CA3 NMDA receptors in associative memory recall. Their results are consistent with the general view that the hippocampus is involved in pattern completion. The ability to retrieve complete memories from incomplete sets of cues is a crucial function of biological memory systems. The authors suggest that pattern completion is mediated principally by the extensive recurrent connectivity of the CA3 area of the hippocampus. They tested this hypothesis by generating and analyzing a genetically engineered mouse strain in which the NMDA receptor gene is ablated selectively in CA3 pyramidal cells. The mutant mice acquired and retrieved spatial reference memory in the Morris water maze normally, but they were impaired in retrieving this memory when presented with a fraction of the original cues. These results are explained by the qualitative model shown in Figure 3.11. The model emphasizes how CA3, owing to its recurrent connectivity, is involved in storing and retrieving relationships between patterns. Damage to CA3 becomes evident in those situations in which only a partial version of the pattern is provided, because the performance of the system then relies on its ability to retrieve the whole pattern (in the example, the full set of cues) from a partial version.
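A toy autoassociative network makes the pattern-completion argument concrete. The sketch below is an illustration under stated assumptions, not the model of Nakazawa et al. (2002): three mutually orthogonal ±1 patterns (rows of a 16 × 16 Hadamard matrix, chosen so that the outcome can be checked by hand) are stored in Hebbian recurrent weights, a cue value of 0 stands for a missing cue, and the CA3 NMDA knockout is caricatured by leaving the recurrent weights unlearned. The helper names (`hebbian_weights`, `recall`, `overlap`) are hypothetical.

```python
N = 16  # number of model "CA3 cells" (illustrative choice)


def hadamard_row(k, n=N):
    """Row k of a Sylvester Hadamard matrix: mutually orthogonal +/-1 patterns."""
    return [-1 if bin(k & i).count("1") % 2 else 1 for i in range(n)]


def hebbian_weights(patterns):
    """Recurrent weights w_ij = (1/N) * sum_p x_i^p x_j^p, no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def sign(v):
    return (v > 0) - (v < 0)


def recall(w, cue, steps=3):
    """Iterate external cue + recurrent input; a 0 in the cue means 'cue missing'."""
    n = len(cue)
    state = list(cue)
    for _ in range(steps):
        state = [sign(cue[i] + sum(w[i][j] * state[j] for j in range(n)))
                 for i in range(n)]
    return state


def overlap(state, pattern):
    """Fraction-like similarity: 1.0 means perfect recall."""
    return sum(s * p for s, p in zip(state, pattern)) / len(pattern)


patterns = [hadamard_row(k) for k in (1, 2, 3)]
target = patterns[0]
partial_cue = target[:8] + [0] * 8          # only half of the original cues
intact = hebbian_weights(patterns)           # plastic recurrent collaterals
knockout = [[0.0] * N for _ in range(N)]     # no recurrent learning

print(overlap(recall(intact, partial_cue), target))    # 1.0: pattern completed
print(overlap(recall(knockout, partial_cue), target))  # 0.5: only the cued half
```

With the full cue, both networks return the stored pattern; only the partial cue exposes the missing recurrent learning, which mirrors the qualitative dissociation described above.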

Summarizing, there is sufficient evidence that the hippocampus is involved in learning and memory in general, and in conditioning in particular. Furthermore, the tasks in which a temporal gap is introduced are the ones most affected by hippocampal impairment. The following section reviews models of the hippocampus that incorporate the notion of a stimulus trace bridging the gap between temporally disjoint representations.

 

3.5  Models of timing in hippocampus

           

The work of Nakazawa et al. (2002) is a good exemplar of the “stream” of papers proposing some form of relationship between a recurrent network, memory storage, pattern completion, hippocampal architecture, and deficits following hippocampal alterations (Marr, 1971; Gardner-Medwin, 1976; McNaughton and Morris, 1987; Rolls, 1989; Hasselmo et al., 1995).

None of these views, however, emphasizes a key characteristic of the behavioral constraints an animal faces in an ecological setting, namely that not all cues that should be associated with a given reward, or within a given task, co-occur in time. This is a crucial observation, and it is directly related to the argument discussed in the context of trace conditioning and cognitive control. These models wrongly assume that all cues to be associated are available at the same time to the associative mechanism in CA3. This assumption is unjustified, and further mechanisms must be invoked to bridge the temporal gap between different cues, whose representations arise and vanish in a continuously varying environment. An autoassociative recurrent or heteroassociative network, such as the one depicted in Figure 3.12, can store a pattern through a Hebbian-like LTM mechanism, with the proviso that maximum learning is obtained when the activation patterns co-occur in time. The following question then arises: how can two representations that are disjoint in time ever be correlated and mutually reinforced?

Figure 3.12. Diagram representing either an autoassociative recurrent or heteroassociative network, depending on whether the two sets of units represent the same population (autoassociative) or different populations (heteroassociative).

 

Several authors have proposed a relationship between timing, representations of stimulus traces and the hippocampus (Zipser, 1986; Grossberg and Schmajuk, 1989; Grossberg and Merrill, 1992, 1996). Zipser (1986) proposes that a chain of neurons in the hippocampus acts as a delay line. In Figure 3.13, a conditioned stimulus, CS(t), consists of a short block pulse derived from the onset of the CS and injected into the delay line. CS(t) then slowly propagates down the delay line, activating each successive neuron after a 50 ms delay.

Figure 3.13. Basic structure of the hippocampal delay line model of adaptive timing as proposed by Zipser (1986).

 

In Figure 3.13, NM(t) is the nictitating membrane response produced by the US, which acts as a teaching signal. The goal of learning in the model is for the hippocampal response to match NM(t). The pathways described in Figure 3.13 learn by adjusting their synaptic weights according to the following difference equation:

When rewritten as a differential equation, this learning law can be seen to take the form of outstar learning (Grossberg, 1969a).

In these equations, LTM is gated by the CS at a given delay along the delay line, namely by the CS representation at time t − d, i.e., d time steps ago. An active representation at a given delay is multiplied by the US, and Hebbian learning with decay determines the growth or decay of the LTM trace.
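The delay-line scheme can be sketched in a few lines. Because the difference equation itself is not reproduced here, the sketch assumes the outstar-type form described in the text: the weight at tap d is updated as w_d ← w_d + ε·CS(t − d)·(NM(t) − w_d), so a tap learns only when the delayed CS pulse is present there. Time is counted in 50 ms delay-line steps, and the NM target profile, ε and the number of trials are hypothetical.

```python
def cs_at_tap(t, d):
    """Delayed CS: a one-step pulse injected at t = 0 reaches tap d at step d."""
    return 1.0 if t - d == 0 else 0.0


def train_delay_line(nm, n_trials=50, eps=0.1):
    """Outstar-type learning: w_d tracks NM(t) whenever CS(t - d) samples it."""
    n_taps = len(nm)
    w = [0.0] * n_taps
    for _ in range(n_trials):
        for t in range(n_taps):          # one trial, in 50 ms steps
            for d in range(n_taps):
                w[d] += eps * cs_at_tap(t, d) * (nm[t] - w[d])
    return w


def delay_line_output(w):
    """Hippocampal response R(t) = sum_d w_d * CS(t - d)."""
    return [sum(w[d] * cs_at_tap(t, d) for d in range(len(w)))
            for t in range(len(w))]


# Hypothetical NM response peaking 8 steps (400 ms) after CS onset
us_step = 8
nm = [max(0.0, 1.0 - abs(t - us_step) / 2.0) for t in range(12)]
w = train_delay_line(nm)
response = delay_line_output(w)
print(response.index(max(response)))  # 8: the learned response peaks at the US time
```

Each tap converges towards the value of NM at its own delay, so the delay line converts the time elapsed since CS onset into a position along the chain.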

Grossberg and Schmajuk (1989) and Grossberg and Merrill (1992) proposed a model of timing in the hippocampus, which they called spectral timing. The spectral timing mechanism differs from Zipser's delay line model in several ways. Instead of a short block pulse CS, Grossberg and Merrill hypothesized an STM storage site s(t) that converts both trace and delay CSs into step signals. Zipser's delay line is replaced with a two-step process involving a set of independent activities xj(t) with a spectrum of different reaction rates aj, j = 1, 2, ..., n. These activities are then gated by habituative transmitter gates yj(t). The gated activities f(xj)yj(t) all begin to grow at the onset of a CS, but reach their peak values at a range of distinct times, after which they decay. These gated activities trigger learning by long-term memory (LTM) traces, zj, in their respective pathways. Some gated activities are large at the particular ISI when the US becomes large; others are small. The temporally well-correlated LTM traces grow large, whereas the others do not. Each LTM trace multiplies its gated signal, thereby amplifying the signals in those pathways that correlate, at least partially, with the US. All the LTM-gated signals from the entire population add up to form the total output, R. Although no individual signal is well timed, the population response peaks at the ISI, thus exhibiting the key behavioral property of peaking at the time the US is delivered. An illustration is shown in Figure 3.14.
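The population-summation idea can be checked numerically. The sketch below is a simplification, not Grossberg and Merrill's equations: each gated signal f(xj)yj(t) is replaced by an alpha-shaped function that peaks at its own time constant τj (log-spaced over 10 ms to 10 s), and each LTM trace zj grows by a US-gated Hebbian increment proportional to its gated signal at the ISI. Function names and parameter values are hypothetical.

```python
import math


def gated_signal(t, tau):
    """Stand-in for f(x_j) y_j: zero at CS onset, peaks at t = tau, then decays."""
    if t <= 0:
        return 0.0
    return (t / tau) * math.exp(1.0 - t / tau)


def spectral_timing(isi=500, n_units=40, tau_min=10.0, tau_max=10000.0,
                    n_trials=20, lr=0.05, t_max=1500):
    """Train LTM traces z_j on CS-US pairings and return the population output R(t)."""
    ratio = (tau_max / tau_min) ** (1.0 / (n_units - 1))
    taus = [tau_min * ratio ** j for j in range(n_units)]   # log-spaced spectrum
    z = [0.0] * n_units
    for _ in range(n_trials):
        for j, tau in enumerate(taus):
            # US-gated Hebbian learning: z_j grows with its gated signal at the US
            z[j] += lr * gated_signal(isi, tau)
    # R(t) = sum_j z_j * gated_signal_j(t), sampled every simulated millisecond
    response = [sum(zj * gated_signal(t, tau) for zj, tau in zip(z, taus))
                for t in range(t_max)]
    return taus, z, response


taus, z, response = spectral_timing()
peak = response.index(max(response))
print(peak)  # close to the 500 ms ISI
```

With a log-spaced spectrum, the weighted sum of broad, individually mistimed bumps still peaks close to the trained ISI, because the traces that are large at the ISI dominate the population output.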

Figure 3.14. This figure illustrates how an independent spectrum of activations of chemical concentrations can help to a) bridge the gap between CSi and US and b) create the substrate for differential learning of timed signals. In the figure, a few “spectra” intersect the US presentation. Reinforcement and temporal summation of those spectra allow the development of an adaptively timed signal (inset).

 

 

An idea similar to the one proposed by Zipser and Grossberg had already been incorporated in the model of Sutton and Barto (Sutton and Barto, 1981), in which a “non-stimulating trace” was postulated in order to bridge the temporal gap between CS and US. The distinction between stimulating and non-stimulating traces was intended to stress the difference between two operating modes of the neural assembly: the “normal” mode, in which activation propagates between nodes, and an “offline” mode, in which traces are generated that do not excite or inhibit other neurons unless an event (the US) occurs. This notion of stimulus traces is very similar to the Ca++ spectra proposed as the physiological substrate of spectral timing in Grossberg’s model, and it further supports the converging view that some mechanism must exist to bridge the temporal gap between two events that are disjoint in time.
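The difference between plain Hebbian learning and learning through a stimulus trace can be shown with a CS pulse and a US pulse separated by a silent gap, as in trace conditioning. The sketch below is a simplified Hebbian reading of the non-stimulating trace idea, not Sutton and Barto's full learning rule; the decay constant, pulse lengths and helper names are hypothetical.

```python
def hebbian_change(pre, post, lr=1.0):
    """Plain Hebbian rule: sum over time of pre(t) * post(t)."""
    return lr * sum(a * b for a, b in zip(pre, post))


def trace_change(pre, post, decay=0.9, lr=1.0):
    """Hebbian rule applied to an exponentially decaying trace of pre(t).

    The trace only gates learning; it is never added to the postsynaptic
    activity, in the spirit of a non-stimulating trace.
    """
    trace, dw = 0.0, 0.0
    for a, b in zip(pre, post):
        trace = decay * trace + a
        dw += lr * trace * b
    return dw


def trial(gap, cs_len=5, us_len=5):
    """CS pulse, silent gap, then US pulse (time in arbitrary steps)."""
    cs = [1.0] * cs_len + [0.0] * (gap + us_len)
    us = [0.0] * (cs_len + gap) + [1.0] * us_len
    return cs, us


cs, us = trial(gap=5)
print(hebbian_change(cs, us))      # 0.0: no temporal overlap, no learning
print(trace_change(cs, us) > 0.0)  # True: the trace bridges the gap
```

Because the trace decays, the weight change shrinks as the gap grows, which is the qualitative signature expected of trace conditioning.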

The models proposed by Zipser, by Grossberg and by Sutton and Barto are partial responses to the problem of trace conditioning and do not assume any relationship between trace conditioning and higher-order functions like cognitive control, which is a major assumption of this work. For this reason, the hippocampal mechanism, despite its appealing properties, is not embedded in the more global architecture that is necessary to explain how higher functions, like STM and therefore cognitive control, develop from this basic building block. Furthermore, the roles of CA3, dentate gyrus, CA1 and Pfc are not developed, which leaves the biological grounding of these models rather fuzzy. This work aims to show, among other things, how embedding the hippocampal complex into a general framework of cognitive control can help bridge the gap between behavioral, anatomical and physiological data.

 

3.6  A unifying model for trace conditioning: cerebral cortex, hippocampus and basal ganglia interplay in cognitive control

 

In this section, both a qualitative and a quantitative characterization of the model are presented. The brain areas modeled are shown in Figures 3.15 and 3.16, and the schematic of the model is depicted in Figure 3.17. The areas modeled, as well as the granularity of their representation, constitute the minimal architecture capable of explaining the target dataset, and are consistent with the anatomical and neurophysiological evidence described in the previous sections.

Due to the extent of the areas modeled and the complex interactions between them, the presentation of the model is organized in two separate sections:

1)                            The first section includes the areas that are responsible for learning the correlation between CS and US and their temporal relationship. This section of the model includes anterior and posterior cortex, VTA, Nucleus Accumbens, the hippocampal complex and the Entorhinal cortex. The equations that define this first set of areas are further split into two functional subsets, defining a positive and a negative phase.

2)                            The second section includes Premotor and Motor cortices, and the Basal Ganglia complex. This section of the model accounts for the development of an adaptively-timed release of an action. This is a crucial feature of cognitive control, and allows the system to release a motor output at a time that is appropriate to obtain the reward.

 

Although the model is intended to be a minimal architecture for explaining the target dataset, the complexity of the interactions between its components is a clear obstacle to understanding its dynamics. It must be kept in mind, however, that the larger the set of biological and behavioural data a system is designed to explain, the more complex the system must be. Even so, the biological systems that the model aims to simulate are several orders of magnitude more complex than the present model. Finally, it must not be forgotten that the discretization of brain functions often found in psychology, neuropsychology and, more generally, in the neuroscience community is a rather arbitrary process. A reader who tries to attach a “label” to the function of a given cortical or subcortical area in this model would therefore be confused by the massive amount of feedback present in the architecture. This high degree of recurrence, a characteristic shared with the nervous system, is a concrete obstacle to an explicit, verbal labelling of the network's components in terms of the aspects of behavioural performance they control. With this proviso, a frame of reference for the main mechanisms of the model, together with an approximate description of the behavioural variable that each model component controls, is given below.

 Sensory input enters the system through the posterior cortex, the sensory interface of the system. From this simplified three-node structure, the signal travels in two directions: along a cortical pathway to the anterior cortex (Pfc), and along a subcortical pathway to the hippocampal complex. The hippocampal complex is the first subcortical stage of the system and is involved in bridging the temporal gap between stimuli that occur disjointly in time.

The cortical route, which in the model is intended to mimic the fast cortico-cortical pathways that are believed to be the main “drivers” of behaviour, proceeds from Pfc to the premotor cortex (PM) and, from there, to the motor cortex (M), the final output stage of the system. PM is also the origin of the second major subcortical projection: PM projects to the basal ganglia complex (BG), which in turn gates the motor output through its powerful control over the thalamus. One of the main characteristics of the model is the interplay between cortical and subcortical structures in the control of the behavioural output. The hippocampal complex provides the neural substrate for bridging the temporal gap between environmental stimuli (CSs) and relevant events (USs) that the organism should learn to approach or avoid. This is achieved by a mechanism that involves the Entorhinal cortex (EC), which receives the sensory input and stores it in a reverberatory network, and the Dentate Gyrus (D), which produces a spectrum of activations for each sensory representation. When an unexpected US (for instance, a reward) is delivered, a DAergic spike is produced in VTA through an innate pathway, and this spike triggers learning of the signals generated in the hippocampus by the presentation of the CS. After a few repetitions of the CS-US pairing, D produces an output whose peak is approximately timed with the delivery of the US. This signal is conveyed to CA3 and from there to the Nucleus Accumbens (NAc), whose inhibitory projections to VTA control the level of DA in the system, which in turn controls STM and synaptic plasticity.
The suggested role for the NAc is, therefore, to control the level of DA according to the predictability of the stimulus: highly predictable stimuli tend to cause no response in the DAergic pathway, a fact that is documented in the literature (Schultz, 2000) and is mirrored by the behaviour of the model. DA is also crucial in Pfc functioning, as demonstrated in Chapter 1. STM reverberation is governed by DAergic activity, which is in turn governed by Pfc itself and by the hippocampus through the NAc. Pfc can control its own level of DA through the learnable pathway to VTA. This pathway allows Pfc to maintain a high level of DA when an important stimulus (a CS paired with a US) is presented. At the same time, DAergic activity can be suppressed, so that plastic changes in the opposite direction (LTD) can occur whenever a CS that is usually followed by a US is presented without the reward. The same mechanism that keeps VTA under inhibitory control (hippocampus and NAc) can thus trigger unlearning of CS-US contingencies.
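The predictive control of DA by the hippocampus-NAc pathway can be caricatured at the level of single trials. The sketch below collapses the timed hippocampal signal into a single learned prediction p of the US at its expected time, lets NAc inhibition subtract p from the US-driven VTA response, and updates p with the resulting DA signal. This is a Rescorla-Wagner-style simplification assumed for illustration, not the model's equations, and `run_trials` is a hypothetical name.

```python
def run_trials(n_trials, lr=0.1, omit_last=False):
    """Return the VTA response at the expected US time on each trial."""
    p = 0.0  # learned, timed prediction relayed through NAc (inhibitory on VTA)
    responses = []
    for k in range(n_trials):
        omitted = omit_last and k == n_trials - 1
        us = 0.0 if omitted else 1.0
        vta = us - p           # > 0: DA spike, < 0: DA dip
        responses.append(vta)
        p += lr * vta          # spikes strengthen, dips weaken (LTP/LTD analogue)
    return responses


print(run_trials(1)[0])                            # 1.0: unexpected reward, full spike
print(run_trials(101)[-1] < 0.01)                  # True: predicted reward, no response
print(run_trials(101, omit_last=True)[-1] < -0.9)  # True: omitted reward, dip
```

The same subtraction that silences the DA response to a predicted US produces a dip when the US is omitted, which is the signal that drives unlearning of the CS-US contingency.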

The second subcortical pathway includes the Basal Ganglia complex and the thalamus (ventral-anterior/ventro-medial nucleus, THAL). This pathway allows the system to refrain from executing a learned motor output, e.g., reaching for the reward before the reward is administered. This adaptively timed inhibition, a major feature of cognitive control, is learned through the BG complex. The PM cortex receives activation from Pfc, the area that “stores” the behavioural plan to be executed. This activation in turn triggers several PM representations over time, which are broadcast to the BG and to the motor cortex (M). M produces a motor output whenever a) it receives input from PM and b) the THAL loop is closed. Many of the motor outputs produced in this way are badly timed, so that the US is not delivered. When a motor output driven by an appropriately timed PM representation is produced, the subcortical pathway that leads to the opening of the thalamo-cortical loop is reinforced, while the pathways that lead to non-reward are weakened.
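To keep track of the pathways, the projections described in the overview above can be written down as a small directed graph. The sketch below is only a bookkeeping aid under stated simplifications: node names follow the text, the edge labels are informal summaries, and details such as lateral inhibition and the negative-phase CA3-to-cortex broadcast are omitted. `PROJECTIONS`, `find_path` and `projection_type` are hypothetical names.

```python
from collections import deque

# (target, informal label) for each projection named in the overview
PROJECTIONS = {
    "posterior": [("Pfc", "excitatory, learnable"), ("EC", "excitatory")],
    "EC": [("EC", "recurrent"), ("D", "excitatory")],
    "D": [("CA3", "excitatory, learnable, transmitter-gated")],
    "CA3": [("CA3", "recurrent, learnable"), ("NAc", "excitatory")],
    "NAc": [("VTA", "inhibitory")],
    "VTA": [("Pfc", "DA"), ("CA3", "DA (gates learning)")],
    "Pfc": [("PM", "excitatory"), ("VTA", "excitatory, learnable")],
    "PM": [("M", "excitatory"), ("BG", "excitatory")],
    "BG": [("THAL", "gating")],
    "THAL": [("M", "gating")],
    "M": [],
}


def find_path(source, target):
    """Breadth-first search for the shortest chain of projections."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt, _ in PROJECTIONS.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def projection_type(src, dst):
    """Label of the projection src -> dst, or None if there is none."""
    return dict(PROJECTIONS[src]).get(dst)


print(find_path("posterior", "M"))    # ['posterior', 'Pfc', 'PM', 'M']
print(find_path("CA3", "VTA"))        # ['CA3', 'NAc', 'VTA']
print(projection_type("NAc", "VTA"))  # inhibitory
```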

A detailed implementation of the above-summarized mechanisms is given below. 

 

1- First section

 

The following equations define the activation of posterior (x) and anterior (y) pyramidal neurons, VTA (VTA), Nucleus Accumbens (NAC), Entorhinal cortex (EC), Dentate Gyrus (D) and CA3 (CA3), as well as the neurotransmitter from Dentate to CA3 (q), the Dentate to CA3 LTM weights (), the LTM connections between posterior and anterior cortex (), and the CA3 to CA3 LTM weights (wCA3).

 

Figure 3.15. The brain areas included in the model (lateral view).

 

Figure 3.16. The brain areas included in the model (medial view).

 

 

Figure 3.17. A diagram of the main components of the model. Variable names are indicated in italics.

 


(18)

 
In the equation, [w]+ and [w]- denote the positive part and the magnitude of the negative part of w, respectively. The function f(w) is defined as follows:

 

where = 1 and n varies according to the equation. Appendix II has a detailed description of the parameters used in the equations. The function f(w) is a generic sigmoid function, whose parameters are defined by  and n.
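The rectifiers and the sigmoid can be written out directly. The Hill-type form f(w) = w^n/(θ^n + w^n) used below is an assumption (a common choice in Grossberg-style models), since the defining equation and the symbol of its first parameter are not reproduced here; the name θ is hypothetical, and θ = 1 matches the value stated in the text.

```python
def rect_pos(w):
    """[w]+ : positive part of w."""
    return max(w, 0.0)


def rect_neg(w):
    """[w]- : magnitude of the negative part of w (non-negative)."""
    return max(-w, 0.0)


def f(w, theta=1.0, n=2):
    """Assumed Hill-type sigmoid: 0 for w <= 0, 0.5 at w = theta, saturating at 1."""
    w = rect_pos(w)
    return w ** n / (theta ** n + w ** n)


print(rect_pos(-2.0), rect_neg(-2.0))  # 0.0 2.0
print(f(1.0), f(3.0))                  # 0.5 0.9
```

Raising n makes f steeper around θ, which is why n can vary from one equation to another while θ stays fixed.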

The equations describing the model are listed below and are grouped in two functional stages:

-         “Positive” phase: in this stage, the system works in the conventional feedforward fashion. The sensory input enters the posterior cortex, travels through the Pfc (cortical pathway) and reaches the subcortical structures through the EC (posterior cortex to EC) and the Pfc (Pfc to VTA). The adaptive timing mechanism in the hippocampus ensures that LTM connections are formed in this area when a DAergic spike (caused by a US or by a learned CS) is generated in VTA, so that associations between events that occur disjointly in time can be formed. These associations influence cortical stages in the “negative” phase, when consolidation of memory occurs.

-         “Negative” phase: this phase is interleaved between positive phases and occurs when no input is delivered to the system. In this stage, CA3 broadcasts back the associations learned during the positive phase. This allows cortico-cortical connections to be strengthened, so that the system can rely on fast, feedforward cortico-cortical connections to perform the task.

Figure 3.18. Positive phase. LTM consolidation is indicated in red. Learning of CS-CS and CS-US contingencies is modulated by the VTA DAergic activity, such that LTM weights of units co-activated in time are strengthened. The hippocampal complex ensures that temporally disjoint events can be associated. Pfc (anterior cortex) is involved in the positive phase, but this diagram highlights the subcortical aspects.

Figure 3.19. Negative phase. a) The associations learned in the hippocampal complex during the positive phase are “broadcast” back to cortex for LTM cortical consolidation. b) The relaxation process in CA3 allows recovery of stored associations through recurrent circulation of activation (relaxation).

Figure 3.20. Positive (performance) and negative (consolidation) phases.

 

Alternatively, these phases can be considered analogous to a “waking” (performance) phase and a “sleeping” (consolidation) phase, or can be thought of as different functional stages enforced by some oscillatory carrier. The idea is that these two functional stages are necessary to allow the system to perform two functions (performance and memory consolidation) that are incompatible within the same set of equations but become compatible when separated in time. The parameters of the equations are listed in Appendix I. When necessary, equations will be complemented by diagrams in order to clarify the “anatomy” and the functional relationships between model components.


-         POSITIVE PHASE

-         Posterior cortex (x):

Pyramidal neurons of the posterior cortex (xi) project to the anterior cortex (yj) through adaptive, modifiable connections zij.

(19)

 

 


           

The activation of the cell is normalized by lateral inhibition and is dragged towards –x through lateral inhibition and the reset signal. The reset signal is produced at the end of each trial in order to reset the system. This expedient greatly reduces the computational time of the simulations, since multiple trials can be stacked in a short time frame. In this model, only 3 units are used in each of the posterior, anterior, premotor and motor cortices.
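Normalization by lateral inhibition and the effect of the reset signal can be illustrated with a three-unit layer. Because Equation (19) is not reproduced here, the sketch assumes a standard feedforward shunting on-center off-surround form, dx_i/dt = −A·x_i + (B − x_i)·I_i − x_i·(Σ_{k≠i} I_k + R), where R is the reset signal; unlike the model, this sketch is bounded below at 0, and A, B, the inputs and R are illustrative values.

```python
def shunting_step(x, inputs, reset=0.0, A=1.0, B=1.0, dt=0.01):
    """One Euler step of a feedforward shunting on-center off-surround layer."""
    total = sum(inputs)
    new_x = []
    for xi, ii in zip(x, inputs):
        off_surround = total - ii          # lateral inhibition from the other units
        dx = -A * xi + (B - xi) * ii - xi * (off_surround + reset)
        new_x.append(xi + dt * dx)
    return new_x


def simulate(inputs, steps, x0=None, reset=0.0):
    x = [0.0] * len(inputs) if x0 is None else list(x0)
    for _ in range(steps):
        x = shunting_step(x, inputs, reset)
    return x


inputs = [0.6, 0.3, 0.1]                  # 3 posterior-cortex units, as in the model
x = simulate(inputs, steps=2000)          # converges to B*I_i / (A + sum(I))
print([round(v, 3) for v in x])           # [0.3, 0.15, 0.05]
x_reset = simulate([0.0, 0.0, 0.0], steps=50, x0=x, reset=50.0)
print(max(x_reset) < 1e-6)                # True: the reset signal clears the layer
```

At equilibrium the relative pattern of inputs is preserved while total activity stays below B, which is the normalization property; a strong reset term then clears the layer quickly so the next trial can start from rest.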


-         Anterior cortex – Pfc (y):

(20)

 

 

 


(21)

 

where  is the weighted input from the posterior cortex,  is the reverberatory STM component of the network (see first part of the thesis),  is the gating function of DA, and  is lateral inhibition. The positive (peak) and negative (dip) DAergic components are defined as:
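How a DA-dependent gain can switch reverberation on and off is illustrated by the single-unit sketch below. The equation dy/dt = −y + (1 − y)·(I + g·f(y)), with f(y) = y²/(θ² + y²), is an assumed stand-in for Equation (20), not its actual form: g plays the role of the DA gating term, I is a transient cue, and all parameter values are illustrative.

```python
def f(y, theta=0.3):
    """Sigmoidal recurrent signal (illustrative Hill function, n = 2)."""
    y = max(y, 0.0)
    return y * y / (theta * theta + y * y)


def run_stm(da_gain, cue_steps=200, delay_steps=800, dt=0.01):
    """Euler integration: cue on for cue_steps, then a cue-free delay."""
    y = 0.0
    for k in range(cue_steps + delay_steps):
        cue = 1.0 if k < cue_steps else 0.0
        dy = -y + (1.0 - y) * (cue + da_gain * f(y))
        y += dt * dy
    return y


print(round(run_stm(da_gain=4.0), 3))  # 0.777: activity survives the delay
print(run_stm(da_gain=0.5) < 0.05)     # True: activity decays without DA support
```

With a high gain the unit is bistable, so the cue switches it into a self-sustained state that outlasts the cue; with a low gain the only stable state is rest, and the stored item is lost during the delay.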

 

(22)

 

 

            The equations above state that the level of DA in Pfc is equal to a tonic level when the activation of VTA is positive, or equal to  when VTA is inhibited, where  is a constant (see Appendix II). See Dreher and Burnod (2002) for a discussion of the time course of DA washout in Pfc.

-         Posterior-to-anterior cortex LTM weights ():


(23)

 

 


(24)

 
where  is a VTA-gated Hebbian learning term, and  is a VTA-gated and presynaptically gated decay term. Synaptic weights therefore grow towards Bw through  whenever the quantity defined by  is positive. In the equation above and in all other equations, VTA+ and VTA- are defined as:


(25)

 
 


where [VTA]+ and [VTA]- denote the positive part and the magnitude of the negative part of VTA activation, respectively. According to Equations (24) and (25), VTA activation gives rise to a spike or a dip depending on whether it is above or below zero. VTA spikes (VTA+) and dips (VTA-) have different, opposite effects on STM maintenance and synaptic plasticity. In Equation (23), the LTM weights are dragged towards Bw when a) pre- and post-synaptic cells are active and b) a VTA spike has occurred. LTM weights are reduced when the pre-synaptic cell is active and a VTA dip has occurred. Note, in fact, that a depressed VTA results in a positive quantity VTA-, which drags the weights towards -
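The sign logic of the DA-gated plasticity described above can be sketched as a single Euler step. The exact terms of Equations (23) and (24) are not reproduced here, so the rule below is an assumed form that only follows the verbal description: with a spike, the weight moves towards Bw when both cells are active; with a dip, it moves down whenever the presynaptic cell is active. The lower bound `lower` is a hypothetical stand-in for the bound lost from the text, and the other names are also hypothetical.

```python
def split_vta(vta):
    """Return (VTA+, VTA-): positive part and magnitude of the negative part."""
    return max(vta, 0.0), max(-vta, 0.0)


def update_weight(z, pre, post, vta, lr=0.1, b_w=1.0, lower=0.0):
    """One Euler step of a DA-gated Hebbian / anti-Hebbian rule (sketch).

    Spike: z moves towards b_w if pre AND post are active.
    Dip:   z moves towards `lower` if pre is active (post not required).
    """
    vta_pos, vta_neg = split_vta(vta)
    dz = lr * vta_pos * pre * post * (b_w - z) - lr * vta_neg * pre * (z - lower)
    return z + dz


z = 0.5
print(update_weight(z, pre=1.0, post=1.0, vta=1.0) > z)    # True: LTP on a spike
print(update_weight(z, pre=1.0, post=0.0, vta=-1.0) < z)   # True: LTD on a dip
print(update_weight(z, pre=0.0, post=1.0, vta=-1.0) == z)  # True: no presynaptic activity
```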

 

 

-         Ventral Tegmental Area - VTA (VTA):

(26)

 
 


 


where  describes the fact that VTA is excited by a primary US (innate pathway), as well as by learned CSs (from the 3 anterior cortex units to VTA through the LTM pathway ). The LTM weight between Pfc and VTA is defined by the following equation:

(27)

 

 


The equation above states that  increases with coincident activation of Pfc and VTA, and decreases both passively and actively, through the term , when a VTA dip occurs.


(28)

 
- DA in hippocampus (Now-Print signal, N, inhibitory interneuron, l):


(29)

 

           

The Adaptive Timing (AT, or spectral timing) model has been described by Grossberg and coworkers in several works (Grossberg and Schmajuk, 1989; Grossberg and Merrill, 1992, 1996). The present model uses a modified version of the AT as a building block wherever a temporal gap between cell activations must be bridged. Learning in the AT mechanism is modulated by a Now-Print signal N (Grossberg and Merrill, 1996). This neuron-like element is the rectified difference between VTA activation and a slowly varying interneuron l, and  is a constant. The net result of this two-component interaction is a signal that varies sharply whenever VTA activation changes. These equations can therefore be considered a neuron-like implementation of a differentiator. Further details on this model can be found in Grossberg and Merrill (1992, 1996).
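The differentiating behaviour of the Now-Print signal can be illustrated with a step change in VTA activation. The sketch assumes a discrete-time form N(t) = [VTA(t) − l(t)]+, with l relaxing towards VTA at a hypothetical rate α; the constant mentioned in the text is omitted because its role in Equations (28) and (29) is not reproduced here.

```python
def now_print(vta, alpha=0.05):
    """N(t) = [VTA(t) - l(t)]+ with a slowly tracking interneuron l (dt = 1 step)."""
    l = 0.0
    n_signal = []
    for v in vta:
        n_signal.append(max(v - l, 0.0))
        l += alpha * (v - l)   # l lags behind VTA, so N responds to changes
    return n_signal


step = [0.0] * 50 + [1.0] * 200   # VTA activation jumps at step 50
n = now_print(step)
print(n[49], n[50])               # 0.0 1.0: N fires at the onset of the change
print(n[-1] < 0.01)               # True: sustained VTA activity no longer drives N
```

Because of the rectification, this sketch signals only increases in VTA activity; a sustained level, however high, is soon cancelled by the interneuron.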

Figure 3.21. The EC-hippocampus complex.

 

 

 

-         Entorhinal cortex (EC):

(30)

 

 

 


The EC is a recurrent network excited by the posterior cortex. Its highly recurrent architecture results in a strong convergence of information at this cortical stage, in line with anatomical and physiological data (see Hasselmo 1995 for a review). EC is the source of input to the hippocampal complex and, therefore, to the AT mechanism.


(31)

 
- Entorhinal cortex LTM ():

 

where  is a hebbian learning component gated by N. In Equation (31), the synaptic weight grows when the product  is positive, namely when the EC neurons are active and when the Now-print signal N is positive. The LTM weights allow learning of stimulus contingencies in EC when a Now-print signal is generated.


- The spectral timing mechanism in the hippocampus: Dentate Gyrus, depletable neurotransmitter, CA3, and LTM weights from Dentate to CA3

 

Figure 3.22 illustrates the main dynamics of the AT mechanism. In particular, the response of the system to a CS (whose activation is kept high through EC reverberatory activity) that has been paired with the US shifts progressively towards the time of US delivery.

 

Figure 3.22. The Adaptive timing mechanism. From top-left to bottom-right: CS; US; activation of first cellular stage (x, in the model Dentate Gyrus); depletable neurotransmitter (y, in the model the neurotransmitter between Dentate and CA3); the product of the activation of Dentate and the neurotransmitter; the gated product of the activation of Dentate, the neurotransmitter and the adaptive LTM weights between Dentate and CA3; final output of the system. Notice that after 1000 iterations the output peaks at the time of the US. Each iteration corresponds to 1 simulated millisecond.


- Dentate Gyrus - (D):

(32)

 


where  is the nth rate variable that scales the activation of Dn, determining the rate at which Dn grows and decays after stimulation from EC. For each n = 1, ..., 20, the value of  lies between 0.00001 and 0.2 (see Appendix II for the full parameter listing). Each  therefore sets the time scale of the nth Dentate activation, creating a spectrum of activation functions in which different cells peak at different times. The Dentate Gyrus is stimulated by the EC and in turn projects to CA3. The Dentate has been documented as a structure where a “code expansion” might take place, owing to the larger number of Dentate cells relative to EC and CA3 (Berger, Berry and Thompson, 1986). This model suggests that this code expansion occurs in the time domain, through the creation of a spectrum of cellular activations that can be adaptively timed to a rewarding signal.

 

(33)

 
- Neurotransmitter from Dentate to CA3 - (q):


where an accumulation term drives the neurotransmitter towards 1 and a depletion term removes it in proportion to the presynaptic activity gated by the transmitter itself. In Equation (33), a sigmoidal function of the activation of the nth Dentate cell is multiplied by the available neurotransmitter qn to determine the depletion, which is therefore proportional both to the strength of the signal and to the available neurotransmitter.  
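The accumulation/depletion balance can be sketched as below. The equation used, dq/dt = A(1 - q) - B f(D) q, is the standard habituative-transmitter form and is assumed here rather than copied from Equation (33); the constants A and B and the particular sigmoid f are hypothetical. Under a constant presynaptic signal, q settles at A / (A + B f(D)), so a stronger signal leaves less transmitter available.

```python
def sigmoid(x, half=0.5):
    """Hypothetical sigmoid f(x) = x^2 / (half^2 + x^2), for x >= 0."""
    return x * x / (half * half + x * x)


def simulate_transmitter(d_trace, accum=0.01, deplete=0.5, dt=1.0, q0=1.0):
    """Euler integration of dq/dt = accum * (1 - q) - deplete * f(D) * q."""
    q = q0
    trace = []
    for d in d_trace:
        q += dt * (accum * (1.0 - q) - deplete * sigmoid(d) * q)
        trace.append(q)
    return trace


def equilibrium(d, accum=0.01, deplete=0.5):
    """Steady-state transmitter level for a constant presynaptic activation d."""
    return accum / (accum + deplete * sigmoid(d))


q_trace = simulate_transmitter([1.0] * 2000)
gated = [sigmoid(1.0) * q for q in q_trace]  # f(D) * q, the transmitter-gated signal
```

With no presynaptic activity the transmitter simply stays at 1; with a sustained input it is depleted towards the equilibrium value. The Euler step is stable here because (A + B f) * dt stays well below 2.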

 

- Dentate to CA3 LTM weights:


(34)

 
 


where a Hebbian term increases the LTM weight when the Dentate is active and a DAergic spike occurs, and decreases it when a dip occurs. These LTM weights ensure that the appropriate AT signal is learned when a CS becomes associated with a US.
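The sign logic of this learning rule can be sketched as follows, assuming (hypothetically) an update of the form Δz_n = λ g_n (DA⁺ - DA⁻), where g_n is the transmitter-gated Dentate signal of cell n, DA⁺ and DA⁻ are non-negative spike and dip signals, and weights are clipped to [0, 1]. The function name, the learning rate λ and the clipping are illustrative choices, not Equation (34).

```python
def update_ltm(weights, gated, da_spike, da_dip, lr=0.1):
    """One DA-gated Hebbian step: potentiate on a spike, depress on a dip.

    Only cells whose gated signal is active at the time of the DA event change.
    """
    new_weights = []
    for z, g in zip(weights, gated):
        z = z + lr * g * (da_spike - da_dip)
        new_weights.append(min(1.0, max(0.0, z)))  # keep weights bounded in [0, 1]
    return new_weights


# Three hypothetical Dentate cells; mainly the middle one is active when the US arrives.
z0 = [0.5, 0.5, 0.5]
gated_at_us = [0.0, 0.8, 0.1]
after_spike = update_ltm(z0, gated_at_us, da_spike=1.0, da_dip=0.0)
after_dip = update_ltm(z0, gated_at_us, da_spike=0.0, da_dip=1.0)
```

Because the change is proportional to each cell's gated signal at the moment of the DA event, repeated pairings favour the cells whose activity happens to be high at the time of the US, which is what shifts the summed output towards that time.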

 

- CA3 - (CA3+):


(35)

 

 

In the active phase, the activation of each of the three CA3 cells is the sum of the Dentate activations gated by the neurotransmitter levels and by the LTM weights.  
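A sketch of that sum, assuming that each CA3 cell i receives Σ_n f(D_n) q_n w_{n,i} from the Dentate spectrum. The weight layout `weights[n][i]`, the identity default for f and the function name are assumptions made for illustration, not Equation (35).

```python
def ca3_activation(dentate, transmitter, weights, f=lambda x: x):
    """CA3_i = sum over n of f(D_n) * q_n * w[n][i] (active-phase input only)."""
    gated = [f(d) * q for d, q in zip(dentate, transmitter)]
    n_ca3 = len(weights[0])
    return [sum(g * row[i] for g, row in zip(gated, weights)) for i in range(n_ca3)]


# Two hypothetical Dentate cells projecting to the three CA3 cells.
ca3 = ca3_activation(
    dentate=[1.0, 0.5],
    transmitter=[0.5, 1.0],
    weights=[[1.0, 0.0, 0.0],
             [0.0, 1.0, 0.0]],
)
```

Each Dentate signal is doubly gated, first by its transmitter and then by its adaptive weight, before being summed into a CA3 cell.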

 

(36)

 
- CA3 LTM weights (wCA3):


where a Hebbian learning component, scaled by a learning-rate constant, is positive when VTA+ is positive and the CA3 units are coactive. A second term is analogous to the first, but in this case the change is gated by a DA dip through the term VTA-, and the sign of the synaptic change is negative. CA3-to-CA3 associations are thus stored through Hebbian learning in this phase, and will drive cortical learning in the passive phase.
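The recurrent CA3 rule can be sketched under the assumption that Δw_ij = (λ⁺ VTA⁺ - λ⁻ VTA⁻) CA3_i CA3_j, with hypothetical learning rates λ⁺ and λ⁻, weights kept non-negative, and self-connections excluded. These choices are illustrative and are not taken from Equation (36).

```python
def update_ca3_weights(w, ca3, vta_plus, vta_minus, lr_plus=0.05, lr_minus=0.05):
    """DA-gated Hebbian update of CA3 recurrent weights w[i][j] (i != j).

    Co-active units are potentiated when VTA+ > 0 and depressed when VTA- > 0.
    """
    n = len(ca3)
    new_w = [row[:] for row in w]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-connections in this sketch
            coactivity = ca3[i] * ca3[j]
            dw = (lr_plus * vta_plus - lr_minus * vta_minus) * coactivity
            new_w[i][j] = max(0.0, new_w[i][j] + dw)
    return new_w


w = [[0.0] * 3 for _ in range(3)]
w = update_ca3_weights(w, ca3=[1.0, 0.0, 1.0], vta_plus=1.0, vta_minus=0.0)
w_after_dip = update_ca3_weights(w, ca3=[1.0, 0.0, 1.0], vta_plus=0.0, vta_minus=1.0)
```

After a DA spike, only the co-active pair (units 0 and 2) becomes linked; a subsequent dip with the same co-activity removes that link again.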

 

- NAc - (NAC):

 

(37)

 

(38)

 

(39)

 

 


The Nucleus Accumbens receives projections from the hippocampus through the subiculum, the major output pathway of the hippocampus (for a review, see Zahm, 2000). Equations (37, 38) are analogous to Equations (28, 29): the variables h and u are governed by the same dynamics that were used for DA in the hippocampus (the now-print signal). These equations cause the NAC to emit an output that behaves as a time derivative of its own activation, so that the NAC responds only when a substantial, above-threshold change in its activation occurs. Two of the parameters in Equations (38, 39) are constants.
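Since Equations (37-39) are not reproduced here, the following is only a hypothetical sketch of a derivative-like "now-print" readout: a fast variable h and a slow variable u track the input, and the output is their thresholded difference. The rates `fast` and `slow` and the `threshold` are illustrative values, not the model's constants.

```python
def nac_output(inputs, fast=0.5, slow=0.02, threshold=0.1):
    """Rectified difference between a fast and a slow trace of the input.

    h follows the input quickly and u follows h slowly, so [h - u - threshold]+
    is large only shortly after the input rises (a derivative-like signal).
    """
    h = u = 0.0
    out = []
    for x in inputs:
        h += fast * (-h + x)
        u += slow * (-u + h)
        out.append(max(0.0, h - u - threshold))
    return out


signal = [0.0] * 50 + [1.0] * 500  # step input at t = 50
response = nac_output(signal)
```

The output is zero before the step, rises transiently after it, and returns to zero once the slow trace has caught up, even though the input itself stays high.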

-         NEGATIVE PHASE

 

In the negative phase, the associations learned in CA3 during the positive phase are “broadcast” to the cortical stages, where they influence cortico-cortical connectivity. The idea behind this stage is that associations between events that occurred disjointly in time can now be stored as associations between cortical activations. Cortico-cortical dynamics are fast and occur during normal, on-line performance. Therefore, an association that has been temporarily stored in the hippocampus must be transferred to cortex in order to influence behavior.

            In this model, negative phases are interleaved with positive phases; specifically, they occur between trials. This scheme was chosen to reduce computation time, and an alternative scheme could be proposed in which negative phases occur with a different rhythm. The following equations describe the dynamics of the negative phase.

 

(40)

 
- CA3 activation (CA3):


where i ≠ j and i ≠ k. After the activation of all CA3 neurons is set to 1, the recurrent network in CA3 is relaxed for 100 epochs. This procedure allows the associations stored in the recurrent LTM connections of CA3 to be recovered in the passive phase.
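The relaxation procedure can be sketched as follows, assuming (for illustration only) a synchronous, clipped linear update CA3_i ← clip(Σ_{j≠i} w_ij CA3_j) to [0, 1]. The update form and the function name `relax_ca3` are hypothetical; the initialization to 1 and the 100 epochs follow the text.

```python
def relax_ca3(w, epochs=100):
    """Start every CA3 unit at 1 and iterate CA3_i <- clip(sum_{j != i} w[i][j] * CA3_j)."""
    n = len(w)
    ca3 = [1.0] * n
    for _ in range(epochs):
        ca3 = [
            min(1.0, max(0.0, sum(w[i][j] * ca3[j] for j in range(n) if j != i)))
            for i in range(n)
        ]
    return ca3


# Units 0 and 2 were associated during the positive phase; unit 1 was not.
stored = [[0.0, 0.0, 1.0],
          [0.0, 0.0, 0.0],
          [1.0, 0.0, 0.0]]
recalled = relax_ca3(stored)
```

Units supported by stored recurrent weights remain active, while units without support fall silent, so the relaxed pattern reflects what was learned in CA3.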

 

(42)

 

(41)

 
- Cortico-cortical learning (from posterior to anterior cortex):

(43)

 

 


After setting xi = yi = CA3i, learning of cortico-cortical connections is enabled on the basis of this cortical activation. Equation (41) defines the activation of posterior and anterior cortex as the rectified average of CA3 activation. The LTM weights from posterior to anterior cortex are normalized in Equation (43). The initial cortico-cortical weights are wc = 0.33.
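A sketch of one cortico-cortical learning step, assuming a Hebbian increment Δw_ij = λ x_i y_j followed by rescaling each row (all weights leaving one posterior unit) to sum to 1. The row-normalization scheme and the rate λ are assumptions, not Equation (43); only the initial value 0.33 comes from the text.

```python
def cortical_learning_step(w, x, y, lr=0.1):
    """Hebbian step w[i][j] += lr * x_i * y_j, then rescale each row to sum to 1."""
    new_w = []
    for i, row in enumerate(w):
        updated = [wij + lr * x[i] * y[j] for j, wij in enumerate(row)]
        total = sum(updated)
        new_w.append([v / total for v in updated] if total > 0 else updated)
    return new_w


W_INIT = 0.33  # initial cortico-cortical weight quoted in the text
w = [[W_INIT] * 3 for _ in range(3)]
ca3 = [1.0, 0.0, 1.0]  # recalled CA3 pattern (hypothetical)
x = y = ca3            # posterior and anterior cortex copy CA3
w = cortical_learning_step(w, x, y)
```

Normalization makes learning competitive: strengthening the link from unit 0 to the co-active unit 2 necessarily weakens its link to the silent unit 1.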

2- Second section

 

            The second section of the model describes the areas involved in generating and controlling the motor output. After a plan has been stored in Pfc, the excitatory connections between Pfc and Premotor cortex enable the execution of the motor plan, which gains access to the final stage of the system, motor cortex. Recall, however, that the behavioral task (Figure 3.23) requires a delay to be interposed between the presentation of the CS and the actual release of the action. The main question is therefore how the system, having learned that a CS is associated with a US, can refrain from releasing an action at an inappropriate time, even though that action is adaptive and leads to a reward when released at the right time. This task is thus a prototypical example of the broader concept of “cognitive control”.

 

Figure 3.23. The task. a) shows the general setting of the task. In b), a reward is delivered when the response falls within the admissible delay, whereas in c) and d) a premature or an excessively delayed response does not lead to reward.

 

The main aim of this second set of simulations is to test whether the Basal Ganglia (BG) complex can help the system cope with the task. The BG are a collection of subcortical nuclei involved in the control of movement. The term was once used to describe all the large nuclear masses in the midbrain, including the thalamus, but has become restricted to five of these nuclei that share a similar functional architecture: the caudate, putamen, globus pallidus, subthalamic nucleus, and substantia nigra. The BG receive no direct sensory input and send little direct output to the spinal cord. Rather, their primary input comes from the cerebral cortex, and their output is sent back to the cortex via the thalamus, forming various parallel loops. The importance of the BG in the control of movement first became apparent in clinical studies of patients with a specific set of movement disorders, now known as Parkinson’s disease and Huntington’s disease. These syndromes rarely included the loss of a specific motor function, such as the movement of one’s hand, which may be caused by localized damage to primary motor cortex. Rather, they appeared to involve mainly deficits in the general control and initiation of movement. While early theories viewed the basal ganglia as having only a modulatory effect on motor control, more recent research has implicated them in many important roles in the contextual analysis of the environment and in the use of this information for the formation and execution of motor programs and other aspects of intelligent behavior (Houk, 1995). Some of the hypothesized roles include: sensory-motor associative learning, operant or instrumental conditioning, reinforcement learning, procedural learning, adaptive timing, temporal order learning, the formation of temporal sequences of actions, choosing between competing actions, the initiation of voluntary movement, planning, working memory, and even volition (see Heyder et al. 2004 for a recent review).

Figure 3.24. The areas modeled in the second section.

Figure 3.24 shows the main cortical and subcortical areas modeled in this second section. In particular, two cortical areas (Motor and Premotor) were simulated, and the Premotor cortex is simply a copy of the Pfc activation obtained in the first experiment. It is assumed that positive and negative reward signals were generated in the first set of experiments (described in Results, Section 3.7) depending on whether the action was performed within the appropriate delay (see Results for more details on the simulation procedure). The Premotor cortex (PM) projects both to Motor cortex (M) and to the Basal Ganglia complex (BG), in particular to the Striatum (Str), the input structure of the BG. The BG simulated in the model include three families of inhibitory neurons: Str, and the internal and external segments of the Globus Pallidus (GPi and GPe, respectively). GPe, the output nucleus of the BG, projects through inhibitory connections to the thalamic nucleus of the motor cortex (Ventral Anterior/Ventral Lateral Thalamus, THAL).

A motor output can be generated by M only when the PM activation is complemented by THAL activation, which results from the balance of inhibition and disinhibition coming from the BG. Since all neurons modeled in the BG are inhibitory, the net effect of GPe (the output stage of the BG) on THAL depends on the relative balance of inhibition in the BG circuit, which can be shifted by learning.

            The BG complex, the key structure proposed here to implement the concept of control, is characterized by two main pathways, which converge on GPe and determine the level of inhibition of THAL and, therefore, whether an action is released following a PM input to the striatum (Figure 3.24):

1) The direct pathway (PM → Str → GPe → THAL → M), through which PM excites Str, which in turn inhibits GPe and releases THAL. The net effect of this pathway is to allow the release of a motor output.

2) The indirect pathway (PM → Str → GPi → GPe → THAL → M), in which the extra inhibition provided by the GPi inverts the sign of GPe output on THAL. This pathway’s net effect is to prevent the release of a motor output.
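The net effect of each pathway follows from multiplying the signs of its connections, as the sketch below shows. The signs are those described in the text (PM excites Str; Str, GPi and GPe are inhibitory); treating THAL → M as excitatory is an assumption consistent with THAL "complementing" PM. The dictionary and function names are hypothetical.

```python
# Sign of each connection: +1 = excitatory, -1 = inhibitory.
# THAL -> M is taken as excitatory (assumption).
CONNECTION_SIGN = {
    ("PM", "Str"): +1,
    ("Str", "GPe"): -1,
    ("Str", "GPi"): -1,
    ("GPi", "GPe"): -1,
    ("GPe", "THAL"): -1,
    ("THAL", "M"): +1,
}


def net_effect(path):
    """Multiply connection signs along a pathway to get its net effect on M."""
    sign = 1
    for pre, post in zip(path, path[1:]):
        sign *= CONNECTION_SIGN[(pre, post)]
    return sign


direct = ["PM", "Str", "GPe", "THAL", "M"]
indirect = ["PM", "Str", "GPi", "GPe", "THAL", "M"]
```

The direct pathway contains two inhibitory links, so its net effect on M is excitatory; the extra GPi stage gives the indirect pathway an odd number of inhibitory links, so its net effect is suppressive.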

How does the BG complex control an AT release of an action by dynamically shifting the balance between direct and indirect pathways? How is the AT response shaped by rewards and punishments, and finely tuned to the release of action in such a way that the motor output occurs when needed?

            The proposed mechanism is analogous to the AT mechanism implemented in the hippocampus. The Str implements a spectrum of delays triggered by PM, just as the Dentate did in the hippocampus when stimulated by EC. PM is implemented as a population of cells that send motor plans to motor cortex after Pfc activation.

In response to a given Pfc plan, PM generates different outputs at different times. Some of these outputs are timed appropriately to obtain a reward; others are not.

The main functional role of the BG is therefore to inhibit those motor plans generated in PM that lead to non-reward, while allowing those that are appropriately timed. The BG achieve this by selectively closing the THAL gate from PM to M through inhibition.

PM projects to the BG through the Str. The output of the Str is then conveyed to the GPi and the GPe. When an action is generated, the outcome can be either positive (reward) or negative (punishment). Reward and punishment are coded by a DAergic spike and dip, respectively. These signals cause an opposite effect on the direct and indirect pathways:

a)      A DAergic spike causes LTM strengthening of Str → GPe inhibition and the consequent release of THAL, with release of the motor output. 

b)      A DAergic dip causes LTM strengthening of Str → GPi inhibition, which, through GPi, disinhibits GPe and results in a net inhibition of THAL, with suppression of the motor output.
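The consequence of rules a) and b) can be illustrated with a toy abstraction that is not the model's equations: each PM cell (labelled by a hypothetical delay) carries one scalar "direct" and one scalar "indirect" weight, a DA spike strengthens the direct one, a dip strengthens the indirect one, and the THAL gate for that cell stays open unless the indirect weight dominates (matching a gate that is open by default). The delays, the admissible window and the learning rate are all illustrative.

```python
DELAYS_MS = [100, 200, 300, 400, 500, 600, 700, 800]  # hypothetical PM cell delays
ADMISSIBLE_MS = (400, 600)                            # hypothetical rewarded window


def train_gates(delays, window, trials=1, lr=0.1):
    """Per-PM-cell direct/indirect weights; a spike feeds direct, a dip feeds indirect."""
    direct = {d: 0.5 for d in delays}
    indirect = {d: 0.5 for d in delays}
    for _ in range(trials):
        for d in delays:
            if window[0] <= d <= window[1]:
                direct[d] += lr    # DA spike: strengthen Str -> GPe (release)
            else:
                indirect[d] += lr  # DA dip: strengthen Str -> GPi (suppress)
    return direct, indirect


def open_gates(direct, indirect):
    """THAL gate stays open for a PM cell unless its indirect pathway dominates."""
    return sorted(d for d in direct if direct[d] >= indirect[d])


direct_w, indirect_w = train_gates(DELAYS_MS, ADMISSIBLE_MS)
```

Before training every gate is open; after a single rewarded/punished pass only the PM cells whose delays fall inside the admissible window can still reach M.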

 

A physiological justification of the differential effects of DA on synaptic plasticity in the BG can be found in Brown, Bullock and Grossberg (1999).

            The basic mechanism of the BG is therefore to inhibit those motor plans generated in PM that lead to punishment (or non-reward). The model components are explained in detail below.

 


  • Premotor cortex (PM)

 

For simplicity, this stage is coded by a population of cells that are active at different times after a Pfc stimulation (a plan) has been generated.

Figure 3.25. The relationship between Pfc and PM.

 

PM activation is then broadcast to M and Str.
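A minimal sketch of such a population, assuming (hypothetically) that each PM cell responds with a Gaussian bump of activity centred on its own delay after the Pfc plan is generated at t = 0. The delays, the bump width and the function name are illustrative only.

```python
import math


def pm_population(delays, duration=1000, width=30.0):
    """Activity of PM cells after a Pfc plan at t = 0, one sample per ms.

    Each hypothetical cell k is a Gaussian bump centred on its own delay d_k.
    """
    return [
        [math.exp(-((t - d) ** 2) / (2.0 * width ** 2)) for t in range(duration)]
        for d in delays
    ]


delays = [150, 300, 450, 600]
pm = pm_population(delays)
peak_times = [max(range(len(trace)), key=trace.__getitem__) for trace in pm]
```

Each cell peaks at its own delay, so the same plan is offered to M and Str as a sequence of candidate outputs distributed in time.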

 

  • Striatum (Str) and GPi

 

(44)

 
The Str is composed of two subsections, which can be considered analogous to the “patch” and the “matrix” of the striatum (Brown, Bullock and Grossberg, 1999). The section of the Str projecting to GPe is tonically active (with activation equal to 0.9), providing a “default” inhibition to GPe, which in turn disinhibits THAL and allows a motor output to be generated. The section of the Str projecting to GPi is instead governed by an AT mechanism of the kind described for the Dentate in the hippocampus. The activation of the Str obeys AT dynamics:


(45)

 
where the nth rate parameter scales the activation in Strn (analogously to Equation 32), determining the rate at which Strn grows and decays after stimulation from PM. The Str is stimulated by PM and in turn projects to GPi. The neurotransmitter from Str to GPi is modeled as:



(46)

 
where an accumulation term drives the neurotransmitter towards 1 and a depletion term removes it in proportion to the presynaptic activity gated by the transmitter itself.

 

(47)

 
where a Hebbian term increases the LTM weight when the Str is active and a DAergic spike occurs, and decreases it when a dip occurs. Finally, the activation of GPi obeys the following equation:


This set of equations ensures that a DAergic spike (dip) occurring at an arbitrary time after a Str neuron has been excited by PM reinforces (weakens) the Str → GPi pathway, with the net result of opening (closing) the THAL gate and allowing the release (blockage) of the motor output. 

 

  • Globus pallidus – external segment (GPe)

 

(48)

 

The GPe, the output nucleus of the BG, is tonically active, tonically inhibited by the tonically active section of the Str, and phasically inhibited by GPi.

 

  • Ventral Anterior/Ventral Lateral Thalamus (THAL)

 

Thalamic activation is simply defined by:


(49)

 
 


      where the constant in Equation (49) is set to 0.1. THAL is normally open unless GPe is excited, which in turn depends on internal BG dynamics.

 

  • Motor cortex (M)

 

(50)

 

Finally, motor cortex, the output stage of the system, is equal to:

 

This simple equation states that M will be active whenever 1) a PM activity is generated and 2) the THAL gate is open.
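Equation (50) is not reproduced here; one simple reading of the two conditions just stated is a multiplicative gate, sketched below under the assumption that both PM and THAL activities lie in [0, 1]. The function names are hypothetical.

```python
def motor_output(pm, thal):
    """M is active only when PM is active AND the THAL gate is open (both in [0, 1])."""
    return pm * thal


def run_motor(pm_trace, thal_trace):
    """Apply the gate sample by sample to aligned PM and THAL traces."""
    return [motor_output(p, g) for p, g in zip(pm_trace, thal_trace)]


pm_trace = [0.0, 1.0, 1.0, 0.0]
thal_trace = [1.0, 1.0, 0.0, 0.0]
m_trace = run_motor(pm_trace, thal_trace)
```

An open gate without PM activity, or PM activity with a closed gate, both leave M silent; only their conjunction releases the action.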

 
3.7  Simulation results

 

The following set of simulations aims to test the ability of the network described in the previous paragraphs to autonomously learn a contingency between a CS and a US, and subsequently to develop the ability to adaptively time the execution of a correct action. Furthermore, since the system is biologically inspired, the network elements should exhibit some properties of the neural substrates they are intended to model.

Figure 3.26: The network is presented with 20 CS-US pairings, followed by 30 CS-alone presentations (extinction trials). In the training trials, the CS is on from 1 to 50 msec and the US from 150 to 200 msec (ITI = 400 msec). In the extinction trials, the CS is presented for 50 msec, with ITI = 550 msec. Each trial therefore lasts 600 msec in both learning and extinction trials. Left column, from top to bottom: CS (blue) and US (red) presentation; weight from Pfc to VTA; VTA (blue) and NAC (red) activation; NAc activation alone (notice the different scale). Right column, from top to bottom: posterior cortex activation (blue = CS, red = US); anterior cortex (CS1 = blue, CS2 = magenta, US = red); weights between anterior and posterior cortex (only weights x1→y1, x1→y3 and x3→y3 are shown); CA3 activation; CA3 weights. In the figure, W 1=1, 1=3, 3=1 and 3=3 represent synaptic weights between the respective CA3 neurons.     

 

The experimental protocol is split into two sets of simulations. First, the network is presented with 20 CS-US pairings, followed by 30 CS-alone presentations (extinction trials). In the training trials, the CS is presented from 1 to 50 msec and the US from 150 to 200 msec, with an ITI of 400 msec. In the extinction trials, the CS is presented for 50 msec, with an ITI of 550 msec. Each trial therefore lasts 600 msec in both learning and extinction trials. The main results and network dynamics are shown in Figure 3.26. In the simulations, a second pathway (CS2) was included, although no CS2 was ever presented. This expedient was introduced to check for inconsistencies in the model, namely to check whether spurious learning could occur within this architecture.
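The input schedule of this first set of simulations can be generated as follows. The trial counts, the 600 msec trial length and the 50 msec CS and US durations come from the text; using 0-indexed, half-open windows ([0, 50) for the CS and [150, 200) for the US) is an assumption about how "1-50" and "150 to 200 msec" map onto simulation steps. The function name is hypothetical.

```python
TRIAL_MS = 600
N_TRAIN, N_EXT = 20, 30


def build_protocol(n_train=N_TRAIN, n_ext=N_EXT, trial_ms=TRIAL_MS):
    """Return CS and US input traces, one sample per simulated millisecond.

    Time within a trial is 0-indexed; the CS is on for t in [0, 50) and, in
    training trials only, the US is on for t in [150, 200).
    """
    cs, us = [], []
    for trial in range(n_train + n_ext):
        is_training = trial < n_train
        for t in range(trial_ms):
            cs.append(1.0 if t < 50 else 0.0)
            us.append(1.0 if is_training and 150 <= t < 200 else 0.0)
    return cs, us


cs, us = build_protocol()
```

The whole run therefore spans 50 trials × 600 msec = 30,000 simulated milliseconds, with the US absent from the last 30 trials.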

            As can be seen from the activation in anterior cortex, STM activation gradually develops as the learning trials proceed, analogously to what was shown in the model presented in Chapter 2. The difference is that, this time, STM maintenance is achieved autonomously, whereas in the previous model the VTA activation was artificially shaped in order to achieve STM storage and update. The activation of anterior cortex is shown in Figure 3.27.

Figure 3.27: From top to bottom: input, y1, y2 and y3 activation (anterior cortex, or Pfc). Notice that y2, corresponding to the (non-presented) CS2, is silent.   

 

            STM storage is achieved as a function of the dynamics of VTA. Because of the highly recurrent architecture of the system, which parallels that of biological neural systems, it is difficult to isolate which specific system or subsystem is “responsible” for a given functional property. Nevertheless, it is possible to tentatively point to some important structural and functional relationships that shape VTA activity in such a way as to allow STM storage in Pfc (Figures 3.28 and 3.29).

Figure 3.28: From top to bottom: input, VTA activation, NAC activation, net (rectified) VTA output. 

 

Figure 3.26 shows how the LTM weights from Pfc to VTA grow during CS-US pairings. This plasticity allows the CS, once presented, to learn to excite VTA through Pfc. Because synaptic plasticity is originally triggered by the US through an innate pathway, the CS acquires the properties of a US, as postulated by stimulus substitution theory (Pavlov, 1903, 1906, 1927, 1928).  

            STM is both learned (Figures 3.29 and 3.30) and extinguished (Figure 3.31). The mechanism of extinction can be readily related to the influence of the GABAergic projection from NAc to VTA. The NAc receives the AT output of the hippocampus, which is timed to the US. In learning trials, the inhibitory output of the NAc to VTA is counteracted by the US-dependent activation of VTA (Figure 3.29).

            In extinction, the output of the NAc is no longer opposed by US-dependent VTA activation, and the absence of the US unmasks the effect of the GABAergic input. The result is an AT inhibition of VTA, documented in Figure 3.32, which also causes Pfc to extinguish STM of the CS. The VTA inhibition causes a dip in DAergic transmission, which triggers plastic changes whose sign is opposite to that of the changes triggered by the DAergic spike occurring in learning. Notice, however, that with this experimental protocol and parameter choice the LTM weight from Pfc to VTA does not decay completely. This observation may be consistent with the introspective experience that, once a stimulus has been conditioned to a given US (a pleasurable or noxious stimulus), its “psychological” status is permanently changed, even though the output of the system does not necessarily reflect this permanent change.

Figure 3.29: Typical run in early training. From top to bottom: input (CS1 = blue, US = red), posterior cortex activation (x1(CS1) = blue, x2(US) = red), anterior cortex activation (y1(CS1) = blue, y2(US) = red), VTA activation (blue), NAC activation (red). Notice the STM activity in anterior cortex, and the peak of NAC activation at the time of the US. In early extinction trials, STM activity is still maintained due to the strong learned connections.

            Figure 3.32 shows an interesting comparison between the model VTA and the DAergic cell discharge profile obtained by Schultz et al. (1997) from VTA. When a US is delivered to the animal (or the model), dopamine neurons are activated by the unpredicted occurrence of the US. When the CS predicts the reward, after being paired with the US, the DAergic spike occurs at the time of the CS. In the model, the increased VTA activity occurs at the time of the CS, but residual activity is also present at the time of the US.

Figure 3.30: Details of a learning trial. From top to bottom: input (CS1 = blue, US = red), posterior cortex activation (x1(CS1) = blue, x2(US) = red), anterior cortex activation (y1(CS1) = blue, y2(US) = red), VTA activation (blue), NAC activation (red). Notice the STM maintenance in anterior cortex and the adaptively timed activation of NAC, which provides inhibition of VTA at the time of US presentation.

 

Figure 3.31: Late extinction trials. From top to bottom: input (CS1 = blue), posterior cortex activation (x1(CS1) = blue), anterior cortex activation (y1(CS1) = blue, y2(US) = red), VTA activation (blue), NAC activation (red). STM maintenance is greatly suppressed (except in one trial), as is VTA activation.

 

Figure 3.32: Comparison of the model VTA activation with the DAergic cell discharge profile obtained by Schultz et al. (1997). When a US is delivered to the animal (or the model), dopamine neurons are activated by the unpredicted occurrence of the US. When the CS predicts the reward, after being paired with the US, the DAergic spike occurs at the time of the CS. In the model, the increased activity of VTA occurs at the time of the CS, but residual activity is also present at the time of the US. Finally, in a trial in which only the CS is presented, a dip in activation occurs exactly at the time at which the US would normally be presented, in both the data and the model. The plots in this figure were obtained by adding a small random noise to the VTA activity (0 < noise < 0.5). This random noise represents the activity of other, non-modeled cellular stages, as well as an intrinsic noise of the system. 

Interestingly, in trials in which only the CS is presented (i.e., extinction trials), a dip in activation occurs exactly at the time at which the US would normally be presented, in both the data and the model. The plots in Figure 3.32 were obtained by adding a small random noise to the VTA activity (0 < noise < 0.5). This random noise stands for the activity of other, non-modeled cellular stages, as well as an intrinsic noise of the system that was not modeled in the simulations. 
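The noise procedure can be sketched as below. The text gives only the bounds of the noise, so drawing it independently and uniformly for each sample is an assumption; the VTA trace and the function name are hypothetical.

```python
import random


def add_vta_noise(vta, max_noise=0.5, seed=None):
    """Add independent uniform noise in [0, max_noise] to each VTA sample."""
    rng = random.Random(seed)
    return [v + rng.uniform(0.0, max_noise) for v in vta]


clean = [0.0] * 100 + [1.0] * 20 + [0.0] * 100  # hypothetical VTA trace
noisy = add_vta_noise(clean, seed=0)
```

Using a dedicated `random.Random` instance with a seed keeps the noisy plot reproducible without touching the global random state.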

           

            The second set of experiments illustrates how the motor and BG aspects of the model are intimately related to the concept of cognitive control. In the first set of simulations we have seen how a plan can be stored in Pfc. As a consequence, an action can now be triggered by frontal cortex through the connections from Pfc to primary and secondary motor cortex. Remember, however, that the behavioral task (Figure 3.23) requires an adaptively timed release of action. The main rationale behind this second set of experiments is illustrated in Figure 3.34. The main idea is that activation of a Pfc plan triggers a spectrum of PM activations (not explicitly modeled in the system) at different times after Pfc fires. Learning through the BG selectively weakens the output of those PM cells that are not appropriately timed with the “admissible” temporal range for performing an action, while allowing the performance of actions that are appropriately timed. Figure 3.35 shows an example of how the BG circuit can selectively weaken a PM representation that is not appropriately timed to obtain a reward. In Figure 3.34 the negative reward (DAergic dip), generated in the way seen in the first set of experiments, weakens the motor output consequent to that specific PM activation, as shown by the dip in Str activation. Figure 3.35 shows the behavior of the main areas involved. This situation should be contrasted with the one described in Figures 3.36 and 3.37, in which a reinforcement of the pathway leading to the correct response is shown. Figure 3.37 shows a PM response that is adaptively timed to the reward. This causes M to release the action and triggers LTM changes that open the THAL gate for this specific PM activation.

Figure 3.34. a) Activation of a Pfc plan triggers PM activation at different times. b) Learning through the BG circuit selectively weakens the output of those PM cells that are not appropriately timed with the “admissible” range for performing an action.

Figure 3.34. From top-left to bottom-right: PM activation; reward; activation of the spectrum in Str; transmitter from Str to GPi; Str*transmitter gated signal; Str*transmitter*LTM from Str to GPi doubly gated signal; net output from Str to GPi.

Figure 3.35. Left column, from top to bottom: M activation; Str activation; GPe. Right column, from top to bottom: THAL activation; GPi activation; PM activation. Motor cortex is inactive, and the action is not released.

Figure 3.36. From top-left to bottom-right: PM activation; reward; activation of the spectrum in Str; transmitter from Str to GPi; Str*transmitter gated signal; Str*transmitter*LTM from Str to GPi doubly gated signal; net output from Str to GPi.

Figure 3.37. Left column, from top to bottom: M activation; Str activation; GPe. Right column, from top to bottom: THAL activation; GPi activation; PM activation. Motor cortex is active, and the action is released.

            These simulations show how a mechanism of the same type as the one implemented in the hippocampus, namely AT, can be used in a context quite different from learning contingencies between CS and US. In this instantiation of the AT mechanism, the contingency to be learned is between a) a motor response and b) the consequence of that motor response. Since the consequence of a motor response can occur at an undetermined future moment after the motor action has been produced, a mechanism for bridging this temporal gap is required. The AT mechanism can therefore be seen as general-purpose machinery for solving a correspondingly general problem, namely bridging temporal gaps between neural representations.