Spike-Timing-Dependent Plasticity
and its Reward-Modulated Variant
Dejan Pecevski
Institute for Theoretical Computer Science
Graz University of Technology
Neural Networks B, April, 2009
Agenda
• Spike-Timing-Dependent Plasticity
  ◦ Experimental evidence
  ◦ Model of STDP
  ◦ Analysis of STDP: learning equation
  ◦ Extensions of the model to fit additional experimental data
  ◦ Functional role of STDP (hypotheses)
• Reward-modulated STDP
  ◦ Model of RM-STDP and the learning equation
  ◦ Modeling a biofeedback experiment
  ◦ Discrimination of temporal spike patterns with RM-STDP
Dejan Pecevski, Neural Networks B, April, 2009
Hebbian learning
• Synaptic changes are thought to be the neurochemical basis for learning and memory.
Hebb’s postulate:
“When an axon of cell A repeatedly or persistently takes part in firing cell B, then A’s efficiency as one of the cells firing B is increased” (Hebb, 1949)
• Local rule
• Driven by the correlations of the firing between cells
• Cells that are correlated form cell assemblies
Spike-timing-dependent plasticity
Experiment
• Two connected neurons are stimulated to fire at specific times t_pre (presynaptic neuron) and t_post (postsynaptic neuron).
• Such paired stimulations at t_pre and t_post are performed repeatedly, always with the same interval Δt = t_post - t_pre between the two spikes of a pair.
• After the repeated stimulation, the change of the EPSP amplitude is observed.
• The same procedure is repeated for different values of Δt to examine how the change of the EPSP depends on the magnitude and the sign of Δt.
Spike-timing-dependent plasticity
Results
• If the presynaptic neuron fires within a time window on the order of tens of milliseconds before the postsynaptic neuron, the EPSP increases: the synapse is strengthened (LTP).
• If the presynaptic neuron fires within a time window on the order of tens of milliseconds after the postsynaptic neuron, the EPSP decreases: the synapse is weakened (LTD).
Phenomenological model of STDP
• We denote by S_i(t) the spike train of neuron i.
• The STDP model is defined through a learning window function W(Δt).
• Each pair of a presynaptic and a postsynaptic spike contributes an instantaneous weight change W(Δt) that happens at the time of the later spike.
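A sketch of the usual definitions behind this model, assuming the standard pair-based formulation. The spike train is written as a sum of Dirac pulses at the firing times t_i^(f). The exponential shape of the window, the amplitudes A_+ and A_-, and the time constants τ_+ and τ_- are assumptions for illustration and are not taken from the slide.

```latex
\[
  S_i(t) = \sum_{f} \delta\bigl(t - t_i^{(f)}\bigr)
\]
\[
  W(\Delta t) =
  \begin{cases}
    A_+ \, e^{-\Delta t / \tau_+}, & \Delta t > 0 \\
    -A_- \, e^{\Delta t / \tau_-}, & \Delta t < 0
  \end{cases}
  \qquad \Delta t = t^{post} - t^{pre}
\]
```

With A_+, A_- > 0 this reproduces the sign convention of the experiment: pre-before-post pairings (Δt > 0) potentiate, post-before-pre pairings (Δt < 0) depress.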
Model of STDP
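• The weight change can be expressed in terms of the presynaptic and postsynaptic spike trains and the learning window W.
• Substituting the spike trains of the presynaptic and the postsynaptic neuron, we can calculate the total weight change in a time interval [0,T] as a sum of W(Δt) over all pairs of presynaptic and postsynaptic spikes that lie within the interval [0,T].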
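The sum over spike pairs can be sketched in a few lines of Python. This is a minimal illustration, assuming an exponential learning window; the function names and all parameter values (a_plus, a_minus, tau_plus, tau_minus) are hypothetical and chosen only for the example.

```python
import math

def stdp_window(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Assumed exponential learning window W(dt), with dt = t_post - t_pre in ms."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)    # pre before post: potentiation
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)  # post before pre: depression
    return 0.0

def total_weight_change(pre_spikes, post_spikes, t_end):
    """Sum W(t_post - t_pre) over all spike pairs that lie within [0, t_end]."""
    total = 0.0
    for t_pre in pre_spikes:
        if not 0.0 <= t_pre <= t_end:
            continue
        for t_post in post_spikes:
            if 0.0 <= t_post <= t_end:
                total += stdp_window(t_post - t_pre)
    return total

# Pairing protocol: three pairings with a fixed interval of 10 ms.
pre = [10.0, 110.0, 210.0]
post = [20.0, 120.0, 220.0]
dw_ltp = total_weight_change(pre, post, 300.0)  # pre leads post
dw_ltd = total_weight_change(post, pre, 300.0)  # roles swapped: post leads pre
```

Swapping the roles of the two spike trains flips the sign of every Δt, so the same protocol yields net potentiation in one direction and net depression in the other.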
STDP model by local variables
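[Figure: presynaptic neuron j (spike times t_j^(f), trace x_j^pre) connected through the weight w_ij to postsynaptic neuron i (spike times t_i^(g), trace y_i^post)]
• The presynaptic trace x_j^pre is updated with every presynaptic spike.
• The postsynaptic trace y_i^post is updated with every postsynaptic spike.
At a postsynaptic spike the weight change is proportional to the presynaptic trace x_j^pre, and at a presynaptic spike it is proportional to the postsynaptic trace y_i^post.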
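The local-variable formulation can be sketched as an event-driven loop. This is a minimal illustration, assuming exponentially decaying traces that jump by 1 at each spike; the function name, the parameter values and the unit jump size are assumptions, not taken from the slide.

```python
import math

def run_trace_stdp(pre_spikes, post_spikes, w0=0.5,
                   a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Online STDP with one presynaptic and one postsynaptic trace.

    Spike times are in ms and are assumed to be distinct across both trains.
    """
    events = sorted([(t, "pre") for t in pre_spikes] +
                    [(t, "post") for t in post_spikes])
    x_pre = 0.0   # presynaptic trace, jumps at presynaptic spikes
    y_post = 0.0  # postsynaptic trace, jumps at postsynaptic spikes
    w = w0
    t_last = None
    for t, kind in events:
        if t_last is not None:
            # Let both traces decay over the time since the previous event.
            gap = t - t_last
            x_pre *= math.exp(-gap / tau_plus)
            y_post *= math.exp(-gap / tau_minus)
        if kind == "pre":
            w -= a_minus * y_post  # depression, proportional to the post trace
            x_pre += 1.0
        else:
            w += a_plus * x_pre    # potentiation, proportional to the pre trace
            y_post += 1.0
        t_last = t
    return w

w_after = run_trace_stdp([10.0], [20.0])  # single pre-before-post pairing
```

With unit jumps and these decay constants, the traces accumulate contributions from all earlier spikes, so the result matches an all-to-all sum over spike pairs with an exponential window.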
A biophysical model of STDP
• Arrival of an action potential at the presynaptic terminal induces release of the neurotransmitter glutamate into the synaptic cleft.
• Glutamate binds to AMPA and NMDA receptors.
• AMPA receptors open, but NMDA receptors are blocked by Mg2+.
A biophysical model of STDP
• Depolarization of the postsynaptic membrane removes the Mg2+ block of the NMDA receptors, allowing influx of Ca2+.
• The depolarization is caused by a back-propagating action potential that travels up the dendrites.
• The change of Ca2+ concentration in the postsynaptic cell triggers processes that change the synaptic efficacy.
Different STDP windows for different types of cells
Analysis of STDP
We assume that the presynaptic and postsynaptic spike trains are stochastic processes drawn from a stochastic ensemble E.
We compute the expected weight change over some time interval [0,T], using the temporal average over that interval.
The quantities that enter are:
• the instantaneous firing rate of neuron i
• the correlations of the firing of neurons i and j
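A hedged reconstruction of how these quantities fit together, assuming the pair-based model with Δt = t_post - t_pre, with i the postsynaptic and j the presynaptic neuron. The symbols ν_i, C_ij and the overline are notation chosen here for illustration. Boundary effects are neglected, which assumes that T is much longer than the width of the learning window.

```latex
\[
  \nu_i(t) = \bigl\langle S_i(t) \bigr\rangle_E ,
  \qquad
  C_{ij}(s; t) = \bigl\langle S_i(t)\, S_j(t - s) \bigr\rangle_E ,
  \qquad
  \overline{f} = \frac{1}{T} \int_0^T f(t)\, dt
\]
\[
  \frac{\langle \Delta w_{ij} \rangle_E}{T}
  \;\approx\; \int_{-\infty}^{\infty} W(s)\, \overline{C_{ij}}(s)\, ds
\]
```

For statistically independent spike trains the correlation factorizes into a product of the two firing rates, so the sign of the expected weight change is then set by the integral of W.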
Analysis of STDP
• The weight changes depend on the correlations between the inputs and the output of a neuron.
• The output of the neuron depends on the input.
• For simple neuron models (e.g. the linear Poisson neuron model), it is possible to derive the weight changes analytically, given the statistics of the inputs.
• If the input is weakly correlated with the output, the weight decreases.
• Groups of inputs that are strongly correlated (and drive the neuron) are strengthened.
• STDP selects inputs which are correlated on the timescale of the learning window and the postsynaptic potential.
Weight dependence of STDP
• Data from (Bi and Poo, 1998)
• The amount of change of the EPSC amplitude depends on the initial EPSC amplitude.
• The dependency is different for positive and negative spike pairings.
• Model from Gütig et al. 2001: smooth transition between an additive and a weight-dependent STDP rule.
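One way to picture the smooth transition is a single exponent mu that interpolates between the two regimes. The sketch below assumes the commonly quoted form of this model, with potentiation scaled by (1 - w)^mu and depression scaled by alpha * w^mu for weights normalized to [0, 1]. The function name and the values of lam, alpha and tau are illustrative assumptions.

```python
import math

def weight_dependent_update(w, dt, mu, lam=0.01, alpha=1.05, tau=20.0):
    """Return the weight after one spike pairing, for a weight w in [0, 1].

    mu = 0 gives an additive rule (update independent of w);
    mu = 1 gives a multiplicative, fully weight-dependent rule.
    dt = t_post - t_pre in ms.
    """
    if dt > 0:
        dw = lam * (1.0 - w) ** mu * math.exp(-dt / tau)       # potentiation
    elif dt < 0:
        dw = -lam * alpha * w ** mu * math.exp(dt / tau)       # depression
    else:
        dw = 0.0
    return min(1.0, max(0.0, w + dw))  # keep the weight inside [0, 1]

# Potentiation of a weak and a strong synapse under both regimes.
additive_weak = weight_dependent_update(0.2, 10.0, mu=0.0) - 0.2
additive_strong = weight_dependent_update(0.8, 10.0, mu=0.0) - 0.8
multiplicative_weak = weight_dependent_update(0.2, 10.0, mu=1.0) - 0.2
multiplicative_strong = weight_dependent_update(0.8, 10.0, mu=1.0) - 0.8
```

In the additive case both synapses gain the same amount, while in the multiplicative case the strong synapse gains less, which is the qualitative weight dependence described above.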
Dependence on pairing frequency
• 15 bursts of 5 spikes at different frequencies are induced in the neurons with
  ◦ pre-before-post spike pairing at Δt = +10 ms
  ◦ post-before-pre spike pairing at Δt = -10 ms
[Figure: panels for Δt = +10 ms and Δt = -10 ms, from Sjöström et al. 2001]
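Triplet rule (Pfister et al. 2006)
[Figure: presynaptic neuron j (spike times t_j^pre), weight w_ij, and postsynaptic neuron i (spike times t_i^post) with traces y^post and y^post2, where y^post2 is the second postsynaptic trace]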
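A minimal sketch of a triplet rule with a second postsynaptic trace. It assumes the reduced variant in which depression stays pair-based and potentiation receives an extra term scaled by the slow postsynaptic trace. The presynaptic trace x_pre, the function name and all parameter values are assumptions for illustration.

```python
import math

def run_triplet_stdp(pre_spikes, post_spikes, w0=0.5,
                     a2_plus=0.005, a3_plus=0.006, a2_minus=0.007,
                     tau_plus=17.0, tau_minus=34.0, tau_y=125.0):
    """Event-driven triplet STDP; spike times in ms, assumed distinct."""
    events = sorted([(t, 0) for t in pre_spikes] + [(t, 1) for t in post_spikes])
    x_pre = 0.0    # presynaptic trace
    y_post = 0.0   # fast postsynaptic trace (drives depression)
    y_post2 = 0.0  # slow, second postsynaptic trace (triplet term)
    w = w0
    t_last = None
    for t, is_post in events:
        if t_last is not None:
            gap = t - t_last
            x_pre *= math.exp(-gap / tau_plus)
            y_post *= math.exp(-gap / tau_minus)
            y_post2 *= math.exp(-gap / tau_y)
        if is_post:
            # y_post2 is read before its own increment, so it only carries
            # the contribution of earlier postsynaptic spikes.
            w += x_pre * (a2_plus + a3_plus * y_post2)
            y_post += 1.0
            y_post2 += 1.0
        else:
            w -= a2_minus * y_post
            x_pre += 1.0
        t_last = t
    return w

# Same pre-post pairing, with and without an earlier postsynaptic spike.
w_pair = run_triplet_stdp([10.0], [20.0], a2_minus=0.0)
w_triplet = run_triplet_stdp([10.0], [5.0, 20.0], a2_minus=0.0)
```

Because the slow trace is still elevated when postsynaptic spikes follow each other closely, potentiation grows with the postsynaptic firing frequency, which a purely pair-based rule cannot capture.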
Triplet rule (Pfister et al. 2006)
• Reproduces experimental data on the dependence of STDP on pairing frequency (Sjöström et al. 2001)
• Reproduces other experimental data involving protocols with triplets of spikes (Wang et al. 2005)
Functional role of STDP (hypotheses)
• Formation of memories
• Developmental learning: receptive field development
• Stabilization of network activity: prevents the synaptic weights from growing without bound (Abbott et al. 2000)
• Reward-based learning: neuromodulation with a reward signal (dopamine)
Formation of memories
• Synaptic changes are sensitive to input-output correlations (Hebb, 1949).
• STDP explains the induction of LTP and LTD, but not their maintenance.
• Stability-plasticity dilemma
  ◦ formation of new memories should be possible, but
  ◦ memories need to be retained and stable
• Long-term synaptic changes happen in two phases:
  ◦ induction (tagging), on the order of seconds, e.g. by pairing protocols
  ◦ consolidation, which takes more than 60 min
Receptive field development
[Figure: input rates ν_in and output rate ν_out, from Pfister et al. 2006]
• The rates at the 100 inputs follow a Gaussian profile.
• The center of the Gaussian shifts randomly every 200 s.
• The triplet STDP rule is used in the synapses.
• The neuron becomes selective to one position of the Gaussian profile.
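The input protocol can be sketched as a function that returns one firing rate per input channel. The peak rate, baseline rate, profile width and the circular (wrap-around) distance are all assumptions made for this illustration; only the number of inputs and the 200 s shift interval come from the slide.

```python
import math
import random

def gaussian_rate_profile(center, n_inputs=100, peak_rate=50.0,
                          base_rate=1.0, width=10.0):
    """Firing rates (Hz) of the inputs for a Gaussian bump at index `center`."""
    rates = []
    for j in range(n_inputs):
        d = abs(j - center)
        d = min(d, n_inputs - d)  # circular distance (assumed wrap-around)
        rates.append(base_rate + peak_rate * math.exp(-d * d / (2.0 * width ** 2)))
    return rates

# A new, randomly chosen center would be drawn every 200 s of simulated time.
rng = random.Random(0)
centers = [rng.randrange(100) for _ in range(5)]
profiles = [gaussian_rate_profile(c) for c in centers]
```

Each profile would then drive independent spike generators, one per input channel, for the duration of the 200 s segment.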
Reward-Modulated STDP
Reward-modulated STDP
• Synaptic changes depend on a reward signal.
• Based on the experimentally found influence of neuromodulators like dopamine on LTD and LTP (Izhikevich, 2007)
  ◦ Dopamine enables or enhances synaptic plasticity
• Link between
  ◦ local synaptic changes on the microscopic level (STDP)
  ◦ behaviorally relevant adaptive changes on the macroscopic level that increase the reward signal
Influence of Dopamine on Plasticity
• The activity of dopaminergic neurons codes the reward-prediction error (Schultz et al. 2002).
• The dopamine (DA) signal is thought to carry the reward in reward-modulated STDP.
Model of Reward-modulated STDP
• The model is from (Izhikevich, 2007).
• Weight changes by STDP are collected in an eligibility trace.
• The actual weight changes are triggered by a reward signal d(t).
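A minimal time-stepped sketch of this scheme. It assumes that STDP contributions are added to an exponentially decaying eligibility trace c, and that the weight changes at a rate c(t) * d(t). The function name, the time step, the trace-based STDP and all parameter values are assumptions for illustration.

```python
import math

def simulate_rm_stdp(pre_steps, post_steps, reward, dt=1.0, w0=0.5,
                     a_plus=0.01, a_minus=0.012,
                     tau_plus=20.0, tau_minus=20.0, tau_c=1000.0):
    """Reward-modulated STDP; spikes are given as step indices, reward as d[k]."""
    pre_steps, post_steps = set(pre_steps), set(post_steps)
    x_pre = y_post = 0.0  # STDP traces
    c = 0.0               # eligibility trace
    w = w0
    for k, d in enumerate(reward):
        x_pre *= math.exp(-dt / tau_plus)
        y_post *= math.exp(-dt / tau_minus)
        c *= math.exp(-dt / tau_c)
        is_pre, is_post = k in pre_steps, k in post_steps
        # STDP updates go into the eligibility trace, not into the weight.
        if is_post:
            c += a_plus * x_pre
        if is_pre:
            c -= a_minus * y_post
        if is_pre:
            x_pre += 1.0
        if is_post:
            y_post += 1.0
        # The weight only moves while the reward signal is non-zero.
        w += c * d * dt
    return w

n_steps = 1000
pulse = [1.0 if 500 <= k < 510 else 0.0 for k in range(n_steps)]
w_rewarded = simulate_rm_stdp([10], [20], pulse)             # pairing, later reward
w_unrewarded = simulate_rm_stdp([10], [20], [0.0] * n_steps)  # pairing, no reward
```

The slow decay of the eligibility trace is what lets a reward arriving hundreds of milliseconds after the pairing still change the weight.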
Theoretical Analysis of Weight Changes
• From (Legenstein, Pecevski, Maass, 2008)
• We treat the presynaptic and postsynaptic spike trains and the reward signal as stochastic processes.
• We derive expected weight changes over some time interval T, taken over
  ◦ stochastic realizations of the presynaptic and postsynaptic spike trains
  ◦ stochastic realizations of the reward signal
  denoted by the ensemble average 〈.〉E
• In addition, we use the temporal average of a signal f(t) over the interval T.
Theoretical Analysis of Weight Changes
• The learning equation for reward-modulated STDP expresses the expected weight change through two quantities: the average of the reward after a pre-postsynaptic spike pair, and the correlations of the spike timings between neurons j and i.
• Weight changes are driven by co-occurrences of rewards and spike pairings within the time scale of the eligibility kernel function.
Biofeedback Experiment
Biofeedback Experiment by Fetz and Baker
[Fetz and Baker, 1973]
• The spiking activity of a single neuron in monkey motor cortex was recorded.
• The current firing rate was made visible to the monkey in the form of an illuminated meter.
• The monkey received liquid rewards for high firing rates.
• The monkey learnt (within tens of minutes) to change the firing rate accordingly.
Model of the experiment
• As a model we consider a recurrent neural circuit in which the spiking activity of one neuron k is rewarded.
• A reward pulse of fixed shape is delivered to all synapses with a delay d_r every time the reinforced neuron produces an action potential.
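The reward signal described here is a sum of delayed pulses, one per spike of the reinforced neuron. The sketch below assumes an alpha-function pulse; the kernel shape, its time constant, the delay value and the function names are hypothetical, since the slide does not preserve them.

```python
import math

def reward_kernel(s, tau=50.0):
    """Hypothetical pulse shape: alpha function with peak 1 at s = tau (ms)."""
    if s < 0:
        return 0.0  # causal: no reward before the pulse starts
    return (s / tau) * math.exp(1.0 - s / tau)

def reward_signal(t, spikes_k, delay=200.0, kernel=reward_kernel):
    """d(t): one kernel per spike of neuron k, shifted by the reward delay."""
    return sum(kernel(t - delay - t_f) for t_f in spikes_k)

# A spike at t = 0 ms produces no reward until the delay has passed.
d_early = reward_signal(100.0, [0.0])
d_peak = reward_signal(250.0, [0.0])
```

Because pulses from successive spikes add up, a higher firing rate of the reinforced neuron translates into a larger reward signal at every synapse.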
Theoretical predictions
• A linear Poisson neuron model is used in the analysis.
• Expected weight change for the reinforced neuron: the weights change according to STDP with a constant learning rate.
• Expected weight change for the other neurons: the weights change according to STDP with a learning rate proportional to the correlation with the reinforced neuron.
Simulation of the Biofeedback Experiment
• We simulated a recurrent circuit with 4000 LIF neurons.
• Spontaneous activity of 4.6 Hz was induced by injection of Ornstein-Uhlenbeck noise.
• The circuit had 228954 conductance-based synapses with short-term dynamics.
• Reward-modulated STDP was applied to all 142813 excitatory-to-excitatory synapses.
• The 80 synapses of the reinforced neuron self-organize to increase the firing rate of the neuron.
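For reference, an Ornstein-Uhlenbeck process of the kind used as background noise can be generated with an exact discrete-time update. This is a generic sketch; the function name and the values of the time step, time constant, mean and standard deviation are assumptions and do not reproduce the settings of the simulation described above.

```python
import math
import random

def ou_noise(n_steps, dt=0.1, tau=5.0, mean=0.0, sigma=1.0, seed=0):
    """Sample an Ornstein-Uhlenbeck process with stationary std `sigma`.

    Uses the exact update for a step of length dt, so the statistics do not
    depend on the step size.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)
    kick = sigma * math.sqrt(1.0 - decay * decay)
    x = mean
    samples = []
    for _ in range(n_steps):
        # Relax toward the mean, then add a correctly scaled Gaussian kick.
        x = mean + (x - mean) * decay + kick * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

noise = ou_noise(1000)  # e.g. one noise trace to inject into a single neuron
```

Unlike white noise, the samples are correlated over the time scale tau, which gives the injected input a smooth, temporally structured fluctuation.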
Simulation Results
The firing rate of the reinforced neuron increases from 4 to 11 Hz.
The average firing rate of 20 other neurons remains unchanged.
[Figure: firing rates of the reinforced neuron and of the other neurons]
Pattern Discrimination with
Reward-Modulated STDP
Model for Theoretical Analysis
• Two patterns, P and N, with one spike per input channel are presented to a neuron.
• The reward signal depends on the presented pattern: a parameter of the reward signal takes one value for pattern P and a different value for pattern N.
Theoretical Predictions
• A linear Poisson neuron model is used in the analysis.
• We estimate the expected weight change of synapse i for the presentation of pattern P, followed after time T’ by a presentation of pattern N.
Result of the analysis:
The variance of the membrane potential of the neuron is increased for the positive pattern, and decreased for the negative pattern.
Simulations of the Model
Experimental setup:
• LIF neuron with 100 afferents
• Patterns of 500 ms duration
• Randomly drawn spike times for the patterns
[Figure: Vm(t) before learning and after learning, shown without threshold and with threshold]
Results:
• Var[Vm] and the number of spikes for P increase.
• Var[Vm] and the number of spikes for N decrease.
Training a Readout Neuron to Recognize
Isolated Spoken Digits
[Figure: waveforms of the digits “one” and “two” and their spike encoding by BSA]
• 20 different utterances of the digits “one” and “two”
• Raw waveforms were transformed by a model of the cochlear hair cells [Verstraeten et al., 2005].
• The analog output signals were encoded as spikes with the BSA algorithm [Schrauwen and Van Campenhout, 2003].
• Cortical microcircuit model of 560 LIF neurons with noise
• Spiking readout (LIF) neuron connected to all excitatory neurons in the circuit
Results
• Strong decrease of the number of spikes for digit “one”
• Slight increase of the number of spikes for digit “two”
• Increase of the variance of Vm(t) for digit “two” utterances
• Decrease of the variance of Vm(t) for digit “one” utterances
[Figure: panels before learning and after learning]