A Theory for Cerebral Neocortex D. Marr Proceedings

A Theory for Cerebral Neocortex
D. Marr
Proceedings of the Royal Society of London. Series B, Biological Sciences, Vol. 176, No. 1043.
(Nov. 3, 1970), pp. 161-234.
Stable URL:
http://links.jstor.org/sici?sici=0080-4649%2819701103%29176%3A1043%3C161%3AATFCN%3E2.0.CO%3B2-4
Proceedings of the Royal Society of London. Series B, Biological Sciences is currently published by The Royal Society.
Proc. Roy. Soc. Lond. B. 176, 161-234 (1970)
Printed in Great Britain
A theory for cerebral neocortex
BY D. MARR
Trinity College, Cambridge
(Communicated by G. S. Brindley, F.R.S. - Received 2 March 1970)

0. Introduction
0.1. The form of a neurophysiological theory
0.2. The nature of the present general theory
0.3. Outlines of the present theory
0.4. Definitions and notation
0.5. Information measures
1.0. Introduction
1.1. Information theoretic redundancy
1.2. Concept formation and redundancy
1.3. Problems in spatial redundancy
1.4. The recoding dilemma
1.5. Biological utility
1.6. The fundamental hypothesis
2. The fundamental theorems
2.0. Introduction
2.1. Diagnosis: generalities
2.2. The notion of evidence
2.3. The diagnosis theorem
2.4. Notes on the diagnosis theorem
2.5. The interpretation theorem
3. The codon representation
3.1. Simple synaptic distributions
3.2. Quality of evidence from codon functions
4. The general neural representation
4.0. Introduction
4.1. Implementing the diagnosis theorem
4.2. Codon functions for evidence
4.3. Codon neurotechnology
4.4. Implementing the interpretation theorem
4.5. The full neural model for diagnosis and interpretation
5.1. Setting up the neural representation: sleep
5.2. The spatial recognizer effect
5.3. The refinement of a classificatory unit
6. Notes on the cerebral neocortex
6.0. Introduction
6.1. Codon cells in the cerebral cortex
6.2. The cerebral output cells
6.3. Cerebral climbing fibres
6.4. Inhibitory cells
6.5. Generalities
7.0. Introduction
7.1. Martinotti cells
7.2. Cerebral granule cells
7.3. Pyramidal cells
7.4. Climbing fibres
7.5. Other short axon cells

It is proposed that the learning of many tasks by the cerebrum is based on using a very few
fundamental techniques for organizing information. It is argued that this is made possible
by the prevalence in the world of a particular kind of redundancy, which is characterized
by a 'Fundamental Hypothesis'.
This hypothesis is used to found a theory of the basic operations which, it is proposed, are
carried out by the cerebral neocortex. They involve the use of past experience to form
so-called 'classificatory units' with which to interpret subsequent experience. Such
classificatory units are imagined to be created whenever either something occurs frequently
in the brain's experience, or enough redundancy appears in the form of clusters of slightly
differing inputs.
A (non-Bayesian) information theoretic account is given of the diagnosis of an input as an
instance of an existing classificatory unit, and of the interpretation as such of an incompletely
specified input. Neural models are devised to implement the two operations of diagnosis and
interpretation, and it is found that the performance of the second is an automatic consequence
of the model's ability to perform the first.
The discovery and formation of new classificatory units is discussed within the context of
these neural models. It is shown how a climbing fibre input (of the kind described by Cajal)
to the correct cell can cause that cell to perform a mountain-climbing operation in an
underlying probability space, that will lead it to respond to a class of events for which it is
appropriate to code. This is called the 'spatial recognizer effect'.
The structure of the cerebral neocortex is reviewed in the light of the model which the
theory establishes. It is found that many elements in the cortex have a natural identification
with elements in the model. This enables many predictions, with specified degrees of firmness,
to be made concerning the connexions and synapses of the following cortical cells and fibres:
Martinotti cells; cerebral granule cells; pyramidal cells of layers III, V and II; short axon
cells of all layers, especially I, IV and VI; cerebral climbing fibres and those cells of the
cortex which give rise to them; cerebral basket cells; fusiform cells of layers VI and VII.
It is shown that if rather little information about the classificatory units to be formed has
been coded genetically, it may be necessary to use a technique called codon formation to
organize structure in a suitable way to represent a new unit. It is shown that under certain
conditions, it is necessary to carry out a part of this organization during sleep. A prediction
is made about the effect of sleep on learning of a certain kind.
0.1. The form of a neurophysiological theory
The mammalian cerebral neocortex can learn to perform a wide variety of tasks, yet
its structure is strikingly uniform (Cajal 1911). It is natural to wonder whether this
uniformity reflects the use of rather few underlying methods of organizing information.
The present paper rests on the belief that this is so, and describes a kind of
analysis which is capable of serving many aspects of the brain's function. The theory
is necessarily general, but it in principle allows the exact form of the analysis for any
particular cerebral task to be computed.
There is an analogy between the shape of the general theory set out here, and that
of a recent theory of cerebellar cortex (Marr 1969). The essence of the latter theory
was a principle, that motor sequences are driven by learned contexts, which was
clearly applicable to the kind of function with which the cerebellum was thought to
be associated. The key ideas concerned the way information was stored, and the
way stored information could be used; but the theory did not explicitly demonstrate
how any particular motor action was learned. For this, it would be necessary to have
a much fuller understanding of the nature of the elemental movements for which the
Purkinje cells actually code, and of the information present in the relevant mossy
fibres. The theory was however useful, because it postulated the existence of a
'fundamental operation' of the cerebellar cortex, and offered a candidate for it.
The present theory is once removed from the description of any task the cerebrum
might perform, in the same way as was the cerebellar theory from the description
of any particular motor action.
Something of this kind is probably an inevitable feature of the theory of any
interesting learning machine, but in the particular case of the cerebral cortex, it is
likely there exists a second, more concrete analogy between its working, and that
of the cerebellar cortex. The evidence for this is the analogy between the structures
of the two types of cortex. The cerebral cortex is of course irregular and very
complicated, but there do exist similarities between it and the cerebellar cortex:
the fundamental cerebellar components--the granule cells, Purkinje cells, parallel
fibres, climbing fibres, basket cells and so on--have recognizable counterparts in
the cerebral cortex. In view of the great power the codon representation possesses
for the economical storage of information (Marr 1969), it cannot be that this analogy
is accidental. There must be a deeper correspondence.
0.2. The nature of the present general theory
It was the suspicion that there may exist deep reasons for these similarities that
formed the starting point of the present enquiry. The motivation for the development
of the theory was provided by two intuitions. The first was that in the generalization
of the basic cerebellar circuit, the analogue of the Purkinje cell (called an
output cell) need not have a fixed 'meaning'. In the cerebellum, each Purkinje cell
probably has predetermined 'meanings', in that the responses its outputs can
evoke are likely to be determined by embryological and early post-natal
development. In a more general application of this kind of model, it is clear that
what the output cell 'means' might be free to be determined by some aspect of the
structure of the information for which the system is being used.
The second intuition was that the codon representation, in the kind of model
applicable to the cerebellar cortex, may in fact be capable of doing more than the
simple memorizing task to which it can obviously be applied (Blomfield & Marr
1970). This feeling was tied to the idea that the recognition of a learned input ought
properly to be viewed as a process of diagnosing whether the current input belonged
to the class of learned inputs. This immediately suggests that the behaviour of an
output cell should not be an all-or-none affair, but should convey a measure
of how certain is the outcome of the diagnostic process. This has the attraction that
it could ultimately correspond to how 'like' a tree is the object at which one is
currently looking.
These two ideas were bound by the constraint that more or less whatever theory
was set up, it had to be grounded in information theory; or if not, firm reasons why
this is undesirable must be given. It was evident from the start that no very orthodox
information theoretic approach would be of any use; but the general ideas behind
the formulation of an information measure are so powerful that it would have been
surprising had they turned out to be totally irrelevant.
The result of these ideas was a general theory which divides neatly into two parts.
The first, with which this paper is concerned, describes the formation and operation
of a language of so-called classificatory units by means of which the sensory input
can eventually be usefully interpreted (§1). The formation of a classificatory unit is
imagined to occur roughly whenever enough related inputs happen to make it
worth forming a special description for them. The main results are the information
theoretic theorems of §2 on the diagnosis and interpretation of an input within a
class, and the theory of §5 for class formation. The power of these results is that
they lead to specific neural models, and to operations in those models, through which
a preliminary interpretation of the histology of cerebral cortex can usefully be
made.
The first part of the theory may therefore be described as a model for concept
formation and recognition, where concepts are 'classificatory units'. It argues that
there exists a basic information-handling scheme which is applied by the cerebral
cortex to a wide range of different kinds of information--that there exists a 'way'
in which the cerebral cortex 'works'. This scheme has a wide application, subject to
reservations about the need in certain circumstances for special coding devices to
cope with particular forms of redundancy. But in principle, it can be applied to
anything from the recognition of a tree to the recognition of the necessity to take a
particular course of action.
The theorems of §2 provide a complete analysis of the problem of interpreting an
input within a particular class, but the ideas of §5 provide only a partial analysis of
the formation of the classes themselves. This problem cannot be dealt with using only
the hardware developed in this paper; and its solution requires the results of the
second part of the general theory.
The second part of the theory embodies a second pair of ideas. One of these also
arises from the cerebellar theory, where it was seen that a codon representation is
extremely successful at straight memorizing tasks (Brindley 1969; Marr 1969). The
other is the everyday concept of an associative memory. The cerebellar theory is a
kind of associative memory theory, and it is not difficult to extend the idea of the
codon representation to the case of a general associative memory. This is developed
in the theory of Simple Memory (Marr 1971). Once this has been done, it is possible
to see how current descriptions of the environment can be stored, and recalled by
addressing them with small parts of such inputs. This is the facility needed to
complete the theory of the formation of classificatory units. It is, however, only a
small part of the use to which such a device can be put: almost the entire theory of
the analysis of temporally extended events, and of the execution ab initio of a
sequence of movements, rests upon such a mechanism. Though simple, it is important
(and long) enough to warrant a separate development, and is therefore
expounded elsewhere, together with the theory of archicortex to which it gives rise.
0.3. Outlines of the present theory
This paper starts with a discussion of the kind of analysis of sensory information
which the brain must perform. The discussion has two main strands: the structure of
the relationships which appear in the afferent information; and the usefulness to the
organism of discovering them. These two ideas are combined by the 'Fundamental
Hypothesis' of §1.6 which asserts the existence and prevalence in the world of a
particular kind of relationship. This forms an explicit basis for the subsequent
theoretical development of classificatory units as a way of exploiting these
relationships. The fundamental hypothesis is a statement about the world, and asserts,
roughly speaking, that the world tends to be redundant in a particular way. The
subsequent theory is based, roughly, on the assumption that the brain runs on this
redundancy.
The second section contains the fundamental theorems about the diagnosis and
interpretation of events within a class. It assumes that the classes have been set up,
and studies the way in which they allow subsequent incoming information to be
interpreted. These theorems receive their neural implementation in the model of
figure 8.
The rest of the paper is closely tied to the examination of specific neural models.
After the technical statistics of §3, the main section, §4, on the fundamental neural
models appears. This discusses the structures necessary for the implementation of
the basic theorems, and derives explicitly those models which for various reasons
seem preferable to any others. The first main result of the paper consists in the
demonstration that the two theorems of §2 correspond to closely related operations
in the basic neural model.
The second main result concerns the operations involved in the discovery of new
classificatory units. It shows how a climbing fibre enables a cortical pyramidal cell
to discover a cluster in the space of events which that cell receives. This result,
together with the previous ones which show how classificatory units work when
represented, completes the main argument of the paper.
Finally, in §6, the available knowledge of the structure of the cerebral cortex is
briefly reviewed, and parts of it interpreted within the models of §4. This section is
incomplete, both because of a lack of information, and because Simple Memory
theory allows the interpretation of other components; but it was thought better at
this stage to include a brief review than to say nothing. Far too little is known about
the structure of the cerebral cortex.
0.4. Definitions and notation
0.4.1. Time, $t$, is discrete, and runs through the non-negative integers
($t = 0, 1, 2, \ldots$). $t$ scarcely appears itself in the paper, but most of the objects
with which the theory deals are essentially functions of $t$.
0.4.2. An input fibre, or fibre, $a_i(t)$, is a function of time $t$ which has the value 0 or 1,
for each $i$, $1 \le i \le N$. $a_i(t) = 1$ will have the informal meaning that the fibre $a_i$
carries a signal, or 'fires' at time $t$. A signal is usually thought to correspond to a
burst of impulses in a real axon. The set of all input fibres is denoted by $A$, and the
set of all subsets of $A$ by $\mathfrak{A}$.
0.4.3. An input event, or event, on $A$ assigns to each fibre in $A$ the value 0 or 1.
Events are usually denoted by letters like $E$, $F$, and the value which the event $E$
assigns to the fibre $a_i$ is written $E(a_i)$, and equals 0 or 1 ($1 \le i \le N$). It is convenient
to allow the following slight abuse of notation: $E$ can also stand for the set of $a_i$ which
have $E(a_i) = 1$. The phrase '$a_i$ in $E$' therefore means that $E(a_i) = 1$, i.e. that the
fibre $a_i$ fires during the event $E$.
0.4.4. A subevent on $A$, usually denoted by letters like $X$, $Y$, assigns the value 0 or 1
to a subset of the fibres $a_1, \ldots, a_N$. For example, if
$$X(a_i) = 1 \quad (1 \le i \le r), \qquad X(a_i) = 0 \quad (r < i \le s),$$
and $X(a_i)$ is undefined for $i > s$, then $X$ is a subevent on $A$. As in the case of full events,
$X$ can also mean the set of fibres $a_i$ for which $X(a_i) = 1$: in the example therefore,
$X$ can stand for the set $\{a_1, \ldots, a_r\}$.
0.4.5. If $X$ is a subevent, the set of fibres to which $X$ assigns a value is called the
support of $X$, and is written $S(X)$. Thus in the above example, $S(X) = \{a_1, \ldots, a_s\}$.
0.4.6. A set of events is called an event space, and is denoted by letters like
$\mathfrak{E}$, $\mathfrak{F}$. A set of subevents is called a subevent space, and is denoted by letters
like $\mathfrak{X}$, $\mathfrak{Y}$.
0.4.7. Greek letters are usually reserved for probability distributions. The letter
$\lambda$, for example, often denotes the probability distribution induced over $\mathfrak{A}$ (the set of
all possible events on $a_1, \ldots, a_N$) by the input events. Thus $\lambda(E)$ is the number of
occurrences of the event $E$ divided by the total elapsed time. If, instead of
considering the whole of $A = \{a_1, \ldots, a_N\}$, attention is restricted to
$A' = \{a_1, \ldots, a_r\}$, then the space $\mathfrak{A}'$ of events on $A'$ corresponds to a set of
subevents on the original fibre set $A$. Every event in $\mathfrak{A}$ defines a unique event in
$\mathfrak{A}'$, obtained by ignoring the fibres $a_{r+1}, \ldots, a_N$. Thus the full distribution
$\lambda$ over $\mathfrak{A}$ induces a distribution $\lambda'$ over $\mathfrak{A}'$ obtained by looking only at
the fibres $a_1, \ldots, a_r$. $\lambda'$ is called the projection onto $\mathfrak{A}'$ of $\lambda$.
If $\mathfrak{X}$ is a subevent space, then the phrase '$\lambda'$ is the distribution induced over
$\mathfrak{X}$ by the input' refers to the $\lambda'$ induced from the full input probability distribution
$\lambda$ by projecting it onto $\mathfrak{X}$. If $\mathfrak{B}$ is any subset of $\mathfrak{A}$, then the restriction
$\lambda|\mathfrak{B}$ of $\lambda$ to $\mathfrak{B}$ is defined as follows:
$$(\lambda|\mathfrak{B})(E) = \lambda(E) \text{ when } E \text{ is in } \mathfrak{B}, \qquad (\lambda|\mathfrak{B})(E) = 0 \text{ elsewhere.}$$
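The definitions of §0.4 can be made concrete with a small sketch. The Python below is illustrative only and is not part of the original paper: the choice to represent an event as a frozenset of fibre indices, the function names (`support`, `active_set`, `empirical_distribution`, `project`, `restrict`) and the four-step input history are all assumptions made for the example.

```python
from collections import Counter

def support(subevent):
    """S(X): the fibres to which the subevent (a dict fibre -> 0 or 1) assigns a value."""
    return frozenset(subevent)

def active_set(subevent):
    """The set reading of X: the fibres a_i for which X(a_i) = 1."""
    return frozenset(f for f, value in subevent.items() if value == 1)

def empirical_distribution(history):
    """lambda(E): number of occurrences of E divided by the total elapsed time."""
    counts = Counter(history)
    total = len(history)
    return {event: count / total for event, count in counts.items()}

def project(dist, sub_fibres):
    """lambda': ignore every fibre outside sub_fibres and add up the probabilities."""
    sub_fibres = frozenset(sub_fibres)
    projected = {}
    for event, p in dist.items():
        key = event & sub_fibres
        projected[key] = projected.get(key, 0.0) + p
    return projected

def restrict(dist, event_space):
    """lambda|B: lambda(E) when E is in B, and 0 elsewhere (no renormalization)."""
    return {event: (p if event in event_space else 0.0) for event, p in dist.items()}

# Hypothetical history on N = 4 fibres: one event (set of firing fibres) per time step.
history = [frozenset({1, 2}), frozenset({1, 2}), frozenset({1, 3}), frozenset({4})]
lam = empirical_distribution(history)
lam_prime = project(lam, {1, 2})   # projection onto the events on A' = {a_1, a_2}

# Hypothetical subevent: X(a_1) = 1, X(a_2) = 0, undefined on a_3 and a_4.
X = {1: 1, 2: 0}
```

In this toy history the events {a_1, a_2} and {a_1, a_3} are distinct on the full fibre set, but after projection onto A' the second is seen only as {a_1}; the projected probabilities still sum to 1, whereas a restriction in general does not.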
0.4.8. Finally, it is often convenient to use various pieces of shorthand. The
following is a list of the abbreviations used.
$\{\ \mid\ \}$  is a method of defining a set. For example, $\{a_i \mid 1 \le i \le N\}$ means 'the set
    of fibres $a_i$ which satisfy the condition that $1 \le i \le N$',
s.t.  means 'such that',
$\in$  means 'is a member of the set': e.g. $a_i \in E$,
$\notin$  means 'not $\in$',
$P(X \mid Y)$  is the conventional conditional probability of $X$ given $Y$,
$\Rightarrow$  means 'implies',
$\Leftarrow$  means 'is implied by',
$\Leftrightarrow$  means 'implies and is implied by',
iff  means 'if and only if',
$\exists$  means 'there exists',
$|\ |$  means 'the number of elements in': e.g. $|E|$ means 'the number of fibres
    that are active in the event $E$'.
The following set-theoretic symbols are also used:
$E \cup F$  = the union of $E$ and $F$,
$E \cap F$  = the intersection of $E$ and $F$,
$E \setminus F$  = the set of elements which are in $E$ but not in $F$,
$E \,\triangle\, F$  = the set of elements which are in exactly one of $E$ and $F$,
$E \subseteq F$  means $E$ is contained in or equal to $F$,
$E \subset F$  means $E$ is contained in $F$ and does not equal $F$.
The reader who is not familiar with this notation should not be put off by it. All
the important arguments of the paper have been written out in full. An adequate
understanding of its content may be achieved without reading the paragraphs in
small type, which is where these symbols usually appear.
0.5. Information measures
The only universal measures of suitability, fit, and so forth, are information
measures. Three are of principal importance in this paper, and are defined below.
Others are derived as they are needed. All the spaces with which the paper is
concerned are finite, and therefore only discrete probability distributions need be
considered. Definitions are given here only for the finite case, although every
expression has its more general form.
0.5.1. Entropy (Shannon 1949). The entropy of the discrete probability distribution
$p_1, \ldots, p_s$ will be denoted by the letter $h$. Thus
$$h(p_1, \ldots, p_s) = \sum_{i=1}^{s} -p_i \log_2 p_i.$$
All logarithms are to base 2.
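0.5.2. Information gain (Shannon 1949, and see Rényi 1961). Let $\mu$, $\nu$ be two discrete
probability distributions over the same set of events. Then the information gain due to
$\mu$ given $\nu$ is
$$I(\mu \mid \nu) = \sum_{i} \mu_i \log_2 \frac{\mu_i}{\nu_i}.$$
0.5.3. Information radius (Sibson 1969).
Let $\mu_1, \ldots, \mu_n$ be discrete probability distributions over the same $s$ events:
$\mu_i = (\mu_{i1}, \ldots, \mu_{is})$, $\sum_j \mu_{ij} = 1$. Let $\nu = (\nu_1, \ldots, \nu_s)$, and write
$\mu_i \ll \nu$ if $\nu_k = 0$ implies that $\mu_{ik} = 0$. Let $w_1, \ldots, w_n$ be positive numbers.
Then the information radius of the $\mu_i$ with weights $w_i$ is
$$K(\mu_1, \ldots, \mu_n; w_1, \ldots, w_n) = \inf_{\nu:\ \mu_i \ll \nu\ (1 \le i \le n)} \frac{\sum_{i=1}^{n} w_i\, I(\mu_i \mid \nu)}{\sum_{i=1}^{n} w_i}.$$
This infimum is achieved uniquely when
$$\nu = \frac{\sum_{i=1}^{n} w_i \mu_i}{\sum_{i=1}^{n} w_i}.$$
$K$, the information radius, is an information measure of dissimilarity.
$K(\mu_1, \ldots, \mu_n; w_1, \ldots, w_n)$ will be abbreviated to $K(\mu_i, w_i)$.
The nature of $K$ is explained more fully where it is used.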
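As a numerical illustration of the last two measures, the following Python sketch computes an information gain and an information radius for two small distributions. It is not taken from the paper: it assumes the usual form of Sibson's information radius, in which the weighted sum of gains is divided by the total weight and the centre is the weighted mean of the distributions, and the function names and the example distributions are hypothetical.

```python
import math

def information_gain(mu, nu):
    """I(mu|nu) = sum_i mu_i * log2(mu_i / nu_i); needs nu_i = 0 to imply mu_i = 0."""
    total = 0.0
    for m, n in zip(mu, nu):
        if m == 0:
            continue  # the term 0 * log 0 is taken to be 0
        if n == 0:
            raise ValueError("gain undefined: nu is 0 where mu is positive")
        total += m * math.log2(m / n)
    return total

def weighted_mean(dists, weights):
    """The centre nu = sum_i w_i mu_i / sum_i w_i, computed component by component."""
    w_sum = sum(weights)
    size = len(dists[0])
    return [sum(w * d[k] for d, w in zip(dists, weights)) / w_sum for k in range(size)]

def average_gain(dists, weights, nu):
    """Weighted average of the gains I(mu_i|nu) about an arbitrary centre nu."""
    w_sum = sum(weights)
    return sum(w * information_gain(d, nu) for d, w in zip(dists, weights)) / w_sum

def information_radius(dists, weights):
    """Assumed form of K: the average gain about the weighted-mean centre."""
    return average_gain(dists, weights, weighted_mean(dists, weights))

# Hypothetical distributions over the same s = 3 events, with equal weights.
mu_1 = [0.5, 0.5, 0.0]
mu_2 = [0.0, 0.5, 0.5]
K = information_radius([mu_1, mu_2], [1.0, 1.0])
```

Here the centre is (0.25, 0.5, 0.25) and each gain about it is 0.5 bits, so `K` evaluates to 0.5. Replacing the centre by any other admissible distribution, for instance (0.3, 0.4, 0.3), gives a larger average gain, and two identical distributions have radius 0, which is the sense in which K measures dissimilarity.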
1.0. Introduction
This section is concerned with the problem of what the brain does. The background
and arguments it contains are directed towards the justification of the Fundamental
Hypothesis (§1.6). It is shown that despite the complications which arise in the early
processing of sensory information, this hypothesis is often valid for information
with which the brain has to deal. The discussion proceeds by first exploring notions
connected with the idea of eliminating information theoretic redundancy, an idea
which has had a somewhat chequered career in neurophysiology (see Barlow 1961
for discussion and references). Secondly, ideas connected with biological utility are
developed; and finally these are combined with the ideas of the first part to produce
the philosophy from which the theory is derived.
1.1. Information theoretic redundancy
1.1.1. Redundancy and early processing of visual information
The notion that the processing of sensory information is an operation designed
to reduce the redundancy in its expression is attractive, and one that is helpful for
understanding certain aspects of early coding. For example, the coding in the optic
nerve of relative rather than absolute brightness prevents the repeated transmission
of the average brightness of the visual field. The use of on-centre off-surround coding
there is peculiarly suitable for another reason, namely that the visual world has a
tendency to be locally homogeneous. Given that a particular point in the visual field
has a certain luminance and colour, the chance that neighbouring points also do is
high. This kind of redundancy would not be present if, for example, the world was
like scattered, multi-coloured pepper.
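The point about relative brightness can be seen in a one-dimensional toy example. The sketch below is an assumption-laden illustration, not a model of the optic nerve: the function `centre_surround`, its `radius` parameter and the three test signals are hypothetical, and each output is simply the input at a point minus the mean of its immediate neighbours.

```python
def centre_surround(signal, radius=1):
    """Each output is the input at a point minus the mean of its neighbours within radius."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbours = [signal[j] for j in range(lo, hi) if j != i]
        out.append(signal[i] - sum(neighbours) / len(neighbours))
    return out

# Hypothetical luminance profiles along a line of six receptors.
uniform_dim = [2.0] * 6
uniform_bright = [9.0] * 6
edge = [2.0, 2.0, 2.0, 9.0, 9.0, 9.0]

dim_code = centre_surround(uniform_dim)        # all zeros
bright_code = centre_surround(uniform_bright)  # all zeros
edge_code = centre_surround(edge)              # [0.0, 0.0, -3.5, 3.5, 0.0, 0.0]
```

The two uniform fields produce the same all-zero code, so the average brightness is not transmitted repeatedly, and in the third signal only the two points adjacent to the edge carry a non-zero value. A peppery input would give no such saving, since almost every point would differ from its neighbours.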
The visual world has this tendency towards continuity because matter is cohesive:
the existence of edges and boundaries is a consequence of this. It may be possible to
view the next stages of visual processing--by the 'simple' and 'complex' Hubel
& Wiesel (1962) cells of area 17--as a further recoding designed round the redundancy
associated with the existence of edges, bars, and corners. The test of this is
whether, using these cells, it is easier to represent scenes from the real visual world
than an arbitrary, peppery optic nerve input; and it probably is.
There are many other ways in which redundancies arise in visual information.
The next most obvious are those introduced by the operations of translation,
magnification, and by rotation. For these operations at least, the question of what
to do with the redundancy to which they give rise poses no great difficulties of
principle. The brain is, for example, much less interested in where an image is on the
retina than in the relative positions of its various parts. In this case, the clear object
of a portion of the processing must be to recode the input, perhaps gradually, in such
a way that relative positions are preserved. This should probably be done so that if
two objects are seen momentarily, each in a different position, orientation, and having
a different size, then the accuracy with which they may be compared should depend
upon the magnitude of these differences.
Various similar points can be made about early processing in the other sensory
modalities; but enough has probably been said to make the two main points. They
are first, that notions of pure redundancy reduction probably are involved in the
early analysis of sensory information. Secondly, redundancy can occur in many
forms. The variety is especially obvious nearer the periphery. Each form requires a
special mechanism to cope with it, and so, especially lower down in the brain, it is
natural to expect a diversity of specialized coding tricks. Some of these have been
found, and some have not.
1.1.2. Redundancy and later visual processing
A great deal of the redundancy in visual information arises out of the permanence
of the world. This, which includes the tendency of matter to cohere, makes it natural
to code for changes, and to look for common subevents, like lines, corners, and so
forth, which concern only a small fraction of the total population of input fibres.
Common subevents are often called features, and the ideas associated with the
analysis of features are probably the most promising available concerning later
processing. Their potential advantage is most clearly seen in the analysis of objects:
the great hope they hold is in the possibility that objects may be recognized by
checking for the presence of particular features. These features are imagined to be
drawn from a central pool which is shared by all other objects, and which is not too
large.
This kind of scheme for later visual processing introduces five main categories of
problem :
(i) The discovery of the relevant feature vocabulary.
(ii) Coding features in a suitably invariant way.
(iii) Coding the relative positions of the features.
(iv) Partitioning the features so that information from one object is separated
from information about other objects.
(v) The decision process itself.
'Object', in the case of visual information, has a fairly well-defined meaning,
because of the coherence of matter; but these general ideas have a wider application.
For example, an 'impression' of an auditory input may be obtained from its power
spectrum: in such cases, the 'objects' are less tangible. But for now, it is enough to
consider just the special, visual case.
Problems (i) and (v) are very general, and are dealt with later (§1.4, §2, §5).
Problem (ii) is special, and only two points about it will be made here. First, lines and
edges are preserved by magnification, so parts of problem (ii) are automatically
solved. Secondly, it is only necessary to localize the components of any particular
image to an extent that will prevent their confusion with other images. The exact
positions of the edges and corners of an object need not be retained, because the
general restraint of continuity of form will mean that exact relative positions can
always be recovered from a knowledge of approximate relative positions, the
number of terminations, and approximate lengths of segments. Hence the problems
associated with translation of an image across the retina can begin to be solved quite
early by recoding into elements which signal the existence of their corresponding
features within a region of a particular size. The exact size will depend upon how
unusual is the feature.
This in itself is of no use unless some way can be found of representing these
approximate relative positions: this is problem (iii). Fortunately, it is very easy to see
how distance relations may be held by a codon representation (Marr 1969). The key
is an idea of 'nearness'. Suppose $\{f_1, \ldots, f_n\}$ is a collection of features, endowed with
approximate distance relations $d(f_i, f_j)$ between each pair. Suppose subsets of the
set $\{f_1, \ldots, f_n\}$ are formed in such a way that those features which are near one
another are more likely to be included in the same subset than those which are not.
Then the subsets would contain information about the relative positions of the
$f_i$ (see Petrie 1899 for an intriguing natural occurrence of this effect). Techniques
like multidimensional scaling can be used to recover metric information explicitly in
this kind of situation (Kruskal 1964; Kendall 1969), but for the present purpose, it is
enough to note that two different spatial configurations would produce two different
subset collections.
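The claim that nearness-based subsets carry positional information can be illustrated with a small sketch. The Python below is a simplification and not the codon construction of the paper: where the text speaks of near features being "more likely" to share a subset, this example uses a hard distance threshold and subsets of size two, and the function name `nearness_subsets`, the `radius` parameter and both feature layouts are hypothetical.

```python
import math
from itertools import combinations

def nearness_subsets(positions, radius):
    """Return the collection of feature pairs lying within `radius` of one another."""
    subsets = set()
    for (f, p), (g, q) in combinations(sorted(positions.items()), 2):
        if math.dist(p, q) <= radius:
            subsets.add(frozenset({f, g}))
    return subsets

# Two hypothetical spatial configurations of the same three features.
line = {"f1": (0.0, 0.0), "f2": (1.0, 0.0), "f3": (2.0, 0.0)}
triangle = {"f1": (0.0, 0.0), "f2": (1.0, 0.0), "f3": (0.5, 0.9)}

line_code = nearness_subsets(line, radius=1.2)
triangle_code = nearness_subsets(triangle, radius=1.2)
```

With these layouts the line yields only the pairs {f1, f2} and {f2, f3}, since f1 and f3 are too far apart, while the triangle yields all three pairs. The two subset collections differ even though no coordinates are stored, which is the property the argument needs.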
There is thus no difficulty of principle in the idea of analysis of shape by roughly
localized features: but it is clear that all these techniques rely a great deal on
the ability to pick out the components of a single shape in the first place. That is, a
successful solution to problem (iv) is a prerequisite for this kind of solution to
problems (ii) and (iii). This involves searching for hard criteria which will enable
the nervous system to split up its visual input into components from different
objects.
The most obvious suitable criteria arise from the tendency of matter to cohere:
they are continuity of form, of colour, of visual texture, and of movement. For
example, most parts of a fleeing mouse are distinguished from the background by
their movement. A solution in this case would be to have a mechanism which causes
signals about movement in adjacent regions of the visual field to enhance one another,
and to suppress information from nearby stationary objects. It is not difficult to
devise mechanisms for this, and analogous ones for the other criteria.
These ideas about joining visual data up using certain fixed criteria are collectively
called techniques for visual bonding. It would be surprising if the visual
system did not contain mechanisms for implementing at least some kinds of visual
bonding, since the methods are powerful, and can be innate.
It can be seen from this discussion that although ideas about redundancy elimination
probably do not determine the shape of later visual processing, they are
capable of contributing to its study. Those problems of principle ((i) and (v)) which
arise quite quickly can and will be dealt with: the crucial point is that technical
problems ((ii)-(iv)) will usually involve the elimination of redundancy associated
with special kinds of transformation--perhaps specific to one sensory mode. These
problems can either be solved by brute memory (e.g. perhaps rotation for visual
information) or by suitable tricks, like visual bonding. The point is that these
problems usually can be overcome somehow; and this is the optimism one needs to
propel one to study in a serious way the later difficulties, which are genuinely matters
of principle.
1.1.3. R~dundancyand inforrr~ationstorage
There is a quite different possible application of information theoretic ideas, and
it is associated with the notion of coding information to be stored. It is a matter of
everyday experience that some things are more easily remembered than others.
Patterns are easier to recall than randomly distributed lines or dots. It cannot be
argued that the random picture contains more information in any absolute sense,
since the calculation of its information content depends entirely upon the norm
with which it is compared. If the norm is itself, the random picture contains no
information. There can be no doubt that a normal person would have to store more
information to remember the random picture than the patterned one; but this, in the
first instance anyway, is a remark about the person, not about the pictures.
This illustrates the fundamental point of this section: that the amount of
information a memory has to store to record a given signal depends upon the
structure of the signal, and the structure of the memory. Let $Z$ be an event space,
and let $\sigma$ be the probability distribution corresponding to the afferent signal: thus
$\sigma(E)$, for $E$ in $Z$, is the probability that $E$ will occur next. (The present crude point
can be made without bringing in temporal correlations.) Let $\mu$ be the probability
distribution which describes what the memory expects. Then the amount of information the memory requires to store $\sigma$ is
$$h(\sigma:\mu) = -\sum_{E \in Z} \sigma(E) \log \mu(E).$$
This expression exists if and only if
$$\mu(E) = 0 \Rightarrow \sigma(E) = 0.$$
$h(\sigma:\mu)$ and $h(\sigma)$, the entropy of $\sigma$, are related by the following result. Assuming
the memory can store $\sigma$, then:
Lemma. $I(\sigma|\mu)$ exists, and $h(\sigma:\mu) = h(\sigma) + I(\sigma|\mu)$.
Proof. If the memory can store $\sigma$, $\mu(E) = 0 \Rightarrow \sigma(E) = 0$, and hence $I(\sigma|\mu)$ exists.
Now
$$h(\sigma:\mu) = -\sum_{E} \sigma(E) \log \mu(E) = \sum_{E} \sigma(E) \log \frac{\sigma(E)}{\mu(E)} - \sum_{E} \sigma(E) \log \sigma(E) = I(\sigma|\mu) + h(\sigma).$$
The term $h(\sigma)$ is inevitable, but the term $I(\sigma|\mu)$ reflects the fundamental choice a
memory has when instructed to store a signal $\sigma$. It can either store it straight, at
cost $h(\sigma:\mu)$, or it can change its internal structure to a new distribution, $\mu'$ say, and
store the signal relative to that. The amount of information required to change the
structure from $\mu$ to $\mu'$ is at least $K(\mu, \mu')$, where $K$ is the information radius (§0.5.3);
but, though an expensive outlay, it can lead to great savings in the long run if $\mu'$ is a
good fit to the incoming information.
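The lemma can be checked numerically on a small example. The following is a minimal sketch, not part of the paper: it assumes base-2 logarithms, takes the storage cost h(sigma:mu) to be the sum of -sigma(E) log mu(E) (the form consistent with the lemma), and uses two hypothetical three-event distributions. All function and variable names are illustrative.

```python
import math

def entropy(sigma):
    """h(sigma): entropy of the signal distribution, in bits."""
    return -sum(p * math.log2(p) for p in sigma.values() if p > 0)

def _check_storable(sigma, mu):
    # The memory can store sigma only if mu(E) = 0 implies sigma(E) = 0.
    for event, p in sigma.items():
        if p > 0 and mu.get(event, 0.0) == 0:
            raise ValueError(f"mu({event!r}) = 0 but sigma({event!r}) > 0")

def storage_cost(sigma, mu):
    """h(sigma:mu), assumed here to be -sum sigma(E) log2 mu(E)."""
    _check_storable(sigma, mu)
    return -sum(p * math.log2(mu[e]) for e, p in sigma.items() if p > 0)

def information_gain(sigma, mu):
    """I(sigma|mu) = sum sigma(E) log2(sigma(E) / mu(E))."""
    _check_storable(sigma, mu)
    return sum(p * math.log2(p / mu[e]) for e, p in sigma.items() if p > 0)

# Hypothetical distributions (assumptions for illustration only).
sigma = {"A": 0.5, "B": 0.25, "C": 0.25}   # afferent signal
mu = {"A": 0.25, "B": 0.25, "C": 0.5}      # what the memory expects

print(entropy(sigma))                # 1.5
print(information_gain(sigma, mu))   # 0.25
print(storage_cost(sigma, mu))       # 1.75
```

In this example the memory pays 0.25 bits per event above the unavoidable 1.5 bits because its expectation is a poor fit; storing relative to a distribution equal to the signal would reduce the excess to zero.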
These arguments are too general to warrant further precise development, but
they do illustrate the two possibilities for a memory which has to store information:
either it can store it raw, or it can develop a new language which better fits the
information, and store it in terms of that. To this point, the next section §1.2 will
return.
Finally, this result shows how important it is to examine the structure of a memory
before trying to compute the amount of information needed to store any given
signal; it would therefore be disappointing to leave it without some remarks on the
type of internal distributions $\mu$ we may expect to find in the actual brain. The
obvious kind of answer is the distributions induced by a codon representation, as in
the cerebellum. The reliability of a memory is measured by the number of wrong
answers it gives when asked whether the current event has been learned. This in
turn depends upon the number of possible input events: in cases where this is huge,
the memory need only arrange that the proportion of wrong to right answers
remains low. In smaller event spaces, a memory may have to represent the learned
distribution a good deal more accurately. The first case may well correspond to the
situation in the cerebellum and allow codons of a relatively small size: the second
may require them to be much larger. The result relevant to this appears in §3, but
the situation even in the cerebellum may in fact be rather more complicated
(Blomfield & Marr 1970).
1.2. Concept formation and redundancy
1.2.1. The relevance of concepts
It was shown in §1.1.3 that one policy available to a memory faced with having to
store a signal is to construct for it a special language. In the present context, this is
bound to suggest the notion of concept formation.
It is difficult to doubt that one of the most important ways in which the nervous
system eventually deals with sensory information is to form concepts with which to
decompose and classify it. For example, the concepts chair, sun, lover, music all have
their use in the description of the world;
and so, at a lower level, do the notions of
line, edge, tone and so forth.
Concepts, in general, are things which ease the nervous system's task; and although they do this in various ways, many of these ways produce their advantage by
characterizing (and hence circumventing) a particular source of redundancy. One
especially important example of how a concept does this is by expressing a part or the
whole of that which many 'things' or 'objects' have in common. This 'common'
element may take many forms: the objects' representations by sensory receptors
may be related; some aspect of their functions may be the same; they may have
common associations; or they may simply have occurred frequently in the experience
of the observing organism.
This notion has the corollary that concept formation should be a natural consequence of the discovery of a large enough source of redundancy in the input generating a brain's experience. For example, if it is noticed that a certain collection of
features commonly occurs, this collection should be recoded as a new and separate
entity: for this new entity, special recognition apparatus should be set up, and this
then joins the vocabulary of concepts through which the brain interprets and
records its experience.
Finally, concepts have been discussed as a means of formulating relationships
between collections of other 'things', 'objects', or 'features'. This appears to rest
upon the imprecise notions of 'thing', 'object' or 'feature': but there is in fact no
undefinable notion present, for these can simply be regarded as concepts (or roughly,
occurrences of concepts) that have previously been formed. This inductive step
allows the argument to be taken back to the primitive input elements on which
the whole structure is built; and in neurophysiology, there is no fundamental
problem to finding a meaning for these: they are either the signals in axons that constitute the great afferent sensory tracts, or the features automatically coded for in
the nervous system.
1.2.2. Obstacles
Something of a case can therefore be made for a connexion between concept
formation and the coding out of redundancy, but it would be wrong to suggest this is
all that is involved. Concept formation is a selective process, not always a simple
recoding: quite as important as coding out redundancy is the operation of throwing
away information which is irrelevant. For the moment however (until §1.4) it is
convenient to ignore the possibility that a recoding process might positively be
designed to lose information, and to concentrate on the difficulties involved in
recoding a redundant signal into a more suitable form.
The general prospects for this operation are not good: this is for the same reason
that the proofs of Shannon's (1949) main coding theorems are non-constructive.
There exists no general finite apparatus which will 'remove redundancy' from a
signal in a channel. Different kinds of signal are redundant in esoteric ways, and
any particular signal demands an analysis which is specially tailored to its individual quirks. Hence the only hope for a general theory is that a particular sort of
redundancy be especially common: a system to deal with that would then have a
general application. Fortunately, it is likely there does exist such a form; and with
its detailed discussion the next section is concerned.
1.3. Problems in spatial redundancy
1.3.0. Introduction
The term spatial redundancy means that redundancy which is preserved by any
reordering of the input events (of which only a finite number have occurred);
it thus fails to take account of causal or correlative relations which hold between
events at different times. It is the only kind of redundancy with whose detection
this paper deals. The complications introduced by considering temporal correlations as well are severe, and anyway cannot be discussed without some way of
storing temporally extended events. This requires Simple Memory theory, and must
therefore be postponed.
The particular kind of spatial redundancy with which this section is concerned is
the sort which arises from the fact that some objects look alike. This will be interpreted as meaning that some objects share more 'features' than others, where
'features' are previously constructed classes, as outlined in §1.1.2. It is conjectured
that this kind of information forms the basis for the classification of objects by the
brain: but before examining in detail the mechanism by which it is done, some
arguments must be presented for the general notion that something of this sort is
possible.
1.3.1. Numerical taxonomy
Evidence to support this hypothesis may be derived from recent studies in automatic classification techniques. The most important work in this field concerns the
use of cluster methods to compute classes from information about the pairwise
dissimilarities of the objects concerned (Jardine & Sibson 1968). There are two
steps to the process. The first computes the pairwise dissimilarities of the objects
from data about the features each object possesses. For this, the information radius
(Sibson 1969; Jardine & Sibson 1970) is used, and in the case where the features are
of an all-or-none type (i.e. an object either does or does not possess any given
feature), this takes a simple form. Suppose object $O_1$ possesses features $f_1, ..., f_n$,
and object $O_2$ possesses features $f_{r+1}, ..., f_m$, $1 < r < n < m$. Then $K(O_1, O_2)$, the
information radius associated with $O_1$ and $O_2$ (regarded as point distributions), is
simply $r + (m - n)$, the number of features which exactly one object of the pair
possesses.
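For all-or-none features this dissimilarity is simply the size of the symmetric difference of the two feature sets. The following is a minimal sketch under stated assumptions: the feature labels, the values n = 5, r = 2, m = 8, and the convention that the second object's features run from index r + 1 to m are all chosen for illustration, and the function name is hypothetical.

```python
def dissimilarity(features_a, features_b):
    """Number of features possessed by exactly one of the two objects."""
    return len(set(features_a) ^ set(features_b))

# Hypothetical indices, assumed for illustration only.
n, r, m = 5, 2, 8
object_1 = {f"f{i}" for i in range(1, n + 1)}      # f1 .. f5
object_2 = {f"f{i}" for i in range(r + 1, m + 1)}  # f3 .. f8

# f1, f2 belong only to object_1; f6, f7, f8 belong only to object_2.
print(dissimilarity(object_1, object_2))  # 5
print(r + (m - n))                        # 5
```

The measure is symmetric and is zero exactly when the two objects possess the same features.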
The second step of the classification process uses a cluster method to compute
classes from the information radius measurements. Various arguments can be put
forward to show that some cluster methods are greatly to be preferred to others
(Sibson 1970). Unlike the measurement of dissimilarity, these have not been given an
information theoretic background; but to do so would require a firm idea of the
purpose of the classification. The kind of assumption one would need would be to
require that the classification provide the best way of storing the information relative
to some measure: for example, a product distribution generated by assigning
particular probabilities to the individual features. There is considerable choice,
however, and it is unlikely that any particular measure could be shown to be natural
in any sense.
It is not argued that any cluster process actually occurs in the brain: the importance of this work to the present enquiry is more indirect, and consists of two basic
points. The first arises out of the type of redundancy these methods detect. It is that
the objects concerned do not have randomly distributed collections of features:
what happens is that classes of objects exist which produce collections of features
that overlap much more than they should on the hypothesis of randomness. This
fact, together with some kind of convexity condition which asserts that an object
must be included if enough like it are, is fundamental
to the classifying process.
The second point is that cluster analysis works. A large amount of information has
been analysed by such programs, especially information about the attributes of
various plants. It has been found that these methods do give the classifications which
people naturally make. This is important, for it shows that people probably use
some process associated with the detection of this kind of redundancy for the classification of a wide range of objects. The motivation for studying methods for detecting
this kind of redundancy now becomes strong.
1.3.2. Mountain climbing in a probabilistic landscape
In the brain, one may expect feature detectors to exist, if the recognition of
objects is based on this sort of analysis. If spatial redundancy (§1.3.0) is present
in the input, there will exist collections of features which tend to occur together.
This phenomenon can be given the following more picturesque description. Let the
input fibres $a_1, ..., a_N$ represent feature detectors, and let $\mathfrak{A}$ be the set of events on
$\{a_1, ..., a_N\}$ (§0.4). Endow $\mathfrak{A}$ with the distance function $d$, where $d(E, F)$ = the
number of fibres at which the events $E$ and $F$ disagree. $(\mathfrak{A}, d)$ is a metric space, and in
fact $d(E, F) = K(E, F)$, where $K$ is the information radius.
Imagine the space $(\mathfrak{A}, d)$ laid out, with the probability $p(E)$ of each event $E \in \mathfrak{A}$
represented by an extension in a new dimension. $p(E)$ is called the 'height' of $E$.
It will be clear that if $E$ occurs more frequently than $F$, $p(E) > p(F)$ and $E$ is higher
than $F$. In this way, the environment may be regarded as landscaping the space $\mathfrak{A}$,
in which the mountains correspond to areas of events which are frequent, and the
valleys to events which are rare.
The important point about the choice of $d$ for the metric on $\mathfrak{A}$ is that nearby
inputs (under $d$) possess nearly the same features. Hence if a number of inputs
commonly occur with very similar collections of features, they will turn out as a
mountain in $(\mathfrak{A}, d)$ under $p$. The detection of such collections is thus equivalent to
the discovery in the space $(\mathfrak{A}, d)$ of the mountains induced by $p$. The problem of
discovering such mountains is solved in §5. Two other problems concern the choice
of the feature detectors $\{a_1, ..., a_N\}$ with which to form the space $\mathfrak{A}$; and the question
of what exactly one does with a mountain when it has been discovered. These are
dealt with next. The point that this section illustrates is that the mountain idea over
the space $(\mathfrak{A}, d)$ characterizes the kind of redundancy in which we are interested.
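The landscape picture can be made concrete with a toy example. The following is a minimal sketch, not the detection method of §5: events are represented as tuples of 0s and 1s over five hypothetical fibres, the heights are empirical frequencies from a made-up list of observations, and the helper `mass_near` (an illustrative name) sums the probability lying within a chosen distance of a centre, here using "at most" the given radius as an assumption.

```python
from collections import Counter

def d(e, f):
    """Number of fibres at which events e and f disagree."""
    if len(e) != len(f):
        raise ValueError("events must be defined on the same fibres")
    return sum(x != y for x, y in zip(e, f))

def heights(observed):
    """Empirical p(E): relative frequency of each observed event."""
    counts = Counter(observed)
    total = len(observed)
    return {event: count / total for event, count in counts.items()}

def mass_near(p, centre, radius):
    """Total probability of events at distance <= radius from centre."""
    return sum(prob for event, prob in p.items() if d(event, centre) <= radius)

# Hypothetical observations: most inputs resemble (1, 1, 1, 0, 0).
observed = (
    [(1, 1, 1, 0, 0)] * 4
    + [(1, 1, 0, 0, 0)] * 2
    + [(1, 1, 1, 1, 0)] * 2
    + [(0, 0, 0, 1, 1)]
    + [(0, 1, 0, 1, 0)]
)
p = heights(observed)
peak = max(p, key=p.get)

print(peak)                      # (1, 1, 1, 0, 0)
print(d(peak, (0, 0, 0, 1, 1)))  # 5
print(mass_near(p, peak, 1))     # about 0.8: a 'mountain' around the peak
```

Here four fifths of the probability lies within one fibre of the most frequent event, which is the kind of concentration the text calls a mountain.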
1.3.3. The partition problem
The prospects for discovering mountains in the space $\mathfrak{A}$, given that they are there,
are good; but whether they are there or not depends largely on the choice of the
feature detectors $\{a_1, ..., a_N\}$. There can be no guarantee that an arbitrarily chosen
collection of features will generate a probabilistic landscape of any interest.
The discovery of an appropriate $\mathfrak{A}$ needs methods whereby features which are
likely to be related are brought together. This is called the partition problem, and
is in general extremely difficult to solve. The problem for which visual bonding was
introduced in §1.1.2 was an example of how special tricks can in certain circumstances be used to solve it.
If no bonding tricks are known, however, the discovery of suitable spaces must
rest upon measuring correlations of various kinds over likely looking populations of
events. This is an operation whose rate of success depends upon the size of the
available memory. It needs the theory of Simple Memory, and will be discussed more
fully there. Suffice it here to say that the problem is not totally intractable despite
the huge sizes of all the relevant event spaces. The reason is that only a very small
proportion of the possible events can ever actually occur, simply because of the
length of time for which a brain lives. This means, first, that the memory can be
quite coarse; and secondly, that if anything much happens twice, it is almost certain
to be significant.
1.4. The recoding dilemma
1.4.0. Introduction
The attraction of mountains is that when applied to the correct space, they
provide a neat characterization of the type of redundancy which, there is reason to
believe, is important for the classification of objects, and probably much else
besides. The question that has now to be discussed is what to do with a mountain
when it has been discovered. The obvious thing to do is to lump the events of a
mountain together and call it a class. The problems arise because there is virtually no
hope of ever saying why this is the right thing to do, using purely information
theoretic ideas; and until this is specified, it will be impossible to say exactly how
the lumping should be done.
The basic difficulty is that the lumping process involves losing information about the difference between the events lumped together. The simplest reason why
this process might be justifiable, or even desirable, is reliability. It would be implausible to suppose that the interpretation of an input might fail because of the
failure of a single fibre. Hence a recognition apparatus for the particular event $X$ must
admit the possibility that an input $Y$ with $d(X, Y) = 1$ or $2$ (say) should be treated
like $X$. But it is only by introducing such an assumption that this kind of step could
be made, at least within the framework of the arguments set up so far.
1.4.1. Information theoretic assumptions of a suitable nature
The problem about trying to develop information theoretic hypotheses to act as
justification for ignoring the difference between two events is that from an absolute
point of view, one might just as well confuse two events with $d(X, Y)$ large as with
$d(X, Y)$ small: there is no deep reason for preferring pairs of the second sort. It is
natural to hope that in some sense, less information is lost by confusing nearby
events, but in order for this to be true, something has to be assumed about the way
two events can be compared. This effectively means comparing them to one (or a
family of) reference distributions, whose choice must be arbitrary, and equivalent
to some statement that nearby events are related. The theory thus becomes self-defeating, and the realization that this must be so allows exactly one observation to
be made: namely that information theoretic arguments alone can never suffice to
form a basis for a neurophysiological theory.
1.4.2. Landslide
The mountain structure of §1.3.2 depends on two things: the environmental
probability distribution $p$, and the metric $d$. But it has been shown in §1.4.1 that the
particular choice of $d$ for the metric cannot be justified in any absolute way. The
view that these mountains are important can therefore receive no support from any
theory, based solely on ideas about storage, which does not assume that the first
information to be thrown out is that which distinguishes the different parts of one
mountain. In order to see how this might in fact be so, it is therefore necessary to
return to the real world, to discover how some information may be important,
while some may be expendable.
1.5. Biological utility
1.5.0. The general argument
The question with which this section is concerned is why should it ever be an
advantage to classify together the events of a mountain. To answer this requires a
clear idea of what the brain classifies for: only when this is known can it be deduced
what kind of information is irrelevant, and hence which events may be classified
together. The answer which will be proposed is that the classifications the brain
eventually derives are ones which allow the deduction of the presence or absence of a
property or properties, not necessarily directly observable, from such information as
is at the time available. The word 'property' means here a slightly generalized idea
of a feature: that is, it includes specifications of things an object can do, or can have
done to it, as well as, for example, the sound it makes or the colour it has.
1.5.1. Examples
It is helpful at this point to give some concrete instances of the general statement
made above. In its purest form, it implies a simple learning device, to which instances of the property concerned are transmitted through one channel, while
information from which this property is to be diagnosed is conveyed through another.
This corresponds exactly to the situation proposed for the cerebellar cortex in a
recent theory of that structure (Marr 1969): the first channel is the climbing fibre
input, and the second, the mossy fibres. There clearly exist stern limitations to this
idea in any more general application, since in the cerebellar model, a property can
only be diagnosed in conditions which are virtually a replica of a previous state in
which the property was known to hold. It is, nevertheless, a primitive example of
the central idea.
The property concerned need not be the immediate implementation of a particular
elemental movement: it might be whether or not a particular branch can support the
weight of a particular monkey. The animal concerned clearly needs to be able to
make this discrimination, and to be able to do so by methods other than direct
experiment. The information available is the appearance of the branch, from which it
is possible to produce a reliable estimate of its strength. It is supposed that the
animal used data obtained by direct experiment (in play during his youth), to set up
the appropriate classification apparatus.
These two cases illustrate the idea of a classificatory scheme designed for the
diagnosis of properties not directly or immediately observable. It is helpful to make
the following
Definition. An intrinsic property is one the presence or absence of which is
known, and which is being used to decide whether another property is present.
The word 'intrinsic' is used for this because if a property-detecting fibre $a_i$ is in the
support of a space over which there is a mountain, then that property is in a real
sense an intrinsic part of the structure of the mountain. The second part of the
definition follows naturally: an extrinsic property is one whose presence or absence is
currently being diagnosed. These two words have only a local meaning: they are
simply a useful way of describing on which side of a decision process a particular
property lies.
Classification for biological utility may therefore be regarded as the diagnosis of
important but not immediately observable properties from information which is
easy to obtain; and although this to some extent begs the question of what is an
important property, it nevertheless represents some advance. Its strength is that
it shows what information may be lost: namely the difference between events which
lead to a correct diagnosis of a given property. The weakness of this approach is that
it contains no scope for generalization from situations in which a property is known
to hold, to new situations; and therefore seems to reduce operations in the brain to a
simple form of memory.
1.5.2. The dichotomy
It may fairly be said that the remarks of this and the last sections force a dichotomy. On the one hand, there are the attractive and elegant ideas associated with
coding for features, and their connexion with mountains and pure classification
theory. These have been shown to be an insufficient basis for a theory, but they
have a strong intuitive appeal. On the other hand, there are the nakedly practical
ideas associated with strict biological utility. These have the advantage of giving a
criterion for what information can be ignored, but in this crude shape, they suggest a
memorizing system which performs more or less by brute force. There is no hope
for either of these approaches unless they can be reconciled; and for this task, the
next section is reserved.
1.6. The fundamental hypothesis
1.6.0. The nature of a reconciliation
Before trying to discover how these two views may be united, one must have a
clear idea of the nature of any statement which could bring them together. The first
view was of a kind of classification scheme which might be used by the brain. It
consisted of selecting regions of commonly occurring subevents in event spaces over
a collection of feature-detecting fibres, such that the subevents selected differed
rather little from one another. The second view suggested that the main function of
the analysis of sensory information was to deduce properties of importance to the
needs of the animal from such information as is available. These can only be reconciled if classification by mountain selection does prove a good guide to the presence
of important properties: to decide whether this is so, properties of the real world
must be considered.
1.6.1. Validity for properties which are usually intrinsic
Let $\mathfrak{A}$ be the event space on the feature-detecting fibres $\{a_1, ..., a_N\}$, and let $\lambda$ be the
probability distribution induced over $\mathfrak{A}$ by the environment. $d$ is the natural metric
defined in §1.3.2. In a general input subevent, the value of each fibre will be 0, or 1,
or will be undefined. The last case can arise, for example, in the case of visual
information, when part of an object is hidden behind something else. In this way, a
property which is usually observable may sometimes not be. It will now be shown
that classes obtained by lumping together events of a mountain over $(\mathfrak{A}, d)$ can
usually act as diagnostic classes for such properties.
FIGURE 1. An illustration of the form of redundancy being discussed: the probability distribution $\lambda$ induced by the environment over $N_s(X)$ has non-zero values only in $N_r(X)$.
Let $X \in \mathfrak{A}$ be an event of $\mathfrak{A}$, and let $N_r(X) = \{Y \mid Y \in \mathfrak{A} \text{ and } d(X, Y) < r\}$. A 'mountain' in $\mathfrak{A}$ might correspond to some distribution like $\lambda$ where
where $s > r$, $\epsilon$ is small, and $K$ is some positive constant. As soon as enough values of
the $a_i$ are known to determine an event as lying within $N_s(X)$, it follows that the
event lies within $N_r(X)$ (see figure 1). Write $p_i$ = probability that ($a_i = 1$ given
$E \in N_r(X)$). Then if an event is diagnosed as falling within $N_r(X)$ without knowing
the value of $a_i$, it can be asserted that $a_i = 1$ with probability about $p_i$. This is
useful if $p_i$ is near 0 or 1.
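The quantity p_i can be illustrated on a toy distribution. The following is a minimal sketch under stated assumptions: events are tuples of 0s and 1s on four hypothetical fibres, `lam` is a made-up environmental distribution, the neighbourhood uses the strict inequality d(X, Y) < r, and the function names are illustrative rather than taken from the paper.

```python
def d(e, f):
    """Number of fibres at which events e and f disagree."""
    return sum(x != y for x, y in zip(e, f))

def p_i(lam, centre, r, i):
    """P(a_i = 1 | E in N_r(centre)), with N_r(centre) = {Y : d(centre, Y) < r}."""
    inside = {e: w for e, w in lam.items() if d(e, centre) < r}
    mass = sum(inside.values())
    if mass == 0:
        raise ValueError("the neighbourhood has zero probability")
    return sum(w for e, w in inside.items() if e[i] == 1) / mass

# Hypothetical distribution with a mountain around X = (1, 1, 1, 0).
lam = {
    (1, 1, 1, 0): 0.5,
    (1, 1, 0, 0): 0.2,
    (0, 1, 1, 0): 0.1,
    (1, 1, 1, 1): 0.1,
    (0, 0, 0, 1): 0.1,  # far from X, excluded from the neighbourhood
}
X = (1, 1, 1, 0)

print(round(p_i(lam, X, 2, 0), 3))  # 0.889: fibre 0 is very likely to be 1
print(round(p_i(lam, X, 2, 3), 3))  # 0.111: fibre 3 is very likely to be 0
```

If fibre 0 were hidden from view, an input diagnosed as belonging to this neighbourhood could be assigned a_0 = 1 with probability about 0.89; the more sharply the probability is concentrated on X itself, the closer such values come to 0 or 1.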
This kind of effect is a natural consequence of any mountain-like structure of
$\lambda$ over $\mathfrak{A}$, and allows that, in certain circumstances, these classes can be used to
diagnose properties which are usually intrinsic. The values of $a_i$ are not necessarily
as expected (the piece of the object that is hidden may in fact be broken off); but the
spikier the mountain (i.e. the smaller the local variance of $\lambda$), the nearer the $p_i$ will be
to 0 or 1, and the more certain the outcome.
1.6.2. Extrinsic properties
The argument for this kind of classification is that whenever there is a tendency
for intrinsic properties to occur together in this way, it is extremely likely that there
will also exist other properties, perhaps not directly observable ones, which also
generalize over such groups of events. Hence, although the reason may not at the
time be apparent, it will be good strategy for the animal to tend to make these
classifications. Thus later, when a property is discovered to hold for one event in a
given class of events, the animal will be inclined to associate it with members of the
whole class. The generalization may or may not be found to be valid, but as long as it
is successful sufficiently often, the animal will survive.
One other way of looking at this kind of generalization is to alter slightly the way
one expresses the relevant kind of redundancy. It is equivalent to the assertion that
once a context is sufficiently determined, one property may be a reliable indicator of
another. The example cited earlier was of a monkey judging the strength of a
branch. In practice, the thickness of a branch of a tree is a fairly reliable indicator
of its strength, so that unless the branch is rotten, it will support the monkey if it is
thick enough. Rottenness, too, can be visually diagnosed, so that a completely
reliable assessment can be made on the basis of visual information alone. The context
within which thickness and strength are related is roughly that the object in question
is a branch of a tree, and is not rotten.
This kind of relationship is common in everyday experience; so common indeed
that further examples are unnecessary. But although the general notion of this kind
of redundancy has a clear importance, it is not obvious how the details might work
in any particular case, nor that they may work the same way in any two. This problem
must be tackled before any methods can be given for prescribing limits to the classes.
1.6.3. Refining a classificatory unit
The rough heuristic for picking out likely looking classes has been discussed at
length. It was hinted that there may exist no a priori 'correct' way of assigning
limits: where, for example, is the boundary between red and orange? The view that
the present author takes is that although there are likely to exist fairly good general
heuristics for class delimitation (like some kind of convexity property analogous to
that which the cluster analysts use) there are probably no universal rules. It will
be extremely difficult to give even these heuristics a satisfactory physical derivation:
the kind of argument required is very indirect. But to say there exist no precise,
generally applicable rules is merely to say that different properties have different
relations to their indicators, and so is not very surprising. If, for example, an important extrinsic property is attached to a group of subevents, then its cessation marks
the boundary of the class. If the property ceases to hold in a gradual way, the class
will have problematical boundaries. This does not necessarily mean the class is not
a useful one: the dubious cases may be rare, or may fall less dubiously into other
classes. In any case, those falling well inside will be usefully dealt with.
It is therefore proposed that the exact specification of the boundaries to the
classes should proceed by experiment. A new class is tentatively formed, upon the
discovery of a promising mountain. If it turns out to have no attached extrinsic
properties, it probably remains a slightly vague curiosity. If an extrinsic property
more or less fits the provisional class, its boundary can be modified in a suitable way:
this operation requires simple memory. If an extrinsic property is attached to it in
no very sensible way (that is, instances of the property are scattered randomly or
inconsistently over the class) then the class is no use as a reliable indicator, even
with the available scope for shifting the boundaries. This does not necessarily
render the class useless, for the property might be one which puts the animal in
danger, and the class may contain all inputs associated with this kind of danger. For
example, only a few kinds of snake are dangerous, but the class of snakes includes the
class of dangerous snakes. It may be impossible to produce a reliable classification of
snakes into dangerous and not dangerous without classifying some of them by
species. This requires the consideration of more information than is necessary
for diagnosis as a snake, and may be impossible without a potentially lethal
investigation.
The investigation of the viability of a prospective class should probably be a very
flexible process, drawing on the play of an animal when it is young, and upon the
experience of life later on. Those classes which turn out, with slight alteration, to be
useful will survive, while those which do not will not. Provided the initial class
selection technique is neither wrong too often, nor fails too frequently to provide a
guess where it should, the animal will be well served; and an instinct to explore his
surroundings should enable him to remove any important errors.
1.6.4. The Fundamental Hypothesis
The conditions for the success of the general scheme of classification by mountain
selection with later adjustments can now be explicitly characterized. It will work
whenever an extrinsic property is stable over small changes in its diagnostic intrinsic
properties. A given extrinsic property may possess more than one cluster of intrinsic
properties which diagnose it, but as long as this condition is satisfied within each,
the scheme will work. If a small change in intrinsic properties destroys the extrinsic
property, either the boundary of the class passes near that point, or this extrinsic
property cannot be diagnosed this way. In the former case, slight boundary changes
can probably accommodate the situation: in the latter, there are two possible
remedies. Either instances of the extrinsic property can be learned by rote (this
can only be successful if the relationship of the extrinsic to the intrinsic properties
is fixed, and it is in any case arduous); or the intrinsic context has to be recoded. To the
general recoding problem, there exists no general solution (by the remarks of § 1.2.2).
The present theory is thus based on the existence of a particular kind of redundancy, not because it is redundancy as such, but because it is a special, useful sort.
This is expressed by the following Fundamental Hypothesis:
Where instances of a particular collection of intrinsic properties (i.e. properties
already diagnosed from sensory information) tend to be grouped such that if some are
present, most are, then other useful properties are likely to exist which generalize over
such instances. Further, properties often are grouped in this way.
§ 2. THE FUNDAMENTAL THEOREMS
2.0. Introduction
The discussion has hitherto been concerned with the type of analysis which may
be expected in the brains of sophisticated living animals. It was suggested that an
important aspect of the computations they perform is the induction of extrinsic
from intrinsic properties. This conclusion introduces three problems: first, collections of frequent, closely similar subevents have to be picked out. The Fundamental
Hypothesis asserts that it is sensible to deal with such objects. This problem, the
discovery problem, is dealt with in § 5. Secondly, once a subevent mountain has been
discovered, its set of subevents must be made into a new classificatory unit: this is
the representation problem, and is dealt with in § 4. Finally, on the basis of previous
information about the way various extrinsic properties generalize over these
collections of subevents, it must be decided whether any new subevent falls into a
particular class. This is the diagnosis problem, and is dealt with now.
2.1. Diagnosis: generalities
A common method for selecting the hypothesis from a set {Ω_1, ..., Ω_n} which best
fits the occurrence of an event E, is to choose that Ω_i which maximizes P(E|Ω_i).
Such a solution is called the maximum likelihood solution, and is the idea upon
which the theory of Bayesian inference rests (see e.g. Kingman & Taylor 1966,
p. 274, for a statement of Bayes's theorem). This method is certainly the best for the
model in which it is usually developed, where the Ω_i may be regarded as random
variables, and the conditional probabilities P(E|Ω_i), for 1 ≤ i ≤ n, are known. The
maximum likelihood solution will, for example, show how, and at what odds, one
would have to place a bet on the nature of E in order to expect an overall profit. It is
of course important to know all the conditional probabilities; and if the Ω_i are not
independent, various complications can arise.
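As a minimal sketch of the maximum likelihood rule just described, the following Python fragment picks the hypothesis with the largest conditional probability of the observed event. The function name, the hypothesis labels and the numerical values are hypothetical and serve only as illustration; they are not taken from the paper.

```python
def max_likelihood_hypothesis(likelihoods):
    """Return the label whose conditional probability P(E | label) is largest.

    `likelihoods` maps each hypothesis label to P(E | label) for one fixed event E.
    """
    return max(likelihoods, key=likelihoods.get)


# Hypothetical conditional probabilities P(E | Omega_i) for a single observed event E.
likelihoods = {"omega_1": 0.10, "omega_2": 0.45, "omega_3": 0.30}
best = max_likelihood_hypothesis(likelihoods)
print(best)
```

The rule needs every P(E|Ω_i) to be known in advance, which is the requirement the following paragraphs argue cannot be met for events never seen before.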
The situation with which the present theory must deal is different in several
ways, of which two are of decisive importance. First, the prime task of the diagnostic
process is to deal with events E_j which have never been seen before, and hence for
which conditional probabilities P(E_j|Ω_i) cannot be known. It will further often be
the case that E_j occurs only once in a brain's lifetime, yet that brain may correctly
be quite certain about the nature of E_j.
Secondly, the prior knowledge available for inferring that E_j is (say) an Ω_i comes
from the Fundamental Hypothesis. That is, the knowledge lies in the expectation
that if E_j is 'like' a number of other E_k, all of which are an Ω_i, then E_j is probably
also an Ω_i. This does not mean that P(E_j|Ω_i) is likely to be about the same as
P(E_k|Ω_i): frequency and similarity are quite distinct ideas. Hence if the Fundamental Hypothesis is to be used to aid in the diagnosis of classes (the assumption on
which the present theory largely rests) then that diagnosis is bound to depend upon
measurements of similarity rather than upon measurements of frequencies.
The analysis of frequencies of the events E_j is therefore relatively unimportant in
the solution of the diagnosis problem; but it is of course extremely important for the
discovery problem. The prediction that a particular classificatory unit will be useful
rests upon the discovery that subevents often occur which are similar to some
fixed subevent: the role of frequency here is transparently important. But when the
new classificatory unit has been formed, diagnosis itself rests upon similarity alone.
An example will help to clarify these ideas. The concept of a poodle is clearly a
useful one, since animals possessing most of the relevant features are fairly common.
Further, a prize poodle is in some sense a poodle par excellence, and is as 'like' a
poodle as one can get; but it is also extremely rare. The essential point seems to be
that in a prize poodle are collected together more, and perhaps all, of the features
upon which diagnosis as a poodle depends (or ought, in the eyes of poodle breeders,
to depend).
These arguments imply that for the diagnosis of classificatory units by the brain,
Bayesian methods are probably not used. Conditional probabilities of the form
P(E|Ω) are thus largely irrelevant. The important question, when trying to decide
whether E is an Ω, is how many of the events like E are definitely known to be an Ω.
The computation of this raises entirely different issues.
2.2. The notion of evidence
The diagnosis of an input requires that an informed guess be made about it on the
basis of the results for other inputs. If, for example, the present input E (say) has
already occurred in the history of the brain, and has been found to deserve classification in a particular class, then its subsequent recognition as a member of that class is
strictly a problem of memory, not of diagnosis. On the other hand, E may never have
occurred before, though it might be that all E's neighbours have occurred, and have
been classified in a particular way. The Fundamental Hypothesis asserts that this is
good ground for classifying E in the same way.
The existence of an event similar to E, and known to be classified as, say, an Ω,
therefore constitutes evidence that E should also be classified as an Ω. It will be clear
that the more such events there are, the stronger the case for classifying E as an Ω.
It is appropriate to make two general remarks about evidence. The first concerns
the absolute weight of evidence provided by Ω-classified events at different distances from E. Any theory must allow that for some categories of information,
nearby events constitute strong evidence, whereas for others, they do not. Diagnoses
within different categories will not necessarily employ the same weighting functions
in the analyses of their evidence.
The second point about evidence concerns its adequacy. It may, for example,
never be possible to diagnose correctly the class or property on the basis of evidence
from events on the fibres {a_1, ..., a_N}: they simply may not contain enough information. On the other hand they may contain irrelevant information, whose effect is to
make the classifying task appear to be more difficult than it really is. This observation emphasizes the importance of picking the support of the mountain correctly.
The requirements of the diagnostic system can now be stated. It must:
(i) Operate only over a suitably chosen space of subevents (suggested by the
Simple Memory). This space is called the diagnostic space for the property in
question, Ω.
(ii) Record, as far as condition (iii) requires, which events of the diagnostic
space have hitherto been found to be Ω's or not to be Ω's.
(iii) Be able, given a new event E, to examine events near E, discover
whether they are Ω's or not, apply the weighting function appropriate to the
category of Ω, and compute a measure of the certainty with which E itself may
be diagnosed as an Ω.
The three crucial points now become:
P1. How is the evidence stored?
P2. How is the stored evidence consulted?
P3. What is the weighting function (of (iii))?
The solutions to these which are proposed in this paper are not unique, but it is
conjectured that they are the solutions which the nervous system actually uses.
The key idea is that of an evidence function, which will in practice turn out to be a
subset detector analogous to a cerebellar granule cell. The three points are resolved
in the following way:
P1. Evidence is stored in the form of conditional probabilities at modifiable
synapses between 'evidence function' cells and a so-called 'output cell' for Ω
(eventually identified with a cortical pyramidal cell).
P2. Evidence is consulted by applying an input event E, which causes evidence
cells relevant to E to fire. The output cell then has active afferent synapses only from
the relevant evidence cells. The exact way in which it deals with the evidence is
analysed in § 2.3.
P3. The weighting function comes about because nearby events will use overlapping evidence cells, just as very similar mossy fibre inputs are translated into
firing in overlapping collections of cerebellar granule cells. The exact size of subset
detector cells used for collecting evidence depends upon the category of Ω: recognition of speech may, for example, require a generally higher subset size than the 4 or 5
used in the cerebellar cortex.
Let 𝔛 be the diagnostic space for Ω, and let c be a function on 𝔛 which takes the
value 0 or 1. c may, for example, be a detector of the subset A' of input fibres, in
which case, for E in 𝔛, c(E) = 1 if and only if the event E assigns the value 1 to all the
fibres in the collection A'; but c can in general be any binary function on 𝔛. Let
P(Ω|c) denote the conditional probability (measured in the brain's experience so
far) that the input is an Ω given that c = 1.
Definition. The pair (c, P(Ω|c)) is called the evidence for Ω provided by the
evidence function c.
The most important evidence functions are essentially subset detectors (justified in § 4.2.1), and it is convenient to give these functions a special name.
Definitions. (i) For all E in 𝔛, let c(E) = 1 if and only if E(a_i) = 1, 1 ≤ i ≤ r ≤ N.
In this case, c is called an r-codon, or r-codon function, and is essentially a
detector of the subset {a_1, ..., a_r} of the input fibres.
(ii) For all E in 𝔛, let c(E) = 1 if and only if at least θ of E(a_i) = 1, 1 ≤ i ≤ R ≤ N.
In this case, c detects activity in at least θ of the R fibres {a_1, ..., a_R}, and is
called an (R, θ)-codon.
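The two definitions can be illustrated by a short Python sketch. It assumes, purely for illustration, that an event is represented as the set of indices of its active fibres, so that E(a_i) = 1 exactly when i belongs to the set; the function names and the example values are hypothetical.

```python
def r_codon(subset):
    """Return c with c(E) = 1 iff every fibre in `subset` is active in E."""
    subset = frozenset(subset)

    def c(event):
        return 1 if subset <= event else 0

    return c


def r_theta_codon(subset, theta):
    """Return c with c(E) = 1 iff at least `theta` fibres of `subset` are active in E."""
    subset = frozenset(subset)

    def c(event):
        return 1 if len(subset & event) >= theta else 0

    return c


# An event is modelled as the set of active fibre indices (an assumption of this sketch).
event = frozenset({0, 2, 3, 7})
c3 = r_codon({0, 2, 3})                      # a 3-codon
c42 = r_theta_codon({0, 1, 2, 5}, theta=2)   # a (4, 2)-codon
print(c3(event), c42(event))
```

Under this representation an r-codon is the special case of an (R, θ)-codon with R = θ = r.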
The larger the subset size, the fewer events E exist which have c(E) = 1, and so the
more specifically c is tied to certain events in the space 𝔛. Let |𝔛| denote the number
of events in 𝔛, and let K be the number of events E in 𝔛 with c(E) = 1: then the
fraction K/|𝔛| is called the quality of the evidence produced by c in 𝔛. The qualities of
various kinds of codon function are derived in § 3.2.
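The fraction defined above can be computed by direct enumeration on a small example. The sketch below assumes, for illustration only, that the space consists of all events with exactly L active fibres out of N, and that the codon is an r-codon on the first r fibres; in that case the count of events containing the r fibres has a simple closed form, which the code compares with the enumerated value. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import comb


def quality(codon, events):
    """Fraction K / |X| of events in the space for which the codon takes the value 1."""
    events = list(events)
    fired = sum(1 for e in events if codon(e))
    return fired / len(events)


N, L, r = 8, 4, 2   # illustrative sizes only

# Assumed event space: every set of exactly L active fibres out of N.
space = [frozenset(e) for e in combinations(range(N), L)]


def codon(event):
    """r-codon on fibres 0..r-1: fires iff all of them are active."""
    return frozenset(range(r)) <= event


q = quality(codon, space)
# Closed form under the same assumption: choose the remaining L - r active fibres.
closed_form = comb(N - r, L - r) / comb(N, L)
print(q, closed_form)
```

Increasing r in this example shrinks the fraction, which matches the remark that larger subsets tie c to fewer events.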
2.3. The diagnosis theorem
The form of evidence has now been defined, and the rules for its collection have
been set out. The information gained from the classification of one event, E , has
been transferred to its neighbours in so far as they share subsets with E , and the
subsets can be chosen to be of a size suitable for information of the category containing Ω. Thus problems P1 and P3 of § 2.2 have been solved in outline: the details
are cleared up in §§ 3 and 4. It remains only to discover the exact nature of the
diagnostic operation: that is, to see exactly what function of the evidence consulted
about E should serve as a measure of the likelihood that E is an Ω.
The problem may be stated precisely as follows. Let 𝒞 = {(c_i, P(Ω|c_i))}, 1 ≤ i ≤ M, be the
collection of evidence available for the diagnosis of Ω over the space of events 𝔛.
Let E be an event in 𝔛, and suppose
c_i(E) = 1 (1 ≤ i ≤ k),
c_i(E) = 0 (k < i ≤ M).
That is, the evidence relevant to the diagnosis of E comes only from the functions
c_1, ..., c_k, and is in the form of numbers P(Ω|c_1), ..., P(Ω|c_k). The question is, what
function of these numbers should be used to measure how certain it is that E is an Ω?
The answer most consistent with the heuristic approach implied by the Fundamental
Hypothesis is that function which gives the best results; this may be different for
different categories. But a general theory must be clear about basic general
functions if it can, and an abstract approach to this problem produces a definite
and simple answer.
Suppose that, in order to obtain some idea of what this function is in the most
general case, one assumes nothing except that E has occurred, and that the relevant
evidence is available. Then E effectively causes k different estimates of the probability of Ω to be made, since k of the c_i have the value one, and P(Ω|c_i = 1) is the
information that is available. That is, E may be regarded as causing k different
measurements of the probability that Ω has occurred. The system wishes to know
what is the probability that Ω has actually occurred; and the best estimate of this is
to take the arithmetic mean of the measurements. This suggests that the function
which should be computed is the arithmetic mean of the probabilities constituting
the available, relevant evidence; in other words, that the decision function, written
𝒫(Ω|E), has the form
\[ \mathcal{P}(\Omega|E) = \sum_{i=1}^{M} c_i(E)\, P(\Omega|c_i) \Big/ \sum_{i=1}^{M} c_i(E). \]
The conclusion one may draw from these arguments is that if one takes the most
general view, assuming nothing about the diagnosis situation other than the
evidence which E brings into play, then the arithmetic mean is the function which
measures how likely it is that E is an Ω. The diagnosis theorem itself simply gives
a formal proof of this. The meaning of the result is discussed in § 2.4.
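The decision function described above, the arithmetic mean of the stored probabilities over those evidence functions which the input activates, can be sketched as follows. The representation of events as sets of active fibres, the helper `subset_detector`, and the stored probabilities are assumptions made for this illustration and are not taken from the paper.

```python
def diagnose(event, evidence):
    """Arithmetic mean of P(Omega | c_i) over the evidence functions with c_i(E) = 1.

    `evidence` is a list of (c_i, P(Omega | c_i)) pairs. Returns None when no
    evidence function fires, since the ratio is then undefined.
    """
    active = [p for c, p in evidence if c(event) == 1]
    if not active:
        return None
    return sum(active) / len(active)


def subset_detector(subset):
    """Hypothetical helper: a codon that fires iff all fibres in `subset` are active."""
    subset = frozenset(subset)
    return lambda event: 1 if subset <= event else 0


# Illustrative evidence: three subset detectors with made-up stored probabilities.
evidence = [
    (subset_detector({0, 1}), 0.9),
    (subset_detector({1, 2}), 0.7),
    (subset_detector({4, 5}), 0.1),
]
print(diagnose(frozenset({0, 1, 2}), evidence))
```

Only the first two detectors fire for the example input, so the third stored probability plays no part in the estimate.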
Lemma (Sibson 1969). Let T_i be a random variable which takes the value 0 with
probability q_i, and 1 with probability p_i = (1 - q_i), for 1 ≤ i ≤ l. Let T be another
such variable, with corresponding probabilities q and p. Let p, q be chosen
to minimize \(\sum_{i=1}^{l} I(T_i|T)\), and let
\[ p_0 = \frac{1}{l} \sum_{i=1}^{l} p_i. \]
Then p = p_0, and is unique.
Proof. Let p_0 ≠ 1, ≠ 0, and let T_0 be its corresponding binary valued random
variable.
Hence
\[ \sum_i I(T_i|T) = \sum_i I(T_i|T_0) + l\, I(T_0|T), \]
and I is always ≥ 0. Thus
\[ \sum_i I(T_i|T) \geq \sum_i I(T_i|T_0), \]
equality occurring only when I(T_0|T) = 0, i.e. when T = T_0. Hence the minimum
value of \(\sum_i I(T_i|T)\) is achieved uniquely when p = p_0.
Diagnosis theorem. Let Ω be a binary-valued random variable, and let p_1, ..., p_k be
independent estimates of the probability p that Ω = 1. Then the maximum likelihood estimate for p is \(p_0 = (1/k) \sum_i p_i\).
Proof. The estimate p_i of p may be regarded as being made through noise whose
effect is to change the original binary signal Ω, which has distribution (p, 1 - p), into
the observed binary random variable T_i (say), with distribution (p_i, 1 - p_i). The
information gain due to the noise is I(T_i|Ω). Hence that value of p which attributes
least overall disruption to noise, and is therefore the maximum likelihood solution,
is the one which minimizes \(\sum_i I(T_i|\Omega)\). By the lemma, p is unique and equals p_0, the
arithmetic mean of the p_i.
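The lemma can be checked numerically on a small example. The sketch below assumes that the information gain I(T_i|T) between two binary variables is the Kullback-Leibler divergence of the distribution (p_i, 1 - p_i) from (p, 1 - p); the definition actually used in the paper is given in an earlier section not reproduced here, so this reading is an assumption. The estimates and the grid resolution are arbitrary illustrative choices.

```python
from math import log


def info_gain(p_i, p):
    """I(T_i | T) for binary variables with P(1) = p_i and p respectively.

    Assumed here to be the Kullback-Leibler divergence of (p_i, 1 - p_i)
    from (p, 1 - p); requires 0 < p < 1.
    """
    total = 0.0
    for a, b in ((p_i, p), (1.0 - p_i, 1.0 - p)):
        if a > 0.0:
            total += a * log(a / b)
    return total


def total_gain(estimates, p):
    """Sum of I(T_i | T) over all estimates for a candidate value p."""
    return sum(info_gain(p_i, p) for p_i in estimates)


estimates = [0.2, 0.5, 0.9, 0.6]                 # illustrative estimates p_1, ..., p_k
grid = [k / 1000 for k in range(1, 1000)]        # candidate values of p in (0, 1)
best_p = min(grid, key=lambda p: total_gain(estimates, p))
mean_p = sum(estimates) / len(estimates)
print(best_p, mean_p)
```

On this grid the minimizing value coincides with the arithmetic mean of the estimates, as the lemma asserts under the stated reading of I.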
This result applies when the p_i are independent, or are so to speak symmetrically
correlated. For example, if T_1, ..., T_{k-1} are independent, but T_k = T_{k-1}, the result is
clearly inappropriately weighted towards T_k. On the other hand, if k is even, and
T_1 = T_2, T_3 = T_4, ..., T_{k-1} = T_k, this is not harmful. The general condition is complicated; but if c_1, c_2, ..., c_M form a complete set of r-codons over the fibres {a_1, ..., a_N},
or a large random sample of such r-codons, then they are symmetrically correlated
in the above sense.
p = p_0 gives the best single description of p_1, ..., p_k in the sense that it minimizes
\(\sum_i I(T_i|T)\). The diagnosis theorem deals with a situation in fact rather far removed
from the real one, and the next section is concerned with reservations about its
application. It is not clear that any single general result can be established in a
rigorous way for this diagnostic situation.
2.4. Notes on the diagnosis theorem
The key idea behind the present theory is that the brain decomposes its afferent
information into what are essentially its natural cluster classes. The classes thus
formed may be left alone, but are likely to be too coarse. They will often have to be
decomposed still further, until the clusters fall inside the classes which in real life
have to be discriminated; and they will often later have to be recombined, using, for
example, an 'or' gate, into more useful ones, like specific numeral or letter detectors.
These various operations are of obvious importance, but the basic emphasis of this
approach is that the natural generalization classes in the naïve animal are the
primary clusters. Diagnosis of a new input is achieved by measuring its similarity to
other events in a cluster, and the similarity measure 𝒫 of § 2.3 is proposed as suitable
for this purpose. Its advantages are that it can be derived rigorously in an analogous
situation in which the c_i are proper random variables; and that the result does not
absolutely require that the c_i be independent. Moreover, the conditions under which
dependence between the c_i is permissible (the 'symmetric' correlation of § 2.3)
include those (when the c_i are a large sample of r-subset detectors) which resemble
their proposed conditions of use (§ 4).
Nevertheless, the inference that if 𝒫(Ω|E) is sufficiently high, then E is probably
an Ω, rests upon the Fundamental Hypothesis. This observation raises a number of
points, about the structure of the evidence functions, and about ways in which
exceptions to the general rule can be dealt with. The various points are discussed in
the following paragraphs.
2.4.1. Codons for evidence
The validity of the statement that a high 𝒫(Ω|E) implies that E is an Ω rests upon
the structure of the evidence functions used to obtain 𝒫. The neural models of § 4
employ codons (i.e. subset detectors), but their physiological simplicity is not their
only justification. In § 4.2 it is shown, as far as the imprecision in its statement allows,
that the Fundamental Hypothesis requires the use of rather small subset detectors
for collecting evidence. It is not clear that advantage can at present be gained by
sharpening the arguments set out there.
2.4.2. Use of evidence of approximately uniform quality
The reason for using functions c_i over 𝔛 at all, rather than simply collecting evidence
with fibres a_j, is that the untransformed a_j would often not produce evidence of
suitable quality. It may be possible simply to use fibres, especially for storing
associational evidence (see § 2.4.5); but it is probably also often necessary to create
very specific codon functions giving high quality evidence for very selective classificatory units. This process must involve learning whenever the classes concerned
are too specialized for much information about them to be carried genetically.
The quality of a piece of evidence is a measure of how specific it is to certain events
in the diagnostic space 𝔛. In general, a given diagnostic task will require discriminations to be made above a minimum value p (say) of 𝒫, and the quality of the evidence
used will have to be sufficient to achieve such values of 𝒫. The higher the quality of
the evidence, the more there has to be to provide an adequate representation of 𝔛;
and hence economy dictates that evidence for a particular discrimination should
have as poor a quality as possible, subject to the condition on 𝒫. Evidence of less
than this minimal quality will serve only to degrade the overall quality, and so must
be excluded. Hence, evidence should tend to have uniform quality. Mixing evidence
of greatly different qualities is in general wasteful.
This condition is satisfied by the models of § 4, where evidence is provided by
(R, θ)-codons, and most of the evidence for a single classificatory unit has the same
values of R and θ.
2.4.3. Classifying to achieve a particular discrimination
The quality of evidence function for a particular classificatory unit depends upon
the minimum value p of 𝒫 which is acceptable for a positive diagnosis, and this in
turn will depend on how fine are the local discriminations which have to be made.
The size of the clusters diagnosing the numeral '2' (say) in the relevant feature space
depends upon the necessity for discriminating '2' from instances of other numerals
and letters. The usual condition is probably that the part of the diagnostic space
(over the relevant features) occupied by instances of a '2' must be covered by
clusters contained wholly in that part. This condition fixes the minimum permissible
value of p for diagnosis of a '2', which in turn fixes the subset sizes over any given
diagnostic space. There may however be important qualifications necessary about
this approach: the observations of §§ 2.4.4 and 2.4.5 can seriously affect the value
of p.
2.4.4. Evidence against Ω
𝒫 will be most successful as a measure for diagnosis when the properties being
diagnosed are stable over small changes in the input event. As E moves away from
the centre of an Ω-cluster in the diagnostic space 𝔛, the values of P(Ω|c) where
c(E) = 1 gradually decrease, and 𝒫 decreases correspondingly. Provided these
things happen reasonably slowly, all the remarks about symmetrical correlations
of the evidence functions will hold in an adequate fashion.
The possibility must, however, be raised that within a general area of 𝔛 which tends
to give a diagnosis of Ω, there exist special regions in which for some reason, Ω does
not hold. Provided the region in which Ω does not hold is itself a cluster within the
larger Ω-cluster, this state of affairs is not inconsistent with the Fundamental
Hypothesis. This contingency can be dealt with in the same way as the diagnosis
of Ω, by collecting evidence for 'not Ω' (evidence against Ω) within either 𝔛, or a
space related to 𝔛. The form of the analysis is exactly the same as for Ω, except that
the classificatory unit for 'not Ω' must be capable of overriding that for Ω. It is of
course important for the successful diagnosis of Ω that diagnostic spaces for Ω and
for 'not Ω' should both be appropriate, and both have evidence functions of suitable
quality: but the mechanism which discovers the diagnostic space 𝔛 for Ω can clearly
be used to discover the appropriate space for 'not Ω'.
It is interesting that this situation corresponds exactly to one proposed for the
primary motor cortex. It has been suggested by Blomfield & Marr (1970) that the
superficial cortical pyramidal cells there detect inappropriate firing of deep pyramidal cells. They presumably detect clusters in information describing the difference
between an actual and an intended movement. These clusters in effect correspond to
the need for deletion of activity in certain deep pyramids (an instance of the
Fundamental Hypothesis), and the superficial pyramids cause the deletions to be
learned in the cerebellar cortex. This distinction between the classes represented
by deep and superficial cortical pyramidal cells may well not be restricted to area 4.
2.4.5. Competing diagnoses and contextual clues
It is often the case that a single retinal image could originate from two possible
objects, yet contextual clues leave no doubt about which is the true source, and that
source is the only one which is experienced. Such circumstances demonstrate the great
importance of indirect information to the correct diagnosis of a sensory input. The
present theory contains three ways by which such information may affect a diagnosis.
First, contextual information (for example, concerning the place one is in) may
be included in the specification of the diagnostic space for Ω. There presumably
exist classificatory units in one's brain for the places in which one commonly finds
oneself, and other units which describe less common locations more pedantically:
and these probably either fire all the time one is in the appropriate location, or
(roughly) fire whenever other parts of the brain 'ask' where one is. Such information
may be treated like more conventional sensory input.
Secondly, diagnostic criteria within categories can be relaxed by changing p. It is
analogous to the ideas proposed in explanation of the collaterals of the cerebellar
Purkinje cells (Marr 1969; Blomfield & Marr 1970). A priori information is sometimes available which makes units in one category more likely to be present
following the diagnosis of units in another. In such cases, a general relaxation
of the minimum acceptable value p of 𝒫 over the relevant category will be appropriate.
Thirdly, and perhaps most important, is the matter of 'associational' contextual
information. No additional theory is required, since such information can be treated
as evidence in the usual way. It is probably for this kind of information that
evidence functions are least often needed: direct association of classificatory unit
detectors (cortical pyramidal cells) will often be adequate. The matter is touched
on in § 4.1.8, and dealt with at more length in Marr (1971, § 2.4).
2.4.6. General remarks about 𝒫
The direct technical importance of the Fundamental Hypothesis to the application of the results of the diagnosis theorem raises the wider issue of the extent to
which one can feel justified in applying information-theoretic arguments to the kind
of situation with which the diagnosis theorem deals. The Fundamental Hypothesis
simply summarizes the view that clusters are useful. This is a heuristic approach,
and it is not obvious that the diagnosis problem deserves any better than a heuristic
approach itself. It probably matters rather little exactly what measure of similarity
or fit is used: the redundancies on which the success of the system depends are so
gross that there is probably more than one working alternative to 𝒫.
If this is so, the diagnosis theorem loses much of its importance as a derivation of
the 'correct' measure, since there may be no genuine sense in which any measure is
correct, as long as it has a certain general form. The measure 𝒫 does however seem
intuitively plausible, and the reader may be happy to accept it without much justification. Theorem 2.3 is the best argument this author has discovered in its support;
but it is not binding.
The measure 𝒫 can be given a direct meaning in terms of the events of 𝔛. Let
𝔛_i be the set of events E of 𝔛 with c_i(E) = 1. Then P(Ω|c_i) is the probability that if
an event of 𝔛_i occurred, it was an Ω. Suppose that 𝔛 is the set of all events of size L on
the fibres {a_1, ..., a_N}, and that the evidence functions c_1, ..., c_M are the set of all
r-codons. Let F be the new input event of 𝔛, which must be diagnosed; and let E
be an arbitrary event of 𝔛. Write d(E, F) = x, d being the usual distance function
of § 1.2.
The number of r-subsets which E and F share is \(\binom{L-x}{r}\), taking \(\binom{y}{x}\) to be zero
when y < x. Hence the weighting function which describes the 'influence' of E on
the diagnosis of F is
Thus the arithmetic mean obtained by the theorem of § 2.3 is
where λ is the probability distribution induced over 𝔛 hitherto by the environment.
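The count of shared r-subsets stated above can be verified by enumeration on a small case. The sketch assumes that events are sets of L active fibres and that the distance x between two such events is the number of active fibres of one which are not active in the other, that is x = L - |E ∩ F|; the distance function of § 1.2 is not reproduced in this section, so this reading is an assumption, as are the example events.

```python
from itertools import combinations
from math import comb


def shared_r_subsets(e, f, r):
    """Count, by enumeration, the r-subsets of fibres active in both events."""
    e_subsets = set(combinations(sorted(e), r))
    f_subsets = set(combinations(sorted(f), r))
    return len(e_subsets & f_subsets)


L, r = 5, 2                      # illustrative sizes only
E = {0, 1, 2, 3, 4}
F = {2, 3, 4, 8, 9}
x = L - len(E & F)               # assumed distance: active fibres of E not shared with F
print(shared_r_subsets(E, F, r), comb(L - x, r))
```

Python's `math.comb(n, k)` returns zero when k exceeds n, which agrees with the convention that the binomial coefficient is taken to be zero in that case.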
2.5. The interpretation theorem
The diagnosis theorem 2.3 was concerned with the diagnosis of the property Ω
over the diagnostic space 𝔛 on fibres {a_1, ..., a_N}. The events E in this situation
specify the values of all the fibres {a_1, ..., a_N}; but it will frequently occur in practice
that some values of the a_i will be undefined, and a decision has to be made on the
basis of incomplete information. The problem is that this will mean that many of the
evidence functions c_i are also undefined, thus leaving little if any evidence actually
accessible to the input in question. For example, suppose a recognition system has
been set up for a particular face: then a pencil sketch of that face can be recognized
as such, even though much information (the colour of the eyes, skin, hair and so
forth) is missing. Such a sketch can itself be analysed and set up as a new classificatory unit if that seems useful, and the mechanics of this process are the same as
for the original. But this is a notion quite separate from the idea that the sketch is in
some way related to the original face, and it is this idea with which the present
section is concerned. The crux of the relationship is that the original face is the one
which in some way best relates the sparse information contained in the features
presented by the sketch. The result which follows characterizes this relationship
precisely.
𝔛, as usual, is the event space on {a_1, ..., a_N}. Let X be a subevent of 𝔛 which
specifies the values of (say) a_1, ..., a_r for some r < N. Then the event E in 𝔛 is a
completion of X, written E ⊢ X, if
(i) E specifies the values of all a_i, 1 ≤ i ≤ N,
(ii) E(a_i) = X(a_i) where X(a_i) is defined.
Let 𝒞 = {c_i | 1 ≤ i ≤ M} be the set of functions on 𝔛 which provide evidence for the
diagnosis of Ω. Since X is not a full event of 𝔛, c_i(X) is undefined (1 ≤ i ≤ M).
Now there clearly exists a sense in which c_i(X) might be defined: for example, if
either c_i(E) = 1 for all E in 𝔛 such that E ⊢ X, or c_i(E) = 0 for all E in 𝔛 such that E ⊢ X;
but such a circumstance is exceptional, and cannot be relied upon to provide adequate diagnostic criteria.
Let {E_1, ..., E_K} be the set of all completions of X in 𝔛. Then clearly if 𝒫(Ω|E_i) has
the same value, y, for all 1 ≤ i ≤ K, there are strong grounds for asserting that on the
basis of the evidence from 𝒞, the estimate for 𝒫(Ω|X) is also y. This result is a
special case of the following theorem. If 𝒫(Ω|X) denotes the maximum likelihood
value of the probability of Ω given X, taken from the evidence, 𝒫(Ω|E) denotes the
estimate arrived at in the diagnosis theorem, and P(E_i|X) is a conventional conditional probability, then we have the
Interpretation theorem. Let X be a subevent of 𝔛 with completions E_1, ..., E_K. Then
\[ \mathcal{P}(\Omega|X) = \sum_{i=1}^{K} \mathcal{P}(\Omega|E_i)\, P(E_i|X), \]
and is unique.
Proof. The argument is similar to that of the diagnosis theorem. Let T_i(X) be a
binary-valued random variable such that T_i(X) = 1 with probability 𝒫(Ω|E_i) = p_i
(say), for each i, 1 ≤ i ≤ K. Let 𝒫(Ω|X) correspond to a binary-valued random
variable T where T(X) = 1 with probability p. Then each completion E_i of X
corresponds to an estimate, p_i, of p, and P(E_i|X) specifies the weight to be
attached to this estimate. Hence by the same argument as that of the theorem 2.3,
the maximum likelihood solution for T is that which minimizes
\[ \sum_{i=1}^{K} P(E_i|X)\, I(T_i|T). \]
By an extension of the argument of the lemma 2.3, the value of p which achieves
this is unique, and is
\[ p = \sum_{i=1}^{K} P(E_i|X)\, p_i. \]
Hence
\[ \mathcal{P}(\Omega|X) = \sum_{i=1}^{K} \mathcal{P}(\Omega|E_i)\, P(E_i|X), \]
and is unique.
Remarks. In general, no information about $P(E_i|X)$ will be available, so that $\mathcal{P}(\Omega|X)$ will usually be the arithmetic mean of $\mathcal{P}(\Omega|E_i)$ over those $E_i \vdash X$. This theorem shows that incomplete information should be treated in a way which looks like an extension of the methods used for complete information, and the reservations of §2.4 apply equally here. The result does, however, have the satisfying consequence that the models of §4 designed to implement the diagnosis theorem automatically estimate the quantity derived in the interpretation theorem when presented with an incompletely specified input event.
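The weighted mean of the interpretation theorem may be illustrated by a minimal sketch. It assumes, purely for illustration, that an event is a tuple of 0/1 fibre values with `None` marking an unspecified fibre, and that every binary completion is admissible; the names `completions`, `interpret` and `toy_estimate` are hypothetical and do not come from the paper.

```python
from itertools import product


def completions(subevent):
    """Yield every full event agreeing with `subevent` wherever it is defined.

    `subevent` is a tuple of 0, 1 or None (None = fibre value unspecified).
    """
    free = [i for i, v in enumerate(subevent) if v is None]
    for values in product((0, 1), repeat=len(free)):
        event = list(subevent)
        for i, v in zip(free, values):
            event[i] = v
        yield tuple(event)


def interpret(subevent, estimate, weight=None):
    """Weighted mean of estimate(E) over the completions E of `subevent`.

    `estimate(E)` stands in for the diagnosis-theorem estimate and `weight(E)`
    for P(E|X).  With no weights the arithmetic mean is taken, which is the
    usual case described in the Remarks.
    """
    events = list(completions(subevent))
    if weight is None:
        weights = [1.0 / len(events)] * len(events)
    else:
        raw = [weight(e) for e in events]
        total = sum(raw)
        weights = [w / total for w in raw]
    return sum(w * estimate(e) for w, e in zip(weights, events))


def toy_estimate(event):
    """Assumed stand-in estimator: the fraction of active fibres."""
    return sum(event) / len(event)


print(interpret((1, None, 0, None), toy_estimate))
```

Passing a `weight` function reproduces the general form of the theorem; omitting it gives the arithmetic mean over the completions.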
This section contains the technical preliminaries to the business of designing the concrete neural models which form the subject of the next. The results are mainly of an abstract or statistical nature, and despite the length of the formulae, are essentially simple.
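3.1. Simple synaptic distributions
Let $\mathfrak{P}_1$, $\mathfrak{P}_2$ be two populations of cells, numbering $N_1$ and $N_2$ elements respectively. Suppose axons from the cells of $\mathfrak{P}_1$ are distributed randomly among the cells of $\mathfrak{P}_2$ in such a way that a given cell $c_1 \in \mathfrak{P}_1$ sends a synapse to a given cell $c_2 \in \mathfrak{P}_2$ with probability $z_{12}$. $z_{12}$ is called the contact probability for $\mathfrak{P}_1 \to \mathfrak{P}_2$.
If $L$ of the cells in $\mathfrak{P}_1$ are firing, the probability that a given cell $c_2 \in \mathfrak{P}_2$ receives synapses from exactly $r$ active cells in $\mathfrak{P}_1$ is
$$\binom{L}{r} z_{12}^{\,r} (1 - z_{12})^{L-r}.$$
Hence the probability that $c_2$ receives at least $R$ active synapses is $X$ where
$$X = X(R, L, z_{12}) = \sum_{r \ge R} \binom{L}{r} z_{12}^{\,r} (1 - z_{12})^{L-r}.$$
$X(R, L, z_{12})$ is called the formation probability for $\mathfrak{P}_1 \to \mathfrak{P}_2$.
Suppose the cells of $\mathfrak{P}_2$ receive synapses from no cells other than those of $\mathfrak{P}_1$, and that they have threshold $R$. The probability that exactly $s$ cells in $\mathfrak{P}_2$ are caused to fire is
$$\binom{N_2}{s} X^{s} (1 - X)^{N_2 - s}, \qquad (3.1.3)$$
where $X = X(R, L, z_{12})$.
Hence the probability that at least $S$ fire is
$$\sum_{s \ge S} \binom{N_2}{s} X^{s} (1 - X)^{N_2 - s}.$$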
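If the contacts are taken to be made independently, as the random distribution of axons suggests, the quantities of this subsection are binomial tails and can be evaluated directly. The following is a sketch under that assumption; the function names and the numerical values are illustrative only.

```python
from math import comb


def formation_probability(R, L, z):
    """Probability that a cell receives at least R active synapses when L
    input cells fire and each contacts the cell independently with
    probability z."""
    return sum(comb(L, r) * z**r * (1 - z)**(L - r) for r in range(R, L + 1))


def prob_at_least_firing(S, n_cells, X):
    """Probability that at least S of n_cells fire when each fires
    independently with probability X."""
    return sum(comb(n_cells, s) * X**s * (1 - X)**(n_cells - s)
               for s in range(S, n_cells + 1))


# Illustrative values only: threshold 3, 20 active inputs, contact probability 0.1.
X = formation_probability(R=3, L=20, z=0.1)
print(X, prob_at_least_firing(S=5, n_cells=100, X=X))
```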
It is of some interest to know how well represented the $L$ active cells of $\mathfrak{P}_1$ are by the cells of $\mathfrak{P}_2$ which they cause to fire. For most purposes, and all with which this paper is concerned, it is sufficient that any change in the cells which are firing in $\mathfrak{P}_1$ should cause a change in the cells of $\mathfrak{P}_2$. This is in general a complicated question, but a simple and useful guide is the following. Suppose the $L$ cells of $\mathfrak{P}_1$ cause exactly $R$ synapses to be active on each of $S$ cells of $\mathfrak{P}_2$. Then the probability that at least one of the $L$ active cells in $\mathfrak{P}_1$ sends a synapse to none of the active cells in $\mathfrak{P}_2$ is $(1 - R/L)^{S}$. If $R/L$ is small, this is approximately $e^{-RS/L}$.
3.2. Quality of evidence from codon functions
Codon functions, introduced in §2.2, are associated with particular subsets of the input fibres in the sense that knowledge of the values of the fibres in a particular subset is enough to determine the value of the codon function. The larger the subset, the smaller the number of events at which the function takes the value 1, so the more specific that function is to any single event. Hence the general rule that $r$-codon functions provide better evidence the larger the value of $r$. This point is illustrated by the discrimination theorem which follows, and by various estimators of the quality of evidence to be expected from a codon function of a given size.
It is convenient to use the event space $\mathfrak{X}$ on fibres $\{a_1, \dots, a_N\}$ such that in each event of $\mathfrak{X}$, exactly $L$ of the fibres $a_i$ have value 1. The set of such events is called the code of size $L$ on $\{a_1, \dots, a_N\}$. This involves no absolute restriction, but enables one to deal only with codon functions which assign the value 1 to all the fibres in their particular subsets, rather than allowing any arbitrary (but fixed) selection of 0's and 1's.
Let $\mathfrak{X}$ be the code of size $L$ on $\{a_1, \dots, a_N\}$, and let $\mathfrak{Y}$ be a set of events of $\mathfrak{X}$ (for example, $\mathfrak{Y}$ may be the set of events with the property $\Omega$). Let $\mathfrak{S}_r$ be the collection of all subsets of $\{a_1, \dots, a_N\}$ of size $r$.
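Definition. $\mathfrak{S}_r$ discriminates $\mathfrak{Y}$ from the rest of $\mathfrak{X}$ if given $X \in \mathfrak{X}$, $X \notin \mathfrak{Y}$, there exists a subset $C \in \mathfrak{S}_r$ such that $C \subseteq X$ but $C \not\subseteq Y$, for any $Y \in \mathfrak{Y}$.
Theorem. Let $\mathfrak{Y} \subseteq \mathfrak{X}$; then there exists a unique integer $R = R(\mathfrak{Y})$ such that $\mathfrak{S}_r$ discriminates $\mathfrak{Y}$ from $\mathfrak{X}$, all $r \ge R$.
Proof. If $\mathfrak{S}_r$ discriminates $\mathfrak{Y}$ from $\mathfrak{X}$, any $\mathfrak{D}_r$ s.t. $\mathfrak{D}_r \supseteq \mathfrak{S}_r$ also discriminates $\mathfrak{Y}$ from $\mathfrak{X}$. If $\mathfrak{Y}$ can be discriminated by $\mathfrak{S}_r$, then $\mathfrak{Y}$ can be discriminated by some set $\mathfrak{D}_{r+1}$ of $(r + 1)$-subsets, since there will exist a set $\mathfrak{D}_{r+1}$ of $(r + 1)$-subsets the set of whose $r$-subsets contains $\mathfrak{S}_r$. Finally, $\mathfrak{Y}$ is always discriminated by $\mathfrak{S}_L = \{E \mid E \in \mathfrak{X}\}$. Hence there exists a unique lower bound $R$ s.t. $\mathfrak{Y}$ is discriminated from $\mathfrak{X}$ by all $\mathfrak{S}_r$ for $r \ge R$.
This shows that for a given discrimination task, $\mathfrak{Y}$ from $\mathfrak{X}$, for which codon functions are to be used, the codons must be bigger than some lower bound $R$ which depends on $\mathfrak{Y}$.
Definition. $R$ is called the critical codon size for $\mathfrak{Y}$, and is written $R_{\mathrm{crit}}$.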
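For a small code the critical codon size can be found by exhaustive search, which makes the definition concrete. The sketch below assumes that an event is represented as a frozenset of its $L$ active fibre indices; the function names and the toy code on four fibres are hypothetical.

```python
from itertools import combinations


def discriminates(r, code, target):
    """True if every event of `code` outside `target` contains some r-subset
    of fibres that is contained in no event of `target`."""
    for x in code:
        if x in target:
            continue
        if not any(all(not set(c) <= y for y in target)
                   for c in combinations(sorted(x), r)):
            return False
    return True


def critical_codon_size(code, target, L):
    """Smallest r in 1..L for which the r-subsets discriminate `target`."""
    for r in range(1, L + 1):
        if discriminates(r, code, target):
            return r
    return None  # not reached when target is a subset of code: r = L always works


# Toy code of size 2 on 4 fibres (illustrative values only).
N, L = 4, 2
code = {frozenset(c) for c in combinations(range(N), L)}
single = {frozenset({0, 1})}
pair = {frozenset({0, 1}), frozenset({2, 3})}
print(critical_codon_size(code, single, L), critical_codon_size(code, pair, L))
```

Searching upward from $r = 1$ suffices because, for $r < L$, any discriminating $r$-subset of an event can be enlarged within that event to a discriminating $(r + 1)$-subset.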
An a priori estimate of the likely value of the evidence obtained from a codon can be made by examining the number of events of various kinds over which the codon takes the value 1. Let $\mathfrak{X}$ be the code of size $L$ on $\{a_1, \dots, a_N\}$: $\mathfrak{X}$ contains $\binom{N}{L}$ events. Let $\lambda$ denote the uniform probability distribution over $\mathfrak{X}$: i.e. $\lambda(E) = 1/\binom{N}{L}$, all $E \in \mathfrak{X}$; and for $\mathfrak{Y} \subseteq \mathfrak{X}$ write
$$\lambda(\mathfrak{Y}) = \sum_{E \in \mathfrak{Y}} \lambda(E).$$
Then $\lambda(\mathfrak{Y})$ simply measures the number of events in $\mathfrak{Y}$.
The following results are useful.
3.2.1. Each input fibre is involved in $L/N$ of the events in $\mathfrak{X}$ (under the distribution $\lambda$).
3.2.2. Let $\mathfrak{Y} = \{E \mid (L - |E \cap F|) \le p\}$ where $F$ is some fixed event of $\mathfrak{X}$, and $p$ is a positive integer. That is, $\mathfrak{Y}$ is the $p$-neighbourhood of $F$. Then the number of events in $\mathfrak{Y}$ is related to
$$\lambda(\mathfrak{Y}) = \binom{N}{L}^{-1} \sum_{i=0}^{p} \binom{L}{i}\binom{N-L}{i}.$$
3.2.3. Now suppose $c$ is an $R$-codon corresponding to an $R$-subset of the event $F$ of §3.2.2. The number of events $E$ such that $E \in \mathfrak{Y}$ (of 3.2.2) and $c(E) = 1$ is related to
$$\lambda(\mathfrak{E} \cap \mathfrak{Y}) = \binom{N}{L}^{-1} \sum_{i=0}^{p} \binom{L-R}{i}\binom{N-L}{i},$$
where $\mathfrak{E} = \{E \mid c(E) = 1\}$.
3.2.5. Suppose $\mathfrak{Y}$, the $p$-neighbourhood of $F$, is a diagnostic class of $\mathfrak{X}$ for which the $R$-codon $c$ (corresponding to a subset of $F$) is used to calculate evidence. Let $\Omega$ be the property of being in $\mathfrak{Y}$: then the value of $P(\Omega|c)$ that would be generated by the uniform distribution $\lambda$ over $\mathfrak{X}$ is given by
$$P_R = \binom{N-R}{L-R}^{-1} \sum_{i=0}^{p} \binom{L-R}{i}\binom{N-L}{i},$$
where $c$ is an $R$-codon. Provided $p$ is such that $\binom{N-L}{p}$ is large compared to $\binom{N-L}{p-1}$ (that is, $p$ is smaller than say $\tfrac{1}{2}(N - L)$), $P_{R+1} > P_R$ if $p < (N - L)(L - R)/(N - R)$: so that for the simple case where the diagnostic class is a $p$-neighbourhood of some event $F$, increasing the codon size will, under any likely conditions, increase the expected quality of the evidence.
3.2.6. In the more complicated case where $c$ is an $(R, \theta)$-codon intersecting $F$ in exactly $R$ elements, we have
[…]
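The claim that larger codons give better evidence for a neighbourhood class can be checked on a small code by direct enumeration, without relying on any closed formula. The sketch below assumes the neighbourhood condition is read as "at most $p$ fibres of $F$ missing", takes $F$ to be the first $L$ fibres and the codon to be the first $R$ of them; all names and parameter values are illustrative.

```python
from itertools import combinations


def evidence_quality(N, L, p, R):
    """Fraction of events on which the R-codon fires that lie in the
    p-neighbourhood of F, under the uniform distribution over the code.

    F is taken to be fibres 0..L-1 and the codon to be fibres 0..R-1.
    """
    F = frozenset(range(L))
    codon = frozenset(range(R))
    fired = in_class = 0
    for active in combinations(range(N), L):
        E = frozenset(active)
        if codon <= E:                      # c(E) = 1
            fired += 1
            if L - len(E & F) <= p:         # E lies in the p-neighbourhood of F
                in_class += 1
    return in_class / fired


for R in range(1, 5):
    print(R, evidence_quality(N=10, L=4, p=1, R=R))
```

Enumeration is only feasible for small $N$; it is used here because it depends on nothing beyond the definitions of the code, the neighbourhood and the codon.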
4. THE GENERAL NEURAL REPRESENTATION
4.0. Introduction
This section is concerned with the design of neural models for implementing the theorems of §2. It is assumed that the exact nature of the classificatory units required has already been decided: only the representation problem is dealt with here. The discovery and refinement of new classificatory units is postponed until §5, where it is discussed within the context of the models developed now.
The central difficulty with producing neural models for a specific function is that there are many ways of doing the same thing: although the crucial averaging operation probably has to be performed at exactly one cell, there are many ways in which the supporting structure may vary. Both the form of the evidence, and the exact conditions under which it is used, are undefined; so the rigorous derivation of the basic neural models cannot proceed very far. This does not, however, commit the discussion to unredeemed vagueness. The injection at strategic points of a little common sense allows enough precision in the models to make their comparison in §6 with the known histology of non-specific cerebral neocortex a useful venture.
4.1. Implementing the diagnosis theorem
4.1.1. Diagnosis by a single cell
Theorem 2.3 suggests that the best estimate of the likelihood that a given event falls within a particular class is achieved by taking the average of the conditional probabilities offered by the relevant evidence. Suppose first that this operation is carried out by a single cell called the output cell: the arguments for this appear in §4.1.7. Let $\Omega$ be the cell in question, and $\Omega$ its associated property. $\Omega$ receives afferent synapses from each of the evidence function cells $c_i$ (cells which emit a signal, usually a burst of impulses, if and only if the input event $E$ satisfies $c_i(E) = 1$). It is assumed that the strength of the synapse from the cell for $c_i$ to $\Omega$ depends linearly on $P(\Omega|c_i)$. If, for $\Omega$, the number of evidence functions $c_i$ with $c_i(E) = 1$ is independent of $E$, $\Omega$ has simply to add the values of $P(\Omega|c_i)$ for which $c_i(E) = 1$ since
$$\mathcal{P}(\Omega|E) = k^{-1} \sum_{i=1}^{M} c_i(E) P(\Omega|c_i) \propto \sum_{i=1}^{M} c_i(E) P(\Omega|c_i)$$
if $k$ is independent of $E$. That is, $\Omega$ has simply to add the weights of all the synapses from currently active evidence cells, and signal the result. It is easy to imagine that the firing rate of the cell $\Omega$ should vary monotonically with the value of this sum.
The theory therefore requires that the strength of the synapse from $c_i$ to $\Omega$ should depend linearly upon $n_{\Omega}/n_i$ where $n_{\Omega}$ = the number of times $c_i = 1$ and a positive diagnosis was achieved, and $n_i$ = the number of times $c_i = 1$. This condition can clearly be generated by some process in which a combination of pre- and post-synaptic firing causes the synapse to facilitate, while pre- without post-synaptic activity causes its power to decrease.
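The counting rule for synaptic strength can be sketched as a toy output cell that keeps, for each evidence cell, the two counts named above and uses their ratio as the synaptic weight. This is an illustration of the bookkeeping only, not a model of the physiology; the class and method names are hypothetical.

```python
class OutputCell:
    """Toy output cell with one modifiable synapse per evidence cell."""

    def __init__(self, n_evidence):
        self.n_active = [0] * n_evidence      # times c_i = 1
        self.n_positive = [0] * n_evidence    # times c_i = 1 and diagnosis positive

    def learn(self, active, positive):
        """Update the counts after one event; `active` lists the i with c_i(E) = 1."""
        for i in active:
            self.n_active[i] += 1
            if positive:
                self.n_positive[i] += 1

    def weight(self, i):
        """Ratio of the two counts; taken as 0 until c_i has been active."""
        return self.n_positive[i] / self.n_active[i] if self.n_active[i] else 0.0

    def diagnose(self, active):
        """Mean weight over the currently active evidence cells."""
        active = list(active)
        if not active:
            return 0.0
        return sum(self.weight(i) for i in active) / len(active)


cell = OutputCell(3)
cell.learn([0, 1], positive=True)
cell.learn([0, 2], positive=False)
cell.learn([0, 1], positive=True)
print(cell.diagnose([0, 1]))
```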
4.1.2. Synaptic weights: the range of relevance
Economical use of the full range of synaptic strength demands that the maximum strength of each synapse should be achieved at roughly the maximum value of $P(\Omega|c_i)$ taken over those $c_i$ concerned with $\Omega$. This value is not necessarily 1, indeed will rarely be 1: suppose it is $q$. Then the range of strengths available to each evidence synapse must represent the whole of $[0, q]$: it cannot be limited to $[p, q]$ for some $p > 0$, since the accurate calculation of $\mathcal{P}(\Omega|E)$ may often depend in part upon evidence suggesting it is very unlikely that $E$ is an $\Omega$.
Furthermore, all the evidence synapses at $\Omega$ which are likely to be used with one another must have their strengths normalized to the same range $[0, q]$ in order that an unbiased sum may be taken. Any two synapses should be interchangeable, yet give the same output cell firing frequency. The range $[0, q]$ is called the range of relevance for evidence associated with $\Omega$.
4.1.3. The plausibility range
Let $[0, q]$ be the range of relevance for evidence associated with $\Omega$. The maximum value which $\mathcal{P}(\Omega|E)$ can achieve is at most $q$, and hence the maximum firing rate of $\Omega$ should be reached at or near this value. Unlike the synaptic strengths, however, there is no need to be able to cover the whole range $[0, q]$, since the lower values may make the presence of $\Omega$ extremely unlikely. Let $p$ be that value of $\mathcal{P}(\Omega|E)$ at and below which it is impossible that $E$ ever is an $\Omega$; then $[p, q]$ is called the plausibility range associated with $\Omega$, and $0 \le p < q \le 1$. It is evident that some accuracy will be gained by representing only the plausibility range through the $\Omega$-cell firing frequency. Both $p$ and $q$ will depend upon the nature of the information with which $\Omega$ is dealing; there will exist no universally valid values.
The simplest view of the output cell coding of $\mathcal{P}(\Omega|E)$ thus requires that $\Omega$ should not fire at all unless $\mathcal{P}(\Omega|E)$ exceeds some minimum value $p$, and that its maximum firing rate should be achieved at or near some maximum value $q$. The only restriction so far placed on the nature of the coding within the plausibility range is that it be monotonic increasing with $\mathcal{P}(\Omega|E)$. If the outputs of two cells have to be compared (to decide, for example, into which of two classes the current input falls) then unless unreasonable complications are introduced, they have to code $\mathcal{P}(\Omega|E)$ the same way. That is, they must have the same plausibility range $[p, q]$, and they have to code $\mathcal{P}(\Omega|E)$ identically (within the limits of permissible error) inside the plausibility range. Since it is often necessary to decide between classes of the same kind, it may be concluded that all output cells for diagnosing competing classes should be cells of the same construction: they should share a common plausibility range, and a common coding within it.
The final complication to be added to the simple scheme of §4.1.1, which simply summed the weights of the active afferent synapses, is that the number of such synapses may vary. $k = \sum_i c_i(E)$, and in general depends upon $E$. $\Omega$ must therefore be associated with some mechanism which can compensate for this, and its effect must be to divide the total $\sum_i c_i(E) P(\Omega|c_i)$ by $k(E) = \sum_i c_i(E)$ for the current event $E$. The output cell firing frequency must therefore be monotonically related to
$$k^{-1}(E) \sum_i c_i(E) P(\Omega|c_i)$$
within the plausibility range for $\Omega$.
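The output coding described here may be sketched as follows. The text requires only that the firing rate be monotonic within the plausibility range; the linear map used below is an assumption made for simplicity, and the function name and numerical values are illustrative.

```python
def firing_rate(weights, active, p, q, max_rate=1.0):
    """Map the mean weight of the active synapses onto [0, max_rate].

    The cell is silent at or below p and saturates at or above q.  The
    linear interpolation in between is an assumed, not a derived, coding.
    """
    active = list(active)
    if not active:
        return 0.0
    estimate = sum(weights[i] for i in active) / len(active)   # k^-1 * sum
    scaled = (estimate - p) / (q - p)
    return max_rate * min(1.0, max(0.0, scaled))


weights = [0.2, 0.6, 0.8]        # illustrative synaptic strengths in [0, q]
print(firing_rate(weights, [1, 2], p=0.4, q=0.8))
print(firing_rate(weights, [0], p=0.4, q=0.8))
```

Two cells built with the same `p`, `q` and interpolation produce directly comparable outputs, which is the condition the text places on cells diagnosing competing classes.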
4.1.5. Computing $\mathcal{P}(\Omega|E) - p$
The four possibilities for the sequence of operations carried out in the computation of $\mathcal{P}(\Omega|E) - p$ are represented by the bracketing in the following formulae.
In (1) and (2), the summation is performed before the division, whereas in (3) and (4) it is performed after. In (1) and (3), the subtraction is performed before the other operations, which are done on the residues: in (2) and (4) the subtraction is done last.
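The four printed formulae are not legible in this copy. The description above (summation before division in (1) and (2); subtraction first in (1) and (3)) is consistent with the following bracketings, which are offered as a reconstruction from that description and not as a transcription of the original:

```latex
\begin{aligned}
(1)\quad & k^{-1}(E)\Big[\sum_i c_i(E)\,\{P(\Omega|c_i) - p\}\Big] \\
(2)\quad & \Big[k^{-1}(E)\sum_i c_i(E)\,P(\Omega|c_i)\Big] - p \\
(3)\quad & \sum_i \Big[k^{-1}(E)\,c_i(E)\,\{P(\Omega|c_i) - p\}\Big] \\
(4)\quad & \Big[\sum_i k^{-1}(E)\,c_i(E)\,P(\Omega|c_i)\Big] - p
\end{aligned}
```

Since $\sum_i c_i(E) = k(E)$, all four expressions have the same value; they differ only in the order in which the operations are carried out, which is the point at issue.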
The smaller the numbers can be kept, the more accurate will be the final result; so other things being equal, computations which keep numbers small are to be preferred to ones which do not. Other things are equal in the choice between (1) and (2), and in the choice between (3) and (4). It is therefore natural to prefer (1) to (2), and (3) to (4).
In all these computations, a subtraction, summation and division have to be performed, so it is important to consider whether they can plausibly be executed by a real cortical neuron. Many types of cortical pyramidal cell will be identified in §6 as output cells, especially those types found in layers III and V of Cajal.
The synapses for $P(\Omega|c_i)$ are assumed to be excitatory, and only those with $c_i(E) = 1$ carry a signal. Hence there is no difficulty about arranging that only those $P(\Omega|c_i)$ with $c_i(E) = 1$ are considered. The summation of the active synapses is, as remarked in §4.1.1, an operation which it is quite plausible to assume possible in the dendrites of $\Omega$.
The subtraction must be performed by inhibition. The actual amount of inhibition, in both (1) and (3), depends upon $k(E) = \sum_i c_i(E)$, which will vary with $E$, so the amount must depend upon the number of active evidence cells $c_i$. This means that one or more inhibitory interneurons must have dendrites which sample the fibres from the $c_i$-cells, and whose axons terminate on the dendrite of $\Omega$ itself, near enough to the active $c_i$-cell synapses to interact with them in an additive way. The dendritic field of $\Omega$ may be very large, in which case many inhibitory interneurons, each with a rather local dendritic field, will be needed to ensure each dendrite contributes its proper share to the sum.
Both (1) and (3) require that the subtraction be performed before the summation, and the idea of subtraction performed uniformly over the $\Omega$ dendritic tree makes both schemes possible from this point of view. The great problems arise over the division, which has to be done if $k(E)$ varies significantly. (1) and (3) differ in the order in which the summation and the division are taken, so the discussion of division falls into two parts. First, can it be done at all; and secondly, if it can, does it appear that either of (1) and (3) is more likely?
Suppose for the moment that division can be performed. Observe that it has certainly to occur after an estimate of the total value of $k(E)$ has been made. This is because a division by $(n_1 + n_2)$ becomes complicated if one insists on dividing by $n_1$ first, and then performing some operation on $n_2$, since the nature of the operation to be performed using $n_2$ depends on the value of $n_1$. If division is to take place, therefore, an explicit estimate of $k(E)$ has to be made by the neural machinery. The actual division process has then to involve this estimate.
A distinction can be made between the mechanics of this process for (1) and for (3). If the division is done before the summation, it has to be done over the whole $\Omega$ dendrite, and must therefore involve some kind of uniform field whose intensity depends on $k(E)$. If, on the other hand, the summation is done first, the division might be a quite localized process.
4.1.6. A model for division
This is not the place for a detailed discussion of dendrite theory, but it is worth pointing out, by way of general support for the theory's plausibility, that there exists an extremely simple model for the process of division. Suppose $G$ is a spike generator, and $I$ is a spike inhibitor, as in figure 2. The spike generator produces impulses with some frequency $\nu$, and models the result of the summation process. The spike inhibitor $I$ has two inputs, one from $G$ and one of strength which varies with $k(E)$, the number of currently active evidence cells. $I$ is such that each incoming spike is transmitted with probability $f$, where $f$ varies inversely with $k(E)$. That is, each incoming spike has a chance $f = K k^{-1}(E)$ of crossing $I$, where $K$ is some suitable normalizing constant. $I$ may thus be regarded as a conducting medium with only a fraction $f$ of its maximum ability to sustain a spike. The output spike frequency is then monotonically related to $\nu k^{-1}(E)$.
FIGURE 2. A model for division. The spike generator $G$ emits spikes at a rate $\nu$ and the inhibitor $I$ allows a fraction $f$ to be transmitted, where $f \propto k^{-1}(E)$. Spikes are therefore emitted at a rate $f\nu \propto \nu k^{-1}(E)$.
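The division model amounts to random thinning of a spike train, and a short simulation shows the output rate falling as the reciprocal of the number of active evidence cells. The sketch assumes independent transmission of each spike and caps the transmission probability at 1; the function name, the constant `K = 1` and the spike count are illustrative choices.

```python
import random


def transmitted_rate(nu, k, K=1.0, n_spikes=100_000, seed=0):
    """Estimate the output rate when each of `n_spikes` generator spikes
    crosses the inhibitor independently with probability f = K / k
    (capped at 1)."""
    rng = random.Random(seed)
    f = min(1.0, K / k)
    passed = sum(1 for _ in range(n_spikes) if rng.random() < f)
    return nu * passed / n_spikes


for k in (1, 2, 4, 8):
    print(k, transmitted_rate(nu=100.0, k=k))
```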
There are of course other models which have the same effect, but one fact seems to commend this above the rest: it is that spikes have been observed in the large dendritic stems of the cerebellar Purkinje cells (Eccles, Ito & Szentágothai 1967, p. 79) and of the hippocampal pyramidal cells (Spencer & Kandel 1961). It is therefore not unreasonable to suppose that the main apical dendrites of cortical pyramidal cells are also able to support spikes; and if so, that this is how the sum of the residues is communicated to the soma. It is, however, well known that many cortical pyramidal cells, especially those of layers III and V, have somas surrounded by basket cell synapses (Cajal 1911). These cells are well placed to make an estimate of $k(E)$, the amount of parallel fibre activity, and are almost certainly inhibitory. Their action might therefore have the effect that a proportion of the spikes from the dendrite fails to be transmitted to the axon, this proportion depending in a suitable way on the value of $k(E)$. The estimate of $k(E)$ itself could be the combined work of many basket cells, their contributions being summed at the soma itself.
If this model is correct, it provides an explanation of how the division process is performed, in the case in which it follows the summation of the residues. It thus favours the order of computation described by formula (1) of §4.1.5.
4.1.7. Arguments for diagnosis by a single cell
It is necessary now to justify the choice of using one rather than a collection of cells at which to compute a single decision. The arguments are these: first, the weights of the synapses from the evidence cells must vary with $P(\Omega|c_i)$ which depends, for each cell $c_i$, on the number of positive diagnoses coincident with the firing of $c_i$. Hence in order that every evidence synapse has the correct weight, all the output cells representing $\Omega$ at whose synapses the evidence is collected must fire every time a positive diagnosis is achieved. Hence either the output cells must be completely interconnected, or they must drive some super-output cell, which fires them all if it is itself fired.
Secondly, if evidence for $\Omega$ is collected and judged by many cells, the weight each cell has in the final decision ought to depend upon the amount of evidence it has considered. This could be arranged by some suitable trick, but the combination of this and the first point, though not compelling, favours the view that each decision process be carried out by one cell. If therefore, as also seems likely, there do exist several representations of any given concept, they are probably independent.
4.1.8. Dual purpose output cells
This concludes the discussion of the implementation of the theorem 2.3, but before leaving the topic to discuss the form of evidence functions, something must be said about driving the cell $\Omega$ by information of two distinct types. If a single diagnosis could be achieved by two quite unrelated sets of evidence, with different plausibility ranges, it would be necessary to locate the relevant synapses on different, independent regions of dendrite. For example, use of $\Omega$ with direct sensory information may involve synapses on the apical dendritic tree of a cortical pyramidal cell, whereas associational information may be held in the basilar dendrites. These systems could possess different values for both limits, $p$ and $q$, of the plausibility range. They would require entirely different systems of inhibitory subtraction cells, and although the basket cells for the division function could in each case send synapses to the soma, their dendrites would have to sample the correct, disjoint populations of evidence fibres. The cell $\Omega$ would then effectively become two cells in one, and it would succeed in this rôle as long as the other cells of its class also had the same specifications, and the same dual plausibility ranges.
If $\Omega$ can be driven by sensory or by associational information, it is possible that conditional probabilities for sensory evidence should not count those instances of $\Omega$ which arise by association. This is because in the second rôle, $\Omega$ may be being used symbolically, not directly. $P(\Omega|c_i)$ for sensory information should probably not be influenced by instances of this rôle.
Finally, the advantages of such dual rôle cells may be important. If all the various conditions are satisfied, they can probably combine in a satisfactory way information of two kinds in a single diagnostic process. This would to some extent be against the rules, but as long as the contravention is uniform over cells of the relevant category, it would probably work. The effect would be to make it easier to see what you expect to see.
The results of this section are summarized in figure 3.
FIGURE 3. The output cell $\Omega$ has three kinds of afferent synapse: Hebb synapses (open triangles) from evidence cells, and two kinds of inhibitory synapse. Those from the S-cells are spread over the dendritic tree, and perform a subtraction: those from the D-cells, concentrated at the soma, perform a division.
4.2.0. Standard evidence functions
Two constraints have been placed on the evidence functions $c_i$ for a particular output cell $\Omega$: that the evidence they provide should be of sufficient quality, and that the amount of correlation between the $c_i$ for $\Omega$ should be either negligible or regular in a way which does not cause improper bias. The choice of evidence function ought to depend upon the particular circumstances for which it is required: if especially efficient functions exist and can be constructed for a particular purpose, their use will permit an economy in the amount of structure required for that process. But it will frequently occur either that rather little is known about exactly what information will come to be held in a particular piece of cortex, or that there is nothing particular about that information which makes it a suitable candidate for special methods. For such cases, it is natural to seek a class of functions from which a 'standard' form of evidence may be constructed.
There are various conditions such a class should satisfy. Most important, they should have a simple neural representation. Secondly, and also essential, there should be different categories of function corresponding to different expected qualities of the evidence to which they give rise. This is an economy condition, since it is wasteful to use better (and hence in general, more) evidence than necessary. Thirdly, according to the Fundamental Hypothesis §1.6, the expected quality of the evidence produced by the function $c$ will depend upon the distribution of the events $E$ with $c(E) = 1$ over the event space $\mathfrak{X}$. If the property $\Omega$ which the cell $\Omega$ is signalling is stable over relatively small changes in the input event $E$, the best evidence functions $c$ will be those whose events $F$ with $c(F) = 1$ are grouped together, as seen through the natural metric $d$ of §1.3.2.
4.2.1. Arguments for codon functions
These three conditions do have implications about the kind of evidence one may expect: they strongly suggest one particular family of functions, the generalized $(R, \theta)$-codons. First, observe that figure 4 shows the simplest kind of afferent system possible for a cell. There are $R$ afferent fibres, $a_{i_1}, \dots, a_{i_R}$, each with an excitatory synapse of some fixed weight, 1 say. The cell has threshold $\theta$, which may be determined by some suitably arranged inhibition. Then the cell will emit a signal whenever at least $\theta$ of the $R$ fibres $a_{i_1}, \dots, a_{i_R}$ are active: hence the set of firing conditions for the cell constitutes an $(R, \theta)$-codon on any event space over fibres which include $a_{i_1}, \dots, a_{i_R}$. An $(R, \theta)$-codon is thus a specification of the firing conditions for a cell whose afferent relations with its input fibres are simple, and anatomically and physiologically plausible.
FIGURE 4. An $(R, \theta)$-codon cell. There are $R$ excitatory afferent synapses (open circles), and enough inhibition (filled circles) to give the cell a threshold of $\theta$.
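The firing condition of such a cell is simple enough to state in a few lines. In the sketch below an event is assumed to be given as the set of indices of its active fibres; `make_codon` is a hypothetical helper name, and the fibre indices are arbitrary.

```python
def make_codon(fibres, theta):
    """Return the (R, theta)-codon on `fibres`, where R = len(fibres).

    The returned function maps an event, given as the set of active fibre
    indices, to 1 if at least `theta` of the codon's fibres are active,
    and to 0 otherwise.
    """
    fibres = frozenset(fibres)

    def codon(event):
        return 1 if len(fibres & set(event)) >= theta else 0

    return codon


c = make_codon({0, 3, 5}, theta=2)   # a (3, 2)-codon on fibres 0, 3 and 5
print(c({0, 5, 7}), c({3, 7, 9}))
```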
Secondly, it has been observed in §3.2 that suitable values of $(R, \theta)$ can be chosen to construct an $(R, \theta)$-codon which will match any previously specified quality of evidence. Hence the second condition is fulfilled by the family of $(R, \theta)$-codons. The various technical problems which arise when one tries to design a net which will produce $(R, \theta)$-codons for a particular input can be solved, and will be discussed in the next section.
The above two arguments show that codon functions are sufficient to satisfy the two corresponding conditions: the next one shows that they are in some degree necessary for the third.
Let $\mathfrak{X}$ be the event space on $\{a_1, \dots, a_N\}$ and let $d$ be the natural metric of §1.3.2. Let $\{c_i\}_{i=1}^{M}$ be the evidence functions for a particular property $\Omega$, and let $\Omega$ hold for a particular event $E \in \mathfrak{X}$, where
$$E(a_i) = 1 \quad (1 \le i \le L), \qquad E(a_i) = 0 \quad (L < i \le N).$$
Without loss of generality, suppose
$$c_i(E) = 1 \quad (1 \le i \le k),$$
and choose $F \in \mathfrak{X}$ such that $d(E, F) = 1$. Then according to the Fundamental Hypothesis §1.6.4, the chance that $F$ also has $\Omega$ is better than for an event arbitrarily selected from $\mathfrak{X}$. Hence most of the $c_i$ with $c_i(E) = 1$ should have $c_i(F) = 1$ as well. This argument applies to all $F$ with $d(E, F) = 1$: so let $N_1(E) = \{F \mid d(E, F) \le 1\}$.
For each $c_i$, $1 \le i \le k$, define a subset $C_i$ of $\{a_1, \dots, a_N\}$ in the following way. Write $F_j$ = the event obtained from $E$ by altering the value of the fibre $a_j$, i.e.
$$F_j(a_i) = E(a_i), \text{ all } i \ne j, \qquad F_j(a_j) = 1 \iff E(a_j) = 0.$$
The subset $C_i$ is obtained thus:
$$C_i = \{a_j \mid c_i(F_j) \ne c_i(E)\}.$$
Then for $1 \le i \le k$,
$$c_i(F_j) = 1 \iff C_i \subseteq F_j.$$
That is, for $1 \le i \le k$, $c_i$ may be regarded within $N_1(E)$ as a detector of the subset $C_i$ of the fibres $\{a_1, \dots, a_N\}$. Thus locally (i.e. within $N_1(E)$), $c_i$ behaves like the codon function with associated subset $C_i$.
But it has been observed that for an arbitrary change from $E$ to $F_j$, some $1 \le j \le N$, the values of the majority of the functions $c_i$ should remain unchanged. Hence, for most of the $i$, $1 \le i \le k$, it must be true that $c_i$ takes the value 1 over most of $N_1(E)$ (assuming the $c_i$ are not organized in any special way). This implies that the size of the subset $C_i$ which $c_i$ detects in $N_1(E)$ is small, for most $i$, $1 \le i \le k$.
This argument shows that if an evidence function is constructed for classifications in which the Fundamental Hypothesis is true, then such a function behaves locally like a codon function with a rather small associated subset.
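The local subset detected by an evidence function can be computed directly from its definition: alter one fibre at a time and record which alterations change the function's value. The sketch below assumes that events are sets of active fibre indices and that altering a fibre means toggling its membership; the evidence function shown is a hypothetical example.

```python
def local_subset(c, event, n_fibres):
    """Return the fibres whose alteration changes the value of c at `event`."""
    event = frozenset(event)
    changed = set()
    for j in range(n_fibres):
        flipped = event ^ {j}          # the event with fibre j altered
        if c(flipped) != c(event):
            changed.add(j)
    return changed


def example_evidence(event):
    """Hypothetical evidence function: fires when fibres 0, 1 and 2 are all active."""
    return 1 if {0, 1, 2} <= set(event) else 0


print(local_subset(example_evidence, {0, 1, 2, 3}, n_fibres=8))
```

For this example the function is insensitive to five of the eight possible single-fibre changes, so its value is preserved over most of the unit neighbourhood, in keeping with the argument that the detected subset must be small.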
This is the most that can be deduced about evidence functions from the necessarily imprecise considerations out of which the present theory is constructed. The case for $(R, \theta)$-codons being the general form of evidence function is not logically established, but it would at present be impossible to make a rigorous argument for any family of functions. The three arguments presented above do constitute good evidence in favour of codons, evidence which it would require a strong and unexpected finding to disrupt.
Finally, in the particular case of the cerebellar cortex, where according to Marr (1969) something analogous to the present theory actually occurs, the evidence cells are the granule cells, which are codon cells with $R \le 7$. It will be pointed out in §6 that the cerebral neocortex contains cells which may be regarded as $(R, \theta)$-codons with larger $R$. It is thought that the combined weight of these arguments constitutes sufficient grounds for studying in detail the setting up and performance of $(R, \theta)$-codon cells, where the values of $R$ and $\theta$ have various relations to the parameters of the code used on the set of input fibres $\{a_1, \dots, a_N\}$.
4.3. Codon neurotechnology
4.3.0. The possible need for codon formation
At first sight, the use of codons virtually solves the problem of the neural representation of evidence functions. Provided the contact probability $z$ from the afferent fibres $\{a_1, \dots, a_N\}$ to the population $\mathfrak{P}$ of codon cells has the appropriate value, it remains only to set the thresholds of the codon cells in a suitable way (see §3.1).
The only possible problem with this scheme is that the evidence thus obtained may not have the required quality. The better the evidence required, the more specific the codon functions must be, and so the less frequently they take the value 1. If a roughly fixed number has to fire in order to provide an adequate representation of each input event, the size of the underlying population of codon cells has to be larger the better the evidence required. Unless special measures are taken, this might make it necessary in a particular case to provide a huge population of evidence cells, only a few of which are ever used. This difficulty can be avoided by using a special technique. It works by modifying just a few of the afferent synapses at a cell, so that a codon function of exactly the required sort is represented there. The process of determining to which codon a particular cell should respond is called codon formation at that cell.
The essence of codon formation is very simple. Let 𝔓 be a population of cells, each
of which has R' afferent synapses. R' is such that a typical input event can expect to
excite θ synapses at each cell of 𝔓, where θ is the θ of the (R, θ)-codons eventually
required. The information which the codons have to represent arrives during a
special setting-up period (§5.1.2), and only the synapses used during that time have
any effective power later. This produces a population of codon cells such that only a
few of the total number of afferent synapses have any power, but those few are the
correct ones. The details are described fully in the following pages.
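The setting-up step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `form_codons`, the uniform random wiring of R' afferents to each cell, and every numerical value are hypothetical choices made here, not quantities given by the theory.

```python
import random

def form_codons(n_fibres, n_cells, r_prime, theta, setup_event, seed=0):
    """Return, for each cell, (all afferents, synapses that keep effective power)."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        # Assumption: each cell samples r_prime distinct input fibres at random.
        afferents = frozenset(rng.sample(range(n_fibres), r_prime))
        used = afferents & setup_event
        # Only a cell driven to threshold during setting-up forms a codon;
        # its effective synapses are exactly those used during that period.
        support = used if len(used) >= theta else frozenset()
        cells.append((afferents, support))
    return cells

# Hypothetical setting-up event: 40 of 100 input fibres are active.
setup_event = frozenset(range(40))
cells = form_codons(n_fibres=100, n_cells=500, r_prime=12, theta=7,
                    setup_event=setup_event)
formed = [support for _, support in cells if support]
```

Each entry of `formed` is the support of one codon: a small subset of that cell's afferents, all of which were active in the setting-up event.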
4.3.1. Techniques for codon formation
The three basic mechanisms for codon formation appear in figure 5. In (1) the
afferent synapses are excitatory, and become ineffective if and only if there is post- without pre-synaptic activity. In (2), the synapses are composed of two parts: one
excitatory and unmodifiable, and one initially ineffective, but which is facilitated by
simultaneous pre- and post-synaptic activity. The modifiable component is thus a
Hebb-modifiable synapse (Hebb 1949). The combination in one synapse of an
unmodifiable excitatory component with a Hebb-modifiable component has an
importance which was first noticed by Brindley (it appears at the s-cells in Brindley
1969). It is therefore proposed that such synapses be named Brindley synapses, to
distinguish them from Hebb synapses, which will be taken to possess the same modification conditions, but no unmodifiable excitatory component.
FIGURE 5. Three models for codon formation. (1) Uses synapses which are initially excitatory,
but are modified to be ineffective by post- without pre-synaptic activity (open squares),
(2) uses Brindley synapses (arrows), (3) uses Hebb synapses (open triangles) and a
climbing fibre (open circles). All three have inhibitory synapses (filled circles) which set
the cells' thresholds at an appropriate level.
In models (1) and (2), the cells also receive some inhibitory synapses which set
their thresholds at the appropriate value. The equations governing the number of
codons formed in any particular situation are those of §3.1: X is called the formation probability in those equations for this reason.
Case (3) is slightly different: this cell possesses an afferent fibre analogous to the
cerebellar climbing fibre, and its ordinary afferent synapses are Hebb synapses,
which are initially ineffective, and are modified by the conjunction of pre-synaptic
and climbing fibre (or post-synaptic) activity. The climbing fibre is active only
during the setting up period. The consequences of this model are slightly different
from those of (1) and (2), for after setting up, all those synapses which were active
during the setting up period will have been modified, not just those at a cell where a
codon was successfully formed.
The conditions in which the codon cells may later be used are different for each of
these models. In (1), there is no difficulty, since the irrelevant synapses have no
power. In (3), the fact that all synapses active during the setting up period will have
been modified may mean that an undesirably large number have been made excitatory. Methods (1) and (2) are in this sense more selective, and will tend to produce
better evidence. In (2), during later use, the cell threshold has to be set so that
activity in at least θ modified afferent synapses is required to discharge the cell.
In all cases, the codon cell thresholds can be set at the appropriate level by using
sampling techniques, both of the afferent fibres and of the codon cell axons, in the
same way as the cerebellar Golgi cells are thought to control the granule cell
thresholds (Marr 1969).
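The difference between the three modification rules can be made concrete for a single cell. The sketch below is an illustration only: the function names, the five-fibre cell and the two test events are hypothetical, and the inhibitory threshold is reduced to a simple count of active afferents.

```python
def model1_effective(afferents, event, theta):
    """Model (1): synapses start excitatory; post- without pre-synaptic
    activity makes a synapse ineffective. Returns the effective synapses."""
    active = afferents & event
    fired = len(active) >= theta          # threshold set by inhibition
    return set(active) if fired else set(afferents)

def model2_facilitated(afferents, event, theta):
    """Model (2): Brindley synapses; the Hebb component is facilitated by
    simultaneous pre- and post-synaptic activity. Returns facilitated synapses."""
    active = afferents & event
    fired = len(active) >= theta
    return set(active) if fired else set()

def model3_facilitated(afferents, event):
    """Model (3): Hebb synapses plus a climbing fibre that is active during
    setting-up, so every synapse active in that period is modified."""
    return set(afferents & event)

afferents = {0, 1, 2, 3, 4}   # hypothetical cell with five afferent fibres
weak_event = {0, 1, 7}        # excites only two of them
strong_event = {0, 1, 2, 9}   # excites three of them
theta = 3
```

With `weak_event` the cell stays below threshold: models (1) and (2) leave it unchanged, whereas model (3) still modifies synapses 0 and 1. This is the sense in which (1) and (2) are more selective.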
4.3.2. Model (2) preferred to model (1)
Models (1) and (2) will produce evidence of the same quality in a given situation,
but model (1) has an important disadvantage. If synaptic modification is an
irreversible process, the process of codon formation in this model is a once and for all
affair. The fact that all the synapses not involved in the first codon represented are
thereby rendered ineffective means that the cell can never be used for more than
one codon. This model essentially represents one codon by eliminating all other
possibilities, and as such is unattractive. This is not true of model (2), where a
synapse which is unused the first time could be used later on, if that became desirable.
The model (2) needs slightly more complicated backing up by the inhibitory
cells, since the level of inhibition necessary during codon formation both differs
from that needed for recognition of codons already formed, and depends upon the
number of codons already formed at that particular cell. This difficulty can be
overcome if the inhibition level is set primarily by a count of the active codon cells,
so it does not significantly affect the desirability of this model.
Model (3), like (2), does not suffer from the once-for-all disadvantage; but as
pointed out in §4.3.1, is not strictly comparable with (1) since it forms evidence in a
slightly different way.
4.3.3. A problem with (1) and (2)
In model (1), if synaptic modification is irreversible, each cell can represent only
one codon. Hence the afferent synapses should not be modifiable all the time; the
precious potential of a cell must be reserved for information for which it is worth
being used. A similar point holds for model (2), since if the afferent synapses were
permanently modifiable, any incoming information could cause the creation of
codons. The point here is not that the first event rules out the rest, but that all are
treated as indiscriminately valid. Since any input can create a codon if the anatomy
allows it, the cell is no different in function from one where afferent synapses are
unmodifiable excitatory. Therefore, for models (1) and (2), the modifiable synapses
involved must be modifiable only whilst that information for which codons are required
is present in the afferent fibres.
This difficulty arises in model (3) in a less acute form: the problem here is that
something has anyway to specify when codon formation should take place. No
difficulties arise with the hardware, since modification is geared to the climbing
fibre activity; but climbing fibres cannot in general select the best cells.
4.3.4. The solution using inhibition
The only solution to this problem in models (1) and (2) which uses conventional
ideas is to suppress the cells with inhibition until they are wanted. The alternative,
to excite them when they are wanted, is equivalent, but reduces (2) to an uninteresting variant of (3). This scheme would work until the first codon was formed, but
would then fail in model (2): this is because inhibition cannot subsequently be
maintained at these cells without their losing the ability to recognize the codons
that have been formed at them. This defeats the object of the scheme.
4.3.5. Another solution
The alternative to this kind of solution is that the synapses genuinely should
become modifiable only at those times when codon formation is required. This is not
as implausible an assumption as it might appear, since considerable organization
has to take place before the formation of codons becomes necessary anyway. Codon
formation takes place either when a new classificatory unit is formed, or when new
evidence functions are added to an existing one. The decision about how to commit
a piece of information to the neocortical store (whether as a new classificatory
unit or as an association between existing ones) has to be taken on the basis of its
relationship to other incoming events. It cannot in general be taken immediately:
for example, it takes time for the mountainous structure of a probability distribution
to become apparent.
This has the consequence that it is best to send all incoming information to a
temporary associative store, where it is held and not altered. This is one point of
Simple Memory theory (§6 and Marr 1971). When it becomes clear how a piece of
information should be stored, it can be taken out and dealt with in the appropriate way. If, for example, it should be set up as a new classificatory unit, a location
must be sought (the one with the most favourable pre-existing structure) and the
information directed there for representation. The complete operation is so special
and complex that the assumption, that a suitable delicate change in the chemical
environment of the relevant codon cells accompanies the transmission there of the
setting-up information, ceases to carry a special implausibility. The matter is discussed further in §5.1.2.
4.4.0. Preliminary assumptions
The analysis of §4.2 suggested that codon functions are likely to be widely used
as evidence functions. If they are, two conditions will hold, one about the input
events, and one about the codons themselves. First, the input events for a particular
output cell Ω are likely to occupy a code of some fixed size L, say, on the input
fibres {a_1, ..., a_N}. The reason for this is that if the input events have an arbitrary
form, then codon functions of an arbitrary form have to be allowed. An arbitrary
codon function is one which assigns the values 0 or 1 to a subset of {a_1, ..., a_N}: the codon
functions we have met so far have assigned only the value 1. There is no objection
in principle to the general codon function, but it is more difficult to build its neural
representations, and much more difficult to model codon formation. It will therefore
be assumed, for the purposes of this section, that the input events are events of
size L over {a_1, ..., a_N}.
Secondly, all the codons associated with a given output cell Ω are likely to be of
about the same size. This is because only a small proportion of the codon cell
population will be used for any single input event: these are chosen by selecting an
appropriate codon cell threshold, and so come from the tail of a binomial distribution.
The number of cells discovered in such a situation decreases sharply as the cells'
thresholds rise, so that at any given threshold, the cells may to a first approximation
be regarded as all having the same number of active afferents. Since the input events
also will have the same size, all the codons connected with a given output cell Ω may
be regarded as having the same specifications. It will further be assumed that the
actual codon cells which exist have been chosen randomly from the population of all
such codon cells with those specifications.
These conditions are sensible also from another point of view, since the expected
quality of evidence obtained from a codon depends upon its specifications. It was
remarked in §2 that the expected quality should be uniform for a given decision cell
Ω, so this condition is likely to be fulfilled. Further, the randomness assumption
means that problems about correlated evidence are avoided.
4.4.1. Statement of the main result
Suppose a set of (R, θ)-codons are chosen as evidence functions for diagnosis of
the property Ω, and that these codons constitute a random sample from the set of all
such codons. Suppose the input events have size L over {a_1, ..., a_N}: then an incomplete event specifies the values of less than L input fibres. It is shown that the
interpretation of such an incomplete input may be carried out by taking a weighted
sum of certain P(Ω|c_i) in a way analogous to the procedure for diagnosis of complete
events. An estimate of this sum, for an incomplete input X, can be obtained in a real
neural net by lowering the threshold of the codon cells until X causes activity in a
significant number, and applying these signals to the output cell Ω in the usual way.
Hence in a neural model where the codon cell thresholds are controlled by cells
designed to maintain the number of active codon cells at a constant value, the
interpretation of an incomplete event is a natural consequence of applying the
event to the net.
There are two sources of error in this estimate: first, those codon cells with more
active afferents than the current codon cell threshold will probably acquire an
incorrect weighting of their corresponding value of P(Ω|c_i) at Ω; and secondly, the
estimate is based on a sampling process. The first kind of error is alleviated by two
facts: that most active codon cells have the same number of active afferents, only a
very few having more (because the active cells come from the tail of a binomial
distribution); and that those codon cells with more active afferents will be driven
harder than the rest. This effect operates in the right direction to reduce the error.
The inaccuracies from the second source are probably unimportant.
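The mechanism stated above, lowering the codon cell threshold until the usual number of cells is active, can be sketched as follows. Everything specific here is an assumption made for illustration: the random wiring, the population size, the target of 50 active cells, the made-up values standing in for P(Ω|c_i), and the use of a plain average in place of the output cell's summation and division.

```python
import random

def make_codon_cells(n_fibres, n_cells, r, seed=0):
    """Assumption: each codon cell has r afferents chosen at random."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_fibres), r)) for _ in range(n_cells)]

def threshold_for(cells, event, target):
    """Largest threshold at which at least `target` cells are active."""
    counts = sorted((len(s & event) for s in cells), reverse=True)
    return counts[target - 1]

def interpret(cells, p_omega, event, target):
    """Set the threshold for this event, then average P(Omega|c_i) over active cells."""
    theta = threshold_for(cells, event, target)
    active = [i for i, s in enumerate(cells) if len(s & event) >= theta]
    estimate = sum(p_omega[i] for i in active) / len(active)
    return theta, len(active), estimate

rng = random.Random(1)
cells = make_codon_cells(n_fibres=100, n_cells=2000, r=10)
p_omega = [rng.random() for _ in cells]       # made-up stand-ins for P(Omega|c_i)
full_event = frozenset(range(40))             # full-sized event, L = 40
sub_event = frozenset(range(20))              # incomplete event, W = 20
theta_full, n_full, est_full = interpret(cells, p_omega, full_event, target=50)
theta_sub, n_sub, est_sub = interpret(cells, p_omega, sub_event, target=50)
```

Because the incomplete event is contained in the full one, no cell has more active afferents under it, so the threshold chosen for it can never exceed the one chosen for the full event.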
The interpretation theorem, §2.5, is concerned with the treatment of inputs in which the values
of some of the fibres are undefined. In the present case, this corresponds to states
where fewer than L of the input fibres {a_1, ..., a_N} have the value 1. Let X be a subevent of
the input event space 𝔛, and suppose that X specifies X(a_i) = 1, 1 ≤ i ≤ W < L. Let
E_1, E_2, ..., E_J be the possible completions of X in 𝔛, so that each E_j (1 ≤ j ≤ J) specifies
that exactly L of the a_i have the value 1.
By the Interpretation Theorem,
$$\mathcal{P}(\Omega \mid X) = \sum_{j=1}^{J} P(E_j \mid X)\,\mathcal{P}(\Omega \mid E_j).$$
If nothing is known about P(E_j|X), it must be assumed that P(E_j|X) = 1/J, all 1 ≤ j ≤ J.
Let {c_i | 1 ≤ i ≤ K} be the set of all evidence functions for Ω over 𝔛. Then
$$\mathcal{P}(\Omega \mid E_j) = k^{-1}(E_j) \sum_{i=1}^{K} c_i(E_j)\,P(\Omega \mid c_i),$$
where k(E_j) is the number of c_i with c_i(E_j) = 1, i.e. k(E_j) = Σ_{i=1}^{K} c_i(E_j). Hence
$$\mathcal{P}(\Omega \mid X) = J^{-1} \sum_{j=1}^{J} \sum_{i=1}^{K} c_i(E_j)\,k^{-1}(E_j)\,P(\Omega \mid c_i).$$
Define the family of real-valued functions w_i, 1 ≤ i ≤ K, on the set {E_1, ..., E_J} by
$$w_i(E_j) = c_i(E_j)\,k^{-1}(E_j).$$
Then
$$\mathcal{P}(\Omega \mid X) = J^{-1} \sum_{i=1}^{K} P(\Omega \mid c_i) \sum_{j=1}^{J} w_i(E_j).$$
The operation of calculating 𝒫(Ω|X) is thus equivalent to computing a weighted sum of the P(Ω|c_i):
the coefficient of P(Ω|c_i) is Σ_{j=1}^{J} w_i(E_j), and we now study the values this takes.
Σ_{j=1}^{J} w_i(E_j) measures the weight with which P(Ω|c_i) contributes to the set of all possible completions
of X in 𝔛. In a given completion E_j, P(Ω|c_i) has a certain weight: it is zero if c_i(E_j) = 0 and
if not, this weight is 1/k(E_j), where k(E_j) is the size of the c_i-representation of E_j. Now the
number k(E_j) is a random variable obtained by adding the terms in the tail of a binomial
distribution (see equation 3.1.1). Suppose k has distribution V: then k^{-1} has distribution
V^{-1} say, with expectation $\overline{k^{-1}}$ ($\neq \bar{k}^{-1}$ in general), and variance σ (say). (Assume k = 0 with
zero probability.) The values of k^{-1}(E_j) for different j are strictly speaking not independent,
but if they were, the random variable (1/n(c_i)) Σ_j c_i(E_j) k^{-1}(E_j) would have the same mean
and variance σ/√{n(c_i)}, where n(c_i) = the number of E_j with c_i(E_j) = 1.
The value of σ/√{n(c_i)} does, however, give some guide to the variance of this random
variable. It may be assumed that σ is small, since part of the function of the Golgi-type
inhibitory cells which control the thresholds of the cells is to ensure a constant-sized
representation for each input event E. The actual random variable described above will
have a variance somewhere between σ and σ/√{n(c_i)}, but since σ is small, and the true
value will be nearer σ/√{n(c_i)}, it may safely be assumed that its variance is small enough
to be ignored.
Hence
$$\mathcal{P}(\Omega \mid X) = K^{*} \sum_{i} n(c_i)\,P(\Omega \mid c_i),$$
where n(c_i) = the number of E_j with c_i(E_j) = 1, and E_j completes X; K* is some suitable normalizing constant.
Now n(c_i) depends upon R, θ and r, where c_i is an (R, θ)-codon, and r is the number of
afferent fibres active in X which are contained in S(c_i), the support of c_i. In fact, n(c_i) is
given by a sum [displayed equation lost in this transcription],
the sum being taken until one factor reaches 0, and where
N = no. of input fibres,
L = no. of fibres active in each full sized input event,
W = no. of fibres active in X,
c_i is an (R, θ)-codon, r = no. of fibres active in the support of c_i. For R = θ, n(c_i) is primarily a function of r; call it n(r).
Then
$$\frac{n(r+1)}{n(r)} = \frac{N-(W+R-r-1)}{L-(W+R-r-1)} > \frac{N-W}{L-W}.$$
For typical values, e.g. N = 100, L = 40, W = 20,
$$\frac{n(r+1)}{n(r)} > 4,$$
which illustrates the fact that those c_i with greater r have much more influence over
𝒫(Ω|X) than those with smaller r.
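The ratio above can be checked numerically. The sketch below rests on an assumption: it takes n(c_i) to be the number of completions of X that make at least θ of the R support fibres active, written as a sum of products of binomial coefficients. That form is a reconstruction consistent with the ratio quoted in the text, not the author's displayed formula, and the value R = θ = 5 is an arbitrary illustrative choice.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def n_completions(N, L, W, R, theta, r):
    """Assumed form of n(c_i): completions of X on which the codon takes value 1."""
    free = R - r                       # support fibres not already active in X
    total = 0
    for s in range(max(0, theta - r), min(free, L - W) + 1):
        # s further support fibres become active; the rest of the completion
        # is drawn from the fibres outside both X and the support.
        total += comb(free, s) * comb(N - W - free, L - W - s)
    return total

def n_completions_brute(N, L, W, R, theta, r):
    """Direct enumeration, usable only for very small N."""
    x = set(range(W))
    support = set(range(r)) | set(range(W, W + R - r))
    return sum(1 for extra in combinations(range(W, N), L - W)
               if len(support & (x | set(extra))) >= theta)

N, L, W, R = 100, 40, 20, 5            # R = theta = 5 is an assumed value
ratios = [Fraction(n_completions(N, L, W, R, R, r + 1),
                   n_completions(N, L, W, R, R, r)) for r in range(R)]
```

Under this assumed form the ratios for r = 0, ..., 4 are 76/16, 77/17, 78/18, 79/19 and 80/20, so the bound (N - W)/(L - W) = 4 is attained exactly at r = R - 1 and exceeded for smaller r.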
The problem of estimating 𝒫(Ω|X) from a family of (R, θ)-codons c_i is thus equivalent to
taking the weighted average of P(Ω|c_i), where the weighting depends upon the number, r,
of active input elements in the support of c_i. It will now be shown that this can be achieved
by reducing the threshold of the cells for the (R, θ)-codon to some suitable lower value θ',
which depends upon W, the size of X.
Two problems have to be solved when 𝒫(Ω|X) is computed: first, enough c_i have to be
used for the estimated answer to be reliable; and secondly, those c_i which are used have to be
weighted in the correct way. It is assumed that the c_i are all (R, θ)-codons whose neural
representation is effectively as shown in figure 4: it is immaterial whether this is achieved
by models (1), (2) or (3) of figure 5. For an input X of size W, the probability of the cell's
being active is [displayed equation lost in this transcription],
where the cell has threshold θ', and the probability that a given afferent fibre is active is taken to be W/N (by analogy with 3.1.2). This is just the usual
tail of a binomial distribution. Now as θ' decreases, the number of (R, θ)-codons which
become active increases rapidly [displayed equation lost in this transcription]:
while n(θ') is small, both θ' + 1 > R - θ' and N > 2W will usually hold. Hence as the value
of θ' is lowered, the number of c_i cells which X fires increases very fast: so that the difference
in θ' between having no cells active to having the usual number for a full event
will only be of the order of 3 units of synaptic strength, and the great majority of the
active c_i will have exactly θ' active afferent synapses.
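The rapid growth can be illustrated with exact arithmetic on the binomial tail. The sketch assumes each of a cell's R afferents is independently active with probability p = W/N; the symbol p, the value R = 20 and the threshold θ' = 12 are choices made here for illustration. Under that assumption the ratio of successive binomial terms is (θ' + 1)/(R - θ') times (N - W)/W, and each factor exceeds 1 under the two conditions quoted in the text.

```python
from fractions import Fraction
from math import comb

def binomial_term(R, s, p):
    """Probability that exactly s of R afferents are active."""
    return comb(R, s) * p**s * (1 - p)**(R - s)

def binomial_tail(R, theta, p):
    """Probability that at least theta of R afferents are active."""
    return sum(binomial_term(R, s, p) for s in range(theta, R + 1))

N, W, R = 100, 20, 20                  # R = 20 is an assumed value
p = Fraction(W, N)                     # assumed per-fibre activity probability
theta_prime = 12                       # satisfies theta' + 1 > R - theta'
ratio = binomial_term(R, theta_prime, p) / binomial_term(R, theta_prime + 1, p)
predicted = Fraction(theta_prime + 1, R - theta_prime) * Fraction(N - W, W)
# Tail probabilities as the threshold is lowered from 13 to 10.
tails = [binomial_tail(R, t, p) for t in range(theta_prime + 1, theta_prime - 3, -1)]
```

Here lowering the threshold by one unit multiplies the leading term by 13/2, which is why only a few units of threshold separate silence from the usual level of activity.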
The problem of the differential weighting of the P(Ω|c_i) can thus be alleviated as long
as θ' does not lie far below the minimum number required to achieve the response of at
least one c_i-cell. Provided the number of c_i-cells made active in this way is of the order of the
number ordinarily excited by a full input event, enough evidence will be involved for the
estimate of 𝒫(Ω|X) to be reliable. Strictly, all the c_i which could possibly take the value 1
on some completion of X should be consulted: but this number could be very large, and
the problems of achieving the correct weighting become important. It is therefore much
simpler to take an estimate using about the usual number of c_i.
Finally, it should be noted that if this is done, the c_i-thresholds can be controlled by the
same inhibitory cells as control their thresholds for normal input events, since it has already
been shown that a circuit whose function is to keep the number of c_i-cells active constant is
adequate for this task. If this technique is used, those few c_i-cells with more than θ' active
afferents will have a higher firing rate than those with exactly θ'. Hence they will anyway
be given greater weighting at the Ω-cell. It would be optimistic to suppose this weighting
would be exactly the correct amount, since the factor involved depends on the parameters
N, L, W, R, θ, r; but the effect will certainly reduce the errors involved.
FIGURE 6. The basic neural model for diagnosis and interpretation. The evidence cells
c_1, ..., c_K are codon cells with Brindley afferent synapses. The G-cell controls the codon
cell threshold: it uses negative feedback through its ascending dendrite to keep the
number of codon cells active roughly constant. Its descending dendrite samples the input
fibres directly, thus providing a fast pathway through which an initial estimate is made.
The other cells and synapses are as in figures 3 and 5 (2).
4.5. The full neural model for diagnosis and interpretation
The arguments of §§4.1 to 4.4 lead to the design of figure 6 for the basic diagnostic
model for a classificatory unit. The afferent synapses to the c_i-cells are excitatory,
and may have been achieved by some suitable codon formation process: model (2)
of figure 5 has been chosen for figure 6. The inhibitory cells G control the thresholds
of the c_i-cells, and their function is to keep the number of active c_i-cells roughly
constant. If they do this, the model automatically interprets input events which are
incomplete as well as those which are full-sized. The G-cells are analogous to the
Golgi cells of the cerebellum, and it is therefore natural to assume that, as in the case
of those cells, the G-cells can be driven both by the input fibres a_j and by the c_i-cell
axons. The final control should be exercised by the number of c_i-cell axons active,
but a direct input from the a_j axons would provide a fast route for dealing with a
sudden increase in the size of the input event.
The c_i axons and the output cell Ω have been dealt with at length in §4.1. The
cells S are the subtracting inhibitory cells, and the cells D provide the final division.
The cell Ω is shown with two types of evidence cell afferent: one, through the
c_i-cells to the apical dendrites, and one (whose origin is not shown) to a basal
dendrite.
In practice, the distribution of the a_j terminals, and the G, D and S-cell axons
and dendrites will all be related. The kind of factor which arises has already been
met in the cerebellar cortex for the Golgi and stellate cell axons and dendrites.
Roughly, the more regular and widespread the input fibre terminals, the smaller the
dendrites of the interneurons may be, and the further their axons may extend. Little
more of value can be added to this in general, except that the exact most economical
distributions for a particular case depend on many factors, and their calculation is
not an easy problem.
5. THE DISCOVERY AND REFINEMENT OF CLASSES
5.0. Introduction
There are three principal categories of problem associated with the discovery and
refinement of classificatory units. They are the selection of the information over
which a new unit is to be defined; the selection of a suitable location for its representation, together with the formation there of the appropriate evidence and output
cells (formation in the information sense, not their physical creation); and the later
refinement of the classificatory unit in the light of its performance.
The selection of information over which a new classificatory unit is to be defined
depends, according to the Fundamental Hypothesis, upon the discovery of a collection of frequent, similar subevents in the existing coding of the environment. The
difficulty of this task depends mainly on two factors: the a priori expectation that the
fibres eventually decided upon would be chosen; and the time for which records
have to be kept in order to pick out the subevents. The three basic techniques
available are simple storage in a temporary associative memory, which allows
collection of information over long periods; the associative access, which allows
recall from small subevents, and hence eventually the selection of the appropriate
fibres for a new unit; and the mountain climbing idea, which discovers the class
once the population of fibres has been roughly determined. Only the third technique
can be dealt with here.
The selection of a location for a new classificatory unit is simply a question of
choosing a place where the relevant fibres distribute with an adequate contact
probability. The formation of evidence cells there is a problem which has already
been discussed in §4: the formation of output cells is dealt with here.
Finally, the refinement problem arises because part of the hazard surrounding the
formation of a new classificatory unit is that it is known in advance neither why it is
going to be useful, nor of exactly what events it should be composed. When first
created, therefore, the new classificatory unit is a highly speculative object, whose
boundaries and properties have yet to be determined. The subsequent discovery of
the appropriate boundaries (if such exist) is the refinement of the classificatory unit.
5.1. Setting up the neural representation: sleep
5.1.0. Introduction
It is convenient to begin with the second problem, of selecting a location and
forming there a suitable neural structure. The reason is that the other two problems
are best dealt with in the context of explicit neural models, and these are not
complete enough until the apparatus necessary for the setting up problem has been
incorporated. For the purposes of this section, it will therefore be assumed that the
subevents which are to make up the new classificatory unit have been decided upon
in advance, and are held in a store. The problem then reduces to that of discovering
a suitable location, and creating there the appropriate evidence and output cells.
5.1.1. Selecting a location
The natural method of discovering a suitable location is to form a representation in
all those places which are suitable. For this, the whole cortex is, so to speak, placed
in a suitably receptive state, and in those regions where enough information is
received, a representation is automatically set up. Later refinement will select for
the most successful, and not all of the representations initially set up will survive.
This method has two important advantages: first, it removes the difficulties which
arise in computing where the appropriate fibres gather together with a large enough
contact probability. The discovery of these special locations is better left to the
method suggested, whereby it is a natural consequence of their existence.
Secondly, the method allows the multiple formation of representations, which
means that a single input can generate many different classes. There are often
excellent grounds for categorizing information, and dealing with each category
separately. For example, information about shape can profitably be classified
separately from information about colour, and this could be implicit in the way the
connexions are originally arranged. An area of cortex which received only information of a particular category would classify within that category. If many such
areas existed, one piece of information could simultaneously cause classes in several
categories to form. This is probably an important aspect of the solution to the
partition problem (§1.3.3), but one which relies on the rough genetic specification of
the categories.
5.1.2. Codon formation and sleep
The problems of what evidence functions to form, and how to form them, have
been discussed in §4. It may turn out never to be necessary to use codon formation,
since this technique is essential only where a standard codon transformation, with
unmodifiable excitatory synapses (Marr 1969), does not produce evidence of
sufficient quality. The finer the classifications required, however, the better the
quality of the evidence must be; and the more sophisticated they are, the less
certain it becomes that genetic information can provide pre-formed codons of the
right type: so if codon formation is used at all, it will be used more in higher than in
lower animals.
In §4.3.5, it was decided that the most likely technique for codon formation used
Brindley synapses which become modifiable only at those times when codon formation takes place. Arguments were set out there for the view that this assumption
does not have a complexity which is disproportionate to those concerning the other
operations which must take place at these times.
It was pointed out in §4.3.3 that when the afferent synapses to codon cells are
modifiable, only that information for which new evidence functions are required
should be allowed to reach these cells. In §4.3.5, it was shown that information
from which a new classificatory unit is to be formed will often come from a simple
associative store, not directly from the environment. In §5.1.1 it was argued that
the most natural way of selecting a location for a new classificatory unit was to
allow one to form wherever enough of the relevant fibres converge. This requires
that potential codon cells over the whole cerebral cortex should simultaneously
allow their afferent synapses to become modifiable. Hence, at such times, ordinary
sensory information must be rigorously excluded. The only time when this exclusion
condition is satisfied is during certain phases of sleep.
The tentative conclusion of the theory is therefore that some cerebral codon
cells have Brindley afferent modifiable synapses, which only become modifiable
during sleep. The firm conclusion of the theory is that if the locations for new
classificatory units are selected by the method of §5.1.1; if there exist plastic codon
cells in the cerebral cortex; and if they use Brindley afferent modifiable synapses;
then these synapses are modifiable only during the correct phases of sleep. A
consequence of this phenomenon for the learning characteristics of the animal as a
whole is set out in §7.6.
5.1.3. Output cell selection: generalities
No methods have so far been proposed for the selection of output cells for classificatory units. The question was raised in §4.1 of whether more than one physical
cell could profitably be used as the output for a single classificatory unit: it was
concluded impracticable unless such cells formed independent representations.
The problem of output cell selection is therefore that of finding a single, hitherto
unused cell whose dendrites are favourably placed to receive synapses from most of
the evidence cells created for the classificatory unit concerned. These codon cells
will be clustered round the projection region of the relevant fibres, so the selection
process has to work to choose a cell in the middle of that region. The methods
available for cell selection are essentially the same as those described in §4.3 for
codon formation (figure 5), but the arguments for and against each method are
different in the present context. The methods are discussed separately.
5.1.4. Output cell selection: particularities
The final state of the output cell afferent synapses has been defined by the preceding theory: they must have strength which varies with P(Ω|c_i), each c_i. There is
therefore not the distinction between different models for output cell selection that
there was between models (1) and (2) of figure 5 for codon formation. If some model
of this kind is used, the synapses must initially all have some standard excitatory
power, which gradually adjusts to become P(Ω|c_i). The exact details of the way this
happens will be the subject of §5.2, but the outline can be given here. First, the cell
will fire only when a significant number of afferent synapses are active: so it will
only be selected for a set of events most of which it can receive. If there exists a
single collection of common, overlapping subevents in its input, this collection will
tend to drive the cell most often, and those synapses not involved in this collection
will decay relative to those which are. Hence the cell will perform a kind of mountain
climbing of its own accord.
There are two possible arguments against this scheme: first, such a system can only
work successfully if there is just one significant mountain in the probability space over
the events it can receive. This makes it rather bad at selecting a particular mountain
from several, and responding only to events in that; so the cell will not be very adept
at forming a specialized classificatory unit unless it is fed data in a very careful
manner. Secondly, some disquiet naturally arises over the conditions required for
synaptic modification: that modification is sensitive to simultaneous pre- and post-synaptic activity. The Ω-cell dendrite will need to collect from a wide range of c_i-cell
axons, and will therefore be much larger than the c_i-cell dendrites. In such circumstances, it is far from clear that these conditions are realizable. The most reasonable
kinds of hypothesis for synaptic modification by a combination of activities in
pre- and post-synaptic cells concern activities in adjacent structures, not elements
up to 1 mm apart. There are therefore some grounds for being dissatisfied with
model (1) of figure 7, even supposing the mountain-climbing details turn out in a
favourable way.
The second model (figure 7 (2)) is based on some kind of climbing fibre analogue.
It is of course not a direct copy of the cerebellar situation, since there can exist no
cerebral analogue of the inferior olivary nucleus. It works thus: suppose there
exists a single collection of common, overlapping input events in the input space of
Ω, and let a_1 be one of the input fibres involved. Then most of the c_i used for such
events will occur frequently with a_1, since a_1 is itself frequently involved in such
events. Now suppose a_1, as well as reaching Ω through orthodox evidence cells, also
drives a climbing fibre to Ω: then this will cause the modification of most of the
c_i-cell synapses used in the collection of frequent events. The cell Ω will then be
found to have roughly the correct values of P(Ω|c_i) for most of the c_i, and the final
adjustments can be made by the same methods as were used in model (1).
FIGURE 7. Two models for output cell selection. (1) Uses Brindley synapses,
(2) uses Hebb synapses and a climbing fibre (CF).
In other words, the effect of tying modification conditions initially to a climbing
fibre driven by something known to be correlated with the events of a mountain is to
point the output cell Ω at that mountain. The use of a climbing fibre therefore, as
well as eliminating difficulties about the implementation of synaptic modification,
also removes the condition needed in model (1) that there should exist just one
mountain in the event space to which Ω is exposed. With the climbing fibre acting
as a pointer, there can be as many as you like: the only condition is that the more
there are, the more specific the pointer has to be.
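
A minimal sketch of the pointer idea, under simplifying assumptions: events are given as sets of active evidence-cell indices, the climbing fibre is taken to fire exactly when one chosen fibre (`pointer`) is active, and a synapse is strengthened only when its evidence cell and the climbing fibre are active together. The function name `climbing_fibre_strengths` and the toy event stream are hypothetical and serve only to illustrate how such a tally approaches the conditional probability of climbing fibre activity given each evidence cell.

```python
from collections import Counter
from fractions import Fraction

def climbing_fibre_strengths(stream, pointer):
    """Tally, for each evidence cell c, the fraction of its firings that
    coincide with climbing-fibre activity (an estimate of P(phi | c))."""
    seen = Counter()     # times each evidence cell was active
    paired = Counter()   # times it was active together with the climbing fibre
    for event in stream:
        cf_active = pointer in event   # assumption: phi fires iff `pointer` is active
        for c in event:
            seen[c] += 1
            if cf_active:
                paired[c] += 1
    return {c: Fraction(paired[c], seen[c]) for c in seen}

# Hypothetical stream: a cluster of overlapping events around fibres 1-4,
# plus two events from elsewhere in the event space.
stream = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}]
strengths = climbing_fibre_strengths(stream, pointer=1)
```

In this toy stream the synapses from cells that take part in the pointed-at cluster end up with non-zero strength, while those from cells that never fire with the pointer stay at zero.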
5.1.5. Driving the climbing fibre
The exact details of both these techniques will be analysed in §5.2, but before
leaving this section, it is worth discussing the kind of way in which the climbing
fibres may be driven. One possibility is the method already mentioned, where the
climbing fibre is driven by one of the input fibres of the event space of Ω. This
will do for many purposes, but it may not always provide a specific enough
pointer.
The alternative method is to drive the Ω-cell by a climbing fibre whose action is
more localized in the event space 𝔛 for Ω than the simple fibre a_1. In this scheme,
the climbing fibre is driven by a cell near the Ω-cell, and one which consequently
fires only when there is considerable evidence-cell activity near Ω. This cell then
acts as a more specific pointer than a simple fibre would, and is called an output
selector cell (see figure 8).
It is an elementary refinement of this idea to have more than one climbing fibre
attached to a given cell Ω, which then requires activity in several to be effective in
causing synaptic modification. The crucial thing about the climbing fibre input is
that it should provide a good enough rough guide to the events at which Ω should
look for Ω eventually to be able to discriminate a single mountain from the rest of its
event space. It is important also to note that this kind of system can be used directly
to discover new classificatory units. As long as no codon formation is required,
climbing fibres can cause the discovery of mountains (i.e. new classificatory units) directly on the incoming information. Provided that the connectivity is suitable
(i.e. that information gets brought together in roughly the correct way), new
classificatory units will form without the need for any intermediate storage.
FIGURE 8. The fundamental neural model, obtained by combining the models of figures 6 and
7 (2). Two climbing fibres are shown; one from an input fibre, and one from a nearby
output selector cell T.
5.2. The spatial recognizer effect
5.2.0. Introduction
The process central to the formation of new classificatory units is the discovery
that events often occur that are similar to a given event over a suitable collection of
fibres. This was split in §1.4 into the partition problem, which concerns the choice
of roughly the correct collection of fibres; and the problem of selecting the appropriate collection of events over those fibres. The second part of this problem has
been discussed in connexion with ideas about mountain climbing, and an informal
description of the solution has been given in §5.1. The essence of this solution is that
an output cell performs the mountain climbing process naturally, and if started by a
suitably driven climbing fibre in roughly the correct region of the event space to
which it is sensitive, it will ultimately respond to the events in the nearby mountain.
In this section, a closer examination of this process is made.
5.2.1. Notation: the standard (k, M)-plateau
The notation for this section will be slightly different from usual, since the output
cell Ω is sensitive to events E over 𝔛 only in terms of the evidence functions
c_i (1 ≤ i ≤ K). It is therefore convenient to construct the space 𝔜 of all events of size
k over the set {c_1, ..., c_K}. Each input event E over 𝔛 is translated into an event
Y = Y(E) over 𝔜, and for the sake of simplicity, it will be assumed that each input
event E causes exactly k of the c_i (1 ≤ i ≤ K) to take the value 1. As far as Ω is
concerned, the events with which it has to deal occupy a code of size k over
{c_1, ..., c_K}.
The c_i are imagined to be active in translating input events for many output cells
other than Ω, and this allows the further simplifying assumption that all the c_i are 1
about equally often: that is P(c_i) = P(c_j), all 1 ≤ i, j ≤ K. Only those events which
occupy k fibres concern Ω, and the relative frequencies of these are described by the
probability distribution λ* (say) over 𝔜. λ* is the probability distribution the
environment induces over 𝔜, and is derived from the input distribution λ over 𝔛.
Both λ and λ* have mountainous structure, but if 𝔜 is obtained from 𝔛 by a codon
transformation, the mountains in 𝔜 are more separated than their parents in 𝔛.
The term 'mountain' has hitherto had no precise definition. It is not known
exactly what kinds of distribution are to be expected, so some kind of general
function has to be set up out of which all sensible mountains may be built. This is
what motivates the following
Definition. Let μ be the probability distribution over 𝔜 defined thus: let M ≤ K, and
\[ \mu(Y) = \binom{M}{k}^{-1} \ \text{if } Y(c_i) = 0 \text{ for all } i > M, \qquad \mu(Y) = 0 \ \text{otherwise.} \]
Then μ is a standard (k, M)-plateau over c_1, ..., c_M.
That is, μ ascribes a constant value to the probability of every event which
gives c_i = 0 for all fibres outside some chosen collection {c_1, ..., c_M}. The collection
{c_1, ..., c_M} is called the support of the plateau, and is written S(μ). A simple mountain μ* is one that can be built up out of plateaux μ_i with nested supports: i.e.
\[ \mu^* = \sum_{i=1}^{p} w_i \mu_i, \quad \text{where} \quad \sum_{i=1}^{p} w_i = 1 \quad \text{and} \quad S(\mu_1) \supseteq S(\mu_2) \supseteq \dots \supseteq S(\mu_p). \]
In the absence of any better guesses about what kind of distributions should be
studied, this section will deal with simple mountains. The fact that they can so
simply be constructed from standard plateaux means that it is in fact enough to
study the properties of standard plateaux. Further, we shall consider plateaux
over the event space generated by the codon functions for a given classificatory
unit, rather than plateaux over the event space generated by the input fibres.
This is because the crucial operations occur at the output cell, which receives only
evidence fibres.
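
The two definitions can be made concrete with a short sketch. It assumes events are represented as frozensets of fibre indices (fibres numbered 1 to K) and uses exact fractions; the helper names `standard_plateau` and `simple_mountain`, and the particular supports and weights, are illustrative choices rather than anything fixed by the theory.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def standard_plateau(k, support, K):
    """Uniform distribution over all size-k events lying inside `support`."""
    support = sorted(support)
    assert k <= len(support) <= K
    weight = Fraction(1, comb(len(support), k))
    return {frozenset(ev): weight for ev in combinations(support, k)}

def simple_mountain(weights, plateaux):
    """Weighted sum of plateaux with nested supports; weights must sum to 1."""
    assert sum(weights) == 1
    supports = [set().union(*p) for p in plateaux]
    assert all(a >= b for a, b in zip(supports, supports[1:])), "supports must be nested"
    mountain = {}
    for w, p in zip(weights, plateaux):
        for ev, pr in p.items():
            mountain[ev] = mountain.get(ev, 0) + w * pr
    return mountain

K, k = 8, 2
mu1 = standard_plateau(k, range(1, 6), K)   # support {c_1, ..., c_5}
mu2 = standard_plateau(k, range(1, 4), K)   # support {c_1, c_2, c_3}, nested inside
peak = simple_mountain([Fraction(1, 2), Fraction(1, 2)], [mu1, mu2])
```

Events inside the inner support receive probability from both plateaux, so the distribution rises towards the centre of the nested supports, which is the sense in which the sum forms a mountain.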
5.2.2. Climbing fibres and modification conditions
Without loss of generality, it may be assumed that the output cell Ω receives
only one climbing fibre, which will be represented by the function φ(t) of time.
φ cannot in general be regarded as a function from 𝔜 to {0, 1} since φ may take the
value 1 at a time when there is no event in 𝔜. Some kind of relation between φ
and the events of 𝔜 has to be assumed; it is that the conditional probability P(φ|c_i)
is well-defined and independent of time.
The climbing fibre input to Ω is closely related to the conditions for synaptic
modification at Ω, but there are two possible views about the exact nature of this
relation. One is that the climbing fibre is all-important in determining the strength
of the synapse from c_i to Ω, and on this view, the strength varies with P(φ|c_i). The
cell Ω really diagnoses φ if this is so, but it will be shown in §5.2.3 that if the structure of λ* over 𝔜 is appropriate, this will be adequate.
The other possible view is that φ acts as a pointer for Ω. On this model, the effect
of φ is to set the values of the synaptic strengths at P(φ|c_i) initially. The true
conditions for synaptic modification are simultaneous pre- and post-synaptic activity. It
is a little difficult to see how the climbing fibre should be dealt with after it has set
up the initial synaptic strengths, so in the theory of §5.2.4, it is regarded simply as
doing this, and is then ignored. This is an approximation, but seems the best one
available. The true situation probably lies somewhere between those described
in §§5.2.3 and 5.2.4.
5.2.3. Mountain selection with P(Ω|c_i) = P(φ|c_i)
Let [p, q] denote the plausibility range of Ω. The state of Ω's afferent synapses can
be represented by the vector ω = (ω^1, ..., ω^K) where ω^i = P(Ω|c_i), and it is assumed
for this model that ω is fixed: that the climbing fibre is the supreme determinant of
the synaptic strengths. Let X ∈ 𝔛. Then X has a representation as a vector
Y = (Y^1, ..., Y^K) ∈ 𝔜 with exactly k of the Y^i = 1, and all the rest zero. Let · denote
the scalar product of vectors in the usual way: that is ω·Y = Σ_i ω^i Y^i. Then the
cell Ω responds to X iff Σ_i c_i(X) P(φ|c_i) ≥ kp, i.e. iff ω·Y ≥ kp. Hence N, the set of
events to which Ω responds, is given by
\[ N = \{ X \mid \omega \cdot Y \ge kp \}. \tag{1} \]
The following example shows how this may work adequately in practice. Let μ
denote the standard (k, M)-plateau on {c_1, ..., c_M}, M < K, and let ν denote the
standard (k, M)-plateau on {c_{s+1}, ..., c_{s+M}} where 1 < s < M < s + M < K. Suppose
φ = c_1. If the input distribution λ* = μ we have
\[ P(\varphi|c_i) = 1 \ (i = 1), \qquad = \alpha \ (1 < i \le M), \qquad = 0 \ (M < i \le K). \]
If λ* = ν we have P(φ|c_i) = 0, all i > s. For the mixture λ* = ½(μ + ν), P(φ|c_i) keeps
the value α on the fibres 1 < i ≤ s, which lie in the support of μ alone, and takes a
smaller value β on the fibres s < i ≤ M, which lie in both supports.
Hence if the lower limit of the plausibility range [p, q] of Ω is p = k^{-1}(sα + (k − s)β),
the cell Ω will respond to E if and only if μ(E) ≠ 0. Thus the output cell Ω has selected
the mountain μ from the distribution λ* = ½(μ + ν) even though the climbing fibre
φ did not. This is the crucial property which the system possesses.
In general, if φ = c_1, Ω will select the events of any plateau containing c_1 in its
support, and can therefore be made (by suitable choice of p) to reject all events of
other plateaux which do not fall into such a plateau.
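
The selection property can be checked numerically on a small instance. The sketch below assumes K = 8, k = 3, M = 4, s = 2, an equal mixture of the two plateaux, and a climbing fibre identical to c_1; the threshold `kp` is picked by inspection for these toy numbers and is not derived from the formula for p in the text. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

K, k, M, s = 8, 3, 4, 2   # toy sizes satisfying 1 < s < M < s + M < K

def plateau(support):
    """Standard plateau: uniform over the size-k events inside `support`."""
    events = [frozenset(e) for e in combinations(support, k)]
    return {e: Fraction(1, len(events)) for e in events}

mu = plateau(range(1, M + 1))            # support c_1 .. c_M
nu = plateau(range(s + 1, s + M + 1))    # support c_{s+1} .. c_{s+M}
lam = {e: Fraction(1, 2) * (mu.get(e, 0) + nu.get(e, 0)) for e in set(mu) | set(nu)}

def prob(pred):
    return sum(p for e, p in lam.items() if pred(e))

def p_phi_given(i, phi=1):
    """P(phi | c_i) with phi = c_1; taken as 0 when c_i never fires."""
    p_ci = prob(lambda e: i in e)
    return prob(lambda e: i in e and phi in e) / p_ci if p_ci else Fraction(0)

w = {i: p_phi_given(i) for i in range(1, K + 1)}   # fixed synaptic strengths

kp = Fraction(4, 3)   # assumed threshold k*p, chosen by inspection
response = {e for e in lam if sum(w[i] for i in e) >= kp}
```

With these numbers the response set is exactly the set of events of μ, including the event {c_2, c_3, c_4} during which the climbing fibre itself is silent, while every event of ν falls below the threshold.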
The relation (1) can be used to construct the explicit condition that a climbing fibre
can induce Ω to respond to a particular set of events. If ω is the climbing fibre vector
ω = (P(φ|c_1), ..., P(φ|c_K)), and N_Ω = {X | ω·Y ≥ kp},
then φ can select the events N of (𝔛, λ) iff λ(N_Ω Δ N) = 0; i.e. the probability under
the input distribution λ that an event occurs which is in exactly one of N_Ω, N is zero.
5.2.4. The spatial recognizer effect
In the more general case, φ acts as a starting condition rather than permanently
defining the strength of the synapse from c_i to Ω (1 ≤ i ≤ K). The subsequent
strengths of these synapses depend on and only on P(Ω|c_i).
Write P(φ & c_i) = ω_0^i, 1 ≤ i ≤ K, and let θ = kpP(c_i). Since P(c_i) = P(c_j), all
1 ≤ i, j ≤ K (§5.2.1), the initial firing condition for Ω is simply Σ_i ω_0^i c_i(E) ≥ θ.
As before write ω_0 = (ω_0^1, ..., ω_0^K) as a vector: ω_0 defines the state of the afferent
synapses to Ω. If Y is the usual vector (consisting of 0's and 1's) which represents the
event X over {c_1, ..., c_K}, the firing condition for Ω is
\[ \omega_0 \cdot Y \ge \theta. \tag{2} \]
The difference here is that ω is now a variable. The point is that the vector ω
depends on the input distribution λ, and on those events to which (by (2)) Ω responds.
Define N_θ(ω_0) ⊆ 𝔛 by N_θ(ω_0) = {X | ω_0·Y ≥ θ}. Define the new vector
\[ \omega_1 = (\omega_1^1, \dots, \omega_1^K), \qquad \omega_1^i = \lambda\bigl(\{ X \in N_\theta(\omega_0) : c_i(X) = 1 \}\bigr). \tag{3} \]
That is, the co-ordinates ω_1^i of ω_1 are simply the projections onto the c_i of the
restriction λ|N_θ(ω_0) of λ to N_θ(ω_0). Then ω_1 represents the state of the synapses
from the c_i to Ω if Ω responds only to the events in N_θ(ω_0).
FIGURE 9. The state vector ω_j, which describes the strengths of the afferent synapses to the
output cell Ω, determines the set N_θ(ω_j) of events to which Ω will respond. This in turn
determines a new state vector ω_{j+1}. Equilibrium occurs when ω_j = ω_{j+1}.
The situation is thus that in the state ω_0, the cell Ω responds only to events in N_θ(ω_0).
Exposure to such events may be expected to change the state vector ω_0 into ω_1, from
where the process is repeated. This generates a series of successive transformations of
the state vector ω for Ω, and this is called the spatial recognizer effect (see figure 9).
Theorem. The state vector achieves equilibrium iff there exists a j such that ω_j = ω_{j+1}.
Proof. In equilibrium, the set of events N_θ(ω_j) to which the cell Ω responds specifies a
state vector ω_{j+1} such that λ(N_θ(ω_j) Δ N_θ(ω_{j+1})) = 0: each co-ordinate of (ω_j − ω_{j+1})
is the projection onto a c_i of λ|N_θ(ω_j) Δ N_θ(ω_{j+1}), and so is zero. Thus ω_j − ω_{j+1} = 0, and
ω_j = ω_{j+1}.
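
The iteration can be sketched in a few lines. The example assumes a fixed threshold θ, an equal mixture of two plateaux with disjoint supports (which keeps the arithmetic easy to follow), and a starting state set by a climbing fibre identical to c_1; the function names and the numbers are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def plateau(support, k):
    events = [frozenset(e) for e in combinations(support, k)]
    return {e: Fraction(1, len(events)) for e in events}

def project(lam, events, K):
    """omega^i = probability, under lam, of an event in `events` with c_i = 1."""
    return tuple(sum(lam[e] for e in events if i in e) for i in range(1, K + 1))

def response_set(lam, omega, theta):
    """N_theta(omega): events whose scalar product with omega reaches theta."""
    return frozenset(e for e in lam if sum(omega[i - 1] for i in e) >= theta)

def recognizer(lam, omega0, theta, K, max_steps=50):
    """Iterate omega_j -> omega_{j+1} until two successive states agree."""
    states = [omega0]
    for _ in range(max_steps):
        states.append(project(lam, response_set(lam, states[-1], theta), K))
        if states[-1] == states[-2]:
            break
    return states

K, k = 8, 2
mu, nu = plateau(range(1, 5), k), plateau(range(5, 9), k)
lam = {e: p / 2 for e, p in list(mu.items()) + list(nu.items())}
# Starting state set by the climbing fibre phi = c_1: omega_0^i = P(phi & c_i).
omega0 = project(lam, [e for e in lam if 1 in e], K)
states = recognizer(lam, omega0, theta=Fraction(1, 6), K=K)
```

Here the starting state favours c_1, the first step spreads the strengths evenly over the support of μ, and the next step reproduces the same state, so equilibrium is reached after one change of state with the response set equal to the events of μ.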
In the simple example λ* = ½(μ + ν) of §5.2.3, equilibrium is achieved in exactly
one step. As already observed, ω_0 is defined by
\[ n\omega_0^i = 1 \ (i = 1), \qquad = \alpha \ (1 < i \le s), \qquad = \beta \ (s < i \le M), \qquad = 0 \ (M < i \le K), \]
where n^{-1} = P(c_i), and is constant. For p = k^{-1}(sα + (k − s)β), the cell Ω responds
only to those events X with μ(X) ≠ 0 which also have ν(X) = 0, so that ω_1 has the
following specification:
\[ n\omega_1^i = 1 \ (1 \le i \le s), \ \dots \]
and ω_2 = ω_1. This result extends to any simple mountain μ*,
\[ \mu^* = w_1\mu_1 + \dots + w_p\mu_p, \]
where φ = c_i ∈ S(μ*) = S(μ_1) is an element in its support.
5.2.5. A general characterization of the recognizer effect
It is natural to seek some elegant way of describing the spatial recognizer effect.
In the following informal argument, a characterization is given in terms of a search
for steepest ascents in 𝔜 under λ*. This effectively puts a stop to any attempt to
produce a necessary and sufficient condition that the starting state ω_0 should lead
to a particular final state ω*, since the general question depends upon the detailed
structure of λ. The answer that it does if and only if a line of steepest ascent leads
there is probably its own neatest characterization. It is convenient to make the
restriction that p, the lower limit to the plausibility range for Ω, is variable, and
varies to keep the average amount of activity of Ω constant; i.e. p is such that
∫ dλ over N_θ(ω_i) is constant, for all response neighbourhoods N_θ(ω_i) (defined by equation (2)
of §5.2.4). Write N_θ(ω) = {X | Y·ω ≥ θ} (see figure 10). ω moves to ω_1 given by the
projections onto the c_i of the restriction λ|N_θ(ω) of λ to N_θ(ω). (Compare (3) of
§5.2.4.) Now ω_1 effectively measures the centre of gravity so to speak of the events
in N_θ(ω) since if ω_1 = (ω_1^1, ..., ω_1^K), ω_1^i varies with the expected probability that c_i = 1
in N_θ(ω) under λ. Since, in each event X of N_θ(ω) exactly k of the c_i have the value 1,
this means that the response area of Ω moves towards that region of N_θ(ω) which
contains the closest, most common events. Ω is attracted by both commonness, and
by having many events close to one another all having non-zero probability. The
way these two kinds of merit compete is approximately that the movement which
maximizes the expectation of ω·Y over N_θ(ω) is the one which is actually made: but
the full result along these lines is complicated. In fact, the move is the one which
has the best chance of maximizing this expectation.
FIGURE 10. The state vector ω determines the set N_θ(ω) of events to which Ω responds. The
environmental probability distribution over N_θ(ω) is stippled where it has non-zero
values. ω changes so as to tend to make the centre of gravity of this distribution coincide
with the centre of N_θ(ω). This is the principle behind Ω's ability to perform a mountain-climbing operation.
Thus ω moves to climb gradients in the scalar function E(ω·Y) taken over the
response area defined by ω. A proof of this result will appear elsewhere.
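
The constant-activity restriction can be illustrated with a sketch in which the threshold is re-chosen at every step so that the response area always carries a fixed share of the probability. The helper names, the target activity of ½, and the overlapping toy plateaux are assumptions made for illustration; the run shows only what happens in this one instance and is not a proof of the general result.

```python
from fractions import Fraction
from itertools import combinations

def plateau(support, k):
    events = [frozenset(e) for e in combinations(support, k)]
    return {e: Fraction(1, len(events)) for e in events}

def score(omega, event):
    """Scalar product omega . Y for the 0/1 vector Y of `event`."""
    return sum(omega[i - 1] for i in event)

def project(lam, events, K):
    return tuple(sum(lam[e] for e in events if i in e) for i in range(1, K + 1))

def threshold_for_activity(lam, omega, activity):
    """Largest theta whose response area carries at least `activity` probability."""
    for theta in sorted({score(omega, e) for e in lam}, reverse=True):
        if sum(p for e, p in lam.items() if score(omega, e) >= theta) >= activity:
            return theta
    raise ValueError("activity exceeds the total probability")

def step(lam, omega, activity, K):
    theta = threshold_for_activity(lam, omega, activity)
    area = frozenset(e for e in lam if score(omega, e) >= theta)
    return project(lam, area, K), area

def expected_score(lam, omega, area):
    return sum(lam[e] * score(omega, e) for e in area)

K, k = 8, 3
mu, nu = plateau(range(1, 5), k), plateau(range(3, 7), k)   # overlapping supports
lam = {e: p / 2 for e, p in list(mu.items()) + list(nu.items())}
omega0 = project(lam, [e for e in lam if 1 in e], K)   # pointer: fibre c_1
omega1, area0 = step(lam, omega0, Fraction(1, 2), K)
omega2, area1 = step(lam, omega1, Fraction(1, 2), K)
```

In this run the response area settles on the events of μ, the state stops changing after one step, and the expectation of ω·Y over the response area is larger for the new state than for the starting one.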
5.3. The refinement of a classificatory unit
The refinement of a classificatory unit is the discovery of such appropriate boundaries as it might have. There are two kinds of information on which this process
can be based: they are the frequencies of the subevents on which the unit is defined,
and the correlation of instances of the unit with properties not included in its
definition (i.e. support). The modification of a classificatory unit on the basis of its
subevent distribution is called its intrinsic refinement, and has essentially been dealt
with in §5.2: alteration made as a result of comparison with external properties is
called extrinsic refinement, and will be discussed briefly here.
General extrinsic refinement requires a simple memory; but it basically consists
of the same kind of mountain climbing techniques as intrinsic refinement. The only
piece of the problem that can be discussed at the moment is the hardware needed for
it. It is appropriate to deal with this now, since the necessary machinery must
appear in the fundamental neural model.
There exist three main strategies for the extrinsic refinement problem: they are
characterized by the change during refinement of the number of subevents to
which the output cell Ω will give a positive diagnosis. This number can increase,
decrease, or remain about the same. The basic point is that the strategy which
requires the number to decrease is the one which is easiest to implement, since it is
easier to remove events from the response area of Ω than to add them. This is because
the only way of adding an event to Ω's response area is by stimulating the climbing
fibre. This needs some way of gaining access to the correct climbing fibre cell. The
models of §5.1 for output cell selection make this difficult, since one of the key
points in their design was the absence of a special climbing fibre for each output cell,
and alternative schemes are unacceptably complicated.
The other possibility for adding events to Ω is to use an associational path to Ω
itself (for example, the basilar dendrite afferents of figure 5): but it was thought
(§4.1.8) that the associational activity of the Ω-cell should not have this kind of
ability to influence the strengths of the synapses arising from more direct inputs.
Finally, there can be no guarantee that the existing evidence functions for Ω can cope
with a new event.
Given these difficulties, it is natural to examine the possibility of refining a
classificatory unit by eliminating inappropriate events from its response field. The
main advantage such a method produces is that a general inhibitory influence acting
over all output cells (in a particular region) can be used to alter values of P(Ω|c_i) for
one particular Ω in a way in which a general excitatory influence cannot. For suppose
the event E is to be cut out: this must be achieved by allowing E to enter the c_i-cells
for Ω while preventing the formation of modification conditions at Ω itself. If the
chance that E should be interpreted in a cell near Ω is small, this effect can be
achieved by applying a general inhibitory signal to all the output cells in the region
containing Ω. Hence the only additional hardware this method requires is a fairly
non-specific inhibitory input to all output cells. This does not appear in figure 8, since
its derivation from the theory is less firm than that of the other elements which
appear there.
6. NOTES ON THE CEREBRAL NEOCORTEX
6.0. Introduction
The present theory receives its most concrete form in the neural model of figure 8.
In this section, the fine structure of the cerebral cortex is reviewed in the light of
that model. Anyone familiar with the present state of knowledge of the cerebral
cortex will anticipate the sketchy nature of the discussion, but enough is probably
known to enable one to grasp some at least of the basic patterns in the cortical
design.
It need scarcely be said that cerebral cortex is much more complicated than that
found in the cerebellum. Nothing of note has been added to the researches of Cajal
(1911) until comparatively recently (Sholl 1956; Szentágothai 1962, 1965, 1967;
Colonnier 1968; Valverde 1968), because Cajal's work was probably a contribution
to knowledge to which significant additions could be made only by using new techniques. Degeneration methods have since been developed, and the electron microscope has been invented; so there is now no reason in principle why our knowledge
of the cerebral cortex should not grow to be as detailed as that we now possess of
the cerebellar cortex. It is, as Szentágothai (1967) has remarked, a Herculean undertaking; but it is within the range of existing techniques.
6.1. Codon cells in the cerebral cortex
6.1.1. The ascending-axon cells of Martinotti
The main source of information for this section is the description by Cajal
(1911) of the general structure of the human cerebral cortex. The codon cells of the
cerebellum are, according to Marr (1969), the granule cells, whose axons form the
parallel fibres. The basic neural unit of figure 8 has analogies with the basic cerebellar
unit (one Purkinje cell, 200000 granule cells, and the relevant stellate and Golgi
cells, in the cat), so it is natural to look for a similar kind of arrangement in the
cerebral cortex.
The first point to note is that cerebral cortex, like cerebellar cortex, has a molecular layer. According to Cajal (p. 521) this has few cells, and consists mostly of
fibres. The dendrites there are the terminal bouquets of the apical dendrites
originating from pyramidal cells at various depths. Most pyramidal cells, and some
other kinds, send dendrites to layer I, so there is a clear hint in this combination that
some such cells may act as output cells. The great need is for the axons of the molecular layer to arise mainly from cells which may be interpreted as codon cells.
Cajal himself was unable to discover the origins of the axons of the molecular layer,
and probably believed they came mainly from the stellate cells there. The problem
was unresolved until Szentágothai (1962) invented a technique for making small
local cortical ablations without damaging the blood supply, and was at last able to
determine the true origin of these mysterious fibres. It is the ascending-axon cells of
Martinotti, which are situated mainly in layer VI in man.
This fundamental discovery showed that the analogy with cerebellar cortex is not
empty, for the similarity of the ascending-axon cells of Martinotti to cerebellar
granule cells is an obvious one. There are, however, notable differences; for example,
the Martinotti cells are much larger than the cerebellar granules; and in sensory
cortex, primary afferents do not terminate in layer VI.
The interpretation of Martinotti cells as cerebral codon cells raises five principal
points, which will be taken separately. The first is the cells of origin of their excitatory
afferent synapses. There is unfortunately rather little information available about
this, but it appears from Cajal's description that the following sources could contribute fibres :
(i) The collaterals of the pyramidal cells of layers V, VI and VII.
(ii) Descending axons from the pyramids of IV.
(iii) Collaterals of fibres entering from the white matter.
(iv) Local stellate cells.
It would best fit the present theory if intercortical association fibres formed their
main terminal synapses with these cells, and the collaterals of the pyramidal cells in
layers V to VII were relatively unimportant. There is some evidence that association fibres tend to form a dense plexus in the lower layers of the cortex (Nauta 1954;
and Cajal 1911, pp. 584-5).
The second point is that the Martinotti cells would have to have inhibitory
afferent synapses driven by the equivalent of the G-cells which appear in figure 8.
The effect of these synapses should be subtractive rather than divisive, so that to be
consistent with the ideas about inhibition expressed in §4.1 on output cell theory,
the synapses from the G-cells should be distributed more or less all over the dendrites of the Martinotti cells. (There is some evidence that this is so for certain cells
of layer IV in the visual cortex of cat (Colonnier 1968), but it rests upon an as yet
unproved morphological diagnosis of excitatory and inhibitory synapses.) This is in
direct contrast to what the theory predicts for output cells, a distinct fraction of
whose inhibitory synapses should be concentrated at the soma.
The third point concerns the possible independence of the dendrites of Martinotti
cells. These cells commonly have a quite large dendritic expansion, and it may be
unreasonable to expect much interaction between synapses on widely separated
branches. The effect, if their afferent synapses are unmodifiable, is to enable the cell
as a whole to operate as the logical union of m (R, θ)-codons (where m is the number
of independent dendrites) instead of as a single (mR, θ)-codon: the advantage of this
is a better quality of evidence function.
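
The difference between the two modes of operation can be seen by brute-force enumeration. The sketch assumes m dendrites with R binary synapses each and a common threshold θ; the sizes, the function names, and the example pattern are hypothetical. A cell with independent dendrites fires only if some single dendrite reaches threshold, whereas a cell that pools all its synapses fires whenever the total does.

```python
from itertools import product

def fires_pooled(pattern, theta):
    """One compartment: all synapses are summed before thresholding."""
    return sum(sum(dendrite) for dendrite in pattern) >= theta

def fires_independent(pattern, theta):
    """Independent dendrites: the cell fires if any one dendrite reaches theta."""
    return any(sum(dendrite) >= theta for dendrite in pattern)

m, R, theta = 3, 4, 3   # hypothetical sizes
dendrite_states = list(product((0, 1), repeat=R))
patterns = list(product(dendrite_states, repeat=m))   # all 2**(m*R) input patterns

pooled = sum(fires_pooled(p, theta) for p in patterns)
independent = sum(fires_independent(p, theta) for p in patterns)

# Three active synapses, one on each dendrite.
spread = ((1, 0, 0, 0), (1, 0, 0, 0), (1, 0, 0, 0))
```

Every pattern accepted by the independent-dendrite cell is also accepted by the pooled cell, but not conversely: activity scattered thinly across the dendrites, as in `spread`, drives only the pooled cell, so the union of small codons is the more selective of the two.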
The fourth point concerns the possibility that the excitatory afferent synapses to
Martinotti cells may be modifiable: this has been discussed in §5.1.2. If these
synapses are Brindley synapses, then the dendrites may be independent from the
point of view of synaptic modification, as well as in the way described in point
three. If there is some kind of climbing fibre arrangement, the fibres must be driven
from some external source, and must be allowed to operate only when codon formation is required. The second possibility could allow the modification condition to
operate simultaneously over the whole cell. It has been seen, however, that climbing
fibres are unlikely to be used. If location selection proceeds as described in §5.1.1,
the Martinotti afferent synapses are modifiable only during the correct phase of
sleep.
Fifth and last, it is a simple consequence of the present theory that Martinotti cells
should be excitatory, and should send axons to synapse with five types of cell: the
output cells, whose ordinary excitatory afferent synapses are modifiable; the two
types (S and D) of inhibitory cell; the Martinotti threshold controlling cells, the
G-cells; and perhaps output cell selector cells, whose axons terminate as climbing
fibres on output cells. A Martinotti axon may itself under certain circumstances
terminate as a climbing fibre as well as making crossing-over synapses with output
cells; but this possibility may be excluded for developmental reasons.
6.1.2. The cerebral granule cells
In layer IV of granular cerebral cortex, there are found a large number of small
stellate cells, 9 to 13 μm in diameter, whose fine axons end locally. This layer is
especially well developed in primary sensory cortex, where it sees the termination
of the majority of the afferent sensory fibres. It has long been believed that such
fibres synapsed mainly with the granule cells (Cajal 1911). Szentágothai (1967) has,
however, pointed out that many sensory afferents in fact terminate as climbing
fibres on the dendritic shafts passing through IV, and believes this may be an
important method of termination.
Valverde (1968) has made a quantitative study of the amount of terminal degeneration in the different cortical layers of area 17 of mouse after enucleation of the
contralateral eye, and has demonstrated that about 64 % occurs in layer IV, the
other principal contributions being from the adjacent layers III and V. In view of
the abundance of granule cells in layer IV, it is difficult to imagine that the afferent
fibres never synapse with them, and so likely that the traditional view is correct.
There can be no doubt that afferents also terminate as climbing fibres, and the
possibility that both these things happen fits very neatly with the predictions of the
present theory.
These views support the interpretation of the granule cells as codon cells, in which
case the remarks of §6.1.1 about Martinotti cells may be applied to them. An
interesting characteristic of granule cells is that they are often very close to raw
sensory information, in a way in which the Martinotti cells are not. They will
therefore not support classificatory units which rest on much preceding cortical
analysis, that is, classificatory units for which, if it occurs at all, codon formation is
most likely to be used. The theory therefore contains the slight hint that the
Martinotti cells may be the plastic codon cells, and the granule cells the pre-formed
codon cells. The consequence of this would be that the Martinotti cells have modifiable afferent excitatory synapses, while the granule cells have unmodifiable
afferent synapses.
6.2. The cerebral output cells
The present theory requires that candidates for output cells should possess the
following properties :
(i) A dendritic tree extending to layer I and arborizing there to receive synapses
from Martinotti cells.
(ii) An axon to the white matter, perhaps giving off collaterals.
(iii) Inhibitory afferent synapses of two general kinds: one, fairly scattered over
the main dendrites, and performing the subtractive function: the other clustered
over the soma, performing the division.
(iv) Climbing fibres over their main dendritic trees.
(v) A mixture of modifiable and unmodifiable afferent synapses. Those synapses
from codon cells (Martinotti and granule cells) should initially be ineffective (or
have some fixed constant strength), but should be facilitated by the conjunction of
pre-synaptic and post-synaptic (or possibly just climbing fibre) activity, so that the
final strength of the synapse from c to Ω varies with P(Ω|c). These synapses should
certainly be modifiable during the course of ordinary waking life, and should probably
be permanently modifiable.
The cortical pyramidal cells of layers III and V are the most obvious candidates
for this rôle. According to Cajal (1911), they satisfy (i), and (iv), and (ii) (Szentágothai
1962). The evidence for (iii) is indirect, but these cells receive axosomatic synapses of
the basket type, and these have been shown to be inhibitory wherever their action
has been discovered (in the hippocampus (Andersen, Eccles & Løyning 1963), and
the cerebellum (Andersen, Eccles & Voorhoeve 1963)). Various kinds of short axon
cell exist in the cortex; there are probably enough to perform the subtraction
function (§6.4).
The axon collaterals of these pyramidal cells could perform two functions. Either
they can themselves act as input fibres to nearby Martinotti cells; this would enable
two successive classifications to be performed in the same region of cortex. Or they
could act as association fibres, synapsing with the basilar dendrites of neighbouring
output cells. This would be useful if nearby cells dealt with similar information, but
not necessarily useful otherwise (Marr 1971).
6.3. Cerebral climbing fibres
One of the crucial points about the output cells is that they should possess
climbing fibres. The various possible sources of these were discussed in §5, where it
was stated that there might be two origins: afferent fibres themselves, and cells
with a local dendritic field.
The first observation of cerebral climbing fibre cells was made by Cajal (1911),
who describes certain cells with double protoplasmic bouquet, as follows. "The axon
filaments [of these cells] are so long that they can extend over the whole thickness of
the cortex, including the molecular layer.... If one examines closely one of the small,
parallel bundles produced by the axons of these cells, one notices between its
tendrils an empty, vertical space which seems to correspond in extent to the dendritic stem of a large or medium pyramidal cell. Since the axon of one of these double-dendrite neurons can supply several of these small bundles, it follows that it can
come into contact with several pyramidal cells" (pp. 540-541).
Cajal saw these fibres only in man, but Valverde (1968) has beautiful photomicrographs of some coursing up the apical dendrite of a cortical pyramidal cell of the
mouse, so they clearly exist in other animals. Szentágothai (1967) has found that
various types of cell can give rise to such fibres, and remarks that specific sensory
afferents often terminate in this way.
The cortical cells which give rise to climbing fibres have been called output cell
selectors. The theory requires that they possess a rather nonspecific set of afferents,
so that those cells in the centre of an active region of the cortex receive most stimulation. Such cells may also possess afferent inhibitory synapses to prevent their
responding to small amounts of activity.
The present theory does not favour the view that cells other than output cells
should possess climbing fibres, but it does not absolutely prohibit it.
6.4. Inhibitory cells
The basic theoretical requirements for inhibition in the cerebral cortex would be
satisfied by having three types of inhibitory cell. Two should act upon the output
cells, one synapsing on the dendrites, and one on the soma; and one, the analogue of
the cerebellar Golgi cells, on the codon cells.
6.4.1. The subtractor cells
The first place in which to look for inhibitory cells for the subtraction function is
the molecular layer I, where the Martinotti axons meet the pyramidal cell dendrites.
This layer does contain some cells: it is wrong to believe that it consists of nothing
but axons and dendrites. Cajal remarks upon the abundance of short axon cells
there, stating that in number and diversity they achieve their maximum in man. He
distinguishes (pp. 524-525) four main types: ordinary, voluminous, reduced, and
neurogliaform. The last are like the dwarf stellate cells which appear frequently in
other cortical layers.
The short axon cells can be interpreted as performing the role of subtraction on
the output cell dendrite. They and their homologues are common throughout the
cortex. The small size and great complexity of many of their axons and dendrites
enable them to assess accurately the amount of fibre activity in their neighbourhood,
so it does not require undue optimism to imagine that they can provide about the
correct amount of inhibition. For this purpose, the more there are of such cells, the
smaller and more complex their axonal and dendritic arborization, the more
accurate will be their estimates of the amount of inhibition required. The neurogliaform cells therefore seem most suited to this task.
6.4.2. The division cells
The requirements of cells providing inhibition at a pyramidal cell soma for the
function of division are different. Their action is concentrated in one place, and does
not need to be accurately balanced over the dendritic field in the way that the
subtraction inhibition must. The division inhibition can therefore be provided by a
sampling process with convergence at the soma. The details of this sampling must
depend on the distribution to the Martinotti and granule cells of the afferent
fibres, and are based on the same principles as govern the distribution of the cerebellar basket cell axons.
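The two inhibitory functions can be sketched together. The sketch assumes, as an illustration only, that subtraction removes an amount proportional to the number of active codon cell afferents, and that division then normalizes the remainder by that same number. The function name `output_cell_response`, the parameter `p` (a hypothetical threshold in the range [0, 1]) and the exact arithmetic are assumptions, not statements of this section.

```python
def output_cell_response(weights, active, p):
    """Sketch of an output cell with subtractive and divisive inhibition.

    weights: synaptic strengths of the codon cell afferents
    active:  booleans, True where the corresponding codon cell is firing
    p:       assumed threshold in [0, 1] applied per active afferent
    """
    n_active = sum(1 for a in active if a)
    if n_active == 0:
        return 0.0
    # Excitation summed over the active afferent synapses only.
    excitation = sum(w for w, a in zip(weights, active) if a)
    # Subtraction (scattered over the dendrites): remove p per active afferent.
    after_subtraction = excitation - p * n_active
    if after_subtraction <= 0:
        return 0.0
    # Division (clustered at the soma): normalize by the number of active afferents.
    return after_subtraction / n_active
```

Under these assumptions the response is the mean strength of the active synapses minus `p`, cut off at zero. Both inhibitory terms depend only on the count of active afferents, which is why each kind of inhibitory cell has only to estimate that number.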
There is no doubt that the pyramidal cells of layers III and V possess basket
synapses (Cajal 1911), but Cajal does not describe them for those of layer II, which
otherwise look like output cells. Colonnier (1968) has however studied the pyramids
of II in area 17 in some detail, and has shown that, while synapses on the somas
of these cells are not densely packed, they do exist, and are exclusively of the symmetrical type with flattened vesicles. It would be interesting to have some comparative quantitative data about somatic synapses on pyramidal cells of different
layers in the cortex.
6.4.3. The codon cell threshold controls
The control of the Martinotti and granule cell thresholds requires an inhibitory
cell which, like the cerebellar Golgi cell, is designed to produce a roughly constant
amount of codon cell activity. There are various short axon cells in layers IV and VI
which might perform this rôle, but no evidence available about the cells to which
they send synapses. The obvious candidates in IV are the dwarf cells (Cajal 1911,
p. 565) and perhaps the horizontal cells; and in VI, the dwarf cells and stellate cells
with locally ramifying axon. For the control of Martinotti cell thresholds, it seems
probable that the device of an ascending dendrite should be used to assess the
amount of activity in the molecular layer. This could be done, for example, by an
inhibitory pyramidal or fusiform cell with basilar and ascending dendrites, and
locally arborizing axon. Such a cell would possess no climbing fibre, nor any modifiable afferent synapses. There exist various fusiform cells in layers VI and VII which
might do this, but there is too little data available to know for certain.
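The purpose of the threshold control can be illustrated with a short sketch. It replaces the inhibitory cell's sampling of activity by an explicit ranking of excitations, which is a convenience of the example and not a claim about the cortex. The functions `set_codon_threshold` and `active_codon_cells` and the parameter `target_active` are hypothetical names.

```python
def set_codon_threshold(excitations, target_active):
    """Return a threshold that lets about `target_active` codon cells fire.

    Ranking stands in for the inhibitory cell's estimate of total activity;
    ties at the threshold may admit slightly more than `target_active` cells.
    """
    if target_active <= 0 or not excitations:
        return float("inf")  # nothing is allowed to fire
    ranked = sorted(excitations, reverse=True)
    k = min(target_active, len(ranked))
    return ranked[k - 1]


def active_codon_cells(excitations, threshold):
    """Mark the codon cells whose excitation reaches the threshold."""
    return [e >= threshold for e in excitations]
```

For a weak input and a strong input with the same ordering of excitations, the threshold rises with the input while the number of active codon cells stays the same, which is the sense in which the control keeps codon cell activity roughly constant.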
6.5. Generalities
The theory expects output cells to fire at different frequencies, and it expects
output cells at one level to form the input fibres for the next. It is therefore implicit
in the theory that input fibres a_i(t) should take values in the range [0, 1], and should
not be restricted simply to the values 0 and 1. The theory has been developed here
only for the simple case of binary-valued fibres. Its extension to the more general
case is a technical matter, and will be carried out elsewhere.
Finally, it is unprofitable to attempt a comprehensive survey of cortical cells at
this stage: neither the theory nor the available facts permit more than the barest
sketch. It is most unsatisfying to have to give such an incomplete series of notes, and
I write these reluctantly. It does, however, seem essential to say something here.
It both illustrates how the theory may eventually be of use, and indicates the
kind of information which it is now essential to acquire. More notes on the cerebral
cortex will accompany the Simple Memory paper, but until then, it seems better to
err on the side of reticence than of temerity.
7.0. Introduction
In this section are summarized the results which are to be expected to hold if the
theory is correct, together with an assessment of the firmness with which the
individual predictions are made. The firmness is indicated by superscripted stars
accompanying the prediction, the number of stars increasing with the certainty of
the statement they decorate. Three stars*** indicates a prediction which, if shown
to be false, would disprove the theory: two stars** indicates that a prediction is
strongly suggested, but that remnants of the theory would survive its disproof: one
star* indicates that a prediction is clear, but that its disproof would not be a serious
embarrassment, since other factors may be involved; and no stars indicates a
prediction which is strictly outside the range of the theory, but about which the
theory provides a strong hint.
7.1. Martinotti cells
Each Martinotti cell should have many inputs***, mainly from intercortical
association fibres**, which should terminate by means of excitatory synapses***.
Each should also have inhibitory inputs***, subtractive in effect** and therefore
widely distributed over the dendrites**. These should be driven by local cells***
with locally arborizing axon***, designed to keep the amount of Martinotti cell
activity evoked by different inputs roughly constant**.
Excitatory Martinotti cell afferent synapses are probably modifiable*, and if
they are modifiable, they are probably Brindley synapses*, becoming modifiable
only during the correct phases of sleep*. If location selection proceeds as in §5.1.1,
and if these synapses are modifiable, then they are modifiable only during the
correct phases of sleep***. Martinotti cell dendrites are probably independent.
The output from these cells is excitatory***, and goes to output (pyramidal)
cells*** through modifiable synapses***, three** kinds of inhibitory cells***
through unmodifiable synapses***, and to output selector cells** through unmodifiable synapses.
7.2. Cerebral granule cells
These cells fall broadly into the same class as Martinotti cells, and the predictions
concerning them are the same, with the following exceptions. Their input is mainly
more direct than that of the Martinotti cells, and should (because of their smaller
size) come from thalamo-cortical rather than cortico-cortical projections. They
probably do not have modifiable afferent synapses. In the sensory projection areas,
where afferents are known to terminate in layer IV, these afferents should form the
main source of excitatory synapses on the granule cells*.
7.3. Pyramidal cells
The pyramidal cells of layers III and V, and probably also those of layer II, are
interpreted as output cells, in the sense of the theory. On the assumption that this is
correct, they receive two kinds of excitatory synapses**, and two kinds of inhibitory
synapses**. The majority of afferent synapses comes from Martinotti and granule
cells**, almost all such cells making not more than one synapse with any given
pyramidal cell**. These synapses are either Hebb or Brindley type modifiable
synapses***. The strength of the synapse from the codon cell c to the output cell Ω
stabilizes at the value P(Ω|c)**. (This receives only two stars, since there may be a
workable all-or-none approximation to this value.) These synapses should be
modifiable during the course of ordinary waking life***, and probably during sleep
as well*. All other afferent synapses described here are unmodifiable***.
If the dendrite is large, there exists a second excitatory input in the form of a
climbing fibre**. If there is no climbing fibre present, the other excitatory afferent
synapses must be Brindley synapses***. The climbing fibre input, if it exists, can
produce the conditions for synaptic modification in the whole dendrite simultaneously***, but it is subsequently not the only input able to do this*.
There are two kinds of inhibitory input to the cell**: one scattered, which has the
effect of performing a subtraction**, and one clustered at the soma, performing the
division**. At least one of these functions is performed***, but the all-or-none
approximation would require only one. Both essentially estimate the number of
afferent synapses from codon cells active at the cell***.
The output from these cells is excitatory if it forms the input to a subsequent
piece of cortex***. Their axon collaterals synapse with neighbouring output and
Martinotti cells.
7.4. Climbing fibres
These are present only on output cells*. The climbing fibre at a given pyramidal
cell provides an accurate enough pointer for that cell for the spatial recognizer
effect to take over and make the cell a receptor for a classificatory unit***.
Climbing fibres are excitatory***, if used for this purpose.
7.5. Other short axon cells
Many of the short axon cells which are not codon or climbing fibre cells are inhibitory. The theory distinguishes three principal kinds**. Subtractor cells sample
the activity of codon cell axons near local regions of dendrite**, and send inhibitory
synapses to those regions**. These have a subtractive effect**. Division cells, the
basket cells, are inhibitory**; and so are cerebellar Golgi cell analogues, which keep
the amount of codon cell activity about constant**.
The granule cell threshold controls receive excitatory*** synapses from either
the granule cell excitatory afferents, or the granule cell axons***, and perhaps from
both*. They send inhibitory synapses to the granules themselves***, and these
synapses are scattered over the granule cell soma and dendrites**. The Martinotti cell
threshold controls receive excitatory*** synapses either from the Martinotti
afferents, or from the Martinotti axons***. In view of the length of the Martinotti
axons, they probably receive from both*, and therefore have an ascending dendritic shaft**. Layers VI and VII contain fusiform cells which could be Martinotti
cell threshold controllers.
The axonal and dendritic distributions of the inhibitory cells of the cortex depend
on the distributions of the afferents, and of the codon cell axons, in a complicated
way.
7.6. Learning and sleep*
This section as a whole receives one star, but if location selection proceeds as in
§5.1.1, and if there exist plastic codon cells, then it receives three stars. The truth
of these conditional propositions cannot be deduced from the available data. Star
ratings within the section are based on the assumption that both propositions are true.
Sleep is a prerequisite for the formation of some new classificatory units***. The
construction of new codon functions for high level units***, and perhaps the selection of new output cells, takes place then, though the latter can** occur, and
probably usually does*, during waking.
Let S_1 and S_2 be two collections of pieces of information such that many of the
spatial relations present in S_2 appear frequently in S_1, and have not previously
appeared in the experience of an animal. The animal is exposed to S_1, and then to S_2.
If the exposures are separated by a period including sleep, the amount of information
the animal has to store in order to learn S_2 is less than the amount he would have to
store if the exposures had been separated by a period of waking***. This is because
the internal language is made more suitable during the sleep, by the construction of
new classificatory units to represent the spatial redundancies in S_1. The recall of S_1
itself is not improved by sleep**.
Conversely, if this effect is found to occur, some codon cells have modifiable
synapses**.
I wish to thank especially Professor G. S. Brindley, F.R.S., to whom I owe more
than can be briefly described; Mr S. J. W. Blomfield, who made a number of points
in discussion, and who proposed an idea in §1.5; Professor A. F. Huxley, F.R.S., for
some helpful comments; and Mr H. P. F. Swinnerton-Dyer, F.R.S., for various
pieces of wisdom. The embryos of many of the ideas developed here appeared in a
Fellowship Dissertation offered to Trinity College, Cambridge, in August 1968:
that work was supported by an MRC research studentship. The work since then
has been supported by a grant from the Trinity College Research Fund.
REFERENCES
Andersen, P., Eccles, J. C. & Løyning, Y. 1963 Recurrent inhibition in the hippocampus
with identification of the inhibitory cell and its synapses. Nature, Lond. 197, 640-642.
Andersen, P., Eccles, J. C. & Voorhoeve, P. E. 1963 Inhibitory synapses on somas of
Purkinje cells in the cerebellum. Nature, Lond. 199, 655-656.
Barlow, H. B. 1961 Possible principles underlying the transformations of sensory messages.
In Sensory Communication (Ed. W. A. Rosenblith), pp. 217-234. MIT and Wiley.
Blomfield, Stephen & Marr, David 1970 How the cerebellum may be used. Nature, Lond.
227, 1224-1228.
Brindley, G. S. 1969 Nerve net models of plausible size that perform many simple learning
tasks. Proc. Roy. Soc. Lond. B 174, 173-191.
Cajal, S. R. 1911 Histologie du Système Nerveux 2. Madrid: CSIC.
Colonnier, M. 1968 Synaptic patterns on different cell types in the different laminae of the
cat visual cortex. An electron microscope study. Brain Res. 9, 268-287.
Eccles, J. C., Ito, M. & Szentágothai, J. 1967 The cerebellum as a neuronal machine. Berlin:
Springer-Verlag.
Hebb, D. O. 1949 The organization of behaviour, pp. 62-66. New York: Wiley.
Hubel, D. H. & Wiesel, T. N. 1962 Receptive fields, binocular interaction and functional
architecture in the cat's visual cortex. J. Physiol. 160, 106-154.
Jardine, N. & Sibson, R. 1968 A model for taxonomy. Math. Biosci. 2, 465-482.
Jardine, N. & Sibson, R. 1970 The measurement of dissimilarity. (Submitted for publication.)
Kendall, D. G. 1969 Some problems and methods in statistical archaeology. World Archaeology 1, 68-76.
Kingman, J. F. C. & Taylor, S. J. 1966 Introduction to measure and probability. Cambridge
University Press.
Kruskal, J. B. 1964 Multidimensional scaling. Psychometrika 29, 1-27; 28-42.
Marr, David 1969 A theory of cerebellar cortex. J. Physiol. 202, 437-470.
Marr, David 1971 Simple Memory: a theory for archicortex. (Submitted for publication.)
Nauta, W. J. H. 1954 Terminal distributions of some afferent fibre systems in the cerebral
cortex. Anat. Rec. 118, 333.
Petrie, W. M. Flinders 1899 Sequences in prehistoric remains. J. Anthrop. Inst. 29, 295-301.
Renyi, A. 1961 On measures of entropy and information. In: 4th Berkeley Symposium on
Mathematical Statistics and Probability (Ed. J. Neyman), pp. 547-561. Berkeley: Univ.
of California Press.
Shannon, C. E. 1949 In The mathematical theory of communication, C. E. Shannon & W.
Weaver. Urbana: Univ. of Illinois Press.
Sholl, D. A. 1956 The organisation of the cerebral cortex. London: Methuen.
Sibson, R. 1969 Information radius. Z. Wahrscheinlichkeitstheorie 14, 149-160.
Sibson, R. 1970 A model for taxonomy. II. (Submitted for publication.)
Spencer, W. A. & Kandel, E. R. 1961 Electrophysiology of hippocampal neurons IV.
Fast prepotentials. J. Neurophysiol. 24, 274-285.
Szentágothai, J. 1962 On the synaptology of the cerebral cortex. In: Structure and functions
of the nervous system (Ed. S. A. Sarkisov). Moscow: Medgiz.
Szentágothai, J. 1965 The use of degeneration methods in the investigation of short neuronal
connections. In: Degeneration patterns in the nervous system, Progr. in Brain Research 14
(Eds. M. Singer & J. P. Schadé), 1-32. Amsterdam: Elsevier.
Szentágothai, J. 1967 The anatomy of complex integrative units in the nervous system.
Recent development of neurobiology in Hungary 1, 9-46. Budapest: Akadémiai Kiadó.
Valverde, F. 1968 Structural changes in the area striata of the mouse after enucleation.
Exp. Brain Res. 5, 274-292.