A Theory for Cerebral Neocortex
D. Marr
Proceedings of the Royal Society of London. Series B, Biological Sciences, Vol. 176, No. 1043 (Nov. 3, 1970), pp. 161-234. Published by The Royal Society.
Stable URL: http://links.jstor.org/sici?sici=0080-4649%2819701103%29176%3A1043%3C161%3AATFCN%3E2.0.CO%3B2-4

Proc. Roy. Soc. Lond. B. 176, 161-234 (1970)
Printed in Great Britain

A theory for cerebral neocortex
By D. Marr
Trinity College, Cambridge
(Communicated by G. S. Brindley, F.R.S. Received 2 March 1970)

0. Introduction
0.1. The form of a neurophysiological theory
0.2. The nature of the present general theory
0.3. Outlines of the present theory
0.4. Definitions and notation
0.5. Information measures

1.0. Introduction
1.1. Information theoretic redundancy
1.2. Concept formation and redundancy
1.3. Problems in spatial redundancy
1.4. The recoding dilemma
1.5. Biological utility
1.6. The fundamental hypothesis

2. The fundamental theorems
2.0. Introduction
2.1. Diagnosis: generalities
2.2. The notion of evidence
2.3. The diagnosis theorem
2.4. Notes on the diagnosis theorem
2.5. The interpretation theorem

3. The codon representation
3.1. Simple synaptic distributions
3.2. Quality of evidence from codon functions

4. The general neural representation
4.0. Introduction
4.1. Implementing the diagnosis theorem
4.2. Codon functions for evidence
4.3. Codon neurotechnology
4.4. Implementing the interpretation theorem
4.5. The full neural model for diagnosis and interpretation

5.1. Setting up the neural representation: sleep
5.2. The spatial recognizer effect
5.3. The refinement of a classificatory unit

6. Notes on the cerebral neocortex
6.0. Introduction
6.1. Codon cells in the cerebral cortex
6.2. The cerebral output cells
6.3. Cerebral climbing fibres
6.4. Inhibitory cells
6.5. Generalities

7.0. Introduction
7.1. Martinotti cells
7.2. Cerebral granule cells
7.3. Pyramidal cells
7.4. Climbing fibres
7.5. Other short axon cells

It is proposed that the learning of many tasks by the cerebrum is based on using a very few fundamental techniques for organizing information. It is argued that this is made possible by the prevalence in the world of a particular kind of redundancy, which is characterized by a 'Fundamental Hypothesis'. This hypothesis is used to found a theory of the basic operations which, it is proposed, are carried out by the cerebral neocortex. They involve the use of past experience to form so-called 'classificatory units' with which to interpret subsequent experience. Such classificatory units are imagined to be created whenever either something occurs frequently in the brain's experience, or enough redundancy appears in the form of clusters of slightly differing inputs.
A (non-Bayesian) information theoretic account is given of the diagnosis of an input as an instance of an existing classificatory unit, and of the interpretation as such of an incompletely specified input. Neural models are devised to implement the two operations of diagnosis and interpretation, and it is found that the performance of the second is an automatic consequence of the model's ability to perform the first.

The discovery and formation of new classificatory units is discussed within the context of these neural models. It is shown how a climbing fibre input (of the kind described by Cajal) to the correct cell can cause that cell to perform a mountain-climbing operation in an underlying probability space, that will lead it to respond to a class of events for which it is appropriate to code. This is called the 'spatial recognizer effect'.

The structure of the cerebral neocortex is reviewed in the light of the model which the theory establishes. It is found that many elements in the cortex have a natural identification with elements in the model. This enables many predictions, with specified degrees of firmness, to be made concerning the connexions and synapses of the following cortical cells and fibres: Martinotti cells; cerebral granule cells; pyramidal cells of layers III, V and II; short axon cells of all layers, especially I, IV and VI; cerebral climbing fibres and those cells of the cortex which give rise to them; cerebral basket cells; fusiform cells of layers VI and VII.

It is shown that if rather little information about the classificatory units to be formed has been coded genetically, it may be necessary to use a technique called codon formation to organize structure in a suitable way to represent a new unit. It is shown that under certain conditions, it is necessary to carry out a part of this organization during sleep. A prediction is made about the effect of sleep on learning of a certain kind.
0.1. The form of a neurophysiological theory

The mammalian cerebral neocortex can learn to perform a wide variety of tasks, yet its structure is strikingly uniform (Cajal 1911). It is natural to wonder whether this uniformity reflects the use of rather few underlying methods of organizing information. The present paper rests on the belief that this is so, and describes a kind of analysis which is capable of serving many aspects of the brain's function. The theory is necessarily general, but it in principle allows the exact form of the analysis for any particular cerebral task to be computed.

There is an analogy between the shape of the general theory set out here, and that of a recent theory of cerebellar cortex (Marr 1969). The essence of the latter theory was a principle, that motor sequences are driven by learned contexts, which was clearly applicable to the kind of function with which the cerebellum was thought to be associated. The key ideas concerned the way information was stored, and the way stored information could be used; but the theory did not explicitly demonstrate how any particular motor action was learned. For this, it would be necessary to have a much fuller understanding of the nature of the elemental movements for which the Purkinje cells actually code, and of the information present in the relevant mossy fibres. The theory was however useful, because it postulated the existence of a 'fundamental operation' of the cerebellar cortex, and offered a candidate for it. The present theory is once removed from the description of any task the cerebrum might perform, in the same way as was the cerebellar theory from the description of any particular motor action.
Something of this kind is probably an inevitable feature of the theory of any interesting learning machine, but in the particular case of the cerebral cortex, it is likely there exists a second, more concrete analogy between its working, and that of the cerebellar cortex. The evidence for this is the analogy between the structures of the two types of cortex. The cerebral cortex is of course irregular and very complicated, but there do exist similarities between it and the cerebellar cortex: the fundamental cerebellar components (the granule cells, Purkinje cells, parallel fibres, climbing fibres, basket cells and so on) have recognizable counterparts in the cerebral cortex. In view of the great power the codon representation possesses for the economical storage of information (Marr 1969), it cannot be that this analogy is accidental. There must be a deeper correspondence.

0.2. The nature of the present general theory

It was the suspicion that there may exist deep reasons for these similarities that formed the starting point of the present enquiry. The motivation for the development of the theory was provided by two intuitions. The first was that in the generalization of the basic cerebellar circuit, the analogue of the Purkinje cell (called an output cell) need not have a fixed 'meaning'. In the cerebellum, each Purkinje cell probably has predetermined 'meanings', in that the responses its outputs can evoke are likely to be determined by embryological and early post-natal development. In a more general application of this kind of model, it is clear that what the output cell 'means' might be free to be determined by some aspect of the structure of the information for which the system is being used.

The second intuition was that the codon representation, in the kind of model applicable to the cerebellar cortex, may in fact be capable of doing more than the simple memorizing task to which it can obviously be applied (Blomfield & Marr 1970).
This feeling was tied to the idea that the recognition of a learned input ought properly to be viewed as a process of diagnosing whether the current input belonged to the class of learned inputs. This immediately suggests that the behaviour of an output cell should not be an all-or-none affair, but should convey a measure of how certain is the outcome of the diagnostic process. This has the attraction that it could ultimately correspond to how 'like' a tree is the object at which one is currently looking.

These two ideas were bound by the constraint that more or less whatever theory was set up, it had to be grounded in information theory; or if not, firm reasons why this is undesirable must be given. It was evident from the start that no very orthodox information theoretic approach would be of any use; but the general ideas behind the formulation of an information measure are so powerful that it would have been surprising had they turned out to be totally irrelevant.

The result of these ideas was a general theory which divides neatly into two parts. The first, with which this paper is concerned, describes the formation and operation of a language of so-called classificatory units by means of which the sensory input can eventually be usefully interpreted (§ 1). The formation of a classificatory unit is imagined to occur roughly whenever enough related inputs happen to make it worth forming a special description for them. The main results are the information theoretic theorems of § 2 on the diagnosis and interpretation of an input within a class, and the theory of § 5 for class formation. The power of these results is that they lead to specific neural models, and to operations in those models, through which a preliminary interpretation of the histology of cerebral cortex can usefully be made.

The first part of the theory may therefore be described as a model for concept formation and recognition, where concepts are 'classificatory units'.
It argues that there exists a basic information-handling scheme which is applied by the cerebral cortex to a wide range of different kinds of information: that there exists a 'way' in which the cerebral cortex 'works'. This scheme has a wide application, subject to reservations about the need in certain circumstances for special coding devices to cope with particular forms of redundancy. But in principle, it can be applied to anything from the recognition of a tree to the recognition of the necessity to take a particular course of action.

The theorems of § 2 provide a complete analysis of the problem of interpreting an input within a particular class, but the ideas of § 5 provide only a partial analysis of the formation of the classes themselves. This problem cannot be dealt with using only the hardware developed in this paper; and its solution requires the results of the second part of the general theory.

The second part of the theory embodies a second pair of ideas. One of these also arises from the cerebellar theory, where it was seen that a codon representation is extremely successful at straight memorizing tasks (Brindley 1969; Marr 1969). The other is the everyday concept of an associative memory. The cerebellar theory is a kind of associative memory theory, and it is not difficult to extend the idea of the codon representation to the case of a general associative memory. This is developed in the theory of Simple Memory (Marr 1971). Once this has been done, it is possible to see how current descriptions of the environment can be stored, and recalled by addressing them with small parts of such inputs. This is the facility needed to complete the theory of the formation of classificatory units.
It is, however, only a small part of the use to which such a device can be put: almost the entire theory of the analysis of temporally extended events, and of the execution ab initio of a sequence of movements, rests upon such a mechanism. Though simple, it is important (and long) enough to warrant a separate development, and is therefore expounded elsewhere, together with the theory of archicortex to which it gives rise.

0.3. Outlines of the present theory

This paper starts with a discussion of the kind of analysis of sensory information which the brain must perform. The discussion has two main strands: the structure of the relationships which appear in the afferent information; and the usefulness to the organism of discovering them. These two ideas are combined by the 'Fundamental Hypothesis' of § 1.6 which asserts the existence and prevalence in the world of a particular kind of relationship. This forms an explicit basis for the subsequent theoretical development of classificatory units as a way of exploiting these relationships. The fundamental hypothesis is a statement about the world, and asserts, roughly speaking, that the world tends to be redundant in a particular way. The subsequent theory is based, roughly, on the assumption that the brain runs on this redundancy.

The second section contains the fundamental theorems about the diagnosis and interpretation of events within a class. It assumes that the classes have been set up, and studies the way in which they allow subsequent incoming information to be interpreted. These theorems receive their neural implementation in the model of figure 8.

The rest of the paper is closely tied to the examination of specific neural models. After the technical statistics of § 3, the main section (§ 4) on the fundamental neural models appears. This discusses the structures necessary for the implementation of the basic theorems, and derives explicitly those models which for various reasons seem preferable to any others.
The first main result of the paper consists in the demonstration that the two theorems of § 2 correspond to closely related operations in the basic neural model.

The second main result concerns the operations involved in the discovery of new classificatory units. It shows how a climbing fibre enables a cortical pyramidal cell to discover a cluster in the space of events which that cell receives. This result, together with the previous ones which show how classificatory units work when represented, completes the main argument of the paper.

Finally, in § 6, the available knowledge of the structure of the cerebral cortex is briefly reviewed, and parts of it interpreted within the models of § 4. This section is incomplete, both because of a lack of information, and because Simple Memory theory allows the interpretation of other components; but it was thought better at this stage to include a brief review than to say nothing. Far too little is known about the structure of the cerebral cortex.

0.4. Definitions and notation

0.4.1. Time, t, is discrete, and runs through the non-negative integers (t = 0, 1, 2, ...). t scarcely appears itself in the paper, but most of the objects with which the theory deals are essentially functions of t.

0.4.2. An input fibre, or fibre, a_i(t), is a function of time t which has the value 0 or 1, for each i, 1 ≤ i ≤ N. a_i(t) = 1 will have the informal meaning that the fibre a_i carries a signal, or 'fires', at time t. A signal is usually thought to correspond to a burst of impulses in a real axon. The set of all input fibres is denoted by A, and the set of all subsets of A by 𝔄.

0.4.3. An input event, or event, on A assigns to each fibre in A the value 0 or 1. Events are usually denoted by letters like E, F, and the value which the event E assigns to the fibre a_i is written E(a_i), and equals 0 or 1 (1 ≤ i ≤ N).
It is convenient to allow the following slight abuse of notation: E can also stand for the set of a_i which have E(a_i) = 1. The phrase 'a_i in E' therefore means that E(a_i) = 1, i.e. that the fibre a_i fires during the event E.

0.4.4. A subevent on A, usually denoted by letters like X, Y, assigns the value 0 or 1 to a subset of the fibres a_1, ..., a_N. For example, if X(a_i) = 1 for 1 ≤ i ≤ r, X(a_i) = 0 for r < i ≤ s, and X(a_i) is undefined for i > s, then X is a subevent on A. As in the case of full events, X can also mean the set of fibres a_i for which X(a_i) = 1: in the example therefore, X can stand for the set {a_1, ..., a_r}.

0.4.5. If X is a subevent, the set of fibres to which X assigns a value is called the support of X, and is written S(X). Thus in the above example, S(X) = {a_1, ..., a_s}.

0.4.6. A set of events is called an event space, and is denoted by letters like 𝔈, 𝔉. A set of subevents is called a subevent space, and is denoted by letters like 𝔛, 𝔜.

0.4.7. Greek letters are usually reserved for probability distributions. The letter λ, for example, often denotes the probability distribution induced over 𝔄 (the set of all possible events on a_1, ..., a_N) by the input events. Thus λ(E) is the number of occurrences of the event E divided by the total elapsed time. If, instead of considering the whole of A = {a_1, ..., a_N}, attention is restricted to A' = {a_1, ..., a_r}, then the space 𝔄' of events on A' corresponds to a set of subevents on the original fibre set A. Every event in 𝔄 defines a unique event in 𝔄', obtained by ignoring the fibres a_{r+1}, ..., a_N. Thus the full distribution λ over 𝔄 induces a distribution λ' over 𝔄' obtained by looking only at the fibres a_1, ..., a_r. λ' is called the projection onto 𝔄' of λ. If 𝔛 is a subevent space, then the phrase 'λ' is the distribution induced over 𝔛 by the input' refers to the λ' induced from the full input probability distribution λ by projecting it onto 𝔛. If 𝔅 is any subset of 𝔄, then the restriction λ|𝔅 of λ to 𝔅 is defined
as follows:
(λ|𝔅)(E) = λ(E) when E is in 𝔅,
(λ|𝔅)(E) = 0 elsewhere.

0.4.8. Finally, it is often convenient to use various pieces of shorthand. The following is a list of the abbreviations used.

{ | } is a method of defining a set. For example, {a_i | 1 ≤ i ≤ N} means 'the set of fibres a_i which satisfy the condition that 1 ≤ i ≤ N'.
s.t. means 'such that',
∈ means 'is a member of the set': e.g. a_i ∈ E,
∉ means 'not ∈',
P(X|Y) is the conventional conditional probability of X given Y,
⇒ means 'implies',
⇐ means 'is implied by',
⇔ means 'implies and is implied by',
iff means 'if and only if',
∃ means 'there exists',
| | means 'the number of elements in': e.g. |E| means 'the number of fibres that are active in the event E'.

The following set-theoretic symbols are also used:
E ∪ F = the union of E and F,
E ∩ F = the intersection of E and F,
E \ F = the set of elements which are in E but not in F,
E Δ F = the set of elements which are in exactly one of E and F,
E ⊆ F means E is contained in or equal to F,
E ⊂ F means E is contained in F and does not equal F.

The reader who is not familiar with this notation should not be put off by it. All the important arguments of the paper have been written out in full. An adequate understanding of its content may be achieved without reading the paragraphs in small type, which is where these symbols usually appear.

0.5. Information measures

The only universal measures of suitability, fit, and so forth, are information measures. Three are of principal importance in this paper, and are defined below. Others are derived as they are needed. All the spaces with which the paper is concerned are finite, and therefore only discrete probability distributions need be considered. Definitions are given here only for the finite case, although every expression has its more general form.

0.5.1. Entropy (Shannon 1949).
The entropy of the discrete probability distribution p_1, ..., p_s will be denoted by the letter h. Thus

\[ h(p_1, \ldots, p_s) = \sum_{i=1}^{s} -p_i \log_2 p_i . \]

All logarithms are to base 2.

0.5.2. Information gain (Shannon 1949, and see Rényi 1961).
Let μ = (μ_1, ..., μ_s) and ν = (ν_1, ..., ν_s) be two discrete probability distributions over the same set of events. Then the information gain due to μ given ν is

\[ I(\mu \mid \nu) = \sum_{i=1}^{s} \mu_i \log_2 \frac{\mu_i}{\nu_i} . \]

0.5.3. Information radius (Sibson 1969).
Let μ_1, ..., μ_n be discrete probability distributions over the same s events: μ_i = (μ_i1, ..., μ_is), Σ_j μ_ij = 1. Let ν = (ν_1, ..., ν_s), and write μ_i ≪ ν if ν_k = 0 implies that μ_ik = 0. Let w_1, ..., w_n be positive numbers. Then the information radius of the μ_i with weights w_i is

\[ K(\mu_1, \ldots, \mu_n; w_1, \ldots, w_n) = \inf_{\nu \,:\, \mu_i \ll \nu \ \forall i} \frac{\sum_{i=1}^{n} w_i \, I(\mu_i \mid \nu)}{\sum_{i=1}^{n} w_i} . \]

This infimum is achieved uniquely when

\[ \nu = \frac{\sum_{i=1}^{n} w_i \mu_i}{\sum_{i=1}^{n} w_i} . \]

K, the information radius, is an information measure of dissimilarity. For two distributions, the information radius will be abbreviated to K(μ_1, μ_2). The nature of K is explained more fully where it is used.

1.0. Introduction

This section is concerned with the problem of what the brain does. The background and arguments it contains are directed towards the justification of the Fundamental Hypothesis (§ 1.6). It is shown that despite the complications which arise in the early processing of sensory information, this hypothesis is often valid for information with which the brain has to deal. The discussion proceeds by first exploring notions connected with the idea of eliminating information theoretic redundancy, an idea which has had a somewhat chequered career in neurophysiology (see Barlow 1961 for discussion and references). Secondly, ideas connected with biological utility are developed; and finally these are combined with the ideas of the first part to produce the philosophy from which the theory is derived.

1.1. Information theoretic redundancy

1.1.1. Redundancy and early processing of visual information

The notion that the processing of sensory information is an operation designed to reduce the redundancy in its expression is attractive, and one that is helpful for understanding certain aspects of early coding. For example, the coding in the optic nerve of relative rather than absolute brightness prevents the repeated transmission of the average brightness of the visual field. The use of on-centre off-surround coding there is peculiarly suitable for another reason, namely that the visual world has a tendency to be locally homogeneous. Given that a particular point in the visual field has a certain luminance and colour, the chance that neighbouring points also do is high. This kind of redundancy would not be present if, for example, the world was like scattered, multi-coloured pepper. The visual world has this tendency towards continuity because matter is cohesive: the existence of edges and boundaries is a consequence of this.

It may be possible to view the next stages of visual processing (by the 'simple' and 'complex' Hubel & Wiesel (1962) cells of area 17) as a further recoding designed round the redundancy associated with the existence of edges, bars, and corners. The test of this is whether, using these cells, it is easier to represent scenes from the real visual world than an arbitrary, peppery optic nerve input; and it probably is.

There are many other ways in which redundancies arise in visual information. The next most obvious are those introduced by the operations of translation, magnification, and by rotation. For these operations at least, the question of what to do with the redundancy to which they give rise poses no great difficulties of principle. The brain is, for example, much less interested in where an image is on the retina than in the relative positions of its various parts.
In this case, the clear object of a portion of the processing must be to recode the input, perhaps gradually, in such a way that relative positions are preserved. This should probably be done so that if two objects are seen momentarily, each in a different position, orientation, and having a different size, then the accuracy with which they may be compared should depend upon the magnitude of these differences.

Various similar points can be made about early processing in the other sensory modalities; but enough has probably been said to make the two main points. They are first, that notions of pure redundancy reduction probably are involved in the early analysis of sensory information. Secondly, redundancy can occur in many forms. The variety is especially obvious nearer the periphery. Each form requires a special mechanism to cope with it, and so, especially lower down in the brain, it is natural to expect a diversity of specialized coding tricks. Some of these have been found, and some have not.

1.1.2. Redundancy and later visual processing

A great deal of the redundancy in visual information arises out of the permanence of the world. This, which includes the tendency of matter to cohere, makes it natural to code for changes, and to look for common subevents, like lines, corners, and so forth, which concern only a small fraction of the total population of input fibres. Common subevents are often called features, and the ideas associated with the analysis of features are probably the most promising available concerning later processing. Their potential advantage is most clearly seen in the analysis of objects: the great hope they hold is in the possibility that objects may be recognized by checking for the presence of particular features. These features are imagined to be drawn from a central pool which is shared by all other objects, and which is not too large.
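
As a rough illustration of recognition by feature checking, the following Python sketch treats the central pool as a small set of feature names and each object as the subset of the pool whose presence is checked. The feature names, the object table and the function `recognize` are hypothetical, introduced only for illustration; position, invariance and the separation of objects from one another are ignored entirely.

```python
# Hypothetical central pool of features shared by all objects (an assumption,
# not taken from the paper).
FEATURE_POOL = frozenset({"horizontal_edge", "vertical_edge", "corner", "curve"})

# Hypothetical objects, each described by the features whose presence is checked.
OBJECTS = {
    "square": frozenset({"horizontal_edge", "vertical_edge", "corner"}),
    "circle": frozenset({"curve"}),
}


def recognize(observed_features):
    """Return the names of all objects whose required features are all present."""
    # Features outside the shared pool are ignored.
    observed = frozenset(observed_features) & FEATURE_POOL
    return sorted(name for name, needed in OBJECTS.items() if needed <= observed)


print(recognize({"horizontal_edge", "vertical_edge", "corner"}))
print(recognize({"curve", "corner"}))
```

The check itself is a plain subset test; everything difficult is hidden in how the set of observed features is obtained in the first place.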
This kind of scheme for later visual processing introduces five main categories of problem:
(i) The discovery of the relevant feature vocabulary.
(ii) Coding features in a suitably invariant way.
(iii) Coding the relative positions of the features.
(iv) Partitioning the features so that information from one object is separated from information about other objects.
(v) The decision process itself.

'Object', in the case of visual information, has a fairly well-defined meaning, because of the coherence of matter; but these general ideas have a wider application. For example, an 'impression' of an auditory input may be obtained from its power spectrum: in such cases, the 'objects' are less tangible. But for now, it is enough to consider just the special, visual case.

Problems (i) and (v) are very general, and are dealt with later (§ 1.4, § 2, § 5). Problem (ii) is special, and only two points about it will be made here. First, lines and edges are preserved by magnification, so parts of problem (ii) are automatically solved. Secondly, it is only necessary to localize the components of any particular image to an extent that will prevent their confusion with other images. The exact positions of the edges and corners of an object need not be retained, because the general restraint of continuity of form will mean that exact relative positions can always be recovered from a knowledge of approximate relative positions, the number of terminations, and approximate lengths of segments. Hence the problems associated with translation of an image across the retina can begin to be solved quite early by recoding into elements which signal the existence of their corresponding features within a region of a particular size. The exact size will depend upon how unusual is the feature.

This in itself is of no use unless some way can be found of representing these
This in itself is of no use unlcss sonic way can be found of representing these A theory for cerebral neocortex 171 approximate relative positions: this is problem (iii).Fortunately, it is very easy to see how distance relations may be held by a codon representation (Marr 1969). The key is an idea of 'ncarness '. Suppose {f,, ...,f,) is a collection of features, oi~dowedwith approximate distance relations d(fi, fj) between each pair. Suppose subsets of the sct {f,, ..., f,) are formed in such a way that those features which are near one another are more likely to be included in the same subsct than those which arc not. Then the subsets would contain information about the relative positions of the fi (see Petrie 1899 for an intriguing natural occurrence of this effect). Techniques like multidimensio~~al scaling can be used to recover metric information explicitly in this kind of situation (Kruskal1964;Kenddl1969)~but for the prescnt purpose, it is enough to note that two different spatial configurations would produce two diflerent subset collections. There is thus no difficulty of principle in the idea of analysis of shape by roughly localized features: but it is clear that d l these techniques rely a great deal on the ability to pick out the components of a single shape in tho first place. That is, a successful solution to problem (iv) is a prerequisite for this kind of solution to problems (ii) and (iii). This involvcs searching for hard criteria which will enable the nervous system to split up its visual input into components from different objects. The most obvious suitable criteria arise from the tendency of matter to cohere: they arc continuity of form, of colour, of visual texture, and of movement. For example, most parts of a fleeing mouse are distinguished from the background by their movement. 
A solution in this case would be to have a mechanism which causes signals about movement in adjacent regions of the visual field to enhance one another, and to suppress information from nearby stationary objects. It is not difficult to devise mechanisms for this, and analogous ones for the other criteria. These ideas about joining visual data up using certain fixed criteria are collectively called techniques for visual bonding. It would be surprising if the visual system did not contain mechanisms for implementing at least some kinds of visual bonding, since the methods are powerful, and can be innate.

It can be seen from this discussion that although ideas about redundancy elimination probably do not determine the shape of later visual processing, they are capable of contributing to its study. Those problems of principle ((i) and (v)) which arise quite quickly can and will be dealt with: the crucial point is that technical problems ((ii)-(iv)) will usually involve the elimination of redundancy associated with special kinds of transformation, perhaps specific to one sensory mode. These problems can either be solved by brute memory (e.g. perhaps rotation for visual information) or by suitable tricks, like visual bonding. The point is that these problems usually can be overcome somehow; and this is the optimism one needs to propel one to study in a serious way the later difficulties, which are genuinely matters of principle.

1.1.3. Redundancy and information storage

There is a quite different possible application of information theoretic ideas, and it is associated with the notion of coding information to be stored. It is a matter of everyday experience that some things are more easily remembered than others. Patterns are easier to recall than randomly distributed lines or dots.
It cannot be argued that the random picture contains more information in any absolute sense, since the calculation of its information content depends entirely upon the norm with which it is compared. If the norm is itself, the random picture contains no information. There can be no doubt that a normal person would have to store more information to remember the random picture than the patterned one; but this, in the first instance anyway, is a remark about the person, not about the pictures. This illustrates the fundamental point of this section: that the amount of information a memory has to store to record a given signal depends upon the structure of the signal, and the structure of the memory.

Let $Z$ be an event space, and let $\sigma$ be the probability distribution corresponding to the afferent signal: thus $\sigma(E)$, for $E$ in $Z$, is the probability that $E$ will occur next. (The present crude point can be made without bringing in temporal correlations.) Let $\mu$ be the probability distribution which describes what the memory expects. Then the amount of information the memory requires to store $\sigma$ is

$$h(\sigma:\mu) = -\sum_{E \in Z} \sigma(E) \log \mu(E).$$

This expression exists if and only if $\mu(E) = 0 \Rightarrow \sigma(E) = 0$. $h(\sigma:\mu)$ and $h(\sigma)$, the entropy of $\sigma$, are related by the following result. Assuming the memory can store $\sigma$, then:

Lemma. $I(\sigma|\mu)$ exists, and $h(\sigma:\mu) = h(\sigma) + I(\sigma|\mu)$.

Proof. If the memory can store $\sigma$, $\mu(E) = 0 \Rightarrow \sigma(E) = 0$, and hence $I(\sigma|\mu)$ exists. Now

$$h(\sigma:\mu) = -\sum_{E} \sigma(E) \log \mu(E) = \sum_{E} \sigma(E) \log \frac{\sigma(E)}{\mu(E)} - \sum_{E} \sigma(E) \log \sigma(E) = I(\sigma|\mu) + h(\sigma).$$

The term $h(\sigma)$ is inevitable, but the term $I(\sigma|\mu)$ reflects the fundamental choice a memory has when instructed to store a signal $\sigma$. It can either store it straight, at cost $h(\sigma:\mu)$, or it can change its internal structure to a new distribution, $\mu'$ say, and store the signal relative to that.
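The lemma relating the storage cost to the entropy can be checked numerically. The following is a minimal sketch, not part of the original paper: the two distributions, the function names, and the use of base-2 logarithms are all assumptions made for illustration, with $I(\sigma|\mu)$ taken to be the usual information gain $\sum_E \sigma(E) \log(\sigma(E)/\mu(E))$.

```python
import math

def storage_cost(sigma, mu):
    """h(sigma:mu) = -sum sigma(E) * log2 mu(E).

    Exists only if mu(E) = 0 implies sigma(E) = 0.
    """
    total = 0.0
    for event, s in sigma.items():
        if s == 0.0:
            continue  # events that never occur cost nothing
        m = mu.get(event, 0.0)
        if m == 0.0:
            raise ValueError("memory cannot store sigma: mu(E) = 0 but sigma(E) > 0")
        total -= s * math.log2(m)
    return total

def entropy(sigma):
    """h(sigma) = -sum sigma(E) * log2 sigma(E)."""
    return -sum(s * math.log2(s) for s in sigma.values() if s > 0.0)

def information_gain(sigma, mu):
    """I(sigma|mu) = sum sigma(E) * log2(sigma(E) / mu(E))."""
    return sum(s * math.log2(s / mu[e]) for e, s in sigma.items() if s > 0.0)

# Hypothetical signal and memory expectation over three events.
sigma = {"A": 0.5, "B": 0.25, "C": 0.25}
mu = {"A": 0.25, "B": 0.25, "C": 0.5}

straight = storage_cost(sigma, mu)                      # 1.75 bits
split = entropy(sigma) + information_gain(sigma, mu)    # 1.5 + 0.25 bits
print(straight, split)
```

In this example the inevitable term is 1.5 bits and the avoidable term is 0.25 bits; a memory that restructured itself so that its expectation matched the signal exactly would pay only the former.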
The amount of information required to change the structure from $\mu$ to $\mu'$ is at least $K(\mu, \mu')$, where $K$ is the information radius (§0.5.3); but, though an expensive outlay, it can lead to great savings in the long run if $\mu'$ is a good fit to the incoming information.

These arguments are too general to warrant further precise development, but they do illustrate the two possibilities for a memory which has to store information: either it can store it raw, or it can develop a new language which better fits the information, and store it in terms of that. To this point, the next section (§1.2) will return.

Finally, this result shows how important it is to examine the structure of a memory before trying to compute the amount of information needed to store any given signal; it would therefore be disappointing to leave it without some remarks on the type of internal distributions $\mu$ we may expect to find in the actual brain. The obvious kind of answer is the distributions induced by a codon representation, as in the cerebellum. The reliability of a memory is measured by the number of wrong answers it gives when asked whether the current event has been learned. This in turn depends upon the number of possible input events: in cases where this is huge, the memory need only arrange that the proportion of wrong to right answers remains low. In smaller event spaces, a memory may have to represent the learned distribution a good deal more accurately. The first case may well correspond to the situation in the cerebellum and allows codons of a relatively small size: the second may require them to be much larger. The result relevant to this appears in §3, but the situation even in the cerebellum may in fact be rather more complicated (Blomfield & Marr 1970).

1.2. Concept formation and redundancy

1.2.1. The relevance of concepts

It was shown in §1.1.3 that one policy available to a memory faced with having to store a signal is to construct for it a special language. In the present context, this is bound to suggest the notion of concept formation. It is difficult to doubt that one of the most important ways in which the nervous system eventually deals with sensory information is to form concepts with which to decompose and classify it. For example, the concepts chair, sun, lover, music all have their use in the description of the world; and so, at a lower level, do the notions of line, edge, tone and so forth.

Concepts, in general, are things which ease the nervous system's task; and although they do this in various ways, many of these ways produce their advantage by characterizing (and hence circumventing) a particular source of redundancy. One especially important example of how a concept does this is by expressing a part or the whole of that which many 'things' or 'objects' have in common. This 'common' element may take many forms: the objects' representations by sensory receptors may be related; some aspect of their functions may be the same; they may have common associations; or they may simply have occurred frequently in the experience of the observing organism.

This notion has the corollary that concept formation should be a natural consequence of the discovery of a large enough source of redundancy in the input generating a brain's experience. For example, if it is noticed that a certain collection of features commonly occurs, this collection should be recoded as a new and separate entity: for this new entity, special recognition apparatus should be set up, and this then joins the vocabulary of concepts through which the brain interprets and records its experience.

Finally, concepts have been discussed as a means of formulating relationships between collections of other 'things', 'objects', or 'features'.
This appears to rest upon the imprecise notions of 'thing', 'object' or 'feature': but there is in fact no undefinable notion present, for these can simply be regarded as concepts (or roughly, occurrences of concepts) that have previously been formed. This inductive step allows the argument to be taken back to the primitive input elements on which the whole structure is built; and in neurophysiology, there is no fundamental problem to finding a meaning for these: they are either the signals in axons that constitute the great afferent sensory tracts, or the features automatically coded for in the nervous system.

1.2.2. Obstacles

Something of a case can therefore be made for a connexion between concept formation and the coding out of redundancy, but it would be wrong to suggest this is all that is involved. Concept formation is a selective process, not always a simple recoding: quite as important as coding out redundancy is the operation of throwing away information which is irrelevant. For the moment however (until §1.4) it is convenient to ignore the possibility that a recoding process might positively be designed to lose information, and to concentrate on the difficulties involved in recoding a redundant signal into a more suitable form.

The general prospects for this operation are not good: this is for the same reason that the proofs of Shannon's (1949) main coding theorems are non-constructive. There exists no general finite apparatus which will 'remove redundancy' from a signal in a channel. Different kinds of signal are redundant in esoteric ways, and any particular signal demands an analysis which is specially tailored to its individual quirks. Hence the only hope for a general theory is that a particular sort of redundancy be especially common: a system to deal with that would then have a general application. Fortunately, it is likely there does exist such a form; and with its detailed discussion the next section is concerned.

1.3. Problems in spatial redundancy

1.3.0. Introduction

The term spatial redundancy means that redundancy which is preserved by any reordering of the input events (of which only a finite number have occurred); it thus fails to take account of causal or correlative relations which hold between events at different times. It is the only kind of redundancy with whose detection this paper deals. The complications introduced by considering temporal correlations as well are severe, and anyway cannot be discussed without some way of storing temporally extended events. This requires Simple Memory theory, and must therefore be postponed.

The particular kind of spatial redundancy with which this section is concerned is the sort which arises from the fact that some objects look alike. This will be interpreted as meaning that some objects share more 'features' than others, where 'features' are previously constructed classes, as outlined in §1.1.2. It is conjectured that this kind of information forms the basis for the classification of objects by the brain: but before examining in detail the mechanism by which it is done, some arguments must be presented for the general notion that something of this sort is possible.

1.3.1. Numerical taxonomy

Evidence to support this hypothesis may be derived from recent studies in automatic classification techniques. The most important work in this field concerns the use of cluster methods to compute classes from information about the pairwise dissimilarities of the objects concerned (Jardine & Sibson 1968). There are two steps to the process. The first computes the pairwise dissimilarities of the objects from data about the features each object possesses. For this, the information radius (Sibson 1969; Jardine & Sibson 1970) is used, and in the case where the features are of an all-or-none type (i.e. an object either does or does not possess any given feature), this takes a simple form.
Suppose object $O_1$ possesses features $f_1, \ldots, f_n$, and object $O_2$ possesses features $f_{r+1}, \ldots, f_m$, $1 < r < n < m$. Then $K(O_1, O_2)$, the information radius associated with $O_1$ and $O_2$ (regarded as point distributions), is simply $r + (m - n)$, the number of features which exactly one object of the pair possesses.

The second step of the classification process uses a cluster method to compute classes from the information radius measurements. Various arguments can be put forward to show that some cluster methods are greatly to be preferred to others (Sibson 1970). Unlike the measurement of dissimilarity, these have not been given an information theoretic background; but to do so would require a firm idea of the purpose of the classification. The kind of assumption one would need would be to require that the classification provide the best way of storing the information relative to some measure: for example, a product distribution generated by assigning particular probabilities to the individual features. There is considerable choice, however, and it is unlikely that any particular measure could be shown to be natural in any sense.

It is not argued that any cluster process actually occurs in the brain: the importance of this work to the present enquiry is more indirect, and consists of two basic points. The first arises out of the type of redundancy these methods detect. It is that the objects concerned do not have randomly distributed collections of features: what happens is that classes of objects exist which produce collections of features that overlap much more than they should on the hypothesis of randomness. This fact, together with some kind of convexity condition which asserts that an object must be included if enough like it are, is fundamental to the classifying process. The second point is that cluster analysis works. A large amount of information has been analysed by such programs, especially information about the attributes of various plants.
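The all-or-none form of the information radius described above amounts to counting the features held by exactly one of the two objects. A minimal sketch follows, under the assumption (made here so that the count comes to $r + (m - n)$) that the second object's features begin at $f_{r+1}$; the function name and the feature labels are hypothetical.

```python
def information_radius_binary(features_a, features_b):
    """Count the features possessed by exactly one of the two objects.

    For all-or-none features this is the size of the symmetric difference
    of the two feature sets.
    """
    return len(set(features_a) ^ set(features_b))

# Illustrative values: n = 5, r = 2, m = 8.
n, r, m = 5, 2, 8
object_1 = {f"f{i}" for i in range(1, n + 1)}      # f1 .. f5
object_2 = {f"f{i}" for i in range(r + 1, m + 1)}  # f3 .. f8

dissimilarity = information_radius_binary(object_1, object_2)
print(dissimilarity, r + (m - n))
```

Here the unshared features are f1, f2 (the first $r$) and f6, f7, f8 (the last $m - n$), so both expressions give 5. A table of such pairwise values is the input the cluster method of the second step would work from.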
It has been found that these methods do give the classifications which people naturally make. This is important, for it shows that people probably use some process associated with the detection of this kind of redundancy for the classification of a wide range of objects. The motivation for studying methods for detecting this kind of redundancy now becomes strong.

1.3.2. Mountain climbing in a probabilistic landscape

In the brain, one may expect feature detectors to exist, if the recognition of objects is based on this sort of analysis. If spatial redundancy (§1.3.0) is present in the input, there will exist collections of features which tend to occur together. This phenomenon can be given the following more picturesque description. Let the input fibres $a_1, \ldots, a_N$ represent feature detectors, and let $\mathfrak{A}$ be the set of events on $\{a_1, \ldots, a_N\}$ (§0.4). Endow $\mathfrak{A}$ with the distance function $d$, where $d(E, F)$ = the number of fibres at which the events $E$ and $F$ disagree. $(\mathfrak{A}, d)$ is a metric space, and in fact $d(E, F) = K(E, F)$, where $K$ is the information radius. Imagine the space $(\mathfrak{A}, d)$ laid out, with the probability $p(E)$ of each event $E \in \mathfrak{A}$ represented by an extension in a new dimension. $p(E)$ is called the 'height' of $E$. It will be clear that if $E$ occurs more frequently than $F$, $p(E) > p(F)$ and $E$ is higher than $F$. In this way, the environment may be regarded as landscaping the space $\mathfrak{A}$, in which the mountains correspond to areas of events which are frequent, and the valleys to events which are rare.

The important point about the choice of $d$ for the metric on $\mathfrak{A}$ is that nearby inputs (under $d$) possess nearly the same features. Hence if a number of inputs commonly occur with very similar collections of features, they will turn out as a mountain in $(\mathfrak{A}, d)$ under $p$. The detection of such collections is thus equivalent to the discovery in the space $(\mathfrak{A}, d)$ of the mountains induced by $p$.
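The landscape picture can be made concrete on a toy event space. The sketch below is only an illustration of the definitions above, not the discovery method of the paper: heights are estimated as relative frequencies of observed events, and a 'peak' is taken, as an assumption of this example, to be an event that no observed neighbour within a small distance exceeds. All names and data are hypothetical.

```python
from collections import Counter

def hamming(e, f):
    """d(E, F): number of fibres at which the events E and F disagree."""
    return sum(x != y for x, y in zip(e, f))

def heights(observed_events):
    """Estimate p(E) as the relative frequency of each observed event."""
    counts = Counter(observed_events)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}

def local_peaks(p, radius=1):
    """Events whose height is not exceeded by any observed event within `radius`."""
    peaks = []
    for e, height_e in p.items():
        neighbours = [h for f, h in p.items() if f != e and hamming(e, f) <= radius]
        if all(height_e >= h for h in neighbours):
            peaks.append(e)
    return sorted(peaks)

# Hypothetical experience over four feature-detecting fibres.
observed = (
    [(1, 1, 0, 0)] * 5 + [(1, 1, 1, 0)] * 2 + [(1, 0, 0, 0)] * 1
    + [(0, 0, 1, 1)] * 4 + [(0, 1, 1, 1)] * 1
)
p = heights(observed)
peaks = local_peaks(p)
print(peaks)
```

With these data two mountains appear, centred on (1, 1, 0, 0) and (0, 0, 1, 1); the other observed events lie on their slopes.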
The problem of discovering such mountains is solved in §5. Two other problems concern the choice of the feature detectors $\{a_1, \ldots, a_N\}$ with which to form the space $\mathfrak{A}$; and the question of what exactly one does with a mountain when it has been discovered. These are dealt with next. The point that this section illustrates is that the mountain idea over the space $(\mathfrak{A}, d)$ characterizes the kind of redundancy in which we are interested.

1.3.3. The partition problem

The prospects for discovering mountains in the space $\mathfrak{A}$, given that they are there, are good; but whether they are there or not depends largely on the choice of the feature detectors $\{a_1, \ldots, a_N\}$. There can be no guarantee that an arbitrarily chosen collection of features will generate a probabilistic landscape of any interest. The discovery of an appropriate $\mathfrak{A}$ needs methods whereby features which are likely to be related are brought together. This is called the partition problem, and is in general extremely difficult to solve. The problem for which visual bonding was introduced in §1.1.2 was an example of how special tricks can in certain circumstances be used to solve it.

If no bonding tricks are known, however, the discovery of suitable spaces must rest upon measuring correlations of various kinds over likely looking populations of events. This is an operation whose rate of success depends upon the size of the available memory. It needs the theory of Simple Memory, and will be discussed more fully there. Suffice it here to say that the problem is not totally intractable despite the huge sizes of all the relevant event spaces. The reason is that only a very small proportion of the possible events can ever actually occur, simply because of the length of time for which a brain lives. This means, first, that the memory can be quite coarse; and secondly, that if anything much happens twice, it is almost certain to be significant.
1.4. The recoding dilemma

1.4.0. Introduction

The attraction of mountains is that when applied to the correct space, they provide a neat characterization of the type of redundancy which, there is reason to believe, is important for the classification of objects, and probably much else besides. The question that has now to be discussed is what to do with a mountain when it has been discovered.

The obvious thing to do is to lump the events of a mountain together and call it a class. The problems arise because there is virtually no hope of ever saying why this is the right thing to do, using purely information theoretic ideas; and until this is specified, it will be impossible to say exactly how the lumping should be done. The basic difficulty is that the lumping process involves losing information about the difference between the events lumped together.

The simplest reason why this process might be justifiable, or even desirable, is reliability. It would be implausible to suppose that the interpretation of an input might fail because of the failure of a single fibre. Hence a recognition apparatus for the particular event $X$ must admit the possibility that an input $Y$ with $d(X, Y) = 1$ or 2 (say) should be treated like $X$. But it is only by introducing such an assumption that this kind of step could be made, at least within the framework of the arguments set up so far.

1.4.1. Information theoretic assumptions of a suitable nature

The problem about trying to develop information theoretic hypotheses to act as justification for ignoring the difference between two events is that from an absolute point of view, one might just as well confuse two events with $d(X, Y)$ large as with $d(X, Y)$ small: there is no deep reason for preferring pairs of the second sort. It is natural to hope that in some sense, less information is lost by confusing nearby events, but in order for this to be true, something has to be assumed about the way two events can be compared.
This effectively means comparing them to one, or a family of, reference distributions, whose choice must be arbitrary, and equivalent to some statement that nearby events are related. The theory thus becomes self-defeating, and the realization that this must be so allows exactly one observation to be made: namely that information theoretic arguments alone can never suffice to form a basis for a neurophysiological theory.

1.4.2. Landslide

The mountain structure of §1.3.2 depends on two things: the environmental probability distribution $p$, and the metric $d$. But it has been shown in §1.4.1 that the particular choice of $d$ for the metric cannot be justified in any absolute way. The view that these mountains are important can therefore receive no support from any theory, based solely on ideas about storage, which does not assume that the first information to be thrown out is that which distinguishes the different parts of one mountain. In order to see how this might in fact be so, it is therefore necessary to return to the real world, to discover how some information may be important, while some may be expendable.

1.5. Biological utility

1.5.0. The general argument

The question with which this section is concerned is why should it ever be an advantage to classify together the events of a mountain. To answer this requires a clear idea of what the brain classifies for: only when this is known can it be deduced what kind of information is irrelevant, and hence which events may be classified together. The answer which will be proposed is that the classifications the brain eventually derives are ones which allow the deduction of the presence or absence of a property or properties, not necessarily directly observable, from such information as is at the time available.
The word 'property' means here a slightly generalized idea of a feature: that is, it includes specifications of things an object can do, or can have done to it, as well as, for example, the sound it makes or the colour it has.

1.5.1. Examples

It is helpful at this point to give some concrete instances of the general statement made above. In its purest form, it implies a simple learning device, to which instances of the property concerned are transmitted through one channel, while information from which this property is to be diagnosed is conveyed through another. This corresponds exactly to the situation proposed for the cerebellar cortex in a recent theory of that structure (Marr 1969): the first channel is the climbing fibre input, and the second, the mossy fibres. There clearly exist stern limitations to this idea in any more general application, since in the cerebellar model, a property can only be diagnosed in conditions which are virtually a replica of a previous state in which the property was known to hold. It is, nevertheless, a primitive example of the central idea.

The property concerned need not be the immediate implementation of a particular elemental movement: it might be whether or not a particular branch can support the weight of a particular monkey. The animal concerned clearly needs to be able to make this discrimination, and to be able to do so by methods other than direct experiment. The information available is the appearance of the branch, from which it is possible to produce a reliable estimate of its strength. It is supposed that the animal used data obtained by direct experiment (in play during his youth), to set up the appropriate classification apparatus.

These two cases illustrate the idea of a classificatory scheme designed for the diagnosis of properties not directly or immediately observable. It is helpful to make the following definition.
An intrinsic property is one the presence or absence of which is known, and which is being used to decide whether another property is present. The word 'intrinsic' is used for this because if a property-detecting fibre $a_i$ is in the support of a space over which there is a mountain, then that property is in a real sense an intrinsic part of the structure of the mountain. The second part of the definition follows naturally: an extrinsic property is one whose presence or absence is currently being diagnosed. These two words have only a local meaning: they are simply a useful way of describing which side of a decision process a particular property lies.

Classification for biological utility may therefore be regarded as the diagnosis of important but not immediately observable properties from information which is easy to obtain; and although this to some extent begs the question of what is an important property, it nevertheless represents some advance. Its strength is that it shows what information may be lost: namely the difference between events which lead to a correct diagnosis of a given property. The weakness of this approach is that it contains no scope for generalization from situations in which a property is known to hold, to new situations; and therefore seems to reduce operations in the brain to a simple form of memory.

1.5.2. The dichotomy

It may fairly be said that the remarks of this and the last sections force a dichotomy. On the one hand, there are the attractive and elegant ideas associated with coding for features, and their connexion with mountains and pure classification theory. These have been shown to be an insufficient basis for a theory, but they have a strong intuitive appeal. On the other hand, there are the nakedly practical ideas associated with strict biological utility.
These have the advantage of giving a criterion for what information can be ignored, but in this crude shape, they suggest a memorizing system which performs more or less by brute force. There is no hope for either of these approaches unless they can be reconciled; and for this task, the next section is reserved.

1.6. The fundamental hypothesis

1.6.0. The nature of a reconciliation

Before trying to discover how these two views may be united, one must have a clear idea of the nature of any statement which could bring them together. The first view was of a kind of classification scheme which might be used by the brain. It consisted of selecting regions of commonly occurring subevents in event spaces over a collection of feature-detecting fibres, such that the subevents selected differed rather little from one another. The second view suggested that the main function of the analysis of sensory information was to deduce properties of importance to the needs of the animal from such information as is available. These can only be reconciled if classification by mountain selection does prove a good guide to the presence of important properties: to decide whether this is so, properties of the real world must be considered.

1.6.1. Validity for properties which are usually intrinsic

Let $\mathfrak{X}$ be the event space on the feature-detecting fibres $\{a_1, \ldots, a_N\}$, and let $\lambda$ be the probability distribution induced over $\mathfrak{X}$ by the environment. $d$ is the natural metric defined in §1.3.2. In a general input subevent, the value of each fibre will be 0, or 1, or will be undefined. The last case can arise, for example, in the case of visual information, when part of an object is hidden behind something else. In this way, a property which is usually observable may sometimes not be. It will now be shown that classes obtained by lumping together events of a mountain over $(\mathfrak{X}, d)$ can usually act as diagnostic classes for such properties.

FIGURE 1. An illustration of the form of redundancy being discussed: the probability distribution $\lambda$ induced by the environment over $N_s(X)$ has non-zero values only in $N_r(X)$.

Let $X \in \mathfrak{X}$ be an event of $\mathfrak{X}$, and let $N_r(X) = \{Y \mid Y \in \mathfrak{X} \text{ and } d(X, Y) < r\}$. A 'mountain' in $\mathfrak{X}$ might correspond to some distribution like $\lambda$ where [displayed equation lost in transcription] where $s > r$, $r$ is small, and $K$ is some positive constant. As soon as enough values of the $a_i$ are known to determine an event as lying within $N_s(X)$, it follows that the event lies within $N_r(X)$ (see figure 1). Write $p_i$ = probability that ($a_i = 1$ given $E \in N_r(X)$). Then if an event is diagnosed as falling within $N_r(X)$ without knowing the value of $a_i$, it can be asserted that $a_i = 1$ with probability about $p_i$. This is useful if $p_i$ is near 0 or 1.

This kind of effect is a natural consequence of any mountain-like structure of $\lambda$ over $\mathfrak{X}$, and allows that, in certain circumstances, these classes can be used to diagnose properties which are usually intrinsic. The values of $a_i$ are not necessarily as expected (the piece of the object that is hidden may in fact be broken off); but the spikier the mountain (i.e. the smaller the local variance of $\lambda$), the nearer the $p_i$ will be to 0 or 1, and the more certain the outcome.

1.6.2. Extrinsic properties

The argument for this kind of classification is that whenever there is a tendency for intrinsic properties to occur together in this way, it is extremely likely that there will also exist other properties, perhaps not directly observable ones, which also generalize over such groups of events. Hence, although the reason may not at the time be apparent, it will be good strategy for the animal to tend to make these classifications. Thus later, when a property is discovered to hold for one event in a given class of events, the animal will be inclined to associate it with members of the whole class.
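The diagnosis of a hidden fibre described in §1.6.1 can be sketched as follows. This is an illustrative reading, not the paper's own procedure: $p_i$ is estimated from a hypothetical record of past events falling within $N_r(X)$, a hidden fibre is represented by `None`, and an input is accepted only if it must lie within $N_s(X)$ whatever the hidden values turn out to be. The function names and data are assumptions.

```python
def hamming(x, y):
    """d(X, Y): number of fibres at which two complete events disagree."""
    return sum(a != b for a, b in zip(x, y))

def fibre_probabilities(past_events, centre, r):
    """Estimate p_i = P(a_i = 1 | E in N_r(centre)) from past complete events.

    Assumes at least one past event lies within N_r(centre).
    """
    inside = [e for e in past_events if hamming(centre, e) < r]
    return [sum(e[i] for e in inside) / len(inside) for i in range(len(centre))]

def diagnose_hidden(partial, centre, past_events, r, s):
    """Return {fibre index: p_i} for hidden fibres, or None if the input
    cannot be placed within N_s(centre) from its known fibres alone."""
    hidden = [i for i, v in enumerate(partial) if v is None]
    known_disagreements = sum(
        1 for v, c in zip(partial, centre) if v is not None and v != c
    )
    # Worst case: every hidden fibre disagrees with the centre.
    if known_disagreements + len(hidden) >= s:
        return None
    p = fibre_probabilities(past_events, centre, r)
    return {i: p[i] for i in hidden}

# Hypothetical mountain centred on X over five fibres.
X = (1, 1, 1, 0, 0)
past = [X] * 6 + [(1, 1, 1, 0, 1)] * 2 + [(1, 1, 0, 0, 0)] + [(0, 0, 0, 1, 1)] * 3

occluded = (1, 1, None, 0, 0)   # fibre 2 is hidden
result = diagnose_hidden(occluded, X, past, r=2, s=3)
far_away = diagnose_hidden((0, 0, None, 1, 1), X, past, r=2, s=3)
print(result, far_away)
```

In this example the hidden fibre is asserted to be 1 with probability 8/9, while the second input is too far from $X$ to be diagnosed from this mountain. A spikier mountain would push the estimate nearer to 0 or 1.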
The generalization may or may not be found to be valid, but as long as it is successful sufficiently often, the animal will survive.

One other way of looking at this kind of generalization is to alter slightly the way one expresses the relevant kind of redundancy. It is equivalent to the assertion that once a context is sufficiently determined, one property may be a reliable indicator of another. The example cited earlier was of a monkey judging the strength of a branch. In practice, the thickness of a branch of a tree is a fairly reliable indicator of its strength, so that unless the branch is rotten, it will support the monkey if it is thick enough. Rottenness, too, can be visually diagnosed, so that a completely reliable assessment can be made on the basis of visual information alone. The context within which thickness and strength are related is roughly that the object in question is a branch of a tree, and is not rotten.

This kind of relationship is common in everyday experience; so common indeed that further examples are unnecessary. But although the general notion of this kind of redundancy has a clear importance, it is not obvious how the details might work in any particular case, nor that they may work the same way in any two. This problem must be tackled before any methods can be given for prescribing limits to the classes.

1.6.3. Refining a classificatory unit

The rough heuristic for picking out likely looking classes has been discussed at length. It was hinted that there may exist no a priori 'correct' way of assigning limits: where, for example, is the boundary between red and orange? The view that the present author takes is that although there are likely to exist fairly good general heuristics for class delimitation (like some kind of convexity property analogous to that which the cluster analysts use), there are probably no universal rules.
It will be extremely difficult to give even these heuristics a satisfactory physical derivation: the kind of argument required is very indirect. But to say there exist no precise, generally applicable rules is merely to say that different properties have different relations to their indicators, and so is not very surprising. If, for example, an important extrinsic property is attached to a group of subevents, then its cessation marks the boundary of the class. If the property ceases to hold in a gradual way, the class will have problematical boundaries. This does not necessarily mean the class is not a useful one: the dubious cases may be rare, or may fall less dubiously into other classes. In any case, those falling well inside will be usefully dealt with.

It is therefore proposed that the exact specification of the boundaries to the classes should proceed by experiment. A new class is tentatively formed, upon the discovery of a promising mountain. If it turns out to have no attached extrinsic properties, it probably remains a slightly vague curiosity. If an extrinsic property more or less fits the provisional class, its boundary can be modified in a suitable way: this operation requires simple memory. If an extrinsic property is attached to it in no very sensible way (that is, instances of the property are scattered randomly or inconsistently over the class) then the class is no use as a reliable indicator, even with the available scope for shifting the boundaries. This does not necessarily render the class useless, for the property might be one which puts the animal in danger, and the class may contain all inputs associated with this kind of danger. For example, only a few kinds of snake are dangerous, but the class of snakes includes the class of dangerous snakes. It may be impossible to produce a reliable classification of snakes into dangerous and not dangerous without classifying some of them by species.
This requires the consideration of more information than is necessary for diagnosis as a snake, and may be impossible without a potentially lethal investigation.

The investigation of the viability of a prospective class should probably be a very flexible process, drawing on the play of an animal when it is young, and upon the experience of life later on. Those classes which turn out, with slight alteration, to be useful will survive, while those which do not will not. Provided the initial class selection technique is neither wrong too often, nor fails too frequently to provide a guess where it should, the animal will be well served; and an instinct to explore his surroundings should enable him to remove any important errors.

1.6.4. The Fundamental Hypothesis

The conditions for the success of the general scheme of classification by mountain selection with later adjustments can now be explicitly characterized. It will work whenever an extrinsic property is stable over small changes in its diagnostic intrinsic properties. A given extrinsic property may possess more than one cluster of intrinsic properties which diagnose it, but as long as this condition is satisfied within each, the scheme will work. If a small change in intrinsic properties destroys an extrinsic property, either the boundary of the class passes near that point, or this extrinsic property cannot be diagnosed this way. In the former case, slight boundary changes can probably accommodate the situation: in the latter, there are two possible remedies. Either instances of the extrinsic property can be learned by rote (this can only be successful if the relationship of the extrinsic to the intrinsic properties is fixed, and it is in any case arduous); or the intrinsic context has to be recoded. To the general recoding problem, there exists no general solution (by the remarks of §1.2.2).
The present theory is thus based on the existence of a particular kind of redundancy, not because it is redundancy as such, but because it is a special, useful sort. This is expressed by the following

Fundamental Hypothesis: Where instances of a particular collection of intrinsic properties (i.e. properties already diagnosed from sensory information) tend to be grouped such that if some are present, most are, then other useful properties are likely to exist which generalize over such instances. Further, properties often are grouped in this way.

2. THE FUNDAMENTAL THEOREMS

2.0. Introduction

The discussion has hitherto been concerned with the type of analysis which may be expected in the brains of sophisticated living animals. It was suggested that an important aspect of the computations they perform is the induction of extrinsic from intrinsic properties. This conclusion introduces three problems: first, collections of frequent, closely similar subevents have to be picked out. The Fundamental Hypothesis asserts that it is sensible to deal with such objects. This problem, the discovery problem, is dealt with in § 5. Secondly, once a subevent mountain has been discovered, its set of subevents must be made into a new classificatory unit: this is the representation problem, and is dealt with in § 4. Finally, on the basis of previous information about the way various extrinsic properties generalize over these collections of subevents, it must be decided whether any new subevent falls into a particular class. This is the diagnosis problem, and is dealt with now.

2.1. Diagnosis: generalities

A common method for selecting the hypothesis from a set $(\Omega_1, \ldots, \Omega_n)$ which best fits the occurrence of an event E is to choose that $\Omega_i$ which maximizes $P(E|\Omega_i)$. Such a solution is called the maximum likelihood solution, and is the idea upon which the theory of Bayesian inference rests (see e.g. Kingman & Taylor 1966, p.
274, for a statement of Bayes's theorem). This method is certainly the best for the model in which it is usually developed, where the $\Omega_i$ may be regarded as random variables, and the conditional probabilities $P(E|\Omega_i)$, for $1 \le i \le n$, are known. The maximum likelihood solution will, for example, show how, and at what odds, one would have to place a bet on the nature of E in order to expect an overall profit. It is of course important to know all the conditional probabilities; and if the $\Omega_i$ are not independent, various complications can arise.

The situation with which the present theory must deal is different in several ways, of which two are of decisive importance. First, the prime task of the diagnostic process is to deal with events $E_j$ which have never been seen before, and hence for which conditional probabilities $P(E_j|\Omega_i)$ cannot be known. It will further often be the case that $E_j$ occurs only once in a brain's lifetime, yet that brain may correctly be quite certain about the nature of $E_j$. Secondly, the prior knowledge available for inferring that $E_j$ is (say) an $\Omega_i$ comes from the Fundamental Hypothesis. That is, the knowledge lies in the expectation that if $E_j$ is 'like' a number of other $E_k$, all of which are an $\Omega_i$, then $E_j$ is probably also an $\Omega_i$. This does not mean that $P(E_j|\Omega_i)$ is likely to be about the same as $P(E_k|\Omega_i)$: frequency and similarity are quite distinct ideas. Hence if the Fundamental Hypothesis is to be used to aid in the diagnosis of classes (the assumption on which the present theory largely rests) then that diagnosis is bound to depend upon measurements of similarity rather than upon measurements of frequencies.

The analysis of frequencies of the events $E_j$ is therefore relatively unimportant in the solution of the diagnosis problem; but it is of course extremely important for the discovery problem.
The prediction that a particular classificatory unit will be useful rests upon the discovery that subevents often occur which are similar to some fixed subevent: the role of frequency here is transparently important. But when the new classificatory unit has been formed, diagnosis itself rests upon similarity alone.

An example will help to clarify these ideas. The concept of a poodle is clearly a useful one, since animals possessing most of the relevant features are fairly common. Further, a prize poodle is in some sense a poodle par excellence, and is as 'like' a poodle as one can get; but it is also extremely rare. The essential point seems to be that in a prize poodle are collected together more, and perhaps all, of the features upon which diagnosis as a poodle depends (or ought, in the eyes of poodle breeders, to depend).

These arguments imply that for the diagnosis of classificatory units by the brain, Bayesian methods are probably not used. Conditional probabilities of the form $P(E|\Omega)$ are thus largely irrelevant. The important question, when trying to decide whether E is an Ω, is how many of the events like E are definitely known to be an Ω. The computation of this raises entirely different issues.

2.2. The notion of evidence

The diagnosis of an input requires that an informed guess be made about it on the basis of the results for other inputs. If, for example, the present input E (say) has already occurred in the history of the brain, and has been found to deserve classification in a particular class, then its subsequent recognition as a member of that class is strictly a problem of memory, not of diagnosis. On the other hand, E may never have occurred before, though it might be that all E's neighbours have occurred, and have been classified in a particular way. The Fundamental Hypothesis asserts that this is good ground for classifying E in the same way.
The existence of an event similar to E, and known to be classified as, say, an Ω, therefore constitutes evidence that E should also be classified as an Ω. It will be clear that the more such events there are, the stronger the case for classifying E as an Ω.

It is appropriate to make two general remarks about evidence. The first concerns the absolute weight of evidence provided by Ω-classified events at different distances from E. Any theory must allow that for some categories of information, nearby events constitute strong evidence, whereas for others, they do not. Diagnoses within different categories will not necessarily employ the same weighting functions in the analyses of their evidence.

The second point about evidence concerns its adequacy. It may, for example, never be possible to diagnose correctly the class or property on the basis of evidence from events on the fibres $\{a_1, \ldots, a_N\}$: they simply may not contain enough information. On the other hand they may contain irrelevant information, whose effect is to make the classifying task appear to be more difficult than it really is. This observation emphasizes the importance of picking the support of the mountain correctly.

The requirements of the diagnostic system can now be stated. It must:
(i) Operate only over a suitably chosen space of subevents (suggested by the Simple Memory). This space is called the diagnostic space for the property in question, Ω.
(ii) Record, as far as condition (iii) requires, which events of the diagnostic space have hitherto been found to be Ω's or not to be Ω's.
(iii) Be able, given a new event E, to examine events near E, discover whether they are Ω's or not, apply the weighting function appropriate to the category of Ω, and compute a measure of the certainty with which E itself may be diagnosed as an Ω.

The three crucial points now become:
P1. How is the evidence stored?
P2.
How is the stored evidence consulted?
P3. What is the weighting function (of (iii))?

The solutions to these which are proposed in this paper are not unique, but it is conjectured that they are the solutions which the nervous system actually uses. The key idea is that of an evidence function, which will in practice turn out to be a subset detector analogous to a cerebellar granule cell. The three points are resolved in the following way:
P1. Evidence is stored in the form of conditional probabilities at modifiable synapses between 'evidence function' cells and a so-called 'output cell' for Ω (eventually identified with a cortical pyramidal cell).
P2. Evidence is consulted by applying an input event E, which causes evidence cells relevant to E to fire. The output cell then has active afferent synapses only from the relevant evidence cells. The exact way in which it deals with the evidence is analysed in § 2.3.
P3. The weighting function comes about because nearby events will use overlapping evidence cells, just as very similar mossy fibre inputs are translated into firing in overlapping collections of cerebellar granule cells. The exact size of subset detector cells used for collecting evidence depends upon the category of Ω: recognition of speech may, for example, require a generally higher subset size than the 4 or 5 used in the cerebellar cortex.

Let $\mathfrak{X}$ be the diagnostic space for Ω, and let c be a function on $\mathfrak{X}$ which takes the value 0 or 1. c may, for example, be a detector of the subset A' of input fibres, in which case, for E in $\mathfrak{X}$, c(E) = 1 if and only if the event E assigns the value 1 to all the fibres in the collection A'; but c can in general be any binary function on $\mathfrak{X}$. Let P(Ω|c) denote the conditional probability (measured in the brain's experience so far) that the input is an Ω given that c = 1.

Definition. The pair (c, P(Ω|c)) is called the evidence for Ω provided by the evidence function c.
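The definition of evidence can be made concrete with a small sketch. It is an illustration only, under stated assumptions: an event is represented as the set of fibres carrying the value 1, the "brain's experience so far" is a plain list of (event, was it an Ω) pairs, and the names `subset_detector` and `conditional_probability` are hypothetical, not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

# Assumption: an event is the set of input fibres that carry the value 1.
Event = FrozenSet[int]
EvidenceFunction = Callable[[Event], int]


def subset_detector(fibres: Iterable[int]) -> EvidenceFunction:
    """Return a binary function c with c(E) = 1 iff every fibre in the subset is active in E."""
    subset = frozenset(fibres)

    def c(event: Event) -> int:
        return 1 if subset <= event else 0

    return c


def conditional_probability(
    c: EvidenceFunction, experience: List[Tuple[Event, bool]]
) -> Optional[float]:
    """Estimate P(Omega | c = 1) as a relative frequency over past, classified events.

    Returns None when c has never taken the value 1, since no estimate exists then.
    """
    outcomes = [is_omega for event, is_omega in experience if c(event) == 1]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Hypothetical experience: three past events and whether each was an Omega.
experience = [
    (frozenset({1, 2, 3}), True),
    (frozenset({1, 2, 4}), False),
    (frozenset({2, 3, 4}), True),
]
c = subset_detector({1, 2})
evidence = (c, conditional_probability(c, experience))  # the pair (c, P(Omega|c))
```

The relative-frequency estimate is one simple reading of "measured in the brain's experience so far"; the paper stores the corresponding quantity at modifiable synapses rather than in a list.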
The most important evidence functions are essentially subset detectors (justified in § 4.2.1), and it is convenient to give these functions a special name.

Definitions. (i) For all E in $\mathfrak{X}$, let c(E) = 1 if and only if $E(a_i) = 1$, $1 \le i \le r \le N$. In this case, c is called an r-codon, or r-codon function, and is essentially a detector of the subset $\{a_1, \ldots, a_r\}$ of the input fibres.
(ii) For all E in $\mathfrak{X}$, let c(E) = 1 if and only if at least θ of the $E(a_i) = 1$, $1 \le i \le R \le N$. In this case, c detects activity in at least θ of the R fibres $\{a_1, \ldots, a_R\}$, and is called an (R, θ)-codon.

The larger the subset size, the fewer events E exist which have c(E) = 1, and so the more specifically c is tied to certain events in the space $\mathfrak{X}$. Let $|\mathfrak{X}|$ denote the number of events in $\mathfrak{X}$, and let K be the number of events E in $\mathfrak{X}$ with c(E) = 1: then the fraction $K/|\mathfrak{X}|$ is called the quality of the evidence produced by c in $\mathfrak{X}$. The qualities of various kinds of codon function are derived in § 3.2.

2.3. The diagnosis theorem

The form of evidence has now been defined, and the rules for its collection have been set out. The information gained from the classification of one event, E, has been transferred to its neighbours in so far as they share subsets with E, and the subsets can be chosen to be of a size suitable for information of the category containing Ω. Thus problems P1 and P3 of § 2.2 have been solved in outline: the details are cleared up in §§ 3 and 4. It remains only to discover the exact nature of the diagnostic operation: that is, to see exactly what function of the evidence consulted about E should serve as a measure of the likelihood that E is an Ω.

The problem may be stated precisely as follows. Let $\mathfrak{D} = \{(c_i, P(\Omega|c_i))\}_{i=1}^{M}$ be the collection of evidence available for the diagnosis of Ω over the space of events $\mathfrak{X}$.
Let E be an event in $\mathfrak{X}$, and suppose $c_i(E) = 1$ ($1 \le i \le k$), $c_i(E) = 0$ ($k < i \le M$). That is, the evidence relevant to the diagnosis of E comes only from the functions $c_1, \ldots, c_k$, and is in the form of numbers $P(\Omega|c_1), \ldots, P(\Omega|c_k)$. The question is, what function of these numbers should be used to measure how certain it is that E is an Ω?

The answer most consistent with the heuristic approach implied by the Fundamental Hypothesis is that function which gives the best results; this may be different for different categories. But a general theory must be clear about basic general functions if it can, and an abstract approach to this problem produces a definite and simple answer.

Suppose that, in order to obtain some idea of what this function is in the most general case, one assumes nothing except that E has occurred, and that the relevant evidence is available. Then E effectively causes k different estimates of the probability of Ω to be made, since k of the $c_i$ have the value one, and $P(\Omega|c_i = 1)$ is the information that is available. That is, E may be regarded as causing k different measurements of the probability that Ω has occurred. The system wishes to know what is the probability that Ω has actually occurred; and the best estimate of this is to take the arithmetic mean of the measurements. This suggests that the function which should be computed is the arithmetic mean of the probabilities constituting the available, relevant evidence; in other words, that the decision function, written Γ(Ω|E), has the form

$$\Gamma(\Omega|E) = \sum_{i=1}^{M} c_i(E)\,P(\Omega|c_i) \Big/ \sum_{i=1}^{M} c_i(E).$$

The conclusion one may draw from these arguments is that if one takes the most general view, assuming nothing about the diagnosis situation other than the evidence which E brings into play, then the arithmetic mean is the function which measures how likely it is that E is an Ω. The diagnosis theorem itself simply gives a formal proof of this.
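The decision function just described, together with the two kinds of codon defined in § 2.2, can be sketched in a few lines. This is a minimal illustration, not the paper's neural implementation: events are assumed to be sets of active fibres, the names `r_codon`, `threshold_codon` and `gamma` are hypothetical, and returning `None` when no evidence function is active is a choice made here because the ratio is then undefined.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

# Assumption: an event is the set of input fibres that carry the value 1.
Event = FrozenSet[int]
EvidenceFunction = Callable[[Event], int]


def r_codon(fibres: Iterable[int]) -> EvidenceFunction:
    """r-codon: takes the value 1 iff all r fibres of the subset are active."""
    subset = frozenset(fibres)
    return lambda event: 1 if subset <= event else 0


def threshold_codon(fibres: Iterable[int], theta: int) -> EvidenceFunction:
    """(R, theta)-codon: takes the value 1 iff at least theta of the R fibres are active."""
    subset = frozenset(fibres)
    return lambda event: 1 if len(subset & event) >= theta else 0


def gamma(event: Event, evidence: List[Tuple[EvidenceFunction, float]]) -> Optional[float]:
    """Arithmetic mean of P(Omega|c_i) over the evidence functions with c_i(event) = 1."""
    relevant = [p for c, p in evidence if c(event) == 1]
    if not relevant:
        return None  # no evidence function is active, so the ratio is undefined
    return sum(relevant) / len(relevant)


# Hypothetical stored evidence: pairs (c_i, P(Omega|c_i)).
evidence = [
    (r_codon({1, 2}), 0.9),
    (r_codon({2, 3}), 0.5),
    (r_codon({4, 5}), 0.1),
]
score = gamma(frozenset({1, 2, 3}), evidence)  # mean over the first two pairs only
```

Only the codons whose subsets lie inside the new event contribute, which is how events sharing many subsets with past Ω-classified events come to receive a high score.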
The meaning of the result is discussed in § 2.4.

Lemma (Sibson 1969). Let $T_i$ be a random variable which takes the value 0 with probability $q_i$, and 1 with probability $p_i = (1 - q_i)$, for $1 \le i \le l$. Let T be another such variable, with corresponding probabilities q and p. Let p, q be chosen to minimize $\sum_i I(T_i|T)$, and let $p_0 = (1/l)\sum_{i=1}^{l} p_i$. Then $p = p_0$, and is unique.

Proof. Let $p_0 = (1/l)\sum_i p_i$, with $p_0 \neq 0, 1$, and let $T_0$ be its corresponding binary-valued random variable. Hence

$$\sum_i I(T_i|T) = \sum_i I(T_i|T_0) + l\,I(T_0|T),$$

and I is always $\ge 0$. Thus

$$\sum_i I(T_i|T) \ge \sum_i I(T_i|T_0),$$

equality occurring only when $I(T_0|T) = 0$, i.e. when $T = T_0$. Hence the minimum value of $\sum_i I(T_i|T)$ is achieved uniquely when $p = p_0$.

Diagnosis theorem. Let Ω be a binary-valued random variable, and let $p_1, \ldots, p_k$ be independent estimates of the probability p that Ω = 1. Then the maximum likelihood estimate for p is $p_0 = (1/k)\sum_i p_i$.

Proof. The estimate $p_i$ of p may be regarded as being made through noise whose effect is to change the original binary signal Ω, which has distribution $(p, 1 - p)$, into the observed binary random variable $T_i$ (say), with distribution $(p_i, 1 - p_i)$. The information gain due to the noise is $I(T_i|\Omega)$. Hence that value of p which attributes least overall disruption to noise, and is therefore the maximum likelihood solution, is the one which minimizes $\sum_i I(T_i|\Omega)$. By the lemma, p is unique and equals $p_0$, the arithmetic mean of the $p_i$.

This result applies when the $p_i$ are independent, or are so to speak symmetrically correlated. For example, if $T_1, \ldots, T_{k-1}$ are independent, but $T_k = T_{k-1}$, the result is clearly inappropriately weighted towards $T_k$. On the other hand, if k is even, and $T_1 = T_2$, $T_3 = T_4$, ..., $T_{k-1} = T_k$, this is not harmful. The general condition is complicated; but if $c_1, c_2, \ldots, c_M$ form a complete set of r-codons over the fibres $\{a_1, \ldots, a_N\}$, or a large random sample of such r-codons, then they are symmetrically correlated in the above sense.
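The lemma can be checked numerically. The sketch below assumes that the information gain $I(T_i|T)$ is the Kullback-Leibler divergence of the distribution $(p_i, 1 - p_i)$ from $(p, 1 - p)$, which is the reading under which the identity used in the proof holds; the function names and the sample estimates are hypothetical.

```python
import math
from typing import List


def info_gain(p_i: float, p: float) -> float:
    """Assumed form of I(T_i|T): KL divergence of (p_i, 1 - p_i) from (p, 1 - p), in nats."""
    total = 0.0
    for a, b in ((p_i, p), (1.0 - p_i, 1.0 - p)):
        if a > 0.0:  # a term with zero probability contributes nothing
            total += a * math.log(a / b)
    return total


def total_gain(estimates: List[float], p: float) -> float:
    """Sum of I(T_i|T) over all estimates, for a candidate value p."""
    return sum(info_gain(p_i, p) for p_i in estimates)


# Hypothetical estimates p_1, ..., p_k.
estimates = [0.9, 0.6, 0.75]
mean = sum(estimates) / len(estimates)

# Search a grid of candidate values of p strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
best = min(grid, key=lambda p: total_gain(estimates, p))
```

On this grid the minimizing value coincides with the arithmetic mean of the estimates, as the lemma asserts; a grid search is used here only to keep the check free of any optimization library.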
$p = p_0$ gives the best single description of $p_1, \ldots, p_k$ in the sense that it minimizes $\sum_i I(T_i|T)$. The diagnosis theorem deals with a situation in fact rather far removed from the real one, and the next section is concerned with reservations about its application. It is not clear that any single general result can be established in a rigorous way for this diagnostic situation.

2.4. Notes on the diagnosis theorem

The key idea behind the present theory is that the brain decomposes its afferent information into what are essentially its natural cluster classes. The classes thus formed may be left alone, but are likely to be too coarse. They will often have to be decomposed still further, until the clusters fall inside the classes which in real life have to be discriminated; and they will often later have to be recombined, using, for example, an 'or' gate, into more useful ones, like specific numeral or letter detectors. These various operations are of obvious importance, but the basic emphasis of this approach is that the natural generalization classes in the naïve animal are the primary clusters. Diagnosis of a new input is achieved by measuring its similarity to other events in a cluster, and the similarity measure Γ of § 2.3 is proposed as suitable for this purpose. Its advantages are that it can be derived rigorously in an analogous situation in which the $c_i$ are proper random variables; and that the result does not absolutely require that the $c_i$ be independent. Moreover, the conditions under which dependence between the $c_i$ is permissible (the 'symmetric' correlation of § 2.3) include those (when the $c_i$ are a large sample of r-subset detectors) which resemble their proposed conditions of use (§ 4). Nevertheless, the inference that if Γ(Ω|E) is sufficiently high, then E is probably an Ω, rests upon the Fundamental Hypothesis.
This observation raises a number of points, about the structure of the evidence functions, and about ways in which exceptions to the general rule can be dealt with. The various points are discussed in the following paragraphs.

2.4.1. Codons for evidence

The validity of the statement that a high Γ(Ω|E) implies that E is an Ω rests upon the structure of the evidence functions used to obtain Γ. The neural models of § 4 employ codons (i.e. subset detectors), but their physiological simplicity is not their only justification. In § 4.2 it is shown, as far as the imprecision in its statement allows, that the Fundamental Hypothesis requires the use of rather small subset detectors for collecting evidence. It is not clear that advantage can at present be gained by sharpening the arguments set out there.

2.4.2. Use of evidence of approximately uniform quality

The reason for using functions $c_i$ over $\mathfrak{X}$ at all, rather than simply collecting evidence with fibres $a_j$, is that the untransformed $a_j$ would often not produce evidence of suitable quality. It may be possible simply to use fibres, especially for storing associational evidence (see § 2.4.5); but it is probably also often necessary to create very specific codon functions giving high quality evidence for very selective classificatory units. This process must involve learning whenever the classes concerned are too specialized for much information about them to be carried genetically.

The quality of a piece of evidence is a measure of how specific it is to certain events in the diagnostic space $\mathfrak{X}$. In general, a given diagnostic task will require discriminations to be made above a minimum value p (say) of Γ, and the quality of the evidence used will have to be sufficient to achieve such values of Γ.
The higher the quality of the evidence, the more there has to be to provide an adequate representation of $\mathfrak{X}$; and hence economy dictates that evidence for a particular discrimination should have as poor a quality as possible, subject to the condition on Γ. Evidence of less than this minimal quality will serve only to degrade the overall quality, and so must be excluded. Hence, evidence should tend to have uniform quality. Mixing evidence of greatly different qualities is in general wasteful. This condition is satisfied by the models of § 4, where evidence is provided by (R, θ)-codons, and most of the evidence for a single classificatory unit has the same values of R and θ.

2.4.3. Classifying to achieve a particular discrimination

The quality of the evidence functions for a particular classificatory unit depends upon the minimum value p of Γ which is acceptable for a positive diagnosis, and this in turn will depend on how fine are the local discriminations which have to be made. The size of the clusters diagnosing the numeral '2' (say) in the relevant feature space depends upon the necessity for discriminating '2' from instances of other numerals and letters. The usual condition is probably that the part of the diagnostic space (over the relevant features) occupied by instances of a '2' must be covered by clusters contained wholly in that part. This condition fixes the minimum permissible value of p for diagnosis of a '2', which in turn fixes the subset sizes over any given diagnostic space. There may however be important qualifications necessary about this approach: the observations of §§ 2.4.4 and 2.4.5 can seriously affect the value of p.

2.4.4. Evidence against Ω

Γ will be most successful as a measure for diagnosis when the properties being diagnosed are stable over small changes in the input event.
As E moves away from the centre of an Ω-cluster in the diagnostic space $\mathfrak{X}$, the values of P(Ω|c) where c(E) = 1 gradually decrease, and Γ decreases correspondingly. Provided these things happen reasonably slowly, all the remarks about symmetrical correlations of the evidence functions will hold in an adequate fashion.

The possibility must, however, be raised that within a general area of $\mathfrak{X}$ which tends to give a diagnosis of Ω, there exist special regions in which, for some reason, Ω does not hold. Provided the region in which Ω does not hold is itself a cluster within the larger Ω-cluster, this state of affairs is not inconsistent with the Fundamental Hypothesis. This contingency can be dealt with in the same way as the diagnosis of Ω, by collecting evidence for 'not Ω' (evidence against Ω) within either $\mathfrak{X}$, or a space related to $\mathfrak{X}$. The form of the analysis is exactly the same as for Ω, except that the classificatory unit for 'not Ω' must be capable of overriding that for Ω. It is of course important for the successful diagnosis of Ω that diagnostic spaces for Ω and for 'not Ω' should both be appropriate, and both have evidence functions of suitable quality: but the mechanism which discovers the diagnostic space $\mathfrak{X}$ for Ω can clearly be used to discover the appropriate space for 'not Ω'.

It is interesting that this situation corresponds exactly to one proposed for the primary motor cortex. It has been suggested by Blomfield & Marr (1970) that the superficial cortical pyramidal cells there detect inappropriate firing of deep pyramidal cells. They presumably detect clusters in information describing the difference between an actual and an intended movement. These clusters in effect correspond to the need for deletion of activity in certain deep pyramids (an instance of the Fundamental Hypothesis), and the superficial pyramids cause the deletions to be learned in the cerebellar cortex.
This distinction between the classes represented by deep and superficial cortical pyramidal cells may well not be restricted to area 4.

2.4.5. Competing diagnoses and contextual clues

It is often the case that a single retinal image could originate from two possible objects, yet contextual clues leave no doubt about which is the true source, and that source is the only one which is experienced. Such circumstances demonstrate the great importance of indirect information to the correct diagnosis of a sensory input. The present theory contains three ways by which such information may affect a diagnosis.

First, contextual information (for example, concerning the place one is in) may be included in the specification of the diagnostic space for Ω. There presumably exist classificatory units in one's brain for the places in which one commonly finds oneself, and other units which describe less common locations more pedantically: and these probably either fire all the time one is in the appropriate location, or (roughly) fire whenever other parts of the brain 'ask' where one is. Such information may be treated like more conventional sensory input.

Secondly, diagnostic criteria within categories can be relaxed by changing p. It is analogous to the ideas proposed in explanation of the collaterals of the cerebellar Purkinje cells (Marr 1969; Blomfield & Marr 1970). A priori information is sometimes available which makes units in one category more likely to be present following the diagnosis of units in another. In such cases, a general relaxation of the minimum acceptable value p of Γ over the relevant category will be appropriate.

Thirdly, and perhaps most important, is the matter of 'associational' contextual information. No additional theory is required, since such information can be treated as evidence in the usual way.
It is probably for this kind of information that evidence functions are least often needed: direct association of classificatory unit detectors (cortical pyramidal cells) will often be adequate. The matter is touched on in § 4.1.8, and dealt with at more length in Marr (1971, § 2.4).

2.4.6. General remarks about Γ

The direct technical importance of the Fundamental Hypothesis to the application of the results of the diagnosis theorem raises the wider issue of the extent to which one can feel justified in applying information-theoretic arguments to the kind of situation with which the diagnosis theorem deals. The Fundamental Hypothesis simply summarizes the view that clusters are useful. This is a heuristic approach, and it is not obvious that the diagnosis problem deserves any better than a heuristic approach itself. It probably matters rather little exactly what measure of similarity or fit is used: the redundancies on which the success of the system depends are so gross that there is probably more than one working alternative to Γ. If this is so, the diagnosis theorem loses much of its importance as a derivation of the 'correct' measure, since there may be no genuine sense in which any measure is correct, as long as it has a certain general form. The measure Γ does however seem intuitively plausible, and the reader may be happy to accept it without much justification. Theorem 2.3 is the best argument this author has discovered in its support; but it is not binding.

The measure Γ can be given a direct meaning in terms of the events of $\mathfrak{X}$. Let $\mathfrak{X}_i$ be the set of events E of $\mathfrak{X}$ with $c_i(E) = 1$. Then $P(\Omega|c_i)$ is the probability that if an event of $\mathfrak{X}_i$ occurred, it was an Ω. Suppose that $\mathfrak{X}$ is the set of all events of size L on the fibres $\{a_1, \ldots, a_N\}$, and that the evidence functions $c_1, \ldots, c_M$ are the set of all r-codons. Let F be the new input event of $\mathfrak{X}$, which must be diagnosed; and let E be an arbitrary event of $\mathfrak{X}$.
Write d(E, F) = x, d being the usual distance function of § 1.2. The number of r-subsets which E and F share is $\binom{L-x}{r}$, taking $\binom{y}{x}$ to be zero when $y < x$. Hence the weighting function which describes the 'influence' of E on the diagnosis of F is

[equation lost in transcription]

Thus the arithmetic mean obtained by the theorem of § 2.3 is

[equation lost in transcription]

where λ is the probability distribution induced over $\mathfrak{X}$ hitherto by the environment.

2.5. The interpretation theorem

The diagnosis theorem 2.3 was concerned with the diagnosis of the property Ω over the diagnostic space $\mathfrak{X}$ on fibres $\{a_1, \ldots, a_N\}$. The events E in this situation specify the values of all the fibres $\{a_1, \ldots, a_N\}$; it will frequently occur in practice that some values of the $a_i$ will be undefined, and a decision has to be made on the basis of incomplete information. The problem is that this will mean that many of the evidence functions $c_i$ are also undefined, thus leaving little if any evidence actually accessible to the input in question.

For example, suppose a recognition system has been set up for a particular face: then a pencil sketch of that face can be recognized as such, even though much information (the colour of the eyes, skin, hair and so forth) is missing. Clearly a sketch can itself be analysed and set up as a new classificatory unit if that seems useful, and the mechanics of this process are the same as for the original. But this is a notion quite separate from the idea that the sketch is in some way related to the original face, and it is this idea with which the present section is concerned. The crux of the relationship is that the original face is the one which in some way best relates the sparse information contained in the features presented by the sketch. The result which follows characterizes this relationship precisely.

$\mathfrak{X}$, as usual, is the event space on $\{a_1, \ldots, a_N\}$.
Let X be a subevent of $\mathfrak{X}$ which specifies the values of (say) $a_1, \ldots, a_r$ for some $r < N$. Then the event E in $\mathfrak{X}$ is a completion of X, written $E \vdash X$, if
(i) E specifies the values of all $a_i$, $1 \le i \le N$,
(ii) $E(a_i) = X(a_i)$ where $X(a_i)$ is defined.

Let $G = \{c_i \mid 1 \le i \le M\}$ be the set of functions on $\mathfrak{X}$ which provide evidence for the diagnosis of Ω. Since X is not a full event of $\mathfrak{X}$, $c_i(X)$ is undefined ($1 \le i \le M$). Now there clearly exists a sense in which $c_i(X)$ might be defined: for example, either $c_i(E) = 1$ for all E in $\mathfrak{X}$ such that $E \vdash X$, or $c_i(E) = 0$ for all E in $\mathfrak{X}$ such that $E \vdash X$; but such a circumstance is exceptional, and cannot be relied upon to provide adequate diagnostic criteria.

Let $\{E_1, \ldots, E_K\}$ be the set of all completions of X in $\mathfrak{X}$. Then clearly if $\Gamma(\Omega|E_i)$ has the same value, y, for all $1 \le i \le K$, there are strong grounds for asserting that on the basis of the evidence from $\mathfrak{D}$, the estimate for $\Gamma(\Omega|X)$ is also y. This result is a special case of the following theorem. If $\Gamma(\Omega|X)$ denotes the maximum likelihood value of the probability of Ω given X, taken from the evidence, $\Gamma(\Omega|E)$ denotes the estimate arrived at in the diagnosis theorem, and $P(E_i|X)$ is a conventional conditional probability, then we have the

Interpretation theorem. Let X be a subevent of $\mathfrak{X}$ with completions $E_1, \ldots, E_K$. Then

$$\Gamma(\Omega|X) = \sum_{i=1}^{K} \Gamma(\Omega|E_i)\,P(E_i|X),$$

and is unique.

Proof. The argument is similar to that of the diagnosis theorem. Let $T_i(X)$ be a binary-valued random variable such that $T_i(X) = 1$ with probability $\Gamma(\Omega|E_i) = p_i$ (say), for each i, $1 \le i \le K$. Let $\Gamma(\Omega|X)$ correspond to a binary-valued random variable T where T(X) = 1 with probability p. Then each completion $E_i$ of X corresponds to an estimate, $p_i$, of p, and $P(E_i|X)$ specifies the weight to be attached to this estimate.
Hence by the same argument as that of the theorem 2.3, the maximum likelihood solution for T is that which minimizes

$$\sum_{i=1}^{K} P(E_i|X)\,I(T_i|T).$$

By an extension of the argument of the lemma 2.3, the value of p which achieves this is unique, and is

$$p = \sum_{i=1}^{K} P(E_i|X)\,p_i.$$

Hence

$$\Gamma(\Omega|X) = \sum_{i=1}^{K} \Gamma(\Omega|E_i)\,P(E_i|X),$$

and is unique.

Remarks. In general, no information about $P(E_i|X)$ will be available, so that $\Gamma(\Omega|X)$ will usually be the arithmetic mean of $\Gamma(\Omega|E_i)$ over those $E_i \vdash X$. This theorem shows that incomplete information should be treated in a way which looks like an extension of the methods used for complete information, and the reservations of § 2.4 apply equally here. The result does, however, have the satisfying consequence that the models of § 4 designed to implement the diagnosis theorem automatically estimate the quantity derived in the interpretation theorem when presented with an incompletely specified input event.

This section contains the technical preliminaries to the business of designing the concrete neural models which form the subject of the next. The results are mainly of an abstract or statistical nature, and despite the length of the formulae, are essentially simple.

3.1. Simple synaptic distributions

Let $\mathfrak{P}_1$, $\mathfrak{P}_2$ be two populations of cells, numbering $N_1$ and $N_2$ elements respectively. Suppose axons from the cells of $\mathfrak{P}_1$ are distributed randomly among the cells of $\mathfrak{P}_2$ in such a way that a given cell $c_1 \in \mathfrak{P}_1$ sends a synapse to a given cell $c_2 \in \mathfrak{P}_2$ with probability $x_{21}$. $x_{21}$ is called the contact probability for $\mathfrak{P}_1 \to \mathfrak{P}_2$. If L of the cells in $\mathfrak{P}_1$ are firing, the probability that a given cell $c_2 \in \mathfrak{P}_2$ receives synapses from exactly r active cells in $\mathfrak{P}_1$ is

$$\binom{L}{r} x_{21}^{\,r} (1 - x_{21})^{L-r}.$$

Hence the probability that $c_2$ receives at least R active synapses is

$$\sum_{r=R}^{L} \binom{L}{r} x_{21}^{\,r} (1 - x_{21})^{L-r} = X(R, L, x_{21}),$$

where $X(R, L, x_{21})$ is called the formation probability for $\mathfrak{P}_1 \to \mathfrak{P}_2$. Suppose the cells of $\mathfrak{P}_2$ receive synapses from no cells other than those of $\mathfrak{P}_1$, and that they have threshold R.
The probability that exactly $s$ cells in $\mathfrak{P}_2$ are caused to fire is
$$\binom{N_2}{s} X^{s} (1 - X)^{N_2 - s}, \quad \text{where } X = X(R, L, z_{21}). \qquad (3.1.3)$$
Hence the probability that at least $S$ fire is
$$\sum_{s \ge S} \binom{N_2}{s} X^{s} (1 - X)^{N_2 - s}.$$

It is of some interest to know how well represented the $L$ active cells of $\mathfrak{P}_1$ are by the cells of $\mathfrak{P}_2$ which they cause to fire. For most purposes, and all with which this paper is concerned, it is sufficient that any change in the cells which are firing in $\mathfrak{P}_1$ should cause a change in the cells of $\mathfrak{P}_2$. This is in general a complicated question, but a simple and useful guide is the following. Suppose the $L$ cells of $\mathfrak{P}_1$ cause exactly $R$ synapses to be active on each of $S$ cells of $\mathfrak{P}_2$. Then the probability that at least one of the $L$ active cells in $\mathfrak{P}_1$ sends a synapse to none of the active cells in $\mathfrak{P}_2$ is
$$(1 - R/L)^{S}.$$
If $R/L$ is small, this is approximately $e^{-RS/L}$.

3.2. Quality of evidence from codon functions

Codon functions, introduced in § 2.2, are associated with particular subsets of the input fibres in the sense that knowledge of the values of the fibres in a particular subset is enough to determine the value of the codon function. The larger the subset, the smaller the number of events at which the function takes the value 1, so the more specific that function is to any single event. Hence the general rule that $r$-codon functions provide better evidence the larger the value of $r$. This point is illustrated by the discrimination theorem which follows, and by various estimators of the quality of evidence to be expected from a codon function of a given size.

It is convenient to use the event space $\mathfrak{X}$ on fibres $\{a_1, \ldots, a_N\}$ such that in each event of $\mathfrak{X}$, exactly $L$ of the fibres $a_i$ have value 1. The set of such events is called the code of size $L$ on $\{a_1, \ldots, a_N\}$.
This involves no absolute restriction, but enables one to deal only with codon functions which assign the value 1 to all the fibres in their particular subsets, rather than allowing any arbitrary (but fixed) selection of 0's and 1's.

Let $\mathfrak{X}$ be the code of size $L$ on $\{a_1, \ldots, a_N\}$, and let $\mathfrak{F}$ be a set of events of $\mathfrak{X}$: for example, $\mathfrak{F}$ may be the set of events with the property $\Omega$. Let $\mathfrak{C}_r$ be the collection of all subsets of $\{a_1, \ldots, a_N\}$ of size $r$.

Definition. $\mathfrak{C}_r$ discriminates $\mathfrak{F}$ from the rest of $\mathfrak{X}$ if given $X \in \mathfrak{X}$, $X \notin \mathfrak{F}$, there exists a subset $C \in \mathfrak{C}_r$ such that $C \subseteq X$ but $C \not\subseteq Y$, for any $Y \in \mathfrak{F}$.

Theorem. Let $\mathfrak{F} \subset \mathfrak{X}$; then there exists a unique integer $R = R(\mathfrak{F})$ such that $\mathfrak{C}_r$ discriminates $\mathfrak{F}$ from $\mathfrak{X}$, all $r \ge R$.

Proof. If $\mathfrak{C}_r$ discriminates $\mathfrak{F}$ from $\mathfrak{X}$, any $\mathfrak{D}_r$ s.t. $\mathfrak{D}_r \supseteq \mathfrak{C}_r$ also discriminates $\mathfrak{F}$ from $\mathfrak{X}$. If $\mathfrak{F}$ can be discriminated by $\mathfrak{C}_r$ then $\mathfrak{F}$ can be discriminated by some set $\mathfrak{D}_{r+1}$ of $(r+1)$-subsets, since there will exist a set $\mathfrak{D}_{r+1}$ of $(r+1)$-subsets the set of whose $r$-subsets contains $\mathfrak{C}_r$. Finally, $\mathfrak{F}$ is always discriminated by $\mathfrak{C}_L = \{E \mid E \in \mathfrak{X}\}$. Hence there exists a unique lower bound $R$ s.t. $\mathfrak{F}$ is discriminated from $\mathfrak{X}$ by all $\mathfrak{C}_r$ for $r \ge R$.

This shows that for a given discrimination task, $\mathfrak{F}$ from $\mathfrak{X}$, for which codon functions are to be used, the codons must be bigger than some lower bound $R$ which depends on $\mathfrak{F}$.

Definition. $R$ is called the critical codon size for $\mathfrak{F}$, and is written $R_{\mathrm{crit}}$.

An a priori estimate of the likely value of the evidence obtained from a codon can be made by examining the number of events of various kinds over which the codon takes the value 1. Let $\mathfrak{X}$ be the code of size $L$ on $\{a_1, \ldots, a_N\}$: $\mathfrak{X}$ contains $\binom{N}{L}$ events. Let $\lambda$ denote the uniform probability distribution over $\mathfrak{X}$: i.e. $\lambda(E) = 1/\binom{N}{L}$, all $E \in \mathfrak{X}$; and for $\mathfrak{F} \subseteq \mathfrak{X}$ write $\lambda(\mathfrak{F}) = \sum_{E \in \mathfrak{F}} \lambda(E)$. Then $\lambda(\mathfrak{F})$ simply measures the number of events in $\mathfrak{F}$. The following results are useful.

3.2.1. Each input fibre is involved in $L/N$ of the events in $\mathfrak{X}$ (under the distribution $\lambda$).

3.2.2.
Let $\mathfrak{F} = \{E \mid (L - |E \cap F|) \le p\}$ where $F$ is some fixed event of $\mathfrak{X}$, and $p$ is a positive integer. That is, $\mathfrak{F}$ is the $p$-neighbourhood of $F$. Then the number of events in $\mathfrak{F}$ is related to [expression illegible in the transcription].

3.2.3. Now suppose $c$ is an $R$-codon corresponding to an $R$-subset of the event $F$ of § 3.2.2. The number of events $E$ such that $E \in \mathfrak{F}$ (of 3.2.2) and $c(E) = 1$ is related to [expression illegible in the transcription; it involves the set $\{E \mid c(E) = 1\}$].

[3.2.4 is illegible in the transcription.]

3.2.5. Suppose $\mathfrak{F}$, the $p$-neighbourhood of $F$, is a diagnostic class of $\mathfrak{X}$ for which the $R$-codon $c$ (corresponding to a subset of $F$) is used to calculate evidence. Let $\Omega$ be the property of being in $\mathfrak{F}$: then the value of $P(\Omega|c)$ that would be generated by the uniform distribution $\lambda$ over $\mathfrak{X}$ is given by [expression illegible in the transcription], where $c$ is an $R$-codon. Provided $p$ satisfies a bound in terms of $N$, $L$ and $R$ [the exact inequalities are illegible in the transcription], it follows that for the simple case where the diagnostic class is a $p$-neighbourhood of some event $F$, increasing the codon size will, under any likely conditions, increase the expected quality of the evidence.

3.2.6. In the more complicated case where $c$ is an $(R, \theta)$-codon intersecting $F$ in exactly $R$ elements, we have [expression illegible in the transcription].

[...] NEURAL REPRESENTATION

4.0. Introduction

This section is concerned with the design of neural models for implementing the theorems of § 2. It is assumed that the exact nature of the classificatory units required has already been decided: only the representation problem is dealt with here. The discovery and refinement of new classificatory units is postponed until § 5, where it is discussed within the context of the models developed now.
The central difficulty with producing neural models for a specific function is that there are many ways of doing the same thing: although the crucial averaging operation probably has to be performed at exactly one cell, there are many ways in which the supporting structure may vary. Both the form of the evidence, and the exact conditions under which it is used, are undefined; so the rigorous derivation of the basic neural models cannot proceed very far. This does not, however, commit the discussion to unredeemed vagueness. The injection at strategic points of a little common sense allows enough precision in the models to make their comparison in § 6 with the known histology of non-specific cerebral neocortex a useful venture.

4.1. Implementing the diagnosis theorem

4.1.1. Diagnosis by a single cell

Theorem 2.3 suggests that the best estimate of the likelihood that a given event falls within a particular class is achieved by taking the average of the conditional probabilities offered by the relevant evidence. Suppose first that this operation is carried out by a single cell called the output cell: the arguments for this appear in § 4.1.7. Let $\Omega$ denote both the cell in question and its associated property. $\Omega$ receives afferent synapses from each of the evidence function cells $c_i$ (cells which emit a signal, usually a burst of impulses, if and only if the input event $E$ satisfies $c_i(E) = 1$). It is assumed that the strength of the synapse from the cell for $c_i$ to $\Omega$ depends linearly on $P(\Omega|c_i)$. If, for $\Omega$, the number of evidence functions $c_i$ with $c_i(E) = 1$ is independent of $E$, $\Omega$ has simply to add the values of $P(\Omega|c_i)$ for which $c_i(E) = 1$, since
$$\mathcal{P}(\Omega|E) = k^{-1} \sum_{i=1}^{M} c_i(E)\,P(\Omega|c_i) \;\propto\; \sum_{i=1}^{M} c_i(E)\,P(\Omega|c_i)$$
if $k$ is independent of $E$. That is, $\Omega$ has simply to add the weights of all the synapses from currently active evidence cells, and signal the result.
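The averaging operation assigned to the output cell can be sketched in a few lines. This is only an illustration under stated assumptions: the synaptic weights are taken to be the conditional probabilities P(Ω|c_i) themselves (the text requires only a linear dependence), the evidence functions are given as a list of 0/1 values for one input event, and the function name `diagnose` is hypothetical.

```python
def diagnose(weights, active):
    """Estimate of P(Omega|E) as the mean weight over active evidence cells.

    weights[i] stands for P(Omega|c_i); active[i] is c_i(E), either 0 or 1.
    Returns None when no evidence cell is active, since the mean is undefined.
    """
    k = sum(active)  # number of evidence functions with c_i(E) = 1
    if k == 0:
        return None
    total = sum(w for w, a in zip(weights, active) if a)  # summed active weights
    return total / k


if __name__ == "__main__":
    # Hypothetical weights for three evidence cells, two of them active.
    print(diagnose([0.2, 0.8, 0.5], [1, 1, 0]))
```

When k is the same for every input event, the final division is a constant rescaling, so the plain sum of the active weights carries the same information.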
It is easy to imagine that the firing rate of the cell $\Omega$ should vary monotonically with the value of this sum. The theory therefore requires that the strength of the synapse from $c_i$ to $\Omega$ should depend linearly upon $n_\Omega/n_i$, where $n_\Omega$ = the number of times $c_i = 1$ and a positive diagnosis was achieved, and $n_i$ = the number of times $c_i = 1$. This condition can clearly be generated by some process in which a combination of pre- and post-synaptic firing causes the synapse to facilitate, while pre- without post-synaptic activity causes its power to decrease.

4.1.2. Synaptic weights: the range of relevance

Economical use of the full range of synaptic strength demands that the maximum strength of each synapse should be achieved at roughly the maximum value of $P(\Omega|c_i)$ taken over those $c_i$ concerned with $\Omega$. This value is not necessarily 1 (indeed it will rarely be 1): suppose it is $q$. Then the range of strengths available to each evidence synapse must represent the whole of $[0, q]$: it cannot be limited to $[p, q]$ for some $p > 0$, since the accurate calculation of $\mathcal{P}(\Omega|E)$ may often depend in part upon evidence suggesting it is very unlikely that $E$ is an $\Omega$. Furthermore, all the evidence synapses at $\Omega$ which are likely to be used with one another must have their strengths normalized to the same range $[0, q]$ in order that an unbiased sum may be taken. Any two synapses should be interchangeable, yet give the same output cell firing frequency. The range $[0, q]$ is called the range of relevance for evidence associated with $\Omega$.

4.1.3. The plausibility range

Let $[0, q]$ be the range of relevance for evidence associated with $\Omega$. The maximum value which $\mathcal{P}(\Omega|E)$ can achieve is at most $q$, and hence the maximum firing rate of $\Omega$ should be reached at or near this value. Unlike the synaptic strengths, however, there is no need to be able to cover the whole range $[0, q]$, since the lower values may make the presence of $\Omega$ extremely unlikely.
Let $p$ be that value of $\mathcal{P}(\Omega|E)$ at and below which it is impossible that $E$ ever is an $\Omega$; then $[p, q]$ is called the plausibility range associated with $\Omega$, and $0 \le p < q \le 1$. It is evident that some accuracy will be gained by representing only the plausibility range through the $\Omega$-cell firing frequency. Both $p$ and $q$ will depend upon the nature of the information with which $\Omega$ is dealing; there will exist no universally valid values. The simplest view of the output cell coding of $\mathcal{P}(\Omega|E)$ thus requires that $\Omega$ should not fire at all unless $\mathcal{P}(\Omega|E)$ exceeds some minimum value $p$, and that its maximum firing rate should be achieved at or near some maximum value $q$.

The only restriction so far placed on the nature of the coding within the plausibility range is that it be monotonic increasing with $\mathcal{P}(\Omega|E)$. If the outputs of two cells have to be compared, to decide for example into which of two classes the current input falls, then unless unreasonable complications are introduced, they have to code $\mathcal{P}(\Omega|E)$ the same way. That is, they must have the same plausibility range $[p, q]$, and they have to code $\mathcal{P}(\Omega|E)$ identically (within the limits of permissible error) inside the plausibility range. Since it is often necessary to decide between classes of the same kind, it may be concluded that all output cells for diagnosing competing classes should be cells of the same construction: they should share a common plausibility range, and a common coding within it.

The final complication to be added to the simple scheme of § 4.1.1, which simply summed the weights of the active afferent synapses, is that the number of such synapses may vary. $k = \sum_i c_i(E)$, and in general depends upon $E$. $\Omega$ must therefore be associated with some mechanism which can compensate for this, and its effect must be to divide the total $\sum_i c_i(E)\,P(\Omega|c_i)$ by $k(E) = \sum_i c_i(E)$ for the current event $E$.
The output cell firing frequency must therefore be monotonically related to
$$k^{-1}(E) \sum_i c_i(E)\,P(\Omega|c_i)$$
within the plausibility range for $\Omega$.

4.1.5. Computing $\mathcal{P}(\Omega|E) - p$

The four possibilities for the sequence of operations carried out in the computation of $\mathcal{P}(\Omega|E) - p$ are represented by the bracketing in the following formulae.

[Formulae (1) to (4) are illegible in the transcription.]

In (1) and (2), the summation is performed before the division, whereas in (3) and (4) it is performed after. In (1) and (3), the subtraction is performed before the other operations, which are done on the residues: in (2) and (4) the subtraction is done last.

The smaller the numbers can be kept, the more accurate will be the final result; so other things being equal, computations which keep numbers small are to be preferred to ones which do not. Other things are equal in the choice between (1) and (2), and in the choice between (3) and (4). It is therefore natural to prefer (1) to (2), and (3) to (4).

In all these computations, a subtraction, summation and division have to be performed, so it is important to consider whether they can plausibly be executed by a real cortical neuron. Many types of cortical pyramidal cell will be identified in § 6 as output cells, especially those types found in layers III and V of Cajal. The synapses for $P(\Omega|c_i)$ are assumed to be excitatory, and only those with $c_i(E) = 1$ carry a signal. Hence there is no difficulty about arranging that only those $P(\Omega|c_i)$ with $c_i(E) = 1$ are considered. The summation of the active synapses is, as remarked in § 4.1.1, an operation which it is quite plausible to assume possible in the dendrites of $\Omega$. The subtraction must be performed by inhibition. The actual amount of inhibition, in both (1) and (3), depends upon $k(E) = \sum_i c_i(E)$, which will vary with $E$, so the amount must depend upon the number of active evidence cells $c_i$.
This means that one or more inhibitory interneurons must have dendrites which sample the fibres from the $c_i$-cells, and axons which terminate on the dendrite of $\Omega$ itself, near enough to the active $c_i$-cell synapses to interact with them in an additive way. The dendritic field of $\Omega$ may be very large, in which case many inhibitory interneurons, each with a rather local dendritic field, will be needed to ensure each dendrite contributes its proper share to the sum. Both (1) and (3) require that the subtraction be performed before the summation, and the idea of subtraction performed uniformly over the $\Omega$ dendritic tree makes both schemes possible from this point of view.

The great problems arise over the division, which has to be done if $k(E)$ varies significantly. (1) and (3) differ in the order in which the summation and the division are taken, so the discussion of division falls into two parts. First, can it be done at all; and secondly, if it can, does it appear that either of (1) and (3) is more likely?

Suppose for the moment that division can be performed. Observe that it has certainly to occur after an estimate of the total value of $k(E)$ has been made. This is because a division by $(n_1 + n_2)$ becomes complicated if one insists on dividing by $n_1$ first, and then performing some operation on $n_2$, since the nature of the operation to be performed using $n_2$ depends on the value of $n_1$. If division is to take place, therefore, an explicit estimate of $k(E)$ has to be made by the neural machinery. The actual division process has then to involve this estimate. A distinction can be made between the mechanics of this process for (1) and for (3). If the division is done before the summation, it has to be done over the whole $\Omega$ dendrite, and must therefore involve some kind of uniform field whose intensity depends on $k(E)$. If, on the other hand, the summation is done first, the division might be a quite localized process.

4.1.6. A model for division

This is not the place for a detailed discussion of dendritic theory, but it is worth pointing out, by way of general support for the theory's plausibility, that there exists an extremely simple model for the process of division. Suppose $G$ is a spike generator, and $I$ is a spike inhibitor, as in figure 2. The spike generator produces impulses with some frequency $\nu$, and models the result of the summation process. The spike inhibitor $I$ has two inputs, one from $G$ and one of strength which varies with $k(E)$, the number of currently active evidence cells.

FIGURE 2. A model for division. The spike generator $G$ emits spikes at a rate $\nu$ and the inhibitor $I$ allows a fraction $f$ to be transmitted, where $f \propto k^{-1}(E)$. Spikes are therefore emitted at a rate $f\nu \propto \nu k^{-1}(E)$.

$I$ is such that each incoming spike is transmitted with probability $f$, where $f$ varies inversely with $k(E)$. That is, each incoming spike has a chance $f = K k^{-1}(E)$ of crossing $I$, where $K$ is some suitable normalizing constant. $I$ may thus be regarded as a conducting medium with only a fraction $f$ of its maximum ability to sustain a spike. The output spike frequency is then monotonically related to $\nu k^{-1}(E)$.

There are of course other models which have the same effect, but one fact seems to commend this above the rest: it is that spikes have been observed in the large dendritic stems of the cerebellar Purkinje cells (Eccles, Ito & Szentágothai 1967, p. 79) and of the hippocampal pyramidal cells (Spencer & Kandel 1961). It is therefore not unreasonable to suppose that the main apical dendrites of cortical pyramidal cells are also able to support spikes; and if so, that this is how the sum of the residues is communicated to the soma. It is, however, well known that many cortical pyramidal cells, especially those of layers III and V, have somas surrounded by basket cell synapses (Cajal 1911).
These cells are well placed to make an estimate of $k(E)$, the amount of parallel fibre activity, and are almost certainly inhibitory. Their action might therefore have the effect that a proportion of the spikes from the dendrite fails to be transmitted to the axon, this proportion depending in a suitable way on the value of $k(E)$. The estimate of $k(E)$ itself could be the combined work of many basket cells, their contributions being summed at the soma itself. If this model is correct, it provides an explanation of how the division process is performed, in the case in which it follows the summation of the residues. It thus favours the order of computation described by formula (1) of § 4.1.5.

4.1.7. Arguments for diagnosis by a single cell

It is necessary now to justify the choice of using one rather than a collection of cells at which to compute a single decision. The arguments are these: first, the weights of the synapses from the evidence cells must vary with $P(\Omega|c_i)$ which depends, for each cell $c_i$, on the number of positive diagnoses coincident with the firing of $c_i$. Hence in order that every evidence synapse has the correct weight, all the output cells representing $\Omega$ at whose synapses the evidence is collected must fire every time a positive diagnosis is achieved. Hence either the output cells must be completely interconnected, or they must drive some super-output cell, which fires them all if it is itself fired.

Secondly, if evidence for $\Omega$ is collected and judged by many cells, the weight each cell has in the final decision ought to depend upon the amount of evidence it has considered. This could be arranged by some suitable trick, but the combination of this and the first point, though not compelling, favours the view that each decision process be carried out by one cell. If therefore, as also seems likely, there do exist several representations of any given concept, they are probably independent.
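The division model of § 4.1.6, in which each spike from the generator crosses the inhibitor with probability f = K/k(E), can be checked with a small simulation. The sketch below rests on assumptions that are not in the paper: time is divided into discrete steps, the generator emits a spike in each step with probability `nu`, and successive spikes are treated independently. The function name and parameter values are hypothetical.

```python
import random


def output_rate(nu, k, K=1.0, steps=100_000, seed=0):
    """Fraction of time steps in which a spike leaves the inhibitor.

    nu: per-step spike probability of the generator (stands for the rate v).
    k:  number of currently active evidence cells, k(E), assumed >= 1.
    K:  normalizing constant; transmission probability is f = K / k, capped at 1.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    f = min(1.0, K / k)
    passed = 0
    for _ in range(steps):
        # A spike is emitted with probability nu, then transmitted with probability f.
        if rng.random() < nu and rng.random() < f:
            passed += 1
    return passed / steps


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, output_rate(nu=0.8, k=k))
```

Under these assumptions the measured rate should stay close to nu * K / k, so doubling the number of active evidence cells roughly halves the output frequency.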
4.1.8. Dual purpose output cells

This concludes the discussion of the implementation of theorem 2.3, but before leaving the topic to discuss the form of evidence functions, something must be said about driving the cell $\Omega$ by information of two distinct types. If a single diagnosis could be achieved by two quite unrelated sets of evidence, with different plausibility ranges, it would be necessary to locate the relevant synapses on different, independent regions of dendrite. For example, use of $\Omega$ with direct sensory information may involve synapses on the apical dendritic tree of a cortical pyramidal cell, whereas associational information may be held in the basilar dendrites. These systems could possess different values for both limits, $p$ and $q$, of the plausibility range. They would require entirely different systems of inhibitory subtraction cells, and although the basket cells for the division function could in each case send synapses to the soma, their dendrites would have to sample the correct, disjoint populations of evidence fibres. The cell $\Omega$ would then effectively become two cells in one, and it would succeed in this rôle as long as the other cells of its class also had the same specifications, and the same dual plausibility ranges.

If $\Omega$ can be driven by sensory or by associational information, it is possible that conditional probabilities for sensory evidence should not count those instances of $\Omega$ which arise by association. This is because in the second rôle, $\Omega$ may be being used symbolically, not directly. $P(\Omega|c_i)$ for sensory information should probably not be influenced by instances of this rôle.

Finally, the advantages of such dual rôle cells may be important. If all the various conditions are satisfied, they can probably combine in a satisfactory way information of two kinds in a single diagnostic process. This would to some extent be against the rules, but as long as the contravention is uniform over cells of the relevant category, it would probably work. The effect would be to make it easier to see what you expect to see. The results of this section are summarized in figure 3.

FIGURE 3. The output cell $\Omega$ has three kinds of afferent synapse: Hebb synapses (open triangles) from evidence cells, and two kinds of inhibitory synapse. Those from the S-cells are spread over the dendritic tree, and perform a subtraction: those from the D-cells, concentrated at the soma, perform a division.

4.2.0. Standard evidence functions

Two constraints have been placed on the evidence functions $c_i$ for a particular output cell $\Omega$: that the evidence they provide should be of sufficient quality, and that the amount of correlation between the $c_i$ for $\Omega$ should be either negligible or regular in a way which does not cause improper bias. The choice of evidence function ought to depend upon the particular circumstances for which it is required: if especially efficient functions exist and can be constructed for a particular purpose, their use will permit an economy in the amount of structure required for that process. But it will frequently occur either that rather little is known about exactly what information will come to be held in a particular piece of cortex, or that there is nothing particular about that information which makes it a suitable candidate for special methods. For such cases, it is natural to seek a class of functions from which a 'standard' form of evidence may be constructed.

There are various conditions such a class should satisfy. Most important, they should have a simple neural representation. Secondly, and also essential, there should be different categories of function corresponding to different expected qualities of the evidence to which they give rise.
This is an economy condition, since it is wasteful to use better (and hence in general, more) evidence than necessary. Thirdly, according to the Fundamental Hypothesis § 1.6, the expected quality of the evidence produced by the function $c$ will depend upon the distribution of the events $E$ with $c(E) = 1$ over the event space $\mathfrak{X}$. If the property $\Omega$ which the cell $\Omega$ is signalling is stable over relatively small changes in the input event $E$, the best evidence functions $c$ will be those whose events $F$ with $c(F) = 1$ are grouped together, as seen through the natural metric $d$ of § 1.3.2.

4.2.1. Arguments for codon functions

These three conditions do have implications about the kind of evidence one may expect: they strongly suggest one particular family of functions, the generalized $(R, \theta)$-codons. First, observe that figure 4 shows the simplest kind of afferent system possible for a cell.

FIGURE 4. An $(R, \theta)$-codon cell. There are $R$ excitatory afferent synapses (open circles), and enough inhibition (filled circles) to give the cell a threshold of $\theta$.

There are $R$ afferent fibres, $a_{i_1}, \ldots, a_{i_R}$, each with an excitatory synapse of some fixed weight, 1 say. The cell has threshold $\theta$, which may be determined by some suitably arranged inhibition. Then the cell will emit a signal whenever at least $\theta$ of the $R$ fibres $a_{i_1}, \ldots, a_{i_R}$ are active: hence the set of firing conditions for the cell constitutes an $(R, \theta)$-codon on any event space over fibres which include $a_{i_1}, \ldots, a_{i_R}$. An $(R, \theta)$-codon is thus a specification of the firing conditions for a cell whose afferent relations with its input fibres are simple, and anatomically and physiologically plausible.

Secondly, it has been observed in § 3.2 that suitable values of $(R, \theta)$ can be chosen to construct an $(R, \theta)$-codon which will match any previously specified quality of evidence. Hence the second condition is fulfilled by the family of $(R, \theta)$-codons.
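The firing condition of an (R, θ)-codon cell, and the way its specificity depends on R and θ, can be illustrated by brute-force enumeration. The sketch below assumes that the input events form a code of size L on N fibres (exactly L fibres active in each event) and represents each event as a set of active fibre indices. The function names and the small values of N and L are hypothetical and chosen only to keep the enumeration short.

```python
from itertools import combinations


def codon_fires(event, subset, theta):
    """True if at least theta of the codon's fibres are active in the event."""
    return len(event & subset) >= theta


def count_firing_events(N, L, subset, theta):
    """Number of events in the code of size L on N fibres at which the codon fires."""
    return sum(
        codon_fires(set(active), subset, theta)
        for active in combinations(range(N), L)
    )


if __name__ == "__main__":
    # Hypothetical small code: N = 6 fibres, L = 3 active per event.
    print(count_firing_events(6, 3, {0, 1}, 2))     # a (2, 2)-codon
    print(count_firing_events(6, 3, {0, 1, 2}, 3))  # a (3, 3)-codon
    print(count_firing_events(6, 3, {0, 1, 2}, 2))  # a (3, 2)-codon
```

With θ equal to R, enlarging the subset reduces the number of events at which the cell fires, while lowering θ below R enlarges that set again; this is the sense in which (R, θ) can be tuned to a required quality of evidence.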
The various technical problems which arise when one tries to design a net which will produce $(R, \theta)$-codons for a particular input can be solved, and will be discussed in the next section.

The above two arguments show that codon functions are sufficient to satisfy the two corresponding conditions: the next one shows that they are in some degree necessary for the third.

Let $\mathfrak{X}$ be the event space on $\{a_1, \ldots, a_N\}$ and let $d$ be the natural metric of § 1.3.2. Let $\{c_i\}_{i=1}^{k}$ be the evidence functions for a particular property $\Omega$, and let $\Omega$ hold for a particular event $E \in \mathfrak{X}$. Without loss of generality, suppose
$$E(a_i) = 1 \quad (1 \le i \le L), \qquad E(a_i) = 0 \quad (L < i \le N),$$
and choose $F \in \mathfrak{X}$ such that $d(E, F) = 1$. Then according to the Fundamental Hypothesis § 1.6.4, the chance that $F$ also has $\Omega$ is better than for an event arbitrarily selected from $\mathfrak{X}$. Hence most of the $c_i$ with $c_i(E) = 1$ should have $c_i(F) = 1$ as well. This argument applies to all $F$ with $d(E, F) = 1$: so let $N_1(E) = \{F \mid d(E, F) \le 1\}$.

For each $c_i$, $1 \le i \le k$, define a subset $C_i$ of $\{a_1, \ldots, a_N\}$ in the following way. Write $E_j$ = the event obtained from $E$ by altering the value of the fibre $a_j$, i.e.
$$E_j(a_i) = E(a_i), \text{ all } i \ne j, \qquad E_j(a_j) = 1 \iff E(a_j) = 0.$$
The subset $C_i$ is obtained thus:
$$C_i = \{a_j \mid c_i(E_j) \ne c_i(E)\}.$$
Then for $1 \le i \le k$, $c_i(E_j) = 1 \iff C_i \subseteq E_j$. That is, for $1 \le i \le k$, $c_i$ may be regarded within $N_1(E)$ as a detector of the subset $C_i$ of the fibres $\{a_1, \ldots, a_N\}$.

Thus locally (i.e. within $N_1(E)$), $c_i$ behaves like the codon function with associated subset $C_i$. But it has been observed that for an arbitrary change from $E$ to $E_j$, some $1 \le j \le N$, the values of the majority of the functions $c_i$ should remain unchanged. Hence, for most of the $i$, $1 \le i \le k$, it must be true that $c_i$ takes the value 1 over most of $N_1(E)$ (assuming the $c_i$ are not organized in any special way). This implies that the size of the subset $C_i$ which $c_i$ detects in $N_1(E)$ is small, for most $i$, $1 \le i \le k$.
This argument shows that if an evidence function is constructed for classifications in which the Fundamental Hypothesis is true, then such a function behaves locally like a codon function with a rather small associated subset. This is the most that can be deduced about evidence functions from the necessarily imprecise considerations out of which the present theory is constructed.

The case for $(R, \theta)$-codons being the general form of evidence function is not logically established, but it would at present be impossible to make a rigorous argument for any family of functions. The three arguments presented above do constitute good evidence in favour of codons: evidence which it would require a strong and unexpected finding to disrupt.

Finally, in the particular case of the cerebellar cortex, where according to Marr (1969) something analogous to the present theory actually occurs, the evidence cells are the granule cells, which are codon cells with $R \le 7$. It will be pointed out in § 6 that the cerebral neocortex contains cells which may be regarded as $(R, \theta)$-codons with larger $R$.

It is thought that the combined weight of these arguments constitutes sufficient grounds for studying in detail the setting up and performance of $(R, \theta)$-codon cells, where the values of $R$ and $\theta$ have various relations to the parameters of the code used on the set of input fibres $\{a_1, \ldots, a_N\}$.

4.3. Codon neurotechnology

4.3.0. The possible need for codon formation

At first sight, the use of codons virtually solves the problem of the neural representation of evidence functions. Provided the contact probability $z$ from the afferent fibres $\{a_1, \ldots, a_N\}$ to the population $\mathfrak{P}$ of codon cells has the appropriate value, it remains only to set the thresholds of the codon cells in a suitable way (see § 3.1). The only possible problem with this scheme is that the evidence thus obtained may not have the required quality.
The better the evidence required, the more specific the codon functions must be, and so the less frequently they take the value 1. If a roughly fixed number has to fire in order to provide an adequate representation of each input event, the size of the underlying population of codon cells has to be larger the better the evidence required. Unless special measures are taken, this might make it necessary in a particular case to provide a huge population of evidence cells, only a few of which are ever used.

This difficulty can be avoided by using a special technique. It works by modifying just a few of the afferent synapses at a cell, so that a codon function of exactly the required sort is represented there. The process of determining to which codon a particular cell should respond is called codon formation at that cell.

The essence of codon formation is very simple. Let $\mathfrak{P}$ be a population of cells, each of which has $R'$ afferent synapses. $R'$ is such that a typical input event can expect to excite $\theta$ synapses at each cell of $\mathfrak{P}$, where $\theta$ is the $\theta$ of the $(R, \theta)$-codons eventually required. The information which the codons have to represent arrives during a special setting-up period (§ 5.1.2), and only the synapses used during that time have any effective power later. This produces a population of codon cells such that only a few of the total number of afferent synapses have any power, but those few are the correct ones. The details are described fully in the following pages.

4.3.1. Techniques for codon formation

The three basic mechanisms for codon formation appear in figure 5. In (1) the afferent synapses are excitatory, and become ineffective if and only if there is post- without pre-synaptic activity. In (2), the synapses are composed of two parts: one excitatory and unmodifiable, and one initially ineffective, but which is facilitated by simultaneous pre- and post-synaptic activity. The modifiable component is thus a Hebb-modifiable synapse (Hebb 1949). The combination in one synapse of an unmodifiable excitatory component with a Hebb-modifiable component has an importance which was first noticed by Brindley (it appears at the s-cells in Brindley 1969). It is therefore proposed that such synapses be named Brindley synapses, to distinguish them from Hebb synapses, which will be taken to possess the same modification conditions, but no unmodifiable excitatory component.

FIGURE 5. Three models for codon formation. (1) uses synapses which are initially excitatory, but are modified to be ineffective by post- without pre-synaptic activity (open squares), (2) uses Brindley synapses (arrows), (3) uses Hebb synapses (open triangles) and a climbing fibre (open circles). All three have inhibitory synapses (filled circles) which set the cells' thresholds at an appropriate level.

In models (1) and (2), the cells also receive some inhibitory synapses which set their thresholds at the appropriate value. The equations governing the number of codons formed in any particular situation are those of § 3.1: $X$ is called the formation probability in those equations for this reason.

Case (3) is slightly different: this cell possesses an afferent fibre analogous to the cerebellar climbing fibre, and its ordinary afferent synapses are Hebb synapses, which are initially ineffective, and are modified by the conjunction of pre-synaptic and climbing fibre (or post-synaptic) activity. The climbing fibre is active only during the setting up period. The consequences of this model are slightly different from those of (1) and (2), for after setting up, all those synapses which were active during the setting up period will have been modified, not just those at a cell where a codon was successfully formed.

The conditions in which the codon cells may later be used are different for each of these models.
In (1), there is no difficulty, since the irrelevant synapses have no power. In (3), the fact that all synapses active during the setting-up period will have been modified may mean that an undesirably large number have been made excitatory. Methods (1) and (2) are in this sense more selective, and will tend to produce better evidence. In (2), during later use, the cell threshold has to be set so that activity in at least $\theta$ modified afferent synapses is required to discharge the cell. In all cases, the codon cell thresholds can be set at the appropriate level by using sampling techniques (both of the afferent fibres and of the codon cell axons) in the same way as the cerebellar Golgi cells are thought to control the granule cell thresholds (Marr 1969).

4.3.2. Model (2) preferred to model (1)

Models (1) and (2) will produce evidence of the same quality in a given situation, but model (1) has an important disadvantage. If synaptic modification is an irreversible process, the process of codon formation in this model is a once and for all affair. The fact that all the synapses not involved in the first codon represented are thereby rendered ineffective means that the cell can never be used for more than one codon. This model essentially represents one codon by eliminating all other possibilities, and as such is unattractive. This is not true of model (2), where a synapse which is unused the first time could be used later on, if that became desirable.

Model (2) needs slightly more complicated backing up by the inhibitory cells, since the level of inhibition necessary during codon formation both differs from that needed for recognition of codons already formed, and depends upon the number of codons already formed at that particular cell.
This difficulty can be overcome if the inhibition level is set primarily by a count of the active codon cells, so it does not significantly affect the desirability of this model. Model (3), like (2), does not suffer from the once-for-all disadvantage; but as pointed out in §4.3.1, is not strictly comparable with (1) since it forms evidence in a slightly different way.

4.3.3. A problem with (1) and (2)

In model (1), if synaptic modification is irreversible, each cell can represent only one codon. Hence the afferent synapses should not be modifiable all the time; the precious potential of a cell must be reserved for information for which it is worth being used. A similar point holds for model (2), since if the afferent synapses were permanently modifiable, any incoming information could cause the creation of codons. The point here is not that the first event rules out the rest, but that all are treated as indiscriminately valid. Since any input can create a codon if the anatomy allows it, the cell is no different in function from one where afferent synapses are unmodifiable excitatory. Therefore, for models (1) and (2), the modifiable synapses involved must be modifiable only whilst that information for which codons are required is present in the afferent fibres.

This difficulty arises in model (3) in a less acute form: the problem here is that something has anyway to specify when codon formation should take place. No difficulties arise with the hardware, since modification is geared to the climbing fibre activity; but climbing fibres cannot in general select the best cells.

4.3.4. The solution using inhibition

The only solution to this problem in models (1) and (2) which uses conventional ideas is to suppress the cells with inhibition until they are wanted. The alternative, to excite them when they are wanted, is equivalent, but reduces (2) to an uninteresting variant of (3).
This scheme would work until the first codon was formed, but would then fail in model (2): this is because inhibition cannot subsequently be maintained at these cells without their losing the ability to recognize the codons that have been formed at them. This defeats the object of the scheme.

4.3.5. Another solution

The alternative to this kind of solution is that the synapses genuinely should become modifiable only at those times when codon formation is required. This is not as implausible an assumption as it might appear, since considerable organization has to take place before the formation of codons becomes necessary anyway.

Codon formation takes place either when a new classificatory unit is formed, or when new evidence functions are added to an existing one. The decision about how to commit a piece of information to the neocortical store (whether as a new classificatory unit or as an association between existing ones) has to be taken on the basis of its relationship to other incoming events. It cannot in general be taken immediately: for example, it takes time for the mountainous structure of a probability distribution to become apparent. This has the consequence that it is best to send all incoming information to a temporary associative store, where it is held and not altered. This is one point of Simple Memory theory (§6 and Marr 1971). When it becomes clear how a piece of information should be stored, it can be taken out and dealt with in the appropriate way. If, for example, it should be set up as a new classificatory unit, a location must be sought (the one with the most favourable pre-existing structure) and the information directed there for representation. The complete operation is so special and complex that the assumption, that a suitable delicate change in the chemical environment of the relevant codon cells accompanies the transmission there of the setting-up information, ceases to carry a special implausibility.
The matter is discussed further in §5.1.2.

4.4.0. Preliminary assumptions

The analysis of §4.2 suggested that codon functions are likely to be widely used as evidence functions. If they are, two conditions will hold, one about the input events, and one about the codons themselves.

First, the input events for a particular output cell $\Omega$ are likely to occupy a code of some fixed size $L$, say, on the input fibres $\{a_1, \ldots, a_N\}$. The reason for this is that if the input events have an arbitrary form, then codon functions of an arbitrary form have to be allowed. An arbitrary codon function is one which assigns the values 0 or 1 to a subset of $\{a_1, \ldots, a_N\}$: the codon functions we have met so far have assigned only the value 1. There is no objection in principle to the general codon function, but it is more difficult to build its neural representations, and much more difficult to model codon formation. It will therefore be assumed, for the purposes of this section, that the input events are events of size $L$ over $\{a_1, \ldots, a_N\}$.

Secondly, all the codons associated with a given output cell $\Omega$ are likely to be of about the same size. This is because only a small proportion of the codon cell population will be used for any single input event: these are chosen by selecting an appropriate codon cell threshold, and so come from the tail of a binomial distribution. The number of cells discovered in such a situation decreases sharply as the cells' thresholds rise, so that at any given threshold, the cells may to a first approximation be regarded as all having the same number of active afferents. Since the input events also will have the same size, all the codons connected with a given output cell $\Omega$ may be regarded as having the same specifications. It will further be assumed that the actual codon cells which exist have been chosen randomly from the population of all such codon cells with those specifications.
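The claim that cells drawn from a binomial tail nearly all have the same number of active afferents can be checked numerically. The sketch below is a rough illustration only: it assumes each of a cell's $R$ afferents is independently active with probability $p$ (for instance $p = L/N$), and the function names are hypothetical.

```python
from math import comb


def p_exactly(R: int, r: int, p: float) -> float:
    """Probability that exactly r of R afferents are active."""
    return comb(R, r) * p**r * (1 - p) ** (R - r)


def tail(R: int, theta: int, p: float) -> float:
    """Probability that at least theta of R afferents are active."""
    return sum(p_exactly(R, r, p) for r in range(theta, R + 1))


def share_at_threshold(R: int, theta: int, p: float) -> float:
    """Among cells at or above threshold, the fraction with exactly theta active."""
    return p_exactly(R, theta, p) / tail(R, theta, p)


if __name__ == "__main__":
    for theta in range(8, 13):
        print(theta, tail(20, theta, 0.2), share_at_threshold(20, theta, 0.2))
```

With these illustrative parameters the tail shrinks quickly as the threshold rises, and most of the cells that do fire sit exactly at the threshold, which is the approximation the text relies on.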
These conditions are sensible also from another point of view, since the expected quality of evidence obtained from a codon depends upon its specifications. It was remarked in §2 that the expected quality should be uniform for a given decision cell $\Omega$, so this condition is likely to be fulfilled. Further, the randomness assumption means that problems about correlated evidence are avoided.

4.4.1. Statement of the main result

Suppose a set of $(R, \theta)$-codons are chosen as evidence functions for diagnosis of the property $\Omega$, and that these codons constitute a random sample from the set of all such codons. Suppose the input events have size $L$ over $\{a_1, \ldots, a_N\}$: then an incomplete event specifies the values of fewer than $L$ input fibres. It is shown that the interpretation of such an incomplete input may be carried out by taking a weighted sum of certain $P(\Omega|c_i)$ in a way analogous to the procedure for diagnosis of complete events. An estimate of this sum, for an incomplete input $X$, can be obtained in a real neural net by lowering the threshold of the codon cells until $X$ causes activity in a significant number, and applying these signals to the output cell $\Omega$ in the usual way. Hence in a neural model where the codon cell thresholds are controlled by cells designed to maintain the number of active codon cells at a constant value, the interpretation of an incomplete event is a natural consequence of applying the event to the net.

There are two sources of error in this estimate: first, those codon cells with more active afferents than the current codon cell threshold will probably acquire an incorrect weighting of their corresponding value of $P(\Omega|c_i)$ at $\Omega$; and secondly, the estimate is based on a sampling process.
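The threshold-lowering procedure just stated can be sketched as a small simulation. This is a hedged illustration, not the paper's circuit: codon cells are modelled as random sets of $R$ input fibres, the threshold controller is replaced by a direct search for the highest threshold that keeps at least `target` cells active, and all names and parameter values are assumptions.

```python
import random


def make_codon_cells(n_cells: int, n_fibres: int, R: int, seed: int = 0):
    """Each codon cell is the set of R input fibres forming its support."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_fibres), R)) for _ in range(n_cells)]


def active_counts(cells, event: set):
    """Number of active afferents at each codon cell for the given input event."""
    return [len(cell & event) for cell in cells]


def threshold_for_target(cells, event: set, target: int) -> int:
    """Highest threshold at which at least `target` cells fire (0 if none does)."""
    counts = active_counts(cells, event)
    for theta in range(max(counts), 0, -1):
        if sum(c >= theta for c in counts) >= target:
            return theta
    return 0


if __name__ == "__main__":
    cells = make_codon_cells(2000, 100, 8, seed=1)
    full, partial = set(range(40)), set(range(20))   # L = 40, W = 20
    print(threshold_for_target(cells, full, 50))
    print(threshold_for_target(cells, partial, 50))
```

Because the incomplete event is a subevent of the full one, the threshold found for it can never exceed the threshold for the full event, so holding the number of active cells constant amounts to lowering the threshold.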
The first kind of error is alleviated by two facts: that most active codon cells have the same number of active afferents, only a very few having more (because the active cells come from the tail of a binomial distribution); and that those codon cells with more active afferents will be driven harder than the rest. This effect operates in the right direction to reduce the error. The inaccuracies from the second source are probably unimportant.

The Interpretation Theorem, §2.5, is concerned with the treatment of inputs in which the values of some of the fibres are undefined. In the present case, this corresponds to states where fewer than $L$ of the input fibres $\{a_1, \ldots, a_N\}$ have the value 1. Let $X$ be a subevent of the input event space $\mathfrak{X}$, and suppose that $X$ specifies $X(a_i) = 1$, $1 \le i \le W < L$. Let $E_1, E_2, \ldots, E_J$ be the possible completions of $X$ in $\mathfrak{X}$, so that each $E_j$ ($1 \le j \le J$) specifies that exactly $L$ of the $a_i$ have the value 1. By the Interpretation Theorem,

$$P(\Omega|X) = \sum_{j=1}^{J} P(E_j|X)\,P(\Omega|E_j).$$

If nothing is known about $P(E_j|X)$, it must be assumed that $P(E_j|X) = 1/J$, all $1 \le j \le J$. Let $\{c_i \mid 1 \le i \le K\}$ be the set of all evidence functions for $\Omega$ over $\mathfrak{X}$. Then

$$P(\Omega|E_j) = k^{-1}(E_j) \sum_{i=1}^{K} c_i(E_j)\,P(\Omega|c_i),$$

where $k(E_j)$ is the number of $c_i$ with $c_i(E_j) = 1$, i.e. $k(E_j) = \sum_{i=1}^{K} c_i(E_j)$. Hence

$$P(\Omega|X) = J^{-1} \sum_{j=1}^{J} \sum_{i=1}^{K} c_i(E_j)\,k^{-1}(E_j)\,P(\Omega|c_i).$$

Define the family of real-valued functions $w_i$, $1 \le i \le K$, on the set $\{E_1, \ldots, E_J\}$ by $w_i(E_j) = c_i(E_j)\,k^{-1}(E_j)$. Then

$$P(\Omega|X) = J^{-1} \sum_{i=1}^{K} P(\Omega|c_i) \sum_{j=1}^{J} w_i(E_j).$$

The operation of calculating $P(\Omega|X)$ is thus equivalent to computing a weighted sum of the $P(\Omega|c_i)$: the coefficient of $P(\Omega|c_i)$ is $\sum_{j=1}^{J} w_i(E_j)$, and we now study the value this takes.

$\sum_{j=1}^{J} w_i(E_j)$ measures the weight with which $P(\Omega|c_i)$ contributes to the set of all possible completions of $X$ in $\mathfrak{X}$. In a given completion $E_j$, $P(\Omega|c_i)$ has a certain weight: it is zero if $c_i(E_j) = 0$ and if not, this weight is $1/k(E_j)$, where $k(E_j)$ is the size of the $c_i$-representation of $E_j$. Now the number $k(E_j)$ is a random variable obtained by adding the terms in the tail of a binomial distribution (see equation 3.1.1). Suppose $k$ has distribution $\nu$: then $k^{-1}$ has distribution $\nu^{-1}$ say, with expectation $\overline{k^{-1}}$ ($\ne \bar{k}^{-1}$ in general), and variance $\sigma$ (say). (Assume $k = 0$ with zero probability.) The values of $k^{-1}(E_j)$ for different $j$ are strictly speaking not independent, but if they were, the random variable $(1/n(c_i)) \sum_{j} c_i(E_j)\,k^{-1}(E_j)$ would have the same mean $\overline{k^{-1}}$ and variance $\sigma/\sqrt{n(c_i)}$, where $n(c_i)$ = the number of $E_j$ with $c_i(E_j) = 1$. The value of $\sigma/\sqrt{n(c_i)}$ does, however, give some guide to the variance of this random variable. It may be assumed that $\sigma$ is small, since part of the function of the Golgi-type inhibitory cells which control the thresholds of the cells is to ensure a constant-sized representation for each input event $E$. The actual random variable described above will have a variance somewhere between $\sigma$ and $\sigma/\sqrt{n(c_i)}$, but since $\sigma$ is small, and the true value will be nearer $\sigma/\sqrt{n(c_i)}$, it may safely be assumed that its variance is small enough to be ignored.

Hence

$$P(\Omega|X) = K^{*} \sum_{i=1}^{K} n(c_i)\,P(\Omega|c_i),$$

where $n(c_i)$ = the number of $E_j$ with $c_i(E_j) = 1$, and $E_j$ completes $X$; $K^{*}$ is some suitable normalizing constant.

Now $n(c_i)$ depends upon $R$, $\theta$ and $r$, where $c_i$ is an $(R, \theta)$-codon, and $r$ is the number of afferent fibres active in $X$ which are contained in $S(c_i)$, the support of $c_i$. In fact, [displayed expression for $n(c_i)$ not recoverable] the sum being taken until one factor reaches 0, and where $N$ = no. of input fibres, $L$ = no. of fibres active in each full-sized input event, $W$ = no. of fibres active in $X$, $c_i$ is an $(R, \theta)$-codon, $r$ = no. of fibres active in the support of $c_i$.

For $R = \theta$, $n(c_i)$ is primarily a function of $r$; call it $n(r)$. Then

$$\frac{n(r+1)}{n(r)} = \frac{N-(W+R-r-1)}{L-(W+R-r-1)} > \frac{N-W}{L-W}.$$

For typical values, e.g. $N = 100$, $L = 40$, $W = 20$, $n(r+1)/n(r) > 4$, which illustrates the fact that those $c_i$ with greater $r$ have much more influence over $P(\Omega|X)$ than those with smaller $r$.

The problem of estimating $P(\Omega|X)$ from a family of $(R, \theta)$-codons $c_i$ is thus equivalent to taking the weighted average of $P(\Omega|c_i)$, where the weighting depends upon the number, $r$, of active input elements in the support of $c_i$. It will now be shown that this can be achieved by reducing the threshold of the cells for the $(R, \theta)$-codon to some suitable lower value $\theta'$, which depends upon $W$, the size of $X$.

Two problems have to be solved when $P(\Omega|X)$ is computed: first, enough $c_i$ have to be used for the estimated answer to be reliable; and secondly, those $c_i$ which are used have to be weighted in the correct way. It is assumed that the $c_i$ are all $(R, \theta)$-codons whose neural representation is effectively as shown in figure 4: it is immaterial whether this is achieved by models (1), (2) or (3) of figure 5. For an input $X$ of size $W$, the probability of the cell's being active is

$$\sum_{s=\theta'}^{R} \binom{R}{s}\,\xi^{s}(1-\xi)^{R-s},$$

where the cell has threshold $\theta'$, and $\xi = W/N$ (by analogy with 3.1.2). This is just the usual tail of a binomial distribution. Now as $\theta'$ decreases, the number of $(R, \theta)$-codons which become active increases rapidly: while $n(\theta')$ is small, both $\theta' + 1 > R - \theta'$ and $N > 2W$ will usually hold. Hence as $\theta'$ is lowered, the number of $c_i$ cells which $X$ fires increases very fast: so that the difference in $\theta'$ between having no cells active to having the usual number for a full event will only be of the order of 3 units of synaptic strength, and the great majority of the active $c_i$ will have exactly $\theta'$ active afferent synapses.
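The ratio $n(r+1)/n(r)$ quoted for the case $R = \theta$ can be checked by direct counting. The sketch below rests on an assumed reading of the definitions, not on the paper's own (unrecovered) general expression: a completion adds $L - W$ active fibres chosen from the $N - W$ silent ones, and a codon with $R = \theta$ takes the value 1 only if all $R - r$ of its still-silent support fibres are among those added. The function name `n_completions` is hypothetical.

```python
from fractions import Fraction
from math import comb


def n_completions(N: int, L: int, W: int, R: int, r: int) -> int:
    """Completions of X on which a codon with R = theta (r fibres already active) is 1."""
    missing = R - r                      # support fibres that must still become active
    if missing > L - W:
        return 0
    # The missing support fibres are forced; the rest are chosen freely.
    return comb(N - W - missing, L - W - missing)


def ratio(N: int, L: int, W: int, R: int, r: int) -> Fraction:
    """Exact value of n(r+1) / n(r)."""
    return Fraction(n_completions(N, L, W, R, r + 1), n_completions(N, L, W, R, r))


if __name__ == "__main__":
    N, L, W, R = 100, 40, 20, 5
    for r in range(R):
        print(r, ratio(N, L, W, R, r), Fraction(N - (W + R - r - 1), L - (W + R - r - 1)))
```

Under this counting the ratio coincides with $(N-(W+R-r-1))/(L-(W+R-r-1))$ and, for $N = 100$, $L = 40$, $W = 20$, never falls below $(N-W)/(L-W) = 4$; it equals 4 exactly only at $r = R - 1$.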
The problem of the differential weighting of the $P(\Omega|c_i)$ can thus be alleviated as long as $\theta'$ does not lie far below the minimum number required to achieve the response of at least one $c_i$-cell. Provided the number of $c_i$-cells made active in this way is of the order of the number ordinarily excited by a full input event, enough evidence will be involved for the estimate of $P(\Omega|X)$ to be reliable. Strictly, all the $c_i$ which could possibly take the value 1 on some completion of $X$ should be consulted: but this number could be very large, and the problems of achieving the correct weighting become important. It is therefore much simpler to take an estimate using about the usual number of $c_i$. Finally, it should be noted that if this is done, the $c_i$-thresholds can be controlled by the same inhibitory cells as control their thresholds for normal input events, since it has already been shown that a circuit whose function is to keep the number of $c_i$-cells active constant is adequate for this task.

If this technique is used, those few $c_i$-cells with more than $\theta'$ active afferents will have a higher firing rate than those with exactly $\theta'$. Hence they will anyway be given greater weighting at the $\Omega$-cell. It would be optimistic to suppose this weighting would be exactly the correct amount, since the factor involved depends on the parameters $N$, $L$, $W$, $R$, $\theta$, $r$; but the effect will certainly reduce the errors involved.

FIGURE 6. The basic neural model for diagnosis and interpretation. The evidence cells $c_1, \ldots, c_K$ are codon cells with Brindley afferent synapses. The G-cell controls the codon cell threshold: it uses negative feedback through its ascending dendrite to keep the number of codon cells active roughly constant. Its descending dendrite samples the input fibres directly, thus providing a fast pathway through which an initial estimate is made.
The other cells and synapses are as in figures 3 and 5 (2).

4.5. The full neural model for diagnosis and interpretation

The arguments of §§4.1 to 4.4 lead to the design of figure 6 for the basic diagnostic model for a classificatory unit. The afferent synapses to the $c_i$-cells are excitatory, and may have been achieved by some suitable codon formation process: model (2) of figure 5 has been chosen for figure 6. The inhibitory cells G control the thresholds of the $c_i$-cells, and their function is to keep the number of active $c_i$-cells roughly constant. If they do this, the model automatically interprets input events which are incomplete as well as those which are full-sized. The G-cells are analogous to the Golgi cells of the cerebellum, and it is therefore natural to assume that, as in the case of those cells, the G-cells can be driven both by the input fibres $a_j$ and by the $c_i$-cell axons. The final control should be exercised by the number of $c_i$-cell axons active, but a direct input from the $a_j$ axons would provide a fast route for dealing with a sudden increase in the size of the input event.

The $c_i$ axons and the output cell $\Omega$ have been dealt with at length in §4.1. The cells S are the subtracting inhibitory cells, and the cells D provide the final division. The cell $\Omega$ is shown with two types of evidence cell afferent: one, through the $c_i$-cells to the apical dendrites, and one (whose origin is not shown) to a basal dendrite.

In practice, the distribution of the $a_j$ terminals, and the G, D and S-cell axons and dendrites will all be related. The kind of factor which arises has already been met in the cerebellar cortex for the Golgi and stellate cell axons and dendrites. Roughly, the more regular and widespread the input fibre terminals, the smaller the dendrites of the interneurons may be, and the further their axons may extend.
Little more of value can be added to this in general, except that the exact most economical distributions for a particular case depend on many factors, and their calculation is not an easy problem.

5. THE DISCOVERY AND REFINEMENT OF CLASSES

5.0. Introduction

There are three principal categories of problem associated with the discovery and refinement of classificatory units. They are the selection of the information over which a new unit is to be defined; the selection of a suitable location for its representation, together with the formation there of the appropriate evidence and output cells (formation in the information sense, not their physical creation); and the later refinement of the classificatory unit in the light of its performance.

The selection of information over which a new classificatory unit is to be defined depends, according to the Fundamental Hypothesis, upon the discovery of a collection of frequent, similar subevents in the existing coding of the environment. The difficulty of this task depends mainly on two factors: the a priori expectation that the fibres eventually decided upon would be chosen; and the time for which records have to be kept in order to pick out the subevents. The three basic techniques available are simple storage in a temporary associative memory, which allows collection of information over long periods; the associative access, which allows recall from small subevents, and hence eventually the selection of the appropriate fibres for a new unit; and the mountain climbing idea, which discovers the class once the population of fibres has been roughly determined. Only the third technique can be dealt with here.

The selection of a location for a new classificatory unit is simply a question of choosing a place where the relevant fibres distribute with an adequate contact probability.
The formation of evidence cells there is a problem which has already been discussed in §4: the formation of output cells is dealt with here.

Finally, the refinement problem arises because part of the hazard surrounding the formation of a new classificatory unit is that it is known in advance neither why it is going to be useful, nor of exactly what events it should be composed. When first created, therefore, the new classificatory unit is a highly speculative object, whose boundaries and properties have yet to be determined. The subsequent discovery of the appropriate boundaries (if such exist) is the refinement of the classificatory unit.

5.1. Setting up the neural representation: sleep

5.1.0. Introduction

It is convenient to begin with the second problem, of selecting a location and forming there a suitable neural structure. The reason is that the other two problems are best dealt with in the context of explicit neural models, and these are not complete enough until the apparatus necessary for the setting-up problem has been incorporated. For the purposes of this section, it will therefore be assumed that the subevents which are to make up the new classificatory unit have been decided upon in advance, and are held in a store. The problem then reduces to that of discovering a suitable location, and creating there the appropriate evidence and output cells.

5.1.1. Selecting a location

The natural method of discovering a suitable location is to form a representation in all those places which are suitable. For this, the whole cortex is, so to speak, placed in a suitably receptive state, and in those regions where enough information is received, a representation is automatically set up. Later refinement will select for the most successful, and not all of the representations initially set up will survive.
This method has two important advantages: first, it removes the difficulties which arise in computing where the appropriate fibres gather together with a large enough contact probability. The discovery of these special locations is better left to the method suggested, whereby it is a natural consequence of their existence. Secondly, the method allows the multiple formation of representations, which means that a single input can generate many different classes. There are often excellent grounds for categorizing information, and dealing with each category separately. For example, information about shape can profitably be classified separately from information about colour, and this could be implicit in the way the connexions are originally arranged. An area of cortex which received only information of a particular category would classify within that category. If many such areas existed, one piece of information could simultaneously cause classes in several categories to form. This is probably an important aspect of the solution to the partition problem (§1.3.3), but one which relies on the rough genetic specification of the categories.

5.1.2. Codon formation and sleep

The problems of what evidence functions to form, and how to form them, have been discussed in §4. It may turn out never to be necessary to use codon formation, since this technique is essential only where a standard codon transformation, with unmodifiable excitatory synapses (Marr 1969), does not produce evidence of sufficient quality. The finer the classifications required, however, the better the quality of the evidence must be; and the more sophisticated they are, the less certain it becomes that genetic information can provide pre-formed codons of the right type: so if codon formation is used at all, it will be used more in higher than in lower animals.
In §4.3.5, it was decided that the most likely technique for codon formation used Brindley synapses which become modifiable only at those times when codon formation takes place. Arguments were set out there for the view that this assumption does not have a complexity which is disproportionate to those concerning the other operations which must take place at these times.

It was pointed out in §4.3.3 that when the afferent synapses to codon cells are modifiable, only that information for which new evidence functions are required should be allowed to reach these cells. In §4.3.5, it was shown that information from which a new classificatory unit is to be formed will often come from a simple associative store, not directly from the environment. In §5.1.1 it was argued that the most natural way of selecting a location for a new classificatory unit was to allow one to form wherever enough of the relevant fibres converge. This requires that potential codon cells over the whole cerebral cortex should simultaneously allow their afferent synapses to become modifiable. Hence, at such times, ordinary sensory information must be rigorously excluded. The only time when this exclusion condition is satisfied is during certain phases of sleep.

The tentative conclusion of the theory is therefore that some cerebral codon cells have Brindley afferent modifiable synapses, which only become modifiable during sleep. The firm conclusion of the theory is that if the locations for new classificatory units are selected by the method of §5.1.1; if there exist plastic codon cells in the cerebral cortex; and if they use Brindley afferent modifiable synapses; then these synapses are modifiable only during the correct phases of sleep. A consequence of this phenomenon for the learning characteristics of the animal as a whole is set out in §7.6.

5.1.3. Output cell selection: generalities

No methods have so far been proposed for the selection of output cells for classificatory units.
The question was raised in §4.1 of whether more than one physical cell could profitably be used as the output for a single classificatory unit: it was concluded impracticable unless such cells formed independent representations. The problem of output cell selection is therefore that of finding a single, hitherto unused cell whose dendrites are favourably placed to receive synapses from most of the evidence cells created for the classificatory unit concerned. These codon cells will be clustered round the projection region of the relevant fibres, so the selection process has to work to choose a cell in the middle of that region. The methods available for cell selection are essentially the same as those described in §4.3 for codon formation (figure 5), but the arguments for and against each method are different in the present context. The methods are discussed separately.

5.1.4. Output cell selection: particularities

The final state of the output cell afferent synapses has been defined by the preceding theory: they must have strength which varies with $P(\Omega|c_i)$, each $c_i$. There is therefore not the distinction between different models for output cell selection that there was between models (1) and (2) of figure 5 for codon formation. If some model of this kind is used, the synapses must initially all have some standard excitatory power, which gradually adjusts to become $P(\Omega|c_i)$. The exact details of the way this happens will be the subject of §5.2, but the outline can be given here.

First, the cell will fire only when a significant number of afferent synapses are active: so it will only be selected for a set of events most of which it can receive. If there exists a single collection of common, overlapping subevents in its input, this collection will tend to drive the cell most often, and those synapses not involved in this collection will decay relative to those which are. Hence the cell will perform a kind of mountain climbing of its own accord.
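The outline above can be illustrated with a toy frequency count. This is only a stand-in under loud assumptions: the actual adjustment rule is deferred by the text to §5.2, so the sketch simply estimates, for each afferent $c_i$, how often the output cell fired when $c_i$ was active, with firing taken to mean that at least `threshold` of the cell's afferents are active. The function name `climb` and the example events are hypothetical.

```python
def climb(events, afferents: frozenset, threshold: int) -> dict:
    """Estimate P(cell fires | c_i active) for every afferent codon cell c_i."""
    seen = {i: 0 for i in afferents}        # times c_i was active
    fired_with = {i: 0 for i in afferents}  # times c_i was active and the cell fired
    for active in events:
        reaching = active & afferents
        fired = len(reaching) >= threshold
        for i in reaching:
            seen[i] += 1
            if fired:
                fired_with[i] += 1
    # Unseen synapses keep the standard initial excitatory power, taken here as 1.0.
    return {i: (fired_with[i] / seen[i] if seen[i] else 1.0) for i in afferents}


if __name__ == "__main__":
    mountain = [{0, 1, 2}, {0, 1, 3}, {1, 2, 3}]
    strays = [{4, 8, 9}, {0, 5, 9}]
    print(climb(mountain + strays, frozenset(range(6)), threshold=3))
```

In this example the synapses from cells that take part in the overlapping collection end with high weights, while those active only in stray events fall to zero, which is the relative decay the paragraph describes.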
There are two possible arguments against this scheme: first, such a system can only work successfully if there is just one significant mountain in the probability space over the events it can receive. This makes it rather bad at selecting a particular mountain from several, and responding only to events in that; so the cell will not be very adept at forming a specialized classificatory unit unless it is fed data in a very careful manner.

Secondly, some disquiet naturally arises over the conditions required for synaptic modification: that modification is sensitive to simultaneous pre- and post-synaptic activity. The $\Omega$-cell dendrite will need to collect from a wide range of $c_i$-cell axons, and will therefore be much larger than the $c_i$-cell dendrites. In such circumstances, it is far from clear that these conditions are realizable. The most reasonable kinds of hypothesis for synaptic modification by a combination of activities in pre- and post-synaptic cells concern activities in adjacent structures, not elements up to 1 mm apart. There are therefore some grounds for being dissatisfied with model (1) of figure 7, even supposing the mountain-climbing details turn out in a favourable way.

The second model (figure 7 (2)) is based on some kind of climbing fibre analogue. It is of course not a direct copy of the cerebellar situation, since there can exist no cerebral analogue of the inferior olivary nucleus. It works thus: suppose there exists a single collection of common, overlapping input events in the input space of $\Omega$, and let $a_1$ be one of the input fibres involved. Then most of the $c_i$ used for such events will occur frequently with $a_1$, since $a_1$ is itself frequently involved in such events. Now suppose $a_1$, as well as reaching $\Omega$ through orthodox evidence cells, also drives a climbing fibre to $\Omega$: then this will cause the modification of most of the $c_i$-cell synapses used in the collection of frequent events.
The cell $\Omega$ will then be found to have roughly the correct values of $P(\Omega|c_i)$ for most of the $c_i$, and the final adjustments can be made by the same methods as were used in model (1).

FIGURE 7. Two models for output cell selection. (1) Uses Brindley synapses, (2) uses Hebb synapses and a climbing fibre (CF).

In other words, the effect of tying modification conditions initially to a climbing fibre driven by something known to be correlated with the events of a mountain is to point the output cell $\Omega$ at that mountain. The use of a climbing fibre therefore, as well as eliminating difficulties about the implementation of synaptic modification, also removes the condition needed in model (1) that there should exist just one mountain in the event space to which $\Omega$ is exposed. With the climbing fibre acting as a pointer, there can be as many as you like: the only condition is that the more there are, the more specific the pointer has to be.

5.1.5. Driving the climbing fibre

The exact details of both these techniques will be analysed in §5.2, but before leaving this section, it is worth discussing the kind of way in which the climbing fibres may be driven. One possibility is the method already mentioned, where the climbing fibre is driven by one of the input fibres of the event space of $\Omega$. This will do for many purposes, but it may not always provide a specific enough pointer.

The alternative method is to drive the $\Omega$-cell by a climbing fibre whose action is more localized in the event space $\mathfrak{X}$ for $\Omega$ than the simple fibre $a_1$. In this scheme, the climbing fibre is driven by a cell near the $\Omega$-cell, and one which consequently fires only when there is considerable evidence-cell activity near $\Omega$. This cell then acts as a more specific pointer than a simple fibre would, and is called an output selector cell (see figure 8).
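The pointer role of the climbing fibre can be sketched in the same toy style. The assumptions are stated plainly: Hebb synapses start ineffective (weight 0), each trial records which codon cells were active and whether the climbing fibre fired, and the weight of a synapse is taken as the fraction of its activity that coincided with climbing fibre activity. The function name `point_with_climbing_fibre` and the two example mountains are hypothetical.

```python
def point_with_climbing_fibre(trials, n_cells: int) -> list:
    """Weight of each c_i synapse: fraction of its activity paired with the climbing fibre."""
    seen = [0] * n_cells
    paired = [0] * n_cells
    for active, cf_active in trials:
        for i in active:
            seen[i] += 1
            paired[i] += int(cf_active)   # conjunction of pre-synaptic and CF activity
    # Hebb synapses are initially ineffective, so unseen synapses stay at 0.0.
    return [paired[i] / seen[i] if seen[i] else 0.0 for i in range(n_cells)]


if __name__ == "__main__":
    trials = [
        ({0, 1, 2}, True),    # events of the mountain the pointer is correlated with
        ({1, 2, 3}, True),
        ({4, 5, 6}, False),   # events of a second mountain, pointer silent
        ({5, 6, 7}, False),
        ({0, 3, 4}, True),    # an overlapping event
    ]
    print(point_with_climbing_fibre(trials, 8))
```

Here two mountains coexist in the event space, yet only the synapses used by the mountain correlated with the pointer acquire weight, which is the sense in which the climbing fibre removes the single-mountain condition of model (1).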
It is an elementary refinement of this idea to have more than one climbing fibre attached to a given cell Ω, which then requires activity in several to be effective in causing synaptic modification. The crucial thing about the climbing fibre input is that it should provide a good enough rough guide to the events at which Ω should look for Ω eventually to be able to discriminate a single mountain from the rest of its event space.

FIGURE 8. The fundamental neural model, obtained by combining the models of figures 6 and 7 (2). Two climbing fibres are shown; one from an input fibre, and one from a nearby output selector cell T.

It is important also to note that this kind of system can be used directly to discover new classificatory units. As long as no codon formation is required, climbing fibres can cause the discovery of mountains (i.e. new classificatory units) directly on the incoming information. Provided that the connectivity is suitable (i.e. that information gets brought together in roughly the correct way), new classificatory units will form without the need for any intermediate storage.

5.2. The spatial recognizer effect

5.2.0. Introduction

The process central to the formation of new classificatory units is the discovery that events often occur that are similar to a given event over a suitable collection of fibres. This was split in §1.4 into the partition problem, which concerns the choice of roughly the correct collection of fibres; and the problem of selecting the appropriate collection of events over those fibres. The second part of this problem has been discussed in connexion with ideas about mountain climbing, and an informal description of the solution has been given in §5.1.
The essence of this solution is that an output cell performs the mountain climbing process naturally, and if started by a suitably driven climbing fibre in roughly the correct region of the event space to which it is sensitive, it will ultimately respond to the events in the nearby mountain. In this section, a closer examination of this process is made.

5.2.1. Notation: the standard (k, M)-plateau

The notation for this section will be slightly different from usual, since the output cell Ω is sensitive to events E over 𝔛 only in terms of the evidence functions c_i (1 ≤ i ≤ K). It is therefore convenient to construct the space 𝔜 of all events of size k over the set {c_1, ..., c_K}. Each input event E over 𝔛 is translated into an event Y = Y(E) over 𝔜, and for the sake of simplicity, it will be assumed that each input event E causes exactly k of the c_i (1 ≤ i ≤ K) to take the value 1. As far as Ω is concerned, the events with which it has to deal occupy a code of size k over {c_1, ..., c_K}. The c_i are imagined to be active in translating input events for many output cells other than Ω, and this allows the further simplifying assumption that all the c_i are 1 about equally often: that is P(c_i) = P(c_j), all 1 ≤ i, j ≤ K. Only those events which occupy k fibres concern Ω, and the relative frequencies of these are described by the probability distribution λ* (say) over 𝔜. λ* is the probability distribution the environment induces over 𝔜, and is derived from the input distribution λ over 𝔛. Both λ and λ* have mountainous structure, but if 𝔜 is obtained from 𝔛 by a codon transformation, the mountains in 𝔜 are more separated than their parents in 𝔛.

The term 'mountain' has hitherto had no precise definition. It is not known exactly what kinds of distribution are to be expected, so some kind of general function has to be set up out of which all sensible mountains may be built.
This is what motivates the following

Definition. Let μ be the probability distribution over 𝔜 defined thus: let M ≤ K, and

\[ \mu(Y) = \begin{cases} \binom{M}{k}^{-1} & \text{if } Y_i = 0 \text{ for all } i > M, \\ 0 & \text{otherwise.} \end{cases} \]

Then μ is a standard (k, M)-plateau over c_1, ..., c_M.

That is, μ ascribes a constant value to the probability of every event which gives c_i = 0 for all fibres outside some chosen collection {c_1, ..., c_M}; since there are \binom{M}{k} such events of size k, that constant is \binom{M}{k}^{-1}. The collection {c_1, ..., c_M} is called the support of the plateau, and is written S(μ). A simple mountain μ* is one that can be built up out of plateaux μ_i with nested supports: i.e.

\[ \mu^* = \sum_{i=1}^{p} w_i \mu_i, \quad \text{where } \sum_{i=1}^{p} w_i = 1 \text{ and } S(\mu_1) \supseteq S(\mu_2) \supseteq \cdots \supseteq S(\mu_p). \]

In the absence of any better guesses about what kind of distributions should be studied, this section will deal with simple mountains. The fact that they can so simply be constructed from standard plateaux means that it is in fact enough to study the properties of standard plateaux. Further, we shall consider plateaux over the event space generated by the codon functions for a given classificatory unit, rather than plateaux over the event space generated by the input fibres. This is because the crucial operations occur at the output cell, which receives only evidence fibres.

5.2.2. Climbing fibres and modification conditions

Without loss of generality, it may be assumed that the output cell Ω receives only one climbing fibre, which will be represented by the function φ(t) of time. φ cannot in general be regarded as a function from 𝔜 to {0, 1} since φ may take the value 1 at a time when there is no event in 𝔜. Some kind of relation between φ and the events of 𝔜 has to be assumed; it is that the conditional probability P(φ|c_i) is well-defined and independent of time. The climbing fibre input to Ω is closely related to the conditions for synaptic modification at Ω, but there are two possible views about the exact nature of this relation.
One is that the climbing fibre is all-important in determining the strength of the synapse from c_i to Ω, and on this view, the strength varies with P(φ|c_i). The cell Ω really diagnoses φ if this is so, but it will be shown in §5.2.3 that if the structure of λ* over 𝔜 is appropriate, this will be adequate.

The other possible view is that φ acts as a pointer for Ω. On this model, the effect of φ is to set the values of the synaptic strengths at P(φ|c_i) initially. The true conditions for synaptic modification are simultaneous pre- and post-synaptic activity. It is a little difficult to see how the climbing fibre should be dealt with after it has set up the initial synaptic strengths, so in the theory of §5.2.4, it is regarded simply as doing this, and is then ignored. This is an approximation, but seems the best one available. The true situation probably lies somewhere between those described in §§5.2.3 and 5.2.4.

5.2.3. Mountain selection with P(Ω|c_i) = P(φ|c_i)

Let [p, q] denote the plausibility range of Ω. The state of Ω's afferent synapses can be represented by the vector ω = (ω_1, ..., ω_K) where ω_i = P(Ω|c_i), and it is assumed for this model that ω is fixed: that the climbing fibre is the supreme determinant of the synaptic strengths. Let X ∈ 𝔛. Then X has a representation as a vector Y = (Y_1, ..., Y_K) ∈ 𝔜 with exactly k of the Y_i = 1, and all the rest zero. Let · denote the scalar product of vectors in the usual way: that is ω · Y = Σ_i ω_i Y_i. Then the cell Ω responds to X iff Σ_i c_i(X) P(φ|c_i) ≥ kp, i.e. iff ω · Y ≥ kp. Hence N, the set of events to which Ω responds, is given by

\[ N = \{ X \mid \omega \cdot Y \ge kp \}. \tag{1} \]

The following example shows how this may work adequately in practice. Let μ denote the standard (k, M)-plateau on {c_1, ..., c_M}, M < K, and let ν denote the standard (k, N)-plateau on {c_{S+1}, ..., c_{S+N}} where 1 < S < M < S + N < K. Suppose φ = c_1.
If the input distribution λ* = μ we have

\[ P(\phi \mid c_i) = \begin{cases} 1 & (i = 1), \\ \alpha & (1 < i \le M), \\ 0 & (M < i \le K). \end{cases} \]

If λ* = ν we have P(φ|c_i) = 0, all i > S. For the mixed distribution λ* = ½(μ + ν) the value for the fibres shared by the two plateaux falls to some β, so that

\[ P(\phi \mid c_i) = \begin{cases} 1 & (i = 1), \\ \alpha & (1 < i \le S), \\ \beta & (S < i \le M), \\ 0 & (M < i \le K). \end{cases} \]

Hence if the lower limit of the plausibility range [p, q] of Ω is p = k^{-1}(Sα + (k − S)β), the cell Ω will respond to E if and only if μ(E) ≠ 0. Thus the output cell Ω has selected the mountain μ from the distribution λ* = ½(μ + ν) even though the climbing fibre φ did not. This is the crucial property which the system possesses. In general, if φ = c_1, Ω will select the events of any plateau containing c_1 in its support, and can therefore be made (by suitable choice of p) to reject all events of other plateaux which do not fall into such a plateau.

The relation (1) can be used to construct the explicit condition that a climbing fibre induces Ω to respond to a particular set of events. If ω is the climbing fibre vector ω = (P(φ|c_1), ..., P(φ|c_K)) and N_Ω = {X | ω · Y ≥ kp}, then the events N can be selected out of {𝔛, λ} iff λ(N_Ω Δ N) = 0; i.e. the probability under the input distribution λ that an event occurs which is in exactly one of N_Ω, N is zero.

5.2.4. The spatial recognizer effect

In the more general case, φ acts as a starting condition rather than permanently defining the strength of the synapse from c_i to Ω (1 ≤ i ≤ K). The subsequent strengths of these synapses depend on and only on P(Ω|c_i). Write P(φ & c_i) = ω_0^i, 1 ≤ i ≤ K and let θ = kpP(c_i). Since P(c_i) = P(c_j), all 1 ≤ i, j ≤ K (§5.2.1), the initial firing condition for Ω is simply Σ_i ω_0^i c_i(X) ≥ θ. As before write ω_0 = (ω_0^1, ..., ω_0^K) as a vector: ω_0 defines the state of the afferent synapses to Ω. If Y is the usual vector (consisting of 0's and 1's) which represents the event X over {c_1, ..., c_K}, the firing condition for Ω is

\[ \omega_0 \cdot Y \ge \theta. \tag{2} \]

The difference here is that ω is now a variable.
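Before the variable case is developed, the fixed-weight selection of §5.2.3 can be checked numerically on a small case. The following Python sketch is an illustration only, under assumptions that are not in the text: the sizes k = 2 and K = 7, the supports {c_1, ..., c_4} and {c_3, ..., c_6}, the helper names, and a threshold kp chosen by inspection for this toy case rather than taken from the expression for p given in §5.2.3. In so small an example the c_i are not active equally often, so the simplifying assumption of §5.2.1 does not hold exactly.

```python
from fractions import Fraction
from itertools import combinations


def plateau(support, k):
    """Standard plateau: equal probability on every size-k event inside `support`."""
    events = [frozenset(e) for e in combinations(sorted(support), k)]
    return {e: Fraction(1, len(events)) for e in events}


def mixture(mu, nu):
    """The distribution (mu + nu) / 2 over the union of the two event sets."""
    lam = {}
    for dist in (mu, nu):
        for event, p in dist.items():
            lam[event] = lam.get(event, Fraction(0)) + p / 2
    return lam


def climbing_fibre_weights(lam, phi, K):
    """w[i] = P(phi | c_i), where the climbing fibre copies fibre number `phi`."""
    w = {}
    for i in range(1, K + 1):
        p_ci = sum((p for e, p in lam.items() if i in e), Fraction(0))
        p_both = sum((p for e, p in lam.items() if i in e and phi in e), Fraction(0))
        # Fibres that are never active get weight 0 (an assumption of this sketch).
        w[i] = p_both / p_ci if p_ci else Fraction(0)
    return w


def score(w, event):
    """The scalar product w . Y for the 0/1 vector Y of `event`."""
    return sum((w[i] for i in event), Fraction(0))


# Toy sizes (assumed for illustration): k = 2 active fibres out of K = 7.
k, K = 2, 7
mu = plateau({1, 2, 3, 4}, k)   # plateau on c_1..c_4
nu = plateau({3, 4, 5, 6}, k)   # overlapping plateau on c_3..c_6
lam = mixture(mu, nu)

w = climbing_fibre_weights(lam, phi=1, K=K)

# Threshold kp chosen by inspection: the smallest score of any event of mu.
threshold = min(score(w, e) for e in mu)
selected = {e for e in lam if score(w, e) >= threshold}

print(sorted(sorted(e) for e in selected))
```

With these values the response set coincides with the events of μ, although φ = c_1 itself is active in only three of them.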
The point is that the vector ω depends on the input distribution λ, and on those events to which (by (2)) Ω responds. Define N_θ(ω_0) ⊆ 𝔛 by N_θ(ω_0) = {X | ω_0 · Y ≥ θ}. Define the new vector

\[ \omega_1 = (\omega_1^1, \ldots, \omega_1^K), \qquad \omega_1^i = \lambda\bigl(\{ X \in N_\theta(\omega_0) \mid c_i(X) = 1 \}\bigr). \tag{3} \]

That is, the co-ordinates ω_1^i of ω_1 are simply the projections onto the c_i of the restriction λ|N_θ(ω_0) of λ to N_θ(ω_0). Then ω_1 represents the state of the synapses from the c_i to Ω if Ω responds only to the events in N_θ(ω_0).

FIGURE 9. The state vector ω_j, which describes the strengths of the afferent synapses to the output cell Ω, determines the set N_θ(ω_j) of events to which Ω will respond. This in turn determines a new state vector ω_{j+1}. Equilibrium occurs when ω_j = ω_{j+1}.

The situation is thus that in the state ω_0, the cell Ω responds only to events in N_θ(ω_0): exposure to such events may be expected to change the state vector ω_0 into ω_1, from where the process is repeated. This generates a series of successive transformations of the state vector ω for Ω, and this is called the spatial recognizer effect (see figure 9).

Theorem. The state vector achieves equilibrium iff there exists a j such that ω_j = ω_{j+1}.

Proof. In equilibrium, the set of events N_θ(ω_j) to which the cell Ω responds specifies a state vector ω_{j+1} such that λ(N_θ(ω_j) Δ N_θ(ω_{j+1})) = 0: each co-ordinate of (ω_j − ω_{j+1}) is the projection onto a c_i of λ|N_θ(ω_j) Δ N_θ(ω_{j+1}), and so is zero. Thus ω_j − ω_{j+1} = 0, and ω_j = ω_{j+1}.

In the simple example λ* = ½(μ + ν) of §5.2.3, equilibrium is achieved in exactly one step. As already observed, ω_0 is defined by

\[ \pi \omega_0^i = \begin{cases} 1 & (i = 1), \\ \alpha & (1 < i \le S), \\ \beta & (S < i \le M), \\ 0 & (M < i \le K), \end{cases} \]

where π^{-1} = P(c_i), and is constant. For p = k^{-1}(Sα + (k − S)β), the cell Ω responds only to those events X with μ(X) ≠ 0 [...] so that ω_1 has the following specification: πω_1^i = 1 (1 ≤ i ≤ S) [...]
This result extcnds to any simple nlountainp*, /L* whore I', = ci E X(P'~) = S(pI)is an element in it>ssupport. 223 + ... + w,y,], = w,/L, 5.2.6. A yeneral chn~acterizalionof the recognix~rejfect 4 t is natural to scck sornc elegant way of describing the spatial recognizer effect. In thc following informal argutnent, a characterization is given in terms of a search for steepest ascents in 9 under A'$. This effectively puts a stop to any attempt to produce a necessary and sufficient condition that the starting state w, should lead to a particular final state w*, since the general question depends upon the detailed stnlvtnre of A. The answer that i t does if and only if a line of steepest ascent leads F~c.r-nc10. Tht state vcctor o clctrrlllincs the seL Al0(w) 01 ~ v o n t sto urh~chS2 rc~sponds.Tlir cm\~iro~rmeutel probohil~tydistribution over NB(w)IS stippled where ~t has non-zero distributiorr coirrcidc velurs. w changcs so as Lo tmd to rnake the contrc of g~3evityof t h ~ b with tho centrr of NO(w).This 1s thc principle behind Q's ability Lo prrform i~mountainoperation. eL~~nhirrg there is probably its own neatest characterization. It is convenient to make the restriction that p, the lower lirnit to the plausibility range for $2, is variable, and varies to keep the average amouilt of activity of $2 constant; i.e. p is such that P J dA is constant, for all response neighbourhoods NO(oi)(dcfined by equation (2) J\'~(w&) of 9 5.2.4). Write hT, (w) = (,Y) Y . w 3 0 ) (see figure 10). ca, moves to w, giver1 by the projections onto the c, of the restriction AINO(w)of A to rV,(w). (Compare (3) of $5.2.4.)Kow w, effectively measures thc centre of gravity so t o speak of the events in No (w)since if w, = (w:, ..., (I):'), oi varies with the expected probability that ci = 1 in IY,( w ) under h. 
Since, in each event X of N_θ(ω), exactly k of the c_i have the value 1, this means that the response area of Ω moves towards that region of N_θ(ω) which contains the closest, most common events. Ω is attracted both by commonness, and by having many events close to one another all having non-zero probability. The way these two kinds of merit compete is approximately that the movement which maximizes the expectation of ω · Y over N_θ(ω) is the one which is actually made: but the full result along these lines is complicated. In fact, the move is the one which has the best chance of maximizing this expectation. Thus ω moves to climb gradients in the scalar function E(ω · Y) taken over the response area defined by ω. A proof of this result will appear elsewhere.

5.3. The refinement of a classificatory unit

The refinement of a classificatory unit is the discovery of such appropriate boundaries as it might have. There are two kinds of information on which this process can be based: they are the frequencies of the subevents on which the unit is defined, and the correlation of instances of the unit with properties not included in its definition (i.e. support). The modification of a classificatory unit on the basis of its subevent distribution is called its intrinsic refinement, and has essentially been dealt with in §5.2: alteration made as a result of comparison with external properties is called extrinsic refinement, and will be discussed briefly here.

General extrinsic refinement requires a simple memory; but it basically consists of the same kind of mountain climbing techniques as intrinsic refinement. The only piece of the problem that can be discussed at the moment is the hardware needed for it. It is appropriate to deal with this now, since the necessary machinery must appear in the fundamental neural model.
There exist three main strategies for the extrinsic refinement problem: they are characterized by the change during refinement of the number of subevents to which the output cell Ω will give a positive diagnosis. This number can increase, decrease, or remain about the same. The basic point is that the strategy which requires the number to decrease is the one which is easiest to implement, since it is easier to remove events from the response area of Ω than to add them. This is because the only way of adding an event to Ω's response area is by stimulating the climbing fibre. This needs some way of gaining access to the correct climbing fibre cell. The models of §5.1 for output cell selection make this difficult, since one of the key points in their design was the absence of a special climbing fibre for each output cell, and alternative schemes are unacceptably complicated. The other possibility for adding events to Ω is to use an associational path to Ω itself (for example, the basilar dendrite afferents of figure 5): but it was thought (§4.1.8) that the associational activity of the Ω-cell should not have this kind of ability to influence the strengths of the synapses arising from more direct inputs. Finally, there can be no guarantee that the existing evidence functions for Ω can cope with a new event.

Given these difficulties, it is natural to examine the possibility of refining a classificatory unit by eliminating inappropriate events from its response field. The main advantage such a method produces is that a general inhibitory influence acting over all output cells (in a particular region) can be used to alter values of P(Ω|c_i) for one particular Ω in a way in which a general excitatory influence cannot. For suppose the event E is to be cut out: this must be achieved by allowing E to enter the c_i-cells for Ω while preventing the formation of modification conditions at Ω itself.
If the chance that E should be interpreted in a cell near Ω is small, this effect can be achieved by applying a general inhibitory signal to all the output cells in the region containing Ω. Hence the only additional hardware this method requires is a fairly non-specific inhibitory input to all output cells. This does not appear in figure 8, since its derivation from the theory is less firm than that of the other elements which appear there.

§6. NOTES ON THE CEREBRAL NEOCORTEX

6.0. Introduction

The present theory receives its most concrete form in the neural model of figure 8. In this section, the fine structure of the cerebral cortex is reviewed in the light of that model. Anyone familiar with the present state of knowledge of the cerebral cortex will anticipate the sketchy nature of the discussion, but enough is probably known to enable one to grasp some at least of the basic patterns in the cortical design.

It need scarcely be said that cerebral cortex is much more complicated than that found in the cerebellum. Nothing of note has been added to the researches of Cajal (1911) until comparatively recently (Sholl 1956; Szentágothai 1962, 1965, 1967; Colonnier 1968; Valverde 1968), because Cajal's work was probably a contribution to knowledge to which significant additions could be made only by using new techniques. Degeneration methods have since been developed, and the electron microscope has been invented; so there is now no reason in principle why our knowledge of the cerebral cortex should not grow to be as detailed as that we now possess of the cerebellar cortex. It is, as Szentágothai (1967) has remarked, a Herculean undertaking; but it is within the range of existing techniques.

6.1. Codon cells in the cerebral cortex

6.1.1. The ascending-axon cells of Martinotti

The main source of information for this section is the description by Cajal (1911) of the general structure of the human cerebral cortex.
The codon cells of the cerebellum are, according to Marr (1969), the granule cells, whose axons form the parallel fibres. The basic neural unit of figure 8 has analogies with the basic cerebellar unit (one Purkinje cell, 200000 granule cells, and the relevant stellate and Golgi cells, in the cat), so it is natural to look for a similar kind of arrangement in the cerebral cortex.

The first point to note is that cerebral cortex, like cerebellar cortex, has a molecular layer. According to Cajal (p. 521) this has few cells, and consists mostly of fibres. The dendrites there are the terminal bouquets of the apical dendrites originating from pyramidal cells at various depths. Most pyramidal cells, and some other kinds, send dendrites to layer I, so there is a clear hint in this combination that some such cells may act as output cells.

The great need is for the axons of the molecular layer to arise mainly from cells which may be interpreted as codon cells. Cajal himself was unable to discover the origins of the axons of the molecular layer, and probably believed they came mainly from the stellate cells there. The problem was unresolved until Szentágothai (1962) invented a technique for making small local cortical ablations without damaging the blood supply, and was at last able to determine the true origin of these mysterious fibres. It is the ascending-axon cells of Martinotti, which are situated mainly in layer VI in man.

This fundamental discovery showed that the analogy with cerebellar cortex is not empty, for the similarity of the ascending-axon cells of Martinotti to cerebellar granule cells is an obvious one. There are, however, notable differences; for example, the Martinotti cells are much larger than the cerebellar granules; and in sensory cortex, primary afferents do not terminate in layer VI. The interpretation of Martinotti cells as cerebral codon cells raises five principal points, which will be taken separately.
The first is the cells of origin of their excitatory afferent synapses. There is unfortunately rather little information available about this, but it appears from Cajal's description that the following sources could contribute fibres:
(i) The collaterals of the pyramidal cells of layers V, VI and VII.
(ii) Descending axons from the pyramids of IV.
(iii) Collaterals of fibres entering from the white matter.
(iv) Local stellate cells.
It would best fit the present theory if intercortical association fibres formed their main terminal synapses with these cells, and the collaterals of the pyramidal cells in layers V to VII were relatively unimportant. There is some evidence that association fibres tend to form a dense plexus in the lower layers of the cortex (Nauta 1954; and Cajal 1911, pp. 584-5).

The second point is that the Martinotti cells would have to have inhibitory afferent synapses driven by the equivalent of the G-cells which appear in figure 8. The effect of these synapses should be subtractive rather than divisive, so that to be consistent with the ideas about inhibition expressed in §4.1 on output cell theory, the synapses from the G-cells should be distributed more or less all over the dendrites of the Martinotti cells. (There is some evidence that this is so for certain cells of layer IV in the visual cortex of cat (Colonnier 1968), but it rests upon an as yet unproved morphological diagnosis of excitatory and inhibitory synapses.) This is in direct contrast to what the theory predicts for output cells, a distinct fraction of whose inhibitory synapses should be concentrated at the soma.

The third point concerns the possible independence of the dendrites of Martinotti cells. These cells commonly have a quite large dendritic expansion, and it may be unreasonable to expect much interaction between synapses on widely separated branches.
The effect, if their afferent synapses are unmodifiable, is to enable the cell as a whole to operate as the logical union of m (R, θ)-codons (where m is the number of independent dendrites) instead of as a single (mR, θ)-codon: the advantage of this is a better quality of evidence function.

The fourth point concerns the possibility that the excitatory afferent synapses to Martinotti cells may be modifiable: this has been discussed in §5.1.2. If these synapses are Brindley synapses, then the dendrites may be independent from the point of view of synaptic modification, as well as in the way described in point three. If there is some kind of climbing fibre arrangement, the fibres must be driven from some external source, and must be allowed to operate only when codon formation is required. The second possibility could allow the modification condition to operate simultaneously over the whole cell. It has been seen, however, that climbing fibres are unlikely to be used. If location selection proceeds as described in §5.1.1, the Martinotti afferent synapses are modifiable only during the correct phase of sleep.

Fifth and last, it is a simple consequence of the present theory that Martinotti cells should be excitatory, and should send axons to synapse with five types of cell: the output cells, whose ordinary excitatory afferent synapses are modifiable; the two types (S and D) of inhibitory cell; the Martinotti threshold controlling cells, the G-cells; and perhaps output cell selector cells, whose axons terminate as climbing fibres on output cells. A Martinotti axon may itself under certain circumstances terminate as a climbing fibre as well as making crossing-over synapses with output cells; but this possibility may be excluded for developmental reasons.

6.1.2. The cerebral granule cells

In layer IV of granular cerebral cortex,
there are found a large number of small stellate cells, 9 to 13 μm in diameter, whose fine axons end locally. This layer is especially well developed in primary sensory cortex, where it sees the termination of the majority of the afferent sensory fibres. It has long been believed that such fibres synapsed mainly with the granule cells (Cajal 1911). Szentágothai (1967) has, however, pointed out that many sensory afferents in fact terminate as climbing fibres on the dendritic shafts passing through IV, and believes this may be an important method of termination. Valverde (1968) has made a quantitative study of the amount of terminal degeneration in the different cortical layers of area 17 of mouse after enucleation of the contralateral eye, and has demonstrated that about 64% occurs in layer IV, the other principal contributions being from the adjacent layers III and V.

In view of the abundance of granule cells in layer IV, it is difficult to imagine that the afferent fibres never synapse with them, and so likely that the traditional view is correct. There can be no doubt that afferents also terminate as climbing fibres, and the possibility that both these things happen fits very neatly with the predictions of the present theory. These views support the interpretation of the granule cells as codon cells, in which case the remarks of §6.1.1 about Martinotti cells may be applied to them.

An interesting characteristic of granule cells is that they are often very close to raw sensory information, in a way in which the Martinotti cells are not. They will therefore not support classificatory units which rest on much preceding cortical analysis: that is, classificatory units for which, if it occurs at all, codon formation is most likely to be used. The theory therefore contains the slight hint that the Martinotti cells may be the plastic codon cells, and the granule cells the pre-formed codon cells.
The consequence of this would be that the Martinotti cells have modifiable afferent excitatory synapses, while the granule cells have unmodifiable afferent synapses.

6.2. The cerebral output cells

The present theory requires that candidates for output cells should possess the following properties:
(i) A dendritic tree extending to layer I and arborizing there to receive synapses from Martinotti cells.
(ii) An axon to the white matter, perhaps giving off collaterals.
(iii) Inhibitory afferent synapses of two general kinds: one, fairly scattered over the main dendrites, and performing the subtractive function: the other clustered over the soma, performing the division.
(iv) Climbing fibres over their main dendritic trees.
(v) A mixture of modifiable and unmodifiable afferent synapses. Those synapses from codon cells (Martinotti and granule cells) should initially be ineffective (or have some fixed constant strength), but should be facilitated by the conjunction of pre-synaptic and post-synaptic (or possibly just climbing fibre) activity, so that the final strength of the synapse from c to Ω varies with P(Ω|c). These synapses should certainly be modifiable during the course of ordinary waking life, and should probably be permanently modifiable.

The cortical pyramidal cells of layers III and V are the most obvious candidates for this rôle. According to Cajal (1911), they satisfy (i) and (iv), and (ii) (Szentágothai 1962). The evidence for (iii) is indirect, but these cells receive axosomatic synapses of the basket type, and these have been shown to be inhibitory wherever their action has been discovered (in the hippocampus (Andersen, Eccles & Løyning 1963), and the cerebellum (Andersen, Eccles & Voorhoeve 1963)). Various kinds of short axon cell exist in the cortex; there are probably enough to perform the subtraction function (§6.4).

The axon collaterals of these pyramidal cells could perform two functions.
Either they can themselves act as input fibres to nearby Martinotti cells; this would enable two successive classifications to be performed in the same region of cortex. Or they could act as association fibres, synapsing with the basilar dendrites of neighbouring output cells. This would be useful if nearby cells dealt with similar information, but not necessarily useful otherwise (Marr 1971).

6.3. Cerebral climbing fibres

One of the crucial points about the output cells is that they should possess climbing fibres. The various possible sources of these were discussed in §5, where it was stated that there might be two origins: afferent fibres themselves, and cells with a local dendritic field.

The first observation of cerebral climbing fibre cells was made by Cajal (1911), who describes certain cells with double protoplasmic bouquet, as follows. 'The axon filaments [of these cells] are so long that they can extend over the whole thickness of the cortex, including the molecular layer.... If one examines closely one of the small, parallel bundles produced by the axons of these cells, one notices between its tendrils an empty, vertical space which seems to correspond in extent to the dendritic stem of a large or medium pyramidal cell. Since the axon of one of these double-dendrite neurons can supply several of these small bundles, it follows that it can come into contact with several pyramidal cells' (pp. 540-541).

Cajal saw these fibres only in man, but Valverde (1968) has beautiful photomicrographs of some coursing up the apical dendrite of a cortical pyramidal cell of the mouse, so they clearly exist in other animals. Szentágothai (1967) has found that various types of cell can give rise to such fibres, and remarks that specific sensory afferents often terminate in this way.

The cortical cells which give rise to climbing fibres have been called output cell selectors.
The theory requires that they possess a rather non-specific set of afferents, so that those cells in the centre of an active region of the cortex receive most stimulation. Such cells may also possess afferent inhibitory synapses to prevent their responding to small amounts of activity. The present theory does not favour the view that cells other than output cells should possess climbing fibres, but it does not absolutely prohibit it.

6.4. Inhibitory cells

The basic theoretical requirements for inhibition in the cerebral cortex would be satisfied by having three types of inhibitory cell. Two should act upon the output cells, one synapsing on the dendrites, and one on the soma; and one, the analogue of the cerebellar Golgi cells, on the codon cells.

6.4.1. The subtractor cells

The first place in which to look for inhibitory cells for the subtraction function is the molecular layer I, where the Martinotti axons meet the pyramidal cell dendrites. This layer does contain some cells: it is wrong to believe that it consists of nothing but axons and dendrites. Cajal remarks upon the abundance of short axon cells there, stating that in number and diversity they achieve their maximum in man. He distinguishes (pp. 524-525) four main types: ordinary, voluminous, reduced, and neurogliaform. The last are like the dwarf stellate cells which appear frequently in other cortical layers.

The short axon cells can be interpreted as performing the role of subtraction on the output cell dendrite. They and their homologues are common throughout the cortex. The small size and great complexity of many of their axons and dendrites enable them to assess accurately the amount of fibre activity in their neighbourhood, so it does not require undue optimism to imagine that they can provide about the correct amount of inhibition.
For this purpose, the more there are of such cells, the smaller and more complex their axonal and dendritic arborization, the more accurate will be their estimates of the amount of inhibition required. The neurogliaform cells therefore seem most suited to this task.

6.4.2. The division cells

The requirements of cells providing inhibition at a pyramidal cell soma for the function of division are different. Their action is concentrated in one place, and does not need to be accurately balanced over the dendritic field in the way that the subtraction inhibition must. The division inhibition can therefore be provided by a sampling process with convergence at the soma. The details of this sampling must depend on the distribution to the Martinotti and granule cells of the afferent fibres, and are based on the same principles as govern the distribution of the cerebellar basket cell axons. There is no doubt that the pyramidal cells of layers III and V possess basket synapses (Cajal 1911), but Cajal does not describe them for those of layer II, which otherwise look like output cells. Colonnier (1968) has however studied the pyramids of III in area 17 in some detail, and has shown that, while synapses on the somas of these cells are not densely packed, they do exist, and are exclusively of the symmetrical type with flattened vesicles. It would be interesting to have some comparative quantitative data about somatic synapses on pyramidal cells of different layers in the cortex.

6.4.3. The codon cell threshold controls

The control of the Martinotti and granule cell thresholds requires an inhibitory cell which, like the cerebellar Golgi cell, is designed to produce a roughly constant amount of codon cell activity. There are various short axon cells in layers IV and VI which might perform this rôle, but no evidence is available about the cells to which they send synapses. The obvious candidates in IV are the dwarf cells (Cajal 1911, p.
565) and perhaps the horizontal cells; and in VI, the dwarf cells and stellate cells with locally ramifying axon. For the control of Martinotti cell thresholds, it seems probable that the device of an ascending dendrite should be used to assess the amount of activity in the molecular layer. This could be done, for example, by an inhibitory pyramidal or fusiform cell with basilar and ascending dendrites, and locally arborizing axon. Such a cell would possess no climbing fibre, nor any modifiable afferent synapses. There exist various fusiform cells in layers VI and VII which might do this, but there is too little data available to know for certain.

6.5. Generalities

The theory expects output cells to fire at different frequencies, and it expects output cells at one level to form the input fibres for the next. It is therefore implicit in the theory that input fibres a_i(t) should take values in the range [0, 1], and should not be restricted simply to the values 0 and 1. The theory has been developed here only for the simple case of binary-valued fibres. Its extension to the more general case is a technical matter, and will be carried out elsewhere.

Finally, it is unprofitable to attempt a comprehensive survey of cortical cells at this stage: neither the theory nor the available facts permit more than the barest sketch. It is most unsatisfying to have to give such an incomplete series of notes, and I write these reluctantly. It does, however, seem essential to say something here. It both illustrates how the theory may eventually be of use, and indicates the kind of information which it is now essential to acquire. More notes on the cerebral cortex will accompany the Simple Memory paper, but until then, it seems better to err on the side of reticence than of temerity.

7.0.
Introduction

In this section are summarized the results which are to be expected to hold if the theory is correct, together with an assessment of the firmness with which the individual predictions are made. The firmness is indicated by superscripted stars accompanying the prediction, the number of stars increasing with the certainty of the statement they decorate. Three stars*** indicates a prediction which, if shown to be false, would disprove the theory: two stars** indicates that a prediction is strongly suggested, but that remnants of the theory would survive its disproof: one star* indicates that a prediction is clear, but that its disproof would not be a serious embarrassment, since other factors may be involved; and no stars indicates a prediction which is strictly outside the range of the theory, but about which the theory provides a strong hint.

7.1. Martinotti cells

Each Martinotti cell should have many inputs***, mainly from intercortical association fibres**, which should terminate by means of excitatory synapses***. Each should also have inhibitory inputs***, subtractive in effect** and therefore widely distributed over the dendrites**. These should be driven by local cells*** with locally arborizing axon***, designed to keep the amount of Martinotti cell activity evoked by different inputs roughly constant**. Excitatory Martinotti cell afferent synapses are probably modifiable*, and if they are modifiable, they are probably Brindley synapses*, becoming modifiable only during the correct phases of sleep*. If location selection proceeds as in § 5.1.1, and if these synapses are modifiable, then they are modifiable only during the correct phases of sleep***. Martinotti cell dendrites are probably independent.
The output from these cells is excitatory***, and goes to output (pyramidal) cells*** through modifiable synapses***, three** kinds of inhibitory cells*** through unmodifiable synapses***, and to output selector cells** through unmodifiable synapses.

7.2. Cerebral granule cells

These cells fall broadly into the same class as Martinotti cells, and the predictions concerning them are the same, with the following exceptions. Their input is mainly more direct than that of the Martinotti cells, and should (because of their smaller size) come from thalamo-cortical rather than cortico-cortical projections. They probably do not have modifiable afferent synapses. In the sensory projection areas, where afferents are known to terminate in layer IV, these afferents should form the main source of excitatory synapses on the granule cells*.

7.3. Pyramidal cells

The pyramidal cells of layers III and V, and probably also those of layer II, are interpreted as output cells, in the sense of the theory. On the assumption that this is correct, they receive two kinds of excitatory synapses**, and two kinds of inhibitory synapses**. The majority of afferent synapses comes from Martinotti and granule cells**, almost all such cells making not more than one synapse with any given pyramidal cell**. These synapses are either Hebb or Brindley type modifiable synapses***. The strength of the synapse from the codon cell c to the output cell Ω stabilizes at the value P(Ω|c)**. (This receives only two stars, since there may be a workable all-or-none approximation to this value.) These synapses should be modifiable during the course of ordinary waking life***, and probably during sleep as well*. All other afferent synapses described here are unmodifiable***. If the dendrite is large, there exists a second excitatory input in the form of a climbing fibre**.
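The prediction above, that the synapse from a codon cell c to an output cell Ω stabilizes at P(Ω|c), can be illustrated by a small Python sketch. It is not taken from the paper: the class `OutputCellSketch`, its method names, the counting estimator, and the threshold parameter `p` are all assumptions made for illustration. The sketch estimates each weight as the fraction of the occasions on which c was active that coincided with activity of Ω, and then compares the mean weight of the active afferents with `p`, which is one possible reading of how the subtraction and division functions of § 6.4 could use the number of active codon cell synapses.

```python
class OutputCellSketch:
    """Toy output cell whose codon-cell weights estimate P(Omega | c).

    All names here are hypothetical; nothing is taken from the paper.
    """

    def __init__(self):
        self.times_active = {}  # c -> occasions on which codon cell c was active
        self.times_paired = {}  # c -> those occasions on which Omega was also active

    def observe(self, active_codons, omega_active):
        """Record one event: the active codon cells and whether Omega was active."""
        for c in active_codons:
            self.times_active[c] = self.times_active.get(c, 0) + 1
            if omega_active:
                self.times_paired[c] = self.times_paired.get(c, 0) + 1

    def weight(self, c):
        """Relative-frequency estimate of P(Omega | c); 0.0 for an unseen cell."""
        n = self.times_active.get(c, 0)
        return self.times_paired.get(c, 0) / n if n else 0.0

    def fires(self, active_codons, p):
        """Assumed decision rule: mean weight of the active afferents exceeds p."""
        active = list(active_codons)
        if not active:
            return False
        total = sum(self.weight(c) for c in active)
        # Division form: total / len(active) > p.
        # Equivalent subtraction form: total - p * len(active) > 0.
        return total / len(active) > p


if __name__ == "__main__":
    cell = OutputCellSketch()
    cell.observe(["a", "b"], True)
    cell.observe(["a"], False)
    cell.observe(["b", "c"], True)
    print(cell.weight("a"), cell.weight("b"))
    print(cell.fires(["a", "b"], p=0.6))
```

Both forms of the decision rule depend on the number of active afferents, which is why a single count of codon cell activity would suffice for either kind of inhibition in this simplified picture.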
If there is no climbing fibre present, the other excitatory afferent synapses must be Brindley synapses***. The climbing fibre input, if it exists, can produce the conditions for synaptic modification in the whole dendrite simultaneously***, but it is subsequently not the only input able to do this*. There are two kinds of inhibitory input to the cell**: one scattered, which has the effect of performing a subtraction**, and one clustered at the soma, performing the division**. At least one of these functions is performed***, but the all-or-none approximation would require only one. Both essentially estimate the number of afferent synapses from codon cells active at the cell***. The output from these cells is excitatory if it forms the input to a subsequent piece of cortex**. Their axon collaterals synapse with neighbouring output and Martinotti cells.

7.4. Climbing fibres

These are present only on output cells*. The climbing fibre at a given pyramidal cell provides an accurate enough pointer for that cell for the spatial recognizer effect to take over and make the cell a receptor for a classificatory unit***. Climbing fibres are excitatory***, if used for this purpose.

7.5. Other short axon cells

Many of the short axon cells which are not codon or climbing fibre cells are inhibitory. The theory distinguishes three principal kinds**. Subtractor cells sample the activity of codon cell axons near local regions of dendrite**, and send inhibitory synapses to those regions**. These have a subtractive effect**. Division cells, the basket cells, are inhibitory**; and so are cerebellar Golgi cell analogues, which keep the amount of codon cell activity about constant**.

The granule cell threshold controls receive excitatory*** synapses from either the granule cell excitatory afferents, or the granule cell axons***, and perhaps from both*.
They send inhibitory synapses to the granules themselves***, and these synapses are scattered over the granule cell soma and dendrites**. The Martinotti cell threshold controls receive excitatory*** synapses either from the Martinotti afferents, or from the Martinotti axons***. In view of the length of the Martinotti axons, they probably receive from both**, and therefore have an ascending dendritic shaft**. Layers VI and VII contain fusiform cells which could be Martinotti cell threshold controllers. The axonal and dendritic distributions of the inhibitory cells of the cortex depend on the distributions of the afferents, and of the codon cell axons, in a complicated way.

7.6. Learning and sleep*

This section as a whole receives one star, but if location selection proceeds as in § 5.1.1, and if there exist plastic codon cells, then it receives three stars. The truth of these conditional propositions cannot be deduced from the available data. Star ratings within the section are based on the assumption that both propositions are true.

Sleep is a prerequisite for the formation of some new classificatory units***. The construction of new codon functions for high level units***, and perhaps the selection of new output cells, takes place then, though the latter can** occur, and probably usually does*, during waking. Let S_1 and S_2 be two collections of pieces of information such that many of the spatial relations present in S_2 appear frequently in S_1, and have not previously appeared in the experience of an animal. The animal is exposed to S_1, and then to S_2. If the exposures are separated by a period including sleep, the amount of information the animal has to store in order to learn S_2 is less than the amount he would have to store if the exposures had been separated by a period of waking***. This is because the internal language is made more suitable during the sleep, by the construction of new classificatory units to represent the spatial redundancies in S_1.
The recall of S_1 itself is not improved by sleep**. Conversely, if this effect is found to occur, some codon cells have modifiable synapses***.

I wish to thank especially Professor G. S. Brindley, F.R.S., to whom I owe more than can be briefly described; Mr S. J. W. Blomfield, who made a number of points in discussion, and who proposed an idea in § 1.5; Professor A. F. Huxley, F.R.S., for some helpful comments; and Mr H. P. F. Swinnerton-Dyer, F.R.S., for various pieces of wisdom. The embryos of many of the ideas developed here appeared in a Fellowship Dissertation offered to Trinity College, Cambridge, in August 1968: that work was supported by an MRC research studentship. The work since then has been supported by a grant from the Trinity College Research Fund.

REFERENCES

Andersen, P., Eccles, J. C. & Løyning, Y. 1963 Recurrent inhibition in the hippocampus with identification of the inhibitory cell and its synapses. Nature, Lond. 197, 640-642.
Andersen, P., Eccles, J. C. & Voorhoeve, P. E. 1963 Inhibitory synapses on somas of Purkinje cells in the cerebellum. Nature, Lond. 199, 655-656.
Barlow, H. B. 1961 Possible principles underlying the transformations of sensory messages. In Sensory Communication (Ed. W. A. Rosenblith), pp. 217-234. MIT and Wiley.
Blomfield, Stephen & Marr, David 1970 How the cerebellum may be used. Nature, Lond. 227, 1224-1228.
Brindley, G. S. 1969 Nerve net models of plausible size that perform many simple learning tasks. Proc. Roy. Soc. Lond. B 174, 173-191.
Cajal, S. R. 1911 Histologie du Système Nerveux 2. Madrid: CSIC.
Colonnier, M. 1968 Synaptic patterns on different cell types in the different laminae of the cat visual cortex. An electron microscope study. Brain Res. 9, 268-287.
Eccles, J. C., Ito, M. & Szentágothai, J. 1967 The cerebellum as a neuronal machine. Berlin: Springer-Verlag.
Hebb, D. O.
1949 The organization of behaviour, pp. 62-66. New York: Wiley.
Hubel, D. H. & Wiesel, T. N. 1962 Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106-154.
Jardine, N. & Sibson, R. 1968 A model for taxonomy. Math. Biosci. 2, 465-482.
Jardine, N. & Sibson, R. 1970 The measurement of dissimilarity. (Submitted for publication.)
Kendall, D. G. 1969 Some problems and methods in statistical archaeology. World Archaeology 1, 68-76.
Kingman, J. F. C. & Taylor, S. J. 1966 Introduction to measure and probability. Cambridge University Press.
Kruskal, J. B. 1964 Multidimensional scaling. Psychometrika 29, 1-27; 28-42.
Marr, David 1969 A theory of cerebellar cortex. J. Physiol. 202, 437-470.
Marr, David 1971 Simple Memory: a theory for archicortex. (Submitted for publication.)
Nauta, W. J. H. 1954 Terminal distributions of some afferent fibre systems in the cerebral cortex. Anat. Rec. 118, 333.
Petrie, W. M. Flinders 1899 Sequences in prehistoric remains. J. Anthrop. Inst. 29, 295-301.
Rényi, A. 1961 On measures of entropy and information. In: 4th Berkeley Symposium on Mathematical Statistics and Probability (Ed. J. Neyman), pp. 547-661. Berkeley: Univ. of California Press.
Shannon, C. E. 1949 In The mathematical theory of communication, C. E. Shannon & W. Weaver. Urbana: Univ. of Illinois Press.
Sholl, D. A. 1956 The organisation of the cerebral cortex. London: Methuen.
Sibson, R. 1969 Information radius. Z. Wahrscheinlichkeitstheorie 14, 149-160.
Sibson, R. 1970 A model for taxonomy. II. (Submitted for publication.)
Spencer, W. A. & Kandel, E. R. 1961 Electrophysiology of hippocampal neurons IV. Fast prepotentials. J. Neurophysiol. 24, 274-285.
Szentágothai, J. 1962 On the synaptology of the cerebral cortex.
In: Structure and functions of the nervous system (Ed. S. A. Sarkisov). Moscow: Medgiz.
Szentágothai, J. 1965 The use of degeneration methods in the investigation of short neuronal connections. In: Degeneration patterns in the nervous system, Progr. in Brain Research 14 (Eds. M. Singer & J. P. Schadé), 1-32. Amsterdam: Elsevier.
Szentágothai, J. 1967 The anatomy of complex integrative units in the nervous system. Recent development of neurobiology in Hungary 1, 9-46. Budapest: Akadémiai Kiadó.
Valverde, F. 1968 Structural changes in the area striata of the mouse after enucleation. Exp. Brain Res. 5, 274-292.