ANALYSIS OF LUCRETIUS' POETIC STYLE BY COMPUTER: METHODOLOGICAL CONSIDERATIONS AND SOME CONCLUSIONS

Introduction

Lucretius wrote the De Rerum Natura to explain Epicureanism to Romans. He describes his project as a novel and ambitious one at a time when the Latin language he was using did not contain the necessary technical terms for such an exposition (Nec me animi fallit Graiorum obscura reperta / difficile inlustrare Latinis versibus esse, I, 136-7: "Nor does it escape my mind that it is difficult to make clear the obscure discoveries of the Greeks in Latin verse"). To make up for what he described as the poverty of the language he was working with, Lucretius developed an innovative and highly individualistic style of writing hexameter. Critics of Lucretius have often found his poetic abilities wanting because his poetry does not satisfy an aesthetic formulated for other poets. This paper describes some of the features peculiar to Lucretius' poetic style while using his poetry as the basis for outlining a methodology for stylistic analysis by computer.

The first step in any stylistic analysis must necessarily be the formulation and design of a research methodology. It is primarily with this problem that the present paper is concerned. The project whose theoretical foundations are outlined here will eventually compare Lucretius to a variety of Roman authors of prose and poetry. This work is intended to support my hypothesis that Lucretius' poetic style is connected with and derived from his philosophical content (9, 10). The final goal of this project will be to integrate what are now two separate areas of work: a computer program which scans Latin hexameter and a series of programs which count the words and word forms found in the hexameter. This procedure will allow detailed examination of Lucretius' metre and semantic choices.
The initial analysis explained here will show stylistic differences revealed by a comparison of Book 1 of Vergil's Aeneid to Book 1 of the De Rerum Natura. The analysis was done on tapes of half of each of these works obtained from L.A.S.L.A. The tapes contained lists of the words in each work, analyzed according to grammatical category and classified according to lexical entry. Because ancient languages are, for the most part, inflected, sorting an ancient text into its separate tokens or words will not reveal the types or lexical entries which an author used. The analysis program developed by L.A.S.L.A. lemmatizes Latin text, i.e., sorts it into separate forms while uniting these forms under their proper lexical entry. Some of the analyses required unlemmatized text, which was obtained from the A.P.A. repository holdings. The text of Lucretius was prepared by Louis Roberts, while that of Vergil's Aeneid was prepared by Wilhelm Ott.

Extract from the Revue Informatique et Statistique dans les Sciences humaines XIX, 1-4, 1983. C.I.P.L., Université de Liège. All rights reserved.

Lucretius' Poetic Style

The De Rerum Natura is an epic poem of about seven thousand lines which is divided into six books, each explaining a different facet of Epicurean philosophy. The poem is written in dactylic hexameter, so that the metrical scheme of each line is as follows (– = long syllable, ∪ = short syllable):

foot number:   1        2        3        4        5        6
scansion:      – ∪ ∪    – ∪ ∪    – ∪ ∪    – ∪ ∪    – ∪ ∪    – –

A spondee (– –) may be substituted for a dactyl (– ∪ ∪) in any of the first five feet, and the last foot is always a spondee. The maximum number of syllables that a line can contain, then, is seventeen, and the minimum is twelve. My earlier work on Lucretius drew on the studies of Minyard, Ingalls and Deutsch, who have shown that the poet's style is formulaic: his choice of words for semantic content and poetic image interacts with his choice of words for metrical position.
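The arithmetic behind the twelve-to-seventeen range can be checked mechanically. The Python sketch below is an illustration only, not the scansion program described in this paper: it enumerates every combination of dactyls and spondees in the first five feet, fixes the sixth foot as a spondee, and collects the possible line lengths. It treats each foot simply as a syllable count and ignores elision and other prosodic detail; the function name is hypothetical.

```python
from itertools import product

# Syllables per foot: a dactyl is long-short-short, a spondee long-long.
DACTYL = 3
SPONDEE = 2


def hexameter_syllable_counts():
    """Return the sorted list of possible syllable counts for a hexameter line.

    Feet 1-5 may each be a dactyl or a spondee; foot 6 is always treated as a
    spondee, following the scheme given in the text. Elision is ignored.
    """
    counts = set()
    for first_five in product((DACTYL, SPONDEE), repeat=5):
        counts.add(sum(first_five) + SPONDEE)
    return sorted(counts)


if __name__ == "__main__":
    print(hexameter_syllable_counts())  # [12, 13, 14, 15, 16, 17]
```

Each dactyl adds exactly one syllable to the all-spondee minimum of twelve, so a line has 12 + d syllables, where d is the number of dactyls. This is also why a numbering of metrical positions from one to seventeen must leave some positions empty in lines that contain spondees.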
Throughout his work, Lucretius will repeat the same words in the same position in the line. For example, his description of his work as naturae species ratioque starts the third foot each time it occurs. Minyard has identified 52 different major categories of formulas (7). He suggests that 55% of the verses show some degree of metrical regularity of expression; this figure is likely to be an underestimation. Of course, there is enough variation in Lucretius' poetic method that his poem remains varied and interesting even though it repeats the same phrases over and over. But a substantial portion of his poem consists of variations and interactions of phrases which appear in exactly the same metrical position elsewhere in the poem.

In previous work, I showed that this formal stylistic feature interacted with the content of the poem. Lucretius manipulates these formulas, or metrically regular modules, to explain the motions and interactions of the atoms, the modules of which Epicureans believed the universe was made. Several times in the poem, Lucretius explicitly connects the atomic modules of which the world consists with the letters, words and lines of his own poetry. He says that just as a limited number of atoms can, by position and context, produce everything in the universe, so he, as poet, can produce all the elements of the universe in his poetry by placing a limited number of letters in a variety of different combinations and contexts. For example, Lucretius opens his poem with an invocation to Venus, which has often been thought a strange gesture by an Epicurean, who did not believe that the gods acted in the world of men. The first line of the poem calls Venus hominum divumque voluptas, the delight of men and gods. Much later in the poem (VI, 94),
Lucretius uses the same phrase in the same position to describe Calliope, the muse of epic poetry, who is certainly a lesser being in the divine hierarchy. The two instances of this "formula" or modular unit, several thousand lines apart, are connected because they are instances of exactly the same words in exactly the same position in the line. This method of writing dactylic hexameter by combining modules like hominum divumque voluptas was first developed by Homer, who used it as a mnemonic device to facilitate the production of poetry which was performed without being written down; all oral poetry is, as a rule, formulaic. Later poets, who no longer needed formulas because they wrote their poetry down, nonetheless continued to use them to pay homage to a tradition of epic poetry going back to Homer. Lucretius uses such formulas not just to allude to a tradition, nor, as some have suggested, because he is a poor poet, but to make philosophical points. For instance, in the example just given, the poet deliberately starts with the popular view of Venus as "the delight of gods and men". But, after several books of explanation which correct the popular view of the way things really are, Lucretius "demotes" Venus by giving her epithet to Calliope: if Calliope can deserve this elevated title, then anyone can. The gods are not important, after all. The displaced image, now used ironically, shows this.

It may be clear from this explanation of Lucretius' poetic style that, in my stylistic analysis, I am concerned not just with which words the poet repeats but with which forms he repeats exactly and in the same position in the line. Consequently, I want to examine the lemmatized word frequency list, but, for portions of my analysis, the unlemmatized word list will be more interesting to me. In particular,
I will produce a list which allows me to determine how many words are repeated in exactly the same position in the line, to show that, as I suspect, Lucretius uses formulas more than Vergil does, and differently. One aspect of my project will be to develop a list of each form in the text followed by a number, from one to seventeen, for the syllable or metrical position in which the form begins (some lines will have spondees for dactyls, and so will have "null" metrical positions). This will allow the location and examination of formulaic expressions as well as the evaluation of the formula list which Minyard produced by hand.

The Conflict in Computerized Analyses between Form and Content

Typically, work on style with computers falls into two categories: statistical analyses of semantically neutral elements of an author's style, and manipulations of numbers that describe meaningful events or units, that is, semantically significant elements like "concepts" or "themes". A good example of the former type of study is Mosteller and Wallace's analysis of the Federalist Papers to resolve instances of disputed authorship (8). Their work consists of a series of tests on samples of varying sizes on which they use a variety of different statistical measures. Their analyses focus on "function words", the opposite of meaningful, contextual words. Mosteller and Wallace compare the frequencies of the function words in various papers known to have been written by Madison or Hamilton to reveal words more typical of one than of the other. They then examine Federalist Papers whose authorship is unknown to see which of these "marker words" they contain: those found in Papers attributed to Hamilton or those found in Papers attributed to Madison. A work of this sort in the area of ancient texts is the analysis by Levison et al., which purports to show that Plato's Seventh Letter was not written by Plato. This argument is based on a comparison of the Seventh Letter and Plato's Apology for such factors as relative length of sentence and the frequency of two colorless words, kai and de. The counterargument by Brandwood shows that the differences between the two documents are caused not by their having different authors but by the different genres in which they were written: one is a highly rhetorical oration while the other is a prose letter (1).

The second type of study described above analyzes contextually significant units. But, precisely because of the many problems introduced by work with semantically significant units, this type of study often produces questionable or ambiguous results. For example, looking for "theme" words or "concept" words is difficult because we have to decide which words are synonymous with or connected to which others to form a theme or a concept. In addition, the frequency of semantically laden words varies with topic even more than, as we saw above, the frequency of function words varies with genre: a high frequency of the word "ship" in a treatise on shipbuilding, or of the words for "child" in Euripides' Medea, does not describe the author's style but, rather, the topic at hand. How many instances of a word allow us to conclude that it is present as a stylistic choice made by the author? Does the author use the word more than do others who use the language? Who else uses the word, and at what rate? These questions do not suggest that stylistic analysis which works on contextual words is impossible, but they do indicate that such work poses difficult problems. One of the major difficulties of this method is that it requires the researcher to decide what constitutes a normal frequency of use for a word, in order to compare this figure with the frequency with which a particular author uses the word. As long ago as the 1930s,
Zipf noted that in any language a small number of words is used a great number of times: these are words like "the". On the other hand, a large number of words is used a small number of times: these are words like "Mississippi" (12). Thus the shapes of the frequency curves of all languages are approximately the same and can, after the very frequent words like "the" are removed, be described by the formula

$$R \times F = K$$

where R = rank of each word in the frequency list,
F = frequency of that word,
K = a constant.

Figure 1 is a plot of the frequency of occurrence of words in the Latin of Plautus and in modern English. The plot is based on Zipf's calculations and data (12).

A wide variety of linguistic and stylistic analyses are based upon Zipf's formula. Guiraud combines the formula with the use of a word frequency list for a given language (5), and develops a formula for comparing an author's use of a word to its use in general, as described by the Zipfian curve. His formula is:

$$AD = F - FT, \qquad RD = \frac{AD}{\sqrt{FT}}$$

where AD = absolute difference,
F = frequency of use,
FT = theoretical frequency (from a word list),
RD = adjusted difference.

The denominator of the expression for RD is a correction factor for the curved line which Zipf identified. While this method of evaluating style is very appealing and potentially useful, its use poses difficulties for ancient languages because, for many of them, we have relatively small samples from which to produce a word-frequency list. Delatte uses Guiraud's formula to compare the literary styles of Tibullus and Propertius, determining the words each uses most frequently and computing Guiraud's index for them to show which words each of the two poets uses more often than the word is normally used in the language (2). Clearly this kind of analysis is only as good as the word list on which it is based.
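To make the two formulas concrete, the Python sketch below computes the rank-frequency products that Zipf's formula predicts to be roughly constant, and Guiraud's AD and RD for a single word. Two points are assumptions of this sketch rather than statements of Guiraud's procedure: the theoretical frequency FT is estimated by scaling the word's count in the reference list to the size of the author's sample, and the denominator of RD is read as the square root of FT. The function names and the figures in the example are hypothetical.

```python
import math


def zipf_products(frequencies):
    """Return rank * frequency for frequencies sorted in descending order.

    Under Zipf's formula R x F = K these products should be roughly constant.
    """
    return [rank * freq for rank, freq in enumerate(frequencies, start=1)]


def guiraud_scores(f, ref_count, sample_size, ref_size):
    """Return (AD, RD) for one word.

    f:           frequency of the word in the author's sample
    ref_count:   frequency of the word in the reference word list
    sample_size: number of words in the author's sample
    ref_size:    number of words in the reference sample
    """
    # Assumption: FT is the reference frequency scaled to the sample size.
    ft = ref_count * sample_size / ref_size
    if ft <= 0:
        raise ValueError("word is absent from the reference list; FT is zero")
    ad = f - ft
    # Assumption: the denominator of RD is the square root of FT.
    rd = ad / math.sqrt(ft)
    return ad, rd


if __name__ == "__main__":
    print(zipf_products([100, 50, 33, 25]))  # [100, 100, 99, 100]
    # Hypothetical figures: 30 occurrences in a 5,000-word sample and
    # 100 occurrences in a 50,000-word reference list, so FT = 10.
    ad, rd = guiraud_scores(30, 100, 5_000, 50_000)
    print(ad, round(rd, 2))  # 20.0 6.32
```

Words with no entry in the reference list have FT = 0 and cannot be scored this way, so the sketch raises an error instead of returning an infinite score.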
The L.A.S.L.A. has recently produced a list (3) which is based on a broader sample than that of Diederich, which was previously the most readily available word list. The use of Diederich's list was problematical for my research because it contains samples of both Lucretius and Vergil which were chosen for their excellence as poetry. Such passages can be expected to be richer in semantic anomaly than either the entire work of each of the two poets or the language as a whole. The new list prepared by L.A.S.L.A. contains no Lucretius and so is better for the analysis of that poet, and, although it does contain Vergil (Bucolics, Eclogues, Aeneid 1-6), the sample it uses is at least a uniform one and not based on carefully chosen selections from the poet. Yet the composition of the new word list reflects work done by L.A.S.L.A. and not an attempt to produce a systematic random sample of Latin. Although I have included a Guiraudian analysis in my study, I have not based broad conclusions on it alone because it is so dependent on problems of sampling.

Because comparing an author's frequency of word use to "normal" use is so problematic, quantitative linguists and information scientists have developed ways of avoiding the use of these lists. Luhn has shown that the semantically important words an author uses can be found by marking off areas of the Zipfian curve to focus on the words an author uses a medium number of times (6). He begins by cutting off the top and bottom of the curve to eliminate words like "the" and like "Mississippi".
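Luhn placed the cut-off points by judgement rather than by formula. The Python sketch below makes them explicit, purely as an illustration: the two parameters (how many top-ranked words to discard, and the minimum frequency a word must reach to be kept) are hypothetical names, and choosing their values remains a matter of judgement. The toy tokens are invented and are not counts from either poem.

```python
from collections import Counter


def luhn_middle_band(tokens, top_cut, min_freq):
    """Return (word, count) pairs from the middle of a ranked frequency list.

    top_cut:  number of highest-ranked words to discard (words like "the")
    min_freq: words occurring fewer than min_freq times are discarded
              (words like "Mississippi")
    """
    # most_common() ranks words by descending count; ties keep the order in
    # which the words were first seen.
    ranked = Counter(tokens).most_common()
    return [(word, count) for word, count in ranked[top_cut:] if count >= min_freq]


if __name__ == "__main__":
    # Toy tokens, invented for illustration.
    tokens = ["et"] * 10 + ["corpus"] * 6 + ["inane"] * 4 + ["fruges"]
    print(luhn_middle_band(tokens, top_cut=1, min_freq=2))
    # [('corpus', 6), ('inane', 4)]
```

Cutting the top by rank and the bottom by frequency is only one possible convention; cutting both ends by rank, or both by frequency, would serve equally well as an illustration of Luhn's idea.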
He argues that the remaining, intermediate region of the overall frequency curve will contain two kinds of words: a small number of structural words which would occur frequently in any writing and which are the trailing off of the large group which was cut off at the top of the curve, and a larger number of content words. This latter group contains those words which would be much rarer in a frequency count of the language as a whole, but whose frequency has been shifted upward by the constraints of subject matter and style which the author has placed upon his work. It is Luhn's supposition that this intermediate area of the curve will contain the greatest concentration of content words. He uses common sense to determine the demarcation of this area: he says that establishing the optimum locations would be "a matter of experience with appropriately large samples". Luhn's work is especially useful because he suggests a way of calculating a "significance factor" based on the words in the middle region of the curve. This factor "reflects the number of occurrences of significant words within a sentence and the linear distance between them due to the intervention of non-significant words". This calculation allows a judgement to be made about the syntagmatic plane on the basis of the paradigmatic plane; that is, it permits us to judge the whole structure of a work from what we can learn about individual words. In my own work, I will also use the significance factor to point to those areas of the De Rerum Natura which are particularly rich in formulaic utterances.

The amount of semantic novelty which an author employs can be assessed by entropy (H). Like the Luhnian demarcation of the curve, this measure can be applied independently of comparative evidence across the language.
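The significance factor is quoted here but not defined. A common description of Luhn's auto-abstracting procedure groups significant words into clusters in which no two neighbouring significant words are separated by more than four non-significant words, scores each cluster as the square of the number of significant words it contains divided by its length in words, and gives the sentence the highest cluster score. The Python sketch below follows that description; the gap limit of four and the function name are assumptions to be checked against Luhn (6), not details taken from this paper.

```python
def luhn_significance(words, significant, max_gap=4):
    """Return the significance factor of one sentence (a list of words).

    A cluster begins and ends with a significant word, and neighbouring
    significant words inside it are separated by at most max_gap
    non-significant words. Each cluster scores
    (significant words in it) ** 2 / (total words in it); the sentence
    receives the highest cluster score, or 0.0 if it has no significant word.
    """
    positions = [i for i, word in enumerate(words) if word in significant]
    if not positions:
        return 0.0

    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1  # open a new cluster
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


if __name__ == "__main__":
    # Toy sentence: "S" marks a significant word, "w" any other word.
    sentence = ["w", "S", "w", "S", "w", "w", "w", "w", "w", "S"]
    print(luhn_significance(sentence, {"S"}))  # 1.3333333333333333
```

In the toy sentence the first two significant words form one cluster of three words (2² / 3), while the third is five words away and forms a cluster of its own (1² / 1), so the sentence scores 4/3.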
It points to the number of different words that an author uses: a hundred-word text which consists of a hundred different words with no repetitions has a high entropy, while a text of equal length which consists of a hundred repetitions of the same word has a low entropy. A low H thus means a small amount of unexpectedness: a word missing from the text could be guessed with reasonable accuracy, even independently of the context. A small H therefore indicates a higher degree of language structure. Various analyses of a text can be done with entropy as their base: a preliminary analysis of Lucretius and Vergil for entropy is suggestive, and I will describe its results below.

Forecast of the Project and Account of the Present Series of Investigations

Because I wanted to work with both structurally important units and semantically rich units, I have planned a group of analyses which incorporate both elements. The results of these analyses will be independent of each other and will, if my notions of Lucretius are correct, reinforce each other. My study will include the analysis of Lucretius and Vergil for function words, the semantically neutral words which can distinguish one author from another. The aim of this analysis is not to determine who wrote the De Rerum Natura but to develop a list of markers, semantically neutral words whose use is typical of each author. Another analysis will look for the significant or semantically meaningful words by employing Luhnian analysis, making Guiraudian calculations and performing tests based on entropy. Both the semantically rich words and the neutral words can then be compared with the formulaic repetitions which the scansion and counting programs will locate. It
is not necessary to demonstrate that Lucretius employed formulas, but it will be worthwhile to see with what marker words and with what theme words his formulaic expressions can be connected.

For several reasons, then, frequency counts will be important to these analyses. They will locate marker words and semantically rich words. The frequency lists of Lucretius can then be compared to those of Vergil, as I will show below, and, eventually, to those of Cicero and Plautus, and, finally, to the standard word list for Latin usage. The counts provided by analysis of individual authors will correct the figures obtained from the standard word list, and thus help set confidence limits for the Guiraudian measures. Further calculations of the entropy shown by sections of the De Rerum Natura will reinforce the findings obtained by Luhnian sectioning of the frequency curve and the calculation of the Luhnian significance factor. These analyses will give us a much better idea than we have at the moment of what constitutes Lucretius' themes and of how he presents and develops them.

It is early in the project, but a series of preliminary analyses has been suggestive. I have produced word frequency lists, both lemmatized and unlemmatized, for Book 1 of the De Rerum Natura and for Book 1 of the Aeneid. The two samples were of different sizes: 5152 words of Vergil, 7427 of Lucretius. The first set of analyses worked with each of the samples separately and so was not disturbed by the difference of sample size. The second set of analyses, described below, used the Guiraudian calculation to compensate for the difference in sample size. The first analysis simply produced a ranked frequency list for each author. Luhnian markers were then placed on the frequency lists to indicate the area of middle-frequency words which might have thematic significance. An excerpt from the frequency list for each author is presented in Table 1, below.
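The planned list of forms and metrical positions, and the question of which marker and theme words the formulas contain, lend themselves to a simple index. The Python sketch below assumes, hypothetically, that the scansion program's output is available as a list of (form, position) pairs for each line, with position the slot from one to seventeen at which the form begins. It collects every form that recurs at the same position and then labels each repeat as a theme word, a marker word, or neither, given word lists supplied by the analyst. The data layout, function names and positions in the example are illustrative assumptions, not a real scansion.

```python
from collections import defaultdict


def repeated_at_same_position(lines):
    """Return {(form, position): [line indices]} for pairs seen more than once.

    lines: a list of lines, each a list of (form, position) pairs, where
    position is the metrical slot (1-17) at which the form begins.
    """
    index = defaultdict(list)
    for line_no, line in enumerate(lines):
        for form, position in line:
            index[(form, position)].append(line_no)
    return {key: where for key, where in index.items() if len(where) > 1}


def label_repeats(repeats, theme_words, marker_words):
    """Label each repeated (form, position) pair as 'theme', 'marker' or 'other'."""
    labels = {}
    for form, position in repeats:
        if form in theme_words:
            labels[(form, position)] = "theme"
        elif form in marker_words:
            labels[(form, position)] = "marker"
        else:
            labels[(form, position)] = "other"
    return labels


if __name__ == "__main__":
    # Toy input; the positions are invented and are not a real scansion.
    lines = [
        [("hominum", 10), ("voluptas", 14)],
        [("requies", 6), ("hominum", 10), ("voluptas", 14)],
        [("hominum", 3)],
    ]
    repeats = repeated_at_same_position(lines)
    print(repeats)
    print(label_repeats(repeats, theme_words={"voluptas"}, marker_words={"que"}))
```

Keying the index on the pair (form, position) rather than on the form alone is what separates a formula in the sense used here from mere repetition of a word: hominum at position 3 in the third toy line is not counted as a repeat.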
The first notable difference between the two authors in this limited data was in the rate at which their word frequency curves dropped off. The most frequent word in each sample is used about 250 times: sum at 256 instances in Lucretius and que at 252 in Vergil. But the fifty-fourth most frequent word in Lucretius is unde, which appears 19 times, while in Vergil it is oris, which appears only 8 times: Lucretius has 76 more words than Vergil that intervene, that is, that are used more than 8 times. In general, it seems clear from the stem frequency as well as from the word frequency counts that Lucretius uses a larger number of words a medium number of times.

I have followed Luhn's suggestion and placed the demarcation of the medium range which he describes by sense: reading down the frequency list of Vergil, I found que (252), et (166), in (49), and then, eventually, urbs (20). I would assume that the medium frequency range starts with urbs and ends, perhaps, at venio (15) or gens (15). The whole middle area seems shifted upward in Lucretius, where it seems to start with the eighth word instead of the twenty-second, going from corpus (100) to inane (47). Larger Luhnian areas could be marked off as well: from urbs (20) to ex (5) or Aeolus (4) in Vergil, and from corpus (100) to fruges (5) in Lucretius. The smaller segments of the frequency curve that I have marked off seem more significant because they have such a high concentration of significant words. In fact, the difficulty of placing Luhnian limits on the curve highlights a difference between Lucretius and Vergil: to start at corpus (number 8 on the Lucretius list), I have considered res (number 2) a marker word in a work called De Rerum Natura.
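The comparison of drop-off rates made above reduces to two look-ups on a ranked frequency list: the word found at a given rank, and the number of distinct words used more than a given number of times. The Python sketch below shows both on toy counts; the function names are hypothetical, and real figures would come from the L.A.S.L.A. lists rather than from the invented counts used here.

```python
from collections import Counter


def word_at_rank(counts, rank):
    """Return the (word, count) pair at 1-based rank in a frequency list."""
    return counts.most_common()[rank - 1]


def types_used_more_than(counts, threshold):
    """Return how many distinct words occur more than threshold times."""
    return sum(1 for count in counts.values() if count > threshold)


if __name__ == "__main__":
    # Toy counts, not figures from either poem.
    counts = Counter({"sum": 12, "res": 9, "corpus": 9, "unde": 3, "oris": 1})
    print(word_at_rank(counts, 4))          # ('unde', 3)
    print(types_used_more_than(counts, 8))  # 3
```

Comparing types_used_more_than for two authors at the same threshold is one direct way to express the claim that one author uses a larger number of words a medium number of times.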
Clearly res and its genitive plural rerum are significant theme words for Lucretius, although in ordinary Latin they are empty words. Now, as the Guiraudian analysis below shows, Lucretius uses res only slightly more often than it is used in the language as a whole, but much more often than Vergil does (Vergil uses res 16 times in the present sample). So perhaps res is significant in Lucretius, and certainly corpus is. The Luhnian "middle" region starts, in Lucretius, at word 8 or perhaps even at word 2!

I began this study to describe Lucretius' poetic style in objective terms and, even at this early stage, I have seen a difference between Lucretius and Vergil which is of an overwhelming and powerful sort: Lucretius seems to have reshaped the frequency curve of the language in such a way that his Luhnian middle region, if he can be said to have one, occurs much higher on his frequency list. Further study will show whether Lucretius does this in the rest of his poem and will compare his word frequency pattern with that of Vergil, Plautus and Cicero.

Perhaps some of the difference in word frequency rate can be explained by the different genre and content of the two works. Lucretius uses a large number of words often because he is talking about the same atomic basis which underlies the disparate elements of the universe. Thus, he establishes a series of familiar terms, almost technical terms, whose meanings reinforce one another. Vergil, on the other hand, transports a hero to ever different regions where he encounters new problems and people, for which and whom the author uses ever-different words.
It will be the work of further analysis to see if the pattern of word frequency use that is present in these segments of the two authors endures throughout their works, and if any significance can be suggested for it beyond the subject matter. However, even this preliminary analysis shows that Lucretius and Vergil, while writing in the same verse form, went about their tasks very differently. Further work will compare Lucretius' style to that of other Roman authors. But the present analysis already shows how Lucretius' usage differs from standard Latin usage.

The Guiraudian calculation both compensates for the disparate sizes of the samples from Lucretius and Vergil and compares their word frequency patterns to those of other Latin authors. The frequency samples used for general Latin usage are P. B. Diederich's word list (4) and the L.A.S.L.A. word list (3). Diederich's list has been compiled from a sample of 124,686 words, 49,363 of prose and 75,323 of poetry, while the L.A.S.L.A. list is based on a sample of 794,662 words, 582,411 from prose and 212,251 from poetry. In my comparison, I have used the lemmatized lists of my authors because the entries of both word lists are organized according to lemma. Table 1 shows the Guiraudian figures for the first forty words of each author, as calculated from the Diederich word list; Table 2 presents the same calculations from the L.A.S.L.A. list. About 11% of Diederich's sample of Latin poetry consists of excerpts of Lucretius and Vergil, so Diederich's poetry sample seemed a questionable basis from which to predict frequencies in these two authors. Accordingly, I made separate Guiraudian calculations on Diederich's prose sample, on his poetry sample and on their combination. In fact, the three calculations were not much different from each other. So in Table 1, and in Figure 2, which is based on the table, I have presented only the calculation based on Diederich's poetry sample.
A different problem arose with the L.A.S.L.A. list. This list contains no Lucretius, but a substantial amount of Vergil: about 25% of this list consists of Vergil. Thus I feared that comparison of Lucretius with this list would show only that he wasn't Vergil, and, of course, that comparison of Vergil with the list would show only that Vergil was very like himself. In fact, the results obtained from this list are less clear than those from the Diederich list. To minimize the misleading impression which could result from the examination of so biased a sample, I have presented, in Table 3, the Guiraudian scores calculated from this list based on the prose sample, the poetry sample, and both samples taken together. For the L.A.S.L.A. word list, Figure 3 shows a graphic representation of the prose sample, while Figure 4 represents the poetry sample. I regard these calculations as less important for my work simply because the sample happens to be unfortunate for my project. However, both sets of calculations support the same conclusions.

The Guiraudian scores show even more clearly than the ranked frequency lists that Lucretius uses words very differently from Vergil. The graphs of the Guiraudian scores from the tables make this clear. Figures 2 and 3 are graphs of the Guiraudian scores of each author for the top of the word frequency list. The curves are very similar in shape and show a similar difference between Lucretius and Vergil (Figure 2 shows more detail by reducing the score of Vergil's top word, que, from 117.4 to 65). Guiraudian scores are relative: Figure 2 shows that Vergil's usage is about normal, that is, he uses words as they are used in the language. For Vergil, the places where the curve is above normal show individual theme words or marker words.
On the whole, theme words are more frequent than marker words. But, for Lucretius, the whole curve has risen. Lucretius' Guiraudian scores are much higher than Vergil's, and for a whole area in Figures 2 and 3, from word 4 to word 20, his Guiraudian scores do not drop to zero or below. In Figure 2 this area is markedly elevated and has no dips below zero in it. This region coincides with the Luhnian markers that we imposed on the frequency list above. No part of Vergil shows such elevation. The analogous region, the Luhnian area we marked off for Vergil from word 22 to word 37, shows no such pattern: there are four drops to zero in it. The region of greatest elevation for Vergil, from word 35 to word 47, is surpassed by the elevation of the same region in Lucretius, and even more by the elevation of the Luhnian region in Lucretius. Figures 2 and 3 both show that Lucretius and Vergil have very different poetic styles, and that Vergil's is much more typical of Latin as a whole. Lucretius has distorted the very pattern of word usage in the Latin language to produce poetry that reflects his Epicurean message. It remains to be seen whether this pattern of word usage continues throughout the work, and whether the elevated areas on the graph of the Guiraudian scores coincide with particular metrical patterns. The second phase of the analysis, which will utilize the scansion program, will reveal such coincidences.

The figures derived from the L.A.S.L.A. word list, the other list used for calculations, raise important questions which future analysis must try to answer. The shape of the curve based on the Guiraudian scores calculated according to this list is about the same as the shape of the curve calculated from Diederich's list. Yet there are interesting differences between the two sets of calculations.
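The two observations used here, a stretch of ranks over which the scores never fall to zero or below and the number of such drops inside a given region, can be read off a list of scores mechanically. The Python sketch below assumes the scores are held in rank order, with the first score belonging to the most frequent word; the function names and the toy scores are illustrative only and do not come from Tables 1 to 3.

```python
def longest_positive_run(scores):
    """Return (first_rank, last_rank) of the longest run of scores above zero.

    scores[i] is the Guiraudian score of the word at rank i + 1. Returns None
    if no score is positive. Among equally long runs, the first is kept.
    """
    best = None
    start = None
    for i, score in enumerate(scores):
        if score > 0:
            if start is None:
                start = i
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
        else:
            start = None
    return None if best is None else (best[0] + 1, best[1] + 1)


def drops_in_region(scores, first_rank, last_rank):
    """Count scores at or below zero between two ranks, inclusive."""
    return sum(1 for score in scores[first_rank - 1:last_rank] if score <= 0)


if __name__ == "__main__":
    toy_scores = [-1.0, 2.0, 3.0, 0.0, 1.5, 1.2, 0.8, -2.0]
    print(longest_positive_run(toy_scores))   # (5, 7)
    print(drops_in_region(toy_scores, 1, 8))  # 3
```

Working in ranks rather than list indices keeps the output directly comparable with statements such as "from word 4 to word 20".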
The L.A.S.L.A. list seems to have proportionally many more instances of que, et, video, sui and atque. It is not entirely clear why these differences exist, and yet they do not affect the observations which the Guiraudian scores suggest. There seems to be some effect caused by the larger size of the L.A.S.L.A. list: the largest Guiraudian scores calculated from it are not as large as the largest scores calculated from the Diederich list. This is doubtless because even very rare words come up more often in a larger list than in a smaller one. For instance, regina does not appear at all in Diederich's list, but appears 73 times in the L.A.S.L.A. list. Conversely, the newer list has far fewer instances of inane and ratio, probably because it contains no sample of Lucretius. Yet many scores are very similar from both lists: for example, Lucretius' scores for corpus, natura, hic and ignis and Vergil's scores for hic, in, do, ille and fero are probably in the same range in each list. Thus, it is not the case that the two lists are on different scales, although they are scaled somewhat differently. The major difference seems to be that Vergil scores significantly more positively in the Guiraudian scores calculated from the L.A.S.L.A. list than in the scores calculated from the Diederich list. This increase can most clearly be seen from Figure 3. Perhaps it arises in part as a result of the large amount of Vergil in the newer word list.

The present analysis suggests that Lucretius' stylistic pattern has to do with blocks or modules that are repeated. Not only is Lucretius' usage of significant words different from Vergil's, but the Epicurean poet uses marker words differently as well.
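One simple way to make a comparison of marker words operational, in the spirit of, though far cruder than, Mosteller and Wallace's comparison of function-word frequencies, is to express each function word's frequency as a rate per thousand words in each text and to flag words whose rate in one text is at least some multiple of their rate in the other. The Python sketch below does this. The function names, the toy texts and the factor of 2 are assumptions for illustration, not values or procedures used in this study.

```python
from collections import Counter


def rates_per_thousand(tokens, words):
    """Return occurrences of each word in `words` per 1,000 tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 1000 * counts[word] / total for word in words}


def marker_words(tokens_a, tokens_b, function_words, ratio=2.0):
    """Split function words into markers of text A and markers of text B.

    A word marks A if it occurs in A and its rate there is at least `ratio`
    times its rate in B, and vice versa. Other words are not returned.
    """
    rates_a = rates_per_thousand(tokens_a, function_words)
    rates_b = rates_per_thousand(tokens_b, function_words)
    marks_a, marks_b = [], []
    for word in function_words:
        if rates_a[word] > 0 and rates_a[word] >= ratio * rates_b[word]:
            marks_a.append(word)
        elif rates_b[word] > 0 and rates_b[word] >= ratio * rates_a[word]:
            marks_b.append(word)
    return marks_a, marks_b


if __name__ == "__main__":
    # Toy ten-word "texts"; "x" stands for any content word.
    text_a = ["que"] * 4 + ["et"] + ["x"] * 5
    text_b = ["que"] + ["et"] * 3 + ["x"] * 6
    print(marker_words(text_a, text_b, ["que", "et", "nam"]))
    # (['que'], ['et'])
```

Rates per thousand words, rather than raw counts, keep samples of different sizes comparable, which matters here because the two samples (5152 and 7427 words) differ in length.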
For Vergil, the Guiraudian curve tends toward zero or the negative at marker words, both inside and outside the Luhnian region marked off on the frequency list: Table 1 shows that only 13 seemingly empty words of the first forty have scores higher than one (Table 2 has the same number). Two of these, tu and do, may be theme words. In Lucretius, 22 (20 in Table 2) of the first 40 words are seemingly empty but have scores higher than one. Of these, nullus, possum and quoniam may have thematic roles. But at least nineteen words remain; it seems that Lucretius uses both colorless words and semantically meaningful words more often than Vergil does, and more often than they are used in the language as a whole. This practice relates to his widespread use of formulaic language, which repeats modular units that incorporate both thematic words and marker words.

The preliminary findings about Lucretius' stylistic pattern are also confirmed by the calculation of entropy for the samples drawn. For each author, entropy calculated on the word sample is necessarily larger than that calculated on the stem sample. But the entropy of Lucretius' word sample (78.59) is about the same as that of Vergil's stem sample (77.37). The complete figures are found in Table 4. Again, the figures show that Lucretius uses the same words over and over again. Stephen Waite's study of entropy in Latin prose authors provides figures for comparison (11). Waite used samples of a thousand words, arbitrarily drawn, from three prose authors. Except for very short samples, entropy is not affected by disparate sample size, so Waite's figures can be compared to those given here. Waite's lowest figure is an average entropy of 84.59 for Cato's De Agricultura, followed by 90.3 for selections from Sallust and 90.69 for selections from Livy.
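The relation between word and stem entropy can be sketched as follows. This section does not give the entropy formula used here or by Waite, so the Python sketch assumes Shannon entropy expressed as a percentage of its maximum, log2 of the number of tokens; the helper names, the token list and the lemma table are hypothetical toy data, not a sample from either poet.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def relative_entropy(items: Iterable[str]) -> float:
    """Shannon entropy of the item distribution, as a percentage of log2(N).

    ASSUMPTION: this normalization is illustrative; N is the token count.
    """
    counts = Counter(items)
    n = sum(counts.values())
    if n < 2:
        return 0.0  # log2(1) = 0, so the percentage is undefined; report 0
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    return 100 * h / math.log2(n)


def lemmatize(tokens: List[str], lemmas: Dict[str, str]) -> List[str]:
    """Map each word form to its lemma; forms missing from the table stay as-is."""
    return [lemmas.get(t, t) for t in tokens]


# Toy data (hypothetical), not a sample from Lucretius or Vergil.
tokens = ["corpora", "corporis", "corpus", "rerum", "res", "rebus", "inane", "inane"]
lemmas = {"corpora": "corpus", "corporis": "corpus", "rerum": "res", "rebus": "res"}

word_entropy = relative_entropy(tokens)
stem_entropy = relative_entropy(lemmatize(tokens, lemmas))
print(f"words: {word_entropy:.2f}  stems: {stem_entropy:.2f}")
```

Merging word forms under a lemma can never raise Shannon entropy, and the denominator is unchanged, so under this assumed measure the stem figure can never exceed the word figure, in line with the relation stated above for each author.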
Waite's high figure of 91.62, for the single sample available from Cato's fragments, is, as he says himself, questionable, for the entropy of fragmentary passages is a questionable entity. These figures should be compared to those I obtained for my word samples, since he used unlemmatized material. Vergil's entropy figure is similar to the figure for Cato's De Agricultura. The most probable reason for the relatively low figure for the poets is that poetry limits word choices far more than prose does: metre imposes restrictions on word choice, forcing repetition even in a poet as innovative as Vergil. But none of Waite's figures is more than two points removed from any other. Lucretius' entropy, at 78.59, differs from Vergil's (84.98) far more than any of Waite's authors differ from each other. Also, Lucretius' entropy figure is the lowest of the group as a whole. Again, this suggests that Lucretius imposed constraints on his use of language that went far beyond those of other authors.

I would like to end by considering what this analysis shows about poetic themes in each poet. For Vergil, the analysis reveals few surprises. We know that urbs, terra, vir and arma are his themes. His Luhnian region contains nothing unexpected. For Lucretius, however, this is not the case. Again, the Luhnian region contains words we would expect to find as theme words: corpus and inane. But the extent to which res, ratio and natura appear is surprising. Students of Lucretius have argued that these are colorless words whose incidence in Lucretius is simply a feature of the Latin language. Yet at least the last two are used by Lucretius far more than they are in the language as a whole. Lucretius uses these words as theme words: or, if you prefer, as he uses other, clearly colorless words like quisque, consto and is.
Perhaps it is fair to say that he has redefined what a theme word is, making it into a neutral atomic unit which acquires color, as I have said, from its position and function in the poem as a whole:

... it makes a great difference with what others the atoms are combined, and in what position ... (DRN 1, 817-818)

Eva THURY

Reference list
1) Brandwood, L., "Plato's Seventh Letter". Revue L.A.S.L.A., 1969, 4, 1-21.
2) Delatte, L., "Key-words and poetic themes in Propertius and Tibullus". Revue L.A.S.L.A., 1967, 3, 31-83.
3) Delatte, L., Evrard, Et., Govaerts, S. and Denooz, J., Dictionnaire fréquentiel et index inverse de la langue latine. Liège, 1981.
4) Diederich, P., The Frequency of Latin Words and their Endings. Chicago, 1939.
5) Guiraud, P., Les Caractères Statistiques du Vocabulaire. Paris, 1954.
6) Luhn, H., "A Statistical Approach to Mechanized Encoding and Searching of Literary Information". IBM Journal, 1957, 4, 309-317.
7) Minyard, J., Mode and Value in the De Rerum Natura. Wiesbaden, 1978.
8) Mosteller, F. and Wallace, D., Inference and Disputed Authorship: "The Federalist". Reading, 1964.
9) Thury, E., "Naturae species ratioque. Poetic Image and Philosophical Perspective in the De Rerum Natura of Lucretius". Ph.D. diss., University of Pennsylvania, 1976.
10) Thury, E., "The Poem of Lucretius as a Simulacrum of the Rerum Natura". To appear in the American Journal of Philology.
11) Waite, S., "Approaches to the Analysis of Latin Prose, Applied to Cato, Sallust and Livy". Revue L.A.S.L.A., 1970, 2, 91-112.
12) Zipf, G., Selected Studies of the Principle of Relative Frequency in Language. Cambridge, Ma., 1932.
Table 1. Guiraudian Scores for Poetry Based on Diederich's Frequency List. For the 40 most frequent words in Book 1 of Lucretius and of Vergil: number (rank), word, actual frequency and Guiraudian score.

Table 2. Guiraudian Scores Based on the L.A.S.L.A. Frequency List. Same layout as Table 1.

Table 3. Guiraudian Scores for Poetry, Prose and Both Combined, Based on the L.A.S.L.A. Frequency List. For each author: number (rank), word, actual frequency, and Guiraudian scores calculated for poetry, for prose, and for both combined.

Table 4. Entropy for Book One of Each Author

              Stems    Words
Lucretius     69.94    78.59
Vergil        77.37    84.98

Figure 1. Frequency of Occurrence of Words in the Latin of Plautus (after Zipf, Selected Studies of the Principle of Relative Frequency in Language). Number of occurrences plotted against number of words.

Figure 2. Raw Guiraudian Score: Lucretius vs. Vergil, Based on Poetry Calculation, Based on Diederich's Frequency List. Guiraudian scores plotted against words ranked from most to least frequent (Lucretius list and Vergil list).

Figure 3. Raw Guiraudian Score: Lucretius vs. Vergil, Based on Poetry Calculation, Based on the L.A.S.L.A. Frequency List. Guiraudian scores plotted against words ranked from most to least frequent (Lucretius list and Vergil list).

Figure 4. Raw Guiraudian Score: Lucretius vs. Vergil, Based on Prose Calculation, Based on the L.A.S.L.A. Frequency List. Guiraudian scores plotted against words ranked from most to least frequent (Lucretius list and Vergil list).

Figure 5. Raw Guiraudian Score: Lucretius vs. Vergil, Based on Prose and Poetry Calculation, Based on the L.A.S.L.A. Frequency List. Guiraudian scores plotted against words ranked from most to least frequent (Lucretius list and Vergil list).