ANALYSIS OF LUCRETIUS' POETIC STYLE BY COMPUTER:
METHODOLOGICAL CONSIDERATIONS AND SOME CONCLUSIONS
Introduction
Lucretius wrote the De Rerum Natura to explain Epicureanism to
Romans: he describes his project as a novel and ambitious one
at a time when the Latin language he was using did not contain
the necessary technical terms for such an exposition (Nec me
animi fallit Graiorum obscura reperta difficile inlustrare Latinis
versibus esse, I.136-7). To make up for what he described as
the poverty of the language he was working with, Lucretius
developed an innovative and highly individualistic style of
writing hexameter. Critics of Lucretius have often found his
poetic abilities wanting because his poetry does not satisfy an
aesthetic formulated for other poets. The intention of this
paper is to describe some of the features peculiar to Lucretius'
poetic style while using his poetry as the basis for outlining a
methodology for stylistic analysis by computer.
The first step in any stylistic analysis must necessarily be the
formulation and design of a research methodology. It is primarily with this problem that the present paper is concerned. The
project whose theoretical foundations are outlined here will
eventually compare Lucretius to a variety of Roman authors of
prose and poetry. This work is intended to support my hypothesis that Lucretius' poetic style is connected with and derived
from his philosophical content (9,10). The final goal of this
project will be to integrate what are now two separate areas of
work: a computer program which scans Latin hexameter and a
series of programs which count the words and word forms
found in the hexameter. This procedure will allow detailed
examination of Lucretius' metre and semantic choices. The initial
analysis explained here will show stylistic differences revealed
by a comparison of Book 1 of Vergil's Aeneid to Book 1 of the
De Rerum Natura. The analysis was done on tapes of half of
each of these works obtained from L.A.S.L.A. The tapes contained lists of the words in each work, analyzed according to
grammatical category and classified according to lexical entry.
Because ancient languages are, for the most part, inflected,
sorting an ancient text into its separate tokens or words will
not reveal the types or lexical entries which an author used:
the forms res and rerum, for example, are separate tokens but
belong to the single lexical entry res.
Extrait de la Revue Informatique et Statistique dans les Sciences humaines
XIX, 1 à 4, 1983. C.I.P.L. - Université de Liège - Tous droits réservés.
The analysis program developed by L.A.S.L.A. lemmatizes text
in Latin, i.e., sorts it into separate forms while uniting these
forms under their proper lexical entry. Some of the analyses
required unlemmatized text, which was obtained from the A.P.A.
repository holdings. The text of Lucretius was prepared by
Louis Roberts, while that of Vergil's Aeneid was prepared by
Wilhelm Ott.
Lucretius' Poetic Style
The De Rerum Natura is an epic poem of about seven thousand
lines which is divided into six books, each explaining a different facet of Epicurean philosophy. The poem is written in
dactylic hexameter, so that the metrical scheme of each line is
as follows:
foot number:   1       2       3       4       5       6
scansion:      - u u   - u u   - u u   - u u   - u u   - -
A spondee (- -) may be substituted for a dactyl (- u u) in any
of the first five feet, and the last foot is always a spondee. The
maximum number of syllables that a line can contain, then, is
seventeen (five dactyls and a final spondee, 5 × 3 + 2), and the
minimum is twelve (six spondees, 6 × 2).
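These bounds follow from simple arithmetic, which the short Python sketch below checks by enumerating every combination of dactyls (D) and spondees (S) in the first five feet. It treats the sixth foot as a spondee, as the text does, and ignores elision and other prosodic details that a real scansion program would have to handle; the name `line_length` is purely illustrative.

```python
from itertools import product

# Syllables per foot: a dactyl (long-short-short) has 3, a spondee (long-long) has 2.
SYLLABLES = {"D": 3, "S": 2}

def line_length(first_five):
    """Syllable count of a hexameter whose feet 1-5 are given as a string of
    'D'/'S'; the sixth foot is taken to be a spondee."""
    return sum(SYLLABLES[foot] for foot in first_five) + SYLLABLES["S"]

# All 2**5 = 32 combinations of dactyls and spondees in feet 1-5.
patterns = ["".join(p) for p in product("DS", repeat=5)]
lengths = {p: line_length(p) for p in patterns}

print(len(patterns), min(lengths.values()), max(lengths.values()))  # prints 32 12 17
```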
My earlier work on Lucretius drew on the studies of Minyard,
Ingalls and Deutsch, who have shown that the poet's style is
formulaic: his choice of words for semantic content and poetic
image interacts with his choice of words for metrical position.
Throughout his work, Lucretius repeats the same words in
the same position in the line. For example, his description of
his work as naturae species ratioque starts the third foot each
time it occurs. Minyard has identified 52 different major categories of formulas (7). He suggests that 55% of the verses show
some degree of metrical regularity of expression; this figure is
likely to be an underestimate. Of course, there is enough
variation in Lucretius' poetic method that his poem remains
varied and interesting even though it repeats the same phrases
over and over. But a substantial portion of his poem consists of
variations and interactions of phrases which appear in exactly
the same metrical position elsewhere in the poem.
In previous work, I showed that this formal stylistic feature
interacted with the content of the poem. Lucretius manipulates
these formulas or metrically regular modules to explain the
motions and interactions of the atoms, the modules of which
Epicureans believed the universe was made. Several times in the
poem, Lucretius explicitly connects the atomic modules of which
the world consists with the letters, words and lines of his own
poetry. He says that just as a limited number of atoms can, by
position and context, produce everything in the universe, so he,
as poet, can produce all the elements of the universe in his
poetry by placing a limited number of letters in a variety of
different combinations and contexts.
For example, Lucretius opens his poem with an invocation to
Venus, which has often been thought a strange gesture by an
Epicurean, who did not believe that the gods acted in the world
of men. The first line of the poem calls Venus
hominum divumque voluptas,
the delight of men and gods. Much later in the poem (VI, 94),
Lucretius uses the same phrase in the same position to describe
Calliope, the muse of epic poetry, who is certainly a lesser being in
the divine hierarchy. The two instances of this "formula" or
modular unit, several thousand lines apart, are connected
because they are instances of exactly the same words in exactly
the same position in the line. This method of writing dactylic
hexameter by combining modules like hominum divumque voluptas was first developed by Homer, who used it as a mnemonic
device to facilitate the production of poetry which was performed without being written down; all oral poetry is, as a
rule, formulaic. Later poets who no longer needed formulas because they wrote their poetry down nonetheless continued to use them to pay homage to a tradition of epic
poetry going back to Homer.
Lucretius uses such formulas not just to allude to a tradition,
nor, as some have suggested, because he is a poor poet, but to
make philosophical points. For instance, in the example just
given, the poet deliberately starts with the popular view of
Venus as "the delight of gods and men". But, after several
books of explanation which correct the popular view of the way
things really are, Lucretius "demotes" Venus by giving her
epithet to Calliope: if Calliope can deserve this elevated title,
then anyone can. The gods are not important, after all. The
displaced image, now used ironically, shows this.
It may be clear from this explanation of Lucretius' poetic style
that, in my stylistic analysis, I am concerned not just with which words
the poet repeats but with which forms he repeats exactly and in
the same position in the line. Consequently, I want to examine
the lemmatized word frequency list, but, for portions of my
analysis, the unlemmatized word list will be more interesting to
me. In particular, I will produce a list which allows me to
determine how many words are repeated in exactly the same
position in the line, to show that, as I suspect, Lucretius uses
formulas more than Vergil does, and differently. One aspect of
my project will be to develop a list of each form in the text
followed by a number, from one to seventeen, for the syllable
or metrical position in which the form begins (some lines will have spondees for dactyls,
and so will have "null" metrical positions). This will allow the location and examination of formulaic
expressions as well as the evaluation of the formula list which
Minyard produced by hand.
The Conflict in Computerized Analyses between Form and Content
Typically, work on style with computers falls into two categories: statistical analyses of semantically neutral elements of
an author's style, and manipulations of numbers that describe
meaningful events or units, that is, semantically significant elements
like "concepts" or "themes". A good example of the former type
of study is Mosteller and Wallace's analysis of the Federalist
Papers to resolve instances of disputed authorship (8). Their
work consists of a series of tests on samples of varying sizes
to which they apply a variety of statistical measures.
Their analyses focus on "function words", the opposite of meaningful, contextual words. Mosteller and Wallace compare the
frequencies of the function words in various papers known to
have been written by Madison or Hamilton to reveal words more
typical of one than of the other. They then examine the Federalist
Papers whose authorship is disputed to see which of these
"marker words" they contain: those found in Papers attributed
to Hamilton or those found in Papers attributed to Madison.
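The core comparison, rates of marker words per thousand words in texts of known authorship set against the rates in a disputed text, can be sketched as follows. This is a deliberately simplified nearest-profile comparison, not Mosteller and Wallace's own (largely Bayesian) procedure. The authors "A" and "B", the marker list and the toy texts are invented for illustration, although upon and whilst are among the words Mosteller and Wallace found to separate Hamilton from Madison.

```python
import re
from collections import Counter

def rates_per_thousand(text, markers):
    """Occurrences of each marker word per 1000 running words of `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        raise ValueError("empty text")
    counts = Counter(words)
    return {m: 1000 * counts[m] / len(words) for m in markers}

def nearest_author(disputed, known, markers):
    """Return the author (key of `known`) whose marker-word rates are closest,
    by summed absolute difference, to those of the disputed text."""
    target = rates_per_thousand(disputed, markers)

    def distance(author):
        rates = rates_per_thousand(known[author], markers)
        return sum(abs(rates[m] - target[m]) for m in markers)

    return min(known, key=distance)

# Toy texts, invented for illustration only.
known = {
    "A": "upon the whole upon reflection the case rests upon it",
    "B": "whilst the case was argued whilst the house sat",
}
disputed = "upon this point the argument rests upon precedent"
print(nearest_author(disputed, known, ["upon", "whilst"]))  # prints A
```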
A work of this sort in the area of ancient texts is the analysis
by Levison et al. which purports to show that Plato's Seventh
Letter was not written by Plato. This argument is based on a
comparison of the Seventh Letter and Plato's Apology for such
factors as relative sentence length and the frequency of two
colorless words, kai and de. The counterargument by
Brandwood shows that the differences between the two documents are caused not by their having different authors but by
the different genres in which they were written: one is a
highly rhetorical oration while the other is a prose letter (1).
The second type of study described above analyzes contextually
significant units. But, precisely because of the many problems
introduced by work with semantically significant units, this
type of study often produces questionable or ambiguous results.
For example, looking for "theme" words or "concept" words is
difficult because we have to decide which words are synonymous
with or connected to which others to form a theme or a concept. In addition, the frequency of semantically laden words
varies with topic even more than, as we saw above, the frequency of function words varies with genre: a high frequency
of the word "ship" in a treatise on shipbuilding, or of the
words for "child" in Euripides' Medea, does not describe the
author's style but, rather, the topic at hand. How many
instances of a word allow us to conclude that it is present as a
stylistic choice made by the author? Does the author use the
word more than do others who use the language? Who else uses
the word, and at what rate? These questions do not suggest
that stylistic analysis which works on contextual words is
impossible, but they do indicate that such work poses difficult
problems.
One of the major difficulties of this method is that it requires
the researcher to decide what constitutes a normal frequency of
use for a word, in order to compare this figure with the frequency with which a particular author uses the word. As long
ago as the 1930s, Zipf noted that in any language a small
number of words is used a great number of times: these are
words like "the". On the other hand, a large number of words
is used a small number of times: these are words like
"Mississippi" (12). Thus the shapes of the frequency curves of
all languages are approximately the same and can, after the
very frequent words like "the" are removed, be described by
the formula

$$R \times F = K$$

where
R = rank of each word in the frequency list
F = frequency of that word
K = a constant
Figure 1 is a plot of the frequency of occurrence of words in
the Latin of Plautus and in modern English. The plot is based
on Zipf's calculations and data (12).
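Zipf's relation can be checked on any frequency list by tabulating the product R × F for each word; on real data the product is only roughly constant. The Python sketch below assumes a list of tokens as input and breaks ties in rank arbitrarily (in the order `Counter.most_common` returns them); the toy data are constructed to fit the law exactly, with K = 120.

```python
from collections import Counter

def rank_frequency_table(tokens):
    """Rows of (rank R, word, frequency F, R * F), most frequent word first.

    Tied frequencies receive consecutive ranks in the order that
    Counter.most_common returns them.
    """
    ranked = Counter(tokens).most_common()
    return [(r, w, f, r * f) for r, (w, f) in enumerate(ranked, start=1)]

# Toy data built so that F = 120 / R exactly.
tokens = ["a"] * 120 + ["b"] * 60 + ["c"] * 40 + ["d"] * 30 + ["e"] * 24
for rank, word, freq, product in rank_frequency_table(tokens):
    print(rank, word, freq, product)  # the last column is 120 on every row
```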
A wide variety of linguistic and stylistic analyses are based
upon Zipf's formula. Guiraud combines the formula with the
use of a word frequency list for a given language (5). Guiraud
develops a formula for comparing an author's use of a word to
its use in general, as described by the Zipfian curve. His
formula is:
$$AD = F - FT$$

$$RD = \frac{AD}{FT}$$

where
AD = absolute difference
F = frequency of use
FT = theoretical frequency (from a word list)
RD = adjusted difference
The denominator of the expression for RD is a correction factor
for the curved line which Zipf identified. While this method of
evaluating style is very appealing and potentially useful, its
use poses difficulties for ancient languages because, for many
of them, we have relatively small samples from which to produce
a word-frequency list.
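The two quantities can be computed as below. The sketch follows the formula as printed above (RD = AD / FT). It also makes one assumption the text leaves implicit: that FT is obtained by scaling a lemma's frequency in the reference list to the size of the author's sample, which is how the calculation can compensate for different sample sizes. A lemma absent from the reference list has FT = 0 and no defined RD, so the sketch rejects it. The function names and numbers are hypothetical.

```python
def theoretical_frequency(list_freq, list_size, sample_size):
    """Expected occurrences of a lemma in a sample of `sample_size` words,
    given `list_freq` occurrences in a reference list of `list_size` words
    (an assumed way of deriving FT)."""
    return list_freq * sample_size / list_size

def guiraud_scores(freq, ft):
    """Return (AD, RD) with AD = F - FT and RD = AD / FT."""
    if ft <= 0:
        raise ValueError("lemma is missing from the reference list; RD is undefined")
    ad = freq - ft
    return ad, ad / ft

# Hypothetical lemma: 50 occurrences in a 100,000-word reference list,
# 10 occurrences in a 5,000-word sample of one author.
ft = theoretical_frequency(50, 100_000, 5_000)
ad, rd = guiraud_scores(10, ft)
print(ft, ad, rd)  # prints 2.5 7.5 3.0
```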
Delatte uses Guiraud's formula to compare the literary styles of
Tibullus and Propertius by determining the words each uses
most frequently and computing Guiraud's index for them, to
show which words each of the two poets uses more than they
are normally used in the language (2). Clearly this kind of
analysis is only as good as the word list on which it is based.
The L.A.S.L.A. has recently produced a list (3) which is
based on a broader sample than that of Diederich, which was
previously the most readily available word list. The use of
Diederich's list was problematic for my research because it
contains samples of both Lucretius and Vergil which were
chosen for their excellence as poetry. Such passages can be
expected to be richer in semantic anomaly than either the entire
work of each of the two poets or the language as a whole. The
new list prepared by L.A.S.L.A. contains no Lucretius and so
is better for the analysis of that poet, and, although it does
contain Vergil (Bucolics, Georgics, Aeneid 1-6), the sample it
uses is at least a uniform one and not based on carefully chosen selections from the poet. Yet the composition of the new
word list reflects work done by L.A.S.L.A. and not an attempt
to produce a systematic random sample of Latin. Although I
have included a Guiraudian analysis in my study, I have not
based broad conclusions on it alone because it is so dependent
on problems of sampling.
Because comparing an author's frequency of word use to "normal" use is so problematic, quantitative linguists and information scientists have developed ways of avoiding the use of these
lists. Luhn has shown that the semantically important words an
author uses can be found by marking off areas of the Zipfian
curve to focus on the words an author uses a medium number
of times (6). He begins by cutting off the top and bottom of
the curve to eliminate words like "the" and like "Mississippi".
He argues that the remaining, intermediate region of the overall
frequency curve will contain two kinds of words: a small
number of structural words which would occur frequently in
any writing and which are the trailing off of the large group
which was cut off at the top of the curve, and a larger number
of content words. This latter group contains those words which
would be much rarer in a frequency count of the language as a
whole, but whose frequency has been shifted by the constraints of subject matter and style which the author has placed
upon his work. It is Luhn's supposition that this intermediate
area of the curve will contain the greatest concentration of
content words. He uses common sense to determine the demarcation of this area: he says that establishing the optimum locations would be "a matter of experience with appropriately large
samples".
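Once the two cut-offs have been chosen, marking off the middle region is mechanical, as the Python sketch below shows. The thresholds are parameters precisely because Luhn leaves them to experience; `upper_rank` and `lower_freq` are hypothetical names. The example uses a handful of the Vergil counts quoted later in this paper, so the ranks it assigns are ranks within this tiny excerpt, not within the full list.

```python
def luhn_middle(freqs, upper_rank, lower_freq):
    """Words in Luhn's middle region of a ranked frequency list.

    Drops the words ranked above `upper_rank` (the very common words) and
    those used fewer than `lower_freq` times (the rare words). Ties are
    ordered alphabetically so that the ranking is reproducible.
    """
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for rank, (word, f) in enumerate(ranked, start=1)
            if rank >= upper_rank and f >= lower_freq]

vergil_excerpt = {"que": 252, "et": 166, "in": 49, "urbs": 20,
                  "venio": 15, "gens": 15, "Aeolus": 4}
print(luhn_middle(vergil_excerpt, upper_rank=4, lower_freq=15))
# prints ['urbs', 'gens', 'venio']
```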
Luhn's work is especially useful because he suggests a way of
calculating a "significance factor" based on the words in the
middle region of the curve. This factor "reflects the number of
occurrences of significant words within a sentence and the
linear distance between them due to the intervention of non-significant words." This calculation allows a judgement to be
made of the syntagmatic plane on the basis of the paradigmatic
plane; that is, it permits us to judge the whole structure of a
work from what we can learn about individual words. In my own
work, I will also use the significance factor to point to those areas
of the De Rerum Natura which are particularly rich in formulaic
utterances.
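The quoted description does not give Luhn's arithmetic. In the form usually attributed to him, significant words are grouped into clusters in which no two consecutive significant words are separated by more than a few non-significant ones; each cluster scores (number of significant words)² divided by the cluster's length in words, and a sentence takes its best cluster score. The Python sketch below follows that form, with a gap limit of four as an assumption; the Latin tokens and the set of significant words are chosen only for illustration.

```python
def significance_factor(sentence, significant, max_gap=4):
    """Luhn-style significance factor of one tokenized sentence.

    A cluster runs from one significant word to another, with at most
    `max_gap` non-significant words between consecutive significant ones.
    Each cluster scores (significant words in it) ** 2 / (its length in
    words); the sentence receives its highest cluster score, or 0.0.
    """
    positions = [i for i, w in enumerate(sentence) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for p in positions[1:]:
        if p - prev - 1 <= max_gap:  # gap small enough: extend the cluster
            count += 1
        else:  # close the current cluster and start a new one at p
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = p, 1
        prev = p
    return max(best, count ** 2 / (prev - start + 1))

sentence = "corpus est in inane et corpus movetur".split()
print(significance_factor(sentence, {"corpus", "inane"}))  # prints 1.5
```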
The amount of semantic novelty which an author employs can be
assessed by entropy (H). Like the Luhnian demarcation of the
curve, this measure can be applied independently of comparative evidence across the language. It reflects the number of
different words that an author uses: a hundred-word text
which consists of a hundred different words with no repetitions
has a high entropy, while a text of equal length which consists
of a hundred repetitions of the same word has the lowest possible entropy (zero). A
low H thus means a small amount of unexpectedness: a word
missing from the text could be guessed with reasonable accuracy, even independently of the context. A small H also indicates a higher degree of language structure. Various analyses
of a text can be done with entropy as their base: a preliminary analysis of Lucretius and Vergil for entropy is suggestive,
and I will describe its results below.
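The paper does not state which formula or scale it uses for H, so the Python sketch below assumes Shannon's entropy in bits, computed from the relative frequencies of the distinct words. It reproduces the two extreme cases just described and also shows why grouping forms under their lemmas can only lower the figure: merging categories never increases entropy. The form-to-lemma mapping is a hypothetical fragment.

```python
from collections import Counter
from math import log2

def entropy(tokens):
    """Shannon entropy, in bits, of the distribution of distinct tokens."""
    n = len(tokens)
    return -sum((c / n) * log2(c / n) for c in Counter(tokens).values())

varied = [f"w{i}" for i in range(100)]  # 100 different words
repetitive = ["w0"] * 100               # one word repeated 100 times
print(round(entropy(varied), 2), entropy(repetitive) == 0)  # prints 6.64 True

# Hypothetical form-to-lemma mapping: lemmatizing merges forms.
lemma = {"res": "res", "rem": "res", "rerum": "res",
         "corpus": "corpus", "corpora": "corpus"}
forms = ["res", "rem", "rerum", "corpus", "corpora"]
print(round(entropy(forms), 2), round(entropy([lemma[f] for f in forms]), 2))
# prints 2.32 0.97
```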
Forecast of the Project and Account of the Present Series of Investigations
Because I wanted to work with both structurally important units
and semantically rich units, I have planned a group of
analyses which incorporate both elements. The results of these
analyses will be independent of each other and will, if my
notions of Lucretius are correct, reinforce each other.
My study will include the analysis of Lucretius and Vergil for
function words, the semantically neutral words which can distinguish one author from another. The aim of this analysis is
not to determine who wrote the De Rerum Natura but to develop
a list of markers, semantically neutral words whose use is
typical of each author. Another analysis will look for the significant or semantically meaningful words by employing Luhnian
analysis, making Guiraudian calculations and performing tests
based on entropy. Both the semantically rich words and the
neutral words can then be compared with the formulaic repetitions which the scansion and counting programs will locate. It
is not necessary to demonstrate that Lucretius employed formulas, but it will be worthwhile to see with what marker words
and with what theme words his formulaic expressions can be
connected.
For several reasons, then, frequency counts will be important
to these analyses. They will locate marker words and semantically rich words. The frequency lists of Lucretius can then be
compared to those of Vergil, as I will show below, and, eventually, to those of Cicero and Plautus and, finally, to the
standard word list for Latin usage. The counts provided by
analysis of individual authors will correct the figures obtained
from the standard word list, and thus help set confidence limits
for the Guiraudian measures.
Further calculations of the entropy shown by sections of the De
Rerum Natura will reinforce the findings obtained by Luhnian
sectioning of the frequency curve and the calculation of the
Luhnian significance factor. These analyses will give us a much
better idea than we have at the moment of what constitute
Lucretius' themes and of how he presents and develops them.
It is early in the project, but a series of preliminary analyses
has been suggestive. I have produced word frequency lists from
both the lemmatized and the unlemmatized texts of
Book 1 of the De Rerum Natura and of Book 1 of the Aeneid.
The two samples were of different sizes: 5152 words of Vergil,
7427 of Lucretius. The first set of analyses worked with each of
the samples separately and so was not disturbed by the difference in sample size. The second set of analyses, described
below, used the Guiraudian calculation to compensate for the
difference in sample size. The first analysis simply produced a
ranked frequency list for each author. Luhnian markers were
then placed on the frequency lists to indicate the area of middle-frequency words which might have thematic significance. An
excerpt from the frequency list for each author is presented in
Table 1, below.
The first notable difference between the two authors in this
limited data was in the rate at which their word frequency
curves dropped off. The most frequent word in each sample is
used about 250 times: sum at 256 instances in Lucretius and
que at 252 in Vergil. But the fifty-fourth most frequent word
in Lucretius is unde, which appears 19 times, while in Vergil it is
oris, which appears only 8 times: Lucretius has 76 more words
than Vergil that are used more than 8 times.
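Both comparisons in this paragraph, the frequency found at a given rank and the number of words used more than a given number of times, can be read directly off a ranked list. A minimal Python sketch, assuming tokenized text as input, with hypothetical function names and toy data:

```python
from collections import Counter

def frequency_at_rank(tokens, rank):
    """Frequency of the word at 1-based position `rank` in the ranked list."""
    return Counter(tokens).most_common()[rank - 1][1]

def words_used_more_than(tokens, threshold):
    """Number of distinct words used more than `threshold` times."""
    return sum(1 for c in Counter(tokens).values() if c > threshold)

toy = ["a"] * 10 + ["b"] * 9 + ["c"] * 3
print(frequency_at_rank(toy, 3), words_used_more_than(toy, 8))  # prints 3 2
```

Applied to each author's sample with a threshold of 8, the difference between the two counts corresponds to the comparison made above.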
In general, it seems clear from the stem frequency as well as
from the word frequency counts that Lucretius uses a larger
number of words a medium number of times. I have followed
Luhn's suggestion and placed the demarcation of the medium
range by judgment, as he describes: reading down the frequency list of Vergil, I found que (252), et (166), in (49), and
then, eventually, urbs (20). I would assume that the medium
frequency range starts with urbs and ends, perhaps, at
venio (15) or gens (15). The whole middle area seems shifted
upward in Lucretius, where it seems to start with the eighth
word instead of the twenty-second, going from corpus (100) to
inane (47). Larger Luhnian areas could be marked off as well:
from urbs (20) to ex (5) or Aeolus (4) in Vergil, and from
corpus (100) to fruges (5) in Lucretius. The smaller segments
of the frequency curve that I have marked off seem more
significant because they have such a high concentration of
significant words.
In fact, the difficulty of placing Luhnian limits on the curve
highlights a difference between Lucretius and Vergil: to start
at corpus (number 8 on the Lucretius list), I have considered
res (number 2) a marker word in a work called De Rerum
Natura. Clearly res and its genitive plural rerum are significant
theme words for Lucretius, although in ordinary Latin they are
empty words. Now, as the Guiraudian analysis below shows,
Lucretius uses res only slightly more often than it is used in
the language as a whole, but much more often than Vergil does
(he uses res 16 times in the present sample). So perhaps res is
significant in Lucretius, and corpus certainly is. The Luhnian
"middle" region starts, in Lucretius, at word 8 or perhaps even
word 2! I began this study to describe Lucretius' poetic style
in objective terms and, even at this early stage, I have seen a
difference between Lucretius and Vergil which is of an overwhelming and powerful sort: Lucretius seems to have reshaped
the frequency curve of the language in such a way that his
Luhnian middle region, if he can be said to have one, occurs
much higher on his frequency list. Further study will show
whether Lucretius does this in the rest of his poem and will
compare his word frequency pattern with that of Vergil, Plautus
and Cicero.
Perhaps some of the difference in word frequency rate can be
explained by the different genre and content of the two works.
Lucretius uses a large number of words often because he is
talking about the same atomic basis which underlies the disparate elements of the universe. Thus, he establishes a series of
familiar terms, almost technical terms, whose meanings reinforce
one another. Vergil, on the other hand, transports a hero to
ever different regions where he encounters new problems and
people, for which and whom the author uses ever-different
words. It will be the work of further analysis to see whether the
pattern of word frequency use that is present in these segments of the two authors endures throughout their works, and
whether any significance can be suggested for it beyond the subject
matter. However, even this preliminary analysis shows that
Lucretius and Vergil, while writing in the same verse form,
went about their tasks very differently.
Further work will compare Lucretius' style to that of other
Roman authors. But the present analysis already shows how
Lucretius' usage differs from standard Latin usage. The
Guiraudian calculation both compensates for the disparate sizes
of the samples from Lucretius and Vergil and compares their
word frequency patterns to those of other Latin authors. The
frequency samples used for general Latin usage are
P. B. Diederich's word list (4) and the L.A.S.L.A. word
list (3). Diederich's list has been compiled from a sample of
124,686 words, 49,363 of prose and 75,323 of poetry, while the
L.A.S.L.A. list is based on a sample of 794,662 words, 582,411
from prose and 212,251 from poetry. In my comparison, I have
used the lemmatized lists of my authors because the entries of
both word lists are organized according to lemma. Table 1
shows the Guiraudian figures for the first forty words of each
author, as calculated from the Diederich word list; Table 2
presents the same calculations from the L.A.S.L.A. list.
About 11% of Diederich's sample of Latin poetry consists of
excerpts of Lucretius and Vergil, so that Diederich's poetry
sample seemed a questionable basis from which to predict frequencies in these two authors. Accordingly, I made separate
Guiraudian calculations on Diederich's prose sample, on his
poetry sample and on their combination. In fact, the three
calculations were not much different from each other, so in Table 1, and in Figure 2, which is based on the table, I have presented only the calculation based on Diederich's poetry sample.
A different problem arose with the L.A.S.L.A. list. This list
contains no Lucretius, but a substantial amount of Vergil:
about 25% of this list consists of Vergil. Thus I feared that
comparison of Lucretius with this list would show only that he
wasn't Vergil and, of course, that comparison of Vergil with
the list would show only that Vergil was very like himself. In
fact, the results obtained from this list are less clear than
those from the Diederich list. To minimize the misleading impression which could result from the examination of so biased a
sample, I have presented, in Table 3, the Guiraudian scores
calculated from this list based on the prose sample, the poetry
sample, and both samples taken together. For the
L.A.S.L.A. word list, Figure 3 shows a graphic representation
of the prose sample, while Figure 4 represents the poetry
sample. I regard these calculations as less important for my
work simply because the sample happens to be unfortunate for
my project.
However, both sets of calculations support the same conclusions. The Guiraudian scores show even more clearly than
the ranked frequency lists that Lucretius uses words very
differently from Vergil. The graphs of the Guiraudian scores
from the tables make this clear. Figures 2 and 3 are graphs of
the Guiraudian scores of each author for the top of the word
frequency list. The curves are very similar in shape and show
a similar difference between Lucretius and Vergil (Figure 2
shows more detail by truncating the score of Vergil's top word,
que, from 117.4 to 65).
Guiraudian scores are relative: Figure 2 shows that Vergil's
usage is about normal, that is, he uses words as they are used
in the language. For Vergil, the places where the curve is
above normal show individual theme words or marker words. On
the whole, theme words are more frequent than marker words.
But, for Lucretius, the whole curve has risen. Overall,
Lucretius' Guiraudian scores are much higher than Vergil's, and
for a whole area in Figures 2 and 3, from word 4 to word 20,
his Guiraudian scores do not drop to zero or below. In Figure 2
this area is markedly elevated and has no dips below zero in it.
This region coincides with the Luhnian markers that we imposed
on the frequency list above. No part of Vergil shows such
elevation. The analogous region, the Luhnian area we marked
off for Vergil, from word 22 to word 37, shows no such
pattern: there are four drops to zero in it. The region of
greatest elevation for Vergil, from word 35 to word 47, is
surpassed by the elevation both of the same region in Lucretius
and, even more, of the Luhnian region in
Lucretius.
Figures 2 and 3 both show that Lucretius and Vergil have very
different poetic styles, and that Vergil's is much more typical
of Latin as a whole. Lucretius has distorted the very pattern of
word usage in the Latin language to produce poetry that reflects his Epicurean message. It remains to be seen whether
this pattern of word usage continues throughout the work, and
whether the elevated areas on the graph of the Guiraudian scores
coincide with particular metrical patterns. The second phase of
the analysis, which will utilize the scansion program, will reveal
such coincidences.
The figures derived from the L.A.S.L.A. word list, the other
list used for the calculations, raise important questions which
future analysis must try to answer. The shape of the curve
based on the Guiraudian scores calculated according to this list
is about the same as the shape of the curve calculated from
Diederich's list. Yet there are interesting differences between
the two sets of calculations. The L.A.S.L.A. list seems to have
proportionally many more instances of que, et, video, sui and
atque. It is not entirely clear why these differences exist, and
yet they do not affect the observations which the Guiraudian
scores suggest. The larger size of the L.A.S.L.A. list seems to
have some effect: the largest Guiraudian scores calculated from
it are not as large as the largest scores calculated from the
Diederich list. This is doubtless because even very rare words
occur more often in a larger list than in a smaller one. For
instance, regina does not appear at all in Diederich's list, but
appears 73 times in the L.A.S.L.A. list. Conversely, the newer
list has far fewer instances of inane and ratio, probably because
it contains no sample of Lucretius. Yet many scores are very
similar in both lists: for example, Lucretius' scores for corpus,
natura, hic and ignis and Vergil's scores for hic, in, do, ille
and fero are probably in the same range in each list. Thus the
two lists do not produce scores on entirely different scales,
although the scaling differs somewhat. The major difference seems
to be that Vergil scores significantly more positively in the
Guiraudian scores calculated from the L.A.S.L.A. list than in the
scores calculated from the Diederich list. This increase can be
seen most clearly in Figure 3. Perhaps it arises in part from the
large amount of Vergil in the newer word list.
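How a reference list enters a Guiraud-style score can be illustrated with a small sketch. The formula here is an assumption for illustration only, not necessarily the calculation used in this paper: it treats the score as the reduced deviation (observed − expected)/√expected, where the expected count is the word's relative frequency in the reference list scaled to the sample size. The function name `guiraud_score` and all the numbers are hypothetical; they are not taken from Diederich's list, the L.A.S.L.A. list or the tables.

```python
import math


def guiraud_score(observed: int, ref_count: int, ref_total: int, sample_size: int) -> float:
    """Assumed form of a Guiraud-style theme-word score: the reduced deviation
    of a word's observed count in a sample from the count expected if the
    sample followed a reference frequency list."""
    if ref_count <= 0:
        # No expected frequency, so the score is undefined under this formula
        # (compare regina, which is absent from Diederich's list).
        raise ValueError("word must occur in the reference list")
    # Expected count = reference relative frequency scaled to the sample size.
    expected = sample_size * ref_count / ref_total
    return (observed - expected) / math.sqrt(expected)


# Hypothetical toy numbers: one observed count scored against two reference
# lists in which the word has different relative frequencies.
sample_size = 5000
observed = 40
score_list_a = guiraud_score(observed, ref_count=20, ref_total=100_000, sample_size=sample_size)
score_list_b = guiraud_score(observed, ref_count=300, ref_total=500_000, sample_size=sample_size)
print(round(score_list_a, 2), round(score_list_b, 2))  # 39.0 21.36
```

Under this assumed formula the same observed count yields a smaller score when the word is relatively more frequent in the reference list, and a word missing from the reference list has no defined score at all; how the original calculation treated such words is not stated here.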
Extrait de la Revue Informatique et Statistique dans les Sciences humaines
XIX, 1 à 4, 1983. C.I.P.L. - Université de Liège - Tous droits réservés.
The present analysis suggests that Lucretius' stylistic pattern
has to do with blocks or modules that are repeated. Not only is
Lucretius' usage of significant words different from Vergil's, but
the Epicurean poet uses marker words differently as well. For
Vergil, the Guiraudian curve tends toward zero or the negative
at marker words, both inside and outside the Luhnian region
marked off on the frequency list: Table 1 shows that only 13
seemingly empty words of the first forty have scores higher
than one (Table 2 has the same number). Two of these, tu and
do, may be theme words. In Lucretius, 22 (20 in Table 2) of
the first 40 words are seemingly empty but have scores higher
than one. Of these, nullus, possum and quoniam may have
thematic roles. But at least nineteen words remain; it seems
that Lucretius uses both colorless words and semantically
meaningful words more often than Vergil does, and more often
than they are used in the language as a whole. This practice
relates to his widespread use of formulaic language, which
repeats modular units that incorporate both thematic words and
marker words.
The preliminary findings about Lucretius' stylistic pattern are
confirmed as well by the calculation of entropy for the samples
drawn. For each author, entropy calculated on the word sample
is necessarily larger than that calculated on the stem sample.
But the entropy of Lucretius' word sample (78.59) is about the
same size as that of Vergil's stem sample (77.37). The complete
figures are found in Table 4. Again, the figures show that
Lucretius uses the same words over and over again. Stephen
Waite's study of entropy in Latin prose authors provides figures
for comparison (11). Waite used samples of a thousand words,
arbitrarily drawn, from three prose authors. Except for very
short samples, entropy is not affected by disparate sample size,
so Waite's figures can be compared to those given here.
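Why entropy on word forms must be at least as large as entropy on stems follows from what lemmatization does: it merges several forms into one stem, and merging outcomes of a frequency distribution can never raise its Shannon entropy. A minimal sketch, assuming plain Shannon entropy in bits; the helper `entropy_percent_of_max` rescales it as a percentage of log2(N), one possible normalization that this paper does not confirm for its own figures or for Waite's. The functions and the toy sample are hypothetical and are not drawn from Book 1 of either poem.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the frequency distribution of the tokens."""
    n = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_percent_of_max(tokens):
    """Entropy as a percentage of log2(N), the value it would take if all N
    tokens were different (an assumed normalization, for illustration)."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    return 100 * shannon_entropy(tokens) / math.log2(n)


# Hypothetical toy sample of word forms and a hand-made form-to-stem mapping.
forms = ["corpus", "corporis", "corpora", "res", "rerum", "rebus", "inane", "inane"]
stem_of = {"corpus": "corpus", "corporis": "corpus", "corpora": "corpus",
           "res": "res", "rerum": "res", "rebus": "res", "inane": "inane"}
stems = [stem_of[f] for f in forms]

# Merging forms under one stem can only lower (or keep) the entropy.
print(round(shannon_entropy(forms), 3), round(shannon_entropy(stems), 3))  # 2.75 1.561
print(round(entropy_percent_of_max(forms), 1), round(entropy_percent_of_max(stems), 1))
```

The same inequality holds for any sample and any form-to-stem mapping, which is why the word-sample figure for each author in Table 4 exceeds the stem-sample figure.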
Waite's lowest figure is an average entropy of 84.59 for Cato's
De Agricultura, followed by 90.3 for selections from Sallust and
90.69 for selections from Livy. His highest figure, 91.62, for
one sample (all that is available) from Cato's fragments is, as
he himself says, questionable, since the entropy of fragmentary
passages is a doubtful quantity. These figures should be
compared to those I obtained for my word samples, since he used
unlemmatized material. Vergil's entropy figure (84.98) is similar
to the figure for Cato's De Agricultura. The most probable reason
for the relatively low figures for the poets is that poetry limits
word choice far more than prose does: metre imposes restrictions
on word choice, forcing repetition even in a poet as innovative
as Vergil. But none of Waite's figures is more than two points
removed from any other. Lucretius' entropy, at 78.59, differs
much more from Vergil's than any of Waite's authors differ from
one another. Also, Lucretius' entropy figure is the lowest of the
group as a whole. Again, this suggests that Lucretius imposed
constraints on his use of language that went far beyond those of
other authors.
I would like to end by considering what this analysis shows
about the poetic themes of each poet. For Vergil, the analysis
reveals few surprises. We know that urbs, terra, vir and arma
are his themes. His Luhnian region contains nothing unexpected.
For Lucretius, however, this is not the case. Again, the Luhnian
region contains words we would expect to find as theme words:
corpus and inane. But the extent, at least, to which res, ratio
and natura appear is surprising. Students of Lucretius have
argued that these are colorless words whose incidence in
Lucretius is simply a feature of the Latin language. Yet at least
the last two are used by Lucretius far more than they are in the
language as a whole. Lucretius uses these words as theme words:
or, if you prefer, as he uses other, clearly colorless words like
quisque, consto and is. Perhaps it is fair to say that he has
redefined what a theme word is, making it into a neutral atomic
unit which acquires color, as Lucretius says of atoms, from its
position and function in the poem as a whole:
... it makes a great difference with what others the
atoms are combined, and in what position ...
DRN I, 817-818.
Eva THURY
Reference list
1) Brandwood, L., "Plato's Seventh Letter". Revue L.A.S.L.A., 1969, 4, 1-21.
2) Delatte, L., "Key-words and poetic themes in Propertius and Tibullus". Revue L.A.S.L.A., 1967, 3, 31-83.
3) Delatte, L., Evrard, Et., Govaerts, S. and Denooz, J., Dictionnaire fréquentiel et index inverse de la langue latine. Liège, 1981.
4) Diederich, P., The Frequency of Latin Words and their Endings. Chicago, 1939.
5) Guiraud, P., Les Caractères Statistiques du Vocabulaire. Paris, 1954.
6) Luhn, H., "A Statistical Approach to Mechanized Encoding and Searching of Literary Information". IBM Journal, 1957, 4, 309-317.
7) Minyard, J., Mode and Value in the De Rerum Natura. Wiesbaden, 1978.
8) Mosteller, F. and Wallace, D., Inference and Disputed Authorship: "The Federalist". Reading, 1964.
9) Thury, E., "Naturae species ratioque. Poetic Image and Philosophical Perspective in the De Rerum Natura of Lucretius". Ph.D. diss., University of Pennsylvania, 1976.
10) Thury, E., "The Poem of Lucretius as a Simulacrum of the Rerum Natura". To appear in the American Journal of Philology.
11) Waite, S., "Approaches to the Analysis of Latin Prose, Applied to Cato, Sallust and Livy". Revue L.A.S.L.A., 1970, 2, 91-112.
12) Zipf, G., Selected Studies of the Principle of Relative Frequency in Language. Cambridge, Mass., 1932.
Table 1
Guiraudian Scores for Poetry Based on Diederich's Frequency List (Vergil and Lucretius, first 40 words of each: number, word, actual frequency, Guiraudian score)
Table 2
Guiraudian Scores Based on the L.A.S.L.A. Frequency List (Vergil and Lucretius, first 40 words of each: number, word, actual frequency, Guiraudian score)
Table 3
Guiraudian Scores for Poetry, Prose and Both Combined Based on the L.A.S.L.A. Frequency List (Vergil and Lucretius, first 40 words of each: number, word, actual frequency, Guiraudian scores for poetry, for prose and for both)
             Stems    Words
Lucretius    69.94    78.59
Vergil       77.37    84.98
Table 4
Entropy for Book One of Each Author
Figure 1
Frequency of Occurrence of Words in the Latin of Plautus (after Zipf, Selected Studies of the Principle of Relative Frequency in Language). Vertical axis: number of occurrences; horizontal axis: number of words.
Figure 2
Raw Guiraudian Score: Lucretius vs. Vergil, Based on Poetry; calculation based on Diederich's frequency list. Vertical axis: Guiraudian scores; horizontal axis: words, most to least frequent (Lucretius list and Vergil list).
Figure 3
Raw Guiraudian Score: Lucretius vs. Vergil, Based on Poetry; calculation based on the L.A.S.L.A. frequency list. Vertical axis: Guiraudian scores; horizontal axis: words, most to least frequent (Lucretius list and Vergil list).
Figure 4
Raw Guiraudian Score: Lucretius vs. Vergil, Based on Prose; calculation based on the L.A.S.L.A. frequency list. Vertical axis: Guiraudian scores; horizontal axis: words, most to least frequent (Lucretius list and Vergil list).
Figure 5
Raw Guiraudian Score: Lucretius vs. Vergil, Based on Prose and Poetry; calculation based on the L.A.S.L.A. frequency list. Vertical axis: Guiraudian scores; horizontal axis: words, most to least frequent (Lucretius list and Vergil list).