honey throat 1980

Transcription

honey throat 1980
Log-Normal Distributions across the Sciences: Keys and Clues
Author(s): Eckhard Limpert, Werner A. Stahel, Markus Abbt
Source: BioScience, Vol. 51, No. 5 (May, 2001), pp. 341-352
Published by: American Institute of Biological Sciences
Stable URL: http://www.jstor.org/stable/1314039
Accessed: 15/10/2008 21:11
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless
you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you
may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/action/showPublisher?publisherCode=aibs.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the
scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that
promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected].
American Institute of Biological Sciences is collaborating with JSTOR to digitize, preserve and extend access
to BioScience.
http://www.jstor.org
Articles
Distributions
Log-normal
the
across
Keys
and
Sciences:
Clues
ECKHARD
WERNERA. STAHEL,AND MARKUSABBT
LIMPERT,
ON THE CHARMS OF STATISTICS,AND
A s
the need grows for conceptualization,
formalization,and abstractionin biology,so too does mathematics'relevanceto the field(Fagerstr6met al. 1996).Mathematicsis particularlyimportantfor analyzingand characterizingrandomvariationof, for example,size andweightof
individualsin populations,theirsensitivityto chemicals,and
time-to-eventcases,such as the amount of time an individual needsto recoverfromillness.The frequencydistribution
of such datais a majorfactordeterminingthe type of statistical analysisthat can be validlycarriedout on any dataset.
Manywidelyusedstatisticalmethods,suchasANOVA(analysis of variance)and regressionanalysis,requirethat the data
be normallydistributed,but only rarelyis the frequencydistributionof datatestedwhen these techniquesareused.
TheGaussian(normal)distributionis most oftenassumed
to describethe randomvariationthatoccursin the datafrom
manyscientificdisciplines;the well-knownbell-shapedcurve
can easilybe characterizedand describedby two values:the
arithmeticmeanx and the standarddeviations, so thatdata
sets arecommonly describedby the expression +
? s. A historicalexampleof a normaldistributionis thatof chestmeasurements of Scottish soldiers made by Quetelet, Belgian
founderof modern social statistics(Swoboda1974).In addition, such disparatephenomena as milk production by
cows and randomdeviationsfrom targetvaluesin industrial
processesfit a normal distribution.
However,manymeasurementsshowa moreor lessskewed
distribution.Skeweddistributionsareparticularlycommon
when meanvaluesarelow,varianceslarge,andvaluescannot
be negative,asis the case,forexample,withspeciesabundance,
lengthsof latentperiodsof infectiousdiseases,and distribution of mineralresourcesin the Earth'scrust.Suchskeweddistributionsoftencloselyfitthe log-normaldistribution(Aitchison and Brown 1957, Crow and Shimizu 1988, Lee 1992,
Johnsonet al. 1994,Sachs1997).Examplesfittingthe normal
distribution, which is symmetrical, and the lognormal distribution,which is skewed,aregiven in Figure1.
Note that body height fits both distributions.
Often,biologicalmechanismsinducelog-normaldistributions(Koch1966),aswhen,forinstance,exponentialgrowth
HOW MECHANICALMODELS RESEMBLING
GAMBLING MACHINES OFFERA LINK TO A
HANDY WAYTO CHARACTERIZELOGNORMAL DISTRIBUTIONS,WHICH CAN
PROVIDE DEEPERINSIGHT INTO
VARIABILITYAND PROBABILITY-NORMAL
OR LOG-NORMAL: THAT IS THE QUESTION
is combinedwith furthersymmetricalvariation:Witha mean
concentrationof, say, 106bacteria,one cell divisionmoreor less-will leadto 2 x 106-or 5 x 105--cells.Thus,therange
will be asymmetrical-to be precise,multipliedor dividedby
2 around the mean. The skewed size distributionmay be
why "exceptionally"
big fruitarereportedin journalsyearafteryearin autumn.Suchexceptions,however,maywellbe the
rule:Inheritanceof fruitand flowersizehaslong beenknown
to fit the log-normaldistribution(Groth1914,Powers1936,
Sinnot 1937).
What is the differencebetween normal and log-normal
variability?Both forms of variabilityarebased on a variety
of forces acting independently of one another. A major
difference,however,is that the effects can be additive or
multiplicative, thus leading to normal or log-normal
distributions,respectively.
EckhardLimpert(e-mail:[email protected])
is a
and
senior
in
scientist
the
of
the
Inbiologist
Group
Phytopathology
stituteof PlantSciences inZurich,Switzerland.
WernerA.Stahel(email:[email protected])
is a mathematician
andheadof the
ConsultingServiceat the Statistics Group,Swiss FederalInstitute
of Technology(ETH),CH-8092Zirich,Switzerland.MarkusAbbtis
a mathematicianand consultantat FJAFeilmeier&JunkerAG,CH8008 Zuirich,
Switzerland.@ 2001 AmericanInstituteof Biological
Sciences.
May 2001 / Vol.51 No. 5 * BioScience 341
Articles
geous as, by definition,log-normal distributions are symmetricalagainat the log level.
Unfortunately,the widespreadaversionto
cO
statisticsbecomes even more pronouncedas
soon as logarithmsareinvolved.This may be
Q)
the major reason that log-normal distribuN
tions are so little understood in general,
which leads to frequent misunderstandings
and errors. Plotting the data can help, but
111
I
a
CI
IL
55
60
65
70
0
10
20
30
40
graphsaredifficultto communicateorally.In
concentration
height
short, currentways of handling log-normal
distributionsare often unwieldy.
To get an idea of a sample, most people
Figure 1. Examplesof normal and log-normaldistributions.While the
to think in terms of the original
prefer
distribution of the heights of 1052 women (a, in inches;Snedecorand
than the log-transformeddata. This
rather
Cochran1989)fits the normal distribution,with a goodnessoffit p value of
conception is indeed feasible and advisable
0.75, that of the content of hydroxymethylfurfurol(HMF,mg.kg-1)in 1573
for log-normal data,too, because the familhoney samples (b;Renner 1970)fits the log-normal (p = 0.41) but not the
iar properties of the normal distribution
normal (p = 0.0000). Interestingly,the distribution of the heights of women
have
their analogies in the log-normal disfits the log-normaldistribution equally well (p = 0.74).
tribution. To improve comprehension of
log-normal distributions, to encourage
their proper use, and to show their importance in life, we
Some basic principles of additive and multiplicative
present a novel physicalmodel for generatinglog-normal
effects can easily be demonstratedwith the help of two
distributions, thus filling a 100-year-old gap. We also
1
dice
with
sides
numbered
from
to
6.
the
ordinary
Adding
demonstratethe evolution and use of parametersallowing
two numbers,which is the principleof most games,leadsto
characterization of the data at the original scale.
values from 2 to 12, with a mean of 7, and a symmetrical
Moreover,we compare log-normal distributions from a
can
distribution.
The
total
be
described
as
frequency
range
varietyof branchesof science to elucidatepatternsof vari7 plus or minus 5 (that is, 7 ? 5) where,in this case,5 is not
ability, thereby reemphasizing the importance of logthe standarddeviation.Multiplyingthe two numbers,hownormal distributionsin life.
1
and
36
with
a
leads
to
values
between
skewed
ever,
highly
distribution.The totalvariabilitycan be describedas 6 mulA physical model demonstratingthe
tiplied or dividedby 6 (or 6 X/ 6). In this case,the symmehas
moved
to
the
level.
genesis of log-normal distributions
multiplicative
try
these
are
neither
normal
nor
Therewas reasonfor Galton (1889) to complainabout colexamples
Although
lognormal distributions,they do clearlyindicatethat additive
leagueswho wereinterestedonlyin averagesand ignoredranand multiplicativeeffectsgive riseto differentdistributions.
dom variability.In his thinking,variabilitywas even partof
in
we
cannot
describe
both
of
distribution
the
the "charmsof statistics."Consequently,he presenteda simThus,
types
same way. Unfortunately,however,common belief has it
plephysicalmodelto givea dearvisualizationof binomialand,
that quantitativevariabilityis generallybell shaped and
finally,normalvariabilityand its derivation.
in
The
current
science
is
to
use
practice
symmetrical.
symFigure2a shows a furtherdevelopment of this "Galton
metricalbars in graphsto indicate standarddeviations or
board,"in which particlesfall down a board and are devi?
and
the
to
summarize
even
the
ated at decision points (the tips of the triangularobstacles)
errors,
data,
sign
though
data or the underlyingprinciplesmay suggest skeweddiseitherleft or rightwith equalprobability.(Galtonused simtributions(Factoret al. 2000, Keesing.2000,Le Naour et al.
ple nails insteadof the isoscelestrianglesshown here,so his
al.
In
a
Rhew
et
number
of
cases
the
variabiliinventionresemblesa pinballmachineor the Japanesegame
2000,
2000).
is
because
three
stanPachinko.)The normaldistributioncreatedby the boardrety clearlyasymmetrical
subtracting
darddeviationsfrom the mean producesnegativevalues,as
flects the cumulativeadditiveeffectsof the sequenceof dein the example 100 ? 50. Moreover,the exampleof the dice
cision points.
shows that the establishedway to characterizesymmetrical,
A particleleavingthe funnelat the top meetsthe tip of the
additivevariabilitywith the sign ? (plus or minus) has its
firstobstacleand is deviatedto the left or rightby a distance
c with equalprobability.It then meetsthe correspondingtriequivalentin the handysign x/ (times or dividedby), which
will be discussedfurtherbelow.
anglein the secondrow,andis againdeviatedin thesamemanare
in
distributions
characterized
ner,and so forth.The deviationof the particlefromone row
usually
Log-normal
terms of the log-transformedvariable,using as parameters
to the next is a realizationof a randomvariablewith possible
the expected value, or mean, of its distribution,and the
values+c and-c, andwith equalprobabilityforboth of them.
standarddeviation.This characterizationcan be advantaFinally,afterpassingr rowsof triangles,the particlefallsinto
L)(a)
C.)
342 BioScience - May 2001 / Vol.51 No. 5
_
(b)
Articles
A
AL
i
Ak.
k
A A& A
Ai
i
Ab
Adbo.
A6AL
AA
6
Aihi,
amadmo
ohm,
Ai
AA
AAA
A ALiiM?
A A Ak
"
A
k
t
A
k
Ak
A
.h,
dbh
dk h
thegenesisof normaland log-normaldistributions.
Particlesfallfroma funnel
Figure2. Physicalmodelsdemonstrating
ontotipsof triangles,wheretheyaredeviatedto theleftor to therightwithequalprobability(0.5)andfinallyfall into
remainbelowtheentrypointsof theparticles.If thetip of a triangleis at
Themediansof thedistributions
receptacles.
distancexfrom theleftedgeof theboard,triangletipsto therightand to theleftbelowit areplacedat x + c andx - c
for thenormaldistribution(panela), andx * c' andx / c'for thelog-normal(panelb,patentpending),c andc' being
aregeneratedbymanysmallrandomeffects(accordingto thecentrallimittheorem)thatare
constants.Thedistributions
additivefor thenormaldistributionandmultiplicative
suggestthealternativename
for thelog-normalWetherefore
normal
distribution
the
latter.
for
multiplicative
r
+
1
the
Theprobabilities
firsttriangleareatxm- c andXm/C(ignoringthegapnecesof
the
at
bottom.
one
receptacles
of endingup in thesereceptacles,
numbered0, 1,...,r,follow
saryto allowthe particlesto passbetweenthe obstacles).
randp = 0.5.Whenmanypartheparticlemeetsthetipof a trianglein thenext
abinomial
lawwithparameters
Therefore,
forboth
rowatX= xm-c,orX= xm/c,withequalprobabilities
theheightof
ticleshavemadetheirwaythroughtheobstacles,
will be apvalues.In thesecondandfollowingrows,thetriangleswith
the particlespiledup in the severalreceptacles
thetip atdistancex fromtheleftedgehavelowercornersat
to thebinomialprobabilities.
proximately
proportional
- c andxlc (upto thegapwidth).Thus,thehorizontalpoa
x
Fora largenumberof rows,theprobabilities
approach
in eachrowbya randomvarito thecentrallimittheositionof a particle
ismultiplied
normaldensityfunctionaccording
foritstwopossiblevaluesc and
ablewithequalprobabilities
lawstatesthatthe
rem.Initssimplestform,thismathematical
distributed
random
sumof many(r)independent,
1/c.
identically
of particles
in
limit
distributed.
ThereOnceagain,theprobabilities
variables
the
as
is,
r-+oo,normally
fallingintoanyrebinomial
law
as in Galton's
shows
norfollow
the
same
board
with
rows
of
obstacles
a
Galton
fore,
ceptacle
many
on
the
but
because
the
maldensityastheexpectedheightof particlepilesin theredevice,
receptacles
rightarewider
r
of
accumulated
of
than
those
on
the
the
its
mechanism
the
idea
of
a
sum
and
left, height
particlesis
captures
ceptacles,
of rows,
left.
For
number
a
skewed
to
the
a
variables.
random
large
"histogram"
independent
follows
distribution.
This
modified
the
a
2b
how
construction
was
shows
Galton's
heightsapproachlog-normal
Figure
versionof thecentrallimittheorem,
fromthemultiplicative
to describethe distribution
of a productof suchvariables,
identiTothis
whichprovesthattheproductof manyindependent,
whichultimately
leadsto a log-normaldistribution.
random
variables
has
to
are
needed
be
scalene
aim,
approxitheyappear
callydistributed,positive
triangles
(although
distribution.
isoscelesin thefigure),withthelongersideto theright.Let
Computer
implementations
matelyalog-normal
attheWeb
of themodelsshownin Figure2 alsoareavailable
thedistancefromtheleftedgeof theboardto thetip of the
et
al.
sitehttp://stat.ethz.ch/vis/log-normal
firstobstacle
belowthefunnelbexm.Thelowercornersof the
(Gut
2000).
May 2001 / Vol.51 No. 5 * BioScience 343
Articles
p*=100.*
=2
2.
=0.301
LO
0
(a)
0
0'-
0
(a)*
(a)
(b)
50 100
200
L*x a*
300
400
originalscale
(b)
1.0
1.5
log scale
2.0
2.5
uP + o
3.0
the Kapteynboardcreateadditiveeffects
at eachdecisionpoint,in contrastto the
multiplicative,log-normal effects apparentin Figure2b.
Consequently,the log-normalboard
presentedhereis a physicalrepresentation of the multiplicativecentrallimit
theoremin probabilitytheory.
Basic properties of lognormal distributions
Thebasicpropertiesof log-normaldistribution were established long ago
Figure3. A log-normaldistributionwith original scale (a) and with logarithmic
(Weber1834,Fechner1860, 1897,Galscale (b). Areas under the curve,from the median to both sides, correspondto one and
ton 1879,McAlister1879,Gibrat1931,
two standarddeviation rangesof the normal distribution.
Gaddum1945),and it is not difficultto
characterizelog-normal distributions
A
of
the
random
variableX is saidto be log-normally
C.
the
direct
J. Kapteyndesigned
predecessor
mathematically.
logdistributedif log(X) is normallydistributed(see the box on
normalmachine(Kapteyn1903,Aitchisonand Brown1957).
the facing page). Only positive values are possible for the
Forthatmachine,isoscelestriangleswereused insteadof the
skewedshapedescribedhere.Becausethe triangles'width is
variable,and the distributionis skewedto the left (Figure3a).
Twoparametersareneededto specifya log-normaldistrito
their
horizontal
this
model
also
position,
proportional
and the standarddeviation
the mean VL
bution.Traditionally,
leads to a log-normal distribution.However,the isosceles
are
used (Figure3b). Howwith
wide
sides
to
the
of
the
enthe
variance
a
(or
02) of log(X)
increasingly
right
triangles
to
The
median
of
there
are
clear
have
a
hidden
ever,
advantages using"back-transformed"
try point
logicaldisadvantage:
values (the values are in terms of x, the measureddata):
the particleflow shiftsto the left.In contrast,thereis no such
shiftandthe medianremainsbelowthe entrypointof the par= e 9, o*: =e o
(1)
We then use X - A(g*,t*:
ticles in the log-normal board presentedhere (which was
o*) as a mathematicalexpression
in
X
E.
the
isosceles
that
is
distributed
author
L.).Moreover,
triangles
meaning
accordingto the log-normallaw
designedby
stanwith median[t* andmultiplicative
darddeviationo*.
The medianof this log-normaldiso 1.2
=
since [t
tributionis med(X)=
t* e1,
1.5
o
is the medianof log(X).Thus,theprob2.0
ability that the value of X is greater
than
0
lt* is 0.5, as is the probabilitythat
8.\
the value is less than [t*. The parameI,
ter o*, which we call multiplicative
-0
0
standard deviation, determines the
shape of the distribution. Figure 4
showsdensitycurvesfor some selected
valuesof o*. Note that lt* is a scaleparameter;hence,if X is expressedin different units (or multiplied by a constant for other reasons), then [t*
0o
I
I11
changes accordinglybut o* remains
the same.
200
250
100
0
50
150
Distributionsare commonly characterizedby theirexpectedvalue[tand
for
standarddeviationG.In applications
Figure4. Densityfunctions of selectedlog-normaldistributionscomparedwith a
normal distribution.Log-normaldistributionsA(1P*,o*) shownfor five values of
which the log-normaldistributionadmultiplicativestandarddeviation,s*, are comparedwith the normal distribution
equatelydescribesthe data, these pa(100 ? 20, shaded). The o* values covermost of the rangeevident in Table2. While rametersare usuallyless easyto interthe median p* is the samefor all densities, the modes approachzero with increasing pret than the median pg*(McAlister
1879)andthe shapeparameterO*.It is
shape parameterso*. A change in p* affectsthe scaling in horizontaland vertical
worth noting that o* is relatedto the
directions,but the essential shape o* remains the same.
344 BioScience o May 2001 / Vol.51 No. 5
Articles
coefficientof variationby a monotonic,increasingtransformation (see the box below,eq. 2).
Fornormallydistributeddata,the intervalpi? o coversa
probabilityof 68.3%, while pi+ 20 covers95.5% (Table1).
The correspondingstatementsfor log-normalquantitiesare
[p*/o*, p* o*] = p* x/ o* (contains 68.3%) and
(y*)2] = p
[p*/(oy*)2,p.
(contains 95.5%).
x/ (y*)2
This characterizationshowsthat the operationsof multiplying and dividing, which we denote with the sign x/
(times/divide),help to determine useful intervalsfor lognormal distributions(Figure3), in the same way that the
operationsof addingand subtracting(? , or plus/minus)do
for normaldistributions.Table1 summarizesand compares
some propertiesof normaland log-normaldistributions.
The sum of severalindependentnormalvariablesis itself
a normalrandomvariable.Forquantitieswith a log-normal
distribution,however,multiplicationis the relevantoperation
for combiningthem in most applications;for example,the
productof concentrationsdeterminesthe speed of a simple
chemicalreaction.The productof independentlog-normal
quantitiesalsofollowsa log-normaldistribution.Themedian
of thisproductis theproductof themediansof its factors.The
formula for o* of the product is given in the box below
(eq. 3).
For a log-normal distribution, the most precise (i.e.,
asymptoticallymost efficient)method for estimatingthe parametersp*and o* relieson log transformation.The mean
and empiricalstandarddeviationof the logarithmsof the data
are calculatedand then back-transformed,as in equation 1.
These estimators are called x * and s*, where * is the
geometricmean of the data(McAlister1879;eq.4 in the box
below).Morerobustbutlessefficientestimatescanbe obtained
from the median and the quartilesof the data,as described
in the box below.
As noted previously,it is not uncommon for datawith a
log-normaldistributionto be characterizedin the literature
by the arithmeticmean X and the standarddeviations of a
sample,but it is still possible to obtain estimatesfor p*and
Definition and propertiesof the log-normaldistribution
A randomvariableX is log-normally
distributedif log(X)has a normaldistribution.Usually,naturallogarithmsare used, but other bases would lead to the
same familyof distributions,with rescaled parameters.The probabilitydensity functionof such a randomvariablehas the form
I
f (x
1
=
exp(
F_?
-
1
I(log(x)
p) ,
A shift parametercan be includedto define a three-parameterfamily.This may be adequate if the data cannot be smaller than a certain bound different
fromzero (cf. Aitchisonand Brown1957, page 14). The mean and varianceare exp(?i+ 0/2) and (exp(o2) - 1)exp(2tp+o2),respectively,and therefore,
the coefficient of variationis
c = (exp(a2)-
(2)
1)Y
which is a functionin ( only.
The productof two independentlog-normally
distributedrandomvariables has the shape parameter
(3)
or* (X,- X2)= exp([[logo* (X)]2+ [log
* (X2)]212)
since the variances at the log-transformedvariablesadd.
Estimation:The asymptoticallymost efficient (maximumlikelihood)estimators are
(4)
X
=
exp-Ilog(x,)
s* = exp
xi
2
1
n-1x
1
=
log(
)
Thequartilesqi andq2 leadto a morerobustestimate(ql/q2)cforS*,where1/c = 1.349 = 2 D-1(0.75), D-1denotingthe inverse
? available,i.e. the datais summastandardnormaldistribution
function.Ifthe meank andthe standarddeviations of a sampleare
rizedinthe formx s, the parameters
and
S*
can
be
and
estimated
from
them
p*
byusing
2/--1~
witho0=1+ (s/x)2 = 1+ cv2,cv = coefficientof variation.
of S* is determined
respectively,
Thus,this estimate
onlybythe cv (eq. 2).
exp(,log(w)
May 2001 / Vol.51 No. 5 * BioScience 345
Articles
been shown to be log-normally distributed (Sartwell 1950, 1952, 1966,
Kondo 1977); approximately70% of
Normaldistribution
distribution
Log-normal
86 examplesreviewedby Kondo(1977)
or
additive
(Gaussian,
(Multiplicative
normaldistribution)
appear to be log-normal. Sartwell
normal,distribution)
Property
(1950, 1952,1966) documents37 cases
Effects (centrallimittheorem)
Additive
fitting the log-normal distribution.A
Multiplicative
Skewed
Shape of distribution
Symmetrical
particularlyimpressiveone is that of
Models
5914 soldiersinoculatedon the same
Isosceles
Scalene
Triangleshape
x?c
Effects at each decision pointx
x x/ c'
daywith the samebatchof faultyvaccine, 1005 of whom developedserum
Characterization
hepatitis.
Mean
x *, Geometric
x, Arithmetic
Standarddeviation
Interestingly,despite considerable
s, Additive
s*, Multiplicative
Measure of dispersion
cv = s/xs*
differencesin the median X* of laIntervalof confidence
tencyperiodsof variousdiseases(rang68.3%
+s
X/ s*
,
,•
2s
95.5%
ing from 2.3 hours to severalmonths;
, x/ (S*)2
f+
99.7%
x ? 3s
Table2), the majorityof s* valueswere
,* x/ (S*)3
closeto 1.5.It mightbe worthtryingto
Notes: cv = coefficient of variation;x/ = times/divide, corresponding to plus/minus for the
account for the similarities and disestablished sign +.
similaritiesin s*.Forinstance,the small
s* value of 1.24 in the exampleof the
o(* (see the box on page 345). For example,Stehmannand
Scottishsoldiersmaybe due to limitedvariabilitywithinthis
De Waard(1996) describetheirdataas log-normal,with the
ratherhomogeneousgroupof people.Survivaltime afterdiarithmeticmean x and standarddeviation s as 4.1 ? 3.7.
agnosis of four types of canceris, comparedwith latentpethe
nature
of
the
distribution
into
acriodsof infectiousdiseases,much morevariable,with s*valTaking log-normal
ues between2.5 and 3.2 (Boag 1949,Feinleiband McMahon
count, the probabilityof the corresponding ?- s interval
1960).It would be interestingto see whether ? * and s* val(0.4 to 7.8) turns out to be 88.4%instead of 68.3%.Moreues havechangedin accordwith the changesin diagnosisand
over,65%of the populationarebelow the mean and almost
In
of cancerin the lasthalf century.The age of onset
standard
deviation.
treatment
within
one
contrast,
exclusively
only
of Alzheimer'sdisease can be characterizedwith the geothe proposed characterization,which uses the geometric
metricmean "*of 60 yearsand s* of 1.16 (Horner 1987).
mean "*and the multiplicativestandarddeviations*,reads
Table1. A bridgebetween normal and log-normaldistributions.
3.0 x/ 2.2 (1.36to 6.6). Thisintervalcoversapproximately
68% of the data and thus is more appropriatethan the
other intervalfor the skeweddata.
Comparinglog-normaldistributions
acrossthe sciences
fromvariousbranchof log-normaldistributions
Examples
es of sciencerevealinterestingpatterns(Table2). In general, values of s* vary between 1.1 and 33, with most in the
rangeof approximately1.4 to 3. The shapesof such distributions are apparentby comparisonwith selectedinstances
shown in Figure4.
Geology and mining. In the Earth'scrust,the concentrationof elementsandtheirradioactivity
usuallyfollowa lognormaldistribution.In geology,valuesof s* in 27 examples
varied from 1.17 to 5.6 (Razumovsky1940, Ahrens 1954,
Malancaet al. 1996);nine other examplesaregivenin Table
2. A closerlook at extensivedatafrom differentreefs (Krige
1966)indicatesthatvaluesof s*forgoldanduraniumincrease
in concertwith the size of the region considered.
Human medicine. A varietyof examplesfrom medicine
fit the log-normaldistribution.Latentperiods(timefrominfection to first symptoms) of infectiousdiseaseshave often
346 BioScience - May 2001 / Vol.51 No. 5
Environment. The distribution of particles,chemicals,
and organismsin the environmentis often log-normal.For
example,the amounts of rain fallingfrom seeded and unseeded clouds differed significantly (Biondini 1976), and
again s* valueswere similar(seedingitself accountsfor the
greatervariationwith seeded clouds). The parametersfor
in honey (seeFigureib)
the contentof hydroxymethylfurfurol
showthatthe distributionof the chemicalin 1573samplescan
be describedadequatelywith just the two values.Ott (1978)
presenteddataon the PollutantStandardIndex,a measureof
airquality.DatawerecollectedforeightUS cities;the extremes
of ?* and s* werefound in LosAngeles,Houston,and Seattle, allowinginterestingcomparisons.
comAtmosphericsciencesand aerobiology.Another
which
ponentof airqualityis itscontentof microorganisms,
in
variable
less
and
was-not surprisingly-muchhigher
et
island
in
that
of
an
than
theairof Marseille
(DiGiorgio al.
is a majorpartof lifesupportsystems,
1996).Theatmosphere
and manyatmosphericphysicaland chemicalproperties
law.Amongotherexamples
distribution
followa log-normal
of aerosolsandcloudsandparameters
aresizedistributions
of turbulentprocesses.In the contextof turbulence,the
surfaceis formedbyeddies,the
thermalfluxfromtheEarth's
Articles
Table2. Comparinglog-normaldistributionsacross the sciences in terms of the original data. * is an estimatorof the
median of the distribution,usually the geometricmean of the observeddata, and s* estimates the multiplicativestandard
deviation, the shape parameter of the distribution;68% of the data are within the range of x * x s*,and 95%within
* x (s*)2. In general, values of s* and some of x * were obtained by transformationfrom the parametersgiven in the
?
literature(cf. Table3). Thegoodness offit was testedeither by the original authors or by us.
Discipline and type
of measurement
Geology and mining
Concentrationof elements
Example
Ga in diabase
Co in diabase
Cu
Cr in diabase
226Ra
Au: small sections
large sections
U: small sections
large sections
Humanmedicine
Latencyperiods of diseases
Survivaltimes after cancer
diagnosis
Age of onset of a disease
Environment
Rainfall
HMFin honey
Airpollution(PSI)
Aerobiology
Airbornecontaminationby
bacteriaand fungi
Phytomedicine
Fungicidesensitivity,EC50
Banana leaf spot
Powderymildewon barley
Plant physiology
Permeabilityand
solute mobility(rate of
constant desorption)
Ecology
Species abundance
Food technology
Size of unit
(mean diameter)
x*
n
56
57
688
53
52
100
75,000
100
75,000
*
Reference
17 mg kg-1
35 mg - kg-1
0.37%
93 mg kg-1
25.4 Bq - kg-1
(20 inch-dwt.)a
n.a.
(2.5 inch-lb.)a
n.a.
1.17
1.48
2.67
5.60
1.70
1.39
2.42
1.35
2.35
Ahrens1954
Ahrens1954
Razumovsky1940
Ahrens1954
Malancaet al. 1996
Krige1966
Krige1966
Krige1966
Krige1966
Chickenpox
Serum hepatitis
Bacterialfood poisoning
Salmonellosis
Poliomyelitis,8 studies
Amoebicdysentery
Mouthand throatcancer
Leukemiamyelocytic(female)
Leukemialymphocytic(female)
Cervixuteri
Alzheimer
127
1005
144
227
258
215
338
128
125
939
90
14 days
100 days
2.3 hours
2.4 days
12.6 days
21.4 days
9.6 months
15.9 months
17.2 months
14.5 months
60 years
1.14
1.24
1.48
1.47
1.50
2.11
2.50
2.80
3.21
3.02
1.16
Sartwell1950
Sartwell1950
Sartwell1950
Sartwell1950
Sartwell1952
Sartwell1950
Boag 1949
Feinleiband McMahon1960
Feinleiband McMahon1960
Boag 1949
Horner1987
Seeded
Unseeded
Contentof hydroxymethylfurfurol
Los Angeles, CA
Houston,TX
Seattle, WA
26
25
1573
364
363
357
211,600 m3
78,470 m3
5.56 g kg-1
109.9 PSI
49.1 PSI
39.6 PSI
4.90
4.29
2.77
1.50
1.85
1.58
Biondini1976
Biondini1976
Renner1970
Ott 1978
Ott 1978
Ott 1978
630
65
22
30
cfu m-3
cfu m-3
cfu m-3
cfu m-3
1.96
2.30
3.17
2.57
Di Giorgioet
Di Giorgioet
Di Giorgioet
Di Giorgioet
[gg ml-1a.i.
igg ml-1a.i.
ig ml-1a.i.
tg ?- ml-1a.i.
[g - ml-1a.i.
1.85
2.42
>3.58
1.29
1.68
al. 1996
al. 1996
al. 1996
al. 1996
Bacteriain Marseilles
Fungiin Marseilles
Bacteriaon PorquerollesIsland
Fungion PorquerollesIsland
n.a.
n.a.
n.a.
n.a.
Untreatedarea
Treatedarea
Afteradditionaltreatment
Spain (untreatedarea)
England(treated area)
100
100
94
20
21
Citrusaurantium/H20/Leaf
Capsicumannuum/H20/CM
Citrusaurantium/2,4-D/CM
Citrusaurantium/WL110547/CM
Citrusaurantium/2,4-D/CM
Citrusaurantium/2,4-D/CM + acc.1
Citrusaurantium/2,4-D/CM
Citrusaurantium/2,4-D/CM + acc.2
73
149
750
46
16
16
19
19
1.58 10-10 m s-1
26.9 10-10 m s-1
7.41 10-71 s-1
2.6310-71 s-1
n.a.
n.a.
n.a.
n.a.
1.18
1.30
1.40
1.64
1.38
1.17
1.56
1.03
n.a.
n.a.
n.a.
n.a.
15,609
56,131
87,110
12.1 i/sp
-0.4%
2.93%
n.a.
17.5 i/sp
19.5 i/sp
n.a.
5.68
7.39
11.82
33.15
8.66
10.67
25.14
May1981
Magurran1988
Magurran1988
Preston 1962
Preston 1948
Preston 1948
Preston 1948
n.a.
n.a.
n.a.
15 pm
20 pm
10 pm
1.5
2
1.5-2
Limpertet al. 2000b
Limpertet al. 2000b
Limpertet al. 2000b
Diatoms (150 species)
Plants (coverageper species)
Fish (87 species)
Birds(142 species)
Moths in England(223 species)
Moths in Maine (330 species)
Moths in Saskatchewan (277 species)
Crystals in ice cream
Oildrops in mayonnaise
Pores in cocoa press cake
0.0078
0.063
0.27
0.0153
6.85
Romeroand Sutton 1997
Romeroand Sutton 1997
Romeroand Sutton 1997
Limpertand Koller1990
Limpertand Koller1990
Baur1997
Baur1997
Baur1997
Baur1997
Baur1997
Baur1997
Baur1997
Baur1997
May 2001 / Vol.51 No. 5 * BioScience 347
Articles
Table2. (continuedfrom previouspage)
Disciplineand type
of measurement
Linguistics
Lengthof spokenwordsin
phoneconversation
Lengthof sentences
Social sciences
and economics
Ageof marriage
Farmsize in England
and Wales
Income
a
Example
n
*
S
s*
Reference
Differentwords
Totaloccurrenceof words
G. K.Chesterton
G. B. Shaw
738
76,054
600
600
5.05 letters
3.12 letters
23.5 words
24.5 words
1.47
1.65
1.58
1.95
Herdan 1958
Herdan 1958
Williams1940
Williams1940
Womenin Denmark,1970s
1939
1989
Householdsof employeesin
Switzerland,1990
30,200
n.a.
n.a.
1.7x106
(12.4)a 10.7 years
27.2 hectares
37.7 hectares
sFr.6,726
1.69
2.55
2.90
1.54
Preston 1981
Allanson1992
Allanson1992
StatistischesJahrbuchder
Schweiz1997
The shift parameterof a three-parameter
distribution.
log-normal
Notes:n.a. = not available;PSI= PollutantStandardIndex;acc. = accelerator;i/sp = individualsper species; a.i. = active ingredient;and
cfu = colonyformingunits.
size of which is distributed log-normally (Limpert et al.
2000b).
Phytomedicine and microbiology. Examplesfrom
microbiologyand phytomedicineinclude the distribution
of sensitivityto fungicidesin populationsand distributionof
population size. Romero and Sutton (1997) analyzedthe
sensitivityof the banana leaf spot fungus (Mycosphaerella
fijiensis) to the fungicide propiconazole in samples from
untreatedand treatedareasin CostaRica.The differencesin
S* and s* amongthe areascanbe explainedby treatmenthistory.The s* in untreatedareasreflectsmostly environmental conditionsand stabilizingselection.The increasein s* after treatmentreflectsthe widened spectrum of sensitivity,
which resultsfrom the additionalselectioncausedby use of
the chemical.
Similar results were obtained for the barley mildew
pathogen,Blumeria(Erysiphe)graminisf. sp. hordei,and the
fungicide triadimenol (Limpert and Koller 1990) where,
again,s* was higherin the treatedregion.Mildewin Spain,
wheretriadimenolhad not been used, representedthe original sensitivity.In contrast,in Englandthe pathogenwas often treatedandwasconsequentlyhighlyresistant,differingby
a resistancefactorof close to 450 * England/ * Spain).
(a?
Toobtainthe samecontrolof the resistantpopulation,then,
the concentrationof the chemicalwouldhaveto be increased
by this factor.
The abundanceof bacteriaon plantsvariesamong plant
species, type of bacteria,and environment and has been
found to be log-normallydistributed(Hirano et al. 1982,
Loperet al. 1984).In the caseof bacterialpopulationson the
leaves of corn (Zea mays), the median population size
(X*) increasedfromJulyandAugustto October,but the relativevariabilityexpressed(s*) remainednearlyconstant(Hi348 BioScience * May 2001 / Vol.51 No. 5
rano et al. 1982).Interestingly,whereass* for the total number of bacteriavariedlittle(from 1.26to 2.0), thatforthe sub(from3.75
groupof icenucleationbacteriavariedconsiderably
to 8.04).
evidencewasprePlant physiology. Recently,
convincing
sentedfromplantphysiologyindicatingthatthe log-normal
distributionfitswellto permeabilityandto solutemobilityin
plantcuticles(Baur1997).Forthe numberof combinations
of species, plant parts, and chemical compounds studied,
the median s* for waterpermeabilityof leaveswas 1.18.The
correspondings* of isolatedcuticles,1.30,appearsto be considerablyhigher,presumablybecauseof the preparationof cuhigherformobilityof the herticles.Again,s*wasconsiderably
bicidesDichlorophenoxyaceticacid (2,4-D) and WL110547
1,2,3,4-tetra(1-(3-fluoromethylphenyl)-5-U-14C-phenoxyin
for
s*
waterand
for
differences
the
One
zole).
explanation
for the other chemicals may be extrapolatedfrom results
fromfood technology,where,for transportthroughfilters,s*
is smallerfor simple (e.g., spherical)particlesthan for more
complex particlessuch as rods (E. J. Windhab [Eidgenossische TechnischeHochschule, Zurich, Switzerland],personal communication,2000).
Chemicalscalledacceleratorscan reducethe variabilityof
mobility.Forthe combinationof Citrusaurantiumcuticlesand
2,4-D,diethyladipate
(accelerator1) causeds*to fallfrom 1.38
to 1.17. For the same combination,tributylphosphate(accelerator2) causedan evengreaterdecrease,from 1.56to 1.03.
Statisticalreasoningsuggeststhatthesedata,with s*valuesof
1.17and 1.03,arenormallydistributed(Baur1997).However,
because the underlyingprinciples of permeabilityremain
the same,we thinkthese casesrepresentlog-normaldistributions.Thus,consideringonly statisticalreasonsmayleadto
misclassification,which may handicapfurtheranalysis.One
Articles
Table3. Establishedmethodsfor describinglog-normaldistributions.
Characterization
Disadvantages
Graphicalmethods
Density plots, histograms,box plots
Spacey, difficultto describe and compare
Indicationof parameters
Logarithmof X, mean, median,
standarddeviation,variance
Skewness and curtosis of X
Unclearwhich base of the logarithmshould be
chosen; parametersare not on the scale of the
originaldata and values are difficultto interpret
and use for mental calculations
Difficultto estimate and interpret
questionremains:Whatarethe underlyingprinciplesof permeabilitythat causelog-normalvariability?
Ecology. In the majorityof plantand animalcommunities,
the abundanceof speciesfollowsa (truncated)log-normaldistribution(Sugihara1980,Magurran1988).Interestingly,the
rangeof s* for birds,fish,moths, plants,or diatomswas very
closeto thatfoundwithinone groupor another.Basedon the
dataand conclusionsof Preston(1948), we determinedthe
most typicalvalue of s* to be 11.6 .1
Food technology. Variousapplications
of thelog-normal
distributionarerelatedto the characterizationof structures
in food technologyand food processengineering.Such dispersestructuresmay be the size and frequencyof particles,
droplets, and bubbles that are generated in dispersing
processes,or they may be the pores in filteringmembranes.
The latteraretypicallyformedby particlesthat arealso lognormallydistributedin diameter.Suchparticlescan also be
generated in dry or wet milling processes, in which lognormal distributionis a powerfulapproximation.The examplesof icecreamandmayonnaisegivenin Table2 alsopoint
to the presenceof log-normaldistributionsin everydaylife.
Linguistics. In linguistics,the numberof lettersper word
andthe numberof wordsper sentencefit the log-normaldistribution.In Englishtelephoneconversations,the variability
s* of the length of all words used-as well as of different
words-was similar(Herdan1958).Likewise,the numberof
words per sentencevaried little between writers (Williams
1940).
Social sciences and economics. Examplesof lognormaldistributionsin the socialsciencesand economicsincludeage of marriage,farmsize,and income.The age of first
marriagein Westerncivilizationfollows a three-parameter
log-normaldistribution;the thirdparametercorrespondsto
age at puberty(Preston1981).Forfarmsize in Englandand
'Speciesabundancemaybe describedby a log-normallaw (Preston1948),
usuallywrittenin the form S(R) = So. exp(-a2R2),where Sois the number
of speciesat the mode of the distribution.The shapeparametera amounts
to approximately0.2 for all species,which correspondsto s* = 11.6.
Wales, both * and s* increased over 50
years,the formerby 38.6%(Allanson1992).
Forincome distributions,X * and s* mayfacilitate comparisons among societies and
generations (Aitchison and Brown 1957,
Limpertet al. 2000a).
Typicals* values
One questionarisesfromthe comparisonof
log-normaldistributionsacrossthe sciences:
Towhatextentares* valuestypicalfor a certain attribute?In some cases, values of s*
appearto be fairlyrestricted,asis the casefor
the range of s* for latent periods of diseases-a fact that
Sartwellrecognized(1950, 1952, 1966) and Lawrencereemphasized(1988a).Describingpatternsof typicalskewnessat
the establishedlog level,Lawrence(1988a,1988b)can be regardedas the directpredecessorof our comparisonof s*values acrossthe sciences.Aitchisonand Brown (1957), using
plotsand Lorenz
graphicalmethodssuchas quantile-quantile
curves,demonstratedthatlog-normaldistributionsdescribing, forexample,nationalincomeacrosscountries,or income
for groups of occupations within a country, show typical
shapes.
A restrictedrangeof variationfor a specificresearchquestion makes sense. For infectious diseases of humans, for
example,the infectionprocessesof the pathogensaresimilar,
as is the geneticvariabilityof the humanpopulation.Thesame
appearsto hold for survivaltime afterdiagnosisof cancer,although the valueof s* is higher;this can be attributedto the
additionalvariationcausedby cancerrecognitionand treatment.Otherexampleswithtypicalrangesof s*come fromlinguistics.Forbacteriaon plantsurfaces,the rangeof variation
of total bacteriais smallerthan that of a group of bacteria,
which can be expected because of competition. Thus, the
rangesof variationin these cases appearto be typical and
meaningful,a factthat might well stimulatefutureresearch.
Future challenges
A numberof scientificareas-and everydaylife-will present
opportunities for new comparisons and for more farreachinganalysesof s*valuesfor the applicationsconsidered
to date.Moreover,our concept can be extendedalso to descriptions based on sigmoid curves,such as, for example,
dose-responserelationships.
andmoFurthercomparisons ofs* values. Permeability
bility are important not only for plant physiology (Baur
1997)but also for manyotherfieldssuch as soil sciencesand
humanmedicine,aswellasforsomeindustrialprocesses.With
the help of -* and s*, the mobility of differentchemicals
througha varietyof naturalmembranescould easilybe assessed,allowingfor comparisonsof the membraneswith one
anotheras well as with those for filteractionsof soils or with
technicalmembranesand filters.Suchcomparisonswill undoubtedlyyield valuableinsights.
May 2001 / Vol.51 No. 5 * BioScience 349
Articles
of
description
Farther-reaching analyses. Anadequate
variabilityis a prerequisitefor studyingits patternsand estimatingvariancecomponents.One componentthat deserves
more attentionis the variabilityarisingfrom unknown reasons and chance,commonlycallederrorvariation,or in this
case,s*E. Suchvariabilitycanbe estimatedif otherconditions
accountingforvariability-the environmentandgenetics,for
example-are kept constant.The field of populationgenetics andfungicidesensitivity,aswellasthatof permeabilityand
mobility,candemonstratethe benefitsof analysesof variance.
An importantparameterof populationgeneticsis migration. Migrationamong regionsleadsto populationmixing,
thuswideningthe spectrumof fungicidesensitivityencounteredin any one of the regionsmentionedin the discussion
of phytomedicine and microbiology. Population mixing
amongregionswill increases* but decreasethe differencein
X *. Migrationof sporesappearsto be greater,for instance,
in regionsof CostaRicathanin those of Europe(Limpertet
al. 1996,Romeroand Sutton 1997,Limpert1999).
Another important aim of pesticide researchis to estimate resistancerisk.As a firstapproximation,resistancerisk
is assumedto correlatewith s*. However,s* dependson geneticandothercausesof variability.
Thus,determiningits genetic part,s*G,is a taskworth undertaking,becausegenetic
variationhas a majorimpact on futureevolution (Limpert
1999). Severalaspects of various branches of science are
expected to benefit from improved identificationof components of s*. In the study of plant physiologyand permeabilitynoted above (Baur 1997), for example,determining
the effectsof acceleratorsand theirshareof variabilitywould
be elucidative.
Sigmoidcurvesbasedon log-normaldistributions.
Dose-response relations are essential for understanding
the controlof pests and pathogens(Horsfall 1956).Equally
important are dose-response curves that demonstratethe
effects of other chemicals,such as hormones or minerals.
Typically,such curvesaresigmoid and show the cumulative
action of the chemical.If plotted against the logarithm of
the chemical dose, the sigmoid is symmetricaland corresponds to the cumulativecurve of the log-normal distribution at logarithmic scale (Figure 3b). The steepness of
the sigmoid curve is inverselyproportionalto s*, and the
geometric mean value * equals the "ED,," the chemical
dose creating50% of the maximal effect. Consideringthe
generalimportance of chemical sensitivity,a wide field of
furtherapplicationsopens up in which progresscan be expected and in which researchersmay find the proposed
characterization * X/s* advantageous.
Normal or log-normal?
Consideringthe patternsof normaland log-normaldistributionsfurther,as well as the connectionsand distinctionsbetween them (Tables1, 3), is useful for describingand explainingphenomenarelatingto frequencydistributionsin life.
Some importantaspectsarediscussedbelow.
350 BioScience * May 2001 / Vol.51 No. 5
The range of log-normalvariability.Howfarcans*
values extend beyond the range described,from 1.1 to 33?
Towardthehighendof thescaleof possibles*values,we found
one s* largerthan 150for hailenergyof clouds (Federeret al.
1986,calculationsbyW.A. S). Valuesbelow 1.2 may evenbe
common, and thereforeof great interest in science. However, such log-normal distributionsare difficult to distinguish from normal ones-see Figures 1 and 3-and thus
until now haveusuallybeen takento be normal.
Becauseof the generalpreferencefor the normal distribution, we wereaskedto find examplesof datathat followed
a normal distributionbut did not match a log-normal distribution.Interestingly,originalmeasurementsdid not yield
any such examples.As noted earlier,even the classicexample of the heightof women (Figurela; Snedecorand Cochran
1989)fitsboth distributionsequallywell.Thedistributioncan
be characterizedwith 62.54 inches ? 2.38 and 62.48 inches
X/ 1.039, respectively.The examplesthat we found of normally-?but not log-normally-distributed data consisted
of differences,sums, means, or other functions of original
measurements.Thesefindingsraisequestionsaboutthe role
of symmetryin quantitativevariationin nature.
Why the normal distribution is so popular. Re-
therearea numberof reagardlessof statisticalconsiderations,
sons why the normaldistributionis muchbetterknownthan
the log-normal.A majorone appearsto be symmetry,one of
the basicprinciplesrealizedin natureas wellas in our culture
and thinking.Thus,probabilitydistributionsbasedon symmetry may have more inherent appeal than skewed ones.
Twootherreasonsrelateto simplicity.First,asAitchisonand
Brown (1957, p. 2) stated,"Manhas found additionan easier operationthan multiplication,and so it is not surprising
that an additivelaw of errorswas the firstto be formulated."
Second,the established,concisedescriptionof a normalsample-fx + s-is handy,well-known,and sufficientto represent
the underlyingdistribution,which madeit easier,until now,
to handlenormaldistributionsthanto workwith log-normal
distributions.Another reason relatesto the history of the
distributions:The normal distributionhas been known and
applied more than twice as long as its log-normal sister
distribution.Finally,the very notion of "normal"conjures
more positiveassociationsfornonstatisticiansthandoes"lognormal."For all of these reasons,the normal or Gaussian
distributionis farmore familiarthan the log-normaldistribution is to most people.
This preferenceleads to two practicalways to make data
look normaleven if they areskewed.First,skeweddistributions producelargevaluesthat may appearto be outliers.It
is common practiceto rejectsuch observationsand conduct
the analysiswithout them, therebyreducingthe skewness
but introducingbias.Second,skeweddataareoften grouped
together,and their means-which are more normallydistributed-are usedfor furtheranalyses.Of course,following
that proceduremeans that important featuresof the data
may remainundiscovered.
Articles
Whythe log-normaldistributionis usuallythe
the
bettermodelfor originaldata. Asdiscussed
above,
connectionbetweenadditiveeffectsand the normaldistribution parallelsthatof multiplicativeeffectsand the log-normal
distribution.Kapteyn(1903) noted long agothatif datafrom
one-dimensionalmeasurementsin naturefit the normaldistribution,two- andthree-dimensionalresultssuchas surfaces
and volumescannotbe symmetric.A numberof effectsthat
point to the log-normaldistributionas an appropriatemodel
have been describedin various papers (e.g., Aitchison and
Brown1957,Koch 1966,1969,Crowand Shimizu 1988).Interestingly,evenin biologicalsystematics,whichis the science
the numberof, say,speciesperfamilywasexof classification,
to
fit
pected
log-normality(Koch 1966).
The most basic indicatorof the importanceof the lognormal distribution may be even more general,however.
Clearly,chemistryandphysicsarefundamentalin life,andthe
prevailingoperationin the lawsof these disciplinesis multiplication.In chemistry,for instance,the velocityof a simple
reactiondependson the productof the concentrationsof the
moleculesinvolved.Equilibriumconditionslikewisearegovernedby factorsthat act in a multiplicativeway.Fromthis,a
majorcontrastbecomesobvious:The reasonsgoverningfrequencydistributionsin natureusuallyfavorthe log-normal,
whereaspeople arein favorof the normal.
Forsmallcoefficientsof variation,normalandlog-normal
distributionsboth fit well. For these cases, it is naturalto
choosethe distributionfoundappropriate
forrelatedcasesexincreased
which
hibiting
correspondsto the law
variability,
the
reasons
of
will most often be
This
governing
variability.
the log-normal.
Conclusion
This articleshows,in a nutshell,the fundamentalrole of the
log-normaldistributionand providesinsightsfor gaininga
deepercomprehensionof thatrole.Comparedwithestablished
methodsfordescribinglog-normaldistributions(Table3), the
proposed characterizationby x * and s* offers severaladvantages,some of whichhavebeen describedbefore(Sartwell
1950,Ahrens 1954,Limpert1993).Both "* and s* describe
the datadirectlyat theiroriginalscale,theyareeasyto calculate
and imagine,and they even allowmentalcalculationand estimation.The proposedcharacterizationdoes not appearto
haveany majordisadvantage.
On the first page of their book, Aitchison and Brown
(1957) statedthat,comparedwith its sisterdistributions,the
normaland the binomial,the log-normaldistribution"has
remainedthe Cinderellaof distributions,the interestof writersin the learnedjournalsbeing curiouslysporadicand that
of the authorsof statisticaltextbooksbut faintlyaroused."
This
is indeedtrue:Despiteabundant,increasingevidencethatlognormaldistributionsarewidespreadin the physical,biological,andsocialsciences,andin economics,log-normalknowledge has remaineddispersed.The questionnow is this:Can
we begin to bringthe wealthof knowledgewe haveon normal and log-normaldistributionsto the public?We feel that
doing so would lead to a general preferencefor the lognormal,distributionoverthe Gaussnormal,or multiplicative
ian distributionwhen describingoriginaldata.
Acknowledgments
This work was supportedby a grantfrom the SwissFederal
Instituteof Technology(Zurich)andby COST(Coordination
of Scienceand Technologyin Europe)at Brusselsand Bern.
PatrickFliitsch,Swiss FederalInstituteof Technology,constructedthe physicalmodels.Wearegratefulto RoySnaydon,
professoremeritusat the Universityof Reading,UnitedKingdom, to RebeccaChasan,and to four anonymousreviewers
for valuablecommentson the manuscript.WethankDonna
Verdierand HermanMarshallforgettingthe paperinto good
shapefor publication,and E. L.also thanksGerhardWenzel,
professorof agronomyand plantbreedingat TechnicalUniversity,Munich,for helpfuldiscussions.
Becauseof his fundamentaland comprehensivecontribution to the understandingof skeweddistributionscloseto
100yearsago,ourpaperis dedicatedto the Dutchastronomer
JacobusCorneliusKapteyn(van der Heijden2000).
References cited
AhrensLH.1954.Thelog-normaldistributionof theelements(A fundamental
law of geochemistryand its subsidiary).Geochimicaet Cosmochimica
Acta5: 49-73.
AitchisonJ,BrownJAC.1957.TheLog-normalDistribution.Cambridge(UK):
Press.
Cambridge
University
AllansonP. 1992.Farmsize structurein EnglandandWales,1939-89. Journal of AgriculturalEconomics43: 137-148.
andorganicsolute
BaurP.1997.Log-normaldistributionof waterpermeability
mobilityin plant cuticles.Plant,Cell and Environment20: 167-177.
MeofApplied
statistics.
R.1976.Cloudmotionandrainfall
Biondini
Journal
teorology15:205-224.
BoagJW.1949.Maximumlikelihoodestimatesof the proportionof patients
curedby cancertherapy.Journalof the RoyalStatisticalSociety B, 11:
15-53.
CrowEL,ShimizuK,eds. 1988.Log-normalDistributions:TheoryandApplication.New York:Dekker.
Di GiorgioC, KrempffA, GuiraudH, BinderP,TiretC, Dumenil G. 1996.
Atmosphericpollutionby airbornemicroorganismsin the Cityof Marseilles.AtmosphericEnvironment30: 155-160.
FactorVM, LaskowskaD, JensenMR,WoitachJT,PopescuNC, Thorgeirsson SS.2000.VitaminE reduceschromosomaldamageand inhibitshepatictumor formationin a transgenicmouse model. Proceedingsof the
NationalAcademyof Sciences97: 2196-2201.
Fagerstr6mT, JagersP, SchusterP, SzathmaryE. 1996. Biologistsput on
mathematicalglasses.Science274: 2039-2041.
FechnerGT. 1860. Elemente der Psychophysik.Leipzig (Germany):Breitkopf und Hirtel.
. 1897.Kollektivmasslehre.
Leipzig(Germany):Engelmann.
•FedererB, et al. 1986.Main resultsof GrossversuchIV.Journalof Climate
and AppliedMeteorology25: 917-957.
FeinleibM, McMahonB. 1960.Variationin the durationof survivalof patients with the chronicleukemias.Blood 15-16: 332-349.
GaddumJH. 1945.Lognormaldistributions.Nature 156:463, 747.
GaltonE 1879.The geometricmean, in vital and social statistics.Proceedings of the RoyalSociety29: 365-367.
.1889. NaturalInheritance.London:Macmillan.
GibratR. 1931.LesInegalitisEconomiques.Paris:RecueilSirey.
GrothBHA. 1914.The golden mean in the inheritanceof size. Science39:
581-584.
May 2001 / Vol.51 No. 5 * BioScience 351
Articles
GutC,Limpert
H.2000.Acomputer
simulation
ontheweb
E,Hinterberger
to visualizethe genesisof normaland log-normaldistributions.
http://stat.ethz.ch/vis/log-normal
HerdanG. 1958.Therelationbetweenthedictionary
distribution
andthe
occurrence
distribution
of wordlengthanditsimportance
forthestudy
of quantitative
Biometrika
45:222-228.
linguistics.
HiranoSS,NordheimEV,ArnyDC,UpperCD.1982.Log-normal
distributionof epiphytic
bacterial
onleafsurfaces.
andEnpopulations
Applied
vironmental
44:695-700.
Microbiology
HornerRD.1987.AgeatonsetofAlzheimer's
disease:
Clueto therelative
imAmerican
126:
Journalof Epidemiology
portanceof etiologicfactors?
409-414.
Horsfall
of fungicidal
actions.Chronica
30.
Botanica
JG.1956.Principle
N. 1994.Continuous
Univariate
DistribuJohnsonNL,KotzS,Balkrishan
tions.NewYork:
Wiley.
Curvesin BiologyandStatistics.
AstroJC.1903.SkewFrequency
Kapteyn
nomicalLaboratory,
Noordhoff.
(TheNetherlands):
Groningen
E2000.Crypticconsumers
andtheecologyof anAfricanSavanna.
Keesing
BioScience
50:205-215.
KochAL.1966.Thelogarithmin biology.I. Mechanisms
the
generating
log-normaldistributionexactly.Journalof TheoreticalBiology23:
276-290.
. 1969.Thelogarithm
inbiology.II.Distributions
thelogsimulating
normal.Journal
of Theoretical
Biology23:251-268.
KondoK.1977.Thelog-normal
distribution
of theincubation
timeof exof HumanGenetics21:217-237.
Journal
ogenousdiseases.
Japanese
KrigeDG.1966.A studyof goldanduraniumdistribution
patternsin the
GoldField.Geoexploration
4:43-53.
Klerksdorp
Lawrence
RJ.1988a.The log-normalas event-timedistribution.Pages
211-228in CrowEL,ShimizuK,eds.Log-normal
Distributions:
TheNewYork:
Dekker.
oryandApplication.
in economicsandbusiness.Pages229-266in
-.1988b.Applications
CrowEL,ShimizuK,eds.Log-normal
Distributions:
TheoryandApNewYork:
Dekker.
plication.
LeeET.1992.Statistical
MethodsforSurvival
DataAnalysis.
NewYork:
Wiley.
LeNaourF,Rubinstein
E,JasminC,Prenant
M,BoucheixC.2000.Severely
reducedfemalefertilityin CD9-deficient
mice.Science287:319-321.
E.1993.Log-normal
distributions
inphytomedicine:
Ahandyway
Limpert
fortheircharacterization
andapplication.
of the6thInterProceedings
nationalCongress
of PlantPathology;
28July-6August,1993;Montreal,
NationalResearch
CouncilCanada.
. 1999.Fungicide
Towards
of gesensitivity:
improved
understanding
neticvariability.
andAntifungal
Pages188-193in ModernFungicides
II.Andover(UK):Intercept.
Compounds
of theBarleyMildewPathogen
to TriE,KollerB.1990.Sensitivity
Limpert
adimenolin Selected
Areas.Zurich(Switzerland):
Institute
of
European
PlantSciences.
Controlof Cereal
LimpertE,FinckhMR,WolfeMS,eds.1996.Integrated
MildewsandRusts:Towards
Coordination
of Research
AcrossEurope.
Brussels
EUR16884EN.
Commission.
(Belgium):
European
E,FuchsJG,StahelWA,2000a.Lifeis lognormal:Onthe charms
Limpert
of statistics
forsociety.Pages518-522in HaberliR,ScholzRW,BillA,
disciplinarity:Joint Problem-Solvingamong Science,Technologyand
Society.Zurich(Switzerland):Haffmans.
LoperJE,SuslowTV,SchrothMN. 1984.Log-normaldistributionof bacterialpopulationsin the rhizosphere.Phytopathology74: 1454-1460.
MagurranAE. 1988. EcologicalDiversityand its Measurement.London:
Croom Helm.
MalancaA, GaidolfiL, PessinaV, DallaraG. 1996.Distributionof 226-Ra,
232-Th,and40-Kin soilsof Rio Grandedo Norte (Brazil).Journalof EnvironmentalRadioactivity30: 55-67.
MayRM.1981.Patternsin multi-speciescommunities.Pages197-227in May
RM, ed. Theoretical Ecology: Principles and Applications. Oxford:
Blackwell.
McAlisterD. 1879.The lawof the geometricmean.Proceedingsof the Royal
Society29: 367-376.
Ott WR. 1978.EnvironmentalIndices.AnnArbor(MI):AnnArborScience.
PowersL. 1936.The natureof the interactionof genesaffectingfour quantitativecharactersin a crossbetweenHordeumdeficiensand H. vulgare.
Genetics21: 398-420.
Preston FW. 1948. The commonness and rarity of species. Ecology 29:
254-283.
. 1962.The canonicaldistributionof commonnessand rarity.Ecol-
Welti M, eds. Transdisciplinarity:
Joint Problem-Solvingamong Science, Technologyand Society.Zurich(Switzerland):Haffmans.
LimpertE,AbbtM,AsperR, GraberWK, GodetF,StahelWA,WindhabEJ,
2000b.Lifeis log normal:Keysand cluesto understandpatternsof multiplicativeinteractionsfrom the disciplinaryto the transdisciplinary
level.Pages20-24 in HaberliR, ScholzRW,BillA, WeltiM, eds. Trans-
van der Heijden P 2000. Jacob Cornelius Kapteyn (1851-1922): A Short Biography. (16 May 2001; www.strw.leidenuniv.nl/-heijden/kapteynbio.html)
Weber H. 1834. De pulsa resorptione auditu et tactu. Annotationes anatomicae et physiologicae. Leipzig (Germany): Koehler.
Williams CB. 1940. A note on the statistical analysis of sentence length as a
criterion of literary style. Biometrika 31: 356-361.
352 BioScience * May 2001 / Vol. 51 No. 5
ogy 43: 185-215, 410-432.
. 1981.Pseudo-log-normaldistributions.Ecology62: 355-364.
RazumovskyNK. 1940.Distributionof metalvaluesin oredeposits.Comptes
Rendus(Doklady)de l'Academiedes Sciencesde l'URSS9: 814-816.
Renner E. 1970. Mathematisch-statistische
Methoden in der praktischen
Anwendung.Hamburg(Germany):Parey.
Rhew CR,MillerRB,WeissRE2000. Naturalmethylbromideand methyle
chlorideemissionsfrom coastalsalt marshes.Nature403:292-295.
RomeroRA,SuttonTB. 1997.Sensitivityof Mycosphaerellafijiensis,causal
agentof blacksigatokaof banana,to propiconozole.Phytopathology87:
96-100.
SachsL.1997.AngewandteStatistik.
Methoden.HeiAnwendungstatistischer
delberg(Germany):Springer.
SartwellPE. 1950.The distributionof incubationperiodsof infectiousdisease.AmericanJournalof Hygiene51: 310-318.
. 1952The incubationperiod of poliomyelitis.AmericanJournalof
PublicHealthand the Nation'sHealth42: 1403-1408.
. 1966.The incubationperiodandthe dynamicsof infectiousdisease.
AmericanJournalof Epidemiology83:204-216.
SinnotEW.1937.The relationof geneto characterin quantitative
inheritance.
Proceedingsof the NationalAcademyof Sciences23: 224-227.
SnedecorGW,CochranWG.1989.StatisticalMethods.Ames(IA):IowaUniversityPress.
StatistischesJahrbuchder Schweiz.1997:Ziurich(Switzerland):
VerlagNeue
ZiuricherZeitung.
StehmannC,DeWaardMA.1996.Sensitivityof populationsof Botrytiscinerea
to triazoles,benomylandvinclozolin.EuropeanJournalof PlantPathology 102:171-180.
SugiharaG. 1980.Minimalcommunitystructure:An explanationof species
abunduncepatterns.AmericanNaturalist116:770-786.
SwobodaH. 1974.KnaursBuchderModernenStatistik.
Mtinchen(Germany):
DroemerKnaur.

Similar documents