The Influence of Model Averaging on Clade Posteriors: An Example

Syst. Biol. 57(6):905–919, 2008
Copyright © Society of Systematic Biologists
ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150802562392
The Influence of Model Averaging on Clade Posteriors: An Example Using
the Triggerfishes (Family Balistidae)
ALEX DORNBURG,1 FRANCESCO SANTINI,2 AND MICHAEL E. ALFARO2
1 School of Biological Sciences, Washington State University, Pullman, Washington 99164, USA; E-mail: [email protected]
2 Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
Abstract.—Although substantial uncertainty typically surrounds the choice of the best model in most phylogenetic analyses,
little is known about how accommodating this uncertainty affects phylogenetic inference. Here we explore the influence of
Bayesian model averaging on the phylogenetic inference of the triggerfishes (Family: Balistidae), a charismatic group of reef
fishes. We focus on clade support as this area has received little attention and is typically one of the most important outcomes
of phylogenetic studies. We present a novel phylogenetic hypothesis for the family Balistidae based on an analysis of two
mitochondrial (12S, 16S) and three nuclear genes (TMO-4C4, Rhodopsin, RAG1) sampled from 26 ingroup species. Despite
the presence of substantial model uncertainty in almost all partitions of our data, we found model-averaged topologies
and clade posteriors to be nearly identical to those conditioned on a single model. Furthermore, statistical comparison of
clade posteriors using the Wilcoxon signed-rank test revealed no significant differences. Our results suggest that although
current model-selection approaches are likely to lead to overparameterization of the substitution model, the consequences
of conditioning on this overparameterized model are likely to be mild. Our phylogenetic results strongly support the
monophyly of the triggerfishes but suggest that the genera Balistoides and Pseudobalistes are polyphyletic. Divergence time
estimation supports a Miocene origin of the crown group. Despite the presence of several young species-rich subclades,
statistical analysis of temporal diversification patterns reveals no significant increase in the rate of cladogenesis across
geologic time intervals. [Balistidae; Bayesian methods; diversification rates; macroevolution; model averaging; molecular
clock; Tetraodontiformes.]
The importance of model choice to phylogenetic
inference is now widely recognized (reviewed in Sullivan and Joyce, 2005) and model selection procedures are commonly performed as a part of nearly all
phylogenetic analyses. One limitation of most current
approaches is that the pool of candidate models is small
relative to the universe of reasonable models. For example, ModelTest (Posada and Crandall, 1998) currently
considers only eight models with respect to substitution
class, representing less than 4% of the possible 203 time-reversible models (Huelsenbeck et al., 2004). When the
candidate pool is expanded to include the entire universe
of substitution models, “unnamed” models frequently
fit the data better than the best-fit model identified by
ModelTest (Posada and Crandall, 1998) and the most
parameter-rich substitution model, GTR, rarely emerges
as the best model, even for large multigene data sets
(Huelsenbeck et al., 2004). If this result is generally true,
it implies that most phylogenetic analyses rely on overparameterized models with respect to substitution type.
Indeed, Kelchner and Thomas (2006) found that over 60%
of the publications they surveyed used a variant of GTR.
The effects of this overparameterization have not been
studied but theoretically should lead to an increase in
the error variance surrounding each parameter (Burnham and Anderson, 2003). This tradeoff could be particularly troublesome for phylogenetic inference as the
topology itself is a parameter being inferred.
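The figure of 203 time-reversible models quoted above can be checked by enumeration. The sketch below assumes that each model corresponds to one way of grouping the six substitution rates into classes of equal rate, written as a restricted growth string such as (1, 1, 1, 1, 1, 1) for a single rate class or (1, 2, 3, 4, 5, 6) for GTR. The function name is hypothetical and the code is illustrative only; it is not taken from ModelTest or from Huelsenbeck et al. (2004).

```python
def restricted_growth_strings(n):
    """Yield every partition of n rates into classes as a restricted growth string."""
    def extend(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # The next rate may join any existing class or open exactly one new class.
        for label in range(1, max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))
    yield from extend([], 0)


models = list(restricted_growth_strings(6))
n_models = len(models)              # number of ways to group six rates
fraction_considered = 8 / n_models  # eight substitution classes out of all of them

print(n_models)
print(f"{fraction_considered:.2%}")
```

Under this assumption the count equals the Bell number for six items, and eight candidate classes cover just under 4% of the pool, in line with the text.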
A second, related concern for most commonly used
model selection procedures is model uncertainty. Although substantial uncertainty typically surrounds the
choice of the best model in most phylogenetic studies
(Huelsenbeck et al., 2004; Alfaro and Huelsenbeck, 2006),
the influence of this uncertainty on phylogenetic inference is not well understood. Theoretically, conditioning inference on a model that is only marginally better
than other candidate models should lead to an underestimation of parameter variance and thus overconfidence
(e.g., Hoeting et al., 1999; Buckley et al., 2002; Alfaro and
Huelsenbeck, 2006). Empirically, studies suggest that accommodation of model uncertainty has only a minor effect on tree topology (e.g., Posada and Crandall, 2001;
Beier et al., 2004; Nylander, 2004; Posada and Buckley,
2004). However, model uncertainty may have an especially pronounced effect on clade posterior probabilities
as these have been shown to be particularly sensitive
to model violation and underparameterization (Felsenstein, 1978; Huelsenbeck and Hillis, 1993; Sullivan and
Swofford, 1997).
Bayesian model averaging is a computationally
tractable alternative to the current practice of conditioning analyses on a single phylogenetic model (Huelsenbeck et al., 2004; Posada and Buckley, 2004; Pagel and
Meade, 2006; Posada, 2008). Using this method, each
model contributes to the phylogenetic inference in proportion to its posterior probability (Huelsenbeck et al.,
2004; Nylander, 2004). Despite the potential promise of
this approach, it has been implemented in only a small
number of studies (e.g., Beier et al., 2004; Huelsenbeck
et al., 2004; Nylander, 2004; Lee and Hugall, 2006; Alfaro
and Huelsenbeck, 2006). Here we use Bayesian model averaging and model averaging based on Akaike weights
(Akaike, 1973) to explore the influence of model uncertainty on the phylogenetic inference of triggerfishes, a
charismatic group of reef fishes.
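The statement that each model contributes in proportion to its posterior probability can be written as a weighted sum. The sketch below is a minimal illustration with hypothetical model names and numbers. In a reversible-jump analysis this averaging happens implicitly as the chain moves among models, so the explicit sum is shown only to make the idea concrete.

```python
def model_averaged_posterior(clade_given_model, model_posterior):
    """Return sum over models of P(model | data) * P(clade | model, data)."""
    total = sum(model_posterior.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("model posterior probabilities must sum to 1")
    return sum(model_posterior[m] * clade_given_model[m] for m in model_posterior)


# Hypothetical values for a single clade under three candidate models.
model_posterior = {"GTR+G": 0.6, "HKY+G": 0.3, "K80": 0.1}
clade_given_model = {"GTR+G": 0.95, "HKY+G": 0.90, "K80": 0.70}

averaged = model_averaged_posterior(clade_given_model, model_posterior)
print(round(averaged, 4))
```

When one model holds nearly all of the posterior mass, the averaged value approaches the single-model value, which is one way in which model averaging can have little effect on clade support.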
Triggerfish Phylogenetics
Members of the family Balistidae (order Tetraodontiformes) are mostly tropical in distribution (Kuiter and
Debelius, 2006) and are some of the most conspicuous
members of the diurnal reef community. The 42 species
in this clade exhibit a high degree of ecological diversity.
Owing to their small and powerful jaws, as well as a
novel feeding repertoire that includes buccal manipulation (Wainwright and Friel, 2000), triggerfish are able to
exploit a wide variety of invertebrate prey. Indeed, several balistids have been recognized as “keystone species”
that control populations of Acanthaster planci, a sea star
known to cause severe damage to coral reefs at high population densities (Ormond et al., 1973; Chen et al., 2001).
Balistid intrafamilial relationships are poorly understood despite two previous phylogenetic studies based
on morphological characters. Matsuura (1979), in his
analysis of the osteology of extant balistoids, suggested
that Canthidermis represents the sister group to all remaining triggerfishes due to its possession of “primitive” scale bones. He divided the remaining balistids into
three lineages: (1) Abalistes; (2) a Rhinecanthus + Sufflamen clade supported by a modified interhyal bone; and
(3) an unresolved clade formed by all remaining genera. Tyler (1980) performed an evolutionary taxonomic
investigation of the osteology and external features of
both extant and fossil balistids and hypothesized that the
Oligocene genera Balistomorphus and Oligobalistes represented ancestors of the modern crown group. Given
the absence of a preopercular groove, Tyler (1980) further hypothesized Rhinecanthus and Balistapus to be the
“most generalized” and hence most primitive extant triggerfish. Odonus and Xanthichthys were considered to be
the “most derived” taxa, united by the presence of fanglike second medial teeth and by a strongly supraterminal
mouth. The presence of a slightly supraterminal mouth
identified Melichthys as the sister group to these previous two genera. The relationships of the remaining
genera were not resolved (Tyler, 1980: fig. 66). Both of
these studies assumed all balistid genera to be monophyletic though this hypothesis has never been explicitly tested. Subsequent morphological investigations of
tetraodontiform relationships (Winterbottom, 1974; Leis,
1984; Santini and Tyler, 2003, 2004) included only a small
number of balistids, leaving many questions about triggerfish relationships unanswered.
Recently Holcroft (2005) included 14 balistids as part of
a molecular study of tetraodontiform relationships. Her
analysis retrieved four major groupings: (1) Melichthys
niger; (2) Balistoides conspicillum; (3) a clade composed of
Xanthichthys auromarginatus and Balistoides viridescens; (4)
a clade of all remaining balistids including Canthidermis,
Rhinecanthus, and Balistapus. Her topology is incongruent
with many of the relationships presented in prior morphological studies, providing strong evidence for a polyphyletic Balistoides and placing Melichthys niger as sister
to the remaining balistids, rather than Canthidermis and
Abalistes (Matsuura, 1979) or Balistapus and Rhinecanthus
(Tyler, 1980).
Age and Evolutionary History of the Triggerfishes
The fossil record of the Tetraodontiformes suggests
that triggerfish are a relatively young group. Stem balistoids are not known until the Oligocene (35 Ma) despite
a tetraodontiform fossil record that extends back to the
Cretaceous. This observation led Tyler and Santini (2002)
to speculate that triggerfishes and filefishes were the last
of the crown tetraodontiform families to appear. Recently
two studies have presented conflicting estimates of balistid divergence times based on fossil-calibrated molecular
data. Yamanoue et al. (2006) dated the split between balistids and their sister group, the monacanthids, at approximately 129.5 Ma, in the early Cretaceous but, due to the
presence of only one triggerfish in their data set, could
not determine the time of origin of the crown group.
More recently, Alfaro et al. (2007) reanalyzed a data set
containing Holcroft’s (2005) data, as well as newly sequenced taxa, in conjunction with 11 fossil calibration
points and estimated that the split between the triggerfish and their sister taxon was about 90 million years
younger (late Eocene, ∼40 Ma), with crown triggerfish
first appearing between the late Oligocene and the early
Miocene (∼25 Ma). Given the large discrepancy between
these divergence time estimates, it is not surprising that
they support conflicting hypotheses as explanations for
the origin and subsequent diversification of triggerfishes.
Alfaro et al. (2007) have suggested that reef association
in triggerfishes is correlated with higher than expected
rates of diversification within the family. A Jurassic or
early Cretaceous origin of the group would undermine
this hypothesis because scleractinian coral reefs are not
known to extend past the Tertiary.
Questions about diversification patterns of the triggerfishes at a finer scale have yet to be addressed. For
example, Floeter et al. (2007) recently suggested two
potentially important “speciation bursts” for numerous Atlantic reef fish groups corresponding to the late
Miocene through Pliocene (8 to 2 Ma) and the Pleistocene
(<1.5 Ma). It is currently not known if triggerfish diversification shows a similar pattern of increase. In fact,
despite their circumtropical distribution (Table 1), to
our knowledge there are no previous hypotheses of
triggerfish biogeography or diversification. Here we test
whether triggerfish subclades experienced significant
increases in diversification rates over their evolutionary
history.
Objective
We statistically compare single-model and model-averaged posterior probabilities for clades of triggerfishes to investigate the influence of model uncertainty
on phylogenetic confidence. We adopt a Bayesian approach using a reversible-jump Markov chain Monte
Carlo sampler that allows model posteriors to be easily calculated (Green, 1995) and compare this to a procedure based on Akaike weights (Posada and Buckley,
2004). We also present the first detailed molecular phylogenetic study of the triggerfishes and integrate our data
from multiple nuclear and mitochondrial loci with previously published morphological data to produce a total
evidence topology for the family. Finally, we quantify
temporal patterns of balistid cladogenesis using a relaxed clock method and use our chronogram as a framework to statistically test several hypotheses of balistid
diversification.
DORNBURG ET AL.—MODEL-AVERAGED PHYLOGENY OF THE BALISTIDAE
TABLE 1. List of balistid taxa examined in this study, locality data, voucher numbers, and GenBank accession numbers.
                                                             GenBank accession number
Taxon                           Locality          Voucher    12S        16S        Rhodopsin  Tmo-4C4    RAG1
Abalistes stellatus             Marine Wholesale  PW-1324    AY700248   AY679632   EU108845   EU108823   AY700318
Balistapus undulatus            Fiji              MEA 264    EU108802   EU108813   EU108849   EU108826   EU108869
Balistes capriscus              Alabama, USA      CU-90721   AY700238   AY679622   EU108846   EU108824   AY700308
Balistes polylepis              GenBank           KU 28370   AY700239   AY679623   —          —          AY700309
Balistes punctatus              Ghana             MEA 264    EU108801   EU108812   EU108848   EU108827   EU108868
Balistes vetula                 Smithsonian       5243       AY700240   AY679624   EU108850   EU108828   AY700310
Balistoides conspicillum        Marine Wholesale  —          AY700241   AY679625   EU108847   EU108825   AY700311
Balistoides viridescens         GenBank           KU uncat.  AY700250   AY679634   —          —          AY700320
Canthidermis maculatus          Alabama, USA      CU-90732   AY700242   AY679626   EU108851   EU108829   AY700312
Melichthys niger                Oahu, HI          MEA 312    AY700243   AY679627   EU108852   —          AY700313
Melichthys vidua                Marine Wholesale  MEA 168    EU108803   EU108814   EU108853   EU108830   EU108870
Odonus niger                    Fiji              MEA 129    EU108804   EU108815   EU108854   EU108831   EU108871
Pseudobalistes flavimarginatus  Marine Wholesale  —          EU108805   EU108816   EU108855   EU108832   EU108872
Pseudobalistes fuscus           Indian Ocean      MEA 115    AY700244   AY679628   EU108856   EU108833   AY700314
Rhinecanthus aculeatus          Fiji              MEA 194    AY700247   AY679631   EU108857   EU108834   AY308790
Rhinecanthus assassi            Red Sea           MEA 167    AY700245   AY679629   EU108858   EU108835   AY700315
Rhinecanthus lunula             Marine Wholesale  MEA 142    EU108806   EU108817   EU108859   EU108836   EU108873
Rhinecanthus rectangulus        Sri Lanka         MEA 110    EU108807   EU108818   EU108860   EU108837   EU108874
Rhinecanthus verrucosus         Solomon Islands   MEA 288    EU108808   EU108819   EU108861   EU108838   EU108875
Sufflamen albicaudatum          Solomon Islands   MEA 145    EU108809   EU108820   EU108862   EU108839   EU108876
Sufflamen bursa                 Marine Wholesale  MEA 198    AY700249   AY679633   EU108863   EU108840   AY700319
Sufflamen chrysopterum          Marine Wholesale  PW-1325    AY700251   AY679634   EU108864   EU108841   —
Sufflamen fraenatum             GenBank           —          NC_004416  NC_004416  —          —          AY700321
Xanthichthys auromarginatus     Oceania           MEA 164    AY700246   AY679630   EU108865   EU108842   AY700316
Xanthichthys mento              Caribbean         MEA 116    EU108810   EU108821   EU108866   EU108843   EU108877
Xanthichthys ringens            Marine Wholesale  MEA 132    EU108811   EU108822   EU108867   EU108844   EU108878
METHODS
Sampling
Samples were obtained through tissue loan, marine
wholesale, and field collection with voucher specimens
deposited into the collection of the Charles R. Conner Museum at Washington State University (Table 1).
Additional sequences were downloaded from GenBank
(Table 1). Filefish (monacanthids) are uncontroversially
recognized as the sister group to the triggerfishes (Winterbottom, 1974; Matsuura, 1979; Rosen, 1984; Santini
and Tyler, 2003, 2004; Holcroft, 2005; Yamanoue et al.,
2006; Alfaro et al., 2007) and we included three species
to serve as outgroups in our study. Our ingroup sample included 26 species and 11 genera of balistids. This
includes all genera of extant triggerfish lineages, except
the most recently described rare genus Xenobalistes (Matsuura, 1981).
DNA Extraction, PCR Amplification, and Sequencing
Muscle tissue samples were stored in 70% ethanol
prior to use. DNA was extracted for most taxa using
the Chelex (Bio-Rad) protocol described in Walsh et al.
(1991). Additional extractions for Balistes vetula, Pseudobalistes fuscus, and Balistes capriscus utilized the PureGene extraction kit and protocol (Gentra Systems).
We used the polymerase chain reaction (PCR; Saiki,
1990) to amplify two mitochondrial genes, 12S rDNA
(∼833 bp) and 16S rDNA (∼563 bp), and three nuclear
genes, Rhodopsin (∼564 bp), Tmo4C4 (∼575 bp), and
RAG1 (∼1471 bp). One microliter of genomic template
was used per 25-µL reaction, containing 5 µL of 5× GoTaq Flexi PCR buffer (Promega), 2 µL MgCl2 (25 mM),
0.5 µL dNTPs (8 µM), 1.25 µL of each primer (Table 2),
and 0.125 µL of Promega GoTaq Flexi DNA polymerase
(5 U/µL). Amplification of all gene fragments was conducted with an initial denaturing step at 94°C for 1 to 2
min; 37 cycles with a 0.5- to 1.5-min 94°C denaturing, a
45- to 75-s 48.5°C to 60°C annealing, and a 1- to 2-min
72°C extension; followed by an additional 5-min 72°C
extension and a 10-min 23°C cool down. PCRs were performed on two MJ Research PTC-200 Peltier thermal cyclers and a Bio-Rad iCycler. All products were stored at
−4°C after amplification.
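The reaction volumes listed above can be tallied to see how much of each 25-µL reaction remains to be made up. The sketch below assumes two primers per reaction (one forward, one reverse) and assumes that the remaining volume is water; neither detail is stated explicitly in the protocol, and the variable names are illustrative.

```python
REACTION_VOLUME_UL = 25.0

# Volumes per reaction as listed in the text (microliters).
components_ul = {
    "genomic template": 1.0,
    "5x GoTaq Flexi PCR buffer": 5.0,
    "MgCl2 (25 mM)": 2.0,
    "dNTPs": 0.5,
    "forward primer": 1.25,   # assumption: "each primer" means one forward primer
    "reverse primer": 1.25,   # and one reverse primer
    "GoTaq Flexi DNA polymerase (5 U/uL)": 0.125,
}

listed_total_ul = sum(components_ul.values())
# Assumption: the balance of the reaction volume is made up with water.
water_ul = REACTION_VOLUME_UL - listed_total_ul

print(listed_total_ul, water_ul)
```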
Excess dNTPs and unincorporated primers were removed from PCR products using ExoSap (Amersham
Biosciences). Purified products were cycle-sequenced using the BigDye Terminator v.3.1 cycle sequencing kit (Applied Biosystems) with either the primers used for amplification or additional
internal primers (Table 2). The
cycle sequencing protocol consisted of 25 cycles with
a 10-s 94°C denaturation, 5 s of 50°C annealing, and
a 4-min 60°C extension. Sequences were produced at
the Washington State University Center for Integrated
Biotechnology Core Laboratory using an ABI 377 and an
ABI3100.
Sequence Alignment
12S and 16S rDNA sequences were aligned by eye
to secondary structure models used in previously published studies of labrid fishes (Streelman et al., 2002;
Clements et al., 2004; Westneat and Alfaro, 2005). Ambiguously aligned regions were identified by eye and removed prior to analysis for both mitochondrial genes.
TABLE 2. Primers used for PCR amplification and sequencing.
Gene        Primer      Reference
12S rDNA    Phe2-L      Holcroft, 2005
12S rDNA    12sd-R      Holcroft, 2005
12S rDNA    12d-L       Holcroft, 2005
12S rDNA    12Sb-H      Holcroft, 2005
16S rDNA    16SAR-F     Holcroft, 2005
16S rDNA    16SBR-R     Holcroft, 2005
Rag1        R-2510F     This study
Rag1        R-3261R     This study
Rag1        R-3098F     This study
Rag1        DDRAG1F     This study
Rag1        DDRag1R     This study
Rhodopsin   RH-545      Chen et al., 2003
Rhodopsin   RH-1073     Chen et al., 2003
Tmo-4C4     TMO-FL-6A   Clements et al., 2004
Tmo-4C4     TMO-F1-5    Clements et al., 2004
Tmo-4C4     TMO-RL-3    Clements et al., 2004
Alignment of the protein-coding genes (Rhodopsin,
Rag1, and Tmo4C4) was trivial and done using a text editor (BBEdit, BareBones Software). Gene matrices were
edited in Se-Al v.2.0 (Rambaut, 1996). We trimmed sequences to the size of the smallest fragment for each
gene to minimize missing characters in the data matrix. Our final data matrix consisted of 754 bp of
12S, 475 bp of 16S, 404 bp of Rhodopsin, 545 bp of
Tmo4C4, and 1205 bp of Rag1 for a total of 3383 characters used in analysis. Sequences were checked using
NCBI’s BLAST and have been deposited in GenBank (Table 1). All aligned data matrices have been deposited
in TreeBase (accession numbers: SN3541-20316 (12S),
SN3541-20318 (16S), SN3541-20315 (RAG1), SN3541-20313 (Rhodopsin), SN3541-20310 (Tmo4C4), SN3541-20309 (concatenated data)).
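The partition lengths reported above can be turned into site ranges for a concatenated matrix, which also confirms the stated total of 3383 characters. The concatenation order used below (12S, 16S, Rhodopsin, Tmo4C4, Rag1) simply follows the order in which the lengths are listed and is an assumption; the deposited matrix may be ordered differently.

```python
from itertools import accumulate

# Aligned lengths (bp) from the text; the order of concatenation is assumed.
partition_lengths = {
    "12S": 754,
    "16S": 475,
    "Rhodopsin": 404,
    "Tmo4C4": 545,
    "Rag1": 1205,
}

ends = list(accumulate(partition_lengths.values()))
starts = [1] + [end + 1 for end in ends[:-1]]
# 1-based inclusive (start, end) coordinates for each gene partition.
charsets = {gene: (s, e) for gene, s, e in zip(partition_lengths, starts, ends)}
total_sites = ends[-1]

for gene, (s, e) in charsets.items():
    print(f"{gene}: {s}-{e}")
print(total_sites)
```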
Bayesian Analysis
We ran all MCMC chains in the analyses below for
20 million generations, sampling every 1000, as our preliminary analysis revealed this to be sufficient to ensure
convergence of the chains. Convergence was assessed by
visual inspection of the state likelihoods, potential scale
reduction factors, and the average standard deviation of clade
splits between replicate runs. To further ensure that inference was based on samples from the target distribution,
we discarded the first 25% of the 20,000 trees as burn-in.
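The sampling arithmetic and one of the convergence diagnostics named above can be sketched as follows. The first function reproduces the 20,000 sampled trees and the 25% burn-in from the text. The second computes an average standard deviation of split frequencies across runs; it uses the sample standard deviation and no minimum-frequency cutoff, which are assumptions of this sketch and may differ from what MrBayes reports. The split labels and frequencies are hypothetical.

```python
from math import sqrt


def samples_after_burnin(generations, sample_every, burnin_fraction):
    """Return (number of sampled trees, number retained after burn-in)."""
    n_samples = generations // sample_every
    n_burnin = int(n_samples * burnin_fraction)
    return n_samples, n_samples - n_burnin


def average_sd_of_split_frequencies(runs):
    """runs: list of dicts mapping a split label to its frequency in one run."""
    splits = set().union(*runs)          # every split seen in any run
    k = len(runs)
    sds = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in runs]
        mean = sum(freqs) / k
        sds.append(sqrt(sum((f - mean) ** 2 for f in freqs) / (k - 1)))
    return sum(sds) / len(sds)


n_samples, n_kept = samples_after_burnin(20_000_000, 1000, 0.25)
print(n_samples, n_kept)

# Hypothetical split frequencies from two replicate runs.
run_a = {"AB|CDE": 1.0, "ABC|DE": 0.5}
run_b = {"AB|CDE": 1.0, "ABC|DE": 0.7}
print(average_sd_of_split_frequencies([run_a, run_b]))
```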
Morphological Analysis
We assembled a list of 34 potentially informative
characters by selecting all the characters that defined
the familial clades of the Balistidae and Monacanthidae identified in Santini and Tyler (2003), as well
as the characters used in Matsuura (1979) and Tyler
(1980). We scored taxa using osteological descriptions
from the literature as well as new cleared and stained
specimens (Appendix S1, available at http://www.
systematicbiology.org). The morphological matrix was
analyzed in MrBayes 3.1.2 (Ronquist and Huelsenbeck,
2003) using the default prior settings for morphological
data. Visual inspection of model parameters and potential scale reduction factors revealed that the chain appeared to reach stationarity after 200,000 generations.
TABLE 2 (continued). Primer sequences.
Primer      Sequence
Phe2-L      AAA GCA TAA CAC TGA AGA TGT TAA GAT G
12sd-R      GGG TTG GTA AAT CTC GTG C
12d-L       GCT GGC ACG AGT TTT ACC GGC C
12Sb-H      AGG AGG GTG ACG GGC GGT GTG T
16SAR-F     CGC CTG TTT ATC AAA AAC AT
16SBR-R     CCG GTC TGA ACT CAG ATC ACG T
R-2510F     TGG CCA TCC GGG TMA ACA C
R-3261R     CCC TCC ATY TCN CGM ACC ATC TT
R-3098F     TGT GCC TGA TGY TYG TDG AYG ART
DDRAG1F     TTC ACC AGT TTG AAT GGC AGC C
DDRag1R     AAC GCC TGA AYA GTT TAT TTG C
RH-545      GCA AGC CCA TCA GCA ACT TCC G
RH-1073     CCR CAG CAC ARC GTG GTG ATC ATG
TMO-FL-6A   GAA AAG AGT GTT TGA AAA TGA
TMO-F1-5    CCT CCG GCC TTC CTA AAA CCT CTC
TMO-RL-3    CAT CGT GCT CCT GGG TGA CAA AGT
Bayesian Analysis of Molecular and Combined Data
We analyzed each gene partition independently and
performed a partitioned mixed model analysis of the
combined data. For each data set we assigned default priors to all model parameters (topology: uniform; revmat:
Dirichlet (1.0, 1.0, 1.0, 1.0, 1.0, 1.0); statefreq: Dirichlet
(5.0, 5.0, 5.0, 5.0); pinvar: uniform (0.0, 1.0); Brlengths:
exp (10.0); shape: uniform (0, 1)). We selected models
of evolution for these analyses using two approaches
(see Table 4): direct calculation of model posterior probabilities using RJ-MCMC (Huelsenbeck et al., 2004) and
Akaike weights (Akaike, 1973) calculated using ModelTest (Posada and Crandall, 1998). We also performed a
combined analysis of the morphological and molecular
data by concatenating the morphological and gene matrices and using the MCMC chain parameters as described
above. The mixed-partition analysis utilized the default
settings for morphology and the best-fit model according
to the AIC criterion.
Single-Model and Model-Averaged Bayesian Analysis
We used MrBayes 3.1.2 (Ronquist and Huelsenbeck,
2003) and a custom-written reversible jump MCMC
sampler (Huelsenbeck et al., 2004) to perform single-model and model-averaged Bayesian analyses, respectively. Priors on all parameters of the model (with the
exception of the substitution model itself) were identical for both sets of analysis: uniform on topology, flat
Dirichlet (5.0, 5.0, 5.0, 5.0) on nucleotide frequencies, exp
(10) on branch lengths, and a uniform (0, 50) prior on
the gamma-shape parameter. Our reversible jump sampler further applied a uniform prior on the 203 possible
substitution models.
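With a uniform prior over the 203 substitution models, the posterior probability of a model can be approximated by the fraction of post-burn-in samples in which the reversible-jump chain visited it. The sketch below illustrates that bookkeeping and the construction of a credible set of models. The model labels and the toy chain are hypothetical, and the code is not the custom sampler used in the study.

```python
from collections import Counter

N_MODELS = 203
prior_per_model = 1 / N_MODELS   # uniform prior over all candidate models


def model_posteriors(visited):
    """Approximate each model's posterior by its visit frequency in the chain."""
    counts = Counter(visited)
    return {model: count / len(visited) for model, count in counts.most_common()}


def credible_set(posteriors, level=0.95):
    """Smallest set of models, taken in decreasing probability, reaching `level`."""
    chosen, cumulative = [], 0.0
    for model, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(model)
        cumulative += p
        if cumulative >= level:
            break
    return chosen


# Hypothetical chain of ten post-burn-in samples; labels are rate-class strings.
chain = ["123456"] * 6 + ["121121"] * 3 + ["111111"]
posteriors = model_posteriors(chain)
print(posteriors)
print(credible_set(posteriors, level=0.85))
```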
We performed parallel single-model and model-averaged analyses using the following partitions of our
data: all individual gene partitions, a concatenated mitochondrial gene (12S+16S) data matrix, a concatenated
nuclear gene (Rhodopsin, RAG1, Tmo-4C4) data matrix,
and the entire concatenated data. This approach allowed
us to compare the effects of accommodating model uncertainty on data sets of varying length. We additionally
performed seven single-model analyses on the following
data partitioning schemes for the concatenated data set:
(1) One substitution model; (2) rDNA and lumped nuclear genes; (3) rDNA stem regions, rDNA loop regions,
lumped nuclear genes; (4) rDNA and a model for each
nuclear gene; (5) by gene; (6) rDNA stems, rDNA loops,
each nuclear gene; (7) rDNA stems, rDNA loops, codon
positions for each gene. We calculated marginal likelihoods and computed Bayes factor scores for each of these
analyses to account for topological uncertainty as discussed in Brandley et al. (2005). Single-model clade posterior probabilities were regressed on model-averaged
posteriors using the comparetree command in MrBayes
and KaleidaGraph (Synergy Software). The statistical significance of differences between the model-averaged posteriors and those
contingent on a single model of sequence evolution was
assessed using the nonparametric Wilcoxon signed-rank
test (Simmons et al., 2004).
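The paired comparison of clade posteriors described above can be sketched with a small, dependency-free implementation of the Wilcoxon signed-rank test. Zero differences are dropped and tied absolute differences receive average ranks, which is the conventional treatment and an assumption about how the study handled them. The exact two-sided p-value is obtained by enumerating all sign assignments, so this version is only practical for a small number of clades. The posterior values are hypothetical.

```python
from itertools import product


def signed_ranks(x, y):
    """Return (ranks of |differences|, W+, W-) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1                # ties share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return ranks, w_plus, w_minus


def exact_two_sided_p(ranks, w_plus):
    """Exact p-value by enumerating every assignment of signs to the ranks."""
    total = sum(ranks)
    observed = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical clade posteriors for the same six clades under the two analyses.
single_model = [0.99, 0.95, 0.80, 0.62, 0.55, 1.00]
model_averaged = [0.98, 0.97, 0.76, 0.65, 0.50, 1.00]

ranks, w_plus, w_minus = signed_ranks(single_model, model_averaged)
p_value = exact_two_sided_p(ranks, w_plus)
print(w_plus, w_minus, p_value)
```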
Single-Model and Model-Averaged
Maximum Likelihood Analysis
We analyzed the concatenated molecular data set to assess the effect of nucleotide substitution model overparameterization on bootstrap support values. Corrected
Akaike information criterion (AICc) scores and Akaike weights for all
possible models of nucleotide substitution were calculated and the best-fit model from ModelTest was compared to the 95% interval of credible models from the
candidate pool. For each selected model, including the
best-fit model, we used PAUP* 4.01b (Swofford, 2003) to
perform 1000 bootstrap replicates. Each replicate incorporated two random sequence additions and the TBR
branch-swapping algorithm. To save on computational
time, a time limit of 4 h was assigned to each replicate.
For the model-averaged analysis, the bootstrap value at
each node was multiplied by the weight of the model
and all values were summed to obtain a model-averaged
bootstrap measure of support. These values were compared to the best-fit model’s support values and tested
for significance using the Wilcoxon signed-rank test.
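The weighting scheme described above can be sketched in a few lines. The code computes the small-sample corrected criterion, converts a set of scores into Akaike weights, and sums weight times bootstrap value over models, as the text describes. Weights are used as given; if the retained models' weights do not sum to one, renormalizing them would be a further choice that the text does not specify. All scores and bootstrap values below are hypothetical.

```python
from math import exp


def aicc(log_likelihood, n_params, n_sites):
    """Small-sample corrected Akaike information criterion."""
    k, n = n_params, n_sites
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a dict of AICc scores into weights that sum to one."""
    best = min(scores.values())
    relative = {m: exp(-0.5 * (s - best)) for m, s in scores.items()}
    norm = sum(relative.values())
    return {m: v / norm for m, v in relative.items()}


def model_averaged_bootstrap(bootstrap_by_model, weights):
    """Sum of (model weight x bootstrap support) for one node."""
    return sum(weights[m] * bootstrap_by_model[m] for m in bootstrap_by_model)


# Hypothetical AICc scores and bootstrap percentages for a single node.
scores = {"GTR+I+G": 100.0, "SYM+I+G": 102.0, "HKY+I+G": 110.0}
weights = akaike_weights(scores)
support = {"GTR+I+G": 92.0, "SYM+I+G": 88.0, "HKY+I+G": 75.0}
print(weights)
print(model_averaged_bootstrap(support, weights))
```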
Divergence Time Estimation
We constrained three nodes in the balistid tree (Table
3) for divergence time analysis. Two of these calibrations
were based on the fossil record (Table 3). The split between the Balistidae and the Monacanthidae (see Fig. 3,
node 1) was based on four fossil stem balistids dated to
35 Ma: Balistomorphus orbiculatus, B. ovalis, B. spinosus,
and Oligobalistes robustus (Tyler and Santini, 2002). We
assigned a prior minimum age of 35 Myr to this calibration to reflect the age of these fossils and further assigned
a mean age of 50 Myr (reflecting the appearance of several other tetraodontiform families in the fossil record)
and an upper bound of 70 Myr (reflecting the appearance of the first stem tetraodontiforms) after Alfaro et al.
(2007). We used soft upper bounds (i.e., upper bounds
indicate the 95% cumulative density of the prior) on all
fossil constraints to avoid artificially truncating the posterior distribution of our divergence time estimates (e.g.,
Yang and Rannala, 2006).
TABLE 3. Calibrations used in this study.
Node                                   Minimum age/95% HPD (Myr)   Source
MRCA of Monacanthidae and Balistidae   35/70                       Fossil
Crown Balistidae                       22.9/29.9                   Secondary constraint (Alfaro et al., 2007)
Split Balistes and Canthidermis        5/50                        Fossil
We used the fossil Balistes procapriscus from the late
Miocene to assign a minimum age of 5 Myr to the crown
age of Balistes (Fig. 3, node 3; Santini and Tyler, 2003). We
initially used this fossil to date the split between Balistes
and Pseudobalistes fuscus following Alfaro et al. (2007);
however, preliminary analysis revealed P. fuscus to be
nested within Balistes. Based on this, we reassigned the
calibration to the crown Balistes. We assigned an upper
bound of 50 Myr to this calibration.
We assigned a secondary constraint to the age of crown
balistids (Fig. 3, node 2) corresponding to the 95% credible interval estimate for balistids from Alfaro et al. (2007).
Our normally distributed prior assigned a mean age of
22.9 Myr to the split (SD = 4.2 Myr). This age is congruent with the current paleontological evidence: no balistid fossils are known older than the middle Miocene
(Schultz, 2004), whereas the stem balistids are at least
35 million years old (Tyler and Santini, 2002).
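The secondary calibration on crown balistids can be checked numerically. The sketch below reads the 4.2 Myr value as the standard deviation of the normal prior, which is an assumption of this example, and asks where the 95% cumulative density of such a prior falls. The result can be compared with the 29.9 Myr bound listed for crown Balistidae in Table 3.

```python
from statistics import NormalDist

# Normal prior on the crown balistid age; sigma = 4.2 Myr is an assumed reading.
crown_prior = NormalDist(mu=22.9, sigma=4.2)

upper_95 = crown_prior.inv_cdf(0.95)             # age below which 95% of the prior lies
mass_older_than_stem = 1.0 - crown_prior.cdf(35.0)  # prior mass older than the 35-Ma stem fossils

print(round(upper_95, 2))
print(round(mass_older_than_stem, 4))
```

Under this reading the prior places almost no mass on a crown age older than the 35-Ma stem fossils, consistent with the paleontological argument in the text.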
We estimated divergence times using the concatenated
data under a model of uncorrelated log-normally distributed rates using BEAST (Drummond et al., 2006). A
Yule (pure-birth) prior was assigned to rates of cladogenesis. Based on results from the MrBayes analysis, we partitioned our data into seven regions to allow a separate
substitution model to be used for each ribosomal stem
and loop region and also an individual model for each
nuclear gene. We ran three independent analyses of 20
million generations, assessing convergence using Tracer
1.3 (Rambaut and Drummond, 2007). The first 25% of
the generations were discarded as burn-in and the effective sample size (ESS) for model parameters was also
assessed to check for good mixing of the MCMC (ESS
exceeded 200 for all model parameters in our analysis).
Diversification Statistics
All diversification statistics were implemented in
the software package R (R Development Core Team,
2006), using functions in the packages Geiger (Harmon et al., 2008) and APE (Paradis et al., 2004). The
global diversification rate of the Balistidae (λG) as
well as the diversification rate of five focal subclades
was calculated using the method-of-moments estimator from Magallon and Sanderson (2001). We further
tested whether diversification rates of subclades with
a Pliocene/Pleistocene crown age differed significantly
from λG using the method of Magallon and Sanderson (2001). To account for the pull of the present (Pybus and Harvey, 2000) in these estimates, we used
extinction rates ranging from 0 to 0.5 in our estimates (see Table 7). These values represented the confidence interval obtained using the birth-death function
to calculate relative extinction in the package Geiger
(Harmon et al., 2008). To test for a nonconstant triggerfish diversification rate given our incomplete taxonomic sampling, we used the MCCR test function
(Pybus and Harvey, 2000; Pybus et al., 2002) based on
20,000 simulations in Geiger (Harmon et al., 2008). The
MCCR assumes no significant difference in diversification rates between lineages. We tested this assumption
using the relative cladogenesis statistic (Nee et al., 1992).
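The method-of-moments estimator for a crown group, referred to above, can be sketched as follows. The formula is written as we recall it from Magallon and Sanderson (2001) and should be checked against that paper before use. Here epsilon is the relative extinction rate, and with epsilon equal to zero the expression reduces to the pure-birth form (ln n − ln 2)/t. The species count of 42 comes from the text; the crown age of 22.9 Myr is used purely as an illustrative input.

```python
from math import log, sqrt


def crown_diversification_rate(n_species, crown_age, epsilon=0.0):
    """Net diversification rate for a crown group of n_species and given age."""
    if n_species < 2 or crown_age <= 0 or not 0.0 <= epsilon < 1.0:
        raise ValueError("need n_species >= 2, crown_age > 0, 0 <= epsilon < 1")
    n, e = n_species, epsilon
    if e == 0.0:
        return (log(n) - log(2)) / crown_age
    inner = (0.5 * n * (1 - e ** 2) + 2 * e
             + 0.5 * (1 - e) * sqrt(n * (n * e ** 2 - 8 * e + 2 * n * e + n)))
    return (log(inner) - log(2)) / crown_age


# Illustrative inputs: 42 species, crown age 22.9 Myr, two extinction fractions.
for eps in (0.0, 0.5):
    print(eps, crown_diversification_rate(42, 22.9, eps))
```

Higher assumed relative extinction lowers the inferred net rate for the same clade size and age, which is why the study reports estimates over a range of extinction values.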
To test for significantly elevated rates of cladogenesis
during specific time intervals, we used a novel function
(Brock, unpublished) to calculate Kendall-Moran estimates of diversification rate (r) based on a pure-birth
process (Baldwin and Sanderson, 1998; Nee, 2001) for
each major geologic period, as well as each subdivision
of the Miocene. This test accounts for incomplete taxon
sampling and the impact of extinction on the distribution of waiting times (“pull of the present”; e.g., Pybus
and Harvey, 2000). We compared observed values of r
for each time interval to a null distribution generated by
the simulation of 20,000 birth-death trees under global
estimates of triggerfish diversification and extinction
rates.
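A Kendall-Moran style estimate for a time interval can be illustrated as the number of speciation events in the interval divided by the summed branch length within it. The sketch below works from the ages of the internal nodes of an ultrametric, fully bifurcating tree. It is not the unpublished function used in the study, it does not correct for incomplete sampling or extinction, and the node ages are hypothetical.

```python
def kendall_moran_rate(node_ages, older, younger=0.0):
    """Pure-birth rate estimate for the interval (younger, older], ages before present.

    node_ages: ages of all internal nodes, including the root (crown) node.
    """
    ages = sorted(node_ages, reverse=True)
    root_age = ages[0]
    older = min(older, root_age)          # no lineages exist before the crown node
    if older <= younger:
        raise ValueError("interval must have positive length within the tree")

    def time_inside(birth_age):
        """Time a lineage arising at birth_age spends inside the interval."""
        return max(0.0, min(birth_age, older) - younger)

    # The root gives rise to two lineages; every later node adds one lineage.
    branch_length = 2 * time_inside(root_age) + sum(time_inside(a) for a in ages[1:])
    events = sum(1 for a in ages[1:] if younger < a <= older)
    return events / branch_length


# Hypothetical four-taxon tree with node ages 10, 6, and 2 Myr.
ages = [10.0, 6.0, 2.0]
print(kendall_moran_rate(ages, older=10.0))               # whole tree
print(kendall_moran_rate(ages, older=10.0, younger=5.0))  # oldest half only
```

For the whole tree this reduces to (N − 2) divided by total branch length for N tips. A significance test would additionally require a null distribution, which the study obtained by simulating birth-death trees.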
RESULTS
Bayesian Analysis
The average standard deviation of the clade splits between independent runs was less than 0.1% and potential scale reduction factors (Gelman and Rubin, 1992) for
all parameters were approximately 1.00 for all analyses,
suggesting that we adequately sampled the target distributions. Comparison of Bayes factor scores revealed
that assigning separate models to each ribosomal stem
or loop region, as well as each individual gene (seven
separate partitions total), fit our data best. Analysis of
the concatenated data set recovered a well-resolved phylogeny of the balistids and revealed five major clades: (1)
Balistes (including Pseudobalistes fuscus); (2) Rhinecanthus;
(3) Sufflamen; (4) Canthidermis + Abalistes; (5) all remaining balistids not in clades 1 to 4 (Fig. 1). The genus Balistes (clade 1) is strongly supported as the sister group
to the remaining balistids. Pseudobalistes fuscus appears
deeply nested within this group, a placement that renders the genus Balistes paraphyletic. Sufflamen (clade 3)
is strongly supported as sister to Rhinecanthus (clade 2).
The relationships among R. lunula, R. rectangulus, and R. verrucosus were unresolved, though these taxa formed the
sister group to the remaining Rhinecanthus species.
Clade 5 shows Balistoides and Pseudobalistes to be polyphyletic as currently defined. In the latter case, the two
species of Pseudobalistes are recovered in clades 1 and
5, respectively. The two species of Balistoides are placed
within two different subclades of clade 5, with strong
support for a sister relationship between Balistoides viridescens and Pseudobalistes flavimarginatus. The remaining
Balistoides in our study, B. conspicillum, forms the sister
group to Melichthys. This placement leaves B. conspicillum placed deep within a clade consisting of Balistapus,
Odonus, and Melichthys.
TABLE 4. Uncertainty in model choice. Best-fit models selected using the AIC (Akaike, 1973) implemented in ModelTest (Posada and
Crandall, 1998) and PAUP (Swofford, 2003) compared to the posterior probability of the most visited model by the RJ-MCMC sampler
(Huelsenbeck et al., 2004) for all data sets. Probability of a model is
equal to the frequency it was visited by the RJ-MCMC sampler.
                      Model averaging                      ModelTest
Gene partition        Model selected          Probability  Model selected  Weight (AIC)
12S                   GTR+G                   0.50800      GTR+I+G         0.9956
16S                   1, 1, 1, 2, 3, 2 (a)    0.84673      SYM+I+G         0.7895
RAG1                  GTR+G                   0.24566      GTR+I+G         0.6931
Rhodopsin             1, 1, 1, 1, 2, 1 (a)    0.29684      HKY+I+G         0.3185
Tmo-4C4               1, 1, 1, 1, 2, 1 (a)    0.61910      K81uf+I         0.1586
12S + 16S             GTR+G                   0.70176      GTR+I+G         0.9968
Concatenated data     1, 2, 3, 4, 5, 4 (a)    0.57004      GTR+I+G         0.9990
(a) Unnamed models represented by substitution rate matrix (see Huelsenbeck et al., 2004).
The combined morphological/molecular tree (data
not shown) was perfectly congruent with the tree based
on molecular data only (Fig. 1). We attribute this to a
nearly complete lack of resolution provided by analysis
of the morphological data set only (data not shown). Posterior probabilities were qualitatively similar between
the molecular and the molecular + morphological data
and not significantly different by the Wilcoxon signed-rank test (P = 0.232).
Single-Model and Model-Averaged Bayesian Analysis
Our analysis revealed that the most probable model
was not always congruent with the model chosen by
ModelTest (Table 4). Four of our seven data sets revealed
the model with the highest posterior probability to be an
unnamed model. Additionally, the posterior probability
of the most visited model by the RJ-MCMC sampler was
lower than the probability of the “best-fit” model chosen
using the AIC in ModelTest for five of the seven data sets.
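The two quantities being compared here can be made concrete with a short sketch. Under reversible-jump sampling the posterior probability of a model is estimated as the fraction of retained samples in which the chain sat on that model, whereas an Akaike weight rescales AIC differences so that the weights sum to one. The function names, the sampled model labels, and the AIC scores below are hypothetical and are not taken from this study.

```python
import math
from collections import Counter


def visit_frequencies(sampled_models):
    """Estimate model posteriors as the fraction of samples spent on each model."""
    counts = Counter(sampled_models)
    total = sum(counts.values())
    return {model: count / total for model, count in counts.items()}


def akaike_weights(aic_scores):
    """Convert AIC scores into weights: w_i = exp(-d_i / 2) / sum_j exp(-d_j / 2),
    where d_i is the difference between model i's AIC and the smallest AIC."""
    best = min(aic_scores.values())
    relative = {m: math.exp(-0.5 * (aic - best)) for m, aic in aic_scores.items()}
    norm = sum(relative.values())
    return {m: value / norm for m, value in relative.items()}


# Hypothetical chain of visited models (labels are placeholders).
chain = ["GTR", "GTR", "HKY", "GTR", "112312", "HKY", "GTR", "GTR"]
print(visit_frequencies(chain))

# Hypothetical AIC scores for three candidate models.
aic = {"GTR+I+G": 1000.0, "HKY+I+G": 1002.0, "K81uf+I": 1010.0}
print(akaike_weights(aic))
```

Because the two measures are normalized over different candidate sets (all models the sampler can visit versus the models scored by AIC), their values are not expected to agree.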
Our model-averaged topology (Fig. 1) was qualitatively similar to the topology conditioned on a single
model, with no conflicts between strongly supported
relationships (PP > 95%). Visual inspection of clade
support values revealed little difference between model-averaged and single-model analyses (Fig. 1) for highly
supported nodes (PP > 95%). Although qualitative differences were more obvious for lower support values (see
Fig. 4), a Wilcoxon signed-rank test revealed that these
were not statistically significant (Table 5). Further tests
of PPs of <90% and <50% were also not significant (P >
0.4 for all, data not shown).
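The paired comparison of clade support described above can be sketched as follows. This is a minimal pure-Python version of the two-sided Wilcoxon signed-rank test using the normal approximation (no continuity correction, zero differences dropped, average ranks for ties). It is an assumption about how such a comparison could be set up, not the software used in this study, and the support values in the example are invented.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums and p comes from a normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1                # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))              # two-sided tail area
    return w, min(p, 1.0)


# Invented clade support values (percent) for the same five clades.
single_model_pp = [99, 80, 65, 40, 30]
model_averaged_pp = [98, 83, 60, 44, 24]
print(wilcoxon_signed_rank(single_model_pp, model_averaged_pp))
```

With only a handful of clades the normal approximation is crude; an exact test would be preferable for small samples.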
Single-Model and Model-Averaged Maximum Likelihood Analysis
Maximum likelihood analysis recovered a single best
topology (lnL = −14,338.09) qualitatively similar to
the tree inferred by Bayesian methods (Fig. 2). Visual
inspection of single-model versus model-averaged bootstrap values reveals little fluctuation around highly supported nodes. Additionally, support values below 90
appear not to experience high levels of fluctuation. The
Wilcoxon signed-rank test reveals no statistically significant difference in bootstrap support values between
single-model and model-averaged analyses (P > 0.97).
Additional tests reveal no statistical difference in bootstrap values greater than 90 (P > 0.99) and no significant differences in values between 50 and 90 (P >
0.84).
FIGURE 1. Fifty percent majority-rule consensus tree resulting from the single-model and model-averaged Bayesian analyses of the molecular
concatenated data set. Posterior probabilities greater than 0.95 inferred by the single model analysis are shown above each node, model averaged
PPs are depicted below. Branch lengths are in substitution units based on analysis of the molecular data by the single model only. The branch
leading to Sufflamen fraenatum has been scaled by 50% to fit into this figure. Clade numbers (1 to 5) represent identified clades in the text: (1)
Balistes; (2) Rhinecanthus; (3) Sufflamen; (4) Canthidermis + Abalistes; (5) all remaining balistids. Scale bar: 0.03 substitutions/site.
TABLE 5. Results of Wilcoxon signed rank test. Comparison of posterior probabilities inferred by model averaging, MrBayes, and BEAST.
All data sets analyzed produced statistically non-significant differences
in phylogenetic inference between model-averaged results and those
conditioned on a single substitution model.
Data set            Methods compared              P-value
Tmo-4C4             Model averaging vs. MrBayes   0.1344
Rhodopsin           Model averaging vs. MrBayes   0.1814
RAG1                Model averaging vs. MrBayes   0.4911
12S                 Model averaging vs. MrBayes   0.1817
Concatenated data   Model averaging vs. MrBayes   0.1013
Concatenated data   BEAST vs. model averaging     0.3210
Concatenated data   BEAST vs. MrBayes             0.2487
Divergence Time Estimation
Our BEAST topology (Fig. 3) revealed the same major clades as the model-averaged consensus tree (Fig. 1),
with some topological differences (see below). Additionally, this analysis recovered slightly higher support at
some nodes (though these differences were not statistically significant; Table 5). We recover a crown age of
the balistids at approximately 11.3 Myr (Table 6, node
2). Our chronogram places the stem ages of A. stellatus and Canthidermis maculata (Table 6, nodes 3 and 4)
at approximately 10.0 and 9.9 Myr, respectively, indicating these genera as belonging among the oldest extant
balistid lineages. Our analysis recovers Balistes deeply
nested within the balistids, though this placement is
weakly supported. Crown Sufflamen originated approximately 6.1 Ma (Table 6, node 5), whereas the crown age
of Rhinecanthus (Table 6, node 10) indicates the group
to be relatively young, appearing approximately 3.3 Ma.
Eleven of the 24 nodes appear within the past 4 Myr,
with Rhinecanthus, Xanthichthys, and Melichthys all being
relatively young genera that originated in the last 4 to
2 Myr.
Diversification Statistics
We estimated a diversification rate (λG ) of 0.25 for
crown triggerfishes. The MCCR test (Pybus and Harvey,
2000; Pybus et al., 2002) failed to reject a hypothesis of
constant diversification rates for the triggerfishes (P =
0.10). Our global extinction rate was estimated to be 0
and the same log-likelihood score (−14.42) was given to
the pure-birth model using Magallon and Sanderson’s
(2001) equation. Based on these results, we were unable
to reject a pure-birth model for the balistid diversification. Our Kendall-Moran estimates of speciation rate
(Baldwin and Sanderson, 1998; Nee, 2001) revealed
fluctuations in diversification rate across time intervals;
however, none of these
results were significantly different from λG (Table 7). Assessing rates of cladogenesis between lineages revealed
no statistically significant rapid radiations for any of the
major clades, including major lineages with crown ages
in the Pliocene/Pleistocene such as Xanthichthys and
Rhinecanthus.
TABLE 6. Median divergence estimates for Balistid nodes.
Node                                                          Median age (Myr)/95% HPD
1. MRCA Monacanthidae + Balistidae                            36.6/(35.2, 39.7)
2. Crown Balistidae                                           11.3/(8.2, 15.9)
3. MRCA Canthidermis + Abalistes + Sufflamen + Rhinecanthus   10.0/(7.1, 13.7)
4. MRCA Abalistes + Sufflamen + Rhinecanthus                  9.9/(7.2, 13.7)
5. Crown Sufflamen                                            6.2/(4.0, 9.1)
6. MRCA S. bursa + S. chrysopterum + S. albicaudatum          3.9/(2.4, 5.9)
7. MRCA S. chrysopterum + S. albicaudatum                     1.8/(0.7, 3.0)
8. MRCA Sufflamen + Rhinecanthus                              8.9/(6.0, 12.8)
9. MRCA R. assasi + R. aculeatus                              1.5/(0.7, 2.7)
10. Crown Rhinecanthus                                        3.3/(2.0, 5.3)
11. MRCA R. rectangulus + R. verrucosus + R. lunula           1.9/(1.0, 3.0)
12. MRCA Rhinecanthus verrucosus + R. lunula                  1.6/(0.8, 3.0)
13. Crown Balistes                                            7.9/(5.8, 10.8)
14. MRCA “B.” fuscus + B. vetula + B. capriscus + B. polylepis  6.7/(4.8, 9.4)
15. MRCA B. vetula + B. capriscus + B. polylepis              5.1/(3.3, 7.3)
16. MRCA B. capriscus + B. polylepis                          1.5/(0.6, 2.5)
17. MRCA Xanthichthys + “Pseudobalistes” + Balistes +         10.2/(7.6, 14.2)
    Balistapus + Melichthys + Odonus + Balistoides
18. Crown “Pseudobalistes”                                    0.9/(0.3, 1.8)
19. MRCA Xanthichthys + “Pseudobalistes”                      6.3/(4.0, 9.2)
20. Crown Xanthichthys                                        2.0/(1.0, 3.2)
21. MRCA X. auromarginatus + X. ringens                       0.9/(0.3, 1.7)
22. MRCA Xanthichthys + “Pseudobalistes” + Balistapus +       9.1/(6.5, 12.5)
    Melichthys + Odonus + Balistoides
23. MRCA Balistapus + Odonus                                  5.7/(3.4, 8.1)
24. MRCA Odonus + Melichthys + Balistoides + Balistapus       7.7/(5.4, 10.7)
25. MRCA Balistoides + Melichthys                             1.7/(4.1, 9.0)
26. Crown Melichthys                                          2.5/(1.3, 4.1)
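For readers unfamiliar with the crown-group estimator of Magallon and Sanderson (2001) mentioned in this section, the pure-birth case (relative extinction of zero, as estimated here) reduces to r = [ln(n) - ln(2)]/t, where n is the number of extant species and t is the crown age. The sketch below implements only that special case. The function name is hypothetical, and the species count in the example is a placeholder rather than the value used in this study, so the output is not expected to reproduce the reported rate.

```python
import math


def crown_rate_pure_birth(n_species, crown_age):
    """Net diversification rate for a crown group with no extinction:
    r = (ln(n) - ln(2)) / t, starting from the two lineages at the crown node."""
    if n_species < 2:
        raise ValueError("a crown group needs at least two species")
    if crown_age <= 0:
        raise ValueError("crown age must be positive")
    return (math.log(n_species) - math.log(2)) / crown_age


# Placeholder species count; the crown age is the median estimate from Table 6.
print(crown_rate_pure_birth(n_species=40, crown_age=11.3))
```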
DISCUSSION
Despite often substantial uncertainty surrounding
model choice and frequently overparameterized substitution models, our analysis revealed that model-averaging had only a modest influence on inference of
triggerfish phylogeny and clade support. This suggests
that although the widespread use of GTR substitution
models in phylogenetics is probably not statistically justified, phylogenetic inference is likely robust to both overparameterization of and uncertainty surrounding the
substitution model.
FIGURE 2. Maximum likelihood consensus tree estimated by single-model and model-averaged analysis. Single-model bootstrap support
greater than 90 shown above each branch, model averaged support is depicted below. Clade numbers (1 to 5) represent the major clades referenced
in the text.
FIGURE 3. Fifty percent majority-rule consensus chronogram resulting from the concatenated data set. Posterior probabilities greater than
0.95 are indicated at each node by squares. Branch lengths are in units of time corresponding to upper and lower scale bars. Upper scale bar
marks major geological intervals of interest, lower scale bar displays time (Ma) since present. All numbered nodes correspond to Table 6, where
ages and 95% HPD are given. Nodes 1 (insert), 2, and 3 correspond with calibration points referenced in the text (Table 3).
FIGURE 4. Linear regressions of model-averaged posteriors compared to those inferred in MrBayes for six gene partitions: (a) expected regression of two identical runs in GTR + G only, (b) expected regression of two identical model-averaged analyses, (c) Rhodopsin, (d) concatenated
molecular data set, (e) 12S, (f) RAG1. Posteriors include all possible bipartitions, analyzed using the comparetree command in MrBayes.
TABLE 7. Kendall-Moran estimates of speciation rate (λ). Median
divergence time estimates for selected Balistid nodes (with 95% HPD).
Node numbers correspond to numbered nodes in Figure 4.
Geological time division   KM estimate of λ   P-value
Middle Miocene             0.3213             0.3533
Late Miocene               0.1762             0.4308
Pliocene                   0.1136             0.6125
Pleistocene                0.16607            0.3729
In addition, our analyses provide the first species-level
phylogeny for triggerfishes and suggest several novel
hypotheses of their evolutionary history. Below we consider the value of model-averaging in phylogenetics as
well as the implications of our combined data phylogeny
and chronogram for the triggerfishes.
Model-Averaging
Our results are consistent with several previous studies that have shown topology to be generally robust to
model uncertainty (e.g., Posada and Crandall, 2001; Beier
et al., 2004; Nylander, 2004; Posada and Buckley, 2004).
Although it has been suggested that model-averaged
posterior probabilities differ from single-model posteriors (Beier et al., 2004), our statistical analyses suggest that
these differences are not likely to be significant. Thus,
with respect to substitution parameters, both topology
and clade support appear to be relatively robust to
the uncertainty surrounding model choice. One reason for this may be that the current practice of selecting models from a relatively small fraction of the
possible substitution models still leads investigators to
reasonably good models for their data. In our study,
even though the GTR model did not receive
the highest posterior probability for every data partition
(Table 4), it frequently appeared within the 95% credible
set of models.
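To make the notion of a 95% credible set of models concrete, the sketch below ranks models by posterior probability and keeps adding them until the cumulative probability reaches the chosen level. The function name, model labels, and probabilities are hypothetical and serve only to show the bookkeeping.

```python
def credible_set(posteriors, level=0.95):
    """Smallest set of top-ranked models whose posterior probabilities
    sum to at least `level`."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for model, probability in ranked:
        chosen.append(model)
        cumulative += probability
        if cumulative >= level:
            break
    return chosen


# Hypothetical posterior probabilities for four substitution models.
example = {"unnamed model": 0.57, "GTR+G": 0.30, "HKY+G": 0.10, "JC": 0.03}
print(credible_set(example))
```

In this invented example GTR+G is not the most probable model but still falls inside the credible set, which is the situation described above.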
Previously, Beier et al. (2004) noted visual differences
between low model-averaged clade support values and
those conditioned on a single model. As other studies have suggested (Huelsenbeck and Rannala, 2004;
Lemmon and Moriarty, 2004), slight overparameterization does not seem to cause substantial problems in
phylogenetic inference. However, the effects of gross
overparameterization on phylogenetic inference of posterior probabilities and other parameters have not been
systematically addressed and we suggest that the assessments of model adequacy and uncertainty are appropriate for phylogenetic statistical studies of complex
data sets. Although we witnessed differences of clade
support values of as much as 18% for poorly supported
nodes in some of our data sets, none of these results were
significant. The Wilcoxon signed-rank test is conservative by nature, and we do not know how qualitatively
different PPs could be without yielding a significant
result.
Given that many analyses are potentially overparameterizing with respect to substitution model (see
Kelchner and Thomas, 2006), our results should not be
taken to mean that accommodation of model uncertainty
is not helpful. The method may provide more reliable estimates of branch lengths or of substitution rates for studies specifically focused on these parameters. Indeed, we
observed qualitative differences between branch lengths
estimated by single-model and model-averaged analyses for data sets used in this study (analysis by A.D.
and M.E.A.). Furthermore, reversible-jump algorithms
could allow for averaging across more divergent models.
Accommodating uncertainty surrounding partitioning
strategies, for example, might have a profound influence
on topology and clade posteriors.
Phylogenetic Relationships of the Balistidae
Triggerfish represent one of the most conspicuous
components of the diurnal coral reef fauna worldwide,
yet until now little was known about their interspecific
phylogenetic relationships. Our results conflict with several prior phylogenetic hypotheses concerning the oldest
lineages of the balistids. The deeply nested placement
and young age of Melichthys refutes Holcroft’s (2005)
analysis, which proposed Melichthys niger to be sister to
the rest of the triggerfish. Our analyses refute Tyler’s
(1980) hypothesis that Balistapus represents one of the
oldest lineages of the triggerfish and instead suggest
that the absence of a preopercular groove represents a
secondary loss. The long branches leading to Canthidermis and Abalistes tentatively support Matsuura’s (1979)
proposal that these taxa represent the oldest extant lineages of
the balistids, though incomplete sampling prevented us
from obtaining a crown age estimate for either lineage.
Our results also contradict Tyler’s (1980) suggestion that
Xanthichthys and Odonus are sister groups and suggest
instead that a strongly supraterminal mouth and second medial teeth have evolved independently in each
of these lineages. Additionally, the sister relationship between Sufflamen and Rhinecanthus recovered by our analysis was originally proposed by Matsuura (1979) and
suggests that the modified interhyal observed in these
lineages either arose in a recent common ancestor or was
lost in other triggerfish clades. We recovered an early
split separating a clade of Balistes (+ Pseudobalistes fuscus)
from the remaining triggerfish.
Our results indicate that three currently recognized
genera—Balistoides, Balistes, and Pseudobalistes—are nonmonophyletic and in need of nomenclatural revision.
We propose a taxonomic regression of Pseudobalistes fuscus to Balistes fuscus (Bloch and Schneider, 1801) to retain the monophyly of the genus Balistes. To resolve
the polyphyly of Balistoides, we suggest a revision from
Balistoides viridescens (Fraser-Brunner, 1935) to Pseudobalistes viridescens. This is a classification originally proposed in Bloch and Schneider (1801) that is strongly
supported by our data and retains the monophyly of
the genus Pseudobalistes. This classification also identifies Balistoides to be a monotypic genus comprising B.
conspicillum, a genetically unique member of the family
Balistidae.
Divergence Times and Diversification
Although we use the same calibrations in our analysis
as were used in a recent study of all tetraodontiforms
(Alfaro et al., 2007), we recover a slightly younger crown
group age for triggerfish (originating 11.3 Ma here versus 25 Ma in that study).
This discrepancy is not significant, however, as the 95%
highest posterior density intervals overlap between the
two studies. Our estimates of an earlier appearance are
still in accord with the known balistid fossil record (Tyler
and Santini, 2002; Schultz, 2004) and also correspond to
the origins of several other species of marine fish and invertebrates (e.g., Streelman et al., 2002; Klanten et al., 2004;
Barber and Bellwood, 2005; Read et al., 2006; Wallace and
Rosen, 2006). We consider our estimate of the triggerfish
crown age to be inconsistent with a Jurassic/Cretaceous
split between the filefish and triggerfish (Yamanoue et al.,
2006), as this would imply a 100 Myr long fuse between
the origin of the triggerfishes and their subsequent diversification. This discrepancy is expected and described in
Alfaro et al. (2007), who argue that some of the calibration points used by Yamanoue et al. (2006) are fossils
that have been erroneously dated. For example, the oldest gadiform fossil is 61 million years old and not 161 as
stated by Yamanoue et al. (2006). Additionally, some of
the other calibration points used by Yamanoue and colleagues are secondary calibration points recovered from
the mammal/bird split. For these reasons we argue that
the available tetraodontiform fossil data more strongly
favors the younger stem and crown balistid ages recovered in our analysis than those presented by Yamanoue
et al. (2006).
Our chronogram reveals that many extant lineages are
very young. Almost all triggerfish clades had formed by
the Middle/Late Miocene, yet 19 out of the 26 taxa sampled have their origins in the Pliocene/Pleistocene.
Although visual inspection might suggest a recent elevation in the global diversification rate of triggerfish,
statistical analysis shows that this trend is not significant. Instead, our results suggest that triggerfish
as a whole did not experience elevated rates of diversification during paleoclimatic events associated with
the Pliocene/Pleistocene as has been suggested in other
groups (Taylor and Dodson, 1994; Palumbi et al., 1997;
LaJeunesse, 2005; Floeter et al., 2007) but that their speciation rates may have stayed more consistent for alternate, and as yet unknown, reasons. Similarly, triggerfish subclades with crown ages in the Pliocene and
Pleistocene are not diversifying more quickly than other
subclades. This result underscores the need for rigorous
statistical testing of macroevolutionary patterns, as visual inspection alone may not be enough to deduce patterns of change in diversification rate. Further ecological
correlates of these results are difficult to explore. Despite
the conspicuous nature of the balistids, robust published
studies of the group’s ecology are disappointingly sparse
(but see Bean et al., 2002), and further studies are needed
before we are able to understand the ecological correlates underlying the group’s diversification. We propose
that the triggerfish may be a model group with which to
study macroevolutionary patterns in marine fish, given
the young age of the group, its circumglobal distribution,
the ecological dependency on reefs of most of its members, and the availability of several well-preserved fossils. Further sampling may reveal novel morphological
innovations or shed light on novel patterns of diversification that may be correlated with historical biogeographic
hypotheses, including historical changes in currents and
reef ecology as a result of climatic fluctuations during the
Pliocene.
ACKNOWLEDGMENTS
We are incredibly grateful to all the people and institutions that have
contributed to this work. We would especially like to thank Lindsay
Godfrey for all the help sequencing, Chad Brock for help with diversification statistics, and Devin Drown for the help designing primers.
We would also like to thank Hugo Alamillo, Barbara Banbury, Magnus
Wood, and the rest of the Alfaro lab for their constant support during this project. This project would not have been possible without the
tissue loans from Peter Wainwright at UC Davis, Jeffrey Hunt at the
Smithsonian Institution, Mark McGrouther at the Australian Museum,
and John Friel at the Cornell University Museum of Vertebrates. A.D. received support for this project from a Washington State University Undergraduate Research Grant in Zoology, a WSU Center for Integrated
Biotechnology Fellowship, and an NSF Undergraduate Research in Biology and Mathematics Fellowship (UBM 0531870). Additional support was provided by an NSF ITR grant (EB0336148) and by NSF DEB
0445453 to M.E.A.
REFERENCES
Akaike, H. 1973. Information theory as an extension of the maximum
likelihood principle. Pages 267–281 in Second Annual Symposium
on Information Theory (B. N. Petrov, and F. Csaki, eds.). Akademi
Kiado, Budapest.
Alfaro, M. E., C. D. Brock, and F. Santini. 2007. Do reefs drive diversification in marine fish? Examples from the pufferfishes and their
allies. Evolution 61:2104–2126.
Alfaro, M. E., and J. P. Huelsenbeck. 2006. Comparative performance
of Bayesian and AIC-based measures of phylogenetic model uncertainty. Syst. Biol. 55:89–96.
Baldwin, B. G., and M. J. Sanderson. 1998. Age and rate of diversification of the Hawaiian silversword alliance (Compositae). Proc. Natl.
Acad. Sci. USA 95:9402–9406.
Barber, P. H., and D. R. Bellwood. 2005. Biodiversity hotspots: Evolutionary origins of biodiversity in wrasses (Halichoeres: Labridae)
in the Indo-Pacific and new world tropics. Mol. Phylogenet. Evol.
35:235–253.
Bean, K., G. P. Jones, and M. J. Caley. 2002. Relationships among distribution, abundance, and microhabitat specialization in a guild of coral
reef triggerfish (family Balistidae). Mar. Ecol. Prog. Ser. 233:263–272.
Beier, B. A., J. A. A. Nylander, M. W. Chase, and M. Thulin. 2004. Phylogenetic relationships and biogeography of the desert plant genus Fagonia (Zygophyllaceae), inferred by parsimony and Bayesian model
averaging. Mol. Phylogenet. Evol. 33:91–108.
Brandley, M., A. Schmitz, and T. W. Reeder. 2005. Partitioned Bayesian
analyses, partition choice, and phylogenetic relationships of scincid
lizards. Syst. Biol. 54:373–390.
Buckley, T. R., P. Arensburger, C. Simon, and G. K. Chambers. 2002.
Combined data, Bayesian phylogenetics, and the origin of the New
Zealand cicada genera. Syst. Biol. 51:4–18.
Burnham, K. P., and D. R. Anderson. 2003. Model selection and
multimodel inference, a practical information-theoretic approach.
Springer, New York.
Chen, T. C., R. F. G. Ormond, and H. K. Mok. 2001. Feeding and territorial behavior in juveniles of three co-existing triggerfishes. J. Fish
Biol. 59:524–532.
Chen, W. J., C. Bonillo, and G. Lecointre. 2003. Repeatability of clades
as a criterion of reliability: A case study for molecular phylogeny of
Acanthomorpha (Teleostei) with larger number of taxa. Mol. Phylogenet. Evol. 26:262–288.
Clements, K. D., M. E. Alfaro, J. Fessler, and M. W. Westneat. 2004.
Relationships of the temperate Australasian labrid fish tribe Odacini.
Mol. Phylogenet. Evol. 32:575–587.
Drummond, A. J., S. Y. W. Ho, M. J. Phillips, and A. Rambaut. 2006.
Relaxed phylogenetics and dating with confidence. PLoS Biol. 4. e88.
Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
Floeter, S. R., L. A. Rocha, D. R. Robertson, J. C. Joyeux, W. F. Smith-Vaniz, P. Wirtz, A. J. Edwards, J. P. Barreiros, C. E. L. Ferreira, J. L.
Gasparini, A. Brito, J. M. Falcon, B. W. Bowen, and G. Bernardi. 2007.
Atlantic reef fish biogeography and evolution. J. Biogeogr. 35:22–47.
Fraser-Brunner, A. 1935. Notes on the Plectognath fishes. I. A synopsis of the genera of the family Balistidae. Ann. Mag. Nat. Hist. Ser.
10:658–663.
Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation
using multiple sequences. Stat. Sci. 7:457–511.
Green, P. J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–732.
Harmon, L., J. Weir, C. D. Brock, and W. Challenger. 2008. GEIGER:
Investigating evolutionary radiations. Bioinformatics 24:129–131.
Hoeting, J. A., D. Madigan, A. E. Raftery, and C. T. Volinsky. 1999.
Bayesian model averaging: A tutorial. Stat. Sci. 14:382–417.
Holcroft, N. I. 2005. A molecular analysis of the interrelationships of
tetraodontiform fishes (Acanthomorpha: Tetraodontiformes). Mol.
Phylogenet. Evol. 34:525–544.
Huelsenbeck, J. P., and D. M. Hillis. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247–264.
Huelsenbeck, J. P., B. Larget, and M. E. Alfaro. 2004. Bayesian phylogenetic model selection using reversible jump Markov chain Monte
Carlo. Mol. Biol. Evol. 21:1123–1133.
Huelsenbeck, J. P., and B. Rannala. 2004. Frequentist properties of
Bayesian posterior probabilities of phylogenetic trees under simple
and complex substitution models. Syst. Biol. 53:904–913.
Kelchner, S. A., and M. A. Thomas. 2006. Model use in phylogenetics:
Nine key questions. Trends Ecol. Evol. 22:87–94.
Klanten, S. O., L. van Herwerden, J. H. Choat, and D. Blair. 2004. Patterns of lineage diversification in the genus Naso (Acanthuridae).
Mol. Phylogenet. Evol. 32:221–235.
Kuiter, R. H., and H. Debelius. 2006. World atlas of marine fishes. Hollywood Import and Export, Inc., Frankfurt.
LaJeunesse, T. C. 2005. “Species” radiations of symbiotic dinoflagellates
in the Atlantic and Indo-Pacific since the Miocene-Pliocene transition. Mol. Biol. Evol. 22:570–581.
Lee, M. S. Y., and A. F. Hugall. 2006. Model type, implicit data weighting, and model averaging in phylogenetics. Mol. Phylogenet. Evol.
38:848–857.
Leis, J. M. 1984. Tetraodontiformes: Relationships. Pages 459–463 in
Ontogeny and systematics of fishes (H. G. Moser, W. J. Richards, D.
M. Cohen, M. P. Fahay, A. W. Kendall Jr., and S. L. Richardson, eds.).
Amer. Soc. Ichthyol. Herp. Lawrence, Kansas.
Lemmon, A. R., and E. C. Moriarty. 2004. The importance of proper
model assumption in Bayesian phylogenetics. Syst. Biol. 53:265–277.
Magallon, S., and M. J. Sanderson. 2001. Absolute diversification rates
in angiosperm clades. Evolution 55:1762–1780.
Matsuura, K. 1979. Phylogeny of the superfamily Balistoidea (Pisces:
Tetraodontiformes). Memoirs of the Faculty of Fisheries, Hokkaido
University 26:49–149.
Matsuura, K. 1981. Xenobalistes tumidipectoris, a new genus and
species of triggerfish (Tetraodontiformes, Balistidae) from the Marianas Islands. Bull. Natl. Sci. Mus. Ser. A 7:191–200.
Nee, S. 2001. Inferring speciation rates from phylogenies. Evolution
55:661–668.
Nee, S., A. O. Mooers, and P. H. Harvey. 1992. Tempo and mode of
evolution revealed from molecular phylogenies. Proc. Natl. Acad.
Sci. USA 89:8322–8326.
Nylander, J. A. A. 2004. Bayesian phylogenetics and the evolution
of gall wasps. Comprehensive summaries of Uppsala dissertations
from the Faculty of Science and Technology, 937. University of
Uppsala, Sweden.
Ormond, R. F. G., A. C. Campbell, S. M. Head, R. J. Moore, P. S. Rainbow,
and A. P. Sanders. 1973. Formation and breakdown of aggregations
of the crown-of-thorns starfish Acanthaster planci (L.) in the Red Sea.
Nature 246:167–169.
Pagel, M., and A. Meade. 2006. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte
Carlo. Am. Nat. 167:808–825.
Palumbi, S. R., G. Grabowsky, T. Duda, L. Geyer, and N. Tachino. 1997.
Speciation and population genetic structure in tropical Pacific Sea
urchins. Evolution 51:1506–1517.
Paradis, E., J. Claude, and K. Strimmer. 2004. APE; analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290.
Posada, D. 2008. jModelTest: Phylogenetic model averaging. Mol. Biol.
Evol. 25:1253–1256.
Posada, D., and T. R. Buckley. 2004. Model selection and model averaging in phylogenetics: Advantages of AIC and Bayesian approaches
over likelihood ratio tests. Syst. Biol. 53:793–808.
Posada, D., and K. A. Crandall. 1998. ModelTest: Testing the model of
DNA substitution. Bioinformatics 14:817–818.
Posada, D., and K. A. Crandall. 2001. Selecting the best-fit model of
nucleotide substitution. Syst. Biol. 50:580–601.
Pybus, O. G., and P. H. Harvey. 2000. Testing macro-evolutionary models using incomplete molecular phylogenies. Proc. R. Soc.
Lond. B 267:2267–2272.
Pybus, O. G., A. Rambaut, E. C. Holmes, and P. H. Harvey. 2002. New
Inferences from tree shape: Number of missing taxa and population
growth rates. Syst. Biol. 51:881–888.
Rambaut, A. 1996. Se-Al. Available at http://beast.bio.ed.ac.uk/
software/seal
Rambaut, A., and A. J. Drummond. 2007. Tracer v1.4. Available from
http://beast.bio.ed.ac.uk/tracer
Read, C. I., D. R. Bellwood, and L. van Herwerden. 2006. Ancient origins of Indo-Pacific coral reef fish biodiversity: A case study of the
leopard wrasses (Labridae: Macropharyngodon). Mol. Phylogenet.
Evol. 38:808–819.
Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.
Rosen, D. E. 1984. Zeiformes as primitive plectognath fishes. Am. Mus.
Novit. 2782:1–45.
Saiki, R. S. 1990. Amplification of genomic DNA. Pages 13–20 in PCR
protocols (M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White,
eds.). Academic Press, San Diego, California.
Santini, F., and J. C. Tyler. 2003. A phylogeny of the families of fossil and
extant tetraodontiform fishes (Acanthomorpha, Tetraodontiformes),
Upper Cretaceous to recent. Zool. J. Linn. Soc. 139:565–617.
Santini, F., and J. C. Tyler. 2004. The importance of even highly incomplete fossil taxa in reconstructing the phylogenetic relationships of
the Tetraodontiformes (Acanthomorpha: Pisces). Integr. Comp. Biol.
44:349–357.
Schultz, O. 2004. A triggerfish (Osteichthyes: Balistidae: Balistes) from
the Badenian (Middle Miocene) of the Vienna and the Styrian
Basin (Central Paratethys). Ann. Naturhist. Mus. Wien 106A:345–369.
Simmons, M. P., K. M. Pickett, and M. Miya. 2004. How meaningful are
Bayesian support values? Mol. Biol. Evol. 21:188–199.
Streelman, J. T., M. Alfaro, M. W. Westneat, D. R. Bellwood, and S. A.
Karl. 2002. Evolutionary history of the parrotfishes: Biogeography,
ecomorphology, and comparative diversity. Evolution 56:961–971.
Sullivan, J., and P. Joyce. 2005. Model selection in phylogenetics. Annu.
Rev. Ecol. Evol. Syst. 36:445–466.
Sullivan, J., and D. L. Swofford. 1997. Are guinea pigs rodents? The
importance of adequate models in molecular phylogenetics. J. Mammal. Evol. 2:77–86.
Swofford, D. L. 2003. PAUP* 4.00: Phylogenetic analysis using parsimony (*and other methods). Version 4.0. Sinauer Associates, Sunderland, Massachusetts.
Taylor, E. B., and J. J. Dodson. 1994. A molecular analysis of relationships and biogeography within a species complex of Holarctic fish
(genus Osmerus). Mol. Ecol. 3:235–248.
Tyler, J. C. 1980. Osteology, phylogeny, and higher classification of the
fishes of the order Plectognathi (Tetraodontiformes). NOAA Technical
Report NMFS Circular 431:1–422.
Tyler, J. C., and F. Santini. 2002. Review and reconstruction of the
tetraodontiform fishes from the Eocene of Monte Bolca, Italy, with
comments on related Tertiary taxa. Studi e Ricerche sui Giacimenti
Terziari di Bolca, Museo Civico di Storia Naturale di Verona 9:47–119.
Wainwright, P. C., and J. P. Friel. 2000. Effects of prey type on motor pattern variance in tetraodontiform fishes. J. Exp. Zool. 286:563–571.
Wallace, C. C., and B. R. Rosen. 2006. Diverse staghorn corals (Acropora) in high-latitude Eocene assemblages: Implications for the evolution of modern diversity patterns of reef corals. Proc. R. Soc. B
273:975–982.
Walsh, P. S., D. A. Metzger, and R. Higuchi. 1991. Chelex 100 as a
medium for simple extraction of DNA for PCR-based typing from
forensic material. Biotechniques 10:506–513.
Westneat, M. W., and M. E. Alfaro. 2005. Phylogenetic relationships
and evolutionary history of the reef fish family Labridae. Mol. Phylogenet. Evol. 36:370–390.
Winterbottom, R. 1974. The familial phylogeny of the Tetraodontiformes (Acanthopterygii: Pisces) as evidenced by their comparative
myology. Smithson. Contrib. Zool. 155:1–201.
Yamanoue, Y., M. Miya, J. G. Inoue, K. Matsuura, and M. Nishida. 2006.
The mitochondrial genome of spotted green pufferfish Tetraodon
nigroviridis (Teleostei: Tetraodontiformes) and divergence time
estimation among model organisms in fishes. Genes Genet. Syst.
81:29–39.
Yang, Z., and B. Rannala. 2006. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations
with soft bounds. Mol. Biol. Evol. 23:212–226.
First submitted 21 September 2007; reviews returned 13 November 2007;
final acceptance 4 September 2008
Associate Editor: Frank Anderson