Donovan Parks and Robert Beiko geophylogeny of the 2009 H1N1

Investigating Viral and Microbial Biogeography
using GenGIS
Donovan Parks and Robert Beiko
biodiversity of marine microbes
introduction
Advances in DNA sequencing have provided new insights into the biogeography of organisms and
populations. Thousands of georeferenced genome sequences are now available for viral pathogens
such as Influenza A, while metagenomic studies such as the Global Ocean Sampling (GOS)
expedition are providing microbial biodiversity data from distinct habitats and locations. Our software
package, GenGIS, integrates digital maps with gene sequence, habitat, and hierarchical data in order
to allow users to visualize and analyze the interplay between these sources of information.
hypothesis testing
Visualizations of tree structures within a cartographic display allow the influence of geographic and
environmental gradients on biodiversity to be studied. Two alternative hypotheses of the influence of
geography on the evolutionary history of genetic samples taken along a stream are depicted below. A
strictly latitudinal gradient (left image) results in two crossings between the correlation lines
(shown in red). In contrast, a non-linear gradient (right image) that follows the path of the stream
results in zero crossings between the correlation lines and is therefore a more parsimonious explanation
of the inferred evolutionary history. Hypothesis testing of this nature can be done interactively in
GenGIS, with trees being automatically laid out to minimize the number of crossings between
correlation lines. The quantitative nature of this visualization also permits a permutation test to be
performed in order to assess whether a fit is significantly better than random.
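
As a rough illustration of the crossing count and permutation test described above, the Python sketch below treats a tree layout as an ordering of leaves and counts pairs of correlation lines whose order is inverted between the tree and the geographic axis. It is not GenGIS code. The nested-tuple tree representation, the function names, and the brute-force search over child swaps are all assumptions made for this example, and brute force is only practical for small binary trees.

```python
import random


def count_crossings(leaf_order, geo_rank):
    """Count crossing correlation lines for one fixed tree layout.

    leaf_order: leaves in the order they appear along the tree layout line.
    geo_rank:   dict mapping each leaf to its position along the geographic
                layout line (hypothetical input format).
    Two lines cross exactly when their order differs on the two axes.
    """
    ranks = [geo_rank[leaf] for leaf in leaf_order]
    return sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )


def layouts(tree):
    """Yield every leaf ordering reachable by swapping children.

    A tree is either a leaf name or a 2-tuple of subtrees (an assumption of
    this sketch; it does not handle multifurcating nodes).
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for left_order in layouts(left):
        for right_order in layouts(right):
            yield left_order + right_order
            yield right_order + left_order


def min_crossings(tree, geo_rank):
    """Smallest crossing count over all layouts (exhaustive search)."""
    return min(count_crossings(order, geo_rank) for order in layouts(tree))


def permutation_p_value(tree, geo_rank, n_perm=1000, seed=0):
    """Estimate how often a random leaf-to-location assignment fits as well.

    Returns (observed minimum crossings, estimated p-value).
    """
    rng = random.Random(seed)
    observed = min_crossings(tree, geo_rank)
    leaves = list(geo_rank)
    ranks = [geo_rank[leaf] for leaf in leaves]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)  # break any link between phylogeny and geography
        if min_crossings(tree, dict(zip(leaves, ranks))) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value from `permutation_p_value` would suggest that the observed fit between the tree and the chosen geographic axis is better than expected by chance. Testing a different hypothesis, such as a path that follows the stream, only requires a different `geo_rank`.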
Figure labels: Correlation Lines, Samples
GenGIS was used to investigate 19 marine metagenomes collected as part of the GOS expedition (Rusch
et al., 2007). These samples cover a wide latitudinal gradient extending from the Panama Canal to
Halifax, Nova Scotia (~9°N to 45°N) and a range of habitat types (see table below). Using 16S
ribosomal DNA genes from these samples as indicators of their taxonomic composition, the ‘species’
richness and phylogenetic beta-diversity of these samples can be explored.
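
Richness here is the number of unique 16S sequences in a sample, and a latitudinal trend can be checked with an ordinary least-squares fit of richness against latitude. The Python sketch below shows both steps. The function names and the toy sequences and latitudes are invented for illustration and are not GOS data. A real analysis would also test the slope for significance, which this sketch omits.

```python
from statistics import mean


def richness(sequences):
    """Number of distinct sequences observed in one sample."""
    return len(set(sequences))


def least_squares(xs, ys):
    """Return (slope, intercept) of the line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical toy input: latitude (degrees north) -> sequences in that sample.
toy_samples = {
    10.0: ["ACGT", "ACGA", "ACGT", "TTGA", "CCGA"],
    25.0: ["ACGT", "ACGA", "ACGT", "TTGA"],
    40.0: ["ACGT", "ACGT", "ACGA"],
}

latitudes = list(toy_samples)
richness_values = [richness(seqs) for seqs in toy_samples.values()]
slope, intercept = least_squares(latitudes, richness_values)
```

With these toy values the fitted slope is negative, meaning richness falls as latitude increases. As the poster notes, such a fit alone cannot separate geographic effects from habitat effects.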
The image above depicts the number of unique 16S sequences within each sample. A linear
regression of this data provides statistical evidence for a latitudinal gradient of ‘species’ richness.
However, this analysis conflates geographic and habitat effects. The left figure below indicates the
phylogenetic beta-diversity between samples as determined using UniFrac (Lozupone and Knight,
2005). This visualization illustrates the relationship between community composition, geography, and
habitat type. In the figure at lower right, we reduce the tree to open ocean samples from the Northern
Atlantic and Caribbean Sea. A permutation test provides strong evidence that the samples from these
two open ocean habitats are distinct, but lack latitudinal or general spatial structuring within a
habitat.

Habitat type        Sample IDs    Salinity (ppt)   Temp. (°C)
Northern Atlantic   9 samples     30               14
Caribbean Sea       GS15-19       36               27
Estuaries           GS11, GS12    3 to 10          1 to 11
Bay of Fundy        GS06          25 to 31         11
Lake Gatun          GS20          0.1              29
Bedford Basin       GS05          30               15
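
To make the beta-diversity measure more concrete, the Python sketch below computes unweighted UniFrac as defined by Lozupone and Knight (2005): the fraction of branch length that leads to sequences from only one of two communities. The poster does not state which UniFrac variant was used, so the unweighted form is an assumption. The branch-list tree representation, the function name, and the toy tree are also invented for this example.

```python
def unweighted_unifrac(branches, community_a, community_b):
    """Unweighted UniFrac distance between two communities.

    branches:    list of (branch_length, set_of_leaves_below_branch) pairs,
                 a hypothetical flat encoding of a rooted tree.
    community_a: set of leaf names observed in the first sample.
    community_b: set of leaf names observed in the second sample.
    """
    unique_length = 0.0
    total_length = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & community_a)
        in_b = bool(leaves & community_b)
        if in_a or in_b:
            total_length += length  # branch is used by at least one sample
            if in_a != in_b:
                unique_length += length  # branch is used by exactly one sample
    return unique_length / total_length


# Toy tree ((s1:1, s2:1):1, (s3:1, s4:1):1), written as a list of branches.
toy_branches = [
    (1.0, {"s1"}),
    (1.0, {"s2"}),
    (1.0, {"s1", "s2"}),
    (1.0, {"s3"}),
    (1.0, {"s4"}),
    (1.0, {"s3", "s4"}),
]
```

Identical communities give a distance of 0, and communities that occupy entirely separate clades give 1. A matrix of such pairwise distances is the kind of input from which a tree of samples can be built.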
Figure labels: Tree Layout Line, Geographic Layout Line
Pie chart legend: Acidobacteriales, Actinobacteridae, Alphaproteobacteria, Betaproteobacteria,
Candidatus Microthrix, Chlorobia, Flavobacteria, Gammaproteobacteria, Prochlorales, Sphingobacteria,
Unclassified, Other (< 3%)
In the image above, the taxonomic composition of each sample is depicted as a pie chart. A distinctive
composition of bacteria is present within each habitat type:
• The similarity among Caribbean samples (GS015-GS019) can largely be attributed to the relatively
  high abundance of Prochlorales (Johnson et al., 2006).
• The Bay of Fundy (GS006) estuary has salinity levels similar to the surrounding open ocean samples
  and appears similar in composition to these open ocean communities.
• The Delaware (GS011) and Chesapeake (GS012) estuaries show significant overrepresentation of
  Actinobacteridae and Betaproteobacteria.
• Lake Gatun (GS020), the sole freshwater sample, is unusual with <50% Alphaproteobacteria and
  relatively high amounts of Acidobacteriales and Actinobacteridae.
• The human-impacted Bedford Basin (GS005) sample is atypical, possibly due to a total lack of
  Actinobacteridae along with a relatively high proportion of Betaproteobacteria.
GenGIS allows users to interactively explore subsets of their data. On the right, the relative
abundance of the five most significant taxonomic classes found within these samples is shown in
order to emphasize the above findings.
Legend (right figure): Acidobacteriales, Actinobacteridae, Betaproteobacteria, Gammaproteobacteria,
Prochlorales
geophylogeny of the 2009 H1N1 pandemic
GenGIS was used to explore the spatiotemporal dynamics of the recent H1N1 pandemic. A maximum
likelihood phylogeny of 203 complete isolates collected between March and August of 2009 was
inferred. The evolutionary history of 16 isolates within a well-supported subtree is depicted here as a
3D geophylogeny. The intermingling of isolates from different continents in the phylogeny provides
strong evidence of the global nature of this pandemic.
references
GenGIS references:
Website with executables, source code, video examples, and tutorials: http://kiwi.cs.dal.ca/GenGIS
Parks, D.H., Porter, M., et al. (2009). GenGIS: A geospatial information system for genomic data. Genome
Research, 19, 1896-1904.
Parks, D.H., MacDonald, N.J., and Beiko, R.G. (2009). Tracking the evolution and geographic spread of
Influenza A. PLoS Currents: Influenza, RRN1014.
Parks, D.H. and Beiko, R.G. (2009). Quantitative visualizations of hierarchically organized data in a
geographic context. Geoinformatics, Fairfax, VA.
Cited literature:
Johnson, Z.I., Zinser, E.R., et al. (2006). Niche partitioning among Prochlorococcus ecotypes along
ocean-scale environmental gradients. Science, 311, 1737-1740.
Lozupone, C. and Knight, R. (2005). UniFrac: a new phylogenetic method for comparing microbial communities.
Applied and Environmental Microbiology, 71, 8228-8235.
Rusch, D.B., Halpern, A.L., et al. (2007). The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic
through eastern tropical Pacific. PLoS Biology, 5, e77.
acknowledgements
The development of GenGIS has been supported by Genome Atlantic, the Natural Sciences and
Engineering Research Council of Canada, the Canada Foundation for Innovation and the Dalhousie
Faculty of Computer Science. DHP is supported by the Killam Trust and RGB is supported by the
Canada Research Chairs Program.