Donovan Parks and Robert Beiko geophylogeny of the 2009 H1N1
Investigating Viral and Microbial Biogeography using GenGIS
Donovan Parks and Robert Beiko

introduction
Advances in DNA sequencing have provided new insights into the biogeography of organisms and populations. Thousands of georeferenced genome sequences are now available for viral pathogens such as Influenza A, while metagenomic studies such as the Global Ocean Sampling (GOS) expedition are providing microbial biodiversity data from distinct habitats and locations. Our software package, GenGIS, integrates digital maps with gene sequence, habitat, and hierarchical data so that users can visualize and analyze the interplay between these sources of information.

hypothesis testing
Visualizing tree structures within a cartographic display makes it possible to study how geographic and environmental gradients influence biodiversity. The figure below depicts two alternative hypotheses for how geography has shaped the evolutionary history of genetic samples taken along a stream. A strictly latitudinal gradient (left image) results in two crossings between the correlation lines (shown in red). In contrast, a non-linear gradient that follows the path of the stream (right image) results in zero crossings between the correlation lines and is therefore a more parsimonious explanation of the inferred evolutionary history. Hypothesis testing of this kind can be done interactively in GenGIS, which automatically lays out trees to minimize the number of crossings between correlation lines. Because the number of crossings is a quantitative measure of fit, a permutation test can also be performed to assess whether a fit is significantly better than random.
[Figure: the two hypotheses for samples taken along a stream, with correlation lines linking each sample to its leaf in the tree. Labels: Correlation Lines, Samples.]

biodiversity of marine microbes
GenGIS was used to investigate 19 marine metagenomes collected as part of the GOS expedition (Rusch et al., 2007).
These samples cover a wide latitudinal gradient extending from the Panama Canal to Halifax, Nova Scotia (~9°N to 45°N) and a range of habitat types (see table below). Using the 16S ribosomal DNA genes from these samples as indicators of their taxonomic composition, the 'species' richness and phylogenetic beta-diversity of the samples can be explored. The image above depicts the number of unique 16S sequences within each sample. A linear regression of these data provides statistical evidence for a latitudinal gradient of 'species' richness. However, this analysis conflates geographic and habitat effects.

Habitat type        Sample IDs    Salinity (ppt)   Temp. (°C)
Northern Atlantic   9 samples     30               14
Caribbean Sea       GS15-19       36               27
Estuaries           GS11, GS12    3 to 10          1 to 11
Bay of Fundy        GS06          25 to 31         11
Lake Gatun          GS20          0.1              29
Bedford Basin       GS05          30               15

The left figure below shows the phylogenetic beta-diversity between samples as determined using UniFrac (Lozupone and Knight, 2005). This visualization illustrates the relationship between community composition, geography, and habitat type. In the right figure below, we reduce the tree to the open-ocean samples from the Northern Atlantic and Caribbean Sea. A permutation test provides strong evidence that the samples from these two open-ocean habitats are distinct but lack latitudinal or general spatial structuring within a habitat.
[Figure labels: Tree Layout Line, Geographic Layout Line.]

[Figure: pie charts of the taxonomic composition of each sample. Legend: Acidobacteriales, Actinobacteridae, Alphaproteobacteria, Betaproteobacteria, Candidatus Microthrix, Chlorobia, Flavobacteria, Gammaproteobacteria, Prochlorales, Sphingobacteria, Unclassified, Other (< 3%).]
In the image above, the taxonomic composition of each sample is depicted as a pie chart. A distinctive composition of bacteria is present within each habitat type:
• The similarity among the Caribbean samples (GS015-GS019) can largely be attributed to the relatively high abundance of Prochlorales (Johnson et al., 2006).
• The Bay of Fundy (GS006) estuary has salinity levels similar to the surrounding open-ocean samples and appears similar in composition to these open-ocean communities.
• The Delaware (GS011) and Chesapeake (GS012) estuaries are significantly overrepresented in Actinobacteridae and Betaproteobacteria.
• Lake Gatun (GS020), the sole freshwater sample, is unusual, with <50% Alphaproteobacteria and relatively high proportions of Acidobacteriales and Actinobacteridae.
• The human-impacted Bedford Basin (GS005) sample is atypical, possibly because of a complete absence of Actinobacteridae together with a relatively high proportion of Betaproteobacteria.

GenGIS allows users to interactively explore subsets of their data. On the right, the relative abundance of the five most significant taxonomic groups found within these samples is shown to emphasize the above findings.
[Figure: relative abundance of Acidobacteriales, Actinobacteridae, Betaproteobacteria, Gammaproteobacteria, and Prochlorales in each sample.]

geophylogeny of the 2009 H1N1 pandemic
GenGIS was used to explore the spatiotemporal dynamics of the recent H1N1 pandemic. A maximum likelihood phylogeny was inferred from 203 complete isolates collected between March and August 2009. The evolutionary history of 16 isolates within a well-supported subtree is depicted here as a 3D geophylogeny. The intermingling of isolates from different continents in the phylogeny provides strong evidence of the global nature of this pandemic.

references
GenGIS references:
Website with executables, source code, video examples, and tutorials: http://kiwi.cs.dal.ca/GenGIS
Parks, D.H., Porter, M., et al. (2009). GenGIS: A geospatial information system for genomic data. Genome Research, 19, 1896-1904.
Parks, D.H., MacDonald, N.J., and Beiko, R.G. (2009). Tracking the evolution and geographic spread of Influenza A. PLoS Currents: Influenza, RRN1014.
Parks, D.H. and Beiko, R.G. (2009). Quantitative visualizations of hierarchically organized data in a geographic context. Geoinformatics, Fairfax, VA.

Cited literature:
Johnson, Z.I., Zinser, E.R., et al. (2006). Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science, 311, 1737-1740.
Lozupone, C. and Knight, R. (2005). UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology, 71, 8228-8235.
Rusch, D.B., Halpern, A.L., et al. (2007). The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biology, 5, e77.

acknowledgements
The development of GenGIS has been supported by Genome Atlantic, the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation, and the Dalhousie Faculty of Computer Science. DHP is supported by the Killam Trust and RGB is supported by the Canada Research Chairs Program.
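The crossing count and permutation test described under hypothesis testing can be sketched in a few lines. This is a minimal sketch, not GenGIS's implementation: it assumes a binary tree written as nested 2-tuples of sample IDs (a hypothetical representation), a single number per sample giving its position along the geographic layout line, and a null model in which sample locations are shuffled among the leaves. Two correlation lines cross exactly when their order along the tree layout line disagrees with their order along the geographic layout line, so the crossing count is an inversion count. The names `leaf_orders`, `crossings`, `min_crossings`, and `permutation_test` are illustrative, and the brute-force search over child flips is exponential in the number of leaves, so it is only suitable for toy trees.

```python
import itertools
import random
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical tree representation: a leaf is a sample ID string and an
# internal node is a 2-tuple (left, right). Branch lengths are not needed here.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_orders(tree: Tree) -> Iterator[List[str]]:
    """Yield every left-to-right leaf order reachable by swapping children."""
    if isinstance(tree, str):
        yield [tree]
        return
    left, right = tree
    for a in leaf_orders(left):
        for b in leaf_orders(right):
            yield a + b  # children in their original order
            yield b + a  # children swapped


def crossings(order: List[str], geo_pos: Dict[str, float]) -> int:
    """Count crossing correlation lines: pairs of samples whose order along
    the tree layout line disagrees with their order along the geographic one."""
    return sum(
        1
        for first, second in itertools.combinations(order, 2)
        if geo_pos[first] > geo_pos[second]
    )


def min_crossings(tree: Tree, geo_pos: Dict[str, float]) -> int:
    """Brute-force the child flips that minimize crossings (small trees only)."""
    return min(crossings(order, geo_pos) for order in leaf_orders(tree))


def permutation_test(tree: Tree, geo_pos: Dict[str, float],
                     n_perm: int = 999, seed: int = 0) -> Tuple[int, float]:
    """Compare the best layout for the real locations with the best layouts
    obtained after randomly shuffling locations among the samples."""
    rng = random.Random(seed)
    observed = min_crossings(tree, geo_pos)
    ids = list(geo_pos)
    positions = list(geo_pos.values())
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(positions)
        shuffled = dict(zip(ids, positions))
        if min_crossings(tree, shuffled) <= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    # Made-up illustrative positions, not data from the poster.
    along_stream = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
    latitude = {"A": 10.0, "B": 30.0, "C": 20.0, "D": 40.0}
    print(min_crossings(tree, along_stream))  # 0
    print(min_crossings(tree, latitude))      # 1
```

With only four leaves, a random shuffle reaches zero crossings fairly often, so the permutation test cannot report a significant fit here; the test becomes informative only as the number of samples grows.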
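The phylogenetic beta-diversity used for the marine samples comes from UniFrac (Lozupone and Knight, 2005), which measures the fraction of branch length in a shared tree that leads to sequences from one community but not the other. The poster does not say whether the weighted or unweighted variant was used; the sketch below assumes the unweighted form and uses a hypothetical `Node` class and `unweighted_unifrac` function rather than any GenGIS or library API. For real analyses an established, tested implementation would be preferable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch above the node."""
    length: float = 0.0
    name: Optional[str] = None  # set on leaves only
    children: List[Node] = field(default_factory=list)


def unweighted_unifrac(root: Node, env_a: Set[str], env_b: Set[str]) -> float:
    """Unique branch length divided by observed branch length.

    A branch is 'observed' if it leads to at least one sequence from either
    community and 'unique' if it leads to sequences from only one of them.
    The root has no parent branch, so its length is ignored.
    """
    unique = 0.0
    observed = 0.0

    def leaves_below(node: Node) -> Set[str]:
        nonlocal unique, observed
        if not node.children:
            return {node.name}
        below: Set[str] = set()
        for child in node.children:
            child_leaves = leaves_below(child)
            in_a = bool(child_leaves & env_a)
            in_b = bool(child_leaves & env_b)
            if in_a or in_b:
                observed += child.length
                if in_a != in_b:
                    unique += child.length
            below |= child_leaves
        return below

    leaves_below(root)
    return unique / observed if observed else 0.0


if __name__ == "__main__":
    # Toy tree ((a:1, b:1):1, c:2); sequence names are made up.
    tree = Node(children=[
        Node(length=1.0, children=[Node(1.0, "a"), Node(1.0, "b")]),
        Node(2.0, "c"),
    ])
    print(unweighted_unifrac(tree, {"a", "b"}, {"c"}))       # 1.0 (no shared branches)
    print(unweighted_unifrac(tree, {"a", "c"}, {"b", "c"}))  # 0.4
```

A value of 0 means the two communities occupy exactly the same branches, and 1 means they share no branch length at all; computing this for every pair of samples gives the distance matrix from which a sample tree like the one in the UniFrac figure can be built.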