GARBAGE IN = GARBAGE OUT

USER RESPONSIBILITY
GARBAGE IN = GARBAGE OUT
Each step relies on the accuracy of the previous steps.
Just because you get an answer does not make it right:
•  Appropriate test?
•  Correct parameters?
•  Applicable dataset?

ANALYSIS PIPELINE
Multiple Alignment: CLUSTALW, T-COFFEE, MAFFT, MUSCLE, PROBCONS
Visualization & Adjustment: GENEDOC, JALVIEW
Format Input Data: FASTA, PHYLIP, NEXUS, Newick
Phylogenetics: Methods: Distance Matrix, Max Parsimony, Max Likelihood. Programs: PHYLIP, RAxML, MrBayes
Evolutionary Analyses: r8s, PAML, BEAST, Multidivtime

ALIGNMENT PROGRAMS
ClustalW (1994) http://www.ebi.ac.uk/Tools/msa/clustalw2/
Uses a progressive multiple alignment. Parameters (e.g. gap penalties) are adjusted according to the input (i.e. divergence, length, local hydropathy, etc.).
T-Coffee (2000) http://igs-server.cnrs-mrs.fr/Tcoffee/
Performs pairwise local and global alignments, then combines them in a progressive multiple alignment.
MAFFT (2002) http://mafft.cbrc.jp/alignment/server/
Detects local homologous regions by Fast Fourier Transform (considers amino acid size & polarity), then uses a restricted global DP and a progressive algorithm with horizontal refinement.
MUSCLE (2004) http://www.drive5.com/muscle
Uses k-mer distances and log-expectation scores, with progressive and horizontal refinement.
PROBCONS (2005) http://probcons.stanford.edu (<30 taxa**)
Uses pairwise consistency based on an objective function.

COMPARISON OF ALIGNMENT PROGRAMS
[Figures: ALIGNMENT: CLUSTALW; ALIGNMENT: MUSCLE; ALIGNMENT: MAFFT]

ALIGNMENT VIEWERS/MANIPULATORS
GENEDOC http://www.nrbsc.org/gfx/genedoc/
Program Description: A full-featured multiple sequence alignment editor, analyser and shading utility for Windows.
Platform: Windows
Input: Amino acid and nucleotide FASTA, Clustal (.aln), Phylip, PIR, GCG (.msf), and GenBank formats.
Output: Default is .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.

JALVIEW http://www.jalview.org/
Program Description: Jalview is a multiple alignment editor written in Java. It is used widely in a variety of web pages but is available as a general purpose alignment editor and analysis workbench.
Platform: Mac, Windows, Linux, Solaris, Unix, etc.
Input: Amino acid and nucleotide FASTA, Clustal (.aln), BLC, PIR, GCG (.msf), and PFAM formats.
Output: Default is .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.

ALIGNMENT VIEWERS/MANIPULATORS
[Figures: the same alignment shaded by BLOSUM62, PERCENT IDENTITY, CLUSTAL, and HYDROPHOBICITY color schemes]

REGIONS OF PROBLEMATIC ALIGNMENT
Accuracy of the alignment has an impact on the resulting phylogenetic tree!!

ALIGNMENT: MUSCLE - FULL LENGTH
ALIGNMENT: MUSCLE - CONSERVED REGIONS
Gblocks: Castresana (2000) Mol. Biol. Evol. 17: 540-552
[Figure: phylogenetic trees with bootstrap support values, one built from the CONSERVED REGIONS alignment and one from the FULL LENGTH alignment; scale bars 0.1 and 0.2]

EFFECTS BRANCH/NODE SUPPORT
[Figures: matching clades from the FULL LENGTH and CONSERVED REGIONS trees, shown side by side with their support values]

NO "CORRECT" SOLUTION
KNOW IMPLICATIONS OF YOUR DECISIONS
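The full-length versus conserved-regions comparison above comes down to which alignment columns are kept before tree building. The sketch below is a rough illustration only: it is not the Gblocks algorithm (which applies several conservation and block-length criteria), just a simple gap-fraction filter. The function name `drop_gappy_columns` and the `max_gap_fraction` parameter are hypothetical.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-?"):
    """Return a copy of the alignment without columns that are mostly gaps.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    NOTE: a crude stand-in for conserved-region selection, not Gblocks itself.
    """
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must all have the same aligned length")

    keep = []
    for col in range(length):
        # Count how many taxa have a gap/missing character in this column.
        gaps = sum(1 for name in names if alignment[name][col] in gap_chars)
        if gaps / len(names) <= max_gap_fraction:
            keep.append(col)

    return {name: "".join(alignment[name][c] for c in keep) for name in names}


# Toy alignment (made-up sequences, not taken from the slides).
toy = {
    "taxon1": "MK-LV--A",
    "taxon2": "MKTLV--A",
    "taxon3": "MK-LVG-A",
}
trimmed = drop_gappy_columns(toy, max_gap_fraction=0.5)
```

Changing `max_gap_fraction` changes which columns survive, and so can change the resulting tree, which is the point of the slide: know the implications of the choice.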
FILE FORMATS
FASTA FORMAT
Sequence lines: up to 80 chars

>Struthio_camelus
VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
>Rhea_americana
VKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
>Pterocnemia_pennata
VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
>Casuarius_casuarius
VKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
>Dromaius_novaehollandiae
VKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
>Nothoprocta_cinerascens
VKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
>Eudromia_elegans
VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ

PHYLIP FORMAT
Taxon names: 10 chars, NO WHITE SPACE

 7 69
S_camelus VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
R_americanVKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
P_pennata VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
C_casuariuVKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
D_novaeholVKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
N_cinerascVKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
E_elegans VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ

FILE FORMATS
NEXUS FORMAT

#NEXUS
begin data;
dimensions ntax=7 nchar=69;
format datatype=protein missing=? gap=- matchchar=.;

matrix
Struthio_camelus         VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSK??????FAT--VDDYKPVPLDYMLDSK
Rhea_americana           .............L..E..N.V.T................?.D?????...--...H...S.E.....
Pterocnemia_pennata      .............L..E..N.V.A..................DHD?EV...--...H...S.E.....
Casuarius_casuarius      ........D....L.....N.........DD......A....DHDKEV...--..E....SPE.....
Dromaius_novaehollandiae ........D....L.....N..........D......A..??D?????...--.......S.E.....
Nothoprocta_cinerascens  .....A.D.....P...TP...A.NA.FGS....V....I..DHDK?????T-..G...AT.E.F..N
Eudromia_elegans         .R.....D.....P...TP..V.AN....S....V....I?.?????????S-I.G...AT.EFF..N
;
end;

begin mrbayes;
  prset aamodelpr=mixed;
end;
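Moving an alignment between these formats is mostly a matter of re-laying out names and sequences. As a minimal sketch, assuming a simple sequential PHYLIP layout like the one above (header line, then a 10-character name field followed directly by the sequence), the Python below converts FASTA text to PHYLIP text. The helper names `parse_fasta` and `to_phylip` are hypothetical, and names are simply truncated to 10 characters here, whereas the slide uses hand-abbreviated names such as S_camelus.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # PHYLIP names must not contain white space.
            name, chunks = line[1:].strip().replace(" ", "_"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records, name_width=10):
    """Format aligned (name, sequence) records as sequential PHYLIP text."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    short = [name[:name_width] for name, _ in records]
    if len(set(short)) != len(short):
        raise ValueError("names are not unique after truncation")
    lines = [f" {len(records)} {lengths.pop()}"]
    for label, (_, seq) in zip(short, records):
        # Pad the name to exactly name_width characters, then the sequence.
        lines.append(label.ljust(name_width) + seq)
    return "\n".join(lines)


# First ten residues of two sequences from the FASTA example, wrapped over
# two lines to show that multi-line records are joined.
fasta_text = """>Struthio_camelus
VKYPN
TNEEG
>Rhea_americana
VKYPN
TNEEG
"""
phylip_text = to_phylip(parse_fasta(fasta_text))
```

The uniqueness check matters in practice: long species names that share their first 10 characters would otherwise collide silently.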
NEWICK TREE FORMAT
Topology: ((A,B),C)
Branch Length: ((A:2,B:4):10,C:8)
Confidence Stats: ((A:2,B:4):10[89],C:8)
[Figures: three-taxon trees (A, B, C) illustrating each notation, next to a larger example tree with support values]

Part of the Newick string for the larger tree:
((((Moss2:0.59223167356244488246,Moss1:0.48430519315771680677):
0.47610587518093150372[100],(Rice3:0.55644328355758998494,
(Brachy2:0.63383594852707514367,
(Sorghum1:0.14451441234434442284,Maize2:0.55808284363435467501):
0.29412654253200387622[63]):0.14718362545267285602[70]):
0.72708851517482031568[97]):0.16225290952698268043[8],…)
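To make the three Newick notations concrete, here is a minimal recursive-descent parser sketch in Python. It assumes the layout used on the slide, where a support value in square brackets follows the branch length, and it ignores other Newick features such as quoted labels. The names `parse_newick` and `leaf_names` are hypothetical, not part of any of the programs listed here.

```python
import re

# Tokens: punctuation, a bracketed value such as [89], or a bare label/number.
TOKEN = re.compile(r"\s*([(),:;]|\[[^\]]*\]|[^(),:;\[\]\s]+)")
PUNCT = {"(", ")", ",", ":", ";"}


def parse_newick(text):
    """Parse a Newick string into nested dicts (name, length, support, children)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of Newick string")
        tok = tokens[pos]
        pos += 1
        return tok

    def node():
        result = {"name": None, "length": None, "support": None, "children": []}
        if peek() == "(":
            take()
            result["children"].append(node())
            while peek() == ",":
                take()
                result["children"].append(node())
            if take() != ")":
                raise ValueError("expected ')'")
        tok = peek()
        if tok is not None and tok not in PUNCT and not tok.startswith("["):
            result["name"] = take()
        if peek() == ":":
            take()
            result["length"] = float(take())
        tok = peek()
        if tok is not None and tok.startswith("["):
            result["support"] = float(take()[1:-1])
        return result

    tree = node()
    if peek() == ";":
        take()
    if peek() is not None:
        raise ValueError("unexpected trailing text")
    return tree


def leaf_names(node):
    """Collect leaf names from left to right."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((A:2,B:4):10[89],C:8)")
clade_ab = tree["children"][0]  # the (A,B) clade: length 10, support 89
```

The long example string above ends in an elision (…), so it cannot be parsed as written; the three-taxon strings are complete and work as input.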
PHYLOGENETIC METHODS
DISTANCE MATRIX ANALYSES
•  The number of differences between all sequence pairs is treated as a distance
•  Clustering method Neighbor-Joining: select the tree with the smallest total branch length by sequential selection of neighbors
PROS & CONS
•  Computationally fast
•  Produces 1 tree > does not consider all possible topologies
•  Can get different results based on input order
PROGRAMS
•  PAUP*
•  MEGA5
•  PHYLIP

PHYLOGENETIC METHODS
MAXIMUM PARSIMONY ANALYSES
•  The optimum tree requires the minimum number of changes needed to explain the divergence between the taxa
•  The hypothesis that requires the fewest assumptions is the best
[Figure: alternative tree topologies for taxa a, b, c and d, evaluated against the sites below]
Site   a   b   c   d
1      A   A   A   A
2      G   G   A   A
3      G   G   C   C
4      A   G   A   G
PROS & CONS
•  Considers all possible trees (sort of)
•  Computationally intensive: 10 taxa > 2 million possible trees
•  No multiple hit correction
PROGRAMS
•  PAUP*
•  MEGA5
•  PHYLIP
•  MESQUITE

PHYLOGENETIC METHODS
MAXIMUM LIKELIHOOD ANALYSES
Uses the maximum likelihood for each possible topology to choose the best tree
•  Choose a probability model to estimate the likelihood that a position will undergo a substitution within a given time
•  Generate the likelihood for each possible tree
•  Calculate which tree has the optimal likelihood
PROS & CONS
•  Makes assumptions about both the rate of evolution and the pattern of site substitution
•  Very slow: takes into consideration all possible trees AND calculates their likelihood
•  As long as the assumptions are realistic, tends to be the most consistent method
PROGRAMS
•  PAUP*
•  PHYLIP
•  MrBayes
•  RAxML
•  TREE-PUZZLE
•  PhyML

VALUABLE RESOURCE
http://evolution.genetics.washington.edu/phylip/software.html

PHYLIP
http://evolution.gs.washington.edu/phylip.html
PROGRAM DESCRIPTION: A package of programs for inferring phylogenies. Methods available include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees.
PLATFORMS: Windows, Mac OS X, and Linux
INPUT: PHYLIP format. Data types include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.
OTHER GENERAL PURPOSE PACKAGES:
•  PAUP*
•  MEGA5
•  MESQUITE

PHYLIP: Distance Matrix Example Pipeline (63 proteins; 515 chars)
Program    What it does                                                  Run time
Seqboot    Generates multiple resampled datasets from the input data set  Instantaneous (100 replicates)
Protdist   Computes distance matrix from protein sequences               1 ½ hours
Fitch      Generates topology using distance matrix                      <2 days (Global readjustment, Jumble = 5)
Consense   Generates consensus tree from replicates above                Instantaneous

MrBayes
http://mrbayes.sourceforge.net/index.php
PROGRAM DESCRIPTION: A program for Bayesian estimation of phylogeny.
PLATFORMS: Mac (serial or clusters), Windows & Unix
INPUT: Nucleotide or amino acid alignments in NEXUS format
RUN TIME:
12 taxa; 898 char (nt), ngen=10000; samplefreq=10     <5 mins
89 taxa; 88 char (aa), ngen=10000; samplefreq=10      <15 mins
63 taxa; 515 char (aa), ngen=500000; samplefreq=10    19+ hours

MrBayes: Loading Input Data
MrBayes > execute filename.nex

MrBayes: Define Structure of the Model
Datatype: 4x4, Doublet, Codon
Nucmodel: 1 = F81, 2 = K80, 6 = GTR
Rates: equal, gamma, propinv, invgamma, adgamma
MrBayes > lset nst=6 rates=invgamma
MrBayes > help lset

MrBayes: Setting the Priors
Types of parameters in the model:
1. Topology
2. Branch lengths
3. Stationary frequencies of the nucleotides
4. Nucleotide substitution rates (6)
5. Proportion of invariable sites
6. Shape parameter of the gamma distribution of rate variation
Default parameters work well for most analyses
MrBayes > help prset

MrBayes: Understanding Screen Printout
MrBayes > mcmc ngen=200000 samplefreq=10 printfreq=50
[Screen printout annotated with: (1,000,000), (100), Cold Chain, ngen, TREE #1, TREE #2, Time]
MrBayes > help mcmc

MrBayes: When to Stop Analysis?
MrBayes > sump burnin=#
# = value corresponding to 25% of samples
Example: if ngen=200000 and samplefreq=50, then burnin=1000 (200000 ÷ 50 × 0.25)
COMPLETE RUN vs. INCOMPLETE RUN
[Plot: MrBayes sump overlay of runs 1 and 2]

                                            95% Cred. Interval
                                          ----------------------
Parameter   Mean        Variance    Lower       Upper       Median      PSRF *
----------------------------------------------------------------------------
TL          19.978955   0.050256    19.597000   20.258000   20.084000   3.113
----------------------------------------------------------------------------
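The burn-in rule given for `sump` (discard the first 25% of samples) is simple arithmetic, sketched below in Python. The function name `burnin_samples` is hypothetical, and the calculation follows the slide's formula exactly (ngen ÷ samplefreq × 0.25) rather than anything MrBayes computes internally.

```python
def burnin_samples(ngen, samplefreq, fraction=0.25):
    """Number of samples to discard as burn-in, per the slide's formula."""
    if ngen <= 0 or samplefreq <= 0:
        raise ValueError("ngen and samplefreq must be positive")
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    total_samples = ngen // samplefreq  # samples written during the run
    return int(total_samples * fraction)


# The slide's example: ngen=200000, samplefreq=50
example = burnin_samples(200000, 50)

# The mcmc command shown earlier: ngen=200000, samplefreq=10
from_mcmc_command = burnin_samples(200000, 10)
```

Note that `burnin` counts samples, not generations, so the value depends on `samplefreq` as well as `ngen`.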
* PSRF = Potential Scale Reduction Factor

RAxML
http://sco.h-its.org/exelixis/software.html
PROGRAM DESCRIPTION: A program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees.
PLATFORMS: Mac & Linux; online version http://phylobench.vital-it.ch/raxml-bb/
INPUT: Nucleotide or amino acid alignments in PHYLIP format; Newick trees
RUN TIME:
25,000 taxa; 1500 char (nt) on single CPU                  >> 13 ½ days
63 taxa; 515 char (aa), 20 iterations; 100 bootstraps      >> 1 ¼ days
63 taxa; 134 char (aa), 20 iterations; 100 bootstraps      >> 9 hours

RAxML The Easy & Fast Way: (Works well in most practical cases)
raxmlHPC -f a -x 12345 -p 12345 -# 100 -m model -s infile -n TEST
#conducts BS search and then finds best-scoring ML tree >> bootstrapped trees, best-scoring ML tree, & BS support values.

RAxML The Hard & Slow Way
1. Determining initial rearrangement setting: If not specified with the -i option, it will try settings of 5, 10, 15, 20, and 25 and use the minimal setting that yields the best likelihood improvement on the starting trees. Run the program several times with both the auto determination setting and a pre-defined value of 10.
raxmlHPC -y -s infile -m GTRCAT -n ST0
#generates random MP starting tree
raxmlHPC -f d -i 10 -m GTRMIX -s infile -t RAxML_parsimonyTree.ST0 -n FI0
#infers ML tree from starting tree using fixed setting
raxmlHPC -f d -m GTRMIX -s infile -t RAxML_parsimonyTree.ST0 -n AI0
#infers ML tree from starting tree using auto setting
2. Determining Number of Rate Categories: Try several rate categories, i.e. 10, 25, 40, & 55, and choose the one that gives the best likelihood value
raxmlHPC -f d -i 10 -c 10 -m GTRMIX -s infile -t RAxML_parsimonyTree.ST0 -n C10
3. Finding the Best-Known Likelihood Tree (BKL):
raxmlHPC -f d -i 10 -c 25 -m GTRMIX -s infile -# 10 -n MO
4. Bootstrapping:
raxmlHPC -f d -i 10 -c 25 -m GTRCAT -s infile -# 100 -b 12345 -n MB
5. Generate Confidence Values:
raxmlHPC -f b -m GTRCAT -s infile -z RAxML_bootstrap.MB -t RAxML_result.MO -n BS_tree

TREE VISUALIZATION\MANIPULATION
FigTree http://tree.bio.ed.ac.uk/software/figtree/
Prepares graphical representations of trees for publication (specifically with BEAST)
MEGA5 (Tree Explorer) http://www.megasoftware.net/
Plotting, rearranging and editing trees
Dendroscope http://mafft.cbrc.jp/alignment/server/
Visualization and navigation of phylogenetic trees; designed specifically to handle very large trees, i.e. 100,000s of taxa (recommended by RAxML)
MacClade http://www.macclade.org/
Interactive analysis of evolution: observe the effect of tree manipulation, i.e. # of char steps & distribution of states of a given character
TREE TYPES
[Figures: the example phylogeny, with bootstrap support values, drawn in several different tree layouts; scale bars 0.2]
POST-PHYLOGENETIC ANALYSES
r8s
http://loco.biosci.arizona.edu/r8s/index.html
Analysis of rates ("r8s") of evolution: a program for estimating absolute rates ("r8s") of molecular evolution and divergence times on a phylogenetic tree.

BEAST
http://beast.bio.ed.ac.uk/Main_Page
•  Species phylogenies for molecular dating
•  Coalescent-based population genetics
•  Measurably evolving populations

Multidivtime
http://statgen.ncsu.edu/thorne/multidivtime.html
•  Studying rates of molecular evolution
•  Estimating divergence times

PAML
http://abacus.gene.ucl.ac.uk/software/paml.html
•  Estimate branch lengths
•  Estimate parameters in the evolutionary model:
   •  transition/transversion rate ratio
   •  the gamma parameter for variable substitution among sites
   •  rate parameters for different genes
   •  synonymous and nonsynonymous substitution rates
•  Test evolutionary models
•  Calculate substitution rates among sites
•  Reconstruct ancestral sequences
•  Simulate sequence evolution and phylogenetic reconstruction
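PAML estimates the transition/transversion rate ratio under an explicit substitution model. As a much simpler illustration of what the two classes of change are, the Python sketch below counts the observed transitions (purine to purine, or pyrimidine to pyrimidine) and transversions between two aligned nucleotide sequences. This raw count is not the model-based estimate that PAML reports, and the function name `count_ts_tv` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count observed transitions and transversions between aligned sequences.

    Positions with gaps or ambiguous characters in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # Same chemical class (both purines or both pyrimidines) = transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Toy sequences (made up): A>G and C>T are transitions, A>C is a
# transversion, and the gapped last column is ignored.
ts, tv = count_ts_tv("AGCTA-", "GGTTCA")
```

Raw counts like these undercount changes when sites have been hit more than once, which is why model-based programs are used for real estimates.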