GARBAGE IN = GARBAGE OUT
USER RESPONSIBILITY
• Each step relies on the accuracy of the previous steps.
• Just because you get an answer does not make it right. Appropriate test? Correct parameters? Applicable dataset?

ANALYSIS PIPELINE
• Multiple Alignment: CLUSTALW, T-COFFEE, MAFFT, MUSCLE, PROBCONS
• Visualization & Adjustment: GENEDOC, JALVIEW
• Format Input Data: FASTA, PHYLIP, NEXUS, Newick
• Phylogenetics
  Methods: Distance Matrix, Max Parsimony, Max Likelihood
  Programs: PHYLIP, RAxML, MrBayes
• Evolutionary Analyses: r8s, PAML, BEAST, Multidivtime

ALIGNMENT PROGRAMS
• ClustalW (1994) http://www.ebi.ac.uk/Tools/msa/clustalw2/
  Uses a progressive multiple alignment. Parameters (e.g. gap penalties) are adjusted according to the input (divergence, length, local hydropathy, etc.).
• T-Coffee (2000) http://igs-server.cnrs-mrs.fr/Tcoffee/
  Performs pairwise local and global alignments, then combines them in a progressive multiple alignment.
• MAFFT (2002) http://mafft.cbrc.jp/alignment/server/
  Detects local homologous regions by Fast Fourier Transform (considers aa size & polarity), then uses a restricted global DP, a progressive algorithm, and horizontal refinement.
• MUSCLE (2004) http://www.drive5.com/muscle
  Uses kmer distances and log-expectation scores, with progressive and horizontal refinement.
• PROBCONS (2005) http://probcons.stanford.edu <30 taxa**
  Uses pairwise consistency based on an objective function.

COMPARISON OF ALIGNMENT PROGRAMS
[Figure: the same sequences aligned with CLUSTALW, MUSCLE, and MAFFT.]

ALIGNMENT VIEWERS/MANIPULATORS

GENEDOC
Program Description: A full-featured multiple sequence alignment editor, analyser, and shading utility for Windows.
http://www.nrbsc.org/gfx/genedoc/
Platform: Windows
Input: Amino acid and nucleotide FASTA, Clustal (.aln), Phylip, PIR, GCG (.msf), and GenBank formats.
Output: Default is .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.

JALVIEW
Program Description: Jalview is a multiple alignment editor written in Java. It is used widely in a variety of web pages but is also available as a general-purpose alignment editor and analysis workbench.
http://www.jalview.org/
Platform: Mac, Windows, Linux, Solaris, Unix, etc.
Input: Amino acid and nucleotide FASTA, Clustal (.aln), BLC, PIR, GCG (.msf), and PFAM formats.
Output: Default is .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.

ALIGNMENT VIEWERS/MANIPULATORS
[Figure: one alignment shown under four colour schemes: BLOSUM62, PERCENT IDENTITY, CLUSTAL, HYDROPHOBICITY.]

REGIONS OF PROBLEMATIC ALIGNMENT
Accuracy of the alignment has an impact on the resulting phylogenetic tree!

ALIGNMENT: MUSCLE, FULL LENGTH vs. CONSERVED REGIONS
Gblocks: Castresana (2000) Mol. Biol. Evol. 17: 540-552
[Figure: two trees with node support values, one inferred from the FULL LENGTH MUSCLE alignment and one from the CONSERVED REGIONS only.]
EFFECTS: BRANCH/NODE SUPPORT
[Figure: matching subtrees from the CONSERVED REGIONS and FULL LENGTH trees shown side by side, so the node support values can be compared.]

NO "CORRECT" SOLUTION
KNOW THE IMPLICATIONS OF YOUR DECISIONS

FILE FORMATS

FASTA FORMAT
Line length: 80 chars

>Struthio_camelus
VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
>Rhea_americana
VKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
>Pterocnemia_pennata
VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
>Casuarius_casuarius
VKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
>Dromaius_novaehollandiae
VKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
>Nothoprocta_cinerascens
VKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
>Eudromia_elegans
VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ

PHYLIP FORMAT
Taxon names: 10 chars. NO WHITE SPACE.

7 69
S_camelus VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
R_americanVKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
P_pennata VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
C_casuariuVKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
D_novaeholVKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
N_cinerascVKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
E_elegans VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ

FILE FORMATS

NEXUS FORMAT

#NEXUS
begin data;
dimensions ntax=7 nchar=69;
format datatype=protein missing=? gap=- matchchar=.;

matrix
Struthio_camelus          VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSK??????FAT--VDDYKPVPLDYMLDSK
Rhea_americana            .............L..E..N.V.T................?.D?????...--...H...S.E.....
Pterocnemia_pennata       .............L..E..N.V.A..................DHD?EV...--...H...S.E.....
Casuarius_casuarius       ........D....L.....N.........DD......A....DHDKEV...--..E....SPE.....
Dromaius_novaehollandiae  ........D....L.....N..........D......A..??D?????...--.......S.E.....
Nothoprocta_cinerascens   .....A.D.....P...TP...A.NA.FGS....V....I..DHDK?????T-..G...AT.E.F..N
Eudromia_elegans          .R.....D.....P...TP..V.AN....S....V....I?.?????????S-I.G...AT.EFF..N
;
end;

begin mrbayes;
  prset aamodelpr=mixed;
end;
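Converting between these formats by hand is an easy place to introduce errors. The following is a minimal Python sketch, not part of the original slides, of turning aligned FASTA text into sequential PHYLIP. The function names `parse_fasta` and `to_phylip` are hypothetical. The sketch simply truncates names to 10 characters, which is an assumption; the slides abbreviate names differently (S_camelus rather than Struthio_c), so check the resulting names before use.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records, name_width=10):
    """Format aligned records as sequential PHYLIP with fixed-width names."""
    if not records:
        raise ValueError("no sequences")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # Header line: number of taxa, then number of characters.
    lines = [f"{len(records)} {lengths.pop()}"]
    seen = set()
    for name, seq in records:
        # Assumption: replace spaces and truncate to the name width.
        short = name.replace(" ", "_")[:name_width]
        if short in seen:
            raise ValueError(f"truncated name is not unique: {short}")
        seen.add(short)
        lines.append(short.ljust(name_width) + seq)
    return "\n".join(lines) + "\n"


example = """>Struthio_camelus
VKYPNTNEEG
>Rhea_americana
VKYPNTNEEG
"""
print(to_phylip(parse_fasta(example)), end="")
```

The length check mirrors the "garbage in = garbage out" point: an unaligned input is rejected instead of silently producing a malformed matrix.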
NEWICK TREE FORMAT
• Topology: ((A,B),C)
• Branch length: ((A:2,B:4):10,C:8)
• Confidence stats: ((A:2,B:4):10[89],C:8)
[Figure: the three-taxon tree (A, B, C) drawn for each case, with branch lengths and the support value 89 labelled.]

Example:
((((Moss2:0.59223167356244488246,Moss1:0.48430519315771680677):0.47610587518093150372[100],
(Rice3:0.55644328355758998494,(Brachy2:0.63383594852707514367,
(Sorghum1:0.14451441234434442284,Maize2:0.55808284363435467501):0.29412654253200387622[63]):0.14718362545267285602[70]):
0.72708851517482031568[97]):0.16225290952698268043[8],…)

PHYLOGENETIC METHODS

DISTANCE MATRIX ANALYSES
• The number of differences between all sequence pairs is treated as a distance.
• Clustering method. Neighbor-Joining: select the tree with the smallest total branch length by sequential selection of neighbors.

PROS & CONS
• Computationally fast
• Produces 1 tree, so it does not consider all possible topologies
• Can give different results depending on input order

PROGRAMS
• PAUP*
• MEGA5
• PHYLIP

PHYLOGENETIC METHODS

MAXIMUM PARSIMONY ANALYSES
• The optimum tree requires the minimum number of changes needed to explain the divergence between the taxa.
• The hypothesis that requires the fewest assumptions is the best.

Site   a  b  c  d
1      A  A  A  A
2      G  G  A  A
3      G  G  C  C
4      A  G  A  G

[Figure: the possible unrooted trees for taxa a, b, c, d, evaluated against the sites above.]

PROS & CONS
• Considers all possible trees (sort of)
• Computationally intensive: 10 taxa > 2 million possible trees
• No multiple hit correction

PROGRAMS
• PAUP*
• MEGA5
• PHYLIP
• MESQUITE

PHYLOGENETIC METHODS

MAXIMUM LIKELIHOOD ANALYSES
Uses the maximum likelihood for each possible topology to choose the best tree:
Ø Choose a probability model to estimate the likelihood that a position will undergo a substitution within a given time
Ø Generate the likelihood for each possible tree
Ø Calculate which tree has the optimal likelihood

PROS & CONS
• Makes assumptions about both the rate of evolution and the pattern of site substitution
• Very slow: takes into consideration all possible trees AND calculates their likelihood
• As long as the assumptions are realistic, tends to be the most consistent method

PROGRAMS
• PAUP*
• PHYLIP
• MrBayes
• RAxML
• TREE-PUZZLE
• PhyML

VALUABLE RESOURCE
http://evolution.genetics.washington.edu/phylip/software.html

PHYLIP
http://evolution.gs.washington.edu/phylip.html
PROGRAM DESCRIPTION: A package of programs for inferring phylogenies. Methods available include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees.
PLATFORMS: Windows, Mac OS X, and Linux
INPUT: PHYLIP format. Data types include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.
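The "10 taxa > 2 million possible trees" figure quoted under maximum parsimony can be checked directly. The sketch below is not from the slides; it assumes the figure refers to unrooted, fully bifurcating trees, whose number for n taxa is (2n-5)!!, and it also gives the rooted count (2n-3)!! for comparison. The function names are hypothetical.

```python
def double_factorial_odd(k):
    """Return k * (k - 2) * ... * 3 * 1 for odd k (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial_odd(2 * n_taxa - 5)


def num_rooted_trees(n_taxa):
    """Number of rooted bifurcating topologies: (2n - 3)!! for n >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial_odd(2 * n_taxa - 3)


for n in (4, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For 4 taxa the unrooted count is 3, and for 10 taxa it is 2,027,025, which agrees with the slide. The rapid growth with the number of taxa is why exhaustive search is only practical for small datasets.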
OTHER GENERAL PURPOSE PACKAGES:
• PAUP*
• MEGA5
• MESQUITE

PHYLIP: Distance Matrix Example Pipeline
Dataset: 63 proteins; 515 chars

Program    What it does                                             Run time
Seqboot    Generates multiple resampled datasets from the input    Instantaneous (100 replicates)
Protdist   Computes a distance matrix from protein sequences       1 ½ hours
Fitch      Generates a topology using the distance matrix          <2 days (Global readjustment, Jumble = 5)
Consense   Generates a consensus tree from the replicates above    Instantaneous

MrBayes
http://mrbayes.sourceforge.net/index.php
PROGRAM DESCRIPTION: A program for Bayesian estimation of phylogeny.
PLATFORMS: Mac (serial or clusters), Windows & Unix
INPUT: Nucleotide or amino acid alignments in NEXUS format
RUN TIME:
• 12 taxa; 898 char (nt), ngen=10000; samplefreq=10: <5 mins
• 89 taxa; 88 char (aa), ngen=10000; samplefreq=10: <15 mins
• 63 taxa; 515 char (aa), ngen=500000; samplefreq=10: 19+ hours

MrBayes: Loading Input Data
    MrBayes > execute filename.nex

MrBayes: Define Structure of the Model
• Datatype (nucmodel): 4x4, Doublet, Codon
• Substitution model (nst): 1 = F81, 2 = K80, 6 = GTR
• Rates: equal, gamma, propinv, invgamma, adgamma
    MrBayes > lset nst=6 rates=invgamma
    MrBayes > help lset

MrBayes: Setting the Priors
Types of parameters in the model:
1. Topology
2. Branch lengths
3. Stationary frequencies of the nucleotides
4. Nucleotide substitution rates (6)
5. Proportion of invariable sites
6. Shape parameter of the gamma distribution of rate variation
Default parameters work well for most analyses.
    MrBayes > help prset

MrBayes: Understanding Screen Printout
    MrBayes > mcmc ngen=200000 samplefreq=10 printfreq=50
    MrBayes > help mcmc
Annotations beneath the command on the slide: (1,000,000) and (100).
[Screenshot: MrBayes screen output, labelled with ngen, Cold Chain, TREE #1, TREE #2, and Time.]

MrBayes: When to Stop Analysis?
    MrBayes > sump burnin=#
# = the value corresponding to 25% of the samples.
Example: if ngen=200000 and samplefreq=50, then burnin=1000 (200000 ÷ 50 × 0.25).
[Figure: sump plots of the two runs (plotted as 1 and 2), shown for a COMPLETE RUN and an INCOMPLETE RUN.]

                                            95% Cred. Interval
                                          ----------------------
Parameter   Mean        Variance    Lower       Upper       Median      PSRF *
--------------------------------------------------------------------------------
TL          19.978955   0.050256    19.597000   20.258000   20.084000   3.113
--------------------------------------------------------------------------------
* PSRF = Potential Scale Reduction Factor; it should approach 1.0 as the runs converge.

RAxML
http://sco.h-its.org/exelixis/software.html
PROGRAM DESCRIPTION: A program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees.
PLATFORMS: Mac & Linux; online version http://phylobench.vital-it.ch/raxml-bb/
INPUT: Nucleotide or amino acid alignments in PHYLIP format; Newick trees
RUN TIME:
• 25,000 taxa; 1500 char (nt) on a single CPU: 13 ½ days
• 63 taxa; 515 char (aa), 20 iterations; 100 bootstraps: 1 ¼ days
• 63 taxa; 134 char (aa), 20 iterations; 100 bootstraps: 9 hours

RAxML: The Easy & Fast Way (works well in most practical cases)
    raxmlHPC -f a -x 12345 -p 12345 -# 100 -m model -s infile -n TEST
    # conducts a BS search and then finds the best-scoring ML tree
Output: bootstrapped trees, best-scoring ML tree, & BS support values.

RAxML: The Hard & Slow Way
1. Determining the initial rearrangement setting:
If not specified with the -i option, RAxML will try settings of 5, 10, 15, 20, and 25 and use the minimal setting that yields the best likelihood improvement on the starting trees. Run the program several times, both with the automatically determined setting and with a pre-defined value of 10.
    raxmlHPC -y -s infile -m GTRCAT -n ST0
    # generates random MP starting tree
    raxmlHPC -f d -i 10 -m GTRMIX -s infile -t RAxML_parsimonyTree.ST0 -n FI0
    # infers ML tree from starting tree using fixed setting
    raxmlHPC -f d -m GTRMIX -s infile -t RAxML_parsimonyTree.ST0 -n AI0
    # infers ML tree from starting tree using auto setting
2. Determining the number of rate categories:
Try several rate categories, e.g. 10, 25, 40, & 55, and choose the one that gives the best likelihood value.
    raxmlHPC -f d -i 10 -c 10 -m GTRMIX -s infile -t RAxML_parsimonyTree.ST0 -n C10
3. Finding the Best-Known Likelihood tree (BKL):
    raxmlHPC -f d -i 10 -c 25 -m GTRMIX -s infile -# 10 -n MO
4. Bootstrapping:
    raxmlHPC -f d -i 10 -c 25 -m GTRCAT -s infile -# 100 -b 12345 -n MB
5. Generating confidence values:
    raxmlHPC -f b -m GTRCAT -s infile -z RAxML_bootstrap.MB -t RAxML_result.MO -n BS_tree

TREE VISUALIZATION/MANIPULATION
• FigTree http://tree.bio.ed.ac.uk/software/figtree/
  Prepares graphical representations of trees for publication (specifically with BEAST).
• MEGA5 (Tree Explorer) http://www.megasoftware.net/
  Plotting, rearranging, and editing trees.
• Dendroscope http://mafft.cbrc.jp/alignment/server/
  Visualization and navigation of phylogenetic trees; designed specifically to handle very large trees, i.e. 100,000s of taxa (recommended by RAxML).
• MacClade http://www.macclade.org/
  Interactive analysis of evolution: observe the effect of tree manipulation, i.e. # of char steps & distribution of states of a given character.
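The tree viewers above all start from a Newick string of the kind shown on the NEWICK TREE FORMAT slide. As a rough illustration of how that string encodes topology, branch lengths, and support, here is a minimal Python sketch that is not part of the original slides. `parse_newick` and `leaf_names` are hypothetical helper names. The sketch assumes support values appear in square brackets after the branch length, as on the slide; other programs may store support as internal node labels instead, and quoted names or comments are not handled.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts.

    Each node is a dict with keys "name", "length", "support", "children".
    Assumes the slide's convention: (children)name:length[support].
    """
    # Drop all whitespace and the optional trailing semicolon.
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def read_until(stop_chars):
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stop_chars:
            pos += 1
        return s[start:pos]

    def subtree():
        nonlocal pos
        node = {"name": "", "length": None, "support": None, "children": []}
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            node["children"].append(subtree())
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ","
                node["children"].append(subtree())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        node["name"] = read_until("():,[]")
        if pos < len(s) and s[pos] == ":":
            pos += 1
            node["length"] = float(read_until("(),[]"))
        if pos < len(s) and s[pos] == "[":
            pos += 1
            node["support"] = float(read_until("]"))
            if pos >= len(s):
                raise ValueError("unclosed '['")
            pos += 1  # consume "]"
        return node

    root = subtree()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaf_names(node):
    """Return the taxon names at the tips, in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((A:2,B:4):10[89],C:8)")
print(leaf_names(tree), tree["children"][0]["support"])
```

A truncated string, such as the long example that ends in "…", is rejected with a ValueError rather than parsed partially.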
TREE TYPES
[Figure: the example phylogeny drawn in several tree layouts, with support values and scale bars.]

POST-PHYLOGENETIC ANALYSES

r8s
http://loco.biosci.arizona.edu/r8s/index.html
Analysis of rates ("r8s") of evolution: a program for estimating absolute rates ("r8s") of molecular evolution and divergence times on a phylogenetic tree.

BEAST
http://beast.bio.ed.ac.uk/Main_Page
• Species phylogenies for molecular dating
• Coalescent-based population genetics
• Measurably evolving populations

Multidivtime
http://statgen.ncsu.edu/thorne/multidivtime.html
• Studying rates of molecular evolution
• Estimating divergence times

PAML
http://abacus.gene.ucl.ac.uk/software/paml.html
• Estimate branch lengths
• Estimate parameters in the evolutionary model:
  • transition/transversion rate ratio
  • the gamma parameter for variable substitution rates among sites
  • rate parameters for different genes
  • synonymous and nonsynonymous substitution rates
• Test evolutionary models
• Calculate substitution rates among sites
• Reconstruct ancestral sequences
• Simulate sequence evolution
• Phylogenetic reconstruction