slides - CBSU - Cornell University
Fei Lu
Post‐doctoral Associate
Cornell University
http://www.maizegenetics.net

Genotyping by sequencing (GBS) is simple and cost effective
1. Digest DNA
2. Ligate adapters with barcodes
3. Pool DNAs
4. PCR
5. Illumina sequencing
Reduced representation library approach (Altshuler et al. 2000. Nature)
500,000 reads/sample (384 plex) (Elshire et al. 2011. PLoS ONE)

Universal Network Enabled Analysis Kit (UNEAK)
A reference free SNP calling pipeline
Designed for species that:
• lack a reference genome
• are diploid or polyploid
• are inbreeders or outcrossers
• have limited genetic or genomic resources

Overview of UNEAK
A. Genome is digested and sequenced using GBS; reads are trimmed to 64 bp
B. Identical reads = tag

Overview of UNEAK – Network filter
C. Pairwise alignment to find tag pairs with 1 bp mismatch
D. Build tag networks
E, F. Topology of tag networks; keep common reciprocal tags
[Figure labels: count; real tags; error]

Topology of tag networks
Networks of 2496 tags
[Figure legend: Tag; Error; Plastid & highly repetitive tags; Moderately repetitive tags, paralogs & SNPs]

Details about network filter
[Figure labels: Error tolerance; SNP]

Program flowchart of UNEAK
[Flowchart nodes: Fastq/Qseq; HapMap; TagCount; Optional filters; Network filter; MapInfo; TagPair; TBT (Byte/Bit)]
TagPair (Long, Long, Integer): Seq, Seq, Order
MapInfo includes:
• SNP
• Seq
• Count
• Count distribution
• Heterozygote code

Pipeline validated with maize inbred linkage population
Step 1: Pairwise alignment of tags
Step 2: Network filter
Evaluation criteria: allele frequency distribution; single‐locus rate (Blast to maize)
Values shown: 23.30%, 87.26%
[Figure: allele frequency distribution for each step; x-axis: allele frequency; y-axis: proportion of SNPs]

Characterization of the Genetic Diversity of Switchgrass Using
Genotyping by Sequencing (GBS)

GWAS and GS require high‐density markers to accelerate breeding
[Diagram: SNP discovery; Genome Wide Association Study (GWAS); Genomic Selection (GS); Accelerate switchgrass breeding]

Challenges and goals
Challenges
• No reference genome
• Multiple ploidy levels (4X, 6X and 8X)
• Highly heterozygous
Goals
• Discover high‐density SNPs
• Construct linkage disequilibrium (LD) map
• Evaluate population structure
• Reconstruct phylogeny

Switchgrass data set
Linkage Populations
• Full‐sib Population, n = 130 individuals
• Half‐sib Population, n = 168 individuals
Association Populations
• 66 diverse populations
• Mostly northern‐adapted, Upland populations and cultivars
• n = 540 individuals
350 GB sequence
720,000 SNPs generated!

Tetraploid switchgrass behaves like a diploid
Allele frequency in full-sib population (F1), 50,000 SNPs
Most informative markers to construct linkage map
[Figure: x-axis: allele frequency; y-axis: proportion of SNPs; labels: 1:3, 1:1, 3:1; AA×Aa, AA×aa, Aa×Aa, aa×Aa]

18 Linkage groups perfectly match the chromosome number of switchgrass
[Figure: correlation of linkage groups; legend: R]

Can we order the SNPs? Yes, use synteny
3,000 high coverage SNPs
Linkage groups perfect match to syntenic chromosomes of Foxtail millet (Setaria italica)
• Small (490 Mb) genome, diploid, n=9
• 13 million years divergent from switchgrass
• 10% switchgrass SNPs map to foxtail millet genome

Constructed a linkage map of 18 groups
1,401 markers
[Figure axes: Linkage groups of switchgrass; Chromosomes of foxtail millet]

Upland and lowland ecotypes clearly separate in phylogeny
[Figure labels: Upland; Lowland; Jackson, MI; Hansens Island, MI; Tipton, IN; Fillmore, MN; Genesee, MN; Ipswich prairie, WI; Ipswich prairie, WI; WS4U; Detail]

Ploidy level resolves into distinct groups
[Figure labels: Upland 4X; Upland 8X; Upland 8X; Upland 8X; Lowland 4X; Lowland 4X]
Ploidy level identified by flow cytometry (Costich et al. 2010.
Plant Genome)

Geography shows isolation by distance
[Figure labels: Upland 4X North; Upland 8X East; Upland 8X West; Upland 8X South; Lowland 4X South; Lowland 4X Northeast]

Upland 4X arose from Upland 8X
a. NJ tree using 7,000 markers
b. NJ tree using 29,921 markers
[Figure labels: Upland; Lowland; Upland 8X East; Upland 4X North; Upland 8X West; Upland 8X South; Lowland 4X South; Lowland 4X Southeast; Foxtail millet (outgroup); node support values shown on the trees]

Reduced diversity in Upland 4X compared with Upland 8X
[Figure: MDS plot; axes: Coordinate 1, Coordinate 2; groups: Upland 8X East, Upland 8X West, Upland 4X North, Upland 8X South]

Migration paths of switchgrass
[Figure labels: Upland 4X North; Upland 8X East; Upland 8X West; Upland 8X South; Lowland 4X South; Lowland 4X Northeast]

Summary
• Effective SNP calling pipeline is developed
• It works well for non‐reference, heterozygous, and polyploid species
• 720,000 high density SNPs discovered for GWAS
• Tetraploid switchgrass behaves like a diploid
• A synteny based SNP map constructed with low‐coverage GBS markers
• Robust phylogeny concurs well with ecotype, ploidy level and geographic distribution of switchgrass
• Data suggests that Upland 4X arose from Upland 8X

Future Direction
Putting it all together: GWAS and GS
Caldwell Field, Cornell U, Ithaca, NY
• Flowering time
• Plant height
• Leaf length and width
• Standability
• Biomass quality traits
Linkage populations
Association populations

Acknowledgements
Project Manager: Denise Costich (USDA‐ARS, Cornell)
PIs: Edward Buckler (USDA‐ARS, Cornell), Michael Casler (USDA‐ARS, UW‐Madison), Jerome Cherney (Cornell)
Institute for Genomic Diversity (Cornell)
Sequencing: Rob Elshire, Jeff Glaubitz, Wenyan Zhu
Statistics: Alex Lipka
Bioinformatics: Dallas Kroon
Field: Ken Paddock, Nick Lepak, Nick Kaczmar
Supported by DOE (including JGI), USDA, and NSF
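
Appendix sketch: UNEAK tag pairing
The UNEAK overview slides describe trimming reads to 64 bp, collapsing identical reads into tags, pairing tags that differ by 1 bp, and keeping reciprocal tag pairs. The Python sketch below illustrates those steps only. It is not the UNEAK implementation: the function names, the count-ratio rule used here for "error tolerance", and the 0.03 default threshold are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

TAG_LEN = 64  # reads are trimmed to 64 bp, per the slides


def count_tags(reads, tag_len=TAG_LEN):
    """Trim reads to tag_len and collapse identical reads into tags.

    Reads shorter than tag_len are dropped (an assumption of this sketch).
    Returns a Counter mapping tag sequence -> read count.
    """
    return Counter(r[:tag_len] for r in reads if len(r) >= tag_len)


def one_mismatch(a, b):
    """True if two equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def reciprocal_pairs(tag_counts, error_tolerance=0.03):
    """Return tag pairs that are each other's only 1 bp neighbor.

    Assumed error rule (hypothetical): an edge is discarded when the rarer
    tag's count is below error_tolerance times the more common tag's count,
    treating the rare tag as a likely sequencing error.
    """
    tags = list(tag_counts)
    neighbors = {t: set() for t in tags}
    # Naive all-against-all comparison; fine for a demo, slow for real data.
    for a, b in combinations(tags, 2):
        if one_mismatch(a, b):
            low, high = sorted((tag_counts[a], tag_counts[b]))
            if low / high >= error_tolerance:
                neighbors[a].add(b)
                neighbors[b].add(a)
    pairs = []
    for a in tags:
        if len(neighbors[a]) == 1:
            (b,) = neighbors[a]
            # Keep the pair once, and only if b has no other neighbor.
            if neighbors[b] == {a} and a < b:
                pairs.append((a, b))
    return pairs
```

Calling `reciprocal_pairs(count_tags(reads))` on a list of read strings returns candidate SNP tag pairs; tags sitting in larger networks (for example repeats or paralogs) are left out because they have more than one 1 bp neighbor.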