slides - CBSU - Cornell University

Fei Lu
Post‐doctoral Associate
Cornell University
http://www.maizegenetics.net
Genotyping by sequencing (GBS) is simple and cost effective
Reduced representation library approach (Altshuler et al. 2000. Nature)
1. Digest DNA
2. Ligate adapters with barcodes
3. Pool DNAs
4. PCR
5. Illumina sequencing
500,000 reads/sample (384 plex)
(Elshire et al. 2011. PLoS ONE)
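Because each sample receives its own barcode before pooling, reads can be assigned back to samples after sequencing. The following is a minimal sketch of that step, assuming the barcode sits at the start of each read; the function name, barcode sequences, sample names, and reads are all made up for illustration and are not taken from the slides.

```python
def demultiplex(reads, barcode_to_sample):
    """Assign each read to a sample by its leading barcode and strip the barcode."""
    # Try longer barcodes first so that a short barcode which is a prefix of a
    # longer one does not capture the longer barcode's reads.
    barcodes = sorted(barcode_to_sample, key=len, reverse=True)
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        for bc in barcodes:
            if read.startswith(bc):
                by_sample[barcode_to_sample[bc]].append(read[len(bc):])
                break
        else:
            # No barcode matched this read.
            unassigned.append(read)
    return by_sample, unassigned


# Hypothetical barcodes and reads, for illustration only.
barcode_to_sample = {"ACGT": "sample_1", "TTGCA": "sample_2"}
reads = ["ACGTCAGCAAA", "TTGCACAGCTTT", "GGGGCAGCAAA"]
by_sample, unassigned = demultiplex(reads, barcode_to_sample)
```

Exact prefix matching is the simplest choice; a production pipeline would likely also tolerate sequencing errors inside the barcode.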
Universal Network Enabled Analysis Kit (UNEAK)
A reference free SNP calling pipeline
Designed for species that:
• lack a reference genome
• are diploid or polyploid
• are inbreeders or outcrossers
• have limited genetic or genomic resources
Overview of UNEAK
A. Genome is digested, sequenced using GBS. Reads are trimmed to 64 bp
B. Identical reads = tag
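Steps A and B can be sketched as trimming every read to 64 bp and counting identical trimmed reads. This is a sketch only: the function name and the toy reads are invented, and dropping reads that are too short or contain uncalled bases is an assumption, since the slides do not say how such reads are handled.

```python
from collections import Counter

TAG_LENGTH = 64  # trim length stated on the slide


def count_tags(reads, tag_length=TAG_LENGTH):
    """Trim reads to a fixed length and count identical trimmed reads as tags."""
    tags = Counter()
    for read in reads:
        trimmed = read[:tag_length]
        # Assumption: skip reads shorter than the tag length or with uncalled bases.
        if len(trimmed) < tag_length or "N" in trimmed:
            continue
        tags[trimmed] += 1
    return tags


# Toy reads (made up): the first two share the same first 64 bases.
base = "CAGC" + "ACGT" * 15  # 64 bases
reads = [base + "TTTT", base + "GG", "CAGC" + "A" * 60, "CAGCNNNN", "ACGT"]
tags = count_tags(reads)
```

Each key of `tags` is one tag, and its value is the number of reads that collapsed into it.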
Overview of UNEAK – Network filter
C. Pairwise alignment to find tag pairs with 1 bp mismatch
D. Build tag networks (figure label: count)
E. Topology of tag networks
F. Keep common reciprocal tags
[Figure labels: real tags; error]
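Steps C to F can be illustrated with a small brute-force sketch. The slides do not define the filtering rules, so two assumptions are made here and marked in the code: a tag counts as an error when its read count is below a fraction (`error_tolerance`, a hypothetical parameter with an illustrative value) of a neighbor's count, and a "reciprocal" pair means two tags whose only remaining neighbor is each other. Short 4 bp tags stand in for 64 bp tags.

```python
from collections import defaultdict
from itertools import combinations


def one_mismatch(a, b):
    """True if equal-length sequences a and b differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_network(tag_counts):
    """Connect every pair of tags that differ by a single base (brute force)."""
    neighbors = defaultdict(set)
    for a, b in combinations(tag_counts, 2):
        if one_mismatch(a, b):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def network_filter(tag_counts, error_tolerance=0.03):
    """Drop likely error tags, then keep reciprocal tag pairs.

    Assumptions (not from the slides): a tag is an error when its count is
    below error_tolerance times the count of any neighbor, and a reciprocal
    pair is two tags whose only remaining neighbor is each other.
    """
    neighbors = build_network(tag_counts)
    errors = {
        t for t, ns in neighbors.items()
        if any(tag_counts[t] < error_tolerance * tag_counts[n] for n in ns)
    }
    kept = {t: ns - errors for t, ns in neighbors.items() if t not in errors}
    pairs = set()
    for t, ns in kept.items():
        if len(ns) == 1:
            (n,) = ns
            if kept[n] == {t}:
                pairs.add(tuple(sorted((t, n))))
    return sorted(pairs)


# Toy tag counts (made up). AAAC is a rare neighbor of two common tags;
# GGGG, GGGC and GGCC form a chain, so no reciprocal pair survives there.
tag_counts = {"AAAA": 100, "AAAT": 90, "AAAC": 1,
              "GGGG": 50, "GGGC": 40, "GGCC": 45, "TTTT": 30}
snp_pairs = network_filter(tag_counts)
```

In this toy input only the pair AAAA/AAAT survives. The all-against-all comparison is quadratic in the number of tags, which is fine for illustration but too slow for real tag counts.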
Topology of tag networks
[Figure: networks of 2496 tags. Labels: Tag; Error; Plastid & highly repetitive tags; Moderately repetitive tags, paralogs & SNPs]
Details about network filter
[Figure labels: Error tolerance; SNP]
Program flowchart of UNEAK
[Flowchart nodes: Fastq/Qseq; TagCount; Network filter; TagPair; TBT (Byte/Bit); MapInfo; Optional filters; HapMap]
TagPair (Long, Long, Integer): Seq, Seq, Order
MapInfo includes:
• SNP
• Seq
• Count
• Count distribution
• Heterozygote code
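The flowchart notes that sequences are stored as Long values. The slides do not describe the encoding; one common approach, shown here purely as an assumption, is to pack each base into 2 bits so that 32 bases fit in one 64-bit word. Under that scheme a 64 bp tag occupies two words. The function names and the example tag are hypothetical.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"
BASES_PER_WORD = 32  # 2 bits per base fills one 64-bit word


def pack(seq):
    """Pack a DNA string into a list of 64-bit integers, 32 bases per word."""
    words = []
    for start in range(0, len(seq), BASES_PER_WORD):
        chunk = seq[start:start + BASES_PER_WORD]
        word = 0
        for base in chunk:
            word = (word << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so every word has the same layout.
        word <<= 2 * (BASES_PER_WORD - len(chunk))
        words.append(word)
    return words


def unpack(words, length):
    """Recover the first `length` bases from packed words."""
    bases = []
    for word in words:
        for i in range(BASES_PER_WORD):
            shift = 2 * (BASES_PER_WORD - 1 - i)
            bases.append(BITS_TO_BASE[(word >> shift) & 3])
    return "".join(bases[:length])


tag = "CAGC" + "ACGT" * 15  # a made-up 64 bp tag
words = pack(tag)
```

Python integers are unbounded, so the packed words here are unsigned values below 2**64; a language with signed 64-bit longs would hold the same bit patterns.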
Pipeline validated with maize inbred linkage population
Step 1: Pairwise alignment of tags
Step 2: Network filter
Evaluation criteria: allele frequency distribution; single‐locus rate (Blast to maize)
[Figure: allele frequency distributions (x axis: allele frequency; y axis: proportion of SNPs) and single‐locus rate. Values shown: 23.30%, 87.26%]
Characterization of the Genetic Diversity of Switchgrass Using Genotyping by Sequencing (GBS)
GWAS and GS require high‐density markers to accelerate breeding
SNP discovery
Genome Wide Association Study (GWAS)
Genomic Selection (GS)
Accelerate switchgrass breeding
Challenges and goals
Challenges
• No reference genome
• Multiple ploidy levels (4X, 6X and 8X)
• Highly heterozygous
Goals
• Discover high‐density SNPs
• Construct linkage disequilibrium (LD) map
• Evaluate population structure
• Reconstruct phylogeny
Switchgrass data set
Linkage Populations
• Full‐sib Population, n=130 individuals
• Half‐sib Population, n=168 individuals
Association Populations: 66 diverse populations
• Mostly northern‐adapted, Upland populations and cultivars
• n=540 individuals
350 GB sequence
720,000 SNPs generated!
Tetraploid switchgrass behaves like a diploid
[Figure: allele frequency in full-sib population (F1), 50,000 SNPs. x axis: allele frequency; y axis: proportion of SNPs. Peak labels: 1:3, 1:1, 3:1. Parental genotype labels: AA×Aa, AA×aa, Aa×Aa, aa×Aa. Annotation: most informative markers to construct linkage map]
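The parental genotype labels can be turned into expected allele frequencies under diploid inheritance. The sketch below assumes each parent passes on one of its two alleles with equal probability; reading the 1:3, 1:1 and 3:1 labels as a:A allele ratios (frequencies 0.25, 0.5 and 0.75) is an interpretation, since the slide does not spell it out. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_allele_frequency(parent1, parent2, allele="a"):
    """Expected frequency of `allele` among offspring of a diploid biallelic cross."""
    # Each parent contributes one of its two alleles with equal probability,
    # so the four gamete combinations are equally likely.
    offspring = [g1 + g2 for g1, g2 in product(parent1, parent2)]
    copies = sum(genotype.count(allele) for genotype in offspring)
    return Fraction(copies, 2 * len(offspring))


crosses = [("AA", "Aa"), ("AA", "aa"), ("Aa", "Aa"), ("aa", "Aa")]
for p1, p2 in crosses:
    print(f"{p1} x {p2}: {expected_allele_frequency(p1, p2)}")
```

The four crosses give 1/4, 1/2, 1/2 and 3/4, so a population that segregates like a diploid should show allele frequency peaks near 0.25, 0.5 and 0.75.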
18 Linkage groups perfectly match the chromosome number of switchgrass
[Figure: correlation of linkage groups (R)]
Can we order the SNPs?
Yes, use synteny
3,000 high coverage SNPs
Linkage groups perfectly match the syntenic chromosomes of foxtail millet (Setaria italica)
• Small (490 Mb) genome, diploid, n=9
• Diverged from switchgrass 13 million years ago
• 10% of switchgrass SNPs map to the foxtail millet genome
• Constructed a linkage map of 18 groups with 1,401 markers
[Figure axes: linkage groups of switchgrass; chromosomes of foxtail millet]
Upland and lowland ecotypes clearly separate in phylogeny
[Figure: phylogenetic tree with Upland and Lowland clades. Tip labels shown: Jackson, MI; Hansens Island, MI; Tipton, IN; Fillmore, MN; Genesee, MN; Ipswich prairie, WI (twice); WS4U. Inset: Detail]
Ploidy level resolves into distinct groups
[Figure: tree with groups labeled Upland 4X; Upland 8X (three groups); Lowland 4X (two groups)]
Ploidy level identified by flow cytometry (Costich et al. 2010. Plant Genome)
Geography shows isolation by distance
[Figure: groups labeled Upland 4X North; Upland 8X East; Upland 8X West; Upland 8X South; Lowland 4X South; Lowland 4X Northeast]
Upland 4X arose from Upland 8X
[Figure: (a, b) NJ trees using 7,000 markers and 29,921 markers. Clade labels: Upland; Lowland. Tip labels: Upland 8X East; Upland 4X North; Upland 8X West; Upland 8X South; Lowland 4X South; Lowland 4X Southeast; Foxtail millet (outgroup). Node values shown: 66, 87, 100, 58, 16, 100, 61, 15, 96, 100]
Reduced diversity in Upland 4X compared with Upland 8X
[Figure: MDS plot, Coordinate 1 vs. Coordinate 2. Groups: Upland 8X East; Upland 8X West; Upland 4X North; Upland 8X South]
Migration paths of switchgrass
[Figure: groups labeled Upland 4X North; Upland 8X East; Upland 8X West; Upland 8X South; Lowland 4X South; Lowland 4X Northeast]
Summary
• An effective SNP calling pipeline was developed
• It works well for non‐reference, heterozygous, and polyploid species
• 720,000 high density SNPs discovered for GWAS
• Tetraploid switchgrass behaves like a diploid
• A synteny based SNP map was constructed with low‐coverage GBS markers
• Robust phylogeny concurs well with ecotype, ploidy level and geographic distribution of switchgrass
• Data suggest that Upland 4X arose from Upland 8X
Future Direction
Putting it all together: GWAS and GS
Caldwell Field, Cornell U, Ithaca, NY
• Flowering time
• Plant height
• Leaf length and width
• Standability
• Biomass quality traits
Linkage populations
Association populations
Acknowledgements
Project Manager:
Denise Costich (USDA‐ARS, Cornell)
PIs:
Edward Buckler (USDA‐ARS, Cornell)
Michael Casler (USDA‐ARS, UW‐Madison)
Jerome Cherney (Cornell)
Institute for Genomic Diversity (Cornell)
Sequencing:
Rob Elshire
Jeff Glaubitz
Wenyan Zhu
Statistics:
Alex Lipka
Bioinformatics:
Dallas Kroon
Field:
Ken Paddock
Nick Lepak
Nick Kaczmar
Supported by DOE (including JGI), USDA, and NSF
