Genome Mapping - SciTech Connect

Genome Mapping
VK Tiwari, Kansas State University, Manhattan, KS, USA
JD Faris, USDA-ARS Cereal Crops Research Unit, Fargo, ND, USA
B Friebe, Kansas State University, Manhattan, KS, USA
BS Gill, Kansas State University, Manhattan, KS, USA
© 2016 Elsevier Ltd. All rights reserved.
Topic Highlights
• Molecular markers are important for genetic and genome mapping studies.
• Next-generation sequencing-based marker genotyping, such as genotyping by sequencing, is an important aid for gene and genome mapping.
• Single-nucleotide polymorphism-based marker development and detection.
• Genome mapping methods use recombination-dependent and recombination-independent approaches.
• Comparative mapping is an important tool for genome analysis in crops where sequence information is not available.
Learning Objective
• To achieve an understanding of the commonly used molecular markers and approaches used for genome mapping
Introduction
Genome mapping is used to assign short DNA sequences
(molecular markers) or specific genes to particular regions of
chromosomes and to determine their relative linear orders and
distances. A map is an essential tool for scientists to navigate
across the genome. Genome maps can be divided into two
groups: genetic maps and physical maps. Genetic maps are
based on recombination frequencies between genetic markers
and genes, and linked markers/genes form linkage groups
showing their relative order. A physical map of a given chromosome or a genome shows the physical locations of genes
and other DNA sequences of interest, and distances are typically measured in base pairs. Physical maps can be divided into
three general types: chromosomal or cytogenetic maps, radiation hybrid (RH) maps, and sequence maps. The ultimate
physical map is the complete sequence itself.
Molecular Markers and Their Visualization
DNA-based genetic markers rely on differences in DNA
sequences (polymorphisms) between two parental lines. Polymorphisms can result from various factors that lead to either
nucleotide changes or differences in DNA segment lengths
such as mutations, errors in DNA replication, and insertions,
inversions, and deletions of DNA fragments.
Encyclopedia of Food Grains
There are several established approaches for the detection of
polymorphisms using molecular markers including restriction
fragment length polymorphism (RFLP), random amplified
polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), sequence-tagged site (STS), microsatellites
or simple sequence repeats (SSRs), and single nucleotide polymorphism (SNP). Originating in the 1980s, RFLP markers
were the first type of DNA-based markers to be used. RFLPs
involve the use of a restriction enzyme, which cleaves DNA at
specific DNA sequence palindromes, and the hybridization of
a short labeled DNA fragment, or probe, to the restriction
enzyme-cleaved DNA. The probe label reveals the restriction
fragment hybridized by the probe, and polymorphisms are
revealed when an insertion/deletion occurs between critical
restriction sites in one genotype compared to the other or
when a particular restriction site is abolished due to mutation
in one genotype and not the other. RFLP markers can be
applied to essentially any organism, and they are still
employed to a limited extent today due to their usefulness in
comparative mapping analysis and map-based cloning studies.
However, these markers are not amenable to high-throughput
analysis, and they are difficult and laborious to handle due to the
large amounts of DNA required, enzymatic digestions, Southern blotting, and probe labeling techniques.
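As a minimal sketch of how the loss of a restriction site produces an RFLP, the following Python example digests two short, made-up sequences in silico. The sequences, the function name `digest`, and the choice of the EcoRI recognition site GAATTC are assumptions for illustration only, and the cut is simplified to fall at the start of the site.

```python
def digest(sequence, site):
    """Return fragment lengths after cutting `sequence` at every `site`.

    Simplification (assumption): the cut is placed at the first base of
    the recognition site rather than at the enzyme's true cut position.
    """
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    # Skip zero-length fragments (a site at the very start of the sequence).
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


SITE = "GAATTC"  # EcoRI recognition sequence, a palindrome

# Hypothetical parental sequences: parent B carries a point mutation
# (GAATTC -> GAGTTC) that abolishes the second restriction site.
parent_a = "ATGC" * 3 + "GAATTC" + "TTAGGC" * 3 + "GAATTC" + "CCGTA" * 2
parent_b = "ATGC" * 3 + "GAATTC" + "TTAGGC" * 3 + "GAGTTC" + "CCGTA" * 2

fragments_a = digest(parent_a, SITE)  # [12, 24, 16]
fragments_b = digest(parent_b, SITE)  # [12, 40]
```

A probe that hybridizes downstream of the first site would reveal a 24-base fragment in parent A and a 40-base fragment in parent B, which is the length polymorphism scored on the blot.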
Apart from RFLPs, all the other marker types are
based on the polymerase chain reaction (PCR). PCR-based markers require the development of an oligonucleotide
primer, which is a fragment of DNA typically 15–30 nucleotides
in length, to serve as a starting point for PCR amplification on
template DNA. In a PCR reaction, template DNA is mixed with
primers, nucleotides, and a specific enzyme called Taq
polymerase, which polymerizes DNA fragments. The mixture
is placed into a thermal cycler and subjected to repeated cycles
of different temperatures to allow the template DNA to denature, the oligonucleotide primers to anneal to complementary
sites on the template DNA, and the Taq polymerase to catalyze
the synthesis of new DNA strands leading to the generation of
billions of copies of the target sequence. After the completion of
the PCR reaction, the amplified product is electrophoresed
through an agarose or polyacrylamide gel and subsequently
visualized by DNA staining or other technologies.
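The PCR steps described above can be sketched in Python as a simple in silico amplification. This is an illustration under stated assumptions: the template and primers are invented and far shorter than the 15–30 nucleotides typical of real primers, annealing is modeled as an exact string match, and the copy number assumes perfect doubling in every cycle.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the amplicon bounded by the two primers, or None if a primer fails to anneal."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given
    # strand we search for its reverse complement downstream of the forward primer.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def ideal_copies(cycles: int, starting_molecules: int = 1) -> int:
    """Copies after `cycles` rounds, assuming every cycle doubles the target."""
    return starting_molecules * 2 ** cycles


# Hypothetical template and primers.
template = "TTTT" + "GATTACA" + "GGGCCCAAA" + "CTTGAGG" + "CCCC"
amplicon = in_silico_pcr(template, forward="GATTACA", reverse="CCTCAAG")
copies_after_30_cycles = ideal_copies(30)  # 1073741824
```

Under the ideal-doubling assumption, about 30 cycles are enough to produce roughly a billion copies from a single template molecule.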
RAPD markers are DNA fragments from PCR-based amplification of random segments of genomic DNA with a primer of
arbitrary nucleotide sequences. RAPD markers were the first
PCR-based markers to be used but, today, have very limited
application in molecular biology and mapping studies due to
the unpredictability of short primers in PCR and low
repeatability.
AFLPs, which combine the use of restriction enzymes with
PCR, have been used extensively in a wide range of organisms.
http://dx.doi.org/10.1016/B978-0-12-394437-5.00220-5
GENETICS OF GRAINS | Genome Mapping
The AFLP technique uses restriction enzymes to digest the
genomic DNA followed by ligation of adapters to the sticky
ends of the restriction fragments to serve as priming sites for
PCR. Subsets of the restriction fragments are selected by using
primers with sequences complementary to the adapter
sequence and also one or two nucleotides within the restriction
fragments of the template DNA. The reactions often employ
end-labeled radioactive or fluorescent primers for the visualization of the amplified products on polyacrylamide gels. The
AFLP technology is also highly sensitive and reproducible and
has the capability to detect various polymorphisms in different
genomic regions simultaneously. AFLP has higher reproducibility, resolution, and sensitivity at the whole-genome level
compared to some of the other marker techniques, and it also
has the capability to amplify multiple fragments (50–100) in a
single PCR, which provides a high-throughput format.
STSs are short DNA sequences (200–500 bp) with known
genomic locations. STSs can be easily detected by the PCR
using specific primers. In complex genomes, STS markers
derived from the coding regions of genes, that is, the expressed
portion of genome referred to as expressed sequence tags
(ESTs), can be a very useful resource for mapping the locations
of expressed genes. These markers are usually codominant in
nature, which allows the identification of homozygous and
heterozygous individuals in a mapping population. The STS
sequences may contain repetitive elements with unique and
conserved sequences at both ends of the site, and in a broad
sense, the STS class includes markers such as microsatellites,
sequence-characterized amplified regions, cleaved amplified
polymorphic sequences, and inter-simple sequence repeats.
Microsatellite markers, also called SSRs, are widely used in
gene and genome mapping studies. These are simple sequence
tandem repeats, and the repeat units are generally di-, tri-, tetra-,
or pentanucleotides. In a common repeat motif (e.g., the trinucleotide motif (GAA)n in wheat), a unit built from the two nucleotides G and A is
repeated a variable number of times in a bead-like fashion
(n could range from 8 to 50). SSRs are usually found in noncoding regions of DNA, with a few exceptions. On both sides of
the repeat unit are flanking regions that contain unordered
DNA, and these flanking regions are essential for
developing locus-specific primers to amplify SSRs with PCR. The
number of repeats within a microsatellite tends to be highly
variable within a given species, which leads to a high frequency
of polymorphism even among closely related individuals.
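To make the structure of an SSR concrete, the sketch below scans a sequence for di- to pentanucleotide tandem repeats with a regular expression. The function name `find_ssrs`, the example sequence, and the default threshold of eight repeats (taken from the lower end of the range quoted above) are assumptions for illustration; dedicated SSR-mining tools handle imperfect repeats and other cases this sketch ignores.

```python
import re


def find_ssrs(sequence, min_repeats=8):
    """Return (start, motif, repeat_count) for perfect di- to pentanucleotide repeats."""
    # Group 1 captures a 2-5 base motif (shortest first); the backreference
    # requires at least `min_repeats - 1` further tandem copies of that motif.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    results = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip mononucleotide runs such as AAAA...
        repeat_count = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, repeat_count))
    return results


# Hypothetical locus: unique flanking DNA on both sides of a (GAA)10 repeat.
locus = "ACGTTGCT" + "GAA" * 10 + "TCCGATGC"
ssrs = find_ssrs(locus)  # [(8, 'GAA', 10)]
```

Two alleles that differ only in repeat number, for example (GAA)10 and (GAA)12, would give PCR products differing by 6 bp when amplified with primers placed in the flanking regions.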
Many large and complex genomes, especially those of some
plants, are composed of only about 10–20% gene sequences,
whereas the vast majority (80–90%) is composed of transposable
element (TE)-related sequences or repeat-based sequences. These
repetitive elements, or TEs, are widespread throughout the genome and
therefore represent a useful resource for whole-genome mapping.
These elements have higher levels of tolerance for mutations or
rearrangements, which makes them highly polymorphic and a
good source of markers for genome mapping. Various TE-based marker development approaches have been used,
and some of the most common repeat-based markers, which were
developed in wheat, belong to two classes: insertion site-based polymorphism markers and repeat junction markers. These
markers are based on PCR with primers designed in conserved
regions of TEs. In general, repeat sequences in the genome are not
unique, but the insertion sites or repeat junctions are. Therefore,
by developing primers that are specific to particular insertion sites
or repeat junctions, it is possible to develop genome-specific
markers (Figure 1). After the identification of an insertion site or
repeat junction, the flanking sequences can be used to design the
primers. After the fragment is PCR-amplified, there are various
detection methods available for visualizing the marker polymorphisms including high-resolution melting analyses, temperature
gradient capillary electrophoresis, and fluorescent capillary
electrophoresis.
With advances in next-generation sequencing (NGS) technology, it has become less expensive to determine the DNA sequence of a
fragment, and this has led to dramatic advances in high-throughput marker technologies. With restriction site-associated DNA (RAD) markers, the flanking DNA sequence
around each restriction site is an integral component for isolation of restriction site-associated tags. The application of the
flanking DNA sequences in RAD tag techniques is referred to as a
reduced-representation method. The RAD tag isolation procedure has been modified for use with high-throughput sequencing on the Illumina sequencing platform, to reduce error rates
and make the process high throughput. Isolated RAD tags can
be used to identify and genotype DNA sequence-based polymorphisms such as SNPs, and these polymorphic sites are
called RAD markers.
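The idea of a RAD tag, the stretch of sequence flanking each restriction site, can be sketched as follows. The function name `rad_tags`, the toy sequence, the tag length, and the use of the SbfI recognition sequence CCTGCAGG are assumptions chosen for illustration and are not taken from the text.

```python
def rad_tags(sequence, site, tag_length=20):
    """For each restriction site, return (position, upstream_tag, downstream_tag)."""
    tags = []
    pos = sequence.find(site)
    while pos != -1:
        # Sequence immediately to the left and right of the recognition site.
        upstream = sequence[max(0, pos - tag_length):pos]
        downstream = sequence[pos + len(site):pos + len(site) + tag_length]
        tags.append((pos, upstream, downstream))
        pos = sequence.find(site, pos + 1)
    return tags


SBFI_SITE = "CCTGCAGG"  # example enzyme choice (assumption)
toy_genome = "ACGTACGT" + SBFI_SITE + "TTGACCAA"
tags = rad_tags(toy_genome, SBFI_SITE, tag_length=4)  # [(8, 'ACGT', 'TTGA')]
```

Because only the sequence next to restriction sites is sampled, the same small subset of the genome is sequenced in every individual, which is what makes the approach a reduced-representation method.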
Figure 1 Types of repeat junctions in a given genomic DNA sequence that can be used for designing unique locus-specific markers (figure labels: transposable sequences, TE junction, gene or unknown sequence): (a) A repeat
junction between two different transposable elements (TEs). (b) Two repeat junctions with two different TEs (black and green) and an unknown sequence
(pink). (c) Repeat junction with a TE on one side and a gene fragment or unknown sequence on the other side. (d) Two repeat junctions (nested)
created by a TE inserting into another TE.
The advent of automated Sanger sequencing and especially
recent advances in NGS technologies led to the development
of a second generation of markers based on sequence information. SNPs differ by a single nucleotide A, T, C, or G at a given
locus between different individuals, populations, and parental
lines (Figure 2). If this variation occurs between the members
of the same population, these variations are considered alleles
(e.g., A or T), and most SNPs have only two alleles. SNPs have
emerged as the markers of choice because of their abundance
and high-throughput detection capacities. There are many
ways to identify SNPs starting from a low-throughput method
like PCR amplification followed by electrophoresis, sequence
detection, and mass spectrometry to high-throughput NGS-based SNP discovery. After generating sequences for SNP
discovery, the next step is to detect useful SNPs. Manual identification of putative SNPs had been a major bottleneck for
high-throughput SNP calling, but now, there are numerous
software programs available for SNP discovery. Programs such as CASAVA, GS Amplicon, BioScope™, NextGENe®,
GigaBayes, SNPdetector, and PolyScan provide
computational methods for accurate,
automated SNP calling. There are established approaches and
protocols for SNP discovery in many species, and for species
with reference genome sequences, NGS reads can be mapped
on the reference sequences and SNP discovery can be made.
However, SNP discovery can also be done in the species without a reference sequence. There are many assays available for
SNP genotyping including Illumina GoldenGate, KASPar,
iPLEX Gold technology, and Illumina BeadChips, to name a
few (Figure 3(a) and 3(b)).
Figure 2 Identification of SNP sites in a DNA sequence: Two SNPs between genomic DNA of two genotypes are shown. The length of the sequence is
60 base pairs, and genotype 1 and genotype 2 show variants at positions 21 [C/T] and 44 [G/C].
Genotype1 TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA
Genotype2 TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA
Figure 3 (a) Hybridization-based SNP genotyping method (Illumina Infinium assay): In this assay, the genomic DNA is captured by direct hybridization
to array-bound target sequences (50 bases directly upstream of the SNP). Following hybridization, a single-base extension reaction with
fluorescent dideoxynucleotides is performed at the target SNP nucleotide. Differences in the relative intensity of fluorescent signals can be used to make
genotyping calls. (b) PCR-based genotyping methods (Applied Biosystems’ TaqMan assay): For each locus, two common locus-specific primers are
designed on each side of the SNP to amplify the fragment spanning the polymorphic site. Two fluorescence resonance energy transfer (FRET)-labeled
oligonucleotides called TaqMan probes are then added to the PCR. Each probe is specific to one of the alleles and is designed to hybridize at
the SNP site between the forward and reverse primers. By design, these have a reporter dye at their 5′ end (different for each allele) and a quencher
(Q) at their 3′ end. If there is no reaction, the probes are intact and the reporter dye’s emission is suppressed by the quencher. During the PCR
amplification, the Taq polymerase cleaves the probe that anneals to the template and separates reporter and quencher resulting in the emission of
fluorescence from the reporter. Genotype calling can then be made according to the fluorescent signal.
Exciting progress has been made in
sequencing technologies that are providing high-throughput
molecular marker information at low costs. Genotyping by
sequencing (GBS) provides marker polymorphisms using
NGS technologies followed by a bioinformatics pipeline. It is
a preferred method for several reasons including reduced cost
through an enzyme-based genomic complexity reduction step
and the use of barcoded adapters for multiplexing. Additionally, it can be used for the discovery and identification of SNPs,
even for those species with complex genomes that lack a reference sequence. GBS has advantages when studying polyploid
species, which are a major challenge for any technology. It relies on
secondary genome-specific polymorphisms that are next to the
SNP, which allow a given sequence to be assigned to a
specific genome so that it becomes a single-locus marker.
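At its simplest, SNP identification between two aligned sequences is a position-by-position comparison. The sketch below applies this to the two 60-bp sequences shown in Figure 2; the function name `find_snps` is an assumption, and real SNP-calling software additionally weighs read depth and base quality, which are not modeled here.

```python
def find_snps(seq1, seq2):
    """Return (position, allele1, allele2) for each mismatch; positions are 1-based."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq1, seq2), start=1)
        if a != b
    ]


# The two genotypes from Figure 2.
genotype1 = "TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA"
genotype2 = "TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA"

snps = find_snps(genotype1, genotype2)  # [(21, 'C', 'T'), (44, 'G', 'C')]
```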
Genetic Linkage Mapping
Markers are powerful tools for many diagnostic applications, such as
typing biological samples to determine the identity of
unknown samples, resolving sample mixtures, supporting the criminal justice system,
and curating biological collections, to name a few. High-density genetic linkage maps facilitate map-based cloning,
quantitative trait mapping, marker-assisted breeding, and comparative genome evolution studies. Genetic mapping relies on the fact
that nuclear genomes are made up of chromosomes, which
contain both genes and noncoding DNA. When homologous
chromosomes pair at meiosis, they recombine at various positions along the chromosomes. Thus, recombination is the
basis for genetic linkage mapping and determining the order
of markers along the chromosome, that is, markers are separated by genetic distances calculated based on the amount of
meiotic recombination that occurs between them.
An example of genetic linkage mapping of three linked
markers in 20 F2 progeny is presented in Figure 4. The markers
include two DNA markers (A and B) and one morphological
marker (disease resistance gene ‘R’). The DNA markers are
codominant, and therefore, all possible genotypes can be
determined in the F2 progeny (homozygous for parent A,
homozygous for parent B, and heterozygous). For the morphological marker, disease resistance is dominant, and therefore,
the genotypic classes of heterozygous and homozygous for the
resistant parent (parent A) cannot be distinguished (resistant
plants can have allelic compositions of ‘RR’ or ‘Rr,’ and susceptible plants have ‘rr’). Inspection of Figure 4 indicates there are
three individuals (2, 6, and 12) with genotypes that differ
between markers A and B. Between A and R, there are two
individuals (6 and 12) with differing genotypes, and one individual (2) has differing genotypes between markers B and R.
This suggests that marker R (disease resistance gene) lies
between markers A and B. The two recombination events
between markers A and R translate into ten map units (2/20 × 100 = 10), and there are five map units between markers
B and R (1/20 × 100 = 5).
Figure 4 panel labels: Parent A, Parent B, F1, and F2 progeny 1–20, scored for Marker A, Marker B, and the disease resistance (R) gene.
Phenotype: R = resistant; S = susceptible
Genotypes: Parent A = RR; Parent B = rr; F1 = Rr; F2 progeny R = RR or Rr, S = rr
Linkage map: B, 5 map units, R gene, 10 map units, A
Figure 4 Genotypic data of two DNA markers (A and B) and phenotypic data for one morphological marker (disease resistance gene ‘R’) for
two parents, the F1 plant derived from crossing the two parents, and 20 F2 individuals. The DNA markers are codominant; thus, all possible genotypes
can be distinguished (homozygous for parent A and heterozygous and homozygous for parent B). The morphological marker ‘R’ is dominant,
and therefore, the genotypes of resistant F2 individuals cannot be distinguished (resistant plants can be either homozygous for parent A (RR) or
heterozygous (Rr)). The resulting genetic linkage map of the three loci and genetic distances separating them are shown at the bottom.
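The ordering and distances in this worked example can be reproduced with a few lines of Python. The sketch follows the chapter's simplified counting (individuals with differing genotypes divided by individuals surveyed, times 100); the function names and data layout are assumptions, and mapping software would normally use likelihood-based estimates and a mapping function instead.

```python
def map_units(recombinants, total):
    """Simplified map distance: percentage of surveyed individuals that are recombinant."""
    return 100 * recombinants / total


# F2 individuals whose genotypes differ between each pair of loci (from the example).
differing = {
    frozenset({"A", "B"}): {2, 6, 12},
    frozenset({"A", "R"}): {6, 12},
    frozenset({"B", "R"}): {2},
}
N_PROGENY = 20


def order_three_loci(differing):
    """The pair with the most recombinants is taken as the two outer loci."""
    outer = max(differing, key=lambda pair: len(differing[pair]))
    all_loci = set().union(*differing)
    (middle,) = all_loci - outer
    first, last = sorted(outer)
    return first, middle, last


order = order_three_loci(differing)  # ('A', 'R', 'B')
distance_a_r = map_units(len(differing[frozenset({"A", "R"})]), N_PROGENY)  # 10.0
distance_b_r = map_units(len(differing[frozenset({"B", "R"})]), N_PROGENY)  # 5.0
```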
This type of analysis can be applied to hundreds, or even
thousands of markers to construct complete genetic linkage
maps of chromosomes. Fortunately, there are various computer software programs available to handle such large data
sets and to determine the most likely marker orders and intermarker distances.
The number of individuals surveyed in a mapping population determines the precision of the genetic distance measured.
In the example, only 20 individuals were surveyed, and if no
recombinants were identified between two markers, this would
translate to a genetic distance of 0 map units between the
markers. If 100 individuals were surveyed, then one or more
recombinants may be identified leading to a genetic distance of
one or more map units. Generally, initial genetic maps of plant
species are generated using 80–120 individuals, which allows
for the detection of recombination between markers one to
three map units apart. This level of precision is considered
acceptable, and, at the same time, the amount of labor and
cost is considered manageable. However, certain mapping
experiments such as map-based cloning of genes by chromosome walking require much higher resolution in order to
separate markers extremely close to the target gene. In these
experiments, it is not uncommon to survey 3000–5000 individuals to obtain the necessary level of precision.
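The link between population size and resolution can be made explicit with a small calculation. This sketch keeps the simplified model used in the example above (each individual counted as recombinant or not) and further assumes that individuals are independent; the function names are illustrative.

```python
def smallest_nonzero_distance(n_individuals):
    """Map units implied by a single recombinant among n individuals."""
    return 100 / n_individuals


def prob_at_least_one_recombinant(map_distance, n_individuals):
    """Chance of seeing one or more recombinants between markers `map_distance` units apart.

    Assumes each individual is independently recombinant with
    probability map_distance / 100.
    """
    r = map_distance / 100
    return 1 - (1 - r) ** n_individuals


for n in (20, 100, 3000):
    print(n, smallest_nonzero_distance(n), round(prob_at_least_one_recombinant(1, n), 3))
```

With 20 individuals, one recombinant already corresponds to 5 map units, and markers 1 map unit apart will usually show no recombinants at all, which is why fine mapping requires populations of several thousand individuals.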
In plants, most populations are derived from crossing two
highly homozygous parents. The population shown in the
example in Figure 4 is an F2 population. While F2 populations
are commonly used and generally a good choice for chromosome mapping, other types of populations, such as backcross
(BC), doubled-haploid (DH), and recombinant inbred (RI),
are also commonly used. However, DH technology is not easily
accomplished in some crops, and it is currently impossible in
others. Each type of population has its advantages and disadvantages. F2, BC, and DH populations can be developed very
rapidly, while RI populations are developed by advancing each
line by single-seed descent for many generations with the goal
of selfing to homozygosity. F2 and BC populations are short-lived and provide limited opportunity to obtain DNA and
phenotypic data, while DH and RI populations provide essentially pure lines that may be tested for traits in replicated
experiments over several environments if desired. Thus, RI
and DH populations are preferred for mapping of quantitative
traits that may be affected by environmental influences. BC
and DH populations result from one cycle of meiosis, but an F2
population has undergone recombination in both male and
female gametes and, therefore, provides twice the recombination information. RI populations have undergone several
cycles of meiosis but contain two identical homologues and,
therefore, provide about the same amount of information
as an F2.
The development and analysis of genetic linkage maps lead
to an abundance of information regarding genome structure.
From a more applied perspective, they provide knowledge
regarding the locations of genes and DNA markers associated
with them. In a segregating population, morphological
markers can be scored and analyzed in the same manner as
DNA markers. The difference in scoring for morphological
markers compared to DNA markers lies in the fact that, for
morphological markers, the genotype is determined based on
the visualization of the plant’s phenotype, while DNA markers
are scored at the DNA level. For example, a population segregating for resistance to a particular disease would be scored
based on the reaction of each individual to the disease as being
one of either parental type. Inclusion of this phenotypic data
with genotypic DNA marker data for map generation might
reveal that the disease resistance gene is flanked by closely
linked DNA markers. Such markers are valuable tools that
can be employed by plant breeders who wish to move the
disease resistance gene into elite lines for the development of
new and improved varieties. Using the markers to make selections is known as marker-assisted selection (MAS). MAS has
advantages over selecting for the trait itself in that markers are
not affected by environmental factors as phenotypic traits
sometimes are. In addition, MAS allows breeders to make
selections in early generations and growth stages allowing
them to eliminate undesirable material early on.
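A marker-assisted selection step can be sketched as a simple filter on flanking-marker genotypes. The plant names, the scores, and the genotype codes ("A" = homozygous for the resistant parent, "H" = heterozygous, "B" = homozygous for the susceptible parent) are hypothetical and serve only to illustrate the idea.

```python
def select_plants(marker_scores, wanted=("A", "H")):
    """Keep plants carrying the resistant-parent allele at both flanking markers."""
    return [
        plant
        for plant, (left_marker, right_marker) in marker_scores.items()
        if left_marker in wanted and right_marker in wanted
    ]


# Hypothetical seedling scores at the two markers flanking the resistance gene.
scores = {
    "plant1": ("A", "A"),
    "plant2": ("H", "B"),  # recombinant between the markers: not selected
    "plant3": ("B", "B"),
    "plant4": ("H", "A"),
}
selected = select_plants(scores)  # ['plant1', 'plant4']
```

Requiring both flanking markers reduces the chance that a recombination event between a single marker and the gene leads to a wrong selection.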
In a nutshell, genetic mapping is a great resource for trait
mapping and map-based cloning studies; however, a genetic
map is not sufficient for sequencing a genome. Polymorphism
level is often a limitation for genetic mapping; however,
the advanced and low-cost NGS approaches have been a big
boost to overcome this limitation. GBS has been widely
accepted and is now being used to map target traits in various
crops. In addition, sequence-based genome mapping, where
members (mostly RI lines) of a given mapping population are
sequenced to high coverage, is now gaining momentum as
well. This approach is very useful for generating a large number
of sequence tags, which can be assembled and anchored on the
genetic map in a chromosome-wise manner. However, precise
ordering of these tags/contigs will be an issue due to the
limitation of genetic mapping in terms of the number of
recombination events. The resolution of a genetic map
depends on the number of recombination events that have
been scored in a given population. Recombination events are
not uniformly distributed across the length of the chromosome
as recombination is suppressed around the centromeric
regions. So reduced or nearly absent recombination events
affect the resolving power of linkage analysis, which means
that genes that are several kilobases to megabases apart may
appear at the same position on the genetic map.
Physical Mapping
In contrast to genetic mapping where distances between landmarks are calculated based on recombination frequency, physical mapping determines the actual physical distance. Physical
mapping can be done cytologically by chemically staining and
viewing whole chromosomes using techniques such as in situ
hybridization (ISH) and C-banding. Such techniques have very
low resolution in terms of physical mapping because chromosomes are viewed at the cellular level usually at metaphase.
However, recent techniques, such as fiber-fluorescence in situ
hybridization (fiber-FISH), where nuclei are lysed on a glass
slide and the released DNA is used for in situ mapping, can provide a much higher
resolution (see succeeding text). The highest-resolution physical mapping is obtained by sequencing the DNA itself. It is
usually preceded by constructing local contiguous sequences
(contigs) of large-insert DNA clones and anchoring the contig
to a genetic map.
In Situ Hybridization
The ISH technique was developed about 45 years ago and
allows the localization of genes or DNA sequences directly on
chromosomes in cytological preparations. The ISH technique
uses probe DNA that is labeled with biotinylated dUTP or
digoxigenin-dUTP and the hybridization sites are detected by
enzymatic reporter molecules such as horseradish peroxidase
or alkaline phosphatase-conjugated avidin/streptavidin. ISH
has been used successfully to determine the physical location
and distribution of dispersed or tandemly repetitive DNA
sequences on individual chromosomes. For example, it has
been used to determine the physical location of multicopy
gene families such as the 5S and 18S–26S ribosomal genes.
FISH uses fluorochromes for signal detection. The FISH
technique allows different DNA probes to be labeled with
different fluorochromes that emit different colors (multicolor
FISH). Thus, the physical order of two or more probes on a
chromosome can be determined simultaneously. Also, FISH
can allow more precise mapping of probes because the fluorescence signals can be analyzed with special cameras and
digital imaging tools.
In humans, the order of two DNA probes can be determined by ISH on metaphase chromosomes only if the two
sequences are separated by at least 1 Mb. However, when ISH
is done using interphase nuclei, DNA sequences separated by
as little as 50 kb can be resolved. Plant metaphase chromosomes are more condensed than human metaphase
chromosomes, and this may be one reason why ISH using
low-copy probes is more difficult in some plant species. Thus,
it has been suggested that interphase nuclei can be exploited
for ISH mapping in plants. Subsequently, experiments where
DNA probes were hybridized to maize interphase nuclei suggested that the resolving power of interphase FISH mapping
can be as little as 100 kb.
The FISH technique has been used successfully to determine the
physical location of bacterial artificial chromosome (BAC)
clones on interphase and metaphase chromosomes. Rice BAC
clones have been hybridized to rice (Oryza sativa L.) chromosomes revealing that the repetitive DNA sequences in the BAC
clones could be efficiently suppressed by using rice genomic
DNA as a competitor in the hybridization mixture. The successful application of this technique to plants with very large
genomes may depend on the size of the genomic clones analyzed and the amount of repetitive sequences in the genome.
Fiber-FISH
The fiber-FISH technique uses chromatin DNA fibers extended across a
glass slide. A probe is labeled as with standard FISH and
hybridized to the extended fibers, so that DNA sequences
only a few kilobases apart can be ordered. In
humans, fiber-FISH has been used to analyze overlapping
clones, detect chromosomal rearrangements, determine the
physical distances between genes, measure the sizes of long
DNA loci, and aid in the positional cloning of specific genes.
Fiber-FISH was used in Arabidopsis thaliana to measure clusters
of DNA repeats as long as 1.71 Mb, which is more than 1% of
the Arabidopsis genome. It was found that fiber-FISH signals
derived from small DNA fragments (<3 kb) were often
observed as single spots on extended DNA fibers, and thus,
sequences that are less than 5–10 kb apart cannot be ordered.
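The resolving powers quoted in the preceding sections can be collected into a small lookup that reports which cytological techniques could order two probes a given distance apart. The dictionary layout and function name are assumptions; the values are those stated in the text, with the upper end (10 kb) of the 5–10 kb limit used for fiber-FISH.

```python
# Minimum separation (in base pairs) at which two probes can be ordered.
RESOLUTION_BP = {
    "metaphase ISH (human)": 1_000_000,
    "interphase ISH (human)": 50_000,
    "interphase FISH (maize)": 100_000,
    "fiber-FISH": 10_000,
}


def techniques_that_resolve(distance_bp):
    """List the techniques whose resolution limit does not exceed `distance_bp`."""
    return [name for name, limit in RESOLUTION_BP.items() if distance_bp >= limit]


print(techniques_that_resolve(75_000))
# ['interphase ISH (human)', 'fiber-FISH']
```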
Single-Copy Gene FISH
Single-copy gene FISH is an approach to develop a cytogenetic
map of a given chromosome using full-length cDNA (fl-cDNA)
probes. Because genes and gene syntenic blocks are conserved
between different grass species such as wheat, barley, rice, and
maize, single-copy FISH provides a rapid method for determining chromosome synteny for species for which little genetic or
cytogenetic mapping information is available. When
transferring important genes from wild relatives to bread wheat
(Triticum aestivum L., 2n = 6x = 42, AABBDD) by induced
homoeologous recombination, it is important to know the
chromosomal relationships of the species involved. Single-copy FISH provides a powerful and rapid method for determining genetic relationships of relatively little-studied wild
relatives with those of wheat. Once identified from single-gene markers, fl-cDNA probes are used for FISH and the respective positions of these probes are determined to develop a
cytogenetic map. This technique can also be used to identify
structural changes between the homoeologous groups of chromosomes, between the genomes of wheat, and other species
from the Triticeae tribe. This provides important information
on the strategies to be used for exploitation of those species for
wheat improvement.
Aneuploid Mapping
Wheat is a polyploid and can tolerate a high degree of aneuploidy (abnormal chromosome numbers). There is a vast
array of aneuploid stocks, such as nullisomic–tetrasomic
(NT) lines and ditelosomic (dt) lines. NT lines lack one
pair of chromosomes, which is compensated by an extra pair of homoeologous chromosomes, and allow chromosome mapping of genes. Ditelosomic lines
lack one pair of chromosome arms and allow arm mapping of
genes.
With today’s molecular technology, the power and utility of
the wheat aneuploids have been even more fully realized. DNA
markers can be quickly located to a specific chromosome or
chromosome arm using a single hybridization or amplification
reaction without the need for polymorphism. Telocentric chromosomes can be flow-sorted and DNA-amplified and used for
NGS for marker development. Dense chromosomal arm maps
have been developed and genes identified and ordered to
specific chromosome arms. These maps are useful for gene
tagging, linkage and mapping of quantitative trait loci (QTL),
cytogenetic manipulations, estimation of genetic distance, and
evolutionary studies.
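The logic of locating a marker with aneuploid stocks, without any need for polymorphism, can be sketched as follows. The data are hypothetical: a chromosome-specific marker is assumed to amplify in every nullisomic–tetrasomic line except the one lacking the chromosome that carries it.

```python
def assign_chromosome(amplification):
    """Return chromosomes whose absence abolishes the marker.

    `amplification` maps the chromosome missing in each NT line to
    True (marker amplified) or False (no product).
    """
    return [chrom for chrom, amplified in amplification.items() if not amplified]


# The 21 chromosomes of bread wheat: groups 1-7 in genomes A, B, and D.
chromosomes = [f"{group}{genome}" for group in range(1, 8) for genome in "ABD"]

# Hypothetical result: the marker fails only in the line nullisomic for 3B.
results = {chrom: chrom != "3B" for chrom in chromosomes}
location = assign_chromosome(results)  # ['3B']
```

The same test applied to ditelosomic lines, which lack one chromosome arm, narrows the location to the short or long arm.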
Chromosome Deletion Mapping
A unique system in wheat is the use of gametocidal (Gc) factors
to construct chromosome deletion lines. Gc chromosomes
were introduced into wheat by interspecific hybridization with
the related Aegilops species and backcrossing. Plants monosomic for the Gc chromosome produce two types of gametes.
Only those gametes possessing the Gc chromosome are normal. Gametes lacking the Gc chromosome undergo structural
chromosome aberrations and, in most cases, are nonfunctional. However, if the damage caused by the chromosome
breakage is not sufficient to kill the gamete, it may still function
and be transmitted to the offspring.
The Gc system has been used to develop wheat lines with
terminal chromosome deletions. These stocks have proved very
useful for the physical mapping of genes and DNA markers to
subarm locations and for the development of physical maps,
which have been constructed for all seven homoeologous
chromosome groups of wheat. In addition, chromosome bin
maps of most of the expressed genes in the wheat plant have
been constructed using a set of wheat aneuploid and deletion
lines (http://wheat.pw.usda.gov/wEST/binmaps/).
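The bin assignment logic behind deletion mapping can be sketched in a few lines. This is a minimal illustration, not the procedure used to build the published bin maps: it assumes each deletion line is described by a hypothetical breakpoint given as a fraction length (FL) of the arm retained, measured from the centromere, and that a marker is scored simply as present or absent in each line. The line names and values are invented for the example.

```python
def assign_bin(breakpoints, present):
    """Place a marker in a deletion bin on one chromosome arm.

    breakpoints: dict mapping deletion line -> fraction of the arm retained
                 (0.0 = centromere, 1.0 = telomere); hypothetical values.
    present:     dict mapping deletion line -> True if the marker is detected.
    Returns (proximal_limit, distal_limit) of the bin holding the marker.
    """
    # A marker missing from a terminal deletion line lies distal to its breakpoint.
    absent_fl = [fl for line, fl in breakpoints.items() if not present[line]]
    # A marker retained in a deletion line lies proximal to its breakpoint.
    present_fl = [fl for line, fl in breakpoints.items() if present[line]]
    proximal = max(absent_fl, default=0.0)
    distal = min(present_fl, default=1.0)
    if proximal > distal:
        raise ValueError("scores are inconsistent with simple terminal deletions")
    return proximal, distal


# Hypothetical deletion lines for one arm and the scores for one marker.
breakpoints = {"del-1": 0.30, "del-2": 0.55, "del-3": 0.80}
marker_scores = {"del-1": False, "del-2": False, "del-3": True}
marker_bin = assign_bin(breakpoints, marker_scores)
print(marker_bin)
```

In this example the marker is absent from the two larger deletions and present in the smallest one, so it falls in the bin between FL 0.55 and FL 0.80.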
HAPPY Mapping
HAPPY mapping is another approach that has been used for genome mapping studies. Its
name reflects its basis: haploid DNA samples analyzed using
the polymerase chain reaction (HAPPY). HAPPY mapping
does not require marker polymorphism or time-consuming
population development. It is an in vitro approach for the
ordering of DNA markers directly on native genomic DNA
and is based on analyzing the segregation of markers amplified
from high-molecular-weight genomic DNA. It is a three-step
process. First, genomic DNA is isolated and broken into random fragments
using gamma irradiation or mechanical shearing. The quality and integrity of the
isolated DNA are the
most important aspect of the technique, and various protocols
have been tested and used to avoid unwanted mechanical
breakage of the DNA molecules. Usually, living cells are embedded in agarose gel; during DNA extraction, the long
molecules of chromosomal DNA remain trapped and protected within the agarose. The high-quality DNA (DNA solution) is then subjected to random fragmentation using
mechanical shearing, gel melting, or x-ray treatment. The
average size of the broken fragments depends on the radiation dosage
or the degree of mechanical shearing used. The second step is the development of a ‘mapping panel’: the broken DNA
fragments are diluted to a very low concentration, and 100
samples from individual treatments are usually dispensed into
DNA collecting plates or tubes. Because these samples are very
small, each well or tube contains a small, incomplete set
of random fragments. The third and final step involves a highly
sensitive PCR followed by the scoring of markers as present or
absent in each member of the HAPPY mapping panel.
Genotyping of large sets of markers and detailed analysis of
marker data can be used for the construction of maps and to
calculate precise locations of markers on a given chromosome
or genome. Because the samples in a mapping panel are so
small, each one will contain only a randomly sampled
subset of the markers rather than the complete genome, and a
given marker tested on the panel will be present in only a
subset of the panel members. If two marker loci are close together, they
will tend to remain on the same broken fragment and be found in the same
samples, whereas distant markers are more likely to be separated by a break.
As the distance between two markers increases, the frequency of random breaks between them also increases.
Statistical analysis of the cosegregation frequencies, using various mapping software, can be used to deduce marker
order and a map from the data generated on the HAPPY mapping
panel. This approach has certain limitations.
The first is that it is difficult to prepare DNA fragments of more
than a few megabases in size, and therefore, intermarker distances of more than one megabase are difficult to measure.
Another major limitation is the very small amount of DNA in each sample of the
mapping panel, because all markers need to be scored on it by PCR.
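The cosegregation idea can be illustrated with a small presence/absence table. The sketch below is not the statistic used by any particular mapping program (which typically rely on LOD scores); it simply compares how often two markers are found in the same panel sample with how often that would be expected if they segregated independently. The panel, marker names, and scores are hypothetical.

```python
def cosegregation(panel, a, b):
    """Return (observed, expected) co-occurrence frequency for markers a and b.

    panel: dict mapping marker name -> list of 0/1 scores, one per panel sample.
    observed: fraction of samples containing both markers.
    expected: product of the two individual retention frequencies, i.e. the
              co-occurrence expected for unlinked (independent) markers.
    """
    n = len(panel[a])
    both = sum(1 for x, y in zip(panel[a], panel[b]) if x and y)
    freq_a = sum(panel[a]) / n
    freq_b = sum(panel[b]) / n
    return both / n, freq_a * freq_b


# Hypothetical scores for three markers on an eight-sample panel.
panel = {
    "M1": [1, 1, 0, 0, 1, 0, 1, 0],
    "M2": [1, 1, 0, 0, 1, 0, 0, 0],
    "M3": [0, 1, 1, 0, 0, 1, 0, 1],
}

close_pair = cosegregation(panel, "M1", "M2")
distant_pair = cosegregation(panel, "M1", "M3")
print(close_pair, distant_pair)
```

Here M1 and M2 occur together more often than chance would predict, which suggests they lie close together, whereas M1 and M3 do not. A real panel would contain many more samples, and the excess cosegregation would be converted into a distance estimate.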
RH Mapping
RH mapping has been exploited in animal genome mapping
projects and is a recombination-independent approach. It was
pioneered in the human genetics arena and uses radiation-induced chromosome breakage rather than meiotic recombination for mapping. After fragmentation, samples containing
different subsets of the original chromosome or genome are
isolated and used for marker assays. In this method, any given
mapping panel member is assayed for the presence or absence
of a given marker, thus circumventing the need for marker
polymorphisms between genotypes.
Goss and Harris produced the first RHs by irradiating
cultured human cells with a high dose of x-rays and
subsequently fusing them to unirradiated hamster cells. The resulting
RHs contained many broken fragments of human chromosomes
alongside the unfragmented chromosomes of the hamster cells. The
approach was then modified and applied to a number of
animal species. In the modified approach, donor cells are
irradiated and then fused to unirradiated host cells, and RHs
containing donor chromosome fragments are identified using
selectable markers for a given species. Species-specific RHs can
be isolated, cultured, and saved as an immortal resource.
For genome (RH) mapping, the DNA of 100 hybrid cell
lines (each containing a different set of donor fragments) can
be assembled as an RH panel. The assembled panel can be used
for marker genotyping and the order and distances of the
markers in a given genome can be inferred. Mapping resolution in an RH panel is a function of the size of the fragments
that are generated during the development of the mapping
panel. Therefore, the mapping resolution can be altered by
simply changing the level of chromosome fragmentation.
Additionally, map distances in RH maps reflect the true
physical distance between markers better than those in recombination-based maps, so maps constructed by the RH approach can
better approximate the physical layout of a given chromosome.
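One common way to infer marker order from an RH panel is to look for the order that requires the fewest radiation-induced breaks to explain the observed presence/absence patterns. The sketch below illustrates that criterion by brute force on a tiny example; it is an assumption-laden illustration rather than the algorithm of any specific RH mapping package, and the marker names and scores are hypothetical. Exhaustive search is only practical for a handful of markers.

```python
from itertools import permutations


def obligate_breaks(order, scores):
    """Count the breaks needed to explain the panel for a given marker order.

    scores: dict mapping marker -> list of 0/1 retention scores, one per RH line.
    A break is counted whenever two adjacent markers differ in the same line.
    """
    n_lines = len(next(iter(scores.values())))
    total = 0
    for i in range(n_lines):
        for left, right in zip(order, order[1:]):
            if scores[left][i] != scores[right][i]:
                total += 1
    return total


def best_order(scores):
    """Return the marker order with the fewest obligate breaks (exhaustive search)."""
    markers = sorted(scores)
    return min(permutations(markers), key=lambda o: obligate_breaks(o, scores))


# Hypothetical retention scores for four markers across five RH lines.
scores = {
    "A": [1, 0, 0, 1, 0],
    "B": [1, 1, 0, 1, 0],
    "C": [0, 1, 1, 1, 0],
    "D": [0, 0, 1, 0, 1],
}

order = best_order(scores)
print(order, obligate_breaks(order, scores))
```

An order and its reverse always need the same number of breaks, so the search returns whichever of the two it meets first. Heavier irradiation produces more breaks between neighboring markers, which is why the resolution of a panel can be tuned through the radiation dose.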
The RH approach has been used to map the human genome
along with various animal genomes; however, its application
in plants has been limited. RH mapping in plants was first
reported for a maize chromosome, and then, it was applied
to cotton, barley, and wheat. Recently, RH mapping was used
for genome mapping of hexaploid wheat (Figure 5). Figure 5
presents a scheme for the development of an RH panel for D-genome chromosomes of hexaploid wheat. Pollen from the
reference hexaploid wheat Chinese Spring was irradiated using
gamma radiation, and this pollen was used to pollinate the tetraploid wheat line Altar 84. The F1 seeds (pentaploid)
represent an RH panel, and each plant from these seeds represents
a unique RH event. Chromosome lesions induced in the A and
B genomes of Chinese Spring are masked in these quasi-pentaploids due to the presence of A and B genome chromosomes from the tetraploid parent, but the chromosomes from
the D genome are present in one copy and allow RH mapping
of all D-genome chromosomes simultaneously. It has been
found that using a small RH panel (94 lines), map resolution
of up to 300 kb can be achieved throughout the length of any
given chromosome in hexaploid wheat. The RH panel can be
used to anchor and order BAC contigs derived from flow-sorted chromosome arm-specific libraries to individual wheat
chromosomes. RH panels will also be highly useful for ongoing wheat genome sequencing projects for ordering of
sequence scaffolds.
Figure 5 Development of Chinese Spring D-genome radiation hybrid panel: The spikes of hexaploid wheat cultivar Chinese Spring (T. aestivum;
2n = 6x = 42, AABBDD) were used for γ-irradiation. Pollen from irradiated spikes was immediately used to pollinate the stigmas of emasculated florets
(male anthers removed) of tetraploid wheat variety Altar 84 (T. turgidum; 2n = 4x = 28, AABB). Seeds of F1 hybrids were harvested 20 days after
pollination. Each surviving F1 seed (RH1-pentaploid) on germination represents a unique RH event. DNA samples of the individual RH1 plants were then
harvested and genotyped for RH mapping.
[Scheme: irradiated Chinese Spring pollen (n = 3x = 21, ABD) × Altar egg (n = 2x = 14, AB) → RH1 (2n = 5x = 35, AABBD); RH1 plants are grown in the greenhouse, and tissue is collected for DNA extraction and genotyping.]
Large-Insert Clone Contigs
The construction of physical contig maps is important for
facilitating positional cloning of genes, sequencing of genomic
DNA, and detailed analysis of chromosome and genome structure. Physical contig mapping is the arrangement of large-insert
clones (YACs, BACs, and cosmids) in a linear array that
represents the DNA sequence along the chromosome. Clones
are selected by screening a library with DNA probes used to
detect genetic markers on a genetic linkage map of the organism. Several DNA probes that detect closely linked genetic loci
will hybridize to corresponding large-insert clones, and these
clones can then be arranged into a contig based on overlapping
segments and fingerprinting. BAC contigs are currently being
developed in many crop species. However, crops with complex
genomes pose major problems because of their large genome size, polyploid nature, and very high percentages of repetitive sequences.
To address these issues in wheat, a sophisticated flow-sorting technique was applied for isolation of individual
chromosomes or chromosome arms. The DNA from these
flow-sorted chromosomes and arms was used for the development of BAC libraries. These BAC libraries laid the foundation
for the physical mapping of the wheat genomes. Once a physical contig map is complete, the structure and organization of
the genome, such as the distribution of repetitive and single-copy sequences, can be discerned. A BAC-by-BAC approach has
been considered as the most suitable approach for generating
reference genome maps of barley and wheat. In this method, a
BAC library for an individual chromosome is the starting point
and BAC contigs are constructed from individual BACs by
identifying BACs containing overlapping fragments. Ideally
then, the BAC contigs are anchored onto a genetic or RH map
of the genome, so that the sequence data from the contig can
be checked and interpreted by looking for markers or genes
known to be present in a particular region. The BACs constituting the minimum tiling path are then individually
sequenced by the shotgun method and assembled into a pseudomolecule providing a sequence of each chromosome.
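Choosing a minimum tiling path can be sketched as a classic interval-covering problem. The example below assumes, purely for illustration, that each BAC in a contig has already been given approximate start and end coordinates (for instance from fingerprint overlaps); the clone names and coordinates are hypothetical. A greedy scan then picks, at each step, the overlapping clone that extends furthest along the contig.

```python
def minimum_tiling_path(clones):
    """Pick the fewest overlapping clones that span a contig.

    clones: list of (name, start, end) tuples with assumed contig coordinates.
    Returns the clone names in path order; raises ValueError if there is a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    target_end = max(c[2] for c in clones)
    covered = ordered[0][1]  # leftmost coordinate of the contig
    path = []
    i = 0
    while covered < target_end:
        best = None
        # Among clones that start within the covered region, keep the one
        # reaching furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError("gap in contig: no clone extends the covered region")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical BAC coordinates (kb) within one contig.
bacs = [
    ("BAC1", 0, 120),
    ("BAC2", 50, 150),
    ("BAC3", 100, 260),
    ("BAC4", 140, 230),
    ("BAC5", 240, 380),
    ("BAC6", 250, 400),
]

tiling_path = minimum_tiling_path(bacs)
print(tiling_path)
```

In this example three of the six clones are enough to span the contig. In practice a minimum overlap between neighboring clones would normally be required so that their sequences can be joined reliably.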
Comparing Physical Distance to Genetic Distance
Physical maps have led to a wealth of information regarding
the physical locations of morphological traits and evolutionary
translocation breakpoints and genome-wide structure and
organization. Comparisons of the physical maps with genetic
linkage maps can reveal the physical distribution of genes and
recombination along the chromosome. For example, RFLP
probes derived from mRNA (called cDNA probes) represent
expressed genes, and thus, the physical mapping of cDNA
probes will reveal the physical locations of expressed genes.
Therefore, when sets of cDNA probes are mapped genetically as
well as physically, one can infer the relationship between
physical distances and genetic distances among the common
markers. In wheat, physical maps constructed using the chromosome deletion lines have been compared extensively to
corresponding genetic maps of the same chromosomes. This
work has revealed that genes and DNA markers tend to be
clustered in small physical segments that undergo a high
degree of recombination (Figure 6). These gene-rich regions
are separated by large gene-poor segments that undergo very
little recombination. This work has facilitated BAC contig construction of regions containing genes of interest for the purpose
of positional cloning.
In barley, physical maps generated based on translocation
breakpoints were compared to corresponding genetic linkage
maps. The results agreed with those found in wheat by deletion
mapping and showed that the barley genome consists of relatively small gene-rich regions that are hot spots for recombination interspersed among large segments that are gene-poor and
undergo very little recombination. The information obtained
by physical mapping of translocation breakpoints has facilitated the construction of BAC contigs and positional cloning of
important genes by allowing researchers to focus on the gene-rich regions of the genome. More intricate comparisons of
physical and genetic relationships can be obtained by comparing local BAC contigs to genetic maps. The primary goal of such
experiments is to identify a large-insert clone containing a gene
of interest, but additional important information is obtained.
For example, once a physical contig map of the region is
developed, it can be compared to the genetic linkage map of
the corresponding region to calculate physical to genetic distance ratios. This is important information because recombination is known to be distributed nonrandomly throughout
the genomes of many plant species causing the physical to
genetic distance ratios to be highly variable depending on the
characteristics of the region.
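The physical to genetic distance ratio is a simple quotient, and a short calculation shows how strongly it can vary. The interval names and values below are hypothetical and chosen only to contrast a recombination-rich distal region with a recombination-poor proximal one; they are not measurements from wheat or barley.

```python
def mb_per_cm(physical_mb, genetic_cm):
    """Physical-to-genetic distance ratio in megabases per centimorgan."""
    if genetic_cm == 0:
        # No recombination observed: the ratio is unbounded.
        return float("inf")
    return physical_mb / genetic_cm


# Hypothetical intervals: (name, physical size in Mb, genetic size in cM).
intervals = [
    ("distal gene-rich interval", 2.0, 10.0),
    ("proximal gene-poor interval", 150.0, 1.5),
]

ratios = {name: mb_per_cm(mb, cm) for name, mb, cm in intervals}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} Mb/cM")
```

With these illustrative numbers the same genetic distance corresponds to a physical distance that is several hundred times larger in the proximal interval, which is why a local estimate of the ratio matters when planning positional cloning.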
Comparative Mapping
Much effort has been put forth in comparing the genomic
relationships among grasses and among members of other
plant families. For example, comparative mapping experiments among members of the Poaceae such as wheat, rice,
barley, rye, oat, and maize have revealed remarkable similarities in gene content and marker synteny at the chromosome
level. It is well established that DNA probes cloned from these
related species commonly identify sets of orthologous loci that
lie at approximately the same positions relative to each other
and to the centromeres. GenomeZipper-based consensus
maps, which integrate ordered gene loci from homoeologous
wheat genomes and the corresponding chromosomes of barley, Ae. tauschii, T. monococcum, and rice, have been constructed. These experiments have shown that the genomes of
barley, Ae. tauschii, and T. monococcum are essentially colinear
with that of wheat. The genomes of more distantly related
cereals such as oat, rice, and maize can be divided into linkage
blocks that have homology to corresponding segments of the
wheat genome. The degree of genomic similarities observed at
the chromosome level among grass genomes led to the notion
that information from the small genome of rice could be
directly applied to the much larger genome of wheat. However,
even though a substantial degree of synteny is observed at the
chromosome level, studies of the degree of microcolinearity
between rice and wheat show less promise for gene discovery
in wheat. Genes whose order is conserved across grass species
with sequenced genomes can be used to predict the order of
corresponding genes conserved in other grass species using
synteny-based analysis.
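The idea of a synteny-based virtual gene order can be sketched as follows. This is a deliberately simplified illustration and not the GenomeZipper implementation: it assumes a table of orthologs linking genes of an unsequenced target species to genes of a single sequenced reference, and it orders the target genes by the reference positions of their orthologs. All gene identifiers and coordinates are hypothetical.

```python
def virtual_gene_order(orthologs, reference_positions):
    """Order target genes by the position of their orthologs in a reference.

    orthologs:           dict mapping target gene -> reference gene.
    reference_positions: dict mapping reference gene -> (chromosome, position).
    Target genes whose ortholog has no known position are left out.
    """
    placed = [
        (reference_positions[ref_gene], target_gene)
        for target_gene, ref_gene in orthologs.items()
        if ref_gene in reference_positions
    ]
    # Sorting by (chromosome, position) yields the inferred linear order.
    return [target_gene for _, target_gene in sorted(placed)]


# Hypothetical reference annotation and ortholog assignments.
reference_positions = {
    "ref_g1": ("chr1", 100),
    "ref_g2": ("chr1", 250),
    "ref_g3": ("chr1", 400),
    "ref_g4": ("chr2", 50),
}
orthologs = {
    "tgt_a": "ref_g3",
    "tgt_b": "ref_g1",
    "tgt_c": "ref_g4",
    "tgt_d": "ref_g2",
    "tgt_e": "ref_missing",
}

predicted_order = virtual_gene_order(orthologs, reference_positions)
print(predicted_order)
```

Because microcolinearity is often interrupted, an order predicted in this way is only a working hypothesis and needs to be checked against genetic or physical map data from the target species.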
There have been exciting developments in genome mapping studies in grasses in terms of the development of high-density genetic maps and physical maps. This was followed by
the generation of EST databases in cereals. In the recent past,
large-scale genome sequencing projects in grasses have been
successfully implemented, the list including rice, Brachypodium,
sorghum, maize, and foxtail millet. These studies provided
extensive information on the genome organization of major
cereals. Knowledge gained from the genome sequencing has
enhanced understanding of the structural and functional components of the genome for its effective utilization in genetic
improvement of cereals. Genome maps (whole-genome
sequences) of the diploid model grass Brachypodium (genome
size 272 Mb) are available, and these provide a useful resource
to study the evolution of genomes across the grasses. Among
sequenced cereal crops, rice has a smaller genome (420 Mb)
and a higher gene density than the other cereals; sorghum
comes next, with a genome size of 730 Mb, whereas
the maize genome is much larger (2.3 Gb), has undergone
several rounds of genome duplication, and is clearly distinguishable
from that of its close relative, sorghum. Reference genome maps of
sorghum and foxtail millet are available, and altogether, these
reference genome maps provide a great resource to study comparative genomics in order to develop mapping information
about an orphan grass or cereals with no genomic information.
There are many software programs and databases developed to look at the syntenic relationship of the cereal genomes.
Recently, a GenomeZipper approach was developed to provide
an extensive database for studying syntenic relationships
among grass genomes (between wheat, Brachypodium, rice,
sorghum, and barley genomes). The GenomeZipper uses a
novel approach that allows systematic exploitation of conserved synteny with model grasses. For example, it allowed
the assignment of 86% of the total estimated (32 000) barley
genes to individual chromosome arms.
Figure 6 Wheat chromosome 5B genetic linkage map (left) compared to the physical map (right). The genetic linkage map was constructed using a
backcross population and the physical map was constructed using the chromosome deletion lines of wheat. On the genetic linkage map, map units
separating markers are shown at the left, and markers are indicated on the right. On the physical map, hash marks on the left of the chromosome indicate
deletion breakpoints; black and hatched regions on the chromosome represent dark and light C-bands, respectively; and DNA markers and their
bin locations are shown to the right. Lines drawn between the maps indicate where deletion breakpoints occur relative to the genetic map. Notice that
the centromeric region is nearly void of DNA markers and recombination, while more distal regions possess most of the DNA markers and recombination.
Future Mapping Prospects
The ultimate goal in map construction is the deciphering of the
linear DNA sequences of the full complement of chromosomes
of an organism and the utilization of map information in trait
mapping. The whole-genome sequence information available
in major cereals like rice, sorghum, maize, and foxtail millet
has revolutionized the understanding of the mechanisms
underlying genome evolution in these important cereal crops
as well as unraveling the important mechanisms in plant
growth and developmental processes and tolerance to various
biotic and abiotic stresses. The practical applications of the
genome maps and reference sequences are best realized only
when allelic diversity among diverse germplasm is better
understood. In crops where sequence information is not available, comparative genomics-based tools can be very useful for
providing a virtual gene order based on synteny. Sequence-ready physical maps of diploid barley chromosomes, reference
sequences of wheat chromosome 3B, and sequence-ready
physical maps of some wheat chromosomes are available.
These ongoing efforts in wheat and barley are critical for developing amenable and high-yielding crops to fight various challenges emerging in the form of new diseases and changing
environmental conditions.
Exercises and Assignments for Revision
• What are molecular markers?
• What are the differences between genetic mapping and RH
mapping?
• What are the limitations of HAPPY mapping for
developing a genome map?
• Which cereal genomes have been sequenced?
• What is comparative genome mapping?
Exercises for Readers to Explore the Topic Further
• What is the status of cereal crop genome sequencing
projects?
• How many wheat chromosomes are sequenced to date?
See also: Genetics of Grains: Wheat Genetics; Wheat Genomics.
Further Reading
Appels R, Morris R, Gill B, and May C (1998) Chromosome Biology. Boston, MA:
Kluwer Academic, p. 401.
Devos KM and Gale MD (2000) Genome relationships: The grass model in current
research. Plant Cell 12: 637–646.
Faris JD, Friebe B, and Gill BS (2002) Wheat genomics: Exploring the polyploid model.
Current Genomics 3: 577–591.
Feuillet C and Keller B (1999) High gene density is conserved at syntenic loci of small
and large grass genomes. Proceedings of the National Academy of Sciences of the
United States of America 96: 8265–8270.
Jiang J and Gill BS (1994) Nonisotopic in situ hybridization and plant genome
mapping: The first 10 years. Genome 37: 717–725.
Jiang JM and Gill BS (2006) Current status and the future of fluorescence in situ
hybridization (FISH) in plant genome research. Genome 49: 1057–1068.
Lander ES and Botstein D (1989) Mapping Mendelian factors underlying quantitative
traits using RFLP linkage maps. Genetics 121: 185–199.
Liu BH (1997) Statistical Genomics: Linkage, Mapping and QTL Analysis. Boca Raton,
FL: CRC Press.
McCarthy LC (1996) Whole genome radiation hybrid mapping. Trends in Genetics
12: 491–493.
Paterson AH (1996) Making genetic maps. In: Paterson AH (ed.) Genome Mapping in
Plants, pp. 23–39. Austin, TX: R G Landes Company.
Paux E, Sourdille P, Mackay I, and Feuillet C (2012) Sequence-based marker
development in wheat: Advances and applications to breeding. Biotechnology
Advances 30: 1071–1088.
Redei GP (1999) Genetics Manual. Singapore: World Scientific, pp. 1141.
Tanksley SD, Ganal MW, and Martin GB (1995) Chromosome landing: A paradigm for
map-based gene cloning in plants with large genomes. Trends in Genetics
11: 63–68.
Tanksley SD, Young ND, Paterson AH, and Bonierbale MW (1989) RFLP mapping in
plant breeding: New tools for an old science. Biotechnology 7: 257–263.
