Gastroenterology 2014;147:418–429
Differences in DNA Methylation Signatures Reveal Multiple
Pathways of Progression From Adenoma to Colorectal Cancer
Yanxin Luo,1,2 Chao-Jen Wong,2 Andrew M. Kaz,2,3,4 Slavomir Dzieciatkowski,2
Kelly T. Carter,2 Shelli M. Morris,2 Jianping Wang,1 Joseph E. Willis,5 Karen W. Makar,6
Cornelia M. Ulrich,6,7 James D. Lutterbaugh,8 Martha J. Shrubsole,9 Wei Zheng,9
Sanford D. Markowitz,8 and William M. Grady2,4
1Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, PR China; 2Clinical
Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington; 3Research and Development Service, VA
Puget Sound Health Care System, Seattle, Washington; 4Department of Medicine, University of Washington School of
Medicine, Seattle, Washington; 5Department of Pathology, Case Medical Center, Case Comprehensive Cancer Center and
Case Western Reserve University, Cleveland, Ohio; 6Public Health Sciences Division, Fred Hutchinson Cancer Research
Center, Seattle, Washington; 7National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ),
University of Heidelberg, Heidelberg, Germany; 8Department of Medicine and Ireland Cancer Center, Case Western
Reserve University School of Medicine and Case Medical Center, Cleveland, Ohio; and 9Division of Epidemiology, Department
of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University School of Medicine, Nashville, Tennessee
See Covering the Cover synopsis on page 258.
BASIC AND
TRANSLATIONAL AT
BACKGROUND & AIMS: Genetic and epigenetic alterations
contribute to the pathogenesis of colorectal cancer (CRC). There is
considerable molecular heterogeneity among colorectal tumors,
which appears to arise as polyps progress to cancer. This heterogeneity results in different pathways to tumorigenesis.
Although epigenetic and genetic alterations have been detected in
conventional tubular adenomas, little is known about how these
affect progression to CRC. We compared methylomes of normal
colon mucosa, tubular adenomas, and colorectal cancers to
determine how epigenetic alterations might contribute to cancer
formation. METHODS: We conducted genome-wide array-based studies and comprehensive data analyses of aberrantly
methylated loci in 41 normal colon tissue samples, 42 colon adenomas, and 64 cancers using HumanMethylation450 arrays.
RESULTS: We found genome-wide alterations in DNA
methylation in the nontumor colon mucosa and cancers.
Three classes of cancers and 2 classes of adenomas were
identified based on their DNA methylation patterns. The adenomas separated into classes of high-frequency methylation
and low-frequency methylation. Within the high-frequency
methylation adenoma class a subset of adenomas had
mutant KRAS. Additionally, the high-frequency methylation
adenoma class had DNA methylation signatures similar to
those of cancers with low or intermediate levels of methylation, and the low-frequency methylation adenoma class had
methylation signatures similar to that of nontumor colon
tissue. The CpG sites that were differentially methylated in
these signatures are located in intragenic and intergenic regions. CONCLUSIONS: Genome-wide alterations in DNA
methylation occur during early stages of progression of
tubular adenomas to cancer. These findings reveal heterogeneity in the pathogenesis of colorectal cancer, even at the
adenoma step of the process.
Keywords: Epigenetic Modifications; Colon Cancer; Progression;
Gene Regulation.
Colorectal cancer (CRC) results from the progressive
accumulation of gene mutations and epigenetic alterations, which induce the initiation and progression of
these cancers. Although global DNA hypomethylation was
one of the first DNA abnormalities identified in cancer, gene
mutations were the first type of DNA alterations unequivocally demonstrated to drive cancer formation. Mutations
have been found in adenomas, the precursor neoplasms to
colon cancer, as well as in CRCs.1 The number of genetic
alterations per adenoma genome is significantly smaller
than that seen in CRCs.2 CRCs have hundreds of mutations
and often display genomic instability.3 The genomic instability most commonly found in CRC is chromosome instability, which is recognized by the presence of aneuploidy
and chromosomal gains and losses. A second form of
genomic instability, microsatellite instability, is found in
approximately 15% of CRCs and results from inactivation of
the DNA mismatch repair system.4 Genomic instability is
thought to predispose cells to mutations. A subset of the
protumorigenic mutations then drive CRC formation.3–6
Based on the identification of multiple molecular subclasses of CRC, it appears that there are multiple molecular
pathways that lead to CRC, although many aspects of these
pathways are not well defined at this time.6,7
More recently, aberrant DNA methylation has been found
in CRC and appears to also play a role in driving CRC formation.8 The average CRC genome carries thousands of alterations
in the DNA methylation status of CpG dinucleotides. These
alterations often affect CpG dinucleotides found in promoter
regions of genes and can induce the transcriptional repression
of these genes.9,10 The underlying mechanism responsible for
the aberrant methylation seen in CRC is not known at this time.
Importantly, there is a close association between the DNA
methylation status of a locus and its chromatin structure.
Regions bound by polycomb group protein complexes are
commonly subjected to aberrant methylation in cancer.11
Analyses of established CRCs have also revealed that
there is a molecular subclass of CRCs that has an excessive number of aberrantly methylated CpG dinucleotides.12,13 These CRCs, which are designated as
having a CpG island methylator phenotype (CIMP), account for approximately 15%–20% of all CRCs and often
carry mutant BRAF.14 Although virtually all CRCs have
aberrantly methylated genes, the CIMP CRCs are recognized by an exceptionally high proportion of methylated
loci.14 It appears that CIMP CRCs arise from sessile
serrated polyps, which is an observation supported not
only by the occurrence of CIMP in approximately 30% of
sessile serrated polyps, but also by shared clinical features with CIMP CRCs, such as a predisposition to occur in
the proximal colon, to occur in women, and a high frequency of mutant BRAF and microsatellite instability
(MSI).7,15,16 In contrast, CIMP is rarely found in tubular or
tubulovillous adenomas.17,18
The identification of methylated genes in aberrant crypt
foci and adenomas has suggested that epigenetic alterations
might play a role in both the initiation and progression of
conventional tubular adenomas and CRC, as well as of CIMP
CRCs.19 In addition, aberrantly methylated genes can be
found in the histologically normal colon of individuals with
an increased predisposition to CRC and in older individuals,
suggesting that aberrant DNA methylation might be one of
the earliest molecular events that initiate CRC formation.20,21 In order to further assess the role of epigenetic
alterations in the initiation and progression of CRC, we
carried out an epigenome-wide analysis of normal colon
mucosa, tubular adenomas, and CRCs. We assessed the DNA
methylation status of CpG dinucleotides in CpG islands
(defined as regions with a fraction of C and G dinucleotides
>50%), CpG shores (which are CpG-rich regions located
within 2 kb of islands), and CpG shelves (which are CpG-rich
regions flanking shores).22,23
Abbreviations used in this paper: adenoma-H, high methylator phenotype
adenoma; adenoma-L, low methylator phenotype adenoma; CIMP, CpG
island methylator phenotype; CRC, colorectal cancer; DMP, differentially
methylated probe; HM450, HumanMethylation450; M-H, high methylation
pattern; M-I, intermediate methylation pattern; M-L, low methylation
pattern; MSI, microsatellite instability.
© 2014 by the AGA Institute
0016-5085/$36.00
http://dx.doi.org/10.1053/j.gastro.2014.04.039
August 2014
Methods
Primary Human Tissue Samples
DNA extracted from snap-frozen tissues was used for the
studies using the HumanMethylation450 arrays. A detailed
description of the samples used is available in the
Supplementary Methods.
DNA Methylation Patterns and Colorectal Cancer
Figure 1. Differentially methylated CpG probes in the normal colon mucosa between people with no history of CRC and people
with concurrent CRC. The strip-plot shows the top 65 probes that distinguish these 2 groups of tissue samples with a q value
<1E-4. The Y-axis represents the methylation level (β value) of each case ranging from 0 (completely unmethylated) to 1
(completely methylated). The corresponding CpG probe ID on the HM450 array is indicated on the X-axis. Each dot represents
the β value of a single case for each targeted CpG probe. Normal-Cancer (Normal-C), normal colon mucosa samples from
patients with concurrent CRC; Normal-Healthy (Normal-H), samples from patients without a history of CRC.
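The figures in this paper report methylation on two scales: values bounded between 0 and 1 (Figure 1) and M-values (the heatmaps). The sketch below shows how the two are commonly related, assuming the widely used logit definition M = log2(β / (1 − β)). The function names and the clamping constant eps are illustrative assumptions and are not taken from the authors' pipeline.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (0..1) to an M-value (log2 odds of methylation).

    eps is an assumed clamp that keeps fully methylated or fully
    unmethylated probes from producing an infinite M-value.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: map an M-value back onto the 0..1 beta scale."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# A half-methylated probe sits at M = 0; beta = 0.8 corresponds to M = 2.
half = beta_to_m(0.5)
high = beta_to_m(0.8)
```

Under this definition the M scale is symmetric around 0 and unbounded, whereas the bounded scale used in Figure 1 is easier to read as a fraction of methylated alleles.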
Luo et al
Gastroenterology Vol. 147, No. 2
Figure 2. Identification and validation of clustering of colorectal adenomas and heatmap representation of DNA methylation array data. DNA methylation status was assessed using HM450 arrays. Each column represents 1 sample and each
row represents 1 of the top 5000 most variable probes. The probes are arranged based on the order of unsupervised
hierarchical cluster analysis using a correlation distance metric and average linkage method. The DNA methylation
M-values are represented by using a color scale from green (low DNA methylation) to red (high DNA methylation). The
presence of TP53, PIK3CA, KRAS, BRAFV600E, CTNNB1, or APC mutations is indicated by a colored block (no color =
wild-type). Two subgroups (adenoma-L and adenoma-H) were identified using the clustering analysis on the discovery set
of 18 adenoma samples (middle panel). These results were confirmed in an independently collected validation set of 24
adenomas using the same probes identified in the discovery set of adenomas (right panel). Normal colon samples (left
panel, n = 41) were used for reference.
DNA Isolation and Bisulfite Conversion
Genomic DNA was extracted and bisulfite modified as
described previously.24
Data Access
All methylation array data are available at the NCBI Gene
Expression Omnibus under accession number GSE48684.
Molecular Characterization
The CIMP status and MSI status of the CRCs were assessed
using methods as described previously.24,25 Gene mutation
status of KRAS, BRAFV600E, APC, TP53, and PIK3CA was
determined using the qBiomarker Somatic Mutation PCR
System Arrays/Human Colon Cancer (Qiagen, Valencia, CA)
following the manufacturer’s protocol, as described
previously.26
HumanMethylation450 Array
Illumina Infinium HumanMethylation450 (HM450) BeadChips (Illumina, San Diego, CA) were used for these studies.
The processing of the DNAs on the methylation arrays was
conducted in the Genomics Shared Resources at the Fred
Hutchinson Cancer Research Center according to the manufacturer’s specifications (Illumina). Data filtering and
normalization procedures are described in Supplementary
Methods.
Results
Identification and Validation of Methylated
Probes on the Human Methylation450 Arrays
That Are Differentially Methylated Between
Normal Colon, Tubular Adenomas, and
Colorectal Cancer
We and others have reported previously that results
from the HM450 BeadChips are technically robust, but that
there is a measurable false discovery rate.26,27 Therefore,
we initially conducted technical validation studies and biological validation studies of a subset of differentially methylated CpGs (n = 4) found on the HM450 arrays (described
in detail in the Supplementary Methods and Results). All
of the CpG probes (n = 4) that were either aberrantly
methylated in cancers or adenomas were confirmed to be
methylated by pyrosequencing. In addition, these 4 CpG
probes were also assessed in 2 independent collections of
samples using pyrosequencing and were shown to have the
same methylation pattern as seen in the samples run on the
methylation arrays (Supplementary Tables 1 and 2 and
Supplementary Figures 1 and 2). These results demonstrate
that the data generated from the HM450 arrays are reproducible and generalizable to colon adenomas and CRCs.
The Methylation Status in Normal Colon Mucosa
Near Concurrent Colorectal Cancer Differs From
That of Normal Colon Mucosa From Healthy
Individuals (No History of Colorectal Cancer and
No Concurrent Colorectal Cancer)
In order to assess the role of aberrant DNA methylation
in the polyp/cancer sequence, we first determined the
methylation status of normal colon mucosa using HM450
arrays, which assess the DNA methylation status of 485,577
CpG dinucleotides. Greater than 90% of the CpG islands in
the genome are assessed using the HM450 array.22 We
analyzed both the normal colon mucosa from people with
no history of colon neoplasms, who are considered to be in
an average risk group for CRC, and the normal colon mucosa
from people with concurrent CRC, who are at increased risk
of metachronous CRC.28
After filtering the data as described in the Methods section, we identified 343 differentially methylated probes using
a q value of 1E-3 (Supplementary Data, Supplementary
Table 3). As shown in Figure 1, these probes, which are
located in 65 loci, can distinguish the DNA methylation levels
in normal samples from CRC patients compared with normal
mucosa from healthy individuals (q value <1E-4). The majority of these 343 probes (86%) have higher methylation
levels in the CRC-associated mucosa compared with colon
mucosa in cancer-free individuals. Additional studies will
need to be done to determine if these probes are potential
markers of a field cancerization process.29,30
Table 1. Clinical and Genetic Characteristics of DNA Methylation-Based Subtypes of CRC and Adenoma Samples
Values are n (%).
                    Cancer                                          Adenoma (training set)              Adenoma (validation set)
Characteristic      All         Cluster 1   Cluster 2   Cluster 3   All         High        Low         All         High        Low
Total               64 (100.0)  36 (56.3)   13 (20.3)   15 (23.4)   18 (100.0)  11 (61.0)   7 (39.0)    24 (100.0)  19 (79.2)   5 (20.8)
Age, y
  <50               13 (20.3)   9 (25.0)    3 (23.1)    1 (6.7)     7 (38.9)    3 (27.3)    4 (57.1)    1 (4.2)     1 (5.3)     0 (0.0)
  50–60             18 (28.1)   10 (27.8)   5 (38.5)    3 (20.0)    3 (16.7)    2 (18.2)    1 (14.3)    8 (33.3)    6 (31.6)    2 (40.0)
  60–70             15 (23.4)   10 (27.8)   3 (23.1)    2 (13.3)    5 (27.8)    4 (36.4)    1 (14.3)    5 (20.8)    4 (21.1)    1 (20.0)
  >70               18 (28.1)   7 (19.4)    2 (15.4)    9 (60.0)    3 (16.7)    2 (18.2)    1 (14.3)    10 (41.7)   8 (42.1)    2 (40.0)
Sex
  Male              23 (35.9)   15 (41.7)   5 (38.5)    3 (20.0)    4 (22.2)    3 (27.3)    1 (14.3)    9 (37.5)    7 (36.8)    2 (40.0)
  Female            41 (64.1)   21 (58.3)   8 (61.5)    12 (80.0)   14 (77.8)   8 (72.7)    6 (85.7)    15 (62.5)   12 (63.2)   3 (60.0)
Location
  Proximal colon    28 (43.8)   9 (25.0)    8 (61.5)    11 (73.3)   12 (66.7)   8 (72.7)    4 (57.1)    14 (58.3)   11 (57.9)   3 (60.0)
  Transverse colon  3 (4.7)     2 (5.6)     0 (0.0)     1 (6.7)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)
  Distal colon      25 (39.1)   17 (47.2)   5 (38.5)    3 (20.0)    5 (27.8)    2 (18.2)    3 (42.9)    9 (37.5)    7 (36.8)    2 (40.0)
  Rectum            6 (9.4)     6 (16.7)    0 (0.0)     0 (0.0)     1 (5.6)     1 (9.1)     0 (0.0)     1 (4.2)     1 (5.2)     0 (0.0)
  Unknown           2 (3.1)     2 (5.6)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)
MSI/MSS
  MSI               9 (14.1)    2 (5.6)     0 (0.0)     7 (46.7)
  MSS               55 (85.9)   34 (94.4)   13 (100.0)  8 (53.3)
Stage
  I or II           21 (32.8)   11 (30.6)   4 (30.8)    6 (40.0)
  III or IV         43 (67.2)   25 (69.4)   9 (69.2)    9 (60.0)
TP53
  Mutant            23 (35.9)   15 (41.7)   6 (46.2)    2 (13.3)    4 (22.2)    1 (9.1)     3 (42.9)    5 (20.8)    4 (21.1)    1 (20.0)
  Wild-type         41 (64.1)   21 (58.3)   7 (53.8)    13 (86.7)   14 (77.8)   10 (90.9)   4 (57.1)    19 (79.2)   15 (78.9)   4 (80.0)
SRC
  Mutant            1 (1.6)     0 (0.0)     0 (0.0)     1 (6.7)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)
  Wild-type         63 (98.4)   36 (100.0)  13 (100.0)  14 (93.3)   18 (100.0)  11 (100.0)  7 (100.0)   24 (100.0)  19 (100.0)  5 (100.0)
PIK3CA
  Mutant            7 (10.9)    3 (8.3)     3 (23.1)    1 (6.7)     3 (16.7)    2 (18.2)    1 (14.3)    4 (16.7)    4 (21.1)    0 (0.0)
  Wild-type         57 (89.1)   33 (91.7)   10 (76.9)   14 (93.3)   15 (83.3)   9 (81.8)    6 (85.7)    20 (83.3)   15 (78.9)   5 (100.0)
KRAS
  Mutant            29 (45.3)   16 (44.4)   8 (61.5)    5 (33.3)    8 (44.4)    7 (63.6)    1 (14.3)    11 (45.8)   10 (52.6)   1 (20.0)
  Wild-type         35 (54.7)   20 (55.6)   5 (38.5)    11 (73.3)   10 (55.6)   4 (36.4)    6 (85.7)    13 (54.2)   9 (47.4)    4 (80.0)
FBXW7
  Mutant            3 (4.7)     1 (2.8)     0 (0.0)     2 (13.3)    0 (0.0)     0 (0.0)     0 (0.0)     1 (4.2)     1 (5.3)     0 (0.0)
  Wild-type         61 (95.3)   35 (97.2)   13 (100.0)  13 (86.7)   18 (100.0)  11 (100.0)  7 (100.0)   23 (95.8)   18 (94.7)   5 (100.0)
CTNNB1
  Mutant            2 (3.1)     1 (2.8)     1 (7.7)     0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     1 (4.2)     1 (5.3)     0 (0.0)
  Wild-type         62 (96.9)   35 (97.2)   12 (92.3)   15 (100.0)  18 (100.0)  11 (100.0)  7 (100.0)   23 (95.8)   18 (94.7)   5 (100.0)
BRAF
  Mutant            9 (14.1)    0 (0.0)     0 (0.0)     9 (60.0)    1 (5.6)     0 (0.0)     1 (14.3)    2 (8.3)     1 (5.3)     1 (20.0)
  Wild-type         55 (85.9)   36 (100.0)  13 (100.0)  6 (40.0)    17 (94.4)   11 (100.0)  6 (85.7)    22 (91.7)   18 (94.7)   4 (80.0)
APC
  Mutant            19 (29.7)   13 (36.1)   4 (30.8)    2 (13.3)    5 (27.8)    2 (18.2)    3 (42.9)    10 (41.7)   10 (52.6)   0 (0.0)
  Wild-type         45 (70.3)   23 (63.9)   9 (69.2)    13 (86.7)   13 (72.2)   9 (81.8)    4 (57.1)    14 (58.2)   9 (47.4)    5 (100.0)
MSS, microsatellite stability.
Epigenetic Alterations Are a Common
Occurrence in Colon Adenomas
We next assessed the methylation status of CpG dinucleotides in 18 adenomas using the HM450 arrays. Adenomas are a well-recognized transition step between
normal colon and colorectal cancer. The histologic heterogeneity of colon polyps and associated risk of CRC has been
appreciated for many years. Recently, unique molecular
features have been found in the different histologic types of
adenomas, and it has been argued that these differences
affect the likelihood of the polyps progressing to CRC.7,31
This led us to compare the methylation state of adenomas
and established CRCs to normal colon mucosa. We identified
86,460 differentially methylated probes (DMPs) with a q
value of 1E-5. Nearly 40% were hypermethylated and 60%
were hypomethylated in the adenomas compared with the
normal colon mucosa.
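The DMP counts above depend on a q value cutoff and on the direction of change. The sketch below illustrates one common way to obtain such calls, assuming Benjamini-Hochberg adjusted p values are used as q values; the paper does not state which procedure was applied, and the function names, toy inputs, and sign convention (adenoma minus normal) are illustrative assumptions.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def split_dmps(delta, qvalues, q_cutoff=1e-5):
    """Return indices of hyper- and hypomethylated probes below the cutoff.

    delta holds the mean methylation difference (adenoma minus normal).
    """
    hyper = [i for i, (d, q) in enumerate(zip(delta, qvalues)) if q < q_cutoff and d > 0]
    hypo = [i for i, (d, q) in enumerate(zip(delta, qvalues)) if q < q_cutoff and d < 0]
    return hyper, hypo


# Toy values only; these are not data from the study.
toy_q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
toy_hyper, toy_hypo = split_dmps([0.4, -0.3, 0.2, -0.5], [1e-6, 1e-7, 1e-3, 1e-9])
```

The hypermethylated and hypomethylated fractions reported in the text would then be the sizes of the two index lists divided by the total number of DMPs.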
The DNA methylation status of CpGs located in different
classes of loci, including CpG islands, shores, and shelves as
well as promoters, gene bodies, and intergenic regions was
assessed. The HM450 array categorizes probes based on
gene regions into 3 major gene feature groups: promoter
(5′ UTR, TSS200, TSS1500, and first exons), intragenic
regions (body and 3′ UTR), and intergenic regions.32 We
found a higher proportion of hypomethylated probes in
nonpromoter regions (25% vs 15% in promoter regions)
and a higher proportion of hypermethylated probes in
promoter regions compared with nonpromoter regions
(Supplementary Methods; Supplementary Figure 3). Approximately 17%, 17%, and 19% of probes in CpG islands,
shores, and shelves, respectively, are differentially methylated (colon adenoma vs normal colon), suggesting that the
proportion of DMPs among these 3 classes of loci are not
significantly different.
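The comparison across islands, shores, and shelves amounts to dividing the number of DMPs in each class by the number of probes assayed in that class. A minimal sketch follows; the probe identifiers, the annotation mapping, and the function name are hypothetical and stand in for the array manifest, which is not reproduced here.

```python
from collections import Counter


def dmp_fraction_by_region(probe_region, dmp_ids):
    """Fraction of assayed probes in each region class that are DMPs.

    probe_region maps probe ID -> region class for every assayed probe;
    dmp_ids lists the probes called differentially methylated.
    """
    totals = Counter(probe_region.values())
    hits = Counter(probe_region[p] for p in dmp_ids)
    return {region: hits[region] / totals[region] for region in totals}


# Hypothetical probe IDs and annotations, for illustration only.
toy_annotation = {
    "toy_probe_1": "island", "toy_probe_2": "island",
    "toy_probe_3": "shore", "toy_probe_4": "shore",
    "toy_probe_5": "shelf", "toy_probe_6": "shelf",
}
toy_dmps = ["toy_probe_1", "toy_probe_3", "toy_probe_5", "toy_probe_6"]
toy_fractions = dmp_fraction_by_region(toy_annotation, toy_dmps)
```

Normalizing by the number of assayed probes per class matters because the array does not cover the three classes equally.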
Next, we assessed the overall DNA methylation patterns
of the adenomas. Cluster analysis of the 10,000 most variable CpG probes (2.5%) in the adenomas revealed 2 distinct
epigenotypes, which we have termed adenoma-H (high
methylator phenotype) and adenoma-L (low methylator
phenotype) (Figure 2). Of note, using leave-one-out cross-validation, we determined that the misclassification rate of
the cluster results is 5.5%, which suggests that the adenoma-H and adenoma-L groups are truly unique entities and not
simply a consequence of a chance association secondary to
multiple comparisons. Comparison of methylated CpGs between the adenoma-H and adenoma-L groups revealed 1196
differentially methylated probes (q value <1E-4;
Supplementary Data; Supplementary Table 4). Among these
DMPs, 89.7% are hypermethylated in the adenoma-H polyps
compared with the adenoma-L group. Most of the probes
that are hypomethylated (n = 58/124 [47%]) in the
adenoma-H group vs adenoma-L group are located in the
intergenic or intragenic regions (Supplementary Methods;
Supplementary Figure 4). No association between DNA
methylation and polyp size or histology was found.
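The clustering described above (most variable probes, correlation distance, average linkage, as stated in the Figure 2 caption) can be sketched in a few lines. This is an illustrative pure-Python version applied to samples, under the assumption that the input is a probes-by-samples matrix of M-values; the function names and the toy matrix are hypothetical, and the authors' actual software is not specified in this section.

```python
from statistics import mean, pvariance


def top_variable(rows, k):
    """Keep the k probes (rows) with the largest variance across samples."""
    return sorted(rows, key=pvariance, reverse=True)[:k]


def corr_distance(x, y):
    """Correlation distance: 1 minus the Pearson correlation of two profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / (sxx * syy) ** 0.5


def average_linkage(samples, n_clusters):
    """Agglomerative clustering; returns lists of sample indices."""
    n = len(samples)
    dist = {(i, j): corr_distance(samples[i], samples[j])
            for i in range(n) for j in range(i + 1, n)}

    def between(a, b):
        # Average linkage: mean of all pairwise distances across two clusters.
        return mean(dist[min(i, j), max(i, j)] for i in a for j in b)

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        pairs = ((a, b) for ai, a in enumerate(clusters) for b in clusters[ai + 1:])
        a, b = min(pairs, key=lambda pair: between(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


# Toy probes-by-samples matrix (4 probes, 4 samples); not study data.
toy_rows = [
    [-2.0, -1.8, 2.0, 2.2],
    [1.5, 1.7, -1.6, -1.4],
    [0.1, 0.1, 0.1, 0.1],   # invariant probe, dropped by the variance filter
    [-1.0, -1.2, 1.1, 0.9],
]
toy_samples = [list(col) for col in zip(*top_variable(toy_rows, 3))]
toy_clusters = average_linkage(toy_samples, 2)
```

In this toy example the first two samples and the last two samples fall into separate groups, which mirrors the kind of two-class split reported for the adenomas.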
Interestingly, the adenoma-L polyps have a methylation
pattern similar to normal colon mucosa, and the adenoma-H
polyps are more similar to CIMP-negative CRC (Figure 3 and
Supplementary Data; Supplementary Figures 5 and 6). We
also found that KRAS mutations occur frequently in a subset of
adenoma-H polyps (n = 7/11 [63.6%]), and that mutant APC,
BRAF, and PIK3CA occur in small subsets (n = 5/18 [27.8%];
n = 1/18 [5.6%]; n = 3/18 [16.7%], respectively) of all the
adenomas. Mutations in SRC, FBXW7, or CTNNB1 were not
found in any of the adenomas (Table 1). Also, the adenoma-H
polyps with mutant KRAS exhibit a unique methylation pattern
compared with the adenoma-H polyps with wild-type KRAS.
In order to validate the discovery of unique epigenotypes
of tubular adenomas, we assessed the methylation patterns
of an independent collection of 24 adenomas using the
HM450 arrays. We used the same set of probes identified in
the discovery set of samples (Figure 2) and confirmed the
previous cluster results. We also identified an adenoma-H
and an adenoma-L subgroup, with obvious heterogeneity
existing in the adenoma-H group. Mutation profiling results
were similar in this second set of samples compared with the
discovery set of samples. We found KRAS mutations occur
frequently in a subset of adenoma-H polyps (n = 10/19
[52.6%]). In addition, we found mutations in APC (n = 10/24
[41.7%]), BRAF (n = 2/24 [8.3%]), PIK3CA (n = 4/24
[16.7%]), and CTNNB1 (n = 1/24 [4.2%]), which is similar to
the frequencies found in the discovery set of samples. SRC
mutations were not found in any of the adenomas in either
set of samples. In addition, there are no significant differences in the age, sex, and location of the adenomas between
adenoma-H and adenoma-L clusters (Table 1).
Methylator Phenotype in Colorectal Cancer
After assessing the methylation status of the normal
colon mucosa and of colon tubular adenomas, we performed
comprehensive DNA methylation profiling of 64 CRCs. We
also assessed the mutation status of KRAS, BRAFV600E, APC,
SRC, FBXW7, TP53, and PIK3CA in the CRCs as we had done
with the adenomas.
Using a recursively partitioned mixture model and a
hierarchical model (hCluster) clustering approach on the
10,000 most variable CpGs, we identified 3 distinct CRC
subgroups, indicated as cluster 1 (n = 36/64 [56%]), cluster
2 (n = 13/64 [20%]), and cluster 3 (n = 15/64 [23%]),
which we have termed methyl-low (low methylation pattern;
M-L), methyl-intermediate (intermediate methylation pattern,
M-I), and methyl-high (high methylation pattern; M-H),
respectively (Figure 3). The M-H subgroup is enriched for
CIMP-high cancers (n = 11/15 [68.8%]).14 The remaining
M-H CRCs (n = 4/15 [31.2%]) were CIMP-low, which were
characterized by having 1–2 methylated CIMP loci. Consistent with other studies, the M-H subgroup CRCs have
frequent BRAFV600E mutations (n = 9/15 [60.0%]) and are
often MSI (n = 7/15 [46.7%]).14 The M-H subgroup also has
a relative paucity of APC mutations (n = 2/15 [13.3%]) and
TP53 mutations (n = 2/15 [13.3%]) when compared with the
CRCs of the other cluster subgroups, in which there are no
BRAFV600E mutations, infrequent MSI (n = 2/49 [4.1%]) and
a higher frequency of APC mutations (n = 17/49 [34.7%])
and TP53 mutations (n = 21/49 [42.9%]) (Table 1).
CRCs in the M-I subgroup exhibit an intermediate
methylation pattern when compared with the other 2 cluster
subgroups (Figure 3). In the M-I subgroup, mutant KRAS (n =
8/13 [61.5%]) and mutant APC (n = 4/13 [30.8%]) were
frequent, which is similar to the frequencies seen in other
studies6,33 (Table 1). The clinicopathologic features of the M-I
subgroup suggest that this subtype of cancers corresponds to
the classifications CIMP2 or CIMP-low that have been
described.6,33 The M-L group is enriched for non-CIMP cancers
as determined by the CIMP MethyLight assay panel and has a
relatively low frequency of methylated CpGs.14 Of note, the M-H, M-I, and M-L classes of CRCs identified in this study were
validated through the analysis of 2 independent collections of
CRCs, which identified the same 3 subsets of CRCs (described
in detail in the Supplementary Methods and Results).6,13
Association of Methylated Loci With Polycomb
Group Binding Sites in Colon Adenomas and
Colorectal Cancers
After identifying methylated CpGs that occur commonly
in adenomas and CRC, we next assessed which of these
methylated CpGs occur in loci commonly occupied by
polycomb group proteins. Previous studies have demonstrated that genes whose expression is affected by methylation often are polycomb group target loci.11 We found that
55% of the hypermethylated loci (defined by q value <1E-6
and a β value >.3 vs normal colon; n = 756) occurred in
areas of bivalent chromatin in all 3 methylation classes of
CRCs (Figure 4), which is consistent with previously published findings.13 In addition, we found that the polycomb
group protein-marked hypermethylated loci found in the
CRCs were also hypermethylated in the adenoma-H polyps,
but not the adenoma-L polyps (Figure 4 and Supplementary
Data; Supplementary Figure 7 and Supplementary Tables 5
and 6). These findings suggest that the adenoma-H polyps
might be more progressed toward CRCs than adenoma-L
polyps and also suggest that the aberrant methylation of
genes that are likely to be transcriptionally repressed by
this epigenetic alteration occurs early in the polyp to cancer
progression sequence.
Figure 3. Cluster analysis of CRCs and heatmap representation of DNA methylation array data. DNA methylation status was assessed using HM450 arrays. Each column represents 1 sample and each row represents 1 of the top 5000 most variable probes. The DNA methylation M-values are represented by using a color scale from green (low DNA methylation) to red (high DNA methylation). Three subgroups were identified by clustering and are indicated above the heatmap (Methyl-Low, Methyl-Intermediate, and Methyl-High). The presence of TP53, SRC, PIK3CA, KRAS, FBXW7, CTNNB1, BRAFV600E, or APC mutations and CIMP status, which was determined using the Weisenberger panel of CIMP CpGs with MethyLight assays, are indicated by a colored block (no color = wild-type). Normal colon samples (left panel, n = 41) were used for reference.
Hypervariability of Methylation Occurs Early in the
Adenoma–Carcinoma Progression Sequence
In addition to assessing the methylation status of CpGs in
the normal colon and colon neoplasms, we assessed the
inter-sample variability in methylation of the CpGs in the
polyps and CRCs. We observed higher inter-sample variability in the frequency of methylated loci across the CRCs
and adenomas as compared with the normal samples, which
had a small amount of inter-sample variability (Figure 5A
and B). Of interest, the degree of epigenetic hypervariability
found in the adenomas and CRCs is similar (Figure 5C),
suggesting the variability in methylation status occurs early
in CRC formation (Figure 5D).
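The variability comparison in Figure 5 rests on a per-probe standard deviation computed within each tissue group, followed by a regression line that is compared with the identity line. The sketch below illustrates that computation under the assumption that each group is stored as a probes-by-samples matrix; the function names and toy values are hypothetical and the random selection of 1000 probes is omitted.

```python
from statistics import mean, stdev


def per_probe_sd(matrix):
    """Standard deviation across samples for each probe (row)."""
    return [stdev(row) for row in matrix]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Toy methylation levels for 3 probes in 3 samples per group; not study data.
toy_normal = [[0.10, 0.12, 0.11], [0.50, 0.54, 0.52], [0.80, 0.86, 0.83]]
toy_adenoma = [[0.10, 0.40, 0.25], [0.30, 0.70, 0.50], [0.40, 0.90, 0.65]]

sd_normal = per_probe_sd(toy_normal)
sd_adenoma = per_probe_sd(toy_adenoma)

# Share of probes lying above the identity line (more variable in adenomas).
frac_above = sum(a > n for n, a in zip(sd_normal, sd_adenoma)) / len(sd_normal)
toy_slope, toy_intercept = fit_line(sd_normal, sd_adenoma)
```

A fitted line lying above the identity line, as in the toy data, corresponds to the pattern described for adenomas and CRCs relative to normal mucosa.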
Figure 4. Venn diagram of hypermethylated genes that are polycomb group protein (PcG)-marked. (A) PcG-marked hypermethylated genes in CRC samples (419 genes). (B) PcG-marked hypermethylated genes in adenoma samples (554 genes). (C)
PcG-marked hypermethylated genes in adenoma and CRC samples (384 genes). The number in each area indicates the
number of genes in that area.
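The region sizes in a two-set Venn diagram follow directly from the three totals given in the caption. The short sketch below works through that arithmetic, assuming that the 384 genes in panel C are the intersection of the CRC and adenoma gene lists; the helper name and the gene symbols are hypothetical placeholders.

```python
def venn_counts(set_a, set_b):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = set(set_a), set(set_b)
    return {"a_only": len(a - b), "shared": len(a & b), "b_only": len(b - a)}


# Totals reported in the Figure 4 caption.
crc_total, adenoma_total, shared = 419, 554, 384

# Assumes the 384 shared genes are the intersection of the two lists.
crc_only = crc_total - shared          # hypermethylated in CRCs only
adenoma_only = adenoma_total - shared  # hypermethylated in adenomas only

# Hypothetical gene symbols, used only to exercise the helper.
toy_regions = venn_counts({"GENE_A", "GENE_B", "GENE_C"},
                          {"GENE_B", "GENE_C", "GENE_D"})
```

Under that assumption, most of the genes in the CRC list are also present in the adenoma list, which is the overlap the text draws on.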
Adenomas and Colorectal Cancers With a High
Methylation Pattern Have a Unique Pattern of
Methylated Intergenic and Intragenic CpGs
Recent studies suggest that CIMP CRCs are derived from
sessile serrated adenomas,34 a CRC precursor lesion exhibiting unique morphologic features and epigenetic characteristics.35–37 Earlier studies have shown that up to 30% of
serrated adenomas have a CIMP methylation pattern and
that tubular adenomas are rarely CIMP.35 These findings
suggest that there are at least 2 unique polyp/cancer
progression sequences that can lead to CRC.7 In light of the
identification of CIMP in serrated adenomas, we carried out
a detailed assessment of the epigenome in tubular adenomas and CRCs by assessing the methylation status of
CpGs located in intergenic and intragenic regions. We
compared the methylation status of these CpGs in the M-H
CRCs and M-L CRCs and found 3555 intergenic and 3676
intragenic differentially methylated probes (q value <1E-5)
(Supplementary Data, Supplementary Figure 8 and
Supplementary Table 7). When we compared the M-H to M-I/M-L CRCs, we identified 3122 intragenic and 3114 intergenic DMPs (q value <1E-5, Supplementary Data,
Supplementary Table 8).
Figure 5. Increased intersample methylation variability in colon adenomas and CRCs. (A) The standard deviation of 1000 randomly selected probes in normal colon samples and adenomas is shown. (B) The standard deviation of 1000 randomly selected probes in normal colon samples and CRCs is shown. (C) The standard deviation of 1000 randomly selected probes in adenomas and CRCs is shown. (D) The standard deviation of 1000 randomly selected probes in adenoma-L polyps and adenoma-H polyps is shown. The gray solid line in each image (from the lower left corner to the upper right corner) is the identity line, which indicates the location of a 1:1 ratio between the comparison sets, and the dashed red line indicates the best-fit linear regression of the CpG probes. The red line above the gray line means the group on the y-axis has higher variability than the one on the x-axis.
High Methylator Phenotype Adenoma Polyps
Share an Epigenetic Signature With Colorectal
Cancers of Low/Intermediate Methylation Pattern
Having observed 2 discrete classes of adenomas defined by
methylation pattern (adenoma-L and adenoma-H), and having identified patterns of methylated intergenic and intragenic CpGs that
distinguish M-H from M-L CRCs, we next assessed these
intergenic and intragenic CpGs in adenomas. Interestingly,
this analysis revealed that the epigenetic signature of the
adenoma-H polyp class is similar to the M-I/M-L CRCs,
whereas the adenoma-L polyp class is similar to normal
colon mucosa (Figure 6). Of note, similar results were found
in a validation set of adenomas (Supplementary Figure 9).
Multi-dimensional scaling analysis agreed with the cluster
analysis, which showed adenoma-H polyps to be similar to
M-I/M-L CRCs and adenoma-L polyps to be similar to
normal colon mucosa (Supplementary Data, Supplementary
Figure 10). These findings suggest that the adenoma-H
polyps might be the origin of M-I/M-L CRCs, and
adenoma-L polyps might be polyps that ultimately will not
progress to CRC. Additionally, these studies reveal significant heterogeneity in the epigenome of adenomas and
suggest that the epigenome might portend the fate of
tubular adenomas.
Discussion
In these studies we have found considerable genetic and
epigenetic heterogeneity among not only CRCs but also
among adenomas.6,38,39 The existence of different classes of
CRC that differ based on DNA methylation patterns was first
proposed by Issa and colleagues in 1999 when they identified a CIMP class of CRCs.8 Our studies provide additional
insight into CIMP CRCs by showing that in the high
methylation pattern CRCs there is a low frequency of APC
mutations, consistent with findings from the TCGA,6 and
suggest CIMP CRCs might arise by a WNT-independent
pathway.40 In addition, we identified a series of differentially methylated probes located in intragenic and intergenic
regions that distinguish M-H from M-I/M-L CRCs. These
probes lie in regions called tissue differential methylated
regions, and their methylation status is tissue specific,
perhaps reflecting the stem cell methylation pattern from
which the tissue is derived.23,41 These findings suggest that
the M-H CRCs might be derived from a different stem cell
precursor than M-I/M-L CRCs, which might explain the
unique clinicopathologic features of CIMP CRCs compared
with non-CIMP CRCs.14
We also analyzed the methylation status of normal colon
mucosa. Our analysis of the normal colon mucosa included
samples from individuals with concurrent CRC, who might
have a field defect in their colons that predisposes them to
adenomas and CRC, and from individuals who are cancer
free. Consistent with earlier studies that have used limited
gene panels, we observed differences in the methylation
patterns between these 2 groups.20,42,43 Additional studies
using a prospective study design will be needed to confirm
whether the methylation state in the normal colon predicts
risk for developing CRC.
Figure 6. Cluster analysis of M-H CRCs, M-I/M-L CRCs, and adenoma-H adenomas using differentially methylated probes in intragenic/intergenic regions. Differentially methylated intragenic (A) and intergenic (B) CpG probes between M-H and M-I/M-L CRCs were used in unsupervised clustering analysis of the adenoma-H polyps and CRCs. The results show that the methylation pattern in adenoma-H polyps is similar to that in M-I/M-L CRCs, but not to that in M-H CRCs. (C) Schematic of concept that there are multiple adenoma to cancer pathways, which can be identified by DNA methylation signatures in the normal colon, adenomas, and cancers (see Discussion).
Earlier studies of the methylation status of adenomas,
using small sample sizes and small panels of candidate
genes, have shown that methylation is present and heterogeneous in adenomas.15,44,45 Additionally, studies
have demonstrated that CIMP can be detected in serrated
polyps, but is rare in tubular adenomas.15,46 Using HM450
arrays and a moderately sized set of 18 tubular adenomas, we observed that aberrant DNA methylation occurs commonly in tubular adenomas. In addition, we
identified subclasses of tubular adenomas based on their
methylation status and KRAS mutation status. We studied
the intergenic and intragenic CpGs, rather than promoter
CpGs, because their methylation status appears to reflect
the epigenome of the stem cell population from which the
tumor cells are derived.47 We found that the adenoma-H
polyps have a methylation pattern that is similar to that
of M-I/M-L CRCs, and the adenoma-L polyps have a
methylation pattern that is similar to that of normal colon
epithelium. Importantly, these results were confirmed in
an independent collection of 24 adenomas. These results
suggest that adenoma-H polyps might be the precursors
for M-I/M-L CRCs. Our findings also suggest adenoma-L
polyps might have a low potential to progress to CRC
and might represent the 90% of adenomas that do not
evolve into CRC.
Our studies provide insight into the colon adenoma to
cancer progression sequence. This sequence of histologic
progression of normal colon epithelial cells to CRC is
widely believed to be driven by the serial acquisition of
gene mutations and epigenetic alterations. It appears likely
that the different subclasses of CRCs arise through
different polyp to cancer progression sequences with CIMP
CRCs arising from serrated polyps and non-CIMP CRCs
arising from tubular adenomas.48 Our studies demonstrate
there are subclasses of adenomas recognized by their
epigenotype and KRAS mutation status and raise the possibility that one of these subclasses, adenoma-H polyps,
might be the precursors for CRCs with a low/intermediate
methylation pattern.
In summary, our results confirm those of earlier studies
that have shown aberrant DNA methylation occurs early in
CRC formation and might predispose histologically normal
tissue to become neoplastic. In addition, we have found that
the epigenetic state of the adenomas might influence the
propensity of the adenoma to undergo malignant transformation and portend the epigenotype of the resulting CRC.
Supplementary Material
Note: To access the supplementary material accompanying
this article, visit the online version of Gastroenterology at
www.gastrojournal.org, and at http://dx.doi.org/10.1053/j.gastro.2014.04.039.
References
1. Gal-Yam EN, Egger G, Iniguez L, et al. Frequent
switching of polycomb repressive marks and DNA
hypermethylation in the PC3 prostate cancer cell line.
Proc Natl Acad Sci U S A 2008;105:12979–12984.
2. Houseman EA, Christensen BC, Yeh RF, et al. Model-based clustering of DNA methylation array data: a
recursive-partitioning algorithm for high-dimensional
data arising as a mixture of beta distributions. BMC
Bioinformatics 2008;9:365.
3. Vogelstein B, Papadopoulos N, Velculescu VE, et al.
Cancer genome landscapes. Science 2013;339:
1546–1558.
4. Grady WM, Carethers JM. Genomic and epigenetic
instability in colorectal cancer pathogenesis. Gastroenterology 2008;135:1079–1099.
5. Wood LD, Parsons DW, Jones S, et al. The genomic
landscapes of human breast and colorectal cancers.
Science 2007;318:1108–1113.
6. Cancer Genome Atlas Network. Comprehensive molecular
characterization of human colon and rectal cancer. Nature 2012;487:330–337.
7. Jass JR. Molecular heterogeneity of colorectal cancer:
Implications for cancer control. Surg Oncol 2007;16(Suppl 1):S7–S9.
8. Toyota M, Ho C, Ahuja N, et al. Identification of differentially methylated sequences in colorectal cancer by
methylated CpG island amplification. Cancer Res 1999;
59:2307–2312.
9. Issa JP, Ottaviano YL, Celano P, et al. Methylation of the
oestrogen receptor CpG island links ageing and neoplasia
in human colon. Nat Genet 1994;7:536–540.
10. van Engeland M, Derks S, Smits KM, et al. Colorectal
cancer epigenetics: complex simplicity. J Clin Oncol
2011;29:1382–1391.
11. Ohm JE, McGarvey KM, Yu X, et al. A stem cell-like
chromatin pattern may predispose tumor suppressor
genes to DNA hypermethylation and heritable silencing.
Nat Genet 2007;39:237–242.
12. Issa JP. CpG island methylator phenotype in cancer. Nat
Rev Cancer 2004;4:988–993.
13. Hinoue T, Weisenberger DJ, Lange CP, et al. Genome-scale analysis of aberrant DNA methylation in colorectal
cancer. Genome Res 2012;22:271–282.
14. Weisenberger DJ, Siegmund KD, Campan M, et al.
CpG island methylator phenotype underlies sporadic
microsatellite instability and is tightly associated with
BRAF mutation in colorectal cancer. Nat Genet 2006;
38:787–793.
15. Burnett-Hartman AN, Newcomb PA, Potter JD, et al.
Genomic aberrations occurring in subsets of serrated
colorectal lesions but not conventional adenomas. Cancer Res 2013;73:2863–2872.
16. Spring KJ, Zhao ZZ, Karamatic R, et al. High prevalence
of sessile serrated adenomas with BRAF mutations: a
prospective study of patients undergoing colonoscopy.
Gastroenterology 2006;131:1400–1407.
17. Rex DK, Ahnen DJ, Baron JA, et al. Serrated lesions of
the colorectum: review and recommendations from an
expert panel. Am J Gastroenterol 2012;107:1315–1329.
quiz 1314, 1330.
18. Kim YH, Kakar S, Cun L, et al. Distinct CpG island
methylation profiles and BRAF mutation status in serrated
and adenomatous colorectal polyps. Int J Cancer 2008;
123:2587–2593.
19. Lao VV, Grady WM. Epigenetics and colorectal cancer.
Nat Rev Gastroenterol Hepatol 2011;8:686–700.
20. Shen L, Kondo Y, Rosner GL, et al. MGMT promoter
methylation and field defect in sporadic colorectal
cancer. J Natl Cancer Inst 2005;97:1330–1338.
21. Ahuja N, Li Q, Mohan AL, et al. Aging and DNA methylation in colorectal mucosa and cancer. Cancer Res 1998;
58:5489–5494.
22. Sandoval J, Heyn H, Moran S, et al. Validation of a DNA
methylation microarray for 450,000 CpG sites in the
human genome. Epigenetics 2011;6:692–702.
23. Irizarry RA, Ladd-Acosta C, Wen B, et al. The human
colon cancer methylome shows similar hypo- and
hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009;41:178–186.
24. Luo Y, Tsuchiya KD, Il Park D, et al. RET is a potential
tumor suppressor gene in colorectal cancer. Oncogene
2013;32:2037–2047.
25. Grady WM, Rajput A, Myeroff L, et al. Mutation of the
type II transforming growth factor-beta receptor is
coincident with the transformation of human colon
adenomas to malignant carcinomas. Cancer Res 1998;
58:3101–3104.
26. Luo Y, Kaz AM, Kanngurn S, et al. NTRK3 is a potential
tumor suppressor gene commonly inactivated by
epigenetic mechanisms in colorectal cancer. PLoS Genet
2013;9:e1003552.
27. Busche S, Ge B, Vidal R, et al. Integration of high-resolution methylome and transcriptome analyses to
dissect epigenomic changes in childhood acute
lymphoblastic leukemia. Cancer Res 2013;73:4323–4336.
28. Saini SD, Kim HM, Schoenfeld P. Incidence of advanced
adenomas at surveillance colonoscopy in patients with a
personal history of colon adenomas: a meta-analysis
and systematic review. Gastrointest Endosc 2006;64:
614–626.
29. Shen L, Issa JP. Epigenetics in colorectal cancer. Curr
Opin Gastroenterol 2002;18:68–73.
30. Grady WM, Parkin RK, Mitchell PS, et al. Epigenetic
silencing of the intronic microRNA hsa-miR-342 and its
host gene EVL in colorectal cancer. Oncogene 2008;
27:3880–3888.
31. Yagi K, Akagi K, Hayashi H, et al. Three DNA methylation
epigenotypes in human colorectal cancer. Clin Cancer
Res 2010;16:21–33.
32. Dedeurwaerder S, Defrance M, Calonne E, et al.
Evaluation of the Infinium Methylation 450K technology.
Epigenomics 2011;3:771–784.
33. Shen L, Toyota M, Kondo Y, et al. Integrated genetic and
epigenetic analysis identifies three different subclasses
of colon cancer. Proc Natl Acad Sci U S A 2007;
104:18654–18659.
34. Leggett B, Whitehall V. Role of the serrated pathway in
colorectal cancer pathogenesis. Gastroenterology 2010;
138:2088–2100.
35. Yamamoto E, Suzuki H, Yamano HO, et al. Molecular
dissection of premalignant colorectal lesions reveals
early onset of the CpG island methylator phenotype. Am J
Pathol 2012;181:1847–1861.
36. Jass JR. Serrated adenoma of the colorectum and the
DNA-methylator phenotype. Nat Clin Pract Oncol 2005;
2:398–405.
37. Gaiser T, Meinhardt S, Hirsch D, et al. Molecular patterns
in the evolution of serrated lesion of the colorectum. Int J
Cancer 2013;132:1800–1810.
38. Markowitz SD, Bertagnolli MM. Molecular origins of
cancer: molecular basis of colorectal cancer. N Engl J
Med 2009;361:2449–2460.
39. Lugli A, Jass JR. Types of colorectal adenoma. Verh
Dtsch Ges Pathol 2006;90:18–24.
40. Kawasaki T, Nosho K, Ohnishi M, et al. Correlation of
beta-catenin localization with cyclooxygenase-2
expression and CpG island methylator phenotype (CIMP)
in colorectal cancer. Neoplasia 2007;9:569–577.
41. Doi A, Park IH, Wen B, et al. Differential methylation of
tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 2009;
41:1350–1353.
42. Worthley DL, Whitehall VL, Buttenshaw RL, et al. DNA
methylation within the normal colorectal mucosa is
associated with pathway-specific predisposition to
cancer. Oncogene 2010;29:1653–1662.
43. Ushijima T. Epigenetic field for cancerization. J Biochem
Mol Biol 2007;40:142–150.
44. Kim YH, Petko Z, Dzieciatkowski S, et al. CpG island
methylation of genes accumulates during the adenoma
progression step of the multistep pathogenesis of colorectal
cancer. Genes Chromosomes Cancer 2006;45:781–789.
45. Kim KM, Lee EJ, Ha S, et al. Molecular features of
colorectal hyperplastic polyps and sessile serrated adenoma/polyps from Korea. Am J Surg Pathol 2011;
35:1274–1286.
46. O’Brien MJ, Yang S, Mack C, et al. Comparison of microsatellite instability, CpG island methylation phenotype,
BRAF and KRAS status in serrated polyps and traditional
adenomas indicates separate pathways to distinct
colorectal carcinoma end points. Am J Surg Pathol 2006;
30:1491–1501.
47. Maunakea AK, Nagarajan RP, Bilenky M, et al.
Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 2010;466:253–257.
48. Bettington M, Walker N, Clouston A, et al. The serrated
pathway to colorectal carcinoma: Current concepts and
challenges. Histopathology 2013;62:367–386.
Author names in bold designate shared co-first authorship.
Received October 6, 2013. Accepted April 23, 2014.
Reprint requests
Address requests for reprints to: Yanxin Luo, MD, PhD, Department of
Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-Sen University,
Guangzhou 510655, PR China. e-mail: [email protected]; fax: (206)
667-2917; or William M. Grady, MD, Fred Hutchinson Cancer Research
Center, 1100 Fairview Avenue North, D4-100, Seattle, Washington 98109.
e-mail: [email protected]; fax: (206) 667-2917.
Acknowledgments
The authors would like to acknowledge the outstanding service provided by the
Genomics Shared Resources (FHCRC) and the Cooperative Human Tissue
Network for the tissues they provided. We also thank the ColoCare team
(Chris Velicer, Rebecca Holmes, Stephanie Zschäbitz, Kathy Vickers, Rachel
Wilbur, Shannon Rush, and Sara Bates and others) for their assistance on
these studies. In addition, we would like to thank Toshinori Hinoue and Peter
W. Laird for kindly sharing their data. Finally, we would like to thank the
study participants who kindly agreed to provide tissues for analysis.
Data access: All methylation array data are available at the NCBI Gene
Expression Omnibus under accession number GSE48684. The editors and
reviewers can access the private dataset by the following link:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=djghdmayauuisvo&acc=GSE48684.
Funding
The research reported in this publication was supported by the National
Cancer Institute of the National Institutes of Health under award numbers
R01CA115513, P30CA15704, U01CA152756, U54CA143862, and
P01CA077852 (WMG), P50CA95103 (ZW), R01CA121060, P30CA68485,
K07CA122451 (MJS). The content is solely the responsibility of the authors
and does not necessarily represent the official views of the National
Institutes of Health. Support for these studies was also provided by a
Burroughs Wellcome Fund Translational Research Award for Clinician
Scientist (WMG), Program of Introducing Talents of Discipline to Universities
of China (B12003, JW) and International Science & Technology Cooperation
Program of China (2011DFA32570, JW), National Natural Science Foundation
of China (81201920, YL); and 5P50CA150964 (SDM).
Supplementary Methods
Materials
The tissue samples were collected by endoscopic biopsy
for the normal colon mucosa from cancer-free study subjects (19 cases). Normal colon from patients with cancer (22
cases) was obtained from the normal-appearing resection
margin of a surgical resection specimen. The samples were
obtained from the following sources: the University of
Washington Medical Center and Fred Hutchinson Cancer
Research Center through the ColoCare consortium (Seattle,
WA), Vanderbilt University Medical Center and the Department of Veterans Affairs Tennessee Valley Health Care
System (Nashville, TN), and the University Hospitals of
Cleveland (Cleveland, OH) following protocols approved by
the Institutional Review Board of each institution. Snap-frozen tissue was also provided by the Cooperative Human Tissue Network. All samples used for the methylation
arrays were reviewed by a gastrointestinal pathologist to
confirm the diagnosis and ensure that the cancer samples
were >60% tumor epithelium.
Data Filtering, Normalization, and
Differential Analysis
For the Illumina Infinium DNA methylation data analysis,
we first removed unreliable probes (detection P value >.05).1
The normalization process was conducted using the
Bioconductor minfi package, which includes Illumina background level correction, color adjustment, and subset-quantile
within array normalization (SWAN). Note that SWAN is a newly developed
normalization scheme specifically designed for Illumina
Infinium HM450 array data to account for the difference
between the Infinium I and Infinium II probe designs.2 To
reduce background signal effects and biases, we
also filtered out probes that contain SNPs (Target IDs start
with rs on the arrays), that assess non-CpG sites (Target IDs
start with ch), that are chromosome X–associated, or that
have a SNP present within 10 bp of the query site. We
also applied the ComBat algorithm to assess and correct for
batch effects across all array runs.3 The differential analyses
comparing the various subgroups were performed on the
M-values converted from the β values that were generated by
the Illumina Infinium DNA methylation arrays. We computed
a refined F-statistic to quantify the difference in DNA
methylation M-values for each probe between 2 different
sample sets. This statistic is based on an empirical Bayes
approach in which the estimated sample error variance is
scaled toward a pooled estimate.4 To account for multiple
comparisons, we used the false discovery rate q value5 to determine the significance of differentially methylated positions
and reported lists of probes associated with q < 1E-5.
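The filtering and M-value steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (which used minfi, ComBat, and an empirical Bayes F-statistic): the function and argument names are hypothetical, the M-value is taken to be the standard logit transform M = log2(β / (1 − β)), and Benjamini-Hochberg adjustment is used here as a simple stand-in for the q value cited in the text.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (0..1) to an M-value, log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1.0 - eps)  # clamp to avoid log(0) and division by zero
    return math.log2(b / (1.0 - b))

def keep_probe(probe_id, chrom, detection_p, snp_distance=None):
    """Apply the probe filters described above (argument names are hypothetical)."""
    if probe_id.startswith(("rs", "ch")):  # SNP probes and non-CpG probes
        return False
    if chrom == "X":                       # chromosome X-associated probes
        return False
    if detection_p > 0.05:                 # unreliable detection signal
        return False
    if snp_distance is not None and snp_distance <= 10:  # SNP within 10 bp of query site
        return False
    return True

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (stand-in for the q value in the text)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):           # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Probes whose adjusted value falls below the chosen cutoff (1E-5 in the text) would then be reported as differentially methylated.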
Unsupervised Clustering Analyses
The clustering and subgroup classification were conducted using unsupervised hierarchical clustering provided
by the R function hclust. To identify the subgroups of
CRC and adenoma, we used M-values from the 10,000
(2.5%) probes that showed the greatest variability across
the samples in each group. Leave-one-out cross-validation
was used to estimate the error rate of the clustering results.
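The two steps described here, ranking probes by variability and then clustering samples hierarchically, can be illustrated with a small standard-library Python sketch. It is not the R code used in the study: the function names and toy data are hypothetical, complete linkage with Euclidean distance is an assumption (the text does not state the linkage or distance), and the leave-one-out error estimate is not reproduced.

```python
from statistics import pvariance

def top_variable_probes(m_values, n_top):
    """Return the n_top probe IDs with the largest variance across samples.

    m_values maps probe ID -> list of M-values, one per sample.
    """
    ranked = sorted(m_values, key=lambda p: pvariance(m_values[p]), reverse=True)
    return ranked[:n_top]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def hierarchical_clusters(profiles, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    profiles maps sample name -> list of M-values; returns a list of sets.
    """
    clusters = [{name} for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]   # merge the closest pair of clusters
        del clusters[j]
    return clusters

# Toy data: 3 probes measured in 4 samples (A, B, C, D), made up for illustration
m_values = {"p1": [0, 0, 5, 5], "p2": [1, 1, 1, 1], "p3": [0, 1, 0, 1]}
selected = top_variable_probes(m_values, 2)
samples = ["A", "B", "C", "D"]
profiles = {s: [m_values[p][k] for p in selected] for k, s in enumerate(samples)}
groups = hierarchical_clusters(profiles, 2)
```

In this toy example the invariant probe p2 is dropped, and samples A and B separate from C and D.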
Identification of Potentially Silenced
Methylated Genes
As compared with normal colon samples, we first
identified lists of promoter-associated DMPs (q < 1E-5) in
adenomas and cancers. The genes that were found to have
at least 1 common probe among these 2 lists of DMPs and
that harbored embryonic stem cell–associated bivalent
domains were considered to be potentially methylated
and silenced genes.
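The selection rule above amounts to two set intersections. A minimal Python sketch follows, assuming hypothetical inputs: lists of promoter-associated DMP probe IDs for adenomas and for cancers, a probe-to-gene annotation map, and a set of genes carrying bivalent domains. All names and identifiers are made up for illustration and do not come from the study.

```python
def candidate_silenced_genes(adenoma_dmps, cancer_dmps, probe_to_gene, bivalent_genes):
    """Genes with at least 1 DMP shared by both lists that also carry a bivalent domain."""
    shared_probes = set(adenoma_dmps) & set(cancer_dmps)
    genes_with_shared_probe = {probe_to_gene[p] for p in shared_probes if p in probe_to_gene}
    return genes_with_shared_probe & set(bivalent_genes)

# Toy example with made-up identifiers
adenoma_dmps = ["probe_1", "probe_2", "probe_3"]
cancer_dmps = ["probe_2", "probe_3", "probe_4"]
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_B",
                 "probe_3": "GENE_C", "probe_4": "GENE_D"}
bivalent_genes = {"GENE_B", "GENE_D"}
candidates = candidate_silenced_genes(adenoma_dmps, cancer_dmps, probe_to_gene, bivalent_genes)
```

Here only GENE_B qualifies: GENE_C has a shared probe but no bivalent domain, and GENE_D has a bivalent domain but no probe common to both lists.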
Classification of Methyl-High Colorectal Cancer
Stem Cell–Associated Methylated CpGs
We first analyzed the CpG probes that are uniquely
methylated in Methyl-High CRCs relative to Methyl-Intermediate/Low CRCs and to normal samples (q < 1E-5).
This set of CpG probes comprises the Methyl-High CRC–specific
methylated loci. Next, we selected probes located
in the intergenic/intragenic regions, whose methylation
status has been proposed to reflect that of the cell of origin,
as discussed here. The final set of probes was considered as
Methyl-High CRC stem cell–associated probes/regions.
Identification of Adenoma-H/Methyl-Intermediate/
Low-Associated CpG Loci
We initially selected a set of methylated CpG probes that
discriminated between M-H and M-I/M-L CRCs and that
were located in intergenic regions (q < 1E-5) because these
probes have been suggested to indicate the cell of origin of
the tissue or tumor. We then used these probes to conduct
an unsupervised clustering and heatmap analysis of the
CRCs and adenoma-H polyps to assess the relationship between these 3 groups of tumors with regard to the
methylation levels of these probes.
Validation of the Results Generated From the
DNA Methylation Microarrays
In the current study, we carried out 2 levels of validation. For technical validation of the array results, we used
pyrosequencing assays to assess a subset of the differentially methylated probes identified on the HM450 DNA methylation
microarrays. We also conducted clinical validation studies
in which we assessed the methylated CpGs identified in the
discovery set of samples in an independent set of samples.
For the technical validation studies, we first assessed the
accuracy of the pyrosequencing assay. We designed PyroMark assays that targeted the specific CpG dinucleotides
identified on the arrays and first assessed a sample set that
consisted of serial dilutions of a 100% methylated control
DNA sample with a 100% unmethylated control DNA sample
(#59655 for methyl and #59665 for unmethyl EpiTect
Control DNA; Qiagen, Valencia, CA) (100%, 75%, 50%,
25%, and 0% methylated DNA). Upon confirmation of the accuracy of the
assay, we then used the PyroMark assay to technically
validate the HumanMethylation450 results by analyzing
the identical samples run on the HM450 arrays. We then
used the validated PyroMark assay to analyze the candidate genes in an independent collection of samples for
clinical validation of the methylated CpGs.
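The accuracy check on the dilution series reduces to comparing measured against expected methylation percentages. A minimal Python sketch of the squared Pearson correlation for such a series follows; the measured values are made up for illustration and are not data from the study, and the function name is hypothetical.

```python
def r_squared(expected, measured):
    """Squared Pearson correlation between expected and measured values."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(measured) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, measured))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in measured)
    return (sxy * sxy) / (sxx * syy)

# Expected percentages of methylated DNA in the defined mixtures
expected_pct = [100, 75, 50, 25, 0]
# Hypothetical pyrosequencing readouts for the same mixtures (illustrative only)
measured_pct = [96, 74, 51, 27, 3]
fit = r_squared(expected_pct, measured_pct)
```

A value close to 1 indicates that the assay tracks the known methylation content linearly across the dilution range.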
Validation of Methylated CpGs That Classify the
Epigenotype of the Colorectal Cancers
In order to validate the methylated CpG sets that characterize the M-High and M-I/M-L CRCs, we compared the
classification results of our discovery set with those of an
existing HM27 dataset (n = 125 CRC samples; GSE25062).1
Of note, out of these samples, 103 cases were CIMP-negative
and 22 were CIMP-positive as determined by MethyLight
assays.1 We performed unsupervised clustering analysis
using the 10,000 most variable probes identified in the
discovery set and that were also included in the HumanMethylation27 platform. We also analyzed a second set of
CRCs whose methylomes were determined as part of the
TCGA.
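Two steps in this validation lend themselves to a short sketch: restricting the discovery-set probes to those shared with the second platform, and testing whether a cluster is enriched for CIMP-positive tumors. The Python below is illustrative only. The function names are hypothetical, and the one-sided Fisher exact test is an assumption, because the text does not state which test was used to compare cluster membership with the MethyLight CIMP calls.

```python
from math import comb

def shared_probes(discovery_probes, platform_probes):
    """Keep discovery-set probes also present on the other platform, preserving order."""
    platform = set(platform_probes)
    return [p for p in discovery_probes if p in platform]

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact P value for enrichment in a 2x2 table.

    a = in cluster and CIMP-positive,      b = in cluster and CIMP-negative,
    c = outside cluster and CIMP-positive, d = outside cluster and CIMP-negative.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    # probability of seeing a or more CIMP-positive tumors in the cluster by chance
    return sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, upper + 1)) / denom
```

For a real analysis the four counts would come from cross-tabulating cluster assignment against CIMP status for the 125 tumors (22 CIMP-positive and 103 CIMP-negative, per the text).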
Correlation Between DNA Methylation Level and
Gene Expression Level
A previous study has shown that <10% of hypermethylated promoter-associated CpGs associate with
decreased gene expression when compared with samples
with unmethylated CpGs.1 We used expression profiling
datasets available for the samples described in the previous
section to assess for a correlation between DNA methylation
status and expression levels.
Supplementary Results
Differentially Methylated Probes Generated From
the HM450 Arrays Are Reliable and Reproducible
We first designed PyroMark assays targeting the
following specific CpG dinucleotides: cg21101720,
cg14215472, cg26532627, and cg03537386. The first 3
probes are hypermethylated in adenomas/cancers, and the
last probe is hypomethylated in adenomas/cancers based on
the DNA methylation array data. Analysis of the methylated
DNA serial dilution sample sets demonstrated that the
pyrosequencing assays can accurately determine the percentage of methylated DNA present in a sample
(Supplementary Figure 1). We then applied these assays to
the samples that were analyzed with the DNA methylation
arrays. The results from the pyrosequencing assays correlated closely with the results from the methylation arrays
(Supplementary Figure 2). In addition, we assessed the
methylation status of the differentially methylated probes
using the pyrosequencing assays on DNA from an independent collection of samples (Supplementary Table 1). The
results on the validation set of samples were very similar to
those in the discovery set of samples. For instance,
cg21101720 was identified to be significantly hypermethylated in cancer/adenoma in 2 independent validation
sets of samples (n = 105 and n = 119). cg21101720
methylation was detected in 0%–5.7% of normal samples,
60%–75% of adenomas, and 69%–80% of cancer samples
(the primers used for pyrosequencing are described in detail in Supplementary Table 2). These results demonstrate
that the differentially methylated probes identified using the
discovery set of samples and the results from the HM450 arrays are
reliable and reproducible.6,7
The Epigenotype Classifications of Colorectal
Cancers Were Validated Using an Independent
Collection of Samples
We identified the 506 most variable probes in our
dataset that were also on the HumanMethylation27 platform. Methylation-based cluster analysis using
these probes from the discovery set classified the
125 CRCs into 3 groups (clusters 1, 2, and 3, shown in
Supplementary Figure 11A), which corresponded to the M-L, M-I,
and M-H subtypes in our discovery dataset. In addition,
the cluster 3 CRCs were enriched with CIMP-positive CRCs
as determined by MethyLight (P = 2E-12), and were also
enriched with the BRAF V600E mutation (P = 3E-11), which is
consistent with our findings using the discovery set of
samples. More important, as we described previously, we
performed the clustering analyses on TCGA samples
(batch no. 99, 39 CRCs), which were assessed using HM450
arrays. We used the same 10,000 variable probes identified
in the discovery set of samples and applied them to the
TCGA CRCs. As shown in Supplementary Figure 11B, 3
clusters were again identified based on the DNA methylation
profiling data. Taken together, these results suggest that the
classification of CRC epigenotypes based on the discovery
set of samples is reproducible and generalizable.
DNA Promoter Methylation Correlates With
Absence of Gene Expression
Using publicly available gene expression profiling data
(GSE25070), we found approximately 53% of down-regulated genes are associated with hypermethylated CpGs
(206 genes in total), presumably due to the expanded
coverage of the methylation microarray platform
(27K/450K). Of note, those 206 genes are targeted by
1594 hypermethylated probes in promoter and intragenic
regions. Interestingly, 72% (1152 probes) of these hypermethylated probes are located in promoter regions and 79%
(1258 probes) are in CpG islands. This suggests that
hypermethylation in promoter regions and CpG islands is
very likely correlated with lack of gene expression, especially when compared with CpGs in intragenic regions and
non-CpG islands. In addition, we found 37% of genes with
high expression (269 genes) are associated with 610
hypomethylated probes. Nearly 60% of these probes are
located in intragenic regions and approximately 80% are in
non-CpG islands, including shores and shelves, which suggests that hypomethylation in non-CpG islands likely
correlates with increased gene expression, especially when compared with CpGs in CpG islands.
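The proportions quoted for the hypermethylated probes can be checked directly from the probe counts. A small Python sketch (the variable names are illustrative); note that promoter and CpG island annotations are not mutually exclusive, which is why the two percentages sum to more than 100%.

```python
def percent(part, total):
    """Percentage of total represented by part."""
    return 100.0 * part / total

total_hyper_probes = 1594   # hypermethylated probes targeting the 206 genes
promoter_pct = percent(1152, total_hyper_probes)  # probes in promoter regions
island_pct = percent(1258, total_hyper_probes)    # probes in CpG islands
probes_per_gene = total_hyper_probes / 206        # average probes per gene
```

Rounded to whole numbers, these reproduce the 72% and 79% figures reported in the text.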
References
1. Hinoue T, Weisenberger DJ, Lange CP, et al. Genome-scale analysis of aberrant DNA methylation in colorectal
cancer. Genome Res 2012;22:271–282.
2. Maksimovic J, Gordon L, Oshlack A. SWAN: subset-quantile within array normalization for Illumina Infinium
HumanMethylation450 BeadChips. Genome Biol 2012;13:R44.
3. Johnson WE, Li C, Rabinovic A. Adjusting batch effects
in microarray expression data using empirical Bayes
methods. Biostatistics 2007;8:118–127.
4. Smyth GK. Linear models and empirical Bayes methods
for assessing differential expression in microarray
experiments. Stat Appl Genet Mol Biol 2004;3:Article3.
Epub 2004 Feb 12.
5. Storey JD. The positive false discovery rate: a Bayesian
interpretation and the q-value. Ann Stat 2003;
31:2013–2035.
6. Busche S, Ge B, Vidal R, et al. Integration of high-resolution methylome and transcriptome analyses to
dissect epigenomic changes in childhood acute
lymphoblastic leukemia. Cancer Res 2013;73:4323–4336.
7. Luo Y, Kaz AM, Kanngurn S, et al. NTRK3 is a potential
tumor suppressor gene commonly inactivated by
epigenetic mechanisms in colorectal cancer. PLoS
Genet 2013;9:e1003552.
Supplementary Figure 1. Development of accurate PyroMark pyrosequencing assays to quantify the percentage of
methylation of specific candidate CpG dinucleotides identified with the HM450 arrays. PyroMark Assay Design software
(version 2.0; Qiagen) was used to design assays for pyrosequencing four CpG dinucleotides: cg21101720, cg14215472,
cg26532627, and cg03537386, which were found to be
differentially methylated between normal colon and colon
neoplasms. The technical accuracy of each assay was tested
using defined mixtures of standard EpiTect methyl/unmethyl
control DNA. The following DNA mixtures were used for each
assay: 100%, 75%, 50%, 25%, and 0% methylated DNA.
The pyrosequencing results correlated well with the known
DNA methylation content in the defined mixtures (r² = 0.9994
for cg21101720, 0.9981 for cg14215472, 0.9989 for
cg26532627, and 0.9907 for cg03537386; all P < .0001).
Supplementary Figure 2. Technical validation of methylation
status of CpGs identified as differentially methylated CpGs on
the HM450 arrays. The pyrosequencing assays that were
developed in Supplementary Figure 1 were used to assess
the percentage of methylated CpG alleles at cg21101720,
cg14215472, cg26532627, and cg03537386 in the same
samples that were run on the HM450 arrays. These studies
were conducted to determine the reliability of the HM450
results. Eighteen samples run on the HM450 arrays (randomly
selected based on DNA availability) were used to assess the
percentage of methylated cytosine of the cg21101720 (A) and
cg14215472 (B) probes using pyrosequencing. The relative
methylation levels measured by HM450 correlated reasonably well with the percentage of methylation determined by
pyrosequencing (r² = 0.786 for cg21101720, and 0.893 for
cg14215472, both P < .0001).
Supplementary Figure 3. Determination of the distribution of
the differentially methylated CpG probes when analyzed on
the basis of the type of region in which the CpG is located.
The CpG probes that are on the HM450 arrays have been
annotated to include information regarding their location in
promoter regions, intergenic regions, intragenic regions, etc.
We assessed the distribution of the location of the differentially methylated CpGs (colorectal neoplasms [adenoma and
CRCs] vs normal colon). We identified 86,460 methylated
probes whose methylation status varied significantly between
normal colon samples and colorectal adenomas and CRCs
with a q value <1E-5. Almost 40% of these probes are
hypermethylated in colorectal neoplasms. The majority of
CpG probes that are hypermethylated in colon adenomas
and CRCs are located in the promoter regions, and the majority of the hypomethylated CpG probes are located in
intergenic regions (Fisher exact test P value <.1).
Supplementary Figure 4. Heatmap of hypermethylated and
hypomethylated CpG probes (DMPs) (q value <1E-4) in
adenoma-H relative to adenoma-L polyps. The color panel on
the left of the heatmap indicates the type of region in which
the CpG is located and the color panel on the top indicates
the cluster group of the adenomas. (A) Heatmap of the 1072
hypermethylated DMPs in adenoma-H polyps vs adenoma-L
polyps. Half of the hypermethylated CpGs are located in the
promoter regions (50%). (B) Heatmap of the 124 hypomethylated DMPs. The majority of the hypomethylated CpG
probes are located in intergenic/intragenic regions (85.5%).
Supplementary Figure 5. Clustering analysis of adenoma-L
polyps, adenoma-H polyps, and normal colon using DMPs
(adenoma-H vs adenoma-L) reveals similarity of adenoma-L
polyps and normal samples. The right major branch in the
dendrogram includes only the adenoma-L polyps and normal
colon samples, and the left branch has only adenoma-H
polyps. Both the heatmap and the clustering results indicate that the adenoma-L polyps are more like the normal
colon than the adenoma-H polyps.
Supplementary Figure 6. Multidimensional scaling of pairwise distance derived from the 1000 most variable probes
across the adenoma and normal colon samples. Note that the
adenoma-H polyps (green) are clustered together on the left,
and the adenoma-L polyps (orange) are either inseparable
from the normal colon or are in the intermediate distance from
the normal cluster on the right. Normal-C, normal colon with
concurrent CRC; Normal-H, normal colon with no adenoma
or CRC.
Supplementary Figure 7. Venn diagram of the hypermethylated promoters that are PcG-marked in cancers and adenomas.
Almost all of the PcG-marked hypermethylated promoters in CRCs are hypermethylated in the adenoma-H (A) polyps (99.5% of
CpGs); however, they are not hypermethylated in the adenoma-L (B) polyps (2.1% of CpGs).
Supplementary Figure 8. Heatmap of DMPs showing the
DMPs in methyl-high CRCs compared with methyl-low CRCs.
Comparison of the methyl-high CRCs (M-H) to the methyl-low
CRCs (M-L) identified 3676 DMPs (q < 1E-5) in intragenic
regions (A) and 3555 DMPs in intergenic regions (B).
Supplementary Figure 9. Clustering of M-H, M-I/M-L CRCs
and the validation set of adenoma-H polyps using DMPs from
a comparison of M-H CRCs vs M-I/M-L CRCs. Using the
intragenic (A) and intergenic (B) DMPs (M-H vs M-I/M-L, q <
1E-5), the cluster dendrograms for both the intragenic and
intergenic CpGs show that most of the adenoma-H polyps
cluster with the M-I/M-L CRCs rather than with the M-H
CRCs.
Supplementary Figure 10. Multidimensional scaling of pairwise distance derived from the 1000 most variable differentially methylated CpGs across adenoma-H polyps, M-H
CRCs, and M-I/M-L CRCs. Although a few of the M-H CRCs
(green) are dispersed throughout the space, most of them are
clustered tightly together and show a substantial distance
from the adenoma-H polyps. On the other hand, M-I/M-L
CRCs (orange) and adenoma-H polyps (purple) cluster
together more closely for CpGs located in intergenic (A) and
intragenic (B) regions.
Supplementary Figure 11. Validation of methyl-high (M-H),
methyl-intermediate (M-I), and methyl-low (M-L) CRC classes
through the analysis of published methylation array data. (A)
The publicly available dataset GSE25062 was accessed
through www.pubmed.com. We first identified the CpG
probes that are common between the GSE25062 dataset and
our datasets. We then applied unsupervised clustering analysis of the shared differentially methylated CpG probes. This
clustering analysis resulted in the identification of 3 subtypes
of CRCs, similar to our results using our discovery set of
CRCs. (B) We next repeated this process using data from the
HM450 methylation array studies of the TCGA colorectal
cancers. We again identified 3 subtypes of CRCs using the
10,000 most variable CpG probes identified in our discovery
set of CRCs.
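The validation workflow described above (restrict to probes shared between two datasets, then cluster samples without labels into 3 groups) can be sketched as follows. This is a minimal sketch under stated assumptions: the caption does not specify the clustering algorithm, so average-linkage agglomerative clustering with Euclidean distance is used here as a stand-in, and all function names, probe IDs, and sample values are hypothetical.

```python
from math import sqrt


def shared_probes(probes_a, probes_b):
    """Probe IDs present in both datasets, in a stable sorted order."""
    return sorted(set(probes_a) & set(probes_b))


def euclidean(u, v):
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def agglomerative_clusters(samples, n_clusters):
    """Average-linkage agglomerative clustering (stand-in method).

    samples maps sample name -> list of beta values in the same probe order.
    Returns a list of sets of sample names.
    """
    clusters = [{name} for name in samples]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(euclidean(samples[a], samples[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]  # merge the closest pair
        del clusters[j]             # j > i, so index i stays valid
    return clusters


# Hypothetical probe IDs and beta values, for illustration only.
common = shared_probes(["cg1", "cg2", "cg3"], ["cg2", "cg3", "cg4"])
toy_samples = {
    "s1": [0.10, 0.10], "s2": [0.12, 0.10],
    "s3": [0.50, 0.50], "s4": [0.52, 0.50],
    "s5": [0.90, 0.90], "s6": [0.90, 0.88],
}
subtypes = agglomerative_clusters(toy_samples, 3)
```

Sorting the shared probe IDs keeps the column order identical across datasets, which matters when beta values from both are compared sample by sample.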