Gastroenterology 2014;147:418–429
BASIC AND TRANSLATIONAL AT
Differences in DNA Methylation Signatures Reveal Multiple Pathways of Progression From Adenoma to Colorectal Cancer
Yanxin Luo,1,2 Chao-Jen Wong,2 Andrew M. Kaz,2,3,4 Slavomir Dzieciatkowski,2 Kelly T. Carter,2 Shelli M. Morris,2 Jianping Wang,1 Joseph E. Willis,5 Karen W. Makar,6 Cornelia M. Ulrich,6,7 James D. Lutterbaugh,8 Martha J. Shrubsole,9 Wei Zheng,9 Sanford D. Markowitz,8 and William M. Grady2,4
1Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, PR China; 2Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington; 3Research and Development Service, VA Puget Sound Health Care System, Seattle, Washington; 4Department of Medicine, University of Washington School of Medicine, Seattle, Washington; 5Department of Pathology, Case Medical Center, Case Comprehensive Cancer Center and Case Western Reserve University, Cleveland, Ohio; 6Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington; 7National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), University of Heidelberg, Heidelberg, Germany; 8Department of Medicine and Ireland Cancer Center, Case Western Reserve University School of Medicine and Case Medical Center, Cleveland, Ohio; and 9Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University School of Medicine, Nashville, Tennessee
See Covering the Cover synopsis on page 258.
BACKGROUND & AIMS: Genetic and epigenetic alterations contribute to the pathogenesis of colorectal cancer (CRC). There is considerable molecular heterogeneity among colorectal tumors, which appears to arise as polyps progress to cancer. This heterogeneity results in different pathways to tumorigenesis. Although epigenetic and genetic alterations have been detected in conventional tubular adenomas, little is known about how these affect progression to CRC. We compared the methylomes of normal colon mucosa, tubular adenomas, and colorectal cancers to determine how epigenetic alterations might contribute to cancer formation.
METHODS: We conducted genome-wide array-based studies and comprehensive data analyses of aberrantly methylated loci in 41 normal colon tissue samples, 42 colon adenomas, and 64 cancers using HumanMethylation450 arrays.
RESULTS: We found genome-wide alterations in DNA methylation in the nontumor colon mucosa and in cancers. Three classes of cancers and 2 classes of adenomas were identified based on their DNA methylation patterns. The adenomas separated into a high-frequency methylation class and a low-frequency methylation class. Within the high-frequency methylation adenoma class, a subset of adenomas had mutant KRAS. Additionally, the high-frequency methylation adenoma class had DNA methylation signatures similar to those of cancers with low or intermediate levels of methylation, and the low-frequency methylation adenoma class had methylation signatures similar to those of nontumor colon tissue. The CpG sites that were differentially methylated in these signatures are located in intragenic and intergenic regions.
CONCLUSIONS: Genome-wide alterations in DNA methylation occur during early stages of progression of tubular adenomas to cancer. These findings reveal heterogeneity in the pathogenesis of colorectal cancer, even at the adenoma step of the process.
Keywords: Epigenetic Modifications; Colon Cancer; Progression; Gene Regulation.
Colorectal cancer (CRC) results from the progressive accumulation of gene mutations and epigenetic alterations, which induce the initiation and progression of these cancers.
Although global DNA hypomethylation was one of the first DNA abnormalities identified in cancer, gene mutations were the first type of DNA alteration unequivocally demonstrated to drive cancer formation. Mutations have been found in adenomas, the precursor neoplasms to colon cancer, as well as in CRCs.1 The number of genetic alterations per adenoma genome is significantly smaller than that seen in CRCs.2 CRCs have hundreds of mutations and often display genomic instability.3 The form of genomic instability most commonly found in CRC is chromosome instability, which is recognized by the presence of aneuploidy and chromosomal gains and losses. A second form of genomic instability, microsatellite instability, is found in approximately 15% of CRCs and results from inactivation of the DNA mismatch repair system.4 Genomic instability is thought to predispose cells to mutations, a subset of which then drive CRC formation.3–6 The identification of multiple molecular subclasses of CRC suggests that there are multiple molecular pathways that lead to CRC, although many aspects of these pathways are not yet well defined.6,7
More recently, aberrant DNA methylation has been found in CRC and appears also to play a role in driving CRC formation.8 The average CRC genome carries thousands of alterations in the DNA methylation status of CpG dinucleotides. These alterations often affect CpG dinucleotides found in promoter regions of genes and can induce the transcriptional repression of these genes.9,10 The underlying mechanism responsible for the aberrant methylation seen in CRC is not known at this time. Importantly, there is a close association between the DNA methylation status of a locus and its chromatin structure. Regions bound by polycomb group protein complexes are commonly subjected to aberrant methylation in cancer.11 Analyses of established CRCs have also revealed a molecular subclass of CRCs that has an excessive number of aberrantly methylated CpG dinucleotides.12,13 These CRCs, which are designated as having a CpG island methylator phenotype (CIMP), account for approximately 15%–20% of all CRCs and often carry mutant BRAF.14 Although virtually all CRCs have aberrantly methylated genes, CIMP CRCs are recognized by an exceptionally high proportion of methylated loci.14 CIMP CRCs appear to arise from sessile serrated polyps, an observation supported not only by the occurrence of CIMP in approximately 30% of sessile serrated polyps, but also by clinical features shared with CIMP CRCs, such as a predisposition to occur in the proximal colon and in women, and a high frequency of mutant BRAF and microsatellite instability (MSI).7,15,16 In contrast, CIMP is rarely found in tubular or tubulovillous adenomas.17,18
The identification of methylated genes in aberrant crypt foci and adenomas has suggested that epigenetic alterations might play a role in both the initiation and progression of conventional tubular adenomas and CRC, as well as of CIMP CRCs.19 In addition, aberrantly methylated genes can be found in the histologically normal colon of individuals with an increased predisposition to CRC and in older individuals, suggesting that aberrant DNA methylation might be one of the earliest molecular events that initiate CRC formation.20,21 To further assess the role of epigenetic alterations in the initiation and progression of CRC, we carried out an epigenome-wide analysis of normal colon mucosa, tubular adenomas, and CRCs. We assessed the DNA methylation status of CpG dinucleotides in CpG islands (defined as regions with a fraction of C and G nucleotides >50%), CpG shores (which are CpG-rich regions located within 2 kb of islands), and CpG shelves (which are CpG-rich regions flanking shores).22,23
Abbreviations used in this paper: adenoma-H, high methylator phenotype adenoma; adenoma-L, low methylator phenotype adenoma; CIMP, CpG island methylator phenotype; CRC, colorectal cancer; DMP, differentially methylated probe; HM450, HumanMethylation450; M-H, high methylation pattern; M-I, intermediate methylation pattern; M-L, low methylation pattern; MSI, microsatellite instability.
Gastroenterology Vol. 147, No. 2, August 2014. © 2014 by the AGA Institute. 0016-5085/$36.00. http://dx.doi.org/10.1053/j.gastro.2014.04.039
Figure 1. Differentially methylated CpG probes in the normal colon mucosa of people with no history of CRC and of people with concurrent CRC. The strip plot shows the top 65 probes that distinguish these 2 groups of tissue samples (q value <1E-4). The y-axis represents the methylation level (β value) of each case, ranging from 0 (completely unmethylated) to 1 (completely methylated). The x-axis indicates the corresponding CpG probe ID on the HM450 array. Each dot represents the β value of a single case for each targeted CpG probe. Normal-Cancer (Normal-C), normal colon mucosa samples from patients with concurrent CRC; Normal-Healthy (Normal-H), samples from patients without a history of CRC.
Figure 2. Identification and validation of clustering of colorectal adenomas and heatmap representation of DNA methylation array data. DNA methylation status was assessed using HM450 arrays. Each column represents 1 sample and each row represents 1 of the top 5000 most variable probes. The probes are arranged in the order given by unsupervised hierarchical cluster analysis using a correlation distance metric and the average linkage method. DNA methylation M-values are represented on a color scale from green (low DNA methylation) to red (high DNA methylation). The presence of TP53, PIK3CA, KRAS, BRAFV600E, CTNNB1, or APC mutations is indicated by a colored block (no color = wild-type). Two subgroups (adenoma-L and adenoma-H) were identified by cluster analysis of the discovery set of 18 adenoma samples (middle panel). These results were confirmed in an independently collected validation set of 24 adenomas using the same probes identified in the discovery set (right panel). Normal colon samples (left panel, n = 41) were used for reference.
Methods
Primary Human Tissue Samples
DNA extracted from snap-frozen tissues was used for the HumanMethylation450 array studies. A detailed description of the samples used is available in the Supplementary Methods.
DNA Isolation and Bisulfite Conversion
Genomic DNA was extracted and bisulfite modified as described previously.24
Data Access
All methylation array data are available at the NCBI Gene Expression Omnibus under accession number GSE48684.
Molecular Characterization
The CIMP status and MSI status of the CRCs were assessed as described previously.24,25 Gene mutation status of KRAS, BRAFV600E, APC, TP53, and PIK3CA was determined using the qBiomarker Somatic Mutation PCR System Arrays/Human Colon Cancer (Qiagen, Valencia, CA) following the manufacturer's protocol, as described previously.26
HumanMethylation450 Array
Illumina Infinium HumanMethylation450 (HM450) BeadChips (Illumina, San Diego, CA) were used for these studies.
The DNAs were processed on the methylation arrays in the Genomics Shared Resources at the Fred Hutchinson Cancer Research Center according to the manufacturer's specifications (Illumina). Data filtering and normalization procedures are described in the Supplementary Methods.
Results
Identification and Validation of Methylated Probes on the HumanMethylation450 Arrays That Are Differentially Methylated Between Normal Colon, Tubular Adenomas, and Colorectal Cancer
We and others have reported previously that results from the HM450 BeadChips are technically robust, but that there is a measurable false discovery rate.26,27 Therefore, we first conducted technical and biological validation studies of a subset of differentially methylated CpGs (n = 4) found on the HM450 arrays (described in detail in the Supplementary Methods and Results). All 4 CpG probes, which were aberrantly methylated in either cancers or adenomas, were confirmed to be methylated by pyrosequencing.
In addition, these 4 CpG probes were assessed by pyrosequencing in 2 independent collections of samples and showed the same methylation pattern as seen in the samples run on the methylation arrays (Supplementary Tables 1 and 2 and Supplementary Figures 1 and 2). These results demonstrate that the data generated from the HM450 arrays are reproducible and generalizable to colon adenomas and CRCs.
Table 1. Clinical and Genetic Characteristics of DNA Methylation-Based Subtypes of CRC and Adenoma Samples
| Characteristic | Cancer, All | Cancer, Cluster 1 | Cancer, Cluster 2 | Cancer, Cluster 3 | Adenoma (training set), All | Adenoma (training set), High | Adenoma (training set), Low | Adenoma (validation set), All | Adenoma (validation set), High | Adenoma (validation set), Low |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 64 (100.0) | 36 (56.3) | 13 (20.3) | 15 (23.4) | 18 (100.0) | 11 (61.1) | 7 (38.9) | 24 (100.0) | 19 (79.2) | 5 (20.8) |
| Age <50 y | 13 (20.3) | 9 (25.0) | 3 (23.1) | 1 (6.7) | 7 (38.9) | 3 (27.3) | 4 (57.1) | 1 (4.2) | 1 (5.3) | 0 (0.0) |
| Age 50–60 y | 18 (28.1) | 10 (27.8) | 5 (38.5) | 3 (20.0) | 3 (16.7) | 2 (18.2) | 1 (14.3) | 8 (33.3) | 6 (31.6) | 2 (40.0) |
| Age 60–70 y | 15 (23.4) | 10 (27.8) | 3 (23.1) | 2 (13.3) | 5 (27.8) | 4 (36.4) | 1 (14.3) | 5 (20.8) | 4 (21.1) | 1 (20.0) |
| Age >70 y | 18 (28.1) | 7 (19.4) | 2 (15.4) | 9 (60.0) | 3 (16.7) | 2 (18.2) | 1 (14.3) | 10 (41.7) | 8 (42.1) | 2 (40.0) |
| Sex, male | 23 (35.9) | 15 (41.7) | 5 (38.5) | 3 (20.0) | 4 (22.2) | 3 (27.3) | 1 (14.3) | 9 (37.5) | 7 (36.8) | 2 (40.0) |
| Sex, female | 41 (64.1) | 21 (58.3) | 8 (61.5) | 12 (80.0) | 14 (77.8) | 8 (72.7) | 6 (85.7) | 15 (62.5) | 12 (63.2) | 3 (60.0) |
| Location, proximal colon | 28 (43.8) | 9 (25.0) | 8 (61.5) | 11 (73.3) | 12 (66.7) | 8 (72.7) | 4 (57.1) | 14 (58.3) | 11 (57.9) | 3 (60.0) |
| Location, transverse colon | 3 (4.7) | 2 (5.6) | 0 (0.0) | 1 (6.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Location, distal colon | 25 (39.1) | 17 (47.2) | 5 (38.5) | 3 (20.0) | 5 (27.8) | 2 (18.2) | 3 (42.9) | 9 (37.5) | 7 (36.8) | 2 (40.0) |
| Location, rectum | 6 (9.4) | 6 (16.7) | 0 (0.0) | 0 (0.0) | 1 (5.6) | 1 (9.1) | 0 (0.0) | 1 (4.2) | 1 (5.3) | 0 (0.0) |
| Location, unknown | 2 (3.1) | 2 (5.6) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| MSI | 9 (14.1) | 2 (5.6) | 0 (0.0) | 7 (46.7) | – | – | – | – | – | – |
| MSS | 55 (85.9) | 34 (94.4) | 13 (100.0) | 8 (53.3) | – | – | – | – | – | – |
| Stage I or II | 21 (32.8) | 11 (30.6) | 4 (30.8) | 6 (40.0) | – | – | – | – | – | – |
| Stage III or IV | 43 (67.2) | 25 (69.4) | 9 (69.2) | 9 (60.0) | – | – | – | – | – | – |
| TP53 mutant | 23 (35.9) | 15 (41.7) | 6 (46.2) | 2 (13.3) | 4 (22.2) | 1 (9.1) | 3 (42.9) | 5 (20.8) | 4 (21.1) | 1 (20.0) |
| TP53 wild-type | 41 (64.1) | 21 (58.3) | 7 (53.8) | 13 (86.7) | 14 (77.8) | 10 (90.9) | 4 (57.1) | 19 (79.2) | 15 (78.9) | 4 (80.0) |
| SRC mutant | 1 (1.6) | 0 (0.0) | 0 (0.0) | 1 (6.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| SRC wild-type | 63 (98.4) | 36 (100.0) | 13 (100.0) | 14 (93.3) | 18 (100.0) | 11 (100.0) | 7 (100.0) | 24 (100.0) | 19 (100.0) | 5 (100.0) |
| PIK3CA mutant | 7 (10.9) | 3 (8.3) | 3 (23.1) | 1 (6.7) | 3 (16.7) | 2 (18.2) | 1 (14.3) | 4 (16.7) | 4 (21.1) | 0 (0.0) |
| PIK3CA wild-type | 57 (89.1) | 33 (91.7) | 10 (76.9) | 14 (93.3) | 15 (83.3) | 9 (81.8) | 6 (85.7) | 20 (83.3) | 15 (78.9) | 5 (100.0) |
| KRAS mutant | 29 (45.3) | 16 (44.4) | 8 (61.5) | 5 (33.3) | 8 (44.4) | 7 (63.6) | 1 (14.3) | 11 (45.8) | 10 (52.6) | 1 (20.0) |
| KRAS wild-type | 35 (54.7) | 20 (55.6) | 5 (38.5) | 10 (66.7) | 10 (55.6) | 4 (36.4) | 6 (85.7) | 13 (54.2) | 9 (47.4) | 4 (80.0) |
| FBXW7 mutant | 3 (4.7) | 1 (2.8) | 0 (0.0) | 2 (13.3) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (4.2) | 1 (5.3) | 0 (0.0) |
| FBXW7 wild-type | 61 (95.3) | 35 (97.2) | 13 (100.0) | 13 (86.7) | 18 (100.0) | 11 (100.0) | 7 (100.0) | 23 (95.8) | 18 (94.7) | 5 (100.0) |
| CTNNB1 mutant | 2 (3.1) | 1 (2.8) | 1 (7.7) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 1 (4.2) | 1 (5.3) | 0 (0.0) |
| CTNNB1 wild-type | 62 (96.9) | 35 (97.2) | 12 (92.3) | 15 (100.0) | 18 (100.0) | 11 (100.0) | 7 (100.0) | 23 (95.8) | 18 (94.7) | 5 (100.0) |
| BRAF mutant | 9 (14.1) | 0 (0.0) | 0 (0.0) | 9 (60.0) | 1 (5.6) | 0 (0.0) | 1 (14.3) | 2 (8.3) | 1 (5.3) | 1 (20.0) |
| BRAF wild-type | 55 (85.9) | 36 (100.0) | 13 (100.0) | 6 (40.0) | 17 (94.4) | 11 (100.0) | 6 (85.7) | 22 (91.7) | 18 (94.7) | 4 (80.0) |
| APC mutant | 19 (29.7) | 13 (36.1) | 4 (30.8) | 2 (13.3) | 5 (27.8) | 2 (18.2) | 3 (42.9) | 10 (41.7) | 10 (52.6) | 0 (0.0) |
| APC wild-type | 45 (70.3) | 23 (63.9) | 9 (69.2) | 13 (86.7) | 13 (72.2) | 9 (81.8) | 4 (57.1) | 14 (58.3) | 9 (47.4) | 5 (100.0) |
NOTE. Values are n (%). Cancer clusters 1, 2, and 3 correspond to the M-L, M-I, and M-H subgroups, respectively; High and Low denote the adenoma-H and adenoma-L clusters. MSS, microsatellite stability.
The Methylation Status in Normal Colon Mucosa Near Concurrent Colorectal Cancer Differs From That of Normal Colon Mucosa From Healthy Individuals (No History of Colorectal Cancer and No Concurrent Colorectal Cancer)
To assess the role of aberrant DNA methylation in the polyp-to-cancer sequence, we first determined the methylation status of normal colon mucosa using HM450 arrays, which assess the DNA methylation status of 485,577 CpG dinucleotides. More than 90% of the CpG islands in the genome are assessed by the HM450 array.22 We analyzed both normal colon mucosa from people with no history of colon neoplasms, who are considered to be at average risk for CRC, and normal colon mucosa from people with concurrent CRC, who are at increased risk of metachronous CRC.28 After filtering the data as described in the Methods section, we identified 343 differentially methylated probes using a q value threshold of 1E-3 (Supplementary Data, Supplementary Table 3). As shown in Figure 1, these probes, which are located in 65 loci, distinguish the DNA methylation levels in normal samples from CRC patients from those in normal mucosa from healthy individuals (q value <1E-4). The majority of these 343 probes (86%) have higher methylation levels in the CRC-associated mucosa than in colon mucosa from cancer-free individuals. Additional studies will be needed to determine whether these probes are potential markers of a field cancerization process.29,30
Epigenetic Alterations Are a Common Occurrence in Colon Adenomas
We next assessed the methylation status of CpG dinucleotides in 18 adenomas using the HM450 arrays. Adenomas are a well-recognized transition step between normal colon and colorectal cancer. The histologic heterogeneity of colon polyps and the associated risk of CRC have been appreciated for many years. Recently, unique molecular features have been found in the different histologic types of adenomas, and it has been argued that these differences affect the likelihood of the polyps progressing to CRC.7,31 This led us to compare the methylation state of adenomas and established CRCs with that of normal colon mucosa. We identified 86,460 differentially methylated probes (DMPs) in the adenomas (q value <1E-5). Nearly 40% were hypermethylated and 60% were hypomethylated in the adenomas compared with the normal colon mucosa.
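The adenoma-versus-normal comparison above calls DMPs at a fixed q-value threshold and then splits them by direction. A minimal sketch of that kind of screen follows. It assumes a per-probe Welch t-test on M-values, M = log2(β/(1 − β)), with Benjamini–Hochberg adjusted p-values standing in for the q values; the test, filtering, and normalization actually used are those in the Supplementary Methods, and the function names (`beta_to_m`, `bh_qvalues`, `call_dmps`) are hypothetical.

```python
import numpy as np
from scipy import stats


def beta_to_m(beta, offset=1e-6):
    """Convert beta values (0..1) to M-values; clip to avoid log(0)."""
    b = np.clip(beta, offset, 1 - offset)
    return np.log2(b / (1 - b))


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, used here as q values (assumption)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Step-up rule: cumulative minimum taken from the largest p-value downward.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def call_dmps(beta_case, beta_ref, q_cutoff=1e-5):
    """Rows are probes, columns are samples.

    Returns boolean DMP calls, boolean 'hypermethylated in case' flags,
    and the q value of every probe. Assumption: Welch t-test on M-values.
    """
    result = stats.ttest_ind(beta_to_m(beta_case), beta_to_m(beta_ref),
                             axis=1, equal_var=False)
    q = bh_qvalues(result.pvalue)
    is_dmp = q < q_cutoff
    # Direction is read from the difference in mean beta value.
    is_hyper = beta_case.mean(axis=1) > beta_ref.mean(axis=1)
    return is_dmp, is_hyper, q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.beta(2, 8, size=(2000, 41))        # simulated normal mucosa
    adenoma = rng.beta(2, 8, size=(2000, 18))       # simulated adenomas
    adenoma[:100] = rng.beta(8, 2, size=(100, 18))  # simulated gains
    dmp, hyper, q = call_dmps(adenoma, normal)
    print(int(dmp.sum()), "DMPs;", int((dmp & hyper).sum()), "hypermethylated")
```

Testing on M-values rather than β values is a common design choice because M-values are less compressed near 0 and 1; the simulated data here only exercise the code path and say nothing about the study's results.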
The DNA methylation status of CpGs located in different classes of loci, including CpG islands, shores, and shelves as well as promoters, gene bodies, and intergenic regions, was assessed. The HM450 array annotation groups probes into 3 major gene feature groups: promoter (5′ UTR, TSS200, TSS1500, and first exon), intragenic regions (body and 3′ UTR), and intergenic regions.32 We found a higher proportion of hypomethylated probes in nonpromoter regions (25% vs 15% in promoter regions) and a higher proportion of hypermethylated probes in promoter regions than in nonpromoter regions (Supplementary Methods; Supplementary Figure 3). Approximately 17%, 17%, and 19% of probes in CpG islands, shores, and shelves, respectively, are differentially methylated (colon adenoma vs normal colon), suggesting that the proportion of DMPs does not differ significantly among these 3 classes of loci. Next, we assessed the overall DNA methylation patterns of the adenomas. Cluster analysis of the 10,000 most variable CpG probes (2.5%) in the adenomas revealed 2 distinct epigenotypes, which we have termed adenoma-H (high methylator phenotype) and adenoma-L (low methylator phenotype) (Figure 2). Using leave-one-out cross-validation, we determined that the misclassification rate of the cluster results is 5.5%, which suggests that the adenoma-H and adenoma-L groups are truly distinct entities rather than a chance association arising from multiple comparisons. Comparison of methylated CpGs between the adenoma-H and adenoma-L groups revealed 1196 differentially methylated probes (q value <1E-4; Supplementary Data; Supplementary Table 4). Among these DMPs, 89.7% are hypermethylated in the adenoma-H polyps compared with the adenoma-L group. Nearly half of the probes that are hypomethylated in the adenoma-H group vs the adenoma-L group (n = 58/124 [47%]) are located in intergenic or intragenic regions (Supplementary Methods; Supplementary Figure 4).
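The 5.5% misclassification rate above comes from leave-one-out cross-validation of the cluster assignments. The text does not state which classifier was refit in each fold, so the sketch below assumes a nearest-centroid rule with Pearson correlation as the similarity, applied to a probes × samples matrix (for example, M-values of the selected probes). The function name and the simulated data are hypothetical.

```python
import numpy as np


def loocv_misclassification(x, labels):
    """Leave-one-out error of a nearest-centroid rule (assumed classifier).

    x: probes x samples matrix (e.g. M-values of the selected probes).
    labels: cluster label assigned to each sample by the full clustering.
    """
    labels = np.asarray(labels)
    n = x.shape[1]
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        train, train_labels = x[:, keep], labels[keep]
        best_label, best_r = None, -np.inf
        for lab in np.unique(train_labels):
            # Cluster centroid recomputed without the held-out sample i.
            centroid = train[:, train_labels == lab].mean(axis=1)
            r = np.corrcoef(x[:, i], centroid)[0, 1]
            if r > best_r:
                best_label, best_r = lab, r
        errors += int(best_label != labels[i])
    return errors / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base_h, base_l = rng.normal(size=500), rng.normal(size=500)
    high = [base_h + rng.normal(scale=0.5, size=500) for _ in range(11)]
    low = [base_l + rng.normal(scale=0.5, size=500) for _ in range(7)]
    x = np.column_stack(high + low)  # hypothetical 18-sample matrix
    labels = ["adenoma-H"] * 11 + ["adenoma-L"] * 7
    print(f"LOOCV misclassification: {loocv_misclassification(x, labels):.1%}")
```

A held-out sample counts as misclassified when the centroid rule, refit without it, assigns it to a different cluster than the full clustering did.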
No association between DNA methylation and polyp size or histology was found. Interestingly, the adenoma-L polyps have a methylation pattern similar to that of normal colon mucosa, and the adenoma-H polyps are more similar to CIMP-negative CRC (Figure 3 and Supplementary Data; Supplementary Figures 5 and 6). We also found that KRAS mutations occur frequently in a subset of adenoma-H polyps (n = 7/11 [63.6%]), and that mutant APC, BRAF, and PIK3CA occur in small subsets of all the adenomas (n = 5/18 [27.8%]; n = 1/18 [5.6%]; and n = 3/18 [16.7%], respectively). Mutations in SRC, FBXW7, or CTNNB1 were not found in any of the adenomas in the discovery set (Table 1). Also, the adenoma-H polyps with mutant KRAS exhibit a unique methylation pattern compared with the adenoma-H polyps with wild-type KRAS. To validate the discovery of unique epigenotypes of tubular adenomas, we assessed the methylation patterns of an independent collection of 24 adenomas using the HM450 arrays. Using the same set of probes identified in the discovery set of samples (Figure 2), we confirmed the previous cluster results: we again identified an adenoma-H and an adenoma-L subgroup, with obvious heterogeneity within the adenoma-H group. Mutation profiling results in this second set of samples were similar to those in the discovery set. KRAS mutations occurred frequently in a subset of adenoma-H polyps (n = 10/19 [52.6%]). In addition, we found mutations in APC (n = 10/24 [41.7%]), BRAF (n = 2/24 [8.3%]), PIK3CA (n = 4/24 [16.7%]), and CTNNB1 (n = 1/24 [4.2%]), frequencies similar to those found in the discovery set. SRC mutations were not found in any of the adenomas in either set of samples. In addition, there were no significant differences in the age, sex, or location of the adenomas between the adenoma-H and adenoma-L clusters (Table 1).
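The validation step reuses the probe set chosen in the discovery adenomas instead of reselecting probes in the new samples. A sketch of that workflow follows, assuming variance-based selection of the most variable probes and the correlation-distance, average-linkage clustering described in the Figure 2 legend (implemented here with SciPy's `linkage` and `fcluster`). The helper names, the probe count `k`, and the simulated data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def top_variable_probes(m_values, k):
    """Indices of the k probes (rows) with the largest variance across samples."""
    return np.argsort(m_values.var(axis=1))[::-1][:k]


def cluster_samples(m_values, probe_idx, n_clusters=2):
    """Average-linkage clustering of samples (columns) on 1 - Pearson r."""
    z = linkage(m_values[probe_idx].T, method="average", metric="correlation")
    return fcluster(z, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.normal(size=3000)

    def simulate(n_h, n_l):
        # Hypothetical data: two groups deviating in opposite directions.
        cols = [signal + rng.normal(scale=0.5, size=3000) for _ in range(n_h)]
        cols += [-signal + rng.normal(scale=0.5, size=3000) for _ in range(n_l)]
        return np.column_stack(cols)

    discovery, validation = simulate(11, 7), simulate(19, 5)
    probes = top_variable_probes(discovery, k=300)  # chosen once, in discovery
    print("discovery:", cluster_samples(discovery, probes))
    print("validation:", cluster_samples(validation, probes))  # same probes
```

Fixing the probe list before looking at the validation samples keeps the second clustering from being tuned to the new data, which is what makes agreement between the two sets informative.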
Methylator Phenotype in Colorectal Cancer
After assessing the methylation status of the normal colon mucosa and of colon tubular adenomas, we performed comprehensive DNA methylation profiling of 64 CRCs. We also assessed the mutation status of KRAS, BRAFV600E, APC, SRC, FBXW7, TP53, and PIK3CA in the CRCs, as we had done with the adenomas. Using a recursively partitioned mixture model and a hierarchical clustering approach (hCluster) on the 10,000 most variable CpGs, we identified 3 distinct CRC subgroups, cluster 1 (n = 36/64 [56.3%]), cluster 2 (n = 13/64 [20.3%]), and cluster 3 (n = 15/64 [23.4%]), which we have termed methyl-low (low methylation pattern; M-L), methyl-intermediate (intermediate methylation pattern; M-I), and methyl-high (high methylation pattern; M-H), respectively (Figure 3). The M-H subgroup is enriched for CIMP-high cancers (n = 11/15 [73.3%]).14 The remaining M-H CRCs (n = 4/15 [26.7%]) were CIMP-low, defined as having 1–2 methylated CIMP loci. Consistent with other studies, the M-H CRCs have frequent BRAFV600E mutations (n = 9/15 [60.0%]) and are often MSI (n = 7/15 [46.7%]).14 The M-H subgroup also has a relative paucity of APC mutations (n = 2/15 [13.3%]) and TP53 mutations (n = 2/15 [13.3%]) compared with the CRCs of the other 2 subgroups, which have no BRAFV600E mutations, infrequent MSI (n = 2/49 [4.1%]), and higher frequencies of APC mutations (n = 17/49 [34.7%]) and TP53 mutations (n = 21/49 [42.9%]) (Table 1). CRCs in the M-I subgroup exhibit an intermediate methylation pattern compared with the other 2 subgroups (Figure 3). In the M-I subgroup, mutant KRAS (n = 8/13 [61.5%]) and mutant APC (n = 4/13 [30.8%]) were frequent, similar to the frequencies seen in other studies6,33 (Table 1).
The clinicopathologic features of the M-I subgroup suggest that this subtype corresponds to the CIMP2 or CIMP-low classifications described previously.6,33 The M-L group is enriched for non-CIMP cancers, as determined by the CIMP MethyLight assay panel, and has a relatively low frequency of methylated CpGs.14 Of note, the M-H, M-I, and M-L classes of CRCs identified in this study were validated through the analysis of 2 independent collections of CRCs, which identified the same 3 subsets of CRCs (described in detail in the Supplementary Methods and Results).6,13
Figure 3. Cluster analysis of CRCs and heatmap representation of DNA methylation array data. DNA methylation status was assessed using HM450 arrays. Each column represents 1 sample and each row represents 1 of the top 5000 most variable probes. DNA methylation M-values are represented on a color scale from green (low DNA methylation) to red (high DNA methylation). The 3 subgroups identified by clustering are indicated above the heatmap (Methyl-Low, Methyl-Intermediate, and Methyl-High). The presence of TP53, SRC, PIK3CA, KRAS, FBXW7, CTNNB1, BRAFV600E, or APC mutations and CIMP status, which was determined using the Weisenberger panel of CIMP CpGs with MethyLight assays, is indicated by a colored block (no color = wild-type). Normal colon samples (left panel, n = 41) were used for reference.
Association of Methylated Loci With Polycomb Group Binding Sites in Colon Adenomas and Colorectal Cancers
After identifying methylated CpGs that occur commonly in adenomas and CRC, we next assessed which of these methylated CpGs occur in loci commonly occupied by polycomb group proteins.
Previous studies have demonstrated that genes whose expression is affected by methylation are often polycomb group target loci.11 We found that 55% of the hypermethylated loci (defined by q value <1E-6 and a β value >0.3 vs normal colon; n = 756) occurred in areas of bivalent chromatin in all 3 methylation classes of CRCs (Figure 4), which is consistent with previously published findings.13 In addition, we found that the polycomb group protein-marked hypermethylated loci found in the CRCs were also hypermethylated in the adenoma-H polyps, but not in the adenoma-L polyps (Figure 4 and Supplementary Data; Supplementary Figure 7 and Supplementary Tables 5 and 6). These findings suggest that the adenoma-H polyps might be further along the progression toward CRC than adenoma-L polyps, and that the aberrant methylation of genes likely to be transcriptionally repressed by this epigenetic alteration occurs early in the polyp-to-cancer progression sequence.
Figure 4. Venn diagram of hypermethylated genes that are polycomb group protein (PcG)-marked. (A) PcG-marked hypermethylated genes in CRC samples (419 genes). (B) PcG-marked hypermethylated genes in adenoma samples (554 genes). (C) PcG-marked hypermethylated genes in adenoma and CRC samples (384 genes). The number in each area indicates the number of genes in that area.
Hypervariability of Methylation Occurs Early in the Adenoma-Carcinoma Progression Sequence
In addition to assessing the methylation status of CpGs in the normal colon and colon neoplasms, we assessed the inter-sample variability in methylation of the CpGs in the polyps and CRCs. We observed higher inter-sample variability in the frequency of methylated loci across the CRCs and adenomas than across the normal samples, which showed little inter-sample variability (Figure 5A and B). Of interest, the degree of epigenetic hypervariability found in the adenomas and CRCs is similar (Figure 5C), suggesting that the variability in methylation status arises early in CRC formation (Figure 5D).
Figure 5. Increased inter-sample methylation variability in colon adenomas and CRCs. (A) The standard deviation of 1000 randomly selected probes in normal colon samples and adenomas. (B) The standard deviation of 1000 randomly selected probes in normal colon samples and CRCs. (C) The standard deviation of 1000 randomly selected probes in adenomas and CRCs. (D) The standard deviation of 1000 randomly selected probes in adenoma-L polyps and adenoma-H polyps. In each panel, the solid gray line (from the lower left corner to the upper right corner) is the identity line, which marks a 1:1 ratio between the comparison sets, and the dashed red line indicates the best-fit linear regression of the CpG probes. A red line above the gray line means that the group on the y-axis has higher variability than the group on the x-axis.
Adenomas and Colorectal Cancers With a High Methylation Pattern Have a Unique Pattern of Methylated Intergenic and Intragenic CpGs
Recent studies suggest that CIMP CRCs are derived from sessile serrated adenomas,34 a CRC precursor lesion exhibiting unique morphologic features and epigenetic characteristics.35–37 Earlier studies have shown that up to 30% of serrated adenomas have a CIMP methylation pattern and that tubular adenomas are rarely CIMP.35 These findings suggest that there are at least 2 distinct polyp/cancer progression sequences that can lead to CRC.7 In light of the identification of CIMP in serrated adenomas, we carried out a detailed assessment of the epigenome in tubular adenomas and CRCs by assessing the methylation status of CpGs located in intergenic and intragenic regions. We compared the methylation status of these CpGs in the M-H CRCs and M-L CRCs and found 3555 intergenic and 3676 intragenic differentially methylated probes (q value <1E-5) (Supplementary Data, Supplementary Figure 8 and Supplementary Table 7). When we compared the M-H CRCs with the M-I/M-L CRCs, we identified 3122 intragenic and 3114 intergenic DMPs (q value <1E-5; Supplementary Data, Supplementary Table 8).
High Methylator Phenotype Adenoma Polyps Share an Epigenetic Signature With Colorectal Cancers of Low/Intermediate Methylation Pattern
Based on our observation of 2 discrete classes of adenomas defined by methylation patterns (adenoma-L and adenoma-H) and on our identification of patterns of methylated intergenic and intragenic CpGs that distinguish M-H from M-L CRCs, we next assessed these intergenic and intragenic CpGs in adenomas. Interestingly, this analysis revealed that the epigenetic signature of the adenoma-H polyp class is similar to that of the M-I/M-L CRCs, whereas the adenoma-L polyp class is similar to normal colon mucosa (Figure 6). Of note, similar results were found in a validation set of adenomas (Supplementary Figure 9). Multidimensional scaling analysis agreed with the cluster analysis, showing adenoma-H polyps to be similar to M-I/M-L CRCs and adenoma-L polyps to be similar to normal colon mucosa (Supplementary Data, Supplementary Figure 10). These findings suggest that the adenoma-H polyps might be the origin of M-I/M-L CRCs, and that adenoma-L polyps might be polyps that ultimately will not progress to CRC. Additionally, these studies reveal significant heterogeneity in the epigenome of adenomas and suggest that the epigenome might portend the fate of tubular adenomas.
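The multidimensional scaling result above places each sample in a low-dimensional space that approximately preserves between-sample distances. The text does not specify the MDS variant or the distance used, so the sketch below assumes classical (Torgerson) MDS on 1 − Pearson correlation between samples, computed with NumPy only. The function name and the simulated sample groups are hypothetical.

```python
import numpy as np


def classical_mds(m_values, n_components=2):
    """Classical (Torgerson) MDS of samples (columns) on 1 - Pearson r."""
    d = 1.0 - np.corrcoef(m_values.T)                 # sample x sample distances
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                       # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(b)                # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigval[top], 0.0, None))  # ignore negative eigenvalues
    return eigvec[:, top] * scale                     # one row of coordinates per sample


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base, shift = rng.normal(size=1000), rng.normal(size=1000)

    def group(offset, n):
        # Hypothetical samples sharing a profile plus sample-level noise.
        return [base + offset + rng.normal(scale=0.3, size=1000) for _ in range(n)]

    names = ["normal"] * 5 + ["adenoma-L"] * 3 + ["adenoma-H"] * 3 + ["M-I/M-L CRC"] * 4
    x = np.column_stack(group(0, 5) + group(0, 3) + group(shift, 3) + group(shift, 4))
    for name, (c1, c2) in zip(names, classical_mds(x)):
        print(f"{name:12s} {c1:+.2f} {c2:+.2f}")
```

In this toy setup, samples that share the simulated `shift` profile fall together on the first coordinate, which is the kind of grouping an MDS plot is used to check against a cluster analysis.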
Discussion In these studies we have found considerable genetic and epigenetic heterogeneity among not only CRCs but also among adenomas.6,38,39 The existence of different classes of CRC that differ based on DNA methylation patterns was first proposed by Issa and colleagues in 1999 when they identified a CIMP class of CRCs.8 Our studies provide additional insight into CIMP CRCs by showing that in the high August 2014 DNA Methylation Patterns and Colorectal Cancer 427 methylation pattern CRCs there is a low frequency of APC mutations, consistent with findings from the TCGA,6 and suggest CIMP CRCs might arise by a WNT independent pathway.40 In addition, we identified a series of differentially methylated probes located in intragenic and intergenic regions that distinguish M-H from M-I/M-L CRCs. These probes lie in regions called tissue differential methylated regions, and their methylation status is tissue specific, perhaps reflecting the stem cell methylation pattern from which the tissue is derived.23,41 These findings suggest that the M-H CRCs might be derived from a different stem cell precursor than M-I/M-L CRCs, which might explain the unique clinicopathologic features of CIMP CRCs compared with non-CIMP CRCs.14 We also analyzed the methylation status of normal colon mucosa. Our analysis of the normal colon mucosa included samples from individuals with concurrent CRC, who might have a field defect in their colons that predisposes them to adenomas and CRC, and from individuals who are cancer free. Consistent with earlier studies that have used limited gene panels, we observed differences in the methylation patterns between these 2 groups.20,42,43 Additional studies using a prospective study design will be needed to confirm whether the methylation state in the normal colon predicts risk for developing CRC. BASIC AND TRANSLATIONAL AT Figure 6. 
Cluster analysis of M-H CRCs, M-I/M-L CRCs, and adenoma-H polyps using differentially methylated probes in intragenic/intergenic regions. Intragenic (A) and intergenic (B) CpG probes differentially methylated between M-H and M-I/M-L CRCs were used in unsupervised clustering analysis of the adenoma-H polyps and CRCs. The results show that the methylation pattern in adenoma-H polyps is similar to that in M-I/M-L CRCs, but not to that in M-H CRCs. (C) Schematic of the concept that there are multiple adenoma-to-cancer pathways, which can be identified by DNA methylation signatures in the normal colon, adenomas, and cancers (see Discussion).

Earlier studies of the methylation status of adenomas, using small sample sizes and small panels of candidate genes, have shown that methylation is present and heterogeneous in adenomas.15,44,45 Additionally, studies have demonstrated that CIMP can be detected in serrated polyps but is rare in tubular adenomas.15,46 Using HM450 arrays and a moderately sized set of 18 tubular adenomas, we observed that aberrant DNA methylation occurs commonly in tubular adenomas. In addition, we identified subclasses of tubular adenomas based on their methylation status and KRAS mutation status. We studied intergenic and intragenic CpGs, rather than promoter CpGs, because their methylation status appears to reflect the epigenome of the stem cell population from which the tumor cells are derived.47 We found that adenoma-H polyps have a methylation pattern similar to that of M-I/M-L CRCs, and adenoma-L polyps have a methylation pattern similar to that of normal colon epithelium. Importantly, these results were confirmed in an independent collection of 24 adenomas. These results suggest that adenoma-H polyps might be the precursors of M-I/M-L CRCs. Our findings also suggest that adenoma-L polyps might have a low potential to progress to CRC and might represent the 90% of adenomas that do not evolve into CRC.
Our studies provide insight into the colon adenoma-to-cancer progression sequence. This sequence of histologic progression from normal colon epithelial cells to CRC is widely believed to be driven by the serial acquisition of gene mutations and epigenetic alterations. It appears likely that the different subclasses of CRCs arise through different polyp-to-cancer progression sequences, with CIMP CRCs arising from serrated polyps and non-CIMP CRCs arising from tubular adenomas.48 Our studies demonstrate that there are subclasses of adenomas recognized by their epigenotype and KRAS mutation status, and raise the possibility that one of these subclasses, adenoma-H polyps, might be the precursors of CRCs with a low/intermediate methylation pattern. In summary, our results confirm those of earlier studies showing that aberrant DNA methylation occurs early in CRC formation and might predispose histologically normal tissue to become neoplastic. In addition, we found that the epigenetic state of adenomas might influence their propensity to undergo malignant transformation and portend the epigenotype of the resulting CRC.

Supplementary Material
Note: To access the supplementary material accompanying this article, visit the online version of Gastroenterology at www.gastrojournal.org, and at http://dx.doi.org/10.1053/j.gastro.2014.04.039.

References
1. Gal-Yam EN, Egger G, Iniguez L, et al. Frequent switching of polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line. Proc Natl Acad Sci U S A 2008;105:12979–12984. 2. Houseman EA, Christensen BC, Yeh RF, et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 2008;9:365. 3. Vogelstein B, Papadopoulos N, Velculescu VE, et al. Cancer genome landscapes. Science 2013;339:1546–1558. 4. Grady WM, Carethers JM.
Genomic and epigenetic instability in colorectal cancer pathogenesis. Gastroenterology 2008;135:1079–1099. 5. Wood LD, Parsons DW, Jones S, et al. The genomic landscapes of human breast and colorectal cancers. Science 2007;318:1108–1113. 6. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330–337. 7. Jass JR. Molecular heterogeneity of colorectal cancer: implications for cancer control. Surg Oncol 2007;16(Suppl 1):S7–S9. 8. Toyota M, Ho C, Ahuja N, et al. Identification of differentially methylated sequences in colorectal cancer by methylated CpG island amplification. Cancer Res 1999;59:2307–2312. 9. Issa JP, Ottaviano YL, Celano P, et al. Methylation of the oestrogen receptor CpG island links ageing and neoplasia in human colon. Nat Genet 1994;7:536–540. 10. van Engeland M, Derks S, Smits KM, et al. Colorectal cancer epigenetics: complex simplicity. J Clin Oncol 2011;29:1382–1391. 11. Ohm JE, McGarvey KM, Yu X, et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet 2007;39:237–242. 12. Issa JP. CpG island methylator phenotype in cancer. Nat Rev Cancer 2004;4:988–993. 13. Hinoue T, Weisenberger DJ, Lange CP, et al. Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res 2012;22:271–282. 14. Weisenberger DJ, Siegmund KD, Campan M, et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat Genet 2006;38:787–793. 15. Burnett-Hartman AN, Newcomb PA, Potter JD, et al. Genomic aberrations occurring in subsets of serrated colorectal lesions but not conventional adenomas. Cancer Res 2013;73:2863–2872. 16. Spring KJ, Zhao ZZ, Karamatic R, et al. High prevalence of sessile serrated adenomas with BRAF mutations: a prospective study of patients undergoing colonoscopy.
Gastroenterology 2006;131:1400–1407. 17. Rex DK, Ahnen DJ, Baron JA, et al. Serrated lesions of the colorectum: review and recommendations from an expert panel. Am J Gastroenterol 2012;107:1315–1329; quiz 1314, 1330. 18. Kim YH, Kakar S, Cun L, et al. Distinct CpG island methylation profiles and BRAF mutation status in serrated and adenomatous colorectal polyps. Int J Cancer 2008;123:2587–2593. 19. Lao VV, Grady WM. Epigenetics and colorectal cancer. Nat Rev Gastroenterol Hepatol 2011;8:686–700. 20. Shen L, Kondo Y, Rosner GL, et al. MGMT promoter methylation and field defect in sporadic colorectal cancer. J Natl Cancer Inst 2005;97:1330–1338. 21. Ahuja N, Li Q, Mohan AL, et al. Aging and DNA methylation in colorectal mucosa and cancer. Cancer Res 1998;58:5489–5494. 22. Sandoval J, Heyn H, Moran S, et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 2011;6:692–702. 23. Irizarry RA, Ladd-Acosta C, Wen B, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009;41:178–186. 24. Luo Y, Tsuchiya KD, Il Park D, et al. RET is a potential tumor suppressor gene in colorectal cancer. Oncogene 2013;32:2037–2047. 25. Grady WM, Rajput A, Myeroff L, et al. Mutation of the type II transforming growth factor-beta receptor is coincident with the transformation of human colon adenomas to malignant carcinomas. Cancer Res 1998;58:3101–3104. 26. Luo Y, Kaz AM, Kanngurn S, et al. NTRK3 is a potential tumor suppressor gene commonly inactivated by epigenetic mechanisms in colorectal cancer. PLoS Genet 2013;9:e1003552. 27. Busche S, Ge B, Vidal R, et al. Integration of high-resolution methylome and transcriptome analyses to dissect epigenomic changes in childhood acute lymphoblastic leukemia. Cancer Res 2013;73:4323–4336. 28. Saini SD, Kim HM, Schoenfeld P.
Incidence of advanced adenomas at surveillance colonoscopy in patients with a personal history of colon adenomas: a meta-analysis and systematic review. Gastrointest Endosc 2006;64:614–626. 29. Shen L, Issa JP. Epigenetics in colorectal cancer. Curr Opin Gastroenterol 2002;18:68–73. 30. Grady WM, Parkin RK, Mitchell PS, et al. Epigenetic silencing of the intronic microRNA hsa-miR-342 and its host gene EVL in colorectal cancer. Oncogene 2008;27:3880–3888. 31. Yagi K, Akagi K, Hayashi H, et al. Three DNA methylation epigenotypes in human colorectal cancer. Clin Cancer Res 2010;16:21–33. 32. Dedeurwaerder S, Defrance M, Calonne E, et al. Evaluation of the Infinium Methylation 450K technology. Epigenomics 2011;3:771–784. 33. Shen L, Toyota M, Kondo Y, et al. Integrated genetic and epigenetic analysis identifies three different subclasses of colon cancer. Proc Natl Acad Sci U S A 2007;104:18654–18659. 34. Leggett B, Whitehall V. Role of the serrated pathway in colorectal cancer pathogenesis. Gastroenterology 2010;138:2088–2100. 35. Yamamoto E, Suzuki H, Yamano HO, et al. Molecular dissection of premalignant colorectal lesions reveals early onset of the CpG island methylator phenotype. Am J Pathol 2012;181:1847–1861. 36. Jass JR. Serrated adenoma of the colorectum and the DNA-methylator phenotype. Nat Clin Pract Oncol 2005;2:398–405. 37. Gaiser T, Meinhardt S, Hirsch D, et al. Molecular patterns in the evolution of serrated lesion of the colorectum. Int J Cancer 2013;132:1800–1810. 38. Markowitz SD, Bertagnolli MM. Molecular origins of cancer: molecular basis of colorectal cancer. N Engl J Med 2009;361:2449–2460. 39. Lugli A, Jass JR. Types of colorectal adenoma. Verh Dtsch Ges Pathol 2006;90:18–24. 40. Kawasaki T, Nosho K, Ohnishi M, et al. Correlation of beta-catenin localization with cyclooxygenase-2 expression and CpG island methylator phenotype (CIMP) in colorectal cancer. Neoplasia 2007;9:569–577. 41. Doi A, Park IH, Wen B, et al.
Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 2009;41:1350–1353. 42. Worthley DL, Whitehall VL, Buttenshaw RL, et al. DNA methylation within the normal colorectal mucosa is associated with pathway-specific predisposition to cancer. Oncogene 2010;29:1653–1662. 43. Ushijima T. Epigenetic field for cancerization. J Biochem Mol Biol 2007;40:142–150. 44. Kim YH, Petko Z, Dzieciatkowski S, et al. CpG island methylation of genes accumulates during the adenoma progression step of the multistep pathogenesis of colorectal cancer. Genes Chromosomes Cancer 2006;45:781–789. 45. Kim KM, Lee EJ, Ha S, et al. Molecular features of colorectal hyperplastic polyps and sessile serrated adenoma/polyps from Korea. Am J Surg Pathol 2011;35:1274–1286. 46. O’Brien MJ, Yang S, Mack C, et al. Comparison of microsatellite instability, CpG island methylation phenotype, BRAF and KRAS status in serrated polyps and traditional adenomas indicates separate pathways to distinct colorectal carcinoma end points. Am J Surg Pathol 2006;30:1491–1501. 47. Maunakea AK, Nagarajan RP, Bilenky M, et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 2010;466:253–257. 48. Bettington M, Walker N, Clouston A, et al. The serrated pathway to colorectal carcinoma: current concepts and challenges. Histopathology 2013;62:367–386.

Author names in bold designate shared co-first authorship.

Received October 6, 2013. Accepted April 23, 2014.

Reprint requests
Address requests for reprints to: Yanxin Luo, MD, PhD, Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510655, PR China. e-mail: [email protected]; fax: (206) 667-2917; or William M. Grady, MD, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, D4-100, Seattle, Washington 98109.
e-mail: [email protected]; fax: (206) 667-2917.

Acknowledgments
The authors would like to acknowledge the outstanding service provided by the Genomics Shared Resources (FHCRC) and the Cooperative Human Tissue Network for the tissues they provided. We also thank the ColoCare team (Chris Velicer, Rebecca Holmes, Stephanie Zschäbitz, Kathy Vickers, Rachel Wilbur, Shannon Rush, Sara Bates, and others) for their assistance on these studies. In addition, we would like to thank Toshinori Hinoue and Peter W. Laird for kindly sharing their data. Finally, we would like to thank the study participants who kindly agreed to provide tissues for analysis.

Data access: All methylation array data are available at the NCBI Gene Expression Omnibus under accession number GSE48684. The editors and reviewers can access the private dataset at the following link: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=djghdmayauuisvo&acc=GSE48684.

Funding
Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under award numbers R01CA115513, P30CA15704, U01CA152756, U54CA143862, and P01CA077852 (WMG), P50CA95103 (ZW), and R01CA121060, P30CA68485, and K07CA122451 (MJS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Support for these studies was also provided by a Burroughs Wellcome Fund Translational Research Award for Clinician Scientist (WMG), the Program of Introducing Talents of Discipline to Universities of China (B12003, JW), the International Science & Technology Cooperation Program of China (2011DFA32570, JW), the National Natural Science Foundation of China (81201920, YL), and 5P50CA150964 (SDM).

Supplementary Methods
Materials
Normal colon mucosa from cancer-free study subjects (19 cases) was collected by endoscopic biopsy.
Normal colon from patients with cancer (22 cases) was obtained from the normal-appearing resection margin of surgical resection specimens. The samples were obtained from the following sources: the University of Washington Medical Center and Fred Hutchinson Cancer Research Center through the ColoCare consortium (Seattle, WA), Vanderbilt University Medical Center and the Department of Veterans Affairs Tennessee Valley Health Care System (Nashville, TN), and the University Hospitals of Cleveland (Cleveland, OH), following protocols approved by the Institutional Review Board of each institution. Snap-frozen tissue was also provided by the Cooperative Human Tissue Network. All samples used for the methylation arrays were reviewed by a gastrointestinal pathologist to confirm the diagnosis and to ensure that the cancer samples contained >60% tumor epithelium.

Data Filtering, Normalization, and Differential Analysis
For the Illumina Infinium DNA methylation data analysis, we first used a detection P value >.05 to identify and remove unreliable probes.1 Normalization was conducted using the Bioconductor minfi package, which includes Illumina background level correction, color adjustment, and subset-quantile within array normalization (SWAN). SWAN is a recently developed normalization scheme designed specifically for Illumina Infinium HM450 array data to account for the difference between the Infinium I and Infinium II probe designs.2 To reduce background signal effects and biases, we also filtered out probes that contain SNPs (Target IDs starting with rs on the arrays), that assess non-CpG sites (Target IDs starting with ch), that are associated with chromosome X, or that have a SNP within 10 bp of the query site.
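A minimal Python sketch of the probe filters listed above, not the authors' code (normalization with minfi is outside its scope). The record layout (keys id, chrom, snp_distance, detection_p) and the probe IDs are hypothetical stand-ins for the array annotation. Two details are assumptions rather than statements from the text: a probe is dropped if any sample exceeds the detection P value cutoff, and "within 10 bp" is read as a distance of at most 10 bp.

```python
def passes_filters(probe, max_detection_p=0.05, snp_window_bp=10):
    """Return True if a probe survives the filters described in the text.

    probe: dict with hypothetical keys
      'id'           probe ID; 'rs...' (SNP) and 'ch...' (non-CpG) IDs are dropped
      'chrom'        chromosome label, e.g. '7' or 'X'
      'snp_distance' distance in bp from the query site to the nearest SNP,
                     or None if no SNP is annotated nearby
      'detection_p'  list of per-sample detection P values
    """
    if probe["id"].startswith(("rs", "ch")):  # SNP probes and non-CpG probes
        return False
    if probe["chrom"] == "X":                 # chromosome X-associated probes
        return False
    distance = probe["snp_distance"]
    if distance is not None and distance <= snp_window_bp:
        return False                          # SNP within 10 bp of the query site
    # Assumption: a single sample above the cutoff is enough to drop the probe.
    return all(p <= max_detection_p for p in probe["detection_p"])


def filter_probes(probes, **kwargs):
    """IDs of the probes that pass every filter, in input order."""
    return [p["id"] for p in probes if passes_filters(p, **kwargs)]


# Hypothetical probe records, not real array annotation.
probes = [
    {"id": "cg0001", "chrom": "7", "snp_distance": None, "detection_p": [0.001, 0.01]},
    {"id": "cg0002", "chrom": "7", "snp_distance": 5, "detection_p": [0.001, 0.01]},
    {"id": "cg0003", "chrom": "X", "snp_distance": None, "detection_p": [0.001, 0.01]},
    {"id": "ch0004", "chrom": "2", "snp_distance": None, "detection_p": [0.001, 0.01]},
    {"id": "cg0005", "chrom": "3", "snp_distance": 50, "detection_p": [0.001, 0.2]},
    {"id": "cg0006", "chrom": "3", "snp_distance": 50, "detection_p": [0.04, 0.05]},
]
print(filter_probes(probes))  # ['cg0001', 'cg0006']
```

Keeping each filter as a separate early return makes it easy to count how many probes each rule removes, which is useful when reporting the filtered probe total.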
We also applied the ComBat algorithm to assess and correct for batch effects across all array runs.3 Differential analyses comparing subgroups were performed on M-values converted from the β values generated by the Illumina Infinium DNA methylation arrays. We computed a moderated F-statistic to quantify the difference in DNA methylation M-values for each probe between 2 sample sets. This statistic is based on an empirical Bayes approach in which the estimated sample error variance is shrunk toward a pooled estimate.4 To account for multiple comparisons, we used false discovery rate q values,5 and we report probes with q < 1E-5 as differentially methylated positions.

Unsupervised Clustering Analyses
Clustering and subgroup classification were conducted by unsupervised hierarchical clustering using the R function hclust. To identify the subgroups of CRC and adenoma, we used M-values from the 10,000 (2.5%) probes that showed the greatest variability across the samples in each group. Leave-one-out cross-validation was used to estimate the error rate of the clustering results.

Identification of Potentially Silenced Methylated Genes
Using normal colon samples as the reference, we first identified lists of promoter-associated DMPs (q < 1E-5) in adenomas and in cancers. Genes that had at least 1 probe common to these 2 lists of DMPs and that harbored embryonic stem-cell-associated bivalent domains were designated potentially methylated and silenced genes.

Classification of Methyl-High Colorectal Cancer Stem-Cell-Associated Methylated CpGs
We first analyzed the CpG probes that are uniquely methylated in Methyl-High CRCs relative both to Methyl-Intermediate/Low CRCs and to normal samples (q < 1E-5). This set of CpG probes comprises the Methyl-High-CRC-specific methylated loci.
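The differential-analysis steps above can be sketched in Python as follows, as an illustration rather than the authors' pipeline (which relied on Bioconductor tools). The sketch assumes M = log2(β/(1 - β)); a limma-style moderated t in which the pooled variance is shrunk toward a prior variance s0² with d0 prior degrees of freedom (passed in by the caller here, whereas limma estimates both from all probes); and Storey q-values with π0 estimated at a single λ = 0.5, a simplification of Storey's smoothed estimator. For a 2-group comparison the moderated F equals t². The q-value function takes P values as input; converting t to a P value needs the t distribution with d0 + d degrees of freedom (for example, scipy.stats.t.sf), omitted to keep the sketch dependency-free.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M-value from a beta value: M = log2(beta / (1 - beta)).

    beta is clipped to [eps, 1 - eps] so that 0 and 1 map to finite values.
    """
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def moderated_t(group1, group2, d0, s0_sq):
    """Two-group moderated t for one probe (limma-style variance shrinkage).

    group1, group2: M-values of the probe in each sample set (n1 + n2 > 2).
    d0, s0_sq: prior degrees of freedom and prior variance (assumed inputs).
    Returns (t, df). For 2 groups the moderated F statistic equals t ** 2.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    d = n1 + n2 - 2
    s_sq = ss / d                                # pooled residual variance
    s_post = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrunk toward the prior
    t = (m1 - m2) / math.sqrt(s_post * (1 / n1 + 1 / n2))
    return t, d0 + d


def storey_qvalues(pvalues, lam=0.5):
    """Storey q-values with pi0 estimated at a single lambda (simplification)."""
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1 - lam)))
    if pi0 == 0:
        pi0 = 1.0  # conservative fallback; reduces to Benjamini-Hochberg
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each q-value is a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


t, df = moderated_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], d0=4, s0_sq=3.0)
print(round(t, 3), df)  # -2.598 8
print(storey_qvalues([0.01, 0.04, 0.03, 0.8, 0.9, 0.6]))
```

In the usage line the prior variance (3.0) is larger than the observed pooled variance (1.0), so shrinkage inflates the posterior variance to 2.0 and pulls t toward zero; this stabilizing effect on probes with few samples is the reason for the empirical Bayes approach.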
Next, we selected probes located in intergenic/intragenic regions, whose methylation status has been proposed to reflect that of the cell of origin, as discussed in the main text. The final set of probes was designated as Methyl-High CRC stem-cell-associated probes/regions.

Identification of Adenoma-H/Methyl-Intermediate/Low-Associated CpG Loci
We initially selected a set of methylated CpG probes that discriminated between M-H and M-I/M-L CRCs (q < 1E-5) and that were located in intergenic regions, because these probes have been suggested to indicate the cell of origin of the tissue or tumor. We then used these probes to conduct unsupervised clustering and heatmap analysis of the CRCs and adenoma-H polyps, to assess the relationship among these 3 groups of tumors with regard to the methylation levels of these probes.

Validation of the Results Generated From the DNA Methylation Microarrays
In the current study, we carried out 2 levels of validation. For technical validation of the array results, we used pyrosequencing assays to assess a subset of the differentially methylated probes identified on the HM450 DNA methylation microarrays. We also conducted clinical validation studies in which we assessed the methylated CpGs identified in the discovery set of samples in an independent set of samples. For the technical validation studies, we first assessed the accuracy of the pyrosequencing assay. We designed PyroMark assays that targeted the specific CpG dinucleotides identified on the arrays and first assessed a sample set consisting of serial dilutions of a 100% methylated control DNA sample with a 100% unmethylated control DNA sample (#59655 for methylated and #59665 for unmethylated EpiTect Control DNA; Qiagen, Valencia, CA) at 100%, 75%, 50%, 25%, and 0% methylation. After confirming the accuracy of the assay, we used the PyroMark assay to technically validate the HumanMethylation450 results by analyzing the same samples that had been run on the HM450 arrays.
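The 2 set-based selections described above (potentially silenced genes, and M-H CRC stem-cell-associated probes) reduce to intersections followed by filters. In this Python sketch the inputs are hypothetical: sets of probe IDs already meeting q < 1E-5, a probe-to-gene map, a set of genes carrying embryonic stem-cell bivalent domains, and a probe-to-region map with labels such as "intergenic". Reading "uniquely methylated" as "significant in both the M-H vs M-I/M-L and the M-H vs normal comparisons" is our assumption.

```python
REGIONS_OF_INTEREST = {"intergenic", "intragenic"}


def mh_stem_cell_probes(mh_vs_mil, mh_vs_normal, region_of):
    """Probes methylated in M-H CRCs against both comparators, kept only if
    they lie in intergenic or intragenic regions.

    mh_vs_mil, mh_vs_normal: sets of probe IDs with q < 1E-5 in each comparison.
    region_of: dict probe ID -> region label (hypothetical labels).
    """
    mh_specific = mh_vs_mil & mh_vs_normal  # significant in both comparisons
    return {p for p in mh_specific if region_of.get(p) in REGIONS_OF_INTEREST}


def potentially_silenced_genes(adenoma_dmps, cancer_dmps, gene_of, bivalent_genes):
    """Genes with at least 1 promoter DMP shared by adenomas and cancers
    (each vs normal colon) that also carry an ES-cell bivalent domain.

    adenoma_dmps, cancer_dmps: sets of promoter-associated probe IDs.
    gene_of: dict probe ID -> gene symbol; bivalent_genes: set of symbols.
    """
    shared = adenoma_dmps & cancer_dmps
    genes = {gene_of[p] for p in shared if p in gene_of}
    return genes & bivalent_genes


# Hypothetical probe IDs and region labels, not study data.
region_of = {"cgA": "intergenic", "cgB": "promoter", "cgC": "intragenic"}
print(sorted(mh_stem_cell_probes({"cgA", "cgB", "cgC"}, {"cgA", "cgB"}, region_of)))
# ['cgA']
```

Working with sets keeps each step a single, auditable operation; probes missing from the region map are simply excluded rather than raising an error.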
We then used the validated PyroMark assay to analyze the candidate genes in an independent collection of samples for clinical validation of the methylated CpGs.

Validation of Methylated CpGs That Classify the Epigenotype of the Colorectal Cancers
To validate the methylated CpG sets that characterize the M-H and M-I/M-L CRCs, we compared the classification results of our discovery set with those of an existing HM27 dataset (n = 125 CRC samples; GSE25062).1 Of these samples, 103 were CIMP-negative and 22 were CIMP-positive as determined by MethyLight assays.1 We performed unsupervised clustering analysis using those of the 10,000 most variable probes identified in the discovery set that were also included on the HumanMethylation27 platform. We also analyzed a second set of CRCs whose methylomes were determined as part of the TCGA.

Correlation Between DNA Methylation Level and Gene Expression Level
A previous study has shown that <10% of hypermethylated promoter-associated CpGs are associated with decreased gene expression when compared with samples with unmethylated CpGs.1 We used expression profiling datasets available for the samples described in the previous section to assess for a correlation between DNA methylation status and expression levels.

Supplementary Results
Differentially Methylated Probes Generated From the HM450 Arrays Are Reliable and Reproducible
We first designed PyroMark assays targeting the following specific CpG dinucleotides: cg21101720, cg14215472, cg26532627, and cg03537386. Based on the DNA methylation array data, the first 3 probes are hypermethylated in adenomas/cancers, and the last probe is hypomethylated in adenomas/cancers. Analysis of the methylated DNA serial dilution sample sets demonstrated that the pyrosequencing assays can accurately determine the percentage of methylated DNA present in a sample (Supplementary Figure 1). We then applied these assays to the samples that were analyzed with the DNA methylation arrays. The results from the pyrosequencing assays correlated closely with those from the methylation arrays (Supplementary Figure 2). In addition, we assessed the methylation status of the differentially methylated probes using the pyrosequencing assays on DNA from an independent collection of samples (Supplementary Table 1). The results in the validation set of samples were very similar to those in the discovery set. For instance, cg21101720 was significantly hypermethylated in cancers/adenomas in 2 independent validation sets of samples (n = 105 and n = 119): cg21101720 methylation was detected in 0–5.7% of normal samples, 60%–75% of adenomas, and 69%–80% of cancer samples (the primers used for pyrosequencing are described in detail in Supplementary Table 2). These results demonstrate that the differentially methylated probes identified in the discovery set of samples with the HM450 arrays are reliable and reproducible.6,7

The Epigenotype Classifications of Colorectal Cancers Were Validated Using an Independent Collection of Samples
We identified the 506 most variable probes in our dataset that were also on the HumanMethylation27 platform. Cluster analysis with these probes from the discovery set classified the 125 CRCs into 3 groups (clusters 1, 2, and 3; Supplementary Figure 11A), which corresponded to the M-L, M-I, and M-H subtypes in our discovery dataset. In addition, the cluster 3 CRCs were enriched for CIMP-positive CRCs as determined by MethyLight (P = 2E-12) and for the BRAF V600E mutation (P = 3E-11), which is consistent with our findings in the discovery set of samples. More importantly, as described above, we performed the clustering analysis on TCGA samples (batch no. 99, 39 CRCs), which were assessed using HM450 arrays. We applied the same 10,000 variable probes identified in the discovery set of samples to the TCGA CRCs. As shown in Supplementary Figure 11B, 3 clusters were again identified based on the DNA methylation profiling data. Taken together, these results suggest that the classification of CRC epigenotypes based on the discovery set of samples is reproducible and generalizable.

DNA Promoter Methylation Correlates With Absence of Gene Expression
Using publicly available gene expression profiling data (GSE25070), we found that approximately 53% of downregulated genes (206 genes in total) are associated with hypermethylated CpGs, presumably because of the expanded coverage of the 450K methylation microarray platform compared with the 27K platform. Of note, those 206 genes are targeted by 1594 hypermethylated probes in promoter and intragenic regions. Interestingly, 72% (1152 probes) of these hypermethylated probes are located in promoter regions and 79% (1258 probes) are in CpG islands. This suggests that hypermethylation in promoter regions and CpG islands is very likely correlated with lack of gene expression, more so than hypermethylation of CpGs in intragenic regions and outside CpG islands. In addition, we found that 37% of genes with high expression (269 genes) are associated with 610 hypomethylated probes. Nearly 60% of these probes are located in intragenic regions and approximately 80% are outside CpG islands (including shores and shelves), which suggests that hypomethylation outside CpG islands likely correlates with increased gene expression, more so than hypomethylation of CpGs in CpG islands.

References
1. Hinoue T, Weisenberger DJ, Lange CP, et al. Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res 2012;22:271–282. 2. Maksimovic J, Gordon L, Oshlack A. SWAN: subset-quantile within array normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biol 2012;13:R44. 3. Johnson WE, Li C, Rabinovic A.
Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007;8:118–127. 4. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004;3:Article3. Epub 2004 Feb 12. 5. Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. Ann Stat 2003;31:2013–2035. 6. Busche S, Ge B, Vidal R, et al. Integration of high-resolution methylome and transcriptome analyses to dissect epigenomic changes in childhood acute lymphoblastic leukemia. Cancer Res 2013;73:4323–4336. 7. Luo Y, Kaz AM, Kanngurn S, et al. NTRK3 is a potential tumor suppressor gene commonly inactivated by epigenetic mechanisms in colorectal cancer. PLoS Genet 2013;9:e1003552.

Supplementary Figure 1. Development of accurate PyroMark pyrosequencing assays to quantify the percentage of methylation of specific candidate CpG dinucleotides identified with the HM450 arrays. PyroMark Assay Design software (version 2.0; Qiagen) was used to design assays for pyrosequencing 4 CpG dinucleotides (cg21101720, cg14215472, cg26532627, and cg03537386) that were found to be differentially methylated between normal colon and colon neoplasms. The technical accuracy of each assay was tested using defined mixtures of standard EpiTect methylated/unmethylated control DNA containing 100%, 75%, 50%, 25%, and 0% methylated DNA. The pyrosequencing results correlated well with the known DNA methylation content of the defined mixtures (r² = 0.9994 for cg21101720, 0.9981 for cg14215472, 0.9989 for cg26532627, and 0.9907 for cg03537386; all P < .0001).

Supplementary Figure 2. Technical validation of the methylation status of CpGs identified as differentially methylated on the HM450 arrays.
The pyrosequencing assays developed in Supplementary Figure 1 were used to assess the percentage of methylated CpG alleles at cg21101720, cg14215472, cg26532627, and cg03537386 in the same samples that were run on the HM450 arrays, to determine the reliability of the HM450 results. Eighteen samples run on the HM450 arrays (randomly selected based on DNA availability) were used to assess the percentage of methylated cytosine at the cg21101720 (A) and cg14215472 (B) probes using pyrosequencing. The relative methylation levels measured by HM450 correlated reasonably well with the percentage of methylation determined by pyrosequencing (r² = 0.786 for cg21101720 and 0.893 for cg14215472; both P < .0001).

Supplementary Figure 3. Distribution of the differentially methylated CpG probes according to the type of region in which the CpG is located. The CpG probes on the HM450 arrays are annotated with their location in promoter regions, intergenic regions, intragenic regions, etc. We assessed the distribution of the locations of the CpGs differentially methylated between colorectal neoplasms (adenomas and CRCs) and normal colon. We identified 86,460 probes whose methylation status differed significantly between normal colon samples and colorectal adenomas and CRCs (q value <1E-5). Almost 40% of these probes are hypermethylated in colorectal neoplasms. The majority of CpG probes that are hypermethylated in colon adenomas and CRCs are located in promoter regions, and the majority of the hypomethylated CpG probes are located in intergenic regions (Fisher exact test P value <.1).

Supplementary Figure 4. Heatmap of hypermethylated and hypomethylated CpG probes (DMPs) (q value <1E-4) in adenoma-H relative to adenoma-L polyps.
The color panel on the left of the heatmap indicates the type of region in which the CpG is located, and the color panel on the top indicates the cluster group of the adenomas. (A) Heatmap of the 1072 hypermethylated DMPs in adenoma-H polyps vs adenoma-L polyps. Half of the hypermethylated CpGs (50%) are located in promoter regions. (B) Heatmap of the 124 hypomethylated DMPs. The majority of the hypomethylated CpG probes (85.5%) are located in intergenic/intragenic regions.

Supplementary Figure 5. Clustering analysis of adenoma-L polyps, adenoma-H polyps, and normal colon using DMPs (adenoma-H vs adenoma-L) reveals the similarity of adenoma-L polyps and normal samples. The right major branch of the dendrogram includes only adenoma-L polyps and normal colon samples, and the left branch includes only adenoma-H polyps. Both the heatmap and the clustering results indicate that adenoma-L polyps are more similar to the normal colon than adenoma-H polyps are.

Supplementary Figure 6. Multidimensional scaling of pairwise distances derived from the 1000 most variable probes across the adenoma and normal colon samples. Note that the adenoma-H polyps (green) cluster together on the left, and the adenoma-L polyps (orange) either are inseparable from the normal colon cluster on the right or lie at an intermediate distance from it. Normal-C, normal colon with concurrent CRC; Normal-H, normal colon with no adenoma or CRC.

Supplementary Figure 7. Venn diagram of the hypermethylated promoters that are PcG-marked in cancers and adenomas. Almost all of the PcG-marked promoters hypermethylated in CRCs are also hypermethylated in the adenoma-H (A) polyps (99.5% of CpGs); however, few are hypermethylated in the adenoma-L (B) polyps (2.1% of CpGs).

Supplementary Figure 8. Heatmap of the DMPs in methyl-high CRCs compared with methyl-low CRCs.
Comparison of the methyl-high CRCs (M-H) with the methyl-low CRCs (M-L) identified 3676 DMPs (q < 1E-5) in intragenic regions (A) and 3555 DMPs in intergenic regions (B).

Supplementary Figure 9. Clustering of M-H CRCs, M-I/M-L CRCs, and the validation set of adenoma-H polyps using DMPs from a comparison of M-H CRCs vs M-I/M-L CRCs. Using the intragenic (A) and intergenic (B) DMPs (M-H vs M-I/M-L, q < 1E-5), the cluster dendrograms for both intragenic and intergenic CpGs show that most of the adenoma-H polyps cluster with the M-I/M-L CRCs rather than with the M-H CRCs.

Supplementary Figure 10. Multidimensional scaling of pairwise distances derived from the 1000 most variable differentially methylated CpGs across adenoma-H polyps, M-H CRCs, and M-I/M-L CRCs. Although a few of the M-H CRCs (green) are dispersed throughout the space, most of them cluster tightly together at a substantial distance from the adenoma-H polyps. In contrast, M-I/M-L CRCs (orange) and adenoma-H polyps (purple) cluster together more closely for CpGs located in intergenic (A) and intragenic (B) regions.

Supplementary Figure 11. Validation of the methyl-high (M-H), methyl-intermediate (M-I), and methyl-low (M-L) CRC classes through analysis of published methylation array data. (A) The publicly available dataset GSE25062 was accessed through www.pubmed.com. We first identified the CpG probes common to the GSE25062 dataset and our datasets. We then applied unsupervised clustering analysis to the shared differentially methylated CpG probes. This clustering analysis identified 3 subtypes of CRCs, similar to our results with the discovery set of CRCs. (B) We next repeated this process using data from the HM450 methylation array studies of the TCGA colorectal cancers.
We again identified 3 subtypes of CRCs using the 10,000 most variable CpG probes identified in our discovery set of CRCs.
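The cross-dataset validation in Supplementary Figure 11 first ranks discovery-set probes by variability and then keeps those also measured in the external dataset (in the Supplementary Results, 506 of the 10,000 most variable probes were on the HumanMethylation27 platform). A minimal Python sketch, assuming the sample variance of M-values as the variability measure (the paper does not name one); the probe IDs and values are hypothetical.

```python
import statistics


def top_variable_probes(m_values, k):
    """The k probes with the largest sample variance of M-values.

    m_values: dict probe ID -> list of M-values across discovery samples
              (at least 2 values per probe).
    """
    ranked = sorted(m_values, key=lambda p: statistics.variance(m_values[p]), reverse=True)
    return ranked[:k]


def shared_top_probes(m_values, k, other_platform_ids):
    """Top-k discovery probes that the external dataset also measures,
    in decreasing order of variance."""
    other = set(other_platform_ids)
    return [p for p in top_variable_probes(m_values, k) if p in other]


# Hypothetical M-values, not study data.
discovery = {
    "cg_a": [0.0, 0.0, 0.0],
    "cg_b": [0.0, 5.0, 10.0],
    "cg_c": [1.0, 2.0, 3.0],
    "cg_d": [0.0, 10.0, 20.0],
}
print(shared_top_probes(discovery, k=3, other_platform_ids=["cg_b", "cg_c", "cg_z"]))
# ['cg_b', 'cg_c']
```

Ranking before intersecting mirrors the wording of the Supplementary Methods (the most variable probes in the discovery set that were also on the other platform); intersecting first and then ranking would instead return k shared probes. Ranking by standard deviation gives the same order as variance, whereas a robust measure such as the median absolute deviation could change which probes are kept.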