Targeted resequencing analysis of 25 genes
Published Ahead of Print on July 5, 2013, as doi:10.3324/haematol.2013.086686. Copyright 2013 Ferrata Storti Foundation.

Early Release Paper

Targeted resequencing analysis of 25 genes commonly mutated in myeloid disorders in del(5q) myelodysplastic syndromes

by Marta Fernandez-Mercado, Adam Burns, Andrea Pellagatti, Aristoteles Giagounidis, Ulrich Germing, Xabier Agirre, Felipe Prosper, Carlo Aul, Sally Killick, James S. Wainscoat, Anna Schuh, and Jacqueline Boultwood

Haematologica 2013 [Epub ahead of print]

Citation: Fernandez-Mercado M, Burns A, Pellagatti A, Giagounidis A, Germing U, Agirre X, Prosper F, Aul C, Killick S, Wainscoat JS, Schuh A, and Boultwood J. Targeted resequencing analysis of 25 genes commonly mutated in myeloid disorders in del(5q) myelodysplastic syndromes. Haematologica. 2013; 98:xxx doi:10.3324/haematol.2013.086686
Running head: Targeted resequencing of 25 genes in del(5q) MDS

Marta Fernandez-Mercado,1* Adam Burns,2* Andrea Pellagatti,1 Aristoteles Giagounidis,3 Ulrich Germing,4 Xabier Agirre,5 Felipe Prosper,5 Carlo Aul,3 Sally Killick,6 James S. Wainscoat,1 Anna Schuh2¥ and Jacqueline Boultwood1¥

1LLR Molecular Haematology Unit, NDCLS, RDM, John Radcliffe Hospital, Oxford, UK; 2NIHR Biomedical Research Centre, Oxford, UK; 3Medizinische Klinik II, St Johannes Hospital, Duisburg, Germany; 4Department of Hematology, Oncology and Clinical Immunology, Heinrich-Heine-Universität, Düsseldorf, Germany; 5Division of Cancer and Area of Cell Therapy and Haematology Service, Foundation for Applied Medical Research, Clínica Universitaria, Universidad de Navarra, Pamplona, Spain; and 6Department of Haematology, Royal Bournemouth Hospital, Bournemouth, UK

Statement of equal authors' contribution: *MFM and AB contributed equally to this manuscript; ¥JB and AS were co-senior authors.

Correspondence: Professor Jacqueline Boultwood, LLR Molecular Haematology Unit, Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK.
E-mail: [email protected]

Acknowledgments
The authors would like to thank Leukaemia and Lymphoma Research of the United Kingdom and the Oxford Partnership Comprehensive Biomedical Research Centre (with funding from the Department of Health's NIHR Biomedical Research Centres funding scheme) for funding this work. The views expressed in this publication are those of the authors and not necessarily those of the Department of Health. The authors would like to thank the patients who agreed to participate in this study. The authors would also like to thank all co-workers in their laboratories for their technical assistance, as well as all physicians for referring patient material.

Abstract
Interstitial deletion of chromosome 5q is the most common chromosomal abnormality in myelodysplastic syndromes. The catalogue of genes involved in the molecular pathogenesis of myelodysplastic syndromes is rapidly expanding, and next-generation sequencing technology allows these mutations to be detected at high sequencing depth. Here we describe the design, validation and application of a targeted next-generation sequencing approach to simultaneously screen 25 genes mutated in myeloid malignancies. We used this method alongside single nucleotide polymorphism-array technology to characterize the mutational and cytogenetic profile of 43 early or advanced del(5q) myelodysplastic syndrome cases. A total of 29 mutations were detected in our cohort. Overall, 45% of early and 66.7% of advanced cases presented at least one mutation. The genes with the highest mutation frequency among advanced cases were TP53 and ASXL1 (25% of patients each). These genes showed a lower mutation frequency in 5q- syndrome cases (4.5% and 13.6%, respectively), suggesting a role in disease progression in del(5q) myelodysplastic syndromes. Over half (52%) of the mutations identified were in genes involved in epigenetic regulation (ASXL1, TET2, DNMT3A and JAK2).
Six mutations showed allele frequencies below 20%, likely below the detection limit of traditional sequencing methods. Genomic array data showed that advanced del(5q) myelodysplastic syndrome cases displayed a complex background of cytogenetic aberrations, often encompassing genes involved in myeloid disorders. Our study is the first to investigate the molecular pathogenesis of early and advanced del(5q) myelodysplastic syndromes using next-generation sequencing technology on a large panel of genes frequently mutated in myeloid malignancies, further illuminating the molecular landscape of del(5q) myelodysplastic syndromes.

Introduction
The myelodysplastic syndromes (MDS) represent a heterogeneous group of clonal hematopoietic stem cell (HSC) malignancies characterized by ineffective hematopoiesis resulting in peripheral cytopenias, and typically a hypercellular bone marrow. MDS are preleukemic conditions that frequently progress to acute myeloid leukemia (AML), in approximately 40% of patients. In the early stages of the disease, apoptosis of bone marrow precursor cells predominates, whereas in more advanced disease increased proliferation of immature blasts occurs.1 About 50% of MDS cases exhibit acquired genomic abnormalities detectable by conventional cytogenetic banding techniques. Recent molecular investigations have revealed additional genetic abnormalities in MDS, including microdeletions and loss of heterozygosity (LOH) due to acquired uniparental disomy (UPD).2 Interstitial deletion within the long arm of chromosome 5 [del(5q)] is one of the most frequent cytogenetic abnormalities observed in myeloid malignancies, occurring in approximately 10-20% of patients with de novo MDS3 and in a similar proportion of patients with de novo AML.4 In de novo MDS, del(5q) occurs either in isolation or together with other karyotypic abnormalities.
Although del(5q) is a good prognostic indicator when found in isolation,5 this is not the case when it is part of a complex karyotype.6 In a large MDS database, del(5q) was reported as an isolated abnormality in 14% of patients with clonal abnormalities, together with one other abnormality in 5%, and as part of a complex karyotype in 11%.6 The median overall survival in these groups was 80, 47 and 7 months, respectively.6 These findings are consistent with the general notion that the total number of cytogenetic changes is an independent factor that allows patient cohorts to be stratified into prognostic subgroups. The 5q- syndrome is the most distinct of all the MDS and is characterized by isolated del(5q), severe macrocytic anemia, frequent thrombocytosis, female predominance, and a lower risk of progression to AML.7 Patients with the 5q- syndrome have one of the best outcomes of any MDS subgroup,8 with relatively long survival, often of several years' duration.7,8 Although a small number of gene mutations have been reported in the 5q- syndrome, including mutations of TP53 and JAK2,9,10 the molecular landscape of this disease remains to be fully determined. Approximately 10% of patients with the 5q- syndrome show transformation to AML,7 but the genetic aberrations that drive this process have not been fully identified. The International Prognostic Scoring System (IPSS)11 and its revised version8 are based on karyotypic abnormalities as well as on morphological data. Recently, a new and more comprehensive cytogenetic scoring system has been developed that allows refined cytogenetic risk prediction.12 However, the heterogeneous clinical outcome observed within the karyotypically and morphologically defined IPSS groups indicates that it may be possible to refine the cytogenetic classification using additional markers.
The catalogue of genes that play a role in the molecular pathogenesis of MDS is rapidly expanding and includes TET2, SF3B1, EZH2 and ASXL1.13-16 Unraveling the genetic complexity of MDS promises to elucidate the pathophysiology of this disease, refine its taxonomy and prognostic scoring systems, and provide novel therapeutic targets. Technological advances in DNA sequencing provide an important tool for analyzing heterogeneous cancer samples. Massively parallel sequencing enables the analysis of independent, clonal DNA molecules17 and offers the opportunity to adjust the balance between the breadth and depth of such assays to identify a wide variety of potentially critical DNA changes in tumors. Broad approaches, such as whole genome and whole exome sequencing, have been used to discover new cancer gene mutations15,18,19 or to study clonal evolution.20 In particular, several such studies in MDS have identified recurrently mutated genes and novel pathways involved in pathogenesis, such as those encoding splicing factors.15,19 However, these genome-wide approaches are still expensive and have relatively low sensitivity, as their sequencing depth is limited. In contrast, a more targeted sequencing approach aimed at detecting selected recurrent mutations in MDS allows cost-effective and fast sequencing at the high depth required for accurate characterization of heterogeneous cancer samples. Here, we describe the design, validation and application of a targeted next-generation sequencing (NGS) approach using a bench-top platform to simultaneously screen 25 genes mutated in a range of myeloid malignancies. We used this method to characterize the mutational profile of a cohort of 43 MDS cases with del(5q).

Methods

Patient samples

Test Cohort
Nine MDS samples with known mutations in at least one of our target genes, previously detected by Sanger sequencing, pyrosequencing or amplification refractory mutation system PCR (ARMS-PCR), were selected to validate our gene panel.
The nine samples contained a total of 13 variants across 8 genes, comprising missense (n=4), nonsense (n=1) and frameshift (n=8) mutations (Table 1).

MDS del(5q) Cohort
Samples from 43 untreated MDS cases harboring a del(5q) were selected for mutational screening (mean age 66.0 years, range 24-88). These included 22 patients with the 5q- syndrome, 9 cases of refractory anemia (RA) with additional karyotypic abnormalities, and 12 cases of advanced MDS, defined by an increased number of blasts (11 cases of RA with excess of blasts [RAEB] and 1 case of CMML in transformation). All karyotypes were determined by conventional G-banding. This study was approved by the ethics committees of the institutes involved, and informed consent was obtained.

DNA Extraction
Genomic DNA was isolated by phenol-chloroform extraction from peripheral blood neutrophils, which were separated using Histopaque (Sigma-Aldrich) and pelleted after hypotonic lysis of erythrocytes. Neutrophil purity was >95%, as assessed by standard morphology on Wright-Giemsa-stained cytospin preparations.

Targeted re-sequencing
We designed a TruSeq Custom Amplicon panel (TSCA, Illumina) targeting 25 genes mutated in various myeloid malignancies (Table 2). The panel was developed using the online DesignStudio pipeline (http://designstudio.illumina.com, Illumina) and covers a total of 46,604bp with 322 amplicons. In genes with well-defined mutational hotspots, only these regions were targeted; otherwise, the entire coding sequence of the gene was sequenced. Libraries prepared from 250ng of DNA were subjected to 250bp paired-end sequencing. Protein sequences resulting from detected DNA sequence changes were predicted using the insilico.ehu.es online tool21 and Alamut Software (Interactive Biosoftware, San Diego, CA, USA). The PolyPhen-2 v2.2.2 online tool (Polymorphism Phenotyping v2, http://genetics.bwh.harvard.edu/pph2/)22 was used to predict the functional effect of variant calls.
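Predicting the protein-level consequence of a single-nucleotide change comes down to re-translating the affected codon with the genetic code. The study used the insilico.ehu.es tool and Alamut Software for this step; the Python sketch below is neither of those tools, only a minimal illustration of the same idea. It assumes an in-frame coding sequence starting at codon 1 and handles substitutions only (no indels, splice sites or genomic-to-transcript mapping). The function name `classify_snv` and the toy sequence are hypothetical.

```python
from __future__ import annotations

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> tuple[str, str]:
    """Classify a single-nucleotide substitution in an in-frame coding sequence.

    cds: coding sequence read in frame from codon 1 (hypothetical input format)
    pos: 0-based position of the substituted base within cds
    alt: alternate base
    Returns (consequence, protein change), writing a stop codon as X (as in "C1464X").
    """
    cds, alt = cds.upper(), alt.upper()
    if cds[pos] == alt:
        raise ValueError("alternate base equals the reference base")
    start = pos - pos % 3                      # first base of the affected codon
    offset = pos % 3                           # position of the change within the codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = f"{ref_aa}{start // 3 + 1}{alt_aa}".replace("*", "X")
    if ref_aa == alt_aa:
        return "synonymous", label
    if alt_aa == "*":
        return "nonsense", label
    if ref_aa == "*":
        return "stop-loss", label
    return "missense", label


toy = "ATGCGTGGA"                  # hypothetical Met-Arg-Gly coding sequence
print(classify_snv(toy, 4, "A"))   # ('missense', 'R2H')
print(classify_snv(toy, 5, "C"))   # ('synonymous', 'R2R')
print(classify_snv(toy, 6, "T"))   # ('nonsense', 'G3X')
```

Functional-effect prediction of the resulting missense change (the PolyPhen-2 step) is a separate problem that this sketch does not attempt.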
FLT3 ITD fragment analysis
Thirty-three samples in the del(5q) cohort with sufficient DNA were screened for internal tandem duplications in the FLT3 gene (FLT3-ITD) using conventional fragment analysis.23 These included 20 5q- syndrome cases, 5 del(5q) RA with additional karyotypic abnormalities and 8 advanced del(5q) MDS cases.

Genomic array profiling
Single nucleotide polymorphism (SNP) array data were available from a previously published study2 for 33 of the 43 samples included in the targeted sequencing analysis. These data allowed us to identify cryptic copy number changes and UPD regions.

Results

Quality of MDS del(5q) MiSeq sample run
The number of clusters passing the quality filter was over 100,000 for the majority of samples (40/43, 93%) (Figure S1). Paired-end MiSeq sequencing produced more than 2.2Gb of sequence data, with 91% of reads above the Q30 quality threshold, exceeding the expected minimums of 2Gb and 75%, respectively. The average depth of coverage across all samples was >390x, with 98% of cases (42/43) over 250x, 91% (39/43) over 300x and 49% (21/43) at or above 400x. The overall sensitivity of the assay and its background noise were estimated at 1-3% (Supplementary Information and Table S1).

Validation of the Myeloid Gene Panel
To compare the accuracy and sensitivity of our TSCA assay with standard methods of mutation screening (Sanger sequencing, pyrosequencing, fragment analysis), we rescreened 9 test samples (Table 1a) containing 17 variants across 11 genes (ASXL1, DNMT3A, EZH2, FLT3, IDH1, IDH2, KIT, NPM1, NRAS, RUNX1 and TP53). Using the BaseSpace data analysis pipeline, we successfully identified 5 missense, 3 nonsense and 7 frameshift mutations in our validation cohort (15 of 17, 88.2%). In particular, short insertion/deletion (indel) mutations in ASXL1 and NPM1 (1bp and 5bp, respectively) were correctly identified by the BaseSpace analysis software.
Analysis of the aligned reads from TEST009 in the Integrative Genomics Viewer (IGV, Broad Institute) revealed a dramatically reduced read depth of 30x across TP53, compared with >1000x in other samples, suggesting that the reads had failed to align to the reference sequence. The TEST009 sequence data were therefore submitted to a second alignment and variant-calling pipeline (Stampy/Platypus24,25), which successfully identified a 19bp deletion (Figure S2). Two FLT3 internal tandem duplications (ITDs), of 109bp and 64bp in samples TEST003 and TEST004 respectively, were only called after visual inspection of the unaligned data for reads matching part of the FLT3 target sequence. The presence of these FLT3-ITDs was subsequently confirmed by fragment analysis. In addition to the known control mutations, we identified 6 mutations affecting 5 genes in 5 samples (Table 1b). One of these mutations, the C1464X variant in TEST001, was visible in earlier Sanger sequencing traces; however, at the time it had not been called because it was within the range of background noise (Figure S3). All other additional mutations were confirmed by Sanger sequencing and fragment analysis (Figure S4 and Table S2).

Mutations detected in del(5q) cases
Highly purified peripheral blood neutrophil DNA samples from 43 MDS cases harboring a del(5q) were screened for mutations using the 25-gene panel described above. A total of 4036 variant calls were detected by a combination of BaseSpace, Stampy/Platypus and visual inspection of the FLT3 locus. Of these, all non-synonymous variant calls with a COSMIC ID (i.e. recorded in the Catalogue of Somatic Mutations in Cancer26) were considered relevant. We also included in the analysis all non-synonymous variant calls not found in either COSMIC or the dbSNP database (build 135).
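The two inclusion rules just described can be written as a single filter. This is a sketch under stated assumptions: the `VariantCall` record, its field names and the placeholder identifiers ("COSM-example", "rs-example") are hypothetical and do not correspond to the BaseSpace or Platypus output formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantCall:
    gene: str
    consequence: str          # e.g. "missense", "nonsense", "frameshift", "synonymous"
    cosmic_id: Optional[str]  # None if the variant is not recorded in COSMIC
    dbsnp_id: Optional[str]   # None if the variant is absent from dbSNP


def is_relevant(call: VariantCall) -> bool:
    """Keep non-synonymous calls that are in COSMIC, or that are in neither COSMIC nor dbSNP."""
    if call.consequence == "synonymous":
        return False
    if call.cosmic_id is not None:
        return True
    return call.dbsnp_id is None


calls = [
    VariantCall("TP53", "missense", "COSM-example", None),              # kept: in COSMIC
    VariantCall("TET2", "frameshift", None, None),                      # kept: in neither database
    VariantCall("TET2", "missense", None, "rs-example"),                # dropped: dbSNP only
    VariantCall("PDGFRA", "synonymous", "COSM-example", "rs-example"),  # dropped: synonymous
]
print([c.gene for c in calls if is_relevant(c)])  # ['TP53', 'TET2']
```

Under this rule a non-synonymous call is discarded only when it is listed in dbSNP but has no COSMIC entry. Synonymous calls with a COSMIC ID are not part of the main analysis and are reported separately (Table S4).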
A total of 29 non-synonymous variants were called in 10 different genes: 7 affecting TP53, 6 ASXL1, 5 TET2, 2 CBL, 2 DNMT3A, 2 SF3B1, 2 JAK2, 1 U2AF1, 1 RUNX1 and 1 WT1 (Table 3, Figure 1, Table S3). In addition, 21 synonymous variants with a COSMIC ID were found in 5 different genes (10 PDGFRA, 5 IDH1, 3 cKIT, 2 FLT3 and 1 TP53) (Table S4).

Distribution of the non-synonymous mutations among disease subgroups
A total of 29 mutations were detected in our cohort of 43 del(5q) MDS cases. Twelve of the 29 mutations were found in 9 of the 22 5q- syndrome cases (40.9%) (Table 3). Five mutations affected 4 of the 9 del(5q) RA cases with additional cytogenetic aberrations (44.4%) (Table 3). The more advanced del(5q) MDS cases presented a higher proportion of sequence changes: 12 variant calls were found in 8 of the 12 advanced del(5q) cases (66.7%) (Table 3). Among the advanced cases, the genes with the highest mutation frequency were TP53 (3/12 patients, 25%; 5 mutations in total, as two patients had two TP53 mutations each) and ASXL1 (3/12, 25%). The mutation frequency for these two genes was lower in 5q- syndrome cases (TP53 1/22, 4.5%; ASXL1 3/22, 13.6%). Other mutations were identified in 5q- syndrome cases (3 TET2, 2 SF3B1, 1 DNMT3A, 1 RUNX1 and 1 WT1), in del(5q) RA with additional cytogenetic abnormalities (1 TP53, 1 CBL, 1 DNMT3A, 1 U2AF1 and 1 JAK2) and in advanced del(5q) MDS cases (2 TET2, 1 CBL and 1 JAK2). Notably, six of the mutations detected in this study had variant allele frequencies below 20%, which is likely to be below the detection limit of Sanger sequencing.27,28 These low-frequency mutations affected the following genes: 2 TET2, 1 ASXL1, 1 DNMT3A, 1 JAK2 and 1 SF3B1 (Figure 1, Table S3).
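Whether a low-frequency variant is observed at all depends on read depth. As a rough illustration, and using a simple binomial sampling model chosen here rather than the model of any caller used in this study, the probability of seeing at least k variant-supporting reads at depth n for a variant at allele frequency f is 1 - sum over i < k of C(n, i) f^i (1 - f)^(n - i). The threshold of 10 supporting reads below is a hypothetical choice, and sequencing errors (the source of the 1-3% background noise estimated for this assay) are ignored, so the numbers are optimistic.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of observing at least `min_alt_reads` variant reads out of `depth`.

    Models read sampling as Binomial(depth, vaf) and ignores sequencing error,
    strand bias and amplification bias, so it is an idealised upper bound.
    """
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return max(0.0, 1.0 - p_fewer)


if __name__ == "__main__":
    # ~390x was the average depth reported above; 10 reads is a hypothetical calling threshold.
    for vaf in (0.01, 0.03, 0.078, 0.114, 0.20):
        print(f"VAF {vaf:.3f}: P(detect) = {detection_probability(390, vaf, 10):.3f}")
```

Under these assumptions, variants at allele frequencies well below the ~15-20% Sanger limit still yield dozens of supporting reads at ~390x, whereas variants near 1% fall close to the noise floor regardless of the threshold chosen.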
Together, these data show that a number of different gene mutations occur in patients with the 5q- syndrome, and that advanced del(5q) MDS cases display a higher mutation frequency than early del(5q) MDS cases, with TP53 and ASXL1 being the most frequently mutated genes.

Co-occurring mutations: analysis of clonality and timing of mutation acquisition
Clonal evolution has been documented as MDS transforms to AML,29 and when de novo AML relapses after initial chemotherapy.30 The proportion of sequencing reads carrying a given mutation can be used to estimate the fraction of tumor cells carrying that mutation, and to determine whether mutations are clonal (present in all tumor cells) or subclonal (present in a fraction of tumor cells).31 This estimate must take copy number and LOH data into account. Five cases in our cohort showed mutations in more than one gene, and whole genome array data were available for all of them.2 The genes with co-occurring mutations were ASXL1, WT1, SF3B1, TET2, DNMT3A, JAK2 and CBL (Figure 1, Figure 2). In two cases (MDS08, a 5q- syndrome case, and MDS42, a CMML case), the two mutations were present in a similar fraction of cells: ASXL1 (44.7%) and WT1 (49.0%) in the 5q- syndrome case, and ASXL1 (45.4%) and CBL (96.0%, the latter within a UPD region, so that both alleles are mutated) in the CMML case. This is suggestive of a dominant clonal population of cells, in which the temporal order of the mutations cannot be determined. A third case (MDS29, a del(5q) RA with additional karyotypic abnormalities) had a DNMT3A mutation at a variant allele frequency of ~44% and a JAK2 mutation at ~7%. Since the copy number data showed both to lie in diploid regions without LOH, the fractions of cells carrying the mutations would be ~88% and ~14%, respectively (twice the allele frequency). On this basis, we could not infer whether the JAK2 mutation was subclonal to the cells carrying the DNMT3A mutation or whether it represented an independent clone.
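The cell-fraction arithmetic used for MDS29, and the nesting argument applied to these five cases, can be sketched as follows. The sketch assumes that every carrier cell has the same genotype at the locus, that non-carrier cells have the same total copy number, and that each mutation arises only once (the infinite-sites assumption). The function names and the 0.1 noise tolerance are hypothetical choices, not parameters from the study.

```python
def cancer_cell_fraction(vaf: float, copy_number: int = 2, mutant_copies: int = 1) -> float:
    """Estimate the fraction of cells carrying a mutation from its variant allele frequency.

    VAF = mutant_copies * CCF / copy_number, so CCF = VAF * copy_number / mutant_copies:
      - heterozygous mutation in a diploid region: CCF = 2 * VAF
      - mutation on both alleles after copy-neutral LOH/UPD: CCF = VAF (mutant_copies=2)
    """
    return min(1.0, vaf * copy_number / mutant_copies)


def clonal_relationship(ccf_a: float, ccf_b: float, tolerance: float = 0.1) -> str:
    """Pigeonhole test for two mutations under the infinite-sites assumption."""
    if ccf_a + ccf_b <= 1.0 + tolerance:
        # The two populations fit side by side: nested or separate branches both possible.
        return "unresolved: nested or separate branches"
    if abs(ccf_a - ccf_b) <= tolerance:
        # Some cells carry both, but the fractions are too close to order the mutations.
        return "shared clone: order unresolved"
    # Some cells must carry both, so the less prevalent mutation is nested in the other.
    return "b subclonal to a" if ccf_b < ccf_a else "a subclonal to b"


# MDS43: ASXL1 in ~86% and TET2 in ~55% of cells
print(clonal_relationship(0.86, 0.55))  # b subclonal to a
# MDS29: DNMT3A VAF ~44% and JAK2 VAF ~7%, both heterozygous in diploid regions
print(clonal_relationship(cancer_cell_fraction(0.44), cancer_cell_fraction(0.07)))  # unresolved: nested or separate branches
# MDS42: ASXL1 VAF 45.4% (diploid) and CBL VAF 96.0% within a UPD region
print(clonal_relationship(cancer_cell_fraction(0.454), cancer_cell_fraction(0.96, mutant_copies=2)))  # shared clone: order unresolved
```

The function tests nesting only. The further suggestion that the mutation present in more cells arose first (as argued for MDS29 and MDS12) is a separate inference that is not encoded here.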
For MDS29, however, assuming that each mutation occurred only once during tumour evolution, it is possible to suggest that the DNMT3A mutation occurred earlier than the JAK2 mutation in the disease course. Similarly, the fourth case (MDS12, a 5q- syndrome case) had ~80% of cells carrying an SF3B1 mutation, ~20% carrying an ASXL1 mutation and ~10% carrying a TET2 mutation. We can therefore suggest that the SF3B1 mutation occurred before those in ASXL1 or TET2. The variant allele fractions for ASXL1 and TET2 are consistent with TET2 being either subclonal to ASXL1 or on a separate branch of the phylogenetic tree, so we cannot establish the relative timing of these two mutations. Finally, the fifth case (MDS43, a del(5q) RAEB case) had ~86% of cells carrying an ASXL1 mutation and ~55% carrying a TET2 mutation. Because these fractions sum to well over 100%, some cells must carry both mutations; TET2 was therefore clearly subclonal to ASXL1 and must have occurred later.

Copy number changes and uniparental disomy analysis
Thirty-three of the 43 del(5q) MDS samples included in the targeted sequencing analysis (18 5q- syndrome, 6 del(5q) RA with additional cytogenetic abnormalities, and 9 advanced del(5q) MDS cases) had previously been analysed by SNP arrays to identify cryptic copy number changes and regions of UPD (defined as continuous stretches of homozygous SNP calls >2 Mb without copy number loss).2 The results of the analysis are listed in Table S5. The del(5q) was characterized in all 33 cases. Copy number changes in addition to the del(5q) were observed in 6 of 9 advanced MDS cases (66.7%) and in 4 of 6 del(5q) RA cases with additional cytogenetic abnormalities (66.7%), but in only 4 of 18 5q- syndrome cases (22.2%). In the 5q- syndrome group, 31 regions of UPD were identified in 17 of 18 patients. All other cases included in this study showed regions of UPD: 6 regions across all 6 del(5q) RA cases with additional cytogenetic aberrations, and 17 across all 9 advanced del(5q) cases.2 Some of the regions affected by copy number loss encompassed genes included in our TSCA gene panel.
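The UPD definition used in this analysis (a continuous run of homozygous SNP calls longer than 2 Mb without copy number loss) can be turned into a simple scan over ordered SNP calls. This is only a sketch of the stated criterion, not the segmentation software used in the original array study. The `Snp` record and its genotype coding are hypothetical, and here a single genotyping error breaks a run, whereas production pipelines tolerate such noise.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snp:
    chrom: str
    pos: int          # base-pair position
    genotype: str     # "AA" or "BB" (homozygous) or "AB" (heterozygous)
    copy_number: int  # inferred total copy number at this probe


def is_copy_neutral_homozygous(snp: Snp) -> bool:
    return snp.genotype in ("AA", "BB") and snp.copy_number == 2


def upd_regions(snps: List[Snp], min_span_bp: int = 2_000_000) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for runs of copy-neutral homozygous calls longer than min_span_bp.

    `snps` must be sorted by chromosome and position. A heterozygous call, a probe with
    copy number other than 2 (e.g. a deletion) or a change of chromosome ends a run.
    """
    regions: List[Tuple[str, int, int]] = []
    run: List[Snp] = []

    def flush() -> None:
        # Report the current run if it spans more than the minimum length.
        if run and run[-1].pos - run[0].pos > min_span_bp:
            regions.append((run[0].chrom, run[0].pos, run[-1].pos))

    for snp in snps:
        if is_copy_neutral_homozygous(snp) and run and snp.chrom == run[-1].chrom:
            run.append(snp)
            continue
        flush()
        run = [snp] if is_copy_neutral_homozygous(snp) else []
    flush()
    return regions


# Toy data: a 3 Mb copy-neutral homozygous run (reported), a 1.5 Mb run (too short),
# and a 3 Mb homozygous stretch with copy number 1, i.e. a deletion rather than UPD.
snps = [Snp("1", p, "AA", 2) for p in range(0, 3_000_001, 100_000)]
snps += [Snp("2", p, "BB", 2) for p in range(0, 1_500_001, 100_000)]
snps += [Snp("3", p, "AA", 1) for p in range(0, 3_000_001, 100_000)]
print(upd_regions(snps))  # [('1', 0, 3000000)]
```

Requiring copy number 2 is what separates copy-neutral UPD from hemizygous deletions, which also produce runs of homozygous calls.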
In advanced del(5q) MDS cases, the panel genes in regions of copy number loss were EZH2, NPM1, ETV6, ASXL1 and TP53 (Figure 1, Table S6). Additional regions of cytogenetic loss encompassed CBL and ETV6 in two different del(5q) RA cases with additional cytogenetic aberrations (Figure 1, Table S6). The only DNMT3A loss was seen in a 5q- syndrome case. In the one case (a del(5q) RAEB case) with a cytogenetic loss encompassing TP53, the remaining allele carried a missense mutation (R273H) predicted to be damaging to protein function (Table S3). These results show that advanced del(5q) MDS cases display a more complex landscape of cytogenetic aberrations, both karyotypically evident and cryptic, and that these regions often contain genes involved in myeloid disease.

Discussion
In this study, we sought to validate an Illumina-based targeted NGS platform to simultaneously screen 25 genes relevant to myeloid malignancies for mutations. Once it was validated, we aimed to use this gene panel to characterize the mutational profile of a cohort of 43 MDS cases with del(5q), in the context of additional molecular and high-density genomic array data. With Sanger sequencing, mutations in heterogeneous DNA samples can typically be detected only when present in approximately 20% or more of alleles.27,28 The development of specific mutation enrichment or detection strategies has greatly increased this sensitivity.32,33 In keeping with the greater power of NGS for mutation detection compared with traditional sequencing techniques, we identified previously undetected mutations in the validation cohort (9 test samples containing 17 variants across 11 genes), in addition to the mutations already known in these samples. Using the BaseSpace and Stampy/Platypus24,25 analysis software, we successfully identified all point mutations, short indels and deletions included in the validation cohort.
However, FLT3-ITD variants, which consist of patient-specific sequence duplications, were amplified and sequenced but were not identified by either bioinformatics pipeline, and therefore had to be identified visually. This highlights the need for further refinement of commercially available analysis pipelines before their use in routine clinical practice. The failure to align these reads largely reflects the size of the insertion or deletion relative to the read length. We are hopeful that in the future longer read lengths, in combination with improved alignment algorithms, will greatly increase the ability to detect these important mutations. Once our panel was successfully validated, we applied it to study the mutational profile of a series of MDS cases with del(5q). Gene mutation screening in del(5q) MDS has been performed in previous studies, but most of these studies focused on a limited number of genes and mainly employed traditional sequencing methods.9,34-40 Other studies have investigated larger numbers of genes41-43 but did not specifically focus on MDS cases with del(5q). To our knowledge, the present work is the first attempt to screen a large number of genes using a targeted NGS approach in both early and advanced del(5q) MDS. A total of 29 mutations were detected in our cohort of 43 del(5q) MDS cases. Overall, 41% of 5q- syndrome cases and 44% of del(5q) RA cases with additional cytogenetic aberrations presented at least one mutation. The more advanced del(5q) cases showed a higher proportion of mutated cases, with 66.7% presenting at least one mutation. The genes with the highest mutation frequency among advanced cases were TP53 and ASXL1 (25% of patients each). The mutation frequency for these two genes was lower in 5q- syndrome cases (TP53 4.5%, ASXL1 13.6%).
We therefore confirmed in our del(5q) cohort the observation made by our group and others9,44,45 that TP53 mutations occur predominantly in MDS with complex karyotype. The increased incidence of TP53 and ASXL1 mutations in advanced del(5q) cases in the present study suggests that these abnormalities may play a role in disease progression in del(5q) MDS. These data are consistent with a recent report showing that TP53 mutations were associated with disease progression in del(5q) MDS.43 On the basis of molecular studies (including genomic array data analysis), the 5q- syndrome is widely considered to be relatively genetically stable compared with other MDS subtypes.2 This is reflected in its relatively good prognosis.8,11 Previous studies have shown mutations in a limited number of genes in this MDS subtype, including TP53, JAK2 and ASXL1.9,10,34,46 The reported incidence of JAK2 and ASXL1 mutations in this subtype is ~6%.37 Here, we show that over 40% of patients with the 5q- syndrome in fact harbor a gene mutation, including mutations of TET2, SF3B1, RUNX1, WT1 and ASXL1. The SF3B1 mutations detected in this study were found only in 5q- syndrome cases, and not in the other two del(5q) patient groups, which are karyotypically or morphologically defined by more advanced disease. SF3B1 mutations have been associated with a relatively benign disease course.15,43,47 It has been suggested that multipotent hematopoietic stem cells initially acquire a splicing factor mutation as the founding genetic lesion, and subsequently acquire additional mutations that drive their malignant transformation.43,48 This is consistent with our finding, in one 5q- syndrome case, of a high SF3B1 mutant allele frequency together with two other mutations (ASXL1 and TET2) at lower mutant allele frequencies.
The present study shows that genes involved in the epigenetic regulation of the cell (TET2, ASXL1, DNMT3A and JAK2) are frequently affected by either mutations or cytogenetic losses in del(5q) MDS: 15 of the 29 non-synonymous mutations (51.7%) affected epigenetic regulators, as did 4 of the 10 genes in regions affected by cytogenetic loss (40%). This observation is consistent with a recent report in a large MDS cohort (n=117), in which the authors also found 80 mutations in genes predicted to affect epigenetic regulation in half of the cohort (52% of cases).43 Genome-wide methylation analysis of a subset of cases with and without mutations in epigenetic factors did not reveal a specific DNA methylation profile associated with these mutations (Supplementary Information and Figure S5). Six of the mutations detected in this study had variant frequencies below the detection limit of Sanger sequencing, which is estimated to be around 15-20%.27,28 For example, one patient with the 5q- syndrome showed a DNMT3A mutant allele frequency of 7.8%, and another an SF3B1 mutant allele frequency of 11.4%. Sanger sequencing has been the gold standard for sequencing for many years, and the vast majority of sequencing studies published to date have used this technology. It is therefore likely that previous studies underestimated the prevalence of mutations in MDS. This has recently been illustrated by Jadersten et al.,34 who used NGS to reveal TP53 mutations (median clone size 11%) in nearly 20% of low-risk MDS patients with del(5q). Our data support the hypothesis that the prevalence of mutations in del(5q) MDS may also have been underestimated for other genes. Here, we have shown that genes involved in the epigenetic regulation of the cell frequently harbor low-frequency mutations in del(5q) MDS that are undetectable by Sanger sequencing.
Such low-abundance mutant clones have previously been demonstrated for TET2 in MDS and CMML.49 The proportion of variant reads can be used to infer the order in which multiple mutations occurred and, therefore, the clonal evolution from early stages of the disease. Interestingly, ASXL1 was one of the mutated genes in four of the five cases with two or more mutations. In three of these cases it had a lower variant frequency than the other co-mutated genes, suggesting that ASXL1 mutation was a later event in the disease course. Our analysis of clonality was, however, based on a small number of individual cases with multiple mutations. Studies of similar sensitivity in larger MDS cohorts should help establish the phylogenetic structure of tumor evolution.
In summary, we have developed and validated a panel that allows screening of 25 genes frequently mutated in myeloid malignancies. The present study of del(5q) MDS has shown that a number of gene mutations occur in patients with the 5q- syndrome, and that >40% of patients with this low-risk MDS subtype harbor at least one gene mutation. A higher percentage of mutations was found among the more advanced del(5q) MDS cases, with TP53 and ASXL1 being the most frequently mutated genes. Our study is the first to investigate and compare the molecular pathogenesis of early and advanced del(5q) MDS using targeted NGS of a large panel of genes frequently mutated in myeloid malignancies.
Authorship and Disclosures: Conceived and designed the experiments: JB, AS, JSW. Performed the experiments: MFM, AB. Analyzed the data: MFM, AB, AP, XA, FP. Contributed reagents/materials/analysis tools: SK, AG, CA, UG. Wrote the paper: MFM, AB, AP, JSW, AS, JB.
References
1. Heaney ML, Golde DW. Myelodysplasia. N Engl J Med. 1999;340(21):1649-60.
2. Wang L, Fidler C, Nadig N, Giagounidis A, Della Porta MG, Malcovati L, et al. Genome-wide analysis of copy number changes and loss of heterozygosity in myelodysplastic syndrome with del(5q) using high-density single nucleotide polymorphism arrays. Haematologica. 2008;93(7):994-1000.
3. Bernasconi P, Klersy C, Boni M, Cavigliano PM, Calatroni S, Giardini I, et al. Incidence and prognostic significance of karyotype abnormalities in de novo primary myelodysplastic syndromes: a study on 331 patients from a single institution. Leukemia. 2005;19(8):1424-31.
4. Johansson B, Harrison C. Acute myeloid leukemia. In: Heim S, Mitelman F, eds. Cancer Cytogenetics. 3rd ed. Hoboken, NJ; 2009:45-139.
5. Giagounidis AA, Germing U, Haase S, Hildebrandt B, Schlegelberger B, Schoch C, et al. Clinical, morphological, cytogenetic, and prognostic features of patients with myelodysplastic syndromes and del(5q) including band q31. Leukemia. 2004;18(1):113-9.
6. Haase D, Germing U, Schanz J, Pfeilstocker M, Nosslinger T, Hildebrandt B, et al. New insights into the prognostic impact of the karyotype in MDS and correlation with subtypes: evidence from a core dataset of 2124 patients. Blood. 2007;110(13):4385-95.
7. Boultwood J, Pellagatti A, McKenzie AN, Wainscoat JS. Advances in the 5q- syndrome. Blood. 2010;116(26):5803-11.
8. Greenberg PL, Tuechler H, Schanz J, Sanz G, Garcia-Manero G, Sole F, et al. Revised international prognostic scoring system for myelodysplastic syndromes. Blood. 2012;120(12):2454-65.
9. Fidler C, Watkins F, Bowen DT, Littlewood TJ, Wainscoat JS, Boultwood J. NRAS, FLT3 and TP53 mutations in patients with myelodysplastic syndrome and a del(5q). Haematologica. 2004;89(7):865-6.
10. Wong KF, Wong WS, Siu LL, Lau TC, Chan NP. JAK2 V617F mutation is associated with 5q- syndrome in Chinese. Leuk Lymphoma. 2009;50(8):1333-5.
11. Greenberg P, Cox C, LeBeau MM, Fenaux P, Morel P, Sanz G, et al. International scoring system for evaluating prognosis in myelodysplastic syndromes. Blood. 1997;89(6):2079-88.
12. Schanz J, Tuchler H, Sole F, Mallo M, Luno E, Cervera J, et al. New comprehensive cytogenetic scoring system for primary myelodysplastic syndromes (MDS) and oligoblastic acute myeloid leukemia after MDS derived from an international database merge. J Clin Oncol. 2012;30(8):820-9.
13. Ernst T, Chase AJ, Score J, Hidalgo-Curtis CE, Bryant C, Jones AV, et al. Inactivating mutations of the histone methyltransferase gene EZH2 in myeloid disorders. Nat Genet. 2010;42(8):722-6.
14. Gelsi-Boyer V, Trouplin V, Adelaide J, Bonansea J, Cervera N, Carbuccia N, et al. Mutations of polycomb-associated gene ASXL1 in myelodysplastic syndromes and chronic myelomonocytic leukaemia. Br J Haematol. 2009;145(6):788-800.
15. Papaemmanuil E, Cazzola M, Boultwood J, Malcovati L, Vyas P, Bowen D, et al. Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts. N Engl J Med. 2011;365(15):1384-95.
16. Tefferi A, Lim KH, Abdel-Wahab O, Lasho TL, Patel J, Patnaik MM, et al. Detection of mutant TET2 in myeloid malignancies other than myeloproliferative neoplasms: CMML, MDS, MDS/MPN and AML. Leukemia. 2009;23(7):1343-5.
17. Druley TE, Vallania FL, Wegner DJ, Varley KE, Knowles OL, Bonds JA, et al. Quantification of rare allelic variants from pooled genomic DNA. Nat Methods. 2009;6(4):263-5.
18. Varela I, Tarpey P, Raine K, Huang D, Ong CK, Stephens P, et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature. 2011;469(7331):539-42.
19. Yoshida K, Sanada M, Shiraishi Y, Nowak D, Nagata Y, Yamamoto R, et al. Frequent pathway mutations of splicing machinery in myelodysplasia. Nature. 2011;478(7367):64-9.
20. Ding L, Ellis MJ, Li S, Larson DE, Chen K, Wallis JW, et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature. 2010;464(7291):999-1005.
21. Bikandi J, San Millan R, Rementeria A, Garaizar J. In silico analysis of complete bacterial genomes: PCR, AFLP-PCR and endonuclease restriction. Bioinformatics. 2004;20(5):798-9.
22. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;Chapter 7:Unit 7.20.
23. Murphy KM, Levis M, Hafez MJ, Geiger T, Cooper LC, Smith BD, et al. Detection of FLT3 internal tandem duplication and D835 mutations by a multiplex polymerase chain reaction and capillary electrophoresis assay. J Mol Diagn. 2003;5(2):96-102.
24. Lunter G, Goodson M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 2011;21(6):936-9.
25. Rimmer A, Mathieson I, Lunter G, McVean G. Platypus: an integrated variant caller (www.well.ox.ac.uk/platypus). 2012.
26. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39(Database issue):D945-50.
27. Bar-Eli M, Ahuja H, Gonzalez-Cadavid N, Foti A, Cline MJ. Analysis of N-RAS exon-1 mutations in myelodysplastic syndromes by polymerase chain reaction and direct sequencing. Blood. 1989;73(1):281-3.
28. Collins SJ, Howard M, Andrews DF, Agura E, Radich J. Rare occurrence of N-ras point mutations in Philadelphia chromosome positive chronic myeloid leukemia. Blood. 1989;73(4):1028-32.
29. Walter MJ, Shen D, Ding L, Shao J, Koboldt DC, Chen K, et al. Clonal architecture of secondary acute myeloid leukemia. N Engl J Med. 2012;366(12):1090-8.
30. Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481(7382):506-10.
31. Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, et al. The life history of 21 breast cancers. Cell. 2012;149(5):994-1007.
32. Li M, Diehl F, Dressman D, Vogelstein B, Kinzler KW. BEAMing up for detection and quantification of rare sequence variants. Nat Methods. 2006;3(2):95-7.
33. Su Z, Dias-Santagata D, Duke M, Hutchinson K, Lin YL, Borger DR, et al. A platform for rapid detection of multiple oncogenic mutations with relevance to targeted therapy in non-small-cell lung cancer. J Mol Diagn. 2011;13(1):74-84.
34. Jadersten M, Saft L, Smith A, Kulasekararaj A, Pomplun S, Gohring G, et al. TP53 mutations in low-risk myelodysplastic syndromes with del(5q) predict disease progression. J Clin Oncol. 2011;29(15):1971-9.
35. Jerez A, Gondek LP, Jankowska AM, Makishima H, Przychodzen B, Tiu RV, et al. Topography, clinical, and genomic correlates of 5q myeloid malignancies revisited. J Clin Oncol. 2012;30(12):1343-9.
36. Pardanani A, Patnaik MM, Lasho TL, Mai M, Knudson RA, Finke C, et al. Recurrent IDH mutations in high-risk myelodysplastic syndrome or acute myeloid leukemia with isolated del(5q). Leukemia. 2010;24(7):1370-2.
37. Patnaik MM, Lasho TL, Finke CM, Gangat N, Caramazza D, Holtan SG, et al. WHO-defined 'myelodysplastic syndrome with isolated del(5q)' in 88 consecutive patients: survival data, leukemic transformation rates and prevalence of JAK2, MPL and IDH mutations. Leukemia. 2010;24(7):1283-9.
38. Patnaik MM, Lasho TL, Finke CM, Knudson RA, Ketterling RP, Chen D, et al. Isolated del(5q) in myeloid malignancies: clinicopathologic and molecular features in 143 consecutive patients. Am J Hematol. 2011;86(5):393-8.
39. Sebaa A, Ades L, Baran-Marzack F, Mozziconacci MJ, Penther D, Dobbelstein S, et al. Incidence of 17p deletions and TP53 mutation in myelodysplastic syndrome and acute myeloid leukemia with 5q deletion. Genes Chromosomes Cancer. 2012;51(12):1086-92.
40. Sokol L, Caceres G, Rocha K, Stockero KJ, Dewald DW, List AF. JAK2(V617F) mutation in myelodysplastic syndrome (MDS) with del(5q) arises in genetically discordant clones. Leuk Res. 2010;34(6):821-3.
41. Bejar R, Stevenson K, Abdel-Wahab O, Galili N, Nilsson B, Garcia-Manero G, et al. Clinical effect of point mutations in myelodysplastic syndromes. N Engl J Med. 2011;364(26):2496-506.
42. Damm F, Kosmider O, Gelsi-Boyer V, Renneville A, Carbuccia N, Hidalgo-Curtis C, et al. Mutations affecting mRNA splicing define distinct clinical phenotypes and correlate with patient outcome in myelodysplastic syndromes. Blood. 2012;119(14):3211-8.
43. Mian SA, Smith AE, Kulasekararaj AG, Kizilors A, Mohamedali AM, Lea NC, et al. Spliceosome mutations exhibit specific associations with epigenetic modifiers and proto-oncogenes mutated in myelodysplastic syndrome. Haematologica. 2013.
44. Jonveaux P, Fenaux P, Quiquandon I, Pignon JM, Lai JL, Loucheux-Lefebvre MH, et al. Mutations in the p53 gene in myelodysplastic syndromes. Oncogene. 1991;6(12):2243-7.
45. Lai JL, Preudhomme C, Zandecki M, Flactif M, Vanrumbeke M, Lepelley P, et al. Myelodysplastic syndromes and acute myeloid leukemia with 17p deletion. An entity characterized by specific dysgranulopoiesis and a high incidence of P53 mutations. Leukemia. 1995;9(3):370-81.
46. Boultwood J, Perry J, Pellagatti A, Fernandez-Mercado M, Fernandez-Santamaria C, Calasanz MJ, et al. Frequent mutation of the polycomb-associated gene ASXL1 in the myelodysplastic syndromes and in acute myeloid leukemia. Leukemia. 2010;24(5):1062-5.
47. Malcovati L, Papaemmanuil E, Bowen DT, Boultwood J, Della Porta MG, Pascutto C, et al. Clinical significance of SF3B1 mutations in myelodysplastic syndromes and myelodysplastic/myeloproliferative neoplasms. Blood. 2011;118(24):6239-46.
48. Cazzola M, Rossi M, Malcovati L. Biologic and clinical significance of somatic mutations of SF3B1 in myeloid and lymphoid neoplasms. Blood. 2013;121(2):260-9.
49. Smith AE, Mohamedali AM, Kulasekararaj A, Lim Z, Gaken J, Lea NC, et al. Next-generation sequencing of the TET2 gene in 355 MDS and CMML patients reveals low-abundance mutant clones with early origins, but indicates no definite prognostic value. Blood. 2010;116(19):3923-32.
Table 1a. Summary of mutations present in test samples used for TSCA panel validation.
All Q-score values were generated by GATK through the BaseSpace pipeline, with the exception of the TEST009 variant, which was generated by Platypus.
Sample ID | Gene | Mutation | Position | Q-score | Depth of coverage (x) | Frequency
TEST001 | ASXL1 | c.1925het_insA; p.G643RfsX13 | Chr20:31,022,442 | 99 | 239 | 60%
TEST001 | EZH2 | p.L98Ifs*28 | Chr7:148529801 | 99 | 895 | 32%
TEST001 | EZH2 | p.Q250X | Chr7:148523705 | 99 | 437 | 42%
TEST001 | NRAS | p.G12D | Chr1:115258747 | 99 | 955 | 32%
TEST001 | RUNX1 | p.S141X | Chr21:36252940 | 99 | 361 | 31%
TEST002 | ASXL1 | c.1748G>GA; p.W583X | Chr20:31,022,263 | 99 | 159 | 28%
TEST003 | IDH2 | c.13775G>GA; p.R140Q | Chr15:90,631,934 | 99 | 1382 | 51%
TEST003 | NPM1 | ----/TCTG | Chr5:170,837,548 | 99 | 836 | 23%
TEST003 | FLT3 | 109bp insertion | N/A | Detected visually only
TEST004 | NPM1 | ----/TCTG | Chr5:170,837,548 | 99 | 734 | 25%
TEST004 | FLT3 | 64bp insertion | N/A | Detected visually only
TEST005 | NPM1 | ----/TCTG | Chr5:170,837,548 | 99 | 527 | 15%
TEST006 | DNMT3A | c.2648C>CT; p.R882V | Chr2:25,457,242 | 99 | 250 | 44%
TEST007 | KIT | c.2447A>AG; p.D816V | Chr4:55,599,321 | 99 | 248 | 39%
TEST008 | NPM1 | ----/TCTG | Chr5:170,837,548 | 99 | 875 | 20%
TEST008 | IDH1 | c.6694C>CT; p.R132C | Chr2:209,113,114 | 99 | 1390 | 48%
TEST009 | TP53 | TGTACATGGCCATGGCGCGG/T | Chr17:7,578,441 | 200 | 731 | 95%

Table 1b. Summary of additional mutations found in the test samples.
Sample ID | Gene | Mutation | Position | Q-score | Depth of coverage (x) | Frequency
TEST001 | TET2 | p.C1464X | Chr4:106,193,930 | 99 | 423 | 47%
TEST005 | TET2 | p.L1258Afs*10 | Chr4:106,164,903 | 99 | 2968 | 25%
TEST006 | NPM1 | G/GTCTG | Chr5:170,837,547 | 99 | 999 | 18%
TEST007 | RUNX1 | p.L71Sfs*24 | Chr21:36,259,199 | 99 | 395 | 51%
TEST007 | SF3B1 | p.K700E | Chr2:198,266,834 | 99 | 2053 | 45%
TEST008 | FLT3 | p.I836del | Chr13:28,592,636 | 99 | 2271 | 48%

Table 2. List of genes targeted for enrichment in the TSCA library.
Gene | Location | Chromosomal coordinates | Targeted exons
ASXL1 | 20q11.21 | chr20:30946147-31027122 | 12
ATRX | Xq21.1 | chrX:76760356-77041719 | 8-10 (ADD domain), 17-31 (helicase domain)
CBL | 11q23.3 | chr11:119076986-119178859 | 8-9 (RING finger domain and linker sequence)
CBLB | 3q13.11 | chr3:105377109-105587887 | 9, 10
CBLC | 19q13.32 | chr19:45281126-45303903 | 9, 10
DNMT3A | 2p23.3 | chr2:25455830-25564784 | 23
ETV6/TEL | 12p13.2 | chr12:11802788-12048325 | All 8 exons
EZH2 | 7q36.1 | chr7:148504464-148581441 | 2-20
FLT3 | 13q12.2 | chr13:28577411-28674729 | 14, 15 (JM and TK1 domains), 20 (D835)
IDH1 | 2q34 | chr2:209100953-209119806 | 4
IDH2 | 15q26.1 | chr15:90627212-90645708 | 4
JAK2 | 9p24.1 | chr9:4985245-5128183 | 12, 14
KIT | 4q12 | chr4:55524095-55606881 | 2, 8-11, 13, 17
MPL | 1p34.2 | chr1:43803475-43820135 | 10
NPM1 | 5q35.1 | chr5:170814708-170837888 | 12
NRAS | 1p13.2 | chr1:115247085-115259515 | 2, 3
PDGFRA | 4q12 | chr4:55095264-55164412 | 12, 14, 18
RUNX1 | 21q22.12 | chr21:36193574-36260987 | 3-8
SF3B1 | 2q33.1 | chr2:198256698-198299771 | 15, 16
SRSF2 | 17q25.1 | chr17:74730197-74733493 | 1
TET2 | 4q24 | chr4:106067842-106200960 | 3-11
TP53 | 17p13.1 | chr17:7571720-7590868 | 4-9
U2AF1 | 21q22.3 | chr21:44513066-44527688 | 2, 6
WT1 | 11p13 | chr11:32409322-32457081 | 7, 9 (Cys-His zinc finger domains)
ZRSR2 | Xp22.2 | chrX:15808574-15841382 | All 11 exons

Table 3. Summary of non-synonymous variant calls with a COSMIC ID or not present in dbSNP.
Gene | 5q- syndrome (n=22), number of mutations (%) | RA del(5q) with additional karyotypic abnormalities (n=9), number of mutations (%) | Advanced del(5q) cases (n=12), number of mutations (%)
TP53 | 1 (4.5) | 1 (11.1) | 5 (41.7)*
ASXL1 | 3 (13.6) | 0 | 3 (25.0)
TET2 | 3 (13.6) | 0 | 2 (16.7)
JAK2V617F | 0 | 1/8 (12.5) | 1/8 (12.5)
CBL | 0 | 1 (11.1) | 1 (8.3)
DNMT3A | 1 (4.5) | 1 (11.1) | 0
U2AF1 | 0 | 1 (11.1) | 0
SF3B1 | 2 (9.1) | 0 | 0
RUNX1 | 1 (4.5) | 0 | 0
WT1 | 1 (4.5) | 0 | 0
Total number of mutations | 12 | 5 | 12
Patients presenting at least one mutation | 9 (45.0) | 4 (44.4) | 8 (66.7)
* Two patients had two TP53 mutations.

Figure Legends
Figure 1.
Mutations, deletions and loss of heterozygosity in 25 genes analysed in del(5q) MDS samples. Columns show results for each of the 43 analysed cases. Grey boxes indicate mutated cases. Black boxes mark samples for which SNP-array data were available. X: double mutant. Δ: gene encompassed within a region of cytogenetic loss. Θ: gene encompassed within a region of UPD.
Figure 2. Mutant allele frequencies in individual del(5q) MDS samples. The area of each coloured circle indicates the allele frequency of the given mutation. The text under the circles lists the frequency and nature of each mutation in order of decreasing allele frequency.

Supplementary Information
Targeted re-sequencing
We designed a TruSeq Custom Amplicon (TSCA, Illumina) panel targeting 25 genes mutated in various myeloid malignancies (Table 2). The panel was developed using the online DesignStudio pipeline (http://designstudio.illumina.com, Illumina) and covers a total of 46,604 bp with 322 amplicons. In genes with well-defined mutational hotspots, only these regions were targeted; otherwise, the entire coding sequence of the gene was sequenced. Dual-barcoded TSCA libraries were created from 250 ng of genomic DNA in accordance with the manufacturer's instructions, before undergoing 2x150 bp paired-end sequencing on the Illumina MiSeq platform. The initial alignment and variant calling analysis was performed with the BaseSpace online analysis tool (https://basespace.illumina.com, Illumina). To screen for larger insertions and deletions, the data were also run through the Stampy and Platypus pipelines, which use different algorithms for mapping sequencing reads to a reference genome (Stampy) and for calling variants (Platypus).1,2 All variants called were visually inspected in IGV. All candidate sequence variations that passed the internal Illumina integrity filters and had a quality score greater than Q60 were taken forward for further analysis.
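The automated part of this call filtering can be sketched in Python as below. The sketch applies the integrity filter, the Q-score threshold (strictly greater than 60) and the database rule used for Table 3: non-synonymous calls that either carry a COSMIC ID or are absent from dbSNP. Visual inspection in IGV is a manual step and is not modelled. The `VariantCall` record and its field names are hypothetical. The RUNX1 and DNMT3A entries use values from Table S3. GENE_Y, GENE_Z and the dbSNP placeholder are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_QUAL = 60  # calls must have a quality score greater than Q60


@dataclass
class VariantCall:
    gene: str
    protein_change: str
    qual: float
    passed_filters: bool          # passed the internal Illumina integrity filters
    nonsynonymous: bool
    cosmic_id: Optional[str] = None
    dbsnp_id: Optional[str] = None


def passes_quality(call: VariantCall) -> bool:
    """Integrity filter plus a strict Q > 60 threshold."""
    return call.passed_filters and call.qual > MIN_QUAL


def is_reportable(call: VariantCall) -> bool:
    """Sketch of the Table 3 selection rule: a non-synonymous, quality-passing call
    that either has a COSMIC ID or is not present in dbSNP."""
    if not (passes_quality(call) and call.nonsynonymous):
        return False
    return call.cosmic_id is not None or call.dbsnp_id is None


calls = [
    VariantCall("RUNX1", "L29S", 99, True, True, "COSM24756", "rs111527738"),
    VariantCall("DNMT3A", "R882L", 75, True, True),
    # Hypothetical known polymorphism (dbSNP only): excluded.
    VariantCall("GENE_Y", "A1B", 99, True, True, None, "rsPLACEHOLDER"),
    # Hypothetical low-quality call: excluded.
    VariantCall("GENE_Z", "C2D", 50, True, True),
]
print([c.gene for c in calls if is_reportable(c)])
```

The COSMIC-or-not-in-dbSNP rule keeps recurrent somatic hotspots that also have dbSNP entries, such as the RUNX1 call above. It removes calls that are known only as polymorphisms.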
All variants were confirmed visually and then checked against dbSNP build 135 (NCBI, National Center for Biotechnology Information, USA) and the COSMIC database (Catalogue of Somatic Mutations in Cancer, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK) to assess whether they were reported polymorphisms or annotated somatic mutations, respectively.
Assay sensitivity
To evaluate the sensitivity of the assay, we used two approaches: (1) comparison with absolute real-time PCR quantification of a specific mutation and (2) definition of the general background noise across all amplicons.
1. Comparison with real-time PCR
We determined the variant allele frequency (VAF) in 7 JAK2 V617F-positive samples by real-time PCR and compared these values with the VAFs obtained from the targeted sequencing assay. Real-time PCR was performed using the commercially available JAK2 MutaQuant™ kit (Ipsogen, Luminy Biotech, Marseille, France), which distinguishes between JAK2 wild-type and V617F alleles by TaqMan allelic discrimination. Allele-specific primer/probe sets for the wild-type and V617F alleles, with probes labelled with a 5' reporter and a 3' quencher dye, are used to amplify and detect the region of interest, and the JAK2 V617F percentage is calculated from the fluorescence levels of each assay. DNA samples were quantified using a BioPhotometer (Eppendorf, Hamburg, Germany) and normalised to a working concentration of 5 ng/µl in nuclease-free water. Real-time PCR reactions were set up in a 100-well rotor by a CAS1200 liquid handling instrument (Qiagen, Hilden, Germany). Each 12.5 µl reaction contained 6.25 µl 2x TaqMan Universal PCR Master Mix (Applied Biosystems, Life Technologies, Carlsbad, CA), 0.5 µl 25x primer/probe mix (Ipsogen, Luminy Biotech, Marseille, France), 3.25 µl nuclease-free water and 2.5 µl of 5 ng/µl sample DNA. Four-point standard curves, amplified in duplicate from the standard plasmids included in the kit, were included with each run.
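The two sensitivity calculations in this section can be sketched in Python as below. The first converts real-time PCR Ct values to copy numbers and allele burden through the standard curve. The second computes per-locus background noise from read composition. The standard curves plot mean Ct against log10 copy number, so inverting a curve gives log10 of the copy number. The four-point standard-curve values and the SNP read counts below are hypothetical. The JAK2_A copy numbers come from Table S1. The function names are illustrative and do not correspond to the Rotor-Gene or Excel workflow the authors used.

```python
from typing import Dict, List, Set, Tuple


def fit_standard_curve(log10_cn: List[float], ct: List[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(CN) + intercept; returns (slope, intercept)."""
    n = len(log10_cn)
    mx, my = sum(log10_cn) / n, sum(ct) / n
    sxx = sum((x - mx) ** 2 for x in log10_cn)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_cn, ct))
    slope = sxy / sxx
    return slope, my - slope * mx


def copy_number(mean_ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: log10(CN) = (Ct - intercept) / slope."""
    return 10 ** ((mean_ct - intercept) / slope)


def allele_burden(cn_mutant: float, cn_wildtype: float) -> float:
    """Mutant copies as a percentage of all JAK2 copies."""
    return 100 * cn_mutant / (cn_mutant + cn_wildtype)


def background_noise(base_counts: Dict[str, int], genotype: Set[str]) -> float:
    """Percentage of reads showing a nucleotide not expected from the SNP genotype
    (3 possible noise bases at a homozygous locus, 2 at a heterozygous one)."""
    total = sum(base_counts.values())
    noise = sum(n for base, n in base_counts.items() if base not in genotype)
    return 100 * noise / total


# Hypothetical 4-point standard curve (log10 copies per 5 ul versus mean Ct).
slope, intercept = fit_standard_curve([1, 2, 3, 4], [34.7, 31.4, 28.1, 24.8])
print(slope, intercept, copy_number(28.1, slope, intercept))

# Copy numbers reported for sample JAK2_A in Table S1.
print(round(allele_burden(1865, 60035), 2))  # 3.01

# Hypothetical read counts at a homozygous (A/A) and a heterozygous (A/G) SNP locus.
print(background_noise({"A": 990, "C": 3, "G": 4, "T": 3}, {"A"}))       # 1.0
print(background_noise({"A": 500, "G": 495, "C": 3, "T": 2}, {"A", "G"}))  # 0.5
```

A homozygous locus has three possible "noise" nucleotides, while a heterozygous locus has two. The genotype set therefore decides which bases count as noise.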
Positive (>99.9% V617F) and negative (<0.1% V617F) controls were also included in each run. Each sample was processed in duplicate for both the wild-type and V617F alleles. The reactions were amplified on a Rotor-Gene 6000 instrument (Qiagen, Hilden, Germany) under the following conditions: 50°C for 2 minutes and 95°C for 10 minutes, followed by 50 cycles of 95°C for 15 seconds and 62°C for 1 minute, with acquisition of FAM fluorescence during the 62°C step. Raw data were analysed using the Rotor-Gene Q software package (Qiagen, Hilden, Germany). The cycle threshold was set at 0.03 with slope correction, as per the manufacturer's guidelines (Ipsogen, Luminy Biotech, Marseille, France). Raw data tables for both the wild-type and V617F assays were exported into Excel (Microsoft, Redmond, WA) for further analysis. Standard curves (y = mean Ct, x = log10 CN, where CN is the gene copy number per 5 µl) were plotted for both the wild-type and V617F standards, and the slope (m), intercept (b) and R² values were extracted. Copy numbers were then obtained from each standard curve as:
$$\log_{10} CN_{V617F} = \frac{\overline{Ct}_{V617F} - b_{V617F}}{m_{V617F}}, \qquad \log_{10} CN_{WT} = \frac{\overline{Ct}_{WT} - b_{WT}}{m_{WT}}$$
and the final result was expressed as the JAK2 V617F allele burden:
$$\text{V617F allele burden (\%)} = \frac{CN_{V617F}}{CN_{V617F} + CN_{WT}} \times 100$$
The JAK2 V617F VAF of the positive samples, as determined by the real-time PCR assay, ranged from 1% to 24% (Table S1). All mutations with a VAF >3% (6/7, 86%) were successfully aligned and called as the JAK2 V617F variant. The remaining mutation (1% VAF) was present in the sequencing reads but was below the detection limit of the variant calling software.
2. Background noise
We determined the background noise level of our assay by investigating the sequencing read composition at 31 SNP loci on 14 chromosomes in 15 samples. All SNPs were initially identified by our data analysis pipeline, are bi-allelic and are recorded in dbSNP135 as non-pathogenic. At each locus (465 in total), we measured the background noise as the percentage of sequencing reads containing any of the nucleotides not expected from the genotype (3 in the case of homozygous SNPs, 2 in the case of heterozygous SNPs). The mean background noise of our assay was 0.31% (range 0.0-0.8%) across all SNP loci in all samples, and was consistently low both between SNPs (mean 0.31%, range 0.1-0.8%) and between samples (mean 0.33%, range 0.25-0.55%). Interestingly, the background level at heterozygous loci was lower than that at homozygous loci (0.2% and 0.4%, respectively). Taken together, these results led us to define the sensitivity of the panel as 1-3%, depending on the locus examined and the variant calling software.
JAK2 V617F pyrosequencing
The JAK2 V617F (c.1849G>T) mutation was analysed using primers as previously described.3 In brief, DNA was amplified in 25 μl reactions containing 2x Qiagen Multiplex PCR Master Mix (Qiagen), 5x Q-Solution (Qiagen) and 5 mM each of reverse and biotinylated forward primers. Cycling conditions consisted of an initial denaturation step at 97°C for 15 minutes, followed by 35 cycles of 30 seconds at 97°C, 90 seconds at 62°C and 2 minutes at 72°C. The resulting biotinylated PCR product was pyrosequenced on a PyroMark Q24 system (Qiagen), and the PyroMark Q24 allele quantification (AQ) software was used to quantify the level (if any) of the JAK2 V617F variant in each sample.
FLT3-ITD ARMS-PCR
FLT3-ITD mutations were analysed using primers as previously described, modified with WellRED fluorescent dyes.4,5 In brief, DNA was amplified in 25 μl reactions containing 2x Qiagen Multiplex PCR Master Mix (Qiagen), 5x Q-Solution (Qiagen) and 5 mM each of forward and reverse primers. Cycling conditions consisted of an initial denaturation step at 95°C for 15 minutes, followed by 35 cycles of 30 seconds at 95°C, 1 minute at 56°C and 2 minutes at 72°C, with a final extension step of 10 minutes at 72°C. The resulting PCR product was diluted 1:10, and 2 μl of the diluted product was mixed with 40 μl Sample Loading Solution (Beckman Coulter) and 0.5 μl GenomeLab DNA Size Standard 600 (Beckman Coulter) and subjected to capillary electrophoresis on a CEQ8000 Genetic Analysis System (Beckman Coulter). Data analysis was performed using CEQ analysis software version 9.0.25.
NPM1 fragment analysis
NPM1 mutations were validated by fragment analysis, using primers as previously described.6 DNA was amplified in 25 μl reactions containing 2x Qiagen Master Mix (Qiagen), 10 pmol each of forward and reverse primers and sterile water to a final volume of 25 μl. Cycling conditions consisted of an initial denaturation step at 95°C for 15 minutes, followed by 40 cycles of 30 seconds at 92°C, 30 seconds at 58°C and 20 seconds at 72°C, with a final extension step of 10 minutes at 72°C. The resulting PCR product was diluted 1:10, and 2 μl of the diluted product was mixed with 40 μl Sample Loading Solution (Beckman Coulter) and 0.5 μl GenomeLab DNA Size Standard 600 (Beckman Coulter) and subjected to capillary electrophoresis on a CEQ8000 Genetic Analysis System (Beckman Coulter). Data analysis was performed using CEQ analysis software version 9.0.25.
Sanger sequencing
Mutations discovered in the validation cohort in TET2, RUNX1, SF3B1 and FLT3 were confirmed by Sanger sequencing. DNA was amplified in 25 μl reactions containing 2x Qiagen Master Mix (Qiagen) and 5 mM each of forward and reverse primers. 5x Q-Solution (Qiagen) was used where indicated (Table S2).
Cycling conditions for all targets consisted of an initial denaturation step at 97°C for 15 minutes, followed by 35 cycles of 30 seconds at 92°C, 30 seconds at 55°C (RUNX1 and FLT3) or 60°C (TET2 and SF3B1) and 20 seconds at 72°C, with a final extension step of 10 minutes at 72°C. The PCR products were purified using MicroClean (Cambio), and 1 μl of purified PCR product was sequenced with BigDye Terminator v3.1 chemistry (Applied Biosystems) using either the forward or the reverse primer. After ethanol/EDTA precipitation, the samples underwent electrophoresis on an ABI 3130 Genetic Analyzer (Applied Biosystems).
Genome-wide DNA methylation
The DNA methylation profiles of 14 cases were analysed using the Illumina HumanMethylation27 BeadChip (Illumina, Inc., San Diego, CA, USA). These 14 cases comprised 11 5q- syndrome cases, 1 del(5q) RA with additional cytogenetic aberrations and 2 advanced del(5q) cases. To ensure karyotypic homogeneity, only the DNA methylation profiles of the 11 5q- syndrome cases were analysed further. These profiles were grouped by the mutational status of the epigenetic-regulator genes included in our TSCA panel. Of these 11 5q- syndrome cases, 1 had a DNMT3A mutation, 2 had an ASXL1 mutation, and 1 had concomitant ASXL1 and TET2 mutations. Data analysis was carried out using R/Bioconductor. Before differentially methylated probes were selected, the probes were filtered based on their mean β-values for each mutated-gene group under study (DNMT3A; ASXL1; ASXL1 and TET2; ASXL1 with or without TET2). This step focused the analysis on genes with large differences in methylation status. Briefly, the mean β-value was categorized into three states: unmethylated (mean < 0.3), partially methylated (mean between 0.3 and 0.7) and methylated (mean > 0.7). We assigned a value of 0, 1 or 2 to each probe according to its methylation state and calculated the difference between states for each comparison.
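The three-state categorisation and state-difference filter described here can be sketched in Python as below. Probes whose state difference is 0 are discarded. The sketch assumes that per-probe group means are taken over the samples in each group. It also assigns β-values of exactly 0.3 or 0.7 to the partially methylated state, because the text leaves those boundaries open. The subsequent logFC cut-off is omitted because its logarithm base is not stated. Probe identifiers and β-values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def methylation_state(mean_beta: float) -> int:
    """0 = unmethylated (< 0.3), 2 = methylated (> 0.7), 1 = partially methylated otherwise.
    Values of exactly 0.3 or 0.7 fall into the partial state (an assumption)."""
    if mean_beta < 0.3:
        return 0
    if mean_beta > 0.7:
        return 2
    return 1


def group_means(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean beta-value per probe across the samples of one group."""
    probes = samples[0].keys()
    return {p: mean(s[p] for s in samples) for p in probes}


def state_differences(mutant: List[Dict[str, float]],
                      reference: List[Dict[str, float]]) -> Dict[str, int]:
    """State difference (mutant minus reference) per probe; probes with a difference of 0 are dropped."""
    mut, ref = group_means(mutant), group_means(reference)
    diffs = {p: methylation_state(mut[p]) - methylation_state(ref[p]) for p in mut}
    return {p: d for p, d in diffs.items() if d != 0}


# Hypothetical beta-values for two mutant and two reference samples.
mutant = [{"cg_a": 0.10, "cg_b": 0.50, "cg_c": 0.80},
          {"cg_a": 0.20, "cg_b": 0.60, "cg_c": 0.90}]
reference = [{"cg_a": 0.15, "cg_b": 0.85, "cg_c": 0.75},
             {"cg_a": 0.25, "cg_b": 0.95, "cg_c": 0.85}]
print(state_differences(mutant, reference))  # {'cg_b': -1}
```

In this example only cg_b changes state, from partially methylated in the mutant group to methylated in the reference group.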
All probes with a state difference of 0 were filtered out. Finally, the fold change of mean β-values was used to identify probes with differential methylation; probes were selected using a logFC cut-off of 1.5. To investigate the potential effect on DNA methylation of mutations in genes involved in the epigenetic regulation of the cell, the following comparisons were run:
2 ASXL1-mutant cases versus 7 cases with no epigenetic gene mutations (differentially methylated genes [DMG]: 422);
1 DNMT3A-mutant case versus 7 cases with no epigenetic gene mutations (DMG: 144);
1 case with concomitant ASXL1 and TET2 mutations versus 7 cases with no epigenetic gene mutations (DMG: 156);
3 ASXL1-mutant cases versus 7 cases with no epigenetic gene mutations (DMG: 205).
The lists of DMG were used to generate supervised clusters of all 11 5q- syndrome cases. None of the analyses clustered the samples according to their mutations in epigenetic genes. Based on these results, we cannot attribute any specific DNA methylation profile to the mutations detected in genes involved in the epigenetic regulation of the cell.
SNP mapping assay and data analysis
The SNP mapping assay was performed according to the protocol supplied by the manufacturer (Affymetrix, Santa Clara, CA, USA). Briefly, 250 ng of DNA was digested with HindIII, ligated to the adaptor and amplified by polymerase chain reaction (PCR) using a single primer. PCR products were purified with the DNA amplification clean-up kit (Clontech), and the amplicons were quantified. Purified amplicons (40 μg) were fragmented, end-labeled and hybridized to a GeneChip Mapping 50K Hind III array at 48°C for 16-18 hours in a Hybridization Oven 640 (Affymetrix). After washing and staining in a Fluidics Station 450 (Affymetrix), the arrays were scanned with a GeneChip Scanner 3000 (Affymetrix).
Cell intensity calculations and scaling were performed using GeneChip Operating Software (GCOS). Data were analyzed using GeneChip Genotyping Analysis Software version 4.0 (Affymetrix) and CNAG software version 2.0. Quality control was performed within the Genotyping software after scaling the signal intensities of all arrays to a target of 100%. DNA copy number was analyzed with both the chromosome copy number tool (CNAT) version 3.0 and CNAG version 2.0. CNAT compares the obtained SNP hybridization signal intensities with the SNP intensity distributions of a reference set of more than 100 healthy individuals of different ethnicities. For analysis with CNAG, we used a pool of 45 healthy controls as a reference set.7
References
1. Lunter G, Goodson M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 2011;21(6):936-9.
2. Rimmer A, Mathieson I, Lunter G, McVean G. Platypus: an integrated variant caller (www.well.ox.ac.uk/platypus). 2012.
3. Jones AV, Kreil S, Zoi K, Waghorn K, Curtis C, Zhang L, et al. Widespread occurrence of the JAK2 V617F mutation in chronic myeloproliferative disorders. Blood 2005;106(6):2162-8.
4. Murphy KM, Levis M, Hafez MJ, Geiger T, Cooper LC, Smith BD, et al. Detection of FLT3 internal tandem duplication and D835 mutations by a multiplex polymerase chain reaction and capillary electrophoresis assay. J Mol Diagn 2003;5(2):96-102.
5. Kiyoi H, Naoe T, Yokota S, Nakao M, Minami S, Kuriyama K, et al. Internal tandem duplication of FLT3 associated with leukocytosis in acute promyelocytic leukemia. Leukemia Study Group of the Ministry of Health and Welfare (Kohseisho). Leukemia 1997;11(9):1447-52.
6. Scholl S, Mugge LO, Landt O, Loncarevic IF, Kunert C, Clement JH, et al. Rapid screening and sensitive detection of NPM1 (nucleophosmin) exon 12 mutations in acute myeloid leukaemia. Leuk Res 2007;31(9):1205-11.
7. Wang L, Fidler C, Nadig N, Giagounidis A, Della Porta MG, Malcovati L, et al. Genome-wide analysis of copy number changes and loss of heterozygosity in myelodysplastic syndrome with del(5q) using high-density single nucleotide polymorphism arrays. Haematologica 2008;93(7):994-1000.

Table S1. Summary of JAK2 V617F variant allele frequencies (VAF).
Sample ID | Real-time PCR: JAK2 WT copy number | Real-time PCR: V617F copy number | Real-time PCR: VAF | MiSeq: total depth | MiSeq: reference depth | MiSeq: variant depth | MiSeq: VAF
JAK2_A | 60035 | 1865 | 0.03 | 8205 | 7564 | 628 | 0.08
JAK2_B | 51617 | 5929 | 0.10 | 7924 | 6540 | 1366 | 0.17
JAK2_C | 58408 | 9834 | 0.14 | 7883 | 5859 | 2015 | 0.26
JAK2_D | 52331 | 7917 | 0.13 | 7828 | 6342 | 1472 | 0.19
JAK2_E | 59013 | 852 | 0.01 | 7411 | 7219 | 177 | 0.02
JAK2_F | 50564 | 10411 | 0.17 | 8139 | 6390 | 1719 | 0.21
JAK2_G | 36490 | 11804 | 0.24 | 7637 | 5290 | 2333 | 0.31

Table S2. Sanger sequencing primers and PCR conditions.
Target | Forward primer | Reverse primer | PCR conditions | Reference
TET2 | AGACTTATGTATCTTTCATCTAGCTCTGG | ACTCTCTTCCTTTCAACCAAAGATT | 60°C | Gelsi-Boyer et al.
RUNX1 | GCTGTTTGCAGGGTCCTAA | CCTGTCCTCCCACCACCCTC | 5x Q-Solution, 55°C |
SF3B1 | CTGCAGTTTGGCYGAATAGTTG | AAAATTCTGTTAGAACCATGAAACA | 60°C | Papaemmanuil et al.
FLT3 | CCGCCAGGAACGTGCTTG | GCAGCCTCACATTGCCCC | 5x Q-Solution, 55°C | Nakao et al.

Table S3. Detailed description of non-synonymous variants with a COSMIC ID or not reported in dbSNP.
Sample ID | Diagnosis | Gene | Genome coordinates | DNA change | Protein change | Q-score | Variant call ratio [% (variant/total)] | COSMIC ID | dbSNP ID | PolyPhen-2 (score, sensitivity, specificity)
MDS16 | RA (5q- syndrome) | RUNX1 | chr21:36259324 | A>AG | L29S | 99 | 31.9 (23/72) | COSM24756 | rs111527738 | Probably damaging (0.999, 0.14, 0.99)
MDS15 | RA (5q- syndrome) | SF3B1 | chr2:198266834 | T>TC | K700E | 99 | 11.4 (170/1496) | COSM84677 | NA | Probably damaging (1.000, 0.00, 1.00)
MDS07 | RA (5q- syndrome) | DNMT3A | chr2:25457242 | C>CA | R882L | 75 | 7.8 (92/1176) | NA | NA | Probably damaging (0.982, 0.75, 0.96)
MDS08 | RA (5q- syndrome) | ASXL1 | chr20:31022449 | insG | G646WfsX12 | 99 | 44.7 (174/389) | COSM34210 | NA | Truncated protein
MDS08 | RA (5q- syndrome) | WT1 | chr11:32413565 | C>CT | R462Q | 99 | 49.0 (174/355) | COSM21408 | NA | Probably damaging (1.000, 0.00, 1.00)
MDS14 | RA (5q- syndrome) | TET2 | chr4:106193748 | C>CT | R1404X | 99 | 45.1 (309/685) | COSM42037 | NA | Truncated protein
MDS12 | RA (5q- syndrome) | ASXL1 | chr20:31022449 | insG | G646WfsX12 | 99 | 10.5 (37/351) | COSM34210 | NA | Truncated protein
MDS12 | RA (5q- syndrome) | SF3B1 | chr2:198266834 | T>TC | K700E | 99 | 40.0 (620/1549) | COSM84677 | NA | Probably damaging (1.000, 0.00, 1.00)
MDS12 | RA (5q- syndrome) | TET2 | chr4:106164896 | insA | fs (Y1255X) | 99 | 5.3 (41/771) | COSM110747 | NA | Truncated protein
MDS06 | RA (5q- syndrome) | TET2 | chr4:106197552 | C>CT | P1962L | 99 | 50.3 (303/602) | COSM41894 | NA | Probably damaging (0.974, 0.76, 0.96)
MDS11 | RA (5q- syndrome) | ASXL1 | chr20:31022902 | G>GA | W796X | 99 | 35.8 (144/402) | COSM53207 | NA | Truncated protein
MDS10 | RA (5q- syndrome) | TP53 | chr17:7578413 | C>CG | V173L | 99 | 41.1 (109/265) | COSM43559 | NA | Probably damaging (0.979, 0.76, 0.96)
MDS29 | RA (5q- syndrome) | JAK2 | chr9:5073770 | G>GT | V617F | 99 | 7 | COSM12600 | rs77375493 | Probably damaging (0.996, 0.55, 0.98)
MDS34 | RA (5q- syndrome) | JAK2 | chr9:5073770 | G>GT | V617F | 99 | 28 | COSM12600 | rs77375493 | Probably damaging (0.996, 0.55, 0.98)
MDS29 | Del(5q) RA with additional cytogenetic abnormalities | DNMT3A | chr2:25457176 | G>GA | P904L | 99 | 44.0 (198/450) | COSM52989 | rs149095705 | Probably damaging (0.995, 0.68, 0.97)
MDS28 | Del(5q) RA with additional cytogenetic abnormalities | U2AF1 | chr21:44514777 | T>TC | Q157R | 99 | 38.3 (242/632) | COSM144989 | NA | Probably damaging (0.997, 0.41, 0.98)
MDS30 | Del(5q) RA with additional cytogenetic abnormalities | CBL | chr11:119149332 | C>CT | A447V | 99 | 43.6 (99/227) | NA | NA | Possibly damaging (0.717, 0.86, 0.92)
MDS26 | Del(5q) RA with additional cytogenetic abnormalities | TP53 | chr17:7577553 | A>AG | M243T | 99 | 28.0 (327/1166) | COSM43726 | NA | Probably damaging (1.000, 0.00, 1.00)
MDS37 | Advanced del(5q) MDS (RAEB) | TP53 | chr17:7577120 | C>CT | R273H | 99 | 82.2 (620/754) | COSM10660 | rs28934576 | Possibly damaging (0.831, 0.84, 0.93)
MDS42 | Advanced del(5q) MDS (CMML) | ASXL1 | chr20:31023821 | G>GT | E1102D | 99 | 45.4 (366/806) | COSM36205 | rs139115934 | Possibly damaging (0.779, 0.85, 0.93)
MDS42 | Advanced del(5q) MDS (CMML) | CBL | chr11:119149004 | G>GT | W408C | 99 | 96.0 (267/278) | COSM34072 | NA | Probably damaging (0.996, 0.55, 0.98)
MDS36 | Advanced del(5q) MDS (RAEB) | ASXL1 | chr20:31024704 | G>GA | G1397S | 99 | 49.9 (875/1755) | COSM133033 | rs146464648 | Possibly damaging (0.792, 0.85, 0.93)
MDS33 | Advanced del(5q) MDS (RAEB) | TET2 | chr4:106196850 | insCATG | E1728Dfs*13 | 99 | 17.0 (121/713) | COSM211745 | NA | Truncated protein
MDS43 | Advanced del(5q) MDS (RAEB) | TET2 | chr4:106164880 | G>GT | E1250X | 99 | 27.3 (313/1145) | NA | NA | Truncated protein
MDS43 | Advanced del(5q) MDS (RAEB) | ASXL1 | chr20:31022449 | insG | G646WfsX12 | 99 | 42.8 (470/1097) | COSM34210 | NA | Truncated protein
MDS39 | Advanced del(5q) MDS (RAEB) | TP53 | chr17:7578190 | T>TC | Y220C | 99 | 38.7 (48/124) | COSM99719 | rs121912666 | Probably damaging (1.000, 0.00, 1.00)
MDS39 | Advanced del(5q) MDS (RAEB) | TP53 | chr17:7578275 | G>GA | Q192X | 99 | 49.3 (99/201) | COSM117949 | NA | Truncated protein
MDS38 | Advanced del(5q) MDS (RAEB) | TP53 | chr17:7577538 | C>CA | R248L | 99 | 44.1 (1168/2648) | COSM6549 | rs11540652 | Probably damaging (1.000, 0.00, 1.00)
MDS38 | Advanced del(5q) MDS (RAEB) | TP53 | chr17:7577568 | C>CT | C238Y | 99 | 37.5 (998/2664) | COSM11059 | NA | Probably damaging (1.000, 0.00, 1.00)

Table S4. Detailed description of synonymous variants with a COSMIC ID.
Sample ID | Diagnosis | Gene | Genome coordinates | DNA change | Protein change | Qscore | Variant call ratio [% (variant/total)] | COSMIC ID | dbSNP ID
MDS04 | RA (5q- syndrome) | IDH1 | chr2:209113192 | G>GA | G105G | 99 | 49.2 (445/904) | COSM253316 | rs11554137
MDS13 | RA (5q- syndrome) | IDH1 | chr2:209113192 | G>GA | G105G | 99 | 49.3 (465/943) | COSM253316 | rs11554137
MDS01 | RA (5q- syndrome) | IDH1 | chr2:209113192 | G>GA | G105G | 99 | 49.3 (421/854) | COSM253316 | rs11554137
MDS24 | Del(5q) RA with additional cytogenetic abnormalities | IDH1 | chr2:209113192 | G>GA | G105G | 99 | 49.6 (483/973) | COSM253316 | rs11554137
MDS30 | Del(5q) RA with additional cytogenetic abnormalities | IDH1 | chr2:209113192 | G>GA | G105G | 99 | 49.4 (356/721) | COSM253316 | rs11554137
MDS27 | Del(5q) RA with additional cytogenetic abnormalities | FLT3 | chr13:28608459 | T>TC | L561L | 99 | 53.6 (149/278) | COSM19740 | rs34374211
MDS43 | Advanced del(5q) MDS (RAEB) | FLT3 | chr13:28608459 | T>TC | L561L | 99 | 52.1 (173/332) | COSM19740 | rs34374211
MDS42 | Advanced del(5q) MDS (CMML) | KIT | chr4:55599268 | C>CT | I798I | 99 | 55.1 (162/294) | COSM1307 | rs55789615
MDS26 | Del(5q) RA with additional cytogenetic abnormalities | KIT | chr4:55599268 | C>CT | I798I | 99 | 45.5 (150/330) | COSM1307 | rs55789615
MDS02 | RA (5q- syndrome) | KIT | chr4:55599268 | C>CT | I798I | 99 | 50.0 (166/332) | COSM1307 | rs55789615
MDS05 | RA (5q- syndrome) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 55.6 (280/504) | COSM22413 | rs2228230
MDS08 | RA (5q- syndrome) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 53.8 (271/504) | COSM22413 | rs2228230
MDS09 | RA (5q- syndrome) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 49.0 (251/512) | COSM22413 | rs2228230
MDS11 | RA (5q- syndrome) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 48.0 (210/437) | COSM22413 | rs2228230
MDS14 | RA (5q- syndrome) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 47.5 (308/648) | COSM22413 | rs2228230
MDS16 | RA (5q- syndrome) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 52.2 (251/481) | COSM22413 | rs2228230
MDS01 | RA (5q- syndrome) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 53.0 (231/436) | COSM22413 | rs2228230
MDS27 | Del(5q) RA with additional cytogenetic abnormalities | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 51.4 (360/701) | COSM22413 | rs2228230
MDS35 | Advanced del(5q) MDS (RAEB) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 45.6 (312/684) | COSM22413 | rs2228230
MDS42 | Advanced del(5q) MDS (CMML) | PDGFRA | chr4:55152040 | C>CT | V824V | 99 | 50.8 (332/654) | COSM22413 | rs2228230
MDS32 | Advanced del(5q) MDS (RAEB) | TP53 | chr17:7578210 | T>TC | R213R | 99 | 44.9 (137/305) | COSM249885 | rs1800372

Table S5. Genomic array results for 33 del(5q) cases analysed, including 18 5q- syndrome cases. Values in parentheses give several metrics for each detected alteration: coordinates of the alteration (start-end); SNPs within it (start-end); length (bp); number of SNPs contained; copy number. Any of the 25 genes analysed in this study that lie within an altered region are noted. UPD: uniparental disomy. NA: not available.
Sample ID MDS02 Diagnosis RA, 5q- syndrome Age/Sex Karyotype Deletions 2p23.3 (25119937-26655307; 4992-5005; 1535370; 14; 1.17; 0.67) DNMT3A NA/F NA 5q22.1-q33.2 (110998762-154437028; 20461-21565; 43438266; 1105; 1.38; 0.18) UPD Gains 13q14.11-q14.13 (42627520-44793601; 43906-43966; 2166081; 61; 1.99; 0.25) Whole Chr8 (272252-146052174; 29522-33067; 145779922; 3546; 2.28; 0.34) 6q24.1 (139269078-139910603; 25404-25432; 641525; 29; 2.30; 0.38) 10p14-p13 (11264836-13110988; 35711-35761; 1846152; 51; 2.20; 0.31) MDS03 RA, 5q- syndrome 48/M 46,XY,del(5)(q13:q33) 5q14.3-q34 (87824784-167184563; 19911-21920; 79359779; 2010; 1.40; 0.18) 12q15 (67170328-69219954; 42039-42099; 2049626; 61; 2.10; 0.36) 12q24.22-q24.31 (125358706; 43078-43146; 9511465; 69; 2.18; 0.34) 16q22.3-q23.1 (72348989-73935434; 50009-50037; 1586445; 29; 2.25; 0.30) 17q23.2-q23.3 (53845988-58047700; 51071-51112; 4201712; 42; 2.24; 0.33) 4q13.1-q13.2 (65578799-67713359; 14910-14985; 2134560; 76; 2.00; 0.23) MDS04 MDS05 RA, 5q- syndrome RA, 5q- syndrome 88/F 60/F 46,XX,del(5)(q13:q33) 46,XX,del(5)(q13:q33) 5q14.3-q34 (86862506-166939254; 19896-21913; 80076748; 2018; 1.61; 0.20) 5q14.2-q33.3 (81866327-156969197; 19783-21650; 75102870; 1868; 1.41; 0.34) 4q26-q27 (117680512-121421102;
16225-16298; 3740590; 74; 2.02; 0.27) 4q31.21-q31.23 (146661914-149093249; 16855-16925; 2431335; 71; 1.99; 0.24) 1p31.2-p31.1 (68657746-70841176; 975-1035; 2183430; 61; 1.93; 0.35) 5q11.1-q11.2 (50213917-52264199; 18981-19036; 2050282; 56; 2.34; 2.02) ChrX. ATRX, ZRSR2 MDS06 RA, 5q- syndrome 68/F 46,XX,del(5)(q1415:q33) 5q14.3-q33.3 (87875023-156072147; 19912-21634; 68197124; 1723; 1.82; 0.25) MDS07 RA, 5q- syndrome NA/F 46,XX,del(5)(q13:q33) 5q14.3-q34 (89303345-163980289; 19955-21829; 74676944; 1875; 1.54; 0.26) MDS08 RA, 5q- syndrome NA/M NA MDS09 RA, 5q- syndrome 84/F 46,XX,del(5)(q13:q33) MDS10 RA, 5q- syndrome 76/F 46,XX,del(5)(q13:q33) 5q14.3-q33.3 (89917534-158840805; 19974-21705; 68923271; 1732; 1.47; 0.31) 5q14.3-q33.2 (85182021-154953129; 19863-21600; 69771108; 1738; 1.44; 0.21) 5q12.3-q13.1 (65535157-67677808; 19411-19484; 2142651; 74; 1.41; 0.14) 5q14.3-q15 (83343740-95777305; 19825-20080; 12433565; 256; 1.40; 0.17) 5q21.1-q34 (97786027-163782378; 20142-21815; 65996351; 1674; 1.39; 0.18) RA, 5q- syndrome 81/F MDS12 RA, 5q- syndrome 77/F 46,XX,del(5)(q22:q35) MDS13 RA, 5q- syndrome 64/F 46,XX,del(5)(q33:q34) MDS14 RA, 5q- syndrome 66/F 46,XX,del(5)(q31:q33)[8]/46,XX[31] MDS11 46,XX,del(5)(q13:q33) 5q21.1-q34 (98822612-164720069; 20174-21840; 65897457; 1667; 1.52; 0.22) 5q14.3-q33.1 (86463622-151297473; 19893-21476; 64833851; 1584; 1.48; 0.18) 5q32-q34 (148469763-167102662; 21427-21918; 18632899; 492; 1.46; 0.28) 5q31.3-q33.3 (142271912-156074292; 21232-21637; 13802380; 406; 1.42; 0.23) 6q14.3-q15 (85411634-88478204; 24018-24102; 3066570; 85; 1.97; 0.28) 4q21.21-q21.22 (80195990-82802565; 15257-15352; 2606575; 96; 1.98; 0.43) 13q21.2-q21.31 (58895681-61586211; 44290-44371; 2690530; 82; 1.96; 0.30) 6q13-q14.1 (75035581-77537444; 23758-23818; 2501863; 61; 1.99; 0.25) 4q12-q13.1 (58804522-61855787; 14735-14810; 3051265; 76; 2.07; 0.33) 13q21.31-q21.32 (62298667-65258156; 44390-44460; 2959489; 71; 1.97; 0.26) 2q23.3-q24.1 (152916479-155038402; 7616-7694; 2121923; 79;
2.00; 0.30) 7p15.2-p15.1 (25388631-28776965; 26914-27025; 3388334; 112; 1.96; 0.27) 6q13-q14.1 (74620278-77463618; 23753-23813; 2843340; 61; 2.00; 0.24) 1q31.1 (185202380-187449830; 3205-3265; 2247450; 61; 1.95; 0.30) 3p24.1-p23 (29906836-34218933; 10495-10560; 4312097; 66; 1.96; 0.24) 46,XX,del(5)(q13:q33)[6]/46,XX[4] 5q14.3-q33.3 (84641203-158921205; 19851-21707; 74280002; 1817; 1.69; 0.20) 74/F NA 5q21.1-q33.3 (102215241-156099317; 20248-21641; 53884076; 1394; 1.34; 0.23) RA, 5q- syndrome 66/F 46,XX,del(5)(q1415:q33) 5q21.3-q34 (106062626-163950485; 20323-21828; 57887789; 1506; 1.35; 0.31) 4q21.3-q22.1 (88367780-90815461; 15505-15585; 2447681; 81; 1.97; 0.43) 10q23.1 (83732605-85765313; 37101-37181; 2032708; 81; 2.22; 0.54) 3q25.1-q25.2 (151639927-154140946; 12710-12780; 2501019; 71; 2.00; 0.27) RA, 5q- syndrome 24/F 46,XX,del(5)(q31:q33) 5q31.3-q33.3 (141347991-158590590; NA; 17242599; 491; 1.37; 0.17) 7q31.33 (123603987-125743270; 28927-28987; 2139283; 61; 2.04; 0.33) RA, 5q- syndrome 72/F MDS16 RA, 5q- syndrome MDS18 MDS20 MDS15 MDS21 MDS23 MDS24 6q22.33-q23.1 (128419190-130577705; 25114-25209; 2158515; 96; 1.92; 0.23) 4q26-q27 (120601325-122789184; 16270-16340; 2187859; 71; 1.93; 0.29) RA, 5q- syndrome RA RA 70/M 77/F 72/M 46,XY,del(5)(q13:q33) 46,XX,del(5)(q13:q33),del(11)(q22)[3]/46,XX[2] 46,Y,der(X)t(X;12)(p22;q21),del(5)(q14-15;q33- 5q14.3-q33.3 (87898589-156124093; 19918-21643; 68225504; 1726; 1.65; 0.30) 5q14.3-q34 (82810660-163854743; 19815-21825; 81044083; 2011; 1.78; 0.20) 11q22.3-q25 (106376702-134173875; 40174-40623; 27797173; 450; 1.77; 0.19) (Only CN loss) CBL 5q15-q33 (91919548-158401872; 20009-21692; 66482324; 1684; 1.60; 10q21.2-q21.3 (61596872-64250581; 36741-36796; 2653709; 56; 2.04; 0.33) 1p32.3-p33 (48755449-54062185; 462-534; 5306736; 73; 2.09; 0.36) 17q23.2-q24.1 (53562730-61202528; 51069-51137; 7639798; 69; 2.02; 0.26) 5q11.2-q12.1 (57355239-62126018; 19149-19330; 4770779; 182; 1.94; 0.39) Multiple small gains 11q14.3-q21 (91659665-94441966;
39765-39825; 2782301; 61; 1.95; 0.38) 19p12-q12 (21633219-33888946; 53124-53185; 12255727; 62; 1.95; 0.27) 6q24.1-q24.2 (141617059-143973425; 25458-25518; 2356366; 61; 1.96; 0.21) 34),der(12),del(12)(p11q13)[7]/46,XY[3] 0.16) 6q23.2-q23.3 (135131247-138523284; 25324-25394; 3392037; 71; 1.68; 0.15) (Only CN loss) 12p11.23-p13.31 (9809369-27307682; 40761-41192; 17498313; 432; 1.63; 0.13) (Only CN loss) ETV6 12q21.33-q22 (89389338-94118553; 42583-42677; 4729215; 95; 1.69; 0.19) (Only CN loss) MDS25 RA 78/F MDS26 RA 85/F MDS29 RA 78/F MDS31 RA 73/F 46,XX,del(5)(q14:q34),t(1,3)(p33:p14)[21]/46,XX[4] 5q14.3-q33.2 (86226079-154919227; 19885-21598; 68693148; 1714; 1.73; 0.33) 12q21.2-q21.31 (78226836-81842401; 42349-42425; 3615565; 77; 2.05; 0.44) 46,XX,del(5)(q13:q33),+8 5q14.3-q33.3 (86607880-157924393; 19894-21673; 71316513; 1780; 1.60; 0.19) 13q21.1 (55328914-58382524; 44214-44273; 3053610; 60; 2.02; 0.24) 5q14.3-q33.3 (90357044-158432337; 19981-21696; 68075293; 1716; 1.64; 0.19) 9q21.13 (71990110-74653742; 34293-34368; 2663632; 76; 1.98; 0.25) 5q21.3-q34 (104537088-167772186; 20302-21937; 63235098; 1636; 1.47; 0.16) 8q21.11 (75717988-78220362; 31423-31498; 2502374; 76; 1.95; 0.20) 46,XX,del(5)(q13:q33)[18]/46,XX,del(5)(q13:q33),-7[1] 46,XX,del(5)(q13:q31)[18]/48,XX,del(5)(q13:q31),idic(21)(q22),+2mar[2]/46,XX[1] 6p21.2-p22.1 (length: 11429636; 131; 2.20; 0.36) 22q13.1-q13.31 (length: 8390013; 88; 2.21; 0.42) Some small CN changes Whole Chr8 (228574-143783463; 29518-33065; 143554889; 3548; 2.25; 0.29) 13q13.1-q14.11 (31506479-39779549; 43528-43816; 8273070; 289; 1.98; 0.27) MDS32 MDS35 RAEB RAEB 52/F 58/M 46,XX,del(5)(q13:q33) 92,XXYY,del(5)(q14:q33) 5q14.3-q33.2 (87217489-153708130; 19903-21559; 66490641; 1657; 1.47; 0.21) 5q21.3-q35.3 (107008082-180607628; 20365-22122; 73599546; 1758; 1.51; 0.21) NPM1 14q12 (24397576-26870017; 45947-46026; 2472441; 80; 2.00; 0.23) 21q11.1-q22.3 (10000969-46844296; 54389-55266; 36843327; 878 whole Chr; 1.99; 0.29) RUNX1, U2AF1
13q21.33-q22.1 (70401435-73818187; 44605-44740; 3416752; 136; 2.02; 0.26) 16p11.1-q24.3 (34953675-88143266; 49577-50361; 53189591; 785; 1.99; 0.27) 13q31.2-q34 (88103652-113215972; 45121-45851; 25112320; 731; 2.45; 0.38) MDS36 RAEB NA/F 46,XX,del(5)(q13:q33),del(11)(q23) 5q23.1-q33.2 (116859235-155177249; 20656-21605; 38318014; 950; 1.45; 0.20) 6q23.3 (135385192-138089687; 25328-25388; 2704495; 61; 2.02; 0.29) 9q12.1 (56241373-59006184; 30908-30988; 2764811; 81; 2.01; 0.25) 15q13.1-q13.2 (26237007-28085050; 47841-47866; 1848043; 26; 2.12; 0.35) 5q21.1-q35.3 (98243608-180607628; 20163-22122; 82364020; 1960; 1.42; 0.20) NPM1 7q11.22-q36.3 (69377470-158624663; 27729-29517; 89247193; 1789; 1.43; 0.20) EZH2 12p12.1-p13.2 (10219902-22122693; 40783-41038; 11902791; 256; 1.46; 0.22) ETV6 MDS37 RAEB 58/M 4345,XY,del(5)(q31),der(7)t(7;12)(q22;q1?3),-12,13,19,?del(20)(q1?3)[cp4] 13q14.11-q14.2 (40036389-46217079; 43817-44001; 6180690; 185; 1.38; 0.18) 13q14.2-q21.1 (47592092-55466888; 44039-44218; 7874796; 180; 1.43; 0.17) 6p22.3 (22218694-23666140; 22688-22735; 1447446; 48) 5q15 (92105855-95229134; 20011-20072; 3123279; 62; 2.12; 0.44) 9q21.31-q21.32 (80782354-82945497; 34556-34600; 2163143; 45) 15q12-q13.2 (24374497-28800086; 47820-47867; 4425589; 48; 1.46; 0.27) 17p11.2-p13.3 (450509-19519465; 50362-50609; 19068956; 248; 1.39; 0.18) TP53 MDS39 RAEB 82/F 46,XX,del(5)(q13:q33),t(6;12)(q13;p12)[2]/45,XX,-7,22/46,XX,del(5)(q13:q33),t(6;12)(q13;p12),+mar[15]/46,XX[3] 20q11.21-q13.13 (29933631-48271268; 53958-54172; 18337637; 215; 1.37; 0.22) ASXL1 5q14.2-q34 (81511479-161557314; 19775-21753; 80045835; 1979; 1.53; 0.44) 7p22.3-p11.2 (250149-56479844; 26082-27645; 56229695; 1564; 1.58; 0.45) 7q21.3-q36.3 (94919442-158624663; 28305-29517; 63705221; 1213; 1.57; 0.47) EZH2 2p22.2-p22.1 (38319370-41333317; 5245-5334; 3013947; 90; 2.29; 0.70) Multiple gain of copy number 6q13-q15 (72449224-89814330; 23699-24130; 17365106; 432; 1.99; 0.30) SRSF2 MDS40 RAEB 54/F 46,XX,del(5)(q14:q34)
5q14.3-q34 (87425364-161996101; 19904-21761; 74570737; 1858; 1.47; 0.17) 9p21.3-p22.2 (18466830-21763347; 33640-33730; 3296517; 91; 1.98; 0.32) 9p21.1-p21.2 (26316539-30212869; 33860-34045; 3896330; 186; 2.00; 0.24) 12q24.13-q24.21 (111751790-115292641; 43011-43072; 3540851; 62; 2.02; 0.32) MDS41 MDS42 MDS43 RAEB CMML RAEB 56/M 45/M 79/F 46,XY,del(5)(q14:q34)[2];47,XY,del(5)(q14:q34),+21[20] 46,XY,del(5)(q13:q33),del(13)(q12:q22) 46,XX,del(5)(q15:q33) 5q22.3-q34 (115079389-166312668; 20585-21899; 51233279; 1315; 1.57; 0.39) 5q14.3 (86007785-88632975; 19881-19936; 2625190; 56; 1.88; 0.40) 5q14.3-q33.3 (85143956-159665477; 19861-21719; 74521521; 1859; 1.45; 0.24) 8q21.11 (75982355-78287079; 31427-31506; 2304724; 80; 2.06; 0.32) 13q13.2-q21.31 (33495418-61943642; 43601-44385; 28448224; 785; 1.44; 0.21) 5q21.1-q33.2 (101389190-154492074; 20226-21568; 53102884; 1343; 1.50; 0.21) 11q22.1-q25 (97063972-134173875; 39904-40623; 37109903; 720; 2.00; 0.31) CBL 4q13.1 (60577199-62928065; 14761-14841; 2350866; 81; 1.89; 0.24) Multiple gains of copy number 4q21.21 (79349747-81034492; 15226-15290; 1684745; 65; 2.28; 0.44)

Table S6. List of genes affected by cytogenetic loss. 5q- syndrome (n=18) EZH2 NPM1 TP53 ETV6 ASXL1 CBL DNMT3A RA del(5q) with additional karyotypic abnormalities (n=6) 1 1 1 Advanced del(5q) cases (n=9) 2 2 1 1 1

Figure S1. Number of clusters generated per amplicon in the panel during the MiSeq sequencing run of the MDS del(5q) cohort. In total, 96% (308/322) of amplicons generated at least 100 clusters during sequencing (average 5,362 clusters per amplicon).

Figure S2. Comparison of read alignments covering the 19-bp TP53 deletion in sample TEST009. The initial read alignment and variant calling (BaseSpace, A) failed to align any reads containing deletions to the reference genome, resulting in a much lower read depth across this locus (~30x).
By comparison, re-analysis of the same data with the Stampy and Platypus pipeline (B) aligned more reads, giving a higher read depth (>700x), and successfully identified the deletion.

Figure S3. Comparison of the TET2 C1464X mutation in sample TEST001 as detected by Sanger and next-generation sequencing. The C1464X variant was detected and called in the MiSeq data (top) at a frequency of 47% (200/423 reads). The variant is visible in the Sanger sequencing trace (bottom) but was not identified by the Mutation Surveyor software because of relatively high background noise in the data.

Figure S4. Validation of the new mutations, other than TET2 C1464X, found by MiSeq in the validation cohort. These mutations were confirmed by Sanger sequencing (A-D) or fragment analysis (E).

Figure S5. Supervised clustering using methylation data from 11 5q- syndrome cases. All panels have been cropped to show the hierarchical clustering at the top. (A) Clustering using 422 genes differentially methylated between 2 ASXL1-mutated cases and 7 cases with no epigenetic gene mutations. (B) Clustering using 144 genes differentially methylated between 1 DNMT3A-mutated case and 7 cases with no epigenetic gene mutations. (C) Clustering using 156 genes differentially methylated between 1 case with both ASXL1 and TET2 mutations and 7 cases with no epigenetic gene mutations. (D) Clustering using 205 genes differentially methylated between 3 ASXL1-mutated cases and 7 cases with no epigenetic gene mutations.
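As a consistency check, the VAF columns of Table S1 and the variant call ratios of Tables S3 and S4 can be recomputed from the raw counts. The sketch below assumes that the RT-PCR VAF is the V617F copy number divided by the sum of wild-type and V617F copy numbers, and that the sequencing VAF is the variant depth divided by the total depth. These formulas are inferred from the reported numbers rather than stated in the supplement, and the function names (`rtpcr_vaf`, `sequencing_vaf`, `call_ratio_percent`) are hypothetical, not part of the analysis pipeline used in the study.

```python
"""Recompute allele-frequency columns of Tables S1 and S3 from raw counts.

Illustrative sketch only. Both formulas are assumptions inferred from the
reported values, not definitions given in the supplement:
  RT-PCR VAF     = V617F copies / (WT copies + V617F copies)
  sequencing VAF = variant depth / total depth
"""


def rtpcr_vaf(wt_copies: float, mut_copies: float) -> float:
    """Mutant fraction from allele-specific RT-PCR copy numbers (assumed formula)."""
    total = wt_copies + mut_copies
    if total <= 0:
        raise ValueError("copy numbers must sum to a positive value")
    return mut_copies / total


def sequencing_vaf(variant_depth: int, total_depth: int) -> float:
    """Fraction of reads at the locus that carry the variant allele."""
    if total_depth <= 0:
        raise ValueError("total depth must be positive")
    if not 0 <= variant_depth <= total_depth:
        raise ValueError("variant depth must lie between 0 and the total depth")
    return variant_depth / total_depth


def call_ratio_percent(variant_reads: int, total_reads: int) -> float:
    """Variant call ratio as a percentage rounded to one decimal (Tables S3/S4 style)."""
    return round(100 * sequencing_vaf(variant_reads, total_reads), 1)


# Counts transcribed from Table S1:
# (sample, WT copies, V617F copies, MiSeq total depth, MiSeq variant depth)
TABLE_S1 = [
    ("JAK2_A", 60035, 1865, 8205, 628),
    ("JAK2_B", 51617, 5929, 7924, 1366),
    ("JAK2_C", 58408, 9834, 7883, 2015),
    ("JAK2_D", 52331, 7917, 7828, 1472),
    ("JAK2_E", 59013, 852, 7411, 177),
    ("JAK2_F", 50564, 10411, 8139, 1719),
    ("JAK2_G", 36490, 11804, 7637, 2333),
]

if __name__ == "__main__":
    for sample, wt, mut, depth, alt in TABLE_S1:
        print(f"{sample}: RT-PCR VAF {rtpcr_vaf(wt, mut):.2f}, "
              f"MiSeq VAF {sequencing_vaf(alt, depth):.2f}")
    # One Table S3 entry (ASXL1 G646WfsX12, 174 variant reads of 389): prints 44.7
    print(call_ratio_percent(174, 389))
```

Run as a script, the recomputed values match the two-decimal VAFs reported in Table S1 for all seven samples, and for each sample the MiSeq estimate is higher than the RT-PCR estimate.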