Cellular reprogramming

To Renate
Promotor: Prof. dr. ir. Wim Van Criekinge
Dean: Prof. dr. ir. Guido Van Huylenbroeck
Rector: Prof. dr. Paul Van Cauwenberge

Members of the examination committee:

Prof. dr. ir. Jacques Viaene (Chairman), Department Agricultural Economics, Faculty of BioScience Engineering, Ghent University
Prof. dr. Els Van Damme (Secretary), Department Molecular Biotechnology, Faculty of BioScience Engineering, Ghent University
Prof. dr. ir. Wim Van Criekinge (Promotor), Department Molecular Biotechnology, Faculty of BioScience Engineering, Ghent University
Prof. dr. ir. Olivier Thas, Department of Applied Mathematics, Biometrics and Process Control, Faculty of BioScience Engineering, Ghent University
Dr. Manon van Engeland, Department of Pathology, Research Institute for Growth and Development, Maastricht University and University Hospital, Maastricht, the Netherlands
Prof. dr. Danny Geelen, Department of Plant Production, Faculty of BioScience Engineering, Ghent University
Dr. Joost Louwagie, Product Development, OncoMethylome Sciences, Leuven
Cellular reprogramming

ir. Maté Ongenaert

Promotor: Prof. dr. ir. Wim Van Criekinge
Lab. for Bioinformatics and Computational Genomics (BioBix), Department of Molecular Biotechnology, Faculty of BioScience Engineering, Ghent University

Thesis submitted in fulfilment of the requirements for the degree of Doctor (PhD) in Applied Biological Sciences: cell and gene biotechnology

Dutch translation of the title: Cellulaire herprogrammatie

Illustration on the cover: The Vitruvian Man (by Leonardo Da Vinci, around 1487). It depicts a nude male figure in two superimposed positions with his arms and legs apart, simultaneously inscribed in a circle and a square. Da Vinci based his drawing on some hints at correlations of ideal human proportions with geometry in Book III of the treatise De Architectura by the ancient Roman architect Vitruvius, hence its name. Overlaid is a representation of a double-stranded piece of DNA, with some cytosine residues methylated. The representation is so-called 'ASCII art' containing only the letters A, T, C and G, linking the 'computer world' with the sequence world. Artwork by Maté Ongenaert, Vitruvian Man image from Wikimedia Commons, photograph by Luc Viatour. ASCII art generated by text-image.com (by Patrik Roos).

Printing: DCL Signs, Zelzate
ISBN 978-90-5989-274-3

Maté Ongenaert, Wim Van Criekinge

The author and the promotor give the authorisation to consult and to copy parts of this work for personal use only. Every other use is subject to the copyright laws. Permission to reproduce any material contained in this work should be obtained from the author.

Contents

Abbreviations
Acknowledgments (Dankwoord)
Introduction

Part 1: Epigenetics, DNA-methylation, development and disease
  Chapter 1: Genetics and epigenetics – introduction
    1.1 Situation in molecular biology
    1.2 Types of epigenetic modifications
      1.2.1 DNA modifications
      1.2.2 Histone modifications
    1.3 Research objectives
  Chapter 2: DNA-methylation
    2.1 Occurrence of DNA-methylation
    2.2 Mechanism of DNA-methylation
    2.3 Influence of nutrition
    2.4 Detection of DNA-methylation
  Chapter 3: Functions of DNA-methylation
    3.1 Imprinting
    3.2 Diseases caused by abnormal imprinting
      3.2.1 Beckwith-Wiedemann syndrome
      3.2.2 Prader-Willi/Angelman syndromes
    3.3 Silencing of the female X-chromosome
    3.4 Silencing of junk DNA
    3.5 RNA structures and methylation
  Chapter 4: Methylation and influence on transcription
    4.1 Interactions with DNA-methylation
    4.2 Protein complexes involved in the link DNA-methylation – histone modification
    4.3 The influence of the Polycomb group of proteins
  Chapter 5: DNA-methylation and cancer
    5.1 Development of cancer and the role of DNA-methylation
    5.2 Cancer stem cell hypothesis and epigenetics
    5.3 Cancer profiling based on DNA-methylation
    5.4 Uncovering the cancer methylome
    5.5 Discovering epigenetic biomarkers
    5.6 Early diagnostics
    5.7 Stratification and personalized medicine
    5.8 Epigenetics and cancer therapy

Part 2: DNA-methylation, cancer and literature
  Chapter 6: Introduction
  Chapter 7: DNA-methylation and literature analysis
  Chapter 8: Intermezzo: Biological text mining
    8.1 Introduction
    8.2 Step 1: Perform automated literature queries
    8.3 Step 2: Define what to search for: deal with ontologies, gene and protein lists and thesauruses of chemical compounds and diseases
    8.4 Step 3: Identify keywords, annotation lists and concepts in literature results. Deal with textual variants and ambiguities and identify relationships in the results
    8.5 Step 4: Rank, summarize and present the results
    8.6 Discussion
    8.7 Conclusion
  Chapter 9: PubMeth: methylation database in cancer
    9.1 Introduction
    9.2 Filling up the database
    9.3 Querying the database
      9.3.1 Gene-centric query
      9.3.2 Cancer-centric query
    9.4 Performance of PubMeth, discussion and future
    9.5 Acknowledgments
  Chapter 10: Conclusion

Part 3: Genome-wide selection of methylation markers
  Chapter 11: Introduction
  Chapter 12: Intermezzo: DNA methylation markers help early detection of cervical carcinoma (original Dutch title: DNA methylatiemerkers helpen vroegtijdige opsporing van cervixcarcinoom)
  Chapter 13: Discovery of methylation markers in cervical cancer, using relaxation ranking
    13.1 Introduction
    13.2 Material and methods
      13.2.1 Primary cervical tissue samples
      13.2.2 Cervical cancer cell lines
      13.2.3 RNA and DNA isolation
      13.2.4 Expression data
      13.2.5 Relaxation ranking algorithm
      13.2.6 DNA methylation analysis using COBRA and bisulfite sequencing
    13.3 Results
      13.3.1 The validation of the top 3000 probe-list selected using relaxing high-ranking
      13.3.2 Validation of the 10 highest ranking candidate genes by COBRA
    13.4 Discussion
    13.5 Acknowledgments
  Chapter 14: Exploring the cancer methylome using genome-wide promoter analysis
    14.1 Introduction
    14.2 Materials and methods
      14.2.1 Data sources
      14.2.2 Broad-analysis: genome-wide promoter alignment
      14.2.3 Deep analysis: specific binding patterns
      14.2.4 Application of both approaches and experimental validation
    14.3 Results
      14.3.1 Broad-analysis
      14.3.2 Deep-analysis
      14.3.3 Application: Marker identification and experimental validation of proposed markers
    14.4 Discussion
  Chapter 15: Genome-wide promoter analysis uncovers portions of the cancer methylome
    15.1 Introduction
    15.2 Materials and methods
      15.2.1 Cell lines
      15.2.2 5-aza-dC treatment of cells
      15.2.3 Biotinylated RNA Probe Preparation and Hybridization
      15.2.4 Analysis of Expression Data
      15.2.5 BROAD analysis: Genome-wide Promoter Alignment
      15.2.6 DEEP analysis: Specific Binding Patterns
      15.2.7 Tissue samples and DNA extraction
      15.2.8 Bisulfite Genomic Sequence Analysis, Conventional MSP, QMSP
    15.3 Results
      15.3.1 Validation of Modified Approach in Cell Lines
      15.3.2 Promoter Hypermethylation in Normal and Primary Tumor Tissues
      15.3.3 Candidate Cancer Genes
      15.3.4 New Targets of Aberrant Methylation in Major Types of Cancer by QMSP
    15.4 Discussion
  Chapter 16: Transcriptome-wide promoter hypermethylation profiling in neuroblastoma
    16.1 Materials and methods
      16.1.1 Neuroblastoma cell lines
      16.1.2 Microarray analysis
    16.2 Results and discussion
  Chapter 17: Predicting platinum response in ovarian cancer, using DNA-methylation profiling
    17.1 Introduction
    17.2 Materials and methods
      17.2.1 Samples
      17.2.2 5-aza-dC treatment of cells
      17.2.3 Biotinylated RNA Probe Preparation and Hybridization
      17.2.4 Analysis of Expression Data
      17.2.5 In-silico analysis of top-ranking probes
    17.3 Results
      17.3.1 Ovarian cancer methylation markers
      17.3.2 Platinum resistance methylation markers
      17.3.3 Platinum sensitivity methylation markers
    17.4 Discussion
  Chapter 18: Conclusions

Part 4: Reprogramming of human host cells by viruses
  Chapter 19: Introduction
  Chapter 20: Cervical cancer and the HPV family of viruses
    20.1 Introduction
    20.2 Materials and methods
      20.2.1 Cell lines
      20.2.2 Methylation-specific digital karyotyping
      20.2.3 Tag extraction and mapping
      20.2.4 Real-time MSP platform
    20.3 Results
      20.3.1 Tags, significantly different between libraries
      20.3.2 Real-time MSP
    20.4 Discussion
  Chapter 21: Conclusions

Other research projects
Supplementary data
Summary and future perspectives
Samenvatting en toekomstperspectieven (summary and future perspectives, in Dutch)
References
Curriculum vitae
Abbreviations

ACTB: beta-actin
API: Application Programming Interface
AS: Angelman syndrome
BioC: BioConductor
BLAST: Basic Local Alignment Search Tool
BLAT: BLAST-like alignment tool
bp: base pairs
BSP: Bisulfite sequencing product
BWS: Beckwith-Wiedemann syndrome
C: Cytosine
CGI: CpG island
ChIP: Chromatin ImmunoPrecipitation
COBRA: Combined Bisulfite Restriction Analysis
CSC: cancer stem cell
CSS: Cascading Style Sheets
CSV: Comma Separated Values
DAC: 5-aza-2'-deoxycytidine
DBTSS: Database of Transcription Start Sites
DNA: Deoxyribonucleic acid
DNMT: DNA methyltransferase
FIGO: International Federation of Gynecology and Obstetrics
G: Guanine
GO: Gene Ontology
HBV: hepatitis B virus
HDAC: Histone deacetylase
HMT: Histone methyltransferase
HPV: Human Papilloma Virus
hrHPV: high-risk HPV
HSIL: high-grade squamous intraepithelial lesions
ICR: Imprinting Control Region
IPA: Ingenuity Pathway Analysis
IRAS: imidazoline receptor antisera-selected
LINE: Long Interspersed Nuclear Elements
LOH: loss of heterozygosity
LOI: loss of imprinting
LSIL: low-grade squamous intraepithelial lesions
MBD: Methyl-CpG Binding Domain
MeDIP: Methylated DNA Immunoprecipitation
MeSH: Medical Subject Headings
miRNA: microRNA
MSDK: Methylation Specific Digital Karyotyping
MSP: Methylation specific PCR
NCBI: National Center for Biotechnology Information
NCI: National Cancer Institute
ncRNA: non-coding RNA
O/E: Observed/Expected ratio
PAGE: Polyacrylamide Gel Electrophoresis
PcG: Polycomb Group
PCR: Polymerase Chain Reaction
PWS: Prader-Willi syndrome
QMSP: quantitative MSP
SAGE: Serial Analysis of Gene Expression
SAH: S-adenosylhomocysteine
SAM: S-adenosyl-L-methionine
SINE: Short Interspersed Nuclear Elements
TSA: trichostatin A
TSG: Tumor suppressor gene
TSS: Transcription Start Site
XIC: X Inactivation Centre
Acknowledgments

This piece of text is the first in a long series, and perhaps the most important part. Writing these paragraphs may be a fairly individual affair, but the ink would not be on the paper without the many people who supported me professionally or personally: without them, this work would simply not exist.

First of all, I owe many thanks to my promotor, Wim Van Criekinge. He is the one who sparked my interest in bioinformatics and who always gave me his trust; his enthusiasm and stream of ideas were very stimulating. Helping to shape the practical sessions of the bioinformatics course as a co-supervisor was an instructive and, above all, enjoyable change of pace; there too I was given his full trust.

I also owe many thanks to my other colleagues from the two islands and our outstation in the basement. Leander, Tim, Gerben, Tom, Peter, Joachim and Sofie made sure the atmosphere always stayed good. Even on less rosy days they kept supporting me; during the hours in the cafeteria there was sometimes some grumbling and complaining, but above all there were pleasant chats. Thank you!

After our small gang of starters had escaped from the basement of Block E in true Prison Break style, with a few dozen cans of Pringles as provisions, I ended up in the large office on the ground floor of Block B. Sorry for all the demos, the Nijntje sessions and the choice of music for the music quiz since then… Thank you for the good times! At our home base on the second floor I can always count on the administrative support of Fien and Sofie. And to all the other colleagues of the department: thank you for the pleasant time!

Besides my own colleagues, a number of other people helped me realize this work.

‐ Jasmien Hoebeeck, Katleen De Preter and Frank Speleman of Medical Genetics, Ghent, for the pleasant collaboration on the neuroblastoma part
‐ Lieselot Vercruysse and Guy Smagghe of the Laboratory of Agrozoology: thank you for the smooth collaboration
‐ Veerle Melotte and Manon van Engeland (Department of Pathology, Maastricht University): thank you for your trust
‐ An Nijs, Jean-Pierre Renard, Gonda Verpooten, Geert Trooskens and Valérie Deregowski of OncoMethylome Sciences (Leuven), for the help with the practical experiments in Leuven and the pleasant collaboration on many projects
‐ Bea Schuurs, Ed Schuuring and Ate van der Zee (UMCG Groningen), for the work on cervical cancer
‐ Renske Steenbergen and Peter Snijders (VUMC, Amsterdam), for helping to set up the experiments with the HPV models in cervical cancer
‐ Kornelia Polyak, Min Hu and Noga Qimron (and co-workers at Dana-Farber Cancer Institute, Boston), for the execution of the MSDK experiments. It was a pleasure to stay there for two weeks!
‐ Mohammad Hoque, Marianna Brait and David Sidransky (and other colleagues) (Johns Hopkins, Baltimore): it was a real pleasure to work with you on the validation of the methylation markers from our computational approaches. Thank you for your confidence and your useful feedback on our analysis methods!

Beyond these mostly work-related contacts, there are dozens more people who gave me the support I needed. Thanks to my parents, who gave me the opportunity to follow the education of my choice and who supported me in all my decisions. Without the chances you gave me, this work would not have existed! Thank you.

Last of all, I thank Renate: for her support on difficult days and her patience when I once again wanted to explain the difference between RNA and DNA; proofreading articles must have given her a headache more than once. Without you I would not have been the same person. Words fall short to describe what you mean to me: I love you and dedicate this work to you.
Introduction

Ever since the discovery of the structure and function of DNA by Watson and Crick in 1953, tens of thousands of researchers all over the world have worked on genetics. About 50 years later, with the publication of the human genome sequence in the same journal, their work was highlighted in the media again.

Our blueprint seems to be composed of DNA strands, relatively simple molecular structures, each containing 3 billion base pairs. Transcription and translation of this DNA give rise to about 25,000 proteins with diverse functions, which interact with each other and control the complexity of the entire organism.

However, genetics alone cannot explain all observed events. Very recently, it became clear that what was considered junk DNA (such as miRNAs) might play an important role in controlling genetics. Also important are chemical modifications of the DNA, which seem to be able to fine-tune and control the genetics as we know it. These so-called epigenetic modifications of DNA and histones are described in this work.

In part 1, this phenomenon is described, with a focus on DNA-methylation, its functions, its relationship with gene silencing and its involvement in cancer development and progression.

In part 2, the existing knowledge of DNA-methylation in cancer is summarized using text mining techniques, and a database of methylation in cancer is developed.

In part 3, we search for novel methylation markers in various cancer types. To this end, we make use of different genome-wide experimental techniques. Such experiments produce enormous amounts of data that contain a considerable amount of noise. Selecting the most promising markers from these results thus requires adapted (computational) sorting and selection methodologies. The results of the computational approaches that were developed were validated experimentally on both cell lines and primary samples, revealing novel methylation biomarkers in various cancer types (including cervical, ovarian, neuroblastoma, head and neck cancer, prostate, lung, ...).

Such an identified methylation 'biomarker' can be used to detect cancer development in early stages, to classify different patient groups (opening the road to personalized treatment) and to predict response to chemotherapy. Potentially, these markers can also be used as targets for 'epigenetic therapy': therapies that are able to alter the epigenetic state of key genes (in a more or less specific way), slowing down the development of cancer or making the patient more sensitive to chemotherapeutic or other therapeutic agents or treatments.

In part 4, a unique viral infection model system is used to investigate the influence of viral infection on the epigenetic state of the human host cells. Because in some cancer types (such as cervical cancer) infection with a virus plays a key role in the development of the tumor, the genes affected by viral infection may be very early diagnostic markers with high precision (both sensitivity and specificity).

I wish you an inspiring journey through the world of DNA-methylation!

Maté Ongenaert, Lokeren/Ghent, January 2009.
Part 1: Epigenetics, DNA-methylation, development and cancer

What are epigenetics and DNA-methylation? What are the functions of DNA-methylation and how do epigenetic changes contribute to the development of diseases such as cancer?

Chapter 1: Genetics and epigenetics – introduction

學而不思則罔,思而不學則殆。
(To study and not think is a waste. To think and not study is dangerous.)
Confucius, Chapter II, The Analects

1.1 Situation in molecular biology

The human genome is composed of roughly 3 billion base pairs, of which less than 2 percent encode proteins (the genes). The central dogma of molecular biology cast this tiny fraction of our DNA as the carrier of our genetic, heritable material. However, some events cannot be explained by taking only the genes into account. Some diseases are clearly heritable but seem to pick their patients at random: sometimes one of a pair of identical twins falls ill while the other is not affected. Some cancer types develop because of a change in the activity of a gene that is not mutated. And why do most mammalian clones not survive?

Part of the answer may be found in epigenetics (from the Greek επι), literally a layer above the DNA. This layer, made of proteins or (simple) chemical compounds, does not alter the DNA sequence, but can make the difference between ill and healthy and controls properties of the organism. This epigenetic layer of information determines processes such as growth, aging and the development of cancer. Epi-mutations (changes in the epigenetic layer) are believed to play a role in diabetes and schizophrenia.

Epigenetics is the study of mitotically heritable alterations in gene expression potential (i.e. alterations that are maintained when cells divide) that are not mediated by changes in DNA sequence.

1.2 Types of epigenetic modifications

Epigenetic modifications occur at the different levels at which genetic information is stored, ranging from modifications of the DNA strand itself to modifications of the proteins in the nucleosomes (the building blocks of chromatin). A simple overview is given in Figure 1.1.

Modifications mainly occur by addition or removal of simple chemical groups such as methyl and acetyl groups.

The most common changes are the addition of methyl marks to the DNA (DNA-methylation) and the addition of different chemical residues to the histones, the proteins around which the DNA is wrapped. The latter modifications mostly occur on the histone tails.
[Figure 1.1, panel labels. DNA Methylation: methyl groups added to specific DNA bases repress gene activity. Histone Modification: many different modifications to histones, including methylation and acetylation, have been identified; these modifications can alter the activity of the DNA wrapped around them. Chromosome.]

Figure 1.1: Overview of the different levels that epigenetic modifications control

1.2.1 DNA modifications
The most extensively described epigenetic change is DNA methylation. A methyl group is added to the DNA by a family of enzymes called the DNA methyltransferases (DNMTs). This epigenetic change influences transcription as well as other epigenetic changes and is discussed in detail in Chapter 2: DNA-methylation.

1.2.2 Histone modifications
Chromatin is the physiological template of our genome. Its fundamental unit, the nucleosome core particle, consists of 146 DNA base pairs organized around an octamer consisting of two copies of each of the highly conserved core histone proteins: H2A, H2B, H3 and H4. Dynamic modulation of chromatin structure (chromatin remodelling) is a key component in the regulation of gene expression, apoptosis, DNA replication and repair, and chromosome condensation and segregation. Disruption of these processes is intimately associated with human diseases, including cancer (Wang et al., 2007).

The histones can be modified in different ways: methylation, acetylation and phosphorylation are the most common modifications. These modifications can happen at distinct amino acid residues of the histones, as shown in Figure 1.2 (Inche and La Thangue, 2006).
[Figure 1.2 image: amino acid sequences of the tails of the core histones H2A, H2B, H3 and H4, annotated with methylation (Me), acetylation (Ac) and phosphorylation (P) marks.]
Figure 1.2: Modifications and their locations on the tails of the core histones H2A, H2B, H3 and H4. Spheres indicate the residue that is modified and the type of modification: methylation of lysines (red), methylation of arginines (blue), acetylation of lysines (orange), phosphorylation of serine or threonine (green) (Inche and La Thangue, 2006)

1.3 Research objectives

The 'broad' title of this thesis (cellular reprogramming) may already give an indication of its wide range of research questions. These research questions are, however, all in the field of epigenetic modifications and can be roughly divided into four sections, represented in this thesis by four parts.

Part 2 (DNA-methylation, cancer and literature) focuses on the impact of one of the most described epigenetic changes (DNA-methylation) on cancer initiation and development. A lot of research is performed in this area, and one of the first objectives is to collect, summarize and present the current knowledge in an easy-to-use interface. To this end, we have developed a database of methylation in cancer: PubMeth.

Part 3 (Genome-wide selection of methylation markers) focuses on the careful selection of DNA-methylation biomarkers in different cancer types. The initial discovery of DNA-methylation biomarkers is important, as these initially discovered markers can then be validated as being cancer-specific. Such a marker can be used for the early detection of cancer and to predict whether a patient will benefit from certain therapies. Current technologies make it possible to screen for biomarkers on a genome-wide scale; however, a high-performing screening and analysis strategy is needed to increase the success rate of the experimental validation studies. Computational approaches are very useful at every stage: they may provide a better initial choice of possible markers, support the analysis of genome-wide data, and help with the ranking and selection of candidates to validate after initial data analysis.

Part 4 (reprogramming of human host cells by viruses) focuses on the influence of viral infections on DNA-methylation in human host cells. We make use of keratinocyte cell lines that are transfected with proteins from a high-risk HPV type (human papillomavirus type 16), which is clearly associated with cervical cancer (more than 99.7 % of patients are infected with an HPV type). Using a genome-wide marker discovery technique (MSDK), we are able to identify markers whose methylation state changes after viral infection. This discovery shows that viruses are able to reprogram their host cells; the markers identified can help to understand this process and may be ideal targets for early diagnosis.
Chapter 2: DNA-methylation

Nino is late. Amélie can only see two explanations: 1 - he didn't get the photo. 2 - before he could assemble it, a gang of bank robbers took him hostage. The cops gave chase. They got away... but he caused a crash. When he came to, he'd lost his memory. An ex-con picked him up, mistook him for a fugitive, and shipped him to Istanbul. There he met some Afghan raiders who took him to steal some Russian warheads. But their truck hit a mine in Tajikistan. He survived, took to the hills, and became a Mujaheddin. [Increasingly angry] Amélie refuses to get upset for a guy who'll eat borscht all his life in a hat like a tea cozy.
Narrator in “Le fabuleux destin d'Amélie Poulain” (2001)

2.1 Occurrence of DNA-methylation

DNA-methylation is a naturally occurring epigenetic change in normal cells.
In the human genome, DNA-methylation almost exclusively happens at cytosine residues within the dinucleotide CG. This symmetric dinucleotide is often written as CpG, where p represents the phosphate between the two nucleotides.

The largest part of the CpG dinucleotides (70 %) is methylated. This accounts for 0.75 to 1 % of all DNA bases (Bestor, 2000). Methylated cytosines are spread widely across the genome, with particularly high densities in the promoters of retroviruses and transposons that have accumulated in the genome. Unmethylated CpG sites are usually found in DNA regions with a high frequency of CpGs, the so-called CpG islands. These CpG islands (about 29000 in the human genome) are distributed in a non-random way, with a preference for the promoter and first exon regions of genes. This is illustrated in Figure 2.1.

Most CpG islands remain free of methylation and are associated with transcriptionally active genes. Some CpG islands are methylated and are associated with imprinted genes and genes on the inactivated X-chromosome (Worm and Guldberg, 2002).
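The notion of a region with "a high frequency of CpGs" can be made concrete with a small sketch. The text does not give a numerical definition of a CpG island; the thresholds below (length of at least 200 bp, GC content above 50 %, observed/expected CpG ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are used here as an assumption. The function names `cpg_stats` and `looks_like_cpg_island` are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = (c * g) / n  # CpGs expected if C and G were placed independently
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC-content and observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp
```

A real island finder would slide a window along the genome and merge overlapping hits; this sketch only scores a single candidate region.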
[Figure 2.1 diagram labels: normal cell; promoter region; transcription start site (TSS); exons 1, 2 and 3; DNMT.]

Figure 2.1: Methylation in normal somatic cells: CG dinucleotides occur rarely in the genome and are in most cases methylated. However, in some regions (CpG islands) they occur in clusters, mostly in promoter regions. The CG dinucleotides in CpG islands are in most cases not methylated. The DNMT1 enzyme maintains methylation (Herman and Baylin, 2003)

2.2 Mechanism of DNA-methylation

There are two different forms of DNA-methylation: de novo methylation and maintenance methylation. In both cases, enzymes called DNA methyltransferases catalyze the reaction. In maintenance methylation (where the methylation pattern of the parental strand is copied onto the newly synthesized strand), the enzyme involved is DNMT1. Other methyltransferases (DNMT3A and DNMT3B) catalyze de novo methylation, i.e. they add a methyl group to previously unmethylated DNA (Bestor, 2000). S-adenosyl-L-methionine (SAM) is used as the methyl donor. This is illustrated in Figure 2.2.

[Figure 2.2 diagram: DNMTs convert cytosine into 5-methylcytosine, with SAM as the methyl donor.]

Figure 2.2: Methylation of the cytosine residue in the DNA

2.3 Influence of nutrition

During DNA-methylation, a methyl group from S-adenosylmethionine is transferred to cytosine residues. Mechanisms (such as biochemical pathways) that influence or control the amount and supply of methyl groups could therefore also influence DNA-methylation. Indeed, there are interactions between dietary factors and DNA-methylation. The nutrients involved include folate, vitamin B6 and B12, methionine and choline.

As shown in Figure 2.3, folate has a central role in one-carbon metabolism. Normally, a carbon unit from serine or glycine is transferred to tetrahydrofolate to form 5,10-methylenetetrahydrofolate. Vitamin B6 is a necessary co-factor of the enzyme in this reaction. This folate form can be used for the synthesis of purines, or it can be reduced and used to methylate homocysteine to form methionine. The latter reaction is catalyzed by a vitamin B12 dependent enzyme.
Methionine is then converted to S-adenosylmethionine (SAM). SAM donates its labile methyl groups to more than 80 biological methylation reactions, such as the methylation of DNA, RNA and protein.

However, when the supply of folate is limited, the levels of homocysteine increase. An alternative reaction for methionine synthesis from homocysteine exists, but this pathway is not sufficient to compensate for the diminishing SAM pools. The cellular levels of S-adenosylhomocysteine (SAH) increase, as the equilibrium of the SAH-homocysteine interconversion favours SAH synthesis. Therefore, when homocysteine metabolism is inhibited (as in folate deficiency), cellular SAH levels rise. Increased SAH inhibits methyltransferase activity and, consequently, DNA methylation reactions. This inhibition of DNA methylation by inadequate dietary folate has also been associated with increased cancer susceptibility (Davis and Uthus, 2004).

The effects of folate deficiency on DNA methylation are highly complex: they appear to depend on cell type, target organ and stage of transformation, and are gene and site specific.
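The inhibition of methyltransferase activity by rising SAH levels can be illustrated with a small numerical sketch. It assumes simple Michaelis-Menten kinetics with SAH acting as a competitive product inhibitor; this rate law and all parameter values (`vmax`, `km_sam`, `ki_sah`) are illustrative assumptions and are not taken from the text or from Davis and Uthus (2004).

```python
def methylation_rate(sam, sah, vmax=1.0, km_sam=5.0, ki_sah=1.0):
    """Relative methyltransferase rate under competitive inhibition by SAH.

    sam, sah: concentrations in arbitrary but identical units.
    vmax, km_sam, ki_sah: hypothetical kinetic constants, not measured values.
    """
    if sam < 0 or sah < 0:
        raise ValueError("concentrations must be non-negative")
    # A competitive inhibitor raises the apparent Km and leaves Vmax unchanged.
    apparent_km = km_sam * (1.0 + sah / ki_sah)
    return vmax * sam / (apparent_km + sam)


if __name__ == "__main__":
    for sah in (0.0, 1.0, 5.0):
        rate = methylation_rate(10.0, sah)
        print(f"SAH = {sah:4.1f} -> relative rate {rate:.3f}")
```

Under these assumed constants the rate falls as SAH accumulates at a fixed SAM level, which is the qualitative behaviour described above; the absolute numbers carry no biological meaning.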
[Figure 2.3 diagram: the methyl metabolism pathway linking dietary inputs and cofactors (B2, B6, B12, Zn) to methylated DNA. Intermediates shown: DHF, THF, 5,10-CH2 THF, 5,10-CH THF, methyl-THF, thymidylate, purines, methionine, SAM, SAH, adenosine, homocysteine, cystathionine, Cys, GSH, Ser, Gly, choline, betaine and DMG. Enzymes shown: GHMT, MTHFR, MS, BHMT, MAT, DNMT, SAHH and CBS.]
Figure 2.3: Dietary factors, enzymes, and substrates involved in methyl metabolism. Enzymes are shown in italics with a box around them. These include glycine hydroxymethyltransferase (GHMT; EC 2.1.2.1); methylenetetrahydrofolate reductase (MTHFR; EC 1.5.1.20); 5-methyltetrahydrofolate:homocysteine S-methyltransferase (methionine synthase or MS; EC 2.1.1.13); betaine-homocysteine S-methyltransferase (BHMT; EC 2.1.1.5); methionine adenosyltransferase (MAT; EC 2.5.1.6); DNA methyltransferase (DNMT; EC 2.1.1.37); S-adenosylhomocysteine hydrolase (SAHH; EC 3.3.1.1); and cystathionine-β-synthase (CBS; EC 4.2.1.22) (Davis and Uthus, 2004)
Abbreviations: DHF, dihydrofolate; Ser, serine; Gly, glycine; Cys, cysteine; THF, tetrahydrofolate; B6, vitamin B6 or pyridoxine; B12, vitamin B12 or cobalamin; B2, vitamin B2 or riboflavin; 5,10-CH2 THF, 5,10-methylenetetrahydrofolate; 5,10-CH THF, 5,10-methenyltetrahydrofolate; methyl-THF, 5-methyltetrahydrofolate; Zn, zinc; DMG, dimethylglycine; SAM, S-adenosylmethionine; SAH, S-adenosylhomocysteine; GSH, glutathione

2.4 Detection of DNA-methylation

As with other detection methods in molecular biotechnology, there is no single method that is suited to every possible application. Do we want to detect methylation of only one gene or on a genome-wide scale? Must the detection be highly sensitive or quantitative? Depending on the application, different detection methodologies are used. Most of the detection methodologies for DNA-methylation are derived from standard molecular techniques, such as the use of restriction enzymes, PCR technologies and sequencing. Figure 2.4 shows which methodology is suitable, depending on the application.
[Figure 2.4 decision tree.
Global or locus-specific?
- Global: cytosine extension; bisulfite sequencing of repetitive elements; HPLC
- Locus-specific: genome-wide or candidate gene?
  - Genome-wide: array-based or not?
    - Array-based: antibody or 5mC binding; methylation-sensitive restriction enzyme; bisulfite modification
    - Not array-based: RLGS; digital karyotyping; library and sequencing
  - Candidate gene: quantitative or sensitive?
    - Quantitative: allele-specific or not?
      - Allele-specific: bisulfite cloning and sequencing
      - Not allele-specific: direct bisulfite sequencing (pyrosequencing, manual sequencing, mass array)
    - Sensitive: MethyLight; MSP]
Figure 2.4: Overview of the different detection methodologies for DNA-methylation, according to the application (Shen and Waterland, 2007)

Two findings were crucial for the detection of DNA-methylation and are now applied in almost all detection techniques: the use of methylation-sensitive restriction enzymes and the bisulfite treatment of DNA.

Methylation-sensitive restriction enzymes can be used to distinguish methylated from unmethylated sequences. Whether these enzymes cut depends on the methylation state of their restriction site. If the fragments are then separated by length, the difference between methylated and unmethylated DNA can be visualized.

Bisulfite modification (Hayatsu, 1976) of DNA deaminates all cytosine residues to uracil (read as thymine after PCR amplification), except the methylated cytosines. Sequencing (bisulfite sequencing) or designing specific primers for the methylated (not converted) and unmethylated (converted) treated sequences can distinguish between methylated and unmethylated sequences. The combination of bisulfite treatment and specific primers is referred to as MSP (methylation-specific PCR) (Herman et al., 1996).

Other techniques used in this PhD thesis are COBRA (COmbined Bisulfite Restriction Analysis) and MSDK (Methylation Specific Digital Karyotyping).

COBRA (COmbined Bisulfite Restriction Analysis) (Xiong and Laird, 1997) is a technique based on bisulfite treatment, PCR and restriction digestion. First, the DNA is treated with sodium bisulfite. Next, a PCR reaction amplifies the region of interest. A restriction digestion is then performed with an enzyme whose restriction site is lost when the sequence was unmethylated. The fragments are separated using PAGE (PolyAcrylamide Gel Electrophoresis), transferred to a membrane via electroblotting and detected by hybridization with labeled oligonucleotides. Based on the signal intensities, the degree of methylation can be quantified.

MSDK (Methylation Specific Digital Karyotyping) (Hu et al., 2006) is a genome-wide methylation detection technology, similar to SAGE (Serial Analysis of Gene Expression). Genomic DNA is digested with a methylation-sensitive mapping enzyme (such as AscI) and ligated to a biotinylated linker. Next, a fragmenting enzyme (such as NlaIII), a frequent cutter, is used. Unmethylated fragments are captured with streptavidin-coated magnetic beads. Adapters that bind to the NlaIII overhang are ligated. The tagging enzyme MmeI (which cuts outside its recognition site, located in the adapter) is used to cleave off 17 bp tags. The next steps in the analysis are:
- ligate the tags to form ditags
- amplify by PCR
- release the ditags from the adapters using NlaIII
- ligate to form concatemers
- clone into a vector and transform E. coli bacteria using electroporation
- sequence the plasmid vector to obtain the tag sequences
- map the tags onto the genome and apply statistics: identify tags that are present in one library (not methylated) but less present (methylated) in another library
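The bisulfite conversion that underlies bisulfite sequencing and MSP, described earlier in this section, can be mimicked in silico. The following is a minimal sketch, assuming a single strand read 5' to 3' and modelling only the final readout (unmethylated C appears as T after amplification, methylated C stays C). The function name `bisulfite_convert` and its parameters are hypothetical and not part of any tool used in this thesis.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Return the sequence as read after bisulfite treatment and PCR.

    seq: DNA string for one strand (5' -> 3').
    methylated_positions: 0-based indices of methylated cytosines.
    Unmethylated C is reported as T; methylated C is left unchanged.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    region = "ACGTCCGA"
    print(bisulfite_convert(region))        # every C unmethylated
    print(bisulfite_convert(region, [1]))   # C at index 1 methylated
```

Comparing the two converted sequences shows where a methylation-specific primer and an unmethylation-specific primer would differ, which is the principle MSP relies on.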
[Figure 2.5 diagram: MSDK library generation for an unmethylated and a methylated locus. Steps shown: digest with the methylation-sensitive mapping enzyme AscI; ligate to a biotinylated linker; cleave with the fragmenting enzyme NlaIII (CATG sites); capture with streptavidin beads; ligate to LS adapters A and B; release tags with the type IIS enzyme MmeI; ligate to form ditags; PCR amplification and NlaIII digestion; ligate to form concatemers.]

Figure 2.5: Overview of MSDK library generation (Hu et al., 2006)
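The last analysis step of MSDK (comparing tag counts between two libraries) can be sketched as follows. This is a simplified illustration, not the pipeline used in this thesis: it takes the 17 bases following every NlaIII site (CATG) in a sequence, whereas the real protocol yields only the tag next to the NlaIII site adjacent to the mapping-enzyme site, and it replaces the statistical test with a plain count ratio. The function names and the `pseudocount` parameter are hypothetical.

```python
from collections import Counter

TAG_LENGTH = 17  # tag length stated in the text


def extract_tags(fragment, site="CATG", tag_length=TAG_LENGTH):
    """Return the tag_length bases that follow each occurrence of site."""
    fragment = fragment.upper()
    tags = []
    start = fragment.find(site)
    while start != -1:
        tag_start = start + len(site)
        tag = fragment[tag_start:tag_start + tag_length]
        if len(tag) == tag_length:  # skip sites too close to the fragment end
            tags.append(tag)
        start = fragment.find(site, start + 1)
    return tags


def compare_libraries(tags_a, tags_b, pseudocount=1):
    """Map each tag to (count in A, count in B, smoothed A/B ratio)."""
    counts_a = Counter(tags_a)
    counts_b = Counter(tags_b)
    comparison = {}
    for tag in set(counts_a) | set(counts_b):
        a = counts_a[tag]
        b = counts_b[tag]
        comparison[tag] = (a, b, (a + pseudocount) / (b + pseudocount))
    return comparison
```

A tag with a high ratio is abundant in library A (unmethylated there) and rare in library B (methylated there); a real analysis would add a significance test and normalise for library size.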
Chapter 3: Functions of DNA-methylation

There is a difference between knowing the path and walking the path
Morpheus in “The Matrix” (1999)

3.1 Imprinting

A subset of genes is expressed from only one of the two chromosome homologues. For a given imprinted gene, it is always either the maternal copy or the paternal copy that is expressed. This process is called genomic imprinting; to date about 70 imprinted genes have been identified. While there are a number of lone imprinted genes, the majority of identified imprinted sites are found in clusters. The clusters seem to contain at least one non-coding RNA (ncRNA) gene. Each cluster is controlled by a single major cis-acting element: the Imprinting Control Region (ICR).

ICRs acquire differential methylation between the maternal and paternal copy in the germ cells and are able to control the imprinting of all genes within the cluster. The clusters can be divided into two distinct types: those whose ICR is methylated during oogenesis (on the maternally imprinted chromosome), and those whose ICR is methylated during spermatogenesis (Edwards and Ferguson-Smith, 2007).

Methylation plays a crucial role in imprinting. The mechanism involved in methylation of the ICR is not completely resolved yet. However, DNMT3A (a de novo methylation enzyme) and DNMT3L (which is similar to DNMT3A) seem to be essential for methylation. How the difference between the paternal and maternal lineages arises is not fully understood. Researchers think that both methylation and protection from methylation are involved.

Once the methylation mark is established, it is mainly maintained by the maintenance methyltransferase enzyme DNMT1. After fertilization, a genome-wide reprogramming event occurs: DNA methylation is actively and rapidly lost from the paternal pronucleus and progressively lost from the maternally inherited chromosomes. However, DNA-methylation imprints on both parental genomes are resistant to these events. Some factors that protect imprinted genes from demethylation have been identified, but the entire mechanism has not yet been elucidated.
3.2 Diseases caused by abnormal imprinting

3.2.1 Beckwith-Wiedemann syndrome

The Beckwith-Wiedemann syndrome (BWS) is characterized by prenatal overgrowth, midline abdominal wall defects, ear creases or pits, neonatal hypoglycemia, and a high frequency of Wilms and other embryonal tumors, such as rhabdomyosarcoma and hepatoblastoma. It is the first disease in which it became clear that, next to genetic factors, epigenetic factors are involved. The risk of each of the clinical stigmata of BWS could be determined with respect to the molecular defects. The first of these is loss of imprinting (LOI) of the insulin-like growth factor II gene (IGF2), an imprinted growth factor gene normally expressed only from the paternally inherited allele but in BWS expressed from both the paternal and the maternal copy (Feinberg, 2008).

3.2.2 Prader-Willi/Angelman syndromes

The importance of parent-specific imprinting becomes clear in the Prader-Willi (PWS) and Angelman (AS) syndromes. These are distinct neurodevelopmental disorders, each caused by several genetic and epigenetic mechanisms involving
the proximal long arm of chromosome 15. Lack of a functional paternal copy of 15q11-q13 causes PWS; lack of a functional maternal copy of UBE3A, a gene within 15q11-q13, causes AS (Horsthemke and Wagstaff, 2008).

3.3 Silencing of the female X-chromosome

In somatic cells of mammalian females, only one copy of the X-chromosome is active; the other copy is silenced. During development, an initial choice is made in each cell as to which copy of the X chromosome will be inactivated, and this chromosome is stably silenced through the subsequent mitotic divisions. Thus, females are mosaics: some clusters of cells have the paternal X chromosome active, others have the maternal X chromosome active.

The process of X-inactivation is complex and occurs in different stages. The region where the initial marking happens is called the X Inactivation Centre (XIC). The most important gene in this region is the X-inactive Specific Transcript gene (Xist). The region includes elements that are required for the marking of an active X chromosome and for the stable expression and localization of Xist from the inactive X chromosome. Prior to inactivation, Xist expression is detected as a small pinpoint of expression from both X chromosomes, until the transcripts accumulate and localize on the future inactive X, mediated at least in part by stabilization of the transcript. The puzzle of how one of two apparently equivalent X chromosomes can be chosen to express Xist, and thus be inactivated, remains to be solved. It is clear, however, that components of the XIC are involved, and it has been suggested that the levels of Xist RNA may influence which copy of X undergoes inactivation. DNA methylation has been implicated in the regulation of Xist in differentiated cells, since the promoter region of the transcriptionally active allele on the inactive X chromosome is unmethylated, whereas that of the transcriptionally inactive allele on the active X chromosome is methylated.

The coating of the inactive X-chromosome with Xist is only the start of a whole cascade of reactions that change the structure of the chromatin and block transcription. These reactions include histone H3 lysine 9 methylation and hypoacetylation, H4 hypoacetylation and DNA-methylation (Chang et al., 2006).

3.4 Silencing of junk DNA

Junk DNA refers to parts of the genome that seem to have no function. On the contrary, they could be harmful if expressed. They are the remnants of viruses that were able to integrate into the
genome and of genes that have duplicated and mutated. Transposable elements (which account for over 30 % of the human genome), such as transposons and retrotransposons, are examples. The latter include Long Interspersed Nuclear Elements (LINEs) and Short Interspersed Nuclear Elements (SINEs). Expression of these elements leads to genetic instability, as the elements are able to copy themselves and integrate into other parts of the genome. Therefore, it is important that these elements remain transcriptionally silenced. DNA-methylation plays a crucial role in this regard.

The concept of methylation as a genome defence system assumes that retrotransposable elements are inherently detrimental to the genome. Protection and conservation of the integrity and fidelity of an organism's DNA serves as the overriding goal. Therefore, an important aspect of DNA methylation is its connection to the host-defence system, which acts to offset the threats from these largely parasitic sequences by maintaining them in a methylated, transcriptionally silent state (Carnell and Goodman, 2003).

3.5 RNA structures and methylation

miRNAs (micro-RNAs) are short (around 22 nucleotides) RNA molecules encoded in the genome. They are transcribed into primary miRNAs, which are processed in the nucleus by the RNase III enzyme Drosha and DGCR8. The resulting precursor miRNAs form imperfect stem-loop structures that are exported to the cytoplasm by Exportin-5, where they are further processed by the RNase III enzyme Dicer into the mature and functional miRNAs. These miRNAs have the ability to bind to their target mRNA sequences with complete complementarity, which can lead to degradation of this target.

miRNAs are linked with cancer: they are reported to act either as oncogenes or as tumor suppressor genes (Chuang and Jones, 2007).

miRNAs can also be involved in establishing DNA methylation. In Arabidopsis, two miRNAs (miR-165 and miR-166) are involved in the methylation of the PHB gene and change the chromatin structure. It is also predicted that the main DNMTs are potential targets of miRNAs. miRNAs may regulate chromatin structure as well, by regulating key histone modifiers. This complex interplay between epigenetics and miRNA is schematized in Figure 3.1.

[Figure 3.1 diagram labels: DNA-methylation; DNMTs; 5-Aza-CdR; miRNAs; translational suppression; HDACs (e.g. HDAC 4); PBA; chromatin remodeling.]
Figure 3.1: The interplay of epigenetics and miRNAs (Chuang and Jones, 2007)

Chapter 4: Methylation and influence on transcription

Is that what your little note says? It must be hard living your life off a couple of scraps of paper. You mix your laundry list with your grocery list you'll end up eating your underwear for breakfast.
Natalie in “Memento” (2000)

4.1 Interactions with DNA-methylation

Initially it was proposed that methylation could alter or interfere with the correct binding of nuclear factors (such as transcription factors) to their targets. In a number of cases (mostly solitary CGs that become methylated) this mechanism is indeed involved.

An alternative mechanism is one whereby transcriptional repressors selectively recognize methylated CpGs. In this context, researchers were able to identify proteins that specifically bind to symmetrically methylated CpGs. This family of proteins contains a Methyl-CpG Binding Domain (MBD). These proteins also appear to be involved in gene silencing through changes in the histone modification pattern. Previously, it had already been shown that DNA methylation patterns are mechanistically linked to gene silencing through changes in the histone code. The MBD-containing proteins link these two epigenetic changes (Ballestar and Esteller, 2005). A number of MBD-containing proteins have been identified, such as MECP2, MBD1-4, BAZ2A and BAZ2B.

The ability of MBD proteins to repress transcription is fundamental. As mentioned earlier, methylated DNA is associated with transcriptional repression and inactive chromatin. An initial hypothesis was that the binding of these factors could alter the chromatin structure, thereby denying access to the transcriptional machinery. Further studies have demonstrated that MeCP2-dependent repression is mediated through the recruitment of histone deacetylases and histone lysine methyltransferases, as shown in Figure 4.1 (Worm and Guldberg, 2002).

Histone deacetylases (HDACs) deacetylate the histones, causing the chromatin to condense and become inaccessible to the transcription machinery. This chromatin remodelling causes transcriptional silencing.
Figure 4.1: Linking DNA-methylation and inactive chromatin. Methyl-binding domains bind onto methylated DNA and recruit histone deacetylases, turning active chromatin into condensed chromatin (Worm and Guldberg, 2002)

4.2 Protein complexes involved in the link DNA-methylation – histone modification

The interaction between DNA-methylation and histone modifications through the MBD-containing proteins is very complicated, as different protein complexes are involved. One such complex, the Mi-2/NuRD histone deacetylase complex, interacts with MBD2 (the resulting assembly is also known as the MeCP1 complex). This complex contains the histone deacetylase complex but also ATP-dependent chromatin-remodelling subunits. In addition, MBD3 is incorporated in the complex. The interplay between DNA-methylation and histone modification through protein complexes is illustrated in Figure 4.2.
Figure 4.2: The role of methyl-CpG-binding domains (MBDs) in silencing methylated tumor-suppressor genes. In cancer cells, many tumor-suppressor genes undergo aberrant hypermethylation at their CpG islands, and many different elements can be recruited: the 4 MBD proteins involved in transcriptional repression, recruitment of histone deacetylases (HDACs) and histone methyltransferases (HMTs), binding to both methyl-CpG-rich and -poor sequences, as well as modulation of the binding by post-translational modification. Changes in the chromatin of these genes lead to transcriptional silencing (Ballestar and Esteller, 2005)

4.3 The influence of the Polycomb group of proteins

The establishment and maintenance of epigenetic gene silencing is fundamental to cell determination and function. Apart from DNA methylation systems, a group of proteins, the Polycomb Group (PcG), forms a conserved system to establish gene silencing.

The proteins of this group, first discovered in Drosophila, are part of multiprotein complexes that include HDACs (histone deacetylases) and HMTs (histone methyltransferases). The PRC2 complex, containing EZH2, is capable of methylating the histone H3 tail at lysine 9 (K9) and, more prominently, at K27. Both histone marks cause gene silencing (Lund and van Lohuizen, 2004).
In addition, the Polycomb protein EZH2 has been shown to associate with DNMTs and is required to establish DNA methylation in a subset of target genes (Vire et al., 2006). This suggests that the Polycomb proteins may serve as recruiters for DNMTs and be involved in the hypermethylation of tumor suppressor genes, highlighting another connection between various epigenetic silencing mechanisms. A summary of the networks in which Polycomb plays a role is given in Figure 4.3 (Sparmann and van Lohuizen, 2006).
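[Figure 4.3 diagram: nucleosomes with histone tails on DNA. PRC2 (containing EZH2) methylates H3K27 (Me); PRC1 binds the methylated H3K27 mark and ubiquitylates H2AK119 (Ub). Proposed silencing mechanisms shown: inhibition of transcription (Pol II), H2AK119 ubiquitylation, chromatin compaction and recruitment of DNMTs.]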
Figure 4.3: Binding of the PRC2 (Polycomb repressive complex 2) initiation complex to the Polycomb group (PcG) target genes induces enhancer of zeste homologue 2 (EZH2)-mediated methylation (me) of histone proteins, primarily at lysine 27 of histone H3 (H3K27). PRC1 is able to recognize the trimethylated H3K27 (H3K27me3) mark through the chromodomain of Polycomb (PC). This interaction might bring neighbouring nucleosomes into the proximity of the PRC2 complex to facilitate widespread methylation over extended chromosomal regions. Although the precise mechanisms for PRC-mediated stable gene silencing are still poorly understood, they are proposed to involve direct inhibition of the transcriptional machinery, PRC1-mediated ubiquitylation (Ub) of H2AK119, chromatin compaction and recruitment of DNA methyltransferases (DNMTs) to target gene loci by PRC2. Pol II, RNA polymerase II (Sparmann and van Lohuizen, 2006)
Chapter 5: DNA-methylation and cancer

[running] Okay, so what am I doing? [sees man also running] I'm chasing this guy. [man shoots] Nope. He's chasing me.
Leonard Shelby in “Memento” (2000)

5.1 Development of cancer and the role of DNA-methylation

Cancer results from the uncontrolled growth of cells. The cell division process of a normal cell is strictly controlled at different stages. Thus, to become a progenitor cancer cell, a cell has to bypass these control mechanisms, which include control of growth and cell division and control of programmed cell death (apoptosis). The alteration of normal cell functions is caused by both genetic and epigenetic failures and defects that either activate proto-oncogenes or inactivate tumour suppressor genes.

Proto-oncogenes are genes that, in normal cells, act as components of growth-promoting signalling pathways (growth factors, growth factor receptors, intracellular signalling molecules and transcription factors) or are anti-apoptotic genes, angiogenesis-promoting genes, telomerase (TERT) and invasion- and metastasis-promoting genes. If these genes are more activated than is intended, this can promote tumour growth and stimulate angiogenesis and metastasis. This activation can be caused by single base pair changes (SNPs: Single Nucleotide Polymorphisms) that influence the expression or activation. Other possibilities are chromosomal translocations, whereby the gene is inserted in a more active region, or the formation of an oncogenic fusion protein.

Silencing of tumour suppressor genes (TSGs) is caused by mutations, chromosomal aberrations and DNA-methylation. TSGs are genes involved in important controlling pathways that prevent a somatic cell from turning into a cancer cell. Their functions are cell cycle control, apoptosis and cell adherence and communication control. To silence a TSG, two consecutive alterations are required, as the copies of the TSG on both chromosomes must be targeted. The silencing of TSGs by two genetic or epigenetic events is known as the Knudson two-hit hypothesis, shown in Figure 5.1.

In hereditary cancer types, one of the hits is genetically caused and transferred to the offspring, and only one other hit is required. In sporadic cancers, two consecutive hits are required. The first hit is either a genetic one (such as a point mutation) or an epigenetic one. The second hit is often a chromosomal defect (such as Loss Of Heterozygosity, LOH) or is caused by DNA-methylation.

The number of cancer-related genes affected by epigenetic inactivation equals or exceeds the number that are inactivated by mutation. Many genes modified by promoter hypermethylation in cancers have tumour-suppressor function. Some important pathways whose function is disabled by DNA-methylation of important genes in the pathway are listed in Table 5.1 (Esteller, 2007b). An overview of which genes have been described as hypermethylated in cancer is given in Figure 5.2.
Figure 5.1: Overview of the Knudson two-hit hypothesis. The first hit in non-heritable cancer types can either be a mutation or a methylation; the second hit can be a methylation or a chromosomal defect such as Loss Of Heterozygosity (LOH)
[Figure panels: first hit (mutation or methylation); second hit (LOH or methylation); outcomes: mutation + LOH, mutation + methylation, methylation + LOH, biallelic methylation]
Table 5.1: Important pathways that are altered during cancer initiation or development by DNA-hypermethylation of genes in the pathway (Esteller, 2007a)

Pathways                      Representative hypermethylated genes
DNA repair                    hMLH1, MGMT, WRN, BRCA1
Hormone response              Receptors of estrogen, progesterone, androgen, prolactin and thyroid-stimulating hormones
Vitamin response              RARB2, CRBP1
Ras signaling                 RASSF1, NORE1A
Cell cycle                    p16, p15, Rb
P53 network                   P14, p73, HIC1
Cell adherence and invasion   E-cadherin, H-cadherin, FAT cadherin, EXT1, SLIT2, EMP3
Apoptosis                     TMS1, DAPK1, WIF1, SFRP1
Wnt signaling                 APC, DKK1, IGFBP3
Tyrosine kinase cascades      SOCS1, SOCS3, SYK
Transcription factors         GATA4, GATA5, ID4
Homeobox genes                PAX6, HOXA9
Other pathways                GSTP1, LKB1, THBS14, COX2, SRBC, RIZA, TPEF, SLC5A8, Lamin A/C
microRNAs                     miR-127 (target: BCL6), miR-124a (target: CDK6)
Figure 5.2: Chromosomal location of genes whose promoter region is described as hypermethylated in different cancer types (Esteller, 2007a)

5.2 Cancer stem cell hypothesis and epigenetics

The multistep model of carcinogenesis (Knudson two-hit hypothesis, as shown above) requires a long-living cell in which multiple genetic or epigenetic hits occur. As an alternative, it might be possible that the progenitors of stem cells, which normally undergo limited numbers of cell divisions, acquire the capacity to self-renew. These so-called cancer stem cells (CSCs) subsequently become the long-living targets, acquiring (epi-)genetic lesions. Like normal adult stem cells, CSCs can divide indefinitely and give rise to both more CSCs and progeny that can differentiate into the different cell types in a tumour (Tan et al., 2006).

The remaining question is how a normal stem cell acquires unlimited division capability and becomes a cancer stem cell. Recently, genes have been identified that are targeted for transcriptional repression in human embryonic stem (ES) cells. This repression is caused by the PcG (Polycomb) proteins SUppressor of Zeste 12 (SUZ12) and Embryonic Ectoderm Development (EED), which form the Polycomb repressive complex 2 (PRC2) and which are associated with nucleosomes that are trimethylated at Lys27 of histone H3 (H3K27).

It seems that genes targeted in ES cells by Polycomb have a high chance of being cancer-specifically methylated. The predisposition of ES cell PRC2 targets to cancer-specific DNA-hypermethylation suggests crosstalk between PRC2 and de novo DNA methyltransferases in an early precursor cell with a PRC2 distribution similar to that of ES cells. The precise developmental stage and type of cell in which such crosstalk might occur is unknown and might not be an embryonic stem cell. This crosstalk between Polycomb repression and aberrant DNA-methylation is shown in Figure 5.3 (Widschwendter et al., 2007).
Figure 5.3: A model for the progression of epigenetic marks from reversible repression in ES cells to aberrant DNA methylation in cancer precursor cells and persistent gene silencing in cancer cells (Widschwendter et al., 2007)

5.3 Cancer profiling based on DNA-methylation

Important tumour suppressor genes can be silenced by DNA-hypermethylation. However, this happens in a non-random way: in a certain cancer type, DNA-hypermethylation in the promoter regions of the same genes can be observed in many patients. The DNA-hypermethylome (which genes are methylated) seems to depend on the cancer type, the subtype and, in some cancer types, even on the stage of cancer development. These cancer-specific methylation patterns have been used in some studies to classify cancer (sub)types: DNA-methylation information can be used to profile the different cancer types. This is illustrated in Figure 5.4 (Paz et al., 2003) where, based on methylation analysis in the promoter region of 15 genes, cell lines of different cancer types can be correctly classified.

Figure 5.4: Hierarchical clustering of human cancer cell lines by CpG island promoter hypermethylation. In the top and left parts, the genes and cell lines analyzed are indicated, respectively. In the panel, red indicates hypermethylated CpG island, green indicates unmethylated CpG island, and black indicates homozygous deletion. Different cell types are indicated by colors: colon (blue), breast and prostate (dark green), lung (pink), renal (gray), head and neck (light green), leukemia (light blue), melanoma (yellow), bladder (light violet), glioma (dark violet), lymphoma (magenta), and nontransformed cell lines (red) (Paz et al., 2003)
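The classification illustrated in Figure 5.4 rests on hierarchical clustering of per-gene methylation calls. The following is a minimal sketch of that idea, not the procedure used by Paz et al.: it assumes binary calls (1 = hypermethylated, 0 = unmethylated), Hamming distance between profiles and average linkage. The cell line names and profiles are hypothetical toy values chosen only to show the mechanics.

```python
from itertools import combinations

# Hypothetical toy profiles (NOT real data): one 0/1 call per gene,
# 1 = hypermethylated promoter, 0 = unmethylated promoter.
PROFILES = {
    "line1": [1, 1, 0, 0],
    "line2": [1, 1, 0, 1],
    "line3": [0, 0, 1, 1],
    "line4": [0, 0, 1, 0],
}


def hamming(a, b):
    """Number of genes whose methylation call differs between two profiles."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(c1, c2, profiles):
    """Mean Hamming distance over all pairs taken from two clusters."""
    dists = [hamming(profiles[a], profiles[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)


def cluster(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(cluster(PROFILES, 2))
```

With real data, each profile would hold the calls for the analysed gene panel, and a dedicated statistics package would normally replace this hand-written loop.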
There are large differences in the methylation frequencies of the main hypermethylated genes across various cancer types. Combining this information, one can determine a typical methylation profile for each cancer type (Esteller, 2007b).

Figure 5.5: A CpG island hypermethylation profile of human cancer. Y-axis, frequency of hypermethylation for each gene in each primary cancer type (Esteller, 2007a)

5.4 Uncovering the cancer methylome

With the availability of genome-wide screening methods, these methods have been adopted for the detection of DNA-methylation. The purpose of these large-scale genome-wide analyses is to discover which genes are methylated in the investigated cancer types and in which patient group and cancer stage this occurs. This kind of analysis can uncover large portions of the so-called cancer methylome at once and give insight into the biology of the tumours and their progression. Different patient groups can be identified as well, which could be useful for future investigations towards personalized medicine.

Different methodologies have been used to detect methylation in a genome-wide manner. Commonly used are array-based techniques (such as Differential Methylation Hybridisation, DMH, which uses different probes against which the different products after bisulfite treatment can hybridize). Other studies rely on the use of antibodies, such as ChIP (Chromatin ImmunoPrecipitation), which recognizes histones, and MeDIP (MEthylated DNA ImmunoPrecipitation), which directly recognizes methylated cytosine. Using high-resolution tiling arrays, the binding sites of the antibodies can be detected.

Also commonly used are DAC (5-aza-2'-deoxycytidine) and TSA (trichostatin A) treatments. DAC is a nucleoside analogue that is built into the DNA instead of cytosine. The main difference is that DAC cannot be methylated, so the initial methylation signal will be lost. TSA is an inhibitor of HDACs (Histone DeACetylases) and prevents the histones from becoming deacetylated and thus the chromatin from condensing. The difference in expression of the genes is measured (e.g. using an expression micro-array) before and after treatment. If a gene was initially not expressed and is reactivated after treatment with DAC and/or TSA, its silencing could be caused by DNA-methylation.

In a genome-wide sequencing of cancer genes, Sjoblom et al. (Sjoblom et al., 2006) observed that newly discovered gene mutations in colon and breast cancers generally had a low incidence of occurrence, with 90% of the genes identified harbouring a mutation frequency of less than 10%. An epigenetic study (Schuebel et al., 2007) shows that about half of these mutated genes are methylated at a much higher frequency, as shown in Figure 5.6. Also, a much higher number of candidate hypermethylated genes was found in comparison with mutated genes. Both observations show that this epigenetic change might provide an alternative inactivating route to mutations for many tumour suppressor genes.

Figure 5.6: Relationship between methylation status, analyzed by MSP, and mutation for 13 genes (Sjoblom et al., 2006)

5.5 Discovering epigenetic biomarkers

According to the World Health Organisation (WHO), a biomarker is a cellular or molecular indicator of exposure, health effects, or susceptibility. Biomarkers can be used to measure internal dose, biologically effective dose, early biological response, altered structure or function, susceptibility.

In the epigenetic perspective, a biomarker could thus be the methylation state of the DNA at certain residues, or the chemical modification of the histone tails. Depending on the existing knowledge and the purpose of the biomarker discovery study, the different attempts in cancer can be divided into a limited number of main categories.
At first, biomarker discovery focuses on finding the difference between cancer and normal samples. A good biomarker is methylated in the cancer samples but not in normal samples, and is preferentially detectable from the early stages on. Some studies try to compose a panel of biomarkers to increase the sensitivity and/or selectivity. These initial candidates need further investigation and validation on a higher number of samples (possibly including different cancer stages) in order to identify candidates for early diagnosis, survival analysis or stratification.

In later stages, the influence of DNA-methylation profiles on therapies and survival can be determined. In addition, possible targets for epigenetic therapy can be identified, for which clinical trials should be designed, including large patient and control groups.

5.6 Early diagnostics

As very sensitive methodologies exist to detect DNA-methylation of specific gene promoters, a few cells with a changed methylation profile of these genes can be detected among thousands of normal cells. This makes DNA-methylation an ideal screening tool to detect the development of cancer before any clinical symptom can be observed. This early detection methodology is especially useful if non-invasive samples containing tumour DNA can be obtained. These samples can include blood or other body fluids, semen, urine or faeces from patients (Paluszczak and Baer-Dubowska, 2006).
Two parameters are important in this perspective: sensitivity and selectivity. The sensitivity is the ability to detect the true positives (cancer samples in this case), while the selectivity (also referred to as specificity) is the ability to correctly identify normal samples as negative, and thus to distinguish cancer samples from normal tissues. Sensitivity and selectivity should be as high as possible with a minimal set of genes whose methylation status has to be tested. When this number is relatively low, the cost of analyzing the samples will be low and the test becomes applicable for screening high-risk patient populations.

5.7 Stratification and personalized medicine

Within one tumour (sub)type, the identified methylation biomarkers often show distinct patient population groups. These groups, as found by clustering methodologies, often interfere with discrete histological cancer subtypes.

DNA methylation profile assessment in cancer allows subtype discrimination, which is often connected with risk and prognosis estimation. In addition to this patient stratification, methylation profiles can also be used to predict response to treatment (such as chemotherapy). This could be used to treat patients based on their (epi-)genetic profile, often referred to as personalized medicine. Currently, truly individualized medicine is not applied, but different patient groups (as identified by their methylation profile) receive different treatments, or one decides that some groups benefit from additional or omitted treatment strategies. An example is the application of radiation in combination with chemotherapy.

Both early detection and patient stratification are a huge benefit, and a good biomarker combination provides both. Applying the treatment with the highest chance of success at a very early stage is crucial in stopping cancer development, metastasis and recurrence.

Also in the development of novel therapies, the discovery of different patient groups that react differently to the active components may speed up the entire trial and registration process. In the different clinical trial stages, the therapy can directly be applied to patient groups with an epigenetic profile that seems to benefit from the treatment. In these patient groups, it is easier to demonstrate the potential of the novel therapy: fewer patients would be needed to comply with the statistical thresholds.

5.8 Epigenetics and cancer therapy

Epigenetic events such as DNA-methylation and histone modifications seem to be highly involved in the development of cancer. Therefore, (targeted) therapies could be used to affect these DNA-methylation and histone changes. There are a number of treatments in a variety of cancer types, both solid tumours and hematological malignancies.

The epigenetic drugs currently available are divided into two separate classes: DNMT inhibitors (DNA demethylating drugs) and HDAC inhibitors (chromatin remodelling drugs). In both classes, different chemical groups have been discovered, as schematized in Table 5.2 (Peedicayil, 2006).
DNMT inhibitors prevent the DNA methyltransferases from methylating the DNA. This causes demethylation after cell divisions. If this causes the silenced tumour suppressor genes to be activated again, this would be very beneficial in stopping tumour progression. Depending on their chemical structures, the DNMT inhibitors are divided into three subgroups.

The nucleoside analogues are built into the DNA instead of cytosine and cannot be methylated. These are very potent inhibitors of DNMT, although the most potent ones (such as 5-aza-CR) are also very cytotoxic and are administered in very low doses. Recently, Zebularine was shown to be less cytotoxic.

In the search for alternatives to the cytotoxic nucleoside analogues, other (non-nucleoside) inhibitors of DNMT have been found, such as procaine. A completely different strategy is to use antisense constructs to silence the DNMTs. These short oligonucleotides hybridize with the mRNA, making it inactive.

HDAC inhibitors inhibit the histone deacetylases, which, along with HATs, help maintain the acetylation status of the histones. Various (small) compounds have been shown to have HDAC inhibitory effects.

Table 5.2: Classification of epigenetic drugs with therapeutic potential (Peedicayil, 2006)

DNMT inhibitors
  Nucleoside analogues:
    • 5-azacytidine (5-aza-CR)
    • Decitabine (5-aza-CdR)
    • Zebularine
  Non-nucleoside analogues:
    • Procainamide
    • Procaine
    • Epigallocatechin-3-gallate (EGCG)
  Antisense oligonucleotides:
    • DNMT1 ASO

HDAC inhibitors
  Hydroxamates:
    • Trichostatin A
    • Suberoylanilide hydroxamic acid (SAHA)
  Cyclic tetrapeptides:
    • Depsipeptide
    • Apicidin
  Aliphatic acids:
    • Valproic acid
    • Phenyl butyrate
  Benzamides:
    • MS-275
    • CI-994
  Electrophilic ketones:
    • Trifluoromethyl ketones
    • α-Ketoamides
Part 2: DNA-methylation, cancer and literature

Upon DNA hypermethylation, transcription of the affected genes may be blocked, resulting in gene silencing. In neoplasia, abnormal patterns of DNA methylation have been recognized and hypermethylation is now considered one of the important mechanisms resulting in silencing expression of tumour suppressor genes, i.e. genes responsible for control of normal cell differentiation and/or inhibition of cell growth

Chapter 6: Introduction

"Six pints of bitter," said Ford Prefect to the barman of the Horse and Groom. "And quickly please, the world's about to end."
Douglas Adams, "Hitchhiker’s guide to the galaxy"

DNA methylation represents a modification of DNA by addition of a methyl group to a cytosine, also referred to as the fifth base (Doerfler et al., 1990). This epigenetic change does not alter the primary DNA sequence and might contribute to overall genetic stability and maintenance of chromosomal integrity and consequently facilitate organization of the genome into active and inactive regions with respect to gene transcription (Robertson, 2002).

Genes with CpG islands in the promoter region are generally unmethylated in normal tissues. Upon DNA hypermethylation, transcription of the affected genes may be blocked, resulting in gene silencing. In neoplasia, abnormal patterns of DNA methylation have been recognized and hypermethylation is now considered one of the important mechanisms resulting in silencing expression of tumour suppressor genes, i.e. genes responsible for control of normal cell differentiation and/or inhibition of cell growth. In the last few years, new hypermethylated biomarkers have been used in cancer research and diagnostics (Esteller, 2003).

The detection of DNA hypermethylation was revolutionized by two discoveries. First, bisulfite treatment results in the conversion of cytosine residues into uracil, except for the protected methylcytosine residues (Hayatsu, 1976). Second, based on the sequence differences after bisulfite treatment, methylated DNA can be distinguished from unmethylated DNA with a methylation specific PCR (MSP) (Herman et al., 1996). In many cancers, various markers have been reported to be hypermethylated (Paluszczak and Baer-Dubowska, 2006).

As discussed in Chapter 5: DNA-methylation and cancer, DNA-methylation plays a crucial role in cancer, and thousands of publications describe DNA-methylation of hundreds of genes in a whole variety of cancer types. Being able to explore and combine this existing knowledge may lead to novel insights into the underlying mechanisms and create new research questions.

Chapter 7: DNA-methylation and literature analysis

Ainsi presque tout est imitation. L’idée des Lettres persanes est prise de celle de l’Espion turc. Le Boiardo a imité le Pulci, l’Arioste a imité le Boiardo. Les esprits les plus originaux empruntent les uns des autres. (Almost everything is imitation... The most original writers borrowed from one another)
Voltaire, "Lettre XII: sur M. Pope et quelques autres poètes fameux", Lettres philosophiques (1733)

As each month new publications describe the hypermethylation of genes in different cancer types, it would be an advantage to be able to keep track of this information. If literature searches can be sped up or simplified, this information can be used, for instance, for selecting positive controls and for gene-prioritizing purposes.

Most abstracts of publications related to methylation are stored in the PubMed database, hosted by the NCBI (National Center for Biotechnology Information). The information of the publication (abstracts, authors, keywords,…) can be accessed through the
web-interface NCBI provides, as well as through a retrieval system called E-Fetch. This is in fact a so-called ‘API’: an Application Programming Interface. This API allows access to and retrieval from PubMed (as well as other NCBI databases) using a programming language (such as Perl). This enables us to access PubMed records without having to use the web-interface of PubMed. This offers perspectives to:
• Automatically query PubMed using many (combinations of) search terms at the same time
• Subsequently process the results: highlighting, counting and sorting specific content of interest

In the (epi-)genomic field, applications of these two possibilities include:
• Computer-generated searching of different aliases of a gene at the same time. A human gene is commonly identified by different aliases (symbols, identifiers in genetic databases, variants of names and descriptions). This makes it extremely difficult to search for a specific gene in literature, as often more than 10 such aliases are associated with one single gene. And what about genes that share aliases, and textual variants of all these different aliases (e.g. BRCA1, BRCA-1, BRCA 1, BRCA-I)? However, using computer programs, it is feasible to download all aliases for a gene from different databases, generate textual variants and search PubMed with all these variants at the same time, generating one single results file within seconds to minutes.
• Downloading, highlighting and sorting all abstracts related to one particular area of interest. With a list of keywords related to one particular area of interest, all related abstracts can be retrieved. These keywords can be highlighted, as well as all genes (and their aliases and textual variants), sentences with both an alias and a keyword,… At the same time, the different words of interest can be counted and a scoring scheme can be applied. Based on the counting, one could then rank the abstracts or select only the abstracts in which a particular gene is mentioned, while the highlighting enables fast screening of the abstract.

The next chapters will give insight into biological text mining in general and how we applied these methodologies to generate a methylation database in cancer: PubMeth.
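The two applications described in this chapter, expanding a gene alias into textual variants and counting terms to rank abstracts, can be sketched in a few lines. This is a simplified illustration in Python, not the Perl code behind PubMeth: the function names are hypothetical, only the "letters + digits" alias pattern is handled (Roman-numeral variants such as BRCA-I are left out), and the scoring is a plain occurrence count.

```python
import re


def textual_variants(alias):
    """Expand e.g. 'BRCA1' into {'BRCA1', 'BRCA-1', 'BRCA 1'}.

    Aliases that are not 'letters followed by digits' are returned unchanged.
    """
    match = re.fullmatch(r"([A-Za-z]+)[- ]?(\d+)", alias)
    if not match:
        return {alias}
    stem, number = match.groups()
    return {stem + sep + number for sep in ("", "-", " ")}


def pubmed_or_query(aliases, field="Title/Abstract"):
    """Combine all variants of all aliases into one OR query string."""
    variants = sorted({v for alias in aliases for v in textual_variants(alias)})
    return " OR ".join(f'"{v}"[{field}]' for v in variants)


def score_abstract(text, variants, keywords):
    """Count case-insensitive occurrences of gene variants and keywords."""
    lowered = text.lower()
    terms = list(variants) + list(keywords)
    return sum(lowered.count(term.lower()) for term in terms)


def rank_abstracts(abstracts, variants, keywords):
    """Return abstracts sorted from highest to lowest score."""
    return sorted(abstracts, key=lambda a: score_abstract(a, variants, keywords), reverse=True)


if __name__ == "__main__":
    print(pubmed_or_query(["BRCA1"]))
```

A real pipeline would first download the alias list per gene and would need to handle aliases shared between genes, which simple substring counting cannot resolve.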
Chapter 8: Intermezzo: Biological Text Mining

Paper 1: Digging into biomedical literature: a guide to biological text mining
Ongenaert M, Van Criekinge W.
In preparation

Life-science researchers make use of enormous amounts of data, presented in biomedical literature. Researchers would benefit enormously from methodologies that are able to perform literature queries, analyze and filter the results and present the answer to specific research questions and interactions and relationships among them in a summarized overview in an automated way. This brings us in the field of Biological Text Mining: the use of robotized methods for exploiting the enormous amount of knowledge available in the biomedical literature today. This amount of data comes from various independent research groups with different angles of view and focus points and it is extremely useful to be able to compare and combine their datasets to gain new insights and come to hypotheses that could not be generated using only one data source. We discuss how these literature sources can be automatically queried and the results annotated. We therefore make use of existing web-based tools and discuss their strengths and limits. In addition, custom Perl-scripts give insight in the mechanisms that most text mining approaches share.
8.1 Introduction

Scientists and researchers in life sciences who plan an experiment, are performing it or are discussing results consult biomedical literature in all of these stages. For this, they can use different literature databases (PubMed (Wheeler et al., 2005), Web of Science, Scopus, Google Scholar) to simplify their search and to cover as many trusted sources as possible. However, with currently more than 18 million abstracts in PubMed, the results of their search queries can become unmanageable to analyze and filter for the needed information. In addition, creating an accurate and complete literature search query may not be easy.

Therefore, researchers would benefit enormously from methodologies that are able to perform literature queries, analyze and filter the results and present the answer to specific research questions and interactions and relationships among them in a summarized overview in an automated way. This brings us into the field of Biological Text Mining: the use of robotized methods for exploiting the enormous amount of knowledge available in the biomedical literature today. This amount of data comes from various independent research groups with different angles of view and focus points, and it is extremely useful to be able to compare and combine their datasets to gain new insights and come to hypotheses that could not be generated using only one data source.

Biological text mining must be able to deal with enormous amounts of data and biological classifications such as ontologies and controlled vocabularies. The algorithms used should take into account different gene aliases, names and descriptions and variants of them. The analysis should be able to make use of lists of disease phenotypes and symptoms, and identify chemical compounds and drugs or analysis methods and their abbreviations. It becomes clear that the ideal text mining application does not exist and that the choice of which techniques to use will depend on the research domain and the defined goals of the literature search.

In this review, we discuss which basic techniques exist to gather relevant literature and how to analyze it, taking ontologies, gene lists and clinical symptoms into account. The discussed methodologies are illustrated with real-life examples. The purpose is to give researchers insight into the possibilities and show examples of how to use existing methodologies in an easy way, as fast and with as little additional work as possible. This review is constructed from a practical view: four steps, present more or less prominently in any biological text mining effort, are discussed.

We will show application examples of tools already available on the Internet as a web service (application examples) and we will demonstrate basic custom Perl-scripts to give more insight into the underlying mechanisms (insight examples). The insight examples are simplified versions of scripts used to create the PubMeth database (Ongenaert et al., 2008).

8.2 Step 1: Perform automated literature queries

In any case, the first step in biomedical text mining is to actually get the biomedical texts (abstracts or full text), based on search queries. This seems trivial, but it may be the most crucial step as it is at the beginning of the pipeline. If the query is incomplete or inaccurate, the performance (recall and precision) will be significantly affected.

To get biomedical texts, the researcher needs to decide which database to use and which queries to perform, unless the whole database is screened. Most literature databases (and text mining interfaces) have a web interface and the generation of the query is seamlessly integrated in this interface. However, one should always check whether the generated query reflects the research question.
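One way to check a generated query programmatically is to look at the `QueryTranslation` element that the NCBI E-utilities ESearch service returns next to the list of PubMed IDs. The sketch below only builds the request URLs and parses a response; it performs no network access. The sample response is hand-written for illustration: its two IDs are placeholders, and its translation string is the one PubMed reports for the query “text mining”. The function names are hypothetical, and Python is used here rather than the Perl of the original scripts.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20):
    """URL that asks ESearch for PubMed IDs matching a query."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


def efetch_url(pubmed_ids):
    """URL that asks EFetch for the full records of the given IDs as XML."""
    return f"{EUTILS}/efetch.fcgi?" + urlencode(
        {"db": "pubmed", "id": ",".join(pubmed_ids), "retmode": "xml"}
    )


def parse_esearch(xml_text):
    """Return (list of PubMed IDs, query as interpreted by PubMed)."""
    root = ET.fromstring(xml_text)
    ids = [element.text for element in root.findall("./IdList/Id")]
    return ids, root.findtext("QueryTranslation")


# Hand-written sample response (placeholder IDs, not a real search result).
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>111</Id><Id>222</Id></IdList>
  <QueryTranslation>text[All Fields] AND ("mining"[MeSH Terms] OR "mining"[All Fields])</QueryTranslation>
</eSearchResult>"""

if __name__ == "__main__":
    ids, translation = parse_esearch(SAMPLE_RESPONSE)
    print(esearch_url("text mining"))
    print(translation)
    print(efetch_url(ids))
```

Comparing the returned translation with the intended query shows at a glance whether PubMed expanded or reinterpreted any term before the results are processed further.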
For example, submitting the query “text mining” through the web-interface of NCBI PubMed actually generates “text[All Fields] AND ("mining"[MeSH Terms] OR "mining"[All Fields])” (this can be seen in the Details tab of PubMed). Mining apparently is identified as a MeSH term. MeSH (Medical Subject Headings) is the National Library of Medicine's controlled vocabulary thesaurus (Kim and Wilbur, 2005), which consists of sets of terms and naming descriptors in a hierarchical structure that permits searching at various levels of specificity.

So actually, PubMed not only searches for texts, but is able to recognize some specific scientific and medical context. This normally improves the quality and accuracy of the search and is enabled by default. However, in some situations it is undesirable. In addition, PubMed allows adding additional limits to the query, such as publication year, topics covered and age groups of subjects (‘Limits’ tab).

Once a query has been carefully generated, adjusted and fine-tuned, it can be executed. However, if one has a whole list of such queries and thus a lot of search results, the web-interface is no longer sufficient to deal with these results and to use them further in the analysis pipeline. Therefore, an automated system to query the literature database and to store the results in a structured format (e.g. XML) or in a database will be required. Fortunately, all main literature databases provide ways to perform automated queries and save the results in a structured format.

This file format has a defined and documented structure, e.g. an XML file has different ‘tags’, indicating at which level the different features can be situated. This defined layout makes it relatively easy and fast to get the required individual elements of the data out of this file using a programming language, although XML files are not very readable for humans. This data operation is often called ‘parsing’.

In this review, the different steps are described separately from each other, while in practice they are not strictly separated. In this step, keyword lists or organized thesauruses (as described in step 2) are often used to generate the search query. For example, a tool that uses a chromosomal region as input and links the genes in this region to a certain disease or phenotype will use the different gene symbols and aliases, as well as a controlled vocabulary (synonyms, symptoms, different subtypes etc.) of diseases to generate the initial query.

Application example: PubNet

PubNet (Douglas et al., 2005) is a web service that extracts several types of
tual navigation, and topological analy‐
sis. Based on user search terms, PubMed is queried and the results are gathered as XML files. As these XML files have a defined and documented structure, it is relatively easy to parse them. In the example of PubNet, de‐
pending on the relationship the user wants to visualize, the required fea‐
tures (such as authors or MeSH terms) are parsed out of the XML files and passed to the visualization technolo‐
gies. For instance, one is interested to see which authors are experts in the field of DNA‐methylation in colorectal can‐
cer. PubNet will send the generated query to PubMed and retrieve the results as an XML file. Per publication, the authors are parsed out of this XML file, an index of all authors is made and it is determined how many times each of these authors was co‐author with others in the list. Depending on this analysis, the graphical summary is generated. This view shows authors, publishing frequently together in groups (with line width indicating the frequency). This way, it also becomes clear which groups cooperate, and which persons connect the different groups and probably are experts in the field. This way, the relationship be‐
tween the authors of thousands of abstracts can be visualized and inter‐
Intermezzo: Biological Text Mining preted within minutes. The process of data fetching, parsing and a visualiza‐
tion example is given in Figure 8.1. Fetching XML 1
Parsing XML
Compiling nodes: 3513 edges: 21182
Generating output: svg ps pdf png
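The fetch-and-parse workflow described for PubNet can be sketched as follows. This is a minimal illustration, not PubNet's actual implementation: it assumes the public NCBI E-utilities endpoints (esearch.fcgi and efetch.fcgi) and the usual PubMed XML layout (PubmedArticle, AuthorList, Author, LastName, Initials). The function names and the sample records are hypothetical, and the sketch only builds the request URLs without contacting any server.

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the NCBI E-utilities; check the current NCBI documentation before use.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=100):
    """Build the URL of a search request that returns matching PubMed IDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids):
    """Build the URL of a fetch request that returns the records as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def authors_per_article(xml_text):
    """Parse PubMed-style XML and return one list of author names per article."""
    root = ET.fromstring(xml_text)
    articles = []
    for article in root.iter("PubmedArticle"):
        names = []
        for author in article.iter("Author"):
            last = author.findtext("LastName")
            initials = author.findtext("Initials") or ""
            if last:  # entries without a LastName (e.g. group authors) are skipped
                names.append(f"{last} {initials}".strip())
        articles.append(names)
    return articles


def coauthor_counts(articles):
    """Count how often each pair of authors appears on the same article."""
    pairs = Counter()
    for names in articles:
        for pair in combinations(sorted(set(names)), 2):
            pairs[pair] += 1
    return pairs


# Made-up records that only mimic the PubMed XML layout (not real publications).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><AuthorList>
    <Author><LastName>Alpha</LastName><Initials>A</Initials></Author>
    <Author><LastName>Beta</LastName><Initials>B</Initials></Author>
    <Author><LastName>Gamma</LastName><Initials>C</Initials></Author>
  </AuthorList></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><AuthorList>
    <Author><LastName>Alpha</LastName><Initials>A</Initials></Author>
    <Author><LastName>Beta</LastName><Initials>B</Initials></Author>
  </AuthorList></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

edges = coauthor_counts(authors_per_article(SAMPLE_XML))
for (first, second), weight in edges.most_common():
    print(first, "-", second, weight)
```

The pair counts play the role of the edge weights (line widths) in the co-author network; a real run would download the XML from the two URLs and pass it to the same parsing function.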
Figure 8.1: PubNet scheme for getting literature data from PubMed (performing the query, retrieving the XML files and parsing them) and the visualization of co-authors in the field of colorectal cancer DNA methylation.

Insight example: automated querying of PubMed

PubMed (and other NCBI databases) can be queried using E-Utils. This makes it possible to pass queries to dedicated NCBI servers from any programming language and get the results back. The system works as follows:

- In a first stage, one passes the queries to the NCBI search system (E-search). The servers execute the request and return a list of primary IDs as result, in this case PubMed IDs.
- Afterwards, the details of these PubMed IDs are requested from the NCBI servers (E-Fetch) and the results are returned in the requested supported format (e.g. XML, plain text).

Perl-script 1 illustrates this process: it gets all PubMed records related to DNA methylation and epigenetics and stores the results in a single XML file, allowing further processing.

8.3 Step 2: Define what to search for: deal with ontologies, gene and protein lists and thesauruses of chemical compounds and diseases

Once the literature results are stored in a structured way, e.g. by using database technologies, the challenge is to identify interesting concepts in the results. As we are dealing with biomedical literature, we might want to identify gene or protein symbols, aliases or names; cancer types or other diseases and their symptoms; Gene Ontology terms (Gene Ontology Consortium, 2008); and chemical (Singh et al., 2003; Hoffmann, 2007; Wild and Hur, 2008) and pharmaceutical compounds such as drugs.

To identify gene-related, medical, chemical or drug-related terminologies, a variety of databases is available. Some of these databases are hierarchically organized, others are unstructured. Examples of structured sets of keywords are Gene Ontology terms and MeSH terms. Gene symbols and names are examples of unstructured keyword sets. Some databases are in between: they cover, for instance, synonyms and symptoms of a disease, but there are no relations between the different diseases covered.

Some of these databases and lists (also known as thesauruses) are themselves generated using text-mining approaches (Jin et al., 2006; Kim et al., 2008). Some databases are actually a so-called 'mashup' (Cheung et al., 2008; Belleau et al., 2008): they combine several other data sources in one single interface. An example of such a database is GeneCards (Safran et al., 2002) for human gene information.

In order to use these different lists, they also have to be parsed to get the individual entities, taking into account synonyms and the hierarchical structure.

Insight example: creating a list of human genes, their aliases and symbols

We demonstrate in script 2 how to parse GeneCards to retrieve a list of all aliases and symbols of human genes. In a later phase, we search for all these gene symbols in the entire set of literature results.
- First, we make use of Ensembl (Flicek et al., 2008) / BioMart to retrieve a list of all human genes, each associated with an Ensembl gene ID (BioMart: human genes – no filters – output: Ensembl gene ID – present only unique results – save as CSV file). This initial step can also be automated using the Ensembl API.
- Second, the corresponding GeneCards records are retrieved. All gene symbols, aliases, names and descriptions are parsed out of each record and stored locally.

8.4 Step 3: Identify keywords, annotation lists and concepts in literature results. Deal with textual variants and ambiguities and identify relationships in the results

Next in the analysis pipeline is the matching of the different annotation and keyword lists from step 2 with the literature results retrieved in step 1. Based on the identifications made in this step, the literature references are ranked and the information is filtered, sorted, highlighted and summarized in order to present the analysis results, or is passed on to advanced machine learning classifiers.
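As a minimal sketch of this matching and ranking step, the example below looks up the terms of several keyword lists in a set of abstracts and orders the abstracts by the number of lists that produced a hit, then by the total number of hits. The scoring rule, the function names and the sample texts are assumptions made for illustration only; the sample sentences are invented and are not taken from PubMed.

```python
import re
from collections import Counter


def find_terms(text, term_lists):
    """Return {category: Counter(term -> number of whole-word hits)}."""
    found = {}
    for category, terms in term_lists.items():
        hits = Counter()
        for term in terms:
            # The term must not be glued to other letters or digits.
            pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
            count = len(re.findall(pattern, text))
            if count:
                hits[term] = count
        found[category] = hits
    return found


def rank_abstracts(abstracts, term_lists):
    """Order abstract IDs: most categories hit first, then most hits overall."""
    scored = []
    for abstract_id, text in abstracts.items():
        found = find_terms(text, term_lists)
        categories_hit = sum(1 for hits in found.values() if hits)
        total_hits = sum(sum(hits.values()) for hits in found.values())
        scored.append((categories_hit, total_hits, abstract_id))
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [abstract_id for _, _, abstract_id in scored]


# Illustrative keyword lists and invented texts (hypothetical IDs A1..A3).
TERM_LISTS = {
    "gene": ["MGMT", "CDKN2A"],
    "cancer": ["colorectal cancer", "breast cancer"],
}
ABSTRACTS = {
    "A1": "MGMT methylation was studied in colorectal cancer. MGMT was silenced in most samples.",
    "A2": "CDKN2A expression was measured in cell lines.",
    "A3": "No term from either list occurs in this text.",
}

print(rank_abstracts(ABSTRACTS, TERM_LISTS))
```

Exact, case-sensitive matching keeps the sketch short; it does not yet handle the textual variants and ambiguities that make this step difficult in practice.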
The previous steps were straightforward, not prone to errors and computationally relatively easy. This step, however, is more difficult, as ambiguities, biases and errors are introduced (Tanabe and Wilbur, 2002; Tuason et al., 2004; Chen et al., 2005; Fundel and Zimmer, 2006; Yang et al., 2008) that have an impact on the rest of the downstream analysis. Genes can share aliases, and gene symbols may be existing English words or may also be used as the identifier of a cell line (Sehgal and Srinivasan, 2006). Symbols and names can also be written in different ways, the so-called textual variants (e.g. BRCA1, BRCA-1, BRCA/1, BRCA 1, BRCA-I). Many abstracts use abbreviations for chemical compounds or diseases, and these abbreviations are even harder to identify correctly (Liu et al., 2002).

After identification of the different entities, relationships between them can be detected. For instance: are the genes in a certain chromosomal region associated with symptoms of a certain disease? Or, given a certain disease, which genes are related to this disease and in which pathways can these be situated? If the mechanisms for the identification of relationships are sufficiently sophisticated, they could even be used to identify indirect relationships and to define novel research hypotheses.

The different relationship identification strategies range from co-occurrence to statistical and machine-learning based techniques. Some efforts use linguistic models to improve the detection. Three basic techniques (co-occurrence, rule-based and knowledge-based) are reviewed in Cohen and Hunter (2008).

For hierarchically structured lists, the different levels of relationships can be taken into account. A Gene Ontology term detected at a certain level can be associated with its synonyms, its parents and its children, so that the analysis of relationships can cross the borders between levels.

To compare the performance of different text mining efforts, the F-value (Chen et al., 2006) is often used. This value is calculated based on both precision (P) and recall (R): F = 2PR / (P + R). To calculate the F-value, (manually) annotated reference datasets are available for testing and comparing text mining efforts (Jimeno et al., 2008).

Application examples: Whatizit, iHOP, PolySearch

Different web-based services have been created that allow easy identification of various keyword lists. One example is Whatizit (Rebholz-Schuhmann et al., 2008). Given plain text or a list of PubMed IDs, this tool can be used to identify 23 different terminologies, ranging from proteins to drugs.

A frequently used tool is iHOP (Fernandez et al., 2007), illustrated in Figure 8.2. iHOP is able to detect a whole range of terms and relationships in one single view and is pre-indexed: the results appear almost immediately. A disadvantage is that iHOP only shows results for one single gene and cannot be used for other queries. It speeds up manual literature exploration of a single gene, as different entities are highlighted, and the results are more accurate because iHOP deals with different aliases and synonyms. The interface displays key sentences: sentences that most probably contain valuable information. Relationships are indicated in these sentences as well: if the gene of interest is regulated by another gene, not only are both genes highlighted, but the keywords indicating the interaction between them are emphasized as well.

[Figure 8.2 content: iHOP screenshot listing key sentences from abstracts published between 1998 and 2007 that mention BRCA1 together with BRCA2, TP53, RAD51, ATM, FANCD1 and CHEK2; recognized entities are highlighted]
Figure 8.2: iHOP result for the BRCA1 gene. Different annotation lists are used (diseases, chemical compounds, Gene Ontology terms) and indicated. The web interface makes use of hover-over effects and all information is hyperlinked.

On the other hand, the interface does not help in giving a complete overview or in prioritizing the relationships identified. The hierarchical structure of some keyword lists is not reflected either. This tool gives a very fast impression and performs well in identifying interesting text phrases and individual elements, but it is not designed to generate data summaries or to prioritize.
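When tools such as these are compared, the F-value introduced earlier in this step can be computed against a manually annotated reference set. The sketch below assumes the usual balanced F-measure, i.e. the harmonic mean of precision P and recall R, F = 2PR / (P + R). The function name and the tiny example sets are hypothetical and serve only to show the arithmetic.

```python
def precision_recall_f(predicted, reference):
    """Compare predicted annotations with a reference set of annotations.

    Both arguments are collections of hashable items, for example
    (document ID, gene symbol) pairs.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical annotations: the tool finds two of the four reference pairs
# and wrongly reports the abbreviation "CI" as a gene.
reference = {("doc1", "BRCA1"), ("doc1", "TP53"), ("doc2", "RAD51"), ("doc3", "ATM")}
predicted = {("doc1", "BRCA1"), ("doc1", "TP53"), ("doc2", "CI")}

p, r, f = precision_recall_f(predicted, reference)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Here P = 2/3 and R = 2/4, so F = 4/7; a tool that reports many spurious hits lowers P, while a tool that misses aliases lowers R.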
PolySearch (Cheng et al., 2008) (example in Figure 8.3) is not restricted to a single gene and users can create their own query, but only one of the predefined relationship identification schemes can be used at a time (e.g. given a disease, list all genes associated with it and rank them). PolySearch is able to rank the results but does not create summary views.

[Figure 8.3 content: PolySearch result table with the columns Relevancy Score, PubMed ID, Key Sentences and Full details (color coded text), and a color code legend for Query, Gene/Protein, Disease, Drug, Metabolite and Association Word. The listed hits are publications on E-cadherin in cervical cancer, with relevancy scores ranging from 126 down to 14.]
Figure 8.3: PolySearch results interface: investigating the connection between E-cadherin and cervical cancer. Different identified entities get different colors, with the query terms indicated more prominently. Note the ranking and scoring information.

Novel possible relationships can be detected using Chilibot (Chen and Sharp, 2004), where in the most complex version two lists (of either genes or keywords) are searched for relationships within a group or between terms in the groups. These two approaches cannot use indexation in the relationship detection and are therefore much slower, but they are able to reveal more complex relationships. Chilibot also creates a schematic summary figure of the different relationships identified (co-occurrence, stimulative, inhibitory), with the possibility to color the gene nodes in this graph according to gene expression data.

Which tool to use mainly depends on the research question. The perfect tool for every research question simply does not exist, as all tools have their specific properties: indexing of data, availability and completeness of keyword lists, detection and identification methodologies, techniques for identifying relationships, incorporation of linguistic knowledge and text corpora, speed, and the way of ordering and representing results. The most sophisticated and accurate text mining applications must be fed with positive and negative sets or trained, which often takes too long. In any case, making use of the described techniques and tools is faster and/or more accurate and complete than manual literature searches, and could open possibilities to uncover hidden associations that could not easily be discovered by humans.

Insight example: matching genes and keywords and presenting abstracts (result illustrated in Figure 8.4)

- First of all, based on the list of aliases generated in step 2, a list of textual variants is generated in script 3a. This script basically generates regular expressions, suited for immediate use in Perl.
- In script 3b, the XML file with all abstracts is parsed and analyzed:
  o Author information and publication details are stored.
  o The title and abstract are searched for keywords; cancer types; detection methodologies in methylation research; and gene symbols, aliases and their textual variants. Sentences with both a gene alias and a keyword, or a gene alias and a cancer type, are identified.
  o The results of the identification of single occurrences and of sentences with both an alias and a keyword are stored in a relational MySQL database to enable fast querying and sorting afterwards. This database could for instance be used to query which genes were mentioned in combination with a certain cancer type.
  o Per abstract, an HTML file with JavaScript (for the hover-over effect) and keyword highlighting is created, in order to speed up human revision. The different colors and the highlighting of sentences with both a gene symbol and a keyword drastically improve revision times and accuracy. The use of hyperlinks (to the original abstract in PubMed, to other publications of any of the authors, to GeneCards) dynamically links all these different information sources and allows intuitive navigation.

16820927: Promoter methylation status of the MGMT, hMLH1, and CDKN2A/p16 genes in non-neoplastic mucosa of patients with and without colorectal adenomas.
[Hover-over box for MGMT: O-6-methylguanine-DNA methyltransferase - 10q26.3; Close]
Ye C. Shrubsole MJ. Cai Q. Ness R. Grady WM. Smalley W. Cai H. Washington K
Oncol Rep - 2006
The aberrant methylation of CpG islands is a common epigenetic alteration found in cancers.
The process contributes to cancer formation through the transcriptional silencing of tumor
suppressor genes. CpG island methylation has been observed in aberrant crypt foci (ACF) and
adenomas in the colon, implicating it in the earliest aspects of colon cancer formation. In addition,
some investigators have identified an age-related increase in DNA methylation of the ESR1 locus
in the colon mucosa, suggesting that DNA methylation may be a pre-neoplastic change that
increases the risk of colon adenomas and colon cancer. We investigated the methylation status
in the promoter regions of the CDKN2A/p16, hMLH1, and MGMT genes in human non-neoplastic
rectal mucosa and evaluated whether these methylation markers may predict the presence of
adenomatous polyps in the colon. The promoter methylation patterns of these genes were
examined in rectal biopsies (mucosa samples) of 97 colorectal adenoma cases and 94 healthy
controls using methylation-specific PCR (MSP) assays. Methylation of the MGMT and hMLH1
genes was present in both cases and controls, with a frequency of 12.4% and 18.1% for the MGMT
gene and 12.4% and 11.7% for the hMLH1 gene. The frequency of CDKN2A/p16 promoter
methylation was very rare in normal colorectal tissue with a frequency of approximately 2%.
Overall, no apparent case-control difference was identified in the methylation status of these
genes, either alone or in combination. hMLH1 methylation was more frequently observed among
overweight or obese subjects (BMI>/=25) with an adjusted OR of 3.7 (95% CI=1.0-13.7).
Methylated alleles of the hMLH1 and MGMT genes were frequently detected in normal rectal
mucosa, while the frequency of CDKN2A/p16 methylation detected was very low. The
methylation status of these genes in rectal mucosa biopsies detected by MSP assays may not
distinguish between patients with and without adenomas in the colon.
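The two scripts of this insight example can be approximated by the sketch below: a regular expression is generated that covers the textual variants of a gene alias (as script 3a is described to do), and sentences containing both a gene alias and a keyword are then collected (one of the tasks of script 3b). This is a Python illustration under simplifying assumptions, not the original Perl code: the variant rules (optional hyphen, slash or space, and a Roman numeral for the numbers 1 to 3), the naive sentence splitting, the function names and the sample text are all hypothetical.

```python
import re


def variant_pattern(alias):
    """Compile a regex for variants such as BRCA1, BRCA-1, BRCA/1, BRCA 1, BRCA-I."""
    match = re.fullmatch(r"([A-Za-z]+)[-/ ]?(\d+)", alias)
    if match:
        stem, number = match.groups()
        roman = {"1": "I", "2": "II", "3": "III"}.get(number)
        number_part = f"(?:{number}|{roman})" if roman else number
        body = re.escape(stem) + r"[-/ ]?" + number_part
    else:
        # Aliases that are not "letters + digits" are matched literally.
        body = re.escape(alias)
    # The variant must not be part of a longer word or number.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])")


def cooccurring_sentences(text, gene_patterns, keywords):
    """Return (sentence, genes, keywords) for sentences containing both kinds of term."""
    results = []
    # Naive splitting on sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        genes = [gene for gene, pattern in gene_patterns.items() if pattern.search(sentence)]
        words = [word for word in keywords if word.lower() in sentence.lower()]
        if genes and words:
            results.append((sentence, genes, words))
    return results


brca1 = variant_pattern("BRCA1")

# Invented text for illustration (not a real abstract).
text = (
    "BRCA-1 methylation was observed in breast cancer. "
    "The BRCA/1 locus was sequenced. "
    "Samples were stored at low temperature."
)
hits = cooccurring_sentences(text, {"BRCA1": brca1}, ["methylation", "breast cancer"])
for sentence, genes, words in hits:
    print(genes, words, sentence)
```

A pattern of this kind widens recall but does nothing about ambiguity: an abbreviation that happens to equal a gene alias is still reported as a gene, which is the type of error visible in the figure.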
Figure 8.4: Example of the identification of human genes and their symbols, cancer types, methylation-related keywords and detection methodologies. Note the errors in identification: ACF and CI are detected as genes, while they are abbreviations.

8.5 Step 4: Rank, summarize and present the results

The last step, crucial for usability, is to order and summarize the data and present the results to the user. The visualization of the text analysis must be easy to understand and navigate, yet provide sufficient data, available within a few mouse clicks. The key concepts or the results with the highest support must be listed first. The interface must be self-explanatory, also for first-time users.

It can be a real challenge to present data in a structured form, giving enough detail without losing the overview and ease of navigation. Graphic representations of data are often excellent: they can give overviews and summaries, indicate connections and may be scalable. The presentation of results may also benefit greatly from representations that researchers are familiar with. For example, it may be a good idea to present results related to pathways in a cellular representation indicating membranes, nucleus and organelles. Some commercial packages (such as Ingenuity Pathway Analysis) implement such attractive visualization strategies, as they enable biological researchers to gain insight into the ongoing processes and to formulate new hypotheses more effectively.

Sortable tables, colors, and highlighted and hyperlinked data are often used to visualize the results in a web environment, as users are familiar with these. With newer (web 2.0) technologies such as AJAX, the visualization can be improved even further: hover-over effects, expandable sections with details, or additional filters, without having to refresh the page.

In addition to literature searches, other public data sources could be used to prioritize genes. A nice example is Endeavour (Tranchevent et al., 2008) (example in Figure 8.5): this prioritization tool uses, in addition to text sources, biological pathways, sequence motifs, protein interactions, regulatory modules and expression data. All these data are used to create a statistical model that is able to rank genes for each of these aspects. In the end, a global ranking is presented.

Figure 8.5: Candidate gene prioritization visualized in Endeavour. A single candidate gene (given a unique background color) is ranked according to different data sources, including literature data. A global ranking order is then determined: this is the prioritized candidate gene list.
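The idea of combining per-source rankings into one global ranking can be illustrated with a deliberately simple rule: the mean rank over all data sources. This is a stand-in chosen for brevity and is not Endeavour's statistical model, which is not described in this text; the function name, the source names and the gene labels G1 to G3 are placeholders.

```python
def global_ranking(rankings):
    """Combine rankings per data source into one list ordered by mean rank.

    rankings maps a source name to a list of genes, best candidate first.
    A gene missing from a source receives the rank just after its last entry.
    """
    genes = set().union(*rankings.values())
    mean_rank = {}
    for gene in genes:
        ranks = [
            ordered.index(gene) + 1 if gene in ordered else len(ordered) + 1
            for ordered in rankings.values()
        ]
        mean_rank[gene] = sum(ranks) / len(ranks)
    # Ties are broken alphabetically to keep the result reproducible.
    ordered_genes = sorted(genes, key=lambda gene: (mean_rank[gene], gene))
    return ordered_genes, mean_rank


# Placeholder rankings for three data sources.
rankings = {
    "literature": ["G1", "G2", "G3"],
    "expression": ["G2", "G1", "G3"],
    "pathways": ["G2", "G3", "G1"],
}
order, scores = global_ranking(rankings)
print(order)
```

G2 is ranked first because it leads in two of the three sources, even though the literature source alone would have favored G1; this is the kind of correction that additional data sources can bring to a purely text-based prioritization.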
8.6 Discussion

In this review we discussed four of the major mechanisms that most biological text mining systems share. Most applications are either completely focused on one particular theme in the literature or applicable in a broader range of biological and medical contexts. The more specialized the application, the better the identification or training methodologies can be implemented, while the cost of generalization is a loss in recall or specificity. This is probably the reason why so many applications and databases have been created that rely on text mining efforts in a single (more or less narrow) research area (Fang et al., 2008; Shtatland et al., 2007; Lee et al., 2008; Gajendran et al., 2007). Most of these databases are created by people with a biological background, rather than by researchers with computational or linguistic expertise, and thus use rather simple text mining approaches instead of complicated models. Despite the lack of these technologies, these databases perform well, as the knowledge of experts in the field greatly adds to recall and specificity. Instead of having to train statistical models, the database is annotated and complemented by experts, who mainly benefit from simple keyword highlighting and from improved navigation through the abstracts and the hyperlinked information.

The perfect application for a given research question most probably does not exist, but the available web-based tools often provide a very good basis for analysis, certainly for researchers who would otherwise perform a manual literature search. Apart from the querying, identification and ordering mechanisms in the background, the presentation of the results is a key feature. The use of dynamic web technologies with carefully chosen colors and intuitive navigation, enabling both general overviews and detailed information, can make the generated data and knowledge accessible.

This review shows some of the advantages and the power of text mining approaches and of the visualization of the results. Techniques to automatically analyze full-text articles and to distil information out of tables and figures are still in development and would increase the available data even further. However, full-text versions of articles are not accessible in a standardized way: there is not one single location where they are stored and some journals require registration. The recent move to open access articles and the implementation of the DOI (Digital Object Identifier) system might speed up full-text searches.

8.7 Conclusion

Biological text-mining is becoming necessary in order to analyze literature results. With the availability of diverse biological databases and web tools, it has become feasible to mine biological texts automatically. Setting up a tailored text-mining approach to answer biological questions in a certain context can help to gain insight into existing knowledge faster and to generate novel research hypotheses. Different web-based text-mining tools are available: some are highly focused on one particular area, while others can be applied in broad biological contexts. Some tools only use co-occurrence and other relatively simple detection principles, while others implement linguistic knowledge; the visualization of the results ranges from simple color highlighting to web 2.0 enabled diagrams. The ideal text-mining application for all specific research questions probably does not exist, but with some modifications, results can be obtained quickly and accurately.
Chapter 9: PubMeth: methylation database in cancer

Paper 2: PubMeth: a cancer methylation database combining text-mining and expert annotation. Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, Van Criekinge W. As published in Nucleic Acids Research; 36 (Database issue): D842-6. (Open Access)
Epigenetics, and more specifically DNA methylation, is a fast-evolving research area. In almost every cancer type, each month new publications confirm the differentiated regulation of specific genes due to methylation and mention the discovery of novel methylation markers. Therefore, it would be extremely useful to have an annotated, reviewed, sorted and summarized overview of all available data. PubMeth is a cancer methylation database that includes genes that are reported to be methylated in various cancer types. A query can be based either on genes (to check in which cancer types the genes are reported as being methylated) or on cancer types (which genes are reported to be methylated in the cancer (sub)types of interest). The database is freely accessible at http://www.pubmeth.org.

PubMeth is based on text-mining of Medline/PubMed abstracts, combined with manual reading and annotation of preselected abstracts. The text-mining approach results in increased speed and selectivity (as, for instance, many different aliases of a gene are searched at once), while the manual screening significantly raises the specificity and quality of the database. The summarized overview of the results is very useful when several genes or cancer types are searched at the same time.
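The two query directions mentioned above (from genes to cancer types and from cancer types to genes) can be pictured with a small relational sketch. The table layout, column names and rows below are hypothetical placeholders and do not reflect the actual PubMeth schema or content; Python's built-in sqlite3 module is used only to keep the example self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with one row per (reference, gene, cancer type)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report (ref TEXT, gene TEXT, cancer_type TEXT)")
    rows = [
        ("ref1", "GENE_A", "cancer type X"),
        ("ref2", "GENE_A", "cancer type X"),
        ("ref3", "GENE_A", "cancer type Y"),
        ("ref4", "GENE_B", "cancer type X"),
    ]
    conn.executemany("INSERT INTO report VALUES (?, ?, ?)", rows)
    return conn


def cancer_types_for_genes(conn, genes):
    """Gene-based query: in which cancer types is each gene reported, and how often?"""
    marks = ",".join("?" * len(genes))
    sql = (
        "SELECT gene, cancer_type, COUNT(DISTINCT ref) AS n FROM report "
        f"WHERE gene IN ({marks}) "
        "GROUP BY gene, cancer_type ORDER BY gene, n DESC, cancer_type"
    )
    return conn.execute(sql, list(genes)).fetchall()


def genes_for_cancer_types(conn, cancer_types):
    """Cancer-type-based query: which genes are reported, and how often?"""
    marks = ",".join("?" * len(cancer_types))
    sql = (
        "SELECT cancer_type, gene, COUNT(DISTINCT ref) AS n FROM report "
        f"WHERE cancer_type IN ({marks}) "
        "GROUP BY cancer_type, gene ORDER BY cancer_type, n DESC, gene"
    )
    return conn.execute(sql, list(cancer_types)).fetchall()


conn = build_demo_db()
print(cancer_types_for_genes(conn, ["GENE_A"]))
print(genes_for_cancer_types(conn, ["cancer type X"]))
```

Grouping and counting the supporting references is what turns a list of individual reports into the summarized overview that becomes useful when several genes or cancer types are queried at once.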
9.1 Introduction

DNA methylation represents a modification of DNA by addition of a methyl group to a cytosine, also referred to as the fifth base (Doerfler et al., 1990). This reaction uses S-adenosyl-methionine as a methyl donor and is catalyzed by a group of enzymes, the DNA methyltransferases (DNMTs). In humans and other mammals, this epigenetic modification is almost exclusively imposed on cytosines that precede a guanosine in the primary DNA sequence (often called a CpG dinucleotide). The frequency of these CpGs in the genome is much lower than would be expected, as a methylated cytosine is often subject to deamination, thereby forming thymidine. However, in some regions, dense clusters of CpGs can be identified: these regions are referred to as CpG islands (Herman and Baylin, 2003).

DNA methylation is an epigenetic change: it does not alter the primary DNA sequence and might contribute to overall genetic stability and maintenance of chromosomal integrity. Consequently, it facilitates the organization of the genome into active and inactive regions with respect to gene transcription (Robertson, 2002). Genes with CpG islands in their promoter region are generally unmethylated in normal tissues. Upon DNA hypermethylation, transcription of the affected genes may be blocked, resulting in gene silencing. In neoplasia, abnormal patterns of DNA methylation have been recognized. Hypermethylation is now considered one of the important mechanisms resulting in silencing expression of tumour suppressor genes, i.e. genes responsible for control of normal cell differentiation and/or inhibition of cell growth. In the last few years, new hypermethylated biomarkers have been used in cancer research and diagnostics (Esteller, 2003).

MethDB (Amoreira et al., 2003), one of the few databases that focus on DNA methylation, is general and sample oriented. It is not optimized for cancer-related queries, because this type of query requires a summarized overview, whereas in MethDB querying multiple genes or cancer types is not supported and data is always handled as a separate sample. Another database, MethPrimerDB (Pattyn et al., 2006), focuses on detection methodologies (e.g. MSP primer design). Both databases discussed here depend on submissions by administrators or users, which guarantees the required quality of the databases, but consequently they are not always complete and up to date. Neither database is designed to rank and summarize cancer-related information (genes and cancer (sub)types involved), although this is crucial in applied methylation research in the cancer field.
Here we present PubMeth, a database that combines a text-mining approach (fast, able to search multiple aliases and textual variants of these aliases, and able to query multiple keyword lists at once) with a manual reviewing and annotation step. The latter drastically improves specificity and annotation quality. The interface is able to rank, summarize and represent data, making the information in the database easily accessible. The reviewing step in turn depends heavily on the text-mining step, which sorts abstracts, highlights terms and provides links to different sources. In this way, the reviewing step can be done quickly and accurately enough to process all abstracts electronically published in PubMed until now. In addition, this approach makes an update strategy easier to implement.

DNA methylation in cancer has evolved into a mainstream research topic. Methylation profiles are successfully used in early detection and personalized treatment. However, more and more data are becoming available, especially with the advent of large-scale screening techniques. All this information taken together determines our knowledge of the 'cancer methylome'. Ultimately, the epigenome of all cancer tissues, including those of different stage and grade, could be mapped out. Epigenetic
states differ widely among tissues, and changes are far more varied and much more frequent per tumor than DNA mutations. "Each differentiated cell has a different epigenome," said Jones (Garber, 2006). From this perspective, it is very useful to extract from the literature which genes have already been reported in which cancer types. This information can serve as positive controls, can prompt testing of the same genes in other (related) cancer types, can be used to screen for markers for early diagnosis or personalized medicine, and can deepen the knowledge of the mechanisms of methylation.

PubMeth aims to contain and summarize as much of the available literature data as possible and presents them in an easy-to-use graphical interface. It speeds up the search for relevant literature: many aliases and keywords are searched at the same time, and the results are reliable because they are manually reviewed, as one would do in a manual literature search.

9.2 Filling up the database

Abstracts related to epigenetics and methylation are downloaded in XML format through NCBI E-Utils (E-fetch) using more than 15 methylation-related keywords (such as methylation, DNA-methylation, methylated, epigenetic and a range of variants, as well as detection technologies). The aliases, symbols and descriptions of human genes associated with an Ensembl ID are obtained using a Perl script. This script queries the GeneCards database (Rebhan et al., 1998), which already combines different genetic databases such as Ensembl and Entrez Gene. Different textual variations of all aliases are generated to be as complete as possible (e.g. variants for BRCA1 include BRCA 1, BRCA-1, BRCAI, BRCA I and BRCA-I). To avoid counting and highlighting aliases that are also common English words, an alias is rejected if it retrieves more than 100,000 PubMed abstracts. A list of frequently used English words (http://www.wordcount.org) is searched at the same time.

Cancer-related keywords were obtained from a list of the National Cancer Institute (http://www.cancer.gov/cancertopics/alphalist), and keywords related to detection methodologies were compiled manually.

One by one, abstracts are searched for aliases and their variants, for methylation-related keywords, and for sentences containing both an alias and such a keyword. In addition, terms related to cancer and detection methodologies are highlighted and counted. This information is stored in a MySQL 5 relational database using Perl-DBI.

Based on the information in this database, abstracts are ranked. The ranking is based on a large number of parameters, such as the number of aliases, the number of different genes, the number of different aliases per gene, the number of sentences with both an alias and a methylation-related keyword, and the presence of detection-methodology and cancer-related keywords.

Abstracts are then manually reviewed in ranked order. Reviewing is aided by highlighting the different keyword lists, the aliases and the sentences with both an alias and a methylation-related keyword in different colours. Aliases are linked to gene information using hover-over effects generated with JavaScript and CSS. After manual reviewing, the information in the database only has to be minimally updated or corrected. A schematic overview of the complete process is given in Figure 9.1. This process is still in progress; thanks to the ranking system, the most important publications are already in the database. The remaining abstracts will be reviewed soon, and an accurate update strategy will be developed.
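The alias handling and sentence counting described in this section can be illustrated with a minimal Python sketch. It is not the original implementation (which used Perl scripts and a MySQL database): the function names, the Roman-numeral table and the naive substring matching are assumptions made for illustration. Only the BRCA1 variants and the 100,000-abstract rejection threshold come from the text.

```python
import re

# Assumed mapping from trailing digits to Roman numerals (illustrative only).
ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}


def alias_variants(alias):
    """Return spelling variants of an alias made of letters followed by digits."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", alias)
    if not match:
        return [alias]
    stem, number = match.groups()
    numerals = [number]
    if number in ROMAN:
        numerals.append(ROMAN[number])
    # Combine every numeral form with no separator, a space and a hyphen.
    return [stem + sep + num for num in numerals for sep in ("", " ", "-")]


def is_usable_alias(pubmed_hits, limit=100_000):
    """Reject aliases that retrieve more than `limit` PubMed abstracts."""
    return pubmed_hits <= limit


def count_cooccurring_sentences(abstract, aliases, keywords):
    """Count sentences containing at least one alias and one keyword.

    Matching is a case-insensitive substring test, which is cruder than
    real tokenization but enough to show the idea.
    """
    sentences = re.split(r"(?<=[.!?])\s+", abstract)
    count = 0
    for sentence in sentences:
        lowered = sentence.lower()
        has_alias = any(a.lower() in lowered for a in aliases)
        has_keyword = any(k.lower() in lowered for k in keywords)
        if has_alias and has_keyword:
            count += 1
    return count
```

Counts such as the one returned by `count_cooccurring_sentences` are the kind of parameter the ranking step could combine; the text does not specify how the parameters are weighted, so no scoring formula is shown here.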
[Figure 9.1 (schematic). Panels: Abstracts (PubMed abstracts retrieved through NCBI E-utils (E-Fetch), associated with methylation-related keywords or textual variants); Gene variants (aliases, symbols, names, descriptions and textual variants from GeneCards); Methylation (methylation-related keywords and textual variants); Cancer (cancer-related keywords and textual variants from NCI); Detection (detection-related keywords and textual variants); Highlighting and annotation (example abstract, PubMed ID 12679904, "Mutation and methylation of hMLH1 in gastric carcinomas with microsatellite instability", with hover-over gene information for hMLH1: mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli), 3p22.3); Counting, storing and sorting (counters for aliases, different genes, different keyword types, sentences with both an alias and a keyword, …; MySQL DB); Sorting, manual review of highlighted abstracts.]
Figure 9.1: Scheme illustrating the initial filling of the database using text mining. Aliases of genes and the different keyword lists (methylation-, cancer- and detection-related) are highlighted in the abstract. At the same time, different parameters are counted and stored in a MySQL relational database. Afterwards, the data are ranked and manually reviewed.

9.3 Querying the database

A record in the database contains information about the source publication, the gene, the cancer type and, if specified, its subtypes. It includes the number of primary cancer samples in which methylation was analyzed, as well as the number of analyzed cell lines and the number of normal tissues. For each of these three categories the methylation frequency (the percentage of samples that show methylation) is also available. Other information includes the detection technologies used and an 'evidence sentence' from which most of the information in the record was taken.
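As an illustration of the record layout just described, the following Python sketch models one record with its three sample categories. The class and field names are hypothetical and do not reflect the actual PubMeth database schema; the example values are taken from the gastric carcinoma abstract shown in Figure 9.1 (68 cases, hMLH1 promoter methylation detected in 11).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleGroup:
    """Samples analyzed in one category and how many showed methylation."""
    analyzed: int = 0
    methylated: int = 0

    @property
    def frequency(self) -> Optional[float]:
        """Methylation frequency in percent, or None if nothing was analyzed."""
        if self.analyzed == 0:
            return None
        return 100.0 * self.methylated / self.analyzed


@dataclass
class MethylationRecord:
    """One literature record: a gene in a cancer type, from one publication."""
    pubmed_id: int
    gene: str
    cancer_type: str
    subtypes: List[str] = field(default_factory=list)
    primary: SampleGroup = field(default_factory=SampleGroup)
    cell_lines: SampleGroup = field(default_factory=SampleGroup)
    normals: SampleGroup = field(default_factory=SampleGroup)
    technologies: List[str] = field(default_factory=list)
    evidence: str = ""


# Hypothetical record built from the example abstract of Figure 9.1.
record = MethylationRecord(
    pubmed_id=12679904,
    gene="hMLH1",
    cancer_type="gastric",
    primary=SampleGroup(analyzed=68, methylated=11),
    technologies=["MSP"],
    evidence="Methylation of hMLH1 promoter was detected in 11 (16.2 %) gastric cancer",
)
```

Storing counts rather than percentages keeps the frequency derivable and makes it possible to pool samples across publications later.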
PubMeth can be queried using the web interface at http://www.pubmeth.org in two ways, depending on the researcher's focus:
• Gene-related: in which cancer types (and subtypes) are the genes of interest reported to be methylated?
• Cancer-related: which genes are reported to be methylated in the cancer types/subtypes of interest?

9.3.1 Gene-centric query

A query is created in two easy steps. In the first step, the user provides a list of genes (different identifiers are accepted: gene symbol or name, RefSeq, Ensembl ID, …). The query is analyzed using local symbol/alias lists, generated using GeneCards, and suggestions are presented to the user. In the second step, the user reviews the selections made (the genes pre-selected by the intelligent sorting in the background are most likely correct) and submits the final choice.

At this point, the results are generated and the main result page is presented to the user. This page ranks the genes by the number of references to each gene in the database. A graphical summary of the number of references, the number of primary samples and the mean methylation frequency within different cancer types is also given (example in Figure 9.2).

The summary is very useful if multiple genes are searched at once; this feature is what distinguishes this database from previous efforts. One practical usage example is a pharmacologic demethylation approach in cell lines that yields 50 candidate genes. The question then is which genes to verify in primary cancer samples, a sub-selection that is often based on time-consuming literature searches. This selection is facilitated by the summarization view of PubMeth.

From this main page, one can go to the detailed pages, which focus on a selected gene in a certain cancer type. On such a detailed page, graphical representations of the number of references in the database, the total number of samples and the mean methylation frequency are displayed for the different cancer types and their subtypes. The complete individual records, linked to their original PubMed records, are shown.

Users can also choose to browse a pre-computed gene list. The advantage is that the user can browse all genes in PubMeth without having to query the database, which is significantly faster. However, the summary view is then not available.
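The summary view described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not PubMeth's code: the tuple layout, the function names and the example records are hypothetical, and the mean methylation frequency is assumed to be the unweighted mean of the per-publication frequencies (the text does not say whether it is weighted by sample number).

```python
from collections import defaultdict


def summarize(records):
    """Aggregate records per (gene, cancer type).

    records: iterable of (gene, cancer_type, n_primary_samples, frequency_percent),
    one tuple per literature reference.
    """
    cells = defaultdict(list)
    for gene, cancer, n_samples, freq in records:
        cells[(gene, cancer)].append((n_samples, freq))
    summary = {}
    for key, rows in cells.items():
        summary[key] = {
            "references": len(rows),
            "samples": sum(n for n, _ in rows),
            # Assumption: unweighted mean over references.
            "mean_frequency": sum(f for _, f in rows) / len(rows),
        }
    return summary


def rank_genes(records):
    """Order genes by their number of references, most referenced first."""
    references = defaultdict(int)
    for gene, *_ in records:
        references[gene] += 1
    return sorted(references, key=lambda g: (-references[g], g))


# Hypothetical example records.
example = [
    ("MLH1", "gastric", 68, 16.2),
    ("MLH1", "gastric", 32, 25.0),
    ("MLH1", "colorectal", 50, 40.0),
    ("CDKN2A", "gastric", 40, 30.0),
]
```

Each entry of `summarize(example)` corresponds to one coloured cell of the summary page: the colour would encode `mean_frequency` and the number shown would be `samples`.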
[Figure 9.2 (heat map). Legend, methylation frequency: 0; 0-20 %; 20-40 %; 40-60 %; 60-80 %; 80-100 %. Cancer types shown: breast, cervical, cardial, oesophageal, liver, salivary gland, gall bladder, oral, ovarian, brain, neuroblastoma, neuroendocrine, mucoepidermoid carcinoma, lymphoma, gastric, endocrine, kidney, pancreas, colorectal, leukaemia, mesothelioma, nasopharyngeal, skin, small bowel, bone, larynx, Wilms tumour, prostate, bile duct. The numbers in the cells are numbers of primary samples.]

Figure 9.2: Summary page of a gene-centric query. The different colors represent the frequency of methylation of the gene in the different cancer types (the percentage of samples that showed methylation), while the numbers indicate the total number of primary samples tested for methylation.

9.3.2 Cancer-centric query
A cancer-centric query is executed in one easy step: the user selects cancer types and/or subtypes, up to three levels deep (e.g. lymphoma > non-Hodgkin lymphoma > B-cell lymphoma > diffuse large B-cell). An overview is returned, in the same style as for the gene-centric search, of the genes most commonly described as methylated in the selected cancer types, together with the total number of samples and the mean methylation frequency. From this summary page, the user can navigate to the detailed pages.

This type of search is meant to give a quick overview of the genes reported in a methylation context in the cancer (sub)types of interest, and at which frequency; to explore methylation in those cancer types; to compare experimental results with; or to perform, as a next step, a gene-centric search on these genes for full details in all cancer (sub)types.

A screencast that shows how to query PubMeth is available on the PubMeth website.

9.4 Performance of PubMeth, discussion and future

To evaluate the performance of PubMeth, we tested how well the database performed in comparison with a careful manual literature search. For this purpose, we selected a recent review focusing on DNA methylation in breast cancer (Agrawal et al., 2007). This article contains a table listing 39 genes known to be hypermethylated in breast cancer, with their literature references. The genes in this list were entered into the gene-centric search of PubMeth: 27 genes are listed in PubMeth, 11 are not listed and 1 gene could not be associated with a gene symbol. Of the 27 genes listed in PubMeth, 20 are described in breast cancer. Breast cancer is listed first on the results page due to the background sorting mechanisms, but 18 genes are not associated with breast cancer in PubMeth (the 11 genes that are not listed plus the 7 that are listed only for other cancer types).

On the other hand, the review article lists 39 different genes, while a cancer-centric search for breast cancer returns 94 genes. It is important to mention that the genes present both in PubMeth and in the review (the shared group) are associated with a high number of primary samples in PubMeth. If all 94 genes are ranked by decreasing number of primary samples, most members of the shared group are at the top of the ranking: almost the complete top 10 is present in the shared group (except numbers 7 and 8).

This example clearly shows the power of PubMeth as well as its weaknesses. First of all, doing such a literature search manually usually takes multiple hours, while PubMeth presents its summary within minutes. In most cases PubMeth is able to find more references than a manual search would, thanks to the different keyword and alias lists. On the other hand, abstracts often do not mention any of the genes in question, and such abstracts are not considered by PubMeth. Examples are large-scale studies with multiple genes, and reviews. These articles are generally easily found by manual searches, but not by our text-mining approach, which is only able to screen abstracts. As long as there is no universal or centralized system for screening full-text articles, or a mainstream open access strategy, one solution would be to drop the restriction that the abstract has to contain a gene and to perform more general searches. Another possibility is to allow users to enter suggestions for inclusion in PubMeth; such a submission system would combine the power of both submissions by users and an
automated text-mining approach, which has proven very powerful in dealing with different keyword lists and gene name variants. Such a submission form is available on the PubMeth website: articles related to DNA methylation in cancer that could not be found with PubMeth can be suggested for inclusion.

Currently, PubMeth focuses only on hypermethylation; however, the inclusion of hypomethylated genes would be useful for some users as well. In a next update, keywords related to hypomethylation will be added. Other future database updates should take into account the different originating tissues (for example, clearly separating primary cancer tissue from serum) and the different types of normals (surrounding tissue in a tumor patient versus samples from a healthy person). However, different articles use different terminologies and often
this information is not easily extractable.

Improvements to the interface should reflect the separation of samples described above. Currently, only the mean methylation frequency in primary cancers is given. This could be extended to give an idea of the degree of variation between the different experiments and methods, the difference between cancer and normal tissue, and the frequency in cell lines. However, it is a real challenge to present all this useful information in a clear interface that is easy to survey and browse.

9.5 Acknowledgments

The authors would like to thank all initial test users of PubMeth for their detailed comments and suggestions for improvement. Many thanks to all the people who helped to correct and improve the manuscript.

Chapter 10: Conclusion

- [The phone is ringing. Roy is drinking coffee and licking doughnut sugar from his fingers, purposefully delaying for as long as possible before he answers it]
- Roy: [answers phone] Hello IT. Have you tried turning it off and on again? ... OK, well, the button on the side. Is it glowing? ... Yeah, you need to turn it on. Err, the button turns it on. [Moss enters and tosses Roy a muffin]
- Roy: Yeah, you do know how a button works, don't you? No, not on clothes. [Moss's phone rings. He answers it.]
- Moss: Hello IT. Yuhuh. Have you tried forcing an unexpected reboot?
- Roy: No, there you go, I just heard it come on. No, that's the music you hear when it comes on. No, that's the music you hear when... I'm sorry, are you from the past?
- Moss: You see the driver hooks a function by patching the system call table so it's not safe to unload it unless another thread is about to jump in there and do its stuff. And you don't want to end up in the middle of invalid memory. [laughs] Hello?
Situation in "The IT-crowd" (2006)

DNA methylation, and epigenetics in general, is a 'hot topic' in the scientific literature. It is now clear that in some cancer types epigenetic changes are much more common than mutations and other genetic alterations. However, literature data are not accessible in a standardized way. One way to overcome this problem is to use text recognition and mining tools in combination with publicly available thesauri (lists of words, synonyms, gene ontology terms, cancer-related keywords, …).

Being able to search these datasets automatically gives quick insight into the current state of the research, allows new hypotheses to be formulated better and faster, avoids repeating tests that have already been performed and allows confirmed data to be used as positive controls.

With PubMeth, we were able to make methylation literature data searchable and to present the results in an interface that is handy and easy to survey. This knowledge base can be used to speed up routine literature searches and to drive research hypotheses forward (positive/negative controls, enrichment and clustering analysis, functional insights).

Part 3: Genome-wide selection of methylation markers

Genome-wide selection and discovery of DNA methylation markers. DNA methylation markers are so-called biomarkers that can be used to discover cancer cells at an early stage, can be detected using non-invasive methods or can be used to predict response to therapies. Careful initial experimental set-up and selection procedures can increase the success rate of the experimental validation and select markers with better specificity and selectivity.

Chapter 11: Introduction

Dieu me pardonnera, c'est son métier (God will forgive me; it is his trade)
Last words of Heinrich Heine (1856)

A biomarker in cancer research is a feature that can distinguish cancer samples from normal samples, can stage or grade the cancer, or can be used to predict response to treatment. The earlier in cancer progression this marker can be identified, the better. Epigenetics, and DNA methylation in particular, opens new perspectives for finding such biomarkers.

Finding possible methylation biomarkers has become feasible with the rise of high-throughput methodologies. However, some restrictions apply in the search for methylation biomarkers:
• Testing many promoter regions for methylation requires a lot of sample material. Primary cancer samples are often limited, so samples of cancer cell lines are used instead. However, about half of the effects seen in a cell line may be due solely to the fact that it is a cell line
• The effect of demethylating treatments (such as treatment with DAC) can only be identified in cell lines
• Array-based techniques have limited sensitivity as they are limited by diffusion

The problems described above indicate that high-throughput screening methodologies can only be used to list possible marker candidates. These candidates need further validation on primary cancer samples, using more sensitive techniques (such as PCR-based technologies). Also, the sensitivity and selectivity of the biological methylation marker have to be determined in order to make appropriate use of it and to avoid overfitting.

In order to limit the candidate list to a reasonable number of genes to validate and to improve the success rate of this selection, the following strategies can be applied:
• Perform replications or perform the experiment with multiple cancer cell lines
• Try to incorporate primary cancer samples in the initial high-throughput screening steps
• Combine different sources of data: expression results of samples without treatment and samples with different demethylation treatments. Make use of publicly available literature and other data sources
• Find a way to prioritize the candidate list after initial screening in order to increase the success rate of the validation study

These strategies have been applied in the different case studies in this part. The main focus is on intelligent set-up of the large-scale screening studies and on ways to rank and prioritize possible markers for the validation studies. The case studies clearly demonstrate that both the design of the study and the analysis can be optimized to obtain an acceptable success rate in the subsequent validation studies.

As cervical cancer is prominently present in both this part and the following, a brief introduction is given in an intermezzo (originally published in Dutch). Next, several analysis methods are described. The methodologies can mainly be divided into three different strategies:
• Relaxation ranking, applied both in vivo and in silico in cervical cancer
• Genome-wide promoter analysis strategies, shown in silico as applied in various cancer types, including lung, prostate, breast and neuroblastoma
• Ranking methodologies to identify markers that can be used to predict treatment response, applied to platinum therapy in ovarian cancer
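To make the idea of combining untreated and demethylation-treated expression data across several cell lines more concrete, the sketch below ranks genes by the number of cell lines in which they are silent before treatment and expressed afterwards. It is a deliberately simple illustration and not the relaxation ranking or any other method used in the case studies; the function name, the input layout (one pair of present/absent calls per cell line) and the gene names are hypothetical.

```python
def rank_candidates(calls):
    """Rank genes by how consistently they are re-expressed after treatment.

    calls: {gene: [(expressed_untreated, expressed_treated), ...]}
    with one pair of booleans per cell line.
    Returns a list of (gene, score), best candidates first, where the score
    is the number of cell lines showing silencing followed by re-expression.
    """
    scores = {}
    for gene, pairs in calls.items():
        scores[gene] = sum(
            1 for untreated, treated in pairs if not untreated and treated
        )
    # Sort by decreasing score; ties are broken alphabetically for stability.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical calls for three genes in three cell lines.
example_calls = {
    "GENE_A": [(False, True), (False, True), (True, True)],
    "GENE_B": [(False, True), (False, False), (True, True)],
    "GENE_C": [(True, True), (True, True), (True, True)],
}
```

In this toy example `GENE_A` ranks first because it is re-expressed in two of three cell lines, while `GENE_C` is never silenced and scores zero. A real prioritization would also have to account for expression levels, replicate variability and literature evidence.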
Chapter 12: Intermezzo: DNA methylation markers aid early detection of cervical carcinoma

Paper 3: DNA methylatiemerkers helpen vroegtijdige opsporing van cervixcarcinoom (DNA methylation markers aid early detection of cervical carcinoma). Van Criekinge W, Ongenaert M, van der Zee AGJ, Wisman GBA, Kridelka F. As published in De agenda Gynaecologie – oncologie, October 2008, p. 12-14.

Since cervical cancer (cervical carcinoma) is mentioned several times in this thesis, and from different perspectives, it is useful to put this cancer type in context. Cervical cancer was recently in the news because several vaccines have been developed against a strongly implicated virus (HPV).

Cervical carcinoma is the second most common cancer in women worldwide and even the most frequent one in developing countries. Regular population screening has led to a strong decline in mortality in the Western world. Screening consists of a cytological examination of the cervix, the classical smear or PAP test, which aims to detect asymptomatic pre-malignant abnormalities of the cervix: the so-called low-grade squamous intraepithelial lesions (LSILs) and high-grade SILs (HSILs). The sensitivity of this test (30%-87%) is, however, far from optimal for HSIL and cervical cancer (Nanda et al., 2000). As a result, HSILs and cervical carcinomas will be missed and therefore not treated. The relatively slow carcinogenesis and regular screening largely compensate for this problem. Nevertheless, cervical carcinomas are regularly diagnosed in patients whose preceding cervical smear had been scored as normal.

Because there is a strong association between human papillomavirus (HPV) and the development of cervical carcinoma (Bosch et al., 2002), several countries are currently investigating whether an HPV test, with or without a smear, is useful (Bulkmans et al., 2007). More than 100 different types of HPV exist, 15 of which are characterized as high-risk HPV (hr-HPV), meaning that these are the types that can cause cervical cancer. Several studies have shown that testing for hr-HPV DNA yields a considerably higher sensitivity than cytology, namely more than 95%. This does, however, come with a reduced specificity. Most women become infected with HPV during their lifetime, but can also clear the virus spontaneously. Screening for hr-HPV is therefore not specific, because women will test positive who have no (pre)malignant abnormality of the cervix and will not develop one either. The more young women the screened population contains, the lower this specificity will be, owing to the known high incidence of hr-HPV among young, sexually active women (Kulasingam et al., 2002).

Recently, two HPV vaccines have become available, which will probably be used on a large scale in Europe in the short term (Harper et al., 2004). Since the target group for vaccination initially consists of girls between 10 and 13 years of age, the first effects on the occurrence of (pre)malignant abnormalities of the cervix will not appear for 20-25 years, which also means that until then there are no consequences for screening. Even after that, screening for (pre)malignant abnormalities will very probably remain necessary, because the two vaccines currently protect only against hr-HPV 16 and 18, which together are responsible for only 75% of cervical cancers in Europe (Smith et al., 2007).

Given the continuing need for cervical carcinoma screening and the shortcomings of both the PAP test and the hr-HPV test, the development of new biomarkers with very high sensitivity and specificity could add value to population screening for cervical cancer. The shortcoming of the very sensitive hr-HPV screening test could also be overcome by combining this test, when positive, with a very specific triage test.

The analysis of DNA methylation, an epigenetic process, may be better suited to predicting the risk of cervical carcinoma. Methylation is the attachment of a methyl group (CH3) to the cytosine (C) molecule in the DNA sequence when it is followed by a guanine (G) molecule. When numerous CG dinucleotides occur close together in the sequence, this is called a CpG island. Methylation of CpG islands in the promoter region of a gene (about half of all genes have a CpG island) switches off RNA transcription, which is called "gene silencing". The gene will then no longer be transcribed and hence no longer translated into protein. This often concerns the inactivation of tumour suppressor genes, resulting in deregulated cell division. Methylated gene promoters have been demonstrated in many different tumours, as well as in precursor stages of tumours. DNA methylation changes are also tissue- and tumour-type specific. DNA methylation is therefore an extremely interesting biomarker for the (early) detection of (pre)malignant abnormalities.

[Figure 12.1 (schematic). Normal: promoter and gene region, protein expression. Cancer: methylated (M) promoter and gene region, blocked protein expression.]

Figure 12.1: Methylation of a gene promoter blocks gene expression and protein production.

Several cervix-specific methylated gene promoters have already been described in the literature and can be detected with the methylation-specific PCR (MSP) method in smears from patients with (pre)malignant abnormalities of the cervix. In a first preliminary study, in which smears were collected from healthy women undergoing hysterectomy (control smears) and from cervical carcinoma patients, four methylation markers were found to have a sensitivity for cancer detection of 89% at a defined specificity of 100% (Wisman et al., 2006). This sensitivity was comparable to that of hr-HPV detection and cytomorphology.
[Figure 12.2 (schematic)]
                                        Methylated CpG    Unmethylated CpG
Original sequence                       CGACGCGCGCCGC     CGACGCGCGCCGC
Step 1: chemical treatment              CGACGCGCGUCGU     UGAUGUGUGUUGU
Step 2: PCR with methylation-specific   GCTGCGCGCAGCA     GCTGCGCGCAGCA
        primers
Result                                  PCR product       No PCR product
Figure 12.2: Methylation-specific PCR (MSP) method: DNA is first chemically modified (with bisulfite), whereby an unmethylated cytosine (C) is converted to uracil (U) while a methylated C remains unchanged. The methylation profile is then detected by means of a PCR reaction with primers specific for the methylated sequence.

Nevertheless, this four-marker combination still did not detect all cervical carcinomas. A number of newly identified cancer-specific methylation markers are therefore currently being studied further. The strategy in our research made use of cervical carcinoma cell lines, either untreated or treated with demethylating agents. After treatment with these demethylating agents, genes that were transcriptionally inactive due to methylation are expressed again (re-expression). The difference in RNA expression was determined by microarray analysis, coupled to a biostatistical analysis to identify methylation markers. These markers were functionally evaluated on a screening platform and ranked in a methylation table according to their differential methylation pattern between cancer and normal cervical tissue.

Work is currently in progress on the verification and validation of these methylated genes. The diagnostic value of detecting methylated genes is being determined in large numbers of patients who were referred to the gynaecology outpatient clinic (Polikliniek Gynaecologie) of the UMCG (Groningen, the Netherlands) and of the ULg (Liège, Belgium) because of an abnormal smear. If the results of this analysis are satisfactory, much larger series of smears, collected in the context of cervical carcinoma screening, will subsequently be analysed. In this way, a sensitive and specific test based on methylation markers will ultimately be developed, which can be used as a screening method for (pre)malignant abnormalities of the cervix or as a triage test for hr-HPV-positive smears.
1. Microarray experiments: re-expression data from cervical carcinoma cell lines
2. Biostatistical analysis + literature: selection of 232 genes
3. Assay design tool: 424 MSP assay designs
4. High-throughput MSP platform: 79170 MSP results
5. Data analysis and interpretation: ranking of the 424 markers
6. Methylation table
Figure 12.3: scheme for identifying candidate methylation markers
[Figure 12.4 plot: ranked assays (X-axis) against cancer and normal samples (Y-axis)]
Figure 12.4: methylation table. 87 methylation profiles of cervical cancer samples (top) versus 114 normal cervix samples (bottom). Samples are shown along the Y-axis; the methylation profile across the 424 markers (X-axis) of each individual sample is shown in the horizontal rows. Markers that discriminate best between cancers and normals (high sensitivity, high specificity) are placed furthest to the left, with discriminating power decreasing towards the right. Red squares show methylated results, green squares show unmethylated results and white squares show invalid results.

Part of the biostatistical analysis referred to in this intermezzo is based on Chapter 13: Discovery of methylation markers in cervical cancer, using relaxation ranking, which deals with methylation biomarkers in cervical cancer. The methodologies are described further in Chapter 15: Genome-wide promoter analysis uncovers portions of the cancer methylome and Chapter 20: Cervical cancer and the HPV family of viruses.

Chapter 13: Discovery of methylation markers in cervical cancer, using relaxation ranking

Paper 4: Discovery of methylation markers in cervical cancer, using relaxation ranking
Ongenaert M, Wisman GBA, Volders HH, Koning AJ, van der Zee AGJ, Van Criekinge W, Schuuring E. As published in BMC Medical Genomics, 1:57. (Open Access)
Background: to discover cancer-specific DNA methylation markers, large-scale screening methods are widely used. The pharmacological unmasking expression microarray approach is an elegant method to enrich for genes that are silenced and re-expressed during functional reversal of DNA methylation upon treatment with demethylating agents. However, such experiments are performed in in vitro (cancer) cell lines, mostly with poor relevance when extrapolating to primary cancers. To overcome this problem, we incorporated data from primary cancer samples in the experimental design. A strategy to combine and rank data from these different data sources is essential to minimize the experimental work in the validation steps.

Aim: to apply a new relaxation ranking algorithm to enrich DNA methylation markers in cervical cancer.

Results: the application of a new sorting methodology allowed us to sort high-throughput microarray data from both cervical cancer cell lines and primary cervical cancer samples. The performance of the sorting was analyzed in silico. Pathway and gene ontology analysis was performed on the top selection and gives a strong indication that the ranking methodology is able to enrich towards genes that might be methylated. Terms like regulation of progression through cell cycle, positive regulation of programmed cell death as well as organ development and embryonic development are overrepresented. Combined with the highly enriched number of imprinted and X-chromosome located genes, and the increased prevalence of known methylation markers selected from cervical cancer (the highest-ranking known gene is CCNA1) as well as from other cancer types, the ranking algorithm seems to be powerful in enriching towards methylated genes. Verification of the DNA methylation state of the 10 highest-ranking genes revealed that 7/9 (78%) gene promoters showed DNA methylation in cervical carcinomas. Of these 7 genes, 3 (SST, HTRA3 and NPTX1) are not methylated in normal cervix tissue.

Conclusion: the application of this new relaxation ranking methodology allowed us to significantly enrich towards methylated genes in cancer. This enrichment is shown both in silico and by experimental validation, and revealed novel methylation markers as proof of concept that might be useful in early cancer detection in cervical scrapings.
13.1 Introduction

DNA methylation represents a modification of DNA by addition of a methyl group to a cytosine, also referred to as the fifth base (Doerfler et al., 1990). This epigenetic change does not alter the primary DNA sequence and might contribute to overall genetic stability and maintenance of chromosomal integrity. Consequently, it facilitates the organization of the genome into active and inactive regions with respect to gene transcription (Robertson, 2002). Genes with CpG islands in the promoter region are generally unmethylated in normal tissues. Upon DNA hypermethylation, transcription of the affected genes may be blocked, resulting in gene silencing. In neoplasia, hypermethylation is now considered one of the important mechanisms that silence the expression of tumour suppressor genes, i.e. genes responsible for control of normal cell differentiation and/or inhibition of cell growth (Serman et al., 2006). In many cancers, various markers have been reported to be hypermethylated (Paluszczak and Baer-Dubowska, 2006).

The detection of DNA hypermethylation was revolutionized by two discoveries. Bisulfite treatment results in the conversion of cytosine residues into uracil, except for the protected methylcytosine residues (Hayatsu, 1976). Based on the sequence differences after bisulfite treatment, methylated DNA can be distinguished from unmethylated DNA using methylation-specific PCR (MSP) (Herman et al., 1996).

In the last few years, hypermethylated biomarkers have been used in cancer research and diagnostics (Esteller, 2003; Esteller, 2007a; Herman, 2005). Presently, DNA hypermethylation of only a few markers is of clinical relevance (Esteller, 2007a). Two classical examples are hypermethylation of MGMT in the prediction of treatment response to temozolomide in glioblastoma (Hegi et al., 2005) and DNA hypermethylation of GSTP1 in the early detection of prostate cancer (Hoque et al., 2005b). The search for markers that are hypermethylated in specific cancer types resulted in a large list of genes, but more recent evidence revealed that many of these markers are methylated in normal tissues as well (Dammann et al., 2005; Wisman et al., 2006; Wisman et al., 2006).

To discover novel markers that are specific for certain stages of cancer with a high specificity and sensitivity, large-scale screening methods were developed, such as Restriction Landmark Genomic Scanning (Costello et al., 2000), Differential Methylation Hybridization (Yan et al., 2000; Strathdee and Brown, 2002; Huang et al., 1999), Illumina GoldenGate® Methylation, microarray-based Integrated Analysis of Methylation by Isoschizomers (MIAMI) (Hatada et al., 2006) and MeDIP in combination with a methylation-specific oligonucleotide microarray (Shi et al., 2003). These approaches demonstrated that large-scale screening has great potential to find novel methylation targets in a whole range of cancers.

To identify cancer-related hypermethylated genes, pharmacological unmasking expression microarray approaches have also proven suitable (Sova et al., 2006; Tokumaru et al., 2004; Yamashita et al., 2002). In this approach, the re-activation of gene expression is studied by microarray analysis during functional reversal of DNA methylation and histone deacetylation in cancer cell lines, using demethylating agents and histone deacetylase inhibitors. This methodology generally results in a list of several hundreds of candidate genes. Although analysis of the promoter (e.g. screening for dense CpG islands) is used to narrow down the number of candidate genes, the list is still too large. The methodology has proven relevant, as its application resulted in the identification of new potentially methylated genes (Guo et al., 2004; Yamashita et al., 2006).

However, the initial large-scale screening approach will also detect many genes that are not direct targets themselves but become re-activated due to the re-expression of, for instance, transcription factors (Cameron et al., 1999). Furthermore, in most studies only re-expression data after demethylation in cell lines were used. Smiraglia and coworkers (Smiraglia et al., 2001) calculated that more than 57% of the loci methylated in cell lines were never methylated in 114 primary cancers of different malignancy types. The small number of cell lines used to identify methylated genes does not allow conclusions to be drawn on the relevance of such cancer-specific genes without testing a large series of primary tumors, which is not done in most studies.

Finally, the completion of the sequence of the human genome provided information on genes, promoter gene structure, CG-content and chromosomal localization. These data are useful to define criteria for the candidate genes to act as appropriate targets for DNA methylation.

To identify genes that are downregulated due to promoter hypermethylation and to enrich for those genes that are most frequently involved in cervical cancer, we performed the following experiments:
• Affymetrix expression microarray analysis on a panel of frozen tissue samples from 39 human primary cervical cancers to identify cancer-specific down-regulated genes
• To select those genes that are hypermethylated in cervical cancer, Affymetrix expression microarray analysis on a panel of 4 different cervical cancer cell lines in which the expression of (hyper)methylated genes was re-activated upon treatment with 5-aza-2'deoxycytidine (DAC) (blocking DNA methylation), and/or trichostatin A (TSA) (inhibiting histone deacetylase, HDAC)
• Data from both approaches were combined, and a novel non-parametrical ranking and selection method was applied to identify and rank candidate genes. Using in silico promoter analysis we restricted the analysis to those candidate genes that carry CpG islands

To validate whether our new approach resulted in a significant enrichment of hypermethylated genes, we compared the first 3000 high-ranking candidate probes with lists of imprinted genes, X-chromosome located genes and known methylation markers. In addition, to investigate whether the promoters of these selected gene probes are hypermethylated and whether this methylation is present in cancer and not in normal tissue, we determined the hypermethylation status of the 10 highest-ranking candidate genes in both cervical cancers and normal cervices using COBRA (COmbined Bisulfite Restriction Analysis). These data revealed a highly significant enrichment of methylated genes.

13.2 Material and methods

13.2.1 Primary cervical tissue samples
For the expression microarray analysis, tissues from 39 early-stage frozen cervical cancer samples were used from a collection of primary tumors surgically removed between 1993 and 2003. All patients were asked to participate in our study during their initial visit to the outpatient clinic of the University Medical Center Groningen (UMCG, Groningen, The Netherlands). Gynecological examination under general anesthesia was performed in all cervical cancer patients for staging in accordance with the International Federation of Gynecology and Obstetrics (FIGO) criteria (Finan et al., 1996). Tumor samples were collected after surgery and stored at -80 °C. The stage of the cervical cancer patients included 33 FIGO stage IB (85%) and 6 FIGO stage IIA (15%). The median age of the cervical cancer patients was 46 years (interquartile range 35-52 years).

For COBRA and BSP (Bisulfite Sequencing PCR), 10 (of the 39) primary cervical cancers and 5 controls (normal cervix) were used. The age-matched normal cervical controls were women without a history of abnormal Pap smears or any form of cancer who were planned to undergo a hysterectomy for benign reasons during the same period. Normal cervices were collected after surgery and histologically confirmed.

Informed consent was obtained from all patients participating in this study. The study was approved by the ethics committee of the UMCG.

13.2.2 Cervical cancer cell lines
Four cervical carcinoma cell lines were used: HeLa (cervical adenocarcinoma, HPV18), SiHa (cervical squamous cell carcinoma, HPV16), CSCC-7 (nonkeratinizing large cell cervical squamous cell carcinoma, HPV16) and CC-8 (cervical adenosquamous carcinoma, HPV45). HeLa and SiHa were obtained from the American Type Culture Collection (ATCC). CSCC-7 and CC-8 (Koopman et al., 1999) were a kind gift of Prof. GJ Fleuren (Leiden University Medical Center, Leiden, the Netherlands). All cell lines were cultured in DMEM/Ham's F12 supplemented with 10% fetal calf serum.

Cell lines were treated for 3 days with low to high dose (200 nM, 1 μM or 5 μM) 5-aza-2'deoxycytidine (DAC), with 200 nM DAC followed by 300 nM trichostatin A (TSA) after 48 hours, or left untreated. Cells were split to low density 24 hours before treatment. DAC was refreshed every 24 hours. After 72 hours, cells were collected for RNA isolation.

13.2.3 RNA and DNA isolation
From the frozen biopsies, four 10-µm-thick sections were cut and used for standard RNA and DNA isolation. After cutting, a 3-µm-thick section was stained with haematoxylin/eosin for histological examination, and only tissues with >80% tumor cells were included. Macrodissection was performed to enrich for epithelial cells in all normal cervices.

For DNA isolation, cells and tissue sections were dissolved in lysis buffer and incubated overnight at 55 °C. DNA was extracted using standard salt-chloroform extraction and ethanol precipitation for high molecular weight DNA and dissolved in 250 µl TE-4 buffer (10 mM Tris; 1 mM EDTA (pH 8.0)). For quality control, genomic DNA was amplified in a multiplex PCR containing a control gene primer set resulting in products of 100, 200, 300, 400 and 600 bp according to the BIOMED-2 protocol (van Dongen et al., 2003).

RNA was isolated with TRIzol reagent (Invitrogen, Breda, The Netherlands) according to the manufacturer's protocol. RNA was treated with DNase and purified using the RNeasy mini-kit (Qiagen, Westburg, Leusden, The Netherlands). The quality and quantity of the RNA were determined by Agilent Lab-on-Chip analysis (ServiceXS, Leiden, The Netherlands, www.serviceXS.com).

13.2.4 Expression data
Gene expression profiling of 39 primary cancers and 20 cell line samples was performed using the Affymetrix HG-U133 Plus 2.0 array, with 54,675 probes for analysis of over 47,000 human transcripts. The labeling of the RNA, the quality control, the microarray hybridization and the scanning were performed by ServiceXS according to Affymetrix standards. For labeling, ten micrograms of total RNA were amplified by in vitro transcription using T7 RNA polymerase.

Quality of the microarray data was checked using histograms, boxplots and an RNA degradation plot. One cell line sample was omitted because of poor quality. Using BioConductor (Gentleman et al., 2004), present (P), absent (A) or marginal (M) calls were determined with the MAS5 algorithm. MAS5 uses a non-parametric statistical test (Wilcoxon signed rank test) that assesses whether significantly more perfect matches show more hybridization signal than their corresponding mismatches to produce the detection call for each probe set (Liu et al., 2002). The relaxation ranking approach relied only on P-calls. Some samples were analyzed in duplicate, and their profiles of P-calls are highly similar (93-95% of the probe sets have an identical P/M/A call).

13.2.5 Relaxation ranking algorithm
In order to identify the most promising markers that are methylated in cervical cancer, we assumed that such markers should be silenced in cancer cells and upregulated upon re-activation after DAC/TSA treatment. Therefore, the best methylation markers will be genes represented by probes with:
• no expression in primary cervical cancers: P-calls=0 out of 39 cancers
• no expression in (untreated) cervical cancer cell lines: P-calls=0 out of 4 cell lines
• expression in cervical cancer cell lines treated with DAC (or DAC in combination with TSA): P-calls=15 out of 15 treated cell lines

To select those gene probes that would be the best candidate hypermethylated genes in cervical cancer, we present the relaxation ranking algorithm. Probe sets were ranked not primarily on the number of P-calls, which would require explicitly setting thresholds, but primarily on the number of probe sets that would be picked up under given selection criteria (the number of P-calls in primary cancers, untreated and treated cell lines). The stricter these selection criteria (e.g. P-calls: 0 - 0 - 15), the lower the number of probes that meet them; as the conditions become more and more relaxed (higher number of P-calls in primary cancers and untreated cell lines, and lower number of P-calls in treated cell lines), more probes will comply. In the end, using P-calls: 39 - 4 - 0 as criteria, all probe sets were returned. This way, there was no need to define an a priori threshold for the number of P-calls.

The following sorting method was applied (R scripts are presented in the supplementary data):

1) All possible conditions were generated and the number of probes that were picked up under these conditions was calculated:
   a. the number of samples with expression (P) of a certain probe in
      i. primary cervical cancer samples is called x_sample
      ii. cervical cancer cell lines is called y_sample
      iii. treated cervical cancer cell lines is called z_sample
   b. all combinations of x, y and z are made
      i. x (the number of P-calls in primary cancers) varies from 0 to 39
      ii. y (the number of P-calls in untreated cell lines) from 0 to 4
      iii. z (the number of P-calls in treated cell lines) from 0 to 15
      iv. in total, 3200 combinations of x, y and z can be made
   c. a probe set was found under each of these generated conditions x, y and z if:
      i. x_sample ≤ x (number of P-calls for the probe in primary cancers smaller than or equal to the condition) AND
      ii. y_sample ≤ y (number of P-calls for the probe in untreated cell lines smaller than or equal to the condition) AND
      iii. z_sample ≥ z (number of P-calls for the probe in treated cell lines larger than or equal to the condition)
   d. under very strict conditions (x=0, y=0, z=15) no probes were found, while under the most relaxed conditions (x=39, y=4, z=0) all probes were returned. For all combinations of x, y and z, the number of probes that complied (w) was stored
2) The data were sorted with w as primary criterion (ascending), followed by x (ascending), y (ascending) and z (descending)
3) This sorted dataset was analyzed row by row. In row i, the w_i probes retrieved with criteria x_i, y_i, z_i were compared with the list of probes already picked up in rows 1 to i-1. If a probe did not occur in this list, it was added to the list
4) This process continued until there were m (user-defined) probes in the list

13.2.6 DNA methylation analysis using COBRA and bisulfite sequencing
To validate the (hyper)methylated status of candidate gene probes, DNA extracted from 10 cervical cancers and 5 normal cervices was analyzed using BSP and COBRA. Bisulfite modification of genomic DNA was performed using the EZ DNA methylation kit (Zymogen, BaseClear, Leiden, The Netherlands). The 5' promoter region of the tested gene was amplified using bisulfite-treated DNA. PCR primers for amplification of specific target sequences are listed in Supplementary Table 1. COBRA was performed directly on the BSP products as described by Xiong and Laird (1997), using digestions with BstUI, TaqI and/or HinfI according to the manufacturer's protocol (New England Biolabs Inc., Beverly, MA). For sequence analysis, the BSP products were purified (Qiagen) and subjected to direct sequencing (BaseClear, Leiden, The Netherlands). Leukocyte DNA collected from anonymous healthy volunteers and in vitro CpG-methylated DNA, prepared with SssI (CpG) methyltransferase (New England Biolabs Inc.), were used as negative and positive control, respectively.
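The principle behind the COBRA read-out can be illustrated with a small in silico sketch. It assumes a hypothetical 12-bp amplicon and looks only at the top strand; the function names are illustrative and are not part of the published protocol. After bisulfite treatment and PCR, unmethylated cytosines read as thymine, so the BstUI site (CGCG) and the TaqI site (TCGA) survive or arise only where the CpG cytosines were methylated.

```python
def cpg_positions(sequence):
    """Return the 0-based positions of cytosines that are part of a CpG."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(sequence, methylated_positions):
    """Sequence as read after bisulfite treatment and PCR (top strand only).

    Unmethylated C becomes T (via U); methylated C stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence.upper())
    )


# Recognition sequences of two of the enzymes named in the text.
RECOGNITION_SITES = {"BstUI": "CGCG", "TaqI": "TCGA"}


def is_digestible(converted_sequence, enzyme):
    """True if the converted amplicon still contains the enzyme's site."""
    return RECOGNITION_SITES[enzyme] in converted_sequence


# Hypothetical amplicon, for illustration only.
amplicon = "ATCGCGTACCGA"
methylated = bisulfite_convert(amplicon, cpg_positions(amplicon))
unmethylated = bisulfite_convert(amplicon, [])

for label, converted in (("methylated", methylated), ("unmethylated", unmethylated)):
    cuts = [enzyme for enzyme in RECOGNITION_SITES if is_digestible(converted, enzyme)]
    print(label, converted, "cut by:", cuts or "none")
```

In this toy case only the fully methylated template is digested, which is the signal that COBRA detects on a gel. Partial methylation and the bottom strand are deliberately left out of the sketch.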
13.3 Results

To identify novel markers that are methylated in cervical cancer, we applied a multistep approach that combines re-expression of silenced hypermethylated genes in cervical cancer cell lines (using DAC and DAC/TSA), downregulated expression in 39 cervical cancers, and selection of candidate markers using a relaxation ranking algorithm. The best profile of a candidate marker would be: no expression in any of the 39 cervical primary cancers and 4 untreated cancer cell lines, but re-activation of expression after demethylation and/or blocking of histone deacetylation in all 15 cell lines treated with various combinations of DAC/TSA (P-calls: 0 - 0 - 15). However, none of the probe sets showed this ideal profile. To generate a list of candidate genes, a relaxation ranking algorithm was applied.

Figure 13.1: The number of probes (w) that is retrieved using parameters x (number of P-calls in primary cancers for probe), y (number of P-calls in untreated cell lines for probe) and z (number of P-calls in treated cell lines for probe)

The only variable used in the relaxation ranking is the number of probes we would like to retrieve. As shown in Figure 13.1, the number of probes retrieved (w) with parameters x, y and z (the number of P-calls in primary tumor samples, untreated and treated cell lines, respectively) follows a complex profile which consists not only of additive elements, but also of interactions between the parameters. In general, the number of P-calls in primary cancer samples (x) has the largest influence on w. The sorting methodology has the advantage that no cut-off values have to be chosen for x, y and z, and therefore there is no need to implicitly link a relative weight factor to the parameters.

To determine the optimal number of potentially hypermethylated candidate markers for further analysis, we estimated this number based on known (i.e. described in literature) methylation markers in cervical cancer. Forty-five known methylation markers were found by text mining, using GeneCards (Rebhan et al., 1997) for aliases/symbols to query PubMed through NCBI E-Utils (Supplementary Table 2). The position of the markers after ranking ("observed") was determined as shown in the step plot in Figure 13.2. If the markers were randomly distributed in the ranking, the profile would be similar to the curve marked 'expected'. This 'expected' curve is not a straight line, but is calculated based on whether a probe could be assigned a gene symbol, taking into account probes that are associated with a gene that is already associated with an earlier selected probe. The number of observed methylation markers has in general the same slope as expected. However, up to about 3000 probes, the slope of the number of observed markers versus the number of selected probes (in dashed lines) cannot be explained by a random distribution of the markers, as it is much steeper. When more than 3000 probes are selected, the slope suddenly decreases to a level that is close to random distribution. This enrichment can also be demonstrated statistically (see below). Therefore, we selected the first 3000 probes in the ranking, referred to as TOP3000, for further analysis. In this TOP3000 list, 2135 probes are associated with a gene symbol, of which 1904 are unique.
[Figure 13.2 plot: number of selected probes (X-axis, 0 to 60000) against number of methylation markers in cervical cancer (Y-axis, 0 to 40), with an expected and an observed curve]
Figure 13.2: Step-plot to determine the optimal number of probes for further analysis. Step-plot of the number of retrieved known markers (45 published hypermethylation markers in cervical cancer, see Supplementary Table 2) as a function of the position after relaxation ranking (this is the number of selected probes after ranking). The step plot shows the actual (observed) number of markers. If the markers were randomly distributed, one would expect the profile marked 'expected' (details in the text). The trend of the observed markers versus the number of selected probes is indicated with dashed lines

13.3.1 The validation of the top 3000 probe list selected using relaxation ranking
To validate whether the TOP3000 contains potential hypermethylated genes, we determined the occurrence of various gene sets that are known to be hypermethylated, such as imprinted genes, chromosome-X genes, cervical cancer-related hypermethylated genes and genes reported to be methylated frequently in cancers other than cervical cancer.

Enrichment for imprinted genes

Imprinting is a genetic mechanism by which genes are selectively expressed from the maternal or paternal homologue of a chromosome. As methylation is one of the regulatory mechanisms controlling the allele-specific expression of imprinted genes (Holmes and Soloway, 2006), it is expected that known imprinted genes are enriched in the TOP3000 selection. According to the Imprinted Gene Catalogue (Morison et al., 2005), this TOP3000 list contains 16 imprinted (or parent-specific expressed) genes (Supplementary Table 3). On the whole Affymetrix array, 74 imprinted genes in total could be assigned a probe. Taking into account duplicate probes and probes that are not associated with a gene symbol, 8.76 imprinted genes would be expected in the first 3000 probes if the imprinted genes were randomly distributed, indicating a 1.83-fold [16/8.76] enrichment in the TOP3000 (Χ²=5.904; p=0.0151). The enrichment towards imprinted genes is even more significant in the TOP100 candidate genes (3 versus only 0.31 expected; Χ²=14.9; p<0.0001). All statistical enrichment tests are chi-square tests with Yates' correction; the given p-values are two-tailed.
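As an illustration of such a test, the sketch below computes a chi-square statistic with Yates' correction for a 2x2 table, using the X-chromosome probe counts reported in the next subsection (93 of 2239 located TOP3000 probes, 1325 of 40683 located probes on the array). The layout of the table (TOP3000 probes versus all remaining probes) is an assumption, since the exact tabulation used by the authors is not given, and the function name is illustrative.

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square test with Yates' correction for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and the two-tailed p-value (1 degree of freedom).
    """
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the text; the 2x2 layout itself is an assumption.
top_located, top_on_x = 2239, 93          # TOP3000 probes with a known location
array_located, array_on_x = 40683, 1325   # all located probes on the array

a = top_on_x                              # in TOP3000, on chromosome X
b = top_located - top_on_x                # in TOP3000, elsewhere
c = array_on_x - top_on_x                 # outside TOP3000, on chromosome X
d = array_located - top_located - c       # outside TOP3000, elsewhere

fold = (top_on_x / top_located) / (array_on_x / array_located)
chi2, p_value = yates_chi_square(a, b, c, d)
print(f"fold enrichment {fold:.2f}, chi-square {chi2:.2f}, p = {p_value:.4f}")
```

Under these assumptions the result lies close to the reported 1.28-fold enrichment (Χ²=5.8; p=0.0165). Other values reported in this section may rest on a different tabulation, for instance one that corrects for duplicate probes.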
Enrichment for genes on the X­
chromosome X‐chromosome‐inactivation in females is initiated from an inactivation center that produces the Xist transcript, an RNA molecule that covers one copy of the X‐chromosome and results in si‐
lencing of gene expression. This coat‐
ing initiates a number of chromatin changes including stable DNA methy‐
lation (Heard, 2004). Of the entire list of 54675 probes on this Affymetrix array, 40683 could be associated with a chromosomal location, 1325 probes are located on the X‐chromosome. In the TOP3000 list (with 2239 chromo‐
somal locations known), 93 probes are located on chromosome X (data not shown) indicating a significant en‐
richment (1.28‐fold) of X‐
chromosome‐located probes in the TOP3000; Χ²=5.8; p=0.0165). This enrichment is even more significant within the TOP1000 (42/708 chromo‐
somal locations; Χ²=12.567 ; Discovery of methylation markers in cervical cancer, using relaxation ranking 95 p=0.0004) and TOP100 probes (13/71 chromosomal regions known; Χ²=36.097; p<0.0001). Enrichment for cervical cancer spe­
cific methylation markers The enrichment of known methylation markers involved in cervical cancer (Supplementary Table 2) was already illustrated significant when calculating the optimal number of probes for fur‐
ther testing and hereby demonstrated the enrichment towards these mark‐
ers. In the TOP3000, 10 known genes are present (Table 13.1). As only 5.33 probes of these known methylation markers for cervical cancer are ex‐
pected if randomly distributed, the TOP3000 list is enriched for these markers 1.88‐fold (Χ²=3.715 ; p=0.0539). Enrichment for known hypermethy­
lation markers in cancers other than cervical cancer To determine whether the ranking methodology is able to enrich towards known hypermethylation markers reported in various cancer types, PubMeth (a literature‐based methyla‐
tion database) was used (Ongenaert et al., 2008). Of the 40683 gene‐probes on the Affymetrix array, 349 genes are present in the database. Interestingly, in the TOP250 probes (representing 152 unique genes), 10 known methy‐
96 lation are described in the database (NNAT, SST, NPTX1, MAGEA3, CYP1A1, PRSS21, BDNF, MEG3, SNAI1, CCNA1). If randomly distributed, tak‐
ing duplicate probes and probes not associated with gene symbols into account, 3.3 genes were expected (Χ²= 12.028, p=0.0005). This enrichment is also observed in the TOP1000 (27 known markers vs. 14 expected, Χ²=11.947, p=0.0005) and TOP3000 probes (55 known markers vs. 41 expected, Χ²=4.871, p=0.0273). Table 13.2 summarizes genes that are asso‐
ciated with hypermethylation in can‐
cer, as found by a literature search and present in the TOP250. Interestingly, this analysis revealed that known methylation markers seem to be significantly enriched and highly‐ranked. This also showed that the known cervical cancer‐specific markers are not enriched to the same extend (CCNA1 is highest at position 234), implying the existence of better hypermethylated markers, involved in cervical cancer, in the TOP3000 list. In summary, as the top‐list contains a relatively large number of imprinted, chromosome‐X and known methyla‐
tion markers, our analysis revealed that the ranking strategy was able to enrich the candidate gene list with possible new (hyper)methylated genes. Discovery of methylation markers in cervical cancer, using relaxation ranking Table 13.1: Reported DNA methylation markers in cervical cancer present in the TOP3000 Gene symbol Rank
Chromosomal loca­
tion References
CCNA1 234
13q12.3‐q13
TIMP2 TFPI2 PEG3 RUNX3 IGSF4 PTEN TNFRSF10D TIMP3 APC 404
651
1242
1463
1742
1926
2270
2500
2733
17q25
7q22
19q13.4
1p36
11q23.2
10q23.3
8p21
22q12.3
5q21‐q22
(Kitkumthorn et al., 2006) (Ivanova et al., 2004) (Sova et al., 2006) (Dowdy et al., 2005) (Kim et al., 2004)
(Steenbergen et al., 2004) (Cheung et al., 2004) (Shivapurkar et al., 2004) (Wisman et al., 2006) (Wisman et al., 2006) Published DNA methylation markers in cervical cancer were selected by literature text mining (see Supplementary Table 2) Gene Ontology Associated with the development and progression of cancer, silencing by hypermethylation often affects genes in important pathways (Baylin and Ohm, 2006). Therefore, we investi‐
gated whether our selected TOP3000 candidate genes are associated with specific pathways or harbor related functions. Multiple GO terms, identified with GOstat (Beissbarth and Speed, 2004) (Supplementary Table 4), and specific pathways, identified with Ingenuity Pathway Analysis (IPA) (Supplementary Table 5), were significantly over‐represented within the TOP3000 list when compared to all annotated human genes. These terms include regulation of transcription ‐ DNA‐dependent, transcription from RNA polymerase II promoter, regulation of progression through cell cycle, positive regulation of programmed cell death, as well as organ development and embryonic development (all p‐values < 10^-6; corrected for multiple hypothesis testing (Hochberg and Benjamini, 1990)). Genes in these processes were reported to be often methylated during cancer progression (Herman, 2005). Genes responsible for development and differentiation are mainly silenced by methylation in normal tissues. On the other hand, in cancer tissues, genes responsible for cell cycle control and induction of apoptosis are often aberrantly expressed, as many of these genes have tumor suppressor activity. DNA hypermethylation is one mechanism to regulate expression of tumor suppressor genes (Serman et al., 2006). The GO‐analysis provided an additional strong indication that our highest ranking genes in the top‐list are significantly enriched for methylated genes involved in cervical cancer tis‐
sue or cell lines.

Table 13.2: Listing of cancer‐associated hypermethylation markers that have been reported previously within the 250 highest ranking genes, as found by literature search (through NCBI E‐fetch, using GeneCards to search aliases)

Gene symbol   Rank   Chromosomal location   References
ZIK1          6      19q13.43               Intestinal metaplasia (Mihara et al., 2006)
NNAT¹,²       21     20q11.2‐q12            Pediatric acute leukemia (Kuerbitz et al., 2002)
SST¹          22     3q28                   Colon cancer (Mori et al., 2006)
SSX2³         27     Xp11.23‐p11.22         Bladder cancer (Fradet et al., 2006)
NPTX1¹        29     17q25.1‐q25.2          Pancreatic cancer (Hagihara et al., 2004)
PRSS21        72     16p13.3                Testicular cancer (Manton et al., 2005) (Kempkensteffen et al., 2006)
CYP1A1¹       76     15q22‐q24              Prostate cancer (Okino et al., 2006)
MAGEA3¹,³     96     Xq28                   Different cancer cell lines (Leukemic, Hepatic, Prostate, Breast, Colon) (Wischnewski et al., 2006); Hepatocellular carcinoma (Qiu et al., 2006); Melanoma (Sigalotti et al., 2002)
INSR          124    19p13.3‐p13.2          Prostate cancer (Wang et al., 2005)
DLX1          150    2q32                   Lung cancer (Rauch et al., 2006)
PAX9          207    14q12‐q13              Lung cancer (Rauch et al., 2006)
ZNF342        228    19q13.32               Brain cancer (Hong et al., 2003)
CCNA1¹,⁴      234    13q12.3‐q13            Cervical cancer (Kitkumthorn et al., 2006); Head and neck cancer (Tokumaru et al., 2004); Oral cancer (Shaw et al., 2006)
CTCFL         245    20q13.31               Prostate and bladder cancer (Hoffmann et al., 2006)
LIFR          248    5p13‐p12               Hepatocellular carcinoma (Blanchard et al., 2003)

¹ Genes whose promoter has been described in literature as being methylated in certain cancer types, selected by screening the PubMeth database. ² Imprinted. ³ Located on the X‐chromosome. ⁴ Methylated in cervical cancer (literature search)
13.3.2 Validation of the 10 highest ranking candidate genes by COBRA

In order to validate whether the highest ranking genes represent markers that are functionally hypermethylated in cervical cancer, we performed COBRA on bisulfite‐treated DNA of 10 cervical cancers and 5 normal cervices. For this analysis we focused on the first 10 genes from the highest ranking probe‐list that (see Supplementary Table 6 for more details):
• represent a known gene (i.e. gene symbol)
• contain a CpG‐island surrounding the TSS
• are located on any chromosome except chromosome X
• are expressed in fewer than 15 carcinomas

BSP was used to amplify the CpG‐islands of these candidate genes using bisulfite‐treated DNA, and COBRA to determine the methylation status. CCNA1 (at position 49; Supplementary Table 6) was included as a positive control, being the highest listed reported cervical cancer specific methylated gene promoter (Table 13.1). BSP/COBRA of CCNA1 revealed that 6 of 10 carcinomas are methylated at the restriction enzyme sites (T1, T3, T5, T7, T9 and T10 in Figure 13.3). Sequence analysis of the BSP‐products (on average 7‐9 independent clones for each carcinoma) of these 10 carcinomas revealed that in 6 carcinomas the promoter is hypermethylated, in good agreement with the COBRA results.
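The principle behind COBRA can be illustrated with a minimal sketch: after bisulfite treatment and PCR, unmethylated cytosines read as T while methylated CpG cytosines stay C, so restriction sites that contain CG (BstUI, CGCG; TaqI, TCGA) survive or arise only in methylated templates. The toy sequence and the helper names below are hypothetical and are not taken from the study; the sketch also assumes that a template is either fully methylated or fully unmethylated at every CpG.

```python
import re

# Recognition sequences of the two enzymes used for CCNA1 in the text.
SITES = {"BstUI": "CGCG", "TaqI": "TCGA"}


def bisulfite_convert(seq, methylated=True):
    """Return the top strand as read after bisulfite treatment and PCR.

    Every unmethylated C becomes T. When `methylated` is True, each C that is
    followed by G (a CpG) is assumed to be methylated and stays C.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (methylated and in_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)


def find_sites(seq):
    """Map each enzyme to the 0-based start positions of its sites."""
    return {
        name: [m.start() for m in re.finditer(f"(?={site})", seq)]
        for name, site in SITES.items()
    }


# Hypothetical toy sequence, not the CCNA1 promoter.
toy = "ATCGATTCGCGTACCGA"
methylated_read = bisulfite_convert(toy, methylated=True)
unmethylated_read = bisulfite_convert(toy, methylated=False)
print(methylated_read, find_sites(methylated_read))
print(unmethylated_read, find_sites(unmethylated_read))
```

In the toy example the methylated read keeps its BstUI site and gains a TaqI site (CCGA becomes TCGA), while the unmethylated read loses all sites, which is why digestion of the BSP product reports on methylation.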
[Figure 13.3. Panel A: restriction map of the CCNA1 BSP product starting at ‐430 (B and T sites). Panel B: COBRA lanes T1‐T10, L and IV. Panel C: clone sequencing per tumor, labelled Tumor 1 (2/9), Tumor 2 (1/7), Tumor 3 (6/9), Tumor 4 (3/8), Tumor 5 (7/9), Tumor 6 (3/9), Tumor 7 (3/9), Tumor 9 (7/8), Tumor 10 (9/9); methylation scale: 100%, 40‐90%, 10‐30%, 0%.]
Figure 13.3: (Hyper)methylation analysis of the promoter region (‐430 to ‐5 of TSS) of the CCNA1 gene by COBRA and sequence analysis. A: schematic representation of the restriction enzyme sites (B: BstUI and T: TaqI) in the virtual hypermethylated BSP nucleotide sequence after bisulfite treatment. Vertical bars represent CG sites, the arrow represents the TSS (retrieved from Ensembl). B: Result of COBRA analysis of the BSP products of 10 tumor samples (T1‐T10), in vitro methylated DNA as a positive control (IV) and leucocyte DNA as a negative (unmethylated) control (L). C: Schematic representation of the sequencing results. From each tumor, the BSP‐products were cloned into TOPO‐pCR4 (Invitrogen) and sequencing (BaseClear) was performed on M13‐PCR products of 7‐9 independent clones. Circles represent CG dinucleotides: the darker, the more clones were methylated at this site.

Table 13.3 summarizes the methylation status of the 10 highest ranking genes in 10 cervical cancers and 5 normal cervices using COBRA. One gene (ADARB1 at rank 2) could not be analyzed for methylation as no specific BSP products could be amplified using several combinations of primer pairs. Interestingly, using the BSP products of the other 9 listed genes, 7 (78%) showed methylation in carcinomas (Table 13.3). Four genes are hypermethylated in all 9 tested cancers, while for SST (7 of 9 carcinomas), HTRA3 (1 of 9 carcinomas) and NPTX1 (5 of 10 carcinomas) a fraction of the carcinomas is hypermethylated. Figure 13.4 shows representative methylation analysis of 3 genes using COBRA. Three (NNAT, SST and NPTX1) of the 7 hypermethylated gene promoters have been reported previously to be methylated in tumors. Taken together, these findings show that the relaxation ranking algorithm resulted in a very significant enrichment for genes with a positive methylation status.

Table 13.3: Methylation status using COBRA of the 10 highest ranking gene promoters

Rank   Gene     Chromosomal location   Methylation in cancer   Methylation in normal
1      DAZL     3p24.3                 9/9                     5/5
2      ADARB1   21q22.3                Nd                      Nd
3      SYCP3    12q                    9/9                     5/5
4      AUTS2    7q11.22                0/9                     0/5
5      NNAT     20q11.2                9/9                     5/5
6      SST      3q28                   7/9                     0/5
7      HTRA3    4p16.1                 1/9                     0/5
8      ZFP42    4q35.2                 9/9                     5/5
9      NPTX1    17q25.1                5/10                    0/5
10     GDA      9q21.13                0/9                     0/5
47     CCNA1                           6/10                    0/5

Genes selected for further validation after applying additional criteria. Included is CCNA1 at position 47 (original position 241) as the highest ranking cervical‐cancer‐associated hypermethylated gene. Methylation status was determined by BSP/COBRA.

Enrichment of cervical cancer specific methylation markers

A cervical‐cancer‐specific hypermethylated marker is only of relevance for the diagnosis of (pre‐)malignant disease if normal cervical epithelium is not methylated. COBRA analysis of 5 normal cervices for all 9 genes revealed that 4 genes (DAZL, SYCP3, ZFP42 and NNAT) are hypermethylated in all 5 samples. On the other hand, of the 7 genes hypermethylated in cervical cancer specimens, 3 genes (SST, HTRA3 and NPTX1) did not show DNA methylation in any of the normal cervices of 5 independent individuals. We observed the same methylation profile for CCNA1, which was reported previously as a cervical cancer specific gene (Kitkumthorn et al., 2006), with hypermethylation in only 6 of 10 tumors but none of the 5 normals (Table 13.3).

This analysis revealed that the relaxation ranking algorithm not only resulted in a very significant enrichment for genes with a positive methylation status, but also for hypermethylated genes that are specifically methylated in cancers and not in the normal cervices.
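The selection logic of this section can be retraced with a small sketch over the counts in Table 13.3. The rule used here (a gene counts as methylated in a group as soon as one sample is methylated) is an assumption made for illustration; the dictionary and function names are hypothetical.

```python
# (methylated tumours, tumours tested, methylated normals, normals tested),
# transcribed from Table 13.3. ADARB1 is left out because it was not determined.
COBRA = {
    "DAZL": (9, 9, 5, 5),
    "SYCP3": (9, 9, 5, 5),
    "AUTS2": (0, 9, 0, 5),
    "NNAT": (9, 9, 5, 5),
    "SST": (7, 9, 0, 5),
    "HTRA3": (1, 9, 0, 5),
    "ZFP42": (9, 9, 5, 5),
    "NPTX1": (5, 10, 0, 5),
    "GDA": (0, 9, 0, 5),
    "CCNA1": (6, 10, 0, 5),
}


def classify(t_meth, t_total, n_meth, n_total):
    """Label a gene from its COBRA counts (assumed rule: any methylated sample counts)."""
    if n_meth > 0:
        return "methylated in normal"
    if t_meth > 0:
        return "cancer-specific"
    return "unmethylated"


labels = {gene: classify(*counts) for gene, counts in COBRA.items()}
cancer_specific = sorted(g for g, lab in labels.items() if lab == "cancer-specific")

# Share of the nine analysable top-ranked genes (CCNA1 excluded) methylated in tumours.
top9 = [g for g in COBRA if g != "CCNA1"]
methylated_in_tumours = sum(COBRA[g][0] > 0 for g in top9)
percentage = round(100 * methylated_in_tumours / len(top9))
print(cancer_specific, methylated_in_tumours, percentage)
```

Under this rule the sketch reproduces the 7 of 9 (78%) figure and singles out SST, HTRA3, NPTX1 and the control CCNA1 as methylated in tumours but not in normal cervices.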
[Figure 13.4. Panel A: restriction maps of the SYCP3, AUTS2 and SST BSP products. Panel B: COBRA lanes for tumor samples, normal cervices (N1‐N5), IV, L and B for each of the three genes.]
Figure 13.4: Representative COBRA on 3 gene promoters (SST, AUTS2 and SYCP3). A: schematic representation of the restriction enzyme sites in the virtual hypermethylated BSP nucleotide sequence after bisulfite treatment (B: BstUI, T: TaqI and H: HinfI). Bars represent CG sites and the arrow is the TSS (retrieved from Ensembl). B: Result of COBRA analysis of BSP products of tumor samples (T1‐T10) and 5 normal cervices (N1‐N5), in vitro methylated DNA as a positive control (IV) and leukocyte DNA as a negative (unmethylated) control (L); lane B is water blank.

13.4 Discussion

In this study, we optimized the identification of methylation markers by combining a pharmacological unmasking microarray approach with microarray expression data of primary cancer samples. For the integration of data from both cell lines and primary cancers, we developed a novel ranking strategy, which combines re‐activation in cell lines and no expression in primary cancer tissue. The relaxation ranking algorithm uses a non‐parametric method of sorting. No threshold on expression level or P‐calls has to be set and no overlap between different cell lines has to be chosen. The only parameter needed is the number of probes/genes that should be included in the top list. Using this algorithm, a gene can still be selected for further analysis even if it is not re‐expressed in (almost) all cell lines or not silenced in most primary tumour samples.

In this study, we showed that the ex‐
perimental design in combination with the ranking strategy is able to enrich a list of probes for methylated genes. Imprinted genes and genes on the X‐chromosome are significantly enriched in the high‐ranking TOP3000 probes. Pathway and gene ontology analysis illustrates that the high‐ranking genes are involved in tumour development and progression. Enrichment of similar pathways or ontologies when selecting abnormally expressed genes is commonly reported in various cancer types (Kornberg et al., 2005; Wang et al., 2005). More importantly, methylation markers reported to be involved in various cancers (including cervical cancer) are significantly enriched in the top‐lists as well. Interestingly, the highest ranking cervical cancer specific gene is CCNA1 (position 234 in Table 13.1; position 49 in Table 13.3). Apart from cervical cancer, CCNA1 was reported to be hypermethylated in colorectal, oral, head and neck cancer (Shaw et al., 2006; Tokumaru et al., 2004; Xu et al., 2004). In good agreement with the reported data, we show that CCNA1 is hypermethylated in 6 of 10 cervical carcinomas and none of the normal cervices using COBRA and BSP‐sequencing (Table 13.3 and Figure 13.3).

Analysis of the methylation status of the highest ranking genes revealed that seven out of nine selected genes (78%) are methylated in cervical cancers, whereas 4 of these 7 genes (DAZL, SYCP3, ZFP42 and NNAT) were also hypermethylated in all 5 normal cervices (Table 13.3). Although hypermethylation of NNAT has been implicated in pediatric acute leukemia (Kuerbitz et al., 2002), the hypermethylation status in both cancer and normal tissues suggests that NNAT acts as an imprinted gene (Supplementary Table 3) rather than as a cancer specific gene in cervical cancer. The other three genes (SST, HTRA3 and NPTX1) might be cancer‐specific because these genes are, like CCNA1, hypermethylated in the cancers and not in the normal controls (Table 13.3). Two of these genes were previously described as cancer specific genes: SST in colon carcinoma (Mori et al., 2006) and NPTX1 in pancreatic cancer (Hagihara et al., 2004). However, none of these 3 genes has been described previously in cervical cancer. The exact involvement of these 3 genes in cervical cancer development has to be explored in the future, but the application of the relaxation ranking algorithm illustrates the power of enrichment for new hypermethylated genes that can discriminate between cervical cancer and normal cervical epithelium.

The combination of the initial setup and the analysis is unique. In most other studies either few genes are investigated for their methylation status in primary cancer samples or a large‐scale screening approach is applied on cell lines only. Generally, only genes which are re‐expressed in most cell lines can be retained for further investigation, as several hundreds of genes are upregulated in one or more cell lines after treatment with DAC/TSA. Most studies used additional filtering (such as pathway analysis, known mutated genes), but the list of candidate genes that need experimental validation to determine their methylation status is long. These markers need to go through a pipeline of DNA methylation detection in cell lines and cancer samples, in order to find only a few cancer specific markers with different sensitivity and specificity (Mori et al., 2006; Suzuki et al., 2002; Yamashita et al., 2002). However, the success rate is relatively low, as many promoter regions do not show (differential) methylation. In addition, CpG arrays can be used to identify putative methylation markers, as recently described for cervical cancer (Lai et al., 2008). Again, this method re‐
quires the analysis of many markers to end up with only few cervical cancer specific methylation markers.

In the last few years it became apparent that many markers that are methylated in cancer are methylated in normal tissues as well (Cheong et al., 2006; Dammann et al., 2005; Hoque et al., 2008; Wisman et al., 2006). Our present analysis once more illustrates that many more genes preceded by a CpG island in the promoter region are methylated in normal tissue than was previously anticipated. To further increase the enrichment for cancer specific methylated markers, the inclusion of expression microarray data from normal tissue in the relaxation ranking algorithm analysis might be helpful. To validate this, we performed global gene expression microarray analysis using the Affymetrix HGU 133 Plus 2.0 array with 54,675 probes (in parallel with the samples described in this study) on 5 independent age‐matched normal cervices from healthy women. We assume that cancer specific methylated markers should be expressed in all normal cervices, resulting in a positive P‐call (most optimal P‐call=5). Including the P‐call for normal expression on our 10 highest ranked methylated genes and CCNA1 revealed that all four cervical cancer specific methylated genes (SST, HTRA3, NPTX1 and CCNA1) would not have been selected, as none of the normal cervices showed a P‐call for these probes (data not shown). It is generally accepted that tumor suppressor genes (including cancer specific methylated genes) are characterized by the fact that their expression can be downregulated as the result of methylation, mutations and/or deletions, but is still present in the normal counterpart tissue. However, the expression levels in normal tissue are relatively low for most of these genes when compared to those cancer tissues that do not show downregulation, as was reported for p16INK4a (Kang et al., 2006). Thus, our data suggest that the addition of expression data of normal cervices would not enrich for cervical cancer specific methylated genes.

Other possibilities to further refine the selection of cancer relevant hypermethylated genes are to restrict the ranking to gene promoters that are likely to be methylated because of defined CG‐content or the presence of conserved motifs (or similar sequence attributes) related to hypermethylated promoters (Bock et al., 2006; Das et al., 2006; Feltus et al., 2003). Recently, novel methylation markers were identified, based on a genome‐wide promoter alignment (Hoque et al., 2008). Promoters closely related in the alignment to known methylation markers are shown to have a high chance of being methylated as well.

In conclusion, the application of this new relaxation ranking methodology allowed us to significantly enrich for genes methylated in cancer. This enrichment is shown both in silico and by experimental validation, and revealed novel methylation markers as proof‐of‐concept that might be useful in early cancer detection in cervical scrapings.

13.5 Acknowledgments

This study was supported by OncoMethylome Sciences S.A., Liège, Belgium and by the Dutch Cancer Society (KWF‐NKB‐RUG 2004‐3161).
Chapter 14: Exploring the cancer methylome using genome‐wide promoter analysis

Paper 5: Exploring the cancer methylome using genome‐wide promoter analysis
Ongenaert M, Straub J, Hoque MH, Wisman GBA, Lendvai A, Schuuring E, van der Zee AGJ, Yamashita K, Sidransky D, Van Criekinge W.

This chapter describes the computational and statistical framework of a marker‐selection methodology, which is validated and published in a more clinical context in Chapter 15: Genome‐wide promoter analysis uncovers portions of the cancer methylome.

Background: DNA methylation mediates epigenetic silencing of genes in cancer and other diseases. Various studies give strong indications that DNA‐sequence attributes (such as sequence patterns) contain information about the epigenetic state of the sequence. This principle could be used to identify previously undescribed gene promoters that could be methylated in a cancer‐specific way. Methods: Two separate genome‐wide approaches are used to link primary DNA sequence and DNA hypermethylation in cancer. One approach relies on a genome‐wide alignment of promoter regions (broad), while the other focuses on the appearance of short DNA motifs in the promoter (deep). These computational approaches were combined with a robust, established pharmacological unmasking strategy. The results are verified in silico and validated in cancer cell lines and primary tumor samples. Results: Both computational approaches are shown to enrich for possible cancer‐specific methylated genes. In the genome‐wide promoter alignment, known methylation markers are more densely clustered than one would expect if they were randomly distributed. The patterns used in the motif‐based approach are overrepresented in a close region around the transcription start site and are more conserved throughout evolution than the surrounding promoter sequence, an indication that the selected patterns possibly have a biological function.

14.1 Introduction

DNA methylation represents a modification of DNA by addition of a methyl group to a cytosine, also referred to as the fifth base (Doerfler et al., 1990). This reaction uses S‐adenosyl‐methionine as a methyl donor and is catalyzed by a group of enzymes, the DNA methyltransferases (DNMTs). In humans and other mammals, this epigenetic modification is almost exclusively imposed on cytosines that precede a guanosine in the primary DNA sequence (often called a CpG dinucleotide). The frequency of these CpGs in the genome is much lower than would be expected, because a methylated cytosine is often subject to deamination, thereby forming thymine. However, in some regions, dense clusters of CpGs can be identified: these regions are referred to as CpG islands (Herman and Baylin, 2003).

DNA‐methylation is an epigenetic change: it does not alter the primary DNA sequence and might contribute to overall genetic stability and maintenance of chromosomal integrity.

Epigenetic modifications such as aberrant DNA‐methylation of CpG‐islands in the promoter region are considered one of the mechanisms leading to silencing of tumor suppressor genes in human cancer. These epigenetic changes seem to act separately from the primary DNA‐sequence itself.

However, epigenetic and genetic effects often interfere. DNA‐methylation alters chromosome organization and inhibits the binding of proteins such as CTCF (Hark et al., 2000). On the other hand, DNA‐methylation also promotes or recruits the binding of proteins on DNA. Examples of such proteins are MECP2, MBD1, MBD2, MBD3 and MBD4, which induce histone modification (Jaenisch and Bird, 2003).

Interaction in the other direction, how the primary DNA‐sequence might influence epigenetic changes, is not fully understood.

At first, it became clear that DNA‐methylation is not a random event, shown by the fact that some genes are more frequently methylated than others and that a specific and unique profile of CpG island hypermethylation can be defined for the most common tumor types. This profile even allows the classification of the originating tissue using hierarchical clustering (Paz et al., 2003). Why certain genes are targets for aberrant DNA‐methylation in their promoter region is unknown. It is known that the main DNA methylating enzyme (DNMT1, DNA methyltransferase 1) is not able to methylate the majority of CpG islands even when it is overexpressed (Feltus et al., 2003).

Later, indications that the primary DNA‐sequence itself influences whether a sequence is ‘methylation‐prone’ were given by Feltus et al. Using seven small DNA‐sequence patterns and classification algorithms, they were able to discriminate ‘methylation‐prone’ and ‘methylation‐resistant’ sequences with 82% accuracy.
niques were used to predict, on a ge‐
nome‐wide scale, the methylation‐
state in the human brain (Das et al., 2006). Very recently this approach was extended: sequence patterns combined with other DNA‐sequence properties (such as predicted tran‐
scription factor sites and CpG island attributes) were used to predict the epigenetic states of all CpG‐islands in the genome (Bock et al., 2007). All these studies use certain proper‐
ties, directly derived from the primary DNA‐sequence, to predict the epige‐
netic state of that sequence. These attributes themselves differ from pa‐
per to paper and depend on the com‐
putational classification/clustering methodology. DNA sequence patterns are a commonly used sequence attrib‐
ute. In this study, we also make use of such sequence attributes to identify clus‐
ters of genes that are likely to be me‐
thylated specifically in cancer. We therefore make use of the hypothesis that the primary DNA sequence seems to be involved in the determination whether the sequence is prone to me‐
thylation in cancer or not. In the deep approach, we identify distinct pro‐
moter motifs to classify cancer‐
specifically methylated genes. For the broad approach, we make use of already known methylation markers (genes whose promoter region is aberrantly methylated) in different cancer types. This way, we are able to show that the promoter sequences of known methylation markers can be divided into clusters with sequence similarity. Genes whose promoter is also in these clusters might also be a novel, previously undescribed, methy‐
lation marker. Both methodologies were applied in a large‐scale methylation study to ex‐
perimentally validate the predicted candidate markers. After a first func‐
tional filter (transcriptional reactiva‐
tion after demethylation treatment of cancer cell lines), candidates are screened using bisulfite DNA sequenc‐
ing, conventional methylation‐specific PCR (MSP) and quantitative MSP, both in cancer cell lines and in primary tumor samples of a whole range of cancer types. 14.2 Materials and methods 14.2.1 Data sources
The Database of Transcription Start Sites (DBTSS) (Suzuki et al., 2004) Exploring the cancer methylome using genome‐wide promoter analysis
109 mapped each sequence on the human draft genome sequence to identify its transcriptional start site, which pro‐
vides us with more detailed informa‐
tion on distribution patterns of tran‐
scriptional start sites and adjacent regulatory regions. The sequences of all main promoter sequences in DBTSS (version 5.2.0 ‐ 14399 in total) were collected (‐300 to +200, with the Transcription Start Site (TSS) at +1). As literature markers, data from Pub‐
Meth is used. PubMeth (Ongenaert et al., 2008) is a database that focuses on methylation in cancer, built with the aid of textmining for completeness and afterwards manually reviewed and annotated to assure high quality. 14.2.2 Broad-analysis: genomewide promoter alignment
All 14399 promoter sequences were subsequently aligned by the MPI implementation of ClustalW with default parameters (Thompson et al., 1994; Li, 2003). TreeIllustrator (Trooskens et al., 2005) was used to visualize the complex guide tree; in addition, the known methylation markers were indicated using concentric layers around the tree visualization (Figure 14.1).
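The promoter window that feeds this alignment (‐300 to +200, with the TSS at +1) can be cut from a genomic sequence as sketched below. The sketch assumes the common convention that there is no position 0, so the window is 500 bp long; the function names and the 0‐based `tss_index` argument are hypothetical and are not part of DBTSS.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(chrom_seq, tss_index, strand="+", upstream=300, downstream=200):
    """Extract positions -upstream..+downstream around a TSS.

    `tss_index` is the 0-based index, on the forward strand, of the base
    numbered +1. Position 0 is assumed not to exist, so the result has
    upstream + downstream bases and is returned in the 5'->3' orientation
    of the gene.
    """
    if strand == "+":
        start, end = tss_index - upstream, tss_index + downstream
    elif strand == "-":
        start, end = tss_index - downstream + 1, tss_index + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the sequence")
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)
```

For a gene on the reverse strand the same window is taken on the forward coordinates and then reverse complemented, so that both strands yield sequences with the TSS at the same offset.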
Figure 14.1: Visualization within TreeIllustrator of the alignment of promoter regions that contain a CpG island. The red bars indicate the 56 cancer‐specific methylated seeds.

Statistical validation of clustering

307 known literature markers could be mapped on the alignment. To demonstrate that they cluster more densely than expected, two statistical validation approaches are used: a closest marker methodology and a complete distance profile. Both approaches use Monte Carlo simulations and are schematized in Figure 14.2.
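A minimal sketch of the Monte Carlo idea behind the closest marker approach (described in detail further on) is given here. It runs on a toy balanced tree rather than the ClustalW guide tree, and it assumes that the distance between two leaves is the number of internal nodes on the path between them; the study's exact node count may be defined differently. All function names are hypothetical.

```python
import random
from collections import Counter


def balanced_tree(n_leaves):
    """Build a toy binary tree; return (leaf names, child -> parent map)."""
    level = [f"L{i}" for i in range(n_leaves)]
    leaves, parent, counter = list(level), {}, 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            node = f"N{counter}"
            counter += 1
            for child in level[i:i + 2]:
                parent[child] = node
            next_level.append(node)
        level = next_level
    return leaves, parent


def path_to_root(node, parent):
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def node_distance(a, b, parent):
    """Internal nodes on the path between two different leaves (assumed metric)."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for i, n in enumerate(path_to_root(a, parent)):
        if n in steps_from_b:
            return i + steps_from_b[n] - 1
    raise ValueError("leaves are not in the same tree")


def cumulative_closest(selected, parent, max_dist):
    """cum[d-1] = number of selected leaves whose nearest selected leaf is <= d nodes away."""
    closest = Counter(
        min(node_distance(a, b, parent) for b in selected if b != a)
        for a in selected
    )
    cum, running = [], 0
    for d in range(1, max_dist + 1):
        running += closest.get(d, 0)
        cum.append(running)
    return cum


def closest_marker_test(leaves, markers, parent, max_dist=5, rounds=1000, seed=0):
    """Fraction of random gene sets that do at least as well as the markers, per distance."""
    rng = random.Random(seed)
    observed = cumulative_closest(markers, parent, max_dist)
    hits = [0] * max_dist
    for _ in range(rounds):
        sample = rng.sample(leaves, len(markers))
        simulated = cumulative_closest(sample, parent, max_dist)
        for i in range(max_dist):
            if simulated[i] >= observed[i]:
                hits[i] += 1
    return observed, [h / rounds for h in hits]


leaves, parent = balanced_tree(16)
observed, fractions = closest_marker_test(leaves, ["L0", "L1", "L2", "L3"], parent)
print(observed, fractions)
```

A small fraction at a short distance means that random gene sets rarely cluster as tightly as the markers do, which is the kind of evidence the closest marker methodology looks for.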
[Figure 14.2. Two panels: Closest marker and Complete profile. The closest marker strategy gives a single distribution, while the complete profile methodology gives one distribution per marker.]
Figure 14.2: Main difference between closest marker and complete profile methodology

In the closest marker methodology, for each methylation marker, the minimal distance (number of nodes) to the closest neighboring marker is determined. This way, a distribution is obtained: the number of markers as a function of the distance to the closest neighboring marker. The higher the frequency of markers for a low number of nodes, the closer the markers generally are and the more they are clustered together. Next, the cumulative distribution is calculated (for example, for three nodes, the numbers of markers that have their closest marker 3, 2 or 1 node away are added). In each of the 10000 simulation rounds, as many random genes as methylation markers are selected and, for each selected gene, the distance to the closest other selected gene is calculated. The cumulative distribution is calculated. For each node distance, it is determined whether the value in the simulation was equal to or better than the value for the markers.

In the complete profile methodology, the number of nodes from one marker to all other markers was calculated. This way 307 profiles (one for each of the methylation markers) were obtained: the cumulative number of methylation markers as a function of the number of nodes away. Combining these 307 profiles, the average number of cumulative markers per distance is calculated. In the Monte Carlo simulations, as many random genes as markers are chosen; this is repeated 1000 times and analyzed in the same way as the markers. The number of simulations that performed at least as well as the methylation markers at a certain distance was calculated.

14.2.3 Deep analysis: specific binding patterns
Apart from the broad promoter alignment, we sought to determine whether shorter patterns, lost in the global (broad) alignment, are associated with known cancer‐specific methylation. Therefore, the second (deep) part of the computational promoter analysis focused on identification of sequence features able to discriminate two different functional classes (A & B) of CpG island‐containing promoters. Class A lists genes which are only methylated in cancer and not in normal tissues, while Class B enumerates genes which are at least partially methylated in normal tissues (predominantly imprinted genes). Details for genes in both classes are given in Supplementary table 1.

For each of these genes we extracted a symmetric region of 1 kb around the predicted TSS using the DBTSS database. No significant differences in either starting position, GC‐content, length or O/E ratio were found for CpG islands of genes belonging to Class A and Class B.

We exhaustively screened DNA‐patterns using the Teiresias algorithm (Rigoutsos and Floratos, 1998) with a minimum of 7 non‐wild card nucleotides (L) and a maximal length between two non‐wild cards of 9 nucleotides (W), retaining patterns which are present in at least 25% of the sequences of one Class (A or B). In the next step we applied different machine learning techniques using WEKA (Frank et al., 2004) to extract those patterns (or combinations of patterns) whose frequencies of occurrence allowed discrimination between Classes A and B. These patterns are then used to classify promoters genome‐wide. Given the small size of the training set (Class A: n=18, Class B: n=15) and the inherent risk of overfitting, we performed additional in silico validation techniques.

In silico validation of deep approach
One of these validation techniques was to determine the stability of the sequence patterns during evolution. The human sequences in Class A were pairwise aligned with the orthologous promoter sequences in rat, mouse and chicken. The evaluation of conservation during evolution is done using a scoring mechanism. In each alignment, the number of matches and mismatches is determined for each nucleotide (A, C, G or T). For example, the frequency with which a T stays T or is aligned with A, G, C or a gap is calculated. This information is used to build a substitution matrix. Next, the different patterns are located in the alignment and the pattern score is determined: +1 for a match, ‐1 for a mismatch. This score is compared with the expected score: the score that would be expected using the substitution matrix (which was built using the entire sequence). The difference between observed and expected score is calculated per nucleotide that the pattern spans in the alignment.

14.2.4 Application of both approaches and experimental validation

Data sources
At the time the analysis was performed, DBTSS contained fewer sequences and data from PubMeth were not yet available. We extracted 8793 sequences from DBTSS (version 3.0 based on human assembly build 31), present on the U133A Affymetrix micro‐array. Subsequently, Newcpgreport (Olson, 2002) was used to identify CpG islands (a CpG island is defined as a region with a minimal length of 200 bp, a GC content larger than 50% and a CpG (observed)/CpG (expected) (O/E) ratio greater than 0.60) (Gardiner‐Garden and Frommer, 1987). These conditions are slightly less stringent than the ones proposed by Takai and Jones (2002). We justified this approach because we are using experimentally established and verified gene promoter regions (regions which are closely associated with gene expression) instead of applying the criteria to a genome‐wide scan. This resulted in a sequence set and resulting alignment of 4,728 genes which were complemented with a set of 56 reported/known cancer‐specifically methylated genes (seeds) chosen from published articles (Tokumaru et al., 2004; Yamashita et al., 2002) or own unpublished data. These 56 seeds are listed in Supplementary table 1. 28 of the 56 genes added were already present on the list of 4,728 genes. These 56 seeds were selected as being validated as cancer‐specifically methylated (methylated in cancer, not in the normal tissue). This is not necessarily the case for the data in PubMeth. Therefore, the success rate of finding also cancer‐specifically methylated genes was expected to be higher than when taking all promoters and all literature markers into account.

Reactivation filter
We want to experimentally verify candidate novel markers from both the broad and the deep approach in cell lines and primary cancer tissue. Before proceeding to this experimental validation step, an additional filtering step was introduced to decrease the number of genes to be tested and to increase the success rate of finding highly cancer‐specific markers. Such a marker is highly methylated in cancer, and thus transcriptionally silenced, while it is transcribed in normal tissue.
The filtering step is based on the established pharmacological unmasking approach. Using the demethylating agent 5‐aza‐2'‐deoxycytidine (5‐aza‐dC) and expression micro‐arrays, transcripts that are silenced in the (untreated) cancer cell lines but are reactivated upon removal of methylation by 5‐aza‐dC treatment can be identified. For further analysis we considered only those genes which are present in DBTSS and for which at least one reactivation event was observed. In total, 22 human cancer cell lines of five different cancer types were used in this filter (breast: BT‐20, MCF‐7, MDA‐MB 231 and MDA‐MB 436; prostate: 22rv1, DU145, LNCaP, PC3; lung: HTB‐58, HTB‐59, A549 and H23; colon: DLD‐1, HCT116‐p53+/+, HCT116‐p53‐/‐, RKO‐p53+/+, RKO‐E6 (functional p53‐null) and SW480; and cervix: Hela, Siha, CSCC7 and CSCC8).
Experimental validation
In order to validate the remaining 175 genes, we designed primers for each gene and tested each one by bisulfite sequence analysis, combined bisulfite restriction analysis (COBRA), and/or MSP in one or more cell lines that exhibited reexpression after demethylation treatment.
14.3 Results
14.3.1 Broad-analysis
Using sequences of 14,399 promoter regions, we are able to show that the 307 known (literature) methylation markers that could be mapped on the alignment visualization are more densely clustered together than one would expect if they were randomly distributed. This is shown statistically by Monte Carlo simulations (up to 10,000 simulation rounds).
Two statistical validation strategies were applied; both use Monte Carlo simulations to demonstrate denser clustering of methylation markers than one would expect if the markers were randomly distributed across the alignment. In the ‘nearest marker’ strategy, the frequency of methylation markers with a small ‘distance’ (the minimal number of nodes in the tree structure) to the closest other methylation marker is higher than expected if the markers were randomly chosen. In less than 5 percent of the simulations, the frequency obtained for randomly chosen genes is equal to or better than that of the methylation markers. This observation is valid up to seven nodes, as shown in Figure 14.3.
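The ‘nearest marker’ test described above can be illustrated with a minimal sketch. It is not the code used in this study: the encoding of the guide tree as a child‐to‐parent map, the definition of distance as the number of internal nodes on the path between two leaves, the toy tree and all function names are assumptions made for illustration only.

```python
import random

def ancestors(parent, leaf):
    """Return the internal nodes from a leaf up to the root (nearest first)."""
    chain = []
    node = leaf
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def node_distance(parent, a, b):
    """Number of internal nodes on the path between leaves a and b (assumed distance)."""
    up_a = ancestors(parent, a)
    index_b = {node: i for i, node in enumerate(ancestors(parent, b))}
    for i, node in enumerate(up_a):
        if node in index_b:
            # nodes below the common ancestor on both sides, plus the ancestor itself
            return i + index_b[node] + 1
    raise ValueError("leaves are not in the same tree")

def cumulative_nearest(parent, genes, max_nodes):
    """For k = 1..max_nodes, count genes whose closest other gene is within k nodes."""
    nearest = [min(node_distance(parent, g, h) for h in genes if h != g) for g in genes]
    return [sum(d <= k for d in nearest) for k in range(1, max_nodes + 1)]

def monte_carlo_nearest(parent, leaves, markers, max_nodes=7, rounds=1000, seed=0):
    """Fraction of random gene sets that cluster at least as well as the markers."""
    rng = random.Random(seed)
    observed = cumulative_nearest(parent, markers, max_nodes)
    at_least_as_good = [0] * max_nodes
    for _ in range(rounds):
        sample = rng.sample(leaves, len(markers))
        simulated = cumulative_nearest(parent, sample, max_nodes)
        for k in range(max_nodes):
            if simulated[k] >= observed[k]:
                at_least_as_good[k] += 1
    return observed, [count / rounds for count in at_least_as_good]

# Toy guide tree (hypothetical): a complete binary tree with leaves 16..31.
toy_parent = {node: node // 2 for node in range(2, 32)}
toy_leaves = list(range(16, 32))
toy_markers = [16, 17, 18, 19]  # four markers sitting in one corner of the tree

observed, fractions = monte_carlo_nearest(toy_parent, toy_leaves, toy_markers, rounds=2000)
for k, (obs, frac) in enumerate(zip(observed, fractions), start=1):
    print(f"<= {k} nodes: {obs} markers, {frac:.3f} of simulations at least as good")
```

A fraction below 0.05 at a given number of nodes corresponds to the criterion used in the text: fewer than 5% of the random gene sets cluster at least as well as the real markers.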
[Figure 14.3: plot of the cumulative number of markers (y‐axis, 0 to 120) against the number of nodes (x‐axis, 1 to 9), with curves for the known literature markers, the 95th percentile of the simulations and the median of the simulations.]
Figure 14.3: Results of closest markers – the cumulative number of markers versus the 95th percentile and median of 10,000 Monte Carlo simulations. Up to seven nodes, less than 5% of the simulations perform at least as well as the methylation markers.
About a third of the known markers can be found in such a cluster, where the minimal distance between two markers is less than or equal to seven nodes. In less than 5% of the simulations with randomly chosen genes, at least the same degree of clustering is detected. The other validation strategy (complete profile) shows similar results: for 8 nodes or less, less than 5% of the simulations had an equal or better performance compared to the markers (Supplementary figure 1).
Both statistical validation strategies show that the literature markers are more clustered than one would expect if they were randomly distributed across the alignment. It is very unlikely that the observed degree of clustering occurs by chance. The promoter alignment is thus, to a certain degree, able to cluster methylation‐prone sequences, indicating that there might be sequence elements or features (closely related to the TSS) that contain information about whether the sequence is methylation‐prone during cancer development or not.
Applications to use the broad approach in experimental studies
Once the genome‐wide alignment is made, this methodology can easily be used to screen candidate methylation markers, based on initial experimental results. An example of such an application is micro‐array expression data of cell lines before and after treatment with demethylating agents. The probes/genes that are reactivated after treatment are possibly interesting candidates. However, many probes/genes show reactivation. Screening these candidates using the broad approach could help to prioritize genes.
To easily perform this kind of analysis, we have created an easy‐to‐use web interface that combines the data in the promoter alignment and PubMeth. The input consists of the candidate genes the user would like to check; the output is a table with an overview of which markers are closely related in the alignment and in which cancer types they have been described as methylated. The web interface (example of results in Figure 14.4) is freely available at http://matrix.ugent.be/promoter/.
[Figure 14.4: table with one row per candidate gene (here TNFRSF10D, CADM1 and CRABP1). Columns: Gene; Marker; Gene is marker in; and, for each distance from 1 to 7 nodes, the number of neighboring markers (1N to 7N) followed by the names of those markers and the cancer types in which they are methylated. Neighboring markers shown in the example include ALX4, PLAGL1, NMU and THBS4.]
Figure 14.4: Example of the result of the application of the broad approach on user data. For each candidate, neighboring methylation markers are searched in the alignment and displayed with the cancer types they have been identified in (with links back to PubMeth to see literature details).
14.3.2 Deep-analysis
The following seven motifs (GGGC*GC*C, GCC*GCAC, CTGGG*GA, CCC**GCGCC, AGCTG**CT, A*GGC*GGG, A*CGC*GCC; where * represents any nucleotide) were found to be overrepresented in Class A (cancer‐specific methylation) versus Class B (tissue‐specific methylation). Classification was validated by 10‐fold stratified cross‐validation. Using different classification techniques, we obtained up to 100% precision and specificity. Screening the 8,793 genes extracted from DBTSS, 261 genes were identified with at least four different motifs in the 1 kb region around the TSS. This cut‐off was chosen arbitrarily, mainly to raise the success rate and to be stringent enough to obtain a feasible number of selected genes. To reduce the risk of overfitting in the classification and to demonstrate the potential biological importance of these motifs, we performed additional in silico validation.
First, the chromosomal location of the 7 motifs was examined. The seven patterns were highly enriched in CpG islands. Even when taking the CG content into account, the patterns occurred around twofold more often than expected in CpG islands. Within the promoter region, the patterns were found to be overrepresented in a small window of 250 bp around the TSS, even though a 1 kb window had been used to find the patterns. This observation supports a potential role for methylation in regulating initiation of transcription.
Secondly, the orthologous sequences of the promoters of genes in Class A in rat, mouse and chicken were aligned with the human promoter sequence. This alignment showed that the patterns were more conserved in rat and mouse than the complete promoter sequence. This conservation was less clear or absent in chicken. The results are displayed in Supplementary figure 2. A control pattern (GCC*GGGC, a pattern that occurred at about the same frequency in both Classes A and B and had a CG content comparable to that of the selected motifs) was as stable during evolution as the rest of the surrounding sequence. This is an indication that the motifs could have a biological function, as they are better conserved during evolution than the surrounding promoter sequence.
Since this biological function might be a transcription factor binding site, we compared the 7 motifs with a transcription factor binding site database (TRANSFAC 6.0) (Krull et al., 2003) using a Kullback‐Leibler based distance measure. Although some transcription factor binding sites are similar (MEF‐2, AML‐1, ER, GR, Pax‐2, Sp‐1, EGR‐3, RFX‐1, HNF‐4), the 7 motifs might serve as novel transcription factor binding sites.
In the complete profile methodology of the broad analysis, the number of nodes from a marker to all other markers was taken into account. In this way 307 profiles (one for each of the methylation markers) were obtained: the cumulative number of methylation markers as a function of the number of nodes away. Per node, the average number of cumulative markers was calculated. Next, as many random genes as markers were chosen, and this was repeated 1000 times. Per simulation, the average number of cumulative selected genes was calculated per node, and it was determined how many of the simulations performed at least as well as the real markers. For 8 nodes or less, less than 5% of the simulations had an equal or better average number of genes compared to the markers (Supplementary figure 1).
About a third of the known markers can be found in such a cluster, where the minimal distance between two markers is less than or equal to seven nodes. In less than 5% of the simulations with randomly chosen genes, at least the same degree of clustering is detected. This result shows that elements in the primary DNA sequence around the TSS at least partially determine whether the gene could be methylated in cancer.
Both strategies show that the literature markers are more clustered than one would expect if they were randomly distributed across the alignment. The promoter alignment is, to a certain degree, able to cluster methylation‐prone sequences, indicating that there might be sequence elements or features that contain information about whether the sequence is methylation‐prone during cancer development or not.
14.3.3 Application: Marker identification and experimental validation of proposed markers
Some regions on the circle representation of the guide tree of the alignment seemed to be denser in known markers than others, indicating that there might be a sequence mechanism located in the small region around the Transcriptional Start Site (TSS) which makes certain genes more methylation‐prone. We would like to stress that, due to the nature of the guide tree and the assumptions made in the visualization routine, we are not able to make general phylogenetic inferences from this tree. Of the 4,756 sequences used for the alignment, 245 were found clustered (less than 5 nodes) with the 56 known genes methylated in cancer but not in normal tissues (selected genes listed in Supplementary Table 1).
We then excluded 132 genes which did not pass the reactivation filter or were already reported to be cancer‐specifically methylated, leaving 113 genes (245 − 132) which were located close to known markers. Combined with the data from the deep approach (261 genes, of which 97 pass the reactivation filter), 200 genes (10 genes are in common) were selected that pass the reactivation filter. Of these 200 candidate markers, 25 known methylated genes (as found by literature search) were omitted, as this study focuses on undescribed methylation markers.
The remaining 175 genes were first tested on 22 human cancer cell lines of 5 different cancer types. 82 genes (47%) were indeed methylated in at least one cell line. To determine whether the genes methylated in cancer cell lines were cancer‐specific, we investigated promoter methylation in a limited number (n=10‐15 for tumors, n=2‐12 for normal tissue) of various primary tumors and age‐matched normal tissues by bisulfite sequence analysis, COBRA, and/or MSP.
Out of the 82 genes which showed methylation in cell lines, promoter methylation was detected in 53 (65%) genes in primary tumor tissues. After testing corresponding age‐matched normal tissues, 28 of these genes were identified to be methylated in a cancer‐specific manner. In total, 28/175 (16%) novel cancer‐specific methylated genes were identified through our combination of a computational approach and empiric studies.
This selection and validation flow is given in Figure 14.5. A summary of our analysis of all 175 genes is given in detail in Table 15.1. As the experimental results greatly add to the knowledge of the cancer methylome in various cancer types, the full results are discussed in a separate paper (Hoque et al., 2008) (see Chapter 15: Genome‐wide promoter analysis uncovers portions of the cancer methylome). In this follow‐up paper, apart from the overall results introduced here, the performance of the eight most promising novel methylated genes is discussed in 13 different cancer types.
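The counts reported in this section can be checked with a few lines of arithmetic. The sketch below only recomputes the numbers given in the text; the variable names and the helper pct are illustrative and not part of the original analysis.

```python
def pct(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)

# Selection step (numbers taken from the text)
broad_selected = 245
broad_excluded = 132            # failed the reactivation filter or already reported
broad_retained = broad_selected - broad_excluded   # 113
deep_retained = 97              # deep-approach genes passing the reactivation filter
in_common = 10                  # genes selected by both approaches

candidates = broad_retained + deep_retained - in_common   # 200
known_markers = 25
novel_candidates = candidates - known_markers             # 175

# Experimental validation step
methylated_cell_lines = 82
methylated_primary = 53
cancer_specific = 28

print(f"candidates: {candidates}, novel candidates: {novel_candidates}")
print(f"methylated in cell lines: {pct(methylated_cell_lines, novel_candidates)}%")
print(f"of those, methylated in primary tumors: {pct(methylated_primary, methylated_cell_lines)}%")
print(f"cancer-specific among all tested genes: {pct(cancer_specific, novel_candidates)}%")
```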
[Figure 14.5: flow chart. DBTSS: 8793 promoter regions; CpG island prediction: 4728 promoter regions; sequence filter: deep approach 261 genes, broad approach 245 genes, 10 genes in common; micro‐array (~14,500 genes, 22 cell lines): reactivation filter; 200 genes, of which 25 known methylated genes and 175 potentially novel methylated genes; 82/175 methylated in cell lines; 53/82 methylated in primary tissues; 28/53 cancer‐specific methylation and 25/53 tissue‐specific methylation.]
Figure 14.5: Schematic flow illustrating the selection and validation flow
14.4 Discussion
Most studies on DNA methylation in cancer have focused on a candidate gene approach where a tumor suppressor or previously reported methylated gene is tested in another type of cancer. Although a number of studies
have attempted to detect additional gene targets, in general, the methodologies have not been sensitive enough to identify a representative population of defined gene sequences, large enough for bioinformatics analysis.
However, in epigenetics, there is a fast adaptation and development of computational approaches (Bock and Lengauer, 2008). Some of the studies discussed show that it is possible to predict whether a promoter has a higher chance of being methylated in certain circumstances. We developed two approaches to enrich for cancer‐specific methylation.
The deep approach uses sequence motifs. This pattern‐based approach is similar to that of the other articles discussed earlier. This methodology was used before in a similar application, and four of the patterns we identified were about 90% identical to the two best “methylation‐proneness” motifs reported by Feltus et al. (Feltus et al., 2003). The motifs are: AGCTG**CT; A*GGC*GGG; GGGC*GC*C; GGCTGCGGGGGCAGCAGCTG; CTGGG*GA; AAGAAGGGAGAGAAGGAGGAA.
The patterns might be binding sites for transcription factors or be only indirectly involved. Such an indirect effect could run through another epigenetic mark: histone modification. Recently, it became clear that the Polycomb group (PcG) of proteins catalyzes the addition of a methyl group at lysine 27 of histone H3 (H3K27me) (Margueron et al., 2005). Polycomb‐mediated gene silencing is initiated by methylation of H3K27 by EZH2, a component of a protein complex (PRC2). DNA methylation and H3K27me seem to be interconnected (Villa et al., 2007), and thus it is possible that the sequence patterns discussed here are recognized by the Polycomb group of proteins and that the DNA is methylated at a later stage.
On the other hand, the broad approach using a genome‐wide alignment is innovative. This methodology is intuitive, allows fast candidate selection and does not depend on the chosen sequence attributes or classification strategies. This is an advantage, as most described selection strategies rely on the training of datasets using various methods (support vector machines, neural networks, ...) and are based on certain sequence attributes. Apart from the fact that the classifiers used are often very complex to understand, one has to decide which sequence attributes to use and there is always a risk of overfitting. In addition, the results of the broad analysis are not influenced by the chosen classification methodology.
This allows completely unbiased selection of candidate markers. We have used the broad selection strategy in other recently started projects as well (more specifically in neuroblastoma and cervical cancer). The results are promising and experimental validation studies are currently ongoing. Through the web interface we provide, users are able to screen their own methylation candidates in order to prioritize them for experimental validation.
A disadvantage of the broad approach is that the informative sequence attributes are now in fact hidden. Which parts or features of the sequence might be important in controlling epigenetic modifications cannot be extracted.
Our analysis shows that many methylated genes are located within defined genomic clusters and are associated with common sequence motifs. These findings strongly suggest that the methylation modification process does not occur in a random manner.
A large‐scale experimental validation study was performed. This study combines both computational selection strategies with an established methylation screening assay. Predicted candidate markers were filtered: their expression had to be reactivated after treatment with demethylating agents in at least one cancer cell line.
The success rate of the broad approach seems to be higher than that of the deep approach. Originally, in the deep approach, 261 genes were selected, of which only about 25% passed the reactivation filter. Of these remaining genes, 44% are actually methylated in cancer cell lines. In the broad approach 245 genes were selected, of which 43% passed the reactivation filter. After the reactivation filter, the success rates are about identical: 45% of the selected genes are actually methylated in cancer cell lines. This shows that proximity in the genome‐wide alignment is more successful than the occurrence of certain sequence motifs. Probably there are other sequence attributes than the selected DNA patterns that are better suited for methylation prediction. Another reason might be that the deep approach was used to identify a whole range of promoters, even those without a CpG island, while in the alignment only promoters with a clear CpG island were used, which thus had a higher chance of actually being methylated. Despite these observations, the success rate after the reactivation filter is similar for the two approaches. Strikingly, there is almost no overlap between the two approaches, meaning that the combination of the approaches adds valuable information.
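For reference, the selection rule of the deep approach compared here (at least four different motifs in the 1 kb region around the TSS) can be sketched as follows. This is an illustration under stated assumptions, not the screening code used in the study: * is read as any of A, C, G or T, only the given strand is scanned, and the function names are hypothetical.

```python
import re

# The seven motifs reported for Class A; '*' stands for any nucleotide.
MOTIFS = [
    "GGGC*GC*C", "GCC*GCAC", "CTGGG*GA", "CCC**GCGCC",
    "AGCTG**CT", "A*GGC*GGG", "A*CGC*GCC",
]

def motif_to_regex(motif):
    """Translate the wildcard notation into a regular expression."""
    return re.compile(motif.replace("*", "[ACGT]"))

PATTERNS = {motif: motif_to_regex(motif) for motif in MOTIFS}

def motifs_present(sequence):
    """Return the set of motifs that occur at least once in the sequence."""
    seq = sequence.upper()
    return {motif for motif, regex in PATTERNS.items() if regex.search(seq)}

def passes_deep_filter(sequence, min_motifs=4):
    """Apply the cut-off of at least four different motifs (assumed to be per promoter window)."""
    return len(motifs_present(sequence)) >= min_motifs

# Hypothetical promoter fragment carrying four of the motifs, separated by 'TT'.
example = "GGGCAGCTC" + "TT" + "GCCAGCAC" + "TT" + "CTGGGAGA" + "TT" + "AGCTGAACT"
print(sorted(motifs_present(example)))
print(passes_deep_filter(example))
```

In practice the input would be the 1 kb window around each TSS; whether the reverse strand was also scanned is not stated in the text, so it is left out here.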
By developing a new methodology to analyze gene promoters in combination with a relatively large expression microarray dataset, it has been possible for the first time to identify a large number of target genes in a completely unbiased manner, allowing examination of the logic of de novo methylation. In our experience, this is a major advance over previous empiric techniques that required excessive experimental effort and yielded only a few (<0.5%) cancer‐specific methylated genes (Suzuki et al., 2002; Yamashita et al., 2002). Our approach, based on a combination of re‐expression arrays and promoter sequence filters, provided a nearly 500‐fold higher yield of genes harboring promoter methylation. The computational approach drastically narrows down the candidate gene list, compared with directly testing reactivated genes from the pharmacologic unmasking. Both computational strategies, combined with the reactivation filter, raise the success rate and make it possible to find novel methylation markers within a set composed of a feasible number of genes to test.
There has been much discussion about which genes should be the focus of future efforts for methylation analysis. Our results suggest that many genes not previously implicated in cancer are methylated at significant levels and may provide novel clues to cancer pathogenesis. From our data, it seems that large‐scale unbiased screens of genes with CpG‐rich promoter regions may yield many new genes and patterns for future explorations.
The experimental validation study demonstrates that computational approaches can predict cancer‐specific methylated genes and reduce empiric testing. These approaches are likely to improve as more methylated genes are discovered and characterized. The experimental analysis contributes greatly to the emerging epigenomic map of DNA methylation in the human genome. Complete, detailed results are available in a separate paper (Hoque et al., 2008): see Chapter 15: Genome‐wide promoter analysis uncovers portions of the cancer methylome.
Additional studies using similar and complementary genomic strategies should yield further insights into the dynamics and hierarchy of epigenetic regulation during tumorigenesis. These data could define the epigenetic landscape of major human cancer types, provide new targets for diagnostic and therapeutic intervention, and open fertile avenues for basic research in tumor biology.
Chapter 15: Genome‐wide promoter analysis uncovers portions of the cancer methylome
Paper 6: Genome‐wide Promoter Analysis Uncovers Portions of the Cancer Methylome
Hoque MH, Kim MS, Ostrow KL, Liu J, Wisman GBA, Park HL, Poeta ML, Jeronimo C, Lendvai HA, Schuuring E, Begum S, Rosenbaum E, Ongenaert M, Yamashita K, Califano J, Westra W, van der Zee AGJ, Van Criekinge W, Sidransky D. As published in Cancer Research, 68 (8): 2661‐2670
DNA methylation has a role in mediating epigenetic silencing of CpG island genes in cancer and other diseases. Identification of all gene promoters methylated in cancer cells, “the cancer methylome”, would greatly advance our understanding of gene regulatory networks in tumorigenesis. We previously described a new method of identifying methylated tumor suppressor genes based on pharmacologic unmasking of the promoter region and detection of reexpression on microarray analysis. In this study, we modified and greatly improved the selection of candidates based on a new promoter structure algorithm and microarray data generated from 20 cancer cell lines of 5 major cancer types. We identified a set of 200 candidate genes that cluster throughout the genome, of which 25 were previously reported as harboring cancer‐specific promoter methylation. The remaining 175 genes were tested for promoter methylation by bisulfite sequencing or methylation‐specific PCR (MSP). 82 of 175 (47%) genes were found to be methylated in cell lines and 53 of these 82 genes (65%) were methylated in primary tumor tissues. From these 53 genes, cancer‐specific methylation was identified in 28 genes (28/53; 53%). Furthermore, we tested 8 of the 28 newly identified cancer‐specific methylated genes with quantitative MSP in a panel of 300 primary tumors representing 13 types of cancer. We found cancer‐specific methylation of at least one gene with high frequency in all cancer types. Identification of a large number of genes with cancer‐specific methylation provides new targets for diagnostic and therapeutic intervention, and opens fertile avenues for basic research in tumor biology.
15.1 Introduction
Solid human tumors arise and progress through aberrant function of various genes that positively and negatively regulate many aspects of cell function, including proliferation, apoptosis, genome stability, angiogenesis, invasion and metastasis (Hanahan and Weinberg, 2000). Discovery and functional assessment of these genes is essential for understanding the biology of cancer and for clinical applications, including identification of therapeutic targets, early cancer detection and improved prediction of cancer risk and disease course. Many factors can affect gene function, including genetic alterations as well as epigenetic modifications.
Epigenetic modifications are defined as all meiotically and mitotically heritable changes in gene expression that are not coded in the DNA sequence itself. Methylation of the C5 position of cytosine residues in DNA has long been recognized as an epigenetic silencing mechanism of fundamental importance (Holliday and Pugh, 1975; Riggs, 1975). DNA methylation alters chromosome structure, inhibits the binding of proteins such as CTCF (a candidate tumor suppressor protein that binds to highly divergent DNA sequences), and defines regions of transcriptional regulation (Hark et al., 2000). DNA methylation can also promote the binding of proteins such as MECP2, MBD1, MBD2, MBD3 and MBD4, which induce histone modification (Jaenisch and Bird, 2003).
CpG dinucleotides are found at increased frequency in the promoter region of many genes, and methylation in the promoter region is frequently associated with “gene silencing”; i.e., the gene is not expressed in the presence of methylation but is expressed in its absence (Leonhardt et al., 1999). Both global hypomethylation and gene‐specific promoter hypermethylation are associated with malignancy (Ehrlich, 2002; Momparler, 2003), and studies in animals and in humans have demonstrated that these epigenetic changes are an early event in carcinogenesis and are present in the precursor lesions of a variety of cancers including lung (Belinsky et al., 1998), head and neck (Hoque et al., 2005a), and colon (Esteller et al., 2000).
Challenges in analyzing CpG island (CGI) methylation include distinguishing islands from repetitive DNA sequences, which are usually heavily methylated, and identifying those that regulate gene expression. In an effort to identify important tumor suppressor genes silenced by promoter methylation, genome‐wide screening techniques to detect differences in DNA methylation were developed. Many of these studies documented that when CGI methylation in promoter regions is appropriately validated, expression of downstream genes is almost always found to be severely repressed or absent (Suzuki et al., 2002; Yamashita et al., 2002). However, in searching for markers of early cancer detection and prognosis, promoter methylation does not necessarily need to correlate with severely reduced expression, as long as the methylation pattern is specific to neoplastic cells and is associated with clinically important information.
In this study, we used advanced bioinformatics tools (Ongenaert et al., 2007; submitted to PLoS Computational Biology) and robust datasets from cancer cell lines treated with demethylating agents to identify novel cancer‐specific methylated genes. We then used bisulfite DNA sequencing, conventional methylation‐specific PCR (MSP) and quantitative MSP (QMSP) to confirm cancer‐specific methylation in a large number of novel genes. Our results confirm computational prediction of methylated CpG sites in cancer through extensive experimentation. Moreover, this approach has greatly expanded our knowledge of methylated promoters in cancer cell lines and primary tumors and has led to the discovery of a substantial portion of “the cancer methylome”. It sets the stage for rapid and full elucidation of methylated gene targets and pathways in human cancer and will likely lead to the rapid development of diagnostic markers and new targets for therapeutic intervention.
15.2 Materials and methods
15.2.1 Cell lines
We used 20 different human cancer cell lines (breast: BT‐20, MCF‐7, MDA‐MB 231 and MDA‐MB 436; prostate: 22rv1, DU145, LNCaP, PC3; lung: HTB‐58, HTB‐59, A549 and H23; colon: DLD‐1, HCT116, RKO and SW480; and cervix: Hela, Siha, CSCC7 and CSCC8) in this study. Cell lines were propagated in accordance with the instructions from the American Type Culture Collection. Details of the cell lines and their cell of origin are given in Supplementary Table 1.
15.2.2 5‐aza‐dC treatment of cells
We seeded all cell lines (1 x 10^6) in their respective culture medium and maintained them for 24 h before treating them with 5 µM 5‐aza‐dC (Sigma, St. Louis, MO) for 3 days. We renewed the medium containing 5‐aza‐dC every 24 h during the treatment. We handled control cells the same way, without adding 5‐aza‐dC. Stock solutions of 5‐aza‐dC were prepared in phosphate‐buffered saline (PBS, pH 7.5). We prepared total RNA using the RNeasy Mini Kit (Qiagen).
15.2.3 Biotinylated RNA Probe Preparation and Hybridization
Several versions of Affymetrix arrays were used for gene expression profil‐
ing per the manufacturer's instruction. Hu95A.V2 arrays containing 12,500 human genes were used for the 2 lung squamous cancer cell lines. HGU 133 plus 2 arrays with more than 55,000 probes for analysis of over 47,000 human transcripts were used for pro‐
filing the 4 cervical cancer cell lines. For the remaining 16 cell lines we used GeneChip Human Genome U133A Arrays containing over 22,000 probe‐
sets for analysis of over 18,400 tran‐
scripts, which include ~14,500 well‐
characterized human genes. Seven µg of total RNA were used for the preparation of double‐stranded cDNA using a Superscript choice sys‐
tem and an oligo(dT)24‐anchored T7 primer (Invitrogen). The cDNA was then used as a template to synthesize a biotinylated cRNA for 5 h at 37°C with aid of the BioArray High Yield RNA Transcript Labeling Kit (Enzo Diagnos‐
tics, Inc.). In vitro transcription prod‐
ucts were purified using RNeasy spin columns (Qiagen). Biotinylated RNA was then treated for 35 min at 94°C in a buffer composed of 200 mM Tris‐
acetate (pH 8.1), 500 mM potassium acetate, and 150 mM magnesium ace‐
tate. Affymetrix array chips were hybridized with biotinylated cRNA (15 µg/chip) for 16 h at 45°C using the hybridization buffer and control pro‐
vided by the manufacturer (Affy‐
metrix). The GeneChip Fluidics Station 400 (Affymetrix) was used for washing and staining the arrays. A three‐step pro‐
tocol was used to enhance the detec‐
tion of the hybridized biotinylated cRNA. First, arrays were incubated with a streptavidin‐phycoerythrin conjugate followed by labeling with an anti‐streptavidin goat‐biotinylated antibody (Vector Laboratories) and then were stained again with the streptavidin‐phycoerythrin conjugate. The chips were then scanned using a Hewlett Packard scanner. The excita‐
tion source was an argon ion laser, and a photomultiplier tube detected the emission through a 570‐nm long pass filter. Digitized image data were proc‐
essed using the GeneChip software (version 3.1) available from Affy‐
metrix. 15.2.4 Analysis of Expression Data
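The call‐based expression score used in this section (one point for each 'P' call in a 5‐aza‐dC‐treated array, one point for each 'A' call in an untreated array, summed per probe) can be illustrated with a minimal sketch. It assumes that the points are summed over cell lines; the probe names, the input layout and the simple descending sort are hypothetical and stand in for, but do not reproduce, the relaxation ranking algorithm.

```python
from typing import Dict, List, Tuple


def expression_score(treated_calls: List[str], untreated_calls: List[str]) -> int:
    """Sum of P-score and A-score for one probe.

    treated_calls:   detection calls ('P', 'M' or 'A') in 5-aza-dC-treated arrays
    untreated_calls: detection calls in the matching untreated arrays
    Assumption: one point per cell line, summed over all cell lines.
    """
    p_score = sum(1 for call in treated_calls if call == "P")
    a_score = sum(1 for call in untreated_calls if call == "A")
    return p_score + a_score


def rank_probes(calls: Dict[str, Tuple[List[str], List[str]]]) -> List[Tuple[str, int]]:
    """Order probes by descending score (ties broken by probe name).

    This plain sort is an illustration only, not the relaxation ranking algorithm.
    """
    scored = [(probe, expression_score(t, u)) for probe, (t, u) in calls.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical calls for three probes across four cell lines.
example_calls = {
    "probe_1": (["P", "P", "A", "P"], ["A", "A", "A", "M"]),
    "probe_2": (["P", "A", "A", "A"], ["P", "P", "A", "P"]),
    "probe_3": (["A", "A", "A", "A"], ["A", "A", "A", "A"]),
}
ranking = rank_probes(example_calls)
```

A probe that is absent in every untreated array and present in every treated array reaches the maximum score, which is the reactivation pattern the screen looks for.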
We computed gene expression summary values for Affymetrix GeneChip data using Bioconductor (Gentleman et al., 2004), which applies background adjustment, quantile normalization and summarization. Raw data quality was assessed using intensity plots and RNA degradation plots (data not shown). In a second stage, the retained data sets for each cell line of each cancer type were normalized using the MAS5 algorithm (Affymetrix software). We also normalized among the cell lines of each cancer type and among cell lines of all cancer types analyzed (data not shown). The expression calls 'P' (present), 'M' (marginal) and 'A' (absent) were used to prioritize the different probes/genes using the relaxation ranking algorithm. A 'P' call in a 5‐aza‐dC treatment data set was assigned a score of 1 (P‐score) and an 'A' call in a non‐treatment data set was assigned a score of 1 (A‐score). For each probe/gene, the expression score was calculated as the sum of the P‐score and A‐score. We then used the previously published algorithm to select candidate genes (Yamashita, Upadhyay et al. 2002), modified by further selection of promoters with structural and sequence similarities to genes empirically found to be methylated (Ongenaert et al., 2008). Brief descriptions of this approach are as follows:
15.2.5 BROAD analysis: Genome‐wide Promoter Alignment
This methodology is explained in section 14.2.2 (Broad‐analysis: genome‐wide promoter alignment).
15.2.6 DEEP analysis: Specific Binding Patterns
The methodology is described in section 14.2.3 (Deep analysis: specific binding patterns).
15.2.7 Tissue samples and DNA extraction
We evaluated tissue samples from 13 different types of primary cancers (total 300 human samples). Tissue samples from 106 age‐matched indi‐
viduals without a history of malig‐
nancy were used as controls. Tissue samples were microdissected to isolate more than 70% epithelial cells in both neoplastic and non‐
neoplastic tissues. Cell pellets were digested with 1% SDS and 50 μg/ml proteinase K (Boehringer Mannheim, Germany) at 48°C overnight, followed by phenol/chloroform extraction and ethanol precipitation of DNA as previously described (Hoque, Lee et al. 2003).
15.2.8 Bisulfite Genomic Sequence Analysis, Conventional MSP, QMSP
Bisulfite sequence analysis was per‐
formed to determine the methylation status in cell lines and a limited num‐
ber of tissues including primary tu‐
mors and age‐matched normal con‐
trols from the same organ. We extracted genomic DNA as above and carried out bisulfite modification of genomic DNA as described previously (Hoque et al., 2006). Bisulfite‐treated DNA was amplified for the 5' region that in‐
cluded at least a portion of the CGI within 1 kb of the proposed Transcrip‐
tional start site (TSS) using primer sets (Supplementary Table 5). The primers for bisulfite sequencing were designed to hybridize to regions in the promoter without CpG dinucleotides. PCR products were gel‐purified using the QIAquick Gel Extraction Kit (Qiagen) according to the manufac‐
turer’s instructions. Each amplified DNA sample was sequenced by the Applied Biosystems 3700 DNA ana‐
lyzer using nested, forward or reverse primers and BD terminator dye (Ap‐
plied Biosystems, Foster City, CA). When necessary, MSP primers were designed to amplify methylated or unmethylated DNA. For high throughput analysis we de‐
veloped Quantitative Methylation Specific PCR (QMSP) for 8 genes. Briefly, bisulfite‐modified DNA was used as template for fluorescence‐
based real‐time PCR, as previously described (Hoque et al., 2005b). Amplification reactions were carried out in triplicate in a volume of 20 μL that contained 3 µL bisulfite‐modified DNA, 600 nM forward and reverse primers, 200 nM probe, 5 U of Platinum Taq polymerase (Invitrogen), 200 μM each of dATP, dCTP, dGTP and dTTP, and 2.5 to 5.5 mM MgCl2. Primers and probes were designed to specifically amplify the promoters of the 8 genes of interest and the promoter of a ref‐
erence gene, actin‐B (ACTB). Primer and probe sequences and annealing temperatures are provided in Supple‐
mental Table 6. Amplifications were carried out using the following profile: one step at 95°C for 3 minutes, followed by 50 cycles of 95°C for 15 seconds and 60°C to 62°C for 1 minute. Supplemental Table 7 lists the 8 genes whose pro‐
moters were examined by QMSP, their proposed functions, and the tumors in which these promoters have been shown to be hypermethylated. Ampli‐
fication reactions were carried out in 384‐well plates in a 7900 Sequence Detector (Perkin‐Elmer Applied Bio‐
systems) and were analyzed by SDS 2.2.1 (Sequence Detector System) (Applied Biosystems). Each plate in‐
cluded patient DNA samples, positive (in vitro methylated leukocyte DNA) and negative (normal leukocyte DNA or DNA from a known unmethylated cell line) controls, and multiple water blanks. Leukocyte DNA from a healthy individual was methylated in vitro with excess SssI methyltransferase (New England Biolabs Inc., Beverly, MA) to generate completely methy‐
lated DNA, and serial dilutions (90 to 0.009 ng) of this DNA were used to construct a calibration curve for each plate. All samples were within the assay’s range of sensitivity and reproducibility based on amplification of the internal reference standard (CT value for ACTB of 40 or less). The relative level of methylated DNA for each gene in each sample was determined as the ratio of the methylation‐specific PCR value for the amplified gene to that of ACTB, multiplied by 1000 for easier tabulation (average value of triplicates of gene of interest / average value of triplicates of ACTB × 1000). The sam‐
ples were categorized as unmethy‐
lated or methylated based on detec‐
tion of methylation above a threshold set for each gene. This threshold was determined by analyzing the levels and distribution of methylation (if any) in normal (non‐neoplastic) age‐
matched tissues and by maximizing the sensitivity and specificity.
15.3 Results
We modified the methylated gene discovery algorithm applied in our previous studies (Kim et al., 2006; Sjoblom et al., 2006; Tokumaru et al., 2004; Yamashita et al., 2002), which required excessive experimental effort and time for a relatively small yield, by including only those targets whose promoter patterns resemble those of known cancer‐specific methylated genes. Briefly, we used two selection rules to identify candidate methylated genes in the human genome. From the Data‐
Base of Transcriptional Start Sites (DBTSS), we identified genes with well‐characterized transcription start sites (TSS) included on Affymetrix expression microarrays. We then de‐
veloped a bioinformatics approach based on two criteria to predict can‐
cer‐specific methylated genes. In one part of the analysis, we assumed se‐
quence homology in the promoter regions of known methylation‐prone genes and the estimated sequence length containing CpG islands (CGIs). In a further analysis, we identified 7 overrepresented sequence patterns in a learning set of known cancer‐specific methylated genes versus tissue‐
specific methylated genes. We then applied these patterns to real data sets generated in 20 cancer cell lines from the most common types of cancer treated with 5 µM 5‐aza‐2'‐
deoxycytidine (5‐aza‐dC) to reactivate gene transcripts silenced by promoter methylation. After treatment, we measured changes in gene expression using Affymetrix microarrays. The gene filtering approach and data analysis are depicted in Figure 15.2; a chromosomal map of the 200 initially selected genes is given in Figure 15.1.
Figure 15.1: Chromosomal map of the 200 selected genes
Figure 15.2: Flowchart for selection of candidate tumor suppressor genes (TSGs). We used 20 cancer cell lines of 5 different major cancers to screen for candidate TSGs after microarray analysis of cells treated with 5 μM 5‐aza‐dC (reactivation filter). Coupling the sequence and reactivation filters, we obtained over 200 unique genes. We diminished the number of candidates by excluding 25 genes which were previously reported as methylated. By empiric testing, we found 53 genes which harbored promoter hypermethylation in primary tumor tissues by direct sequence analysis or MSP. Twenty‐five of these 53 genes were methylated in both normal and tumor tissues. Twenty‐eight of these 53 genes showed cancer‐specific methylation. QMSP was developed for 8 genes for high throughput analysis in multiple tissue types. CE=Cervical cancer, LU=Lung cancer, HN=Head & neck cancer, TH=Thyroid cancer, CO=Colon cancer, PA=Pancreatic cancer, GA=Gastric cancer, ES=Esophageal cancer, OV=Ovarian cancer, BR=Breast cancer, BL=Bladder cancer, PR=Prostate cancer, KI=Kidney cancer.
We considered a gene to be reactivated if reexpression occurred in at least one cell line of any particular cancer type.
15.3.1 Validation of Modified Approach in Cell Lines
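The validation in this section calls a promoter methylated when at least 50% of the CpG sites in the sequenced part of the CGI are methylated. A minimal sketch of that rule follows; the per‐site boolean input (True when a CpG reads as methylated after bisulfite sequencing) and the function names are assumptions made for illustration, not the authors' pipeline.

```python
from typing import Sequence


def methylated_fraction(cpg_calls: Sequence[bool]) -> float:
    """Fraction of CpG sites called methylated in the sequenced region.

    cpg_calls: one boolean per CpG site; True means the cytosine was retained
    after bisulfite conversion (hypothetical input format).
    """
    if not cpg_calls:
        raise ValueError("at least one CpG site is required")
    return sum(cpg_calls) / len(cpg_calls)


def is_promoter_methylated(cpg_calls: Sequence[bool], threshold: float = 0.5) -> bool:
    """Apply the >=50% methylated CpG criterion described in the text."""
    return methylated_fraction(cpg_calls) >= threshold


# Hypothetical region with 10 CpG sites, 6 of them methylated.
region = [True] * 6 + [False] * 4
methylated = is_promoter_methylated(region)
```

Keeping the threshold as a parameter makes it easy to see how the call changes for regions that sit close to the 50% boundary.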
Out of the 200 genes predicted to be methylated by our modified approach, a literature search (PubMed search words: [particular gene name] and [methylation]) identified 25 genes already reported to harbor cancer‐specific methylation. To validate the remaining 175 genes, we designed primers for each gene and tested each one by bisulfite sequence analysis, combined bisulfite restriction analysis (COBRA), and/or MSP in one or more cell lines that exhibited reexpression after demethylation treatment. Promoter methylation was documented for 82 of the 175 genes (47%), based on identification of ≥50% methylated CpG sites in the CGI, in contrast to 10% to 20% with the previous algorithm (Tokumaru et al., 2004; Yamashita et al., 2002).
15.3.2 Promoter Hypermethylation in Normal and Primary Tumor Tissues
To determine if the methylated genes in cancer cell lines were cancer‐
specific, we investigated promoter methylation in a limited number (n = 10‐15 for tumors, n = 2‐12 for nor‐
mals) of various primary tumors and age‐matched normal tissues by bisul‐
fite sequence analysis, COBRA, and/or MSP (Supplementary Table 8). Out of 82 genes which showed methylation in cell lines, promoter methylation was detected in 53 (65%) genes in primary tumor tissues. After testing corresponding age‐matched normal tissues, 28 of these genes were identi‐
fied to be methylated in a cancer‐
specific manner. Thus, 28/175 (16%) new cancer‐specific methylated genes were identified through our combina‐
tion of a computational approach and empiric studies. We used age‐matched normal tissue as a control. We considered methylation cancer‐specific if, at an optimal cutoff, its frequency was higher in cancer and it was absent or present at a lower level or frequency in normal tissue. A summary of our analysis of all 200 genes is de‐
tailed in Table 15.1.
Table 15.1: Summary of findings of the 200 candidate markers
Tissue type Lung squamous Lung adeno Breast Prostate Colorectal Cervical Reported methylation in cancer Number of genes tested Methylated in cell lines Methylated in tumor tissues Methylated in normal tissues Cancer­
specific methylation 36 7 21 9/21 6/9 3/6 18 5 13 6/13 2/6 1/2 31 45 48 1 9 4 28 36 44 6/28 16/36 23/44 5/6 8/16 16/23 3/5 5/8 10/16 45 4 33 22/33 16/22 3/16 Total §233 ¤30 175 82/175 53/82 25/53 §233 overlapping genes across the all cancer types ¤5 overlapping in different types of cancer. Thus, # of known cancer‐specific methylated genes is 25 (30‐5) Total experimentally tested genes: 200‐25=175 No methylation was detected in the remaining 93 genes (175‐82) in cell lines or primary tissues. However, we empirically analyzed only 200 to 300 bp of a potential 1 to 2 kb region of the promoter by bisulfite sequencing or MSP for the majority of the genes. Figure 15.3A shows the chroma‐
togram of bisulfite sequencing of pro‐
moter methylation of representative candidate genes in primary tissues and cancer cell lines of five major can‐
cer types. Examples of cell lines exam‐
ined showed methylation of target gene in the region and, significantly, exhibited silencing of mRNA expres‐
sion (Figure 15.3B), suggesting that mRNA expression of these genes was regulated by promoter hypermethylation.
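The yields quoted in sections 15.3.1 and 15.3.2 follow from simple ratios along the selection funnel. The short sketch below recomputes them from the counts given in the text; the variable names are illustrative only.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


predicted = 200                 # genes selected by the modified approach
previously_reported = 25        # already known to be methylated in cancer
tested = predicted - previously_reported                    # 175 genes tested empirically

methylated_in_cell_lines = 82
methylated_in_tumors = 53
also_methylated_in_normal = 25
cancer_specific = methylated_in_tumors - also_methylated_in_normal   # 28
no_methylation_detected = tested - methylated_in_cell_lines          # 93

cell_line_yield = pct(methylated_in_cell_lines, tested)              # 82/175
tumor_yield = pct(methylated_in_tumors, methylated_in_cell_lines)    # 53/82
cancer_specific_yield = pct(cancer_specific, tested)                 # 28/175

print(cell_line_yield, tumor_yield, cancer_specific_yield)
```

The three printed percentages correspond to the 47%, 65% and 16% reported above.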
[Figure 15.3 image. Panel A: methylation‐specific PCR gels for KIF1A and OSMR in cell lines and tissues, and bisulfite sequencing traces for PAK3 and NISCH. Panel B: RT‐PCR for OGDHL and PAK3 with GAPDH control (5‐aza treated vs. mock treated), OSMR relative fold (Control vs. Aza) and the OSMR amplification plot for DLD‐1 (ΔRn vs. cycle).]
Figure 15.3: A. Promoter methylation of representative candidate genes. a, Methylation of KIF1A and OSMR by conventional methylation‐specific PCR in cancer cell lines and primary tissues; M=methylated, U=unmethylated; NBT=normal breast tissues from non‐cancer patients; BTT=breast tumor tissues; NN=normal colon epithelium from non‐cancer patients; PN=paired normal colon tissues from colon cancer patients; PT=paired colon cancer tissues. b, Representative sequencing results of PAK3 and NISCH in cancer cell lines, normal and cancer tissues. Normal tissues were taken from non‐cancer patients. Arrows: all guanines present after sequencing are complementary to methyl cytosines on the opposite DNA strand. B. Re‐expression of representative genes analyzed by semiquantitative RT‐PCR or real‐time RT‐PCR. a and b, Reactivation of PAK3 and OGDHL was observed after 5‐Aza‐dC treatment in H23 and Siha cell lines by semiquantitative RT‐PCR. c, Overexpression of OSMR was observed after 5‐Aza‐dC treatment, analyzed by real‐time quantitative RT‐PCR. Relative fold was calculated as the expression of OSMR mRNA relative to GAPDH (an internal control). Fold increase of OSMR ranged from 5.2 (HCT116) to 2,868 (DLD‐1). Experiments were performed in duplicate, and values indicate means ± SD. *, P<0.05. d, The amplification plot of the OSMR transcript in DLD‐1. A=5‐Aza‐dC treated, M=Mock treated.
15.3.3 Candidate Cancer Genes
The cancer‐specific methylated genes identified in this study are listed in Table 15.2. By modified approach we selected 200 genes and after empiric testing, 28 were newly identified can‐
didate cancer genes (methylated in primary tumors but not in progenitor cells). Overall between 2‐12 new me‐
thylated cancer genes were identified in lung, breast, colon, prostate and cervical cancer. 138 Genome‐wide promoter analysis uncovers portions of the cancer methylome Thyroid Cervix Gastric Prostate Colon Pancreas Lung Esophagus Breast Tissue types ND (0) (54) 8/20 0/10 (100) 13/24 10/10 (29) (10) (50) 4/14 2/20 (15) 10/20 2/13 (45) (0) (22) 9/20 0/2 (6) 7/32 1/17 (47) (0) (26) 16/34 0/10 (10) 5/19 1/10 N (45) PAK3 13/29 T 4/20 (13) 3/24 (43) 6/14 (5) 1/20 (5) 1/20 (0) 0/32 (35) 12/34 (21) 4/19 (33) NISCH 8/24 T ND (10) 1/10 (0) 0/10 (10) 2/20 (0) 0/13 (100) 2/2 (6) 1/16 (0) 0/10 (10) 1/10 N 4/20 (29) 7/24 (75) 9/12 (15) 3/20 (37) 7/19 (80) 24/30 (24) 8/34 (32) 6/19 (41) 12/29 T ND (0) 0/10 (100) 10/10 (35) 7/20 (0) 0/13 (0) 0/2 (6) 1/17 (0) 0/8 (0) 0/ 9 N KIF1A Table 15.2: Methylation frequency in different cancer types 0/20 (17) 4/24 (21) 3/14 (0) 0/20 (20) 4/20 (19) 6/32 (6) 2/34 (11) 2/19 (32) (5) 1/20 (0) 0/13 (0) 0/ 2 (0) 0/17 (0) 0/13 (0) 0/10 N ND (0) 0/10 (100) 10/10 OGDHL 8/25 T 0/19 (4) 1/24 (0) 0/14 (0) 0/20 (75) 16/20 (6) 2/32 (0) 1/34 (5) 1/20 (0) 1/28 T N ND (0) 0/10 (100) 12/12 (0) 0/20 (0) 0/13 (0) 0/2 (0) 0/17 (0) 0/10 (0) 0/10 OSMR 5/19 (17) 4/24 (79) 11/14 (100) 20/20 (90) 18/20 (28) 9/32 (60) 21/34 (60) 12/20 (39) 11/28 ND (0) 0/10 (67) 8/12 (0) 0/20 (0) 0/13 (0) 0/2 (0) 0/17 (10) 1/10 (0) 0/10 N B4GALT1 T 16/21 (61) 14/23 (86) 12/14 (85) 17/20 (90) 18/20 (88) 28/32 (80) 24/30 (100) 19/19 (97) MCAM 28/29 T ND (70) 7/10 (100) 12/12 (0) 0/20 (25) 3/12 (50) 1/2 (88) 15/17 (0) 0/10 (70) 7/10 N 9/17 (39) 9/23 (93) 13/14 (80) 16/20 (100) 20/20 (3) 1/32 (68) 23/34 (83) 15/18 (69) SSBP2 20/29 T ND (40) 4/10 (100) 12/12 (10) 2/20 (25) 3/12 (50) 1/2 (71) 12/17 (0) 0/10 (60) 6/10 N ND=Not done Bladder Ovary Head & Neck Kidney Tissue types 0/6 (0) (15) ND 3/20 (53) 10/19 (0) (52) ND N 0/8 PAK3 13/25 (29) 7/24 (40) T (25) 5/20 (47) 9/19 (36) 9/25 (65) NISCH 13/20 (20) T (17) 1/6 ND (0) 0/8 ND N (20) 4/20 (11) 2/19 (35) 9/25 (5) 1/20 (20) T (0) 0/6 ND (0) 
0/8 ND N KIF1A (0) 0/20 (0) 0/19 (8) 2/25 (0) OGDHL 0/20 (0) T (0) 0/6 ND (0) 0/8 ND N (0) 0/20 (11) 2/19 (0) 0/26 (0) 0/23 (0) T OSMR (0) 0/6 ND (0) 0/8 ND N (45) 9/20 (47) 9/19 (12) 3/26 (43) 10/23 (26) (0) 0/6 ND (25) 2/8 ND N B4GALT1 T (90) 18/20 (100) 19/19 (38) 10/26 (100) 23/23 MCAM (76) T (50) 3/6 ND (71) 5/7 ND N (25) 5/20 (37) 7/19 (35) 9/26 (78) 18/23 SSBP2 (53) T (0) 0/6 ND (57) 4/7 ND N The 200 candidate cancer genes iden‐
tified in this study fell into three categories: a) genes previously observed to be altered in human cancer by methylation; b) genes in which no previous methylation in human cancers was discovered but had been linked to cancer through functional studies; and c) genes with no previous connection with neoplasia.
a) The re‐identification of genes previously shown to be methylated in human cancers represents a critical validation of our modified approach in this study. Most of the genes previously shown to be methylated in cancer were found to harbor promoter methylation in this study. These included APC, SFRP1, FHIT, and TWIST.
b) Although genetic and epigenetic alterations currently provide the most reliable indicator of a gene’s importance in human neoplasia (Momparler, 2003; Varmus, 2006; Vogelstein and Kinzler, 2004), there are many other genes which are thought to play key roles on the basis of functional or expression studies. Our studies provide epigenetic evidence supporting the importance of several of these genes in neoplasia. For example, we discovered cancer‐specific methylation in PAPSS2, TUBG2 and DLL4.
c) In addition to the genes noted above, our study revealed a large number of genes that were not strongly suspected to be involved in cancer. These include NTRK2, ASMLTL, and TFP12. In addition, cancer‐specific methylation was observed in genes for which no biologic role has yet been established, such as OGDHL, C1ORF166, and ARMC7.
15.3.4 New Targets of Aberrant Methylation in Major Types of Cancer by QMSP
We noted that some of the cancer‐specific methylated genes were reactivated and methylated in more than one type of cell line. To determine the frequency of methylation in a larger set of samples and in multiple cancer types, we selected 8 of the most frequently cancer‐specific methylated genes from our list of 28 newly identified genes and developed a quantitative methylation specific PCR (QMSP) assay. We found cancer‐specific methylation at various frequencies for each gene in multiple types of cancer. A high frequency of cancer‐specific methylation for at least one gene was identified in every cancer type, supporting the notion that methylated genes are likely to play a role across multiple cancer types.
15.4 Discussion
Most studies on DNA methylation in cancer have focused on a candidate gene approach where a tumor sup‐
pressor or previously reported methy‐
lated gene is tested in another type of cancer. Although a number of studies have attempted to detect additional gene targets, in general, the gene se‐
lection methodologies have not been sensitive enough to identify target genes with comparatively less time and labor. By developing a new tool to analyze gene promoters in combina‐
tion with a relatively large expression microarray dataset, it has been possi‐
ble for the first time to identify a large number of target genes. In our experi‐
ence, this is a major advance over previous empiric techniques that re‐
quired excessive experimental effort and yielded only a few (<0.5%) can‐
cer‐specific methylated genes (Toku‐
maru et al., 2004; Yamashita et al., 2002). Our approach, based on a combination of re‐expression arrays and promoter sequence patterns, provided a nearly 500‐fold higher yield of genes harboring promoter methylation.
We found that 47% (82/175) of the genes tested in cell lines were methylated by bisulfite sequencing and/or MSP and 65% (53/82) of these genes were methylated in primary tumors. Our results are consistent with previous studies (Kim et al., 2006; Tokumaru et al., 2004; Yamashita et al., 2002) where the frequency of methylation of any particular gene in primary tumors is generally less than that observed in cell lines. Some of our tissue samples may have been contaminated with unmethylated DNA from normal surrounding cells, and in artificial conditions, cell lines may have acquired methylation of some genes to provide a cell growth advantage. The discrepancy between the computationally and pharmacologically predicted (175) and experimentally identified (82) methylated genes in cell lines may be partially due to the analysis of limited regions (approximately 200‐300 bp for most of the genes) by bisulfite sequencing or MSP.
To compare the overall pattern of methylated CpG islands among tumors, we tested 300 primary tumors of 13 different types with 8 frequently cancer‐specific methylated genes identified from our approach. Pancreas, gastric, thyroid and ovary cancers displayed relatively low levels of methylation. Colon, prostate, esophagus and kidney tumors, however, displayed a much higher frequency of methylation overall. Some tumors within a type displayed high inherent levels of methylation, whereas others within the same tumor type displayed low levels (data not shown). The data are not consistent with chance variation from tumor to tumor, because in the absence of heterogeneity the variance of the methylation frequency would not be expected to be greater than the mean. Therefore, aberrant methylation of CpG islands can be quantitatively different in individual tumors within a tumor type and more pronounced in particular tumor types.
We found cancer‐specific and tissue‐
specific methylation events in differ‐
ent tissue types. For example, cancer‐specific methylation of PAK3 was found at high frequency in esophagus, lung, cervix, head and neck and bladder cancers. PAK3 was also occasionally methylated in other normal tissues. PAK3 is located on the X chromosome, so a methylated signal is likely always to be present in samples from female patients. However, we consider PAK3 methylation cancer‐specific because we also found a high frequency of me‐
thylation in samples from male cancer patients. Like PAK3, some other genes showed either cancer‐specific or tis‐
sue‐specific methylation in multiple organs. Although there have been reports of MCAM overexpression in melanoma, we found a high frequency of MCAM promoter methylation in prostate cancer. OSMR showed can‐
cer‐specific methylation only in colon cancer and was previously shown to have a major functional role in breast and other cancers (Liu et al., 1998; Savarese et al., 2002). Liang et al. re‐
ported loss of expression of SSBP2 in 50% of myeloid leukemia cell lines and concluded that loss of SSBP2 ex‐
pression may underlie the impaired differentiation seen in human myeloid leukemia (Liang et al., 2005). How‐
ever, prior to this report there was no reported mechanism for loss of ex‐
pression of this DNA binding protein. β4GalT‐1 is constitutively expressed in all tissues, with the exception of the brain (Lo et al., 1998) as a Golgi‐
resident protein. We found a high frequency of cancer‐specific methyla‐
tion of β4GalT‐1 in esophagus, lung, colon and prostate. NISCH (IRAS) was first isolated as an imidazoline‐1 re‐
ceptor candidate cloned by an imida‐
zoline receptor antisera‐selected (IRAS) cDNA approach (Wang et al., 2002) and was independently shown to be an interacting partner for insulin receptor substrate 4 (Piletz et al., 2000). IRAS was recently reported to protect transfected PC12 cells from apoptosis (Sano et al., 2002), whereas its mouse homologue, Nischarin, which lacks the NH2‐terminal PX domain, was identified as a cytosolic interacting protein for α5 integrin and shown to inhibit cell migration by inhibiting the ability of PAK1 to phos‐
phorylate substrates (Alahari and Nasrallah, 2004; Dontenwill et al., 2003). We found a high frequency of cancer‐specific methylation of this gene in lung, head and neck and gas‐
tric cancer. Further studies in these tissue types will provide precise in‐
sight into the functional aspects of IRAS. KIF1A is a member of the KIF1/Unc104 family and targeted deletion of the KIF1A gene in mice causes accumulation of clear small vesicles in the cell body of neurons as well as marked neuronal death (Yone‐
kawa et al., 1998). We report for the first time a high frequency of cancer‐specific methylation of KIF1A in the majority of human tumors. The frequency of methylation within a tumor type of the individual CpG is‐
lands affected in at least three differ‐
ent tumor types is shown. Some tar‐
gets were methylated at a high fre‐
quency in one tumor type but infre‐
quently in others (for example OSMR) whereas other targets (for example KIF1A) were methylated at relatively high frequencies in the majority of tumor types. Thus, whereas some CpG‐island targets are shared by mul‐
tiple tumor types, others are methy‐
lated in a tumor‐type‐specific manner. It has been documented that virtually all biochemical, biological and clinical attributes are heterogeneous within human cancers of the same histologic subtypes (Shapiro and Shapiro, 1984). Our data suggest that differences in the methylated genes in various tu‐
mors could account for a major part of this heterogeneity. This might explain why it has been so difficult to correlate the behavior, prognosis, or response to therapy of common solid tumors with the presence or absence of a sin‐
gle gene alteration; such alterations reflect only a small component of each tumor’s overall genetic and epigenetic composition. Like any global genomic and epige‐
nomic approach, our study has limita‐
tions. First, we were not able to test all the known and newly discovered me‐
thylated genes in all the 13 types of cancer included in this study. Second, although in most of the cases mosaic methylation occurred, focal methyla‐
tion for some genes was also reported and methylation in 5’ untranslated regions would not be detectable by the methods we used. Future studies em‐
ploying a combination of different technologies will be able to address these issues. The results of this study inform future cancer methylome discovery effort in several important ways: a) A major technical challenge of such studies will be discerning cancer‐
specific methylation from the large number of tissue‐specific methylated genes. In our study, using modified gene selection criteria in a pharmacological unmasking strategy, we found 47% of tested genes to be methylated, in contrast to 10% to 20% with previous criteria. In the future, improvements in the gene selection strategy for prediction of methylation‐prone genes should result in less labor and less empiric experimenta‐
tion. b) Another technical issue is the de‐
velopment of high throughput assays for the analysis of large numbers of samples. In this study we developed QMSP assay for 8 novel cancer‐specific methylated genes and similar real‐
time assays could be developed individually for newly identified methy‐
lated targets. Once a methylation tar‐
get set is known for a particular can‐
cer, or even if the entire cancer “me‐
thylome” is discovered, other genomic approaches such as chip arrays may facilitate large scale research and clinical efforts. c) Although it is likely that studies of other solid tumor types will also iden‐
tify a large number of methylated genes, it will be important to apply rigorous approaches to identify the specific methylated genes that have been selected for during tumorigene‐
sis. Our modified approach can predict for cancer‐specific methylated genes and reduce empiric testing. These approaches are likely to improve as more methylated genes are discovered and characterized. d) There has been much discussion about which genes should be the focus of future efforts for methylation analy‐
sis. Our results suggest that many genes not previously implicated in cancer are methylated at significant levels and may provide novel clues to cancer pathogenesis. Adding this data to previous reports, perhaps up to 1/3 (approximately 300 genes total) of the cancer methylome has now been discovered, in compari‐
son to the identification of perhaps 200 mutated genes over the past 2 decades and recent genome‐wide mutation analysis in primary tumors (Sjoblom et al., 2006). An emerging picture of genetic and epigenetic changes and their relationship is un‐
raveling the biologic networks re‐
sponsible for human cancer. The ge‐
netic and epigenetic alterations in different cancer types are diverse (Costello et al., 2000; Esteller, 2003) and we and others previously found unique inverse relationships between genetic/epigenetic changes (Toku‐
maru et al., 2004; Toyooka et al., 2006; Xing et al., 2004). However, 26 genes identified in the most recent mutation screen by Vogelstein and colleagues are also methylated here (Sjoblom et al., 2006). Ultimately, the epigenome of all cancer tissues, including those of different stage and grade, will be mapped out even as we now approach a total molecular signa‐
ture of cancer. Epigenetic states differ widely among tissues, and changes are far more varied and much more fre‐
quent per tumor than DNA mutations. According to Dr. Jones each differenti‐
ated cell has a different epigenome (Garber, 2006). Our comprehensive analysis contributes greatly to the emerging epigenomic map of DNA methylation in the human genome. Additional studies using similar and complementary genomic strategies should yield further insights into the dynamics and hierarchy of epigenetic regulation during tumorigenesis. These data define the epigenetic land‐
scape of major human cancer types, provide new targets for diagnostic and Genome‐wide promoter analysis uncovers portions of the cancer methylome 145 therapeutic intervention, and open fertile avenues for basic research in tumor biology. 146 Genome‐wide promoter analysis uncovers portions of the cancer methylome Chapter 16: Transcriptome­wide pro­
moter hypermethylation profiling in neuroblastoma Paper 7: Genome wide promoter methylation analysis in neuroblasto­
ma with perspectives for integrated molecular profiling Hoebeeck J, Ongenaert M, Michels E, De Preter K, Vermeulen J, Yigit N, De Paepe A, Van Criekinge W, Speleman F and Vandesompele J. In preparation. Hypermethylation of normally unmethylated CpG islands located in gene promoter regions can be an alternative for the more classical ways of inac­
tivation of a TSG, including mutation and deletion. To provide further insight in the genes that have been silenced by promoter hypermethylation in neuroblastoma and hence in neuroblastoma onco­
genesis, global expression profiling was performed to determine genes that were up­regulated after treatment with DAC using oligonucleotide mi­
croarrays (Affymetrix). Comparison of the list of differential expressed genes for which at least one reactivation event was observed to a list of genes known to be methylated in other tumor types lead to the identifica­
tion of three potentially methylated tumor suppressor genes located in critical regions in neuroblastoma, i.e. FABP3 (1p32­33), IGSF4 (11q23.2) and ESR1 (6q25.1). Additionally, a selection of 150 differentially expressed genes was made based on promoter similarity with known methylation markers. This gene list provides very interesting candidate genes that merit further investigation for their involvement in neuroblastoma. The validation of these markers is still in progress. Transcriptome‐wide neuroblastoma promoter hypermethylation profiling in 147 16.1 Materials and methods 16.1.1 Neuroblastoma cell lines
33 well‐characterized neuroblastoma cell lines were included in this study 7‐10. DNA was isolated using the QIAamp DNA mini kit (Qiagen). 16.1.2 Microarray analysis
RNA quality was evaluated with the Agilent 2100 Bioanalyzer using the RNA 6000 Nanochip. All samples had an RNA integrity number above 8 and were considered suitable for microarray analysis. After a two-round amplification, 8 cell lines (CHP-902R, CLB-GA, IMR-32, LAN-2, N206, SH-SY5Y, SK-N-AS, SJNB-1) before and after DAC treatment were hybridized to Affymetrix HG-U133 Plus 2.0 oligonucleotide chips containing 47,000 transcripts.

CEL files were loaded into R/Bioconductor (BioC). Standard quality metrics (simpleaffy package) demonstrated that the oligonucleotide chip data are of good quality. The affy package was used to normalise the expression levels and to obtain present-absent (expression-no expression) calls for each probeset. For all cell lines and for each probeset, the number of reactivation events (absent in untreated cells and present in treated cells) was counted. The probesets with at least one reactivation event were compared with a list of 65 known methylation markers, i.e. genes that have been described as methylated in cancer or cell lines.

In a second step, we analyzed the promoter regions using a visualization of genome-wide promoter sequences. For this purpose, promoter sequences of approximately 4,700 genes were aligned using Clustal W multiple alignment and visualized using the visualization software tool TreeIllustrator (Trooskens et al., 2005). The aligned promoter regions are those from genes that are annotated with a transcription start site (TSS) as determined by DBTSS (Suzuki et al., 2004) and that contain at least one CpG island in their promoter sequence. CpG islands were identified using newcpgreport (in the EMBOSS package) (Olson, 2002). Identification of possibly interesting genes was based on medium- to large-scale text mining using GoldMine, a CGI/Perl script running on a server at BioBix (Laboratory for Bioinformatics and Computational Genomics, Department of Molecular Biotechnology, Faculty of BioScience Engineering).

16.2 Results and discussion

We aimed at a genome-wide assessment of the reactivation of gene expression upon DAC treatment. We compared the expression patterns of 8 DAC-treated versus non-treated neuroblastoma cell lines on oligonucleotide chips containing 47,000 transcripts. Based on differential expression levels, genes were selected for which at least one reactivation event was observed. This gene list was compared to a list of known genes that have been described as methylated in other tumor types (Table 16.1).

Table 16.1: List of known methylated genes reactivated in neuroblastoma cells

gene      locus                      events
FABP3     1p33-p32                   1
PTGS2     1q25.2-q25.3               1
HTLF      2p22-p16                   1
RARB      3p24                       1
RBP1      3q23                       2
CSPG2     5q14.3                     1
APC       5q21-q22                   1
ESR1      6q25.1                     2
GATA4     8p23.1-p22                 2
CDKN2B    9p21                       1
IGSF4     11q23.2                    2
CCND2     12p13                      2
EDNRB     13q22                      3
SOCS3     17q25.3                    3
APOC1     19q13.2                    2
SEZ6L     22q12.1                    1
TIMP3     22q12.1-q13.2|22q12.3      1

For 7 of the 17 genes in the intersection, methylation analysis in neuroblastoma had been previously performed (HTLF, RARB, RBP1, APC, CCND2, SEZ6L and TIMP3). In these studies, no methylation was observed for SEZ6L and HTLF, while a low CpG island promoter methylation frequency was found for APC (9%), TIMP3 (10%), RARB (10%), CCND2 (10%) and RBP1 (30%) in cell lines. Of interest are the newly identified methylated genes in this study, i.e. FABP3 (1p32-33), IGSF4 (11q23.2) and ESR1 (6q25.1), as they are localized in regions that are frequently deleted in neuroblastoma (Michels et al., in preparation). The FABP3 gene belongs to a multigene family of intracellular fatty acid-binding proteins (FABPs). These proteins are involved in uptake and transport of fatty acids. IGSF4 (immunoglobulin superfamily, member 4) encodes a cellular adhesion molecule with a role in synaptic formation of neural cells [31]. The tumor suppressor gene ESR1 encodes the estrogen receptor α, a ligand-activated transcription factor.

In a parallel analysis, a promoter study of the genes with at least one reactivation event in our neuroblastoma series was performed by genome-wide alignment of promoter sequences. This large alignment of promoter regions was visualized with the 'radial cladogram' feature of the software tool TreeIllustrator. A unique feature of TreeIllustrator is the visualization of discrete information on top of the alignment view. This allowed us to select those genes whose promoter region is very similar to that of known methylation markers. Besides promoter similarity, the number of reactivation events is also shown in the tree, which allows quick selection of genes with a high number of reactivation events lying together in a cluster. Examples of the visualization are shown in Figure 16.1. The underlying basis of this promoter study is the demonstration that genes with a promoter region similar to that of known methylation markers are also very good candidates to be methylated in cancer and not in normal tissues (Hoque et al., 2008). Using this approach, we were able to select about 150 genes that were either in the 'neighborhood' of a known marker, or that clustered together in the alignment and have a high number of reactivation events (at least two genes in a cluster have at least three reactivations). Gene ontology analysis of this gene set demonstrated that mainly cell growth and morphology related genes were significantly (p < 0.01) overrepresented (Table 16.2).

Figure 16.1: A. General overview of the visualisation of the alignment of promoter regions using TreeIllustrator. B. Detail of an interesting region; the red bars are known markers, while the bars closest to the identifiers show the number of reactivation events.
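
As an illustration only, the counting of reactivation events (section 16.1.2) and the cluster rule quoted above (at least two genes in a cluster with at least three reactivations) could be sketched as follows in Python. The call encoding ("P"/"A"), the function names, the gene-level keys and the toy identifiers are assumptions made for this sketch; the original analysis was run in R/Bioconductor and its code is not shown in the text.

```python
from collections import Counter


def count_reactivation_events(untreated, treated):
    """Count, per gene, the cell lines with an absent -> present transition.

    Both arguments map cell line -> {gene: "P" (present) or "A" (absent)}.
    """
    events = Counter()
    for cell_line, calls in untreated.items():
        for gene, call in calls.items():
            if call == "A" and treated[cell_line].get(gene) == "P":
                events[gene] += 1
    return events


def select_clusters(clusters, events, min_genes=2, min_events=3):
    """Keep clusters where >= min_genes members have >= min_events events."""
    return [
        cluster for cluster in clusters
        if sum(events[gene] >= min_events for gene in cluster) >= min_genes
    ]


# Hypothetical toy calls for three cell lines (not data from the study).
untreated = {
    "line1": {"geneA": "A", "geneB": "A", "geneC": "P"},
    "line2": {"geneA": "A", "geneB": "A", "geneC": "A"},
    "line3": {"geneA": "A", "geneB": "A", "geneC": "A"},
}
treated = {
    "line1": {"geneA": "P", "geneB": "P", "geneC": "P"},
    "line2": {"geneA": "P", "geneB": "P", "geneC": "A"},
    "line3": {"geneA": "P", "geneB": "P", "geneC": "P"},
}

events = count_reactivation_events(untreated, treated)

# Genes with at least one event that are also on a known-marker list.
known_markers = {"geneC", "geneX"}
reactivated_known = sorted(
    g for g, n in events.items() if n >= 1 and g in known_markers
)

# Clusters of genes with similar promoters (hypothetical grouping).
clusters = [["geneA", "geneB"], ["geneC", "geneX"]]
selected = select_clusters(clusters, events)
```

In this toy example only the first cluster is retained, because it is the only one in which two genes reach three reactivation events. In practice the calls are made per probeset, so a probeset-to-gene mapping would be needed before comparing with a gene-level marker list.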

Table 16.2: Gene ontology (GO) data mining of the subset of 150 genes that show reactivation events in one or more of the investigated cell lines (only categories with a p-value smaller than 0.01 are represented)

GO-ID         p-value     number of genes   description (GO class)
GO:0016049    4.70E-04    6                 cell growth
GO:0008361    4.70E-04    6                 regulation of cell size
GO:0001558    9.90E-04    5                 regulation of cell growth
GO:0042743    1.08E-03    2                 hydrogen peroxide metabolism
GO:0042744    1.08E-03    2                 hydrogen peroxide catabolism
GO:0042542    1.08E-03    2                 response to hydrogen peroxide
GO:0040008    1.26E-03    5                 regulation of growth
GO:0040007    1.35E-03    6                 growth
GO:0006265    2.24E-03    2                 DNA topological change
GO:0000902    2.51E-03    6                 cellular morphogenesis
GO:0000302    3.79E-03    2                 response to reactive oxygen species
GO:0016043    5.88E-03    13                cell organization and biogenesis
GO:0050793    7.72E-03    5                 regulation of development
GO:0009225    7.99E-03    2                 nucleotide-sugar metabolism

After comparison of our list of 150 putative neuroblastoma methylation markers with published microarray data using the L2L microarray analysis tool (http://mggisa.ugent.be/medgen50/L2L/), we found that our gene list is significantly enriched with genes that are regulated by microRNAs (miRNAs) [32]. miRNAs are a new class of small RNA molecules that play an important role in negative regulation of gene expression. Recent studies have demonstrated their involvement in cancer development [33, 34]. Interestingly, miRNA-based methylation of genomic DNA has been described [35]. These data may suggest that in neuroblastoma, epigenetic regulation through methylation of putative miRNA target genes may occur.

Within our list of 150 putative neuroblastoma methylation markers, a strict selection will be made of strong candidates, based on chromosomal position, systematic upregulation, Gene Ontology classification and correlation of methylation patterns with genetic abnormalities. Re-expression of these genes will be confirmed in our panel of 22 neuroblastoma cell lines (untreated or treated with DAC alone or in combination with TSA). Potential prognostic relevance or correlation with other clinico-genetic parameters will be determined through quantitative RT-PCR and methylation-specific PCR on a panel of primary neuroblastoma tumors. Further functional analysis will be conducted on the most promising candidates.

With this study, we aim at identifying culprit tumor suppressor genes in regions that are frequently deleted in neuroblastoma. Another goal of our analysis is the identification of methylation markers for early diagnosis or for improved diagnosis or classification.

Chapter 17: Predicting platinum response in ovarian cancer, using DNA-methylation profiling

Paper 8: Predicting platinum response in ovarian cancer, using DNA-methylation profiling

Ongenaert M, Hoque MO, Brait M, Van Criekinge W, Sidransky D. In preparation.

Ovarian cancer is the most common cause of cancer death from gynecologic tumors in the United States. Early disease causes minimal, nonspecific, or no symptoms. Therefore, most patients are diagnosed in an advanced stage. Overall, prognosis for these patients remains poor. Standard treatment involves aggressive debulking surgery followed by chemotherapy. Most patients with ovarian cancer have a recurrence. Based on the disease-free interval after completing chemotherapy, patients can be classified in 2 categories: (1) platinum-sensitive (relapse >6 months after initial chemotherapy) and (2) platinum-resistant. Patients with platinum-sensitive disease may exhibit a good response if rechallenged with a platinum-based regimen. The probability of response increases with the duration of the disease-free interval.

We used primary cancer samples from patients with known platinum-response status, platinum-resistant and platinum-sensitive ovarian cancer cell lines (untreated and treated with the demethylating agent DAC) and normal brushing samples in order to predict platinum resistance or sensitivity using DNA-methylation profiles.

The validation of these markers is still in progress; therefore, gene names and references are left out.

17.1 Introduction

Arguably, the discovery of cisplatin was one of the most significant events for cancer chemotherapy in the 20th century. Cisplatin, a square planar Pt(II) complex (cis-diamminedichloroplatinum(II)), was approved for clinical use in 1978 and has become first-line therapy for the treatment of testicular cancer. Platinum drugs are now widely used for the treatment of testicular and ovarian cancers. Cisplatin acts by binding to DNA. In aqueous biological media the chloro ligands are replaced by water to give a diaquo species, which interacts with DNA to form inter- and intra-strand cross-links. The lesion causing cancer cell death is the intrastrand cross-link between adjacent guanine bases on the DNA strand. Interestingly, the frequency of this lesion (> 60%) is too high to be accounted for by simple statistics, and the cis-ammine ligands appear to interact with the DNA phosphate backbone to direct this simple molecule into an orientation that promotes this specific lesion. Cisplatin has several limitations: it is very toxic, particularly towards the kidneys, and has to be given as a large-volume intravenous infusion; it is not orally bioavailable; and there is a population of cancer cells that either are inherently resistant or acquire resistance. These problems have led to extensive medicinal chemistry programs in many laboratories (Fricker, 2007).

Figure 17.1: Chemical structure of cisplatin (cis-diamminedichloroplatinum(II))

It would be useful to identify biomarkers that can be used to predict platinum sensitivity in ovarian cancer patients, in order to immediately start alternative therapies for patients who are likely to be resistant. In a later phase, it may become possible to re-sensitize patients: if DNA methylation is responsible for resistance, demethylating agents could be used for this purpose.

17.2 Materials and methods

17.2.1 Samples
As primary ovarian tumor samples, 7 platinum-resistant tumor samples and 8 platinum-sensitive tumor samples were selected. Normal samples are 10 brushing samples from patients without ovarian cancer.

In addition, 3 pairs of ovarian cancer cell lines were taken into account: platinum-sensitive cell lines and the resistant variants of the same cell lines. The cell lines used are 2008 and its resistant variant 2008C13; A2780 and the resistant A2780CP; and IGROV and the resistant IGROVCP. Normal ovarian cell lines include OSE2A, OSE2B and OSE7.

17.2.2 5-aza-dC treatment of cells

As described before in Chapter 15: Genome-wide promoter analysis uncovers portions of the cancer methylome.

17.2.3 Biotinylated RNA Probe Preparation and Hybridization

All samples were analyzed on Affymetrix HGU133plus2 expression microarrays, containing over 54,000 probes corresponding with 47,000 human transcripts.

17.2.4 Analysis of Expression Data
Raw data quality was assessed using intensity boxplots and RNA degradation plots (data not shown). Expression data of cancer samples, normal samples, cancer cell lines and normal cell lines were normalized using the just.rma method within BioConductor/R (Gentleman et al., 2004).

Data analysis was conducted using Volcano plots (in the 'limma' package) and False Positive Rate analysis (5% False Positives, 100 iterations) using the 'RankProd' package (Hong et al., 2006). Both types of analysis were performed on the different comparisons of the experiment:

• Cancer cell lines: resistant vs. sensitive
• Cancer cell lines: AZA-treated vs. untreated
  o All samples
  o Resistant cell lines
  o Sensitive cell lines
• Normal cell lines: AZA-treated vs. untreated
• Normal cell lines vs. cancer cell lines
  o Normal vs. all cancer cell lines
  o Normal vs. resistant cancer cell lines
  o Normal vs. sensitive cancer cell lines
• Normal samples vs. cancer samples
  o Normal vs. all cancer samples
  o Normal vs. resistant cancer samples
  o Normal vs. sensitive cancer samples

Data from these different analyses were combined in order to answer the main research questions:

• Can we identify cancer-specific methylation markers in ovarian cancer (methylated and downregulated in the cancer samples but not in normal samples)? These markers could for instance be used as early detection markers.
• Are there methylation biomarkers that are methylated and downregulated in the resistant tumor samples but not in sensitive tumor samples? These methylation biomarkers could be used in 'personalized' medicine, or the genes involved could be reactivated (epigenetic therapy) to sensitize the patients to platinum-based therapies.
• Are there methylation biomarkers that are methylated and downregulated in the sensitive tumor samples but not in resistant tumors?

Ovarian cancer methylation markers
An ideal methylation biomarker in ovarian cancer (not taking platinum resistance into account):

• Is differentially expressed between normal samples and tumor samples
• Is differentially expressed between normal cell lines and cancer cell lines
• Has a low expression in cancer samples and cell lines
• Is differentially expressed between cancer cell lines and DAC-treated cell lines (re-activation)

In order to select the best methylation biomarkers in ovarian cancer, all these criteria are combined using a score scheme, as follows:

• Downregulation in cancer vs. normals: the results from the RankProd FDR analysis are used
  o Results are ranked, based on p-value; the score given to a probe reflects this ranking (Ranking Score)
  o The FDR itself is taken into account:
    $\text{FDR Score} = \tan(1 - \text{FDR})$
  o The fold change (FC) $\Delta$ is scored next, with $\Delta_{\max}$ the maximal FC observed:
    $\text{FC Score} = \tan(\Delta / \Delta_{\max})$
  o The tangent function is chosen because all values are between 0 and 1, and the tangent over this interval ranges from 0 to 1.56 in a non-linear way, with a somewhat (but not too extreme) larger increase in score for values approaching 1 (the best possible value). This way, probes with a very good profile have a higher chance to acquire a high score and to be selected
• Downregulation in cancer cell lines vs. normal cell lines and cancer cell lines vs. DAC-treated cancer cell lines is examined the same way
• Low expression in tumor samples and cancer cell lines is examined as follows:
  o For every sample, the maximal expression is determined; for each probe its expression level is compared to the maximal expression level in that sample:
    $\text{expression score} = 1 - \dfrac{\text{expression}}{\text{maximal expression}}$
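
A minimal Python sketch of the tangent-based scores, assuming the FDR score is tan(1 − FDR), the FC score is tan(Δ/Δmax) and the expression score is 1 − expression/maximal expression in the sample. These forms are read from the partly garbled formulas in the text and should be treated as assumptions; the function names and the example values are hypothetical and do not come from the study.

```python
import math


def fdr_score(fdr):
    """tan(1 - FDR): an FDR near 0 scores close to tan(1), about 1.56."""
    return math.tan(1.0 - fdr)


def fc_score(fold_change, max_fold_change):
    """tan(FC / FC_max): the largest observed fold change scores tan(1)."""
    return math.tan(fold_change / max_fold_change)


def expression_score(expression, sample_max):
    """Assumed form: 1 - expression / maximal expression in the sample."""
    return 1.0 - expression / sample_max


# Hypothetical probe: FDR 0.05, fold change 3 out of a maximum of 4,
# expression 2.0 in a sample whose maximal expression is 8.0.
scores = {
    "fdr": fdr_score(0.05),
    "fc": fc_score(3.0, 4.0),
    "expression": expression_score(2.0, 8.0),
}

# The tangent rewards values near 1 more than proportionally:
# going from 0.5 to 1.0 doubles the input but nearly triples the score.
gain = math.tan(1.0) / math.tan(0.5)
```

The `gain` value illustrates why a tangent was preferred over a linear score: probes whose rescaled values approach 1 are separated more strongly from average probes, without the unbounded growth the tangent would show beyond this interval.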
  o For all samples, the sum is calculated

Using this score scheme, 11 values are generated (3 for downregulation in tumors vs. normal; one for expression in tumor samples; 3 for downregulation in cancer cell lines vs. normal cell lines; one for low expression in cancer cell lines and three for upregulation after DAC treatment).

For each of these values, the percentile of the probe is calculated: for each probe and for each score, its rank is expressed as a percentage relative to the scores of all probes. For instance, a probe with a percentile of 95 % for a particular score is situated among the top 5 % of probes with the best scores. Next, we determine the number of scores for which a probe is at least in the best 5 %, as well as the average percentile over all scores for this probe. The probes are sorted primarily on the number of scores in the best 5 %, followed by the average percentile score.

Platinum resistance methylation markers

An ideal methylation biomarker in platinum-resistant ovarian cancer:

• Is differentially expressed between cisplatin-resistant and cisplatin-sensitive tumor samples (downregulated in resistant)
• Is differentially expressed between resistant tumor samples and normal samples (downregulated in cancer)
• Is re-expressed in resistant cell lines after treatment with DAC, but less re-expressed in sensitive tumor cell lines after DAC treatment
• Has low expression in resistant tumor samples and higher expression in sensitive tumor samples

The difference between platinum-sensitive and platinum-resistant cell lines was not taken into account, as the differences could not be shown statistically and depended strongly on the cell line used.

In order to select the best methylation biomarkers in ovarian cancer, all these criteria are combined using a score scheme, as follows:

• Downregulation in resistant tumor samples vs. sensitive: the results from the RankProd FDR analysis are used
  o Based on p-value: probes at or above the 95th percentile (best p-values) get a score of 3, probes at or above the 90th percentile a score of 2 and probes at or above the 75th percentile a score of 1
  o The FDR itself is taken into account (same thresholds: 95th percentile scores 3, 90th percentile scores 2 and 75th percentile scores 1)
  o The fold change (FC) Δ is scored next using the same score system (95 - 90 - 75)
• Downregulation in resistant tumor samples vs. normal samples to evaluate downregulation in the tumor
• Resistant cancer cell lines vs. DAC-treated cancer cell lines to evaluate possible methylation
• Sensitive vs. DAC-treated sensitive cell lines is also evaluated, as we want resistant-specific methylation

Platinum sensitivity methylation markers are examined the same way.

17.2.5 In-silico analysis of top-ranking probes
The promoter regions of the different transcripts of the top-ranking probes were gathered using Ensembl (Flicek et al., 2008); CpG islands were identified by newcpgreport (Olson, 2002). A text-mining analysis was used to screen for known methylation markers in ovarian or other cancer types and to find evidence of previously reported influence on platinum resistance. In addition, pathway and function overrepresentations were identified by Ingenuity Pathway Analysis (IPA).

17.3 Results

17.3.1 Ovarian cancer methylation markers
The biomedical literature analysis shows that several top-ranked probes (the cutoff was chosen at 250) are associated with either methylation or downregulation in ovarian cancer. Examples are the gene at ranks 1 and 36, which is methylated in ovarian cancer; the gene at rank 16, which is methylated in some cancer types and is downregulated in ovarian cancer; and the gene at rank 75, which is known as a tumor suppressor gene in ovarian cancer, is downregulated and is described as methylated in ovarian and other tumor types.

Overrepresented functions (as identified by IPA) include cancer and cell death; pathways include Wnt/β-catenin, Toll-like receptor signaling, death receptor signaling and RAR activation.

The Wnt/β-catenin pathway might be influenced, as 6 genes are present in the top-250 probes. These genes include APC, directly connected to β-catenin. Other genes seem to be located in the extracellular part of the pathway. At the transcription level of the pathway, a transcription regulator (a transcriptional corepressor that binds to a number of transcription factors and inhibits transcriptional activation) is involved.

The table showing the top-ranked genes is included as Supplementary table 1.

17.3.2 Platinum resistance methylation markers
In the top-ranked 207 probes (Supplementary table 2), several are associated with cisplatin response and are reported to be methylated in ovarian or other cancer types.

Examples are the gene at ranks 1 and 3, which is known as a tumor suppressor in ovarian cancer, is described as methylated in various cancer types and is related to platinum sensitivity; the gene at rank 13, which is involved in platinum resistance; and the gene at rank 35, which is related to drug (including cisplatin) resistance in ovarian cancer.

Enriched functions include drug metabolism, molecular transport and cell death. Overrepresented pathways are for instance PXR/RXR activation, metabolism of xenobiotics by cytochrome P450, PTEN signaling and xenobiotic metabolism signaling.

17.3.3 Platinum sensitivity methylation markers
In the top-ranked 209 probes (Supplementary table 3), several are associated with methylation or cisplatin response. Examples are the gene at rank 20, whose downregulation is associated with response to drugs in ovarian cancer, and the gene at rank 89, which is associated with platinum sensitivity.

Enriched functions include immune response, cell death and cellular movement. Overrepresented pathways are for instance Complement system, NRF-2 mediated oxidative stress response and FGF signaling.

The methylation biomarkers are currently under evaluation on primary cancer samples.

17.4 Discussion

Drug resistance in ovarian cancer is a main problem in the different treatment strategies. If resistance occurs, time is wasted and the survival rates of the patients drastically decrease.

In order to predict platinum-resistant ovarian cancer patients, we make use of DNA-methylation markers. Based on primary patient samples and pharmacological demethylation experiments on ovarian cancer cell lines, we applied a ranking methodology in order to generate a list of possible DNA-methylation biomarkers.

These biomarkers now need to be validated in order to verify that they can be used to predict platinum resistance in primary patients.

Chapter 18: Conclusions

Methylation biomarkers are powerful biological markers that can be used in early cancer detection and in disease stratification and classification. Epigenetic therapies are now being developed and clinical trials are ongoing. With the ability to perform large-scale methylation tests, the discovery of novel markers can happen on a genome-wide scale. However, the sensitivity of the detection technologies is limited, as is the sample material. Finding ways to improve both the initial set-up of the experiments and the analysis of the data can drastically improve the success rate of finding novel methylation biomarkers.

Several approaches are demonstrated in this chapter. They all contributed to the extension of the knowledge of the cancer methylome in the investigated cancer types. Different strategies were used, depending on the chosen analysis methods and the availability of (high-throughput) detection platforms. Some methodologies rely on finding novel ways to rank the experimental data measurements, representing biological knowledge, while other methodologies rely on new hypotheses that arise from data and are tested in validation studies. In both cases, it becomes clear that careful experimental design is crucial, as the large-scale detection platforms introduce a fair amount of noise and uncertainty in return for massive amounts of data.

Taking the biological and technical variation into account, the strategies demonstrated here prove their power in the experimental validation studies: novel methylation markers are identified with a significantly better success rate compared with earlier studies.

Some novel marker discovery attempts presented here are still under validation. This illustrates the need for high-quality analysis techniques with reasonable speed, next to the high-throughput screening techniques used for the initial analysis.

Various discovery studies all generate pieces of the "cancer methylome". However, the puzzle is still to be composed.

In the near future, new techniques may speed up data generation: a parallel sequencing-by-synthesis method (454 sequencing) is already used in bisulfite sequencing (Taylor et al., 2007).

Part 4: Reprogramming of human host cells by viruses

How can viruses modify their human hosts to be able to successfully infect them and what is the role of DNA-methylation?

Chapter 19: Introduction

Moss: (Writing email to fire department) "Dear Sir stroke Madam, I am writing to inform you of a fire which has broken out on the premises of..." no, that's too formal (Deletes). Dear Sir stroke Madam. Fire, exclamation mark. Fire, exclamation mark. Help me, exclamation mark. 123 Carrendon Road. Looking forward to hearing from you. All the best, Maurice Moss.
Scene from "The IT crowd" (2006)

Viruses cannot survive, or at least cannot multiply, outside the host they have infected. Therefore, they use various techniques to 'hide' themselves from the defence mechanisms of the host they are living in. It is thus not hard to imagine that viruses actively modify the host, apart from just trying to disguise themselves. One such active 'reprogramming' mechanism could be modifying the DNA-methylation pattern of the host cells.

This behaviour is described for different viruses in human hosts, such as hepatitis B (Su et al., 2007) and Human Papilloma Virus (HPV) types (Wu et al., 2005). Burgers et al. (2007) showed that the E7 protein of HPV-16 associates with the main DNA-methyltransferase DNMT-1 in vitro and in vivo and could regulate its activity. This could be a mechanism by which HPV might influence the cell cycle of its host.

On the other hand, it is known that one of the defence mechanisms of the (human) host cell is DNA-methylation of viral DNA. For several viruses that integrate their genome into the host genome, the methylation status during this stage has been studied extensively, and the relationship between methylation and virus-induced tumor formation has been examined carefully. DNA-methylation can also be involved in viruses that do not integrate into the host genome (Hoelzer et al., 2008). Examples where the viral genome becomes methylated are hepatitis B virus (HBV) (Vivekanandan et al., 2008) and HPV (Badal et al., 2004).

So it seems that for the viruses described above an 'epigenetic battle' is fought out: the virus influences DNA-methylation of the human host cell, while methylation of the viral DNA is one of the host's defence mechanisms.

The impact of epigenetic reprogramming of human host cells by viruses is mainly unexplored but could explain the tight relation between some virus infections and the development of cancer. For example, in almost all (>99.7%) cervical cancers, a high-risk HPV type is found (Munoz et al., 2004). Although HPV infection alone may not be sufficient for the development of cervical cancer, it is clear that the viral infection (and its influence on DNA-methylation) plays a crucial role in carcinogenesis.

Some cancer types are clearly associated with a certain virus type (such as HPV and cervical cancer, and hepatitis B and liver cancer). However, typically only a fraction of the infected people develop cancer. Therefore, screening methods based on virus infection are not indicative of the eventual development of cancer. The phenomenon that the virus influences the DNA-methylation of the host is probably an early event in cancer development.

Thus it might be interesting to investigate whether it is possible to define (a panel of) methylation markers that become methylated early in cancer development as a result of virus infection. In this part, this search for early detection markers is discussed for cervical cancer.
Chapter 20: Cervical cancer and the HPV family of viruses

Paper 9: The influence on DNA-methylation of the human host cell genome after infection with HPV viral oncogenes

Ongenaert M, Steenbergen R, Trooskens G, Deregowski V, Polyak K, Snijders P, Van Criekinge W. In preparation.

High-risk HPV (Human Papilloma Virus) types are highly associated with the development of cervical cancer. However, current screening methods are not very sensitive and many false positives are detected. There is some evidence that methylation plays a crucial role in the life cycle of HPV: HPV itself becomes methylated and HPV seems to be able to modify the methylation state of its host. A marker that becomes methylated during HPV infection and has a role in the progression towards cervical cancer would be much more sensitive, and its detection would yield fewer false positives. The aim is to identify genes (markers) whose methylation state is altered after infection with high-risk HPV and whose altered methylation state plays a crucial role in the initiation or progression of cervical cancer.

A unique virus infection model was used: keratinocyte cell lines transfected with the viral oncogenes (E6 and E7) of HPV-16. Methylation-specific digital karyotyping (MSDK) was used to screen genome-wide for DNA-methylation changes at different stages after transfection and in a cervical cancer cell line. 37 genes were selected for further evaluation using a real-time MSP detection platform. Different cervical cancer cell lines and different infection models were used in this study. Based on the results, 11 genes were selected and the results were verified. Promising candidates (methylated in cervical cancer cell lines and in cell lines transfected with E6/E7 or the entire genome of HPV-16 or HPV-18; not methylated in keratinocyte cell lines) are now being investigated further on primary cervical cancer samples and normal controls; this validation part of the study is still ongoing.
20.1 Introduction

The Human Papilloma Virus (HPV) is a small non-enveloped virus with an 8 kb circular double-stranded DNA genome. Over 100 different HPV types have been identified by sequencing (Bernard, 2005). Each HPV type causes a different kind of epithelial lesion, ranging from cutaneous infections and warts to lesions that can progress to cancer (de Villiers et al., 2004). HPV is transmitted by sexual contact and it is estimated that about 75 % of sexually active people are infected (de Villiers et al., 2004; Baseman and Koutsky, 2005).

One group of HPV viruses, the Alpha Papillomaviruses, contains 30 to 40 HPV types that can infect cervical epithelium (Persson et al., 1996). A subset of this group (classified as high-risk types) is associated with the progression of cervical cancer. The most prevalent high-risk types are HPV-16 (found in 50 % of cervical cancers) and HPV-18 (found in 20 % of cervical cancers). In almost all (>99.7%) cervical cancers, a high-risk HPV type is found (Munoz et al., 2004). Although HPV infection alone may not be sufficient for the development of cervical cancer, it is clear that the viral infection plays a crucial role in carcinogenesis. The high-risk E6 and E7 HPV oncoproteins drive cell proliferation through their association with PDZ domain proteins and Rb (retinoblastoma), and contribute to neoplastic progression, whereas E6-mediated p53 degradation prevents the normal repair of chance mutations in the cellular genome (Doorbar, 2006).

Cervical cancer is the second highest cause of cancer deaths in women worldwide. It is predicted to be diagnosed in 500,000 women and to have caused over 288,000 deaths in 2006 (Parkin, 2006). Screening programs (using for instance Papanicolaou smears) and early interventions have reduced the incidence of cervical cancer. However, these cytology-based tests are too insensitive (Martin-Hirsch et al., 2002).

As in many cancer types, DNA-methylation is associated with the initiation and progression of cervical cancer. Portions of the HPV genome itself can be methylated (Badal et al., 2003; Badal et al., 2004). On the other hand, there is evidence that HPV can influence the DNA-methylation of its human host cells (Wu et al., 2005). Burgers et al. (2007) showed that HPV-16 E7 associates with the main DNA-methyltransferase DNMT-1 in vitro and in vivo and could regulate its activity. This could be a mechanism by which HPV might influence the cell cycle of its host.

We hereby present a genome-wide screen to detect DNA-methylation
changes of the human host after transfection with the viral oncoproteins (E6 and E7) of HPV-16.

As stated earlier, infection does not always lead to carcinogenesis. A screen for changes that are induced by HPV and clearly associated with the early stages of cervical cancer development could be more sensitive and yield fewer false positives than current screening techniques. If we are able to identify regions in the genome whose methylation state changes as a result of HPV infection and that play a role in cervical cancer development, this could be a much better screening technique than the existing ones. As DNA-methylation can be detected using MSP technology (methylation-specific PCR) (Herman et al., 1996), such a marker could have the required sensitivity. The change in DNA-methylation could be detected very early in the progression towards cervical cancer (early detection), and the number of false positives could be lowered if the methylation change is specific for the development or progression of cervical cancer.

20.2 Materials and methods

20.2.1 Cell lines

Primary human keratinocyte cell lineages (EK05), transfected with the viral oncogenes (E6 and E7) of HPV-16 under the control of a retroviral promoter, are immortalized. As a negative control, a non-transfected passage (passage 5, p5) of keratinocytes was used, together with an early passage after transfection (passage 7, p7) and an advanced stage after transfection (passage 30, p30). DNA of these three cell lines was extracted and used for the MSDK experiment.

For the validation experiments (real-time MSP on MethyLight), these cell lines were used in addition:

- EK00-12
- EK07-3, passage 3
- EK05-2 transfected with E6 and E7 from HPV16, passage 30
- FK16A, passage 35
- FK16A, passage 94
- FK16B, passage 61
- FK18A, passage 75
- FK18B, passage 19
- FK18B, passage 103
- SiHa
- HeLa
- CaSki

SiHa, HeLa and CaSki are cervical cancer cell lines; EK0 are keratinocyte cell lines; EK05-2 + HPV-16 is infected with HPV-16; FK16A and FK16B are two clonally derived HPV-16 transformed cell lines, while FK18A and FK18B are transformed with HPV-18. The cell lines FK16A, FK16B, FK18A and FK18B were established by transfection of primary human foreskin keratinocytes (EK94-2) with the entire HPV 16 and HPV 18 genome, respectively (Steenbergen et al., 2002).

20.2.2 Methylation-specific digital karyotyping
MSDK library generation

To assess the influence on the methylation state of the human host cells on a genome-wide scale, a Long-SAGE-like technique was used: MSDK (Methylation-Specific Digital Karyotyping). This methodology makes use of a methylation-sensitive restriction enzyme (AscI): if its recognition site (which contains two CG dinucleotides) is methylated, the enzyme will not cut and the corresponding tag will not be generated. The MSDK library generation was conducted as described by Hu et al. (2005; 2006).

Cloning and tag sequencing

Vector DNA (tags, cloned in pZeRo 1.0) from the MSDK libraries was brought into electrocompetent E. coli cells (Electromax, Invitrogen) using electroporation (1 µl vector library DNA, 12 µl nanopure water and 10 µl E. coli cells; 2.5 kV electroshock with E. coli pulser, Bio-Rad). After electroporation, cells were resuspended in SOC medium (Invitrogen) and shaken at 37°C for 30 minutes. Afterwards, they were plated on LB agar plates (Sigma) containing 100 ng zeocin/l (Invitrogen). After 12-24 hours of incubation at 37°C, positive clones were subcloned in 96-well plates (containing LB agar and 100 ng/l zeocin). Per library, 15 96-well plates were created this way. Plasmid extraction, purification and sequencing were performed by Agowa (Berlin, Germany). Sequences are generally around 500 bp long with PHRED-20 quality.

20.2.3 Tag extraction and mapping

Tags (17 bp per tag; ditags are 34 bp long) were extracted with SAGE 2000 v. 4.5 (Invitrogen) and mapped on the human genome (UCSC Mar. 2006 (hg18) assembly, based on NCBI Build 36.1) using BLAT (Kent, 2002), requiring 100 % identity over the complete 21 bp length, i.e. the CATG restriction overhang and the 17 bp tag.

The p5 library contained 12696 tags (6355 different tags, of which 3004 are unique in the human genome); p7 contained 16858 tags (7410 different, 4025 unique sites in the human genome) and the p30 library consisted of 12474 tags (4547 different tags, of which 3182 represent unique sites in the human genome). In total, 6369 tags that are unique in the human genome were found in at least one library.

Tags that differ significantly between two libraries were identified using the Bio::SAGE::Comparison package (by Scott Zuyderduyn) for BioPerl (Stajich et al., 2002). This module uses the Bayesian analysis method of Audic and Claverie (1997). Tags that are significantly different between two libraries were mapped on the human genome. For tags with a unique match to the genome, the (methylation-sensitive) AscI restriction site responsible for the occurrence of the tag was located. Using the Ensembl gene track from the UCSC genome browser (Hinrichs et al., 2006), genes closely related to this restriction site were identified.

20.2.4 Real-time MSP platform

MSPs are performed on a real-time PCR platform (Roche LightCycler® 480) using SYBR green for verification of the melting temperature. Amplicon sizes are verified using the Caliper LabChip® electrophoretic separation system.

20.3 Results

20.3.1 Tags, significantly different between libraries

In total, 184 tags with a unique chromosomal location are statistically different between at least two stages. The chromosomal locations of the tags are displayed in Figure 20.1. For 90 % of these 184 tags, the AscI restriction site responsible for the tag is located in a CpG island (as defined in the UCSC CpG track: minimal length 200 bp, C+G content > 50 %, O/E ratio > 0.60). 90 % of the AscI sites are either within or very close to a gene (as defined by the UCSC Ensembl track; less than 4 kb distance).

Figure 20.1: chromosomal location of tags that are statistically different between at least two stages. Tag locations are indicated with blue lines above the chromosome.
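The pairwise library comparison of section 20.2.3 relies on the statistic of Audic and Claverie (1997), which gives the probability of observing y copies of a tag in a second library given x copies in a first library of a different size. The Python sketch below illustrates that formula only; it is not the code of Bio::SAGE::Comparison, the function names are hypothetical, and the one-sided tail used as a p-value is an assumption about how significance could be called.

```python
from math import exp, lgamma, log


def audic_claverie_prob(x: int, y: int, n1: int, n2: int) -> float:
    """P(y | x): probability of y copies of a tag in library 2 (n2 tags in
    total), given x copies in library 1 (n1 tags in total)."""
    ratio = n2 / n1
    # Work in log space so that the factorials do not overflow.
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1.0 + ratio)
    )
    return exp(log_p)


def audic_claverie_pvalue(x: int, y: int, n1: int, n2: int) -> float:
    """Smaller of the two one-sided tails, P(Y <= y) and P(Y >= y).
    Assumption: this tail definition is chosen for illustration only."""
    lower = sum(audic_claverie_prob(x, k, n1, n2) for k in range(y + 1))
    upper = 1.0 - lower + audic_claverie_prob(x, y, n1, n2)
    return max(0.0, min(lower, upper, 1.0))


if __name__ == "__main__":
    # Library sizes from the text (p5 and p30); the tag counts are made up.
    print(audic_claverie_pvalue(x=12, y=0, n1=12696, n2=12474))
```

A tag that disappears in the later stage (count dropping towards zero) is consistent with the AscI site having become methylated, which is why small p-values in that direction are the interesting ones here.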
20.3.2 Real-time MSP

Of the candidates from the MSDK analysis, 37 genes were selected, based on the raw data, literature study, public gene expression data and additional data from unpublished studies. Based on their methylation profile (unmethylated in normal cell lines; methylated in cell lines transfected with HPV E6 and/or E7 and in cervical cancer cell lines), additional experiments were performed on 11 genes and confirmed the initial analysis.

An example of an excellent candidate marker is shown in Figure 20.2 (methylated in most cervical cancer cell lines, unmethylated in non-transfected controls and methylated in transfected keratinocytes). The raw data of this marker are displayed in Figure 20.3. Five of the selected candidate markers are now being validated further on primary cervical cancer samples (and controls) in order to estimate precision and recall for early detection purposes.

20.4 Discussion

If an early detection marker is found that is indeed related to HPV infection, it can replace existing screening methods, such as Pap smears or the identification of high-risk HPV viral DNA. As Pap smears have a high number of false negatives and the presence of viral DNA is not sufficient for the development of cervical cancer, a methylation biomarker could have a higher specificity and sensitivity for the early development of cervical cancer.

Recently, Harald zur Hausen received the Nobel Prize in medicine for his investigation of the relationship between HPV and cervical cancer. Since this discovery, the development of vaccination strategies has started and two different vaccines are now available. In Belgium they are commercially available under the names Gardasil® (HPV 16, 18, 6 and 11) and Cervarix® (HPV 16 and 18). Only the main HPV types are included, covering about 80 % of cervical cancer patients. However, vaccination is only useful if given before the onset of sexual activity (it can be safely applied from the age of 10), and thus highly specific diagnostic tools remain needed, as cervical cancer is currently the fifth most deadly cancer in women. The incidence and mortality in the US are about half those for the rest of the world, which is due in part to the success of regular screening with the Pap smear. It is believed that, because of vaccination, the number of screens will decrease and the benefit of vaccination will be partially lost.
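The precision, recall, sensitivity and specificity mentioned above can be computed directly from per-sample methylation calls once primary samples and controls have been measured. The following Python sketch shows one way to do this; the function name and the example calls are hypothetical and do not represent results of this study.

```python
from typing import Dict, Sequence


def evaluate_marker(cases: Sequence[bool], controls: Sequence[bool]) -> Dict[str, float]:
    """Summarise how well one methylation marker separates cancer samples
    (cases) from normal samples (controls). True means 'called methylated'."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    tp = sum(cases)                 # cancer samples called methylated
    fn = len(cases) - tp            # cancer samples missed
    fp = sum(controls)              # normal samples called methylated
    tn = len(controls) - fp         # normal samples correctly unmethylated
    called = tp + fp
    return {
        "sensitivity": tp / (tp + fn),   # identical to recall
        "specificity": tn / (tn + fp),
        "precision": tp / called if called else float("nan"),
    }


if __name__ == "__main__":
    # Made-up calls: 8 of 10 tumours methylated, 1 of 10 normals methylated.
    hypothetical_cases = [True] * 8 + [False] * 2
    hypothetical_controls = [False] * 9 + [True]
    print(evaluate_marker(hypothetical_cases, hypothetical_controls))
```

In a screening setting the controls vastly outnumber the cases, so precision depends strongly on the composition of the tested population, whereas sensitivity and specificity do not.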
Sample number: cell line (sample ID): methylation call
1: CaSki (153661): Methylated
2: CaSki (153661): Methylated
3: EK00-12 (153650): UnMethylated
4: EK00-12 (153650): UnMethylated
5: EK05-2 +HPV16 (153652): Methylated
6: EK05-2 +HPV16 (153652): Methylated
7: EK07-3 (153651): UnMethylated
8: EK07-3 (153651): UnMethylated
9: FK16A (153653): Methylated
10: FK16A (153654): Methylated
11: FK16B (153655): Methylated
12: FK16B (153655): Methylated
13: FK18A (153656): Methylated
14: FK18A (153656): Methylated
15: FK18B (153657): UnMethylated
16: FK18B (153657): UnMethylated
17: FK18B (153658): Methylated
18: FK18B (153658): Methylated
19: HeLa (153660): Methylated
20: HeLa (153660): UnMethylated
21: SiHa (153659): Methylated
22: SiHa (153659): Methylated

Figure 20.2: methylation profile of the best candidate marker
Figure 20.3: raw results of the real-time MSP (22 panels, one per sample, each labelled with melting, size (bp) and intensity values)
Chapter 21: Conclusions

In the last decades, cancer has become one of the most prominent diseases in the world. Who of us does not know anyone who suffered or died from cancer? The mechanisms behind its development are slowly being uncovered in the hope of tackling the disease.

Recently, the Nobel Prize in medicine was shared between researchers on HIV and Harald zur Hausen, who discovered the relationship between HPV and cervical cancer. In cervical cancer, this relationship is very clear, as more than 99.7 % of patients are infected with high-risk virus types. Since then, more and more cancer-virus relationships have been discovered, and some researchers believe that in up to 20 % of cancer cases a virus is involved in the development or progression.

Parts of viral genomes are integrated in the human genome (e.g. LINEs and SINEs, transposable elements) and these elements are silenced by methylation. Can viruses affect DNA-methylation of the human genome as well?

It is known that methylation plays a key role in the life cycle of certain virus types and there is some evidence that the methylation state of the human host cells is affected after virus infection. There might even be an ‘epigenetic battle’ going on between host cells and viruses.

In this part, we used a genome-wide approach (MSDK) to identify regions with altered methylation after viral infection. The validation study is still ongoing, but we have strong evidence that infection with high-risk HPV viruses indeed alters the methylation state of different genes in the human host cell, and that the methylated genes might play a role in cervical cancer development, as they are methylated in cervical cancer cell lines.

Some methylation markers are indeed methylated in cervical cancer samples, while not in normal cervix samples. If these methylation changes occur very early and indeed play a key role in the initial development or progression of cervical cancer, they might be very promising methylation biomarkers for early detection.

Other research projects

Other research projects (phylogenetic analysis and general bioinformatics services) in which the author was involved but that are not described in this thesis:

- Vercruysse L, Smagghe G, van der Bent A, van Amerongen A, Ongenaert M, Van Camp J. (2008). Critical evaluation of the use of bioinformatics as a theoretical tool to find high-potential sources of ACE inhibitory peptides. Peptides, in press.
- Van Damme EJ, Nakamura-Tsurata S, Smith DF, Ongenaert M, Winter HC, Rouge P, Goldstein IJ, Mo H, Kominami J, Culerrier R, Barre A, Hirabayashi J, Peumans WJ. (2007). Phylogenetic and specificity studies of two-domain GNA-related lectins: generation of multispecificity through domain duplication and divergent evolution. Biochemical Journal, 404, 51-61.

Acknowledged in:

- Chalmet K, Van Wanzeele F, Demecheleer E, Dauwe K, Pelgrom J, Van Der Gucht B, Vogelaers D, Plum J, Stuyver L, Vandekerckhove L, Verhofstede C. (2008). Impact of Delta 32-CCR5 heterozygosity on HIV-1 genetic evolution and variability: a study of 4 individuals infected with closely related HIV-1 strains. Virology, 379, 213-222.
- Fouquaert E, Hanton S, Brandizzi F, Peumans W, Van Damme E. (2007). Localization and topogenesis studies of cytoplasmic and vacuolar homologs of the Galanthus nivalis agglutinin. Plant Cell Physiol., 48, 1010-1021.

In preparation:

- Melotte V, Ongenaert M, Van Criekinge W, de Bruine A, Baldwin H, van Engeland M. The N-myc Downstream Regulated Gene (NDRG) family; a family with diverse functions and opportunities.
Supplementary data

Supplementary data for Chapter 8: Intermezzo – Biological Text Mining

- Script 1
- Script 2
- Script 3a
- Script 3b

Supplementary data for Chapter 13: Discovery of methylation markers in cervical cancer, using relaxation ranking

- Supplementary table 1: list of primers used for BSP
- Supplementary table 2: overview of the 45 known methylation markers in cervical cancer selected from literature search and their position after relaxation ranking
- Supplementary table 3: overview of published imprinted genes (Imprinted Gene Catalog), their position and gene name after relaxation ranking
- Supplementary table 4: enriched gene ontology terms, descriptions, number of genes associated with this GO term and P-value versus all human genes in the first 3000 probes. GO terms and statistics as determined by GOStat
- Supplementary table 5: overview of Ingenuity networks, highly represented in the top-3000 list
- Supplementary table 6: the ranking of possibly functional methylated genes from the highest ranking probe-list (TOP250). Probes were ranked according to the relaxation ranking algorithm (“original ranking”). Possible functionally methylated genes were selected (“new ranking”) by omitting probes that do not fulfill the following criteria: (1) probes without gene symbol (i.e. gene ID) or hypothetical genes (marked as “unknown”); (2) probes/genes without a CpG island (marked as “no CpG”), because the expression of such markers is most probably reactivated upon DAC/TSA treatment indirectly via methylation-regulated transcription factors (Shi et al., 2003b); (3) genes located on chromosome X (marked as “X-located”), since one of the main mechanisms of the inactivation of one copy of the X-chromosome in females is DNA methylation (see text); and (4) genes with expression that is not downregulated in less than 15 of the 39 carcinomas (marked as “untreated”; optimally no expression in all 39 cases is expected (P-call = 0), but relaxation ranking allows genes with varying P-calls, including those that are expressed in more than 40% of all carcinomas). X, Y and Z represent the P-calls for primary tumor, untreated and treated cell lines, respectively

Supplementary materials of Chapter 14: Exploring the cancer methylome using genome-wide promoter analysis

- Supplementary table 1: two different classes of genes are defined. Genes listed in Class A are only methylated in cancer and not in normals. Genes listed in Class B are at least partially methylated in normals
- Supplementary table 2: list of 56 known markers for broad analysis
- Supplementary figure 1: A: number of simulations (of the complete approach) that performed at least as well as the methylation markers. B: complete approach: mean number of cumulative markers and 75 and 95 % percentiles of the distribution of the markers, compared with the 95 % percentile of the means of all simulations
- Supplementary figure 2: conservation of the different patterns throughout evolution. The difference between the observed and expected score per nucleotide for chicken, mouse and rat. The higher, the better the pattern is conserved in comparison with its neighboring sequence. Error bars represent the standard error
- Supplementary table 3: genes, selected in the broad analysis
- Supplementary table 4: genes, selected in the deep analysis

Supplementary data for Chapter 15: Genome-wide promoter analysis uncovers portions of the cancer methylome

- Supplementary table 1: details of the cell lines used in the screening study
- Supplementary table 2: primers used for bisulfite sequencing
- Supplementary table 3: primers and probes used for QMSP
- Supplementary table 4: cancer-specific methylated genes and their proposed functions
- Supplementary table 5: frequency of methylation in different tissue types based on bisulfite sequencing, COBRA, quantitative MSP, or conventional MSP

Supplementary data for Chapter 17: Predicting platinum response in ovarian cancer, using DNA methylation profiling

- Supplementary table 1: ranking of methylation candidates in ovarian cancer in general and the different ranking parameters
- Supplementary table 2: ranking of methylation candidates in platinum-resistant ovarian cancer (methylated in platinum-resistant cell lines)
- Supplementary table 3: ranking of methylation candidates in platinum-sensitive ovarian cancer (methylated in platinum-sensitive cell lines)

Supplementary data for Chapter 20: Cervical cancer and the HPV family of viruses

- Supplementary table 1: MSDK analysis results: tags with a unique location in the human genome, and the tag counts and statistics between stages. Green denotes possible demethylation while red indicates possible methylation in the later stage
- Supplementary table 2: genes, selected for validation in the MethyLight experiment
- Supplementary table 3: MethyLight results (1=not methylated, 0=methylated)
- Supplementary table 4: genes, and the interpreted results of the additional validation in MethyLight experiments

All supplementary data are available online at: http://nexus.ugent.be/mate/phd/
Summary and future perspectives

DNA-methylation is an epigenetic modification. These modifications do not change the genetic sequence itself, but affect the level ‘above’ it: chemical modifications of the DNA or of the histones (around which the DNA is wound) are added or altered. DNA-methylation is the modification in which a methyl group is added to cytosine (C) residues in CG dinucleotides. If this takes place in so-called CpG islands (where CG dinucleotides occur very densely) within the promoter of a gene, the gene is silenced and will not be transcribed; its function is lost. In this thesis, different aspects of DNA-methylation and its importance in the field of oncology have been dealt with.

First, we created different methodologies and algorithms to identify methylation markers: genes that are specifically methylated in cancer but not in normal tissue, or genes methylated in subgroups of patients (e.g. responders to a certain chemotherapy).

We built a database and methodologies to mine existing publications in the fast-growing DNA-methylation field. This database (PubMeth) makes it possible to screen which genes are reported as methylated in the selected cancer (sub)types and vice versa.

Different ranking, sorting and selection techniques were developed to prioritize promising methylation marker candidates. These methodologies were all developed to deal with data from genome-wide approaches. Novel methylation markers (methylated in cancer but not in normal tissue) could be used as early detection markers.

Different computational approaches are described here:

- Relaxation ranking: a ranking strategy based on expression micro-arrays of primary cervical cancer samples and re-expression experiments on cervical cancer cell lines. This methodology is based on low expression in cancer samples and re-expression after treatment with the demethylating agent DAC and the histone deacetylase inhibitor TSA. The methodology involves no thresholds.
- The deep approach: different DNA patterns in the promoter region of genes were identified that seem to discriminate between cancer-specifically methylated genes and genes that are also methylated in normal tissues.
- The broad approach makes use of a genome-wide alignment of promoter regions. Genes described as methylation markers appear to cluster together more densely.
- Both the deep approach and the broad approach were combined with data from re-expression studies in cancer cell lines. The combination of this experimental filter and the computational approaches drastically improves the success rate in finding cancer-specific methylation markers in various cancer types.
- Markers were also discovered that may be able to predict chemotherapy (platinum) response in ovarian cancer, clearing the road towards personalized medicine. The methodology uses a scoring scheme based on both primary cancer samples (from patients sensitive and resistant to platinum therapies) and re-expression experiments on ovarian cancer cell lines (both cisplatin resistant and sensitive).

The novel identification strategies were validated on primary cancer samples and perform well: the success rate was improved in comparison with other prioritization attempts. Several novel methylation markers were discovered in different cancer types (cervical cancer, ovarian cancer, head-and-neck cancer, neuroblastoma, …) and were validated on primary samples. Follow-up research on larger clinical cohorts will demonstrate their potential diagnostic power. For some studies, the validation effort is still in progress; this illustrates the need for fast and accurate analysis techniques, suitable for validation purposes.

Secondly, a unique viral infection system, combined with a genome-wide methylation detection methodology, was used to investigate the epigenetic ‘reprogramming’ of human host cells after infection with high-risk HPV (Human Papilloma Virus). In this experiment, we provide strong evidence that a virus is able to epigenetically reprogram its host cell.

The high-risk HPV types are clearly related to the development of cervical cancer, as more than 99 % of the patients are infected with such a virus type. For the moment, the cytological Pap smear screening technique is widespread and used for early detection. Recently, vaccines were developed in order to prevent infection with the most prominent virus types (about 80 % of infections covered). The vaccination strategy must be applied in young girls (before sexual contact). The discovery of cervical cancer biomarkers with high sensitivity and specificity remains necessary for the non-vaccinated group, and screening programs remain needed as the vaccines do not cover all high-risk HPV types. DNA-methylation markers that seem to be related to infection with the virus are ideal candidates for very early detection in a screening program: large-scale analysis can be automated.

As it is believed that multiple cancers may occur after viral infection, or at least that these infections play a key role in their development, this broadens the knowledge of this process and opens ways to very early detection.

Currently, only fractions of the cancer epigenome are known. The introduction of methods that generate very large amounts of data (such as next-generation sequencing) might greatly expand the current knowledge. However, large collections of primary cancer samples (preferably of different stages) will be needed. A centrally managed, well described and annotated library of patient material containing different cancer types would be extremely beneficial.

In addition, sequencing techniques will reach the highest level of precision (base-pair level of single DNA molecules) for a high number of samples at the same time. The initial set-up of such an experiment must be chosen so that the sequenced parts of the genome are enriched in DNA-methylation or histone modifications. The data analysis pipeline for the interpretation of the generated data must be fast and extract new knowledge from the terabytes of raw data generated. Our laboratory is working on both the experimental set-up and the downstream data analysis.
Samenvatting en toekomstperspectieven (Summary and future perspectives)
DNA methylation is an epigenetic modification. This type of modification does not change the genetic information itself but alters the layer above it: chemical modifications are added to, or changed on, the DNA or the histones (around which the DNA is wound). DNA methylation is the modification in which a methyl group is attached to the cytosine (C) residue of a CG dinucleotide. When this change occurs in a so‐called CpG island (where these CG dinucleotides occur densely together) in the promoter of a gene, the gene is no longer transcribed and its function is lost.
This thesis highlights several aspects of DNA methylation and its importance in oncology.
In the first research part, we developed several methods and algorithms to identify methylation markers: genes that are methylated specifically in cancer but not in normal tissues, or genes that are methylated only in particular patient subgroups (e.g. those that respond to a given chemotherapy). We built a database and methods to extract knowledge from the literature of the fast‐growing DNA methylation field. This database (PubMeth) makes it possible to examine the relationship between genes and their methylation pattern in different cancer (sub)types.
In addition, several ranking and selection methodologies were developed to assign a priority to the candidate DNA methylation markers. All these methods were developed to process data from genome‐wide experiments. New methylation markers (methylated in cancer but not in normal tissue) can be used for the early detection of cancer.
Several computational solutions were proposed:
‐ Relaxation ranking: a ranking methodology based on expression microarrays of primary cervical cancer patient material and re‐expression experiments on cervical cancer cell lines. The method relies on low expression in cancer and re‐expression after treatment with the demethylating agent DAC and the histone deacetylase inhibitor TSA. It uses no (arbitrarily chosen) threshold at all.
‐ The deep approach: several DNA motifs were identified in the promoter region of genes that discriminate between cancer‐specifically methylated genes and genes that are also methylated in normal tissues.
‐ The broad approach uses a genome‐wide alignment of promoter regions. This shows that known methylation markers are more densely clustered than expected.
‐ Both ‘deep’ and ‘broad’ were combined with re‐expression studies in cell lines of different cancer types. Combining this experimental filter with the computational approach considerably increases the success rate in finding cancer‐specifically methylated genes.
‐ Markers were also identified that may predict the response to (platinum) chemotherapy in ovarian cancer. This opens the way to personalised medicine. The methodology uses a scoring scheme based on both primary cancer samples (from platinum‐sensitive as well as resistant patients) and re‐expression experiments on cancer cell lines.
The new identification strategies were validated on primary cancer samples and perform well: compared with previously described prioritisation techniques, the success rate is increased. Several new methylation biomarkers in different cancer types (cervical cancer, ovarian cancer, head and neck cancer, neuroblastoma, ...) were successfully validated on primary patient samples. Follow‐up research on larger patient groups will demonstrate their possible diagnostic potential. The validation of some studies is still ongoing. This shows the need for fast and reliable analysis techniques that are suitable for validation purposes.
Secondly, a unique viral infection system was combined with a genome‐wide methylation‐sensitive detection technique to investigate epigenetic ‘reprogramming’ of human host cells after infection with high‐risk HPV (human papillomavirus). In this experiment we provide strong indications that the virus is able to alter the host cell epigenetically.
These hr‐HPV virus types are clearly associated with the development of cervical cancer, given that more than 99 % of patients are infected. At present, the Pap smear is a widely used screening method for early detection. Recently, vaccines have also been brought to market that should prevent infection with the most common virus types (about 80 % of infections). Vaccination should take place in girls before first sexual contact. The discovery of biomarkers with high specificity and sensitivity for cervical cancer remains necessary for the unvaccinated group and because the vaccines do not protect against all virus types. DNA methylation markers are excellent candidates for early detection in a screening programme: the tests can be performed on a large scale and in an automated fashion. Since several cancer types are assumed to be related to viral infections, this research increases the knowledge of this process and opens the way to very early detection.
At present, only parts of the ‘cancer epigenome’ are known. The arrival of methods that generate large amounts of data (such as next‐generation sequencing) could greatly increase the existing knowledge. This, however, requires large collections of primary cancer samples (preferably at different stages). A centrally managed, well‐annotated library of patient material covering different cancer types would accelerate the research.
An additional challenge is that these techniques have the highest level of precision (at the base level of a single DNA molecule) and at the same time must process a high number of samples. This means that, in the experimental design, sequence fragments must be chosen that are enriched for DNA methylation or histone modifications. The data analysis strategy must be fast enough yet still extract new knowledge from the terabytes of raw data generated. Our lab works on both the development of a sound experimental design and the processing of the generated data.
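The notion of a CpG island used in this summary (a region where CG dinucleotides occur densely together) can be made concrete with a small sketch. It is only an illustration, assuming the widely used sequence criteria of at least 200 bp, a GC content above 50 % and an observed/expected CpG ratio above 0.6; the function names and the default thresholds are illustrative choices and are not taken from the methods of this thesis.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, GC-content and CpG-ratio thresholds to one sequence."""
    s = cpg_stats(seq)
    return (s["length"] >= min_len
            and s["gc_fraction"] > min_gc
            and s["cpg_obs_exp"] > min_obs_exp)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real promoter data.
    print(looks_like_cpg_island("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

In practice such a test would be applied in a sliding window over a promoter region; the sketch only scores one sequence at a time.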
References
Agrawal,A., Murphy,R.F., and Agrawal,D.K. (2007). DNA methylation in breast and colorec‐
tal cancers. Mod. Pathol. 20, 711‐721. Alahari,S.K. and Nasrallah,H. (2004). A mem‐
brane proximal region of the integrin alpha5 subunit is important for its interaction with nischarin. Biochem J 377. Amoreira,C., Hindermann,W., and Grunau,C. (2003). An improved version of the DNA methy‐
lation database (MethDB). Nucleic Acids Re‐
search 31, 75‐77. Audic,S. and Claverie,J.M. (1997). The signific‐
ance of digital gene expression profiles. Genome Res. 7, 986‐995. Badal,S., Badal,V., Calleja‐Macias,I.E., Kalanta‐
ri,M., Chuang,L.S., Li,B.F., and Bernard,H.U. (2004). The human papillomavirus‐18 genome is efficiently targeted by cellular DNA methyla‐
tion. Virology 324, 483‐492. Badal,V., Chuang,L.S., Tan,E.H., Badal,S., Villa,L.L., Wheeler,C.M., Li,B.F., and Bernard,H.U. (2003). CpG methylation of human papillomavirus type 16 DNA in cervical cancer cell lines and in clini‐
cal specimens: genomic hypomethylation corre‐
lates with carcinogenic progression. J. Virol. 77, 6227‐6234. Ballestar,E. and Esteller,M. (2005). Methyl‐CpG‐
binding proteins in cancer: blaming the DNA methylation messenger. Biochem. Cell Biol. 83, 374‐384. Baseman,J.G. and Koutsky,L.A. (2005). The epidemiology of human papillomavirus infec‐
tions. J. Clin. Virol. 32 Suppl 1, S16‐S24. Baylin,S.B. and Ohm,J.E. (2006). Epigenetic gene silencing in cancer ‐ a mechanism for early oncogenic pathway addiction? Nature Reviews Cancer 6, 107‐116. Beissbarth,T. and Speed,T.P. (2004). GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 20, 1464‐1465. Belinsky,S.A., Nikula,K.J., Palmisano,W.A., Mi‐
chels,R., Saccomanno,G., Gabrielson,E., Bay‐
lin,S.B., and Herman,J.G. (1998). Aberrant me‐
thylation of p16(INK4a) is an early event in lung cancer and a potential biomarker for early diagnosis. Proc Natl Acad Sci U S A 95. Belleau,F., Nolin,M.A., Tourigny,N., Rigault,P., and Morissette,J. (2008). Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inform. Bernard,H.U. (2005). The clinical importance of the nomenclature, evolution and taxonomy of human papillomaviruses. J. Clin. Virol. 32 Suppl 1, S1‐S6. Bestor,T.H. (2000). The DNA methyltransferases of mammals. Hum. Mol. Genet. 9, 2395‐2402. Blanchard,F., Tracy,E., Smith,J., Chattopad‐
hyay,S., Wang,Y.P., Held,W.A., and Baumann,H. (2003). DNA methylation controls the respon‐
siveness of hepatoma cells to leukemia inhibito‐
ry factor. Hepatology 38, 1516‐1528. Bock,C. and Lengauer,T. (2008). Computational epigenetics. Bioinformatics. 24, 1‐10. Bock,C., Paulsen,M., Tierling,S., Mikeska,T., Lengauer,T., and Walter,J. (2006). CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats, and predicted DNA structure. PLoS. Genet. 2, e26. Bock,C., Walter,J., Paulsen,M., and Lengauer,T. (2007). CpG island mapping by epigenome prediction. PLoS. Comput. Biol. 3, e110. Bosch,F.X., Lorincz,A., Munoz,N., Meijer,C.J., and Shah,K.V. (2002). The causal relation between human papillomavirus and cervical cancer. J. Clin. Pathol. 55, 244‐265. Bulkmans,N.W., Berkhof,J., Rozendaal,L., van Kemenade,F.J., Boeke,A.J., Bulk,S., Voorhorst,F.J., Verheijen,R.H., van,G.K., Boon,M.E., Ruitinga,W., van,B.M., Snijders,P.J., and Meijer,C.J. (2007). Human papillomavirus DNA testing for the detection of cervical intraepithelial neoplasia grade 3 and cancer: 5‐year follow‐up of a ran‐
domised controlled implementation trial. Lancet 370, 1764‐1772. Burgers,W.A., Blanchon,L., Pradhan,S., de,L.Y., Kouzarides,T., and Fuks,F. (2007). Viral onco‐
proteins target the DNA methyltransferases. Oncogene 26, 1650‐1655. Cameron,E.E., Bachman,K.E., Myohanen,S., Herman,J.G., and Baylin,S.B. (1999). Synergy of demethylation and histone deacetylase inhibi‐
tion in the re‐expression of genes silenced in cancer. Nature Genetics 21, 103‐107. Carnell,A.N. and Goodman,J.I. (2003). The long (LINEs) and the short (SINEs) of it: altered methylation as a precursor to toxicity. Toxicol. Sci. 75, 229‐235. Chang,S.C., Tucker,T., Thorogood,N.P., and Brown,C.J. (2006). Mechanisms of X‐
chromosome inactivation. Front Biosci. 11, 852‐
866. Chen,D., Muller,H.M., and Sternberg,P.W. (2006). Automatic document classification of biological literature. BMC. Bioinformatics. 7, 370. Chen,H. and Sharp,B.M. (2004). Content‐rich biological network constructed by mining PubMed abstracts. BMC. Bioinformatics. 5, 147. Chen,L., Liu,H., and Friedman,C. (2005). Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics. 21, 248‐256. Cheng,D., Knox,C., Young,N., Stothard,P., Dama‐
raju,S., and Wishart,D.S. (2008). PolySearch: a web‐based text mining system for extracting relationships between human diseases, genes,
mutations, drugs and metabolites. Nucleic Acids Res. Cheong,J., Yamada,Y., Yamashita,R., Irie,T., Ka‐
nai,A., Wakaguri,H., Nakai,K., Ito,T., Saito,I., Sugano,S., and Suzuki,Y. (2006). Diverse DNA methylation statuses at alternative promoters of human genes in various tissues. DNA Res. 13, 155‐167. Cheung,K.H., Yip,K.Y., Townsend,J.P., and Scotch,M. (2008). HCLS 2.0/3.0: Health care and life sciences data mashup using Web 2.0/3.0. J. Biomed. Inform. Cheung,T.H., Lo,K.W.K., Yim,S.F., Chan,L.K.Y., Heung,M.S., Chan,C.S., Cheung,A.Y.K., Chung,T.K.H., and Wong,Y.F. (2004). Epigenetic and genetic alternation of PTEN in cervical neoplasm. Gynecologic Oncology 93, 621‐627. Chuang,J.C. and Jones,P.A. (2007). Epigenetics and microRNAs. Pediatr. Res. 61, 24R‐29R. Cohen,K.B. and Hunter,L. (2008). Getting started in text mining. PLoS. Comput. Biol. 4, e20. Costello,J.F., Fruhwald,M.C., Smiraglia,D.J., Rush,L.J., Robertson,G.P., Gao,X., Wright,F.A., Feramisco,J.D., Peltomaki,P., Lang,J.C., Schul‐
ler,D.E., Yu,L., Bloomfield,C.D., Caligiuri,M.A., Yates,A., Nishikawa,R., Su,H.H., Petrelli,N.J., Zhang,X., O'Dorisio,M.S., Held,W.A., Cave‐
nee,W.K., and Plass,C. (2000). Aberrant CpG‐
island methylation has non‐random and tu‐
mour‐type‐specific patterns. Nat. Genet. 24, 132‐
138. Dammann,R., Strunnikova,M., Schagdarsuren‐
gin,U., Rastetter,M., Papritz,M., Hattenhorst,U.E., Hofmann,H.S., Silber,R.E., Burdach,S., and Han‐
sen,G. (2005). CpG island methylation and expression of tumour‐associated genes in lung carcinoma. Eur. J. Cancer 41, 1223‐1236. Das,R., Dimitrova,N., Xuan,Z., Rollins,R.A., Hag‐
highi,F., Edwards,J.R., Ju,J., Bestor,T.H., and Zhang,M.Q. (2006). Computational prediction of methylation status in human genomic se‐
quences. Proc. Natl. Acad. Sci. U. S. A 103, 10713‐
10716. Davis,C.D. and Uthus,E.O. (2004). DNA methyla‐
tion, cancer susceptibility, and nutrient interac‐
tions. Exp. Biol. Med. (Maywood. ) 229, 988‐995. Esteller,M. (2007b). Epigenetic gene silencing in cancer: the DNA hypermethylome. Hum. Mol. Genet. 16 Spec No 1, R50‐R59. de Villiers,E.M., Fauquet,C., Broker,T.R., Ber‐
nard,H.U., and zur,H.H. (2004). Classification of papillomaviruses. Virology 324, 17‐27. Esteller,M., Sparks,A., Toyota,M., Sanchez‐
Cespedes,M., Capella,G., Peinado,M.A., Gonza‐
lez,S., Tarafa,G., Sidransky,D., Meltzer,S.J., Bay‐
lin,S.B., and Herman,J.G. (2000). Analysis of adenomatous polyposis coli promoter hyperme‐
thylation in human cancer. Cancer Res 60. Doerfler,W., Toth,M., Kochanek,S., Achten,S., Freisemrabien,U., Behnkrappa,A., and Orend,G. (1990). Eukaryotic Dna Methylation ‐ Facts and Problems. Febs Letters 268, 329‐333. Dontenwill,M., Pascal,G., Piletz,J.E., Chen,M., Baldwin,J., Ronde,P., Dupuy,L., Urosevic,D., Greney,H., Takeda,K., and Bousquet,P. (2003). IRAS, the human homologue of Nischarin, pro‐
longs survival of transfected PC12 cells. Cell Death Differ 10. Doorbar,J. (2006). Molecular biology of human papillomavirus infection and cervical cancer. Clin. Sci. (Lond) 110, 525‐541. Douglas,S.M., Montelione,G.T., and Gerstein,M. (2005). PubNet: a flexible system for visualizing literature derived networks. Genome Biol. 6, R80. Fang,Y.C., Huang,H.C., and Juan,H.F. (2008). MeInfoText: associated gene methylation and cancer information from text mining. BMC. Bioinformatics. 9, 22. Feinberg,A.P. (2008). Epigenetics at the epicen‐
ter of modern medicine. JAMA 299, 1345‐1350. Feltus,F.A., Lee,E.K., Costello,J.F., Plass,C., and Vertino,P.M. (2003). Predicting aberrant CpG island methylation. Proc. Natl. Acad. Sci. U. S. A 100, 12253‐12258. Fernandez,J.M., Hoffmann,R., and Valencia,A. (2007). iHOP web services. Nucleic Acids Res. 35, W21‐W26. Ehrlich,M. (2002). DNA hypomethylation, can‐
cer, the immunodeficiency, centromeric region instability, facial anomalies syndrome and chromosomal rearrangements. J Nutr 132. Flicek,P., Aken,B.L., Beal,K., Ballester,B., Cacca‐
mo,M., Chen,Y., Clarke,L., Coates,G., Cunning‐
ham,F., Cutts,T., Down,T., Dyer,S.C., Eyre,T., Fitzgerald,S., Fernandez‐Banet,J., Graf,S., Haid‐
er,S., Hammond,M., Holland,R., Howe,K.L., Howe,K., Johnson,N., Jenkinson,A., Kahari,A., Keefe,D., Kokocinski,F., Kulesha,E., Lawson,D., Longden,I., Megy,K., Meidl,P., Overduin,B., Park‐
er,A., Pritchard,B., Prlic,A., Rice,S., Rios,D., Schus‐
ter,M., Sealy,I., Slater,G., Smedley,D., Spudich,G., Trevanion,S., Vilella,A.J., Vogel,J., White,S., Wood,M., Birney,E., Cox,T., Curwen,V., Durbin,R., Fernandez‐Suarez,X.M., Herrero,J., Hubbard,T.J., Kasprzyk,A., Proctor,G., Smith,J., Ureta‐Vidal,A., and Searle,S. (2008). Ensembl 2008. Nucleic Acids Res. 36, D707‐D714. Esteller,M. (2003). Cancer epigenetics: DNA methylation and chromatin alterations in hu‐
man cancer. New Trends in Cancer for the 21St Century 532, 39‐49. Fradet,Y., Picard,V., Bergeron,A., and Larue,H. (2006). Cancer‐testis antigen expression in bladder cancer. Progres en Urologie 16, 421‐
428. Esteller,M. (2007a). Cancer epigenomics: DNA methylomes and histone‐modification maps. Nat. Rev. Genet. 8, 286‐298. Frank,E., Hall,M., Trigg,L., Holmes,G., and Wit‐
ten,I.H. (2004). Data mining in bioinformatics using Weka. Bioinformatics. 20, 2479‐2481.
Dowdy,S.C., Gostout,B.S., Shridhar,V., Wu,X.S., Smith,D.I., Podratz,K.C., and Jiang,S.W. (2005). Biallelic methylation and silencing of paternally expressed gene 3 (PEG3) in gynecologic cancer cell lines. Gynecologic Oncology 99, 126‐134.
Edwards,C.A. and Ferguson‐Smith,A.C. (2007). Mechanisms regulating imprinted genes in clusters. Curr. Opin. Cell Biol. 19, 281‐289.
Fricker,S.P. (2007). Metal based drugs: from serendipity to design. Dalton Trans. 4903‐4917.
Fundel,K. and Zimmer,R. (2006). Gene and protein nomenclature in public databases. BMC. Bioinformatics. 7, 372.
Gajendran,V.K., Lin,H.R., and Fyhrie,D.P. (2007). An application of bioinformatics and text mining to the discovery of novel genes related to bone biology. Bone 40, 1378‐1388.
Garber,K. (2006). Momentum building for human epigenome project. J. Natl. Cancer Inst. 98, 84‐86.
Gardiner‐Garden,M. and Frommer,M. (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261‐282.
Gene Ontology Consortium (2008). The Gene Ontology project in 2008. Nucleic Acids Res. 36, D440‐D444.
Gentleman,R.C., Carey,V.J., Bates,D.M., Bolstad,B., Dettling,M., Dudoit,S., Ellis,B., Gautier,L., Ge,Y.C., Gentry,J., Hornik,K., Hothorn,T., Huber,W., Iacus,S., Irizarry,R., Leisch,F., Li,C., Maechler,M., Rossini,A.J., Sawitzki,G., Smith,C., Smyth,G., Tierney,L., Yang,J.Y.H., and Zhang,J.H. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5.
Guo,M., Akiyama,Y., House,M.G., Hooker,C.M., Heath,E., Gabrielson,E., Yang,S.C., Han,Y., Baylin,S.B., Herman,J.G., and Brock,M.V. (2004). Hypermethylation of the GATA genes in lung cancer. Clin. Cancer Res. 10, 7917‐7924.
Hagihara,A., Miyamoto,K., Furuta,J., Hiraoka,N., Wakazono,K., Seki,S., Fukushima,S., Tsao,M.S., Sugimura,T., and Ushijima,T. (2004). Identification of 27 5 ' CpG islands aberrantly methylated and 13 genes silenced in human pancreatic cancers. Oncogene 23, 8705‐8710.
Hanahan,D. and Weinberg,R.A. (2000). The hallmarks of cancer. Cell 100.
Hark,A.T., Schoenherr,C.J., Katz,D.J., Ingram,R.S., Levorse,J.M., and Tilghman,S.M. (2000). CTCF mediates methylation‐sensitive enhancer‐blocking activity at the H19/Igf2 locus. Nature 405, 486‐489.
Harper,D.M., Franco,E.L., Wheeler,C., Ferris,D.G., Jenkins,D., Schuind,A., Zahaf,T., Innis,B., Naud,P., De Carvalho,N.S., Roteli‐Martins,C.M., Teixeira,J., Blatter,M.M., Korn,A.P., Quint,W., and Dubin,G. (2004). Efficacy of a bivalent L1 virus‐like particle vaccine in prevention of infection with human papillomavirus types 16 and 18 in young women: a randomised controlled trial. Lancet 364, 1757‐1765.
Hatada,I., Fukasawa,M., Kimura,M., Morita,S., Yamada,K., Yoshikawa,T., Yamanaka,S., Endo,C., Sakurada,A., Sato,M., Kondo,T., Horii,A., Ushijima,T., and Sasaki,H. (2006). Genome‐wide profiling of promoter methylation in human. Oncogene 25, 3059‐3064.
Hayatsu,H. (1976). Bisulfite modification of nucleic acids and their constituents. Prog. Nucleic Acid Res. Mol. Biol. 16, 75‐124.
Heard,E. (2004). Recent advances in X‐chromosome inactivation. Current Opinion in Cell Biology 16, 247‐255.
Hegi,M.E., Diserens,A.C., Gorlia,T., Hamou,M.F., de,T.N., Weller,M., Kros,J.M., Hainfellner,J.A., Mason,W., Mariani,L., Bromberg,J.E., Hau,P., Mirimanoff,R.O., Cairncross,J.G., Janzer,R.C., and Stupp,R. (2005). MGMT gene silencing and benefit from temozolomide in glioblastoma. N. Engl. J. Med. 352, 997‐1003. Herman,J.G. (2005). Epigenetic changes in cancer and preneoplasia. Cold Spring Harb. Symp. Quant. Biol. 70, 329‐333. Herman,J.G. and Baylin,S.B. (2003). Gene silenc‐
ing in cancer in association with promoter hypermethylation. N. Engl. J. Med. 349, 2042‐
2054. Herman,J.G., Graff,J.R., Myohanen,S., Nelkin,B.D., and Baylin,S.B. (1996). Methylation‐specific PCR: a novel PCR assay for methylation status of CpG islands. Proc. Natl. Acad. Sci. U. S. A 93, 9821‐9826. Hinrichs,A.S., Karolchik,D., Baertsch,R., Bar‐
ber,G.P., Bejerano,G., Clawson,H., Diekhans,M., Furey,T.S., Harte,R.A., Hsu,F., Hillman‐Jackson,J., Kuhn,R.M., Pedersen,J.S., Pohl,A., Raney,B.J., Rosenbloom,K.R., Siepel,A., Smith,K.E., Sug‐
net,C.W., Sultan‐Qurraie,A., Thomas,D.J., Trum‐
bower,H., Weber,R.J., Weirauch,M., Zweig,A.S., Haussler,D., and Kent,W.J. (2006). The UCSC Genome Browser Database: update 2006. Nucle‐
ic Acids Res. 34, D590‐D598. Hochberg,Y. and Benjamini,Y. (1990). More Powerful Procedures for Multiple Significance Testing. Statistics in Medicine 9, 811‐818. Hoelzer,K., Shackelton,L.A., and Parrish,C.R. (2008). Presence and role of cytosine methyla‐
tion in DNA viruses of animals. Nucleic Acids Res. Hoffmann,M.J., Muller,M., Engers,R., and Schulz,W.A. (2006). Epigenetic control of CTCFL/BORIS and OCT4 expression in urogenit‐
al malignancies. Biochemical Pharmacology 72, 1577‐1588. Hoffmann,R. (2007). Using the iHOP information resource to mine the biomedical literature on genes, proteins, and chemical compounds. Curr. Protoc. Bioinformatics. Chapter 1, Unit1. Holliday,R. and Pugh,J.E. (1975). DNA modifica‐
tion mechanisms and gene activity during de‐
velopment. Science 187. Holmes,R. and Soloway,P.D. (2006). Regulation of imprinted DNA methylation. Cytogenet. Genome Res. 113, 122‐129. Hong,C.B., Bollen,A.W., and Costello,J.F. (2003). The contribution of genetic and epigenetic mechanisms to gene silencing in oligodendrog‐
liomas. Cancer Research 63, 7600‐7605. Hong,F., Breitling,R., McEntee,C.W., Wittner,B.S., Nemhauser,J.L., and Chory,J. (2006). RankProd: a bioconductor package for detecting differen‐
tially expressed genes in meta‐analysis. Bioin‐
formatics. 22, 2825‐2827.
Hoque,M.O., Begum,S., Topaloglu,O., Chatterjee,A., Rosenbaum,E., Van Criekinge,W., Westra,W.H., Schoenberg,M., Zahurak,M., Goodman,S.N., and Sidransky,D. (2006). Quantitation of promoter methylation of multiple genes in urine DNA and bladder cancer detection. J Natl Cancer Inst 98.
Hoque,M.O., Kim,M.S., Ostrow,K.L., Liu,J., Wisman,G.B., Park,H.L., Poeta,M.L., Jeronimo,C., Henrique,R., Lendvai,A., Schuuring,E., Begum,S., Rosenbaum,E., Ongenaert,M., Yamashita,K., Califano,J., Westra,W., van der Zee,A.G., Van,C.W., and Sidransky,D. (2008). Genome‐wide promoter analysis uncovers portions of the cancer methylome. Cancer Res. 68, 2661‐2670.
Hoque,M.O., Rosenbaum,E., Westra,W.H., Xing,M., Ladenson,P., Zeiger,M.A., Sidransky,D., and Umbricht,C.B. (2005a). Quantitative assessment of promoter methylation profiles in thyroid neoplasms. J Clin Endocrinol Metab 90.
Hoque,M.O., Topaloglu,O., Begum,S., Henrique,R., Rosenbaum,E., Van,C.W., Westra,W.H., and Sidransky,D. (2005b). Quantitative methylation‐specific polymerase chain reaction gene patterns in urine sediment distinguish prostate cancer patients from control subjects. J. Clin. Oncol. 23, 6569‐6575.
Horsthemke,B. and Wagstaff,J. (2008). Mechanisms of imprinting of the Prader‐Willi/Angelman region. Am. J. Med. Genet. A 146A, 2041‐2052.
Hu,M., Yao,J., Cai,L., Bachman,K.E., van den,B.F., Velculescu,V., and Polyak,K. (2005). Distinct epigenetic changes in the stromal cells of breast cancers. Nat. Genet. 37, 899‐905.
Hu,M., Yao,J., and Polyak,K. (2006). Methylation‐specific digital karyotyping. Nat. Protoc. 1, 1621‐1636.
Huang,T.H., Perry,M.R., and Laux,D.E. (1999). Methylation profiling of CpG islands in human breast cancer cells. Hum. Mol. Genet. 8, 459‐470.
Inche,A.G. and La Thangue,N.B. (2006). Chromatin control and cancer‐drug discovery: realizing the promise. Drug Discov. Today 11, 97‐109.
Ivanova,T., Vinokurova,S., Petrenko,A., Eshi‐
lev,E., Solovyova,N., Kisseljov,F., and Kisseljo‐
va,N. (2004). Frequent hypermethylation of 5 ' flanking region of TIMP‐2 gene in cervical can‐
cer. International Journal of Cancer 108, 882‐
886. Jaenisch,R. and Bird,A. (2003). Epigenetic regu‐
lation of gene expression: how the genome integrates intrinsic and environmental signals. Nat. Genet. 33 Suppl, 245‐254. Jimeno,A., Jimenez‐Ruiz,E., Lee,V., Gaudan,S., Berlanga,R., and Rebholz‐Schuhmann,D. (2008). Assessment of disease named entity recognition on a corpus of annotated sentences. BMC. Bioin‐
formatics. 9 Suppl 3, S3. Jin,Y., McDonald,R.T., Lerman,K., Mandel,M.A., Carroll,S., Liberman,M.Y., Pereira,F.C., Win‐
ters,R.S., and White,P.S. (2006). Automated recognition of malignancy mentions in biomedi‐
cal literature. BMC. Bioinformatics. 7, 492. Kang,S., Kim,J., Kim,H.B., Shim,J.W., Nam,E., Kim,S.H., Ahn,H.J., Choi,Y.P., Ding,B., Song,K., and Cho,N.H. (2006). Methylation of p16INK4a is a non‐rare event in cervical intraepithelial neop‐
lasia. Diagn. Mol. Pathol. 15, 74‐82. Kempkensteffen,C., Christoph,F., Weikert,S., Krause,H., Kollermann,J., Schostak,M., Miller,K., and Schrader,M. (2006). Epigenetic silencing of the putative tumor suppressor gene testisin in testicular germ cell tumors. Journal of Cancer Research and Clinical Oncology 132, 765‐770. Kent,W.J. (2002). BLAT‐‐the BLAST‐like align‐
ment tool. Genome Res. 12, 656‐664. Kim,J.D., Ohta,T., and Tsujii,J. (2008). Corpus annotation for mining biomedical events from literature. BMC. Bioinformatics. 9, 10. Kim,M.S., Yamashita,K., Baek,J.H., Park,H.L., Carvalho,A.L., Osada,M., Hoque,M.O., Upad‐
hyay,S., Mori,M., Moon,C., and Sidransky,D. (2006). N‐methyl‐D‐aspartate receptor type 2B is epigenetically inactivated and exhibits tumor‐
suppressive activity in human esophageal can‐
cer. Cancer Res 66.
Kim,T.Y., Lee,H.J., Hwang,K.S., Lee,M., Kim,J.W., Bang,Y.J., and Kang,G.H. (2004). Methylation of RUNX3 in various types of human cancers and premalignant stages of gastric carcinoma. La‐
boratory Investigation 84, 479‐484. Kim,W. and Wilbur,W.J. (2005). A strategy for assigning new concepts in the MEDLINE data‐
base. AMIA. Annu. Symp. Proc. 395‐399. Kitkumthorn,N., Yanatatsanajit,P., Kiatpong‐
san,S., Phokaew,C., Triratanachat,S., Trivijit‐
silp,P., Termrungruanglert,W., Tresukosol,D., Niruthisard,S., and Mutirangura,A. (2006). Cyclin A1 promoter hypermethylation in human papillomavirus‐associated cervical cancer. Bmc Cancer 6. Kornberg,L.J., Villaret,D., Popp,M., Lui,L., McLa‐
ren,R., Brown,H., Cohen,D., Yun,J., and McFad‐
den,M. (2005). Gene expression profiling in squamous cell carcinoma of the oral cavity shows abnormalities in several signaling path‐
ways. Laryngoscope 115, 690‐698. Krull,M., Voss,N., Choi,C., Pistor,S., Potapov,A., and Wingender,E. (2003). TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 31, 97‐100. Kuerbitz,S.J., Pahys,J., Wilson,A., Compitello,N., and Gray,T.A. (2002). Hypermethylation of the imprinted NNAT locus occurs frequently in pediatric acute leukemia. Carcinogenesis 23, 559‐564. Kulasingam,S.L., Hughes,J.P., Kiviat,N.B., Mao,C., Weiss,N.S., Kuypers,J.M., and Koutsky,L.A. (2002). Evaluation of human papillomavirus testing in primary screening for cervical abnor‐
malities: comparison of sensitivity, specificity, and frequency of referral. JAMA 288, 1749‐1757. Lai,H.C., Lin,Y.W., Huang,T.H., Yan,P., Huang,R.L., Wang,H.C., Liu,J., Chan,M.W., Chu,T.Y., Sun,C.A., Chang,C.C., and Yu,M.H. (2008). Identification of novel DNA methylation markers in cervical cancer. Int. J. Cancer 123, 161‐167. Lee,H., Yi,G.S., and Park,J.C. (2008). E3Miner: a text mining tool for ubiquitin‐protein ligases. Nucleic Acids Res. 36, W416‐W422. Leonhardt,H., Rahn,H.P., and Cardoso,M.C. (1999). Functional links between nuclear struc‐
ture, gene expression, DNA replication, and methylation. Crit Rev Eukaryot Gene Expr 9. Li,K.B. (2003). ClustalW‐MPI: ClustalW analysis using distributed and parallel computing. Bioin‐
formatics. 19, 1585‐1586. Liang,H., Samanta,S., and Nagarajan,L. (2005). SSBP2, a candidate tumor suppressor gene, induces growth arrest and differentiation of myeloid leukemia cells. Oncogene 24. Liu,H., Aronson,A.R., and Friedman,C. (2002). A study of abbreviations in MEDLINE abstracts. Proc. AMIA. Symp. 464‐468. Liu,J., Hadjokas,N., Mosley,B., Estrov,Z., Spence,M.J., and Vestal,R.E. (1998). Oncostatin M‐specific receptor expression and function in regulating cell proliferation of normal and malignant mammary epithelial cells. Cytokine 10. Lo,N.W., Shaper,J.H., Pevsner,J., and Shaper,N.L. (1998). The expanding beta 4‐
galactosyltransferase gene family: messages from the databanks. Glycobiology 8. Lund,A.H. and van Lohuizen,M. (2004). Poly‐
comb complexes and silencing mechanisms. Curr. Opin. Cell Biol. 16, 239‐246. Manton,K.J., Douglas,M.L., Netzel‐Arnett,S., Fitzpatrick,D.R., Nicol,D.L., Boyd,A.W., Cle‐
ments,J.A., and Antalis,T.M. (2005). Hyperme‐
thylation of the 5 ' CpG island of the gene encod‐
ing the serine protease Testisin promotes its loss in testicular tumorigenesis. British Journal of Cancer 92, 760‐769. Margueron,R., Trojer,P., and Reinberg,D. (2005). The key to development: interpreting the his‐
tone code? Curr. Opin. Genet. Dev. 15, 163‐176. Martin‐Hirsch,P.L., Koliopoulos,G., and Paraske‐
vaidis,E. (2002). Is it now time to evaluate the true accuracy of cervical cytology screening? A review of the literature. Eur. J. Gynaecol. Oncol. 23, 363‐365. Mihara,M., Yoshida,Y., Tsukamoto,T., Inada,K., Nakanishi,Y., Yagi,Y., Imai,K., Sugimura,T., Tate‐
matsu,M., and Ushijima,T. (2006). Methylation of multiple genes in gastric glands with intestin‐
al metaplasia ‐ A disorder with polyclonal ori‐
gins. American Journal of Pathology 169, 1643‐
1651. Momparler,R.L. (2003). Cancer epigenetics. Oncogene 22. Mori,Y., Cai,K., Cheng,Y., Wang,S., Paun,B., Hamil‐
ton,J.P., Jin,Z., Sato,F., Berki,A.T., Kan,T., Ito,T., Mantzur,C., Abraham,J.M., and Meltzer,S.J. (2006). A genome‐wide search identifies epige‐
netic silencing of somatostatin, tachykinin‐1, and 5 other genes in colon cancer. Gastroenter‐
ology 131, 797‐808. Morison,I.M., Ramsay,J.P., and Spencer,H.G. (2005). A census of mammalian imprinting. Trends in Genetics 21, 457‐465. Munoz,N., Bosch,F.X., Castellsague,X., Diaz,M., de,S.S., Hammouda,D., Shah,K.V., and Meijer,C.J. (2004). Against which human papillomavirus types shall we vaccinate and screen? The inter‐
national perspective. Int. J. Cancer 111, 278‐285. Nanda,K., McCrory,D.C., Myers,E.R., Bastian,L.A., Hasselblad,V., Hickey,J.D., and Matchar,D.B. (2000). Accuracy of the Papanicolaou test in screening for and follow‐up of cervical cytologic abnormalities: a systematic review. Ann. Intern. Med. 132, 810‐819. Okino,S.T., Pookot,D., Li,L.C., Zhao,H., Urakatni,S., Shiina,H., Igawa,N., and Dahiya,R. (2006). Epige‐
netic inactivation of the dioxin‐responsive cytochrome P4501A1 gene in human prostate cancer. Cancer Research 66, 7420‐7428. Olson,S.A. (2002). EMBOSS opens up sequence analysis. European Molecular Biology Open Software Suite. Brief. Bioinform. 3, 87‐91. Ongenaert,M., Van Neste,L., De Meyer T., Men‐
schaert,G., Bekaert,S., and Van Criekinge,W. (2008). PubMeth: a cancer methylation data‐
base combining text‐mining and expert annota‐
tion. Nucleic Acids Res. 36, D842‐D846. Paluszczak,J. and Baer‐Dubowska,W. (2006). Epigenetic diagnostics of cancer ‐ the applica‐
tion of DNA methylation markers. Journal of Applied Genetics 47, 365‐375. Parkin,D.M. (2006). The global health burden of infection‐associated cancers in the year 2002. Int. J. Cancer 118, 3030‐3044. Pattyn,F., Hoebeeck,J., Robbrecht,P., Michels,E., De,P.A., Bottu,G., Coornaert,D., Herzog,R., Spele‐
man,F., and Vandesompele,J. (2006). meth‐
BLAST and methPrimerDB: web‐tools for PCR based methylation analysis. BMC Bioinformatics 7, 496. Paz,M.F., Fraga,M.F., Avila,S., Guo,M., Pollan,M., Herman,J.G., and Esteller,M. (2003). A systemat‐
ic profile of DNA methylation in human cancer cell lines. Cancer Res. 63, 1114‐1121. Peedicayil,J. (2006). Epigenetic therapy‐‐a new development in pharmacology. Indian J. Med. Res. 123, 17‐24. Persson,G., Andersson,K., and Krantz,I. (1996). Symptomatic genital papillomavirus infection in a community. Incidence and clinical picture. Acta Obstet. Gynecol. Scand. 75, 287‐290. Piletz,J.E., Ivanov,T.R., Sharp,J.D., Ernsberger,P., Chang,C.H., Pickard,R.T., Gold,G., Roth,B., Zhu,H., Jones,J.C., Baldwin,J., and Reis,D.J. (2000). Imida‐
zoline receptor antisera‐selected (IRAS) cDNA: cloning and characterization. DNA Cell Biol 19. Qiu,G., Fan,J.C., and He,Y.S. (2006). 5 ' CpG island methylation analysis identifies the MAGE‐A1 and MAGE‐A3 genes as potential markers of HCC. Clinical Biochemistry 39, 259‐266. Rauch,T., Li,H.W., Wu,X.W., and Pfeifer,G.P. (2006). MIRA‐assisted microarray analysis, a new technology for the determination of DNA methylation patterns, identifies frequent methy‐
lation of homeodomain‐containing genes in lung cancer cells. Cancer Research 66, 7939‐7947. Rebhan,M., Chalifa‐Caspi,V., Prilusky,J., and Lancet,D. (1998). GeneCards: a novel functional genomics compendium with automated data
mining and query reformulation support. Bioin‐
formatics. 14, 656‐664. Rebhan,M., ChalifaCaspi,V., Prilusky,J., and Lancet,D. (1997). GeneCards: Integrating infor‐
mation about genes, proteins and diseases. Trends in Genetics 13, 163. Rebholz‐Schuhmann,D., Arregui,M., Gaudan,S., Kirsch,H., and Jimeno,A. (2008). Text processing through Web services: calling Whatizit. Bioin‐
formatics. 24, 296‐298. Riggs,A.D. (1975). X inactivation, differentiation, and DNA methylation. Cytogenet Cell Genet 14. Rigoutsos,I. and Floratos,A. (1998). Combina‐
torial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics. 14, 55‐67. Robertson,K.D. (2002). DNA methylation and chromatin ‐ unraveling the tangled web. Onco‐
gene 21, 5361‐5379. Safran,M., Solomon,I., Shmueli,O., Lapidot,M., Shen‐Orr,S., Adato,A., Ben‐Dor,U., Esterman,N., Rosen,N., Peter,I., Olender,T., Chalifa‐Caspi,V., and Lancet,D. (2002). GeneCards 2002: towards a complete, object‐oriented, human gene com‐
pendium. Bioinformatics. 18, 1542‐1543. Sano,H., Liu,S.C., Lane,W.S., Piletz,J.E., and Lien‐
hard,G.E. (2002). Insulin receptor substrate 4 associates with the protein IRAS. J Biol Chem 277. Savarese,T.M., Campbell,C.L., McQuain,C., Mit‐
chell,K., Guardiani,R., Quesenberry,P.J., and Nelson,B.E. (2002). Coexpression of oncostatin M and its receptors and evidence for STAT3 activation in human ovarian carcinomas. Cyto‐
kine 17. Schuebel,K.E., Chen,W., Cope,L., Glockner,S.C., Suzuki,H., Yi,J.M., Chan,T.A., Van,N.L., Van,C.W., van den,B.S., van,E.M., Ting,A.H., Jair,K., Yu,W., Toyota,M., Imai,K., Ahuja,N., Herman,J.G., and Baylin,S.B. (2007). Comparing the DNA hyper‐
methylome with gene mutations in human colorectal cancer. PLoS. Genet. 3, 1709‐1723. Sehgal,A.K. and Srinivasan,P. (2006). Retrieval with gene queries. BMC. Bioinformatics. 7, 220. Serman,A., Vlahovic,M., Serman,L., and Bulic‐
Jakus,F. (2006). DNA methylation as a regulato‐
ry mechanism for gene expression in mammals. Coll. Antropol. 30, 665‐671. Shapiro,J.R. and Shapiro,W.R. (1984). Clonal tumor cell heterogeneity. Prog Exp Tumor Res 27. Shaw,R.J., Liloglou,T., Rogers,S.N., Brown,J.S., Vaughan,E.D., Lowe,D., Field,J.K., and Risk,J.M. (2006). Promoter methylation of P16, RAR beta, E‐cadherin, cyclin A1 and cytoglobin in oral cancer: quantitative evaluation using pyrose‐
quencing. British Journal of Cancer 94, 561‐568. Shen,L. and Waterland,R.A. (2007). Methods of DNA methylation analysis. Curr. Opin. Clin. Nutr. Metab Care 10, 576‐581. Shi,H., Maier,S., Nimmrich,I., Yan,P.S., Cald‐
well,C.W., Olek,A., and Huang,T.H. (2003). Oligo‐
nucleotide‐based microarray for DNA methyla‐
tion analysis: Principles and applications. J. Cell Biochem. 88, 138‐143. Shivapurkar,N., Toyooka,S., Toyooka,K.O., Red‐
dy,J., Miyajima,K., Suzuki,M., Shigematsu,H., Takahashi,T., Parikh,G., Pass,H.I., Chaud‐
hary,P.M., and Gazdar,A.F. (2004). Aberrant methylation of trail decoy receptor genes is frequent in multiple tumor types. International Journal of Cancer 109, 786‐792. Shtatland,T., Guettler,D., Kossodo,M., Pivova‐
rov,M., and Weissleder,R. (2007). PepBank‐‐a database of peptides based on sequence text mining and public peptide data sources. BMC. Bioinformatics. 8, 280. Sigalotti,L., Coral,S., Nardi,G., Spessotto,A., Corti‐
ni,E., Cattarossi,I., Colizzi,F., Altomonte,M., and Maio,M. (2002). Promoter methylation controls the expression of MAGE2, 3 and 4 genes in human cutaneous melanoma. Journal of Immu‐
notherapy 25, 16‐26. Singh,S.B., Hull,R.D., and Fluder,E.M. (2003). Text Influenced Molecular Indexing (TIMI): a References literature database mining approach that han‐
dles text and chemistry. J. Chem. Inf. Comput. Sci. 43, 743‐752. Sjoblom,T., Jones,S., Wood,L.D., Parsons,D.W., Lin,J., Barber,T.D., Mandelker,D., Leary,R.J., Ptak,J., Silliman,N., Szabo,S., Buckhaults,P., Farrell,C., Meeh,P., Markowitz,S.D., Willis,J., Dawson,D., Willson,J.K., Gazdar,A.F., Hartigan,J., Wu,L., Liu,C., Parmigiani,G., Park,B.H., Bach‐
man,K.E., Papadopoulos,N., Vogelstein,B., Kinz‐
ler,K.W., and Velculescu,V.E. (2006). The con‐
sensus coding sequences of human breast and colorectal cancers. Science 314. Smiraglia,D.J., Rush,L.J., Fruhwald,M.C., Dai,Z.Y., Held,W.A., Costello,J.F., Lang,J.C., Eng,C., Li,B., Wright,F.A., Caligiuri,M.A., and Plass,C. (2001). Excessive CpG island hypermethylation in can‐
cer cell lines versus primary human malignan‐
cies. Human Molecular Genetics 10, 1413‐1419. Smith,J.S., Lindsay,L., Hoots,B., Keys,J., France‐
schi,S., Winer,R., and Clifford,G.M. (2007). Hu‐
man papillomavirus type distribution in inva‐
sive cervical cancer and high‐grade cervical lesions: a meta‐analysis update. Int. J. Cancer 121, 621‐632. Sova,P., Feng,Q.H., Geiss,G., Wood,T., Strauss,R., Rudolf,V., Lieber,A., and Kiviat,N. (2006). Dis‐
covery of novel methylation biomarkers in cervical carcinoma by global demethylation and microarray analysis. Cancer Epidemiology Biomarkers & Prevention 15, 114‐123. Sparmann,A. and van Lohuizen,M. (2006). Poly‐
comb silencers control cell fate, development and cancer. Nat. Rev. Cancer 6, 846‐856. Stajich,J.E., Block,D., Boulez,K., Brenner,S.E., Chervitz,S.A., Dagdigian,C., Fuellen,G., Gil‐
bert,J.G., Korf,I., Lapp,H., Lehvaslaiho,H., Matsal‐
la,C., Mungall,C.J., Osborne,B.I., Pocock,M.R., Schattner,P., Senger,M., Stein,L.D., Stupka,E., Wilkinson,M.D., and Birney,E. (2002). The Bio‐
perl toolkit: Perl modules for the life sciences. Genome Res. 12, 1611‐1618. Steenbergen,R.D., OudeEngberink,V.E., Kra‐
mer,D., Schrijnemakers,H.F., Verheijen,R.H., Meijer,C.J., and Snijders,P.J. (2002). Down‐
regulation of GATA‐3 expression during human 201 papillomavirus‐mediated immortalization and cervical carcinogenesis. Am. J. Pathol. 160, 1945‐
1951. Steenbergen,R.D.M., Kramer,D., Braakhuis,B.J.M., Stem,P.L., Verheijen,R.H.M., Meijer,C.J.L.M., and Snijders,P.J.F. (2004). TSLC1 gene silencing in cervical cancer cell lines and cervical neoplasia. Journal of the National Cancer Institute 96, 294‐
305. Strathdee, G and Brown, R. Abberant DNA me‐
thylation in cancer: potential clinical interven‐
tions. Exp.Rev.Mol.Med. 1‐17. 4‐3‐2002. Ref Type: Electronic Citation Su,P.F., Lee,T.C., Lin,P.J., Lee,P.H., Jeng,Y.M., Chen,C.H., Liang,J.D., Chiou,L.L., Huang,G.T., and Lee,H.S. (2007). Differential DNA methylation associated with hepatitis B virus infection in hepatocellular carcinoma. Int. J. Cancer 121, 1257‐1264. Suzuki,H., Gabrielson,E., Chen,W., Anbazha‐
gan,R., van Engeland,M., Weijenberg,M.P., Her‐
man,J.G., and Baylin,S.B. (2002). A genomic screen for genes upregulated by demethylation and histone deacetylase inhibition in human colorectal cancer. Nat. Genet. 31, 141‐149. Suzuki,Y., Yamashita,R., Sugano,S., and Nakai,K. (2004). DBTSS, DataBase of Transcriptional Start Sites: progress report 2004. Nucleic Acids Res. 32, D78‐D81. Takai,D. and Jones,P.A. (2002). Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc. Natl. Acad. Sci. U. S. A 99, 3740‐
3745. Tan,B.T., Park,C.Y., Ailles,L.E., and Weissman,I.L. (2006). The cancer stem cell hypothesis: a work in progress. Lab Invest 86, 1203‐1207. Tanabe,L. and Wilbur,W.J. (2002). Tagging gene and protein names in biomedical text. Bioinfor‐
matics. 18, 1124‐1132. Taylor,K.H., Kramer,R.S., Davis,J.W., Guo,J., Duff,D.J., Xu,D., Caldwell,C.W., and Shi,H. (2007). Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promo‐
202 References
ters by 454 sequencing. Cancer Res. 67, 8511‐
8518. Thompson,J.D., Higgins,D.G., and Gibson,T.J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673‐4680. Tokumaru,Y., Yamashita,K., Osada,M., Nomoto,S., Sun,D.I., Xiao,Y., Hoque,M.O., Westra,W.H., Cali‐
fano,J.A., and Sidransky,D. (2004). Inverse corre‐
lation between cyclin A1 hypermethylation and p53 mutation in head and neck cancer identified by reversal of epigenetic silencing. Cancer Res. 64, 5982‐5987. Toyooka,S., Tokumo,M., Shigematsu,H., Mat‐
suo,K., Asano,H., Tomii,K., Ichihara,S., Suzuki,M., Aoe,M., Date,H., Gazdar,A.F., and Shimizu,N. (2006). Mutational and epigenetic evidence for independent pathways for lung adenocarcino‐
mas arising in smokers and never smokers. Cancer Res 66. Tranchevent,L.C., Barriot,R., Yu,S., Vooren,S.V., Loo,P.V., Coessens,B., Moor,B.D., Aerts,S., and Moreau,Y. (2008). ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res. 36, W377‐W384. Trooskens,G., De Beule,D., Decouttere,F., and Van Criekinge,W. (2005). Phylogenetic trees: visualizing, customizing and detecting incon‐
gruence. Bioinformatics. 21, 3801‐3802. Tuason,O., Chen,L., Liu,H., Blake,J.A., and Fried‐
man,C. (2004). Biological nomenclatures: a source of lexical knowledge and ambiguity. Pac. Symp. Biocomput. 238‐249. Varmus,H. (2006). The new era in cancer re‐
search. Science 312. Villa,R., Pasini,D., Gutierrez,A., Morey,L., Occhio‐
norelli,M., Vire,E., Nomdedeu,J.F., Jenuwein,T., Pelicci,P.G., Minucci,S., Fuks,F., Helin,K., and Di,C.L. (2007). Role of the polycomb repressive complex 2 in acute promyelocytic leukemia. Cancer Cell 11, 513‐525. Vire,E., Brenner,C., Deplus,R., Blanchon,L., Fra‐
ga,M., Didelot,C., Morey,L., Van,E.A., Bernard,D., Vanderwinden,J.M., Bollen,M., Esteller,M., Di,C.L., de,L.Y., and Fuks,F. (2006). The Polycomb group protein EZH2 directly controls DNA methyla‐
tion. Nature 439, 871‐874. Vivekanandan,P., Thomas,D., and Torbenson,M. (2008). Hepatitis B viral DNA is methylated in liver tissues. J. Viral Hepat. 15, 103‐107. Vogelstein,B. and Kinzler,K.W. (2004). Cancer genes and the pathways they control. Nat Med 10. Wang,G.G., Allis,C.D., and Chi,P. (2007). Chroma‐
tin remodeling and cancer, Part I: Covalent histone modifications. Trends Mol. Med. 13, 363‐372. Wang,Y., Zhou,Y., Szabo,K., Haft,C.R., and Trejo,J. (2002). Down‐regulation of protease‐activated receptor‐1 is regulated by sorting nexin 1. Mol Biol Cell 13. Wang,Y.P., Yu,Q.J., Cho,A.H., Rondeau,G., Welsh,J., Adamson,E., Mercola,D., and McClelland,M. (2005). Survey of differentially methylated promoters in prostate cancer cell lines. Neopla‐
sia 7, 748‐760. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Church,D.M., DiCuccio,M., Edgar,R., Federhen,S., Helmberg,W., Kenton,D.L., Khovayko,O., Lipman,D.J., Madden,T.L., Mag‐
lott,D.R., Ostell,J., Pontius,J.U., Pruitt,K.D., Schu‐
ler,G.D., Schriml,L.M., Sequeira,E., Sherry,S.T., Sirotkin,K., Starchenko,G., Suzek,T.O., Tatusov,R., Tatusova,T.A., Wagner,L., and Yaschenko,E. (2005). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 33, D39‐D45. Widschwendter,M., Fiegl,H., Egle,D., Mueller‐
Holzner,E., Spizzo,G., Marth,C., Weisenberg‐
er,D.J., Campan,M., Young,J., Jacobs,I., and Laird,P.W. (2007). Epigenetic stem cell signa‐
ture in cancer. Nat. Genet. 39, 157‐158. Wild,D.J. and Hur,J. (2008). PubChemSR: A search and retrieval tool for PubChem. Chem. Cent. J. 2, 11. References Wischnewski,F., Pantel,K., and Schwarzen‐
bach,H. (2006). Promoter demethylation and histone acetylation mediate gene expression of MAGE‐A1,‐A2,‐A3, and‐A12 in human cancer cells. Molecular Cancer Research 4, 339‐349. Wisman,G.B., Nijhuis,E.R., Hoque,M.O., Reesink‐
Peters,N., Koning,A.J., Volders,H.H., Buikema,H.J., Boezen,H.M., Hollema,H., Schuuring,E., Si‐
dransky,D., and van der Zee,A.G. (2006). As‐
sessment of gene promoter hypermethylation for detection of cervical neoplasia. Int. J. Cancer 119, 1908‐1914. Worm,J. and Guldberg,P. (2002). DNA methyla‐
tion: an epigenetic pathway to cancer and a promising target for anticancer therapy. J. Oral Pathol. Med. 31, 443‐449. Wu,M.F., Cheng,Y.W., Lai,J.C., Hsu,M.C., Chen,J.T., Liu,W.S., Chiou,M.C., Chen,C.Y., and Lee,H. (2005). Frequent p16INK4a promoter hyperme‐
thylation in human papillomavirus‐infected female lung cancer in Taiwan. Int. J. Cancer 113, 440‐445. Xing,M., Cohen,Y., Mambo,E., Tallini,G., Udels‐
man,R., Ladenson,P.W., and Sidransky,D. (2004). Early occurrence of RASSF1A hypermethylation and its mutual exclusion with BRAF mutation in thyroid tumorigenesis. Cancer Res 64. Xiong,Z. and Laird,P.W. (1997). COBRA: a sensi‐
tive and quantitative DNA methylation assay. Nucleic Acids Res. 25, 2532‐2534. Yamashita,K., Park,H.L., Kim,M.S., Osada,M., Tokumaru,Y., Inoue,H., Mori,M., and Sidransky,D. (2006). PGP9.5 methylation in diffuse‐type gastric cancer. Cancer Res. 66, 3921‐3927. Yamashita,K., Upadhyay,S., Osada,M., Ho‐
que,M.O., Xiao,Y., Mori,M., Sato,F., Meltzer,S.J., and Sidransky,D. (2002). Pharmacologic un‐
masking of epigenetically silenced tumor sup‐
pressor genes in esophageal squamous cell carcinoma. Cancer Cell 2, 485‐495. Yan,P.S., Perry,M.R., Laux,D.E., Asare,A.L., Cald‐
well,C.W., and Huang,T.H. (2000). CpG island arrays: an application toward deciphering epigenetic signatures of breast cancer. Clin. Cancer Res. 6, 1432‐1438. 203 Yang,Z., Lin,H., and Li,Y. (2008). Exploiting the contextual cues for bio‐entity name recognition in biomedical literature. J. Biomed. Inform. 41, 580‐587. kawa,N. (1998). Defect in synaptic vesicle pre‐
cursor transport and neuronal cell death in KIF1A motor protein‐deficient mice. J Cell Biol 141. Yonekawa,Y., Harada,A., Okada,Y., Funakoshi,T., Kanai,Y., Takei,Y., Terada,S., Noda,T., and Hiro‐
204 References
Curriculum vitae

PERSONAL DATA
Address: Kopkapelstraat 61 W4, 9160 Lokeren, Belgium
Date of birth: December 19, 1982
Nationality: Belgian
Tel: +32/(0)9.348.33.48
Mobile phone: +32/(0)479.56.48.84
E‐mail: [email protected]

WORKING EXPERIENCE
Researcher | Ghent University
October 15, 2005 ‐ current
PhD student in the Lab. for computational genomics and bioinformatics, Department of Molecular Biotechnology, Ghent University. Project: cellular reprogramming. Working on DNA methylation in the context of cancer, and on how DNA methylation in human cells changes after virus infection and during carcinogenesis.

R&D Scientist | Procter & Gamble – Brussels Innovation Center
July 1 – September 23, 2005
Experimental design and data analysis of washing experiments to identify novel ‘whiteness enhancers’ that can be used in washing tabs. Processed the results, linked them with consumer preference models and gave formulation advice (amounts needed, positive and negative effects, influence on performance, user preference and cost).

EDUCATION
Ghent University, Faculty of BioScience Engineering
2000‐2005
Bio‐engineer in cell‐ and gene biotechnology, distinction

Thesis (2005)
Promotor: Prof. dr. ir. Wim Van Criekinge
Title: Characterisation of cancer‐specific methylated CpG islands
Short abstract:
- CpG islands are CG‐rich areas in the promoter regions of genes. The cytosine residues can be methylated, which transcriptionally inactivates the genes.
- Machine‐learning techniques and bioinformatics were used to predict which promoters are methylated specifically in cancer, and what properties they have. One such property is the presence of certain DNA patterns.
- The biological significance of these patterns was checked using different methods (position within CpG islands, location in the whole genome, stability under selection pressure, similarity with transcription factors).
- These patterns were found to be markers for cancer‐specific methylation. Further investigation of them can give more insight into the mechanism.

Internship (2004)
Location: CLO‐DVP (Agricultural Research Center ‐ Department of Plant Genetics and Breeding ‐ Ministry of the Flemish Community), July 19 – August 26, 2004
Title: Detection and quantification of GMO content in maize

Sint‐Lodewijkscollege, Lokeren
1994‐2000
Sciences‐Math (8h maths)

PUBLICATIONS
- Ongenaert M, Wisman B, Volders H, Koning A, van der Zee A, Van Criekinge W, Schuuring E. (2008). Discovery of DNA methylation markers in cervical cancer using relaxation ranking. BMC Medical Genomics, 1, 57.
- Van Criekinge W, Ongenaert M, van der Zee AGJ, Wisman GBA, Kridelka F. De agenda Gynaecologie – oncologie. October 2008, p. 12‐14.
- Vercruysse L, Smagghe G, van der Bent A, van Amerongen A, Ongenaert M, Van Camp J. (2008). Critical evaluation of the use of bioinformatics as a theoretical tool to find high‐potential sources of ACE inhibitory peptides. Peptides, in press.
- Hoque MO, Kim MS, Ostrow KL, Liu J, Wisman GB, Park HL, Poeta ML, Jeronimo C, Henrique R, Lendvai A, Schuuring E, Begum S, Rosenbaum E, Ongenaert M, Yamashita K, Califano J, Westra W, van der Zee AG, Van Criekinge W, Sidransky D. Genome‐wide promoter analysis uncovers portions of the cancer methylome. Cancer Research, 68(8), 2661‐70.
- Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, Van Criekinge W. (2007). PubMeth: a cancer methylation database combining text‐mining and expert annotation. Nucleic Acids Research, 36, D842‐D846.
- Van Damme EJ, Nakamura‐Tsurata S, Smith DF, Ongenaert M, Winter HC, Rouge P, Goldstein IJ, Mo H, Kominami J, Culerrier R, Barre A, Hirabayashi J, Peumans WJ. (2007). Phylogenetic and specificity studies of two‐domain GNA‐related lectins: generation of multispecificity through domain duplication and divergent evolution. Biochemical Journal, 404, 51‐61.

OTHER SCIENTIFIC ACTIVITIES
Congresses & meetings | With oral presentation
- Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, Van Criekinge W. PubMeth: a cancer methylation database combining text‐mining and expert annotation. Epiphamy, Ghent, Belgium, October 26, 2007
- Hoebeeck J, Ongenaert M, Michels E, De Preter K, Vermeulen J, Yigit N, De Paepe A, Van Criekinge W, Speleman F and Vandesompele J. Genome wide promoter methylation analysis in neuroblastoma with perspectives for integrated molecular profiling. MC‐GARD Conference, Amsterdam, The Netherlands, May 2‐5, 2007 (presentation by Hoebeeck)
- Hoebeeck J, Ongenaert M, Michels E, De Preter K, Vermeulen J, Yigit N, De Paepe A, Van Criekinge W, Speleman F and Vandesompele J. Genome wide promoter methylation analysis in neuroblastoma with perspectives for integrated molecular profiling. 7th BeSHG meeting, Charleroi (Marcinelle), Belgium, April 20, 2007 (presentation by Hoebeeck)
- Ongenaert M, Bekaert S and Van Criekinge W. The epigenomics of cancer: beyond genetics. Oncologic research lunch, Ghent University Hospital, Ghent, Belgium, April 20, 2007

Congresses & meetings | With poster presentation
- Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, Van Criekinge W. PubMeth: reviewed methylation database in cancer based on text‐mining. ISMB2008, Toronto, Canada, July 19‐23, 2008
- Hoebeeck J, Ongenaert M, Michels E, De Preter K, Vermeulen J, Yigit N, De Paepe A, Van Criekinge W, Speleman F and Vandesompele J. Genome wide promoter methylation analysis in neuroblastoma with perspectives for integrated molecular profiling. Epiphamy, Ghent, Belgium, October 26, 2007
- Hoebeeck J, Ongenaert M, Michels E, De Preter K, Vermeulen J, Yigit N, De Paepe A, Van Criekinge W, Speleman F and Vandesompele J. Genome wide promoter methylation analysis in neuroblastoma with perspectives for integrated molecular profiling. Nature‐CNIO Conference, Madrid, Spain, October 3‐6, 2007
- Ongenaert M, Van Criekinge W, Straub J, Van der Zee A, Schuuring E, Spolders H. Discovery of methylation markers using a relaxation ranking algorithm. 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) & 6th European Conference on Computational Biology (ECCB), Vienna, Austria, July 21‐25, 2007
- Ongenaert M, Van Criekinge W. CpG island characterization. 4th European Conference on Computational Biology (ECCB05), Madrid, Spain, September 28 – October 1, 2005

Congresses & meetings | Participant
- Micro‐array facility user meeting; Leuven, Belgium, October 13, 2006
- Workshop on Biostatistics; SAS Institute, Tervuren, Belgium, September 22, 2006
- Workshop on Biological Data Management; Ghent, Belgium, May 19, 2006
- BioScope‐IT kickoff meeting; Leuven, Belgium, October 25, 2005
- ‘Life, a nobel story’; Brussels, Belgium, April 29, 2004

MEMBERSHIPS, AWARDS, GRANTS & PEER REVIEWING
- Member of ISCB (International Society for Computational Biology) (since 2004)
- Travel fellowship from ECCB to attend ECCB05, Madrid, Spain
- Reviewer for Bioinformatics (Oxford University Press)