Bioinformatical Approaches to Ranking of anti-HIV
Dissertation

Bioinformatical Approaches to Ranking of anti-HIV Combination Therapies and Planning of Treatment Schedules

Author: André Altmann

Dissertation for the degree of Doctor of Natural Sciences (Dr. rer. nat.) of the Faculties of Natural Sciences and Technology, Universität des Saarlandes

Saarbrücken 2010

Date of the colloquium: 8 June 2010
Dean: Prof. Dr. Holger Hermanns
Chair of the examination committee: Prof. Dr. Matthias Hein
Reviewers: Prof. Dr. Thomas Lengauer, Ph.D.; Prof. Dr. Jörg Rahnenführer; Prof. Dr. Hans-Peter Lenhof
Assessor: Dr. Francisco Domingues

Abstract

The human immunodeficiency virus (HIV) pandemic is one of the most serious health challenges humanity is facing today. Combination therapy comprising multiple antiretroviral drugs resulted in a dramatic decline in HIV-related mortality in developed countries. However, the emergence of drug-resistant HIV variants during treatment remains a problem for permanent treatment success and seriously hampers the composition of new active regimens. In this thesis we use statistical learning to develop novel methods that rank combination therapies according to their chance of achieving treatment success. These methods depend on information regarding the treatment composition, the viral genotype, features of viral evolution, and the patient’s therapy history. Moreover, we investigate different definitions of response to antiretroviral therapy and their impact on the prediction performance of our methods. We address the problem of extending purely data-driven approaches to support novel drugs with little available data. In addition, we explore the prospect of prediction systems that are centered on the patient’s treatment history instead of the viral genotype. We present a framework for rapidly simulating resistance development during combination therapy that will eventually allow application of combination therapies in the best order.
Finally, we analyze surface proteins of HIV regarding their susceptibility to neutralizing antibodies with the aim of supporting HIV vaccine development.

Summary (Kurzfassung)

The human immunodeficiency virus (HIV) pandemic is one of the most serious health challenges worldwide. Combination therapies consisting of several drugs have led to a drastic decline in HIV-related mortality in developed countries. However, the emergence of drug-resistant variants during treatment poses a problem for sustained treatment success and complicates the composition of new active combinations. In this thesis we use statistical learning to develop new methods that rank combination therapies by their expected treatment success. To this end we use information on the drugs, the viral genome, viral evolution, and the patient’s therapy history. In addition, we investigate different definitions of treatment success and their effect on the quality of our models. We address the problem of extending data-driven models to new compounds, and further investigate the patient’s therapy history as a substitute for the viral genome in prediction. We present a framework for the rapid simulation of resistance development, which ultimately allows searching for the best possible order of combination therapies. Finally, we analyze the HIV surface protein with regard to its susceptibility to neutralizing antibodies, with the aim of supporting vaccine development.

Acknowledgements

This work was carried out in the Department for Computational Biology and Applied Algorithmics at the Max Planck Institute for Informatics in Saarbrücken. Foremost, I would like to thank my advisor Prof.
Thomas Lengauer for his encouragement, support, and advice throughout the years, and for creating an inspiring environment where I could follow my own ideas. Thank you for giving me the opportunity to participate in this great project. I am also grateful to Prof. Jörg Rahnenführer and Prof. Hans-Peter Lenhof, who kindly agreed to referee this thesis. This work would not have been possible without the great number of even greater collaborators. I am much obliged to Robert W. Shafer (Stanford University), W. Jeffrey Fessel (Kaiser-Permanente Medical Care Program), and Klaus Überla (Ruhr-Universität Bochum). I am thankful to Rolf Kaiser and all members of his group at the Institute of Virology (University of Cologne), specifically Eugen Schülter, Nadine Sichtig, Melanie Balduin, and Jens Verheyen. I am grateful to the members of the Arevir consortium, foremost Rolf Kaiser, Martin Däumer, and Hauke Walter for sharing their knowledge on HIV. Thanks to all members of EuResist (IST-2004-027173) coordinated by Maurizio Zazzi and Francesca Incardona for creating this unique experience: especially Mattia Prosperi and Michal Rosen-Zvi for being partners in crime in developing the prediction engine. Thanks to all former and present members of our group for making the department a great and fun place to work (and have breaks). Especially my former office mates, Christoph Bock, Alexander Thielen, and Kasia Bozek; Bastian Beggel, Jasmina Bogojeska, Alejandro Pironti, Hiroto Saigo, Oliver Sander, Laura Tolosi, Hendrik Weisser for all the joint projects we worked on; Alejandro Pironti, Alexander Thielen, and Francisco Domingues for proof-reading parts of this thesis; Christoph Bock, Francisco Domingues, Oliver Sander, Tobias Sing, and Alexander Thielen for the numerous discussions. 
Special thanks go to Niko Beerenwinkel, Tobias Sing, and Martin Däumer for their invaluable advice during the beginning of this work, and to Achim Büch and Ruth Schneppen-Christmann for their support and innumerable favors. Yassen Assenov, Rayna Dimitrova, Konstantin Halachev, Christoph Hartmann, Andreas Kämper, Lars Kunert, Jochen Maydt, Oliver Sander, Tobias Sing, and Laura Tolosi were the main criminals for making life in Saarbrücken a very fun time. Thank you!

Furthermore, I would like to thank the students that worked with me on different projects: Fabian Müller for his excellent work in both his Bachelor’s and Master’s Thesis (congratulations for the Günter-Hotz medal!), Juliane Perner for her great work on her Bachelor’s Thesis and her invaluable help for computing clinical cutoffs, and Peter Ebert for his superb job in his Bachelor’s Thesis. It was great working with all of you. I wish you all the best!

Thanks go also to Jorge Cham for his PhD comics that are a continuous source of fun because they are so true and straight to the point of every-day PhD-life [1, 2]. Finally, I would like to thank my family and friends for their love and support, especially my parents, Bärbel, and Günter. Above all though, I am grateful to Evangelia for sharing every day with me: life is so much easier knowing you are by my side.

[1] http://www.phdcomics.com/comics/archive.php?comicid=1159
[2] http://www.phdcomics.com/comics/archive.php?comicid=1164

Contents

1 Introduction
2 AIDS, HIV, and Antiretroviral Therapy
    2.1 The AIDS pandemic
        2.1.1 HIV Diversity
    2.2 The Human Immunodeficiency Virus Type 1
        2.2.1 HIV Replication Cycle
    2.3 Clinical progression of an HIV infection
    2.4 HIV Treatment and Drug Resistance
        2.4.1 Antiretroviral drugs
        2.4.2 The Era of Highly Active Antiretroviral Therapy
    2.5 HIV Therapy Management
        2.5.1 Rules-Based Interpretation Systems
        2.5.2 Predicting the Drug Resistance Phenotype from Genotype
    2.6 Further Advancements
3 Extensions to geno2pheno
    3.1 geno2pheno[integrase]
    3.2 Estimating Clinically Relevant Cutoffs for geno2pheno
    3.3 Improvements Using Semi-Supervised Learning
4 Predicting Response to Combination Therapy
    4.1 Features of Viral Evolution
        4.1.1 Activity Score
        4.1.2 Mutagenetic Trees
    4.2 geno2pheno-THEO
        4.2.1 Material and Methods
        4.2.2 Results
        4.2.3 Discussion
    4.3 Clinical Validation
        4.3.1 Material and Methods
        4.3.2 Results
        4.3.3 Discussion
5 EuResist: Uniting Data for Fighting HIV
    5.1 Project Background
    5.2 EuResist Prediction Engines
        5.2.1 Generative Discriminative Engine
        5.2.2 Mixed Effects Engine
        5.2.3 Evolutionary Engine
    5.3 Combining Classifiers
    5.4 Predictive Performance of the EuResist Prediction Engine
        5.4.1 Data
        5.4.2 Results
        5.4.3 Discussion
        5.4.4 Conclusion
    5.5 The EuResist Web Service
        5.5.1 Implementation of the EV Engine
        5.5.2 EuResist web tool
    5.6 Towards Prediction of Sustained Response
        5.6.1 Material and Methods
        5.6.2 Results
        5.6.3 Discussion
    5.7 Effect of Modified TCE Definitions
        5.7.1 Material and Methods
        5.7.2 Results
        5.7.3 Discussion
    5.8 Integrating Novel Drugs
        5.8.1 Materials and Methods
        5.8.2 Results
        5.8.3 Conclusions
    5.9 Therapy History: Replacement for the Genotype?
        5.9.1 Material and Methods
        5.9.2 Results
        5.9.3 Discussion
6 Planning Sequences of HIV Therapies
    6.1 Weighted Finite State Transducers
        6.1.1 Composition
        6.1.2 Single Source Shortest Path
        6.1.3 N-Shortest-Strings
    6.2 Transducers in Therapy Planning
    6.3 Mutation Models
        6.3.1 Mutation Probabilities Derived From geno2pheno
        6.3.2 Mutation Probabilities Derived From Treatment Data
    6.4 Validation
        6.4.1 Prediction of Response to the Next But One Treatment
        6.4.2 Comparing Predicted Future Drug Options to Data From Real Cases
    6.5 Remarks
    6.6 Discussion
7 A Solution to the HIV Pandemic: A Vaccine
    7.1 The Immune System and Vaccines
    7.2 Vaccines and HIV
    7.3 Predicting Neutralization from Genotype
        7.3.1 Material and Methods
        7.3.2 Results
        7.3.3 Discussion
    7.4 Outlook
8 Conclusion and Outlook
Bibliography
Appendix: List of Publications

1 Introduction

Today’s methods of modern molecular biology generate massive amounts of data. Proteomics focuses on the composition of proteins (our molecular machines), (epi-)genomics studies the influence of changes in a person’s (epi-)genetic information on e.g. susceptibility to diseases, and transcriptomics analyzes which genes are active under certain conditions. Research conducted with these methods in the context of diseases often aims at providing personalized therapy. In essence, personalized therapy refers to the delivery of the right drugs in the right dosage to the patient. Decisions regarding the selection of drugs are in general based on available biomarkers, e.g. the patient’s genome; personalization ensures a (cost-)effective therapy. The ways leading to this “personalization” are manifold, for example:

1. Understanding the molecular basis of the disorder. It is crucial to understand how genetic changes lead to the disease, because deeper understanding will allow a classification of defects with similar phenotype into subgroups depending on their molecular origin. This, in turn, will lead to the development of better drugs targeted at the causes of the disease, leading to shorter treatment times and reduced costs.

2. Understanding of pathways involved in drug metabolization. Here, genetic changes influencing the effective concentration of the drug are of interest. For instance, mutations resulting in overdosing of the drug increase the risk of side-effects, while mutations that lower the drug concentration may result in treatment failure.

3. Pathogen tailored treatment.
Obviously, knowing the disease-causing pathogen allows the selection of better matched drugs for its eradication (e.g. targeted vs. broad-spectrum antibiotic). Moreover, resistance of pathogens against antimicrobial agents often requires screening for active compounds prior to treatment start.

Typically, “omics” strategies are employed for identifying disease-related biomarkers. In comparative studies, two groups of individuals (patients vs. healthy controls or responders vs. non-responders) are studied with proteomics, (epi-)genomics, and transcriptomics. In these settings, data are generated at a large scale and their analysis is far beyond the feasibility of manual inspection. Thus, bioinformatics methods are needed. These methods often rely on methodologies from statistics and statistical/machine learning to separate healthy from sick individuals based on the given data. Moreover, bioinformatics methods are needed to design tools that analyze routinely generated omics data, and thereby bring personalized therapy into clinical routine.

In this thesis we focus on personalized treatment of infections with the human immunodeficiency virus (HIV). HIV therapy is one of the most prominent examples of personalized therapy today and belongs to the third category of personalized medicine: finding antiretroviral agents that are effective against the predominant viral variant in the host.

Personalized HIV therapy

Briefly, HIV mainly infects cells of the human immune system and, when untreated, HIV infections typically lead to the acquired immunodeficiency syndrome (AIDS) followed by death due to opportunistic infections. By now, AIDS accounts for an estimated 2 million deaths annually and thus poses one of the major worldwide health challenges. Patients in developed countries have access to about two dozen antiretroviral drugs.
These compounds interrupt the replication cycle of the virus and thereby allow for a partial recovery of the patient’s immune system. Complete eradication of all viral particles from the patient is not possible. Moreover, treatment success as defined by undetectable amounts of virus in the blood can only be maintained for a limited time. The cause of the limited treatment success is the dynamic viral population within the host: high turnover rates and short replication cycles paired with an error-prone replication process lead to the evolutionary selection of mutations that reduce the susceptibility of the virus to the applied antiretroviral drug. These resistance mutations eventually render the antiretroviral drug useless for treating the patient.

To date, testing for drug resistance prior to prescription of antiretroviral compounds is highly recommended. Resistance can be assessed either by slow and expensive laboratory based tests in cell cultures or via fast and cheap standardized methods that determine the genetic sequence of the viral drug targets. For the output of the latter method, a multitude of tools provides classification for single drugs in terms of drug resistance.

In order to delay the emergence of resistance mutations, modern anti-HIV regimens combine at least three drugs with a minimum of two different mechanisms of action. The introduction of this highly active antiretroviral therapy (HAART) resulted in a dramatic decline of HIV-related mortality in developed countries. Resistance testing provides the pivotal tool for selecting active compounds against the viral population in the patient. Methods for resistance testing, however, still provide only recommendations for single drugs and thereby ignore the interplay between multiple drugs for attacking HIV.

Outline

We begin in Chapter 2 with a brief overview of the discovery of HIV, followed by a summary of the current pandemic.
Then, we continue with a description of the HIV particle, its replication cycle, and clinical aspects of the HIV infection. This is followed by a review of state-of-the-art HIV treatment including a detailed description of the viral drug targets. We continue with a brief summary on HIV resistance testing and available genotype interpretation methods. We conclude the chapter with an outlook on potential advancements towards richer decision support systems.

In Chapter 3 we present ways for improving and updating geno2pheno, a web service that predicts drug resistance from the viral genome.

In Chapter 4 we start with a motivation for interpretation systems that consider combination therapies and summarize previously developed features that encode viral evolution. Then we present geno2pheno-THEO, a freely available interpretation tool that predicts clinical response to combination therapy using information on the treatment, the virus, and evolutionary features. We continue with a retrospective clinical validation of geno2pheno-THEO, where we compare the prediction performance to three widely-used expert-based interpretation systems. Moreover, we study the robustness of geno2pheno-THEO and explore the benefit of regimen-specific models.

Chapter 5 begins with an introduction to the EuResist project, which is followed by a presentation of the three decision support systems for inferring short-term in vivo response to combination therapy developed within the project. We continue with a description of the EuResist prediction engine including combination approaches, performance assessment (including human experts as reference), and the web service. We proceed with an investigation on alternative definitions for response to combination treatment and study the performance of resulting prediction engines. Then we approach the update-problem, which constitutes the Achilles’ heel of data-driven decision support systems for HIV combination therapy.
We conclude the chapter with the presentation of a prediction system that uses the patient’s treatment history instead of the viral genotype to infer response to treatment. In Chapter 6 we motivate the requirement for planning sequences of HIV therapies. We present transducers, a family of finite state machines, that constitute the computational framework for our novel method to rapidly simulate viral evolution during combination therapy. Furthermore, we introduce five mutation models that are based on in vitro and in vivo resistance data. The framework is studied in two validation settings. In the first setting we challenge the framework to separate failing from succeeding treatments on the basis of mutations caused by the preceding regimen. Within the second setting we assess whether overall resistance development is correctly reflected by resistance development within drug classes. The chapter is concluded by a case study comparing the impact of typical first-line regimens on future drug options. Chapter 7 makes the transition from personalized anti-HIV therapy to the probably only true solution for the HIV pandemic: an HIV vaccine. In the beginning of the chapter we summarize the basic concept of the human immune system comprising innate and adaptive immune response. This summary is followed by a short description of how vaccines establish immunity. Further, we list the challenges posed by HIV to vaccine design and review the most prominent vaccine candidates. The focus of the chapter is the prediction of neutralization by antibodies based on the genetic sequence of the viral genome. Analysis of the statistical learning models reveals positions that are essential for antibody binding. We conclude the chapter with potential improvements for our approach and further applications to the development of HIV vaccines. Chapter 8 concludes the thesis and provides an outlook on further advancements on personalized HIV therapy. 
2 AIDS, HIV, and Antiretroviral Therapy

“In the field of observation, chance favours the prepared mind.” – Louis Pasteur

This chapter aims at providing the necessary background for this thesis on the human immunodeficiency virus (HIV), the acquired immunodeficiency syndrome (AIDS), and state-of-the-art antiretroviral treatment with the focus on available antiretroviral drugs and routine treatment guidance. However, many important aspects that are not of direct relevance for this thesis had to be omitted.

2.1 The AIDS pandemic

In 1981 a mysterious disease was discovered among the gay community in the United States of America. The patients were hospitalized due to diseases resulting from an impaired immune system. Owing to the media, the illness quickly became known as GRID (gay-related immunodeficiency). The cause of the disease was unknown and the race for its discovery commenced. The first effective tool to screen for a possible infection with the unknown agent was a positive test for an infection with the Hepatitis B virus (HBV). Roughly one year after the first cases of this immunodeficiency in the USA were reported, the virus was identified and isolated by researchers at the Institut Pasteur in Paris (France). For their discovery of the infectious agent that is now known as the Human Immunodeficiency Virus (HIV), Luc Montagnier and Françoise Barré-Sinoussi were awarded the Nobel Prize in Physiology or Medicine in 2008 (Barré-Sinoussi et al., 1983). HIV infects cells of the human immune system and eventually leads to their death, thus explaining the typical symptoms of immunodeficiency in HIV infected patients. Robert C. Gallo, who also claimed to be the first discoverer of the infectious agent causing AIDS, was not awarded the Nobel Prize (Gallo et al., 1983).
Indeed, it turned out that the samples investigated by Gallo’s group were co-infected with a Human T cell Leukemia Virus (HTLV), and thus “the paper by the Montagnier/Chermann group is unequivocally the first reported true isolation of HIV from a patient with lymphadenopathy” (Gallo, 2002). Gallo, however, demonstrated that the identified agent is the cause of AIDS (Popovic et al., 1984) and developed the first diagnostic blood test that prevented new infections and probably saved thousands of lives. The omission of Gallo from the prestigious Nobel Prize, despite his numerous major contributions in the early years, resulted in a massive response by the international scientific community (Abbadessa et al., 2009). The history of the discovery of HIV has been briefly summarized by the main contributors Montagnier (2002) and Gallo (2002) in featured articles.

Figure 2.1: Global prevalence of HIV in December 2007. The figure was adapted from the WHO 2008 Report on the global AIDS epidemic available at http://www.who.int/hiv/data/en/. By the end of 2008 a total of 33.4 million [31.1-35.8] people were estimated to live with HIV.

The period following the identification of HIV (which in the early works was termed LAV or HTLV-III due to its believed relation to the group of HTL viruses) as the cause of AIDS was part of the scientific success story of modern medicine. Just to name a few milestones: the genome was sequenced and inter- as well as intra-patient variation in viral populations was detected (Shaw et al., 1984; Ratner et al., 1985; Wong-Staal et al., 1985; Hahn et al., 1986), the virus was found in the brain of AIDS patients (Shaw et al., 1985), the modes of HIV transmission were elucidated, and all of HIV’s genes and most proteins were defined. But most importantly, the blood supplies in most developed nations were rendered safe by screening for HIV.
The capability of screening for HIV infections, however, provided a first hint of how severe the HIV pandemic would become. For example, sera from hemophiliacs in Japan tested HIV negative in early 1984; by the end of 1984, 20% of those patients already tested HIV positive after they had been treated with HIV-contaminated blood products from the United States (Gallo, 2002). The spread of AIDS was rapid: soon after the first cases were reported in the USA in 1981, fourteen countries reported cases of AIDS in 1982, and 33 did so in 1983. By the end of 2008, 33.4 million people worldwide were estimated to be living with HIV. Figure 2.1 displays the prevalence of HIV in different regions of the world. According to the WHO 2009 report on the global AIDS epidemic, 2.7 million [2.4-3.0] people were newly infected with HIV and about 2.0 million [1.7-2.4] people died of AIDS in 2008. HIV is believed to be responsible for approximately 25 million deaths, thus rendering it the most serious biological killer to date.

The prevalence of HIV infected individuals varies heavily between geographic regions, e.g. ranging from 0.1 to 0.5% in Central Europe to 15.0 to 28.0% in Southern Africa. These regional differences are likely the result of many factors, including: different availability of antiretroviral drugs, information campaigns (e.g. the government of South Africa pursued an “AIDS denialism” policy that caused the deaths of about 330,000 people (Ano, 2006, 2008)), the stigmatization of the HIV infection, and frequency of rapes (e.g. South Africa has an extremely high rate of rapes (Carries et al., 2007)). The high prevalence in sub-Saharan Africa probably also results from the fact that this region is the origin of HIV (Zhu et al., 1998). It is estimated that HIV entered the human population in 1931 [95% CI 1915-1941] through multiple infections from simian immunodeficiency virus (SIV)-infected nonhuman primates (Korber et al., 2000).
Thus, approximately 50 years passed between the entry of the virus into the human population and the enrichment of AIDS cases in 1981. During these years, the virus could spread unrecognized, mainly because the late symptoms of AIDS coincide with symptoms of e.g. malnutrition and tuberculosis, which happen to be frequent problems in the infected population. As a consequence, the symptoms were attributed to rather well-known causes instead of a new infectious agent.

2.1.1 HIV Diversity

As briefly mentioned, HIV was introduced into the human population by multiple cross-species transmissions from SIV infected nonhuman primates. To date, we distinguish two types of HIV: HIV type-1 (HIV-1), which is responsible for the current pandemic, evolved from an SIV variant present in chimpanzees, whereas HIV-2 is the result of a zoonotic infection from SIV in sooty mangabeys (Heeney et al., 2006). HIV-1 comprises four genetically distinct groups, namely M (main or major), N (non-M, non-O), O (outlier), and P, which resulted from at least four different zoonotic infections. Group P was only recently identified by Plantier et al. (2009) and originates from an SIV variant in gorillas. This work focuses on HIV-1 group M as it accounts for most infections worldwide and is therefore of premier interest.

HIV-1 group M is further divided into subtypes. This division is based on clusters typically appearing in phylogenetic analyses of genetic sequences of HIV-1 group M (Robertson et al., 2000). These subtypes are named A to D, F to H, J, and K and have a different prevalence in different geographic regions of the world (Table 2.1), suggesting that their population structure is the result of founder effects. However, it is possible that some phylogenetic clusters only appear because of incomplete sampling of the global viral population, e.g. newly sequenced HIV-1 strains from Central Africa fall in-between established subtype clusters (Rambaut et al., 2004).
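The subtype clusters mentioned above come out of phylogenetic analyses, which start from some measure of how different two aligned sequences are. The following is a minimal sketch of the simplest such measure, the proportion of differing sites (often called the p-distance). It is an illustration only: the function names, the gap-skipping rule, and the toy sequences are assumptions made here, not data or methods from this thesis, and real phylogenetic analyses typically use model-based distances instead.

```python
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns in which either sequence has a gap are skipped
    (an assumption of this sketch, not a universal convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for all unordered pairs of named sequences."""
    names = sorted(seqs)
    return {
        (m, n): p_distance(seqs[m], seqs[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


# Invented toy alignment (hypothetical, not real HIV sequences).
toy = {
    "strain_1": "ATGGGTGCGA",
    "strain_2": "ATGGGTGCGA",
    "strain_3": "ATGAGTGC-A",
}
distances = distance_matrix(toy)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A clustering or tree-building method would take such a pairwise matrix as input; sequences that fall between established clusters, as described for the Central African strains, would show intermediate distances to several clusters.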
In addition to the pure subtypes, recombinant forms exist as well. These are either well-described, established circulating recombinant forms (CRFs) or unique recombinant forms resulting from super-infection of a patient with two or more different subtypes. Questions related to HIV-1 subtypes and their influence on disease progression (Kanki et al., 1999) and on the efficacy of antiretroviral treatment are still of major interest (Kantor et al., 2005). Subtype B is most prevalent in the developed, industrialized regions of the world, and therefore a representative of this clade was used for the development of antiretroviral drugs. Thus, the genetic changes that distinguish B variants from non-B variants are believed to hamper the effectiveness of antiretroviral therapy (Descamps et al., 1998). Recently, it was shown that HIV infection-related complications may vary between different subtypes, e.g. the risk of developing HIV-induced dementia is increased in patients that are infected with subtype D compared to patients infected with subtype A (Sacktor et al., 2009).

Figure 2.2: Phylogenetic relation of lentiviruses in men and non-human primates. Reprinted from The Lancet (Simon et al., 2006) with permission from Elsevier. The recently discovered HIV-1 group P (Plantier et al., 2009) is missing.

HIV-1 subtype B is the best-studied variant of HIV-1 owing to the available resources and infrastructure for research in its region of prevalence. The other HIV-1 subtypes clearly deserve more attention than they have received so far. However, data collection efforts in regions of their prevalence are less advanced than in Europe and North America. The data studied in this work also originate mainly from collection efforts in North America and Central and Western Europe; hence, there exists a bias towards subtype B, simply due to practical limitations.
The diversity of HIV also poses one of the major challenges for HIV vaccine design (Walker and Burton, 2008). The sequence diversity within a single subtype, for example, can reach up to 20%. Clearly, sequence diversity is an issue especially for antibody-based vaccines, which require conserved epitopes on surface proteins. For instance, in Africa, with its multitude of different circulating viruses, the sequence of the envelope protein can differ by up to 38% (Walker and Burton, 2008).

Table 2.1: Worldwide distribution of HIV-1 infections (as of December 2008), typical modes of transmission, and prevalent HIV-1 subtypes. HSex=heterosexual, MSM=men who have sex with men, IDU=injection drug users. Numbers of infections are given in millions and originate from the WHO 2009 Report on the global AIDS epidemic. CRF=circulating recombinant form. Risk group and subtype distribution is based on Simon et al. (2006).

Geographic region               Infections (millions)   Major risk groups   Subtypes
Sub-Saharan Africa              22.400                  HSex                A, C, D, F, G, H, J, K, CRF
South & South-East Asia          3.800                  HSex, IDU           B, AE
Latin America                    2.000                  HSex, MSM, IDU      B, BF
Eastern Europe & central Asia    1.500                  IDU                 A, B, AB
North America                    1.400                  HSex, MSM, IDU      B
East Asia                        0.850                  HSex, MSM, IDU      B, C, BC
Western & Central Europe         0.850                  MSM, IDU            B
North Africa & Middle East       0.310                  HSex, IDU           B, C
Caribbean                        0.240                  HSex, MSM           B
Oceania                          0.059                  MSM                 B

2.2 The Human Immunodeficiency Virus Type 1

The human immunodeficiency virus (HIV) officially received its name in 1986. Extensive knowledge on HIV has been accumulated and a substantial amount of literature is available. In the following we provide a brief summary; for further details please refer to Fields et al. (2007) or similar reference literature. HIV belongs to the family of retroviruses and, more precisely, is a member of the genus of lentiviruses, a name that indicates a long incubation period.
Electron microscopy of particles in infected cell cultures shows spherical entities with a diameter of 100 to 120 nm (Figure 2.3 a). A conceptual representation of the virus architecture is depicted in Figure 2.3 b). The characteristic of retroviruses is that they store their genetic information in ribonucleic acid (RNA) and thus require a mechanism to reverse-transcribe RNA into deoxyribonucleic acid (DNA), which is the carrier of genetic information in their hosts. Each viral particle contains two single-stranded RNAs of approximately 10 kb length that are tightly bound to viral nucleocapsid proteins and two viral enzymes (reverse transcriptase and integrase) that are vital for a successful infection of the host cell. This complex is protected by a cone-shaped capsid comprising approximately 2,000 copies of the capsid protein. The viral cone, also known as the viral core, is clearly visible in the electron micrograph (Figure 2.3 a). The viral cone is surrounded by a spherical matrix that is in turn covered by a lipid membrane. Attached to the matrix is the viral spike, which is responsible for target cell recognition and cell entry.

Figure 2.3: (a) Electron micrograph of a budding HIV particle and mature HIV particles with visible viral core, from Fields et al. (2007) and reprinted with kind permission from Lippincott Williams & Wilkins. (b) Schematic representation of a single HIV particle, from Wikipedia (http://commons.wikimedia.org/wiki/File:HIV_Virion-en.png).

Figure 2.4 shows the organization of the HIV genome. HIV encodes a total of 15 viral proteins in overlapping reading frames. The majority of proteins are part of three large precursor proteins that have to be cleaved into functional subunits. The polymerase gene (Pol) encodes three proteins: the protease, the reverse transcriptase, and the integrase.
The Gag gene is the precursor of four viral structural proteins: p24 (the viral capsid), p6 and p7 (the nucleocapsid proteins), and p17 (the viral matrix). The third precursor is encoded by the Env gene and comprises the two subunits of the viral spike, the glycoprotein gp120 and the transmembrane glycoprotein gp41. As indicated in Figure 2.3 b), the viral spike consists of six proteins: one trimer of gp41 and one trimer of gp120, which is heavily shielded by glycans. The smaller genes encode transactivators (Tat, Rev, Vpr), which enhance gene expression, and other regulatory proteins (Vif, Nef, Vpu), which help the virus to reproduce more efficiently and to counter defense mechanisms of the host cell. For instance, Vif disrupts the antiviral activity of the human enzyme APOBEC3G (Sheehy et al., 2002). APOBEC3G causes G-to-A hypermutations in the genome of HIV and other retroviruses, and thereby ultimately destroys the coding and replicative capacity of the invading pathogen.

Figure 2.4: Diagram of the genome organization of HIV. Reprinted from Freed (2004) with permission from Elsevier.

2.2.1 HIV Replication Cycle

Figure 2.5 depicts the replication cycle of HIV, from cell entry to maturation of new infectious viral particles. The turnaround time from entry to the production of new infectious virions is estimated at 1.5 days. The life cycle of HIV is complex and our current understanding of each step is the result of extensive research. Nevertheless, many details are not yet fully understood. Thus, we focus the description of the replication cycle on the targets of modern antiretroviral therapy. Further fascinating aspects of the interplay between viral regulatory proteins and the host, as well as the interplay of the host's immune system with the viral infection, are omitted. For a full description of the molecular life cycle see e.g. Fields et al. (2007).
HIV enters the host cell by first binding one of its gp120 surface proteins to the CD4 receptor of the target cell. The importance of the CD4 receptor in HIV cell entry was identified shortly after the isolation of HIV (Dalgleish et al., 1984). The CD4 receptor is expressed on CD4+ T cells, macrophages, microglia, and dendritic cells, which are therefore the main targets of HIV. After the anchoring step, the gp120 subunit of the viral spike undergoes a conformational change that exposes an epitope which, in turn, allows binding to a chemokine receptor (also called a coreceptor). The most important coreceptors used by HIV in vivo are the chemokine receptors CCR5 and CXCR4 (Berger et al., 1999). After coreceptor binding, another conformational change occurs in the gp41 subunit of the envelope protein. This change brings the viral and host cell membranes into close proximity, which results in membrane fusion (Esté and Telenti, 2007). Recent results support the hypothesis that HIV primarily enters the target cell by endocytosis followed by fusion in the endosome and not by fusion directly at the plasma membrane (see Uchil and Mothes (2009) for a review). In either case, successful fusion is followed by release of the viral core into the cytoplasm of the target cell. Right after the viral core is uncoated, the viral enzyme reverse transcriptase (RT) transcribes the viral RNA into a double-stranded DNA (dsDNA). Together with viral and host proteins the dsDNA forms the preintegration complex (PIC), which is guided to the nuclear pore. Once the PIC has entered the nucleus of the host cell, the viral enzyme integrase, which is part of the PIC, catalyzes the integration of the viral DNA into the host chromosome. The integration of the viral DNA into the host chromosome marks a central point in the HIV infection, as from now on the cell is irreversibly infected. The provirus, i.e.
the integrated viral DNA, exploits the molecular machinery of the host cell for the production of new infectious viral particles. To be precise, the viral DNA serves as a template for the DNA-dependent RNA polymerase II (pol II) that produces messenger RNA (mRNA). This mRNA is (partially) spliced and exported from the nucleus to the cytoplasm, where it is translated into the different viral (poly-)proteins. In addition, full RNA transcripts of the complete integrated viral genome are generated and can be packaged into new virions. The polyproteins Env and Gag/Gag-Pol are transported via different pathways to the cell membrane, where they participate in forming new viral particles. The Env precursor protein, gp160, is glycosylated in the endoplasmic reticulum of the host cell, where the monomers also undergo oligomerization. The predominant form is a trimer. The gp160 complexes are further transported to the cell's Golgi complex, where they are cleaved into their subunits gp120 and gp41. This cleavage is mediated by cellular enzymes and not by the viral protease. The heavy glycosylation of gp120 allows HIV to shield its surface protein with a sugar coat against the immune response of the host. The Pol polyprotein is only expressed as part of the larger Gag-Pol protein, which in turn is the result of a -1 translational frameshift (Jacks et al., 1988). The ratio of Gag to Gag-Pol, which is approximately 20:1 (20 copies of the Gag polyprotein for each Gag-Pol polyprotein), is crucial for the successful replication of HIV (Shehu-Xhilaga et al., 2001). The HIV protease (PR) is part of Gag-Pol and only active as a dimer.

Figure 2.5: HIV replication cycle. The basic steps of the HIV replication cycle: viral entry, reverse transcription, integration, and formation of infectious particles. See main text for further details. Figure reprinted with kind permission of Saleta Sierra, Department of Virology, University of Cologne.
In fact, enzyme activation is initiated when the PR domains of two Gag-Pol precursors dimerize, and the complex begins with intramolecular activity followed by intermolecular activity (Pettit et al., 2004). The shorter Gag protein is rapidly targeted to the inner surface of the cell membrane after its synthesis. At the membrane the precursor protein is then cleaved by the viral PR, during or after budding from the host cell, into its four subunits, which then form the viral matrix, capsid, and nucleocapsid, respectively. Only fully matured viruses are able to successfully infect new host cells.

2.3 Clinical progression of an HIV infection

The main infection route of HIV is sexual transmission across a mucosal surface. The actual risk of transmission varies and depends greatly on the sexual practices involved. Male-to-male transmissions are one order of magnitude more likely than male-to-female and female-to-male transmissions. Further prominent routes of infection are mother-to-child transmission (via birth and/or breast feeding) and needle sharing among injection drug users. In the beginning of the HIV pandemic, contaminated blood transfusions were also responsible for new infections.

The two main markers used for monitoring HIV infections are the viral load (copies of HIV RNA per milliliter (ml) of blood plasma) and the CD4+ T cell count per microliter (µl) of blood. During the acute phase following primary infection, the viral load increases rapidly and reaches a peak of 10^6 to 10^7 copies per ml. HIV directly infects and causes the death of cells that are critical for an effective immune response, and thus the CD4+ T cell count decreases rapidly during this phase from over 1000 cells per µl (as observed in healthy individuals) to around 500 cells per µl.
Symptoms of acute HIV infection (for example fever, headache, weight loss) appear about two weeks after exposure to the pathogen. They are rather nonspecific and therefore, in general, not distinguishable from infections with other viruses like influenza. Consequently, those symptoms are not used to diagnose an HIV infection. Moreover, the nonspecific symptoms result in an unrecognized infection, which in turn leads to transmission of the virus to further individuals.

It is now known that, regardless of the route of transmission, HIV rapidly establishes a persistent infection of the gut-associated lymphoid tissue, which is also the principal site of its replication. Arthos et al. (2008) showed that the HIV surface protein gp120 binds via one of its loops (V2) to the receptor integrin α4β7, which in turn mediates the migration of leukocytes to the gut (von Andrian and Mackay, 2000). Furthermore, the gp120-α4β7 complex facilitates the activation of lymphocyte function-associated antigen 1 (LFA-1) on CD4+ T cells. LFA-1 is a central element for establishing virological synapses (Bromley et al., 2001) and consequently paves the way for HIV cell-to-cell transmission. Right after the infection is established in the gut, the gut-associated lymphoid tissue undergoes a substantial depletion of CD4+ T cells. It is estimated that half of the CD4+ memory cells are killed during the initial attack (Mattapallil et al., 2005). Picker (2006) proposed that this depletion represents irreversible damage to the immune system that ultimately results in AIDS.

Another characteristic of the acute phase is the establishment of a latent viral reservoir. More precisely, HIV infects resting memory CD4+ cells that, despite the integrated provirus, remain replication-competent. The provirus remains dormant in this reservoir, i.e. no viral proteins are actively transcribed, and thus the infected cells are not specifically targeted by the immune system.
In this state, the provirus can survive for decades until eventually HIV replication is initiated and new viral particles are produced. Of note, the latency of the virus is regulated by epigenetic mechanisms (Kauder et al., 2009). Epigenetics refers to the change of phenotype or gene expression based on mechanisms other than changes in the genetic code. For instance, attachment of methyl groups to cytosines in the DNA stably alters gene expression profiles.

Towards the end of the acute phase, the viral load drops to a level of 10^3 to 10^4 copies per ml and the immune system partially recovers from the primary attack, as indicated by an increase in CD4+ T cells. The acute phase is followed by a phase of clinical latency in which the viral load slowly increases and the CD4 count continuously declines. It is thought that HIV kills CD4+ T cells directly, mainly by inducing apoptosis or necrosis in infected cells. Nevertheless, it is believed that the continuous decrease of CD4+ T cells is a result of the combination of persistent immune hyper-activation (Hazenberg et al., 2003) followed by activation-induced cell death (Green et al., 2003) of T cells and the massive depletion of mucosal CD4+ cells during the acute phase (Grossman et al., 2006). At the point where the patient's immune system is too weak, opportunistic diseases and malignancies occur. Most HIV/AIDS-related complications are the result of bacterial or viral challenges to the already depleted immune system. The diseases range from cancers that occur rarely among uninfected people, like Kaposi's sarcoma as a result of an infection with human herpesvirus 8, to pneumonia induced by Pneumocystis jirovecii. The progression of an untreated HIV infection is depicted in Figure 2.6.

Figure 2.6: Typical progression of an untreated HIV infection based on Pantaleo et al. (1993).
AIDS follows the phase of clinical latency and is defined by either reaching a CD4 count of less than 200 cells per µl or the manifestation of specified opportunistic infections. If untreated, AIDS is succeeded by death of the patient from these infections. The time of progression from primary infection to AIDS varies among patients, ranging from only six months (Markowitz et al., 2005) to more than 25 years. One predictor of the rate of progression of the disease is the level of viral load established after one year without treatment (Lyles et al., 2000). Two groups of patients are able to control the replication rate of the virus such that the viral load does not exceed 5000 copies per ml (termed long-term non-progressors) or even 50 copies per ml (termed elite or natural controllers). These patients are recruited for genome-wide association studies that aim at uncovering the factors behind their superior control of the infection (Fellay et al., 2007).

2.4 HIV Treatment and Drug Resistance

Since the discovery of HIV a large array of antiretroviral drugs has been developed, targeting several stages of the viral life cycle. The first drug, zidovudine (abbreviated ZDV or AZT), was approved by the U.S. Food and Drug Administration (FDA) as early as 1987, only four years after the isolation of the virus. ZDV is a nucleoside reverse transcriptase inhibitor (NRTI) that targets the viral reverse transcriptase (RT) and thereby inhibits the process of generating DNA from the viral RNA. ZDV was first synthesized in 1964. The introduction of the first antiretroviral drug was a milestone in HIV therapy; this triumph, however, was short-lived. Larder et al. (1989) and Larder and Kemp (1989) reported the emergence of resistant variants of HIV after prolonged treatment with ZDV.
A major cause for the emergence of resistance is the viral RT itself, since the mechanism of reverse transcription is error-prone and the enzyme lacks a proof-reading mechanism. Gao et al. (2004) estimated the mutation rate of HIV at 5.4 × 10^-5 mutations per nucleotide per round of replication (Mansky and Temin (1995) initially reported 3.4 × 10^-5 mutations per nucleotide per round; their result, however, was based on shorter fragments of viral RNA). Thus, the RT eventually generates a viral variant that is less susceptible to the drug and is therefore selected under treatment. By now, resistance mutations in all viral target proteins against all available compounds have been reported (Johnson et al., 2008).

2.4.1 Antiretroviral drugs

Since the introduction of zidovudine a multitude of anti-HIV drugs has been approved by the FDA and the EMEA (European Medicines Agency) for treating HIV infections. Table 2.2 lists all currently FDA-approved drugs. The targets of those drugs and their modes of action are explained in the following paragraphs. We proceed in the order of the steps of the viral replication cycle that the drugs block. Despite the large number of existing anti-HIV drugs, multiple drugs in established and novel drug classes are under investigation. Table 2.3 lists an excerpt of currently investigated drugs.

Table 2.2: Antiretroviral drugs approved by the FDA. Table is based on information from http://aidsinfo.nih.gov.

Generic name      Abbreviation   Trade name                       FDA approval
Nucleoside and nucleotide reverse transcriptase inhibitors (NRTIs)
zidovudine        ZDV, AZT       Retrovir                         1987
didanosine        ddI            Videx                            1991
zalcitabine       ddC            Hivid                            1992-2006
stavudine         d4T            Zerit                            1994
lamivudine        3TC            Epivir                           1995
abacavir          ABC            Ziagen                           1998
tenofovir         TDF            Viread                           2001
emtricitabine     FTC            Emtriva                          2003
Non-nucleos(t)ide reverse transcriptase inhibitors (NNRTIs)
nevirapine        NVP            Viramune                         1996
delavirdine       DLV            Rescriptor                       1997
efavirenz         EFV            Sustiva                          1998
etravirine        ETR, ETV       Intelence                        2008
Protease inhibitors (PIs)
saquinavir        SQV            Fortovase, Invirase (SQV + RTV)  1995
ritonavir         RTV            Norvir                           1996
indinavir         IDV            Crixivan                         1996
nelfinavir        NFV            Viracept                         1997
fos-/amprenavir   FPV/APV        Lexiva/Agenerase                 2003/1999
lopinavir         LPV            Kaletra (LPV+RTV)                2000
atazanavir        ATV            Reyataz                          2003
tipranavir        TPV            Aptivus                          2005
darunavir         DRV            Prezista                         2006
Fusion inhibitors (FIs)
enfuvirtide       ENF, T-20      Fuzeon                           2003
Entry inhibitors (EIs)
maraviroc         MVC            Selzentry                        2007
Integrase inhibitors (InIs)
raltegravir       RAL            Isentress                        2007

Table 2.3: Some investigational antiretroviral drugs. Table is based on information from http://aidsinfo.nih.gov and http://www.aidsmeds.com.

Generic name    Abbreviation   Phase   Comment
NRTIs
apricitabine    ATC            III
elvucitabine    ACH-126443     II
racivir         RCV            II
NNRTIs
rilpivirine     TMC278         III
lersivirine     UK-453061      II
MIs
bevirimat       BVM            II
vivecon         MPC-9055       II
EIs
AMD11070        AMD070         II      CXCR4 antagonist, trial was halted
vicriviroc      VCV            III     CCR5 antagonist
ibalizumab      TNX-355        II      monoclonal antibody targeting CD4
-               PRO 140        II      monoclonal antibody targeting CCR5
InIs
elvitegravir    EVG            III     high cross resistance with RAL

Entry and Fusion Inhibitors intercept the viral replication at the first possible point: entry of the viral core into the cytosol of the host cell. Of all approved anti-HIV drugs, entry inhibitors are the only drugs that target a host protein rather than a viral protein. The development of entry inhibitors targeting human receptors originates from the observation that 4%-16% of the European population (higher rates in the north and lower rates in the south) have a homozygous ∆32 mutation in the CCR5 gene rendering the resulting CCR5 receptor nonfunctional (Novembre et al., 2005). The population affected by this genetic defect was reported to be virtually immune against HIV infections, while no severe side effects resulting from the nonfunctional receptor are known (Novembre et al., 2005). In addition to maraviroc, which is a CCR5 inhibitor and currently the only approved entry inhibitor, more CCR5 and also CXCR4 inhibitors are under investigation (Esté and Telenti, 2007). Prior to the use of coreceptor blockers, it is necessary to determine which coreceptor is used by the virus for cell entry (see Lengauer et al. (2007) for a review).

A further mechanism of preventing HIV from entering a target cell is to inhibit the fusion of virus and host cell membranes. The only currently licensed fusion inhibitor (FI), enfuvirtide (abbreviated ENF or T-20), is a synthetic peptide that mimics amino acids 127-162 of gp41. ENF binds to a subunit of gp41 and thereby prevents the conformational change that is required for the fusion of host and viral membranes. Drug resistance mutations are usually located in the ENF-binding site on gp41 (direct resistance) or confer resistance indirectly via mutations in other regions of gp41 and even in gp120 (Miller and Hazuda, 2004). The latter mechanism is less well understood. The two major drawbacks of ENF are its price and its route of administration. One month of ENF treatment costs approximately 2,200 Euros, compared to, for example, 400 Euros for one month of ZDV treatment in Germany (price estimates are based on retail prices at http://www.docmorris.de and the recommended daily dose at http://aidsinfo.nih.gov). In addition, ENF has to be injected twice daily, which causes a number of skin irritations.

Reverse Transcriptase Inhibitors interfere with the process of generating a DNA copy of the viral genome. The reverse transcriptase (RT) functions as a heterodimer comprising the p66 and p51 subunits. The p66 subunit is the full product of the RT region of the Pol gene, while the p51 subunit is obtained from the p66 unit by removal of a fragment at the C-terminus. This cleavage process is catalyzed by the viral protease. The p51 subunit plays only a structural role, while the larger unit has two enzymatic centers: a DNA polymerase and a ribonuclease H (RNaseH). The DNA polymerase copies either viral RNA or DNA, while the RNaseH is responsible for the degradation of tRNA primers and of genomic RNA present in DNA-RNA hybrid intermediates. There are two classes of reverse transcriptase inhibitors that are distinguished by their mode of action.

The nucleoside/nucleotide reverse transcriptase inhibitors (NRTIs) are nucleoside and nucleotide analogs that are incorporated by the viral RT into the newly synthesized DNA strand. These analogs lack a free 3'-hydroxyl group and therefore terminate the reverse transcription process after their incorporation into the DNA. The cost of NRTIs for one month of treatment varies from 300 Euros to 500 Euros (all prices apply to Germany only, as they are based on information from http://www.docmorris.de). Three different mechanisms, each associated with a specific set of mutations, are used by HIV to lower the effect of NRTIs (see Sluis-Cremer et al. (2000) for a review). These mechanisms involve improved discrimination between NRTIs and their natural dNTP counterparts, enhanced removal of the chain-terminating NRTI at the 3' end, and alteration of RT-template primer interactions.

Non-nucleoside reverse transcriptase inhibitors (NNRTIs) form the second group of RT inhibitors. NNRTIs are small molecules that inhibit the RT by binding to a hydrophobic pocket in the proximity of the active site of the enzyme.
Once the inhibitor is bound, it impairs the flexibility of the RT, resulting in its inability to synthesize DNA. Resistance to NNRTIs occurs by mutations that reduce the affinity of the inhibitor to the protein. Usually, a single mutation selected by one NNRTI is sufficient to confer complete resistance to all compounds of the drug class (Clavel and Hance, 2004). This phenomenon is termed cross-resistance. Thus, newer inhibitors like etravirine (ETR) are designed such that they still exhibit high affinity in the presence of mutations induced by other NNRTIs. One month of NNRTI treatment costs about 450 Euros for older drugs and 650 Euros for recently approved products. Figure 2.7 b) shows the structure of a functional heterodimer with typical NRTI and NNRTI resistance mutations highlighted in the p66 subunit.

Integrase Inhibitors aim at preventing the enzyme from integrating the viral DNA into the host chromosome. The integrase (IN) functions as a tetramer. Each monomer, which is cleaved out by the PR from the C-terminal portion of the Gag-Pol polyprotein, has three domains. The N-terminal domain contains an HH-CC zinc finger motif that is partially responsible for multimerization, optimal activity, and protein stability. The DDE motif in the core domain forms the catalytic triad. The C-terminal domain binds non-specifically to DNA with high affinity. So far it has not been possible to crystallize the entire 288-amino-acid protein due to its low solubility and propensity to aggregate. As an intermediate solution, the three domains have been crystallized individually. Moreover, structures of the N-terminal domain plus the core domain (Wang et al., 2001) and of the core domain plus the C-terminal domain (Chen et al., 2000) have been generated.

The integration of the viral DNA requires three consecutive steps. During the 3' processing step the integrase removes a dinucleotide from the long terminal repeat of each HIV DNA strand. This step is followed by a process termed strand transfer, occurring in the nucleus, in which the integrase cuts the cellular DNA and covalently links the viral DNA 3' ends to the target DNA. The final step, the required gap repair, is believed to be carried out by host DNA repair enzymes (Yoder and Bushman, 2000). Currently only one FDA-approved integrase inhibitor (INI) is available. Raltegravir (RAL) is a strand transfer inhibitor that interferes with the process by binding to the DDE motif in the catalytic domain (Hazuda et al., 2000). Successful inhibition of the integration process leaves the viral DNA in the nucleus, where it is circularized by the host's repair enzymes. Hence, the HIV life cycle is interrupted. The mechanism of resistance of HIV against INIs is still subject to investigation. Some clinically relevant mutations have been found to increase the rate of dissociation of the inhibitor from the IN-DNA complex (Grobler et al., 2009). One month of raltegravir treatment costs approximately 1,100 Euros.

Protease and Maturation Inhibitors interfere with the process of forming new infectious viral particles. The viral enzyme mainly engaged in virion maturation is the viral protease (PR). The PR cleaves the larger precursor proteins Gag and Gag-Pol into smaller functional units. Unlike its cellular relatives, the viral PR acts as a true homodimer. Each monomer comprises 99 amino acids, and the active site is formed by a cleft between the symmetrically arranged monomers (Figure 2.7 a). The cleft accommodates the substrate to be cleaved and the two flexible flaps stabilize the substrate in the active site. The active site has room for a peptide of approximately seven amino acids. Because of its important role in virion maturation, the viral PR soon became the subject of the search for potential inhibitors.

Protease inhibitors (PIs) are small molecules that bind to the active site of the protease and therefore compete with its natural substrates. In fact, the design of protease inhibitors constitutes a good example of structure-based drug design, since the chemical structures of the inhibitors were chosen to mimic the structure of the viral peptides that are naturally recognized and cleaved by the protease. Two mechanisms result in resistance of HIV against PIs. The first is the exchange of amino acids in the PR such that the affinity to the inhibitor is decreased while the natural substrates can still be bound efficiently (Clavel and Hance, 2004). Modifications of the affinity to the natural substrate also alter the efficiency of the protease. Thus, the second mechanism introduces compensatory mutations that reestablish the efficiency of the enzyme while maintaining resistance against the inhibitor. These compensatory mutations can occur either in the protease or in its substrate, i.e. at cleavage sites (Nijhuis et al., 2007). As in the case of NNRTIs, the new generation of PIs shows antiviral activity in the presence of mutations selected earlier by other PIs. The expenses for one month of PI treatment range from 800 Euros to 1,200 Euros.

Another class of drugs that interfere with the production of mature virions are maturation inhibitors (MIs). Unlike PIs, they bind to the substrate of the protease instead of the protease itself (Salzwedel et al., 2007). Currently no MI is approved by the FDA. The most advanced MI, bevirimat, prevents cleavage by the protease at the junction between the capsid and a spacer in the Gag precursor protein. Unfortunately, it was shown that a natural and frequently occurring polymorphism in Gag inactivates the antiviral activity of bevirimat (McCallister et al., 2008; Baelen et al., 2009), which gives the inhibitor an unfavorable perspective (Verheyen et al., 2009).

Figure 2.7: Structural models of viral targets for antiviral therapy rendered with the software PyMOL. (a) HIV protease with bound inhibitor amprenavir. The two monomers of the functional homodimer are colored differently (PDB entry 3EKV). (b) HIV reverse transcriptase: structural model of the functional RT dimer comprising the subunits p66 (magenta) and p51 (cyan) from PDB entry 1RTJ. Typical NRTI resistance mutations are highlighted in orange, and NNRTI resistance mutations in green.

Future Developments. Novel drugs are under investigation in nearly all established drug classes described in the previous paragraphs. In addition, novel targets and mechanisms are under ongoing investigation. The studied approaches range from new ways of attacking old targets, e.g. small peptides that prevent formation of an active RT dimer by blocking the binding of the p51 and p66 subunits (Agopian et al., 2009), aptamers targeting the RNaseH activity of the RT (Andreola et al., 2001), or integrase binding inhibitors with activity at the 3' processing step (Thibaut et al., 2009), to more exotic approaches. Sarkar et al. (2007) reported a successful excision of the integrated provirus from the host chromosome. A successful drug based on this mechanism would eventually allow HIV infections to be cured, rather than (merely) delaying disease progression. A further possibility to eradicate the latent reservoirs of HIV is to trigger the transition from the latent phase to the active phase by targeting the epigenetic regulation (Kauder et al., 2009). Finally, targeting the frameshifting that maintains the ratio of Gag to Gag-Pol required for the generation of infectious virions is an option (Gareiss and Miller, 2009).

2.4.2 The Era of Highly Active Antiretroviral Therapy

The rapid resistance development of HIV against individual drugs required a new pharmaceutical strategy.
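To put the rapid resistance development into numbers, the mutation rate quoted in Section 2.4 (5.4 × 10^-5 mutations per nucleotide per round of replication) can be combined with the genome length of approximately 10 kb given in Section 2.2. The following Python sketch is only an illustration: the function names are hypothetical, the genome length is rounded to 10,000 nucleotides, and errors are assumed to occur independently at each position, which is a simplification that is not taken from the cited studies.

```python
# Values quoted in the text.
MUTATION_RATE = 5.4e-5   # mutations per nucleotide per replication round (Gao et al., 2004)
GENOME_LENGTH = 10_000   # nucleotides; assumption: "approximately 10 kb" rounded

def expected_mutations(rate=MUTATION_RATE, length=GENOME_LENGTH):
    """Expected number of mutations per genome per round of replication."""
    return rate * length

def prob_at_least_one_mutation(rate=MUTATION_RATE, length=GENOME_LENGTH):
    """Probability that a copied genome carries at least one mutation.

    Assumption: errors occur independently at every position.
    """
    return 1.0 - (1.0 - rate) ** length

if __name__ == "__main__":
    print(f"expected mutations per genome: {expected_mutations():.2f}")
    print(f"P(at least one mutation):      {prob_at_least_one_mutation():.2f}")
```

Under these assumptions roughly every second newly copied genome differs from its template in at least one position, which is consistent with resistant variants emerging quickly under single-drug treatment.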
Soon after the release of ZDV other nucleoside analogs were marketed, and dual therapy comprising two NRTIs was the first attempt to control viral replication (Hammer et al., 1996). The approach of combining several antiretroviral compounds benefited the most from the release of drugs in other drug classes: NNRTIs and PIs. This also marks the start of the era of highly active antiretroviral therapy (HAART) in 1995. HAART combines a minimum of three drugs from at least two different drug classes (Clavel and Hance, 2004). A typical HAART combines two NRTIs plus either one PI or one NNRTI (Dybul et al., 2002). The rationale of HAART is to maximally suppress viral replication and thereby delay the progression of the infection to AIDS and death. The use of different drug targets provides the means of erecting a higher barrier for HIV to escape the regimen by developing resistance mutations. The success of HAART is based on the fact that HIV has to acquire multiple resistance mutations against the different drugs in the regimen. Here, the use of multiple drug classes ensures that different sets of resistance mutations are required for a successful escape. Thus, the combination of a successfully suppressed viral load, i.e. few viruses that can perform the “experiment”, and a high genetic barrier to resistance, i.e. requirement of multiple mutations, substantially slows down the progression of the infection. The success of HAART was manifest in the immediate decline in HIV-related mortality following its introduction (Mocroft et al., 1998; Crum et al., 2006). Despite its success, HAART presents a burden to the patients as many pills have to be taken on a daily basis. Moreover, strict adherence to the treatment schedule is required for optimal suppression of viral replication.
In addition, every drug comes with an extensive list of side effects ranging from headache and diarrhoea to lipodystrophy (fat redistribution) and even peripheral neuropathy (nerve damage) and therefore can seriously impair the patient’s quality of life. Given that currently there is no cure for HIV and antiretroviral treatment is a lifelong struggle, pharmaceutical companies are working on improving the pharmacokinetics and the side effect profile of the drugs in order to render treatment more bearable. Nowadays, some antiretroviral combination therapies are available as a “once-daily” treatment delivered by a single pill. For example, the drug Atripla comprises three antiretroviral compounds (FTC+TDF+EFV) and is typically used for first-line treatment (one month of treatment costs approximately 1,300 Euros). A further milestone in HAART was the discovery that the protease inhibitor ritonavir interferes with the liver enzyme cytochrome P450 (Kumar et al., 1996). This enzyme is involved in the metabolic processing of most protease inhibitors. Thus, the use of a small dose of ritonavir inhibits the liver enzyme, and helps to maintain optimal levels of other protease inhibitors in the patient’s blood for a longer period of time. The boosting of protease inhibitors with ritonavir has been standard since 2001, following the introduction of Kaletra (LPV+RTV), and is usually denoted by PI/r. Currently, pharmaceutical companies are searching for further compounds achieving the same effect as ritonavir but with fewer side effects. Despite today’s potency of HAART, drug resistance is still an issue. By acquiring drug resistance mutations the virus regains the ability to replicate in the presence of the drugs, leading to a high viral load, which marks virological failure. Furthermore, virological failure usually precedes immunological failure, which is marked by a substantial decrease of CD4+ T cells.
A factor that contributes to resistance development in patients undergoing HAART is incomplete adherence (Harrigan et al., 2005; Glass et al., 2008). Patients not taking their drugs at all do not impose enough selective pressure on the virus. Conversely, if the drugs are taken correctly and optimal drug levels are always reached, then the virus has little chance to develop mutations. On the other hand, drugs taken at irregular intervals leave the opportunity of developing mutations that are subsequently selected due to the selective pressure. Unfortunately, drug resistance may also appear in patients with perfect prescription refill rates. Harrigan et al. (2005) found at least one abnormally low drug plasma concentration in 36% of the patients with 95% prescription refill rate (i.e. almost perfect adherence to treatment). This points to either host genetic factors lowering the amount of available drug in the patient’s body or to incomplete adherence during the daily schedule, e.g. all drugs for a day are taken in the morning. A major factor contributing to non-adherence is again the side effects of the compounds. Patients occasionally remove drugs from an effective HAART without consulting their physician in order to obtain relief from troubling side effects.
2.5 HIV Therapy Management
The circumstance that HIV evolves into drug resistance is further complicated by the fact that resistance mutations selected by one drug also confer resistance against other drugs from the same class. This phenomenon of cross-resistance is most pronounced among NNRTIs where, prior to the release of etravirine, a single mutation could render the complete drug class useless. The second generation of NNRTIs and PIs has therefore been designed to retain antiviral activity in the presence of resistance mutations commonly selected by older drugs.
In order to ensure an effective treatment the extent of drug resistance has to be appraised prior to onset of the regimen. Resistance of HIV against antiretroviral drugs can be assessed by two different approaches. The first approach, phenotyping, measures the resistance of the virus in an experimental assay (see e.g. Walter et al. (1999) and Petropoulos et al. (2000)). In such assays the replication of the patient’s virus (clinical isolate) is compared to the replication of a reference wild type strain in the presence of a varying concentration of the drug. The concentration that cuts the replication rate in half is termed 50% inhibitory concentration (IC50). The fold-change (FC) in resistance, also often referred to as resistance factor (RF), is then simply the quotient of the IC50 of the clinical isolate and the IC50 of the reference virus:
\[ \mathrm{FC} = \frac{\mathrm{IC}_{50}(\text{clinical isolate})}{\mathrm{IC}_{50}(\text{reference isolate})}. \]
Figure 2.8 depicts an example of dose-response curves generated by a phenotypic resistance test. The experiment has to be carried out for every single drug, thus rendering the whole process time- and cost-intensive: results are available within weeks and costs are in the range of 1,000 Euros. Moreover, owing to the nature of the assay the test is restricted to a few specialized laboratories with sufficient security level and expertise. The second method focuses on the genetic sequence of the viral drug targets. In genotyping the viral sequence is obtained using standardized, fast, and cheap methods (results are available within days at costs of 100 Euros) that can be carried out in virtually every laboratory. The outcome of genotyping is simply a list of mutations compared to a wild type virus and is clearly much harder to interpret in terms of drug resistance than the single number per drug generated by phenotyping.
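As a minimal sketch of the fold-change defined above, the quotient and its decadic logarithm can be computed as follows. The function name `fold_change` and the numerical IC50 values are hypothetical and serve only as an illustration; both concentrations are assumed to be given in the same unit.

```python
import math


def fold_change(ic50_clinical: float, ic50_reference: float) -> float:
    """Fold-change in resistance: IC50 of the clinical isolate divided by
    the IC50 of the reference (wild type) isolate."""
    if ic50_clinical <= 0 or ic50_reference <= 0:
        raise ValueError("IC50 values must be positive concentrations")
    return ic50_clinical / ic50_reference


# Hypothetical IC50 values (same unit for both isolates).
fc = fold_change(ic50_clinical=2.0, ic50_reference=0.25)

# Fold-changes span orders of magnitude, so they are often log-transformed.
log_fc = math.log10(fc)
```

A fold-change of 1 corresponds to a clinical isolate that is as susceptible as the reference virus; larger values indicate that a higher drug concentration is needed to halve replication.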
In order to assist the interpretation of the viral genotype the International AIDS Society (IAS) maintains an annually updated list of resistance mutations (Johnson et al., 2008). This list, however, mainly informs about the mere existence of mutations; the necessary knowledge about how many of the mutations, and in which combination, are required to confer intermediate or complete resistance is not provided. The missing semantics of the resistance mutations is introduced by rules-based interpretation systems developed by virologists and clinicians (see Lengauer and Sing (2006) for a review). The benefit of resistance tests (genotypic or phenotypic) has been demonstrated in prospective clinical trials. For example, Durant et al. (1999) showed that treatment decisions based on a genotype have a significantly better outcome than uninformed readjustments of medication. Moreover, Oette et al. (2006) showed that genotypic resistance analysis is beneficial even in treatment-naïve patients, i.e. patients that have never received any antiretroviral drug. The observed improvement is a direct consequence of transmitted drug resistance mutations, where patients are infected with already resistant viral strains. The advantage of resistance testing and the broad availability of genotyping led to routine sequencing of relevant parts of the viral genome. The most relevant part comprises the 99 amino acids of the viral protease and up to the first 240 amino acids of the RT.
Figure 2.8: Dose-response curve of a phenotypic drug resistance test. Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Microbiology (Lengauer and Sing, 2006), copyright (2006).
These data are now generated routinely in laboratories for genotype-based resistance tests and are stored in databases together with frequently obtained markers of the treatment success (viral load and CD4+ T cell count) and detailed information about the patient’s regimen (drugs, start, stop, cause of stop) (Rhee et al., 2003; Roomp et al., 2006). These data collections provide a rich basis for further developments in terms of resistance interpretations, as we will show in the remainder of this thesis.
2.5.1 Rules-Based Interpretation Systems
The need for adding semantics to the resistance mutations promoted the development of many rules-based interpretation systems. The rules are generated by expert panels comprising virologists and clinicians on the basis of genotype-phenotype relationships, publications, clinical response (in the form of mutations observed in patients failing antiretroviral therapy), and personal experience. The predictions by different rules-based interpretations are often discordant owing to their history of origins, which is manifest in the differing sets of rules and design principles. Frequently the discordant predictions lead to confusion among users consulting multiple decision support systems (De Luca et al., 2003). Moreover, the degree of disagreement can also depend on the HIV-1 subtype (Snoeck et al., 2006), thereby reflecting the general lack of knowledge on non-B subtypes. The rule sets are maintained regularly in meetings of the expert boards, and the rules have to be updated because of the ongoing release of new antiretroviral drugs. Rule sets for novel compounds have to be added, and the impact of mutations selected by new compounds on the established drugs has to be studied. Thus, over time the rules undergo refinement that reflects the state-of-the-art knowledge on HIV drug resistance. Nowadays, antiretroviral therapy comprises multiple drugs.
The decision support systems, however, provide only classifications for individual drugs. In order to overcome this limitation, the verbal classification provided by the tools is often mapped to a rating between 0 and 1 assessing the susceptibility of the virus to an individual drug. The ratings of individual drugs are summed to form a genotypic susceptibility score (GSS) that usually ranges between 0 and the number of drugs in the regimen (De Luca et al., 2003). For treatment-naïve patients clinicians usually aim at achieving a GSS of 3.0 (i.e. three fully active drugs), while for treatment-experienced patients with heavily mutated viruses a GSS of 2.0 is recommended. The rules-based systems enjoy great popularity among clinicians and virologists involved in HIV patient care, mainly owing to the fact that they do not provide “black box” predictions. More precisely, the basis of the classification, i.e. the rule sets, is freely available (for most systems) and thereby allows users to follow the rationale behind the decision. In the following we provide a brief overview of the most popular rules-based decision support systems.
ANRS 07/2009 V18
The interpretation system provides a three-level classification of drug resistance and is developed by the French ANRS (National Agency for AIDS Research) AC 11 Resistance group. Their set of rules is now available in version 18 on the web site http://www.hivfrenchresistance.org/ (Meynard et al., 2002).
Rega Version 8.01
This rule set is maintained by the Rega Institute of the Catholic University in Leuven. It also provides a three-class rating. In contrast to the ANRS system it applies a weighted score for the conversion from the verbal classification (i.e. susceptible, intermediate, and resistant) into the scores that are then used to compute the GSS. Precisely, the score for boosted PIs is 0.75 and 1.5 instead of 0.5 and 1 for intermediate and fully susceptible viruses, respectively, and the score corresponding to intermediate resistance of INIs, EIs and some NNRTIs is 0.25 instead of 0.5. The specifications are available at http://www.rega.kuleuven.be/cev/ (Van Laethem et al., 2002).
HIVdb Version 6.0.5
The HIVdb system is maintained by Stanford University and available at http://hivdb.stanford.edu. Unlike in the other systems, each mutation receives a score between -10 and 60, and therefore the system resembles a linear model with expert-derived weights rather than a rules-based system. Negative scores indicate resensitization effects, i.e. a virus carrying such a mutation is more susceptible to that particular drug. Based on the weights the resistance against each drug is assigned to one of five possible classes (Rhee et al., 2003). Drugs scored with 0-9 are estimated to be “Susceptible”, 10-14 indicates “Potential low-level resistance”, while 15-29 and 30-59 correspond to “Low-level resistance” and “Intermediate resistance”, respectively. Values of 60 or greater indicate “High-level resistance”. Moreover, the web service provides implementations of other rule sets like ANRS and Rega and additionally offers the option to use a personalized rule set.
HIV-grade Version 12-2008
HIV-grade is an interpretation system maintained by HIVgrade e.V., an association of German clinicians and virologists. Like HIVdb it provides the possibility to compare different systems (all those mentioned above, including geno2pheno described in Section 2.5.2). HIV-grade differs from the other systems by providing special rules for some drugs. These special rules consider which other drugs are part of the combination. In particular, the additional rules aim to capture mutations with resensitization effect and require that the drug selecting for the resensitization mutation is maintained in the regimen for keeping up the selective pressure. The susceptibility for each drug is provided in a three-class system. The web service is available at http://hiv-grade.de.
AntiRetroScan version 2.0
The AntiRetroScan algorithm is maintained by the Italian Antiretroviral Resistance Cohort Analysis (ARCA) consortium (Zazzi et al., 2009). Like HIVdb it provides five resistance classes, and like the Rega rules it applies different weights for different drug classes that should be used to compute a GSS. The rule set is available at http://www.hivarca.net/includeGenpub/AntiRetroScan.htm.
Proprietary Systems
In addition to the free decision support systems introduced in the previous paragraphs, there also exist a number of proprietary rule sets that are part of diagnostic kits. Those kits usually include a sequencing machine and the primers needed for the sequencing. Moreover, the interpretation software has to be approved by the FDA. An example of a proprietary system is the ViroSeq (Version 2.8) rule set, which is developed by Celera Diagnostics and provides three output categories (http://www.celeradiagnostics.com/cdx/ViroSeq). The tool accompanies the ViroSeq HIV-1 Genotyping System developed by Abbott. Likewise, the GuideLines Rules (Version 14) are updated annually by an independent expert panel and accompany the TRUGENE HIV-1 Genotyping Assay offered by Siemens (http://www.medical.siemens.com/).
2.5.2 Predicting the Drug Resistance Phenotype from Genotype
Currently, there exists a multitude of expert-based decision support systems. The systems frequently disagree on the classification results, which is no surprise given the different design principles and experts involved in the numerous boards.
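To make the scoring schemes of Section 2.5.1 concrete, the following minimal sketch maps an HIVdb-style total score to the five resistance classes and sums per-drug ratings to a GSS, optionally with the Rega-style weights for boosted PIs. The thresholds and the ratings 1, 0.5, 0.75 and 1.5 are taken from the text; everything else is an assumption made for illustration: the function names, the rating of 0 for resistant viruses, treating totals below 0 as susceptible, and the example classification of the three drugs.

```python
from typing import Iterable, Mapping

# Ratings for a three-level classification (susceptible/intermediate as in the
# text; the value 0 for "resistant" is an assumption).
STANDARD_RATING = {"susceptible": 1.0, "intermediate": 0.5, "resistant": 0.0}
# Rega-style weights for boosted PIs (0.75 and 1.5 instead of 0.5 and 1).
BOOSTED_PI_RATING = {"susceptible": 1.5, "intermediate": 0.75, "resistant": 0.0}


def hivdb_level(score: float) -> str:
    """Map a summed HIVdb mutation score to one of the five classes.
    Totals below 0 are treated as susceptible (assumption)."""
    if score < 10:
        return "Susceptible"
    if score < 15:
        return "Potential low-level resistance"
    if score < 30:
        return "Low-level resistance"
    if score < 60:
        return "Intermediate resistance"
    return "High-level resistance"


def gss(classification: Mapping[str, str], boosted_pis: Iterable[str] = ()) -> float:
    """Sum the per-drug ratings of a regimen to a genotypic susceptibility score."""
    boosted = set(boosted_pis)
    total = 0.0
    for drug, level in classification.items():
        table = BOOSTED_PI_RATING if drug in boosted else STANDARD_RATING
        total += table[level]
    return total


# Hypothetical per-drug classifications for a three-drug regimen.
regimen = {"ZDV": "susceptible", "TDF": "resistant", "LPV/r": "intermediate"}
plain_gss = gss(regimen)                             # unweighted sum
weighted_gss = gss(regimen, boosted_pis=["LPV/r"])   # Rega-style weighting
```

The sketch only covers the final aggregation step; the rule sets that produce the per-drug classifications are not modelled here.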
However, there has been the desire to base the classification of drug resistance on a more objective foundation. Obviously, an objective criterion of drug resistance is the phenotypic drug resistance as measured in vitro by experimental assays. There are two frequently used tools that predict phenotypic drug resistance from the viral genotype. One system, VircoTYPE, is proprietary and maintained by the company Virco BVBA (Mechelen, Belgium); the other, geno2pheno, is a freely available web service offered via http://www.geno2pheno.org (Beerenwinkel et al., 2002, 2003a). Both systems use statistical learning to infer drug resistance models from matched genotype-phenotype pairs. Applying these models makes it possible to infer the more expensive and more objective information from the cheap and easily available genotype. The first version of geno2pheno used decision trees (Breiman et al., 1984) to classify the virus as resistant or susceptible with respect to several drugs. The training data for the initial models comprised approximately 450 genotype-phenotype pairs (Beerenwinkel et al., 2002). The decision trees achieved moderate prediction performance but demonstrated in an appealing way how rules of drug resistance can be automatically derived from data. Precisely, decision trees are a white box classifier, i.e. in addition to the classification result the user can learn what input led to the final decision, as opposed to black box models like neural networks and support vector machines (SVMs) with non-linear kernels (Boser et al., 1992). In a later version of geno2pheno an SVM-based prediction was added to the web service and the models were trained on approximately 650 genotype-phenotype pairs (Beerenwinkel et al., 2003a). The SVM exhibited better prediction performance than the decision trees and made it possible to solve a regression problem, i.e. predicting the FC in resistance rather than only a class label.
Because linear SVMs cannot operate on categorical data, e.g. amino acid sequences, the viral genotype was represented as a binary vector, with one indicator for every possible amino acid at each sequence position. Hence, the encoding required 20 times the length of the amino acid sequence. The model interpretation of SVMs, however, is not as trivial as in the case of decision trees. Based on the work by Sing et al. (2005b) the linear SVM models for the individual drugs could be interpreted, and a list of scored mutations (main contributors to observed resistance) is now provided with every prediction (for details see Section 3.1). The current version of geno2pheno is trained on approximately 1,000 genotype-phenotype pairs per drug. VircoTYPE also uses a linear model but, unlike geno2pheno, uses linear regression with pairwise interaction terms for mutations. The statistical learning task was carried out on a median of 46,100 genotype-phenotype pairs per drug (Vermeiren et al., 2007). Again, the use of a linear model facilitates model interpretation. In contrast to geno2pheno, VircoTYPE is a commercial tool that charges the user for every prediction.
2.6 Further Advancements
Since the first methods for predicting in vitro drug resistance were published, this task has been the subject of numerous publications. The wealth of publications was also boosted by a freely available genotype-phenotype dataset published by Stanford’s HIVdb (Rhee et al., 2006). The publications range from the use of established statistical learning models like simple linear models (Wang et al., 2004), linear models including interaction terms (Vermeiren et al., 2007), and rules-learning algorithms (Kierczak et al., 2009) to the development of novel techniques like non-parametric methods (DiRienzo et al., 2003) and machine learning approaches aiming at the discovery of patterns of resistance mutations (Saigo et al., 2007).
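As a concrete reference for the indicator encoding described for geno2pheno in Section 2.5.2, a minimal sketch is given below. The function name `encode_genotype`, the alphabet ordering, the handling of unknown residues (encoded as all zeros), and the three-residue example fragment are assumptions made for illustration and are not taken from the geno2pheno implementation.

```python
from typing import List

# The 20 standard amino acids in one-letter code; the ordering is arbitrary
# but must be kept fixed between training and prediction.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_genotype(sequence: str) -> List[int]:
    """Represent an amino acid sequence as a binary vector with one
    indicator per possible amino acid at each sequence position."""
    vector: List[int] = []
    for residue in sequence:
        # Unknown or ambiguous residues yield 20 zeros (assumption).
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Hypothetical three-residue fragment; a real input would cover the full
# sequenced region, e.g. the 99 amino acids of the protease.
encoded = encode_genotype("PQI")
```

The resulting vector has 20 entries per sequence position, exactly one of which is set for every recognized residue.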
Other approaches extended the genotype-phenotype relationship by incorporating descriptors of the antiretroviral drug in the prediction model (Drăghici and Potter, 2003; Lapins et al., 2008). Additionally, methods from docking and molecular dynamics simulation were used to compute the binding energy of PIs to variants of the protease. The binding energy turned out to be a good predictor of resistance without requiring a training step or training data (Jenwitheesuk and Samudrala, 2005). These systems, however, represent a proof-of-concept and are not used as clinical decision support tools. Despite all the approaches for predicting in vitro drug resistance, the overall aim is still to infer clinical in vivo response of the patient to the drugs. Thus, the estimation of relevant cutoffs is an essential part of establishing a useful prediction system. The expert-derived rules-based systems, on the other hand, mainly aim at interpreting resistance mutations in terms of clinical response and consequently have no need for relevant cutoffs. Nonetheless, all approaches have in common that they only classify single drugs as resistant or susceptible, but what lies at the heart of the problem is the prediction of response to a combination of drugs. In fact, HIV experts have discouraged inferring response from genotype via the intermediate step of first predicting the resistance phenotype (Larder et al., 2007; Brun-Vézinet et al., 2004). In contrast to this criticism, we could show that predicted phenotypes (using the VircoTYPE system) combined with clinically relevant cutoffs are at least as predictive as several expert-derived rule sets. Additionally, predictive accuracy could be further enhanced for all systems by weighting individual drugs using statistical learning methods instead of the simple summation usually applied to derive a GSS or PSS (Altmann et al., 2009b).
Moreover, statistical learning made it possible to combine predictions by multiple interpretation systems to infer virological response. In fact, combinations of predicted phenotypes with expert predictions showed a slight improvement, which could also be confirmed on a large independent test set. In contrast, combinations of only rules-based predictions showed no added value over the use of a single system. In this study we also evaluated the importance of predictions for single drugs in terms of response to antiretroviral treatment. Indeed, the benefit of statistical learning originated from the different weighting of drugs, which confirms earlier results (Swanstrom et al., 2004). This weighting is presumably the result of the abundance of the drug in the training data, its potency, the accuracy of the prediction algorithm for that drug, pharmacokinetics (e.g. the forgiveness of a drug when a dose was missed or taken too late), and further factors adding to the success of the drug in vivo. Given a sufficient amount of data, it is possible to develop models that directly predict response to antiretroviral combination therapy using genetic information and the drugs in the treatment. Such models thereby circumvent the need to first assess resistance against single compounds and then combine the single predictions into a score for the complete regimen. A further advantage is that such direct models can learn negative and positive pharmacokinetic interactions between drugs from the data. Indeed, there exists a large number of interactions between the different drugs (see Boffito et al. (2005) for a review), ranging over different drug classes and even to non-anti-HIV drugs (e.g. all drugs metabolized by cytochrome P450 are affected by ritonavir boosted PIs).
The worst interactions between drugs in terms of drug resistance are effects that lower the bioavailability of one or more compounds of the combination treatment: frequent suboptimal levels of a drug in the blood plasma provide the right mixture of selective pressure and extent of viral replication to eventually generate resistant viruses. An online repository providing lectures and overviews on pharmacokinetic interactions is maintained by the Liverpool HIV Pharmacology Group (http://www.hiv-druginteractions.org/). Plasma levels of drugs can also be influenced by the genetic predisposition of the host. For example, Mahungu et al. (2009) showed that a mutation in cytochrome P450 has a positive influence on the plasma level of the NNRTI nevirapine. Hence, consideration of host genetics during the selection of compounds for a combination therapy can yield further improvements. On the other hand, severe adverse effects can be increased depending on host genetics, e.g. the patient immunotype HLA-B*5701 is strongly associated with hypersensitivity to the NRTI abacavir (Mallal et al., 2008). Still, despite its evident benefit, pharmacogenetic testing is currently not widely applied. A further host-specific aspect not considered by current tools is the presence of coinfections with other viruses (e.g. HBV) or other (chronic) diseases that require medication. Again, compounds used to treat other diseases may limit the efficacy of anti-HIV drugs or might add limitations in the selection of anti-HIV drugs. Further important information is neglected by the restriction of the resistance analysis to the baseline genotype alone: archived resistance mutations in the viral reservoirs (latent viruses). Resistance mutations selected by earlier treatments can disappear from the viral population as soon as the selective pressure is removed.
For instance, protease resistance mutations disappear in the absence of protease inhibitors because the protease is more efficient without those resistance mutations. Increased efficiency translates into a higher number of infectious viruses that eventually outnumber the protease resistant variants. The mutations, however, persist in viral reservoirs and are rapidly reselected after reappearance of the selective pressure by reusing the drug. Consequently, just studying the most recent genotype might miss important mutations. Indeed, the patient’s treatment history has long been recognized as valuable information (Bratt et al., 1998), and recently it has been shown that inspection of previous genotypes helps to explain response to antiretroviral combination therapy (Zaccarelli et al., 2009). A quantitative analysis of the viral population with new sequencing techniques revealed advantages in predicting the coreceptor usage phenotype from genotype (Däumer et al., 2008). In terms of drug resistance, the interest is mainly in resistance mutations that are only present in a minority of the viral population at the onset of the treatment. Such minorities might increase the risk of selecting the resistance mutation early after treatment start. Hence, the question is whether minorities are harmful for the success of treatment at all, and if so, what threshold of the minority leads to rapid selection of resistant variants. Using standard Sanger sequencing (also called bulk-sequencing) a minority in the population can only be detected if it accounts for at least 20% of the overall sample. In contrast, ultra-deep sequencing makes it possible to determine the sequence of individual viruses. Hence, here the resolution is mainly limited by the cost of the experiment.
Initial studies using ultra-deep sequencing demonstrated that resistance mutations found in minorities of the viral population correlate well with previous antiretroviral therapies and can therefore provide information on archived resistance mutations even in the absence of treatment records (Le et al., 2009). Eventually, every known drug combination fails due to acquired drug resistance mutations. The fact that anti-HIV therapy is a lifelong effort, paired with the limited number of antiretroviral drugs and the cross-resistance between them, raises the need to consider different strategies of effectively using the available array of compounds. For example, one could follow a “hit-hard” strategy and apply one drug from each class, which will probably lead to a very long lasting successful treatment (e.g. six years), but once it fails there are resistance mutations against drugs from all classes. Consequently, only a very limited number of drugs is available after treatment failure. On the other hand, one can restrict the treatment to two drug classes, like in standard HAART, which remains effective for e.g. three years. The fact that other drug classes have been spared allows for multiple repetitions of the medium-term success. More specifically, one could optimize the order in which drugs of one class should be administered for minimizing the impact of cross-resistance. This planning of sequences of treatments, or therapy sequencing, aims at prolonging the patient’s life. Summing up, improvement over state-of-the-art treatment guidance can be achieved by:
1. Construction of models that predict in vivo response to antiretroviral combination therapy and thereby capture interactions between drugs and between drugs and mutations in vivo.
2. Inclusion of features of viral evolution for assessing the risk of the virus to escape a putative treatment by developing further mutations.
3. Incorporation of data on the patient’s treatment history for assessing the risk of resistance mutations archived in viral reservoirs.
4. Consideration of host-specific characteristics ranging from genetic factors to coinfections for addressing the interactions between drugs and host (pharmacogenetics) and virus and host (e.g. immune control).
5. Investigation of the impact of minorities in the viral population on the treatment outcome and resistance development using novel sequencing techniques.
6. Generation of personalized treatment schedules that ensure the optimal use of the available array of antiretroviral drugs.
This thesis addresses some of these issues. Dealing with points four and five, for example, requires novel data: presence of co-infections or genotyping of the patient and sequencing the viral quasi species, respectively. These data are not routinely collected in treatment databases, and the corresponding issues are therefore not addressed. The objective of the thesis is the inference of virological response in vivo directly from the viral genotype, the applied drug combination, features derived from the viral genotype, and further available information (Chapters 4 and 5). The models are restricted to RTIs and PIs (excluding next generation drugs within these classes), simply due to lack of data on novel drugs in databases collecting clinical response data. This poses no major limitation, as RTIs and PIs remain the major building block for today’s antiretroviral therapy of HIV infections. Novel drugs are typically spared for later treatment lines, often termed salvage treatments. Finally, we undertake first steps towards therapy sequencing by introducing a fast method for estimating the viral evolution during combination therapy (Chapter 6).
3 Extensions to geno2pheno
The web service geno2pheno for predicting phenotypic drug resistance from the viral genotype has been in operation since December 2000.
Updates and continuous research is required for maintaining its benefit for virologists and clinicians treating HIV patients. In this chapter we describe the web tool geno2pheno[integrase] that predicts from the viral genotype phenotypic resistance against two integrase inhibitors (Section 3.1). Moreover, we present research aiming at improving the usefulness of predicted fold-change in IC50 by deriving clinically relevant cutoffs of the continuous measure (Section 3.2). Finally, in Section 3.3 we explore approaches for improving prediction of resistance to individual antiretroviral drugs by using abundantly available viral genotype data in addition to the genotype-phenotype pairs. 3.1 GENO 2 PHENO[integrase] The viral integrase comprises 288 amino acids and is a product of the C-terminal portion of the large precursor protein encoded by the Pol gene. Only recently, in 2008, the first integrase inhibitor raltegravir (RAL) was approved by the FDA. Knowledge of drug resistance against RAL and elvitegravir (EVG), an integrase inhibitor in phase III clinical trials, accumulates only slowly. Moreover, due the recent release of the drug, routine resistance testing is not established yet, mainly owing to the fact that the integrase of the patient’s virus is assumed to be wild type and thus susceptible to the novel drug. As a consequence, genotype-phenotype data is rare. However, some researchers investigate mutations, which were reported to emerge during integrase inhibitor containing regimens, with phenotypic resistance assays. These genotype-phenotype data are partially available via Stanford’s HIVdb1 . Material and Methods Using this freely available resource we compiled a dataset comprising 113 and 126 genotypephenotype pairs for RAL and EVG, respectively. 
The raw data were preprocessed exactly as in the case of geno2pheno: the decadic logarithm was applied to the fold-change values, and each of the 288 amino acid positions was represented by 20 binary variables indicating the presence (or absence) of a specific amino acid at that position. In accordance with the geno2pheno web service we applied linear support vector regression (SVR) for computing the continuous log10(FC) from the genotype. The cost parameter C of the SVR was optimized in a five-fold cross-validation. Performance was measured using Pearson’s correlation coefficient (r) between predicted and measured log10(FC) in a leave-one-out cross-validation (LOOCV) with the optimized cost parameter. In addition, we trained one linear SVR model for each drug on the complete set for investigating the contribution of each mutation to the overall observed resistance.

Footnote 1: http://hivdb.stanford.edu/cgi-bin/IN_Phenotype.cgi

More precisely, linear support vector machines (SVMs) allow the decision function for classification and regression to be rewritten as a simple linear model with trivial access to the contribution of every mutation (Guyon et al., 2002). The training of an SVM results in a set of support vectors $\vec{x}_k$ with labels $y_k$ and a set of non-zero weights $\alpha_k$. In the case of a classification task, the label for a new sample $\vec{x}$ is then computed by:

$$ f(\vec{x}) = \operatorname{sgn}\left( \sum_k y_k \alpha_k K(\vec{x}_k, \vec{x}) \right), \qquad (3.1) $$

where $K(\vec{x}_k, \vec{x})$ is the kernel function. Essentially, a kernel function expresses the similarity between two vectors. Hence, during the decision process the similarity of the input vector to each support vector $\vec{x}_k$ is computed and multiplied with its label $y_k$ and its influence $\alpha_k$ derived during training. Popular kernel functions are, for instance, the polynomial kernel with degree d, $K(\vec{x}_k, \vec{x}) = (1 + \langle \vec{x}_k, \vec{x} \rangle)^d$, and the radial basis function kernel, $K(\vec{x}_k, \vec{x}) = \exp(-\beta \|\vec{x} - \vec{x}_k\|^2 / (2\sigma^2))$.
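The preprocessing described in the Material and Methods above (20 binary indicator variables per sequence position and a decadic logarithm of the fold-change) can be sketched as follows. This is only a minimal illustration and not the geno2pheno code: the function name `encode_genotype`, the ordering of the amino acids, and the toy sequence and fold-change value are assumptions made for the example.

```python
import math

# Assumed ordering of the 20 standard amino acids (hypothetical choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_genotype(sequence):
    """Return a flat list holding 20 binary indicators per sequence position."""
    features = []
    for residue in sequence:
        # Exactly one indicator is 1 if the residue is a standard amino acid.
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


# Toy 10-residue fragment and toy fold-change; neither is real data.
toy_sequence = "FLDGIDKAQE"
toy_fold_change = 25.0

x = encode_genotype(toy_sequence)        # input vector for the regression
y = math.log10(toy_fold_change)          # regression target log10(FC)

print(len(x), sum(x), round(y, 3))
```

For a full-length integrase of 288 positions this encoding yields 288 × 20 = 5760 binary features, of which at most 288 are non-zero.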
However, when using a plain linear kernel we simply set $K(\vec{x}_k, \vec{x}) = \langle \vec{x}_k, \vec{x} \rangle$, which allows us to rewrite the decision function as

$$ f(\vec{x}) = \operatorname{sgn}(\vec{w} \cdot \vec{x} + b), \quad \text{with} \quad \vec{w} = \sum_k \alpha_k y_k \vec{x}_k \quad \text{and} \quad b = \langle y_k - \vec{w} \cdot \vec{x}_k \rangle. $$

Thus, the weight vector $\vec{w}$ is simply a linear combination of the support vectors and offers a straightforward interpretation.

Results

Figure 3.1 depicts the scatter plots between predicted and observed log10(FC) values. For both drugs the correlation coefficients are high, with resistance to RAL (r = 0.87) predicted more accurately than resistance to EVG (r = 0.79). The influence of each mutation on the measured phenotypic resistance against RAL and EVG is displayed in Figures 3.2 a) and b), respectively. As expected, most mutations lead to an increase in FC; a few mutations, however, slightly increase susceptibility to the drugs. Figure 3.2 c) shows a scatter plot between the SVR weights obtained from the RAL and EVG models. Here it becomes evident that most mutations confer resistance to both drugs. A few exceptions are mutations at position 66, which influence the EVG phenotype but not the RAL phenotype, and mutation 151I, which increases RAL resistance but decreases EVG resistance. The offset (i.e. b in the linear model) is close to 0 for both models, thus indicating no resistance for wild type viruses.

Discussion

A clear limitation of this study is the low number of training instances, which, in addition, originate from a total of eleven different research groups using different experimental assays for measuring phenotypic resistance.
Figure 3.1: Scatter plots between predicted and observed log10(FC) values for (a) RAL (n=113; r=0.87) and (b) EVG (n=126; r=0.79). Axes: actual log10(FC) versus predicted log10(FC).

These factors clearly increased the variation in the phenotypic measurements we used for training the SVR models. Despite this shortcoming, the achieved prediction accuracy was surprisingly good for both drugs. This, however, might be a direct consequence of the origin of the data: researchers were interested in the effect of a few mutations that frequently emerge during treatment with integrase inhibitors. Thus, they conducted site-directed mutagenesis to introduce these mutations, individually or in combination, into a wild type virus. Hence, our model summarizes the knowledge derived from these experiments, but it does not allow the identification of as yet undescribed integrase resistance mutations, as was done with the training data for geno2pheno (Sing et al., 2005b). Furthermore, as a consequence of the bias towards well-known resistance mutations, some training instances had identical genotypes. Of course, identical samples can artificially boost the performance assessed in cross-validation settings. In order to study the impact of duplicate samples on the prediction accuracy, we repeated the LOOCV study with a smaller set of unique samples (n=76 for RAL and n=90 for EVG).
The observed correlation was slightly worse for RAL (r = 0.85) and even slightly better for EVG (r = 0.81) than with the full dataset. Nonetheless, analysis of the feature importance could confirm the high cross-resistance potential between the two inhibitors (McColl et al., 2007). The prediction of phenotypic resistance to the integrase inhibitors RAL and EVG based on SVR models is implemented in the freely available web service geno2pheno[integrase] (http://integrase.geno2pheno.org). Along with the predicted FC values, a list of mutations contributing to the drug resistance is provided. Currently, prediction of resistance to INIs is decoupled from prediction of resistance to RTIs and PIs, since, as of now, determining the genotype of the viral integrase and of the Pol fragment containing protease and reverse transcriptase are separate working steps.

Figure 3.2 panels: (a) RAL feature importance and (b) EVG feature importance (x-axis: amino acid position; y-axis: influence on log(FC)); (c) correlation of mutations (x-axis: change in log(FC) for RAL; y-axis: change in log(FC) for EVG).
Figure 3.2: Contribution of each mutation to the predicted phenotypic resistance to RAL (a) and EVG (b).
All mutations that cause a change in log10(FC) of more than 0.15 are labeled with the corresponding substitution in red (increase) or green (decrease). Correlation of mutations (c): the x-axis (y-axis) displays the effect of a specific mutation on log10(FC) for RAL (EVG).

3.2 Estimating Clinically Relevant Cutoffs for geno2pheno

The appealing benefit of the phenotypic drug resistance test, and consequently also of the predicted FC in drug resistance, is the output of a scalar value that objectively quantifies drug resistance. These values, however, have to be interpreted in terms of clinical impact. Intuitively, high values predict an unfavorable outcome, while low values indicate a good response to the drug. Still, it is unclear where exactly the cutoffs are located that classify a virus as susceptible, intermediate, or resistant to a drug. The computation of clinically relevant cutoffs is therefore an essential task for converting an in vitro measure into a clinically useful tool. Moreover, the observed ranges of FC vary heavily between drugs. For example, FC for ZDV in the training data for geno2pheno has a median of 29 [interquartile range (IQR): 1.5; 225], while ABC has a median of only 3.9 [IQR: 1.7; 8.2]. In geno2pheno this scaling issue is addressed by normalizing the raw predicted FC with the predicted FC as observed in treatment-naïve patients (Beerenwinkel et al., 2003a). Briefly, the log10(FC) of one drug in a group of untreated patients, i.e. patients who are assumed not to have resistance mutations, is predicted with geno2pheno, and the mean $\mu_{\text{naïve}}$ and the standard deviation $\sigma_{\text{naïve}}$ are computed. The log10(FC) for a new prediction is then normalized by computing the z-score

$$ z = \frac{\log_{10}(\mathrm{FC}) - \mu_{\text{naïve}}}{\sigma_{\text{naïve}}}. $$

Hence, drug resistance is expressed as the distance (measured in standard deviations) from a treatment-inexperienced group, which makes FCs for different drugs more comparable.
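The z-score normalization just described can be illustrated with a short sketch. The reference values below are invented for the example, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is used; `z_score` is a hypothetical helper name.

```python
import math
import statistics


def z_score(fold_change, naive_log10_fcs):
    """Normalize log10(FC) by mean and standard deviation of a naive reference group."""
    mu_naive = statistics.mean(naive_log10_fcs)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sigma_naive = statistics.stdev(naive_log10_fcs)
    return (math.log10(fold_change) - mu_naive) / sigma_naive


# Invented predicted log10(FC) values for a group of untreated patients.
naive_reference = [-0.2, -0.1, 0.0, 0.1, 0.2]

# A new prediction with a 10-fold change in IC50.
z = z_score(10.0, naive_reference)
print(round(z, 2))
```

The result is unit-free, which is what allows values for drugs with very different FC ranges (such as ZDV and ABC) to be placed on a common scale.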
The problem of which values correspond to intermediate or complete drug resistance, however, remains unsolved by this transformation. The first generation of cutoffs in geno2pheno was based on the observation that, on a logarithmic scale, the FC values display a bimodal distribution (Figure 3.3); more precisely, a mixture of two Gaussian distributions, with one Gaussian representing the susceptible subpopulation and the other the resistant subpopulation. Hence, the density of x = log10(FC) can be modeled as

$$ \alpha \cdot \varphi(x; \mu_1, \sigma_1) + (1 - \alpha) \cdot \varphi(x; \mu_2, \sigma_2), $$

with $\varphi(x; \mu, \sigma)$ denoting a normal distribution with mean $\mu$ and standard deviation $\sigma$ (Beerenwinkel et al., 2003a). The model parameters can be efficiently estimated using the expectation-maximization (EM) algorithm (Dempster et al., 1977), and the intersection (between $\mu_1$ and $\mu_2$) of the two (weighted) Gaussians represents a logical candidate for a cutoff between susceptible and resistant. Beerenwinkel et al. (2003a) exploited the bimodal nature of the distribution further for calculating the probability of an FC value belonging to the susceptible subpopulation, that is, where prob(sus|FC) > prob(res|FC) (Beerenwinkel et al., 2003b). To this end, the zero $x_0$ of the log-likelihood function

$$ l(x) = \log \frac{\mathrm{prob}(\mathrm{sus} \mid x)}{\mathrm{prob}(\mathrm{res} \mid x)} $$

is computed and the log-likelihood function is approximated by its tangent L(x) at $x_0$. Finally, the probability of an FC value belonging to the susceptible subpopulation can be approximated with

$$ \mathrm{prob}(\mathrm{sus} \mid \mathrm{FC}) \approx \frac{1}{1 + \exp(-L(x))}, \quad \text{where } x := \log_{10}(\mathrm{FC}). $$

This quantity is also referred to as the activity of a drug d against a virus.
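To make the construction of the activity score more concrete, the following sketch computes the intersection of the two weighted Gaussians and the tangent-based approximation of prob(sus|FC). It is a minimal illustration under stated assumptions: the mixture parameters are invented rather than estimated by EM, l(x) is taken to be the log of the ratio of the two posterior probabilities, and the zero is located by simple bisection.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Logarithm of the normal density with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


# Invented mixture parameters: component 1 = susceptible, component 2 = resistant.
ALPHA, MU1, SIGMA1, MU2, SIGMA2 = 0.6, 0.0, 0.3, 1.5, 0.5


def log_odds(x):
    """l(x) = log(prob(sus|x) / prob(res|x)) under the two-Gaussian mixture."""
    return (math.log(ALPHA) + log_normal_pdf(x, MU1, SIGMA1)
            - math.log(1.0 - ALPHA) - log_normal_pdf(x, MU2, SIGMA2))


def bisect_zero(f, lo, hi, tol=1e-10):
    """Locate a zero of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Intersection of the weighted Gaussians between the two means.
x0 = bisect_zero(log_odds, MU1, MU2)

# Slope of the tangent at x0, estimated by a central finite difference.
h = 1e-6
slope = (log_odds(x0 + h) - log_odds(x0 - h)) / (2.0 * h)


def activity(log10_fc):
    """Approximate prob(sus|FC) using the tangent L(x) = slope * (x - x0)."""
    tangent = slope * (log10_fc - x0)
    return 1.0 / (1.0 + math.exp(-tangent))


print(round(x0, 3), round(activity(MU1), 3), round(activity(MU2), 3))
```

By construction the activity equals 0.5 at the intersection point and decreases as log10(FC) grows.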
Despite its appealing mathematical properties, the probability of belonging to the susceptible subgroup was not favored by the clinicians and virologists, who considered the measure too conservative. That is, viruses are declared resistant against a compound although the drug still shows some clinically useful activity in the patient.

Figure 3.3: Distribution of log10(FC) for the PI NFV and the NNRTI EFV (axes: log10(FC), density, and probability). The Gaussian representing the susceptible (resistant) subpopulation is depicted by a green (red) dashed line. The intersection of the Gaussians is denoted by the vertical dashed line, and the approximation of prob(sus|FC) is given by the blue line.

The current cutoffs used in geno2pheno are so-called clinically relevant cutoffs, meaning that they reflect the clinical usefulness of predicted in vitro resistance. For estimating those cutoff values, special treatment changes were manually examined. In these treatment changes only a single drug was added to the regimen, and the achieved change in viral load can therefore be fully attributed to this new drug in the combination. Hence, clinically relevant cutoffs can be computed by correlating the predicted drug resistance against the compound with its effect on viral load decline (Däumer et al., 2007). These special treatment changes, however, occur only rarely in databases collecting routine treatment information, and consequently the estimated cutoffs are unreliable. As a result, the approach was restricted to the drug LPV, for which most data were available.
A further method for deriving clinically relevant cutoffs (employed for the remaining drugs) was based on the correlation of the predicted resistance to a drug with the viral load measured during a failing treatment containing that drug. In short, a linear model between log10(FC) and log10(VL) was fitted, and the FC corresponding to a VL of $10^{3.41}$ copies per ml (i.e. two standard deviations from the mean of VL in treatment-naïve patients) was selected as the upper cutoff (Däumer et al., 2007). The lower cutoff could not be estimated with this method and was set to the biological cutoff (i.e. two standard deviations from the mean IC50 observed in treatment-naïve patients). The company Virco, with its proprietary FC prediction system VircoTYPE, was facing a similar problem. For deriving clinically relevant cutoffs, Winters et al. (2008) compiled one dataset per drug comprising only regimens containing that drug. Briefly, the treatments are grouped with respect to the estimated background activity of the other drugs (using a preliminary cutoff), thus allowing the relation between the change in viral load and the predicted FC of the target drug to be studied. The lower and upper cutoffs are defined as the FC values at which the drug exhibits a 20% and 80% loss in activity compared to a wild type virus, respectively. The thresholds are optimized iteratively per drug; that is, after values for all drugs have been computed sequentially, the next round of cutoff optimization uses the improved values to achieve a better estimate of the background activity. The iterative procedure was stopped when the computed clinical cutoffs remained stable. Moreover, the stability of the derived cutoffs was estimated using 1000 bootstrap replicates of the initial data (Winters et al., 2008).
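The derivation of the upper cutoff from a linear model between log10(FC) and log10(VL), as summarized above for the approach of Däumer et al. (2007), might look roughly like the sketch below. The data points are invented, and regressing log10(VL) on log10(FC) by ordinary least squares is an assumption, since the text does not specify the direction or the fitting procedure; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Invented pairs of predicted log10(FC) and log10(VL) during failing treatment.
log10_fc = [0.0, 0.5, 1.0, 1.5, 2.0]
log10_vl = [2.5, 3.0, 3.5, 4.0, 4.5]

slope, intercept = fit_line(log10_fc, log10_vl)

# Solve the fitted line for the FC at which log10(VL) reaches 3.41.
TARGET_LOG10_VL = 3.41
upper_cutoff_log10_fc = (TARGET_LOG10_VL - intercept) / slope
upper_cutoff_fc = 10.0 ** upper_cutoff_log10_fc

print(round(upper_cutoff_log10_fc, 2), round(upper_cutoff_fc, 2))
```

With real data the fit would of course be noisy, which is one reason why a lower cutoff could not be obtained in this way.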
To allow a more systematic derivation of clinically relevant cutoffs for the geno2pheno system, we investigated an approach akin to the one applied by Virco. In contrast to that approach, all cutoffs are estimated simultaneously using global probabilistic optimization algorithms like simulated annealing (Kirkpatrick et al., 1983) and genetic algorithms. In the following, we briefly describe our approach.

Material and Methods

Data

The basis for the optimization is a set of treatment change episodes (TCEs) extracted from the EuResist integrated database. Briefly, a TCE comprises the viral genotype (at most three months prior to treatment start), the prescribed drug combination, and a viral load before (at most three months) and after treatment start (within 4-12 weeks). Treatment success was defined as a reduction of the VL below 500 copies per ml blood or a reduction by at least 100-fold compared to the pre-treatment VL measurement. Details on this standard datum definition and the EuResist database are provided in Chapter 5. Application of the definition to the data stored in the current version of the EuResist database resulted in 5,012 TCEs. Of those, 4,514 instances were randomly selected and formed the development set. The remaining 498 TCEs were used as an independent test set. The success rate was 72.5% in both datasets. Furthermore, FC in drug resistance against the drugs applied in a treatment was predicted with geno2pheno.

Global Probabilistic Optimization

Simulated annealing (SA) and a genetic algorithm (GA) were used to compute two cutoffs for every compound. Below (above) the lower (upper) cutoff the activity of the drug was rated 1.0 (0.0), representing full (no) activity. Between the cutoffs the activity was set to 0.5. SA starts with a random set of cutoffs, and a subset of those is randomly modified in each iteration of the algorithm. The cutoffs are used to compute the phenotypic susceptibility score (PSS) of the applied regimen, i.e.
simply the sum of individual drug activities. If a modified set of cutoffs improves the correlation between the PSS and the change in VL (∆VL) on the development set, then it is retained. If, on the other hand, the modified set lowers the correlation, then it is only retained with a small probability. This probability depends on the magnitude of the decrease of the correlation and on the runtime of the algorithm. Accepting slightly worse solutions is a tool to avoid getting stuck in local maxima. Precisely, the probability $p_{\text{accept}}$ of accepting a worse set of cutoffs is

$$ p_{\text{accept}} = \exp\left(\frac{\delta}{T(i)}\right), \quad \text{with} \quad T(i) = \frac{1}{50}\left(1 - \frac{i}{I}\right), $$

where δ is the difference in performance between the old set and the new set (i.e. negative), T(i) is the temperature of the system, I is the maximal number of iterations (here 5000), and i is the current iteration. Based on the current best set of cutoffs, a neighboring set of cutoffs was tested. In order to reduce computational complexity, the continuous IC50 values were searched in steps of 0.05. A neighboring set of cutoffs is defined as a set in which no cutoff differs by more than 0.15 (i.e. three grid steps) from the corresponding value in the current best set. Finally, it was ensured that lower cutoffs were smaller than upper cutoffs. Among all neighboring sets of cutoffs, one set was randomly selected for being tested in the next iteration.

In contrast to SA, GA maintains multiple sets of cutoffs (the population), with each set containing cutoffs for all drugs (one individual). In every iteration of the algorithm (generation), the chance to copy a set of cutoffs depends on its performance (fitness). During the copying process the cutoffs are randomly modified (mutated), and even parts of individuals might be exchanged (cross-over). The algorithm terminates when a stopping criterion is fulfilled.
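The PSS computation and the acceptance rule of simulated annealing described above can be sketched as follows. This is a hedged illustration only: the drug names, resistance values, and cutoffs are invented, the treatment of values lying exactly on a cutoff is an assumption, and the helper names (`drug_activity`, `pss`, `temperature`, `accept`) are hypothetical.

```python
import math
import random

MAX_ITERATIONS = 5000  # I in the acceptance formula


def drug_activity(resistance, lower, upper):
    """Three-level activity: 1.0 below the lower cutoff, 0.0 above the upper, else 0.5."""
    if resistance < lower:
        return 1.0
    if resistance > upper:
        return 0.0
    return 0.5  # assumption: values exactly on a cutoff count as intermediate


def pss(regimen, cutoffs):
    """Phenotypic susceptibility score: sum of the individual drug activities."""
    return sum(drug_activity(value, *cutoffs[drug]) for drug, value in regimen.items())


def temperature(i, max_iterations=MAX_ITERATIONS):
    """T(i) = (1/50) * (1 - i/I)."""
    return (1.0 / 50.0) * (1.0 - i / max_iterations)


def accept(delta, i, rng=random):
    """Accept improvements always; accept worse sets with probability exp(delta / T(i))."""
    if delta >= 0:
        return True
    t = temperature(i)
    if t <= 0:
        return False  # the system is frozen at the last iteration
    return rng.random() < math.exp(delta / t)


# Invented predicted resistance values and cutoffs for a three-drug regimen.
regimen = {"ZDV": 0.2, "3TC": 1.0, "LPV": 2.5}
cutoffs = {"ZDV": (0.5, 1.5), "3TC": (0.5, 1.5), "LPV": (0.5, 1.5)}

print(pss(regimen, cutoffs))            # one fully active, one intermediate, one inactive drug
print(math.exp(-0.02 / temperature(0)))  # acceptance probability for delta = -0.02 at i = 0
```

Because T(i) shrinks linearly to zero, a given decrease in correlation becomes ever less likely to be accepted as the run proceeds.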
Usually, the stopping criterion comprises a maximum number of generations and a maximum number of generations without substantial improvement (stall generations). For GA we used the implementation in the Genetic Algorithm and Direct Search Toolbox of the programming environment Matlab. For robust estimates of the cutoffs, SA and GA were carried out on 100 bootstrap replicates of the original development set, and the mean of the cutoffs obtained from the bootstrap replicates represents the final set of cutoffs. This set was used to compute PSS values on the validation set, where the PSS was correlated with ∆VL and with the binary outcome; performance was measured as Pearson correlation (r) and area under the receiver operating characteristic (ROC) curve (AUC). Briefly, ROC curves depict classifier performance by giving a true-positive rate (TPR; percentage of correctly predicted successes) for every false-positive rate (FPR; percentage of failing therapies that were predicted to be successful). The AUC summarizes the performance and is a convenient measure for comparing scoring systems without the need to provide a particular cutoff (Brun-Vézinet et al., 2004). The AUC is a value between 0 and 1 corresponding to the probability that a randomly selected success receives a higher score than a randomly selected failure (Fawcett, 2006). The achieved performance was compared to the old cutoffs, to the original activity scores, and to Stanford’s HIVdb. In addition to the original approach, we investigated slight modifications, namely the use of AUC instead of correlation as the target function, a linear interpolation in the intermediate region (instead of simply using 0.5), and the combination of both.

Results

Figure 3.4 a) depicts the cutoffs based on the activity score, the old manually derived clinical cutoffs, and the set of new automatically derived clinical cutoffs.
This new set of cutoffs was derived with GA and correlation between PSS and ∆VL as the target function, without interpolation between the cutoffs. Of note, cutoffs based on the bimodal activity model tend to be lower for most drugs than the (old or new) clinical cutoffs, and thereby confirm the subjective impression of virologists and clinicians. When used with linear interpolation between the lower and upper threshold, the new cutoffs achieve an AUC (r) of 0.784 (0.469), compared to 0.759 (0.433) with the old cutoffs, 0.776 (0.461) with the activity score, and 0.781 (0.457) with HIVdb using the five-state interpretation. GA provided an improved set of clinical cutoffs for geno2pheno that performed slightly better than (as good as) HIVdb with respect to correlation (AUC). Figure 3.4 b) depicts the ROC curves on the validation set computed with the old and new clinical cutoffs and the reference methods HIVdb and activity scores. Based on a method by DeLong et al. (1988), the significance of the difference between two AUC values could be assessed. Only improvements over the old clinical cutoffs by any other method showed a statistical trend (new clinical cutoffs: p = 0.0877; activity score: p = 0.1097) or even statistical significance (HIVdb: p = 0.0204). The p-values for differences between the remaining pairs of methods were all larger than 0.19.

Figure 3.4: New and old clinical cutoffs (CCO) in comparison (a). Cutoffs based on the bimodal activity model are shown in the upper bar. The old clinical cutoffs are depicted in the middle bar, and the newly derived cutoffs are depicted below. Green corresponds to susceptible, yellow and red indicate intermediate and full resistance, respectively. Performance of different sets of cutoffs (b). Performance achieved with the different sets of cutoffs on the validation data in comparison to a HIVdb 5.0.0 based prediction.
Table 3.1 lists the achieved performance on the validation set using either 0.5 for the intermediate resistance or the linear interpolation. Of note, for simulated annealing it is beneficial, in general, to use interpolation during the estimation of the cutoffs. In addition, using the correlation between PSS and ∆VL as the target function optimized both r and AUC, while AUC as the target function results in worse r on the validation set. For GA such conclusions cannot be drawn. Here, using AUC instead of correlation as the optimization function achieved slightly better performance in terms of AUC. The upper thresholds for some drugs, however, were extremely large, e.g. ZDV and 3TC received an upper cutoff of approximately 100-fold (data not shown). That is, for these drugs, viruses were regarded as resistant only for FC values of 100 and more.

Table 3.1: Performance of different sets of automatically derived clinical cutoffs. Performance was measured in correlation (r) and AUC on the independent validation set. The rows denote the setting in the training stage (development), i.e. the different target functions and whether interpolation between thresholds was used (I for interpolation, D for discrete). The columns correspond to different settings in the validation stage. The first (last) 4 columns denote performance of cutoffs derived with simulated annealing (genetic algorithm).

Development          Simulated annealing (validation)    Genetic algorithm (validation)
(target, mode)       r (D)   r (I)   AUC (D)  AUC (I)    r (D)   r (I)   AUC (D)  AUC (I)
r, D                 0.455   0.464   0.763    0.774      0.472   0.469   0.780    0.784
r, I                 0.465   0.467   0.759    0.766      0.468   0.471   0.774    0.784
AUC, D               0.421   0.396   0.759    0.756      0.467   0.463   0.786    0.789
AUC, I               0.446   0.436   0.752    0.751      0.479   0.466   0.784    0.790

Overall, cutoffs derived with GA yielded better performance on the independent validation set than the cutoffs derived with SA when the same training setup was used. The method of DeLong et al.
(1988) indicates that the observed improvements in AUC are significant (p-values range from 0.0502 to 0.0041). Clearly, SA and GA are not the only probabilistic global optimization methods. Other methods like ant colony optimization (Dorigo and Gambardella, 1997) can also be used to simultaneously optimize the cutoffs. However, the genetic algorithm performs well, and it is likely that modifications of the target function and of the computation of the PSS play a more pivotal role in the final quality of the cutoffs than the optimization method.

3.3 Improvements Using Semi-Supervised Learning

The advantage of data-driven methods for developing decision support systems, namely that information is derived only from data without interference of expert opinions, is also their major weak point: a sufficient amount of data is required during the training step of the models. This weakness becomes more evident when new drugs are released. Rule-based systems can easily be extended on the basis of first reports on the drug in clinical trials or extended access programs. For the phenotypic resistance test, which is an important step during data generation, the drug has to be available to the laboratory conducting the assay. This, however, is usually only the case when the drug is already FDA approved. Consequently, there is a serious time lag between the approval of a drug and the availability of sufficient data for training statistical models. In a recent work we investigated the use of semi-supervised learning (SSL) methods for improving the prediction of drug resistance (Perner et al., 2009). Briefly, SSL uses labeled data (genotype-phenotype pairs) and unlabeled data (just genotypes from routine diagnostics) to derive improved models. The idea behind SSL is that unlabeled data can provide information on the distribution of the data in the input space.
This information can be exploited by the machine learning algorithm to avoid the separation of clusters, since points in the same cluster are assumed to belong to the same class (cluster assumption). Moreover, SSL methods generally assume that samples that are close in the input space are also close in the output space (smoothness or manifold assumption). Zhu (2007) provides an introduction to SSL and a survey of available SSL methods. We examined two types of unlabeled data: the first category originates from patients who were treated with the particular drug (for which we are building a model) at the time the viral sequence was obtained (S_drug), while the second category required only the use of one drug of the same drug class (S_class) at the time of genotyping. The second set of unlabeled data can be expected to be larger but probably less informative for the learning, as the virus was (at the time of genotyping) not necessarily exposed to the drug in question. We found that if all labeled data (up to 1,000 genotype-phenotype pairs) were used, then the SSL methods showed little benefit over the standard supervised learning methods regardless of the type of unlabeled data. If, however, only little labeled data were available (e.g. 10% of all labeled data), then the SSL methods showed a significant improvement (Figure 3.6; upper row). Unfortunately, the benefit could not be observed for all drugs/drug classes and all tested SSL methods (data not shown). In fact, a violation of the SSL assumptions by the underlying data led to inferior models. For example, the performance of the 3TC model was heavily corrupted by SSL methods. In the case of 3TC, one amino acid exchange is sufficient to confer complete resistance, while for other NRTIs several mutations are necessary. NRTIs are usually given to the patients in pairs.
Thus, viruses that were exposed to 3TC were also exposed to other NRTIs with more complicated resistance patterns. As a consequence, the data density does not reflect the labeling of 3TC resistance, which is a clear violation of the smoothness assumption. The transductive SVM (tSVM) is an SSL version of the SVM and, just like its supervised counterpart, provides a decision boundary (see Eqn. 3.1) after the training phase. Briefly, the standard soft margin SVM optimizes the following function:

$$ \min_{\vec{w}} \; \frac{1}{2}\|\vec{w}\|^2 + C \sum_{i=1}^{N} \xi_i, \quad \text{subject to } \xi_i \ge 0, \; y_i(\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i, \; \forall i \qquad (3.2) $$

where N is the number of labeled training instances, $\vec{x}_i$ and $y_i$ are the features and the label of the ith instance, respectively, $\vec{w}$ and b define the hyperplane, $\xi_i$ are the slack variables that allow for misclassification, and C is the cost parameter for misclassified examples. The tSVM aims at determining a separating hyperplane under consideration of the M unlabeled samples $\{\vec{x}^*_1, \ldots, \vec{x}^*_M\}$; therefore Eqn. 3.2 is extended in the following way:

$$ \min_{\vec{w}} \; \frac{1}{2}\|\vec{w}\|^2 + C \sum_{i=1}^{N} \xi_i + C^* \sum_{j=1}^{M} \xi^*_j, \quad \text{subject to } \xi_i, \xi^*_j \ge 0, \; y_i(\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i, \; y^*_j(\vec{w} \cdot \vec{x}^*_j + b) \ge 1 - \xi^*_j, \; \forall i,j \qquad (3.3) $$

where the additional parameters $\xi^*_j$ and $C^*$ are the slack variables and the misclassification cost parameter for the unlabeled instances, respectively. Thus, the optimization problem in Eqn. 3.3 differs from Eqn. 3.2 in that the tSVM has to find a labeling $y^*_1, \ldots, y^*_M$ for the unlabeled data and a hyperplane $\langle \vec{w}, b \rangle$ simultaneously.

Figure 3.5: Decision boundaries fitted by SVM-based models. Decision boundary derived only from labeled instances (colored circles) using a supervised SVM method (left). The transductive SVM also takes the unlabeled samples (black circles) into account when fitting the decision boundary (right).

An approximate optimization procedure, which is required due to the complexity of the optimization problem, has been
implemented in the software library SVMlight by Joachims (1999). The approach begins with a labeling of $\vec{x}^*_1, \ldots, \vec{x}^*_M$ based on the classification of an inductive SVM and a low weight $C^*$ for the penalty for misclassified unlabeled data points. Then the labels of two randomly selected samples (one positive and one negative) are swapped. If the objective function is improved by that exchange of labels, then the switch is made permanent. This process is repeated until no more switches are possible that yield an improved objective function. At this point the penalty $C^*$ for misclassified unlabeled data points is increased and further labels are swapped to greedily improve the objective function. The iterative procedure stops when $C^*$ exceeds a user-defined value. Figure 3.5 depicts the concept of transductive SVMs. The resulting hyperplane (i.e. decision boundary) has the same form as the one derived with the standard supervised SVM approach. Consequently, the boundary derived with a tSVM can be used to assess the class of new unseen samples. The ability to classify unseen samples is not guaranteed for SSL methods. In fact, many methods only provide a classification for the unlabeled samples that are available during the training phase. Hence, classification of a new sample requires retraining of the entire model with all previous training data and the new unlabeled sample. For instance, low-density separation (Chapelle and Zien, 2005), one of the SSL methods we studied, cannot be used on unseen samples. This inability is a direct result of the fact that low-density separation (LDS) constructs a kernel that is based on all samples (labeled and unlabeled) in the training data. LDS is only capable of making predictions for samples that were used to build the kernel. Nevertheless, the benefit of using tSVMs in the transductive setting (i.e.
only prediction of unlabeled samples used in the training phase) that we previously observed (Perner et al., 2009) could be transferred to the inductive setting (i.e. prediction of new unseen samples without retraining). Figure 3.6 depicts the learning curves (size of labeled training data versus model performance) for the two drugs LPV and ZDV using the tSVM implementation SVMlight by Joachims (1999) and both types of unlabeled data. A standard supervised SVM served as the reference method. The results show that with only very little labeled data (e.g. 2.5%) the SSL models almost reach the performance of the supervised method using all available labeled data. In conclusion, SSL does not yet provide a useful tool for improving the prediction of resistance for the first drug in a novel drug class (e.g. integrase inhibitors), simply because only little relevant unlabeled data exist at the time when the drug is released. For instance, there are currently about 3,800 integrase sequences stored in the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov/), compared to about 75,000 sequences comprising the protease. Moreover, since no integrase inhibitors were available prior to the release of raltegravir, the majority of the available sequences will not contain any resistance mutations. The question of whether SSL provides a benefit for a new drug in an established drug class depends on the resistance profile of the class and on how the drugs are used. For instance, the NNRTI and NRTI models suffered from the circumstance that multiple drugs with different resistance patterns were simultaneously targeting the same viral protein, while PIs generally benefited from SSL. Likewise, future INIs might benefit from sequence data generated after exposure to raltegravir.
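As a rough illustration of the tSVM objective (Eqn. 3.3) and of the label-switching step described in this section, the sketch below evaluates the objective for two candidate labelings of the unlabeled samples. It is not the SVMlight procedure: the hyperplane is held fixed here, whereas the actual algorithm re-optimizes it, the squared norm in the regularization term is assumed, and all data points, parameter values, and helper names are invented for the example.

```python
def hinge_slack(label, x, w, b):
    """Slack variable xi = max(0, 1 - y * (w . x + b)) for one sample."""
    margin = label * (sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def tsvm_objective(w, b, labeled, unlabeled, guessed_labels, cost, cost_star):
    """0.5 * ||w||^2 + C * sum(xi_i) + C* * sum(xi*_j) for a fixed hyperplane."""
    regularizer = 0.5 * sum(w_i * w_i for w_i in w)
    labeled_loss = sum(hinge_slack(y, x, w, b) for x, y in labeled)
    unlabeled_loss = sum(hinge_slack(y, x, w, b)
                         for x, y in zip(unlabeled, guessed_labels))
    return regularizer + cost * labeled_loss + cost_star * unlabeled_loss


# Invented two-dimensional toy data and a fixed hyperplane w . x + b = 0.
w, b = (1.0, 0.0), 0.0
labeled = [((2.0, 0.0), +1), ((-2.0, 0.0), -1)]
unlabeled = [(1.5, 0.0), (-1.5, 0.0)]

# Candidate labeling and the labeling obtained by swapping one positive and one negative label.
before_swap = [-1, +1]
after_swap = [+1, -1]

objective_before = tsvm_objective(w, b, labeled, unlabeled, before_swap, 1.0, 0.1)
objective_after = tsvm_objective(w, b, labeled, unlabeled, after_swap, 1.0, 0.1)

# The swap is kept only if it lowers the objective.
keep_swap = objective_after < objective_before
print(objective_before, objective_after, keep_swap)
```

A low C* keeps the influence of possibly wrong guessed labels small at the start, which matches the schedule of gradually increasing C* described above.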
[Figure 3.6 plot: four panels (LPV and ZDV, upper and lower row) showing AUC against the number of labeled samples. Curves in the upper row: SVM on L; SVMLight on L and S_drug; SVMLight on L and S_class; LDS on L and S_drug; LDS on L and S_class. Curves in the lower row: SVM on L; SVMLight on L and S_drug; SVMLight on L and S_class.]

Figure 3.6: Learning curve for the drugs LPV and ZDV. Model performance is given as the area under the ROC curve (AUC) and based on a 10-fold cross-validation. For LPV (ZDV) a total of 682 (1055) labeled samples (L) were available. For the tSVM 1442 (2717) or 4435 (7887) sequences, which were exposed to LPV (ZDV) or at least one PI (NRTI) at the time of sequencing, respectively, were used. The performance was estimated using 2.5%, 5%, 10%, 20%, 40%, 60%, 80%, and 100% of the labeled data. The upper row depicts the performance of tSVM and LDS in the transductive setting (i.e. test instances were used as unlabeled training data). By contrast, the lower row depicts the performance of tSVM in the inductive setting.

4 Predicting Response to Combination Therapy

In Chapter 2 we introduced the basis of modern anti-HIV therapy. In order to ensure the selection of potent regimens, several genotype interpretation systems are used to infer in vitro drug susceptibility and/or in vivo response to antiretroviral treatment on the basis of HIV-1 genotype.
Most of these tools use a set of rules carefully crafted by experts and classify the virus as susceptible, intermediate, or resistant to each of the individual compounds. Few tools are fully data driven rather than based on expert knowledge. The most prominent examples are geno2pheno (Beerenwinkel et al., 2003a) and VirtualPhenotype (Vermeiren et al., 2007), which both apply methods from statistical learning for predicting in vitro resistance on the basis of genotype. Although all of these methods are designed to infer susceptibility to individual compounds (Vercauteren and Vandamme, 2006), recently developed decision support tools are being explored to infer virological response directly to a typical 3-4-drug HAART regimen. In one study (Larder et al., 2007), artificial neural networks were used to predict the change in viral load, given the sequence, regimen, and additional host-specific features.

In Section 4.2 we introduce the software pipeline geno2pheno-THEO, which predicts the probability of reaching an undetectable viral load during the course of the regimen given the applied drug combination and the genetic makeup of the viral population. Geno2pheno-THEO uses covariates that encode the estimated viral evolution during treatment in addition to drug combination and viral genotype. These evolutionary features are based on works by Beerenwinkel et al. and are briefly summarized in Section 4.1. The resulting statistical model is evaluated on a large external database in a comparison to well-established HIV genotype interpretation systems (Section 4.3).

4.1 Features of Viral Evolution

The features estimating viral evolution that are used to improve the prediction of response to antiretroviral treatment are based on works by Beerenwinkel et al. and are briefly summarized in the following sections.

4.1.1 Activity Score

The activity score for combination treatments is based on the activity score for individual drugs as computed by geno2pheno.
Precisely, let $D$ be the set of all drugs and $T = \{d_1, \ldots, d_n\} \subseteq D$ be a combination of $n$ drugs. Furthermore, $T_c$ represents all drugs of $T$ belonging to drug class $c \in \{\mathrm{NRTI}, \mathrm{NNRTI}, \mathrm{PI}\}$. The activity of a treatment $T$ against virus $seq$ is defined as:

\[ \mathrm{activity}(T, seq) = \sum_{c} \max_{d \in T_c} \mathrm{activity}(d, seq), \]

where $\mathrm{activity}(d, seq) = \mathrm{prob}(\mathrm{sus} \mid \log rf_d(seq))$ as defined in Section 3.2, and $rf_d(seq)$ denotes the resistance factor (RF) of the virus $seq$ against drug $d$. Inhibitors sharing drug target and mechanism of action compete, e.g. for the binding site; thus the activity score is restricted to the most active representative of a class (Jordan et al., 2002). No such restriction holds for drugs from different drug classes, and consequently the activity score is additive for inhibitors from different drug classes. The number of different drug classes therefore constitutes the upper bound of the activity of a treatment. Thus, the activity score is in [0, 3] for our case with at most three drug classes.

Maximizing over drugs in the same class might at first glance seem limiting, since it implies no immediate benefit of administering multiple drugs from the same class. The benefit becomes evident, though, when considering the change of the activity score during the course of treatment. The selective pressure exerted by the treatment leads to mutations in the viral genome, which in turn might affect drugs from the same class differently, so that the most active drug of a class changes over time. For instance, suppose that at the start of a regimen comprising ZDV and 3TC the RT of a virus shows only the T215Y mutation, which solely reduces the susceptibility to ZDV. Thus, the activity of 3TC is unharmed and consequently the activity of the treatment is 1. During therapy, the virus develops another RT mutation, for instance M184V, which completely abolishes the activity of 3TC but does not affect ZDV at all.
The activity score for the combination treatment is now equal to the activity of ZDV, in contrast to a 3TC monotherapy, where the activity score would have decreased immediately to 0. This example illustrates that, in addition to the resistance to drugs at treatment start, the long-term success of an antiviral therapy depends greatly on the ability of the virus to escape from the selective pressure exerted by the treatment. An estimate of the virus' ability to escape from the treatment might therefore be useful information for predicting the success of combination treatments. Such an estimate can be derived by studying changes of activity in the mutational neighborhood of the virus. Due to the high dimension of the sequence space (there are $20^{319}$ possible amino acid sequences of length 319), the neighborhood of the virus is explored using a beam search. Precisely, starting from the original sequence, all variants with one additional mutation are generated and the activity of the treatment against all variants is computed. The resulting activity scores of these in silico mutants are used to rank the variants. Only for the top $b$ mutants showing the least activity are all variants with one additional mutation generated and their activity scores computed. The search stops when $d$ mutations have been introduced into the original sequence. The breadth $b$ and depth $d$ are the two parameters of the beam search. The activity score at depth $r$ is defined by:

\[ \mathrm{activity}_r(T, seq) = \min_{seq' \in B_r} \mathrm{activity}(T, seq'), \]

where $B_r$ denotes the neighborhood of the original sequence with $r$ additional mutations. $B_r$ is generated from the $b$ viruses with the lowest activity score in $B_{r-1}$ by adding an additional mutation. For further details on the activity score see the work by Beerenwinkel et al. (2003b).

4.1.2 Mutagenetic Trees

The beam search approach assumes that every mutation is equally likely, and that only the increase in resistance is important.
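The activity score and the beam search of Section 4.1.1 can be illustrated with a minimal Python sketch. It is not the geno2pheno implementation: the single-drug activity function `toy_activity`, the knockout table, and the candidate mutation list are hypothetical stand-ins chosen to mirror the ZDV/3TC example, and a virus is represented simply as a set of mutation names.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

Virus = FrozenSet[str]  # assumption: a virus is the set of its mutations

# Hypothetical toy model: a drug is fully active unless one specific
# mutation is present. Real activities come from geno2pheno predictions.
DRUG_CLASS: Dict[str, str] = {"ZDV": "NRTI", "3TC": "NRTI", "LPV": "PI"}
KNOCKOUT: Dict[str, str] = {"ZDV": "T215Y", "3TC": "M184V", "LPV": "V82A"}


def toy_activity(drug: str, seq: Virus) -> float:
    return 0.0 if KNOCKOUT[drug] in seq else 1.0


def treatment_activity(treatment: Iterable[str], drug_class: Dict[str, str],
                       drug_activity: Callable[[str, Virus], float],
                       seq: Virus) -> float:
    """Sum over drug classes of the maximum single-drug activity."""
    best: Dict[str, float] = {}
    for drug in treatment:
        cls = drug_class[drug]
        best[cls] = max(best.get(cls, 0.0), drug_activity(drug, seq))
    return sum(best.values())


def beam_search_activity(treatment: List[str], drug_class: Dict[str, str],
                         drug_activity: Callable[[str, Virus], float],
                         seq: Virus, candidates: List[str],
                         breadth: int, depth: int) -> List[float]:
    """Return [activity_0, activity_1, ...] up to the given depth."""
    def score(s: Virus) -> float:
        return treatment_activity(treatment, drug_class, drug_activity, s)

    beam = [seq]
    result = [score(seq)]
    for _ in range(depth):
        # B_r: all variants of the current beam with one additional mutation
        neighbours = {s | {m} for s in beam for m in candidates if m not in s}
        if not neighbours:
            break
        ranked = sorted(neighbours, key=lambda s: (score(s), sorted(s)))
        result.append(score(ranked[0]))   # minimum activity within B_r
        beam = ranked[:breadth]           # keep the b least active mutants
    return result
```

With `frozenset({"T215Y"})` as the starting virus, the treatment `["ZDV", "3TC"]`, and the candidates `["M184V", "K103N"]`, the search finds the M184V escape variant after one step, mirroring the worked example in the text.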
This assumption, however, is an oversimplification. For example, there exist well-known resistance pathways for the first approved anti-HIV drug zidovudine. The thymidine analog mutation (TAM) pathways 1 and 2 comprise the RT mutations 41L, 210W, and 215F/Y, and 67N, 70R, and 219E/Q, respectively. The mutations belonging to one pathway are mainly accumulated in a preferred order (hence pathway). The molecular causes for the preferred order and for the existence of two distinct pathways are still unknown. However, these pathways occur so frequently that they were identified by experts without the use of computational analyses.

Beerenwinkel et al. (2005b) introduced mixtures of mutagenetic trees to estimate mutational pathways from cross-sectional data. Briefly, a mutagenetic tree is a weighted directed tree with vertices representing the appearance of a new mutation (event). Edge weights are conditional probabilities between vertices with the constraint that the parent event has to be present before the child event can occur. Thus, every path from the root of the tree to a vertex represents the ordered accumulation of mutations. Basically, mutagenetic trees are restricted Bayesian networks and constitute an extension of the oncogenetic tree model introduced by Desper et al. (1999) that was restricted to undirected trees. A further extension of the original approach is the ability to generate mixtures of mutagenetic trees.

Formally, a mutagenetic tree is a quadruple $T = (V, E, r, \rho)$, with $V = \{1, \ldots, l\}$ being a set of vertices representing the events, $E$ being the set of edges, $r \in V$ being the root vertex that encodes the absence of any events, and $\rho : E \to [0, 1]$ being the mapping from edges to conditional probabilities. A single tree is constructed by first building the complete digraph $G = (V, V \times V, w)$ on all vertices.
The weights $w$ are defined as

\[ \forall u, v \in V: \quad w(u, v) = \log \Pr(u, v) - \log\bigl(\Pr(u) + \Pr(v)\bigr) - \log \Pr(v), \]

where $\Pr(u)$ and $\Pr(u, v)$ denote the marginal probability of event $u$ and the joint probability of events $u$ and $v$, respectively, as estimated from the data. The edge weights are the logarithm of the independence of the mutations, $\frac{\Pr(u,v)}{\Pr(u)\Pr(v)}$, multiplied by the direction of the dependence, $\frac{\Pr(u)}{\Pr(u)+\Pr(v)}$. The mutagenetic tree is defined as the branching in $G$ that maximizes the sum of its edge weights. Using the algorithm by Edmonds (1967) the maximum weighted branching can be computed in $O(|V||E|)$ time; for a fully connected graph this corresponds to $O(|V|^3)$. The weighting function $w(u, v)$, however, displays an undesired behavior for edges leaving the root node, since the less likely events receive a higher score $w(r, v) = -\log(1 + \Pr(v))$. Thus, for edges leaving the root node the alternative weighting function $w(r, v) = \log \Pr(v)$ is used.

The likelihood of a mutational pattern $x$ is the probability that a given mutagenetic tree $T$ generates $x$: $L(x \mid T) = \Pr(x \mid T)$. Let $S \subseteq V$ be the set of events defined by the mutational pattern $x$. If there is a subset of edges $E' \subseteq E$ such that exactly all vertices of $S$ can be reached in the subtree $(V, E')$, then $x$ can be generated by $T$ with likelihood:

\[ L(x \mid T) = \prod_{e \in E'} \Pr(e) \prod_{e \in S \times (V \setminus S)} \bigl(1 - \Pr(e)\bigr). \]

The first product computes the probability of reaching all events in $S$ and the second product represents the probability of not going beyond the events in $S$. If there is no appropriate subtree of $T$, then the pattern $x$ cannot be generated by $T$. Hence, $L(x \mid T) = 0$. For instance, the right tree in Figure 4.1 does support the mutational pattern 215F, 41L but not the pattern 215F, 41L, and 70R, since the mutations 67N and either 219E or 219Q have to be acquired before the mutation at position 70.
Thus, the likelihood of the pattern 215F, 41L, and 70R to be generated by that tree is 0. On the other hand, the likelihood of the pattern 215F, 41L is 0.4 × 0.77 × (1 − 0.61) × (1 − 0.45).

The mixture of mutagenetic trees is estimated from cross-sectional data in an EM-like learning algorithm (Dempster et al., 1977). Formally, a k-mutagenetic tree mixture model is defined as

\[ M = \sum_{i=1}^{k} \alpha_i T_i \quad \text{with} \quad \alpha_i \in [0, 1] \ \wedge\ \sum_{i=1}^{k} \alpha_i = 1, \]

with $T_i = (V, E_i, r, \rho_i)$. Consequently, the likelihood of a pattern $x$ given the mixture model is defined as

\[ L(x \mid M) = \sum_{i=1}^{k} \alpha_i L(x \mid T_i). \]

In order to handle noisy real-world data, one component of the mixture is forced to attain a star-like topology modeling the independence of mutations. Briefly, during the learning phase we want to find the mixture of $k$ mutagenetic trees that maximizes the log-likelihood of the training data:

\[ \sum_{n=1}^{N} \log \sum_{i=1}^{k} \alpha_i L(x_n \mid T_i). \tag{4.1} \]

The responsibility $\gamma_{ni}$ of the tree $T_i$ for the training sample $x_n$ is defined as the probability of $x_n$ being generated by $T_i$ given the mixture model $M$. In the M-like step of the training algorithm the parameters of the current mixture model, i.e. the edge weights of the star component, the $k - 1$ trees, and the mixture parameters $\alpha_i$, are updated. In the E step the responsibilities are computed using the updated model. E and M step are iterated until the log-likelihood converges. At the initialization of the algorithm initial responsibilities are assigned according to a (k − 1)-means clustering of the training data. Algorithm 1 is adapted from Beerenwinkel et al. (2005b) and summarizes the complete training process.

For each antiretroviral drug one mixture of mutagenetic trees is trained. The training data for each drug comprises sequences from patients that were obtained during treatment with that drug. Figure 4.1 depicts a 2-mutagenetic tree mixture model learned for the drug ZDV.
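The tree likelihood, the mixture likelihood, and the responsibilities can be sketched in a few lines of Python. This is an illustration only, not the Mtreemix code. The tree below is hypothetical: its topology and edge probabilities are chosen merely to be consistent with the worked example (0.4 × 0.77 × (1 − 0.61) × (1 − 0.45)) and do not reproduce Figure 4.1; the star probabilities and mixture weights are likewise made up.

```python
from typing import Dict, FrozenSet, List, Tuple

ROOT = "wild type"
# A tree is stored as child -> (parent, conditional probability of the edge).
Tree = Dict[str, Tuple[str, float]]

# Hypothetical tree, loosely modelled on the two TAM pathways.
TREE: Tree = {
    "215F": (ROOT, 0.40), "41L": ("215F", 0.77), "210W": ("41L", 0.61),
    "67N": (ROOT, 0.45), "219Q": ("67N", 0.50), "70R": ("219Q", 0.50),
}
# Star component: every event hangs directly below the root.
STAR: Tree = {event: (ROOT, 0.2) for event in TREE}


def tree_likelihood(tree: Tree, pattern: FrozenSet[str]) -> float:
    """L(x|T): product over used edges times product over unused exits."""
    if not all(event in tree for event in pattern):
        return 0.0
    reached = set(pattern) | {ROOT}
    likelihood = 1.0
    for child, (parent, prob) in tree.items():
        if child in pattern:
            if parent not in reached:
                return 0.0            # child present without its parent
            likelihood *= prob        # edge inside the pattern
        elif parent in reached:
            likelihood *= 1.0 - prob  # edge leaving the pattern, not taken
    return likelihood


def mixture_likelihood(trees: List[Tree], alphas: List[float],
                       pattern: FrozenSet[str]) -> float:
    """L(x|M) = sum_i alpha_i * L(x|T_i)."""
    return sum(a * tree_likelihood(t, pattern) for t, a in zip(trees, alphas))


def responsibilities(trees: List[Tree], alphas: List[float],
                     pattern: FrozenSet[str]) -> List[float]:
    """E step: gamma_i = alpha_i L(x|T_i) / L(x|M), assuming L(x|M) > 0."""
    total = mixture_likelihood(trees, alphas, pattern)
    return [a * tree_likelihood(t, pattern) / total
            for t, a in zip(trees, alphas)]
```

Because the star component supports every pattern, the mixture likelihood stays positive even for patterns such as 215F, 41L, 70R that the tree component cannot generate; in that case the star receives the full responsibility.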
The left component in Figure 4.1 is a star that supports every possible pattern and therefore models the noise of the data. The right component is a tree estimated from data. The mixture weights $\alpha$ are depicted above the components. The actual tree estimated from the data is able to generate 72% of the instances found in the training data.

The mixture of mutagenetic trees can be used to derive two evolutionary features. The genetic progression score provides an estimate of the expected waiting time for a mutational pattern to occur and therefore summarizes how far the virus has advanced in terms of drug resistance. The genetic barrier to drug resistance is the probability that the virus will not escape from drug pressure by developing further mutations. Both features are briefly described in the next sections.

Algorithm 1 k-mutagenetic tree learning ($X = (x_{nj})_{1 \le n \le N,\ 1 \le j \le l}$, $k$)
$N$ is the number of training instances, and $l$ is the number of events.
1. Guess initial responsibilities:
   (a) Run the (k − 1)-means clustering algorithm.
   (b) Set responsibilities
       \[ \gamma_{ni} = \begin{cases} \frac{1}{2} & \text{if } x_n \text{ is in cluster } i - 1, \\ \frac{1}{2(k-1)} & \text{else.} \end{cases} \]
2. M-like step. Update model parameters:
   $N_i = \sum_{n=1}^{N} \gamma_{ni}$ for all $i \in \{1, \ldots, k\}$.
   $T_1$ is a star with edge weights $\beta = \frac{1}{l N_1} \sum_{j=1}^{l} \sum_{n=1}^{N} \gamma_{n1} x_{nj}$.
   For all $i \in \{2, \ldots, k\}$:
   (a) Estimate the joint probability for all pairs of events $(u, v)$:
       \[ p_i(u, v) = \frac{1}{N_i} \sum_{n=1}^{N} \gamma_{ni} x_{nu} x_{nv}. \]
   (b) Compute the maximum weight branching $T_i$ from the complete digraph with weights $w$ derived from $p_i$.
   (c) Compute the mixture parameter $\alpha_i = \frac{N_i}{N}$.
3. E step. Compute responsibilities:
   \[ \gamma_{ni} = \frac{\alpha_i L(x_n \mid T_i)}{\sum_{m=1}^{k} \alpha_m L(x_n \mid T_m)}. \]
4. Iterate steps 2 and 3 until convergence, i.e. no changes in Eqn. 4.1.
return $T_1, \ldots, T_k$ and $\alpha_1, \ldots, \alpha_k$

The Genetic Progression Score

The genetic progression score (GPS) was introduced by Rahnenführer et al. (2005).
Briefly, a timed mutagenetic tree can be obtained by assuming independent Poisson processes for the occurrence of events on the edges and for the sampling time of the virus. The difference between the occurrence time of event $j$ and that of its parent is denoted by $Z_j$. Since event $j$ was generated by a Poisson process, the waiting time $Z_j$ is exponentially distributed with parameter $\lambda_j$. Furthermore, let the sampling time of the virus $Z_S$ be exponentially distributed with parameter $\lambda_S$. The probability of event $j$ can be computed as $\Pr(j) = \lambda_j / (\lambda_j + \lambda_S)$. Thus, the expected waiting time for event $j$ is:

\[ E[Z_j] = \frac{1}{\lambda_j} = \frac{1 - \Pr(j)}{\Pr(j)\,\lambda_S} = \frac{1 - \Pr(j)}{\Pr(j)}\, E[Z_S]. \]

Estimation of $\lambda_S$ from real data is usually impossible, since the time of infection is usually unknown and the virus might have had events before starting the particular drug. Thus, we set $E[Z_S] = 1$, which defines a unit-less waiting time. We emphasize that the GPS is not intended to estimate absolute waiting times. Rather, it provides a dimension-free measure of genetic progression that allows for comparing mutational patterns of different viruses.

Figure 4.1: Example mutagenetic tree mixture.

Unfortunately, apart from the case of a single event, there exists no formula for computing the waiting time for an arbitrary pattern of events. However, within the timed mutagenetic tree model the expected waiting time can be estimated by simulating the waiting process. To this end, the estimated probabilities $\Pr(j)$ are used to compute the required $\lambda_j = \Pr(j) / (1 - \Pr(j))$. For a sufficiently large number of simulations one obtains a stable estimate of the waiting time $Z_x$ for the mutational pattern $x$ with respect to the distribution induced by the mutagenetic mixture model $M$. This waiting time is the genetic progression score: $\mathrm{GPS}(x) = E_M[Z_x]$. Of note, the GPS of a virus is computed for every drug (mutagenetic tree model) separately.
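The single-event relations above can be checked numerically. The following Python sketch is only an illustration under the stated model assumptions (exponential waiting times, $E[Z_S] = 1$); it covers one event below the root and is not the simulation of full mutational patterns used for the GPS. The function names are hypothetical.

```python
import random
from typing import Tuple


def rate_from_probability(pr_j: float, rate_sampling: float = 1.0) -> float:
    """lambda_j = Pr(j) * lambda_S / (1 - Pr(j)); lambda_S = 1 by default."""
    return pr_j * rate_sampling / (1.0 - pr_j)


def expected_wait(pr_j: float, mean_sampling_time: float = 1.0) -> float:
    """E[Z_j] = (1 - Pr(j)) / Pr(j) * E[Z_S]."""
    return (1.0 - pr_j) / pr_j * mean_sampling_time


def simulate_single_event(pr_j: float, n: int = 200_000,
                          seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimate of (Pr(event before sampling), mean waiting time)."""
    rng = random.Random(seed)
    lam = rate_from_probability(pr_j)
    observed = 0
    total_wait = 0.0
    for _ in range(n):
        z_j = rng.expovariate(lam)   # waiting time of the event
        z_s = rng.expovariate(1.0)   # sampling time with E[Z_S] = 1
        observed += z_j < z_s        # event is seen if it precedes sampling
        total_wait += z_j
    return observed / n, total_wait / n
```

For `pr_j = 0.4` the closed form gives an expected waiting time of 1.5 units, and the simulated frequency of observing the event before sampling should come out close to 0.4.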
Due to the definition of $E[Z_S] = 1$ the waiting times cannot be compared between drugs.

The Genetic Barrier to Drug Resistance

Instead of inspecting the past of the virus, i.e. to which state(s) in the (mixture of) tree(s) the virus has already progressed, we can use the evolutionary model for studying the future of the virus, i.e. to which states it will most likely move. Briefly, the genetic barrier is defined as the probability of not escaping from a drug by developing further mutations, where escape from a drug is defined as exceeding a predefined threshold of phenotypic drug resistance. In the mutagenetic tree model it is sufficient to describe the given viral sequence as one of $2^l$ possible mutational patterns. Given a tree topology one can now compute the transition probability from the current pattern to any of the $2^l$ possible patterns. If the tree topology does not allow the transition, e.g. when losing mutations, then the probability is set to 0. Furthermore, using the paired genotype-phenotype data that was used to train geno2pheno, one can compute the mean phenotypic resistance associated with each mutational pattern. Due to the limited number of genotype-phenotype pairs it is likely that some mutational patterns do not occur; in this case the weighted mean of all mutational patterns with non-zero transition probabilities is used, with the transition probabilities serving as weights. The genetic barrier to drug resistance is then simply defined as the sum of all transition probabilities to mutational patterns with phenotypic resistance below a certain threshold. The genetic barrier to drug resistance thus combines evolutionary information, the transition probabilities, with phenotypic information.

4.2 geno2pheno-THEO

This section describes a joint work with Niko Beerenwinkel, Tobias Sing, Igor Savenkov, Martin Däumer, Rolf Kaiser, Soo-Yon Rhee, W Jeffrey Fessel, Robert W Shafer, and Thomas Lengauer.
The work was published in Antiviral Therapy under the title “Improved prediction of response to antiretroviral combination therapy using the genetic barrier to drug resistance” (Altmann et al., 2007a). With about 24 drugs available and novel drugs being approved almost every year, it becomes increasingly difficult for the treating physician to select an optimal drug combination. There are a variety of different therapeutic goals, including VL reduction, increase in CD4+ T cell counts, minimization of adverse effects and preservation of future drug options. Importantly, different optimization criteria will tend to favor different therapies (Jiang et al., 2003). To date, most methods predict virological response to therapy based on the baseline genotype and the compounds in the applied combination. Specifically, artificial neural networks (Wang et al., 2003) and fuzzy rules combined with a genetic algorithm (Prosperi et al., 2004) were used in this manner to predict the change in VL. Related approaches include the application of case-based reasoning (Prosperi et al., 2005) and combinatorial optimization based on expert rules (Lathrop and Pazzani, 1999). In this section, we report new ways of analyzing treatment change episodes (TCE) and demonstrate how these new approaches might be more useful than current methods for prediction of response to therapy. The improvement results from incorporating genetic analysis, phenotypic prediction, and a prediction of the probability that further evolution of resistance will occur. The applied methodologies involve various techniques of statistical learning. We dichotomize virological response and compare the performance of several classifiers that predict the therapeutic success or failure of each genotype-therapy pair. The novel features derived from genotype and drug combination encode information about the evolutionary potential of the virus and the predicted level of phenotypic drug resistance. 
Specifically, we consider predicted phenotypes (see Section 2.5.2), the activity score based on a heuristic search over in silico mutants (see Section 4.1.1), the genetic barrier and the genetic progression score (see Section 4.1.2). Analyzing over 6,300 TCEs observed in a clinical setting, we show that all new descriptors significantly improve prediction of therapy outcome, especially if they combine evolutionary and phenotypic information.

[Figure 4.2 timeline diagram with three panels: (a) failure, (b) success, (c) sustained success. Panel annotations: therapy start and stop, sequencing, > 0 days, > 14 days, < 90 days, VL < 400 cp/ml, > 56/112 days.]

Figure 4.2: A treatment change episode (TCE) resulting in sequencing is always considered to be a failure (a). If the virus is undetectable during the treatment following a failure, then the TCE is called a success (b). In addition, the alternative definition for successful TCEs requires two subsequent viral load (VL) measurements below the threshold with at least eight (or 16) weeks in between (c).

4.2.1 Material and Methods

Treatment change episodes

A TCE (Larder et al., 2003) consists of a baseline genotype, a drug combination, and a binary outcome indicating success or failure of the regimen. For our analysis, valid successful or failing TCEs were defined as follows (Figure 4.2): any available genotype is considered as evidence of a failing regimen, because, in general, sequencing can only be performed if the VL exceeds ∼1,000 copies/ml. Successful regimens are defined by inspecting therapies that follow a genotype measurement. When multiple genotypes are available, the most recent sequence sample before the onset of therapy was used.
If the VL decreases below 400 copies/ml at least once during the course of the follow-up therapy and if genotyping was performed no earlier than three months before starting the therapy, then the respective treatment is considered a success. This definition of success focuses on initial response. Sustained response is also investigated by using an alternative definition of success that requires a second follow-up VL value below the threshold at least 8 weeks (or 16 weeks) after the first (Figure 4.2 c).

Figure 4.3: The 6,337 analysed TCEs comprise a total of 875 distinct combination therapies. The histogram summarizes all 354 drug combinations that occur at least three times in the dataset. Another 116 combinations appeared only twice and 405 only once. Grey bars indicate successful therapies defined as initial virological responses below 400 copies/ml. The inset histogram shows the 20 most abundant drug combinations.

Datasets

We analysed data obtained from the Stanford HIV Drug Resistance Database (comprising data from clinical studies ACTG 320, ACTG 364, GART and HAVANA) and from two Northern California clinic populations undergoing genotypic resistance testing at Stanford University. From a total of 25,717 therapies, 10,288 sequences, and 6,706 patients, we extracted 6,337 TCEs, including 4,776 failures and 1,561 successes, according to the definition based on initial response. The median time period between treatment change and the first follow-up VL measurement <400 copies/ml is 38 weeks (interquartile range: 16-78 weeks). Figure 4.3 shows the distribution of drug combinations for this dataset (denoted A). The alternative definitions resulted in 1,082 and 900 successes for eight and 16 weeks sustained response, respectively. Because of the lack of data on the drug enfuvirtide (at the time of the study the only approved EI), we considered only regimens consisting of NRTIs, NNRTIs and PIs.
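The success definitions from the paragraph on treatment change episodes can be expressed as a small Python sketch. The record layout (`FollowUpTherapy`) and the function name are hypothetical and not part of the original pipeline; "three months" is taken as 90 days (as annotated in Figure 4.2), and the sustained-response rule is interpreted as "some later VL below the threshold at least the given number of weeks after the first one", which is an assumption about the exact rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class FollowUpTherapy:
    """Hypothetical record of a therapy that follows a genotype measurement."""
    genotype_date: date
    start: date
    stop: date
    viral_loads: List[Tuple[date, float]]  # (measurement date, copies/ml)


def is_success(t: FollowUpTherapy, threshold: float = 400.0,
               max_genotype_age_days: int = 90,
               sustained_weeks: Optional[int] = None) -> bool:
    """Initial response by default; sustained response if weeks are given."""
    genotype_age = t.start - t.genotype_date
    if not timedelta(0) <= genotype_age <= timedelta(days=max_genotype_age_days):
        return False  # genotype too old, or taken after therapy start
    # Dates of all VL values below the threshold during the therapy
    below = sorted(d for d, vl in t.viral_loads
                   if t.start < d <= t.stop and vl < threshold)
    if not below:
        return False
    if sustained_weeks is None:
        return True
    # Assumed reading: a second value below threshold late enough after the first
    return below[-1] - below[0] >= timedelta(weeks=sustained_weeks)
```

Calling `is_success(t)` applies the initial-response definition, while `is_success(t, sustained_weeks=8)` or `sustained_weeks=16` applies the two alternative definitions.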
From dataset A, we generated two special subsets that are balanced with respect to sequences (BS) and with respect to therapies (BT), respectively. The dataset BS contains exactly one successful and one failing regimen for each of 1,364 sequences. Each genotype in this dataset gives rise to two TCEs, defined by the regimens before and after a successful treatment change. Similarly, in the set BT each regimen is paired with two different sequences, giving rise to exactly one successful TCE and one failure. This selection resulted in a total of 2,436 TCEs comprising 321 different regimens. If, for a fixed drug combination, there were more sequences resulting in failure than in success, then failures were selected randomly to match the number of successes (or vice versa).

Dataset A was split into two groups denoted A1 and A2 according to the clinical centre in which the TCE was observed. Group A1 comprises 816 successes and 2,098 failures from one health care provider, and group A2 includes the remaining 745 successes and 2,678 failures from clinical studies and other hospitals. By using, in turn, one of these subsets for training and the other for testing, we seek to identify possible biases that might be related to the specifics of individual clinics, such as preferred treatment protocols. Datasets BS and BT were split in the same manner.

In addition to the clinical data, we also used in vitro data on phenotypic drug resistance. For each drug, 880 sequences with fold-change (FC) determined by a recombinant virus assay (Walter et al., 1999) were available. These data were used to regress phenotype on genotype and to identify resistant states in defining the genetic barrier.

Features

The baseline approach for predicting TCE outcome uses one indicator variable for each resistance-associated mutation and one for each drug.
For this approach and approaches based on mutagenetic trees (Section 4.1.2), we considered the resistance mutations presented in Johnson et al. (2005), resulting in 66 binary variables (49 mutation indicators, 17 drug indicators). We refer to this encoding of genotypes and therapies as the indicator representation. All other encodings contain these straightforward covariates and additional, more elaborate, features. The phenotype representation includes, for each drug in the respective regimen, the predicted FC in susceptibility. These predictions are based on a linear support vector machine that has been trained on the 880 matched genotype-phenotype pairs as described previously (Section 2.5.2). In the activity representation, the indicator vector is extended by the estimated activity of the drug cocktail against the virus population. For the beam search, both parameters, breadth and depth, are set to 10. The genetic barrier representation also adds both phenotypic and evolutionary information to the indicator representation, and it can be regarded as an advancement over the activity score. More precisely, we used mutagenetic trees, a family of probabilistic graphical models, to estimate the order and rate of occurrence of resistance mutations. Using the Mtreemix software¹ (Beerenwinkel et al., 2005c), for each drug, a mixture model of mutagenetic trees was learned from sequences derived under regimens comprising that drug. The number of trees was selected using the Bayesian information criterion (BIC) (Yin et al., 2006). A validation of mutagenetic tree models in terms of tree stability and goodness of fit has been presented in Beerenwinkel et al. (2005b). Viral escape is approximated by exceeding a predefined level of phenotypic resistance. These levels are defined by the cutoffs listed in Table 4.1 and are exactly the (old) clinical cutoffs for geno2pheno (Section 3.2).
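Given transition probabilities and mean resistance values, the genetic barrier reduces to a thresholded sum. The Python sketch below is illustrative only: pattern names, probabilities, and fold-change values in the usage note are made up, and the fallback for patterns without phenotype data follows one possible reading of the description in Section 4.1.2 (transition-weighted mean over reachable patterns that do have data).

```python
from typing import Dict


def genetic_barrier(transitions: Dict[str, float],
                    mean_resistance: Dict[str, float],
                    cutoff: float) -> float:
    """Sum of transition probabilities to patterns below the resistance cutoff.

    transitions:     pattern -> transition probability from the current pattern
    mean_resistance: pattern -> mean phenotypic resistance (fold change);
                     patterns without phenotype data are simply missing
    """
    # Assumed fallback for unobserved patterns: transition-weighted mean of
    # the resistance of all reachable patterns that do have phenotype data.
    known = {p: w for p, w in transitions.items()
             if w > 0 and p in mean_resistance}
    weight = sum(known.values())
    if weight > 0:
        fallback = sum(w * mean_resistance[p] for p, w in known.items()) / weight
    else:
        fallback = float("inf")  # no data at all: treat as resistant
    return sum(w for p, w in transitions.items()
               if mean_resistance.get(p, fallback) < cutoff)
```

As a usage example with the ZDV cutoff of 30.0 from Table 4.1: for hypothetical transitions `{"215Y": 0.3, "215Y+41L": 0.5, "215Y+41L+210W": 0.2}` with mean fold changes 8.0, 25.0, and 120.0, only the first two patterns stay below the cutoff, so their transition probabilities are summed.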
Unlike the sequence space search for low-activity mutants, the genetic barrier accounts for the fact that not all mutations are equally likely to occur. This is also an advantage over simple counting of resistance mutations, a frequently employed approximation to the genetic barrier. Finally, the genetic progression score (GPS) involves only evolutionary information that is extracted from the mutagenetic tree models. The GPS of a genotype is defined as the expected waiting time for the mutational pattern to occur. Thus, the GPS also accounts for different probabilities of different mutations, but it does not include any phenotypic information.

drug   cutoff   trees
ZDV    30.0     5
ddC    2.2      2
ddI    2.4      4
d4T    2.0      4
3TC    15.4     5
ABC    3.4      7
TDF    2.1      3
SQV    4.5      4
IDV    4.6      4
RTV    2.6      4
NFV    5.8      6
APV    12.0     3
LPV    10.0     5
ATV    4.2      2
NVP    9.0      5
DLV    9.7      2
EFV    7.0      4

Table 4.1: Resistance cutoff used for each drug and number of trees in the mutagenetic tree mixture model.

¹ http://mtreemix.bioinf.mpi-sb.mpg.de/

Statistical Learning Methods

The five feature sets defined the input to several different machine-learning techniques, which were used to predict treatment response. We selected several standard classification methods (Hastie et al., 2001, pp. 79-114), including linear discriminant analysis (LDA), which was used in Beerenwinkel et al. (2003b) together with the activity representation, least-squares regression, linear support vector machines (SVMs), decision trees (C4.5 software²), and logistic regression. We also included the more recent method of logistic model trees (LMT), which combine decision trees with logistic regression in the leaves of the tree (Landwehr et al., 2005).

Receiver Operating Characteristic Analysis

We used receiver operating characteristic (ROC) curves to compare the predictive power of classifiers. ROC curves arise from varying a parameter of the classifier that controls the trade-off between sensitivity and specificity.
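How a ROC curve arises from varying a decision threshold, and how the area under it is obtained, can be shown with a short Python sketch. This is a plain re-implementation for illustration, not the ROCR package used in this work; the scores and labels in the usage note are invented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """One ROC point per decision threshold; label 1 = success, 0 = failure."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points: List[Point] = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict "success" for every sample scoring at least the threshold
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, eight of the nine success/failure pairs are ranked correctly, and `auc(roc_points(scores, labels))` evaluates to 8/9 accordingly.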
Each point on the ROC curve represents one classifier, and the curve allows for reading off its false positive and true positive rate. For example, the point (0.1, 0.8) represents a classifier that will falsely predict 10% of failing regimens as “success”, but correctly detect 80% of the successful regimens. The comparison of ROC curves is preferred to comparing error rates, because it corrects for skewed class distributions (as in dataset A) and controls both sensitivity and specificity of the classifier (Brun-Vézinet et al., 2004). The area under the ROC curve (AUC) is used as a summary performance measure. The trivial classifier that makes random predictions produces a linear ROC curve with an AUC of 0.5. The maximum AUC a classifier can achieve is 1.0 and the larger this value, the better its prediction performance. AUC values were compared using Wilcoxon rank-sum tests. The ROCR software³ was used for ROC analysis (Sing et al., 2005a).

² http://www.rulequest.com
³ http://bioinf.mpi-sb.mpg.de/projects/rocr/

4.2.2 Results

Table 4.2 shows the performance of all learning techniques in combination with all feature encodings for the complete dataset A using the initial response definition of therapeutic success. Classifier performance was estimated by 10-fold cross-validation and is reported as the AUC. All of the proposed extensions of the indicator representation improved predictive power (Table 4.2, last column). This improvement was observed across all statistical learning methods. The additional features activity score and GPS yield the same predictive power on average. The largest improvement is achieved by the genetic barrier and by phenotype predictions, which both use phenotypic information for prediction. Compared with the feature encoding, the choice of the learning method has only a small effect.
On average, the AUC differs by as much as 0.064 (7.7%) between different feature encodings, but only by 0.027 (3.2%) between different learning methods. The AUC of the phenotype representation decreases if combined with decision trees. However, as illustrated in Figure 4.4, the ROC curves reveal that this difference is mainly due to lower true positive rates at very high false positive rates, which might not be relevant for practical purposes. We restrict the further analysis of ROC curves to the LMT learning method, which, on average, outperformed all other techniques (Table 4.2). In Figure 4.4, the ROC curves for the five different feature encodings used with LMT on dataset A are shown. The indicator representation (black line) is improved significantly by all four additional features (p<0.0002). This advancement is most prominent for the genetic barrier that incorporates phenotypic information indirectly (p=0.0002), followed by the predicted phenotype, the activity score and the GPS. If we accept a false positive rate of 10%, then the indicator representation will detect 65% of the successful TCEs correctly (represented by the point [0.1, 0.65] on the black line in Figure 4.4), whereas the genetic barrier representation achieves a true positive rate of 76.6%. The true positive rate can be further increased by accepting more false positives. For example, if 90% of the successes were to be detected, we would have to accept 19.5% false positives for the genetic barrier or the phenotype encoding, and 35.5% for the indicator representation. The differences between the four encodings are even more strongly articulated in the analysis of the two balanced subsets of the complete dataset. In the dataset BS, each viral genotype is paired with a drug combination that gave rise to a successful TCE and with another TCE that resulted in failure. Thus, the genotype alone does not provide any information on the outcome of these TCEs.
As might have been expected, the GPS does not improve the predictive power on this dataset, because this feature is derived only from the genotype (Figure 4.5 a). By contrast, the remaining three encodings, namely activity, genetic barrier, and phenotype, enhance the performance significantly (p<0.004). The genetic barrier encoding outperforms the activity encoding, and the phenotype representation provides the best encoding on this dataset. However, the difference between genetic barrier and phenotype is negligible (p>0.9). In the dataset BT, the therapies (rather than the sequence, as in BS) are balanced. For every therapy there exists the same number of genotypes that gave rise to successful and to failing TCEs. Here, the drug combination alone does not provide any information on the outcome of the TCEs. Usage of the GPS on this dataset increases the performance of the indicator representation to the level reached with the activity representation (Figure 4.5 b).

Encoding           LDA            LSR            SVM            C4.5           LOGR           LMT            Mean
Indicator          0.825 (0.005)  0.825 (0.005)  0.819 (0.009)  0.845 (0.005)  0.820 (0.007)  0.868 (0.004)  0.834 (0.008)
+ Phenotype        0.912 (0.004)  0.911 (0.004)  0.910 (0.004)  0.839 (0.005)  0.912 (0.002)  0.905 (0.004)  0.898 (0.012)
+ Activity         0.868 (0.005)  0.870 (0.005)  0.864 (0.006)  0.842 (0.007)  0.868 (0.003)  0.898 (0.004)  0.868 (0.007)
+ Genetic barrier  0.891 (0.005)  0.892 (0.003)  0.891 (0.005)  0.875 (0.005)  0.891 (0.002)  0.916 (0.005)  0.893 (0.005)
+ GPS              0.856 (0.006)  0.856 (0.005)  0.864 (0.005)  0.861 (0.005)  0.868 (0.005)  0.899 (0.004)  0.867 (0.007)
Mean               0.870 (0.015)  0.871 (0.015)  0.870 (0.015)  0.852 (0.007)  0.872 (0.015)  0.897 (0.008)

Table 4.2: Classifier performance on the full dataset A. The table displays, for all combinations of feature encodings (rows) and learning techniques (columns), the resulting area under the receiver operating characteristic (ROC) curve (AUC) and its standard error (in parentheses) computed using 10-fold cross-validation. C4.5, C4.5 software; GPS, genetic progression score; LDA, linear discriminant analysis; LMT, logistic model trees; LOGR, logistic regression; LSR, least-squares regression; SVM, linear support vector machines.

[Plot: true positive rate versus false positive rate; curves: indicator, + phenotype, + phenotype (C4.5), + activity, + genetic barrier, + GPS]

Figure 4.4: ROC curves for the complete dataset A using LMT (unless stated otherwise in the legend) and 10-fold cross-validation. Every feature encoding is represented by a receiver operating characteristic (ROC) curve, namely the baseline indicator representation (indicator), and the following additional features: predicted phenotypes, activity score, genetic barrier, and genetic progression score (GPS). Each point on the curve represents a classifier and allows determining its true positive rate and false positive rate. For example, the point (0.355, 0.9) on the black line represents a logistic model trees (LMT) classification model trained on the plain indicator representation with an expected 35.5% of false positives and 90% of true positives.

[Plots: true positive rate versus false positive rate; panels: (a) Dataset BS, (b) Dataset BT; curves: indicator, + phenotype, + activity, + genetic barrier, + GPS]

Figure 4.5: ROC curves for dataset BS (a) and dataset BT (b) using LMT and 10-fold cross-validation. BS refers to “balanced with respect to sequences” and likewise BT refers to “balanced with respect to therapies”. For further details see caption of Figure 4.4.
Similarly to dataset BS, application of the phenotypic and the genetic barrier representation results in maximal performance. Here, the genetic barrier encoding outperforms the phenotype encoding, but the difference does not reach statistical significance (p=0.1655). In the interesting region of low false positive rates (<30%), however, the difference can be substantial. For example, at a false positive rate of 10% the indicator approach recognizes 28% of the successful TCEs correctly, GPS yields a true positive rate of 54.7%, using the activity score increases true positives to 54.5%, the phenotype representation achieves almost 58%, and the genetic barrier recognizes as many as 64.9% of successes correctly. This is more than double the true positive rate of the indicator encoding alone. In order to investigate possible biases in the estimated models resulting from the specific clinical centres which collected the TCE data, we used the split of dataset A into A1 and A2. The TCEs in A1 originate from a single health care provider, but those in A2 stem from several different clinical studies and hospitals and hence are expected to be less homogeneous. When A1 and A2 are used separately in the same cross-validation procedure described above for the pooled dataset A, then the predictive performance is similar in both cases (AUC of 0.898 for A1 and 0.906 for A2). If A1 (A2) is used for training and A2 (A1) for testing, we estimate an AUC of 0.837 (0.875). The performance loss due to separation of the data by clinics was slightly more pronounced for the balanced dataset BS than for BT (data not shown). All of the reported results remain qualitatively unchanged when therapeutic success is defined by a more sustained response over at least eight or 16 weeks instead of by initial response. Using the alternative definitions we re-calculated the AUC values for all feature encodings with LMT on all datasets.
None of the derived performance measures showed any significant difference compared with the initial definition.

4.2.3 Discussion

Given the increasing number of possible drug combinations and the genetic diversity of HIV, it is unlikely that simple hand-crafted rules will capture the complex interplay between drug cocktails and mutational patterns that determine response to antiretroviral therapy. Thus, statistical and computational approaches are required for optimal use of the available drugs in each individual patient. Here we have analysed the ability of various statistical learning techniques in combination with different feature encodings to predict therapy outcome from the baseline genotype and the applied drug combination. The viral genotype is only one of many patient-specific characteristics, such as immune status or genetic predisposition, and it is straightforward to extend our approach to situations where additional parameters are available (see Chapter 5). Nevertheless, the viral genotype has a prominent role among those covariates as it encodes the structure of the target proteins. The challenge in using these genetic data is to predict the evolution of the virus population under drug pressure and to understand the complex relationship between mutational patterns and in vivo drug resistance. We have addressed these issues by considering, in addition to drug and mutation indicators, features that make use of phenotypic and evolutionary information. Our computational experiments have identified the predicted phenotype and the genetic barrier to drug resistance as the most beneficial features, significantly boosting the predictive power of all classifiers. The choice of the learning technique had less impact, but LMT consistently showed an advantage over the other methods.
In general, the representations that incorporate in vitro phenotype predictions yielded better performance than the GPS, a purely evolutionary feature, but both sources of information led to improvements among all tested classifiers and datasets. The two subanalyses of the complete TCE dataset A showed opposing results. Whereas the performance was increased compared with dataset A for the balanced sequences (BS), it was decreased for the balanced therapies (BT). In the case of the BS data, classifiers need to learn the differences between drug combinations conditioned on the genotype. However, in effect, with this dataset the dependence on genotype is largely masked by the differing application profiles of regimen use accumulated in the dataset. For example, lopinavir appeared in only 38 failing regimens and in 284 successful follow-up therapies. Likewise, the combination of zidovudine and didanosine defined 60 failures, but not a single success. This is because the underlying clinical cohort data reflects the historical approval and use of drugs. For example, the combination of zidovudine and didanosine has been available for some time and many patients have received it until failure. This is easy to learn from the data, but the generalization that this combination always fails, although strongly supported by the data, is certainly wrong and thus misleading when evaluating treatment options containing zidovudine and didanosine. This problem is eliminated with the other subset, BT, where classifiers learn differences between mutational patterns conditioned on the regimen. Because, in learning therapy outcome, we aim to generalize the mutation-drug interactions and not to reconstruct historical drug-use patterns, we regard the performance using dataset BT as a more genuine estimate of our ability to predict treatment response. 
We have addressed the concern that the learned models might be biased by the specific sampling of patients by analysing TCEs from different clinics separately. We found a small decrease in predictive performance that was more pronounced when the more homogeneous subset A1 was used for training. This finding highlights the importance of including data from many different clinical centers. The performance loss was greater for the BS dataset than for the BT dataset, indicating that the patterns of drug use can vary among clinics. Larger studies that include sufficient data from several clinics need to investigate this source of bias in more detail in the future. In the present study, we have dichotomized therapy response into “success” and “failure”. However, with response defined as the change in VL after a certain time, regression methods can be used for learning in a similar way, and this is unlikely to affect the results presented here. Furthermore, most classifiers predict a score (in fact, often a probability) rather than only the class label. Thus, for a given genotype, they can be used for ranking drug combinations with respect to the expected therapeutic success. In the future, ranking systems could evolve into valuable tools for supporting the complex decision-making process of clinicians. However, such systems will have to provide an interface that allows physicians to incorporate their prior knowledge on a given case, and to constrain the range of possible regimens by explicitly excluding or including specific drugs. In this way, it is possible to spare drugs or drug classes, limit total pill burden, or react to adverse effects, while being able to identify the most promising options from the remaining regimens. The development of improved therapy rankers will crucially depend on their public availability, which allows experts to identify strengths and weaknesses of different approaches.
For this reason, we have implemented the prototypical therapy ranker THEO (THErapy Optimizer, Figure 4.6). It is freely accessible for research purposes as part of the geno2pheno web site⁴. In order to produce a ranking of therapies, geno2pheno-THEO (g2p-THEO) applies the classifier trained for predicting therapy response to the sequences of PR and RT provided by the user and a predefined set of therapies (consisting of any combination of either two PIs or two NRTIs plus one NNRTI or one PI). The score produced by the classifier for every combination of drugs is the expected therapeutic success, which is used to rank the considered therapies. In particular, g2p-THEO applies the LMT classifier, which was trained on dataset BT using the genetic barrier encoding as input, to derive the scores needed for the ranking. This selection of feature encoding and statistical learning method was based on the cross-validation results obtained on dataset BT. Figure 4.6 depicts the results of a g2p-THEO analysis of a genotype from a patient failing treatment after first-line therapy with abacavir, lamivudine and efavirenz (for a list of mutations, see caption of Figure 4.6). The first two regimens proposed by g2p-THEO are double PI therapies (saquinavir and lopinavir/ritonavir, and amprenavir and lopinavir/ritonavir) with a predicted probability of virological success of over 70%. As discussed above, the statistical model used by g2p-THEO has some limitations. Thus, suggestions made by g2p-THEO must be used with care. Moreover, the ranking displayed by g2p-THEO might appear counterintuitive, because it is based on confidence scores provided by the statistical learning method and the single criterion for therapy success is reduction of VL below a threshold. Hence, regimens that are highly active tend to occur among the top-ranked therapies.
Although these regimens are most likely to reduce the VL below the threshold necessary for classifying a TCE as a success, less active regimens might be a better choice for preserving future drug options and might therefore be ranked higher by clinicians (Jiang et al., 2003). Thus, we emphasize that the purpose of therapy rankers will always be to support, and not to replace, the complex decision-making process of clinicians.

⁴ http://www.geno2pheno.org

Figure 4.6: The g2p-THEO applet for selecting and evaluating drug combinations. The applet allows for limiting the number of drugs that can be part of a therapy (No. of drugs). Furthermore, the daily burden (No. of pills per day) can be limited. The number of compounds from each drug class can be set (for example, 1 ≤ NRTIs ≤ 3), and the use of single drugs can be enforced (for example, 3TC=include) or excluded. Pushing the “Compute” button ranks all remaining therapies according to the underlying prediction model. In the resulting table, the components of the regimens are listed (Regimen), the number of pills, how many pills of every compound are included in the regimen (Comment), and the score calculated by the model (Success). Additionally, the predictions are compared in a histogram between all selected (according to the constraints specified by the user; red bars) and all possible therapies (black bars). The figure shows the result of a g2p-THEO analysis of a genotype from a patient failing treatment after first-line therapy with abacavir, lamivudine and efavirenz. The sequence (subtype B) contained the following mutations (compared with the reference strain HXB2): PR V3I, E35D, S37N, L63P and A71T and RT L74V, K102N, K103N, Y115F, E122K, D123E, D177E, I178V, V179D, M184V, G190A, T200A, R211K, L214F, H221Y, T286A, E297R and S322T.
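The ranking step described for g2p-THEO can be sketched in Python as follows. This is only an illustration under stated assumptions: the candidate set is read here as "two PIs, or two NRTIs plus one NNRTI, or two NRTIs plus one PI", which is one possible reading of the description; the function names are hypothetical; and `toy_score` is a placeholder that stands in for the trained classifier and has no clinical meaning.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

NRTIS = ["ZDV", "ddC", "ddI", "d4T", "3TC", "ABC", "TDF"]
PIS = ["SQV", "IDV", "RTV", "NFV", "APV", "LPV", "ATV"]
NNRTIS = ["NVP", "DLV", "EFV"]


def candidate_regimens() -> List[FrozenSet[str]]:
    """Enumerate candidates under the assumed reading of the therapy set."""
    regimens = [frozenset(pair) for pair in combinations(PIS, 2)]
    for backbone in combinations(NRTIS, 2):
        for third in NNRTIS + PIS:
            regimens.append(frozenset(backbone + (third,)))
    return regimens


def rank_regimens(
    score: Callable[[FrozenSet[str]], float],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
    max_drugs: Optional[int] = None,
) -> List[Tuple[float, FrozenSet[str]]]:
    """Filter candidates by user constraints, then sort by descending score."""
    must_have, forbidden = set(include), set(exclude)
    kept = []
    for regimen in candidate_regimens():
        if not must_have <= regimen:
            continue  # a drug the user enforced is missing
        if regimen & forbidden:
            continue  # contains a drug the user excluded
        if max_drugs is not None and len(regimen) > max_drugs:
            continue
        kept.append((score(regimen), regimen))
    # Ties are broken alphabetically so the output is deterministic.
    kept.sort(key=lambda item: (-item[0], sorted(item[1])))
    return kept


def toy_score(regimen: FrozenSet[str]) -> float:
    """Placeholder scorer (fraction of PIs); NOT the trained model."""
    return len(regimen & set(PIS)) / len(regimen)


ranked = rank_regimens(toy_score, include=["3TC"], exclude=["EFV"])
for success, regimen in ranked[:3]:
    print(f"{success:.2f}  {'+'.join(sorted(regimen))}")
```

In a real system the `score` argument would be the probability of success returned by the classifier for the given genotype, and further constraints such as pill burden would be added as extra filters.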
In future work, further improvement of the genetic barrier representation might be achieved by estimating mutagenetic trees for combinations of drugs instead of single drugs. Such trees would model evolutionary pathways to multi-drug resistance and handle the interplay of drugs within the applied regimen. However, this approach is currently infeasible due to the lack of sufficient data. We also emphasize that the drug-wise computation of the genetic barrier, as employed here, facilitates the incorporation of a new drug, because only one additional mutagenetic tree needs to be learned. Moreover, to a first approximation, this tree can be learned from in vitro and clinical study data even prior to approval. The most promising avenues to increased prediction accuracy and improved therapy ranking are via larger datasets, the use of additional parameters (such as CD4+ T cell counts, plasma levels of drugs, viral replication capacity, and host genetic factors), analysis of previous sequences of the viral population that represent important background information, and a profound understanding of viral evolutionary escape from drug pressure. In the following section, we provide a retrospective validation of g2p-THEO using data from European patient cohorts. Validations of this type are used to ensure that interpretation approaches are not overfitted to a specific patient cohort. Furthermore, validations facilitate the comparison between different approaches on independent data, and thereby allow for objectively rating their performance.

4.3 Clinical Validation

This section describes a joint work with Martin Däumer, Niko Beerenwinkel, Yardena Peres, Eugen Schülter, Joachim Büch, Soo-Yon Rhee, Anders Sönnerborg, W. Jeffrey Fessel, Robert W. Shafer, Maurizio Zazzi, Rolf Kaiser, and Thomas Lengauer.
The work was published in the Journal of Infectious Diseases under the title “Predicting the Response to Combination Antiretroviral Therapy: Retrospective Validation of geno2pheno-THEO on a Large Clinical Database” (Altmann et al., 2009a). In this section, we present the external validation of g2p-THEO in a dataset containing 7600 treatment-sequence pairs collected in a Europe-wide effort (Aharoni et al., 2007). This validation is similar in nature to the split of dataset A into A1 and A2 (based on the center where the data was collected) presented in the previous section. Virological response was dichotomized, and performance was compared with three state-of-the-art expert-based interpretation tools. In subsequent analyses, various techniques of statistical learning were applied to (1) assess the putative improvement in prediction accuracy achieved by applying models for specific drug combinations and (2) investigate the reliability of g2p-THEO when applied to unseen combinations of compounds, that is, combinations that are not contained in the training dataset.

4.3.1 Material and Methods

Treatment Change Episodes (TCEs)

The present study used the previously introduced definition of a TCE (Altmann et al., 2007a), on which g2p-THEO is based (Figure 4.2). As opposed to the previous analysis, we do not make use of the sustained response. Although current assays have a threshold of sensitivity of 40 or 50 copies, the 400-copy threshold was used to include data obtained by earlier assays.

Datasets

The statistical model applied by g2p-THEO was trained on data obtained from the Stanford HIV Drug Resistance Database (Rhee et al., 2003) (comprising data from clinical studies ACTG 320, ACTG 364, GART, and HAVANA) and from two northern California clinic populations undergoing genotypic resistance testing at Stanford University.
From a total of 25,717 therapies, 10,288 sequences, and 6706 patients, 6359 TCEs were extracted (4776 failing and 1583 successful therapies). This dataset is hereafter called “Stanford-California” (A in the previous section). Overrepresentation of certain compounds in failing or successful therapies within the Stanford-California dataset led the statistical learning models to often base their decisions only on the drug combination, irrespective of the genotype. To eliminate this artifact, g2p-THEO was trained not on the full dataset but on a subset that contained the same number of failure- and success-associated genotypes for every drug combination (see Section 4.2 for details). The number of genotypes per drug combination ranges from two to 446 (2478 TCEs in total). Hereafter, this dataset is called “Stanford-CaliforniaBT” (for “balanced therapies”). In some analyses, both the Stanford-California and the Stanford-CaliforniaBT datasets were evaluated again after removal of ZDV+3TC+IDV combination therapy, which was overrepresented because of the inclusion of the large ACTG 320 dataset. For further analysis, a subset of Stanford-California containing only drug combinations with ≥ 20 successes and ≥ 20 failures was selected. Only six treatments met this requirement (Table 4.3); hereafter, this dataset is called “Stanford-California6”. Using the same definition, an independent TCE dataset (EuResistDB) was extracted from the EuResist integrated database (version 2007/05/29), comprising data from Germany (Arevir) (Roomp et al., 2006), Italy (ARCA) (De Luca et al., 2006), and Sweden (Karolinska Institute). For more details on the EuResist database see Chapter 5. From a total of 58,195 therapies, 19,258 sequences, and 16,999 patients, we obtained 7603 TCEs (6217 failing and 1386 successful therapies). For further analysis, we generated the subset EuResistDB6, comprising the same six drug combinations as in Stanford-California6 (Table 4.3).
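The two subset constructions described above (balancing outcomes within each drug combination, and keeping only combinations with at least 20 successes and 20 failures) might be sketched as follows. The text states only that the balanced subset holds equal numbers of failing and successful genotypes per regimen; random down-sampling of the larger class is an assumption of this sketch, as are the function names and the synthetic genotype identifiers. The counts for the two real regimens are taken from Table 4.3, and the third regimen is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TCE = Tuple[str, str, str]  # (regimen, genotype id, outcome)


def balance_by_therapy(tces: List[TCE], seed: int = 0) -> List[TCE]:
    """Keep equally many successes and failures for every regimen.

    Assumption: the larger class is down-sampled at random.
    """
    rng = random.Random(seed)
    by_regimen: Dict[str, Dict[str, List[TCE]]] = defaultdict(
        lambda: {"success": [], "failure": []}
    )
    for tce in tces:
        by_regimen[tce[0]][tce[2]].append(tce)
    balanced: List[TCE] = []
    for groups in by_regimen.values():
        k = min(len(groups["success"]), len(groups["failure"]))
        balanced.extend(rng.sample(groups["success"], k))
        balanced.extend(rng.sample(groups["failure"], k))
    return balanced


def regimens_with_enough_data(tces: List[TCE], min_per_class: int = 20) -> Set[str]:
    """Return regimens with at least min_per_class successes and failures."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"success": 0, "failure": 0})
    for regimen, _, outcome in tces:
        counts[regimen][outcome] += 1
    return {
        regimen
        for regimen, c in counts.items()
        if c["success"] >= min_per_class and c["failure"] >= min_per_class
    }


def make_tces(regimen: str, n_failure: int, n_success: int) -> List[TCE]:
    """Build placeholder TCEs; the genotype ids are synthetic labels."""
    failures = [(regimen, f"{regimen}-F{i}", "failure") for i in range(n_failure)]
    successes = [(regimen, f"{regimen}-S{i}", "success") for i in range(n_success)]
    return failures + successes


demo = (
    make_tces("ZDV+3TC+IDV", 223, 229)      # counts from Table 4.3
    + make_tces("d4T+3TC+SQV/r", 71, 28)    # counts from Table 4.3
    + make_tces("hypothetical regimen", 5, 0)
)
balanced = balance_by_therapy(demo)
print(len(balanced), sorted(regimens_with_enough_data(demo)))
```

Under this scheme ZDV+3TC+IDV contributes 2 × 223 = 446 TCEs to the balanced subset, which agrees with the maximum of 446 genotypes per drug combination reported above.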
Because the EuResist integrated database includes antiretroviral treatments started from 1990 to 2007, the main analysis was repeated on the TCE subset derived from treatments started after 31 December 2000, to minimize the contribution of obsolete therapy records featuring e.g. non-boosted protease inhibitors.

                 Stanford-California6          EuResistDB6
Regimen          Failure  Success  Total       Failure  Success  Total
ZDV+3TC+IDV      223      229      452         108      5        113
d4T+3TC+SQV/r    71       28       99          27       2        29
ddI+d4T+EFV      96       54       150         132      25       157
d4T+3TC+EFV      60       27       87          114      16       130
ddI+d4T+NFV      110      28       138         131      22       153
d4T+3TC+NFV      275      28       303         209      16       225
All regimens     835      394      1229        721      86       807

Table 4.3: Distribution of failing and successful treatment change episodes for the 6 most common regimens in the Stanford-California and EuResistDB datasets (Stanford-California6 and EuResistDB6 subsets).

Score   ANRS                  Rega           Stanford HIVdb
1.0     susceptible           susceptible    susceptible, potential low-level resistance
0.5     possible resistance   intermediate   low-level resistance, intermediate resistance
0.0     resistant             resistant      high-level resistance

Table 4.4: Mapping of the classification given by different expert-based interpretation systems to continuous values.

Interpretation systems

ANRS (Meynard et al., 2002, version 2006/07), Rega (Van Laethem et al., 2002, version 7.1.1), and Stanford HIVdb (Rhee et al., 2003, version 4.3.0) are expert-based interpretation methods. These algorithms apply carefully handcrafted interpretation rules or tables derived by expert panels from the analysis of available in vitro and in vivo resistance data. In addition, some rules of the ANRS algorithm were derived from the statistical association between baseline genotypic data and virological response. The classification generated by the interpretation systems was normalized into a score by mapping the verbal classification to a score according to Table 4.4.
Thus, the rating numerically represents the activity of a drug against the virus on a scale ranging from 0.0 (inactive) to 1.0 (fully active). Individual scores for NNRTIs and for boosted PIs computed using the Rega algorithm were converted to 1.0, 0.25, and 0.0 and to 1.5, 0.75, and 0.0, respectively, as indicated by the algorithm developers. The treatment score or genotypic susceptibility score (GSS) (De Luca et al., 2004) was then defined as the sum of single-drug scores for the compounds included in the regimen.

g2p-THEO is a data-driven interpretation system that directly computes a rating for a combination therapy. This value can be interpreted as the probability of the viral load being reduced to below the limit of detection during the course of therapy. g2p-THEO represents the HIV-1 genotype by 49 indicator variables, each of them indicating the presence (1) or absence (0) of a resistance mutation (Johnson et al., 2005) (the complete list: for protease, 10F/I/R/V, 16E, 20M/R/I, 24I, 30N, 32I, 33F/I/V, 36I/L/V, 46I/L, 47V/A, 48V, 50L/V, 53L, 54L/V/M/T/A/S, 60E, 62V, 63P, 71V/T/I/L, 73S/A/C/T, 77I, 82A/F/T/S, 84V, 88D/S, 90M, and 93L; for reverse transcriptase, 41L, 44D, 62V, 65R, 67N, 70R, 74V, 75T/M/S/A/I, 77L, 100I, 103N, 106A/M, 108I, 115F, 116Y, 118I, 151M, 181C/I, 184V/I, 188C/L/H, 190S/A, 210W, 215F/Y, and 219Q/E). Similarly, treatment is encoded using 17 indicator variables, each representing the presence (1) or absence (0) of a compound in the regimen (for the list of considered compounds, see Table 4.1). In addition, viral evolution during the course of therapy is represented by the genetic barrier to drug resistance (Beerenwinkel et al., 2005a) for all drugs in the regimen.
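The genotypic susceptibility score defined earlier in this subsection can be sketched in Python as below. The dictionaries follow Table 4.4 and the Rega-specific conversions stated in the text; the function names, the way boosted PIs are flagged, and the resistance calls in the example are assumptions made for illustration and do not come from any of the interpretation systems themselves.

```python
from typing import Dict, Iterable

# Verbal classification -> score, following Table 4.4.
CALL_TO_SCORE: Dict[str, Dict[str, float]] = {
    "ANRS": {"susceptible": 1.0, "possible resistance": 0.5, "resistant": 0.0},
    "Rega": {"susceptible": 1.0, "intermediate": 0.5, "resistant": 0.0},
    "Stanford HIVdb": {
        "susceptible": 1.0,
        "potential low-level resistance": 1.0,
        "low-level resistance": 0.5,
        "intermediate resistance": 0.5,
        "high-level resistance": 0.0,
    },
}
# Rega-specific conversions for NNRTIs and boosted PIs, as stated in the text.
REGA_NNRTI_SCORE = {"susceptible": 1.0, "intermediate": 0.25, "resistant": 0.0}
REGA_BOOSTED_PI_SCORE = {"susceptible": 1.5, "intermediate": 0.75, "resistant": 0.0}
NNRTIS = {"NVP", "DLV", "EFV"}


def drug_score(system: str, drug: str, call: str, boosted_pi: bool = False) -> float:
    """Score of a single drug under one interpretation system."""
    if system == "Rega" and drug in NNRTIS:
        return REGA_NNRTI_SCORE[call]
    if system == "Rega" and boosted_pi:
        return REGA_BOOSTED_PI_SCORE[call]
    return CALL_TO_SCORE[system][call]


def gss(system: str, calls: Dict[str, str], boosted_pis: Iterable[str] = ()) -> float:
    """Genotypic susceptibility score: sum of the single-drug scores."""
    boosted = set(boosted_pis)
    return sum(drug_score(system, d, c, d in boosted) for d, c in calls.items())


# Hypothetical resistance calls for the regimen d4T+3TC+SQV/r.
rega_calls = {"d4T": "susceptible", "3TC": "resistant", "SQV": "intermediate"}
print(gss("Rega", rega_calls, boosted_pis=["SQV"]))  # 1.0 + 0.0 + 0.75 = 1.75
```

Because the score is a plain sum over drugs, it does not model interactions between the compounds of a regimen, which is one of the differences to the data-driven rating computed by g2p-THEO.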
The genetic barrier is the probability that the virus will remain susceptible under drug pressure, given as a numerical value between 0.0 (no genetic barrier [i.e., the virus is expected to become resistant]) and 1.0 (insurmountable genetic barrier [i.e., the virus is expected to remain susceptible]). Together with the indicator variables, the genetic barrier (one value per drug) is used as input to the logistic model tree (LMT) (Landwehr et al., 2005) applied by g2p-THEO to compute a score for a combination therapy.

Receiver Operating Characteristic (ROC) Curves

ROC curves depict classifier performance by giving a true-positive rate (TPR; percentage of correctly predicted successes) for every false-positive rate (FPR; percentage of failing therapies [i.e., with a genotype obtained during treatment] that were predicted to be successful [i.e., to decrease viral load to <400 copies/ml]). The area under the ROC curve (AUC) summarizes the performance and is a convenient measure for comparing scoring systems without the need to provide a particular cutoff (Brun-Vézinet et al., 2004). Briefly, the AUC is a value between 0.0 and 1.0 corresponding to the probability that a randomly selected success receives a higher score than a randomly selected failure (Fawcett, 2006). The ROCR software (version 1.0-2) (Sing et al., 2005a) was used for the ROC analysis.

Comparative Analysis

The statistical model applied by g2p-THEO was used to predict the outcome of the genotype-therapy pairs in the external EuResistDB dataset. The performance was compared with that of the three expert-based interpretation tools: ANRS, Rega, and Stanford HIVdb. The EuResistDB dataset was used as an independent test set. One hundred bootstrap replicates of the EuResistDB dataset were used for computing standard deviations.
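The probabilistic reading of the AUC and the bootstrap estimate of its standard deviation can be illustrated with a short Python sketch. It is not the ROCR code used in the study: counting tied scores as half a win and skipping bootstrap replicates that contain only one class are assumptions of this sketch, the function names are hypothetical, and the scores and outcomes are toy values.

```python
import random
from statistics import stdev
from typing import List


def auc_pairwise(scores: List[float], labels: List[int]) -> float:
    """Probability that a random success outscores a random failure.

    labels: 1 = success, 0 = failure. Ties count as half a win (assumption).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_sd(scores: List[float], labels: List[int],
                     replicates: int = 100, seed: int = 0) -> float:
    """Standard deviation of the AUC over bootstrap resamples of the test set."""
    rng = random.Random(seed)
    indices = range(len(scores))
    aucs: List[float] = []
    while len(aucs) < replicates:
        sample = [rng.choice(indices) for _ in indices]  # draw with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip single-class resamples (assumption)
            aucs.append(auc_pairwise([scores[i] for i in sample], ys))
    return stdev(aucs)


# Toy scores and outcomes, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
print(auc_pairwise(toy_scores, toy_labels))  # 8 of 9 success/failure pairs are ordered correctly
print(bootstrap_auc_sd(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of TCEs; for datasets of the size used here a rank-based formulation would be the more practical choice.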
Stability

To further analyze the robustness of the approach, the prediction of response to drug combinations that were not present in the training data was simulated by a variant of cross-validation on the training data. In standard cross-validation, the available data are split randomly into n equally sized non-overlapping subsets. Then, n − 1 pooled subsets are used as a training set, and the remaining subset is used as a test set to compute the performance of the model. This procedure is repeated n times. Hence, every subset is used as a test set once. To simulate the prediction of unseen drug combinations, the splits were not carried out randomly, and every subset contained only TCEs with the same drug combination. Results derived by this “therapy-fold” cross-validation were compared with results of standard cross-validation, with the number of folds equal to the number of different drug combinations. A substantial loss in performance (e.g., a loss of 0.1 in AUC) for the therapy-fold cross-validation compared with the normal protocol indicates unstable behavior with respect to unseen drug combinations, because it indicates the requirement to include examples of the drug combination that should be predicted in the training data for maintaining the performance. The experiment was repeated on the special subset Stanford-CaliforniaBT and on the complete EuResistDB dataset. Because of limited computational resources, the LMT applied by g2p-THEO was replaced by the faster linear support vector machines (SVMs) (Chang and Lin, 2001), with similar predictive performance (see e.g. Table 4.2). SVM settings are listed in Table 4.5.

Table 4.5: Support vector machine settings. C is the cost factor, CV refers to cross-validation. Cells as extracted (row and column alignment not recoverable): Analysis, setting; Stability analysis; All; Regimen-specific models; Regimen specific; All; Kernel; C; Linear; 0.1; 1; Linear; Linear; 0.1; 0.1; C optimizing AUC in 5-fold CV; 4 (optimized AUC in 10-fold CV).
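The difference between the two splitting protocols can be made concrete with a small Python sketch. It only generates the index splits; training and scoring a model on each split is left out. The function names are hypothetical and the regimen labels in the example are toy data.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def therapy_folds(regimens: List[str]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each test fold holds one regimen only."""
    by_regimen: Dict[str, List[int]] = defaultdict(list)
    for index, regimen in enumerate(regimens):
        by_regimen[regimen].append(index)
    for test in by_regimen.values():
        held_out = set(test)
        train = [i for i in range(len(regimens)) if i not in held_out]
        yield train, test


def random_folds(n_items: int, n_folds: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Standard cross-validation: random, non-overlapping test folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        test = indices[k::n_folds]
        held_out = set(test)
        yield [i for i in indices if i not in held_out], test


# Toy regimen labels, one per TCE.
toy_regimens = ["ZDV+3TC+IDV", "d4T+3TC+EFV", "ZDV+3TC+IDV",
                "ddI+d4T+NFV", "d4T+3TC+EFV", "ZDV+3TC+IDV"]
n_regimens = len(set(toy_regimens))
for train, test in therapy_folds(toy_regimens):
    # The held-out regimen never occurs in the training part of the split.
    print(sorted({toy_regimens[i] for i in test}), len(train))
# For comparison, the standard protocol uses the same number of folds.
standard_splits = list(random_folds(len(toy_regimens), n_regimens))
```

Comparing the AUC obtained under `therapy_folds` with the AUC under `random_folds` then indicates how strongly a model relies on having seen the tested drug combination during training.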
Regimen-specific Models

For some drug combinations, many samples were available. Models trained exclusively on TCEs for one drug combination are expected to predict response to that regimen more accurately than models trained on all available TCEs. The Stanford-California6 subset contained sufficient data for training six regimen-specific models and for measuring their performance. The performance of the individual models was assessed by 10 repetitions of five-fold cross-validation on data comprising only one drug combination. The performance of the full model was assessed by 10 repetitions of a variant of five-fold cross-validation in which all TCEs with other drug combinations were added to the four subsets forming the training data. SVMs with a linear kernel were used as a statistical learning method. SVM settings are listed in Table 4.5. Performance in an independent test set was assessed by predicting the outcome of TCEs in EuResistDB6 with the full model and the six regimen-specific models (10 repetitions). Results were compared with those obtained using g2p-THEO and the three expert-based interpretation tools.

4.3.2 Results

Comparative Analysis

Figure 4.7 depicts the ROC curves for the three expert-based interpretation tools and g2p-THEO. The curves for Stanford HIVdb, ANRS, and Rega did not differ substantially, resulting in comparable TPRs and FPRs for every GSS cutoff. In contrast, in the FPR range from 0% to 40%, the curve for g2p-THEO was distinctly located above all the other curves. For higher FPRs, all curves proceeded with similar slopes. The AUC value for g2p-THEO was significantly larger than those for the expert-based approaches (p < 0.001; paired Wilcoxon test). ROC curves allow for a detailed analysis of specific points on the curve.
For example, at an FPR of 20% (close to a GSS cutoff of 2.5 for Rega and ANRS and of 2.0 for Stanford HIVdb), Stanford HIVdb, ANRS, and Rega yielded TPRs of 44.2% (±3.2%), 47.8% (±2.7%), and 46.5% (±2.2%), respectively. In contrast, at the same FPR g2p-THEO achieved a TPR of 64.0% (±1.6%). On the other hand, at a false-negative rate (FNR) of 10% (close to a GSS cutoff of 1.0 for all systems), Stanford HIVdb, ANRS, and Rega yielded true-negative rates of 54.1% (±2.0%), 51.0% (±2.1%), and 55.9% (±0.8%), respectively, compared with 54.7% (±1.8%) for g2p-THEO. Restriction of the EuResistDB dataset to therapies started after 31 December 2000 led to an even more pronounced difference in AUC between g2p-THEO (0.824±0.006) and the expert-based approaches (for Stanford HIVdb, 0.728±0.008; for ANRS, 0.733±0.008; for Rega, 0.754±0.007).

Stability

Table 4.6 summarizes the results of the stability analysis. The standard cross-validation performance on the Stanford-California dataset was in line with previously computed results (Table 4.2). For both Stanford-California datasets containing the highly prevalent ZDV+3TC+IDV regimen, the therapy-fold cross-validation yielded slightly worse results than the standard protocol (difference in AUC of ∼ 0.03). In contrast, in the EuResistDB dataset and both Stanford-California datasets without ZDV+3TC+IDV, no loss in performance was observed.

Regimen-specific Models

Table 4.7 shows the mean AUC and standard deviation for the regimen-specific models and the full model for the six most common drug combinations in the Stanford-California dataset. The results for the combination of d4T+3TC+SQV/r were the worst for both models, and there was no benefit in using the regimen-specific model. However, for the remaining five drug combinations, the benefits of regimen-specific models ranged from 0.017 to 0.048 and reached statistical significance.
Within the EuResistDB6 dataset, an insufficient number of successful TCEs was available for ZDV+3TC+IDV and d4T+3TC+SQV/r (Table 4.3). The full model outperformed the regimen-specific model only for d4T+3TC+EFV. For the remaining three drug combinations, the benefit of the regimen-specific model was more pronounced than in the cross-validation setting and ranged from 0.046 to 0.301 for d4T+3TC+NFV, a combination for which the full model actually failed to make useful predictions. The LMTs in g2p-THEO also outperformed the regimen-specific model for d4T+3TC+EFV. In the remaining three cases, the benefit of the regimen-specific models ranged from 0.011 to 0.082. All regimen-specific models outperformed the expert-based methods.

Figure 4.7: Receiver operating characteristic (ROC) curves for the EuResistDB dataset. Every method is represented by a single ROC curve, namely Stanford HIVdb, ANRS, Rega, and geno2pheno-THEO (g2p-THEO). Each point on the curve represents a classifier with a different cutoff and allows the true-positive rate (TPR) and false-positive rate (FPR) for that cutoff to be determined. Whiskers indicate the standard deviations of the TPRs at a specific FPR. The genotypic susceptibility score (GSS) and predicted success probability (p) cutoffs leading to specific TPR and FPR values are indicated within the plot for expert-based approaches and g2p-THEO, respectively. For each method, the area under the ROC curve and its standard deviation are given parenthetically in the box.

Figure 4.8 depicts the ROC curves for the regimen-specific models (g2p-THEO-SVM-RS), the full model (g2p-THEO-SVM), g2p-THEO, and the expert-based methods applied to the EuResistDB6 dataset. The full model performed better than the expert-based methods in the area below an FPR of 32% but worse in the remaining region. As in the previous ROC plot (Figure 4.7), g2p-THEO performed better than the expert-based
interpretation tools in the area below a 50% FPR and performed as well in the remaining region. However, the regimen-specific models outperformed the other methods over the whole range of FPRs. More specifically, they yielded a TPR of 58.4% at an FPR of 20%, compared with 39.8% for the expert-based methods and 53.6% for g2p-THEO.

Figure 4.8: Receiver operating characteristic (ROC) curves for regimen-specific models applied to the EuResistDB6 dataset. Every method is represented by a ROC curve for a subset of the EuResistDB data comprising six different treatments. In addition to the methods depicted in Figure 4.7, ROC curves are shown for g2p-THEO-SVM (g2p-THEO in which logistic model trees were replaced by SVMs) and g2p-THEO-SVM-RS (g2p-THEO using regimen-specific SVMs). For each method, the area under the ROC curve is given parenthetically in the box.

Dataset                     Drug combinations, no.   AUC, standard cross-validation   AUC, therapy-fold cross-validation
Stanford-California
  With ZDV+3TC+IDV          876                      0.901                            0.870
  Without ZDV+3TC+IDV       875                      0.898                            0.895
Stanford-CaliforniaBT
  With ZDV+3TC+IDV          323                      0.838                            0.812
  Without ZDV+3TC+IDV       322                      0.812                            0.813
EuResistDB                  712                      0.847                            0.844

Table 4.6: Area under the receiver operating characteristic curve (AUC) values obtained by standard cross-validation and therapy-fold cross-validation. The rows of the table correspond to the five datasets, with both Stanford-California datasets studied with and without the ZDV+3TC+IDV drug combination, in light of the overrepresentation of this regimen caused by the ACTG 320 data. (Stanford-CaliforniaBT is the “balanced therapies” subset.) Note that computation of the AUC requires positive and negative samples, but because of the nature of therapy-fold cross-validation, it could not be ensured that positive and negative samples were present in every subset of the cross-validation. Thus, it was not possible to compute fold-wise AUC values or their standard deviations and statistical significance.

Stanford-California6, AUC (SD)
Regimen          Regimen-specific model   Full model      p
ZDV+3TC+IDV      0.965 (0.005)            0.928 (0.002)   <0.001
d4T+3TC+SQV/r    0.747 (0.035)            0.740 (0.022)   0.256
ddI+d4T+EFV      0.988 (0.006)            0.950 (0.002)   <0.001
d4T+3TC+EFV      0.924 (0.028)            0.876 (0.015)   0.003
ddI+d4T+NFV      0.974 (0.005)            0.957 (0.005)   <0.001
d4T+3TC+NFV      0.946 (0.010)            0.915 (0.010)   <0.001
Mean             0.924                    0.895

EuResistDB6, AUC
Regimen          Regimen-specific model   Full model   g2p-THEO   Stanford HIVdb   ANRS    Rega
ZDV+3TC+IDV      0.596 (0.038)            0.570        0.610      0.656            0.692   0.755
d4T+3TC+SQV/r    0.420 (0.012)            0.815        0.444      0.407            0.463   0.426
ddI+d4T+EFV      0.925 (0.003)            0.879        0.914      0.803            0.798   0.777
d4T+3TC+EFV      0.727 (0.016)            0.771        0.769      0.693            0.724   0.708
ddI+d4T+NFV      0.821 (0.008)            0.709        0.739      0.665            0.671   0.684
d4T+3TC+NFV      0.829 (0.016)            0.528        0.782      0.782            0.781   0.784
Mean             0.720                    0.712        0.710      0.668            0.688   0.689
Full dataset     0.810 (0.005)            0.733        0.781      0.707            0.718   0.707

Table 4.7: Area under the receiver operating characteristic curve (AUC) values for the cross-validation and the training/test set-up. Rows correspond to the six regimens in the two datasets. The columns for the Stanford-California6 data show the AUC values derived by 10 repetitions of five-fold cross-validation for the regimen-specific models and the full model using all available training data (SDs are in parentheses); p-values for the comparison of the performances of the two models were obtained by the Wilcoxon rank-sum test. The columns for the EuResistDB6 dataset show the AUC values for the regimen-specific models (standard deviations are in parentheses), with the full model using all Stanford-California data, the original geno2pheno-THEO (g2p-THEO) predictions, and the three expert-based approaches (Stanford HIVdb, ANRS, and Rega).

4.3.3 Discussion

Validation of HIV genotype interpretation systems is a crucial step in translating computer-based methods into clinically effective treatment decision support tools. In the present study, using an external dataset of ≈ 7600 TCEs extracted from the EuResist integrated database, the recently developed g2p-THEO system was shown to outperform the three most widely used expert-based interpretation systems. Although the EuResist dataset included many obsolete therapies because of its long observation period, the same results were confirmed when therapies started before 1 January 2001 were removed. The g2p-THEO system was more accurate than ANRS, Stanford HIVdb, and Rega by 16.2% to 19.8% in the detection of therapeutic success (20% FPR). However, all of the systems were comparable in detecting treatment failure at a 10% FNR. This finding suggests that expert-based systems are better suited to detect the failure of therapy than to detect success, probably because their original purpose was to detect resistance to individual drugs. However, whether a treatment will most likely be successful is exactly the response a user expects from a decision support tool.

Current expert-based approaches are indeed evolving into clinically oriented tools aimed at building effective combination regimens. Computing a regimen GSS by simple summation of the individual drug scores derived by expert-based systems fails by definition to weight both different drug potencies and drug interaction effects. However, such an unweighted GSS is still commonly used (Maggiolo et al., 2007; Cozzi-Lepri et al., 2007) in the absence of any agreed-upon standard for a weighted GSS. Notably, the latest Rega algorithm has introduced arbitrary drug weights in an attempt to account for the expected increased potency of ritonavir-boosted PIs and a lack of intermediate NNRTI activity.

The superior performance of g2p-THEO may have derived from two factors. First, the calculated genetic barrier provides useful information by estimating the probability of viral evolution under drug pressure (Altmann et al., 2007a). Second, the training process assigns weights to all drugs. Hence, during the decision making process, drugs are not treated equally. As shown by Altmann et al. (2007b, 2009b), this can significantly improve the performance of genotype interpretation tools. On the other hand, g2p-THEO is currently limited to a set of well-established resistance mutations, which might explain its inability to improve the detection of failing regimens.

In the computation of a score for a drug combination, the robustness of the tool with respect to unseen drug combinations is an important issue. In both Stanford-California datasets, a slight decrease in the AUC was observed in predictions for unseen drug combinations. Overrepresentation of ZDV+3TC+IDV therapy due to inclusion of the ACTG 320 clinical trial in the datasets was identified as a possible confounder of this analysis, because no decrease in performance with unseen drug combinations was observed after removal of TCEs containing the ZDV+3TC+IDV combination. Thus, our stability analysis indicated that g2p-THEO returns reliable scores for unobserved drug combinations. The major reason for the preservation of performance is that the prediction is based on a linear model. Indeed, during the learning process of the linear model, contributions of every single covariate to the outcome are computed. This also holds for drugs in a regimen, because the impact has to be distributed among these drugs. In the end, observed and unobserved drug combinations are both composed of observed compounds. However, this property is also a potential disadvantage of linear models. Specifically, if the prediction is based on a linear model, the synergistic effects between drugs or mutations cannot be represented unless a large number of covariates are introduced to explicitly model these effects.
In contrast, in regimen-specific models all mutations are evaluated in the context of the same drug combination, thus rendering the explicit modeling of interactions unnecessary. In the Stanford-California6 dataset, the regimen-specific models for all drug combinations exhibited increased performance compared with the full model, even though the full model had access to many more training samples. This finding was confirmed in independent data. However, the benefit of regimen-specific models decreased when they were compared with the full statistical model for g2p-THEO. This can be explained by the fact that g2p-THEO applies LMTs that directly train multiple linear models on distinct subsets of the training data. Unfortunately, only a few outdated regimens gave rise to enough training data for regimen-specific models. However, a promising approach has been recently proposed (Bickel et al., 2008), one that pools data on “similar” regimens to overcome this limitation in generating regimen-specific models.

A major issue with any fully data-driven system is the inability to generate predictions for newly licensed compounds because of the delayed availability of sufficient training data. This drawback can be addressed only by multicenter efforts and cooperation between drug companies and regulatory bodies for immediate release of clinical trial data. However, because an optimized background regimen is recommended for the effective use of any new compound, interpretation systems are still relevant for choosing the backbone drugs, particularly in heavily experienced patients. Expert-based systems can complement data-driven systems for predicting the activity of novel drugs until sufficiently large genotype-response datasets are available. An attempt to combine expert scores for novel drugs with a data-driven approach will be presented in Section 5.8.

Data-driven systems need a large amount of data for training, so observational cohort data are often used.
These provide a valuable source for assessing the impact that drug resistance has on the response to treatment but typically lack other relevant information, including adherence levels and pharmacokinetics data. Weighting for these factors is expected to help us develop better systems aimed at building effective regimens. It must also be noted that modern and future antiretroviral treatment strategies are expected to limit the development of drug resistance by providing increased potency and convenience, perhaps making treatment toxicity issues relatively more relevant than resistance over time. However, drug resistance and cross-resistance remain issues for a substantial proportion of patients harboring viral populations that display a complex mutational pattern because of multiple treatment failures. In addition, toxicity is also a major contributor to the selection of drug resistance through decreased adherence. Developed as a clinically oriented tool, g2p-THEO allows the user to exclude specific drugs for toxicity issues and provides a total pill count for each regimen. Although no data-driven system is meant to replace a comprehensive patient evaluation by an expert HIV specialist, validated tools such as g2p-THEO can provide an appropriate support to most care-givers of HIV-infected patients.

5 EuResist: Uniting Data for Fighting HIV

In the previous chapter we demonstrated that features derived from the viral genotype are helpful in predicting response to combination therapy. The predictive performance of statistical learning models, however, depends to a great extent on the amount of available training data. In the case of geno2pheno, approximately 1000 genotype-phenotype pairs were sufficient for building well-performing models for inferring phenotypic drug resistance against a single drug from genotype. The predicted resistance information is based on information derived in vitro.
Clinicians, however, are mainly interested in knowing whether a drug cocktail will work in vivo or not. The fact that anti-HIV drugs can be used in hundreds of combinations calls for massive amounts of training data for systems such as g2p-THEO that directly predict response to combination therapy. The HIV Resistance Database Initiative (RDI) was the first international initiative aiming at the integration of multiple databases from different centers for collecting sufficient data to train such statistical models (Larder et al., 2002). Since its foundation in 2002, the RDI database grew from 3,500 to approximately 30,000 patients in 2007 (see footnote 1). However, until now RDI has not provided a freely accessible web tool that predicts response to combination treatments.

This chapter describes our work within the EU project EuResist. EuResist, like RDI, aims at the collection of data and the development of a freely accessible web tool that predicts response to combination treatments. The current version of the database holds records from approximately 33,000 different patients. Unlike RDI, EuResist made the prediction system freely available already in 2007.

This chapter begins with a background on the EuResist project (Section 5.1). In Sections 5.2 and 5.3 we introduce the individual EuResist prediction engines and the combination strategies, respectively. The final web service is described in Section 5.5. Sections 5.6 and 5.7 investigate how changes in the definition of treatment response affect the prediction performance. Furthermore, in Section 5.8 we study an approach for overcoming a serious limitation of all prediction engines that infer response to combination therapies, namely the ability to be updated with respect to novel compounds. Finally, Section 5.9 explores the possibility of predicting response to antiretroviral therapy based on the treatment history alone.
5.1 Project Background

The EuResist project aimed at developing a European integrated system for clinical management of antiretroviral drug resistance. The system should provide the clinicians with a prediction of response to antiretroviral treatment in HIV patients, thus helping the clinicians to choose the best drugs and drug combinations for any given HIV genetic variant. To this end, a massive European integrated dataset was created, linking some of the largest locally existing resistance databases.

Footnote 1: source: http://www.hivrdi.org

The project involved participants from eight institutions. Three institutions acted as data providers and consultants concerning virological knowledge and clinical aspects of HIV treatment, namely the University of Siena, the Karolinska Institute in Stockholm, and the University of Cologne. Four participating institutions were involved in modeling the prediction engines: IBM Israel, KFKI Research Institute for Particle and Nuclear Physics Hungarian Academy of Sciences, Informa S.r.l., and the Max Planck Institute for Informatics. A division of IBM Israel was also responsible for the integration of the multiple data sources into a unified database, and Informa S.r.l. was concerned with the overall project management and the development of the web front end for the prediction engine. Kingston University dealt with matters of data quality of the integrated database.

Like other HIV resistance tools, the response models should be able to provide a prediction even when the genotype is the only available information. The data collected in the resistance databases, however, enable exploiting patient related data: for instance the patient’s treatment history, the current immune status, and demographic data.
To allow for both the use of the genotype alone and the exploitation of additional data, the modeling was carried out once with genotype and treatment information only (minimal feature set) and once with all available information (maximal feature set). In operation mode, the model chosen for the prediction depends on the information provided by the user.

A considerable effort within the project was the definition of the standard datum. Briefly, the standard datum is the equivalent of the TCEs in the previous chapter and defines which data from the database are considered for learning and what has to be learned. For instance, if one is interested in virological response to an anti-HIV treatment around three months after onset of the regimen (short-term response), then one has to extract treatments from the database for which a viral load measurement is available around three months of treatment. The standard datum definition can also be a source of error or overfitting. For instance, in the work by Larder et al. (2007) the time between onset of the treatment and the viral load measure is used as a feature. In their study, one treatment with different viral load measurements gave rise to at most three training points for the same statistical model. The setup ensured that the patients in the training and test sets formed disjoint sets. However, the high correlation of the cases, due to multiple samples from the same treatment, in the small test set of 50 instances very likely leads to an overestimation of model performance. Another important issue that the standard datum definition has to address concerns the baseline measurements, i.e. what is the maximal allowed time span between measurement of a biomarker and start of the therapy so that the value can be considered a baseline measurement. For example, a genotype obtained about one year before the treatment start is clearly too outdated to serve as a baseline genotype.
On the other hand, a genotype that was generated a few weeks before treatment start is clearly suitable. In general, the stricter a standard datum definition, the fewer data are available for the statistical learning step; thus the definition has to find a good balance between strictness and the amount of resulting training data. The standard datum definition applied within the EuResist project considers values that were obtained at most three months before start of the regimen as baseline values, provided that there was no intermediate treatment of at least two weeks' length between the measurement and the start of the treatment under consideration. With this definition, EuResist follows the recommendations of The Forum for Collaborative HIV Research.

Figure 5.1: Standard datum. A treatment change is considered a valid training/testing instance if the baseline measures (viral genotype and VL) were obtained at most 90 days before start of the new treatment and given that there was no other treatment lasting longer than 14 days between time point of measurement and start of treatment. A treatment is considered successful if the VL at 8 weeks (±4) is below the limit of detection (here: 500 copies of viral RNA per ml blood serum) or constitutes a 100-fold reduction compared to the baseline value.

Furthermore, the EuResist standard datum focuses on initial response around eight (four to 12) weeks of treatment; one treatment can give rise to only one training instance, since only the measurement closest to the eight-week time point is considered. Thus, the definition facilitated a regression approach, i.e. predicting the change in viral load between the baseline measurement and the follow-up measurement. In addition, as in the development of g2p-THEO, a classification approach was investigated within the project.
To this end, treatment response was dichotomized into success and failure based on the reduction of viral load. Precisely, if the viral load could be reduced below the limit of detection, i.e. 500 copies of viral RNA per ml blood serum, then the treatment was considered a success. Some patients, though, start with extremely high viral load values, and a reduction below the limit of detection is hard to achieve within the short time frame. Thus, alternatively, a success can be achieved by a 100-fold reduction of the viral load compared to the baseline value. Figure 5.1 depicts a schematic overview of the standard datum definition. Modifications of this standard datum definition that focus on sustained response (VL at 24±8 weeks) are studied later in this chapter.

Figure 5.2 depicts the growth of the EuResist Integrated database (EIDB) over the period of the project. The number of patients and HIV sequences almost doubled from initially about 17,000 to now 34,000 entries. Likewise, the number of stored treatments nearly tripled from 35,000 to 98,000. The observed increase of data is not only a result of the increased size of the initial three databases, but also a direct consequence of the addition of new databases. In November 2007 the Luxembourg database joined the EuResist integrated database, and in October 2008 three additional databases started their collaboration with EuResist: IRSICAIXA Foundation (Spain), Catholic University of Leuven (Belgium), and University of Brescia (Italy). More databases are expected to join the EuResist effort.

It is striking that the number of usable standard datum instances is far smaller than the number of sequences in the database. This discrepancy is a result of the strict requirements posed by the standard datum definition. For instance, a sequence has to be available at most 90 days prior to treatment start; thus not all available sequences in the database gave rise to a usable learning instance.
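The standard datum rules described above can be written down compactly. The following is a minimal sketch under stated assumptions, not the project's actual extraction code: all function and parameter names are hypothetical, days are counted from therapy start, a measurement taken on the start day is assumed to be acceptable as baseline, and the two-week rule is read as "an intermediate treatment of 14 days or more disqualifies the baseline".

```python
DETECTION_LIMIT = 500.0  # copies of viral RNA per ml, as stated in the text


def is_valid_baseline(days_before_start, longest_intermediate_treatment_days=0):
    """Accept a baseline value measured at most 90 days before therapy start,
    provided no intermediate treatment of two weeks or more lies in between."""
    return 0 <= days_before_start <= 90 and longest_intermediate_treatment_days < 14


def follow_up_viral_load(measurements, target_day=56, tolerance=28):
    """Return the (day, viral_load) pair closest to week 8 within weeks 4 to 12,
    or None if no measurement falls into that window."""
    in_window = [m for m in measurements if abs(m[0] - target_day) <= tolerance]
    if not in_window:
        return None
    return min(in_window, key=lambda m: abs(m[0] - target_day))


def is_success(baseline_vl, follow_up_vl):
    """Success: VL below the detection limit, or an at least 100-fold reduction."""
    return follow_up_vl < DETECTION_LIMIT or baseline_vl >= 100 * follow_up_vl


# Toy example: three follow-up measurements given as (day of therapy, VL).
measurements = [(20, 9000.0), (50, 700.0), (80, 300.0)]
chosen = follow_up_viral_load(measurements)
label = is_success(100000.0, chosen[1])
```

In the toy example the day-50 measurement is selected because it is closest to day 56; the therapy counts as a success through the 100-fold criterion even though 700 copies per ml is still above the detection limit.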
However, the collection effort increased the available training instances from about 1500 to over 5000. Since not all experiments have been carried out with the most recent release of the EuResist integrated database, we provide for every computer experiment the release date of the database that was used, so that the results can be put in the correct context.

Figure 5.2: Growth of the EuResist integrated database (EIDB). The graph depicts the increase in patients, viral sequences, and recorded treatments in the database over the time of the project (scale on the right). The blue line (scale on the left) corresponds to the increase of standard datum (SD) instances available for training and testing the prediction engine.

5.2 EuResist Prediction Engines

Within the EuResist project four institutions developed statistical models for predicting the response to combination antiretroviral therapy. Three of the prediction engines were selected for the final system to be implemented as a web service. Our contribution, the Evolutionary (EV) engine, was based on the g2p-THEO software described in Chapter 4. The two other contributions, the Generative Discriminative (GD) engine and the Mixed Effects (ME) engine, originate from the Machine Learning Group of the IBM Research Laboratory in Haifa, Israel and from the Department of Computer Science and Automation of the University of Roma TRE located in Rome, Italy, respectively. In the following sections we provide a brief summary for the GD and ME engines. Furthermore, we specify the differences between the original g2p-THEO software and our EV engine.

Figure 5.3: The Bayesian network used in the GD engine.
Variables indicating use of individual drugs are connected to the outcome (black arrows) and to indicators for their drug class or, in the case of the maximal feature set, historic use of a drug from the same class (green arrows). The variables in the middle layer representing the (historic) use of a drug class are also connected to the outcome variable (blue arrows).

5.2.1 Generative Discriminative Engine

The Generative Discriminative (GD) engine applies generative models to derive additional features for the classification using logistic regression. Figure 5.2 shows that only a small percentage of therapies in the database (Table 5.1) have an associated genotype and are therefore suitable for training a classifier that is supposed to receive sequence information. However, a much larger fraction of the therapies can be labeled as success or failure on the basis of the baseline and follow-up VL measures alone, since for the labeling no viral genotype is required. The GD engine thus trains a Bayesian network on about 20,000 therapies (with and without associated genotype). The network is organized in three layers and uses an indicator for the outcome of the therapy, indicators for individual drugs, and indicators for drug classes (Figure 5.3). This generative model is used to compute a probability of therapy success on the basis of the drug combination alone. This probability is used as an additional feature for the classification by a logistic regression model: the discriminative step of the approach. Furthermore, indicators for individual drugs and single mutations are input for the logistic regression. Indicators representing a drug class are replaced with a count of the number of previously used drugs from that class when working with the maximal feature set. In this way information about past treatments is incorporated.
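The difference between the drug-class indicators of the minimal feature set and the history counts of the maximal feature set can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the drug list is restricted to the six compounds shown in Figure 5.3, and "previously used drugs" is assumed to mean distinct drugs across all past regimens.

```python
# Drug classes for the six compounds shown in Figure 5.3.
DRUG_CLASS = {
    "IDV": "PI", "SQV": "PI",
    "3TC": "NRTI", "ZDV": "NRTI",
    "EFV": "NNRTI", "NVP": "NNRTI",
}
CLASSES = ("PI", "NRTI", "NNRTI")


def class_indicators(regimen):
    """Minimal feature set: 1 if the regimen contains a drug of the class, else 0."""
    used_classes = {DRUG_CLASS[drug] for drug in regimen}
    return {cls: int(cls in used_classes) for cls in CLASSES}


def class_history_counts(past_regimens):
    """Maximal feature set: number of distinct previously used drugs per class
    (counting distinct drugs is an assumption of this sketch)."""
    seen_drugs = {drug for regimen in past_regimens for drug in regimen}
    return {
        cls: sum(1 for drug in seen_drugs if DRUG_CLASS[drug] == cls)
        for cls in CLASSES
    }


current = class_indicators(("ZDV", "3TC", "EFV"))
history = class_history_counts([("ZDV", "3TC", "IDV"), ("ZDV", "3TC", "SQV")])
```

For the toy patient, the current regimen activates the NRTI and NNRTI indicators, while the history counts record two previously used PIs, two NRTIs, and no NNRTI.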
In addition to features from the minimal set, the maximal feature set comprises indicators for mutations in previously observed genotypes, the number of past treatment lines, and the VL measure at baseline. Correlation between single mutations and the outcome of the therapy was used to select relevant mutations for the model. A detailed description of the network’s setup and the selected mutations can be found in Rosen-Zvi et al. (2008).

5.2.2 Mixed Effects Engine

The Mixed Effects (ME) engine explores the benefit of including second- and third-order variable interactions. Basically, since modern regimens combine multiple drugs, binary indicators representing usage of two or three specific drugs in the same regimen are introduced. Further indicators represent the occurrence of two specific mutations in the viral genome for modeling interaction effects between them. Moreover, interactions between specific single drugs and single mutations or pairs of drugs and single mutations are represented by additional covariates. In addition to the terms modeling the mixed effects, other clinical measures (VL and CD4+ T cell counts), demographic information (risk factor, country of infection, risk, sex, ...), covariates based on previous treatments (indicators for previous use of drugs and drug classes), and the predicted viral subtype are used as additional covariates. The large number of features (due to the mixed effects) requires a strong effort in feature selection. Thus, multiple feature selection methods were used for generating candidate feature sets. Filters and embedded methods, i.e.
methods that are intrinsically tied to a statistical learning method, were applied sequentially: (i) univariable filters, such as χ² with rank-sum test and correlation-based feature selection (Hall, 1999), were applied to reduce the set of candidate features; (ii) embedded multivariate methods, such as ridge shrinkage (Le Cessie and Van Houwelingen, 1992) and the Akaike information criterion (AIC) selection (Akaike, 1974), were used to eliminate correlated features and to assess the significance of features with respect to the outcome in multivariate analysis. In multiple 10-fold cross-validation runs on the training data, the performance of the resulting feature sets was compared with a t-statistic (adjusted for sample overlap and multiple testing). The approach leading to the best feature set was applied to all training samples to generate the final model. Unlike the GD engine, the ME engine employs one logistic regression model that only uses the maximal feature set. Missing variables, e.g. when working with the minimal feature set, are replaced by the mean (or mode) of that variable in the training data.

5.2.3 Evolutionary Engine

As stated before, one major obstacle in HIV-1 treatment is the development of resistance mutations. The Evolutionary (EV) engine uses derived evolutionary features to model the virus’s expected escape path from drug therapy. The representation of viral evolution is based on mutagenetic trees. Unlike in g2p-THEO, the mixture of mutagenetic trees comprises only two components: the noise component and one tree that was estimated from the data. A further difference from the original approach concerns the way in which mutational patterns that define complete drug resistance against an individual compound are identified. This information is crucial for computing the genetic barrier to drug resistance. The original approach, which is briefly described in Section 4.1.2, uses the available genotype-phenotype pairs.
Since there are only about 1,000 such pairs, it is very likely that mutational patterns that can occur are not observed within the small sample. Here, we used geno2pheno for predicting the drug resistance phenotypes for approximately 16,000 viruses stored in the EIDB. Based on this much larger sample the procedure of identifying resistant patterns was carried out as initially explained. The modified approach resulted in genetic barrier values that were more predictive for response to treatment than the values obtained with the original protocol (increase of 0.02 and 0.027 in AUC and correlation, respectively, in a 10-fold cross-validation on the training set).

The genetic barrier to drug resistance was provided together with other features, like indicators for individual drugs in the treatment and indicators for IAS mutations in the genotype as in the prototype g2p-THEO. In the maximal feature set, indicators for previous use of a drug and the baseline VL measure extend this list. As a general improvement over the original g2p-THEO prototype, interactions up to second order between indicator variables were considered as well. This was achieved by introducing further indicator variables that explicitly model these interactions.

The resulting large number of covariates required the implementation of a feature selection step as part of the training process of the EV engine. The chosen feature selection approach was based on SVMs with a linear kernel. The approach works in three steps: (i) optimization of the misclassification cost parameter C in a 10-fold cross-validation setting to maximize the area under the ROC curve (AUC); (ii) generation of 25 different SVMs by five repetitions of five-fold cross-validation using the optimized cost parameter; (iii) computation of the z-score for every feature. All features with a mean z-score larger than 2 were selected for the final model based on logistic regression.
5.3 Combining Classifiers

In order to provide the end-user of the prediction system with a single outcome for a request, instead of multiple results generated by the different engines, the three classifications have to be combined into one final decision. In principle, there are two approaches for combining classifiers, namely classifier fusion and classifier selection. In classifier fusion, complete information on the feature space is provided to every individual system and all outputs from the systems have to be combined; in classifier selection, every system is an expert in a specific domain of the feature space and the local expert alone decides the output of the ensemble. However, the individual classifiers described above were designed to be global experts; thus, only classifier fusion methods were explored.

Methods for classifier fusion can operate on class labels or continuous values (e.g. support, posterior probability) provided by every classifier. The methods range from simple non-trainable combiners like the majority vote to very sophisticated methods that require an additional training step. In order to find the best combination method we compared several approaches ranging from simple methods to more sophisticated ones. All results were compared to a combination that has access to an oracle telling which classifier is correct. Intuitively, the predictive performance of this oracle represents the upper bound on the performance that can be achieved by combining the classifiers. The following subsections briefly introduce the combination methods considered.

Non-trainable Combiners

As mentioned above, there are a number of simple methods to combine outputs from multiple classifiers. The most intuitive one is a simple majority vote, whereby every individual classifier computes a class label (in this case success or failure) and the label that receives the most votes is the output of the ensemble.
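The majority vote can be sketched in a few lines. This is a minimal illustration, not the implementation used in the prediction system: the function names, the example probabilities, and the 0.5 cut-off for turning a predicted success probability into a label are all assumptions made for the example.

```python
from collections import Counter


def probs_to_labels(probs, threshold=0.5):
    """Turn success probabilities into class labels.

    The 0.5 threshold is an assumption made for this sketch.
    """
    return ["success" if p >= threshold else "failure" for p in probs]


def majority_vote(labels):
    """Return the label that receives the most votes."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three engines for one therapy request.
engine_probs = [0.81, 0.42, 0.67]
ensemble_label = majority_vote(probs_to_labels(engine_probs))
```

With three classifiers and two classes a tie cannot occur, so no tie-breaking rule is needed in this setting.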
One can also combine the posterior probability of observing a successful treatment as computed by the logistic regression. This continuous measure can be combined using further simple functions: mean returns the mean probability of success by the three classifiers (Kittler et al., 1998); min yields the minimal probability of success (a pessimistic measure); max results in the maximal predicted probability of success (an optimistic measure); median returns the median probability.

Meta-classifiers

The use of meta-classifiers is a more sophisticated method of classifier combination, which uses the individual classifiers’ outputs as input for a second classification step. This allows for weighting the output of the individual classifiers. In this work we applied quadratic discriminant analysis (QDA), logistic regression, decision trees, and naïve Bayes (operating on class labels) as meta-classifiers.

Decision Templates and Dempster-Shafer

The decision template combiner was introduced by Kuncheva et al. (2001). The main idea is to remember the most typical output of the individual classifiers for each class, termed decision template. Given the predictions for a new instance by all classifiers, the class with the closest (according to some distance measure) decision template is the output of the ensemble. Let $\vec{x}$ be an instance, then $DP_{\vec{x}}$ is the associated decision profile. The decision profile for an instance contains the support (e.g. the posterior probability) by every classifier for every class. Thus, $DP_{\vec{x}}$ is an $I \times J$ matrix, where $I$ and $J$ correspond to the number of classifiers and classes, respectively. The decision template combiner is trained by computing the decision templates $DT$ for every class. The $DT$ for the class $j$ is simply the mean of all decision profiles for instances $\vec{x}$ belonging to that class. Hence,

\[ DT_j = \frac{1}{N_j} \sum_{\vec{x} \in \omega_j} DP_{\vec{x}}, \quad \forall j \in \{1, \ldots, J\}, \]

where $N_j$ is the number of elements in class $\omega_j$.
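A minimal sketch of decision template training, assuming decision profiles are stored as nested lists with one row per classifier and one column per class. The function names and the toy profiles are hypothetical, and the squared Euclidean distance in the lookup is only one possible choice of distance measure.

```python
def mean_matrix(mats):
    """Element-wise mean of equally shaped matrices (lists of rows)."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / n for c in range(cols)]
            for r in range(rows)]


def train_templates(profiles, labels):
    """Decision template of a class = mean decision profile of its instances."""
    by_class = {}
    for dp, y in zip(profiles, labels):
        by_class.setdefault(y, []).append(dp)
    return {y: mean_matrix(dps) for y, dps in by_class.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def classify(templates, dp):
    """Return the class whose decision template is closest to the profile."""
    return min(templates, key=lambda y: squared_distance(templates[y], dp))


# Toy decision profiles: 3 classifiers (rows) x 2 classes (success, failure).
profiles = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
    [[0.7, 0.3], [0.6, 0.4], [0.9, 0.1]],
    [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]],
    [[0.4, 0.6], [0.2, 0.8], [0.5, 0.5]],
]
labels = ["success", "success", "failure", "failure"]
templates = train_templates(profiles, labels)
predicted = classify(templates, [[0.75, 0.25], [0.65, 0.35], [0.6, 0.4]])
```

Because the templates are plain class-wise means, training needs a single pass over the labeled decision profiles.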
For a new sample, the corresponding decision profile is computed and compared with the decision templates for all classes using a suitable distance measure. The class with the closest decision template is the output of the ensemble. Thus, the decision template combiner is a nearest-mean classifier that operates on decision space rather than on feature space. We used the squared Euclidean distance to compute the support for every class:

\[ \mu_j = 1 - \frac{1}{JI} \sum_{j'=1}^{J} \sum_{i=1}^{I} \left[ DT_j(j', i) - DP_{\vec{x}}(j', i) \right]^2 , \]

where $DT_j(j', i)$ is the $(j', i)$-th entry in $DT_j$. Decision templates were reported to outperform other combiners (e.g. Kuncheva et al., 2001; Kuncheva, 2002). Decision templates can also be used to compute a combination that is motivated by the evidence combination of the Dempster-Shafer theory. Instead of computing the similarity between a decision template and the decision profile, a more complex computation is carried out as described in detail in Rogova (1994). We refer to these two methods as Decision Templates and Dempster-Shafer, respectively.

Clusters in Decision Space

Regions in decision space where the classifiers disagree on the outcome are of particular interest in classifier combination. Therefore, we propose the following method that finds clusters in decision space and learns separate logistic regression models for every cluster for fusing the individual predictions. Let $s_i$ be the posterior probability of observing a successful treatment predicted by classifier $i$. Then we express the (dis)agreement between two classifiers by computing:

\[ \alpha_{ij} = \begin{cases} 0 & \text{if classifiers } i \text{ and } j \text{ agree on the label,} \\ s_i - s_j & \text{else,} \end{cases} \]

for all $i$ and $j$ where $i < j$. Thus, in case of disagreement between two classifiers, the computed value expresses the magnitude of disagreement. These agreements are computed for all instances of the training set and used as input to a k-medoid clustering (Rousseeuw and Kaufman, 1990).
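The pairwise (dis)agreement values can be sketched as follows. The function name is hypothetical, and the 0.5 threshold used to decide whether two classifiers agree on the label is an assumption of this example; the clustering step itself is not shown.

```python
from itertools import combinations


def disagreement_features(probs, threshold=0.5):
    """Compute alpha_ij for all classifier pairs with i < j.

    alpha_ij is 0 when classifiers i and j predict the same label and
    s_i - s_j otherwise. The 0.5 label threshold is an assumption.
    """
    feats = []
    for i, j in combinations(range(len(probs)), 2):
        same_label = (probs[i] >= threshold) == (probs[j] >= threshold)
        feats.append(0.0 if same_label else probs[i] - probs[j])
    return feats


# Hypothetical success probabilities from three engines.
alphas = disagreement_features([0.8, 0.7, 0.3])  # pairs (1,2), (1,3), (2,3)
```

For three classifiers this yields a three-dimensional vector per instance, which is the representation a clustering algorithm would then operate on.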
For all resulting $k$ clusters, an individual logistic regression is trained on all instances associated with the cluster using the $s_i$ as input. The idea is that in clusters where, e.g., classifiers 1 and 2 agree and classifier 3 tends to predict lower success probabilities, the logistic regression can either increase or decrease the influence of classifier 3, depending on how often predictions by that classifier are correct or incorrect, respectively. When a new instance has to be classified, the agreement between the classifiers is first computed for locating the closest cluster. In a second step the logistic regression associated with that cluster is used to calculate the output of the ensemble. The number of clusters $k$, the only parameter of this method, is optimized in a 10-fold cross-validation. The approach is motivated by the behavior knowledge space (BKS) method (Huang and Suen, 1995), which uses a look-up table to generate the output of the ensemble. However, the BKS method is known to easily overtrain, and does not work with continuous predictions.

Local Accuracy-based Weighting

Woods et al. (1997) propose a method that uses one k-nearest-neighbor (knn) classifier for every individual classifier to assess the local accuracy of that classifier given the input features. The final output is then solely given by the most reliable classifier of the ensemble. Since the three classifiers in this setting are trained to be global experts, we applied the proposed method to compute the reliability estimate for each classifier given the features of an instance. In contrast to the method proposed by Woods et al. (1997), the output is a weighted mean based on these reliability estimates. In order to use a knn classifier as a reliability estimator, the labels from the original instances are replaced by indicators of whether the classifier in question was correct on that instance or not.
With the replaced labels and the originally used features the knn classifier reports the fraction of correctly classified samples in the neighborhood of the query instance. This fraction can be used as a local reliability estimator. The output by the ensemble is then defined by a weighted mean:

\[ \bar{s} = \frac{\sum_i r_i s_i}{\sum_i r_i} , \]

where $s_i$ and $r_i$ are the posterior probability of observing a successful treatment and the local accuracy for classifier $i$, respectively. For simplicity, only Euclidean distance was used in the knn classifier; the number of neighbors $k$ was optimized in a 10-fold cross-validation setting.

Combining Classifiers on the Feature Level

As described in Section 5.2, every individual classifier uses a different feature set, specifically, different derived features, but the same statistical learning method. Thus, a further combination strategy is the use of all features selected for the individual classifiers as input to a single logistic regression rather than computing a consensus of the individual classifiers’ predictions.

5.4 Predictive Performance of the EuResist Prediction Engine

5.4.1 Data

About 3,000 instances in the EIDB (version from 2007/11/27) met the requirements of the standard datum definition and can therefore be used as learning instances. From this complete set, 10% of the data were randomly set aside and used as an independent test set (Table 5.1). The split of the training data into 10 equally sized folds was fixed, allowing for 10-fold cross-validation of the individual classifiers. The same 10 folds were used for a 10-fold cross-validation of the combination approaches. Classification performance was measured in terms of accuracy (i.e. the fraction of correctly classified examples) and the area under the receiver operating characteristics curve (AUC).
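The AUC can be computed directly as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below uses this pairwise formulation on made-up scores; the function name and the data are assumptions for illustration, and the quadratic pair loop is meant for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (success), 0 for negative (failure).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted success probabilities and true outcomes.
example_auc = auc([0.9, 0.8, 0.4, 0.6, 0.3], [1, 1, 1, 0, 0])
```

In this toy example five of the six positive-negative pairs are ordered correctly, so the value is 5/6.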
Briefly, the AUC is a value between 0.0 and 1.0 and corresponds to the probability that a randomly selected positive example receives a higher score than a randomly selected negative example (Fawcett, 2006). Thus, a higher AUC corresponds to a better performance.

5.4.2 Results

Results for the individual classifiers using the minimal and maximal feature set are summarized in Table 5.2. The use of the extended feature set significantly improved the performance of the GD and EV engine with respect to the AUC (p ≈ 0.002 for both using a paired Wilcoxon test). With respect to accuracy only the improvement observed for the EV engine reached statistical significance (p = 0.007). Remarkably, replacement of all missing additional features in the case of the ME engine when working with the minimal feature set did not result in a significant loss in performance (p = 0.313 and p = 0.312 with respect to AUC and accuracy, respectively).

                  EIDB      Labeled Therapies   Training Set   Test Set
Patients          18,467    8,233               2,389          297
Sequences         22,006    3,492               2,722          301
VL measurements   240,795   40,498              5,444          602
Therapies         64,864    20,249              2,722          301
Successes         -         13,935              1,822          202
Failures          -         6,314               900            99

Table 5.1: Summary of the EuResist Integrated Database (version from 2007/11/27) and training and test set. The table displays the number of patients, sequences, VL measurements, and therapies for the complete EuResist Integrated Database (EIDB) and the set of therapies that could be labeled with the standard datum definition. 469 of the sequences associated with all labeled therapies belong to historic genotypes and are not directly associated with a therapy change. Moreover, detailed information on training set and test set (comprising labeled therapies with an associated sequence each) is given.

          minimal feature set                            maximal feature set
          AUC                    Accuracy                AUC                    Accuracy
Engine    Train          Test    Train          Test     Train          Test    Train          Test
GD        0.747 (0.027)  0.744   0.745 (0.024)  0.724    0.768 (0.025)  0.760   0.752 (0.028)  0.757
ME        0.758 (0.019)  0.745   0.748 (0.031)  0.757    0.762 (0.021)  0.742   0.754 (0.030)  0.757
EV        0.766 (0.030)  0.768   0.754 (0.031)  0.748    0.789 (0.023)  0.804   0.780 (0.032)  0.751

Table 5.2: Results for the individual classifiers on training set and test set. The table displays the performance, measured in AUC and Accuracy, achieved by the individual classifiers on the training set (using 10-fold cross-validation; standard deviation in brackets) and the test set using different feature sets.

Correlation among classifiers

The performances of the individual classifiers were very similar. Pearson’s correlation coefficient (r) indicated that the predicted probabilities of success for the training instances using the minimal (maximal) feature set were highly correlated (i.e. close to 1.0): GD-ME 0.812 (0.868); GD-EV 0.797 (0.786); ME-EV 0.774 (0.768). In fact, the three classifiers agreed on the same label in 80.4% (81.7%) of the cases using the minimal (maximal) feature set. Notably, agreement of the three classifiers on the wrong label occurred more frequently in instances labeled as failure than in instances labeled as success (39% vs. 4% and 37% vs. 4% using the minimal and maximal feature set, respectively; both $p < 2.2 \times 10^{-16}$ with Fisher’s exact test). This behavior led to further investigation of the instances labeled as failures in the EIDB. Indeed, 145 of 350 failing instances that were predicted to be a success by all three engines achieved a VL below 500 copies per ml once during the course of the therapy. However, this reduction was not achieved during the time interval that was used in the applied definition of therapy success. Among the remaining 550 failing cases this occurred only 100 times. Using Fisher’s exact test this difference was highly significant ($p = 4.8 \times 10^{-14}$).
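The comparison of 145 of 350 against 100 of 550 corresponds to a 2 × 2 contingency table, and the test can be reproduced in outline with the hypergeometric distribution. The sketch below is an assumption-laden illustration: it implements a common two-sided variant (summing the probabilities of all tables that are no more likely than the observed one), which may differ in detail from the routine used for the reported value, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at most as likely as the observed one
    # (small relative tolerance guards against rounding).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Rows: failures predicted as success by all engines / remaining failures.
# Columns: VL below 500 copies per ml at least once / never.
p_value = fisher_exact_two_sided(145, 350 - 145, 100, 550 - 100)
```

The resulting `p_value` can be compared against the order of magnitude reported in the text.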
These results were qualitatively the same when using the maximal feature set.

Results of combination methods

Table 5.3 summarizes the results achieved by combining the individual classifiers, and Figure 5.4 depicts the improvement in AUC on training and test set of the combination methods compared to the single best and single worst classifier, respectively. Most combination methods improved performance significantly over the single worst classifier. However, only the oracle could establish a significant improvement over the single best classifier. Overall, performances of the combination approaches were quite similar. Of note, the pessimistic min combiner yielded better performance than the optimistic max combiner. Among the non-trainable approaches tested, the mean combiner yielded the best performance. The logistic regression was the best performing meta-classifier. In fact, the logistic regression can be regarded as a weighted mean, with the weights depending on the individual classifier’s accuracy and the correlation between classifiers. Moreover, using all features of the individual classifiers as input to a single logistic regression did not improve over the single best approach.

           minimal feature set                            maximal feature set
           AUC                    Accuracy                AUC                    Accuracy
Method     Train          Test    Train          Test     Train          Test    Train          Test
SB         0.766 (0.030)  0.768   0.754 (0.031)  0.748    0.789 (0.023)  0.804   0.780 (0.032)  0.751
Oracle     0.914 (0.015)  0.911   0.842 (0.025)  0.844    0.917 (0.013)  0.920   0.850 (0.022)  0.860
Min        0.771 (0.020)  0.765   0.746 (0.027)  0.761    0.792 (0.021)  0.793   0.760 (0.030)  0.764
Max        0.760 (0.023)  0.765   0.742 (0.030)  0.731    0.779 (0.021)  0.779   0.757 (0.030)  0.741
Median     0.773 (0.020)  0.766   0.759 (0.027)  0.766    0.789 (0.029)  0.786   0.768 (0.029)  0.761
Mean       0.777 (0.020)  0.772   0.760 (0.024)  0.744    0.794 (0.019)  0.793   0.780 (0.028)  0.781
Majority   0.683 (0.023)  0.660   0.759 (0.027)  0.738    0.697 (0.027)  0.683   0.768 (0.029)  0.761
QDA        0.771 (0.020)  0.763   0.755 (0.031)  0.738    0.790 (0.022)  0.794   0.769 (0.027)  0.764
LR         0.778 (0.021)  0.774   0.762 (0.028)  0.744    0.798 (0.020)  0.805   0.781 (0.030)  0.771
DTrees     0.718 (0.044)  0.741   0.748 (0.032)  0.757    0.722 (0.033)  0.678   0.777 (0.032)  0.757
NB         0.732 (0.027)  0.740   0.759 (0.027)  0.738    0.752 (0.028)  0.753   0.768 (0.029)  0.761
DT         0.777 (0.021)  0.774   0.755 (0.027)  0.754    0.796 (0.019)  0.797   0.766 (0.026)  0.767
D-S        0.777 (0.021)  0.772   0.755 (0.024)  0.754    0.796 (0.019)  0.796   0.767 (0.026)  0.764
Cluster    0.775 (0.019)  0.773   0.758 (0.029)  0.741    0.797 (0.018)  0.800   0.783 (0.028)  0.784
LA         0.777 (0.020)  0.771   0.761 (0.025)  0.741    0.795 (0.019)  0.791   0.781 (0.029)  0.777
Feature    0.750 (0.026)  0.747   0.745 (0.029)  0.751    0.786 (0.021)  0.779   0.780 (0.029)  0.767

Table 5.3: Results for the combined classifiers on training and test set. The table summarizes the results achieved by the different combination approaches on the training set (10-fold cross-validation; standard deviation in brackets) and the test set. The reference methods are Single Best (SB) and Oracle. The non-trainable combiners are named according to their function and the meta-classifiers according to the statistical learning methods; logistic regression, decision trees, and Naïve Bayes are abbreviated by LR, DTrees, and NB, respectively. Decision Templates (DT), Dempster-Shafer (D-S), Clustering (Cluster) and Local Accuracy (LA) are the methods described in detail in Section 5.3. Feature is the combination on the feature level.

Figure 5.5 shows the learning curves for the three individual classifiers, the mean combiner, and the combiner on the feature level. The curves depict the mean AUC (after 10 repetitions) on the test set achieved with varying sizes of the training set (25, 50, 100, 200, 400, 800, 1600, 2722). In every repetition the training samples were randomly selected from the complete set of training instances.
The mean combiner appeared to learn faster and significantly outperformed the single best engine with a training set size of 200 samples (p ≈ 0.010 with a paired one-sided Wilcoxon test). The improvement remained significant up to a training set size of 1,600 samples (p ≈ 0.002). The combination on feature level was significantly worse (p ≈ 0.002) than the worst single approach for all training set sizes (except for the complete set).

Figure 5.4: Improvement in AUC of combination methods compared to the single best and single worst classifiers. The figure displays the improvement in AUC of all combination methods over the single best (blue bars) and single worst (red bars) classifiers on the training set (upper panel) and the test set (lower panel). Significance of the improvement on the training set was computed with a one-sided paired Wilcoxon test. Solidly colored bars indicate significant improvements (at a 0.05 p-value threshold), as opposed to lightly shaded bars for insignificant improvements. On the test set no p-values could be computed.

Finally, Figure 5.6 depicts the ROC curves of the three individual engines and the combined engine (mean) using both feature sets on the test set. The results are compared to a Stanford HIVdb based GSS prediction (for details see e.g. Section 4.3). The Stanford HIVdb service did not support the rarely used drug ddC. Consequently, the four instances in the test set containing ddC were excluded. Moreover, Stanford HIVdb did not provide prediction results for two sequences. Thus, the reported AUC values (in parentheses in the figure legend) differ slightly from the values reported in Tables 5.2 and 5.3 as only 295 of the 301 test instances were used for generating the figure.

Impact of Ambiguous Failures

In order to further study the impact of ambiguous failures (i.e.
instances labeled as failure but achieving a VL below 500 copies per ml once during the course of treatment) on the performance of the individual classifiers and the combination by mean or on the feature level, they were removed from the training set, the test set, or both sets. After removal the classifiers were retrained and tested on the resulting new training and test set, respectively. The results in Table 5.4 suggest that training with the ambiguous failures does not impact the classification performance (columns “none” vs. “only train”, and columns “only test” vs. “both”). However, the ambiguous cases have great impact on the assessed performance. Removal of these cases increases the resulting AUC by about 0.05.

However, there might still be an influence of these ambiguous failures on the performance of the trainable combination methods. For verification we removed these cases whenever performance measures were computed (also in 10-fold cross-validation) and trained a selection of the combination methods on the complete training data and on the cleaned (i.e. without ambiguous failures) training data. The results in Table 5.5 suggest that the trainable combination methods were not biased by the ambiguous failures.

Figure 5.5: Learning curves for the individual classifiers, the mean combiner, and the combination on feature level. The figure shows the development of the mean AUC on the test set (y-axis) depending on the amount of available training data (x-axis: size of training set) for the individual classifiers (EV, GD, ME), the mean combiner, and the combination on the feature level using the minimal feature set. Error bars indicate the standard deviation on 10 repetitions.
A possibility for circumventing the need for dichotomization of virological response is the prediction of change in VL between the baseline value and the measurement taken at the follow-up time point. Logistic regression was replaced by linear regression in the individual classifiers for predicting the change in VL. Using the maximal feature set the GD (ME, EV) engine achieved a correlation (r) of 0.658±0.023 (0.664±0.023, 0.679±0.020) on the training set (Rosen-Zvi et al., 2008). The mean combiner yielded a correlation of 0.691±0.019. The oracle, in contrast, achieved r = 0.834±0.012. Although small, the difference between EV and the mean prediction reached statistical significance (p ≈ 0.005) using a one-sided paired Wilcoxon test. Results on the test set were qualitatively the same: GD (ME, EV) reached a correlation of 0.657 (0.642, 0.678) and the mean combiner (oracle) reached 0.681 (0.814).

Figure 5.6: ROC curves for the prediction engines on the test data. The left (right) plot depicts ROC curves (true positive rate against false positive rate) for the individual prediction engines and the combination using the mean combiner when working with the minimal (maximal) feature set. Results are compared to the performance of the Stanford HIVdb based genotypic susceptibility score. AUC values in the figure legend, minimal feature set: EV 0.762, GD 0.737, ME 0.736, combined 0.766, Stanford 0.726; maximal feature set: EV 0.797, GD 0.752, ME 0.735, combined 0.786, Stanford 0.726.

5.4.3 Discussion

The performance of the methods considered for combining the individual classifiers improved only little over the single best method on both sets of available features. It turns out that the simple non-trainable methods perform quite well, especially the mean combiner.
This phenomenon has been previously discussed in the literature (Kuncheva et al., 2001; Liu and Yuan, 2001). Here we focused on finding the best combination strategy for a particular task. The advantage of the mean combiner is that it does not require an additional training step (and therefore no additional data), although it ranks among the best methods studied. Moreover, this combination strategy is easy to explain to end-users of the prediction system.

The learning curves in Figure 5.5 show that the mean combiner learns faster (gives more reliable predictions with fewer training data) than the individual prediction systems. Moreover, the curves show that the combined performance is not dominated by the single best approach as the results on the full training set might suggest. Furthermore, the learning curve for the combination on the feature level indicates that more training data is needed to achieve full performance. In general, combining the three individual approaches leads to a reduction of the standard deviation for almost all combination methods.

           minimal feature set                        maximal feature set
           ambiguous instances removed from           ambiguous instances removed from
Engine     none    only Train  only Test  both        none    only Train  only Test  both
GD         0.744   0.738       0.784      0.786       0.760   0.747       0.808      0.806
ME         0.745   0.739       0.770      0.771       0.742   0.757       0.808      0.810
EV         0.768   0.776       0.811      0.824       0.804   0.812       0.846      0.855
Mean       0.772   0.767       0.812      0.814       0.793   0.791       0.849      0.849
Feature    0.747   0.754       0.797      0.808       0.779   0.787       0.832      0.842

Table 5.4: Results on the (un)cleaned test set when individual classifiers are trained on the (un)cleaned training set. The table summarizes the results, measured in AUC, for the individual classifiers, the mean combiner, and the combination on the feature level when retrained on the (un)cleaned training set and tested on the (un)cleaned test set. Cleaned refers to the removal of ambiguous failing instances.
This suggests a more robust behavior of the combined system.

In the cases of failing regimens, all three classifiers very frequently agree upon the wrong label, precisely in 350 of 900 (39%) failing regimens in the training data using the minimal feature set. There are two possible scenarios why the VL drop below 500 copies per ml did not take place during the observed time interval despite the concordant prediction of success by all three engines:

1. Resistance against one or more antiretroviral agents is not visible in the available baseline genotype but stored in the viral population and rapidly selected, which would lead to an initial decrease in VL shortly after therapy switch, and a subsequent rapid increase before the target time frame (Figure 5.7 (b) lower right).

2. The patient/virus is heavily pretreated and therefore takes longer to respond to the changed regimen, or the patient is not completely adherent to the regimen; both cases lead to a delayed reduction in VL after the observed time frame (Figure 5.7 (b) lower left).

Figure 5.7 (a) shows the distribution of predicted success provided by the mean combiner using the minimal feature set. There is a clear peak around 0.8 for instances labeled as success whereas the predictions for the failing cases seem to be uniformly distributed. Interestingly, the distribution of the failing cases with a VL below 500 copies per ml resembles more the distribution for success than for failure.

The approach to predicting the change in VL exhibited moderate performance. In general, the task of predicting change in VL is harder, since many host factors, which are not available to the prediction engines, contribute to the effective change in individual patients. However, guidelines for treating HIV patients recommend a complete suppression of the virus below the limit of detection (Hammer et al., 2008).
Thus, dichotomizing the outcome and instead solving the classification task is an adequate solution, since classifiers can be used for computing the probability of achieving complete suppression.

                               minimal feature set       maximal feature set
Method                         Train           Test      Train           Test
removed from test only
Single Best                    0.809 (0.021)   0.811     0.839 (0.017)   0.847
Oracle                         0.935 (0.012)   0.936     0.945 (0.014)   0.950
Min                            0.817 (0.019)   0.807     0.847 (0.022)   0.848
Max                            0.807 (0.024)   0.810     0.832 (0.018)   0.824
Median                         0.820 (0.020)   0.810     0.844 (0.021)   0.835
Mean                           0.823 (0.019)   0.816     0.850 (0.019)   0.847
Logistic Regression            0.824 (0.019)   0.816     0.852 (0.017)   0.856
Decision Templates             0.823 (0.018)   0.818     0.851 (0.019)   0.850
Clustering                     0.822 (0.020)   0.808     0.852 (0.017)   0.850
Local Accuracy                 0.823 (0.019)   0.813     0.850 (0.019)   0.844
removed from train and test
Logistic Regression            0.825 (0.019)   0.816     0.852 (0.017)   0.856
Decision Templates             0.823 (0.018)   0.818     0.851 (0.019)   0.850
Clustering                     0.822 (0.021)   0.796     0.852 (0.017)   0.844
Local Accuracy                 0.823 (0.019)   0.813     0.850 (0.019)   0.843

Table 5.5: AUC for the combined engines on training set and test set with the ambiguous cases removed from test set and training set or test set only. The table displays the results, measured in AUC, on training set (10-fold cross-validation; standard deviation in brackets) and test set for a selection of combination approaches when trained on the (un)cleaned training set. For computation of the AUC the ambiguous cases were always removed.

5.4.4 Conclusion

The use of the maximal feature set consistently outperformed that of the minimal feature set in the combined system. Among the studied combination approaches the logistic regression performed best, although not significantly better than the mean of the individual classifications. The mean is a simple and effective combination method for this scenario.
Variations in the size of the training set showed that a system combining the individual classifiers by the mean achieves better performance with fewer training samples than the individual classifiers themselves or a logistic regression using all the features of the individual classifiers. This and the consistent reduction of the standard deviation of the performance measures lead to the conclusion that the mean combiner is more robust than the individual classifiers, although the performance is not always significantly improved. Moreover, the mean is a combination strategy that is easily explainable to the end-users of the system.

In this study we discovered ambiguous failures. These therapies are classified as failure but have a VL measurement below 500 copies per ml. Although these instances significantly influenced neither the learning of the individual classifiers nor the learning of the combination method, they lead to an underestimation of the performance. This suggests that clinically relevant adjustments of the definition of success and failure can result in increased accuracy of the combined engine.

Figure 5.7: (a) Distribution of predicted success probabilities: distribution of the predicted success for all successful therapies (blue solid), all failing therapies (red solid), failing therapies with at least one VL measure below 500 during the regimen (red dashed), and failing therapies with all VL measures above 500 (red dotted) of the mean combiner using the minimal feature set. (b) Different viral load trajectories. The top row depicts a clear success (left) and a clear failure (right). The bottom row depicts trajectories that are labeled as failures because the VL closest to eight weeks of treatment exceeds the limit of detection (horizontal line). Those cases are nevertheless often predicted as success.
The training and test data comprise patients with widely differing levels of pre-treatment, i.e. from therapy-naïve patients to patients receiving their 15th antiretroviral regimen. Hence, it is of interest how the engine performs for those patients. Figure 5.8 (a) depicts the performance of the EV engine for different groups of patients measured by 10-fold cross-validation on the training set (EuResist release from 2008/10/10). The accuracy is highest for naïve patients and, interestingly, the AUC is lowest for this group. The low AUC can be explained by the fact that all patients in that group receive high predicted success probabilities, and a substantial fraction of the few failing regimens might indeed be ambiguous failures. Nevertheless, the AUC rapidly increases and reaches 0.84 for the group with the highest level of pre-treatment. The accuracy, on the other hand, decreases from the initial 0.82 and stabilizes at 0.75. Again, usage of the maximal feature set shows a better performance in all groups (except for naïve patients).

An older version of the EuResist engine (trained on the release from 2007/08/29) was compared to the opinion of 12 international HIV treatment experts (who were allowed to use any available interpretation algorithm) on 25 cases from the test set. Only ten experts handed in all their predictions and were therefore considered. The result of this Engine vs. Experts (EVE) study is shown in Figure 5.8 (b). It turned out that the prediction engine performed as accurately as the best expert (expert 8).
Moreover, the EuResist prediction engine provides only six wrong predictions, and interestingly, the decision profile of the prediction engine is very similar to the consensus decision of all ten experts, which also makes six mistakes.

Figure 5.8: (a) Group-wise performance of the EV engine with respect to pre-treatment level. Bars denote the mean number of samples in the group. Green (red) denotes the mean number of successful (failing) therapies in the group. Performance is measured in AUC (solid line) and accuracy (dashed line) using the minimal (black) and maximal (blue) feature set. (b) Result of the EVE study. Every row corresponds to one case, and the columns denote the true outcome of the treatment, the EuResist prediction, predictions by the 10 experts, and the consensus of the expert predictions. Green relates to (predicted) success and red to (predicted) failure.

5.5 The EuResist Web Service

The prediction engines developed by the three institutes are hosted on servers within these institutes. Every institute is running a SOAP (once defined as: simple object access protocol) server. SOAP defines which data may be exchanged between a client and a server, and in which format. In essence, SOAP is the successor of XML-RPC. The specifications are usually written down in a WSDL (web service description language) file. Modern programming languages support the construction of SOAP clients and servers on the basis of the WSDL files.
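As a rough illustration of what travels over such a service, the sketch below builds a SOAP 1.1 request envelope with Python's standard library. The operation name `predict`, its parameters, and the service namespace are hypothetical placeholders; the actual operations and message formats of the EuResist engines are defined by their WSDL files and are not reproduced here.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:prediction"  # hypothetical; a real WSDL defines this

def build_request(operation, params):
    """Return a SOAP 1.1 envelope (bytes) that calls `operation` with `params`."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")

# Hypothetical query: a mutation list and a candidate regimen.
request = build_request("predict", {"mutations": "M184V,K103N", "drugs": "ZDV,3TC,EFV"})
print(request.decode("utf-8"))
```

In practice a client library would generate such messages from the WSDL file, so hand-built envelopes like this one are mainly useful for understanding or debugging the exchange.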
Each engine has its own endpoint that conveniently provides the required WSDL file upon request (i.e. by adding "?wsdl" to the endpoint of the engine): EV engine [2], GD engine [3], and ME engine [4]. Thus, once the endpoint of an engine is known, virtually everyone can build a SOAP client to send valid queries to the server. The structure of the SOAP services selected in the EuResist project allows every institution to implement the service in a preferred programming language. Moreover, the use of SOAP services allows for a variety of tools: web sites, stand-alone programs, or integration into existing programs for management of clinical data. If preferred, only a subset of the engines can be queried as well. All the different clients accessing the array of prediction engines can be implemented without any adaptation of the underlying SOAP servers. Indeed, the EuResist prediction engine was integrated into InfCare HIV (http://www.infcare.se), a Swedish management tool for HIV data.

5.5.1 Implementation of the EV Engine

The SOAP server of the EV engine is written in PHP. Functions that are specific to the EV engine, especially the computation of the genetic barrier to drug resistance, are implemented in C++ and made available to PHP via a PHP extension. The original computation of the genetic barrier to drug resistance was rather slow, mainly due to the requirement of computing all transition probabilities in the mixture of mutagenetic trees. The speed-up was achieved by precomputing all possible values of the genetic barrier offline. This was possible since the mixture of mutagenetic trees perceives the virus only as a binary pattern of length l, with l being the number of predefined events (Section 4.1.2). Thus, there is only a finite set of possible values the genetic barrier can take. This workaround facilitates fast response times of the web service.
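The precomputation idea can be sketched in a few lines of Python: because a pattern of l binary events has only 2^l possible states, an expensive function can be evaluated once per state offline and replaced by a table lookup online. The function `slow_barrier` below is a deliberately trivial stand-in and is not the genetic barrier formula of the thesis; all names are illustrative.

```python
from itertools import product

def slow_barrier(pattern):
    """Stand-in for the expensive computation (NOT the thesis's formula):
    here simply the fraction of events that have not occurred yet."""
    return pattern.count(0) / len(pattern)

def precompute(l, barrier=slow_barrier):
    """Evaluate `barrier` once for each of the 2**l binary patterns."""
    return {p: barrier(p) for p in product((0, 1), repeat=l)}

# Done once offline, here for l = 4 predefined events.
TABLE = precompute(4)

def fast_barrier(pattern):
    """Online lookup instead of recomputation."""
    return TABLE[tuple(pattern)]

print(len(TABLE), fast_barrier([1, 0, 0, 1]))
```

The table grows exponentially in l, so this trade of memory for speed is only attractive while the number of predefined events stays small.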
[2] http://euresist.bioinf.mpi-inf.mpg.de/prediction/?wsdl
[3] http://srv-peres.haifa.il.ibm.com:9080/EuResistIbmWeb/services/EuResistIBMEngine?wsdl
[4] http://www.informadoc.net/euresistibmserviceinterfaces.asmx?wsdl

5.5.2 EuResist Web Tool

The most visible portal for accessing the EuResist prediction engine is the dedicated web site (http://engine.euresist.org), which was created by one of the EuResist partners. The web site is responsible for collecting the data that is sent to the array of prediction engines. Figure 5.9 displays the data input page of the web site. Mandatory information for a successful request is the HIV sequence comprising protease and reverse transcriptase. If no further information is provided, the sequence is aligned to a consensus sequence; the list of mutations along with a predefined list of putative treatments is sent to the three individual engines. The user can also provide optional information that is available on the patient: the number of previous treatment lines, previously taken antiretroviral drugs, VL, CD4+ T cell counts, and demographic data (gender, age, mode of HIV transmission). As in g2p-THEO, the user can influence the list of putative treatments by excluding drugs from being part of any treatment. The three engines predict response to each of the treatments on the provided list and send their results back to the web site. After receiving predictions for all putative treatments, the web site generates and displays a top 10 list of combination treatments (Figure 5.10). Unlike in g2p-THEO, the ranking is based not only on the mean predicted success but also on the range of the predictions. Precisely, the ranking criterion is

$$\text{mean\_of\_predictions} \times \left(1 - \frac{\text{range\_of\_predictions}}{\alpha}\right)$$

with α = 1.3. The user can then interactively reorder the top 10 list with respect to success rate or range of the predictions only.
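A minimal Python sketch of this ranking follows. It reads the criterion as mean × (1 − range / α) and takes the range to be the difference between the largest and smallest engine prediction; both readings are assumptions made for this illustration. The regimen names and probabilities are invented.

```python
ALPHA = 1.3  # value stated in the text

def ranking_score(predictions, alpha=ALPHA):
    """Mean predicted success, penalized by the disagreement among engines."""
    mean_p = sum(predictions) / len(predictions)
    range_p = max(predictions) - min(predictions)  # assumed meaning of "range"
    return mean_p * (1 - range_p / alpha)

def top_treatments(candidates, n=10):
    """candidates maps a regimen name to its list of per-engine probabilities."""
    ranked = sorted(candidates, key=lambda r: ranking_score(candidates[r]), reverse=True)
    return ranked[:n]

# Hypothetical predictions of three engines for three regimens.
candidates = {
    "regimen A": [0.8, 0.8, 0.8],   # engines agree
    "regimen B": [0.9, 0.9, 0.6],   # same mean, but engines disagree
    "regimen C": [0.5, 0.6, 0.55],
}
print(top_treatments(candidates))
```

With equal means, the regimen on which the engines agree is ranked higher, which is the effect the range term is meant to have.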
Moreover, for drugs that are currently not supported by the EuResist prediction engine, the web site queries Stanford's HIVdb and provides the result. The prediction results are followed by a summary of the data that was provided to the engines and an analysis of the mutations, e.g. whether they are known resistance mutations or not. At the top right of the results page the user can click on "Request detail". The resulting page (Figure 5.11) states which engines were contacted and whether the request was successful.

Figure 5.9: Data input page of the EuResist prediction engine. The form contains fields for pasting an HIV sequence comprising protease and reverse transcriptase. Alternatively, the sequence can be uploaded as a FASTA file or mutations can be selected from a drop-down menu. In the field below one can select the drugs that should be considered when ranking different regimens. The remaining fields ask for additional data, e.g. number of past treatment lines and previously used drugs.

Figure 5.10: Results page of the EuResist prediction engine.

Figure 5.11: Details of the request. The page summarizes how long the generation of the result took and which SOAP servers were queried.

5.6 Towards Prediction of Sustained Response

Like many other tools, the EuResist prediction engine currently focuses on predicting initial virological response, i.e. reduction of VL within 4-12 weeks of treatment. However, clinicians are also interested in the response to the selected treatment beyond this short period. Thus, the performance of the EuResist prediction engine in inferring sustained response has to be properly assessed. For this analysis, sustained response is measured at 16-32 weeks of treatment. Sustained response data were extracted from the EuResist integrated database comprising data from Italy, Sweden, Germany, and Luxembourg (i.e. version from 2007/11/27).

5.6.1 Material and Methods

Data

The data for this study were extracted according to the standard datum definition presented in Section 5.1. For the comparison we used the originally employed definition that focuses on initial response without changes, i.e. response measured at 8 (±4) weeks of treatment, and a modified version with sustained response measured at 24 (±8) weeks of treatment. In both cases a success was defined as a drop of viral load below 500 copies per ml or, alternatively, a 100-fold reduction compared to the baseline value. 3,023 treatments met these requirements for initial response and 2,722 (301) were used for training (testing) the EuResist prediction engine as described in the previous section (short-term dataset). A comparable number of 2,778 treatments met the requirements for sustained response (long-term dataset). To avoid an overoptimistic estimation of the prediction performance, the instances in the long-term dataset were grouped depending on whether they were also present in the short-term dataset. Precisely, dataset A comprised 1,837 treatments of the short-term training set for which a classification with respect to long-term response was available (1,540 of those share the same label). Dataset B comprised all 742 treatments for which only a long-term label was available. Dataset C comprised 199 treatments from the short-term test set for which a long-term label was also available (167 of them have the same label). Treatments in datasets A and B contributed to the training set for a long-term prediction model, while dataset C was used as an independent test set. Label changes from success at initial response to failure at sustained response (142 in dataset A, 17 in dataset C) were about as frequent as changes from failure to success (155 in dataset A, 15 in dataset C).

Comparative Analysis

The EuResist prediction engine was used to predict the response to antiretroviral therapy for all treatments in datasets B and C.
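The success definition and the dataset bookkeeping given in the Data paragraph can be checked with a short Python sketch. The function name and the handling of a reduction of exactly 100-fold are assumptions made for illustration; the counts are those reported in the text.

```python
def is_success(baseline_vl, followup_vl, limit=500.0, fold=100.0):
    """Success: follow-up VL below `limit` copies per ml, or at least a
    `fold`-fold reduction compared to the baseline VL."""
    return followup_vl < limit or followup_vl * fold <= baseline_vl

# Consistency checks on the reported dataset sizes.
short_term = {"train": 2722, "test": 301}
long_term = {"A": 1837, "B": 742, "C": 199}
assert sum(short_term.values()) == 3023
assert sum(long_term.values()) == 2778
# Instances whose label changed between the two time points.
assert 1837 - 1540 == 142 + 155   # dataset A
assert 199 - 167 == 17 + 15       # dataset C

print(is_success(100_000, 400), is_success(100_000, 1_000), is_success(10_000, 600))
```

The second call illustrates the alternative criterion: a follow-up VL of 1,000 copies per ml still counts as a success when the baseline was 100,000.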
In order to rate the achieved prediction performance correctly, the results were compared to a long-term prediction engine. This engine was trained from scratch (i.e. feature selection and statistical learning) for this task solely on data from datasets A and B. The resulting model was used to predict the sustained response for all treatments in dataset C. Predictions by this model for treatments in dataset B were obtained in a cross-validation-like procedure that is explained in the following paragraph.

The training procedure for the sustained prediction model was as follows. Training data for the long-term model comprised datasets A and B. Precisely, dataset B was split into 10 equally sized subsets, and dataset A plus nine of the subsets from dataset B were used as the training set. The remaining subset from dataset B served as the test set. This procedure was repeated 10 times, so that every subset of dataset B was used as a test set once. The results can therefore be compared to the predictions given by the short-term model for instances in dataset B. The final long-term model was trained using the complete data from datasets A and B. Response to the treatments in dataset C was then predicted and the results were compared to the short-term predictions for dataset C. Due to the unavailability of predictions of the GD engine for dataset B, only the ME and EV engines were used for the comparison on dataset B. Results on dataset C include the predictions made by all three engines. As described in Section 5.2, logistic regression was the only statistical learning method for all engines. Again, the three predictions provided by the individual engines were combined using the mean. Performance was measured in AUC (area under the ROC curve). Statistical significance was computed using a Wilcoxon test for testing whether the two areas under the ROC curve are different.
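The cross-validation-like procedure described above can be sketched as a small Python generator. How dataset B was actually partitioned is not stated, so the seeded random shuffle here is an assumption, and the integer ids merely stand in for treatments.

```python
import random

def long_term_folds(dataset_a, dataset_b, k=10, seed=0):
    """Yield (train, test) pairs: each test set is one of k subsets of B,
    each training set is all of A plus the remaining subsets of B."""
    shuffled = list(dataset_b)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment to subsets
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        rest_of_b = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield list(dataset_a) + rest_of_b, test

# Placeholder ids with the dataset sizes reported in the text.
dataset_a = range(0, 1837)
dataset_b = range(1837, 1837 + 742)

for train, test in long_term_folds(dataset_a, dataset_b):
    print(len(train), len(test))
```

Because dataset A never appears in a test fold, every prediction for dataset B comes from a model that has not seen that instance, which keeps the comparison with the short-term model fair.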
All experiments were carried out using both the minimal feature set and the maximal feature set. Results on dataset C were also compared to a GSS prediction based on the Stanford HIVdb (for details see e.g. Section 4.3). In addition to the sustained outcome of the treatment, the initial response for the instances in dataset C was available. Thus, on dataset C the performance of all models is assessed with respect to both time points.

5.6.2 Results

Figures 5.12 (a) and (b) show the ROC curves on dataset B. Briefly, the short-term model achieved an AUC on dataset B of 0.799±0.029 (0.823±0.05) compared to 0.799±0.05 (0.846±0.055) achieved by the long-term model using the minimal (maximal) feature set. In both cases the difference was not significant (p=0.99). When using the maximal feature set, however, the performance difference between the two models was more pronounced in favor of the long-term prediction engine. Results were qualitatively the same on the independent test set C. The short-term model achieved an AUC of 0.757 (0.799) compared to 0.77 (0.81) achieved by the long-term model using the minimal (maximal) feature set. Figures 5.13 (a) and (b) show the corresponding ROC curves. Here the long-term model is better when predicting sustained response and the short-term model is better when studying initial response. The performance of HIVdb was better for predicting sustained response than initial response. In general HIVdb performed worse than either of the prediction models.

5.6.3 Discussion

In this analysis we showed that the EuResist prediction engine predicted long-term response as well as short-term response. This fact has been demonstrated previously for rules-based prediction systems (Zazzi et al., 2009). Moreover, the EuResist prediction engine performs as well as a system trained for predicting long-term response. A likely explanation is that in many cases the short-term response and long-term response share the same label (≈ 82% in datasets A and C).
Interestingly, the performance for all tested systems increased when studying the sustained response.

Figure 5.12: ROC curves for dataset B: (a) CV performance on B (min), (b) CV performance on B (max). Curves were generated using the minimal (a) and maximal (b) feature set, respectively, with whiskers indicating the standard deviation, for the initial response model (short) and the sustained response model (long). AUC values in the legends: (a) short 0.799 (0.029), long 0.799 (0.050); (b) short 0.823 (0.050), long 0.846 (0.055).

5.7 Effect of Modified TCE Definitions

The results in Section 5.3 indicated that the standard datum definition is suitable but has serious drawbacks, the main one being that treatments are labeled as failures despite the fact that the VL is sufficiently reduced during the course of the treatment. As discussed, the factors leading to a delayed response, e.g. suboptimal adherence, usually cannot be inferred from the viral genotype. Luckily, the suboptimal labeling does not have a great impact on the learned statistical models but only on the assessed performance, leading to an underestimation of the usefulness of the tool in clinical practice (Section 5.3). Likewise, factors that lead to an initial response but a virological failure only a few weeks later may not be found in the viral genotype either, for example, decreased adherence of the patient (after improved health conditions due to initial response or due to the appearance of side effects), archived drug resistance mutations that are not visible in the baseline genotype, or rapid development of drug resistance. An improved standard datum definition could consult multiple VL measurements before labeling a treatment as success or failure.
The problem with relying on multiple measurements at specific time points is the expected decrease of training samples. For instance, only about 2000 of the 3000 standard datum instances used in the previous section also have a VL measurement available in the four-month window around 24 weeks of treatment. Here we introduce two modified standard datum definitions that use multiple time points of VL measurement for dichotomizing virological response to success and failure.

Figure 5.13: ROC curves for the independent test set C: (a) performance on C (min), (b) performance on C (max). Here both outcomes, initial response (dashed lines) and sustained response (solid lines), are studied for the short-term model (short) and the sustained response model (long). The performance is compared to the performance of Stanford HIVdb (hivdb). AUC values in the legends, with "(short)" denoting the initial response outcome: (a) short 0.757, long 0.77, hivdb 0.752, short (short) 0.747, long (short) 0.737, hivdb (short) 0.735; (b) short 0.799, long 0.81, hivdb 0.752, short (short) 0.786, long (short) 0.77, hivdb (short) 0.735.

5.7.1 Material and Methods

Alternative Standard Datum Definition

The basic idea behind the first modified standard datum definition is to extract instances with initial response and instances with sustained response separately from the database. If there are instances that receive a label for both time points, then those with discordant labels are removed from the final dataset. For this study we use the most recent version of the EuResist integrated database (2008/10/10). The database gives rise to 5001 and 4456 instances for initial and sustained response, respectively. 3504 instances received a label for both time points; 2959 of those instances received the same label.
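The labeling rule of this first modified definition can be written as a small Python function, shown here as a sketch with illustrative names. It keeps instances that carry a label for only one time point and drops those with discordant labels; the arithmetic below it derives the resulting group sizes from the counts just reported.

```python
def two_time_point_label(initial, sustained):
    """Labels are True (success), False (failure) or None (not available).
    Returns the combined label, or None if the instance is dropped."""
    if initial is None:
        return sustained            # only a sustained label (or none at all)
    if sustained is None:
        return initial              # only an initial label
    return initial if initial == sustained else None  # drop discordant labels

# Counts reported for the 2008/10/10 release.
n_initial, n_sustained, n_both, n_concordant = 5001, 4456, 3504, 2959

only_initial = n_initial - n_both        # 1497
only_sustained = n_sustained - n_both    # 952
discordant = n_both - n_concordant       # 545 instances removed
total = n_concordant + only_initial + only_sustained  # 5408

print(only_initial, only_sustained, discordant, total)
```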
This initial data is augmented by instances that only received a label for initial response (n=1497) or only for sustained response (n=952). The resulting dataset comprises 5408 treatment switches, of which 10% are randomly assigned to a test set (n=541). The success rate was 0.747 and 0.756 in the training and test set, respectively. This definition is termed TEP (for: two end points).

The second modified standard datum definition examines the area under the log10(VL) curve after treatment start as the basis for the labeling. Here, all available VL measurements between treatment start and 24 (±8) weeks are used to compute the area under the log10(VL) curve. By definition, at least two VL measurements are required, the time point of the first measurement is set to treatment start, and the time point of the last measurement is set to 24 weeks. The minimum expected area under the VL curve is 168 × log10(50) ≈ 285, corresponding to the case that all measurements are at the limit of detection, i.e. 50 copies of viral RNA per ml blood. However, we usually apply a threshold of 500 copies per ml as the database also features values obtained with an older, less sensitive VL assay. Applying the 500 copies threshold leads to a cutoff for the area under the VL curve of approximately 450.

Figure 5.14: Performance of the EV engine when trained and tested using the TEP definition. The engine was trained with the minimal and maximal feature set. (a) Performance on training data (10-fold cross-validation): minimal 0.818 (0.028), maximal 0.856 (0.030). (b) Performance on test data (independent test set): minimal 0.849, maximal 0.856.
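A minimal Python sketch of the area computation is given below. The text does not state how the area was integrated, so the trapezoidal rule used here is an assumption, as are the function names and the example trajectory; time is measured in days, with 24 weeks corresponding to 168 days.

```python
import math

def area_under_log_vl(measurements, horizon_days=168):
    """measurements: (day, copies_per_ml) pairs sorted by day, at least two.
    As in the definition, the first point is moved to treatment start (day 0)
    and the last point to the horizon (24 weeks = 168 days)."""
    if len(measurements) < 2:
        raise ValueError("at least two VL measurements are required")
    days = [day for day, _ in measurements]
    days[0], days[-1] = 0, horizon_days
    logs = [math.log10(vl) for _, vl in measurements]
    # Trapezoidal rule (an assumption; the integration method is not stated).
    return sum((days[i + 1] - days[i]) * (logs[i] + logs[i + 1]) / 2
               for i in range(len(days) - 1))

def area_cutoff(copies_per_ml, horizon_days=168):
    """Area corresponding to a constant VL at the given level."""
    return horizon_days * math.log10(copies_per_ml)

print(area_cutoff(50), area_cutoff(500))
# Hypothetical trajectory: high baseline VL, suppressed from week 12 onwards.
print(area_under_log_vl([(2, 100_000), (84, 50), (170, 50)]))
```

The two printed cutoffs reproduce the values of roughly 285 and 450 given in the text (more precisely about 285.4 and 453.4).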
As the choice of the cutoff is not straightforward, we apply an array of cutoffs: 400, 450, 500, 600, and 700, corresponding to an average VL of approximately 240, 477, 946, 3727, and 14677, respectively. The dataset comprises a subset of the instances from the sustained response dataset, precisely those instances with more than one VL measurement during the observation period (n=4059). An independent test set is created by randomly selecting 10% of the instances (n=406). This definition is termed AVL (for: area under VL).

Prediction Engine

For practical reasons only the EV engine is trained and tested on the datasets. For training the engine we use exactly the same features and feature selection model as described earlier in Section 5.2. For the AVL labeling, feature selection was carried out only once with a cutoff of 500. Performance is again assessed using the area under the receiver operating characteristic curve.

5.7.2 Results

Using the TEP definition, the EV engine achieves an AUC of 0.818±0.028 (0.856±0.03) and 0.849 (0.856) in the 10-fold cross-validation and on the independent test set, respectively, using the minimal (maximal) feature set. Figure 5.14 depicts the corresponding ROC curves. Table 5.6 lists the AUCs achieved on training and test data using the AVL labeling. The values range from 0.712±0.019 to 0.782±0.04 in the 10-fold cross-validation when restricted to the minimal feature set. Application of the maximal feature set leads to an increase of around 0.07 in AUC. Moreover, the higher the selected threshold the higher the achieved AUC.

                       400             450             500             600             700
success rate  train    0.627           0.728           0.785           0.858           0.915
              test     0.611           0.695           0.749           0.825           0.911
minimal       train    0.712 (0.019)   0.753 (0.028)   0.788 (0.035)   0.810 (0.032)   0.782 (0.040)
              test     0.724           0.742           0.772           0.840           0.829
maximal       train    0.791 (0.023)   0.822 (0.024)   0.860 (0.021)   0.867 (0.020)   0.867 (0.036)
              test     0.794           0.797           0.823           0.878           0.894

Table 5.6: Performance using the area under the log10(VL) curve for labeling. The columns correspond to different thresholds used for dichotomizing the area into success and failure. The first two rows state the fraction of successful treatments in the dataset (train and test). The following rows show the performance of the minimal and maximal feature set measured in AUC on the training and test data.

5.7.3 Discussion

The observed performance with the TEP definition is close to the one achieved after removing the ambiguous failures from the dataset in Section 5.3. This suggests that the labeling using the modified definition is clearer. A confounding factor is obviously the increase in data compared to the previous analysis. However, when the modified definition was applied to an older version of the EuResist database (2007/11/27), a similar boost in performance was observed (data not shown). Moreover, using a training set of comparable size originating from the most recent release, with labels according to initial response only, shows AUC values around 0.75 (see e.g. results in Section 5.8). Thus, the observed boost in performance mainly originates from the change in the standard datum definition. One could argue that removing the cases with discordant labels at the two time points would only leave the "easy to predict" cases in the training and test data. In general, cases are considered to be simple if the viral genotype shows few mutations, i.e. the patient is not very treatment experienced. However, the number of recorded treatments prior to the treatment switch, which is a measure of treatment experience, did not differ significantly between the time points for initial and sustained response (p = 0.6529 computed with a Kolmogorov-Smirnov test). Neither was the difference between initial response and the combined response significant (p = 0.99). Thus, a differing level of treatment experience as a source for improved prediction performance can be excluded.
In general, results obtained with the AVL definition are inferior to the ones obtained using the TEP definition, probably owing to the fact that the AVL definition does not exclude instances that represent unclear cases (e.g. slow decrease in VL). And indeed, if the instances with discordant labels in initial and sustained response are excluded from the dataset, the AUC reaches 0.824±0.032 and 0.837 on training and test set, respectively, with the minimal feature set and a cutoff of 500. However, the increase in AUC with increasing threshold used for the labeling indicates that the task of sorting out clear failures is not so hard to solve. The cutoffs 600 and 700 represent treatments with an average VL of 3727 and 14677, respectively.

As long as there are known factors that significantly influence the response to treatment and these factors cannot be included in the statistical model (for any reason), one should select a labeling that removes (or at least reduces) the impact of these factors. One such factor is the adherence of the patient to the prescribed treatment. The RDI also tried to exclude cases that are likely to be related to poor adherence. Precisely, if at baseline the patient had a VL of less than 1000 copies per ml and the VL increased after the treatment change by more than 300-fold (i.e. 2.5 log10 VL), then the TCE was excluded (Larder et al., 2007). The TEP definition provided a labeling involving instances that are less likely to be corrupted by incomplete adherence. However, inspection of the VL trajectories of a patient's past treatments could give a clue about his or her general adherence and provide a useful covariate for predicting response to combination therapy. There are many ways to modify the original standard datum definition, e.g. by changing the outcome time points or considering more than two outcome time points.
When modifying the definition of response one has to take care not to pose too many restrictions, as this would drastically reduce the amount of available data. Moreover, and most importantly, the definition of treatment response must be in concordance with clinical practice: only in this case can one provide a tool that can be used in clinical routine.

5.8 Integrating Novel Drugs

The work described in this section has been presented at the 7th European HIV Drug Resistance Workshop 2009, Stockholm, Sweden (Altmann et al., 2009c).

The beauty of purely data-driven approaches to predict either drug resistance of the virus to single drugs or the virological response to combination therapy is that they rely solely on data for constructing a model that solves the task. These approaches, however, require a sufficient amount of data to solve the task with a certain reliability. Hence, the beauty of the approach is also its major weakness, as data for newly licensed drugs are rarely available in sufficient amounts. In the case of geno2pheno, phenotypic resistance data are required. These data, however, require the availability of the drug to the laboratory, and this is usually only the case after the official release of the drug on the market. Moreover, once the drug is available, the experimental data are still cost- and labor-intensive to obtain. In the case of prediction engines that infer response to combinations of antiretroviral agents, the databases collecting the training data have to be updated with treatments comprising the novel compounds. Unfortunately, this process can take years, e.g. in the latest release of the EuResist database (2008/10/10) there exist only 119 standard datum instances featuring three novel drugs (the oldest was approved in June 2005).
Owing to the problems of updatability, the data-driven approaches are doomed to lag behind the new developments by the pharmaceutical companies (unless they agree to share their data on clinical trials to facilitate updating such tools). For geno2pheno we investigated the use of semi-supervised learning techniques for improving the prediction of HIV drug resistance in situations with only few training samples (see Perner et al. (2009) and Section 3.3). Unfortunately, a benefit of semi-supervised learning could not be confirmed for every drug class. More precisely, some model assumptions of the employed semi-supervised learning methods could even be harmful for the classification performance of some drugs. A solution to the update problem of prediction engines like g2p-THEO and EuResist that is based on semi-supervised learning strategies is unlikely to be successful, simply owing to the use of drug combinations instead of single drugs as in the case of geno2pheno. Here we investigate a strategy that aims at incorporating expert-based prediction systems, which generally are rapidly updated following the release of a new drug, into the data-driven approach. The approach borrows ideas from language processing, where discounting is the process of replacing the original counts of events with modified counts in order to redistribute the probability mass from the frequently observed events to rare or even unseen events. For example, absolute discounting removes a constant number from the counts for all events (Ney et al., 1994).

5.8.1 Materials and Methods

Standard Encodings

From the latest release of the EuResist integrated database (2008/10/10) we extracted 4,538 standard datum instances (original definition as in Figure 5.1). These therapy switches were used as training data for the EV engine. Response to antiretroviral treatment was dichotomized to success and failure based on the 8 (±4) week follow-up viral load (VL).
The baseline model used the minimal feature set, i.e. indicators for drugs in the regimen, mutations in the baseline genotype, and the genetic barrier to drug resistance. The extended model used the maximal feature set, i.e. additionally baseline VL, indicators for previous use of a drug, and indicators modeling previous exposure to NRTIs, NNRTIs, and PIs. Both encodings are termed standard encodings.

ndGSS Encodings

The alternative EV engine incorporates rules-based ratings for novel drugs as additional features. We overcome the fact that only very few truly novel drugs (by today's standards) are available in the training data by treating all drugs in a regimen that were approved by the FDA less than four years prior to treatment start as novel drugs, simply because these drugs were truly novel at the time the treatment started. The genotypic susceptibility score (GSS) for drugs that are considered novel was computed using Stanford's HIVdb (version 5.0.0; HIVdb5) and used as an additional covariate termed ndGSS (for: novel drug GSS). This step constitutes the discounting-like part of the modeling strategy, since indicators and genetic barrier for drugs contributing to ndGSS were set to 0, i.e. with respect to the standard encoding the novel drugs did not exist in the treatment. Table 5.7 lists the year of FDA approval for the considered drugs and the year until which those drugs were considered novel. Moreover, the table lists in how many instances the drugs were used and how many instances were affected by the discounting. In order to be able to distinguish between the absence of novel drugs and the virus being resistant to all novel drugs (in both cases the ndGSS is 0), the ndGSS was set to -1 if all novel drugs were classified as resistant by HIVdb5.
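The discounting-like encoding can be sketched in Python as follows. This is an illustration under several assumptions: the per-drug GSS is taken to be 0 for a drug rated resistant and positive otherwise, the ndGSS is taken to be the sum over the novel drugs, only the drug indicators (not the genetic barrier) are modeled, and the approval dates and scores are placeholders rather than values from the thesis.

```python
from datetime import date

NOVELTY_YEARS = 4  # drugs approved less than four years before treatment start

def encode(regimen, start, approval, gss, all_drugs):
    """Return (drug indicators, ndGSS) for one treatment.
    approval: drug -> approval date; gss: drug -> rules-based score,
    assumed to be 0 when the drug is rated resistant."""
    novel = [d for d in regimen
             if (start - approval[d]).days < NOVELTY_YEARS * 365.25]
    # Novel drugs are removed from the standard encoding (indicator set to 0).
    indicators = {d: int(d in regimen and d not in novel) for d in all_drugs}
    if not novel:
        nd_gss = 0.0                      # no novel drug in the regimen
    else:
        nd_gss = float(sum(gss[d] for d in novel))
        if nd_gss == 0:
            nd_gss = -1.0                 # all novel drugs rated resistant
    return indicators, nd_gss

# Placeholder approval dates and scores, for illustration only.
approval = {"TDF": date(2001, 10, 1), "3TC": date(1995, 11, 1), "EFV": date(1998, 9, 1)}
drugs = ["TDF", "3TC", "EFV"]

print(encode(drugs, date(2003, 6, 1), approval, {"TDF": 1.0}, drugs))
print(encode(drugs, date(2003, 6, 1), approval, {"TDF": 0.0}, drugs))
print(encode(drugs, date(2008, 6, 1), approval, {}, drugs))
```

The three calls cover the three cases the encoding has to distinguish: a susceptible novel drug, a novel drug rated resistant (ndGSS of -1), and a regimen without any novel drug (ndGSS of 0).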
Moreover, as novel compounds can be expected to be more powerful drugs, we introduce a weighting factor (wf) for the ndGSS feature. Precisely, the ndGSS feature is multiplied by the weighting factor during the prediction process.

Drug   FDA approval   Novel until   Used in instances   Instances changed
ZDV    1987           1991          1355                0
ddI    1991           1995          1085                0
d4T    1994           1998          900                 60
3TC    1995           1999          3123                133
ABC    1998           2001          752                 153
TDF    2001           2003          1852                413
IDV    1996           1999          237                 66
SQV    1995           1999          245                 54
NFV    1997           2000          359                 128
LPV    2000           2002          1726                394
APV    1999           2003          314                 69
ATV    2003           2004          423                 110
NVP    1996           2000          395                 75
EFV    1998           2002          823                 243

Table 5.7: Year of approval of the drugs by the FDA and year until which the drugs were considered novel. The last two columns list the number of standard datum instances in which the drugs were used, and the number of instances that were affected by the discounting-like step.

Performance Assessment

Performances of the standard and the ndGSS encodings were compared in a 10-fold cross-validation on the training data. In this setting we seek to investigate whether the changes applied to the standard encoding have a serious impact on the performance with respect to the established compounds. To this end, the folds making up the training data for the ndGSS encoding were used as is to train the classifier; the test fold, however, comprises the standard encoding. Thus both approaches are tested on the same encoding (i.e. the standard encoding). An additional 119 standard datum instances containing real new drugs (DRV n=59, TPV n=52, ETR n=4, DRV+ETR n=4) were extracted from the EuResist database for independently assessing the performance of the ndGSS encoding. The ndGSS for these new drugs was computed using HIVdb5 as well. Performances of the ndGSS encodings were compared with the performance achieved by an HIVdb5-based GSS for the complete regimen.
The weighting factor was optimized by five-fold cross-validation on these 119 instances. Classification performance was assessed using the area under the ROC curve (AUC).

5.8.2 Results

In the cross-validation setting the baseline (extended) model using the standard and the ndGSS encoding achieved an AUC of 0.742±0.020 (0.775±0.022) and 0.735±0.027 (0.768±0.027), respectively. For comparison, HIVdb5 reached an AUC of 0.708±0.026. On the 119 instances including real novel drugs HIVdb5 achieved an AUC of 0.545 (almost random), while the baseline (extended) model using the ndGSS encoding yielded an AUC of 0.587 (0.636). Using the optimized weighting factor for the ndGSS elevated the AUC to 0.608 (wf=8), 0.642 (wf=7), and 0.677 (wf=4) for HIVdb5, baseline, and extended model, respectively.

Figure 5.15: ROC curves on (a) the training data and (b) the test data containing real new drugs. Whiskers indicate the standard deviation.

5.8.3 Conclusions

The cross-validation analysis demonstrated that there was no statistically significant difference between the standard and the ndGSS encoding (p > 0.13 using a paired Wilcoxon test). This indicates that the precision of the system regarding established drugs is not compromised by the discounting-like approach. The baseline encoding significantly outperformed HIVdb5, and the extended encoding significantly outperformed the baseline encoding (both: p = 0.02). Performance on the 119 test instances featuring real novel drugs was poor. However, even the reference approach (HIVdb5) yielded a performance close to random. Both the baseline and the extended ndGSS models outperformed HIVdb5, i.e. they could predict response better than the GSS alone. Thus, the modeling approach can be regarded as a step in the right direction. Boosting the influence of the ndGSS feature elevated the AUC of all models by approximately 0.05.
This effect is probably related to the increased potency of the novel drugs.

5.9 Therapy History: Replacement for the Genotype?

The vast majority of tools that predict response to antiretroviral therapy focus on the viral genotype as the major source of information. Indeed, obtaining the sequence of the viral drug targets is a standard approach for finding the best treatment for HIV patients in developed countries. However, despite dropping prices for genome sequencing, genotyping is not a standard practice in resource-limited settings, where even the standard VL assay exceeds the budget for standard care. Unfortunately, the majority of people infected with HIV live in such resource-limited countries.

The viral drug targets are steadily responding to the ongoing treatment, and each treatment leaves traces in the viral genotype. In fact, the genotype obtained by bulk sequencing may not reveal all mutations that ever occurred in the viral population. Precisely, if resistance mutations do not confer a replicative advantage, they may disappear from the currently predominant viral variant. For example, the predominant viral population in patients having a treatment break usually has the characteristics of the wild-type virus. Unfortunately, the patient harbors previous viral variants in the form of proviral DNA in several infected tissues. This constitutes a memory of resistance mutations provoked by previous treatments. As a consequence, recycling of drugs leads to a rapid reselection of previously existing resistance mutations, which are not prevalent in the predominant viral variant. To avoid such a short-term viral rebound, the treating clinician considers the patient’s treatment history, i.e. the previously taken drugs, in addition to the viral genotype when selecting a new regimen. Treatment history has long been recognized as clinically relevant (Bratt et al., 1998).
More recently it was shown that taking all available genotypes into account improves the prediction of treatment response in heavily pretreated patients (Zaccarelli et al., 2009). Thus, a prediction model focusing on treatment history could be an effective alternative to a genotype-based model in resource-limited settings. This study aimed at investigating which representation of the treatment history is the most useful one for inferring response to combination antiretroviral therapy, and how the predictions of a history-only engine compare to state-of-the-art predictions based on the viral genotype. The research was part of Fabian Müller’s Bachelor’s thesis (Müller, 2008), and only the main results are summarized in the following sections.

5.9.1 Material and Methods

Data and Response Definition

For the experiments we used the 2007/08/16 release of the EuResist integrated database. The definition of successful virological response was modified compared to the standard datum definition in order to exploit the wealth of data in the database. A treatment was labeled as a success if at least one VL measurement during the treatment was below 500 copies per ml. If the lowest measurement in the database exceeded this value, the treatment was labeled as a failure. Moreover, since we are interested in the impact of treatment history, a baseline genotype was not required for inclusion of a treatment in the study. Of note, the response definition used here is yet another alternative to the originally defined standard datum definition. The definition resulted in 35,149 treatments being labeled as success or failure; we refer to this dataset as HO (for: history only). Of those treatment changes, a set of 3,910 could be associated with a genotype that was obtained at most 90 days prior to treatment start. These instances were collected in a second dataset termed WG (for: with genotype). The HO dataset comprised 1,689 distinct combinations of antiretroviral drugs.
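The response definition can be written down as a small Python sketch. The function name `label_response` and the handling of treatments without any VL measurement (returning `None`, i.e. unlabeled) are assumptions for illustration. The text does not say how a lowest measurement of exactly 500 copies per ml is treated; the sketch counts it as a failure.

```python
def label_response(viral_loads, threshold=500.0):
    """Label a treatment from its VL measurements (copies per ml).

    Success: at least one measurement during the treatment is below the threshold.
    Failure: the lowest measurement is not below the threshold.
    None:    no measurement available (assumed to leave the treatment unlabeled).
    """
    if not viral_loads:
        return None
    return "success" if min(viral_loads) < threshold else "failure"


labels = [
    label_response([12000.0, 480.0, 900.0]),  # one value below 500
    label_response([12000.0, 5000.0]),        # lowest value is 5000
    label_response([]),                       # no measurements
]
```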
Features of Treatment History

The current therapy, i.e. the therapy for which the virological response has to be inferred, was encoded using 17 binary variables, each representing a single drug. This encoding was used in previous sections and is termed no history. The indicators for the current therapy were augmented with features derived from the patient’s treatment history. The simplest encoding used one indicator for previous exposure to a drug for each of the 17 compounds; hence it is denoted as binary history. A more elaborate encoding takes into account the time since last exposure to the drug. Here the history is not encoded as a binary variable, but as a continuous variable ranging from 0 (never exposed) to 1 (very recently exposed). Precisely, the variable was defined as

\[ f(t, k) = \begin{cases} 1 - \dfrac{1}{4300^{k}}\, t^{k} & \text{if } 0 \le t \le 4300, \\ 0 & \text{else,} \end{cases} \]

where t is the time since last exposure in days (with a maximum of 4300) and k is a parameter controlling the decay of the influence of previously used drugs. A value of 1 for k equals linear decay; values larger (smaller) than 1 lead to a slower (faster) decay of the influence. Due to the nature of the encoding it is called continuous history. The motivation for this feature is that compounds that were taken a long time ago may have a reduced impact on the current regimen.

In addition to a simple concatenation of features for the current drugs and historic use of drugs, special interaction features were introduced that aim at modeling the interaction between previously used drugs and drugs in the current regimen. For example, if the current treatment does not include any protease inhibitor, the previous use of protease inhibitors is expected to have little impact on the treatment outcome. As in the ME engine, second order features represent the previous use of a compound and the current use of another drug from the same class, i.e.
49 (7×7) and 100 (10×10) additional features model interaction between PIs and RTIs, respectively. These 149 variables extend the 17 binary variables for the current regimen. This approach was applied to the binary and continuous history features, thus leading to the 2nd order binary and 2nd order continuous encodings.

Genotype Features

For correctly rating the performance of the history-only prediction within a state-of-the-art context, we compared it to a prediction based on the genotype. Precisely, we applied the encoding used for g2p-THEO (49 binary variables representing mutations in protease and reverse transcriptase, 17 binary indicators for the drugs in the current regimen, and 17 variables for the genetic barrier to drug resistance) and trained a classifier on the WG dataset.

Statistical Learning

The features were tested with two statistical learning techniques: logistic regression and random forest. The number of trees in the forest was set to 100. The parameters of the history encoding, namely the cutoff in days for a compound to be considered as part of the history, and k, which controls the decay of the influence of previously used drugs, were optimized in a 10-fold cross-validation setting (separately for both learning methods). Moreover, all history encodings were compared in a 10-fold cross-validation setting on the HO dataset. Performance of the genotype-based model was assessed in a 10-fold cross-validation setting on the WG dataset. For the comparison to the genotype-based model, the history-only models were trained on only those instances of HO for which no associated genotype was available (i.e. HO without WG: 31,239); the remaining instances were used as a test set and split into 10 equally sized folds matching the folds of the cross-validation for the genotype-based model. In addition to a performance comparison we also studied the best way for combining genotype and history predictions.
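For concreteness, the continuous history feature f(t, k) and the second order interaction features defined above can be sketched in Python. This is an illustration under assumptions, not the code used in the study: the function names, the placeholder drug names, and the choice to form an interaction feature as the product of the history value and the current-use indicator are all hypothetical.

```python
MAX_DAYS = 4300  # maximum time since last exposure, as in the definition of f(t, k)


def continuous_history(t_days, k=1.0):
    """f(t, k) = 1 - (t / 4300)^k for 0 <= t <= 4300, and 0 otherwise.

    t_days is None for a drug the patient was never exposed to.
    """
    if t_days is None or not 0 <= t_days <= MAX_DAYS:
        return 0.0
    return 1.0 - (t_days / MAX_DAYS) ** k


def second_order_features(history, current, drug_classes):
    """One feature per (previous drug, current drug) pair within a drug class.

    history: drug -> history value in [0, 1]; current: set of drugs in the regimen.
    The product form is an assumption made for this sketch.
    """
    features = {}
    for drugs in drug_classes.values():
        for previous in drugs:
            for now in drugs:
                used_now = 1.0 if now in current else 0.0
                features[(previous, now)] = history.get(previous, 0.0) * used_now
    return features


# Placeholder names: 7 PIs and 10 RTIs give 49 + 100 = 149 interaction features.
CLASSES = {"PI": [f"PI{i}" for i in range(1, 8)],
           "RTI": [f"RTI{i}" for i in range(1, 11)]}
history = {"PI1": continuous_history(2150, k=1.0)}  # last taken 2150 days ago
interactions = second_order_features(history, {"PI2", "RTI1"}, CLASSES)
```

With k = 1 the example history value is 0.5, and the only non-zero interaction feature is the pair (PI1, PI2).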
Here we focused on three strategies. In strategy one we added the history features as additional covariates to the genotype encoding (concatenation). This approach, however, limits the data used for training to the WG dataset. Strategy two used the prediction computed by the history-only model as an additional covariate in the genotype encoding (prediction feature). This approach is akin to the one used in the GD engine (Section 5.2), which adds the prediction of a Bayesian network to the list of features. The third strategy simply computed the mean of the history-only and the genotype-based prediction (combined by mean). In the latter two approaches the history model could be trained on all available data (i.e. HO without WG), and the genotype-based model only on WG. Performance is assessed as the area under the ROC curve. Significance is assessed using a paired one-sided Wilcoxon Rank Sum test.

5.9.2 Results

The results suggested that previous use of a drug should be considered from the first day on. A linear increase of the threshold at which drugs are considered for the history encoding resulted in a linear decrease in AUC for both classifiers (data not shown; see Müller (2008) instead). Moreover, the best parameter k for modeling the influence of past treatments on the current regimen was 1 (i.e. linear decay) for both classifiers. It turned out that values of k smaller than 1 impair the performance of logistic regression. The random forest classifier was more robust with respect to the choice of k. For all further experiments, k was set to 1 and drugs were considered from the first day on.

Table 5.8 summarizes the performance of the different history encodings obtained on the HO dataset. The binary history encoding significantly outperformed the predictions based on the current treatment only for both classifiers (p < 0.001).
For logistic regression the best encoding was the 2nd order binary encoding, which significantly outperformed the standard binary encoding (p < 0.001). The random forest classifier achieved the best results with the continuous encoding, which also significantly outperformed the binary one (p < 0.02). The use of second order features seriously impaired the performance of random forest: the second order encodings performed significantly worse than their standard counterparts (p < 0.001). The random forest classifier performs in general better than the logistic regression. However, the initial difference of 0.03 in AUC (no history) could be reduced to 0.008 using the best performing history features for both approaches.

Encoding               logistic regression   random forest
no history             0.670 (0.011)         0.700 (0.008)
binary                 0.713 (0.008)         0.729 (0.007)
continuous             0.714 (0.009)         0.734 (0.008)
2nd order binary       0.726 (0.008)         0.720 (0.007)
2nd order continuous   0.707 (0.011)         0.727 (0.007)

Table 5.8: Mean 10-fold cross-validation performance in AUC on the HO dataset. Standard deviation is given in parentheses.

Model                  logistic regression   random forest
history-only           0.722 (0.027)         0.746 (0.023)
genotype-based         0.766 (0.025)         0.771 (0.022)
concatenation          0.778 (0.025)         0.793 (0.018)
prediction feature     0.781 (0.024)         0.787 (0.020)
combined by mean       0.780 (0.025)         0.794 (0.022)

Table 5.9: Mean 10-fold cross-validation performance on the WG dataset achieved by the best performing history-only models, the genotype-based prediction, and the three combination approaches. Standard deviation is given in parentheses.

Table 5.9 shows the performance of the best history encoding and the genotype-based predictor on the WG dataset. The genotype-based prediction clearly outperformed the history-only predictions for both classifiers (p < 0.002). Again, the random forest classifier performs better than the logistic regression.
Using the genotype features, however, the difference is only marginal, an effect that we observed before with g2p-THEO (Chapter 4). All three ways of combining the history information and the genotype information significantly outperformed the genotype alone (p = 0.032 for the smallest improvement: logistic regression and concatenation). None of the differences among the three ways of combination achieved statistical significance.

5.9.3 Discussion

The study showed that prediction of treatment response based only on the current regimen and the genotype of the virus can be significantly improved by including information on the patient’s treatment history. Moreover, recording the exposure to previously used drugs from the first day on provided the best results. We explored further features derived from the patient’s treatment history, among them similarities of the current therapy to the preceding regimen, but none of the other encodings could improve the performance over the results presented here (Müller, 2008). A possible confounder of the analysis was the fact that the treatment history was likely to be incomplete for a large number of patients. However, repeating the analysis with a dataset comprising only patients whose treatment history was completely recorded in the database, which resulted in a dataset with 17,720 treatment changes, led to the same results. The AUCs for all encodings of the history (including no history) were about 0.03 higher than the results in Table 5.8 (data not shown).

In our study, the history-based prediction is clearly inferior to the genotype-based prediction. Recently, other studies presented results where the difference between a genotype-based and a treatment history-based prediction was much less pronounced (Prosperi et al., 2009a; Revell et al., 2009).
A potential cause for the discrepancy could be that the latter two studies aimed at predicting virological response at a fixed time point and additionally allowed the use of the baseline VL as a covariate. In both studies, which applied random forests as a statistical learning method, the baseline VL was ranked as the most important feature. However, due to a known bias of the random forest to rate continuous variables or categorical variables with many categories higher than variables with few categories, especially binary ones (Strobl et al., 2007), the impact of the VL should be reassessed. The baseline VL might nevertheless be a good predictor for response to combination therapy; its application in resource-limited settings, however, remains doubtful owing to the high cost of the viral load assay. Fiscus et al. (2006) investigate alternatives to the established VL assay that can be used in regions with limited resources, where response to antiretroviral treatment is still typically monitored using CD4+ T cell counts.

In ongoing work, Saigo et al. (in preparation) use a tailored machine learning approach based on subsequence mining (Nowozin et al., 2007) that tries to exploit the order in which drugs were applied in the patient’s history. Here the value of genotypic information over treatment history alone could be confirmed using the original standard datum as response definition. Moreover, interpretation of the model revealed that the information about the initial response (success or failure) of treatments in the past plays an important role.

To conclude, the study demonstrated that response to treatment can be predicted with good performance based on treatment history alone. The best representation of the treatment history was different for the two evaluated statistical learning methods.
Such history-based models might be an option for resource-limited settings and are currently the subject of further investigations (Prosperi et al., 2009a; Revell et al., 2009; Saigo et al., in preparation). Such models will certainly be useful as soon as the multitude of antiretroviral drugs is also available in resource-limited settings. So far, the WHO has addressed the inability to provide personalized HIV treatment to those patients by issuing simplified treatment protocols that cover first- and second-line treatment (Gilks et al., 2006). It remains an open question whether the performance of history-based models assessed on databases reflecting the pandemic in the developed countries can be transferred to the situation in resource-limited settings, where the majority of viruses belong to subtypes other than B. Moreover, the performance of the history-based models is assessed on treatment change decisions that are usually genotype-based, relying either on a baseline genotype or a historic genotype. Thus, the genotypic information is implicitly contained in the drug combination that the model is asked to assess, which may lead to an overestimated performance in the comparison to genotype-based methods. Therefore, either prospective studies or retrospective analyses on treatment changes that were not based on genotypes are required to assess the usefulness of history-only-based prediction models.

6 Planning Sequences of HIV Therapies

In the previous chapters we demonstrated that predicting virological response to combination antiretroviral therapy is a feasible task. The resulting models, however, focus solely on the efficacy of a regimen, which is only one aspect of a putative treatment the clinician has to assess. The other aspect requiring consideration is the effect of the regimen on the virus (or the whole viral population); specifically, which mutations the regimen will provoke and how these mutations will affect the efficacy of future treatments.
Patients infected with HIV currently cannot be cured, since the virus integrates its DNA into the genome of the host cell, thus making life-long therapy a necessity. To ensure the best use of the currently available panel of antiretroviral drugs, it is essential to apply them in an order that avoids or at least minimizes cross-resistance effects and exploits potential resensitization effects.

Jiang et al. (2003) introduced the notion of future drug options (FDO) based on resistance of the virus against single drugs and complete drug classes as assessed by a rules-based interpretation algorithm. The FDO measure was applied to data from the clinical trial ACTG 364 (Katzenstein et al., 2003), which was conducted by the AIDS Clinical Trials Group (ACTG) and focused on assessing the impact of baseline phenotypic resistance on treatment length. The trial comprised mainly treatments with two NRTIs and the PI NFV and/or the NNRTI EFV. The difference in the FDO between the viral genotype before treatment start and at virological failure was studied for determining the most effective regimen in terms of resistance cost. When corrected with respect to the time to virological failure, EFV-based treatments exhibit a higher resistance cost than NFV- or NFV+EFV-based regimens. The study clearly demonstrates the use of the FDO concept for devising general treatment guidelines. For using this approach to develop a personalized treatment plan, however, one has to compare the current predominant viral population in the patient to the makeup of that population at failure of the possible regimens.

Various approaches have been undertaken to stochastically simulate the evolution of the viral population during treatment; for examples see Deforche et al. (2008); Prosperi et al. (2009b); Perelson (2002); Ribeiro and Bonhoeffer (2000) and references therein.
These approaches focus on modeling the infection dynamics of HIV, ranging from simple predator-prey dynamics up to models that consider changes in the viral environment affecting its replicative success, for example the presence of antiretroviral drugs. The latter models qualify for approaching the problem of estimating the viral population at failure of a regimen. The computational time typically required for this in silico evolution, however, significantly exceeds the time available for interactive web services (e.g. about 3 min for a single drug combination (Prosperi et al., 2009b)). Moreover, these models are based on certain modeling assumptions and rely on parameters (e.g. the rate at which new target cells are generated, the death rate of infected and uninfected cells) that are hard to estimate from available biological data.

This chapter introduces an approach for computing the viral population at the time of treatment failure. The approach is based on finite state machines and is scalable, thus eventually affording web services with moderate waiting times. The strategy for calculating personalized future drug options comprises three steps: (i) selection of putative effective regimens with methods described earlier, (ii) in silico evolution of the virus according to a particular treatment, and (iii) assessment of the FDO at failure of that putative drug combination. Sections 6.1 and 6.2 introduce the framework employed for simulating the viral evolution during treatment. Section 6.3 describes the applied mutation models along with their assumptions, and Section 6.4 presents a validation for the mutation models. Section 6.5 concludes the chapter with final remarks and a case study focusing on the four most prominent first-line regimens.

6.1 Weighted Finite State Transducers

Briefly, weighted finite state transducers are a family of finite state automata.
They extend the concept of simple acceptors, which comprise an input alphabet (A), a set of states (Q) with directed edges between them (E), as well as initial (I) and final (F) states, by an output alphabet (B), costs for every edge, and cost functions for initial (λ) and final (ρ) states. Formally, a finite state acceptor is a quintuple: ℱ = (A, Q, I, F, E), where I, F ⊆ Q and E ⊆ Q × (A ∪ {ε}) × Q, with ε being the “empty word”. Basically, finite state acceptors verify whether words defined over the input alphabet are valid or not. Here, a word is a concatenation of elements of the alphabet. The verification process always starts at the beginning of the word and in one of the initial states of the automaton. From this initial state, the elements of the word are read one by one: if from the current state there exists an outgoing edge that is labeled with the corresponding element, then the acceptor proceeds to the target state of the edge and is ready to read the next element of the word. A word is considered valid only if there exists at least one initial state such that reading the word from that initial state ends in one of the final states.

Analogously, a weighted finite state transducer is defined as: T = (A, B, Q, I, F, E, λ, ρ), where I, F ⊆ Q, E ⊆ Q × (A ∪ {ε}) × (B ∪ {ε}) × K × Q, λ : I → K and ρ : F → K, with K being the carrier set of a semiring (K, ⊕, ⊗, 0̄, 1̄). Thus, (K, ⊕, 0̄) is a commutative monoid with identity element 0̄, (K, ⊗, 1̄) is a monoid with identity element 1̄, ⊗ distributes over ⊕, and 0̄ is an annihilator for ⊗, i.e. ∀a ∈ K : a ⊗ 0̄ = 0̄ ⊗ a = 0̄. In contrast to acceptors, transducers additionally emit elements of the output alphabet when traversing an edge from one state to another. Consequently, for every valid input word, a word over the output alphabet is generated. Moreover, in weighted finite state transducers, edges, initial states, and final states are equipped with scores (over K).
Hence, a valid input word and the emitted output word are associated with a total score.

Finite state transducers have been used in the field of speech recognition and language translation and provide a uniform way of representing knowledge for these tasks. In natural language processing, knowledge is typically modeled in the form of conditional probabilities. For example, the probability of observing a certain word w depends on the preceding words in the sentence, w′, and can therefore be represented as P(w | w′). In a finite state automaton representation each possible sequence of words, the history, gives rise to a state; every possible succeeding word is represented by an edge leaving that state, and the probability of this word following the history corresponds to the edge weight. For practical reasons the history is limited to a fixed length, i.e. n-gram models model the probability of the n-th word given the n − 1 preceding ones. Weighted finite state acceptors are sufficient for representing such language models. Transducers, however, are required to transform instances from one alphabet to instances from another alphabet, e.g. translating a French input sentence into English. The naïve approach, a word-to-word translation, maps one French input word to an English word. Unfortunately, the translation of a single word is often ambiguous, i.e. the word can have different translations, which are highly dependent on the context. As a consequence, one has to model P(e, f | e′, f′), where e is the translation for f, and f′ and e′ are the history of the French and English words, respectively.
Thus, every history of English and French words corresponds to a state in the transducer, every French word together with a possible translation gives rise to an edge leaving that state, and the probability of the French word following the history and of the translation to a particular English word given that history constitutes the edge weight. Since the probabilities have to be estimated from training data, which are always scarce, the distribution P(e, f | e′, f′) is decomposed into P(e | f), P(f | f′), and P(e | e′), representing the translation model, the French language model, and the English language model, respectively. These models can then be learned independently from different datasets and represented as individual transducers and acceptors. The algorithm compose described below constructs an approximation to the transducer representing P(e, f | e′, f′) from the three individual finite state machines.

For applications in natural language processing the obvious choice of a semiring for transducers is the Probability semiring: (ℝ₊, +, ×, 0, 1). For computational reasons the Log semiring is preferred: (ℝ ∪ {−∞, +∞}, ⊕log, +, +∞, 0). Here all probabilities are represented by their negative natural logarithm, and ⊕log is defined as x ⊕log y = −log(e^(−x) + e^(−y)). According to these two semirings the probabilities of events along a path from an initial state to a final state are multiplied, and the probabilities of alternative paths are summed up. However, in language processing tasks the total probability of a sequence of events is often dominated by the probability of the most likely path. Thus, instead of accumulating the probabilities of all alternative paths it is sufficient to compute the probability of the most likely path. This approximation is termed Viterbi approximation or maximum approximation and is implemented by the tropical semiring: (ℝ ∪ {−∞, +∞}, min, +, +∞, 0).
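The three semirings can be compared on a toy example with a minimal Python sketch. The `Semiring` container, the helper `total_weight`, and the two example paths are assumptions for illustration and are not part of any toolkit; a path is given simply as the list of its edge weights, ⊗ combines the weights along a path, and ⊕ combines alternative paths.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # combines weights along one path
    zero: float                             # identity of plus
    one: float                              # identity of times


def log_plus(x, y):
    """x (+)log y = -log(e^-x + e^-y), computed in a numerically stable way."""
    if x == math.inf:
        return y
    if y == math.inf:
        return x
    return min(x, y) - math.log1p(math.exp(-abs(x - y)))


PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
LOG = Semiring(log_plus, lambda a, b: a + b, math.inf, 0.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)


def total_weight(paths, semiring):
    """Weight of a set of alternative paths, each a list of edge weights."""
    total = semiring.zero
    for path in paths:
        weight = semiring.one
        for edge_weight in path:
            weight = semiring.times(weight, edge_weight)
        total = semiring.plus(total, weight)
    return total


# Two alternative paths with edge probabilities (0.5, 0.4) and (0.1, 0.2).
prob_paths = [[0.5, 0.4], [0.1, 0.2]]
neg_log_paths = [[-math.log(p) for p in path] for path in prob_paths]

p_total = total_weight(prob_paths, PROBABILITY)      # 0.2 + 0.02
log_total = total_weight(neg_log_paths, LOG)         # -log(0.22)
viterbi_total = total_weight(neg_log_paths, TROPICAL)  # -log(0.2), best path only
```

In the tropical semiring ⊕ is the minimum, so the total weight equals the cost of the cheapest path.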
Hence, many tasks can be solved efficiently by simply searching the shortest path from an initial to a final state in an automaton.

Due to the computational demands in natural language processing domains, libraries and toolkits offering efficient implementations of algorithms based on transducers are available, e.g. Mohri et al. (2000); Hetherington (2004); Lombardy et al. (2004); Allauzen et al. (2007). For the experiments presented here the RWTH FSA toolkit (Kanthak and Ney, 2004) is used. The toolkit is open source and offers a range of algorithms, C++ and Python interfaces, as well as a command line tool. Many algorithms based on transducers afford an on-demand implementation. Briefly, an on-demand algorithm computes the outgoing edges (and their weights) of a given state only as soon as the state is visited. For instance, if one wants to multiply all edge weights of a transducer with a scalar and afterwards process a word of the input alphabet, then the multiplication can be done for all edges explicitly before reading the word. Alternatively, in an on-demand implementation the multiplication can be carried out while the word is processed. This has the advantage that only edges of states that are actually required during a computation are updated. Kanthak and Ney (2004) give a definition of local algorithms, which is a prerequisite for the on-demand implementation. Consider an algorithm that generates a new transducer based on one or more transducers constituting its input (e.g. composition of transducers; see following section). If this algorithm is able to generate, for the new transducer, any arbitrary state and all its outgoing edges depending only on the information about the corresponding state(s) of the algorithm’s input transducer(s), then it has the local property.
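The on-demand idea can be illustrated with a small Python sketch of lazy scalar multiplication. The classes `ExplicitTransducer` and `OnDemandScaled` and the helper `read_word` are hypothetical and do not reflect the interface of the RWTH FSA toolkit; weights are plain real numbers, and the example transducer is deterministic to keep the reader trivial.

```python
class ExplicitTransducer:
    """Stores all edges: state -> list of (input, output, weight, target)."""

    def __init__(self, edges):
        self._edges = edges

    def outgoing(self, state):
        return self._edges.get(state, [])


class OnDemandScaled:
    """Scales edge weights by a factor, but only for states that are visited."""

    def __init__(self, inner, factor):
        self.inner = inner
        self.factor = factor
        self.expanded = set()  # bookkeeping to show which states were computed

    def outgoing(self, state):
        self.expanded.add(state)
        return [(i, o, w * self.factor, t)
                for (i, o, w, t) in self.inner.outgoing(state)]


def read_word(machine, start, word):
    """Follow the first matching edge per symbol; return the weights seen."""
    state, weights = start, []
    for symbol in word:
        matches = [e for e in machine.outgoing(state) if e[0] == symbol]
        if not matches:
            return None  # the word cannot be read from this state
        _, _, weight, state = matches[0]
        weights.append(weight)
    return weights


base = ExplicitTransducer({
    0: [("a", "x", 1.0, 1), ("b", "y", 2.0, 2)],
    1: [("a", "x", 3.0, 3)],
    2: [("b", "y", 4.0, 3)],
    3: [],
})
scaled = OnDemandScaled(base, 0.5)
weights = read_word(scaled, 0, "aa")
```

Reading the word "aa" touches only states 0 and 1, so the edges leaving state 2 are never scaled.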
For instance, multiplication of edge weights with a scalar value has the local property: for computing an arbitrary state in the resulting transducer, only information on that state in the input transducer is needed (the weights of the outgoing edges); all other states are irrelevant. The command line toolkit reads transducers in a binary and an XML format. The latter is mainly used for construction of new transducers by the user, while the former allows efficient storage and input-output operations. The following sections introduce some basic and important algorithms like the composition of two weighted finite state transducers and the extraction of the shortest path or the n-shortest paths of acceptors.

6.1.1 Composition

Weighted finite state transducer composition is the generalization of nondeterministic finite state automata intersection. The algorithm is used to construct large complex automata from small and simple ones. The intersection of nondeterministic finite state machines is defined as follows. Let ℱ1 = (A, Q1, I1, F1, E1) and ℱ2 = (A, Q2, I2, F2, E2) be two finite state acceptors, then the intersection automaton is defined as: ℱ1 ∩ ℱ2 = ℱ = (A, Q1 × Q2, I1 × I2, F1 × F2, E), where ((q1, q2), a, (q1′, q2′)) ∈ E ⇔ (q1, a, q1′) ∈ E1 ∧ (q2, a, q2′) ∈ E2. Basically, the new automaton comprises one state for every combination of states of the two input automata (Q1 × Q2). Thus, each state in the resulting acceptor can be mapped to exactly one state in each of the input acceptors. Likewise, the new initial (final) states are defined by all combinations of initial (final) states of the input acceptors. Edges in the resulting acceptor are added based on pairs of edges of the input automata that have the same edge label: the source (target) of the new edge is the state representing the two source (target) states of the edges in the two input automata.
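The intersection construction can be sketched in Python as follows. The dictionary layout of the acceptors and the function name intersect are assumptions made for this illustration. Unlike the formal definition, the sketch builds only the pair states reachable from the initial pairs, which does not change the accepted language.

```python
from collections import deque

def intersect(a1, a2):
    """Intersect two acceptors given as dicts with keys 'initial', 'final'
    (sets of states) and 'edges' (state -> list of (label, target))."""
    initial = {(p, q) for p in a1["initial"] for q in a2["initial"]}
    states = set(initial)
    queue = deque(initial)
    edges = []
    while queue:
        p, q = queue.popleft()
        for label1, target1 in a1["edges"].get(p, []):
            for label2, target2 in a2["edges"].get(q, []):
                if label1 != label2:
                    continue  # only edges with the same label are paired
                target = (target1, target2)
                edges.append(((p, q), label1, target))
                if target not in states:
                    states.add(target)
                    queue.append(target)
    final = {(p, q) for (p, q) in states
             if p in a1["final"] and q in a2["final"]}
    return {"initial": initial, "final": final, "edges": edges, "states": states}

# Made-up acceptors: a1 accepts a b*, a2 accepts a* b; the intersection accepts "ab".
a1 = {"initial": {0}, "final": {1}, "edges": {0: [("a", 1)], 1: [("b", 1)]}}
a2 = {"initial": {0}, "final": {1}, "edges": {0: [("a", 0), ("b", 1)]}}
result = intersect(a1, a2)
```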
When moving from acceptors to weighted transducers one has to respect the edge weights as well as the input and output symbols. Thus, let T1 = (A, B, Q1, I1, F1, E1, λ1, ρ1) and T2 = (B, C, Q2, I2, F2, E2, λ2, ρ2) be two weighted transducers. The composition of the two transducers is defined as: T1 ◦ T2 = T = (A, C, Q1 × Q2, I1 × I2, F1 × F2, E, λ, ρ), where ((q1, q2), a, c, w1 ⊗ w2, (q1′, q2′)) ∈ E ⇔ (q1, a, b, w1, q1′) ∈ E1 ∧ (q2, b, c, w2, q2′) ∈ E2, λ(q1, q2) → w1 ⊗ w2 ⇔ λ1(q1) = w1 ∧ λ2(q2) = w2, and ρ(q1, q2) → w1 ⊗ w2 ⇔ ρ1(q1) = w1 ∧ ρ2(q2) = w2. The major difference (apart from updating the weights for edges, initial states, and final states) to the construction of the intersection acceptor is that edges in transducer composition are added based on pairs of edges of the two input transducers where the output element of one edge matches the input element of the other edge. Consequently, the composition of transducers is not commutative. An example of acceptor intersection (top row) and transducer composition (lower rows) is shown in Figure 6.1.

Figure 6.1: Example of acceptor intersection and non-commutative transducer composition. States are represented as circles, initial states and final states are marked using bold and double circle boundaries, respectively. Numbers inside the circles stand for the labels. Edge labels in acceptors comprise a single element of the input alphabet, while edge labels in transducers comprise an element of the input alphabet (left) and one element of the output alphabet (right) separated by a colon. The first row shows the two input acceptors and intersection result in the last column. The second row shows two input transducers and the result of the composition left ◦ right and right ◦ left in the last column of the second and third row, respectively.

Algorithm 2 provides the composition algorithm based on Pereira and Riley (1997) in pseudocode.
It constructs a transducer T = (A, C, Q, I, F, E, λ, ρ) from two input transducers T1 and T2. This algorithm, however, assumes ε-free input transducers and uses a queue S. The function E[q] retrieves all edges leaving the state q and functions i[e], o[e], and n[e] return the input label, output label, and target state of edge e, respectively.

Algorithm 2 Weighted-Composition(T1, T2)
    Q ← I1 × I2
    S ← I1 × I2
    while S ≠ ∅ do
        (q1, q2) ← Head(S); Dequeue(S)
        if (q1, q2) ∈ I1 × I2 then
            I ← I ∪ {(q1, q2)}
            λ(q1, q2) ← λ1(q1) ⊗ λ2(q2)
        end if
        if (q1, q2) ∈ F1 × F2 then
            F ← F ∪ {(q1, q2)}
            ρ(q1, q2) ← ρ1(q1) ⊗ ρ2(q2)
        end if
        for each (e1, e2) ∈ E[q1] × E[q2] such that o[e1] = i[e2] do
            if (n[e1], n[e2]) ∉ Q then
                Q ← Q ∪ {(n[e1], n[e2])}
                Enqueue(S, (n[e1], n[e2]))
            end if
            E ← E ∪ {((q1, q2), i[e1], o[e2], w[e1] ⊗ w[e2], (n[e1], n[e2]))}
        end for
    end while
    return T

A problem with ε-containing transducers can occur when an ε-output edge of T1 is matched to (a sequence of) ε-input edges of T2. In this case, the application of Algorithm 2 could generate redundant ε-paths in the resulting transducers that (depending on the semiring) could lead to incorrect path costs. Thus, all but one ε-path have to be removed from the composite transducers. Remarkably, this filtering can be carried out by another transducer. To this end, the input transducers have to be preprocessed: output (input) ε-labels in transducer T1 (T2) are mapped to ε2 (ε1), resulting in T̃1 (T̃2). Using the filter transducer FT, T1 ◦ T2 can correctly be computed by T̃1 ◦ FT ◦ T̃2 using Algorithm 2 (for details see e.g. Pereira and Riley (1997) and Mohri (2005)). Returning to the language translation example from above, one would create two acceptors, one representing the English language model E_lm and one the French language model F_lm.
A simple translation transducer from French to English, FE_t, can be constructed as a transducer comprising one state and an edge for each French input word and a possible English translation. The operation F_lm ◦ FE_t ◦ E_lm builds a transducer T_F→E that translates French sentences to English. For translation, a single French sentence is represented as a linear acceptor f with edge labels corresponding to French words, and f ◦ T_F→E generates all possible English translations of f. However, for translating f most parts of the translation transducer T_F→E are not visited. Here the use of an on-demand implementation together with pruning (see following sections) results in improved computational performance. A transducer example from the domain of Bioinformatics is given by the combination of translating a sequence of nucleotides into amino acids and aligning the result to a reference amino acid sequence. For this task two transducers are required. Transducer T_Tr translates a sequence of nucleotides into a sequence of amino acids, and a second transducer T_Al encodes the alignment. T_Tr can be constructed so that all three reading frames are translated, i.e. there is the possibility to read at most two nucleotides without starting the translation process (see Figure 6.2). The lower part of Figure 6.2 depicts a sketch of the alignment transducer. It supports affine gap costs (g for gap open and ge for gap extend) and has one edge for each possible mismatch of amino acids; the cost of a mismatch m can, for instance, be related to the BLOSUM matrix. Let S_q,nt and S_r,A be linear acceptors representing the query nucleotide sequence and the reference amino acid sequence using the edge labels, respectively, then S_q,nt ◦ T_Tr ◦ T_Al ◦ S_r,A computes three possible translations of nucleotides to amino acids and the cost of all possible alignments to a reference.
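The composition procedure of Algorithm 2 can be sketched in Python as shown below. The sketch assumes ε-free input transducers and the tropical semiring (⊗ is addition); the dictionary layout and the name compose are choices made for this illustration only.

```python
from collections import deque

def compose(t1, t2, times=lambda x, y: x + y):
    """Compose two epsilon-free weighted transducers.
    Each transducer is a dict with 'initial' and 'final' (state -> weight)
    and 'edges' (state -> list of (input, output, weight, target))."""
    initial = {(p, q): times(w1, w2)
               for p, w1 in t1["initial"].items()
               for q, w2 in t2["initial"].items()}
    states = set(initial)
    queue = deque(initial)
    final, edges = {}, {}
    while queue:
        p, q = queue.popleft()
        if p in t1["final"] and q in t2["final"]:
            final[(p, q)] = times(t1["final"][p], t2["final"][q])
        for a, b1, w1, n1 in t1["edges"].get(p, []):
            for b2, c, w2, n2 in t2["edges"].get(q, []):
                if b1 != b2:
                    continue  # output of the first edge must match input of the second
                target = (n1, n2)
                if target not in states:
                    states.add(target)
                    queue.append(target)
                edges.setdefault((p, q), []).append((a, c, times(w1, w2), target))
    return {"initial": initial, "final": final, "edges": edges}

# Made-up transducers: t1 rewrites x to y (cost 1), t2 rewrites y to z (cost 2).
t1 = {"initial": {0: 0.0}, "final": {1: 0.0}, "edges": {0: [("x", "y", 1.0, 1)]}}
t2 = {"initial": {0: 0.0}, "final": {1: 0.0}, "edges": {0: [("y", "z", 2.0, 1)]}}
left_right = compose(t1, t2)
right_left = compose(t2, t1)
```

Composing in the order t1, t2 yields a single edge rewriting x to z at cost 3, whereas the reverse order yields no edge at all, which illustrates that composition is not commutative.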
6.1.2 Single Source Shortest Path

The previous section demonstrated how large transducers can be constructed from small ones. The information one wants to compute, however, is still concealed in these large directed graphs. In particular, the set of all possible translations of a French sentence into English is of less interest than the most likely translation. The same holds true for possible alignments. If the transducer is defined over a tropical semiring, however, then the desired information can be retrieved rather efficiently from the large automaton, since standard algorithms can be applied to solve the single source shortest path (SSSP) problem for computing the shortest path from an initial state to a final state. For example, Dijkstra's algorithm (Dijkstra, 1959) can be applied if all edge weights are non-negative. The single source limitation can be circumvented by a slight modification of the automaton: introduction of a new single initial state that links via ε-edges to all previous initial states. The same modification can be applied to the final states, thus only the cost of the shortest path between the single initial and the single final state is of interest. Theoretically, as soon as the execution of Dijkstra's algorithm extracts the final state from the queue, it can be stopped. For allowing also the more general case with negative edge weights (but non-negative cycles) the Bellman-Ford algorithm (Bellman, 1958) can be used. Of note, if the semiring used by the transducer is defined over R+, then one can safely apply Dijkstra's algorithm. The RWTH FSA toolkit implements Algorithm 3 that was introduced by Mohri (2002). The algorithm is termed Generic Single Source Shortest Distance because some well known and widely used algorithms are special cases of this algorithm (hence generic) and depending on the semiring the term path is not pertinent for what the algorithm computes (hence distance).
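As an illustration of the Dijkstra variant described above, the following Python sketch handles several initial and final states. Pushing all initial states with cost 0 plays the role of the new single initial state with ε-edges, and stopping at the first extracted final state plays the role of the single final state. The sketch assumes non-negative edge weights, ignores initial and final weights, and uses a hypothetical graph layout (state -> list of (weight, target)).

```python
import heapq

def shortest_path_cost(edges, initial_states, final_states):
    """Cost of the cheapest path from any initial to any final state
    (tropical semiring, non-negative weights); None if no path exists."""
    heap = [(0.0, q) for q in initial_states]  # acts like a zero-cost super source
    heapq.heapify(heap)
    done = set()
    while heap:
        cost, state = heapq.heappop(heap)
        if state in done:
            continue  # an outdated queue entry
        done.add(state)
        if state in final_states:
            return cost  # first final state extracted: search can stop
        for weight, target in edges.get(state, []):
            if target not in done:
                heapq.heappush(heap, (cost + weight, target))
    return None

# Made-up automaton with two initial states (0, 1) and one final state (4).
edges = {
    0: [(1.0, 2)],
    1: [(0.5, 2)],
    2: [(1.0, 3), (4.0, 4)],
    3: [(1.0, 4)],
}
best = shortest_path_cost(edges, {0, 1}, {4})
```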
Mohri showed that the algorithm is correct for any semiring and queuing discipline. Precisely, given a transducer that is defined over a tropical semiring, the queuing discipline determines which algorithm is executed: if the queue used in Algorithm 3 follows a first-in first-out or shortest-first principle, then the algorithm is equivalent to executing Bellman-Ford or Dijkstra, respectively. Briefly, the algorithm maintains two properties for each state: d[q] stores the current distance estimate from the initial state i to state q, and r[q] stores the total weight added to d[q] since the last time q was extracted from the queue S. Like in Algorithm 2, E[q] represents all edges leaving q, and n[e] and w[e] return the target state and the edge weight, respectively. In the tropical semiring, ⊕ is equal to min. Thus, intuitively, in this case the statement in Line 11 checks whether the current distance of n[e] to i is shorter than the current distance R from i to q plus the edge weight from q to n[e]. If this is not the case, then d[n[e]] and r[n[e]] are updated. Furthermore, if n[e] is not in the queue it is added to S so that its outbound edges can be updated later. For a detailed analysis of the running time, possible modifications of the algorithm and correctness see Mohri (2002).

Algorithm 3 Generic-Single-Source-Shortest-Distance(T, i)
     1: for j ← 1 to |Q| do
     2:     d[j] ← r[j] ← 0̄
     3: end for
     4: d[i] ← r[i] ← 1̄
     5: S ← {i}
     6: while S ≠ ∅ do
     7:     q ← Head(S); Dequeue(S)
     8:     R ← r[q]
     9:     r[q] ← 0̄
    10:     for each e ∈ E[q] do
    11:         if d[n[e]] ≠ d[n[e]] ⊕ (R ⊗ w[e]) then
    12:             d[n[e]] ← d[n[e]] ⊕ (R ⊗ w[e])
    13:             r[n[e]] ← r[n[e]] ⊕ (R ⊗ w[e])
    14:             if n[e] ∉ S then
    15:                 Enqueue(S, n[e])
    16:             end if
    17:         end if
    18:     end for
    19: end while

Figure 6.2: Example transducers for translating nucleotide sequences into amino acid sequences and alignment of two sequences. The notation for the transducers is the same as in Figure 6.1. Furthermore, *EPS* corresponds to ε (the "empty word") and the slash in the edge label separates the edge weight (on the right) from input and output elements (on the left). The top transducer T_Tr translates a sequence of nucleotides into amino acids: the first two states (0 and 1) allow reading any nucleotide (n) without starting the translation process, the states (3 and 4) represent the fact that the first (n1) and second (n2) nucleotide of a codon have been read, respectively, and the arc between state 4 and 2 emits the amino acid (A) encoded by the codon n1n2n3. The bottom transducer T_Al encodes an alignment with affine gap costs. State 0 has arcs for matches (X:X) and mismatches (X:Y) at cost m. Furthermore, states 1 and 2 encode deletions and insertions, respectively, with respect to the reference sequence.

Given the result of any single source shortest distance algorithm, the shortest path from the source to all other states can easily be extracted, in particular the paths leading to final states. Therefore, the best translation of a French sentence into English can be found by computing best(f ◦ T_F→E). Analogously, the best reading frame and alignment of a nucleotide sequence to an amino acid sequence is retrieved by executing best(S_q,nt ◦ T_Tr ◦ T_Al ◦ S_r,A).

6.1.3 N-Shortest-Strings

Due to imperfect models or violated model assumptions, the best hypothesis of an automaton, as represented by the shortest path, need not be the best solution with respect to a scoring function. If the model is sufficiently accurate, however, then the best (or a better) solution might be among the top n (i.e. most likely) hypotheses represented by the automaton. Moreover, these n-best lists can be post-processed to form a final solution. In the case of a deterministic acceptor, the n-best paths in the automaton are equivalent to the n-best hypotheses or n-best strings.
In nondeterministic acceptors, however, the n-best paths may contain duplicate hypotheses (with different costs). Thus, in order to obtain n distinct hypotheses, nondeterministic automata have to be made deterministic before computing the n-best paths. The process of computing a deterministic weighted finite state automaton from a nondeterministic one works analogously to the unweighted case via the classic subset construction algorithm (Aho et al., 1986). Unlike in the unweighted case, not all weighted automata can be made deterministic. Fortunately, any acyclic weighted automaton is determinizable (Allauzen and Mohri, 2002). Furthermore, the determinization algorithm affords a local implementation (Mohri and Riley, 2002). This makes the algorithm extremely useful for extraction of n-shortest-strings. Given a (non)deterministic acceptor ℱ = (A, Q, I, F, E), for which we can assume (without loss of generality) that |I| = |F| = 1, one has to carry out the following steps to compute the n-best strings. First, a potential function φ is computed, which provides for every state q ∈ Q the shortest distance to the single final state. Obviously, φ can be computed by inverting all edges in the automaton and executing Algorithm 3 starting from the single final state. Then, the automaton ℱ is determinized, resulting in the deterministic automaton ℱ′ = (A, Q′, I′, F′, E′). Based on the potential function φ defined for ℱ, the potential function Φ for ℱ′ can be derived on demand (Mohri and Riley, 2002). This potential function for the states of the determinized automaton is the basis for the ordering of the elements in the priority queue S that is used in Algorithm 4 (based on Mohri and Riley (2002)). Precisely, the ordering of the queue is defined as: (p, c) < (p′, c′) ⇔ (c + Φ[p] < c′ + Φ[p′]), where p, p′ ∈ Q′, and c and c′ are the costs from the initial state to the states p and p′, respectively.
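The steps above can be sketched in Python under simplifying assumptions: the acceptor is already deterministic and acyclic, it is defined over the tropical semiring, and it has a single initial and a single final state. The potential is computed by plain edge relaxation instead of a generic shortest-distance routine, and each queue entry carries its label sequence instead of a predecessor pointer. All names (potentials, n_shortest_strings) and the example automaton are hypothetical.

```python
import heapq
import math

def potentials(edges, states, final):
    """Shortest distance from every state to the final state (tropical semiring)."""
    phi = {q: math.inf for q in states}
    phi[final] = 0.0
    for _ in range(len(states)):  # enough relaxation rounds for an acyclic automaton
        for p, outgoing in edges.items():
            for _label, weight, target in outgoing:
                phi[p] = min(phi[p], weight + phi[target])
    return phi

def n_shortest_strings(edges, states, initial, final, n):
    """Return up to n (cost, labels) pairs, cheapest first."""
    phi = potentials(edges, states, final)
    extracted = {q: 0 for q in states}
    heap = [(phi[initial], 0.0, initial, ())]  # ordered by cost so far + potential
    results = []
    while heap:
        _priority, cost, p, labels = heapq.heappop(heap)
        extracted[p] += 1
        if p == final:
            results.append((cost, labels))
            if extracted[p] == n:
                break  # final state extracted n times
        if extracted[p] <= n:
            for label, weight, target in edges.get(p, []):
                new_cost = cost + weight
                heapq.heappush(
                    heap, (new_cost + phi[target], new_cost, target, labels + (label,)))
    return results

# Made-up deterministic acceptor: edges are (label, weight, target).
edges = {
    0: [("a", 1.0, 1), ("b", 2.0, 1)],
    1: [("c", 1.0, 2), ("d", 3.0, 2)],
}
three_best = n_shortest_strings(edges, {0, 1, 2}, 0, 2, 3)
```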
Since Φ never overestimates the cost from the current state to a final state, one can view Φ also as the admissible heuristic that plays a central role in the A∗ algorithm (Hart et al., 1968).

Algorithm 4 n-shortest-paths(ℱ′, n)
     1: for p ← 1 to |Q′| do
     2:     r[p] ← 0
     3: end for
     4: π[(i′, 0)] ← NIL
     5: S ← {(i′, 0)}
     6: while S ≠ ∅ do
     7:     (p, c) ← Head(S); Dequeue(S)
     8:     r[p] ← r[p] + 1
     9:     if r[p] = n ∧ p ∈ F′ then
    10:         Exit
    11:     end if
    12:     if r[p] ≤ n then
    13:         for each e ∈ E[p] do
    14:             c′ ← c ⊗ w[e]
    15:             π[(n[e], c′)] ← (p, c)
    16:             Enqueue(S, (n[e], c′))
    17:         end for
    18:     end if
    19: end while

The algorithm maintains for each state p of the determinized automaton the attribute r[p], which stores the number of times the state was extracted from the priority queue S. This attribute is essential as it provides the stopping criterion for the algorithm: r[p] is initialized with 0 for all states, and as soon as the single final state is extracted from the queue n times (lines 9-11) the algorithm terminates. Paths are defined by storing a predecessor π for each pair (p, c) (Mohri and Riley, 2002). The potential function φ can be exploited in further ways together with pruning. Briefly, when applying forward pruning the search algorithm does not follow edges that lead to a total path cost exceeding a certain threshold. Mohri and Riley (2001) introduced a weight pushing algorithm that requires the potential function in order to move path weights further to the beginning (or end) of a path. Having the costs of the complete path accumulated at the beginning of the path facilitates more efficient forward pruning, i.e. one can decide not to search along paths that promise a high total cost early in the search process.

6.2 Transducers in Therapy Planning

The strategy for assessing future drug options includes the computation of an estimate of the viral sequence at failure of the therapy.
The baseline sequence can be represented as a linear acceptor with the edge labels corresponding to the sequence positions. For allowing an unambiguous identification, edges are labeled Tar_Pos_AA with Tar, Pos, and AA corresponding to the protein the sequence encodes (here Tar ∈ {PRO, RT}), the amino acid position, and the amino acid at that position, respectively. Given a function M_t(p, a, a′) that provides the probability of mutating the amino acid a to a′ at position p under the therapy t, one can easily construct a transducer representing this function. This is illustrated by the left transducer in Figure 6.3. The self loops at the initial state and the final state read single sequence positions and leave them unchanged (at no cost). All transitions from the initial state to the final state introduce exactly one mutation, e.g. at position p from a to a′ at the cost of M_t(p, a, a′). A composition of the linear sequence automaton with this mutation transducer introduces all possible mutations at all positions. Using the algorithms described above, one can retrieve the sequence with the most likely mutation or the n most likely sequences. Figure 6.4 displays a toy example. The base sequence comprises (the first) three amino acids of the protease (a). The mutation transducer (b) allows one possible mutation for each position (colored edges). The composition of the two automata results in a transducer that has one mutation for each path from the initial state to the final state (c). The most likely mutation can be extracted by finding the shortest path in that transducer (d), and the list of n-best mutations can be generated by extracting the n-best paths (e). The latter algorithm, however, is only defined for acceptors, thus the mutations (colored edges) cannot be inferred from the graph anymore. For introducing two mutations in the baseline sequence one simply has to compute the composition of the transducer in Figure 6.4 c) and the mutation transducer.
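The effect of composing a sequence acceptor with a mutation transducer and then extracting the best or n-best paths can be mimicked without any transducer machinery. The Python sketch below enumerates all single mutants of a short baseline sequence and ranks them by cost (negative log probability). The mutation probabilities are invented for illustration, and the function name single_mutants is hypothetical.

```python
import math

# Baseline sequence with edge labels in the Tar_Pos_AA scheme.
baseline = ["PRO_1_P", "PRO_2_Q", "PRO_3_I"]

# Hypothetical mutation probabilities M_t(p, a, a') for one therapy t,
# keyed by (source label, target label); the values are made up.
mutation_prob = {
    ("PRO_1_P", "PRO_1_L"): 0.05,
    ("PRO_2_Q", "PRO_2_K"): 0.20,
    ("PRO_3_I", "PRO_3_V"): 0.10,
}

def single_mutants(sequence, probabilities):
    """All sequences with exactly one mutation, sorted by cost -log(probability)."""
    mutants = []
    for index, label in enumerate(sequence):
        for (source, target), prob in probabilities.items():
            if source == label:
                mutated = sequence[:index] + [target] + sequence[index + 1:]
                mutants.append((-math.log(prob), mutated))
    return sorted(mutants, key=lambda item: item[0])

ranked = single_mutants(baseline, mutation_prob)
best_cost, best_sequence = ranked[0]  # corresponds to the shortest path
two_best = ranked[:2]                 # corresponds to the 2-best paths
```

Introducing a second mutation corresponds to applying single_mutants again to each sequence in the result and adding the costs.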
Transducers constructed in a different way can be used to assess the activity of a regimen with respect to a given genotype. The right transducer in Figure 6.3 illustrates the concept of these rating transducers. In this automaton the output labels are identical to the corresponding input label for all edges. Thus, a composition of another transducer with the rating transducer leaves the edge labels unchanged. In the rating transducer, the edges are equipped with a weight derived from the relevant rating function R_t(p, a). The rating function R_t(p, a) provides a score for amino acid a at position p in the context of treatment t. For instance, R_t(p, a) can express the resistance that amino acid a at position p causes against treatment t. The initial state has a cost R_t(0), which corresponds to the offset of that rating function. After computing the composition of a linear acceptor representing a sequence (short: sequence acceptor) with such a rating transducer, each edge in the sequence acceptor receives a score. The sum of all scores along a path constitutes the activity of a treatment against the sequence represented by the path.

Figure 6.3: Concept of mutation and rating transducers. The notation of the transducers is the same as in previous figures. The mutation transducer (left) for each treatment has two states, connections between these states introduce mutations. The edge weight corresponds to the score of the mutation as represented by the function M_t(p, a, a′). The rating transducer (right) has one loop edge for every position and amino acid; the edge weight R_t(p, a) can for example correspond to the weights for an amino acid at a certain position as derived from the linear SVM models, which predict in vitro drug resistance. The weight of the state R_t(0) then corresponds to the offset of the linear model z0 (see Eqn. 6.2).

Rating transducers can for instance be used to remove ambiguities from an input sequence.
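A rating along a path amounts to adding the offset and the per-position scores. The following Python sketch shows this, together with one simple way of resolving ambiguous positions by picking the highest-scoring amino acid at each position. The weights, the offset, and the function names rate and most_resistant are hypothetical.

```python
def rate(sequence, weights, offset):
    """Offset R_t(0) plus the scores R_t(p, a) of all (position, amino acid) pairs."""
    return offset + sum(weights.get((p, a), 0.0) for p, a in sequence)

def most_resistant(ambiguous, weights):
    """Pick, for each position, the candidate amino acid with the highest score."""
    return [(p, max(options, key=lambda a: weights.get((p, a), 0.0)))
            for p, options in ambiguous]

# Made-up rating function for one treatment.
weights = {(1, "P"): 0.0, (2, "Q"): 0.0, (2, "K"): 1.25, (3, "I"): 0.0, (3, "V"): 0.25}
offset = 0.5

# Positions 2 and 3 are ambiguous (two candidate amino acids each).
ambiguous = [(1, "P"), (2, "QK"), (3, "IV")]
resolved = most_resistant(ambiguous, weights)
score = rate(resolved, weights, offset)
```

In a transducer over the tropical semiring the same selection would be obtained by a shortest-path search, provided the edge weights are arranged so that higher resistance corresponds to lower cost.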
In our case, ambiguities are represented by parallel edges between subsequent states in the sequence acceptor. In the composition of the sequence acceptor and the rating transducer these edges are extended by a score corresponding to the amino acid at the alignment position. Thus, one can arrange that the shortest path in the transducer corresponds to the most resistant variant represented by the sequence. So far the transducers only represent mutation or rating instructions for a single treatment. However, it is trivial to extend them to allow mutation and rating instructions for different treatments within one automaton. This allows storing all models within a single large transducer instead of multiple small ones. Figure 6.5 extends the toy example with a second treatment in the mutation transducer. For selecting the correct mutation function, the first edge in the sequence acceptor is labeled with the corresponding treatment.

Figure 6.4: Toy transducer example. Panels: (a) sequence acceptor, (b) mutation transducer, (c) composition, (d) best result, (e) 3-best result. HIV sequences are represented by linear acceptors (a), the mutation transducer for each treatment comprises two states, with the connecting edges encoding the mutations (b). The composition introduces every possible mutation into the sequence (c). The most likely mutation can be retrieved via finding the shortest path (d). Likewise the n most likely mutations can be extracted by listing the n-best paths.

Figure 6.5: Toy transducer example for multiple treatments. Panels: (a) sequence acceptor, (b) mutation transducer. The sequence information is preceded by an edge with a treatment label (a), the complete mutation transducer comprises two states for each treatment, the initial state connects to the pre-mutation state via an edge labeled with the treatment (b).

6.3 Mutation Models

The major problem with building the mutation transducer is the estimation of correct mutation probabilities (or scores). The following two subsections present approaches that use either in vitro or in vivo data for estimating mutation probabilities. The proposed models all aim at assessing mutation probabilities with respect to individual drugs (exceptions are the RTI mutation probabilities estimated from in vivo data). This approach has a drawback, since mutation models for individual drugs have to be combined in some way for building the M_t functions that model mutation probabilities for a specific treatment. On the other hand, having mutation models for individual drugs available enables the computation of a therapy mutation model for every possible combination of drugs. If not stated otherwise, the therapy mutation models are constructed as the geometric mean of the mutation models for individual drugs:

\[ M_t(p, a, a') = \prod_{d \in t} M_d(p, a, a')^{\frac{1}{|t|}}, \]

where d are the drugs used in therapy t and |t| corresponds to the number of drugs in the therapy. Since the transducers are defined over the tropical semiring this is equivalent to

\[ \log M_t(p, a, a') = \frac{1}{|t|} \sum_{d \in t} \log M_d(p, a, a'). \]

Of note, the framework also supports the use of a weighted geometric mean:

\[ M_t(p, a, a') = \prod_{d \in t} M_d(p, a, a')^{w_{d,t}} \;\Leftrightarrow\; \log M_t(p, a, a') = \sum_{d \in t} w_{d,t} \log M_d(p, a, a') \tag{6.1} \]

with \(\sum_{d \in t} w_{d,t} = 1\).

6.3.1 Mutation Probabilities Derived From geno2pheno

The first model is motivated by the work of Beerenwinkel et al. (2003b) and assumes that mutations that confer the most dramatic change in resistance with respect to the current regimen are selected first during therapy. Unlike the work by Beerenwinkel et al., the model used here does not make use of the "maximum assumption" between drugs of the same class (see Section 4.1.1), but simply takes into account the change of resistance against all drugs when a mutation is introduced.
The SVM models in geno2pheno use a linear kernel to predict the decadic logarithm of the resistance factor; this allows for rewriting the regression function as a linear model (see Section 3.1). In order to simplify further steps we rewrite the linear model \(\vec{w} \cdot \vec{x} + b\) for predicting phenotypic drug resistance as:

\[ \log(\mathrm{rf}(seq)) = z_0 + \sum_{p=1}^{N} \sum_{a} z_{p,a}\, \delta(a, seq(p)), \tag{6.2} \]

where z_0 is the offset (before b), z_{p,a} is the weight for amino acid a at position p ∈ {1, . . . , N} (extracted from the corresponding position in \(\vec{w}\)), and δ(a, seq(p)) is 1 only if the query sequence presents amino acid a at position p (representing the feature vector \(\vec{x}\)). Because of this structure the weights z_{p,a} can be used directly to construct the mutation function log M(p, a, a′). Precisely, the mutation score is defined by the difference in predicted resistance between a sequence seq and a mutant at position p exchanging amino acid a with a′ (seq_{p,a→a′}):

\begin{align*}
\log M(p, a, a') &= \log(\mathrm{rf}(seq)) - \log(\mathrm{rf}(seq_{p,a \to a'})) \\
&\overset{(6.2)}{=} \Big( z_0 + \sum_{p=1}^{N} \sum_{a} z_{p,a}\, \delta(a, seq(p)) \Big) - \Big( z_0 + \sum_{p=1}^{N} \sum_{a} z_{p,a}\, \delta(a, seq_{p,a \to a'}(p)) \Big) \\
&= z_{p,a} - z_{p,a'}.
\end{align*}

The use of the raw resistance scores, however, introduces a problem since different drugs show different ranges of resistance levels. Thus, for minimizing range-related problems, the models were scaled to exhibit similar resistance levels for each drug. The scaling works in two steps and focuses on the bimodal distribution of resistance factors (Figure 3.3): first, mean µ and variance σ of the susceptible subpopulation were determined for each drug and the resistance factor was transformed to

\[ \log(\mathrm{rf}') = \frac{\log(\mathrm{rf}) - \mu}{\sigma}. \]

Second, the values were scaled (i.e. divided by a scalar s) such that the mean of the resistant subpopulation µ_resist has the same value for all drugs. Note that the first transformation is similar to the z-scores used in geno2pheno (see Section 3.2). Using this scaling the score of a mutation is

\[ \log M(p, a, a') = \frac{z_{p,a} - z_{p,a'}}{\sigma \cdot s} \]

and a combination of Eqn. 6.1 and Eqn. 6.3 yields for arbitrary therapies

\[ \log M_t(p, a, a') = \sum_{d \in t} w_{d,t}\, \frac{z_{d,p,a} - z_{d,p,a'}}{\sigma_d\, s_d} \tag{6.3} \]

with d being a drug in treatment t and w_{d,t} standing for the weight of the drug in the treatment (\(\sum_{d \in t} w_{d,t} = 1\)). All other variables receive an additional index d indicating that they are specific for an individual drug. This mutation model is termed g2p_raw.

An alternative model, also derived from geno2pheno, deduces M(p, a, a′) from the probability that a susceptible sequence becomes resistant after acquiring an additional mutation. Precisely, given a set of sequences, let S be the number of all susceptible sequences, S_{p,a} ≤ S be the number of all susceptible sequences with amino acid a at position p, and R_{p,a→a′} ≤ S_{p,a} be the number of sequences that move to the resistant subpopulation after mutating a to a′. Hence

\[ M(p, a, a') = \frac{R_{p,a \to a'}}{S_{p,a}} \tag{6.4} \]

expresses the probability that a sequence becomes resistant after mutating a to a′. Obviously, 0 ≤ M(p, a, a′) ≤ 1 holds and no scaling for the scores is required for achieving compatibility between different drugs. Here, we define the cutoff between susceptible and resistant viruses as the intersection of the Gaussians modeling the distribution of RF values for susceptible and resistant viruses (Figure 3.3). The probability of transcending the boundary from susceptible to resistant by accumulating one mutation clearly depends on the level of resistance the virus currently exhibits and the magnitude of resistance conferred by that additional mutation. Hence, the potential of a certain mutation is estimated from a dataset by averaging over all sequences. The resulting M(p, a, a′) scores are dominated by the magnitude of change in resistance conferred by the mutation from a to a′ and are therefore highly correlated with the corresponding scores of g2p_raw.
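The two scores can be illustrated with a small Python sketch. It assumes that resistance is assessed with a linear model of the form of Eqn. 6.2 and that a fixed cutoff separates susceptible from resistant sequences. All weights, scaling constants, sequences, drug names, and function names below are invented for illustration and are not taken from geno2pheno.

```python
def predict_log_rf(seq, z0, z):
    """Linear model: offset plus the weights of the amino acids present in seq."""
    return z0 + sum(z.get((p, a), 0.0) for p, a in seq.items())

def raw_score(z, p, a, a_new, sigma=1.0, s=1.0):
    """Scaled difference of the weights, (z[p,a] - z[p,a']) / (sigma * s)."""
    return (z.get((p, a), 0.0) - z.get((p, a_new), 0.0)) / (sigma * s)

def therapy_raw_score(models, p, a, a_new, weights=None):
    """Weighted sum over the drugs of a therapy; equal weights by default."""
    if weights is None:
        weights = {d: 1.0 / len(models) for d in models}
    return sum(weights[d] * raw_score(m["z"], p, a, a_new, m["sigma"], m["s"])
               for d, m in models.items())

def transition_probability(seqs, z0, z, cutoff, p, a, a_new):
    """Fraction of susceptible sequences with a at p that become resistant
    after mutating a to a_new (R / S in the notation of Eqn. 6.4)."""
    susceptible = [s for s in seqs
                   if s.get(p) == a and predict_log_rf(s, z0, z) < cutoff]
    if not susceptible:
        return None
    moved = sum(1 for s in susceptible
                if predict_log_rf({**s, p: a_new}, z0, z) >= cutoff)
    return moved / len(susceptible)

# Made-up single-drug model and sequences (position -> amino acid).
z = {(1, "A"): 0.0, (1, "V"): 1.0, (2, "K"): 0.0, (2, "N"): 0.5}
seqs = [{1: "A", 2: "K"}, {1: "A", 2: "N"}, {1: "V", 2: "N"}]
prob = transition_probability(seqs, 0.0, z, 1.25, 1, "A", "V")

# Made-up two-drug therapy with drug-specific scaling constants.
models = {
    "drugA": {"z": {(1, "V"): 1.0}, "sigma": 0.5, "s": 2.0},
    "drugB": {"z": {(1, "V"): 0.5}, "sigma": 1.0, "s": 1.0},
}
combined = therapy_raw_score(models, 1, "A", "V")
```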
For instance, a mutation that increases phenotypic resistance 10-fold can turn a larger fraction of susceptible viruses into resistant ones than a mutation that increases resistance only 2-fold. Consequently, the former mutation will have a larger M(p, a, a′) score. Approximately 30,000 HIV-1 pol sequences containing protease and reverse transcriptase from the EuResist database were used to estimate the probabilities. This model is termed g2p_transition. Figure 6.6 depicts scatter plots between log M(p, a, a′) from g2p_raw (x-axis) and g2p_transition (y-axis) for all drugs. The scatter plots show clearly that there is a substantial correlation between the two mutation scores. However, the plots also reveal that for some drugs the assumption that the most resistance conferring mutation will be selected first during treatment may lead to the preferred accumulation of rather rare mutations (e.g. RT mutation Q151M for ddI and d4T). For smoothing the probabilities with respect to the chance of a mutation to actually emerge, Eqn. 6.4 is modified to

\[ M(p, a, a') = \frac{R_{p,a \to a'}}{S_{p,a}} \cdot \frac{R_{p,a'}}{R}, \]

with R and R_{p,a′} representing the number of all resistant sequences and the number of resistant sequences having amino acid a′ at position p, respectively. Thus, R_{p,a′}/R expresses the probability of observing a certain mutation in the resistant subpopulation. This mutation model is termed g2p_mixed. Figure 6.7 depicts scatter plots between log M(p, a, a′) from g2p_raw (x-axis) and g2p_mixed (y-axis) for all drugs. For practical reasons the models were restricted to allowing only mutations from and to amino acids that actually occur in the training data of geno2pheno at a given position.
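The smoothing effect of the second factor can be seen in a short Python example with invented counts: a mutation that almost always confers resistance but is rarely observed among resistant sequences ends up with a lower score than a weaker but frequently observed mutation. The function name mixed_score and all numbers are hypothetical.

```python
def mixed_score(r_transition, s_pa, r_pa_new, r_total):
    """(R_{p,a->a'} / S_{p,a}) * (R_{p,a'} / R) with plain counts as input."""
    return (r_transition / s_pa) * (r_pa_new / r_total)

# Made-up counts: 100 susceptible sequences carry a, 1000 sequences are resistant.
rare_strong = mixed_score(r_transition=90, s_pa=100, r_pa_new=10, r_total=1000)
common_weak = mixed_score(r_transition=50, s_pa=100, r_pa_new=400, r_total=1000)
```

Here the transition fraction alone would favor the first mutation (0.9 versus 0.5), while the mixed score favors the second one.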
[Figure: per-drug scatter plots of mutation scores (panels recoverable from the extraction: NFV, ATV); x-axis: change in log(rf); y-axis: −log(prob); points are labeled with the corresponding mutations, e.g. Q151M, G48V, I50V, M184V.]
K37D T31S T91A Q63L A63S N63P L38I D29V H63T N37Q S39P T72R T72S L93I G48W G73S R20T Q39P Q18L K55N K55I S63P L10C S67D F53L R8L I85M I62M I85V I77V K57R E67L C63P L64M V10P I20M D29E D65E T80I N41T N41I T80S P19V I12K Y69Q H37I I10R H37C M36K I64M T19I T19V L15M T31A T72V A63T Q19L I3F E37D R12Q N83K G16R R12T R12P S67W S67F S67L Y67D Y67W T63P S37D H69Q A16E T37Q K41T I12Q L72I S39TK41I I12S I12T G48E R72V R69Q P79A Y67F Y67L M89V L93V H37Q H37D G51A G73A Q19I R92H G73T E58R Q63S L10P I12P K69Q D61E E72R E72S Q19V N61E A71V K20R I3V T74S L24F N83S E67C N88D L6R I63P L6C Y67C A37D T74P Q63T E58Q V32L L2Q E63P T37D L2S N37D E60N D25E V71I E72V I36V Q39T Q63P M36V L72R L72S L72V L89V D60N N69Q H63P Q37D R20M A63P G48A A74S T20M D25N I36L T71V K98N K20T L33F Q61E V33F V10R A71I I47L A74P L90M M36L I37Q G16E I37D I98N I37C S67C L10R P10R I82T N7Q A82M V36L T36L K20M I33F T10F L24I V82I L76V T71I Q92H H2P H2L H2S N30D I10Y R63P I82C R7Q I10F T8LT69Q V10Y M46V L10Y S73A S73T I89V L10F P10Y P10FV10F I82L A71L E92H A7Q V71L F66V N85V N85M R61E F66N F66M F66L F24I K74S K74P M46L I82D I82G T71L H2Q I47V V20R VK72V 20T V20M I82A S88T I84V I82M G48V V82T K72R K72S K92H V22A V82C I54L S88D I71L I50V V82L C73S V82D V82G L54F L54S T88D I32L V82M I82F I82S V82A I54F L46IL54M C73A C73T V46I M46I L54A L54T I54S A82F A82S L54V Q92E L19Q L19V I85L P63H H69T H69R L63C I47V K70Q L89I Q69K I93L N88T L19P P12I G94S I20T Q19P V11S N69H V63C N37E C37D S37Y P12E V64M N69R N69T S37I Q63T I62V D67W T37N T37E T37C I72L E34G D65Q V15M P63T D30N I72V N88K S63Q S63P K92Q K92E L63S V64IT37Y N69Y H69Y L97F Q61N S63H A22S K57S M20R R8D H61R I85M I15M Q18T R57S T37I V13M V15L A22V I13M N83K L63Q V19A V19S Q69E G16R V33F Q39P S63T L63P S37D A16G E12N L63H H63T N37Y V36I R8T S39P R43T S74A L18T K69E G16E N37I N83D A12N V63S T12N I13R Q19A K41N A63L A63V A71T E35G F10V R70K M64I I15L C67Y L19A R14Q R14E L38F E37Y I63Q I63P I63H I63T I63S K12N L19S L63T G51A I50V I85V M89I E34Q V19I E37I N61H R70Q R43K 
T72L R14K P12N A63C E34A C63Q C63S C63P C63H V11C K45N I10F T74A V63Q Q19S V63P V63H V11F S12N I10M V13R Y67W D67F D67L L10R M93I M93L E21G C63T E34D E34I T72V N88D N61E V63T L24I L10T D65E R69Y Y67F Y67L L64M L19I T37D R63Q R63P R63H R63T N61R L24F Q61H I10V Q69H T43K L46I Q19I S67W T4I N88S I12N L64I D88S R8P R8L E72L E72V Q61E D60E Q69R Q69T C67W A63S Q61R I11C I11F L33F A63Q Q58E L24S A63P C67F N41T M36A A63H N37D P19A P19S L36A A16R A16E I41R C67L K69H S67F E7Q E7H D35G V32I K20L G48M A63T L10M K69T L10F K69R S67L M46V T39P K41T K98N V89I I37D R6W R6S G73C V71I I75V E37D N35G L10VM36V T69Y L36V L36I A11C A11F M36I K35G Q69Y N7Q R19A R 19I I24F R19S I24S N7H G48V F53Y K65E V82I N14Q N14E N14K K69Y K34Q I33F P19I N41R D8P D8L K20R K41R K45R N60E T71D T71L A71G A71D A71L T14Q T14E T14K D12N F53L R92L V20I V20TT71G N70Q N43K R92Q R92H R92E V82D V82G V82F R20I M20I M20T V82M N85M K34A N85V I43T I43K M62I R20T H2Q V93I V93L G73T M46L F95L I82G A71V K20I I82D I82F V82S K20T V46L V46I T71VT71I I82M A71I K34D L54M K34I C6W C6S M62V M46I L90M K74A I82S F82A I54A I54L V82A S11C S11FF95C G73S I54M Q35G L54V I82A L82A 0.2 0.3 change in log(rf) 0.4 −log(prob) 3 N37C L63V 0.1 0.5 change in log(rf) Q18K 0.0 0.30 change in log(rf) 1 3 2 −log(prob) 1 I54T I54A I84V I54LI54M T82F L76V V48M 0 4 3 0 0.0 LPV 0.4 0.15 E44D T39D T35K I179F T48F D121A C162T T142L L120I H121Y K39R V35A M184L K73N K211E M178F T39L I47L A207S R104N G211R I5T A98S T200R K174T I35E A207E L193I A200K V178I L210E L210G K39N P4S K211S I142T K32T Q197N D40A R83E A173P C162F V142L K122D K102Q K11A G196E D169H D169Y C162K V202T G123E I60V P97S K70Q T215N L74M K102H I5M G123Q E219Q I142V D207N R78S R135M E200M E200I G211Q E169R H174N D177E V50L K174Q Q197L E207L K101R K83S Y171I Y171F A114V E44K T39Q E89K T35L N207H E207G Q197P T35M T211D T211W T211I T215H A138D K11N I5N E207R I47M A173M L195T N43Q N43T L100F T7S V202K W210L E204A S211D K46Q M135L R104Q R166T R166I K173E I50V Q122N Q122I A138Q A173K D207H K135R 
K135M V135I R43M E32K E32T Q219R T39I D192T G177Q A162S G177N G177K I142L M35E K139Q R174K Q91L K139R P9Q K139E K139I R206K R49Q A207L A207R A207G K32I G15V S69I V200Q T165P R166Q S173P V200T V200R G123T E29Q R20I N175Y D121H Q101T Q101I I184V E39Q D69I E39D E39L E39I D121E K122P C162V T200L S162N I50L N39Q Q207K N39D N39L Y127F T11R D162Y R104T S200Q S200M S200I S200T K35E K35V K35A K35S L195M L195R L195K L195I I47N G207P D185N T142F T142R T142K E200Q D121Y E200T E79K E211D N67G T165L K49Q Y162R N174Q N174G N174K N174T S68R Q211A S68N R174T S39K S39R S39N R139I T173R L35S S211W S211I S68T D44K L41M T215I R201M R172K M164L K39D K39L T27S E197Q N207K N207AG123D E35S R211Q K211D E204N M35S N211T N211K N211E I167F N68T N68Y T211G T211R T215S F210W F210L P176Q E200R Y162W S69E S68Y N123S T35E K32Q W210E Q207A W210G G18A V184W A162N V60R E174K V202I K43M K220E R11A A138V K207A K211I N104Q E36D H207K H207A N104T I35S L197R L197T S215E S215C K173N S173M S173K M135T Q197T E39P E39A Q197R E203K T166K K211W E218Y V142F T48Q T165I T173A T173S E169D E6A G93R I200Q D204A E200L N32R K173Q K197E V184T R43N E32I H174G E40K V184M K203V K203A K207S K39Q K139V K20I R32Q V142R V142K Q207S R32K R32T R32I E79D A36G A173E R135L Q207E K83E V184L S69P V200L K49E E122Q K73T T39P E122N D169G V189I E173Q E173N K197Q R199E N207E D69E K197N N207S C162A G211A K39I E174Q S48T E174T I200T R70K N39I N39P N39A K43N K207E T216S K211G R49E I181V I181F T11A N162G L135T H207E H207S I37F E219R D162R D162W R174Q T215E A138T E122I S200R A138G A138S S211G E123T E43K T7N K28D I200R V60Y S173E K138Q K82N K138D K138V M139P T142S T39A D204N D204K K211R M35V M35A S211R D69P R39Q K197P K197L I60R R173P R173M A139V R173K R39I R39P R39A E169H E169Y I2V S48F E197N I35V R11N T35S K207L K207R N219G I135A I135P H174K H174T K207G E6Q A28D Q151K E194K T11N Y162T Y162F L187F Y162C K6N K6T P52H T215C E204K I35A I142F K43Q K13I E203V S107P S107T L35V L35A C38F V178M V178F H197N H197P H197L H197R H197T N177Q R122P Q207L 
I142R I142K E53K N177K D169Q R201K K135L R43Q E32Q F215L F215A M16V R199K R43T V159L T139V Y162K A207P E203A D6K T7I G15R E197P E197L V142S K49T N207L N207R N207P L210S N207G Q207G I178M E40S F215Y S3R W24C S162G G18V T162F T162K T162V T69N N211S N211D N211W N211I N211G N211R R173E N68G R70Q E6S A36K E102Q Q207R A204N E102K E102H F210E F210G T173P R49T Y215N E44V K197R K197T V200A V200K Y208H G123N H175Y H174Q G70K L74I A173Q V135A V135P A173N R196Q K39P I31M V180I I178F T70Q L215N L215H D40F T70R L215I L215E I166K T70K S88G W88C L215S L215C I200L I31V S211Q I60Y T48E T173M T173K E29G T35V Q43T A162G K39A C162S I135N N162H H207L H207R N104K T35A H207G E207P Y215H D211W D211I I135R K73M K211Q E194Q M39P H64R E123D L80T K203R E218T F162A W88S R211A E197R E197T D123N N69A K28G K43T G123S D207K E6D D169K A69I A69E A69P D6N E173I E53G T173E N67E K173I A139P V215L V215A V215Y L173R L173A I167M I142S G67E L210Y S173Q E211W E211I E211G A36E S173N M16R M16I K82R Y162V I135M N147K A138E Q145K Q145E M200A M200K K102N R173Q R173N K219E K139P V135N D162T D162F D162K R166K D162C D215N T195M T195R T195K T195I V203R V203N E169G Q219D E43M V203G Y146F E40A Y146C T173N V135R G141R A173I K102M I180F K219Q E200A E218H R143S T173Q Y215I E35V E35A E44G T200A V111I L173P L173S M178L G177D Q204A N211Q Y215S T211Q Q122K Q122D E43N D6T C162N Q219N A36D C38G D207A D207E D207L H207P D207S P217Q F210S E28D A158S P55Q P55L R104K S69G E43Q N69D V21G S139V T200K N69S I132L T139P V135M L210M I135L D44V K203N L214F H208F R35M R35L R35E E169Q I109M I109V R35V R35A I109L S162H Y162A R35S I47F E203R G177E R143G L173M E211R T142M A203Q I118V N69I D123S A203D A162H S11A S11N K203G I2L K219R Q102N R207P M39A Q145C K 211A W210S H211Q E70Q N215H V167F I69E N215I I69P V167M K207P E70R E70K N215S L142F L142R L142K E70S S39D S39L Y215E V75T G134N G134T G134I E89G S48Q F115N R139V C88G K102R E122K I179M L80I S191Y D215H G18C D207R E44A E219D E219N E219H D207G R22K E203N V135L E219G R135T E200K Q219G S191A 
S191P S191T H208L D110N S176Q N147S K123T K123D K123N L173K V180F S191C K123S A7N D44G S211A Y215C F87S Q207P Q102M N64Q V179I E122D S200L T173I F215N E6K N219H L149I Q101H T162A V132L T162N T162S V179F N211A V106G V21I I200A E43T N177D F215H Q23H Y146S T211A V142M E203G A204K E102N T207K E102M T207A T207E T207L T207R T207S T207G S68G I31K T69A K13V D215I Y162S D215S D69G E169K L173E W210Y Q49E Q49T G70Q I179S Q219H C162G R99W K36E R28G D162V A28G E6N N177E N32K N32T R173I G67D V8I E40F E28G N67D I200K L74V K13N V178L T69S I142M Q204N S39Q T69D P4H W162H S39I S39P S39A L149F W162G N69E E122P L173Q D44A L173N F215I D211G V108A N69P S200A S200K W88G F215S E123N R169Q R169D R169H R169Y F162N T75A T75L F144Y L142S R169G F162S E45G L142M T69I Q122P Q91E K138T K138G K138S W210M I159L N32I E6T V135T A69G G155R A98G S98G V215N V215H D162A E211Q I184W S48E E211A I135T P4T D196Q S173I D207P D196E D196G K64R Y162N Y146D I178L D86E S3G E53D T69E F215E R219D Q151M V203Q T69P V203D I214L L4H L4T V148M N32Q R65K K219D K203D V148L F215C I74V Q102R F208L E218D D162S L173I C162H E123S R219N A190V D215E R162V R162A A196E G138E D215C A196G R162S S4H S4T V62T K219N V179M M164K E102R F210Y F210M T207P S139P V215I V215S R139P E203D G203Q Q200A Q200K Q177E I184M I184T I184L K138E H101E Y57K Y57N K40S H4T G203D N47F Q204K K219G P119T V179S F87L S207P I179A I179G Y162G D211Q E203Q D211R D211A K203Q P48Q V16R V16I N215E A157P P58T Y208F P58S N215C H211A I122P A117T P27T P27I C163S G134S P27S A117S F115Y K219H I31L D162N N54I R219H R219G P119S R196G V215E V215C A7I E179L I181C R125G Y113D I139V A123N I139P T162H T68G A123S T162G A14P K135T N69G T69G N147H V62P V62A R103H K36D S58T R196E V179A V179G Y208L N64R M75I A190Q F34IF34L Y116S E152G L77F V108I V106L V148G P59A Y188C R169K T75M I179E Y162H N54K Y181V Q142I Q142T Q142V Q142L Q142F Q142R Q142K Q142S Q142M A189I I164K N73T N73M G172R G212W G172K I106A V75A P119L I69G D162G P59Q Y181F K101G A107P A107T P35V P35A N220R E30R I10L I10V E104K 
N220E T101E R162N TR101G 101P P35S I10G F162G R45G R162G V179E Y116L Y116F V181C D162HF100I K40A K40F N22K T158S P48E A179L F162H K101N E69G K68G K125G V75L V106M K101Q R103T I179L R101N R101Q A75I K101I K101T V179L I214F R101T R101I R99G V75M V106I R162H A190C E101P Y181C R101H K101H L75M L75I H101P Y188N N65K R3G Q101E R103E Y188F T75I N166K E179D A190T M106A E30K R152G Y188H Y188S R103I V106A K103R C188N C188F C188H C188S I179D V75I V179D L100I K103H Q101P K101E A190E A179D K103T R101E K103E K103I K101P G190R R190A A190S R103S R101P R190Q R190V R190T R190E R190C R190S change in log(rf) N37T 0.3 1.2 S37Y G94C I72M E16W S37C S12P V77I V32G T72E Q61K R8V L63R D35G R43K L10V D35E A63L A63F G94R H63V H63K E34D S63V K70E R57K E16D I93M I72T T37S T19Q I19Q T19L Q92K C63A V33M K14T I72E S37N I15M R70T C95L I19L I13V K69E V13L K37Y K37S K37C I12A R69Y D67F D67V S63K S63C H69P H69L T4Q I62V V15I L90V T37Y T37C G68R A63R K69T M72T M72E I36A V33I H37V H37S R14E L64R Q63A A12E G68D P39L G48Q F53L N12Q V63A V63L E37T L19R E12R E12T H63A H63L H63F H63R H63C T91S A12R A12T C63L N69K L10S S63A S63L Q63L Q63F P19A L63T E37H R14Q L15S S63F T37N Q7A Q19R D65Q D65K V15M I64M T80K E67F E67V T12P E37V L89F R8T G40V G40E S12N K41R Q63R V63F E17G K20R I82V K14R H61N H61D L10H C37Q C37D T19R I13L A37V A37S M20I V10S D65H V3L R20Q G73A C63F C63R L10P K43E V56L K55I L63P H37Y H37NH37Q H37C E18H P19Q P19L S63R I19R W6S V77L L24S S37Q V10H M89L C67W I93F I3F E61N V63R V72S V10P T20I P39T R43E I84L N69E N69T E37Y R70K E37S E12P Q12K T96N T96I T19V Q39L K37N Y37Q I12E I12R I12T E61D Y37D A74S R70E A12P Q61N V12Q E37C D17G V12K Q61D V33LH63T T63P M64V N37Q I3N Q63T I77L A37Y A37N A37C P9L S12Q P12N I64V W67Y K20Q I33L L15I I36R I36T K69P K69L S67D S67E K14E N41Q I89S N41I L90F A63T I63V I63K I63A I63L I63R I63T I63CI63F H69R T74E Q35N Q35D Q35G Q35E I3L L6S E67R E67G E67C T26S A71M M89F C67L L19V H63P E16G K14Q I15E S63T G6W P39S L64M L10M T37Q D25N E21G S37D I66M C67Y Q39T Q39S N35E N35G V72K 
Q69Y L76I E37N T12N L89I L64V L76S D60E N12K H69Q V15E Q18I N69P N69L I89V Q58R N41T Q58H Q63P P19R T96S I12P K43I T96Y G48I T4P L15M I66F V71D L10R H18L M93L A63P I19V S67F K69R K55R A12N I36K V10M N37D W6A T96P Q18K I93L G63R G63T Q19V N69R A74P H37D S12K D67R D67G D67C S63P Q92H K69Q C63T C63P I72R Q18E S67V T36L K65E Q2H H69Y A71D E67W M89I P19V N69Q G48R M36I L89S F33L T80S R72V K37Q K37D R43I T72R I12N T12Q D65E E21V D67W E37Q E12N P12Q I36V N98I Q2I M36A I63P N63C A71G T37D G16A N98R N98S S74P N69Y K69Y V72L A37Q E63V E63KE63L E63A E63F E63R E63C V10R I66N L10T V71G I12Q T71A K43T I72V Q2P Q2T M89S A12Q S67G S67R S67C Q18H L89V K20S K70Q G6S K20A I82C R70Q L24M E67L E67Y E12Q A37D R20S S4Q N60D H67W H67L H67Y T1L S4TT4I S4L S79P M89V I12K D25E K20T R20A A12K M46V R20T W6P I36L E37D V36L T12K E18L T72V T70Q A22V G67L G67Y I72S R41Q V82C R43T E16A Q58E D63L D63F D63R D63T L15E R41I E72R E72V P12K M36R M36T G49R G51A V63T T69P T69L N63A N63L V93F T69R V93L E21D D67L D67Y R20M Q18L I72K E12K K20M K92H E21Q E72S T74R A11R A11C A11I L1P G63P A11V V10T W6C A71L A71I G6A F10M S67W W6R Q92L N7Q R41T N7K N7D N7L N7A K41Q D29G T74K K41I L10Y G48W S4P I11V Q7E M36K T72S I72L R72S R72K R72L K20I E72K V63P T74A D25Y L10I T10Y T10I N63F N63R N63T T74S M72R I54L G48M V32I M36V R20I L6A L6P L6R K41T L6C I50V T14Q D8R D8V D8T G34A G34T T14R G34K G34V G34E T14E G34D A91S T72K G6P G6C N98K G6R K45R G48L G73C D88S E92R G48A E72L F38L L54F N60E S4I F10R M46K N63P M72V G48E N98H S67L S67Y T71M M36L N88K T72L N70Q I82L L82M M72S H7E R7E M62I M62V N98D N55I N55R D63P T74P V10Y V10I D21Q T69Q H12Q M72K T69Y I37D H12P H12N S31T H12K Q92E V71L D29E E63T E63P M72L V71I I85L T71D D29A N43K I23V V46K V46L I45R N7E F24I V82L N30V T1P K92L T71G L24I F10T I85M Q8R Q8V Q8T H59Y S10H S10P S11R S11C S10M S11I S10R S10T S10Y S11V S10I F99L D29V N88H N88Y D29N K92E L76V N88T N88I E21K N88D Q92R T71L I82M I54F T71I L82D L82A L82G M46L V47IL11V I85V V20R I82D I82G D21K F10Y F10I V82M N88S V20Q I84V 
V20A V20T G73T F66N N43E NV20S 43I I23L V46I L90M L54V V82D V82G I82A N30D L62I L62V K92R G73S V20M V20I V82A H92L A7E G48V N43T M46I L46I H92R H92E I54V L54M L54S I54M I54SV82F I82F A82F L82F A82S L54A L54T V82S L82S 0.0 K70N K20T K20Q E34K E34G N88D L10R L63S A12N K20M Q63T M64I S37H R14I I64V P12K R8D L10P Q19V L93V R14T V63Q V63H V63R V63N V63D V63K V63M V63S T19R K57R C37S C37H C37N C37T T72R L64M P79T P79S I62V E34D S12A S12N P19I I36M G17E E37D M64V S67W K12Q K37N K37T KM72F 37Y R69Q Q35G H63S P39T A37H A37S Q19T Q19P Q19R D25E T20M I15V C67L R14E P12Q L63Q N12E R70Q R70K R70N I72L I20R M89L H69L E21G N69E C63H C63R I93V K69H T12P L10S A71I R43K S63Q K20V L24S W6G L10H S12E T37Y Q92H E34I D35K I13M L63T S37N M20V K37E T71M L64I L90V K43E S37T H63QV63T E34Q W6R W6S C67S A12E I72V L64V C63N C63D C63K C63M T12K S12T F99L N41T E72T N41I E72R E72I N37Y V10C G68D V33I S63T K69L K55I H37Y Q2P A12T Q63P D65E A13L A13V A13R T37E E67L N12T Q39S E16A N69R E12P E12K Y37E Y37D C37Y C37E T72I N69Y I12E K41N C63S V10L T12Q V12Q Q58K I75V T63P V12E V12T V12P V12K C67F H63T L63P A63Q L33M C95I K92H E67S T37D V19T V19P M72T M72R R8P L19V K41R S37Y V10R V11I Q19I V63P Y67L A63T V82I T70N T91A V11F P9H D25Y F53L C67W C95L C37D T19I E70Q E70K E70N H63P A12P V82S S12P I63P H69E V19R V72K G48E N37E V56L R14K L38M K37D L19T I12T I12P I12K T4I V33L G48R R43E K45N I20Q N60E I20K I20T S39P K69E V15K L19P S37E E61H E61K V10P T96P G86E L38W A12K Q58R P79A K45I E7Q I72K E67F V13R I33L I82D I82G N12P K45T A12Q E72L G16A S37D L93F Q61D I20M H69R V33M S63P Q2T N37D S74T L19R V19I D67E R72L Y69Q R41T R41I K69R D60E R20Q G48W Y67S R20K R20T E35D S39T I93F V10S N12Q S12K R12E R12T N12K K69Y Q61R T72L T72V A37N A37T H69Y A63P R20M I13L Q37E Q2D I82A N41S E72V E67W K35Q I12Q L10M D35Q T20V I20V M36K V10H I15K R8T K43N I13V K65E E61Q A37Y N98H T91S C63Q S12Q L39P L39T E12Q E35K L6P L6W E72K M36V I3F N98S H61Q D61N R41S G73C V15E T4P L10T T72K T71I A37E A37D E63Q N69Q L19I V71T I36K K41T K41I R20V 
I85M K18Q R43N K69Q Q2L I3V N98D Y67F N83S D25N Q37D H37E I36V K45R C63T D35N N83D T1P M46K Q39P Q39T M36A D29N K41SM89S C95S Y67W V46L V77L L10I Q61N N35G R72V Q14N Q14I Q14T Q14E Q14K V36A D63T D63P L6G R12Q M10F R12P R12K H69Q Q92L I82L G73T Q65E D67L I15E H61D R87K M36L H37D M89I Q58H I36A L24I K34Q K34I I13R V10M G51A I84L L38F A74P K35N E35Q K35G H18Q I33M P10H V36L P10M P10T P10I P10S L1P H61R I36L V71M N83K N7Q I37N I37T I91A K61D K61R I41S I91S M72I A71L V15L K43I L6R L6S V10T D83K L38I M72L R55I E35N I89V C63P R72K D12Q L66V T9H T10F M15L E63T D12K I14K E61D I77L E61R V32A R43I D67F D67S V82D L72K V82G I47K N30D V10I I10F F97L I15L D29A D29G M89V L89S M93I M93V H61N G34Q R10H G34D G34I R10S Q92R D67W A74S A74T V46I K21G M72V V82A R7Q R19I R16A N3F N3V L10F Q92E F66L F66V S73G I47L V71I D29E L90M H12Q K7Q D35G E63P H12T K7T K7E H12P K7D H12K L11F L11I M72K E61N D29V L89I M93F P10F V82L A82L I85V L20R I37Y I37E I37D K61N L46I Q58E K43T E35G K92L V10F R43T L62I L62V T71L I43T T82V K92R I23L K92E L89V L13R V32I V54S M46L S88H S88Y V22A I82F L66I V14K S88T S88N V48I S88D L76I V48Q L20Q L33F L20K L20T L20M L20V V71L V33F V54FR10M I54V M46I V66I E40G I33F T82I T82S I54S V48G R10T R10IF66I R10F I45R V48A I71L G48M I54F S73C V82F I47V L82F S73T V48E V48R T82D I50M H92LH92R V48W T82G 0.2 Y188LK103N 1.0 T91I Q18R T4L APV H92E V54T A82F T82A T82L V54A V54LV54M 0.8 0.0 3.0 2.5 2.0 1.5 −log(prob) 1.0 0.5 0.0 3.5 3.0 2.5 2.0 1.5 −log(prob) 1.0 0.5 0.0 3.5 0.6 Q2S 0.1 0.10 IDV I19R I13L N69T K14E L89M I93M K45T R14E P63N T4I E35K A12M A22S A22V E16W N69P N69L N61D Q18H Q63R Q63C S37I M64I H63T N83K L10C L23I K41N V10Y L76S N37K F53Y D37V I77V R8T L19I C37Y T72E T72R I10T L19R Y69R D60A L63R R69P P39T E35D S63R V15L A82I A12K I36L V72K R69L V82L R45N R45I I12K E68D D37T I93L L63C R20I T37Q E72F L63D M72K Q7E E37Y L46I I89S Q63D R8A S37N R70N S63C N69H I72L A63L A63I C37D C37V Q18L S63D H63P H63N N98I P19M L5F W6L V33F V82T V72I L5I L63H E16D T12A Y67R Y67G D37Q R70Q Q58K 
G17R S67G S67R A63R S37K I15S L38W K57I D67R D67G E67W E16A T12I H61K L15M A63C S39P I12N I12S I10V Q63H M20N I82D I82G G17A C37T I3N I36K I66L G16D P19A L19V T12M Y69K Y69E Y69T K43N I20N Q19L I13V A63D V15M S63H T37H S39T K69E T63P T63N W6A V63F M93L I10Y G16A R41S I19V L36K N12R I64V K43E V72L L63T L64I K12N K12S R69H E37D T70R T70E T70N T19F Q61H Q92K Y37T P19Q I11F C63T C63P C63N V12S Y69P Y69L E67D S37Y W67F L10S D65Q E37V E37T M72I V56L D60H R70K S63T T12K E37Q S67L R57S L63P C37Q M64V L63N L38F M89V D37H L19T P12N P12S D67L K37Y I20L E67S T72F V10F P39L H61N H61D S63P Y69H V82N L15E R41T R41I Y67L K20R K41R A63H N37Y V15E E68G E21K Q19R Q19I P79T P79S Q39S D60E K69T S S37V S37T 37D A12N S12R C37H V19T S63N K20I I19T C67F I20A I10F S67C E67R E67G K69P K69L E72M S4T S4I S37Q V62I T72M I15V H18L L89V N41R Q35G E18Q E18H D25Y Q61R Q63T A71T E61K R72F M46L R45K R45T V82A I66M A13V L33V L19F V82I I36A A12S L64V I15L V63E V63A S37H N69Q P19L P19R P19IY69Q N37D L36A N12E M46I A63T N37V Q39P D67C K37D K37V P9L R63H R63T R63P R63N L10I T70Q T70K G17D N37T Y59H Q2H I19F L10T Q58R I36V Q63P I85M W6S V12R V12E P12R I85N Q2P H69Q P79R E12Q V13R I15M A63P N37Q Q63N V63L Q2S W6P K12R K41S R20N I85V L10V Y67C T12N A63N L33F D65E A12R S39L I12R I12E L10Y R43I R8V R43S E18L P9T K92H T12S K20N E37H V82D V82G M36L I63R I63D I63H I63T I63C G17E K37Q K37T E67L L24F V19F K12E E61N D29G R69Q T96P T96S I15E V63I R43Q S67F M72L I13R G6L G6A C95W I33L N37H M36K F53S R8D K41T K41I S12E P19V P19T P19F Q39T M89I Y59D V10M S79R T1L T1P K69H W6C A82D I50V A82G L10F Q19V P12E R8G M89S Y37Q D12N D12S D12R D12E S74E Q61K N35K N35D D67F T12R I10M Q34K K43R P79L R72M E72V T72V Q92H K20L L36V R16E R16W M19L M19R M19I R16D R16A M19V K58E M19T M19F F10M P79D A12E L89I M20L E67C L89S Q19T Q61N Q65E R12Q I37Q D63T D63P D63N V93F I37Y I37D V93M I37V I37T V V93L I37H 93I S19V S19T S19F T72K I63P I63N E72K T71D K57S M36A Y67F D34T L6P G63T G63P G63N L6S L10M N98D V63R V63C R20L Q19F K35G T12E G6S G6P 
G6C Y37H E72I I11V K37H E61D Q61D T72I T39L K65E S88K V63D A71D A74K A74R Q39L M36V K43S E72L N7Q L82N L82A L82I L82D S74A L82G N7A N7K N7R N7E K43I D60N G49R G51A K20A K43Q T72L N12Q R20A I33V I33F E35G N88S D29N V12Q F53I L38I I54F E67F T82N N55K E63L E63I K34T I32A E63R E63D D29E E63C M47L D29H K92L Y59F N41S F99L R43T I54L L24M V54T D35G S79L S79D I32L E60N V63H K69Q A13R R55Q D29A R55T K12Q I12Q D12Q G67F Q92L T80S N35G L24I H7Q D29V I32G L6C H7E A71G T71G R72V R72K R72I M62L M62V V20T V20K V20R V20I V20N V20L V20A V89I M20A S74K N88K Q2I H12Q I43T F38I K7E H12E T80I T71V Q58E N41T N41I A71V Q92R A12Q S12Q T20R T20I F53V V63T C95I C95L K43T T12Q H2I R72L K92R S74R V63P V63N I98D C95R V89S N43Q Q8V Q8D Q8G N43E N43R N43I P12Q D94G N43S V71I A74P M12Q E63H E63T E63P E63N M47I R55N I84L C95S M62I V71L R55I Q34D Q34T S74P K34V R55K C95F T20N L90V K98D T20L Q2L I54M L90W T82A F66M T82I T82D L54M F97L T82G V46L T74E M54A M54V H2L T20A I82S T74A K34I V46I D34V N43T V47M A71I T74K N30D T74R Q92E M54T L54A L54V L54T K34E T71I F53L A71L S11V Q34V Q34I Q34E R92E T71L K92E D34I V76L T74P V76S I54A V47K I54V S73T F82C V47L V47I V82S F82M F82V I54T F82L F82T L90F G48Q F82N G73C L82S F82A F82I H92E F82D F82G D34EA82S N88D S88D G73S T82S L90M G48R G48T G48E G73T G48I G48M G48L G48A F82S I84V 0.4 0.05 change in log(rf) R14K 0.2 1.0 3.5 0.2 0.0 3 2 −log(prob) 1 0 3.0 2.5 2.0 1.5 −log(prob) 1.0 0.5 0.0 K65R 0.25 change in log(rf) 2 0.00 EFV K11R T215Y 0.25 change in log(rf) T165P R70Q S123R S68G D86H Y215D T165L K11N V148G K39Q K39N A173I N67H K211S M35A Y121D K174T G211M G211E G211T F171V F171I V60I T142R T142N T142K T142S G196K A138Q K174G K39L L195V E40F E207K F61L K64Q T39A A162D V148L T173E T173S K83G N175K K207D K207G T11Q R83E K20I T11A Y215S V142T V142R V142N V142K V142S Y146F Y146C G67E G207A S162T T7N L193F L173I P52L R20I D6A A162F R49T D6S R174T R174G K70T I132V C162W R206K K32N K64Y F214V R199K Q197T N207L N175Y E29K I167L D76E S48E S48P D162F P217Q D215E D215N K73N 
N57K I37N S48F S48Q G45E R172I E122A K174N K173M K13I T11K T11R T11N R78K F171L H121N Q102T D69I R72T V142L A162V A162N Q122P C162K N32I D76R S211R K166N K39P K39T E207G F214S C162A M35E S48L A162S A200Q F215L E174T H121C V184I V200Q E39Q E207D V200L V200E E39N E39L R70S T215V T200S E169R E79R E32T S105A S105T N104T D86Y H207Q Q197L H207N D186E S200Q G211A S200V S200L S200E K39A V178F M16T Q211K I200V H174L L39T L39A I167F A200L N54D L100V N174Q R172K I135K T7I P119T M35R M35I F115C I2V N67S I179S K43T L173N K6Q D169Q K6N D113E D36E K173L T142L E79Q I135A I135P R22N R174N Y215E G196E T216H E169A R135I D169G P140Q N211Q E218H N211K N211S N207E R173E Y171F Y171V Y171I Y171L R173S Q207L G15V E197P G177N A200E A122T K204E K173A L74M E86N T35S A44V A44G A44N E39P E39T K220E T173M N69E F210W V202L L197R L197H R72K A162T K122V E174G E174N Q151R Q173M Q43K Q173L Q173A Q173I I159Y E35L I142Q I35L R70K Q102M E28D C162D V180I W88S I179F A173N E203A E28K T173L Q101E E39A S173M K83E K135M H207L H207E H207K C162F N69K T173A D6Q D6N T35K N204E R101G L142M H174T E6D F87S I165S K102R W88G A211Q I179M K102N V50L S215E Q207E K32I N32A E89G R135K R135A R135P R207H I47N T215F M41V Q211S E203Y M35L T35M K65N S3G G211Q E203G T195M Y215N M184L N207K T35A E203V K32A V200R P55Q I2L N39T N39A I195M L35V Q161E D192T P122K D207A T165A E169H E169Y N67A T173I I47F F215A A36Q A36K A36G Q204G Q204K Q204N Q204A I50V I37F Q11N H121E K35E K35R K35I K35L K35V R166K N104R I35V G67D A35E A35R A35I A35L A35V E29Q V135R G177Q Q173N K197P S173L I200Q H174G D39T D39A A211K A211S K173I D123G V142M Q207K E211Q E6A E211A E211K E211S E6S D123S T200M A138D E40D Y208H R70T G211K G211S A162G Y162C V202K Y146D V184M C162V E122T R43Q M178F D44V T215L S173A K13N R125G W88C R199E E219N V178L S215N S215C S215I R122Q D123R S163C R122M R122D R122P R122K R122V I90V K126E Q219S I200L T35E K73T E197Q E197N T207Q A122Q A138G T207N I31M T207L T207E T207K T207D T207AA36D A122I A122M A122D T207G A122P T200I Q197R D169K H207D 
T35R T35I E86D E86H E86Y D44G H207G I165T I165P I165L L210M E123Q C162N D162V M35V R32Q T211Q I200E T211A Q166I H174N T128P Q102K M139A M139V M139P F210V F210L M139S Y121N A162H D69S Q219R C162S K211R A204E V159L N64H T27P G123N T142M R166N E35V K64R V200K K166Q E197T P119S K166E S163N K65R L215A L215H L215Y L215D L215E L215N T75S L215S V111L K174Q Q207G V60S T173N G15R Q207D Y121C R135M T215A A169Q I50L E102T E102M A169G G177E V202T V202I K73M C162T I106L A36E R104K E44V C38S C38F E177D K43E N207D I60S N207G T35L S200R K32T S162G F162V F162N E197L D211G D211V F162S R172T P122V M178L E122I E44G N177Q S68T E79K V8G D44N L195M I37M K166I D162N D162T Y162W Y162K N175I D162S Q122K K219Q F116I I135M T211K T211S D110N A98E W162V W162N W162T N32T M200R M200K W162S R207Q V135I S39E D179L R207N R22K G68T D121N Q197H H64K Q211R C38G G162H A69S T35V E44N E6Q E6N D121C G211R S173I E169Q E218Y V10L P4N I165A K197N E169G H174Q V215L V215A V8I R173M Y162A L210S S162H S123N S210Y A122K A122V Y121E V75T A138E T69N M16V T200V I159V A33G S48T P176Q E203Q P176T P176L K197Q I109V Y162D Y162F R43K R43T T211R Q161H R196D F210M N177E R174Q K138E E207A T27L T69H K207A G141R N54I Q102R Q102N N67G E28A E173M I39Q N64Q M16R F208L N68V I39R I39E I39D R173L I39N N64K N64Y N64R L193I N68C I39S N68S N68G I39G L210Y E203K N43T P119L T4S S207A R101I V10I F87L C38W A139V A139P A139K A139I E122Q E211R E79D P4H P4T E122M K197T D204K Q151K N162G D204G V135K V135A V135P I132L R173A N104K E174Q V60Y A158T S39D S39G E218V E123K T215H E123D Y162V Y162N W210V A200R Y162S R166Q R166E E122D S98G K197L A203Q A203K L200R L200K A203D R43E A211R V21G E173L N207A R166I E32Q D204N D204A A158S K173N I179A E179L K36D N69P S139R G134R V75S I184L P197Q T69E P197N E122P I60Y F214L K219S N175H R207L R207E R207K R207D R207G E173A E173I H64Q H64Y T215Y K32Q K219R N69A N69D D215C T27I E70Q A14Q C162G T215D R68Y E70R R68N R68V M139K M139I E70K R45E K123N R68C E70S A14P R173I T69K V60L Y215C D204E V184L G138E R101T T215S K28A 
K197R W210L Y215I V189I F215H Y162T N162H H207A G155R H208F Y175I E194G A123N A179G I69S N69I Q122V S39N Q219G E218T E102K A169K E102R E102N L74V T215E D215I Q219P T200Q Q204E P197T E169K P4S T200L E218R N211R I60L T215N S69G F40D S200K I142T F215Y Q151M A33P F162T Q43T D211M D211E D211T D211A C162H D121E T27S D194K S173N V135M A200K T200E I142N I142R I142K I142S K82Q A98S H208L M211Q R143S H219E A43T A43E E53Q N67E M211A M211K F115Y I179G R173N F215D R196Q S163G F210S G177D E200R W162H I200R K142M C121E W162G E123G F116Y I142L K197H S39Q N68T S39L S39P E123S A33V A69G S163R I178F V21I S163T E123R I159L N177D A55Q V179I E203D F215S A139R R172G Y64R K219G K203D T69P V108A K219P Q138G R169Q Q138E F160L R169H R169Y I34L E48T L94I R 169K T69A R169G F215E K49E R172S L77F Q207A E194Q T69D K172G K172S E53G R219P R219G K49Q I109M I109L M135L M135T I167M F210Y N67D I142M E200K V179S T69I E122K T215C H67A Y57H Y57N V196E Y57K H67E H67D I39L P62A P59A H67S H67G F215N T215I N32Q I31L V62P I178L Y175H T75A L215I T75M L215C P197L N43E Q219E R101H V179M V179F R135L R135T E197R N192T Q91E D69G M145Q Q200K R213V P131T V31L S163I I214F M145C I214V R213G I214S I122K T122K G28A I122V T122V D162G K49R V215H N69S I200K E173N R196A R196K D211Q Y162G R196G Q43E F162G I135L E218D I118V I135T Y162H E197H H219NK219E V75A K49T T166Q T166E T166I T69S E122V Q190A Q190E N215I D28A K36E P59Q N215C A117S G134C G134N G134T A98G F215C W210M F215I P197R P197H K101N R207A D162H R101Q I106M M139R E70T R196E R68S R68G I179E K82T S191Y T200R C134S V135L S191C V135T E49R E49T N134I P48T R66K R99G N134S S39T S39A Q219N V179A I179D E194K R101E I74V V75M V111I Q49R Q49T K101R H64R D123N R103H R103E I39P I39T I39A R219E F162H D211K D211S W210S V139K V139I L139P L139K L139I H215I L139R D219N V215Y V215D V215E V215N G148L H215C P69G M211S G143S V215S C163I M211R K219N N69G K101G T69G V179G S58T K82N T200K W210Y L4S L100I T75L T75I Q194K Y208F Q46K Y181S E101P V62T T139Q T101Q V203Q H182Q T101H T101E V203K I181F 
[Figure: per-drug scatter plots in which each point is labeled with an amino acid substitution. Recoverable panel titles: SQV, NVP, ABC, TDF, 3TC, d4T, ddC, ddI, ZDV, EFV, NFV, LPV, ATV. Axis labels: change in log(rf) and −log(prob).]

Figure 6.6: Scatter plots between mutation scores of g2praw (x-axis) and g2ptransition (y-axis) for different drugs. Mutations from the wild type amino acid to a mutation listed by the IAS list (Johnson et al., 2008) for the corresponding drug class are colored red.
L66I V22A S88N E40GV48GV66IF66I L38M Q18K K70Q G94D I15M R14V K14V I66N K43E Q18A T31P R41I E21G L5I V15M K43I N37Y G48Q D25Y I72S S37H P1L L76I L24S K57S D29V K55I L93F T96P T80I N41I K37I L5R M93F K41I I66M Q19M V10C E58R T37Y D65Q T91I M72S R41T T12Q H69T I93V L2S N88T G48W P19M R8L I85M E21Q E17R D29E R43E R70Q T72K I36A L15M I47L G16R V32A Q18R A37I E21K S37I I36K T37I T70Q T72S S39Q N37I E37I V10P S37Y N41T Y69T M36K E70Q R43I D67W T19M N83K T31S G48M H2S K41T R69T P39T M93V W6R T80S S37E D25N D67L L5F L10P M36A A37Y E72S I3F L10C E37Y T74K V32L D25E H37I K69T V62M D35N W6C K41N R92H L72S L10T K37C N69T H2P H18L A12Q T31A L63S E18L K37Q T8L A63Q L38I E67W L93V A37Q L6R E67L M72E M72K H37Y H2L I62M S67L S37Q S12Q A74K Y67L N85M F66N F66M Q92H P19T A37C Q18L E37Q I19V I72R G48E S37C G73C S39T N37Q L6C E37C S67W I66L Y67W I82D N37C E92H Q61D P19Q T37C G51A K12Q E35N T37Q S74P E67D Q39T A63C K72S H37Q E34D H37C M72R K55N R12Q Q63C I12Q T91A K92H Y67E N12P Q35N S67D C63T I10R V82DI54F N83D I3V L54F I37Q A82M Y67D I32L G48A T72R E60N T12P D60N I82G N83S L19V I37C P79A D67F V64M M72T I12K L63T Q61N L64M E72R I64M V10R L24F S88T V82G L10R A12P M46V P10R Q18H K20V G16A L72R I10Y A63S Q92K S63T L19I S12P V10Y G73A L10Y Y69Q H61E E67F I82L I82M H69Q P10Y P19V R69Q Q12P E34Q T19V I36V A71L M36V K69Q F66L Q63S S67F I63T I20M Y67F K12P H63T V71L Q19V A63L N69QT71L I12S I20T A63T G73T T37N I82C R12P T74P I72V Q63L M72V I13V A71T V15I R20T A16E I85V I12P V82I R20M V82LV82M A74P Q63T T20M S73A M89V N88D P19I I20R D61E F53L N61E I71L F66V E35D C37D K72R V71I L64I L15V T69Q I36L T19I L10V K20M S37N M36L L89V A71I V82C Q61E G16E T72V E37N V36L T36L Q19I R72V T74S K20T A37N G73S Q35D S73T Y37D R69H L76V T71I A74S I82T E72V K74P L72V M36I L54S I89V M93L K37D E68G A12T V20M A74T Y69H K20R E37D S37D I54S S12T C73A L24I L54T K69H R61E H37D I62V M93I E72I N85V M72I I82S I50V T10F A37D I47V T37D N37D T72I P19L V20T V33L V33F L33F Q37D N69H L54A R70K G48V K74S I10F V82TI54L L93I L15I 
L63P S88D V63P I37D I33F V10F L54M N41R L10F I82F A82S K41R F24I P10F T19L T70K C73T A71V N63P I77V K12T T88D S63P E17G D67C L6W E70K C63P K57R K72V L72I S39P Q39P T63P Q19L E7Q M46L T71V R12T D65E A82F I12T L2Q L90M I63P V20RH63P E63P Q63P E67C E58Q Y67C A63P K98N I84V C73SI98NN7Q S67C I82A R63P N30D R7Q A7Q H2Q V82A M46I L46I V46I V22A L54V 6 7 8 APV 6 0.05 change in log(rf) Q2S 4 0.00 4 7 6 5 4 −log(prob) 3 2 G48V 0.25 change in log(rf) T4L G94C E16W T91I V32G R8V Q18R A63F G94R D67V H63K T4Q H69P H37V L64R L63R H63F Q63F P19A L15S Q7A I15M S63F D65K K69E T80K E67V S37Y E37V V63F S63K L90V G68R A37V D65H C63F V56L K55I P39L L24S V72S I84L T96I H69L Q61K E16D P9L I36R I36T K69P N41Q L90F I63F V15M E67R L89F G40V K14T I15E V3L R20Q A63R L76S V15E N69P Q58R S37C Q58H T96Y G48I W6S G48Q V71D V33M N69E L19R T96N Q39L G40E R14E D67R H63R S67V A71D K20Q L10P I36A E21V Q19R N41I Q2I I63K A71G N98R N98S E63F I3L V71G I93F L6S E67G L10S Q2T M89F R8T S67R Q63R K20S V10P K20A D25N I66M T19R S4Q R20S S4L L76I R20A C63R C95L I72M I3N I19R S63R R41Q I72S G68D D63F L15M L15E K69L I3F M36R M36T V63R T69P D65Q V10S W6A E12R E72S T72E A11R A11C T26S R14Q D67G A12R K69T N7D N7A E21G K41Q D29G G48R T72S R72S Q18I N69L I89S N41T D25Y N63F I63R T96S V77L I66N D8V A71M S67G G6S T4I I36K T96P V13L L54F KE63K 14E N98H K43E W6P N88K L10H P19R M72S I77L T80S N55I L10M R41I G49R K70E R43E V10H K37Y G63R Q2P I12R E34D Q92H G6A T71D T37Y N7L Q2H K41I N30V G48W L89S T71G H63V K14Q V10M K37C Q8V S11R S11C E63R D29V T37C N88H N88Y N69T L6A L6P D29N M89S S63V G6P T69L V93F E21D Q18K T74R S4I M46K D25E R70T R41T I72E I54F N12Q I13L C67L N98D D63R D35G M36AM36K T1L C67W D29E K92H K41T C37Q V20A V20S S12P D29A V46K P39T I12A H37Q M72E S63C S37Q T74E H37Y I85M N63R L10T F10M D8T K43I Y37Q K70Q R70Q E37H E37Y H37C N37Q A12E Q92L I23V T71M N7K T70Q Q39T L82D A37Y N98I T14E H63C R69Y L24M G48L I93M E67L Q18E H67L S10P R43I V12Q E37C T37Q V10T A37C S12Q G34T V20Q G67L H18L W6R F66N T19Q I19Q N88IV82D 
I82D E37T E67W T14Q D67L R70E C63A Q8T D67W L6R H61D G6R E21Q S12N G34V S10M N98K H67W K37Q N70Q E37Q W6C I64M K92L A37Q I12E D67F Q63A E18L E61D S67L S67D G34A T74K G51A Q61D I63C Q92K V72K I63V T12Q A63L V63A L6C P12Q T4P T91S S67W Q18L L10R F10T H63A I12Q A12Q G73A I72T S10T P12N G48E S63A E12Q P39S Q12K M46V V33I Q39S D21Q Q92E S10H T37S I85L V12K G6C L82G S67E V10R T12N Q35N L64M L10V E67F P19Q L82M Q69Y V82G I82G K37S N63C E18H A12N H92L K92E N43E L10Y W67Y M72T E63C V77I N12K Q7E I72R E63V I82L H61N T10Y H37S T72R Q35G G48A A71L L63T I12N T19V E12N C67Y I72K S12K H69Y N35G S4P I63A E61N I82C T12P R57K H69R V63L H7E R7E R72K N69Y K69Y E72K H12Q G34K V82L Q61N F53L N43I H69Q V10Y H63L N88T E21K C63L G34D L19V E72R V82C S63L Q63L T72K F10R I12K N7E A12K I36V T12K D21K A37S V82M I82M E12P K69Q V71L K69R H63T L89I S10Y N69Q A12P P12K I19V S67F Q63T G48M I66F N69R M72K E12K M72R E67Y Q19V A63T H67Y I63T E37S A74P E63A P19V F99L G67Y S63T S10R M89I A22V Q18H H92E D67Y F10Y S74P T71L I12P K14R G73C H12N I63L N63A T36L G63T G16A A74S T69R I89V C63T M20I T69Y M36V N69K E92R H12K D88S D35E T74A A91S I13V S67Y A11I T20I A7E E16A R20M K20R K20M I36L V36L V72L R43K I50V D63T L89V T69Q V63T M89V I62V E63L K45R K55R S37N C37D I54L M64V K20T R20T T74P A71I I64V Q92R N63T T19L I72L D63L R72L D60E N63L R72V Y37D K43T M36L I45R K92R L64V I19L N88S S11I V15I I72V E72L R43T T72L H12P E63T T72V I82V S37D Q58E V71I T37N E72V M72L T74S Q35D E12T V32I A12T N37D L54S H37D H37N L76V G73T I54S K41R T14R K20I K37D H92R T71I T37D K37N R20I N88D V20M M72V N55R L63P A37N P19L E17G A37D M93L E37D F24I M89L I85V I93L V33L Q35E N60E L24I T63P T71A I12T L54T L15I I33L E37N N35E R70K M36I V20T A82S L82SI54M D17G H63P V46L L54M E67C E16G Q63P G6W A63P V93L V82FL54AI82FV82S A82F L82F S63P G48V C63P F33L I37D M46L D67C I63P K65E N43T T10I L10I V20R D65E V20I N60D S67C S4T L1P S79P G73S M62V G63P V10I M62I V63P A11V I84V N7Q L82A S10I N63P G34E D8R V46I F38L I11V E63P L54VV82A I82A T1P F10ID63P N43K 
S31T L62V L90M M46II54V S11V Q8R H59Y L62I L46I V47IL11V I23L N30D change in log(rf) −log(prob) 1.5 IDV 1 4 3 0 1 2 −log(prob) 5 6 SQV 2 2 6 1.0 T165P S123R D86H I202T F171V V148G T142N F61L V148L K174T K83G N175K R83E K20I T11A R70Q V142N K39Q F171I T7N R20I D6S K11N D76E T142R T142S S48Q S48F K174G K13I Q102T R72T C162K D76R E79R N104T D86Y Y146F V142R V142S Y146C G211M H174L N54D L195V P119T T7I F115C P52L R49T E79Q I135P R174T R174G F214V Y171V K64Q G15V E29K I167L L74M E86N T35S A44N P217Q S48P I37N R172I V202L Q151R I159Y E28D R78K K83E F87S W88G A162V L193F E89G R135P F214S E203Y E174T P55Q Q161E D192T E169H E169Y E32T I37F G45E K173M M35A E6S D113E A162F F171L I135A H121N K11R V202K P140Q K13N R199E D123R Q197T R122M R122D K126E Y171I K73T E197N A122T A122M A122D E86H E86Y I165P K39P A44V D162F K220E S48L K73N R72K E174G E39Q D6A L215H V60S E102T T11N C38S C38F N207L I60S R172T H174T D44N Y162K N175I F116I K6Q D169Q N32A T11Q R135A M41V C38G N67H T216H E44N S3G R199K D169G T195M E218Y V10L K197N Y171L K32A I195M E39P A36G P176L T173M N57K Q161H H121E T27L T69H M16R N68V N68C Q173M H174G I167F P119L C38W K39N Q151K E122M V135P S173M D215N T215H D6Q R206K Y146D E122D C162V E122T D44V E169R V50L R125G D186E P197N A44G D162V M16T T128P R68Y R68V R68C K43T F215H P119S Y175I E194G I2L V111L T165A I132V E218T A36Q N32I Q219P E218R E44V C162F Q207L G211E G177Q K32T F162V A200Q T165L K82Q I142N E53Q L195M V200Q I179S A98E K6N W162V N32T E123R S200Q A55Q E218H P4N V108A K219P L100V R169H R169Y I31M A122I Y121E H207L R172S E123Q K172S I47F R219P Q204G Q204A I167M M139P Q11N I165S Y121N I179M E29Q P59A E122A G15R A173I A169Q Q91E I50L N192T A169G V135A R213V S163I T142K V60Y E218V M184L V215H D6N Y162V E122I R101G N177Q R70S E79K L197R I37M V8G F215A D110N P59Q G134T H121C D44G V200L I60Y R22N D121N E6Q A14Q V60L K82T E169Q S191Y S105A I165A E6A C162W E169G K166N R173M S191C E197T A138D S200L K32I A200L I47N G141R I60L I39Q N64Q E173M Q219S I142Q T207L D121E Q102M E44G 
K39L Y215N I142R I142S R143S G148L C163I V10I F87L A139P G196K S162T C121E A36K K65N L215A V142K Y181S T215A V21G I200Q Y215D S105T F160L G134R N69K E53G H64Q V178F D211V Y162F T27I R45E F210V F215L Y57H L193I I2V V62P K166E S215N K207G N67A Q197R N67S K197T D204G I214V I214S V215A T35A E40F K174N A33P I179F K64Y S210Y S68T T27P K49T S163C Q204N L215N R43T D204A S 163G S39Q K32N Y215S F208L S39P K73M G68T A33V L210Y A162G S163R E6N N43T K70T E203G V142L K122V R169Q E49T N134I T200M R169G A138Q P176T D215E I109M P197T Q49T E39N Q197L R103E I39R L173I D211M I184L K219S R207L R101I G143S G211T E102M I200L I159V V184L W210V S162G R166E S163N G155R N54I R32Q Q219G Q166I T139Q R174N N69P G134C G134I K173L T142L S200V E207G Q43T M35E V200R E70Q I200V Y121C H208L A43T I39G N69E R103I D179L E39L M139S V106G I179A G67E M178F K166I A162T R196D E174N R166N E197P R172G T200S S39G K219G I39P K211S Q173L N175Y R101T K197R K102N K172G L139P R219G R122V D121C T173L H207N F210Y N162G L197H N68T K82N T200Q V202T Y215E M139V T215L T215N I109V V179M E203A D69I V62T T69P I106L E169A T139E T11R C162G V179S Y57K D204N E179L P122V E207K V159L S200R E70S T69K T173E R166I E32Q T35K P176Q K49Q W88S V8I A162N L210M K103I S173L M200R K32Q E203V S48E F215N P4H S215E E203Q T166E T173S W162G A122V G134N K197P E197R A139I Y162W Q173I A33G A179G I135K T139P Y188N S163T K207D V215L T200L H174N W210Y I39D A139V A203Q P197R I179G K35E G211A A158T T215V Y188S H67A A35E M139I E197L N186E E79D M35R T75S R122Q C162T Y208L S39D L173N S68G A200R K101G A122Q L200R V179A D162G Y162G D162T F162G M16V R70T A138G R196Q T173I W162T F210M A162D D169K N69A T35E M50L V215N R190T T207G K143S N32Q Q122V H207G E194Q N175H K103E R196A P4T I132L N211Q Q102N K173I L210S D123G T207N K101I N166E R135K M145C L215E Q207G A173N K166Q Q197H I181F I178F F193I I39N T166I R173L N207G K197L E200R I200R I165L V179F L215S T200V V184I R207H K101T Q204K E123K K68T R190C L142M D194K R68T T69A R172K E207D A211Q N64Y H67S T69E E173L R207N 
I159L S69G V75S E102N K35R E122Q A35R R173E G211Q Y162T I122V T122V Q173N V200E V179G S39N D211E I179L G177N Y175H A69G V142T M139A E211A E122V G207A R173S W88C A162H F162T S173I S200E C188F H64Y V142M R103H R166Q F210S N104R G190T E211Q T35R C162N I142K A36D V139I L139I I142L T200R R207G T211A N64H D69G H207K K49E A200E E28K T173N A169K P197L K197H V135R N211S E169K G162H T215E Y181F S215C T211Q E35L F162N I35L V179L T142M R68N K102R W210M D162N V203Q S162H T215S D211G T139S W162N H208F E173I E194K G190C E6D N207K R173I I39S K135M Q138G S39L V135K Q122P E40D R169K D204K M35L N166I F215E Q207K R190V A179L Q194K C162D P69G S215I E197H N69G T69G Q211S T75A T4S F215S K173A I39L E123G P197H K35L T139IY188C N69I L215D A35L Y162N K36D T207K T139V Q173A N162H W210S V75A I50V E28A T39A T207D C162A H207D K173N S39E T166Q R196K G194K G190R T173A A211S C162H E211S K101N V200K E70T D69S G211S W162H Q207D V75T S139R P4S I200E I39E N207D V106L K64R D211A T69I N67E S173N D215C M200K M211A V215E R173N K28A I31L Q190E Y215C T35L R135M T35M A139K V31L A69S Y188FG190V M211Q T27S K142M V215S I135M V60I D123S Q102R H67E T211S Y162H S173A Y208F T215D D207A A139R N166Q T75L R103T M139K I106M D162H I142M D211T K103H N64R L200K E173N Y215I R207K M35I D215I E219N F40D T215C R43Q R207D Y188H F215D L215C K103T F162H I69G D211Q V189I G28A T207A V75L R122P Y162D S48T A122P E102R R101H Q219R I69S S200K I179E D28A K39A R190Q A200K M139R K65R N215C V21I F215C L39A V111I A158S M75L Q101E L4S T215I L215I T200E T200I A190E T139A V135M L139R R173A E200K H215C Y64R G123N N215I F215I K35I Q200K E173A E39A E207A K207A A35I I200K R28A V179E N67G K82R S207A N69S T101H T69S V215D V139K L139K Y162A N207A H215I T35I E48T I181V N39A K219R V215C S123N V178L E122P Y162C A203D G190Q Y181IY181V E123S H207A I179D F116Y D39A K101H T215F K101R K43E T200K R190E I142T E203K H64R Y121D N207E V106M T69N Q151M T139R V139R M178L D211S A203K G196E M211S A179E K123N E101P P48T Q207A T75M H207E V215I A123N E203D T139K Q207E 
E53D I211S K203D H67G S98G V179D H219N R43E Q101P N68G F210W G190E R207A S211R V75M Q219N M135L D219N K53D K219N R101Q A43E T207E R135L V135L N69D T75I H219E T11K A179D Q211K I135L T101P K219Q R135I N211K L74V N43E M50V D123N R101P T101Q Q219E A98S M106A Q43E I178L G177E A98G K39T R219N T69D K101P A162S K219E V75I L100I H207Q V203K K135L V179I K101Q L39T D36E R101E I106A N177E R68G E123N R219E V203D L35V R207E H101P Q190S K35V I35V N174Q V106I A35V E39T A211K E211K F214L L215Y R70K M75I G211K S39A P122K K49R T101E V184M K103S I74V I39A Q43K R103S K211R E70R G67D E190S H101Q Y171F M35V K204E A190S V108I K103R V106A E35V E49R R122K Q49R N39T V180I Q211R T211K V135I Q190A F100I N204E G211R K101E D39T T35V V196E T211R R166K T207Q Q122K F210L T215Y E211R A211R Y208H A122K E123D H101E C162S A36E R190S R196E F215Y R22K E177D I90V G190S E197Q Q102K F162S R207Q I165T K174Q E86D D162S N211R W162S V202I A204E M211K M135T R104K R135T V135T R43K H174Q H64K A14P I135T I214L R174Q A138E Y162S N68S K197Q K138E E70K W210L N64K E174Q N104K E122K V215Y N67D H67D P197Q G177D I122K T122K G138E C188L N177D D204E E102K I34L M211R D211K Q204E K135T R190A F115Y Q138E I214F K36E I118V R196G L77F L94I I109L P62A D211R Y57N S39T E218D M145Q P131T R213G I39T G190A R68S A117S C134S R66K N134S R99G I211R I181C Y181C S58T Q46K V181C V109L H182Q A107T G134S V62A R103N 0.0 0.20 D121N I142Q E79K R199G H121N I180M D113E P133L T58I E207T F124L A33V K66N N147H D79K S162G C162N G45E Q207T Y121N D186N A162G R207T K207T S48T D207T E39S G207T G99R N207T R206K D186E H207T R45E V21I C162G R66N E215N V106M K22R V202T H162G T211N I202T Y181V D36A A39S R173L M210S M39S W162G K211N K39S Q101R T107S I106M L142M N162G K101E R162G A173L E211N T58N K219G K43N C88S S173L E173L N69I I135V W88S T173L V179D S215N N175Y K70T I173L L39S I215N E219G A139R M35I H174R Q219G C215V T39S Q174R D123G N123G T139R Q211N S162D I142M C181I N173L S211N D67ST215N D40F S68G K139R D215N L75A T142M V75A E36A A106M S58N G123S H175Y C215D 
T195L A211N A162D A123G E40F C215N K174R Q173L M184L G211N V111I T4S R211N E123G K104R K135L K122P S162Y I195L P4S L210S D44A T135L I181V C162A S200E P39S K101R K123G A142M R200E M142V A162Y K103S T69I G67S N104R C181V Q142M R103S V200E I200E V202I R135L V75M K40F F210S E70T M135L C162D E207A A200E T215C N103S T200E K142M K200E D162Y E48T T215V H162D E203D N69S G211R Y115F L35I A190S V135L M200E D69I D207A Q122P L142V C162Y D123S G207A K207A E203K G44A R207A V122P W162D N123S S207A T166R E44A K219N Q207A D67G T215D H162Y E101R N162D Y188L I135L R162D A123S L200E G190S E138A W162Y T69S D218E K44A I35V N162A T35I E123S A35I F77L N207A H207A K166R K123S N162Y K20R E190S T211K F116Y R35I D69S I166R T69D I142V L75I T142V R162Y M35V V75I K122E K49R I8V V60I L215F R43E K65R T11K N188L L35V M184I K43E K219R Q151M N166R A62V S211K V179I Q43E Q211K N43E K35V A179I K204E E179I I165T Q219R E219R K70R I215F Q122E R173K A142V D179I G211K V122E Q142V G177D R211K K142V A211K D204E S27T A35V E215F L74V K83R S215F E177D T35V S173K T173K A173K E173K C134S R35V I173K D215F T215F Q204E Y208H C215F N173K N177D E48S V50I E70R C145Q K172R Q46K D67N Q173K L215Y L210W M41L M210W H182Q F210W I108VS215Y I100LG67N F215Y G134S D215Y C215Y T215Y I215Y E215Y I184V change in log(rf) T4I A22S E16W N69P K45T K14E I19R L76S R14E D60A R69P R45N N69L Q63R D37V A63I N69T Y67R Y67G Q58K S67G S67R E35K I15S K57I D67R D67G I13L R8T S37I N83K M20N I82D G17A I20NP19A A12M L63R S63R V63F L10C C37V R69L T19F Y69P L5I V56L D60H R8A L19R P39L L15E V15E I20A E67R K69P E67G S4I I89S D25Y W6A L19F R70N G17R A63R E72F L63D P9L L15M I10T Y59H I19F Q58R Q63D V13R I3N E37V P63N Q2S R20N I36K Y69E S39L R57S R8V R43S E68D S63D K20N V15M V82D K69E V19F D29G I15E V63I V82N R41T I13R F53S L36K E16D P19F S37V Y59D A82D R8G P19M Y69L R45T I66M R16W M19F R41S S19F N37V T71D G16D K37V N98D T70N P39T Q19F R70Q R45I A63D I85M W6S T39L Q2P A71D Q39L I82G W6P K69L K43S L82D N7A N7R N98I K20A K43N R20A D29N T96P I54F E63I R41I M47L 
L10S D29H G6A C95W R8D P79T K41T A13R D29A V10Y I11F T12M C37Y I85N T72F I15M D65Q D29V I32G K41N L5F V20N V20A Q19R M20A N37K V82L Q2I I37V T80I L38F I63R Q63C K57S K43E Q18L L6P L6S F53V G6S G6P T37Q M36K R72F H2I H61K P79R N61D Q7E L82N I98D K41S P19R I93M G49R N12R E67W T72E Q8V Q8G N43S P79L F53Y Q2H E37Y Y69T I36A T96S T82N D29E I84L V93F L36A K41I S79R I32L S39T S67L V15L V63R P9T R55I M89S L38W I63D S88K D67L H63N A71G T71G D37Q Y67L T20N K98D T70Q T72R E21K V72K S12R L89S K92H T1L A22V W6L V82G N41T D34T T82D I32A E63R L10T C95I A74R T20A M72K M19R R16D V63E S79L A82G K69T N35K L63C Q8D V47M M62L Q92H V10M Q18H S37K V63D V12R P12R K34T N88K E61K N41S C95SV47K L89M T63N I10M K12R N7K Q61R M36A A12R I12R R55T S63C F10M L23I H18L T80S V76S E67L A12K E63D I20L C63N E37Q L82G I66L C37Q K92L L10M I10Y S37Y I12K P79S V47L N41I T12A D12R Q92L L90W D37T L63N F66M T12R W6C Y59F K37Y S37Q T70E S74R T37H N37Y K34I C95R A63C V89S S63N E18L A82I Q39T Q61K L90V R43I Y69R F82D L63H H63T G17D E67D G6C Q34I T12I S74E D34I R55Q E67S N37Q R63N Q34T Q63H D37H I12N Q63N L24M F53I G48I K37Q A63N C37H L38I S63H C37T A74K F82N Y37Q I15L K20L L6C E12Q M20L L10Y S37H Q39S G6L K43I N43E G51A R43Q K12N I36L C95L T74RG48Q I37Q R20L F38I H61D P79D Y37T D63N I37Y T82G Q34K I63N T12K E37T Q61H H2L G63N A63H E37H N7EQ2L P12N T70R K34V L33V S74K N37H E72M T72M A12N L19T R12Q R14K S37T G17E N12E L19V L90F I63C K43Q V19T I19T V20L I72L R63H F99L H7E E16A V12E I37H L19I P19Q D34V S79D L24F I19V N12Q N43I T72K E72K K7E Y37H I12E K37H V12Q Q34V K12E N37T I63H W67F T12N D60N I12S S12E K12Q Q35G I12Q D12Q G16A A63L P12E V63C V82T D12E G48R G48L V63N V82I K37T I36V V63A H12Q T74E E18H L63T A12E F82G E60N D12N C67F N69Q R72M A12Q S12Q C63T E63N E61D P19T Q92K Q61D R55N R20I K12S T12Q H61N N43Q T12E E63C Y69Q T20L R72K H69Q V12S S63T I33V V72L T74K K43R P12Q I37T M19T Q19T M12Q S19T N88S P12S R69Q G48T L36V Q92E Q63T M89V V71L V63H R92E F82M K92E V93M S67F Q34D A63T P19V R63T A12S M36V Q19V I50V H12E 
K35G D67F M89I I10V E61N L89V L82I M19V V54T S19V I63T M36L Y67F E35G L89I E35D V33F F82L T12S G48E D35G H92E Q92R R16A Q61N E63H C37D N35G S74A K69Q I85V M64I M72L T71LA71L K92R D12S L46I E67F R16E Q19I I93L D63T N43R Y69K G63T V10F V63L G67F I77V I64V I10F K20I P19I V89I I54L M64V E72L A71T M46L D60E G48A T72L K20R E37D S37N A74P L10V I13V F82C T82I L64V V63T M54T I82S L54T E72V T72V M19I N69H M93L M54A S74P L10F C95F S37D K58E E63L I15V R72L E63T V20T H63P V72I I54M L54A N37D L33F L54M V71I K37D G73C I54T R43T G48M V82A I54A V82S L24I A13V M46I R72V T74A L82S I43T T74P F82I Q19L T63P S39P K43T R69H Q58E V20I L64I V62I C63P L10I S73T I37D M72I I33F T20I A71I T82S L63P T71IA82S Y69H F82S K41R S63P N35D V20R R70K N41R T20R V93L L82A S67C E68G S4T P19LF82T F53L R63P E18Q Q63P N43T R45K G73T A63P D67C Q39P T70K V46L I33L T1P Y67C K69H V93I D65E M19L D63P M62V I63P G63P E72I T71V T72I A71V E67C Q65E N88D V20K K65E N7Q T82A N55K R72II11VS88D M54V V46I M62I H7Q V63P L54V D94G E63P I54V G73S M47I F82A R55K I84V F97L F82V N30D K34E S11V Q34E V76L V47I L90M D34E 0.4 4 5 0.5 change in log(rf) 0.2 3 M184I 1 6 5 4 −log(prob) 3 2 1 6 5 4 3 −log(prob) 2 1 7 0.15 T215Y 0.15 V180M NVP K65R M184V 0.10 change in log(rf) F124L V10I I94M K64E A28N E53Q G177Y D169N I31K P1T K35G K73M K197T N162I K39I R211M M16I K11S M35G A35G E177Y D194V D162I Q91H P95Q K154R R64Q K20L N81K L35G G190V D53Q A162I W212L R172S R103T T27P E39I V202L K64Q K28Q I180F D207L K194V C162I S162I T84I F87L G99R K44Y F116S Y121N K197H K219T R20L N103T K101G E86N V184T L47V M39I W210Y V16I V21F T35G A39I A28Q N207L T35S E207T E101G R101N N68V D194A Q211V A114P E154R H219T P1L V180F I5V T211V T58I D185E G190R S211V S58I K22N G141R S39I K219G K103T Q207L G207L T35Q G211V V118F Q23H R49Q D169R K39N D86N A207L K219N A207G G190C R207L K211V G68V H207L R206K C121N K194A S27P D44Y S190C N57K V179L K169R P122Q K219D K207L A36D W88R H101G P39I R101G T39I T69P N39L V75M S207T S200V D207T R211V T139I A139I K70N T39M L205F 
R190C E39N I202T D179G K207G E32R K49Q N207T G207T I35R N175H H211V A173E V106M I132V E39L A62P S68V Q207T S3C A179G K32R A207T I179E T200V S68T R207T Y144F V179G K35R H207T N70Q M139I M39N K219H A39N I200V G138K K207T L210Y W210M Q211E N43T C215N I2V W210F N101G D123K I39L D215E K122Q E200V E70Q R43T M39L A162N R211G I135M R39L Q211D K70Q S211E K39L K22R S39N T211E M35R K200V Q43T V202T I94L R11T S215E S215N V215N G196K M210S N68K A39L E215N K174E S211D Q211A T58N G211E E123K E200K T211D E49Q K211E K207A T69A N173E S58N K43T S39L F210S D123G N64Y E177G T211A E196K A35R T139R G211D I106M Q190E K197E A139R G68K E67S S211A T215N N69A P4S E36D V179D E39K E36A N123K G68N A69G K211D R172K A138K E138K N32R P39N D215N R174E G190E R64Y G211A T39N Q200V R173E P69G T69G K101P L210M G190S T173E C88S R211E K64Y L35R T69E T139K D179E A139K M135R K211A V111I E138A T39L E123G S190E N69G R211D I173E N69E D69A S68K T135V Y121H S48T A179E S215D A39K H211E T32R N123G P197E S173E L4S M39K P39L M139R E101P R211A G123S T215E R190E V215E R101P A106M V179E T69IT35R H211D Y115F D67S D69G I135R K101Q T215V H174E R174K T4S S39K D69E N67S E203D N69I S11T K11T M139K G177N K36D L197E N104R W88S L210S W210S H211A C162S H174K K36A K104R E177N W88C T215S C121H Q43R H101P V75T E48T E101Q S68N V21I L135M E40D A162C V215S L210F A69S D69I Q151M H197E I195L D40F L135R G67S K43R G162Y R101Q T39K P39K Q203D D123S D203K K40F C215I T69S K135V P69S N101P T215D F77L V178L V215D E123S V215I K219R N69S A162Y Q203K R135V S215I M75T K166R H101Q D218E K135T T162Y D69N T69N T215I E40F V90I M135V E203K N123S E215I F116Y Q43N D179I N123E C162Y S162Y H219R D69S M178L D207E E174Q D215I P122E T69D A98G I135V T75I K43N A40F V200A N101Q D162Y A179I S98G N162Y A173K K43Q E173K I166R V60I V179I R219E H121D A203K I178L L75I V75I N207E R211K L135V D204E K219E G207E S200A A162S N219E K6E R135T E200A G190A T200A Q207E I200A N43E K28E A207E I184M P62V M75I R207E S190A L215F H207E M135T D6E Q190A K122E E197Q N173K A62V Q102K 
K35V L173K R190A K200A K207E Q200A R43E N174Q K83R A105S S27T I135T I35V I215F I8V H219E T70R V118I T173K L135I R173K Y121D E102K N219Q K204E Q43E K43E E219Q I106V N204E C215F R219Q A6E S158A I173K K219Q V50I K197Q L47I M35V Q70R D194E N177D L135T K174Q K194E N215F R102K A28E S215F N113D I165T S173K T215F R35V G67D D169E D79E A35V K169E R174Q E215F R82K L35V L210W N70R D215F D67N G70R E70R C121D L188Y D28E K70R V215F T35V H219Q A204E G177D P197Q E177D H197Q K42E V184M C181Y F188Y A106V I181Y S137N H188Y M41L L184M H174Q A165T Q217P G134S T158A L197Q K44E I74L I100LF215Y T105S C134S G67N D44E V181Y V74L S215Y E215Y L215Y C215Y N215Y D215Y V215Y I215Y 0.0 −log(prob) 2 M184L M184V 0.0 M184I Q151M ABC M184R M184W TDF 0.10 0.05 M184L change in log(rf) 1 6 4 −log(prob) 2 V75I I35G 0.05 0.00 3TC Q23H R143S S163R I94T Q151H K13E V111M K201Q A162K N54H R199S T35S D40G P97A S134I I60L A36N N123Y D218T N137K K122D P1T E44Y G196A D86N A62P A162R T107P E40S L209M P150T D185E V106G F171L L209S E123Y I178P V50L F214S I178N L109R N177H D169Q E36N A36G K207L P14S K102G Q145H E6T I165K I31M D67K I202L D185N V111L V135S P150S T11N H96Y L178P L178N Q122D E36G G68Y R11N K174T R143G Q91E E89G M164H Y115N D40S L92I I35A S68Y Q219T S123Y K126E A28Q Q207L W24R G51A A158Q L214S H174T T69Y D123Y R172I Q161H K30T A207L E169Q G51K C162K P1L K197T T35Q T35A K102M K46E K39R A122D Q91H Q219S D169R P150L K211V R125K R172S S163T E196K T11A E29G S68T N177R E53K E101G D110N D53Q R72K D53K T39L K172I K172S V148G E53Q K6T K6G N67K E204N S48Q K169Q Y188H A114P E194V P7Q Q151K E28N I63K D207L F124L P4K R199K T135S E6Q I31K E40K Q197T V202L K122T E122D K207T E28Q A36K I135S M35Q N147H K219T T200Q E89K S162H Q122T E48L S98E E39R P14A V122D K28N Q161P F87L C162R K30E A162H S48L G99R E138D D69Y N68V L187F G211V P140Q I63V A200M Y162K S68V T35K E194N K166E C162G I47F T215L V179L R201Q E67K T215N A162T D186E Q207T T4K E169R I202T A138D W162K R68Y E197T K28Q K66N R211E I200Q D53G K6Q D113E A158P D186N D40K 
E204A E42K I35Q N43T A122T G68V E53G I100V K211A K169R D203G S48P L34F T142L V179S L109V A39R R206K K219S K103H S211E G98E A207T L12I V75L I181F N39R R101G L210V I195T I189L E36K S162T F115N A200S Q145C C181F Y162R E48P G155R K39L F215N E39L Y162G D162K E194A K211D E203G D207T W162R E122T T215C D169A V142M T139V G190Q E219T T107S L109I N123G T58N C3G T69P A173I R68V K32R T139I K194A A44Y M178V N219T E123G I142L K70E L210M K82R N67E C215L L31M L31K P4H K64N C215N K70T A138G R101N Q138D I165L E138K C162T K138D P101G E203Q S103H T215D T200M T139M K174E T211E D162R V75S E196R D67S A158T D203V E39S A36D E194K I200M T215S V122T D215S T142M T202L V181F D123S Q197H M135K G211E N47F A28G D67E P4T K102R K66R E122Q K197H L34I S123G Q174R V215N G196K C181V N39L I2V E219S T4H T69A V135K D162G L135K V202T A207N T215V K207N G196R A39L E36D C162H I94L T173I N219S Q173I I173E V106A K64Y G211D N32R E203V K64H G211A Q207N E32R A138K D123G E40F E169A A207K T200S T69E A190Q K102E E197H S173I T139R N211E I132V F210S N173I H174E E190Q Q166I A39S I200S R101H K203G E204K T211Q M139V D69P A211E T35I D203Q K169A M135R M139I L210S S190Q L215S N39S W162T A173E T215I I132L I142M I142T R173I V106M K211E G211Q T139A A33G Y162T S39L K173E V203Q C215S Y121H L210F G211T I195L R207H T69I A200K C215D I178V E28G K174R R102E V75A N162T K103S A39E Q197K E207H T35L A207H H174R N162H C162D K203Q A203G W210S T135K D40F C162Y D207N A211T Q166E I135K E203K L178V Q200M K211T S173E N211Q F215S F215I D207K T211S V75T I180V Y162H K28G N211T D69A R139K M75T W162H N39E V135R I39L T173E A211Q K207H E69A K103R K211Q E53D C215V N67S L75A Y188L Q207H F116Y G211S D69E V111I V90I K203V E28K S200E K200E D207A Q173E E197K G203Q T139K A200E V108I R173E D207H K135R E6D D162T P4S K219N C215I T200K M35L Q200S T67S I75A D121H M139A A203Q N211S L74V T135L T142V Q122P K122P I200K N69S T135R V21I D69I E67S T75A E101Q L135R K211S A203V R173T K64R V215S M102E D162H I135R A211S I35L K219E T69S A106M V75I R43Q D211E V106I T4S S103R 
H64R I135L A62V Q151M M75A E169D E219N D67G V215I K142M E44D T200E I200E S211R K43Q L75T T135V Y64R D218E A162S D177E A122P R101Q K6D T69D N173E I178M M139K A158S K102Q M75I I135V R101E S142M S48T I142V R103N L178M D211Q H101Q K166R V179I K49R K219R Q200K P101Q I135T E48T G177E E122P I200T P101E L75I K20R T69N K65R R102Q G196E A207E K207E Q200E M195L D67N D69S V118I E43Q Q207E K207Q A106I K70R K219Q D211S I69S V122P K103N Q219R G211K A173K T211R S215Y K39T A207Q E39T E69S I69N N67G L35V G177D G211R K142V M41L D207E A67G V50I T35V M102Q T215Y L210W N177E T67G T11K E69N D69N I215Y K174Q R11K H219R R104K H174Q C162S E67G N211R N166R E219R I165T D207Q D215Y T173K Q173K A211R N43Q A39T I74V N39T K211R S173K L215Y N173K S39T N219R N219Q M35V Q101K E219Q I60V S103N A35V R173K K6E K83R N177D Q166R C215Y I35V E49R S176P S27T A44D R101K E101K Y162S W162S F215Y S98A I181Y N43K I189V I39T L31I F208H C181Y D211R P101K D162S P7T H101K V215Y T16M G98A V181Y I100L C3S Y57N Y208H F167I change in log(rf) 0.00 0.30 −log(prob) 0.10 0.25 0 4 3 1 2 −log(prob) 5 6 7 d4T 0.05 0.20 S163I V180M I180F V202L I60L N81K A36G T163I R72K I180M Y127C A139P K139P T139P N68V T216P W153G M139P I202L L168S R139P P133L E36G V10I G68V D36G P170L T68V P4L N207T K102E H207T G99R S68V A207T F87L I94LT58N Q207T E203Q K22R G211N G67S M210S N67S V135K D67S D207T K207T R211G G207T I202T N162H L207T E207T E203K Q211N G138K A138K H175Y T200I L210F Q211G K203Q S123G A33G Q122P A162H R211N K173R L178V C162H E123G D123G V142T E203V D162H E138K V178M A211G N207A A211N F210S L210S S162H E211G L142T V75L N22R E211N A40F K211G K211N L75A T211N N175Y T211G N162Y K203V N123G V75A H207A S162C T4S L4S I178V S211G P101Q T123G E40F S211N P4S L200E T162Y K101Q H4S L142V A173T E173T A162Y C162Y D162Y V200E S69N G69S D40F Q200E H211G R200E H208Y K102R H211N K43Q M142T I200E E210S S162Y E101Q K200E E219R S200ET200E K142T N173T Q207A I142T G207A A200E L75T R173T I173T R101Q R43Q L178M K219R T69SV75T N219R D69S T101Q E203D N43Q 
[Figure 6.7: scatter plots of per-mutation scores, one panel per drug (panel labels recoverable from the extraction include ddC and ZDV). Recoverable axis labels: "change in log(rf)" and "−log(prob)". Individual points are labeled with mutations.]

Figure 6.7: Scatter plots between mutation scores of g2praw (x-axis) and g2pmixed (y-axis) for different drugs. Mutations from the wild type amino acid to a mutation listed by the IAS list (Johnson et al., 2008) for the corresponding drug class are colored red.

Moreover, the g2pmixed model excludes mutations that did not appear among resistant sequences from the EuResist database by definition, since $R_{p,a'} = 0 \Rightarrow M(p, a, a') = 0$.

6.3.2 Mutation Probabilities Derived From Treatment Data

The previously introduced mutation models assume (to a varying extent) that mutations conferring the largest increase of resistance as measured in vitro are selected first during treatment. Certainly, this assumption does not hold for real treatments. In fact, the order of mutations can also be expected to be influenced by restrictions imposed by the protein structure. For example, a mutation might result in an impossible side chain conformation and therefore requires a prior mutation in the vicinity that may relax the side chain placement. In the case of the HIV integrase, it was suggested that different pathways may be induced by alternative binding conformations of the inhibitor to the target protein (Loizidou et al., 2009). These circumstances are not respected by in vitro drug resistance. A perfect mutation model should consider all these aspects. However, as briefly mentioned in the previous section, there are some rarely occurring mutations that confer a large change in resistance. Unfortunately, the mechanisms underlying these rare events are not yet understood and, in general, knowledge on how and why a certain mutation is preferred over another one is rather scarce.
Instead of integrating all these different (and probably incomplete) sources of knowledge, one can study the history of the virus for constructing a mutation model. The mutagenetic trees introduced in Chapter 4 rely on the fact that in a large database of viral sequences every step of a resistance pathway occurs sufficiently often. Briefly, if mutation X is rarely observed alone but rather frequently in the presence of mutation Y, one can assume that Y has to be acquired before X. The success of the mutagenetic trees demonstrates that relying on large databases for deriving evolutionary models is a promising alternative.

Obviously, longitudinal sequence data from monotherapies constitute the ideal source of information for estimating in vivo mutation probabilities. Monotherapies, however, are considered insufficient with regard to today's standard of care and are therefore not enriched in large databases. Nowadays, standard therapies typically comprise multiple drugs from different drug classes. The majority of these treatments, however, use only one protease inhibitor or one NNRTI, so even modern therapies can be regarded as pseudo-monotherapies. Specifically, of the approximately 100,000 therapies stored in the EuResist database, 99.6% of the NNRTI-containing and 95.6% of the PI-containing regimens apply only one representative of the corresponding class. In contrast, only 15.8% of the recorded treatments are pseudo NRTI monotherapies. NRTIs are mostly given in combination, and consequently 73.5% of the NRTI regimens contain exactly two NRTIs. These statistics suggest that, in general, it should be possible to deduce in vivo mutation probabilities for individual drugs (PIs and NNRTIs) or at least for drug pairs (NRTIs) from large databases. The bottleneck of this estimation is the availability of sequence data, as at least one sequence is required before a treatment change (baseline genotype) and one during the treatment (follow-up genotype).
Figure 6.8 depicts a schematic description of the training data requirements and a graph relating the time allowed between baseline genotype and treatment start to the size of the resulting training set. Beyond a threshold of 180 days, the amount of available training sequences grows only slowly with respect to the time frame allowed between obtaining the baseline sequence and treatment start. The downside of a large time frame is that the viral population is under continuous selective pressure by the old treatment and thus might accumulate additional mutations before the start of the next treatment. From this point of view, a small time frame is preferred. Here, however, the downside is clearly the lack of available sequence data. To comply with the standard datum definition, a cutoff of 90 days was selected, which gave rise to approximately 1,900 sequence pairs. To exploit the wealth of the EuResist database, a cutoff of 720 days was investigated as well.

[Figure 6.8, top: timeline of a therapy from start to stop, with Sequencing A obtained less than x days before the start and Sequencing B obtained during the therapy (more than 0 days after the start and more than 0 days before the stop). Bottom: training set size (y-axis, 0 to 4000) plotted against the days allowed between treatment start and baseline sequence (x-axis, 0 to 700).]

Figure 6.8: The top figure shows a schematic description of the training data requirements. The baseline sequence has to be obtained at most x days before the start of the treatment, while the follow-up sequence may be obtained at any time during the treatment. The lower graph depicts the relationship between x and the size of the dataset.

These datasets provide for every treatment the applied drugs, the baseline sequence, the follow-up sequence, and the three time points. For estimating the in vivo mutation rates, the available information for a sequence pair is represented as an itemset.
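The selection of sequence pairs for a given cutoff x can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Therapy` record and its field names are hypothetical and do not reflect the EuResist schema, a baseline sampled on the start date itself is accepted, and the follow-up is required to lie strictly after the start and no later than the stop of the therapy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Therapy:
    # Hypothetical record layout, not the EuResist schema.
    start: date
    stop: date
    baseline_date: date   # sampling date of the baseline genotype
    followup_date: date   # sampling date of the follow-up genotype


def is_training_pair(t: Therapy, max_days_before_start: int) -> bool:
    """Check the data requirements of Figure 6.8 for one therapy.

    Assumption: the baseline may be sampled 0..x days before the start,
    the follow-up after the start and not later than the stop.
    """
    days_before = (t.start - t.baseline_date).days
    baseline_ok = 0 <= days_before <= max_days_before_start
    followup_ok = t.start < t.followup_date <= t.stop
    return baseline_ok and followup_ok


# Toy example: the baseline was sampled 182 days before therapy start.
late = Therapy(date(2005, 6, 1), date(2006, 6, 1),
               date(2004, 12, 1), date(2005, 12, 1))
print(is_training_pair(late, 90), is_training_pair(late, 720))
```

With the 90 days cutoff this toy therapy is rejected, with the 720 days cutoff it is accepted, which mirrors the trade-off between dataset size and proximity of the baseline genotype discussed above.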
On this dataset, itemset mining is applied to discover interesting rules, e.g. rules of the form "drug → mutation". The confidence of a rule serves as the mutation probability. Unfortunately, the plain application of an itemset mining algorithm retrieves a number of artifacts, for instance rules stating with high confidence that zidovudine causes NNRTI-related mutations. These artifacts mainly result from the co-administration of several compounds and have to be removed in a postprocessing step. Furthermore, the approach assumes that RTIs do not influence mutation probabilities of positions in the protease and vice versa. This allows the estimation of protease and RT mutations to be treated as two independent problems and consequently reduces the number of artifacts. The following two sections describe the itemset mining and subsequent filtering in detail. The resulting mutation models are termed vivo90 and vivo720.

Mining Frequent Mutations

In frequent itemset mining, instances are represented as sets containing a subset of N binary attributes termed items. For historical reasons, each instance is called a transaction $t \subseteq \{i_1, i_2, \ldots, i_N\}$ and a collection of transactions is referred to as a database $D = \{t_1, t_2, \ldots, t_M\}$. In our case, each drug applied in a regimen is represented by one item. Furthermore, every difference between the baseline sequence and the follow-up sequence is encoded as an item. To limit the total number of items to be considered, only the mutations in the baseline sequence with respect to the reference HXB2 were encoded as single items (instead of all positions of the baseline sequences). Ambiguities at positions were resolved, i.e. a position in the baseline sequence showing the amino acids $a_1, \ldots, a_j$ was encoded as j items. The same holds true for encoding the differences between baseline and follow-up sequence.
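The encoding of one sequence pair as a transaction can be sketched as follows. This is a minimal illustration, not the encoding used for the thesis: the item prefixes `drug:`, `base:` and `add:` are hypothetical naming conventions, sequences are represented as one set of amino acids per position so that ambiguities are explicit, and a difference is assumed to be any amino acid that is present in the follow-up but absent from the baseline at the same position.

```python
def encode_transaction(drugs, reference, baseline, followup):
    """Encode one treatment with its sequence pair as a set of items.

    reference: string with one amino acid per position (e.g. HXB2).
    baseline, followup: lists with one set of amino acids per position;
    an ambiguous position simply holds more than one amino acid.
    The prefixes 'drug:', 'base:' and 'add:' are illustrative only.
    """
    items = {f"drug:{d}" for d in drugs}
    positions = zip(reference, baseline, followup)
    for pos, (ref_aa, base_aas, follow_aas) in enumerate(positions, start=1):
        # Baseline mutations: only deviations from the reference are
        # encoded; an ambiguity yields one item per amino acid.
        for aa in sorted(base_aas):
            if aa != ref_aa:
                items.add(f"base:{ref_aa}{pos}{aa}")
        # Additional mutations: amino acids that are new in the follow-up.
        for aa in sorted(follow_aas - base_aas):
            items.add(f"add:{pos}{aa}")
    return items


# Toy sequences of length 3 (purely hypothetical, not real HIV data).
transaction = encode_transaction(
    drugs=["ZDV", "3TC"],
    reference="MKL",
    baseline=[{"M"}, {"K", "R"}, {"L"}],
    followup=[{"M"}, {"R"}, {"V"}],
)
print(sorted(transaction))
```

In the toy example the ambiguous baseline position 2 contributes a single baseline item because only R deviates from the reference, and position 3 contributes one additional mutation.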
Given a large database, one wants to find sets of items that frequently occur together in transactions, i.e. itemsets whose support exceeds a certain threshold. The support of an itemset X is defined as the number of times all items of X occur together in the same transaction in the database:

$$\mathrm{supp}(X) = |\{m \mid X \subseteq t_m\}|.$$

Frequent itemsets can be used to derive rules $X_1 \Rightarrow X_2$, with $X_1, X_2 \subset X \wedge X_1 \cap X_2 = \emptyset$. The confidence of a rule is defined as

$$\mathrm{conf}(X_1 \Rightarrow X_2) = \frac{\mathrm{supp}(X)}{\mathrm{supp}(X_1)}.$$

These rules are termed association rules, and their confidence can be interpreted as $P(X \mid X_1)$ (Agrawal et al., 1993). The best-known algorithm for finding frequent itemsets is the Apriori algorithm (Agrawal and Srikant, 1994). Apriori exploits the fact that

$$X_1 \subseteq X \Rightarrow \mathrm{supp}(X_1) \geq \mathrm{supp}(X). \qquad (6.5)$$

Consequently, if $X_1$ is not a frequent itemset, then a set containing $X_1$ can never be a frequent itemset; conversely, if X is a frequent itemset, then all subsets of X are frequent itemsets, too. The algorithm limits the search space by first finding all frequent n-itemsets ($|X| = n$) and, based on these, defining candidates for frequent (n + 1)-itemsets. Precisely, if $X_1, X_2$ are frequent n-itemsets and $|X_1 \cap X_2| = n - 1$, then $X_1 \cup X_2$ is a candidate for a frequent (n + 1)-itemset. The non-frequent candidate sets are then discarded by checking if all subsets of size n are frequent.

The candidate generation and validation, however, can be quite costly in terms of computation. The frequent pattern (FP) tree mining algorithm avoids the candidate generation step (Han et al., 2004). The FP tree mining algorithm first represents the database D as an FP tree. In essence, an FP tree is a prefix tree in which the number of occurrences of a prefix is stored in the node representing that prefix. For efficient tree traversal, the FP tree maintains an item header table pointing to an item's occurrences in the tree via a linked list.
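The definitions of support and confidence and the level-wise candidate generation of Apriori can be made concrete with a short Python sketch. It is a didactic version that recounts the support by scanning the whole database, and it is not the implementation used in this work. The function names are chosen for illustration, and the toy database is the nine-transaction example of Figure 6.9.

```python
from itertools import combinations


def support(itemset, database):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in database if itemset <= t)


def confidence(lhs, rhs, database):
    """conf(lhs => rhs) = supp(lhs | rhs) / supp(lhs); needs supp(lhs) > 0."""
    return support(lhs | rhs, database) / support(lhs, database)


def apriori(database, min_support):
    """Return {itemset: support} for all frequent itemsets (level-wise)."""
    singletons = {frozenset([i]) for t in database for i in t}
    level = {x: support(x, database) for x in singletons}
    level = {x: s for x, s in level.items() if s >= min_support}
    frequent, n = dict(level), 1
    while level:
        candidates = set()
        for a, b in combinations(level, 2):
            if len(a & b) == n - 1:          # join two frequent n-itemsets
                c = a | b
                # Prune with Eq. (6.5): every n-subset must be frequent.
                if all(frozenset(s) in level for s in combinations(c, n)):
                    candidates.add(c)
        level = {c: support(c, database) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        n += 1
    return frequent


DATABASE = [frozenset(t) for t in (
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
)]
frequent = apriori(DATABASE, min_support=2)
print(len(frequent), confidence(frozenset({"I1"}), frozenset({"I2"}), DATABASE))
```

On this toy database, the candidate {I1, I2, I3, I5} is generated from two frequent 3-itemsets but pruned without counting, because its subset {I1, I3, I5} is not frequent.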
Figure 6.9 depicts an example comprising nine transactions and the resulting FP tree including its header table.

    List of item IDs       Reordered list
    {I1, I2, I5}           (I2, I1, I5)
    {I2, I4}               (I2, I4)
    {I2, I3}               (I2, I3)
    {I1, I2, I4}           (I2, I1, I4)
    {I1, I3}               (I1, I3)
    {I2, I3}               (I2, I3)
    {I1, I3}               (I1, I3)
    {I1, I2, I3, I5}       (I2, I1, I3, I5)
    {I1, I2, I3}           (I2, I1, I3)

    Header table
    Item ID    Support    Node-link
    I2         7          (linked list of I2 nodes)
    I1         6          (linked list of I1 nodes)
    I3         6          (linked list of I3 nodes)
    I4         2          (linked list of I4 nodes)
    I5         2          (linked list of I5 nodes)

    FP tree (children indented below their parent)
    null {}
      I2:7
        I1:4
          I5:1
          I4:1
          I3:2
            I5:1
        I4:1
        I3:2
      I1:2
        I3:2

Figure 6.9: The left table lists a database comprising 9 transactions. In the right column the items I1, ..., I5 are ordered with respect to their frequency in the database. The FP tree represents exactly this database. The header table allows for an efficient retrieval of all occurrences of an item within the tree. This is needed to construct the conditional pattern base. The colored edges indicate the links one has to traverse for constructing the conditional pattern base for I3. Example adapted from Kamber and Han (2001).

For the FP tree construction, the items within every transaction are first sorted in decreasing order of their support in the whole database. Of note, after the reordering the itemsets are no longer sets but item tuples. The root node of the tree is labeled with the empty set. Each transaction $t_m$ is then inserted into the tree in the following way: starting from the root, the tree is traversed following the order imposed by the items in $t_m$. If a required branch does not exist, it is created and the counter at the target node is initialized with 0. After the last item of $t_m$ has been processed, the count on each visited node is increased by one. The task of finding frequent patterns in the database is thereby transformed into mining frequent patterns in the FP tree.

The FP tree is mined recursively. The process starts with constructing the conditional pattern base for each frequent 1-itemset. The frequent 1-itemsets can easily be determined from the header table.
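The construction procedure can be illustrated with a small Python sketch that rebuilds the tree of Figure 6.9. It is a simplified illustration: the class and function names are hypothetical, ties in support are broken alphabetically (which reproduces the ordering of the figure), and the node-link list of the header table is represented by a plain Python list per item.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(database):
    """Build an FP tree; return (root, header table, item supports)."""
    counts = Counter(item for t in database for item in t)
    root = Node(None, None)        # root is labeled with the empty set
    header = {}                    # item -> list of nodes (node-links)
    for t in database:
        node = root
        # Decreasing support; ties broken alphabetically (an assumption).
        for item in sorted(t, key=lambda i: (-counts[i], i)):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)     # new branch, counter is 0
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1                 # one more transaction passes
            node = child
    return root, header, counts


DATABASE = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
root, header, counts = build_fp_tree(DATABASE)
for item in sorted(header, key=lambda i: (-counts[i], i)):
    print(item, counts[item], [n.count for n in header[item]])
```

The counter is increased while descending rather than after the last item of the transaction. The resulting counts are identical, since every visited node is incremented exactly once per transaction.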
The conditional pattern base for an item $i_n$ is simply given by the itemsets on all paths from nodes labeled with $i_n$ to the root node, with the support set to the count stored in the respective $i_n$ node. For example, the conditional pattern base for I3 in Figure 6.9 is {(I2 I1: 2), (I2: 2), (I1: 2)}, as indicated by the colored solid edges. The occurrences of I3 in the tree can easily be found by following the linked list (colored dashed edges). Another FP tree is constructed for the conditional pattern base and mined for frequent patterns. The end of the recursion is reached when either the FP tree for the conditional pattern base is empty, i.e. comprises only the root node, or the FP tree comprises only a single path. In the latter case, all combinations of nodes on that single path are (parts of) a frequent pattern. Algorithm 5 provides pseudo code for the recursive FPgrowth function (adapted from Kamber and Han (2001)).

Algorithm 5 FPgrowth(Tree, α)
    if Tree contains a single path P then
        for each combination (denoted as β) of the nodes in path P do
            generate pattern β ∪ α with supp(β ∪ α) = minimum support of any node in β
        end for
    else
        for each α_i in the header of Tree do
            generate pattern β = α_i ∪ α with supp(β) = supp(α_i)
            construct β's conditional pattern base and β's conditional FP tree Tree_β
            if Tree_β ≠ ∅ then
                FPgrowth(Tree_β, β)
            end if
        end for
    end if

Due to the excessive computational requirements of Apriori, the FP tree mining was used to find frequent mutations caused by protease and reverse transcriptase inhibitors. A pattern was considered frequent if it occurred at least 12 times in the case of protease mutations and at least 20 times in the case of reverse transcriptase mutations. These thresholds were set manually and represent a trade-off between robustness of mined rules and the number of distinct rules. For estimating the mutation probability and its variance, 1,000 bootstrap samples of the original dataset were generated.
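The recursion of Algorithm 5 can be sketched compactly in Python. The sketch below is an illustration under stated assumptions and not the implementation used for the thesis: it follows only the general branch of the algorithm and omits the single-path shortcut (which is an optimization and does not change the result), it rebuilds a small tree for every conditional pattern base, and all names are hypothetical.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_header(weighted_transactions, min_support):
    """Build an FP tree from (items, count) pairs; return its header table."""
    counts = Counter()
    for items, n in weighted_transactions:
        for i in items:
            counts[i] += n
    frequent = {i for i, c in counts.items() if c >= min_support}
    root, header = Node(None, None), {}
    for items, n in weighted_transactions:
        node = root
        for i in sorted(frequent & set(items), key=lambda i: (-counts[i], i)):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header.setdefault(i, []).append(node.children[i])
            node = node.children[i]
            node.count += n
    return header


def conditional_pattern_base(item, header):
    """Prefix paths of all nodes labeled `item`, weighted by node count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent.item is not None:       # stop at the root
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
    return base


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {pattern: support}; general branch of Algorithm 5 only."""
    patterns = {}
    header = build_header(weighted_transactions, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = sum(n.count for n in nodes)
        base = conditional_pattern_base(item, header)
        patterns.update(fp_growth(base, min_support, pattern))
    return patterns


DATABASE = [(t, 1) for t in (
    ("I1", "I2", "I5"), ("I2", "I4"), ("I2", "I3"), ("I1", "I2", "I4"),
    ("I1", "I3"), ("I2", "I3"), ("I1", "I3"), ("I1", "I2", "I3", "I5"),
    ("I1", "I2", "I3"),
)]
print(sorted(conditional_pattern_base("I3", build_header(DATABASE, 2))))
print(len(fp_growth(DATABASE, min_support=2)))
```

For the database of Figure 6.9 the first call reproduces the conditional pattern base for I3 given in the text, and the mining step returns the same frequent itemsets that a level-wise search would find for a minimum support of 2.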
Postprocessing of Mutation Rules

For protease and reverse transcriptase, executing the FP tree mining algorithm results in 3323 and 2283 (61,498 and 24,968) frequent patterns on average per bootstrap replicate, respectively, using a cutoff of 90 (720) days. A large fraction of these frequent patterns is not of interest for our application. Combining results from the bootstrap replicates by taking the union of the discovered patterns and limiting the list of frequent patterns to itemsets containing at least a drug and one additional mutation (i.e. difference between baseline sequence and follow-up sequence) yields 171,993 and 21,124 (4,844,840 and 217,042) potentially interesting patterns for protease and RT, respectively, using the 90 (720) days cutoff. Figure 6.10 a) depicts the relationship between the median support (x-axis) and the number of bootstrap replicates in which a pattern was considered frequent (y-axis) for the protease dataset using the 720 days cutoff. As expected, the majority of the patterns was selected in only very few bootstrap replicates (Figure 6.10 b)), e.g. 105,032 and 4117 patterns were selected once and at least 200 times, respectively, on the protease 90 days dataset.

As a first step, however, all frequent patterns were rewritten as rules. In general, a frequent n-itemset can be written as $\sum_{i=1}^{n-1} \binom{n}{i}$ different rules. In our application, however, there is only one meaningful rule per pattern. Precisely, an item of a pattern is either a compound, a baseline mutation, or an additional mutation; thus the only interesting rule has drugs and baseline mutations on the left side and additional mutations on the right side. This allows for interpreting the confidence of a rule as the probability of developing a certain mutation (or a list of mutations) given a drug and a set of baseline mutations.
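The rule count and the selection of the single meaningful rule per pattern can be illustrated as follows. The sketch assumes that the type of an item can be read from a prefix (`drug:`, `base:`, `add:`); this naming scheme is hypothetical and only serves the example.

```python
from math import comb


def number_of_rules(n):
    """Rules X1 => X \\ X1 derivable from an n-itemset: sum_{i=1}^{n-1} C(n, i)."""
    return sum(comb(n, i) for i in range(1, n))


def is_potentially_interesting(pattern):
    """A pattern needs at least one drug and one additional mutation."""
    has_drug = any(i.startswith("drug:") for i in pattern)
    has_added = any(i.startswith("add:") for i in pattern)
    return has_drug and has_added


def pattern_to_rule(pattern):
    """Drugs and baseline mutations on the left, additional mutations right."""
    lhs = frozenset(i for i in pattern if i.startswith(("drug:", "base:")))
    rhs = frozenset(i for i in pattern if i.startswith("add:"))
    return lhs, rhs


# Hypothetical pattern for illustration only.
pattern = {"drug:EFV", "base:K103N", "add:190A"}
print(number_of_rules(len(pattern)), pattern_to_rule(pattern))
```

The sum equals $2^n - 2$, the number of non-empty proper subsets that can serve as the left side, so already a pattern of four items admits 14 rules, of which only one is retained here.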
Unfortunately, the application of an association rule mining algorithm to our kind of data generates a large number of artifact rules originating from the frequent co-administration of drugs. For example, since the year 2000 the activity of protease inhibitors has been increased by administering a boosting dose of ritonavir (RTV), which occupies the cytochrome P450-mediated metabolism in the human liver (Kumar et al., 1996) and therefore allows improved plasma levels of drugs that undergo the same metabolism (typically protease inhibitors). As a consequence, today RTV (or RTVb for boosting dose) is part of every PI-containing regimen; indeed, approximately 70% of PI-containing treatments in the dataset list RTV as one of the drugs, and thus many mined rules claim that RTV is the cause of certain mutations. This is most certainly not the case, however, and the corresponding rules have to be filtered out.

[Figure 6.10: two panels, (a) support vs. bootstrap selection and (b) histogram of bootstrap selection.]

Figure 6.10: Scatter plot between the median support and the number of times a pattern was selected in a bootstrap replicate (a). Histogram of the number of times a pattern was selected among the 1000 bootstrap replicates (b). Both figures are derived from the protease 720 days cutoff dataset.

For the reverse transcriptase there exist similar cases. Lamivudine (3TC), for example, is used in approximately 60% of the NRTI-containing therapies, and a number of rules accordingly blame 3TC for causing a great variety of mutations. In fact, 3TC is only associated with mutations at positions 65 and 184 (Johnson et al., 2008). On the other hand, other RTIs are connected with the appearance of mutations at position 184, since due to the relation in Eq. (6.5) all frequent itemsets containing 3TC imply a frequent itemset without 3TC. This example demonstrates that inferring which RTI causes which RT mutation is a challenge.
The subsequent application of the mutation models, however, does not require the estimation of models for individual drugs. In particular, the models for RTIs will have to be combined, while the PI models are independent to a greater or lesser extent. Therefore, the inference of causality for RTI models can be avoided by estimating models for combinations of RTIs directly. The challenges for a postprocessing filter differ between the drug targets. The following two paragraphs describe the filtering steps that were applied to remove most of the artifacts from the lists of protease and reverse transcriptase mutation rules. In the following, the left side and right side of a rule are denoted by $X_l$ and $X_r$, respectively. The sets of PIs and RTIs are denoted by $C_{pro}$ and $C_{rt}$, respectively, with $C = C_{pro} \cup C_{rt} \cup \text{“empty”}$. The sets of all baseline and additional mutations are denoted by $E_{base}$ and $E_{add}$, respectively.

Protease Rules Filter: All rules containing RTV as the only PI are most likely artifacts and are therefore filtered: $|\mathrm{RTV} \cap X_l| = 1 \wedge |C_{pro} \cap X_l| = 1$; the first part of the expression checks whether RTV is on the left side of the rule, while the second part checks whether there is only one PI listed on the left side of the rule. Moreover, since not all regimens contain a PI, there are many transactions for the protease task that contain the item “empty”. These transactions represent pseudo-treatment breaks for the protease. Rules with “empty” as the only compound on the left side are interesting, since they provide information on how resistance mutations disappear from the predominant viral population when there is no selective pressure on the protease. For this task, however, these rules constitute an artifact and have to be removed: $|\text{“empty”} \cap X_l| = 1$. In the last step, all patterns ($X_l \cup X_r$) that were not selected a sufficient number of times in the bootstrap analysis were discarded.
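The three protease filtering steps can be summarized as a single predicate. The following is a minimal sketch, assuming that the left side of a rule is a set of item names, that `PIS` is a hypothetical, incomplete stand-in for $C_{pro}$, and that the selection threshold is passed as a fraction of the 1000 bootstrap replicates; none of these names are taken from the actual implementation.

```python
# Hypothetical subset of protease inhibitors standing in for C_pro.
PIS = {"RTV", "LPV", "TPV", "NFV", "SQV"}


def is_protease_artifact(lhs, pis, times_selected,
                         n_replicates=1000, min_fraction=0.2):
    """Return True if a protease rule should be discarded.

    lhs            -- items on the left side (drugs, 'empty', baseline mutations)
    times_selected -- bootstrap replicates in which the pattern was frequent
    min_fraction   -- assumed selection threshold as a fraction of replicates
    """
    rtv_only = (lhs & pis) == {"RTV"}        # RTV is the only PI in the rule
    pseudo_break = "empty" in lhs            # no PI administered at all
    rarely_selected = times_selected < min_fraction * n_replicates
    return rtv_only or pseudo_break or rarely_selected


print(is_protease_artifact({"RTV", "82A"}, PIS, 900))         # RTV as only PI
print(is_protease_artifact({"RTV", "LPV", "82A"}, PIS, 900))  # boosted LPV rule
```

Keeping the threshold as a parameter makes it easy to compare different minimum selection frequencies without touching the other two checks.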
Reverse Transcriptase Rules Filter: In a first step, all rules that contain only one drug are discarded ($|X_l \cap C| = 1$); this also includes the rare case of “empty” as the only listed compound. In contrast, all rules comprising three or more drugs are retained ($|X_l \cap C_{rt}| > 2$). Finally, all rules comprising exactly two RTIs have to undergo a further filtering step. Briefly, if two drugs are said to cause a certain mutation, there is a chance that a third drug (not part of the rule) is the true cause. This case obviously occurs when three drugs are frequently administered together. For example, Atripla is available as a single pill and comprises three compounds, two NRTIs (FTC+TDF) and one NNRTI (EFV). As a consequence one obtains the artifact rule {FTC, TDF} ⇒ {K103N}. The mutation is clearly an NNRTI related mutation and therefore most likely caused by EFV. Given a rule $X_l \Rightarrow X_r$ containing exactly two drugs $\{c_1, c_2\} = X_l \cap C_{rt}$, then $E'_{base} = X_l - \{c_1, c_2\}$ is the set of baseline mutations. The filter checks for every RTI that is not part of the rule, $x \in C_{rt} - \{c_1, c_2\}$, whether the rules $\{c_1, x\} \cup E'_{base} \Rightarrow X_r$ and $\{c_2, x\} \cup E'_{base} \Rightarrow X_r$ exist. If for any drug $x$ both rules exist and both exhibit a significantly higher confidence than the original rule, then the original rule is discarded. For assessing whether the difference in confidence is significant, a t-test based on the mean and standard deviation of the confidence obtained from the bootstrap replicates can be applied. In accordance with the protease filter, all patterns that were not selected a sufficient number of times in the bootstrap analysis were discarded as well.

From Rules to Mutation Models

Applying the described filters with a bootstrap selection threshold of 20% for both sets results in 1188 and 4541 (12,572 and 3861) rules for protease and RT, respectively, using the 90 (720) days cutoff.
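The additional check for reverse transcriptase rules with exactly two drugs can be sketched as follows. This is an illustration under several assumptions: rules are stored in a dictionary mapping (left side, right side) to the mean, standard deviation, and number of bootstrap replicates of the confidence; the significance test is a one-sided large-sample approximation of the t-test (reasonable for many replicates, but not necessarily the test variant used here); and all numbers in the toy example are made up.

```python
from math import sqrt
from statistics import NormalDist


def significantly_higher(a, b, alpha=0.05):
    """One-sided large-sample test: is the mean confidence of a above that of b?

    a, b -- tuples (mean, standard deviation, number of bootstrap replicates)
    """
    (mean_a, sd_a, n_a), (mean_b, sd_b, n_b) = a, b
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    if se == 0:
        return mean_a > mean_b
    z = (mean_a - mean_b) / se
    return 1.0 - NormalDist().cdf(z) < alpha


def keep_two_drug_rule(lhs, rhs, rules, rtis):
    """Return False if a third RTI explains the rule better than both listed drugs."""
    drugs = lhs & rtis
    c1, c2 = sorted(drugs)                   # the rule lists exactly two RTIs
    base = lhs - drugs                       # baseline mutations E'_base
    original = rules[(lhs, rhs)]
    for x in rtis - drugs:
        r1 = rules.get((frozenset({c1, x}) | base, rhs))
        r2 = rules.get((frozenset({c2, x}) | base, rhs))
        if (r1 is not None and r2 is not None
                and significantly_higher(r1, original)
                and significantly_higher(r2, original)):
            return False
    return True


# Toy rule set mirroring the Atripla example; all statistics are invented.
RTIS = {"FTC", "TDF", "EFV", "3TC"}
K103N = frozenset({"K103N"})
rules = {
    (frozenset({"FTC", "TDF"}), K103N): (0.20, 0.05, 1000),
    (frozenset({"FTC", "EFV"}), K103N): (0.60, 0.05, 1000),
    (frozenset({"TDF", "EFV"}), K103N): (0.60, 0.05, 1000),
}
print(keep_two_drug_rule(frozenset({"FTC", "TDF"}), K103N, rules, RTIS))
```

In the toy data both rules that include EFV have a clearly higher confidence, so the rule {FTC, TDF} ⇒ {K103N} is dropped, whereas {FTC, EFV} ⇒ {K103N} would be kept.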
The filtered rules have two major advantages over the mutation scores derived from in vitro measured resistance. Firstly, the rules reflect the mutation rates in vivo, and secondly, the presence of baseline mutations on the left side of the rules makes it possible to model mutation rates in dependence of pre-existing mutations, rather than independently.

[Footnote 1: 10% minimum selection threshold]

An interesting example of rules obtained with the 720 day cutoff is given by the rather novel protease inhibitor tipranavir (TPV). The rule mining discovered three major TPV related protease mutations, 33F, 82T, and 84V, with probability 0.27, 0.63, and 0.38, respectively. The mutation at position 82, however, does not arise from the wild type amino acid (valine), but from a mutant (alanine) that was probably selected during previous treatment with another PI. For this mutation there exist two rules in the list: {TPV} ⇒ {A82T} and {TPV, 82A} ⇒ {A82T} with confidence 0.28 and 0.63, respectively. For constructing the mutation model one has to take into account that A82T is not a mutation from wild type and therefore use the confidence of the rule listing the corresponding baseline mutation on the left side.

While in theory one can construct transducers that represent mutation models with an arbitrary number of baseline and additional mutations, their construction can be rather complex. For instance, in order to model the mutation probability in the presence of baseline mutations, the mutation transducer must have a path that is different from the two states in Figure 6.3 and comprises at the position of the baseline mutation only the corresponding amino acid. Moreover, one has to distinguish between baseline mutations that occur before and after the follow-up mutation in the sequence (regarding the alignment position).
For example, for extending the mutation transducer in Figure 6.4 with differences in the mutation probability of position two in the presence of mutations at position one or three, one has to introduce two new states (2 and 3). One new edge connects the initial state with state 2; its input and output label is “PRO_1A” and its weight is equal to the mutation probability of position 2 in the presence of 1A. State 2 and the final state are connected via an edge with input label “PRO_2I” and output label “PRO_2C”. The same labeling is used for the edge between the initial state and state 3. Finally, state 3 is connected to the final state using an edge with input and output label “PRO_3E” that has the altered mutation probability as weight. Moreover, in theory, states 2 and 3 need loop edges for all alignment positions between the baseline and the follow-up mutation. As can easily be seen, the size of the transducer grows rapidly with the number of baseline mutations that are represented. For practical reasons we restrict the transducer construction to rules that provide the probability of a single mutation, i.e. $|X_r| = 1$, and with at most one baseline mutation. The construction algorithm is simplified by the fact that for the vast majority of rules $\mathrm{conf}(X_l \Rightarrow X_r) \leq \mathrm{conf}(X_l \cup E'_{base} \Rightarrow X_r)$ holds (see Figure 6.11 a)). Thus, one simply has to add an alternative path featuring the elevated probability in addition to the normal mutation probability. In accordance with the in vitro mutation models we define mutation models derived from treatment data as

$M_{d,m}(p, a, a') = \max\{\mathrm{conf}(\{d, m\} \Rightarrow \{apa'\}), \mathrm{conf}(\{d, m, pa\} \Rightarrow \{apa'\})\}$,

where $m$ is a single baseline mutation, and $d$ is either a single drug (like above) or a combination of drugs. Hence, for complete treatments we have

$M_{t,m}(p, a, a') = \prod_{t' \subseteq t} M_{t',m}(p, a, a')^{-k}$,

with $k = |\{t' \mid t' \subseteq t \wedge M_{t'}(p, a, a') \neq 0\}|$.
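The definition of $M_{d,m}$ amounts to a lookup of two rule confidences followed by taking their maximum. A minimal sketch, assuming the confidences are stored in a dictionary keyed by (left side, target mutation) and that a missing rule counts as confidence 0; the function name and the storage layout are hypothetical, while the two confidences are those of the TPV example discussed earlier.

```python
def mutation_model(conf, d, m, p, a, a_new):
    """Evaluate M_{d,m}(p, a, a') as the maximum of two rule confidences.

    conf  -- dict mapping (frozenset of left-side items, target) to confidence
    d     -- set of drug names
    m     -- a single baseline mutation such as '10I', or None if absent
    p     -- alignment position
    a     -- amino acid present at position p before the mutation
    a_new -- amino acid after the mutation
    Rules missing from conf are treated as confidence 0 (an assumption).
    """
    lhs = frozenset(d) | ({m} if m else set())
    target = f"{a}{p}{a_new}"
    without_pa = conf.get((lhs, target), 0.0)
    with_pa = conf.get((lhs | {f"{p}{a}"}, target), 0.0)
    return max(without_pa, with_pa)


# Confidences of the two TPV rules for A82T.
conf = {
    (frozenset({"TPV"}), "A82T"): 0.28,
    (frozenset({"TPV", "82A"}), "A82T"): 0.63,
}
print(mutation_model(conf, {"TPV"}, None, 82, "A", "T"))
```

For the TPV example the maximum selects the rule that lists the baseline mutation 82A on the left side, in line with the argument that A82T is not a mutation from wild type.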
Figure 6.11 b) depicts the number of rules per PI derived from the 720 days cutoff dataset. The figure also clearly demonstrates the existence of a bias towards frequently used drugs, which are obviously captured better by the model. For example, the majority of rules describe mutation development under LPV/r containing treatments. Moreover, the figure indicates that by restricting the mutation models to at most one baseline mutation the majority of discovered rules is exploited (except for LPV).

Figure 6.11: (a) Effect of baseline mutations: scatter plot between the probability for one mutation with and without the presence of baseline mutations given the same drug. (b) Rules per protease inhibitor: final number of rules for different protease inhibitors; different grey scales indicate the number of baseline mutations in those rules. Both figures were derived from the protease 720 days cutoff dataset.

6.4 Validation

Finding a validation scenario for an FDO score is a challenging task. In contrast to the prediction of treatment response, where one has the viral load as a direct measure, there is no useful covariate in the HIV treatment databases that necessarily has to correlate with a good FDO. For instance, it is unlikely that a suboptimal choice regarding FDOs in the first treatment will significantly shorten the patient’s life, as many other factors most likely have a more observable impact, e.g. adherence. Moreover, the group of patients that are followed from their first therapy until death is rather small and most likely had only very limited alternatives for the first treatment. Nonetheless, there are possibilities for (at least) validating the mutation models alone. The following two sections describe approaches for estimating the quality of the introduced mutation models.
6.4.1 Prediction of Response to the Next But One Treatment

In previous chapters we aimed at predicting the virological response to the patient’s upcoming treatment. To this end we extracted retrospective treatment change episodes from a database collecting routine treatment data. Then, statistical learning was used to correlate information about the virus and the treatment to the patient’s virological response to the antiretroviral combination treatment. The availability of mutation models that allow evolving the viral population during a treatment makes a modification of this initial setting possible. Knowing the genotype of a patient shortly before one treatment, one can estimate the additional mutations at the end of that treatment, and predict the response to the subsequent treatment based on this simulated genetic makeup of the viral population. This scenario makes it possible to study the impact of parameters of the mutation models. First, one can study which mutation model provides the basis for the best predictions. Second, one can study whether a fixed or a variable number of newly introduced mutations provides better results. Third, one can assess the benefit of using the n most likely results of the in silico evolution over the single best variant. The following sections provide the experimental setup, the results, and the discussion of this validation scenario.

Experimental Setup

Figure 6.12 illustrates the requirements for TCEs that are used for this validation. Briefly, each TCE comprises two treatment changes. A genotype has to be available at most 90 days prior to the start of treatment A. In analogy to the EuResist standard datum definition for assessing response to the treatment, a baseline viral load and a follow-up viral load have to be available at most 90 days before and about eight weeks (4 to 12 weeks) after onset of treatment B, respectively.
Application of this modified TCE definition to the EuResist integrated database (release 2008/10/10) yields 1952 instances. A treatment success is defined by suppressing the viral load below the limit of detection (400 copies of viral RNA per ml blood) or achieving at least a 100-fold reduction compared to the baseline value. Of note, the response to treatment A is of no interest; thus an instance is considered successful if treatment B is a successful treatment. Using this definition, 1395 instances are considered to be successful and 557 are considered treatment failures. The instances selected for this validation overlap with the instances used for training the EuResist prediction model. Precisely, 1428 TCEs and 350 TCEs of the EuResist training data correspond to treatment A and treatment B, respectively. Since we neither predict the response to treatment A nor use the real viral genotype obtained shortly before treatment B for our predictions, we can consider this dataset as an independent test set. Each of the sequences in the dataset was represented by a linear acceptor. Ambiguities in the amino acid sequence were encoded as alternative edges between two states. In a first step, only those amino acids in ambiguities conferring the most resistance against the current treatment were retained (using a rating transducer as in Figure 6.3). Then, novel mutations were introduced into the sequence by applying all five mutation models described above. The number of newly introduced mutations is a parameter. We follow three strategies for setting the number of mutations. The first strategy introduces a fixed number of mutations into each sequence; we vary this number from 0 to 10, with 0 representing the reference model that does not introduce any mutation.
Figure 6.12: Modified TCE definition. Every TCE comprises two treatments. The sequence had to be obtained at most 90 days before treatment A. A baseline viral load measurement had to be available at most 90 days before treatment B and a follow-up viral load measure had to be available within 4 to 12 weeks after onset of the therapy. A TCE is successful if treatment B is successful, i.e. viral load is reduced 100-fold or below 400 copies of viral RNA per ml (limit of detection).

The second approach produces a number of mutations that depends on the composition of the antiretroviral therapy. More precisely, the number of mutations is simply based on the average potency of the regimen. Here, we use the instantaneous inhibitory potential (IIP) after 24 hours of the maximal concentration of the drug in the plasma ($IIP_{24}$) as defined by Shen et al. (2008). The $IIP_{24}$ ranges from 0 for d4T to 8.4 for DRV/r. The number of mutations based on the average potency $pot$ is computed by

$n(pot) = \lceil \alpha \cdot \exp(1 - pot) \rceil$, with $\alpha \in \{1, \ldots, 5\}$. (6.6)

If $n(pot)$ is less than 1 or exceeds 10 mutations, then $n(pot)$ is set to 1 or 10, respectively. The parameter $\alpha$ controls the increase of mutations with the decrease of average potency. The exponential function is used to realize a convex function. The third strategy originates from the fact that an effective treatment gives the virus fewer opportunities to mutate. Thus, the number of mutations is determined by the predicted success probability of treatment A ($p_A$).
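The potency-based rule of Eqn. 6.6, including the clipping to the range from 1 to 10, can be written down in a few lines. This is a minimal sketch; the function names are hypothetical, and the small `IIP24` table contains only the two values quoted above, not the full set of potencies from Shen et al. (2008).

```python
from math import ceil, exp

# Only the two IIP24 values quoted in the text; a real table would list all drugs.
IIP24 = {"d4T": 0.0, "DRV/r": 8.4}


def average_potency(regimen, iip=IIP24):
    """Mean IIP24 over the drugs of a regimen."""
    return sum(iip[drug] for drug in regimen) / len(regimen)


def n_mutations_potency(pot, alpha):
    """Eqn. 6.6: ceil(alpha * exp(1 - pot)), clipped to the range [1, 10]."""
    n = ceil(alpha * exp(1.0 - pot))
    return min(10, max(1, n))


for alpha in (1, 5):
    print(alpha, n_mutations_potency(0.0, alpha), n_mutations_potency(8.4, alpha))
```

A regimen with average potency 0 receives three mutations for $\alpha = 1$ and the maximum of ten for $\alpha = 5$, whereas a highly potent regimen is clipped to a single mutation for every $\alpha$.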
For this third, success-based strategy, the number of mutations is calculated as

$n'(p_A) = \lceil \alpha (\exp(1 - p_A) - 1) \rceil$, with $\alpha \in \{1, \ldots, 5\}$. (6.7)

Again, the exponential function is applied for realizing a convex function. Finally, we use a combination of potency and predicted success (the effect) for estimating the number of mutations. Precisely, we apply Eqn. 6.6 to $pot \cdot p_A$ instead of $pot$ alone. For models with a varying number of mutations we use two different baseline approaches. The first approach generates a random number of mutations that is uniformly sampled from 1 to 10 (randomA). The second method generates a random number of mutations that follows the distribution of the $p_A$-based model. This is achieved by randomly shuffling the number of mutations computed with Eqn. 6.7 (randomB). For both random models the provided performance measure is the mean of 100 repetitions. For each of these models we compute the list of 10,000 variants receiving the highest score according to the mutation model. Our prediction model, which we contributed to the EuResist prediction engine (5.2), is used to predict the response of the 10,000 most likely outcomes of the simulated evolution to the drugs in treatment B. The predicted success for treatment B is the weighted mean of the top $n$ of these 10,000 predictions,

$\frac{1}{\sum_{j=1}^{n} s_j} \sum_{i=1}^{n} s_i p_i$, (6.8)

where $s_i$ and $p_i$ are the mutation score and the predicted success probability of the $i$th variant, respectively. For g2praw, $s_i$ corresponds to the increase in resistance, and for all other models $s_i$ equals the real probability (not the negative decadic logarithm).

Table 6.1: Performance for different mutation models and mutation numbers measured in AUC.

                g2praw  g2ptransition  g2pmixed  vivo90  vivo720
baseline        0.757
fixed number
  1             0.757   0.764          0.762     0.749   0.757
  2             0.749   0.755          0.756     0.743   0.756
  3             0.724   0.735          0.741     0.738   0.750
  4             0.702   0.714          0.725     0.728   0.735
  5             0.694   0.699          0.710     0.718   0.721
  6             0.692   0.690          0.696     0.710   0.711
  7             0.690   0.674          0.683     0.703   0.705
  8             0.687   0.660          0.671     0.697   0.701
  9             0.686   0.651          0.660     0.693   0.696
  10            0.684   0.645          0.651     0.689   n/a
dependent on current treatment composition
  randomA       0.694   0.687          0.690     0.707   0.717
  randomB       0.737   0.738          0.741     0.737   0.745
  potency       0.759   0.768          0.768     0.757   0.764
  success       0.766   0.767          0.768     0.755   0.765
  effect        0.762   0.769          0.768     0.759   0.768

Results

Table 6.1 lists the AUC values that were achieved in this setting with all five mutation models. None of the models manages to substantially improve over the baseline (no mutations) and only subtle differences are visible between the mutation models. A general trend is the decline of prediction performance with an increasing fixed number of mutations. Consequently, the introduction of only one mutation provides the highest AUCs. The introduction of a variable number of mutations depending on the treatment composition of regimen A provides a slight improvement over the baseline (no mutations). Moreover, all three proposed ways of estimating the number of mutations improve substantially over models generating a random number of mutations. Here, the uniform sampling provides the worst results and is probably an underestimation of the baseline. Still, none of the mutation models paired with an approach to estimate the number of mutations provides a substantial improvement over the baseline. One reason for this observation is doubtless the large fraction of treatments having a low success rate with respect to regimen B even before regimen A caused changes in the viral population. Figure 6.13 a) depicts the distribution of the predicted success rate for regimen B ($p_B$) at baseline for the viral sequence obtained before onset of regimen A. Interestingly, the distribution resembles the one that can be expected for $p_A$ as seen in Figure 5.7 a).
It becomes obvious that a large number of treatments failing regimen B had poor prospects of succeeding even before regimen A was administered. Hence, the baseline of zero mutations is almost insurmountable and the mutation models show little improvement. For a fair comparison of the mutation models, the distribution of the predicted success rate for regimen B before onset of regimen A ($p_B$) should be identical in failing and succeeding cases. Thus, for studying a distribution of $p_B$ that is more similar in failing and succeeding cases, we applied a threshold on $p_B$ and considered only cases exceeding that threshold. The cutoff was varied from 0.0 (all cases) to 0.9 in steps of 0.1. Moreover, ambiguous failures (i.e. treatments labeled as failure but achieving a VL reduction below 500 copies at least once during the treatment; see Section 5.3) were removed for obtaining a more robust quality assessment. Figure 6.13 c) displays the decline in AUC for increasing cutoffs of $p_B$ for the baseline (no mutations), different mutation models with a variable number of mutations based on $p_A$, and both random number of mutations models (using g2pmixed). This setting reveals qualitative differences between the g2p-based mutation models and the mutation models derived from treatment data. All three g2p-based models perform best and achieve an AUC of approximately 0.630 compared to the baseline of 0.547 at a $p_B$ cutoff of 0.8. At this threshold, the vivo90 model is indistinguishable from the baseline with an AUC of 0.556, while the vivo720 model reaches a slightly better AUC of 0.602 and places itself between the baseline and the g2p-based models. The results also demonstrate the importance of linking the right number of mutations to the right treatment.
The performance of randomB is very close to the baseline, while g2pmixed with a $p_A$-based number of mutations demonstrates clearly superior prediction accuracy. Potency- or effect-based computation of the number of mutations produces similar curves (data not shown). Of note, only 10 cases were labeled as failure when the threshold of 0.9 was applied. Due to the low number of failing cases, results obtained using the highest cutoff cannot be considered reliable. For instance, the increase observed for both random models when moving from the 0.8 to the 0.9 cutoff is most likely an artifact. For investigating the impact of the number of most likely viral variants considered for the computation of the success for the next but one regimen, we chose the $p_B$ cutoff of 0.8, the g2pmixed model, a $p_A$-based number of mutations, and varied the number of considered in silico variants: {1, 10, 100, 1000, 10000}. Figure 6.13 b) depicts the corresponding ROC curves. Clearly, it is beneficial to consider more than the single most likely variant: the top 10 already yield an improvement of 0.03 in AUC. Nevertheless, no improvement was observed beyond the top 1000, which increased the AUC by 0.051 compared to the single most likely variant.

Figure 6.13: (a) Distribution of $p_B$ before onset of regimen A in failing treatments (red) and successful treatments (green). (b) Impact of varying the top n variants: change of AUC depending on the considered top n viral variants (ROC curves, false positive rate vs. true positive rate; legend, top n (AUC): 1 (0.58), 10 (0.61), 100 (0.62), 1000 (0.631), 10000 (0.631)). AUC computed with the g2pmix mutation model at cutoff 0.8. (c) Development of AUC after restricting the data in terms of $p_B$ (success cutoff vs. AUC; curves: baseline, g2p raw, g2p mix, g2p trans, prob 90, prob 720, randomA, randomB). Apart from the random models, the number of mutations was based on $p_A$.

Discussion

The results demonstrated that it is possible to infer the response to the treatment following the next treatment on the basis of the current genotype. Unfortunately for our validation, the baseline (i.e. no additional mutations) already provided a very good prediction performance. Indeed, the performance was close to AUC values observed for predicting response to the next antiretroviral regimen (see results in Chapter 5). The high performance was the result of a large number of cases that had poor prospects of succeeding with regimen B even before regimen A caused additional mutations. Consequently, the mutation models showed only little improvement. A scenario that is more suitable for assessing the quality of the mutation models is the separation of failing and succeeding regimens that had the same (or at least a similar) distribution of the success rate $p_B$ before the onset of regimen A. Here, mutations selected by regimen A are the cause of the later failure, and the mutation models should allow for a separation. The removal of all instances with $p_B$ below a certain cutoff served as a way of matching the distributions of $p_B$ in successful and failing regimens. The altered scenario revealed differences between mutation models: g2p-based models generally outperform the models estimated from treatment data. Moreover, the importance of assigning the right number of mutations to the treatments was made evident and we could demonstrate a clear improvement over the baseline (no mutations). Of note, the performance of the baseline can be seen as a measure of dissimilarity of the distribution of $p_B$ in both groups (if the distributions were identical, then the baseline would achieve an AUC of 0.5).
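Two computations recur throughout this validation scenario: the weighted mean over the top $n$ simulated variants (Eqn. 6.8) and the AUC on the subset of cases whose $p_B$ reaches a given cutoff. The following minimal sketch illustrates both; the function names, the tuple layout of a case, and all toy numbers are assumptions made for illustration and do not stem from the EuResist data. The AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def weighted_success(scores, probs, n):
    """Eqn. 6.8: score-weighted mean success over the n highest-scoring variants."""
    ranked = sorted(zip(scores, probs), key=lambda sp: sp[0], reverse=True)[:n]
    total = sum(s for s, _ in ranked)
    return sum(s * p for s, p in ranked) / total


def auc(labels, preds):
    """Probability that a success is ranked above a failure (ties count 0.5)."""
    pos = [p for y, p in zip(labels, preds) if y]
    neg = [p for y, p in zip(labels, preds) if not y]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_at_cutoff(cases, cutoff):
    """AUC restricted to cases whose baseline p_B is at least the cutoff.

    cases -- tuples (p_B before regimen A, predicted success, observed success)
    """
    kept = [(ok, pred) for p_b, pred, ok in cases if p_b >= cutoff]
    return auc([ok for ok, _ in kept], [pred for _, pred in kept])


# Toy cases (hypothetical values).
cases = [(0.9, 0.7, True), (0.85, 0.4, False), (0.2, 0.1, False), (0.3, 0.9, False)]
print(weighted_success([3.0, 1.0], [1.0, 0.0], n=2))
print(auc_at_cutoff(cases, 0.0), auc_at_cutoff(cases, 0.8))
```

With identical score distributions in both groups the rank-based AUC evaluates to 0.5, which is why the baseline AUC on a restricted subset can be read as a measure of how dissimilar the two $p_B$ distributions still are.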
6.4.2 Comparing Predicted Future Drug Options to Data From Real Cases

The second validation scenario aims at comparing predicted future drug options after a treatment to real future drug options observed at treatment failure in patients. In order to avoid problems with archived drug resistance mutations in patients who had already received antiretroviral therapies, this scenario is restricted to treatment naïve patients. That restriction has another advantage, as it allows us to assume that the virus before the treatment is a wild type virus; therefore we only need to retrieve sequences of the patient at the end of the first treatment for the computation of future drug options. Consequently, this maximizes the amount of available data for this analysis. The simplification also bears one risk: by assuming that a patient was infected with a wild type virus, we ignore the existence of transmitted drug resistance mutations.

Validation Setup

The EuResist database was queried for HIV genotypes that were obtained during the first treatment the patient ever received. More precisely, the sequence comprising RT and protease had to be obtained at the earliest 30 days after the start and at the latest 14 days before the stop of the treatment. As argued earlier (Section 4.2), the ability to sequence the virus while the patient is on antiretroviral treatment indicates virological failure. In order to ensure that the patient indeed received the first treatment, it was required that all treatments the patient had ever received were stored in the database. Table 6.2 lists all antiretroviral treatments that met the requirements and occur at least three times. For all sequences the fold-change in resistance against 17 drugs was computed with geno2pheno and the activity score for each drug was determined (Section 3.2). The FDO was simply defined as the sum of the activity for all 17 drugs; thus the FDO was in the range of [0, 17].
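The FDO definition is a plain sum of per-drug activities. A minimal sketch, assuming that the activities are given as a dictionary from drug name to a value between 0 and 1 (as implied by the stated range of the FDO); the function name, the optional restriction to a subset of drugs, and the toy values are hypothetical.

```python
def fdo(activity, drugs=None):
    """Future drug options: sum of activity scores, optionally for a subset of drugs.

    activity -- dict mapping drug name to an activity score in [0, 1]
    drugs    -- optional iterable of drug names to restrict the sum to
    """
    selected = activity if drugs is None else {d: activity[d] for d in drugs}
    if any(not 0.0 <= a <= 1.0 for a in selected.values()):
        raise ValueError("activity scores are expected to lie in [0, 1]")
    return sum(selected.values())


# Toy activities for three drugs (hypothetical values).
activity = {"ZDV": 1.0, "EFV": 0.5, "LPV": 0.0}
print(fdo(activity), fdo(activity, drugs=["EFV"]))
```

With 17 drugs and activities between 0 and 1, the sum necessarily falls into the range [0, 17].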
In addition to the FDO for all drugs, drug class specific FDOs were calculated. Unfortunately, a number of sequences (about 5.9%) suggested complete resistance against a large number of drugs from all classes. Given that the sequences were obtained at the end of the first antiretroviral treatment, this was an unexpected result and most likely originates from faulty entries in the database or cases of transmitted drug resistance. Hence, we excluded all cases that exhibit an FDO score of 5.0 or less from the analysis. Table 6.3 lists the FDO and the drug class wise FDOs after removal of these cases. Predictions for the FDOs were generated by constructing a linear acceptor representing the consensus B wild type sequence. Then, for every treatment listed in Table 6.3 a mutation model was used to introduce a fixed number of mutations (1 to 10), and the 10,000 variants receiving the highest mutation score were stored. For simplicity, the mutation model was restricted to g2pmixed, which also ranked among the best models in the previous setting. Next, the FDO was computed for all predicted variants. The consensus FDO of the (at most) 10,000 variants was computed using a weighted mean (see Eqn. 6.8). Finally, we correlated the predicted FDOs with the observed FDOs. Here, however, the number of mutations that have to be introduced into the wild type sequence is unknown. Thus, to get an upper estimate of the prediction performance, we selected for each treatment the number of mutations that best fits the observed value (best fit). In addition, we used a crude heuristic for estimating the number of mutations (manual): the number of mutations to be introduced is 2; treatments containing boosted PIs introduce one mutation less, treatments with fewer than three drugs one mutation more, and so do treatments containing the combination d4T+ddI.

Results

Table 6.4 lists the obtained correlation for a varying number of top N most likely viral variants.
For the best fit approach the correlation steadily increases with larger viral population and reaches up to 0.951 for the full list of 10,000 variants. Using the crude manual heuristic for inferring the number of mutations, the correlation between predicted FDO and actual FDO decreases to approximately 0.570. Figure 6.14 shows scatter plots of the real FDO against the predicted FDO using the 100 most likely variants. A surprising observation is that, using the best fit approach, a small number of mutations is sufficient to achieve a high correlation with the observed values. Precisely, the majority of treatments are best modeled when only one, two, or three mutations are introduced. In order to verify whether the drug classes are correctly modeled, we correlated the observed class specific FDO to the predicted class specific FDO. Here, we used the number of mutations determined with the best fit approach on the FDO for all drugs. The correlation was very high for NRTIs and NNRTIs, reaching 0.809 and 0.796, respectively. The PIs, however, showed only an acceptable correlation of 0.407. A confounder of the FDOs for PIs are treatments that did not contain any PIs. Consequently, removal of non-PI treatments elevated the correlation to 0.599.
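The two ways of choosing the number of mutations and the subsequent correlation analysis can be sketched as follows. The sketch rests on assumptions: the adjustments of the manual heuristic are taken to accumulate (so a two-drug d4T+ddI regimen receives both increments), the predicted FDO per number of mutations is assumed to be available as a dictionary, and all function names and toy numbers are hypothetical.

```python
from math import sqrt


def manual_n_mutations(drugs, has_boosted_pi):
    """Crude heuristic: start at 2 mutations and apply the stated adjustments.

    Assumption: the adjustments accumulate when several conditions hold.
    """
    n = 2
    if has_boosted_pi:
        n -= 1
    if len(drugs) < 3:
        n += 1
    if {"d4T", "ddI"} <= set(drugs):
        n += 1
    return n


def best_fit_n(predicted_by_n, observed):
    """Number of mutations whose predicted FDO is closest to the observed FDO."""
    return min(predicted_by_n, key=lambda n: abs(predicted_by_n[n] - observed))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values (hypothetical): predicted FDO for 1 to 3 introduced mutations.
predicted_by_n = {1: 15.0, 2: 13.0, 3: 11.0}
print(manual_n_mutations(["d4T", "ddI"], has_boosted_pi=False))
print(best_fit_n(predicted_by_n, observed=12.6))
```

Because the best fit variant selects the number of mutations with knowledge of the observed value, its correlation can only serve as an upper estimate, whereas the manual heuristic uses information available before treatment start.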
drug combination        count  all            NRTI          NNRTI         PI
ZDV                       100  13.94 (3.39)   4.80 (2.21)   2.66 (0.69)   6.49 (1.70)
ddC+ZDV                    44  14.13 (3.17)   4.83 (2.33)   2.82 (0.32)   6.49 (1.63)
3TC/FTC+ZDV                40  13.01 (2.49)   3.49 (2.13)   2.64 (0.77)   6.88 (0.34)
3TC/FTC+LPV+ZDV            38  15.55 (1.97)   6.17 (1.30)   2.88 (0.24)   6.50 (0.99)
3TC/FTC+NFV+ZDV            38  10.70 (3.99)   3.67 (1.63)   2.64 (0.69)   4.38 (2.60)
3TC/FTC+NVP+ZDV            34  13.11 (3.13)   4.60 (2.07)   1.63 (1.23)   6.88 (0.25)
3TC/FTC+IDV+ZDV            31  13.42 (3.39)   5.04 (1.80)   2.83 (0.27)   5.55 (2.27)
3TC/FTC+EFV+ZDV            24  12.17 (4.77)   4.62 (2.32)   1.48 (1.39)   6.07 (1.95)
3TC/FTC+LPV+TDF            23  14.15 (4.82)   5.94 (1.80)   2.38 (1.01)   5.83 (2.35)
3TC/FTC+ABC+ZDV            20  12.01 (3.13)   2.60 (2.60)   2.68 (0.77)   6.73 (0.72)
ddI+ZDV                    20  12.74 (2.59)   3.23 (2.24)   2.55 (0.88)   6.97 (0.06)
3TC/FTC+SQV+ZDV            19  12.09 (3.61)   3.82 (1.90)   2.76 (0.67)   5.51 (2.36)
3TC/FTC+d4T+NFV            15  10.37 (4.89)   3.44 (2.34)   2.89 (0.18)   4.03 (2.87)
3TC/FTC+EFV+TDF            11  11.65 (5.42)   4.47 (2.44)   1.67 (1.44)   5.52 (2.18)
3TC/FTC+d4T                11  11.39 (1.67)   2.02 (1.24)   2.53 (0.86)   6.84 (0.34)
d4T+ddI+NFV                 9  10.58 (4.31)   3.07 (2.29)   2.89 (0.07)   4.62 (2.80)
3TC/FTC+d4T+EFV             7  13.04 (2.10)   5.02 (1.74)   1.64 (1.24)   6.38 (1.02)
3TC/FTC+d4T+NVP             7  14.38 (2.73)   5.85 (1.48)   1.62 (1.50)   6.91 (0.15)
ddC+SQV+ZDV                 7  14.62 (2.59)   5.32 (1.73)   2.86 (0.19)   6.43 (1.27)
3TC/FTC+d4T+SQV             6   9.25 (4.44)   2.16 (1.79)   2.96 (0.03)   4.12 (3.29)
d4T+ddI+EFV                 6  12.12 (3.62)   5 (2.35)      0.66 (1.19)   6.46 (1.17)
3TC/FTC+APV/FPV+TDF         5  12.65 (6.95)   4.95 (2.88)   2.11 (1.21)   5.60 (3.13)
3TC/FTC+ATV+TDF             5  13.59 (2.58)   4.78 (1.59)   2.97 (0.04)   5.84 (1.60)
3TC/FTC+ABC+LPV             4  15.46 (2.26)   5.96 (1.59)   2.81 (0.22)   6.69 (0.47)
3TC/FTC+d4T+IDV             4  12.69 (5.34)   4.79 (2.36)   2.93 (0.09)   4.97 (3.35)
3TC/FTC+d4T+LPV             4  13.38 (6.31)   5.41 (2.98)   2.72 (0.52)   5.25 (3.50)
d4T+ddI                     4   9.90 (3.58)   3.43 (2.74)   2.98 (0.02)   3.48 (4.02)
d4T+ddI+NVP                 4  11.25 (2.72)   4.48 (2.42)   0.86 (1.44)   5.91 (2.16)
ddI+NVP+ZDV                 4  11.60 (2.42)   2.88 (1.68)   1.79 (0.97)   6.92 (0.15)
3TC/FTC                     3  12.57 (0.39)   3.23 (0.39)   2.85 (0.25)   6.49 (0.88)
3TC/FTC+NVP+TDF             3  13.95 (3.92)   5.39 (2.55)   1.96 (1.70)   6.60 (0.42)
d4T+ddI+IDV                 3  14.28 (2.22)   5.59 (1.30)   1.69 (1.53)   7 (0.00)
d4T+ddI+SQV                 3  10.56 (5.78)   3.72 (2.98)   2.96 (0.04)   3.88 (3.48)
ddI+EFV+TDF                 3   9.15 (4.99)   4.89 (1.60)   0.00 (0.00)   4.26 (3.41)

Table 6.2: First line antiretroviral treatments and observed future drug options (FDO). The column count states the sample size that served to estimate the FDO. The columns all, NRTI, NNRTI, and PI state the FDOs for all drugs, NRTIs, NNRTIs, and PIs, respectively. Values are the mean among all observed cases; the standard deviation is stated in parentheses.

drug combination        count  all            NRTI          NNRTI         PI
ZDV                        95  14.47 (2.55)   4.98 (2.10)   2.67 (0.69)   6.82 (0.87)
ddC+ZDV                    43  14.39 (2.70)   4.94 (2.24)   2.81 (0.33)   6.64 (1.31)
3TC/FTC+ZDV                40  13.01 (2.49)   3.49 (2.13)   2.64 (0.77)   6.88 (0.34)
3TC/FTC+LPV+ZDV            38  15.55 (1.97)   6.17 (1.30)   2.88 (0.24)   6.50 (0.99)
3TC/FTC+NVP+ZDV            34  13.11 (3.13)   4.60 (2.07)   1.63 (1.23)   6.88 (0.25)
3TC/FTC+NFV+ZDV            33  11.75 (3.06)   4.04 (1.41)   2.70 (0.57)   5.01 (2.17)
3TC/FTC+IDV+ZDV            30  13.71 (3.05)   5.10 (1.79)   2.87 (0.20)   5.74 (2.06)
3TC/FTC+EFV+ZDV            22  13.24 (3.26)   5.01 (1.99)   1.61 (1.38)   6.62 (0.64)
3TC/FTC+LPV+TDF            21  15.39 (2.62)   6.40 (0.97)   2.60 (0.72)   6.38 (1.54)
3TC/FTC+ABC+ZDV            20  12.01 (3.13)   2.60 (2.60)   2.68 (0.77)   6.73 (0.72)
ddI+ZDV                    20  12.74 (2.59)   3.23 (2.24)   2.55 (0.88)   6.97 (0.06)
3TC/FTC+SQV+ZDV            18  12.55 (3.06)   3.83 (1.95)   2.91 (0.14)   5.81 (2.01)
3TC/FTC+d4T+NFV            12  11.95 (4.08)   4.28 (1.78)   2.87 (0.20)   4.80 (2.68)
3TC/FTC+d4T                11  11.39 (1.67)   2.02 (1.24)   2.53 (0.86)   6.84 (0.34)
3TC/FTC+EFV+TDF            10  12.79 (4.09)   4.89 (2.11)   1.83 (1.40)   6.07 (1.25)
d4T+ddI+NFV                 8  11.41 (3.77)   3.34 (2.28)   2.88 (0.07)   5.18 (2.37)
3TC/FTC+d4T+EFV             7  13.04 (2.10)   5.02 (1.74)   1.64 (1.24)   6.38 (1.02)
3TC/FTC+d4T+NVP             7  14.38 (2.73)   5.85 (1.48)   1.62 (1.50)   6.91 (0.15)
ddC+SQV+ZDV                 7  14.62 (2.59)   5.32 (1.73)   2.86 (0.19)   6.43 (1.27)
d4T+ddI+EFV                 6  12.12 (3.62)   5 (2.35)      0.66 (1.19)   6.46 (1.17)
3TC/FTC+ATV+TDF             5  13.59 (2.58)   4.78 (1.59)   2.97 (0.04)   5.84 (1.60)
3TC/FTC+ABC+LPV             4  15.46 (2.26)   5.96 (1.59)   2.81 (0.22)   6.69 (0.47)
3TC/FTC+APV/FPV+TDF         4  15.67 (1.89)   6.18 (0.96)   2.49 (0.99)   7 (0.00)
3TC/FTC+d4T+SQV             4  11.93 (1.97)   2.90 (1.76)   2.96 (0.04)   6.07 (1.68)
d4T+ddI                     4   9.90 (3.58)   3.43 (2.74)   2.98 (0.02)   3.48 (4.02)
d4T+ddI+NVP                 4  11.25 (2.72)   4.48 (2.42)   0.86 (1.44)   5.91 (2.16)
ddI+NVP+ZDV                 4  11.60 (2.42)   2.88 (1.68)   1.79 (0.97)   6.92 (0.15)
3TC/FTC                     3  12.57 (0.39)   3.23 (0.39)   2.85 (0.25)   6.49 (0.88)
3TC/FTC+NVP+TDF             3  13.95 (3.92)   5.39 (2.55)   1.96 (1.70)   6.60 (0.42)
3TC/FTC+d4T+IDV             3  15.30 (1.38)   5.76 (1.63)   2.91 (0.10)   6.63 (0.62)
3TC/FTC+d4T+LPV             3  16.53 (0.63)   6.90 (0.05)   2.63 (0.59)   6.99 (0.01)
d4T+ddI+IDV                 3  14.28 (2.22)   5.59 (1.30)   1.69 (1.53)   7 (0.00)

Table 6.3: First line antiretroviral treatments and observed future drug options (FDO). The column count states the sample size that served to estimate the FDO. The columns all, NRTI, NNRTI, and PI state the FDOs for all drugs, NRTIs, NNRTIs, and PIs, respectively. Values are the mean among all observed cases; the standard deviation is stated in parentheses.

top N     best fit   manual
1         0.784      0.148
10        0.876      0.351
100       0.907      0.561
1,000     0.943      0.588
10,000    0.951      0.572

Table 6.4: Pearson correlation between predicted and actual FDOs. Top N indicates how many most likely variants were used for computing the predicted drug options. The two methods for finding the number of mutations to be introduced are best fit and manual.

[Figure 6.14, panels: (a) 100 variants (best fit); (b) 100 variants (manual); (c) 100 variants protease (best fit)]
Figure 6.14: Scatter plot between real FDO and predicted FDO using 100 most likely variants. FDO for all 17 drugs, and the number of mutations was set using the best fit approach (a) or the manual heuristic (b). FDO for PIs only, the number of mutations was set using the best fit approach on all 17 drugs (c).

Figure 6.14 c) shows the
corresponding scatter plot for only PI-containing regimens.

Discussion

The best fit approach explains the future drug options very well, while requiring only a small number of additional mutations for the majority of treatments to explain the observed FDO value. Moreover, the best fit (in terms of mutations) regarding the FDO of all drugs also reflects the class-wise FDOs, with correlations ranging from 0.599 to 0.809, indicating that the underlying mutation model reflects the resistance development in the different drug targets in vivo to a satisfying extent. Indeed, the measured FDOs exhibit a high variance themselves, most likely owing to differing treatment lengths, differing patient adherence, and differing mutations in the virus preexisting at treatment start. Clearly, these sources of uncertainty add to the difficulty of fitting a single FDO to a single drug combination. Consideration of the other factors, however, would lead either to overfitting the model to the noisy data (e.g. introduction of a treatment length parameter) or to unreliable estimates of the FDOs due to low sample numbers (e.g. requirement of a baseline genotype). Moreover, for other external factors like adherence no data are available. We made a further assumption which certainly does not hold: every patient started with a wild type consensus B virus. As can be seen from the first extraction of data from the database, a substantial number of cases showed an FDO of 5.0 or less and were therefore excluded from the analysis. Moreover, the cleaned set of instances also contains unlikely post-treatment FDOs. For instance, patients treated with ddI+d4T alone have only 3.5 active PIs at treatment failure, on average. A change in FDO among PIs after a PI-less combination therapy can only be explained by either preexisting drug resistance mutations or errors in the database, e.g.
the treatment contained a PI, or the stop date of the first treatment is incorrect and the genotype was actually obtained during the following PI-based regimen. The best fit estimation clearly serves as an upper bound of the model performance. Despite the many sources of uncertainty originating from the gold standard, i.e. the real data, the model explains the observations using only few mutations. And, most importantly, the resistance development in the different drug classes is captured well. Thus, the mutation model provides the means to simulate resistance development in antiretroviral combination treatments, an essential basis for the sequencing of treatments.

6.5 Remarks

We introduced five different mutation models together with a framework for quickly simulating resistance development in HIV against combination treatments. The mutation models demonstrated, to varying extent, that prediction of response to the one after next regimen can be achieved at moderate performance. Moreover, a second validation setting showed that observed future drug options can be explained using one of the models (g2pmix) by the introduction of only few mutations. Here, the overall resistance development was reflected well by resistance development within drug classes. A crucial variable contributing to success and failure of the model is the number of mutations that are introduced by a treatment. The number of mutations depends on a range of parameters, including activity of the regimen, potency of the drugs in the regimen, duration of the regimen, and the patient’s adherence to the regimen. The exact number is therefore hard to estimate. Within the first validation setting, we explored different functions to infer a suitable number of mutations based on activity and potency of the regimen. These approaches showed an improved behavior compared to the introduction of a fixed number of mutations.
However, there is clearly room for further improvements, for instance by functions based on adherence of the patient. Of note, inferring the number of mutations from the duration of the regimen alone is not likely to succeed. For example, one can expect that a long treatment duration results in a higher number of mutations. But, on the other hand, a long duration indicates good compliance of the patient and, as a consequence, a sustained treatment success resulting in only few mutations. Thus, only a measure of patient adherence paired with treatment length is likely to succeed. The introduced model can still provide insight on how to sequence treatments even when the number of mutations is unknown. More precisely, one can observe the development of the estimated success probability of potential nth line treatments with respect to the (n − 1)th line regimen. For instance, on a Dual-Core AMD Opteron™ processor with 2.6 GHz and 32 GB memory, the simulation from 1 to 10 additional mutations including storage of the 1,000 most likely variants takes approximately 35 seconds. The interpretation of all variants using the EV EuResist engine requires (due to suboptimal implementation for this task) another 70 seconds. Furthermore, the framework for simulating the evolution is easily scalable and one can achieve further speedup by limiting the number of variants (6 seconds with 100 variants instead of 1,000), the number of different mutations (17 seconds with 5 mutations instead of 10) or a combination of both (3 seconds). Hence, we reached a computational performance that enables the construction of web services. Such a service might allow the user to upload the sequence of a patient isolate and to select a few potential treatments and possible follow-up regimens. As a result the service will provide the change in treatment options (using different measures) as a consequence of the selected regimen.
In the following we provide a case study of four typical first-line treatments using three different measures for future treatment options. Figure 6.15 depicts the development of four different regimens (different colors) in response to two potential first-line regimens (different line styles) in terms of PSS (a) and predicted success by the EV EuResist engine (b). The simulation started with a consensus B wild type sequence, up to 10 mutations were introduced and the 1,000 most likely variants were analyzed. Figure 6.15 a) reveals that up to six mutations there is no difference for TDF-containing regimens (black and green) whether the preceding treatment contained TDF or ZDV. For the ZDV-containing regimens, however, the TDF-containing pre-treatments exhibit a higher score (around 0.3). Hence, first using TDF and then ZDV seems to be a better choice. The analysis in terms of estimated success probability (Figure 6.15 b) also demonstrates that the use of 3TC+ZDV+LPV provides a slightly worse perspective for future regimens than administration of 3TC+TDF+LPV (i.e. all dashed lines are below solid lines of the same color). The benefit of 3TC+TDF over 3TC+ZDV in terms of resistance at treatment failure is well recognized in the medical community, and indeed, TDF-containing regimens are suggested for first-line regimens (http://www.aidsinfo.nih.gov/Guidelines/). The advantage of TDF is that it selects for the mutation K65R, which causes hypersusceptibility to ZDV and other NRTIs (Parikh et al., 2006). Of note, the applied mutation model also assigns a high probability to the appearance of K65R during TDF treatment (see Figure 6.7). The simulation of both treatments (including interpretation) took approximately 3.5 minutes (using only the 100 most likely variants the computational time is reduced to 30 seconds, while maintaining the same qualitative result).
Instead of studying the development of the predicted success rate for a set of fixed regimens, one can also focus on resistance against individual drugs. For instance, Figure 6.16 shows the decrease in FDO as response to the same four drug combinations. By inspecting the FDO for all drugs no difference between the use of TDF or ZDV is visible. When focusing on the FDO in the class of NRTIs, however, the inferiority of ZDV becomes evident, as there is a difference of approximately half a drug with four and five mutations. In conclusion, the transducer-based framework provides the computational means for efficiently simulating resistance development during anti-HIV therapy, and thereby allows the search for optimal orderings of antiretroviral therapies. The computational performance can be increased even further by using the C++ interface instead of the command-line tool; the latter requires transducers to be stored on the hard drive, while the former allows transducers to be kept in main memory and therefore avoids the I/O overhead. Furthermore, the mutation models, on which the simulation depends, can also be based on expert-based rating systems, as opposed to geno2pheno. Likewise, depending on the user’s preference, the rating scheme used for computing FDOs can be based on expert-based systems.
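How a class-wise FDO might be derived from a list of simulated variants can be sketched as follows. This is a simplified illustration, not the procedure behind Figure 6.16: it assumes that some rating system (geno2pheno or an expert-based one) has already marked each drug as active or inactive for every variant, and it averages the per-variant counts weighted by variant likelihood. The drug-to-class map covers only a handful of drugs, and the function name, the weighting scheme, and all inputs are hypothetical.

```python
from collections import defaultdict

# Illustrative subset of a drug-to-class map (NOT the full 17-drug panel).
DRUG_CLASS = {
    "ZDV": "NRTI", "TDF": "NRTI", "3TC": "NRTI",
    "EFV": "NNRTI", "NVP": "NNRTI",
    "LPV": "PI", "NFV": "PI",
}


def expected_fdo(variants):
    """Likelihood-weighted mean number of active drugs, overall and per class.

    `variants` is a list of (weight, active_drugs) pairs. `weight` is an
    unnormalised variant likelihood; `active_drugs` is the set of drugs
    that a rating system (assumed to exist) still considers active.
    """
    total_weight = sum(weight for weight, _ in variants)
    if total_weight <= 0:
        raise ValueError("variant weights must sum to a positive value")
    fdo = defaultdict(float)
    for weight, active in variants:
        share = weight / total_weight  # normalised contribution of this variant
        for drug in active:
            fdo["all"] += share
            fdo[DRUG_CLASS[drug]] += share
    return dict(fdo)


# Hypothetical example: two variants with made-up likelihoods.
example = expected_fdo([
    (0.75, {"ZDV", "TDF", "EFV", "LPV"}),
    (0.25, {"TDF", "LPV"}),
])
print(example)
```

Keeping the class-wise values next to the overall value mirrors the observation above that a difference between two first-line regimens can be invisible in the FDO for all drugs and still show up within one drug class.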
[Figure 6.15, panels: (a) geno2pheno PSS; (b) success probability. x-axis: number of mutations; y-axes: geno2pheno PSS (activity) and predicted success. Regimens in the legend: 3TC+TDF+LPV, 3TC+ZDV+LPV, 3TC+TDF+EFV, 3TC+ZDV+EFV]
Figure 6.15: Development of geno2pheno PSS (a) and predicted success probability (b) of four different regimens (different colors) after treatment with two typical first-line regimens (different line styles)

6.6 Discussion

Transducers provide a solid framework for simulating development of resistance mutations in HIV. The problem of finding the most likely viral variant, which may emerge during a combination therapy, is modeled as a shortest path problem, in which edge costs are related to the probability of single mutations to emerge during treatment. Due to the application of transducers in the domain of natural language processing (e.g. spoken language recognition and text translation) a number of free and efficient implementations of algorithms that are based on transducers are available. Moreover, the transducer framework can easily be scaled to the available computational hardware. For instance, the number of mutations to be added, the number of most likely viral variants to be tracked, and the amino acid positions of the HIV sequence to be considered (positions have to be simply omitted from the linear input acceptor representing the baseline sequence) can be adapted without modifying the mutation models.
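The shortest-path view described above can be illustrated with a small sketch. It is not the transducer implementation used in this chapter: it assumes that each candidate mutation has an independent probability of emerging under a treatment (the values below are made up), uses the negative logarithm of that probability as edge cost, and enumerates the k cheapest ways of adding exactly m mutations with a best-first search. The function name and its parameters are hypothetical.

```python
import heapq
import math


def k_most_likely_variants(mutation_probs, n_mutations, k):
    """Return up to k (mutations, probability) pairs, most likely first.

    Assumption (illustration only): mutations emerge independently, so the
    cost of a variant is the sum of -log(p) over its mutations and the
    cheapest path corresponds to the most likely variant.
    """
    # Sort by cost so that a larger index never means a cheaper mutation.
    costs = sorted((-math.log(p), name) for name, p in mutation_probs.items())
    n = len(costs)
    if n_mutations < 0 or n_mutations > n:
        return []
    start = tuple(range(n_mutations))  # the cheapest possible selection
    heap = [(sum(costs[i][0] for i in start), start)]
    seen = {start}
    result = []
    while heap and len(result) < k:
        cost, state = heapq.heappop(heap)
        result.append((tuple(costs[i][1] for i in state), math.exp(-cost)))
        # Successors: swap one selected mutation for the next more costly one.
        for pos, idx in enumerate(state):
            nxt = idx + 1
            upper = state[pos + 1] if pos + 1 < len(state) else n
            if nxt < upper:
                succ = state[:pos] + (nxt,) + state[pos + 1:]
                if succ not in seen:
                    seen.add(succ)
                    succ_cost = cost - costs[idx][0] + costs[nxt][0]
                    heapq.heappush(heap, (succ_cost, succ))
    return result


# Made-up emergence probabilities for three known RT mutations.
probs = {"K65R": 0.5, "M184V": 0.4, "L74V": 0.1}
for names, p in k_most_likely_variants(probs, n_mutations=2, k=3):
    print(names, round(p, 3))
```

Both k and the number of mutations are plain arguments here, which reflects the scaling knobs named above: fewer tracked variants or fewer added mutations shrink the search without touching the mutation probabilities themselves.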
Most importantly though, transducers make it easy to implement different mutation models. In addition to the models presented here, mutation models based on Hidden Markov Models (HMMs) can be realized (see for instance Healy and Degruttola (2007)). The capability of representing HMMs follows from the fact that weighted finite state transducers comprise output labels and weights, which can be used to model observations (emissions of the HMM) and probabilities for transitions and emissions, respectively. A major drawback of using transducers for implementing mutation models is the practical requirement to model mutations independently from each other. That is, each mutation will lead to the same increase in total path cost, independent of the mutations that are already present in the genotype. In theory it is possible to construct mutation transducers that implement mutation scores that are dependent on present mutations; however, these transducers are complicated to construct, and the results are in general very large automata (i.e. many additional states and edges are required to represent alternative paths that go along preexisting mutations). Of course, large transducers increase the required computation time (and the I/O overhead).

[Figure 6.16, panels: FDO ALL, FDO NRTI, FDO NNRTI, FDO PI. x-axis: mutations. Regimens in the legend: 3TC+LPV+TDF, 3TC+LPV+ZDV, 3TC+EFV+TDF, 3TC+EFV+ZDV]
Figure 6.16: Change of future drug options (FDO) as response to four typical first-line treatments. The panels depict the FDO for all drugs (top left), NRTIs (top right), NNRTIs (bottom left), and PIs (bottom right).

Another method that takes preexisting mutations into account is the approach introduced by Deforche et al.
(2008) for estimating in vivo viral fitness landscapes. Here, a Bayesian network model for an individual drug is trained on viral sequences that were obtained during treatment with that drug. This network is used to capture interactions between mutations. Furthermore, in a second training phase the conditional probabilities of the Bayesian network are replaced with conditional fitness contributions of each mutation. This fitness function is applied together with a simple finite ideal Wright-Fisher population model (Wright, 1931) for simulating the resistance development in the viral population during anti-HIV treatment. However, the approach has to model and simulate the resistance development for each viral variant in the population (Deforche et al. (2008) used a population size of 10^4), and is therefore unlikely to meet the computational demands of an interactive system. The major problem for all approaches is the estimation of mutation models for drug combinations instead of individual drugs. For instance, longitudinal in vivo data is usually scarce (see Figure 6.8); the available data decreases even further when one is restricted to a single drug combination. Thus, constructing models for individual drugs is likely to achieve more robust models. Estimation of such models from in vivo data, however, is problematic since drugs are given in combination and the causal relationship between individual drugs and emerging mutations is hard to infer. In the end, the most stable approach is the estimation of mutation models for individual drugs from in vitro data followed by a combination approach to realize models for drug combinations. Finally, another possibility for finding sequences of beneficial drug combinations is to omit the explicit resistance development in the viral population and directly learn from the sequences of combination therapies in the patients’ pasts.
For instance, one can focus on the success of a treatment given the N − 1 preceding combinations. We recently investigated such an N-gram graph of therapy sequences as a potential encoding for a patient’s therapy history (Müller, 2008). Unfortunately, only few treatment sequences occurred sufficiently often. However, it cannot be excluded that with the significantly larger EuResist integrated database, which is available today, and a more elaborate statistical learning method (or encoding of therapy sequences) information about beneficial orders of drug combinations can be extracted without explicitly simulating the viral population.

7 A Solution to the HIV Pandemic: A Vaccine

Current treatment options do not provide a cure for HIV infection. Hence, the ultimate solution to the HIV pandemic is to prevent further infections. Vaccines are an established instrument for preventing infections, and therefore the development of a successful HIV vaccine is of major interest today. In this chapter we will present a bioinformatics approach to aid the search for immunogens that elicit broadly neutralizing antibodies against HIV. In Section 7.1 we will very briefly summarize the major elements of the immune system and the molecular basis of immunity conferred by vaccination. This is followed by a short overview on HIV vaccine research, including a focus on the viral spike (gp160), which is the target for neutralizing antibodies (Section 7.2). Section 7.3 focuses on predicting antibody-mediated neutralization from gp160 genotype based on statistical models. These models are also used to identify features of gp160 that increase susceptibility to neutralizing antibodies. This knowledge will be used to find gp160 variants that can be used as immunogens in a successful HIV vaccine. Section 7.4 concludes the chapter with an outlook.

7.1 The Immune System and Vaccines

In the following we provide a brief overview on the immune system based on DeFranco et al. (2007).
The immune system of an organism is organized as a layered defense. The first layer comprises physical barriers (e.g. shells, skin) and mechanical mechanisms (e.g. coughing, sneezing). Once a pathogen manages to surmount the first line of defense, it faces mechanisms of the innate immune system and the adaptive immune system, where the latter is a characteristic of jawed vertebrates.

Innate Immune System

The cells and molecules of the innate immune system recognize features that are conserved across broad groups of pathogens. This system reacts to pathogens in a generic (non-specific) way and it confers neither long-lasting immunity nor increased efficacy after previous exposure to a pathogen. The innate immune system includes different defense mechanisms: humoral and chemical barriers of the innate system include inflammation and the complement system. Inflammation is a process for recruiting immune cells to the sites of infection; it is produced by eicosanoids and cytokines that are released by injured or infected cells. The complement system consists of about 30 serum and membrane proteins that can trigger a variety of immune reactions by a biochemical signal cascade. Furthermore, the innate response depends on a range of leukocytes (white blood cells). A major subgroup of leukocytes comprises phagocytes, which act by engulfing particles or pathogens: the pathogen is trapped in a vesicle and subsequently “destroyed” by digestive enzymes. Dendritic cells and macrophages (a type of phagocyte) present parts of the digested pathogen on their surface. These antigens (antibody generators) bridge the gap to the adaptive immune system by triggering the response of T cells and B cells, which constitute the main building block of the adaptive immune system.

Adaptive Immune System

T cells and B cells belong to the group of lymphocytes, a special type of leukocyte.
T cells are involved in a cell-mediated immune response, whereas B cells are part of the humoral response. Both T and B cells carry receptor molecules that recognize only specific targets, and each cell produces only one type of antigen-recognizing receptor. T cells are divided into two subgroups: killer or cytotoxic T cells and helper T cells. T cells only respond to antigens that are bound to a protein of the major histocompatibility complex (MHC): killer T cells (CD8+ T cells) require antigens coupled to Class I MHC molecules, whereas helper T cells (CD4+ T cells) rely on Class II MHC molecule-antigen complexes. As their name suggests, cytotoxic T cells release substances that are toxic to cells (cytotoxins), and thereby directly kill infected (e.g. by viruses) and damaged or dysfunctional (e.g. cancer) cells. Helper T cells activate other cells of the immune system and thereby regulate both the innate and adaptive immune responses. For instance, helper T cells stimulate killer T cells and enhance the microbicidal function of macrophages. Moreover, helper T cells also activate B cells to produce neutralizing antibodies. Both helper and killer T cells originate from naïve T cells, and the required proliferation into effector T cells is mediated by the dendritic cells from the innate immune system. Naïve B cells carry B cell antigen receptors (BCR) on their surface that contain immunoglobulins (antibodies). Once the BCR binds to its specific antigen (e.g. a virus surface protein) the complex undergoes endocytosis. The antigen is cleaved into small peptides that are presented by Class II MHC molecules on the surface of the cell. After recognition by a helper T cell, the B cell becomes activated and undergoes differentiation into an effector B cell that secretes copies of exactly the antibody that recognized the antigen.
Antibodies can directly inactivate (neutralize) pathogens by binding to the surface proteins that the pathogens use to infect target cells, or to toxins secreted by bacteria. In addition, bound antibodies mark their target for destruction by the innate immune system, i.e. complement activation or processing by phagocytes. In principle, only antigens that can be recognized by the preexisting pool of B cells can trigger an immune response. However, on successful activation, the antibodies produced by B cells undergo a process termed affinity maturation in which somatic hypermutations lead to modified binding affinities of the antibody to the antigen, and antibodies with higher affinity are selected. In contrast to the innate immune system, the adaptive immune system maintains specific cells targeting specific pathogens, or more precisely: specific antigens. Moreover, an immunologic memory is established by the offspring of T and B cells that begin to replicate after activation. These memory cells can survive for decades, and therefore the immune system is prepared for a challenge by the same pathogen or, more precisely, by pathogens presenting the same antigens.

The aim of a vaccine is to induce immunity against a pathogen, that is, to invoke a strong immune response against the pathogen upon recognition. While the main goal of a vaccine is to achieve sterilizing immunity (complete protection from infection), it is often sufficient to substantially reduce the amount of active pathogens in the host and therefore limit disease-related symptoms. Protection against a pathogen is elicited by challenging the adaptive immune system with adequate antigens that are able to trigger a response by the immune system and consequently establish the immunologic memory in the form of T and B cells.
An immune response can be triggered by mild versions of the pathogen that usually do not cause the disease (live attenuated vaccines), by killed pathogens (inactivated vaccines), or by parts of the pathogens (compound vaccines) (Plotkin, 2005). The general idea is to mimic an attack by the pathogen and thereby create the immunity that would be naturally established after the real challenge. In general, the more similar the immunogen comprising the vaccine is to the real pathogen, the better and longer lasting is the protection by the vaccine. Hence, live attenuated vaccines are more effective than inactivated vaccines, which, in turn, are more effective than compound vaccines (Lambert et al., 2005). For instance, multiple shots of the new hepatitis B vaccine (a compound vaccine) are required to establish immunity. Inactivated and compound vaccines, however, are safer than live attenuated vaccines, since they are incapable of causing the disease. Most of today’s successful vaccines rely on the establishment of memory B cells and therefore on the rapid generation of matching neutralizing antibodies (nAbs) upon a challenge by the pathogen. Vaccines are a major instrument for controlling diseases, and since the introduction of the first vaccine against smallpox in 1798, smallpox has been eradicated and other pathogens have almost disappeared. Today, vaccination is extended from infectious to noninfectious diseases (immunotherapy). Among the infectious diseases HIV, malaria, and tuberculosis are still of leading interest, while research in the area of noninfectious diseases currently focuses on various types of cancer, autoimmune diseases (e.g. multiple sclerosis), atherosclerosis, and Alzheimer’s disease (Plotkin, 2005).

7.2 Vaccines and HIV

After identification of the pathogen that causes AIDS, researchers expected to have a vaccine within a few years. Indeed, on April 23, 1984, Margaret Heckler (then U.S.
Health Secretary) proclaimed publicly that a vaccine candidate would be available for testing within two years (Walker and Burton, 2008). Now, 28 years after the discovery of HIV there is still no vaccine. Even worse, there is not even a promising candidate in close sight. The challenges that researchers are facing during the development of an HIV vaccine are manifold: First, HIV is extremely diverse, which makes it hard to identify conserved epitopes, and due to the error-prone replication process HIV can quickly evade the immune response by mutating epitopes that are recognized by the adaptive immune system. Second, a characteristic of HIV is the unique choice of target cells: the virus infects cells of the immune system, in particular CD4+ helper T cells that coordinate the immune response. Consequently, the virus slowly and steadily destroys its natural antagonist. Third, HIV has evolved mechanisms to counter immune responses, e.g. the accessory protein Nef down-regulates the synthesis of MHC molecules, which are crucial for the identification of infected cells by T cells (Yang et al., 2002), and the surface envelope proteins exhibit multiple features for successfully avoiding antibody recognition. A further complication for vaccine design results from the fact that HIV integrates its genetic information into the host genome rapidly after infecting a cell. When the provirus is in its latent state, the cell shows no signs of infection and is immunologically silent and thus unrecognized by the immune system. This way the virus can survive for decades in memory cells of the immune system, its latent viral reservoir. All these unique features of HIV lead to the following requirements for a successful vaccine: recognition of an array of diverse viruses and fast response to the challenge in order to prevent the establishment of viral reservoirs.
If the vaccine fails to prevent the latter, it has at least to assist the natural immune response in preventing the destruction of CD4+ cells during the acute phase (Walker and Burton, 2008). This would enable the patient to better control the infection, i.e. maintain a low viremia, and thereby substantially delay progression to AIDS and, in addition, reduce the risk of transmission to further individuals. Because cells present fragments of proteins they produce on their surface via their MHC molecules, HIV vaccines aiming at eliciting T cell-mediated immunity can (in theory) be based on any of the viral proteins. Vaccines that rely on eliciting neutralizing antibodies, on the other hand, can only target the viral spike (gp160). As mentioned earlier (Section 2.2) the viral spike comprises two subunits, gp41 (transmembrane) and gp120, that occur in trimeric form (Figure 7.1 a) (Liu et al., 2008). During cell entry, the glycoprotein gp120 binds to the CD4 receptor and the coreceptors, while gp41 is responsible for membrane fusion. The viral spike is heavily shielded against the host immune system by multiple mechanisms. For instance, the glycoprotein, as its name suggests, is densely covered with sugar molecules that shield conserved epitopes against the adaptive immune response. The subunit gp120 contains five variable regions (V1-V5) that are interspersed with conserved regions. These five variable loops exhibit a very high sequence diversity and provide further protection of the conserved parts of gp120 against immune recognition. The variable surface region of gp120 is also referred to as the “silent face” because only few antibodies are elicited against it (Karlsson Hedestam et al., 2008). Figure 7.1 b) shows the core of the gp120 protein in contact with the CD4 receptor. The structure of the viral spike itself also protects it against the immune response.
Conserved regions in gp41, for instance, are inaccessible, simply because there is too little space between gp120 and the viral membrane to allow for binding of the B cell receptor. A more advanced protection mechanism is termed conformational masking. Here, conserved epitopes are only accessible after conformational changes in the protein. For instance, the coreceptor binding site of gp120 is only accessible after the conformational change induced by CD4 binding (Douek et al., 2006). Upon infection, a race between HIV and the adaptive immune system commences. Unfortunately, the virus is always one step ahead. More precisely, the adaptive immune system inflicts selective pressure on the evolution of HIV, which drives HIV to evolve into variants that cannot be neutralized by the available array of neutralizing antibodies. Richman et al. (2003) studied blood sera and viruses from patients at different time points. Briefly, their analysis showed that the blood serum at time point t is very well capable of neutralizing older viral variants, i.e. obtained at time point t − 1 and earlier. However, the sera are only insufficiently capable or incapable of neutralizing the current or future variants, respectively. Thus, antibodies typically generated during a chronic HIV infection cannot be used as basis for a vaccine. Fortunately, some chronically infected individuals elicit antibodies that are capable of neutralizing a broad range of HIV variants. These broadly neutralizing antibodies are of major interest for the design of a successful HIV vaccine because they provide the means of establishing sterilizing immunity. In addition, currently existing neutralizing antibodies demonstrate that the viral spike has a few weak spots that can be exploited by rational vaccine design. This knowledge can help to develop an immunogen that elicits broadly neutralizing antibodies (bnAbs) in vivo.

[Figure 7.1, panels: (a) The viral spike; (b) Structure of the gp120 core]
Figure 7.1: A schematic representation of the HIV-1 envelope glycoprotein trimer in contact with the CD4 receptor (red) and the coreceptor (green) of the host cell (a). The trimer comprises three copies of the transmembrane glycoprotein gp41 (brown), the three copies of the surface glycoprotein gp120 (blue) are attached to gp41. Figure reprinted from Phogat and Wyatt (2007) with kind permission from Bentham Science Publishers. Structure of core gp120 (purple) in complex with the CD4 receptor (orange) (b). The figure was rendered with PyMOL on the basis of PDB entry 2nxy. Estimated position of the five loops is given in blue font.

Until now, a number of broadly neutralizing antibodies have been identified in chronically infected individuals (Burton et al., 2004) and the search for new bnAbs is still going on (Walker et al., 2009). One of the first bnAbs discovered, which is called b12, targets an epitope that substantially overlaps with the conserved CD4 binding site on gp120 (Burton et al., 1994). Consequently, upon binding to gp120 the nAb prevents CD4 attachment, and thus the conformational change that is a prerequisite for successful entry of HIV into the target cell. The nAbs 2F5 and 4E10 recognize conserved linear epitopes on gp41. It is expected that these nAbs act like fusion inhibitors, i.e. they exhibit their neutralizing effect by preventing the conformational change in gp41 leading to fusion of viral and cell membranes. Figure 7.2 depicts the viral spike and epitopes recognized by some bnAbs. Recently, Trkola et al. (2008) demonstrated the efficacy of bnAbs in vivo with the aim to determine clinically relevant titers of the antibodies.

HIV Vaccine Candidates

So far, there have been a few vaccine candidates that were tested in clinical trials. The most prominent examples are AIDSVAX and the STEP study.
Within STEP (launched in 2004) a candidate vaccine using a replication-deficient adenovirus type 5 vector that expresses some HIV clade (subtype) B proteins (Gag, Pol, Nef) was investigated. As is clear from the list of HIV proteins involved, the candidate was targeted to elicit T cell mediated immunity. The clinical trial, however, was stopped in 2007 by the Data and Safety Monitoring Board. The reason for the premature end of the trial was that individuals who received the vaccine exhibited an enhanced rate of HIV-1 acquisition compared to people in the control group (Barouch, 2008). This, of course, was an immense setback for HIV vaccine research.

Figure 7.2: Structure of the envelope trimer with highlighted epitopes of some broadly neutralizing antibodies: 2F5 (red), 4E10 (yellow), b12 (grey), and 17b (orange). Figure reprinted from Phogat and Wyatt (2007) with kind permission from Bentham Science Publishers.

The AIDSVAX vaccine candidate developed by the company VaxGen uses monomeric gp120 to elicit a humoral immune response, i.e. neutralizing antibodies. The candidate managed to elicit a type-specific antibody response. A protective effect, however, could not be detected in two clinical trials in 2005 and 2006 (Barouch, 2008). Recently, AIDSVAX received further media coverage when, in September 2009, researchers reported a success in the search for an HIV vaccine. In a study started in October 2003 in Thailand, about 16,000 HIV-negative men and women received a combination of the two earlier tested vaccine candidates AIDSVAX and ALVAC-HIV. ALVAC-HIV uses a canarypox vector that, like the candidate in the STEP trial, expresses HIV (poly-)proteins (here: Gag, Env, PR). At the endpoint of the study, 76 participants from the control group were infected with HIV, compared to 56 of the vaccinated participants. Hence, the vaccine's efficacy was 26.4%.
This result, however, did not reach statistical significance. Statistical significance and a vaccine efficacy of 31.2% were achieved after excluding seven individuals that were found to have HIV at baseline (Rerks-Ngarm et al., 2009). Although a glimmer of hope, the protection conferred by the vaccine is only marginal and the analysis of the trial is regarded as a "data crunch" for achieving statistical significance (footnote 1).

Footnote 1: http://www.nature.com/news/2009/091021/full/news.2009.1035.html

Despite the immense promise given by bnAbs, it has not yet been possible to identify an immunogen that elicits bnAbs in vivo. With the work presented in the following we investigate what determines the susceptibility of an Env variant to three of the available bnAbs.

7.3 Predicting Neutralization from Genotype

This section describes joint work with Fabian Müller, Oliver Sander, Thomas Lengauer, Klaus Überla, and Francisco Domingues. Here, using an array of statistical learning methods, we investigate the genotype-phenotype relationship between the Env genotype and the neutralization phenotype of 147 viral variants with three different bnAbs. The experimental assay for determining the neutralization phenotype is very similar to the approach used for assessing in vitro drug resistance phenotypes (Section 2.5) and is thus not described in further detail (Richman et al. (2003), for example, used an adapted version of the original drug resistance assay developed by Petropoulos et al. (2000)). We selected statistical learning methods that allow for assessing feature importance, and we analyze statistically relevant positions in terms of susceptibility to neutralization by bnAbs. Finally, based on the knowledge derived from interpreting the statistical learning methods, we propose in vitro experiments for experimental validation of our model.
The focus of our investigation is the antibody b12, which targets an epitope that largely overlaps with the CD4 binding site. The model will be used to optimize an immunogen in an HIV vaccine development project.

7.3.1 Material and Methods

Data

The data for this study originate from four publications. Binley et al. (2004) studied the neutralization phenotype regarding a panel of nAbs of 90 viral variants from different clades, Li et al. (2005, 2006) analyzed 19 and 18 variants from clade B and C, respectively, and Schweighardt et al. (2007) examined another 20 variants from subtype B. The datasets are named according to their origin and subtype of focus (Binley, Li.B, Li.C, and Schweighardt.B). Due to an overlap of viral variants studied in Binley and Li.B, five of the total 147 samples were excluded from the analysis. All publications provided IC50 values for the three nAbs b12, 2F5, and 4E10. The continuous neutralization phenotypes were dichotomized to "neutralizing" and "non-neutralizing" using an antibody-dependent cutoff; IC50 values below the cutoff indicate neutralization by the antibody. For b12 and 2F5 the cutoff selection was straightforward, since a large fraction of the samples was clearly not susceptible to the antibody (values ">50" and ">25" of mg antibody/ml, respectively), and variants with intermediate susceptibility were rare. The neutralization effect of 4E10, however, was so broad that only very few viruses were completely resistant. Thus, a cutoff between very susceptible and intermediate susceptible isolates was selected based on the distribution of IC50 values. Figure 7.3 a) depicts the distribution of IC50 values for all three bnAbs and the applied cutoffs, and Figure 7.3 b) shows the distribution of the binary neutralization phenotype within each clade in the dataset. The b12 phenotype data for clades B, C, and D is almost balanced, while other clades (represented by a few isolates only) are mainly non-neutralizing.
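The dichotomization described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names and the cutoff passed in the usage example are hypothetical, and the handling of censored entries such as ">50" (treated as non-neutralizing when the bound is at or above the cutoff) is an assumption made for illustration.

```python
def parse_ic50(value):
    """Split an IC50 entry such as '3.2' or '>50' into (number, censored)."""
    text = value.strip()
    if text.startswith(">"):
        return float(text[1:]), True
    return float(text), False


def is_neutralized(value, cutoff):
    """Return True if the IC50 lies below the antibody-specific cutoff.

    Assumption: a censored value ('>x') is non-neutralizing when x >= cutoff;
    if x < cutoff the label cannot be decided and an error is raised.
    """
    ic50, censored = parse_ic50(value)
    if censored:
        if ic50 >= cutoff:
            return False
        raise ValueError("censored value below cutoff: label is ambiguous")
    return ic50 < cutoff


# Hypothetical cutoff, chosen only to make the example concrete.
labels = [is_neutralized(v, cutoff=50.0) for v in ["0.4", "12", ">50"]]
```

Values exactly at the cutoff are labeled non-neutralizing here, which matches the statement that IC50 values below the cutoff indicate neutralization.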
[Figure 7.3, panel (a) IC50 distribution: histograms for b12, 2F5, and 4E10; x-axis: IC50 (mg antibody/ml), y-axis: frequency. Panel (b) Clade distribution: number of neutralizing and non-neutralizing samples per clade (A, AC, AE, AG, B, BF, BG, C, D, F, G, J).]

Figure 7.3: Histograms of IC50 values for antibodies b12, 2F5, and 4E10 (a). Origin of the data is color-coded: Binley (blue), Li.B (light green), Li.C (dark green), Schweighardt.B (orange). Cutoffs are depicted by red dashed lines. Distribution of the binary neutralization phenotype within each clade (b). Left, middle, and right bars correspond to the b12, 2F5, and 4E10 phenotype, respectively.

Amino-acid sequence data for the 142 Env variants were obtained from GenBank using the accession numbers listed in the four publications. The Env gene comprises the signal peptide, gp120, and gp41. The sequences were aligned using MUSCLE (Edgar, 2004) according to a guide tree based on the HIV phylogenetic tree published by Leitner et al. (2005). The sequence of HXB2 was added to the multiple sequence alignment as an alignment independent reference. For details on the sequence processing please refer to the Master thesis of Fabian Müller (Müller, 2009). The final alignment comprised 975 amino acid positions.

Feature Encodings

The amino acids at each of the 975 alignment positions were encoded in two different ways. The categorical encoding was used for statistical methods that can operate on categorical data.
Here, each alignment position was represented as a single character: one of the 20 amino acids, a gap symbol (-), symbols for uncertain amino acids (B and Z, representing presence of asparagine or aspartic acid and glutamine or glutamic acid, respectively), or a symbol representing any amino acid (X). Hence, each viral variant was represented by one feature for each alignment position. The binary encoding was input to support vector machines (SVMs), which are unable to operate on categorical features with more than two categories. Consequently, we used the same encoding as in geno2pheno (Section 2.5.2), i.e. we encoded an amino acid sequence using binary indicator variables. More precisely, one indicator variable was used for each possible symbol at each alignment position. Thus, a sequence was represented as a binary vector of length 24 × 975. We refer to the categorical encoding and the binary encoding as the sequence encoding.

A property that is only implicitly captured by the sequence encoding is the location of potential N-linked glycosylation sites. As stated before, sugar molecules attached to the surface of gp120 mask conserved epitopes from recognition by the immune system. The glycosylation process is carried out by the host cell at asparagines (N) that are followed by any amino acid (X) and then serine (S) or threonine (T), i.e. the typical NX[S,T] pattern of N-linked glycosylation. There were 268 positions in the alignment showing at least one N among all sequences. Of those, 148 fulfilled the NX[S,T] criterion in at least one Env variant. The respective 148 binary features constitute the glycosylation encoding, which was added to the sequence encoding (glyco).

Apart from regions of high sequence variability, the multiple sequence alignment was very stable.
Unfortunately, regions of high sequence variability comprise exactly the five variable loops that are believed to play an important role in determining antibody neutralization (Wyatt and Sodroski, 1998; Pantophlet and Burton, 2006). In order to further investigate the influence of variable loops on the ability to predict neutralization by antibodies, they were completely removed from the sequence encoding (no loops). In addition, we used sequence encodings that maintained sequence information on exactly one variable loop, for all five loops (only Vx ). Furthermore, as a replacement for the sequence information, we encoded the five loops using alignment-independent features meant to capture physicochemical properties, which can be derived from the sequence information. More precisely, for variable loop features we used the molecular weight (sum of all molecular weights of residues in the loop; one continuous feature per loop), the charge (number of positively and negatively charged amino acids or the overall charge; three integer features per loop), the length (number of residues in the loop, one integer variable per loop), the hydropathy (sum of all hydropathy indices of the residues (Kyte and Doolittle, 1982); one continuous feature per loop), the number of potential hydrogen bonds (absolute number of residues potentially participating in hydrogen bonds, and sum of all donors and acceptors; two integer values per loop), the N-linked glycosylation (number of potential glycosylation sites; one integer per loop), the fold index (based on Prilusky et al. (2005); one continuous variable per loop), and simply the abundance of each amino acid in the loop (20 integer values per loop). These variable loop features were added to the no loops encoding (loop feat). Finally, we also investigated subsets of the sequence encoding that were restricted to the three parts of Env : the signal peptide (signal ), gp120, and gp41. 
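As an illustration of the binary sequence encoding and the glycosylation encoding described above, the following Python sketch may be helpful. It is a hedged sketch and not the authors' implementation: the function names and the ordering of the 24 symbols are hypothetical, and skipping alignment gaps when matching the NX[S,T] pattern is an assumption about how aligned sequences would be handled.

```python
# 20 amino acids, gap, B, Z, X: the 24 symbols named in the text
# (the ordering is arbitrary and chosen for illustration).
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY-BZX"


def binary_encode(aligned_seq):
    """One 0/1 indicator per (alignment position, symbol) pair."""
    vector = []
    for residue in aligned_seq:
        vector.extend(1 if residue == symbol else 0 for symbol in SYMBOLS)
    return vector


def glycosylation_sites(aligned_seq):
    """0-based alignment indices of N residues in an N-X-[S/T] pattern.

    Assumption: gap characters are skipped, so the pattern is matched on
    the ungapped sequence while alignment coordinates are reported.
    X is taken as any amino acid, as stated in the text.
    """
    residues = [(i, r) for i, r in enumerate(aligned_seq) if r != "-"]
    sites = []
    for k in range(len(residues) - 2):
        index, first = residues[k]
        third = residues[k + 2][1]
        if first == "N" and third in "ST":
            sites.append(index)
    return sites


encoded = binary_encode("N-")                 # 2 positions -> 48 indicators
motif_positions = glycosylation_sites("AN-GTC")
```

With 975 alignment positions this layout yields the 24 × 975 indicator variables mentioned above; one binary glycosylation feature per candidate position would then be set to 1 whenever that position appears in the list returned by `glycosylation_sites`.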
All applied encodings are briefly summarized in Table 7.1.

encoding | description
sequence | comprises sequence information of the Env gene: for the RF, amino acid sequences were encoded using categorical features; for the SVM, sequences were represented by a binary encoding
glyco | sequence encoding and additional 148 binary features indicating potential glycosylation sites
no loops | sequence information removed from all variable loops
loop feat | sequence information removed from all variable loops; instead loops are represented by physicochemical properties (see text)
only Vx | sequence information removed from all variable loops, but Vx, with x ∈ {1, ..., 5}
signal | sequence encoding restricted to signal peptide
gp120 | sequence encoding restricted to gp120
gp41 | sequence encoding restricted to gp41

Table 7.1: List of feature encodings. The features are briefly summarized, for a full description see the text.

Statistical Learning Methods

We employed two statistical learning methods for predicting the binary neutralization phenotype of viral Env variants. The Random Forests (RF) method (Breiman, 2001) trains B decision trees on B bootstrap replicates of the training data. At each node, only m randomly chosen variables of all p variables are considered for the split. The variable that reduces the impurity of the labels the most is selected for splitting the node. In RF the impurity is measured using the Gini index $\sum_k p_k (1 - p_k)$, where $k$ and $p_k$ are the class and the frequency of that class at the node to be split, respectively. All B trees are fully grown and left unpruned. For the two parameters of the method, the number of features checked at each split (m) and the number of trees (B), we chose the standard values $\lfloor \sqrt{p} \rfloor$ and 500, respectively. For the RF classifier the categorical sequence encoding was used since RF can operate on categorical variables. As second classifier we applied linear support vector machines (SVMs).
For the parameter C, which penalizes misclassified samples in the training set, values of $2^{-7}, 2^{-6}, \ldots, 2^{2}$ were tested in a 10-fold cross-validation. The choice of the parameter did not influence the performance substantially, and C was therefore set to 1. Since linear SVMs do not support categorical data, the binary sequence encoding was applied. For achieving optimal performance, constant features were removed, and the built-in scaling option for features in the libSVM implementation (Chang and Lin, 2001) was used.

The classification performance for both methods was assessed using the area under the ROC curve (AUC) in 10 repetitions of five-fold cross-validation. Both classifiers were used together with all feature encodings for predicting neutralization by all three antibodies. Statistical significance is assessed using a one-sided Wilcoxon test on the 10 mean AUCs obtained in the five-fold cross-validation runs.

Measures of Variable Importance

We used Mutual Information (MI) and p-values from Fisher's exact test (Fisher, 1922) as univariate measures of feature importance. Both measures were applied only to the categorical sequence encoding. Furthermore, we used the mean decrease in Gini index (MDG) extracted from the RF models and z-scores of feature weights derived from the linear SVM (Section 3.1) as multivariate measures of feature importance. Briefly, the MDG of a feature is the decrease in Gini index observed when a node was split using that feature, averaged over all B trees in the forest. Due to the binary encoding of the sequence for the SVM, the feature importance (z-scores) was provided for each amino acid at each alignment position. In order to make the measure comparable to the measures provided by the other three methods, which provide one value for each alignment position, only the maximum z-score at each alignment position was used as a measure of feature importance.
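To make the Gini-based quantities concrete, here is a small Python sketch of the Gini index of a node and of the impurity decrease achieved by one split. It is an illustration only: the function names are hypothetical, and weighting the child impurities by their share of the samples is the common convention, assumed here rather than taken from the RF implementation used in the study. The MDG of a feature would then be obtained by collecting such decreases over all nodes that split on the feature and averaging over the trees.

```python
from collections import Counter


def gini(labels):
    """Gini index sum_k p_k * (1 - p_k) of the class labels at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1.0 - c / n) for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# A perfectly separating split removes all impurity of a balanced node,
# while a split that leaves both children balanced removes none.
perfect = gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])
useless = gini_decrease([0, 0, 1, 1], [0, 1], [0, 1])
```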
As recently described, the MDG is biased towards features with a high number of categories (Strobl et al., 2007), e.g. an uninformative variable with 10 categories receives a higher MDG than an uninformative variable with two categories. Consequently, informative features with few categories may go unnoticed among uninformative features with many categories. This bias is of particular interest for our problem, since alignment positions in unstable regions of the alignment show a higher number of different amino acids, i.e. categories, than positions in stable regions. Thus, MDG values for positions located in the variable loops are likely to be boosted by this artifact. In order to correct for this bias, we developed a heuristic based on permutations. Briefly, the response vector is randomly permuted k times, and for each permutation the RF classifier is trained and the MDG for each variable is computed. Mean µ and standard deviation σ of the MDG for one feature are estimated from the k models. Under the assumption that the MDG of an uninformative variable is normally distributed with mean µ and standard deviation σ, the probability of observing an MDG at least as large as the one obtained with the original response vector serves as a p-value. We termed this approach permutation importance (PImp; Altmann et al., 2010). For correcting the MDG importance, we used 1000 permutations of the original response vector. For MI and the combined z-scores we observed the same bias as for MDG (Müller, 2009), and consequently, we applied PImp for correcting it. For both MI and SVM-based z-scores the PImp p-value was based on 1000 iterations. All measures of variable importance were applied to the complete set of 142 training samples, i.e. they were not derived from cross-validation models.

In vitro Validation

In order to experimentally validate our b12 prediction model, site-directed mutagenesis is performed to confirm important positions derived from the feature importance analysis.
To this end, the b12 neutralization phenotype is measured before (original viral variant) and after the introduction of up to three mutations. In total, 18 sets of mutations were introduced into 10 different viral variants.

7.3.2 Results

Classification Performance

Table 7.2 lists the mean AUC and its standard deviation assessed in 10 repetitions of 5-fold cross-validation. Based on sequence information alone both classifiers achieve an AUC of around 0.8 for predicting the b12 neutralization phenotype. Adding information on putative glycosylation sites slightly decreased the performance for both methods. Removal of the sequence information from the five variable loops led to a substantial and significant decrease in performance for the RF (p = 0.00098) and the SVM model (p = 0.00195). Replacement of the sequence information of the variable loops with the variable loop features led to a further decrease in performance. Maintaining the sequence information of the V2 loop resulted in re-establishment (and even slight improvement) of the performance obtained with the complete sequence. Maintaining one of the other variable regions did not improve performance substantially over the model without any variable loop information (data not shown). The model based on the gp120 region alone achieved a performance comparable to the sequence encoding, while models based on the signal peptide or gp41 alone were clearly inferior (both p = 0.00098). For most encodings (except "signal" and "gp120") the SVM outperformed the RF model significantly (p < 0.03223).

RF performs significantly better than the linear SVM (p < 0.00977) for the prediction of neutralization by nAb 2F5. With AUCs above 0.9 the performance for the RF models is better than that for the b12 model.
Addition of glycosylation features resulted in a slight decrease in performance, while removal of variable loop information, unlike in the case of b12, resulted in a slight increase. The best performance was achieved for both learners when the sequence information was restricted to gp41. For both methods the model restricted to gp41 sequence information performed significantly better than the model based on the sequence encoding (p < 0.004883).

With values around 0.72 the performance of 4E10 neutralization prediction is slightly worse than the performance observed for the b12 models. Here, addition of glycosylation and removal of the variable regions slightly increased the model performance. As in the case of 2F5, the best models were obtained when the information was restricted to the target of the nAb: gp41. The improvement reached statistical significance only for the RF classifier (p = 0.001953).

Variable Importance

Figure 7.4 a) depicts the raw MDG importance extracted from the RF model predicting b12 neutralization. A large number of alignment positions within the variable loops (primarily V1, V2, and V4) received high MDG values. The PImp correction of the MDG (Figure 7.4 b), however, yields a different picture. Here, most of the variable loops do not contain any information. The exception is V2, which contains four of the top 20 most important positions. In general, according to this model, only few positions are linked to the b12 neutralization phenotype. Surprisingly, among the top 20, two important positions are located in the signal peptide, and another five are in gp41. One of the positions in the signal peptide (5b; an insertion compared to HXB2) is indeed the top-most important position. The second most important position is located in the b12 binding site.
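The PImp correction referred to above (introduced under Measures of Variable Importance) can be sketched in a few lines of Python. This is a simplified, self-contained illustration and not the published implementation: it uses mutual information as the importance measure so that no RF library is needed, it permutes the response separately for each feature, and the function names, the `n_perm` and `seed` parameters, and the handling of a zero standard deviation are assumptions. The study used 1000 permutations.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, mean, stdev


def mutual_information(feature, labels):
    """Mutual information (in bits) between a categorical feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx, fy = Counter(feature), Counter(labels)
    return sum((c / n) * math.log2(c * n / (fx[x] * fy[y]))
               for (x, y), c in joint.items())


def pimp_p_value(feature, labels, importance=mutual_information,
                 n_perm=1000, seed=0):
    """P-value of the observed importance under a normal null distribution.

    The null distribution is estimated by recomputing the importance on
    randomly permuted response vectors.
    """
    rng = random.Random(seed)
    observed = importance(feature, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(importance(feature, shuffled))
    mu, sigma = mean(null), stdev(null)
    if sigma == 0.0:  # degenerate null, e.g. a constant feature
        return 1.0 if observed <= mu else 0.0
    return 1.0 - NormalDist(mu, sigma).cdf(observed)


# Toy example: a position whose residue separates the two phenotypes perfectly.
residues = ["A"] * 20 + ["G"] * 20
phenotype = [1] * 20 + [0] * 20
p_informative = pimp_p_value(residues, phenotype, n_perm=200)
```

Plotting the negative base-10 logarithm of such p-values per alignment position gives a quantity analogous to the one shown in Figure 7.4 b).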
[Figure 7.4, three panels plotted against alignment position (0 to 1000): (a) MDG, (b) PImp MDG, shown as -log10(MDG.pimp), (c) consensus. The top-ranked positions are labeled in the plots.]

Figure 7.4: Feature importance derived from models that predict b12 neutralization: RF MDG (a), PImp of MDG (b), consensus of all four methods (c). The regions of Env (signal peptide, gp120, and gp41) are separated by vertical dashed lines. The variable regions within gp120 are highlighted in the background orange (V1), green (V2), beige (V3), light blue (V4), and yellow (V5). Furthermore, known epitopes are highlighted with red (CD4 binding), blue (b12 binding), purple (CD4 and b12), orange (2F5 binding), and green (4E10 binding). For the top 20 important residues, the position in reference to HXB2 is provided.
encoding | b12 RF | b12 SVM | 2F5 RF | 2F5 SVM | 4E10 RF | 4E10 SVM
sequence | 0.791 (0.018) | 0.808 (0.020) | 0.914 (0.013) | 0.840 (0.012) | 0.717 (0.026) | 0.718 (0.018)
glyco | 0.782 (0.021) | 0.803 (0.020) | 0.905 (0.019) | 0.836 (0.014) | 0.724 (0.032) | 0.723 (0.020)
no loops | 0.765 (0.013) | 0.781 (0.019) | 0.927 (0.016) | 0.845 (0.015) | 0.736 (0.025) | 0.719 (0.022)
loop feat | 0.755 (0.015) | 0.773 (0.021) | 0.911 (0.017) | 0.849 (0.013) | 0.719 (0.023) | 0.710 (0.021)
only V2 | 0.801 (0.017) | 0.813 (0.019)* | 0.924 (0.011) | 0.829 (0.011) | 0.727 (0.028) | 0.708 (0.029)
signal | 0.708 (0.024) | 0.579 (0.031) | 0.766 (0.016) | 0.693 (0.021) | 0.621 (0.038) | 0.587 (0.037)
gp120 | 0.788 (0.015) | 0.792 (0.026) | 0.825 (0.028) | 0.807 (0.016) | 0.694 (0.023) | 0.680 (0.026)
gp41 | 0.734 (0.019) | 0.752 (0.020) | 0.956 (0.013)* | 0.866 (0.016) | 0.754 (0.019)* | 0.726 (0.028)

Table 7.2: Classification performance. The mean (sd) classification performance measured using AUC and assessed via 10 repetitions of 5-fold cross-validation for the two classifiers (RF and SVM) and the three different antibodies. Rows correspond to different feature sets (see Table 7.1 for a brief description). The best performance for each antibody is marked with an asterisk (bold font in the original).

Every measure for feature importance provided a slightly different picture (data not shown). Thus, for the final interpretation we will rely on a consensus of the four methods. The consensus (Figure 7.4 c) counts how often each position occurs among the top 20 most important positions (thus, only values of 0, ..., 4 are possible). Figure 7.5 depicts the location of important positions (i.e. appearing in two or more top 20 lists) on the gp120 structure (PDB ID 2nxy).
Of the eight positions with a consensus count of two or larger in gp120 and outside the variable loops, two cannot be mapped to the structure (HXB2 positions 47 and 63), two are located in the b12 binding site (positions 369 and 432), three surround the b12 binding site (positions 352, 360, and 460), and one is located on the side opposite the b12 binding site (position 446). Of note, position 446 is a potential glycosylation site, and mutations at this position might also influence the glycosylation status of position 444. Due to the trimeric structure of gp120 (see for instance Liu et al. (2008)), glycans attached at this position might influence binding of b12. Of note, two additional b12 binding sites appear among the top 20 list of one method (positions 372 (RF) and 386 (SVM)).

Figure 7.6 a) shows the PImp MDG from the RF 2F5 model. One position (665) in gp41 receives outstanding importance and is located in the 2F5 epitope (Zwick et al., 2001). Among the top 20, two additional positions (662 and 667) belonging to the linear epitope are discovered. Likewise, Figure 7.6 b) depicts the PImp MDG importance for the 4E10 model. Important positions are more scattered across the complete Env gene than in the case of 2F5. The most important position is located in gp41 and the second most important one between the variable loops V3 and V4. With position 674, one known 4E10 binding site position (Zwick et al., 2001) is ranked third by the corrected MDG measure.

In vitro Validation

Table 7.3 lists the sets of mutations that were selected for experimental validation. With pathways of three mutations we aim at observing a substantial change in the b12 neutralization phenotype. Mutations in the signal peptide and in gp41 are used to verify whether these positions are statistical artifacts or have biological impact. Further mutations are used to assess the effect of adding and removing potential glycosylation sites.
The remaining experiments shall clarify the effect of mutations close to the b12 binding site and within conserved parts of V1 and V2. For further information on the mutation selection process please refer to the Master thesis of Fabian Müller (Müller, 2009). The mutagenesis and neutralization experiments are currently being conducted at Prof. Dr. Klaus Überla's laboratory at the Ruhr-University in Bochum, Germany.

Figure 7.5: Important positions for b12 neutralization. Important positions were mapped to the structure of gp120 (PDB ID 2nxy). The right picture depicts the surface of b12 and CD4 binding sites, the left picture provides the location of the important positions on the HXB2 reference sequence.

[Figure 7.6, two panels plotted against alignment position (0 to 1000), y-axis labeled -log10(MDF.pimp): (a) PImp MDG 2F5, (b) PImp MDG 4E10. The top-ranked positions are labeled in the plots.]

Figure 7.6: Feature importance from models that predict 2F5 and 4E10 neutralization: PImp of RF MDG for 2F5 (a) and 4E10 (b). For further information see caption of Figure 7.4.

7.3.3 Discussion

We trained statistical learning models that predict the neutralization of viral variants by three nAbs based on the Env amino acid sequence. With AUC values from 0.717 to 0.914 all models achieved good to very good prediction performance using the sequence encoding. The removal of sequence information of the five variable loops resulted in a noticeable loss of performance for the RF model in the prediction of neutralization by b12.
The dependence on information on the variable loop V2 for predicting the b12 phenotype is confirmed by the re-established performance after providing the V2 sequence information and by the consensus variable importance (Figure 7.4). Here, several positions within V2 receive high importance values. The influence of the V2 loop on the neutralization efficacy by b12 was recognized soon after the discovery of b12 (Roben et al., 1994). Of note, the corrected MDG measure concurs better with the increase in performance due to the addition of single loops than the uncorrected MDG measure. The variable loop features derived from sequence information that aimed at capturing physicochemical properties of the loops failed to improve the model. In fact, inclusion of these features resulted in decreased performance with both statistical learning methods.

Several of the important sequence positions in the conserved region of gp120 were located either in the b12 binding site or structurally close to the b12 and CD4 binding site. The virus has to maintain the capability to bind the CD4 receptor, hence mutations in the CD4 binding site are unlikely. Consequently, substitutions near or at the b12 binding site, but not at the CD4 binding site, are likely to affect susceptibility to nAb b12. For other important positions that do not afford an obvious explanation, we will conduct in vitro neutralization experiments that will either confirm their importance or mark them as an artifact of the statistical methods.

The interpretation of the 2F5 prediction models revealed that decisions of the RF model were mainly based on a single position within the known 2F5 epitope. Likewise, the analysis of the 4E10 RF model reveals a known position in the binding epitope among the top three important positions. Given that the cutoff selection for 4E10 was not as straightforward as for the other two antibodies, the rediscovery of a known epitope is reassuring.
The agreement between the positions relevant for 2F5 and 4E10 neutralization and their known epitopes confirms the validity of the approach.

Unfortunately, the described method is susceptible to evolutionary bias. Here, evolutionary bias occurs in the form of two (or more) mutations, with one mutation having a causal link to the phenotype, and the other mutations merely being correlated to the first one due to emergence in a common ancestor or accumulation in the same lineage. Of course, the statistical learning models cannot discriminate between a causal link and a correlation to the phenotype. These correlations might explain the positions falsely assessed as being important by the method. Mutations that are located in the signal peptide or in gp41 and are supposed to be determinants of b12 binding are likely examples for this bias. For instance, position 5b in the signal peptide is highly correlated with position 352 in the proximity of the b12 binding site (Müller, 2009). Moreover, best performance was usually achieved when the sequence information was restricted to the Env region targeted by the nAb. Models based on the remaining regions that accommodate (potential) correlated mutations were clearly inferior. With such a restricted dataset a cautious interpretation of the results is required.

Also the clade membership is a strong indicator for this evolutionary bias. In order to study a less biased dataset we repeated the RF analysis of b12 neutralization prediction with a dataset restricted to sequences from clades B, C, and D that show a balanced neutralization phenotype. The dataset comprised 106 instances, and the AUC of the sequence encoding was 0.75±0.037, corresponding to a slight decrease in comparison to the full dataset. The qualitative findings, however, could be confirmed: performance decreased without information on variable loops (0.67±0.038), models restricted to the signal peptide (0.578±0.054) and gp41 (0.63±0.043) achieved the worst performance, and sequence information on the V2 loop is pivotal for prediction performance (0.764±0.028).

isolate | accession | mutations | expected effect | measured effect | comment
6535 | AY835438 | T132S, L134A | more resistant | | path of three mutations
6535 | AY835438 | T132S, L134A, P369L | more resistant | | path of three mutations
6535 | AY835438 | D386N | more resistant (SVM) | | add glycosylation; RF model shows no effect
ZM197M | DQ388515 | K360V, L369P | more susceptible | | path of three mutations
ZM197M | DQ388515 | K360V, L369P, R432K | more susceptible | | path of three mutations
ZM197M | DQ388515 | N460I | more susceptible | | remove glycosylation
CAP210.2.00.E8 | DQ435683 | S132T, A134L | more susceptible | | path of three mutations
CAP210.2.00.E8 | DQ435683 | S132T, A134L, L369P | more susceptible | | path of three mutations
Du172 | DQ411853 | V134A | more resistant | | mutations in V1, V2
Du172 | DQ411853 | V134A, D185N | more resistant | | mutations in V1, V2
Du172 | DQ411853 | K446T | more resistant | | close to two glycosylation sites
ZM233M | DQ388517 | T185D | more susceptible | | mutations in V1, V2
ZM233M | DQ388517 | T185D, M161T | more susceptible | | mutations in V1, V2
CAAN | AY835452 | N386D | more susceptible | | remove glycosylation
Du156 | DQ411852 | H352I | more resistant | | near binding site
ZM53M.PB12 | AY423984 | N462S | more susceptible | | remove glycosylation
THRO | AY835448 | K5bQ | more resistant | | mutation in signal peptide
TRJO | AY835448 | D750N | more susceptible (SVM) | | mutation in gp41, RF model shows no effect

Table 7.3: Site-directed mutagenesis experiments. The table lists the name of the isolate, its accession number, the mutations that have to be introduced (sequence position is given in reference to HXB2; capital letters before and after the position denote wild type amino acid found in the isolate and the mutated amino acid, respectively), the expected effect, the measured effect, and the effect that is to be tested (comment).
A drawback of this study was the fact that experimental data were pooled from four different publications using different assays. Different neutralization assays may show a different phenotype in combination with some or all nAbs (Fenyö et al., 2009). However, dichotomizing the neutralization phenotype to “neutralizing” and “non-neutralizing” using a cutoff is likely to remove most of the variation originating from different assays. Moreover, we did not use the misclassification rate as a performance measure, but the area under the ROC curve. The latter measure can be interpreted as the probability that a randomly chosen positive (neutralizing) sample receives a higher prediction score than a randomly selected negative (non-neutralizing) sample. More precisely, the AUC is computed by first sorting all samples according to their predicted value. Hence, even if a sample is mislabeled due to experimental variation, it is less likely to substantially perturb the performance measure, simply because samples close to the cutoff are more likely to receive prediction values in between clear negatives and clear positives. Despite the fact that only relatively few training data were available, we were able to build statistical models with high predictive power. Furthermore, interpretation of these models revealed sequence positions that were known to be in the binding site of the studied nAbs. Therefore, this seems to be a valid approach for identifying residues that determine antibody neutralization. In terms of vaccine design, our approach can be used to identify viral variants that are potential immunogens for eliciting a broadly neutralizing antibody. More precisely, assuming that variants that are very susceptible to a given nAb are also potentially potent immunogens that elicit the nAb in vivo, our in silico prediction method can help to screen large available sequence databases for interesting candidate immunogens.
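The pairwise interpretation of the AUC used in this section can be made concrete with a small sketch. It is an illustration only, not the evaluation code of this study: the function name `auc_pairwise`, the scores, and the labels are made up, and ties are assumed to count as one half.

```python
from itertools import product


def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up prediction scores; 1 = neutralizing, 0 = non-neutralizing.
scores = [0.90, 0.80, 0.55, 0.45, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0]

auc_reference = auc_pairwise(scores, labels)
# Flip the label of a borderline sample (score 0.55).
auc_borderline_flip = auc_pairwise(scores, [1, 1, 1, 1, 0, 0])
# Flip the label of a clear positive (score 0.90).
auc_clear_flip = auc_pairwise(scores, [0, 1, 0, 1, 0, 0])
```

In this toy example, relabeling the borderline sample changes the AUC less than relabeling the clearly positive sample, which is the behavior argued for above. A single made-up example does not establish the claim in general.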
7.4 Outlook
In the search for new broadly neutralizing antibodies, comparatively large panels of different viruses are tested against the new candidate antibody, and the three antibodies studied here are typically used as a reference; see for instance the work by Walker et al. (2009). Hence, if made public, additional data can be used to improve the prediction models introduced here, or alternatively, the additional samples can be used as an independent validation set. Moreover, our approach can also be extended to study the genotype-phenotype relation of newly discovered nAbs. An additional approach to improving models that predict neutralization by antibodies is the utilization of features derived from the protein structure instead of the sequence. Clearly, successful binding of a nAb to its epitope on the surface of the viral spike is determined by shape and chemical complementarity. The protein sequence is only an indicator for the structure of the protein in vivo. As a consequence of its high impact for vaccine design, the structure of the gp120 core was studied in great detail, and today there are a number of experimentally generated protein crystal structures available. Some of these structures even contain the variable loop V3 (Huang et al., 2005) that is crucial in coreceptor binding. Precisely this structure of the V3 loop was previously used to derive structural descriptors for the prediction of coreceptor usage (Sander et al., 2007). In this previous work, a statistical learning model using structural descriptors could outperform a sequence-based approach, and a combination of both, structure and sequence, provided the best model. Likewise, the prediction of neutralization by nAbs could benefit from the application of structural descriptors. This approach is of particular interest for b12, as a gp120 structure in complex with b12 is available (Zhou et al., 2007).
The proposed approach to model nAb neutralization will be applied to assist vaccine development. The knowledge extracted from the statistical learning models and the models themselves can help to construct a large library of Env variants that are likely to be neutralized by the nAb. The library of Env variants is then used in an immunogen optimization protocol to develop an HIV vaccine. In particular, the variants in the library will be screened for their ability to evoke a strong immune response, and additional statistical models will also be implemented for refining the search of candidate immunogens. The approach for finding immunogens that elicit a desired type of antibodies can be extended to find nAbs for other pathogens or even for specific types of cancer.
8 Conclusion and Outlook
We have presented methods for improving HIV patient care. The developed models use techniques from statistical learning and focus on the prediction of response to combination treatment as opposed to resistance against single drugs. In particular, we investigated the benefit of features encoding the viral evolution during therapy. The most predictive feature was the genetic barrier to drug resistance, which quantifies the likelihood that the virus will escape from the drug by developing further resistance mutations. A statistical model was trained on clinical data from the United States of America comprising viral genotype, treatment, and outcome of the regimen. The use of the genetic barrier together with indicators for mutations in protease and reverse transcriptase as well as indicators for the usage of antiretroviral drugs in the regimen outperforms commonly used rules-based systems in finding successful treatments. Specifically, a retrospective analysis on data from Europe demonstrated that our tool geno2pheno-THEO reaches a true positive rate (TPR) of 64% at a false positive rate (FPR) of 20%.
By contrast, the rules-based approaches yield only 44% to 48% TPR at the same FPR, i.e. our method identifies up to 20 percentage points more of the successful combination treatments than standard approaches. Within the EuResist project, which integrated multiple HIV resistance databases storing treatment-related information, we further developed geno2pheno-THEO. In particular, we improved the genetic barrier by relying on a large number of predicted resistance phenotypes rather than a small number of measured phenotypes for defining mutation patterns that correspond to complete resistance. More importantly, we made use of the patient’s treatment history (i.e. drugs the patient has been exposed to) and baseline viral load to improve prediction of response to combination therapy. Apart from geno2pheno-THEO, which is (due to the employed features based on viral evolution) also referred to as the evolutionary engine, two other prediction models were developed within the project. In order to provide a single prediction for one request, we explored ways for optimally combining the output of all three engines. Here, it turned out that simply using the mean of all predictions is an efficient and robust way. On a small set of 25 treatment change episodes the predictions of the combined prediction engine were compared to the predictions of ten international human HIV treatment experts. Our system performed as well as the best human expert and also as well as the consensus of all human experts. The availability of three prediction engines trained on the same training data unveiled a serious limitation of the current definition of treatment response. The standard datum definition focuses on short-term response measured at eight weeks (4-12) of therapy. The cutoff for success was a viral load below 500 cp/ml at the time-point closest to eight weeks. A substantial number of treatments that were labeled as failure, however, reached a VL below the threshold at a later time during the treatment.
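As a minimal sketch of the short-term labeling rule just described, the following function assigns a label from a list of viral load measurements. It is not the implementation used in the project. The function name, the `(week, copies_per_ml)` tuple format, the reading of "(4-12)" as an inclusive window in weeks, and the tie-breaking (the earlier of two equally close measurements wins) are all assumptions made for illustration. The measurements are invented.

```python
def standard_datum_label(viral_loads, target_week=8, window=(4, 12), cutoff=500.0):
    """Label a treatment from (week, copies_per_ml) measurements.

    Returns "success" or "failure" based on the measurement closest to
    target_week inside the window, or None if the window is empty.
    """
    in_window = [(w, vl) for w, vl in viral_loads if window[0] <= w <= window[1]]
    if not in_window:
        return None
    # min() returns the first of several equally close measurements.
    _, vl = min(in_window, key=lambda m: abs(m[0] - target_week))
    return "success" if vl < cutoff else "failure"


# Hypothetical patient: still above the cutoff at week 7, suppressed by week 20.
late_responder = [(0, 120000.0), (7, 2300.0), (20, 310.0)]
label_late_responder = standard_datum_label(late_responder)
```

Under these assumptions the hypothetical patient is labeled as a failure, although the viral load measured at week 20 is below the cutoff. This is the kind of case that the short-term definition misses.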
Interestingly, all three engines captured that trend frequently, and disagreed with the label of the treatment. As a consequence, we studied modified treatment response definitions: sustained response at 24 weeks (16-32) of treatment, the area under the viral load curve during one year of therapy, and a labeling that combines short-term and sustained response. The latter definition rejects treatments that have a discordant response at eight and 24 weeks of therapy, and therefore filters instances with potential adherence problems. In this setting our methodology achieves an AUC of 0.85 compared to an AUC of 0.77 on the original response definition (using the richest model). Furthermore, we addressed the update problem of data-driven decision support systems. Briefly, the update problem originates from the fact that data-collection efforts providing the training data for those systems lag behind the introduction of novel drugs. More precisely, novel drugs tend to be prescribed to patients in an advanced state of the disease. Hence, it takes time to collect a sufficient amount of training data for the data-driven systems. In order to circumvent that problem, we introduced a new covariate, which represents the activity of novel drugs in the regimen. Here, activity of novel drugs is assessed by a rules-based system, and when applied to treatments comprising newly licensed drugs the modified system outperforms the purely rules-based approach, manifested in an increase of the TPR by approximately 25% at an FPR of 20%. We also investigated treatment history as a potential replacement for the viral sequence. The rationale behind the approach is that the prescribed drugs shape the genetic makeup of the viral population. Consequently, the treatment history, i.e. the list of drugs the patient was exposed to, is a strong predictor of treatment response.
Here, we found that prediction models based on history information alone are slightly inferior (AUC of 0.75) to genotype-based predictions (AUC of 0.77). History and genotype information seem to be partially complementary, since combining genotype-based and history-based predictions resulted in a further increase in performance (AUC of 0.79), regardless of the mode of combination. We made first progress towards the sequencing of anti-HIV therapies. To this end, we used methods from large vocabulary language processing to develop a framework that allows rapid search for likely viral variants arising at treatment failure. We developed five mutation models on the basis of in vitro phenotypic resistance data and emergence of resistance mutations observed in vivo. The framework was challenged to separate successful from failing treatments on the basis of mutations caused by the immediately preceding regimen. The performance was moderate (AUC of 0.63) but constitutes a substantial improvement over the baseline (AUC of 0.55). Additionally, we could demonstrate that resistance development is correctly captured within drug classes. Simulating resistance development during antiretroviral treatment with a maximum of five mutations takes only 3 seconds in the new framework while keeping track of the 100 most likely viral variants. Hence, the framework affords web services with acceptable response time. Finally, we developed statistical models that predict neutralization of HIV variants by three broadly neutralizing antibodies based on the Env genotype. The predictive performance of the models using sequence information of complete Env ranged from 0.72 AUC to 0.91 AUC, depending on the antibody. Interpretation of the statistical learning methods for all three antibodies recovered at least one known position located in the antibody-specific epitope. For the antibody b12, sequence information on the variable loop V2 is pivotal to maintain model performance.
Furthermore, selected mutations from the b12 model are currently tested in vitro to further validate the approach. Knowledge derived from the statistical models and the models themselves can be used to build focused libraries for in vitro screening for potent immunogens. Our methodology can, in theory, be applied to novel HIV-targeting neutralizing antibodies or even be extended to other pathogens and cancer. The work described in this thesis resulted in a number of freely available web services. Geno2pheno-THEO is available as part of the geno2pheno web suite (http://www.geno2pheno.org). Part of this web suite is also a prediction system for resistance against integrase inhibitors, which can be directly accessed at http://integrase.geno2pheno.org. Finally, our contribution to the EuResist prediction engine can be queried as part of the complete system using the website http://engine.euresist.org.
Outlook
The most straightforward direction is the extension of the models developed in this thesis to the newly available drugs from different classes, in particular integrase inhibitors and entry inhibitors. A further challenge is the development of prediction tools for treatment-naïve patients. This group faces the unique problem of transmitted drug resistance mutations. Transmitted mutations may, due to replicative disadvantage, vanish from the majority of the viral population, and consequently go unnoticed using standard interpretation tools. However, traces of those mutations in the viral genome are likely to be present at nucleotide or in rare cases even at amino acid level. In addition, it is of particular interest to infer drug adherence problems from stored treatment data (primarily viral load trajectories). Such information will not only enhance response prediction, but, more importantly, will also provide the means for treating clinicians to identify patients having trouble in correctly taking their medication.
Here, one can provide a tool for deterring development of resistance mutations. Personalized treatment in the HIV field has, so far, only focused on the interaction between the drug and virus. With our approaches for predicting response to antiretroviral drugs in vivo, a third player enters the interaction network: the patient. Until now, the patient was almost neglected in the prediction of treatment response. However, in order to make further advancements in personalized anti-HIV therapy the interplay between virus and host as well as between drugs and host has to be considered. Indeed, initial studies focusing on the relationship between human genomics and the control of HIV infection have already been conducted (Telenti and Goldstein, 2006). The next steps in this direction should involve the systematic exploitation of available resistance databases that offer a wealth of treatment data. Hence, it is comparatively uncomplicated to update these databases gradually with the missing human genetics data. This will offer a wealth of possibilities, including pharmacogenetic studies uncovering the interaction between drugs and host-related differences in the involved metabolic pathways (Telenti and Zanger, 2008; see also http://www.hiv-pharmacogenomics.org/). The resulting knowledge will help to provide the right dosage of drugs for the patient in addition to the right drug attacking the virus. The correct dosing is of particular interest, since drug-related side-effects are a major obstacle in the life-long HIV treatment. The use of next-generation sequencing techniques will also allow studying the evolution of the viral population during treatment and result in more advanced decision support tools. The evaluation of clinical decision support systems used for assessing drug resistance should be carried out on standardized publicly available datasets.
To date, validation of such tools is based on retrospective analysis of clinical data, with different sets of data used for different tools. Consequently, the performance of the various tools is hard to compare, especially since the datasets are prone to local differences in drug prescriptions or widely differing extent of pre-treatment of the involved patients. Hence, a publicly available benchmark dataset with fair distributions of drug usage and a certain level of pre-treatment (i.e. not too many “simple cases”) is needed. Moreover, publicly available datasets attract a variety of scientists applying available methods or developing new tailored methods to the problem. For instance, the freely available (standardized) genotype-phenotype dataset (Rhee et al., 2006) prompted a number of follow-up publications (Saigo et al., 2007; Kjaer et al., 2008; Kierczak et al., 2009). The methods developed in this thesis can also be applied to optimization of treatment for other viral diseases such as Hepatitis B and Hepatitis C. While data-collection efforts and statistics will remain the same, the extension to these viral diseases is not straightforward, as clinically relevant definitions of response to treatment have to be identified and agreed upon. Furthermore, using matched genotype-phenotype data it is comparably simple to identify resistance mutations. Understanding their mechanism of resistance, however, still remains a challenge.
Bibliography
(2006). Denying science. Nature Medicine, 12(4):369. (2008). The cost of silence? Nature, 456(7222):545. Abbadessa, G., Accolla, R., Aiuti, F., Albini, A., Aldovini, A., Alfano, M., Antonelli, G., Bartholomew, C., Bentwich, Z., Bertazzoni, U., Berzofsky, J. A., Biberfeld, P., Boeri, E., Buonaguro, L., Buonaguro, F.
M., Bukrinsky, M., Burny, A., Caruso, A., Cassol, S., Chandra, P., Ceccherini-Nelli, L., Chieco-Bianchi, L., Clerici, M., Colombini-Hatch, S., de Giuli Morghen, C., de Maria, A., de Rossi, A., Dierich, M., Della-Favera, R., Dolei, A., Douek, D., Erfle, V., Felber, B., Fiorentini, S., Franchini, G., Gershoni, J. M., Gotch, F., Green, P., Greene, W. C., Hall, W., Haseltine, W., Jacobson, S., Kallings, L. O., Kalyanaraman, V. S., Katinger, H., Khalili, K., Klein, G., Klein, E., Klotman, M., Klotman, P., Kotler, M., Kurth, R., Lafeuillade, A., Placa, M. L., Lewis, J., Lillo, F., Lisziewicz, J., Lomonico, A., Lopalco, L., Lori, F., Lusso, P., Macchi, B., Malim, M., Margolis, L., Markham, P. D., McClure, M., Miller, N., Mingari, M. C., Moretta, L., Noonan, D., O’Brien, S., Okamoto, T., Pal, R., Palese, P., Panet, A., Pantaleo, G., Pavlakis, G., Pistello, M., Plotkin, S., Poli, G., Pomerantz, R., Radaelli, A., Robertguroff, M., Roederer, M., Sarngadharan, M. G., Schols, D., Secchiero, P., Shearer, G., Siccardi, A., Stevenson, M., Svoboda, J., Tartaglia, J., Torelli, G., Tornesello, M. L., Tschachler, E., Vaccarezza, M., Vallbracht, A., van Lunzen, J., Varnier, O., Vicenzi, E., von Melchner, H., Witz, I., Zagury, D., Zagury, J.-F., Zauli, G., and Zipeto, D. (2009). Unsung hero Robert C. Gallo. Science, 323(5911):206–7. Agopian, A., Gros, E., Aldrian-Herrada, G., Bosquet, N., Clayette, P., and Divita, G. (2009). A new generation of peptide-based inhibitors targeting HIV-1 reverse transcriptase conformational flexibility. J Biol Chem, 284(1):254–64. Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference on Management of Data, pages 207–216. Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. Proc. 20th Int. Conf. Very Large Data Bases, pages 487–499. 
Aharoni, E., Altmann, A., b Borgulya, D’utilia, R., Incardona, F., Kaiser, R., Kent, C., Lengauer, T., Neuvirth, H., Peres, Y., Petroczi, A., Prosperi, M., Rosen-Zvi, M., Schulter, E., Sing, T., Sonnerborg, A., Thompson, R., and Zazzi, M. (2007). Integration of viral genomics with clinical data to predict response to anti-HIV treatment. IST-Africa 2007 Conference Proceedings, Paul Cunningham and Miriam Cunningham (Eds). Aho, A., Sethi, R., and Ullman, J. (1986). Compilers: Principles, techniques, and tools. Addison-Wesley. Akaike, H. (1974). A new look at the statistical model identification. IEEE transactions on automatic control, AC-19:716–723. Allauzen, C. and Mohri, M. (2002). On the determinizability of weighted automata and transducers. In Proceedings of the workshop Weighted Automata: Theory and Application (WATA). Allauzen, C., Riley, M., Schalkwyk, J., Skut, W., and Mohri, M. (2007). OpenFst: A general and efficient weighted finite-state transducer library. CIAA 2007, Lecture Notes in Computer Science, 4783:11–23. Altmann, A., Beerenwinkel, N., Sing, T., Savenkov, I., Däumer, M., Kaiser, R., Rhee, S.-Y., Fessel, W. J., Shafer, R. W., and Lengauer, T. (2007a). Improved prediction of response to antiretroviral combination therapy using the genetic barrier to drug resistance. Antivir Ther (Lond), 12(2):169–78. Altmann, A., Däumer, M., Beerenwinkel, N., Peres, Y., Schülter, E., Büch, J., Rhee, S.-Y., Sönnerborg, A., Fessel, W. J., Shafer, R. W., Zazzi, M., Kaiser, R., and Lengauer, T. (2009a). Predicting the response to combination antiretroviral therapy: retrospective validation of geno2pheno-THEO on a large clinical database. J Infect Dis, 199(7):999–1006. Altmann, A., Sing, T., Vermeiren, H., Winters, B., Craenenbroeck, E. V., der Borght, K. V., Rhee, S. Y., Shafer, R. W., Schuelter, E., Kaiser, R., Peres, Y., Sonnerborg, A., Fessel, W. J., Incardona, F., Zazzi, M., Bacheler, L., Vlijmen, H. V., and Lengauer, T. (2007b).
Inferring virological response from genotype: with or without predicted phenotypes? Antivir Ther (Lond), 12(5):S169–S169. Altmann, A., Sing, T., Vermeiren, H., Winters, B., Craenenbroeck, E. V., der Borght, K. V., Rhee, S.-Y., Shafer, R. W., Schülter, E., Kaiser, R., Peres, Y., Sönnerborg, A., Fessel, W. J., Incardona, F., Zazzi, M., Bacheler, L., Vlijmen, H. V., and Lengauer, T. (2009b). Advantages of predicted phenotypes and statistical learning models in inferring virological response to antiretroviral therapy from HIV genotype. Antivir Ther (Lond), 14(2):273–83. Altmann, A., Thielen, A., Frentz, D., Laethem, K. V., Bracciale, L., Incardona, F., Sönnerborg, A., Zazzi, M., Kaiser, R., and Lengauer, T. (2009c). Keeping models that predict response to antiretroviral therapy up-to-date: fusion of pure data-driven approaches with rules-based methods. Reviews in Antiviral Therapy, page A92. Altmann, A., Toloşi, L., Sander, O., and Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340–7. Andreola, M. L., Pileur, F., Calmels, C., Ventura, M., Tarrago-Litvak, L., Toulmé, J. J., and Litvak, S. (2001). DNA aptamers selected against the HIV-1 RNase H display in vitro antiviral activity. Biochemistry, 40(34):10087–94. Arthos, J., Cicala, C., Martinelli, E., Macleod, K., Ryk, D. V., Wei, D., Xiao, Z., Veenstra, T. D., Conrad, T. P., Lempicki, R. A., McLaughlin, S., Pascuccio, M., Gopaul, R., McNally, J., Cruz, C. C., Censoplano, N., Chung, E., Reitano, K. N., Kottilil, S., Goode, D. J., and Fauci, A. S. (2008). HIV-1 envelope protein binds to and signals through integrin alpha4 beta7, the gut mucosal homing receptor for peripheral T cells. Nat Immunol, 9(3):301–9. Baelen, K. V., Salzwedel, K., Rondelez, E., Eygen, V. V., Vos, S. D., Verheyen, A., Steegen, K., Verlinden, Y., Allaway, G. P., and Stuyver, L. J. (2009).
Susceptibility of human immunodeficiency virus type 1 to the maturation inhibitor bevirimat is modulated by baseline polymorphisms in gag spacer peptide 1. Antimicrob Agents Chemother, 53(5):2185–8. Barouch, D. H. (2008). Challenges in the development of an HIV-1 vaccine. Nature, 455(7213):613–9. Barré-Sinoussi, F., Chermann, J. C., Rey, F., Nugeyre, M. T., Chamaret, S., Gruest, J., Dauguet, C., Axler-Blin, C., Vézinet-Brun, F., Rouzioux, C., Rozenbaum, W., and Montagnier, L. (1983). Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science, 220(4599):868–71. Beerenwinkel, N., Däumer, M., Oette, M., Korn, K., Hoffmann, D., Kaiser, R., Lengauer, T., Selbig, J., and Walter, H. (2003a). Geno2pheno: Estimating phenotypic drug resistance from HIV-1 genotypes. Nucleic Acids Res, 31(13):3850–5. Beerenwinkel, N., Däumer, M., Sing, T., Rahnenfuhrer, J., Lengauer, T., Selbig, J., Hoffmann, D., and Kaiser, R. (2005a). Estimating HIV evolutionary pathways and the genetic barrier to drug resistance. J Infect Dis, 191(11):1953–60. Beerenwinkel, N., Lengauer, T., Däumer, M., Kaiser, R., Walter, H., Korn, K., Hoffmann, D., and Selbig, J. (2003b). Methods for optimizing antiviral combination therapies. Bioinformatics, 19 Suppl 1:i16–25. Beerenwinkel, N., Rahnenführer, J., Däumer, M., Hoffmann, D., Kaiser, R., Selbig, J., and Lengauer, T. (2005b). Learning multiple evolutionary pathways from cross-sectional data. J Comput Biol, 12(6):584–98. Beerenwinkel, N., Rahnenführer, J., Kaiser, R., Hoffmann, D., Selbig, J., and Lengauer, T. (2005c). Mtreemix: a software package for learning and using mixture models of mutagenetic trees. Bioinformatics, 21(9):2106–7. Beerenwinkel, N., Schmidt, B., Walter, H., Kaiser, R., Lengauer, T., Hoffmann, D., Korn, K., and Selbig, J. (2002). Diversity and complexity of HIV-1 drug resistance: a bioinformatics approach to predicting phenotype from genotype. Proc Natl Acad Sci USA, 99(12):8271–6. 
Bellman, R. (1958). On a routing problem. Quarterly of Applied Mathematics, 16(1):87–90. Berger, E. A., Murphy, P. M., and Farber, J. M. (1999). Chemokine receptors as HIV-1 coreceptors: roles in viral entry, tropism, and disease. Annu Rev Immunol, 17:657–700. Bickel, S., Bogojeska, J., Lengauer, T., and Scheffer, T. (2008). Multi-task learning for HIV therapy screening. Proceedings of the 25th International Conference on Machine Learning, pages 56–63. Binley, J. M., Wrin, T., Korber, B., Zwick, M. B., Wang, M., Chappey, C., Stiegler, G., Kunert, R., Zolla-Pazner, S., Katinger, H., Petropoulos, C. J., and Burton, D. R. (2004). Comprehensive cross-clade neutralization analysis of a panel of anti-human immunodeficiency virus type 1 monoclonal antibodies. J Virol, 78(23):13232–52. Boffito, M., Acosta, E., Burger, D., Fletcher, C. V., Flexner, C., Garaffo, R., Gatti, G., Kurowski, M., Perno, C. F., Peytavin, G., Regazzi, M., and Back, D. (2005). Therapeutic drug monitoring and drug-drug interactions involving antiretroviral drugs. Antivir Ther (Lond), 10(4):469–77. Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. Proceedings of the fifth annual workshop on Computational Learning Theory, pages 144–152. Bratt, G., Karlsson, A., Leandersson, A. C., Albert, J., Wahren, B., and Sandström, E. (1998). Treatment history and baseline viral load, but not viral tropism or CCR-5 genotype, influence prolonged antiviral efficacy of highly active antiretroviral treatment. AIDS, 12(16):2193–202. Breiman, L. (2001). Random forests. Machine learning. Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and regression trees. Wadsworth, Belmont. Bromley, S. K., Burack, W. R., Johnson, K. G., Somersalo, K., Sims, T. N., Sumen, C., Davis, M. M., Shaw, A. S., Allen, P. M., and Dustin, M. L. (2001). The immunological synapse. Annu Rev Immunol, 19:375–96.
Brun-Vézinet, F., Costagliola, D., Khaled, M. A., Calvez, V., Clavel, F., Clotet, B., Haubrich, R., Kempf, D., King, M., Kuritzkes, D., Lanier, R., Miller, M., Miller, V., Phillips, A., Pillay, D., Schapiro, J., Scott, J., Shafer, R., Zazzi, M., Zolopa, A., and DeGruttola, V. (2004). Clinically validated genotype analysis: guiding principles and statistical concerns. Antivir Ther (Lond), 9(4):465–78. Burton, D. R., Desrosiers, R. C., Doms, R. W., Koff, W. C., Kwong, P. D., Moore, J. P., Nabel, G. J., Sodroski, J., Wilson, I. A., and Wyatt, R. T. (2004). HIV vaccine design and the neutralizing antibody problem. Nat Immunol, 5(3):233–6. Burton, D. R., Pyati, J., Koduri, R., Sharp, S. J., Thornton, G. B., Parren, P. W., Sawyer, L. S., Hendry, R. M., Dunlop, N., and Nara, P. L. (1994). Efficient neutralization of primary isolates of HIV-1 by a recombinant human monoclonal antibody. Science, 266(5187):1024–7. Carries, S., Muller, F., Muller, F. J., Morroni, C., and Wilson, D. (2007). Characteristics, treatment, and antiretroviral prophylaxis adherence of South African rape survivors. J Acquir Immune Defic Syndr, 46(1):68–71. Chang, C. and Lin, C. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. Chapelle, O. and Zien, A. (2005). Semi-supervised classification by low density separation. Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pages 57–64. Chen, J. C., Krucinski, J., Miercke, L. J., Finer-Moore, J. S., Tang, A. H., Leavitt, A. D., and Stroud, R. M. (2000). Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc Natl Acad Sci USA, 97(15):8233–8. Clavel, F. and Hance, A. J. (2004). HIV drug resistance. N Engl J Med, 350(10):1023–35. Cozzi-Lepri, A., Phillips, A.
N., Ruiz, L., Clotet, B., Loveday, C., Kjaer, J., Mens, H., Clumeck, N., Viksna, L., Antunes, F., Machala, L., Lundgren, J. D., and Group, E. S. (2007). Evolution of drug resistance in HIV-infected patients remaining on a virologically failing combination antiretroviral therapy regimen. AIDS, 21(6):721–32. Crum, N. F., Riffenburgh, R. H., Wegner, S., Agan, B. K., Tasker, S. A., Spooner, K. M., Armstrong, A. W., Fraser, S., Wallace, M. R., and Consortium, T. A. C. (2006). Comparisons of causes of death and mortality rates among HIV-infected persons: analysis of the pre-, early, and late HAART (highly active antiretroviral therapy) eras. J Acquir Immune Defic Syndr, 41(2):194–200. Dalgleish, A. G., Beverley, P. C., Clapham, P. R., Crawford, D. H., Greaves, M. F., and Weiss, R. A. (1984). The CD4 (T4) antigen is an essential component of the receptor for the AIDS retrovirus. Nature, 312(5996):763–7. Däumer, M., Beerenwinkel, N., Hoffmann, D., Selbig, J., Lengauer, T., Oette, M., Fätkenheuer, G., Rockstroh, J. K., Pfister, H. J., and Kaiser, R. (2007). Geno2pheno: Determination of clinically relevant cut-offs. European Journal of Medical Research, 12(Supplement III):1–2. Däumer, M. P., Kaiser, R., Klein, R., Lengauer, T., Thiele, B., and Thielen, A. (2008). Inferring viral tropism from genotype with massively parallel sequencing: qualitative and quantitative analysis. Antivir Ther (Lond), 13(4):A101–A101. De Luca, A., Cingolani, A., Giambenedetto, S. D., Trotta, M. P., Baldini, F., Rizzo, M. G., Bertoli, A., Liuzzi, G., Narciso, P., Murri, R., Ammassari, A., Perno, C. F., and Antinori, A. (2003). Variable prediction of antiretroviral treatment outcome by different systems for interpreting genotypic human immunodeficiency virus type 1 drug resistance. J Infect Dis, 187(12):1934–43. De Luca, A., Giambenedetto, S. D., Romano, L., Gonnelli, A., Corsi, P., Baldari, M., Pietro, M. D., Menzo, S., Francisci, D., Almi, P., Zazzi, M., and Group, A. R. C. A. S. (2006). 
Frequency and treatment-related predictors of thymidine-analogue mutation patterns in HIV-1 isolates after unsuccessful antiretroviral therapy. J Infect Dis, 193(9):1219–22. 184 8 Bibliography De Luca, A., Vendittelli, M., Baldini, F., Giambenedetto, S. D., Trotta, M. P., Cingolani, A., Bacarelli, A., Gori, C., Perno, C. F., Antinori, A., and Ulivi, G. (2004). Construction, training and clinical validation of an interpretation system for genotypic HIV-1 drug resistance based on fuzzy rules revised by virological outcomes. Antivir Ther (Lond), 9(4):583–93. Deforche, K., Camacho, R., Laethem, K. V., Lemey, P., Rambaut, A., Moreau, Y., and Vandamme, A.-M. (2008). Estimation of an in vivo fitness landscape experienced by HIV-1 under drug selective pressure useful for prediction of drug resistance evolution during treatment. Bioinformatics, 24(1):34–41. DeFranco, A. L., Locksley, R. M., and Robertson, M. (2007). Immunity: the immune response in infectious and inflammatory disease. New Science Press. DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3):837–45. Dempster, A., Laird, N., and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1–38. Descamps, D., Apetrei, C., Collin, G., Damond, F., Simon, F., and Brun-Vézinet, F. (1998). Naturally occurring decreased susceptibility of HIV-1 subtype G to protease inhibitors. AIDS, 12(9):1109–11. Desper, R., Jiang, F., Kallioniemi, O. P., Moch, H., Papadimitriou, C. H., and Schäffer, A. A. (1999). Inferring tree models for oncogenesis from comparative genome hybridization data. J Comput Biol, 6(1):37–51. Dijkstra, E. (1959). A note on two problems in connexion with graphs. Numerische mathematik, 1:269–271. DiRienzo, A. G., DeGruttola, V., Larder, B., and Hertogs, K. (2003). 
Non-parametric methods to predict HIV drug susceptibility phenotype from genotype. Statistics in medicine, 22(17):2785–98. Dorigo, M. and Gambardella, L. (1997). Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on evolutionary computation. Douek, D. C., Kwong, P. D., and Nabel, G. J. (2006). The rational design of an AIDS vaccine. Cell, 124(4):677–81. Drăghici, S. and Potter, R. B. (2003). Predicting HIV drug resistance with neural networks. Bioinformatics, 19(1):98–107. Durant, J., Clevenbergh, P., Halfon, P., Delgiudice, P., Porsin, S., Simonet, P., Montagne, N., Boucher, C. A., Schapiro, J. M., and Dellamonica, P. (1999). Drug-resistance genotyping in HIV-1 therapy: the VIRADAPT randomised controlled trial. Lancet, 353(9171):2195–9. 8 Bibliography 185 Dybul, M., Fauci, A., Bartlett, J., Kaplan, J., and Pau, A. (2002). Guidelines for using antiretroviral agents among HIV-infected adults and adolescents - the panel on clinical practices for treatment of HIV. Ann Intern Med, 137(5):381–433. Edgar, R. C. (2004). Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res, 32(5):1792–7. Edmonds, J. (1967). Optimum branchings. J Res Nbs B Math Sci, B 71(4):233–&. Esté, J. A. and Telenti, A. (2007). HIV entry inhibitors. Lancet, 370(9581):81–8. Fawcett, T. (2006). An introduction to ROC analysis. Pattern recognition letters, 27:861– 874. Fellay, J., Shianna, K. V., Ge, D., Colombo, S., Ledergerber, B., Weale, M., Zhang, K., Gumbs, C., Castagna, A., Cossarizza, A., Cozzi-Lepri, A., Luca, A. D., Easterbrook, P., Francioli, P., Mallal, S., Martinez-Picado, J., Miro, J. M., Obel, N., Smith, J. P., Wyniger, J., Descombes, P., Antonarakis, S. E., Letvin, N. L., McMichael, A. J., Haynes, B. F., Telenti, A., and Goldstein, D. B. (2007). A whole-genome association study of major determinants for host control of HIV-1. Science, 317(5840):944–7. Fenyö, E. 
M., Heath, A., Dispinseri, S., Holmes, H., Lusso, P., Zolla-Pazner, S., Donners, H., Heyndrickx, L., Alcami, J., Bongertz, V., Jassoy, C., Malnati, M., Montefiori, D., Moog, C., Morris, L., Osmanov, S., Polonis, V., Sattentau, Q., Schuitemaker, H., Sutthent, R., Wrin, T., and Scarlatti, G. (2009). International network for comparison of HIV neutralization assays: the NeutNet report. PLoS ONE, 4(2):e4505. Fields, B. N., Knipe, D. M., and Howley, P. M. (2007). Fields virology. Lippincott Williams & Wilkins. Fiscus, S. A., Cheng, B., Crowe, S. M., Demeter, L., Jennings, C., Miller, V., Respess, R., Stevens, W., and for Collaborative HIV Research Alternative Viral Load Assay Working Group, F. (2006). HIV-1 viral load assays for resource-limited settings. PLoS Med, 3(10):e417. Fisher, R. (1922). On the interpretation of χ2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society. Freed, E. O. (2004). HIV-1 and the host cell: an intimate association. Trends Microbiol, 12(4):170–7. Gallo, R. C. (2002). 298(5599):1728–30. Historical essay. the early years of HIV/AIDS. Science, Gallo, R. C., Sarin, P. S., Gelmann, E. P., Robert-Guroff, M., Richardson, E., Kalyanaraman, V. S., Mann, D., Sidhu, G. D., Stahl, R. E., Zolla-Pazner, S., Leibowitch, J., and Popovic, M. (1983). Isolation of human T-cell leukemia virus in acquired immune deficiency syndrome (AIDS). Science, 220(4599):865–7. 186 8 Bibliography Gao, F., Chen, Y., Levy, D. N., Conway, J. A., Kepler, T. B., and Hui, H. (2004). Unselected mutations in the human immunodeficiency virus type 1 genome are mostly nonsynonymous and often deleterious. J Virol, 78(5):2426–33. Gareiss, P. C. and Miller, B. L. (2009). Ribosomal frameshifting: an emerging drug target for HIV. Current opinion in investigational drugs (London, England : 2000), 10(2):121– 8. Gilks, C. F., Crowley, S., Ekpini, R., Gove, S., Perriens, J., Souteyrand, Y., Sutherland, D., Vitoria, M., Guerma, T., and Cock, K. D. (2006). 
The WHO public-health approach to antiretroviral treatment against HIV in resource-limited settings. Lancet, 368(9534):505– 10. Glass, T. R., Geest, S. D., Hirschel, B., Battegay, M., Furrer, H., Covassini, M., Vernazza, P. L., Bernasconi, E., Rickenboch, M., Weber, R., Bucher, H. C., and Study, S. H. C. (2008). Self-reported non-adherence to antiretroviral therapy repeatedly assessed by two questions predicts treatment failure in virologically suppressed patients. Antivir Ther (Lond), 13(1):77–85. Green, D. R., Droin, N., and Pinkoski, M. (2003). Activation-induced cell death in T cells. Immunol Rev, 193:70–81. Grobler, J. A., Mckenna, P. M., Ly, S., Stillmock, K. A., Bahnck, C. M., Danovich, R. M., Dornadula, G., Hazuda, D. J., and Miller, M. D. (2009). Hiv integrase inhibitor dissociation rates correlate with efficacy in vitro. Antivir Ther (Lond), 14(4):A27–A27. Grossman, Z., Meier-Schellersheim, M., Paul, W. E., and Picker, L. J. (2006). Pathogenesis of HIV infection: what the virus spares is as important as what it destroys. Nature Medicine, 12(3):289–95. Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine learning, 46:389–422. Hahn, B. H., Shaw, G. M., Taylor, M. E., Redfield, R. R., Markham, P. D., Salahuddin, S. Z., Wong-Staal, F., Gallo, R. C., Parks, E. S., and Parks, W. P. (1986). Genetic variation in HTLV-III/LAV over time in patients with AIDS or at risk for AIDS. Science, 232(4757):1548–53. Hall, M. (1999). Correlation-based feature selection for machine learning. Ph.D. Thesiss, Department of Computer Science, University of Waikato, Hamilton, New Zealand. Hammer, S. M., Eron, J. J., Reiss, P., Schooley, R. T., Thompson, M. A., Walmsley, S., Cahn, P., Fischl, M. A., Gatell, J. M., Hirsch, M. S., Jacobsen, D. M., Montaner, J. S. G., Richman, D. D., Yeni, P. G., Volberding, P. A., and Society-USA, I. A. (2008). 
Antiretroviral treatment of adult HIV infection: 2008 recommendations of the international AIDS society-USA panel. JAMA, 300(5):555–70. Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M., Hirsch, M. S., and Merigan, T. C. (1996). A trial comparing nucleoside monotherapy with combination 8 Bibliography 187 therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. N Engl J Med, 335(15):1081–90. Han, J., Pei, J., Yin, Y., and Mao, R. (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8:53–87. Harrigan, P. R., Hogg, R. S., Dong, W. W. Y., Yip, B., Wynhoven, B., Woodward, J., Brumme, C. J., Brumme, Z. L., Mo, T., Alexander, C. S., and Montaner, J. S. G. (2005). Predictors of HIV drug-resistance mutations in a large antiretroviral-naive cohort initiating triple antiretroviral therapy. J Infect Dis, 191(3):339–47. Hart, P., Nilsson, N., and Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE transactions on Systems Science and Cybernetics, SSC4(2):100–107. Hastie, T., Tibshirani, R., and Friedman, J. H. (2001). The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics. Hazenberg, M. D., Otto, S. A., van Benthem, B. H. B., Roos, M. T. L., Coutinho, R. A., Lange, J. M. A., Hamann, D., Prins, M., and Miedema, F. (2003). Persistent immune activation in HIV-1 infection is associated with progression to AIDS. AIDS, 17(13):1881– 8. Hazuda, D. J., Felock, P., Witmer, M., Wolfe, A., Stillmock, K., Grobler, J. A., Espeseth, A., Gabryelski, L., Schleif, W., Blau, C., and Miller, M. D. (2000). Inhibitors of strand transfer that prevent integration and inhibit HIV-1 replication in cells. Science, 287(5453):646–50. Healy, B. and Degruttola, V. (2007). 
Hidden markov models for settings with intervalcensored transition times and uncertain time origin: application to HIV genetic analysis. Biostatistics, 8(2):438–452. Heeney, J. L., Dalgleish, A. G., and Weiss, R. A. (2006). Origins of HIV and the evolution of resistance to AIDS. Science, 313(5786):462–6. Hetherington, L. (2004). The MIT finite-state transducer toolkit for speech and language processing. Proceeding of the 8th International Conference on Spoken Language Processing. Huang, C., Tang, M., Zhang, M.-Y., Majeed, S., Montabana, E., Stanfield, R. L., Dimitrov, D. S., Korber, B., Sodroski, J., Wilson, I. A., Wyatt, R., and Kwong, P. D. (2005). Structure of a V3-containing HIV-1 gp120 core. Science, 310(5750):1025–8. Huang, Y. and Suen, C. (1995). A method of combining multiple experts for the recognition of unconstrained handwritten numberals. Ieee T Pattern Anal, 17:90–94. Jacks, T., Power, M. D., Masiarz, F. R., Luciw, P. A., Barr, P. J., and Varmus, H. E. (1988). Characterization of ribosomal frameshifting in HIV-1 gag-pol expression. Nature, 331(6153):280–3. 188 8 Bibliography Jenwitheesuk, E. and Samudrala, R. (2005). Prediction of HIV-1 protease inhibitor resistance using a protein-inhibitor flexible docking approach. Antivir Ther (Lond), 10(1):157–66. Jiang, H., Deeks, S. G., Kuritzkes, D. R., Lallemant, M., Katzenstein, D., Albrecht, M., and DeGruttola, V. (2003). Assessing resistance costs of antiretroviral therapies via measures of future drug options. J Infect Dis, 188(7):1001–8. Joachims, T. (1999). Transductive inference for text classifcation using support vector machines. International Conference on Machine Learning (ICML). Johnson, V. A., Brun-Vezinet, F., Clotet, B., Conway, B., Kuritzkes, D. R., Pillay, D., Schapiro, J. M., Telenti, A., and Richman, D. D. (2005). Update of the drug resistance mutations in HIV-1: Fall 2005. Topics in HIV medicine : a publication of the International AIDS Society, USA, 13(4):125–31. Johnson, V. 
A., Brun-Vezinet, F., Clotet, B., Gunthard, H. F., Kuritzkes, D. R., Pillay, D., Schapiro, J. M., and Richman, D. D. (2008). Update of the drug resistance mutations in HIV-1. Topics in HIV medicine : a publication of the International AIDS Society, USA, 16(5):138–45. Jordan, R., Gold, L., Cummins, C., and Hyde, C. (2002). Systematic review and metaanalysis of evidence for increasing numbers of drugs in antiretroviral combination therapy. BMJ, 324(7340):757. Kamber, M. and Han, J. (2001). Data mining: Concepts and techniques. Morgan Kaufmann. Kanki, P. J., Hamel, D. J., Sankalé, J. L., c Hsieh, C., Thior, I., Barin, F., Woodcock, S. A., Guèye-Ndiaye, A., Zhang, E., Montano, M., Siby, T., Marlink, R., NDoye, I., Essex, M. E., and MBoup, S. (1999). Human immunodeficiency virus type 1 subtypes differ in disease progression. J Infect Dis, 179(1):68–73. Kanthak, S. and Ney, H. (2004). FSA: an efficient and flexible C++ toolkit for finite state automata using on-demand computation. ACL ’04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, pages 510–517. Kantor, R., Katzenstein, D. A., Efron, B., Carvalho, A. P., Wynhoven, B., Cane, P., Clarke, J., Sirivichayakul, S., Soares, M. A., Snoeck, J., Pillay, C., Rudich, H., Rodrigues, R., Holguin, A., Ariyoshi, K., Bouzas, M. B., Cahn, P., Sugiura, W., Soriano, V., Brigido, L. F., Grossman, Z., Morris, L., Vandamme, A.-M., Tanuri, A., Phanuphak, P., Weber, J. N., Pillay, D., Harrigan, P. R., Camacho, R., Schapiro, J. M., and Shafer, R. W. (2005). Impact of HIV-1 subtype and antiretroviral therapy on protease and reverse transcriptase genotype: results of a global collaboration. PLoS Med, 2(4):e112. Karlsson Hedestam, G. B., Fouchier, R. A. M., Phogat, S., Burton, D. R., Sodroski, J., and Wyatt, R. T. (2008). The challenges of eliciting neutralizing antibodies to HIV-1 and to influenza virus. Nat Rev Microbiol, 6(2):143–55. 8 Bibliography 189 Katzenstein, D. A., Bosch, R. 
J., Hellmann, N., Wang, N., Bacheler, L., Albrecht, M. A., and Team, A. . S. (2003). Phenotypic susceptibility and virological outcome in nucleoside-experienced patients receiving three or four antiretroviral drugs. AIDS, 17(6):821–30. Kauder, S. E., Bosque, A., Lindqvist, A., Planelles, V., and Verdin, E. (2009). Epigenetic regulation of HIV-1 latency by cytosine methylation. PLoS Pathog, 5(6):e1000495. Kierczak, M., Ginalski, K., Draminski, M., Koronacki, J., Rudnicki, W., and Komorowski, J. (2009). A rough set-based model of HIV-1 reverse transcriptase resistome. Bioinformatics and Biology Insights. Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by simulated annealing. Science, 220(4598):671–680. Kittler, J., Hatef, M., Duin, R., and Matas, J. (1998). On combining classifiers. Ieee T Pattern Anal, 20(3):226–239. Kjaer, J., Høj, L., Fox, Z., and Lundgren, J. D. (2008). Prediction of phenotypic susceptibility to antiretroviral drugs using physiochemical properties of the primary enzymatic structure combined with artificial neural networks. HIV Med, 9(8):642–52. Korber, B., Muldoon, M., Theiler, J., Gao, F., Gupta, R., Lapedes, A., Hahn, B. H., Wolinsky, S., and Bhattacharya, T. (2000). Timing the ancestor of the HIV-1 pandemic strains. Science, 288(5472):1789–96. Kumar, G. N., Rodrigues, A. D., Buko, A. M., and Denissen, J. F. (1996). Cytochrome p450-mediated metabolism of the HIV-1 protease inhibitor ritonavir (ABT-538) in human liver microsomes. J Pharmacol Exp Ther, 277(1):423–31. Kuncheva, L. (2002). Switching between selection and fusion in combining classifiers: anexperiment. IEEE Transactions on Systems Man And Cybernetics, Part B-cybernetics, 32(2):146–156. Kuncheva, L., Bezdek, J., and Duin, R. (2001). Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognition, 34(2):299–314. Kyte, J. and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. 
J Mol Biol, 157(1):105–32. Lambert, P.-H., Liu, M., and Siegrist, C.-A. (2005). Can successful vaccines teach us how to induce efficient protective immune responses? Nature Medicine, 11(4 Suppl):S54–62. Landwehr, N., Hall, M., and Frank, E. (2005). Logistic model trees. Machine learning, 59:161–205. Lapins, M., Eklund, M., Spjuth, O., Prusis, P., and Wikberg, J. E. S. (2008). Proteochemometric modeling of HIV protease susceptibility. BMC Bioinformatics, 9:181. 190 8 Bibliography Larder, B., Degruttola, V., Hammer, S., Harrigan, R., Wegner, S., Winslow, D., and Zazzi, M. (2002). The international HIV resistance response database initiative: a new global collaborative approach to relating viral genotype and treatment to clinical outcome. Antivir Ther (Lond), 7:S111–S111. Larder, B., Wang, D., Revell, A., and Lane, C. (2003). Neural network model identified potentially effective drug combinations for patients failing salvage therapy. 2nd IAS Conference on HIV Pathogenesis and Treatment 13–17 July2003, Paris, France. Poster LB39. Larder, B., Wang, D., Revell, A., Montaner, J., Harrigan, R., Wolf, F. D., Lange, J., Wegner, S., Ruiz, L., Pérez-Elías, M. J., Emery, S., Gatell, J., Monforte, A. D., Torti, C., Zazzi, M., and Lane, C. (2007). The development of artificial neural networks to predict virological response to combination HIV therapy. Antivir Ther (Lond), 12(1):15– 24. Larder, B. A., Darby, G., and Richman, D. D. (1989). HIV with reduced sensitivity to zidovudine (AZT) isolated during prolonged therapy. Science, 243(4899):1731–4. Larder, B. A. and Kemp, S. D. (1989). Multiple mutations in HIV-1 reverse transcriptase confer high-level resistance to zidovudine (AZT). Science, 246(4934):1155–8. Lathrop, R. and Pazzani, M. (1999). Combinatorial optimization in rapidly mutating drug-resistant viruses. Journal of Combinatorial Optimization, 3:301–320. Le, T., Chiarella, J., Simen, B. B., Hanczaruk, B., Egholm, M., Landry, M. L., Dieckhaus, K., Rosen, M. 
I., and Kozal, M. J. (2009). Low-abundance HIV drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use. PLoS ONE, 4(6):e6079. Le Cessie, S. and Van Houwelingen, J. C. (1992). Ridge estimators in logistic regression. Applied Statistics, 41:191–201. Leitner, T., Korber, B., Daniels, M., Calef, C., and Foley, B. (2005). HIV-1 subtype and circulating recombinant form (CRF) reference sequences, 2005. HIV sequence compendium. Lengauer, T., Sander, O., Sierra, S., Thielen, A., and Kaiser, R. (2007). Bioinformatics prediction of HIV coreceptor usage. Nat Biotechnol, 25(12):1407–10. Lengauer, T. and Sing, T. (2006). Bioinformatics-assisted anti-HIV therapy. Nat Rev Microbiol, 4(10):790–7. Li, M., Gao, F., Mascola, J. R., Stamatatos, L., Polonis, V. R., Koutsoukos, M., Voss, G., Goepfert, P., Gilbert, P., Greene, K. M., Bilska, M., Kothe, D. L., Salazar-Gonzalez, J. F., Wei, X., Decker, J. M., Hahn, B. H., and Montefiori, D. C. (2005). Human immunodeficiency virus type 1 env clones from acute and early subtype B infections for standardized assessments of vaccine-elicited neutralizing antibodies. J Virol, 79(16):10108– 25. 8 Bibliography 191 Li, M., Salazar-Gonzalez, J. F., Derdeyn, C. A., Morris, L., Williamson, C., Robinson, J. E., Decker, J. M., Li, Y., Salazar, M. G., Polonis, V. R., Mlisana, K., Karim, S. A., Hong, K., Greene, K. M., Bilska, M., Zhou, J., Allen, S., Chomba, E., Mulenga, J., Vwalika, C., Gao, F., Zhang, M., Korber, B. T. M., Hunter, E., Hahn, B. H., and Montefiori, D. C. (2006). Genetic and neutralization properties of subtype C human immunodeficiency virus type 1 molecular env clones from acute and early heterosexually acquired infections in Southern Africa. J Virol, 80(23):11776–90. Liu, J., Bartesaghi, A., Borgnia, M. J., Sapiro, G., and Subramaniam, S. (2008). Molecular architecture of native HIV-1 gp120 trimers. Nature, 455(7209):109–13. Liu, R. and Yuan, B. (2001). 
Multiple classifiers combination by clustering and selection. Information Fusion, 2:163–168. Loizidou, E. Z., Kousiappa, I., Zeinalipour-Yazdi, C. D., Van de Vijver, D. A. M. C., and Kostrikis, L. G. (2009). Implications of HIV-1 M group polymorphisms on integrase inhibitor efficacy and resistance: genetic and structural in silico analyses. Biochemistry, 48(1):4–6. Lombardy, S., Regis-Gianas, Y., and Sakarovitch, J. (2004). Introducing Vaucanson. Theoretical Computer Science, 328:77–96. Lyles, R. H., Muñoz, A., Yamashita, T. E., Bazmi, H., Detels, R., Rinaldo, C. R., Margolick, J. B., Phair, J. P., and Mellors, J. W. (2000). Natural history of human immunodeficiency virus type 1 viremia after seroconversion and proximal to AIDS in a large cohort of homosexual men. J Infect Dis, 181(3):872–80. Maggiolo, F., Airoldi, M., Callegaro, A., Ripamonti, D., Gregis, G., Quinzan, G., Bombana, E., Ravasio, V., and Suter, F. (2007). Prediction of virologic outcome of salvage antiretroviral treatment by different systems for interpreting genotypic HIV drug resistance. Journal of the International Association of Physicians in AIDS Care (JIAPAC), 6:87–93. Mahungu, T., Smith, C., Turner, F., Egan, D., Youle, M., Johnson, M., Khoo, S., Back, D., and Owen, A. (2009). Cytochrome p450 2b6 516g–>t is associated with plasma concentrations of nevirapine at both 200 mg twice daily and 400 mg once daily in an ethnically diverse population. HIV Med, 10(5):310–7. Mallal, S., Phillips, E., Carosi, G., Molina, J.-M., Workman, C., Tomazic, J., Jägel-Guedes, E., Rugina, S., Kozyrev, O., Cid, J. F., Hay, P., Nolan, D., Hughes, S., Hughes, A., Ryan, S., Fitch, N., Thorborn, D., and Benbow, A. (2008). HLA-B*5701 screening for hypersensitivity to abacavir. N Engl J Med, 358(6):568–79. Mansky, L. M. and Temin, H. M. (1995). Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J Virol, 69(8):5087–94. 
Markowitz, M., Mohri, H., Mehandru, S., Shet, A., Berry, L., Kalyanaraman, R., Kim, A., Chung, C., Jean-Pierre, P., Horowitz, A., Mar, M. L., Wrin, T., Parkin, N., Poles, 192 8 Bibliography M., Petropoulos, C., Mullen, M., Boden, D., and Ho, D. D. (2005). Infection with multidrug resistant, dual-tropic HIV-1 and rapid progression to AIDS: a case report. Lancet, 365(9464):1031–8. Mattapallil, J. J., Douek, D. C., Hill, B., Nishimura, Y., Martin, M., and Roederer, M. (2005). Massive infection and loss of memory CD4+ T cells in multiple tissues during acute SIV infection. Nature, 434(7037):1093–7. McCallister, S., Lalezari, J., Richmond, G., Thompson, M., Harrigan, R., Martin, D., Salzwedel, K., and Allaway, G. (2008). HIV-1 Gag polymorphisms determine treatment response to bevirimat (PA-457). Antivir Ther (Lond), 13(4):A10–A10. McColl, D. J., Fransen, S., Gupta, S., Parkin, N., Margot, N., Chuck, S., Cheng, A. K., and Miller, M. D. (2007). Resistance and cross-resistance to first generation integrase inhibitors: insights from a Phase II study of elvitegravir (GS-9137). Antivir Ther (Lond), 12(5):S11–S11. Meynard, J.-L., Vray, M., Morand-Joubert, L., Race, E., Descamps, D., Peytavin, G., Matheron, S., Lamotte, C., Guiramand, S., Costagliola, D., Brun-Vézinet, F., Clavel, F., Girard, P.-M., and Group, N. T. (2002). Phenotypic or genotypic resistance testing for choosing antiretroviral therapy after treatment failure: a randomized trial. AIDS, 16(5):727–36. Miller, M. D. and Hazuda, D. J. (2004). HIV resistance to the fusion inhibitor enfuvirtide: mechanisms and clinical implications. Drug Resist Updat, 7(2):89–95. Mocroft, A., Vella, S., Benfield, T. L., Chiesi, A., Miller, V., Gargalianos, P., d’Arminio Monforte, A., Yust, I., Bruun, J. N., Phillips, A. N., and Lundgren, J. D. (1998). Changing patterns of mortality across europe in patients infected with hiv-1. eurosida study group. Lancet, 352(9142):1725–30. Mohri, M. (2002). 
Semiring frameworks and algorithms for shortest-distance problems. Journal of Automata, Languages and Combinatorics, 7(3):321–350. Mohri, M. (2005). Statistical natural language processing. M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press., pages 199–226. Mohri, M., Pereira, F., and Riley, M. (2000). The design principles of a weighted finite-state transducer library. Theoretical Computer Science, 231:17–32. Mohri, M. and Riley, M. (2001). A weight pushing algorithm for large vocabulary speech recognition. Proceedings of the Seventh European Conference on Speech Communication and Technology, pages 1303–1606. Mohri, M. and Riley, M. (2002). An efficient algorithm for the N-best-strings problem. Proceedings of the Seventh International Conference on Spoken Language Processing, pages 1313–1316. Montagnier, L. (2002). 298(5599):1727–8. Historical essay. a history of HIV discovery. Science, 8 Bibliography 193 Müller, F. (2008). Inferring virological response to antiretroviral combination therapy based on past treatment lines. Bachelor Thesis, Saarland University, Saarbrücken, Germany. Müller, F. (2009). Assessing antibody neutralization oh HIV-1 as an initial step in the search of gp160-based immunogens. Master Thesis, Saarland University, Saarbrücken, Germany. Ney, H., Essen, U., and Kneser, R. (1994). On structuring probabilistic dependences in stochastic language modelling. Computer Speech and Language, 8:1–38. Nijhuis, M., van Maarseveen, N. M., Lastere, S., Schipper, P., Coakley, E., Glass, B., Rovenska, M., de Jong, D., Chappey, C., Goedegebuure, I. W., Heilek-Snyder, G., Dulude, D., Cammack, N., Brakier-Gingras, L., Konvalinka, J., Parkin, N., Kräusslich, H.-G., Brun-Vezinet, F., and Boucher, C. A. B. (2007). A novel substrate-based HIV-1 protease inhibitor drug resistance mechanism. PLoS Med, 4(1):e36. Novembre, J., Galvani, A. P., and Slatkin, M. (2005). The geographic spread of the CCR5 δ32 HIV-resistance allele. 
PLoS Biol, 3(11):e339. Nowozin, S., BakIr, G., and Tsuda, K. (2007). Discriminative subsequence mining for action classification. 11th IEEE International Conference on Computer Vision, pages 1919–1923. Oette, M., Kaiser, R., Däumer, M., Petch, R., Fätkenheuer, G., Carls, H., Rockstroh, J. K., Schmalöer, D., Stechel, J., Feldt, T., Pfister, H., and Häussinger, D. (2006). Primary HIV drug resistance and efficacy of first-line antiretroviral therapy guided by resistance testing. J Acquir Immune Defic Syndr, 41(5):573–81. Pantaleo, G., Graziosi, C., and Fauci, A. S. (1993). New concepts in the immunopathogenesis of human immunodeficiency virus infection. N Engl J Med, 328(5):327–35. Pantophlet, R. and Burton, D. R. (2006). GP120: target for neutralizing HIV-1 antibodies. Annu Rev Immunol, 24:739–69. Parikh, U. M., Bacheler, L., Koontz, D., and Mellors, J. W. (2006). The K65R mutation in human immunodeficiency virus type 1 reverse transcriptase exhibits bidirectional phenotypic antagonism with thymidine analog mutations. J Virol, 80(10):4971–7. Pereira, F. and Riley, M. (1997). Speech recognition by composition of weighted finite automata. Roche, E. and Schabes, Y. (Eds.) Finite-State language processing. MIT Press, pages 431–453. Perelson, A. S. (2002). Modelling viral and immune system dynamics. Nat Rev Immunol, 2(1):28–36. Perner, J., Altmann, A., and Lengauer, T. (2009). Semi-supervised learning for improving prediction of HIV drug resistance. Lecture Notes in Informatics, Proceedings of the German Conference on Bioinformatics 2009, P-157:55–65. 194 8 Bibliography Petropoulos, C. J., Parkin, N. T., Limoli, K. L., Lie, Y. S., Wrin, T., Huang, W., Tian, H., Smith, D., Winslow, G. A., Capon, D. J., and Whitcomb, J. M. (2000). A novel phenotypic drug susceptibility assay for human immunodeficiency virus type 1. Antimicrob Agents Chemother, 44(4):920–8. Pettit, S. C., Everitt, L. E., Choudhury, S., Dunn, B. M., and Kaplan, A. H. (2004). 
Initial cleavage of the human immunodeficiency virus type 1 gagpol precursor by its activated protease occurs by an intramolecular mechanism. J Virol, 78(16):8477–85. Phogat, S. and Wyatt, R. (2007). Rational modifications of HIV-1 envelope glycoproteins for immunogen design. Curr Pharm Des, 13(2):213–27. Picker, L. J. (2006). Immunopathogenesis of acute AIDS virus infection. Curr Opin Immunol, 18(4):399–405. Plantier, J.-C., Leoz, M., Dickerson, J. E., Oliveira, F. D., Cordonnier, F., Lemée, V., Damond, F., and Simon, D. L. R. F. (2009). A new human immunodeficiency virus derived from gorillas. Nature Medicine. Plotkin, S. A. (2005). Vaccines: past, present and future. Nature Medicine, 11(4 Suppl):S5– 11. Popovic, M., Sarngadharan, M. G., Read, E., and Gallo, R. C. (1984). Detection, isolation, and continuous production of cytopathic retroviruses (HTLV-III) from patients with AIDS and pre-AIDS. Science, 224(4648):497–500. Prilusky, J., Felder, C. E., Zeev-Ben-Mordehai, T., Rydberg, E. H., Man, O., Beckmann, J. S., Silman, I., and Sussman, J. L. (2005). Foldindex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics, 21(16):3435–8. Prosperi, M., Giambenedetto, S. D., Trotta, M., Cingolani, A., Ruiz, L., Baxter, J., Clevenbergh, P., Perno, C., Cauda, R., Ulivi, G., Antinori, A., and De Luca, A. (2004). A fuzzy relational system trained by genetic algorithms and HIV-1 resistance genotypes/virological response data from prospective studies usefully predicts treatment outcomes. Antivir Ther (Lond), 9(4):U89–U89. Prosperi, M., Rosen-Zvi, M., Altmann, A., Aharoni, E., Peres, Y., Sönnerbord, A., Incardona, F., Struck, D., Kaiser, R., and Zazzi, M. (2009a). Antiretroviral therapy optimisation without genotype resistance testing: a perspective on treatment history based models. Reviews in Antiviral Therapy, 1:28–29. Prosperi, M., Zazzi, M., Perno, C., Giambenedetto, S. 
Appendix: List of Publications

Jasmina Bogojeska, Steffen Bickel, André Altmann, Thomas Lengauer (submitted). Dealing with sparse data in predicting outcomes of HIV combination therapies. Bioinformatics (submitted).

André Altmann*, Laura Tolosi*, Oliver Sander, Thomas Lengauer (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10): 1340-1347.

Hendrik Weisser, André Altmann, Saleta Sierra, Francesca Incardona, Daniel Struck, Anders Sönnerborg, Rolf Kaiser, Maurizio Zazzi, Monika Tschochner, Hauke Walter, and Thomas Lengauer (2010). Only slight impact of predicted replicative capacity for therapy response prediction. PLoS ONE, 5(2): e9044.

Thomas Lengauer, André Altmann, Alexander Thielen, Rolf Kaiser (2010). Chasing the AIDS virus. Communications of the ACM, 53(3): 66-74.

Thomas Lengauer, André Altmann, Alexander Thielen (2009). Bioinformatische Unterstützung der Auswahl von HIV-Therapien. Informatik-Spektrum, 32(4): 320-331.

Juliane Perner, André Altmann, Thomas Lengauer (2009). Semi-Supervised Learning for Improving Prediction of HIV Drug Resistance. Lecture Notes in Informatics (Proceedings of GCB 2009), P-157: 55-65.

Nadine Sichtig, Saleta Sierra, Rolf Kaiser, Martin Däumer, Stefan Reuter, Eugen Schülter, André Altmann, Gerd Fätkenheuer, Ulf Dittmer, Herbert Pfister, Stefan Esser (2009). Evolution of Raltegravir Resistance During Therapy. Journal of Antimicrobial Chemotherapy, 64: 25-32.

Mattia CF Prosperi, André Altmann, Michal Rosen-Zvi, Ehud Aharoni, Gabor Borgulya, Fulop Bazso, Anders Sönnerborg, Yardena Peres, Eugen Schülter, Daniel Struck, Giovanni Ulivi, Francesca Incardona, Anne-Mieke Vandamme, Jürgen Vercauteren and Maurizio Zazzi for the EuResist and Virolab study groups (2009). Investigation of Expert Rule Bases, Logistic Regression, and Non-Linear Machine Learning Techniques for Predicting Response to Antiretroviral Treatment. Antiviral Therapy, 14: 433-442.

André Altmann*, Tobias Sing*, Hans Vermeiren, Bart Winters, Elke Van Craenenbroeck, Koen Van der Borght, Soo-Yon Rhee, Robert W Shafer, Eugen Schülter, Rolf Kaiser, Yardena Peres, Anders Sönnerborg, W Jeffrey Fessel, Francesca Incardona, Maurizio Zazzi, Lee Bacheler, Herman Van Vlijmen, Thomas Lengauer (2009). Advantages of Predicted Phenotypes and Statistical Learning Models in Inferring Virological Response to Antiretroviral Therapy from HIV Genotype. Antiviral Therapy, 14: 273-283.

André Altmann, Martin Däumer, Niko Beerenwinkel, Yardena Peres, Eugen Schülter, Joachim Büch, Soo-Yon Rhee, Anders Sönnerborg, Jeffrey Fessel, Robert W Shafer, Maurizio Zazzi, Rolf Kaiser and Thomas Lengauer (2009). Predicting response to combination antiretroviral therapy: retrospective validation of geno2pheno-THEO on a large clinical database. Journal of Infectious Diseases, 199(7): 999-1006.

André Altmann, Michal Rosen-Zvi, Mattia Prosperi, Ehud Aharoni, Hani Neuvirth, Anders Sönnerborg, Eugen Schülter, Joachim Büch, Daniel Struck, Yardena Peres, Francesca Incardona, Rolf Kaiser, Maurizio Zazzi and Thomas Lengauer (2008). Comparison of Classifier Fusion Methods for Predicting Response to Anti HIV-1 Therapy. PLoS ONE, 3(10): e3470.

Jasmina Bogojeska, Adrian Alexa, André Altmann, Thomas Lengauer and Jörg Rahnenführer (2008). Rtreemix: an R package for estimating evolutionary pathways and genetic progression scores. Bioinformatics, 24(20): 2391.

Michal Rosen-Zvi, André Altmann, Mattia Prosperi, Ehud Aharoni, Hani Neuvirth, Anders Sönnerborg, Eugen Schülter, Daniel Struck, Yardena Peres, Francesca Incardona, Rolf Kaiser, Maurizio Zazzi and Thomas Lengauer (2008). Selecting anti-HIV therapies based on a variety of genomic and clinical factors. Bioinformatics, 24(13): i399-406.

André Altmann, Niko Beerenwinkel, Tobias Sing, Igor Savenkov, Martin Däumer, Rolf Kaiser, Soo-Yon Rhee, W Jeffrey Fessel, Robert W Shafer and Thomas Lengauer (2007). Improved prediction of response to antiretroviral combination therapy using the genetic barrier to drug resistance. Antiviral Therapy, 12(2): 169-178.