INSIGHTS FROM PATIENT AUTHORED TEXT: FROM CLOSE READING TO AUTOMATED EXTRACTION

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DIANA LYNN MACLEAN
MARCH 2015

© 2015 by Diana Lynn MacLean. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/nh030tg4542

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Jeffrey Heer, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Michael Bernstein

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Christopher Manning

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Stuart Card

Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Millions of people collaborate online with others who share their health concerns. In the process, these users perform complex health-related tasks, such as differential diagnosis and treatment comparison.
The result is a massive, growing and readily accessible corpus of patient authored text (PAT) that documents patients’ behavior outside of the clinical environment. As a result, PAT can provide insights into otherwise obscure topics, such as why patients follow only certain parts of a treatment protocol, or how people self-treat stigmatized conditions such as prescription drug addiction.

Despite the potential value of PAT, attempts to extract medically-relevant insights from it have been limited. PAT is notoriously noisy and challenging to work with, and there is a dearth of methods and tools for processing and analyzing it. Moreover, the specific research questions that PAT can support are not obvious: determining what data PAT encodes, and how, is a challenge in and of itself.

In this thesis, I develop methods for automatically extracting medically-relevant data from PAT. I focus specifically on the topic of addiction: a stigmatized and prevalent medical condition. Building on close readings of source text to inform schema induction, data annotation, and feature engineering, I train classifiers that accurately identify (1) medically-relevant terms in PAT; (2) users’ motivations for participating in an addiction-related online health community; (3) users’ drugs of choice; and (4) users’ transitions through relapse and recovery. Using these classifiers to scale analyses to large PAT corpora, I derive novel insights into the process of addiction, as well as the role that online health communities play in giving users informational and emotional support and, ultimately, in enabling recovery. In concert, these contributions both underscore PAT’s latent value for illuminating poorly understood or clandestine medical topics, and offer viable methods that dramatically improve our ability to realize this value.

For Angus and June

Acknowledgements

My first and foremost thanks go to my advisor, Jeffrey Heer.
Jeff has been a wonderful source of support, knowledge and inspiration during my time at Stanford, and I am deeply indebted to him for not only supporting my curiosity as my research ventured into uncharted territory, but for doing so with enthusiasm and confidence. Most importantly, however, Jeff has been an exemplary role model. I am lucky, grateful, and unquestionably better for having had the opportunity to learn from him, and am proud to be taking that with me as I start my next great adventure.

There are several people without whom this dissertation would not have been possible: Anna Lembke, who brought with her invaluable medical perspective, and whose enthusiasm, thoughtful insight and patience were instrumental in making this cross-disciplinary work a reality; Stuart Card, whose ingeniousness I aspire to, and whose advice I have had the fortune to benefit from on several occasions; Sonal Gupta, a close friend and collaborator from whom I have learned a great many things, and hope to learn many more; and Michael Bernstein and Christopher Manning, who have given generously of their time and advice, helping to steer this work from its inception through its completion.

I am fortunate to have had many wonderful co-conspirators while at Stanford. Sudheendra Hangal, whose patient support and advice were instrumental in my early graduate school years, has been a fantastic collaborator and a dear friend. Monica Lam, with whom I worked closely during my first year, remains an uplifting source of inspiration. The UW IDL group, the Stanford HCI group, and the fantastic people in the 3B wing have been a fun, dynamic and reliable source of new ideas, feedback and camaraderie, and will be greatly missed. Finally, Jillian Lentz and Monica Niemiec deserve special thanks for not only providing efficient administrative support, but also for answering even the most frantic of questions with a smile.

Finally, there are some people without whom I would not be where I am today.
The inimitable Margo Seltzer who, suffice it to say, started this whole business in the first place; David Holland, whose patient and thorough technical tutelage stands me in good stead to this day; Will Phan, who helped me to see the real joy in coding; my mother, Heather, who is the embodiment of never giving up; and, of course, my husband, Isa, who inspires and challenges me to be a little better every day. It makes all the difference.

Table of Contents

1 Introduction
  1.1 Overview & Focus
  1.2 Contributions
  1.3 Outline of Thesis
2 The Internet and Health
  2.1 Online Health Information Seeking
    2.1.1 Historical Overview & Current Landscape
    2.1.2 What Health Information Do Users Seek Online?
    2.1.3 Who Seeks Health Information Online?
      Gender
      Age
      Health
      Race
      Socio-Economic Status & Education
      Role (Patient vs. Caregiver)
    2.1.4 Where Do People Find Health Information Online?
  2.2 Online Health Community Participation
    2.2.1 Modes of Participation
    2.2.2 Who Participates in OHCs?
    2.2.3 Reasons for Participation
      Medium-Based Affordances
      Informational Support
      Emotional Support
    2.2.4 Efficacy of Online Health Forums
  2.3 Summary
3 Prior Work on Patient Authored Text
  3.1 Patient Authored Text (PAT): Introduction & Overview
    3.1.1 Value of PAT
    3.1.2 Challenges of Working with PAT
      Noisiness
      Lack of Analysis Tools
      Applicability to Research Questions
  3.2 Syndromic Surveillance
    3.2.1 Condition
    3.2.2 Data Source
    3.2.3 Filtering
    3.2.4 Modeling and Prediction
    3.2.5 Real-World Evaluation Dataset
  3.3 Pharmacovigilance
    3.3.1 Data Source
    3.3.2 Identifying Drugs in PAT
    3.3.3 Identifying Adverse Events in PAT
    3.3.4 Evaluation
  3.4 Named Entity Recognition
    3.4.1 Ontology-Based Tools
    3.4.2 Statistical Classifiers
  3.5 Thematic Analysis
    3.5.1 Condition
    3.5.2 Data Source
    3.5.3 Analysis Question
    3.5.4 Scaling Thematic Analyses
  3.6 Summary
4 Data
  4.1 MedHelp Corpus
    4.1.1 Terminology
    4.1.2 Forum77
  4.2 CureTogether Corpus
5 Identifying Medically Relevant Terms in PAT
  5.1 Introduction
  5.2 Related Work
    5.2.1 Medical Term Identification
    5.2.2 Consumer Health Vocabularies
  5.3 Data
    5.3.1 Preparation
    5.3.2 Samples
  5.4 Labeling Medically Relevant Terms with the Crowd
    5.4.1 Task Design and Pilot Study
    5.4.2 Experiment
      Determining a Gold Standard
      Comparing Turkers Against a Gold Standard
    5.4.3 Results
    5.4.4 Limitations of the Crowd
  5.5 Training a Classifier on Crowd-Labeled Data
    5.5.1 Models
    5.5.2 Design
    5.5.3 Results
      Failure Analysis
  5.6 Example Applications of ADEPT to PAT
    5.6.1 Summarizing Important Medical Content in MedHelp’s Arthritis Forum
    5.6.2 Navigating MedHelp’s Substance Abuse Forum (Forum77)
  5.7 Conclusion
6 What do People Seek on Forum77?
  6.1 Why Study Addiction?
    6.1.1 Addiction is Highly Prevalent
    6.1.2 Addiction is Highly Stigmatized
    6.1.3 People are Turning Online for Help with Addiction
  6.2 Related Work
  6.3 Data
    6.3.1 Thematic Analysis Development Dataset
    6.3.2 Labeled Training & Testing Dataset
  6.4 Who Posts?
  6.5 Users’ Objectives in Initiating Discussions
  6.6 Classifying Informational vs. Emotional Support
    6.6.1 Training Dataset Annotation and Agreement
    6.6.2 Classifier Training
    6.6.3 Classifier Performance
  6.7 Classifying Updates vs. Non-updates
    6.7.1 Classifier Training
    6.7.2 Classifier Performance
  6.8 Results
    6.8.1 Thomas’ Recipe: An Informal Collaboration
  6.9 Discussion
    6.9.1 Limitations and Future Work
  6.10 Summary
7 Identifying Drugs of Choice
  7.1 Related Work
  7.2 Datasets
  7.3 Automatically Identifying Drugs of Choice
    7.3.1 Definition of Drug of Choice
    7.3.2 Data Annotation
    7.3.3 Classifier Training & Evaluation
      Results
    7.3.4 Drug Term Resolution
  7.4 Comparing Real-World DOC Distributions
    7.4.1 Forum77
    7.4.2 Narcotics Anonymous
    7.4.3 TEDS
    7.4.4 DAWN
  7.5 Results
  7.6 Discussion
    7.6.1 Limitations & Future Work
  7.7 Summary
8 Quantifying Recovery and Relapse
  8.1 Introduction
  8.2 Background
    8.2.1 The Prescription Drug Abuse Cycle
      Withdrawal
      Self-Detoxification
      Relapse & Recovery
    8.2.2 In-Person Mutual Help Groups
    8.2.3 Inferring Health State from Social Media
  8.3 Data
  8.4 Exploring & Modeling Phases of Addiction
    8.4.1 Transtheoretical Model for Behavior Change
    8.4.2 Rubric Development
    8.4.3 A Taxonomy of the Phases of Addiction
    8.4.4 Labeling People, not Posts
  8.5 Characterizing the Phases of Addiction
    8.5.1 Sample & Labeling
    8.5.2 Activity Features
    8.5.3 Linguistic & Content Features
      LIWC Features
      Days Mentioned and Question Features
      Phase-Specific Term Features
    8.5.4 Results: Activity and Linguistic Features
  8.6 Automatically Classifying Addiction Phase
    8.6.1 Model & Features
    8.6.2 Performance
    8.6.3 Results
  8.7 Automatically Classifying Relapse and Recovery
    8.7.1 Identifying Relapse
    8.7.2 Identifying Recovery
    8.7.3 Results
  8.8 Discussion
    8.8.1 Use and Efficacy of Forum77
    8.8.2 Implications for Forum Design
    8.8.3 Implications for Addiction Treatment
    8.8.4 Limitations
  8.9 Summary
9 Conclusion
  9.1 Contribution Summary
  9.2 Future Work
    9.2.1 Supporting the Methodological Process
      Interface Support for Thematic Analysis
      Improved Tools for Annotation
      Mapping the Limits of the Crowd in PAT Annotation Tasks
    9.2.2 PAT Interface Design & Support
      Expose Aggregate Data to Users
      Support Data Entry
      Automatically Construct User Timelines
    9.2.3 Making the Leap to Medical Discoveries
  9.3 Concluding Remarks
A ADEPT Supplementary Material
B F77 Purpose Supplementary Material
C F77 Drug of Choice Supplementary Material
D F77 Phase Supplementary Material

List of Tables

4.1 Top 40 MedHelp forums ranked by total post count. A ◦ in the Stigmatized column denotes our conservative estimate of whether the condition represented by the forum carries a stigma or is otherwise embarrassing.
5.1 Majority vote at the token level over RN responses. Terms identified by RNs as medically relevant are shown in bold. Stopwords (e.g., “and”, “of”) are excluded from the vote.
5.2 Turker performance against the RN gold standard. Voting threshold indicates the minimum number of Turkers who have to annotate a term as medically relevant for it to be included in the result. Maximum column values are indicated in bold. A corroborative policy of 2+ votes yields high scores across the board, and maximizes F1-score.
5.3 Annotator performance against the crowd-labeled data set and the gold standards. Maximum column values are indicated in bold.
5.4 Examples of ADEPT’s misclassifications in the test corpora.
6.1 Summary statistics of a representative sample of online health communities focused on addiction recovery. We identified sites through Google searches and gathered statistics (if available) from site pages. Data current as of 3/1/2014.
6.2 Annotator-derived taxonomy for users’ objectives in initiating a post, with % prevalence in the 1,000 post labeled sample on the right. Note that (1) labels are mutually exclusive, and (2) “w/d” stands for “withdrawal”.
6.3 Descriptions and samples of taxonomy labels. Samples are synthesized in order to preserve user privacy.
6.4 Classifier performance for labeling initiating posts as seeking informational support or emotional support. Performance scores are averaged over 10 folds.
6.5 Classifier performance labeling posts as either update or non-update. Performance scores are averaged over 10 folds.
6.6 Thomas’ Recipe (circa 2001)
6.7 Thomas’ Recipe (circa 2006)
7.1 DOC classifier performance across term categories. The classifier performs best on correctly spelled, specific drug terms; worst on general drug terms.
7.2 Examples of DOCs extracted by our CRF classifier. Identified SOA terms are shown in bold in the context of their originating sentence, and the resolved drug name, generic name and category are shown on the right.
7.3 Summary of similarities and differences between our Forum77, NA, TEDS and DAWN datasets. Forum77 is unique in that participation is always voluntary and that users report only substances that they deem relevant.
7.4 Alignment of categories across the Forum77, NA, TEDS and DAWN datasets for comparative purposes. Exact category terms from each survey have been preserved in this table for replicability.
8.1 Addiction Phase Taxonomy derived via a thematic analysis.
8.2 Sample phase specific terms for the USING, WITHDRAWING and RECOVERING categories.
8.3 CRF performance scores aggregated over 10 runs of 10-fold cross validation, with randomly shuffled input sets.
8.4 Performance for identifying relapse events (top) and whether a user’s final state is RECOVERING (bottom). Combined scores across classes are shown in bold.
8.5 Comparison of activity features for users who are and are not RECOVERING in their last initiating post. Per-user values are aggregated over USING and WITHDRAWING posts. Statistical significance is determined using Kruskal-Wallis tests (*** p < 0.001) after Bonferroni corrections.
A.1 The following features are specified when training our CRF. Other features retain their default values as described at http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html
B.1 Features used to train our purpose classifiers, which distinguish emotional from informational support seeking, as well as update from non-update posts.
C.1 Drug term resolution map, manually compiled from classifier output. The i column indicates whether the drug category is included in our analysis in Chapter 7.
C.2 The default feature list for Stanford’s NER classifier is at nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html. Here, we list all features whose default values were changed to train our DOC classifier.
C.3 Gazette of common substances used as a feature in the DOC classifier. This gazette was compiled from a range of online resources.
D.1 LIWC features for the three classes in the labeled dataset over initiating posts. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes activity features). Column c denotes (◦) if the feature is used in our CRF classifier.
D.2 LIWC features for the three classes in the labeled dataset. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes activity features). Column c denotes (◦) if the feature is used in our CRF classifier.
D.3 Activity and content-based features for the three classes in the labeled dataset. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes 160 LIWC variables). Column c denotes (◦) if the feature is used in our CRF classifier.

List of Figures

1.1 Our general methodological process. Nodes in grey show avenues for future work supported by our contributions.
4.1 Illustrative example of MedHelp and Forum77 content and structure.
4.2 Summary statistics of Forum77 variables: post volume by month (A), user volume by month (B), thread length distribution (C), user tenure distribution (D), user initiating post count distribution (E), and user response post count distribution (F).
5.1 Final PAT medical term identification task instructions and interface. Turkers were informed that their answers would be checked against other Turkers’ in the HIT description on the MTurk interface.
5.2 Sample sentences labeled by ADEPT, the dictionary, MetaMap, OBA and TerMINE.
5.3 Term classification accuracy plotted against logged term frequency in test corpora. Purple (darker) circles represent terms that are always classified correctly; blue (lighter) circles represent terms that are misclassified at least once. A LOWESS fit line to the entire data set (black) shows that most terms are always classified correctly. A LOWESS fit line to the misclassified points (blue/lighter) shows that classification accuracy increases with term frequency.
5.4 Top 50 terms, ranked by frequency, derived from MedHelp’s Arthritis forum as determined by ADEPT (left) and OBA (right). Terms unique to their respective portion of the list are shown in bold. Terms occurring in both lists are linked with a line. The gradient of these lines shows that all co-occurring terms, bar three, are more highly ranked by ADEPT.
58 5.5 A graph showing important terms in Forum77 (nodes), and significant co-occurrence relationships between them (edges). Node size is proportional to degree, while colors indicate clusters. Node labels are omitted for legibility; instead, we examine main clusters in-depth in subsequent figures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi 59 5.6 The largest cluster in Figure 5.5 suggests that discussions frequently involve detoxification from prescription drugs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.7 The second-largest cluster in Figure 5.5 suggests that discussions frequently pair specific drugs and the withdrawal symptoms that they cause. . . . . . . . . . . . . . . . . . . . . 60 5.8 The third-largest cluster in Figure 5.8 contains medically relevant terms from Thomas’ Recipe: a user-developed schedule for medication-assisted opioid withdrawal. . . . . . . 61 6.1 Thematic analysis process. Orange edges indicate the iterative component of the analysis. 70 6.2 Normalized transition probabilities and average transition times between consecutive update and non-update posts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 7.1 Drug of choice distributions (% of population using) across the Forum77, TEDS, NA and DAWN data sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 7.2 Prevalence of major opioids in the Forum77 population over time. . . . . . . . . . . . . . 92 8.1 Illustration of how sequence analysis can (1) reduce NA labels by leveraging context from surrounding posts, and (2) capture relapse events in regressive sequences without requiring the user to explicitly state that she relapsed. . . . . . . . . . . . . . . . . . . . . . . . 104 8.2 Confusion matrix for our CRF classifier aggregated across 10 randomized runs of 10-fold cross validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
8.3 (a) Normalized transition frequencies between addiction phases (e.g., USING → RECOVERING edges comprise 1.12% of the total transitions in the CRF-labeled data) and (b) conditional transition probabilities (e.g., the probability of a user moving from USING to RECOVERING is 4.57%).
8.4 Distributions of phase lengths. A red bar indicates the median value, while the dark blue region indicates the middle spread. The light blue region indicates values that fall within 1.5 × the interquartile range of the middle spread.
8.5 Aggregated user transitions from start to end state. Bar widths denote population proportion. For example, 48% of users in our sample relapsed during their tenure on Forum77.
9.1 Our general methodological process. Nodes in grey show avenues for future work supported by our contributions.

Chapter 1 Introduction

Just keep in mind that whether you recommend online support groups or not, your patients will use them. There’s no getting around the fact that certain patients in your practice will become as knowledgeable about their conditions as they can. They will also begin to develop clinical judgment on their own.
– Deborah Grandinetti: Doctors and the Web. Help your patients surf the Net safely [104].

1.1 Overview & Focus

The Internet has revolutionized the way in which people interact with medical knowledge, transforming its availability, leveling the playing field in terms of who can contribute such knowledge, and facilitating connections between people with shared health concerns. While to this day accessing and sharing medical knowledge via traditional resources (e.g., medical practitioners, textbooks, pamphlets, etc.) requires overcoming financial, scheduling and geographic barriers, such frictions are divorced from online resources.
Indeed, the use of the Internet as a health resource is one of its earliest functions: with the commercialization of the Internet in 1995, patients readily took advantage of the ability to collaborate with others who shared their health concerns, and the first online health communities (OHCs), in the form of listservs, came into existence. Demand for such groups remains high today. Pew’s 2013 Health Online survey [91] reported that 59% of U.S. adults looked online for health information in the last year, and that of these, 16-18% specifically sought to find others who shared their health concerns. Based on the U.S. Census Bureau’s population estimate for 2013 [9], this comprises some 50-57 million people. Today, thousands of OHCs exist, and while their interfaces have become slightly more sophisticated, their underlying functionality of connecting patients with mutual health interests remains unchanged.

Through participation in online mutual help groups, patients can spend a sizable number of hours performing complex, health-related tasks. These include differential diagnoses (of either their own or someone else’s condition), treatment comparison and evaluation, symptom measurement and documentation, and seeking and providing emotional support. To perform these tasks, patients draw on a variety of resources: their own experiential knowledge, observations of other community members’ experiences, information sourced from healthcare providers, and the fruits of self-directed research efforts. The culmination of this effort is a massive, and growing, corpus of data contributed by patients who have gained not a small degree of clinical expertise in their own condition.
Although in some cases these data are structured (e.g., PatientsLikeMe, http://www.patientslikeme.com, and CureTogether, http://www.curetogether.com, collect symptom severity measurements on numerical scales), for the most part OHCs have barely deviated from the original listserv format, meaning that a large portion of these data exist as free-form text. We term any medical text authored by patients patient authored text (PAT).

PAT contains inherently valuable content. Foremost, PAT uniquely documents patients’ behavior outside of the clinical environment. As such, it can host insight into topics that remain obscure in traditional medical data sets, such as why patients follow only certain parts of a treatment protocol, or how people self-manage conditions that carry a stigma in the medical profession, like addiction [176, 187]. Answers to such questions could have high-level policy impacts on healthcare systems, potentially affecting both their efficiency and efficacy.

PAT may also contain data of immediate medical value. Prior work has leveraged PAT to identify disease trends [33, 41] and adverse drug events [257]. Through active collaboration, OHC participants have uncovered novel insights into disease co-morbidities (such as a correlation between asthma and infertility [40]) and drug-treatment effects (such as the questionable efficacy of lithium as a treatment for ALS [260]) which have been replicated in subsequent medical trials [97, 260]. Finally, medically-relevant data derived from PAT could be used both to enhance community design and to support members in tasks that they already perform, such as polling treatment popularity or sourcing drug reviews.

In spite of the inherent value in PAT and the enormous number of human-intelligence hours invested in its creation, attempts to leverage PAT have been limited for three main reasons. First, PAT is notoriously noisy and often incomplete, making it challenging to work with. For example, the fact that authors may have only partial mastery over medical terminology casts the accuracy of their symptom descriptions into doubt. Moreover, they may omit important information and their contributions may be infrequent and irregular.

Second, and closely related, is the dearth of methods, approaches and toolkits for extracting medically-meaningful data from PAT. Take, for example, the basic problem of identifying medically-relevant terms in PAT. While well-established toolkits for extracting medical terms from text authored by medical experts exist, as we show in Chapter 5 their performance on PAT is sufficiently poor that the resulting output is of dubious analytic value.

Third, the question of whether PAT contains data of medical relevance is contentious. As we discuss in detail in Chapter 2, medical professionals especially take issue with such claims. Even taking an open-minded perspective, however, the medical relevance of PAT in relation to a specific research question is usually unclear, and must be determined empirically. This relevance tends to depend on how well the research question aligns with users’ motivations for authoring PAT. For example, because people mention their influenza-like symptoms on social media platforms, Twitter is a viable data source for monitoring influenza outbreaks [10, 15, 62, 213]. However, Twitter would be a poor data source for comparing drug dosage efficacy, because people do not consistently tweet drug dosages, schedules, and self-reported wellness metrics. Determining what medically relevant signals are present in PAT is a challenge separate from extracting them.

Our goals in this work are twofold: first, to develop methods for extracting a variety of medically-relevant data from PAT; second, to uncover medically-meaningful insights through the application of these methods.
To this end, we focus specifically on the topic of addiction, studying Forum77: the online health community for Addiction & Substance Abuse hosted by MedHelp (http://www.medhelp.org). Addiction is both highly prevalent, affecting 16% of Americans ages 12 or older (about 40 million people), which far exceeds the number of people afflicted with heart disease (27 million), diabetes (26 million), or cancer (19 million) [4], and highly stigmatized, even within the medical profession [176, 187]. These facts conspire to make addiction-related PAT a rich source for novel and impactful insights.

Our work draws from and contributes to several fields in Computer Science. From the Human Computer Interaction perspective, we investigate crowdsourcing as a method for large-scale data annotation, and leverage methodological work on thematic analyses to develop taxonomies of medically relevant information contained in our PAT data sources. From the Computer Supported Cooperative Work perspective, we investigate the types of support that users give and receive, and analyze on-site behavioral and content features that correlate with successful and unsuccessful participatory outcomes in Forum77. On the Natural Language Processing side, we evaluate the application and extension of existing statistical classification methods to a variety of PAT information extraction tasks. Finally, to guide the validity of our work from a medical perspective, we collaborated closely with an addiction specialist: a practicing psychiatrist who specializes in the topic of substance use disorders.

1.2 Contributions

In concert, this thesis contributes a viable, multi-stage approach for finding and extracting data of medical relevance from PAT. The specific contributions of this thesis are:

Targeted literature reviews that serve both to illuminate the landscape of related work and to contextualize our own work.
In particular, we review:

Online health seeking behavior: via a cross-disciplinary literature review, we first synthesize an overview of the demographics, methods and motives of people who seek health information online. Next, we narrow our focus to the specific topic of OHC participation, exploring users’ reasons for participation as well as whether and how such participation is beneficial (Chapter 2).

Prior work analyzing patient authored text: we conduct an extensive review of literature utilizing PAT as a primary data source, including work on pharmacovigilance, syndromic surveillance, entity extraction and thematic analyses (Chapter 3). To our knowledge, this review is the first comprehensive synthesis and summary of data sources, methods, goals and outcomes of prior work that utilizes PAT as a primary data source.

Methods for extracting medically-relevant data from PAT. Our characteristic methodology, illustrated in Figure 1.1, moves through human categorization and labeling of data to automatic extraction and analysis. Accordingly, our methods comprise multiple stages, including inductive content analysis, data annotation, feature engineering, classifier training and result analysis. Our specific contributions are:

A method for crowdsourcing medically-relevant term annotation in PAT. Having medical experts annotate data is both costly and slow. We show that for the task of identifying medically-relevant terms in PAT, a crowd of non-experts yields annotations comparable in quality to those submitted by medical professionals (Chapter 5).

Data-driven annotation rubrics describing what users seek when they initiate posts on Forum77 (Chapter 6), as well as the phases of addiction that users exhibit on Forum77 (Chapter 8). These rubrics, educed via thematic analyses of Forum77 content, serve as novel contributions in their own right as well as reusable guides for data annotation.
A novel analysis of behavioral and linguistic features that correlate with each phase of addiction. The results of this feature space analysis (Chapter 8) give novel insight into how the psychologically and physiologically distinct phases of addiction correspond with Forum77 users’ behavior and linguistic usage. They are also a valuable resource for feature design and engineering.

Trained classifiers that accurately extract medically-relevant data from PAT. We train classifiers that accurately extract medically-relevant terms (Chapter 5), addictive drugs of choice (Chapter 7), phase of addiction at the time of writing (Chapter 8) and the type of support that a user is seeking when she initiates a thread (Chapter 6) from PAT. These classifiers are novel in function. We make them freely available to support future work and comparisons in this area.

Figure 1.1: Our general methodological process. Nodes in grey show avenues for future work supported by our contributions.

Medically-relevant insights on Addiction. Our classification methods allow us to scale our analyses to the entire Forum77 population. Some of the resulting insights are, to the best of our knowledge, novel to both the Computer Science and the Addiction literature. These insights include the discovery that:

Users actively collaborate on developing highly effective medication-assisted withdrawal treatment protocols. The most prevalent example of this is Thomas’ Recipe, a detailed protocol for medication-assisted opiate withdrawal that has evolved on Forum77 over the course of several years (§ 6.8.1).
The Forum77 population consists almost entirely of people struggling with prescription opioid abuse, making it strongly distinct from traditionally surveyed drug-using populations. Our results indicate that such populations are not well covered by existing medical research methods.

While relapse is common, chances of a user leaving Forum77 in the state of RECOVERING are favorable. Although different methodological approaches make comparison with real-world treatments difficult, our results suggest that Forum77 is an effective self-detoxification resource.

Active participants are more likely to leave Forum77 in a state of RECOVERING. Such users participate significantly more frequently than those who leave in a state of ¬RECOVERING, even when they are USING and WITHDRAWING. This resonates with prior research that shows that increased participation in the traditional mutual help group Alcoholics Anonymous correlates with sustained sobriety [190, 223].

1.3 Outline of Thesis

Chapters 2-4 serve to contextualize our work and give the reader a framework for reference and evaluation.

Chapter 2 presents a targeted literature review of online health information seeking. We begin with a broad overview of online health information seeking (§ 2.1) before focusing on the question of who participates in OHCs, their motivations for doing so, and the associated benefits and pitfalls of participation (§ 2.2).

Chapter 3 begins with a definition of PAT accompanied by a discussion of its values and the challenges that it presents (§ 3.1). Next, we synthesize prior work that utilizes PAT as a primary data source, including syndromic surveillance (§ 3.2), pharmacovigilance (§ 3.3), Named Entity Recognition (§ 3.4) and Thematic Analyses (§ 3.5).

Chapter 4 describes the data sets that we use in our work: the MedHelp corpus (§ 4.1), which includes the Forum77 data set, and the CureTogether corpus (§ 4.2).
While PAT contains a wealth of information, it is inherently noisy, and requires text mining techniques to extract data of value. In Chapter 5, we address one of the most basic problems of this sort: identifying medically-relevant terms in PAT. After discussing related work (§ 5.2) and data preparation (§ 5.3), we explore the feasibility of replacing experts with non-expert crowds in medical term annotation tasks (§ 5.4). Next, we show that a conditional random field (CRF) model trained on crowd-labeled data dramatically outperforms state-of-the-art medical term annotation tools (§ 5.5). Finally, we demonstrate the effectiveness of our approach through applying our classifier to large PAT corpora (§ 5.6).

While our results demonstrate the efficacy of our approach, we find that the extracted data are too broad for deriving insights on specific medical conditions. We narrow our focus to the topic of addiction, one of the most urgent public health issues of the day.

Understanding why people participate in Forum77 is a precursor to more targeted analyses. Chapter 6 poses the question, “what do people seek on Forum77?”. We first motivate studying the topic of addiction (§ 6.1), before discussing related work (§ 6.2) and data preparation (§ 6.3). Next, we present the process and result of a thematic analysis of users’ motivations for initiating Forum77 discussions (§ 6.5). Congruent with prior work, driving motivations are the seeking of informational and emotional support. In terms of informational support, we find that users primarily seek explicit medical advice on prescription opioids. In the emotional support category, the update post, in which users log their progress but request no feedback, is highly prevalent. We train machine learning classifiers to distinguish emotional from informational support-seeking (§ 6.6), as well as update from non-update posts (§ 6.7).
Finally, we present and discuss the results of applying our classifiers to the entire Forum77 data set (§ 6.8 & § 6.9).

Chapter 7 establishes whether the Forum77 population is similar to traditionally surveyed drug-using populations in terms of drugs of choice (DOCs). We first discuss related work (§ 7.1) as well as our data preparation and sampling (§ 7.2). Next, we present our method for automatically extracting users’ DOCs from Forum77 initiating posts (§ 7.3), which comprises data annotation, classifier training and term resolution. We then detail how we compare our classifier-derived Forum77 DOC distribution with those from three traditionally-surveyed drug-using populations (§ 7.4). Among other things, our results (§ 7.5) indicate that Forum77 is used primarily by people struggling with prescription opioid use disorders, rather than by people using traditionally-abused substances such as alcohol, cocaine and marijuana. Finally, we discuss the implications and opportunities revealed by these results (§ 7.6).

Chapter 8 focuses on the topic of the cycle of abuse, a well-known concept whose stages and transitions, to the best of our knowledge, have never been quantified. Drawing on the addiction literature, we first describe the phases of drug abuse and define key terminology (§ 8.2), and then describe our data preparation and sampling (§ 8.3). Next, building on the well-known Transtheoretical Model for Behavioral Change [203], we develop a taxonomy describing the phases of addiction as they are expressed on Forum77 (§ 8.4). We then analyze a variety of behavioral and content-based features in order to identify features that discriminate between the phases USING, WITHDRAWING and RECOVERING (§ 8.5). Next, we present our statistical classifier for identifying addiction phase (§ 8.6), and discuss how this enables us to identify important sequences in the process of addiction, such as relapse and recovery (§ 8.7).
Aggregating these events across the entire Forum77 membership base indicates, amongst other results, that although relapse is common, reaching a state of RECOVERING prior to leaving the forum is likely (§ 8.7.3).

In Chapter 9, we reiterate the main contributions of this thesis (§ 9.1), outline challenges for future work (§ 9.2), and offer our concluding thoughts (§ 9.3).

Chapter 2 The Internet and Health

Millions of people around the world seek health information online, and have been doing so since the earliest days of the Internet [166]. But who are these people, and what do they seek? Our goal in this chapter is to provide readers with a contextual backdrop against which to interpret our work. Drawing on prior work from Computer Science, Medical Informatics and Medicine, we first describe online health information seeking in general (§ 2.1), beginning with an historical overview before investigating what kinds of information people seek, who seeks this information, and where. Next, we focus on a specific subset of online health information seeking: online health community (OHC) participation (§ 2.2). We pay particular attention to who participates, their motivations for doing so, and potential benefits associated with participation. Finally, we summarize our findings (§ 2.3) before moving on to a literature review of prior work utilizing PAT as a primary data source (Chapter 3).

2.1 Online Health Information Seeking

2.1.1 Historical Overview & Current Landscape

When the Internet was commercialized in 1995 [120], widespread consumer adoption brought with it widespread supply and demand for health information [49]. The Internet made health information more accessible. An example illustrates: between 1997 and 1998 the National Library of Medicine (NLM) made Medline (https://www.nlm.nih.gov/bsd/pmresources.html), a repository of journal citations and abstracts from the biomedical literature previously only available to medical professionals, publicly accessible online. The number of queries to Medline increased roughly seventeenfold, from 7 million to 120 million, with more than 30% of new queries stemming from consumers [49]. In response, the NLM launched MedlinePlus (http://www.nlm.nih.gov/medlineplus), a site hosting information targeted specifically at patients and their families [49]. The move was a roaring success: in the first quarter of 1999, MedlinePlus had 62,638 unique visitors. Since then, this statistic has only increased: in the third quarter of 2013, the site had ∼81,000,000 unique visitors [172].

In addition to making health information more accessible to consumers, the Internet also broadened the scope of potential contributors: for the first time, health information could be easily sourced from and exchanged between patients. Widespread, patient-driven mutual help efforts unfolded simultaneously with the commercial web. As early as 1997, Salem et al. [215] published an analysis of an online mutual help group for depression; their study covered 2 weeks’ worth of data and comprised 533 participants. Even earlier, in 1996 Mayer and Till [166] published a short, interview-based study of a breast cancer listserv allegedly utilized by thousands of patients. Today, a full 8% of Internet users in the U.S. report either sharing a personal health experience or posting a related question online [91].

The revolution in how health information was created and shared was received primarily positively by consumers and sociologists, who celebrated its potential for “democratizing” healthcare and rebalancing the power dynamic in doctor-patient relationships [182]. The reaction from the medical community was substantially more turbulent.
Early research on online health information seeking raised concerns about the quality of the information available, as well as patients’ ability to evaluate it critically [49, 156, 181, 182, 199, 210]; some even described the phenomenon as an “epidemic of misinformation” [51]. Indeed, discussion in the medical literature at the time communicates a strong resistance to the idea of patients pursuing medical knowledge outside the purview of a medical professional [104, 182]. For example, in 2000 the Journal of Medical Economics initiated a series of articles aimed at educating doctors about online resources so that they, in turn, could guide their patients through the plethora of available online health information resources. The first article in the series is titled, “Doctors and the Web: Help your patients surf the Net safely” [104].

Despite these concerns, analyses of online health seeking behavior indicate that patients are, in fact, highly skeptical of information presented online and take care to evaluate it critically [21, 105, 156, 178, 182, 205, 209, 225]. Patients tend to mistrust information from websites that appear to be primarily commercial [92, 182], have unclear sources of information [92], or that seem unprofessional or highly opinionated [182]. Moreover, rather than taking a single source at face value, patients typically evaluate information quality by aggregating information from multiple sources [82, 92, 182, 205], and even posing and testing hypotheses from one information source to the next [225]. That said, online health seekers are not infallible: cyberchondria – the escalation of a user’s perception of the severity of her medical state as a result of researching it online – has been documented, and results in increased stress levels and potentially unnecessary use of available medical resources [254, 255].

Measuring the quality of online health information is challenging.
Prior work finds that information accuracy tends to be high [25, 80]. For example, in an independent evaluation of 4,600 posts on The Breast Cancer Mailing List (http://www.bclist.org), Esquivel et al. [80] found only 10 (0.22%) posts containing misleading or incorrect information. Of these, 7 were identified as such by participants and corrected within 4.5 hours. However, the majority of studies from the medical domain conclude that online health information is of subpar quality [21, 83]. A common point of failure cited is whether the information is “complete” (covers all medically-relevant details). However, the value of the completeness metric has been called into question: first, including all relevant medical information might comprise information overload for readers [83]. Second, as patients typically synthesize medical information from a variety of sources [82, 92, 182, 205], they are likely robust to incompleteness in any single source. Patients themselves report that in general they have no trouble finding the information that they need online [92, 105].

Despite this, strong resistance, and even condescension, from medical professionals is a common response to the idea of patients pursuing medical knowledge online. “Many of the participants reported symptoms that they attributed to using a computer keyboard, so it appeared incongruous that they turned for help to an activity that required more typing”, quip Culver et al. [64] in an evaluation of an online health community on Carpal Tunnel Syndrome. Yet even amongst surveyed physicians, there is general agreement that the result of patients pursuing medical information online is rarely harmful, and in fact can be moderately beneficial [181, 199]. One explanation may be that the public dissemination of medical knowledge, which was previously exclusive and difficult to access, challenges medical professionals’ dominance as medical experts [116]. Indeed, many physicians who feel that online health information seeking negatively impacts the doctor-patient relationship also feel that their patients are challenging their authority [11, 181]. Today, almost 20 years later, most research agrees that the nature of the patient-doctor-internet relationship remains in flux, with resistance from the medical field barring potential synergies from reaching fruition [14, 121].

2.1.2 What Health Information Do Users Seek Online?

Despite the concerns echoed in the medical literature, patients seem disinclined to stage a cyber coup d’etat against the medical profession. In fact, with the exception of teens [105, 209], patients rarely consider the Internet their primary or most important source of medical information [82, 165, 200]. Rather, information acquired online tends to supplement or complement that acquired through traditional channels [82, 149, 209], and is often sought for the express purpose of discussing it with a medical practitioner [49, 92, 181, 205, 225]. Moreover, patients have preferences over which types of information they would prefer to acquire online: respondents to Pew’s 2010 Peer-to-peer Healthcare Survey [90] said that they would prefer to communicate with medical professionals for information regarding prescription drugs and alternative treatments, an accurate diagnosis, and recommendations for other medical professionals and medical facilities. Peers and professionals were rated as equally helpful for practical advice for day-to-day coping, and peers were rated most helpful for emotional support and quick remedies for non-urgent, everyday health issues.

Major categories of online information sought by patients include finding disease-specific information [49, 91], finding information about particular medical treatments or procedures [91], and attempting to diagnose or treat a new condition [49, 91].
In fact, Pew’s 2013 Health Online survey found that 35% of American adults tried to diagnose a condition using information found online; of these, roughly half followed up with a medical professional [91]. Cartright et al. [42], who analyzed user search logs surrounding self-diagnosis attempts, observed two patterns: evidence-based searching, in which users searched for a condition that matched a set of symptoms and risk factors, and hypothesis-based searching, in which given a specific condition, users searched for symptoms and risk factors associated with that condition.

Minor categories of health information sought online include finding information about health insurance, food and drug safety recalls, interpreting medical test results, information on weight loss [91], and finding reviews on medical professionals or medical facilities [49]. Finally, an estimated 16-18% of online health seekers go online specifically to find others who share their health concerns [90, 91].

2.1.3 Who Seeks Health Information Online?

Early proponents of the Internet as a health information resource touted its potential as a liberating technology for those with limited access to traditional health resources [182]. In some ways this is true: online health information seeking seems to be need-driven, with those suffering from chronic or stigmatized conditions more likely to seek health information online. However, survey-based research also points to a strong “digital divide”: whether a person has access to, and is comfortable using, the Internet is a determinant of whether that person searches for health information online. We discuss discriminating features in detail below.

Gender

Women are more likely to seek health information in general [57], and this trend is mirrored online [37, 57, 90–92, 165] despite the fact that men and women have equal access to the Internet [91]. Pew’s 2013 Health Online survey [91] estimated that 53% of all U.S. male adults look for health information online, while the corresponding statistic for U.S. female adults is 64%. Extrapolating from the 2013 U.S. Census results [9], approximately 55% of online health seekers are female.

In a survey exploring online health information seeking in 2000, Fox & Rainie [205] describe several differences between men and women’s health seeking behavior. First, while both men and women are equally likely to search for information in relation to a parent or older relative, women are twice as likely to search for information on behalf of a child. This likely reflects the fact that women spend more time on child care [192]. Second, women are more likely to search for information related to specific conditions (either physical or mental), while men are more likely to search for information related to sensitive topics and for information on treatment timelines and administration [205].

Age

Studies measuring the age distribution across online health information seekers report that it is relatively uniform among adults until the age of 65, at which point it declines [37, 91, 165]. This is contrary to the fact that health needs generally increase with age, and stands in contrast to the age distribution over offline health information seekers, who tend to be older (mean age 40 vs. 52) [57]. Both Cotten et al. [57] and Bundorf et al. [37] hypothesize that this discrepancy is due to the fact that younger people have more access to and experience with using the Internet. In fact, health information seeking is one of the most common and important online activities for young people [105, 209]. A random-dial survey of 1,209 respondents aged 15-24 initiated in 2002 by healthcare provider Kaiser [209] found that 75% of respondents had looked for health information online: more than had downloaded music (72%), played games (72%), shopped online (50%) and participated in chat rooms (67%).
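As a brief aside on the gender split reported earlier in this section, the estimate that roughly 55% of online health seekers are female can be sanity-checked from the two Pew rates alone. The sketch below is illustrative only: the function name is hypothetical, and the default of equally sized male and female adult populations is a simplifying assumption, not a figure taken from the Census estimate cited in the text.

```python
def female_share(male_rate, female_rate, male_pop=1.0, female_pop=1.0):
    """Return the fraction of online health seekers who are female.

    male_rate / female_rate: share of adult men / women who look for
    health information online (0.53 and 0.64 in the text).
    male_pop / female_pop: relative sizes of the two adult populations.
    The defaults assume equal sizes, which is an illustrative
    simplification rather than a Census-derived figure.
    """
    female_seekers = female_rate * female_pop
    male_seekers = male_rate * male_pop
    return female_seekers / (female_seekers + male_seekers)


if __name__ == "__main__":
    share = female_share(0.53, 0.64)
    print(f"Female share of online health seekers: {share:.1%}")
```

Under the equal-population assumption the share comes out at roughly 55%, in line with the figure quoted above; passing actual adult population counts as `male_pop` and `female_pop` would refine the estimate.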
In fact, many young people consider the Internet to be their primary source of health information [105].

Health

People suffering from chronic conditions (e.g., asthma, diabetes) [37, 90] and people suffering from stigmatized conditions (e.g., anxiety, herpes, addiction) [24, 67] are highly likely to seek health information online. A casual inspection of our own MedHelp data set (described in Chapter 4) corroborates this: 8 of the top 20 forums focus on stigmatized or otherwise embarrassing conditions including addiction, Hepatitis C, STDs and HIV (see Table 4.1). Other health characteristics that correlate with online health information seeking include experiencing a medical crisis within the past year [90], experiencing a significant change in physical health (e.g., weight loss/gain, smoking cessation) [90], having a rare condition [90], and having significant barriers to health care (e.g., expense, travel distance) [37]. This suggests that online health seeking behavior is need-driven; however, other evidence also points to a digital divide: people are more likely to seek health information online if they have health insurance [91] and a regular healthcare provider [165]. Finally, online health seekers self-report as being healthier than their offline counterparts [57].

Race

Pew’s 2013 Health Online survey [91] reports that 83% of Caucasian adults go online: significantly more than adult African Americans (74%) and Latinos (73%). Therefore, at a population level, significantly more Caucasians search for health information online. In a study of online health information seeking in youth, Rideout et al. [209] observe the same phenomenon, noting that fewer African American and Hispanic youth in their survey had Internet access at home. Among adults who use the Internet, however, there are no significant differences in ethnicity between those who search for health information online and those who do not.
In addition, Cotten et al. [57] find no significant differences in ethnicity between online and offline health seekers. However, Pew’s 2013 Health Online survey [91] highlights some statistically significant, ethnicity-based differences in what kind of information people seek. For example, Caucasians are more likely than African Americans and Latinos to look online for a diagnosis and for information pertaining to a specific disease/condition, and are less likely to search for information on weight loss. African Americans are more likely to conduct online research on a drug seen in advertising, while Latinos are more likely to search for information on pregnancy.

Socio-Economic Status & Education

Online health seekers tend to have higher income levels than those who do not seek health information online [57, 74, 165]. In addition, higher levels of education correlate with online health seeking [57, 74, 91, 165]. This again suggests a digital divide, with those who have ready Internet access being more likely to use it as a health information resource. However, other work points out that literacy and language barriers can prevent people from engaging fully with online health resources [25, 49].

Role (Patient vs. Caregiver)

Queries conducted on behalf of someone else (e.g., a child, a parent or other older relative, or a friend) comprise roughly 50% of all online health inquiries [90, 91]. Usually such “caregivers” are either women [205] or parents [91] (or both).

2.1.4 Where Do People Find Health Information Online?

There are myriad ways of accessing health information online. We highlight those most often discussed in related work.

Search Engines

The majority of online health information quests start at a search engine such as Google⁴, Yahoo⁵ or Bing⁶ [82, 91, 114, 178].
Users iteratively refine their queries based on search results [82, 114], and in the majority of cases are successful in finding the information that they are looking for [92, 114].

Medical Information Portals

Sites such as WebMD⁷ and MedlinePlus⁸ serve as medical information portals and are heavily utilized [172]. However, it is rare for online health seekers to have a favorite or “go-to” information portal [92], and such portals are rarely the starting point of a user’s search [82].

Online Health Communities

Online health communities (OHCs) provide an interactive environment in which users can seek others familiar with their health concerns and acquire tailored information. These groups provide social support, information and shared experiences, and can be empowering for patients [49]. Prior work indicates that a significant proportion of online health seekers ultimately participate in an OHC, with estimates ranging from 8% [91] to 16% [90] to 25% [49]. We discuss OHC participation in depth in the next section.

⁴ http://www.google.com
⁵ http://www.yahoo.com
⁶ http://www.bing.com
⁷ http://www.webmd.com
⁸ http://www.nlm.nih.gov/medlineplus

2.2 Online Health Community Participation

Having outlined the landscape of online health information seeking in general, we now turn to the specific topic of online health community participation. Where possible, we expand on any relevant details introduced in § 2.1. We briefly discuss modes of participation (§ 2.2.1), before addressing the question of who participates in OHCs (§ 2.2.2), why (§ 2.2.3), and what measurable benefits may result from their participation (§ 2.2.4).

2.2.1 Modes of Participation

OHCs typically comprise environments in which users communicate via posted messages. There are three primary forms of participation on an OHC: users start new discussions by contributing initiating posts, and respond to existing discussions with response posts.
The third, much overlooked, mode of participation is lurking, in which users read community-generated content, but never contribute or make their presence known in any way. Lurking is prevalent in all kinds of online communities [185, 202], although possibly less so in health-oriented communities [186]. Prior work suggests that lurkers’ demographics and motivations for participating align closely with those of active OHC participants [202]. Moreover, lurkers and active members derive the same benefits from OHC participation [246]. As defining and measuring lurking behavior is challenging, we do not discuss it further in our own work, but note here that capturing lurking behavior is an important avenue for future work.

2.2.2 Who Participates in OHCs?

Demographic analyses of OHC participants similar to those offered in § 2.1.3 are scarce. Unlike general health information seeking, participation in OHCs centers on specific medical conditions, many of which correlate with particular demographic factors. For example, people suffering from breast cancer tend to be female, and people suffering from Alzheimer’s tend to be older. However, in concert with research on online health seeking behavior [24], Davison et al. [67] find that the social factors that predict face-to-face support group seeking correlate with those that predict online support group seeking. Specifically, conditions that are embarrassing, stigmatized, or disfiguring, as well as conditions in which a patient’s attitude towards the condition is important in treatment outcome, lead people to seek the support of others with similar conditions online.

2.2.3 Reasons for Participation

A user’s overarching goal in joining an OHC is to align herself with other people who share her health concerns [90, 96, 259]. A great deal of literature examines patients’ perceived benefits of OHC participation.
Results tend to fall into one of three categories: (1) medium-based affordances, in which users cite practical advantages related to the fact that OHCs are online, digital resources; (2) informational support; and (3) emotional support. We discuss each of these in detail below.

Medium-Based Affordances

By nature of being online and digital, OHCs have several unique characteristics that users view as advantageous, such as the convenience of having the community be available around the clock [49, 60, 162, 205, 275]. Other factors cited include access to a wide range of people, information and experiences [162, 205]; the fact that such information is personalized or tailored [49]; the ability to store and edit personal narratives [117, 162]; and the perception of privacy and anonymity on OHCs [49, 105, 205, 270, 275]. Users’ ability to conceal their true identities has also been credited with increasing their propensity to discuss issues that they would not discuss face-to-face [21, 105, 149]. Finally, OHC content is easily searchable, making it easier for patients to browse and filter for suitable people to approach for help. In an analysis of PatientsLikeMe, Frost et al. [96] conclude that searching for similar users is the primary motivation behind patients’ sharing their data with each other.

Informational Support

The two most cited benefits of OHC participation are the informational and emotional (sometimes called “social”) support given by the community [36, 47, 86, 122, 131, 148, 149, 162, 211, 243, 250, 258]. Informational support constitutes the exchange of clinical as well as experiential knowledge relevant to a particular condition. Typical topics of discussion include treatments and treatment options [47, 96, 258], symptoms [96, 258], preventive care [47] and condition outcomes [47, 96].
Patients seek this information for several reasons, including learning what to expect in the future and how to plan for it [47], informing decision making (especially related to treatment options) [47, 122], informing day-to-day care/everyday illness management (coping strategies) [60, 90, 122, 131], obtaining advice on managing interactions with others (e.g., from healthcare professionals to colleagues to family) [122], and often simply acquiring a better understanding of their condition [47, 122, 149, 258]. As such, OHCs are often a source of information distinct from and complementary to that typically acquired via medical practitioners.

Emotional Support

In addition to being valuable sources of personalized informational support, OHCs provide users with an accepting and safe space to vent emotions or discuss uncomfortable topics [149, 243]. Participation provides users with a means of articulating and making sense of their experience, which they find empowering [131, 173]. Patients also receive positive affect, encouragement and sympathy from their fellow community members [60, 131]. Continued participation over time may result in patients taking on new, supportive roles [164] as well as developing increased optimism towards their situation [211]. OHCs also provide patients managing serious conditions with unique types of emotional support that are difficult to acquire elsewhere. For example, patients find that sharing with people like them partially relieves the burden of care placed on family members who, despite their best intentions, cannot empathize with the patient’s experience [162, 243]. In addition, patients find that while family and friends tend to try to normalize the patient’s emotions, even when they are inappropriate, online communities challenge users on inappropriate emotional behavior [243].
2.2.4 Efficacy of Online Health Forums

While patients perceive many benefits to participating in OHCs, measuring the effect of participation on their health outcomes is difficult, and raises the question of what metrics really matter in health management. Would we consider OHC participation effective if it altered disease outcome, or shortened time to recovery? What about if it imparted a sense of control and wellbeing to patients, improving quality of life, even if it had no effect on prognosis? Although OHC efficacy is difficult to define, participation has been shown to promote effective disease management strategies [93, 131, 148, 211], and impart psychosocial benefits, such as improved ability to cope [148, 150, 179], improved mood/decreased distress [158, 211], and improved stress management [211]. Moreover, some studies report measurable beneficial effects on symptoms. Houston et al. [130] found that increased participation in a depression-oriented OHC correlated with the likelihood of users experiencing a resolution in their condition. Lieberman et al. [158] found that cancer patients who participated in OHCs reported a decrease in physical pain. However, they note that it is impossible to tell whether this was due to emotional suppression on the part of their subjects: a conundrum afflicting the measurement of any subjective symptom.

In general, then, research points to OHC participation having beneficial effects for patients. However, the jury is still out when it comes to conditions in which negative behaviors are enabled through social interaction with similar patients [21]. While some research finds that OHC participation provides increased protection and motivation for continuing these behaviors, other work concludes that the overall experience may be a more positive way of dealing with the condition than traditional methods [21, 89, 179]. For example, Wilson et al.
[261] found that patients learned new binging and purging techniques on both pro-eating disorder sites⁹ and pro-recovery sites. However, while they found no significant difference in final health outcomes between the two groups, users of pro-eating disorder sites experienced a significantly longer illness duration [261]. On the other hand, group bonds forged through shared secret identity may render participants less likely to reveal their condition to others, potentially increasing the likelihood that they will not seek appropriate help [98].

⁹ Sites that promote eating disorders.

2.3 Summary

Our goal in this chapter was to provide a general overview of the landscape of online health information seeking. Beginning with a historical overview (§ 2.1.1), we noted that the advent of the Internet both made health information more accessible, and made it possible for anybody to contribute health information online. From the patient’s perspective, this was a largely positive development, and a great deal of research supports the notion that little harm, other than cyberchondria, arises from online health information seeking. The medical community, however, remains somewhat opposed to people pursuing health information outside of the purview of medical professionals.

In general, online health seekers search for information on specific diseases and diagnoses (§ 2.1.2). This behavior appears to be partially need-driven, with people suffering from chronic or stigmatized conditions more likely to seek help online. It is also partially driven by a digital divide, in which those with ready Internet access and technical skills (i.e., younger, wealthier, and more educated people) are more likely to seek health information online. One exception to the digital divide pattern is gender: 55% of online health seekers are female (§ 2.1.3).

While medical information portals such as WebMD and MedlinePlus are heavily utilized, most health information quests begin with search engines. A significant proportion (8-25%) of online health information seekers eventually participate in an OHC (§ 2.1.4).

The primary reason for participating in an OHC is to find others who share the same health concerns. While we know that people with stigmatized, or otherwise embarrassing, medical conditions are more likely to participate in OHCs, we know little else about participant demographics, which are rarely studied. Given the demographic specificity of many medical conditions (e.g., breast cancer predominantly affects women), it is likely that such demographics vary widely across conditions (§ 2.2.2).

Users perceive several benefits to participating in OHCs, which we can categorize into: medium-based affordances, the unique and valuable characteristics that OHCs have by nature of being an online, digital resource; informational support benefits; and emotional support benefits (§ 2.2.3). While acquiring an objective assessment of an OHC’s efficacy is challenging, participation does appear to impart psychosocial benefits to users, and may play a role in measurably reducing certain symptoms. However, the answer to whether OHC participation benefits those afflicted with conditions that are stimulated by social contact with similar patients, such as eating disorders, is less clear (§ 2.2.4).

Chapter 3

Prior Work on Patient Authored Text

A great deal of prior work utilizes patient authored text (PAT) as a primary data source. Despite this, to our knowledge no organized review of data sources, methods, goals and outcomes of such work exists. Our goal in this chapter is to motivate the utility of PAT as a data source and provide a structured framework over relevant prior work. We first scope our definition of PAT, and discuss its latent value as a data source as well as the challenges it poses for analysis (§ 3.1).
We then review prior work that uses PAT as a primary data source. This work tends to fall into one of four categories: syndromic surveillance (§ 3.2), pharmacovigilance (§ 3.3), entity extraction (§ 3.4), and thematic analysis (§ 3.5). Finally, we summarize our findings (§ 3.6).

3.1 Patient Authored Text (PAT): Introduction & Overview

We define patient authored text (PAT) as any online, medical text authored by someone who is not a medical professional. A main source of PAT is online health communities (OHCs): online discussion forums dedicated to specific health topics where people converse in the form of posted messages. MedHelp¹, PatientsLikeMe² and CureTogether³ are all examples of OHCs. Other sources of PAT include search logs, social media data (e.g., Twitter⁴ and Facebook⁵), personal blogs (e.g., Lady of Lyme⁶), and email.

¹ http://www.medhelp.org
² http://www.patientslikeme.com
³ http://www.curetogether.com
⁴ http://www.twitter.com
⁵ http://www.facebook.com
⁶ http://www.ladyoflyme.com

3.1.1 Value of PAT

In the process of creating PAT, users are documenting medical data, making sense of it, prioritizing it, and synthesizing it in order to solve problems that are relevant to them. This is time-intensive work, performed by agents who may well make up in motivation what they lack in medical expertise. The resulting text is rich in medical information, with users recording medical histories, comparing treatments, detailing symptoms and reasoning about differential diagnoses. At a minimum, this culminates in a unique record of patient behavior outside of the clinical environment. In the case of stigmatized or otherwise embarrassing conditions⁷, PAT may well contain medical data that is rarely captured elsewhere. For example, someone struggling with substance abuse might detail her self-prescribed treatment schedule for withdrawal.
In concert, then, PAT comprises a valuable and, in many cases, unique medical data set that is abundant and readily available. However, PAT is also challenging to work with.

⁷ In Chapter 2 we note that people suffering from stigmatized conditions are more likely to seek help online and to participate in OHCs.

3.1.2 Challenges of Working with PAT

PAT is notoriously difficult to work with. We attribute this to three main reasons: its inherent noisiness; the lack of existing tools for exploring and analyzing it; and the fact that it is often difficult to discern whether PAT supports any given research question. As we will show in § 3.2–§ 3.5, prior work tends to compensate for these challenges by either fixing some variables in a quantitative analysis, or by conducting small-scale, qualitative analyses.

Noisiness

On the text level, PAT is riddled with spelling and grammatical errors. Compared with expert-authored text, PAT exhibits lexical and semantic mismatches [167, 272], mismatches between consumers’ and experts’ understanding of medical concepts [99, 272], and differences in descriptive richness and length [99, 167, 272]. Consider, for example, the text snippets below, both discussing the predictive value of a family history of breast cancer. The first snippet is from a medical study by De Bock et al. [68]:

    In our study, at least 2 cases of female breast cancer in first-degree relatives, or having at least 1 case of breast cancer in a woman younger than 40 years in a first or second-degree relative were associated with early onset of breast cancer.

The second (unedited) snippet is from the MedHelp breast cancer community:

    im 40yrs old and my mother is a breast cancer surivor. i have had a hard knot about an inch long . the knot is a little movable. the knot has grew a little over the past year and on the edge closest to my underarm. i am scared and dnt want to worry my mom ..
Moreover, PAT contributors vary widely in their level of medical expertise, command of medical jargon, and the frequency with which they document their experiences online. Most PAT would be considered unusable from a medical perspective: symptom descriptions, treatments and medical histories are incomplete, and basic demographic data is absent.

Lack of Analysis Tools

The dearth of tools and methods for mining PAT is likely exacerbated by its noisiness and inconsistencies. As we discuss in § 3.4, the handful of medical annotation toolkits that do exist are tailored to process well-formatted, expert-authored text (e.g., clinical text, journal publications), and perform poorly on PAT. As a result, exploring PAT corpora is costly, often requiring researchers to build ad hoc tools for large scale annotation and extraction. Moreover, as there is no systematic method for exploring the space of possible approaches to extracting medically useful information from PAT, these ad hoc tools are often not reusable.

Applicability to Research Questions

Whether a PAT corpus supports a given research question is not always obvious, and depends very much on users’ reasons for authoring the PAT in the first place. Finding a tight match between a research question and users’ motivations for authoring PAT is crucial for success. For example, search logs are an appropriate data source for monitoring influenza trends, because users are motivated to search for their symptoms when they get sick. However, Twitter would be an inappropriate data source for mining optimal drug dosages, as users tend not to tweet this information en masse. Determining what data PAT encodes, and how it is encoded, is a costly investment and a separate challenge from extracting these data.

3.2 Syndromic Surveillance

Syndromic surveillance, also known as early warning, outbreak detection, or biosurveillance, is the utilization of health-related data for the purpose of detecting, analyzing and monitoring potential disease outbreaks [128]. Syndromic surveillance systems do not necessarily utilize online data: the first such systems were developed to give advance notice of bioterrorism attacks (in particular, those related to anthrax) after 9/11, and utilized data such as pharmacy purchases and emergency room visits [35, 127, 128, 163, 207]. However, building syndromic surveillance systems based on PAT is appealing for a number of reasons. The first is users’ proclivity for seeking health information online. For example, it is fairly common for users to search online for symptoms that they are experiencing, or for conditions that they believe they might have [156, 254, 256]. As such, data useful for syndromic surveillance tends to accrue naturally, which is preferable to resource-intensive, manual data collection [128, 262]. In addition, collecting and analyzing online data is fast, enabling early (or even real-time) detection of outbreaks, which is not possible using traditional syndromic surveillance systems [41, 100, 128]. The best known example of a PAT-based syndromic surveillance system is likely Google Flu Trends⁸, which estimates regional flu activity from aggregated search queries [41]. Google Flu Trends can often identify flu outbreaks a full 1-2 weeks ahead of the CDC, which bases its reports on laboratory and clinical data [41]. However, the system is vulnerable to anomalous situations, such as outbreaks of new influenza strains, or particularly bad influenza seasons [38].
Other challenges to syndromic surveillance systems based on PAT include their vulnerability to changes in users’ online health seeking behavior [38, 262], which makes it difficult to estimate false positive and false negative rates [262]. Finally, a successful syndromic surveillance system requires that a sufficient portion of the population of interest is seeking health information online, which is not always the case. Below, we outline the chief components of syndromic surveillance projects.

⁸ http://www.google.org/flutrends/us

3.2.1 Condition

Typically, syndromic surveillance systems focus on a single medical condition of interest. To date, the majority of work on syndromic surveillance focuses on influenza [10, 15, 55, 56, 62, 63, 81, 100, 132, 137, 152, 198]. Exceptions include investigating general infectious disease outbreaks [33, 52, 109, 262], Lyme disease [221], and potential foodborne illness outbreaks at restaurants [213]. Syndromic surveillance techniques have also been used to monitor “non-outbreak” conditions or behaviors. For example, Cooper et al. [53] use syndromic surveillance techniques to monitor cancer prevalence, while Ayers et al. [18] use them to track the popularity of electronic nicotine delivery systems (e-cigarettes).

3.2.2 Data Source

People searching for their own symptoms online is a well-documented phenomenon [254, 257]. Accordingly, search logs are a natural choice for a syndromic surveillance data source, and are successfully utilized in several instances of prior work [18, 53, 100, 132, 198, 221]. More recently, Twitter has come to light as another suitable source [10, 15, 62, 63, 152, 213], suggesting that users are prone to mentioning when they, or someone around them, falls ill. Rarer data sources include blogs [55, 56], website access logs [137], and aggregated web data (a combination of search logs, news articles, RSS feeds etc.) [33, 52].
The latter may be particularly appropriate when trying to survey regions in which the population of interest has limited education and/or Internet access, such as developing countries.

3.2.3 Filtering

As syndromic surveillance aims to correlate online frequency data with real-world epidemiological trends, separating signal from noise in the data stream is important. Mentions of a condition do not necessarily correlate with real-world instances of it [152]. On the simple end of the spectrum is keyword filtering. While common [10, 18, 53, 55, 56, 198, 221], this approach has several shortcomings. First, relying on a static set of keywords makes the system susceptible to over-fitting [62], as well as to fluctuations in the use of those keywords that are unrelated to the disease in question [15, 38, 100]. For example, a news story on flu could galvanize a “burst” of online activity around the topic of flu, even while infection levels in the population remain unchanged. Finally, although keywords are occasionally picked in a principled and consistent manner (e.g., Ginsberg et al. [100] pick keywords based on how their frequency fluctuations correlate with regional influenza activity), in general selection is arbitrary and prone to human misjudgment. For example, spelling variations of keywords may be ignored [56].

Other work indicates that more nuanced filtering yields higher quality results [62, 152]. One such approach is to train statistical classifiers to automatically identify whether a datum is relevant or not. Both Support Vector Machines (SVMs) [15, 193] and other simple bag-of-words models [52, 62, 63] have been successfully leveraged to identify data that correspond to actual influenza infections. Moreover, Lamb et al. [152] show that using binary classifiers to acquire even more detailed information (specifically, whether a tweet is about the author or about someone else; whether a tweet represents an awareness vs.
an instance of flu; and whether a tweet is flu-related or not) greatly improves prediction.

3.2.4 Modeling and Prediction

In the case of syndromic surveillance systems that focus on a specific condition (e.g., influenza), linear models are commonly used to predict trends from the filtered data [10, 62, 63, 100, 152, 198]. Simpler approaches do not model the filtered data, deeming frequency counts sufficient for reflecting real-world trends [15, 53, 55, 56, 137, 221]. The few syndromic surveillance systems attempting to monitor a range of diseases require the additional step of identifying specific diseases and geographic locations [33, 52]. Of note is the approach used by Paul et al. [193], who use topic modeling over their filtered data to acquire distributions of ailments over time. One key advantage of this approach is its ability to surface new diseases without manual intervention [193].

3.2.5 Real-World Evaluation Dataset

In order to prove the utility of a syndromic surveillance system, a corresponding real-world metric of the same phenomenon that the system is trying to measure is required for comparison. In the case of influenza, the CDC frequently releases timely data on cases of influenza-like illnesses detected through its traditional surveillance systems⁹. It is likely that the availability of this data set is the driving force behind the fact that almost all PAT-based syndromic surveillance research focuses on the topic of influenza.

⁹ http://www.cdc.gov/flu/weekly/fluactivitysurv.htm

3.3 Pharmacovigilance

Pharmacovigilance is concerned with detecting, monitoring and preventing adverse effects related to pharmaceutical products. As with syndromic surveillance, traditional pharmacovigilance systems are offline, typically comprising adverse drug event reports contributed by patients, physicians and pharmacists, which are collected by the United States Food and Drug Administration¹⁰. Many of the appeals of building online syndromic surveillance systems apply to pharmacovigilance as well. However, by construction pharmacovigilance is a more complex problem: whereas syndromic surveillance systems typically monitor only a single variable (e.g., how many people have the flu), an adverse event involves at least two elements: a drug and an adverse effect (e.g., unexpected side effects). Extracting such entities can be challenging. Unlike syndromic surveillance, prior work on pharmacovigilance addresses a wide array of topics and conditions. Below, we discuss important components of pharmacovigilance systems.

¹⁰ http://www.fda.gov

3.3.1 Data Source

In order to leverage the advantages of both scale and relevant content, researchers must find a large source of PAT where patients typically disclose both which drugs they use and which adverse events they experience. Online health communities (OHCs) are rich with discussions disclosing users’ medications, symptoms and current health states (see Chapter 2). Accordingly, almost all work on PAT-based pharmacovigilance utilizes OHC communications as a primary data source [23, 45, 154, 171, 183, 265, 266, 268, 269]. To our knowledge, the only exception to this is also arguably the most successful and impactful work on pharmacovigilance: White et al. [257] successfully utilize search query logs to discover a novel adverse drug-drug interaction, which was later proved in medical trials.

3.3.2 Identifying Drugs in PAT

Identifying drugs in PAT is challenging: in addition to the many spelling variations of a drug that might be present in a PAT data set, users may mention several drugs at once, making it difficult to tell which one is responsible for the adverse event [119]. Accordingly, only a handful of prior pharmacovigilance work attempts to explicitly identify drugs related to adverse events in a data set. Yang et al. [265, 266] extract drug entities using a lexicon, and Yates et al.
[269] train a conditional random field (CRF) model for this purpose. A more common approach is to pre-select a small number of drugs of interest, filter the original data set for mentions of these drugs, and then attempt to extract adverse events from these filtered data [154, 171, 183, 257, 268].

Chee et al. [45] take a different approach that is worth noting. Rather than attempting to extract {drug, adverse event} pairs, they use an ensemble classifier over OHC text to identify drugs that are similar to “watch list” drugs: drugs that already have adverse effects reported by the FDA¹¹. Unfortunately, this method gives no insight into why a drug might be worthy of inclusion on such a list.

¹¹ http://www.fda.gov/Safety/MedWatch

3.3.3 Identifying Adverse Events in PAT

Unlike the drug involved in an adverse event, the adverse events themselves are rarely fixed: typically a Pharmacovigilance system will attempt to identify any adverse event related to a particular drug. The list of extracted events is then ranked by some criterion and given to a human reviewer for analysis. Yang et al. [265, 266] and Lehman et al. [154] identify adverse events in PAT by first compiling lexicons describing adverse events, and then scoring matches against sliding n-gram windows over PAT sentences. Yates et al. [269] train a CRF to identify adverse events in PAT. Nikfarjam et al. [183] learn patterns from text about known adverse drugs; they then apply these patterns to identify new adverse events. White et al. [257] are the sole exception to extracting an open set of adverse events: rather, they limit their extraction to a pre-specified set of symptoms related to hyperglycemia. The fact that theirs is arguably the most successful Pharmacovigilance system to date suggests that this may be a promising approach.
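The lexicon-and-window matching described above for Yang et al. and Lehman et al. can be illustrated with a rough sketch. This is not the cited systems' implementation: the four-entry lexicon, the Jaccard token-overlap score and the threshold values are all assumptions chosen for illustration, as are the function names.

```python
import re

# Hypothetical adverse-event lexicon; real systems compile far larger ones.
LEXICON = ["night sweats", "restless legs", "nausea", "loss of appetite"]


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def ngram_windows(tokens, max_n):
    """Yield every contiguous window of 1..max_n tokens."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield tuple(tokens[start:start + n])


def overlap_score(window, entry_tokens):
    """Jaccard overlap between a window and a lexicon entry (assumed scoring rule)."""
    a, b = set(window), set(entry_tokens)
    return len(a & b) / len(a | b)


def match_adverse_events(sentence, lexicon=LEXICON, threshold=1.0):
    """Return the sorted lexicon entries whose best window score reaches the threshold."""
    tokens = tokenize(sentence)
    matches = set()
    for entry in lexicon:
        entry_tokens = entry.split()
        # Windows no longer than the entry itself are enough to score it.
        for window in ngram_windows(tokens, len(entry_tokens)):
            if overlap_score(window, entry_tokens) >= threshold:
                matches.add(entry)
                break
    return sorted(matches)


exact = match_adverse_events("I had night sweats and restless legs all week")
fuzzy = match_adverse_events("Don't have much of an appetite", threshold=0.5)
```

With the default threshold only exact lexicon entries match; lowering it admits partial matches such as "of an appetite" against "loss of appetite", which hints at the precision and recall trade-off that such scoring has to manage.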
3.3.4 Evaluation

In general, evaluating the efficacy of Pharmacovigilance systems is difficult: results typically contain several known indications; the remaining result elements are either false positives, or true positives that have yet to be detected via traditional reporting mechanisms. Most work serves as a proof of concept that some adverse drug events manifest in PAT, but there is little quantification of how many events are represented, or how strongly. Most importantly, determining how to surface the most relevant true positives remains an area for future work. The work by White et al. [257], which rigorously demonstrates the existence of the connection between paroxetine, pravastatin and hyperglycemia in PAT (predating the FDA’s discovery of this), comes closest to proposing a methodology for doing this. However, their approach lacks flexibility in that both their drugs and adverse events of interest were predefined.

3.4 Named Entity Recognition

Named entity recognition (NER) is an information extraction task in which the goal is to develop methods that automatically identify entities of a specific type from text. For example, extracting drugs, adverse events or symptoms from medical records are all NER tasks. In general, there are two ways to go about medical NER in PAT: the first is to use state-of-the-art ontology-based tools, which work “straight out of the box”, but have poor performance on PAT. The second is to use custom statistical classifiers, which tend to have high accuracy, but require large volumes of labeled data for training and testing. We discuss each in detail below.

3.4.1 Ontology-Based Tools

Historically, the go-to tools for medical text annotation are MetaMap¹² [17] and, more recently, the Open Biomedical Annotator (OBA)¹³ [138].
These tools are ontology-based, meaning that they search through text for matches against underlying ontologies (curated vocabularies of medical terms and the relationships between them) [17, 138]. While these tools are capable of fine-grained entity resolution, a previous study [201] comparing OBA and MetaMap against human annotator performance underscores two sources of performance error on PAT. The first is ontology incompleteness, which results in low recall, and the second is inclusion of contextually irrelevant terms. For example, when restricted to the RxNORM ontology and semantic-type Antibiotic (T195), OBA will extract both “Today” and “Penicillin” from the sentence “Today I filled my Penicillin rx”. We observe the same limitations in Chapter 5 and in later collaborative work with Gupta et al. [112].

Despite recent efforts to develop an ontology suitable for PAT (the open and collaborative Consumer Health Vocabulary, OAC CHV [77, 273, 274]), we suspect that tools like MetaMap and OBA will remain ill-suited to the task of medical term identification in PAT due to structural differences between PAT and text authored by medical experts that we discuss in § 3.1.2. Finally, in addition to including misspellings and slang, consumer medical jargon may evolve over time as patients acquire expertise.

3.4.2 Statistical Classifiers

Statistical classifiers are a natural alternative to ontology-based tools: they can be trained to extract biomedical entities of interest with high accuracy. However, such methods require sizable corpora of labeled data for training and evaluation. This is problematic in the medical domain, as having medical experts annotate text is both expensive and time consuming. Only a handful of publicly available annotated medical corpora exist, all of them composed of annotated biomedical journal publication abstracts (i.e. expert authored text) [145, 146, 204, 271].
This has had the dual effect of generating a plethora of prior work demonstrating the efficacy of statistical approaches to biomedical NER [76, 87, 95, 124, 125, 214, 238, 239, 267], but little work that explicitly examines PAT as a potential data source.

Our work on ADEPT (Chapter 5) is an exception to this. By showing that crowdsourced medical term annotations are comparable in quality to experts’ annotations, we were able to use crowd-labeled PAT to train a conditional random field (CRF) classifier to identify medically-relevant terms in PAT. However, we also find that crowdsourcing is not always a ready solution to PAT annotation tasks (§ 5.7). In Chapter 7 we show that a CRF trained on a manually-labeled data set similarly extracts users’ drugs of choice (preferred substances of abuse) from PAT. Later work in collaboration with Gupta et al. [112] shows that the unsupervised method of lexico-syntactic pattern induction is a promising approach for extracting specific types of biomedical entities (including symptoms and conditions, as well as drugs and treatments) from PAT. This approach is also employed by Xu et al. [264], although our method achieves higher scores. Finally, other work demonstrating entity extraction on PAT includes some of the work discussed in Pharmacovigilance (§ 3.3), which utilizes CRFs [269] and pattern learning [183] to extract drugs and adverse events from PAT.

¹² http://metamap.nlm.nih.gov
¹³ http://bioportal.bioontology.org/annotator

3.5 Thematic Analysis

Thematic analyses (sometimes called content analyses) involve the systematic reading of text with the goal of eliciting a taxonomy (i.e., an organized collection of significant patterns and themes) that describes the source data. While some literature outlines standard practice for thematic analyses [30, 110, 236], it is infrequently referenced, and methods utilized in applied research tend to be somewhat ad hoc.
Thematic analysis is the most extensively used qualitative analysis technique [110], and in our experience, the most common type of analysis applied to PAT, easily outnumbering work on syndromic surveillance, pharmacovigilance, and Named Entity Recognition. This is likely due to the fact that (1) thematic analyses are easy to apply: any kind of text is a suitable candidate for thematic analysis, which is not true for quantitative analyses requiring automated extraction, (2) they are interesting: the results of a thematic analysis over PAT almost always satisfy our latent curiosity about what people actually do online in relation to their own health, and (3) they are useful: illuminating corpus content via thematic analysis is a sensible precursor to higher-investment, quantitative research with automated components. Below, we compare and contrast prior work that conducts thematic analyses on PAT.

3.5.1 Condition

There is a great deal of diversity in the conditions studied via thematic analysis. Stigmatized, or otherwise embarrassing, conditions receive notably more coverage than they do in syndromic surveillance, pharmacovigilance or NER. Examples include smoking cessation [180, 197], infertility [160, 161], HIV/AIDS [61, 177], Huntington’s disease [59], irritable bowel syndrome [58], and post-partum depression [69]. Underlying the interest in these topics is likely the fact that PAT comprises a unique data source, especially for stigmatized conditions. Conditions that have a behavioral component through which the user can directly influence health outcomes are another common topic of study. These include diabetes [107, 206], smoking cessation [180, 197], weight loss and fitness [134, 142, 217, 240], and general wellness [108].
3.5.2 Data Source

The majority of thematic analyses focus on online health communities (OHCs) [29, 34, 58–61, 101, 134, 160, 161, 177, 206, 220, 233], a natural choice given the volume and richness of OHC text. However, contemporary PAT thematic analyses also turn to Twitter [69, 135, 142, 170, 180, 218, 234, 235, 240] and Facebook [22, 71, 107, 197]. Other data sources include search logs [224, 255, 256], email [13], and personal blogs [217].

3.5.3 Analysis Question

Thematic analyses are, by nature, exploratory, and researchers leverage them to answer a wide array of questions. A frequent focus is unearthing users’ reasons for participating in a particular OHC, which speaks to the question of what role the community plays in helping users meet their health goals [13, 34, 60, 134, 142, 161, 206, 217, 224, 235]. Results usually contain some interesting insights. For example, Hwang et al. [134] find that online support groups for weight loss are an important source of encouragement as well as friendly competition. Relatedly, Kendall et al. [142] find that people use Twitter to realize their fitness goals in two ways: the first is to publish evidence of having worked out, the second is to publish a commitment to work out in the future.

The assumed role of many OHCs is to provide users with support. In such cases, a natural question to ask is what types of support users receive. Results are practically unanimous in noting that users seek primarily informational and emotional support [58, 59, 61, 153, 177, 197, 224].

In larger communities that are not necessarily specifically health-oriented (e.g. Twitter and Facebook), the research question often takes the angle of, “When people mention X on interface Y, what do they talk about?” A wide range of health topics have been analyzed on Twitter along these lines, including insomnia [135], epileptic seizures [170], and concussions [234], often with interesting insights.
For example, Scanfeld et al. [218] find that tweets mentioning antibiotics often indicate misuse. McNeil et al. [170] note that most tweets about concussions are in reference to professional sports injuries, and Bender et al. [22] find that a great deal of breast cancer related discussion on Facebook involves fundraising.

Finally, a handful of thematic analyses investigate how the experience of an illness can differ by gender. Makil et al. [160, 161] investigate infertility, paying special attention to the experience of men whose partners are infertile. Another topic that has received some attention is how coping and self-help mechanisms differ between people with breast cancer and prostate cancer. In general, these studies find that men seek more informational support and less emotional support than women do [101, 220, 233].

3.5.4 Scaling Thematic Analyses

Only a handful of prior studies use thematic analysis results as the foundation for a larger-scale analysis of PAT. Most notable is that by De Choudhury et al. [69–71], who analyze how postpartum depression (PPD) is characterized on both Twitter and Facebook. Using their findings, they leverage activity and linguistic features to build models that can predict the onset of PPD from Facebook data [71]. Also of note is the work on cyberchondria by White and Horvitz [255, 256], who analyze health-related search logs and leverage the results of their analysis to model anxiety escalation and predict the transition from self-diagnosis to seeking medical assistance. Our work on identifying users’ reasons for participating in Forum77 (Chapter 6) and their transitions through addiction (Chapter 8) also implements scaled thematic analyses.
Results of scaled thematic analyses are especially powerful, as they provide both a novel, insightful contextualization of PAT acquired via close reading of a small sample, as well as population-level insights acquired via extending these results through automated annotation and large-scale analysis. As such, their rarity is puzzling: it is possible that many researchers who conduct thematic analyses do not have experience with machine learning. Alternatively, categories derived in thematic analyses may be too fine-grained for classifier training. A final explanation may be that there is sufficient reward for publishing the results of a thematic analysis without investing the resources required to scale it.

3.6 Summary

Our goal in this chapter was to motivate PAT as a data source and present a comprehensive overview of relevant prior work. We define PAT as any medical text authored by someone who is not a medical professional (§ 3.1). PAT, which is often the product of many human hours spent on complex health-related problem solving, provides a unique window into patient behavior outside of the clinical environment (§ 3.1.1). However, it is also challenging to work with: PAT is noisy, few tools support mining and exploring it, and what medical data PAT encodes, and how, is often unclear upon casual inspection (§ 3.1.2). This underscores the importance of matching research questions with users’ motivations for authoring PAT in the first place.

Work utilizing PAT as a primary data source tends to fall into one of four categories. Syndromic Surveillance (§ 3.2) and Pharmacovigilance (§ 3.3) both involve processing large quantities of data in order to monitor health-related variables. Entity extraction (§ 3.4), which lies under the purview of Natural Language Processing and Machine Learning, concerns the identification of specific entities in PAT.
Finally, on the qualitative side, thematic analyses (§ 3.5) involve close readings of text in order to gain insight into its structure and content.

PAT-based syndromic surveillance systems have great potential in the toolbox of techniques for the real-time monitoring of medical conditions. To date, the majority of such systems focus on the topic of influenza, relying either upon search query logs or Twitter as a primary data source. Filtering the PAT data stream for relevant entities is crucial for a cleaner signal: although keyword-based filtering is popular due to its simplicity, training classifiers to discriminate relevant from irrelevant data produces superior results. Often, frequency counts of these filtered data are compared as-is to real-world gold standards (most commonly, the CDC ILI data set¹⁴), but prior work shows that linear models built on these data have promising predictive value.

¹⁴ http://www.cdc.gov/flu/weekly/fluactivitysurv.htm

Pharmacovigilance (§ 3.3) is concerned with detecting adverse effects related to pharmaceutical products in real time. PAT comprises a potentially valuable, but difficult to work with, data source for Pharmacovigilance [119]. Most prior work focuses on online health communities (OHCs), although search logs have also been shown to be a viable data source for web-scale pharmacovigilance [257]. While many systems demonstrate the ability to identify {drug, adverse event} pairs, automatically identifying which of these pairs (amongst thousands) are important is an unsolved problem. To date, no work has presented a viable predictive model for adverse drug events.

A great deal of work on biomedical named entity recognition (NER) exists. While ontology-based MetaMap and Open Biomedical Annotator are the go-to tools for medical term annotation, they perform poorly on PAT for two reasons: first, ontologies have insufficient coverage of consumer medical terminology.
Second, their lack of context sensitivity leads to over-inclusion of irrelevant terminology in results. Statistical classifiers have been shown to achieve high accuracy in biomedical NER tasks. However, these approaches are limited by their requirement for a sizable corpus of annotated data for training and testing. Most research on biomedical NER utilizes existing publicly available data sets, which are based on abstracts from biomedical journal publications. Consequently, little prior work on biomedical NER in PAT exists. Exceptions to this include some of the work on Pharmacovigilance [183, 269], and our work on ADEPT (Chapter 5), identifying drugs of choice (Chapter 7) and using patterns to extract entity types [112, 264].

Thematic analyses over PAT cover a wide array of conditions. Notably well represented, however, are stigmatized conditions and conditions that have a behavioral component through which the user can influence health outcomes. Online health communities, Twitter and Facebook are the most commonly utilized PAT sources for thematic analyses. As thematic analyses are exploratory by nature, they are used to answer a wide array of questions. Common topics include elucidating users’ reasons for participating in an online community as well as what kinds of support such a community provides. The results of a thematic analysis can be used to train automatic classifiers, thereby extending the research from a small PAT sample to large PAT corpora. While prior work demonstrates the power and value in this approach, it is rare.

In sum, PAT is a valuable data source that has been proven to have clinical value. However, PAT is challenging to work with. To date, prior work on PAT tends either to be structured in such a way as to reduce the number of variables being analyzed, making analysis and evaluation easier (e.g. syndromic surveillance, pharmacovigilance, NER), or to focus on qualitative analyses of PAT (e.g. thematic analyses).
Although little work builds automated extraction and analysis on top of the results of a thematic analysis, prior work, as well as our findings in Chapters 6-8, indicate that this approach yields novel and valuable insights.

Chapter 4
Data

In this chapter we describe our PAT data sets and define terminology relevant to our work. We first present our full MedHelp data set (§ 4.1), which we use in our work on medical term identification (Chapter 5), and define key terminology (§ 4.1.1). We then describe Forum77 (§ 4.1.2), a subset of the MedHelp data set, which we use for our work on addiction (Chapters 6, 7 and 8). Finally, we present our CureTogether data set (§ 4.2), which we use as an independent test set in Chapter 5. We acquired our data sets through research agreements with MedHelp and CureTogether, respectively, who anonymized the data prior to sharing them.

4.1 MedHelp Corpus

MedHelp¹ is an online health community designed to aid users in the diagnosis, exploration, and management of personal health conditions. The site boasts a variety of tools and services, including over 200 condition-specific user online health communities (OHCs). Our data set comprises all discussions on all of MedHelp’s forums from 2006 through mid-2011: a total of ∼1,250,000 threads. Table 4.1 lists the top 40 MedHelp forums by post volume, along with unique contributor counts.

¹ http://www.medhelp.com

Table 4.1: Top 40 MedHelp forums ranked by total post count. A ◦ in the Stigmatized column denotes our conservative estimate of whether the condition represented by the forum carries a stigma or is otherwise embarrassing.

Stigmatized: ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦

Forum                         Post count   Unique users
Addiction: Substance Abuse    486,972      32,542
Maternal & Child              402,065      45,821
Pregnancy 18-34               364,475      28,321
Hepatitis C                   343,433      14,330
HIV Prevention                274,072      27,528
Fertility                     243,919      17,391
Women’s Health                208,683      76,221
Thyroid Disorders             169,713      21,939
Multiple Sclerosis            156,500      5,545
STDs                          117,462      29,455
Neurology                     111,671      47,968
Dermatology                   107,134      47,612
Ovarian Cancer                99,954       10,425
Anxiety                       98,971       17,373
Herpes                        89,792       17,061
Undiagnosed Symptoms          82,301       30,741
Gastroenterology              79,659       32,694
Heart Disease                 74,671       22,294
Hepatitis Social              74,412       2,122
Pregnancy 35+                 72,414       5,923
Eye Care                      70,744       18,666
Addiction: Social             68,831       3,253
Heart Rhythm                  57,001       9,496
Child Behavior                45,660       14,961
Relationships                 42,891       4,724
Pain Management               42,099       7,990
Breast Cancer                 41,197       10,869
Urology                       37,121       17,351
Weight Loss Alternatives      36,925       15,003
Depression                    35,614       9,035
Chiari Malformation           32,493       1,892
Sexual Health                 32,269       11,344
MedHelp Social                31,800       778
Men’s Health                  31,712       14,832
Bipolar Disorder              29,057       3,775
Back & Neck                   28,926       13,082
Hepatitis B                   28,664       4,621
Ear, Nose & Throat            28,439       14,244
Miscarriages                  26,043       3,703

4.1.1 Terminology

Figure 4.1 provides an illustrative example of the composition and content of our MedHelp data. A forum comprises several threads (or discussions) centered around a specific medical condition (e.g. addiction, breast cancer, etc.). A thread is composed of an initiating post, in which the initiator posts new content for the community’s consideration, and a series of response posts, in which respondents contribute to the discussion galvanized by the initiating post. When an initiator posts a response to a thread that she started, this post is called a self-response. While features for sub-discussions (nested responses) as well as picking a “best response” in a thread do exist, they are used infrequently and we do not consider them in our analyses. Moreover, we have neither demographic data (age, geographic location etc.) describing MedHelp users nor page view data describing lurking (reading without posting; see § 2.2.1) behavior.
[Figure 4.1 has three panels: MEDHELP COMMUNITIES, an alphabetical list of forums that includes Addiction (Forum77); FORUM77, a list of discussion threads with author, date and response count; and DISCUSSION THREAD, a sample thread annotated with its INITIATING POST, RESPONSES and SELF RESPONSE, reproduced below.]

Suboxone withdrawal, by liquid_daisy, 6/12/2012 (10 responses)

INITIATING POST
I quit cold turkey off 32mgs of suboxone. Today is day 5 and I’m in a lot of pain. I just want to know how long these withdrawals will last…? Is there anything I can get OTC that will help??? Thanks.

RESPONSES
Boo28 on 6/12/2012: Congrats on the 5 days clean! 32mgs is a high dose to CT, but doable. First, some questions: are you on any other medications? What other w/d symptom…
yellowPop on 6/12/2012: hi congrats and keep posting for support. I myself jumped from 44mgs although it wasn’t pretty. Physical w/ds tend to last 10 days to 2 weeks but everyone is diff…

SELF RESPONSE
liquid_daisy on 6/12/2012: No diarrhea, just cold sweats. I stay busy so that I don’t let my mind wander. Don’t have much of an appetite, but redbulls seem to help… chugged 4 today alrea…

Figure 4.1: Illustrative example of MedHelp and Forum77 content and structure.
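The terminology of § 4.1.1 maps naturally onto a small data model. The following is a minimal sketch under our own assumptions: the class and method names (Post, Thread, self_responses, respondents) are hypothetical and do not describe the format in which the MedHelp data were actually stored. The example thread mirrors the usernames in Figure 4.1, with post text abbreviated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    author: str
    text: str


@dataclass
class Thread:
    title: str
    initiating_post: Post
    responses: List[Post] = field(default_factory=list)

    @property
    def initiator(self) -> str:
        """The user who authored the initiating post."""
        return self.initiating_post.author

    def self_responses(self) -> List[Post]:
        """Responses written by the initiator to her own thread."""
        return [p for p in self.responses if p.author == self.initiator]

    def respondents(self) -> List[str]:
        """Unique authors other than the initiator, in order of first response."""
        seen: List[str] = []
        for p in self.responses:
            if p.author != self.initiator and p.author not in seen:
                seen.append(p.author)
        return seen

    def length(self) -> int:
        """Thread length in posts: the initiating post plus all responses."""
        return 1 + len(self.responses)


# Abbreviated version of the sample thread in Figure 4.1.
example = Thread(
    title="Suboxone withdrawal",
    initiating_post=Post("liquid_daisy", "I quit cold turkey off 32mgs of suboxone."),
    responses=[
        Post("Boo28", "Congrats on the 5 days clean!"),
        Post("yellowPop", "hi congrats and keep posting for support."),
        Post("liquid_daisy", "No diarrhea, just cold sweats."),
    ],
)
```

Counting the initiating post toward thread length is itself an assumption; it is chosen here so that a thread with no responses has length 1.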
4.1.2 Forum77

MedHelp’s largest forum is dedicated to the topic of Addiction: Substance Abuse². We dub this community Forum77³. Our data set covers all Forum77 content from 2007 to mid-2014 (7.5 years), and comprises 80,529 discussions (740,046 total posts) authored by 51,153 unique users.

² http://www.medhelp.org/forums/Addiction-Substance-Abuse/show/77
³ All of MedHelp’s forums have a unique identifier, and the Addiction: Substance Abuse community’s is 77. We settled on Forum77 as a convenient way to refer to this community. To our knowledge nobody within the community refers to it as Forum77.

Figure 4.2 illustrates summary statistics describing content and activity on Forum77. As expected, the volume of response posts correlates strongly with the volume of initiating posts; moreover, both experience a slight decline from 2009 - 2014 (Figure 4.2 (A)). While the number of new users to Forum77 varies widely each month, the number of return users, which comprise the core community base, is more consistent: in any given month there are between 200 - 300 return users participating in the forum (Figure 4.2 (B)). This is consistent with user tenure distribution on Forum77: while most users have a tenure of ≤ 1 month, a long tail indicates several thousand users who have tenure > 1 year (Figure 4.2 (D)). Finally, while some initiating posts get no responses, most get at least one, and modal thread length is 4 posts (Figure 4.2 (C)).

[Figure 4.2 has six panels: (A) post count by year for initiating and responding posts; (B) user count by year for new and return users; (C) thread count by thread length (# posts); (D) user count by tenure (months); (E) user count by initiating posts per user; (F) user count by responses per user.]

Figure 4.2: Summary statistics of Forum77 variables: post volume by month (A), user volume by month (B), thread length distribution (C), user tenure distribution (D), user initiating post count distribution (E), and user response post count distribution (F).

4.2 CureTogether Corpus

CureTogether⁴ is an online health community that focuses on collecting structured health information from its members via surveys. The site covers a wide array of medical conditions (589 in our data set), each associated with a curated collection of symptom, treatment, side effect and cause/trigger terms. By focusing on collecting structured data, CureTogether circumvents the problem of extracting medically-relevant information from PAT. However, discussion levels on the site are low: our data set contains ∼3,000 free-text posts on a variety of CureTogether’s medical topics. Despite this, these posts are detailed and thoughtful and suffice, in Chapter 5, as a suitable PAT source independent from MedHelp.
⁴ http://www.curetogether.com

Chapter 5
Identifying Medically Relevant Terms in PAT

5.1 Introduction

When we began exploring our MedHelp corpus, we realized that our efforts were severely hampered by the absence of a good solution to a seemingly simple problem: identifying the medically relevant terms in PAT. How, for example, might one automatically extract the terms that we have flagged as medically relevant in the following excerpt from MedHelp’s Addiction: Substance Abuse forum?

    So, I’m 62 hours without pills, and its definitely getting worse, I ache all over, the anxiety is the worst, along with restless legs but I’m here now, and I’m not sure it can get much worse so hopefully soon I’ll be out the other side. Last night was horrible. I had around 3 hours broken sleep, night sweats and the most awful haunting nightmares when I was sleeping. I’ve taken the l-tyrosine and B6 this morning, I’ll try and force some food down me shortly and then take the rest of the vitamins.

The ability to distill medically relevant terms from PAT is useful for exploration: it filters out irrelevant content, allowing for high-level insights into the corpus and facilitating hypothesis generation. More sophisticated analyses can also be implemented on the extracted terms. The results of co-occurrence analyses, for example, can improve query expansion and information retrieval over a corpus [194, 219, 245], or can be used to impose additional structure, such as clustering [39] or hierarchical concept summaries [216], over the source data. In a PAT corpus, significant term co-occurrences could be used to build a “map” of important links between symptoms and treatments.

Identifying medical concepts in text is a long-standing research challenge that has spurred the development of several software toolkits [17].
Those such as MetaMap¹ and the Open Biomedical Annotator (OBA)² focus primarily on mapping words from text authored by medical experts to concepts in biomedical ontologies. A biomedical ontology is essentially a controlled collection of terms and the hierarchical relationships between them. Usually, ontological terms are also categorized or typed (e.g., drug, sign or symptom, medical device). Thousands of biomedical ontologies exist, and differ according to the topic or level of specificity covered by their terms. For example, MFOEM³ (the Emotion Ontology) covers concepts specifically related to affective phenomena, while SNOMED-CT⁴ (Systematized Nomenclature of Medicine - Clinical Terms) covers a broad array of clinical terms. Curating ontologies is a labor-intensive process in which people must agree on which terms should be included, removed, combined or split, must categorize those terms, and must define their hierarchical relationships.

Despite recent efforts to develop an ontology suitable for PAT (the open and collaborative Consumer Health Vocabulary, or (OAC) CHV [77, 273, 274]), we suspect that tools like MetaMap and OBA will remain ill-suited to the task of medical term identification in PAT due to structural differences between PAT and text authored by medical experts. As we note in § 3.1.2, such differences include lexical and semantic mismatches [167, 272], mismatches in consumers’ and experts’ understanding of medical concepts [99, 272] and mismatches in descriptive richness and length [99, 167, 272]. Finally, consumer medical jargon may evolve over time as a patient acquires expertise. This would be a challenge for ontologies, which are, by design, inflexible and brittle.

Our goal is to automatically and accurately identify medically relevant terms in PAT. (Note that we do not attempt to map terms to ontological concepts; we view this as a separate and complementary task.)
As acquiring annotated data sets is a major obstacle to classifier training, we investigate crowdsourcing as an alternative option to having medical professionals label PAT (§ 5.4). First, we discuss the process of designing the crowdsourcing task (§ 5.4.1). Next, we compare crowdsourced annotations from non-experts (Amazon’s Mechanical Turk⁵ workers, or Turkers) and medical experts (Registered Nurses hired via ODesk⁶) (§ 5.4.2). We find that crowdsourcing PAT medical term identification tasks to non-experts achieves results comparable in quality to those given by medical experts (§ 5.4.3). While this result opens a new avenue for rapid and affordable PAT annotation, not all PAT annotation tasks are amenable to crowd labeling (§ 5.4.4).

¹ http://metamap.nlm.nih.gov
² http://bioportal.bioontology.org/annotator
³ http://bioportal.bioontology.org/ontologies/MFOEM
⁴ http://www.ihtsdo.org/snomed-ct
⁵ http://www.mturk.com
⁶ http://www.odesk.com

Next, we train a conditional random field (CRF) classifier to automatically identify medically relevant terms in PAT (§ 5.5). Our classifier, trained on 10,000 crowd-labeled PAT sentences, dramatically outperforms state-of-the-art annotation tools MetaMap, OBA and TerMINE (§ 5.5.3). We call our classifier ADEPT (Automatic Detection of Patient Terminology). In an error analysis, we observe that ADEPT has the most trouble correctly classifying “generic” medical terms (e.g., pills, medicine, doctor) (§ 5.5.3). We attribute ADEPT’s success to the suitability of sentence-level context-sensitive learning models, like CRFs, to PAT medical term identification tasks (§ 5.7).

Finally, we demonstrate ADEPT’s efficacy through applying it to text from our MedHelp corpus (§ 5.6).
First, we compare the top-50 terms extracted from MedHelp’s Arthritis forum by both ADEPT and the OBA (§ 5.6.1), noting that those recovered by ADEPT are both diverse and richly descriptive of arthritic conditions, while the majority of those recovered by OBA are spurious. Next, we construct a graph of co-occurring terms extracted by ADEPT from MedHelp’s Addiction: Substance Abuse forum, Forum77 (§ 5.6.2). The resulting graph suggests that a primary topic of discussion on the forum is withdrawal, and moreover, that users discuss explicit drugs, especially prescription opioids, on the forum. Our work in Chapters 6, 7 and 8 further explores Forum77 and confirms that these high-level insights are accurate.

5.2 Related Work

5.2.1 Medical Term Identification

MetaMap, arguably the best-known medical entity extractor, is a highly configurable program that relates words in free text to concepts in the UMLS Metathesaurus [16, 17]. MetaMap sports an array of analytic components, including word sense disambiguation, lexical and syntactical analysis, variant generation, and POS tagging. MetaMap has been widely used to process data sets ranging from email to MEDLINE⁷ abstracts to clinical records [17, 31, 43].

The Open Biomedical Annotator (OBA) is a more recent biomedical concept extraction tool under development at Stanford University. OBA is based on MGREP: a concept recognizer developed at the University of Michigan [138]. Like MetaMap, OBA maps words in free text to ontological concepts; its workflow, however, is simpler, comprising a dictionary-based concept recognition tool and a semantic expansion component that finds concepts related to those present in the exact text [138].

⁷ A collection of biomedical publication abstracts. For more information see: http://www.nlm.nih.gov/pubs/factsheets/medline.html

A handful of studies compare MetaMap and/or OBA to human annotators, and tend to find the tools wanting.
Ruau et al. [212] evaluated automated MeSH annotations on PRoteomics IDEntification (PRIDE) experiment descriptions against manually assigned MeSH annotations. MetaMap achieved precision and recall scores of 15.66% and 79.44%, while OBA achieved 20.97% and 79.48%. Pratt and Yetisgen-Yildiz [201] compared MetaMap’s annotations to human annotations on 60 MEDLINE titles: they found that MetaMap achieved exact precision and recall scores of 27.7% and 52.8%, and partial precision and recall scores of 55.2% and 93.3%. They note that several failures result from missing concepts in the UMLS. This is corroborated in an analysis of 376 patient-defined symptoms from PatientsLikeMe by Smith and Wicks [226], who found that only 43% of unique terms had either exact or synonymous matches in the UMLS; of the exact matches, 93% were contributed by SNOMED CT.

In addition to ontological approaches, there are several statistical approaches to medical term identification. NaCTeM’s TerMINE⁸ is a domain-independent tool that uses statistical scoring to identify technical terms in text corpora [94]. Given a corpus, TerMINE produces a ranked list of candidate terms. In a test on eye-pathology medical records, precision was highest for the top 40 terms ranked by C-value (∼75%) and decreased steadily down the list (∼30% overall). Absolute recall was not calculated, due to the time-consuming nature of having experts verify true negative classifications in the test corpus. Recall relative to the extracted term list, however, was ∼97% [94].

As we discuss in Chapter 3, a great deal of prior work has focused on training statistical classifiers for biomedical named entity recognition (NER) tasks [76, 87, 95, 111, 124, 125, 214, 222, 238, 239, 267]. In general, this work demonstrates good results, indicating that statistical classification methods are more appropriate for biomedical NER tasks than MetaMap and OBA.
However, none of this work utilizes PAT as a primary data source: statistical classifiers require sizable quantities of labeled data for training and testing, and to date all available such data sets are based on biomedical publication abstracts [145, 146, 204, 271].

5.2.2 Consumer Health Vocabularies

A complementary and closely related branch of research to ours is Consumer Health Vocabularies (CHVs): ontologies that link layman and UMLS medical terminology [85, 273]. Supporting motivations for developing CHVs include: narrowing knowledge gaps between consumers and providers [273, 274], coding data for retrieval and analysis [77], improving the “readability” of health texts for lay consumers [144] and coding new concepts that are missing from the UMLS [143, 226]. We are currently aware of two CHVs: the MedlinePlus Consumer Health Vocabulary⁹, and the open and collaborative Consumer Health Vocabulary¹⁰, or (OAC) CHV, which was included in UMLS as of May 2011.

⁸ http://www.nactem.ac.uk/software/termine

To date, most research on CHVs has focused on discovering new terms to add to the (OAC) CHV. In 2007, Zeng et al. [274] compared several automated approaches for discovering new “consumer medical terms” from MedlinePlus query logs. Using a logistic regression classifier, they achieved an AUC of 95.5% on all n-grams not present in the UMLS. More recently, Doing-Harris & Zeng [77] proposed a computer-assisted update (CAU) system to crawl PatientsLikeMe, suggesting candidate terms for the (OAC) CHV to human reviewers. By filtering CAU terms by C-value [94] and termhood [274] scores, they were able to achieve a 4:1 ratio of valid to invalid terms; however, this also resulted in discarding over 50% of the original valid terms.

Given the goals of the CHV movement, our CRF model for PAT medical term identification may prove to be an effective method for generating new candidate terms for CHVs.
5.3 Data

In this section we describe our data preparation and sampling methods. We use samples from our MedHelp (§ 4.1) data set for comparing crowdsourced vs. expert-sourced labels, and for training and cross-validation of our CRF classifier. We use a sample from our CureTogether (§ 4.2) data set as a hold-out gold standard for comparing our CRF classifier to state-of-the-art medical term annotation tools.

5.3.1 Preparation

We analyze our data at the sentence level. This promotes a fairer comparison between machine taggers, which break text into independent sentences or phrases before annotating, and human taggers, who may otherwise transfer context across sentences. We use Lucene¹¹ to tokenize our corpora into sentences. For consistency, we excluded sentences from MedHelp forums that we agreed were tangentially medical (e.g., “Relationships”), over-general (e.g., “General Health”), or that contain fewer than 1,000 sentences. The raw MedHelp data set contains approximately 1,250,000 discussions. After preparation, the data set comprises approximately 950,000 discussions from 138 forums: a total of 27,230,721 sentences.

⁹ http://www.nlm.nih.gov/medlineplus/xml.html
¹⁰ http://consumerhealthvocab.org
¹¹ http://lucene.apache.org

5.3.2 Samples

We use the following samples:

MH1K: 1,000 MedHelp sentences sampled uniformly at random; labeled by crowd and experts. We use this sample to compare expert and crowd labels. We also use the expert labels as a gold standard for comparing our CRF classifier’s performance against state-of-the-art tools.

MH10K: 10,000 MedHelp sentences sampled uniformly at random; labeled by crowd. We use this sample to train our CRF classifier to identify medically relevant terms in PAT. We also use it for 10-fold cross validation of this classifier.

CT1K: 1,000 CureTogether sentences sampled uniformly at random; labeled by experts.
We use this as an independent gold standard for comparing our CRF classifier’s performance against those of state-of-the-art tools.

5.4 Labeling Medically Relevant Terms with the Crowd

A common barrier to both training and evaluating medical text annotators is the lack of sufficiently large, labeled data sets [17, 201]. The challenge in building such data sets lies in sourcing medical experts with enough time to annotate text at a reasonably low cost [201].

Crowdsourcing is the allocation of a series of small tasks (often called micro-tasks) to a “crowd” of online workers, typically via a web-based marketplace. Crowdsourcing is particularly attractive for obtaining results faster and at lower cost than other participant recruitment schemes. When the workflow is properly managed (e.g., via quality control measures such as aggregate voting, or by breaking up tasks into suitable sub-components such as the “find-fix-verify” method proposed by Bernstein et al. [26]) the combined results are often comparable in quality to those obtained via more traditional task completion methods [126, 147]. Snow et al. [228] find that non-expert crowds can effectively execute linguistic annotation tasks (affect recognition, word similarity, textual entailment, temporal ordering, and word sense disambiguation) that are typically performed by experts. However, designing a crowdsourcing task such that quality results are obtained is challenging and requires careful thought [26, 147].

Replacing medical experts with non-expert crowds would address concerns of time and cost, allowing us to build labeled PAT data sets quickly and cheaply. To test the viability of this idea, we first design a crowdsourcing task for medical term identification in PAT (§ 5.4.1).
Next, we deploy this task to both experts (in our case, Registered Nurses, or RNs) and non-experts (Amazon Mechanical Turk workers, or Turkers), and compare their annotations over a sample of 1,000 sentences (MH1K) (§ 5.4.2).

5.4.1 Task Design and Pilot Study

Amazon’s Mechanical Turk¹² is an online crowdsourcing platform where workers (Turkers) can browse “human intelligence tasks” (HITs) posted by requesters and complete them for a small payment. We designed a simple interface in which a HIT comprised 100 sentences, each of which was accompanied by a text box into which Turkers could copy medically relevant terms.

Our original prompt simply asked Turkers to copy/paste any terms that seemed medically relevant from each sentence into the accompanying text box. The resulting data contained several inconsistencies, including:

terms taken out of context: users selected terms that had no medical relevance in the context of the given sentence, but might have medical connotations in other contexts. E.g., “anxiety” in the sentence “I apologize if my post created any undue anxiety”.

omission: users would often leave an empty response for a sentence that contained a term that was clearly medically relevant.

numerical measurement inclusion: some users felt that numbers corresponding to medication dosages, units of measurement, etc. were relevant, while others did not.

concept granularity and scope: in a sentence such as “I have low blood sugar”, users would not know whether to select “low blood sugar” or just “blood sugar”.

repetition: if a medically relevant term appeared twice in the same sentence (e.g., “pain” in “I am in a lot of pain and the meds don’t seem to help, they just take the edge off the pain if anything”), some users would extract it only once, and others would extract it each time it appeared.

¹² http://www.mturk.com

Prior work shows that the design of a crowdsourcing task and prompt strongly impacts response quality [147]. In order to arrive at a suitable prompt that produced consistent results, we iterated on our original version several times, basing our changes on the design principles outlined by Kittur et al. [147]. We discuss pivotal changes below; Figure 5.1 shows our final prompt and interface.

The most problematic inconsistency was terms taken out of context, which amount to unnecessary false positives. Subjective tasks are especially difficult for crowd workers [147], and the medical term identification task is inherently subjective. We discovered, however, that making the task seem less subjective, by asking users to tag words/phrases that they thought doctors would find interesting, all but eliminated this effect.

The next problematic issue was omission, or unnecessary false negatives. We suspected that one reason Turkers were cheating was that doing so let them complete the HIT faster. Kittur et al. [147] note that to acquire accurate results from Turkers, malicious completion and good-faith completion should require comparable levels of effort. We changed our interface such that each text box had to contain some value prior to completion of the HIT, and instructed Turkers to type “NA” into text boxes corresponding to sentences containing no medically relevant concepts. This helped somewhat, but it is still easier to type “NA” than to copy/paste several terms into a text box. Kittur et al. [147] also note that signaling to Turkers, in a believable manner, that their responses will be verified is thought to reduce invalid responses as well as increase time spent on task. Before they accepted the HIT, we informed Turkers that four other Turkers would be completing the same HIT, and that their response would be rejected if it disagreed substantially with the others. We enforced this policy.
Implementing these changes resulted in a drastic reduction of omissions.

Explicitly asking users to ignore numerical measurements and providing illustrative examples on multi-word concepts reduced conflicting incidences of numerical measurement inclusion and concept granularity to the point where aggregating over Turker responses produced a good result. However, similar interventions related to issues of repetition had no effect. Ultimately we propagated the “medically relevant” label to all unlabeled terms in the sentence that matched an extracted term. It is reasonable to assume that two identical terms should carry the same label in a sentence, and we observed no instances in which this assumption was violated.

Instructions (please read to get full credit for this task)

For this HIT, we would like you to extract all words/phrases that are medical concepts from the sentences below. There are 100 sentences; this should take ~15-25 minutes.

To find medical concepts, ask yourself the question: "If I was telling this to my doctor, which words would the doctor find interesting?" To simplify things, do not extract numerical values such as age, weight, gender, medication dosage, symptom duration etc. Do extract concepts describing body parts, conditions (and causes and effects of conditions), symptoms, treatments, etc. Remember that some medically relevant terms are abbreviated (e.g. BS for "blood sugar").

For each sentence, please COPY/PASTE the relevant text EXACTLY (do not re-type it, or correct misspellings), and SEPARATE each concept with a COMMA. For example:

I gave up smoking 2 weeks ago, and my blood pressure is under control with verapamil (0.5mg twice a day).
smoking, blood pressure, verapamil

For multi-word concepts, include as many words as you can, but make sure that they refer to just ONE concept. Do not extract overlapping concepts.
For example, in the sentence below, the term "blood sugar" is preferred to "blood".

Shakes in the hands can be symptomatic of low blood sugar.
shakes, hand, blood sugar

Finally, many of the sentences will contain no medically relevant concepts. Just enter NA in the boxes in these cases. For example:

You need to take care of yourself before you can take care of someone else.
NA

NOTE: you will be able to complete ONLY ONE of these HITs. Please do not attempt to accept another hit after completing this one. Have fun!

Submit

Figure 5.1: Final PAT medical term identification task instructions and interface. Turkers were informed that their answers would be checked against other Turkers’ in the HIT description on the MTurk interface.

5.4.2 Experiment

Table 5.1: Majority vote at the token level over RN responses. Terms identified by RNs as medically relevant are shown in bold. Stopwords (e.g., “and”, “of”) are excluded from the vote. [Bold highlighting not preserved.]

RN 1:    shakes in the hands can be symptomatic of low blood sugar
RN 2:    shakes in the hands can be symptomatic of low blood sugar
RN 3:    shakes in the hands can be symptomatic of low blood sugar
Result:  shakes hands symptomatic low blood sugar

We use our MH1K data set for this experiment: a uniform sample of 1,000 sentences from the general MedHelp data set. We deemed 1,000 sufficiently large for an informative comparison between RN and Turker responses, but small enough to make expert annotation affordable. We split the sample into 10 groups of 100 sentences.

Our experts comprised 30 RNs from ODesk¹³, an online professional contracting service. In addition to the RN qualification, we required that each expert have perfectly rated English language proficiency. Each expert did one PAT medical term identification task (100 sentences), and each group of 100 sentences was tagged by three experts, who were reimbursed $5.00 for completing the task.
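The token-level majority vote summarized in Table 5.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used: the stopword list, the punctuation handling and the three annotator responses below are all assumptions made for the example (the per-RN selections in Table 5.1 are not recoverable here).

```python
from collections import Counter

# Illustrative stopword list (an assumption; the list actually used is not given).
STOPWORDS = {"in", "the", "can", "be", "of", "and", "a", "an"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def majority_vote(sentence, annotations, min_votes=None):
    """Return the sentence tokens selected by at least `min_votes` annotators.

    `annotations` holds one list of extracted terms per annotator. Multi-word
    terms are broken into words so that votes are counted per token.
    """
    if min_votes is None:
        min_votes = len(annotations) // 2 + 1  # strict majority
    votes = Counter()
    for terms in annotations:
        # Use a set so each annotator casts at most one vote per word.
        voted = {tok for term in terms for tok in tokenize(term)}
        votes.update(voted - STOPWORDS)
    return [tok for tok in tokenize(sentence)
            if tok not in STOPWORDS and votes[tok] >= min_votes]


sentence = "Shakes in the hands can be symptomatic of low blood sugar."
# Hypothetical responses from three annotators.
annotations = [
    ["shakes", "hands", "low blood sugar"],
    ["shakes in the hands", "symptomatic", "blood sugar"],
    ["hands", "symptomatic", "low blood sugar"],
]
print(majority_vote(sentence, annotations))
```

Because the vote is taken per word, the annotators' differing choices of phrase boundary ("blood sugar" versus "low blood sugar") never have to be reconciled directly; each word simply stands or falls on its own count.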
All tasks were completed within two weeks at a cost of $150.00.

Our non-expert crowd comprised 50 Turkers recruited from Amazon’s Mechanical Turk (AMT). We required that the Turkers have high English language proficiency, reside in the United States, and be certified to work on potentially explicit content. Each Turker performed a single PAT medical term identification task (100 sentences), and each sentence group was tagged by five Turkers. The Turkers were reimbursed $1.20 upon faithful completion of the task. All tasks were completed within 17 hours at a cost of $60.00.

Determining a Gold Standard

We determine a gold standard for each sentence by taking a majority vote over the RNs’ responses. Voting is performed at the word level, despite the prompt to extract words or phrases from the sentences. Table 5.1 illustrates how this simplifies term identification by eliminating partial matching considerations over multi-word concepts. N-gram terms can be recovered by heuristically combining adjacent words.

Comparing Turkers Against a Gold Standard

To test the feasibility of using non-expert crowds in place of experts, we compare Turker to RN responses directly, aggregating across all 5 possible Turker voting thresholds. This allows us both to evaluate the quality of aggregated Turker responses against the gold standard and to select the optimal voting threshold.

¹³ http://www.odesk.com

Table 5.2: Turker performance against the RN gold standard. Voting threshold indicates the minimum number of Turkers who have to annotate a term as medically relevant for it to be included in the result. Maximum column values are indicated in bold. A corroborative policy of 2+ votes yields high scores across the board, and maximizes F1-score. [Bold highlighting not preserved.]

Vote Threshold   F1      Precision   Recall   Accuracy   MCC
1                78.45   67.15       94.31    93.96      0.77
2                84.43   82.53       86.41    96.29      0.82
3                83.80   91.67       77.18    96.52      0.82
4                76.61   95.70       63.87    95.46      0.76
5                59.81   97.99       43.04    93.26      0.62

5.4.3 Results

Both the RN and the Turker group achieve high inter-rater reliability scores: κ = 0.709 and κ = 0.707 respectively using Fleiss’ Kappa [88], which measures agreement across two or more voters.

Table 5.2 compares aggregated Turker responses against the RNs’ gold standard; voting thresholds dictate the number of Turker votes required for a word to be tagged as “medically relevant”. F1-score is maximized at a voting threshold of 2. We call this a corroborated vote, and select 2 as the appropriate threshold for our remaining experiments. Overall, Turker scores are sufficiently high that we regard corroborated Turker responses as an acceptable approximation for expert judgment.

5.4.4 Limitations of the Crowd

Crowdsourcing medical term identification in PAT allows us to build large, annotated data sets both cheaply and quickly. Exploring the crowd’s efficacy at other medical entity annotation tasks is an important avenue for future work. Here, we offer some anecdotal insights based on our own attempts to get the crowd to label specific types of medical terms in PAT.

We attempted to implement two tasks similar to that described in § 5.4.1: in the first, we asked Turkers to identify terms referring to symptoms and/or conditions (e.g., “cough”, “asthma”, “headache”). In the second task, we asked them to identify terms referring to drugs and/or treatments (e.g., “acupuncture”, “Tylenol”, “cough medicine”).

Although Turkers seemed to approach the task earnestly (they spent a reasonable amount of time on it), the results were surprisingly inconsistent.
In fact, some workers defaulted to labeling any terms that were medically relevant, even though it is unlikely that they had been exposed to the original task described in § 5.4.1, as more than 6 months had since elapsed. Ultimately, we hypothesized that there were three factors explaining Turkers’ poor performance:

The first is subjectivity. The task of identifying symptoms or treatments is ambiguous and, in our experience, more subjective than that of identifying terms that are simply medically relevant. For example, do wheelchairs, relaxation classes, birth control or drinking water constitute treatments? Do sensations, flare-ups, pregnant and being worried constitute symptoms or conditions? The answers to these questions tend to be “it depends”.

The second is concept scatteredness, which primarily affects the symptom/condition category. Symptom descriptions are often spread across an entire sentence, and Turkers are unsure of how to scope such concepts. Consider, for example, the phrase “after I took the meds I felt like I’d been hit by a truck”. Is “felt like I’d been hit by a truck” a symptom? This challenge is also cited by Leaman et al. [154] in work on mining adverse drug events from user comments on DailyStrength¹⁴.

The final factor that likely affected Turker performance was task overlap. The postings of the symptom and/or condition task and the drug and/or treatment tasks were staggered by a couple of days. However, we noticed that some people tried to pick out just drugs and/or treatments in a symptoms and/or conditions task, and vice versa. We attribute such mixups to the fact that the same Turkers who had done the earlier task were also attempting the staggered task, but had habituated to the first task. Allowing more time to elapse before posting the second task, or preventing Turkers from doing both tasks, should ameliorate this effect.
We believe that with additional design and iteration, it would be possible to get Turkers to identify specific types of medical terminology in PAT. For example, a multi-tiered approach such as find-fix-verify [26] might reduce the level of task subjectivity. Enhancing the interface such that Turkers could select “core” concepts and then related supporting terms might facilitate accuracy. Refining the task to make it more specific would likely reap rewards. For example, instead of asking Turkers to “find terms referring to symptoms or conditions”, they might be asked to “find terms that refer to symptoms related to the condition Asthma”.

¹⁴ http://www.dailystrength.com

In sum, however, designing a crowdsourcing task can be a resource-intensive process, and this must be traded off against alternative annotation methods. In our later work on Forum77, our data were sufficiently small that we elected to annotate them ourselves. However, systematically exploring the design space of crowdsourcing PAT annotation tasks would likely yield high returns in the long term.

5.5 Training a Classifier on Crowd-Labeled Data

We now turn to the question of training a statistical classifier to identify medical terms in PAT automatically. We describe the models that we both use and compare against (§ 5.5.1), before describing our experiment design (§ 5.5.2). Next, we present our results (§ 5.5.3), along with a failure analysis of our classifier, ADEPT. Finally, we discuss our results and the limitations of our approach (§ 5.7).

5.5.1 Models

MetaMap, OBA and TerMINE

We use the Java API for MetaMap 2012¹⁵, running it under three conditions: default; restricting the target ontology to SNOMED CT (a high percentage of “consumer health vocabulary” is reputedly contained in SNOMED CT [226]); and restricting the target ontology to the (OAC) CHV.
We used the Java client for OBA [138], running it under two conditions: default; and restricting the target ontology to SNOMED CT, as the (OAC) CHV was not available to the OBA at the time of writing. For TerMINE, we used the online web service¹⁶.

Dictionary

A dictionary (or gazetteer) is one of the simplest classifiers that we can build using labeled training data. Our dictionary compiles a vocabulary of all words tagged as “medical” in the training data according to the corroborative voting policy; it then scans the test data and tags any words that match a vocabulary element. Our dictionary implements case-insensitive, space-normalized matching.

ADEPT: A Conditional Random Field Model

Conditional random fields (CRFs) are probabilistic graphical models particularly suited to labeling sequence data [151]. Their suitability stems from the fact that they relax several independence assumptions made by Hidden Markov Models; moreover, they can encode arbitrarily related feature sets without having to represent the joint dependency distribution over features [151]. As such, CRFs can incorporate sentence-level context into their inference procedure. For example, a CRF can discern that the word “tired” represents a medical term in the sentence, “I’m feeling so tired, as though I am oxygen deprived.”, but not in the sentence, “I’m tired of feeling as though I am oxygen deprived.” The term “oxygen deprived” is medically relevant in both sentences¹⁷.

¹⁵ http://metamap.nlm.nih.gov
¹⁶ http://www.nactem.ac.uk/software/termine

Our CRF training procedure takes, as input, labeled training data coupled with a set of feature definitions, and determines model feature weights that maximize the likelihood of the observed annotations. We use the Stanford Named Entity Recognizer package¹⁸, a trainable Java implementation of a CRF classifier, and its default feature set.
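The difference between the dictionary baseline and a context-sensitive model can be illustrated with a small Python sketch. It is not the Stanford NER implementation or its feature set: `context_features` is a hypothetical stand-in for window-style features, the training data are invented for the example, and the sentences are assumed to be pre-tokenized on whitespace.

```python
def normalize(word):
    """Case-insensitive, whitespace-normalized form of a word."""
    return " ".join(word.lower().split())


def build_vocabulary(labeled_sentences):
    """Collect every word labeled medical in the training sentences.

    Each sentence is a list of (word, is_medical) pairs.
    """
    return {normalize(word)
            for sentence in labeled_sentences
            for word, is_medical in sentence
            if is_medical}


def dictionary_tag(words, vocabulary):
    """Dictionary baseline: a word is medical exactly when it is in the vocabulary."""
    return [normalize(word) in vocabulary for word in words]


def context_features(words, i):
    """Illustrative window features: the words on either side of position i."""
    prev_word = normalize(words[i - 1]) if i > 0 else "<s>"
    next_word = normalize(words[i + 1]) if i + 1 < len(words) else "</s>"
    return {"prev=" + prev_word, "next=" + next_word}


# Hypothetical training data standing in for corroborated crowd labels.
training = [
    [("I", False), ("feel", False), ("Tired", True), ("and", False), ("dizzy", True)],
]
vocabulary = build_vocabulary(training)

symptom = "I'm feeling so tired , as though I am oxygen deprived .".split()
idiom = "I'm tired of feeling as though I am oxygen deprived .".split()

for words in (symptom, idiom):
    i = words.index("tired")
    print(dictionary_tag(words, vocabulary)[i], sorted(context_features(words, i)))
```

The dictionary lookup gives the same answer for “tired” in both sentences, since it sees only the word itself. The window features differ between the two occurrences, which is the kind of evidence a sequence model can weight and a dictionary cannot.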
Examples of default features include word substrings (e.g., “ology” from “biology”) and windows (previous and trailing words); the full list is detailed in Appendix A. We refer to our trained CRF model as ADEPT (Automatic Detection of Patient Terminology).

17 Note: this is actual output from our final classifier.
18 http://nlp.stanford.edu/software/CRF-NER.shtml

5.5.2 Design

To test our second hypothesis, we create a crowd-labeled data set comprising 10,000 MedHelp sentences (MH10K), and an RN-labeled data set comprising 1,000 CureTogether sentences (CT1K). Using the procedures described in § 5.4, this cost approximately $600 and $150, respectively. We train two models – a dictionary and a CRF – on the MedHelp data set (MH10K), and evaluate performance via 5-fold cross validation; we compare MetaMap, OBA and TerMINE’s output directly. Finally, we compare the performance of all 5 models against the CureTogether gold standard (CT1K).

5.5.3 Results

Table 5.3 shows the performance of MetaMap, OBA, TerMINE, the dictionary model and ADEPT on MH10K, MH1K and CT1K. ADEPT achieves the maximum score in every metric, bar recall. Moreover, its high performance carries over to the CureTogether test corpus, indicating adequate generalization from the training data. Figure 5.2 provides illustrative examples of the models’ performance on sample sentences from MH1K.

Figure 5.2: Sample sentences labeled by ADEPT, the dictionary, MetaMap, OBA and TerMINE.

Table 5.3: Annotator performance against the crowd-labeled data set and the gold standards. Maximum column values are indicated in bold.

Validation data set | Annotator | Parameters | F1 | Precision | Recall | Accuracy | MCC
Crowd-labeled MH10K | MetaMap | Default | 32.64 | 21.88 | 64.20 | 70.44 | 0.24
Crowd-labeled MH10K | MetaMap | SNOMED CT | 34.97 | 25.45 | 55.85 | 76.83 | 0.26
Crowd-labeled MH10K | MetaMap | CHV | 34.88 | 24.48 | 60.63 | 74.75 | 0.26
Crowd-labeled MH10K | OBA | Default | 43.77 | 30.20 | 79.53 | 77.21 | 0.39
Crowd-labeled MH10K | OBA | SNOMED CT | 43.23 | 36.15 | 53.76 | 84.25 | 0.35
Crowd-labeled MH10K | Dictionary | - | 46.18 | 32.34 | 80.75 | 79.02 | 0.42
Crowd-labeled MH10K | ADEPT | - | 78.41 | 82.66 | 74.59 | 95.42 | 0.76
MedHelp gold standard MH1K | MetaMap | SNOMED CT | 37.73 | 28.03 | 57.67 | 77.82 | 0.29
MedHelp gold standard MH1K | OBA | SNOMED CT | 45.78 | 32.10 | 79.31 | 78.04 | 0.41
MedHelp gold standard MH1K | TerMINE | - | 42.35 | 52.67 | 35.41 | 88.77 | 0.37
MedHelp gold standard MH1K | Dictionary | - | 37.30 | 26.34 | 63.89 | 74.98 | 0.29
MedHelp gold standard MH1K | ADEPT | - | 78.33 | 82.55 | 74.53 | 95.20 | 0.76
CureTogether gold standard CT1K | MetaMap | SNOMED CT | 39.12 | 29.33 | 58.57 | 74.13 | 0.27
CureTogether gold standard CT1K | OBA | SNOMED CT | 47.28 | 33.56 | 79.91 | 74.74 | 0.40
CureTogether gold standard CT1K | TerMINE | - | 43.09 | 53.11 | 36.25 | 86.43 | 0.37
CureTogether gold standard CT1K | Dictionary | - | 38.74 | 27.53 | 65.35 | 70.65 | 0.27
CureTogether gold standard CT1K | ADEPT | - | 77.74 | 78.82 | 76.69 | 93.78 | 0.74

Failure Analysis

While ADEPT’s results are promising, assessing points of failure is useful for future improvements and implementations. Figure 5.3 plots term classification accuracy against logged term frequency in both test corpora. We observe that while most terms are always correctly classified, a number of terms (∼650) are never classified correctly. Of these, almost all (>90%) appear only once in the test corpora. A LOWESS fit to the points representing terms that were misclassified at least once shows that classification accuracy increases with term frequency in the test corpora (and by logical extension, term frequency in the training corpus). As we might expect, over half (∼51%) of the misclassified terms occur with frequency one in the test corpora. A review of these terms reveals no obvious term type (or set of term types) likely to be incorrectly classified. Indeed, many are typical words with conceivable medical relevance (e.g., gout, aggravates, irritated). Such misclassifications would likely improve with more training data, which would allow ADEPT to learn new terms and patterns.

[Figure 5.3 axes: x = ln(frequency) of term in test corpora; y = classification accuracy (%); circle size indicates the number of terms (1, 10, 100, 500).]
Figure 5.3: Term classification accuracy plotted against logged term frequency in test corpora. Purple (darker) circles represent terms that are always classified correctly; blue (lighter) circles represent terms that are misclassified at least once. A LOWESS fit line to the entire data set (black) shows that most terms are always classified correctly. A LOWESS fit line to the misclassified points (blue/lighter) shows that classification accuracy increases with term frequency.

Table 5.4: Examples of ADEPT’s misclassifications in the test corpora.

Frequently Misclassified (FP > 1, FN > 1): baby, bc, condition, doctor, doctors, drs, health, ice, natural, relief, short, strain, weight
Mostly False Positive (FP > 1, FN ≤ 1): accident, decreased, drinks, drunk, exertion, external, healthy, heavy, higher, lie, lying, milk, million, pants, periods, prevention, solution, suicidal . . . [37 more terms]
Mostly False Negative (FP ≤ 1, FN > 1): appointment, clear, copd, hiccups, lack, ldn, massage, maxalt, missed, nurse, physician, pubic, rebound, silver, sleeping, smell, tea, treat, tree, tx . . . [41 more terms]
Infrequently Misclassified (FP ≤ 1, FN ≤ 1): cravings, generic, growing, hereditary, increasing, lab, limit, lunch, panel, pituitary, position, possibilities, precursor, taste, version, weakness . . . [118 more terms]

It remains to investigate terms that are both frequent and frequently misclassified. Table 5.4 shows terms from the test corpora that ADEPT misclassifies at least once. Immediately obvious is the presence of terms that are medical but generic, such as doctor, doctors, drs, physician, nurse, appointment, condition, and health.
These misclassifications likely stem from ambivalence in the training data; indeed, Yetisgen-Yildiz and Pratt [201] find that human annotators have low certainty over whether to include general terms such as these in medical term annotation tasks. In either case, specific instructions to human annotators on how to handle generic terms, or rule-based post-processing of annotations, could ameliorate such errors.

5.6 Example Applications of ADEPT to PAT

To illustrate ADEPT’s efficacy, we present two applications to PAT corpora. The first is to MedHelp’s Arthritis forum, with an eye to summarizing its important medical concepts. In this application, we compare ADEPT’s output with OBA’s. Our second application is to Forum77, MedHelp’s Addiction: Substance Abuse forum, in which our goal is to generate a high-level concept map of its medically relevant content.

5.6.1 Summarizing Important Medical Content in MedHelp’s Arthritis Forum

A simple way of summarizing the medical content in a PAT corpus is to rank all relevant terms by frequency, and select the top N. Figure 5.4 compares the top 50 medical terms in MedHelp’s Arthritis forum as determined by ADEPT and the OBA. (We picked OBA instead of MetaMap due to its superior performance – see Table 5.2). The terms recovered by ADEPT are both diverse and richly descriptive of arthritic conditions; in contrast, the majority of terms recovered by the OBA are spurious, and serve only to demote the rankings of the few relevant terms that it does find.

5.6.2 Navigating MedHelp’s Substance Abuse Forum (Forum77)

A natural way of acquiring a casual overview of a corpus’ content is to visualize both the important medical terms, as well as significant relationships between them. Including term relationships imparts an extra layer of insight to the underlying content.
For example, if drug terms tend to co-occur in sentences, then it is likely that users compare drugs in their discussions. On the other hand, if drug terms tend to co-occur with symptom terms, then discussions likely document which drugs treat specific symptoms. To acquire a high-level topography of Forum77’s medical content, we first apply ADEPT to the Forum77 corpus. Filtering out infrequent terms (terms that appear < 10 times in the corpus), we score connections between remaining co-occurring terms with the G² metric, which rewards significant (or interesting) co-occurrence relationships over common ones [78]. We then use Gephi19, a tool for graph analysis and visualization, to explore the results interactively. Note that what follows is a casual analysis in which we utilize Gephi’s internal filtering and clustering features to facilitate rapid exploration. Our goal is to illustrate a typical point of departure in exploring a novel corpus of ADEPT-extracted PAT terms.

19 http://gephi.github.io

[Figure 5.4 term lists, in rank order as extracted:
ADEPT: pain arthritis symptoms joints knees feet hands swelling neck knee fingers ankles legs tests joint rheumatologist diagnosed swollen meds disease surgery treatment leg shoulder spine doctor inflammation wrists test stiffness painful diagnosis arms toes fatigue shoulders joint pain wrist bone muscles arm osteoarthritis foot hip medication negative positive skin cold
OBA: have pain doctor arthritis like help time years symptoms right did work blood joint good does need months joints test knee day started ago try is a tests better left hope long year disease bad rheumatologist diagnosed here days hands old sure weeks knees doctors normal cause lot got make]
Figure 5.4: Top 50 terms, ranked by frequency, derived from MedHelp’s Arthritis forum as determined by ADEPT (left) and OBA (right). Terms unique to their respective portion of the list are shown in bold. Terms occurring in both lists are linked with a line. The gradient of these lines shows that all co-occurring terms, bar three, are more highly ranked by ADEPT.

Figure 5.5 shows a co-occurrence graph over ADEPT-extracted Forum77 terms, with node labels omitted to illustrate the underlying graph structure. Immediately obvious is the presence of two large, interlinked clusters (dark and light blue). A third cluster (dark green) is more independent. We examine each of these clusters in greater detail by filtering out non-member nodes, and recalculating the graph layout.

Figure 5.5: A graph showing important terms in Forum77 (nodes), and significant co-occurrence relationships between them (edges). Node size is proportional to degree, while colors indicate clusters. Node labels are omitted for legibility; instead, we examine main clusters in-depth in subsequent figures.

Figure 5.6 shows the largest (light blue) cluster with node labels. This cluster appears to detail general aspects of addiction related to detoxification: suboxone and methadone are synthetic opioids used in opioid-replacement therapy; detox and taper are direct detoxification references; many other nodes detail withdrawal symptoms (anxiety, cramps, body aches, muscle-tremors, muscles-restlessness, etc.). Overall, this cluster suggests that Forum77 hosts detailed discussions on the process and mechanisms of opiate withdrawal.

Figure 5.7 illustrates the second-largest (dark blue) cluster. This cluster is almost clique-like, and its core comprises primarily addictive prescription drugs: oxy (oxycodone), hydro (hydrocodone), xanax, vicodin, benzo (benzodiazepine) etc.
This cluster also details several withdrawal symptoms (tired, chills, flu, etc.) as well as body parts (head, legs, skin, etc.), suggesting a great deal of discussion around specific prescription opioids and their associated withdrawal symptoms.

Figure 5.6: The largest cluster in Figure 5.5 suggests that discussions frequently involve detoxification from prescription drugs.

Figure 5.7: The second-largest cluster in Figure 5.5 suggests that discussions frequently pair specific drugs and the withdrawal symptoms that they cause.

Finally, Figure 5.8 shows the third-largest cluster (dark green). Like Figure 5.7, the structure is clique-like. Its nodes constitute a combination of withdrawal symptoms (runny nose, general aches, leg cramps etc.), terms representing wellness activities or supplements (mild exercise, cycling, vitamin b6, zinc, l-tyrosine etc.), and non-opiate drugs (ativan, imodium, benzodiazepine). In hindsight, it is clear that this cluster represents medically relevant terms from Thomas’ Recipe: a user-developed schedule for medication-assisted opioid withdrawal that is popular on Forum77. We discuss Thomas’ Recipe in depth in § 6.8.1.

Figure 5.8: The third-largest cluster in Figure 5.5 contains medically relevant terms from Thomas’ Recipe: a user-developed schedule for medication-assisted opioid withdrawal.

These casual explorations of co-occurring ADEPT-extracted Forum77 terms suggest that withdrawal is a primary topic of discussion on the forum (Figures 5.6, 5.7). Moreover, users discuss specific drugs, primarily prescription drugs (Figure 5.7). Without prior knowledge of Thomas’ Recipe (§ 6.8.1), guessing that Figure 5.8 partially represented a detoxification protocol would be difficult, although the nodes opiate detox and at-home self-detox might have provided a clue.
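The filtering and scoring step described in § 5.6.2 can be sketched as follows. This is a hedged Python illustration, not the code used to produce Figure 5.5: it assumes that each sentence has already been reduced to the set of terms extracted from it, that term frequency is counted per sentence, and that G² is the standard log-likelihood ratio statistic over a 2x2 contingency table of sentence counts. The function names and the `min_count` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for a 2x2 contingency table.

    k11: sentences containing both terms; k12: first term only;
    k21: second term only; k22: neither term.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def score_cooccurrences(sentences, min_count=10):
    """Score every co-occurring pair of sufficiently frequent terms.

    sentences: iterable of collections of extracted terms, one per
    sentence (assumed input format). Returns {(term_a, term_b): G^2}.
    """
    sentences = [set(s) for s in sentences]
    n = len(sentences)
    term_freq = Counter(t for s in sentences for t in s)
    keep = {t for t, c in term_freq.items() if c >= min_count}
    pair_freq = Counter()
    for s in sentences:
        for a, b in combinations(sorted(s & keep), 2):
            pair_freq[(a, b)] += 1
    scores = {}
    for (a, b), k11 in pair_freq.items():
        k12 = term_freq[a] - k11
        k21 = term_freq[b] - k11
        k22 = n - k11 - k12 - k21
        scores[(a, b)] = g2(k11, k12, k21, k22)
    return scores
```

The resulting pair scores could then be written out as a weighted edge list for a graph tool. Note that G² measures departure from independence in either direction, so this sketch only scores pairs that co-occur at least once.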
Overall, our later work in this thesis shows that these explorations yield accurate, although incomplete, insights into Forum77’s primary content.

5.7 Conclusion

Our work on ADEPT was prompted by the observation that despite the abundance of PAT, tools for extracting medically relevant content from it are lacking. This, in turn, restricts general exploration and hypothesis generation over PAT corpora. One major limitation to building such tools is a lack of large, annotated corpora for training and testing statistical models. Our first result addresses this by showing that a crowd of non-experts is a sufficient replacement for medical experts in the PAT medical term identification task (§ 5.4). Through paying careful attention to existing crowdsourcing design principles, we were able to design a prompt and task that resulted in labels of comparable quality to those produced by experts (§ 5.4.1). Combined and aggregated according to a corroborative vote, Turker responses achieve an F1-Score of 84% against our RNs’ gold standard (§ 5.4.2). As crowds of non-experts are much easier to coordinate than medical experts, this opens up the option of building large, labeled PAT corpora of high quality both quickly and cheaply. We note, however, that not all tasks may be suitable for crowd labeling; those that are more subjective or require specialized knowledge may involve particularly challenging task design (§ 5.4.4).

Next, we addressed the issue of automating the PAT medically relevant term identification task (§ 5.5). ADEPT, our CRF classifier trained on crowd-labeled data, dramatically outperforms existing tools MetaMap, OBA and TerMINE (§ 5.5.3). Moreover, ADEPT’s performance carries over to an independently sourced PAT gold standard from CureTogether. While one limitation of ADEPT is that it does not identify specific term types (e.g., drugs, symptoms), it is excellent at finding terms of medical relevance.
This makes it a useful and novel tool for summarizing and exploring PAT corpora (§ 5.6.2).

We attribute ADEPT’s success to the suitability of sentence-level, context-sensitive learning models like CRFs to PAT medical term identification tasks. Our dictionary, trained on the same data as ADEPT, achieves high recall because it collects many medical terms from training data, but it achieves low precision because it cannot discriminate between relevant and irrelevant invocations of these terms. Unlike ADEPT, for example, the dictionary cannot learn that the word “sugar” is of particular medical relevance when it co-occurs with the word “diabetes”. The third sentence in Figure 5.2 suggests that context-based relevance detection may be problematic for MetaMap and OBA, too. In this sentence, the term “case” is annotated because of its membership in SNOMED CT as a medically relevant term pertaining either to a “situation” or a “unit of product usage”.

In concert, our contributions in this chapter constitute an alternative approach to medical term annotation and identification. In Chapter 7 we leverage the lessons learned in this chapter to extract a specific type of medical term from Forum77 discussions: users’ drugs of choice. First, however, in Chapter 6 we investigate users’ motivations for participating in Forum77.

Chapter 6

What do People Seek on Forum77?

Forum77 is the largest community on MedHelp, which indicates that it provides something that users need and find useful. But what do people seek through participation on Forum77? Insight into how and why users engage with Forum77 is instructive in its own right, but also provides a valuable template for planning future, targeted explorations of the corpus. Our goal in this chapter is to elucidate users’ motivations for initiating discussions on Forum77.
We first motivate our focus on the topic of addiction (§ 6.1) before covering related work (§ 6.2) and summarizing the data sets used in this chapter (§ 6.3). Next, we conduct a thematic analysis, developing a taxonomy of users’ reasons for participation (§ 6.5). In congruence with prior work, the two driving motivations are seeking emotional support and seeking informational support. Within these categories are sub-categories specific to the topic of substance abuse, such as seeking information on withdrawal and expressing concern about relapse. The most prevalent label, accounting for over 30% of all initiating posts, is the update: a status log devoid of requests for feedback.

Next, we discuss the training and evaluation of two binary statistical classifiers that can distinguish emotional from informational posts (§ 6.6), and update from non-update posts (§ 6.7). Our classifiers perform well, achieving F1-scores of 80.12% and 76.54% for emotional vs. informational and update vs. non-update, respectively.

Finally, we present the results of applying these classifiers to the entire Forum77 corpus (§ 6.8). We compare and contrast features such as thread longevity and response rates across thread categories. We also present and discuss Thomas’ Recipe: a highly prevalent informational support artifact on Forum77 that we came across in the course of our analyses. We conclude that Forum77 serves both as a user-generated and tested repository of medically-explicit knowledge on managing substance abuse withdrawal, as well as a public platform where people broadcast their progress as a mechanism for seeking emotional support and encouragement from others.
In this latter capacity, Forum77 is similar to the offline mutual help groups Alcoholics Anonymous (AA) and Narcotics Anonymous (NA); in its information providing capacity, however, Forum77 is quite distinct, as AA and NA explicitly eschew the sharing of medical information [133].

6.1 Why Study Addiction?

We focus on the topic of addiction for three primary reasons, which we expand on below. The first is that addiction is highly prevalent. As such, any insights or results that arise from studying addiction could be useful and impactful to a large number of people. Second, addiction is highly stigmatized. As a result, people suffering from addiction are likely to turn online for help, and addiction-related PAT is likely to contain information that is difficult to acquire through traditional medical channels. Finally, people are turning online en masse for help with addiction. Forum77 is MedHelp’s largest forum, but, as we show in Table 6.1, only one of several online forums dedicated to the topic of substance abuse recovery.

6.1.1 Addiction is Highly Prevalent

Drug and alcohol use disorders, in particular the escalating misuse of prescription drugs, present one of the most pressing public health issues of the day. Addiction affects 16% of Americans aged 12 or older (about 40 million people), far exceeding the number of people afflicted with heart disease (27 million), diabetes (26 million), or cancer (19 million) [4]. Deaths due to accidental drug overdose now exceed deaths due to motor vehicle accidents [251]. In 2008, more than 36,000 deaths were due to drug overdoses; of these, opioid pain reliever (OPR) overdoses accounted for more than heroin and cocaine combined [3, 249]. Taking into account workplace, criminal justice, and health care costs, the burden of prescription drug abuse on the U.S. economy was $56-$57 billion in 2006-2007 [27, 115].
6.1.2 Addiction is Highly Stigmatized

Recent medical research argues that drug dependence is a chronic, relapsing and remitting disorder that behaves just like other chronic illnesses with a behavioral component, such as Type II Diabetes Mellitus [169]. Despite this, prescription opioid abuse is a highly stigmatized condition: the opinion that opioid misuse is a flaw of a person’s moral character, rather than a legitimate medical condition, is common [187].

This stigma carries into the medical profession. In general, medical professionals feel that addiction lacks parity with other medical conditions in terms of prestige and importance [176]. In addition, there is a mutual mistrust between addiction patients, who feel that they are mistreated and stigmatized and receive poor medical care as a result, and their providers, who find it difficult to evaluate whether patients’ requests for opioids stem from genuine “medically indicated” needs or from addictive behavior [174]. The stigma is compounded by the fact that the most effective treatments for opioid use disorders are methadone or buprenorphine-assisted replacement therapies, which require patients to continue taking prescription opioids under the supervision of a medical professional [187]. Finally, as pain treatment is often the starting point of a longer addiction to prescription opioids, it is common for people with prescription drug use disorders to acquire their drug of choice via a doctor’s prescription [229, 249].

6.1.3 People are Turning Online for Help with Addiction

People with substance use disorders are no exception to the trend of online health forum participation. Myriad discussion forums focus on addiction recovery and are widely utilized. Table 6.1 describes a representative sample of these that we curated during a brief search.
The result of this is a massive, growing and (until now) unexamined corpus of text in which users document their experiences with addiction and their attempts at overcoming it.

Table 6.1: Summary statistics of a representative sample of online health communities focused on addiction recovery. We identified sites through Google searches and gathered statistics (if available) from site pages. Data current as of 3/1/2014.

Name | Description | Members | Posts | Threads | Join to post | Join to read
Forum77 (medhelp.org/forums/Addiction-SubstanceAbuse/show/77) | Single forum dedicated to recovery in general. | ∼51,153 | ∼740,046 | ∼80,529 | Y | N
The Suboxone Talk Zone (suboxforum.com) | Multiple forums focused on issues related to Suboxone. | ∼11,000 | ∼77,000 | ∼8,900 | Y | N
Addiction Recovery Guide (addictionrecoveryguide.org) | Collection of resources for assisting recovery; includes online forum. | N/A | 700,000 | N/A | Y | N
Addiction Survivors (addictionsurvivors.org) | Forums focus on opiate, alcohol, benzodiazepine, and stimulant addiction. | ∼15,870 | ∼270,000 | ∼17,500 | Y | N
Cyber Recovery (cyberrecovery.net) | Multiple forums dedicated to recovery in general. | 5,078 | 154,975 | 23,000 | Y | Y
Sober Recovery (soberrecovery.com/forums) | Multiple forums dedicated to alcoholism and drug abuse recovery. | 132,964 | >3.5 M | 234,311 | Y | N

6.2 Related Work

Emotional and informational support consistently emerge as the primary reasons for user engagement in online health communities [36, 47, 86, 122, 131, 148, 149, 162, 211, 243, 250, 258]. However, little work attempts to extend analyses of users’ support giving, seeking, or reasons for participation to data sets that are too large for manual annotation. We discuss this work here, referring the reader to § 2.2.3 for a thorough discussion of users’ reasons for participation in online health communities, and to § 3.5 for a summary of prior work on thematic analyses of PAT.

Wang et al. [250] successfully use workers on Amazon’s Mechanical Turk1 (Turkers) to quantify the amount of emotional and informational support contained in both initiating and response posts on Breastcancer.org2. They then use this data to train regression models that achieve correlation scores of 0.76 and 0.80 for emotional and informational content, respectively. Investigating whether certain types of support are important for member retention, they found that receiving high levels of emotional support predicted lower dropout risk.

Biyani et al. [28] manually labeled ∼1,000 sentences from the Cancer Survivor’s Network forum3 as either emotional or informational. An ensemble classifier trained on this data achieved an F1-score of 84% (88% for emotional support, 77% for informational support). Their goal was to determine whether influential and regular community members differed in terms of the types of support they provided on the forum. They found that influential members offer significantly more emotional support than regular community members.

1 http://www.mturk.com
2 http://breastcancer.org
3 http://csn.cancer.org

To our knowledge, no other prior work attempts to automatically classify informational and emotional support in PAT. However, some work does investigate methods for labeling or featurizing these data at scale. Vlahovic et al. [248] found that Turkers produced good labels for emotional and informational support on posts from a breast cancer support forum. Finally, both Owen et al. [188] and Alpers et al. [12] evaluate the efficacy of using LIWC4 to automatically identify emotions expressed in posts on breast cancer support forums. While both find the tool reasonably accurate, they do not attempt to analyze users’ motives for posting.

Unlike Wang et al. [250] and Biyani et al. [28], we investigate and discuss a more detailed taxonomy of users’ reasons for participation.
In addition to automatically classifying informational and emotional support, we are also able to train a classifier to identify a specific sub-category of emotional support posts: the update. While we leave the analysis of response post content to future work, we do investigate response levels to different categories of initiating posts.

4 http://www.liwc.net

6.3 Data

For clarity, we briefly summarize the data sets used in this chapter.

6.3.1 Thematic Analysis Development Dataset

We use our Forum77 data set (§ 4.1.2) for this work. For our thematic analysis (§ 6.5), we used ∼1,000 initiating posts sampled uniformly at random for each iteration of the analysis, and evaluated inter-annotator agreement on a 200-post subsample. With a total of 3 iterations, we used ∼3,000 initiating posts sampled uniformly at random to conduct the thematic analysis.

6.3.2 Labeled Training & Testing Dataset

We created a data set for labeling and classifier training as follows: first, we curated a sample of initiating posts from recurring Forum77 users by randomly sampling 200 users who had initiated 5 or more posts. (We restricted the sample to recurring users in order to ensure a more balanced representation of taxonomy labels, as we observed in our thematic analysis (§ 6.5) that certain labels (e.g., support giving) tend to appear only later in a user’s tenure.) Our 200 sampled users authored ∼32,000 initiating posts, of which we took a random sample of 1,000 for subsequent coding. To prevent any user from dominating the sample, we admitted no more than 30 posts per user.

6.4 Who Posts?

Traditional demographic information such as gender, age, race and socioeconomic status is rarely discernible from Forum77 posts. However, we were able to determine other aspects of identity, namely whether a user was posting on their own behalf or on behalf of someone else.
We noted that most users initiate posts in which they are the subject; occasionally, however, users initiate posts in which someone else is the subject. These proxies range from concerned parents, to members congratulating each other on clean time, to loved ones posting on behalf of an incapacitated member. We defined the subject of the post to be self if the author is writing about her own addiction, associate if the author is writing about someone else’s addiction, or n/a if this information is absent or indeterminate.

Two authors labeled our 1,000 initiating post training data sample with the subject label. Inter-annotator agreement was 92%, with a Cohen’s kappa of 0.77. The distribution of subject labels over the sample data set is: 85% self, 8% associate, and 7% n/a. While most users post on their own behalf, a significant minority post on behalf of another. Moreover, the number of posts in which the subject was indeterminate was higher than we expected. Such posts typically consist of social chatter (e.g., talking about sports). As these results do not suggest anything interesting or novel, we do not pursue this analysis at scale.

6.5 Users’ Objectives in Initiating Discussions

Thematic analyses are frequently used on PAT to identify structure and patterns in user behavior and user-generated content (§ 3.5). To develop a taxonomy describing users’ objectives in initiating discussions on Forum77, we use an adapted General Inductive Approach [236]: over the course of reading ∼3,000 posts, two authors iteratively co-developed a taxonomy describing recurrent and emergent themes in the posts. On each iteration, the authors used the taxonomy to independently label 1,000 randomly sampled posts. They then revised the rubric based on subsequent error analysis and inter-annotator agreement scores calculated on a 200-post subsample. The authors executed a total of three iteration cycles. Figure 6.1 illustrates our thematic analysis process.
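The agreement statistics reported in this chapter, raw percentage agreement and Cohen’s kappa, can be computed as in the following minimal Python sketch. It is illustrative only: the function name is hypothetical, the example labels are invented rather than taken from our annotations, and it assumes the standard two-annotator definition kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each annotator’s label distribution.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two annotators.

    labels_a, labels_b: equal-length sequences holding each
    annotator's label for the same items.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and aligned")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if p_e == 1.0:  # both annotators used one identical label throughout
        return p_o, 1.0
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Invented example labels using the subject categories from Section 6.4.
annotator_1 = ["self", "self", "associate", "n/a"]
annotator_2 = ["self", "self", "associate", "self"]
agreement, kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa is lower than raw agreement whenever one label dominates, which is why both figures are reported together for skewed label distributions such as the subject labels above.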
[Figure 6.1: Thematic analysis process. Orange edges indicate the iterative component of the analysis. Diagram labels: Sample (n=1,000); Label Set #1 (n=600); Label Set #2 (n=600); Error Analysis; Schema; Consult Addiction Specialist; Final Schema.]

Table 6.2 presents our final taxonomy, which was reviewed and approved by an addiction specialist, along with label prevalence in our labeled training data set. Table 6.3 presents sample text from posts in each category in the taxonomy.

6.6 Classifying Informational vs. Emotional Support

6.6.1 Training Dataset Annotation and Agreement

Having finalized our taxonomy, two annotators used it to each label 600 of our 1,000 initiating post training data sample (§6.3.2). We annotated each post with its primary purpose using the most specific label available. Inter-annotator agreement for specific purpose labels (Label in Table 6.2) was moderate, with agreement of 67% and Cohen’s kappa [50] of 0.62. Inter-annotator agreement on the three broader categories informational, emotional and neither (Category in Table 6.2), however, was high, with agreement of 87% and a Cohen’s kappa [50] of 0.78.

Table 6.2: Annotator-derived taxonomy for users’ objectives in initiating a post, with % prevalence in the 1,000 post labeled sample on the right. Note that (1) labels are mutually exclusive, and (2) “w/d” stands for “withdrawal”.

Categories (in row order): informational, emotional, neither

Label | Description | %
w/d expectations | Questions on what to expect when going through withdrawal, especially regarding symptom severity and duration. | 11.8
w/d management | Questions about how to manage withdrawal and relieve symptoms. | 8.7
w/d method | Soliciting advice on how best to quit drug(s) of choice. Topics include method of quitting (cold turkey vs. tapering) and scheduling a time to detox. | 7.8
general information | Subject posts medical questions unrelated to withdrawal. | 8.5
seek support | Specific requests for support (like keeping in thoughts, prayers, getting in touch). | 4.6
give support | Primary purpose of the post is to offer encouragement to others, often via relating a personal story of overcoming addiction. | 9.9
update | Posts that comprise a log-like report of the user’s current status. These are often highly detailed and contain no requests for feedback or support. | 35.5
general guidance | Subject posts non-medical questions to the community. These often comprise advice for personal relationships and scenarios requiring moral judgement. | 5.0
relapse concern | Subject is worried that she is going to relapse. While rare, these posts typically forecast relapse due to a required medical procedure that will require prescription pain medication. These posts varied in their information vs. support leanings, so we excluded them from either category. | 2.8
n/a | Impossible to speculate on the purpose of the post. | 5.4

6.6.2 Classifier Training

To identify posts as either primarily informational or primarily emotional, we built a logistic regression classifier (which outperformed Support Vector Machine and Naive Bayes classifiers) using the Stanford CoreNLP toolkit (http://nlp.stanford.edu/software/corenlp.shtml). For each post, we used the following features: the number of question sentences, content unigrams and bigrams, positive and negative word counts with polarity score ≥ 0.8 in SentiWordNet [19], and number of days clean, if stated. The last feature was determined by applying the patterns “X days/weeks/months clean” and “on day X” to the post text. A full feature list is documented in Appendix B.

Table 6.3: Descriptions and samples of taxonomy labels. Samples are synthesized in order to preserve user privacy.

Label | Description (+ Additional Notes) | Synthesized Sample
w/d expectations | What to expect while going through w/d. (Typically users will ask how long symptoms will last, whether the symptoms are normal etc.) | I stopped long term methadone 12 days ago. I was wondering if anyone knows how long the anxiety RLS and hot/cold last? The other symptoms rnt too bad...
w/d management | How to handle w/d symptoms. Implies w/d e. (Typically users will try to source ideas for alleviating pain, RLS etc.) | I’m wondering about the Amino Acid protocol and Thomas recipe. What would be the most important to take from day 1 to 4 during the worst W/D symptoms? I know I suffer the most with RLS and chemical chills [...]
w/d method | User seeks information on how to quit a substance. (Include questions like whether to go c/t vs. taper, requests for tapering schedules or advice etc.) | I am taking 5000mg of vicodin currently daily can anyone help me with this?
general information | User seeks informational advice that is not related to quitting/withdrawal. (Several possibilities, including questions about how much would it take to overdose etc.) | I’m curious as to how long people were addicted/dependent to their DOC. I know using for longer makes it harder to quit, and each time you quit WDs are harder than before. As for me, I had a 12 year run with vics/oxies.
seek support | User explicitly requests emotional support from the community. (Request for emotional support should be explicit. Typically users will ask for help or prayers or thoughts.) | For those of you who are prayer warriors, please could you pray for my friend, for recovery and protection. Could you also please pray for his family - they are in a very hard place right now. Thank you!
give support | User imparts a strong message of encouragement to the community. (Look for terms like “so I just wanted everyone to know that it’s possible and you can do it”) | Hey y’all! Well today the depression paid me visit but I kept it caged! Anxiety about 20% Did a 2.5 mile run and that helped tons. I can’t say it enough: exercise really helps withdrawals. If you can then DO IT! When the wds hit don’t crawl into bed - get up and move!
update | Update the community on the user’s status | The only reason I’m not getting more is the stress involved in getting them and setting up a supply because you can’t have just one. WD today are ok not too bad. It’s my neck that’s killing me and my body laughing at the Advil I took.
general guidance | Non-medical advice that doesn’t fall into any of the above categories. (Typical examples include questions of how to deal with telling spouses about addiction, whether to cut off a family member etc.) | Do any of you guys have experience with giving a husband an ultimatum? It seems simple: Get treated or you’re out. But with 3 young children it’s actually quite complicated. Help.
relapse concern | Often patients claiming to be clean but need a medical procedure that will require pain meds. | i had an accident yesterday that got me stuck in the emergency room. today i’m 21 days off my roxies [...]. i’m scared of going back because I know i’ll be given pain meds [...]
n/a | Impossible to determine | I’ve been away for few days and everything seems different. Anyway I hope everyone is doing great.

6.6.3 Classifier Performance

The final classifier performs well, achieving an accuracy of 80.98% in 10-fold cross validation versus a baseline of 59.7% in which every post is labeled with the majority class. Table 6.4 shows precision, recall, and F1 scores averaged over all 10 folds.

Table 6.4: Classifier performance for labeling initiating posts as seeking informational support or emotional support. Performance scores are averaged over 10 folds.

Label | Precision | Recall | F1
support | 84.57 | 83.40 | 83.84
information | 76.18 | 77.12 | 76.41
Average | 80.37 | 80.26 | 80.12

6.7 Classifying Updates vs. Non-updates

6.7.1 Classifier Training

To automatically label all posts with update or non-update labels, we again built a logistic regression classifier, using the same training and testing dataset from § 6.6.1. The non-update class contains all posts that are labeled neither update nor n/a. We added two features to those used in our informational vs. emotional classifier (§ 6.6): whether the post mentions a number of days (using the pattern: “day” or “days” followed by a number), and time elapsed (days) since the user’s last initiating post. Table 6.2 shows that update posts are outnumbered by non-update posts (35.5% vs. 59.1% of the labeled sample). To compensate for this class imbalance, during classifier training we randomly sub-sample such that non-update post quantity is at most 1.5x that of update posts. We do not change the test set.

6.7.2 Classifier Performance

Our classifier achieves an accuracy of 78.40% compared to the majority-class baseline accuracy of 62.55% in 10-fold cross validation. Table 6.5 shows precision, recall and F1 scores.

Table 6.5: Classifier performance labeling posts as either update or non-update. Performance scores are averaged over 10 folds.

Label | Precision | Recall | F1
update | 72.15 | 69.29 | 70.09
non-update | 82.36 | 84.16 | 82.99
Average | 77.25 | 76.72 | 76.54

6.8 Results

Users post primarily on their own behalf: In our sample, ∼85% of initiating posts were written by the author on her own behalf, while only ∼8% were written on behalf of someone else. This differs from reports by the Pew Research Center that find that ∼50% of online health inquiries are made on behalf of another [90, 91]. It is possible that the stigmatized nature of addiction prevents users from disclosing their situation to loved ones, who might otherwise ask questions on their behalf.
Another possibility is that the act of posting on Forum77 during the physically uncomfortable and painful process of withdrawal is cathartic in and of itself: a benefit unavailable to proxy participants.

Informational and emotional support are the driving motivations for initiating discussion: In congruence with prior work, our thematic analysis revealed that seeking informational and emotional support drives user participation on Forum77. Applying our classifier to the entirety of the Forum77 data set, we find that users seek both types of support in roughly equal proportion: 47% of all initiating posts seek primarily informational support, while 53% of all initiating posts seek primarily emotional support. This stands in contrast to our manually-annotated sample (Table 6.2) in which only 36.8% of initiating posts are informational. Given that our manually-annotated sample comprises recurring Forum77 users, one potential explanation for this is that longer-tenure or more involved users seek emotional support more than users who post only a couple of times on the forum.

Informational posts seek explicit medical advice about withdrawal: Users primarily seek knowledge on withdrawal methods, management and expectations in informational posts. Table 6.2 shows that in our sample, roughly three quarters of informational posts (28.3 of 36.8 percentage points, about 77%) specifically discuss the topic of withdrawal. A casual analysis of informational posts also reveals that the type of information requested by users is often explicitly medical in nature, such as the pharmacological management of withdrawal. A prevalent example of this is Thomas’ Recipe, an opioid withdrawal tapering schedule that has evolved on Forum77 over time (§ 6.8.1).

Informational threads receive fewer responses, but have a longer lifespan: Approximately 95% of both informational and emotional initiating posts receive a response.
Of these, initiating posts that primarily seek emotional support receive more responses than those seeking informational support (mean 8.7 vs. 7.4, median 6 vs. 5). The distributions are significantly different (Mann-Whitney-U test, n1 = 39,553, n2 = 38,954, U = 758,376,673, p < 0.001).

The “lifespan” of a discussion is the number of days between its initiating post and the last response on record. On average, initiating posts that seek primarily informational support have a lifespan more than 2.5 times as long as those that seek primarily emotional support (mean 74.4 days vs. 27.6 days, median 0 (< 24 hours) vs. 0). The difference is statistically significant (Mann-Whitney-U test, n1 = 37,112, n2 = 41,395, U = 817,010,310, p < 0.001). Most (56% of informational and 59% of emotional) discussions have a lifespan of 0 days (< 24 hours). Excluding these, informational discussions remain dominant in terms of lifespan (mean 170.3 days vs. 68.8 days, median 2 days vs. 1 day).

Update posts are the most prevalent type of emotional post: Our classifier identifies some 15,000 out of ∼55,000 (30%) initiating posts as updates. Update posts comprise a log-like status update of the user’s current condition, and rarely explicitly request any sort of response from the community. For example:

    I was used to taking 8-10 5/325 oxycodones a day. Havent taken any of them since Friday but I took one Oxy 40mg Sat and one on Sunday morning. Its been almost 24 hrs and not to bad so far but im sure there is more to come.

Despite the lack of specific requests, update posts do indeed trigger a community response, as we discuss in the next paragraph.

Update posts have more responses & more unique contributors, but shorter lifespans: To further assess the role that update posts play, we compared several features of threads that were initiated by update vs. non-update posts. Update threads have a shorter average lifespan than other threads (mean = 10.8 days vs. 30.0, sd = 88.8 vs.
151.1; t(435332) = -18.2, p < 0.001). It is possible that the personal nature of update posts makes such threads difficult to repurpose. Other differences are small: on average, threads initiated by update posts net slightly more responses (mean 7.19 vs. 6.65; t(27230) = 7.2, p < 0.001) and slightly more unique contributors (mean 4.91 vs. 4.35; t(27126) = 10.6, p < 0.001).

[Figure 6.2: Normalized transition probabilities and average transition times between consecutive update and non-update posts. Nodes: Update, Nonupdate. Edge labels as extracted: 55%, 9.7 days; 45%, 4.4 days; 71%, 22 days; 29%, 8.2 days.]

Time elapsed between consecutive update posts is short: Figure 6.2 shows users’ transition frequencies between initiating update and non-update posts, along with the average number of days between transitions. Users posting consecutive updates do so in comparatively quick succession, averaging 4.4 days between each update.

6.8.1 Thomas’ Recipe: An Informal Collaboration

During our analysis, we noticed that not only do users share explicit medical advice with one another: they test, evaluate, modify and re-share it. In other words, users informally collaborate on developing treatment protocols that are effective at assisting withdrawal. A prevalent example of this on Forum77 is Thomas’ Recipe.

Thomas’ Recipe (http://www.medhelp.org/tags/health_page/45/Addiction/Thomas-Recipe-Re-Posted?hp_id=16) is a detailed treatment protocol for medication-assisted opioid withdrawal management. It was written in the early 2000s by a Forum77 user who had years of experience detoxing from opioids, but no medical qualifications. (While our data set officially starts in 2007, it also contains some posts from as far back as 1999. We believe that this was either a pilot program or another forum that was acquired by MedHelp.) Over the years, the original Thomas’ Recipe has evolved. Table 6.6 shows a version of Thomas’ Recipe from circa 2000, while Table 6.7 shows a version from circa 2006. While the core content remains, the newer version has a great deal more structure and formalization. Details of the recipe have also changed. For example, the older recipe recommends a 4000mg dose of L-Tyrosine, while the newer recipe suggests beginning with a 2000mg dose and scaling up as necessary. An informal assessment of iterations of Thomas’ Recipe on Forum77 suggests that these changes are a result of user testing and feedback. Users’ comments, too, suggest that over time, they have modified Thomas’ Recipe to make it more generally applicable and effective:

    “I’m actually doing pretty good I’ve taken the Thomas recipe from day 1 but I’ve also added Vitamin D, and niacin.”

    “I have a modified Thomas Recipe that seems to have done wonders on my withdrawals if anyone is interested. (No Xanax or Valium etc) Added Potassium pills, Ensure protein drinks (since I cant eat anything solid yet).”

    “If it helps any, I did a modified Thomas’ Recipe. I didn’t use any pharmaceuticals and added some additional supplements (Magnesium, Potassium and Calcium for RLS and Melatonin for sleep).”

Thomas’ Recipe is wildly popular on Forum77. Approximately 1.72% of all posts in our data set mention it directly. Moreover, it is not constrained just to MedHelp: these days Thomas’ Recipe is hosted on a number of addiction recovery sites (http://www.drugs.com/forum/featured-conditions/thomas-recipe-opiate-withdrawal-35169.html; http://www-personal.umich.edu/~timaster/biopsych/home.html; http://opiatewithdrawaltips.com/thomas-recipe; https://www.drugs-forum.com/forum/showthread.php?t=12568), and a Google search for “Thomas’ Recipe” brings up sponsored advertisements for opiate withdrawal remedies. The recipe’s prevalence is likely testimony to the fact that it does genuinely assist the process of opiate withdrawal. Forum77 users swear by its efficacy, calling it a “life saver”, a “god send”, and something that “works wonders”. To evaluate the efficacy of Thomas’ Recipe, we showed it to a psychiatrist specializing in addiction. She noted that not only was the recipe very similar to a treatment she might have recommended professionally, but also that it contained novel elements that would facilitate the withdrawal process.

6.9 Discussion

Forum77 serves as a valuable, user-generated repository of medical information pertaining to the process of addiction recovery. Moreover, this information is not static: it is curated, tested and modified. As we saw in the example of Thomas’ Recipe (§ 6.8.1), users actively collaborate on developing effective treatment protocols. The continual evolution of informational artifacts on Forum77 is likely a contributing factor to the fact that informational discussions have significantly longer lifespans than emotional discussions. Another factor that we have observed lengthening the lifespan of informational discussions is that some users repurpose them, sometimes years after the initial post, to describe their own situation. In doing so, users may feel that they are not starting from scratch, that they have a ready made description of their condition, or that they are leveraging work that the previous initiator put into finding other Forum77 members who could address their specific issue.

While users do explicitly seek emotional support on Forum77, most emotional posts are not explicit requests, but rather, update posts. The prevalence of the update post suggests that users place value in having a community bear witness to their struggle with addiction. The fact that update posts garner slightly more responses on average than non-update posts shows, too, that responses are expected.
It is possible that users publicize update posts (rather than writing them, for example, in a private journal) as a self-enforcement mechanism to help them progress with cessation. Qualitative evidence shows that users feel a great deal of embarrassment and shame when a withdrawal attempt fails, and that failing may even delay their return to the community.

In addition to having a community of witnesses, users derive utility from the process of documentation itself. Authors find it valuable to reflect upon their past posts, which serve as reminders and evidence of both accomplishments and regressions. For example, one user reflects on something that she was scared to do:

    I just found some old post about no desire for sex. Whew! I was so scared to ask the question.

Another laments a relapse:

    I cna’t believe I’m at 25 days when I was in the hundereds before. I’m so angry at myself for relapsing and still keep beating myslef up!!

Readers, too, find others’ chronicles both informative and illustrative. This user mentions reading through hundreds of old posts to glean insight into what his withdrawal will be like:

    This is my first d/x and pray that it will be my last. I’ve read through tons of old posts and they definitely help.

Another poster used narratives on Forum77 to help her husband prepare for the process of her recovery:

    i have showed him this site and let him read some of your stories, so he knows its not all going to be plane sailing

6.9.1 Limitations and Future Work

The primary limitation to our work is our requirement that a post be labeled as either informational or emotional. In our experience, while only one of these labels tends to be dominant in an initiating post, Wang et al. [250] and Biyani et al. [28] do show that finer-grained labeling is possible at scale. Although picking the dominant label was sufficient for examining our analysis questions, a more nuanced analysis might benefit from more detail.
A natural avenue for future work is to analyze response posts in addition to initiating posts. While Wang et al. [250] utilize the same scales of emotional and information support in scoring both initiating and response posts, our informal analyses of Forum77 response posts suggest that response categories would require an entirely new descriptive taxonomy. (For example, a fairly common response tactic that we observed that does not manifest in Table 6.2 is the hijack: when a user attempts to shift the focal attention of active thread participants away from the initiator and onto herself, usually by claiming identical circumstances to the initiator. This tactic often kills the thread.) Having derived this taxonomy, however, one could start to ask questions such as, “What is the most effective way of getting informational support?”, or “What types of initiating threads draw a diverse crowd of respondents?”.

6.10 Summary

We set out in this chapter to answer the question: “What do users seek on Forum77?”. We first motivated our focus on the topic of addiction, noting that both its prevalence and stigma make it a potentially rewarding focus of study (§ 6.1). We then presented related work on identifying types of support seeking on online health forums (§ 6.2), and described the data samples used in this chapter (§ 6.3). Through conducting a thematic analysis over a sample of initiating posts, we found that, in congruence with prior work, users seek both informational and emotional support on Forum77. Moreover, we discovered that the most prevalent form of emotional support seeking was to issue update posts: essentially status logs containing no explicit request for a community response (§ 6.5). With some feature engineering, we were able to train two binary statistical classifiers to distinguish emotional from informational posts (§ 6.6), and update from non-update posts (§ 6.7), with F1 scores of 80.12% and 76.54%, respectively. Applying our classifiers to the entire Forum77 data set, we then analyzed differences between these post categories (§ 6.8). We found, for example, that informational posts have a longer lifespan than emotional posts, and that while update posts make no explicit request for feedback, they garner more responses on average than non-update posts. We also analyzed Thomas’ Recipe (§ 6.8.1), an informational artifact of Forum77 that provides users with instructions for medication-assisted detoxification from opioids.

In conclusion, Forum77 provides two main services to users: first, it serves as a repository of information on opioid abuse that is generated, tested, and modified for improved efficacy by community members. Second, it offers a space where the disclosure of personal progress (whether forward or backward) can be witnessed by others and recorded for posterity. In Chapter 7 we turn our attention to identifying which drugs Forum77 users abuse.

Table 6.6: Thomas’ Recipe (circa 2001)

THOMAS RECIPE

Here’s my tried-and-true do-it-yourself “cold turkey” detox protocol. Supplies you’ll need first: As many Valium, Xanax, Librium or Klonopin as you can get your hands on. - first day off the opiate, use enough Valium or whatever, to, if possible, sleep through most of the first couple days. Then start decreasing the dose until you’re down to nothing in about 5 or 6 days. You’ll have to do the math. The Valium or one of its sister drugs will help tremendously with the anxiety and, somewhat, with the body aches. Valium may make you eat like a pig and, when withdrawing from narcotics, one usually craves sweets, so I’d be ready to indulge myself, along with some good escapist movies. That always worked for me.
Around-the-clock access to either hot baths or a Jacuzzi. –speaking of those goddamn mostly thigh cramps that seem to love to show up in the middle of the night, have that hot bath or Jacuzzi at the ready. Don’t hesitate to spend the majority of the week in that hot water if that’s what it takes to get you through it. You may be wrinkled, but you’ll have your sanity. Don’t underestimate what the hot baths can do to relieve the withdrawal discomfort. They really work. Heating pads between the thighs can help with those cramps, too, but not as much as the hot baths. Brand-name-only Imodium (over the counter at the supermarket) – if you’re a normal hydro addict, you’ll be getting the runs by no later than the second or third day off the lorcet. In my experience, it’s an especially unpleasant variety. At the first impulse, take two or three and respond to returning urges with two tabs. It’s important that you do it immediately. L-Tyrosine (qty 50 of the 500mg caps) - an amino acid available at the health food store. – chronic use of narcotics depletes the brain of several critical neurotransmitters responsible for well-being and mental performance and attitude. Plus: Bottle of 100 mg B6 caps My experience detoxing with this stuff says take 4000 (four thousand) mg. (8x500mg caps of L-Tyrosine) with two 100mg B6 caps every day for your ”detox week” to provide your brain with the raw material it needs to replenish its stores of these neurotransmitters. Many feel the difference on the very first dose. ***Take it on an empty stomach, either first thing in the morning or at bedtime. You can continue this regimen after the first week if it continues to make you feel good. I continue to use it every other day with very few exceptions. After a few weeks, I cut down on the dosage, though, as it can cause the runs at high doses. Multi-vitamins (most junkies don’t eat too well, so this one’s just for good sense). Take a look at this link. 
According to this doc, you also need to add copper, phosphorus and Vitamin C to replenish the dopamine, and the norepinephrine. You might have to do some hunting at the health food store to find the right vitamin or vitamins to supply all this stuff. I got a pretty good result from just the L-Tyrosine and B6, however. I also understand from another contributor that zinc and magnesium help replenish and restore vital substances depleted by narcotics use. WARNING: This same site says to avoid L-Tyrosine if you’re on an SSRI (serotonin reuptake inhibitor) such as Prozac, etc. Good luck. Thomas

Sourced on 9-02-2014 from: http://www.medhelp.org/posts/Addiction-Substance-Abuse/How-Long-Untill-You-Are-Normal/show/43582

Table 6.7: Thomas’ Recipe (circa 2006)

THOMAS RECIPE

PLEASE NOTE: I am not a doctor, simply a long-time Rx opiate junkie who has had many opportunities to develop a way to detox. This is a recipe for at-home self-detox from opiates based on my experience as well as that of many other addicts. It is not intended as professional medical advice. It is always wise to make sure none of the recipe ingredients or procedures conflict with medications you may be taking. Likewise, if you have any medical condition, disease, allergy or any other health issue, consult your doctor before using the recipe. Thanks, Thomas

If you can’t take time off to detox, I recommend you follow a taper regimen using your drug of choice or suitable alternate – the slower the taper, the better.

For the Recipe, You’ll need: 1. Valium (or another benzodiazepine such as Klonopin, Librium, Ativan or Xanax). Of these, Valium and Klonopin are best suited for tapering since they come in tablet form. Librium is also an excellent detox benzo, but comes in capsules, making it hard to taper the dose. Ativan or Xanax should only be used if you can’t get one of the others. 2. Imodium (over the counter, any drug or grocery store). 3.
L-Tyrosine (500 mg caps) from the health food store. 4. Strong wide-spectrum mineral supplement with at least 100% RDA of Zinc, Phosphorus, Copper, Magnesium and Potassium (you may not find the potassium in the same supplement). 5. Vitamin B6 caps. 6. Access to hot baths or a Jacuzzi (or hot showers if that’s all that’s available). How to use the recipe: • Start the vitamin/mineral supplement right away (or the first day you can keep it down), preferably with food. Potassium early in the detox is important to help relieve RLS (Restless Leg Syndrome). Bananas are a good source of potassium if you can’t find a supplement for it. • Begin your detox with regular doses of Valium (or alternate benzo). Start with a dose high enough to produce sleep. Before you use any benzo, make sure you’re aware of how often it can be safely taken. Different benzos have different dosing schedules. Taper your Valium dosage down after each day. The goal is to get through day 4, after which the worst WD symptoms will subside. You shouldn’t need the Valium after day 4 or 5. • During detox, hit the hot bath or Jacuzzi as often as you need to for muscle aches. Don’t underestimate the effectiveness of hot soaks. Spend the entire time, if necessary, in a hot bath. This simple method will alleviate what is for many the worst opiate WD symptom. • Use the Imodium aggressively to stop the runs. Take as much as you need, as often as you need it. Don’t take it, however, if you don’t need it. • At the end of the fourth day, you should be waking up from the Valium and experiencing the beginnings of the opiate WD malaise. Upon rising (empty stomach), take the L-Tyrosine. Try 2000mgs, and scale up or down, depending on how you feel. You can take up to 4,000mgs. Take the L-Tyrosine with B6 to help absorption. Wait about one hour before eating breakfast. The L-Tyrosine will give you a surge of physical and mental energy that will help counteract the malaise. 
You may continue to take it each morning for as long as it helps. If you find it gives you the “coffee jitters,” consider lowering the dosage or discontinuing it altogether. Occasionally, L-Tyrosine can cause the runs. Unlike the runs from opiate WD, however, this effect of L-Tyrosine is mild and normally does not return after the first hour. Lowering the dosage may help. • Continue to take the vitamin/mineral supplement with breakfast. • As soon as you can force yourself to, get some mild exercise such as walking, cycling, swimming, etc. This will be hard at first, but will make you feel considerably better. - Thomas

Sourced on 9-02-2014 from: http://www.drugs.com/forum/featured-conditions/thomas-recipe-opiate-withdrawal-35169.html

Chapter 7

Identifying Drugs of Choice

Monitoring drug use at a population level is crucial for observing, managing, and responding to substance abuse-related issues, such as the emergence of new “designer drugs”, or the existence of particularly vulnerable populations. Drug use trends could also be useful for exploring more theoretical aspects of addiction, such as the Gateway Hypothesis [139], which proposes that drug use follows a progressive and hierarchical sequence in which the user begins with legal addictive substances (e.g. alcohol and cigarettes), before progressing onto marijuana and, finally, illicit substances. The stigmatized [174,176,187] and often illegal nature of substance abuse, however, can make such data collection difficult.

Existing substance abuse surveillance efforts are restricted to convenient populations: schools (Monitoring the Future, http://www.monitoringthefuture.org), hospital emergency room visits (Drug Abuse Warning Network, http://www.samhsa.gov/data/dawn.aspx), state run treatment facilities (Treatment Episodes Dataset, http://wwwdasis.samhsa.gov/webt/information.htm), and in-person mutual help groups (Narcotics Anonymous, http://www.na.org/?ID=PR-index). However, as membership in each of these populations can be compelled, these surveys, while large-scale and thorough, fail to capture a more representative sample of drug users.

Despite the fact that millions of people voluntarily participate in online health communities for substance use disorders, almost no prior work attempts to derive drug usage data from PAT. Our goal in this chapter is to profile substance use in the Forum77 population, and to compare this against traditionally surveyed drug-using populations.

We begin by developing a method for automatically identifying Forum77 users’ drugs of choice (DOCs) from their initiating posts (§ 7.3). As this task is context-sensitive, we build on lessons learned in Chapter 5 and train a conditional random field (CRF) classifier that identifies DOCs with F1, Precision and Recall scores of 84.65%, 91.12% and 79.46%, respectively. Next, we manually develop a map for resolving identical entities (e.g. Vicodin and Hydrocodone) extracted by our classifier, and mapping these to classes. Applying our classifier to the entire Forum77 data set, we develop a profile of substance use in the Forum77 population. We contrast this with survey data on the face-to-face peer recovery group Narcotics Anonymous (NA), as well as survey data on individuals who present to addiction treatment centers (TEDs) and emergency rooms (DAWN) (§ 7.4). After normalizing each data set for comparison (§ 7.4), we present both comparative results as well as substance use trends on Forum77 over time (§ 7.5). Compared to other measured drug-using populations, prescription opioid use is highly prevalent in Forum77, while use of more traditionally-abused substances (e.g. alcohol, marijuana and cocaine) is notably scarce. Over time, opioid replacement therapy drugs have become increasingly prevalent on Forum77, while use of other prescription opioids has declined.
We discuss possible explanations for and implications of these results (§ 7.6) before concluding (§ 7.7).

7.1 Related Work

Two branches of prior work apply to this chapter. The primary one is syndromic surveillance, which is concerned with the use of health-related data for detecting, analyzing and monitoring potential disease outbreaks [128]. We discuss syndromic surveillance in depth in § 3.2. The second is work related specifically to observing substance abuse trends in online data, which we discuss below.

Surprisingly little work attempts to survey substance use via online data, although the potential for doing so has been recognized [44, 113]. In August 2014, the National Institute on Drug Abuse (NIDA)⁵ announced the funding of a 5-year initiative to build a substance abuse surveillance system using web data [113]. A related system, the “Psychonaut Web Mapping Project”, already exists in Europe, and has demonstrated an ability to give timely and accurate information related to the outbreak of novel drugs [73]. The project aggregates data scraped from myriad sites, including discussion forums, online stores, and Google search queries, the latter of which have also been shown to correlate with demand for specific substances [65]. This is unsurprising, given that the Internet plays host to a highly competitive market for illicit substances [54, 244]. Dasgupta et al. [66] were even able to show that black market prices for prescription opioids can be accurately assessed via crowdsourcing. Although sparse, this prior work supports the supposition that PAT is a promising data source for extracting substance use data.

⁵ http://www.drugabuse.gov

7.2 Datasets

Users typically offer information about the substance(s) they are using in initiating posts, in which they set the tone and topic of discussion, and disclose the issue for which they are seeking help.
As respondents may or may not offer similar information about themselves, we restrict our analysis to Forum77’s initiating posts, of which there are 78,507 authored by a total of 28,005 unique users.

Training & Testing Dataset: Our classifiers require labeled data for training. As we felt that our familiarity with the data set would expedite labeling and reduce errors, we use 500 posts from the 1,000 initiating-post sample described in § 6.3.2. For completeness, we re-specify our sampling methodology from § 6.3.2 here: first, we curated a sample of initiating posts from recurring Forum77 users by randomly sampling 200 users who had initiated 5 or more posts. Our 200 sampled users authored ∼32,000 initiating posts, of which we took a random sample of 1,000 for subsequent coding. To prevent any user from dominating the sample, we admitted no more than 30 posts per user.

Analysis Dataset: We conduct our final analysis on all of Forum77’s initiating posts (78,507 posts authored by some 28,005 unique users).

7.3 Automatically Identifying Drugs of Choice

In this section, we describe how we automatically identify DOCs from Forum77 initiating posts. After defining the term drug of choice, we manually annotate our training & testing data set. Next, we train a CRF classifier to automatically identify drugs of choice in Forum77 initiating posts. Finally, we resolve the extracted DOC entities to specific categories to facilitate analysis and comparison.

7.3.1 Definition of Drug of Choice

In the context of Forum77 data, we define a drug of choice (DOC) as any substance that the user indicates that she is, or was, addicted to. Such indications can be direct (e.g. “I am addicted to percs/patches”) or implied (e.g. “I need to get off 32mgs subox”). We also include as DOCs phrases that unequivocally imply a misused substance (e.g.
“chasing the dragon” implies opium, “blazing” implies marijuana), although we found such occurrences to be rare.

Identifying DOCs in Forum77 text is a context-sensitive task: whether a substance plays the role of treatment or addiction depends on the user. Methadone and buprenorphine, opioids used in opioid replacement therapy, are common examples. Valium, which is both an addictive benzodiazepine and an ingredient in Thomas’ Recipe for aiding opioid withdrawal (§ 6.8.1), is another.

7.3.2 Data Annotation

Using the definition above, two authors each labeled DOCs in 300 of the 500 posts in our sample. Interrater agreement calculated on the 100 overlapping posts was high, with a Cohen’s kappa [50] of 0.84. Of the total sample, 276 posts (∼55%) contained DOC mentions.

7.3.3 Classifier Training & Evaluation

As discussed in § 5.5.1, conditional random field (CRF) models are particularly well suited to identifying specific entities in text [151]. CRFs are also context-sensitive. For example, a CRF could leverage other words in a sentence to determine whether a term like methadone refers to a substance being abused or a substance being used as a treatment. This, in addition to the fact that prior work has successfully utilized CRF models to identify a variety of medical terms [159, 222], makes a CRF an appropriate choice for the challenge of identifying DOCs in text.

We trained a CRF to automatically identify DOCs mentioned in initiating posts on our labeled training and testing data set. For training, we exclude annotations of general drug terms such as pills, meds and drugs. As we observed in our work on ADEPT in Chapter 5, generic terms are uninformative as well as a significant source of classifier error [201]. For full documentation of classifier features, see Appendix C.

Results: Our CRF performs well at identifying DOCs from initiating posts.
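To make the context sensitivity discussed in § 7.3.3 concrete, the following is a minimal sketch of how per-token features with a context window might be prepared for a sequence tagger such as a CRF. The feature names, the window size, the `B-DOC`/`O` label scheme and the example labels are assumptions made for illustration; they are not the features documented in Appendix C.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Build a feature dictionary for tokens[i], including neighbouring words.

    The neighbouring words are what would let a sequence model distinguish
    "prescribed suboxone" (treatment) from "quitting from vicodin" (DOC).
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        "suffix3": word.lower()[-3:],
        "is_title": word.istitle(),
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Pad positions that fall outside the sentence.
        neighbour = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
        feats[f"word[{offset:+d}]"] = neighbour
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical training example: only the abused substance is tagged as a DOC.
tokens = "My doc prescribed suboxone to help quitting from vicodin".split()
labels = ["O", "O", "O", "O", "O", "O", "O", "O", "B-DOC"]
feats = sentence_features(tokens)
```

In this sketch, the features for "suboxone" include the preceding word "prescribed", while those for "vicodin" include "quitting" and "from"; a trained sequence model could in principle weight such cues differently for treatment and addiction contexts.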
On 10-fold cross validation it achieves an F1-score of 84.65%, and Precision and Recall scores of 91.12% and 79.46%, respectively. Table 7.1 shows a breakdown of performance across different types of terms. The CRF performs best on drug terms that are both specific and correctly spelled (e.g. marijuana, oxycodone) and informal/morphological variations thereof (e.g. pot, oxides), and performs worst on generic drug terms (e.g. stuff, pain pills). Table 7.2 illustrates the results of applying our DOC classifier to sample sentences, and resolving these to drug categories as per § 7.3.4.

Table 7.1: DOC classifier performance across term categories. The classifier performs best on correctly spelled, specific drug terms; worst on general drug terms.

| Category | Examples | F1 score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| All terms | | 84.7 | 91.1 | 79.5 |
| Specific drug terms, spelled correctly (53.1% of all terms) | marijuana, ultram, phenobarbital, hydrocodone | 87.0 | 90.3 | 83.9 |
| Informal & morphological variations of drug terms (34.5% of all terms) | roxies, oxyz, subs, pot, vics, blues, hydros, smokes | 84.6 | 93.4 | 77.2 |
| General drug terms (12.8% of all terms) | pain pills, painkillers, stuff, substances, powder | 79.7 | 94.0 | 69.2 |

Table 7.2: Examples of DOCs extracted by our CRF classifier. Identified SOA terms are shown in bold in the context of their originating sentence, and the resolved drug name, generic name and category are shown on the right.

| Sentence | Resolved Drug | Resolved Generic | Resolved Category |
|---|---|---|---|
| My doc prescribed suboxone on Sunday to help quitting from vicodin. | Vicodin | hydrocodone | opioid |
| I need help. I am on vic for the last 20 years. | Vicodin | hydrocodone | opioid |
| She began with meth months ago and now is using coke. | cocaine; methamphetamine | cocaine; methamphetamine | cocaine; stimulant |
| As for myself, it was a 7 year run with percs/patches. | Percocet | oxycodone | opioid |
Note the model’s sensitivity to context: in the first sentence, suboxone is not extracted because it is being used as a treatment for the author’s addiction to Vicodin.

7.3.4 Drug Term Resolution

The DOC terms extracted by our classifier vary widely in terms of spelling (we saw 58 variations on Vicodin alone) and specificity (users refer to drugs with brand, generic and even class names). For example, somebody might refer to Suboxone as buprenorphine, or even just as an opiate. Resolving related drug terms to common entities is necessary for analysis and comparison.

To resolve drug names, we compiled a list mapping misspellings in our data set to a single drug name (either brand or generic). We then mapped all brand names to their respective generic names, and finally, categorized each substance into a general class (Table C.1). We ultimately resolved ∼1,200 terms to 90 entities in 10 drug classes (see Appendix C).

Table 7.3: Summary of similarities and differences between our Forum77, NA, TEDS and DAWN datasets. Forum77 is unique in that participation is always voluntary and that users report only substances that they deem relevant.

| | Forum77 | NA | TEDS | DAWN |
|---|---|---|---|---|
| Population size | 19,634 | 8,837 | 1,844,720 | 131,698 |
| Time in which data were generated | 2007-2011 | 2011 | 2011 | 2011 |
| Data self-reported? | Yes | Yes | Yes | Yes |
| Duplicate users in dataset possible? | Yes | Yes | Yes | Yes |
| Survey population membership voluntary? | Yes | Not always | Not always | Not always |
| Users can report multiple substances | Yes | Yes | Yes | Yes |
| Substances reported | Only those which user perceives as relevant | All | All | All |

7.4 Comparing Real-World DOC Distributions

We compare our results to survey data on the face-to-face peer recovery group Narcotics Anonymous (NA), as well as survey data on individuals who present to addiction treatment centers (TEDS) and emergency rooms (DAWN).
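The three-step resolution described in § 7.3.4 (variant spelling to drug name, brand name to generic name, generic name to class) can be sketched as a chain of dictionary lookups. The dictionaries below are small hypothetical excerpts written for illustration only; the actual map covers ∼1,200 terms and is documented in Appendix C, and the function name `resolve` is likewise an assumption.

```python
from typing import Optional, Tuple

# Step 1 (hypothetical excerpt): spelling and slang variants -> one drug name.
VARIANT_TO_DRUG = {
    "vic": "Vicodin", "vics": "Vicodin", "vicodin": "Vicodin",
    "percs": "Percocet", "percocet": "Percocet",
    "meth": "methamphetamine", "coke": "cocaine",
}

# Step 2 (hypothetical excerpt): brand name -> generic name.
BRAND_TO_GENERIC = {"Vicodin": "hydrocodone", "Percocet": "oxycodone"}

# Step 3 (hypothetical excerpt): generic name -> drug class.
GENERIC_TO_CLASS = {
    "hydrocodone": "opioid", "oxycodone": "opioid",
    "methamphetamine": "stimulant", "cocaine": "cocaine",
}


def resolve(term: str) -> Optional[Tuple[str, str, str]]:
    """Resolve an extracted term to (drug, generic, class), or None if unknown."""
    drug = VARIANT_TO_DRUG.get(term.strip().lower())
    if drug is None:
        return None
    # Terms that are already generic names map to themselves.
    generic = BRAND_TO_GENERIC.get(drug, drug)
    drug_class = GENERIC_TO_CLASS.get(generic)
    if drug_class is None:
        return None
    return drug, generic, drug_class


resolved_vics = resolve("Vics")
resolved_coke = resolve("coke")
resolved_unknown = resolve("stuff")
```

Keeping the three maps separate mirrors the order in which they were compiled and allows a new misspelling to be added without touching the brand or class tables.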
We use the 2011 (most recently available) reports for each of these surveys, and compare results to the Forum77 data set spanning 2007-2011. We include multiple years of Forum77 data as we find that the DOC distributions in the Forum77 population vary only slightly over time. Below, we describe how we process each data set, and summarize key similarities and differences between them (Table 7.3). The final alignment of categories for cross-survey comparison is described in Table 7.4.

Table 7.4: Alignment of categories across the Forum77, NA, TEDS and DAWN datasets for comparative purposes. Exact category terms from each survey have been preserved in this table for replicability.

| Forum77 | NA | TEDS | DAWN |
|---|---|---|---|
| Alcohol | Alcohol | Alcohol | Alcohol |
| Cocaine | Cocaine, Crack | Cocaine/Crack | Cocaine |
| Hallucinogens | Hallucinogens (LSD, PCP) | PCP; Other Hallucinogens | LSD; PCP; Misc. hallucinogens |
| Heroin | Opiates (heroin, morphine) | Heroin | Heroin |
| Inhalants | Inhalants (glue, Nitrous Oxide) | Inhalants | Inhalants |
| Marijuana | Cannabis (pot, hashish) | Marijuana/Hashish | Marijuana; Synthetic cannabinoids |
| Methadone and Suboxone | Methadone/Buprenorphine | Methadone (non-RX) | Methadone/Buprenorphine |
| Opioids | Opioids (Oxycodone, Vicodin, Fentanyl) | Opiates/Synthetics | Opiates/Opioids |
| Stimulants | Ecstasy; Stimulants (speed, crystal meth) | Methamphetamine; Other Amphetamines; Other Stimulants | Amphetamines; Amphetamine-dextroamphetamine; GHB; MDMA; Methamphetamine; Methyphenidate |
| Sedatives | Tranquilizers (Klonopin, Valium, Xanax) | Barbituates; Benzodiazepines; Non-Barbituate sedatives; Other non-benzodiazepine tranquilizers | Barbiturates; Benzodiazepines; Ketamine; Misc. anxiolytics, sedatives and hypnotics |

7.4.1 Forum77

Our classifier identifies DOCs for 19,634 (70%) of the 28,005 users who initiated discussions on Forum77, corresponding to ∼50% of the 78,507 initiating posts analyzed.
This corroborates our observation that ∼55% of the posts in our 500-post training and testing sample contained DOC mentions. To acquire a distribution of DOCs in the Forum77 population, we count, for each drug category (see Table 7.4), the number of unique users who abused a drug in that category. We then normalize the counts by the DOC-identifiable population size.

7.4.2 Narcotics Anonymous

Narcotics Anonymous (NA) conducts an annual membership survey in which respondents identify both main drugs used as well as any other drugs used on a regular basis [2]. Responses are identified using a checklist of drug categories (Table 7.4). As the results are published only in aggregate form, we acquired the raw data from NA for the online component of the survey for analysis. Omitting entries with a 0 second response time and entries in which the user declined to answer the drug-related questions, there were 8,837 respondents.

Categorizing heroin in the NA survey data: While both DAWN and TEDS have a separate category for heroin, NA groups heroin into the category “Opiates (heroin, morphine etc.)”. To align the NA data set with DAWN and TEDS, we classify “Opiates (heroin, morphine, etc.)” with “Heroin”, based on the assumption that most users in this category are using heroin rather than morphine or other opiates.

7.4.3 TEDS

The Treatment Episode Dataset is an annual survey detailing people’s self-reported drug use upon admission to state and national rehabilitation facilities [241]. There is no need to process this data set further, and we report results directly from the TEDS 2011 survey (1,844,720 respondents).

7.4.4 DAWN

The Drug Abuse Warning Network (DAWN) is a nationally representative public health surveillance system that monitors drug-related emergency department visits to hospitals. The survey records up to 22 drugs related to an emergency room visit [231].
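The prevalence computation used for each data set in § 7.4 (count the unique users per category, then normalize by the size of the population with at least one identifiable substance) can be sketched as below. The optional truncation and exclusion arguments mirror the DAWN-specific steps described in this subsection. This is a minimal sketch under stated assumptions: the function name, argument names and toy records are hypothetical, and the choice to drop users with no remaining substances from the denominator follows the Forum77 normalization and may not match how each survey defines its population.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def category_prevalence(
    user_substances: Dict[str, List[str]],
    category_of: Dict[str, str],
    max_substances: Optional[int] = None,
    exclude: Iterable[str] = (),
) -> Dict[str, float]:
    """Fraction of the identifiable population using each drug category."""
    excluded = set(exclude)
    counts: Counter = Counter()
    population = 0
    for substances in user_substances.values():
        # Optionally keep only the first k substances mentioned (DAWN: k = 3).
        if max_substances is not None:
            substances = substances[:max_substances]
        # A set ensures each user is counted at most once per category.
        categories = {
            category_of[s]
            for s in substances
            if s not in excluded and s in category_of
        }
        if not categories:
            continue  # no identifiable substance: excluded from the denominator
        population += 1
        counts.update(categories)
    if population == 0:
        return {}
    return {category: n / population for category, n in counts.items()}


# Hypothetical toy records, for illustration only.
records = {
    "u1": ["hydrocodone", "alcohol"],
    "u2": ["oxycodone", "hydrocodone"],
    "u3": ["insulin"],
    "u4": ["heroin", "insulin", "alcohol", "oxycodone"],
}
categories = {
    "hydrocodone": "Opioids", "oxycodone": "Opioids",
    "alcohol": "Alcohol", "heroin": "Heroin", "insulin": "Other",
}
prevalence = category_prevalence(records, categories, max_substances=3, exclude={"insulin"})
```

Because a user can report several substances, the resulting fractions need not sum to one, which is also why the bars in a per-category prevalence chart can total more than 100%.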
We considered only DAWN data set instances corresponding to drug misuse (131,698 instances). As 95.5% of the users in this population mention at most three drugs, we consider only the first three substances mentioned. From these, we filter out substances that are common but not typically abused, such as insulin. Finally, we map the remaining drugs to categories using the DAWN Drug Reference Vocabulary⁶.

⁶ Available at http://www.samhsa.gov/data/dawn.aspx

[Figure 7.1: four panels (Forum77; TEDS, 2011; NA, 2011; DAWN, 2011), each showing the percentage of the population using (axis from 0% to 75%) for the categories Opioids, Suboxone, Sedatives, Alcohol, Cocaine, Heroin, Marijuana, Stimulants, Hallucinogens and Inhalants.]
Figure 7.1: Drug of choice distributions (% of population using) across the Forum77, TEDS, NA and DAWN data sets.

7.5 Results

Forum77 users struggle with opioid addiction at much higher rates than other surveyed populations of drug users: Figure 7.1 shows substance usage distributions across the Forum77, TEDS, NA and DAWN surveys. Prescription opioids, utilized by ∼70% of the Forum77 population, are by far the most prevalent DOC, followed by the opioid replacement therapy opioids Methadone and Suboxone (25%). This is more than double the population prevalence reported in any of the other three surveys.

Relatively few Forum77 users mention struggling with traditionally abused drugs: Alcohol, marijuana and cocaine are the three most prevalent DOCs in the NA, TEDS and DAWN populations (Figure 7.1). However, these three substances are conspicuously scarce in the Forum77 population. For example, alcohol is reportedly abused by approximately 80%, 55% and 37% of the NA, TEDS and DAWN populations, respectively, but only by 10% of Forum77 users.

After peaking in 2008, the Forum77 population slowly declines: Figure 7.2(a) shows the number of active monthly users by DOC on Forum77. In February 2008, ∼180 unique hydrocodone users initiated a discussion on Forum77.
In contrast, the corresponding number of users for February 2014 is ∼60. The decline in population of hydrocodone and oxycodone users is steeper than that of other DOCs. To analyze DOC prevalence over time accounting for population decline, we normalize by population size (Figures 7.2(b) and 7.2(c)).

Hydrocodone and oxycodone are the most prevalent DOCs on Forum77, but this prevalence declines over time: Figure 7.2(b) shows the prevalence of the six most common opioids in the Forum77 population over time. Locally weighted smoothing (LOESS [48]) is used to fit lines to each series, and 95% confidence intervals for each fit are shown. In 2007, hydrocodone and oxycodone are utilized by approximately 45% and 33% of the population, respectively. By 2011, they each have a prevalence of approximately 30%, which declines to about 27% (hydrocodone) and 26% (oxycodone) by 2014.

[Figure 7.2: three time-series panels covering 2007 to 2014. Panels (a) and (b) plot hydrocodone, oxycodone, suboxone, methadone, tramadol and heroin; panel (c) plots Rx opioids, ORT opioids and heroin.]
(a) Number of unique monthly users for the 5 most prevalent opioids in Forum77 from 2007-2014. Vertical axis: number of monthly users by SOA.
(b) Unique monthly users for the 5 most prevalent opioids from Forum77 as a percentage of the population. LOESS [48] fit lines with 95% confidence intervals indicate trends. Vertical axis: percentage of drug-identifiable population using (%).
(c) Unique monthly users of opioid replacement therapy (ORT) opioids, other prescription opioids and heroin as a proportion of the Forum77 population. LOESS fit lines with 95% confidence intervals indicate trends. Vertical axis: percentage of drug-identifiable population using (%).
Figure 7.2: Prevalence of major opioids in the Forum77 population over time.

Opioid replacement therapy (ORT) opioids methadone and buprenorphine increase in prevalence over time: Figure 7.2(c) aggregates the data shown in Figure 7.2(b), showing the prevalence of ORT opioids (methadone and buprenorphine), other prescription opioids (e.g. oxycodone, hydrocodone etc.), and heroin in the Forum77 population over time. While prescription opioids remain the most prevalent DOCs, this prevalence declines from about 70% to 56% over time, while ORT opioid prevalence increases from approximately 19% to 28%.

Heroin prevalence increases slightly in 2013: On average, about 5% of Forum77 participants abuse or misuse heroin until 2013, when the proportion of heroin users starts to increase noticeably, reaching 10% and still rising at the end of our data set (Figures 7.2(b) and 7.2(c)). Moreover, Figure 7.2(a) indicates a small absolute increase in heroin users from mid-2013 onwards, indicating that the increase illustrated in Figures 7.2(b) and 7.2(c) is not purely an artifact of normalizing by a population in which hydrocodone and oxycodone users are declining.

7.6 Discussion

Prescription opioids are the strongly dominant DOC on Forum77, with their prevalence far exceeding that measured in other drug-using populations. We suspect that this is the result of several factors.
First, users may be more receptive to seeking help anonymously online than discussing the issue with a health care provider, since the health care provider may be the unwitting source of the opioids in the first place [249]. Second, despite a robust evidence base for the medical treatment of opioid addiction [230], few physicians have training in such treatment [263] and the condition remains highly stigmatized within the medical community [176, 187]. Third, the more traditional self-help venues for addiction support, namely Alcoholics Anonymous and Narcotics Anonymous, demand overcoming the stigma associated with attending such meetings. The fact that opioid use disorders tend not to stem from recreational drug use, with which such venues are historically associated, likely enhances this stigma. Finally, prescription painkiller overdoses are growing at a significantly faster rate in the female population [8]. This, combined with the fact that women are more likely than men to seek help online for health issues [37, 57, 90–92, 165], could partially account for the high prevalence of prescription opioid users on Forum77.

The scarcity of alcohol, marijuana and cocaine, the three most prevalent drugs present in the NA, TEDS and DAWN surveys, could suggest a low number of recreational drug users in the Forum77 population. Alternatively, it is possible that Forum77 users are using alcohol and marijuana, but do not see this use as problematic and so do not mention it. As we note in Table 7.3, the Forum77 data set is unique in that users mention DOCs at their own discretion, and are not encouraged to disclose all substances that they might be abusing. It is also possible that users approach different communities for these issues: MedHelp, for example, has a separate, albeit very small, forum dedicated to alcoholism⁷.
⁷ http://www.medhelp.org/forums/Alcoholism/show/158

Temporal trends indicate an increase in prevalence of opioid replacement therapy (ORT) opioids and heroin, and a corresponding decline in other prescription opioids. It is possible, perhaps even likely, that these trends reflect real-world drug usage: Cicero et al. [46] report a recent increase in heroin usage due to oxycodone being more difficult to acquire and tamper with. In addition, survey data report a steady increase in national buprenorphine usage [232] over time, and a slight decrease in non-medical use of prescription opioids in the younger population [242]. While non-medical use of prescription opioids has increased in the population of users 50 and older [242], this demographic is less prevalent online [7]. However, drawing epidemiological conclusions from these data without further study into what other factors might be influencing these trends is ill advised.

7.6.1 Limitations & Future Work

While our work is the first to analyze drug usage trends in an online population, several challenges remain. Foremost is extending similar analyses to a variety of online forums. Analyzing multiple data sources would yield more comprehensive insights, and would also help to triangulate features in PAT that are universally useful for monitoring substance abuse trends. Finally, a difficult but necessary challenge is to investigate whether and how drug usage trends reflected in PAT align with those observed in the real world. As we discussed in Chapter 2, online health seeking populations are not necessarily representative of real-world populations. As such, understanding the relationship between PAT-observed and real-world drug usage trends would be necessary prior to utilizing such data for monitoring and surveillance.
In sum, however, our contributions in this chapter both propose a viable methodology for automatically identifying DOCs from PAT, and lend the first data-driven insights into drug usage in an online community.

7.7 Summary

Our goal in this chapter was to profile substance use in Forum77, and compare this to substance use reported in traditionally surveyed drug-using populations. The ability to monitor population-level drug use trends is valuable. Despite the popularity and uniqueness of OHCs focused on the topic of substance abuse, however, no work to date focuses on automatically identifying users’ drugs of choice (DOCs) from PAT. As such, our contributions (a method for automatically extracting and resolving DOCs, as well as insights on the Forum77 population acquired through the application of this method) are both novel and useful.

To automatically extract a user’s DOCs from her Forum77 initiating posts, we used manually-labeled data to train a CRF classifier (§ 7.3.2 and § 7.3.3). We use a CRF classifier as the problem of identifying DOCs is context-sensitive: many commonly abused drugs are also used as legitimate treatments for withdrawal. Our CRF classifier is highly accurate, achieving F1, Precision and Recall scores of 84.65%, 91.12% and 79.46%, respectively (§ 7.3.3). Finally, to facilitate analysis and comparison, we resolve extracted entities (e.g. vics, benzos) to drugs (e.g. Vicodin, benzodiazepines), and drugs to categories (e.g. opiates, sedatives) (§ 7.3.4).

To profile substance use on Forum77, we applied our method to the entire set of initiating posts on Forum77 (78,507 posts authored by some 28,005 users), and compared our results to those from three surveys: the Narcotics Anonymous annual membership survey, the Treatment Episode Dataset, which surveys users in state-funded rehabilitation facilities, and the Drug Abuse Warning Network, which collects data on substance abuse related admissions to emergency departments (§ 7.4).
Our results (§ 7.5) show that Forum77 users are disproportionately addicted to prescription opioids, while more traditionally-abused substances, such as alcohol, marijuana and cocaine, are infrequently reported. Our analyses of drug usage trends on Forum77 over time suggest that Forum77 may reflect real-world trends in substance use.

Chapter 8
Quantifying Recovery and Relapse

8.1 Introduction

Despite the prevalence of online health forums for substance use disorders, we have little understanding of the role that they play in the process of cessation. For example, when in the cycle of abuse are they most helpful to users? As we noted in Chapter 7, most substance abuse data are collected at point-of-care facilities. As such, online health communities (OHCs) are uniquely poised to offer quantified answers to questions that have previously been answered only anecdotally. For example, in a cohort of people with substance use disorders attempting recovery, what percentage relapse? Of those who recover, how long do these recovery periods tend to last?

Our goal in this chapter is to educe patterns of relapse and recovery as they manifest on Forum77. We begin by describing the process of prescription drug abuse cessation and related prior work (§ 8.2), and describing the data samples used in this chapter (§ 8.3). We then make the following contributions:

A quantified taxonomy of phases of addiction as expressed by users on Forum77 (§ 8.4). Our taxonomy, developed in concert with an addiction specialist, is based on Prochaska’s Transtheoretical Model (TTM) of behavior change [203], and serves both as a labeling rubric for mapping text to phases of addiction, and as a quantified summary of phase-based activity on Forum77. We use the taxonomy to manually label initiating post sequences from 191 Forum77 users (2,266 posts total) with the labels USING, WITHDRAWING or RECOVERING. We find that Forum77 is most heavily utilized when users are WITHDRAWING.
An analysis of activity and linguistic features across the phases of addiction (§ 8.5). We identify features that are characteristic of each phase, and leverage them to train a conditional random field (CRF) model to automatically label users’ phases of addiction over their tenure on Forum77. Our CRF achieves an F1-score of 67.6% against a baseline F1-score of 20%. Using CRF-labeled sequences, we are able to identify (1) whether a user relapsed at some point during their tenure, and (2) whether a user was RECOVERING at the time of her final initiating post, with F1-scores of 78% and 82%, respectively.

An analysis of transition, relapse and recovery based on the CRF-labeled phase sequences of 2,848 Forum77 users (32,345 posts) (§ 8.6 and § 8.7). We find that overall, progressive transitions are more prevalent than regressive transitions. Moreover, despite the fact that relapse is common (almost half of users relapse at some point during their tenure), the chances of a user RECOVERING by her final post are favorable. Finally, we observe a significant correlation between high forum engagement (both frequency of participation and volume of response posts authored) during a user’s phases of USING and WITHDRAWING and the probability that she is RECOVERING when she leaves Forum77.

We discuss our results in the context of Forum77’s efficacy as a withdrawal aid, implications for future forum design, and implications for addiction research (§ 8.8) before concluding (§ 8.9).

8.2 Background

To our knowledge, our work is the first to investigate the topic of prescription drug abuse cessation in social media. Given the secretive and stigmatized nature of this condition [174, 176, 187], our contribution provides a unique and often overlooked perspective on prescription drug abuse: that of patients themselves.
In this section, we provide an overview of prescription drug abuse as well as the traditional, in-person mutual help groups Alcoholics Anonymous (AA) and Narcotics Anonymous (NA). Next, we present work that, like ours, attempts to infer a person’s health state from her social media contributions. For a review of literature analyzing the efficacy of OHC participation, we refer the reader to § 2.2.4.

8.2.1 The Prescription Drug Abuse Cycle

Prescription drug abuse (or “nonmedical use”) is defined as “the use of a medication without a prescription, in a way other than prescribed, or for the experience or feelings elicited” [249]. Opioid pain relievers, such as hydrocodone, oxycodone, morphine and codeine, are the most frequently abused prescription medications [5]. In 2010, some 5.1 million Americans reported misusing prescription pain relievers in the last month; the corresponding figures for sedatives and stimulants were 2.6 million and 1.1 million, respectively [5].

Withdrawal: Withdrawal (or detoxification) is a painful process that is frequently compared to having a bad case of influenza [6, 84]. Common withdrawal symptoms include agitation, anxiety, muscle aches, insomnia, sweating, abdominal cramping, diarrhea, goose bumps, nausea and vomiting [6]. Typically, symptom onset aligns with the first missed dose in the case of a “cold turkey” approach, or occurs within a few days of dose reduction in the case of a taper [84]. Symptom severity peaks within a few days of final exposure, and gradually reduces as the user’s physical dependence on the drug weakens [84]. Withdrawal duration, dependent on biological factors, drug and dosage levels, and withdrawal method, ranges broadly from 7-10 days (cold turkey) [102] to 20-35 days (methadone-assisted taper) [84].
Self-Detoxification: Research on easing the withdrawal process focuses primarily on medication-assisted detoxification overseen by a medical professional, with almost no work on the subject of self-detoxification. We found two studies in which attendees of the same London methadone treatment facility were interviewed about prior self-detoxification attempts. In both studies, most patients had attempted self-detoxification, and many had made multiple attempts [102, 184]. The short-term success rate of achieving 24 hours of abstinence per episode was 41% [184], while the medium-term success rate of achieving 10 days of abstinence per episode was 24% [102]. The design of these studies naturally excludes patients who successfully maintain long-term abstinence. When asked why their attempts had failed, subjects pointed to lack of support during detoxification [102], as well as easy access to drugs and severity of withdrawal symptoms [102, 184]. Patient-reported strategies for effectively completing withdrawal include distraction and avoidance, especially in the form of physical activity [102]. In addition, Green et al. [106] showed that informing patients in full as to the type and severity of withdrawal symptoms that they were likely to experience resulted both in lower self-reported symptom severity scores and in an increased probability of completing the detoxification process.

Relapse & Recovery: Relapse rates for opioid use are high. Reported reuse statistics for individuals having gone through detoxification programs range from 81-91% [103, 227]. However, long-term prognoses are more favorable, with evidence suggesting that 45-51% of patients may achieve sustained abstinence, and that sustained abstinence is a gradual process [103].

“Recovery” is a hotly contested term in drug use disorder communities.
Many align with the Alcoholics Anonymous viewpoint that addiction is an incurable disease and, as such, an individual never fully “recovers” from addiction [1]. Rather, users who reach sustained sobriety are referred to as being “in recovery”. In this work, we refer to users who have overcome physical withdrawal as RECOVERING.

8.2.2 In-Person Mutual Help Groups

Alcoholics Anonymous (AA), founded in the 1930s, is one of the most utilized services for substance use disorders in the world, with over 4 million members across 100 different societies [133]. It has also given rise to other peer recovery groups for addiction, like Narcotics Anonymous (NA) and Gamblers Anonymous (GA). AA and NA are almost entirely based on mutual support, even condemning the giving of medical advice as outside the expertise of the group, instead encouraging members to see a doctor if medical or psychiatric problems arise [133].

Three decades of accumulated evidence demonstrate that active participation in such groups for addiction improves outcomes [155], although success rates are ill-defined and vary across studies [20]. A high participation level in AA is reported to be one of the strongest predictors for abstinence [190, 223]. For example, Pagano et al. [190] found that users who actively helped other AA members had a relapse rate of 55%, while those who did not relapsed at a rate of 75%. Correspondingly, many of the benefits of AA are thought to stem from the social network that it provides its members, who afford each other support, role modeling and experiential advice [140]. Kelly et al. [141] find that through their interactions with other AA members, users experience increased abstinence self-efficacy, increased spirituality/religiosity and reduced negative affect. Having a sponsor is also thought to help newcomers avoid relapse [237].

8.2.3 Inferring Health State from Social Media

The idea that social media users’ health states will be somehow reflected in the content that they contribute, and that it may be possible to predict health state from these data, has captured the interest of several researchers. De Choudhury et al. [69–71] analyze how postpartum depression (PPD) might be reflected on both Twitter and Facebook. Using their findings, they leverage activity and linguistic features to build models that can predict the onset of PPD from Facebook data [71]. In other social media studies, both activity features, such as social engagement and connectivity, and linguistic features, such as affect and writing style, have been shown to be useful indicators of depression [72, 129, 191, 208], neuroticism [208] and post-traumatic stress disorder [118].

A related challenge is to identify a user’s current phase within a specific medical condition. Jha and Elhadad [136] found that a combination of linguistic and activity features is helpful for identifying cancer stages I–IV. Murnane and Counts [180] conducted an analysis of smoking cessation as reflected on Twitter. They found that linguistic features of positive and negative sentiment, as well as social interaction variables, were significant differentiators between users who relapsed and users who ceased their smoking behavior during the time of the study. Finally, Wen and Rosé use logistic regression and flexible pattern matching over posts from an online cancer community to extract pre-defined events onto a timeline [252].

8.3 Data

Typically, users present their own current substance use situation (e.g., drugs used and number of days clean) in initiating posts. In contrast, users are liable to discuss a wide range of substance abuse situations in response posts, including their own and the initiator’s.
Accordingly, we restrict our analysis to Forum77’s initiating posts, of which there are 78,507 authored by a total of 28,005 unique users. Below, we describe the data sets that we use for taxonomy development, classifier training and testing, and analysis.

Taxonomy Development: Our taxonomy development (§ 8.4) is an iterative process; for each iteration we randomly sampled 1,000 of Forum77’s initiating posts.

Training & Testing Dataset: In § 8.4.4 we describe the importance of labeling sequences of initiating posts rather than randomly sampled individual posts (as we did for taxonomy development). For our labeled data set (§ 8.5.1) we randomly sample 200 users who had authored > 5 initiating posts on Forum77, and all of their 2,266 initiating posts.

Analysis Dataset: We analyze all initiating post sequences of users who authored > 5 initiating posts on Forum77. This totals 41,387 initiating posts authored by 2,848 users.

8.4 Exploring & Modeling Phases of Addiction

To systematically analyze phases of substance abuse in Forum77, we require both a valid taxonomy of phases and a rubric mapping post text to these phases. Towards this aim, we derive a rubric based on labels from the Transtheoretical Model (TTM) of behavior change, which we describe below.

8.4.1 Transtheoretical Model for Behavior Change

The Transtheoretical Model (TTM) is a framework that describes six stages of change that a person traverses in order to manifest permanent behavior change. Established in 1997 by Prochaska & Velicer [203], the TTM has been applied to a range of behaviors, from smoking cessation [75, 180, 247] and substance abuse [175], to sustainable energy usage [123]. The intuitiveness and universal applicability of the TTM make it a useful descriptive tool; however, care should be taken before utilizing it to inform treatment or intervention [175, 253].

According to the TTM, a person begins in the stage of pre-contemplation, in which she is not thinking about initiating a behavior change. After contemplation, she moves on to preparation, in which she makes preparations necessary to initiate a behavior change. The person then moves on to action, a concerted and deliberate attempt to effect short-term behavior change. If successful, the person enters a period of maintenance, in which she tries to sustain the behavior change in the long term. If successful, the person eventually enters the stage of termination [203]. As there is considerable debate over whether addiction is a terminable condition [1], we omit this stage for our purposes.

8.4.2 Rubric Development

In order to match Forum77 posts to TTM stages, we randomly sampled 1,000 initiating posts. Two authors mapped these posts to stages in the TTM, assigning descriptive labels to emergent sub-categories specific to the topic of addiction (e.g., tapering and cold turkey are both part of the TTM stage Action) in the style of a General Inductive Approach [236]. We repeated this process several times, reviewing the rubric with an addiction specialist prior to finalization. (Note: this is the same thematic analysis process as that described in Figure 6.1 in § 6.5.)

8.4.3 A Taxonomy of the Phases of Addiction

Table 8.1 describes our resulting phase taxonomy, along with example posts (synthesized from genuine posts to preserve user privacy) and the prevalence of each label in our final 1,000 initiating post sample. Although descriptively interesting, several of the labels in the taxonomy (e.g., intent to quit and about to quit) are rare. For parsimony, and to aid subsequent classification accuracy, we collapse labels into three categories: USING, WITHDRAWING and RECOVERING. This improves inter-annotator agreement (over a 100-post, independently labeled sample) from a Cohen’s Kappa of 0.73 to 0.78.
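
To make the label-collapsing step and the agreement measure concrete, the following is a minimal sketch rather than the procedure actually used in this work. The assignment of fine-grained labels to the three categories is our reading of Table 8.1 and should be treated as an assumption, as should the function names `collapse` and `cohens_kappa`; the kappa formula itself is the standard one (observed agreement corrected for chance agreement).

```python
from collections import Counter

# Assumed grouping of fine-grained labels into the three collapsed categories.
# This mapping is an interpretation of Table 8.1, not a published artifact.
COLLAPSE = {
    "using": "USING",
    "addicted": "USING",
    "relapse": "USING",
    "intent to quit": "USING",
    "about to quit": "USING",
    "quitting": "WITHDRAWING",
    "tapering": "WITHDRAWING",
    "cold turkey": "WITHDRAWING",
    "in recovery": "RECOVERING",
}


def collapse(label):
    """Map a fine-grained label to its collapsed category ("N/A" if unknown)."""
    return COLLAPSE.get(label.lower(), "N/A")


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: annotators disagree on the detoxification method and on
# using vs. addicted, but agree once labels are collapsed.
annotator_1 = ["quitting", "tapering", "using", "in recovery"]
annotator_2 = ["cold turkey", "tapering", "addicted", "in recovery"]
fine_kappa = cohens_kappa(annotator_1, annotator_2)
coarse_kappa = cohens_kappa([collapse(x) for x in annotator_1],
                            [collapse(x) for x in annotator_2])
```

In this toy example the collapsed labels agree perfectly while the fine-grained labels do not, which illustrates why collapsing tends to raise agreement.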

8.4.4 Labeling People, not Posts

Moving forward, we want to analyze addiction phases at the level of individual people. Two factors that emerged in our taxonomy development (see Table 8.1) convinced us that labeling randomly sampled posts would be insufficient for such analyses, and that we should instead label users’ entire post sequences.

The first was the high prevalence (9.8%) of n/a labels. These posts are often social in nature and, taken independently, impossible to assign to a class. However, when read in the context of the author’s previous and subsequent posts, the label is usually obvious (see Figure 8.1).

The second factor was the low prevalence of relapse labels. We noticed that while many users relapse, few announce the fact directly. Rather, most users will mention a relapse when they are already committed to another cessation attempt (e.g., about to quit or even quitting again). However, a relapse can still be observed in a regressive sequence, such as WITHDRAWING → USING (see Figure 8.1). Based on these observations, in the rest of this paper we label sequences of posts.

8.5 Characterizing the Phases of Addiction

Phases of addiction coincide with distinct physiological and psychological states. In this section, we analyze activity and linguistic features that might characterize an author’s phase on an initiating-day. We define an initiating-day to be any day on which the user initiated a thread on Forum77. If the author initiated multiple posts, we combine them for analysis. Our goal is two-fold: (1) to characterize phases of addiction as they are expressed on Forum77, and (2) to identify discriminative features that might be used for classification.

Table 8.1: Addiction Phase Taxonomy derived via a thematic analysis.

| Final Category | TTM Phase | Label | Description | Synthesized Example | % |
|---|---|---|---|---|---|
| USING | Precontemplation | Using | Subject is using substances and demonstrates no intention to quit. | it has been forever since I’ve been here and not much has changed. I am still using the prescribed amount of oxycodone for neck pain. | 3.1 |
| USING | Precontemplation | Addicted | Subject is using substances and indicates that she is addicted, but demonstrates no intent to quit. | my girlfriend and i r both addicted to percs but she is taking way more than me and keeps getting chest pain once every other week. | 7.4 |
| USING | Precontemplation | Relapse | Subject has used substances again after an attempt to quit. | I just messed up majorly. I was 6 days clean, doing OK-ish, when my mother stopped by with 10 Vics “incase I needed them”. Of course, being the WEAK person I am, I took them all right there. | 1.3 |
| USING | Contemplation | Intent to quit | Subject expresses desire to stop abusing a substance in the future. | I want off roxies. is methadone the answer. I need to work daily. I cannot do withdrawls. PLEASE HELP! | 9.3 |
| USING | Preparation | About to quit | Subject notes time and/or plan (e.g., tapering schedule) to quit. | i was planning to quit the first week of March. True to form addict fashion I’m out of both money and pills. So I’m about to go ct now instead of next week when I’d planned. | 2.5 |
| WITHDRAWING | Action | Quitting | Subject is in withdrawal; method unspecified. | Today is my 5th day of FREEDOM! I havent experienced any w/ds yet. So much energy. | 39.1 |
| WITHDRAWING | Action | Tapering | Subject is in withdrawal; detoxification method is a taper. | Have some Vics I am taking. I am down to 6 a day. I plan to go down to 3 a day then 1 a day until I am done! | 6.4 |
| WITHDRAWING | Action | Cold Turkey | Subject is in withdrawal; detoxification method is cold turkey. | I am on day 6 of CT from 150mg+ a day of ocycodone. I’m doing fine just some overall anxiousness | 3.3 |
| RECOVERING | Maintenance | In recovery | Subject has finished detoxing; no physical withdrawal symptoms expressed. | Just an update to tell you that I have 67 clean days today. I feel amazing. I sleep well now and feel good! I’ve had a lot of discussions about aftercare. | 17.8 |
| | | n/a | Impossible to determine status based on post. | I’ve been away for few days and everything seems different. Anyway I hope everyone is doing great. | 9.8 |

8.5.1 Sample & Labeling

To study how addiction phase sequences change over time, we restrict our analysis to users who have initiated at least 5 threads on Forum77 (n=2,848 out of 29,196 users who initiated at least one post). Of these, we randomly sampled 200 users (∼7% of the full 2,848) and all of their initiating posts. We discarded 9 users from the sample: two who had authored more than 100 posts, one account that belonged to MedHelp, and six accounts for which there was no clear ownership (several different people appeared to be using the same MedHelp account). The resulting sample contains 2,266 initiating posts (average 11.9 posts per user) and comprises ∼5.5% of the full 41,387 initiating posts authored by the 2,848 users who have authored ≥ 5 posts on the forum.

Two authors categorized each initiating post in the sample using the taxonomy presented in Table 8.1. We labeled each user’s data in chronological order so as to transfer context learned from surrounding labels. Disagreements (which were rare) were relabeled based on a consensus reached after discussion.

[Figure 8.1 image, titled “Label sequences, not posts”: a timeline of one user’s initiating posts from first post to last post, annotated with the phases USING, WITHD. and RECOV., an absence and a relapse. Example posts shown: “Day 4 off vics today and some cravings but I’m going strong!! -WilB”; “Hey guys. Just checking who’s hanging around on the forum tonight. Peace!”; “6 days today and feeling pretty terrible. The restless legs are killing me, can’t…”]

Figure 8.1: Illustration of how sequence analysis can (1) reduce n/a labels by leveraging context surrounding posts, and (2) capture relapse events in regressive sequences without requiring the user to explicitly state that she relapsed.
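
The unit of analysis described in § 8.5 (the initiating-day) and the per-user sampling criterion can be sketched as follows. This is an illustrative sketch only: the `(user_id, day, text)` record layout, the function name `initiating_days` and the `min_posts=5` threshold (read from “at least 5 threads”) are assumptions, not the data format or code used in this work.

```python
from collections import defaultdict
from datetime import date


def initiating_days(posts, min_posts=5):
    """Group initiating posts into per-user sequences of initiating-days.

    posts: iterable of (user_id, day, text) tuples, where day is a
           datetime.date (hypothetical record layout).
    Returns {user_id: [(day, combined_text), ...]} in chronological order,
    keeping only users with at least `min_posts` initiating posts.
    """
    per_user = defaultdict(list)
    for user_id, day, text in posts:
        per_user[user_id].append((day, text))

    sequences = {}
    for user_id, items in per_user.items():
        if len(items) < min_posts:
            continue  # user does not meet the activity threshold
        by_day = defaultdict(list)
        for day, text in items:
            by_day[day].append(text)
        # Posts initiated on the same day are combined into one unit.
        sequences[user_id] = [(day, "\n".join(texts))
                              for day, texts in sorted(by_day.items())]
    return sequences


example_posts = [
    ("a", date(2013, 1, 1), "day 1 ct"),
    ("a", date(2013, 1, 1), "cant sleep"),
    ("a", date(2013, 1, 3), "day 3"),
    ("a", date(2013, 1, 9), "feeling better"),
    ("a", date(2013, 2, 1), "one month"),
    ("b", date(2013, 1, 2), "hello"),
    ("b", date(2013, 1, 4), "thinking of quitting"),
]
example_sequences = initiating_days(example_posts)
```

Here user `a` has five initiating posts that collapse into four initiating-days, while user `b` falls below the threshold and is dropped.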

8.5.2 Activity Features

We identify 15 activity characteristics that describe an initiator’s global activity over time, her local activity 5 days prior to the initiating-day in question, and both the initiator’s and respondents’ activity on the initiating-day. The features capture user activity volume (e.g., number of posts initiated in the last 5 days), engagement (e.g., days elapsed since last response to another user) and attention (e.g., number of unique respondents to a user’s initiating post on the initiating-day). For a full description of all features, as well as summary statistics of their distributions across each class, we refer the reader to Table D.3.

8.5.3 Linguistic & Content Features

LIWC Features

Differences in word use and linguistic style are believed to reveal a range of information about people, from psychological state to social identity [196]. The Linguistic Inquiry and Word Count (LIWC) [195] software calculates 80 linguistic variables over text. In prior work, LIWC has been used to characterize and distinguish women suffering from postpartum depression (PPD) [71], individuals at risk for depression [72] and smokers on Twitter who are at risk for relapse [180]. We calculate all 80 LIWC variables over initiating post text as well as over all responses received on the initiating-day. We then examine differences in these variables across the USING, WITHDRAWING and RECOVERING phases (Tables D.1 & D.2).

Days Mentioned and Question Features

In addition to the LIWC features, we calculate three variables over initiating post text. Users frequently mention how long they have been clean at the time of posting. We extract days clean automatically by using handwritten patterns, such as “clean X days” and “X weeks off”, where X represents a number. We convert X to days if necessary.
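
A minimal sketch of this kind of pattern-based extraction is shown here. The two regular expressions mirror the example patterns quoted in the text (“clean X days”, “X weeks off”); everything else, including the extra “X days clean” variant, the month and year conversion factors, and the names `PATTERNS` and `days_clean`, is an illustrative assumption rather than the pattern set actually used.

```python
import re

# Conversion factors to days; the month and year lengths are assumptions.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

_NUM = r"(\d+)"
_UNIT = r"(day|week|month|year)s?"

# Hypothetical patterns modeled on "clean X days" and "X weeks off".
PATTERNS = [
    re.compile(rf"\bclean\s+(?:for\s+)?{_NUM}\s+{_UNIT}\b", re.IGNORECASE),
    re.compile(rf"\b{_NUM}\s+{_UNIT}\s+(?:clean|off)\b", re.IGNORECASE),
]


def days_clean(text):
    """Return the first 'days clean' mention in text, in days, or None."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            number, unit = match.groups()
            return int(number) * UNIT_DAYS[unit.lower()]
    return None
```

For example, `days_clean("3 weeks off the pills")` yields 21 under these assumptions, and a post with no matching mention yields `None`.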
We also use a more relaxed version of this feature, called days mentioned, in which we do not require the user to explicitly mention terms like “clean” or “off”. Finally, we count the number of questions asked by identifying sentences that start with a question word and/or end with a question mark. This feature has proved helpful in prior work [71]. We find that including these three extra features improves classifier performance by ∼2.2%.

Phase-Specific Term Features

Finally, we count how many phase-specific words occur in initiating post text and in response text. To determine whether a term t is particularly descriptive of a phase p, we calculate its frequency-based odds ratio. If f_p(t) is the number of posts of phase p that contain t, f_p(t̄) is the number of posts of phase p that do not contain t, and p̄ denotes all phases other than p, then:

$$\mathrm{OR}(t, p) = \frac{f_p(t) \cdot f_{\bar{p}}(\bar{t})}{f_p(\bar{t}) \cdot f_{\bar{p}}(t)}$$

The odds ratio is a measure of strength of association. We calculate the odds ratio for each term across each phase, and retain terms with an odds ratio > 2. Table 8.2 shows sample terms for both initiating and response posts.

Table 8.2: Sample phase-specific terms for the USING, WITHDRAWING and RECOVERING categories.

| Phase | Initiating Posts | Response Posts |
|---|---|---|
| USING | withdrawls, wants, hate, addicted, scared, tried, stop | situation, willing, treatment, withdrawl, option, advise, rehab, counseling |
| WITHDRAWING | rls, hot, restless, aches, slept, arms, legs, headache, wd, worst, stomach, tramadol | potassium, heating, fluids, baths, pad, showers, legs, melatonin, hot, slept, bananas |
| RECOVERING | craving, recovery, lately, sober, fight, truly, clean, cravings, true, worth | inspiration, accomplishment, congratulations, sharing, thank, miss, proud, paws |

8.5.4 Results: Activity and Linguistic Features

We present linguistic features over initiating posts in Table D.1, linguistic features over response posts in Table D.2, and activity features in Table D.3. Unless otherwise mentioned, we use Kruskal-Wallis tests to assess statistical significance.
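
Returning briefly to the phase-specific term features of § 8.5.3, the frequency-based odds ratio can be computed from a 2 × 2 table of post counts. The sketch below assumes a hypothetical representation in which each post is a `(phase_label, set_of_terms)` pair; the function name `odds_ratio` and the handling of a zero denominator (returning infinity or NaN) are illustrative choices, not details taken from this work.

```python
def odds_ratio(posts, term, phase):
    """Frequency-based odds ratio of `term` for `phase`.

    posts: list of (phase_label, set_of_terms) pairs (hypothetical layout).
    """
    in_with = in_without = out_with = out_without = 0
    for label, terms in posts:
        contains = term in terms
        if label == phase:
            if contains:
                in_with += 1       # f_p(t)
            else:
                in_without += 1    # f_p(t-bar)
        else:
            if contains:
                out_with += 1      # f_p-bar(t)
            else:
                out_without += 1   # f_p-bar(t-bar)
    numerator = in_with * out_without
    denominator = in_without * out_with
    if denominator == 0:
        # Assumed convention: undefined ratios map to inf (or NaN for 0/0).
        return float("inf") if numerator > 0 else float("nan")
    return numerator / denominator


toy_posts = (
    [("WITHDRAWING", {"legs", "restless"})] * 3
    + [("WITHDRAWING", {"slept"})]
    + [("USING", {"legs"})]
    + [("RECOVERING", {"clean"})] * 3
)
legs_or = odds_ratio(toy_posts, "legs", "WITHDRAWING")
```

In the toy data, “legs” appears in three of four WITHDRAWING posts and one of four other posts, giving an odds ratio of (3 × 3) / (1 × 1) = 9, which would pass the > 2 retention threshold.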
A non-parametric test is appropriate for data that are not expected to follow a normal distribution (such as ours), and a Kruskal-Wallis test determines whether at least one distribution in a trio differs significantly from the others.

Our feature analysis indicates that both users’ activity and users’ content and linguistic characteristics differ measurably across addiction phases. We discuss particularly descriptive features of each phase below.

USING: This phase is characterized by long absences from the forum and, correspondingly, low levels of recent activity. Users who are USING have, on average, been absent from forum participation in all capacities for more than twice as long as users who are WITHDRAWING or RECOVERING (40 vs. ∼18 days since last activity). A longer absence from the forum may partially explain why USING posts are, on average, longer (208 vs. ∼180 words): users must account for lost time and bring their audience back up to speed. Both days clean and days mentioned vary widely in USING posts, and have surprisingly high median values. Examining the underlying data provides an explanation: users who are USING often mention how long they had been clean prior to relapse in statements such as, “I was clean for 4 months before...” or “I would have had 717 days clean today”. Finally, USING posts offer the lowest levels of positive affect (16% less than WITHDRAWING and 32% less than RECOVERING), and the highest levels of discussion around the topic of health (16% more than WITHDRAWING and 36% more than RECOVERING); these characteristics are mirrored in responses to USING posts. The lack of positivity resonates with the fact that users who are USING have either relapsed or failed to progress towards recovery.

WITHDRAWING: In recent activity, users who are WITHDRAWING issue more initiating posts and self responses than those who are USING or RECOVERING.
In addition, they have the smallest average number of days since last initiating post (21 vs. 31 RECOVERING and 50 USING) and days since last self-response (29 vs. 42 RECOVERING and 66 USING). As we might expect, WITHDRAWING users express the lowest numbers of days clean and days mentioned. In addition, there is a great deal more language about feeling, biological processes and the body. These observations align with the nature of detoxification as an uncomfortable physical process from which people constantly seek relief [84].

Responses to WITHDRAWING posts are not particularly distinctive. Aside from expressing slightly more anxiety, and writing slightly more about feeling and the body, other linguistic variables tend to take on a value somewhere in between those of responses to USING and RECOVERING. It is possible that respondents try to influence users from one side of the spectrum to the other, modifying their language according to the user’s progress.

RECOVERING: These users are highly active, especially in the area of responding to other people’s posts. In recent activity they issue, on average, 15.2 responses to other people’s threads, compared to 5.5 by users who are WITHDRAWING and 1.9 by users who are USING. Moreover, unlike WITHDRAWING and USING users, their ratio of # initiating posts to # responses authored tends to be < 1. Linguistic features also suggest that RECOVERING users tend to focus on others. The pronoun you is used almost 100% more while the I pronoun is used less, and language is more social. Moreover, users express significantly more positive affect (25% more than WITHDRAWING, 48% more than USING) and less anxiety (18% less than WITHDRAWING, 16% less than USING). The evident outward focus of initiating posts from RECOVERING users resonates with the 12th step in traditional twelve-step programs such as AA, which encourage people to strengthen their sobriety by using their experiences to help others achieve it [1].

Responses to RECOVERING posts are distinct in that they express substantially more positive affect (27% more than responses to WITHDRAWING, 57% more than responses to USING). They also tend to host a notable quantity of exclamation marks (100% more than WITHDRAWING, 350% more than USING). Inspection reveals that this is an expression of excitement and encouragement in response to good news, for example, “hoooooorrrraaaahhhhh!!!!!!!!!” and “I am so PROUD of YOU!!!!!”.

8.6 Automatically Classifying Addiction Phase

Informed by our feature analysis, we next train a statistical classifier to automatically label Forum77 posts as USING, WITHDRAWING or RECOVERING. Analyses of phase sequences can give insight into events such as relapse and recovery. Our classifier allows us to scale such analyses to the entire Forum77 data set. Below, we describe our classifier and report its performance. We discuss relapse and recovery in § 8.7.

8.6.1 Model & Features

A user’s path through the different phases of addiction forms a natural sequence. A conditional random field (CRF) [151] is a probabilistic graphical model that performs inference over sequences, rather than individual data points. By taking into account prior and subsequent data items in a sequence, CRFs are context sensitive. For example, unlike a CRF, a non-sequence-based classifier might have difficulty classifying a post like, “I’ve been away for a few days and everything seems different. Anyway I hope everyone is doing great...”, even if it was sandwiched between two posts that were obviously USING, as the post itself contains no clues as to the user’s phase. Accordingly, we train a 3-class CRF to annotate a user’s sequence of initiating-days with the labels USING, WITHDRAWING or RECOVERING.
We use an adapted version of the Stanford Named Entity Recognizer package, a trainable Java implementation of a CRF classifier,¹ modified to analyze sequences of documents (the default unit of analysis is a token). Tables D.1, D.2 and D.3 indicate the subset of features that we used for classifier training. We selected features based on apparent discriminability and iterative evaluation through 10-fold cross validation. In order to improve robustness and model potentially non-linear responses, we binned numeric features into octiles: ranks that divide the data evenly into 8 groups. While using quartiles is arguably more common in standard practice, we found that using octiles improved classifier performance.

¹ http://nlp.stanford.edu/software/CRF-NER.shtml

8.6.2 Performance

Table 8.3 shows precision, recall and F1 scores for the CRF classifier. Our classifier achieves an F1 score of 67.6% against a baseline F1 score of 20.0%, acquired by labeling each instance with the majority class, WITHDRAWING.

Table 8.3: CRF performance scores aggregated over 10 runs of 10-fold cross validation, with randomly shuffled input sets.

| Label | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| Combined | 68.3 | 68.0 | 67.6 | 69.8 |
| USING | 62.4 | 61.7 | 61.4 | |
| WITHDRAWING | 70.6 | 71.9 | 70.9 | |
| RECOVERING | 72.1 | 71.2 | 70.9 | |
| Baseline | 14.0 | 33.0 | 20.0 | 43.0 |

It is useful to know which labels the CRF is likely to confuse. Figure 8.2 shows the CRF classifier’s confusion matrix. Diagonal entries indicate counts of correctly-classified instances. The strong diagonal indicates a relatively high level of accuracy. Most classification errors occur between adjacent phases: confusing USING with WITHDRAWING and WITHDRAWING with RECOVERING is common, but confusing USING with RECOVERING is less so. This resonates with a point prevalent in the addiction literature: stages of recovery are not black and white but rather fall on a spectrum [79, 168].
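
As a worked check on the baseline row of Table 8.3, the sketch below computes the scores of a classifier that always predicts the majority class. It assumes macro-averaging across classes and treats the undefined precision of never-predicted classes as 0; both are our assumptions, chosen because they approximately reproduce the reported baseline, and the function name `majority_baseline` is illustrative.

```python
def majority_baseline(majority_share, n_classes):
    """Macro-averaged scores for always predicting the majority class.

    majority_share: fraction of instances that belong to the majority class.
    Returns (precision, recall, f1, accuracy) as fractions in [0, 1].
    """
    # Majority class: every prediction is this class, so precision equals its
    # share and recall is 1. Other classes are never predicted; their
    # precision and recall are taken to be 0 (an assumed convention).
    precisions = [majority_share] + [0.0] * (n_classes - 1)
    recalls = [1.0] + [0.0] * (n_classes - 1)
    f1_scores = [
        2 * p * r / (p + r) if (p + r) > 0 else 0.0
        for p, r in zip(precisions, recalls)
    ]

    def mean(values):
        return sum(values) / len(values)

    return mean(precisions), mean(recalls), mean(f1_scores), majority_share


precision, recall, f1, accuracy = majority_baseline(0.43, 3)
```

With a 43% majority share over three classes this gives roughly 14.3% precision, 33.3% recall and 20.0% F1, close to the 14.0 / 33.0 / 20.0 reported for the baseline.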

8.6.3 Results

We analyze the result of applying our CRF classifier to the entirety of the Forum77 membership base who have initiated > 5 posts (2,848 users, 32,345 initiating posts). Our results give us insight into common transitions between addiction phases, enabling us to answer questions such as, “If a user is WITHDRAWING today, how likely is it that she will be RECOVERING on her next initiating-day?” and “What is the most frequent phase change observed on Forum77?”

[Figure 8.2 image: 3 × 3 matrix with axes GOLD LABELS and CRF LABELS over Using, Withd. and Recov.; cell values in extraction order: 327.2, 131.8, 62.2; 150.2, 686.9, 142.7; 52.2, 139.8, 560.2.]

Figure 8.2: Confusion matrix for our CRF classifier aggregated across 10 randomized runs of 10-fold cross validation.

Figure 8.3(a) shows the normalized transition frequency matrix for USING, WITHDRAWING and RECOVERING. The most common transitions lie along the diagonal, indicating that users typically initiate consecutive posts in any one phase. Self-transitions aside, the progressive edges between consecutive stages (USING → WITHDRAWING and WITHDRAWING → RECOVERING) are the most common, accounting for approximately 6% and 5.2% of total transitions, respectively. In contrast, regressive edges between consecutive stages (WITHDRAWING → USING and RECOVERING → WITHDRAWING) are less common, accounting for 2.6% and 1.1% of total transitions, respectively.

Figure 8.3(b) shows conditional transition probabilities across states. The likelihood of a same-state transition increases with the progressiveness of the state. For example, there is a 71% chance that a USING user will be USING in her next post, an 81% chance that a WITHDRAWING user will be WITHDRAWING in her next post, and a 91% chance that a RECOVERING user will be RECOVERING in her next post.

Figure 8.4 shows the distributions of phase length in days for each phase.
We calculate phase length as the number of days between the first and last post in a contiguous sequence. The typical WITHDRAWING phase lengths align well with those reported in the literature on addiction, which suggests a 7–35 day duration depending on the detoxification method used, as well as other factors [84, 102].

(a) Normalized transition frequencies (% of all transitions)

| Source State | Target: Using | Target: Withd. | Target: Recov. |
|---|---|---|---|
| Using | 17.35 | 6.04 | 1.12 |
| Withd. | 2.56 | 33.85 | 5.23 |
| Recov. | 1.78 | 1.11 | 30.96 |

(b) Conditional transition probabilities (%)

| Source State | Target: Using | Target: Withd. | Target: Recov. |
|---|---|---|---|
| Using | 70.79 | 24.64 | 4.57 |
| Withd. | 6.15 | 81.29 | 12.56 |
| Recov. | 5.26 | 3.28 | 91.46 |

Figure 8.3: (a) Normalized transition frequencies between addiction phases (e.g., USING → RECOVERING edges comprise 1.12% of the total transitions in the CRF-labeled data) and (b) conditional transition probabilities (e.g., the probability of a user moving from USING to RECOVERING is 4.57%).

8.7 Automatically Classifying Relapse and Recovery

Relapse and recovery are critical events in the process of addiction that are often viewed as “failure” or “success”. Prior work in the addiction literature suggests that recovery is a long, iterative process of which relapse is a part [103]. Leveraging our CRF classifier, we present methods for identifying (1) if a user has relapsed during her tenure on the forum, and (2) if a user is RECOVERING on her last initiating-day on Forum77. We then investigate if relapse adversely correlates with a user’s chance of RECOVERING. Finally, we identify activity features during USING and WITHDRAWING phases that discriminate between users who wrote their final post on Forum77 in a state of RECOVERING, and those who did not.
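
Before turning to relapse detection, the two views of the transition data in Figure 8.3 can be illustrated with a short sketch. It counts consecutive label pairs in per-user phase sequences, then normalizes once by the grand total (panel a) and once by each source state’s row total (panel b). The input layout (a list of label lists) and the function name `transition_tables` are assumptions made for illustration.

```python
from collections import Counter


def transition_tables(sequences):
    """Compute normalized and conditional transition tables.

    sequences: list of per-user phase label lists, in chronological order.
    Returns (normalized, conditional), each a dict keyed by (source, target).
    """
    counts = Counter()
    for labels in sequences:
        # Each pair of consecutive initiating-days contributes one transition.
        counts.update(zip(labels, labels[1:]))

    total = sum(counts.values())
    row_totals = Counter()
    for (source, _target), count in counts.items():
        row_totals[source] += count

    # Panel (a): share of all transitions.
    normalized = {pair: count / total for pair, count in counts.items()}
    # Panel (b): probability of the target given the source state.
    conditional = {
        (source, target): count / row_totals[source]
        for (source, target), count in counts.items()
    }
    return normalized, conditional


toy_sequences = [
    ["USING", "USING", "WITHDRAWING", "RECOVERING"],
    ["WITHDRAWING", "USING"],
]
toy_normalized, toy_conditional = transition_tables(toy_sequences)
```

The same relationship can be checked against the reported figures: the USING row of Figure 8.3(a) sums to 17.35 + 6.04 + 1.12 = 24.51, and 17.35 / 24.51 ≈ 70.79%, which matches the USING → USING entry of Figure 8.3(b).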

8.7.1 Identifying Relapse

To identify a relapse incident, we codify three regressive transition patterns:

    RECOVERING → { WITHDRAWING, USING }
    WITHDRAWING → USING
    WITHDRAWING → (45+ days absent) → WITHDRAWING

This last pattern is based on the observation that a general window for withdrawal duration is 7–35 days [84, 103]. As such, if a user was absent for more than 45 days, and then returned in a state of WITHDRAWING, it is likely that she failed in her initial attempt and has restarted. While it is possible that this pattern will capture individuals on a slow taper, in our experience it is unlikely that such users would be inactive for a full 45 days.

[Figure 8.4 image: three box plots with axes “USING phase length (days)”, “WITHDRAWING phase length (days)” and “RECOVERING phase length (days)”; legend: Median, Q1 – Q3, within 1.5 × IQR of Q1, Q3.]

Figure 8.4: Distributions of phase lengths. A red bar indicates the median value, while the dark blue region indicates the middle spread. The light blue region indicates values that fall within 1.5 × the interquartile range of the middle spread.

We identify whether a user relapsed or not during her tenure on Forum77 by testing whether any of the above patterns exist in her sequence of phase transitions. To evaluate the efficacy of this approach, we apply it to both the gold label sequences and the CRF-labeled sequences in our labeled sample data set. Using this technique, we achieve an F1-score of 78% and accuracy of 78% in identifying Relapse and No relapse, compared to baseline scores of 33.9% and 51.3% if we labeled each user with the majority class, No relapse (Table 8.4).

Table 8.4: Performance for identifying relapse events (top) and whether a user’s final state is RECOVERING (bottom). Combined scores across classes are shown in bold.

Identifying a relapse event

| Label | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **Combined** | **79.92** | **78.18** | **78.04** | **78.42** |
| Relapse | 86.11 | 66.67 | 75.15 | |
| No relapse | 73.73 | 89.69 | 80.93 | |
| Baseline | 25.65 | 50.00 | 33.91 | 51.30 |

Identifying final initiating post phase

| Label | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **Combined** | **81.47** | **81.52** | **81.49** | **81.57** |
| RECOVERING | 79.78 | 80.68 | 80.23 | |
| ¬RECOVERING | 83.17 | 82.35 | 82.76 | |
| Baseline | 26.84 | 50.00 | 34.93 | 53.40 |

8.7.2 Identifying Recovery

To identify whether a user was RECOVERING when she last initiated a post on Forum77, we simply examine the final phase label in her transition sequence. Using the CRF-labeled sequences, we classify a user’s last post as RECOVERING or ¬RECOVERING with an F1-score of 81.5% and accuracy of 81.6%; the comparative baselines are 34.9% and 53.4%, in which all last posts are labeled as ¬RECOVERING (Table 8.4).

8.7.3 Results

Using the methods described above, we identify users who are RECOVERING at the time of their last initiating post on Forum77, as well as users who have relapsed at least once during their tenure on Forum77. We apply this analysis to the entirety of the Forum77 membership base who have initiated > 5 posts (2,848 users, 32,345 initiating posts).

[Figure 8.5 image: Sankey diagram from first post to last post. First post: Using 48%, Withd. 44%. Middle: No relapse 52%, Relapse 48%. Last post: Using 17%, Withd. 37%, Recov. 46%.]

Figure 8.5: Aggregated user transitions from start to end state. Bar widths denote population proportion. For example, 48% of users in our sample relapsed during their tenure on Forum77.

Do users tend to recover on Forum77? Overall, users progress towards recovery during their tenure.
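
The relapse patterns of § 8.7.1 and the final-state check of § 8.7.2 can be sketched in a few lines. This is a hedged illustration, not the implementation used here: the input layout (a chronological list of `(date, phase)` pairs), the function names, and the reading of “45+ days absent” as a gap of at least 45 days between consecutive initiating-days are all assumptions.

```python
from datetime import date

# Ordering of phases from least to most progressed.
PHASE_RANK = {"USING": 0, "WITHDRAWING": 1, "RECOVERING": 2}


def has_relapse(days, gap_days=45):
    """Return True if any regressive pattern occurs in the sequence.

    days: chronological list of (date, phase) pairs for one user
          (hypothetical layout).
    """
    for (d0, p0), (d1, p1) in zip(days, days[1:]):
        # RECOVERING -> {WITHDRAWING, USING} and WITHDRAWING -> USING are
        # exactly the transitions that move to a lower-ranked phase.
        if PHASE_RANK[p1] < PHASE_RANK[p0]:
            return True
        # WITHDRAWING -> (long absence) -> WITHDRAWING suggests a restart.
        if p0 == p1 == "WITHDRAWING" and (d1 - d0).days >= gap_days:
            return True
    return False


def ends_recovering(days):
    """Return True if the user's final initiating-day is labeled RECOVERING."""
    return bool(days) and days[-1][1] == "RECOVERING"


steady = [(date(2014, 1, 1), "USING"),
          (date(2014, 1, 5), "WITHDRAWING"),
          (date(2014, 1, 20), "RECOVERING")]
regressed = [(date(2014, 1, 1), "WITHDRAWING"),
             (date(2014, 1, 9), "USING")]
restarted = [(date(2014, 1, 1), "WITHDRAWING"),
             (date(2014, 3, 1), "WITHDRAWING")]
```

Under these assumptions `steady` contains no relapse and ends in RECOVERING, while `regressed` and `restarted` are both flagged as relapses.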
Figure 8.5 shows the distribution over start state, relapse, and end state for the 2,848 users described above. Most users first initiate contact on the forum when they are USING (48%), followed by WITHDRAWING (44%). In contrast, only 17% of users are USING by the time of their last post, while 37% are WITHDRAWING and 46% are RECOVERING.

Does relapsing hurt recovery likelihood? Roughly half of users experience a relapse during their tenure. Users who experience no relapse are significantly more likely to end in RECOVERING than users who relapse (53.4% vs. 44.4% end in RECOVERING, χ²(1) = 55.1, p < 0.001). Despite this, RECOVERING is still the most likely end state for Forum77 users who relapse.

Are relapses associated with longer tenure? Given the documented prevalence of relapse [103, 227], the observation that more than half of the users in our data set experience no relapse is surprising. Analyzing tenure values reveals that the average tenure of no relapse users is 128 days, compared to 418 days for users who relapse. One hypothesis is that users with no observed relapse do relapse after leaving the forum and do not return.

What differentiates users who are ultimately RECOVERING? We define a user as active if she initiated a post on the forum in the last 45 days of our data set, and remove these users. We then analyze users’ global activity characteristics (Table D.3) aggregated over their USING and WITHDRAWING posts (RECOVERING posts are omitted as this is the phenomenon that we are studying). Table 8.5 shows the results.

Table 8.5: Comparison of activity features for users who are and are not RECOVERING in their last initiating post. Per-user values are aggregated over USING and WITHDRAWING posts. Statistical significance is determined using Kruskal-Wallis tests (*** p < 0.001) after Bonferroni corrections.

| Activity Characteristic | p | RECOVERING: Mean | Med. | IQR | MAD | not RECOVERING: Mean | Med. | IQR | MAD |
|---|---|---|---|---|---|---|---|---|---|
| # initiating posts authored | *** | 8.99 | 5 | 8 | 4.44 | 9.89 | 6 | 6 | 2.96 |
| # self responses authored | *** | 19.56 | 8 | 16 | 10.37 | 17.04 | 9 | 16 | 8.89 |
| # responses authored | *** | 45.56 | 9 | 31 | 13.34 | 33.81 | 8 | 24 | 10.37 |
| # initiating posts / # responses authored | *** | 0.73 | 0.50 | 0.76 | 0.44 | 1.04 | 0.67 | 0.83 | 0.49 |
| Days since last init. | *** | 16.39 | 3.33 | 12.41 | 3.95 | 27.05 | 8.30 | 28.36 | 10.53 |
| Days since last self-response | *** | 17.47 | 3.00 | 13.38 | 3.95 | 29.53 | 8.29 | 31.45 | 10.81 |
| Days since last response | *** | 15.92 | 1.66 | 7.32 | 2.47 | 25.30 | 4.37 | 21.75 | 5.99 |
| Days since last activity | *** | 14.11 | 1.80 | 6.09 | 1.90 | 20.94 | 4.80 | 20.09 | 5.79 |
| # self responses | *** | 1.93 | 1.50 | 1.64 | 1.19 | 1.83 | 1.50 | 1.50 | 1.11 |
| # replies received | *** | 5.63 | 5.00 | 3.40 | 2.37 | 5.56 | 4.83 | 3.30 | 2.29 |
| # respondents | *** | 4.09 | 3.83 | 2.00 | 1.60 | 4.01 | 3.70 | 2.03 | 1.42 |

Users who leave the forum in a state of RECOVERING are significantly more engaged in forum activity, even when they are USING and WITHDRAWING. The average time lapse between any form of activity (initiation, self-response and response) is about 30% shorter for those who are RECOVERING when they leave. Moreover, their activity is focused outwardly on other community members: users who are RECOVERING author, on average, 50% more responses than those who are ¬RECOVERING (average 45.6 vs. 33.8), but author slightly fewer initiating posts (average 9.0 vs. 9.9). These results resonate strongly with prior work on AA that finds that both active participation in AA and explicitly focusing on helping other members correlate with sustained abstinence [190, 223].

8.8 Discussion

Our motivating goals were to study phases of addiction as they manifest on Forum77 and to analyze the forum’s effectiveness in promoting recovery. In this section, we discuss Forum77’s efficacy as a tool for supporting users through withdrawal, relapse and sustained recovery, drawing on post excerpts to contextualize our findings. We then discuss how our results might inform future interface design, before touching on potential implications for addiction treatment.
8.8.1 Use and Efficacy of Forum77

Supporting Withdrawal: Our results suggest that Forum77 is an effective tool for helping users through opioid withdrawals and physical detoxification. In general, users progress more often than they regress (Figure 8.3), and these local progressions translate into a global trend of many users reaching a state of RECOVERING during their tenure. When first initiating a post, 48% of users are USING, 44% WITHDRAWING and 8% RECOVERING; in their most recent initiating post, however, only 17% of users are USING, 37% are WITHDRAWING and 46% are RECOVERING, despite the fact that almost half of the population experiences a relapse (Figure 8.5). If we interpret our results as a 46% success rate on users’ final detoxification attempt before leaving the forum, this is an improvement over self-detoxification success rates reported in the addiction literature [102, 184]. We must be cautious here, however, as we are comparing across different study designs.

Forum77’s efficacy at supporting detoxification may be attributable, in part, to both the strong social support and the detailed information on withdrawal that members receive from each other. Both of these factors have been shown to improve withdrawal outcomes [102, 106, 184], and qualitative remarks from users suggest that Forum77 meets the mark on both. “I have tried to cope by myself for too long. Its so hard to deal with something like addiction by your self”, wrote one user. “[T]here is so much support and advice on getting through this and addiction I am living proof it works!!!!!!”, and “i was on here once before and was able to achieve 9 months of sobriety due to the support i had here and from meetings.” remarked others.
In other cases, simply discovering a supportive community might galvanize a cessation attempt: “up until 3 weeks ago, I had no intentions of quitting, i was just looking to find some stuff on addiction...and i just happened to run across this forum...”.

Relapse and Shame: Despite the favorable prognosis that users are more likely to reach a state of RECOVERING during their tenure (Figure 8.5), we do not know whether they maintain this state upon leaving. It is possible that the same strong support network that helps users through detoxification deters them from wanting to admit a relapse. Quantitatively, although almost half of our sample relapsed (Figure 8.5), we rarely observed posts in which users reported a relapse immediately after the fact (Table 8.1). The hypothesis that users are too ashamed to admit relapse until they implement a renewed attempt to quit is qualitatively well supported. Statements such as “I suck!! I am so sorry, I’ve been too embarrased too admit I fell off the proverbial wagon around Christmas.” are common. Others, such as “haven’t posted in a few weeks because, of course, i slipped up and am ashamed. but now i am back on track with the sub” and “Im in day 3 of detox, i was too embarassed to post the first 3 days...” echo these sentiments, and suggest that some users feel that a new detoxification effort is required as proof of commitment before returning to the community.

Supporting Sustained Recovery: Without observing users’ behavior outside the forum, we cannot quantify Forum77’s effectiveness at supporting long term recovery. Qualitatively, however, some users feel that this is something that Forum77 could improve upon. One user summarizes: “I wonder if there is not a need for a forum community for long-term support. This community is great, but is skewed towards the short-term wd symptoms and getting through the initial physical pain of wd.”.
Also prevalent are observations that the forum does not sufficiently prepare users to handle post-acute withdrawal syndrome (PAWS): “I wish people would warn others about this PAWS thing”, wrote one user. “i was doing so good i made it to about 100 days sober ... the PAWS really got me”, expressed another. Moreover, users who return to Forum77 after some time may find that their support network has moved on. One user who was struggling not to relapse asked “Where are all of the friends i made here that I no longer see?!?”.

Other users, however, give qualitative evidence in support of Forum77’s efficacy at aiding sustained recovery. “I have not posted much lately but continue to log on and read ppls posts and I believe that is a key aspect in my recovery”, states one user. Another wrote “when I get a craving I come here and read, even if I read it before, it helps me think of what I went through what I’m going through and how others cope”. We found that higher engagement, in the form of activity levels and volumes of responses contributed, correlates with the chances of a user being in a phase of RECOVERING by the end of her tenure. Extending this idea, one possibility is that remaining engaged with the forum (even in the form of “lurking”) after reaching a state of RECOVERING helps to prevent relapses in a similar way that continued participation in AA correlates with longer periods of sobriety [190, 223]. A deeper analysis into the mechanisms through which Forum77 does and does not support long-term recovery is an important topic for future work.

8.8.2 Implications for Forum Design

Computational tools for automatically identifying addiction phases, relapses, and whether a user’s tenure ends in RECOVERING could prove valuable to communities like Forum77. One question commonly asked by users is what to expect when they quit their drug of choice, and having access to this information has been shown to improve the chances of a successful cessation attempt [106]. Using phase sequence data labeled by our CRF classifier, users could set realistic expectations by exploring patterns based on thousands of others’ prior experiences. Having a realistic perspective of the process of relapse and recovery may also reduce the number of instances in which users feel too embarrassed or ashamed to return to Forum77 after relapsing. Finally, exposing such data could help people find others who exhibit similar patterns to their own. Finding “people like me” is one of the primary stated reasons for user participation in online health communities [90].

While Forum77 appears to promote detoxification effectively, we observed that users have mixed feelings about how well it supports sustained recovery. It is possible that this could be addressed via altering community dynamics. For example, as we suggested above, continued participation in Forum77 post RECOVERING might help users achieve sustained recovery. Efforts focused on decreasing user churn and increasing member retention could support this. Alternatively, in a similar vein to AA’s sponsorship program, which is thought to promote sustained recovery [237], we might consider automatically matching newcomers with long-term members who would act as formal mentors (or sponsors). Finally, it is possible that the community dynamics that support detoxification are different from those that would support sustained recovery. In this case, a forward reference to a different community might help RECOVERING Forum77 users plan what to do next.

8.8.3 Implications for Addiction Treatment

Forum77 accrues, at scale, information that is difficult to acquire through formal medical channels. First, abusing prescription drugs usually entails deceiving one’s doctor.
Second, addiction research data are typically acquired at point-of-care facilities (e.g., emergency rooms) or surveys at high schools or colleges. Although the ethics and privacy of such analyses must be carefully considered, it is possible that data extracted from sites like Forum77 (e.g., CRF-based transition frequencies, recovery trends, etc.) could help medical professionals and policy makers better understand patients’ experiences with drug abuse. For example, insight into the day to day difficulties of opioid-assisted withdrawal might inform policy for improving the management of this popular treatment down the road. It is also possible that research like ours could illuminate poorly understood aspects of addiction: to our knowledge, ours is the first attempt to quantify the cycle of addiction.

8.8.4 Limitations

One limitation of this work is the selection bias of our subjects: users who come to Forum77 are likely already open to (or at least, considering) the possibility of quitting. This problem is well known to those hoping to analyze the efficacy of Alcoholics Anonymous [20]. As such, care should be taken in applying our results to a more general population who misuse prescription medication. We cannot assume, for example, that a random sample of people who misuse prescription medication would similarly progress towards recovery if they were asked to participate in Forum77. We also cannot draw epidemiological conclusions that apply to the population as a whole from these data. However, the size of Forum77, the prevalence of the opioid epidemic, and the increasing popularity of online health communities alone make the forum worth studying.

Another limitation is the acceptable, but still improvable, accuracy of our CRF classifier.
While we were able to use CRF-based sequences to identify relapse, and whether a user’s final post was written when she was RECOVERING, with high accuracy, improving our underlying classifier performance would open up more nuanced analyses. Finally, having page view data would allow us to incorporate measures of passive participation (“lurking”), which would add a new dimension to our study. We hope to address such opportunities in future work.

8.9 Summary

Our goal in this chapter was to analyze the process of opioid withdrawal, recovery and relapse on Forum77, MedHelp’s Addiction: Substance Abuse community. Drawing on literature from the Addiction community, we first present an overview of prescription drug abuse and present key concepts and terminology (§ 8.2). Next, using Prochaska’s Transtheoretical Model for behavior change, we develop a taxonomy of phases of addiction that comprises three main categories: USING, WITHDRAWING and RECOVERING (§ 8.4). The majority of initiating posts are authored when users are WITHDRAWING. Next, we analyze linguistic and behavioral features across the USING, WITHDRAWING and RECOVERING phases. Several significant differences characterize each phase (§ 8.5), and we leverage these results to train a CRF model to automatically annotate users’ phase sequences (§ 8.6). We can identify relapse events, and whether a user was RECOVERING when she authored her final post, with high accuracy from our CRF-annotated sequences (§ 8.7).

Applying our classifiers to 2,848 users (§ 8.6.3 and § 8.7.3) reveals that progressive transitions towards RECOVERING are much more prevalent than regressive transitions. Moreover, despite the fact that almost 50% of users relapse during their tenure, leaving Forum77 in a state of RECOVERING is the most probable outcome for all users.
Finally, we find that increased participation in the community correlates with a user RECOVERING by the end of her tenure: users who are RECOVERING by their final initiating post are significantly more engaged with the community when they are USING and WITHDRAWING than users who are ¬RECOVERING by their final initiating post. To our knowledge, ours is the first work to investigate the efficacy of online mutual help groups for prescription drug abuse. Our results, which help to illuminate a previously poorly understood resource, suggest that Forum77 is an effective detoxification aid. Based on our findings, we also highlight several ways in which Forum77 might be enhanced to better support its users (§ 8.8), such as exposing aggregate user data describing the cycle of addiction, or matching newcomers with sponsors. Finally, as the type of information shared on Forum77 is difficult to acquire at scale through traditional channels, we note that the tools and insights presented here may be of use to the addiction research community. Chapter 9 Conclusion This dissertation presents both methods for automatically extracting medically-relevant data from patient authored text (PAT) as well as insights derived through the application of these methods. In concert, our contributions both underscore PAT’s latent potential for illuminating poorly understood or clandestine medical topics that may be invisible to traditional medical data collection, as well as offer viable methods that dramatically improve our ability to realize this potential. In this final chapter, we reiterate the contributions of this thesis (§ 9.1) and present principal opportunities for future research (§ 9.2) before offering concluding thoughts (§ 9.3). 9.1 Contribution Summary Our work is predicated on the observation that despite being both abundant and uniquely valuable, patient authored text (PAT) is a heavily underutilized health data resource. 
In Chapter 2 we presented an overview of prior work describing online health seeking behavior and, more specifically, online health community (OHC) participation. Synthesized via a cross-disciplinary literature review, this chapter serves to illuminate how people use the Internet as a health resource.

In Chapter 3 we present a novel review of prior work that utilizes PAT as a primary data source. We discuss goals, data sources, methodological approaches and outcomes, providing a contextual background against which to interpret and evaluate the rest of our work. To our knowledge, this review is the first such synthesis of prior work focused on extracting value from PAT.

The development of ADEPT (Chapter 5) – our CRF classifier that automatically identifies medically-relevant terms in PAT – was prompted by our observation that existing biomedical term annotation toolkits perform poorly on PAT. While statistical classifiers present an attractive alternative, acquiring large, expert-annotated PAT corpora on which to train and test them is a major challenge. To this end, we prove that a crowd of non-experts yields annotations comparable in quality to experts’ for the PAT medical term identification task. Our result offers an alternative method for acquiring large annotated PAT corpora both quickly and cheaply. However, our task design failed to yield similar quality results for more specific PAT annotation tasks (e.g. identifying all symptom terms). This underscores the tradeoff between designing crowdsourcing tasks and annotating the data oneself.

Applying ADEPT to large PAT corpora yields high-level insights useful for summarization and hypothesis generation; however, the tool is too broad for fine-grained analysis. For higher-resolution insights, we narrow our focus to the topic of addiction: a highly prevalent but stigmatized medical condition.
Understanding why people author PAT is crucial for matching it with appropriate research questions. In Chapter 6, we investigate users’ motivations for participating in Forum77: MedHelp’s Addiction: Substance Abuse community. Our thematic analysis over initiating posts concurs with prior work stating that, in general, people seek both informational and emotional support from OHCs. However, our analysis also reveals distinct sub-categories of these two kinds of support. Of particular interest is the update: a prevalent emotional support seeking post in which the user does not explicitly request a community response. We train two logistic regression classifiers: the first distinguishes emotional from informational support-seeking posts; the second, update from non-update posts. Applying these to the entire Forum77 data set reveals that update posts garner slightly more responses on average than non-update posts. The prevalence of update posts suggests that users value the forum as a place where their personal progress can be witnessed by others and recorded for posterity. Forum77 also serves as a repository for information on opioid withdrawal. In fact, Thomas’ Recipe, a protocol for medication-assisted opioid withdrawal that evolved on Forum77, suggests that Forum77 users actively collaborate on developing effective treatment protocols. In Chapter 7 we investigate the distribution of drugs of choice (DOCs) in the Forum77 population. A close reading indicates that identifying DOCs is a context sensitive problem, as a variety of substances can serve as either addiction or treatment. A CRF classifier trained on manually annotated data is able to identify DOCs with high accuracy. 
Our resulting analysis, which compares the Forum77 DOC distribution to those of other drug-using populations, reveals that the Forum77 population struggles disproportionately more with prescription opioids, and disproportionately less with traditionally abused substances such as alcohol, marijuana and cocaine. While it is difficult to ascertain whether Forum77 reflects real-world drug use trends, our results do suggest that Forum77 represents a population of drug users that is not well covered by existing monitoring systems.

Finally, in Chapter 8, we analyze the process of opioid withdrawal, recovery and relapse on Forum77. Through a thematic analysis, we develop a taxonomy describing phases of addiction based on Prochaska’s Transtheoretical Model for behavior change. Phases of addiction are accompanied by distinct physiological and psychological changes, and this is mirrored in users’ usage of the site: exploring activity and linguistic features from posts across the phases USING, WITHDRAWING and RECOVERING reveals several significant differences. We leverage these differences to train a sequence-based CRF model to annotate users’ phase sequences automatically. We can also identify relapse events from these sequences, as well as whether a user’s final post was made in a state of RECOVERING, with high accuracy. Our resulting analysis of all Forum77 users’ transition sequences indicates that despite the fact that relapse is common, leaving the forum in a state of RECOVERING remains the most probable outcome. Moreover, we show that high engagement with the community correlates with the probability of a user RECOVERING by her last initiating post on the forum. Overall, these results suggest that Forum77 is an effective detoxification aid. To our knowledge, this work is the first that attempts to quantify the phases of addiction and the transitions between them.
9.2 Future Work

Given the considerably high levels of enthusiasm currently surrounding health-related technology, our contributions present a timely foundation and reference. However, many limitations to realizing the full value of PAT remain. In this section, we articulate key opportunities for future research.

9.2.1 Supporting the Methodological Process

Figure 9.1 (replicated from Chapter 1) illustrates the stages of our methodological process for extracting insights from PAT. At present, most of the stages in the main process (top row) must be cobbled together in an ad-hoc fashion by the researcher. This hurts efficiency and replicability, and makes comparison between studies difficult. Developing this process into more of a standardized pipeline would enable closer synergy between disparate research efforts, and make it easier to identify quality results. We suggest several areas for improvement below.

[Figure 9.1: flow diagram. Node and edge labels: PAT, close reading, Content Schema, annotation, Labeled Data (human), training, Features, Classifier, application, Labeled Data (auto), processing & analysis, Processed Data, Insights, schema revision, tuning, PAT interface design, Medical Discovery.]

Figure 9.1: Our general methodological process. Nodes in grey show avenues for future work supported by our contributions.

Interface Support for Thematic Analysis

Thematic analyses are frequently used to develop deep insights into text-based corpora and to inform future analyses. Moreover, as we note in Chapters 6 and 8, not only do the results of thematic analyses stand as their own qualitative contribution, they also indicate junctions at which we may shift from a close-reading to a large-scale, automated analysis.
In spite of their complexity and importance, there is no interface support for thematic analyses: provenance of this iterative process is never recorded; reasons (and supporting examples) for making particular decisions about categories are lost; and the clustering, combining, and splitting of categories is done primarily in the researchers’ working memories. Based on our own experience, a starting point for interface support would provide visual “sand boxes” for comparing and organizing data elements into categories; support flagging items that either especially support, or especially contradict, the proposed taxonomy; and facilitate the easy expression of categorization rules. Aside from making thematic analyses more efficient and consistent, externalizing the process in this fashion would make resulting taxonomies easier for a third party to verify, compare against and reuse.

Improved Tools for Annotation

Related to the matter of interface support for thematic analyses is interface support for data annotation. In our work, we conducted this process primarily through the use of shared spreadsheets. While this makes data output easy, it hinders comparison between non-adjacent data elements; does not support the capture of spontaneous updates to annotation rules that arise from encountering novel examples; and only weakly supports collaboration between annotators. Examples of features that an annotation interface might provide include visual support for clustering and comparing data elements; automatic label suggestions based on underlying text analytics; iterative updating of annotation rules in response to new data elements; and automatic evaluation of inter-annotator agreement that facilitates rapid exploration of agreements and errors. Not only would such an interface make the annotation process faster and more consistent, but it may also encourage standardization in annotation and reporting practices.
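To make the last of these features concrete, the following Python sketch shows one way an annotation tool might evaluate agreement between two annotators automatically and surface the items they disagree on. It is an illustration only: the text above does not name an agreement measure, so the use of Cohen's kappa is an assumption, and the function names and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


def disagreements(labels_a, labels_b):
    """Indices of items where the two annotators differ, for manual review."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical phase labels from two annotators for four posts.
annotator_1 = ["USING", "WITHDRAWING", "WITHDRAWING", "RECOVERING"]
annotator_2 = ["USING", "WITHDRAWING", "RECOVERING", "RECOVERING"]
kappa = cohens_kappa(annotator_1, annotator_2)
to_review = disagreements(annotator_1, annotator_2)
```

Returning the disagreeing indices alongside the score reflects the goal stated above: the measure alone summarizes agreement, while the list of conflicting items is what lets annotators explore errors and revise their rules.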
Mapping the Limits of the Crowd in PAT Annotation Tasks

In Chapter 5 we showed that the crowd can replace medical experts for some PAT annotation tasks. However, correctly designing crowdsourcing tasks is sufficiently time consuming that in subsequent chapters, we elected to annotate our data manually. Exploring the crowd’s ability to perform a variety of PAT annotation tasks, however, remains a crucial avenue for future work. Without it, it would be difficult to scale analyses such as ours to larger forums or to multiple data sets. More importantly, however, this would make it easier to create and share large, labeled corpora within the research community. Due to our data sharing agreement with MedHelp, we were unable to share any of our labeled data sets. However, making a large, labeled PAT corpus available to the public would be the most direct way to stimulate research on these topics.

9.2.2 PAT Interface Design & Support

Despite their popularity, the general structure of online health communities (OHCs) has barely changed since the late 1990s. However, both insights and classifiers derived through the PAT analysis pipeline could prove valuable if incorporated into OHCs. As we show in Figure 9.1, closing this loop may create a virtuous cycle, in which interface improvements result in higher volumes and quality of PAT. This, in turn, would lead to more fine-grained insights and improved classifiers. While we do not implement any interface changes in this work, we have several suggestions.

Expose Aggregate Data to Users

OHC participants spend hours doing tasks that often amount to simple aggregation, such as calculating treatment popularity, establishing what Forum77’s most popular DOC is, and estimating the probability of a successful detoxification attempt conditional on a specific withdrawal method. This is inefficient: not only are OHCs difficult to navigate for these sorts of tasks, but often many users will conduct identical analyses at different points in time. In the best case, exposing such data to users could alleviate users’ need to reinvent the wheel for each analysis, freeing their time for alternative tasks.

Support Data Entry

One critique of PAT is that it is often incomplete in terms of containing all relevant medical information. Nudging users towards providing more complete accounts of their conditions would enrich our analyses and enhance PAT’s credibility as a data source. One example is “symptom autocomplete”: rather than relying on users to remember and list all of their symptoms (some of which may not even be severe enough to notice), it would be relatively straightforward to automatically suggest (or “autocomplete”) symptoms based on the ones already entered.

Automatically Construct User Timelines

Personal timelines are commonplace in social media and the quantified self movement. Our work on users’ reasons for participating in Forum77 (Chapter 6) indicates that they value its archival features. Making it easier for users to browse their histories, especially histories enhanced with structured data provided by classifiers, could facilitate an array of tasks, from discovering behavioral patterns to finding other “people like them”. Quantitative uses aside, a timeline comprises a narrative of important life events, failures, and accomplishments that would have strong emotional significance to users. Given the chance, it is likely that users would take it upon themselves to curate their own timelines: a situation that could be leveraged to have users label their own data.

9.2.3 Making the Leap to Medical Discoveries

Our work adds to a growing body of proof that medically-relevant insights are automatically extricable from PAT. However, the holy grail is to move from medical insights to actionable medical discoveries.
In our own work, efforts along these lines might include extending our work on identifying drugs of choice (Chapter 7) to support real-time identification of new drugs, or extending our work on phases of addiction (Chapter 8) to prove that participation in Forum77 measurably reduces the number of relapses that someone experiences. However, making such leaps is nontrivial. Challenges include understanding how signals in PAT correspond to real-world trends, even though PAT rarely contains demographic data; clinically verifying results, which is both slow and expensive; and developing new experimental designs that are compatible with online health seeking behavior. Such challenges could only be met through a close-knit collaboration with medical professionals who agree that PAT is a valuable data source.

9.3 Concluding Remarks

Patient authored text is the abundant byproduct of hours of human intelligence spent on complex, health-related problem solving tasks. As long as this valuable resource is underutilized, researchers, patients and medical professionals alike will be deprived of the unique insights and benefits that it has to offer. Although this dissertation takes a step towards leveraging some of the considerable work that patients do in managing their own health, this is only the tip of the iceberg: we anticipate a future in which technology creates, supports and encourages synergy between patients, providers and data.

Appendix A ADEPT Supplementary Material

Table A.1: The following features are specified when training our CRF. Other features retain their default values as described at http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html

Property Name | Type | Value | Description
useClassFeature | boolean | TRUE | Include a feature for the class (as a class marginal). Puts a prior on the classes which is equivalent to how often the feature appeared in the training data.
useWord | boolean | TRUE | Gives you feature for w
useNGrams | boolean | TRUE | Make features from letter n-grams, i.e., substrings of the word
noMidNGrams | boolean | TRUE | Do not include character n-gram features for n-grams that contain neither the beginning nor the end of the word
useDisjunctive | boolean | TRUE | Include in features giving disjunctions of words anywhere in the left or right disjunctionWidth words (preserving direction but not position)
maxNGramLeng | int | 7 | If this number is positive, n-grams above this size will not be used in the model
usePrev | boolean | TRUE | Gives you feature for (pw,c), and together with other options enables other previous features, such as (pt,c) (with useTags)
useNext | boolean | TRUE | Gives you feature for (nw,c), and together with other options enables other next features, such as (nt,c) (with useTags)
useSequences | boolean | TRUE |
usePrev | boolean | TRUE |
useNext | boolean | TRUE |
maxLeft | int | 1 | The number of things to the left that have to be cached to run the Viterbi algorithm: the maximum context of class features used.
useTypeSeqs | boolean | TRUE | Use basic zeroth order word shape features.
useTypeSeqs2 | boolean | TRUE | Add additional first and second order word shape features
useTypeySequences | boolean | TRUE | Some first order word shape patterns.
wordShape | String | chris2useLC | Either none for no wordShape use, or the name of a word shape function recognized by WordShapeClassifier.lookupShaper(String)

Appendix B F77 Purpose Supplementary Material

Table B.1: Features used to train our purpose classifiers, which distinguish emotional from informational support seeking, as well as update from non-update posts.

Feature Name | Description
containsQuestion | whether the post contains a question (binary)
numQuestions | number of questions the post contains
unigrams | all words present in the post
bigrams | all bigrams (two consecutive words) present in the post
timeMentioned | number of days clean time (if mentioned). Extracted using the following two patterns: (1) X:NUM (day|days|week|weeks|months|month|year|years) (clean|off); (2) on? “day|days” X:NUM, where NUM is any number and “|” represents the OR operator. We then convert weeks/months/years to days and use the number of days as the feature value. The default value is 0.
numPosWords | number of words with a positive sentiment score in SentiWordNet of > 0.8
numNegWords | number of words with a negative sentiment score in SentiWordNet of > 0.8
daysMentioned | whether the user mentions a number followed by the term “day” or “days”
days since last initiating post | the number of days since the user’s last initiating post

Appendix C F77 Drug of Choice Supplementary Material

Table C.1: Drug term resolution map, manually compiled from classifier output. The i column indicates whether the drug category is included in our analysis in Chapter 7.

Category alcohol cigarettes cocaine Drug name Resolved drug terms i alcohol acholic, acoholic, alcahol, alchohol, alchol, alcholo, alcohol, alcoholic, alcoholoc, alcolhol, alcololic, alocholic, alocohol, champagne, beer, beers, vodka, wine, drink alcohol, drink beer, drinking alcohol, drink wine, drinking beer, drinking wine, drinks, beer bottles, beer drinking, alcohol drinking, alcohol drinks, alcoholic drink, drink, drinking ◦ cigarettes cigarettes, cigaretts, cigarrettes, cigars, cigerattes, ciggaretes, ciggarettes, ciggaretts, ciggerettes, ciggies, ciggs, cigrattes, cigs, smoke, smoke cigarets, smoke cigarettes, smoke cigs, smoked, smoker, smokes, smokes cigarettes, smokes ciggaretts, smokes cigs, smokin, smokin cigs, nicotine, smoking cigarettes, smoking, smoking cigs ◦ cocaine cocaine, cocain, cocaine, cocane, coccaine, cociane, coke, coaine, powder, smoke cocaine, smoke coke, smokin coke, smoking coke, smoking crack, smoking crack cocaine ◦ hallucinogens, mescaline ◦ psilocybin mushroom, mushrooms, shrooms, psychedelics ◦ heroin heroin, herioin, herion, heroin, heroin cocaine, heroine, smoking
heroin, smack, smoke heroin, heroin heroin, heroin smoking ◦ marijuana marijuana marijuana, marajuana, marihuana, marijanna, marijauna, marijuan, marijuana, marijuana smoker, marijuanna, marijuanna smoker, marjuana, marjuana smoke, pot, pot brownies, pot smoke, pot smoker, pot smokers, pot smokin, weed, weed smoker, smoke marijuana, smoke marijuanna, smoke pot, smoke weed, smoked pot, smokes marijuana, smokes pot, smokes weed, smokin pot, smoking marijuana, smoking pot, smoking weed, dope, pot smoking, smoking weed, smoking dope, smoke dope, hash, hashish, smoked weed, smokin dope, marijuana smoke, marijuana smoked, marijuana smoking ◦ methadone methadone methadone, mehadone, mehtadone, mehtadone pain killers, metadone, methadoen, methadome, methadon, methadone, methadone pain killers, methadones, methadont, methadose, methandone, methaodne, methatdone, methdaone, methdone, methedome, methedone, methodone, methodone pain pills, methondone, methone, mdone ◦ suboxone sub, suoxone, subbies, subboxin, subboxine, subboxone, subetex, subitext, subloxone, subo, subone, subonoxe, subooxone, subotex, subox, suboxan, suboxe, suboxen, suboxene, suboxens, suboxin, suboxine, suboxins, suboxne, suboxom, suboxome, suboxon, suboxone, suboxones, suboxtone, suboxyn, suboxzone, subozone, subroxone, subs, soboxan, soboxen, soboxene, soboxin, soboxine, soboxion, soboxon, soboxone, soboxones, sabonxon, saboxan, saboxen, saboxin, saboxins, saboxon, saboxone, subtex, subutec, subutek, subutex, subutext, subutox, subuxone, subx, subxone, syboxin, syboxone, symboxin, buprenorphine, buprenorphine, bupenorphine, bupenorphrine, bupernepherine, bupernorphine, bupremorphine, buprenex, buprenophine, buprenorphene, buprenorphine, bupreorphine ◦ hallucinogens hallucinogens psilocybin mushroom heroin Continued on next page 130 APPENDIX C. 
F77 DRUG OF CHOICE SUPPLEMENTARY MATERIAL 131 Table C.1 – Continued from previous page Category opioid Drug name Resolved drug terms i codeine codeine, codeiene, codein, codeine, codeine otc pills, codeine painkillers, codeine sulphate, codene, codene pain pills, codien, codiene, codiene painkillers, codiens, codine, codone, coedine, tylenol 3, tylenol3 ◦ dextropropoxyphene dextropropoxyphene, darovcet, darv, darvacet, darvacets, darvaset, darvecet, darvecette, darvicet, darviset, darvo, darvocet, darvocets, darvocett, darvocette, darvon, darvoncet, darvos, darvoset, darvs, darvys, davocet, davort, dextropropoxyphene ◦ dialudid diladid, diladin, diladud, dilantin, dilatin, dilaudad, dilauded, dilaudeds, dilaudid, dilaudin, dilauid, dillauded, dilodid, dilodids, diloted, dilotid, dilotted, diloudid, diluadid, diluadids, diludid, diluidid, hydromorphone, hydromophone, hydromorophone, hydromorphcontin, hydromorphine, hydromorphone ◦ fentanyl actiq, fenatyl, fentaly, fentanol, fentanyl, fentanyl pain patch, fentanyl pain patches, fentayl, fentenal, fentenyl, fentinol, fentnyl, fentora, fentyl, fentynal, fentynal pain patches, fentynl, fentynol, fentynyl, fetynal ◦ hydrocodone hydrocodone, hrdrocodone, hudro, hycodan, hycodne, hydo, hydocodone, hydorcodone, hydors, hydos, hydos-75, hydr, hydracodone, hydrco, hydrcodene, hydrcodone, hydro, hydro codeine, hydro-codone, hydroc, hydrocdone, hydrochodone, hydroco, hydrocod, hydrocodan, hydrocode, hydrocodeine, hydrocoden, hydrocodene, hydrocodien, hydrocodiene, hydrocodin, hydrocodine, hydrocodine pills, hydrocodne, hydrocodon, hydrocodone, hydrocodones, hydrocodons, hydrocondone, hydrocondone pain medication, hydrocone, hydrocordon, hydrocordone, hydrodcodone, hydrododone, hydrodone, hydromet, hydromorphone hydrochloride, hydros, hydrycodone, hyrdo, hyrdocodone, hyrdos, hyrdro, hyrdrocodone, hyrdros, hyro, hyrocodone, hyros, smoke hydro ◦ lortab lortab, loratab, loratabs, loratb, lorcet, lorcets, lorcett, lorecet, lorecets, 
lorects, loretab, loricet, loritab, loritabs, lorocet, lorocets, lorotabs, lorset, lortab, lortab◦, lortab◦-5, lortabs, lotab, lotabs, loracet, loracets ◦ meperidine, demerol, demeral, demerol, demoral, demorol ◦ morphine morphine, mophine, moraphine, morhine, morhphine, morhpine, moriphine, morophine, morp, morphane, morpheine, morphen, morphene, morphin, morphine, morphines, morphone, morpine, mscontin, morphine mscontin, morphine sulf, morphine sulphate, ms-contin, avinza, ms contin, oramorph, kadian ◦ norco norco, noco, noraco, norc, norce, norco, norco vicodin, norcos, norcs, nordco, noreco, norko, noroco, norocs, narco, narcos ◦ opiates opiates, opates, opiade, opiants, opiat, opiate, opiates, opiats, opiete, opiets, opiot, opiote, opiotes, opitaes, opitate, opitates, opites, oopiate, opaite, opaite pain meds, opaites, opiate meds, opiate narcotic pain pills, opiate narcotics, opiate pain killer, opiate pain killers, opiate pain medication, opiate pain medications, opiate pain medicines, opiate pain meds, opiate pain pill, opiate pain pills, opiate painkillers, opium, opiads-heroine/percs/hydro, opiate drug, opiate narcotic pain, opiate pain, opiate pain med, opiates percs, opiates vicodin, opiates xanax, oppiates, smoking opium ◦ opioids opioids, opiod, opiods, opioid, opioids, opoid, opoids, opiod drug, opiod narcotic, opioid meds, opioid pain med, opioid pain medications, opioid pain meds ◦ oxycodone oxycodone, roxcodone, roxi, roxicdone, roxicet, roxicets, roxicodne, roxicodone, roxicodones, roxicontin, roxicontins, roxicotin, roxies, roxiodone, roxis, roxocodone, roxy, roxy codone, roxy3, roxy4, roxycet, roxycodine, roxycodone, roxycodones, roxycontin, roxycontins, roxycotin, roxycottin, roxys, oxcodone, oxcontin, oxcotin, oxcy, oxcycodone, oxcycontin, oxcycotin, oxcyontin, oxcys, oxen, oxey, oxeys, oxi, oxicoden, oxicodon, oxicodone, oxicontin, oxicotin, oxicotines, oxicoton, oxie, oxie codine, oxies, oxocodone, oxtcontin, oxxy, oxy, oxy codone, oxy 
contin, oxy-contin, oxy4, oxy8, oxy8s, oxyc, oxyco, oxycocet, oxycocets, oxycod, oxycode, oxycodeine, oxycoden, oxycodene, oxycodin, oxycodine, oxycodne, oxycodon, oxycodone, oxycodones, oxycodpne, oxycoidone, oxycondin, oxycondone, oxyconin, oxycontiin, oxycontin, oxycontine, oxycontins, oxyconton, oxycontontin, oxycoontin, oxycotdin, oxycoten, oxycotin, oxycotine, oxycotins, oxycotion, oxycoton, oxycotten, oxycottin, oxycottins, oxycotton, oxydocone, oxydodone, oxydone, oxyicodone, oxyies, oxyir, oxynorm, oxys, oxytocin, oxyxodones, oxyz, oycodone, blues, blue pills, ocs, ocycodone, oxy hydro, oxy ocs, oxy vics, oxy-norm, oxy/percs/tabs, oxycodone oxycontin, oxycodone pain meds, oxycontin, oxycotontin, smoking oxy, smoking oxycontin ◦ meperidine oxymorphone, opana, opanas ◦ percocet percocet, perc, percacet, percacets, percaset, percasets, perccet, percecet, percecets, percet, percets, percicet, percks, perco, percocect, percocet, percocete, percocets, percocett, percocette, percocetts, percocite, percocoet, percoct, percodan, percodone, percoet, percoets, percoset, percosets, percot, percote, percots, percs, perkacet, perkeset, perkocet, perkocets, perks, perocaet, perocet, perocets, persocet, pecocet, pecocets, pers, perts ◦ tramadol tramadol, tradol, tram, tramacet, tramadal, tramadaol, tramado, tramadol, tramadole, tramadols, tramadon, tramal, tramdol, tramedol, tramidol, trammadol, tramodal, tramodol, tramol, trams, tranadol, ulram, ultam, ultracet, ultram, ultrams, ultrm, ultrum ◦ oxymorphone Continued on next page APPENDIX C. 
F77 DRUG OF CHOICE SUPPLEMENTARY MATERIAL 132 Table C.1 – Continued from previous page Category OTC Drug name Resolved drug terms i vicodin vics, vicks, vic, vicadan, vicaden, vicadin, vicadine, vicadon, viccodin, viccoding, vicdin, vicdon, vicdone, viciden, vicidin, vicidine, vicidon, vicidons, viciodin, vico, vicodan, vicodein, vicodeine, vicoden, vicodene, vicodens, vicodent, vicodien, vicodine, vicodines, vicoding, vicodins, vicodion, vicodn, vicodon, vicodone, vicodyn, vicoin, vicondin, vicos, vicotin, vidodin, vik, vikcs, vike, vikes, vikoden, vikodin, viks, viocdin, viocidin, viocoden, viodin, vivodin, vivodins, vocidin, vocodin, vicodin ◦ vicoprofen vicaprofen, vicobrofin, vicoprofen, vicoprofin, vicoprohen, vicoprophen, vicoprophin, vicroprofen, vicuprofen ◦ acetaminophen, acetamenophin, acetamenophine, acetaminaphen, acetaminaphin, etaminophen, acetem, aceteminophen, acetomenophine, acetominophen, acetominophin ◦ acetaminophen benadryl, benadril, benadryl, benadryll, bendryl, benedryl, benodryl, benydryl ◦ dextromethorphan, dxm ◦ ibuprofen advil, ibeprofen, ibogaine, ibp, ibprofen, ibprofin, ibprohin, ibprophin, ibu, ibupofen, ibupro, ibuprofen, ibuprofin, ibuprophen, ibuprophin, ibuprophren, ibupropin, mortin, mortrin, motrin, neurofen, neurophen, nurofen ◦ melatonin melantonin, melatonin, meletonin, melitonin, melotonin ◦ naproxen naproxen, aleeve, aleve, aleive, alieve, alleve ◦ nyquil, nyquill ◦ paracetamol, paracetemol, paracetomal, paracetomol, parecetamol ◦ tyelonol, tyenol, tyl, tylanol, tylenal, tylenol, tylenol oc, tyleonol, tylinol, tylonal, tylonel, tylonol, tylox, tyloxes, tynenol, tyneol, tylenol ◦ benadryl dextromethorphan nyquil paracetamol tylenol sedative ac- alprazolam ativan barbiturates benzodiazepine buspirone chlordiazepoxide clonazepam diazepam eszopiclone fioricet flunitrazepam gabapentin alpralozam, alprazalam, alprazolam, alprozalam, alprozolam, ◦ ativan, adavan, adavant, adavin, adivan, advan ◦ barbiturates, barbituates, 
butalbital, phenobarbital, barbs ◦ benzodiazepine, benzo, benzocaine, benzodiazapenes, benzodiazapines, benzodiazepams, benzodiazepines, benzodiazpines, benzoes, benzoids, benzos, oxazepam ◦ buspirone, buspar ◦ chlordiazepoxide, librium ◦ clonazepam, klnopin, klodopin, klon, klonapin, klonapins, klonepin, kloni, klonidine, klonipin, klonipin oxycontins, klono, klonoin, klonoipn, klonopan, klonopin, klonopine, klonopines, klonopins, klonpin, klonpion, klonzapam, klopin, kloponin, kolonapin, kolonipin, kolonopin, kolonopins, kpins, clonazepam, clonazepams, clonazepham, clonozepam, clonapin, clonapine, clonopin, clonopine, clonipin, clonipine, clonipins, colonopin ◦ diazapam, diazapams, diazepam, diazipam ◦ eszopiclone, lunesta ◦ fioricet, fioricet, fierocet, fioracet, fiorcet, fiorecet, fiorecett, fioricet, fiorocet, fiurecet, floricet, fiorinal, fiorinals, fiorinol, fiornal, fiorinal, fiorinals, fiorinol, fiornal ◦ flunitrazepam, rohypnol ◦ gabapentin, nerontin, neuotin, neuratin, neurontin, neuronton, neurontrin, neurotin, neuroton, neurontin ◦ ◦ ghb lorazapam, lorazapan, lorazepam, lorazepan, lorazopam, lorazpam, lorezapam, lorezepam ◦ soma, soma pills, somas ◦ valium valium, valiums, vallium, vallum, valuem, valuim, valuims, valum, valume ◦ xanax xaanx, xana, xanac, xanacs, xananx, xanax, xanex, xanix, xannax, xannies, xantax, xantex, xanx, xanxa, xanxax, xnax, xznax, zanax, zanaz, zanex, zanix, zannax, zantac, zanx ◦ zolpidem zolpidem, ambein, ambian, ambiem, ambien ◦ sedatives sedatives, ketamine ◦ adderal, adderal, adderall, adderalll, adderol, adderral, adderrall, adderrol, addreall, aderol, aderoll, aderrall, dexedrine, dextroamphetamine ◦ amphetamine, amphetamines ◦ lorazepam soma stimulant adderall amphetamine ◦ amphetamine LSD mdma ◦ lsd, acid ◦ mdma, ecstacy, ecstasty, ecstasy, exstacy, extacy Continued on next page APPENDIX C. 
F77 DRUG OF CHOICE SUPPLEMENTARY MATERIAL 133 Table C.1 – Continued from previous page Category Drug name methamphetamine methylphenidate modafinil general antidepressant general Resolved drug terms i methamphetamine, meth, meth smoker, methamphedamines, methamphetamine, methamphetamines, methamphetimines, methanphetamine, smoking meth ◦ methylphenidate, ritalin, ritilan, ritilin, concerta ◦ alertec ◦ meds, drugs, drug, med narcotics narcotics, narc, narc meds, narc pain meds, narc painkillers, narcan, narcanon, narcatic, narcatics, narcodics, narcotic, narcotic meds, narcotic pain killers, narcotic pain medication, narcotic pain medications, narcotic pain medicine, narcotic pain medicines, narcotic pain meds, narcotic pain pill, narcotic pain pills, narcotic pain reliever, narcotic pain relievers, narcotic pain-killers, narcotic painkillers, narcotic pills, narcotics, narcotis, narcs, narctoics painkillers pain pill, analgesic, analgesics, pain meds, pain pills, pain killers, painkillers, pain medication, pain medicine, pain medications, pain relievers, pain kiilers, pain killer, pain killer pills, pain killlers, pain kills, pain kller, painpills, pain med, pain reliever, painkillers hydros, painkliiers, painmeds, pains meds, pill, pills, narcotic pain, pils, ls, pilss, pharmaceuticals, pain amitriptyline amitriptyline, amiltriptyline, amitriptaline, amitripthyline, amitriptyline, amitripyline, amitryptaline, amitryptilline aripiprazole aripiprazole, abilfy, abilify citalopram citalopram, celexa, celexia, celxa duloxetine duloxetine, cymbalta, cybalta, cymalta, cymbalata, cymbalta, cymbalts, cymbata, cymbolta, cynbalta fluoxetine prozac, fluoxetine, fluoxtine lexapro paroxetine paroxetine, paroxetine, paroxotine, paxatine, paxial, paxil, paxill trazodone trazadone, trazodone venlafaxine wellbutrin zoloft NA albuterol venlafaxine, effexor, efforex, effxor, eflexor bupropion, buproprio, wellbutrin, welbrutrin, welbutrin, wellbrutrin, wellbutrin zoloft, zoloff 
albuterol sulphate, albuterol amoxicillin amoxicillin, amoxcillin, amoxicillin, amoxxillin antibiotics antibiotics, anitbiotics, anitdepressants, anphetamines, antabuse, antibiotic, antibiotics, antibitics, antibotics carisoprodol carisoprodol clonidine clonidine, cloadine, clondine, clonidine, clonine, clonodin, clonodine, colodine, colondine, colonidine, colonodine cyclobenzaprine cyclobenzaprine, flexaril, flexarill, flexeral, flexerall, flexeril, flexerill, flexerils, flexerol, flexiril, flexirils, flexirl, flexril, flexrill naloxone naloxone, nalorex, naloxone naltrexone naltrexone, naltexone, naltraxone, naltrex, naltrexone, naltrexone hydrochloride, naltexone, naltraxone, naltrex, naltrexone, naltrexone hydrochloride prednisone prednisone, predensone, predinsone, predisolone, predisone, prednisolone, prednison, prednisone, prednizone pregabalin pregabalin, lyrica quetiapine quetiapine, seraquel, seraquil, sereoqol, serequel, serequil, serezone, seriquil, seroqel, seroquel, seroquell, seroquels, seroquil, serqual, serquil steroids steroids, roids vitamins vitamins, vitaimns, vitamans, vitamians, vitamines, vitamins, vitamns, vitams, vitc, vite, vitiamins, vitiams, vitimans, vitimins, vits, supplements zaleplon zaleplon, sonata APPENDIX C. F77 DRUG OF CHOICE SUPPLEMENTARY MATERIAL 134 Table C.2: The default feature list for Stanford’s NER classifier is at nlp.stanford.edu/nlp/javadoc/ javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html. Here, we list all features whose default values were changed to train our DOC classifier. 
Feature Name | Feature Value
useTag | true
useClassFeature | true
useWord | true
maxNGramLeng | 3
useNGrams | true
usePrev | true
useNext | true
useSequences | true
usePrevSequences | true
maxLeft | 1
useTypeSeqs | false
useTypeSeqs2 | false
useTypeySequences | false
wordShape | chris2useLC
useLemmas | true
useDistSim | true
distSimLexicon | We used Twitter word clusters [189] and word clusters generated using the Brown hierarchical word clustering algorithm [32, 157] on all MedHelp posts.
useDisjunctive | true
disjunctionWidth | 3
cleanGazette | true
gazette | We utilized a dictionary composed from several online lists of commonly misused substances. Table C.3 shows all dictionary terms.

Table C.3: Gazette of common substances used as a feature in the DOC classifier. This gazette was compiled from a range of online resources.

Acamprosate, acid, actiq, adderall, aerosol propellants, alcohol, alprazolam, ambien, amidone, amobarbital, amphetamine, amphetamines, amytal, anadrol, anexsia, angel dust, antabuse, apache, ativan, avinza
Barbs, beer, bennies, bidis, big o, biocodone, biocondone, biphetamine, biscuits, black beauties, black stuff, blue heaven, blues, blunt, buprenorphine, butalbital, butane propane, butorphanol
Cactus, campral, captain cody, carisoprodol, cat valium, chalk, charlie, china girl, china white, chlordiazepoxide, cigarettes, cigars, clarity, clonazepam, clonidine, cocaine, cocaine hydrochloride, codeine, cody, coke, concerta, crack, crack cocaine, crank, crosses, crystal, crystal meth, cubes, cyclohexyl
Damason-p, dance fever, darvocet, darvon, demerol, demmies, depade, depo-testosterone, desoxyn, dexedrine, dextroamphetamine, dextromethorphan, dextropropoxyphene, dextrostat, di-gesic, diacetylmorphine, diazepam, dicodid, dilaudid, dillies, disulfiram, dolophine, dope, downers, duodin, durabolin, duragesic, duramorph, dxm
Ecstasy, empirin, empirin with codeine, equipoise, eszopiclone
Fentanyl, fioricet, fiorinal, fiorinal with codeine, fizzies, flake, flunitrazepam, forget-me pill
Gamma-hydroxybutyrate, ganja, gasoline, georgia home boy, ghb, glues, goodfella, goop, grievous bodily harm, gym candy
Halcion, hash, hash oil, hearts, hemp, heroin, hillbilly, hycodan, hydrococet, hydrocodone, hydromorphone, hydros
Inhalant, isoamyl isobutyl
Jackpot, jif, joint
Kadian, kapanol, ketalar sv, ketamine, klonopin
La turnaround, laam, laudanum, laughing gas, levacetylmethadol, librium, liquid ecstasy, liquid x, liquor, little smoke, lorazepam, lorcet, lortab, love boat, lover’s speed, lsd, luminal, lunesta, lysergic acid diethylamide
Magic mint, magic mushrooms, maria pastora, marijuana, mary jane, meperidine, meperidine hydrochloride, mesc, mescaline, meth, methadone, methadose, methadrine, methamphetamine, methaqualone, methylphenidate, mexican valium, microdot yellow sunshine, miss emma, monkey, morphine, mrs. o, ms contin, msir, murder 8, mushrooms
Naltrexone, nembutal, nitrites, nitrous oxide, norco, numorphone, numporphan
O bomb, o.c., octagons, opana, opium, oramorph, orlaam, oxandrin, oxy, oxycet, oxycodone, oxycontin, oxycotton
Paint thinners, palladone, panacet, paregoric, pcp, peace, peace pill, pentobarbital, percocet, percocet:oxy, percodan, percs, peyote, phencyclidine, phennies, phenobarbital, poppers, pot, pumpers, purple passion
Quaalude
R-ball, red birds, reds, reefer, revia, ritalin, roach, robitussin, robitussin a, robitussin a-c, robitussin b, robitussin c, robo, robotripping, roche, rohypnol, roids, roofies, roofinol, rophies, roxanol, roxicodone, roxicondone, ryzolt
Sally-d, salvia, schoolboy, secobarbital, seconal, shepherdess’s herb, shrooms, sinsemilla, skag, skippy, sleeping pills, smack, smoke, snappers, solvents, soma, sonata, special k, speed, steroids, stilnox, stop signs, sublimaze, suboxone, subutex, symtan
Tango and cash, temesta, the smart drug, tnt, tooies, toot, tramadol, tramal, tranks, triazolam, triple c, truck drivers, tussionex, tylenol, tylenol with
codeine, tylox Ultram, uppers Valium, vicodin, vicoprofen, vike, vitamin k, vitamin r, vivitrol Watson-387, weed, white horse, white stuff, wine Xanax, xodol Yellow jackets, yellows Zaleplon, zolpidem, zydone Appendix D F77 Phase Supplementary Material Table D.1: LIWC features for the three classes in the labeled dataset over initiating posts. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes activity features). Column c denotes (◦) if the feature is used in our CRF classifier. Initiating Post Linguistic Features Word count Dic Numerals Function words Pronoun Personal pronoun Pronoun: I Pronoun: you Pronoun: he/she Pronoun: they Pronoun: impers. Verb Present tense Numbers Social Humans Affect Affect: positive Affect: anxiety Cognitive Mech. Certain Inhibition See Feel Biological Body Health Relative Time Home Comma QMark Other Punctuation c p USING Mean Median ◦ ◦ ◦ ◦ * *** *** *** *** *** *** *** *** *** * ** *** ** *** * *** *** ** * * * * *** *** *** *** *** *** *** ** * *** 208.20 89.26 1.28 60.50 18.51 12.83 9.72 0.98 1.14 0.65 5.68 18.54 12.56 0.71 7.60 0.49 5.30 2.80 0.61 17.27 1.21 0.50 0.34 0.73 3.87 0.58 3.00 13.46 7.24 0.30 3.01 1.35 0.81 ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ 151 90.17 0.89 60.92 18.68 13.05 9.97 0.41 0 0.20 5.33 18.69 12.55 0.48 6.59 0 5.00 2.45 0.25 16.98 1.03 0.23 0 0.45 3.46 0 2.63 13.39 6.86 0 2.17 0.52 0 WITHDRAWAL SD Mean Median 211.06 4.89 1.51 5.31 4.32 3.83 3.60 1.70 2.08 1.05 2.82 3.76 3.90 0.93 4.79 0.76 2.76 1.99 0.88 4.50 1.22 0.70 0.65 1.10 2.63 1 2.29 4.65 3.46 0.54 3.36 2.87 1.77 178.92 88.10 1.75 58.40 16.99 11.49 9.02 1.02 0.74 0.47 5.49 17.64 11.53 0.75 6.38 0.40 5.76 3.33 0.55 17.14 1.41 0.41 0.30 1.18 4.01 1.13 2.58 15.04 8.51 0.40 2.75 1.34 0.89 136 127.00 88.89 1.33 59.28 17.17 11.54 9.14 0.13 
0 0 5.26 17.59 11.24 0.37 5.26 0 5.54 2.86 0 17.09 1.21 0 0 0.83 3.70 0.63 2.13 14.75 7.87 0 1.94 0.40 0 POST-WITHDRAWAL SD Mean Median 168.81 6.26 1.97 6.45 4.70 4.19 3.76 2.04 1.82 1.12 2.81 4.20 4.09 1.12 5.18 0.79 3.09 2.85 0.98 4.95 1.41 0.74 0.80 1.50 2.90 1.53 2.36 5.25 4.21 0.77 3.27 2.58 1.91 183.23 89.38 1.32 59.74 17.97 11.88 7.89 2.05 1 0.54 6.09 18.13 11.95 0.54 8.85 0.57 6.41 4.14 0.45 17.93 1.57 0.43 0.50 0.85 3.31 0.68 2.20 13.72 7.33 0.68 2.19 1.50 0.62 124.50 90.54 0.83 60.48 18.16 11.86 8.18 0.99 0 0 5.76 17.96 11.63 0 7.89 0 6.11 3.50 0 17.96 1.36 0 0 0.50 2.89 0 1.72 13.61 7.02 0.14 1.63 0 0 SD 209.24 6.59 2.04 7.07 5.28 4.60 4.31 2.89 2.46 1.03 3.35 4.91 4.45 0.89 5.90 1.04 3.52 3.16 0.90 5.11 1.53 0.76 1.14 1.23 2.72 1.12 2.25 5.23 4.23 1.18 2.43 4.92 2.05 APPENDIX D. F77 PHASE SUPPLEMENTARY MATERIAL 137 Table D.2: LIWC features for the three classes in the labeled dataset. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes activity features). Column c denotes (◦) if the feature is used in our CRF classifier. Response Post Linguistic Features c Word count Words per sentence Numerals Function words Personal Pronouns Pronoun: she/he Pronoun: they Pronoun: impers. Article Verb Aux. verb Future Preposition Conjunction Quantitative Social Affect Affect: positive Affect: negative Affect: anxiety Cognitive Proc. Discrepancy Tentative Exclusive Perceptual proc. Feel Biological Body Health Sexual Ingetion Relativity Time Money Assent Colon Exclamation Dash Other punct. All punct. 
◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ p USING Mean Median *** *** * *** *** ** *** ** *** ** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** * ** ** * *** ** *** ** *** *** 494.69 19.21 0.75 59.01 10.86 0.68 0.66 5.48 4.91 17.26 10.67 1.50 11.63 6.39 3.00 10.26 5.73 3.72 1.96 0.36 19.37 2.32 3.35 3.35 1.52 0.64 3.46 0.52 2.68 0.15 0.17 11.46 5.29 0.32 0.27 0.09 1.02 0.79 3.41 22.07 347.00 15.40 0.43 59.85 11.36 0 0.41 5.67 4.98 18.15 11.11 1.44 12.27 6.76 2.99 10.11 5.76 3.53 1.92 0.24 17.81 2.32 3.25 3.40 1.48 0.53 3.20 0.28 2.45 0 0 11.82 5.10 0.13 0.07 0 0.34 0.28 2.84 21.51 WITHDRAWAL SD Mean Median 506.67 18.60 1.02 4.56 3.99 1.35 0.91 2.20 2.06 4.94 3.51 1.07 3.57 2.33 1.52 4.77 2.68 2.43 1.34 0.47 7.77 1.31 1.79 1.62 1.07 0.70 2.17 0.78 1.85 0.35 0.39 4.39 2.90 0.55 0.50 0.20 1.79 2.08 2.55 9.71 427.38 17.04 0.95 56.95 10.21 0.44 0.49 5.57 4.75 17.13 10.37 1.50 11.19 6.18 2.94 8.83 6.55 4.61 1.90 0.40 18.71 1.92 3.12 3.07 1.90 0.91 3.42 0.78 2.24 0.14 0.30 12.36 5.88 0.28 0.40 0.15 2.25 0.82 4.29 25.75 284.00 14.09 0.68 57.69 10.53 0 0.27 5.78 4.96 17.82 10.68 1.43 11.66 6.58 2.88 8.75 6.34 4.10 1.87 0.23 17.43 1.88 3.09 3.18 1.81 0.76 3.22 0.45 1.95 0 0 12.68 6.06 0 0.18 0 0.82 0 3.53 23.69 POST-WITHDRAWAL SD Mean Median 487.46 14.73 1.12 5.41 3.81 1.16 0.66 2.36 2.02 4.88 3.44 1.13 3.38 2.46 1.64 4.23 3.25 3.17 1.33 0.55 7.83 1.33 1.77 1.66 1.34 0.85 2.46 1.08 1.90 0.36 0.66 4.70 3.12 0.56 0.81 0.42 5.08 2.20 3.22 14.52 356.29 14.98 0.95 55.82 10.86 0.64 0.49 5.10 4.20 16.09 9.66 1.10 10.61 5.72 2.50 9.78 7.54 5.84 1.67 0.32 18.77 1.63 2.55 2.56 1.87 0.65 2.71 0.52 1.70 0.30 0.25 11.90 5.66 0.23 0.62 0.27 4.52 0.62 5.64 29.69 210.50 12.99 0.56 57.06 11.58 0 0.13 5.32 4.41 17.23 10.33 1.01 11.51 6.13 2.58 9.81 7.33 5.13 1.50 0 16.80 1.60 2.45 2.60 1.68 0.45 2.41 0.19 1.32 0 0 12.50 5.69 0 0.27 0 1.68 0 4.29 26.82 SD 439.75 14.25 1.49 7.17 4.71 1.63 0.90 2.75 2.23 5.78 3.96 1.03 4.14 2.69 1.67 5.45 4.31 4.36 1.51 0.61 10 1.30 1.96 1.83 1.55 
0.76 2.39 0.90 1.76 0.89 0.71 5.37 3.33 0.42 2.01 0.84 8.40 1.64 6.35 19.27 Days since last init. post Days since last self resp. Days since last response Days since last activity # initiating posts # responses authored # replies received # respondants # self responses ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ . . ** *** *** *** *** *** *** *** *** *** *** *** *** p # Responses# # Initiating terms terms terms RECOVERING WITHDRAWING USING Days clean Days mentioned # questions # USING terms # WITHDRAWING terms # RECOVERING terms *** *** ** *** *** *** ** *** *** ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ Post and Response Content Characteristics Today # initiating posts # responses authored # initiating posts authored # self responses authored Last 5 days # responses authored All time # initiating posts authored # self responses authored # responses authored Activity Characteristics c 0.31 0.86 0.53 421.15 52.10 2.94 0.73 0.50 0.38 5.15 3.82 1.57 0.93 1.37 1.87 1.02 8.84 13.93 26.90 1.36 50.94 66.34 73.37 39.56 Mean 0.00 0.00 0.00 14.00 10.00 2.00 0.00 0.00 0.00 4.00 3.00 1.00 0.00 0.00 0.00 1.00 5.00 5.00 6.00 1.00 5.00 9.00 5.00 3.00 Med IQR 0.00 1.00 1.00 175.00 38.25 3.00 1.00 1.00 1.00 5.00 3.00 2.00 1.00 2.00 2.00 0.00 10.00 18.00 21.00 1.31 24.00 43.50 27.00 13.00 USING 0.00 0.00 0.00 17.79 11.86 1.48 0.00 0.00 0.00 2.97 2.97 1.48 0.00 0.00 0.00 0.00 5.93 7.41 8.90 1.02 5.93 11.86 5.93 2.97 MAD 0.19 1.18 0.53 47.50 19.08 2.35 0.35 1.11 0.39 5.52 4.05 1.89 2.01 3.32 5.48 1.06 8.78 13.80 23.61 1.28 21.04 29.94 33.51 16.66 Mean 0.00 1.00 0.00 5.00 5.00 2.00 0.00 1.00 0.00 4.00 3.00 1.00 1.00 1.00 1.00 1.00 5.00 8.00 8.00 0.82 2.00 2.00 2.00 1.00 Med 0.00 2.00 1.00 7.00 7.00 2.00 1.00 2.00 1.00 5.00 3.00 3.00 3.00 4.00 6.00 0.58 8.00 15.00 21.00 1.26 5.00 8.00 6.00 2.00 IQR WITHDRAWING 0.00 1.48 0.00 4.45 4.45 1.48 0.00 1.48 0.00 2.97 2.97 1.48 1.48 1.48 1.48 0.64 4.45 8.90 10.38 0.85 1.48 1.48 1.48 0.00 MAD 0.18 0.76 0.78 125.97 57.03 2.60 0.25 0.44 0.94 6.09 4.68 1.53 1.81 2.89 15.20 0.58 20.73 33.26 178.69 0.53 
31.04 42.05 28.68 17.76 Mean 0.00 0.00 0.00 45.00 27.00 2.00 0.00 0.00 1.00 4.00 3.00 1.00 1.00 0.00 5.00 0.33 14.00 23.00 67.00 0.22 4.00 6.00 2.00 1.00 Med 0.00 1.00 1.00 74.00 48.00 2.00 0.00 1.00 1.00 6.00 4.00 2.00 3.00 4.00 16.00 0.87 22.00 36.25 159.25 0.35 12.00 17.00 5.00 4.00 IQR RECOVERING 0.00 0.00 0.00 43.00 28.17 1.48 0.00 0.00 1.48 4.45 2.97 1.48 1.48 0.00 7.41 0.42 13.34 23.72 83.77 0.21 4.45 7.41 1.48 0.00 MAD Table D.3: Activity and content-based features for the three classes in the labeled dataset. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes 160 LIWC variables). Column c denotes (◦) if the feature is used in our CRF classifier. APPENDIX D. F77 PHASE SUPPLEMENTARY MATERIAL 138 Bibliography [1] Alcoholics Anonymous (“Big Book,” 4th ed.). AA World Services, Inc. (2001). [Online: http: //www.aa.org/bigbookonline, accessed 20-May-2014]. [2] Narcotics Anonymous Annual Membership Survey. Narcotics Anonymous (2011). [Online: http://www.na.org/admin/include/spaw2/uploads/pdf/PR/NA_Membership_Survey.pdf, accessed 12-August-2013]. [3] Vital signs: Overdoses of prescription opioid pain relievers United States, 1999-2008. Center for Disease Control. Morbidity and Mortality Weekly Report. (2011). [Online: http://www.cdc.gov/ mmwr/preview/mmwrhtml/mm6043a4.htm, accessed 93/4/2014.]. [4] Addiction medicine, closing the gap between science and practice. CASAColumbia (2012). [Online: http://www.casacolumbia.org/download/file/fid/1177, accessed 4/5/2014.]. [5] Commonly abused prescription drugs. National Institute on Drug Abuse (2012). [Online: http:// www.drugabuse.gov/sites/default/files/rx_drugs_placemat_508c_10052011.pdf, accessed 28-May-2014]. [6] Opiate withdrawal. MedlinePlus - U.S. National Library of Medicine (2012). [Online: http://www. 
nlm.nih.gov/medlineplus/ency/article/000949.htm, accessed 28-May-2014].
[7] Internet user demographics. [Online: http://www.pewinternet.org/data-trend/internet-use/latest-stats/, accessed 7/1/2014].
[8] Prescription painkiller overdoses: A growing epidemic, especially among women. Vital Signs. CS238899B. Center for Disease Control. (2013). [Online: http://www.cdc.gov/vitalsigns/pdf/2013-07-vitalsigns.pdf, accessed 9/4/2014].
[9] State and County QuickFacts. U.S. Census Bureau (2013). [Online: http://quickfacts.census.gov/qfd/states/00000.html, accessed 28-August-2014].
[10] Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., and Liu, B. Predicting flu trends using Twitter data. In Computer Communications Workshops, IEEE (2011), 702–707.
[11] Ahmad, F., Hudak, P. L., Bercovitz, K., Hollenberg, E., and Levinson, W. Are physicians ready for patients with Internet-based health information? Journal of Medical Internet Research 8, 3 (2006), e22.
[12] Alpers, G. W., Winzelberg, A. J., Classen, C., Roberts, H., Dev, P., Koopman, C., and Barr Taylor, C. Evaluation of computerized text analysis in an Internet breast cancer support group. Computers in Human Behavior 21, 2 (2005), 361–376.
[13] Anand, S. G., Feldman, M. J., Geller, D. S., Bisbee, A., and Bauchner, H. A content analysis of e-mail communication between primary care providers and parents. Pediatrics 115, 5 (2005), 1283–1288.
[14] Anderson, J. G., Rainey, M. R., and Eysenbach, G. The impact of cyberhealthcare on the physician–patient relationship. Journal of Medical Systems 27, 1 (2003), 67–84.
[15] Aramaki, E., Maskawa, S., and Morita, M. Twitter catches the flu: detecting influenza epidemics using Twitter. In Empirical Methods in Natural Language Processing, ACL (2011), 1568–1576.
[16] Aronson, A. R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In American Medical Informatics Association Annual Symposium, AMIA (2001), 17.
[17] Aronson, A.
R., and Lang, F.-M. An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association 17, 3 (2010), 229–236.
[18] Ayers, J. W., Ribisl, K. M., and Brownstein, J. S. Tracking the rise in popularity of electronic nicotine delivery systems (electronic cigarettes) using search query surveillance. American Journal of Preventive Medicine 40, 4 (2011), 448–453.
[19] Baccianella, S., Esuli, A., and Sebastiani, F. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Language Resources and Evaluation (2010).
[20] Bebbington, P. E. The efficacy of Alcoholics Anonymous: the elusiveness of hard data. The British Journal of Psychiatry 128, 6 (1976), 572–580.
[21] Bell, V. Online information, extreme communities and Internet therapy: Is the Internet good for our mental health? Journal of Mental Health 16, 4 (2007), 445–457.
[22] Bender, J. L., Jimenez-Marroquin, M.-C., and Jadad, A. R. Seeking support on Facebook: a content analysis of breast cancer groups. Journal of Medical Internet Research 13, 1 (2011), e16.
[23] Benton, A., Ungar, L., Hill, S., Hennessy, S., Mao, J., Chung, A., Leonard, C. E., and Holmes, J. H. Identifying potential adverse effects using the web: A new approach to medical hypothesis generation. Journal of Biomedical Informatics 44, 6 (2011), 989–996.
[24] Berger, M., Wagner, T. H., and Baker, L. C. Internet use and stigmatized illness. Social Science & Medicine 61, 8 (2005), 1821–1827.
[25] Berland, G. K., Elliott, M. N., Morales, L. S., Algazy, J. I., Kravitz, R. L., Broder, M. S., Kanouse, D. E., Muñoz, J. A., Puyol, J.-A., Lara, M., et al. Health information on the Internet: accessibility, quality, and readability in English and Spanish. Journal of the American Medical Association 285, 20 (2001), 2612–2621.
[26] Bernstein, M. S., Little, G., Miller, R. C., Hartmann, B., Ackerman, M. S., Karger, D. R., Crowell, D., and Panovich, K.
Soylent: a word processor with a crowd inside. In User Interface Software and Technology, ACM (2010), 313–322.
[27] Birnbaum, H. G., White, A. G., Schiller, M., Waldman, T., Cleveland, J. M., and Roland, C. L. Societal costs of prescription opioid abuse, dependence, and misuse in the United States. Pain Medicine 12, 4 (2011), 657–667.
[28] Biyani, P., Caragea, C., Mitra, P., and Yen, J. Identifying emotional and informational support in online health communities. In Computational Linguistics, ICCL (2014), 827–836.
[29] Braithwaite, D. O., Waldron, V. R., and Finn, J. Communication of social support in computer-mediated groups for people with disabilities. Health Communication 11, 2 (1999), 123–151.
[30] Braun, V., and Clarke, V. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101.
[31] Brennan, P. F., and Aronson, A. R. Towards linking patients and clinical information: detecting UMLS concepts in e-mail. Journal of Biomedical Informatics 36, 4 (2003), 334–341.
[32] Brown, P. F., Desouza, P. V., Mercer, R. L., Pietra, V. J. D., and Lai, J. C. Class-based n-gram models of natural language. In Computational Linguistics, vol. 18, ICCL (1992), 467–479.
[33] Brownstein, J. S., Freifeld, C. C., Reis, B. Y., and Mandl, K. D. Surveillance Sans Frontieres: Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS Medicine 5, 7 (2008), e151.
[34] Buchanan, H., and Coulson, N. S. Accessing dental anxiety online support groups: An exploratory qualitative study of motives and experiences. Patient Education and Counseling 66, 3 (2007), 263–269.
[35] Buehler, J. W., Berkelman, R. L., Hartley, D. M., and Peters, C. J. Syndromic surveillance and bioterrorism-related epidemics. Emerging Infectious Diseases 9, 10 (2003), 1197.
[36] Buis, L. R. Emotional and informational support messages in an online hospice support community. Computers Informatics Nursing 26, 6 (2008), 358–367.
[37] Bundorf, M.
K., Wagner, T. H., Singer, S. J., and Baker, L. C. Who searches the Internet for health information? Health Services Research 41, 3p1 (2006), 819–836.
[38] Butler, D. When Google got flu wrong. Nature 494, 7436 (2013), 155.
[39] Card, S. K., Mackinlay, J. D., Pirolli, P. L., and Pitkow, J. E. Method and apparatus for clustering a collection of linked documents using co-citation analysis, 2000. US Patent 6,038,574.
[40] Carmichael, A. Infertility-Asthma Link Confirmed. Cure Together Blog. www.curetogether.com/blog/2011/03/07/infertility-asthma-link-confirmed, [Online: accessed 15-Sept-2013].
[41] Carneiro, H. A., and Mylonakis, E. Google Trends: a web-based tool for real-time surveillance of disease outbreaks. Clinical Infectious Diseases 49, 10 (2009), 1557–1564.
[42] Cartright, M.-A., White, R. W., and Horvitz, E. Intentions and attention in exploratory health search. In Research and Development in Information Retrieval, ACM SIGIR (2011), 65–74.
[43] Chapman, W. W., Fiszman, M., Dowling, J. N., Chapman, B. E., and Rindflesch, T. C. Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. Medinfo 11, Pt 1 (2004), 487–91.
[44] Chary, M., Genes, N., McKenzie, A., and Manini, A. F. Leveraging social networks for toxicovigilance. Journal of Medical Toxicology 9, 2 (2013), 184–191.
[45] Chee, B. W., Berlin, R., and Schatz, B. Predicting adverse drug events from personal health messages. In American Medical Informatics Association Annual Symposium, AMIA (2011), 217.
[46] Cicero, T. J., Ellis, M. S., and Surratt, H. L. Effect of abuse-deterrent formulation of OxyContin. New England Journal of Medicine 367, 2 (2012), 187–189.
[47] Civan, A., and Pratt, W. Threading together patient expertise. In American Medical Informatics Association Annual Symposium, AMIA (2007), 140.
[48] Cleveland, W. S., and Devlin, S. J. Locally weighted regression: an approach to regression analysis by local fitting.
Journal of the American Statistical Association 83, 403 (1988), 596–610.
[49] Cline, R. J., and Haynes, K. M. Consumer health information seeking on the Internet: the state of the art. Health Education Research 16, 6 (2001), 671–692.
[50] Cohen, J. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin 70, 4 (1968), 213.
[51] Coiera, E. Information epidemics, economics, and immunity on the Internet: We still know so little about the effect of information on public health. British Medical Journal 317, 7171 (1998), 1469.
[52] Collier, N., Doan, S., Kawazoe, A., Goodwin, R. M., Conway, M., Tateno, Y., Ngo, Q.-H., Dien, D., Kawtrakul, A., Takeuchi, K., et al. Biocaster: detecting public health rumors with a web-based text mining system. Bioinformatics 24, 24 (2008), 2940–2941.
[53] Cooper, C. P., Mallon, K. P., Leadbetter, S., Pollack, L. A., and Peipins, L. A. Cancer Internet search activity on a major search engine, United States 2001-2003. Journal of Medical Internet Research 7, 3 (2005), e36.
[54] Corazza, O., Valeriani, G., Bersani, F. S., Corkery, J., Martinotti, G., Bersani, G., and Schifano, F. “Spice”, “Kryptonite”, “Black Mamba”: An Overview of Brand Names and Marketing Strategies of Novel Psychoactive Substances on the Web. Journal of Psychoactive Drugs 46, 4 (2014), 287–294.
[55] Corley, C., Mikler, A. R., Singh, K. P., and Cook, D. J. Monitoring influenza trends through mining social media. In Bioinformatics and Computational Biology (2009), 340–346.
[56] Corley, C. D., Cook, D. J., Mikler, A. R., and Singh, K. P. Text and structural data mining of influenza mentions in web and social media. International Journal of Environmental Research and Public Health 7, 2 (2010), 596–615.
[57] Cotten, S. R., and Gupta, S. S. Characteristics of online and offline health information seekers and factors that discriminate between them. Social Science & Medicine 59, 9 (2004), 1795–1806.
[58] Coulson, N. S. Receiving social support online: an analysis of a computer-mediated support group for individuals living with irritable bowel syndrome. CyberPsychology & Behavior 8, 6 (2005), 580–584.
[59] Coulson, N. S., Buchanan, H., and Aubeeluck, A. Social support in cyberspace: a content analysis of communication within a Huntington’s disease online support group. Patient Education and Counseling 68, 2 (2007), 173–178.
[60] Coulson, N. S., and Knibb, R. C. Coping with food allergy: exploring the role of the online support group. CyberPsychology & Behavior 10, 1 (2007), 145–148.
[61] Coursaris, C. K., and Liu, M. An analysis of social support exchanges in online HIV/AIDS self-help groups. Computers in Human Behavior 25, 4 (2009), 911–918.
[62] Culotta, A. Towards detecting influenza epidemics by analyzing Twitter messages. In workshop on Social Media Analytics, ACM (2010), 115–122.
[63] Culotta, A. Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages. Language Resources and Evaluation 47, 1 (2013), 217–238.
[64] Culver, J. D., Gerr, F., Frumkin, H., et al. Medical information on the Internet. Journal of General Internal Medicine 12, 8 (1997), 466–470.
[65] Curtis, B., Alanis-Hirsch, K., Kaynak, Ö., Cacciola, J., Meyers, K., and McLellan, A. T. Using web searches to track interest in synthetic cannabinoids (aka “herbal incense”). Drug and Alcohol Review 34, 1 (2014), 105–108.
[66] Dasgupta, N., Freifeld, C., Brownstein, J. S., Menone, C. M., Surratt, H. L., Poppish, L., Green, J. L., Lavonas, E. J., and Dart, R. C. Crowdsourcing black market prices for prescription opioids. Journal of Medical Internet Research 15, 8 (2013), e178.
[67] Davison, K. P., Pennebaker, J. W., and Dickerson, S. S. Who talks? The social psychology of illness support groups. American Psychologist 55, 2 (2000), 205.
[68] De Bock, G. H., Jacobi, C. E., Seynaeve, C., Krol-Warmerdam, E. M., Blom, J., Van Asperen, C.
J., Cornelisse, C. J., Klijn, J. G., Devilee, P., Tollenaar, R. A., et al. A family history of breast cancer will not predict female early onset breast cancer in a population-based setting. BMC Cancer 8, 1 (2008), 203.
[69] De Choudhury, M., Counts, S., and Horvitz, E. Major life changes and behavioral markers in social media: case of childbirth. In Computer Supported Cooperative Work, ACM (2013), 1431–1442.
[70] De Choudhury, M., Counts, S., and Horvitz, E. Predicting postpartum changes in emotion and behavior via social media. In Human Factors in Computing Systems, ACM (2013), 3267–3276.
[71] De Choudhury, M., Counts, S., Horvitz, E. J., and Hoff, A. Characterizing and predicting postpartum depression from shared Facebook data. In Computer Supported Cooperative Work, ACM (2014), 626–638.
[72] De Choudhury, M., Gamon, M., Counts, S., and Horvitz, E. Predicting depression via social media. In International Conference on Weblogs and Social Media, AAAI (2013).
[73] Deluca, P., Davey, Z., Corazza, O., Di Furia, L., Farre, M., Flesland, L. H., Mannonen, M., Majava, A., Peltoniemi, T., Pasinetti, M., et al. Identifying emerging trends in recreational drug use; outcomes from the Psychonaut Web Mapping Project. Progress in Neuro-Psychopharmacology and Biological Psychiatry 39, 2 (2012), 221–226.
[74] Diaz, J. A., Griffith, R. A., Ng, J. J., Reinert, S. E., Friedmann, P. D., and Moulton, A. W. Patients’ use of the Internet for medical information. Journal of General Internal Medicine 17, 3 (2002), 180–185.
[75] DiClemente, C. C., Prochaska, J. O., Fairhurst, S. K., Velicer, W. F., Velasquez, M. M., and Rossi, J. S. The process of smoking cessation: an analysis of precontemplation, contemplation, and preparation stages of change. Journal of Consulting and Clinical Psychology 59, 2 (1991), 295.
[76] Dingare, S., Nissim, M., Finkel, J., Manning, C., and Grover, C.
A system for identifying named entities in biomedical text: How results from two evaluations reflect on both the system and the evaluations. Comparative and Functional Genomics 6, 1-2 (2005), 77–85.
[77] Doing-Harris, K. M., and Zeng-Treitler, Q. Computer-assisted update of a consumer health vocabulary through mining of social network data. Journal of Medical Internet Research 13, 2 (2011), e37.
[78] Dunning, T. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19, 1 (1993), 61–74.
[79] DuPont, R. L., McLellan, A. T., White, W. L., Merlo, L. J., and Gold, M. S. Setting the standard for recovery: Physicians’ health programs. Journal of Substance Abuse Treatment 36, 2 (2009), 159–171.
[80] Esquivel, A., Meric-Bernstam, F., and Bernstam, E. V. Accuracy and self correction of information received from an Internet breast cancer list: content analysis. British Medical Journal 332, 7547 (2006), 939–942.
[81] Eysenbach, G. Infodemiology: tracking flu-related searches on the web for syndromic surveillance. In American Medical Informatics Association Annual Symposium, AMIA (2006), 244–248.
[82] Eysenbach, G., and Köhler, C. How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews. British Medical Journal 324, 7337 (2002), 573.
[83] Eysenbach, G., Powell, J., Kuss, O., and Sa, E.-R. Empirical studies assessing the quality of health information for consumers on the world wide web: a systematic review. Journal of the American Medical Association 287, 20 (2002), 2691–2700.
[84] Farrell, M. Opiate withdrawal. Addiction 89, 11 (1994), 1471–1475.
[85] Fernandez-Luque, L., Karlsen, R., and Bonander, J. Review of extracting information from the social web for health personalization. Journal of Medical Internet Research 13, 1 (2011), e15.
[86] Finfgeld, D. L. Therapeutic groups online: the good, the bad, and the unknown.
Issues in Mental Health Nursing 21, 3 (2000), 241–255.
[87] Finkel, J., Dingare, S., Nguyen, H., Nissim, M., Manning, C., and Sinclair, G. Exploiting context for biomedical entity recognition: from syntax to the web. In joint workshop on Natural Language Processing in Biomedicine and its Applications, ACL (2004), 88–91.
[88] Fleiss, J. L. Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 5 (1971), 378.
[89] Fox, N., Ward, K., and O’Rourke, A. Pro-anorexia, weight-loss drugs and the Internet: an “anti-recovery” explanatory model of anorexia. Sociology of Health & Illness 27, 7 (2005), 944–971.
[90] Fox, S. Peer-to-Peer Health Care. Pew Internet & American Life Project, 2011. [Online: http://pewinternet.org/Reports/2011/P2PHealthcare/Summary-of-Findings.aspx, accessed 6-January-2014].
[91] Fox, S., and Duggan, M. Health Online. Pew Internet & American Life Project, 2013. [Online: http://pewinternet.org/Reports/2013/Health-online/Summary-of-Findings.aspx, accessed 2-April-2013].
[92] Fox, S., and Rainie, L. Vital Decisions: How Internet Users Decide what Information to Trust when They Or Their Loved Ones are Sick. Pew Internet & American Life Project, 2002. [Online: http://www.pewinternet.org/2002/05/22/vital-decisions-a-pew-internet-health-report/, accessed 2-April-2013].
[93] Franklin, V. L., Waller, A., Pagliari, C., and Greene, S. A. A randomized controlled trial of Sweet Talk, a text-messaging system to support young people with diabetes. Diabetic Medicine 23, 12 (2006), 1332–1338.
[94] Frantzi, K., Ananiadou, S., and Mima, H. Automatic recognition of multi-word terms: the c-value/nc-value method. International Journal on Digital Libraries 3, 2 (2000), 115–130.
[95] Friedrich, C. M., Revillion, T., Hofmann, M., and Fluck, J. Biomedical and chemical named entity recognition with conditional random fields: the advantage of dictionary features. In Semantic Mining in Biomedicine, vol. 7 (2006), 85–89.
[96] Frost, J.
H., and Massagli, M. P. Social uses of personal health information within PatientsLikeMe, an online community: what can happen when patients have access to one another’s data. Journal of Medical Internet Research 10, 3 (2008), e15.
[97] Gade, E. J., Thomsen, S. F., Lindenberg, S., Kyvik, K. O., Lieberoth, S., and Backer, V. Asthma affects time to pregnancy and fertility: a register-based twin study. European Respiratory Journal 43, 4 (2014), 1077–1085.
[98] Gavin, J., Rodham, K., and Poyer, H. The presentation of “pro-anorexia” in online group interactions. Qualitative Health Research 18, 3 (2008), 325–333.
[99] Gibbs, R. D., Gibbs, P. H., and Henrich, J. Patient understanding of commonly used medical vocabulary. The Journal of Family Practice 25, 2 (1987), 176–178.
[100] Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., and Brilliant, L. Detecting influenza epidemics using search engine query data. Nature 457, 7232 (2008), 1012–1014.
[101] Gooden, R. J., and Winefield, H. R. Breast and prostate cancer online discussion boards: a thematic analysis of gender differences and similarities. Journal of Health Psychology 12, 1 (2007), 103–114.
[102] Gossop, M., Battersby, M., and Strang, J. Self-detoxification by opiate addicts. A preliminary investigation. The British Journal of Psychiatry 159, 2 (1991), 208–212.
[103] Gossop, M., Green, L., Phillips, G., and Bradley, B. Lapse, relapse and survival among opiate addicts after treatment. A prospective follow-up study. The British Journal of Psychiatry 154, 3 (1989), 348–353.
[104] Grandinetti, D. A. Doctors and the web. Help your patients surf the Net safely. Medical Economics 77, 5 (2000), 186.
[105] Gray, N. J., Klein, J. D., Noyce, P. R., Sesselberg, T. S., and Cantrill, J. A. Health information-seeking behaviour in adolescence: the place of the Internet. Social Science & Medicine 60, 7 (2005), 1467–1478.
[106] Green, L., and Gossop, M.
Effects of information on the opiate withdrawal syndrome. British Journal of Addiction 83, 3 (1988), 305–309.
[107] Greene, J. A., Choudhry, N. K., Kilabuk, E., and Shrank, W. H. Online social networking by patients with diabetes: a qualitative evaluation of communication with Facebook. Journal of General Internal Medicine 26, 3 (2011), 287–292.
[108] Grimes, A., Landry, B. M., and Grinter, R. E. Characteristics of shared health reflections in a local community. In Computer Supported Cooperative Work, ACM (2010), 435–444.
[109] Grishman, R., Huttunen, S., and Yangarber, R. Information extraction for enhanced access to disease outbreak reports. Journal of Biomedical Informatics 35, 4 (2002), 236–246.
[110] Guest, G., MacQueen, K. M., and Namey, E. E. Applied Thematic Analysis. Sage, 2011.
[111] GuoDong, Z., and Jian, S. Exploring deep knowledge resources in biomedical name recognition. In workshop on Natural Language Processing in Biomedicine and its Applications, ACL (2004), 96–99.
[112] Gupta, S., MacLean, D. L., Heer, J., and Manning, C. D. Induced lexico-syntactic patterns improve information extraction from online medical forums. Journal of the American Medical Informatics Association 21, 5 (2014), 902–909.
[113] Hampton, T. Warning system aims to detect emerging trends in illegal drug use. Journal of the American Medical Association 312, 8 (2014), 779–779.
[114] Hansen, D. L., Derry, H. A., Resnick, P. J., and Richardson, C. R. Adolescents searching for health information on the Internet: an observational study. Journal of Medical Internet Research 5, 4 (2003), e25.
[115] Hansen, R. N., Oster, G., Edelsberg, J., Woody, G. E., and Sullivan, S. D. Economic costs of nonmedical use of prescription opioids. The Clinical Journal of Pain 27, 3 (2011), 194–202.
[116] Hardey, M. Doctor in the house: the Internet as a source of lay health knowledge and the challenge to expertise. Sociology of Health & Illness 21, 6 (1999), 820–835.
[117] Hardey, M.
The story of my illness: Personal accounts of illness on the Internet. Health: 6, 1 (2002), 31–46.
[118] Harman, G. A., Coppersmith, C. T., and Dredze, M. H. Measuring post traumatic stress disorder in Twitter. In International Conference on Weblogs and Social Media, AAAI (2014), 579–582.
[119] Harpaz, R., DuMouchel, W., Shah, N. H., Madigan, D., Ryan, P., and Friedman, C. Novel data-mining methodologies for adverse drug event discovery and analysis. Clinical Pharmacology & Therapeutics 91, 6 (2012), 1010–1021.
[120] Harris, S., and Gerich, E. Retiring the NSFNET Backbone Service: Chronicling the end of an era. Connexions 10, 4 (1996).
[121] Hartzband, P., and Groopman, J. Untangling the Web: patients, doctors, and the Internet. New England Journal of Medicine 362, 12 (2010), 1063–1066.
[122] Hartzler, A., and Pratt, W. Managing the personal side of health: How patient expertise differs from the expertise of clinicians. Journal of Medical Internet Research 13, 3 (2011), e62.
[123] He, H. A., Greenberg, S., and Huang, E. M. One size does not fit all: applying the transtheoretical model to energy feedback technology design. In Human Factors in Computing Systems, ACM (2010), 927–936.
[124] He, Y., and Kayaalp, M. Biological entity recognition with conditional random fields. In American Medical Informatics Association Annual Symposium, AMIA (2008), 293.
[125] Hearst, M. S. A simple algorithm for identifying abbreviation definitions in biomedical text. In Pacific Symposium on Biocomputing (2003), 451–462.
[126] Heer, J., and Bostock, M. Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. In Human Factors in Computing Systems, ACM (2010), 203–212.
[127] Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M., Weiss, D., et al. Syndromic surveillance in public health practice, New York City. Emerging Infectious Diseases 10, 5 (2004), 858–864.
[128] Henning, K. J. What is syndromic surveillance?
Morbidity and Mortality Weekly Report (2004), 7–11.
[129] Homan, C. M., Lu, N., Tu, X., Lytle, M. C., and Silenzio, V. Social structure and depression in TrevorSpace. In Computer Supported Cooperative Work, ACM (2014), 615–625.
[130] Houston, T. K., Cooper, L. A., and Ford, D. E. Internet support groups for depression: a 1-year prospective cohort study. American Journal of Psychiatry 159, 12 (2002), 2062–2068.
[131] Høybye, M. T., Johansen, C., and Tjørnhøj-Thomsen, T. Online interaction. Effects of storytelling in an Internet breast cancer support group. Psycho-Oncology 14, 3 (2005), 211–220.
[132] Hulth, A., and Rydevik, G. Web query-based surveillance in Sweden during the influenza A (H1N1) 2009 pandemic, April 2009 to February 2010. Euro Surveillance 16, 18 (2011).
[133] Humphreys, K. Circles of recovery: Self-help organizations for addictions. Cambridge Univ. Press, 2004.
[134] Hwang, K. O., Ottenbacher, A. J., Green, A. P., Cannon-Diehl, M. R., Richardson, O., Bernstam, E. V., and Thomas, E. J. Social support in an Internet weight loss community. International Journal of Medical Informatics 79, 1 (2010), 5–13.
[135] Jamison-Powell, S., Linehan, C., Daley, L., Garbett, A., and Lawson, S. I can’t get no sleep: discussing #insomnia on Twitter. In Human Factors in Computing Systems, ACM (2012), 1501–1510.
[136] Jha, M., and Elhadad, N. Cancer stage prediction based on patient online discourse. In workshop on Biomedical Natural Language Processing, ACL (2010), 64–71.
[137] Johnson, H. A., Wagner, M. M., Hogan, W. R., Chapman, W., Olszewski, R. T., Dowling, J., Barnas, G., et al. Analysis of web access logs for surveillance of influenza. Studies in Health Technology and Informatics 107, Pt 2 (2004), 1202–1206.
[138] Jonquet, C., Shah, N. H., and Musen, M. A. The Open Biomedical Annotator. In summit on Translational Bioinformatics, AMIA (2009), 56.
[139] Kandel, D. B. Stages and pathways of drug involvement: Examining the gateway hypothesis.
Cambridge University Press, 2002.
[140] Kaskutas, L. A., Bond, J., and Humphreys, K. Social networks as mediators of the effect of Alcoholics Anonymous. Addiction 97, 7 (2002), 891–900.
[141] Kelly, J. F., Hoeppner, B., Stout, R. L., and Pagano, M. Determining the relative importance of the mechanisms of behavior change within Alcoholics Anonymous: a multiple mediator analysis. Addiction 107, 2 (2012), 289–299.
[142] Kendall, L., Hartzler, A., Klasnja, P., and Pratt, W. Descriptive analysis of physical activity conversations on Twitter. In extended abstracts on Human Factors in Computing Systems, ACM (2011), 1555–1560.
[143] Keselman, A., Smith, C. A., Divita, G., Kim, H., Browne, A. C., Leroy, G., and Zeng-Treitler, Q. Consumer health concepts that do not map to the UMLS: where do they fit? Journal of the American Medical Informatics Association 15, 4 (2008), 496–505.
[144] Keselman, A., Tse, T., Crowell, J., Browne, A., Ngo, L., and Zeng, Q. Assessing consumer health vocabulary familiarity: an exploratory study. Journal of Medical Internet Research 9, 1 (2007), e5.
[145] Kim, J.-D., Ohta, T., Tateisi, Y., and Tsujii, J. GENIA corpus – a semantically annotated corpus for bio-textmining. Bioinformatics 19, suppl 1 (2003), i180–i182.
[146] Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., and Collier, N. Introduction to the bio-entity recognition task at JNLPBA. In joint workshop on Natural Language Processing in Biomedicine and its Applications, ACL (2004), 70–75.
[147] Kittur, A., Chi, E. H., and Suh, B. Crowdsourcing user studies with Mechanical Turk. In Human Factors in Computing Systems, ACM (2008), 453–456.
[148] Klemm, P., Bunnell, D., Cullen, M., Soneji, R., Gibbons, P., and Holecek, A. Online cancer support groups: a review of the research literature. Computers Informatics Nursing (2003).
[149] Kummervold, P. E., Gammon, D., Bergvik, S., Johnsen, J.-A. K., Hasvold, T., and Rosenvinge, J. H.
Social support in a wired world: use of online mental health forums in Norway. Nordic Journal of Psychiatry 56, 1 (2002), 59–65.
[150] LaCoursiere, S. P., Knobf, M. T., and McCorkle, R. Cancer patients’ self-reported attitudes about the Internet. Journal of Medical Internet Research 7, 3 (2005), e22.
[151] Lafferty, J., McCallum, A., and Pereira, F. C. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning, ACM (2001), 282–289.
[152] Lamb, A., Paul, M. J., and Dredze, M. Separating fact from fear: Tracking flu infections on Twitter. In North American Chapter of the ACL: Human Language Technologies, ACL (2013), 789–795.
[153] Lasker, J. N., Sogolow, E. D., and Sharim, R. R. The role of an online community for people with a rare disease: content analysis of messages posted on a primary biliary cirrhosis mailing list. Journal of Medical Internet Research 7, 1 (2005), e10.
[154] Leaman, R., Wojtulewicz, L., Sullivan, R., Skariah, A., Yang, J., and Gonzalez, G. Towards Internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks. In workshop on Biomedical Natural Language Processing, ACL (2010), 117–125.
[155] Lembke, A., and Humphreys, K. Self-Help Organizations for Substance Use Disorders. Oxford Univ. Press, 2009.
[156] Lewis, T. Seeking health information on the Internet: lifestyle choice or bad attack of cyberchondria? Media, Culture & Society 28, 4 (2006), 521–539.
[157] Liang, P. Semi-supervised learning for natural language. PhD thesis, Massachusetts Institute of Technology, 2005.
[158] Lieberman, M. A., Golant, M., Giese-Davis, J., Winzlenberg, A., Benjamin, H., Humphreys, K., Kronenwetter, C., Russo, S., and Spiegel, D. Electronic support groups for breast carcinoma. Cancer 97, 4 (2003), 920–925.
[159] MacLean, D. L., and Heer, J. Identifying medical terms in patient-authored text: a crowdsourcing-based approach.
Journal of the American Medical Informatics Association 20, 6 (2013), 1120–1127.
[160] Malik, S. H., and Coulson, N. The male experience of infertility: a thematic analysis of an online infertility support group bulletin board. Journal of Reproductive and Infant Psychology 26, 1 (2008), 18–30.
[161] Malik, S. H., and Coulson, N. S. Coping with infertility online: An examination of self-help mechanisms in an online infertility support group. Patient Education and Counseling 81, 2 (2010), 315–318.
[162] Maloney-Krichmar, D., and Preece, J. A multilevel analysis of sociability, usability, and community dynamics in an online health community. ACM Transactions on Computer-Human Interaction 12, 2 (2005), 201–232.
[163] Mandl, K. D., Overhage, J. M., Wagner, M. M., Lober, W. B., Sebastiani, P., Mostashari, F., Pavlin, J. A., Gesteland, P. H., Treadwell, T., Koski, E., et al. Implementing syndromic surveillance: a practical guide informed by the early experience. Journal of the American Medical Informatics Association 11, 2 (2004), 141–150.
[164] Mankoff, J., Kuksenok, K., Kiesler, S., Rode, J. A., and Waldman, K. Competing online viewpoints and models of chronic illness. In Human Factors in Computing Systems, ACM (2011), 589–598.
[165] Mayer, D. K., Terrin, N. C., Kreps, G. L., Menon, U., McCance, K., Parsons, S. K., and Mooney, K. H. Cancer survivors information seeking behaviors: a comparison of survivors who do and do not seek information about cancer. Patient Education and Counseling 65, 3 (2007), 342–350.
[166] Mayer, M., and Till, J. The Internet: a modern Pandora’s box? Quality of Life Research 5, 6 (1996), 568–571.
[167] McCray, A. T., Loane, R. F., Browne, A. C., and Bangalore, A. K. Terminology issues in user access to web-based medical information. In American Medical Informatics Association Annual Symposium, AMIA (1999), 107.
[168] McLellan, A. T. What is recovery? Revisiting the Betty Ford Institute consensus panel definition.
Journal of Substance Abuse Treatment (2010), 109–113.
[169] McLellan, A. T., Lewis, D. C., O’Brien, C. P., and Kleber, H. D. Drug dependence, a chronic medical illness: implications for treatment, insurance, and outcomes evaluation. Journal of the American Medical Association 284, 13 (2000), 1689–1695.
[170] McNeil, K., Brna, P., and Gordon, K. Epilepsy in the Twitter era: a need to re-tweet the way we think about seizures. Epilepsy & Behavior 23, 2 (2012), 127–130.
[171] Medawar, C., Herxheimer, A., Bell, A., and Jofre, S. Paroxetine, Panorama and user reporting of ADRs: Consumer intelligence matters in clinical practice and post-marketing drug surveillance. The International Journal of Risk and Safety in Medicine 15, 3 (2002), 161–169.
[172] MedlinePlus use by quarter. National Library of Medicine (2013). [Online: http://www.nlm.nih.gov/medlineplus/usestatistics.html, accessed 25-August-2014].
[173] Meier, A., Lyons, E. J., Frydman, G., Forlenza, M., and Rimer, B. K. How cancer survivors provide support on cancer-related Internet mailing lists. Journal of Medical Internet Research 9, 2 (2007), e12.
[174] Merrill, J. O., Rhodes, L. A., Deyo, R. A., Marlatt, G. A., and Bradley, K. A. Mutual mistrust in the medical care of drug users. Journal of General Internal Medicine 17, 5 (2002), 327–333.
[175] Migneault, J. P., Adams, T. B., and Read, J. P. Application of the transtheoretical model to substance abuse: historical development and future directions. Drug and Alcohol Review 24, 5 (2005), 437–448.
[176] Miller, N. S., Sheppard, L. M., Colenda, C. C., and Magen, J. Why physicians are unprepared to treat patients who have alcohol-and drug-related disorders. Academic Medicine 76, 5 (2001), 410–418.
[177] Mo, P. K., and Coulson, N. S. Exploring the communication of social support within virtual communities: A content analysis of messages posted to an online HIV/AIDS support group. CyberPsychology & Behavior 11, 3 (2008), 371–374.
[178] Morahan-Martin, J. M. How Internet users find, evaluate, and use online health information: a cross-cultural review. CyberPsychology & Behavior 7, 5 (2004), 497–510. [179] Mulveen, R., and Hepworth, J. An interpretative phenomenological analysis of participation in a pro-anorexia Internet site and its relationship with disordered eating. Journal of Health Psychology 11, 2 (2006), 283–296. [180] Murnane, E. L., and Counts, S. Unraveling abstinence and relapse: smoking cessation reflected in social media. In Human Factors in Computing Systems, ACM (2014), 1345–1354. [181] Murray, E., Lo, B., Pollack, L., Donelan, K., Catania, J., Lee, K., Zapert, K., and Turner, R. The impact of health information on the Internet on health care and the physician-patient relationship: national U.S. survey among 1.050 U.S. physicians. Journal of Medical Internet Research 5, 3 (2003). [182] Nettleton, S., Burrows, R., and O’Malley, L. The mundane realities of the everyday lay use of the Internet for health, and their consequences for media convergence. Sociology of Health & Illness 27, 7 (2005), 972–992. [183] Nikfarjam, A., and Gonzalez, G. H. Pattern mining for extraction of mentions of adverse drug reactions from user comments. In American Medical Informatics Association Annual Symposium, AMIA (2011), 1019. [184] Noble, A., Best, D., Man, L.-H., Gossop, M., and Strang, J. Self-detoxification attempts among methadone maintenance patients: what methods and what success? Addictive Behaviors 27, 4 (2002), 575–584. [185] Nonnecke, B., and Preece, J. Shedding light on lurkers in online communities. Ethnographic Studies in Real and Virtual Environments: Inhabited Information Spaces and Connected Communities (1999), 123–128. [186] Nonnecke, B., and Preece, J. Lurker demographics: Counting the silent. In Human Factors in Computing Systems, ACM (2000), 73–80. [187] Olsen, Y., and Sharfstein, J. M. Confronting the stigma of opioid use disorder – and its treatment. 
Journal of the American Medical Association 311, 14 (2014), 1393–1394. [188] Owen, J. E., Giese-Davis, J., Cordova, M., Kronenwetter, C., Golant, M., and Spiegel, D. Self-report and linguistic indicators of emotional expression in narratives as predictors of adjustment to cancer. Journal of Behavioral Medicine 29, 4 (2006), 335–345. [189] Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., and Smith, N. A. Improved part-of-speech tagging for online conversational text with word clusters. In North American Chapter of the ACL: Human Language Technologies, ACL (2013), 380–390. [190] Pagano, M. E., Friend, K. B., Tonigan, J. S., and Stout, R. L. Helping other alcoholics in Alcoholics Anonymous and drinking outcomes: Findings from Project MATCH. Journal of Studies on Alcohol 65, 6 (2004), 766. [191] Park, S., Lee, S. W., Kwak, J., Cha, M., and Jeong, B. Activities on Facebook reveal the depressive state of users. Journal of Medical Internet Research 15, 10 (2013), e217. [192] Parker, K., and Wang, W. Modern Parenthood. Pew Internet & American Life Project, 2013. [Online: http://www.pewsocialtrends.org/2013/03/14/modern-parenthood-roles-of-moms-and-dads-converge-as-they-balance-work, accessed 2-April-2013]. [193] Paul, M. J., and Dredze, M. A model for mining public health topics from Twitter. In Health, vol. 11 (2012), 16–6. [194] Peat, H. J., and Willett, P. The limitations of term co-occurrence data for query expansion in document retrieval systems. JASIS 42, 5 (1991), 378–383. [195] Pennebaker, J. W., Francis, M. E., and Booth, R. J. Linguistic inquiry and word count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates 71 (2001). [196] Pennebaker, J. W., Mehl, M. R., and Niederhoffer, K. G. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology 54, 1 (2003), 547–577. [197] Ploderer, B., Smith, W., Howard, S., Pearce, J., and Borland, R. 
Patterns of support in an online community for smoking cessation. In International Conference on Communities and Technologies, ACM (2013), 26–35. [198] Polgreen, P. M., Chen, Y., Pennock, D. M., Nelson, F. D., and Weinstein, R. A. Using Internet searches for influenza surveillance. Clinical Infectious Diseases 47, 11 (2008), 1443–1448. [199] Potts, H. W., and Wyatt, J. C. Survey of doctors’ experience of patients using the Internet. Journal of Medical Internet Research 4, 1 (2002), e5. [200] Powell, J., and Clarke, A. Internet information-seeking in mental health population survey. The British Journal of Psychiatry 189, 3 (2006), 273–277. [201] Pratt, W., and Yetisgen-Yildiz, M. A study of biomedical concept identification: MetaMap vs. people. In American Medical Informatics Association Annual Symposium, AMIA (2003), 529. [202] Preece, J., Nonnecke, B., and Andrews, D. The top five reasons for lurking: improving community experiences for everyone. Computers in Human Behavior 20, 2 (2004), 201–223. [203] Prochaska, J. O., and Velicer, W. F. The transtheoretical model of health behavior change. American Journal of Health Promotion 12, 1 (1997), 38–48. [204] Pyysalo, S., Ginter, F., Heimonen, J., Björne, J., Boberg, J., Järvinen, J., and Salakoski, T. Bioinfer: a corpus for information extraction in the biomedical domain. BMC Bioinformatics 8, 1 (2007), 50. [205] Rainie, L., and Fox, S. The Online Health Care Revolution. Pew Internet & American Life Project, 2000. [Online: http://www.pewinternet.org/2000/11/26/the-online-health-care-revolution/, accessed 2-April-2013]. [206] Ravert, R. D., Hancock, M. D., and Ingersoll, G. M. Online forum messages posted by adolescents with type 1 diabetes. The Diabetes Educator 30, 5 (2003), 827–834. [207] Reis, B. Y., and Mandl, K. D. Time series modeling for syndromic surveillance. BMC Medical Informatics and Decision Making 3, 1 (2003), 2. [208] Resnik, P., Garron, A., and Resnik, R. 
Using topic modeling to improve prediction of neuroticism and depression. In Conference on Empirical Methods in Natural Language Processing, ACL (2013), 1348–1353. [209] Rideout, V. Generation Rx.com. What are young people really doing online? Marketing Health Services 22, 1 (2002), 26. [210] Risk, A., and Petersen, C. Health information on the Internet: quality issues and international initiatives. Journal of the American Medical Association 287, 20 (2002), 2713–2715. [211] Rodgers, S., and Chen, Q. Internet community group participation: Psychosocial benefits for women with breast cancer. Journal of Computer-Mediated Communication 10, 4 (2005). [212] Ruau, D., Mbagwu, M., Dudley, J. T., Krishnan, V., and Butte, A. J. Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets. Journal of Biomedical Informatics 44 (2011), S39–S43. [213] Sadilek, A., Brennan, S., Kautz, H., and Silenzio, V. nEmesis: Which restaurants should you avoid today? In Human Computation and Crowdsourcing, AAAI (2013). [214] Saha, S. K., Sarkar, S., and Mitra, P. Feature selection techniques for maximum entropy based biomedical named entity recognition. Journal of Biomedical Informatics 42, 5 (2009), 905–911. [215] Salem, D. A., Bogat, G. A., and Reid, C. Mutual help goes on-line. Journal of Community Psychology 25, 2 (1997), 189–207. [216] Sanderson, M., and Croft, B. Deriving concept hierarchies from text. In Research and Development in Information Retrieval, ACM SIGIR (1999), 206–213. [217] Sanford, A. A. “I can air my feelings instead of eating them”: Blogging as social support for the morbidly obese. Communication Studies 61, 5 (2010), 567–584. [218] Scanfeld, D., Scanfeld, V., and Larson, E. L. Dissemination of health information through social networks: Twitter and antibiotics. American Journal of Infection Control 38, 3 (2010), 182–188. [219] Schatz, B. R., Johnson, E. H., Cochrane, P. A., and Chen, H. 
Interactive term suggestion for users of digital libraries: Using subject thesauri and co-occurrence lists for information retrieval. In International Conference on Digital libraries, ACM (1996), 126–133. [220] Seale, C., Ziebland, S., and Charteris-Black, J. Gender, cancer experience and Internet use: a comparative keyword analysis of interviews and online cancer support groups. Social Science & Medicine 62, 10 (2006), 2577–2590. [221] Seifter, A., Schwarzwalder, A., Geis, K., and Aucott, J. The utility of Google Trends for epidemiological research: Lyme disease as an example. Geospatial Health 4, 2 (2010), 135–137. [222] Settles, B. Biomedical named entity recognition using conditional random fields and rich feature sets. In joint workshop on Natural Language Processing in Biomedicine and its Applications, ACL (2004), 104–107. [223] Sheeren, M. The relationship between relapse and involvement in Alcoholics Anonymous. Journal of Studies on Alcohol and Drugs 49, 1 (1988), 104. [224] Shuyler, K. S., and Knight, K. M. What are patients seeking when they turn to the Internet? Qualitative content analysis of questions asked by visitors to an orthopaedics web site. Journal of Medical Internet Research 5, 4 (2003), e24. [225] Sillence, E., Briggs, P., Harris, P. R., and Fishwick, L. How do patients evaluate and make use of online health information? Social Science & Medicine 64, 9 (2007), 1853–1862. [226] Smith, C. A., and Wicks, P. J. PatientsLikeMe: Consumer health vocabulary as a folksonomy. In American Medical Informatics Association Annual Symposium, AMIA (2008), 682. [227] Smyth, B., Barry, J., Keenan, E., and Ducray, K. Lapse and relapse following inpatient treatment of opiate dependence. Irish Medical Journal 103, 6 (2010), 176–179. [228] Snow, R., O’Connor, B., Jurafsky, D., and Ng, A. Y. Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Empirical Methods in Natural Language Processing, ACL (2008), 254–263. 
[229] Sproule, B., Brands, B., Li, S., and Catz-Biro, L. Changing patterns in opioid addiction – characterizing users of oxycodone and other opioids. Canadian Family Physician 55, 1 (2009), 68–69. [230] Strang, J., Babor, T., Caulkins, J., Fischer, B., Foxcroft, D., and Humphreys, K. Drug policy and the public good: evidence for effective interventions. The Lancet 379, 9810 (2012), 71–83. [231] Substance Abuse and Mental Health Services Administration. Drug Abuse Warning Network, 2011: National Estimates of Drug-Related Emergency Department Visits. HHS Publication No. (SMA) 13-4760, DAWN Series D-39. Rockville, MD: Substance Abuse and Mental Health Services Administration, 2013. [232] Substance Abuse and Mental Health Services Administration, Center for Behavioral Health Statistics and Quality. The N-SSATS report: Trends in the use of methadone and buprenorphine at substance abuse treatment facilities: 2003 to 2011. Rockville, MD. 2013. [233] Sullivan, C. F. Gendered cybersupport: A thematic analysis of two online cancer support groups. Journal of Health Psychology 8, 1 (2003), 83–104. [234] Sullivan, S. J., Schneiders, A. G., Cheang, C.-W., Kitto, E., Lee, H., Redhead, J., Ward, S., Ahmed, O. H., and McCrory, P. R. What’s happening? A content analysis of concussion-related traffic on Twitter. British Journal of Sports Medicine 46, 4 (2012), 258–263. [235] Teodoro, R., and Naaman, M. Fitter with Twitter: Understanding personal health and fitness activity in social media. In International Conference on Weblogs and Social Media (2013). [236] Thomas, D. R. A general inductive approach for analyzing qualitative evaluation data. American Journal of Evaluation 27, 2 (2006), 237–246. [237] Tonigan, J. S., and Rice, S. L. Is it beneficial to have an Alcoholics Anonymous sponsor? Psychology of Addictive Behaviors 24, 3 (2010), 397. [238] Tsai, R. T.-H., Wu, S.-H., Chou, W.-C., Lin, Y.-C., He, D., Hsiang, J., Sung, T.-Y., and Hsu, W.-L. 
Various criteria in the evaluation of biomedical named entity recognition. BMC Bioinformatics 7, 1 (2006), 92. [239] Tsai, T.-h., Chou, W.-C., Wu, S.-H., Sung, T.-Y., Hsiang, J., and Hsu, W.-L. Integrating linguistic knowledge into a conditional random field framework to identify biomedical named entities. Expert Systems with Applications 30, 1 (2006), 117–128. [240] Turner-McGrievy, G. M., and Tate, D. F. Weight loss social support in 140 characters or less: use of an online social network in a remotely delivered weight loss intervention. Translational Behavioral Medicine 3, 3 (2013), 287–294. [241] United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality. Treatment Episode Data Set – Admissions (TEDS-A), 2011. ICPSR34876-v3. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2014-09-11. http://doi.org/10.3886/ICPSR34876.v3. [242] U.S. Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Results from the 2010 National Survey on Drug Use and Health: Summary of National Findings. [Online: http://www.samhsa.gov/data/nsduh/2k10nsduh/2k10results.htm, accessed 15-Sept-2013]. [243] Ussher, J., Kirsten, L., Butow, P., and Sandoval, M. What do cancer support groups provide which other supportive relationships do not? The experience of peer support groups for people with cancer. Social Science & Medicine 62, 10 (2006), 2565–2576. [244] Van Hout, M. C., and Bingham, T. Silk road, the virtual drug marketplace: a single case study of user experiences. International Journal of Drug Policy 24, 5 (2013), 385–391. [245] van Rijsbergen, C. J. A theoretical basis for the use of co-occurrence data in information retrieval. Journal of Documentation 33, 2 (1977), 106–119. [246] van Uden-Kraan, C. F., Drossaert, C. H., Taal, E., Seydel, E. R., and van de Laar, M. A. 
Self-reported differences in empowerment between lurkers and posters in online patient support groups. Journal of Medical Internet Research 10, 2 (2008), e18. [247] Velicer, W. F., Prochaska, J. O., Fava, J. L., Norman, G. J., and Redding, C. A. Smoking cessation and stress management: Applications of the transtheoretical model of behavior change. Homeostasis in Health and Disease 38 (1998), 216–233. [248] Vlahovic, T. A., Wang, Y.-C., Kraut, R. E., and Levine, J. M. Support matching and satisfaction in an online breast cancer support community. In Human Factors in Computing Systems, ACM (2014), 1625–1634. [249] Volkow, N. D. Prescription drugs: Abuse and addiction, 2005. [Online: http://www.drugabuse.gov/sites/default/files/rxreportfinalprint.pdf, accessed 9/4/2014]. [250] Wang, Y.-C., Kraut, R., and Levine, J. M. To stay or leave? The relationship of emotional and informational support to commitment in online health support groups. In Computer Supported Cooperative Work, ACM (2012), 833–842. [251] Warner, M., Chen, L. H., Makuc, D. M., Anderson, R. N., and Miniño, A. M. Drug poisoning deaths in the United States, 1980-2008. NCHS Data Brief, 81 (2011), 1–8. [252] Wen, M., and Rosé, C. P. Understanding participant behavior trajectories in online health support groups using automatic extraction methods. In International Conference on Supporting Group Work, ACM (2012), 179–188. [253] West, R. Time for a change: putting the transtheoretical (stages of change) model to rest. Addiction 100, 8 (2005), 1036–1039. [254] White, R. W., and Horvitz, E. Cyberchondria: studies of the escalation of medical concerns in web search. ACM Transactions on Information Systems 27, 4 (2009), 23. [255] White, R. W., and Horvitz, E. Web to world: Predicting transitions from self-diagnosis to the pursuit of local medical assistance in web search. In American Medical Informatics Association Annual Symposium, AMIA (2010), 882. [256] White, R. W., and Horvitz, E. 
Studies of the onset and persistence of medical concerns in search logs. In Research and Development in Information Retrieval, ACM SIGIR (2012), 265–274. [257] White, R. W., Tatonetti, N. P., Shah, N. H., Altman, R. B., and Horvitz, E. Web-scale pharmacovigilance: listening to signals from the crowd. Journal of the American Medical Informatics Association 20, 1 (2013), 404–408. [258] Wicks, P., Keininger, D. L., Massagli, M. P., la Loge, C. d., Brownstein, C., Isojärvi, J., and Heywood, J. Perceived benefits of sharing health data between people with epilepsy on an online platform. Epilepsy & Behavior 23, 1 (2012), 16–23. [259] Wicks, P., Massagli, M., Frost, J., Brownstein, C., Okun, S., Vaughan, T., Bradley, R., and Heywood, J. Sharing health data for better outcomes on PatientsLikeMe. Journal of Medical Internet Research 12, 2 (2010), e19. [260] Wicks, P., Vaughan, T. E., Massagli, M. P., and Heywood, J. Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm. Nature Biotechnology 29, 5 (2011), 411–414. [261] Wilson, J. L., Peebles, R., Hardy, K. K., and Litt, I. F. Surfing for thinness: a pilot study of pro-eating disorder web site usage in adolescents with eating disorders. Pediatrics 118, 6 (2006), e1635–e1643. [262] Wilson, K., and Brownstein, J. S. Early detection of disease outbreaks using the Internet. Canadian Medical Association Journal 180, 8 (2009), 829–831. [263] Wood, E., Samet, J. H., and Volkow, N. D. Physician education in addiction medicine. Journal of the American Medical Association 310, 16 (2013), 1673–1674. [264] Xu, R., Supekar, K., Morgan, A., Das, A., and Garber, A. Unsupervised method for automatic construction of a disease dictionary from a large free text collection. In American Medical Informatics Association Annual Symposium, AMIA (2008), 820. [265] Yang, C. C., Jiang, L., Yang, H., and Tang, X. 
Detecting signals of adverse drug reactions from health consumer contributed content in social media. In workshop on Health Informatics, ACM SIGKDD (2012). [266] Yang, C. C., Yang, H., Jiang, L., and Zhang, M. Social media mining for drug safety signal detection. In workshop on Smart Health and Wellbeing, ACM (2012), 33–40. [267] Yang, Z., Lin, H., and Li, Y. Exploiting the contextual cues for bio-entity name recognition in biomedical literature. Journal of Biomedical Informatics 41, 4 (2008), 580–587. [268] Yates, A., and Goharian, N. ADRTrace: detecting expected and unexpected adverse drug reactions from user reviews on social media sites. In Advances in Information Retrieval. Springer, 2013, 816–819. [269] Yates, A., Goharian, N., and Frieder, O. Extracting adverse drug reactions from forum posts and linking them to drugs. In workshop on Health Search and Discovery, ACM SIGIR (2013). [270] Ybarra, M. L., and Eaton, W. W. Internet-based mental health interventions. Mental Health Services Research 7, 2 (2005), 75–87. [271] Yeh, A., Morgan, A., Colosimo, M., and Hirschman, L. BioCreAtIvE task 1A: gene mention finding evaluation. BMC Bioinformatics 6, Suppl 1 (2005), S2. [272] Zeng, Q., Kogan, S., Ash, N., Greenes, R., and Boxwala, A. Characteristics of consumer terminology for health information retrieval. Methods of Information in Medicine 41, 4 (2002), 289–298. [273] Zeng, Q. T., and Tse, T. Exploring and developing consumer health vocabularies. Journal of the American Medical Informatics Association 13, 1 (2006), 24–29. [274] Zeng, Q. T., Tse, T., Divita, G., Keselman, A., Crowell, J., Browne, A. C., Goryachev, S., and Ngo, L. Term identification methods for consumer health vocabulary development. Journal of Medical Internet Research 9, 1 (2007), e4. [275] Ziebland, S., Chapple, A., Dumelow, C., Evans, J., Prinjha, S., and Rozmovits, L. How the Internet affects patients’ experience of cancer: a qualitative study. 
British Medical Journal 328, 7439 (2004), 564.