INSIGHTS FROM PATIENT AUTHORED TEXT:
FROM CLOSE READING TO AUTOMATED EXTRACTION
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
DIANA LYNN MACLEAN
MARCH 2015
© 2015 by Diana Lynn MacLean. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/nh030tg4542
I certify that I have read this dissertation and that, in my opinion, it is fully adequate
in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Jeffrey Heer, Primary Adviser
Michael Bernstein
Christopher Manning
Stuart Card
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost for Graduate Education
This signature page was generated electronically upon submission of this dissertation in
electronic format. An original signed hard copy of the signature page is on file in
University Archives.
Abstract
Millions of people collaborate online with others who share their health concerns. In the process,
these users perform complex health-related tasks, such as differential diagnosis and treatment comparison. The result is a massive, growing and readily accessible corpus of patient authored text (PAT) that
documents patients’ behavior outside of the clinical environment. Consequently, PAT can provide insights
into otherwise obscure topics, such as why patients follow only certain parts of a treatment protocol, or
how people self-treat stigmatized conditions such as prescription drug addiction.
Despite the potential value of PAT, attempts to extract medically relevant insights from it have been
limited. PAT is notoriously noisy and challenging to work with, and there is a dearth of methods and tools
for processing and analyzing it. Moreover, the specific research questions that PAT can support are not
obvious: determining what data PAT encodes, and how, is a challenge in and of itself.
In this thesis, I develop methods for automatically extracting medically relevant data from PAT. I focus
specifically on the topic of addiction: a stigmatized and prevalent medical condition. Building on close
readings of source text to inform schema induction, data annotation, and feature engineering, I train classifiers that accurately identify (1) medically relevant terms in PAT; (2) users’ motivations for participating
in an addiction-related online health community; (3) users’ drugs of choice; and (4) users’ transitions
through relapse and recovery. Using these classifiers to scale analyses to large PAT corpora, I derive
novel insights into the process of addiction, as well as the role that online health communities play in
giving users informational and emotional support and, ultimately, in enabling recovery.
In concert, these contributions both underscore PAT’s latent value for illuminating poorly understood
or clandestine medical topics, and offer viable methods that dramatically improve our ability to realize
this value.
For Angus and June
Acknowledgements
My first and foremost thanks go to my advisor, Jeffrey Heer. Jeff has been a wonderful source of
support, knowledge and inspiration during my time at Stanford, and I am deeply indebted to him for
not only supporting my curiosity as my research ventured into uncharted territory, but for doing so with
enthusiasm and confidence. Most importantly, however, Jeff has been an exemplary role model. I am
lucky, grateful, and unquestionably better for having had the opportunity to learn from him, and am proud
to be taking that with me as I start my next great adventure.
There are several people without whom this dissertation would not have been possible: Anna Lembke, who brought with her invaluable medical perspective, and whose enthusiasm, thoughtful insight and
patience were instrumental in making this cross-disciplinary work a reality; Stuart Card, whose ingeniousness I aspire to, and whose advice I have had the fortune to benefit from on several occasions;
Sonal Gupta, a close friend and collaborator from whom I have learned a great many things, and hope to
learn many more; and Michael Bernstein and Christopher Manning, who have given generously of their
time and advice, helping to steer this work from its inception through its completion.
I am fortunate to have had many wonderful co-conspirators while at Stanford. Sudheendra Hangal,
whose patient support and advice were instrumental in my early graduate school years, has been a
fantastic collaborator and a dear friend. Monica Lam, with whom I worked closely during my first year,
remains an uplifting source of inspiration. The UW IDL group, the Stanford HCI group, and the fantastic
people in the 3B wing have been a fun, dynamic and reliable source of new ideas, feedback and camaraderie, and will be greatly missed. Finally, Jillian Lentz and Monica Niemiec deserve special thanks
for not only providing efficient administrative support, but also for answering even the most frantic of
questions with a smile.
Finally, there are some people without whom I would not be where I am today. The inimitable Margo
Seltzer who, suffice it to say, started this whole business in the first place; David Holland, whose patient
and thorough technical tutelage stands me in good stead to this day; Will Phan, who helped me to see
the real joy in coding; my mother, Heather, who is the embodiment of never giving up; and, of course, my
husband, Isa, who inspires and challenges me to be a little better every day. It makes all the difference.
Table of Contents
1 Introduction
  1.1 Overview & Focus
  1.2 Contributions
  1.3 Outline of Thesis
2 The Internet and Health
  2.1 Online Health Information Seeking
    2.1.1 Historical Overview & Current Landscape
    2.1.2 What Health Information Do Users Seek Online?
    2.1.3 Who Seeks Health Information Online?
      Gender
      Age
      Health
      Race
      Socio-Economic Status & Education
      Role (Patient vs. Caregiver)
    2.1.4 Where Do People Find Health Information Online?
  2.2 Online Health Community Participation
    2.2.1 Modes of Participation
    2.2.2 Who Participates in OHCs?
    2.2.3 Reasons for Participation
      Medium-Based Affordances
      Informational Support
      Emotional Support
    2.2.4 Efficacy of Online Health Forums
  2.3 Summary
3 Prior Work on Patient Authored Text
  3.1 Patient Authored Text (PAT): Introduction & Overview
    3.1.1 Value of PAT
    3.1.2 Challenges of Working with PAT
      Noisiness
      Lack of Analysis Tools
      Applicability to Research Questions
  3.2 Syndromic Surveillance
    3.2.1 Condition
    3.2.2 Data Source
    3.2.3 Filtering
    3.2.4 Modeling and Prediction
    3.2.5 Real-World Evaluation Dataset
  3.3 Pharmacovigilance
    3.3.1 Data Source
    3.3.2 Identifying Drugs in PAT
    3.3.3 Identifying Adverse Events in PAT
    3.3.4 Evaluation
  3.4 Named Entity Recognition
    3.4.1 Ontology-Based Tools
    3.4.2 Statistical Classifiers
  3.5 Thematic Analysis
    3.5.1 Condition
    3.5.2 Data Source
    3.5.3 Analysis Question
    3.5.4 Scaling Thematic Analyses
  3.6 Summary
4 Data
  4.1 MedHelp Corpus
    4.1.1 Terminology
    4.1.2 Forum77
  4.2 CureTogether Corpus
5 Identifying Medically Relevant Terms in PAT
  5.1 Introduction
  5.2 Related Work
    5.2.1 Medical Term Identification
    5.2.2 Consumer Health Vocabularies
  5.3 Data
    5.3.1 Preparation
    5.3.2 Samples
  5.4 Labeling Medically Relevant Terms with the Crowd
    5.4.1 Task Design and Pilot Study
    5.4.2 Experiment
      Determining a Gold Standard
      Comparing Turkers Against a Gold Standard
    5.4.3 Results
    5.4.4 Limitations of the Crowd
  5.5 Training a Classifier on Crowd-Labeled Data
    5.5.1 Models
    5.5.2 Design
    5.5.3 Results
      Failure Analysis
  5.6 Example Applications of ADEPT to PAT
    5.6.1 Summarizing Important Medical Content in MedHelp’s Arthritis Forum
    5.6.2 Navigating MedHelp’s Substance Abuse Forum (Forum77)
  5.7 Conclusion
6 What do People Seek on Forum77?
  6.1 Why Study Addiction?
    6.1.1 Addiction is Highly Prevalent
    6.1.2 Addiction is Highly Stigmatized
    6.1.3 People are Turning Online for Help with Addiction
  6.2 Related Work
  6.3 Data
    6.3.1 Thematic Analysis Development Dataset
    6.3.2 Labeled Training & Testing Dataset
  6.4 Who Posts?
  6.5 Users’ Objectives in Initiating Discussions
  6.6 Classifying Informational vs. Emotional Support
    6.6.1 Training Dataset Annotation and Agreement
    6.6.2 Classifier Training
    6.6.3 Classifier Performance
  6.7 Classifying Updates vs. Non-updates
    6.7.1 Classifier Training
    6.7.2 Classifier Performance
  6.8 Results
    6.8.1 Thomas’ Recipe: An Informal Collaboration
  6.9 Discussion
    6.9.1 Limitations and Future Work
  6.10 Summary
7 Identifying Drugs of Choice
  7.1 Related Work
  7.2 Datasets
  7.3 Automatically Identifying Drugs of Choice
    7.3.1 Definition of Drug of Choice
    7.3.2 Data Annotation
    7.3.3 Classifier Training & Evaluation
      Results
    7.3.4 Drug Term Resolution
  7.4 Comparing Real-World DOC Distributions
    7.4.1 Forum77
    7.4.2 Narcotics Anonymous
    7.4.3 TEDS
    7.4.4 DAWN
  7.5 Results
  7.6 Discussion
    7.6.1 Limitations & Future Work
  7.7 Summary
8 Quantifying Recovery and Relapse
  8.1 Introduction
  8.2 Background
    8.2.1 The Prescription Drug Abuse Cycle
      Withdrawal
      Self-Detoxification
      Relapse & Recovery
    8.2.2 In-Person Mutual Help Groups
    8.2.3 Inferring Health State from Social Media
  8.3 Data
  8.4 Exploring & Modeling Phases of Addiction
    8.4.1 Transtheoretical Model for Behavior Change
    8.4.2 Rubric Development
    8.4.3 A Taxonomy of the Phases of Addiction
    8.4.4 Labeling People, not Posts
  8.5 Characterizing the Phases of Addiction
    8.5.1 Sample & Labeling
    8.5.2 Activity Features
    8.5.3 Linguistic & Content Features
      LIWC Features
      Days Mentioned and Question Features
      Phase-Specific Term Features
    8.5.4 Results: Activity and Linguistic Features
  8.6 Automatically Classifying Addiction Phase
    8.6.1 Model & Features
    8.6.2 Performance
    8.6.3 Results
  8.7 Automatically Classifying Relapse and Recovery
    8.7.1 Identifying Relapse
    8.7.2 Identifying Recovery
    8.7.3 Results
  8.8 Discussion
    8.8.1 Use and Efficacy of Forum77
    8.8.2 Implications for Forum Design
    8.8.3 Implications for Addiction Treatment
    8.8.4 Limitations
  8.9 Summary
9 Conclusion
  9.1 Contribution Summary
  9.2 Future Work
    9.2.1 Supporting the Methodological Process
      Interface Support for Thematic Analysis
      Improved Tools for Annotation
      Mapping the Limits of the Crowd in PAT Annotation Tasks
    9.2.2 PAT Interface Design & Support
      Expose Aggregate Data to Users
      Support Data Entry
      Automatically Construct User Timelines
    9.2.3 Making the Leap to Medical Discoveries
  9.3 Concluding Remarks
A ADEPT Supplementary Material
B F77 Purpose Supplementary Material
C F77 Drug of Choice Supplementary Material
D F77 Phase Supplementary Material
List of Tables
4.1 Top 40 MedHelp forums ranked by total post count. A ◦ in the Stigmatized column denotes our conservative estimate of whether the condition represented by the forum carries a stigma or is otherwise embarrassing.
5.1 Majority vote at the token level over RN responses. Terms identified by RNs as medically relevant are shown in bold. Stopwords (e.g., “and”, “of”) are excluded from the vote.
5.2 Turker performance against the RN gold standard. Voting threshold indicates the minimum number of Turkers who have to annotate a term as medically relevant for it to be included in the result. Maximum column values are indicated in bold. A corroborative policy of 2+ votes yields high scores across the board, and maximizes F1-score.
5.3 Annotator performance against the crowd-labeled data set and the gold standards. Maximum column values are indicated in bold.
5.4 Examples of ADEPT’s misclassifications in the test corpora.
6.1 Summary statistics of a representative sample of online health communities focused on addiction recovery. We identified sites through Google searches and gathered statistics (if available) from site pages. Data current as of 3/1/2014.
6.2 Annotator-derived taxonomy for users’ objectives in initiating a post, with % prevalence in the 1,000 post labeled sample on the right. Note that (1) labels are mutually exclusive, and (2) “w/d” stands for “withdrawal”.
6.3 Descriptions and samples of taxonomy labels. Samples are synthesized in order to preserve user privacy.
6.4 Classifier performance for labeling initiating posts as seeking informational support or emotional support. Performance scores are averaged over 10 folds.
6.5 Classifier performance labeling posts as either update or non-update. Performance scores are averaged over 10 folds.
6.6 Thomas’ Recipe (circa 2001)
6.7 Thomas’ Recipe (circa 2006)
7.1 DOC classifier performance across term categories. The classifier performs best on correctly spelled, specific drug terms; worst on general drug terms.
7.2 Examples of DOCs extracted by our CRF classifier. Identified SOA terms are shown in bold in the context of their originating sentence, and the resolved drug name, generic name and category are shown on the right.
7.3 Summary of similarities and differences between our Forum77, NA, TEDS and DAWN datasets. Forum77 is unique in that participation is always voluntary and that users report only substances that they deem relevant.
7.4 Alignment of categories across the Forum77, NA, TEDS and DAWN datasets for comparative purposes. Exact category terms from each survey have been preserved in this table for replicability.
8.1 Addiction Phase Taxonomy derived via a thematic analysis.
8.2 Sample phase specific terms for the USING, WITHDRAWING and RECOVERING categories.
8.3 CRF performance scores aggregated over 10 runs of 10-fold cross validation, with randomly shuffled input sets.
8.4 Performance for identifying relapse events (top) and whether a user’s final state is RECOVERING (bottom). Combined scores across classes are shown in bold.
8.5 Comparison of activity features for users who are and are not RECOVERING in their last initiating post. Per-user values are aggregated over USING and WITHDRAWING posts. Statistical significance is determined using Kruskal-Wallis tests (*** p < 0.001) after Bonferroni corrections.
A.1 The following features are specified when training our CRF. Other features retain their default values as described at http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html
B.1 Features used to train our purpose classifiers, which distinguish emotional from informational support seeking, as well as update from non-update posts.
C.1 Drug term resolution map, manually compiled from classifier output. The i column indicates whether the drug category is included in our analysis in Chapter 7.
C.2 The default feature list for Stanford’s NER classifier is at nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html. Here, we list all features whose default values were changed to train our DOC classifier.
C.3 Gazette of common substances used as a feature in the DOC classifier. This gazette was compiled from a range of online resources.
D.1 LIWC features for the three classes in the labeled dataset over initiating posts. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes activity features). Column c denotes (◦) if the feature is used in our CRF classifier.
D.2 LIWC features for the three classes in the labeled dataset. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes activity features). Column c denotes (◦) if the feature is used in our CRF classifier.
D.3 Activity and content-based features for the three classes in the labeled dataset. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes 160 LIWC variables). Column c denotes (◦) if the feature is used in our CRF classifier.
List of Figures
1.1 Our general methodological process. Nodes in grey show avenues for future work supported by our contributions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1 Illustrative example of MedHelp and Forum77 content and structure.
. . . . . . . . . . .
5
37
4.2 Summary statistics of Forum77 variables: post volume by month (A), user volume by
month (B), thread length distribution (C), user tenure distribution (D), user initiating post
count distribution (E), and user response post count distribution (F). . . . . . . . . . . . .
38
5.1 Final PAT medical term identification task instructions and interface. Turkers were informed
that their answers would be checked against other Turkers’ in the HIT description on the
MTurk interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
5.2 Sample sentences labeled by ADEPT, the dictionary, MetaMap, OBA and TerMINE. . . .
54
5.3 Term classification accuracy plotted against logged term frequency in test corpora. Purple
(darker) circles represent terms that are always classified correctly; blue (lighter) circles
represent terms that are misclassified at least once. A LOWESS fit line to the entire data
set (black) shows that most terms are always classified correctly. A LOWESS fit line to the
misclassified points (blue/lighter) shows that classification accuracy increases with term
frequency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
55
5.4 Top 50 terms, ranked by frequency, derived from MedHelp’s Arthritis forum as determined
by ADEPT (left) and OBA (right). Terms unique to their respective portion of the list are
shown in bold. Terms occurring in both lists are linked with a line. The gradient of these
lines show that all co-occurring terms, bar three, are more highly ranked by ADEPT. . . .
58
5.5 A graph showing important terms in Forum77 (nodes), and significant co-occurrence relationships between them (edges). Node size is proportional to degree, while colors indicate
clusters. Node labels are omitted for legibility; instead, we examine main clusters in-depth
in subsequent figures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xvi
59
5.6 The largest cluster in Figure 5.5 suggests that discussions frequently involve detoxification
from prescription drugs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
60
5.7 The second-largest cluster in Figure 5.5 suggests that discussions frequently pair specific
drugs and the withdrawal symptoms that they cause. . . . . . . . . . . . . . . . . . . . .
60
5.8 The third-largest cluster in Figure 5.8 contains medically relevant terms from Thomas’
Recipe: a user-developed schedule for medication-assisted opioid withdrawal. . . . . . .
61
6.1 Thematic analysis process. Orange edges indicate the iterative component of the analysis.
6.2 Normalized transition probabilities and average transition times between consecutive update and non-update posts.
7.1 Drug of choice distributions (% of population using) across the Forum77, TEDS, NA and
DAWN data sets.
7.2 Prevalence of major opioids in the Forum77 population over time.
8.1 Illustration of how sequence analysis can (1) reduce NA labels by leveraging context from
surrounding posts, and (2) capture relapse events in regressive sequences without requiring the user to explicitly state that she relapsed.
8.2 Confusion matrix for our CRF classifier aggregated across 10 randomized runs of 10-fold
cross validation.
8.3 (a) Normalized transition frequencies between addiction phases (e.g., USING → RECOVERING
edges comprise 1.12% of the total transitions in the CRF-labeled data) and (b)
conditional transition probabilities (e.g., the probability of a user moving from USING to
RECOVERING is 4.57%).
8.4 Distributions of phase lengths. A red bar indicates the median value, while the dark blue
region indicates the middle spread. The light blue region indicates values that fall within
1.5 × the interquartile range of the middle spread.
8.5 Aggregated user transitions from start to end state. Bar widths denote population proportion. For example, 48% of users in our sample relapsed during their tenure on Forum77.
9.1 Our general methodological process. Nodes in grey show avenues for future work supported by our contributions.
Chapter 1
Introduction
Just keep in mind that whether you recommend online support groups or not, your patients
will use them. There’s no getting around the fact that certain patients in your practice will become as knowledgeable about their conditions as they can. They will also begin to develop
clinical judgment on their own.
– Deborah Grandinetti: Doctors and the Web. Help your patients surf the Net safely [104].
1.1 Overview & Focus
The Internet has revolutionized the way in which people interact with medical knowledge, transforming
its availability, leveling the playing field in terms of who can contribute such knowledge, and facilitating connections between people with shared health concerns. While to this day accessing and sharing
medical knowledge via traditional resources (e.g., medical practitioners, textbooks, pamphlets) requires overcoming financial, scheduling and geographic barriers, online
resources are largely free of such frictions. Indeed, the use of the Internet as a health resource is one of its earliest functions: with the
commercialization of the Internet in 1995, patients readily took advantage of the ability to collaborate
with others who shared their health concerns, and the first online health communities (OHCs), in the
form of listservs, came into existence.
Demand for such groups remains high today. Pew’s 2013 Health Online survey [91] reported that
59% of U.S. adults looked online for health information in the last year, and that of these, 16-18% specifically sought to find others who shared their health concerns. Based on the U.S. Census Bureau’s
population estimate for 2013 [9], this comprises some 50-57 million people. Today, thousands of OHCs
exist, and while their interfaces have become slightly more sophisticated, their underlying functionality of
connecting patients with mutual health interests remains unchanged.
Through participation in online mutual help groups, patients can spend a sizable number of hours
performing complex, health-related tasks. These include differential diagnoses (of either their own or
someone else’s condition), treatment comparison and evaluation, symptom measurement and documentation, and seeking and providing emotional support. To perform these tasks, patients draw on a
variety of resources: their own experiential knowledge, observations of other community members’ experiences, information sourced from healthcare providers, and the fruits of self-directed research efforts.
The culmination of this effort is a massive, and growing, corpus of data contributed by patients who have
gained not a small degree of clinical expertise in their own condition. Although in some cases these data
are structured (e.g., PatientsLikeMe¹ and CureTogether² collect symptom severity measurements on numerical scales), for the most part OHCs have barely deviated from the original listserv format, meaning
that a large portion of these data exist as free-form text.
We term any medical text authored by patients patient authored text (PAT). PAT contains inherently
valuable content. Foremost, PAT uniquely documents patients’ behavior outside of the clinical environment. As such, it can offer insight into topics that remain obscure in traditional medical data sets, such
as why patients follow only certain parts of a treatment protocol, or how people self-manage conditions
that carry a stigma in the medical profession, like addiction [176, 187]. Answers to such questions could
have high-level policy impacts on healthcare systems, potentially affecting both their efficiency and efficacy. PAT may also contain data of immediate medical value. Prior work has leveraged PAT to identify
disease trends [33, 41] and adverse drug events [257]. Through active collaboration, OHC participants
have uncovered novel insights into disease co-morbidities (such as a correlation between asthma and
infertility [40]) and drug-treatment effects (such as the questionable efficacy of lithium as a treatment for
ALS [260]) which have been replicated in subsequent medical trials [97, 260]. Finally, medically-relevant
data derived from PAT could be used both to enhance community design and to support members in
tasks that they already perform, such as polling treatment popularity or sourcing drug reviews.
In spite of the inherent value in PAT and the enormous number of human-intelligence hours invested
in its creation, attempts to leverage PAT have been limited for three main reasons. First, PAT is notoriously
noisy and often incomplete, making it challenging to work with. For example, the fact that authors may
have only partial mastery over medical terminology casts the accuracy of their symptom descriptions
into doubt. Moreover, they may omit important information and their contributions may be infrequent and
irregular.
¹ http://www.patientslikeme.com
² http://www.curetogether.com
Second, and closely related, is the dearth of methods, approaches and toolkits for extracting medically-meaningful data from PAT. Take, for example, the basic problem of identifying medically-relevant terms
in PAT. While well-established toolkits for extracting medical terms from text authored by medical experts
exist, as we show in Chapter 5, their performance on PAT is sufficiently poor that the resulting output is
of dubious analytic value.
Third, the question of whether PAT contains data of medical relevance is contentious. As we discuss
in detail in Chapter 2, medical professionals especially take issue with such claims. Even taking an open-minded perspective, however, the medical relevance of PAT in relation to a specific research question is
usually unclear, and must be determined empirically. This relevance tends to depend on how well the
research question aligns with users’ motivations for authoring PAT. For example, because people mention
their influenza-like symptoms on social media platforms, Twitter is a viable data source for monitoring
influenza outbreaks [10, 15, 62, 213]. However, Twitter would be a poor data source for comparing drug
dosage efficacy, because people do not consistently tweet drug dosages, schedules, and self-reported
wellness metrics. Determining what medically relevant signals are present in PAT is a challenge separate
from extracting them.
Our goals in this work are twofold: first, to develop methods for extracting a variety of medically-relevant data from PAT; second, to uncover medically-meaningful insights through the application of
these methods. To this end, we focus specifically on the topic of addiction, studying Forum77: MedHelp’s³
online health community for Addiction & Substance Abuse. Addiction is both highly prevalent,
affecting 16% of Americans aged 12 or older (about 40 million people), which far exceeds the number of people afflicted with heart disease (27 million), diabetes (26 million), or cancer (19 million) [4],
and highly stigmatized, even within the medical profession [176, 187]. These facts conspire to make
addiction-related PAT a rich source for novel and impactful insights.
Our work draws from and contributes to several fields in Computer Science. From the Human-Computer Interaction perspective, we investigate crowdsourcing as a method for large-scale data annotation,
and leverage methodological work on thematic analyses to develop taxonomies of medically relevant
information contained in our PAT data sources. From the Computer-Supported Cooperative Work perspective, we investigate the types of support that users give and receive, and analyze on-site behavioral
and content features that correlate with successful and unsuccessful participatory outcomes in Forum77.
On the Natural Language Processing side, we evaluate the application and extension of existing statistical classification methods to a variety of PAT information extraction tasks. Finally, to help ensure the validity
of our work from a medical perspective, we collaborated closely with an addiction specialist: a practicing
psychiatrist who specializes in substance use disorders.
³ http://www.medhelp.org
1.2 Contributions
In concert, this thesis contributes a viable, multi-stage approach for finding and extracting data of medical relevance from PAT. The specific contributions of this thesis are:
Targeted literature reviews that serve both to illuminate the landscape of related work as well as contextualize our own work. In particular, we review:
Online health seeking behavior: via a cross-disciplinary literature review, we first synthesize an
overview of the demographics, methods and motives of people who seek health information online.
Next, we narrow our focus to the specific topic of OHC participation, exploring users’ reasons for
participation as well as whether and how such participation is beneficial (Chapter 2).
Prior work analyzing patient authored text: we conduct an extensive review of literature utilizing
PAT as a primary data source, including work on pharmacovigilance, syndromic surveillance, entity
extraction and thematic analyses (Chapter 3). To our knowledge, this review is the first comprehensive synthesis and summary of data sources, methods, goals and outcomes of prior work that
utilizes PAT as a primary data source.
Methods for extracting medically-relevant data from PAT. Our characteristic methodology, illustrated
in Figure 1.1, moves through human categorization and labeling of data to automatic extraction and
analysis. Accordingly, our methods comprise multiple stages, including inductive content analysis, data
annotation, feature engineering, classifier training and result analysis. Our specific contributions are:
A method for crowdsourcing medically-relevant term annotation in PAT. Having medical experts annotate data is both costly and slow. We show that for the task of identifying medically-relevant terms in PAT, a crowd of non-experts yields annotations comparable in quality to those
submitted by medical professionals (Chapter 5).
Data-driven annotation rubrics describing what users seek when they initiate posts on Forum77
(Chapter 6), as well as the phases of addiction that users exhibit on Forum77 (Chapter 8). These
rubrics, educed via thematic analyses of Forum77 content, serve as novel contributions in their
own right as well as reusable guides for data annotation.
A novel analysis of behavioral and linguistic features that correlate with each phase of addiction. The results of this feature space analysis (Chapter 8) give novel insight into how the
psychologically and physiologically distinct phases of addiction correspond with Forum77 users’
behavior and linguistic usage. They are also a valuable resource for feature design and engineering.
Trained classifiers that accurately extract medically-relevant data from PAT. We train classifiers that accurately extract medically-relevant terms (Chapter 5), addictive drugs of choice (Chapter 7), phase of addiction at the time of writing (Chapter 8) and the type of support that a user is
seeking when she initiates a thread (Chapter 6) from PAT. These classifiers are novel in function.
We make them freely available to support future work and comparisons in this area.
[Figure 1.1: process diagram. Recoverable node labels: PAT; Content Schema; Labeled Data (human); Features; Classifier; Labeled Data (auto); Processed Data; Insights; PAT interface design; Medical Discovery. Recoverable edge labels: close reading; annotation; schema revision; training; tuning; application; processing & analysis.]
Figure 1.1: Our general methodological process. Nodes in grey show avenues for future work supported
by our contributions.
Medically-relevant insights on Addiction. Our classification methods allow us to scale our analyses
to the entire Forum77 population. Some of the resulting insights are, to the best of our knowledge, novel
to both the Computer Science and the Addiction literature. These insights include the discovery that:
Users actively collaborate on developing highly effective medication-assisted withdrawal
treatment protocols. The most prevalent example of this is Thomas’ Recipe, a detailed protocol
for medication-assisted opiate withdrawal that has evolved on Forum77 over the course of several
years (§ 6.8.1).
The Forum77 population consists almost entirely of people struggling with prescription
opioid abuse, making it strongly distinct from traditionally surveyed drug-using populations. Our
results provide evidence that such populations are not well covered by existing medical research methods.
While relapse is common, chances of a user leaving Forum77 in the state of RECOVERING
are favorable. Although different methodological approaches make comparison with real-world
treatments difficult, our results suggest that Forum77 is an effective self-detoxification resource.
Active participants are more likely to leave Forum77 in a state of RECOVERING. Such users
participate significantly more frequently than those who leave in a state of ¬ RECOVERING, even
when they are USING and WITHDRAWING. This resonates with prior research that shows that
increased participation in the traditional mutual help group Alcoholics Anonymous correlates with
sustained sobriety [190, 223].
1.3 Outline of Thesis
Chapters 2-4 serve to contextualize our work and give the reader a framework for reference and evaluation. Chapter 2 presents a targeted literature review of online health information seeking. We begin
with a broad overview of online health information seeking (§ 2.1) before focusing on the question of who
participates in OHCs, their motivations for doing so, and the associated benefits and pitfalls of participation (§ 2.2). Chapter 3 begins with a definition of PAT accompanied by a discussion of its values and
the challenges that it presents (§ 3.1). Next, we synthesize prior work that utilizes PAT as a primary data
source, including syndromic surveillance (§ 3.2), pharmacovigilance (§ 3.3), Named Entity Recognition
(§ 3.4) and Thematic Analyses (§ 3.5). Chapter 4 describes the data sets that we use in our work: the
MedHelp corpus (§ 4.1), which includes the Forum77 data set, and the CureTogether corpus (§ 4.2).
While PAT contains a wealth of information, it is inherently noisy, and requires text mining techniques
to extract data of value. In Chapter 5, we address one of the most basic problems of this sort: identifying
medically-relevant terms in PAT. After discussing related work (§ 5.2) and data preparation (§ 5.3), we
explore the feasibility of replacing experts with non-expert crowds in medical term annotation tasks
(§ 5.4). Next, we show that a conditional random field (CRF) model trained on crowd-labeled data
dramatically outperforms state-of-the-art medical term annotation tools (§ 5.5). Finally, we demonstrate
the effectiveness of our approach through applying our classifier to large PAT corpora (§ 5.6). While
our results demonstrate the efficacy of our approach, we find that the extracted data are too broad for
deriving insights on specific medical conditions. We narrow our focus to the topic of addiction, one of the
most urgent public health issues of the day.
Understanding why people participate in Forum77 is a precursor to more targeted analyses. Chapter 6 poses the question, “what do people seek on Forum77?”. We first motivate studying the topic of
addiction (§ 6.1), before discussing related work (§ 6.2) and data preparation (§ 6.3). Next, we present the
process and result of a thematic analysis of users’ motivations for initiating Forum77 discussions (§ 6.5).
Congruent with prior work, driving motivations are the seeking of informational and emotional support.
In terms of informational support, we find that users primarily seek explicit medical advice on prescription opioids. In the emotional support category, the update post, in which users log their progress but
request no feedback, is highly prevalent. We train machine learning classifiers to distinguish emotional
from informational support-seeking (§ 6.6), as well as update from non-update posts (§ 6.7). Finally, we
present and discuss the results of applying our classifiers to the entire Forum77 data set (§ 6.8 & § 6.9).
Chapter 7 establishes whether the Forum77 population is similar to traditionally surveyed drug-using
populations in terms of drugs of choice (DOCs). We first discuss related work (§ 7.1) as well as our
data preparation and sampling (§ 7.2). Next, we present our method for automatically extracting users’
DOCs from Forum77 initiating posts (§ 7.3), which comprises data annotation, classifier training and term
resolution. We then detail how we compare our classifier-derived Forum77 DOC distribution with those
from three traditionally-surveyed drug-using populations (§ 7.4). Among other things, our results (§ 7.5)
indicate that Forum77 is used primarily by people struggling with prescription opioid use disorders, rather
than by people using traditionally-abused substances such as alcohol, cocaine and marijuana.
Finally, we discuss the implications and opportunities revealed by these results (§ 7.6).
Chapter 8 focuses on the topic of the cycle of abuse, a well-known concept whose stages and
transitions, to the best of our knowledge, have never been quantified. Drawing on the addiction literature,
we first describe the phases of drug abuse and define key terminology (§ 8.2), and then describe our data
preparation and sampling (§ 8.3). Next, building on the well-known Transtheoretical Model of Behavior
Change [203], we develop a taxonomy describing the phases of addiction as they are expressed on
Forum77 (§ 8.4). We then analyze a variety of behavioral and content-based features in order to identify
features that discriminate between the phases USING, WITHDRAWING and RECOVERING (§ 8.5). Next,
we present our statistical classifier for identifying addiction phase (§ 8.6), and discuss how this enables
us to identify important sequences in the process of addiction, such as relapse and recovery (§ 8.7).
Aggregating these events across the entire Forum77 membership base indicates, amongst other results,
that although relapse is common, reaching a state of RECOVERING prior to leaving the forum is likely
(§ 8.7.3).
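As a rough illustration of the kind of sequence aggregation that Chapter 8 describes, the sketch below computes normalized transition frequencies and conditional transition probabilities from per-user sequences of phase labels, and flags regressive sequences. It is not the thesis implementation: the toy sequences, the function names, and in particular the relapse rule (any USING label that follows a WITHDRAWING or RECOVERING label) are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical per-user phase label sequences, ordered by post time (toy data).
sequences = [
    ["USING", "WITHDRAWING", "RECOVERING"],
    ["USING", "WITHDRAWING", "USING", "WITHDRAWING", "RECOVERING"],
    ["RECOVERING", "USING"],
]


def transition_counts(seqs):
    """Count consecutive (from_phase, to_phase) pairs across all sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts


def normalized_frequencies(counts):
    """Share of each transition among all observed transitions."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def conditional_probabilities(counts):
    """P(to_phase | from_phase): normalize within each source phase."""
    out_totals = Counter()
    for (src, _), n in counts.items():
        out_totals[src] += n
    return {(src, dst): n / out_totals[src] for (src, dst), n in counts.items()}


def has_relapse(seq):
    """Assumed rule: a USING label after any WITHDRAWING or RECOVERING label."""
    left_using = False
    for phase in seq:
        if phase in ("WITHDRAWING", "RECOVERING"):
            left_using = True
        elif phase == "USING" and left_using:
            return True
    return False


counts = transition_counts(sequences)
freq = normalized_frequencies(counts)
cond = conditional_probabilities(counts)
relapsed = [has_relapse(seq) for seq in sequences]
```

The two normalizations answer different questions: the first asks what share of all transitions a given edge accounts for, while the second asks how likely a user in a given phase is to move to each other phase. Note that the relapse rule needs no explicit statement of relapse from the user; it relies only on the order of the labels.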
In Chapter 9, we reiterate the main contributions of this thesis (§ 9.1), outline challenges for
future work (§ 9.2), and offer our concluding thoughts (§ 9.3).
Chapter 2
The Internet and Health
Millions of people around the world seek health information online, and have been doing so since the
earliest days of the Internet [166]. But who are these people, and what do they seek? Our goal in this
chapter is to provide readers with a contextual backdrop against which to interpret our work. Drawing
on prior work from Computer Science, Medical Informatics and Medicine, we first describe online health
information seeking in general (§ 2.1), beginning with an historical overview before investigating what
kinds of information people seek, who seeks this information, and where. Next, we focus on a specific
subset of online health information seeking: online health community (OHC) participation (§ 2.2). We pay
particular attention to who participates, their motivations for doing so, and potential benefits associated
with participation. Finally, we summarize our findings (§ 2.3) before moving on to a literature review of
prior work utilizing PAT as a primary data source (Chapter 3).
2.1 Online Health Information Seeking
2.1.1 Historical Overview & Current Landscape
When the Internet was commercialized in 1995 [120], widespread consumer adoption brought with it
widespread supply and demand for health information [49]. The Internet made health information more
accessible. An example illustrates: between 1997 and 1998 the National Library of Medicine (NLM) made
Medline¹, a repository of journal citations and abstracts from the biomedical literature previously only
available to medical professionals, publicly accessible online. The number of queries to Medline increased roughly seventeenfold, from 7 million to 120 million, with more than 30% of new queries stemming
from consumers [49]. In response, the NLM launched MedlinePlus², a site hosting information targeted
specifically at patients and their families [49]. The move was a roaring success: in the first quarter of
1999, MedlinePlus had 62,638 unique visitors. Since then, this statistic has only increased: in the third
quarter of 2013, the site had ∼81,000,000 unique visitors [172].
¹ https://www.nlm.nih.gov/bsd/pmresources.html
² http://www.nlm.nih.gov/medlineplus
In addition to making health information more accessible to consumers, the Internet also broadened
the scope of potential contributors: for the first time, health information could be easily sourced from and
exchanged between patients. Widespread, patient-driven mutual help efforts unfolded simultaneously
with the commercial web. As early as 1997, Salem et al. [215] published an analysis of an online mutual
help group for depression; their study covered 2 weeks’ worth of data and comprised 533 participants.
Even earlier, in 1996, Mayer and Till [166] published a short, interview-based study of a breast cancer
listserv allegedly utilized by thousands of patients. Today, a full 8% of Internet users in the U.S. report
either sharing a personal health experience or posting a related question online [91].
The revolution in how health information was created and shared was received primarily positively by
consumers and sociologists, who celebrated its potential for “democratizing” healthcare and rebalancing
the power dynamic in doctor-patient relationships [182]. The reaction from the medical community was
substantially more turbulent. Early research on online health information seeking raised concerns about
the quality of the information available, as well as patients’ ability to evaluate it critically [49, 156, 181,
182, 199, 210]; some even described the phenomenon as an “epidemic of misinformation” [51]. Indeed,
discussion in the medical literature at the time communicates a strong resistance to the idea of patients
pursuing medical knowledge outside the purview of a medical professional [104, 182]. For example, in
2000 the Journal of Medical Economics initiated a series of articles aimed to educate doctors about
online resources so that they, in turn, could guide their patients through the plethora of available online
health information resources. The first article in the series is titled, “Doctors and the Web: Help your
patients surf the Net safely” [104].
Despite these concerns, analyses of online health seeking behavior indicate that patients are, in
fact, highly skeptical of information presented online and take care to evaluate it critically [21, 105, 156,
178, 182, 205, 209, 225]. Patients tend to mistrust information from websites that appear to be primarily
commercial [92, 182], have unclear sources of information [92], or that seem unprofessional or highly
opinionated [182]. Moreover, rather than taking a single source at face value, patients typically evaluate
information quality by aggregating information from multiple sources [82, 92, 182, 205], and even posing
and testing hypotheses from one information source to the next [225]. That said, online health seekers
are not infallible: cyberchondria (the escalation of a user’s perception of the severity of her medical
state as a result of researching it online) has been documented, and results in increased
stress levels and potentially unnecessary use of available medical resources [254, 255].
Measuring the quality of online health information is challenging. Prior work finds that information
accuracy tends to be high [25, 80]. For example, in an independent evaluation of 4,600 posts on The
Breast Cancer Mailing List³, Esquivel et al. [80] found only 10 (0.22%) posts containing misleading or
incorrect information. Of these, 7 were identified as such by participants and corrected within 4.5 hours.
However, the majority of studies from the medical domain conclude that online health information is of
subpar quality [21, 83]. A commonly cited point of failure is whether the information is “complete” (covers all
medically-relevant details). However, the value of the completeness metric has been called into question.
First, including all relevant medical information might overload readers [83].
Second, as patients typically synthesize medical information from a variety of sources [82, 92, 182, 205],
they are likely robust to gaps in any single source. Patients themselves report that in general they have no trouble finding the
information that they need online [92, 105].
Despite this, strong resistance, and even condescension, from medical professionals is a common
response to the idea of patients pursuing medical knowledge online. “Many of the participants reported
symptoms that they attributed to using a computer keyboard, so it appeared incongruous that they turned
for help to an activity that required more typing”, quip Culver et al. [64] in an evaluation of an online
health community on Carpal Tunnel Syndrome. Yet even amongst surveyed physicians, there is general
agreement that the result of patients pursuing medical information online is rarely harmful, and in fact can
be moderately beneficial [181, 199]. One explanation for the resistance may be that the public dissemination of medical
knowledge, which was previously exclusive and difficult to access, challenges medical professionals’
dominance as medical experts [116]. Indeed, many physicians who feel that online health information
seeking negatively impacts the doctor-patient relationship also feel that their patients are challenging
their authority [11, 181]. Today, almost 20 years later, most research agrees that the nature of the
patient-doctor-internet relationship remains in flux, with resistance from the medical field barring potential
synergies from reaching fruition [14, 121].
³ http://www.bclist.org
2.1.2 What Health Information Do Users Seek Online?
Despite the concerns echoed in the medical literature, patients seem disinclined to stage a cyber coup
d’état against the medical profession. In fact, with the exception of teens [105, 209], patients rarely consider the Internet their primary or most important source of medical information [82, 165, 200]. Rather,
information acquired online tends to supplement or complement that acquired through traditional channels [82, 149, 209], and is often sought for the express purpose of discussing it with a medical practitioner [49, 92, 181, 205, 225]. Moreover, patients have preferences over which types of information they
would prefer to acquire online: respondents to Pew’s 2010 Peer-to-peer Healthcare Survey [90] said
that they would prefer to communicate with medical professionals for information regarding prescription
drugs and alternative treatments, an accurate diagnosis, and recommendations for other medical professionals and medical facilities. Peers and professionals were rated as equally helpful for practical advice
for day-to-day coping, and peers were rated most helpful for emotional support and quick remedies for
non-urgent, everyday health issues.
Major categories of online information sought by patients include finding disease-specific information [49, 91], finding information about particular medical treatments or procedures [91], and attempting
to diagnose or treat a new condition [49, 91]. In fact, Pew’s 2013 Health Online survey found that 35%
of American adults tried to diagnose a condition using information found online; of these, roughly half
followed up with a medical professional [91]. Cartright et al. [42], who analyzed user search logs surrounding self-diagnosis attempts, observed two patterns: evidence-based searching, in which users
searched for a condition that matched a set of symptoms and risk factors, and hypothesis-based searching, in which given a specific condition, users searched for symptoms and risk factors associated with that
condition. Minor categories of health information sought online include finding information about health
insurance, food and drug safety recalls, interpreting medical test results, information on weight loss [91],
and finding reviews on medical professionals or medical facilities [49]. Finally, an estimated 16-18% of
online health seekers go online specifically to find others who share their health concerns [90, 91].
2.1.3 Who Seeks Health Information Online?
Early proponents of the Internet as a health information resource touted its potential as a liberating
technology for those with limited access to traditional health resources [182]. In some ways this is
true: online health information seeking seems to be need-driven, with those suffering from chronic or
stigmatized conditions more likely to seek health information online. However, survey-based research
also points to a strong “digital divide” between those who have access to, and are comfortable using,
the Internet and those who do not as a determinant of who searches for health information online. We discuss discriminating
features in detail below.
Gender
Women are more likely to seek health information in general [57], and this trend is mirrored online [37, 57,
90–92, 165] despite the fact that men and women have equal access to the Internet [91]. Pew’s Health
Online [91] survey in 2013 estimated that 53% of all U.S. male adults look for health information
online, while the corresponding statistic for U.S. female adults is 64%. Extrapolating from the 2013 U.S.
Census results [9], approximately 55% of online health seekers are female.
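The extrapolation above can be reproduced approximately with a short calculation. The following is a sketch under a stated assumption, not the exact computation behind the reported figure: p_f denotes the share of U.S. adults who are female, and the value p_f = 0.5 is an illustrative placeholder rather than the Census value.

```latex
% Assumption: p_f is the female share of U.S. adults; 0.5 is a placeholder, not the Census figure.
\[
  \Pr(\text{female} \mid \text{online health seeker})
  = \frac{0.64\, p_f}{0.64\, p_f + 0.53\,(1 - p_f)}
\]
\[
  p_f = 0.5: \qquad
  \frac{0.64 \times 0.5}{0.64 \times 0.5 + 0.53 \times 0.5}
  = \frac{0.32}{0.585} \approx 0.547 \approx 55\%
\]
```

Because the expression weights each gender's seeking rate by its population share, a female share slightly above one half would push the result slightly higher.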
In a survey exploring online health information seeking in 2000, Fox & Rainie [205] describe several
differences between men’s and women’s health seeking behavior. First, while men and women are
equally likely to search for information in relation to a parent or older relative, women are twice as likely to
search for information on behalf of a child. This likely reflects the fact that women spend more time
on child care [192]. Second, women are more likely to search for information related to specific conditions
(either physical or mental), while men are more likely to search for information related to sensitive topics
and for information on treatment timelines and administration [205].
Age
Studies measuring the age distribution across online health information seekers report that it is relatively
uniform among adults until the age of 65, at which point it declines [37, 91, 165]. This is contrary to the
fact that health needs generally increase with age, and stands in contrast to the age distribution over
offline health information seekers, who tend to be older (mean age 40 vs. 52) [57]. Both Cotten et
al. [57] and Bundorf et al. [37] hypothesize that this discrepancy is due to the fact that younger people
have more access to and experience with using the Internet. In fact, health information seeking is one
of the most common and important online activities for young people [105, 209]. A random-dial survey
of 1,209 respondents aged 15-24 initiated in 2002 by healthcare provider Kaiser [209] found that 75% of
respondents had looked for health information online: more than had downloaded music (72%), played
games (72%), shopped online (50%) and participated in chat rooms (67%). In fact, many young people
consider the Internet to be their primary source of health information [105].
Health
People suffering from chronic conditions (e.g., asthma, diabetes) [37, 90] and people suffering from
stigmatized conditions (e.g., anxiety, herpes, addiction) [24, 67] are highly likely to seek health information online. A casual inspection of our own MedHelp data set (described in Chapter 4) corroborates
this: 8 of the top 20 forums focus on stigmatized or otherwise embarrassing conditions including addiction, Hepatitis C, STDs and HIV (see Table 4.1). Other health characteristics that correlate with online
health information seeking include experiencing a medical crisis within the past year [90], experiencing
a significant change in physical health (e.g., weight loss/gain, smoking cessation) [90], having a rare
condition [90], and having significant barriers to health care (e.g., expense, travel distance) [37].
This suggests that online health seeking behavior is need-driven; however, other evidence also points
to a digital divide: people are more likely to seek health information online if they have health insurance [91] and a regular healthcare provider [165]. Finally, online health seekers self-report as being
healthier than their offline counterparts [57].
Race
Pew’s 2013 Health Online survey [91] reports that 83% of Caucasian adults go online: significantly more
than adult African Americans (74%) and Latinos (73%). Therefore, at a population level, significantly
more Caucasians search for health information online. In a study of online health information seeking
in youth, Rideout et al. [209] observe the same phenomenon, noting that fewer African American and
Hispanic youth in their survey had Internet access at home.
Controlling for adults who use the Internet shows no significant differences in ethnicity between those
who search for health information online and those who do not. In addition, Cotten et al. [57] find
no significant differences in ethnicity between online and offline health seekers. However, Pew’s 2013
Health Online survey [91] highlights some statistically significant, ethnicity-based differences in what
kind of information people seek. For example, Caucasians are more likely than African Americans and
Latinos to look online for a diagnosis and for information pertaining to a specific disease/condition, and
are less likely to search for information on weight loss. African Americans are more likely to conduct
online research on a drug seen in advertising, while Latinos are more likely to search for information on
pregnancy.
Socio-Economic Status & Education
Online health seekers tend to have higher income levels than those who do not seek health information
online [57,74,165]. In addition, higher levels of education correlate with online health seeking [57,74,91,
165]. This again suggests a digital divide, with those who have ready Internet access being more likely
to use it as a health information resource. However, other work points out that literacy and language barriers
can prevent people from engaging fully with online health resources [25, 49].
Role (Patient vs. Caregiver)
Queries conducted on behalf of someone else (e.g., a child, a parent or other older relative, or a
friend) comprise roughly 50% of all online health inquiries [90, 91]. Usually such “caregivers” are either women [205] or parents [91] (or both).
2.1.4 Where Do People Find Health Information Online?
There are myriad ways of accessing health information online. We highlight those most often discussed
in related work.
Search Engines
The majority of online health information quests start at a search engine such as
Google4, Yahoo5 or Bing6 [82, 91, 114, 178]. Users iteratively refine their queries based on search results [82, 114], and in the majority of cases are successful in finding the information that they are looking
for [92, 114].
Medical Information Portals
Sites such as WebMD7 and MedlinePlus8 serve as medical information
portals and are heavily utilized [172]. However, it is rare for online health seekers to have a favorite or
“go-to” information portal [92], and they are rarely the starting point of a user’s search [82].
Online Health Communities
Online health communities (OHCs) provide an interactive environment
in which users can seek others familiar with their health concerns and acquire tailored information.
These groups provide social support, information and shared experiences, and can be empowering
for patients [49]. Prior work indicates that a significant proportion of online health seekers ultimately
participate in an OHC, with estimates ranging from 8% [91] to 16% [90] to 25% [49]. We discuss OHC
participation in depth in the next section.
4 http://www.google.com
5 http://www.yahoo.com
6 http://www.bing.com
7 http://www.webmd.com
8 http://www.nlm.nih.gov/medlineplus
2.2 Online Health Community Participation
Having outlined the landscape of online health information seeking in general, we now turn to the specific topic of online health community participation. Where possible, we expand on any relevant details
introduced in § 2.1. We briefly discuss modes of participation (§ 2.2.1), before addressing the question
of who participates in OHCs (§ 2.2.2), why (§ 2.2.3), and what measurable benefits may result from their
participation (§ 2.2.4).
2.2.1 Modes of Participation
OHCs typically comprise environments in which users communicate via posted messages. There are
three primary forms of participation on an OHC: users start new discussions by contributing initiating
posts, and respond to existing discussions with response posts. The third, much overlooked, mode of
participation is lurking, in which users read community-generated content, but never contribute or make
their presence known in any way. Lurking is prevalent in all kinds of online communities [185, 202],
although possibly less so in OHCs [186]. Prior work suggests that lurkers’ demographics
and motivations for participating align closely with those of active OHC participants [202]. Moreover,
lurkers and active members derive the same benefits from OHC participation [246]. As defining and
measuring lurking behavior is challenging, we do not discuss it further in our own work, but note here
that capturing lurking behavior is an important avenue for future work.
2.2.2 Who Participates in OHCs?
Demographic analyses of OHC participants similar to those offered in § 2.1.3 are scarce. Unlike
general health information resources, OHCs focus on specific medical conditions, many of
which correlate with particular demographic factors. For example, people suffering from breast cancer
tend to be female, and people suffering from Alzheimer’s tend to be older.
However, consistent with research on online health seeking behavior [24], Davison et al. [67] find
that social factors that predict face-to-face support group seeking correlate with those that predict
online support group seeking. Specifically, conditions that are embarrassing, stigmatized, or disfiguring,
as well as conditions in which a patient’s attitude towards the condition is important in treatment outcome,
lead people to seek the support of others with similar conditions online.
2.2.3 Reasons for Participation
A user’s overarching goal in joining an OHC is to align herself with other people who share her health
concerns [90, 96, 259]. A great deal of literature examines patients’ perceived benefits to OHC participation. Results tend to fall into one of three categories: (1) medium-based affordances, in which users
cite practical advantages related to the fact that OHCs are online, digital resources; (2) informational
support; and (3) emotional support. We discuss each of these in detail below.
Medium-Based Affordances
By nature of being online and digital, OHCs have several unique characteristics that users view as
advantageous, such as the convenience of having the community be available around the clock [49, 60,
162, 205, 275]. Other factors cited include providing access to a wide range of people, information and
experiences [162, 205]; the fact that such information is personalized or tailored [49]; the ability to store
and edit personal narratives [117, 162]; and the perception of privacy and anonymity on OHCs [49, 105,
205, 270, 275]. Users’ ability to conceal their true identities has also been credited with increasing their
propensity to discuss issues that they would not discuss face-to-face [21,105,149]. Finally, OHC content
is easily searchable, making it easier for patients to browse and filter for suitable people to approach for
help. In an analysis of PatientsLikeMe, Frost et al. [96] conclude that searching for similar users is the
primary motivation behind patients’ sharing their data with each other.
Informational Support
The two most cited benefits of OHC participation are the informational and emotional (sometimes called
“social”) support given by the community [36, 47, 86, 122, 131, 148, 149, 162, 211, 243, 250, 258]. Informational support constitutes the exchange of clinical as well as experiential knowledge relevant to a
particular condition. Typical topics of discussion include treatments and treatment options [47, 96, 258],
symptoms [96, 258], preventive care [47] and condition outcomes [47, 96]. Patients seek this information
for several reasons, including learning what to expect in the future and how to plan for it [47], informing
decision making (especially related to treatment options) [47, 122], informing day-to-day care/everyday
illness management (coping strategies) [60, 90, 122, 131], advice on managing interactions with others
(e.g., from healthcare professionals to colleagues to family) [122], and often for simply acquiring a better
understanding of their condition [47, 122, 149, 258]. As such, OHCs are often a source of information
distinct from and complementary to that typically acquired via medical practitioners.
Emotional Support
In addition to being valuable sources of personalized informational support, OHCs provide users with
an accepting and safe space to vent emotions or discuss uncomfortable topics [149, 243]. Participation
provides users with a means of articulating and making sense of their experience, which they find empowering [131,173]. Patients also receive positive affect, encouragement and sympathy from their fellow
community members [60, 131]. Continued participation over time may result in patients taking on new,
supportive roles [164] as well as developing increased optimism towards their situation [211]. OHCs also
provide patients managing serious conditions with unique types of emotional support that are difficult to
acquire elsewhere. For example, patients find that sharing with people like them partially relieves the
burden of care placed on family members who, despite their best intentions, cannot empathize with the
patient’s experience [162, 243]. In addition, patients find that while family and friends tend to try to normalize their (the patient’s) emotions – even when they are inappropriate – online communities challenge
users on inappropriate emotional behavior [243].
2.2.4 Efficacy of Online Health Forums
While patients perceive many benefits to participating in OHCs, measuring the effect of participation on
their health outcomes is difficult, and raises the question of what metrics really matter in health management. Would we consider OHC participation effective if it altered disease outcome, or shortened time to
recovery? What about if it imparted a sense of control and wellbeing to patients, improving quality of life,
even if it had no effect on prognosis? Although OHC efficacy is difficult to define, participation has been
shown to promote effective disease management strategies [93, 131, 148, 211], and impart psychosocial
benefits, such as improved ability to cope [148, 150, 179], improved mood/decreased distress [158, 211],
and improved stress management [211]. Moreover, some studies report measurable beneficial effects
on symptoms. Houston et al. [130] found that increased participation in a depression-oriented OHC correlated with likelihood of users experiencing a resolution in their condition. Lieberman et al. [158] found
that cancer patients who participated in OHCs reported a decrease in physical pain. However, they note
that it is impossible to tell whether this was due to emotional suppression on the part of their subjects: a
conundrum afflicting the measurement of any subjective symptom.
In general, then, research points to OHC participation having beneficial effects for patients. However,
the jury is still out when it comes to conditions in which negative behaviors are enabled through social interaction with similar patients [21]. While some research finds that OHC participation provides increased
protection and motivation for continuing these behaviors, others conclude that the overall experience may
be a more positive way of dealing with the condition than traditional methods [21, 89, 179]. For example,
Wilson et al. [261] found that patients learned new binging and purging techniques on both pro-eating
disorder sites9 and pro-recovery sites. However, while they found no significant difference in final health
outcomes between the two groups, users of pro-eating disorder sites experienced a significantly longer
illness duration [261]. On the other hand, group bonds forged through shared secret identity may render
participants less likely to reveal their condition to others, potentially increasing the likelihood that they
will not seek appropriate help [98].
2.3 Summary
Our goal in this chapter was to provide a general overview of the landscape of online health information seeking. Beginning with an historical overview (§ 2.1.1), we noted that the advent of the Internet
both made health information more accessible, and made it possible for anybody to contribute health
information online. From the patient’s perspective, this was a largely positive improvement, and a great deal
of research supports the notion that little harm, other than cyberchondria, arises from online health information seeking. The medical community, however, remains somewhat opposed to people pursuing
health information outside of the purview of medical professionals.
In general, online health seekers search for information on specific diseases and diagnoses (§ 2.1.2).
This behavior appears to be partially need-driven, with people suffering from chronic or stigmatized
conditions more likely to seek help online. It is also partially driven by a digital divide, in which those
with ready Internet access and technical skills (i.e., younger, wealthier, and more educated people) are
more likely to seek health information online. One exception to the digital divide pattern is gender: 55%
of online health seekers are female (§ 2.1.3).
9 Sites that promote eating disorders.
While medical information portals such as WebMD and MedlinePlus are heavily utilized, most health
information quests begin with search engines. A significant proportion (8-25%) of online health information seekers eventually participate in an OHC (§ 2.1.4).
The primary reason for participating in an OHC is to find others who share the same health concerns.
While we know that people with stigmatized, or otherwise embarrassing, medical conditions are more
likely to participate in OHCs, we know little else about participant demographics, which are rarely studied.
Given the demographic specificity of many medical conditions (e.g., breast cancer predominantly affects women),
it is likely that such demographics vary widely across conditions (§ 2.2.2).
Users perceive several benefits to participating in OHCs, which we can categorize into: medium-based affordances – unique and valuable characteristics that OHCs have by nature of being an online,
digital resource; informational support benefits; and emotional support benefits (§ 2.2.3). While acquiring an objective assessment of an OHC’s efficacy is challenging, participation does appear to impart
psychosocial benefits to users, and may play a role in measurably reducing certain symptoms. However, the answer to whether OHC participation benefits those afflicted with conditions that are stimulated
by social contact with similar patients, such as eating disorders, is less clear (§ 2.2.4).
Chapter 3
Prior Work on Patient Authored Text
A great deal of prior work utilizes patient authored text (PAT) as a primary data source. Despite this,
to our knowledge no organized review of data sources, methods, goals and outcomes of such work
exists. Our goal in this chapter is to motivate the utility of PAT as a data source and provide a structured
framework over relevant prior work. We first scope our definition of PAT, and discuss its latent value as a
data source as well as the challenges it poses for analysis (§ 3.1). We then review prior work that uses
PAT as a primary data source. This work tends to fall into one of four categories: syndromic surveillance
(§ 3.2), pharmacovigilance (§ 3.3), entity extraction (§ 3.4), and thematic analysis (§ 3.5). Finally, we
summarize our findings (§ 3.6).
3.1 Patient Authored Text (PAT): Introduction & Overview
We define patient authored text (PAT) as any online, medical text authored by someone who is not a
medical professional. A main source of PAT is online health communities (OHCs): online discussion
forums dedicated to specific health topics where people converse in the form of posted messages.
MedHelp1 , PatientsLikeMe2 and CureTogether3 are all examples of OHCs. Other sources of PAT include
search logs, social media data (e.g. Twitter4 and Facebook5 ), personal blogs (e.g. Lady of Lyme6 ), and
email.
1 http://www.medhelp.org
2 http://www.patientslikeme.com
4 http://www.twitter.com
5 http://www.facebook.com
6 http://www.ladyoflyme.com
3.1.1 Value of PAT
In the process of creating PAT, users are documenting medical data, making sense of it, prioritizing it, and
synthesizing it in order to solve problems that are relevant to them. This is time intensive work, performed
by agents who may well make up in motivation for what they lack in medical expertise. The resulting text
is rich in medical information, with users recording medical histories, comparing treatments, detailing
symptoms and reasoning about differential diagnoses. At a minimum, this culminates in a unique record
of patient behavior outside of the clinical environment. In the case of stigmatized or otherwise embarrassing conditions7 , PAT may well contain medical data that is rarely captured elsewhere. For example,
someone struggling with substance abuse might detail her self-prescribed treatment schedule for withdrawal. In concert, then, PAT comprises a valuable and, in many cases, unique medical data set that is
abundant and readily available. However, PAT is also challenging to work with.
3.1.2 Challenges of Working with PAT
PAT is notoriously difficult to work with. We attribute this to three main reasons: its inherent noisiness;
the lack of existing tools for exploring and analyzing it; and the fact that it is often difficult to discern
whether PAT supports any given research question. As we will show in § 3.2-§ 3.5, prior work tends
to compensate for these challenges by either fixing some variables in a quantitative analysis, or by
conducting small-scale, qualitative analyses.
Noisiness
On the text level, PAT is riddled with spelling and grammatical errors. Compared with expert-authored
text, differences include lexical and semantic mismatches [167, 272], mismatches in consumers’ and experts’ understanding of medical concepts [99,272] and mismatches in descriptive richness and length [99,
167,272]. Consider, for example, the text snippets below, both discussing the predictive value of a family
history of breast cancer. The first snippet is from a medical study by De Bock et al. [68]:
In our study, at least 2 cases of female breast cancer in first-degree relatives, or having at
least 1 case of breast cancer in a woman younger than 40 years in a first or second-degree
relative were associated with early onset of breast cancer.
7 In Chapter 2 we note that people suffering from stigmatized conditions are more likely to seek help online and to participate in OHCs.
The second (unedited) snippet is from the MedHelp breast cancer community:
im 40yrs old and my mother is a breast cancer surivor. i have had a hard knot about an inch
long . the knot is a little movable. the knot has grew a little over the past year and on the
edge closest to my underarm. i am scared and dnt want to worry my mom ..
Moreover, PAT contributors vary widely in their level of medical expertise, command of medical jargon, and the frequency with which they document their experiences online. Most PAT would be considered unusable from a medical perspective: symptom descriptions, treatments and medical histories are
incomplete, and basic demographic data is absent.
Lack of Analysis Tools
The dearth of tools and methods for mining PAT is likely exacerbated by its noisiness and inconsistencies.
As we discuss in § 3.4, the handful of medical annotation toolkits that do exist are tailored to process
well formatted, expert-authored text (e.g. clinical text, journal publications), and perform poorly on PAT.
As a result, exploring PAT corpora is costly, often requiring researchers to build ad hoc tools for large
scale annotation and extraction. Moreover, as there is no systematic method for exploring the space of
possible approaches to extracting medically useful information from PAT, these ad hoc tools are often
not recyclable.
Applicability to Research Questions
The question of whether or not a PAT corpus supports a given research question is not always obvious,
and depends very much on users’ reasons for authoring the PAT in the first place. Finding a tight
match between a research question and users’ motivations for authoring PAT is crucial for success. For
example, search logs are an appropriate data source for monitoring influenza trends, because users are
motivated to search for their symptoms when they get sick. However, Twitter would be an inappropriate
data source for mining optimal drug dosages, as users tend not to tweet this information en masse.
Determining what data PAT encodes, and how it is encoded, is a costly investment and a separate
challenge from extracting these data.
3.2 Syndromic Surveillance
Syndromic surveillance – also known as early warning, outbreak detection, or biosurveillance – is the
utilization of health-related data for the purpose of detecting, analyzing and monitoring potential disease
outbreaks [128]. Syndromic surveillance systems do not necessarily utilize online data: the first such
systems were developed to give advance notice of bioterrorism attacks – in particular, those related to
anthrax – after 9/11, and utilized data such as pharmacy purchases and emergency room visits [35,127,
128, 163, 207].
However, building syndromic surveillance systems based on PAT is appealing for a number of reasons. The first is users’ proclivity for seeking health information online. For example, it is fairly common
for users to search online for symptoms that they are experiencing, or for conditions that they believe
they might have [156, 254, 256]. As such, data useful for syndromic surveillance tends to accrue naturally, which is preferable to resource-intensive, manual data collection [128, 262]. In addition, collecting
and analyzing online data is fast, enabling early (or even real-time) detection of outbreaks, which is
not possible using traditional syndromic surveillance systems [41, 100, 128].
The best known example of a PAT-based syndromic surveillance system is likely Google Flu Trends8 ,
which estimates regional flu activity from aggregated search queries [41]. Google Flu Trends can often
identify flu outbreaks a full 1-2 weeks ahead of the CDC, which bases its reports on laboratory and
clinical data [41]. However, the system is vulnerable to anomalous situations, such as outbreaks of new
influenza strains, or particularly bad influenza seasons [38]. Other challenges to syndromic surveillance
systems based on PAT include their vulnerability to changes in users’ online health seeking behavior [38,
262], which makes it difficult to estimate false positive and false negative rates [262]. Finally, a successful
syndromic surveillance system requires that a sufficient portion of the population of interest is seeking
health information online, which is not always the case. Below, we outline the chief components of
syndromic surveillance projects.
3.2.1 Condition
Typically, syndromic surveillance systems focus on a single medical condition of interest. To date, the
majority of work on syndromic surveillance focuses on influenza [10, 15, 55, 56, 62, 63, 81, 100, 132, 137,
152, 198]. Exceptions include investigating general infectious disease outbreaks [33, 52, 109, 262], Lyme
Disease [221], and potential foodborne illness outbreaks at restaurants [213]. Syndromic surveillance
techniques have also been used to monitor “non-outbreak” conditions or behaviors. For example, Cooper
et al. [53] use syndromic surveillance techniques to monitor cancer prevalence, while Ayers et al. [18]
use them to track the popularity of electronic nicotine delivery systems (e-cigarettes).
8 http://www.google.org/flutrends/us
3.2.2 Data Source
People searching for their own symptoms online is a well documented phenomenon [254, 257]. Accordingly, search logs are a natural choice for a syndromic surveillance data source, and are successfully
utilized in several instances of prior work [18, 53, 100, 132, 198, 221]. More recently, Twitter has come
to light as another suitable source [10, 15, 62, 63, 152, 213], suggesting that users are prone to mentioning when they, or someone around them, falls ill. Rarer data sources include blogs [55, 56], website
access logs [137], and aggregated web data (a combination of search logs, news articles, RSS feeds
etc.) [33, 52]. The latter may be particularly appropriate when trying to survey regions in which the
population of interest has limited education and/or Internet access, such as developing countries.
3.2.3 Filtering
As syndromic surveillance aims to correlate online frequency data with real-world epidemiological trends,
separating signal from noise in the data stream is important. Mentions of a condition do not necessarily
correlate with real-world instances of it [152].
On the simple end of the spectrum is keyword filtering. While common [10, 18, 53, 55, 56, 198, 221],
this approach has several shortcomings. First, relying on a static set of keywords makes the system
susceptible to over-fitting [62], as well as fluctuations in the use of those keywords that are unrelated
to the disease in question [15, 38, 100]. For example, a news story on flu could galvanize a “burst” of
online activity around the topic of flu, even while infection levels in the population remain unchanged.
Finally, although keywords are occasionally picked in a principled and consistent manner (e.g. Ginsberg
et al. [100] pick keywords based on how their frequency fluctuations correlate with regional influenza activity), in general selection is arbitrary and prone to human misjudgment. For example, spelling variations
of keywords may be ignored [56].
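To make the keyword approach and its shortcomings concrete, the following is a minimal sketch of static keyword filtering over timestamped messages. The keyword set, the example messages and the function names are hypothetical and are not drawn from any of the cited systems.

```python
import re

# Hypothetical static keyword list; "influensa" stands in for a spelling
# variant that a hand-picked list might or might not include.
FLU_KEYWORDS = {"flu", "influenza", "influensa"}


def tokenize(text):
    """Lowercase a message and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_keyword(text, keywords=FLU_KEYWORDS):
    """Return True if any token of the message is in the keyword set."""
    return any(token in keywords for token in tokenize(text))


def daily_counts(messages, keywords=FLU_KEYWORDS):
    """Count keyword-matching messages per day.

    `messages` is an iterable of (day, text) pairs.
    """
    counts = {}
    for day, text in messages:
        if mentions_keyword(text, keywords):
            counts[day] = counts.get(day, 0) + 1
    return counts


# Invented example messages.
messages = [
    ("2013-01-07", "Home sick with the flu again"),
    ("2013-01-07", "Great news story about flu shots on TV"),
    ("2013-01-08", "I think I have influensa"),
    ("2013-01-08", "Lovely weather today"),
]
print(daily_counts(messages))  # {'2013-01-07': 2, '2013-01-08': 1}
```

Note that the second message is counted even though it reports no infection, and that the third would be missed if the misspelling were absent from the keyword set: both are instances of the weaknesses described above.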
Other work indicates that more nuanced filtering yields higher quality results [62, 152]. One such
approach is to train statistical classifiers to automatically identify whether a datum is relevant or not. Both
Support Vector Machines (SVMs) [15,193] and other simple bag-of-words models [52,62,63] have been
successfully leveraged to identify data that correspond to actual influenza infections. Moreover, Lamb
et al. [152] show that using binary classifiers to acquire even more detailed information (specifically,
whether a tweet is about the author or about someone else; whether a tweet represents an awareness
vs. an instance of flu; and whether a tweet is flu-related or not) greatly improves prediction.
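As an illustration of classifier-based filtering, the sketch below trains a small bag-of-words classifier to separate messages that report an infection from messages that merely show awareness of flu. It uses multinomial naive Bayes as a stand-in for the models in the cited work, which it does not reproduce; the training messages, labels and class name are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BagOfWordsNB:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {}              # label -> Counter of word frequencies
        self.doc_counts = Counter(labels)  # label -> number of training messages
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:            # ignore unseen words
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data: "infection" vs. "awareness" messages.
train = [
    ("i have the flu and a fever", "infection"),
    ("stuck in bed with flu all week", "infection"),
    ("my son caught the flu, fever and chills", "infection"),
    ("reading a news story about flu season", "awareness"),
    ("get your flu shot at the pharmacy", "awareness"),
    ("flu vaccine news on tv tonight", "awareness"),
]
texts, labels = zip(*train)
model = BagOfWordsNB().fit(texts, labels)

print(model.predict("in bed with a fever"))         # infection
print(model.predict("news about the flu vaccine"))  # awareness
```

Only messages classified as infections would then be counted, which addresses the news-driven "burst" problem that affects keyword filtering.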
3.2.4 Modeling and Prediction
In the case of syndromic surveillance systems that focus on a specific condition (e.g. influenza), linear
models are commonly used to predict trends from the filtered data [10, 62, 63, 100, 152, 198]. Simpler
approaches do not model the filtered data, deeming frequency counts sufficient for reflecting real-world
trends [15, 53, 55, 56, 137, 221].
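A minimal sketch of the linear modeling step follows, assuming a single predictor (the weekly fraction of messages that survive filtering) and an ordinary least squares fit. The data values and function name are invented for illustration, and the cited systems may use different predictors or transformations.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented weekly data: fraction of filtered messages vs. reported illness rate (%).
message_fraction = [0.010, 0.020, 0.030, 0.040]
illness_rate = [1.0, 2.0, 3.0, 4.0]

intercept, slope = fit_line(message_fraction, illness_rate)

# Predict the illness rate for a new week from its message fraction alone.
predicted = intercept + slope * 0.025
print(round(predicted, 2))
```

The simpler approaches mentioned above skip the fitting step and report the raw frequency counts directly.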
The few syndromic surveillance systems attempting to monitor a range of diseases require the additional step of identifying specific diseases and geographic locations [33, 52]. Of note is the approach
used by Paul et al. [193], who use topic modeling over their filtered data to acquire distributions of ailments over time. One key advantage of this approach is its ability to surface new diseases without
manual intervention [193].
3.2.5 Real-World Evaluation Dataset
In order to prove the utility of a syndromic surveillance system, a corresponding real-world metric of the
same phenomenon that the system is trying to measure is required for comparison. In the case of influenza, the CDC frequently releases timely data on cases of influenza-like illnesses detected through its
traditional surveillance systems9 . It is likely that the availability of this data set is the driving force behind
the fact that almost all PAT-based syndromic surveillance research focuses on the topic of influenza.
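One simple way to carry out such a comparison, assumed here for illustration rather than taken from any particular cited study, is to compute the Pearson correlation between a system's weekly estimates and the corresponding reference figures. The numbers below are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented weekly values: system estimates vs. reference surveillance data.
estimated = [1.2, 1.9, 3.1, 4.2, 2.8]
reference = [1.0, 2.0, 3.0, 4.0, 3.0]

r = pearson(estimated, reference)
print(f"r = {r:.3f}")
```

A coefficient close to 1 indicates that the estimates rise and fall with the reference series; it says nothing about how far ahead of the reference data the system detects a change, which would require comparing lagged series.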
3.3 Pharmacovigilance
Pharmacovigilance is concerned with detecting, monitoring and preventing adverse effects related to
pharmaceutical products. Like syndromic surveillance, traditional Pharmacovigilance systems are offline, typically comprising adverse drug event reports contributed by patients, physicians and pharmacists, which are collected by the United States Food and Drug Administration10. Many of the advantages
of online syndromic surveillance systems also apply to Pharmacovigilance. However, by construction Pharmacovigilance is a more complex problem: whereas syndromic surveillance systems typically monitor only a single variable (e.g. how many people have the flu), an adverse event involves at
least two elements: a drug and an adverse effect (e.g. unexpected side effects). Extracting such entities
can be challenging. Unlike syndromic surveillance, prior work on Pharmacovigilance addresses a wide
array of topics and conditions. Below, we discuss important components of Pharmacovigilance systems.
9 http://www.cdc.gov/flu/weekly/fluactivitysurv.htm
10 http://www.fda.gov
3.3.1
Data Source
In order to leverage the advantages of both scale and relevant content, researchers must find a large
source of PAT where patients typically disclose both which drugs they use as well as adverse events they
experience. Online health communities (OHCs) are rich with discussions disclosing users’ medications,
symptoms and current health states (see Chapter 2). Accordingly, almost all work on PAT-based Pharmacovigilance utilizes OHC communications as a primary data source [23, 45, 154, 171, 183, 265, 266,
268, 269]. To our knowledge, the only exception to this is also arguably the most successful and impactful
work on Pharmacovigilance: White et al. [257] utilize search query logs to discover a novel
adverse drug-drug interaction, which was later confirmed in medical trials.
3.3.2
Identifying Drugs in PAT
Identifying drugs in PAT is challenging: in addition to the many spelling variations of a drug that might
be present in a PAT data set, users may mention several drugs at once, making it difficult to tell which
one is responsible for the adverse event [119]. Accordingly, only a handful of prior Pharmacovigilance
studies attempt to explicitly identify drugs related to adverse events in a data set. Yang et al. [265, 266]
extract drug entities using a lexicon, and Yates et al. [269] train a conditional random field (CRF) model
for this purpose. A more common approach is to pre-select a small number of drugs of interest, filter
the original data set for mentions of these drugs, and then attempt to extract adverse events from these
filtered data [154, 171, 183, 257, 268].
Chee et al. [45] take a different approach that is worth noting. Rather than attempting to extract {drug,
adverse event} pairs, they use an ensemble classifier over OHC text to identify drugs that are similar
to “watch list” drugs: drugs that already have adverse effects reported by the FDA11 . Unfortunately this
method gives no insight into why a drug might be worthy of inclusion on such a list.
11 http://www.fda.gov/Safety/MedWatch
3.3.3
Identifying Adverse Events in PAT
Unlike the drug involved in an adverse event, the adverse events themselves are rarely fixed: typically
a Pharmacovigilance system will attempt to identify any adverse event related to a particular drug. The
list of extracted events is then ranked in some manner and given to a human reviewer for analysis. Yang et
al. [265,266] and Lehman et al. [154] identify adverse events in PAT by first compiling lexicons describing
adverse events, and then scoring matches against sliding n-gram windows over PAT sentences.
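A lexicon lookup over sliding n-gram windows can be sketched as follows. This is a simplified illustration, not the method of the cited papers: it uses exact matching instead of a scoring function, prefers longer matches over shorter overlapping ones, and the lexicon entries and function names are hypothetical.

```python
import re

def ngrams(tokens, max_n):
    """Yield (start, n, text) for every n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, n, " ".join(tokens[i:i + n])

def find_adverse_events(sentence, lexicon, max_n=3):
    """Return lexicon entries found in the sentence, in order of appearance.

    Longer matches win over shorter matches that overlap them.
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    hits = [(i, n, text) for i, n, text in ngrams(tokens, max_n) if text in lexicon]
    hits.sort(key=lambda h: (-h[1], h[0]))  # longest first, then leftmost
    covered, kept = set(), []
    for start, n, text in hits:
        span = set(range(start, start + n))
        if span & covered:
            continue  # overlaps a longer match that was already accepted
        covered |= span
        kept.append((start, text))
    return [text for _, text in sorted(kept)]

# Hypothetical toy lexicon of adverse event phrases.
lexicon = {"night sweats", "sweats", "nausea", "restless legs"}
events = find_adverse_events("I had night sweats and nausea, plus restless legs.", lexicon)
print(events)
```

Exact matching of this kind misses misspellings and paraphrases, which is one reason a scoring function over candidate windows may be preferable in practice.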
Yates et al. [269] train a CRF to identify adverse events in PAT. Nikfarjam et al. [183] learn patterns
from text about known adverse drugs; they then apply these patterns to identify new adverse events.
White et al. [257] are the sole exception to extracting an open set of adverse events: rather, they limit
their extraction to a pre-specified set of symptoms related to hyperglycemia. The fact that theirs is
arguably the most successful Pharmacovigilance system to date suggests that this may be a promising
approach.
3.3.4
Evaluation
In general, evaluating the efficacy of Pharmacovigilance systems is difficult: results typically contain
several known indications; the remaining result elements are either false positives, or true positives that
have yet to be detected via traditional reporting mechanisms. Most work serves as a proof of
concept that some adverse drug events manifest in PAT, but there is little quantification of how many and
how strongly different events are represented. Most importantly, determining how to surface the most
relevant true positives remains an area for future work. The work by White et al. [257], which rigorously
demonstrates the existence of the connection between paroxetine, pravastatin and hyperglycemia in
PAT (predating the FDA’s discovery of this), comes closest to proposing a methodology for doing this.
However, their approach lacks flexibility in that both their drugs and adverse events of interest were
predefined.
3.4
Named Entity Recognition
Named entity recognition (NER) is an information extraction task in which the goal is to develop methods
that automatically identify entities of a specific type from text. For example, extracting drugs, adverse
events or symptoms from medical records are all NER tasks. In general, there are two ways to go about
medical NER in PAT: the first is to use state of the art ontology-based tools, which work “straight out of
the box”, but have poor performance on PAT. The second is to use custom statistical classifiers, which
tend to have high accuracy, but require large volumes of labeled data for training and testing. We discuss
each in detail below.
3.4.1
Ontology-Based Tools
Historically, the go-to tools for medical text annotation are MetaMap12 [17] and, more recently, the Open
Biomedical Annotator (OBA)13 [138]. These tools are ontology-based, meaning that they search through
text for matches against underlying ontologies (curated vocabularies of medical terms and the relationships between them) [17, 138]. While these tools are capable of fine-grained entity resolution, a
previous study [201] comparing OBA and MetaMap against human annotator performance underscores
two sources of performance error on PAT. The first is ontology incompleteness, which results in low recall, and the second is inclusion of contextually irrelevant terms. For example, when restricted to the
RxNORM ontology and semantic-type Antibiotic (T195), OBA will extract both “Today” and “Penicillin”
from the sentence “Today I filled my Penicillin rx”. We observe the same limitations in Chapter 5 and in
later collaborative work with Gupta et al. [112].
Despite recent efforts to develop an ontology suitable for PAT - the open and collaborative Consumer
Health Vocabulary (OAC) CHV [77, 273, 274] - we suspect that tools like MetaMap and OBA will remain
ill-suited to the task of medical term identification in PAT due to structural differences between PAT and
text authored by medical experts that we discuss in § 3.1.2. Finally, in addition to including misspellings
and slang, consumer medical jargon may evolve over time as patients acquire expertise.
3.4.2
Statistical Classifiers
Statistical classifiers are a natural alternative to ontology-based tools: they can be trained to extract
biomedical entities of interest with high accuracy. However, such methods require sizable corpora of
labeled data for training and evaluation. This is problematic in the medical domain, as having medical experts annotate text is both expensive and time consuming. Only a handful of publicly available
annotated medical corpora exist, all of them composed of annotated biomedical journal publication
abstracts (i.e. expert authored text) [145, 146, 204, 271]. This has had the dual effect of generating a plethora of prior work demonstrating the efficacy of statistical approaches to biomedical
NER [76, 87, 95, 124, 125, 214, 238, 239, 267], but little work that explicitly examines PAT as a potential
data source.
12 http://metamap.nlm.nih.gov
13 http://bioportal.bioontology.org/annotator
Our work on ADEPT (Chapter 5) is an exception to this. By showing that crowdsourcing medical term
annotations yields labels comparable in quality to experts’, we were able to use crowd-labeled PAT to
train a conditional random field (CRF) classifier to identify medically-relevant terms in PAT. However, we
also find that crowdsourcing is not always a ready solution to PAT annotation tasks (§ 5.7). In Chapter 7
we show that a CRF trained on a manually-labeled data set similarly extracts users’ drugs of choice
(preferred substances of abuse) from PAT. Later work in collaboration with Gupta et al. [112] shows that
the unsupervised method of lexico-syntactic pattern induction is a promising approach for extracting
specific types of biomedical entities (including symptoms & conditions, as well as drugs & treatments)
from PAT. This approach is also employed by Xu et al. [264], although our method achieves higher
scores. Finally, other work demonstrating entity extraction on PAT includes some of the work discussed
in Pharmacovigilance (§ 3.3), which utilizes CRFs [269] and pattern learning [183] to extract drugs and
adverse events from PAT.
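Sequence classifiers such as CRFs are typically trained on per-token labels and per-token feature dictionaries. The sketch below shows one common way to prepare both, using BIO (begin, inside, outside) labels. The feature set and the entity type names are assumptions chosen for illustration; they are not the features used by ADEPT or by the cited systems.

```python
def bio_labels(tokens, spans):
    """Convert entity spans to BIO labels.

    spans: list of (start, end_exclusive, entity_type) over token indices.
    """
    labels = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        labels[start] = f"B-{entity_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{entity_type}"
    return labels

def token_features(tokens, i):
    """Return a feature dictionary for the token at position i."""
    token = tokens[i]
    return {
        "lower": token.lower(),
        "suffix3": token.lower()[-3:],
        "has_digit": any(ch.isdigit() for ch in token),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = ["I", "quit", "32mgs", "of", "suboxone"]
labels = bio_labels(tokens, [(4, 5, "DRUG")])
features = [token_features(tokens, i) for i in range(len(tokens))]
print(list(zip(tokens, labels)))
```

The resulting `features` and `labels` lists have one entry per token, which is the input shape that CRF training libraries generally expect.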
3.5
Thematic Analysis
Thematic analyses (sometimes called content analyses) involve the systematic reading of text with the
goal of eliciting a taxonomy (i.e., an organized collection of significant patterns and themes) that describes the source data. While some literature outlines standard practice for thematic analyses [30,
110, 236], it is infrequently referenced, and methods utilized in applied research tend to be somewhat
ad hoc. Thematic analysis is the most extensively used qualitative analysis technique [110], and in our
experience, the most common type of analysis applied to PAT, easily outnumbering work on syndromic
surveillance, pharmacovigilance, and Named Entity Recognition. This is likely due to the fact that (1)
thematic analyses are easy to apply: any kind of text is a suitable candidate for thematic analysis, which
is not true for quantitative analyses requiring automated extraction, (2) they are interesting: the results
of a thematic analysis over PAT almost always satisfy our latent curiosity about what people actually do
online in relation to their own health, and (3) they are useful: illuminating corpus content via thematic
analysis is a sensible precursor to higher-investment, quantitative research with automated components.
Below, we compare and contrast prior work that conducts thematic analyses on PAT.
3.5.1
Condition
There is a great deal of diversity in the conditions studied via thematic analysis. Stigmatized, or otherwise embarrassing, conditions receive notably more coverage than they do in syndromic surveillance, pharmacovigilance or NER. Examples include smoking cessation [180, 197], infertility [160, 161],
HIV/AIDS [61, 177], Huntington’s disease [59], irritable bowel syndrome [58], and post-partum depression [69]. Underlying the interest in these topics is likely the fact that PAT comprises a unique data
source, especially for stigmatized conditions. Also commonly studied are conditions that have a
behavioral component through which the user can directly influence health outcomes. These include diabetes [107,206], smoking cessation [180,197], weight loss and fitness [134,142,217,240], and general
wellness [108].
3.5.2
Data Source
The majority of thematic analyses focus on online health communities (OHCs) [29, 34, 58–61, 101, 134,
160, 161, 177, 206, 220, 233], a natural choice given the volume and richness of OHC text. However,
contemporary PAT thematic analyses also turn to Twitter [69, 135, 142, 170, 180, 218, 234, 235, 240] and
Facebook [22, 71, 107, 197]. Other data sources include search logs [224, 255, 256], email [13], and
personal blogs [217].
3.5.3
Analysis Question
Thematic analyses are, by nature, exploratory, and researchers leverage them to answer a wide array of questions. A frequent focus is unearthing users’ reasons for participating in a particular OHC,
which alludes to the question of what role the community plays in helping users meet their health
goals [13, 34, 60, 134, 142, 161, 206, 217, 224, 235]. Results usually contain some interesting insights.
For example, Hwang et al. [134] find that online support groups for weight loss are an important source
of encouragement as well as friendly competition. Relatedly, Kendall et al. [142] find that people use
Twitter to realize their fitness goals in two ways: the first is to publish evidence of having worked out, the
second is to publish a commitment to work out in the future.
The assumed role of many OHCs is to provide users with support. In such cases, a natural question
to ask is what types of support users receive. Results are practically unanimous in noting that users
seek primarily informational and emotional support [58, 59, 61, 153, 177, 197, 224].
In larger communities that are not necessarily specifically health-oriented (e.g. Twitter and Facebook), the research question often takes the angle of, “When people mention X on interface Y, what
do they talk about?”. A wide range of health topics have been analyzed on Twitter along these lines,
including insomnia [135], epileptic seizures [170], and concussions [234], often with interesting insights.
For example, Scanfeld et al. [218] find that tweets mentioning antibiotics often indicate misuse. McNeil
et al. [170] note that most tweets about concussions are in reference to professional sports injuries,
and Bender et al. [22] find that a great deal of breast cancer related discussion on Facebook involves
fundraising.
Finally, a handful of thematic analyses investigate how the experience of an illness can differ by
gender. Makil et al. [160, 161] investigate infertility, paying special attention to the experience of men
whose partners are infertile. Another topic that has received some attention is how coping and self-help
mechanisms differ between people with breast cancer and prostate cancer. In general, these studies find
that men seek more informational support and less emotional support than women do [101, 220, 233].
3.5.4
Scaling Thematic Analyses
Only a handful of prior studies use thematic analysis results as the foundation for a larger-scale analysis
of PAT. Most notable is that by De Choudhury et al. [69–71], who analyze how postpartum depression
(PPD) is characterized on both Twitter and Facebook. Using their findings, they leverage activity and
linguistic features to build models that can predict the onset of PPD from Facebook data [71]. Also of
note is the work on cyberchondria by White & Horvitz [255, 256], who analyze health-related search
logs and leverage the results of their analysis to model anxiety escalation and predict the transition from
self-diagnosis to seeking medical assistance. Our work on identifying users’ reasons for participating in
Forum77 (Chapter 6) and their transitions through addiction (Chapter 8) also implements scaled thematic
analyses.
Results of scaled thematic analyses are especially powerful, as they provide both a novel, insightful
contextualization of PAT acquired via close reading of a small sample, as well as population-level insights
acquired via extending these results through automated annotation and large-scale analysis. As such,
their rarity is puzzling: it is possible that many researchers who conduct thematic analyses do not have
experience with machine learning. Alternatively, categories derived in thematic analyses may be too
fine-grained for classifier training. A final explanation may be that there is sufficient reward for publishing
the results of a thematic analysis without investing the resources required to scale it.
3.6
Summary
Our goal in this chapter was to motivate PAT as a data source and present a comprehensive overview of
relevant prior work. We define PAT as any medical text authored by someone who is not a medical professional (§ 3.1). PAT, which is often the product of many human hours spent on complex health-related
problem solving, provides a unique window into patient behavior outside of the clinical environment
(§ 3.1.1). However, it is also challenging to work with: PAT is noisy, few tools support mining and exploring it, and what medical data PAT encodes, and how, is often unclear upon casual inspection
(§ 3.1.2). This underscores the importance of matching research questions with users’ motivations for
authoring PAT in the first place.
Work utilizing PAT as a primary data source tends to fall into one of four categories. Syndromic
Surveillance (§ 3.2) and Pharmacovigilance (§ 3.3) both involve processing large quantities of data in order
to monitor health-related variables. Entity extraction (§ 3.4), which lies under the purview of Natural Language Processing and Machine Learning, concerns the identification of specific entities in PAT. Finally,
on the qualitative side, thematic analyses (§ 3.5) involve close readings of text in order to gain insight into
its structure and content.
PAT-based syndromic surveillance systems have great potential in the toolbox of techniques for the
real-time monitoring of medical conditions. To date, the majority of such systems focus on the topic
of influenza, relying either upon search query logs or Twitter as a primary data source. Filtering the
PAT data stream for relevant entities is crucial for a cleaner signal: although keyword-based filtering is
popular due to its simplicity, training classifiers to discriminate relevant from irrelevant data produces
superior results. Often, frequency counts of these filtered data are compared as-is to real-world gold
standards (most commonly, the CDC ILI data set14 ), but prior work shows that linear models built on
these data have promising predictive value.
Pharmacovigilance (§ 3.3) is concerned with detecting adverse effects related to pharmaceutical
products in real-time. PAT comprises a potentially valuable, but difficult to work with, data source for Pharmacovigilance [119]. Most prior work focuses on online health communities (OHCs), although search
logs have also been shown to be a viable data source for web-scale pharmacovigilance [257]. While
many systems demonstrate the ability to identify {drug, adverse event} pairs, automatically identifying
which of these pairs (amongst thousands) are important is an unsolved problem. To date, no work has
presented a viable predictive model for adverse drug events.
14 http://www.cdc.gov/flu/weekly/fluactivitysurv.htm
A great deal of work on biomedical named entity recognition (NER) exists. While ontology-based
MetaMap and Open Biomedical Annotator are the go-to tools for medical term annotation, they perform poorly on PAT for two reasons: first, ontologies have insufficient coverage of consumer medical
terminology. Second, their lack of context sensitivity leads to over-inclusion of irrelevant terminology in
results.
Statistical classifiers have been shown to achieve high accuracy in biomedical NER tasks. However,
these approaches are limited by their requirement for a sizable corpus of annotated data for training and
testing. Most research on biomedical NER utilizes existing publicly available data sets, which are based
on abstracts from biomedical journal publications. Consequently, little prior work on biomedical NER
in PAT exists. Exceptions to this include some of the work on Pharmacovigilance [183, 269], and our
work on ADEPT (Chapter 5), identifying drugs of choice (Chapter 7) and using patterns to extract entity
types [112, 264].
Thematic analyses over PAT cover a wide array of conditions. However, notably present are stigmatized conditions and conditions that have a behavioral component through which the user can influence
health outcomes. Online health communities, Twitter and Facebook are the most commonly utilized PAT
sources for thematic analyses. As thematic analyses are exploratory by nature, they are used to answer
a wide array of questions. Common topics include elucidating users’ reasons for participating in an online community as well as what kinds of support such a community provides. The results of a thematic
analysis can be used to train automatic classifiers, thereby extending the research from a small PAT
sample to large PAT corpora. While prior work demonstrates the power and value in this approach, it is
rare.
In sum, PAT is a valuable data source that has been shown to have clinical value. However, PAT is
challenging to work with. To date, prior work on PAT tends to be either structured in such a way as to
reduce the number of variables being analyzed, making analysis and evaluation easier (e.g. syndromic
surveillance, pharmacovigilance, NER), or focuses on qualitative analyses of PAT (e.g. thematic analyses). Although little work builds automated extraction and analysis on top of the results of a thematic
analysis, prior work and our findings in Chapters 6-8 indicate that this approach yields novel and
valuable insights.
Chapter 4
Data
In this chapter we describe our PAT data sets and define terminology relevant to our work. We first
present our full MedHelp data set (§ 4.1), which we use in our work on medical term identification
(Chapter 5), and define key terminology (§ 4.1.1). We then describe Forum77 (§ 4.1.2), a subset of the
MedHelp data set, which we use for our work on addiction (Chapters 6, 7 and 8). Finally, we present our
CureTogether data set (§ 4.2), which we use as an independent test set in Chapter 5. We acquired our
data sets through research agreements with MedHelp and CureTogether, respectively, who anonymized
the data prior to sharing them.
4.1
MedHelp Corpus
MedHelp1 is an online health community designed to aid users in the diagnosis, exploration, and management of personal health conditions. The site boasts a variety of tools and services, including over
200 condition-specific user online health communities (OHCs). Our data set comprises all discussions
on all of MedHelp’s forums from 2006 through mid-2011: a total of ∼1,250,000 threads. Table 4.1 lists
the top 40 MedHelp forums by post volume, along with unique contributor counts.
4.1.1 Terminology
Figure 4.1 provides an illustrative example of the composition and content of our MedHelp data. A forum
comprises several threads (or discussions) centered around a specific medical condition (e.g. addiction,
breast cancer, etc.). A thread is composed of an initiating post, in which the initiator posts new content
for the community’s consideration, and a series of response posts, in which respondents contribute to
the discussion galvanized by the initiating post. When an initiator posts a response to a thread that she
started, this post is called a self-response.
1 http://www.medhelp.com
Table 4.1: Top 40 MedHelp forums ranked by total post count. A ◦ in the Stigmatized column denotes our
conservative estimate of whether the condition represented by the forum carries a stigma or is otherwise
embarrassing.
[Stigmatized column: 12 forums are marked ◦; the row alignment of the marks is not recoverable here.]
Forum                         Post count   Unique users
Addiction: Substance Abuse    486,972      32,542
Maternal & Child              402,065      45,821
Pregnancy 18-34               364,475      28,321
Hepatitis C                   343,433      14,330
HIV Prevention                274,072      27,528
Fertility                     243,919      17,391
Women’s Health                208,683      76,221
Thyroid Disorders             169,713      21,939
Multiple Sclerosis            156,500      5,545
STDs                          117,462      29,455
Neurology                     111,671      47,968
Dermatology                   107,134      47,612
Ovarian Cancer                99,954       10,425
Anxiety                       98,971       17,373
Herpes                        89,792       17,061
Undiagnosed Symptoms          82,301       30,741
Gastroenterology              79,659       32,694
Heart Disease                 74,671       22,294
Hepatitis Social              74,412       2,122
Pregnancy 35+                 72,414       5,923
Eye Care                      70,744       18,666
Addiction: Social             68,831       3,253
Heart Rhythm                  57,001       9,496
Child Behavior                45,660       14,961
Relationships                 42,891       4,724
Pain Management               42,099       7,990
Breast Cancer                 41,197       10,869
Urology                       37,121       17,351
Weight Loss Alternatives      36,925       15,003
Depression                    35,614       9,035
Chiari Malformation           32,493       1,892
Sexual Health                 32,269       11,344
MedHelp Social                31,800       778
Men’s Health                  31,712       14,832
Bipolar Disorder              29,057       3,775
Back & Neck                   28,926       13,082
Hepatitis B                   28,664       4,621
Ear, Nose & Throat            28,439       14,244
Miscarriages                  26,043       3,703
While features for sub-discussions (nested responses) as well as picking a “best response” in a
thread do exist, they are used infrequently and we do not consider them in our analyses. Moreover, we
have neither demographic data (age, geographic location etc.) describing MedHelp users nor page view
data describing lurking (reading without posting – see § 2.2.1) behavior.
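The terminology above maps naturally onto a small data model. The sketch below is illustrative only: the class and method names are hypothetical and do not describe how the MedHelp data is actually stored. The example thread reuses the user names shown in Figure 4.1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Post:
    author: str
    text: str

@dataclass
class Thread:
    title: str
    initiating_post: Post
    responses: List[Post] = field(default_factory=list)

    def post_roles(self):
        """Label each post as 'initiating', 'response' or 'self-response'."""
        roles = ["initiating"]
        for post in self.responses:
            if post.author == self.initiating_post.author:
                roles.append("self-response")  # the initiator replying in her own thread
            else:
                roles.append("response")
        return roles

thread = Thread(
    title="Suboxone withdrawal",
    initiating_post=Post("liquid_daisy", "I quit cold turkey off 32mgs of suboxone."),
    responses=[
        Post("Boo28", "Congrats on the 5 days clean!"),
        Post("yellowPop", "hi congrats and keep posting for support."),
        Post("liquid_daisy", "No diarrhea, just cold sweats."),
    ],
)
print(thread.post_roles())
```

Nested responses and "best response" flags are deliberately left out of the model, mirroring the decision not to consider them in the analyses.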
[Figure 4.1 comprises three panels. MEDHELP COMMUNITIES lists forums alphabetically, from ADD/ADHD and Addiction (Forum77) through Chemotherapy. FORUM77 lists threads, each with a title, author, date, response count and a truncated preview of the initiating post. DISCUSSION THREAD shows one thread in full, with its posts annotated as below.]
DISCUSSION THREAD: Suboxone withdrawal
INITIATING POST (liquid_daisy, 6/12/2012): I quit cold turkey off 32mgs of suboxone. Today is day 5 and I’m in a lot of pain. I just want to know how long these withdrawals will last…? Is there anything I can get OTC that will help??? Thanks.
RESPONSES (10 responses)
Boo28 on 6/12/2012: Congrats on the 5 days clean! 32mgs is a high dose to CT, but doable. First, some questions: are you on any other medications? What other w/d symptom…
yellowPop on 6/12/2012: hi congrats and keep posting for support. I myself jumped from 44mgs although it wasn’t pretty. Physical w/ds tend to last 10 days to 2 weeks but everyone is diff…
SELF RESPONSE (liquid_daisy on 6/12/2012): No diarrhea, just cold sweats. I stay busy so that I don’t let my mind wander. Don’t have much of an appetite, but redbulls seem to help… chugged 4 today alrea…
Figure 4.1: Illustrative example of MedHelp and Forum77 content and structure.
4.1.2
Forum77
MedHelp’s largest forum is dedicated to the topic of Addiction: Substance Abuse2 . We dub this community Forum77 3 .
Our data set covers all Forum77 content from 2007 to mid-2014 (7.5 years), and comprises 80,529
discussions (740,046 total posts) authored by 51,153 unique users. Figure 4.2 illustrates summary statistics describing content and activity on Forum77. As expected, the volume of response posts correlates
strongly with the volume of initiating posts; moreover, both experience a slight decline from 2009 - 2014
(Figure 4.2 (A)). While the number of new users to Forum77 varies widely each month, the number of
return users, which comprise the core community base, is more consistent: in any given month there
are between 200 - 300 return users participating in the forum (Figure 4.2 (B)). This is consistent with
user tenure distribution on Forum77: while most users have a tenure of ≤ 1 month, a long tail indicates
several thousand users who have tenure > 1 year (Figure 4.2 (D)). Finally, while some initiating posts
get no responses, most get at least one, and modal thread length is 4 posts (Figure 4.2 (C)).
2 http://www.medhelp.org/forums/Addiction-Substance-Abuse/show/77
3 All of MedHelp’s forums have a unique identifier, and the Addiction: Substance Abuse community’s is 77. We settled on
Forum77 as a convenient way to refer to this community. To our knowledge nobody within the community refers to it as Forum77.
[Figure 4.2, panels A-F. Axis labels: Year, Post count, User count, Thread count, Thread length (# posts), Tenure (months), Initiating posts per user, Responses per user. Legend entries: Initiating, Responding, New users, Return users.]
Figure 4.2: Summary statistics of Forum77 variables: post volume by month (A), user volume by month
(B), thread length distribution (C), user tenure distribution (D), user initiating post count distribution (E),
and user response post count distribution (F).
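Two of the statistics summarized in Figure 4.2, modal thread length and user tenure, can be computed from a flat list of posts. The sketch below is illustrative and rests on assumptions: the `(thread_id, author, date)` tuple layout is hypothetical, the sample data is made up, and tenure is taken to be the number of calendar months between a user's first and last post, which may differ from the definition used to produce the figure.

```python
from collections import Counter
from datetime import date

def modal_thread_length(posts):
    """Return the most common number of posts per thread.

    posts: iterable of (thread_id, author, date) tuples.
    """
    posts_per_thread = Counter(thread_id for thread_id, _, _ in posts)
    length_distribution = Counter(posts_per_thread.values())
    return length_distribution.most_common(1)[0][0]

def tenure_months(posts):
    """Return {author: calendar months between first and last post}."""
    first, last = {}, {}
    for _, author, day in posts:
        first[author] = min(first.get(author, day), day)
        last[author] = max(last.get(author, day), day)
    return {
        author: (last[author].year - first[author].year) * 12
        + (last[author].month - first[author].month)
        for author in first
    }

# Made-up sample data, for illustration only.
sample_posts = [
    ("t1", "a", date(2012, 6, 12)),
    ("t1", "b", date(2012, 6, 12)),
    ("t1", "a", date(2013, 8, 1)),
    ("t2", "c", date(2012, 7, 1)),
    ("t2", "a", date(2012, 7, 2)),
    ("t2", "b", date(2012, 7, 3)),
    ("t3", "c", date(2012, 7, 4)),
]
print(modal_thread_length(sample_posts), tenure_months(sample_posts))
```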
4.2
CureTogether Corpus
CureTogether4 is an online health community that focuses on collecting structured health information
from its members via surveys. The site covers a wide array of medical conditions (589 in our data set),
each associated with a curated collection of symptom, treatment, side effect and cause/trigger terms. By
focusing on collecting structured data, CureTogether circumvents the problem of extracting medically-relevant information from PAT. However, discussion levels on the site are low: our data set contains
∼3,000 free-text posts on a variety of CureTogether’s medical topics. Despite this, these posts are
detailed and thoughtful and suffice, in Chapter 5, as a suitable PAT source independent from MedHelp.
Chapter 5
Identifying Medically Relevant Terms in
PAT
5.1
Introduction
When we began exploring our MedHelp corpus, we realized that our efforts were severely hampered
by the absence of a good solution to a seemingly simple problem: identifying the medically relevant
terms in PAT. How, for example, might one automatically extract the terms that we have flagged as
medically relevant in the following excerpt from MedHelp’s Addiction: Substance Abuse forum?
So, I’m 62 hours without pills, and its definitely getting worse, I ache all over, the anxiety is
the worst, along with restless legs but I ’m here now, and I’m not sure it can get much worse
so hopefully soon I’ll be out the other side. Last night was horrible. I had around 3 hours
broken sleep, night sweats and the most awful haunting nightmares when I was sleeping.
I’ve taken the l-tyrosine and B6 this morning, I’ll try and force some food down me shortly
and then take the rest of the vitamins.
The ability to distill medically relevant terms from PAT is useful for exploration: it filters out irrelevant
content, allowing for high-level insights into the corpus and facilitating hypothesis generation. More
sophisticated analyses can also be implemented on the extracted terms. The results of co-occurrence
analyses, for example, can improve query expansion and information retrieval over a corpus [194, 219,
245], or can be used to impose additional structure, such as clustering [39] or hierarchical concept
summaries [216], over the source data. In a PAT corpus, significant term co-occurrences could be used
to build a “map” of important links between symptoms and treatments.
Identifying medical concepts in text is a long-standing research challenge that has spurred the development of several software toolkits [17]. Those such as MetaMap1 and the Open Biomedical Annotator
(OBA)2 focus primarily on mapping words from text authored by medical experts to concepts in biomedical ontologies. A biomedical ontology is essentially a controlled collection of terms and the hierarchical
relationships between them. Usually, ontological terms are also categorized or typed (e.g., drug, sign or
symptom, medical device, etc.).
Thousands of biomedical ontologies exist, and differ according to the topic or level of specificity
covered by their terms. For example, the MFOEM3 (Emotion Ontology) covers concepts specifically
related to affective phenomena, while SNOMED-CT4 (Systematized Nomenclature of Medicine - Clinical
Terms) covers a broad array of clinical terms. Curating ontologies is a labor intensive process, in which
people must agree on which terms should be included, removed, combined or split, must categorize said
terms, and must define their hierarchical relationships.
Despite recent efforts to develop an ontology suitable for PAT - the open and collaborative Consumer
Health Vocabulary (OAC) CHV [77, 273, 274] - we suspect that tools like MetaMap and OBA will remain
ill-suited to the task of medical term identification in PAT due to structural differences between PAT and
text authored by medical experts. As we note in § 3.1.2, such differences include lexical and semantic
mismatches [167, 272], mismatches in consumers’ and experts’ understanding of medical concepts [99,
272] and mismatches in descriptive richness and length [99, 167, 272]. Finally, consumer medical jargon
may evolve over time as a patient acquires expertise. This would be a challenge for ontologies which
are, by design, inflexible and brittle.
Our goal is to automatically and accurately identify medically relevant terms in PAT. (Note that we do
not attempt to map terms to ontological concepts; we view this as a separate and complementary task.)
As acquiring annotated data sets is a major obstacle to classifier training, we investigate crowdsourcing
as an alternative option to having medical professionals label PAT (§ 5.4). First, we discuss the process
of designing the crowdsourcing task (§ 5.4.1). Next, we compare crowdsourced annotations from non-experts
(Amazon’s Mechanical Turk5 workers (Turkers)) and medical experts (Registered Nurses hired
via ODesk6) (§ 5.4.2). We find that crowdsourcing PAT medical term identification tasks to non-experts
achieves results comparable in quality to those given by medical experts (§ 5.4.3). While this result
opens a new avenue for rapid and affordable PAT annotation, not all PAT annotation tasks are amenable
to crowd labeling (§ 5.4.4).
2 http://bioportal.bioontology.org/annotator
3 http://bioportal.bioontology.org/ontologies/MFOEM
4 http://www.ihtsdo.org/snomed-ct
5 http://www.mturk.com
6 http://www.odesk.com
Next, we train a conditional random field (CRF) classifier to automatically identify medically relevant
terms in PAT (§ 5.5). Our classifier, trained on 10,000 crowd-labeled PAT sentences, dramatically outperforms state-of-the-art annotation tools MetaMap, OBA and TerMINE (§ 5.5.3). We call our classifier
ADEPT (Automatic Detection of Patient Terminology). In an error analysis, we observe that ADEPT
has the most trouble correctly classifying “generic” medical terms (e.g., pills, medicine, doctor) (§ 5.5.3).
We attribute ADEPT’s success to the suitability of sentence-level context-sensitive learning models, like
CRFs, to PAT medical term identification tasks (§ 5.7).
Finally, we demonstrate ADEPT’s efficacy by applying it to text from our MedHelp corpus (§ 5.6).
First, we compare the top-50 terms extracted from MedHelp’s Arthritis forum by both ADEPT and the
OBA (§ 5.6.1), noting that those recovered by ADEPT are both diverse and richly descriptive of arthritic
conditions, while the majority of those recovered by OBA are spurious. Next, we construct a graph of
co-occurring terms extracted by ADEPT from MedHelp’s Addiction: Substance Abuse forum, Forum77
(§ 5.6.2). The resulting graph suggests that a primary topic of discussion on the forum is withdrawal, and
moreover, that users discuss explicit drugs, especially prescription opioids, on the forum. Our work in
Chapters 6, 7 and 8 further explores Forum77 and confirms that these high-level insights are accurate.
5.2 Related Work
5.2.1 Medical Term Identification
MetaMap, arguably the best-known medical entity extractor, is a highly configurable program that relates
words in free text to concepts in the UMLS Metathesaurus [16, 17]. MetaMap sports an array of analytic
components, including word sense disambiguation, lexical and syntactical analysis, variant generation,
and POS tagging. MetaMap has been widely used to process data sets ranging from email to MEDLINE7
abstracts to clinical records [17, 31, 43].
The Open Biomedical Annotator (OBA) is a more recent biomedical concept extraction tool under
development at Stanford University. OBA is based on MGREP: a concept recognizer developed at the
University of Michigan [138]. Like MetaMap, OBA maps words in free text to ontological concepts; its
workflow, however, is simpler, comprising a dictionary-based concept recognition tool and a semantic
expansion component that finds concepts related to those present in the exact text [138].
7 A collection of biomedical publication abstracts. For more information see: http://www.nlm.nih.gov/pubs/factsheets/medline.html
A handful of studies compare MetaMap and/or OBA to human annotators, and tend to find the
tools wanting. Ruau et al. [212] evaluated automated MeSH annotations on PRoteomics IDEntification
(PRIDE) experiment descriptions against manually assigned MeSH annotations. MetaMap achieved
precision and recall scores of 15.66% and 79.44%, while OBA achieved 20.97% and 79.48%. Pratt and
Yetisgen-Yildiz [201] compared MetaMap’s annotations to human annotations on 60 MEDLINE titles: they
found that MetaMap achieved exact precision and recall scores of 27.7% and 52.8%, and partial precision and recall scores of 55.2% and 93.3%. They note that several failures result from missing concepts
in the UMLS. This is corroborated in an analysis of 376 patient-defined symptoms from PatientsLikeMe
by Smith and Wicks [226], who found that only 43% of unique terms had either exact or synonymous
matches in the UMLS; of the exact matches, 93% were contributed by SNOMED CT.
In addition to ontological approaches, there are several statistical approaches to medical term identification. NaCTeM’s TerMINE8 is a domain-independent tool that uses statistical scoring to identify
technical terms in text corpora [94]. Given a corpus, TerMINE produces a ranked list of candidate terms.
In a test on eye-pathology medical records, precision was highest for the top 40 – ranked by C-value –
terms (∼75%) and decreased steadily down the list (∼30% overall). Absolute recall was not calculated,
due to the time-consuming nature of having experts verify true negative classifications in the test corpus.
Recall relative to the extracted term list, however, was ∼97% [94].
As we discuss in Chapter 3, a great deal of prior work has focused on training statistical classifiers
for biomedical named entity recognition (NER) tasks [76, 87, 95, 111, 124, 125, 214, 222, 238, 239, 267]. In
general, this work demonstrates good results, indicating that statistical classification methods are more
appropriate for biomedical NER tasks than MetaMap and OBA. However, none of this work utilizes PAT
as a primary data source: statistical classifiers require sizable quantities of labeled data for training and
testing, and to date all available such data sets are based on biomedical publication abstracts [145, 146,
204, 271].
5.2.2 Consumer Health Vocabularies
A complementary and closely related branch of research to ours is Consumer Health Vocabularies
(CHVs): ontologies that link layman and UMLS medical terminology [85, 273]. Supporting motivations for
developing CHVs include: narrowing knowledge gaps between consumers and providers [273, 274], coding
data for retrieval and analysis [77], improving the “readability” of health texts for lay consumers [144]
and coding new concepts that are missing from the UMLS [143, 226]. We are currently aware of two
CHVs: the MedlinePlus Consumer Health Vocabulary9, and the open and collaborative Consumer Health
Vocabulary10 – (OAC) CHV – which was included in UMLS as of May 2011.
8 http://www.nactem.ac.uk/software/termine
To date, most research on CHVs has focused on discovering new terms to add to the (OAC) CHV. In
2007, Zeng et al. [274] compared several automated approaches for discovering new “consumer medical
terms” from MedlinePlus query logs. Using a logistic regression classifier, they achieved an AUC of
95.5% on all n-grams not present in the UMLS. More recently, Doing-Harris & Zeng [77] proposed a
computer-assisted update (CAU) system to crawl PatientsLikeMe, suggesting candidate terms for the
(OAC) CHV to human reviewers. By filtering CAU terms by C-value [94] and termhood [274] scores, they
were able to achieve a 4:1 ratio of valid to invalid terms; however, this also resulted in discarding over
50% of the original valid terms. Given the goals of the CHV movement, our CRF model for PAT medical
term identification may prove to be an effective method for generating new candidate terms for CHVs.
5.3 Data
In this section we describe our data preparation and sampling methods. We use samples from our
MedHelp (§ 4.1) data set for comparing crowdsourced vs. expert sourced labels, and for training and
cross-validation of our CRF classifier. We use a sample from our CureTogether (§ 4.2) data set as a
hold-out gold standard for comparing our CRF classifier to state of the art medical term annotation tools.
5.3.1 Preparation
We analyze our data at the sentence level. This promotes a fairer comparison between machine taggers,
which break text into independent sentences or phrases before annotating, and human taggers, who may
otherwise transfer context across sentences. We use Lucene11 to tokenize our corpora into sentences.
For consistency, we excluded sentences from MedHelp forums that we agreed were tangentially
medical (e.g., “Relationships”), over-general (e.g., “General Health”), or that contain fewer than 1,000
sentences. The raw MedHelp data set contains approximately 1,250,000 discussions. After preparation,
the data set comprises approximately 950,000 discussions from 138 forums: a total of 27,230,721
sentences.
9 http://www.nlm.nih.gov/medlineplus/xml.html
10 http://consumerhealthvocab.org
11 http://lucene.apache.org
5.3.2 Samples
We use the following samples:
MH1K: 1,000 MedHelp sentences sampled uniformly at random; labeled by crowd and experts. We
use this sample to compare expert and crowd labels. We also use the expert labels as a gold standard
for comparing our CRF classifier’s performance against state-of-the-art tools.
MH10K: 10,000 MedHelp sentences sampled uniformly at random; labeled by crowd. We use this
sample to train our CRF classifier to identify medically relevant terms in PAT. We also use it for 10-fold
cross validation of this classifier.
CT1K: 1,000 CureTogether sentences sampled uniformly at random; labeled by experts. We use this
as an independent gold standard for comparing our CRF classifier’s performance against those of
state-of-the-art tools.
5.4 Labeling Medically Relevant Terms with the Crowd
A common barrier to both training and evaluating medical text annotators is the lack of sufficiently large,
labeled data sets [17, 201]. The challenge in building such data sets lies in sourcing medical experts with
enough time to annotate text at a reasonably low cost [201].
Crowdsourcing is the allocation of a series of small tasks (often called micro-tasks) to a “crowd”
of online workers, typically via a web-based marketplace. Crowdsourcing is particularly attractive for
obtaining results faster and at lower cost than other participant recruitment schemes. When the workflow
is properly managed (e.g., via quality control measures such as aggregate voting, or by breaking up tasks
into suitable sub-components such as the “find-fix-verify” method proposed by Bernstein et al. [26]) the
combined results are often comparable in quality to those obtained via more traditional task completion
methods [126, 147]. Snow et al. [228] find that non-expert crowds can effectively execute linguistic
annotation tasks (affect recognition, word similarity, textual entailment, temporal ordering, and word
sense disambiguation) that are typically performed by experts. However, designing a crowdsourcing
task such that quality results are obtained is challenging and requires careful thought [26, 147].
Replacing medical experts with non-expert crowds would address concerns of time and cost, allowing
us to build labeled PAT data sets quickly and cheaply. To test the viability of this idea, we first design
a crowdsourcing task for medical term identification in PAT (§ 5.4.1). Next, we deploy this task to both
experts (in our case, Registered Nurses, or RNs) and non-experts (Amazon Mechanical Turk workers,
or Turkers), and compare their annotations over a sample of 1,000 sentences (MH1K) (§ 5.4.2).
5.4.1 Task Design and Pilot Study
Amazon’s Mechanical Turk12 is an online crowdsourcing platform where workers (Turkers) can browse
“human intelligence tasks” (HITs) posted by requesters and complete them for a small payment. We designed a simple interface in which a HIT comprised 100 sentences, each of which was accompanied by
a text box into which Turkers could copy medically relevant terms. Our original prompt simply asked Turkers to copy/paste any terms that seemed medically relevant from each sentence into the accompanying
text box. The resulting data contained several inconsistencies, including:
terms taken out of context: users selected terms that had no medical relevance in the context of
the given sentence, but might have medical connotations in other contexts. E.g., “anxiety” in the
sentence “I apologize if my post created any undue anxiety”.
omission: users would often leave an empty response for a sentence that contained a term that
was clearly medically relevant.
numerical measurement inclusion: some users felt that numbers corresponding to medication
dosages, units of measurement, etc. were relevant, while others did not.
concept granularity and scope: in a sentence such as “I have low blood sugar”, users would not
know whether to select “low blood sugar” or just “blood sugar”.
repetition: if a medically relevant term appeared twice in the same sentence (e.g., “pain” in “I am
in a lot of pain and the meds don’t seem to help, they just take the edge off the pain if anything”),
some users would extract it only once, and others would extract it each time it appeared.
Prior work shows that the design of a crowdsourcing task and prompt strongly impacts response
quality [147]. In order to arrive at a suitable prompt that produced consistent results, we iterated on our
original version several times, basing our changes on the design principles outlined by Kittur et al. [147].
We discuss pivotal changes below; Figure 5.1 shows our final prompt and interface.
The most problematic inconsistency was terms taken out of context, which amount to unnecessary
false positives. Subjective tasks are especially difficult for crowd workers [147], and the medical term
identification task is inherently subjective. We discovered, however, that making the task seem less
subjective, by asking users to tag words/phrases that they thought doctors would find interesting, all but
eliminated this effect.
The next problematic issue was omission, or unnecessary false negatives. We suspected that one
reason Turkers were cheating was that doing so let them complete the HIT faster. Kittur et
al. [147] note that to acquire accurate results from Turkers, malicious completion and good-faith completion
should require comparable levels of effort. We changed our interface such that each text box had
to contain some value prior to completion of the HIT, and instructed Turkers to type “NA” into text boxes
corresponding to sentences containing no medically relevant concepts. This helped somewhat, but it
is still easier to type “NA” than to copy/paste several terms into a text box. Kittur et al. [147] also note
that signaling to Turkers that their responses will be verified in a believable manner is thought to reduce
invalid responses as well as increase time spent on task. Before accepting the HIT, we informed Turkers
that four other Turkers would be completing the same HIT, and that their response would be rejected if it
differed substantially from the others. We enforced this policy. Implementing these changes resulted
in a drastic reduction of omissions.
Explicitly asking users to ignore numerical measurements and providing illustrative examples on
multi-word concepts reduced conflicting incidences of numerical measurement inclusion and concept
granularity to the point where aggregating over Turker responses produced a good result. However,
similar interventions related to issues of repetition had no effect. Ultimately we propagated the “medically
relevant” label to all unlabeled terms in the sentence that matched an extracted term. It is reasonable
to assume that two identical terms should carry the same label in a sentence, and we observed no
instances in which this assumption was violated.
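The propagation step described above can be sketched in a few lines. This is a minimal illustration, assuming single-word terms, whitespace tokenization and case-insensitive matching; the function name `propagate_labels` and the boolean label format are hypothetical and not taken from our pipeline.

```python
def propagate_labels(tokens, labels):
    """Copy the "medically relevant" label to every token that matches a labeled token.

    tokens: list of words in one sentence
    labels: list of booleans, True where an annotator extracted the word
    """
    # Collect the case-folded forms of all words that were labeled at least once.
    labeled_forms = {tok.lower() for tok, is_medical in zip(tokens, labels) if is_medical}
    # A token is medically relevant if its form matches any labeled form.
    return [tok.lower() in labeled_forms for tok in tokens]
```

Applied to the “pain” example above, labeling only the first occurrence of “pain” yields a label on both occurrences, while unrelated tokens remain unlabeled.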
Instructions (please read to get full credit for this task)
For this HIT, we would like you to extract all words/phrases that are medical concepts from the
sentences below. There are 100 sentences; this should take ~15-25 minutes.
To find medical concepts, ask yourself the question: "If I was telling this to my doctor, which words
would the doctor find interesting?" To simplify things, do not extract numerical values such as age,
weight, gender, medication dosage, symptom duration etc. Do extract concepts describing body parts,
conditions (and causes and effects of conditions), symptoms, treatments, etc. Remember that some
medically relevant terms are abbreviated (e.g. BS for "blood sugar").
For each sentence, please COPY/PASTE the relevant text EXACTLY (do not re-type it, or correct
misspellings), and SEPARATE each concept with a COMMA. For example:
I gave up smoking 2 weeks ago, and my blood pressure is under control with verapamil (0.5mg twice
a day)..
smoking, blood pressure, verapamil
For multi-word concepts, include as many words as you can, but make sure that they refer to just ONE
concept. Do not extract overlapping concepts. For example, in the sentence below, the term "blood
sugar" is preferred to "blood".
Shakes in the hands can be symptomatic of low blood sugar.
shakes, hand, blood sugar
Finally, many of the sentences will contain no medically relevant concepts. Just enter NA in the boxes
in these cases. For example:
You need to take care of yourself before you can take care of someone else.
NA
NOTE: you will be able to complete ONLY ONE of these HITs. Please do not attempt to accept
another hit after completing this one. Have fun!
Figure 5.1: Final PAT medical term identification task instructions and interface. Turkers were informed
that their answers would be checked against other Turkers’ in the HIT description on the MTurk interface.
5.4.2 Experiment
We use our MH1K data set for this experiment: a uniform sample of 1,000 sentences from the general
MedHelp data set. We deemed 1,000 sufficiently large for an informative comparison between RN and
Turker responses, but small enough to make expert annotation affordable. We split the sample into 10
groups of 100 sentences.
Table 5.1: Majority vote at the token level over RN responses. Terms identified by RNs as medically
relevant are shown in bold. Stopwords (e.g., “and”, “of”) are excluded from the vote.
RN 1:    shakes in the hands can be symptomatic of low blood sugar
RN 2:    shakes in the hands can be symptomatic of low blood sugar
RN 3:    shakes in the hands can be symptomatic of low blood sugar
Result:  shakes hands symptomatic blood sugar
Our experts comprised 30 RNs from ODesk13, an online professional contracting service. In addition
to the RN qualification, we required that each expert have perfectly rated English language proficiency.
Each expert did one PAT medical term identification task (100 sentences), and each group of 100 sentences was tagged by three experts, who were reimbursed $5.00 for completing the task. All tasks were
completed within two weeks at a cost of $150.00.
Our non-expert crowd comprised 50 Turkers recruited from Amazon’s Mechanical Turk (AMT). We
required that the Turkers have high English language proficiency, reside in the United States, and be
certified to work on potentially explicit content. Each Turker performed a single PAT medical term identification task (100 sentences), and each sentence group was tagged by five Turkers. The Turkers were
reimbursed $1.20 upon faithful completion of the task. All tasks were completed within 17 hours at a cost
of $60.00.
Determining a Gold Standard
We determine a gold standard for each sentence by taking a majority vote over the RNs’ responses.
Voting is performed at the word level, despite the prompt to extract words or phrases from the sentences.
Table 5.1 illustrates how this simplifies term identification by eliminating partial matching considerations
over multi-word concepts. N-gram terms can be recovered by heuristically combining adjacent words.
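A minimal sketch of word-level majority voting followed by the adjacent-word heuristic is given below. The stopword list, the tokenizer, the comma-separated response format and all function names are assumptions made for illustration; the annotator responses in the usage note are hypothetical, not the RNs' actual answers.

```python
from collections import Counter

# Illustrative stopword list (an assumption; the list used in the study is not given here).
STOPWORDS = {"in", "the", "can", "be", "of", "and"}
PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def word_votes(responses):
    """Count, per word, how many annotators included it in any extracted phrase."""
    votes = Counter()
    for response in responses:
        words = set()
        for phrase in response.split(","):
            words.update(tokenize(phrase))
        votes.update(words)  # each annotator contributes at most one vote per word
    return votes

def majority_vote(sentence, responses):
    """Return the sentence tokens and the indices chosen by a strict majority."""
    votes = word_votes(responses)
    needed = len(responses) // 2 + 1
    tokens = tokenize(sentence)
    chosen = [i for i, tok in enumerate(tokens)
              if tok not in STOPWORDS and votes[tok] >= needed]
    return tokens, chosen

def merge_adjacent(tokens, indices):
    """Heuristically join runs of adjacent selected tokens into n-gram terms."""
    terms, run = [], []
    for i in indices:
        if run and i != run[-1] + 1:
            terms.append(" ".join(tokens[j] for j in run))
            run = []
        run.append(i)
    if run:
        terms.append(" ".join(tokens[j] for j in run))
    return terms
```

For the sentence “Shakes in the hands can be symptomatic of low blood sugar.” and three hypothetical responses (“shakes, hands, low blood sugar”; “shakes, hands, symptomatic, blood sugar”; “hands, symptomatic, blood sugar”), the vote keeps “shakes”, “hands”, “symptomatic”, “blood” and “sugar”, and merging adjacent words recovers the bigram “blood sugar”.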
Comparing Turkers Against a Gold Standard
To test the feasibility of using non-expert crowds in place of experts, we compare Turker to RN responses
directly, aggregating across all 5 possible Turker voting thresholds. This allows us both to evaluate
the quality of aggregated Turker responses against the gold standard and to select the optimal voting
threshold.
13 http://www.odesk.com
Table 5.2: Turker performance against the RN gold standard. Voting threshold indicates the minimum
number of Turkers who have to annotate a term as medically relevant for it to be included in the result.
Maximum column values are indicated in bold. A corroborative policy of 2+ votes yields high scores
across the board, and maximizes F1-score.
Vote Threshold   F1      Precision   Recall   Accuracy   MCC
1                78.45   67.15       94.31    93.96      0.77
2                84.43   82.53       86.41    96.29      0.82
3                83.80   91.67       77.18    96.52      0.82
4                76.61   95.70       63.87    95.46      0.76
5                59.81   97.99       43.04    93.26      0.62
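The threshold comparison can be sketched as follows, using the standard definitions of precision, recall, F1, accuracy and the Matthews correlation coefficient (MCC). The input format (one gold boolean and one Turker vote count per word) and the function names are assumptions made for illustration, not a description of our evaluation scripts.

```python
import math

def confusion_counts(gold, votes, threshold):
    """Count TP, FP, FN, TN when a word is tagged if it received >= threshold votes."""
    tp = fp = fn = tn = 0
    for is_medical, n_votes in zip(gold, votes):
        predicted = n_votes >= threshold
        if predicted and is_medical:
            tp += 1
        elif predicted:
            fp += 1
        elif is_medical:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def evaluate(gold, votes, threshold):
    """Score aggregated votes against the gold standard at one voting threshold."""
    tp, fp, fn, tn = confusion_counts(gold, votes, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "f1": f1_score(precision, recall),
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

def best_threshold(gold, votes, n_voters=5):
    """Return the voting threshold in 1..n_voters that maximizes F1."""
    return max(range(1, n_voters + 1), key=lambda t: evaluate(gold, votes, t)["f1"])
```

As a consistency check, applying `f1_score` to the rounded precision and recall reported at a threshold of 2 (82.53 and 86.41) reproduces the reported F1 of 84.43 to within rounding.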
5.4.3 Results
Both the RN and the Turker group achieve high inter-rater reliability scores: κ = 0.709 and κ = 0.707
respectively using Fleiss’ Kappa [88], which measures agreement across two or more voters. Table 5.2
compares aggregated Turker responses against the RNs’ gold standard; voting thresholds dictate the
number of Turker votes required for a word to be tagged as “medically relevant”.
F1-score is maximized at a voting threshold of 2. We call this a corroborated vote, and select 2 as
the appropriate threshold for our remaining experiments. Overall, Turker scores are sufficiently high that
we regard corroborated Turker responses as an acceptable approximation for expert judgment.
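For readers unfamiliar with Fleiss’ Kappa, the statistic can be computed as in the sketch below, which follows the standard definition: mean observed pairwise agreement per item, corrected for the agreement expected by chance. Treating each word as an item with two categories (“medical” and “not medical”) is an assumption about how the ratings are arranged, and the function names are hypothetical.

```python
def votes_to_counts(votes, n_raters):
    """Turn per-word "medical" vote counts into [medical, not medical] category counts."""
    return [[v, n_raters - v] for v in votes]

def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters assigning item i to category j.

    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Mean observed agreement: fraction of agreeing rater pairs, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall proportion of ratings in each category.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in proportions)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

With three raters, for instance, `fleiss_kappa(votes_to_counts([3, 0, 3, 0], 3))` describes perfect agreement, while mixed votes such as `[3, 2, 1, 0]` give a lower value.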
5.4.4 Limitations of the Crowd
Crowdsourcing medical term identification in PAT allows us to build large, annotated data sets both
cheaply and quickly. Exploring the crowd’s efficacy at other medical entity annotation tasks is an important avenue for future work. Here, we offer some anecdotal insights based on our own attempts to get
the crowd to label specific types of medical terms in PAT. We attempted to implement two tasks similar
to that described in § 5.4.1: in the first, we asked Turkers to identify terms referring to symptoms and/or
conditions (e.g., “cough”, “asthma”, “headache”). In the second task, we asked them to identify terms
referring to drugs and/or treatments (e.g., “acupuncture”, “Tylenol”, “cough medicine”).
Although Turkers seemed to approach the task earnestly (they spent a reasonable amount of time
on it), the results were surprisingly inconsistent. In fact, some workers defaulted to labeling any terms
that were medically relevant, even though it is unlikely that they had been exposed to the original task
described in § 5.4.1, as more than 6 months had since elapsed. Ultimately, we hypothesized that there
were three factors explaining Turkers’ poor performance:
The first is subjectivity. The task of identifying symptoms or treatments is ambiguous and, in our
experience, more subjective than that of identifying terms that are simply medically relevant. For example, do wheelchairs, relaxation classes, birth control or drinking water constitute treatments? Do
sensations, flare-ups, pregnant and being worried constitute symptoms or conditions? The answers to
these questions tend to be “it depends”.
The second is concept scatteredness, which primarily affects the symptom/condition category. Symptom descriptions are often spread across an entire sentence, and Turkers are unsure of how to scope
such concepts. Consider, for example, the phrase “after I took the meds I felt like I’d been hit by a truck”.
Is “felt like I’d been hit by a truck” a symptom? This challenge is also cited by Leaman et al. [154] in work
on mining adverse drug events from user comments on DailyStrength14.
The final factor that likely affected Turker performance was task overlap. The postings of the symptom
and/or condition task and the drug and/or treatment tasks were staggered by a couple of days. However,
we noticed that some people tried to pick out just drugs and/or treatments in a symptoms and/or conditions task, and vice versa. We attribute such mixups to the fact that the same Turkers who had done
the earlier task were also attempting the staggered task, but had habituated to the first task. Allowing
more time to elapse before posting the second task, or preventing Turkers from doing both tasks, should
ameliorate this effect.
We believe that with additional design and iteration, it would be possible to get Turkers to identify
specific types of medical terminology in PAT. For example, a multi-tiered approach such as find-fix-verify [26]
might reduce the level of task subjectivity. Enhancing the interface such that Turkers could
select “core” concepts and then related supporting terms might facilitate accuracy. Refining the task to
make it more specific would likely reap rewards. For example, instead of asking Turkers to “find terms
referring to symptoms or conditions”, they might be asked to “find terms that refer to symptoms related
to the condition Asthma”.
14 http://www.dailystrength.com
In sum, however, designing a crowdsourcing task can be a resource intensive process, and this
must be traded off against alternative annotation methods. In our later work on Forum77, our data set was
sufficiently small that we elected to annotate it ourselves. However, systematically exploring the design
space of crowdsourcing PAT annotation tasks would likely yield high returns in the long term.
5.5 Training a Classifier on Crowd-Labeled Data
We now turn to the question of training a statistical classifier to identify medical terms in PAT automatically. We describe the models that we both use and compare against (§ 5.5.1), before describing our
experiment design (§ 5.5.2). Next, we present our results (§ 5.5.3), along with a failure analysis of our
classifier, ADEPT. Finally, we discuss our results and the limitations of our approach (§ 5.7).
5.5.1 Models
MetaMap, OBA and TerMINE
We used the Java API for MetaMap 201215, running it under three conditions: default; restricting the
target ontology to SNOMED CT (a high percentage of “consumer health
vocabulary” is reputedly contained in SNOMED CT [226]); and restricting the target ontology to the
(OAC) CHV. We used the Java client for OBA [138], running it under two conditions: default; and restricting
the target ontology to SNOMED CT, as the (OAC) CHV was not available to the OBA at the time of
writing. For TerMINE, we used the online web service16.
Dictionary
A dictionary (or gazetteer) is one of the simplest classifiers that we can build using labeled
training data. Our dictionary compiles a vocabulary of all words tagged as “medical” in the training data
according to the corroborative voting policy; it then scans the test data and tags any words that match a
vocabulary element. Our dictionary implements case-insensitive, space-normalized matching.
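A minimal sketch of such a dictionary tagger is shown below. The class name, the token-list input format and the boolean tags are assumptions made for illustration; only the behavior (collect labeled words, then match case-insensitively with normalized whitespace) follows the description above.

```python
import re

def normalize(word):
    """Case-insensitive, whitespace-normalized form of a word or phrase."""
    return re.sub(r"\s+", " ", word.strip()).lower()

class DictionaryTagger:
    """Tags any test word whose normalized form was labeled "medical" in training."""

    def __init__(self):
        self.vocabulary = set()

    def fit(self, sentences, labels):
        """sentences: list of token lists; labels: parallel lists of booleans."""
        for tokens, tags in zip(sentences, labels):
            for token, is_medical in zip(tokens, tags):
                if is_medical:
                    self.vocabulary.add(normalize(token))
        return self

    def predict(self, tokens):
        """Return one boolean per token: True if the token is in the vocabulary."""
        return [normalize(token) in self.vocabulary for token in tokens]
```

Because matching ignores context, a tagger of this kind labels “anxiety” identically in “the anxiety is the worst” and in “I apologize if my post created any undue anxiety”.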
ADEPT: A Conditional Random Field Model
Conditional random fields (CRFs) are probabilistic
graphical models particularly suited to labeling sequence data [151]. Their suitability stems from the
fact that they relax several independence assumptions made by Hidden Markov Models; moreover, they
can encode arbitrarily related feature sets without having to represent the joint dependency distribution
over features [151]. As such, CRFs can incorporate sentence-level context into their inference procedure.
For example, a CRF can discern that the word “tired” represents a medical term in the sentence,
“I’m feeling so tired, as though I am oxygen deprived.”, but not in the sentence, “I’m tired of feeling as
though I am oxygen deprived.” The term “oxygen deprived” is medically relevant in both sentences17.
16 http://www.nactem.ac.uk/software/termine
Our CRF training procedure takes, as input, labeled training data coupled with a set of feature definitions, and determines model feature weights that maximize the likelihood of the observed annotations.
We use the Stanford Named Entity Recognizer package18 , a trainable Java implementation of a CRF
classifier, and its default feature set. Examples of default features include word substrings (e.g., “ology”
from “biology”) and windows (previous and trailing words); the full list is detailed in Appendix A. We refer
to our trained CRF model as ADEPT (Automatic Detection of Patient Terminology).
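To make the two example feature types concrete, the sketch below builds substring and window features for a single token. It is an illustration only: the feature names, the substring length limit and the one-word window are assumptions, and this is not the Stanford Named Entity Recognizer's feature extractor or its default configuration.

```python
def token_features(tokens, i, max_substring=6):
    """Build a set of string-valued features for tokens[i].

    Features (hypothetical naming): the word itself, all character substrings of
    length 2..max_substring, and the previous and next words in the sentence.
    """
    word = tokens[i].lower()
    features = {"word=" + word}
    # Character substrings, e.g. "ology" from "biology".
    for n in range(2, min(max_substring, len(word)) + 1):
        for start in range(len(word) - n + 1):
            features.add("sub=" + word[start:start + n])
    # Window features: neighbors, with boundary markers at the sentence edges.
    features.add("prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"))
    features.add("next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"))
    return features
```

For the two “tired” sentences above, the word and substring features are identical, but the window features differ (“next=of” appears only in the second sentence). Context features of this kind are what allow a sequence model to assign different labels to the same word.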
5.5.2 Design
To test our second hypothesis, we create a crowd-labeled data set comprising 10,000 MedHelp sentences
(MH10K), and an RN-labeled data set comprising 1,000 CureTogether sentences (CT1K). Using
the procedures described in § 5.4, this cost approximately $600 and $150, respectively. We train two
models – a dictionary and a CRF – on the MedHelp data set (MH10K), and evaluate performance via
5-fold cross validation; we compare MetaMap, OBA and TerMINE’s output directly. Finally, we compare
the performance of all 5 models against the CureTogether gold standard (CT1K).
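The cross-validation splits can be generated as in the sketch below, which partitions item indices into k contiguous folds of nearly equal size. The function name is hypothetical, the number of folds is left as a parameter, and we assume the sentences have already been shuffled.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    The first n_items % k folds receive one extra item so that sizes differ by at most 1.
    """
    if not 1 < k <= n_items:
        raise ValueError("k must satisfy 1 < k <= n_items")
    start = 0
    for fold in range(k):
        size = n_items // k + (1 if fold < n_items % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size
```

Each sentence appears in exactly one test fold, so every model is evaluated only on sentences it did not see during training.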
5.5.3 Results
Table 5.3 shows the performance of MetaMap, OBA, TerMINE, the dictionary model and ADEPT on
MH10K, (MH1K and CT1K). ADEPT achieves the maximum score in every metric, bar recall. Moreover,
its high performance carries over to the CureTogether test corpus, indicating adequate generalization
from the training data. Figure 5.2 provides illustrative examples of the models’ performance on sample
sentences from MH1K .
Failure Analysis
While ADEPT’s results are promising, assessing points of failure is useful for future improvements and
implementations. Figure 5.3 plots term classification accuracy against logged term frequency in both test
corpora. We observe that while most terms are always correctly classified, a number of terms (∼650) are
never classified correctly. Of these, almost all (>90%) appear only once in the test corpora. A LOWESS
17 Note: this is actual output from our final classifier.
18 http://nlp.stanford.edu/software/CRF-NER.shtml
Figure 5.2: Sample sentences labeled by ADEPT, the dictionary, MetaMap, OBA and TerMINE.
Table 5.3: Annotator performance against the crowd-labeled data set and the gold standards. Maximum
column values are indicated in bold.
Validation data set                   Annotator   Parameters  F1     Precision  Recall  Accuracy  MCC
Crowd-labeled (MH10K)                 MetaMap     Default     32.64  21.88      64.20   70.44     0.24
                                      MetaMap     SNOMED CT   34.97  25.45      55.85   76.83     0.26
                                      MetaMap     CHV         34.88  24.48      60.63   74.75     0.26
                                      OBA         Default     43.77  30.20      79.53   77.21     0.39
                                      OBA         SNOMED CT   43.23  36.15      53.76   84.25     0.35
                                      Dictionary  -           46.18  32.34      80.75   79.02     0.42
                                      ADEPT       -           78.41  82.66      74.59   95.42     0.76
MedHelp gold standard (MH1K)          MetaMap     SNOMED CT   37.73  28.03      57.67   77.82     0.29
                                      OBA         SNOMED CT   45.78  32.10      79.31   78.04     0.41
                                      TerMINE     -           42.35  52.67      35.41   88.77     0.37
                                      Dictionary  -           37.30  26.34      63.89   74.98     0.29
                                      ADEPT       -           78.33  82.55      74.53   95.20     0.76
CureTogether gold standard (CT1K)     MetaMap     SNOMED CT   39.12  29.33      58.57   74.13     0.27
                                      OBA         SNOMED CT   47.28  33.56      79.91   74.74     0.40
                                      TerMINE     -           43.09  53.11      36.25   86.43     0.37
                                      Dictionary  -           38.74  27.53      65.35   70.65     0.27
                                      ADEPT       -           77.74  78.82      76.69   93.78     0.74
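The five metrics reported in Table 5.3 can all be derived from a binary confusion matrix. The sketch below shows the standard definitions in Python; the function name and the convention of returning 0.0 for undefined ratios are assumptions made here, and the text does not state the exact unit (token or term) over which the counts were taken.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, accuracy and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Matthews correlation coefficient; defined as 0.0 here when any
    # marginal total is zero (an assumed convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

Because medically relevant terms are a minority class, accuracy alone can look high for a weak annotator; F1 and MCC are less sensitive to that imbalance, which is presumably why the table reports all of them.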
[Figure 5.3 plot. y-axis: classification accuracy (%); x-axis: ln(frequency) of term in test corpora; legend: 1 term, 10 terms, 100 terms, 500 terms.]
Figure 5.3: Term classification accuracy plotted against logged term frequency in test corpora. Purple
(darker) circles represent terms that are always classified correctly; blue (lighter) circles represent terms
that are misclassified at least once. A LOWESS fit line to the entire data set (black) shows that most
terms are always classified correctly. A LOWESS fit line to the misclassified points (blue/lighter) shows
that classification accuracy increases with term frequency.
fit to the points representing terms that were misclassified at least once shows that classification accuracy
increases with term frequency in the test corpora (and by logical extension, term frequency in the
training corpus). As we might expect, over half (∼51%) of the misclassified terms occur with frequency
one in the test corpora. A review of these terms reveals no obvious term type (or set of term types)
Table 5.4: Examples of ADEPT’s misclassifications in the test corpora.
Frequently Misclassified
(FP > 1, FN > 1)
baby, bc, condition, doctor, doctors, drs, health, ice, natural, relief, short, strain,
weight
Mostly False Positive
(FP > 1, FN ≤ 1)
accident, decreased, drinks, drunk, exertion, external, healthy, heavy, higher, lie,
lying, milk, million, pants, periods, prevention, solution, suicidal . . . [37 more terms]
Mostly False Negative
(FP ≤ 1, FN > 1)
appointment, clear, copd, hiccups, lack, ldn, massage, maxalt, missed, nurse, physician, pubic, rebound, silver, sleeping, smell, tea, treat, tree, tx . . . [41 more terms]
Infrequently Misclassified
(FP ≤ 1, FN ≤ 1)
cravings, generic, growing, hereditary, increasing, lab, limit, lunch, panel, pituitary,
position, possibilities, precursor, taste, version, weakness . . . [118 more terms]
likely to be incorrectly classified. Indeed, many are typical words with conceivable medical relevance
(e.g., gout, aggravates, irritated). Such misclassifications would likely become rarer with more training data,
which would allow ADEPT to learn new terms and patterns.
It remains to investigate terms that are both frequent and frequently misclassified. Table 5.4 shows
terms from the test corpora that ADEPT misclassifies at least once. Immediately obvious is the presence of terms that are medical but generic, such as doctor, doctors, drs, physician, nurse, appointment,
condition, and health. These misclassifications likely stem from ambivalence in the training data; indeed,
Yetisgen-Yildiz and Pratt [201] find that human annotators have low certainty over whether to include
general terms such as these in medical term annotation tasks. Either specific instructions to
human annotators on how to handle generic terms, or rule-based post-processing of annotations, could
ameliorate such errors.
5.6 Example Applications of ADEPT to PAT
To illustrate ADEPT’s efficacy, we present two applications to PAT corpora. The first is to MedHelp’s
Arthritis forum, with an eye to summarizing its important medical concepts. In this application, we compare ADEPT’s output with OBA’s. Our second application is to Forum77, MedHelp’s Addiction: Substance Abuse forum, in which our goal is to generate a high-level concept map of its medically relevant
content.
5.6.1 Summarizing Important Medical Content in MedHelp’s Arthritis Forum
A simple way of summarizing the medical content in a PAT corpus is to rank all relevant terms by
frequency and select the top N. Figure 5.4 compares the top 50 medical terms in MedHelp’s Arthritis
forum as determined by ADEPT and OBA. (We picked OBA instead of MetaMap due to its superior
performance; see Table 5.2.) The terms recovered by ADEPT are both diverse and richly descriptive of
arthritic conditions; in contrast, the majority of terms recovered by OBA are spurious, and serve only
to demote the rankings of the few relevant terms that it does find.
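The frequency ranking itself is a few lines of code once an annotator's output is available. A minimal Python sketch follows, assuming the extracted terms arrive as a flat list of strings; the function name and the lowercasing step are illustrative choices, not part of ADEPT.

```python
from collections import Counter


def top_terms(extracted_terms, n=50):
    """Return the n most frequent terms as (term, count) pairs."""
    # Case-fold so that "Pain" and "pain" are counted together (an assumption).
    counts = Counter(term.lower() for term in extracted_terms)
    return counts.most_common(n)
```

Since the ranking step is this simple, the quality of the summary depends almost entirely on the annotator feeding it: spurious terms such as “have” or “like” are frequent in any corpus and will crowd out relevant ones.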
5.6.2 Navigating MedHelp’s Substance Abuse Forum (Forum77)
A natural way of acquiring a casual overview of a corpus’ content is to visualize both the important
medical terms, as well as significant relationships between them. Including term relationships imparts an
extra layer of insight to the underlying content. For example, if drug terms tend to co-occur in sentences,
then it is likely that users compare drugs in their discussions. On the other hand, if drug terms tend to
co-occur with symptom terms, then discussions likely document which drugs treat specific symptoms.
To acquire a high-level topography of Forum77’s medical content, we first apply ADEPT to the Forum77 corpus. Filtering out infrequent terms (terms that appear < 10 times in the corpus), we score
connections between remaining co-occurring terms with the G2 metric, which rewards significant (or interesting) co-occurrence relationships over common ones [78]. We then use Gephi19 , a tool for graph
analysis and visualization, to explore the results interactively.
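The filtering and scoring steps can be sketched in Python as follows. This uses the standard log-likelihood ratio form of G2 over a 2x2 sentence-level contingency table; the function names, the representation of sentences as sets of terms, and counting term frequency per sentence rather than per occurrence are all assumptions made for illustration, and the thesis's exact computation may differ.

```python
import math
from collections import Counter
from itertools import combinations


def g2(k11, k12, k21, k22):
    """G2 statistic for a 2x2 table: 2 * sum(O * ln(O / E)) over non-empty cells."""
    total = k11 + k12 + k21 + k22
    observed = [[k11, k12], [k21, k22]]
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = rows[i] * cols[j] / total
                score += o * math.log(o / expected)
    return 2 * score


def cooccurrence_scores(sentences, min_count=10):
    """Score each co-occurring pair of sufficiently frequent terms with G2."""
    sentences = [set(s) for s in sentences]
    n = len(sentences)
    term_counts = Counter(t for s in sentences for t in s)
    kept = {t for t, c in term_counts.items() if c >= min_count}
    pair_counts = Counter()
    for s in sentences:
        for a, b in combinations(sorted(s & kept), 2):
            pair_counts[(a, b)] += 1
    scores = {}
    for (a, b), both in pair_counts.items():
        a_only = term_counts[a] - both
        b_only = term_counts[b] - both
        neither = n - both - a_only - b_only
        scores[(a, b)] = g2(both, a_only, b_only, neither)
    return scores
```

The resulting pair scores could be written out as a weighted edge list for a graph tool. Note that G2 measures departure from independence in either direction, so a pair that co-occurs less often than chance can also score highly.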
Note that what follows is a casual analysis in which we utilize Gephi’s internal filtering and clustering
features to facilitate rapid exploration. Our goal is to illustrate a typical point of departure in exploring
a novel corpus of ADEPT-extracted PAT terms. Figure 5.5 shows a co-occurrence graph over ADEPT-extracted Forum77 terms, with node labels omitted to illustrate the underlying graph structure. Immediately obvious is the presence of two large, interlinked clusters (dark and light blue). A third cluster
(dark green) is more independent. We examine each of these clusters in greater detail by filtering out
non-member nodes, and recalculating the graph layout.
Figure 5.6 shows the largest (light blue) cluster with node labels. This cluster appears to detail general aspects of addiction related to detoxification: suboxone and methadone are synthetic opioids used
in opioid-replacement therapy; detox and taper are direct detoxification references; many other nodes
19 http://gephi.github.io
ADEPT: pain, arthritis, symptoms, joints, knees, feet, hands, swelling, neck, knee, fingers, ankles, legs, tests, joint, rheumatologist, diagnosed, swollen, meds, disease, surgery, treatment, leg, shoulder, spine, doctor, inflammation, wrists, test, stiffness, painful, diagnosis, arms, toes, fatigue, shoulders, joint pain, wrist, bone, muscles, arm, osteoarthritis, foot, hip, medication, negative, positive, skin, cold
OBA: have, pain, doctor, arthritis, like, help, time, years, symptoms, right, did, work, blood, joint, good, does, need, months, joints, test, knee, day, started, ago, try, is a, tests, better, left, hope, long, year, disease, bad, rheumatologist, diagnosed, here, days, hands, old, sure, weeks, knees, doctors, normal, cause, lot, got, make
Figure 5.4: Top 50 terms, ranked by frequency, derived from MedHelp’s Arthritis forum as determined
by ADEPT (left) and OBA (right). Terms unique to their respective portion of the list are shown in bold.
Terms occurring in both lists are linked with a line. The gradient of these lines shows that all co-occurring
terms, bar three, are more highly ranked by ADEPT.
Figure 5.5: A graph showing important terms in Forum77 (nodes), and significant co-occurrence relationships between them (edges). Node size is proportional to degree, while colors indicate clusters.
Node labels are omitted for legibility; instead, we examine main clusters in-depth in subsequent figures.
detail withdrawal symptoms (anxiety, cramps, body aches, muscle-tremors, muscles-restlessness, etc.).
Overall, this cluster suggests that Forum77 hosts detailed discussions on the process and mechanisms
of opiate withdrawal.
Figure 5.7 illustrates the second-largest (dark blue) cluster. This cluster is almost clique-like, and its
core comprises primarily addictive prescription drugs: oxy (oxycodone), hydro (hydrocodone), xanax,
vicodin, benzo (benzodiazepine) etc. This cluster also details several withdrawal symptoms (tired, chills,
Figure 5.6: The largest cluster in Figure 5.5 suggests that discussions frequently involve detoxification
from prescription drugs.
Figure 5.7: The second-largest cluster in Figure 5.5 suggests that discussions frequently pair specific
drugs and the withdrawal symptoms that they cause.
Figure 5.8: The third-largest cluster in Figure 5.5 contains medically relevant terms from Thomas’
Recipe: a user-developed schedule for medication-assisted opioid withdrawal.
flu, etc.) as well as body parts (head, legs, skin, etc.), suggesting a great deal of discussion around
specific prescription opioids and their associated withdrawal symptoms.
Finally, Figure 5.8 shows the third-largest cluster (dark green). As in Figure 5.7, the structure is clique-like. Its nodes constitute a combination of withdrawal symptoms (runny nose, general aches, leg cramps
etc.), terms representing wellness activities or supplements (mild exercise, cycling, vitamin b6, zinc,
l-tyrosine etc.), and non-opiate drugs (ativan, imodium, benzodiazepine). In hindsight, it is clear that
this cluster represents medically relevant terms from Thomas’ Recipe: a user-developed schedule for
medication-assisted opioid withdrawal that is popular on Forum77. We discuss Thomas’ Recipe in depth
in § 6.8.1.
These casual explorations of co-occurring ADEPT-extracted Forum77 terms suggest that withdrawal
is a primary topic of discussion on the forum (Figures 5.6, 5.7). Moreover, users discuss specific drugs,
primarily prescription drugs (Figure 5.7). Without prior knowledge of Thomas’ Recipe (§ 6.8.1), guessing
that Figure 5.8 partially represented a detoxification protocol would be difficult, although the nodes opiate
detox and at-home self-detox might have provided a clue. Overall, our later work in this thesis shows
that these explorations yield accurate, although incomplete, insights into Forum77’s primary content.
5.7 Conclusion
Our work on ADEPT was prompted by the observation that despite the abundance of PAT, tools for
extracting medically relevant content from it are lacking. This, in turn, restricts general exploration and
hypothesis generation over PAT corpora. One major limitation to building such tools is a lack of large,
annotated corpora for training and testing statistical models.
Our first result addresses this by showing that a crowd of non-experts is a sufficient replacement for
medical experts in the PAT medical term identification task (§ 5.4). By paying careful attention to
existing crowdsourcing design principles, we were able to design a prompt and task that resulted in labels
of comparable quality to those produced by experts (§ 5.4.1). Combined and aggregated according to
a corroborative vote, Turker responses achieve an F1-Score of 84% against our RNs’ gold standard
(§ 5.4.2). As crowds of non-experts are much easier to coordinate than medical experts, this opens
up the option of building large, labeled PAT corpora of high quality both quickly and cheaply. We note,
however, that not all tasks may be suited to crowd labeling; those that are more subjective or require
specialized knowledge may involve particularly challenging task design (§ 5.4.4).
Next, we addressed the issue of automating the PAT medically relevant term identification task
(§ 5.5). ADEPT, our CRF classifier trained on crowd-labeled data, dramatically outperforms existing
tools MetaMap, OBA and TerMINE (§ 5.5.3). Moreover, ADEPT’s performance carries over to an independently sourced PAT gold standard from CureTogether. While one limitation of ADEPT is that it
does not identify specific term types (e.g., drugs, symptoms), it is excellent at finding terms of medical
relevance. This makes it a useful and novel tool for summarizing and exploring PAT corpora (§ 5.6.2).
We attribute ADEPT’s success to the suitability of sentence-level, context-sensitive learning models
like CRFs to PAT medical term identification tasks. Our dictionary, trained on the same data as ADEPT,
achieves high recall because it collects many medical terms from training data, but it achieves low precision because it cannot discriminate between relevant and irrelevant invocations of these terms. Unlike
ADEPT, for example, the dictionary cannot learn that the word “sugar” is of particular medical relevance
when it co-occurs with the word “diabetes”. The third sentence in Figure 5.2 suggests that context-based
relevance detection may be problematic for MetaMap and OBA, too. In this sentence, the term “case” is
annotated because of its membership in SNOMED-CT as a medically relevant term pertaining either to
a “situation” or a “unit of product usage”.
In concert, our contributions in this chapter constitute an alternative approach to medical term annotation and identification. In Chapter 7 we leverage the lessons learned in this chapter to extract a specific
type of medical term from Forum77 discussions: users’ drugs of choice. First, however, in Chapter 6 we
investigate users’ motivations for participating in Forum77.
Chapter 6
What do People Seek on Forum77?
Forum77 is the largest community on MedHelp, which indicates that it provides something that users
need and find useful. But what do people seek through participation on Forum77? Insight into how and
why users engage with Forum77 is instructional in its own right, but also provides a valuable template
for planning future, targeted explorations of the corpus. Our goal in this chapter is to elucidate users’
motivations for initiating discussions on Forum77.
We first motivate our focus on the topic of addiction (§ 6.1) before covering related work (§ 6.2) and
summarizing the data sets used in this chapter (§ 6.3). Next, we conduct a thematic analysis, developing
a taxonomy of users’ reasons for participation (§ 6.5). In congruence with prior work, the two driving
motivations are seeking emotional support and seeking informational support. Within these categories
are sub-categories specific to the topic of substance abuse, such as seeking information on withdrawal
and expressing concern about relapse. The most prevalent label, accounting for over 30% of all initiating
posts, is the update: a status log devoid of requests for feedback.
Next, we discuss the training and evaluation of two binary statistical classifiers that can distinguish
emotional from informational posts (§ 6.6), and update from non-update posts (§ 6.7). Our classifiers
perform well, achieving F1-scores of 80.12% and 76.54% for emotional vs. informational and update vs.
non-update, respectively.
Finally, we present the results of applying these classifiers to the entire Forum77 corpus (§ 6.8). We
compare and contrast features such as thread longevity and response rates across thread categories.
We also present and discuss Thomas’ Recipe: a highly prevalent informational support artifact on Forum77 that we came across in the course of our analyses. We conclude that Forum77 serves both as
a user-generated and tested repository of medically-explicit knowledge on managing substance abuse
withdrawal, as well as a public platform where people broadcast their progress as a mechanism for seeking emotional support and encouragement from others. In this latter capacity, Forum77 is similar to the
offline mutual help groups Alcoholics Anonymous (AA) and Narcotics Anonymous (NA); in its information-providing
capacity, however, Forum77 is quite distinct, as AA and NA explicitly eschew the sharing of
medical information [133].
6.1 Why Study Addiction?
We focus on the topic of addiction for three primary reasons, which we expand on below. The first is that
addiction is highly prevalent. As such, any insights or results that arise from studying addiction could be
useful and impactful to a large number of people. Second, addiction is highly stigmatized. As a result,
people suffering from addiction are likely to turn online for help, and addiction-related PAT is likely to
contain information that is difficult to acquire through traditional medical channels. Finally, people are
turning online en masse for help with addiction. Forum77 is MedHelp’s largest forum, but, as we show
in Table 6.1, it is only one of several online forums dedicated to the topic of substance abuse recovery.
6.1.1 Addiction is Highly Prevalent
Drug and alcohol use disorders, in particular the escalating misuse of prescription drugs, present one
of the most pressing public health issues of the day. Addiction affects 16% of Americans ages 12 or
older (about 40 million people), far exceeding the number of people afflicted with heart disease (27
million), diabetes (26 million), or cancer (19 million) [4]. Deaths due to accidental drug overdose now
exceed deaths due to motor vehicle accidents [251]. In 2008, more than 36,000 deaths were due to drug
overdoses; of these, opioid pain reliever (OPR) overdoses accounted for more than heroin and cocaine
combined [3, 249]. Taking into account workplace, criminal justice, and health care costs, the burden of
prescription drug abuse on the U.S. economy was $56-$57 billion in 2006-2007 [27, 115].
6.1.2 Addiction is Highly Stigmatized
Recent medical research argues that drug dependence is a chronic, relapsing and remitting disorder
that behaves just like other chronic illnesses with a behavioral component, such as Type II Diabetes
Mellitus [169]. Despite this, prescription opioid abuse is a highly stigmatized condition: the opinion
that opioid misuse is a flaw of a person’s moral character, rather than a legitimate medical condition, is
common [187].
This stigma carries into the medical profession. In general, medical professionals feel that addiction
lacks parity with other medical conditions in terms of prestige and importance [176]. In addition, there
is a mutual mistrust between addiction patients, who feel that they are mistreated and stigmatized and
receive poor medical care as a result, and their providers, who find it difficult to evaluate whether patients’
requests for opioids stem from genuine “medically indicated” needs or from addictive behavior [174].
The stigma is compounded by the fact that the most effective treatments for opioid use disorders are
methadone or buprenorphine-assisted replacement therapies, which require patients to continue taking
prescription opioids under the supervision of a medical professional [187]. Finally, as pain treatment
is often the starting point of a longer addiction to prescription opioids, it is common for people with
prescription drug use disorders to acquire their drug of choice via a doctor’s prescription [229, 249].
6.1.3 People are Turning Online for Help with Addiction
People with substance use disorders are no exception to the trend of online health forum participation.
Myriad discussion forums focus on addiction recovery and are widely utilized. Table 6.1 describes a
representative sample of these that we curated during a brief search. The result of this is a massive,
growing and (until now) unexamined corpus of text in which users document their experiences with
addiction and their attempts at overcoming it.
6.2 Related Work
Emotional and informational support consistently emerge as the primary reasons for user engagement
in online health communities [36, 47, 86, 122, 131, 148, 149, 162, 211, 243, 250, 258]. However, little work
attempts to extend analyses of users’ support giving, seeking, or reasons for participation to data sets
that are too large for manual annotation. We discuss this work here, referring the reader to § 2.2.3 for a
thorough discussion of users’ reasons for participation in online health communities, and to § 3.5 for a
summary of prior work on thematic analyses of PAT.
Table 6.1: Summary statistics of a representative sample of online health communities focused on addiction recovery. We identified sites through Google searches and gathered statistics (if available) from
site pages. Data current as of 3/1/2014.
Name                        Members    Posts      Threads    Join to post   Join to read
Forum77                     ∼51,153    ∼740,046   ∼80,529    Y              N
  medhelp.org/forums/Addiction-SubstanceAbuse/show/77
  Single forum dedicated to recovery in general.
The Suboxone Talk Zone      ∼11,000    ∼77,000    ∼8,900     Y              N
  suboxforum.com
  Multiple forums focused on issues related to Suboxone.
Addiction Recovery Guide    N/A        700,000    N/A        Y              N
  addictionrecoveryguide.org
  Collection of resources for assisting recovery; includes online forum.
Addiction Survivors         ∼15,870    ∼270,000   ∼17,500    Y              N
  addictionsurvivors.org
  Forums focus on opiate, alcohol, benzodiazepine, and stimulant addiction.
Cyber Recovery              5,078      154,975    23,000     Y              Y
  cyberrecovery.net
  Multiple forums dedicated to recovery in general.
Sober Recovery              132,964    >3.5 M     234,311    Y              N
  soberrecovery.com/forums
  Multiple forums dedicated to alcoholism and drug abuse recovery.
Wang et al. [250] successfully use workers on Amazon’s Mechanical Turk1 (Turkers) to quantify the
amount of emotional and informational support contained in both initiating and response posts on Breastcancer.org2. They then use this data to train regression models that achieve correlation scores of 0.76 and
0.80 for emotional and informational content, respectively. Investigating whether certain types of support
are important for member retention, they find that receiving high levels of emotional support predicts
lower dropout risk.
Biyani et al. [28] manually labeled ∼1,000 sentences from the Cancer Survivor’s Network forum3 as
either emotional or informational. An ensemble classifier trained on this data achieved an F1-score of
84% (88% for emotional support, 77% for informational support). Their goal was to determine whether
influential and regular community members differed in terms of the types of support they provided on
the forum. They found that influential members offer significantly more emotional support than regular
community members.
2 http://breastcancer.org
3 http://csn.cancer.org
To our knowledge, no other prior work attempts to automatically classify informational and emotional
support in PAT. However, some work does investigate methods for labeling or featurizing these data at
scale. Vlahovic et al. [248] found that Turkers produced good labels for emotional and informational
support on posts from a breast cancer support forum. Finally, both Owen et al. [188] and Alpers et
al. [12] evaluate the efficacy of using LIWC4 to automatically identify emotions expressed in posts on
breast cancer support forums. While both find the tool reasonably accurate, they do not attempt to
analyze users’ motives for posting.
Unlike Wang et al. [250] and Biyani et al. [28], we investigate and discuss a more detailed taxonomy
of users’ reasons for participation. In addition to automatically classifying informational and emotional
support, we are also able to train a classifier to identify a specific sub-category of emotional support
posts: the update. While we leave the analysis of response post content to future work, we do investigate
response levels to different categories of initiating posts.
6.3 Data
For clarity, we briefly summarize the data sets used in this chapter.
6.3.1 Thematic Analysis Development Dataset
We use our Forum77 data set (§ 4.1.2) for this work. For our thematic analysis (§ 6.5), we used ∼1,000
initiating posts sampled uniformly at random for each iteration of the analysis, and evaluated inter-annotator agreement on a 200-post subsample. With a total of 3 iterations, we used ∼3,000 initiating
posts sampled uniformly at random to conduct the thematic analysis.
6.3.2 Labeled Training & Testing Dataset
We created a data set for labeling and classifier training as follows: first, we curated a sample of initiating
posts from recurring Forum77 users by randomly sampling 200 users who had initiated 5 or more posts.
(We restricted the sample to recurring users in order to ensure a more balanced representation of taxonomy labels, as we observed in our thematic analysis (§ 6.5) that certain labels (e.g., support giving)
tend to appear only later in a user’s tenure.) Our 200 sampled users authored ∼32,000 initiating posts,
4 http://www.liwc.net
of which we took a random sample of 1,000 for subsequent coding. To prevent any user from dominating
the sample, we admitted no more than 30 posts per user.
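The sampling procedure above can be sketched in Python as follows. The function name, the (user_id, post_id) input format and the way the per-user cap is enforced (shuffle the pooled posts, then accept posts until the target size is reached, skipping users who have hit the cap) are assumptions; the text states the constraints but not the exact mechanism.

```python
import random
from collections import Counter, defaultdict


def sample_posts(posts, n_users=200, min_posts=5, n_posts=1000, cap=30, seed=0):
    """Sample initiating posts from recurring users, capped per user.

    posts: list of (user_id, post_id) pairs, one per initiating post.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, post in posts:
        by_user[user].append(post)
    # Recurring users: those who initiated at least min_posts posts.
    eligible = sorted(u for u, p in by_user.items() if len(p) >= min_posts)
    users = rng.sample(eligible, min(n_users, len(eligible)))
    pool = [(u, p) for u in users for p in by_user[u]]
    rng.shuffle(pool)
    taken, per_user = [], Counter()
    for user, post in pool:
        if len(taken) == n_posts:
            break
        if per_user[user] < cap:
            taken.append((user, post))
            per_user[user] += 1
    return taken
```

With the values reported in the text, this would be called with `n_users=200`, `min_posts=5`, `n_posts=1000` and `cap=30`.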
6.4 Who Posts?
Traditional demographic information such as gender, age, race and socioeconomic status is rarely discernible from Forum77 posts. However, we were able to determine other aspects of identity, namely
whether a user was posting on their own behalf or on behalf of someone else. We noted that most
users initiate posts in which they are the subject; occasionally, however, users initiate posts in which
someone else is the subject. These proxies range from concerned parents, to members congratulating
each other on clean time, to loved ones posting on behalf of an incapacitated member.
We defined the subject of the post to be self if the author is writing about her own addiction, associate
if the author is writing about someone else’s addiction, or n/a if this information is absent or indeterminate.
Two authors labeled our 1,000 initiating post training data sample with the subject label. Inter-annotator
agreement was 92%, with a Cohen’s Kappa of 0.77.
The distribution of subject labels over the sample data set is: 85% self, 8% associate, and 7% n/a.
While most users post on their own behalf, a significant minority post on behalf of another. Moreover,
the number of posts in which the subject was indeterminate was higher than we expected. Such posts
typically consist of social chatter (e.g., talking about sports). As these results do not suggest anything
interesting or novel, we do not pursue this analysis at scale.
6.5 Users’ Objectives in Initiating Discussions
Thematic analyses are frequently used on PAT to identify structure and patterns in user behavior and
user-generated content (§3.5). To develop a taxonomy describing users’ objectives in initiating discussions on Forum77, we use an adapted General Inductive Approach [236]: over the course of reading ∼3,000 posts, two authors iteratively co-developed a taxonomy describing recurrent and emergent
themes in the posts. On each iteration, the authors used the taxonomy to independently label 1,000
randomly sampled posts. They then revised the rubric based on subsequent error analysis and inter-annotator agreement scores calculated on a 200-post subsample. The authors executed a total of three
iteration cycles. Figure 6.1 illustrates our thematic analysis process.
[Figure 6.1 flowchart. Nodes: Thematic Analysis Schema; Consult Addiction Specialist; Sample (n=1,000); Label Set #1 (n=600); Label Set #2 (n=600); Error Analysis; Final Schema.]
Figure 6.1: Thematic analysis process. Orange edges indicate the iterative component of the analysis.
Table 6.2 presents our final taxonomy, which was reviewed and approved by an addiction specialist,
along with label prevalence in our labeled training data set. Table 6.3 presents sample text from posts in
each category in the taxonomy.
6.6 Classifying Informational vs. Emotional Support
6.6.1 Training Dataset Annotation and Agreement
With the taxonomy finalized, two annotators each used it to label 600 of our 1,000 initiating post training data sample (§6.3.2). We annotated each post with its primary purpose using the most specific label
available. Inter-annotator agreement for specific purpose labels (Label in Table 6.2) was moderate, with
agreement of 67% and Cohen’s kappa [50] of 0.62. Inter-annotator agreement on the three broader categories informational, emotional and neither (Category in Table 6.2), however, was high, with agreement
of 87% and a Cohen’s kappa [50] of 0.78.
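For readers unfamiliar with the statistic, Cohen's kappa corrects raw agreement for the agreement expected by chance given each annotator's label distribution. A minimal Python sketch of the standard two-annotator formula follows; the function name and the handling of the degenerate single-label case are assumptions made here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

This explains why kappa sits below raw agreement in the figures above: part of the observed agreement is attributable to chance, more so when one label (such as update) dominates.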
Table 6.2: Annotator-derived taxonomy for users’ objectives in initiating a post, with % prevalence in the
1,000 post labeled sample. Note that (1) labels are mutually exclusive and (2) “w/d” stands for
“withdrawal”.
Category: informational, emotional, neither
Label                 %     Description
w/d expectations      11.8  Questions on what to expect when going through withdrawal, especially regarding symptom severity and duration.
w/d management        8.7   Questions about how to manage withdrawal and relieve symptoms.
w/d method            7.8   Soliciting advice on how best to quit drug(s) of choice. Topics include method of quitting (cold turkey vs. tapering) and scheduling a time to detox.
general information   8.5   Subject posts medical questions unrelated to withdrawal.
seek support          4.6   Specific requests for support (like keeping in thoughts, prayers, getting in touch).
give support          9.9   Primary purpose of the post is to offer encouragement to others, often via relating a personal story of overcoming addiction.
update                35.5  Posts that comprise a log-like report of the user’s current status. These are often highly detailed and contain no requests for feedback or support.
general guidance      5.0   Subject posts non-medical questions to the community. These often comprise advice for personal relationships and scenarios requiring moral judgement.
relapse concern       2.8   Subject is worried that she is going to relapse. While rare, these posts typically forecast relapse due to a required medical procedure that will require prescription pain medication. These posts varied in their information vs. support leanings, so we excluded them from either category.
n/a                   5.4   Impossible to speculate on the purpose of the post.
6.6.2 Classifier Training
To identify posts as either primarily informational or primarily emotional, we built a logistic regression classifier (which outperformed Support Vector Machine and Naive Bayes classifiers) using the Stanford CoreNLP toolkit5. For each post, we used the following features: the number of question sentences; content unigrams and bigrams; counts of positive and negative words with a polarity score ≥ 0.8 in SentiWordNet [19]; and the number of days clean, if stated. We determined the last feature by applying the patterns “X days/weeks/months clean” and “on day X” to the post text. A full feature list is documented in Appendix B.
5 http://nlp.stanford.edu/software/corenlp.shtml
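The days-clean feature described above can be sketched with two regular expressions. This is a minimal illustration only: the exact expressions, the function name, and the conversion of weeks and months to days (a month is approximated as 30 days) are assumptions, not the patterns actually used to build the classifier.

```python
import re

# Assumed conversion factors; a month is approximated as 30 days.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30}

# "X days/weeks/months clean"
CLEAN_RE = re.compile(r"\b(\d+)\s+(day|week|month)s?\s+clean\b", re.IGNORECASE)
# "on day X"
ON_DAY_RE = re.compile(r"\bon\s+day\s+(\d+)\b", re.IGNORECASE)

def days_clean(text):
    """Return the stated number of days clean, or None if the post states none."""
    match = CLEAN_RE.search(text)
    if match:
        return int(match.group(1)) * UNIT_DAYS[match.group(2).lower()]
    match = ON_DAY_RE.search(text)
    if match:
        return int(match.group(1))
    return None
```

For instance, under these assumptions `days_clean("I am 3 weeks clean today")` yields 21, while a post with no such statement yields `None`, which a classifier would need to encode as a missing value.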
Table 6.3: Descriptions and samples of taxonomy labels. Samples are synthesized in order to preserve user privacy.
Label | Description (+ Additional Notes) | Synthesized Sample
w/d expectations | What to expect while going through w/d. (Typically users will ask how long symptoms will last, whether the symptoms are normal etc.) | I stopped long term methadone 12 days ago. I was wondering if anyone knows how long the anxiety RLS and hot/cold last? The other symptoms rnt too bad...
w/d management | How to handle w/d symptoms. Implies w/d e. (Typically users will try to source ideas for alleviating pain, RLS etc.) | I’m wondering about the Amino Acid protocol and Thomas recipe. What would be the most important to take from day 1 to 4 during the worst W/D symptoms? I know I suffer the most with RLS and chemical chills [...]
w/d method | User seeks information on how to quit a substance. (Include questions like whether to go c/t vs. taper, requests for tapering schedules or advice etc.) | I am taking 5000mg of vicodin currently daily can anyone help me with this?
general information | User seeks informational advice that is not related to quitting/withdrawal. (Several possibilities, including questions about how much would it take to overdose etc.) | I’m curious as to how long people were addicted/dependent to their DOC. I know using for longer makes it harder to quit, and each time you quit WDs are harder than before. As for me, I had a 12 year run with vics/oxies.
seek support | User explicitly requests emotional support from the community. (Request for emotional support should be explicit. Typically users will ask for help or prayers or thoughts.) | For those of you who are prayer warriors, please could you pray for my friend, for recovery and protection. Could you also please pray for his family - they are in a very hard place right now. Thank you!
give support | User imparts a strong message of encouragement to the community. (Look for terms like “so I just wanted everyone to know that it’s possible and you can do it”) | Hey y’all! Well today the depression paid me visit but I kept it caged! Anxiety about 20% Did a 2.5 mile run and that helped tons. I can’t say it enough: exercise really helps withdrawals. If you can then DO IT! When the wds hit don’t crawl into bed - get up and move!
update | Update the community on the user’s status | The only reason I’m not getting more is the stress involved in getting them and setting up a supply because you can’t have just one. WD today are ok not too bad. It’s my neck that’s killing me and my body laughing at the Advil I took.
general guidance | Non-medical advice that doesn’t fall into any of the above categories. (Typical examples include questions of how to deal with telling spouses about addiction, whether to cut off a family member etc.) | Do any of you guys have experience with giving a husband an ultimatum? It seems simple: Get treated or you’re out. But with 3 young children it’s actually quite complicated. Help.
relapse concern | Often patients claiming to be clean but need a medical procedure that will require pain meds. | i had an accident yesterday that got me stuck in the emergency room. today i’m 21 days off my roxies [...]. i’m scared of going back because I know i’ll be given pain meds [...]
n/a | Impossible to determine | I’ve been away for few days and everything seems different. Anyway I hope everyone is doing great.
6.6.3
Classifier Performance
The final classifier performs well, achieving an accuracy of 80.98% in 10-fold cross validation versus
a baseline of 59.7% in which every post is labeled with the majority class. Table 6.4 shows precision,
recall, and F1 scores averaged over all 10 folds.
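As a reminder of what the reported quantities measure, the sketch below computes a majority-class baseline accuracy and per-label precision, recall and F1 from gold and predicted labels. The function names and the toy labels are hypothetical and serve only as an illustration; they are not our evaluation code or data.

```python
from collections import Counter

def majority_baseline_accuracy(gold):
    """Accuracy obtained by assigning every item the most frequent gold label."""
    if not gold:
        raise ValueError("gold must be non-empty")
    _, count = Counter(gold).most_common(1)[0]
    return count / len(gold)

def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one label treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example (hypothetical): 6 "support" posts and 4 "information" posts.
gold = ["support"] * 6 + ["information"] * 4
predicted = ["support"] * 5 + ["information"] * 4 + ["support"]
baseline = majority_baseline_accuracy(gold)
scores = precision_recall_f1(gold, predicted, positive="support")
```

In a k-fold setting, these scores would be computed on each held-out fold and then averaged, which is why an averaged F1 need not equal the harmonic mean of the averaged precision and recall.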
Table 6.4: Classifier performance for labeling initiating posts as seeking informational support or emotional support. Performance scores are averaged over 10 folds.
Label | Precision | Recall | F1
support | 84.57 | 83.40 | 83.84
information | 76.18 | 77.12 | 76.41
Average | 80.37 | 80.26 | 80.12
6.7
Classifying Updates vs. Non-updates
6.7.1
Classifier Training
To automatically label all posts as update or non-update, we again built a logistic regression classifier, using the same training and testing data set as in § 6.6.1. The non-update class contains every post that is labeled neither update nor n/a. We added two features to those used in our informational vs. emotional classifier (§ 6.6): whether the post mentions a number of days (using the pattern: “day” or “days” followed by a number), and the time elapsed (in days) since the user’s last initiating post. Table 6.2 shows that the ratio of update to non-update posts is roughly 1:3. To compensate for this class imbalance, during classifier training we randomly sub-sample non-update posts so that they number at most 1.5x the update posts. We do not change the test set.
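The sub-sampling step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, its parameters, and the fixed random seed are hypothetical, and the sketch is meant to be applied to the training portion of each fold only, leaving the test portion untouched.

```python
import random

def subsample_majority(examples, labels, minority="update", max_ratio=1.5, seed=0):
    """Randomly drop non-minority examples so they number at most max_ratio x minority."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    cap = int(max_ratio * len(minority_idx))
    if len(majority_idx) > cap:
        majority_idx = rng.sample(majority_idx, cap)
    keep = sorted(minority_idx + majority_idx)  # preserve the original ordering
    return [examples[i] for i in keep], [labels[i] for i in keep]

# Toy training fold (hypothetical): 10 update posts and 30 non-update posts.
train_labels = ["update"] * 10 + ["non-update"] * 30
train_posts = list(range(len(train_labels)))
sub_posts, sub_labels = subsample_majority(train_posts, train_labels)
```

Capping the ratio rather than fully balancing the classes keeps more of the non-update training data while still reducing the classifier’s bias towards the larger class.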
6.7.2
Classifier Performance
Our classifier achieves an accuracy of 78.40% compared to the majority-class baseline accuracy of
62.55% in 10-fold cross validation. Table 6.5 shows precision, recall and F1 scores.
Table 6.5: Classifier performance labeling posts as either update or non-update. Performance scores are averaged over 10 folds.
Label | Precision | Recall | F1
update | 72.15 | 69.29 | 70.09
non-update | 82.36 | 84.16 | 82.99
Average | 77.25 | 76.72 | 76.54
6.8
Results
Users post primarily on their own behalf:
In our sample, ∼85% of initiating posts were written by
the author on her own behalf, while only ∼8% were written on behalf of someone else. This differs from
reports by the Pew Research Center that find that ∼50% of online health inquiries are made on behalf
of another [90, 91]. It is possible that the stigmatized nature of addiction prevents users from disclosing
their situation to loved ones, who might otherwise ask questions on their behalf. Another possibility is
that the act of posting on Forum77 during the physically uncomfortable and painful process of withdrawal
is cathartic in and of itself: a benefit unavailable to proxy participants.
Informational and emotional support are the driving motivations for initiating discussion:
In congruence with prior work, our thematic analysis revealed that seeking informational and emotional
support drives user participation on Forum77. Applying our classifier to the entirety of the Forum77 data
set, we find that users seek both types of support in roughly equal proportion: 47% of all initiating posts
seek primarily informational support, while 53% of all initiating posts seek primarily emotional support.
This stands in contrast to our manually-annotated sample (Table 6.2) in which only 36.8% of initiating
posts are informational. Given that our machine-labeled sample comprises recurring Forum77 users,
one potential explanation for this is that longer-tenure or more involved users seek emotional support
more than users who post only a couple of times on the forum.
Informational posts seek explicit medical advice about withdrawal:
Users primarily seek knowledge on withdrawal methods, management and expectations in informational posts. Table 6.2 shows that in our sample, almost 75% of informational posts specifically discuss the topic of withdrawal. A casual analysis of informational posts also reveals that the type of information requested by users is often explicitly medical in nature, such as the pharmacological management of withdrawal. A prevalent example of this is Thomas’ Recipe, an opioid withdrawal tapering schedule that has evolved on Forum77 over time (§ 6.8.1).
Informational threads receive fewer responses, but have a longer lifespan:
Approximately 95%
of both informational and emotional initiating posts receive a response. Of these, initiating posts that primarily seek emotional support receive more responses than those seeking informational support (mean
8.7 vs. 7.4, median 6 vs. 5). The distributions are significantly different (Mann-Whitney-U test, n1 =
39,553, n2 = 38,954, U = 758,376,673, p < 0.001).
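For readers unfamiliar with the test, the following minimal sketch computes the Mann-Whitney U statistic using midranks for tied values, and a two-sided p-value from the normal approximation with a tie correction and no continuity correction. It is an illustration under those assumptions, not the analysis code used here; the function name and the toy response counts are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for sample x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i                  # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return u_x, p_value

# Toy response counts per thread (hypothetical values).
emotional_responses = [6, 9, 12, 7]
informational_responses = [5, 4, 8, 3]
u_stat, p_val = mann_whitney_u(emotional_responses, informational_responses)
```

A rank-based test is a natural choice for response counts because they are heavily skewed: a handful of very long threads would dominate any comparison of means.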
The “lifespan” of a discussion is the number of days between its initiating post and the last response on record. On average, initiating posts that seek primarily informational support have a lifespan roughly 2.5 times as long as those that seek primarily emotional support (mean 74.4 days vs. 27.6 days; median 0 days, i.e. < 24 hours, for both). The difference between the two distributions is statistically significant (Mann-Whitney-U test, n1 = 37,112, n2 = 41,395, U = 817,010,310, p < 0.001). Most discussions (56% of informational and 59% of emotional) have a lifespan of 0 days (< 24 hours). Excluding these, informational discussions still have the longer lifespans (mean 170.3 days vs. 68.8 days, median 2 days vs. 1 day).
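The lifespan measure, and the effect of excluding same-day discussions, can be illustrated with a short sketch. The function names and the thread dates below are hypothetical; the sketch only shows why a median of 0 days can coexist with a large mean when a few threads are revived long after they start.

```python
from datetime import date
from statistics import mean, median

def lifespan_days(initiated, last_response):
    """Days between a thread's initiating post and its last recorded response."""
    return (last_response - initiated).days

def summarize(lifespans):
    """Summarize lifespans overall and after excluding same-day (0-day) threads."""
    nonzero = [d for d in lifespans if d > 0]
    return {
        "mean": mean(lifespans),
        "median": median(lifespans),
        "share_zero": (len(lifespans) - len(nonzero)) / len(lifespans),
        "mean_nonzero": mean(nonzero) if nonzero else None,
        "median_nonzero": median(nonzero) if nonzero else None,
    }

# Hypothetical threads: (date of initiating post, date of last response).
threads = [
    (date(2010, 3, 1), date(2010, 3, 1)),
    (date(2010, 3, 2), date(2010, 3, 2)),
    (date(2010, 3, 5), date(2010, 3, 5)),
    (date(2010, 3, 7), date(2010, 3, 9)),
    (date(2010, 4, 1), date(2010, 7, 10)),
]
summary = summarize([lifespan_days(start, end) for start, end in threads])
```

In this toy sample, three of five threads end on the day they start, so the median lifespan is 0 days, while the single long-lived thread pulls the mean up to 20.4 days.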
Update posts are the most prevalent type of emotional post:
Our classifier identifies some 15,000
out of ∼55,000 (30%) initiating posts as updates. Update posts comprise a log-like status update of
the user’s current condition, and rarely explicitly request any sort of response from the community. For
example:
I was used to taking 8-10 5/325 oxycodones a day. Havent taken any of them since Friday
but I took one Oxy 40mg Sat and one on Sunday morning. Its been almost 24 hrs and not to
bad so far but im sure there is more to come.
Despite the lack of specific requests, update posts do indeed trigger a community response, as we
discuss in the next paragraph.
Update posts have more responses & more unique contributors, but shorter lifespans:
To further assess the role that update posts play, we compared several features of threads that were initiated by update vs. non-update posts. Update threads have a shorter average lifespan than other threads (mean = 10.8 days vs. 30.0, sd = 88.8 vs. 151.1; t435332 = -18.2, p < 0.001). It is possible that the personal nature of update posts makes them difficult to repurpose. Other differences are small: on average, threads initiated by update posts net slightly more responses (mean 7.19 vs. 6.65; t27230 = 7.2, p < 0.001) and slightly more unique contributors (mean 4.91 vs. 4.35; t27126 = 10.6, p < 0.001).
[Figure 6.2 (state diagram). Nodes: Update; Non-update. Edge labels: 55%, 9.7 days; 45%, 4.4 days; 71%, 22 days; 29%, 8.2 days.]
Figure 6.2: Normalized transition probabilities and average transition times between consecutive update and non-update posts.
Time elapsed between consecutive update posts is short:
Figure 6.2 shows users’ transition frequencies between initiating update and non-update posts, along with the average number of days between transitions. Users posting consecutive updates do so in comparatively quick succession, averaging 4.4 days between each update.
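Transition statistics of this kind can be derived from each user’s chronologically ordered initiating posts. The sketch below is a minimal illustration: the input format (a day number and a label per post), the function name, and the toy data are assumptions, not the representation used in our analysis.

```python
from collections import defaultdict

def transition_stats(posts_by_user):
    """Map (from_label, to_label) to (normalized probability, mean gap in days)."""
    counts = defaultdict(int)
    gap_totals = defaultdict(float)
    for posts in posts_by_user.values():
        ordered = sorted(posts)  # sort each user's posts by day number
        for (day_a, label_a), (day_b, label_b) in zip(ordered, ordered[1:]):
            counts[(label_a, label_b)] += 1
            gap_totals[(label_a, label_b)] += day_b - day_a
    # Normalize by the number of transitions leaving each source label.
    out_totals = defaultdict(int)
    for (src, _), c in counts.items():
        out_totals[src] += c
    return {
        pair: (c / out_totals[pair[0]], gap_totals[pair] / c)
        for pair, c in counts.items()
    }

# Toy data (hypothetical): user -> list of (day number, label) initiating posts.
posts_by_user = {
    "u1": [(0, "update"), (4, "update"), (14, "non-update")],
    "u2": [(0, "non-update"), (20, "non-update"), (28, "update")],
}
stats = transition_stats(posts_by_user)
```

Normalizing per source label means the probabilities on the edges leaving each node sum to one, matching the "normalized transition probabilities" described in the figure caption.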
6.8.1
Thomas’ Recipe: An Informal Collaboration
During our analysis, we noticed that users do not merely share explicit medical advice with one another: they test, evaluate, modify and re-share it. In other words, users informally collaborate on developing treatment protocols that are effective at assisting withdrawal. A prevalent example of this on Forum77 is Thomas’ Recipe.
Thomas’ Recipe6 is a detailed treatment protocol for medication-assisted opioid withdrawal management. It was written in the early 2000s7 by a Forum77 user who had years of experience detoxing from opioids, but no medical qualifications. Over the years, the original Thomas’ Recipe has evolved. Table 6.6 shows a version of Thomas’ Recipe from circa 2000, while Table 6.7 shows a version from circa 2006. While the core content remains, the newer version has a great deal more structure and formalization. Details of the recipe have also changed. For example, the older recipe recommends a 4000mg dose of L-Tyrosine, while the newer recipe suggests beginning with a 2000mg dose and scaling up as necessary.
6 http://www.medhelp.org/tags/health_page/45/Addiction/Thomas-Recipe-Re-Posted?hp_id=16
7 While our data set officially starts in 2007, it also contains some posts from as far back as 1999. We believe that this was either a pilot program or another forum that was acquired by MedHelp.
An informal assessment of iterations of Thomas’ Recipe on Forum77 suggests that these changes are a result of user testing and feedback. Users’ comments, too, suggest that over time, they have modified Thomas’ Recipe to make it more generally applicable and effective:
“I’m actually doing pretty good I’ve taken the Thomas recipe from day 1 but I’ve also added
Vitamin D, and niacin.”
“I have a modified Thomas Recipe that seems to have done wonders on my withdrawals if
anyone is interested. (No Xanax or Valium etc) Added Potassium pills, Ensure protein drinks
(since I cant eat anything solid yet).”
“If it helps any, I did a modified Thomas’ Recipe. I didn’t use any pharmaceuticals and added
some additional supplements (Magnesium, Potassium and Calcium for RLS and Melatonin
for sleep).”
Thomas’ Recipe is wildly popular on Forum77. Approximately 1.72% of all posts in our data set mention it directly. Moreover, it is not confined to MedHelp: these days Thomas’ Recipe is hosted on a number of addiction recovery sites8,9,10,11, and a Google search for “Thomas’ Recipe” brings up sponsored advertisements for opiate withdrawal remedies.
The recipe’s prevalence is likely testimony to the fact that it does genuinely assist the process of
opiate withdrawal. Forum77 users swear by its efficacy, calling it a “life saver”, a “god send”, and something that “works wonders”. To evaluate the efficacy of Thomas’ Recipe, we showed it to a psychiatrist
specializing in addiction. She noted that not only was the recipe very similar to a treatment she might
have recommended professionally, but also that it contained novel elements that would facilitate the
withdrawal process.
8 http://www.drugs.com/forum/featured-conditions/thomas-recipe-opiate-withdrawal-35169.html
9 http://www-personal.umich.edu/~timaster/biopsych/home.html
10 http://opiatewithdrawaltips.com/thomas-recipe
11 https://www.drugs-forum.com/forum/showthread.php?t=12568
6.9
Discussion
Forum77 serves as a valuable, user-generated repository of medical information pertaining to the process of addiction recovery. Moreover, this information is not static: it is curated, tested and modified. As we saw in the example of Thomas’ Recipe (§ 6.8.1), users actively collaborate on developing effective treatment protocols. The continual evolution of informational artifacts on Forum77 likely contributes to the significantly longer lifespans of informational discussions compared with emotional discussions. Another factor that we have observed lengthening the lifespan of informational discussions is that some users repurpose them, sometimes years after the initial post, to describe their own situation. In doing so, users may feel that they are not starting from scratch, that they have a ready-made description of their condition, or that they are leveraging work that the previous initiator put into finding other Forum77 members who could address their specific issue.
While users do explicitly seek emotional support on Forum77, most emotional posts are not explicit
requests, but rather, update posts. The prevalence of the update post suggests that users place value
in having a community bear witness to their struggle with addiction. The fact that update posts garner
slightly more responses on average than non-update posts shows, too, that responses are expected. It
is possible that users publicize update posts (rather than writing them, for example, in a private journal)
as a self-enforcement mechanism to help them progress with cessation. Qualitative evidence shows that
users feel a great deal of embarrassment and shame when a withdrawal attempt fails, and that failing
may even delay their return to the community.
In addition to having a community of witnesses, users derive utility from the process of documentation
itself. Authors find it valuable to reflect upon their past posts, which serve as reminders and evidence
of both accomplishments and regressions. For example, one user reflects on something that she was
scared to do:
I just found some old post about no desire for sex. Whew! I was so scared to ask the
question.
Another laments a relapse:
I cna’t believe I’m at 25 days when I was in the hundereds before. I’m so angry at myself for
relapsing and still keep beating myslef up!!
Readers, too, find others’ chronicles both informative and illustrative. This user mentions reading
through hundreds of old posts to glean insight into what his withdrawal will be like:
This is my first d/x and pray that it will be my last. I’ve read through tons of old posts and
they definitely help.
Another poster used narratives on Forum77 to help her husband prepare for the process of her
recovery:
i have showed him this site and let him read some of your stories, so he knows its not all
going to be plane sailing
6.9.1
Limitations and Future Work
The primary limitation to our work is our requirement that a post be labeled as either informational or
emotional. In our experience, while only one of these labels tends to be dominant in an initiating post,
Wang et al. [250] and Biyani et al. [28] do show that finer-grained labeling is possible at scale. Although
picking the dominant label was sufficient for examining our analysis questions, a more nuanced analysis
might benefit from more detail.
A natural avenue for future work is to analyze response posts in addition to initiating posts. While Wang et al. [250] utilize the same scales of emotional and informational support in scoring both initiating and response posts, our informal analyses of Forum77 response posts suggest that response categories would require an entirely new descriptive taxonomy. (For example, a fairly common response tactic that we observed, and that does not appear in Table 6.2, is the hijack: a user attempts to shift the focal attention of active thread participants away from the initiator and onto herself, usually by claiming identical circumstances to the initiator. This tactic often kills the thread.) Having derived this taxonomy, however, one could start to ask questions such as, “What is the most effective way of getting informational support?”, or “What types of initiating threads draw a diverse crowd of respondents?”.
6.10
Summary
We set out in this chapter to answer the question: “What do users seek on Forum77?”. We first motivated
our focus on the topic of addiction, noting that both its prevalence and stigma make it a potentially
rewarding focus of study (§ 6.1). We then presented related work on identifying types of support seeking
on online health forums (§ 6.2), and described the data samples used in this chapter (§ 6.3).
Through conducting a thematic analysis over a sample of initiating posts, we found that, in congruence with prior work, users seek both informational and emotional support on Forum77. Moreover, we discovered that the most prevalent form of emotional support seeking was to issue update posts: essentially status logs containing no explicit request for a community response (§ 6.5). With some feature engineering, we were able to train two binary statistical classifiers to distinguish emotional from informational posts (§ 6.6), and update from non-update posts (§ 6.7), with F1 scores of 80.12% and 76.54%, respectively. Applying our classifiers to the entire Forum77 data set, we then analyzed differences between these post categories (§ 6.8). We found, for example, that informational posts have a longer lifespan than emotional posts, and that while update posts make no explicit request for feedback, they garner more responses on average than non-update posts. We also analyzed Thomas’ Recipe (§ 6.8.1), an informational artifact of Forum77 that provides users with instructions for medication-assisted detoxification from opioids.
In conclusion, Forum77 provides two main services to users: first, it serves as a repository of information on opioid abuse that is generated, tested, and modified for improved efficacy by community
members. Second, it offers a space where the disclosure of personal progress (whether forward or
backward) can be witnessed by others and recorded for posterity. In Chapter 7 we turn our attention to
identifying which drugs Forum77 users abuse.
Table 6.6: Thomas’ Recipe (circa 2001)
THOMAS RECIPE
Here’s my tried-and-true do-it-yourself “cold turkey” detox protocol.
Supplies you’ll need first:
As many Valium, Xanax, Librium or Klonopin as you can get your hands on.
— first day off the opiate, use enough Valium or whatever, to, if possible, sleep through most of the first couple days. Then start
decreasing the dose until you’re down to nothing in about 5 or 6 days. You’ll have to do the math. The Valium or one of its sister
drugs will help tremendously with the anxiety and, somewhat, with the body aches. Valium may make you eat like a pig and, when
withdrawing from narcotics, one usually craves sweets, so I’d be ready to indulge myself, along with some good escapist movies.
That always worked for me.
Around-the-clock access to either hot baths or a Jacuzzi.
–speaking of those goddamn mostly thigh cramps that seem to love to show up in the middle of the night, have that hot bath or
Jacuzzi at the ready. Don’t hesitate to spend the majority of the week in that hot water if that’s what it takes to get you through
it. You may be wrinkled, but you’ll have your sanity. Don’t underestimate what the hot baths can do to relieve the withdrawal
discomfort. They really work. Heating pads between the thighs can help with those cramps, too, but not as much as the hot baths.
Brand-name-only Imodium (over the counter at the supermarket)
– if you’re a normal hydro addict, you’ll be getting the runs by no later than the second or third day off the lorcet. In my experience,
it’s an especially unpleasant variety. At the first impulse, take two or three and respond to returning urges with two tabs. It’s
important that you do it immediately.
L-Tyrosine (qty 50 of the 500mg caps) - an amino acid available at the health food store.
– chronic use of narcotics depletes the brain of several critical neurotransmitters responsible for well-being and mental performance
and attitude.
Plus: Bottle of 100 mg B6 caps
My experience detoxing with this stuff says take 4000 (four thousand) mg. (8x500mg caps of L-Tyrosine) with two 100mg B6 caps
every day for your “detox week” to provide your brain with the raw material it needs to replenish its stores of these neurotransmitters.
Many feel the difference on the very first dose. ***Take it on an empty stomach, either first thing in the morning or at bedtime. You
can continue this regimen after the first week if it continues to make you feel good. I continue to use it every other day with very
few exceptions. After a few weeks, I cut down on the dosage, though, as it can cause the runs at high doses.
Multi-vitamins (most junkies don’t eat too well, so this one’s just for good sense).
Take a look at this link. According to this doc, you also need to add copper, phosphorus and Vitamin C to replenish the dopamine,
and the norepinephrine. You might have to do some hunting at the health food store to find the right vitamin or vitamins to supply
all this stuff. I got a pretty good result from just the L-Tyrosine and B6, however.
I also understand from another contributor that zinc and magnesium help replenish and restore vital substances depleted by
narcotics use.
WARNING: This same site says to avoid L-Tyrosine if you’re on an SSRI (serotonin reuptake inhibitor) such as Prozac, etc.
Good luck.
Thomas
Sourced on 9-02-2014 from: http://www.medhelp.org/posts/Addiction-Substance-Abuse/How-Long-Untill-You-Are-Normal/show/43582
Table 6.7: Thomas’ Recipe (circa 2006)
THOMAS RECIPE
PLEASE NOTE: I am not a doctor, simply a long-time Rx opiate junkie who has had many opportunities to develop a way to
detox. This is a recipe for at-home self-detox from opiates based on my experience as well as that of many other addicts. It is not
intended as professional medical advice. It is always wise to make sure none of the recipe ingredients or procedures conflict with
medications you may be taking. Likewise, if you have any medical condition, disease, allergy or any other health issue, consult
your doctor before using the recipe.
Thanks, Thomas
If you can’t take time off to detox, I recommend you follow a taper regimen using your drug of choice or suitable alternate – the
slower the taper, the better.
For the Recipe, You’ll need:
1. Valium (or another benzodiazepine such as Klonopin, Librium, Ativan or Xanax). Of these, Valium and Klonopin are best
suited for tapering since they come in tablet form. Librium is also an excellent detox benzo, but comes in capsules, making
it hard to taper the dose. Ativan or Xanax should only be used if you can’t get one of the others.
2. Imodium (over the counter, any drug or grocery store).
3. L-Tyrosine (500 mg caps) from the health food store.
4. Strong wide-spectrum mineral supplement with at least 100% RDA of Zinc, Phosphorus, Copper, Magnesium and Potassium (you may not find the potassium in the same supplement).
5. Vitamin B6 caps.
6. Access to hot baths or a Jacuzzi (or hot showers if that’s all that’s available).
How to use the recipe:
• Start the vitamin/mineral supplement right away (or the first day you can keep it down), preferably with food. Potassium
early in the detox is important to help relieve RLS (Restless Leg Syndrome). Bananas are a good source of potassium if
you can’t find a supplement for it.
• Begin your detox with regular doses of Valium (or alternate benzo). Start with a dose high enough to produce sleep. Before
you use any benzo, make sure you’re aware of how often it can be safely taken. Different benzos have different dosing
schedules. Taper your Valium dosage down after each day. The goal is to get through day 4, after which the worst WD
symptoms will subside. You shouldn’t need the Valium after day 4 or 5.
• During detox, hit the hot bath or Jacuzzi as often as you need to for muscle aches. Don’t underestimate the effectiveness
of hot soaks. Spend the entire time, if necessary, in a hot bath. This simple method will alleviate what is for many the worst
opiate WD symptom.
• Use the Imodium aggressively to stop the runs. Take as much as you need, as often as you need it. Don’t take it, however,
if you don’t need it.
• At the end of the fourth day, you should be waking up from the Valium and experiencing the beginnings of the opiate WD
malaise. Upon rising (empty stomach), take the L-Tyrosine. Try 2000mgs, and scale up or down, depending on how you
feel. You can take up to 4,000mgs. Take the L-Tyrosine with B6 to help absorption. Wait about one hour before eating
breakfast. The L-Tyrosine will give you a surge of physical and mental energy that will help counteract the malaise. You
may continue to take it each morning for as long as it helps. If you find it gives you the “coffee jitters,” consider lowering
the dosage or discontinuing it altogether. Occasionally, L-Tyrosine can cause the runs. Unlike the runs from opiate WD,
however, this effect of L-Tyrosine is mild and normally does not return after the first hour. Lowering the dosage may help.
• Continue to take the vitamin/mineral supplement with breakfast.
• As soon as you can force yourself to, get some mild exercise such as walking, cycling, swimming, etc. This will be hard at
first, but will make you feel considerably better.
—Thomas
Sourced on 9-02-2014 from: http://www.drugs.com/forum/featured-conditions/thomas-recipe-opiate-withdrawal-35169.html
Chapter 7
Identifying Drugs of Choice
Monitoring drug use at a population level is crucial for observing, managing, and responding to substance
abuse-related issues, such as the emergence of new “designer drugs”, or the existence of particularly
vulnerable populations. Drug use trends could also be useful for exploring more theoretical aspects of
addiction, such as the Gateway Hypothesis [139], which proposes that drug use follows a progressive
and hierarchical sequence in which the user begins with legal addictive substances (e.g. alcohol and
cigarettes), before progressing onto marijuana and, finally, illicit substances.
The stigmatized [174,176,187] and often illegal nature of substance abuse, however, can make such data collection difficult. Existing substance abuse surveillance efforts are restricted to convenient populations: schools (Monitoring the Future1), hospital emergency room visits (Drug Abuse Warning Network2), state-run treatment facilities (Treatment Episodes Dataset3), and in-person mutual help groups (Narcotics Anonymous4). However, as membership in each of these populations can be compelled, these surveys, while large-scale and thorough, fail to capture a more representative sample of drug users.
Despite the fact that millions of people voluntarily participate in online health communities for substance use disorders, almost no prior work attempts to derive drug usage data from PAT. Our goal in this
chapter is to profile substance use in the Forum77 population, and to compare this against traditionally
surveyed drug-using populations. We begin by developing a method for automatically identifying Forum77 users’ drugs of choice (DOCs) from their initiating posts (§ 7.3). As this task is context-sensitive,
we build on lessons learned in Chapter 5 and train a conditional random field (CRF) classifier that identifies DOCs with F1, Precision and Recall scores of 84.65%, 91.12% and 79.46%, respectively. Next, we manually develop a map for resolving equivalent entities (e.g. Vicodin and Hydrocodone) extracted by our classifier, and for mapping these to classes.
1 http://www.monitoringthefuture.org
2 http://www.samhsa.gov/data/dawn.aspx
3 http://wwwdasis.samhsa.gov/webt/information.htm
4 http://www.na.org/?ID=PR-index
Applying our classifier to the entire Forum77 data set, we develop a profile of substance use in
the Forum77 population. We contrast this with survey data on the face-to-face peer recovery group
Narcotics Anonymous (NA), as well as survey data on individuals who present to addiction treatment
centers (TEDs) and emergency rooms (DAWN) (§ 7.4). After normalizing each data set for comparison
(§ 7.4), we present both comparative results as well as substance use trends on Forum77 over time
(§ 7.5). Compared to other measured drug-using populations, prescription opioid use is highly prevalent
in Forum77, while use of more traditionally-abused substances (e.g. alcohol, marijuana and cocaine)
is notably scarce. Over time, opioid replacement therapy drugs have become increasingly prevalent on
Forum77, while use of other prescription opioids has declined. We discuss possible explanations for and
implications of these results (§ 7.6) before concluding (§ 7.7).
7.1 Related Work
Two branches of prior work apply to this chapter: the primary one is syndromic surveillance, which
is concerned with the use of health-related data for the purpose of detecting, analyzing and
monitoring potential disease outbreaks [128]. We discuss syndromic surveillance in depth in § 3.2. The
second is work related specifically to observing substance abuse trends in online data, which we discuss
below.
Surprisingly little work attempts to survey substance use via online data, although the potential for
doing so has been recognized [44, 113]. In August 2014, the National Institute on Drug Abuse (NIDA)⁵
announced the funding of a 5-year initiative to build a substance abuse surveillance system using web
data [113]. A related system, the “Psychonaut Web Mapping Project”, already exists in Europe,
and has demonstrated an ability to give timely and accurate information related to the outbreak of novel
drugs [73]. The project aggregates data scraped from myriad sites, including discussion forums, online
stores, and Google search queries, the latter of which have also been shown to correlate with demand for
specific substances [65]. This is unsurprising, given that the Internet plays host to a highly competitive
market for illicit substances [54, 244]. Dasgupta et al. [66] were even able to show that black market
prices for prescription opioids can be accurately assessed via crowdsourcing. Although sparse, this
5 http://www.drugabuse.gov
prior work supports the supposition that PAT is a promising data source for extracting substance use
data.
7.2 Datasets
Users typically offer information about the substance(s) they are using in initiating posts, in which they set
the tone and topic of discussion, and disclose the issue for which they are seeking help. As respondents
may or may not offer similar information about themselves, we restrict our analysis to Forum77’s initiating
posts, of which there are 78,507 authored by a total of 28,005 unique users.
Training & Testing Dataset
Our classifiers require labeled data for training. As we felt that our familiarity with the data set would expedite labeling and reduce errors, we use 500 posts from the 1,000
initiating-post sample described in § 6.3.2. For completeness, we re-specify our sampling methodology
from § 6.3.2 here: first, we curated a sample of initiating posts from recurring Forum77 users by randomly sampling 200 users who had initiated 5 or more posts. Our 200 sampled users authored ∼32,000
initiating posts, of which we took a random sample of 1,000 for subsequent coding. To prevent any user
from dominating the sample, we admitted no more than 30 posts per user.
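The sampling procedure above can be sketched as follows. This is a minimal illustration rather than the code used in this work: the function name `sample_initiating_posts`, the `(user_id, post_id)` input format, the fixed random seed, and the way the per-user cap is enforced are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_initiating_posts(posts, n_users=200, min_posts=5,
                            n_posts=1000, max_per_user=30, seed=0):
    """Sample posts from recurring users, capping each user's contribution.

    `posts` is an iterable of (user_id, post_id) pairs (hypothetical format).
    """
    rng = random.Random(seed)

    # Group initiating posts by author.
    by_user = defaultdict(list)
    for user_id, post_id in posts:
        by_user[user_id].append(post_id)

    # Recurring users: those who initiated at least `min_posts` posts.
    recurring = sorted(u for u, p in by_user.items() if len(p) >= min_posts)
    chosen_users = rng.sample(recurring, min(n_users, len(recurring)))

    # Pool the chosen users' posts and visit them in random order.
    pool = [(u, p) for u in chosen_users for p in by_user[u]]
    rng.shuffle(pool)

    # Accept posts until the target size is reached, skipping users who have
    # hit the cap. The text does not state how the cap was enforced; this
    # rejection scheme is one plausible reading.
    taken = defaultdict(int)
    sample = []
    for user_id, post_id in pool:
        if len(sample) == n_posts:
            break
        if taken[user_id] < max_per_user:
            taken[user_id] += 1
            sample.append((user_id, post_id))
    return sample
```

With the defaults, the call mirrors the numbers in the text: 200 users with 5 or more initiating posts, a 1,000-post sample, and at most 30 posts per user.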
Analysis Dataset
We conduct our final analysis on all of Forum77’s initiating posts (78,507 posts
authored by some 28,005 unique users).
7.3 Automatically Identifying Drugs of Choice
In this section, we describe how we automatically identify DOCs from Forum77 initiating posts. After
defining the term drug of choice, we manually annotate our training & testing data set. Next, we train
a CRF classifier to automatically identify drugs of choice in Forum77 initiating posts. Finally, we resolve
the extracted DOC entities to specific categories to facilitate analysis and comparison.
7.3.1 Definition of Drug of Choice
In the context of Forum77 data, we define a drug of choice (DOC) as any substance that the user
indicates that she is, or was, addicted to. Such indications can be direct (e.g. “I am addicted to
percs/patches”) or implied (e.g. “I need to get off 32mgs subox”). We also include as DOCs phrases that
unequivocally imply a misused substance (e.g. “chasing the dragon” implies opium, “blazing” implies
marijuana), although we found such occurrences to be rare.
Identifying DOCs in Forum77 text is a context sensitive task: whether a substance plays the role
of treatment or addiction depends on the user. Methadone and buprenorphine, opioids used in opioid
replacement therapy, are common examples. Valium, which is both an addictive benzodiazepine and an
ingredient in Thomas’ Recipe for aiding opioid withdrawal (§ 6.8.1), is another.
7.3.2 Data Annotation
Using the definition above, two authors each labeled DOCs in 300 of the 500 posts in our sample. Inter-rater agreement, calculated on the 100 overlapping posts, was high, with a Cohen’s kappa [50] of 0.84.
Of the 500 posts in the sample, 276 (∼55%) contained DOC mentions.
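For readers unfamiliar with the agreement statistic, the sketch below shows how Cohen’s kappa is computed from two raters’ labels. It is an illustration only: the unit of annotation (one binary label per post) and the toy labels are assumptions, not the annotation data from this study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy data (not from this study): 1 = post mentions a DOC, 0 = it does not.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(rater_1, rater_2), 2))  # prints 0.8
```

In the toy data the raters agree on 9 of 10 posts (observed agreement 0.9) while chance agreement is 0.5, giving a kappa of 0.8.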
7.3.3 Classifier Training & Evaluation
As discussed in § 5.5.1, conditional random field (CRF) models are particularly well suited to identifying
specific entities in text [151]. CRFs are also context sensitive. For example, a CRF could leverage other
words in a sentence to determine whether a term like methadone refers to a substance being abused
vs. a substance being used as a treatment. This, in addition to the fact that prior work has successfully
utilized CRF models to identify a variety of medical terms [159, 222], makes it an appropriate choice for
the challenge of identifying DOCs in text.
We trained a CRF to automatically identify DOCs mentioned in initiating posts on our labeled training
and testing data set. For training, we exclude annotations of general drug terms such as pills, meds and
drugs. As we observed in our work on ADEPT in Chapter 5, generic terms are uninformative as well as a
significant source of classifier error [201]. For full documentation of classifier features, see Appendix C.
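To make the notion of context sensitivity concrete, the sketch below builds per-token feature dictionaries that include neighboring words. The features shown are hypothetical stand-ins chosen for illustration (the features actually used are documented in Appendix C), and the BIO-style tags are likewise an assumed encoding.

```python
def token_features(tokens, i, window=2):
    """Features for tokens[i], including its neighbors within `window`."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:].lower(),
        "word.has_digit": any(ch.isdigit() for ch in word),
    }
    # Context features: nearby words let a sequence model separate, e.g.,
    # "addicted to methadone" from "prescribed methadone to help".
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}].lower"] = tokens[j].lower()
        else:
            feats[f"word[{offset:+d}].pad"] = True  # position is off the sentence edge
    return feats


def sentence_features(tokens):
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


tokens = "I need to get off 32mgs subox".split()
# Hypothetical tags: B-DOC marks the first token of a drug of choice, O is "outside".
tags = ["O", "O", "O", "O", "O", "O", "B-DOC"]
features = sentence_features(tokens)
```

A list of feature dictionaries paired with a tag sequence of the same length is a common input shape for CRF toolkits; which toolkit was used here is not assumed.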
Results
Our CRF performs well at identifying DOCs from initiating posts. On 10-fold cross validation it achieves
an F1-score of 84.65%, and Precision and Recall scores of 91.12% and 79.46%, respectively. Table 7.1 shows a breakdown of performance across different types of terms. The CRF performs best
on drug terms that are both specific and correctly spelled (e.g. marijuana, oxycodone) and informal/morphological variations thereof (e.g. pot, oxides), and performs worst on generic drug terms (e.g.
stuff, pain pills). Table 7.2 illustrates the results of applying our DOC classifier to sample sentences,
and resolving these to drug categories as per § 7.3.4. Note the model’s sensitivity to context: in the first
sentence, suboxone is not extracted because it is being used as a treatment for the author’s addiction to
Vicodin.

Table 7.1: DOC classifier performance across term categories. The classifier performs best on correctly
spelled, specific drug terms; worst on general drug terms.

| Category | Examples | F1 score (%) | Precision (%) | Recall (%) |
| All terms | | 84.7 | 91.1 | 79.5 |
| Specific drug terms, spelled correctly (53.1% of all terms) | marijuana, ultram, phenobarbital, hydrocodone | 87.0 | 90.3 | 83.9 |
| Informal & morphological variations of drug terms | roxies, oxyz, subs, pot, vics, blues, hydros, smokes | 84.6 | 93.4 | 77.2 |
| General drug terms | pain pills, painkillers, powder, stuff, substances | 79.7 | 94.0 | 69.2 |

Table 7.2: Examples of DOCs extracted by our CRF classifier. Identified DOC terms are shown in bold
in the context of their originating sentence, and the resolved drug name, generic name and category are
shown on the right.

| Sentence | Resolved Drug | Resolved Generic | Resolved Category |
| My doc prescribed suboxone on Sunday to help quitting from vicodin. | Vicodin | hydrocodone | opioid |
| I need help. I am on vic for the last 20 years. | Vicodin | hydrocodone | opioid |
| She began with meth months ago and now is using coke. | cocaine; methamphetamine | cocaine; methamphetamine | cocaine; stimulant |
| As for myself, it was a 7 year run with percs/patches. | Percocet | oxycodone | opioid |
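As a quick consistency check on Table 7.1, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values in the table. The recomputed values agree with the reported ones to within about 0.2 points; the small gaps would be consistent with the reported scores being averaged per cross-validation fold, though that is an assumption rather than something stated in the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) as given in Table 7.1.
table_7_1 = {
    "All terms": (91.1, 79.5, 84.7),
    "Specific, spelled correctly": (90.3, 83.9, 87.0),
    "Informal & morphological": (93.4, 77.2, 84.6),
    "General drug terms": (94.0, 69.2, 79.7),
}

for category, (p, r, reported) in table_7_1.items():
    print(f"{category}: computed {f1_score(p, r):.1f}, reported {reported}")
```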
7.3.4 Drug Term Resolution
The DOC terms extracted by our classifier vary widely in terms of spelling (we saw 58 variations on
Vicodin alone) and specificity (users refer to drugs with brand, generic and even class names). For
example, somebody might refer to Suboxone as buprenorphine, or even just as an opiate. Resolving
related drug terms to common entities is necessary for analysis and comparison.
Table 7.3: Summary of similarities and differences between our Forum77, NA, TEDS and DAWN
datasets. Forum77 is unique in that participation is always voluntary and that users report only substances that they deem relevant.

| | Forum77 | NA | TEDS | DAWN |
| Population size | 19,634 | 8,837 | 1,844,720 | 131,698 |
| Time in which data were generated | 2007-2011 | 2011 | 2011 | 2011 |
| Data self-reported? | Yes | Yes | Yes | Yes |
| Duplicate users in dataset possible? | Yes | Yes | Yes | Yes |
| Survey population membership voluntary? | Yes | Not always | Not always | Not always |
| Users can report multiple substances | Yes | Yes | Yes | Yes |
| Substances reported | only those which user perceives as relevant | All | All | All |
To resolve drug names, we compiled a list mapping misspellings in our data set to a single drug
name (either brand or generic). We then mapped all brand names to their respective generic names,
and finally, categorized each substance into a general class (Table C.1). We ultimately resolved ∼1,200
terms to 90 entities in 10 drug classes (see Appendix C).
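The three-stage resolution described above (surface form to drug name, brand to generic, generic to class) can be sketched with plain lookup tables. The entries below are a handful of illustrative examples taken from Table 7.2 and the term examples in this chapter; the table names and the `resolve` function are hypothetical, and the full mapping is the one in Appendix C.

```python
# Illustrative entries only; the full mapping (about 1,200 terms) is in Appendix C.
SURFACE_TO_DRUG = {
    "vicodin": "Vicodin", "vic": "Vicodin", "vics": "Vicodin",
    "percocet": "Percocet", "percs": "Percocet",
    "meth": "methamphetamine",
    "coke": "cocaine",
}
BRAND_TO_GENERIC = {
    "Vicodin": "hydrocodone",
    "Percocet": "oxycodone",
}
GENERIC_TO_CLASS = {
    "hydrocodone": "opioid",
    "oxycodone": "opioid",
    "methamphetamine": "stimulant",
    "cocaine": "cocaine",
}


def resolve(term):
    """Return (drug, generic, drug_class) for an extracted term, or None."""
    drug = SURFACE_TO_DRUG.get(term.strip().lower())
    if drug is None:
        return None  # unknown surface form: leave unresolved rather than guess
    generic = BRAND_TO_GENERIC.get(drug, drug)  # generic names map to themselves
    return drug, generic, GENERIC_TO_CLASS.get(generic)


print(resolve("Vic"))   # ('Vicodin', 'hydrocodone', 'opioid')
print(resolve("coke"))  # ('cocaine', 'cocaine', 'cocaine')
```

Keeping the three stages as separate tables means a new misspelling needs only one new entry in the first table.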
7.4 Comparing Real-World DOC Distributions
We compare our results to survey data on the face-to-face peer recovery group Narcotics Anonymous
(NA), as well as survey data on individuals who present to addiction treatment centers (TEDS) and emergency rooms (DAWN). We use the 2011 (most recently available) reports for each of these surveys, and
compare results to the Forum77 data set spanning 2007-2011. We include multiple years of Forum77
data as we find that the DOC distributions in the Forum77 population vary only slightly over time. Below,
we describe how we process each data set, and summarize key similarities and differences between
them (Table 7.3). The final alignment of categories for cross-survey comparison is described
in Table 7.4.
Table 7.4: Alignment of categories across the Forum77, NA, TEDS and DAWN datasets for comparative
purposes. Exact category terms from each survey have been preserved in this table for replicability.

| Forum77 | NA | TEDS | DAWN |
| Alcohol | Alcohol | Alcohol | Alcohol |
| Cocaine | Cocaine, Crack | Cocaine/Crack | Cocaine |
| Hallucinogens | Hallucinogens (LSD, PCP) | PCP; Other Hallucinogens | LSD; PCP; Misc. hallucinogens |
| Heroin | Opiates (heroin, morphine) | Heroin | Heroin |
| Inhalants | Inhalants (glue, Nitrous Oxide) | Inhalants | Inhalants |
| Marijuana | Cannabis (pot, hashish) | Marijuana/Hashish | Marijuana; Synthetic cannabinoids |
| Methadone and Suboxone | Methadone/Buprenorphine | Methadone (non-RX) | Methadone/Buprenorphine |
| Opioids | Opioids (Oxycodone, Vicodin, Fentanyl) | Opiates/Synthetics | Opiates/Opioids |
| Stimulants | Ecstasy; Stimulants (speed, crystal meth) | Methamphetamine; Other Amphetamines; Other Stimulants | Amphetamines; Amphetamine-dextroamphetamine; GHB; MDMA; Methamphetamine; Methyphenidate |
| Sedatives | Tranquilizers (Klonopin, Valium, Xanax) | Barbituates; Benzodiazepines; Non-Barbituate sedatives; Other non-benzodiazepine tranquilizers | Barbiturates; Benzodiazepines; Ketamine; Misc. anxiolytics sedatives and hypnotics |
7.4.1 Forum77
Our classifier identifies DOCs for 19,634 (70%) of the 28,005 users who initiated discussions on Forum77, corresponding to ∼50% of the 78,507 initiating posts analyzed. This corroborates our observation that ∼55% of the posts in our 500-post training and testing sample contained DOC mentions. To
acquire a distribution of DOCs in the Forum77 population, we count, for each drug category (see Table 7.4) the number of unique users who abused a drug in that category. We then normalize the counts
by the DOC-identifiable population size.
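The counting and normalization step can be sketched as below. This is an illustration under assumptions: the function name `doc_prevalence` and the `(user_id, category)` input format are hypothetical, and the toy mentions are not Forum77 data.

```python
from collections import defaultdict


def doc_prevalence(user_categories):
    """Fraction of DOC-identifiable users with at least one DOC per category.

    `user_categories` is an iterable of (user_id, category) pairs, one per
    extracted and resolved DOC mention (hypothetical input format).
    """
    users_by_category = defaultdict(set)
    identifiable = set()
    for user_id, category in user_categories:
        users_by_category[category].add(user_id)  # sets de-duplicate repeat mentions
        identifiable.add(user_id)

    n = len(identifiable)
    if n == 0:
        return {}
    return {c: len(users) / n for c, users in users_by_category.items()}


# Toy mentions (not Forum77 data).
mentions = [("u1", "Opioids"), ("u1", "Opioids"), ("u1", "Alcohol"),
            ("u2", "Opioids"), ("u3", "Heroin"), ("u4", "Opioids")]
prevalence = doc_prevalence(mentions)
print(prevalence)
```

Because a user can report several substances, the per-category fractions can sum to more than 1, as they do in the toy data (0.75 + 0.25 + 0.25).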
7.4.2 Narcotics Anonymous
Narcotics Anonymous (NA) conducts an annual membership survey in which respondents identify both
main drugs used as well as any other drugs used on a regular basis [2]. Responses are identified using a
checklist of drug categories (Table 7.4). As the results are published only in aggregate form, we acquired
the raw data from NA for the online component of the survey for analysis. After omitting entries that either had a
0-second response time or came from users who declined to answer the drug-related questions, 8,837
respondents remained.
Categorizing heroin in the NA survey data:
While both DAWN and TEDS have a separate category
for heroin, NA groups heroin into the category “Opiates (heroin, morphine, etc.)”. To align the NA data
set with DAWN and TEDS, we classify “Opiates (heroin, morphine, etc.)” as “Heroin”, based on the
assumption that most users in this category are using heroin rather than morphine or other opiates.
7.4.3 TEDS
The Treatment Episode Dataset (TEDS) is an annual survey detailing people’s self-reported drug use upon
admission to state and national rehabilitation facilities [241]. There is no need to process this data set
further, and we report results directly from the TEDS 2011 survey (1,844,720 respondents).
7.4.4 DAWN
The Drug Abuse Warning Network (DAWN) is a nationally representative public health surveillance system that monitors drug-related emergency department visits to hospitals. The survey records up to 22
drugs related to an emergency room visit [231]. We considered only DAWN data set instances corresponding to drug misuse (131,698 instances). As 95.5% of the users in this population mention at
most three drugs, we consider only the first three substances mentioned. From these, we filter out substances that are common but not typically abused, such as insulin. Finally, we map the remaining drugs
to categories using the DAWN Drug Reference Vocabulary⁶.
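The per-visit processing just described (truncate to the first three drugs, drop substances that are not typically abused, then map to categories) can be sketched as follows. The function name, the toy lookup standing in for the DAWN Drug Reference Vocabulary, and the example visit are assumptions made for illustration.

```python
def dawn_categories(visit_drugs, drug_to_category, not_abused, max_drugs=3):
    """Map one DAWN visit's drug list to a set of comparison categories.

    visit_drugs      -- drugs in the order recorded for the visit
    drug_to_category -- lookup standing in for the DAWN Drug Reference Vocabulary
    not_abused       -- drugs to drop (common but not typically abused)
    """
    considered = visit_drugs[:max_drugs]  # keep only the first three mentions
    kept = [d for d in considered if d not in not_abused]
    return {drug_to_category[d] for d in kept if d in drug_to_category}


# Toy lookup and visit (hypothetical; not taken from the DAWN files).
toy_lookup = {"heroin": "Heroin", "alprazolam": "Sedatives", "cocaine": "Cocaine"}
toy_not_abused = {"insulin"}
visit = ["heroin", "insulin", "alprazolam", "cocaine"]
print(sorted(dawn_categories(visit, toy_lookup, toy_not_abused)))  # ['Heroin', 'Sedatives']
```

Note the order of operations: truncation happens before filtering, so in the toy visit cocaine (the fourth drug) is not considered even though insulin is later dropped.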
6 Available at http://www.samhsa.gov/data/dawn.aspx
Figure 7.1: Drug of choice distributions (% of population using) across the Forum77, TEDS, NA and
DAWN data sets. [Bar charts, one panel each for Forum77, TEDS (2011), NA (2011) and DAWN (2011); rows: Opioids, Suboxone, Sedatives, Alcohol, Cocaine, Heroin, Marijuana, Stimulants, Hallucinogens, Inhalants; horizontal axis from 0% to 75%.]
7.5 Results
Forum77 users struggle with opioid addiction at much higher rates than other surveyed populations of drug users:
Figure 7.1 shows substance usage distributions across the Forum77, TEDS,
NA and DAWN surveys. In the Forum77 population, prescription opioids, utilized by ∼70% of users, are by far the most
prevalent DOC, followed by the opioid replacement therapy opioids methadone and Suboxone (25%). This
is more than double the population prevalence reported in any of the other three surveys.
Relatively few Forum77 users mention struggling with traditionally abused drugs:
Alcohol, marijuana and cocaine are the three most prevalent DOCs in the NA, TEDS and DAWN populations (Figure 7.1). However, these three substances are conspicuously scarce in the Forum77 population. For
example, alcohol is reportedly abused by approximately 80%, 55% and 37% of the NA, TEDS and DAWN
populations, respectively, but only by 10% of Forum77 users.
After peaking in 2008, the Forum77 population slowly declines:
Figure 7.2(a) shows the number of
active monthly users by DOC on Forum77. In February 2008, ∼180 unique hydrocodone users initiated
a discussion on Forum77. In contrast, the corresponding number of users for February 2014 is ∼60.
The decline in population of hydrocodone and oxycodone users is steeper than that of other DOCs. To
analyze DOC prevalence over time accounting for population decline, we normalize by population size
(Figures 7.2(b) and 7.2(c)).
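The normalization can be illustrated with a small sketch that divides, for each month, the number of unique users reporting a drug by the number of unique DOC-identifiable users active that month. Using the monthly (rather than overall) population as the denominator is an assumption, as are the function name, the input format and the toy records.

```python
from collections import defaultdict


def monthly_prevalence(records):
    """Per-month share of DOC-identifiable users reporting each drug.

    `records` is an iterable of (month, user_id, drug) tuples, e.g.
    ("2008-02", "u17", "hydrocodone") (hypothetical input format).
    """
    users_by_month = defaultdict(set)
    users_by_month_drug = defaultdict(set)
    for month, user_id, drug in records:
        users_by_month[month].add(user_id)
        users_by_month_drug[(month, drug)].add(user_id)

    # Each (month, drug) key exists only if that month has at least one user,
    # so the denominator is never zero.
    return {
        (month, drug): len(users) / len(users_by_month[month])
        for (month, drug), users in users_by_month_drug.items()
    }


# Toy records (not Forum77 data).
records = [
    ("2008-02", "u1", "hydrocodone"), ("2008-02", "u1", "hydrocodone"),
    ("2008-02", "u2", "oxycodone"), ("2008-02", "u3", "hydrocodone"),
    ("2008-02", "u3", "heroin"), ("2014-02", "u4", "heroin"),
]
prev = monthly_prevalence(records)
```

Normalizing this way separates a change in a drug's share from a change in overall forum activity, which is the distinction the raw counts cannot make.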
Hydrocodone and oxycodone are the most prevalent DOCs on Forum77, but this prevalence declines over time: Figure 7.2(b) shows the prevalence of the six most common opioids in the Forum77 population over time. Locally weighted smoothing (LOESS [48]) is used to fit lines to each series, and 95% confidence intervals for each fit are shown. In 2007, hydrocodone and oxycodone are utilized by approximately 45% and 33% of the population, respectively. By 2011, they each have a prevalence of approximately 30%, which declines to about 27% (hydrocodone) and 26% (oxycodone) by 2014.
[Figure 7.2: three line charts over 2007-2014. Panels (a) and (b) show series for hydrocodone, oxycodone, suboxone, methadone, tramadol and heroin; panel (c) shows series for Rx opioids, ORT opioids and heroin.]
(a) Number of unique monthly users for the 5 most prevalent opioids in Forum77 from 2007-2014. Vertical axis: number of monthly users by SOA.
(b) Unique monthly users for the 5 most prevalent opioids from Forum77 as a percentage of the population. LOESS [48] fit lines with 95% confidence intervals indicate trends. Vertical axis: percentage of drug-identifiable population using (%).
(c) Unique monthly users of opioid replacement therapy (ORT) opioids, other prescription opioids and heroin as a proportion of the Forum77 population. LOESS fit lines with 95% confidence intervals indicate trends. Vertical axis: percentage of drug-identifiable population using (%).
Figure 7.2: Prevalence of major opioids in the Forum77 population over time.
Opioid replacement therapy (ORT) opioids methadone and buprenorphine increase in prevalence
over time:
Figure 7.2(c) aggregates the data shown in Figure 7.2(b), showing the prevalence of ORT
opioids (methadone and buprenorphine), other prescription opioids (e.g. oxycodone and hydrocodone),
and heroin in the Forum77 population over time. While prescription opioids remain the most prevalent DOCs, this prevalence declines from about 70% to 56% over time, while ORT opioid prevalence
increases from approximately 19% to 28%.
Heroin prevalence increases slightly in 2013:
On average, about 5% of Forum77 participants abuse
or misuse heroin until 2013, when the proportion of heroin users starts to increase noticeably, reaching
10% and still rising at the end of our data set (Figures 7.2(b) and 7.2(c)). Moreover,
Figure 7.2(a) shows a small absolute increase in heroin users from mid-2013 onwards, indicating that
the increase illustrated in Figures 7.2(b) and 7.2(c) is not purely an artifact of normalizing by a population
in which hydrocodone and oxycodone users are declining.
7.6 Discussion
Prescription opioids are the dominant DOC on Forum77, with a prevalence far exceeding
that measured in other drug-using populations. We suspect that this is the result of several factors.
First, users may be more receptive to seeking help anonymously online than to discussing the issue with a
health care provider, since the health care provider may be the unwitting source of the opioids in the first
place [249]. Second, despite a robust evidence base for the medical treatment of opioid addiction [230],
few physicians have training in such treatment [263] and the condition remains highly stigmatized within
the medical community [176, 187]. Third, the more traditional self-help venues for addiction support,
namely Alcoholics Anonymous and Narcotics Anonymous, require users to overcome the stigma associated
with attending such meetings. The fact that opioid use disorders tend not to stem from recreational drug
use, with which such venues are historically associated, likely enhances this stigma. Finally, prescription
painkiller overdoses are growing at a significantly faster rate in the female population [8]. This, combined
with the fact that women are more likely than men to seek help online for health issues [37, 57, 90–92,
165], could partially account for the high prevalence of prescription opioid users on Forum77.
The scarcity of alcohol, marijuana and cocaine, the three most prevalent drugs present in the NA,
TEDS and DAWN surveys, could suggest a low number of recreational drug users in the Forum77
population. Alternatively, it is possible that Forum77 users are using alcohol and marijuana, but do not
see this use as problematic and so do not mention it. As we note in Table 7.3, the Forum77 data set
is unique in that users mention DOCs at their own discretion, and are not encouraged to disclose all
substances that they might be abusing. It is also possible that users approach different communities for
these issues: MedHelp, for example, has a separate, albeit very small, forum dedicated to alcoholism⁷.
Temporal trends indicate an increase in prevalence of opioid replacement therapy (ORT) opioids and
heroin, and a corresponding decline in other prescription opioids. It is possible, perhaps even likely, that
these trends reflect real-world drug usage: Cicero et al. [46] report a recent increase in heroin usage due
to oxycodone being more difficult to acquire and tamper with. In addition, survey data report a steady
increase in national buprenorphine usage [232] over time, and a slight decrease in non-medical use of
prescription opioids in the younger population [242]. While non-medical use of prescription opioids has
increased in the population of users 50 and older [242], this demographic is less prevalent online [7].
However, drawing epidemiological conclusions from these data without further study into what other
factors might be influencing these trends is ill-advised.
7.6.1 Limitations & Future Work
While our work is the first to analyze drug usage trends in an online population, several challenges
remain. Foremost is extending similar analyses to a variety of online forums. Analyzing multiple data
sources would yield more comprehensive insights, and would also help to triangulate features in PAT
that are universally useful for monitoring substance abuse trends.
Finally, a difficult but necessary challenge is to investigate whether and how drug usage trends reflected in PAT align with those observed in the real world. As we discussed in Chapter 2, online health
seeking populations are not necessarily representative of real-world populations. As such, understanding the relationship between PAT-observed and real-world drug usage trends would be necessary prior
7 http://www.medhelp.org/forums/Alcoholism/show/158
to utilizing such data for monitoring and surveillance. In sum, however, our contributions in this chapter both propose a viable methodology for automatically identifying DOCs from PAT, and lend the first
data-driven insights into drug usage in an online community.
7.7 Summary
Our goal in this chapter was to profile substance use in Forum77, and compare this to substance use
reported in traditionally surveyed drug-using populations. The ability to monitor population-level drug use
trends is valuable. Despite the popularity and uniqueness of OHCs focused on the topic of substance
abuse, however, no work to date focuses on automatically identifying users’ drugs of choice (DOCs) from
PAT. As such, our contributions – a method for automatically extracting and resolving DOCs, as well as
insights on the Forum77 population acquired through the application of this method – are both novel and
useful.
To automatically extract a user’s DOCs from her Forum77 initiating posts, we used manually-labeled
data to train a CRF classifier (§ 7.3.2 and 7.3.3). We use a CRF classifier as the problem of identifying
DOCs is context sensitive: many commonly abused drugs are also used as legitimate treatments for
withdrawal. Our CRF classifier is highly accurate, achieving F1, Precision and Recall scores of 84.65%,
91.12% and 79.46%, respectively (§ 7.3.3). Finally, to facilitate analysis and comparison, we resolve
extracted entities (e.g. vics, benzos) to drugs (e.g. Vicodin, benzodiazepines), and drugs to categories
(e.g. opiates, sedatives) (§ 7.3.4).
To profile substance use on Forum77, we applied our method to the entire set of initiating posts
on Forum77 (78,507 posts authored by some 28,005 users), and compared our results to those from
three surveys: the Narcotics Anonymous annual membership survey, the Treatment Episode Dataset,
which surveys users in state-funded rehabilitation facilities, and the Drug Abuse Warning Network, which
collects data on substance abuse related admissions to emergency departments (§ 7.4). Our results
(§ 7.5) show that Forum77 users are disproportionately addicted to prescription opioids, while more
traditionally-abused substances, such as alcohol, marijuana and cocaine, are infrequently reported. Our
analyses of drug usage trends on Forum77 over time suggest that Forum77 may reflect real-world trends
in substance use.
Chapter 8
Quantifying Recovery and Relapse
8.1 Introduction
Despite the prevalence of online health forums for substance use disorders, we have little understanding
of the role that they play in the process of cessation. For example, when in the cycle of abuse are they
most helpful to users? As we noted in Chapter 7, most substance abuse data are collected at point-of-care facilities. As such, online health communities (OHCs) are uniquely poised to offer quantified
answers to questions that have previously been answered only anecdotally. For example, in a cohort
of people with substance use disorders attempting recovery, what percentage relapse? Of those who
recover, how long do these recovery periods tend to last?
Our goal in this chapter is to educe patterns of relapse and recovery as they manifest on Forum77.
We begin by describing the process of prescription drug abuse cessation and related prior work (§ 8.2),
and describing the data samples used in this chapter (§ 8.3). We then make the following contributions:
A quantified taxonomy of phases of addiction as expressed by users on Forum77 (§ 8.4). Our taxonomy, developed in concert with an addiction specialist, is based on Prochaska’s Transtheoretical Model
(TTM) of behavior change [203], and serves both as a labeling rubric for mapping text to phases of
addiction and as a quantified summary of phase-based activity on Forum77. We use the taxonomy
to manually label initiating post sequences from 191 Forum77 users (2,266 posts total) with the labels
USING, WITHDRAWING or RECOVERING. We find that Forum77 is most heavily utilized when users are
WITHDRAWING.
An analysis of activity and linguistic features across the phases of addiction (§ 8.5). We identify
features that are characteristic of each phase, and leverage them to train a conditional random field
(CRF) model to automatically label users’ phases of addiction over their tenure on Forum77. Our CRF
achieves an F1-score of 67.6% against a baseline F1-score of 20%. Using CRF-labeled sequences, we
are able to identify (1) whether a user relapsed at some point during their tenure, and (2) whether a user
was RECOVERING at the time of her final initiating post, with F1-scores of 78% and 82%, respectively.
An analysis of transition, relapse and recovery based on the CRF-labeled phase sequences of 2,848
Forum77 users (32,345 posts) (§ 8.6 and § 8.7). We find that overall, progressive transitions are more
prevalent than regressive transitions. Moreover, despite the fact that relapse is common (almost half
of users relapse at some point during their tenure), the chances of a user RECOVERING by her final
post are favorable. Finally, we observe a significant correlation between high forum engagement (both
frequency of participation and volume of response posts authored) during a user’s phases of USING and
WITHDRAWING and the probability that she is RECOVERING when she leaves Forum77.
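To illustrate how user-level outcomes can be read off a labeled phase sequence, the sketch below derives the two quantities mentioned above from a list of per-post labels. The operational definition of relapse used here (any return to USING after a WITHDRAWING or RECOVERING post) is an assumption made for the example and may differ from the definition used in this chapter; the function names and the toy sequence are likewise hypothetical.

```python
def relapsed(sequence):
    """True if the user returns to USING after WITHDRAWING or RECOVERING.

    This definition of relapse is an assumption made for the example.
    """
    return any(prev != "USING" and cur == "USING"
               for prev, cur in zip(sequence, sequence[1:]))


def recovering_at_exit(sequence):
    """True if the label of the user's final initiating post is RECOVERING."""
    return bool(sequence) and sequence[-1] == "RECOVERING"


# Toy sequence of per-post phase labels for one user (not Forum77 data).
phases = ["USING", "WITHDRAWING", "USING", "WITHDRAWING", "RECOVERING"]
print(relapsed(phases), recovering_at_exit(phases))  # True True
```

The toy user relapses once (WITHDRAWING followed by USING) and is nonetheless RECOVERING at the final post, the combination that the analysis above describes as common.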
We discuss our results in the context of Forum77’s efficacy as a withdrawal aid, implications for
future forum design, and implications for addiction research (§ 8.8) before concluding (§ 8.9).
8.2 Background
To our knowledge, our work is the first to investigate the topic of prescription drug abuse cessation in
social media. Given the secretive and stigmatized nature of this condition [174, 176, 187], our contribution provides a unique and often overlooked perspective on prescription drug abuse: that of patients
themselves. In this section, we provide an overview of prescription drug abuse as well as the traditional,
in-person mutual help groups Alcoholics Anonymous (AA) and Narcotics Anonymous (NA). Next, we
present work that, like ours, attempts to infer a person’s health state from her social media contributions.
For a review of literature analyzing the efficacy of OHC participation, we refer the reader to § 2.2.4.
8.2.1 The Prescription Drug Abuse Cycle
Prescription drug abuse (or “nonmedical use”) is defined as “the use of a medication without a prescription, in a way other than prescribed, or for the experience or feelings elicited” [249]. Opioid pain relievers,
such as hydrocodone, oxycodone, morphine and codeine, are the most frequently abused prescription
medications [5]. In 2010, some 5.1 million Americans reported misusing prescription pain relievers in
the last month, followed by sedatives (2.6 million) and stimulants (1.1 million) [5].
Withdrawal
Withdrawal (or detoxification) is a painful process that is frequently compared to having a bad case of
influenza [6, 84]. Common withdrawal symptoms include agitation, anxiety, muscle aches, insomnia,
sweating, abdominal cramping, diarrhea, goose bumps, nausea and vomiting [6]. Typically, symptom
onset aligns with the first missed dose in the case of a “cold turkey” approach, or within a few days of dose
reduction in the case of a taper [84]. Symptom severity peaks within a few days of final exposure, and
gradually reduces as the user’s physical dependence on the drug weakens [84]. Withdrawal duration,
dependent on biological factors, drug and dosage levels, and withdrawal method, ranges broadly from
7-10 days (cold turkey) [102] to 20-35 days (methadone-assisted taper) [84].
Self-Detoxification
Research on easing the withdrawal process focuses primarily on medication-assisted detoxification overseen by a medical professional, with almost no work on the subject of self-detoxification. We found two
studies in which attendees of the same London methadone treatment facility were interviewed about
prior self-detoxification attempts. In both studies, most patients had attempted self-detoxification, and
many had made multiple attempts [102, 184]. The short-term success rate of achieving 24 hours of
abstinence per episode was 41% [184], while the medium-term success rate of achieving 10 days of
abstinence per episode was 24% [102]. The design of these studies naturally excludes patients who successfully maintain long-term abstinence. When asked why their attempts had failed, subjects pointed to
lack of support during detoxification [102], as well as easy access to drugs and severity of withdrawal
symptoms [102, 184]. Patient-reported strategies for effectively completing withdrawal include distraction
and avoidance, especially in the form of physical activity [102]. In addition, Green et al. [106] showed that
informing patients in full as to the type and severity of withdrawal symptoms that they were likely to experience resulted both in lower self-reported symptom severity scores as well as an increased probability
of completing the detoxification process.
Relapse & Recovery
Relapse rates for opioid use are high. Reported reuse statistics for individuals having gone through
detoxification programs range from 81-91% [103, 227]. However, long-term prognoses are more favorable, with evidence suggesting that 45-51% of patients may achieve sustained abstinence, and that
sustained abstinence is a gradual process [103].
“Recovery” is a hotly contested term in drug use disorder communities. Many align with the Alcoholics
Anonymous viewpoint that addiction is an incurable disease and, as such, an individual never fully
“recovers” from addiction [1]. Rather, users who reach sustained sobriety are referred to as being “in
recovery”. In this work, we refer to users who have overcome physical withdrawal as RECOVERING.
8.2.2 In-Person Mutual Help Groups
Alcoholics Anonymous (AA), founded in the 1930s, is one of the most utilized services for substance
use disorders in the world, with over 4 million members across 100 different societies [133]. It has also
given rise to other peer recovery groups for addiction, like Narcotics Anonymous (NA) and Gamblers
Anonymous (GA). AA and NA are almost entirely based on mutual support, even condemning the giving
of medical advice as outside the expertise of the group, instead encouraging members to see a doctor if
medical or psychiatric problems arise [133].
Three decades of accumulated evidence demonstrates that active participation in such groups for
addiction improves outcomes [155], although success rates are ill-defined and vary across studies [20].
A high participation level in AA is reported to be one of the strongest predictors for abstinence [190,
223]. For example, Pagano et al. [190] found that users who actively helped other AA members had
a relapse rate of 55%, while those who did not relapsed at a rate of 75%. Correspondingly, many of
the benefits of AA are thought to stem from the social network that it provides its members, who afford
each other support, role modeling and experiential advice [140]. Kelly et al. [141] find that through their
interactions with other AA members, users experience increased abstinence self-efficacy, increased
spirituality/religiosity and reduced negative affect. Having a sponsor is also thought to help newcomers
avoid relapse [237].
8.2.3 Inferring Health State from Social Media
The idea that social media users’ health states will be somehow reflected in the content that they contribute, and that it may be possible to predict health state from these data, has captured the interest of
several researchers. De Choudhury et al. [69–71] analyze how postpartum depression (PPD) might be
reflected on both Twitter and Facebook. Using their findings, they leverage activity and linguistic features to build models that can predict the onset of PPD from Facebook data [71]. In other social media
studies, both activity features, such as social engagement and connectivity, and linguistic features, such
as affect and writing style, have been shown to be useful indicators of depression [72, 129, 191, 208],
neuroticism [208] and post-traumatic stress disorder [118].
A related challenge is to identify a user’s current phase within a specific medical condition. Jha and
Elhadad [136] found that a combination of linguistic and activity features is helpful for identifying cancer stages I–IV. Murnane and Counts [180] conducted an analysis of smoking cessation as reflected on
Twitter. They found that linguistic features of positive and negative sentiment, as well as social interaction variables, were significant differentiators between users who relapsed and users who ceased their
smoking behavior during the time of the study. Finally, Wen and Rosé use logistic regression and flexible pattern matching over posts from an online cancer community to extract pre-defined events onto a
timeline [252].
8.3 Data
Typically, users present their own current substance use situation (e.g., drugs used and number of days
clean) in initiating posts. In contrast, users are liable to discuss a wide range of substance abuse
situations in response posts, including their own and the initiator’s. Accordingly, we restrict our analysis
to Forum77’s initiating posts, of which there are 78,507 authored by a total of 28,005 unique users.
Below, we describe the data sets that we use for taxonomy development, classifier training and testing,
and analysis.
Taxonomy Development: Our taxonomy development (§ 8.4) is an iterative process; for each iteration we randomly sampled 1,000 of Forum77’s initiating posts.
Training & Testing Dataset: In § 8.4.4 we describe the importance of labeling sequences of initiating posts rather than randomly sampled individual posts (as we did for taxonomy development). For our labeled data set (§ 8.5.1) we randomly sample 200 users who had authored > 5 initiating posts on Forum77, and all of their 2,266 initiating posts.
Analysis Dataset: We analyze all initiating post sequences of users who authored > 5 initiating posts on Forum77. This totals 41,387 initiating posts authored by 2,848 users.
8.4 Exploring & Modeling Phases of Addiction
To systematically analyze phases of substance abuse in Forum77, we require both a valid taxonomy of
phases and a rubric mapping post text to these phases. Towards this aim, we derive a rubric based on
labels from the Transtheoretical Model (TTM) of behavior change, which we describe below.
8.4.1 Transtheoretical Model for Behavior Change
The Transtheoretical Model (TTM) is a framework that describes six stages of change that a person traverses in order to manifest permanent behavior change. Established in 1997 by Prochaska &
Velicer [203], the TTM has been applied to a range of behaviors, from smoking cessation [75, 180, 247]
and substance abuse [175], to sustainable energy usage [123]. The intuitiveness and universal applicability of the TTM make it a useful descriptive tool; however, care should be taken before utilizing it to
inform treatment or intervention [175, 253].
According to the TTM, a person begins in the stage of pre-contemplation, in which she is not thinking
about initiating a behavior change. After contemplation, in which she begins to consider a change, she moves on to preparation, in which she
makes preparations necessary to initiate a behavior change. The person then moves on to action, a
concerted and deliberate attempt to effect short-term behavior change. If successful, the person enters
a period of maintenance, in which she tries to sustain the behavior change in the long term. If successful,
the person eventually enters the stage of termination [203]. As there is considerable debate over whether
addiction is a terminable condition [1], we omit this stage for our purposes.
8.4.2 Rubric Development
In order to match Forum77 posts to TTM stages, we randomly sampled 1,000 initiating posts. Two authors mapped these posts to stages in the TTM, assigning descriptive labels to emergent sub-categories
specific to the topic of addiction (e.g., tapering and cold turkey are both part of the TTM stage Action) in
the style of a General Inductive Approach [236]. We repeated this process several times, reviewing the
rubric with an addiction specialist prior to finalization. (Note: this is the same thematic analysis process
as that described in Figure 6.1 in § 6.5.)
8.4.3 A Taxonomy of the Phases of Addiction
Table 8.1 describes our resulting phase taxonomy, along with example posts (synthesized from genuine
posts to preserve user privacy) and the prevalence of each label in our final 1,000 initiating post sample.
Although descriptively interesting, several of the labels in the taxonomy (e.g., intent to quit and about to
quit) are rare. For parsimony, and to aid subsequent classification accuracy, we collapse labels into three
categories: USING, WITHDRAWING and RECOVERING. This improves inter-annotator agreement (over a
100-post, independently labeled sample) from a Cohen’s Kappa of 0.73 to 0.78.
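The agreement statistic reported above can be illustrated with a minimal sketch of Cohen's kappa for two annotators. The function name and the toy labels are hypothetical and are not the study's annotations; the sketch assumes both annotators labeled the same items in the same order.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for six posts.
ann1 = ["USING", "WITHDRAWING", "WITHDRAWING", "RECOVERING", "USING", "RECOVERING"]
ann2 = ["USING", "WITHDRAWING", "RECOVERING", "RECOVERING", "USING", "WITHDRAWING"]
kappa = cohens_kappa(ann1, ann2)  # observed 4/6, chance 1/3
```

In this toy case the annotators agree on four of six posts and chance agreement is one third, so kappa is 0.5.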
8.4.4 Labeling People, not Posts
Moving forward, we want to analyze addiction phases at the level of individual people. Two factors that
emerged in our taxonomy development (see Table 8.1) convinced us that labeling randomly sampled
posts would be insufficient for such analyses, and that we should instead label users’ entire post sequences. The first was the high prevalence (9.8%) of n/a labels. These posts are often social in nature
and, taken independently, impossible to assign to a class. However, when read in the context of the
author’s previous and subsequent posts, the label is usually obvious (see Figure 8.1). The second factor
was the low prevalence of relapse labels. We noticed that while many users relapse, few announce the
fact directly. Rather, most users will mention a relapse when they are already committed to another cessation attempt (e.g., about to quit or even quitting again). However, a relapse can still be observed in a
regressive sequence, such as WITHDRAWING → USING (see Figure 8.1). Based on these observations,
in the rest of this paper we label sequences of posts.
8.5 Characterizing the Phases of Addiction
Phases of addiction coincide with distinct physiological and psychological states. In this section, we
analyze activity and linguistic features that might characterize an author’s phase on an initiating-day. We
define an initiating-day to be any day on which the user initiated a thread on Forum77. If the author
initiated multiple posts, we combine them for analysis. Our goal is two-fold: (1) to characterize phases
of addiction as they are expressed on Forum77, and (2) to identify discriminative features that might be
used for classification.
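The initiating-day unit defined above can be sketched as a grouping step. This is an illustration only: the tuple layout `(user_id, timestamp, text)`, the function name, and joining texts with a newline are assumptions, as the text does not specify how posts from the same day are combined.

```python
from collections import defaultdict
from datetime import datetime


def build_initiating_days(posts):
    """Group initiating posts by (user, calendar day) and merge their text.

    `posts` is an iterable of (user_id, timestamp, text) tuples.
    Returns {user_id: [(date, combined_text), ...]} in chronological order.
    """
    by_day = defaultdict(list)
    for user_id, ts, text in sorted(posts, key=lambda p: (p[0], p[1])):
        by_day[(user_id, ts.date())].append(text)
    sequences = defaultdict(list)
    for (user_id, day), texts in sorted(by_day.items()):
        sequences[user_id].append((day, "\n".join(texts)))
    return dict(sequences)


# Hypothetical posts: two threads on one day, one thread on a later day.
posts = [
    ("u1", datetime(2014, 1, 3, 21, 30), "Still going strong."),
    ("u1", datetime(2014, 1, 3, 9, 0), "Day 4 off vics."),
    ("u1", datetime(2014, 1, 9, 8, 15), "Feeling terrible today."),
]
days = build_initiating_days(posts)
```

Here the two posts from 3 January collapse into a single initiating-day, so the user's sequence has two entries.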
Table 8.1: Addiction Phase Taxonomy derived via a thematic analysis.
Final Category | TTM Phase | Label | Description | Synthesized Example | %
USING | Precontemplation | Using | Subject is using substances and demonstrates no intention to quit. | it has been forever since I’ve been here and not much has changed. I am still using the prescribed amount of oxycodone for neck pain. | 3.1
USING | Precontemplation | Addicted | Subject is using substances and indicates that she is addicted, but demonstrates no intent to quit. | my girlfriend and i r both addicted to percs but she is taking way more than me and keeps getting chest pain once every other week. | 7.4
USING | Precontemplation | Relapse | Subject has used substances again after an attempt to quit. | I just messed up majorly. I was 6 days clean, doing OK-ish, when my mother stopped by with 10 Vics “incase I needed them”. Of course, being the WEAK person I am, I took them all right there. | 1.3
USING | Contemplation | Intent to quit | Subject expresses desire to stop abusing a substance in the future. | I want off roxies. is methadone the answer. I need to work daily. I cannot do withdrawls. PLEASE HELP! | 9.3
USING | Preparation | About to quit | Subject notes time and/or plan (e.g., tapering schedule) to quit. | i was planning to quit the first week of March. True to form addict fashion I’m out of both money and pills. So I’m about to go ct now instead of next week when I’d planned. | 2.5
WITHDRAWING | Action | Quitting | Subject is in withdrawal; method unspecified. | Today is my 5th day of FREEDOM! I havent experienced any w/ds yet. So much energy. | 39.1
WITHDRAWING | Action | Tapering | Subject is in withdrawal; detoxification method is a taper. | Have some Vics I am taking. I am down to 6 a day. I plan to go down to 3 a day then 1 a day until I am done! | 6.4
WITHDRAWING | Action | Cold Turkey | Subject is in withdrawal; detoxification method is cold turkey. | I am on day 6 of CT from 150mg+ a day of ocycodone. I’m doing fine just some overall anxiousness | 3.3
RECOVERING | Maintenance | In recovery | Subject has finished detoxing; no physical withdrawal symptoms expressed. | Just an update to tell you that I have 67 clean days today. I feel amazing. I sleep well now and feel good! I’ve had a lot of discussions about aftercare. | 17.8
- | - | n/a | Impossible to determine status based on post. | I’ve been away for few days and everything seems different. Anyway I hope everyone is doing great. | 9.8
8.5.1 Sample & Labeling
To study how addiction phase sequences change over time, we restrict our analysis to users who have
initiated at least 5 threads on Forum77 (n=2,848 out of 29,196 users who initiated at least one post).
Of these, we randomly sampled 200 users (∼7% of the full 2,848) and all of their initiating posts. We
discarded 9 users from the sample: two who had authored more than 100 posts, one account that
belonged to MedHelp, and six accounts for which there was no clear ownership (several different people
appeared to be using the same MedHelp account). The resulting sample contains 2,266 initiating posts
(average 11.9 posts per user) and comprises ∼5.5% of the full 41,387 initiating posts authored by the
2,848 users who have authored ≥ 5 posts on the forum.
[Figure 8.1, “Label sequences, not posts”: timeline of one user’s initiating posts from first post to last post, with rows USING, WITHD. and RECOV., and with an absence and a relapse marked.]
Figure 8.1: Illustration of how sequence analysis can (1) reduce NA labels by leveraging context
surrounding posts, and (2) capture relapse events in regressive sequences without requiring the user to
explicitly state that she relapsed.
Two authors categorized each initiating post in the sample using the taxonomy presented in Table 8.1.
We labeled each user’s data in chronological order so as to transfer context learned from surrounding
labels. Disagreements (which were rare) were relabeled based on a consensus reached after discussion.
8.5.2 Activity Features
We identify 15 activity characteristics that describe an initiator’s global activity over time, her local activity
5 days prior to the initiating-day in question, and both the initiator’s and respondents’ activity on the
initiating-day. The features capture user activity volume (e.g., number of posts initiated in the last 5
days), engagement (e.g., days elapsed since last response to another user) and attention (e.g., number
of unique respondents to a user’s initiating post on the initiating-day). For a full description of all features,
as well as summary statistics of their distributions across each class, we refer the reader to Table D.3.
8.5.3 Linguistic & Content Features
LIWC Features
Differences in word use and linguistic style are believed to reveal a range of information about people,
from psychological state to social identity [196]. The Linguistic Inquiry and Word Count (LIWC) [195]
software calculates 80 linguistic variables over text. In prior work, LIWC has been used to characterize
and distinguish women suffering from Post-Partum Depression (PPD) [71], individuals at risk for depression [72] and smokers on Twitter who are at risk for relapse [180]. We calculate all 80 LIWC variables
over initiating post text as well as over all responses received on the initiating-day. We then examine
differences in these variables across the USING, WITHDRAWING and RECOVERING phases (Tables D.1
& D.2).
Days Mentioned and Question Features
In addition to the LIWC features, we calculate three variables over initiating post text. Users frequently
mention how long they have been clean at the time of posting. We extract days clean automatically by
using hand-written patterns, such as “clean X days” and “X weeks off”, where X represents a number.
We convert X to days if necessary. We also use a more relaxed version of this feature, called days
mentioned, in which we do not require the user to explicitly mention terms like “clean” or “off”. Finally,
we count the number of questions asked by identifying sentences that start with a question word and/or
end with a question mark. This feature has proved helpful in prior work [71]. We find that including these
three extra features improves classifier performance by ∼2.2%.
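The three extra features described above can be sketched with regular expressions. The specific patterns, the question-word list, and the unit conversions (a month as 30 days, a year as 365) are illustrative assumptions; the hand-written patterns actually used are not listed in the text beyond the two examples given.

```python
import re

# Assumed conversions to days; only "days" and "weeks" appear in the text.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}
NUM = r"(\d+)\s*(day|week|month|year)s?"

# Strict patterns require an explicit cue such as "clean" or "off".
CLEAN_PATTERNS = [
    re.compile(r"\bclean\s+(?:for\s+)?" + NUM, re.I),       # "clean 4 days"
    re.compile(r"\b" + NUM + r"\s+(?:clean|off)\b", re.I),  # "2 weeks off"
]
# Relaxed pattern: any "X days/weeks/..." mention.
MENTION_PATTERN = re.compile(r"\b" + NUM, re.I)

# Hypothetical list of question words.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how",
                  "is", "are", "can", "does", "do", "should", "will"}


def _to_days(match):
    return int(match.group(1)) * UNIT_DAYS[match.group(2).lower()]


def days_clean(text):
    """Days clean if the post states it with an explicit cue, else None."""
    for pattern in CLEAN_PATTERNS:
        match = pattern.search(text)
        if match:
            return _to_days(match)
    return None


def days_mentioned(text):
    """First duration mentioned in the post, converted to days, else None."""
    match = MENTION_PATTERN.search(text)
    return _to_days(match) if match else None


def count_questions(text):
    """Count sentences that start with a question word or end with '?'."""
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text)]
    count = 0
    for sentence in filter(None, sentences):
        first_word = sentence.split()[0].lower().strip(",")
        if sentence.endswith("?") or first_word in QUESTION_WORDS:
            count += 1
    return count
```

For example, `days_clean("2 weeks off and counting")` yields 14, while a post that says only "I was on it for 3 months." has no days clean value but a days mentioned value of 90.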
Phase-Specific Term Features
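Finally, we count how many phase-specific words occur in both initiating post text and response
text. To determine whether a term t is particularly descriptive of a phase p, we calculate its frequency-based odds ratio. If $f_p(t)$ is the number of posts of phase p that contain t, $f_p(\bar{t})$ the number of posts of phase p that do not contain t, and $f_{\bar{p}}(\cdot)$ the corresponding counts over posts of all other phases, then:
\[ \mathrm{OR}(t, p) = \frac{f_p(t) \cdot f_{\bar{p}}(\bar{t})}{f_p(\bar{t}) \cdot f_{\bar{p}}(t)} \]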
The odds ratio is a measure of strength of association. We calculate the odds ratio for each term
across each phase, and retain terms with an odds ratio >2. Table 8.2 shows sample terms for both
initiating and response posts.
Table 8.2: Sample phase-specific terms for the USING, WITHDRAWING and RECOVERING categories.
Phase | Initiating Posts | Response Posts
USING | withdrawls, wants, hate, addicted, scared, tried, stop | situation, willing, treatment, withdrawl, option, advise, rehab, counseling
WITHDRAWING | rls, hot, restless, aches, slept, arms, legs, headache, wd, worst, stomach, tramadol | potassium, heating, fluids, baths, pad, showers, legs, melatonin, hot, slept, bananas
RECOVERING | craving, recovery, lately, sober, fight, truly, clean, cravings, true, worth | inspiration, accomplishment, congratulations, sharing, thank, miss, proud, paws
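The odds-ratio term selection described in this section can be sketched as follows. The post representation (a phase label paired with a set of terms), the function name, and the choice to return `None` when a denominator count is zero are assumptions; the text does not say how zero counts were handled.

```python
def odds_ratio(term, phase, posts):
    """Frequency-based odds ratio of `term` for `phase`.

    `posts` is a list of (phase_label, set_of_terms) pairs.
    Returns None when a denominator count is zero (no smoothing is applied).
    """
    in_with = in_without = out_with = out_without = 0
    for label, terms in posts:
        has_term = term in terms
        if label == phase:
            in_with += has_term
            in_without += not has_term
        else:
            out_with += has_term
            out_without += not has_term
    denominator = in_without * out_with
    if denominator == 0:
        return None
    return (in_with * out_without) / denominator


# Hypothetical toy corpus of eight posts.
toy_posts = [
    ("WITHDRAWING", {"restless", "legs"}),
    ("WITHDRAWING", {"restless", "sleep"}),
    ("WITHDRAWING", {"hot", "baths"}),
    ("USING", {"scared", "stop"}),
    ("USING", {"restless", "scared"}),
    ("RECOVERING", {"clean", "proud"}),
    ("RECOVERING", {"sober"}),
    ("RECOVERING", {"clean"}),
]
restless_or = odds_ratio("restless", "WITHDRAWING", toy_posts)  # (2 * 4) / (1 * 1)
```

In the toy corpus, "restless" has an odds ratio of 8 for WITHDRAWING and would be retained under the >2 threshold.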
8.5.4 Results: Activity and Linguistic Features
We present linguistic features over initiating posts in Table D.1, linguistic features over response posts in
Table D.2, and activity features in Table D.3. Unless otherwise mentioned, we use Kruskal-Wallis tests
to assess statistical significance. A non-parametric test is appropriate for data that are not expected to
follow a normal distribution (such as ours), and a Kruskal-Wallis test determines whether any pair in a
trio of distributions is significantly different.
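For readers unfamiliar with the test, the following is a minimal sketch of a Kruskal-Wallis H test for exactly three samples, using average ranks for ties. With three groups, H is compared against a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(−H/2). The function name and the sample values are hypothetical and do not come from Tables D.1 to D.3; the chi-square approximation is rough for samples this small.

```python
import math


def kruskal_wallis_3(a, b, c):
    """Kruskal-Wallis H test for exactly three samples. Returns (H, p)."""
    groups = [list(a), list(b), list(c)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, math.exp(-h / 2.0)  # chi-square tail with 2 degrees of freedom


# Hypothetical "days since last activity" values per phase.
using = [40, 55, 38, 61]
withdrawing = [12, 20, 18, 15]
recovering = [17, 22, 19, 25]
h_stat, p_value = kruskal_wallis_3(using, withdrawing, recovering)
```

A small p-value indicates that at least one of the three groups tends to take larger or smaller values than the others; it does not by itself say which pair differs.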
Our feature analysis indicates that both users’ activity and users’ content and linguistic characteristics
differ measurably across addiction phases. We discuss particularly descriptive features of each phase
below.
USING: This phase is characterized by long absences from the forum and, correspondingly, low levels
of recent activity. Users who are USING have, on average, been absent from forum participation in all
capacities for more than twice as long as users who are WITHDRAWING or RECOVERING (40 vs. ∼18
days since last activity). A longer absence from the forum may partially explain why USING posts are, on
average, longer (208 vs. ∼180 words): users must account for lost time and bring their audience back
up to speed.
Both days clean and days mentioned vary widely in USING posts, and have surprisingly high median
values. Examining the underlying data provides an explanation: users who are USING often mention how
long they had been clean prior to relapse in statements such as, “I was clean for 4 months before...” or
“I would have had 717 days clean today”.
Finally, USING posts offer the lowest levels of positive affect (16% less than WITHDRAWING and 32%
less than RECOVERING), and the highest levels of discussion around the topic of health (16% more
than WITHDRAWING and 36% more than RECOVERING); characteristics that are mirrored in responses to
USING posts. The lack of positivity resonates with the fact that users who are USING have either relapsed
or failed to progress towards recovery.
WITHDRAWING: In recent activity, users who are WITHDRAWING issue more initiating posts and self
responses than those who are USING or RECOVERING. In addition, they have the smallest average
number of days since last initiating post (21 vs. 31 RECOVERING and 50 USING) and days since last
self-response (29 vs. 42 RECOVERING and 66 USING).
As we might expect, WITHDRAWING users express the lowest numbers of days clean and days mentioned. In addition there is a great deal more language about feeling, biological processes and the body.
These observations align with the nature of detoxification as an uncomfortable physical process from
which people constantly seek relief [84].
Responses to WITHDRAWING posts are not particularly distinctive. Aside from expressing slightly
more anxiety, and writing slightly more about feeling and the body, other linguistic variables tend to take
on a value somewhere in between those of responses to USING and RECOVERING. It is possible that
respondents try to influence users from one side of the spectrum to the other, modifying their language
according to the user’s progress.
RECOVERING: These users are highly active, especially in the area of responding to other people’s
posts. In recent activity they issue, on average, 15.2 responses to other people’s threads, compared to
5.5 by users who are WITHDRAWING and 1.9 by users who are USING. Moreover, unlike WITHDRAWING
and USING users, their ratio of # initiating posts to # responses authored tends to be <1.
Linguistic features also suggest that RECOVERING users tend to focus on others. The pronoun you
is used almost 100% more while the I pronoun is used less, and language is more social. Moreover,
users express significantly more positive affect (25% more than WITHDRAWING, 48% more than USING)
and less anxiety (18% less than WITHDRAWING, 16% less than USING). The evident outward focus of
initiating posts from RECOVERING users resonates with the 12th step in traditional twelve-step programs
such as AA, which encourage people to strengthen their sobriety by using their experiences to help
others achieve it [1].
Responses to RECOVERING posts are distinct in that they express substantially more positive affect
(27% more than responses to WITHDRAWING, 57% more than responses to USING). They also tend to
host a notable quantity of exclamation marks (100% more than WITHDRAWING, 350% more than USING).
Inspection reveals that this is an expression of excitement and encouragement in response to good
news, for example, “hoooooorrrraaaahhhhh!!!!!!!!!” and “I am so PROUD of YOU!!!!!”.
8.6 Automatically Classifying Addiction Phase
Informed by our feature analysis, we next train a statistical classifier to automatically label Forum77 posts
as USING, WITHDRAWING or RECOVERING. Analyses of phase sequences can give insight into events
such as relapse and recovery. Our classifier allows us to scale such analyses to the entire Forum77 data
set. Below, we describe our classifier and report its performance. We discuss relapse and recovery in
§ 8.7.
8.6.1 Model & Features
A user’s path through the different phases of addiction forms a natural sequence. A conditional random
field (CRF) [151] is a probabilistic graphical model that performs inference over sequences, rather than
individual data points. By taking into account prior and subsequent data items in a sequence, CRFs
are context sensitive. For example, unlike a CRF, a non-sequence-based classifier might have difficulty
classifying a post like, “I’ve been away for a few days and everything seems different. Anyway I hope
everyone is doing great...”, even if it was sandwiched between two posts that were obviously USING, as
the post itself contains no clues as to the user’s phase.
Accordingly, we train a 3-class CRF to annotate a user’s sequence of initiating-days with the labels
USING, WITHDRAWING or RECOVERING. We use a version of the Stanford Named Entity
Recognizer package, a trainable Java implementation of a CRF classifier¹, adapted to analyze sequences of
documents (its default unit of analysis is a token). Tables D.1, D.2 and D.3 indicate the subset of features
that we used for classifier training. We selected features based on apparent discriminability and iterative evaluation through 10-fold cross validation. In order to improve robustness and model potentially
non-linear responses, we binned numeric features into octiles: ranks that divide the data evenly into 8
groups. While using quartiles is arguably more common in standard practice, we found that using octiles
improved classifier performance.
¹ http://nlp.stanford.edu/software/CRF-NER.shtml
Table 8.3: CRF performance scores aggregated over 10 runs of 10-fold cross validation, with randomly
shuffled input sets.
Label | Precision | Recall | F1 score | Accuracy
Combined | 68.3 | 68.0 | 67.6 | 69.8
USING | 62.4 | 61.7 | 61.4 |
WITHDRAWING | 70.6 | 71.9 | 70.9 |
RECOVERING | 72.1 | 71.2 | 70.9 |
Baseline | 14.0 | 33.0 | 20.0 | 43.0
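The octile binning of numeric features described in § 8.6.1 can be sketched with the standard library. The helper names are hypothetical, and the exact cut-point convention used in the original pipeline is not stated; this sketch uses the default interpolation of `statistics.quantiles`.

```python
from bisect import bisect_right
from statistics import quantiles


def octile_edges(values):
    """Return the 7 cut points that split `values` into 8 equal-sized groups."""
    return quantiles(values, n=8)


def to_octile(value, edges):
    """Map a numeric value to a bin index in 0..7."""
    return bisect_right(edges, value)


# Hypothetical feature values, e.g. days since last activity.
feature_values = list(range(1, 17))
edges = octile_edges(feature_values)
bins = [to_octile(v, edges) for v in feature_values]
```

With sixteen evenly spread values, each of the eight bins receives two values. Replacing a raw count with its bin index lets a model with discrete features assign a separate weight to each range, which is one way to capture a non-linear response.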
8.6.2 Performance
Table 8.3 shows precision, recall and F1 scores for the CRF classifier. Our classifier achieves an F1
score of 67.6% against a baseline F1 score of 20.0%, acquired by labeling each instance with the
majority class, WITHDRAWING.
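The baseline figures can be approximately reproduced under the assumption that the combined scores are macro-averages over the three classes; this is not stated explicitly in the text, but it is consistent with a baseline recall of roughly one third. In the sketch below the function name is hypothetical, the WITHDRAWING share of 0.43 is taken from the baseline accuracy, and the other two shares are illustrative placeholders that do not affect the result.

```python
def majority_baseline(class_shares):
    """Macro-averaged scores (in %) when every instance gets the majority label.

    `class_shares` maps each class to its fraction of the data.
    """
    majority = max(class_shares, key=class_shares.get)
    k = len(class_shares)
    p_major = class_shares[majority]  # precision on the majority class
    r_major = 1.0                     # every majority instance is retrieved
    f_major = 2 * p_major * r_major / (p_major + r_major)
    # The other classes are never predicted, so their P, R and F1 count as 0.
    return {
        "precision": 100 * p_major / k,
        "recall": 100 * r_major / k,
        "f1": 100 * f_major / k,
        "accuracy": 100 * p_major,
    }


# 0.43 comes from the reported baseline accuracy; 0.23 and 0.34 are placeholders.
baseline = majority_baseline({"USING": 0.23, "WITHDRAWING": 0.43, "RECOVERING": 0.34})
```

Under these assumptions the sketch gives a precision of about 14.3, a recall of about 33.3 and an F1 score of about 20.0, close to the reported 14.0, 33.0 and 20.0.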
It is useful to know which labels the CRF is likely to confuse. Figure 8.2 shows the CRF classifier’s
confusion matrix. Diagonal entries indicate counts of correctly-classified instances. The strong diagonal
indicates a relatively high level of accuracy. Most classification errors occur between adjacent phases:
confusing USING and WITHDRAWING, and confusing WITHDRAWING and RECOVERING is common, but
confusing USING and RECOVERING less so. This resonates with a point prevalent in the addiction literature: stages of recovery are not black and white but rather fall on a spectrum [79, 168].
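The adjacent-phase pattern can be checked directly from the cell values printed in Figure 8.2. The sketch below assumes that both axes list the labels in the order Using, Withd., Recov.; the pairwise totals it computes do not depend on which axis holds the gold labels. The function names are hypothetical.

```python
LABELS = ["USING", "WITHDRAWING", "RECOVERING"]

# Cell values as printed in Figure 8.2 (averaged over cross-validation runs).
MATRIX = [
    [327.2, 131.8, 62.2],
    [150.2, 686.9, 142.7],
    [52.2, 139.8, 560.2],
]


def accuracy(matrix):
    """Share of instances on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def pair_confusions(matrix, labels):
    """Total confusions between each unordered pair of labels."""
    pairs = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            pairs[(labels[i], labels[j])] = matrix[i][j] + matrix[j][i]
    return pairs


overall = accuracy(MATRIX)
pairs = pair_confusions(MATRIX, LABELS)
```

The diagonal accounts for roughly 70% of instances. The two adjacent pairs (USING with WITHDRAWING, and WITHDRAWING with RECOVERING) each account for about 282 confusions, against about 114 for USING with RECOVERING.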
8.6.3 Results
We analyze the result of applying our CRF classifier to the entirety of the Forum77 membership base
who have initiated > 5 posts (2,848 users, 32,345 initiating posts). Our results give us insight into
common transitions between addiction phases, enabling us to answer questions such as, “If a user is
WITHDRAWING today, how likely is it that she will be RECOVERING on her next initiating-day?” and “What
is the most frequent phase change observed on Forum77?”
[Figure 8.2: 3 × 3 confusion matrix of CRF labels against gold labels over Using, Withd. and Recov. Cell values, row by row:
327.2 131.8 62.2
150.2 686.9 142.7
52.2 139.8 560.2]
Figure 8.2: Confusion matrix for our CRF classifier aggregated across 10 randomized runs of 10-fold
cross validation.
Figure 8.3(a) shows the normalized transition frequency matrix for USING, WITHDRAWING and RECOVERING.
The most common transitions lie along the diagonal, indicating that users typically initiate
consecutive posts in any one phase. Self-transitions aside, the progressive edges between consecutive
stages (USING → WITHDRAWING and WITHDRAWING → RECOVERING) are the most common, accounting
for approximately 6% and 5.2% of total transitions, respectively. In contrast, regressive edges between
consecutive stages (WITHDRAWING → USING and RECOVERING → WITHDRAWING) are less common,
accounting for 2.6% and 1.1% of total transitions, respectively.
Figure 8.3(b) shows conditional transition probabilities across states. The likelihood of a same-state
transition increases with the progressiveness of the state. For example, there is a 71% chance
that a USING user will be USING in her next post, an 81% chance that a WITHDRAWING user will be
WITHDRAWING in her next post, and a 91% chance that a RECOVERING user will be RECOVERING in her
next post.
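The two views of the transition data discussed above can be sketched as follows: normalized frequencies divide each transition count by the total number of transitions, while conditional probabilities divide by the number of transitions leaving the source phase. The function name and the toy sequences are hypothetical.

```python
from collections import Counter

PHASES = ["USING", "WITHDRAWING", "RECOVERING"]


def transition_tables(sequences):
    """Return (normalized, conditional) transition tables as nested dicts.

    normalized[s][t]  = share of all transitions that go from s to t
    conditional[s][t] = P(next phase is t | current phase is s)
    """
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))  # consecutive (source, target) pairs
    total = sum(counts.values())
    normalized, conditional = {}, {}
    for s in PHASES:
        row_total = sum(counts[(s, t)] for t in PHASES)
        normalized[s] = {
            t: counts[(s, t)] / total if total else 0.0 for t in PHASES
        }
        conditional[s] = {
            t: counts[(s, t)] / row_total if row_total else 0.0 for t in PHASES
        }
    return normalized, conditional


# Hypothetical per-user phase sequences (one label per initiating-day).
toy_sequences = [
    ["USING", "WITHDRAWING", "WITHDRAWING", "RECOVERING"],
    ["WITHDRAWING", "USING", "WITHDRAWING"],
    ["RECOVERING", "RECOVERING"],
]
norm, cond = transition_tables(toy_sequences)
```

The same relationship links the two panels of Figure 8.3: dividing the USING row of panel (a) by its sum (17.35 + 6.04 + 1.12 = 24.51) gives 70.79%, 24.64% and 4.57%, the USING row of panel (b).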
Figure 8.4 shows the distributions of phase length in days for each phase. We calculate phase
length as the number of days between the first and last post in a contiguous sequence. The typical
WITHDRAWING phase lengths align well with those reported in the literature on addiction, which suggests
a 7–35 day duration depending on the detoxification method used, as well as other factors [84, 102].
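The phase-length calculation can be sketched as a run-length grouping over one user's labeled initiating-days. The function name and the dates are hypothetical. Note that a phase observed on a single day gets a length of 0 under this definition; the text does not say how such runs were treated.

```python
from datetime import date
from itertools import groupby


def phase_lengths(labeled_days):
    """Length in days of each contiguous run of one phase.

    `labeled_days` is a chronologically sorted list of (date, phase) pairs.
    Returns [(phase, days_between_first_and_last_post), ...].
    """
    runs = []
    for phase, group in groupby(labeled_days, key=lambda item: item[1]):
        days = [day for day, _ in group]
        runs.append((phase, (days[-1] - days[0]).days))
    return runs


# Hypothetical labeled sequence for one user.
toy_user = [
    (date(2014, 1, 1), "USING"),
    (date(2014, 1, 8), "USING"),
    (date(2014, 1, 10), "WITHDRAWING"),
    (date(2014, 1, 26), "WITHDRAWING"),
    (date(2014, 3, 1), "RECOVERING"),
]
lengths = phase_lengths(toy_user)
```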
[Figure 8.3: two 3 × 3 matrices; rows give the source state and columns the target state, each in the order Using, Withd., Recov.
(a) Normalized transition frequencies (%):
Using | 17.35 | 6.04 | 1.12
Withd. | 2.56 | 33.85 | 5.23
Recov. | 1.78 | 1.11 | 30.96
(b) Conditional transition probabilities (%):
Using | 70.79 | 24.64 | 4.57
Withd. | 6.15 | 81.29 | 12.56
Recov. | 5.26 | 3.28 | 91.46]
Figure 8.3: (a) Normalized transition frequencies between addiction phases (e.g., USING → RECOVERING
edges comprise 1.12% of the total transitions in the CRF-labeled data) and (b) conditional transition
probabilities (e.g., the probability of a user moving from USING to RECOVERING is 4.57%).
8.7 Automatically Classifying Relapse and Recovery
Relapse and recovery are critical events in the process of addiction that are often viewed as “failure”
or “success”. Prior work in the addiction literature suggests that recovery is a long, iterative process
of which relapse is a part [103]. Leveraging our CRF classifier, we present methods for identifying (1)
if a user has relapsed during her tenure on the forum, and (2) if a user is RECOVERING on her last
initiating-day on Forum77. We then investigate if relapse adversely correlates with a user’s chance of
RECOVERING. Finally, we identify activity features during USING and WITHDRAWING phases that discriminate
between users who wrote their final post on Forum77 in a state of RECOVERING, and those who
did not.
8.7.1 Identifying Relapse
To identify a relapse incident, we codify three regressive transition patterns:
RECOVERING → {WITHDRAWING, USING}
WITHDRAWING → USING
WITHDRAWING → (45+ days absent) → WITHDRAWING
[Figure 8.4: three box plots showing USING phase length (days), WITHDRAWING phase length (days) and RECOVERING phase length (days); legend: Median, Q1–Q3, (1.5 ∗ IQR) within Q1, Q3.]
Figure 8.4: Distributions of phase lengths. A red bar indicates the median value, while the dark blue
region indicates the middle spread. The light blue region indicates values that fall within 1.5 ∗ the
interquartile range of the middle spread.
This last pattern is based on the observation that a general window for withdrawal duration is 7-35
days [84, 103]. As such, if a user was absent for more than 45 days, and then returned in a state of
WITHDRAWING, it is likely that she failed in her initial attempt and has restarted. While it is possible that
this pattern will capture individuals on a slow taper, in our experience it is unlikely that such users would
be inactive for a full 45 days.
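The three regressive patterns can be sketched as a single pass over a user's labeled initiating-days. Two details are assumptions made for illustration: absence is measured as the gap between consecutive initiating-days (the text may intend absence from all forum activity), and the threshold is applied as 45 days or more. The function name and the example sequences are hypothetical.

```python
from datetime import date


def has_relapse(labeled_days, absence_days=45):
    """True if any regressive transition pattern occurs.

    `labeled_days` is a chronologically sorted list of (date, phase) pairs
    for one user.
    """
    for (d1, p1), (d2, p2) in zip(labeled_days, labeled_days[1:]):
        # RECOVERING -> {WITHDRAWING, USING}
        if p1 == "RECOVERING" and p2 in ("WITHDRAWING", "USING"):
            return True
        # WITHDRAWING -> USING
        if p1 == "WITHDRAWING" and p2 == "USING":
            return True
        # WITHDRAWING -> (long absence) -> WITHDRAWING
        if p1 == p2 == "WITHDRAWING" and (d2 - d1).days >= absence_days:
            return True
    return False


# Hypothetical sequences.
steady = [
    (date(2014, 1, 1), "USING"),
    (date(2014, 1, 10), "WITHDRAWING"),
    (date(2014, 2, 20), "RECOVERING"),
]
restarted = [
    (date(2014, 1, 1), "WITHDRAWING"),
    (date(2014, 1, 5), "WITHDRAWING"),
    (date(2014, 3, 15), "WITHDRAWING"),  # 69 days after the previous post
]
regressed = [
    (date(2014, 1, 1), "RECOVERING"),
    (date(2014, 1, 20), "USING"),
]
```

Only the first sequence is free of relapse: the second matches the long-absence pattern and the third matches RECOVERING → USING.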
We identify whether a user relapsed or not during her tenure on Forum77 by testing whether any of
the above patterns exists in her sequence of phase transitions. To evaluate the efficacy of this approach,
we apply it to both the gold label sequences and the CRF-labeled sequences in our labeled sample
data set. Using this technique, we achieve an F1-score of 78% and accuracy of 78% in identifying
Relapse and No relapse, compared to baseline scores of 33.9% and 51.3% if we labeled each user with
the majority class, No relapse (Table 8.4).
Table 8.4: Performance for identifying relapse events (top) and whether a user’s final state is RECOVERING (bottom). Combined scores across classes are shown in bold.
Identifying a relapse event
Label | Precision | Recall | F1 score | Accuracy
Combined | 79.92 | 78.18 | 78.04 | 78.42
Relapse | 86.11 | 66.67 | 75.15 |
No relapse | 73.73 | 89.69 | 80.93 |
Baseline | 25.65 | 50.00 | 33.91 | 51.30
Identifying final initiating post phase
Label | Precision | Recall | F1 score | Accuracy
Combined | 81.47 | 81.52 | 81.49 | 81.57
RECOVERING | 79.78 | 80.68 | 80.23 |
¬RECOVERING | 83.17 | 82.35 | 82.76 |
Baseline | 26.84 | 50.00 | 34.93 | 53.40
8.7.2 Identifying Recovery
To identify whether a user was RECOVERING when she last initiated a post on Forum77, we simply
examine the final phase label in her transition sequence. Using the CRF-labeled sequences, we classify
a user’s last post as RECOVERING or ¬RECOVERING with an F1-score of 81.5% and accuracy of 81.6%;
the comparative baselines are 34.9% and 53.4%, in which all last posts are labeled as ¬RECOVERING
(Table 8.4).
8.7.3 Results
Using the methods described above, we identify users who are RECOVERING at the time of their last
initiating post on Forum77, as well as users who have relapsed at least once during their tenure on
Forum77. We apply this analysis to the entirety of the Forum77 membership base who have initiated >
5 posts (2,848 users, 32,345 initiating posts).
[Figure 8.5: Sankey diagram from first post to last post. First post: Using 48%, Withd. 44%. Relapse 48%, No relapse 52%. Last post: Using 17%, Withd. 37%, Recov. 46%.]
Figure 8.5: Aggregated user transitions from start to end state. Bar widths denote population proportion.
For example, 48% of users in our sample relapsed during their tenure on Forum77.
Do users tend to recover on Forum77?
Overall, users progress towards recovery during their tenure.
Figure 8.5 shows the distribution over start state, relapse, and end state for the 2,848 users described
above. Most users first initiate contact on the forum when they are USING (48%), followed by WITHDRAWING
(44%). In contrast, only 17% of users are USING by the time of their last post, while 37% are
WITHDRAWING and 46% are RECOVERING.
Does relapsing hurt recovery likelihood?
Roughly half of users experience a relapse during their
tenure. Users who experience no relapse are significantly more likely to end in RECOVERING than users
who relapse (53.4% vs. 44.4% end in RECOVERING, χ²(1) = 55.1, p < 0.001). Despite this, RECOVERING
is still the most likely end state for Forum77 users who relapse.
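The comparison above is a test of independence on a 2×2 contingency table (relapse status by final state). The sketch below computes Pearson's chi-square statistic and its p-value for one degree of freedom using only the Python standard library. The counts are invented purely for illustration and are not the counts behind the reported statistic; no continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (1 d.o.f.)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = (no relapse, relapse),
# columns = (ends RECOVERING, ends not RECOVERING).
counts = ((60, 40), (40, 60))
chi2, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```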
Are relapses associated with longer tenure?
Given the documented prevalence of relapse [103,
227], the observation that more than half of the users in our data set experience no relapse is surprising.
Analyzing tenure values reveals that the average tenure of no relapse users is 128 days, compared to
418 days for users who relapse. One hypothesis is that some users with no observed relapse do relapse
after leaving the forum and simply do not return to report it.
What differentiates users who are ultimately RECOVERING?
We define a user as active if she initiated a post on the forum in the last 45 days of our data set, and we exclude such users. We then analyze
users’ global activity characteristics (Table D.3) aggregated over their USING and WITHDRAWING posts
(RECOVERING posts are omitted as this is the phenomenon that we are studying). Table 8.5 shows the
results.
Table 8.5: Comparison of activity features for users who are and are not RECOVERING in their last initiating post. Per-user values are aggregated over USING and WITHDRAWING posts. Statistical significance is determined using Kruskal-Wallis tests (*** p < 0.001) after Bonferroni corrections.

                                      RECOVERING                      not RECOVERING
Activity Characteristic         p     Mean    Med.   IQR     MAD      Mean    Med.   IQR     MAD
# initiating posts authored     ***   8.99    5      8       4.44     9.89    6      6       2.96
# self responses authored       ***   19.56   8      16      10.37    17.04   9      16      8.89
# responses authored            ***   45.56   9      31      13.34    33.81   8      24      10.37
# initiating posts              ***   0.73    0.50   0.76    0.44     1.04    0.67   0.83    0.49
Days since last init.           ***   16.39   3.33   12.41   3.95     27.05   8.30   28.36   10.53
Days since last self-response   ***   17.47   3.00   13.38   3.95     29.53   8.29   31.45   10.81
Days since last response        ***   15.92   1.66   7.32    2.47     25.30   4.37   21.75   5.99
Days since last activity        ***   14.11   1.80   6.09    1.90     20.94   4.80   20.09   5.79
# self responses                ***   1.93    1.50   1.64    1.19     1.83    1.50   1.50    1.11
# replies received              ***   5.63    5.00   3.40    2.37     5.56    4.83   3.30    2.29
# respondents                   ***   4.09    3.83   2.00    1.60     4.01    3.70   2.03    1.42
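Table 8.5 reports the median, interquartile range (IQR) and median absolute deviation (MAD) alongside the mean; for several features the mean lies far above the median (e.g., 45.56 vs. 9). The MAD values in the table are close to integer multiples of 1.4826 (for example, 4.44 ≈ 3 × 1.4826), which suggests that the conventional normal-consistency constant was applied. The Python sketch below makes that assumption. Quartile conventions differ between tools, so the IQR here uses the standard library's default method and may not match the table's convention. The sample values are made up.

```python
import statistics

MAD_SCALE = 1.4826  # assumed normal-consistency constant

def robust_summary(values):
    """Mean, median, IQR and scaled MAD for a list of per-user values."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    mad = MAD_SCALE * statistics.median(abs(v - med) for v in values)
    return {
        "mean": statistics.mean(values),
        "median": med,
        "iqr": q3 - q1,
        "mad": mad,
    }

# Hypothetical per-user post counts with one very active user.
posts_per_user = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = robust_summary(posts_per_user)
print(summary)
```

The single outlier pulls the mean well above the median, while the median, IQR and MAD remain close to the bulk of the values.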
Users who leave the forum in a state of RECOVERING are significantly more engaged in forum activity,
even when they are USING and WITHDRAWING. The average time lapse between any form of activity
(initiation, self-response and response) is about 30% shorter for those who are RECOVERING when
they leave. Moreover, their activity is focused outwardly on other community members: users who are
RECOVERING author, on average, 50% more responses than those who are ¬RECOVERING (average
45.6 vs. 33.8), but author slightly fewer initiating posts (average 9.0 vs. 9.9). These results resonate
strongly with prior work on AA, which finds that both active participation in AA and an explicit focus on
helping other members correlate with sustained abstinence [190, 223].
8.8 Discussion
Our motivating goals were to study phases of addiction as they manifest on Forum77 and to analyze
the forum’s effectiveness in promoting recovery. In this section, we discuss Forum77’s efficacy as a tool
for supporting users through withdrawal, relapse and sustained recovery, drawing on post excerpts to
contextualize our findings. We then discuss how our results might inform future interface design, before
touching on potential implications for addiction treatment.
8.8.1 Use and Efficacy of Forum77
Supporting Withdrawal:
Our results suggest that Forum77 is an effective tool for helping users through
opioid withdrawals and physical detoxification. In general, users progress more often than they regress
(Figure 8.3), and these local progressions translate into a global trend of many users reaching a state of
RECOVERING during their tenure. When first initiating a post, 48% of users are USING, 44% WITHDRAWING
and 8% RECOVERING; in their most recent initiating post, however, only 17% of users are USING,
37% are WITHDRAWING and 46% are RECOVERING, despite the fact that almost half of the population
experiences a relapse (Figure 8.5). If we interpret our results as a 46% success rate on users’ final
detoxification attempt before leaving the forum, this is an improvement over self-detoxification success
rates reported in the addiction literature [102, 184]. We must be cautious here, however, as we are
comparing across different study designs.
Forum77’s efficacy at supporting detoxification may be attributable, in part, to both the strong social
support and the detailed information on withdrawal that members receive from each other. Both of these
factors have been shown to improve withdrawal outcomes [102, 106, 184], and qualitative remarks from
users suggest that Forum77 meets the mark on both. “I have tried to cope by myself for too long. Its
so hard to deal with something like addiction by your self”, wrote one user. “[T]here is so much support
and advice on getting through this and addiction I am living proof it works!!!!!!”, and “i was on here once
before and was able to achieve 9 months of sobriety due to the support i had here and from meetings.”
remarked others. In other cases, simply discovering a supportive community might galvanize a cessation
attempt: “up until 3 weeks ago, I had no intentions of quitting, i was just looking to find some stuff on
addiction...and i just happened to run across this forum...”.
Relapse and Shame:
Despite the favorable prognosis that users are more likely to reach a state of RECOVERING
during their tenure (Figure 8.5), we do not know whether they maintain this state upon
leaving. It is possible that the same strong support network that helps users through detoxification
deters them from wanting to admit a relapse. Quantitatively, although almost half of our sample relapsed
(Figure 8.5), we rarely observed posts in which users reported a relapse immediately after the fact
(Table 8.1).
The hypothesis that users are too ashamed to admit relapse until they implement a renewed attempt
to quit is qualitatively well supported. Statements such as “I suck!! I am so sorry, I’ve been too embarrased too admit I fell off the proverbial wagon around Christmas.” are common. Others, such as
“haven’t posted in a few weeks because, of course, i slipped up and am ashamed. but now i am back
on track with the sub” and “Im in day 3 of detox, i was too embarassed to post the first 3 days...” echo
these sentiments, and suggest that some users feel that a new detoxification effort is required as proof
of commitment before returning to the community.
Supporting Sustained Recovery:
Without observing users’ behavior outside the forum, we cannot
quantify Forum77’s effectiveness at supporting long term recovery. Qualitatively, however, some users
feel that this is something that Forum77 could improve upon. One user summarizes: “I wonder if there
is not a need for a forum community for long-term support. This community is great, but is skewed
towards the short-term wd symptoms and getting through the initial physical pain of wd.”. Also prevalent
are observations that the forum does not sufficiently prepare users to handle post-acute withdrawal
syndrome (PAWS): “I wish people would warn others about this PAWS thing”, wrote one user. “i was
doing so good i made it to about 100 days sober ... the PAWS really got me”, expressed another.
Moreover, users who return to Forum77 after some time may find that their support network has moved
on. One user who was struggling not to relapse asked “Where are all of the friends i made here that I no
longer see?!?”.
Other users, however, give qualitative evidence in support of Forum77’s efficacy at aiding sustained
recovery. “I have not posted much lately but continue to log on and read ppls posts and I believe that
is a key aspect in my recovery”, states one user. Another wrote “when I get a craving I come here
and read, even if I read it before, it helps me think of what I went through what I’m going through and
how others cope”. We found that higher engagement, in the form of activity levels and volume of
responses contributed, correlates with the chances of a user being in a phase of RECOVERING by the end
of her tenure. Extending this idea, one possibility is that remaining engaged with the forum (even in the
form of “lurking”) after reaching a state of RECOVERING helps to prevent relapses, much as
continued participation in AA correlates with longer periods of sobriety [190, 223]. A deeper analysis of
the mechanisms through which Forum77 does and does not support long-term recovery is an important
topic for future work.
8.8.2 Implications for Forum Design
Computational tools for automatically identifying addiction phases, relapses, and whether a user’s tenure
ends in RECOVERING could prove valuable to communities like Forum77. One question commonly asked
by users is what to expect when they quit their drug of choice, and having access to this information has
been shown to improve the chances of a successful cessation attempt [106]. Using phase sequence
data labeled by our CRF classifier, users could set realistic expectations by exploring patterns based
on thousands of others’ prior experiences. Having a realistic perspective of the process of relapse and
recovery may also reduce the number of instances in which users feel too embarrassed or ashamed to
return to Forum77 after relapsing. Finally, exposing such data could help people find others who exhibit
similar patterns to their own. Finding “people like me” is one of the primary stated reasons for user
participation in online health communities [90].
While Forum77 appears to promote detoxification effectively, we observed that users have mixed
feelings about how well it supports sustained recovery. It is possible that this could be addressed via
altering community dynamics. For example, as we suggested above, continued participation in Forum77
post RECOVERING might help users achieve sustained recovery. Efforts focused on decreasing user
churn and increasing member retention could support this. Alternatively, in a similar vein to AA’s sponsorship program, which is thought to promote sustained recovery [237], we might consider automatically
matching newcomers with long-term members who would act as formal mentors (or sponsors). Finally,
it is possible that the community dynamics that support detoxification are different from those that would
support sustained recovery. In this case, a forward reference to a different community might help RECOVERING
Forum77 users plan what to do next.
8.8.3 Implications for Addiction Treatment
Forum77 accrues, at scale, information that is difficult to acquire through formal medical channels. First,
abusing prescription drugs usually entails deceiving one’s doctor. Second, addiction research data are
typically acquired at point-of-care facilities (e.g., emergency rooms) or surveys at high schools or colleges. Although the ethics and privacy of such analyses must be carefully considered, it is possible that
data extracted from sites like Forum77 (e.g., CRF-based transition frequencies, recovery trends, etc.)
could help medical professionals and policy makers better understand patients’ experiences with drug
abuse. For example, insight into the day-to-day difficulties of opioid-assisted withdrawal might inform
policy for improving the management of this popular treatment. It is also possible that
research like ours could illuminate poorly understood aspects of addiction: to our knowledge, ours is the
first attempt to quantify the cycle of addiction.
8.8.4 Limitations
One limitation of this work is the selection bias of our subjects: users who come to Forum77 are likely
already open to (or at least, considering) the possibility of quitting. This problem is well known to those
hoping to analyze the efficacy of Alcoholics Anonymous [20]. As such, care should be taken in applying
our results to a more general population who misuse prescription medication. We cannot assume, for
example, that a random sample of people who misuse prescription medication would similarly progress
towards recovery if they were asked to participate in Forum77. We also cannot draw epidemiological
conclusions that apply to the population as a whole from these data. However, the size of Forum77,
the prevalence of the opioid epidemic, and the increasing popularity of online health communities alone
make the forum worth studying.
Another limitation is the acceptable, but still improvable, accuracy of our CRF classifier. While we
were able to use CRF-based sequences to identify, with high accuracy, both relapse events and whether a user’s
final post was written when she was RECOVERING, improving our underlying classifier performance would
open up more nuanced analyses. Finally, having page view data would allow us to incorporate measures
of passive participation (“lurking”), which would add a new dimension to our study. We hope to address
such opportunities in future work.
8.9 Summary
Our goal in this chapter was to analyze the process of opioid withdrawal, recovery and relapse on Forum77, MedHelp’s Addiction: Substance Abuse community. Drawing on the addiction
literature, we first present an overview of prescription drug abuse and introduce key concepts and terminology (§ 8.2). Next, using Prochaska’s Transtheoretical Model for behavior change, we develop a
taxonomy of phases of addiction that comprises three main categories: USING, WITHDRAWING and RECOVERING
(§ 8.4). The majority of initiating posts are authored when users are WITHDRAWING. Next, we
analyze linguistic and behavioral features across the USING, WITHDRAWING and RECOVERING phases.
Several significant differences characterize each phase (§ 8.5), and we leverage these results to train a
CRF model to automatically annotate users’ phase sequences (§ 8.6). We can identify relapse events,
and whether a user was RECOVERING when she authored her final post, with high accuracy from our
CRF-annotated sequences (§ 8.7).
Applying our classifiers to 2,848 users (§ 8.6.3 and § 8.7.3) reveals that progressive transitions towards RECOVERING are much more prevalent than regressive transitions. Moreover, despite the fact that
almost 50% of users relapse during their tenure, leaving Forum77 in a state of RECOVERING is the most
probable outcome for all users. Finally, we find that increased participation in the community correlates
with a user RECOVERING by the end of her tenure: users who are RECOVERING by their final initiating
post are significantly more engaged with the community when they are USING and WITHDRAWING than
users who are ¬RECOVERING by their final initiating post.
To our knowledge, ours is the first work to investigate the efficacy of online mutual help groups for
prescription drug abuse. Our results, which help to illuminate a previously poorly understood resource,
suggest that Forum77 is an effective detoxification aid. Based on our findings, we also highlight several
ways in which Forum77 might be enhanced to better support its users (§ 8.8), such as exposing aggregate user data describing the cycle of addiction, or matching newcomers with sponsors. Finally, as the
type of information shared on Forum77 is difficult to acquire at scale through traditional channels, we
note that the tools and insights presented here may be of use to the addiction research community.
Chapter 9
Conclusion
This dissertation presents both methods for automatically extracting medically-relevant data from patient
authored text (PAT) and insights derived through the application of these methods. In concert,
our contributions underscore PAT’s latent potential for illuminating poorly understood or clandestine
medical topics that may be invisible to traditional medical data collection, and offer viable methods
that dramatically improve our ability to realize this potential. In this final chapter, we reiterate the contributions of this thesis (§ 9.1) and present principal opportunities for future research (§ 9.2) before offering
concluding thoughts (§ 9.3).
9.1 Contribution Summary
Our work is predicated on the observation that despite being both abundant and uniquely valuable,
patient authored text (PAT) is a heavily underutilized health data resource. In Chapter 2 we present
an overview of prior work describing online health seeking behavior and, more specifically, online health
community (OHC) participation. Synthesized via a cross-disciplinary literature review, this chapter serves
to illuminate how people use the Internet as a health resource. In Chapter 3 we present a novel review
of prior work that utilizes PAT as a primary data source. We discuss goals, data sources, methodological
approaches and outcomes, providing a contextual background against which to interpret and evaluate
the rest of our work. To our knowledge, this review is the first such synthesis of prior work focused on
extracting value from PAT.
The development of ADEPT (Chapter 5) – our CRF classifier that automatically identifies medically-relevant terms in PAT – was prompted by our observation that existing biomedical term annotation toolkits perform poorly on PAT. While statistical classifiers present an attractive alternative, acquiring large,
expert-annotated PAT corpora on which to train and test them is a major challenge. To this end, we show
that a crowd of non-experts yields annotations comparable in quality to experts’ for the PAT medical term
identification task. Our result offers an alternative method for acquiring large annotated PAT corpora both
quickly and cheaply. However, our task design failed to yield similar quality results for more specific PAT
annotation tasks (e.g. identifying all symptom terms). This underscores the tradeoff between designing crowdsourcing tasks and annotating the data oneself. Applying ADEPT to large PAT corpora yields
high-level insights useful for summarization and hypothesis generation; however, the tool is too broad
for fine-grained analysis. For higher-resolution insights, we narrow our focus to the topic of addiction: a
highly prevalent but stigmatized medical condition.
Understanding why people author PAT is crucial for matching it with appropriate research questions.
In Chapter 6, we investigate users’ motivations for participating in Forum77: MedHelp’s Addiction: Substance Abuse community. Our thematic analysis over initiating posts concurs with prior work stating that,
in general, people seek both informational and emotional support from OHCs. However, our analysis
also reveals distinct sub-categories of these two kinds of support. Of particular interest is the update:
a prevalent emotional support seeking post in which the user does not explicitly request a community
response. We train two logistic regression classifiers: the first distinguishes emotional from informational
support-seeking posts; the second, update from non-update posts. Applying these to the entire Forum77
data set reveals that update posts garner slightly more responses on average than non-update posts.
The prevalence of update posts suggests that users value the forum as a place where their personal
progress can be witnessed by others and recorded for posterity. Forum77 also serves as a repository
for information on opioid withdrawal. In fact, Thomas’ Recipe, a protocol for medication-assisted opioid
withdrawal that evolved on Forum77, suggests that Forum77 users actively collaborate on developing
effective treatment protocols.
In Chapter 7 we investigate the distribution of drugs of choice (DOCs) in the Forum77 population. A
close reading indicates that identifying DOCs is a context-sensitive problem, as a variety of substances
can serve as either an object of addiction or a treatment. A CRF classifier trained on manually annotated data is able to
identify DOCs with high accuracy. Our resulting analysis, which compares the Forum77 DOC distribution
to those of other drug-using populations, reveals that the Forum77 population struggles disproportionately more with prescription opioids, and disproportionately less with traditionally abused substances
such as alcohol, marijuana and cocaine. While it is difficult to ascertain whether Forum77 reflects real-world drug use trends, our results do suggest that Forum77 represents a population of drug users that
is not well covered by existing monitoring systems.
Finally, in Chapter 8, we analyze the process of opioid withdrawal, recovery and relapse on Forum77. Through a thematic analysis, we develop a taxonomy describing phases of addiction based on
Prochaska’s Transtheoretical Model for behavior change. Phases of addiction are accompanied by distinct
physiological and psychological changes, and this is mirrored in users’ usage of the site: exploring activity and linguistic features from posts across the phases USING, WITHDRAWING and RECOVERING reveals
several significant differences. We leverage these differences to train a sequence-based CRF model
to annotate users’ phase sequences automatically. We can also identify relapse events from these sequences, as well as whether a user’s final post was made in a state of RECOVERING, with high accuracy.
Our resulting analysis of all Forum77 users’ transition sequences indicates that despite the fact that
relapse is common, leaving the forum in a state of RECOVERING remains the most probable outcome.
Moreover, we show that high engagement with the community correlates with the probability of a user
RECOVERING by her last initiating post on the forum. Overall, these results suggest that Forum77 is an
effective detoxification aid. To our knowledge, this work is the first that attempts to quantify the phases
of addiction and the transitions between them.
9.2 Future Work
Given the high level of enthusiasm currently surrounding health-related technology, our
contributions present a timely foundation and reference. However, many limitations to realizing the full
value of PAT remain. In this section, we articulate key opportunities for future research.
9.2.1 Supporting the Methodological Process
Figure 9.1 (replicated from Chapter 1) illustrates the stages of our methodological process for extracting
insights from PAT. At present, most of the stages in the main process (top row) must be cobbled together
in an ad-hoc fashion by the researcher. This hurts efficiency and replicability, and makes comparison between studies difficult. Developing this process into more of a standardized pipeline would enable closer
synergy between disparate research efforts, and make it easier to identify quality results. We suggest
several areas for improvement below.
[Process diagram. Node labels: PAT; Content Schema; Labeled Data (human); Features; Classifier; Labeled Data (auto); Processed Data; Insights; PAT interface design; Medical Discovery. Edge labels: close reading; annotation; training; application; processing & analysis; schema revision; tuning.]
Figure 9.1: Our general methodological process. Nodes in grey show avenues for future work supported
by our contributions.
Interface Support for Thematic Analysis
Thematic analyses are frequently used to develop deep insights into text-based corpora and to inform
future analyses. Moreover, as we note in Chapters 6 and 8, not only do the results of thematic analyses
stand as their own qualitative contribution, they also indicate junctions at which we may shift from a close reading to a large-scale, automated analysis. In spite of their complexity and importance, there is no
interface support for thematic analyses: provenance of this iterative process is never recorded; reasons
(and supporting examples) for making particular decisions about categories are lost; and the clustering,
combining, and splitting of categories is done primarily in the researchers’ working memories. Based on
our own experience, a starting point for interface support would provide visual “sandboxes” for comparing
and organizing data elements into categories; support flagging items that either strongly support
or strongly contradict the proposed taxonomy; and facilitate the easy expression of categorization rules.
Aside from making thematic analyses more efficient and consistent, externalizing the process in this
fashion would make resulting taxonomies easier for a third party to verify, compare against and reuse.
Improved Tools for Annotation
Related to the matter of interface support for thematic analyses is interface support for data annotation.
In our work, we conducted this process primarily through the use of shared spreadsheets. While this
makes data output easy, it hinders comparison between non-adjacent data elements; does not support
the capture of spontaneous updates to annotation rules that arise from encountering novel examples;
and only weakly supports collaboration between annotators. Examples of features that an annotation interface might provide include visual support for clustering and comparing data elements; automatic label
suggestions based on underlying text analytics; iterative updating of annotation rules in response to new
data elements; and automatic evaluation of inter-annotator agreement that facilitates rapid exploration of
agreements and errors. Not only would such an interface make the annotation process faster and more
consistent, but it may also encourage standardization in annotation and reporting practices.
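As one illustration of what automatic evaluation of inter-annotator agreement could look like, the Python sketch below computes Cohen's kappa for two annotators and lists the items on which they disagree. Cohen's kappa is only one common choice of agreement measure and is an assumption here, as are the function name and the example labels.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical phase labels from two annotators for four posts.
annotator_1 = ["USING", "WITHDRAWING", "RECOVERING", "USING"]
annotator_2 = ["USING", "WITHDRAWING", "USING", "USING"]

kappa = cohens_kappa(annotator_1, annotator_2)
disagreements = [i for i, (a, b) in enumerate(zip(annotator_1, annotator_2)) if a != b]
print(f"kappa = {kappa:.3f}, disagreements at items {disagreements}")
```

Surfacing the indices of disagreeing items alongside the score is what would let annotators explore and resolve errors quickly.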
Mapping the Limits of the Crowd in PAT Annotation Tasks
In Chapter 5 we showed that the crowd can replace medical experts for some PAT annotation tasks. However, correctly designing crowdsourcing tasks is sufficiently time consuming that in subsequent chapters,
we elected to annotate our data manually. Exploring the crowd’s ability to perform a variety of PAT annotation tasks nonetheless remains a crucial avenue for future work. Without it, it would be difficult to scale
analyses such as ours to larger forums or to multiple data sets. More importantly, reliable crowdsourcing would
make it easier to create and share large, labeled corpora within the research community. Due to our
data sharing agreement with MedHelp, we were unable to share any of our labeled data sets. However,
making a large, labeled PAT corpus available to the public would be the most direct way to stimulate
research on these topics.
9.2.2 PAT Interface Design & Support
Despite their popularity, the general structure of online health communities (OHCs) has barely changed
since the late 1990s. However, both insights and classifiers derived through the PAT analysis pipeline
could prove valuable if incorporated into OHCs. As we show in Figure 9.1, closing this loop may create
a virtuous cycle, in which interface improvements yield a higher volume and quality of
PAT. This, in turn, would lead to more fine-grained insights and improved classifiers. While we do not
implement any interface changes in this work, we have several suggestions.
Expose Aggregate Data to Users
OHC participants spend hours doing tasks that often amount to simple aggregation, such as calculating
treatment popularity, establishing what Forum77’s most popular DOC is, and estimating the probability
of a successful detoxification attempt conditional on a specific withdrawal method. This is inefficient: not
only are OHCs difficult to navigate for these sorts of tasks, but often many users will conduct identical
analyses at different points in time. In the best case, exposing such data to users could alleviate users’
need to reinvent the wheel for each analysis, freeing their time for alternative tasks.
Support Data Entry
One critique of PAT is that it is often incomplete in terms of containing all relevant medical information.
Nudging users towards providing more complete accounts of their conditions would enrich our analyses
and enhance PAT’s credibility as a data source. One example is “symptom autocomplete”: rather than
relying on users to remember and list all of their symptoms (some of which may not even be severe
enough to notice), it would be relatively straightforward to automatically suggest (or “autocomplete”)
symptoms based on the ones already entered.
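One simple way to approach the symptom autocomplete idea is to rank candidate symptoms by how often they co-occur with the symptoms a user has already entered. The Python sketch below illustrates this with made-up posts; the function names and the co-occurrence heuristic are assumptions for illustration, not a system built in this work.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(posts):
    """Count, for each symptom, how often every other symptom shares a post."""
    cooc = {}
    for symptoms in posts:
        for a, b in permutations(set(symptoms), 2):
            cooc.setdefault(a, Counter())[b] += 1
    return cooc

def suggest(entered, cooc, k=3):
    """Return up to k symptoms that most often co-occur with those entered."""
    scores = Counter()
    for symptom in entered:
        scores.update(cooc.get(symptom, {}))
    for symptom in entered:
        scores.pop(symptom, None)  # never suggest what is already entered
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symptom for symptom, _ in ranked[:k]]

# Hypothetical symptom sets extracted from four posts.
posts = [
    {"nausea", "insomnia", "chills"},
    {"nausea", "insomnia"},
    {"insomnia", "restless legs"},
    {"nausea", "chills"},
]
cooc = build_cooccurrence(posts)
suggestions = suggest(["nausea"], cooc)
print(suggestions)
```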
Automatically Construct User Timelines
Personal timelines are commonplace in social media and the quantified self movement. Our work on
users’ reasons for participating in Forum77 (Chapter 6) indicates that they value its archival features.
Making it easier for users to browse their histories, especially histories enhanced with structured data
provided by classifiers, could facilitate an array of tasks, from discovering behavioral patterns to finding
other “people like them”. Quantitative uses aside, a timeline comprises a narrative of important life
events, failures, and accomplishments that would have strong emotional significance to users. Given the
chance, it is likely that users would take it upon themselves to curate their own timelines: a situation that
could be leveraged to have users label their own data.
9.2.3 Making the Leap to Medical Discoveries
Our work adds to a growing body of evidence that medically-relevant insights can be extracted automatically
from PAT. However, the holy grail is to move from medical insights to actionable medical discoveries. In
our own work, efforts along these lines might include extending our work on identifying drugs of choice
(Chapter 7) to support real-time identification of new drugs, or extending our work on phases of addiction
(Chapter 8) to test whether participation in Forum77 measurably reduces the number of relapses that
someone experiences. However, making such leaps is nontrivial. Challenges include understanding how
signals in PAT correspond to real-world trends, in spite of the fact that PAT rarely contains demographic
data; clinically verifying results, which is both slow and expensive; and developing new experimental
designs that are compatible with online health seeking behavior. Such challenges could only be met
through a close-knit collaboration with medical professionals who agree that PAT is a valuable data
source.
9.3 Concluding Remarks
Patient authored text is the abundant byproduct of hours of human intelligence spent on complex, health-related problem solving tasks. As long as this valuable resource is underutilized, researchers, patients
and medical professionals alike will be deprived of the unique insights and benefits that it has to offer.
Although this dissertation takes a step towards leveraging some of the considerable work that patients do
in managing their own health, this is only the tip of the iceberg: we anticipate a future in which technology
creates, supports and encourages synergy between patients, providers and data.
Appendix A
ADEPT Supplementary Material
Table A.1: The following features are specified when training our CRF. Other features retain their default
values as described at http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html
Property Name | Type | Value | Description
useClassFeature | boolean | TRUE | Include a feature for the class (as a class marginal). Puts a prior on the classes which is equivalent to how often the feature appeared in the training data.
useWord | boolean | TRUE | Gives you feature for w
useNGrams | boolean | TRUE | Make features from letter n-grams, i.e., substrings of the word
noMidNGrams | boolean | TRUE | Do not include character n-gram features for n-grams that contain neither the beginning or end of the word
useDisjunctive | boolean | TRUE | Include in features giving disjunctions of words anywhere in the left or right disjunctionWidth words (preserving direction but not position)
maxNGramLeng | int | 7 | If this number is positive, n-grams above this size will not be used in the model
usePrev | boolean | TRUE | Gives you feature for (pw,c), and together with other options enables other previous features, such as (pt,c) [with useTags)
useNext | boolean | TRUE | Gives you feature for (nw,c), and together with other options enables other next features, such as (nt,c) [with useTags)
useSequences | boolean | TRUE |
usePrev | boolean | TRUE |
useNext | boolean | TRUE |
maxLeft | int | 1 | The number of things to the left that have to be cached to run the Viterbi algorithm: the maximum context of class features used.
useTypeSeqs | boolean | TRUE | Use basic zeroeth order word shape features.
useTypeSeqs2 | boolean | TRUE | Add additional first and second order word shape features
useTypeySequences | boolean | TRUE | Some first order word shape patterns.
wordShape | String | chris2useLC | Either none for no wordShape use, or the name of a word shape function recognized by WordShapeClassifier.lookupShaper(String)
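The settings in Table A.1 can be written out in the key = value form used by Java properties files. The following is a minimal sketch under stated assumptions: the names CRF_FEATURES and to_properties are hypothetical helpers introduced here, the repeated usePrev and useNext rows are listed once, and a real training run would need further settings (such as the training data location) that the table does not list.

```python
# Values copied from Table A.1. CRF_FEATURES and to_properties are
# hypothetical names used only for this illustration.
CRF_FEATURES = {
    "useClassFeature": True,
    "useWord": True,
    "useNGrams": True,
    "noMidNGrams": True,
    "useDisjunctive": True,
    "maxNGramLeng": 7,
    "usePrev": True,
    "useNext": True,
    "useSequences": True,
    "maxLeft": 1,
    "useTypeSeqs": True,
    "useTypeSeqs2": True,
    "useTypeySequences": True,
    "wordShape": "chris2useLC",
}


def to_properties(features: dict) -> str:
    """Render a feature dictionary as key = value lines, one per feature."""
    lines = []
    for name, value in features.items():
        # bool must be checked explicitly: Java expects lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{name} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_properties(CRF_FEATURES), end="")
```

Keeping the settings in one dictionary makes it easy to compare them against the differing values used for the DOC classifier.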
Appendix B
F77 Purpose Supplementary Material
Table B.1: Features used to train our purpose classifiers, which distinguish emotional from informational
support seeking, as well as update from non-update posts.
Feature Name | Description
containsQuestion | whether the post contains a question (binary)
numQuestions | number of questions the post contains
unigrams | all words present in the post
bigrams | all bigrams (two consecutive words) present in the post
timeMentioned | number of days clean time (if mentioned). Extracted using the following two patterns: (1) X:NUM (day|days|week|weeks|months|month|year|years) (clean|off); (2) on? "day|days" X:NUM, where NUM is any number and "|" represents the OR operator. We then convert weeks/months/years to days and use the number of days as the feature value. The default value is 0.
numPosWords | number of words with a positive sentiment score in SentiWordNet of > 0.8
numNegWords | number of words with a negative sentiment score in SentiWordNet of > 0.8
daysMentioned | whether the user mentions a number followed by the term “day” or “days”
days since last initiating post | the number of days since the user’s last initiating post
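The timeMentioned feature in Table B.1 can be sketched with two regular expressions. This is an illustration under stated assumptions, not the code used in the dissertation: the conversion factors (7, 30 and 365 days) are assumed because the table only says that weeks, months and years are converted to days, the second pattern is read as an optional "on" followed by "day" or "days" and a number, and the name time_mentioned is hypothetical.

```python
import re

# Assumed conversion factors; the table does not state the exact values.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

# Pattern 1: a number, a time unit, then "clean" or "off" ("3 weeks clean").
UNIT_FIRST = re.compile(
    r"\b(\d+)\s+(day|week|month|year)s?\s+(?:clean|off)\b", re.IGNORECASE
)
# Pattern 2: optional "on", then "day" or "days", then a number ("on day 5").
DAY_FIRST = re.compile(r"\b(?:on\s+)?days?\s+(\d+)\b", re.IGNORECASE)


def time_mentioned(post: str) -> int:
    """Return the clean time mentioned in a post, in days (0 if none)."""
    match = UNIT_FIRST.search(post)
    if match:
        unit = match.group(2).lower()
        return int(match.group(1)) * DAYS_PER_UNIT[unit]
    match = DAY_FIRST.search(post)
    if match:
        return int(match.group(1))
    return 0  # default value given in Table B.1


if __name__ == "__main__":
    print(time_mentioned("I am 3 weeks clean today"))
    print(time_mentioned("on day 5 of withdrawal"))
```

The first pattern is tried before the second so that a post mentioning both forms reports the explicit clean-time statement.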
Appendix C
F77 Drug of Choice Supplementary Material
Table C.1: Drug term resolution map, manually compiled from classifier output. The i column indicates
whether the drug category is included in our analysis in Chapter 7.
Category | Drug name | Resolved drug terms | i
alcohol | alcohol | acholic, acoholic, alcahol, alchohol, alchol, alcholo, alcohol, alcoholic, alcoholoc, alcolhol, alcololic, alocholic, alocohol, champagne, beer, beers, vodka, wine, drink alcohol, drink beer, drinking alcohol, drink wine, drinking beer, drinking wine, drinks, beer bottles, beer drinking, alcohol drinking, alcohol drinks, alcoholic drink, drink, drinking | ◦
cigarettes | cigarettes | cigarettes, cigaretts, cigarrettes, cigars, cigerattes, ciggaretes, ciggarettes, ciggaretts, ciggerettes, ciggies, ciggs, cigrattes, cigs, smoke, smoke cigarets, smoke cigarettes, smoke cigs, smoked, smoker, smokes, smokes cigarettes, smokes ciggaretts, smokes cigs, smokin, smokin cigs, nicotine, smoking cigarettes, smoking, smoking cigs | ◦
cocaine | cocaine | cocaine, cocain, cocaine, cocane, coccaine, cociane, coke, coaine, powder, smoke cocaine, smoke coke, smokin coke, smoking coke, smoking crack, smoking crack cocaine | ◦
hallucinogens | hallucinogens | hallucinogens, mescaline | ◦
hallucinogens | psilocybin mushroom | psilocybin mushroom, mushrooms, shrooms, psychedelics | ◦
heroin | heroin | heroin, herioin, herion, heroin, heroin cocaine, heroine, smoking heroin, smack, smoke heroin, heroin heroin, heroin smoking | ◦
marijuana | marijuana | marijuana, marajuana, marihuana, marijanna, marijauna, marijuan, marijuana, marijuana smoker, marijuanna, marijuanna smoker, marjuana, marjuana smoke, pot, pot brownies, pot smoke, pot smoker, pot smokers, pot smokin, weed, weed smoker, smoke marijuana, smoke marijuanna, smoke pot, smoke weed, smoked pot, smokes marijuana, smokes pot, smokes weed, smokin pot, smoking marijuana, smoking pot, smoking weed, dope, pot smoking, smoking weed, smoking dope, smoke dope, hash, hashish, smoked weed, smokin dope, marijuana smoke, marijuana smoked, marijuana smoking | ◦
methadone | methadone | methadone, mehadone, mehtadone, mehtadone pain killers, metadone, methadoen, methadome, methadon, methadone, methadone pain killers, methadones, methadont, methadose, methandone, methaodne, methatdone, methdaone, methdone, methedome, methedone, methodone, methodone pain pills, methondone, methone, mdone | ◦
suboxone | suboxone | sub, suoxone, subbies, subboxin, subboxine, subboxone, subetex, subitext, subloxone, subo, subone, subonoxe, subooxone, subotex, subox, suboxan, suboxe, suboxen, suboxene, suboxens, suboxin, suboxine, suboxins, suboxne, suboxom, suboxome, suboxon, suboxone, suboxones, suboxtone, suboxyn, suboxzone, subozone, subroxone, subs, soboxan, soboxen, soboxene, soboxin, soboxine, soboxion, soboxon, soboxone, soboxones, sabonxon, saboxan, saboxen, saboxin, saboxins, saboxon, saboxone, subtex, subutec, subutek, subutex, subutext, subutox, subuxone, subx, subxone, syboxin, syboxone, symboxin, buprenorphine, buprenorphine, bupenorphine, bupenorphrine, bupernepherine, bupernorphine, bupremorphine, buprenex, buprenophine, buprenorphene, buprenorphine, bupreorphine | ◦
opioid | codeine | codeine, codeiene, codein, codeine, codeine otc pills, codeine painkillers, codeine sulphate, codene, codene pain pills, codien, codiene, codiene painkillers, codiens, codine, codone, coedine, tylenol 3, tylenol3 | ◦
opioid | dextropropoxyphene | dextropropoxyphene, darovcet, darv, darvacet, darvacets, darvaset, darvecet, darvecette, darvicet, darviset, darvo, darvocet, darvocets, darvocett, darvocette, darvon, darvoncet, darvos, darvoset, darvs, darvys, davocet, davort, dextropropoxyphene | ◦
opioid | dialudid | diladid, diladin, diladud, dilantin, dilatin, dilaudad, dilauded, dilaudeds, dilaudid, dilaudin, dilauid, dillauded, dilodid, dilodids, diloted, dilotid, dilotted, diloudid, diluadid, diluadids, diludid, diluidid, hydromorphone, hydromophone, hydromorophone, hydromorphcontin, hydromorphine, hydromorphone | ◦
opioid | fentanyl | actiq, fenatyl, fentaly, fentanol, fentanyl, fentanyl pain patch, fentanyl pain patches, fentayl, fentenal, fentenyl, fentinol, fentnyl, fentora, fentyl, fentynal, fentynal pain patches, fentynl, fentynol, fentynyl, fetynal | ◦
opioid | hydrocodone | hydrocodone, hrdrocodone, hudro, hycodan, hycodne, hydo, hydocodone, hydorcodone, hydors, hydos, hydos-75, hydr, hydracodone, hydrco, hydrcodene, hydrcodone, hydro, hydro codeine, hydro-codone, hydroc, hydrocdone, hydrochodone, hydroco, hydrocod, hydrocodan, hydrocode, hydrocodeine, hydrocoden, hydrocodene, hydrocodien, hydrocodiene, hydrocodin, hydrocodine, hydrocodine pills, hydrocodne, hydrocodon, hydrocodone, hydrocodones, hydrocodons, hydrocondone, hydrocondone pain medication, hydrocone, hydrocordon, hydrocordone, hydrodcodone, hydrododone, hydrodone, hydromet, hydromorphone hydrochloride, hydros, hydrycodone, hyrdo, hyrdocodone, hyrdos, hyrdro, hyrdrocodone, hyrdros, hyro, hyrocodone, hyros, smoke hydro | ◦
opioid | lortab | lortab, loratab, loratabs, loratb, lorcet, lorcets, lorcett, lorecet, lorecets, lorects, loretab, loricet, loritab, loritabs, lorocet, lorocets, lorotabs, lorset, lortab, lortab◦, lortab◦-5, lortabs, lotab, lotabs, loracet, loracets | ◦
opioid | meperidine | meperidine, demerol, demeral, demerol, demoral, demorol | ◦
opioid | morphine | morphine, mophine, moraphine, morhine, morhphine, morhpine, moriphine, morophine, morp, morphane, morpheine, morphen, morphene, morphin, morphine, morphines, morphone, morpine, mscontin, morphine mscontin, morphine sulf, morphine sulphate, ms-contin, avinza, ms contin, oramorph, kadian | ◦
opioid | norco | norco, noco, noraco, norc, norce, norco, norco vicodin, norcos, norcs, nordco, noreco, norko, noroco, norocs, narco, narcos | ◦
opioid | opiates | opiates, opates, opiade, opiants, opiat, opiate, opiates, opiats, opiete, opiets, opiot, opiote, opiotes, opitaes, opitate, opitates, opites, oopiate, opaite, opaite pain meds, opaites, opiate meds, opiate narcotic pain pills, opiate narcotics, opiate pain killer, opiate pain killers, opiate pain medication, opiate pain medications, opiate pain medicines, opiate pain meds, opiate pain pill, opiate pain pills, opiate painkillers, opium, opiads-heroine/percs/hydro, opiate drug, opiate narcotic pain, opiate pain, opiate pain med, opiates percs, opiates vicodin, opiates xanax, oppiates, smoking opium | ◦
opioid | opioids | opioids, opiod, opiods, opioid, opioids, opoid, opoids, opiod drug, opiod narcotic, opioid meds, opioid pain med, opioid pain medications, opioid pain meds | ◦
opioid | oxycodone | oxycodone, roxcodone, roxi, roxicdone, roxicet, roxicets, roxicodne, roxicodone, roxicodones, roxicontin, roxicontins, roxicotin, roxies, roxiodone, roxis, roxocodone, roxy, roxy codone, roxy3, roxy4, roxycet, roxycodine, roxycodone, roxycodones, roxycontin, roxycontins, roxycotin, roxycottin, roxys, oxcodone, oxcontin, oxcotin, oxcy, oxcycodone, oxcycontin, oxcycotin, oxcyontin, oxcys, oxen, oxey, oxeys, oxi, oxicoden, oxicodon, oxicodone, oxicontin, oxicotin, oxicotines, oxicoton, oxie, oxie codine, oxies, oxocodone, oxtcontin, oxxy, oxy, oxy codone, oxy contin, oxy-contin, oxy4, oxy8, oxy8s, oxyc, oxyco, oxycocet, oxycocets, oxycod, oxycode, oxycodeine, oxycoden, oxycodene, oxycodin, oxycodine, oxycodne, oxycodon, oxycodone, oxycodones, oxycodpne, oxycoidone, oxycondin, oxycondone, oxyconin, oxycontiin, oxycontin, oxycontine, oxycontins, oxyconton, oxycontontin, oxycoontin, oxycotdin, oxycoten, oxycotin, oxycotine, oxycotins, oxycotion, oxycoton, oxycotten, oxycottin, oxycottins, oxycotton, oxydocone, oxydodone, oxydone, oxyicodone, oxyies, oxyir, oxynorm, oxys, oxytocin, oxyxodones, oxyz, oycodone, blues, blue pills, ocs, ocycodone, oxy hydro, oxy ocs, oxy vics, oxy-norm, oxy/percs/tabs, oxycodone oxycontin, oxycodone pain meds, oxycontin, oxycotontin, smoking oxy, smoking oxycontin | ◦
opioid | oxymorphone | oxymorphone, opana, opanas | ◦
opioid | percocet | percocet, perc, percacet, percacets, percaset, percasets, perccet, percecet, percecets, percet, percets, percicet, percks, perco, percocect, percocet, percocete, percocets, percocett, percocette, percocetts, percocite, percocoet, percoct, percodan, percodone, percoet, percoets, percoset, percosets, percot, percote, percots, percs, perkacet, perkeset, perkocet, perkocets, perks, perocaet, perocet, perocets, persocet, pecocet, pecocets, pers, perts | ◦
opioid | tramadol | tramadol, tradol, tram, tramacet, tramadal, tramadaol, tramado, tramadol, tramadole, tramadols, tramadon, tramal, tramdol, tramedol, tramidol, trammadol, tramodal, tramodol, tramol, trams, tranadol, ulram, ultam, ultracet, ultram, ultrams, ultrm, ultrum | ◦
Category
OTC
Drug name
Resolved drug terms
i
vicodin
vics, vicks, vic, vicadan, vicaden, vicadin, vicadine, vicadon, viccodin, viccoding, vicdin, vicdon,
vicdone, viciden, vicidin, vicidine, vicidon, vicidons, viciodin, vico, vicodan, vicodein, vicodeine,
vicoden, vicodene, vicodens, vicodent, vicodien, vicodine, vicodines, vicoding, vicodins, vicodion,
vicodn, vicodon, vicodone, vicodyn, vicoin, vicondin, vicos, vicotin, vidodin, vik, vikcs, vike, vikes,
vikoden, vikodin, viks, viocdin, viocidin, viocoden, viodin, vivodin, vivodins, vocidin, vocodin, vicodin
◦
vicoprofen
vicaprofen, vicobrofin, vicoprofen, vicoprofin, vicoprohen, vicoprophen, vicoprophin, vicroprofen,
vicuprofen
◦
acetaminophen, acetamenophin, acetamenophine, acetaminaphen, acetaminaphin,
etaminophen, acetem, aceteminophen, acetomenophine, acetominophen, acetominophin
◦
acetaminophen
benadryl, benadril, benadryl, benadryll, bendryl, benedryl, benodryl, benydryl
◦
dextromethorphan, dxm
◦
ibuprofen
advil, ibeprofen, ibogaine, ibp, ibprofen, ibprofin, ibprohin, ibprophin, ibu, ibupofen, ibupro, ibuprofen, ibuprofin, ibuprophen, ibuprophin, ibuprophren, ibupropin, mortin, mortrin, motrin, neurofen,
neurophen, nurofen
◦
melatonin
melantonin, melatonin, meletonin, melitonin, melotonin
◦
naproxen
naproxen, aleeve, aleve, aleive, alieve, alleve
◦
nyquil, nyquill
◦
paracetamol, paracetemol, paracetomal, paracetomol, parecetamol
◦
tyelonol, tyenol, tyl, tylanol, tylenal, tylenol, tylenol oc, tyleonol, tylinol, tylonal, tylonel, tylonol, tylox,
tyloxes, tynenol, tyneol, tylenol
◦
benadryl
dextromethorphan
nyquil
paracetamol
tylenol
sedative
ac-
alprazolam
ativan
barbiturates
benzodiazepine
buspirone
chlordiazepoxide
clonazepam
diazepam
eszopiclone
fioricet
flunitrazepam
gabapentin
alpralozam, alprazalam, alprazolam, alprozalam, alprozolam,
◦
ativan, adavan, adavant, adavin, adivan, advan
◦
barbiturates, barbituates, butalbital, phenobarbital, barbs
◦
benzodiazepine, benzo, benzocaine, benzodiazapenes, benzodiazapines, benzodiazepams, benzodiazepines, benzodiazpines, benzoes, benzoids, benzos, oxazepam
◦
buspirone, buspar
◦
chlordiazepoxide, librium
◦
clonazepam, klnopin, klodopin, klon, klonapin, klonapins, klonepin, kloni, klonidine, klonipin,
klonipin oxycontins, klono, klonoin, klonoipn, klonopan, klonopin, klonopine, klonopines, klonopins,
klonpin, klonpion, klonzapam, klopin, kloponin, kolonapin, kolonipin, kolonopin, kolonopins, kpins,
clonazepam, clonazepams, clonazepham, clonozepam, clonapin, clonapine, clonopin, clonopine,
clonipin, clonipine, clonipins, colonopin
◦
diazapam, diazapams, diazepam, diazipam
◦
eszopiclone, lunesta
◦
fioricet, fioricet, fierocet, fioracet, fiorcet, fiorecet, fiorecett, fioricet, fiorocet, fiurecet, floricet, fiorinal, fiorinals, fiorinol, fiornal, fiorinal, fiorinals, fiorinol, fiornal
◦
flunitrazepam, rohypnol
◦
gabapentin, nerontin, neuotin, neuratin, neurontin, neuronton, neurontrin, neurotin, neuroton, neurontin
◦
◦
ghb
lorazapam, lorazapan, lorazepam, lorazepan, lorazopam, lorazpam, lorezapam, lorezepam
◦
soma, soma pills, somas
◦
valium
valium, valiums, vallium, vallum, valuem, valuim, valuims, valum, valume
◦
xanax
xaanx, xana, xanac, xanacs, xananx, xanax, xanex, xanix, xannax, xannies, xantax, xantex, xanx,
xanxa, xanxax, xnax, xznax, zanax, zanaz, zanex, zanix, zannax, zantac, zanx
◦
zolpidem
zolpidem, ambein, ambian, ambiem, ambien
◦
sedatives
sedatives, ketamine
◦
adderal, adderal, adderall, adderalll, adderol, adderral, adderrall, adderrol, addreall, aderol, aderoll,
aderrall, dexedrine, dextroamphetamine
◦
amphetamine, amphetamines
◦
lorazepam
soma
stimulant
adderall
amphetamine
◦
amphetamine
LSD
mdma
◦
lsd, acid
◦
mdma, ecstacy, ecstasty, ecstasy, exstacy, extacy
Category
Drug name
methamphetamine
methylphenidate
modafinil
general
antidepressant
general
Resolved drug terms
i
methamphetamine, meth, meth smoker, methamphedamines, methamphetamine, methamphetamines, methamphetimines, methanphetamine, smoking meth
◦
methylphenidate, ritalin, ritilan, ritilin, concerta
◦
alertec
◦
meds, drugs, drug, med
narcotics
narcotics, narc, narc meds, narc pain meds, narc painkillers, narcan, narcanon, narcatic, narcatics,
narcodics, narcotic, narcotic meds, narcotic pain killers, narcotic pain medication, narcotic pain
medications, narcotic pain medicine, narcotic pain medicines, narcotic pain meds, narcotic pain
pill, narcotic pain pills, narcotic pain reliever, narcotic pain relievers, narcotic pain-killers, narcotic
painkillers, narcotic pills, narcotics, narcotis, narcs, narctoics
painkillers
pain pill, analgesic, analgesics, pain meds, pain pills, pain killers, painkillers, pain medication, pain
medicine, pain medications, pain relievers, pain kiilers, pain killer, pain killer pills, pain killlers, pain
kills, pain kller, painpills, pain med, pain reliever, painkillers hydros, painkliiers, painmeds, pains
meds, pill, pills, narcotic pain, pils, ls, pilss, pharmaceuticals, pain
amitriptyline
amitriptyline, amiltriptyline, amitriptaline, amitripthyline, amitriptyline, amitripyline, amitryptaline,
amitryptilline
aripiprazole
aripiprazole, abilfy, abilify
citalopram
citalopram, celexa, celexia, celxa
duloxetine
duloxetine, cymbalta, cybalta, cymalta, cymbalata, cymbalta, cymbalts, cymbata, cymbolta, cynbalta
fluoxetine
prozac, fluoxetine, fluoxtine
lexapro
paroxetine
paroxetine, paroxetine, paroxotine, paxatine, paxial, paxil, paxill
trazodone
trazadone, trazodone
venlafaxine
wellbutrin
zoloft
NA
albuterol
venlafaxine, effexor, efforex, effxor, eflexor
bupropion, buproprio, wellbutrin, welbrutrin, welbutrin, wellbrutrin, wellbutrin
zoloft, zoloff
albuterol sulphate, albuterol
amoxicillin
amoxicillin, amoxcillin, amoxicillin, amoxxillin
antibiotics
antibiotics, anitbiotics, anitdepressants, anphetamines, antabuse, antibiotic, antibiotics, antibitics,
antibotics
carisoprodol
carisoprodol
clonidine
clonidine, cloadine, clondine, clonidine, clonine, clonodin, clonodine, colodine, colondine, colonidine, colonodine
cyclobenzaprine
cyclobenzaprine, flexaril, flexarill, flexeral, flexerall, flexeril, flexerill, flexerils, flexerol, flexiril, flexirils,
flexirl, flexril, flexrill
naloxone
naloxone, nalorex, naloxone
naltrexone
naltrexone, naltexone, naltraxone, naltrex, naltrexone, naltrexone hydrochloride, naltexone, naltraxone, naltrex, naltrexone, naltrexone hydrochloride
prednisone
prednisone, predensone, predinsone, predisolone, predisone, prednisolone, prednison, prednisone, prednizone
pregabalin
pregabalin, lyrica
quetiapine
quetiapine, seraquel, seraquil, sereoqol, serequel, serequil, serezone, seriquil, seroqel, seroquel,
seroquell, seroquels, seroquil, serqual, serquil
steroids
steroids, roids
vitamins
vitamins, vitaimns, vitamans, vitamians, vitamines, vitamins, vitamns, vitams, vitc, vite, vitiamins,
vitiams, vitimans, vitimins, vits, supplements
zaleplon
zaleplon, sonata
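A resolution map such as Table C.1 is typically applied by inverting it, so that each surface form points to its drug name. The following is a minimal sketch under stated assumptions: RESOLUTION_EXCERPT holds only a handful of entries copied from the table, and the names build_lookup and resolve are hypothetical and are not taken from our pipeline.

```python
# A small excerpt of Table C.1: drug name -> resolved drug terms.
RESOLUTION_EXCERPT = {
    "alcohol": ["alcohol", "alchohol", "beer", "vodka"],
    "marijuana": ["marijuana", "marajuana", "weed", "smoke weed"],
    "suboxone": ["suboxone", "subs", "subutex", "buprenorphine"],
    "oxycodone": ["oxycodone", "oxy", "roxies", "oxycontin"],
}


def build_lookup(resolution_map: dict) -> dict:
    """Invert the map so that every term points to its drug name."""
    lookup = {}
    for drug_name, terms in resolution_map.items():
        for term in terms:
            lookup[term.lower()] = drug_name
    return lookup


def resolve(term: str, lookup: dict):
    """Return the drug name for an extracted term, or None if unknown."""
    # Lowercase and collapse whitespace so "Smoke  WEED" matches "smoke weed".
    normalized = " ".join(term.lower().split())
    return lookup.get(normalized)


if __name__ == "__main__":
    lookup = build_lookup(RESOLUTION_EXCERPT)
    print(resolve("Subutex", lookup))
    print(resolve("smoke weed", lookup))
```

Exact lookup is enough here because the table already enumerates the misspellings observed in classifier output; terms outside the map resolve to None rather than to a guessed drug.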
Table C.2: The default feature list for Stanford’s NER classifier is at nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html. Here, we list all features whose default
values were changed to train our DOC classifier.
Feature Name | Feature Value
useTag | true
useClassFeature | true
useWord | true
maxNGramLeng | 3
useNGrams | true
usePrev | true
useNext | true
useSequences | true
usePrevSequences | true
maxLeft | 1
useTypeSeqs | false
useTypeSeqs2 | false
useTypeySequences | false
wordShape | chris2useLC
useLemmas | true
useDistSim | true
distSimLexicon | We used Twitter word clusters [189] and word clusters generated using the Brown hierarchical word clustering algorithm [32, 157] on all MedHelp posts.
useDisjunctive | true
disjunctionWidth | 3
cleanGazette | true
gazette | We utilized a dictionary composed from several online lists of commonly misused substances. Table C.3 shows all dictionary terms.
Table C.3: Gazette of common substances used as a feature in the DOC classifier. This gazette was
compiled from a range of online resources.
Acamprosate, acid, actiq, adderall, aerosol propellants, alcohol, alprazolam, ambien, amidone, amobarbital, amphetamine, amphetamines, amytal, anadrol, anexsia, angel dust, antabuse, apache, ativan, avinza
Barbs, beer, bennies, bidis, big o, biocodone, biocondone, biphetamine, biscuits, black beauties, black stuff, blue
heaven, blues, blunt, buprenorphine, butalbital, butane propane, butorphanol
Cactus, campral, captain cody, carisoprodol, cat valium, chalk, charlie, china girl, china white, chlordiazepoxide,
cigarettes, cigars, clarity, clonazepam, clonidine, cocaine, cocaine hydrochloride, codeine, cody, coke, concerta,
crack, crack cocaine, crank, crosses, crystal, crystal meth, cubes, cyclohexyl
Damason-p, dance fever, darvocet, darvon, demerol, demmies, depade, depo-testosterone, desoxyn, dexedrine,
dextroamphetamine, dextromethorphan, dextropropoxyphene, dextrostat, di-gesic, diacetylmorphine, diazepam, dicodid, dilaudid, dillies, disulfiram, dolophine, dope, downers, duodin, durabolin, duragesic, duramorph, dxm
Ecstasy, empirin, empirin with codeine, equipoise, eszopiclone
Fentanyl, fioricet, fiorinal, fiorinal with codeine, fizzies, flake, flunitrazepam, forget-me pill
Gamma-hydroxybutyrate, ganja, gasoline, georgia home boy, ghb, glues, goodfella, goop, grievous bodily harm, gym
candy
Halcion, hash, hash oil, hearts, hemp, heroin, hillbilly, hycodan, hydrococet, hydrocodone, hydromorphone, hydros
Inhalant, isoamyl isobutyl
Jackpot, jif, joint
Kadian, kapanol, ketalar sv, ketamine, klonopin
La turnaround, laam, laudanum, laughing gas, levacetylmethadol, librium, liquid ecstasy, liquid x, liquor, little smoke,
lorazepam, lorcet, lortab, love boat, lover’s speed, lsd, luminal, lunesta, lysergic acid diethylamide
Magic mint, magic mushrooms, maria pastora, marijuana, mary jane, meperidine, meperidine hydrochloride, mesc,
mescaline, meth, methadone, methadose, methadrine, methamphetamine, methaqualone, methylphenidate, mexican valium, microdot yellow sunshine, miss emma, monkey, morphine, mrs. o, ms contin, msir, murder 8, mushrooms
Naltrexone, nembutal, nitrites, nitrous oxide, norco, numorphone, numporphan
O bomb, o.c., octagons, opana, opium, oramorph, orlaam, oxandrin, oxy, oxycet, oxycodone, oxycontin, oxycotton
Paint thinners, palladone, panacet, paregoric, pcp, peace, peace pill, pentobarbital, percocet, percocet:oxy, percodan, percs, peyote, phencyclidine, phennies, phenobarbital, poppers, pot, pumpers, purple passion
Quaalude
R-ball, red birds, reds, reefer, revia, ritalin, roach, robitussin, robitussin a, robitussin a-c, robitussin b, robitussin c,
robo, robotripping, roche, rohypnol, roids, roofies, roofinol, rophies, roxanol, roxicodone, roxicondone, ryzolt
Sally-d, salvia, schoolboy, secobarbital, seconal, shepherdess’s herb, shrooms, sinsemilla, skag, skippy, sleeping
pills, smack, smoke, snappers, solvents, soma, sonata, special k, speed, steroids, stilnox, stop signs, sublimaze,
suboxone, subutex, symtan
Tango and cash, temesta, the smart drug, tnt, tooies, toot, tramadol, tramal, tranks, triazolam, triple c, truck drivers,
tussionex, tylenol, tylenol with codeine, tylox
Ultram, uppers
Valium, vicodin, vicoprofen, vike, vitamin k, vitamin r, vivitrol
Watson-387, weed, white horse, white stuff, wine
Xanax, xodol
Yellow jackets, yellows
Zaleplon, zolpidem, zydone
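Because many gazette entries span several words ("tylenol with codeine", "crystal meth"), a lookup has to consider token sequences rather than single tokens. The sketch below shows one generic way to do this, a greedy longest-match scan. It is an assumption made for illustration and does not reproduce how Stanford's NER classifier consumes a gazette internally; GAZETTE_EXCERPT, gazette_matches and the max_len default are hypothetical.

```python
# A few entries copied from Table C.3, lowercased.
GAZETTE_EXCERPT = {
    "angel dust", "crystal", "crystal meth", "meth",
    "tylenol", "tylenol with codeine", "xanax",
}


def gazette_matches(tokens, gazette, max_len=3):
    """Greedy longest-match scan; returns (start, end, phrase) tuples."""
    tokens = [token.lower() for token in tokens]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazette:
                matches.append((i, i + n, phrase))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no entry starts at this token
    return matches


if __name__ == "__main__":
    post = "I took Tylenol with codeine and crystal meth".split()
    print(gazette_matches(post, GAZETTE_EXCERPT))
```

Preferring the longest match keeps "crystal meth" from being reported as the separate entries "crystal" and "meth".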
Appendix D
F77 Phase Supplementary Material
Table D.1: LIWC features for the three classes in the labeled dataset over initiating posts. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (*
p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate
across all 184 variables (includes activity features). Column c denotes (◦) if the feature is used in our
CRF classifier.
Initiating Post Linguistic Features
Word count
Dic
Numerals
Function words
Pronoun
Personal pronoun
Pronoun: I
Pronoun: you
Pronoun: he/she
Pronoun: they
Pronoun: impers.
Verb
Present tense
Numbers
Social
Humans
Affect
Affect: positive
Affect: anxiety
Cognitive Mech.
Certain
Inhibition
See
Feel
Biological
Body
Health
Relative
Time
Home
Comma
QMark
Other Punctuation
c
p
USING
Mean
Median
◦
◦
◦
◦
*
***
***
***
***
***
***
***
***
***
*
**
***
**
***
*
***
***
**
*
*
*
*
***
***
***
***
***
***
***
**
*
***
208.20
89.26
1.28
60.50
18.51
12.83
9.72
0.98
1.14
0.65
5.68
18.54
12.56
0.71
7.60
0.49
5.30
2.80
0.61
17.27
1.21
0.50
0.34
0.73
3.87
0.58
3.00
13.46
7.24
0.30
3.01
1.35
0.81
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
151
90.17
0.89
60.92
18.68
13.05
9.97
0.41
0
0.20
5.33
18.69
12.55
0.48
6.59
0
5.00
2.45
0.25
16.98
1.03
0.23
0
0.45
3.46
0
2.63
13.39
6.86
0
2.17
0.52
0
WITHDRAWAL
SD
Mean
Median
211.06
4.89
1.51
5.31
4.32
3.83
3.60
1.70
2.08
1.05
2.82
3.76
3.90
0.93
4.79
0.76
2.76
1.99
0.88
4.50
1.22
0.70
0.65
1.10
2.63
1
2.29
4.65
3.46
0.54
3.36
2.87
1.77
178.92
88.10
1.75
58.40
16.99
11.49
9.02
1.02
0.74
0.47
5.49
17.64
11.53
0.75
6.38
0.40
5.76
3.33
0.55
17.14
1.41
0.41
0.30
1.18
4.01
1.13
2.58
15.04
8.51
0.40
2.75
1.34
0.89
127.00
88.89
1.33
59.28
17.17
11.54
9.14
0.13
0
0
5.26
17.59
11.24
0.37
5.26
0
5.54
2.86
0
17.09
1.21
0
0
0.83
3.70
0.63
2.13
14.75
7.87
0
1.94
0.40
0
POST-WITHDRAWAL
SD
Mean
Median
168.81
6.26
1.97
6.45
4.70
4.19
3.76
2.04
1.82
1.12
2.81
4.20
4.09
1.12
5.18
0.79
3.09
2.85
0.98
4.95
1.41
0.74
0.80
1.50
2.90
1.53
2.36
5.25
4.21
0.77
3.27
2.58
1.91
183.23
89.38
1.32
59.74
17.97
11.88
7.89
2.05
1
0.54
6.09
18.13
11.95
0.54
8.85
0.57
6.41
4.14
0.45
17.93
1.57
0.43
0.50
0.85
3.31
0.68
2.20
13.72
7.33
0.68
2.19
1.50
0.62
124.50
90.54
0.83
60.48
18.16
11.86
8.18
0.99
0
0
5.76
17.96
11.63
0
7.89
0
6.11
3.50
0
17.96
1.36
0
0
0.50
2.89
0
1.72
13.61
7.02
0.14
1.63
0
0
SD
209.24
6.59
2.04
7.07
5.28
4.60
4.31
2.89
2.46
1.03
3.35
4.91
4.45
0.89
5.90
1.04
3.52
3.16
0.90
5.11
1.53
0.76
1.14
1.23
2.72
1.12
2.25
5.23
4.23
1.18
2.43
4.92
2.05
Table D.2: LIWC features for the three classes in the labeled dataset. Only statistically significant variables are shown. Statistical significance is determined using Kruskal-Wallis tests (* p < 0.05; ** p <
0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184
variables (includes activity features). Column c denotes (◦) if the feature is used in our CRF classifier.
Response Post Linguistic Features
c
Word count
Words per sentence
Numerals
Function words
Personal Pronouns
Pronoun: she/he
Pronoun: they
Pronoun: impers.
Article
Verb
Aux. verb
Future
Preposition
Conjunction
Quantitative
Social
Affect
Affect: positive
Affect: negative
Affect: anxiety
Cognitive Proc.
Discrepancy
Tentative
Exclusive
Perceptual proc.
Feel
Biological
Body
Health
Sexual
Ingestion
Relativity
Time
Money
Assent
Colon
Exclamation
Dash
Other punct.
All punct.
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
◦
p
USING
Mean
Median
***
***
*
***
***
**
***
**
***
**
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
*
**
**
*
***
**
***
**
***
***
494.69
19.21
0.75
59.01
10.86
0.68
0.66
5.48
4.91
17.26
10.67
1.50
11.63
6.39
3.00
10.26
5.73
3.72
1.96
0.36
19.37
2.32
3.35
3.35
1.52
0.64
3.46
0.52
2.68
0.15
0.17
11.46
5.29
0.32
0.27
0.09
1.02
0.79
3.41
22.07
347.00
15.40
0.43
59.85
11.36
0
0.41
5.67
4.98
18.15
11.11
1.44
12.27
6.76
2.99
10.11
5.76
3.53
1.92
0.24
17.81
2.32
3.25
3.40
1.48
0.53
3.20
0.28
2.45
0
0
11.82
5.10
0.13
0.07
0
0.34
0.28
2.84
21.51
WITHDRAWAL
SD
Mean
Median
506.67
18.60
1.02
4.56
3.99
1.35
0.91
2.20
2.06
4.94
3.51
1.07
3.57
2.33
1.52
4.77
2.68
2.43
1.34
0.47
7.77
1.31
1.79
1.62
1.07
0.70
2.17
0.78
1.85
0.35
0.39
4.39
2.90
0.55
0.50
0.20
1.79
2.08
2.55
9.71
427.38
17.04
0.95
56.95
10.21
0.44
0.49
5.57
4.75
17.13
10.37
1.50
11.19
6.18
2.94
8.83
6.55
4.61
1.90
0.40
18.71
1.92
3.12
3.07
1.90
0.91
3.42
0.78
2.24
0.14
0.30
12.36
5.88
0.28
0.40
0.15
2.25
0.82
4.29
25.75
284.00
14.09
0.68
57.69
10.53
0
0.27
5.78
4.96
17.82
10.68
1.43
11.66
6.58
2.88
8.75
6.34
4.10
1.87
0.23
17.43
1.88
3.09
3.18
1.81
0.76
3.22
0.45
1.95
0
0
12.68
6.06
0
0.18
0
0.82
0
3.53
23.69
POST-WITHDRAWAL
SD
Mean
Median
487.46
14.73
1.12
5.41
3.81
1.16
0.66
2.36
2.02
4.88
3.44
1.13
3.38
2.46
1.64
4.23
3.25
3.17
1.33
0.55
7.83
1.33
1.77
1.66
1.34
0.85
2.46
1.08
1.90
0.36
0.66
4.70
3.12
0.56
0.81
0.42
5.08
2.20
3.22
14.52
356.29
14.98
0.95
55.82
10.86
0.64
0.49
5.10
4.20
16.09
9.66
1.10
10.61
5.72
2.50
9.78
7.54
5.84
1.67
0.32
18.77
1.63
2.55
2.56
1.87
0.65
2.71
0.52
1.70
0.30
0.25
11.90
5.66
0.23
0.62
0.27
4.52
0.62
5.64
29.69
210.50
12.99
0.56
57.06
11.58
0
0.13
5.32
4.41
17.23
10.33
1.01
11.51
6.13
2.58
9.81
7.33
5.13
1.50
0
16.80
1.60
2.45
2.60
1.68
0.45
2.41
0.19
1.32
0
0
12.50
5.69
0
0.27
0
1.68
0
4.29
26.82
SD
439.75
14.25
1.49
7.17
4.71
1.63
0.90
2.75
2.23
5.78
3.96
1.03
4.14
2.69
1.67
5.45
4.31
4.36
1.51
0.61
10
1.30
1.96
1.83
1.55
0.76
2.39
0.90
1.76
0.89
0.71
5.37
3.33
0.42
2.01
0.84
8.40
1.64
6.35
19.27
Days since last init. post
Days since last self resp.
Days since last response
Days since last activity
# initiating posts
# replies received
# respondents
# self responses
◦
◦
◦
◦
◦
◦
◦
◦
.
.
**
***
***
***
***
***
***
***
***
***
***
***
***
p
#
Responses#
#
Initiating
terms
terms
terms
RECOVERING
WITHDRAWING
USING
Days clean
Days mentioned
# questions
# USING terms
# WITHDRAWING terms
# RECOVERING terms
***
***
**
***
***
***
**
***
***
◦
◦
◦
◦
◦
◦
◦
◦
◦
Post and Response Content Characteristics
Today
# initiating posts
Last 5 days
All time
Activity Characteristics
c
0.31
0.86
0.53
421.15
52.10
2.94
0.73
0.50
0.38
5.15
3.82
1.57
0.93
1.37
1.87
1.02
8.84
13.93
26.90
1.36
50.94
66.34
73.37
39.56
Mean
0.00
0.00
0.00
14.00
10.00
2.00
0.00
0.00
0.00
4.00
3.00
1.00
0.00
0.00
0.00
1.00
5.00
5.00
6.00
1.00
5.00
9.00
5.00
3.00
Med
IQR
0.00
1.00
1.00
175.00
38.25
3.00
1.00
1.00
1.00
5.00
3.00
2.00
1.00
2.00
2.00
0.00
10.00
18.00
21.00
1.31
24.00
43.50
27.00
13.00
USING
0.00
0.00
0.00
17.79
11.86
1.48
0.00
0.00
0.00
2.97
2.97
1.48
0.00
0.00
0.00
0.00
5.93
7.41
8.90
1.02
5.93
11.86
5.93
2.97
MAD
0.19
1.18
0.53
47.50
19.08
2.35
0.35
1.11
0.39
5.52
4.05
1.89
2.01
3.32
5.48
1.06
8.78
13.80
23.61
1.28
21.04
29.94
33.51
16.66
Mean
0.00
1.00
0.00
5.00
5.00
2.00
0.00
1.00
0.00
4.00
3.00
1.00
1.00
1.00
1.00
1.00
5.00
8.00
8.00
0.82
2.00
2.00
2.00
1.00
Med
0.00
2.00
1.00
7.00
7.00
2.00
1.00
2.00
1.00
5.00
3.00
3.00
3.00
4.00
6.00
0.58
8.00
15.00
21.00
1.26
5.00
8.00
6.00
2.00
IQR
WITHDRAWING
0.00
1.48
0.00
4.45
4.45
1.48
0.00
1.48
0.00
2.97
2.97
1.48
1.48
1.48
1.48
0.64
4.45
8.90
10.38
0.85
1.48
1.48
1.48
0.00
MAD
0.18
0.76
0.78
125.97
57.03
2.60
0.25
0.44
0.94
6.09
4.68
1.53
1.81
2.89
15.20
0.58
20.73
33.26
178.69
0.53
31.04
42.05
28.68
17.76
Mean
0.00
0.00
0.00
45.00
27.00
2.00
0.00
0.00
1.00
4.00
3.00
1.00
1.00
0.00
5.00
0.33
14.00
23.00
67.00
0.22
4.00
6.00
2.00
1.00
Med
0.00
1.00
1.00
74.00
48.00
2.00
0.00
1.00
1.00
6.00
4.00
2.00
3.00
4.00
16.00
0.87
22.00
36.25
159.25
0.35
12.00
17.00
5.00
4.00
IQR
RECOVERING
0.00
0.00
0.00
43.00
28.17
1.48
0.00
0.00
1.48
4.45
2.97
1.48
1.48
0.00
7.41
0.42
13.34
23.72
83.77
0.21
4.45
7.41
1.48
0.00
MAD
Table D.3: Activity and content-based features for the three classes in the labeled dataset. Statistical significance is determined using Kruskal-Wallis
tests (* p < 0.05; ** p < 0.005; *** p < 0.001) after Bonferroni corrections to adjust for family-wise error rate across all 184 variables (includes 160
LIWC variables). Column c denotes (◦) if the feature is used in our CRF classifier.
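The significance markers in Tables D.1 to D.3 follow from a Bonferroni adjustment across all 184 variables. The sketch below illustrates only that adjustment and the mapping to stars. The raw p-values in the usage example are invented for illustration; in practice they would come from a Kruskal-Wallis test per variable (for example via scipy.stats.kruskal, which is an assumption about tooling and not something the tables state). The name bonferroni_stars is hypothetical.

```python
def bonferroni_stars(raw_p_values: dict, n_tests: int = 184) -> dict:
    """Map raw p-values to significance stars after Bonferroni correction."""
    stars = {}
    for name, p in raw_p_values.items():
        # Bonferroni: multiply by the number of tests, capped at 1.
        p_adj = min(1.0, p * n_tests)
        if p_adj < 0.001:
            stars[name] = "***"
        elif p_adj < 0.005:
            stars[name] = "**"
        elif p_adj < 0.05:
            stars[name] = "*"
        else:
            stars[name] = ""  # not significant; omitted from the tables
    return stars


if __name__ == "__main__":
    # Hypothetical raw p-values, chosen only to exercise each threshold.
    example = {"var_a": 1e-7, "var_b": 2e-5, "var_c": 2e-4, "var_d": 0.01}
    print(bonferroni_stars(example))
```

Multiplying each p-value by the number of tests is equivalent to dividing each threshold by 184, which is why a raw p-value of 0.01 is far from significant here.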
Bibliography
[1] Alcoholics Anonymous (“Big Book,” 4th ed.). AA World Services, Inc. (2001). [Online: http://www.aa.org/bigbookonline, accessed 20-May-2014].
[2] Narcotics Anonymous Annual Membership Survey. Narcotics Anonymous (2011). [Online: http://www.na.org/admin/include/spaw2/uploads/pdf/PR/NA_Membership_Survey.pdf, accessed 12-August-2013].
[3] Vital signs: Overdoses of prescription opioid pain relievers United States, 1999-2008. Center for Disease Control. Morbidity and Mortality Weekly Report. (2011). [Online: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm6043a4.htm, accessed 93/4/2014.].
[4] Addiction medicine, closing the gap between science and practice. CASAColumbia (2012). [Online: http://www.casacolumbia.org/download/file/fid/1177, accessed 4/5/2014.].
[5] Commonly abused prescription drugs. National Institute on Drug Abuse (2012). [Online: http://www.drugabuse.gov/sites/default/files/rx_drugs_placemat_508c_10052011.pdf, accessed 28-May-2014].
[6] Opiate withdrawal. MedlinePlus - U.S. National Library of Medicine (2012). [Online: http://www.nlm.nih.gov/medlineplus/ency/article/000949.htm, accessed 28-May-2014].
[7] Internet user demographics. [Online: http://www.pewinternet.org/data-trend/internet-use/latest-stats/, accessed 7/1/2014].
[8] Prescription painkiller overdoses: A growing epidemic, especially among women. Vital Signs. CS238899B. Center for Disease Control. (2013). [Online: http://www.cdc.gov/vitalsigns/pdf/2013-07-vitalsigns.pdf, accessed 9/4/2014].
[9] State and County QuickFacts. U.S. Census Bureau (2013). [Online: http://quickfacts.census.gov/qfd/states/00000.html, accessed 28-August-2014].
[10] Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., and Liu, B. Predicting flu trends using Twitter
data. In Computer Communications Workshops, IEEE (2011), 702–707.
[11] Ahmad, F., Hudak, P. L., Bercovitz, K., Hollenberg, E., and Levinson, W. Are physicians ready for
patients with Internet-based health information? Journal of Medical Internet Research 8, 3 (2006),
e22.
[12] Alpers, G. W., Winzelberg, A. J., Classen, C., Roberts, H., Dev, P., Koopman, C., and Barr Taylor,
C. Evaluation of computerized text analysis in an Internet breast cancer support group. Computers
in Human Behavior 21, 2 (2005), 361–376.
[13] Anand, S. G., Feldman, M. J., Geller, D. S., Bisbee, A., and Bauchner, H. A content analysis
of e-mail communication between primary care providers and parents. Pediatrics 115, 5 (2005),
1283–1288.
[14] Anderson, J. G., Rainey, M. R., and Eysenbach, G. The impact of cyberhealthcare on the
physician–patient relationship. Journal of Medical Systems 27, 1 (2003), 67–84.
[15] Aramaki, E., Maskawa, S., and Morita, M. Twitter catches the flu: detecting influenza epidemics
using Twitter. In Empirical Methods in Natural Language Processing, ACL (2011), 1568–1576.
[16] Aronson, A. R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap
program. In American Medical Informatics Association Annual Symposium, AMIA (2001), 17.
[17] Aronson, A. R., and Lang, F.-M. An overview of MetaMap: historical perspective and recent
advances. Journal of the American Medical Informatics Association 17, 3 (2010), 229–236.
[18] Ayers, J. W., Ribisl, K. M., and Brownstein, J. S. Tracking the rise in popularity of electronic nicotine
delivery systems (electronic cigarettes) using search query surveillance. American Journal of
Preventive Medicine 40, 4 (2011), 448–453.
[19] Baccianella, S., Esuli, A., and Sebastiani, F. SentiWordNet 3.0: An enhanced lexical resource for
sentiment analysis and opinion mining. In Language Resources and Evaluation (2010).
[20] Bebbington, P. E. The efficacy of Alcoholics Anonymous: the elusiveness of hard data. The British
Journal of Psychiatry 128, 6 (1976), 572–580.
[21] Bell, V. Online information, extreme communities and Internet therapy: Is the Internet good for our
mental health? Journal of Mental Health 16, 4 (2007), 445–457.
[22] Bender, J. L., Jimenez-Marroquin, M.-C., and Jadad, A. R. Seeking support on Facebook: a
content analysis of breast cancer groups. Journal of Medical Internet Research 13, 1 (2011), e16.
[23] Benton, A., Ungar, L., Hill, S., Hennessy, S., Mao, J., Chung, A., Leonard, C. E., and Holmes,
J. H. Identifying potential adverse effects using the web: A new approach to medical hypothesis
generation. Journal of Biomedical Informatics 44, 6 (2011), 989–996.
[24] Berger, M., Wagner, T. H., and Baker, L. C. Internet use and stigmatized illness. Social Science &
Medicine 61, 8 (2005), 1821–1827.
[25] Berland, G. K., Elliott, M. N., Morales, L. S., Algazy, J. I., Kravitz, R. L., Broder, M. S., Kanouse,
D. E., Muñoz, J. A., Puyol, J.-A., Lara, M., et al. Health information on the Internet: accessibility,
quality, and readability in English and Spanish. Journal of the American Medical Association 285,
20 (2001), 2612–2621.
[26] Bernstein, M. S., Little, G., Miller, R. C., Hartmann, B., Ackerman, M. S., Karger, D. R., Crowell,
D., and Panovich, K. Soylent: a word processor with a crowd inside. In User Interface Software
and Technology, ACM (2010), 313–322.
[27] Birnbaum, H. G., White, A. G., Schiller, M., Waldman, T., Cleveland, J. M., and Roland, C. L.
Societal costs of prescription opioid abuse, dependence, and misuse in the United States. Pain
Medicine 12, 4 (2011), 657–667.
[28] Biyani, P., Caragea, C., Mitra, P., and Yen, J. Identifying emotional and informational support in
online health communities. In Computational Linguistics, ICCL (2014), 827–836.
[29] Braithwaite, D. O., Waldron, V. R., and Finn, J. Communication of social support in computer-mediated groups for people with disabilities. Health Communication 11, 2 (1999), 123–151.
[30] Braun, V., and Clarke, V. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101.
[31] Brennan, P. F., and Aronson, A. R. Towards linking patients and clinical information: detecting
UMLS concepts in e-mail. Journal of Biomedical Informatics 36, 4 (2003), 334–341.
[32] Brown, P. F., Desouza, P. V., Mercer, R. L., Pietra, V. J. D., and Lai, J. C. Class-based n-gram
models of natural language. In Computational Linguistics, vol. 18, ICCL (1992), 467–479.
[33] Brownstein, J. S., Freifeld, C. C., Reis, B. Y., and Mandl, K. D. Surveillance Sans Frontieres:
Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS
Medicine 5, 7 (2008), e151.
[34] Buchanan, H., and Coulson, N. S. Accessing dental anxiety online support groups: An exploratory
qualitative study of motives and experiences. Patient Education and Counseling 66, 3 (2007),
263–269.
[35] Buehler, J. W., Berkelman, R. L., Hartley, D. M., and Peters, C. J. Syndromic surveillance and
bioterrorism-related epidemics. Emerging Infectious Diseases 9, 10 (2003), 1197.
[36] Buis, L. R. Emotional and informational support messages in an online hospice support community. Computers Informatics Nursing 26, 6 (2008), 358–367.
[37] Bundorf, M. K., Wagner, T. H., Singer, S. J., and Baker, L. C. Who searches the Internet for health
information? Health Services Research 41, 3p1 (2006), 819–836.
[38] Butler, D. When Google got flu wrong. Nature 494, 7436 (2013), 155.
[39] Card, S. K., Mackinlay, J. D., Pirolli, P. L., and Pitkow, J. E. Method and apparatus for clustering a
collection of linked documents using co-citation analysis, 2000. US Patent 6,038,574.
[40] Carmichael, A. Infertility-Asthma Link Confirmed. Cure Together Blog. [Online:
www.curetogether.com/blog/2011/03/07/infertility-asthma-link-confirmed, accessed 15-Sept-2013].
[41] Carneiro, H. A., and Mylonakis, E. Google trends: a web-based tool for real-time surveillance of
disease outbreaks. Clinical Infectious Diseases 49, 10 (2009), 1557–1564.
[42] Cartright, M.-A., White, R. W., and Horvitz, E. Intentions and attention in exploratory health search.
In Research and Development in Information Retrieval, ACM SIGIR (2011), 65–74.
[43] Chapman, W. W., Fiszman, M., Dowling, J. N., Chapman, B. E., and Rindflesch, T. C. Identifying
respiratory findings in emergency department reports for biosurveillance using MetaMap. Medinfo
11, Pt 1 (2004), 487–91.
[44] Chary, M., Genes, N., McKenzie, A., and Manini, A. F. Leveraging social networks for toxicovigilance. Journal of Medical Toxicology 9, 2 (2013), 184–191.
[45] Chee, B. W., Berlin, R., and Schatz, B. Predicting adverse drug events from personal health
messages. In American Medical Informatics Association Annual Symposium, AMIA (2011), 217.
[46] Cicero, T. J., Ellis, M. S., and Surratt, H. L. Effect of abuse-deterrent formulation of OxyContin. New
England Journal of Medicine 367, 2 (2012), 187–189.
[47] Civan, A., and Pratt, W. Threading together patient expertise. In American Medical Informatics
Association Annual Symposium, AMIA (2007), 140.
[48] Cleveland, W. S., and Devlin, S. J. Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association 83, 403 (1988), 596–610.
[49] Cline, R. J., and Haynes, K. M. Consumer health information seeking on the Internet: the state of
the art. Health Education Research 16, 6 (2001), 671–692.
[50] Cohen, J. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial
credit. Psychological Bulletin 70, 4 (1968), 213.
[51] Coiera, E. Information epidemics, economics, and immunity on the Internet: We still know so little
about the effect of information on public health. British Medical Journal 317, 7171 (1998), 1469.
[52] Collier, N., Doan, S., Kawazoe, A., Goodwin, R. M., Conway, M., Tateno, Y., Ngo, Q.-H., Dien, D.,
Kawtrakul, A., Takeuchi, K., et al. BioCaster: detecting public health rumors with a web-based text
mining system. Bioinformatics 24, 24 (2008), 2940–2941.
[53] Cooper, C. P., Mallon, K. P., Leadbetter, S., Pollack, L. A., and Peipins, L. A. Cancer Internet
search activity on a major search engine, United States 2001-2003. Journal of Medical Internet
Research 7, 3 (2005), e36.
[54] Corazza, O., Valeriani, G., Bersani, F. S., Corkery, J., Martinotti, G., Bersani, G., and Schifano,
F. “Spice”, “Kryptonite”, “Black Mamba”: An Overview of Brand Names and Marketing Strategies
of Novel Psychoactive Substances on the Web. Journal of Psychoactive Drugs 46, 4 (2014),
287–294.
[55] Corley, C., Mikler, A. R., Singh, K. P., and Cook, D. J. Monitoring influenza trends through mining
social media. In Bioinformatics and Computational Biology (2009), 340–346.
[56] Corley, C. D., Cook, D. J., Mikler, A. R., and Singh, K. P. Text and structural data mining of influenza
mentions in web and social media. International Journal of Environmental Research and Public
Health 7, 2 (2010), 596–615.
[57] Cotten, S. R., and Gupta, S. S. Characteristics of online and offline health information seekers
and factors that discriminate between them. Social Science & Medicine 59, 9 (2004), 1795–1806.
[58] Coulson, N. S. Receiving social support online: an analysis of a computer-mediated support group
for individuals living with irritable bowel syndrome. CyberPsychology & Behavior 8, 6 (2005), 580–
584.
[59] Coulson, N. S., Buchanan, H., and Aubeeluck, A. Social support in cyberspace: a content analysis
of communication within a Huntington’s disease online support group. Patient Education and
Counseling 68, 2 (2007), 173–178.
[60] Coulson, N. S., and Knibb, R. C. Coping with food allergy: exploring the role of the online support
group. CyberPsychology & Behavior 10, 1 (2007), 145–148.
[61] Coursaris, C. K., and Liu, M. An analysis of social support exchanges in online HIV/AIDS self-help
groups. Computers in Human Behavior 25, 4 (2009), 911–918.
[62] Culotta, A. Towards detecting influenza epidemics by analyzing Twitter messages. In workshop
on Social Media Analytics, ACM (2010), 115–122.
[63] Culotta, A. Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter
messages. Language Resources and Evaluation 47, 1 (2013), 217–238.
[64] Culver, J. D., Gerr, F., Frumkin, H., et al. Medical information on the Internet. Journal of General
Internal Medicine 12, 8 (1997), 466–470.
[65] Curtis, B., Alanis-Hirsch, K., Kaynak, Ö., Cacciola, J., Meyers, K., and McLellan, A. T. Using
web searches to track interest in synthetic cannabinoids (aka “herbal incense”). Drug and Alcohol
Review 34, 1 (2014), 105–108.
[66] Dasgupta, N., Freifeld, C., Brownstein, J. S., Menone, C. M., Surratt, H. L., Poppish, L., Green,
J. L., Lavonas, E. J., and Dart, R. C. Crowdsourcing black market prices for prescription opioids.
Journal of Medical Internet Research 15, 8 (2013), e178.
[67] Davison, K. P., Pennebaker, J. W., and Dickerson, S. S. Who talks? The social psychology of
illness support groups. American Psychologist 55, 2 (2000), 205.
[68] De Bock, G. H., Jacobi, C. E., Seynaeve, C., Krol-Warmerdam, E. M., Blom, J., Van Asperen, C. J.,
Cornelisse, C. J., Klijn, J. G., Devilee, P., Tollenaar, R. A., et al. A family history of breast cancer
will not predict female early onset breast cancer in a population-based setting. BMC Cancer 8, 1
(2008), 203.
[69] De Choudhury, M., Counts, S., and Horvitz, E. Major life changes and behavioral markers in social
media: case of childbirth. In Computer Supported Cooperative Work, ACM (2013), 1431–1442.
[70] De Choudhury, M., Counts, S., and Horvitz, E. Predicting postpartum changes in emotion and
behavior via social media. In Human Factors in Computing Systems, ACM (2013), 3267–3276.
[71] De Choudhury, M., Counts, S., Horvitz, E. J., and Hoff, A. Characterizing and predicting postpartum depression from shared Facebook data. In Computer Supported Cooperative Work, ACM
(2014), 626–638.
[72] De Choudhury, M., Gamon, M., Counts, S., and Horvitz, E. Predicting depression via social media.
In International Conference on Weblogs and Social Media, AAAI (2013).
[73] Deluca, P., Davey, Z., Corazza, O., Di Furia, L., Farre, M., Flesland, L. H., Mannonen, M., Majava,
A., Peltoniemi, T., Pasinetti, M., et al. Identifying emerging trends in recreational drug use; outcomes from the Psychonaut Web Mapping Project. Progress in Neuro-Psychopharmacology and
Biological Psychiatry 39, 2 (2012), 221–226.
[74] Diaz, J. A., Griffith, R. A., Ng, J. J., Reinert, S. E., Friedmann, P. D., and Moulton, A. W. Patients’
use of the Internet for medical information. Journal of General Internal Medicine 17, 3 (2002),
180–185.
[75] DiClemente, C. C., Prochaska, J. O., Fairhurst, S. K., Velicer, W. F., Velasquez, M. M., and Rossi,
J. S. The process of smoking cessation: an analysis of precontemplation, contemplation, and
preparation stages of change. Journal of Consulting and Clinical Psychology 59, 2 (1991), 295.
[76] Dingare, S., Nissim, M., Finkel, J., Manning, C., and Grover, C. A system for identifying named
entities in biomedical text: How results from two evaluations reflect on both the system and the
evaluations. Comparative and Functional Genomics 6, 1-2 (2005), 77–85.
[77] Doing-Harris, K. M., and Zeng-Treitler, Q. Computer-assisted update of a consumer health vocabulary through mining of social network data. Journal of Medical Internet Research 13, 2 (2011),
e37.
[78] Dunning, T. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19, 1 (1993), 61–74.
[79] DuPont, R. L., McLellan, A. T., White, W. L., Merlo, L. J., and Gold, M. S. Setting the standard
for recovery: Physicians’ health programs. Journal of Substance Abuse Treatment 36, 2 (2009),
159–171.
[80] Esquivel, A., Meric-Bernstam, F., and Bernstam, E. V. Accuracy and self correction of information
received from an Internet breast cancer list: content analysis. British Medical Journal 332, 7547
(2006), 939–942.
[81] Eysenbach, G. Infodemiology: tracking flu-related searches on the web for syndromic surveillance.
In American Medical Informatics Association Annual Symposium, AMIA (2006), 244–248.
[82] Eysenbach, G., and Köhler, C. How do consumers search for and appraise health information on
the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews.
British Medical Journal 324, 7337 (2002), 573.
[83] Eysenbach, G., Powell, J., Kuss, O., and Sa, E.-R. Empirical studies assessing the quality of
health information for consumers on the world wide web: a systematic review. Journal of the
American Medical Association 287, 20 (2002), 2691–2700.
[84] Farrell, M. Opiate withdrawal. Addiction 89, 11 (1994), 1471–1475.
[85] Fernandez-Luque, L., Karlsen, R., and Bonander, J. Review of extracting information from the
social web for health personalization. Journal of Medical Internet Research 13, 1 (2011), e15.
[86] Finfgeld, D. L. Therapeutic groups online: the good, the bad, and the unknown. Issues in Mental
Health Nursing 21, 3 (2000), 241–255.
[87] Finkel, J., Dingare, S., Nguyen, H., Nissim, M., Manning, C., and Sinclair, G. Exploiting context
for biomedical entity recognition: from syntax to the web. In joint workshop on Natural Language
Processing in Biomedicine and its Applications, ACL (2004), 88–91.
[88] Fleiss, J. L. Measuring nominal scale agreement among many raters. Psychological Bulletin 76,
5 (1971), 378.
[89] Fox, N., Ward, K., and O’Rourke, A. Pro-anorexia, weight-loss drugs and the Internet: an “anti-recovery” explanatory model of anorexia. Sociology of Health & Illness 27, 7 (2005), 944–971.
[90] Fox, S. Peer-to-Peer Health Care. Pew Internet & American Life Project, 2011. [Online:
http://pewinternet.org/Reports/2011/P2PHealthcare/Summary-of-Findings.aspx, accessed 6-January-2014].
[91] Fox, S., and Duggan, M. Health Online. Pew Internet & American Life Project, 2013.
[Online: http://pewinternet.org/Reports/2013/Health-online/Summary-of-Findings.aspx, accessed 2-April-2013].
[92] Fox, S., and Rainie, L. Vital Decisions: How Internet Users Decide what Information to Trust
when They Or Their Loved Ones are Sick. Pew Internet & American Life Project, 2002. [Online:
http://www.pewinternet.org/2002/05/22/vital-decisions-a-pew-internet-health-report/, accessed 2-April-2013].
[93] Franklin, V. L., Waller, A., Pagliari, C., and Greene, S. A. A randomized controlled trial of Sweet
Talk, a text-messaging system to support young people with diabetes. Diabetic Medicine 23, 12
(2006), 1332–1338.
[94] Frantzi, K., Ananiadou, S., and Mima, H. Automatic recognition of multi-word terms: the c-value/nc-value method. International Journal on Digital Libraries 3, 2 (2000), 115–130.
[95] Friedrich, C. M., Revillion, T., Hofmann, M., and Fluck, J. Biomedical and chemical named entity
recognition with conditional random fields: the advantage of dictionary features. In Semantic
Mining in Biomedicine, vol. 7 (2006), 85–89.
[96] Frost, J. H., and Massagli, M. P. Social uses of personal health information within PatientsLikeMe,
an online community: what can happen when patients have access to one another’s data. Journal
of Medical Internet Research 10, 3 (2008), e15.
[97] Gade, E. J., Thomsen, S. F., Lindenberg, S., Kyvik, K. O., Lieberoth, S., and Backer, V. Asthma
affects time to pregnancy and fertility: a register-based twin study. European Respiratory Journal
43, 4 (2014), 1077–1085.
[98] Gavin, J., Rodham, K., and Poyer, H. The presentation of “pro-anorexia” in online group interactions. Qualitative Health Research 18, 3 (2008), 325–333.
[99] Gibbs, R. D., Gibbs, P. H., and Henrich, J. Patient understanding of commonly used medical
vocabulary. The Journal of Family Practice 25, 2 (1987), 176–178.
[100] Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., and Brilliant, L. Detecting
influenza epidemics using search engine query data. Nature 457, 7232 (2008), 1012–1014.
[101] Gooden, R. J., and Winefield, H. R. Breast and prostate cancer online discussion boards: a thematic analysis of gender differences and similarities. Journal of Health Psychology 12, 1 (2007),
103–114.
[102] Gossop, M., Battersby, M., and Strang, J. Self-detoxification by opiate addicts. A preliminary
investigation. The British Journal of Psychiatry 159, 2 (1991), 208–212.
[103] Gossop, M., Green, L., Phillips, G., and Bradley, B. Lapse, relapse and survival among opiate
addicts after treatment. A prospective follow-up study. The British Journal of Psychiatry 154, 3
(1989), 348–353.
[104] Grandinetti, D. A. Doctors and the web. Help your patients surf the Net safely. Medical Economics
77, 5 (2000), 186.
[105] Gray, N. J., Klein, J. D., Noyce, P. R., Sesselberg, T. S., and Cantrill, J. A. Health information-seeking behaviour in adolescence: the place of the Internet. Social Science & Medicine 60, 7
(2005), 1467–1478.
[106] Green, L., and Gossop, M. Effects of information on the opiate withdrawal syndrome. British
Journal of Addiction 83, 3 (1988), 305–309.
[107] Greene, J. A., Choudhry, N. K., Kilabuk, E., and Shrank, W. H. Online social networking by patients with diabetes: a qualitative evaluation of communication with Facebook. Journal of General
Internal Medicine 26, 3 (2011), 287–292.
[108] Grimes, A., Landry, B. M., and Grinter, R. E. Characteristics of shared health reflections in a local
community. In Computer Supported Cooperative Work, ACM (2010), 435–444.
[109] Grishman, R., Huttunen, S., and Yangarber, R. Information extraction for enhanced access to
disease outbreak reports. Journal of Biomedical Informatics 35, 4 (2002), 236–246.
[110] Guest, G., MacQueen, K. M., and Namey, E. E. Applied Thematic Analysis. Sage, 2011.
[111] GuoDong, Z., and Jian, S. Exploring deep knowledge resources in biomedical name recognition.
In workshop on Natural Language Processing in Biomedicine and its Applications, ACL (2004),
96–99.
[112] Gupta, S., MacLean, D. L., Heer, J., and Manning, C. D. Induced lexico-syntactic patterns improve
information extraction from online medical forums. Journal of the American Medical Informatics
Association 21, 5 (2014), 902–909.
[113] Hampton, T. Warning system aims to detect emerging trends in illegal drug use. Journal of the
American Medical Association 312, 8 (2014), 779–779.
[114] Hansen, D. L., Derry, H. A., Resnick, P. J., and Richardson, C. R. Adolescents searching for
health information on the Internet: an observational study. Journal of Medical Internet Research
5, 4 (2003), e25.
[115] Hansen, R. N., Oster, G., Edelsberg, J., Woody, G. E., and Sullivan, S. D. Economic costs of
nonmedical use of prescription opioids. The Clinical Journal of Pain 27, 3 (2011), 194–202.
[116] Hardey, M. Doctor in the house: the Internet as a source of lay health knowledge and the challenge
to expertise. Sociology of Health & Illness 21, 6 (1999), 820–835.
[117] Hardey, M. The story of my illness: Personal accounts of illness on the Internet. Health: 6, 1 (2002),
31–46.
[118] Coppersmith, G. A., Harman, C. T., and Dredze, M. H. Measuring post traumatic stress disorder
in Twitter. In International Conference on Weblogs and Social Media, AAAI (2014), 579–582.
[119] Harpaz, R., DuMouchel, W., Shah, N. H., Madigan, D., Ryan, P., and Friedman, C. Novel data-mining methodologies for adverse drug event discovery and analysis. Clinical Pharmacology &
Therapeutics 91, 6 (2012), 1010–1021.
[120] Harris, S., and Gerich, E. Retiring the NSFNET Backbone Service: Chronicling the end of an era.
Connexions 10, 4 (1996).
[121] Hartzband, P., and Groopman, J. Untangling the Web: patients, doctors, and the Internet. New
England Journal of Medicine 362, 12 (2010), 1063–1066.
[122] Hartzler, A., and Pratt, W. Managing the personal side of health: How patient expertise differs
from the expertise of clinicians. Journal of Medical Internet Research 13, 3 (2011), e62.
[123] He, H. A., Greenberg, S., and Huang, E. M. One size does not fit all: applying the transtheoretical
model to energy feedback technology design. In Human Factors in Computing Systems, ACM
(2010), 927–936.
[124] He, Y., and Kayaalp, M. Biological entity recognition with conditional random fields. In American
Medical Informatics Association Annual Symposium, AMIA (2008), 293.
[125] Hearst, M. S. A simple algorithm for identifying abbreviation definitions in biomedical text. In
Pacific Symposium on Biocomputing (2003), 451–462.
[126] Heer, J., and Bostock, M. Crowdsourcing graphical perception: using Mechanical Turk to assess
visualization design. In Human Factors in Computing Systems, ACM (2010), 203–212.
[127] Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M., Weiss, D., et al. Syndromic
surveillance in public health practice, New York City. Emerging Infectious Diseases 10, 5 (2004),
858–864.
[128] Henning, K. J. What is syndromic surveillance? Morbidity and Mortality Weekly Report (2004),
7–11.
[129] Homan, C. M., Lu, N., Tu, X., Lytle, M. C., and Silenzio, V. Social structure and depression in
TrevorSpace. In Computer Supported Cooperative Work, ACM (2014), 615–625.
[130] Houston, T. K., Cooper, L. A., and Ford, D. E. Internet support groups for depression: a 1-year
prospective cohort study. American Journal of Psychiatry 159, 12 (2002), 2062–2068.
[131] Høybye, M. T., Johansen, C., and Tjørnhøj-Thomsen, T. Online interaction. Effects of storytelling
in an Internet breast cancer support group. Psycho-Oncology 14, 3 (2005), 211–220.
[132] Hulth, A., and Rydevik, G. Web query-based surveillance in Sweden during the influenza A (H1N1)
2009 pandemic, April 2009 to February 2010. Euro Surveillance 16, 18 (2011).
[133] Humphreys, K. Circles of recovery: Self-help organizations for addictions. Cambridge Univ. Press,
2004.
[134] Hwang, K. O., Ottenbacher, A. J., Green, A. P., Cannon-Diehl, M. R., Richardson, O., Bernstam,
E. V., and Thomas, E. J. Social support in an Internet weight loss community. International Journal
of Medical Informatics 79, 1 (2010), 5–13.
[135] Jamison-Powell, S., Linehan, C., Daley, L., Garbett, A., and Lawson, S. I can’t get no sleep:
discussing #insomnia on Twitter. In Human Factors in Computing Systems, ACM (2012), 1501–
1510.
[136] Jha, M., and Elhadad, N. Cancer stage prediction based on patient online discourse. In workshop
on Biomedical Natural Language Processing, ACL (2010), 64–71.
[137] Johnson, H. A., Wagner, M. M., Hogan, W. R., Chapman, W., Olszewski, R. T., Dowling, J., Barnas,
G., et al. Analysis of web access logs for surveillance of influenza. Studies in Health Technology
and Informatics 107, Pt 2 (2004), 1202–1206.
[138] Jonquet, C., Shah, N. H., and Musen, M. A. The Open Biomedical Annotator. In summit on
Translational Bioinformatics, AMIA (2009), 56.
[139] Kandel, D. B. Stages and pathways of drug involvement: Examining the gateway hypothesis.
Cambridge University Press, 2002.
[140] Kaskutas, L. A., Bond, J., and Humphreys, K. Social networks as mediators of the effect of
Alcoholics Anonymous. Addiction 97, 7 (2002), 891–900.
[141] Kelly, J. F., Hoeppner, B., Stout, R. L., and Pagano, M. Determining the relative importance of
the mechanisms of behavior change within Alcoholics Anonymous: a multiple mediator analysis.
Addiction 107, 2 (2012), 289–299.
[142] Kendall, L., Hartzler, A., Klasnja, P., and Pratt, W. Descriptive analysis of physical activity conversations on Twitter. In extended abstracts on Human Factors in Computing Systems, ACM (2011),
1555–1560.
BIBLIOGRAPHY
152
[143] Keselman, A., Smith, C. A., Divita, G., Kim, H., Browne, A. C., Leroy, G., and Zeng-Treitler, Q.
Consumer health concepts that do not map to the UMLS: where do they fit? Journal of the
American Medical Informatics Association 15, 4 (2008), 496–505.
[144] Keselman, A., Tse, T., Crowell, J., Browne, A., Ngo, L., and Zeng, Q. Assessing consumer health
vocabulary familiarity: an exploratory study. Journal of Medical Internet Research 9, 1 (2007), e5.
[145] Kim, J.-D., Ohta, T., Tateisi, Y., and Tsujii, J. GENIA corpus – a semantically annotated corpus for
bio-textmining. Bioinformatics 19, suppl 1 (2003), i180–i182.
[146] Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., and Collier, N. Introduction to the bio-entity recognition task at JNLPBA. In joint workshop on Natural Language Processing in Biomedicine and its
Applications, ACL (2004), 70–75.
[147] Kittur, A., Chi, E. H., and Suh, B. Crowdsourcing user studies with Mechanical Turk. In Human
Factors in Computing Systems, ACM (2008), 453–456.
[148] Klemm, P., Bunnell, D., Cullen, M., Soneji, R., Gibbons, P., and Holecek, A. Online cancer support
groups: a review of the research literature. Computers Informatics Nursing (2003).
[149] Kummervold, P. E., Gammon, D., Bergvik, S., Johnsen, J.-A. K., Hasvold, T., and Rosenvinge, J. H.
Social support in a wired world: use of online mental health forums in Norway. Nordic Journal of
Psychiatry 56, 1 (2002), 59–65.
[150] LaCoursiere, S. P., Knobf, M. T., and McCorkle, R. Cancer patients’ self-reported attitudes about
the Internet. Journal of Medical Internet Research 7, 3 (2005), e22.
[151] Lafferty, J., McCallum, A., and Pereira, F. C. Conditional random fields: Probabilistic models for
segmenting and labeling sequence data. In International Conference on Machine Learning, ACM
(2001), 282–289.
[152] Lamb, A., Paul, M. J., and Dredze, M. Separating fact from fear: Tracking flu infections on Twitter.
In North American Chapter of the ACL: Human Language Technologies, ACL (2013), 789–795.
[153] Lasker, J. N., Sogolow, E. D., and Sharim, R. R. The role of an online community for people with
a rare disease: content analysis of messages posted on a primary biliary cirrhosis mailing list.
Journal of Medical Internet Research 7, 1 (2005), e10.
[154] Leaman, R., Wojtulewicz, L., Sullivan, R., Skariah, A., Yang, J., and Gonzalez, G. Towards
Internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks. In workshop on Biomedical Natural Language Processing, ACL (2010),
117–125.
[155] Lembke, A., and Humphreys, K. Self-Help Organizations for Substance Use Disorders. Oxford Univ.
Press, 2009.
[156] Lewis, T. Seeking health information on the Internet: lifestyle choice or bad attack of cyberchondria? Media, Culture & Society 28, 4 (2006), 521–539.
[157] Liang, P. Semi-supervised learning for natural language. PhD thesis, Massachusetts Institute of
Technology, 2005.
[158] Lieberman, M. A., Golant, M., Giese-Davis, J., Winzlenberg, A., Benjamin, H., Humphreys, K.,
Kronenwetter, C., Russo, S., and Spiegel, D. Electronic support groups for breast carcinoma.
Cancer 97, 4 (2003), 920–925.
[159] MacLean, D. L., and Heer, J. Identifying medical terms in patient-authored text: a crowdsourcing-based approach. Journal of the American Medical Informatics Association 20, 6 (2013), 1120–1127.
[160] Malik, S. H., and Coulson, N. The male experience of infertility: a thematic analysis of an online
infertility support group bulletin board. Journal of Reproductive and Infant Psychology 26, 1 (2008),
18–30.
[161] Malik, S. H., and Coulson, N. S. Coping with infertility online: An examination of self-help mechanisms in an online infertility support group. Patient Education and Counseling 81, 2 (2010),
315–318.
[162] Maloney-Krichmar, D., and Preece, J. A multilevel analysis of sociability, usability, and community
dynamics in an online health community. ACM Transactions on Computer-Human Interaction 12,
2 (2005), 201–232.
[163] Mandl, K. D., Overhage, J. M., Wagner, M. M., Lober, W. B., Sebastiani, P., Mostashari, F., Pavlin,
J. A., Gesteland, P. H., Treadwell, T., Koski, E., et al. Implementing syndromic surveillance: a
practical guide informed by the early experience. Journal of the American Medical Informatics
Association 11, 2 (2004), 141–150.
[164] Mankoff, J., Kuksenok, K., Kiesler, S., Rode, J. A., and Waldman, K. Competing online viewpoints
and models of chronic illness. In Human Factors in Computing Systems, ACM (2011), 589–598.
[165] Mayer, D. K., Terrin, N. C., Kreps, G. L., Menon, U., McCance, K., Parsons, S. K., and Mooney,
K. H. Cancer survivors information seeking behaviors: a comparison of survivors who do and do
not seek information about cancer. Patient Education and Counseling 65, 3 (2007), 342–350.
[166] Mayer, M., and Till, J. The Internet: a modern Pandora’s box? Quality of Life Research 5, 6
(1996), 568–571.
[167] McCray, A. T., Loane, R. F., Browne, A. C., and Bangalore, A. K. Terminology issues in user
access to web-based medical information. In American Medical Informatics Association Annual
Symposium, AMIA (1999), 107.
[168] McLellan, A. T. What is recovery? Revisiting the Betty Ford Institute consensus panel definition.
Journal of Substance Abuse Treatment (2010), 109–113.
[169] McLellan, A. T., Lewis, D. C., O’Brien, C. P., and Kleber, H. D. Drug dependence, a chronic medical
illness: implications for treatment, insurance, and outcomes evaluation. Journal of the American
Medical Association 284, 13 (2000), 1689–1695.
[170] McNeil, K., Brna, P., and Gordon, K. Epilepsy in the Twitter era: a need to re-tweet the way we
think about seizures. Epilepsy & Behavior 23, 2 (2012), 127–130.
[171] Medawar, C., Herxheimer, A., Bell, A., and Jofre, S. Paroxetine, Panorama and user reporting of
ADRs: Consumer intelligence matters in clinical practice and post-marketing drug surveillance. The
International Journal of Risk and Safety in Medicine 15, 3 (2002), 161–169.
[172] MedlinePlus use by quarter. National Library of Medicine (2013). [Online: http://www.nlm.nih.gov/medlineplus/usestatistics.html, accessed 25-August-2014].
[173] Meier, A., Lyons, E. J., Frydman, G., Forlenza, M., and Rimer, B. K. How cancer survivors provide
support on cancer-related Internet mailing lists. Journal of Medical Internet Research 9, 2 (2007),
e12.
[174] Merrill, J. O., Rhodes, L. A., Deyo, R. A., Marlatt, G. A., and Bradley, K. A. Mutual mistrust in the
medical care of drug users. Journal of General Internal Medicine 17, 5 (2002), 327–333.
[175] Migneault, J. P., Adams, T. B., and Read, J. P. Application of the transtheoretical model to substance abuse: historical development and future directions. Drug and Alcohol Review 24, 5 (2005),
437–448.
[176] Miller, N. S., Sheppard, L. M., Colenda, C. C., and Magen, J. Why physicians are unprepared
to treat patients who have alcohol- and drug-related disorders. Academic Medicine 76, 5 (2001),
410–418.
[177] Mo, P. K., and Coulson, N. S. Exploring the communication of social support within virtual communities: A content analysis of messages posted to an online HIV/AIDS support group. CyberPsychology & Behavior 11, 3 (2008), 371–374.
[178] Morahan-Martin, J. M. How Internet users find, evaluate, and use online health information: a
cross-cultural review. CyberPsychology & Behavior 7, 5 (2004), 497–510.
[179] Mulveen, R., and Hepworth, J. An interpretative phenomenological analysis of participation in a
pro-anorexia Internet site and its relationship with disordered eating. Journal of Health Psychology
11, 2 (2006), 283–296.
[180] Murnane, E. L., and Counts, S. Unraveling abstinence and relapse: smoking cessation reflected
in social media. In Human Factors in Computing Systems, ACM (2014), 1345–1354.
[181] Murray, E., Lo, B., Pollack, L., Donelan, K., Catania, J., Lee, K., Zapert, K., and Turner, R. The
impact of health information on the Internet on health care and the physician-patient relationship:
national U.S. survey among 1.050 U.S. physicians. Journal of Medical Internet Research 5, 3
(2003).
[182] Nettleton, S., Burrows, R., and O’Malley, L. The mundane realities of the everyday lay use of the
Internet for health, and their consequences for media convergence. Sociology of Health & Illness
27, 7 (2005), 972–992.
[183] Nikfarjam, A., and Gonzalez, G. H. Pattern mining for extraction of mentions of adverse drug
reactions from user comments. In American Medical Informatics Association Annual Symposium,
AMIA (2011), 1019.
[184] Noble, A., Best, D., Man, L.-H., Gossop, M., and Strang, J. Self-detoxification attempts among
methadone maintenance patients: what methods and what success? Addictive Behaviors 27, 4
(2002), 575–584.
[185] Nonnecke, B., and Preece, J. Shedding light on lurkers in online communities. Ethnographic Studies in Real and Virtual Environments: Inhabited Information Spaces and Connected Communities
(1999), 123–128.
[186] Nonnecke, B., and Preece, J. Lurker demographics: Counting the silent. In Human Factors in
Computing Systems, ACM (2000), 73–80.
[187] Olsen, Y., and Sharfstein, J. M. Confronting the stigma of opioid use disorder – and its treatment.
Journal of the American Medical Association 311, 14 (2014), 1393–1394.
[188] Owen, J. E., Giese-Davis, J., Cordova, M., Kronenwetter, C., Golant, M., and Spiegel, D. Self-report and linguistic indicators of emotional expression in narratives as predictors of adjustment to
cancer. Journal of Behavioral Medicine 29, 4 (2006), 335–345.
[189] Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., and Smith, N. A. Improved part-of-speech tagging for online conversational text with word clusters. In North American Chapter of
the ACL: Human Language Technologies, ACL (2013), 380–390.
[190] Pagano, M. E., Friend, K. B., Tonigan, J. S., and Stout, R. L. Helping other alcoholics in Alcoholics
Anonymous and drinking outcomes: Findings from Project MATCH. Journal of Studies on Alcohol
65, 6 (2004), 766.
[191] Park, S., Lee, S. W., Kwak, J., Cha, M., and Jeong, B. Activities on Facebook reveal the depressive
state of users. Journal of Medical Internet Research 15, 10 (2013), e217.
[192] Parker, K., and Wang, W. Modern Parenthood. Pew Internet & American Life Project, 2013. [Online: http://www.pewsocialtrends.org/2013/03/14/modern-parenthood-roles-of-moms-and-dads-converge-as-they-balance-work, accessed 2-April-2013].
[193] Paul, M. J., and Dredze, M. A model for mining public health topics from Twitter. In Health, vol. 11
(2012), 16–6.
[194] Peat, H. J., and Willett, P. The limitations of term co-occurrence data for query expansion in
document retrieval systems. JASIS 42, 5 (1991), 378–383.
[195] Pennebaker, J. W., Francis, M. E., and Booth, R. J. Linguistic inquiry and word count: LIWC 2001.
Mahwah: Lawrence Erlbaum Associates 71 (2001).
[196] Pennebaker, J. W., Mehl, M. R., and Niederhoffer, K. G. Psychological aspects of natural language
use: Our words, our selves. Annual Review of Psychology 54, 1 (2003), 547–577.
[197] Ploderer, B., Smith, W., Howard, S., Pearce, J., and Borland, R. Patterns of support in an online
community for smoking cessation. In International Conference on Communities and Technologies,
ACM (2013), 26–35.
[198] Polgreen, P. M., Chen, Y., Pennock, D. M., Nelson, F. D., and Weinstein, R. A. Using Internet
searches for influenza surveillance. Clinical Infectious Diseases 47, 11 (2008), 1443–1448.
[199] Potts, H. W., and Wyatt, J. C. Survey of doctors’ experience of patients using the Internet. Journal
of Medical Internet Research 4, 1 (2002), e5.
[200] Powell, J., and Clarke, A. Internet information-seeking in mental health population survey. The
British Journal of Psychiatry 189, 3 (2006), 273–277.
[201] Pratt, W., and Yetisgen-Yildiz, M. A study of biomedical concept identification: Metamap vs. people. In American Medical Informatics Association Annual Symposium, AMIA (2003), 529.
[202] Preece, J., Nonnecke, B., and Andrews, D. The top five reasons for lurking: improving community
experiences for everyone. Computers in Human Behavior 20, 2 (2004), 201–223.
[203] Prochaska, J. O., and Velicer, W. F. The transtheoretical model of health behavior change. American Journal of Health Promotion 12, 1 (1997), 38–48.
[204] Pyysalo, S., Ginter, F., Heimonen, J., Björne, J., Boberg, J., Järvinen, J., and Salakoski, T. Bioinfer:
a corpus for information extraction in the biomedical domain. BMC Bioinformatics 8, 1 (2007), 50.
[205] Rainie, L., and Fox, S. The Online Health Care Revolution. Pew Internet & American Life Project, 2000. [Online: http://www.pewinternet.org/2000/11/26/the-online-health-care-revolution/, accessed 2-April-2013].
[206] Ravert, R. D., Hancock, M. D., and Ingersoll, G. M. Online forum messages posted by adolescents
with type 1 diabetes. The Diabetes Educator 30, 5 (2003), 827–834.
[207] Reis, B. Y., and Mandl, K. D. Time series modeling for syndromic surveillance. BMC Medical
Informatics and Decision Making 3, 1 (2003), 2.
[208] Resnik, P., Garron, A., and Resnik, R. Using topic modeling to improve prediction of neuroticism and depression. In Conference on Empirical Methods in Natural Language Processing, ACL
(2013), 1348–1353.
[209] Rideout, V. Generation Rx.com: What are young people really doing online? Marketing Health
Services 22, 1 (2002), 26.
[210] Risk, A., and Petersen, C. Health information on the Internet: quality issues and international
initiatives. Journal of the American Medical Association 287, 20 (2002), 2713–2715.
[211] Rodgers, S., and Chen, Q. Internet community group participation: Psychosocial benefits for
women with breast cancer. Journal of Computer-Mediated Communication 10, 4 (2005).
[212] Ruau, D., Mbagwu, M., Dudley, J. T., Krishnan, V., and Butte, A. J. Comparison of automated and
human assignment of MeSH terms on publicly-available molecular datasets. Journal of Biomedical
Informatics 44 (2011), S39–S43.
[213] Sadilek, A., Brennan, S., Kautz, H., and Silenzio, V. nEmesis: Which restaurants should you avoid
today? In Human Computation and Crowdsourcing, AAAI (2013).
[214] Saha, S. K., Sarkar, S., and Mitra, P. Feature selection techniques for maximum entropy based
biomedical named entity recognition. Journal of Biomedical Informatics 42, 5 (2009), 905–911.
[215] Salem, D. A., Bogat, G. A., and Reid, C. Mutual help goes on-line. Journal of Community Psychology 25, 2 (1997), 189–207.
[216] Sanderson, M., and Croft, B. Deriving concept hierarchies from text. In Research and Development in Information Retrieval, ACM SIGIR (1999), 206–213.
[217] Sanford, A. A. “I can air my feelings instead of eating them”: Blogging as social support for the
morbidly obese. Communication Studies 61, 5 (2010), 567–584.
[218] Scanfeld, D., Scanfeld, V., and Larson, E. L. Dissemination of health information through social
networks: Twitter and antibiotics. American Journal of Infection Control 38, 3 (2010), 182–188.
[219] Schatz, B. R., Johnson, E. H., Cochrane, P. A., and Chen, H. Interactive term suggestion for
users of digital libraries: Using subject thesauri and co-occurrence lists for information retrieval. In
International Conference on Digital Libraries, ACM (1996), 126–133.
[220] Seale, C., Ziebland, S., and Charteris-Black, J. Gender, cancer experience and Internet use: a
comparative keyword analysis of interviews and online cancer support groups. Social Science &
Medicine 62, 10 (2006), 2577–2590.
[221] Seifter, A., Schwarzwalder, A., Geis, K., and Aucott, J. The utility of Google Trends for epidemiological research: Lyme disease as an example. Geospatial Health 4, 2 (2010), 135–137.
[222] Settles, B. Biomedical named entity recognition using conditional random fields and rich feature
sets. In joint workshop on Natural Language Processing in Biomedicine and its Applications, ACL
(2004), 104–107.
[223] Sheeren, M. The relationship between relapse and involvement in Alcoholics Anonymous. Journal
of Studies on Alcohol and Drugs 49, 1 (1988), 104.
[224] Shuyler, K. S., and Knight, K. M. What are patients seeking when they turn to the Internet?
Qualitative content analysis of questions asked by visitors to an orthopaedics web site. Journal of
Medical Internet Research 5, 4 (2003), e24.
[225] Sillence, E., Briggs, P., Harris, P. R., and Fishwick, L. How do patients evaluate and make use of
online health information? Social Science & Medicine 64, 9 (2007), 1853–1862.
[226] Smith, C. A., and Wicks, P. J. PatientsLikeMe: Consumer health vocabulary as a folksonomy. In
American Medical Informatics Association Annual Symposium, AMIA (2008), 682.
[227] Smyth, B., Barry, J., Keenan, E., and Ducray, K. Lapse and relapse following inpatient treatment
of opiate dependence. Irish Medical Journal 103, 6 (2010), 176–179.
[228] Snow, R., O’Connor, B., Jurafsky, D., and Ng, A. Y. Cheap and fast—but is it good? Evaluating
non-expert annotations for natural language tasks. In Empirical Methods in Natural Language
Processing, ACL (2008), 254–263.
[229] Sproule, B., Brands, B., Li, S., and Catz-Biro, L. Changing patterns in opioid addiction – characterizing users of oxycodone and other opioids. Canadian Family Physician 55, 1 (2009), 68–69.
[230] Strang, J., Babor, T., Caulkins, J., Fischer, B., Foxcroft, D., and Humphreys, K. Drug policy and
the public good: evidence for effective interventions. The Lancet 379, 9810 (2012), 71–83.
[231] Substance Abuse and Mental Health Services Administration. Drug Abuse Warning Network,
2011: National Estimates of Drug-Related Emergency Department Visits. HHS Publication No.
(SMA) 13-4760, DAWN Series D-39. Rockville, MD: Substance Abuse and Mental Health Services
Administration, 2013.
[232] Substance Abuse and Mental Health Services Administration, Center for Behavioral Health Statistics and Quality. The N-SSATS report: Trends in the use of methadone and buprenorphine at
substance abuse treatment facilities: 2003 to 2011. Rockville, MD. 2013.
[233] Sullivan, C. F. Gendered cybersupport: A thematic analysis of two online cancer support groups.
Journal of Health Psychology 8, 1 (2003), 83–104.
[234] Sullivan, S. J., Schneiders, A. G., Cheang, C.-W., Kitto, E., Lee, H., Redhead, J., Ward, S., Ahmed,
O. H., and McCrory, P. R. What’s happening? A content analysis of concussion-related traffic on
Twitter. British Journal of Sports Medicine 46, 4 (2012), 258–263.
[235] Teodoro, R., and Naaman, M. Fitter with Twitter: Understanding personal health and fitness
activity in social media. In International Conference on Weblogs and Social Media (2013).
[236] Thomas, D. R. A general inductive approach for analyzing qualitative evaluation data. American
Journal of Evaluation 27, 2 (2006), 237–246.
[237] Tonigan, J. S., and Rice, S. L. Is it beneficial to have an Alcoholics Anonymous sponsor? Psychology of Addictive Behaviors 24, 3 (2010), 397.
[238] Tsai, R. T.-H., Wu, S.-H., Chou, W.-C., Lin, Y.-C., He, D., Hsiang, J., Sung, T.-Y., and Hsu, W.-L.
Various criteria in the evaluation of biomedical named entity recognition. BMC Bioinformatics 7, 1
(2006), 92.
[239] Tsai, T.-h., Chou, W.-C., Wu, S.-H., Sung, T.-Y., Hsiang, J., and Hsu, W.-L. Integrating linguistic
knowledge into a conditional random field framework to identify biomedical named entities. Expert
Systems with Applications 30, 1 (2006), 117–128.
[240] Turner-McGrievy, G. M., and Tate, D. F. Weight loss social support in 140 characters or less: use of
an online social network in a remotely delivered weight loss intervention. Translational Behavioral
Medicine 3, 3 (2013), 287–294.
[241] United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality.
Treatment Episode Data Set – Admissions (TEDS-A), 2011. ICPSR34876-v3. Ann Arbor,
MI: Inter-university Consortium for Political and Social Research [distributor], 2014-09-11.
http://doi.org/10.3886/ICPSR34876.v3.
[242] U.S. Department of Health and Human Services. Substance Abuse and Mental Health Services
Administration. Results from the 2010 National Survey on Drug Use and Health: Summary of
National Findings. [Online: http://www.samhsa.gov/data/nsduh/2k10nsduh/2k10results.htm, accessed 15-Sept-2013].
[243] Ussher, J., Kirsten, L., Butow, P., and Sandoval, M. What do cancer support groups provide which
other supportive relationships do not? The experience of peer support groups for people with
cancer. Social Science & Medicine 62, 10 (2006), 2565–2576.
[244] Van Hout, M. C., and Bingham, T. Silk road, the virtual drug marketplace: a single case study of
user experiences. International Journal of Drug Policy 24, 5 (2013), 385–391.
[245] van Rijsbergen, C. J. A theoretical basis for the use of co-occurrence data in information retrieval.
Journal of Documentation 33, 2 (1977), 106–119.
[246] van Uden-Kraan, C. F., Drossaert, C. H., Taal, E., Seydel, E. R., and van de Laar, M. A. Self-reported differences in empowerment between lurkers and posters in online patient support
groups. Journal of Medical Internet Research 10, 2 (2008), e18.
[247] Velicer, W. F., Prochaska, J. O., Fava, J. L., Norman, G. J., and Redding, C. A. Smoking cessation and stress management: Applications of the transtheoretical model of behavior change.
Homeostasis in Health and Disease 38 (1998), 216–233.
[248] Vlahovic, T. A., Wang, Y.-C., Kraut, R. E., and Levine, J. M. Support matching and satisfaction
in an online breast cancer support community. In Human Factors in Computing Systems, ACM
(2014), 1625–1634.
[249] Volkow, N. D. Prescription drugs: Abuse and addiction, 2005. [Online: http://www.drugabuse.gov/sites/default/files/rxreportfinalprint.pdf, accessed 9/4/2014].
[250] Wang, Y.-C., Kraut, R., and Levine, J. M. To stay or leave? The relationship of emotional and
informational support to commitment in online health support groups. In Computer Supported
Cooperative Work, ACM (2012), 833–842.
[251] Warner, M., Chen, L. H., Makuc, D. M., Anderson, R. N., and Miniño, A. M. Drug poisoning deaths
in the United States, 1980–2008. NCHS Data Brief, 81 (2011), 1–8.
[252] Wen, M., and Rosé, C. P. Understanding participant behavior trajectories in online health support
groups using automatic extraction methods. In International Conference on Supporting Group
Work, ACM (2012), 179–188.
[253] West, R. Time for a change: putting the transtheoretical (stages of change) model to rest. Addiction 100, 8 (2005), 1036–1039.
[254] White, R. W., and Horvitz, E. Cyberchondria: studies of the escalation of medical concerns in web
search. ACM Transactions on Information Systems 27, 4 (2009), 23.
[255] White, R. W., and Horvitz, E. Web to world: Predicting transitions from self-diagnosis to the pursuit
of local medical assistance in web search. In American Medical Informatics Association Annual
Symposium, AMIA (2010), 882.
[256] White, R. W., and Horvitz, E. Studies of the onset and persistence of medical concerns in search
logs. In Research and Development in Information Retrieval, ACM SIGIR (2012), 265–274.
[257] White, R. W., Tatonetti, N. P., Shah, N. H., Altman, R. B., and Horvitz, E. Web-scale pharmacovigilance: listening to signals from the crowd. Journal of the American Medical Informatics Association
20, 1 (2013), 404–408.
[258] Wicks, P., Keininger, D. L., Massagli, M. P., de la Loge, C., Brownstein, C., Isojärvi, J., and Heywood, J. Perceived benefits of sharing health data between people with epilepsy on an online
platform. Epilepsy & Behavior 23, 1 (2012), 16–23.
[259] Wicks, P., Massagli, M., Frost, J., Brownstein, C., Okun, S., Vaughan, T., Bradley, R., and Heywood, J. Sharing health data for better outcomes on PatientsLikeMe. Journal of Medical Internet
Research 12, 2 (2010), e19.
[260] Wicks, P., Vaughan, T. E., Massagli, M. P., and Heywood, J. Accelerated clinical discovery using
self-reported patient data collected online and a patient-matching algorithm. Nature Biotechnology
29, 5 (2011), 411–414.
[261] Wilson, J. L., Peebles, R., Hardy, K. K., and Litt, I. F. Surfing for thinness: a pilot study of pro-eating disorder web site usage
in adolescents with eating disorders. Pediatrics 118, 6 (2006),
e1635–e1643.
[262] Wilson, K., and Brownstein, J. S. Early detection of disease outbreaks using the Internet. Canadian Medical Association Journal 180, 8 (2009), 829–831.
[263] Wood, E., Samet, J. H., and Volkow, N. D. Physician education in addiction medicine. Journal of
the American Medical Association 310, 16 (2013), 1673–1674.
[264] Xu, R., Supekar, K., Morgan, A., Das, A., and Garber, A. Unsupervised method for automatic construction of a disease dictionary from a large free text collection. In American Medical Informatics
Association Annual Symposium, AMIA (2008), 820.
[265] Yang, C. C., Jiang, L., Yang, H., and Tang, X. Detecting signals of adverse drug reactions from
health consumer contributed content in social media. In workshop on Health Informatics, ACM
SIGKDD (2012).
[266] Yang, C. C., Yang, H., Jiang, L., and Zhang, M. Social media mining for drug safety signal detection. In workshop on Smart Health and Wellbeing, ACM (2012), 33–40.
[267] Yang, Z., Lin, H., and Li, Y. Exploiting the contextual cues for bio-entity name recognition in
biomedical literature. Journal of Biomedical Informatics 41, 4 (2008), 580–587.
[268] Yates, A., and Goharian, N. ADRTrace: detecting expected and unexpected adverse drug reactions from user reviews on social media sites. In Advances in Information Retrieval. Springer,
2013, 816–819.
[269] Yates, A., Goharian, N., and Frieder, O. Extracting adverse drug reactions from forum posts and
linking them to drugs. In workshop on Health Search and Discovery, ACM SIGIR (2013).
[270] Ybarra, M. L., and Eaton, W. W. Internet-based mental health interventions. Mental Health Services Research 7, 2 (2005), 75–87.
[271] Yeh, A., Morgan, A., Colosimo, M., and Hirschman, L. BioCreAtIvE task 1A: gene mention finding
evaluation. BMC Bioinformatics 6, Suppl 1 (2005), S2.
[272] Zeng, Q., Kogan, S., Ash, N., Greenes, R., and Boxwala, A. Characteristics of consumer terminology for health information retrieval. Methods of Information in Medicine 41, 4 (2002), 289–298.
[273] Zeng, Q. T., and Tse, T. Exploring and developing consumer health vocabularies. Journal of the
American Medical Informatics Association 13, 1 (2006), 24–29.
[274] Zeng, Q. T., Tse, T., Divita, G., Keselman, A., Crowell, J., Browne, A. C., Goryachev, S., and Ngo,
L. Term identification methods for consumer health vocabulary development. Journal of Medical
Internet Research 9, 1 (2007), e4.
[275] Ziebland, S., Chapple, A., Dumelow, C., Evans, J., Prinjha, S., and Rozmovits, L. How the Internet
affects patients’ experience of cancer: a qualitative study. British Medical Journal 328, 7439
(2004), 564.
