For Peer Review - Center for Functional and Molecular Imaging

Technical Abstract
Background: Post-Traumatic Stress Disorder (PTSD) is a multifactorial disease that develops following
exposure to traumatic events ranging from motor vehicle accidents to terrorism. While most individuals recover
from this acute form of stress, others are left with a devastating experience that could lead to a whole spectrum
of mental challenges. The complex and heterogeneous response to trauma makes treatment design and
assessment a challenge. While the recently released report from the Institute of Medicine (IOM) concluded that
only psychotherapies based on exposure therapy met their evidenced-based criteria for efficacy, the report
recommended that future studies of treatment efficacy include use of randomized control trials, investigator
independence, and proper handling of attrition (IOM: Committee on Treatment of Posttraumatic Stress Disorder,
2007). Gaps with regards to treatment of PTSD identified by the IOM included determining length of treatment
necessary, long-term follow-up of subjects, studying important veteran subpopulations, and investigating three
factors related to outcome: loss of PTSD diagnosis, symptom improvement, and end state functioning.
Objective: Therefore, we propose a multidisciplinary approach that compares the outcomes of a positive guided
imagery-based complementary and alternative medicine (CAM) treatment with those of the above-mentioned exposure
therapy in war zone-exposed soldiers with PTSD, using state-of-the-art functional brain imaging technology and a
novel high-throughput analytical platform, phylomics® (patent pending), that is able to discriminate between
treatment responders and non-responders based on proteomic/genomic signatures.
Specific Aims/Hypothesis: Aim 1: To compare Guided Imagery against Prolonged Exposure therapy. These
two treatment modalities are similar in structure but differ in terms of emphasis on positive (Guided Imagery)
and traumatic (Prolonged Exposure) imagery. Aim 2: To identify the impact of both treatment regimes on
neuroendocrine markers of stress and neurobiological changes. We will focus on both central and peripheral
pathways. We hypothesize that the physiological impact of Guided Imagery treatment will be positive changes
in each of these stress markers: glucocorticoids, DHEA, DHEA-S, neuropeptide Y, and allopregnanolone. Aim 3: To
determine the neurobiological changes associated with treatment. Previous functional MRI (fMRI) studies of
PTSD have generally reported a decrease in activity in the medial prefrontal cortex (mPFC) and the anterior
cingulate cortex (ACC) with a corresponding increase in the amygdala. Activity in the mPFC and ACC has also
been reported to negatively correlate with PTSD severity. We predict that activation of the ACC during a
script-driven imagery (SDI) task will increase after treatment, while activation of the mPFC and amygdala will
remain unchanged. Aim 4: To methodically distinguish treatment responders from non-responders using
phylomics®. The baseline measures of stress (both neuroendocrine and neuronal activity), together with the
genetic profile, will be used as input to this algorithm, which is expected to distinguish responders from
non-responders on the basis of their baseline profile.
Study Design: The study centers on comparing CAM-based Guided Imagery to the standard exposure therapy. Subjects
will be randomized into one of the two arms. Baseline assessments will include a number of clinical assessment
instruments, collection of saliva and blood specimens, and fMRI. Two follow-up assessments will be performed:
upon completion of the treatment and six months later. Biological specimen collection and fMRI will be performed
within 2 weeks of the last treatment session. PTSD symptom severity will be assessed using the CAPS no more
than four weeks after the last treatment. The final assessment of PTSD symptoms will be performed over the
phone at the 6-month follow-up.
Innovation: The proposed study will use a novel classification technique called ‘phylomics®’ (patent pending)
to distinguish PTSD treatment responders from non-responders based on their neurobiologic signature. Success
will be based on randomized assignment to one of the two interventions. This study will not only determine the
efficacy of the CAM treatment for PTSD compared to accepted therapy, but it will also provide scientific
evidence of the changes induced by each treatment in the neuroendocrine and neurobiological profile of the
subjects. The treatment outcomes will be compared with the predictions generated by the phylomics® algorithm.
Thus, the proposed study has many possible outcomes. Each outcome will independently move the study of the
treatment of PTSD forward and together will form a basis of understanding representing a major advance in this
field.
Impact: The relevance of the proposed study to treatment of PTSD is 1) a systematic comparison of a CAM-based
treatment to the more accepted exposure-based therapy, 2) identification of the changes in neural markers
that relate to treatment outcome, and 3) a test of a novel technique to predict treatment outcome.
Public Abstract
Background: Post-Traumatic Stress Disorder (PTSD) is a debilitating mental health disease that has been
increasing in occurrence, especially in the military population deployed in war zones. PTSD in our returning
soldiers from Operation Iraqi Freedom (OIF) has been estimated at 9.8% with an odds ratio (OR) of 5.51 and in
those returning from Operation Enduring Freedom (OEF) at 2.1% with an OR of 2.52. Unfortunately, several
surveys have shown that the percent of individuals who received mental health services within one year of
post-deployment for any disorder was extremely low, primarily due to concerns of stigma: 23% for members of the
Army from OIF, 29% for Marines from OIF, and 40% for soldiers from OEF. Furthermore, there has been more
than a 79.5% increase in the number of veterans receiving PTSD disability compensation from 1999 to 2004
(Office of Inspector General 2005) totaling $4.3 billion in 2004. The full cost of PTSD remains to be seen but
undoubtedly the costs to the individual and the community at large as well as the consequences to mission
readiness are quite high.
Ultimate Applicability of the proposed research: This study is approached from three different angles to
better cover the complexity of this devastating illness: a) a recently developed CAM-based modality
emphasizing positive Guided Imagery will be compared to Prolonged Exposure imagery that focuses on
revisiting traumatic events and we anticipate that PTSD patients’ symptoms will improve; b) in addition to the
psychological assessments, stress-related neurobiological markers will be measured in saliva and blood
specimens and correlated to neuronal parameters obtained by functional magnetic resonance imaging (fMRI)
before (at baseline) and after treatment; c) integral blood proteins (proteome) and genes (genome) will be
fractionated using cutting-edge technology to decipher the neurobiological signature of each patient, a step that
has been a great challenge to the biomedical research community due to the molecular heterogeneity and
complexity of the disease. By applying our novel analytical method, phylomics® (patent pending), we expect to
translate the proteome and genome information combined with the neuronal and physiological data to derive
biologically meaningful relationships that group together those patients who share similar molecular signatures.
The analysis is innovative because it draws from evolutionary-based principles of parsimony phylogenetic
analysis that have successfully been utilized for the past 50 years in other biology disciplines but rarely applied
to biomedicine, a field that has been dominated by purely descriptive biostatistics. We expect that phylomics will
solve the issue of molecular heterogeneity, which might in turn explain the complexity of the disease, the
capacity to recover from trauma, and the response to treatment.
Consumer-related outcome: All aspects of the study are based on non-invasive interventions. Two outcomes
of this study are likely to have a major impact on the consumer. First, Guided Imagery is a much less difficult
therapy to implement and less painful to the patient than Exposure therapy. Thus, if Guided Imagery proves as
efficacious as Exposure therapy this will translate into a therapy that can be more widely applied in a variety of
clinical settings. Second, we expect the phylomics® algorithm will be able to distinguish subjects who will be
treatment responders from non-responders and ultimately provide a method for identifying targeted treatments
optimized to the individual PTSD sufferer.
Projected time to clinical translation: Once our hypotheses are verified, we think that the treatment modality,
neurobiological correlates, and molecular signature could be easily implemented in the clinic. It requires a
trained Mind-Body-Medicine facilitator, an MRI facility, which is now an integral component of hospitals, and a
clinical laboratory to process the blood for phylomics® analysis. The advantage of our proposed study is its
multidimensionality and clinical potential.
A. BACKGROUND
A number of studies have begun to examine the range of mental health problems (Hoge, Castro et al. 2004;
Hoge 2006; Milliken, Auchterlonie et al. 2007) and PTSD in particular (Hoge, Terhakopian et al. 2007) in
soldiers returning from Operation Enduring Freedom (OEF) and Operation Iraqi Freedom (OIF). Hoge, et al.,
2006, used the Post-Deployment Health Assessment (PDHA) from 303,905 Army soldiers and Marines to
assess a variety of mental health related issues in soldiers returning from OEF and OIF. They found the rate of
PTSD for those who served in OIF was 9.8% with an odds ratio (OR) of 5.51, while the rate for those returning
from OEF was 2.1% with an OR of 2.52. They hypothesized that the difference in the rates between the two
theaters was related to the number of combat related instances encountered (Hoge, Auchterlonie et al. 2006).
Unfortunately, the percent of individuals who sought mental health services within one year of post-deployment
for any disorder was extremely low, primarily due to concerns of stigma: 23% for members of the Army from
OIF, 29% for Marines from OIF, and 40% for soldiers from OEF (Hoge, Castro et al. 2004). One measure of the
economic costs of PTSD can be gauged from the amount paid by the Department of Veterans Affairs (VA) in
disability payments for PTSD. There has been more than a 79.5% increase in the number of veterans receiving
PTSD disability compensation from 1999 to 2004 while payments for PTSD disabilities rose 148.8% to $4.3
billion in 2004 (Office of Inspector General 2005). The full cost of PTSD remains to be seen but undoubtedly
the costs to the individual and the community at large, as well as the consequences to mission readiness, are quite high.
Treatments for PTSD include psychotherapeutic approaches that rely on “re-living” the event, such as cognitive
behavior therapy (CBT) including exposure-based therapies, as well as eye movement desensitization and
reprocessing (EMDR). A number of drug treatments have been used with varying degrees of success, including
various selective serotonin reuptake inhibitors (SSRIs), anti-epileptics such as Dilantin, and alpha-adrenergic
blockers. The VA’s clinical practice guidelines, derived from an evidence-based assessment of treatments,
conclude that there is significant benefit from SSRIs and/or a number of psychotherapies including cognitive
therapy, exposure therapy, and EMDR (Clinical Practice Guideline Workgroup 2004). In contrast, the recently
released report from the Institute of Medicine (IOM) concluded that only psychotherapies based on exposure
therapy met its evidence-based criteria for efficacy (Institute of Medicine: Committee on Treatment of
Posttraumatic Stress Disorder 2007).
The IOM’s report recommended that future studies of treatment efficacy include use of randomized control
trials, investigator independence, and proper handling of attrition including systematic follow-up of
non-completers (Institute of Medicine: Committee on Treatment of Posttraumatic Stress Disorder 2007). Gaps with
regards to treatment of PTSD identified by the IOM included determining length of treatment necessary,
long-term follow-up of subjects, studying important veteran subpopulations, and investigating three factors related to
outcome: loss of PTSD diagnosis, symptom improvement, and end state functioning.
BOLD fMRI: The proposed experiments will employ functional magnetic resonance imaging (fMRI), which is
based on the principle that the recorded MRI signal changes with the magnetic properties of intravascular
contents. Since deoxygenated hemoglobin is paramagnetic (Thulborn, Waterton et al. 1982), it acts as an
endogenous intravascular paramagnetic contrast agent (Ogawa, Lee et al. 1990; Belliveau, Kennedy et al. 1991;
Turner, Jezzard et al. 1993). Blood Oxygenation Level Dependent (BOLD) contrast results from an increase in
cerebral blood flow greater than local oxygen consumption (Ogawa, Lee et al. 1990). As a result of this
discrepancy, local concentration of deoxyhemoglobin is decreased, causing an increase in signal intensity on T2*
weighted images, which allows estimation of task-related neural activation when compared to a baseline image.
fMRI has been used successfully to investigate cognitive processes and neurological disorders (Bandettini,
Wong et al. 1992; Frahm, Bruhn et al. 1992; Ogawa, Tank et al. 1992; Eden, VanMeter et al. 1996). Thus, fMRI
provides a method for examining neuronal activity with the advantage of high spatial resolution.
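The comparison of task images against a baseline image can be illustrated with a minimal Python sketch. It computes percent signal change, one common way of summarizing a BOLD response, for a single voxel. The function name and the intensity values are hypothetical and are not taken from the proposal, which does not specify its analysis pipeline.

```python
from statistics import mean


def percent_signal_change(task_signal, baseline_signal):
    """Percent change of the mean task signal relative to the mean baseline."""
    baseline = mean(baseline_signal)
    return 100.0 * (mean(task_signal) - baseline) / baseline


# Hypothetical T2*-weighted intensities for one voxel (arbitrary units).
baseline = [1000.0, 1002.0, 998.0, 1000.0]
task = [1010.0, 1012.0, 1008.0, 1010.0]

change = percent_signal_change(task, baseline)
print(f"BOLD signal change: {change:.2f}%")
```

A positive value indicates higher signal during the task than at baseline, consistent with a local decrease in deoxyhemoglobin.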
Neuronal Correlates of PTSD Severity/Response to Treatment: One commonly used paradigm in studies of
PTSD has been script driven imagery (SDI), which uses a script describing a traumatic event. Most studies have
found either decreased or no activation of the mPFC (medial prefrontal cortex) and the ACC (anterior cingulate
cortex) in PTSD (Shin, McNally et al. 1999; Britton, Phan et al. 2005; Lanius, Frewen et al. 2007). In an
emotional Stroop task Bremner, et al. found decreased activity in the ACC in PTSD compared to exposed non-
PTSD (Bremner, Vermetten et al. 2004). However, in the same study the PTSD subjects had equivalent
activation of the ACC for the classical Stroop task indicating that dysfunction of the ACC is specific to
emotional processing in PTSD. Using the CAPS (Clinician Administered PTSD Scale) score, several studies
have found a significant relationship between neuronal activity and severity of PTSD (Rauch, Whalen et al.
2000; Shin, Wright et al. 2005; Bryant, Kemp et al. 2007; Lanius, Frewen et al. 2007). These studies provide
strong, consistent evidence that changes in neuronal activity in specific regions relate to severity of PTSD
symptoms and response to treatment.
Overall, most studies report increased activation of the amygdala and decreased activation of the ACC in PTSD
with a variety of paradigms that use different types of emotionally evocative stimuli. Several studies also report
decreased activity in the mPFC, yet it remains unclear if this is related to trauma exposure rather than PTSD.
Seedat and colleagues found reductions in PTSD severity following SSRI treatment negatively correlated with
resting blood flow in the mPFC. Given the role the ACC plays in emotional regulation and affective processing,
reduced activation of the ACC is probably a major contributor to hyperactivation of the amygdala in PTSD
(Liberzon and Martis 2006). Thus, the dysfunction of the emotional processing circuit composed of the medial
prefrontal cortex, anterior cingulate, and amygdala appears to be the defining characteristic of PTSD
neurobiology. The neuronal changes that arise in PTSD and, importantly, responsiveness to treatment can
clearly be identified using fMRI.
Biological Markers and PTSD: The pathophysiologic complexity of psychiatric diseases renders diagnosis and
treatment challenging. Nonetheless, a number of altered biological molecules and pathways have been
identified as markers and used as pharmacological targets. Malfunctioning serotonin receptors have been
associated with PTSD (van Praag 2004). Increased platelet serotonin has been described in PTSD associated
with psychotic symptoms in war veterans, and is thus a trait marker (Pivac, Kozaric-Kovacic et al. 2006). The
protein p11 is among the biomarkers for PTSD that the Traumatic Stress Brain Study Group has identified
(Svenningsson, Chergui et al. 2006), finding its mRNA expression was increased in postmortem PTSD patients
as compared to matched control (Svenningsson, Chergui et al. 2006). Spivak’s group assessed male outpatients
with untreated chronic combat-related PTSD and showed that plasma DHEA and DHEA-S levels were
significantly higher compared to controls. They concluded that the neurosteroid-induced decreased GABAergic
tone could be used as a marker for chronic PTSD (Spivak, Maayan et al. 2000). In premenopausal women with
PTSD increased DHEA was associated with reduced avoidance and negative mood symptoms (Rasmusson,
Vasek et al. 2004) suggesting that cortisol and DHEA could be the modulators of recovery from PTSD
(Yehuda, Brand et al. 2006; Olff, de Vries et al. 2007). Furthermore, inflammatory markers (CRP, SAA) and
cytokines (Interleukins 2, 6, and 8) have been proposed as markers for PTSD (Sondergaard, Hansson et al. 2004;
Song, Zhou et al. 2007). For a multifactorial disease, a comprehensive and biologically meaningful analysis
should be applied. Thus, we propose applying our novel analytical approach to high-throughput serum
“OMICS” data using maximum parsimony phylogenetics.
Proteomics, Genomics (Omics) and PTSD: Molecular processes underlying psychiatric diseases cannot be
further explained by traditional techniques considering that the patient’s molecular bio-signature is responsible
for the variable symptomatology, behavioral response to traumatic events, and response to treatment. Protein mass
spectrometry (MS) and gene-expression microarray methodologies have been developed to facilitate the search
for biomarkers and only recently used for neuropsychiatric diseases including depression, Alzheimer’s disease,
and schizophrenia (Huang, Leweke et al. 2006; Davidsson, Westman-Brinkmalm et al. 2002; Brunner, Bronisch
et al. 2005; Cassidy, Zhao et al. 2007). A recent study of SNPs in FKBP5, a gene involved in glucocorticoid
receptor (GR) functioning, found there was an impact of early trauma on PTSD and the impact of PTSD and
trauma on GR sensitivity (Binder, Bradley et al. 2008). We plan to generate omics data from patients’ blood
specimens, and apply our novel functional, multidimensional, and dynamic method to analyze the
high-throughput raw data to decipher response to treatment. We named our phylogenetic-based analytical method
Phylomics.
Deciphering Complex Heterogeneous Biological Systems using Phylomics: Subjecting blood to a thorough
MS or gene-microarray analysis generates tens of thousands of gene and protein data points. Current analytical
methods, such as clustering, do not discriminate between baseline similarity (ancestral states) and what changed or
mutated (derived states) to cause or reverse the disease state. Phylomics is a universal data-mining platform
capable of analyzing MS and gene-expression data to produce biologically meaningful classification (i.e., grouping
together biologically related specimens). Phylogenetics has been widely used since the 1950s in classifying
viruses, bacteria, fungi, plants, or animals based on their shared derived characters (DeLong and Pace 2001;
Pillay, Rambaut et al. 2007; Organ, Schweitzer et al. 2008). The diagram depicting the classification is termed a
cladogram, and the biomarkers defining each related group are identified as the synapomorphies (more
definitions in Attachment 2). Patients with similar pathology share a specific set of molecular changes
(synapomorphies) for every stage of the disease; this can be utilized to group patients into classes called clades
on the basis of their shared derived molecular changes. Unfortunately, the biomedical field is still almost
exclusively dominated by statistical approaches. We are the first group to apply parsimony phylogenetics to
biomedicine and have a patent pending (Abu-Asab, Chaouchi et al. 2006; Abu-Asab, Chaouchi et al. 2008). No
other method to date offers a multi-dimensional and dynamic analysis that is capable of deciphering the
molecular bio-signature and response to treatment.
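The distinction between ancestral and derived states can be illustrated with a minimal Python sketch. It assumes one simple way of polarizing continuous data that the text does not spell out: a value is coded derived (1) when it falls outside the range observed in the non-diseased reference specimens, and ancestral (0) otherwise. The function names, the toy intensity values, and the range-based polarity rule are illustrative assumptions, not the authors' implementation. A full analysis would pass the polarized matrix to a maximum parsimony program to build the cladogram.

```python
def polarize(specimens, reference):
    """Code each value as derived (1) or ancestral (0).

    Assumption for illustration: a value is derived when it lies outside
    the [min, max] range seen in the reference (normal) specimens.
    """
    n_features = len(reference[0])
    lows = [min(row[j] for row in reference) for j in range(n_features)]
    highs = [max(row[j] for row in reference) for j in range(n_features)]
    return [
        [0 if lows[j] <= row[j] <= highs[j] else 1 for j in range(n_features)]
        for row in specimens
    ]


def shared_derived(polarized, members):
    """Return the feature indices that are derived in every listed specimen
    (candidate synapomorphies for that group)."""
    n_features = len(polarized[0])
    return [
        j for j in range(n_features)
        if all(polarized[i][j] == 1 for i in members)
    ]


# Toy data: hypothetical intensities, three features per specimen.
normals = [[1.0, 5.0, 2.0], [1.2, 5.5, 2.4], [0.9, 4.8, 2.1]]
patients = [[1.1, 7.0, 3.0], [0.5, 6.8, 2.2], [1.0, 5.2, 2.9]]

matrix = polarize(patients, normals)
group = shared_derived(matrix, members=[0, 1])
print(matrix)
print(group)
```

In this toy case the first two patients share a derived state only at the second feature, so that feature is the one candidate synapomorphy grouping them.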
B. HYPOTHESES AND OBJECTIVES
The objective of this study is to compare a CAM (Complementary and Alternative Medicine) based imagery
treatment for PTSD with exposure therapy. In addition, we will collect a number of neuroendocrine, genomic,
and neuronal measures pre- and post- treatment, which will be used to determine the changes with treatment.
Finally, the genomic/proteomic and baseline measures will be entered into a novel classification algorithm
called ‘phylomics’ to predict PTSD treatment responders from non-responders a priori.
We specifically predict that the imagery treatment will have an impact on PTSD symptom severity equivalent to
exposure therapy. We also hypothesize that the physiological impact of imagery treatment will be reflected as
positive changes in the physiological stress markers. On a neuronal basis, we predict that activation in the
anterior cingulate cortex will increase following the imagery treatment corresponding to treatment success.
Lastly, we expect that the phylomic algorithm will be able to accurately distinguish responders from
non-responders based on their baseline profile.
The results of this study will include 1) a controlled and systematic comparison of a CAM-based imagery
treatment to the more accepted exposure-based therapy, 2) identification of the changes in physiological and
neuronal markers that relate to treatment outcome, and 3) a test of a novel technique to predict treatment
outcome. Any of these end results alone has the potential to make a significant impact on PTSD treatment and
further our understanding of this debilitating disorder. Combined, this study represents a unique opportunity to
fundamentally change our understanding of PTSD.
C. PRELIMINARY DATA
Our preliminary data are twofold. First, we show the effect of an imagery-based Mind-Body Medicine Skills
program (MBMS) on salivary stress hormones measured in medical students. Second, we applied phylomics to
analyze genetic cancer data.
Figure 1: Normalization of circadian cortisol levels before and after MBMS. [Panels: Morning and Evening;
y-axis: Cortisol (ng/ml), with the normal range marked; x-axis: Pre and Post Mind Body Medicine.]
Measurements of Physiological Parameters before and after MBMS: An eleven-week elective MBMS course is
offered to first-year medical students, consisting of weekly two-hour meetings in groups of ten with their
group facilitator and co-facilitator. AM and PM saliva specimens were collected from students pre- and
post-MBMS intervention (n=24) and from a control group (n=38). Both genders were represented. The Pre-MBMS
collection occurred in early January and the Post-MBMS in May while the students were preparing for their
final exams. Saliva was processed for cortisol and DHEA-S, hormones known for their involvement in stress-response.
Figure 2: AM cortisol and DHEA-S levels of students enrolled in MBMS and controls collected before and after
completion of MBMS. [Panels: Cortisol (ng/ml) and DHEA-S (ng/ml); groups: Control and MBMS; time points:
Pre-Intervention and Post-Intervention (week 11). Significance annotations: cortisol p<0.001, p<0.0001, NS, NS;
DHEA-S p<0.001, p<0.01, NS, NS.]
Analysis: Both hormones were measured using ELISA and statistically analyzed using log-transformed values. Pre-
and post-MBMS data were analyzed using a one-sample paired t-test; pre- and post-control as well as post-MBMS
values were tested with a two-sample unpaired t-test (p<0.05).
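The two tests named above can be sketched in Python using only the standard library. The sketch computes the t statistic and degrees of freedom on log-transformed values. The function names and the example values are hypothetical. The use of a pooled-variance (Student) form for the unpaired test is an assumption, since the text does not say which variant was used. Converting t to a p value would need a statistics package and is omitted here.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic on log-transformed values; returns (t, df)."""
    diffs = [math.log(b) - math.log(a) for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def unpaired_t(group_a, group_b):
    """Pooled-variance two-sample t statistic on log-transformed values;
    returns (t, df). The pooled form is an assumption for illustration."""
    a = [math.log(x) for x in group_a]
    b = [math.log(x) for x in group_b]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical hormone concentrations (ng/ml), not data from the study.
t_paired, df_paired = paired_t([1.0, 2.0, 4.0], [2.0, 8.0, 8.0])
t_unpaired, df_unpaired = unpaired_t([1.0, math.e ** 2], [math.e ** 3, math.e ** 5])
print(t_paired, df_paired)
print(t_unpaired, df_unpaired)
```

The paired form suits the same students measured twice, while the unpaired form suits comparisons between the MBMS and control groups.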
Cortisol: Both groups started the semester with cortisol
levels within the normal range. Three months later the
MBMS participants remained within the normal range
while controls had a 31% increase in AM values and 82%
in PM values. Controls had a 43.5% (p<0.0001; 95%CI: [-1.61 to -0.67]) increase in cortisol levels by semester’s
end during final exams while MBMS participants
maintained normal levels (Figure 2). The PM values
followed the same pattern (Table 1). The MBMS program
helped students maintain their stress hormone levels
within the normal range. Furthermore, all abnormally
low AM cortisol values were raised to normal levels with
an average treatment-related increase of 7.3 fold. All
individuals with a reversed circadian cortisol secretion
pattern experienced a normalization of this adverse pattern
following MBMS intervention.
In subjects with both abnormally elevated AM and PM values in conjunction with a reversed secretion pattern,
the evening value reduction was significantly higher than the morning effect (92% vs. 78%), thus restoring both
normal range values as well as a physiological circadian distribution (Figure 1). MBMS intervention resulted in
normalization to adequate AM peak cortisol values, restoration of the physiological cortisol secretion pattern, as
well as significant reduction of elevated measurements across the daily spectrum.
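The pattern descriptions used above (abnormally low or elevated values, and a reversed circadian pattern in which the evening value exceeds the morning value) can be expressed as a small Python sketch. The function name and the normal-range limits passed in the example are placeholders chosen for illustration; the proposal does not state the limits it used.

```python
def classify_cortisol(am, pm, am_range, pm_range):
    """Describe one subject's AM/PM cortisol pair.

    am_range and pm_range are (low, high) tuples supplied by the caller.
    The pattern is flagged as reversed when the PM value exceeds the AM value.
    """
    def level(value, bounds):
        low, high = bounds
        if value < low:
            return "low"
        if value > high:
            return "high"
        return "normal"

    return {
        "am": level(am, am_range),
        "pm": level(pm, pm_range),
        "reversed": pm > am,
    }


# Placeholder normal ranges (ng/ml), hypothetical and for illustration only.
AM_RANGE = (5.0, 20.0)
PM_RANGE = (1.0, 6.0)

before = classify_cortisol(2.0, 8.0, AM_RANGE, PM_RANGE)
after = classify_cortisol(12.0, 3.0, AM_RANGE, PM_RANGE)
print(before)
print(after)
```

In this toy example the first measurement pair shows low AM cortisol, high PM cortisol, and a reversed pattern, while the second falls within range with the expected morning peak.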
DHEA-S: DHEA and DHEA-S, also known as active neurosteroids, tend to follow the cortisol patterns in
response to stress (Figure 2). Although all students started with similar levels, by semester’s end controls had
increased AM levels by 53% (p<0.002; 95%CI: [-1.60 to -0.38]) and PM values by 72% (p<0.0001; 95%CI:
[-1.65 to -0.59]). This study using saliva specimens demonstrates that MBMS intervention restored cortisol and
DHEA-S levels to normal and that this effect was maintained throughout the semester.
Table 1: Summary of statistical analysis performed on log transformed values.
Collection time       | Cortisol AM (p value) | Cortisol PM (p value) | DHEA-S AM (p value)  | DHEA-S PM (p value)
Pre-MBMS (n=24)       | 1.92 ± 0.22 (0.89)    | 0.49 ± 0.21 (0.33)    | 1.02 ± 0.16 (0.86)   | 0.86 ± 0.18 (0.21)
Pre-Control (n=16)    | 1.50 ± 0.17 (0.20)    | 0.89 ± 0.32 (0.71)    | 0.44 ± 0.27 (0.19)   | N/A*
95% CI limits         | -1.05 to 0.23         | -0.87 to 0.31         | -0.80 to 0.55        | -1.07 to 0.23
Post-MBMS (n=24)      | 1.88 ± 0.17 (0.002)   | 0.21 ± 0.12 (0.0001)  | 1.04 ± 0.21 (0.001)  | 0.64 ± 0.19 (0.0001)
Post-Control (n=22)   | 2.65 ± 0.15 (0.001)   | 1.15 ± 0.16 (0.0001)  | 1.88 ± 0.48 (0.001)  | 1.56 ± 0.10 (0.0002)
Change (%)            | 31 %                  | 82 %                  | 45 %                 | 59 %
95% CI limits         | -1.23 to -0.3         | -1.36 to -0.53        | -1.32 to -0.35       | -1.32 to -0.35
* Data did not follow normal distribution after log transformation.
Phylomics analysis is capable of early detection and risk assessment: Figure 3 shows a phylomic analysis of
serum proteomics from normal and prostate cancer specimens obtained from MS data of 36 prostate cancer
specimens and 49 non-cancerous specimens from the NCI Clinical Proteomics Program (Petricoin, Paweletz et al.
2002; Zhu, Wang et al. 2003; Abu-Asab, Chaouchi et al. 2006). The cladogram shows a hierarchical
classification of prostate cancer specimens. Each segment of the cladogram denotes a specimen (Farris 1970).
Each node on the cladogram is defined by the shared derived state(s) among specimens in one of the segments.
Topology of the cladogram also conveys general trends within the data that are not obvious otherwise by other
types of analysis (Abu-Asab, Chaouchi et al. 2006). We found three distinct sections of the cladogram
(Figure 3): the basal contains most of the normal specimens (green); the middle has transitional specimens
between the normal and cancer that could represent the “at risk” subjects (blue); and the upper section has the
cancerous ones (red). Because phylomics plots specimen classification on a hierarchical continuum, it is the
first analytical tool to identify transitional specimens (transitional clades) nested distinctly between the
cancer and non-cancerous main clades, which most likely represent individuals at risk of developing cancer or
recovering from treatment. This makes the cladogram a very useful tool to identify the transitional patterns
from healthy to cancerous tissue by directly modeling the data with minimal restrictive assumptions, and
possibly renders it a predictive tool for early disease detection and risk assessment.
Figure 3: Most parsimonious cladogram for prostate cancer based on serum proteomics from 36 prostate cancer
and 49 healthy men. Specimens had 15144 m/z data points. Lines on right side represent specimens. Red clade
corresponds to cancerous (independently assessed); Green indicates healthy; and Blue is presumed healthy but
is a transitional zone between healthy and cancerous clades.
Robustness of Phylomics analyzing multiple datasets: To illustrate
phylomics robustness, we carried out a comprehensive analysis
combining polarized matrices of 460 specimens representing ovarian
(n=143), pancreatic (n=70), and prostate (n=36) cancers as well as
non-cancerous specimens (n=211) from NCI Clinical Proteomics
Program. Analysis yielded a consensus cladogram where each of the
three cancers formed two large clades (the terminal and middle), and
numerous small transitional clades adjacent to non-cancerous clades
(Figure 4). Pancreatic and prostate clades formed sister groups in their
terminal and middle clades, and their terminal clades were nested
within the ovarian clades’ dichotomy. Ovarian specimens formed two
distinct clades. A set of transitional clades for each cancer type formed
between normal and large cancer clades (brown). Transitional clades
of each cancer type did not commingle with those of other clades.
Figure 4: Phylomic analysis of cancer types.
Significance of our findings for the Proposed Application: Our data showed the beneficial effects of
imagery-based MBMS on stress in healthy subjects. We showed that after MBMS, cortisol levels normalized in the
students that had low AM or high PM cortisol. Elevated DHEA and DHEA-S have been suggested as clinical
correlates associated with PTSD (Spivak, Maayan et al. 2000). These two hormones are active neurosteroids that
tend to reduce GABAergic tone (Spivak, Maayan et al. 2000). Thus, normalizing their levels could play a role in
PTSD symptom improvement. Our data suggest that an MBMS imagery-based program could be an efficacious
treatment for PTSD.
Figure 5: Hypothesized output of phylomics to PTSD derived from the neuronal, physiological, and
proteomic/genomic signature of individual subjects.
We have also demonstrated that phylomics is a robust
analytical tool that offers a novel method to analyze
high-throughput MS and gene-expression microarray data resulting in biologically meaningful relationships
between subjects sharing similar molecular bio-signatures in a hierarchical, dynamic, and multi-dimensional
fashion. If applied to PTSD, we expect to be able to distinguish between those subjects who are predisposed to
developing PTSD if exposed to traumatic events (at risk group) as well as treatment responders from
non-responders (Figure 5). Furthermore, since it is a dynamic analysis we expect to be able to follow the response to
treatment of each patient by analyzing changes that occur during treatment leading to improvement. This will
translate into rearrangements of subjects in the responder/non-responder clades. PTSD is a multifactorial
disease that cannot be characterized by only one or two biomarkers as is traditionally done.
D. SPECIFIC AIMS
Previous studies of CAM-based treatment for PTSD have demonstrated positive outcomes for victims of
war-related trauma in Kosovo (Gordon, Staples et al. 2004). These studies, while useful, lacked comparison to an
accepted treatment modality such as exposure therapy. Furthermore, the neuronal/physiological mechanisms
that underlie treatment effects have not been identified. We propose to compare a positive mental imagery
technique called Guided Imagery (Naparstek 2004) to Prolonged Exposure, which uses mental imagery to
revisit the traumatic event. We further propose to use neuroimaging, physiological, and proteomic/genomic
data as input to the phylomic algorithm to predict treatment response. Thus, this study will produce three main
outcomes: 1) test the efficacy of a CAM-based imagery treatment (Guided Imagery) against the established
treatment (Prolonged Exposure), 2) further elucidate the neurological/physiological mechanisms underlying
PTSD and subsequent changes related to treatment, and 3) test the ability of phylomics, a novel classification
algorithm, to predict PTSD treatment responders from non-responders.
We therefore propose the following specific aims:
Aim 1: To compare Guided Imagery against Prolonged Exposure therapy. These two treatment modalities are
similar in structure but differ in terms of emphasis on positive (Guided Imagery) and traumatic (Prolonged
Exposure) imagery. We hypothesize Guided Imagery will be as effective as exposure therapy in reducing
PTSD symptoms. Specifically, we hypothesize that at post-treatment, mean symptom scores will differ by no
more than 0.5 standard deviations between groups, and will differ by at least 0.4 SD units from pre-treatment.
Aim 2: To identify the impact of both treatment regimens on neuroendocrine markers of stress and neurobiological
changes. We will focus on both central and peripheral pathways: glucocorticoids and catecholamines as
peripheral-sympatho-adrenal markers, and the neuroactive steroids (DHEA, DHEA-S) and neuropeptide Y
(both measurable in plasma), known for their anxiolytic action and role in stress physiology. We hypothesize
that Guided Imagery treatment will result in positive changes in each of these stress markers.
Aim 3: To determine the neurobiological changes that occur with treatment. Previous fMRI studies
of PTSD have generally reported a decrease in activity in medial prefrontal cortex (mPFC) and the anterior
cingulate cortex (ACC) with a corresponding increase in the amygdala. Activity in the mPFC and ACC has
also been reported to negatively correlate with PTSD severity. We predict that activation in the ACC during
an SDI task will increase following treatment while the mPFC and amygdala will remain unchanged.
Aim 4: To methodically distinguish treatment responders from non-responders using phylomics. The baseline
neuroendocrine and neuronal measures of stress, together with the genetic profile, will be used as
input to the phylomics classification algorithm. We expect that this algorithm will be able to distinguish
responders from non-responders based on their baseline profile.
The relevance of the proposed study to the treatment of PTSD is 1) a systematic comparison of a CAM-based
treatment to the more accepted exposure-based therapy, 2) identification of changes in neural markers that relate
to treatment outcome, and 3) a test of a novel technique to predict treatment outcome.
E. RESEARCH STRATEGY
1. Experimental Design
The experimental design centers on comparing a CAM-based Guided Imagery intervention to the more standard
exposure therapy. Subjects entering the study will be randomized into one of the two arms. Baseline
assessments will include administration of a number of clinical assessment instruments, collection of saliva
and blood specimens, and fMRI scanning. Follow-up assessments include saliva collection, blood draw, and fMRI scanning
performed within two weeks of the last treatment session. PTSD symptom severity will be assessed using the
CAPS no more than four weeks after the last treatment. Details of each of these procedures are described below.
2. Subject Selection
Subjects will be recruited through the VA with the assistance of Drs. Richard Amdur and Marc Blackman.
Other avenues will also be pursued, including working with physicians from Walter Reed Medical Center and
the National Naval Medical Center to provide referrals. In addition, we will work with local veterans
organizations to assist with disseminating information about this study.
Participants:
Male and female subjects 18-55 years who have direct combat experience and meet criteria for combat-related
PTSD will be considered for this study.
Inclusion Criteria: All subjects in the PTSD cohorts must meet the diagnosis of PTSD based on a clinical
interview assessment using the DSM-IV-R criteria. Subjects will be screened to include only those with combat
related trauma. Subjects will be selected only if they do not plan to seek other treatment during the
study period. If this proves to be impractical, this restriction will be eliminated.
Exclusion Criteria: Subjects who are younger than 18 or older than 55 years of age will be excluded.
Additional exclusion criteria: total WASI IQ score < 85, less than strongly right-handed (Edinburgh < 90),
diagnosis of psychosis, history of previous psychiatric treatment other than for PTSD, current psychotropic drug
use, overt neurological injury or disease, seizure disorder, and mood disorders. Subjects who have suffered
traumatic brain injury (TBI), a closed head injury, a concussion, or been knocked unconscious for a period of
time as a result of head injury will be excluded. Finally, individuals will be excluded due to psychosis, mania,
current suicidal ideation and substance abuse/dependence as determined by screening procedures.
3. Safety Concerns
Due to safety concerns, participants with psychosis, mania, current suicidal ideation, or substance
abuse/dependence will be excluded from the study. Subjects will be monitored by interventionists for emerging
problems in these areas. All subjects will be given referral information for crisis concerns and a study contact
number will be answered 24/7 by a study team member. Special care will be taken in screening all subjects for
ferromagnetic metallic objects, implants, and shrapnel to ensure subject safety in the MRI scanner. A thorough
review consists of a standard 60-question list including injury with metallic objects. Medical records are
examined to verify that all implants are MRI-compatible.
4. Description of Data Collection Procedures
Subject Assignment and Intervention:
Randomization Procedure: Subjects will be randomized using permuted blocks of sizes 2, 4, and 6 (Friedman, Furberg
et al. 1998), which keeps the two arms balanced in size while making upcoming assignments difficult to predict. Once a participant
completes the baseline assessment and is determined to be eligible, he/she will receive the next random number
assignment. This process will be conducted by a data manager who is otherwise not involved in the study.
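The permuted-block procedure described above can be sketched as follows. This is a minimal illustration, not the study's allocation software: the function name, the arm labels, and the seed are assumptions introduced here. Each block holds an equal number of assignments to each arm in random order, and the block size is drawn at random from 2, 4, and 6.

```python
import random


def permuted_block_schedule(n_subjects,
                            arms=("Guided Imagery", "Prolonged Exposure"),
                            block_sizes=(2, 4, 6),
                            seed=None):
    """Return a list of arm assignments built from randomly sized permuted blocks.

    Hypothetical helper for illustration only. Every block size must be a
    multiple of the number of arms so that each block is balanced.
    """
    if any(size % len(arms) for size in block_sizes):
        raise ValueError("block sizes must be multiples of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        size = rng.choice(block_sizes)             # pick the next block size at random
        block = list(arms) * (size // len(arms))   # equal allocation within the block
        rng.shuffle(block)                         # random order inside the block
        schedule.extend(block)
    return schedule[:n_subjects]                   # the last block may be cut short


if __name__ == "__main__":
    assignments = permuted_block_schedule(12, seed=2008)
    for number, arm in enumerate(assignments, start=1):
        print(number, arm)
```

Because every completed block is balanced, the two arm sizes can differ by at most half of the largest block (3 subjects) at any point during enrollment.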
Intervention Conditions: Both Guided Imagery and Prolonged Exposure will be delivered in 11 weekly 90-minute audiotaped sessions. Manuals will guide interventionists. 1) Participants in the Guided Imagery arm
will receive treatment according to the intervention developed by Naparstek (Naparstek 2004) using a 3-stage
approach involving stabilization and self-soothing, cognitive and emotional integration, and long-term
functioning. 2) Participants in the Prolonged Exposure arm will receive treatment for PTSD used by Foa (Foa,
Dancu et al. 1999; Foa, Hembree et al. 2005), Schnurr (Schnurr, Friedman et al. 2007) and others. Assessment
of symptoms is routinely conducted with self-report instruments to monitor progress and assess safety.
Interventionists: Interventionists will be mental health clinicians, each experienced in delivering either Guided
Imagery or Prolonged Exposure and in working with veteran populations. Adherence Measures: We
will use measures to capture the essential techniques of both interventions to include items related to techniques
that typify the intervention as well as those that should not be used in this method. Independent raters reviewing
audiotapes of sessions will make ratings using this measure. Competence will include items assessing the skill with
which the interventionist phrased interventions, the timing of the interventions, and the appropriateness of comments at the
time they were given, as well as nonspecific items, such as his/her degree of warmth and supportiveness.
The same independent rater who assessed adherence will rate this measure, based on the same audiotapes.
fMRI Stimulation Paradigm:
The paradigms optimized for fMRI will include a script-driven imagery (SDI) task and an emotional counting
Stroop. The SDI task presents short scenarios using auditory presentation of the subject's traumatic event, gathered through an interview. Scripts will alternate with a neutral
story in a block design. After each script, subjects use an MRI-compatible joystick to rate the
intensity of the sensations experienced on a scale of 1 to 7. The Emotional Counting Stroop uses negative words that
include both trauma-related and non-trauma-related negative associations. This provides an excellent assessment
of limbic emotional regulation compared to executive control. To isolate the changes specific to emotional
stimuli, the classical Stroop task is also used. Both Stroop tasks will be presented using a rapid event-related design.
Proteomic and Genomic Assessments:
Proteomics: Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is one of a large
collection of high-throughput technologies utilized for proteomics studies. The resulting mass spectra are mass-to-charge (m/z) ratio values in which the intensity of peaks is correlated to the peptide concentration in the
analyzed fraction. All data will be analyzed with UNIPAL as previously described (Abu-Asab, Chaouchi et al.
2006; Abu-Asab, Chaouchi et al. 2008). Protein expression level: Western blotting - Protein expression and
quantification are determined using the corresponding antibody of the targeted protein. Beta-actin is used to
normalize band quantification. Signals are detected via chemiluminescence as described before (Amri,
Ogwuegbu et al. 1996). ELISA: standard kits are used for NPY, DHEA, DHEA-S, cortisol and catecholamines.
Genomics: Gene-expression microarray - Peripheral blood will be collected directly into PaxGene tubes which
stabilize and protect RNA from degradation (Qiagen), then frozen at -80°C until use. Total RNA is isolated with
an RNeasy Mini Kit (Qiagen). RNA amount and integrity are assessed, and samples are analyzed using the fully integrated Affymetrix GeneChip
Instrument System. Gene array analysis uses a solid phase assay, followed
by RT-PCR and northern blot analysis to eliminate false positive results. For real-time RT-PCR, RNA will
be isolated using a High Pure RNA Isolation Kit (Roche Diagnostic) and cDNA synthesized by iScript
(BioRad). RT-PCR will be performed using the iCycler iQ Detection System (BioRad) and the TaqMan PCR Reagent
Kit with pre-designed primers and fluorescein-labeled probes from Applied Biosystems (Foster City, CA).
Mental Health Assessments:
Screening: The Structured Clinical Interview for DSM-IV (SCID) (First et al., 1994; Spitzer et al., 1992) is
an interview for both past and current Axis I diagnoses based on DSM-IV. Selected modules are used to screen
out subjects with current substance abuse or dependence, lifetime or current psychosis, and bipolar disorder. If
participants endorse the suicide item, their level of intent will be explored by the trained assessor. If participants
endorse current suicidal intent, they will be excluded from the study and referred to the first available on-site
physician for disposition. The SCID has adequate test-retest reliability. Kappa values in patient samples were
.61 for current and .68 for lifetime diagnoses for most of the major categories (e.g., bipolar disorder, alcohol
abuse/dependence; major depressive disorder).
Combat Exposure: The Combat Exposure Scale (CES) (Keane, Fairbank et al. 1989) is a 7-item self-report
measure to assess wartime stressors experienced by combatants. Items are rated on a 5-point frequency (1 =
“no/never” to 5 = “> 50 times”), 5-point duration (1 = “never” to 5 = “> 6 months”), 4-point frequency (1 =
“no” to 4 = “more than 12 times”) or 4-point degree of loss (1 = “no one” to 4 = “more than 50%”) scale. The
total CES score (ranging from 0 to 41) is calculated by using a sum of weighted scores, which can be classified
into 1 of 5 categories of combat exposure ranging from “light” to “heavy.” The CES will be used at baseline.
PTSD: The Clinician Administered PTSD Scale (CAPS) (Blake, Weathers et al. 1995) will be used to
diagnose current PTSD and assess severity of PTSD symptoms. This scale assesses the frequency and intensity
of the 17 symptoms in the DSM-IV PTSD criteria. A frequency rating of at least 1 on a 0 to 4 scale ("once or
twice within the past month") and an intensity rating of at least 2 on a 0 to 4 scale ("moderate") will qualify for
presence of the symptom for diagnostic purposes. Studies with combat veterans have been used to demonstrate
the reliability and validity of the CAPS (Weathers et al, 1992a, 1992b; Weathers, Blake & Litz, 1991). Internal
consistency (α coefficient) was estimated to be 0.94 for severity score (frequency and intensity) and test-retest
reliability ranged from 0.90 to 0.98. CAPS total severity score correlated with other established measures of
PTSD suggesting good convergent validity. CAPS will be completed at baseline and completion of treatment.
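The symptom-presence rule and the severity score described above can be sketched as follows. This is an illustration under stated assumptions, not the study's scoring software: the function names are hypothetical, the 17 items are assumed to be listed in DSM-IV order (re-experiencing items 1-5, avoidance/numbing items 6-12, hyperarousal items 13-17), and only the symptom-cluster portion of the diagnosis is checked (at least 1, 3, and 2 symptoms respectively), not the exposure, duration, or impairment criteria.

```python
# Cluster positions (0-based) and the minimum number of symptoms required.
# The item ordering is an assumption made for this sketch.
CLUSTERS = {
    "B_reexperiencing": (range(0, 5), 1),
    "C_avoidance_numbing": (range(5, 12), 3),
    "D_hyperarousal": (range(12, 17), 2),
}


def symptom_present(frequency, intensity):
    """Rule from the text: frequency >= 1 and intensity >= 2, each on a 0-4 scale."""
    return frequency >= 1 and intensity >= 2


def score_caps(items):
    """Summarize 17 (frequency, intensity) pairs.

    Returns the number of symptoms present, the total severity score
    (sum of frequency plus intensity over all items), and whether the
    symptom-cluster counts are met.
    """
    if len(items) != 17:
        raise ValueError("expected 17 (frequency, intensity) pairs")
    present = [symptom_present(f, i) for f, i in items]
    total_severity = sum(f + i for f, i in items)
    clusters_met = all(
        sum(present[k] for k in positions) >= minimum
        for positions, minimum in CLUSTERS.values()
    )
    return {
        "symptoms_present": sum(present),
        "total_severity": total_severity,
        "symptom_clusters_met": clusters_met,
    }


if __name__ == "__main__":
    example = [(2, 2)] * 17  # made-up ratings, not study data
    print(score_caps(example))
```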
Depression: The Patient Health Questionnaire (PHQ), the self-report version of the Primary Care Evaluation
of Mental Disorders (PRIME-MD) (Spitzer, Kroenke et al. 1999), will be used to assess depression, a common
comorbid condition. This instrument has good psychometric properties: for the diagnosis of any psychiatric
disorder, kappa = 0.71 and the overall accuracy rate = 88% (Spitzer & Williams, 1994).
Participant Feedback: At the conclusion of the trial, participants will be invited to provide feedback during post-intervention interviews with study staff regarding their interest in the intervention, comfort level, consistency
with cultural values, and perceived utility of the intervention as well as suggestions for change to increase utility
of the intervention. This information will be used to enhance interpretation of quantitative data.
5. Data Acquisition and Analysis
Functional MRI Data Acquisition: The fMRI data will be acquired on the research-dedicated Siemens Trio
3.0T scanner, located at Georgetown University, with gradients suitable for echo-planar imaging sequences. A
whole-brain high-resolution T1-weighted scan with an effective resolution of 1.0 mm³ is acquired to assess
brain morphology and to localize functional results. Functional MRI scans will be acquired using an EPI (echo-planar imaging) scan with a 2 s TR, 30 slices, and an effective resolution of 3.0 mm³.
Physiological and Behavioral Monitoring: Assessment of the subject’s mental state during the scanning
sessions will utilize the Invivo Millennia (Invivo Research, Orlando, FL) physiological monitoring system to
digitally record heart rate (ECG), respiration, and pulse oximetry (SpO2). Galvanic skin response will also be
recorded using the MRA GSR system (MRA, Washington, PA). The physiological measures will be integrated
into the fMRI data analysis to identify neuronal responses related to the changes in these measures.
Functional MRI Data Analysis: We currently use a combination of tools including statistical parametric
mapping (SPM), MEDx, which was developed in large part under the direction and design of Dr. VanMeter,
FSL, and AFNI for individual and group analyses. Pre-processing of fMRI data includes Correction for
Geometric Distortion, which occurs due to inhomogeneities in the scanner’s static magnetic field, using a field map
(Jezzard and Balaban 1995). Head Motion Correction uses rigid-body transformations (Woods, Grafton et al.
1998). High-pass filtering removes artifactual low-frequency Signal Drift. Spatial Normalization transforms
individual subjects’ images into a standard coordinate system via nonlinear transformations, which allows for
inter-subject averaging to improve statistical sensitivity (Ashburner and Friston 1997; Woods, Grafton et al.
1998). Spatial Smoothing is applied to remove noise locally within the images and to allow for statistical
inference using Gaussian random field theory (Worsley 2005).
fMRI Statistical Analysis: We will use a Mixed-Effects Statistical Analysis technique that consists of two stages
(Strange, Portas et al. 1999; Penny, Holmes et al. 2003). The first-level analysis uses a fixed-effects single
subject analysis followed by a second-level analysis that uses a random-effects group analysis on the summary
statistical images from the first-level analysis. Correction for Multiple Comparisons will use Gaussian random
field theory, which takes into account not only the multiplicity of simultaneous tests but also the spatial
smoothness of the data (Friston, Worsley et al. 1994; Worsley, Marrett et al. 1996). An alternative method for
addressing this issue uses the false discovery rate (FDR) (Genovese, Lazar et al. 2002).
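The FDR alternative can be illustrated with the Benjamini-Hochberg step-up procedure, which is the procedure Genovese et al. apply to voxel-wise p-values. The sketch below is a plain-Python illustration; the function name and the example p-values are assumptions introduced here and are not study data, and in practice the thresholding would be done inside the imaging packages named above.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests are declared significant.

    Sort the m p-values, find the largest rank k with p_(k) <= q * k / m,
    and reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank                           # keep the largest passing rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values for ten hypothetical voxels.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

Unlike the random field correction, this controls the expected proportion of false positives among the voxels declared active rather than the chance of any false positive.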
Phylomics – Computational application: As a computational platform, phylomics encompasses two universal
algorithms that are run consecutively to produce the classification of specimens. First, UNIPAL: Universal
Parsing Algorithm to carry out polarity assessment of data points. This program was developed by the
investigators to perform outgroup comparison on the specimens. Second, MIX: a maximum parsimony program
which carries out the Wagner and Camin-Sokal parsimony methods (Felsenstein 1989). MIX produces the most
parsimonious cladogram for a dataset.
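The polarity-assessment step can be pictured with the sketch below. It is not the UNIPAL implementation: it assumes, for illustration only, that a data point is scored as derived (1) when it falls outside the range observed in the outgroup specimens and as ancestral (0) otherwise. The function name and the example values are hypothetical. The resulting 0/1 matrix is the kind of discrete-character input that a parsimony program such as MIX takes.

```python
def polarity_matrix(outgroup, specimens):
    """Score each specimen value as 0 (within outgroup range) or 1 (outside it).

    outgroup  : list of rows, one per outgroup (e.g. normal) specimen
    specimens : list of rows, one per study specimen
    Columns are features such as m/z peak intensities or expression values.
    """
    n_features = len(outgroup[0])
    lows = [min(row[j] for row in outgroup) for j in range(n_features)]
    highs = [max(row[j] for row in outgroup) for j in range(n_features)]
    return [
        [0 if lows[j] <= row[j] <= highs[j] else 1 for j in range(n_features)]
        for row in specimens
    ]


if __name__ == "__main__":
    # Made-up values: two outgroup specimens and two study specimens, two features.
    outgroup = [[1.0, 5.0], [2.0, 6.0]]
    specimens = [[1.5, 7.0], [0.5, 5.5]]
    for row in polarity_matrix(outgroup, specimens):
        print("".join(str(state) for state in row))
```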
Statistical analysis - Biochemical and molecular experiments: To compare measured expression in patients’
specimens, normality and homoscedasticity are checked and appropriate transformations applied (log or
arcsine-square root). If transformed results follow a normal distribution and are homoscedastic, one-way
ANOVA will be used to compare the mean values. Otherwise the Wilcoxon rank-sum test for 2 groups will be
used. For the quantitative expression of a defined molecular target measured by Western blotting, RT-PCR, or
northern blotting, the data will be divided into negative and positive, followed by appropriate tests and transformations.
Parametric tests will be used when possible and non-parametric tests otherwise.
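The two transformations and the non-parametric fall-back named above can be sketched with the Python standard library. This is an illustration, not the study's analysis code: the function names are hypothetical, the rank-sum p-value uses the large-sample normal approximation without a tie correction to the variance, and the example values are made up. A statistics package would normally be used for the normality and homoscedasticity checks and for the ANOVA itself.

```python
import math
from statistics import NormalDist


def log_transform(values):
    """Natural-log transform for positive, right-skewed measurements."""
    return [math.log(v) for v in values]


def arcsine_sqrt_transform(proportions):
    """Arcsine-square-root transform for proportions in [0, 1]."""
    return [math.asin(math.sqrt(p)) for p in proportions]


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, z, p) where W is the sum of the ranks of the first group.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


if __name__ == "__main__":
    responders = [1.2, 0.8, 1.9, 1.1]      # made-up expression values
    non_responders = [2.4, 3.1, 2.2, 2.9]
    print(rank_sum_test(responders, non_responders))
```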
6. Potential Problems and Alternatives
Budgetary limitations allow us to include assessments only at baseline and post-intervention. If funded, we
will seek additional outside funding to permit assessments at 3, 6, and 12 months. It is possible that subjects
will become distressed during the intervention sessions or during the assessments or other experimental procedures
(e.g., SDI). We plan to conduct intervention sessions in the GCRC (General Clinical Research Center), thus
allowing for readily available back-up medical and psychiatric staff, who are also available to respond in the
Imaging Center as needed. The multiple endpoints of this study outside of the intervention will ensure valuable
results even if the Guided Imagery treatment is not successful, which in and of itself is an important outcome.
7. Statistical Power Analyses
To test the equivalence of Imagery & Control treatments on PTSD symptom severity at post-treatment, we will
do an intention-to-treat analysis using 1-way ANOVA with H0: the two group means are no more than 0.5 SD
units apart; H1: the exposure intervention is superior by at least 0.5 SD units. In order to have power > .80 to
test this directional hypothesis, we would need a total N of 101, or about 51 per group, with alpha = .05, using a 1-tailed test based on G*Power 3 (Faul, Erdfelder et al. 2007).
To minimize problems of assay sensitivity and biased end-point ratings inherent in noninferiority trials
(Snapinn 2000), we will use a repeated-measures ANOVA, testing equivalence of pre-post-treatment change in PTSD
symptom severity between arms. With N=101, assuming a pre-post r of .50, the power will be > .99 to detect
an interaction and a time effect in which each explains 10% of the total within-group variance.
To demonstrate efficacy of the experimental intervention, we will use a paired t-test. Assuming
pre-post r of .50, with n=51 in the Imagery group, this test would have power > .80 to detect pre-post change
of 0.4 SD units (with alpha=.05, 2-tailed).
[Figure: G*Power plot of Power (1-β err prob) versus Total sample size. t tests - Means: Difference between
two dependent means (matched pairs); Tail(s) = Two, α err prob = 0.05, Effect size dz = 0.4]
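The sample-size reasoning in the power analysis above can be approximated with a short calculation. The sketch uses the normal approximation to the t distribution, so it is a rough cross-check rather than a reproduction of the G*Power computation, which is based on the noncentral t distribution and gives slightly larger sample sizes. The function names are assumptions introduced for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group_two_sample(d, alpha=0.05, power=0.80, tails=1):
    """Approximate n per group to detect a standardized difference d between two arms."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / tails)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def power_paired(dz, n, alpha=0.05, tails=2):
    """Approximate power of a paired test for effect size dz with n pairs.

    The negligible probability of rejecting in the wrong direction is ignored.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / tails)
    return _STD_NORMAL.cdf(dz * math.sqrt(n) - z_alpha)


if __name__ == "__main__":
    # Between-arm comparison: 0.5 SD margin, alpha = .05 one-tailed, power = .80.
    print(n_per_group_two_sample(0.5, alpha=0.05, power=0.80, tails=1))
    # Pre-post change within the Imagery arm: dz = 0.4, n = 51, alpha = .05 two-tailed.
    print(round(power_paired(0.4, 51, alpha=0.05, tails=2), 3))
```

With a pre-post correlation of .50, the standardized change dz equals the change expressed in SD units, which is why 0.4 is used directly as dz here.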
F. SUMMARY
This proposal brings together psychologists who specialize in PTSD treatment (Drs. Dutton and Amdur) with
basic scientists who study the ability of CAM-types of treatment to reduce stress using neurophysiological
quantitative measures (Dr. Amri) and the neurobiological basis of various disorders (Dr. VanMeter). While these
individuals come from very different backgrounds, the three PIs have worked together as facilitators in the
MBMS program of stress management developed for use in the School of Medicine. In addition, the three PIs
have two current collaborations underway: a study examining the effect of various CAM modalities on stress biomarkers
and a neuroimaging study of PTSD. The proposed study utilizes a truly synergistic approach that leverages this
group of investigators' unique talents to examine the efficacy of a CAM-based treatment (Guided Imagery)
compared to a traditional treatment (exposure therapy). In addition, this study will assess changes in
physiological measures of stress and the underlying neuronal patterns of activity as a function of the two
treatments. Lastly, the baseline neuroendocrine, genomic, and neuronal patterns will be used to classify a priori
treatment responders from non-responders using phylomics (patent pending) developed by Dr. Amri. The output
of the phylomic algorithm will provide not only a neurophysiological/genomic signature of PTSD but also a
stratified classification of subjects that could be used to target treatments to specific individuals suffering from
PTSD. Overall, this study has the potential to make a major impact on the field of PTSD and its treatment.
References:
Abu-Asab, M., M. Chaouchi, et al. (2006). "Phyloproteomics: what phylogenetic analysis reveals about serum
proteomics." J Proteome Res 5(9): 2236-40.
Abu-Asab, M., M. Chaouchi, et al. (2008). "Evolutionary medicine: A meaningful connection between omics,
disease, and treatment." Proteomics Clin Appl 2(2): 122-134.
Amri, H., S. O. Ogwuegbu, et al. (1996). "In vivo regulation of peripheral-type benzodiazepine receptor and
glucocorticoid synthesis by Ginkgo biloba extract EGb 761 and isolated ginkgolides." Endocrinology
137(12): 5707-18.
Ashburner, J. and K. Friston (1997). "Multimodal image coregistration and partitioning--a unified framework."
Neuroimage 6(3): 209-17.
Bandettini, P. A., E. C. Wong, et al. (1992). "Time course EPI of human brain function during task activation."
Magn Reson Med 25(2): 390-7.
Belliveau, J. W., D. N. Kennedy, Jr., et al. (1991). "Functional mapping of the human visual cortex by magnetic
resonance imaging." Science 254(5032): 716-9.
Binder, E. B., R. G. Bradley, et al. (2008). "Association of FKBP5 polymorphisms and childhood abuse with
risk of posttraumatic stress disorder symptoms in adults." Jama 299(11): 1291-305.
Blake, D. D., F. W. Weathers, et al. (1995). "The development of a Clinician-Administered PTSD Scale."
Journal of Traumatic Stress 8: 75-90.
Bremner, J. D., E. Vermetten, et al. (2004). "Neural correlates of the classic color and emotional stroop in
women with abuse-related posttraumatic stress disorder." Biol Psychiatry 55(6): 612-20.
Britton, J. C., K. L. Phan, et al. (2005). "Corticolimbic blood flow in posttraumatic stress disorder during script-driven imagery." Biol Psychiatry 57(8): 832-40.
Brunner, J., T. Bronisch, et al. (2005). "Proteomic analysis of the CSF in unmedicated patients with major
depressive disorder reveals alterations in suicide attempters." Eur Arch Psychiatry Clin Neurosci 255(6):
438-40.
Bryant, R. A., A. H. Kemp, et al. (2007). "Enhanced amygdala and medial prefrontal activation during
nonconscious processing of fear in posttraumatic stress disorder: An fMRI study." Hum Brain Mapp.
Cassidy, F., C. Zhao, et al. (2007). "Genome-wide scan of bipolar disorder and investigation of population
stratification effects on linkage: support for susceptibility loci at 4q21, 7q36, 9p21, 12q24, 14q24, and
16p13." Am J Med Genet B Neuropsychiatr Genet 144(6): 791-801.
Clinical Practice Guideline Workgroup (2004). VA/DoD Clinical Practice Guideline for the Management of
Post-Traumatic Stress, Department of Veterans Affairs and Health Affairs, Department of Defense.
Davidsson, P., A. Westman-Brinkmalm, et al. (2002). "Proteome analysis of cerebrospinal fluid proteins in
Alzheimer patients." Neuroreport 13(5): 611-5.
DeLong, E. F. and N. R. Pace (2001). "Environmental diversity of bacteria and archaea." Syst Biol 50(4): 470-8.
Eden, G. F., J. W. VanMeter, et al. (1996). "Abnormal processing of visual motion in dyslexia revealed by
functional brain imaging." Nature 382(6586): 66-9.
Farris, J. S., A. G. Kluge, and M. J. Eckhart (1970). "On predictivity and efficiency." Systematic Zoology 19:
363-372.
Faul, F., E. Erdfelder, et al. (2007). "G*Power 3: a flexible statistical power analysis program for the social,
behavioral, and biomedical sciences." Behav Res Methods 39(2): 175-91.
Felsenstein, J. (1989). "PHYLIP: Phylogeny Inference Package (version 3.2)." Cladistics: 164-166.
Foa, E. B., C. V. Dancu, et al. (1999). "A comparison of exposure therapy, stress inoculation training, and their
combination for reducing posttraumatic stress disorder in female assault victims." J Consult Clin
Psychol 67(2): 194-200.
Foa, E. B., E. A. Hembree, et al. (2005). "Randomized trial of prolonged exposure for posttraumatic stress
disorder with and without cognitive restructuring: outcome at academic and community clinics." J
Consult Clin Psychol 73(5): 953-64.
Frahm, J., H. Bruhn, et al. (1992). "Dynamic MR imaging of human brain oxygenation during rest and photic
stimulation." J Magn Reson Imaging 2(5): 501-5.
Friedman, L. M., C. D. Furberg, et al. (1998). Fundamentals of clinical trials, 3rd ed. New York, Springer.
Friston, K. J., K. J. Worsley, et al. (1994). "Assessing the significance of focal activations using their spatial
extent." Human Brain Mapping 1: 214-220.
Genovese, C. R., N. A. Lazar, et al. (2002). "Thresholding of statistical maps in functional neuroimaging using
the false discovery rate." Neuroimage 15(4): 870-8.
Gordon, J. S., J. K. Staples, et al. (2004). "Treatment of posttraumatic stress disorder in postwar Kosovo high
school students using mind-body skills groups: a pilot study." J Trauma Stress 17(2): 143-7.
Hoge, C. W. (2006). "Deployment to the Iraq war and neuropsychological sequelae." Jama 296(22): 2678-9;
author reply 2679-80.
Hoge, C. W., J. L. Auchterlonie, et al. (2006). "Mental health problems, use of mental health services, and
attrition from military service after returning from deployment to Iraq or Afghanistan." Jama 295(9):
1023-32.
Hoge, C. W., C. A. Castro, et al. (2004). "Combat duty in Iraq and Afghanistan, mental health problems, and
barriers to care." N Engl J Med 351(1): 13-22.
Hoge, C. W., A. Terhakopian, et al. (2007). "Association of posttraumatic stress disorder with somatic
symptoms, health care visits, and absenteeism among Iraq war veterans." Am J Psychiatry 164(1): 150-3.
Huang, J. T., F. M. Leweke, et al. (2006). "Disease biomarkers in cerebrospinal fluid of patients with first-onset
psychosis." PLoS Med 3(11): e428.
Institute of Medicine: Committee on Treatment of Posttraumatic Stress Disorder (2007). Treatment of
Posttraumatic Stress Disorder: An Assessment of the Evidence, The National Academies Sciences.
Jezzard, P. and R. S. Balaban (1995). "Correction for geometric distortion in echo planar images from B0 field
variations." Magn Reson Med 34(1): 65-73.
Keane, T., J. Fairbank, et al. (1989). "Clinical evaluation of a measure to assess combat exposure."
Psychological Assessment 1: 53-55.
Lanius, R. A., P. A. Frewen, et al. (2007). "Neural correlates of trauma script-imagery in posttraumatic stress
disorder with and without comorbid major depression: a functional MRI investigation." Psychiatry Res
155(1): 45-56.
Liberzon, I. and B. Martis (2006). "Neuroimaging studies of emotional responses in PTSD." Ann N Y Acad Sci
1071: 87-109.
Milliken, C. S., J. L. Auchterlonie, et al. (2007). "Longitudinal Assessment of Mental Health Problems Among
Active and Reserve Component Soldiers Returning From the Iraq War." Jama 298(18): 2141-2148.
Naparstek, B. (2004). Invisible heroes: Survivors of trauma and how they heal. New York, Bantam Dell.
Office of Inspector General (2005). Review of State Variances in VA Disability Compensation Payments,
Department of Veterans Affairs: vii.
Ogawa, S., T. M. Lee, et al. (1990). "Oxygenation-sensitive contrast in magnetic resonance image of rodent
brain at high magnetic fields." Magn Reson Med 14(1): 68-78.
Ogawa, S., D. W. Tank, et al. (1992). "Intrinsic signal changes accompanying sensory stimulation: functional
brain mapping with magnetic resonance imaging." Proc Natl Acad Sci U S A 89(13): 5951-5.
Olff, M., G. J. de Vries, et al. (2007). "Changes in cortisol and DHEA plasma levels after psychotherapy for
PTSD." Psychoneuroendocrinology 32(6): 619-26.
Organ, C. L., M. H. Schweitzer, et al. (2008). "Molecular phylogenetics of mastodon and Tyrannosaurus rex."
Science 320(5875): 499.
Penny, W. D., A. P. Holmes, et al. (2003). Random effects analysis. Human Brain Function. R. S. J.
Frackowiak, K. J. Friston, C. Frith, et al., Academic Press.
Petricoin, E. E., C. P. Paweletz, et al. (2002). "Clinical applications of proteomics: proteomic pattern
diagnostics." J Mammary Gland Biol Neoplasia 7(4): 433-40.
Pillay, D., A. Rambaut, et al. (2007). "HIV phylogenetics." Bmj 335(7618): 460-1.
Pivac, N., D. Kozaric-Kovacic, et al. (2006). "Platelet serotonin in combat related posttraumatic stress disorder
with psychotic symptoms." J Affect Disord 93(1-3): 223-7.
Rasmusson, A. M., J. Vasek, et al. (2004). "An increased capacity for adrenal DHEA release is associated with
decreased avoidance and negative mood symptoms in women with PTSD." Neuropsychopharmacology
29(8): 1546-57.
Rauch, S. L., P. J. Whalen, et al. (2000). "Exaggerated amygdala response to masked facial stimuli in
posttraumatic stress disorder: a functional MRI study." Biol Psychiatry 47(9): 769-76.
Schnurr, P. P., M. J. Friedman, et al. (2007). "Cognitive behavioral therapy for posttraumatic stress disorder in
women: a randomized controlled trial." Jama 297(8): 820-30.
Shin, L. M., R. J. McNally, et al. (1999). "Regional cerebral blood flow during script-driven imagery in
childhood sexual abuse-related PTSD: A PET investigation." Am J Psychiatry 156(4): 575-84.
Shin, L. M., C. I. Wright, et al. (2005). "A functional magnetic resonance imaging study of amygdala and
medial prefrontal cortex responses to overtly presented fearful faces in posttraumatic stress disorder."
Arch Gen Psychiatry 62(3): 273-81.
Snapinn, S. M. (2000). "Noninferiority trials." Curr Control Trials Cardiovasc Med 1(1): 19-21.
Sondergaard, H. P., L. O. Hansson, et al. (2004). "The inflammatory markers C-reactive protein and serum
amyloid A in refugees with and without posttraumatic stress disorder." Clin Chim Acta 342(1-2): 93-8.
Song, Y., D. Zhou, et al. (2007). "Disturbance of serum interleukin-2 and interleukin-8 levels in posttraumatic
and non-posttraumatic stress disorder earthquake survivors in northern China."
Neuroimmunomodulation 14(5): 248-54.
Spitzer, R. L., K. Kroenke, et al. (1999). "Validation and utility of a self-report version of PRIME-MD: the
PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire."
JAMA 282(18): 1737-44.
Spivak, B., R. Maayan, et al. (2000). "Elevated circulatory level of GABA(A)--antagonistic neurosteroids in
patients with combat-related post-traumatic stress disorder." Psychol Med 30(5): 1227-31.
Strange, B. A., C. M. Portas, et al. (1999). "Random effects analyses for event-related fMRI." Neuroimage 9:
36.
Svenningsson, P., K. Chergui, et al. (2006). "Alterations in 5-HT1B receptor function by p11 in depression-like
states." Science 311(5757): 77-80.
Thulborn, K. R., J. C. Waterton, et al. (1982). "Oxygenation dependence of the transverse relaxation time of
water protons in whole blood at high field." Biochim Biophys Acta 714(2): 265-70.
Turner, R., P. Jezzard, et al. (1993). "Functional mapping of the human visual cortex at 4 and 1.5 tesla using
deoxygenation contrast EPI." Magn Reson Med 29(2): 277-9.
van Praag, H. M. (2004). "The cognitive paradox in posttraumatic stress disorder: a hypothesis." Prog
Neuropsychopharmacol Biol Psychiatry 28(6): 923-35.
Woods, R. P., S. T. Grafton, et al. (1998). "Automated image registration: I. General methods and intrasubject,
intramodality validation." J Comput Assist Tomogr 22(1): 139-52.
Woods, R. P., S. T. Grafton, et al. (1998). "Automated image registration: II. Intersubject validation of linear
and nonlinear models." J Comput Assist Tomogr 22(1): 153-65.
Worsley, K. J. (2005). "Spatial smoothing of autocorrelations to control the degrees of freedom in fMRI
analysis." Neuroimage 26(2): 635-41.
Worsley, K. J., S. Marrett, et al. (1996). "Searching scale space for activation in PET images." Human Brain
Mapping 4: 74-90.
Yehuda, R., S. R. Brand, et al. (2006). "Clinical correlates of DHEA associated with post-traumatic stress
disorder." Acta Psychiatr Scand 114(3): 187-93.
Zhu, W., X. Wang, et al. (2003). "Detection of cancer-specific markers amid massive mass spectral data." Proc
Natl Acad Sci U S A 100(25): 14666-71.
Acronyms
ACC - anterior cingulate cortex
AFNI - Analysis of Functional NeuroImages
ANOVA - analysis of variance
BOLD - Blood Oxygenation Level Dependent
CAM - Complementary and Alternative Medicine
CAPS - Clinician-Administered PTSD Scale
CBT - cognitive behavior therapy
cDNA - complementary Deoxyribonucleic acid
CES - Combat Exposure Scale
CRP - C-reactive protein
DHEA - Dehydroepiandrosterone
DHEA-S - Dehydroepiandrosterone sulfate
DIS-IV - Diagnostic Interview Schedule for DSM-IV
DSM-IV-R - Diagnostic and Statistical Manual of Mental Disorders 4th Edition Revised
ECG - electrocardiogram
ELISA - Enzyme-Linked ImmunoSorbent Assay
EMDR - Eye Movement Desensitization and Reprocessing
EPI - echo-planar imaging
FDR - false discovery rate
fMRI - functional magnetic resonance imaging
GABA - Gamma-aminobutyric acid
GCRC - General Clinical Research Center
GR - glucocorticoid receptor
IOM - Institute of Medicine
MALDI-MS - Matrix-assisted laser desorption/ionization mass spectrometry
MBMS - Mind-Body Medicine Skills
MIX - maximum parsimony program
mPFC - medial prefrontal cortex
MR - magnetic resonance
MRI - magnetic resonance imaging
mRNA - messenger ribonucleic acid
MS - mass spectroscopy
NCI - National Cancer Institute
NPY - Neuropeptide Y
OEF - Operation Enduring Freedom
OIF - Operation Iraqi Freedom
OMICS - Genomics
OR - odds ratio
PDHA - Post-Deployment Health Assessment
PHQ - Patient Health Questionnaire
PRIME-MD - Primary Care Evaluation of Mental Disorders
PTSD - Post Traumatic Stress Disorder
RT-PCR - reverse transcription polymerase chain reaction
SAA - Serum amyloid A
SCID - Structured Clinical Interview for DSM-IV
SDI - script driven imagery
SNP - single nucleotide polymorphism
SPM - Statistical parametric mapping
SSRI - Selective Serotonin Reuptake Inhibitor
3T - 3 Tesla
TBI - traumatic brain injury
TR - repetition time
UNIPAL - Universal Parsing Algorithm
VA - Department of Veteran Affairs
WASI - Wechsler Adult Intelligence Scale
FACILITIES and OTHER RESOURCES
CENTER FOR FUNCTIONAL AND MOLECULAR IMAGING
The Center for Functional and Molecular Imaging (CFMI), which is directed by Dr. John VanMeter, includes one
other researcher and a support staff including four research assistants, a senior research associate/database
manager, a financial administrator, and a systems administrator. The imaging center has 3800 square feet,
which in addition to the 3T MRI Scanner, console, equipment room, and EEG/NIRS lab space includes four
offices and four cubicles. An additional 1400 square feet of office space is located in an adjacent building. CFMI
has extensive ongoing collaborations with Children’s National Medical Center, George Washington University,
University of Maryland, College Park, George Mason University, Kappametrics, Inc., and RTI (Research
Triangle Institute) International, Inc.
CFMI is also a core facility resource through the NIH funded General Clinical Research Center (GCRC) and the
Mental Retardation and Developmental Disorders Research Center (MRDDRC). Dr. VanMeter is the core
director for both of these centers.
Equipment
3T MRI Scanner - A research-dedicated 3.0 Tesla Siemens (Erlangen, Germany) Trio whole-body MRI system
with EPI (echo planar imaging) capability is located in the Center for Functional and Molecular Imaging
(CFMI), Georgetown University Medical Center (near to the Department of Neurology researchers and
accessible via indoor passages or an outdoor route). The gradient system has 40mT/m maximum strength with a
slew-rate of 200T/m/sec. The RF-system includes 8 parallel receiver channels each with a 1MHz bandwidth.
The console room is equipped with two stimulus presentation systems for functional studies. In addition,
movies and music can be presented to the subject during structural imaging.
Computational and Backup/Archive
Workstations
Each employee of CFMI is provided with a workstation with an Intel Pentium-IV or better processor, 512MB of
RAM, 80GB hard disk, CD-RW drive, running Microsoft Windows XP Pro. A wide variety of software is
available for CFMI staff, including the MS Office XP suite of applications, which includes Word, Excel,
PowerPoint, Visio, Project, and Outlook; SPSS statistical packages, Adobe Photoshop, and utilities, such as
Adobe Acrobat, SSH, FTP, and Norton antivirus software.
Internet Connectivity
The LINUX Cluster and staff workstations are connected to the Georgetown University network, which has an
Internet2 connection to the internet. All of the CFMI computers are behind a Cisco firewall that limits access
from the outside. Key personnel are given unique VPN access allowing access to the computation resources
from home and other work sites.
Computer Security
The CFMI local area network (LAN) is protected by a Cisco PIX 525E firewall, which has 2x1Gbit ports for LAN
and WAN traffic and a 100Mbit interface for a "demilitarized" zone (DMZ) that hosts the CFMI web server. It
provides security from outside threats and supports constant virtual private network (VPN) connectivity
between several centers including CFMI offices in Building D, CSL (Center for the Study of Learning), and the
SAIL (Small Animal Imaging Laboratory, 7T MRI Facility) as well as "dial-in" VPN connectivity for remote
access by CFMI users.
LINUX Cluster
CFMI is currently equipped with a 40-node Linux compute cluster, which has 10 TB of attached disk storage.
Every node is equipped with standard software for statistical analysis of fMRI and structural MRI data as well
as visualization utilizing software packages such as SPM, FSL, and MEDx. All data are backed up weekly to
a 60-tape Ultrium archive using Arkeia (Carlsbad, CA) backup software. All computers are linked via
a 1000BASE-T Ethernet local area network (LAN). In addition, the center is currently equipped with 20
PCs running Windows XP. All PCs can be used to log in to the cluster and run an analysis via the center’s
LAN and are equipped with Microsoft Office and Adobe Photoshop.
Physical Facilities
The Center for Functional and Molecular Imaging occupies approximately 3800 square feet of space in the
Preclinical Science Building and 1000 square feet of office space in Building D, consisting of 12 separate
offices with four additional larger rooms that hold several desks and computer consoles and a printer. These
rooms provide work areas for up to 24 individuals. There is also a reception area for the Office Assistant and a
meeting area that is used solely by center personnel for meetings and to host guest lectures and discussions.
There is also an additional space set aside for office services, such as a copying machine, a fax machine and a
set of mailboxes.
We have two behavioral testing and evaluation rooms. These rooms are suitably furnished with adjustable
tables, chin rests and computers for subject testing and training as well as data entry and transfer. Subject
waiting areas are staffed. This setup ensures subject confidentiality is preserved and provides a comfortable
waiting area for family members. Both rooms will contain a computer for subject testing and a video camera for
recording sessions with subjects.
DEPARTMENT OF PSYCHIATRY
The Department of Psychiatry, under the chair of Steven Epstein, M.D., consists of 66 full and part-time faculty
members on site and 300 clinical faculty members. The department has a total of 43 offices located on campus
in Kober-Cogan Hall. Faculty and staff engage in extensive research, clinical, and educational efforts
throughout the university and beyond.
Georgetown Center for Trauma and the Community. The CTC is an interdisciplinary center housed in the
Department of Psychiatry (Bonnie L. Green, PhD, PI [P20 MH 068450] and Director). It has the goal of
developing culturally appropriate, innovative, and sustainable interventions to address trauma-related mental
health needs of low-income and minority populations seen in safety net primary care settings in the Washington
DC area. Academic partners include Georgetown’s School of Nursing and Health Studies, and its Departments
of Family Medicine, with research and/or training relationships with Physiology & Biophysics, Neurology,
Pediatrics, and Medicine. The Center provides coordinated research training and mentoring, and maintains
ongoing community partnerships to inform its direction and to implement collaborative research activities. To
increase the adoption and sustainability of these interventions, trauma-related services are conceptualized and
developed in close collaboration with four community partners: the Department of Health (Division of Maternal
and Child Health) and Greater Baden Medical Services, Inc., both in Prince George’s County, MD; the Primary
Care Coalition in Montgomery County, MD; and Unity Health Care, Inc. in the District of Columbia. The
Center’s work is conducted through three cores providing administrative oversight, statistical support, and
access to expert advisory groups; innovative methods/research designs integrating perspectives from applied
anthropology and public health; and expertise and support for the development of promising research. The
working partnerships and infrastructure capability fostered through the Center provide support to develop
comprehensive trauma intervention models that can be translated to settings serving low-income individuals in
the Washington DC area, and elsewhere.
The Center for Mental Health Outreach has been established to improve the mental health of underserved
children, adults and families in the Greater Washington DC area through education, public awareness, direct
clinical care, and collaboration with social service providers. The Department of Psychiatry has nationally
recognized expertise in finding ways to deliver mental health services to underserved people. Training,
education, and program evaluation further strengthen the Center’s capacity to serve as a mental health resource
for the community. As pioneers in adapting effective treatment methods to the special needs of underserved
populations, the Center for Mental Health Outreach will continue to expand the Department’s current research
and service collaborations with other departments, agencies, organizations and institutions, such as the
Depression and Related Affective Disorders Association (DRADA).
The Qualitative Data Lab at Georgetown University Department of Psychiatry has the capacity for storing
recorded data, and for processing and analyzing qualitative data. All recorded data are stored on secure
password protected computer networks and CD-ROMs. The lab resources include digitizing software that can
convert magnetic audiotape and videotape recordings into digital format. All recordings are digitized to mpeg
format because of compression efficiency and compatibility with qualitative software. The primary qualitative
software used in the lab is ATLAS.ti which can be used to review recorded data, transcribe relevant portions of
recordings, and conduct data searches. An important feature of ATLAS.ti is that it allows for the coding of not
just text, but also audio and video data. The lab is staffed by research assistants who have been trained to
digitize recordings, manage qualitative data and use the qualitative software.
Research offices include space for the research team. The department also has three conference rooms available
for meetings. The Psychiatry Research Conference room is wired to accommodate a PolyCom phone
conferencing system. Two TV/VCRs and two laptops are available for presentations (LCD projectors are
available from GU audiovisual dept).
Equipment: Each member of the research team will have a computer available connected to a LaserJet printer.
The computers are equipped with a modem and have a variety of software available for word processing and
statistical analysis (SPSS, SAS). Mainframe services are available for statistical procedures that are unavailable
on microcomputers. All computers are connected to the Internet.
Three photocopy machines are available within the department along with three fax machines and an HP 6350
CXI with auto-feeder printer.
DEPARTMENT OF PHYSIOLOGY AND BIOPHYSICS
Department of Physiology and Biophysics: The Department of Physiology and Biophysics, under the chair of
Zofia Zukowska, MD, Ph.D., consists of 25 full and part-time faculty members and 20 research assistants and
Ph.D. candidate students. The department occupies about 6,800 square feet of research laboratory space on the
second floor of the Basic Science Building and in the Lombardi New Research Building. The offices and
conference rooms occupy about 2,600 square feet. Faculty members engage in extensive research and
educational efforts throughout the medical center.
Laboratory: Dr. Amri’s laboratory is located in the Basic Science building and occupies approximately 471
square feet of space. The tissue culture room of about 130 square feet is also located on the same floor.
Additional space is available on an as needed basis. The Department is fully equipped for cell culture and all
biochemical, morphological and molecular procedures described in the application.
Office: The Principal Investigator has about 120 square feet of office space total. Each investigator has
additional office space in the Department of Physiology and Biophysics. The Department provides telephone
and Fax.
Stress Physiology and Research Center (SPaRC) at the Department of Physiology and Biophysics,
Georgetown University. While clinical studies of the impact of stress on health and disease are many,
mechanistic human or animal studies are very sparse. There are only a handful of centers around the world,
where stress biology and/or stress management is being studied, and none approaches the field in an integrative
and comprehensive way. As a result, stress research has been too dispersed, interdisciplinary cooperation is
poor, and communication between researchers in traditional fields of stress biology and medicine and those
studying complementary medicine and mind-and-body modalities has been missing. The newly formed Stress
Physiology and Research Center (SPaRC) (or Stress Center for short) at Georgetown University will fill this
gap. Its mission is to study stress physiology in a comprehensive, integrative way, encompassing both
traditional medical sciences as well as alternative and complementary medical fields (CAM). The Center
capitalizes on expertise and research at Georgetown, beginning with the Department of Physiology and
Biophysics, and other basic departments of the Medical Center, and clinical departments, beginning with
Department of Psychiatry. The Stress Center also collaborates with other departments of the University, as
well as the Cardiovascular Research Institute at Medstar/Washington Hospital Center. The strength of this
Stress Center is that its research is multi-departmental, based on both basic and clinical investigations and
translational medicine, and integrative in nature, encompassing genetic, molecular, cellular and whole animal
and human studies. The foundations for the newly formed Center already exist and are based on well-established
and federally-funded projects in the Department of Physiology and Biophysics, Neuroscience,
Biochemistry and Molecular Biology, Medicine, Psychiatry and Demography. The Center will also carry out
investigations into the physiology of anti-stress or relaxation modalities, recently introduced into the CAM
educational and research program, at the Department of Physiology. Investigators currently involved in
working under the Stress Center include researchers from the Departments of Physiology and Biophysics (Dr.
Amri is an active member of SPaRC), Biochemistry and Molecular Biology, Neuroscience, Medicine and
Endocrinology, Psychiatry, the Center for Population and Health, and Cardiovascular Research
Institute/Medstar. The Center also includes a Human Stress Physiology Lab for conducting basic stress
reactivity tests, measuring hemodynamic, cognitive and behavioral, as well as biochemical parameters, allowing
for the phenotyping of human behavior and health with measurable outcomes.
Macromolecular Analysis Shared Resource: The LCCC Macromolecular Analysis Shared Resource utilizes
DNA sequencing, micro array, real-time PCR, phosphorimaging, densitometry, luminescence, molecular
modeling and spectrophotometry to support researchers on the Georgetown University campus for a nominal
fee. The resource instruments include a DNA sequencer (ABI 377), Multiimage workstation (Alpha Innotech
Chemiimager 5500), a phosphorimager (Molecular Dynamics 445SI), molecular modeling equipment from
Silicon Graphics with Insight II modeling software from Molecular Simulations, a fluorescence
spectrophotometer (Hitachi F-4500), a Fluorescence Polarization plate reader (Tecan Ultra), UV/VIS
spectrophotometer (DU640), and Wallac Victor2 multilabel counter. The shared resource also includes Agilent
Technologies’ Bioanalyzer, ABI Real-time PCR (7900 HT, a robot capable sequence detection system) and a
fully integrated Affymetrix GeneChip Instrument System. The Genechip system includes a fluidics station 400,
hybridization oven 640, GeneArray scanner and computer workstations for instrument control and data analysis.
The shared resource also maintains multiple software types for data analysis and provides data analysis services
for array users. The equipment is operated by two support staff and two co-faculties. This resource is
supported, in part, by a peer-reviewed NCI Cancer Center Support Grant to the LCCC and modest user fees.
Approximately 68 investigators utilize this facility annually.
Proteomics Shared Resource: The Proteomics core is equipped to provide a broad spectrum of proteomics
services to the research community. The services include technologies for the fractionation of complex protein
mixtures coupled with mass spectrometry. The Proteomics Core Facility is equipped with a 4800 MALDI-TOF-TOF
Mass Spectrometer (Applied Biosystems), a 4700 ABI MALDI-TOF-TOF mass spectrometer, a Thermo
Electron LTQ ion trap mass spectrometer connected to a nano-HPLC system (LC-Packings) and QSTAR Elite
Hybrid LCMS/MS system, a nanoHPLC system online with a Probot MALDI spotter (Agilent). These
instruments provide a range of techniques to analyze different aspects of a fractionated protein
sample. The Facility is also equipped with a complete set of 2D gel electrophoresis apparatus from Bio-Rad
(IEF cell and Protean XL), and the DALT6 large format 2D electrophoresis system (Amersham Biosciences),
a high resolution densitometer G800, PDQuest proteomics software (Bio-Rad) for 2D gel image analysis as well
as Dymension software from Syngene.
The core provides two distinct mass spectrometry services, intact mass analysis (to identify masses of
proteins/peptides in relatively pure solutions) and protein identification using peptide mass mapping (involving
trypsin digestion of protein followed by mass spectrometry of the resulting peptide fragments). We routinely
perform 2D gel electrophoresis for proteins from cell, serum or tissue lysates, image analysis for differential
protein expression followed by protein identification using Mass Spectrometry. The core has also developed
protocols to successfully identify proteins from Immunoprecipitation reactions. The core plans to upgrade the
2D electrophoresis by the introduction of DIGE technology, use robotics for spot picking and also test, develop
and optimize protocols for Multidimensional Protein identification from reaction samples, non-radioactive
differential protein labeling using the SILAC, ICAT or ITRAQ systems and also serum profiling studies.
GEORGETOWN UNIVERSITY MEDICAL CENTER RESOURCES
National Center for Cultural Competence. The mission of the National Center for Cultural Competence
(NCCC) is to increase the capacity of health care and mental health programs to design, implement and evaluate
culturally and linguistically competent service delivery systems. The NCCC conducts an array of activities to
fulfill its mission including: (1) training, technical assistance and consultation; (2) networking, linkages and
information exchange; and (3) knowledge and product development and dissemination. Major emphasis is
placed on policy development, assistance in conducting cultural competence organizational self-assessments,
and strategic approaches to the systematic incorporation of culturally competent values, policy, structures and
practices within organizations. The NCCC is a component of the Georgetown University Child Development
Center and is housed within the Department of Pediatrics of the Georgetown University Medical Center. It is
funded and operates under the auspices of Cooperative Agreement #U93-MC-00145-08 and is supported in part
from the Maternal and Child Health program (Title V, Social Security Act), Health Resources and Services
Administration, Department of Health and Human Services.
Community Research & Learning Network (CoRAL) – promotes partnerships between researchers and
community-based organizations that mobilize their collective resources to support social change. Partners
include faculty/researchers, the community organizations, and GU students. The network provides
opportunities for faculty/researchers to pursue collaborative projects with community members.
General Clinical Research Center (GCRC). The objective of the GCRC program is to make available to
medical scientists the resources that are necessary for the conduct of clinical research. The General Clinical
Research Center (GCRC) is funded by a grant from the National Institutes of Health (NIH) and offers the
faculty of Georgetown University Medical Center and peer reviewed funded investigators from the surrounding
District of Columbia hospitals the optimal environment in which to conduct clinical research. The GCRC does
not fund specific research projects, but provides infrastructure and support in the form of inpatient beds,
outpatient services, staff and core equipment necessary to conduct studies. The General Clinical Research
Centers (GCRC) program of the NIH was established in 1960 to create and sustain specialized institutional
resources in which clinical investigators can observe and study human physiology as well as study and treat
disease with innovative approaches. The primary purpose for a GCRC
is to provide the clinical research infrastructure to investigators who receive peer-reviewed primary research
funding from the NIH and other components of the US Government. It can also be used to support other
hypothesis-based research and can be available for industry-sponsored research at cost. The Clinical Research
Center occupies the east wing of the seventh floor of the Main Hospital Building of the Georgetown University
Medical Center. The GCRC will provide space and nursing and other technical staff at no cost. Laboratory
costs are budgeted at cost.
The Georgetown University Bioanalytical Center (BAC) within the GCRC is a chromatography lab located
on the ground level of the Preclinical Sciences Building and occupies approximately 1500 square feet in rooms
GD1 and GD3. The BAC contains a core laboratory that is funded as part of the Georgetown University
Clinical Research Center but is also available to investigators at the Medical Center, the University, as well as
outside clients, on a fee-for-service basis. The laboratory is dedicated to the development, validation and
application of bioanalytical methods in support of clinical, pharmacokinetic and pharmacogenetic studies as
well as basic pharmacological research. The staff consists of experienced laboratory scientists, all of whom are
capable of performing the following services:
• Method development for established and experimental drugs or other analytes of interest
• Method validation
• Sample analysis from clinical studies
• Confirmation of mass and/or purity of products resulting from synthesis or in-vitro metabolism studies
• Chiral assays
• Separation and collection of chiral enantiomers
• Immunoassays
• Use of HPLC as sample clean-up for immunoassays
• Liquid chromatography with mass spectroscopy
• High performance liquid chromatography with UV and fluorescence detection
• Gas chromatography with nitrogen, phosphorus, and flame ionization detection
• Capillary electrophoresis with UV and laser-induced detection
The laboratory maintains the equipment listed below:
• 5 HPLC systems with UV, fluorescence and electrochemical detectors (ThermoSeparations/Agilent)
• 2 Capillary electrophoresis (CE) systems with UV detectors (ABI/PE)
• 1 CE system with UV and laser-induced fluorescence (LIF) detectors (Biorad)
• API-3000 Mass Spectrometer (Sciex) (Applied Biosystems)
• API-4000 Mass Spectrometer
• IMMULITE by Diagnostic Products Inc
Some of the systems are fully automated with autosamplers and on-line computer-based data collection. All
necessary support equipment for the storage (three -80 °C and one -20 °C freezers) and preparation (balances,
pH meters, centrifuges, solid phase extraction apparatus, etc.) of clinical samples is contained within the
laboratory.
The Center for Clinical Bioethics (CCB) was established in 1991 as a center of excellence at Georgetown
University Medical Center, complementing the activities in ethics of the other divisions of the University. Thus,
the CCB functions in concert with the Kennedy Institute of Ethics and the Department of Philosophy on the
main campus, as well as with faculty at the Law Center. Center scholars participate in internal review boards,
the Georgetown University Hospital Ethics Committee, and interdisciplinary and post-care rounds. Faculty also
collaborate with MedStar’s ethics program based at the Washington Hospital Center. Visiting scholars from all
over the world participate in seminars, meetings, consultations, and all programs of the CCB. The faculty of the
CCB have primary appointments in Internal Medicine, Family Medicine, Philosophy, Nursing, and Oncology.
They conduct research in the philosophy of medicine, end-of-life issues, beginning-of-life concerns, genetics,
research ethics, and organizational ethics, teach, and participate in patient care. The CCB also coordinates the
Medical Center’s Ethics Consult Service on behalf of the Ethics Committee of Georgetown University Hospital.
Faculty members teach research ethics in the graduate school and in the DC Clinical Research Training
Consortium. The four-year Bioethics Curriculum for Health Care Professionals is directed by the CCB, and
combines graduate nursing students with second year medical students in a single, innovative course. The
Center also organizes a formal ethics curriculum for the Internal Medicine house staff. The CCB sponsors
colloquia and conferences, providing continuing ethics education for faculty, staff, students, and the wider
community, both local and national. The CCB coordinates the bioethics track in the MD/PhD combined degree
program.
The School Of Nursing And Health Studies. The mission of the School of Nursing & Health Studies (NHS)
is consistent with the University’s mission to provide student-centered, excellent undergraduate and
graduate professional education in the Jesuit and Catholic tradition. NHS continues its long tradition of
preparing morally reflective health care leaders and scholars who strive to improve the health and well being of
all people, with sensitivity to cultural differences and issues of justice. Since its founding in 1903, NHS has
been at the forefront of the health care field, preparing future leaders to respond to the growing complexity of
health care delivery at all levels. Graduates pursue various careers within nursing, medicine, law, health policy,
health management, and public health among many others. The Undergraduate Program offers its students a
broad liberal arts education balanced with the natural and behavioral sciences through innovative curricula in
either the Bachelor of Science in Nursing (BSN) or the Bachelor of Science (BS) in Health Studies with majors
in Health Care Management and Policy, Human Science, and International Health. The Master of Science
degree programs lead to advanced nursing practice in six specialty areas: Nursing Education, Nurse Midwifery /
Women’s Health, Acute Care Nurse Practitioner, Acute and Critical-Care Clinical Nurse Specialist, Family
Nurse Practitioner, and Nurse Anesthesia. The Master of Science in Health Systems Administration is taught in
conjunction with the School of Business and does not require a BSN. The Center on Health and Education
focuses on the development and testing of culturally competent prevention, intervention strategies, and public
policies that promote the health of individuals and families, and empower communities in order to eliminate
racial/ethnic health disparities across the life span. St. Mary’s Hall, renovated during 2001-2002, is the home of
NHS and houses the offices for administration, faculty, and staff, and includes classrooms, conference rooms, a
computer laboratory, a simulator center and a technologies laboratory. Academic instruction occurs in one of
six new multi-media classrooms: one room with 122 desks, three rooms with 50 desks each and two seminar
rooms with 22 seats. The student commons is equipped with computers, lockers, a study area, and a gathering
place for students.
CAPRICORN (Capital Area Primary Care Research Network) is a network of primary health care providers
in the Washington metropolitan area interested in conducting practice-based research and was founded
by proposed co-investigator and current faculty member in the department of Family Medicine, Dan
Merenstein, M.D. CAPRICORN identifies and conducts research studies that expand the science base of
primary care. CAPRICORN provides efficient means of studying outcomes in primary care, thus being highly
applied and practical for physicians and health care providers. CAPRICORN pools patient populations of
differing ethnic and socioeconomic status, allowing greater application of research findings and the ability to
compare and contrast different populations. CAPRICORN is supported by the primary care units of
Georgetown University School of Medicine, which strengthens its capacity for protocol development and
human subjects review.
Community Partners
Unity Health Care, Inc. (Unity), Washington, DC is the largest private organization providing primary
medical care to low-income, uninsured District of Columbia residents. A 501(c)(3), private, nonprofit agency,
Unity operates health centers established under Section 330(h) Stewart B. McKinney Homeless Assistance and
330(e) of the US Public Health Services Act. In 2001 as a result of its long history of providing high quality
primary health care to the city's indigent, uninsured, and underserved, Unity was sought out to assist in the
transformation of the District's publicly operated health care system. As a contractor of the District of
Columbia Department of Health, Unity operates six ambulatory health centers throughout the city bringing the
total to eleven health centers, in addition to nine homeless health care sites, two HIV/AIDS treatment centers,
and a high school-based health center. Unity has ongoing clinical partnerships with Georgetown, including eight
Family Medicine physicians.
In 2001, prior to expanding to include six former city-run ambulatory care centers, Unity served over 38,000
clients. Over half of the clients earned less than 200% of the poverty level, and 38% earned below 100%. Fifty-seven
percent of Unity's patients were uninsured, with the rest covered by Medicaid (7%), Medicare (4%), other
public insurance (30%), and private insurance (1%). Seventy-four percent of Unity clients were Black, 21%
Hispanic, 1% Asian, 1% white, and 3% unknown. Fifty-seven percent of Unity clients were female. Children
under the age of five made up 9% of clients. Children between five and nineteen accounted for an additional
19% of Unity clients. Adults over sixty-five years of age represented 6% of Unity's patient population in 2001.
The addition of the six new ambulatory care centers is expected to double the Unity patient base in 2002.
Unity is committed to providing culturally responsible health care and social services. Towards this end, Unity
has recruited a diverse workforce and includes issues of cultural sensitivity and competence in the orientation
program for all new employees. Many employees are members of the community in which they work. Centers
serving clients for whom English is not the primary language have clinical and non-clinical bi- and tri-lingual
staff members. Unity has incorporated the goal of providing superior culturally competent health care and
social services into its ongoing quality management program.
Unity has many longstanding relationships with the District’s academic health centers. Health professional
students, including medical students, nursing students, physician assistants, and medical residents from
Georgetown, George Washington, Howard, and Catholic Universities partner every day with Unity providers.
Unity also has a long tradition of partnering with researchers as part of its commitment to improving the quality
of health and healthcare of the District’s residents and communities.
Community Partners/Research Sites
Primary Care Coalition of Montgomery County, MD. Montgomery County is the largest jurisdiction in
Maryland, and it has the largest concentration of Latinos in the greater DC area and the largest minority
population in the state of Maryland. The minority population is approximately 12% Latino (mostly from Central
and South America), 12% Asian and Pacific Islanders, and 12% African American. Almost 25% of public
school children qualify to receive free or reduced meals, and these same children speak more than 150 different
languages. There are an estimated 80,000 adults in the county without health insurance, many of whom
experience psychosocial stressors associated with poverty, language deficiencies, immigration, and social
isolation. Many immigrants originate from war-torn countries. The state of Maryland instituted major changes
in its mental health system in 1995 that closed public sector mental health programs and established a network
of non-profit and for-profit clinics and private practitioners (Maryland Health Partners) to provide care. Since
that time, the capacity for jurisdictions including Montgomery County to provide mental health care has been
shrinking as a result of inadequate financing of the program by the state, the resulting bankruptcy of the largest
non-profit mental health clinic, and the fragmentation of services across the county.
The Primary Care Coalition of Montgomery County, Inc. was established in 1993 to provide access to high
quality, culturally sensitive, primary and specialty care services for low-income uninsured children and adults in
Montgomery County. Through six safety net clinics, last year the PCC helped support the health care of nearly
13,000 patients through more than 20,000 patient visits. It also manages a variety of programs, including a
county-funded program that provides support for the safety net health-care providers, and Care for Kids, which
purchases primary care for 2,000 uninsured children. The Coalition is also initiating a Child Assessment Center
providing multi-disciplinary services to children who have been the victims of child abuse and neglect.
In 2001, the Montgomery County Council committed to funding a "system of primary care" through the
Primary Care Coalition. In December 2004, the Montgomery County Executive and County Council announced
long-term support for Montgomery Cares, a Coalition program to expand access to care to 40,000 presently
uninsured people through a network of community clinics. Demonstration projects in dental health, and in
mental health (in collaboration with Georgetown University), have been funded and are being mounted.
Greater Baden Medical Services Inc. Greater Baden is a federally qualified 501(c)(3) healthcare system that
was founded in 1972. The system serves communities in southern Prince George’s County, Charles County, and
St. Mary’s County, Maryland. Comprising five clinics that provide a spectrum of services, Greater Baden is a
community based health provider committed to delivering the highest quality of healthcare services. It provides
primary health services and facilitates health promotion/disease prevention activities in an efficient, effective,
and comprehensive manner for the individuals and communities served, regardless of ability to pay, and serves
as the safety net provider for Southern Maryland.
In 2004, the system served over 8,000 patients in more than 21,000 medical encounters. Of those patients, 60%
were uninsured, 30% had Medicare/Medicaid, and only 8% had private insurance. The patients served are 66%
African American, 21% White, 10% Latino, and 3% other. As a member of the Bureau of Primary Care’s
Health Disparities Collaborative, GBMS uses the Chronic Care Model as an operational framework. Programs
include Women, Infants, and Children (WIC) services, access to care for Asian Indians, telemedicine and
continuing education, and expanded Title III capacity to improve communications with its rural clinics and
increase accessibility to computers for staff.
Ft. Lincoln Family Medicine Center. The Ft. Lincoln Family Medicine Center in Colmar Manor, Maryland is
a full spectrum Family Medicine office caring for children and adults of all ages, including prenatal care. It is
affiliated with the Georgetown University Medical Center, serving as a training site for its Family Medicine
residents, as well as the Providence Hospital of Prince George’s County, Maryland. The Center’s patients live
in Washington D.C. and suburban Maryland. They consist of mostly ethnic minorities, and most patients are on
Medicaid or Medicare, although patients with a wide variety of insurance plans are seen. The Center averages
about 19,000 patient visits each year.
Georgetown Computational Resources
Georgetown Computational Core Facility (CCF). All grant/contract proposals have access to the
Georgetown Computational Core Facility (CCF). The CCF provides state-of-the-art computational resources
and expertise to researchers who are developing and analyzing computational and/or data intensive models in
neuroscience, oncology, cellular processes, and other numerically-intensive disciplines. The primary hardware
of the facility are several multi-processor Beowulf clusters capable of serial and parallel-processing across an
array of high-speed central processing units (CPU's). This system also provides centralized file-server
capabilities as well as resources necessary to archive and secure data. All computer resources are connected to
the University's high-speed network, and to the Internet2 Abilene network, and follow the Georgetown
University Information Security guidelines in compliance with the NIH Application/System Security Plan for
Applications and General Support Systems. On-line user statistics for CCF clusters are available at
http://www.clusters.arc.georgetown.edu/statistics/. The Computational Core Facility is administered by the
Georgetown University division of Advanced Research Computing (ARC) - http://arc.georgetown.edu. All
University Departments can access the resources of this facility. In addition, the facility supports several Ph.D.-
and Master's-level personnel with extensive experience in programming and computational support, including
systems administration and database programming. These personnel help ensure that faculty can take full
advantage of available resources, as well as planned NIH technology in the grid computing space. Thus, the
CCF provides an institutional facility offering extensive scientific support for grants and contracts via access to
leading-edge computational resources.
University Information Services (UIS) is charged with providing technology services, access to information,
and supporting administrative systems for the faculty, students, staff, and administration of Georgetown
University. In addition, UIS is responsible for creating a technology infrastructure to support electronic
communication -- voice, video, and data -- now and into the future. UIS operates under the direction of the Vice
President for Information Services and Chief Information Officer (CIO), with
guidance from various advisory groups.
GU Information Services - Video Teleconferencing is available to faculty and staff for professional purposes.
This service is made possible via the University’s phone system, a conventional TV, and a PolyCom View
Station. There are two rooms on campus that have been specially wired for teleconferencing.
Georgetown University has several conference rooms available for departmental functions. The Department of
Psychiatry has slated the Research Auditorium located in the Research Building to accommodate the larger
workshops and seminars associated with the Center grant proposal. The Research Auditorium houses state of
the art equipment and technical experts to assist with functions.
In addition to individual conference rooms located throughout the University campus, The Leavey Conference
Center houses an on-campus hotel for out of the area participants along with catering services and several
interconnected conference rooms.
EXISTING EQUIPMENT
CENTER FOR FUNCTIONAL AND MOLECULAR IMAGING
The Center for Functional and Molecular Imaging (CFMI), which is directed by Dr. John VanMeter, currently has
a 3T MRI Scanner, a stand-alone high-density EEG system, and two NIRS (Near Infrared Spectroscopy)
systems.
Equipment
3T MRI Scanner - A research-dedicated 3.0 Tesla Siemens (Erlangen, Germany) Trio whole-body MRI system
with EPI (echo planar imaging) capability is located in the Center for Functional and Molecular Imaging
(CFMI), Georgetown University Medical Center (near to the Department of Neurology researchers and
accessible via indoor passages or an outdoor route). The gradient system has 40mT/m maximum strength with a
slew-rate of 200T/m/sec. The RF-system includes 8 parallel receiver channels each with a 1MHz bandwidth.
The console room is equipped with two stimulus presentation systems for functional studies. In addition,
movies and music can be presented to the subject during structural imaging.
As of December 2007, the Trio MRI scanner was retrofitted with the Tim (Total Image Matrix) upgrade. This
upgrade includes 18 combinable RF channels and a new digital RF transmit/receive system supporting the new
matrix coils: a new integrated body coil, a 12-channel head matrix coil, a 24-channel spine matrix coil, and a
4-channel neck matrix coil. These matrix coils are compatible with the iPAT (integrated Parallel Acquisition
Techniques) technology that supports parallel data acquisition in all phase-encode directions, providing up to a
12-fold reduction in acquisition time and/or significant improvement in the signal-to-noise ratio. The other
major feature of the Tim upgrade is the actively shielded water-cooled Siemens exclusive gradient TQ-engine
system that includes a noise-optimized system with a complete noise capsule for the whole magnet via a foam
insulation of the system covers and an upgrade to the gradient set. The reduction in acoustic noise is up to 20
dB(A) as compared to conventional systems, which corresponds to a 90% reduction in sound pressure. This reduction in
noise is most evident in gradient-demanding protocols, in particular EPI-based techniques such as fMRI,
diffusion imaging, and perfusion imaging. In addition, the new TQ-engine gradients have a maximum gradient
amplitude of 45 mT/m in the longitudinal direction and 40 mT/m in the horizontal and vertical directions (i.e., 72
mT/m vector summation gradient performance).
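The two derived figures quoted above (a 90% sound-pressure reduction for 20 dB, and a 72 mT/m vector-sum gradient) can be checked with a short calculation. The sketch below is illustrative only: it assumes the A-weighted level difference can be treated as a plain sound-pressure level difference (20·log10 convention), and the function names are hypothetical rather than part of any scanner software.

```python
import math

def pressure_ratio_from_db(delta_db):
    """Sound-pressure ratio for a level change in dB (20*log10 convention).

    Assumption: the dB(A) figure is treated as a plain level difference.
    """
    return 10 ** (delta_db / 20.0)

def vector_sum(gx, gy, gz):
    """Magnitude of the gradient vector when all three axes are driven at once."""
    return math.sqrt(gx ** 2 + gy ** 2 + gz ** 2)

# A 20 dB drop leaves one tenth of the sound pressure, i.e. a 90% reduction.
reduction = 1.0 - pressure_ratio_from_db(-20.0)

# 40 mT/m on the horizontal and vertical axes, 45 mT/m on the longitudinal axis.
g_total = vector_sum(40.0, 40.0, 45.0)

print(f"sound-pressure reduction: {reduction:.0%}")
print(f"vector-sum gradient: {g_total:.1f} mT/m")
```

The vector sum evaluates to roughly 72.3 mT/m, consistent with the rounded 72 mT/m stated in the text.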
Physiological Monitoring - Assessment of various physiological measures can be useful for some experiments.
The CFMI imaging center has an Invivo Millennia 3155A/3155MVS (Invivo Research, Orlando, FL)
physiological monitoring system that captures heart rate (via electrocardiogram, ECG), respiration rate, end-tidal
CO2, inspired CO2, and pulse oximetry. Data are acquired at a sampling rate of 1 Hz by the main system and
sent to the remote monitor through a wave guide. The monitoring unit is connected to the stimulus presentation
computer via a serial cable/port where the measures are recorded. The Psylab/SAM unit (Contact Precision
Instruments, Boston, MA) connects to the stimulus presentation computer via a parallel port and is configured
to receive event codes from E-Prime. The SAM unit records both galvanic skin response and temperature with a
sampling rate of 100 Hz.
Eye-tracker - The Mag Design & Engineering (Sunnyvale, CA) eye-tracker glasses use a fiber optic camera to
capture the right eye but still allow for binocular viewing of stimuli. This eye-tracker has a 30 Hz sampling
rate. Video output from the eye-tracker is connected to a PC via a video tuner card. Output is sent to the
ViewPoint software (Arrington Research, Scottsdale, AZ), which has a real-time recording capability and
interface to other software. Currently, this system is configured to receive the trigger pulse from the scanner to
signal when to begin recording. With ViewPoint it is possible to record X and Y pupil position (eye-gaze), pupil
width, and ocular torsion.
EEG Laboratory – The EEG lab is a fully equipped electrophysiology laboratory, with a dedicated Electrical
Geodesics high-density EEG system. The EGI GES 250 digitizes 256 channels of data at up to 1000 samples/sec,
with a 0.1 to 300 Hz bandwidth, and a vertex recording reference. The system is supplied with a dual-processor
PowerMac G5, the Apple Cinema Display HD and a digital video synchronized with the EEG signal. The
instrument has advanced software for electrode impedance control and eye movement artifact rejection.
Averaged event-related potentials (ERPs) can be examined with both topographic waveform plots and surface
electrical field animations (maps every 4 ms sample) for each experimental condition. The instrument also
allows estimates of radial current density to be made with the Laplacian transform (second spatial derivative of
the surface voltages) of the ERP averages across subjects to characterize the features of the head surface
electrical fields that can be attributed to superficial cortical sources. The addition of electrophysiology to the
other imaging modalities available to Core users will allow experiments combining the superb temporal
resolving capabilities of ERP approaches with the sensitivity and spatial resolving properties of functional MRI.
The integration of these two methods will allow investigation of research questions probing modulations in the
spatiotemporal character of brain activity.
Near Infrared Spectroscopy (NIRS) – Two continuous-wave Near Infrared Spectroscopy systems are located
in the Center. Each has 32 lasers (intensities driven at 32 different frequencies) and 32 detectors. At present, the
32 lasers are divided into 16 lasers at 690 nm and 16 at 830 nm. Alternately, the number of wavelengths can be
increased and multiplexed by an optical switch. A master clock generates the 32 distinct frequencies between
6.4 kHz and 12.6 kHz in ~200 Hz steps. These frequencies are then used to drive the individual lasers with
current stabilized square-wave modulation. The detectors are avalanche photodiodes (APD’s, Hamamatsu
C5460-01), and following each APD module is a bandpass filter, cut-on frequency of ~500 Hz to reduce 1/f
noise and the 60 Hz room light signal, and a cut-off frequency of ~16 kHz to reduce the third harmonics of the
square-wave signals. After the bandpass filter is a programmable gain stage to match the signal levels with the
acquisition level on the analog-to-digital converter within the computer. Each detector is digitized at ~44 kHz
and the individual source signals are then obtained by use of a digital bandpass filter (e.g. an
infinite-impulse-response filter). The features of the system will allow a hexagonal-mesh optical probe to be created that spans a
rectangle measuring ~12 x 18 cm. The two systems can be used together to cover the entire head with 64
detectors.
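The frequency-encoding scheme described above can be illustrated with a small numerical sketch. It builds the 32 carrier frequencies (6.4 kHz to 12.6 kHz in 200 Hz steps) and separates the sources seen by one detector. Note the assumptions: the text describes a digital bandpass (e.g. IIR) filter per source, whereas this sketch substitutes a simple lock-in (sine/cosine correlation) demodulator; the carriers are modeled as sinusoids rather than square waves; the sampling rate is taken as exactly 44 kHz; and all names and amplitudes are hypothetical.

```python
import math

FS = 44_000                                        # assumed detector sampling rate, Hz
STEP = 200                                         # carrier spacing, Hz
CARRIERS = [6_400 + STEP * k for k in range(32)]   # 6.4 kHz ... 12.6 kHz
WINDOW = FS // STEP                                # 220 samples = 5 ms, whole cycles of every carrier

def synth_detector(amplitudes, n, fs=FS):
    """Simulate one detector: a sum of sinusoidal carriers with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in zip(CARRIERS, amplitudes))
        for t in range(n)
    ]

def lock_in_amplitude(samples, freq, fs=FS):
    """Estimate the amplitude of the `freq` component by sine/cosine correlation."""
    n = len(samples)
    i_sum = sum(s * math.cos(2 * math.pi * freq * t / fs) for t, s in enumerate(samples))
    q_sum = sum(s * math.sin(2 * math.pi * freq * t / fs) for t, s in enumerate(samples))
    return 2.0 * math.hypot(i_sum, q_sum) / n

# Hypothetical per-source light levels reaching one detector.
true_amps = [1.0 + 0.1 * k for k in range(32)]
signal = synth_detector(true_amps, WINDOW)
recovered = [lock_in_amplitude(signal, f) for f in CARRIERS]

print(max(abs(r - a) for r, a in zip(recovered, true_amps)))
```

Because every carrier is a multiple of 200 Hz, a 5 ms window contains a whole number of cycles of each one, so the carriers are mutually orthogonal over the window and each source amplitude is recovered without crosstalk in this idealized, noise-free case.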
EEG+fMRI Stimulus Presentation – The stimulus presentation system developed by MRA, Inc (Washington,
PA) is available. This system features a 2.53 GHz Pentium 4 computer with 1 GB of RAM in the CFMI control
room that is used to present fMRI paradigms to subjects in the Siemens Trio scanner. Associated with this
computer system is audio equipment for playing sound from a variety of sources to the subject and a display
system for showing the computer screen, DVD/VCR prerecorded programs, or live-TV from a Cable TV
system. The projector in use at the CFMI is an Epson PowerLite 5000. This projector uses three 1.32-inch LCD
panels with a range of resolutions 640x480, 832x624, and 1024x768. The stock lens provided with the projector
was replaced with a custom made 150-230mm focal length zoom lens built by Buhl Optical (Pittsburgh, PA).
The projector is located in the equipment room adjacent to the rear of the scanner room. The projected image
displays on a rear projection screen (Da-Lite, Da-plex substrate with Video Vision optical coating) cut to fit the
upper half of the scanner bore. The audio amplifier/receiver is a TEAC model AG-370. The graphic equalizer is
a TEAC EQA-220, which features ten frequency bands per channel, a multi-colored spectrum analyzer display,
left and right channel level controls, an 80 dB S/N ratio, 5-100 kHz (±1 dB) frequency response, and 0.03%
THD. The DVD/CD-player is a Panasonic DMR-E30, which is also capable of playing MP3 audio CDs. The
VCR unit is a JVC HR-S2901U, which features Super VHS with digital live circuitry, Super VHS ET for
high resolution recording on a conventional VHS cassette, Hi-Fi Stereo with built-in MTS decoder, Pro-cision
19u EP heads for near SP quality in EP speed, Ultra-Spec Drive with jitter reduction circuit, shuttle plus, instant
review, digital AV tracking, on-screen tape position indicator, variable slow motion, 181-channel cable
compatible frequency-synthesized tuner, HQ (High Quality) system circuitry for excellent VHS picture quality,
color on-screen display, multi-speed search (19-step SP/21-Step EP) including 5-Speed slow motion. The
system also includes 10 fiber optic button response boxes that interface to both E-Prime and SuperLab (Cedrus
Corporation, San Pedro, CA). Finally, the system receives fiber optic output from the Siemens Trio scanner for
paradigm triggering.
Computational and Backup/Archive
Workstations
Each employee of CFMI is provided with a workstation with an Intel Pentium-IV or better processor, 512MB of
RAM, 80GB hard disk, CD-RW drive, running Microsoft Windows XP Pro. A wide variety of software is
available for CFMI staff, including the MS Office XP suite of applications, which includes Word, Excel,
PowerPoint, Visio, Project, and Outlook; SPSS statistical packages, Adobe Photoshop, and utilities, such as
Adobe Acrobat, SSH, FTP, and Norton antivirus software.
Internet Connectivity
The LINUX Cluster and staff workstations are connected to the Georgetown University network, which has an
Internet2 connection to the internet. All of the CFMI computers are behind a Cisco firewall that limits access
from the outside. Key personnel are given unique VPN access allowing access to the computational resources
from home and other work sites.
Computer Security
The CFMI local area network (LAN) is protected by a Cisco PIX 525E firewall, which has 2x1Gbit ports for LAN
and WAN traffic and a 100Mbit interface for the "demilitarized" zone (DMZ) that hosts the CFMI web server. It
provides security from outside threats and supports constant virtual private network (VPN) connectivity
between several centers including CFMI offices in Building D, CSL (Center for the Study of Learning), and the
SAIL (Small Animal Imaging Laboratory, 7T MRI Facility), as well as "dial-in" VPN connectivity for remote
access by CFMI users.
LINUX Cluster
CFMI is currently equipped with a 40-node Linux compute cluster, which has 10 TB of attached disk storage.
Every node is equipped with standard software for statistical analysis of fMRI and structural MRI data as well
as visualization utilizing software packages such as SPM, FSL, and MEDx. All data are backed up weekly by
writing to a 60-tape Ultrium archive using Arkeia (Carlsbad, CA) backup software. All computers are linked via
a 1000BASE-T Ethernet Local Area Network (LAN). In addition, the center is currently equipped with 20
PCs running Windows XP. All PCs can be used to log in to the cluster and run an analysis via the center’s
LAN and are equipped with Microsoft Office and Adobe Photoshop.
DEPARTMENT OF PHYSIOLOGY AND BIOPHYSICS
Macromolecular Analysis Shared Resource: The LCCC Macromolecular Analysis Shared Resource utilizes
DNA sequencing, micro array, real-time PCR, phosphorimaging, densitometry, luminescence, molecular
modeling and spectrophotometry to support researchers on the Georgetown University campus for a nominal
fee. The resource instruments include a DNA sequencer (ABI 377), Multiimage workstation (Alpha Innotech
Chemiimager 5500), a phosphorimager (Molecular Dynamics 445SI), molecular modeling equipment from
Silicon Graphics with Insight II modeling software from Molecular Simulations, a fluorescence
spectrophotometer (Hitachi F-4500), a Fluorescence Polarization plate reader (Tecan Ultra), UV/VIS
spectrophotometer (DU640), and Wallac Victor2 multilabel counter. The shared resource also includes Agilent
Technologies’ Bioanalyzer, ABI Real-time PCR (7900 HT, a robot capable sequence detection system) and a
fully integrated Affymetrix GeneChip Instrument System. The Genechip system includes a fluidics station 400,
hybridization oven 640, GeneArray scanner and computer workstations for instrument control and data analysis.
The shared resource also maintains multiple software types for data analysis and provides data analysis services
for array users. The equipment is operated by two support staff and two co-faculties. This resource is
supported, in part, by a peer-reviewed NCI Cancer Center Support Grant to the LCCC and modest user fees.
Approximately 68 investigators utilize this facility annually.
Proteomics Shared Resource: The Proteomics core is equipped to provide a broad spectrum of proteomics
services to the research community. The services include technologies for the fractionation of complex protein
mixtures coupled with mass spectrometry. The Proteomics Core Facility is equipped with a 4800 MALDI-TOF-TOF
mass spectrometer (Applied Biosystems), a 4700 ABI MALDI-TOF-TOF mass spectrometer, a Thermo
Electron LTQ ion trap mass spectrometer connected to a nano-HPLC system (LC-Packings), a QSTAR Elite
Hybrid LCMS/MS system, and a nanoHPLC system online with a Probot MALDI spotter (Agilent). These
instruments provide a range of techniques to analyze different aspects of a fractionated protein
sample. The Facility is also equipped with a complete set of 2D gel electrophoresis apparatus from Bio-Rad
(IEF cell and Protean XL), the DALT6 large format 2D electrophoresis system (Amersham Biosciences), a
high resolution densitometer G800, PDQuest proteomics software (Bio-Rad) for 2D gel image analysis, and
Dymension software from Syngene.
The core provides two distinct mass spectrometry services: intact mass analysis (to identify masses of
proteins/peptides in relatively pure solutions) and protein identification using peptide mass mapping (involving
trypsin digestion of protein followed by mass spectrometry of the resulting peptide fragments). We routinely
perform 2D gel electrophoresis for proteins from cell, serum or tissue lysates, image analysis for differential
protein expression followed by protein identification using mass spectrometry. The core has also developed
protocols to successfully identify proteins from immunoprecipitation reactions. The core plans to upgrade the
2D electrophoresis by introducing DIGE technology, to use robotics for spot picking, and to test, develop,
and optimize protocols for multidimensional protein identification from reaction samples, non-radioactive
differential protein labeling using the SILAC, ICAT, or iTRAQ systems, and serum profiling studies.
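As an illustration of the peptide mass mapping service described above, the sketch below performs an in-silico trypsin digestion and computes the monoisotopic mass of each resulting peptide, which is the kind of mass list that would be compared against measured spectra. It is a simplified sketch, not the core's actual workflow: it assumes the common rule of cleavage after K or R except before P, ignores missed cleavages and modifications, and uses a made-up example sequence. The function names are hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (N- and C-termini)

def trypsin_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

# Hypothetical sequence chosen to exercise the "no cleavage before P" rule.
peptides = trypsin_digest("GAKRPSKVR")
for p in peptides:
    print(p, round(peptide_mass(p), 4))
```

In practice, the theoretical masses for every protein in a database would be generated this way and scored against the observed peptide masses; that matching step is omitted here.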
DEPARTMENT OF PSYCHIATRY
The Department of Psychiatry, under the chair of Steven Epstein, M.D., consists of 66 full and part-time faculty
members on site and 300 clinical faculty members. The department has a total of 43 offices located on campus
in Kober-Cogan Hall. Faculty and staff engage in extensive research, clinical, and educational efforts
throughout the university and beyond.
The Qualitative Data Lab at Georgetown University Department of Psychiatry has the capacity for storing
recorded data, and for processing and analyzing qualitative data. All recorded data are stored on secure
password protected computer networks and CD-ROMs. The lab resources include digitizing software that can
convert magnetic audiotape and videotape recordings into digital format. All recordings are digitized to mpeg
format because of compression efficiency and compatibility with qualitative software. The primary qualitative
software used in the lab is ATLAS.ti which can be used to review recorded data, transcribe relevant portions of
recordings, and conduct data searches. An important feature of ATLAS.ti is that it allows for the coding of not
just text, but also audio and video data. The lab is staffed by research assistants who have been trained to
digitize recordings, manage qualitative data and use the qualitative software.
Research offices include space for the research team. The department also has three conference rooms available
for meetings. The Psychiatry Research Conference room is wired to accommodate a PolyCom phone
conferencing system. Two TV/VCRs and two laptops are available for presentations (LCD projectors are
available from GU audiovisual dept).
Equipment: Each member of the research team will have a computer available connected to a LaserJet printer.
The computers are equipped with a modem and have a variety of software available for word processing and
statistical analysis (SPSS, SAS). Mainframe services are available for statistical procedures that are unavailable
on microcomputers. All computers are connected to the Internet.
Three photocopy machines are available within the department along with three fax machines and an HP 6350
CXI printer with auto-feeder.
GEORGETOWN UNIVERSITY MEDICAL CENTER RESOURCES
General Clinical Research Center (GCRC). The objective of the GCRC program is to make available to
medical scientists the resources that are necessary for the conduct of clinical research. The General Clinical
Research Center (GCRC) is funded by a grant from the National Institutes of Health (NIH) and offers the
faculty of Georgetown University Medical Center and peer reviewed funded investigators from the surrounding
District of Columbia hospitals the optimal environment in which to conduct clinical research. The GCRC does
not fund specific research projects, but provides infrastructure and support in the form of inpatient beds,
outpatient services, staff and core equipment necessary to conduct studies. The General Clinical Research
Centers (GCRC) program of the NIH was established in 1960 to create and sustain specialized institutional
resources in which clinical investigators can observe and study human physiology as well as study and treat
disease with innovative approaches. The primary purpose for a GCRC
is to provide the clinical research infrastructure to investigators who receive peer-reviewed primary research
funding from the NIH and other components of the US Government. It can also be used to support other
hypothesis-based research and can be available for industry-sponsored research at cost. The Clinical Research
Center occupies the east wing of the seventh floor of the Main Hospital Building of the Georgetown University
Medical Center. The GCRC will provide space and nursing and other technical staff at no cost. Laboratory
costs are budgeted at cost.
The Georgetown University Bioanalytical Center (BAC) within the GCRC is a chromatography lab located
on the ground level of the Preclinical Sciences Building and occupies approximately 1500 square feet in rooms
GD1 and GD3. The BAC contains a core laboratory that is funded as part of the Georgetown University
Clinical Research Center but is also available to investigators at the Medical Center, the University, as well as
outside clients, on a fee-for-service basis. The laboratory is dedicated to the development, validation and
application of bioanalytical methods in support of clinical, pharmacokinetic and pharmacogenetic studies as
well as basic pharmacological research. The staff consists of experienced laboratory scientists, all of whom are
• Sample analysis from clinical studies
• Confirmation of mass and/or purity of products resulting from synthesis or in-vitro metabolism studies
• Chiral assays
• Separation and collection of chiral enantiomers
• Immunoassays
• Use of HPLC as sample clean-up for immunoassays
• Liquid chromatography with mass spectrometry
• High performance liquid chromatography with UV and fluorescence detection
• Gas chromatography with nitrogen, phosphorus, and flame ionization detection
• Capillary electrophoresis with UV and laser-induced detection
This laboratory maintains the equipment listed below:
• 5 HPLC systems with UV, fluorescence and electrochemical detectors (ThermoSeparations/Agilent)
• 2 Capillary electrophoresis (CE) systems with UV detectors (ABI/PE)
• 1 CE system with UV and laser-induced fluorescence (LIF) detectors (Biorad)
• API-3000 Mass Spectrometer (Sciex) (Applied Biosystems)
• API-4000 Mass Spectrometer
• IMMULITE by Diagnostic Products Inc
Some of the systems are fully automated with autosamplers and on-line computer-based data collection. All
necessary support equipment for the storage (three -80 °C freezers and one -20 °C freezer) and preparation (balances,
pH meters, centrifuges, solid phase extraction apparatus, etc.) of clinical samples is contained within the
laboratory.
Georgetown Computational Resources
Georgetown Computational Core Facility (CCF). All grant/contract proposals have access to the
Georgetown Computational Core Facility (CCF). The CCF provides state-of-the-art computational resources
and expertise to researchers who are developing and analyzing computational and/or data-intensive models in
neuroscience, oncology, cellular processes, and other numerically intensive disciplines. The primary hardware
of the facility consists of several multi-processor Beowulf clusters capable of serial and parallel processing across an
array of high-speed central processing units (CPUs). This system also provides centralized file-server
capabilities as well as resources necessary to archive and secure data. All computer resources are connected to
the University's high-speed network, and to the Internet2 Abilene network, and follow the Georgetown
University Information Security guidelines in compliance with the NIH Application/System Security Plan for
Applications and General Support Systems. On-line user statistics for CCF clusters are available at
http://www.clusters.arc.georgetown.edu/statistics/. The Computational Core Facility is administered by the
Georgetown University division of Advanced Research Computing (ARC) - http://arc.georgetown.edu. All
University Departments can access the resources of this facility. In addition, the facility supports several Ph.D.-
and Master's-level personnel with extensive experience in programming and computational support including
systems administration and database programming. These personnel help ensure that faculty can take full
advantage of available resources, as well as planned NIH technology in the grid computing space. Thus, the
CCF provides an institutional facility for extensive scientific support for grants and contracts via access to
leading edge computational resources.
University Information Services (UIS) is charged with providing technology services, access to information,
and supporting administrative systems for the faculty, students, staff, and administration of Georgetown
University. In addition, UIS is responsible for creating a technology infrastructure to support electronic
communication -- voice, video, and data -- now and into the future. UIS operates under the direction of the Vice
President for Information Services and Chief Information Officer (CIO), with
guidance from various advisory groups.
GU Information Services - Video Teleconferencing is available to faculty and staff for professional purposes.
This service is made possible via the University’s phone system, a conventional TV, and a PolyCom View
Station. There are two rooms on campus that have been specially wired for teleconferencing.
Phylomics® Patent Application:
USPTO Application #: 20070259363
Inventors: H. Amri, M. Abu-Asab, and M. Chaouchi
Title: Phylogenetic analysis of mass spectrometry or gene array data for the diagnosis of
physiological conditions
Abstract: A universal data-mining platform capable of analyzing mass spectrometry
(MS) serum proteomic profiles and/or gene array data to produce biologically meaningful
classification; i.e., group together biologically related specimens into clades. This
platform utilizes the principles of phylogenetics, such as parsimony, to reveal
susceptibility to cancer development (or other physiological or pathophysiological
conditions), diagnosis and typing of cancer, identifying stages of cancer, as well as
post-treatment evaluation. To place specimens into their corresponding clade(s), the invention
utilizes two algorithms: a new data-mining parsing algorithm, and a publicly available
phylogenetic algorithm (MIX). By outgroup comparison (i.e., using a normal set as the
standard reference), the parsing algorithm identifies under and/or overexpressed gene
values or in the case of sera, (i) novel or (ii) vanished MS peaks, and peaks signifying
(iii) up or (iv) down regulated proteins, and scores the variations as either derived (do not
exist in the outgroup set) or ancestral (exist in the outgroup set); the derived is given a
score of “1”, and the ancestral a score of “0”—these are called the polarized values.
Furthermore, the shared derived characters that it identifies are potential biomarkers for
cancers and other conditions and their subclasses. (end of abstract)
NIH Public Access
Author Manuscript
Proteomics Clin Appl. Author manuscript; available in PMC 2008 May 5.
NIH-PA Author Manuscript
Published in final edited form as:
Proteomics Clin Appl. 2008 February ; 2(2): 122–134.
Evolutionary medicine: A meaningful connection between omics,
disease, and treatment
Mones Abu-Asab1, Mohamed Chaouchi2, and Hakima Amri2,*
1Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
2Department of Physiology and Biophysics, School of Medicine, Georgetown University, Washington, DC,
USA
Abstract
The evolutionary nature of diseases requires that their omics be analyzed by evolution-compatible
analytical tools such as parsimony phylogenetics in order to reveal common mutations and pathways’
modifications. Since the heterogeneity of the omics data renders some analytical tools such as
phenetic clustering and Bayesian likelihood inefficient, a parsimony phylogenetic paradigm seems
to provide the connection between the omics and medicine. It offers a seamless, dynamic, predictive, and
multidimensional analytical approach that reveals biological classes, and disease ontogenies; its
analysis can be translated into practice for early detection, diagnosis, biomarker identification,
prognosis, and assessment of treatment. Parsimony phylogenetics identifies classes of specimens,
the clades, by their shared derived expressions, the synapomorphies, which are also the potential
biomarkers for the classes that they delimit. Synapomorphies are determined through polarity
assessment (ancestral vs. derived) of m/z or gene-expression values and parsimony analysis; this
process also permits intra and interplatform comparability and produces higher concordance between
platforms. Furthermore, major trends in the data are also interpreted from the graphical representation
of the data as a tree diagram termed cladogram; it depicts directionality of change, identifies the
transitional patterns from healthy to diseased, and can be developed into a predictive tool for early
detection.
Keywords
Biomarkers; Cancer; Early detection; Evolution; Omics; Parsimony; Phylogenetics
1 Introduction
Evolution is the unifying theme of all biological disciplines, and all explanations of biological
phenomena should be compatible with evolutionary principles. Medicine cannot be an
exception. Yet, the vast majority of publications that incorporate recent advances in genomics
and proteomics are devoid of evolutionary reasoning and analytical methods. Only
recently have there been calls for incorporating evolution into medicine in order to provide
explanations for drug resistance in HIV and bacterial strains, autoimmune and degenerative
diseases, as well as cancer typing and treatment [1-3]. Cancer development, progression, and
maintenance are all evolutionary processes; they mirror similar evolutionary processes at the
cellular and population levels in that they all involve genetic modifications, selective pressure,
and clonal propagation [2,4,5]. Therefore, evolution-compatible methods of analysis have a
potentially useful role in cancer studies and diagnosis as well [2,6,7].
Correspondence: Dr. Mones Abu-Asab, Laboratory of Pathology, National Cancer Institute, NIH, Bldg. 10/Rm 2A33, Bethesda, MD
20892, USA, E-mail: [email protected], Fax: +1-301-480-9197. * Additional corresponding author: Dr. Hakima Amri; E-mail:
[email protected].
The authors have declared a conflict of interest. They will seek US patent rights for their UNIPAL algorithm.
Evolutionary medicine seeks to explain the nature of disease in light of evolutionary theory
[8]. It views the physicalities of the human body as a result of millions of years of natural
selection that present compromises between differentiation at all levels and vulnerabilities
[9]. Invoking evolution to explain medical phenomena will contribute to our understanding of
how evolution works in diseases and how to counter with the proper treatment.
One of the earliest studies of disease etiology by evolutionary criteria was that of Sarnat and
Netsky [5]. They described as “phylogenetic diseases” some of the degenerative and metabolic
diseases that occurred in the derived structures of the mammalian brain. However, Azzone
[4] attributed many diseases, such as cancer and autoimmunity, to mutations and their
sustenance by natural selection, two processes that are at the crux of the evolutionary course
of action. Since natural selective pressure is the main force determining diversity of living
organisms and their state of health and disease, the data produced by omics (genomics,
metabolomics, and proteomics) have to be analyzed in an evolutionary compatible way in order
to produce biologically meaningful interpretations [2]. The tool that can bridge the gap between
the omics data avalanche and evolutionary medicine is phylogenetics [10-12]. Phylogenetics
is an analytical paradigm based on the principles of evolution. It has been employed by
biologists in many disciplines such as botany, microbiology, and zoology, to construct
relationships in an evolutionary sense at all the levels of the systematic hierarchy and more
recently the tree of life [13]. Applying phylogenetic analysis to the omics data creates a
paradigm shift where the evolutionary meaning of the data is brought out and applied to produce
natural class determination, biomarker recognition, and modeling of the evolutionary processes
of disease development. Furthermore, phylogenetic analysis is the evolutionary path between
the omics data and their application in various practical settings. As the flowchart of Fig. 1
shows, there are only a few steps leading from raw data to applications: evolutionary polarity
assessment of data values, phylogenetic algorithmic analysis, and interpretation.
2 The omics need phylogenetics
As has become more evident recently, many of the problems in biomedical research
will be solved not by producing more data, but rather by new methods of analyzing the data
[13]. Today’s omics data producing machines are sophisticated, sensitive, and accurate, and
in the absence of human errors, their output is reliable and reproducible [14]. However, the
over-reliance on parametric statistics for data analysis has reduced the useful inferences of
patterns within the data [15]. Inferring biological processes from data patterns is the main
goal of bioinformatics [16], and a superior analytical tool has to be multidimensional. It must
accurately reveal biological patterns, processes, and classes; possess high
predictivity; be seamless and dynamic; combine several large datasets from multiple
sources; support intra- and interplatform comparability; produce higher interplatform
concordance; and yield results that can be utilized for early detection of disease, diagnosis, prognosis,
and assessment of treatment [2].
This review will demonstrate how phylogenetics appears to be the most suitable analytical tool
to provide the multidimensionality we are seeking, and how it can translate the omics into a
clinical tool.
3 Choosing between phylogenetics and phenetic clustering
There are two main schools of analytical thought in the bioinformatics of omics: the phenetic
and the phylogenetic. The two differ on the relationship between the data values and the
classification [17]. The phenetic school is very predominant in the analysis of microarray
expression data where it utilizes a distance matrix to produce dimensionless levels of difference
[18]; the grouping of specimens is solely based on overall raw similarity without any
evolutionary connotation (for an elaborate comparison of phenetic methods as applied to
microarrays see Planet et al. [19]). On the other hand, the phylogenetic school attempts to
reconstruct the biological processes from the known patterns within the data in an evolutionary
meaningful sense. It sorts out the data into derived and ancestral states and uses only the derived
ones to group specimens into a hierarchical model [10,11]. Because the two approaches differ
conceptually and algorithmically, they seldom produce total congruency in their results (Fig.
2).
Phylogenetics appears to be very suitable for studying diseases with fast arising mutations,
such as cancer, because it can model very recent divergence from normal conditions [20].
A phylogenetic analysis produces a hierarchical hypothesis of relationships (i.e. classification)
among specimens that aims to reflect relatedness based on shared mutations and altered
pathways. It reveals novel states of gene and protein expressions (these can be potential
biomarkers, see below) and utilizes their distribution patterns among specimens for modeling
their relatedness (i.e. groups specimens on the basis of their shared derived expressions but not
overall similarity). Furthermore, it elucidates the direction of change among specimens that
leads to their molecular and cellular diversity. The latter point is better illustrated with a
graphical tree termed cladogram where the specimens with the highest number of novel
expressions are located on the upper part of the tree (Fig. 2A).
To illustrate many of the fundamental differences between the two schools, we will examine
one dataset analyzed by both methods. Figure 2 shows a phylogenetic cladogram (A) and a
phenetic dendrogram (B), both based on the same dataset [21]. To understand why the two
trees have different explanations for the same data, we need to discuss the theoretical difference
that they represent. The cladogram's hierarchy is data-based: it reflects the similarity between the
specimens as based only on their shared derived expression values. The dendrogram is
specimen-based: it uses Pearson's similarity coefficient of raw data (both ancestral and derived expression
values). Pearson's coefficient measures the correlation, r, between the specimens and
produces a matrix of pairwise similarity ratios between the specimens; the average similarities
are then calculated between groups of specimens to plot the dendrogram. Each node of the
cladogram is based on the derived expression values that are shared by the specimens located
above the node, and the segments’ lengths bear no evolutionary significance (unless a
molecular clock is assumed as in genomic-distance data). However, the relative lengths of
dendrogram’s segments are indicative of the percentages of shared average similarity between
the specimens or groups of specimens.
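As a rough illustration of the phenetic side of this comparison, the pairwise Pearson matrix can be sketched in a few lines of Python. The expression values, the specimen names, and the function names `pearson` and `pearson_matrix` are invented for the example; they are not taken from the dataset of ref. [21]. The subsequent averaging of similarities between groups is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation r between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def pearson_matrix(profiles):
    """Pairwise similarity matrix over all specimens (raw values, no polarity)."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}

# Hypothetical raw expression profiles, one list of values per specimen.
profiles = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [2.0, 4.0, 6.0, 8.0],
    "S3": [4.0, 3.0, 2.0, 1.0],
}
sim = pearson_matrix(profiles)
```

Note that every value contributes to r, whether or not it lies within the range of normal specimens; this is the sense in which the dendrogram is built on overall raw similarity.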
Furthermore, among the significant differences between the two schools is that a phylogenetic
analysis lessens the adverse effect of homoplasy while the phenetic does not [2,17]. Homoplasy
is similarity due to convergence, parallelism, and reversal—all are evolutionary phenomena.
Convergence occurs when two or more specimens have different developmental pathways for
a homologous character state; parallelism is independently acquiring similar non-homologous
states; and reversal is reverting to an ancestral state from a derived state. Homoplasies have a
more detrimental effect on the phenetic analysis than on the phylogenetic because they receive the
same weight as every other similarity in the phenetic analysis, while in the phylogenetic analysis they compete
against all other hypotheses of character distributions to generate the most plausible
explanation of the data [22].
When considering information content, transmission, and retrieval, a phylogenetic
classification is the most effective and efficient; it allows the storage of data in the smallest
size diagram [17]. For example, because of its hierarchical nature, the cladogram indicates the
direction of change from low to high, whereas the dendrogram does not. This characteristic of a
cladogram gives significance to a specimen’s position on the cladogram and the arrangements
of branches. As we are going to see below, this allows the extrapolation from the cladogram
to a clinical setting.
4 The omics meet phylogenetics
In the field of phylogenetics, a plethora of publications has accumulated during the last 30
years where phylogeneticists differed as to which approach is the optimum one for data
analysis. There are three widely used methods to carry out phylogenetic analysis on omics data;
these are Bayesian, likelihood, and parsimony. The first two methods are statistically based
and related in that they require an explicit model of evolution such as a homogeneous rate of
mutations, while parsimony is a non-statistical method that uses the minimum number of steps
to explain the data. The three methods claim to produce the best hypothesis of relationships
among a group of specimens [15,19]. The maximum likelihood method was devised by Felsenstein
[23] to calculate the maximum likelihood function of a tree by incorporating specific
assumptions (such as Markovian evolution and Poisson substitution) and branch lengths–times
and mutation rates combined. The Bayesian approaches are derived from the likelihood method
to measure the maximum posterior probability of individual trees by a sampling mechanism
that incorporates branch lengths, substitution models, and their prior distribution [15].
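The "minimum number of steps" criterion of parsimony can be made concrete with a small sketch. The code below counts the state changes needed to explain one binary character on a fixed rooted tree using Fitch's algorithm for unordered characters. This is a generic textbook procedure chosen for illustration and is not necessarily the criterion implemented in any particular parsimony program; the two trees, the specimen names, and the character states are invented.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a fixed tree.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its state (0 = ancestral, 1 = derived)
    """
    def walk(node):
        if isinstance(node, str):              # leaf: its own state, no steps yet
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = walk(left)
        right_set, right_cost = walk(right)
        common = left_set & right_set
        if common:                             # children can agree: no extra step
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]

# Two competing hypotheses of relationships for the same four specimens.
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
character = {"A": 1, "B": 1, "C": 0, "D": 0}

steps_1 = fitch_steps(tree_1, character)   # A and B grouped together
steps_2 = fitch_steps(tree_2, character)   # A and B separated
```

Here the first tree explains the character with a single change and the second needs two, so parsimony prefers the first; a full analysis sums such counts over all characters and searches over candidate trees.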
When it concerns omics data, the choice of the analytical method to carry out a phylogenetic
analysis is based on the optimum hypothesis of character states’ distribution with highest
fidelity to the data matrix, and obtaining the sought after information namely biomarkers,
altered expressions and pathways, as well as disease classes (the clades). From a practical point
of view, there is an obvious conflict between our stated goals and the first two statistical-based
methods; their definitions of clades are irrelevant to medical interpretations, and their trees do
not allow the tracing back of derived states, which are the potential biomarkers. Furthermore,
the Bayesian and maximum likelihood approaches may be inefficient in dealing with
heterogeneous rates of mutations and large number of specimens [15,24], could mistakenly
attribute high probability to ambiguous groups, and may erroneously separate true sister groups
because of their unequally long branches [25].
Maximum parsimony requires the estimation of fewer parameters than maximum likelihood [26],
and functions better than Bayesian and likelihood when data are heterogeneous (i.e. have
various rates of mutation such as in cancer) [24,27,28]. Currently, we lack any predictive model
for most of the current diseases studied by omics to fulfill the parameters needed for a Bayesian
or likelihood analysis. The question of whether some diseases like cancer follow a
developmental model is still unanswered, although it is assumed that the specimens of a disease
share common pathway aberrations. A recent phylogenetic analysis of MS proteomes of three
cancers (ovarian, pancreatic, and prostate) has shown the probability of cancerous
developmental models that transcend the three types [2]. Furthermore, data analysis of genomic
and proteomic developments over a few clonal generations where the rate of change is fast and
heterogeneous, is more suitable for parsimony analysis rather than Bayesian or likelihood.
By identifying uniquely shared derived states (the synapomorphies), parsimony analysis
uncovers all the potential biomarkers within the dataset. It also defines natural classes, the
clades, which are circumscribed by the synapomorphies (Fig. 2A). The most parsimonious tree
it produces, the cladogram, maintains the same data pattern of the data matrix, therefore, the
fidelity of the cladogram’s character distribution to the original data is verifiable and can be
extended to a clinical setting.
The state of currently used analytical tools for omics data places the scientists between two
extremes, phenetic clustering and Bayesian/likelihood phylogenetics, without getting the
expected rewards from either. Neither method seems to be the suitable paradigm for the omics
era. However, maximum parsimony appears to be the optimum paradigm for the
phylogenetic analysis of the omics data [15,28].
5 Predictivity of a parsimony phylogenetic analysis
A major problem that has characterized many omics-based studies is the provisional nature of
their conclusions [29], also known in phylogenetics as low predictivity or lack of it. For a set
of specimens, there is only one correct hypothesis of relationships that is based on their profiles;
its predictive power is directly correlated with the robustness of the hypothesis—its capacity
to fulfill its predictions. It is well established that a phylogenetic classification has a higher
predictivity than a phenetic one [17]. Predictivity here is not the same as class prediction
[30]. While class prediction deals with assigning a specimen to a known class, predictivity is
the ability to list or predict a specimen’s characteristics when its class becomes known (i.e.
when biomarkers or an algorithmic analysis assigns the specimen to a class on the basis of its
omics profile). Therefore, predictivity is a statement of accuracy on the hypothesis of
relationships and its class definitions.
Predictivity is important when the hypothesis of relationships will be extended to and
implemented in a clinical setting, i.e. translating the phylogenetic classification into practice.
For example, high predictivity is needed in cancer diagnosis and prognosis where the
classification of cancers has been mostly based on microscopy and a few
immunohistochemistry markers. Furthermore, tumors with similar histopathology have shown
divergent clinical courses and outcomes [6,30]. Applying evolutionary parsimony
phylogenetics to cancer omics will produce a cancer classification that encompasses all types
of available data and is expected to have the highest degree of predictivity. Therefore, having
a predictive system of cancer classification brings higher objectivity to diagnosis and
prognosis, as well as robust biomarker identification.
6 Shared derived patterns: synapomorphies
6.1 General remarks
Parsimony phylogenetics is based on the principle that shared derived patterns, the
synapomorphies, can circumscribe natural groups called clades (s. clade). The shared derived
patterns in the omics context may constitute a number of novel changes that occur in the
specimens under study. These encompass all genetic mutations, novel and lost proteins, up- and
down-regulated proteins, over- and under-expressed RNA, as well as dichotomously
asynchronous (DA) expression patterns of proteins and genes (see Section 8). A clade’s
members share one or more of these synapomorphies. For example, if only the specimens of
pancreatic cancer share a unique mutation that is not shared by any normal specimens, then
this mutation constitutes a synapomorphy, and specimens carrying the mutation are members
of a clade. However, as explained below, there is an exact procedure for determining what
constitutes a synapomorphy.
6.2 Evolutionary polarity assessment: identifying synapomorphies
Major omics techniques, such as MS proteomics and microarray, produce data in absolute
values that impose limitations on their use and interpretation due mainly to inconsistency in
reproducibility [14,29]. These data are usually utilized as similarity matrices for cluster
analyses or probed with custom algorithms in search of novel values. There are major
drawbacks for the direct use of absolute values in an analysis; prominent among them is the
limitation on comparability within and between platforms as well as the lack of directionality
within this form of data. Even the conversion of the data to similarity matrix by t and F statistics
as well as fold-change results in significant loss of meaningful information [19,31].
The absolute data can be transformed into discrete values with evolutionary polarity assessment
(EPA) [2]. In two-state and multistate characters, EPA is used to determine the proper
evolutionary sequence of states, and consistent with this purpose, it is used here to sort out an
absolute value into one of two states: ancestral or derived. Putting this in a mathematical
context, the derived is given the value of 1, and 0 for the ancestral. The EPA process transforms
the initial data into discrete binary states of 0s and 1s.
To implement EPA on a dataset, the experimental design should include a control subset of
specimens; for example, when studying a cancer type, the control specimens should be healthy
non-cancerous specimens. The control specimens will be used as the outgroup against which
the values of the experimental specimens (the ingroup) will be compared. Table 1 illustrates
how the process of polarity assessment is carried out for every m/z or gene-expression value
of the experimental specimens. First, for every m/z point, the minimum and maximum values
of the controls are determined–the range; if the value of an experimental specimen falls within
the controls’ range then it is considered ancestral and is assigned the value 0; if it falls outside
the range, it is said to be derived and assigned a value of 1. Thus, the new transformed matrix
is a polarized matrix with 0s and 1s.
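The range-based polarization just described can be sketched as follows. This is a minimal illustration, not the authors' own implementation: the function name `polarize` and the numeric values are hypothetical, and treating a value that lies exactly on the minimum or maximum of the controls as ancestral is an assumption of the sketch.

```python
def polarize(controls, experimental):
    """Polarize experimental values against the range of the controls.

    controls, experimental -- lists of specimens, each a list with one
    value per data point (m/z point or gene). Returns a matrix of 0s
    (ancestral: inside the controls' range) and 1s (derived: outside it).
    """
    n_points = len(controls[0])
    lows = [min(row[j] for row in controls) for j in range(n_points)]
    highs = [max(row[j] for row in controls) for j in range(n_points)]
    return [
        [0 if lows[j] <= row[j] <= highs[j] else 1 for j in range(n_points)]
        for row in experimental
    ]

# Hypothetical outgroup (controls) and ingroup (experimental) values.
controls = [
    [10, 5, 7],
    [12, 6, 9],
    [11, 4, 8],
]
experimental = [
    [11, 8, 3],
    [15, 5, 2],
    [13, 9, 8],
]
polarized = polarize(controls, experimental)
```

With these values the controls' ranges are 10 to 12, 4 to 6, and 7 to 9, so each experimental specimen keeps a 0 only where it stays inside the corresponding range.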
It is clear here that the number of control specimens used for an outgroup comparison is an
important criterion to correctly polarize the data and eliminate noise. For an analysis to be
meaningful and provide high predictivity, the number of normal specimens that incorporates
the maximum variation per population should be established [32]. An added advantage to the
EPA data-transforming process is that it diminishes data inconsistency, a difficult-to-control
noise that stems from several, mostly uncontrollable, factors during the experiment and data
collection. Noise reduction by EPA is achieved by using control specimens in the
experiment as the outgroup for polarity assessment.
There are theoretical and practical implications to transforming the data through EPA. For
every specimen, the 1s represent the novel change that does not exist in the control outgroup,
and therefore, may be indicative of a genetic mutation or protein modification depending on
the data at hand. The 1s are called apomorphies (s. apomorphy); and if all the experimental
specimens have 1 for the same data point, then this data point is a shared derived state and is
termed a synapomorphy. Therefore, all synapomorphies are potential biomarkers (see Section
7).
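A simple screen for shared derived states in a polarized matrix might look like the sketch below. The function name `shared_derived`, the specimen labels, and the 0/1 profiles are hypothetical. The sketch only lists the data points that are 1 in every member of a chosen group; deciding which groups are actual clades still requires the parsimony analysis.

```python
def shared_derived(polarized, members):
    """Indices of data points that are derived (1) in every listed specimen."""
    n_points = len(next(iter(polarized.values())))
    return [
        j for j in range(n_points)
        if all(polarized[name][j] == 1 for name in members)
    ]

# Hypothetical polarized profiles (0 = ancestral, 1 = derived).
polarized = {
    "T1": [1, 0, 1, 1],
    "T2": [1, 1, 1, 0],
    "T3": [1, 0, 0, 1],
}
all_shared = shared_derived(polarized, ["T1", "T2", "T3"])
pair_shared = shared_derived(polarized, ["T1", "T2"])
```

In this toy matrix data point 0 is derived in all three specimens, while data point 2 is shared only by T1 and T2 and would therefore be a candidate marker for that smaller group.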
The 0s and 1s of a specimen make up its profile of ancestral and derived states. This profile
determines the relatedness of the specimen to other specimens through the apomorphies they
share—the synapomorphies. Therefore, class membership is determined by the competing
number of synapomorphies among the specimens on the basis of maximum parsimony.
There are several new ways of utilizing the polarized data in analysis that are not attainable
with the original absolute data such as pooling of datasets as well as intra- and interplatform
comparability. Polarized data from one experiment can be directly subjected to an algorithmic
analysis, or several polarized data from separate experiments with different specimens can be
pooled together to produce an inclusive analysis. Furthermore, genomic, metabolomic, and
proteomic data for the same set of specimens can be polarized separately, pooled in one matrix,
and analyzed together to produce an inclusive analysis based on the three sets. Table 1 shows
an example of pooling with two small real datasets. The number of polarized datasets that can
be pooled together is theoretically limitless. This type of pooling is possible because polarized
datasets have equal weight since each identifies the apomorphies of its specimens by discrete
rather than absolute values.
The hypothetical example of Table 1 illustrates the pooling of two datasets, MS proteome and
microarray for the same group of specimens. Each set was polarized using its own respective
set of controls, and then the two polarized sets (right side of A and B) were pooled and analyzed
by a parsimony program. Although each of the two polarized datasets produced multiple
equally parsimonious cladograms, the inclusive matrix produced one most parsimonious
cladogram (C).
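Pooling for the same set of specimens amounts to joining the polarized profiles column-wise, as in the sketch below. The function name `pool`, the specimen labels, and the 0/1 values are hypothetical, and the sketch assumes that each dataset has already been polarized against its own controls. Only specimens present in every dataset are kept; pooling datasets with different specimens is not covered here.

```python
def pool(*datasets):
    """Concatenate polarized profiles of specimens shared by all datasets.

    Each dataset maps a specimen name to its list of 0/1 states.
    """
    common = set(datasets[0])
    for dataset in datasets[1:]:
        common &= set(dataset)
    return {
        name: [state for dataset in datasets for state in dataset[name]]
        for name in sorted(common)
    }

# Hypothetical polarized matrices from two platforms.
ms_proteome = {"T1": [1, 0], "T2": [0, 1]}
microarray = {"T1": [1, 1, 0], "T2": [0, 1, 1], "T9": [1, 1, 1]}

pooled = pool(ms_proteome, microarray)
```

Because both inputs are already reduced to 0s and 1s, no rescaling between platforms is needed before the pooled matrix is passed to a parsimony program.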
Using the data ranges of the control specimens provides higher stringency than using the
statistical means (see also Section 8 on complex patterns below). By utilizing the controls’
range as an ancestral criterion, every data value is evaluated individually to determine its
evolutionary polarity; the ones falling within the controls’ range are assigned an ancestral
status. Using the statistical means of the experimental specimens by averaging their values
does not exclude the values that fall within the controls’ range, distorts the significance of their
distribution, and prevents their tracking in the analysis. Furthermore, statistical means
misrepresent data points that violate normal distribution—the ones with DA distribution (Fig.
3).
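The difference between the two criteria can be seen in a small numerical sketch. The values below are invented so that the experimental specimens split into a low and a high group around the controls; the variable names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical values for a single data point.
controls = [4.0, 5.0, 6.0]
experimental = [1.0, 2.0, 8.0, 9.0]   # two below and two above the controls

low, high = min(controls), max(controls)

# Criterion based on the statistical mean of the experimental specimens.
group_mean = mean(experimental)
mean_looks_normal = low <= group_mean <= high

# Criterion based on the controls' range, applied to each value individually.
polarity = [0 if low <= value <= high else 1 for value in experimental]
```

The mean of the experimental group falls inside the controls' range, so an average-based comparison would report no change, whereas the range criterion scores every individual value as derived.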
7 Evolutionary definition of biomarkers as synapomorphies
As biomarker discovery is a highly sought after criterion in the omics data, one will favor the
analytical method that makes this process accurate, meaningful, and achievable. Parsimony
phylogenetic analysis differs from likelihood and phenetic methods in that it maintains the
identity of data points through the computational process, thus it allows the identification of
every significant shared derived value that defines the natural groups of the hierarchical
classification—the clades.
To carry out a parsimony phylogenetic analysis on a set of data, EPA of the data points is
needed to determine whether a gene expression or m/z value is derived or ancestral. A shared
derived state, a synapomorphy, among a number of diseased specimens is a potential biomarker
for the group. Equating biomarkers with synapomorphies has an evolutionary connotation
because it defines a natural group of specimens sharing similar expression and declares their
ontogenic relatedness. This logical definition of what constitutes a biomarker requires a clear
declaration that the biomarker is derived in relation to the controls and shared by all the
members of its clade.
A synapomorphic biomarker can be supported by other synapomorphies that circumscribe the
same clade; the higher the number of synapomorphies the higher the confidence in the
predictivity of the selected biomarker. In addition, the occurrence of several synapomorphies
for a clade offers more choices for selecting the optimum biomarker.
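As a rough sketch of this definition, the Python fragment below lists the characters that are derived (1) in every member of a given group. The specimen names, the polarized values, and the grouping are all hypothetical; in a real analysis the clades come from the parsimony cladogram rather than being supplied by hand.

```python
def shared_derived(polarized, clade):
    """Return indices of characters that are derived (1) in every member of `clade`.

    `polarized` maps a specimen name to its list of 0/1 states;
    `clade` is a list of specimen names assumed to form one clade.
    """
    members = [polarized[name] for name in clade]
    n_chars = len(members[0])
    return [i for i in range(n_chars) if all(row[i] == 1 for row in members)]


# Hypothetical polarized matrix: four specimens, five characters.
polarized = {
    "S1": [1, 0, 1, 1, 0],
    "S2": [1, 1, 1, 0, 0],
    "S3": [1, 0, 1, 1, 1],
    "S4": [0, 0, 1, 0, 1],
}

candidates = shared_derived(polarized, ["S1", "S2", "S3"])
print(candidates)  # indices of candidate biomarkers for this hypothetical clade
```

The length of the returned list gives the number of synapomorphies supporting the group, which is the quantity the text relates to confidence in a selected biomarker.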
8 Incorporating complex patterns of omics
Proteomic and genomic data contain expression patterns that cannot simply be reduced to a
statistical abstraction for data analysis. Such patterns are misrepresented when transformed
into means, compared in fold-changes, or excluded from the analysis due to their complexity.
One such pattern that is pervasive in cancer specimens is the DA distribution of gene
expressions and proteins in a group of specimens [33]. The term dichotomous refers to a
two-peak distribution with one peak above and the other below the range of control/normal
specimens, while asynchronous denotes deviation from the normal range (Fig. 3). DA seems
to be a population phenomenon that is noticeable only when a sufficiently large number of
specimens is included in the study.
Statistical methods of analysis usually average the values of the specimens in the study in order
to carry out comparisons by t- and F-statistics or by fold-change, and therefore misrepresent
the distribution pattern of the DA expressions and overlook any meaningful interpretation of it.
The presence of several to many DA gene expressions and proteins in a set of specimens
underscores a complex pattern of pathway diversity that is difficult to model or classify in a
simple phenetic clustering, but can be dealt with effectively and meaningfully in a parsimony
phylogenetic context. Evolutionary polarity assessment for this phenomenon should consider
it a multistate character and assign different symbols to the values above and below the
normals’ range; a parsimony phylogenetic analysis will then treat each of these states as
independent of the others. For a discussion and examples of multistate coding and analysis
see Felsenstein’s instructions for using PHYLIP [34].
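A multistate coding of this kind might be sketched as follows. The symbols "0", "1", and "2" and the intensity values are assumptions chosen for illustration; the symbols actually accepted depend on the parsimony program used.

```python
def multistate(value, control_values, within="0", above="1", below="2"):
    """Code a value as within, above, or below the controls' range.

    The three symbols are illustrative defaults, not a prescribed convention.
    """
    lo, hi = min(control_values), max(control_values)
    if value > hi:
        return above
    if value < lo:
        return below
    return within


# Hypothetical intensities for one character (illustrative only).
controls = [40.0, 45.0, 50.0, 55.0, 60.0]
specimens = [10.0, 52.0, 85.0, 20.0, 90.0]

# One state per specimen for this character.
states = "".join(multistate(v, controls) for v in specimens)
print(states)
```

Keeping the two tails as separate states preserves the information that a binary 0/1 coding would merge into a single derived state.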
Although the DA pattern is known to scientists and has been reported in tissues and cell lines
[35,36], there has been no meaningful explanation for such variation and no analytical
treatment of it, and its evolutionary implications are still unknown. However, it may be related
to the presence of several developmental pathways of cancer and other diseases [2].
9 Parsimony phylogenetic analysis of omics: an example
Currently available computing power permits the processing of large matrices with relative
speed. It is now possible to run a large data matrix with hundreds of specimens and tens of
thousands of data points per specimen within a reasonable time [15]. During our
experimentation with polarized proteomic data matrices on the parsimony program MIX, we
managed to run a 23 × 10^6-point matrix (180 specimens) in 18 h on a 3.2 GHz CPU.
We have selected the parsimony program MIX of Felsenstein [34] to carry out a maximum
parsimony phylogenetic analysis because of its speed, reliability, available settings, and output
format (MIX is freely available from http://evolution.gs.washington.edu/phylip.html). MIX is
part of the PHYLIP package, which contains other applications that can run different types of
phylogenetic analysis, such as likelihood and distance methods, on the same dataset.
All of these programs are controlled by a menu that offers options for the analysis.
There are only a few examples of phylogenetic analysis of omics data [2,6,7,37]. One of the
practical problems that limits many researchers’ use of phylogenetic programs is the
transformation of the raw data into an input format that the program accepts.
Figure 4 presents an example of a parsimony phylogenetic analysis of MS SELDI-TOF
proteomic serum data of prostate cancer patients and healthy men. The normalized raw data
were polarized according to the polarity assessment method described in Section 6.2 and
explained by an example in Table 1. Here, however, the process was automated and carried out
by a computer program written by the authors (UNIPAL, Universal Polarity Assessment
Algorithm) [2].
UNIPAL transformed the original raw data into a matrix of 0s and 1s for all the cancerous
specimens by using the normal specimens’ range for every m/z point as the baseline for polarity
assessment (outgroup comparison). Then, the newly produced polarized matrix was processed
with MIX to run a Wagner maximum parsimony analysis. MIX produces two output files, the
Outfile and the Treefile; both are plain text and can be read by any text reader program.
The Outfile shows all the equally parsimonious trees that the data support in a graphical format,
and it can also list all the synapomorphies supporting every node if this option is invoked.
The Treefile lists the same trees as the Outfile in a text format that can be used to
draw and edit the trees in other programs such as TreeView [38].
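The data transformation step can be illustrated with a short Python sketch that writes a polarized 0/1 matrix in the discrete-character layout that PHYLIP programs read: a header line with the number of specimens and characters, then one line per specimen with a 10-character name field followed by the states. This is not UNIPAL, whose internals are not described here; the function name is hypothetical, and the example values are the polarized proteomic states of specimens F to J as read from Table 1.

```python
def to_phylip_discrete(polarized):
    """Format a {specimen: [0/1, ...]} mapping as PHYLIP discrete-character text.

    Names are truncated or padded to the 10-character name field.
    """
    names = list(polarized)
    n_chars = len(polarized[names[0]])
    lines = [f"{len(names):>5} {n_chars:>5}"]
    for name in names:
        row = polarized[name]
        if len(row) != n_chars:
            raise ValueError(f"specimen {name!r} has {len(row)} states, expected {n_chars}")
        lines.append(f"{name[:10]:<10}" + "".join(str(state) for state in row))
    return "\n".join(lines) + "\n"


# Polarized proteomic values of the five experimental specimens (as read from Table 1).
polarized = {
    "F": [1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "G": [0, 1, 1, 1, 0, 1, 1, 1, 0, 1],
    "H": [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    "I": [0, 1, 1, 0, 0, 0, 1, 1, 0, 1],
    "J": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
}

infile_text = to_phylip_discrete(polarized)
print(infile_text)
```

The resulting text could be saved under the input file name the program expects; rooting and other menu options are left to the program and are not addressed in this sketch.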
MIX produced one most parsimonious cladogram for this dataset that is based on 36 prostate
cancer patients and 49 healthy men (Fig. 4). One most parsimonious cladogram means that
MIX found only one tree that has the smallest number of steps to explain the relationship
between the specimens. With other datasets, MIX may produce several equally parsimonious
cladograms with some variation in minor branches. An interpretation of the topology of the
cladogram and what it reveals about the trends in the dataset is discussed below.
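What "number of steps" means can be sketched with Fitch's counting procedure, which for binary characters with reversible 0/1 changes gives the minimum number of state changes on a fixed tree. The sketch below only scores trees that are supplied to it; it does not search tree space as MIX does, and the specimens, matrix, and candidate trees are hypothetical.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for one character on a rooted binary tree.

    `tree` is a leaf name (str) or a 2-tuple of subtrees;
    `states` maps each leaf name to its 0/1 state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No shared state: one change is required on this branch pair.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total number of steps over all characters of a {leaf: [0/1, ...]} matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {leaf: row[i] for leaf, row in matrix.items()})[1]
        for i in range(n_chars)
    )


# Hypothetical polarized matrix: N is a normal specimen, A to C are diseased.
matrix = {"N": [0, 0, 0], "A": [1, 0, 0], "B": [1, 1, 1], "C": [1, 1, 0]}

tree_1 = ("N", ("A", ("B", "C")))
tree_2 = ("N", ("B", ("A", "C")))
lengths = {"tree_1": tree_length(tree_1, matrix), "tree_2": tree_length(tree_2, matrix)}
print(lengths)
```

Of the two candidates, the one with the smaller total is the more parsimonious explanation of this hypothetical matrix.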
10 The structure of an omics cladogram
The cladogram is the graphical representation of the hypothesized hierarchical relationships
among the specimens that defines classes of specimens, and it is the most efficient summary of
the information contained in the raw data [17]. The tip of each line of the cladogram denotes a
specimen. Each node on the cladogram is justified by the shared derived state(s) among the
specimens of one of the segments.
The topology of the cladogram also conveys general trends within the data that are not obvious
from other types of analysis [2]. Our own analyses of several MS proteomic and
microarray datasets have indicated that there are three distinct sections of the cladogram: the basal,
the middle, and the upper. The basal section contains most of the normal specimens; the middle has the
transitional specimens between normal and cancerous; and the upper section has the cancerous
ones. Figure 4 shows a parsimonious cladogram of MS proteomic serum specimens that were
taken from healthy and diseased men. Its upper section has a dichotomy defining two major
clades of the prostate cancer; both are more or less equal in size. The basal section is restricted
to healthy specimens; it has a well-defined large clade encompassing the majority of specimens,
one minor clade below the large one, and a few single-specimen clades at the bottom.
Additionally, the middle section consists mostly of single-specimen clades in tandem, some of
normal specimens and some of cancerous ones. The lower part of this section has the normal
clades and the upper part has the cancerous clades.
A similar cladogram topology has appeared in all the large genomic and proteomic datasets that we
have analyzed so far [2]. This makes the cladogram the only tool to date that could identify
the transitional patterns from healthy to cancerous, and possibly renders it a predictive tool for
early disease detection.
11 Testing the congruence of omics’ data
Incongruity of omics data has become a topic of serious debate [14,29], and
the field is in need of a robust method for testing congruence. The parsimony phylogenetic
analysis as described here offers an evolutionary approach for testing concordance of datasets.
In addition to carrying out inclusive analysis by the pooling of multiple omics datasets, several
other data processes are possible under this model. Although our presentation and discussion
have so far focused on examples of high-throughput data experiments, the approach outlined here
is also applicable to many other types of data such as 2-D gels, as well as chromosomal and
genomics data. As long as data polarity can be determined, a parsimony analysis is most likely
achievable.
Interplatform comparability is attainable here at two levels: first, by testing the congruence of
the data from two or more sources for the same set of specimens (for example, does proteomic
data produce the same classification as genomic data?); secondly, by testing the congruence
of the synapomorphies (the potential biomarkers) among different sets of specimens. Having
multiple datasets for the same specimens allows testing of both the accuracy of the data and the
robustness of the classification hypothesis [20,39].
Small sample size and variable methods of analysis are two chronic problems in published
studies and experimental designs, which prevent direct comparisons of the results and
conclusions. However, applying EPA and parsimony analysis will enable us to test the
congruity of an experiment at the two levels mentioned above. The phylogenetic model
produced higher congruence in synapomorphies of two separate sets of specimens from the
same tissue type. We tested the concordance of two published studies [21,40] on uterine
fibroids (leiomyomas) by comparing their two lists of synapomorphies, and found 62%
concordance in synapomorphies despite the variation in the number of probes between the two
datasets, a marked improvement over the 13% concordance shown between
the published statistical gene lists of the two studies.
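A concordance figure of this kind could be computed as in the sketch below. The text does not state how the percentages were calculated, so the definition used here (shared items divided by the size of the smaller list) is an assumption, and the gene labels are placeholders rather than results from either study.

```python
def concordance_percent(list_a, list_b):
    """Percentage of shared items, relative to the smaller of the two lists.

    Assumption: this normalization is one plausible choice, not the authors' stated formula.
    """
    a, b = set(list_a), set(list_b)
    if not a or not b:
        raise ValueError("both lists must be non-empty")
    return 100.0 * len(a & b) / min(len(a), len(b))


# Placeholder synapomorphy lists from two hypothetical studies.
study_1 = ["G1", "G2", "G3", "G4", "G5"]
study_2 = ["G2", "G3", "G5", "G7"]

print(concordance_percent(study_1, study_2))
```

Other normalizations, such as dividing by the size of the union, would give different percentages for the same lists, so the definition should be reported alongside the figure.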
Because of its hierarchical nature, phylogenetic classification makes it possible to test gene
linkage and specific alterations to pathways. The path of synapomorphies from the base of the
cladogram to its tip is a sequential developmental map for the successional events that produce
the different stages of the disease and the diversity of its specimens. These synapomorphies
are the shared derived alterations that we need to identify since they will elucidate the disease
etiology and are the biomarkers of its various stages.
12 Translating phylogenetic analysis of omics into clinical practice
The practical aspect of phylogenetic analysis can be realized in various ways: better diagnosis
of diseases and disorders through evolutionary classification that reflects the real ontogeny and
phylogeny of disease, better treatment by fine targeting of pathways, and better assessment of
health status from faster and cost-effective omics data analysis. The following illustrates
through theoretical clinical scenarios the potential applications of parsimony phylogenetic
analysis of omics in a clinical setting.
Scenario A: Routine health assessment
An individual’s health status can be assessed during a routine checkup from a
small blood specimen (<0.5 mL) for early detection of degenerative diseases and cancer. The
serum fraction is submitted for proteomic MS analysis, and the spectra are analyzed using
parsimony phylogenetics against the serum of control specimens (healthy and diseased). The
location of this individual on the cladogram (within healthy, diseased, or transitional clades,
Fig. 5) determines the health status (healthy, diseased, or transitioning from healthy to disease).
A specimen of a healthy individual assembles within the healthy clades.
Scenario B: Early detection and prevention
Individuals located within the transitional clade, nested between the healthy and cancer
specimens (in this example, Fig. 5), are at risk of developing cancer. These at-risk
individuals are accumulating mutations that make them susceptible but have not yet
reached clinical manifestation. In the evolutionary medical paradigm offered by
parsimony phylogenetic analysis, such individuals would be classified as “at-risk” of developing
disease/cancer. Preventive medicine could play a major role in this case.
Scenario C: Diagnosis
If the phylogenetic analysis places the individual’s proteomic specimen within the cancer
clades (Fig. 5), then the patient is a cancer carrier. We have demonstrated before, but have not
yet validated, that each of three cancer types (ovarian, pancreatic, and prostate) produced its
own clades separate from the other two; it is therefore possible in a comprehensive analysis
to place the patient within the respective type of cancer [2].
Scenario D: Post-treatment evaluation and prognosis
Depending on the position of the patient’s proteomic specimen within the cancerous clades
(basal, middle, or terminal) of the analysis cladogram, the clinical stage of the cancer can be
determined. The staging here is based on the derived mutations that the patient carries, the
apomorphies.
After a course of treatment (chemotherapy, radiation, or surgery), the patient’s progress can be
evaluated by a proteomic-phylogenetic analysis of their serum. If the location of the patient’s
specimen on the analysis cladogram has moved from the cancerous clades to the normal, then
the treatment has succeeded. Follow up and monitoring can be carried out periodically by this
minimally invasive and cost-effective method.
13 Conclusions
Evolutionary analysis and interpretation of omics data offer a unifying paradigm for the
various types of data and provide a multidimensional application of the analysis in a
medical context. Cellular processes involved in disease development recapitulate evolutionary
processes; they involve genetic modifications, selective pressure, and clonal propagation.
Furthermore, diseases are currently assumed to be natural classes and subclasses, each
having its own unique aberrations in developmental pathways. Thus, employing
evolutionary polarity assessment to sort out uniquely derived omics states, coupled with
parsimony phylogenetic analysis, seems to provide a predictive, seamless, and dynamic
evolutionary classification of the specimens that accurately reveals biological classes, patterns,
and processes. This parsimonious paradigm is also capable of combining several large datasets
from multiple sources for inclusive analyses, produces higher interplatform concordance, and
offers intra- and interplatform comparability. Additionally, a parsimonious cladogram reveals
the directionality of change within a set of specimens, and could be utilized for early detection,
diagnosis, prognosis, assessment of treatment, and biomarker identification. The parsimony
phylogenetic approach could also serve as the basis for the individualized medicine of the
21st century.
References
1. Nesse RM, Stearns SC, Omenn GS. Medicine needs evolution. Science 2006;311:1071. [PubMed:
16497889]
2. Abu-Asab M, Chaouchi M, Amri H. Phyloproteomics: what phylogenetic analysis reveals about serum
proteomics. J Proteome Res 2006;5:2236–2240. [PubMed: 16944935]
3. Shackney SE, Silverman JF. Molecular evolutionary patterns in breast cancer. Adv Anat Pathol
2003;10:278–290. [PubMed: 12973049]
4. Azzone GF. The nature of diseases: evolutionary, thermodynamic and historical aspects. Hist Philos
Life Sci 1996;18:83–106. [PubMed: 8940904]
5. Sarnat HB, Netsky MG. Hypothesis: Phylogenetic diseases of the nervous system. Can J Neurol Sci
1984;11:29–33. [PubMed: 6704791]
6. Pennington G, Smith CA, Shackney S, Schwartz R. Expectation-maximization method for
reconstructing tumor phylogenies from single-cell data. Comput Syst Bioinformatics Conf 2006:371–
380. [PubMed: 17369656]
7. Desper R, Khan J, Schaffer AA. Tumor classification using phylogenetic methods on expression data.
J Theor Biol 2004;228:477–496. [PubMed: 15178197]
8. Nesse RM. How is Darwinian medicine useful? West J Med 2001;174:358–360. [PubMed: 11342524]
9. Culotta E, Pennisi E. Breakthrough of the year: evolution in action. Science 2005;310:1878–1879.
[PubMed: 16373538]
10. Felsenstein, J. Inferring phylogenies. Sinauer Associates; Sunderland, MA: 2004.
11. Wiley, EO. Phylogenetics: The Theory and Practice of Phylogenetic Systematics. John Wiley and
Sons; New York: 1981.
12. Hennig, W. Phylogenetic systematics. University of Illinois Press; Urbana: 1966.
13. Whitfield J. Linnaeus at 300: we are family. Nature 2007;446:247–249. [PubMed: 17361152]
14. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of
gene expression measurements. Nat Biotechnol 2006;24:1151–1161. [PubMed: 16964229]
15. Goloboff, PA.; Pol, D. Parsimony, phylogeny, and genomics. Albert, VA., editor. Oxford University
Press; Oxford, New York: 2005. p. 148-159.
16. Gascuel, O. Mathematics of evolution and phylogeny. Gascuel, O., editor. Oxford University Press;
New York: 2005. p. 1-8.
17. Farris JS. The information content of the phylogenetic system. Syst Zool 1979;28:483–519.
18. Albert, VA. Parsimony, phylogeny, and genomics. Albert, VA., editor. Oxford University Press;
Oxford, New York: 2005. p. 1-11.
19. Planet PJ, DeSalle R, Siddall M, Bael T, et al. Systematic analysis of DNA microarray data: ordering
and interpreting patterns of gene expression. Genome Res 2001;11:1149–1155. [PubMed: 11435396]
20. Kumazawa Y, Nishida M. Sequence evolution of mitochondrial tRNA genes and deep-branch animal
phylogenetics. J Mol Evol 1993;37:380–398. [PubMed: 7508516]
21. Quade BJ, Wang TY, Sornberger K, Dal Cin P, et al. Molecular pathogenesis of uterine smooth muscle
tumors from transcriptional profiling. Genes Chromosomes Cancer 2004;40:97–108. [PubMed:
15101043]
22. Mickevich MF. Taxonomic congruence. Syst Zool 1978;27:143–158.
23. Felsenstein J. A likelihood approach to character weighting and what it tells us about parsimony and
compatibility. Biol J Linnean Soc 1981;16:183–196.
24. Stefankovic D, Vigoda E. Phylogeny of mixture models: robustness of maximum likelihood and
non-identifiable distributions. J Comput Biol 2007;14:156–189. [PubMed: 17456014]
25. Siddall ME. Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the
Farris zone. Cladistics 1998;14:209–220.
26. Goloboff PA. Parsimony, likelihood, and simplicity. Cladistics 2003;19:91–103.
27. Stefankovic D, Vigoda E. Pitfalls of heterogeneous processes for phylogenetic reconstruction. Syst
Biol 2007;56:113–124. [PubMed: 17366141]
28. Kolaczkowski B, Thornton JW. Performance of maximum parsimony and likelihood phylogenetics
when evolution is heterogeneous. Nature 2004;431:980–984. [PubMed: 15496922]
29. Coombes KR, Morris JS, Hu J, Edmonson SR, Baggerly KA. Serum proteomics profiling–a young
technology begins to mature. Nat Biotechnol 2005;23:291–292. [PubMed: 15765078]
30. Golub TR, Slonim DK, Tamayo P, Huard C, et al. Molecular classification of cancer: class discovery
and class prediction by gene expression monitoring. Science 1999;286:531–537. [PubMed:
10521349]
31. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation
and consensus. Nat Rev Genet 2006;7:55–65. [PubMed: 16369572]
32. Graybeal A. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst Biol
1998;47:9–17. [PubMed: 12064243]
33. Lyons-Weiler J, Patel S, Becich MJ, Godfrey TE. Tests for finding complex patterns of differential
expression in cancers: towards individualized medicine. BMC Bioinformatics 2004;5:110. [PubMed:
15307894]
34. Felsenstein J. PHYLIP: Phylogeny Inference Package (Version 3.2). Cladistics 1989:164–166.
35. Fulda S, Poremba C, Berwanger B, Hacker S, et al. Loss of caspase-8 expression does not correlate
with MYCN amplification, aggressive disease, or prognosis in neuroblastoma. Cancer Res
2006;66:10016–10023. [PubMed: 17047064]
36. Reed JC, Meister L, Tanaka S, Cuddy M, et al. Differential expression of bcl2 protooncogene in
neuroblastoma and other human tumor cell lines of neural origin. Cancer Res 1991;51:6529–6538.
[PubMed: 1742726]
37. Uddin M, Wildman DE, Liu G, Xu W, et al. Sister grouping of chimpanzees and humans as revealed
by genome-wide phylogenetic analysis of brain gene expression profiles. Proc Natl Acad Sci USA
2004;101:2957–2962. [PubMed: 14976249]
38. Page RD. TreeView: an application to display phylogenetic trees on personal computers. Comput
Appl Biosci 1996;12:357–358. [PubMed: 8902363]
39. Miyamoto MM, Fitch WM. Testing species phylogenies and phylogenetic methods with congruence.
Syst Biol 1995;44:64–76.
40. Hoffman PJ, Milliken DB, Gregg LC, Davis RR, Gregg JP. Molecular characterization of uterine
fibroids and its implication for underlying mechanisms of pathogenesis. Fertil Steril 2004;82:639–
649. [PubMed: 15374708]
Abbreviations
DA
dichotomously asynchronous
EPA
evolutionary polarity assessment
Glossary
Definitions of terms
Clade
a group of specimens sharing one or more synapomorphies
Cladogram
a graphic classification of hierarchical relationships among specimens based on
the synapomorphies (shared derived characters). It is also a summary of trends
that occur within the data, and shows the directionality of accumulation of change
with the highest number of synapomorphies shared by the specimens that are
closer to the upper part of the cladogram
Dendrogram
a tree diagram used to graphically illustrate the arrangement of the clusters
produced by a phenetic clustering algorithm (see Phenetic Clustering).
Dendrograms are produced in computational biology (e.g. microarray analysis) to
illustrate similarity and gene-linkage
Dichotomous asynchronicity
a two-tailed pattern of protein or gene expression in a number of specimens with
a physiological abnormality (e.g. cancer) in comparison with the normal
specimens. Usually, the values of m/z protein or gene expression of the abnormal
specimens are outside the range of the normal specimens (i.e. above and below
the normal range)
Dynamic classification
a classification that has the capacity to incorporate novel specimens without
major alterations to the composition of its main groups or their relationships
Evolutionary medicine
a branch of medicine that seeks to explain the nature of disease in an evolutionary
context
Homoplasy
similarity due to convergence, parallelism, or reversal. Convergence occurs when
two or more specimens have different developmental pathways for a homologous
character state; parallelism is independently acquiring similar non-homologous
states; and reversal is reverting to an ancestral state from a derived state
Ingroup
the group of specimens under study, for example, diseased specimens
Interplatform comparability
the ability to compare several datasets produced on different platforms.
Evolutionary polarity assessment transforms the omics data into polarized
matrices of discrete values (0/1) and thereby makes it possible to compare
two or more separate experiments, and/or to combine several experiments in one
large analysis
Interplatform concordance
the intersection (sharing) of significant m/z values and gene-lists produced by
two or more separate experiments. The higher the concordance between
independent experiments the larger the number of common proteins and genes,
and the more significant the results
Mass to charge ratio (m/z)
the ratio of a protein’s mass (m) to the total charge (z) it carries. The m/z
is the value obtained from laser desorption mass spectrometry machines such as
SELDI or MALDI
Outgroup
a group of specimens used to polarize the ingroup values of m/z or gene
expression into ancestral (plesiomorphic) and derived (apomorphic)
Parsimony
simplicity; the preferred hypothesis is the one requiring the fewest
explanations (Occam’s Razor). In the omics context, the preferred
phylogenetic cladogram is the one that requires the fewest steps to
construct from the polarized data matrix
Phenetic clustering
grouping specimens on the basis of similarity without a priori sorting of similarity
into ancestral and derived. Phenetics does not provide any information about the
evolutionary phylogenetic relationships among specimens
Phylogenetic classification
a classification that uses synapomorphies to delimit clades (i.e. monophyletic
groups); it provides evolutionary phylogenetic relationships among specimens
Polarity assessment
also known as outgroup comparison. It is the basis of sorting out the data values
(whether proteomic [m/z], or microarray expression values) into ancestral and
derived. In large datasets, it transforms absolute numbers of data values into
polarized binary numbers (0/1), where zero (0) signifies ancestral and one (1)
signifies derived
Predictive classification
a classification that reveals the characteristics of a specimen when its place in the
classification is determined
Synapomorphy & biomarker
a shared derived protein or gene expression value in comparison with a number
of normal specimens (the outgroup). A protein synapomorphy may have one of
the following conditions: (i) a novel protein, (ii) a disappeared protein, (iii) an
up-regulated protein, (iv) a down-regulated protein, and (v) a dichotomously
asynchronous regulated protein (the m/z values are above and below the normals’
range but not within the normals’ range). A gene synapomorphy may have one
of the following conditions: (i) over-expressed value above normals’ range, (ii)
under-expressed value below the normals’ range, (iii) dichotomously
asynchronous values, and (iv) undetectable expression value. Biomarkers are
unique proteins or gene expressions that could delimit (or characterize) a group
of specimens sharing a physiological condition. Because synapomorphies also
group together specimens sharing uniquely derived protein or gene expression
into clades (i.e. every clade has its own set of synapomorphies), these
synapomorphies are potential biomarkers for the clade they define
Figure 1.
Flowchart outlining the various stages of an evolutionary phylogenetic analysis of omics data,
as well as the interpretation and translation of the analysis results into a clinical setting.
Figure 2.
Phylogenetics vs. Phenetics. (A) Phylogenetic cladogram based on maximum parsimony
analysis, and (B) phenetic dendrogram based on Pearson’s correlation for the same data set
[21]. While the cladogram resolves the relationship between the leiomyoma and
leiomyosarcoma specimens by finding 32 uniquely expressed synapomorphies shared by both
groups and 20 synapomorphies distinguishing leiomyosarcomas from the leiomyomas, the
dendrogram fails to resolve this relationship and clusters the leiomyomas with normal
myometrium specimens. The cladogram has directionality for accumulated synapomorphies,
and the dendrogram does not. For example, the cladogram indicates that the leiomyosarcoma
specimen GSM11779 has the highest number of synapomorphies, and GSM11769 has the
lowest. Dataset GDS533 available at http://www.ncbi.nlm.nih.gov/geo/.
Figure 3.
Dichotomously asynchronous protein and gene-expression. Two-tailed distribution that occurs
in a group of cancerous specimens. (A) Protein intensity at m/z 12 215 of 11 specimens of
prostate cancer and 17 normals; six cancerous specimens show upregulation and five show
downregulation. (B) RNA signal intensity of ten specimens of uterine leiomyosarcoma
showing four specimens overexpressing and six underexpressing Akt1.
Figure 4.
A most parsimonious cladogram produced by MIX for MS serum proteomic data of 36 prostate
cancer patients and 49 healthy men. Each specimen had 15 144 m/z data points; polarity
assessment was carried out by UNIPAL. Each line that ends on the right side of the figure
represents a specimen. The red part of the cladogram indicates the cancerous specimens as
diagnosed before the experiment; the green section indicates the healthy specimens; and the
blue shows the presumed healthy specimens that seem to form a transitional zone between the
healthy and cancerous clades.
Figure 5.
Translating the parsimony phylogenetic analysis of omics into clinical practice. A schematic
topology of a typical proteomic cladogram of a cancer analysis. There are two major cancerous
clades at the upper section of the cladogram; transitional clades in the middle section; and the
basal healthy clades. Adding an unknown specimen to an analysis will have three possible
scenarios: scenario A indicates the likely location of a healthy specimen within the healthy
clades; scenario B places a specimen from a susceptible individual with the transitional clades
between the healthy and cancerous clades; and scenario C would locate a cancerous specimen
within one of the two major cancer clades. A post-treatment analysis may change the location
from the cancerous to transitional or healthy clades depending on the treatment’s efficacy.
Table 1

A. MS Proteomic Data (intensity)
                     Controls                                        Experimentals             Polarized values
m/z            A      B      C      D      E    Min    Max      F      G      H      I      J  F G H I J
928.70        18   70.1   21.2   11.6   19.7   11.6   70.1   93.4     29    9.1   34.4   89.6  1 0 1 0 1
979.43     371.6  418.8  375.9  447.5    353    353  447.5  239.4  178.8  260.1    267  270.2  1 1 1 1 1
1018.58    184.6    189  200.2    217  235.1  184.6  235.1  196.8  144.4  321.2  101.6  138.7  0 1 1 1 1
1110.48    446.4  458.2  493.2  556.5  372.1  372.1  556.5  324.3  340.7  338.8  437.2  501.2  1 1 1 0 0
1228.72    142.3  145.3  238.5  187.5  162.8  142.3  238.5  171.9  152.2  123.4  158.4  175.8  0 0 1 0 0
1304.16     22.8     28   28.7   32.1     25   22.8   32.1   41.6   43.4   11.1   28.5   23.8  1 1 1 0 0
1407.12      175  140.1  110.8  159.8  180.1  110.8  180.1  121.9   70.3  215.6     80  130.6  0 1 1 1 0
1511.54     44.8     29   30.8   27.3     41   27.3   44.8  147.4   65.8   53.9   18.6   35.1  1 1 1 1 0
1623.71     90.7   72.6   37.3   40.7   48.4   37.3   90.7   38.1   66.7   42.7   47.2   57.1  0 0 0 0 0
1711.73    419.4  356.7  427.3  332.5  399.4  332.5  427.3  297.7  565.6  169.7  287.4  387.7  1 1 1 1 0

B. Gene Expression Data (intensity)
                     Controls                                        Experimentals             Polarized values
Gene           A      B      C      D      E    Min    Max      F      G      H      I      J  F G H I J
PTAFR        115     59    163    137    131     59    163     45     62     62    577    145  1 0 0 1 0
PRKCD        222    145    273    263    256    145    273     97    111    111    379    238  1 1 1 1 0
ACAT1        145     76    225    200    196     76    225     72    101     75    665    208  1 0 1 1 0
MARCKS       200    126    257    271    186    126    271     81     85    102    319    225  1 1 1 1 0
OGDH         105     84    223    180    176     84    223     69     91     72    222    202  1 0 1 0 0
CRK          142     94    211    201    134     94    211     79     90     66    123    186  1 1 1 0 0
CHKA         114     70    217    196    150     70    217     62     96     72    109    204  1 0 0 0 0
GPR109B      143    111    249    241    211    111    249     75    120     85    157    258  1 0 1 0 1
HTR1B        245     94    263    214    183     94    263     73    115     95    151    200  1 0 0 0 0
IL2RG        157     91    222    183    164     91    222     71    121     72    115    179  1 0 1 0 0

From omics to cladogram. The process of evolutionary polarity assessment is illustrated using a sample of MS proteomic data (A) and
gene-expression data (B). Each dataset consists of ten specimens: five controls and five experimentals. The minimum and maximum
(i.e., the range) of the controls are determined for each m/z point and gene, and the intensity values of the experimental specimens
are then transformed to 0 or 1 depending on whether each value falls within the controls’ range or outside it, respectively. The
polarized values from the proteomic data (A) and the gene-expression data (B) are pooled and processed in an algorithmic parsimony
analysis (MIX) to produce the consensus cladogram (C). The synapomorphies for each clade are listed at its node.
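The range test described in this caption can be sketched in a few lines of Python. This is an illustration only, not the authors' own program: the function name `polarize` is hypothetical, and treating a value exactly equal to the controls' minimum or maximum as "within range" is an assumption. The numbers are the first three m/z points of the proteomic sample (A), for controls A-E and experimental specimen F.

```python
# Control intensities (specimens A-E) for three m/z points of the proteomic sample.
controls = {
    928.70: [18, 70.1, 21.2, 11.6, 19.7],
    979.43: [371.6, 418.8, 375.9, 447.5, 353],
    1018.58: [184.6, 189, 200.2, 217, 235.1],
}
# Intensities of experimental specimen F at the same m/z points.
specimen_f = {928.70: 93.4, 979.43: 239.4, 1018.58: 196.8}


def polarize(value, control_values):
    """Return 0 (ancestral) if value lies within the controls' range, else 1 (derived).

    Assumption: the range is inclusive of the controls' minimum and maximum.
    """
    low, high = min(control_values), max(control_values)
    return 0 if low <= value <= high else 1


polarized_f = [polarize(specimen_f[mz], controls[mz]) for mz in controls]
print(polarized_f)  # [1, 1, 0]
```

The first value is above the controls' range, the second is below it, and the third falls inside it; both kinds of deviation are scored as 1.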
Abu-Asab et al.
OMICS: A Journal of Integrative Biology
OMICS: A Journal of Integrative Biology: http://mc.manuscriptcentral.com/omics
Phylogenetic Modeling of Heterogeneous Gene-Expression Microarray
Data from Cancerous Specimens
Journal: OMICS: A Journal of Integrative Biology
Manuscript ID: OMI-2008-0010
Manuscript Type: Original Article
Date Submitted by the Author: 26-Feb-2008
Complete List of Authors:
Abu-Asab, Mones; NCI, Lab of Path
Chaouchi, Mohamed; Georgetown University, Department of Physiology and Biophysics
Amri, Hakima; Georgetown University, Department of Physiology and Biophysics
Keyword: Cancer, DNA Microarrays, Gene Expression, Data Analysis, Biological Databases
Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801
Phylogenetic Modeling of Heterogeneous Gene-Expression Microarray Data from Cancerous Specimens
Mones S. Abu-Asab1*, Mohamed Chaouchi2*, and Hakima Amri2§
1 Laboratory of Pathology, National Cancer Institute, National Institutes of Health,
Bethesda, MD 20892, USA. Phone 301-496-2164, Fax 301-480-9197, Email:
[email protected]
2 Mohamed Chaouchi. Department of Physiology and Biophysics, School of Medicine,
Georgetown University, Washington, DC 20007, USA. Phone 202-687-8594, Fax 202-687-7407,
Email: [email protected].
2 Hakima Amri. Department of Physiology and Biophysics, School of Medicine, Georgetown University,
Washington, DC 20007, USA. Phone 202-687-8594, Fax 202-687-7407. Email:
[email protected].
*These authors contributed equally to this work
§ To whom correspondence should be addressed: [email protected]
ABSTRACT
The qualitative dimension of gene-expression data and its heterogeneous nature in
cancerous specimens can be accounted for by phylogenetic modeling that incorporates
the directionality of altered gene expressions, complex patterns of expression among a
group of specimens, and data-based rather than specimen-based gene linkage. Our
phylogenetic modeling approach is a double-algorithmic technique: polarity assessment,
which brings out the qualitative value of the data, followed by maximum parsimony
analysis, which is most suitable for the data heterogeneity of cancer gene expression.
We demonstrate that polarity assessment of expression values into derived and ancestral
states, via outgroup comparison, reduces experimental noise; reveals
dichotomously-expressed asynchronous genes; and allows data pooling and intra- and
interplatform comparability. Parsimony phylogenetic analysis of the polarized values
produces a classification of specimens into clades that reveal shared derived gene
expressions (the synapomorphies). This classification provides better qualitative
assessment of ontogenic linkage of genes and phyletic relatedness of specimens;
efficiently utilizes dichotomously-expressed genes; produces highly predictive class
recognition; illustrates gene linkage and multiple developmental pathways; provides
higher concordance between gene lists; and projects the direction of change among
specimens. A further implication of this phylogenetic approach is that it may transform
the microarray into a diagnostic, prognostic, and predictive tool.
INTRODUCTION
Gene microarrays have been employed in studying comparative gene expression in
cancer, genetic disorders, infections, drug response and interactions, as well as other
biological processes (Quackenbush, 2006), and their data have been used to generate cancer
taxonomy (Bittner, Meltzer, Chen et al., 2000; Golub, Slonim, Tamayo et al., 1999; Lossos and
Morgensztern, 2006), diagnosis, prognosis (Beer, Kardia, Huang et al., 2002),
subtyping/class discovery (Alizadeh, Eisen, Davis et al., 2000; Beer, Kardia, Huang et al.,
2002), and biomarker detection (Lossos and Morgensztern, 2006). However, more than a
decade after its introduction and subsequent wide usage, microarray gene-expression
analysis still suffers from a number of problems that limit its usefulness and
potential (Harrison, Johnston and Orengo, 2007; Millenaar, Okyere, May et al., 2006;
Wang, He, Band et al., 2005). These include poor reproducibility of measurements between
runs, instruments, or laboratories; the inability to compare or pool data within and
across platforms; insufficient concordance of gene lists; and the lack of an analytical
paradigm that can transform microarray data into a multidimensional bioinformatic tool
useful in a clinical setting. Current analytical paradigms such as phenetic clustering
and maximum likelihood (including Bayesian) have not resolved these issues (Abu-Asab,
Chaouchi and Amri, 2006; Abu-Asab, Chaouchi and Amri, 2008). In an attempt to resolve
some of the above-listed problems and broaden the bioinformatic potential of microarray
technology, we introduce a parsimony phylogenetic approach for microarray data analysis
that is based on outgroup comparison (a.k.a. polarity assessment) and maximum parsimony.
Our approach is a double-algorithmic procedure in which the data values are first
polarized into derived or ancestral, depending on whether they fall outside or within the
range of the outgroup, which is usually composed of normal healthy specimens; the
polarized data are then processed with a maximum parsimony algorithm. The analysis
produces a phylogenetic classification of the specimens that recognizes monophyletic
classes (clades) delimited by shared derived gene expressions (the synapomorphies).
Biologically meaningful interpretation of the data and better correlation with
clinical characteristics, diagnosis, and outcomes are highly desired criteria in an
analytical tool (Allison, Cui, Page et al., 2006; Beer, Kardia, Huang et al., 2002; Bittner,
Meltzer, Chen et al., 2000; Golub, Slonim, Tamayo et al., 1999). Clustering specimens
into discernible entities on the basis of overall quantitative gene-expression linkage
similarities has some serious drawbacks (Allison, Cui, Page et al., 2006; Lyons-Weiler,
Patel, Becich et al., 2004) and appears to be incongruent with the nature of disease
development (Abu-Asab, Chaouchi and Amri, 2006; Abu-Asab, Chaouchi and Amri,
2008; Nesse and Stearns, 2008). In this report, we demonstrate that parsimony
phylogenetic analysis of microarray data resolves the issues of gene-ranking
discrepancies, improves interplatform concordance, makes intra- and interplatform
comparability possible, eliminates biases in the gene linkage criteria, and casts
gene-expression profiles into a biologically relevant and predictive model of class discovery.
A superior classification is one that summarizes maximum knowledge about its
specimens, reflects their true ontogenic relationships to one another, and offers
predictivity (Farris, 1979; Golub, Slonim, Tamayo et al., 1999). The latter is especially
significant when the classification will be applied in a clinical setting for diagnosis,
prognosis, or post-treatment evaluation. We use parsimony phylogenetics because of its
inherent ability to produce a robust classification of relationships (class discovery)
and its power to forecast the characters of a specimen once its place in the
classification is established (class prediction) (Albert, 2005b). Parsimony models
the heterogeneity of cancerous microarray data without any a priori assumptions
(Goloboff and Pol, 2005; Siddall, 1998; Stefankovic and Vigoda, 2007). Additionally, a
phylogenetic approach elucidates the direction of change among specimens that leads to
their molecular and cellular diversity: the presence of one or more developmental
pathways (Abu-Asab, Chaouchi and Amri, 2008), and novel expressions that are involved
in the progression and maintenance of the disease.
A strict parsimony phylogenetic analysis uses only shared derived values,
synapomorphies, to delimit a natural group of specimens within a clade (Wiley and
Siegel-Causey, 1991). Shared derived values of a gene among several specimens
constitute a synapomorphy; therefore, only a synapomorphy is indicative of their
relatedness. Since synapomorphies define clades at various grouping levels, a
parsimonious phylogenetic classification reflects hierarchical shared developmental
pathways among a group of specimens and may reveal the presence of subclasses, each
with its own uniquely derived gene-expression synapomorphies. In biological
and clinical senses, class discovery and prediction should be based on shared derived
gene expressions (i.e., synapomorphies). For example, a cancer class (a clade in
phylogenetic terminology) is delimited by one or more synapomorphies, and a cancerous
specimen will be placed in a class only if it shares the same synapomorphies with the
members of the clade.
We describe a double-algorithmic method for analyzing microarray gene-expression
data. It is based on a polarity assessment algorithm, UNIPAL (Abu-Asab, Chaouchi
and Amri, 2006), whose polarized values are then used by a parsimony algorithm, MIX
(Felsenstein, 1989), to produce a phylogenetic classification of specimens. This approach
brings a systematic solution to class discovery through phylogenetic classification,
whereby every class is delimited by shared derived gene expressions (i.e.,
synapomorphy-delimited clades). Because such a classification reflects the shared
aberrations of gene expression of the specimens, we expect it to have biological and
clinical relevance, and to advance targeted treatments of disease.
MATERIALS AND METHODS
Gene-Expression Datasets
In order to demonstrate the applicability of parsimony phylogenetics to
microarray gene-expression data, and to test interplatform concordance and
comparability, we downloaded three publicly available datasets of comparative
gene-expression studies, GDS484 (Hoffman, Milliken, Gregg et al., 2004), GDS533 (Quade,
Wang, Sornberger et al., 2004), and GDS1210 (Hippo, Taniguchi, Tsutsumi et al., 2002),
from NCBI’s Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). The
GDS484 study was conducted on GPL96 (Affymetrix GeneChip Human Genome U133 Array
Set HG-U133A), and the other two studies on GPL80 (Affymetrix GeneChip Human Full
Length Array HuGeneFL). GDS484 comprised normal myometrium (n = 5)
and uterine leiomyomas (n = 5) obtained from fibroid-afflicted patients. The GDS533
study encompassed normal myometrium (n = 4), benign uterine leiomyoma (n = 7), as
well as malignant uterine (n = 9) and extra-uterine (n = 4) leiomyosarcoma specimens.
The GDS1210 study included expression profiling of 22 primary advanced gastric cancer
tissues and 8 normal specimens.
Polarity Assessment and Parsimony Analysis
Polarity assessment through outgroup comparison does not compare means or fold
changes; rather, it converts the continuous values into discontinuous ones by
assessing each gene’s values against the range of the normals, and
produces a matrix of polarized values (0s and 1s). Our polarity assessment program,
UNIPAL, independently compares each gene’s value in the experimental specimens against
its corresponding range within the outgroup, and scores each as either derived (1) or
ancestral (0), so that the matrix of gene-expression values is transformed into a matrix
of polarized scores (0s and 1s).
We used all the expression data points of all specimens in the analysis. For
polarity assessment (apomorphic [or derived] vs. plesiomorphic [or ancestral]), the data
were polarized with our customized algorithm, UNIPAL, which recognizes the derived
values of each gene when compared with the outgroup (Abu-Asab,
Chaouchi and Amri, 2006). Outgroups here were composed of normal healthy specimens
only. UNIPAL determines the polarity of every data point among the specimens via
outgroup comparison, and then scores each value of the study group as derived (1) or
ancestral (0). Ideally, the outgroup should be large enough to encompass the maximum
variation within the normal healthy population.
The phylogenetic analysis was carried out with MIX, the maximum parsimony
program of PHYLIP ver. 3.57c (Felsenstein, 1989), to produce separate parsimony
phylogenetic analyses for each dataset and for the inclusive matrix of two sets (GDS533
& GDS1210) that included all their specimens. MIX was run with randomized and
non-randomized inputs, and no significant differences were observed between the two
options.
Phylogenetic trees were drawn using TreeView (Page, 1996).
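A polarized matrix has to be written to a text file before a PHYLIP program can read it. The sketch below assumes the usual PHYLIP discrete-character layout (a header line with the number of specimens and the number of characters, then one line per specimen with a name padded to 10 characters followed by its 0/1 states). The function name and specimen labels are hypothetical, and the source does not describe the exact files or menu options the authors used.

```python
def to_phylip_discrete(names, rows):
    """Format a 0/1 matrix (one row per specimen) as PHYLIP discrete-character text."""
    if len(names) != len(rows) or not rows:
        raise ValueError("need one non-empty row of states per specimen name")
    n_chars = len(rows[0])
    lines = [f"{len(names):5d}{n_chars:5d}"]
    for name, row in zip(names, rows):
        if len(row) != n_chars:
            raise ValueError(f"row for {name!r} has the wrong length")
        # The name occupies the first 10 columns; the states follow immediately.
        lines.append(f"{name[:10]:<10}{''.join(str(s) for s in row)}")
    return "\n".join(lines) + "\n"


# Hypothetical specimen labels and polarized scores, for illustration only.
infile_text = to_phylip_discrete(
    ["Normal_1", "Tumor_F", "Tumor_G"],
    [[0, 0, 0, 0], [1, 1, 0, 1], [0, 1, 1, 1]],
)
print(infile_text)
```

Names longer than 10 characters are truncated here, so specimen labels should be unique within their first 10 characters.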
Interplatform Concordance and Comparability
To test interplatform concordance when analyzed parsimoniously, we compared
the synapomorphies of the two uterine leiomyoma datasets, GDS484 & GDS533, and
recorded the percentage of concordance.
To test interplatform comparability (i.e., whether datasets can be pooled
together for a parsimony analysis), we combined the polarized matrices of the two
datasets from the identical platform, GDS533 & GDS1210, processed the combined matrix
with MIX, and compared the result to their separate cladograms.
RESULTS
The implications of a parsimonious analysis of the gene-expression data are
realized in several aspects: the recognition and utilization of partially asynchronous
genes and dichotomously-expressed asynchronous genes; the importance of outgroup
selection and its effect on gene listing; the multidimensionality of the cladograms; and
interplatform concordance and comparability.
Dichotomously-Expressed Asynchronous (DEA) Genes
Our analysis identified a specific punctuated pattern of gene expression that
seemed to occur only in a set of specimens where a gene’s expression values were on both
sides of the normals’ distribution (over- and underexpressed) but did not overlap with it
(Tables 1-7). This pattern has been recognized only once in the literature but was not
named (Lyons-Weiler, Patel, Becich et al., 2004); we termed this phenomenon dichotomous
asynchronicity to reflect its two-tailed distribution and deviation from the normal
expression range.
While the t-statistic and fold-change may dismiss these asynchronous genes from the
list of differentially-expressed genes, or misrepresent their significance (Lyons-Weiler,
Patel, Becich et al., 2004), an outgroup polarity assessment will assess each value as
derived and let the parsimony algorithm plot its significance in relation to the rest of the
genes. A parsimony phylogenetic algorithm uses the polarity distribution of all genes to
produce the most parsimonious classification, the one with the lowest number of reversals
and parallelisms (i.e., it minimizes multiple origins of expression states in hypothesizing
the relationships among the specimens) (Albert, 2005a; Felsenstein, 2004).
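The parsimony criterion can be made concrete by counting, for a fixed tree, the minimum number of state changes that each binary character requires. The sketch below uses Fitch's counting procedure on hand-written trees. It illustrates the optimality criterion only and is not MIX, which also searches over tree topologies. The tree encoding (nested tuples of specimen names) and the toy matrix are assumptions made for the example.

```python
def fitch(tree, states):
    """Return (possible_states, steps) for one binary character on a rooted tree.

    tree   -- a specimen name, or a 2-tuple of subtrees
    states -- dict mapping specimen name to its 0/1 state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No state in common: one change is needed on this branch pair.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total number of steps over all characters (columns) of a polarized matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {name: row[i] for name, row in matrix.items()}
        total += fitch(tree, column)[1]
    return total


# Toy polarized matrix: N is a normal specimen; A, B, C are hypothetical diseased ones.
matrix = {"N": [0, 0], "A": [1, 0], "B": [1, 1], "C": [1, 1]}
tree_1 = ("N", ("A", ("B", "C")))
tree_2 = ("N", ("B", ("A", "C")))
print(tree_length(tree_1, matrix))  # 2
print(tree_length(tree_2, matrix))  # 3
```

The first tree groups B and C, which share both derived states, and therefore needs fewer steps; under the parsimony criterion it is the preferred hypothesis of the two.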
Through polarity assessment, a large number of asynchronous genes that exhibited
dichotomous expression were recognized. All these genes had expression values
above and below the normal specimens’ range, i.e., derived in relation to the
outgroups. DEA genes were found in all three datasets studied here (Tables 1-7), and
were included in all the analyses.
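One way to flag such genes is to check, for each gene, whether every ingroup value lies outside the normals' range and whether the values fall on both sides of it. The following is a minimal sketch under those assumptions; the function name, the return labels (OE, UE, DE for over-, under-, and dichotomously expressed), and the numbers are illustrative and are not taken from the datasets.

```python
def expression_pattern(outgroup_values, ingroup_values):
    """Classify one gene by where the ingroup values fall relative to the normals' range."""
    if not outgroup_values or not ingroup_values:
        raise ValueError("both groups must contain at least one value")
    low, high = min(outgroup_values), max(outgroup_values)
    above = sum(1 for v in ingroup_values if v > high)
    below = sum(1 for v in ingroup_values if v < low)
    if above + below < len(ingroup_values):
        return "not derived in all specimens"
    if above and below:
        return "DE"  # derived everywhere, on both sides of the normal range
    return "OE" if above else "UE"


normals = [100, 120, 140]  # hypothetical outgroup values for one gene
print(expression_pattern(normals, [60, 70, 180, 200]))  # DE
print(expression_pattern(normals, [150, 160, 175]))     # OE
print(expression_pattern(normals, [50, 130]))           # not derived in all specimens
```

A comparison of group means would see the first gene's ingroup average (127.5) as close to the normals' average (120), although every ingroup value lies outside the normal range.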
Most Parsimonious Cladograms
Parsimony analysis produced one most parsimonious cladogram (the one requiring the
fewest steps to construct a classification of specimens) for the uterine GDS533
dataset (Fig. 1). The topology of the tree showed one large inclusive clade that
encompassed all of the leiomyomas and leiomyosarcomas, delimited by 32
synapomorphies (Table 1); a terminal clade with 9 sarcoma specimens; a middle sarcoma
clade with 4 specimens; and 5 small basal leiomyoma clades in tandem arrangement,
followed by 4 basal normal clades.
The cladogram in Fig. 1 showed that the leiomyoma specimens did not form a
natural group by themselves—they did not form their own clade separating them from the
leiomyosarcomas, and there were no synapomorphies circumscribing them as a clade
when the ingroup was composed of leiomyoma and leiomyosarcoma. However, as a
paraphyletic group, the leiomyomas shared 146 synapomorphies distinguishing them
from the normals (Table 2).
The 13 leiomyosarcoma specimens separated into a large terminal clade that was
delimited by 20 synapomorphies in comparison with an outgroup composed of
leiomyoma and normal specimens (Table 3), and by 29 synapomorphies derived in relation
to leiomyomas only as an outgroup (Table 4). Extrauterine sarcoma specimens did not
assemble together, but rather were scattered within the sarcoma clades (denoted by * on
the cladogram in Fig. 1). When the leiomyomas were removed from the comparison,
there were 156 synapomorphies delimiting the sarcomas (Table 5), a result that illustrates
the effect of outgroup and ingroup selection on the results.
For the gastric dataset, GDS1210, parsimony analysis produced one most
parsimonious cladogram (Fig. 2). The cladogram topology showed two terminal clades
with 6 and 5 specimens, respectively, and a tandem arrangement of 6 small clades, the
largest having 3 specimens. The inclusive gastric cancer clade was circumscribed by 34
synapomorphies (Table 6). In a list-by-list comparison, our 34 synapomorphies for the
gastric cancer overlapped in only one gene (CST4) with the gene list of the
authors of the study (Hippo, Taniguchi, Tsutsumi et al., 2002).
Interplatform Concordance
Interplatform concordance was tested by comparing the two lists
of synapomorphies of the two leiomyoma studies, GDS484 and GDS533 (comparison results
are summarized in Tables 7 & 8). Out of the ~22,000 genes in the GDS484 dataset, our
analysis produced a total of 1485 synapomorphic genes circumscribing the leiomyoma
specimens, while the leiomyomas of GDS533 were delimited by 146
synapomorphies out of ~7000 gene probes. A comparison between the two sets of
leiomyoma synapomorphies produced 45 shared ones (Tables 7 & 8),
a 31% concordance in synapomorphies despite the sizable difference in the number of
probes between the two datasets. This is still better than the 12% concordance between
the statistically-produced gene lists of the two published studies (Hoffman, Milliken,
Gregg et al., 2004; Quade, Wang, Sornberger et al., 2004).
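The reported 31% is consistent with dividing the 45 shared synapomorphies by the size of the smaller list (146). The sketch below encodes that reading as an assumption, since the source does not spell out the denominator; the function name and the placeholder identifiers are hypothetical.

```python
def concordance_percent(list_a, list_b):
    """Percentage of the smaller gene list that also appears in the other list.

    Assumption: the denominator is the size of the smaller list.
    """
    a, b = set(list_a), set(list_b)
    if not a or not b:
        raise ValueError("both lists must be non-empty")
    return 100 * len(a & b) / min(len(a), len(b))


# Placeholder identifiers, not genes from the study.
example = concordance_percent(["g1", "g2", "g3", "g4"],
                              ["g3", "g4", "g5", "g6", "g7", "g8"])
print(example)  # 50.0

# Counts reported in the text: 45 shared synapomorphies, smaller list of 146.
reported = 100 * 45 / 146
print(round(reported))  # 31
```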
However, 48% concordance resulted when comparing the 32 synapomorphies of
the leiomyomas and leiomyosarcomas clade (GDS533, Table 1) with the 1485
synapomorphies of the leiomyomas of GDS484 (Table 7); the clades’ synapomorphies
overlapped as follows: 1/1 overexpressed (OE), 7/8 underexpressed (UE; all except FOSB),
and 8/23 dichotomously expressed (DE), an 89% concordance
within the OE & UE and 35% within the DE. Additionally, there was 45% concordance
between the 32 synapomorphies of the leiomyomas and leiomyosarcomas clade and the
gene list of Quade et al. (Table 8).
Furthermore, a lower concordance was obtained when comparing the
phylogenetic synapomorphies against statistically-generated gene lists. The
synapomorphies of leiomyomas (GDS533, Table 2) showed 18% concordance (4/25 OE,
8/42 UE) with the 78 significant genes of Hoffman et al. (2004, GDS484, gene list
produced by fold-change), and 16.5% (5/25 OE, 6/42 UE) with the 146 genes of Quade et
al. (2004, GDS533, gene list produced by F-statistic). This was higher than the
concordance between the two gene lists of the published uterine studies, 12% (3/25 OE,
5/42 UE). Neither study mentioned DE genes.
Data Pooling and Interplatform Comparability
Data pooling and interplatform comparability were tested on the combined
polarized matrices of the gastric (GDS1210) and uterine (GDS533) datasets. Their
inclusive parsimony analysis produced one most parsimonious cladogram (Fig. 3). Its
topology showed a total separation of the gastric cancer from the uterine leiomyoma and
sarcoma specimens into two large clades. However, the two types of cancers shared 16
synapomorphies that delimited a clade composed of all the gastric and uterine specimens
(Table 9).
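Pooling of this kind can be pictured as stacking the specimens of each study into one matrix over the probes they have in common, with each study polarized against its own outgroup beforehand. The sketch below is a hedged illustration of that bookkeeping; the data structure, function name, study labels, and probe identifiers are assumptions, not the authors' procedure.

```python
def pool_polarized(datasets):
    """Combine polarized scores from several studies over their shared probes.

    datasets -- dict: study name -> dict: specimen -> dict: probe -> 0/1
    Returns (shared_probes, pooled), where pooled maps "study:specimen"
    to a list of 0/1 scores ordered like shared_probes.
    """
    shared = None
    for specimens in datasets.values():
        for scores in specimens.values():
            probes = set(scores)
            shared = probes if shared is None else shared & probes
    shared_probes = sorted(shared or [])
    pooled = {}
    for study, specimens in datasets.items():
        for specimen, scores in specimens.items():
            pooled[f"{study}:{specimen}"] = [scores[p] for p in shared_probes]
    return shared_probes, pooled


# Hypothetical polarized scores; each study was polarized against its own normals.
datasets = {
    "uterine": {"U1": {"p1": 1, "p2": 0, "p3": 1}},
    "gastric": {"G1": {"p2": 1, "p3": 1, "p4": 0}},
}
shared_probes, pooled = pool_polarized(datasets)
print(shared_probes)  # ['p2', 'p3']
print(pooled)         # {'uterine:U1': [0, 1], 'gastric:G1': [1, 1]}
```

Because only 0/1 scores are combined, the raw intensity scales of the two studies never need to be reconciled.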
The resulting inclusive cladogram (Fig. 3) showed an almost total agreement with
the single-type cladograms (Figs. 1 & 2), indicating a successful pooling of datasets.
However, there was a slight variation in the topology of minor branches between the
cladogram of Fig. 2 and the inclusive one of Fig. 3. These slight differences are most
likely due to the increased number of normal specimens used in the outgroup of the
inclusive cladogram. The outgroup size used here was by no means ideal; the larger
the membership of the outgroup, the more stable the topology of the generated cladogram
(Graybeal, 1998).
DISCUSSION
Microarray analysis aims to identify differentially expressed genes, and subsequently
to characterize genetic patterns, classify specimens accordingly, and point out potential
biomarkers. However, most of the problems currently associated with microarray
analysis arise from using only the quantitative aspect of the data (the absolute continuous
values of gene expression) to carry out parametric statistical analysis; from forecasting
gene linkage on the basis of quantitative correlation rather than expression pattern; and
from lacking the power to recognize and utilize specific gene-expression patterns such as
dichotomous expression and partial asynchronicities (Abu-Asab, Chaouchi and Amri,
2008; Allison, Cui, Page et al., 2006). This results in discrepancies that affect which
genes are considered differentially expressed by the two main ranking criteria for
generating gene lists, the t-test and fold-change (Guo, Lobenhofer, Wang et al., 2006).
Our double-algorithmic analysis supports a qualitative approach in which the directionality
of expression is the first step in designating an expression value as significant, followed by
a parsimony search to plot a classification of specimens with the smallest number of steps
that explains the data’s distribution pattern.
The results of the parsimonious analysis of the microarray gene-expression data of three
datasets show a total distinction of the sarcoma from the fibroid tissues (the leiomyomas),
and of these two classes from gastric cancer. The analysis also identified a number of
synapomorphies for gastric and uterine cancers, thus defining each as a natural disease
entity with its unique shared derived expressions; produced higher interplatform
concordance than the gene lists of the t-test and fold-change (Tables 7 & 8); and allowed
the pooling and comparability of two independent experiments. Such results confer
reliability on a qualitative parsimonious approach to analyzing gene-expression data
(Table 10).
Advantages of Polarity Assessment
There are several reasons for our preference for a combination of polarity
assessment via outgroup comparison and parsimony over other methods for the analysis
of gene-expression microarray data (Allison, Cui, Page et al., 2006; Kolaczkowski and
Thornton, 2004). Parsimony phylogenetic analysis requires polarity assessment of each
data value to determine its novelty, i.e., whether it represents a change from the normal
state (Abu-Asab, Chaouchi and Amri, 2008). We advocate that qualitative, and not only
quantitative, similarity is a better measure of common ontogenetic steps among
specimens, and that a correlation of genes based on similar quantitative expression is not
necessarily indicative of ontogenic relationships among genes or specimens. Polarity
assessment converts the absolute continuous data values into fixed discontinuous binary
states (0/1), where zero signifies no change in gene expression and one indicates a
deviation from the range of normal specimens. The change of a state from zero to one
conveys the direction of change in the diseased specimens, since a derived state (1)
denotes a state that does not occur in normal specimens.
Polarity assessment does not set an arbitrary stringency on gene selection,
especially where the distribution pattern is gene-specific within a set of specimens (e.g.,
DE and partially asynchronous genes) and other transformation methods are not
optimal for its assessment (Huang and Qu, 2006; Lyons-Weiler, Patel, Becich et al.,
2004). Fold-change and the t-test may dismiss from the gene list those genes with
dichotomous expression, although they are indicative of a unique expression type and
may account for some phenomena such as transitional clades and dichotomous or
multi-pathway development in some disease types (Abu-Asab, Chaouchi and Amri, 2006;
Lyons-Weiler, Patel, Becich et al., 2004). The gene lists of Tables 1C-7C show a large
number of DE asynchronous genes that were mostly not considered significant by other
methods (Hippo, Taniguchi, Tsutsumi et al., 2002; Hoffman, Milliken, Gregg et al., 2004;
Quade, Wang, Sornberger et al., 2004), or whose dichotomous mode was not noticed by
the authors.
Identifying synapomorphies is an important goal of a parsimony analysis since
they are the basis for defining clades (Albert, 2005a; Wiley and Siegel-Causey, 1991).
Polarity assessment identifies the genes with derived expressions in all of the ingroup
specimens, i.e., it recognizes apomorphies, and thus allows us to carry out parsimony
phylogenetic analysis and benefit from its unique implications (Felsenstein, 2004;
Hennig, 1966). It is the parsimony algorithm that plots a hierarchical distribution of
synapomorphies to produce a hypothesis of relationships among the specimens in the
form of a cladogram. A synapomorphy can be traced back from the parsimony
cladogram to the specimens that share it, thus permitting the determination of potential
biomarkers; this tracing back is almost impossible in other analytical methods (Abu-Asab,
Chaouchi and Amri, 2008).
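Given a polarized matrix and the membership of a candidate clade, the characters that are derived in every member can be listed directly. The sketch below shows that lookup. The specimen labels, probe identifiers, and scores are hypothetical, and in a full analysis the clade memberships would come from the cladogram rather than being supplied by hand.

```python
def shared_derived(polarized, members, probes):
    """Return the probes scored as derived (1) in every listed specimen."""
    if not members:
        raise ValueError("a clade needs at least one member")
    return [
        probe
        for i, probe in enumerate(probes)
        if all(polarized[m][i] == 1 for m in members)
    ]


# Hypothetical polarized scores for five diseased specimens over four probes.
probes = ["p1", "p2", "p3", "p4"]
polarized = {
    "F": [1, 1, 0, 1],
    "G": [0, 1, 1, 1],
    "H": [1, 1, 1, 1],
    "I": [0, 1, 1, 0],
    "J": [1, 1, 1, 0],
}
print(shared_derived(polarized, ["F", "G", "H", "I", "J"], probes))  # ['p2']
print(shared_derived(polarized, ["G", "H"], probes))                 # ['p2', 'p3', 'p4']
```

The smaller group shares more derived states than the whole set, which mirrors how nested clades carry their own additional synapomorphies.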
Because polarity assessment transforms the quantitative data into a qualitative
matrix, it reduces the data noise. The absolute quantitative nature of microarray data
restricts their use and interpretation because of inconsistencies between runs,
platforms, and laboratories. Polarizing each data set with its own set of outgroup
specimens eliminates the inconsistencies of the experiment, since the polarization
process is a comparison between equals: data values generated at the same time. The
benefit here translates into the ability to pool a large number of experiments, carry out
intra- and interplatform comparisons, and obtain better gene-list concordance between
experiments. However, as discussed below, polarity assessment is sensitive to the choice
and size of the outgroup specimens.
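The conversion of quantitative values into a qualitative matrix can be illustrated with a minimal Python sketch. It assumes, as a simplification, that a value is scored as derived (1) when it falls outside the range spanned by the outgroup specimens for that gene and as ancestral (0) otherwise. The function name `polarize` and the expression values are hypothetical and are not taken from the datasets analyzed here.

```python
from typing import Dict, List


def polarize(ingroup: Dict[str, List[float]],
             outgroup: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Score each ingroup value as 1 (derived) if it lies outside the
    outgroup range for that gene, otherwise as 0 (ancestral).

    Assumption: the outgroup min-max range defines the ancestral state.
    """
    matrix = {}
    for gene, values in ingroup.items():
        low, high = min(outgroup[gene]), max(outgroup[gene])
        matrix[gene] = [0 if low <= v <= high else 1 for v in values]
    return matrix


# Hypothetical expression values: three normal (outgroup) specimens
# and four tumor (ingroup) specimens.
outgroup = {"geneA": [5.0, 6.0, 5.5], "geneB": [2.0, 2.4, 2.2]}
ingroup = {"geneA": [9.1, 8.7, 1.2, 7.9], "geneB": [2.1, 3.0, 2.3, 2.2]}

polarized = polarize(ingroup, outgroup)
print(polarized)
```

In this toy example geneA is derived in every ingroup specimen, with values both above and below the outgroup range, whereas geneB is derived in only one specimen. Because each dataset is scored only against its own outgroup, the resulting matrices no longer carry run- or platform-specific scales.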
Selection and Size of the Outgroup
When conducting a polarity assessment, the selection of the outgroup and its effective size
are significant factors in correctly identifying synapomorphies and, therefore, in
delimiting the natural clades within the study group. The composition of the outgroup
affects the outcome of the analysis, as demonstrated by the different
combinations of outgroups that we used to conduct polarity assessment (Tables 1-5). In
our opinion, the outgroup should be composed of only healthy specimens when the goal
is to identify the genes involved in disease inception, progression, and maintenance. As
Tables 1-5 show, variations in outgroup/ingroup composition lead to variations in the identified
synapomorphies and may therefore generate erroneous conclusions. When the ingroup
is a paraphyletic group (e.g., leiomyomas), the identified synapomorphies are different
from those found when the ingroup is monophyletic (contains all the related uterine specimens,
in this case the leiomyosarcomas as well).
In our combined analysis (Fig. 3), the increase in outgroup size did not affect the
major topology of the cladogram, but rather the internal branching of some clades
(normal and gastric cancer) when compared with their separate analyses (Figs. 1-2).
Because increasing the number of genes in the study does not have the same effect as
enlarging outgroup size (Graybeal, 1998), it is our conclusion that a successful analysis
requires a sufficient number of normal specimens to be used as the outgroup. For microarray
experiments to be meaningful and provide high predictivity, the smallest number of
normal specimens that incorporates the maximum variation per population should be
established and used in the analysis.
Gene Linkage
Whereas gene linkage in a clustering dendrogram is based on quantitative
correlations between differentially expressed genes, in a parsimony cladogram it is based
on the most parsimonious distribution of derived and ancestral gene-expression states of
all genes of all the specimens; it is a map of expression states, both ancestral and derived.
For this process to be accurate, the most parsimonious cladogram is selected. It reflects
the classification that requires the fewest steps, including parallelisms and reversals, to
explain the distribution of expression states among specimens.
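To make the notion of "steps" concrete, the following Python sketch counts the minimum number of state changes that a given tree requires for a binary (ancestral/derived) matrix, using Fitch's small-parsimony counting rule. It is an illustration only and is not the MIX implementation; the trees, specimen labels, and gene names are hypothetical.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree   -- nested 2-tuples of specimen names, e.g. (("N1", "N2"), "T1")
    states -- dict mapping each specimen name to 0 (ancestral) or 1 (derived)
    """
    def visit(node):
        if isinstance(node, str):            # leaf: observed state, no steps
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        steps = left_steps + right_steps
        common = left_set & right_set
        if common:                           # children agree: no extra step
            return common, steps
        return left_set | right_set, steps + 1   # disagreement costs one step

    return visit(tree)[1]


def tree_length(tree, matrix):
    """Total steps over all genes; the shortest tree is the most parsimonious."""
    return sum(fitch_steps(tree, states) for states in matrix.values())


# Hypothetical polarized matrix: two normal (N) and two tumor (T) specimens.
matrix = {
    "geneA": {"N1": 0, "N2": 0, "T1": 1, "T2": 1},
    "geneB": {"N1": 0, "N2": 1, "T1": 0, "T2": 1},
    "geneC": {"N1": 0, "N2": 0, "T1": 1, "T2": 1},
}
tree_1 = (("N1", "N2"), ("T1", "T2"))
tree_2 = (("N1", "T1"), ("N2", "T2"))

lengths = {"tree_1": tree_length(tree_1, matrix),
           "tree_2": tree_length(tree_2, matrix)}
print(lengths)  # tree_1 needs 4 steps, tree_2 needs 5
```

Here `tree_1` is preferred because it explains the same matrix with fewer changes; geneB, which conflicts with that grouping, is explained as a parallelism.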
Gene linkage here is based on the location of genes on the cladogram as
synapomorphies. The synapomorphies below a node on the cladogram are the linked
genes that are shared among the specimens above that node. Because a parsimonious
cladogram is hierarchical, each of its nodes has its own synapomorphy or synapomorphies. This
characteristic of a cladogram presents it as a map of linked genetic alterations that
produce the diversity/relatedness of its specimens and may also permit the tracing of
shared ontogenic pathways that are responsible for disease initiation and progression.
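A minimal Python sketch of how the synapomorphies below a node might be read off a polarized matrix is given here. It assumes the simplest case, in which a synapomorphy is a gene scored as derived (1) in every specimen above the node; the helper name `shared_derived`, the specimen labels, and the gene names are hypothetical.

```python
def shared_derived(matrix, clade):
    """Genes scored as derived (1) in every specimen of the clade."""
    return sorted(gene for gene, states in matrix.items()
                  if all(states[s] == 1 for s in clade))


# Hypothetical polarized matrix: two leiomyomas (LM) and
# two leiomyosarcomas (LMS).
matrix = {
    "geneA": {"LM1": 1, "LM2": 1, "LMS1": 1, "LMS2": 1},
    "geneB": {"LM1": 0, "LM2": 0, "LMS1": 1, "LMS2": 1},
    "geneC": {"LM1": 0, "LM2": 1, "LMS1": 0, "LMS2": 0},
}

inclusive = shared_derived(matrix, ["LM1", "LM2", "LMS1", "LMS2"])
terminal = shared_derived(matrix, ["LMS1", "LMS2"])
# Synapomorphies placed at the terminal node are those not already
# accounted for at the more inclusive node below it.
terminal_only = [g for g in terminal if g not in inclusive]
print(inclusive, terminal_only)
```

In this toy matrix geneA links all four specimens, geneB is added at the nested node, and geneC defines no group at all.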
Phylogenetic Implications on Disease Definition
Although it is assumed that each disease has its own unique developmental
pathway(s) (Adsay, Merati, Andea et al., 2002; Chung, 2000; Hayashi, Yamashita and
Watanabe, 2004), thus far the omics data has not been used to prove this premise. Our
analysis of two independently-generated datasets that represent uterine (GDS533) and
gastric (GDS1210) cancers confirms that each of these two types of cancer is a natural
class of specimens (a clade) that is circumscribed by its own set of synapomorphies. If
this can be extended to other types of cancer, then each cancer can be considered a
natural clade with its unique gene-expression identifiers—the synapomorphies.
There are several implications to this conclusion; the most obvious is its effect on
the definition of biomarkers. If a type of cancer is a clade, then any suggested biomarker
has to be a proven synapomorphy; otherwise it will not be a universal diagnostic test for
all the specimens of this cancer. Some of the currently applied immunohistomarkers are
not universal synapomorphies. For example, the memberships of all four clades of the
gastric cancers (Fig. 2) did not correlate well with the specimens’ immunoreactivity to
antibodies against p53, E-cadherin, and β-catenin, and a published two-way clustering
did not correlate any better (Hippo, Taniguchi, Tsutsumi et al., 2002). The discordance
between molecular classifications and most of the currently used immunohistological
markers is a problem that can be better addressed in a phylogenetic sense, by indicating
whether a marker is a synapomorphy or has a random distribution among the subclades of
a cancer. Most of the immunohistological markers stain their tumors only partially and
are therefore not expected to be synapomorphies.
A second implication is that a phylogenetic classification can be a diagnostic tool
because it is a process of class discovery based on synapomorphy-defined clades. This
can be realized either through a parsimony analysis, where the place of a specimen will
indicate its pathologic status, or by using the synapomorphies as the biomarkers of a
specimen, i.e., through class prediction.
A third implication is that a parsimonious classification of specimens may be used
as a prognostic tool. The cladogram also indicates the direction of change in
gene expression among the specimens: it places the specimens with the largest
number of derived gene-expression patterns at the terminal end of the cladogram, and
the specimens with the fewest gene-expression changes at the lower end
of the cladogram. It may therefore be developed for use in prognosis, targeted treatment, and
post-treatment assessment.
Additionally, the phylogenetic classification is a dynamic tool that will
incorporate a novel specimen by placing it in the proximity of its sister groups, depending
on the number of synapomorphies it shares with other members of a clade, without any
radical alteration to the topology of the cladogram.
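The placement of a novel specimen can be approximated, under a strong simplification, by counting how many of each clade's synapomorphies the specimen carries in the derived state. The sketch below is not a substitute for re-running the parsimony analysis; the function `closest_clade`, the clade names, and the gene names are hypothetical, and every clade is assumed to have at least one synapomorphy.

```python
def closest_clade(specimen, clade_synapomorphies):
    """Return the best-matching clade and the per-clade match fractions.

    specimen             -- dict gene -> 0 (ancestral) or 1 (derived)
    clade_synapomorphies -- dict clade name -> non-empty list of genes
    """
    fractions = {
        clade: sum(specimen.get(gene, 0) for gene in genes) / len(genes)
        for clade, genes in clade_synapomorphies.items()
    }
    return max(fractions, key=fractions.get), fractions


# Hypothetical synapomorphy lists and a newly polarized specimen.
clades = {
    "clade_1": ["geneA", "geneB", "geneC", "geneD"],
    "clade_2": ["geneE", "geneF"],
}
new_specimen = {"geneA": 1, "geneB": 1, "geneC": 1,
                "geneD": 0, "geneE": 1, "geneF": 0}

best, fractions = closest_clade(new_specimen, clades)
print(best, fractions)
```

Using the fraction rather than the raw count keeps a clade with a long synapomorphy list from dominating the comparison.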
Improved Interplatform Concordance and Comparability
Improved interplatform concordance is a criterion that will bestow robustness and
significance on microarray as a valid experimental and clinical platform. In our tests of
concordance, the lists of synapomorphies produced by polarity assessment of two
experiments agreed better with each other than the lists produced by
fold-change and F-statistic, and better than the latter two agreed with each other (Table 8). When
comparing the synapomorphies of a clade composed of leiomyomas and
leiomyosarcomas (GDS533) with the synapomorphies of leiomyomas (GDS484), we
obtained a high concordance of 89% within over- and underexpressed genes and 35% within
dichotomously expressed genes. The concordance between the two studies could have been
higher if the number of probes of GDS533 had been closer to that of GDS484 (7,000 vs. 22,000)
(Hoffman, Milliken, Gregg et al., 2004; Quade, Wang, Sornberger et al., 2004).
Furthermore, even a comparison of the synapomorphies of two paraphyletic leiomyoma
groups (GDS484 [1485 synapomorphies] and GDS533 [146 synapomorphies, Table 2])
produced 31% concordance between the two groups of leiomyoma (45/146, Table 7).
This was a higher percentage than was produced by statistical methods (12%).
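The concordance percentages quoted above are ratios of shared genes to list size. A minimal Python sketch follows; it assumes that the denominator is the shorter of the two lists, which matches the 45/146 figure but is otherwise an assumption, and the function name and gene lists are hypothetical.

```python
def concordance(list_a, list_b):
    """Percentage of the shorter gene list that also occurs in the other list.

    Assumption: the shorter list is used as the denominator.
    """
    a, b = set(list_a), set(list_b)
    if not a or not b:
        raise ValueError("both gene lists must be non-empty")
    return 100.0 * len(a & b) / min(len(a), len(b))


# Hypothetical gene lists from two experiments.
study_1 = ["geneA", "geneB", "geneC", "geneD"]
study_2 = ["geneB", "geneC", "geneE", "geneF", "geneG"]
example = concordance(study_1, study_2)   # 2 shared genes out of 4

# The figure quoted in the text: 45 shared synapomorphies out of 146.
reported = round(100 * 45 / 146)
print(example, reported)
```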
Interplatform comparability has been difficult to carry out on microarray data
because of data inconsistencies between runs, experiments, and laboratories. However,
polarity assessment converts the quantitative gene-expression values of
every experiment into a qualitative matrix, which makes it possible to combine several matrices and
carry out intra- and interplatform comparisons in a parsimonious phylogenetic sense. A
phylogenetic interplatform comparability of microarray data can be carried out if each
dataset can be polarized separately to produce its polarized matrix. Furthermore, when
their probes are identical, two or more polarized sets can be pooled together and analyzed,
as Fig. 3 shows. We have successfully pooled and analyzed two separately polarized
datasets, GDS1210 (gastric cancer) and GDS533 (uterine leiomyoma and
leiomyosarcoma), which were prepared separately but on an identical
gene chip platform, GPL80; previously, we pooled three mass spectrometry
proteomic datasets for a phylogenetic analysis (Abu-Asab, Chaouchi and Amri, 2006).
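The pooling of separately polarized matrices can be sketched in Python as follows. The sketch assumes that each matrix maps a probe to the 0/1 states of its specimens, that the probe sets are identical, and that specimen labels are unique across datasets; the function name `pool` and all labels are hypothetical.

```python
def pool(matrix_a, matrix_b):
    """Merge two polarized matrices of the form probe -> {specimen: state}."""
    if set(matrix_a) != set(matrix_b):
        raise ValueError("pooling requires identical probe sets")
    pooled = {}
    for probe in matrix_a:
        overlap = set(matrix_a[probe]) & set(matrix_b[probe])
        if overlap:
            raise ValueError(f"duplicate specimen labels: {sorted(overlap)}")
        # Each dataset was already polarized against its own outgroup,
        # so the 0/1 states can be placed side by side without rescaling.
        pooled[probe] = {**matrix_a[probe], **matrix_b[probe]}
    return pooled


# Hypothetical polarized matrices from two experiments on the same platform.
uterine = {"probe_1": {"U_tumor": 1, "U_normal": 0},
           "probe_2": {"U_tumor": 0, "U_normal": 0}}
gastric = {"probe_1": {"G_tumor": 1, "G_normal": 0},
           "probe_2": {"G_tumor": 1, "G_normal": 0}}

pooled = pool(uterine, gastric)
print(pooled["probe_1"])
```

The pooled matrix can then be passed to a parsimony program as a single dataset.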
Resolving Standing Questions Through Parsimony Phylogenetics: An Example
Our analysis of uterine tissues illustrates how a parsimony phylogenetic analysis
may address some of the unresolved issues in bioinformatics and medicine. For
example, one of the persistent questions in pathology is the relationship between
leiomyoma and leiomyosarcoma (Quade, Wang, Sornberger et al., 2004). It has been
reported that approximately 1% of leiomyosarcomas may have arisen in pre-existing
leiomyomas (Lee, Kong, Lee et al., 2005). By analyzing data of normal uterus,
leiomyoma, and leiomyosarcoma, we demonstrate that the latter two share a
number of synapomorphies and form an inclusive clade (Table 1, Figs. 1 & 3), and that
leiomyosarcoma has an additional number of synapomorphies distinguishing it from
leiomyoma (Table 3). Although the leiomyoma specimens, when analyzed alone,
without the leiomyosarcoma, appear to have a large number of synapomorphies (Table
2), these synapomorphies are not unique to leiomyoma, and the group appears to be
paraphyletic (contains some but not all of its developmental relatives). Leiomyoma as a
group does not form a clade within a comprehensive ingroup that includes the
leiomyosarcoma; there is not even one gene-expression state that is unique to the group itself
in this context. Because it shares its synapomorphies with the sarcoma, leiomyoma
should be considered an incipient form of leiomyosarcoma.
Conclusion
The application of phylogenetic analysis through polarity assessment and
parsimony to several gene-expression microarray datasets provides the basis for a new
paradigm for analyzing and interpreting microarray data (Table 10). It offers an
alternative to F- and t-statistics and fold-change methods of generating lists of
differentially expressed genes and statistical gene linkage; brings out a higher interplatform
concordance; resolves interplatform comparability problems; defines biomarkers as
synapomorphies; circumscribes disease types as clades defined by synapomorphies; and
possibly transforms microarray into a diagnostic, prognostic, and post-treatment evaluation
tool.
ACKNOWLEDGEMENTS
Competing interests. The authors have filed for a US patent for their analytical method.
REFERENCES
Abu-Asab, M., Chaouchi, M., and Amri, H. (2006). Phyloproteomics: What
phylogenetic analysis reveals about serum proteomics. J Proteome Res 5,
2236-2240.
Abu-Asab, M., Chaouchi, M., and Amri, H. (2008). Evolutionary medicine: A
meaningful connection between omics, disease, and treatment. Proteomic
Clin Appl 2, 122-134.
Adsay, N. V., Merati, K., Andea, A., Sarkar, F., Hruban, R. H., Wilentz, R. E., et
al. (2002). The dichotomy in the preinvasive neoplasia to invasive carcinoma
sequence in the pancreas: differential expression of MUC1 and MUC2
supports the existence of two separate pathways of carcinogenesis. Mod
Pathol 15, 1087-1095.
Albert, V. A. (2005a). Parsimony and phylogenetics in the genomic age. In:
Albert, V. A. (ed). Parsimony, phylogeny, and genomics. (Oxford University
Press, Oxford, New York).
Albert, V. A. (2005b). Parsimony, phylogeny, and genomics. (Oxford University
Press, Oxford, New York).
Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A.,
et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene
expression profiling. Nature 403, 503-511.
Allison, D. B., Cui, X., Page, G. P., and Sabripour, M. (2006). Microarray data
analysis: from disarray to consolidation and consensus. Nature Reviews 7,
55-65.
Beer, D. G., Kardia, S. L., Huang, C. C., Giordano, T. J., Levin, A. M., Misek, D.
E., et al. (2002). Gene-expression profiles predict survival of patients with
lung adenocarcinoma. Nature Medicine 8, 816-824.
Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., et al. (2000).
Molecular classification of cutaneous malignant melanoma by gene
expression profiling. Nature 406, 536-540.
Chung, D. C. (2000). The genetic basis of colorectal cancer: insights into critical
pathways of tumorigenesis. Gastroenterology 119, 854-865.
Farris, J. S. (1979). The Information Content of the Phylogenetic System.
Systematic Zoology 28, 483-519.
Felsenstein, J. (1989). PHYLIP: phylogeny inference package (version 3.2).
Cladistics 5, 164-166.
Felsenstein, J. (2004). Inferring phylogenies. (Sinauer Associates, Sunderland,
Mass.).
Goloboff, P. A., and Pol, D. (2005). Parsimony and Bayesian phylogenetics. In:
Albert, V. A. (ed). Parsimony, phylogeny, and genomics. (Oxford University
Press, Oxford, New York).
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.
P., et al. (1999). Molecular classification of cancer: class discovery and class
prediction by gene expression monitoring. Science 286, 531-537.
Graybeal, A. (1998). Is it better to add taxa or characters to a difficult
phylogenetic problem? Systematic Biology 47, 9-17.
Guo, L., Lobenhofer, E. K., Wang, C., Shippy, R., Harris, S. C., Zhang, L., et al.
(2006). Rat toxicogenomic study reveals analytical consistency across
microarray platforms. Nat Biotech 24, 1162-1169.
Harrison, A. P., Johnston, C. E., and Orengo, C. A. (2007). Establishing a major
cause of discrepancy in the calibration of Affymetrix GeneChips. BMC
Bioinformatics 8, 195.
Hayashi, Y., Yamashita, J., and Watanabe, T. (2004). Molecular genetic analysis
of deep-seated glioblastomas. Cancer Genet Cytogenet 153, 64-68.
Hennig, W. (1966). Phylogenetic systematics. (University of Illinois Press,
Urbana, IL).
Hippo, Y., Taniguchi, H., Tsutsumi, S., Machida, N., Chong, J. M., Fukayama, M.,
et al. (2002). Global gene expression analysis of gastric cancer by
oligonucleotide microarrays. Cancer Research 62, 233-240.
Hoffman, P. J., Milliken, D. B., Gregg, L. C., Davis, R. R., and Gregg, J. P.
(2004). Molecular characterization of uterine fibroids and its implication for
underlying mechanisms of pathogenesis. Fertility and Sterility 82, 639-649.
Huang, S., and Qu, Y. (2006). The loss in power when the test of differential
expression is performed under a wrong scale. J Comput Biol 13, 786-797.
Kolaczkowski, B., and Thornton, J. W. (2004). Performance of maximum
parsimony and likelihood phylogenetics when evolution is heterogeneous.
Nature 431, 980-984.
Lee, E. J., Kong, G., Lee, S. H., Rho, S. B., Park, C. S., Kim, B. G., et al. (2005).
Profiling of differentially expressed genes in human uterine leiomyomas. Int J
Gynecol Cancer 15, 146-154.
Lossos, I. S., and Morgensztern, D. (2006). Prognostic biomarkers in diffuse
large B-cell lymphoma. J Clin Oncol 24, 995-1007.
Lyons-Weiler, J., Patel, S., Becich, M. J., and Godfrey, T. E. (2004). Tests for
finding complex patterns of differential expression in cancers: towards
individualized medicine. BMC Bioinformatics 5, 110.
Millenaar, F. F., Okyere, J., May, S. T., van Zanten, M., Voesenek, L. A., and
Peeters, A. J. (2006). How to decide? Different methods of calculating gene
expression from short oligonucleotide array data will give different results.
BMC Bioinformatics 7, 137.
Nesse, R. M., and Stearns, S. C. (2008). The great opportunity: Evolutionary
applications to medicine and public health. Evol Appl 1, 28-48.
Page, R. D. (1996). TreeView: an application to display phylogenetic trees on
personal computers. Comput Appl Biosci 12, 357-358.
Quackenbush, J. (2006). Microarray analysis and tumor classification. The New
England Journal of Medicine 354, 2463-2472.
Quade, B. J., Wang, T. Y., Sornberger, K., Dal Cin, P., Mutter, G. L., and Morton,
C. C. (2004). Molecular pathogenesis of uterine smooth muscle tumors from
transcriptional profiling. Genes, Chromosomes & Cancer 40, 97-108.
Siddall, M. E. (1998). Success of parsimony in the four-taxon case: long-branch
repulsion by likelihood in the Farris zone. Cladistics 14, 209-220.
Stefankovic, D., and Vigoda, E. (2007). Phylogeny of mixture models: robustness
of maximum likelihood and non-identifiable distributions. J Comput Biol 14,
156-189.
Wang, H., He, X., Band, M., Wilson, C., and Liu, L. (2005). A study of inter-lab
and inter-platform agreement of DNA microarray data. BMC Genomics 6, 71.
Wiley, E. O., and Siegel-Causey, D. (1991). The Compleat cladist: a primer of
phylogenetic procedures. (Museum of Natural History, Dyche Hall, University
of Kansas, Lawrence, Kan.).
FIG. 1. A cladogram of a parsimony phylogenetic analysis of microarray gene-expression
data representing normal myometrium (n= 4), leiomyoma (n= 7), leiomyosarcoma (n=
9), and extrauterine leiomyosarcoma (n= 4) specimens. The leiomyomas and
leiomyosarcomas form a clade defined by 32 synapomorphies (Table 1). The
leiomyosarcoma specimens form a terminal clade that is circumscribed by 20
synapomorphies (Table 3). Asterisk (*) denotes extrauterine leiomyosarcomas.
FIG. 2. A cladogram of a parsimony phylogenetic analysis of gastric cancer and noncancerous specimens. It shows a clade delineated by 34 synapomorphies (Table 6)
encompassing all cancer specimens.
FIG. 3. A cladogram representing a comparability analysis of the gastric (GDS1210) and
uterine (GDS533) datasets. The polarized matrices of the two datasets were pooled
together and processed by the parsimony phylogenetic algorithm, MIX. Each of the
cancers (gastric and leiomyosarcoma) forms its own clade, and the inclusive clade
encompassing the two cancers and leiomyomas is delimited by a set of synapomorphies
(Table 9).
Table 1. Synapomorphies defining a clade of leiomyoma and leiomyosarcoma
specimens in comparison to normal specimens (GDS533). Synapomorphies include
one OE gene, 8 UE genes, and 23 DE genes. The last column reports the status of the
synapomorphies as described by [1] Hoffman et al. (2004) and [2] Quade et al. (2004)
in their lists of significant genes. DE = dichotomously expressed; NS = not significant;
OE = overexpressed; UE = underexpressed.
A. Overexpressed synapomorphic genes:
D00596            TYMS thymidylate synthetase   OE[1, 2]
B. Underexpressed synapomorphic genes:
L19871            ATF3 activating transcription factor 3   UE[1, 2]
U62015            CYR61 cysteine-rich, angiogenic inducer, 61   UE[1, 2]
X68277            DUSP1 dual specificity phosphatase 1   UE[1, 2]
V01512            FOS v-fos FBJ murine osteosarcoma viral oncogene homolog   UE[1], NS[2]
L49169            FOSB FBJ murine osteosarcoma viral oncogene homolog B   NS[1], UE[2]
J04111            JUN v-jun sarcoma virus 17 oncogene homolog (avian)   UE[1, 2]
Y00503            KRT19 keratin 19   UE[1], NS[2]
U24488            TNXB tenascin XB   UE[1], UE,OE[2]
C. Dichotomously-expressed synapomorphic genes:
M31994            ALDH1A1 aldehyde dehydrogenase 1 family, member A1   UE[1], NS[2]
X05409            ALDH2 aldehyde dehydrogenase 2 family (mitochondrial)   NS[1, 2]
D25304            ARHGEF6 Rac/Cdc42 guanine nucleotide exchange factor (GEF) 6   NS[1, 2]
K03430            C1QB complement component 1, q subcomponent, B chain   NS[1], OE[2]
U60521            CASP9 caspase 9, apoptosis-related cysteine peptidase   NS[1, 2]
M73720            CPA3 carboxypeptidase A3 (mast cell)   NS[1, 2]
HG2663-HT2759_at  Cpg-Enriched DNA, Clone S19 (HG3995-HT4265)   NS[1, 2]
M14676            FYN oncogene related to SRC, FGR, YES   NS[1, 2]
M34677            F8A1 coagulation factor VIII-associated (intronic transcript) 1   OE,UE[2]
U60061            FEZ2 fasciculation and elongation protein zeta 2 (zygin II)   NS[1, 2]
U86529            GSTZ1 glutathione transferase zeta 1 (maleylacetoacetate isomerase)   NS[1, 2]
HG358-HT358_at    Homeotic Protein 7, Notch Group (HG358-HT358)   NS[2]
AB002365          KIAA0367 BCH motif-containing molecule at the carboxyl terminal region 1   NS[1], OE,UE[2]
U37283            MFAP5 microfibrillar associated protein 5   NS[1, 2]
HG406-HT406_at    MFI2 antigen p97 (melanoma associated) identified by monoclonal antibodies 133.2 and 96.5   NS[1, 2]
M55593            MMP2 matrix metallopeptidase 2 (gelatinase A, 72kDa gelatinase, 72kDa type IV collagenase)   NS[1], OE,UE[2]
M76732            MSX1 msh homeobox homolog 1   NS[1, 2]
L48513            PON2 paraoxonase 2   NS[1, 2]
U77594            RARRES2 retinoic acid receptor responder (tazarotene induced) 2   NS[1, 2]
M11433            RBP1 retinol binding protein 1   NS[1, 2]
L03411            RDBP RD RNA binding protein   NS[1], OE[2]
Z29083            TPBG trophoblast glycoprotein   NS[1, 2]
S73591            TXNIP thioredoxin interacting protein   NS[1, 2]
Table 2. Synapomorphies of leiomyoma specimens in comparison to normal
specimens (GDS533). These comprise 25 OE genes, 42 UE genes, and 79 DE genes.
An asterisk (*) indicates a synapomorphy for leiomyosarcoma as well. The last column
reports the status of the synapomorphies as described by [1] Hoffman et al. (2004) and
[2] Quade et al. (2004) in their lists of significant genes.
A. Overexpressed synapomorphic genes:
D16469            ATP6AP1 ATPase, H+ transporting, lysosomal accessory protein 1   NS[1, 2]
U07139            CACNB3 calcium channel, voltage-dependent, beta 3 subunit   NS[1], OE[2]
M11718            COL5A2 collagen, type V, alpha 2   NS[1, 2]
U18300            DDB2 damage-specific DNA binding protein 2, 48kDa   NS[1, 2]
D38550            E2F3 E2F transcription factor 3   NS[1, 2]
M34677            F8A1 coagulation factor VIII-associated (intronic transcript) 1   NS[1, 2]
D89289            FUT8 fucosyltransferase 8 (alpha (1,6) fucosyltransferase)   NS[1, 2]
D86962            GRB10 growth factor receptor-bound protein 10   NS[1, 2]
M32053            H19, imprinted maternally expressed untranslated mRNA   NS[1, 2]
U07664            HLXB9 homeobox HB9   OE[1, 2]
D87452            IHPK1 inositol hexaphosphate kinase 1   NS[1, 2]
U51336            ITPK1 inositol 1,3,4-triphosphate 5/6 kinase   NS[1, 2]
AB002365          KIAA0367   NS[1], OE[2]
D78611            MEST mesoderm specific transcript homolog (mouse)   OE[1], NS[2]
U19718            MFAP2 microfibrillar-associated protein 2   NS[1, 2]
M55593            MMP2 matrix metallopeptidase 2 (gelatinase A, 72kDa gelatinase, 72kDa type IV collagenase)   OE[1, 2]
U79247            PCDH11X protocadherin 11 X-linked   NS[1, 2]
L24559            POLA2 polymerase (DNA directed), alpha 2 (70kD subunit)   NS[1, 2]
M65066            PRKAR1B protein kinase, cAMP-dependent, regulatory, type I, beta   NS[1, 2]
D14694            PTDSS1 phosphatidylserine synthase 1   NS[1, 2]
U24186            RPA4 replication protein A4, 34kDa   NS[1, 2]
U85658            TFAP2C transcription factor AP-2 gamma (activating enhancer binding protein 2 gamma)   NS[1, 2]
D82345            TMSL8 thymosin-like 8   NS[1, 2]
D85376            TRHR thyrotropin-releasing hormone receptor   NS[1, 2]
D00596            TYMS* thymidylate synthetase   OE[1, 2]
B. Underexpressed synapomorphic genes:
X03350            ADH1B alcohol dehydrogenase IB (class I), beta polypeptide   NS[1, 2]
M31994            ALDH1A1* aldehyde dehydrogenase 1 family, member A1   UE[1], NS[2]
X05409            ALDH2* aldehyde dehydrogenase 2 family (mitochondrial)   NS[1, 2]
L19871            ATF3* activating transcription factor 3   UE[1, 2]
U60521            CASP9 caspase 9, apoptosis-related cysteine peptidase   NS[1, 2]
D49372            CCL11 chemokine (C-C motif) ligand 11   NS[1, 2]
X05323            CD200 molecule   NS[1, 2]
M83667            CEBPD CCAAT/enhancer binding protein (C/EBP), delta   NS[1, 2]
U90716            CXADR coxsackie virus and adenovirus receptor   NS[1, 2]
M21186            CYBA cytochrome b-245, alpha polypeptide   NS[1, 2]
U62015            CYR61* cysteine-rich, angiogenic inducer, 61   UE[1, 2]
Z22865            DPT dermatopontin   NS[1, 2]
X56807            DSC2 desmocollin 2   NS[1, 2]
X68277            DUSP1* dual specificity phosphatase 1   UE[1, 2]
V01512            FOS* v-fos FBJ murine osteosarcoma viral oncogene homolog   UE[1], NS[2]
L49169            FOSB FBJ murine osteosarcoma viral oncogene homolog B   NS[1], UE[2]
L11238            GP5 glycoprotein V (platelet)   NS[1, 2]
M36284            GYPC glycophorin C (Gerbich blood group)   NS[1, 2]
M60750            HIST1H2BG histone cluster 1, H2bg   NS[1, 2]
X79200            Homo sapiens mRNA for SYT-SSX protein   NS[1, 2]
X92814            HRASLS3 HRAS-like suppressor 3   NS[1, 2]
M62831            IER2 immediate early response 2   NS[1, 2]
J04111            JUN* v-jun sarcoma virus 17 oncogene homolog (avian)   UE[1, 2]
Y00503            KRT19* keratin 19   NS[1, 2]
X89430            MECP2 methyl CpG binding protein 2 (Rett syndrome)   NS[1, 2]
U46499            MGST1 microsomal glutathione S-transferase 1   NS[1, 2]
M93221            MRC1 mannose receptor, C type 1   NS[1, 2]
M76732            MSX1* msh homeobox homolog 1 (Drosophila)   NS[1, 2]
S71824            NCAM1 neural cell adhesion molecule 1   OE[1], NS[2]
X70218            PPP4C protein phosphatase 4   NS[1, 2]
U02680            PTK9 protein tyrosine kinase 9   NS[1, 2]
U79291            PTPN11 protein tyrosine phosphatase, non-receptor type 11 (Noonan syndrome 1)   NS[1, 2]
U77594            RARRES2* retinoic acid receptor responder (tazarotene induced) 2   NS[1, 2]
M11433, X07438    RBP1* retinol binding protein 1, cellular   NS[1, 2]
L20859            SLC20A1 solute carrier family 20 (phosphate transporter), member 1   NS[1, 2]
M97935            STAT1 signal transducer and activator of transcription 1, 91kDa   NS[1, 2]
J04152            TACSTD2 tumor-associated calcium signal transducer 2   NS[1, 2]
X14787            THBS1 thrombospondin 1   NS[1, 2]
U24488            TNXB* tenascin XB   UE[1, 2]
Z29083            TPBG* trophoblast glycoprotein   NS[1, 2]
X51521            VIL2 villin 2 (ezrin)   UE[1], NS[2]
D87716            WDR43 WD repeat domain 43   NS[1, 2]
C. Dichotomously-expressed synapomorphic genes:
ABCB1; ADRM1; AIM1; ALDH1A3; AMDD; ARHGEF6; ARL4D; ATP5B; Atp8a2; C1QB;
CA9; CALM2; CTSB; CCRL2; CD52; CD99; CPA3; DPYD; DSG2; Emx2; FEZ2; FLNA;
FOXO1A; FYN; GAPDH; GNB3; GSTZ1; H1F0; H2-ALPHA; HBG2; Ubx, Notch1; Hox5.4;
HTR2C; ICA1; IGF2; INSR; ITGA6; ITGA9; KCNK1; KIAA0152; MAP1D; MATK; MBP;
MDM4; MFAP5; MFI2 antigen p97; MLH1; MPZ; NDUFS1; NELL2; NNAT; NOS3; NR4A1;
OASL; ODC1; OLFM1; PKN2; PON2; PRMT2; PSMC3; PTR2; RANBP2; RBMX; RDBP;
RHOG; SAFB2; SCRIB; SELP; SERPINF1; SMS; SPOCK2; ST3GAL1; THRA; TNXB;
TTLL4; TXNIP; UPK2; XA; ZNF43
Table 3. A clade of all leiomyosarcoma specimens defined by 20 synapomorphies in
comparison to normal and leiomyoma specimens. Last column reports the status of
the synapomorphies as described by Quade et al. (2004) in their significant genes
list.
A. Overexpressed synapomorphic genes:
X54942
CKS2 CDC28 protein kinase regulatory subunit 2
NS
U68566
HAX1 HCLS1 associated protein X-1
NS
L03411
X59543
RDBP RD RNA binding protein
OE
RRM1 ribonucleotide reductase M1 polypeptide
NS
B. Underexpressed synapomorphic genes:
D13639
CCND2, cyclin D2
UE
D21337
COL4A6 collagen, type IV, alpha 6
UE
HG2810-HT2921_at
L36033
HG2810-HT2921_at
AB002382
Homeotic Protein Emx2
HG2663-HT2759_at
HG2663-HT2759_at
Csh2 chorionic somatomammotropin hormone 2
[Rattus norvegicus]
CXCL12 chemokine (C-X-C motif) ligand 12 (stromal
cell-derived factor 1)
EMX2 empty spiracles homolog 2 (Drosophila).
Homeotic Protein Emx2
HOXA10 homeobox A10 Expressed in the adult
human endometrium
LOC284394 hypothetical gene supported by
NM_001331
NS
NS
NS
NS
UE
NS
U69263
MATN2 matrilin 2
UE
U85707
Meis1, myeloid ecotropic viral integration site 1
homolog (mouse)
UE
Z29678
MITF microphthalmia-associated transcription factor
UE
L35240
PDLIM7 PDZ and LIM domain 7 (enigma)
NS
D87735
RPL14 ribosomal protein L14
NS
L14076
SFRS4 splicing factor, arginine/serine-rich 4
UE
J05243
SPTAN1 spectrin, alpha, non-erythrocytic 1 (alpha-fodrin)
NS
C. Dichotomously-expressed synapomorphic gene:
M33197
GAPDH glyceraldehyde-3-phosphate dehydrogenase
NS
Table 4. A clade of all leiomyosarcoma specimens defined by 29 synapomorphies in
comparison to leiomyoma specimens only (GDS533). Last column reports the status
of the synapomorphies as described by Quade et al. (2004) in their significant genes
list.
A. Overexpressed synapomorphic genes:
X54941
CKS1B CDC28 protein kinase regulatory subunit 1B
OE
X54942
CKS2 CDC28 protein kinase regulatory subunit 2
NS
J03060
GBAP glucosidase, beta; acid, pseudogene
NS
U78027
GLA galactosidase, alpha (associated w/ Fabry’s) RPL36A
ribosomal protein L36a No 4 4922 GPX1 glutathione
peroxidase 1
NS
Y00433
GPX1 glutathione peroxidase 1
NS
U68566
HAX1 HCLS1 associated protein X-1
NS
X59543
RRM1 ribonucleotide reductase M1 polypeptide
NS
U12465
RPL35 ribosomal protein L35
OE
SLC10A2 solute carrier family 10 (sodium/bile acid cotransporter
family), member 2
B. Underexpressed synapomorphic genes:
U67674
NS
U87223
CNTNAP1 contactin associated protein 1
UE
D30655
EIF4A2 eukaryotic translation initiation factor 4A, isoform 2
UE
L20814
GRIA2 glutamate receptor, ionotropic, AMPA 2
UE
M10051
INSR insulin receptor
NS
D79999
LOC221181 hypothetical gene supported by NM_006437
NS
D14812
MORF4L2 mortality factor 4 like 2
UE
L36151
PIK4CA phosphatidylinositol 4-kinase, catalytic, alpha
polypeptide
NS
D42108
PLCL1 phospholipase C-like 1
NS
L13434
RpL41 Ribosomal protein L41
NS
HG921HT3995_at
Serine/Threonine Kinase, Receptor 2-2, Alt. Splice 3
NS
D31891
SETDB1 SET domain, bifurcated 1
UE
AB002318
Talin2
NS
U53209
TRA2A transformer-2 alpha
NS
D87292
TST thiosulfate sulfurtransferase (rhodanese)
NS
M15990
YES1 v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1
NS
C. Dichotomously expressed synapomorphic genes:
U56417
AGPAT1 1-acylglycerol-3-phosphate O-acyltransferase 1
(lysophosphatidic acid acyltransferase, alpha)
NS
M63167
AKT1 v-akt murine thymoma viral oncogene homolog 1
NS
L27560
IGFBP5 insulin-like growth factor binding protein 5
NS
U40223
P2RY4 pyrimidinergic receptor P2Y, G-protein coupled, 4
NS
D76444
RNF103 ring finger protein 103
NS
Table 5. A clade composed of all leiomyosarcoma specimens is defined in relation to
normal specimens (GDS533). Last column reports the status of the synapomorphies
as described by Quade et al. (2004) in their significant genes list.
A. Overexpressed synapomorphic genes:
S78187
CDC25B cell division cycle 25B
NS
U40343
CDKN2D cyclin-dependent kinase inhibitor 2D (p19,
inhibits CDK4)
NS
X54942
CKS2 CDC28 protein kinase regulatory subunit 2
NS
X79353
GDI1 GDP dissociation inhibitor 1
NS
H2AFX H2A histone family, member X
NS
IRF5 interferon regulatory factor 5
NS
U04209
MFAP1 microfibrillar-associated protein 1
NS
U43177
MpV17 mitochondrial inner membrane protein
NS
U19796
MRPL28 mitochondrial ribosomal protein L28
OE
X14850
U51127
POLR2L polymerase (RNA) II (DNA directed)
polypeptide L, 7.6kDa
PPGB protective protein for beta-galactosidase
(galactosialidosis)
SLC18A3 solute carrier family 18 (vesicular
acetylcholine), member 3
STIP1 stress-induced-phosphoprotein 1 (Hsp70/Hsp90-organizing protein)
U37690
M22960
U09210
M86752
M26880
UBC ubiquitin C
U43177
UCN urocortin
B. Underexpressed synapomorphic genes:
NS
NS
NS
OE
OE
NS
HG3638HT3849_s_at
ADH1A alcohol dehydrogenase 1A (class I), alpha
polypeptide
Amyloid Beta (A4) Precursor Protein, Alt. Splice 2,
A4(751)
L28997
ARL1 ADP-ribosylation factor-like 1
NS
Z49269
CCL14 chemokine (C-C motif) ligand 14
UE
M92934
CTGF connective tissue growth factor
UE
M74099
CUTL1 cut-like 1, CCAAT displacement protein
(Drosophila)
NS
M12963
UE
NS
M96859
DPP6 dipeptidyl-peptidase 6
UE
U94855
EIF3S5 eukaryotic translation initiation factor 3, subunit 5
epsilon, 47kDa
NS
L25878
EPHX1 epoxide hydrolase 1, microsomal (xenobiotic)
NS
U60061U69140
FEZ2 fasciculation and elongation protein zeta 2 (zygin II)
NS
X67491
GLUDP5 glutamate dehydrogenase pseudogene 5
NS
HG4334HT4604_s_at
Glycogenin
NS
X53296
IL1RN interleukin 1 receptor antagonist
NS
X55740
NT5E 5'-nucleotidase, ecto (CD73)
UE
PCBP2 poly(rC) binding protein 2
UE
X78136
PHLDA1 pleckstrin homology-like domain, family A,
member 1
PPP2R1A protein phosphatase 2 (formerly 2A), regulatory
subunit A (PR 65), alpha isoform
PPP2CB protein phosphatase 2, catalytic subunit, beta
isoform
NS
U25988
PSG11 pregnancy specific beta-1-glycoprotein 11
NS
M98539
PTGDS prostaglandin D2 synthase 21kDa (brain)
UE
X54131
PTPRB protein tyrosine phosphatase, receptor type, B
NS
M12174
RHOB ras homolog gene family, member B
NS
HG1879HT1919
RHOQ ras homolog gene family, member Q
M33493
TPSB2 tryptase beta 2
L14837
TJP1 tight junction protein 1 (zona occludens 1)
UE
HG3344HT3521_at
UBE2D1 ubiquitin-conjugating enzyme E2D 1 (UBC4/5
homolog, yeast)
NS
X98534
VASP vasodilator-stimulated phosphoprotein
NS
X51630
WT1 Wilms tumor 1
UE
HG3426HT3610_s_at
Zinc Finger Protein Hzf-16, Kruppel-Like, Alt. Splice 1
NS
M92843
ZFP36 zinc finger protein 36, C3H type, homolog (mouse)
UE
Z50194
J02902
J03805
NS
NS
NS
NS
C. Dichotomously-expressed synapomorphic genes:
U80226
ABAT 4-aminobutyrate aminotransferase
NS
M14758
ABCB1 ATP-binding cassette, sub-family B (MDR/TAP),
member 1
NS
M95178
ACTN1 actinin, alpha 1
NS
U76421
ADARB1 adenosine deaminase, RNA-specific, B1 (RED1
homolog rat)
NS
U46689
ALDH3A2 aldehyde dehydrogenase 3 family, member A2
NS
L34820
ALDH5A1 aldehyde dehydrogenase 5 family, member A1
(succinate-semialdehyde dehydrogenase)
NS
M84332
ARF1 ADP-ribosylation factor 1
NS
D14710
ATP5A1 ATP synthase, H+ transporting, mitochondrial F1
NS
complex, alpha subunit 1, cardiac muscle
X84213
U23070
M33518
BAK1 BCL2-antagonist/killer 1
NS
BAMBI BMP and activin membrane-bound inhibitor
homolog (Xenopus laevis)
NS
BAT2 HLA-B associated transcript 2
NS
X61123
BTG1 B-cell translocation gene 1, anti-proliferative
NS
S60415
CACNB2 calcium channel, voltage-dependent, beta 2
subunit
NS
M19878
CALB1 calbindin 1, 28kDa
NS
L76380
CALCRL calcitonin receptor-like
NS
M21121
CCL5 chemokine (C-C motif) ligand 5
NS
D14664
CD302 CD302 molecule
NS
X72964
CETN2 centrin, EF-hand protein, 2
NS
U66468
CGREF1 cell growth regulator with EF-hand domain 1
NS
M63379
CLU clusterin
NS
X52022
COL6A3 collagen, type VI, alpha 3
L25286
COL15A1 collagen, type XV, alpha 1
NS
S45630
CRYAB crystallin, alpha B
NS
X95325
CSDA cold shock domain protein A
NS
U03100
CTNNA1 catenin (cadherin-associated protein), alpha 1,
102kDa
NS
X52142
CTPS CTP synthase
NS
D38549
CYFIP1 cytoplasmic FMR1 interacting protein 1
NS
X64229
DEK DEK oncogene (DNA binding)
NS
UE
M63391
DES desmin
UE
Z34918
EIF4G3 eukaryotic translation initiation factor 4 gamma, 3
NS
U97018
EML1 echinoderm microtubule associated protein like 1
NS
U12255
FCGRT Fc fragment of IgG, receptor, transporter, alpha
NS
U36922
FOXO1A forkhead box O1A (rhabdomyosarcoma)
NS
U91903
FRZB frizzled-related protein
NS
M33197
GAPDH glyceraldehyde-3-phosphate dehydrogenase
NS
U09587
GARS glycyl-tRNA synthetase
NS
U66075
D13988
U31176
GATA6 GATA binding protein 6
NS
GDI2 GDP dissociation inhibitor 2
NS
GFER growth factor, augmenter of liver regeneration
(ERV1 homolog, S. cerevisiae)
NS
U28811
GLG1 golgi apparatus protein 1
NS
U66578
GPR23 G protein-coupled receptor 23
NS
L40027
GSK3A glycogen synthase kinase 3 alpha
NS
U77948
GTF2I general transcription factor II, i
UE
Z29481
HAAO 3-hydroxyanthranilate 3,4-dioxygenase
NS
D16480
HADHA hydroxyacyl-Coenzyme A dehydrogenase/3-ketoacyl-Coenzyme A thiolase/enoyl-Coenzyme A
hydratase (trifunctional protein), alpha subunit
NS
U50079
HDAC1 histone deacetylase 1
NS
U50078
HERC1 hect (homologous to the E6-AP (UBE3A)
carboxyl terminus) domain and RCC1 (CHC1)-like
domain (RLD) 1
NS
M95623
HMBS hydroxymethylbilane synthase
NS
X79536
HNRPA1 heterogeneous nuclear ribonucleoprotein A1
NS
L15189
HSPA9B heat shock 70kDa protein 9B (mortalin-2)
NS
U05875
IFNGR2 interferon gamma receptor 2 (interferon gamma
transducer 1)
NS
X57025
IGF1 insulin-like growth factor 1 (somatomedin C)
UE
HG3543HT3739_at
IGF2 insulin-like growth factor 2 (somatomedin A)
NS
U40282
ILK integrin-linked kinase
NS
X74295
ITGA7 integrin, alpha 7
NS
X57206
ITPKB inositol 1,4,5-trisphosphate 3-kinase B
NS
AB002365
KIAA0367
UE
J00124
KRT14 keratin 14 (epidermolysis bullosa simplex,
Dowling-Meara, Koebner)
NS
X05153
LALBA lactalbumin, alpha-
NS
X02152
LDHA lactate dehydrogenase A
NS
HG3527HT3721_f_at
LHB luteinizing hormone beta polypeptide
NS
X86018
LRRC41 leucine rich repeat containing 41
NS
MFAP4 microfibrillar-associated protein 4
NS
MIA3 melanoma inhibitory activity family, member 3
NS
MSN moesin
NS
L38486
D87742
M69066
AB003177
mRNA for proteasome subunit p27
NS
U47742
MYST3 MYST histone acetyltransferase (monocytic
leukemia) 3
NS
M30269
NID1 nidogen 1
NS
M10901
NKX3-1 NK3 transcription factor related, locus 1
(Drosophila)
NR3C1 nuclear receptor subfamily 3, group C, member 1
(glucocorticoid receptor)
U80669
NS
NS
M16801
NR3C2 nuclear receptor subfamily 3, group C, member 2
NS
U52969
PCP4 Purkinje cell protein 4
UE
J03278
PDGFRB platelet-derived growth factor receptor, beta
polypeptide
NS
D37965
PDGFRL platelet-derived growth factor receptor-like
NS
Z49835
PDIA3 protein disulfide isomerase family A, member 3
NS
U78524
PIAS1 protein inhibitor of activated STAT, 1
NS
U60644
PLD3 phospholipase D family, member 3
NS
D11428
PMP22 peripheral myelin protein 22
NS
U79294
PPAP2B phosphatidic acid phosphatase type 2B
NS
S71018
PPIC peptidylprolyl isomerase C (cyclophilin C)
NS
X07767
PRKACA protein kinase, cAMP-dependent, catalytic,
alpha
NS
X83416
PRNP prion protein (p27-30)
NS
M55671
PROZ protein Z, vitamin K-dependent plasma
glycoprotein
NS
U72066
RBBP8 retinoblastoma binding protein 8
NS
L25081
RHOC ras homolog gene family, member C
NS
U40369
SAT1 spermidine/spermine N1-acetyltransferase 1
NS
M97287
SATB1 special AT-rich sequence binding protein 1 (binds
to nuclear matrix/scaffold-associating DNA's)
NS
U83463
SDCBP syndecan binding protein (syntenin)
NS
U28369
SEMA3B sema domain, immunoglobulin domain (Ig),
short basic domain, secreted, (semaphorin) 3B
NS
SFTPA2 surfactant, pulmonary-associated protein A2
NS
HG3925HT4195_at
L31801
SLC16A1 solute carrier family 16, member 1
(monocarboxylic acid transporter 1)
SLC2A4 solute carrier family 2 (facilitated glucose
transporter), member 4
SMARCD1 SWI/SNF related, matrix associated, actin
dependent regulator of chromatin, subfamily d, member 1
NS
NS
U50383
SMYD5 SMYD family member 5
NS
D43636
SNRK SNF related kinase
NS
D87465
SPOCK2 sparc/osteonectin, cwcv and kazal-like domains
proteoglycan (testican) 2
NS
M61199
SSFA2 sperm specific antigen 2
NS
U15131
ST5 suppression of tumorigenicity 5
U95006
STRA13 stimulated by retinoic acid 13 homolog (mouse)
NS
M74719
TCF4 transcription factor 4
NS
X14253
TDGF1 teratocarcinoma-derived growth factor 1
NS
U52830
TERT telomerase reverse transcriptase
NS
U12471
THBS1 thrombospondin 1
NS
U16296
TIAM1 T-cell lymphoma invasion and metastasis 1
NS
L01042
TMF1 TATA element modulatory factor 1
NS
U03397
TNFRSF9 tumor necrosis factor receptor superfamily,
member 9
NS
X05276
TPM4 tropomyosin 4
UE
M91463
U66617
NS
NS
HG4683HT5108_s_at
TRAF2 TNF receptor-associated factor 2
NS
U64444
UFD1L ubiquitin fusion degradation 1 like (yeast)
NS
U39318
UBE2D3 ubiquitin-conjugating enzyme E2D 3 (UBC4/5
homolog, yeast)
NS
X59739
ZFX zinc finger protein, X-linked
NS
Table 6. A list of 34 synapomorphies defining a clade composed of all gastric cancer
specimens (GDS1210). Synapomorphies include: 8 OE genes, 24 UE genes, and 2
DE genes in comparison with the normal specimens. Last column reports the status
of the synapomorphies as described by Hippo et al. (2002). Yes = listed; No = not
listed.
A. Overexpressed synapomorphic genes:
X81817 | BAP31 mRNA | No
D50914 | BOP1 block of proliferation 1 | No
X54667 | CST4 cystatin S MGC71923 | Yes
L17131 | HMGA1 high mobility group AT-hook 1 | No
D63874 | HMGB1 high-mobility group box 1 | No
D26600 | PSMB4 proteasome (prosome, macropain) subunit, beta type, 4 | No
U36759 | PTCRA pre T-cell antigen receptor alpha PT-ALPHA, PTA | No
X89750 | TGIF TGFB-induced factor (TALE family homeobox) | No
B. Underexpressed synapomorphic genes:
X76342 | ADH7 alcohol dehydrogenase 7 (class IV), mu or sigma polypeptide ADH-4 | No
M63962 | ATP4A ATPase, H+/K+ exchanging, alpha polypeptide ATP6A | No
M75110 | ATP4B ATPase, H+/K+ exchanging, beta polypeptide ATP6B | No
J05401 | CKMT2 creatine kinase, mitochondrial 2 (sarcomeric) | No
L38025 | CNTFR ciliary neurotrophic factor receptor | No
M61855 | CYP2C9 cytochrome P450, family 2, subfamily C, polypeptide 9 CPC9 | No
D63479 | DGKD diacylglycerol kinase, delta 130kDa DGKdelta, KIAA0145, dgkd-2 | No
X99101 | ESR2 estrogen receptor 2 (ER beta) | No
U21931 | FBP1 fructose-1,6-bisphosphatase 1 | No
HG3432HT3618_at | Fibroblast Growth Factor Receptor K-Sam, Alt. Splice 1 | No
M31328 | GNB3 guanine nucleotide binding protein (G protein), beta polypeptide 3 | No
D42047 | GPD1L glycerol-3-phosphate dehydrogenase 1-like | No
M62628 | Human alpha-1 Ig germline C-region membrane-coding region, 3' end | No
D29675 | Human inducible nitric oxide synthase gene, promoter and exon 1 | No
M63154 | Human intrinsic factor mRNA | No
Z29074 | KRT9 keratin 9 (epidermolytic palmoplantar keratoderma) EPPK, K9 | No
X05997 | LIPF lipase, gastric | No
U50136 | LTC4S leukotriene C4 synthase MGC33147 | No
X76223 | MAL mal, T-cell differentiation protein | No
U19948 | PDIA2 protein disulfide isomerase family A, member 2 | No
L07592 | PPARD peroxisome proliferative activated receptor, delta | No
U57094 | RAB27A, member RAS oncogene family | No
AC002077 | SLC38A3 solute carrier family 38, member 3 | No
Z29574 | TNFRSF17 tumor necrosis factor receptor superfamily, member 17 | No
C. Dichotomously-expressed synapomorphic genes:
D00408 | CYP3A7 cytochrome P450, family 3, subfamily A, polypeptide 7 CP37, P450HFLA | No
U29091 | SELENBP1 selenium binding protein 1 | No
Table 7. Interplatform concordance. A list of overlapping identical (22) and
homologous (23) synapomorphic genes in leiomyoma specimens of GDS484 &
GDS533. These include: 9 OE, 24 UE, and 12 DE.
GDS533
GDS484
A. Overexpressed synapomorphic
genes
a. Identical Synapomorphies
DDB2
DDB2
FUT8
FUT8
MEST
MEST
TMSL8
TMSL8
TYMS
TYMS
FOSB
JUNB
PPP4C
SLC20A1
THBS1
WDR43
b. Homologous synapomorphies
a. Identical synapomorphies
CACNB3
COL5A2
KIAA0367
PRKAR1B
CTSB
CACNA1C
COL4A5
KIAA0101
PRKACB
CTSB
b. Homologous synapomorphies
ARL4D
FOXO1A
GNB3
ITGA6
ITGA9
KCNK1
MFAP5
PSMC3
SELP
TXNIP
ZNF43
B. Underexpressed synapomorphic
genes
a. Identical synapomorphies
ALDH1A1
ALDH1A1
ALDH2
ALDH2
ATF3
ATF3
CEBPD
CEBPD
CXADR
CXADR
CYR61
CYR61
DUSP1
DUSP1
FOS
FOS
HRASLS3
HRASLS3
IER2
IER2
JUN
JUN
KRT19
KRT19
RARRES2
RARRES2
TACSTD2
TACSTD2
TNXB
TNXB
VIL2
VIL2
b. Homologous synapomorphies
CASP9
CASP4
CYBA
CYB5R1
FOS
JUN
PPP1R10
SLC18A2
THBD
WDR37
C. Dichotomously-expressed
synapomorphic Genes
ARL4C
FOXJ3
GNB1L
ITGA2B
ITGA2B
KCNJ5
MFAP4
PSMC2
SELL
TXNDC13
ZNF259P
Table 8. Summary of concordance results between GDS484 and GDS533. The
comparisons were carried out in various combinations: statistical v. statistical,
phylogenetic v. statistical, and phylogenetic v. phylogenetic.
GDS533 (Fibroids and Leiomyosarcomas)
Quade et al. Abu-Asab et al.
Abu-Asab et al.
Gene List
Synapomorphies
Synapomorphies for
for Fibroids (146) Fibroids and
Leiomyosarcoma
(32)
Concordance
Hoffman et al.
Gene List
GDS484 Abu-Asab et al.
(Fibroids) Synapomorphies
for Fibroids
(1485)
GDS533 Quade et al. Gene
List
12%
18%
19.3%
20%
31%
48%
16.5%
45%
Table 9. Interplatform comparability. A list of 16 synapomorphies defining a clade
composed of all gastric cancer (GDS1210) as well as uterine sarcoma and leiomyoma
specimens (GDS533).
ID
Gene
U52522
ARFIP2 ADP-ribosylation factor interacting protein 2 (arfaptin
2)
U51478
ATP1B3 ATPase, Na+/K+ transporting, beta 3 polypeptide
X66839
CA9 carbonic anhydrase IX
M60974
X01677, M33197
GADD45A growth arrest and DNA-damage-inducible, alpha
GAPDH glyceraldehyde-3-phosphate dehydrogenase [two
readings]
X14850
H2AFX H2A histone family, member X
U52830
Homo sapiens Cri-du-chat region mRNA, clone CSC8
U25138
KCNMB1 potassium large conductance calcium-activated
channel, subfamily M, beta member 1
D21063
MCM2 minichromosome maintenance deficient 2
L38486
MFAP4 microfibrillar-associated protein 4
D87463
PHYHIP phytanoyl-CoA 2-hydroxylase interacting protein
X02419
PLAU plasminogen activator, urokinase
L48513
PON2 paraoxonase 2
U29091
SELENBP1 selenium binding protein 1
Z19083
TPBG trophoblast glycoprotein
M25077
TROVE2 TROVE domain family, member 2
Table 10. Summary of the characteristics of a parsimonious phylogenetic analysis
through polarity assessment of gene-expression values followed by a maximum
parsimony analysis.
Offers a qualitative assessment of microarray gene-expression data; uses only shared derived states (synapomorphies) as the basis of similarity between specimens.
Efficiently models the heterogeneous expression profiles of diseased specimens, particularly those with a fast mutation rate, such as cancer.
Incorporates gene-expression values that violate a normal distribution within a set of specimens (e.g., dichotomously expressed genes).
Identifies synapomorphies and uses them to delineate clades (class discovery). Synapomorphies are also the potential biomarkers.
Reduces experimental noise.
Permits pooling of multiple experiments.
Allows intra- and intercomparability of data.
Produces higher concordance between gene lists than statistical methods (F- and t-statistics and fold change).
Offers a non-parametric, data-based (not specimen-based) gene listing and gene linkage.
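The synapomorphy categories used in the tables above (overexpressed, underexpressed, and dichotomously expressed) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a gene counts as derived when its value falls outside the range spanned by the outgroup specimens. The function names and toy values are hypothetical and are not taken from the authors' software.

```python
from typing import Dict, List, Optional


def classify_synapomorphy(outgroup: List[float], clade: List[float]) -> Optional[str]:
    """Label one gene as 'OE', 'UE', 'DE', or None (not a synapomorphy).

    Assumption: a value is derived when it lies outside the outgroup range.
    """
    low, high = min(outgroup), max(outgroup)
    above = [value > high for value in clade]
    below = [value < low for value in clade]
    if all(above):
        return "OE"  # derived (overexpressed) in every clade specimen
    if all(below):
        return "UE"  # derived (underexpressed) in every clade specimen
    if all(a or b for a, b in zip(above, below)):
        return "DE"  # derived in every specimen, but in mixed directions
    return None      # at least one specimen keeps the ancestral state


def synapomorphies(outgroup_expr: Dict[str, List[float]],
                   clade_expr: Dict[str, List[float]]) -> Dict[str, str]:
    """Collect the genes that are derived in all specimens of the clade."""
    found = {}
    for gene, values in clade_expr.items():
        label = classify_synapomorphy(outgroup_expr[gene], values)
        if label is not None:
            found[gene] = label
    return found


# Toy expression values (hypothetical), three specimens per group.
outgroup_expr = {
    "geneA": [1.0, 1.2, 0.9],
    "geneB": [5.0, 5.5, 6.0],
    "geneC": [2.0, 2.5, 3.0],
    "geneD": [4.0, 4.2, 4.4],
}
clade_expr = {
    "geneA": [2.0, 2.4, 3.1],
    "geneB": [1.0, 2.0, 4.9],
    "geneC": [1.0, 3.5, 0.5],
    "geneD": [4.1, 5.0, 3.0],
}
found = synapomorphies(outgroup_expr, clade_expr)
print(found)
```

In this toy data, geneD is excluded because one clade specimen stays within the outgroup range, so the derived state is not shared by the whole clade.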
NIH Public Access
Author Manuscript
J Proteome Res. Author manuscript; available in PMC 2008 March 20.
NIH-PA Author Manuscript
Published in final edited form as:
J Proteome Res. 2006 September; 5(9): 2236–2240.
Phyloproteomics: What Phylogenetic Analysis Reveals about
Serum Proteomics
Mones Abu-Asab*,†, Mohamed Chaouchi‡, and Hakima Amri§
Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda,
Maryland, National Oceanic and Atmospheric Administration, National Ocean Service, CO-OPS/
Information Systems Division, Silver Spring, Maryland, and Department of Physiology and
Biophysics, School of Medicine, Georgetown University, Washington, D.C.
Abstract
Phyloproteomics is a novel analytical tool that solves the issue of comparability between proteomic
analyses, utilizes a total spectrum-parsing algorithm, and produces biologically meaningful
classification of specimens. Phyloproteomics employs two algorithms: a new parsing algorithm
(UNIPAL) and a phylogenetic algorithm (MIX). By outgroup comparison, the parsing algorithm
identifies novel or vanished MS peaks and peaks signifying up- or downregulated proteins and scores
them as derived or ancestral. The phylogenetic algorithm uses the latter scores to produce a
biologically meaningful classification of the specimens.
Keywords
Cancer; dichotomous development; mass spectrometry; phylogenetics; phyloproteomics;
proteomics; serum; transitional clades
Introduction
The utilization of the serum proteome to accurately diagnose cancer has been challenging, and
its future continues to be surrounded by uncertainties.1 Although statistical analysis of mass
spectrometry (MS) profiles of serum proteins has gained enormous popularity and credibility,2-6
algorithmic analysis that produces biologically meaningful results with possible clinical
diagnosis is still lacking. It now seems very simplistic to attempt to define cancer on the basis
of statistical patterns, since cancer is a multifaceted evolving and adapting cellular condition
with multiple proteomic profiles; some of these profiles cannot always be separated from
noncancerous ones by narrowly defined statistical proteomic patterns on the basis of a limited
number of spectral peaks. Cancer's incipience is marked by mutations that cause the
malfunction of the apoptotic apparatus of the cell, and its promotion is characterized by
different phases with each having its distinct proteomic profile.7,8 Advanced progression of
cancer is marked by cellular dedifferentiation, loss of apoptosis, and metamorphosis into a
primordial status where survival, and not function, becomes the cell's primary mission.8 In this
latter stage, many proteins responsible for differentiation are not produced, and therefore
missing MS peaks are as significant as novel ones in defining the proteomic profiles of cancer.
* To whom correspondence should be addressed. [email protected]..
†National Institutes of Health.
‡National Oceanic and Atmospheric Administration.
§Georgetown University.
The multiphasic nature of cancer progression, combined with possible multiple developmental
pathways,8-11 entails the presence of a large number of proteomic changes for each type of
cancer and its phases. These factors suggest that the proteomic profile of a cancer type is a
hierarchical and continuous accumulation of proteomic change over time rather than one or a
few simple distinct proteomic patterns. For an analytical tool to be successful in producing a
clinical diagnosis, it has to uncover the hierarchical profile of cancer and be able to place a
specimen within this profile.
In the present study, we propose that cancer can be promptly diagnosed, even at early stages,
by phylogenetic analysis of the serum proteome. Because cancer is an evolutionary condition that
involves genetic modifications and clonal production, it requires an evolutionary
method of analysis. Such an analysis is possible if an algorithm for sorting out the polarity
(derived vs ancestral) of the MS values is available. We demonstrate here through our
polarity assessment algorithm (UNIPAL) that this task can be performed, and MS data can be
analyzed with an evolutionary algorithm (Figure 1). Phyloproteomics is an evolutionary
analytical tool that sorts out mass-to-charge (m/z) values into derived (apomorphic) or ancestral
(plesiomorphic) and then classifies specimens according to the distribution pattern of their
apomorphies into clades (a group composed of all the specimens sharing the same
apomorphies). Phyloproteomics also illustrates the multiphasic nature of cancer by assigning
cancer specimens to a hierarchical classification with each hierarchy defined by the apomorphic
protein changes that are present in its specimens. The classification is presented in a graphical
display termed a cladogram or tree. The assumption that all cancerous specimens fit into
well-defined proteomic models (patterns based on a few peaks) that distinguish them from
noncancerous ones12-16 is replaced here by phylogenetically distinct clades of specimens with
each clade sharing unique protein changes (synapomorphies) among its specimens.
Methods
Proteomic Data
We used mass spectrometry (MS) data of serum proteins generated by surface-enhanced laser
desorption–ionization time-of-flight (SELDI-TOF) of 460 specimens from three types of
cancer: ovarian (143), pancreatic (70), and prostate (36), as well as from noncancerous
specimens (211). All sets of data used here are available from the NCI–FDA Clinical
Proteomics Program (http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp) and are
described and referred to in a few publications.12,13,15,17,18 From the prostate cancer data
set, we included only the confirmed cancerous specimens.
Polarity Assessment and Phylogenetic Analysis
We employed the continuous range of mass-to-charge ratio (m/z) values of all specimens for
the analysis. For polarity assessment (apomorphic [or derived] vs plesiomorphic [or ancestral]),
data were polarized with a customized algorithm (UNIPAL) written by the authors that
recognized novel and vanished MS peaks, as well as peaks signifying upregulated and
downregulated proteins for each specimen. Each of these events was coded as equal; however,
no standardization, normalization, or smoothing of the data was applied before or after polarity
assessment—UNIPAL does not require any of these processes to carry out the polarization.
Outgroups used to carry out polarity for each cancer type were selected from the noncancerous
specimens; each outgroup encompassed the total variability within the noncancerous
specimens.
UNIPAL requires a set of noncancerous specimens to be included in every separate data set in
order to be used as an outgroup. It determines the polarity for every m/z value among the
noncancerous specimens and then scores each value of the study group as derived or ancestral.
The outgroup should be large enough to encompass all possible variations that exist within
noncancerous specimens.
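The polarity step described above can be sketched as follows. This is a minimal illustration, not UNIPAL itself: it assumes that an intensity at a given m/z position is scored as derived (1) when it falls outside the minimum-to-maximum range observed in the outgroup, and as ancestral (0) otherwise. The function name and the toy intensities are hypothetical.

```python
from typing import List


def polarize(outgroup: List[List[float]], study: List[List[float]]) -> List[List[int]]:
    """Score each study value as derived (1) or ancestral (0).

    Rows are specimens and columns are m/z positions. Assumption: the
    ancestral state is any value inside the outgroup's observed range.
    """
    n_positions = len(outgroup[0])
    lows = [min(spec[j] for spec in outgroup) for j in range(n_positions)]
    highs = [max(spec[j] for spec in outgroup) for j in range(n_positions)]
    return [
        [0 if lows[j] <= spec[j] <= highs[j] else 1 for j in range(n_positions)]
        for spec in study
    ]


# Three noncancerous specimens form the outgroup (toy intensities).
outgroup = [
    [10.0, 0.0, 5.0],
    [12.0, 0.0, 6.0],
    [11.0, 0.0, 5.5],
]
# Two study specimens: the first gains a peak absent from the outgroup,
# the second shows one raised and one lowered intensity.
study = [
    [11.0, 3.0, 5.2],
    [20.0, 0.0, 1.0],
]
matrix = polarize(outgroup, study)
print(matrix)
```

Under this range-based assumption, a small outgroup yields a narrow ancestral range, so more normal variation is scored as derived. That is consistent with the requirement that the outgroup encompass the variation within noncancerous specimens.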
For phylogenetic analysis, we used MIX, the parsimony program of PHYLIP version 3.57c,19
to carry out a separate phylogenetic parsimony analysis for each cancer type and then pooled
all the specimens of the three cancer types plus the noncancerous in a larger analysis that
included all 460 specimens. Processing with MIX was carried out with both randomized and
nonrandomized inputs; no significant differences were observed between the two.
Phylogenetic trees were drawn using TreeView.20
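The coded values can be handed to MIX as a discrete-character matrix. The sketch below writes such a matrix in the PHYLIP input layout: a header line with the number of specimens and characters, then one row per specimen with the name padded to ten characters. The function name and specimen labels are hypothetical, and the options chosen when running MIX are not shown.

```python
from typing import List, Sequence


def phylip_discrete(names: Sequence[str], rows: Sequence[Sequence[int]]) -> str:
    """Return a PHYLIP-style discrete-character matrix as text.

    Names are truncated or padded to exactly ten characters, as the
    PHYLIP format expects; characters are written as 0/1 without spaces.
    """
    if len(names) != len(rows):
        raise ValueError("one name is needed per row")
    n_chars = len(rows[0])
    if any(len(r) != n_chars for r in rows):
        raise ValueError("all rows must have the same number of characters")
    lines: List[str] = [f"{len(rows):>5} {n_chars:>5}"]
    for name, row in zip(names, rows):
        lines.append(name[:10].ljust(10) + "".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical specimens with three coded m/z positions each.
infile_text = phylip_discrete(["normal_01", "cancer_01"], [[0, 0, 0], [0, 1, 1]])
print(infile_text)
```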
Results and Discussion
The results of a phylogenetic analysis are best illustrated by a phylogenetic tree, termed a
cladogram, that shows the hierarchical classification in a graphical format. Parsimony analysis
produced one most parsimonious cladogram (requiring the least number of steps in constructing
a classification of specimens) for each of the pancreatic and prostate specimens (Figure 2a,b),
five equally parsimonious cladograms for ovarian specimens (Figure 2c shows only one), and
about 100 equally parsimonious cladograms for the inclusive analysis (Figure 3 summarizes
only one). We examined all multiple equally parsimonious cladograms and found them to be
fundamentally very similar in topology. They differed only in the internal arrangement of some
minor branches where one or two specimens had equally plausible locations within their
immediate clade.
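To make "the least number of steps" concrete, the sketch below counts the state changes that a given tree requires for a set of 0/1 characters, using Fitch's counting rule for two-state characters. It only scores trees that are supplied to it; the tree search and the exact criteria applied by MIX are not reproduced, and the specimen names and trees are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, int]) -> int:
    """Minimum number of state changes for one character on a binary tree."""

    def visit(node: Tree):
        if isinstance(node, str):          # leaf: its observed state, no steps
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                         # children agree: no extra step
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Total steps over all characters; the shortest tree is preferred."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {name: row[c] for name, row in matrix.items()})
        for c in range(n_chars)
    )


# Hypothetical coded specimens: two noncancerous (N) and two cancerous (C).
matrix = {"N1": [0, 0, 0], "N2": [0, 0, 0], "C1": [1, 1, 0], "C2": [1, 1, 1]}
tree_a = (("N1", "N2"), ("C1", "C2"))   # cancers grouped together
tree_b = (("N1", "C1"), ("N2", "C2"))   # cancers split apart
print(tree_length(tree_a, matrix), tree_length(tree_b, matrix))
```

In this toy case the tree that groups the two cancerous specimens needs fewer steps, so parsimony prefers it.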
A complete separation of the cancer specimens from noncancerous ones depended on the size
of the noncancerous outgroup used to carry out polarity assessment. Polarizing the m/z values
with the largest size outgroups (ones encompassing the largest amount of variation) available
for each cancer type produced cladograms with separate groupings of cancerous and
noncancerous specimens, that is, no cancer specimens grouped with the healthy and vice versa
(100% sensitivity and specificity). However, with the use of randomly selected smaller
outgroups, sensitivity dropped to 96% and below; this illustrates the significance of using the
largest number possible for outgroup polarity assessment.
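Sensitivity and specificity here follow from whether each specimen groups with the cancer or the noncancerous clades. A minimal sketch of that calculation follows; the counts are hypothetical and chosen only to show how a single misplaced cancer specimen out of 25 yields 96% sensitivity.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    is_cancer: Sequence[bool], in_cancer_clade: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(is_cancer, in_cancer_clade))
    tp = sum(1 for truth, placed in pairs if truth and placed)
    fn = sum(1 for truth, placed in pairs if truth and not placed)
    tn = sum(1 for truth, placed in pairs if not truth and not placed)
    fp = sum(1 for truth, placed in pairs if not truth and placed)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy counts (not study data): 25 cancer, 20 noncancerous.
is_cancer = [True] * 25 + [False] * 20
in_cancer_clade = [True] * 24 + [False] * 1 + [False] * 20
sens, spec = sensitivity_specificity(is_cancer, in_cancer_clade)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```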
Each of the cladograms (Figure 2a–c) showed an upper bifurcation composed of cancerous
specimens, while the lower end of the cladogram was occupied by a number of basal clades
composed of noncancerous specimens and a central assemblage of noncancerous clades
adjacent to cancerous ones. The latter assembly formed a distinct order of well-resolved and
mostly single-specimen clades in the middle of the cladogram nested between the cancer and
healthy clades (bracketed arrows in Figure 2a–c); we termed them transitional clades (TC).
The transitional clades bordered their respective types (cancer or noncancer) in a tandem
arrangement that formed a transitional zone (TZ) between the noncancer and cancer clades.
When data of all specimens of the three cancer types were pooled together with noncancerous
ones and processed, each of the three cancers formed two large clades (the terminal and middle)
and numerous small transitional clades adjacent to the noncancerous ones (Figure 3). The
pancreatic and prostate clades formed sister groups in their terminal and middle clades, and
their terminal clades were nested within the ovarian clades. The ovarian specimens formed two
distinct clades at the upper part of the cladogram.
The cladograms revealed greater similarities in topology among cancer types. For each of the
three cancer types, there were two large recognizable clades (the terminal and the middle)
forming a major dichotomy that encompassed the majority of the specimens of each type
(Figure 2a–c). This dichotomy persisted in the inclusive cladogram as well (Figure 3), with
each of the cancers having two clades.
The use of mass spectrometry (MS) of serum proteins to produce clinically useful profiles has
proved to be challenging and has generated some controversy.21-23 Although several methods
have been published thus far,13-16 they all either had cancer type-specific sorting algorithms
that produced below 95% specificity and did not apply well across other cancer types, did not
utilize all potentially useful variability within the data, or were not widely tested.16,24
Furthermore, their relative success has been limited to diagnosis without any of the predictive
conclusions potentially offered by phyloproteomics. Since cancer is an evolutionary condition
produced by a set of mutations,7 its study should include evolutionarily sound methods of
analysis. Phylogenetics reveals both relatedness and diversity through a hypothesis of
relationships among the specimens on the basis of the parsimonious distribution of novel m/z
values of their proteomes.
This is the first report on the application of a phylogenetic algorithm to MS serum proteomic
data for cancer analysis. By developing and applying an algorithm for polarity assessment and
then using a parsimony phylogenetic algorithm for classifying specimens of three cancer types
(ovarian, pancreatic, and prostate), we demonstrated that phylogenetics can successfully be
applied to MS serum proteomic data for cancer analysis, diagnosis, typing, and possibly
susceptibility assessment. Additionally, phyloproteomics points out the presence of distinct
trends within cancer proteomic profiles.
Despite the good number of algorithms used for MS serum analysis,13-16 reproducibility and
comparability of proteomic analyses are unattainable because of the lack of broadly acceptable
universal methods of analysis. Phyloproteomics is composed of two algorithms that are
applicable to MS data of any cancer (Figure 1). The first algorithm, UNIPAL, is a new polarity
assessment program that we designed to work with MS data to produce a listing of novel derived
values in a coded format, and the second algorithm is a popular phylogenetic parsimony
program, MIX of the PHYLIP package,19 that uses the values generated by the first algorithm
to classify the specimens. MIX is a robust analytical package that has been tested by scientists
for the past 16 years, and is probably the most cited in phylogenetic studies. An added benefit
to this approach is that it makes possible the comparison among results from different data sets
and the evaluation of competing analytical tools.
Phylogenetics has the intrinsic ability to reveal meaningful biological patterns by grouping
together truly related specimens better than any other known methods (Table 1). Proteomic
variability encompasses ancestral and derived variations, and only derived m/z intensity values
are useful in classifying cancer types and subtypes into a meaningful hierarchy that reflects the
phylogeny and ontogeny of their proteomic profiles. While clustering techniques use the
presence of common peaks (without resolving their polarity) in order to create distinct patterns
and then fit a specimen within a pattern,12,14,16 phylogenetics requires polarity assessment
to sort out m/z intensities into derived and ancestral at first and then uses the distribution pattern
of derived values among the specimens to produce their classification (i.e., the cladogram).
Using only common intensity peaks without polarity assessment for pattern modelling has not
been the most reliable means of classification.12,14 This is because clustering usually involves
ancestral values and does not resolve multiple origins of a character (parallelisms), and both
result in polyphyletic grouping (having unrelated specimens). Furthermore, phylogenetics can
resolve the position of a novel specimen with new variations by placing it in a group that
comprises its closest relatives on the basis of the number of apomorphic mutations it shares
with them (Table 1).
Phyloproteomics has a potential for cancer predictivity. Predictivity here is defined as the
capacity of the classification to predict the characteristics of a specimen by determining the
specimen's location within a cladogram. By using an ample number of well-characterized
cancer specimens in an analysis, the unknown characters of a new specimen will be forecasted
when it assembles within a clade in the cladogram. The specimen's location in a cladogram is
always based on the type of mutations it carries and shares with the clade members, which will
determine the diagnosis, cancer type, or possibly the susceptibility to developing cancer.
Cladogram topology shows a hierarchical accumulation of novel serum protein changes across
a continuum spanning from the transitional noncancerous specimens to the cancerous ones,
with the latter having the highest number of apomorphic mutations.
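The idea of placing a new specimen by the apomorphies it shares with clade members can be sketched as follows. This is a simplified nearest-clade shortcut and an assumption of ours for illustration; in a full analysis the new specimen would be added to the matrix and the parsimony analysis rerun. The clade names and coded profiles are hypothetical.

```python
from typing import Dict, List, Tuple


def shared_derived(a: List[int], b: List[int]) -> int:
    """Number of positions at which both profiles carry the derived state (1)."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)


def place_specimen(
    new: List[int], clades: Dict[str, List[List[int]]]
) -> Tuple[str, Dict[str, float]]:
    """Pick the clade whose members share, on average, the most derived states."""
    scores = {
        name: sum(shared_derived(new, member) for member in members) / len(members)
        for name, members in clades.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical coded profiles over four m/z positions.
clades = {
    "terminal": [[1, 1, 1, 0], [1, 1, 1, 1]],
    "middle": [[1, 1, 0, 0], [1, 0, 0, 0]],
    "noncancerous": [[0, 0, 0, 0]],
}
best, scores = place_specimen([1, 1, 1, 0], clades)
print(best, scores)
```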
Cladograms also revealed that the three types of cancer have fundamentally similar topologies;
they all have one major dichotomy that indicates two lineages within each type (represented
on the cladograms by the terminal clade and the middle clade [Figures 2–3]). If this typification
holds true for additional cancer types, then it is possible that ontogenetically all types of cancers
undergo two major common pathways in their development. There are only a few recent reports
that support a dichotomous pattern of development8 in colorectal cancer,9 glioblastomas,10
and pancreatic carcinoma.11 Dichotomies may arise in cancer because of the selective
advantages of cells harboring various mutations; the surviving mutations can be genetic or
chromosomal,8,9 point mutation or amplification,10 or differential expression of alleles.11
Noncancerous transitional clades, present in all cladograms and mostly composed of individual
specimens, are the closest sister groups to cancer clades. Because of their proximity to cancer
clades, we hypothesize that these specimens, assumed to be from cancer-free individuals,
represent the early stages of cancer development that cannot yet be morphologically or
microscopically diagnosed as cancerous. For diagnostic purposes, cancerous and noncancerous
transitional specimens will always be challenging to classify by other techniques. Occasionally,
these specimens are distinct from one another by only very few apomorphies. The mostly single
specimen composition of the transitional clades attests to their uniqueness.
Current diagnosis of cancer is not based on the number of mutations or synapomorphies;
therefore, the determination of the status of a transitional specimen is still subjective unless a
clear definition that is based on derived mutations is established by pathologists. Till then we
suggest that the position of a transitional specimen within the transitional zone determines its
diagnosis; if a specimen is on the upper end of the transitional zone (i.e., bordering cancer
clades), then it is a cancerous specimen, and those occurring in the middle and lower end of
the transitional zone are to be called high risk specimens.
So far, we have not yet carried out any correlations between specimens on the cladograms and
patients' survival. Therefore, it is uncertain at this stage of the analysis if the terminal clades
of cancers represent the advanced stages of cancer progression or if the two major clades have
any predictive value for prognosis.
Searching for biomarkers is a challenging process in biomedical research, and phyloproteomics
offers the capacity to uncover many possible ones. The phylogenetic program, MIX, lists the
shared derived m/z intensity values (synapomorphies) of each clade it produces, and each
synapomorphy is a possible biomarker. In other words, the cladogram serves as a map showing
the apomorphic m/z values of all potential biomarkers and their effective levels of groupings.
A synapomorphy may represent a novel protein, a disappeared protein, or an up/down regulated
protein; thus, these proteins corresponding to the apomorphic m/z values need to be identified
if they are to be explored as biomarkers. Since the cladograms have hierarchical arrangement
(i.e., presenting various levels of groupings) one can look for biomarkers at various levels of
the cladogram. An apomorphic protein (we would like to call it apotein) that defines a clade
will serve as a potential biomarker for the clade, while another apotein defining a nested
subclade within the clade will be restricted as biomarker to the subgroup within the clade.
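The relationship between clades, nested subclades, and their candidate biomarkers can be illustrated with a short sketch. It uses a strict definition that we assume for simplicity: a character counts as a synapomorphy of a clade only if it is derived in every member and ancestral in every other specimen, so reversals and parallelisms are ignored. This is not the report produced by MIX, and the specimen names and coded values are hypothetical.

```python
from typing import Dict, List, Set


def synapomorphies(matrix: Dict[str, List[int]], clade: Set[str]) -> List[int]:
    """Indices of characters derived in all clade members and in no other specimen."""
    members = [row for name, row in matrix.items() if name in clade]
    others = [row for name, row in matrix.items() if name not in clade]
    n_chars = len(members[0])
    return [
        c
        for c in range(n_chars)
        if all(row[c] == 1 for row in members) and all(row[c] == 0 for row in others)
    ]


# Hypothetical coded specimens over three m/z positions.
matrix = {
    "N1": [0, 0, 0],
    "C1": [1, 0, 0],
    "C2": [1, 1, 0],
    "C3": [1, 1, 1],
}
clade_markers = synapomorphies(matrix, {"C1", "C2", "C3"})
subclade_markers = synapomorphies(matrix, {"C2", "C3"})
print(clade_markers, subclade_markers)
```

Here position 0 would be a candidate biomarker for the whole clade, while position 1 is restricted to the nested subclade.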
Conclusion
Phyloproteomics offers a new paradigm in cancer analysis that reveals relatedness and diversity
of cancer specimens in a phylogenetic sense; its predictive power is a useful tool for diagnosis,
characterizing cancer types, discovering biomarkers, and identifying universal characteristics
that transcend several types of cancer. The new paradigm has clinical, academic, and scientific
value.
References
1. Hede K. $104 million proteomics initiative gets green light. J. Natl. Cancer Inst 2005;97(18):1324–
1325. [PubMed: 16174850]
2. Issaq HJ, Conrads TP, Prieto DA, Tirumalai R, Veenstra TD. SELDI-TOF MS for diagnostic
proteomics. Anal. Chem 2003;75(7):148A–155A.
3. Marvin LF, Roberts MA, Fay LB. Matrix-assisted laser desorption/ionization time-of-flight mass
spectrometry in clinical chemistry. Clin. Chim. Acta 2003;337(1−2):11–21. [PubMed: 14568176]
4. Merchant M, Weinberger SR. Recent advancements in surface-enhanced laser desorption/ionization–
time-of-flight-mass spectrometry. Electrophoresis 2000;21(6):1164–1177. [PubMed: 10786889]
5. Pusch W, Flocco MT, Leung SM, Thiele H, Kostrzewa M. Mass spectrometry-based clinical
proteomics. Pharmacogenomics 2003;4(4):463–476. [PubMed: 12831324]
6. Srinivas PR, Srivastava S, Hanash S, Wright GL Jr. Proteomics in early detection of cancer. Clin.
Chem 2001;47(10):1901–1911. [PubMed: 11568117]
7. Wyllie AH, Bellamy CO, Bubb VJ, Clarke AR, Corbet S, Curtis L, Harrison DJ, Hooper ML, Toft N,
Webb S, Bird CC. Apoptosis and carcinogenesis. Br. J. Cancer 1999;80(Suppl 1):34–37. [PubMed:
10466759]
8. Loeb KR, Loeb LA. Significance of multiple mutations in cancer. Carcinogenesis 2000;21(3):379–
385. [PubMed: 10688858]
9. Chung DC. The genetic basis of colorectal cancer: insights into critical pathways of tumorigenesis.
Gastroenterology 2000;119(3):854–865. [PubMed: 10982779]
10. Hayashi Y, Yamashita J, Watanabe T. Molecular genetic analysis of deep-seated glioblastomas.
Cancer Genet Cytogenet 2004;153(1):64–68. [PubMed: 15325097]
11. Adsay NV, Merati K, Andea A, Sarkar F, Hruban RH, Wilentz RE, Goggins M, Iocobuzio-Donahue
C, Longnecker DS, Klimstra DS. The dichotomy in the preinvasive neoplasia to invasive carcinoma
sequence in the pancreas: differential expression of MUC1 and MUC2 supports the existence of two
separate pathways of carcinogenesis. Mod. Pathol 2002;15(10):1087–1095. [PubMed: 12379756]
12. Petricoin EE, Paweletz CP, Liotta LA. Clinical applications of proteomics: proteomic pattern
diagnostics. J. Mammary Gland Biol. Neoplasia 2002;7(4):433–440. [PubMed: 12882527]
13. Alexe G, Alexe S, Liotta LA, Petricoin E, Reiss M, Hammer PL. Ovarian cancer detection by logical
analysis of proteomic data. Proteomics 2004;4(3):766–783. [PubMed: 14997498]
14. Conrads TP, Fusaro VA, Ross S, Johann D, Rajapakse V, Hitt BA, Steinberg SM, Kohn EC, Fishman
DA, Whitely G, Barrett JC, Liotta LA, Petricoin EF III, Veenstra TD. High-resolution serum
proteomic features for ovarian cancer detection. Endocr.-Relat. Cancer 2004;11(2):163–178.
[PubMed: 15163296]
15. Zhu W, Wang X, Ma Y, Rao M, Glimm J, Kovach JS. Detection of cancer-specific markers amid
massive mass spectral data. Proc. Natl. Acad. Sci. U.S.A 2003;100(25):14666–14671. [PubMed:
14657331]
16. Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, Semmes OJ, Schellhammer PF,
Yasui Y, Feng Z, Wright GL Jr. Serum protein fingerprinting coupled with a pattern-matching
algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer
Res 2002;62(13):3609–3614. [PubMed: 12097261]
17. Petricoin EF, Ornstein DK, Paweletz CP, Ardekani A, Hackett PS, Hitt BA, Velassco A, Trucco C,
Wiegand L, Wood K, Simone CB, Levine PJ, Linehan WM, Emmert-Buck MR, Steinberg SM, Kohn
EC, Liotta LA. Serum proteomic patterns for detection of prostate cancer. J. Natl. Cancer Inst 2002;94
(20):1576–1578. [PubMed: 12381711]
18. Petricoin EF III, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone
C, Fishman DA, Kohn EC, Liotta LA. Use of proteomic patterns in serum to identify ovarian cancer.
Lancet 2002;359(9306):572–577. [PubMed: 11867112]
19. Felsenstein J. PHYLIP: Phylogeny Inference Package (version 3.2). Cladistics 1989;5:164–166.
20. Page RD. TreeView: an application to display phylogenetic trees on personal computers. Comput.
Appl. Biosci 1996;12(4):357–358. [PubMed: 8902363]
21. Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum:
comparing datasets from different experiments. Bioinformatics 2004;20(5):777–785. [PubMed:
14751995]
22. Sorace JM, Zhan M. A data review and re-assessment of ovarian cancer serum proteomic profiling.
BMC Bioinformatics 2003;4(1):24. [PubMed: 12795817]
23. Check E. Proteomics and cancer: running before we can walk? Nature 2004;429(6991):496–497.
[PubMed: 15175721]
24. Ornstein DK, Rayford W, Fusaro VA, Conrads TP, Ross SJ, Hitt BA, Wiggins WW, Veenstra TD,
Liotta LA, Petricoin EF III. Serum proteomic profiling can discriminate prostate cancer from benign
prostates in men with total prostate specific antigen levels between 2.5 and 15.0 ng/mL. J. Urol
2004;172(4 Pt 1):1302–1305. [PubMed: 15371828]
Figure 1.
Schematic representation of phyloproteomic analysis. The process involves two steps. The first
is the algorithmic sorting of the m/z values into derived (exists in some but not all specimens)
and ancestral (in all specimens); the derived values are those signifying either novel, vanished,
or up and down regulated peaks. The second step is a parsimony phylogenetic analysis that
groups the specimens on the basis of the shared derived values.
Figure 2.
Phyloproteomic cladograms of three cancers: (A) pancreatic, (B) prostate, and (C) ovarian.
The nodes of major clades are marked as follows: •, terminal cancer clade; ○, middle cancer
clade; □, middle healthy clade; and ■, basal healthy clade. Transitional zones (TZ) are marked
by bracketed arrows.
Figure 3.
A phyloproteomic analysis showing dichotomous distribution of cancers into two clades. A
schematic cladogram of a comprehensive phyloproteomic analysis composed of 460 specimens
representing ovarian, pancreatic, and prostate cancers as well as noncancerous specimens.
Specimens of every cancer type are classified into two clades: a terminal and middle, as well
as transitional clades. Healthy specimens are classified into a major healthy clade and
transitional clades.
Table 1
The Advantages of Phylogenetic Analysis over Statistical Cluster Analysis
phylogenetic analysis | cluster analysis
produces a classification based on shared derived similarities and reflects phyletic relationships | produces a classification based on overall similarity and may not reflect phyletic relationship
uses one algorithm for the analysis of all types of cancers | may require a specific algorithm for each cancer type
discriminates between ancestral and derived states; uses only derived character states (apomorphies) | does not discriminate between ancestral and derived character states; uses both
resolves issues of parallelism (multiple independent origins) by parsimony or maximum likelihood | does not resolve issues of parallelism
offers predictivity | does not offer predictivity
Department of Veterans Affairs
Medical Center
50 Irving Street NW
Washington, DC 20422
May 13, 2008
John VanMeter, Ph.D.
Acting Director, Center for Functional and Molecular Imaging
Georgetown University Medical Center
3900 Reservoir Road NW, Suite LM-14
Washington, DC 20057-1488
SUBJECT:
Defense Center of Excellence for Psychological Health (PH) and Traumatic Brain Injury
(TBI)
Military Psychological Health Research – Complementary and Alternative Strategies
funding opportunity
W81XWH-08-PH-TBI
Dear Dr. VanMeter:
I am writing to express my strong interest and willingness to participate in your proposed study, entitled
“Distinguishing Responders from Non-responders in a Mind-Body Treatment for PTSD.” I believe that
your plan to utilize phylogenetic methodology to distinguish treatment responders from non-responders
based on their neuroendocrine and neuroimaging biomarkers is quite novel, and deserves exploration.
As a neuroendocrinologist with extensive experience in the design and conduct of both allopathic and
CAM-related clinical and laboratory research, I find the use of mind body medicine as a potential
treatment modality for patients with PTSD to be a promising area of research. Moreover, your project fits
thematically with my consultation with Dr. Dutton and her colleagues on a newly received Concept
Award from DOD to develop a mind-body intervention for use with primary care physicians in treating
their veteran patients with PTSD. The study proposed herein is likely to provide further novel
information regarding neural mechanisms that may help identify patients with PTSD who are likely to
benefit from this type of treatment strategy.
I shall participate in your proposed study as a consultant focusing on research design and data analysis,
neuroendocrinology, and CAM. I shall also help to recruit subjects within the Washington DC VA
Medical Center.
I enthusiastically support your proposed study and look forward to participating in it with you and Drs.
Mary Ann Dutton and Hakima Amri. This study will expand our ongoing collaboration regarding
war-related PTSD.
Sincerely,
Marc R. Blackman, M.D.
Associate Chief of Staff for Research & Development
Washington DC VAMC
Research Service (151)
50 Irving Street
Washington, DC 20422
Intellectual and Material Property Plan
Rights to scientific discoveries, new techniques, or algorithms resulting from the combined efforts of the PIs
during the course of this study will be equally shared by all three and Georgetown University as dictated by
policies of the Georgetown Office of Technology Transfer. Rights to the patent potential and/or commercial
potential of the phylomics® algorithm were solely granted to Dr. Hakima Amri and her co-inventors as defined
in her existing patent application.
Statement of Work
The proposed study will use a novel classification technique called ‘phylomics’ (patent pending) to identify
PTSD treatment responders from non-responders based on their neurophysiological signature. Subjects will be
randomized to one of two interventions: a CAM-based imagery modality called Guided Imagery (Naparstek
2004) or standard exposure therapy. This study will not only determine the efficacy of the CAM treatment for
PTSD compared to an accepted therapy; it will also provide scientific evidence of the changes induced by
each treatment in the neuroendocrine and neurobiological profile of the subjects. The treatment outcomes will
be compared with the predictions generated by the phylomics algorithm. Thus, the proposed study has multiple
endpoints each of which independently will move the study and treatment of PTSD forward and together
represent a major advance in this field.
Methods
The experimental design centers on comparing CAM-based Guided Imagery intervention to the more standard
exposure therapy. Subjects entering the study will be randomized into one of the two arms. Baseline
assessments will include administration of a number of clinical assessment instruments, collection of saliva &
blood specimens, and fMRI imaging. Follow-up assessment will be performed upon completion of the
treatment. The saliva collection, blood draw, and fMRI scanning will be performed within two weeks of the last treatment
session. PTSD symptom severity will be assessed using the CAPS no more than four weeks after the last
treatment.
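The randomization scheme is not specified above. As one possibility, the sketch below assigns subjects to the two arms in small permuted blocks so that the arms stay balanced as enrollment proceeds. The block size, seed, subject identifiers, and function name are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def randomize_in_blocks(
    subject_ids: Sequence[str],
    arms: Sequence[str] = ("Guided Imagery", "Prolonged Exposure"),
    block_size: int = 2,
    seed: int = 0,
) -> Dict[str, str]:
    """Assign subjects to arms using permuted blocks (hypothetical scheme)."""
    rng = random.Random(seed)  # fixed seed so the allocation list is reproducible
    assignment: Dict[str, str] = {}
    for start in range(0, len(subject_ids), block_size):
        block = subject_ids[start:start + block_size]
        labels: List[str] = [arms[i % len(arms)] for i in range(len(block))]
        rng.shuffle(labels)
        for subject, arm in zip(block, labels):
            assignment[subject] = arm
    return assignment


# Hypothetical identifiers for a first wave of 26 subjects.
subjects = [f"S{n:02d}" for n in range(1, 27)]
assignment = randomize_in_blocks(subjects, seed=2008)
print(assignment)
```

With an even number of subjects and a block size of two, each arm receives the same number of subjects; a trailing partial block could leave the arms off by one.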
Outcomes of the Study
The major outcome of this study will be fourfold. First, we will determine the efficacy of Guided Imagery, a
CAM-based treatment relative to exposure therapy, which is part of the Veteran’s Administration clinical
practice guidelines (Clinical Practice Guideline Workgroup 2004). Second, we will be able to identify a set of
biomarkers based on functional MRI, neuroendocrine, and genomic data that represents a signature of PTSD.
Third, we will use the baseline measurements as input to a novel classification algorithm developed by Dr. Amri
called Phylomics (patent pending). The phylomics algorithm separates groups of subjects based on the most
parsimonious hierarchical separation on the basis of shared derived state(s). Using all of these biomarkers, we
expect this algorithm to distinguish subjects who will be treatment responders from non-responders,
with the ultimate goal of identifying targeted treatments optimized to the individual PTSD sufferer.
Human Subjects Protections
The proposed project will involve studying humans at Georgetown University Medical Center. Institutional
review will be obtained from Georgetown University. Additional IRB review will be required by the
Washington, DC Veterans Administration and the DOD, as the proposed studies will include subjects recruited
from the VA and DOD funding will support this study. We anticipate and have planned for nine months to
complete review by all three IRBs.
Location of Researchers
Drs. VanMeter, Dutton, Amri and Amdur
Georgetown Univ., Preclinical Sci Bldg, Suite LM-14,
3900 Reservoir Road NW, Washington, DC 20057-1488
Leadership Plan
This proposal is a collaboration between Drs. VanMeter (Neurology), Amri (Physiology and Biophysics), and
Dutton (Psychiatry), all three of whom are PIs. All three will provide oversight of the entire study and
development and implementation of all policies, procedures and processes. In these roles, all three will be
responsible for the implementation of the scientific agenda, the specific aims, and ensure systems are in place to
guarantee institutional compliance for the protection of human subjects, data analysis, and facilities.
Specifically, Dr. VanMeter will oversee Aim 3 (fMRI) and be responsible for all human subjects research
approvals. Dr. Dutton is primarily responsible for Aim 1 (interventions). Dr. Amri will have primary
responsibility for Aim 2 (neuroendocrine and proteomics/genomics) and Aim 4 (Phylomics). Dr. VanMeter will
serve as contact PI and will assume fiscal and administrative management responsibility including maintaining
communication among PIs and key personnel through monthly meetings. He will be responsible for
communication with the sponsor (Defense Center of Excellence) and submission of annual reports. Publication
authorship will be based on the relative scientific contributions of the PIs and key personnel.
Tasks
Project Phases: Phase 1A, Phase 1B, Phase 2, Phase 3A, Phase 3B, Phase 4
Task | Individual Responsible | Institution Name | Time | Outcome
Submit IRB protocol and consent form to Georgetown’s IRB | Dr. VanMeter | Georgetown Univ | 2 mos. | IRB protocol and consent forms
Incorporate any changes requested and resubmit | Dr. VanMeter | Georgetown Univ | 1 mos. | First level IRB approval
Submit modified IRB protocol and consent form to DOD and VA IRBs | Dr. VanMeter | Georgetown Univ | 5 mos. | IRB approval for proposed study
Incorporate any changes requested by DOD and VA and resubmit to all IRBs | Dr. VanMeter | Georgetown Univ | 1 mos. | Final IRB approval study
Extension of the phylomics algorithm to work with fMRI and neuroendocrine data | Dr. Amri | Georgetown Univ | 9 mos. | Improved phylomics algorithm
Develop manuals for the interventions. Recruit and train interventionists | Dr. Dutton | Georgetown Univ | 9 mos. | Manuals and Interventionists
Development and testing of fMRI stimulation paradigms | Dr. VanMeter | Georgetown Univ | 9 mos. | fMRI paradigms
Recruit and screen subjects for the first wave of intervention groups | All | Georgetown Univ | 3 mos. | 26 subjects enrolled in the study
Collect and assess neuroendocrine and genomic specimens | Dr. Amri | Georgetown Univ | 3 mos. | Bio-samples on Wave 1
Perform baseline fMRI scanning on wave 1 | Dr. VanMeter | Georgetown Univ | 3 mos. | fMRI on Wave 1
Wave 1 Interventions | Interventionists | Georgetown Univ | 4 mos. | Wave 1 Completed
Follow-up of Wave 1 | Drs VanMeter and Amri | Georgetown Univ | 1 mos. | Wave 1 Followup
Review of Wave 1 results and identification of any problems | All | Georgetown Univ | 1 mos. | Wave 1 Review
Wave 2-4 recruited, tested, and run through intervention | All | Georgetown Univ | 18 mos. | Waves 2-4 Completed
Data analysis, paper writing, final report generation | All | Georgetown Univ | 12 mos. | Peer-reviewed journal papers and final report to DOD
Impact Statement
Previous studies of CAM (Complementary and Alternative Medicine) modalities to treat PTSD have
demonstrated positive outcomes including studies of victims of war-related trauma in Kosovo (Gordon, Staples
et al. 2004). These studies were limited by the lack of comparison to an accepted treatment modality such as
exposure therapy. Furthermore, the neurological and physiological mechanisms that underlie the treatment
effects have not been identified. Some of the gaps that need to be addressed in future studies of PTSD identified
by the IOM (Institute of Medicine) include testing treatment efficacy using randomized control trials,
investigator independence, and investigating the factors related to outcome: loss of PTSD diagnosis and
symptom improvement (Institute of Medicine: Committee on Treatment of Posttraumatic Stress Disorder 2007).
Our study is designed to tackle each of these issues.
We propose to compare a positive mental imagery technique called Guided Imagery (Naparstek 2004) to
Prolonged Exposure, which uses mental imagery to revisit the traumatic event. We expect this CAM-based
treatment to reduce PTSD symptoms with an effect size that is equivalent to exposure therapy. This part of the
study alone if successful will provide validation of a relatively new treatment for PTSD that is more readily
implemented. A negative result for this part of the study would also be an important outcome regarding this
treatment.
In addition, we will collect a number of measures on each subject at baseline and immediately after the
conclusion of the interventions. These will include stress hormones such as cortisol and DHEA/DHEA-s as well
as genomic and proteomic data from peripheral blood samples. Further, we will investigate the neurobiology of
PTSD and remission using functional MRI. Together these measures will provide a biomarker profile of PTSD
from which we will be able to further our understanding of the neuronal, physiological, and genetic basis of
PTSD. By examining these measures both at baseline and follow-up we will be able to identify the factors that
lead to remission of PTSD symptoms. Based on these factors new treatments can be developed that target
relevant neural and physiological mechanisms. Proteomic and genomic markers could be used to identify
individual soldiers who would benefit from specialized pre-deployment inoculation strategies centered on stress
management.
Finally, this study will use a novel classification technique called phylomics, developed by one of the PIs, that is
based on the techniques used in genomics to separate classes of species. This algorithm, which has a patent
pending, separates groups of subjects using the most parsimonious hierarchical separation on the basis of shared
derived ‘state(s)’. This algorithm has been successfully used to separate different cancer specimens from
healthy tissues. Using the biomarkers collected in this study as input, we expect this algorithm will not only
identify subjects who will respond to treatment, but ultimately identify targeted treatments optimized to the
individual PTSD sufferer.
Thus, this study will produce three main endpoints: 1) test the efficacy of a CAM-based imagery
treatment (Guided Imagery) against an established treatment (Prolonged Exposure); 2) further elucidate the
neurological/physiological mechanisms underlying PTSD and subsequent changes related to treatment; and
3) test the ability of phylomics, a novel classification algorithm, to distinguish PTSD treatment responders from
non-responders. Each of these results on its own has the potential to make a significant impact on PTSD
treatment and further our understanding of this debilitating disorder. Combined, they represent a unique
opportunity to fundamentally change our understanding of PTSD and how to optimize treatments of individual
patients.
Innovation Statement
The proposed study includes two major innovative components. First, we will use a novel classification
technique called phylomics, developed by one of the PIs, to distinguish treatment responders from
non-responders. Using the phylomics algorithm we will be able to classify subjects a priori using their baseline
measures. Furthermore, this algorithm will identify sub-classes within the responder and non-responder groups
(Figure 1). Second, we will collect both baseline and post-treatment data on each subject to assess their
proteomic and genomic profile, neuropsychological assessments, and their underlying neurobiological and
physiological state to provide a complete picture of the homeostatic state of the individual. Each of these
measures has been used in isolation to provide a partial representation of the factors that contribute to PTSD. We
will combine all of these data to build a holistic framework for PTSD. Both of these components
will leverage the results of the randomized controlled trial comparing Guided Imagery (Naparstek 2004), a
CAM-based treatment modality, with exposure therapy, a standard treatment.
Figure 1: Hypothesized output of phylomics to PTSD derived from the neuronal, physiological, and
proteomic/genomic signature of individual subjects.
The phylomics algorithm, which has a patent pending, separates groups of subjects using the most parsimonious
hierarchical separation on the basis of shared derived state(s). This algorithm has been successfully used to
separate different cancer specimens from healthy tissues. Using the biomarkers collected in this study as
input, we expect this algorithm will be able to distinguish subjects who will be treatment responders from
non-responders on an a priori basis, with the added ability to determine the combination of biomarkers needed to
make that distinction. Beyond that high-level classification of subjects, this algorithm will generate
sub-classes of subjects that are likely to have meaningful distinctions, such as PTSD with and without depression.
Ultimately, we expect that application of this algorithm in the context of this study will lead to the ability to
identify targeted treatments optimized to the individual PTSD sufferer.
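As a rough illustration of what grouping subjects by shared derived states could look like, the following
minimal Python sketch codes each biomarker as derived when it falls outside the range seen in healthy
controls, then finds the derived markers two subjects have in common. It is not the phylomics implementation:
it omits the parsimony search over hierarchies entirely, the range-based coding rule is an assumption, and the
function names, the marker gene_x, and all numeric values are hypothetical.

```python
from typing import Dict, List, Set

Profile = Dict[str, float]  # biomarker name -> measured value


def derived_states(subject: Profile, controls: List[Profile]) -> Set[str]:
    """Return the biomarkers whose value lies outside the control range.

    Assumption: a marker counts as 'derived' when it is below the minimum
    or above the maximum observed in the healthy control group.
    """
    derived = set()
    for marker, value in subject.items():
        reference = [control[marker] for control in controls]
        if value < min(reference) or value > max(reference):
            derived.add(marker)
    return derived


def shared_derived(a: Set[str], b: Set[str]) -> Set[str]:
    """Return the derived markers that two subjects have in common."""
    return a & b


# Hypothetical values, for illustration only (arbitrary units).
controls = [
    {"cortisol": 10.0, "dhea": 5.0, "gene_x": 1.0},
    {"cortisol": 14.0, "dhea": 7.0, "gene_x": 1.4},
]
subjects = {
    "s1": {"cortisol": 6.0, "dhea": 9.0, "gene_x": 1.2},
    "s2": {"cortisol": 7.0, "dhea": 8.5, "gene_x": 1.1},
    "s3": {"cortisol": 12.0, "dhea": 6.0, "gene_x": 2.0},
}

# Code every subject's profile as a set of derived markers.
states = {name: derived_states(profile, controls)
          for name, profile in subjects.items()}

# Subjects sharing more derived markers would be grouped together first.
overlap_s1_s2 = shared_derived(states["s1"], states["s2"])
overlap_s1_s3 = shared_derived(states["s1"], states["s3"])
```

In this toy data, subjects s1 and s2 share derived cortisol and DHEA values and would fall into one group,
while s3 differs from controls only in gene_x and would be separated from them.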
The measures collected on each subject at baseline and immediately after the conclusion of the interventions
will include stress hormones such as cortisol and DHEA/DHEA-s as well as genomic and proteomic data from
peripheral blood samples. Further, we will investigate the neurobiology of PTSD and remission using functional
MRI. Together these measures will provide a biomarker profile of PTSD from which we will be able to further
our understanding of the neuronal, physiological, and genetic basis of PTSD. By examining these measures both
at baseline and follow-up we will be able to identify the factors that lead to remission of PTSD symptoms.
Based on these factors new treatments can be developed that target those neural and physiological mechanisms.
Proteomic and genomic markers could be used to identify individual soldiers who would benefit from
specialized pre-deployment inoculation strategies centered on stress management.
Finally, this study will produce three main endpoints: 1) demonstrate the ability of phylomics, a novel
classification algorithm, to distinguish PTSD treatment responders from non-responders; 2) combine the
neuronal, physiological, and proteomic/genomic measures collected to derive a complete picture of the
mechanisms underlying PTSD and subsequent changes related to treatment; and 3) test the efficacy of a
CAM-based imagery treatment (Guided Imagery) against an established treatment (Prolonged Exposure).
Each of these endpoints on its own represents a pioneering advance in our understanding of PTSD and its
treatment. The combination of all three endpoints provides a unique opportunity to fundamentally change our
understanding of PTSD and how to optimize treatments of individual patients.