For Peer Review - Center for Functional and Molecular Imaging
Technical Abstract
Background: Post-Traumatic Stress Disorder (PTSD) is a multifactorial disease that develops following exposure to traumatic events ranging from motor vehicle accidents to terrorism. While most individuals recover from this acute form of stress, others are left with a devastating experience that could lead to a whole spectrum of mental challenges. The complex and heterogeneous response to trauma makes treatment design and assessment a challenge. While the recently released report from the Institute of Medicine (IOM) concluded that only psychotherapies based on exposure therapy met their evidence-based criteria for efficacy, the report recommended that future studies of treatment efficacy include use of randomized controlled trials, investigator independence, and proper handling of attrition (IOM: Committee on Treatment of Posttraumatic Stress Disorder, 2007). Gaps with regard to treatment of PTSD identified by the IOM included determining length of treatment necessary, long-term follow-up of subjects, studying important veteran subpopulations, and investigating three factors related to outcome: loss of PTSD diagnosis, symptom improvement, and end state functioning.
Objective: Therefore, we propose a multidisciplinary approach to comparing the outcomes of a positive guided imagery-based CAM treatment with those of the above-mentioned exposure therapy in war zone-exposed soldiers with PTSD, using state-of-the-art functional brain imaging technology and a novel high-throughput analytical platform, phylomics® (patent pending), that is able to discriminate between treatment responders and non-responders based on proteomic/genomic signatures.
Specific Aims/Hypothesis:
Aim 1: To compare Guided Imagery against Prolonged Exposure therapy. These two treatment modalities are similar in structure but differ in terms of emphasis on positive (Guided Imagery) and traumatic (Prolonged Exposure) imagery.
Aim 2: To identify the impact of both treatment regimes on neuroendocrine markers of stress and neurobiological changes. We will focus on both central and peripheral pathways. We hypothesize that the physiological impact of Guided Imagery treatment will be positive changes in each of these stress markers: glucocorticoids, DHEA, DHEA-S, neuropeptide Y, and allopregnanolone.
Aim 3: To determine the neurobiological changes associated with treatment. Previous functional MRI (fMRI) studies of PTSD have generally reported a decrease in activity in the medial prefrontal cortex (mPFC) and the anterior cingulate cortex (ACC) with a corresponding increase in the amygdala. Activity in the mPFC and ACC has also been reported to negatively correlate with PTSD severity. We predict that activation of the ACC during an SDI task will increase after treatment while the mPFC and amygdala will remain unchanged.
Aim 4: To methodically distinguish treatment responders from non-responders using phylomics®. The baseline measures of stress (both neuroendocrine and neuronal activity), in addition to the genetic profile, will be used as input to this algorithm, which is expected to distinguish responders from non-responders based on their baseline profile.
Study Design: The design centers on comparing CAM-based Guided Imagery to the standard exposure therapy. Subjects will be randomized into one of the two arms. Baseline assessments will include a number of clinical assessment instruments, collection of saliva and blood specimens, and fMRI. Two follow-up assessments will be performed: upon completion of the treatment and six months later. Biological specimen collection and fMRI will be performed within 2 weeks of the last treatment session. PTSD symptom severity will be assessed using the CAPS no more than four weeks after the last treatment. The final assessment of PTSD symptoms will be performed over the phone at the 6-month follow-up.
Innovation: The proposed study will use a novel classification technique called ‘phylomics®’ (patent pending) to distinguish PTSD treatment responders from non-responders based on their neurobiologic signature. Success will be based on randomized assignment to one of the two interventions. This study will not only determine the efficacy of the CAM treatment for PTSD compared to accepted therapy, but it will also provide scientific evidence of the changes induced by each treatment in the neuroendocrine and neurobiological profile of the subjects. The treatment outcomes will be compared with the predictions generated by the phylomics® algorithm. Thus, the proposed study has many possible outcomes. Each outcome will independently move the study of the treatment of PTSD forward and together will form a basis of understanding representing a major advance in this field.
Impact: The relevance of the proposed study to treatment of PTSD is 1) a systematic comparison of a CAM-based treatment to the more accepted exposure-based therapy, 2) identification of the changes in neural markers that relate to treatment outcome, and 3) a test of a novel technique to predict treatment outcome.
Public Abstract
Background: Post-Traumatic Stress Disorder (PTSD) is a debilitating mental health disease that has been increasing in occurrence, especially in the military population deployed in war zones. PTSD in our returning soldiers from Operation Iraqi Freedom (OIF) has been estimated at 9.8% with an odds ratio (OR) of 5.51 and in those returning from Operation Enduring Freedom (OEF) at 2.1% with an OR of 2.52. Unfortunately, several surveys have shown that the percent of individuals who received mental health services within one year post-deployment for any disorder was extremely low, primarily due to concerns of stigma: 23% for members of the army from OIF, 29% for marines from OIF, and 40% for soldiers from OEF.
Furthermore, there has been more than a 79.5% increase in the number of veterans receiving PTSD disability compensation from 1999 to 2004 (Office of Inspector General 2005), totaling $4.3 billion in 2004. The full cost of PTSD remains to be seen, but undoubtedly the costs to the individual and the community at large as well as the consequences to mission readiness are quite high.
Ultimate Applicability of the proposed research: This study is approached from three different angles to better cover the complexity of this devastating illness: a) a recently developed CAM-based modality emphasizing positive Guided Imagery will be compared to Prolonged Exposure imagery that focuses on revisiting traumatic events, and we anticipate that PTSD patients’ symptoms will improve; b) in addition to the psychological assessments, stress-related neurobiological markers will be measured in saliva and blood specimens and correlated to neuronal parameters obtained by functional magnetic resonance imaging (fMRI) before (at baseline) and after treatment; c) integral blood proteins (proteome) and genes (genome) will be fractionated using cutting-edge technology to decipher the neurobiological signature of each patient, a step that has been a great challenge to the biomedical research community due to the molecular heterogeneity and complexity of the disease. By applying our novel analytical method, phylomics® (patent pending), we expect to translate the proteome and genome information combined with the neuronal and physiological data to derive biologically meaningful relationships that group together those patients who share similar molecular signatures. The analysis is innovative because it draws from evolutionary-based principles of parsimony phylogenetic analysis that have successfully been utilized for the past 50 years in other biology disciplines but rarely applied to biomedicine, a field that has been dominated by pure descriptive biostatistics.
We expect that phylomics will solve the issue of molecular heterogeneity, which might in turn explain the complexity of the disease, the capacity to recover from trauma, and the response to treatment.
Consumer-related outcome: All aspects of the study are based on non-invasive interventions. Two outcomes of this study are likely to have a major impact on the consumer. First, Guided Imagery is a much less difficult therapy to implement and less painful to the patient than Exposure therapy. Thus, if Guided Imagery proves as efficacious as Exposure therapy, this will translate into a therapy that can be more widely applied in a variety of clinical settings. Second, we expect the phylomics® algorithm will be able to distinguish subjects who will respond to treatment from those who will not and ultimately provide a method for identifying targeted treatments optimized to the individual PTSD sufferer.
Projected time to clinical translation: Once our hypotheses are verified, we think that the treatment modality, neurobiological correlates, and molecular signature could be easily implemented in the clinic. Implementation requires a trained Mind-Body-Medicine facilitator, an MRI facility, which is now an integral component of hospitals, and a clinical laboratory to process the blood for phylomics® analysis. The advantage of our proposed study is its multidimensionality and clinical potential.
A. BACKGROUND
A number of studies have begun to examine the range of mental health problems (Hoge, Castro et al. 2004; Hoge 2006; Milliken, Auchterlonie et al. 2007) and PTSD in particular (Hoge, Terhakopian et al. 2007) in soldiers returning from Operation Enduring Freedom (OEF) and Operation Iraqi Freedom (OIF). Hoge et al., 2006, used the Post-Deployment Health Assessment (PDHA) from 303,905 Army soldiers and Marines to assess a variety of mental health related issues in soldiers returning from OEF and OIF. They found the rate of PTSD for those who served in OIF was 9.8% with an odds ratio (OR) of 5.51.
The rate for those returning from OEF was 2.1% with an OR of 2.52. They hypothesized that the difference in the rates between the two theaters was related to the number of combat-related instances encountered (Hoge, Auchterlonie et al. 2006). Unfortunately, the percent of individuals who sought mental health services within one year post-deployment for any disorder was extremely low, primarily due to concerns of stigma: 23% for members of the army from OIF, 29% for marines from OIF, and 40% for soldiers from OEF (Hoge, Castro et al. 2004). One measure of the economic costs of PTSD can be gauged from the amount paid by the Department of Veterans Affairs (VA) in disability payments for PTSD. There has been more than a 79.5% increase in the number of veterans receiving PTSD disability compensation from 1999 to 2004, while payments for PTSD disabilities rose 148.8% to $4.3 billion in 2004 (Office of Inspector General 2005). The full cost of PTSD remains to be seen, but undoubtedly the costs to the individual and the community at large as well as the consequences to mission readiness are quite high. Treatments for PTSD include psychotherapeutic approaches that rely on “re-living” the event such as cognitive behavior therapy (CBT) including exposure-based therapies as well as eye movement desensitization and reprocessing (EMDR). A number of drug treatments have been used with varying degrees of success including various selective serotonin reuptake inhibitors (SSRI), anti-epileptics such as dilantin, and alpha-adrenergic blockers. The VA’s clinical practice guidelines, derived from an evidence-based assessment of treatments, conclude that there is significant benefit from SSRIs and/or a number of psychotherapies including cognitive therapy, exposure therapy, and EMDR (Clinical Practice Guideline Workgroup 2004).
In contrast, the recently released report from the Institute of Medicine (IOM) concluded that only psychotherapies based on exposure therapy met their evidence-based criteria for efficacy (Institute of Medicine: Committee on Treatment of Posttraumatic Stress Disorder 2007). The IOM’s report recommended that future studies of treatment efficacy include use of randomized controlled trials, investigator independence, and proper handling of attrition including systematic follow-up of non-completers (Institute of Medicine: Committee on Treatment of Posttraumatic Stress Disorder 2007). Gaps with regard to treatment of PTSD identified by the IOM included determining length of treatment necessary, long-term follow-up of subjects, studying important veteran subpopulations, and investigating three factors related to outcome: loss of PTSD diagnosis, symptom improvement, and end state functioning.
BOLD fMRI: The proposed experiments will employ functional magnetic resonance imaging (fMRI), which is based on the principle that the recorded MRI signal changes with the magnetic properties of intravascular contents. Since deoxygenated hemoglobin is paramagnetic (Thulborn, Waterton et al. 1982), it acts as an endogenous intravascular paramagnetic contrast agent (Ogawa, Lee et al. 1990; Belliveau, Kennedy et al. 1991; Turner, Jezzard et al. 1993). Blood Oxygenation Level Dependent (BOLD) contrast results from an increase in cerebral blood flow greater than local oxygen consumption (Ogawa, Lee et al. 1990). As a result of this discrepancy, the local concentration of deoxyhemoglobin is decreased, causing an increase in signal intensity on T2*-weighted images, which allows estimation of task-related neural activation when compared to a baseline image. fMRI has been used successfully to investigate cognitive processes and neurological disorders (Bandettini, Wong et al. 1992; Frahm, Bruhn et al. 1992; Ogawa, Tank et al. 1992; Eden, VanMeter et al. 1996).
Thus, fMRI provides a method for examining neuronal activity with the advantage of high spatial resolution.
Neuronal Correlates of PTSD Severity/Response to Treatment: One commonly used paradigm in studies of PTSD has been script driven imagery (SDI), which uses a script describing a traumatic event. Most studies have found either decreased or no activation of the mPFC (medial prefrontal cortex) and the ACC (anterior cingulate cortex) in PTSD (Shin, McNally et al. 1999; Britton, Phan et al. 2005; Lanius, Frewen et al. 2007). In an emotional Stroop task, Bremner et al. found decreased activity in the ACC in PTSD compared to exposed non-PTSD subjects (Bremner, Vermetten et al. 2004). However, in the same study the PTSD subjects had equivalent activation of the ACC for the classical Stroop task, indicating that dysfunction of the ACC is specific to emotional processing in PTSD. Using the CAPS (Clinician Administered PTSD Scale) score, several studies have found a significant relationship between neuronal activity and severity of PTSD (Rauch, Whalen et al. 2000; Shin, Wright et al. 2005; Bryant, Kemp et al. 2007; Lanius, Frewen et al. 2007). These studies provide strong, consistent evidence that changes in neuronal activity in specific regions relate to severity of PTSD symptoms and response to treatment. Overall, most studies report increased activation of the amygdala and decreased activation of the ACC in PTSD with a variety of paradigms that use different types of emotionally evocative stimuli. Several studies also report decreased activity in the mPFC, yet it remains unclear if this is related to trauma exposure rather than PTSD. Seedat and colleagues found that reductions in PTSD severity following SSRI treatment negatively correlated with resting blood flow in the mPFC.
Given the role the ACC plays in emotional regulation and affective processing, reduced activation of the ACC is probably a major contributor to hyperactivation of the amygdala in PTSD (Liberzon and Martis 2006). Thus, the dysfunction of the emotional processing circuit composed of the medial prefrontal cortex, anterior cingulate, and amygdala appears to be the defining characteristic of PTSD neurobiology. The neuronal changes that arise in PTSD and, importantly, responsiveness to treatment can clearly be identified using fMRI.
Biological Markers and PTSD: The pathophysiologic complexity of psychiatric diseases renders diagnosis and treatment challenging. Nonetheless, a number of altered biological molecules and pathways have been identified as markers and used as pharmacological targets. Malfunctioning serotonin receptors have been associated with PTSD (van Praag 2004). Increased platelet serotonin has been described in PTSD associated with psychotic symptoms in war veterans, and is thus a trait marker (Pivac, Kozaric-Kovacic et al. 2006). The protein p11 is among the biomarkers for PTSD that the Traumatic Stress Brain Study Group has identified, finding that its mRNA expression was increased in postmortem PTSD patients as compared to matched controls (Svenningsson, Chergui et al. 2006). Spivak’s group assessed male outpatients with untreated chronic combat-related PTSD and showed that plasma DHEA and DHEA-S levels were significantly higher compared to controls. They concluded that the neurosteroid-induced decreased GABAergic tone could be used as a marker for chronic PTSD (Spivak, Maayan et al. 2000). In premenopausal women with PTSD, increased DHEA was associated with reduced avoidance and negative mood symptoms (Rasmusson, Vasek et al. 2004), suggesting that cortisol and DHEA could be the modulators of recovery from PTSD (Yehuda, Brand et al. 2006; Olff, de Vries et al. 2007).
Furthermore, inflammatory markers (CRP, SAA) and cytokines (interleukins 2, 6, and 8) have been proposed as markers for PTSD (Sondergaard, Hansson et al. 2004; Song, Zhou et al. 2007). For a multifactorial disease, a comprehensive and biologically meaningful analysis should be applied. Thus, we propose applying our novel analytical approach to high-throughput serum “OMICS” data using maximum parsimony phylogenetics.
Proteomics, Genomics (Omics) and PTSD: Molecular processes underlying psychiatric diseases cannot be further explained by traditional techniques, considering that the patient’s molecular bio-signature is responsible for the variable symptomatology, behavioral response to traumatic events, and response to treatment. Protein mass spectrometry (MS) and gene-expression microarray methodologies have been developed to facilitate the search for biomarkers and have only recently been used for neuropsychiatric diseases including depression, Alzheimer’s disease, and schizophrenia (Huang, Leweke et al. 2006; Davidsson, Westman-Brinkmalm et al. 2002; Brunner, Bronisch et al. 2005; Cassidy, Zhao et al. 2007). A recent study of SNPs in FKBP5, a gene involved in glucocorticoid receptor (GR) functioning, found there was an impact of early trauma on PTSD and an impact of PTSD and trauma on GR sensitivity (Binder, Bradley et al. 2008). We plan to generate omics data from patients’ blood specimens, and apply our novel functional, multidimensional, and dynamic method to analyze the high-throughput raw data to decipher response to treatment. We named our phylogenetic-based analytical method Phylomics.
Deciphering Complex Heterogeneous Biological Systems using Phylomics: Subjecting blood to a thorough MS or gene-microarray analysis generates tens of thousands of gene and protein data points. Current analytical methods, such as clustering, do not discriminate between baseline similarity (ancestral states) and what changed or mutated (derived states) to cause or reverse the disease state.
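The distinction between ancestral and derived states can be illustrated with a minimal sketch. It assumes one simple polarity rule that this proposal does not itself specify: a patient value counts as derived (1) when it falls outside the range observed in a reference group of non-diseased specimens, and as ancestral (0) otherwise. The function names `polarize` and `shared_derived` and the toy values are hypothetical and serve only as illustration.

```python
from typing import Dict, List


def polarize(reference: List[List[float]],
             patients: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Code each patient value as derived (1) or ancestral (0).

    Assumption for illustration: a value is derived when it lies outside
    the min-max range seen in the reference (non-diseased) specimens.
    """
    n_features = len(reference[0])
    low = [min(row[j] for row in reference) for j in range(n_features)]
    high = [max(row[j] for row in reference) for j in range(n_features)]
    return {
        pid: [0 if low[j] <= v <= high[j] else 1 for j, v in enumerate(values)]
        for pid, values in patients.items()
    }


def shared_derived(matrix: Dict[str, List[int]], members: List[str]) -> List[int]:
    """Return indices of features that are derived in every listed specimen."""
    n_features = len(next(iter(matrix.values())))
    return [j for j in range(n_features)
            if all(matrix[m][j] == 1 for m in members)]


# Toy data: three reference specimens and three patients, three features each.
reference = [[1.0, 5.0, 2.0], [1.2, 5.5, 2.1], [0.9, 4.8, 1.9]]
patients = {"P1": [1.1, 7.0, 2.0], "P2": [1.0, 6.8, 3.0], "P3": [1.1, 5.1, 2.0]}

matrix = polarize(reference, patients)
print(matrix)                                  # {'P1': [0, 1, 0], 'P2': [0, 1, 1], 'P3': [0, 0, 0]}
print(shared_derived(matrix, ["P1", "P2"]))    # [1]
```

In this toy case P1 and P2 share a derived state at feature 1, while P3 is indistinguishable from the reference group. A similarity-based clustering of the raw values would not by itself separate shared baseline values from shared changes.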
Phylomics is a universal data-mining platform capable of analyzing MS and gene-expression data to produce biologically meaningful classification (i.e., grouping together biologically related specimens). Phylogenetics has been widely used since the 1950s in classifying viruses, bacteria, fungi, plants, or animals based on their shared derived characters (DeLong and Pace 2001; Pillay, Rambaut et al. 2007; Organ, Schweitzer et al. 2008). The diagram depicting the classification is termed a cladogram, and the biomarkers defining each related group are identified as the synapomorphies (more definitions in Attachment 2). Patients with similar pathology share a specific set of molecular changes (synapomorphies) for every stage of the disease; this can be utilized to group patients into classes called clades on the basis of their shared derived molecular changes. Unfortunately, the biomedical field is still almost exclusively dominated by statistical approaches. We are the first group to apply parsimony phylogenetics to biomedicine and have a patent pending (Abu-Asab, Chaouchi et al. 2006; Abu-Asab, Chaouchi et al. 2008). No other method to date offers a multi-dimensional and dynamic analysis that is capable of deciphering the molecular bio-signature and response to treatment.
B. HYPOTHESES AND OBJECTIVES
The objective of this study is to compare a CAM (Complementary and Alternative Medicine) based imagery treatment for PTSD with exposure therapy. In addition, we will collect a number of neuroendocrine, genomic, and neuronal measures pre- and post-treatment, which will be used to determine the changes with treatment. Finally, the genomic/proteomic and baseline measures will be entered into a novel classification algorithm called ‘phylomics’ to distinguish PTSD treatment responders from non-responders a priori. We specifically predict that the imagery treatment will have an impact on PTSD symptom severity equivalent to exposure therapy.
We also hypothesize that the physiological impact of imagery treatment will be reflected as positive changes in the physiological stress markers. On a neuronal basis, we predict that activation in the anterior cingulate cortex will increase following the imagery treatment, corresponding to treatment success. Lastly, we expect that the phylomic algorithm will be able to accurately distinguish responders from non-responders based on their baseline profile. The results of this study will include 1) a controlled and systematic comparison of a CAM-based imagery treatment to the more accepted exposure-based therapy, 2) identification of the changes in physiological and neuronal markers that relate to treatment outcome, and 3) a test of a novel technique to predict treatment outcome. Any of these end results alone has the potential to make a significant impact on PTSD treatment and further our understanding of this debilitating disorder. Combined, this study represents a unique opportunity to fundamentally change our understanding of PTSD.
C. PRELIMINARY DATA
Our preliminary data are twofold. First, we show the effect of an imagery-based Mind-Body Medicine Skills program (MBMS) on salivary stress hormones measured in medical students. Second, we applied phylomics to analyze genetic cancer data.
Figure 1: Normalization of circadian cortisol levels before and after MBMS. (Panels: Morning and Evening; y-axis: Cortisol (ng/ml); x-axis: Pre and Post Mind Body Medicine; normal range marked.)
Measurements of Physiological Parameters before and after MBMS: An eleven-week elective MBMS course is offered to first year medical students, consisting of weekly two-hour meetings in groups of ten with their group facilitator and co-facilitator. AM and PM saliva specimens were collected from students pre- and post-MBMS intervention (n=24) and from a control group (n=38). Both genders were represented.
The Pre-MBMS collection occurred in early January and the Post-MBMS in May while the students were preparing for their final exams. Saliva was processed for cortisol and DHEA-S, hormones known for their involvement in stress-response.
Figure 2: AM cortisol and DHEA-S levels of students enrolled in MBMS and controls collected before and after completion of MBMS. (Panels: Cortisol (ng/ml) and DHEA-S (ng/ml); groups: Control and MBMS at Pre-Intervention and Post-Intervention (week 11); annotated significance levels: p<0.001, p<0.0001, p<0.01, NS.)
Analysis: Both hormones were measured using ELISA and statistically analyzed using log-transformed values. Pre- and post- data were analyzed using a one-sample paired t-test; pre- and post-control as well as post-MBMS were tested with a two-sample unpaired t-test (p<0.05).
Cortisol: Both groups started the semester with cortisol levels within the normal range. Three months later the MBMS participants remained within the normal range while controls had a 31% increase in AM values and 82% in PM values. Controls had a 43.5% (p<0.0001; 95%CI: [1.61 to –0.67]) increase in cortisol levels by semester’s end during final exams while MBMS participants maintained normal levels (Figure 2). The PM values followed the same pattern (Table 1). The MBMS program helped students maintain their stress hormone levels within the normal range. Furthermore, all abnormally low AM cortisol values were raised to normal levels with an average treatment-related increase of 7.3 fold. All individuals with a reversed circadian cortisol secretion pattern experienced a normalization of this adverse pattern following MBMS intervention. In subjects with both abnormally elevated AM and PM values in conjunction with a reversed secretion pattern, the evening value reduction was significantly higher than the morning effect (92% vs.
78%), thus restoring both normal range values as well as a physiological circadian distribution (Figure 1). MBMS intervention resulted in normalization to adequate AM peak cortisol values, restoration of the physiological cortisol secretion pattern, as well as significant reduction of elevated measurements across the daily spectrum.
DHEA-S: DHEA and DHEA-S, also known as active neurosteroids, tend to follow the cortisol patterns in response to stress (Figure 2). Although all students started with similar levels, by semester’s end controls had increased AM levels by 53% (p<0.002; 95%CI: [-1.60 to –0.38]) and PM values by 72% (p<0.0001; 95%CI: [1.65 to –0.59]). This study using saliva specimens demonstrates that MBMS intervention restored cortisol and DHEA-S to normal levels and that this effect was maintained throughout the semester.
Table 1: Summary of statistical analysis performed on log-transformed values.
Collection time | Cortisol AM (p value) | Cortisol PM (p value) | DHEA-S AM (p value) | DHEA-S PM (p value)
Pre-MBMS (n=24) | 1.92 ± 0.22 (0.89) | 0.49 ± 0.21 (0.33) | 1.02 ± 0.16 (0.86) | 0.86 ± 0.18 (0.21)
Pre-Control (n=16) | 1.50 ± 0.17 (0.20) | 0.89 ± 0.32 (0.71) | 0.44 ± 0.27 (0.19) | N/A*
95% CI limits | -1.05 to 0.23 | -0.87 to 0.31 | -0.80 to 0.55 | -1.07 to 0.23
Post-MBMS (n=24) | 1.88 ± 0.17 (0.002) | 0.21 ± 0.12 (0.0001) | 1.04 ± 0.21 (0.001) | 0.64 ± 0.19 (0.0001)
Post-Control (n=22) | 2.65 ± 0.15 (0.0001) | 1.15 ± 0.16 (0.001) | 1.88 ± 0.48 (0.0002) | 1.56 ± 0.10 (0.001)
Change (%) | 31 % | 82 % | 45 % | 59 %
95% CI limits | -1.23 to -0.3 | -1.36 to -0.53 | -1.32 to -0.35 | -1.32 to -0.35
* Data did not follow normal distribution after log transformation.
Phylomics analysis is capable of early detection and risk assessment
This example shows a phylomic analysis of serum proteomics from normal and prostate cancer specimens, based on MS data of 36 prostate cancer specimens and 49 non-cancerous specimens from the NCI Clinical Proteomics Program (Petricoin, Paweletz et al. 2002; Zhu, Wang et al. 2003; Abu-Asab, Chaouchi et al. 2006).
The cladogram shows a hierarchical classification of prostate cancer specimens. Each segment of the cladogram denotes a specimen (Farris 1970). Each node on the cladogram is defined by the shared derived state(s) among specimens in one of the segments. Topology of the cladogram also conveys general trends within the data that are not obvious from other types of analysis (Abu-Asab, Chaouchi et al. 2006). We found three distinct sections of the cladogram (Figure 3): the basal contains most of the normal specimens (green); the middle has transitional specimens between the normal and cancer that could represent the “at risk” subjects (blue); and the upper section has the cancerous ones (red).
Figure 3: Most parsimonious cladogram for prostate cancer based on serum proteomics from 36 prostate cancer and 49 healthy men. Specimens had 15144 m/z data points. Lines on right side represent specimens. Red clade corresponds to cancerous (independently assessed); Green indicates healthy; and Blue is presumed healthy but is a transitional zone between healthy and cancerous clades.
Because phylomics plots specimen classification on a hierarchical continuum, it is the first analytical tool to identify transitional specimens (transitional clades) nested distinctly between cancer and non-cancerous main clades, which most likely represent individuals at risk of developing cancer or recovering from treatment. This makes the cladogram a very useful tool to identify the transitional patterns from healthy to cancerous tissue by directly modeling the data with minimal restrictive assumptions, and possibly renders it a predictive tool for early disease detection and risk assessment.
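To make "most parsimonious" concrete, the following sketch counts the number of state changes that a candidate tree requires to explain a binary (ancestral 0 / derived 1) character matrix, using Fitch's small-parsimony algorithm. This is an illustrative assumption: the proposal does not state which parsimony algorithm or software phylomics uses, and the specimen names, trees, and matrix below are hypothetical toy data.

```python
from typing import Dict, List, Tuple, Union

# A rooted binary tree: a leaf is a specimen name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, states: Dict[str, int]) -> int:
    """Minimum number of state changes for one character on a rooted binary tree."""
    changes = 0

    def visit(node: Tree) -> set:
        nonlocal changes
        if isinstance(node, str):           # leaf: its observed state
            return {states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        if a & b:                           # children agree: no change needed
            return a & b
        changes += 1                        # children disagree: one change
        return a | b

    visit(tree)
    return changes


def tree_length(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Total number of changes over all characters; lower is more parsimonious."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {name: row[j] for name, row in matrix.items()})
        for j in range(n_chars)
    )


# Toy matrix: two non-cancerous (N1, N2) and two cancerous (C1, C2) specimens.
matrix = {"N1": [0, 0], "N2": [0, 0], "C1": [1, 1], "C2": [1, 0]}
tree_a = (("N1", "N2"), ("C1", "C2"))   # groups specimens by shared derived state
tree_b = (("N1", "C1"), ("N2", "C2"))   # mixes the two groups

print(tree_length(tree_a, matrix))  # 2
print(tree_length(tree_b, matrix))  # 3
```

Here `tree_a` needs fewer changes than `tree_b`, so it would be preferred under the parsimony criterion. A real analysis searches over many candidate trees and thousands of characters, which is beyond this sketch.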
Robustness of Phylomics in analyzing multiple datasets
To illustrate the robustness of phylomics, we carried out a comprehensive analysis combining polarized matrices of 460 specimens representing ovarian (n=143), pancreatic (n=70), and prostate (n=36) cancers as well as non-cancerous specimens (n=211) from the NCI Clinical Proteomics Program. Analysis yielded a consensus cladogram where each of the three cancers formed two large clades (the terminal and middle), and numerous small transitional clades adjacent to non-cancerous clades (Figure 4). Pancreatic and prostate clades formed sister groups in their terminal and middle clades, and their terminal clades were nested within the ovarian clades’ dichotomy. Ovarian specimens formed two distinct clades. A set of transitional clades for each cancer type formed between normal and large cancer clades (brown). Transitional clades of each cancer type did not commingle with those of other clades.
Figure 4: Phylomic analysis of cancer types.
Figure 5: Hypothesized output of phylomics applied to PTSD, derived from the neuronal, physiological, and proteomic/genomic signature of individual subjects.
Significance of our findings for the Proposed Application: Our data showed the beneficial effects of imagery-based MBMS on stress in healthy subjects. We showed that after MBMS, cortisol levels normalized in the students that had low AM or high PM cortisol. Elevated DHEA and DHEA-S have been suggested as clinical correlates associated with PTSD (Spivak, Maayan et al. 2000). These two hormones are active neurosteroids that tend to reduce GABAergic tone (Spivak, Maayan et al. 2000). Thus, normalizing their levels could play a role in PTSD symptom improvement. Our data suggest that an MBMS imagery-based program could be an efficacious treatment for PTSD.
We have also demonstrated that phylomics is a robust analytical tool that offers a novel method to analyze high-throughput MS and gene-expression microarray data, resulting in biologically meaningful relationships between subjects sharing similar molecular bio-signatures in a hierarchical, dynamic, and multi-dimensional fashion. If applied to PTSD, we expect to be able to identify those subjects who are predisposed to developing PTSD if exposed to traumatic events (at-risk group) and to distinguish treatment responders from non-responders (Figure 5). Furthermore, since it is a dynamic analysis, we expect to be able to follow the response to treatment of each patient by analyzing changes that occur during treatment leading to improvement. This will translate into rearrangements of subjects in the responder/non-responder clades. PTSD is a multifactorial disease that cannot be characterized by only one or two biomarkers as is traditionally done.
D. SPECIFIC AIMS
Previous studies of CAM-based treatment for PTSD have demonstrated positive outcomes for victims of war-related trauma in Kosovo (Gordon, Staples et al. 2004). These studies, while useful, lacked comparison to an accepted treatment modality such as exposure therapy. Furthermore, the neuronal/physiological mechanisms that underlie treatment effects have not been identified. We propose to compare a positive mental imagery technique called Guided Imagery (Naparstek 2004) to Prolonged Exposure, which uses mental imagery to revisit the traumatic event. We further propose to use neuroimaging, physiological, and proteomic/genomic data as input to the phylomic algorithm to predict treatment response.
Thus, this study will produce three main outcomes: 1) test the efficacy of a CAM-based imagery treatment (Guided Imagery) against the established treatment (Prolonged Exposure), 2) further elucidate the neurological/physiological mechanisms underlying PTSD and subsequent changes related to treatment, and 3) test the ability of phylomics, a novel classification algorithm, to distinguish PTSD treatment responders from non-responders. We therefore propose the following specific aims:
Aim 1: To compare Guided Imagery against Prolonged Exposure therapy. These two treatment modalities are similar in structure but differ in terms of emphasis on positive (Guided Imagery) and traumatic (Prolonged Exposure) imagery. We hypothesize that Guided Imagery will be as effective as exposure therapy in reducing PTSD symptoms. Specifically, we hypothesize that at post-treatment, mean symptom scores will differ by no more than 0.5 standard deviations between groups, and will differ by at least 0.4 SD units from pre-treatment.
Aim 2: To identify the impact of both treatment regimes on neuroendocrine markers of stress and neurobiological changes. We will focus on both central and peripheral pathways: glucocorticoids and catecholamines as peripheral-sympatho-adrenal markers, and the neuroactive steroids (DHEA, DHEA-S) and neuropeptide Y (both measurable in plasma), known for their anxiolytic action and role in stress physiology. We hypothesize that Guided Imagery treatment will result in positive changes in each of these stress markers.
Aim 3: To determine the neurobiological changes that occur with treatment. Previous fMRI studies of PTSD have generally reported a decrease in activity in the medial prefrontal cortex (mPFC) and the anterior cingulate cortex (ACC) with a corresponding increase in the amygdala. Activity in the mPFC and ACC has also been reported to negatively correlate with PTSD severity.
We predict that activation in the ACC during an SDI task will increase following treatment while the mPFC and amygdala will remain unchanged.
Aim 4: To methodically distinguish treatment responders from non-responders using phylomics. The baseline measures of stress, both neuroendocrine and neuronal activity, in addition to the genetic profile, will be used as input to the phylomics classification algorithm. We expect that this algorithm will be able to distinguish responders from non-responders based on their baseline profile.
The relevance of the proposed study to treatment of PTSD is 1) a systematic comparison of a CAM-based treatment to the more accepted exposure-based therapy, 2) identification of changes in neural markers that relate to treatment outcome, and 3) a test of a novel technique to predict treatment outcome.
E. RESEARCH STRATEGY
1. Experimental Design
The experimental design centers on comparing a CAM-based Guided Imagery intervention to the more standard exposure therapy. Subjects entering the study will be randomized into one of the two arms. Baseline assessments will include administration of a number of clinical assessment instruments, collection of saliva and blood specimens, and fMRI imaging. Follow-up assessments include saliva collection, blood draw, and fMRI scanning performed within two weeks of the last treatment session. PTSD symptom severity will be assessed using the CAPS no more than four weeks after the last treatment. Details of each of these procedures are described below.
2. Subject Selection
Subjects will be recruited through the VA with the assistance of Drs. Richard Amdur and Marc Blackman. Other avenues will also be pursued, including working with physicians from Walter Reed Medical Center and the National Naval Medical Center to provide referrals. In addition, we will work with local veterans organizations to assist with disseminating information about this study.
Participants: Male and female subjects 18-55 years who have direct combat experience and meet criteria for combat-related PTSD will be considered for this study.
Inclusion Criteria: All subjects in the PTSD cohorts must meet the diagnosis of PTSD based on a clinical interview assessment using the DSM-IV-R criteria. Subjects will be screened to include only those with combat-related trauma. Subjects will be selected only if they do not plan to seek other treatment during the study period. If this proves to be impractical, this restriction will be eliminated.
Exclusion Criteria: Subjects who are younger than 18 or older than 55 years of age will be excluded. Additional exclusion criteria: total WASI IQ score < 85, less than strongly right-handed (Edinburgh < 90), diagnosis of psychosis, history of previous psychiatric treatment other than for PTSD, current psychotropic drug use, overt neurological injury or disease, seizure disorder, and mood disorders. Subjects who have suffered traumatic brain injury (TBI), a closed head injury, a concussion, or been knocked unconscious for a period of time as a result of head injury will be excluded. Finally, individuals will be excluded due to psychosis, mania, current suicidal ideation and substance abuse/dependence as determined by screening procedures.
3. Safety Concerns
Due to safety concerns, participants with psychosis, mania, current suicidal ideation and substance abuse/dependence will be excluded from the study. Subjects will be monitored by interventionists for emerging problems in these areas. All subjects will be given referral information for crisis concerns, and a study contact number will be answered 24/7 by a study team member. Special care will be taken in screening all subjects for ferromagnetic metallic objects, implants, and shrapnel to ensure subject safety in the MRI scanner. A thorough review consists of a standard 60-question list including injury with metallic objects.
Medical records are examined to verify that all implants are MRI-compatible.
4. Description of Data Collection Procedures
Subject Assignment and Intervention:
Randomization Procedure: Subjects will be randomized using permuted blocks of sizes 2, 4 and 6 (Friedman, Furberg et al. 1998), providing the best opportunity of maximizing the benefits of randomization. Once a participant completes the baseline assessment and is determined to be eligible, he/she will receive the next random number assignment. This process will be conducted by a data manager who is otherwise not involved in the study.
Intervention Conditions: Both Guided Imagery and Prolonged Exposure will be delivered in 11 weekly 90-minute audiotaped sessions. Manuals will guide interventionists. 1) Participants in the Guided Imagery arm will receive treatment according to the intervention developed by Naparstek (Naparstek 2004) using a 3-stage approach involving stabilization and self-soothing, cognitive and emotional integration, and long-term functioning. 2) Participants in the Prolonged Exposure arm will receive the treatment for PTSD used by Foa (Foa, Dancu et al. 1999; Foa, Hembree et al. 2005), Schnurr (Schnurr, Friedman et al. 2007) and others. Assessment of symptoms is routinely conducted with self-report instruments to monitor progress and assess safety.
Interventionists: Interventionists will be mental health clinicians, each experienced in delivering either Guided Imagery or Prolonged Exposure and in working with veteran populations.
Adherence Measures: We will use measures that capture the essential techniques of both interventions, including items related to techniques that typify each intervention as well as those that should not be used in it. Independent raters reviewing audiotapes of sessions will make ratings using this measure.
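The permuted-block scheme named under Randomization Procedure can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the study's actual allocation software: the function name `permuted_block_schedule`, the arm labels, and the seed are hypothetical, and we assume each block is drawn at random from sizes 2, 4 and 6 and contains equal numbers of each arm.

```python
import random


def permuted_block_schedule(n_subjects, block_sizes=(2, 4, 6),
                            arms=("Guided Imagery", "Prolonged Exposure"),
                            seed=None):
    """Return an allocation list built from complete, balanced permuted blocks.

    Hypothetical helper: block sizes are chosen at random, each block holds
    an equal number of assignments per arm, and the order within a block is
    shuffled. The list may run slightly past n_subjects because only whole
    blocks are generated.
    """
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        size = rng.choice(block_sizes)            # random block size
        block = list(arms) * (size // len(arms))  # equal allocation per arm
        rng.shuffle(block)                        # permute within the block
        schedule.extend(block)
    return schedule


if __name__ == "__main__":
    # The data manager would hand out assignments in order, one per
    # eligible participant.
    allocation = permuted_block_schedule(101, seed=2008)
    print(allocation[:8])
```

Because every block is balanced, the two arms never differ by more than half of the largest block (three subjects) at any point in enrollment, while varying the block size makes the next assignment harder to anticipate.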
Competence measures will include items assessing the skill with which the interventionist phrased interventions, the timing of the interventions, and the appropriateness of comments at the time they were given, as well as nonspecific items such as his/her degree of warmth and supportiveness. The same independent rater who assessed adherence will rate this measure, based on the same audiotapes.
fMRI Stimulation Paradigm: The paradigms optimized for fMRI will include a script driven imagery (SDI) task and an emotional counting Stroop. The Script Driven Imagery (SDI) task will present short scenarios, delivered by auditory presentation, describing the subject's traumatic event as gathered through an interview. Scripts will alternate with a neutral story in a block design. After each script, subjects use an MRI-compatible joystick to rate the intensity of the sensations experienced on a scale of 1 to 7. The Emotional Counting Stroop uses negative words that include both trauma-related and non-trauma-related negative associations. This provides an excellent assessment of limbic emotional regulation compared to executive control. To isolate the changes specific to emotional stimuli, the classical Stroop task is also used. Both Stroop tasks will be presented using a rapid event-related design.
Proteomic and Genomic Assessments:
Proteomics: Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is one of a large collection of high-throughput technologies utilized for proteomics studies. The resulting mass spectra consist of mass-to-charge (m/z) ratio values in which the intensity of the peaks is correlated with the peptide concentration in the analyzed fraction. All data will be analyzed with UNIPAL as previously described (Abu-Asab, Chaouchi et al. 2006; Abu-Asab, Chaouchi et al. 2008).
Protein expression level: Western blotting - Protein expression and quantification are determined using the corresponding antibody of the targeted protein.
Beta-actin is used to normalize band quantification. Signals are detected via chemiluminescence as described before (Amri, Ogwuegbu et al. 1996). ELISA: standard kits are used for NPY, DHEA, DHEA-S, cortisol and catecholamines.
Genomics: Gene-expression microarray - Peripheral blood will be collected directly into PAXgene tubes (Qiagen), which stabilize and protect RNA from degradation, then frozen at -80°C until use. Total RNA is isolated with an RNeasy Mini Kit (Qiagen). RNA samples, including RNA amount and integrity, are analyzed using the fully integrated Affymetrix GeneChip Instrument System. Gene array analysis uses a solid phase assay, followed by RT-PCR and Northern blot analysis to eliminate false positive results. For real-time RT-PCR, RNA will be isolated using a High Pure RNA Isolation Kit (Roche Diagnostic) and cDNA synthesized by iScript (BioRad). RT-PCR will be performed using the iCycler iQ Detection System (BioRad) and the TaqMan PCR Reagent Kit with pre-designed primers and fluorescein-labeled probes from Applied Biosystems (Foster City, CA).
Mental Health Assessments:
Screening: The Structured Clinical Interview for DSM-IV (SCID) (First et al., 1994; Spitzer et al., 1992) is an interview for both past and current Axis I diagnoses based on DSM-IV. Selected modules are used to screen out subjects with current substance abuse or dependence, lifetime or current psychosis, and bipolar disorder. If participants endorse the suicide item, their level of intent will be explored by the trained assessor. If participants endorse current suicidal intent, they will be excluded from the study and referred to the first available on-site physician for disposition. The SCID has adequate test-retest reliability. Kappa values in patient samples were .61 for current and .68 for lifetime diagnoses for most of the major categories (e.g., bipolar disorder, alcohol abuse/dependence, major depressive disorder).
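For readers less familiar with the kappa statistic quoted above for SCID test-retest reliability, the following minimal sketch shows how Cohen's kappa corrects raw agreement for agreement expected by chance. The function name `cohens_kappa` and the ten paired diagnoses are hypothetical illustrations, not data from the SCID reliability studies.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: proportion of cases where the two ratings match.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal proportions, summed over
    # categories.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical test and retest diagnoses (1 = present, 0 = absent).
    test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    retest = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(round(cohens_kappa(test, retest), 2))  # prints 0.58
```

In this made-up example the raw agreement is 80%, but because 52% agreement would be expected by chance alone, kappa is only about 0.58, which is the same order as the SCID values cited.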
Combat Exposure: The Combat Exposure Scale (CES) (Keane, Fairbank et al. 1989) is a 7-item self-report measure to assess wartime stressors experienced by combatants. Items are rated on a 5-point frequency (1 = “no/never” to 5 = “> 50 times”), 5-point duration (1 = “never” to 5 = “> 6 months”), 4-point frequency (1 = “no” to 4 = “more than 12 times”) or 4-point degree of loss (1 = “no one” to 4 = “more than 50%”) scale. The total CES score (ranging from 0 to 41) is calculated as a sum of weighted scores and can be classified into 1 of 5 categories of combat exposure ranging from “light” to “heavy.” The CES will be used at baseline.
PTSD: The Clinician Administered PTSD Scale (CAPS) (Blake, Weathers et al. 1995) will be used to diagnose current PTSD and assess severity of PTSD symptoms. This scale assesses the frequency and intensity of the 17 symptoms in the DSM-IV PTSD criteria. A frequency rating of at least 1 on a 0 to 4 scale ("once or twice within the past month") and an intensity rating of at least 2 on a 0 to 4 scale ("moderate") will qualify for presence of the symptom for diagnostic purposes. Studies with combat veterans have demonstrated the reliability and validity of the CAPS (Weathers et al, 1992a, 1992b; Weathers, Blake & Litz, 1991). Internal consistency (α coefficient) was estimated to be 0.94 for severity score (frequency and intensity) and test-retest reliability ranged from 0.90 to 0.98. CAPS total severity score correlated with other established measures of PTSD, suggesting good convergent validity. The CAPS will be completed at baseline and at completion of treatment.
Depression: The Patient Health Questionnaire (PHQ), the self-report version of the Primary Care Evaluation of Mental Disorders (PRIME-MD) (Spitzer, Kroenke et al. 1999), will be used to assess depression, a common comorbid condition.
This instrument has good psychometric properties: for the diagnosis of any psychiatric disorder, κ = 0.71 and the overall accuracy rate is 88% (Spitzer & Williams, 1994).
Participant Feedback: At the conclusion of the trial, participants will be invited to provide feedback during post-intervention interviews with study staff regarding their interest in the intervention, comfort level, consistency with cultural values, and perceived utility of the intervention, as well as suggestions for change to increase utility of the intervention. This information will be used to enhance interpretation of quantitative data.
5. Data Acquisition and Analysis
Functional MRI Data Acquisition: The fMRI data will be acquired on the research-dedicated Siemens Trio 3.0T scanner, with gradients suitable for echo-planar imaging sequences, located at Georgetown University. A whole-brain high-resolution T1-weighted scan with an effective resolution of 1.0 mm³ is acquired to assess brain morphology and to localize functional results. Functional MRI scans will be acquired using an EPI (echo-planar imaging) scan with a 2 s TR, 30 slices, and an effective resolution of 3.0 mm³.
Physiological and Behavioral Monitoring: Assessment of the subject’s mental state during the scanning sessions will utilize the Invivo Millennia (Invivo Research, Orlando, FL) physiological monitoring system to digitally record heart rate (ECG), respiration, and pulse oximetry (SpO2). Galvanic skin response will also be recorded using the MRA GSR system (MRA, Washington, PA). The physiological measures will be integrated into the fMRI data analysis to identify neuronal responses related to the changes in these measures.
Functional MRI Data Analysis: We currently use a combination of tools for individual and group analyses, including statistical parametric mapping (SPM), FSL, AFNI, and MEDx, which was developed in large part under the direction and design of Dr. VanMeter.
Pre-processing of fMRI data includes Correction for Geometric Distortions, which occur due to inhomogeneities in the scanner’s static magnetic field, using a field map (Jezzard and Balaban 1995). Head Motion Correction uses rigid-body transformations (Woods, Grafton et al. 1998). High-pass filtering removes artifactual low-frequency Signal Drift. Spatial Normalization transforms each subject’s images into a standard coordinate system via nonlinear transformations, which allows for inter-subject averaging to improve statistical sensitivity (Ashburner and Friston 1997; Woods, Grafton et al. 1998). Spatial Smoothing is applied to remove noise locally within the images and to allow for statistical inference using Gaussian random field theory (Worsley 2005).
fMRI Statistical Analysis: We will use a Mixed-Effects Statistical Analysis technique that consists of two stages (Strange, Portas et al. 1999; Penny, Holmes et al. 2003). The first-level analysis uses a fixed-effects single-subject analysis, followed by a second-level analysis that uses a random-effects group analysis on the summary statistical images from the first-level analysis. Correction for Multiple Comparisons will use Gaussian random field theory, which takes into account not only the multiplicity of simultaneous tests but also the spatial smoothness of the data (Friston, Worsley et al. 1994; Worsley, Marrett et al. 1996). An alternative method for addressing this issue uses the false discovery rate (FDR) (Genovese, Lazar et al. 2002).
Phylomics – Computational application: As a computational platform, phylomics encompasses two universal algorithms that are run consecutively to produce the classification of specimens. First, UNIPAL (Universal Parsing Algorithm) carries out polarity assessment of data points. This program was developed by the investigators to perform outgroup comparison on the specimens.
Second, MIX, a maximum parsimony program, carries out the Wagner and Camin-Sokal parsimony methods (Felsenstein 1989). MIX produces the most parsimonious cladogram for a dataset.
Statistical analysis - Biochemical and molecular experiments: To compare measured expression in patients’ specimens, normality and homoscedasticity are checked and appropriate transformations applied (log or arcsine-square root). If the transformed results follow a normal distribution and are homoscedastic, one-way ANOVA will be used to compare the mean values. Otherwise the Wilcoxon rank-sum test for 2 groups will be used. For the quantitative expression of a defined molecular target measured by Western, RT-PCR, or Northern analysis, the data will be divided into negative and positive, followed by appropriate tests and transformations. Parametric tests will be used when possible and non-parametric tests otherwise.
6. Potential Problems and Alternatives
Budgetary limitations allow us to include assessments at baseline and post-intervention only. If funded, we will seek additional outside funding to permit assessments at 3, 6, and 12 months. It is possible that subjects become distressed during the intervention sessions or during the assessments or other experimental procedures (e.g., SDI). We plan to conduct intervention sessions at the GCRC (General Clinical Research Center), thus allowing for readily available back-up medical and psychiatric staff, who are also available to respond in the Imaging Center as needed. The multiple endpoints of this study outside of the intervention will ensure valuable results even if the Guided Imagery treatment is not successful, which in and of itself is an important outcome.
7. Statistical Power Analyses
To test the equivalence of the Imagery and Control treatments on PTSD symptom severity at post-treatment, we will do an intention-to-treat analysis using 1-way ANOVA with H0: the two group means are no more than 0.5 SD units apart; H1: the exposure intervention is superior by at least 0.5 SD units. In order to have power > .80 to test this directional hypothesis, we would need a total N of 101, or 51 per group, with alpha = .05, using a 1-tailed test, based on G*Power 3 (Faul, Erdfelder et al. 2007). To minimize problems of assay sensitivity and biased end-point ratings inherent in noninferiority trials (Snapinn 2000), we will use a repeated-measures ANOVA, testing equivalence of pre-post-treatment change in PTSD symptom severity between arms. With N=101, assuming a pre-post r of .50, the power will be > .99 to detect an interaction and a time effect in which each explains 10% of the total within-group variance. To demonstrate efficacy of the experimental intervention, we will use a paired t-test. Assuming a pre-post r of .50, with n=51 in the Imagery group, this test would have power > .80 to detect pre-post change of 0.4 SD units (with alpha=.05, 2-tailed).
[Figure: G*Power plot of power (1 - β err prob, 0.2 to 1) against total sample size (10 to 100) for a t test of the difference between two dependent means (matched pairs); Tail(s) = Two, α err prob = 0.05, effect size dz = 0.4.]
F. SUMMARY
This proposal brings together psychologists who specialize in PTSD treatment (Drs. Dutton and Amdur) with basic scientists who study the ability of CAM-types of treatment to reduce stress using neurophysiological quantitative measures (Dr. Amri) and the neurobiological basis of various disorders (Dr. VanMeter). While these individuals come from very different backgrounds, the three PIs have worked together as facilitators in the MBMS program of stress management developed for use in the School of Medicine.
In addition, the three PI’s have two current collaborations underway examining the effect of various CAM modalities on stress biomarkers and a neuroimaging study of PTSD. The proposed study utilizes a truly synergistic approach that leverages this group of investigators unique talents to examine the efficacy of a CAM-based treatment (Guided Imagery) compared to a traditional treatment (exposure therapy). In addition, this study will assess changes in physiological measures of stress and the underlying neuronal patterns of activity as a function of the two treatments. Lastly, the baseline neuroendocrine, genomic, and neuronal patterns will be used to classify a priori treatment responders from non-responders using phylomics (patent pending) developed by Dr. Amri. The output of the phylomic algorithm will provide not only a neurophysiological/genomic signature of PTSD but also a stratified classification of subjects that could be used to target treatments to specific individuals suffering from PTSD. Overall, this study has the potential to make a major impact on the field of PTSD and its treatment. References: Abu-Asab, M., M. Chaouchi, et al. (2006). "Phyloproteomics: what phylogenetic analysis reveals about serum proteomics." J Proteome Res 5(9): 2236-40. Abu-Asab, M., M. Chaouchi, et al. (2008). "Evolutionary medicine: A meaningful connection between omics, disease, and treatment." Proteomics Clin Appl 2(2): 122-134. Amri, H., S. O. Ogwuegbu, et al. (1996). "In vivo regulation of peripheral-type benzodiazepine receptor and glucocorticoid synthesis by Ginkgo biloba extract EGb 761 and isolated ginkgolides." Endocrinology 137(12): 5707-18. Ashburner, J. and K. Friston (1997). "Multimodal image coregistration and partitioning--a unified framework." Neuroimage 6(3): 209-17. Bandettini, P. A., E. C. Wong, et al. (1992). "Time course EPI of human brain function during task activation." Magn Reson Med 25(2): 390-7. Belliveau, J. W., D. N. Kennedy, Jr., et al. (1991). 
"Functional mapping of the human visual cortex by magnetic resonance imaging." Science 254(5032): 716-9. Binder, E. B., R. G. Bradley, et al. (2008). "Association of FKBP5 polymorphisms and childhood abuse with risk of posttraumatic stress disorder symptoms in adults." Jama 299(11): 1291-305. Blake, D. D., F. W. Weathers, et al. (1995). "The development of a Clinician-Administered PTSD Scale." Journal of Traumatic Stress 8: 75-90. Bremner, J. D., E. Vermetten, et al. (2004). "Neural correlates of the classic color and emotional stroop in women with abuse-related posttraumatic stress disorder." Biol Psychiatry 55(6): 612-20. Britton, J. C., K. L. Phan, et al. (2005). "Corticolimbic blood flow in posttraumatic stress disorder during scriptdriven imagery." Biol Psychiatry 57(8): 832-40. Brunner, J., T. Bronisch, et al. (2005). "Proteomic analysis of the CSF in unmedicated patients with major depressive disorder reveals alterations in suicide attempters." Eur Arch Psychiatry Clin Neurosci 255(6): 438-40. Bryant, R. A., A. H. Kemp, et al. (2007). "Enhanced amygdala and medial prefrontal activation during nonconscious processing of fear in posttraumatic stress disorder: An fMRI study." Hum Brain Mapp. Cassidy, F., C. Zhao, et al. (2007). "Genome-wide scan of bipolar disorder and investigation of population stratification effects on linkage: support for susceptibility loci at 4q21, 7q36, 9p21, 12q24, 14q24, and 16p13." Am J Med Genet B Neuropsychiatr Genet 144(6): 791-801. Clinical Practice Guideline Workgroup (2004). VA/DoD Clinical Practic Guideline for the Management of Post-Traumatic Stress, Department of Veterans Affairs and Health Affairs, Department of Defense. Davidsson, P., A. Westman-Brinkmalm, et al. (2002). "Proteome analysis of cerebrospinal fluid proteins in Alzheimer patients." Neuroreport 13(5): 611-5. DeLong, E. F. and N. R. Pace (2001). "Environmental diversity of bacteria and archaea." Syst Biol 50(4): 4708. Eden, G. F., J. W. VanMeter, et al. (1996). 
"Abnormal processing of visual motion in dyslexia revealed by functional brain imaging." Nature 382(6586): 66-9. Farris, J. S., A. G. Kluge, and M. J. Eckhart (1970). "On predictivity and efficiency." Systematic Zoology 19: 363-372. Faul, F., E. Erdfelder, et al. (2007). "G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences." Behav Res Methods 39(2): 175-91. Felsenstein, J. (1989). "PHYLIP: Phylogeny Inference Package (version 3.2)." Cladistics: 164-166. Foa, E. B., C. V. Dancu, et al. (1999). "A comparison of exposure therapy, stress inoculation training, and their combination for reducing posttraumatic stress disorder in female assault victims." J Consult Clin Psychol 67(2): 194-200. Foa, E. B., E. A. Hembree, et al. (2005). "Randomized trial of prolonged exposure for posttraumatic stress disorder with and without cognitive restructuring: outcome at academic and community clinics." J Consult Clin Psychol 73(5): 953-64. Frahm, J., H. Bruhn, et al. (1992). "Dynamic MR imaging of human brain oxygenation during rest and photic stimulation." J Magn Reson Imaging 2(5): 501-5. Friedman, L. M., C. D. Furberg, et al. (1998). Fundamentals of clinical trials, 3rd ed. New York, Springer. Friston, K. J., K. J. Worsley, et al. (1994). "Assessing the significance of focal activations using their spatial extent." Human Brain Mapping 1: 214-220. Genovese, C. R., N. A. Lazar, et al. (2002). "Thresholding of statistical maps in functional neuroimaging using the false discovery rate." Neuroimage 15(4): 870-8. Gordon, J. S., J. K. Staples, et al. (2004). "Treatment of posttraumatic stress disorder in postwar Kosovo high school students using mind-body skills groups: a pilot study." J Trauma Stress 17(2): 143-7. Hoge, C. W. (2006). "Deployment to the Iraq war and neuropsychological sequelae." Jama 296(22): 2678-9; author reply 2679-80. Hoge, C. W., J. L. Auchterlonie, et al. (2006). 
"Mental health problems, use of mental health services, and attrition from military service after returning from deployment to Iraq or Afghanistan." Jama 295(9): 1023-32. Hoge, C. W., C. A. Castro, et al. (2004). "Combat duty in Iraq and Afghanistan, mental health problems, and barriers to care." N Engl J Med 351(1): 13-22. Hoge, C. W., A. Terhakopian, et al. (2007). "Association of posttraumatic stress disorder with somatic symptoms, health care visits, and absenteeism among Iraq war veterans." Am J Psychiatry 164(1): 1503. Huang, J. T., F. M. Leweke, et al. (2006). "Disease biomarkers in cerebrospinal fluid of patients with first-onset psychosis." PLoS Med 3(11): e428. Institute of Medicine: Committee on Treatment of Posttraumatic Stress Disorder (2007). Treatment of Posttraumatic Stress Disorder: An Assessment of the Evidence, The National Academies Sciences. Jezzard, P. and R. S. Balaban (1995). "Correction for geometric distortion in echo planar images from B0 field variations." Magn Reson Med 34(1): 65-73. Keane, T., J. Fairbank, et al. (1989). "Clinical evaluation of a measure to assess combat exposure ." Psychological Assessment 1(53-55). Lanius, R. A., P. A. Frewen, et al. (2007). "Neural correlates of trauma script-imagery in posttraumatic stress disorder with and without comorbid major depression: a functional MRI investigation." Psychiatry Res 155(1): 45-56. Liberzon, I. and B. Martis (2006). "Neuroimaging studies of emotional responses in PTSD." Ann N Y Acad Sci 1071: 87-109. Milliken, C. S., J. L. Auchterlonie, et al. (2007). "Longitudinal Assessment of Mental Health Problems Among Active and Reserve Component Soldiers Returning From the Iraq War." Jama 298(18): 2141-2148. Naparstek, B. (2004). Invisible heroes: Survivors of trauma and how they heal. New York, Bantam Dell. Office of Inspector General (2005). Review of State Variances in VA Disability Compensation Payments, Department of Veterans Affairs,: vii. Ogawa, S., T. M. Lee, et al. (1990). 
"Oxygenation-sensitive contrast in magnetic resonance image of rodent brain at high magnetic fields." Magn Reson Med 14(1): 68-78. Ogawa, S., D. W. Tank, et al. (1992). "Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging." Proc Natl Acad Sci U S A 89(13): 5951-5. Olff, M., G. J. de Vries, et al. (2007). "Changes in cortisol and DHEA plasma levels after psychotherapy for PTSD." Psychoneuroendocrinology 32(6): 619-26. Organ, C. L., M. H. Schweitzer, et al. (2008). "Molecular phylogenetics of mastodon and Tyrannosaurus rex." Science 320(5875): 499. Penny, W. D., A. P. Holmes, et al. (2003). Random effects analysis. Human Brain Function. R. S. J. Frackowiak, K. J. Friston, C. Frithet al, Academic Press. Petricoin, E. E., C. P. Paweletz, et al. (2002). "Clinical applications of proteomics: proteomic pattern diagnostics." J Mammary Gland Biol Neoplasia 7(4): 433-40. Pillay, D., A. Rambaut, et al. (2007). "HIV phylogenetics." Bmj 335(7618): 460-1. Pivac, N., D. Kozaric-Kovacic, et al. (2006). "Platelet serotonin in combat related posttraumatic stress disorder with psychotic symptoms." J Affect Disord 93(1-3): 223-7. Rasmusson, A. M., J. Vasek, et al. (2004). "An increased capacity for adrenal DHEA release is associated with decreased avoidance and negative mood symptoms in women with PTSD." Neuropsychopharmacology 29(8): 1546-57. Rauch, S. L., P. J. Whalen, et al. (2000). "Exaggerated amygdala response to masked facial stimuli in posttraumatic stress disorder: a functional MRI study." Biol Psychiatry 47(9): 769-76. Schnurr, P. P., M. J. Friedman, et al. (2007). "Cognitive behavioral therapy for posttraumatic stress disorder in women: a randomized controlled trial." Jama 297(8): 820-30. Shin, L. M., R. J. McNally, et al. (1999). "Regional cerebral blood flow during script-driven imagery in childhood sexual abuse-related PTSD: A PET investigation." Am J Psychiatry 156(4): 575-84. Shin, L. M., C. I. 
Wright, et al. (2005). "A functional magnetic resonance imaging study of amygdala and medial prefrontal cortex responses to overtly presented fearful faces in posttraumatic stress disorder." Arch Gen Psychiatry 62(3): 273-81. Snapinn, S. M. (2000). "Noninferiority trials." Curr Control Trials Cardiovasc Med 1(1): 19-21. Sondergaard, H. P., L. O. Hansson, et al. (2004). "The inflammatory markers C-reactive protein and serum amyloid A in refugees with and without posttraumatic stress disorder." Clin Chim Acta 342(1-2): 93-8. Song, Y., D. Zhou, et al. (2007). "Disturbance of serum interleukin-2 and interleukin-8 levels in posttraumatic and non-posttraumatic stress disorder earthquake survivors in northern China." Neuroimmunomodulation 14(5): 248-54. Spitzer, R. L., K. Kroenke, et al. (1999). "Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire." JAMA. . 282(18): 1737-44. Spivak, B., R. Maayan, et al. (2000). "Elevated circulatory level of GABA(A)--antagonistic neurosteroids in patients with combat-related post-traumatic stress disorder." Psychol Med 30(5): 1227-31. Strange, B. A., C. M. Portas, et al. (1999). "Random effects analyses for event-related f{MRI." Neuroimage 9: 36. Svenningsson, P., K. Chergui, et al. (2006). "Alterations in 5-HT1B receptor function by p11 in depression-like states." Science 311(5757): 77-80. Thulborn, K. R., J. C. Waterton, et al. (1982). "Oxygenation dependence of the transverse relaxation time of water protons in whole blood at high field." Biochim Biophys Acta 714(2): 265-70. Turner, R., P. Jezzard, et al. (1993). "Functional mapping of the human visual cortex at 4 and 1.5 tesla using deoxygenation contrast EPI." Magn Reson Med 29(2): 277-9. van Praag, H. M. (2004). "The cognitive paradox in posttraumatic stress disorder: a hypothesis." Prog Neuropsychopharmacol Biol Psychiatry 28(6): 923-35. Woods, R. P., S. T. 
Grafton, et al. (1998). "Automated image registration: I. General methods and intrasubject, intramodality validation." J Comput Assist Tomogr 22(1): 139-52. Woods, R. P., S. T. Grafton, et al. (1998). "Automated image registration: II. Intersubject validation of linear and nonlinear models." J Comput Assist Tomogr 22(1): 153-65. Worsley, K. J. (2005). "Spatial smoothing of autocorrelations to control the degrees of freedom in fMRI analysis." Neuroimage 26(2): 635-41. Worsley, K. J., S. Marrett, et al. (1996). "Searching scale space for activation in PET images." Human Brain Mapping 4: 74-90. Yehuda, R., S. R. Brand, et al. (2006). "Clinical correlates of DHEA associated with post-traumatic stress disorder." Acta Psychiatr Scand 114(3): 187-93. Zhu, W., X. Wang, et al. (2003). "Detection of cancer-specific markers amid massive mass spectral data." Proc Natl Acad Sci U S A 100(25): 14666-71.
Acronyms
ACC - anterior cingulate cortex
AFNI - Analysis of Functional NeuroImages
ANOVA - analysis of variance
BOLD - Blood Oxygenation Level Dependent
CAM - Complementary and Alternative Medicine
CAPS - Clinician-Administered PTSD Scale
CBT - cognitive behavior therapy
cDNA - complementary deoxyribonucleic acid
CES - Combat Exposure Scale
CRP - C-reactive protein
DHEA - Dehydroepiandrosterone
DHEA-S - Dehydroepiandrosterone sulfate
DIS-IV - Diagnostic Interview Schedule for DSM-IV
DSM-IV-R - Diagnostic and Statistical Manual of Mental Disorders 4th Edition Revised
ECG - electrocardiogram
ELISA - Enzyme-Linked ImmunoSorbent Assay
EMDR - Eye Movement Desensitization and Reprocessing
EPI - echo-planar imaging
FDR - false discovery rate
fMRI - functional magnetic resonance imaging
GABA - Gamma-aminobutyric acid
GCRC - General Clinical Research Center
GR - glucocorticoid receptor
IOM - Institute of Medicine
MALDI-MS - Matrix-assisted laser desorption/ionization mass spectrometry
MBMS - Mind-Body Medicine Skills
MIX - maximum parsimony program
mPFC - medial prefrontal cortex
MR - magnetic resonance
MRI - magnetic resonance imaging
mRNA - messenger ribonucleic acid
MS - mass spectroscopy
NCI - National Cancer Institute
NPY - Neuropeptide Y
OEF - Operation Enduring Freedom
OIF - Operation Iraqi Freedom
OMICS - Genomics
OR - odds ratio
PDHA - Post-Deployment Health Assessment
PHQ - Patient Health Questionnaire
PRIME-MD - Primary Care Evaluation of Mental Disorders
PTSD - Post Traumatic Stress Disorder
RT-PCR - reverse transcription polymerase chain reaction
SAA - Serum amyloid A
SCID - Structured Clinical Interview for DSM-IV
SDI - script driven imagery
SNP - single nucleotide polymorphism
SPM - Statistical parametric mapping
SSRI - Selective Serotonin Reuptake Inhibitor
3T - 3 Tesla
TBI - traumatic brain injury
TR - repetition time
UNIPAL - Universal Parsing Algorithm
VA - Department of Veteran Affairs
WASI - Wechsler Adult Intelligence Scale
FACILITIES and OTHER RESOURCES
CENTER FOR FUNCTIONAL AND MOLECULAR IMAGING
The Center for Functional and Molecular Imaging (CFMI), which is directed by Dr. John VanMeter, includes one other researcher and a support staff of four research assistants, a senior research associate/database manager, a financial administrator, and a systems administrator. The imaging center occupies 3800 square feet, which, in addition to the 3T MRI scanner, console, equipment room, and EEG/NIRS lab space, includes four offices and four cubicles. An additional 1400 square feet of office space is located in an adjacent building. CFMI has extensive ongoing collaborations with Children’s National Medical Center, George Washington University, University of Maryland, College Park, George Mason University, Kappametrics, Inc., and RTI (Research Triangle Institute) International, Inc. CFMI is also a core facility resource through the NIH-funded General Clinical Research Center (GCRC) and the Mental Retardation and Developmental Disorders Research Center (MRDDRC). Dr. VanMeter is the core director for both of these centers. 
Equipment
3T MRI Scanner - A research-dedicated 3.0 Tesla Siemens (Erlangen, Germany) Trio whole-body MRI system with EPI (echo planar imaging) capability is located in the Center for Functional and Molecular Imaging (CFMI), Georgetown University Medical Center (near the Department of Neurology researchers and accessible via indoor passages or an outdoor route). The gradient system has a maximum strength of 40 mT/m with a slew rate of 200 T/m/sec. The RF system includes 8 parallel receiver channels, each with a 1 MHz bandwidth. The console room is equipped with two stimulus presentation systems for functional studies. In addition, movies and music can be presented to the subject during structural imaging.
Computational and Backup/Archive Workstations
Each employee of CFMI is provided with a workstation with an Intel Pentium-IV or better processor, 512 MB of RAM, an 80 GB hard disk, and a CD-RW drive, running Microsoft Windows XP Pro. A wide variety of software is available for CFMI staff, including the MS Office XP suite of applications (Word, Excel, PowerPoint, Visio, Project, and Outlook); SPSS statistical packages; Adobe Photoshop; and utilities such as Adobe Acrobat, SSH, FTP, and Norton antivirus software.
Internet Connectivity
The LINUX Cluster and staff workstations are connected to the Georgetown University network, which has an Internet2 connection to the internet. All of the CFMI computers are behind a Cisco firewall that limits access from the outside. Key personnel are given unique VPN access, allowing access to the computation resources from home and other work sites.
Computer Security
The CFMI local area network (LAN) is protected by a Cisco PIX 525E firewall, which has 2x1Gbit ports for LAN and WAN traffic and a 100Mbit interface for the "demilitarized" zone (DMZ) that hosts the CFMI web server. 
It provides security from outside threats and supports constant virtual private network (VPN) connectivity between several centers, including CFMI offices in Building D, CSL (Center for the Study of Learning), and the SAIL (Small Animal Imaging Laboratory, 7T MRI Facility), as well as "dial-in" VPN connectivity for remote access by CFMI users.
LINUX Cluster
CFMI is currently equipped with a 40-node Linux compute cluster, which has 10 TB of attached disk storage. Every node is equipped with standard software for statistical analysis and visualization of fMRI and structural MRI data, including packages such as SPM, FSL, and MEDx. All data are backed up weekly to a 60-tape Ultrium archive using Arkeia (Carlsbad, CA) backup software. All computers are linked via a 1000BASE-T Ethernet local area network (LAN). In addition, the center is currently equipped with 20 PCs running Windows XP. All PCs can be used to log in to the cluster and run an analysis via the center’s LAN and are equipped with Microsoft Office and Adobe Photoshop.
Physical Facilities
The Center for Functional and Molecular Imaging occupies approximately 3800 square feet of space in the Preclinical Science Building and 1000 square feet of office space in Building D, consisting of 12 separate offices with four additional larger rooms that hold several desks, computer consoles, and a printer. These rooms provide work areas for up to 24 individuals. There is also a reception area for the Office Assistant and a meeting area that is used solely by center personnel for meetings and to host guest lectures and discussions. Additional space is set aside for office services, such as a copying machine, a fax machine, and a set of mailboxes. We have two behavioral testing and evaluation rooms. These rooms are suitably furnished with adjustable tables, chin rests, and computers for subject testing and training as well as data entry and transfer. 
Subject waiting areas are staffed. This setup ensures subject confidentiality is preserved and provides a comfortable waiting area for family members. Both rooms will contain a computer for subject testing and a video camera for recording sessions with subjects.
DEPARTMENT OF PSYCHIATRY
The Department of Psychiatry, under the chair of Steven Epstein, M.D., consists of 66 full and part-time faculty members on site and 300 clinical faculty members. The department has a total of 43 offices located on campus in Kober-Cogan Hall. Faculty and staff engage in extensive research, clinical, and educational efforts throughout the university and beyond.
Georgetown Center for Trauma and the Community. The CTC is an interdisciplinary center housed in the Department of Psychiatry (Bonnie L. Green, PhD, PI [P20 MH 068450] and Director). Its goal is to develop culturally appropriate, innovative, and sustainable interventions to address the trauma-related mental health needs of low-income and minority populations seen in safety net primary care settings in the Washington DC area. Academic partners include Georgetown’s School of Nursing and Health Studies and its Department of Family Medicine, with research and/or training relationships with Physiology & Biophysics, Neurology, Pediatrics, and Medicine. The Center provides coordinated research training and mentoring, and maintains ongoing community partnerships to inform its direction and to implement collaborative research activities. To increase the adoption and sustainability of these interventions, trauma-related services are conceptualized and developed in close collaboration with four community partners: the Department of Health (Division of Maternal and Child Health) and Greater Baden Medical Services, Inc., both in Prince George’s County, MD; the Primary Care Coalition in Montgomery County, MD; and Unity Health Care, Inc. in the District of Columbia. 
The Center’s work is conducted through three cores providing administrative oversight, statistical support, and access to expert advisory groups; innovative methods/research designs integrating perspectives from applied anthropology and public health; and expertise and support for the development of promising research. The working partnerships and infrastructure capability fostered through the Center provide support to develop comprehensive trauma intervention models that can be translated to settings serving low-income individuals in the Washington DC area, and elsewhere. The Center for Mental Health Outreach has been established to improve the mental health of underserved children, adults and families in the Greater Washington DC area through education, public awareness, direct clinical care, and collaboration with social service providers. The Department of Psychiatry has nationally recognized expertise in finding ways to deliver mental health services to underserved people. Training, education, and program evaluation further strengthen the Center’s capacity to serve as a mental health resource for the community. As pioneers in adapting effective treatment methods to the special needs of underserved populations, the Center for Mental Health Outreach will continue to expand the Department’s current research and service collaborations with other departments, agencies, organizations and institutions, such as the Depression and Related Affective Disorders Association (DRADA). The Qualitative Data Lab at Georgetown University Department of Psychiatry has the capacity for storing recorded data, and for processing and analyzing qualitative data. All recorded data are stored on secure password protected computer networks and CD-ROMs. The lab resources include digitizing software that can convert magnetic audiotape and videotape recordings into digital format. 
All recordings are digitized to MPEG format because of its compression efficiency and compatibility with qualitative software. The primary qualitative software used in the lab is ATLAS.ti, which can be used to review recorded data, transcribe relevant portions of recordings, and conduct data searches. An important feature of ATLAS.ti is that it allows for the coding of not just text, but also audio and video data. The lab is staffed by research assistants who have been trained to digitize recordings, manage qualitative data, and use the qualitative software. Research offices include space for the research team. The department also has three conference rooms available for meetings. The Psychiatry Research Conference room is wired to accommodate a PolyCom phone conferencing system. Two TV/VCRs and two laptops are available for presentations (LCD projectors are available from the GU audiovisual department).
Equipment: Each member of the research team will have a computer available connected to a LaserJet printer. The computers are equipped with a modem and have a variety of software available for word processing and statistical analysis (SPSS, SAS). Mainframe services are available for statistical procedures that are unavailable on microcomputers. All computers are connected to the Internet. Three photocopy machines are available within the department along with three fax machines and an HP 6350 CXI printer with auto-feeder.
DEPARTMENT OF PHYSIOLOGY AND BIOPHYSICS
Department of Physiology and Biophysics: The Department of Physiology and Biophysics, under the chair of Zofia Zukowska, MD, Ph.D., consists of 25 full and part-time faculty members and 20 research assistants and Ph.D. candidates. The department occupies about 6,800 square feet of research laboratory space on the second floor of the Basic Science Building and in the Lombardi New Research Building. The offices and conference rooms occupy about 2,600 square feet. 
Faculty members engage in extensive research and educational efforts throughout the medical center.
Laboratory: Dr. Amri’s laboratory is located in the Basic Science Building and occupies approximately 471 square feet of space. The tissue culture room of about 130 square feet is also located on the same floor. Additional space is available on an as-needed basis. The Department is fully equipped for cell culture and all biochemical, morphological, and molecular procedures described in the application.
Office: The Principal Investigator has about 120 square feet of office space total. Each investigator has additional office space in the Department of Physiology and Biophysics. The Department provides telephone and fax.
Stress Physiology and Research Center (SPaRC) at the Department of Physiology and Biophysics, Georgetown University. While clinical studies of the impact of stress on health and disease are many, mechanistic human or animal studies are very sparse. There are only a handful of centers around the world where stress biology and/or stress management is being studied, and none approaches the field in an integrative and comprehensive way. As a result, stress research has been too dispersed, interdisciplinary cooperation is poor, and communication between researchers in traditional fields of stress biology and medicine and those studying complementary medicine and mind-and-body modalities has been missing. The newly formed Stress Physiology and Research Center (SPaRC) (or Stress Center for short) at Georgetown University will fill this gap. Its mission is to study stress physiology in a comprehensive, integrative way, encompassing both traditional medical sciences and complementary and alternative medicine (CAM). 
The Center capitalizes on expertise and research at Georgetown, beginning with the Department of Physiology and Biophysics and other basic departments of the Medical Center, and clinical departments, beginning with the Department of Psychiatry. The Stress Center also collaborates with other departments of the University, as well as the Cardiovascular Research Institute at Medstar/Washington Hospital Center. The strength of this Stress Center is that its research is multi-departmental, based on both basic and clinical investigations and translational medicine, and integrative in nature, encompassing genetic, molecular, cellular, and whole animal and human studies. The foundations for the newly formed Center already exist and are based on well-established and federally funded projects in the Departments of Physiology and Biophysics, Neuroscience, Biochemistry and Molecular Biology, Medicine, Psychiatry, and Demography. The Center will also carry out investigations into the physiology of anti-stress or relaxation modalities, recently introduced into the CAM educational and research program at the Department of Physiology. Investigators currently working under the Stress Center include researchers from the Departments of Physiology and Biophysics (Dr. Amri is an active member of SPaRC), Biochemistry and Molecular Biology, Neuroscience, Medicine and Endocrinology, and Psychiatry, the Center for Population and Health, and the Cardiovascular Research Institute/Medstar. The Center also includes a Human Stress Physiology Lab for conducting basic stress reactivity tests and measuring hemodynamic, cognitive, behavioral, and biochemical parameters, allowing for the phenotyping of human behavior and health with measurable outcomes. 
Macromolecular Analysis Shared Resource: The LCCC Macromolecular Analysis Shared Resource utilizes DNA sequencing, microarray, real-time PCR, phosphorimaging, densitometry, luminescence, molecular modeling, and spectrophotometry to support researchers on the Georgetown University campus for a nominal fee. The resource instruments include a DNA sequencer (ABI 377), a Multiimage workstation (Alpha Innotech Chemiimager 5500), a phosphorimager (Molecular Dynamics 445SI), molecular modeling equipment from Silicon Graphics with Insight II modeling software from Molecular Simulations, a fluorescence spectrophotometer (Hitachi F-4500), a fluorescence polarization plate reader (Tecan Ultra), a UV/VIS spectrophotometer (DU640), and a Wallac Victor2 multilabel counter. The shared resource also includes Agilent Technologies’ Bioanalyzer, an ABI real-time PCR instrument (7900 HT, a robot-capable sequence detection system), and a fully integrated Affymetrix GeneChip Instrument System. The GeneChip system includes a fluidics station 400, hybridization oven 640, GeneArray scanner, and computer workstations for instrument control and data analysis. The shared resource also maintains multiple software packages for data analysis and provides data analysis services for array users. The equipment is operated by two support staff and two faculty members. This resource is supported, in part, by a peer-reviewed NCI Cancer Center Support Grant to the LCCC and modest user fees. Approximately 68 investigators utilize this facility annually.
Proteomics Shared Resource: The Proteomics core is equipped to provide a broad spectrum of proteomics services to the research community. The services include technologies for the fractionation of complex protein mixtures coupled with mass spectrometry. 
The Proteomics Core Facility is equipped with a 4800 MALDI-TOF-TOF mass spectrometer (Applied Biosystems), a 4700 ABI MALDI-TOF-TOF mass spectrometer, a Thermo Electron LTQ ion trap mass spectrometer connected to a nano-HPLC system (LC-Packings), a QSTAR Elite Hybrid LC-MS/MS system, and a nano-HPLC system online with a Probot MALDI spotter (Agilent). These instruments provide a range of techniques to analyze different aspects of a fractionated protein sample. The Facility is also equipped with a complete set of 2D gel electrophoresis apparatus from Bio-Rad (IEF cell and Protean XL), the DALT6 large format 2D electrophoresis system (Amersham Biosciences), a G800 high-resolution densitometer, and PDQuest proteomics software (Bio-Rad) for 2D gel image analysis, as well as Dymension software from Syngene. The core provides two distinct mass spectrometry services: intact mass analysis (to identify masses of proteins/peptides in relatively pure solutions) and protein identification using peptide mass mapping (involving trypsin digestion of protein followed by mass spectrometry of the resulting peptide fragments). We routinely perform 2D gel electrophoresis for proteins from cell, serum, or tissue lysates, and image analysis for differential protein expression followed by protein identification using mass spectrometry. The core has also developed protocols to successfully identify proteins from immunoprecipitation reactions. The core plans to upgrade the 2D electrophoresis by introducing DIGE technology, to use robotics for spot picking, and to test, develop, and optimize protocols for multidimensional protein identification from reaction samples, non-radioactive differential protein labeling using the SILAC, ICAT, or iTRAQ systems, and serum profiling studies.
GEORGETOWN UNIVERSITY MEDICAL CENTER RESOURCES
National Center for Cultural Competence. 
The mission of the National Center for Cultural Competence (NCCC) is to increase the capacity of health care and mental health programs to design, implement, and evaluate culturally and linguistically competent service delivery systems. The NCCC conducts an array of activities to fulfill its mission, including: (1) training, technical assistance, and consultation; (2) networking, linkages, and information exchange; and (3) knowledge and product development and dissemination. Major emphasis is placed on policy development, assistance in conducting cultural competence organizational self-assessments, and strategic approaches to the systematic incorporation of culturally competent values, policy, structures, and practices within organizations. The NCCC is a component of the Georgetown University Child Development Center and is housed within the Department of Pediatrics of the Georgetown University Medical Center. It is funded and operates under the auspices of Cooperative Agreement #U93-MC-00145-08 and is supported in part by the Maternal and Child Health program (Title V, Social Security Act), Health Resources and Services Administration, Department of Health and Human Services.
Community Research & Learning Network (CoRAL). CoRAL promotes partnerships between researchers and community-based organizations that mobilize their collective resources to support social change. Partners include faculty/researchers, the community organizations, and GU students. The network provides opportunities for faculty/researchers to pursue collaborative projects with community members.
General Clinical Research Center (GCRC). The objective of the GCRC program is to make available to medical scientists the resources that are necessary for the conduct of clinical research. 
The General Clinical Research Center (GCRC) is funded by a grant from the National Institutes of Health (NIH) and offers the faculty of Georgetown University Medical Center and peer-reviewed, funded investigators from the surrounding District of Columbia hospitals the optimal environment in which to conduct clinical research. The GCRC does not fund specific research projects, but provides infrastructure and support in the form of inpatient beds, outpatient services, staff, and core equipment necessary to conduct studies. The General Clinical Research Centers (GCRC) program of the NIH was established in 1960 to create and sustain specialized institutional resources in which clinical investigators can observe and study human physiology as well as study and treat disease with innovative approaches. The primary purpose of a GCRC is to provide the clinical research infrastructure to investigators who receive peer-reviewed primary research funding from the NIH and other components of the US Government. It can also be used to support other hypothesis-based research and can be available for industry-sponsored research at cost. The Clinical Research Center occupies the east wing of the seventh floor of the Main Hospital Building of the Georgetown University Medical Center. The GCRC will provide space and nursing and other technical staff at no cost. Laboratory costs are budgeted at cost.
The Georgetown University Bioanalytical Center (BAC) within the GCRC is a chromatography lab located on the ground level of the Preclinical Sciences Building and occupies approximately 1500 square feet in rooms GD1 and GD3. 
The BAC contains a core laboratory that is funded as part of the Georgetown University Clinical Research Center but is also available to investigators at the Medical Center, the University, as well as outside clients, on a fee-for-service basis. The laboratory is dedicated to the development, validation, and application of bioanalytical methods in support of clinical, pharmacokinetic, and pharmacogenetic studies as well as basic pharmacological research. The staff consists of experienced laboratory scientists, all of whom are capable of performing the following services:
• Method development for established and experimental drugs or other analytes of interest
• Method validation
• Sample analysis from clinical studies
• Confirmation of mass and/or purity of products resulting from synthesis or in-vitro metabolism studies
• Chiral assays
• Separation and collection of chiral enantiomers
• Immunoassays
• Use of HPLC as sample clean-up for immunoassays
• Liquid chromatography with mass spectroscopy
• High-performance liquid chromatography with UV and fluorescence detection
• Gas chromatography with nitrogen, phosphorus, and flame ionization detection
• Capillary electrophoresis with UV and laser-induced fluorescence detection
The laboratory maintains the equipment listed below:
• 5 HPLC systems with UV, fluorescence and electrochemical detectors (ThermoSeparations/Agilent)
• 2 Capillary electrophoresis (CE) systems with UV detectors (ABI/PE)
• 1 CE system with UV and laser-induced fluorescence (LIF) detectors (Bio-Rad)
• API-3000 Mass Spectrometer (Sciex) (Applied Biosystems)
• API-4000 Mass Spectrometer
• IMMULITE by Diagnostic Products Inc
Some of the systems are fully automated with autosamplers and on-line computer-based data collection. All necessary support equipment for the storage (three -80 °C and one -20 °C freezers) and preparation (balances, pH meters, centrifuges, solid phase extraction apparatus, etc.) of clinical samples is contained within the laboratory. 
The Center for Clinical Bioethics (CCB) was established in 1991 as a center of excellence at Georgetown University Medical Center, complementing the activities in ethics of the other divisions of the University. Thus, the CCB functions in concert with the Kennedy Institute of Ethics and the Department of Philosophy on the main campus, as well as with faculty at the Law Center. Center scholars participate in internal review boards, the Georgetown University Hospital Ethics Committee, and interdisciplinary and post-care rounds. Faculty also collaborate with MedStar’s ethics program based at the Washington Hospital Center. Visiting scholars from all over the world participate in seminars, meetings, consultations, and all programs of the CCB. The faculty of the CCB have primary appointments in Internal Medicine, Family Medicine, Philosophy, Nursing, and Oncology. They conduct research in the philosophy of medicine, end-of-life issues, beginning-of-life concerns, genetics, research ethics, and organizational ethics; teach; and participate in patient care. The CCB also coordinates the Medical Center’s Ethics Consult Service on behalf of the Ethics Committee of Georgetown University Hospital. Faculty members teach research ethics in the graduate school and in the DC Clinical Research Training Consortium. The four-year Bioethics Curriculum for Health Care Professionals is directed by the CCB and combines graduate nursing students with second-year medical students in a single, innovative course. The Center also organizes a formal ethics curriculum for the Internal Medicine house staff. The CCB sponsors colloquia and conferences, providing continuing ethics education for faculty, staff, students, and the wider community, both local and national. The CCB coordinates the bioethics track in the MD/PhD combined degree program.
The School of Nursing and Health Studies. 
The mission of the School of Nursing & Health Studies (NHS) is consistent with the University’s mission to provide student-centered, excellent undergraduate and graduate professional education in the Jesuit and Catholic tradition. NHS continues its long tradition of preparing morally reflective health care leaders and scholars who strive to improve the health and well-being of all people, with sensitivity to cultural differences and issues of justice. Since its founding in 1903, NHS has been at the forefront of the health care field, preparing future leaders to respond to the growing complexity of health care delivery at all levels. Graduates pursue various careers within nursing, medicine, law, health policy, health management, and public health, among many others. The Undergraduate Program offers its students a broad liberal arts education balanced with the natural and behavioral sciences through innovative curricula in either the Bachelor of Science in Nursing (BSN) or the Bachelor of Science (BS) in Health Studies with majors in Health Care Management and Policy, Human Science, and International Health. The Master of Science degree programs lead to advanced nursing practice in six specialty areas: Nursing Education, Nurse Midwifery / Women’s Health, Acute Care Nurse Practitioner, Acute and Critical-Care Clinical Nurse Specialist, Family Nurse Practitioner, and Nurse Anesthesia. The Master of Science in Health Systems Administration is taught in conjunction with the School of Business and does not require a BSN. The Center on Health and Education focuses on the development and testing of culturally competent prevention and intervention strategies and public policies that promote the health of individuals and families and empower communities in order to eliminate racial/ethnic health disparities across the life span. St. 
Mary’s Hall, renovated during 2001-2002, is the home of NHS and houses the offices for administration, faculty, and staff, and includes classrooms, conference rooms, a computer laboratory, a simulator center, and a technologies laboratory. Academic instruction occurs in one of six new multimedia classrooms: one room with 122 desks, three rooms with 50 desks each, and two seminar rooms with 22 seats. The student commons is equipped with computers, lockers, a study area, and a gathering place for students.
CAPRICORN (Capital Area Primary Care Research Network) is a network of primary care providers in the Washington metropolitan area interested in conducting practice-based research. It was founded by Dan Merenstein, M.D., proposed co-investigator and current faculty member in the Department of Family Medicine. CAPRICORN identifies and conducts research studies that expand the science base of primary care. CAPRICORN provides an efficient means of studying outcomes in primary care, making its research highly applied and practical for physicians and health care providers. CAPRICORN pools patient populations of differing ethnic and socioeconomic status, allowing greater application of research findings and the ability to compare and contrast different populations. CAPRICORN is supported by the primary care units of Georgetown University School of Medicine, which strengthens its capacity for protocol development and human subjects review.
Community Partners
Unity Health Care, Inc. (Unity), Washington, DC is the largest private organization providing primary medical care to low-income, uninsured District of Columbia residents. A 501(c)(3), private, nonprofit agency, Unity operates health centers established under Section 330(h) Stewart B. McKinney Homeless Assistance and 330(e) of the US Public Health Services Act. 
In 2001, as a result of its long history of providing high quality primary health care to the city's indigent, uninsured, and underserved, Unity was sought out to assist in the transformation of the District's publicly operated health care system. As a contractor of the District of Columbia Department of Health, Unity operates six ambulatory health centers throughout the city, bringing the total to eleven health centers, in addition to nine homeless health care sites, two HIV/AIDS treatment centers, and a high school-based health center. Unity has ongoing clinical partnerships with Georgetown, including eight Family Medicine physicians. In 2001, prior to expanding to include six former city-run ambulatory care centers, Unity served over 38,000 clients. Over half of the clients earned less than 200% of the poverty level, and 38% earned below 100%. Fifty-seven percent of Unity's patients were uninsured, with the rest covered by Medicaid (7%), Medicare (4%), other public insurance (30%), and private insurance (1%). Seventy-four percent of Unity clients were Black, 21% Hispanic, 1% Asian, 1% white, and 3% unknown. Fifty-seven percent of Unity clients were female. Children under the age of five made up 9% of clients. Children between five and nineteen accounted for an additional 19% of Unity clients. Adults over sixty-five years of age represented 6% of Unity's patient population in 2001. The addition of the six new ambulatory care centers is expected to double the Unity patient base in 2002. Unity is committed to providing culturally responsible health care and social services. Toward this end, Unity has recruited a diverse workforce and includes issues of cultural sensitivity and competence in the orientation program for all new employees. Many employees are members of the community in which they work. Centers serving clients for whom English is not the primary language have clinical and non-clinical bi- and tri-lingual staff members. 
Unity has incorporated the goal of providing superior culturally competent health care and social services into its ongoing quality management program. Unity has many longstanding relationships with the District’s academic health centers. Health professional students, including medical students, nursing students, physician assistants, and medical residents from Georgetown, George Washington, Howard, and Catholic Universities partner every day with Unity providers. Unity also has a long tradition of partnering with researchers as part of its commitment to improving the quality of health and healthcare of the District’s residents and communities. Community Partners/Research Sites Primary Care Coalition of Montgomery County, MD. Montgomery County is the largest jurisdiction in Maryland, and it has the largest concentration of Latinos in the greater DC area and the largest minority population in the state of Maryland. The minority population is approximately 12% Latino (mostly from Central and South America), 12% Asian and Pacific Islanders, and 12% African American. Almost 25% of public school children qualify to receive free or reduced meals, and these same children speak more than 150 different languages. There are an estimated 80,000 adults in the county without health insurance, many of whom experience psychosocial stressors associated with poverty, language deficiencies, immigration, and social isolation. Many immigrants originate from war-torn countries. The state of Maryland instituted major changes in its mental health system in 1995 that closed public sector mental health programs and established a network of non-profit and for-profit clinics and private practitioners (Maryland Health Partners) to provide care. 
Since that time, the capacity for jurisdictions including Montgomery County to provide mental health care has been shrinking as a result of inadequate financing of the program by the state, the resulting bankruptcy of the largest non-profit mental health clinic, and the fragmentation of services across the county. The Primary Care Coalition of Montgomery County, Inc. was established in 1993 to provide access to high quality, culturally sensitive, primary and specialty care services for low-income uninsured children and adults in Montgomery County. Through six safety net clinics, last year the PCC helped support the health care of nearly 13,000 patients through more than 20,000 patient visits. It also manages a variety of programs, including a county-funded program that provides support for the safety net health-care providers, and Care for Kids, which purchases primary care for 2,000 uninsured children. The Coalition is also initiating a Child Assessment Center providing multi-disciplinary services to children who have been the victims of child abuse and neglect. In 2001, the Montgomery County Council committed to funding a "system of primary care" through the Primary Care Coalition. In December 2004, the Montgomery County Executive and County Council announced long-term support for Montgomery Cares, a Coalition program to expand access to care to 40,000 presently uninsured people through a network of community clinics. Demonstration projects in dental health, and in mental health (in collaboration with Georgetown University), have been funded and are being mounted. Greater Baden Medical Services Inc. Greater Baden is a federally qualified 501(c)(3) healthcare system that was founded in 1972. The system serves communities in southern Prince George’s County, Charles County, and St. Mary’s County Maryland. 
Comprised of 5 clinics that provide a spectrum of services, Greater Baden is a community based health provider committed to delivering the highest quality of healthcare services. It provides primary health services and facilitates health promotion/disease prevention activities in an efficient, effective, and comprehensive manner for the individuals and communities served, regardless of ability to pay, and serves as the safety net provider for Southern Maryland. In 2004, the system served over 8000 patients, consisting of over 21,000 medical encounters. Of those, 60% were uninsured, 30% had Medicare/Medicaid, and only 8% had private insurance. The patients served are 66% African American, 21% White, 10% Latino, and 3% other. As a member of the Bureau of Primary Care’s Health Disparities Collaborative, GBMS uses the Chronic Care Model as an operational framework. Programs include Women, Infants, and Children (WIC) services, Access to care for Asian Indians, telemedicine and continuing education, and expanded title III capacity to improve communications with its rural clinics and increase accessibility to computers for staff. Ft. Lincoln Family Medicine Center. The Ft. Lincoln Family Medicine Center in Colmar Manor, Maryland is a full spectrum Family Medicine office caring for children and adults of all ages, including prenatal care. It is affiliated with the Georgetown University Medical Center, serving as a training site for its Family Medicine residents, as well as the Providence Hospital of Prince George’s County, Maryland. The Center’s patients live in Washington D.C. and suburban Maryland. They consist of mostly ethnic minorities, and most patients are on Medicaid or Medicare, although patients with a wide variety of insurance plans are seen. The Center averages about 19,000 patient visits each year. Georgetown Computational Resources Georgetown Computational Core Facility (CCF). 
All grant/contract proposals have access to the Georgetown Computational Core Facility (CCF). The CCF provides state-of-the-art computational resources and expertise to researchers who are developing and analyzing computational and/or data-intensive models in neuroscience, oncology, cellular processes, and other numerically intensive disciplines. The primary hardware of the facility consists of several multi-processor Beowulf clusters capable of serial and parallel processing across an array of high-speed central processing units (CPUs). This system also provides centralized file-server capabilities as well as resources necessary to archive and secure data. All computer resources are connected to the University's high-speed network, and to the Internet2 Abilene network, and follow the Georgetown University Information Security guidelines in compliance with the NIH Application/System Security Plan for Applications and General Support Systems. On-line user statistics for CCF clusters are available at http://www.clusters.arc.georgetown.edu/statistics/. The Computational Core Facility is administered by the Georgetown University division of Advanced Research Computing (ARC) - http://arc.georgetown.edu. All University Departments can access the resources of this facility. In addition, the facility supports several Ph.D.- and Master's-level personnel with extensive experience in programming and computational support, including systems administration and database programming. These personnel help ensure that faculty can take full advantage of available resources, as well as planned NIH technology in the grid computing space. Thus, the CCF provides an institutional facility for extensive scientific support for grants and contracts via access to leading-edge computational resources.
University Information Services (UIS) is charged with providing technology services, access to information, and supporting administrative systems for the faculty, students, staff, and administration of Georgetown University. In addition, UIS is responsible for creating a technology infrastructure to support electronic communication -- voice, video, and data -- now and into the future. UIS operates under the direction of the Vice President for Information Services and Chief Information Officer (CIO), with guidance from various advisory groups. GU Information Services - Video Teleconferencing is available to faculty and staff for professional purposes. This service is made possible via the University’s phone system, a conventional TV, and a PolyCom View Station. There are two rooms on campus that have been specially wired for teleconferencing. Georgetown University has several conference rooms available for departmental functions. The Department of Psychiatry has slated the Research Auditorium located in the Research Building to accommodate the larger workshops and seminars associated with the Center grant proposal. The Research Auditorium houses state-of-the-art equipment and technical experts to assist with functions. In addition to individual conference rooms located throughout the University campus, the Leavey Conference Center houses an on-campus hotel for out-of-area participants along with catering services and several interconnected conference rooms. EXISTING EQUIPMENT CENTER FOR FUNCTIONAL AND MOLECULAR IMAGING The Center for Functional and Molecular Imaging (CFMI), which is directed by Dr. John VanMeter, currently has a 3T MRI Scanner, a stand-alone high-density EEG system, and two NIRS (Near Infrared Spectroscopy) systems.
Equipment 3T MRI Scanner - A research-dedicated 3.0 Tesla Siemens (Erlangen, Germany) Trio whole-body MRI system with EPI (echo planar imaging) capability is located in the Center for Functional and Molecular Imaging (CFMI), Georgetown University Medical Center (near the Department of Neurology researchers and accessible via indoor passages or an outdoor route). The gradient system has 40 mT/m maximum strength with a slew rate of 200 T/m/sec. The RF system includes 8 parallel receiver channels, each with a 1 MHz bandwidth. The console room is equipped with two stimulus presentation systems for functional studies. In addition, movies and music can be presented to the subject during structural imaging. As of December 2007, the Trio MRI scanner was retrofitted with the Tim (Total Image Matrix) upgrade. This upgrade includes 18 combinable RF channels and a new digital RF transmit/receive system supporting the new matrix coils: a new integrated body coil, a 12-channel head matrix coil, a 24-channel spine matrix coil, and a 4-channel neck matrix coil. These matrix coils are compatible with the iPAT (integrated Parallel Acquisition Techniques) technology that supports parallel data acquisition in all phase-encode directions, providing up to a 12-fold decrease in acquisition time and/or significant improvement in the signal-to-noise ratio. The other major feature of the Tim upgrade is the actively shielded, water-cooled, Siemens-exclusive TQ-engine gradient system, which includes a noise-optimized system with a complete noise capsule for the whole magnet via foam insulation of the system covers and an upgrade to the gradient set. The reduction in acoustic noise is up to 20 dB(A) as compared to conventional systems. This is a reduction of 90% in sound pressure. This reduction in noise is most evident in the gradient-demanding protocols, in particular the EPI-based techniques such as fMRI, diffusion imaging, and perfusion imaging.
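The relationship between the 20 dB figure and the 90% figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes the standard convention that a sound-pressure level change in dB equals 20·log10 of the pressure ratio; the function names are illustrative only and are not part of any scanner software.

```python
def pressure_ratio_from_db(delta_db: float) -> float:
    """Sound-pressure ratio for a level change in dB (20*log10 convention)."""
    return 10 ** (delta_db / 20)


def percent_pressure_reduction(reduction_db: float) -> float:
    """Percentage drop in sound pressure for a given dB reduction."""
    return (1 - pressure_ratio_from_db(-reduction_db)) * 100


if __name__ == "__main__":
    # A 20 dB reduction leaves one tenth of the original sound pressure,
    # i.e. roughly a 90% reduction.
    print(round(percent_pressure_reduction(20), 1))
```

Note that this applies to sound pressure; sound power or intensity follows a 10·log10 convention, under which the same 20 dB would correspond to a 99% reduction.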
In addition, the new TQ-engine gradients have a maximum gradient amplitude of 45 mT/m in the longitudinal direction and 40 mT/m in the horizontal and vertical directions (i.e., 72 mT/m vector summation gradient performance). Physiological Monitoring - Assessment of various physiological measures can be useful for some experiments. The CFMI imaging center has an Invivo Millennia 3155A/3155MVS (Invivo Research, Orlando, FL) physiological monitoring system that captures heart rate (ECG electrocardiogram), respiration rate, end tidal CO2, inspired CO2, and pulse oximetry. Data are acquired with a sampling rate of 1 Hz by the main system and sent to the remote monitor through a wave guide. The monitoring unit is connected to the stimulus presentation computer via a serial cable/port where the measures are recorded. The Psylab/SAM unit (Contact Precision Instruments, Boston, MA) connects to the stimulus presentation computer via a parallel port and is configured to receive event codes from E-Prime. The SAM unit records both galvanic skin response and temperature with a sampling rate of 100 Hz. Eye-tracker - The Mag Design & Engineering (Sunnyvale, CA) eye-tracker glasses use a fiber optic camera to capture the right eye but still allow binocular viewing of stimuli. This eye-tracker has a 30 Hz sampling rate. Video output from the eye-tracker is connected to a PC via a video tuner card. Output is sent to the ViewPoint software (Arrington Research, Scottsdale, AZ), which has a real-time recording capability and an interface to other software. Currently, this system is configured to receive the trigger pulse from the scanner to signal when to begin recording. With ViewPoint it is possible to record X and Y pupil position (eye-gaze), pupil width, and ocular torsion. EEG Laboratory – The EEG lab is a fully equipped electrophysiology laboratory, with a dedicated Electrical Geodesics high-density EEG system.
The EGI GES 250 digitizes 256 channels of data at up to 1000 samples/sec, with a 0.1 to 300 Hz bandwidth and a vertex recording reference. The system is supplied with a dual-processor PowerMac G5, the Apple Cinema Display HD, and digital video synchronized with the EEG signal. The instrument has advanced software for electrode impedance control and eye movement artifact rejection. Averaged event-related potentials (ERPs) can be examined with both topographic waveform plots and surface electrical field animations (maps every 4 ms sample) for each experimental condition. The instrument also allows estimates of radial current density to be made with the Laplacian transform (second spatial derivative of the surface voltages) of the ERP averages across subjects to characterize the features of the head surface electrical fields that can be attributed to superficial cortical sources. The addition of electrophysiology to the other imaging modalities available to Core users will allow experiments combining the superb temporal resolving capabilities of ERP approaches with the sensitivity and spatial resolving properties of functional MRI. The integration of these two methods will allow investigation of research questions probing modulations in the spatiotemporal character of brain activity. Near Infrared Spectroscopy (NIRS) – Two continuous-wave Near Infrared Spectroscopy systems are located in the Center. Each has 32 lasers (intensities driven at 32 different frequencies) and 32 detectors. At present, the 32 lasers are divided into 16 lasers at 690 nm and 16 at 830 nm. Alternatively, the number of wavelengths can be increased and multiplexed by an optical switch. A master clock generates the 32 distinct frequencies between 6.4 kHz and 12.6 kHz in ~200 Hz steps. These frequencies are then used to drive the individual lasers with current-stabilized square-wave modulation.
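The frequency-encoding scheme described above can be illustrated with a short sketch: 32 carriers spaced 200 Hz apart starting at 6.4 kHz end exactly at 12.6 kHz, and each source can be separated from a single detector trace because the carriers are distinct. The code below is a minimal illustration under stated assumptions, not the instrument's processing chain. It assumes an exact 200 Hz step and an exact 44 kHz sampling rate (the text gives both as approximate), uses sinusoidal rather than square-wave carriers, and recovers amplitudes with a simple lock-in (quadrature correlation) rather than a digital bandpass filter. All names are hypothetical.

```python
import math

FS_HZ = 44_000                 # assumed exact sampling rate
F_START_HZ = 6_400             # lowest modulation frequency
F_STEP_HZ = 200                # assumed exact spacing between sources
N_SOURCES = 32
WINDOW = FS_HZ // F_STEP_HZ    # 220 samples = one full 5 ms period of the 200 Hz grid


def modulation_frequencies():
    """The 32 carrier frequencies: 6.4 kHz, 6.6 kHz, ..., 12.6 kHz."""
    return [F_START_HZ + i * F_STEP_HZ for i in range(N_SOURCES)]


def mix_sources(amplitudes, n_samples=WINDOW):
    """Simulate one detector trace: the sum of all modulated sources."""
    freqs = modulation_frequencies()
    return [
        sum(a * math.sin(2 * math.pi * f * n / FS_HZ)
            for a, f in zip(amplitudes, freqs))
        for n in range(n_samples)
    ]


def demodulate(signal, freq_hz):
    """Lock-in estimate of the amplitude carried at freq_hz."""
    in_phase = sum(s * math.sin(2 * math.pi * freq_hz * n / FS_HZ)
                   for n, s in enumerate(signal))
    quadrature = sum(s * math.cos(2 * math.pi * freq_hz * n / FS_HZ)
                     for n, s in enumerate(signal))
    return 2 * math.hypot(in_phase, quadrature) / len(signal)


if __name__ == "__main__":
    freqs = modulation_frequencies()
    true_amps = [1.0 + 0.1 * i for i in range(N_SOURCES)]
    trace = mix_sources(true_amps)
    recovered = [demodulate(trace, f) for f in freqs]
    print(freqs[0], freqs[-1])
    print(max(abs(r - a) for r, a in zip(recovered, true_amps)))
```

With these assumed values every carrier completes a whole number of cycles in the 220-sample window, so the carriers are mutually orthogonal and the recovered amplitudes match the simulated ones to within floating-point error.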
The detectors are avalanche photodiodes (APDs, Hamamatsu C5460-01). Following each APD module is a bandpass filter with a cut-on frequency of ~500 Hz to reduce 1/f noise and the 60 Hz room light signal, and a cut-off frequency of ~16 kHz to reduce the third harmonics of the square-wave signals. After the bandpass filter is a programmable gain stage to match the signal levels with the acquisition level on the analog-to-digital converter within the computer. Each detector is digitized at ~44 kHz and the individual source signals are then obtained by use of a digital bandpass filter (e.g., an infinite-impulse-response filter). The features of the system will allow a hexagonal-mesh optical probe to be created that spans a rectangle measuring ~12 x 18 cm. The two systems can be used together to cover the entire head with 64 detectors. EEG+fMRI Stimulus Presentation – The stimulus presentation system developed by MRA, Inc (Washington, PA) is available. This system features a 2.53 GHz Pentium 4 computer with 1 GB of RAM in the CFMI control room that is used to present fMRI paradigms to subjects in the Siemens Trio scanner. Associated with this computer system is audio equipment for playing sound from a variety of sources to the subject and a display system for showing the computer screen, DVD/VCR prerecorded programs, or live TV from a cable TV system. The projector in use at the CFMI is an Epson PowerLite 5000. This projector uses three 1.32-inch LCD panels with a range of resolutions: 640x480, 832x624, and 1024x768. The stock lens provided with the projector was replaced with a custom-made 150-230 mm focal length zoom lens built by Buhl Optical (Pittsburgh, PA). The projector is located in the equipment room adjacent to the rear of the scanner room. The projected image displays on a rear projection screen (Da-Lite, Da-plex substrate with Video Vision optical coating) cut to fit the upper half of the scanner bore. The audio amplifier/receiver is a TEAC model AG-370.
The graphic equalizer is a TEAC EQA-220, which features ten frequency bands per channel, a multi-colored spectrum analyzer display, left and right channel level controls, an 80 dB S/N ratio, 5-100 kHz (±1 dB) frequency response, and 0.03% THD. The DVD/CD player is a Panasonic DMR-E30, which is also capable of playing MP3 audio CDs. The VCR unit is a JVC HR-S2901U, which features Super VHS with digital live circuitry, Super VHS ET for high-resolution recording on a conventional VHS cassette, Hi-Fi Stereo with a built-in MTS decoder, Pro-cision 19u EP heads for near-SP quality at EP speed, Ultra-Spec Drive with jitter reduction circuit, shuttle plus, instant review, digital AV tracking, an on-screen tape position indicator, variable slow motion, a 181-channel cable-compatible frequency-synthesized tuner, HQ (High Quality) system circuitry for excellent VHS picture quality, a color on-screen display, and multi-speed search (19-step SP/21-step EP) including 5-speed slow motion. The system also includes 10 fiber optic button response boxes that interface to both E-Prime and SuperLab (Cedrus Corporation, San Pedro, CA). Finally, the system receives fiber optic output from the Siemens Trio scanner for paradigm triggering. Computational and Backup/Archive Workstations Each employee of CFMI is provided with a workstation with an Intel Pentium-IV or better processor, 512MB of RAM, an 80GB hard disk, and a CD-RW drive, running Microsoft Windows XP Pro. A wide variety of software is available for CFMI staff, including the MS Office XP suite of applications (Word, Excel, PowerPoint, Visio, Project, and Outlook), SPSS statistical packages, Adobe Photoshop, and utilities such as Adobe Acrobat, SSH, FTP, and Norton antivirus software. Internet Connectivity The LINUX Cluster and staff workstations are connected to the Georgetown University network, which has an Internet2 connection to the internet.
All of the CFMI computers are behind a Cisco firewall that limits access from the outside. Key personnel are given unique VPN access allowing access to the computation resources from home and other work sites. Computer Security The CFMI local area network (LAN) is protected by a Cisco PIX 525E firewall, which has 2x1Gbit ports for LAN and WAN traffic and a 100Mbit interface for the "demilitarized" zone (DMZ) that hosts the CFMI web server. It provides security from outside threats and supports constant virtual private network (VPN) connectivity between several centers, including CFMI offices in Building D, CSL (Center for the Study of Learning), and the SAIL (Small Animal Imaging Laboratory, 7T MRI Facility), as well as "dial-in" VPN connectivity for remote access by CFMI users. LINUX Cluster CFMI is currently equipped with a 40-node Linux compute cluster, which has 10 TB of attached disk storage. Every node is equipped with standard software for statistical analysis of fMRI and structural MRI data as well as visualization, utilizing software packages such as SPM, FSL, and MEDx. All data are backed up weekly to a 60-tape Ultrium archive using Arkeia (Carlsbad, CA) backup software. All computers are linked via a 1000Base-T Ethernet Local Area Network (LAN). In addition, the center is currently equipped with 20 PCs running Windows XP. All PCs can be used to log in to the cluster and run an analysis via the center’s LAN and are equipped with Microsoft Office and Adobe Photoshop. DEPARTMENT OF PHYSIOLOGY AND BIOPHYSICS Macromolecular Analysis Shared Resource: The LCCC Macromolecular Analysis Shared Resource utilizes DNA sequencing, microarray, real-time PCR, phosphorimaging, densitometry, luminescence, molecular modeling and spectrophotometry to support researchers on the Georgetown University campus for a nominal fee.
The resource instruments include a DNA sequencer (ABI 377), Multiimage workstation (Alpha Innotech Chemiimager 5500), a phosphorimager (Molecular Dynamics 445SI), molecular modeling equipment from Silicon Graphics with Insight II modeling software from Molecular Simulations, a fluorescence spectrophotometer (Hitachi F-4500), a Fluorescence Polarization plate reader (Tecan Ultra), UV/VIS spectrophotometer (DU640), and Wallac Victor2 multilabel counter. The shared resource also includes Agilent Technologies’ Bioanalyzer, ABI Real-time PCR (7900 HT, a robot capable sequence detection system) and a fully integrated Affymetrix GeneChip Instrument System. The Genechip system includes a fluidics station 400, hybridization oven 640, GeneArray scanner and computer workstations for instrument control and data analysis. The shared resource also maintains multiple software types for data analysis and provides data analysis services for array users. The equipment is operated by two support staff and two co-faculties. This resource is supported, in part, by a peer-reviewed NCI Cancer Center Support Grant to the LCCC and modest user fees. Approximately 68 investigators utilize this facility annually. Proteomics Shared Resource: The Proteomics core is equipped to provide a broad spectrum of proteomics services to the research community. The services include technologies for the fractionation of complex protein mixtures coupled with mass spectrometry. The Proteomics Core Facility is equipped with a 4800 MALDI -TOF -TOF Mass Spectrometer (Applied Biosystems), a 4700 ABI MALDI-TOF-TOF mass spectrometer, Thermo Electron LTQ ion trap mass spectrometer connected to a nano-HPLC system (LC-Packings) and QSTAR Elite Hybrid LCMS/MS system, a nanoHPLC system online with a Probot MALDI spotter (Agilent). These instruments will provide you with a range of techniques to analyze different aspects of a fractionated protein sample. 
The Facility is also equipped with A complete set of 2D gel electrophoresis apparatus from Bio-Rad (IEF cell and Protean XL), and the DALT6 large format 2D electrophoresis system (Amersham Biosciences), high resolution densitometer G800, a PDQuest proteomics software (BioRad) for 2D gel image analysis as well as Dymension software from Syngene. The core provides two distinct mass spectrometry services, intact mass analysis (to identify masses of proteins/peptides in relatively pure solutions) and protein identification using peptide mass mapping (involving trypsin digestion of protein followed by mass spectrometry of the resulting peptide fragments). We routinely perform 2D gel electrophoresis for proteins from cell, serum or tissue lysates, image analysis for differential protein expression followed by protein identification using Mass Spectrometry. The core has also developed protocols to successfully identify proteins from Immunoprecipitation reactions. The core plans to upgrade the 2D electrophoresis by the introduction of DIGE technology, use robotics for spot picking and also test, develop and optimize protocols for Multidimensional Protein identification from reaction samples, non-radioactive differential protein labeling using the SILAC, ICAT or ITRAQ systems and also serum profiling studies. DEPARTMENT OF PSYCHIATRY The Department of Psychiatry, under the chair of Steven Epstein, M.D., consists of 66 full and part-time faculty members on site and 300 clinical faculty members. The department has a total of 43 offices located on campus in Kober-Cogan Hall. Faculty and staff engage in extensive research, clinical, and educational efforts throughout the university and beyond. The Qualitative Data Lab at Georgetown University Department of Psychiatry has the capacity for storing recorded data, and for processing and analyzing qualitative data. All recorded data are stored on secure password protected computer networks and CD-ROMs. 
The lab resources include digitizing software that can convert magnetic audiotape and videotape recordings into digital format. All recordings are digitized to mpeg format because of compression efficiency and compatibility with qualitative software. The primary qualitative software used in the lab is ATLAS.ti, which can be used to review recorded data, transcribe relevant portions of recordings, and conduct data searches. An important feature of ATLAS.ti is that it allows for the coding of not just text, but also audio and video data. The lab is staffed by research assistants who have been trained to digitize recordings, manage qualitative data and use the qualitative software. Research offices include space for the research team. The department also has three conference rooms available for meetings. The Psychiatry Research Conference room is wired to accommodate a PolyCom phone conferencing system. Two TV/VCRs and two laptops are available for presentations (LCD projectors are available from the GU audiovisual department). Equipment: Each member of the research team will have a computer available connected to a LaserJet printer. The computers are equipped with a modem and have a variety of software available for word processing and statistical analysis (SPSS, SAS). Mainframe services are available for statistical procedures that are unavailable on microcomputers. All computers are connected to the Internet. Three photocopy machines are available within the department along with three fax machines and an HP 6350 CXI with auto-feeder printer. GEORGETOWN UNIVERSITY MEDICAL CENTER RESOURCES General Clinical Research Center (GCRC). The objective of the GCRC program is to make available to medical scientists the resources that are necessary for the conduct of clinical research.
The General Clinical Research Center (GCRC) is funded by a grant from the National Institutes of Health (NIH) and offers the faculty of Georgetown University Medical Center and peer-reviewed, funded investigators from the surrounding District of Columbia hospitals the optimal environment in which to conduct clinical research. The GCRC does not fund specific research projects, but provides infrastructure and support in the form of inpatient beds, outpatient services, staff and core equipment necessary to conduct studies. The General Clinical Research Centers (GCRC) program of the NIH was established in 1960 to create and sustain specialized institutional resources in which clinical investigators can observe and study human physiology as well as study and treat disease with innovative approaches. The primary purpose of a GCRC is to provide the clinical research infrastructure to investigators who receive peer-reviewed primary research funding from the NIH and other components of the US Government. It can also be used to support other hypothesis-based research and can be available for industry-sponsored research at cost. The Clinical Research Center occupies the east wing of the seventh floor of the Main Hospital Building of the Georgetown University Medical Center. The GCRC will provide space and nursing and other technical staff at no cost. Laboratory costs are budgeted at cost. The Georgetown University Bioanalytical Center (BAC) within the GCRC is a chromatography lab located on the ground level of the Preclinical Sciences Building and occupies approximately 1500 square feet in rooms GD1 and GD3.
The BAC contains a core laboratory that is funded as part of the Georgetown University Clinical Research Center but is also available to investigators at the Medical Center, the University, as well as outside clients, on a fee-for-service basis. The laboratory is dedicated to the development, validation and application of bioanalytical methods in support of clinical, pharmacokinetic and pharmacogenetic studies as well as basic pharmacological research. The staff consists of experienced laboratory scientists, all of whom are
• Sample analysis from clinical studies
• Confirmation of mass and/or purity of products resulting from synthesis or in-vitro metabolism studies
• Chiral assays
• Separation and collection of chiral enantiomers
• Immunoassays
• Use of HPLC as sample clean-up for immunoassays
• Liquid chromatography with mass spectrometry
• High performance liquid chromatography with UV and fluorescence detection
• Gas chromatography with nitrogen, phosphorus, and flame ionization detection
• Capillary electrophoresis with UV and laser-induced detection
This laboratory maintains the equipment listed below:
• 5 HPLC systems with UV, fluorescence and electrochemical detectors (ThermoSeparations/Agilent)
• 2 Capillary electrophoresis (CE) systems with UV detectors (ABI/PE)
• 1 CE system with UV and laser-induced fluorescence (LIF) detectors (Biorad)
• API-3000 Mass Spectrometer (sciex) (Applied Biosystems)
• API-4000 Mass Spectrometer
• IMMULITE by Diagnostic Products Inc
Some of the systems are fully automated with autosamplers and on-line computer-based data collection. All necessary support equipment for the storage (three -80 °C and one -20 °C freezers) and preparation (balances, pH meters, centrifuges, solid phase extraction apparatus, etc.) of clinical samples is contained within the laboratory.
Phylomics® Patent Application: USPTO Application #: 20070259363
Inventors: H. Amri, M. Abu-Asab, and M. Chaouchi
Title: Phylogenetic analysis of mass spectrometry or gene array data for the diagnosis of physiological conditions
Abstract: A universal data-mining platform capable of analyzing mass spectrometry (MS) serum proteomic profiles and/or gene array data to produce biologically meaningful classification; i.e., group together biologically related specimens into clades. This platform utilizes the principles of phylogenetics, such as parsimony, to reveal susceptibility to cancer development (or other physiological or pathophysiological conditions), diagnosis and typing of cancer, identifying stages of cancer, as well as posttreatment evaluation. To place specimens into their corresponding clade(s), the invention utilizes two algorithms: a new data-mining parsing algorithm, and a publicly available phylogenetic algorithm (MIX).
By outgroup comparison (i.e., using a normal set as the standard reference), the parsing algorithm identifies under- and/or overexpressed gene values or, in the case of sera, (i) novel or (ii) vanished MS peaks, and peaks signifying (iii) up- or (iv) down-regulated proteins, and scores the variations as either derived (do not exist in the outgroup set) or ancestral (exist in the outgroup set); the derived is given a score of “1”, and the ancestral a score of “0”; these are called the polarized values. Furthermore, the shared derived characters that it identifies are potential biomarkers for cancers and other conditions and their subclasses. (end of abstract)
NIH Public Access Author Manuscript. Proteomics Clin Appl. Author manuscript; available in PMC 2008 May 5. Published in final edited form as: Proteomics Clin Appl. 2008 February; 2(2): 122–134.
Evolutionary medicine: A meaningful connection between omics, disease, and treatment
Mones Abu-Asab1, Mohamed Chaouchi2, and Hakima Amri2,*
1Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
2Department of Physiology and Biophysics, School of Medicine, Georgetown University, Washington, DC, USA
Abstract
The evolutionary nature of diseases requires that their omics be analyzed by evolution-compatible analytical tools such as parsimony phylogenetics in order to reveal common mutations and pathways’ modifications. Since the heterogeneity of the omics data renders some analytical tools such as phenetic clustering and Bayesian likelihood inefficient, a parsimony phylogenetic paradigm seems to connect between the omics and medicine. It offers a seamless, dynamic, predictive, and multidimensional analytical approach that reveals biological classes and disease ontogenies; its analysis can be translated into practice for early detection, diagnosis, biomarker identification, prognosis, and assessment of treatment.
Parsimony phylogenetics identifies classes of specimens, the clades, by their shared derived expressions, the synapomorphies, which are also the potential biomarkers for the classes that they delimit. Synapomorphies are determined through polarity assessment (ancestral vs. derived) of m/z or gene-expression values and parsimony analysis; this process also permits intra- and interplatform comparability and produces higher concordance between platforms. Furthermore, major trends in the data are also interpreted from the graphical representation of the data as a tree diagram termed a cladogram; it depicts directionality of change, identifies the transitional patterns from healthy to diseased, and can be developed into a predictive tool for early detection.
Keywords: Biomarkers; Cancer; Early detection; Evolution; Omics; Parsimony; Phylogenetics
Correspondence: Dr. Mones Abu-Asab, Laboratory of Pathology, National Cancer Institute, NIH, Bldg. 10/Rm 2A33, Bethesda, MD 20892, USA, E-mail: [email protected], Fax: +1-301-480-9197. * Additional corresponding author: Dr. Hakima Amri; E-mail: [email protected]. The authors have declared a conflict of interest. They will seek US patent rights for their UNIPAL algorithm.
1 Introduction
Evolution is the unifying theme of all biological disciplines, and all explanations of biological phenomena should be compatible with evolutionary principles. Medicine cannot be an exception. Yet, the vast majority of publications that incorporate recent advances in genomics and proteomics are devoid of evolutionary reasoning and analytical methods. Only recently have there been new calls for evolution in medicine in order to provide explanations for drug resistance in HIV and bacterial strains, autoimmune and degenerative diseases, as well as cancer typing and treatment [1-3]. Cancer development, progression, and maintenance are all evolutionary processes; they mirror similar evolutionary processes at the cellular and population levels in that they all involve genetic modifications, selective pressure, and clonal propagation [2,4,5]. Therefore, evolution-compatible methods of analysis have a potentially useful role in cancer studies and diagnosis as well [2,6,7].
Evolutionary medicine seeks to explain the nature of disease in light of evolutionary theory [8]. It views the physicalities of the human body as a result of millions of years of natural selection that present compromises between differentiation at all levels and vulnerabilities [9]. Invoking evolution to explain medical phenomena will contribute to our understanding of how evolution works in diseases and how to counter it with the proper treatment.
One of the earliest studies of disease etiology by evolutionary criteria was that of Sarnat and Netsky [5]. They described as “phylogenetic diseases” some of the degenerative and metabolic diseases that occurred in the derived structures of the mammalian brain. However, Azzone [4] attributed many diseases, such as cancer and autoimmunity, to mutations and their sustenance by natural selection, two processes that are at the crux of the evolutionary course of action. Since natural selective pressure is the main force determining diversity of living organisms and their state of health and disease, the data produced by omics (genomics, metabolomics, and proteomics) have to be analyzed in an evolutionarily compatible way in order to produce biologically meaningful interpretations [2]. The tool that can bridge the gap between the omics data avalanche and evolutionary medicine is phylogenetics [10-12]. Phylogenetics is an analytical paradigm based on the principles of evolution.
It has been employed by biologists in many disciplines, such as botany, microbiology, and zoology, to construct relationships in an evolutionary sense at all the levels of the systematic hierarchy and, more recently, the tree of life [13]. Applying phylogenetic analysis to the omics data creates a paradigm shift where the evolutionary meaning of the data is brought out and applied to produce natural class determination, biomarker recognition, and modeling of the evolutionary processes of disease development. Furthermore, phylogenetic analysis is the evolutionary path between the omics data and their application in various practical settings. As the flowchart of Fig. 1 shows, there are only a few steps leading from raw data to applications: evolutionary polarity assessment of data values, phylogenetic algorithmic analysis, and interpretation.
2 The omics need phylogenetics
As has become more evident recently, many of the problems in biomedical research will be solved not by producing more data but by new methods of analyzing the data [13]. Today’s omics data-producing machines are sophisticated, sensitive, and accurate, and in the absence of human errors, their output is reliable and reproducible [14]. However, the over-reliance on parametric statistics for data analysis has reduced the useful inferences of patterns within the data [15]. Inferring biological processes from data patterns is the main goal of bioinformatics [16], and a superior analytical tool has to be multidimensional. It must accurately reveal biological patterns, processes, and classes; possess high predictivity; be seamless and dynamic; be able to combine several large datasets from multiple sources; be suitable for intra- and interplatform comparability; produce higher interplatform concordance; and yield results that can be utilized for early detection of disease, diagnosis, prognosis, and assessment of treatment [2].
This review will demonstrate how phylogenetics appears to be the most suitable analytical tool to provide the multidimensionality we are seeking, and how it can translate the omics into a clinical tool.
3 Choosing between phylogenetics and phenetic clustering
There are two main schools of analytical thought in the bioinformatics of omics: the phenetic and the phylogenetic. The two differ on the relationship between the data values and the classification [17]. The phenetic school is predominant in the analysis of microarray expression data, where it utilizes a distance matrix to produce dimensionless levels of difference [18]; the grouping of specimens is solely based on overall raw similarity without any evolutionary connotation (for an elaborate comparison of phenetic methods as applied to microarrays see Planet et al. [19]). On the other hand, the phylogenetic school attempts to reconstruct the biological processes from the known patterns within the data in an evolutionarily meaningful sense. It sorts out the data into derived and ancestral states and uses only the derived ones to group specimens into a hierarchical model [10,11]. Because the two approaches differ conceptually and algorithmically, they seldom produce total congruency in their results (Fig. 2).
Phylogenetics appears to be very suitable for studying diseases with fast-arising mutations, such as cancer, because it can modulate very recent divergence from normal conditions [20]. A phylogenetic analysis produces a hierarchical hypothesis of relationships (i.e., classification) among specimens that aims to reflect relatedness based on shared mutations and altered pathways.
It reveals novel states of gene and protein expressions (these can be potential biomarkers, see below) and utilizes their distribution patterns among specimens for modeling their relatedness (i.e., it groups specimens on the basis of their shared derived expressions but not overall similarity). Furthermore, it elucidates the direction of change among specimens that leads to their molecular and cellular diversity. The latter point is better illustrated with a graphical tree termed a cladogram, where the specimens with the highest number of novel expressions are located on the upper part of the tree (Fig. 2A).
To illustrate many of the fundamental differences between the two schools, we will examine one dataset analyzed by both methods. Figure 2 shows a phylogenetic cladogram (A) and a phenetic dendrogram (B), both based on the same dataset [21]. To understand why the two trees have different explanations for the same data, we need to discuss the theoretical difference that they represent. Whereas the cladogram’s hierarchy reflects the similarity between the specimens as based only on their shared derived expression values (data-based), the dendrogram uses Pearson’s similarity coefficient of raw data, both ancestral and derived expression values (specimen-based). Pearson’s coefficient measures the correlation, r, between the specimens and produces a matrix of pairwise similarity ratios between the specimens; the average similarities are then calculated between groups of specimens to plot the dendrogram. Each node of the cladogram is based on the derived expression values that are shared by the specimens located above the node, and the segments’ lengths bear no evolutionary significance (unless a molecular clock is assumed as in genomic-distance data). However, the relative lengths of the dendrogram’s segments are indicative of the percentages of shared average similarity between the specimens or groups of specimens.
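The phenetic starting point described above, a matrix of pairwise Pearson correlations computed on raw values, can be sketched as follows. This is a minimal illustration only: the function names and the three made-up specimen profiles are assumptions of the sketch, not data or code from the study cited as [21].

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation r between two equal-length raw profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # r is undefined (division by zero) if either profile is constant.
    return cov / (spread_x * spread_y)

def similarity_matrix(specimens):
    """Square matrix of pairwise r values; rows and columns are specimens."""
    return [[pearson_r(a, b) for b in specimens] for a in specimens]

# Hypothetical raw values: 3 specimens x 4 data points.
raw = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
sim = similarity_matrix(raw)
```

Note that every data point contributes to r, whether or not its value lies inside the normal range; this is the sense in which the dendrogram is built from both ancestral and derived values.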
Furthermore, among the significant differences between the two schools is that a phylogenetic analysis lessens the adverse effect of homoplasy while the phenetic does not [2,17]. Homoplasy is similarity due to convergence, parallelism, and reversal, all of which are evolutionary phenomena. Convergence occurs when two or more specimens have different developmental pathways for a homologous character state; parallelism is independently acquiring similar non-homologous states; and reversal is reverting to an ancestral state from a derived state. Homoplasies have a more detrimental effect on the phenetic analysis than on the phylogenetic because, in the phenetic, they receive the same weight as every other similarity, while in the phylogenetic they compete against all other hypotheses of character distributions to generate the most plausible explanation of the data [22].
When considering information content, transmission, and retrieval, a phylogenetic classification is the most effective and efficient; it allows the storage of data in the smallest size diagram [17]. For example, because of its hierarchical nature, the cladogram indicates the direction of change from low to high, whereas the dendrogram does not. This characteristic of a cladogram gives significance to a specimen’s position on the cladogram and the arrangements of branches. As we are going to see below, this allows the extrapolation from the cladogram to a clinical setting.
4 The omics meet phylogenetics
In the field of phylogenetics, a plethora of publications has accumulated during the last 30 years in which phylogeneticists differed as to which approach is the optimum one for data analysis. There are three widely used methods to carry out phylogenetic analysis on omics data; these are Bayesian, likelihood, and parsimony.
The first two methods are statistically based and related in that they require an explicit model of evolution, such as a homogeneous rate of mutations, while parsimony is a non-statistical method that uses the minimum number of steps to explain the data. The three methods claim to produce the best hypothesis of relationships among a group of specimens [15,19]. The maximum likelihood method was devised by Felsenstein [23] to calculate the maximum likelihood function of a tree by incorporating specific assumptions (such as Markovian evolution and Poisson substitution) and branch lengths (times and mutation rates combined). The Bayesian approaches are derived from the likelihood method to measure the maximum posterior probability of individual trees by a sampling mechanism that incorporates branch lengths, substitution models, and their prior distribution [15].
When it concerns omics data, the choice of the analytical method to carry out a phylogenetic analysis is based on the optimum hypothesis of character states’ distribution with highest fidelity to the data matrix, and on obtaining the sought-after information, namely biomarkers, altered expressions and pathways, as well as disease classes (the clades). From a practical point of view, there is an obvious conflict between our stated goals and the first two statistical-based methods; their definitions of clades are irrelevant to medical interpretations, and their trees do not allow the tracing back of derived states, which are the potential biomarkers. Furthermore, the Bayesian and maximum likelihood approaches may be inefficient in dealing with heterogeneous rates of mutations and large numbers of specimens [15,24], could mistakenly attribute high probability to ambiguous groups, and may erroneously separate true sister groups because of their unequally long branches [25].
Maximum parsimony requires estimating fewer parameters than maximum likelihood [26], and functions better than Bayesian and likelihood when data are heterogeneous (i.e., have various rates of mutation such as in cancer) [24,27,28]. Currently, we lack any predictive model for most of the current diseases studied by omics to fulfill the parameters needed for a Bayesian or likelihood analysis. The question of whether some diseases like cancer follow a developmental model is still unanswered, although it is assumed that the specimens of a disease share common pathway aberrations. A recent phylogenetic analysis of MS proteomes of three cancers (ovarian, pancreatic, and prostate) has shown the probability of cancerous developmental models that transcend the three types [2]. Furthermore, data analysis of genomic and proteomic developments over a few clonal generations, where the rate of change is fast and heterogeneous, is more suitable for parsimony analysis than for Bayesian or likelihood.
By identifying uniquely shared derived states (the synapomorphies), parsimony analysis uncovers all the potential biomarkers within the dataset. It also defines natural classes, the clades, which are circumscribed by the synapomorphies (Fig. 2A). The most parsimonious tree it produces, the cladogram, maintains the same data pattern as the data matrix; therefore, the fidelity of the cladogram’s character distribution to the original data is verifiable and can be extended to a clinical setting.
The state of currently used analytical tools for omics data places the scientists between two extremes, phenetic clustering and Bayesian/likelihood phylogenetics, without getting the expected rewards from either. Neither method seems to be the suitable paradigm for the omics era.
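To make "the minimum number of steps to explain the data" concrete, the sketch below counts the state changes that a given tree requires for a binary (0/1) matrix, using Fitch's counting procedure. It is an illustration under stated assumptions: it only scores trees that are handed to it and does not search tree space as a parsimony program does, and the specimen names, the tiny matrix, and the nested-tuple tree encoding are hypothetical.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for one binary character.

    tree:   a specimen name (leaf) or a 2-tuple of subtrees.
    states: dict mapping specimen name -> 0 or 1 for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no change needed here.
        return common, left_steps + right_steps
    # Children disagree: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, matrix):
    """Total steps over all characters (columns) of a polarized matrix."""
    names = list(matrix)
    n_chars = len(matrix[names[0]])
    total = 0
    for i in range(n_chars):
        column = {name: matrix[name][i] for name in names}
        total += fitch_steps(tree, column)[1]
    return total

# Hypothetical polarized matrix: N is an all-ancestral outgroup.
matrix = {
    "N": [0, 0, 0],
    "A": [1, 1, 0],
    "B": [1, 1, 0],
    "C": [1, 0, 1],
}
tree_good = ("N", (("A", "B"), "C"))  # groups A and B, which share character 2
tree_bad = ("N", (("A", "C"), "B"))   # splits A and B apart

length_good = tree_length(tree_good, matrix)  # 3 steps
length_bad = tree_length(tree_bad, matrix)    # 4 steps
```

The tree that keeps A and B together needs fewer steps, so it is the more parsimonious hypothesis for this toy matrix, and the character the two share is the one that supports their grouping.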
However, maximum parsimony appears to be the optimal paradigm for the phylogenetic analysis of the omics data [15,28].
5 Predictivity of a parsimony phylogenetic analysis
A major problem that has characterized many omics-based studies is the provisional nature of their conclusions [29], also known in phylogenetics as low predictivity or lack of it. For a set of specimens, there is only one correct hypothesis of relationships that is based on their profiles; its predictive power is directly correlated with the robustness of the hypothesis, that is, its capacity to fulfill its predictions. It is well established that a phylogenetic classification has a higher predictivity than a phenetic one [17]. Predictivity here is not the same as class prediction [30]. While class prediction deals with assigning a specimen to a known class, predictivity is the ability to list or predict a specimen’s characteristics when its class becomes known (i.e., when biomarkers or an algorithmic analysis assigns the specimen to a class on the basis of its omics profile). Therefore, predictivity is a statement of accuracy on the hypothesis of relationships and its class definitions.
Predictivity is important when the hypothesis of relationships will be extended to and implemented in a clinical setting, i.e., translating the phylogenetic classification into practice. For example, high predictivity is needed in cancer diagnosis and prognosis, where the classification of cancers has been mostly based on microscopy and a few immunohistochemistry markers. Furthermore, tumors with similar histopathology have shown divergent clinical courses and outcomes [6,30]. Applying evolutionary parsimony phylogenetics to cancer omics will produce a cancer classification that encompasses all types of available data and is expected to have the highest degree of predictivity.
Therefore, having a predictive system of cancer classification brings higher objectivity to diagnosis and prognosis, as well as robust biomarker identification.
6 Shared derived patterns: synapomorphies
6.1 General remarks
Parsimony phylogenetics is based on the principle that shared derived patterns, the synapomorphies, can circumscribe natural groups called clades (s. clade). The shared derived patterns in the omics context may constitute a number of novel changes that occur in the specimens under study. These encompass all genetic mutations, novel and lost proteins, up- and down-regulated proteins, over- and under-expressed RNA, as well as dichotomously asynchronous (DA) expression patterns of proteins and genes (see Section 8). A clade’s members share one or more of these synapomorphies. For example, if only the specimens of pancreatic cancer share a unique mutation that is not shared by any normal specimens, then this mutation constitutes a synapomorphy, and specimens carrying the mutation are members of a clade. However, as explained below, there is an exact procedure for determining what constitutes a synapomorphy.
6.2 Evolutionary polarity assessment: identifying synapomorphies
Major omics techniques, such as MS proteomics and microarray, produce data in absolute values that impose limitations on their use and interpretation due mainly to inconsistency in reproducibility [14,29]. These data are usually utilized as similarity matrices for cluster analyses or probed with custom algorithms in search of novel values. There are major drawbacks to the direct use of absolute values in an analysis; prominent among them is the limitation on comparability within and between platforms as well as the lack of directionality within this form of data. Even the conversion of the data to a similarity matrix by t and F statistics, as well as fold-change, results in significant loss of meaningful information [19,31].
The absolute data can be transformed into discrete values with evolutionary polarity assessment (EPA) [2]. In two-state and multistate characters, EPA is used to determine the proper evolutionary sequence of states, and consistent with this purpose, it is used here to sort out an absolute value into one of two states: ancestral or derived. Putting this in a mathematical context, the derived state is given the value 1 and the ancestral state the value 0. The EPA process transforms the initial data into discrete binary states of 0s and 1s. To implement EPA on a dataset, the experimental design should include a control subset of specimens; for example, when studying a cancer type, the control specimens should be healthy non-cancerous specimens. The control specimens will be used as the outgroup against which the values of the experimental specimens (the ingroup) will be compared. Table 1 illustrates how the process of polarity assessment is carried out for every m/z or gene-expression value of the experimental specimens. First, for every m/z point, the minimum and maximum values of the controls (the range) are determined; if the value of an experimental specimen falls within the controls’ range, then it is considered ancestral and is assigned the value 0; if it falls outside the range, it is said to be derived and is assigned a value of 1. Thus, the new transformed matrix is a polarized matrix of 0s and 1s.
It is clear here that the number of control specimens used for an outgroup comparison is an important criterion to correctly polarize the data and eliminate noise. For an analysis to be meaningful and provide high predictivity, the number of normal specimens that incorporates the maximum variation per population should be established [32].
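The range-based polarity assessment described above can be sketched in a few lines. This is an illustrative sketch, not the authors' UNIPAL program: the function name, the made-up intensity values, and the choice to treat a value exactly on the boundary of the controls' range as ancestral are all assumptions.

```python
def polarize(controls, experimental):
    """Score each experimental value as ancestral (0) or derived (1).

    controls:     list of control (outgroup) profiles, one list per specimen.
    experimental: dict mapping specimen name -> profile of the same length.
    """
    n_points = len(controls[0])
    # The controls' range (minimum and maximum) at every data point.
    low = [min(profile[i] for profile in controls) for i in range(n_points)]
    high = [max(profile[i] for profile in controls) for i in range(n_points)]
    return {
        name: [0 if low[i] <= value <= high[i] else 1  # boundary counts as ancestral (assumption)
               for i, value in enumerate(profile)]
        for name, profile in experimental.items()
    }

# Hypothetical intensities at three m/z points.
controls = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 3.5],
    [1.5, 5.5, 2.5],
]
experimental = {
    "T1": [1.2, 7.0, 2.0],
    "T2": [3.0, 5.5, 3.0],
}
polarized = polarize(controls, experimental)
# polarized == {"T1": [0, 1, 1], "T2": [1, 0, 0]}
```

Because each value is judged individually against the controls' range, adding more control specimens can only widen the range, which is one way to see why the number of controls affects how much noise is scored as derived.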
An added advantage of the EPA data-transforming process is that it diminishes data inconsistency, a difficult-to-control noise that stems from several, mostly uncontrollable, factors during the experiment and data collection. Noise reduction by EPA relies on using control specimens in the experiment as the outgroup for polarity assessment.
There are theoretical and practical implications to transforming the data through EPA. For every specimen, the 1s represent the novel change that does not exist in the control outgroup and, therefore, may be indicative of a genetic mutation or protein modification depending on the data at hand. The 1s are called apomorphies (s. apomorphy); if all the experimental specimens have 1 for the same data point, then this data point is a shared derived state and is termed a synapomorphy. Therefore, all synapomorphies are potential biomarkers (see Section 7). The 0s and 1s of a specimen make up its profile of ancestral and derived states. This profile determines the relatedness of the specimen to other specimens through the apomorphies they share, the synapomorphies. Therefore, class membership is determined by the competing number of synapomorphies among the specimens on the basis of maximum parsimony.
There are several new ways of utilizing the polarized data in analysis that are not attainable with the original absolute data, such as pooling of datasets as well as intra- and interplatform comparability. Polarized data from one experiment can be directly subjected to an algorithmic analysis, or several polarized datasets from separate experiments with different specimens can be pooled together to produce an inclusive analysis. Furthermore, genomic, metabolomic, and proteomic data for the same set of specimens can be polarized separately, pooled in one matrix, and analyzed together to produce an inclusive analysis based on the three sets.
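Two of the operations described above, finding data points that are derived in every member of a group and pooling separately polarized datasets for the same specimens, might look like the following. It is a minimal sketch with hypothetical names and made-up 0/1 profiles; it covers only the same-specimens case of pooling, where columns from different platforms are placed side by side.

```python
def shared_derived(polarized, members):
    """Indices of data points scored 1 in every listed specimen."""
    n_points = len(polarized[members[0]])
    return [i for i in range(n_points)
            if all(polarized[name][i] == 1 for name in members)]

def pool(*datasets):
    """Join polarized profiles of the same specimens column-wise."""
    names = list(datasets[0])
    for data in datasets[1:]:
        if set(data) != set(names):
            raise ValueError("all datasets must cover the same specimens")
    return {name: [state for data in datasets for state in data[name]]
            for name in names}

# Hypothetical polarized profiles from two platforms, same two specimens.
proteome = {"T1": [1, 0, 1], "T2": [1, 1, 0]}
microarray = {"T1": [0, 1], "T2": [1, 1]}

pooled = pool(proteome, microarray)
# pooled == {"T1": [1, 0, 1, 0, 1], "T2": [1, 1, 0, 1, 1]}
candidates = shared_derived(pooled, ["T1", "T2"])
# candidates == [0, 4]
```

Pooling works here because every column carries the same kind of information (0 or 1) regardless of the platform that produced it; the original intensity scales never meet.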
Table 1 shows an example of pooling with two small real datasets. The number of polarized datasets that can be pooled together is theoretically limitless. This type of pooling is possible because polarized datasets have equal weight, since each identifies the apomorphies of its specimens by discrete rather than absolute values.
The hypothetical example of Table 1 illustrates the pooling of two datasets, MS proteome and microarray, for the same group of specimens. Each set was polarized using its own respective set of controls, and then the two polarized sets (right side of A and B) were pooled and analyzed by a parsimony program. Although each of the two polarized datasets produced multiple equally parsimonious cladograms, the inclusive matrix produced one most parsimonious cladogram (C).
Using the data ranges of the control specimens provides higher stringency than using the statistical means (see also Section 8 on complex patterns below). By utilizing the controls’ range as an ancestral criterion, every data value is evaluated individually to determine its evolutionary polarity; the ones falling within the controls’ range are assigned an ancestral status. Using the statistical means of the experimental specimens by averaging their values does not exclude the values that fall within the controls’ range, distorts the significance of their distribution, and prevents their tracking in the analysis. Furthermore, statistical means misrepresent data points that violate normal distribution, namely those with DA distribution (Fig. 3).
7 Evolutionary definition of biomarkers as synapomorphies
As biomarker discovery is a highly sought-after criterion in the omics data, one will favor the analytical method that makes this process accurate, meaningful, and achievable.
Parsimony phylogenetic analysis differs from likelihood and phenetic methods in that it maintains the identity of data points through the computational process; thus, it allows the identification of every significant shared derived value that defines the natural groups of the hierarchical classification, the clades. To carry out a parsimony phylogenetic analysis on a set of data, EPA of the data points is needed to determine whether a gene expression or m/z value is derived or ancestral. A shared derived state, a synapomorphy, among a number of diseased specimens is a potential biomarker for the group. Equating biomarkers with synapomorphies has an evolutionary connotation because it defines a natural group of specimens sharing similar expression and declares their ontogenic relatedness. This logical definition of what constitutes a biomarker requires a clear declaration that the biomarker is derived in relation to the controls and shared by all the members of its clade. A synapomorphic biomarker can be supported by other synapomorphies that circumscribe the same clade; the higher the number of synapomorphies, the higher the confidence in the predictivity of the selected biomarker. In addition, the occurrence of several synapomorphies for a clade offers more choices for selecting the optimum biomarker.
8 Incorporating complex patterns of omics
Proteomic and genomic data contain expression patterns that cannot simply be reduced to a statistical abstraction for data analysis. Such patterns are misrepresented when transformed into means, compared in fold-changes, or excluded from the analysis due to their complexity. One such pattern that is pervasive in cancer specimens is the DA distribution of gene expressions and proteins in a group of specimens [33].
The term dichotomous refers to a two-peak distribution with one peak above and the other below the range of control/normal specimens, while asynchronous denotes deviation from the normal range (Fig. 3). DA seems to be a population phenomenon that is only noticeable when a good number of specimens are included in the study. Statistical methods of analysis usually average the values of the specimens in the study in order to carry out comparisons by either t- and F-statistics or fold-change, and therefore misrepresent and overlook any meaningful interpretation of the distribution pattern of the DA expressions. The presence of several to many DA gene expressions and proteins in a set of specimens underscores a complex pattern of pathway diversity that is difficult to model or classify in a simple phenetic clustering, but can be dealt with effectively and meaningfully in a parsimony phylogenetic context. Evolutionary polarity assessment for this phenomenon should consider it a multistate character and assign different symbols for the values above and below the normals’ range; a parsimony phylogenetic analysis will then deal with each of these states as independent from one another. For a discussion and examples of multistate coding and analysis see Felsenstein’s instructions for using PHYLIP [34]. Although the DA pattern is a known phenomenon to scientists, and has been reported in tissues and cell lines [35,36], there has been no meaningful explanation for such variation or analytical considerations, and its evolutionary implications are still unknown. However, it may be related to the presence of several developmental pathways of cancer and other diseases [2].
9 Parsimony phylogenetic analysis of omics: an example
Currently available computing power permits the processing of large matrices with relative speed.
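Returning briefly to the multistate treatment of DA patterns described in Section 8, the sketch below scores each value as ancestral, above the controls' range, or below it. The state labels and example values are hypothetical. The second function, which expands each data point into two binary columns (one for "above" and one for "below") so that the two derived states are treated as independent, is one possible recoding chosen for this sketch and is not a procedure stated in the text.

```python
def polarize_da(controls, profile):
    """Label each value as 'ancestral', 'above', or 'below' the controls' range."""
    n_points = len(profile)
    low = [min(c[i] for c in controls) for i in range(n_points)]
    high = [max(c[i] for c in controls) for i in range(n_points)]
    states = []
    for i, value in enumerate(profile):
        if value > high[i]:
            states.append("above")
        elif value < low[i]:
            states.append("below")
        else:
            states.append("ancestral")
    return states

def to_binary_pairs(states):
    """Recode each three-state point as two 0/1 columns: (above, below)."""
    columns = []
    for state in states:
        columns.append(1 if state == "above" else 0)
        columns.append(1 if state == "below" else 0)
    return columns

# Hypothetical values at two data points; the first shows a DA pattern.
controls = [[4.0, 10.0], [6.0, 12.0]]
t1_states = polarize_da(controls, [9.0, 11.0])   # ['above', 'ancestral']
t2_states = polarize_da(controls, [1.0, 11.0])   # ['below', 'ancestral']

t1_binary = to_binary_pairs(t1_states)           # [1, 0, 0, 0]
t2_binary = to_binary_pairs(t2_states)           # [0, 1, 0, 0]
```

In this toy case the mean of the two experimental values at the first point (5.0) falls inside the controls' range, which illustrates how averaging can hide a DA pattern that the state-by-state scoring keeps visible.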
It is now possible to run a large data matrix with hundreds of specimens and tens of thousands of data points per specimen within a reasonable time [15]. During our experimentation with polarized proteomic data matrices on the parsimony program MIX, we managed to run a 23 × 10^6-point matrix (180 specimens) in 18 h on a 3.2 GHz CPU.
We have selected the parsimony program MIX of Felsenstein [34] to carry out a maximum parsimony phylogenetic analysis because of its speed, reliability, available settings, and output format (MIX is freely available from http://evolution.gs.washington.edu/phylip.html). MIX is part of the PHYLIP package, which contains a number of other applications that can be utilized for other phylogenetic analyses of the same dataset, such as likelihood and distance. All of these programs are controlled by a menu that allows options for the analysis.
There are only a few examples of phylogenetic analysis of omics data [2,6,7,37]. One of the practical problems that limits many researchers’ usage of phylogenetic programs is the transformation of the raw data to an input format that is acceptable to the program. Figure 4 presents an example of a parsimony phylogenetic analysis of MS SELDI-TOF proteomic serum data of prostate cancer patients and healthy men. The normalized raw data was polarized according to the polarity assessment method described in Section 6.2 and explained by an example in Table 1. However, this process was automated and carried out here by a computer program written by the authors (UNIPAL, Universal Polarity Assessment Algorithm) [2].
UNIPAL transformed the original raw data into a matrix of 0s and 1s for all the cancerous specimens by using the normal specimens’ range for every m/z point as the baseline for polarity assessment (outgroup comparison). Then, the newly produced polarized matrix was processed with MIX to run a Wagner maximum parsimony.
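Since converting data into an acceptable input format is named above as a practical obstacle, the sketch below writes a polarized 0/1 matrix in the plain-text layout that PHYLIP's discrete-character programs read: a first line with the number of specimens and the number of characters, then one line per specimen with the name padded to ten characters followed by its states. The function name and example matrix are hypothetical, and this is not the UNIPAL output routine; program options (such as outgroup or ancestral-state settings) are left to the MIX menu and are not handled here.

```python
def to_phylip_discrete(polarized):
    """Format a polarized 0/1 matrix as PHYLIP discrete-character input text."""
    names = list(polarized)
    n_chars = len(polarized[names[0]])
    # PHYLIP names occupy exactly ten characters; truncate and pad with blanks.
    padded = [name[:10].ljust(10) for name in names]
    if len(set(padded)) != len(padded):
        raise ValueError("specimen names are not unique after truncation to 10 characters")
    lines = [f"{len(names)} {n_chars}"]
    for name, label in zip(names, padded):
        states = polarized[name]
        if len(states) != n_chars:
            raise ValueError(f"{name} has {len(states)} states, expected {n_chars}")
        lines.append(label + "".join(str(s) for s in states))
    return "\n".join(lines) + "\n"

# Hypothetical polarized matrix for two specimens and three data points.
polarized = {"T1": [0, 1, 1], "T2": [1, 0, 0]}
infile_text = to_phylip_discrete(polarized)

# To hand the matrix to the program, the text would be saved to a file, e.g.:
# with open("infile", "w") as handle:
#     handle.write(infile_text)
```

The check on truncated names is there because long specimen identifiers that differ only after the tenth character would otherwise collapse into the same label.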
MIX produces two output files, the Outfile and the Treefile; both are in text format and can be read by any text reader program. The Outfile shows all the equally parsimonious trees that the data support in a graphical format, and a list of all the synapomorphies supporting every node can be produced if one invokes this option. The Treefile lists the same trees as the Outfile in a text format that can be used to draw and edit the trees in other programs such as TreeView [38].

MIX produced one most parsimonious cladogram for this dataset, which is based on 36 prostate cancer patients and 49 healthy men (Fig. 4). One most parsimonious cladogram means that MIX found only one tree that has the smallest number of steps to explain the relationship between the specimens. With other datasets, MIX may produce several equally parsimonious cladograms with some variation in minor branches. An interpretation of the topology of the cladogram and what it reveals about the trends in the dataset is given below.

10 The structure of an omics cladogram

The cladogram is the graphical representation of the hypothesized hierarchical relationships among the specimens that define classes of specimens. The tip of each line of the cladogram denotes a specimen. The cladogram is the most efficient summary of the information contained in the raw data [17]. Each node on the cladogram is justified by the shared derived state(s) among the specimens of one of the segments.

The topology of the cladogram also conveys general trends within the data that are not obvious from other types of analysis [2]. Our own analyses of several MS proteomic and microarray datasets have indicated that there are three distinct sections of the cladogram: the basal, the middle, and the upper.
The basal section contains most of the normal specimens, the middle section has the specimens that are transitional between normal and cancer, and the upper section has the cancerous ones. Figure 4 shows a parsimonious cladogram of MS proteomic serum specimens that were taken from healthy and diseased men. Its upper section has a dichotomy defining two major clades of prostate cancer that are more or less equal in size. The basal section is restricted to healthy specimens; it has a well-defined large clade encompassing the majority of specimens, one minor clade below the large one, and a few single-specimen clades at the bottom. The middle section has mostly single-specimen clades in tandem, of both normal and cancerous specimens; the normal clades occupy its lower part and the cancerous clades its upper part. A similar cladogram topology exists in all the large genomic and proteomic datasets that we have analyzed thus far [2]. This makes the cladogram the only tool thus far that could identify the transitional patterns from healthy to cancerous, and possibly renders it a predictive tool for early disease detection.

11 Testing the congruence of omics’ data

Incongruity of omics data has become a topic of serious debate [14,29], and the field is in need of a robust method for testing congruence. The parsimony phylogenetic analysis as described here offers an evolutionary approach for testing the concordance of datasets. In addition to carrying out inclusive analysis by pooling multiple omics datasets, several other data processes are possible under this model. Although our presentation and discussion have focused thus far on examples of high-throughput data experiments, the approach outlined here is also applicable to many other types of data, such as 2-D gels as well as chromosomal and genomic data. As long as data polarity can be determined, a parsimony analysis is most likely achievable.
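Pooling polarized datasets for the same specimens amounts to joining their character strings specimen by specimen. The sketch below is a minimal illustration, assuming each dataset has already been polarized to 0/1 and is keyed by specimen name. The function name `pool_polarized` is hypothetical, and the values are the polarized proteomic (A) and gene-expression (B) rows of Table 1.

```python
# Sketch of pooling polarized matrices from two platforms for the same specimens.
# `pool_polarized` is a hypothetical helper, not part of PHYLIP or UNIPAL.

PROTEOMIC = {  # polarized values, Table 1(A)
    "F": "1101010101",
    "G": "0111011101",
    "H": "1111111101",
    "I": "0110001101",
    "J": "1110000000",
}
GENE_EXPRESSION = {  # polarized values, Table 1(B)
    "F": "1111111111",
    "G": "0101010000",
    "H": "0111110101",
    "I": "1111000000",
    "J": "0000000100",
}


def pool_polarized(*datasets):
    """Concatenate the 0/1 character strings of each specimen across datasets."""
    specimens = list(datasets[0])
    for data in datasets[1:]:
        # Pooling only makes sense when every dataset covers the same specimens.
        if set(data) != set(specimens):
            raise ValueError("all datasets must cover the same specimens")
    return {name: "".join(data[name] for data in datasets) for name in specimens}


pooled = pool_polarized(PROTEOMIC, GENE_EXPRESSION)
for name, states in pooled.items():
    print(name, states)
```

Because polarization reduces both platforms to the same discrete states, the pooled matrix can be analyzed as a single set of 20 characters per specimen.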
Interplatform comparability is attainable here at two levels: first, by testing the congruence of the data from two or more sources for the same set of specimens (for example, does proteomic data produce the same classification as genomic data?); second, by testing the congruence of the synapomorphies (the potential biomarkers) among different sets of specimens. Having multiple datasets for the same specimens allows testing of the data accuracy and the robustness of the classification hypothesis [20,39]. Small sample size and variable methods of analysis are two chronic problems in published studies and experimental designs, and they prevent direct comparisons of results and conclusions. However, applying EPA and parsimony analysis makes it possible to test the congruity of an experiment at the two levels mentioned above.

The phylogenetic model produced higher congruence in the synapomorphies of two separate sets of specimens from the same tissue type. We tested the concordance of two published studies [21,40] on uterine fibroids (leiomyomas) by comparing their two lists of synapomorphies, and found 62% concordance in synapomorphies despite the variation in the number of probes between the two datasets, a marked improvement over the 13% concordance between the published statistical gene lists of the two studies.

Because of its hierarchical nature, phylogenetic classification makes it possible to test gene linkage and specific alterations to pathways. The path of synapomorphies from the base of the cladogram to its tip is a sequential developmental map of the successional events that produce the different stages of the disease and the diversity of its specimens.
These synapomorphies are the shared derived alterations that we need to identify, since they will elucidate the disease etiology and are the biomarkers of its various stages.

12 Translating phylogenetic analysis of omics into clinical practice

The practical aspect of phylogenetic analysis can be realized in various ways: better diagnosis of diseases and disorders through evolutionary classification that reflects the real ontogeny and phylogeny of disease, better treatment by fine targeting of pathways, and better assessment of health status from faster and cost-effective omics data analysis. The following theoretical clinical scenarios illustrate the potential applications of parsimony phylogenetic analysis of omics in a clinical setting.

Scenario A: Routine health assessment

The health status of an individual can be assessed during a routine checkup from a small blood specimen (<0.5 mL) for early detection of degenerative diseases and cancer. The serum fraction is submitted for proteomic MS analysis, and the spectra are analyzed using parsimony phylogenetics against the serum of control specimens (healthy and diseased). The location of this individual on the cladogram (within the healthy, diseased, or transitional clades, Fig. 5) determines the health status (healthy, diseased, or transitioning from healthy to diseased). A specimen from a healthy individual assembles within the healthy clades.

Scenario B: Early detection and prevention

Individuals located within the transitional clade, nested between the healthy and cancer specimens (in this example, Fig. 5), are at risk of developing cancer. These individuals are accumulating mutations that make them susceptible, but they have not yet reached clinical manifestation. In the evolutionary medical paradigm offered by the parsimony phylogenetic analysis, this person would be “at-risk” of developing disease/cancer.
Preventive medicine could play a major role in this case.

Scenario C: Diagnosis

If the phylogenetic analysis places the individual’s proteomic specimen within the cancer clades (Fig. 5), then the patient is a cancer carrier. We have demonstrated before, but have not yet validated, that three cancer types (ovarian, pancreatic, and prostate) each produced its own clades separate from the other two; it is therefore possible in a comprehensive analysis to place the patient within the respective type of cancer [2].

Scenario D: Post-treatment evaluation and prognosis

Depending on the position of the patient’s proteomic specimen within the cancerous clades (basal, middle, or terminal) of the analysis cladogram, the clinical stage of the cancer can be determined. The staging here is based on the derived mutations that the patient carries (the apomorphies). After a course of treatment (chemotherapy, radiation, surgery), the patient’s progress can be evaluated by a proteomic-phylogenetic analysis of their serum. If the location of the patient’s specimen on the analysis cladogram has moved from the cancerous clades to the normal ones, then the treatment has succeeded. Follow-up and monitoring can be carried out periodically by this minimally invasive and cost-effective method.

13 Conclusions

Evolutionary analysis and interpretation of omics data offer a unifying paradigm for the various types of data and provide a multidimensional application of the analysis in a medical context. Cellular processes involved in disease development recapitulate evolutionary processes; they involve genetic modifications, selective pressure, and clonal propagation. Furthermore, diseases are currently assumed to be natural classes and subclasses, each having its own unique aberrations in developmental pathways.
Thus, employing evolutionary polarity assessment to sort out uniquely derived omics states, coupled with parsimony phylogenetic analysis, seems to provide a predictive, seamless, and dynamic evolutionary classification of the specimens that accurately reveals biological classes, patterns, and processes. This parsimonious paradigm is also capable of combining several large datasets from multiple sources for inclusive analyses, produces higher interplatform concordance, and offers intra- and interplatform comparability. Additionally, a parsimonious cladogram reveals the directionality of change within a set of specimens, and could be utilized for early detection, diagnosis, prognosis, assessment of treatment, and biomarker identification. The parsimony phylogenetic approach could also serve as the basis for the individualized medicine of the 21st century.

References

1. Nesse RM, Stearns SC, Omenn GS. Medicine needs evolution. Science 2006;311:1071. [PubMed: 16497889]
2. Abu-Asab M, Chaouchi M, Amri H. Phyloproteomics: what phylogenetic analysis reveals about serum proteomics. J Proteome Res 2006;5:2236–2240. [PubMed: 16944935]
3. Shackney SE, Silverman JF. Molecular evolutionary patterns in breast cancer. Adv Anat Pathol 2003;10:278–290. [PubMed: 12973049]
4. Azzone GF. The nature of diseases: evolutionary, thermodynamic and historical aspects. Hist Philos Life Sci 1996;18:83–106. [PubMed: 8940904]
5. Sarnat HB, Netsky MG. Hypothesis: Phylogenetic diseases of the nervous system. Can J Neurol Sci 1984;11:29–33. [PubMed: 6704791]
6. Pennington G, Smith CA, Shackney S, Schwartz R. Expectation-maximization method for reconstructing tumor phylogenies from single-cell data. Comput Syst Bioinformatics Conf 2006:371–380. [PubMed: 17369656]
7. Desper R, Khan J, Schaffer AA. Tumor classification using phylogenetic methods on expression data. J Theor Biol 2004;228:477–496. [PubMed: 15178197]
8. Nesse RM. How is Darwinian medicine useful? West J Med 2001;174:358–360. [PubMed: 11342524]
9. Culotta E, Pennisi E. Breakthrough of the year: evolution in action. Science 2005;310:1878–1879. [PubMed: 16373538]
10. Felsenstein, J. Inferring phylogenies. Sinauer Associates; Sunderland, MA: 2004.
11. Wiley, EO. Phylogenetics: The Theory and Practice of Phylogenetic Systematics. John Wiley and Sons; New York: 1981.
12. Hennig, W. Phylogenetic systematics. University of Illinois Press; Urbana: 1966.
13. Whitfield J. Linnaeus at 300: we are family. Nature 2007;446:247–249. [PubMed: 17361152]
14. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006;24:1151–1161. [PubMed: 16964229]
15. Goloboff, PA.; Pol, D. Parsimony, phylogeny, and genomics. Albert, VA., editor. Oxford University Press; Oxford, New York: 2005. p. 148-159.
16. Gascuel, O. Mathematics of evolution and phylogeny. Gascuel, O., editor. Oxford University Press; New York: 2005. p. 1-8.
17. Farris JS. The information content of the phylogenetic system. Syst Zool 1979;28:483–519.
18. Albert, VA. Parsimony, phylogeny, and genomics. Albert, VA., editor. Oxford University Press; Oxford, New York: 2005. p. 1-11.
19. Planet PJ, DeSalle R, Siddall M, Bael T, et al. Systematic analysis of DNA microarray data: ordering and interpreting patterns of gene expression. Genome Res 2001;11:1149–1155. [PubMed: 11435396]
20. Kumazawa Y, Nishida M. Sequence evolution of mitochondrial tRNA genes and deep-branch animal phylogenetics. J Mol Evol 1993;37:380–398. [PubMed: 7508516]
21. Quade BJ, Wang TY, Sornberger K, Dal Cin P, et al. Molecular pathogenesis of uterine smooth muscle tumors from transcriptional profiling. Genes Chromosomes Cancer 2004;40:97–108. [PubMed: 15101043]
22. Mickevich MF. Taxonomic congruence. Syst Zool 1978;27:143–158.
23. Felsenstein J. A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biol J Linnean Soc 1981;16:183–196.
24. Stefankovic D, Vigoda E. Phylogeny of mixture models: robustness of maximum likelihood and nonidentifiable distributions. J Comput Biol 2007;14:156–189. [PubMed: 17456014]
25. Siddall ME. Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the Farris zone. Cladistics 1998;14:209–220.
26. Goloboff PA. Parsimony, likelihood, and simplicity. Cladistics 2003;19:91–103.
27. Stefankovic D, Vigoda E. Pitfalls of heterogeneous processes for phylogenetic reconstruction. Syst Biol 2007;56:113–124. [PubMed: 17366141]
28. Kolaczkowski B, Thornton JW. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 2004;431:980–984. [PubMed: 15496922]
29. Coombes KR, Morris JS, Hu J, Edmonson SR, Baggerly KA. Serum proteomics profiling–a young technology begins to mature. Nat Biotechnol 2005;23:291–292. [PubMed: 15765078]
30. Golub TR, Slonim DK, Tamayo P, Huard C, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286:531–537. [PubMed: 10521349]
31. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006;7:55–65. [PubMed: 16369572]
32. Graybeal A. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst Biol 1998;47:9–17. [PubMed: 12064243]
33. Lyons-Weiler J, Patel S, Becich MJ, Godfrey TE. Tests for finding complex patterns of differential expression in cancers: towards individualized medicine. BMC Bioinformatics 2004;5:110. [PubMed: 15307894]
34. Felsenstein J. PHYLIP: Phylogeny Inference Package (Version 3.2). Cladistics 1989:164–166.
35. Fulda S, Poremba C, Berwanger B, Hacker S, et al. Loss of caspase-8 expression does not correlate with MYCN amplification, aggressive disease, or prognosis in neuroblastoma. Cancer Res 2006;66:10016–10023. [PubMed: 17047064]
36. Reed JC, Meister L, Tanaka S, Cuddy M, et al. Differential expression of bcl2 protooncogene in neuroblastoma and other human tumor cell lines of neural origin. Cancer Res 1991;51:6529–6538. [PubMed: 1742726]
37. Uddin M, Wildman DE, Liu G, Xu W, et al. Sister grouping of chimpanzees and humans as revealed by genome-wide phylogenetic analysis of brain gene expression profiles. Proc Natl Acad Sci USA 2004;101:2957–2962. [PubMed: 14976249]
38. Page RD. TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 1996;12:357–358. [PubMed: 8902363]
39. Miyamoto MM, Fitch WM. Testing species phylogenies and phylogenetic methods with congruence. Syst Biol 1995;44:64–76.
40. Hoffman PJ, Milliken DB, Gregg LC, Davis RR, Gregg JP. Molecular characterization of uterine fibroids and its implication for underlying mechanisms of pathogenesis. Fertil Steril 2004;82:639–649. [PubMed: 15374708]

Abbreviations

DA: dichotomously asynchronous
EPA: evolutionary polarity assessment

Glossary

Definitions of terms

Clade: a group of specimens sharing one or more synapomorphies.

Cladogram: a graphic classification of hierarchical relationships among specimens based on the synapomorphies (shared derived characters). It is also a summary of trends that occur within the data, and shows the directionality of accumulation of change, with the highest number of synapomorphies shared by the specimens that are closer to the upper part of the cladogram.

Dendrogram: a tree diagram used to graphically illustrate the arrangement of the clusters produced by a phenetic clustering algorithm (see Phenetic Clustering).
Dendrograms are produced in computational biology (e.g. microarray analysis) to illustrate similarity and gene linkage.

Dichotomous asynchronicity: a two-tailed pattern of protein or gene expression in a number of specimens with a physiological abnormality (e.g. cancer) in comparison with the normal specimens. Usually, the m/z protein or gene-expression values of the abnormal specimens are outside the range of the normal specimens (i.e. above and below the normal range).

Dynamic classification: a classification that has the capacity to incorporate novel specimens without major alterations to the composition of its main groups or their relationships.

Evolutionary medicine: a branch of medicine that seeks to explain the nature of disease in an evolutionary context.

Homoplasy: similarity due to convergence, parallelism, or reversal. Convergence occurs when two or more specimens have different developmental pathways for a homologous character state; parallelism is independently acquiring similar non-homologous states; and reversal is reverting to an ancestral state from a derived state.

Ingroup: the group of specimens under study, for example, diseased specimens.

Interplatform comparability: the ability to compare several datasets produced on different platforms. Evolutionary polarity assessment transforms the omics data into polarized matrices of discrete values (0/1) and therefore makes it possible to compare two or more separate experiments, and/or to combine several experiments in one large analysis.

Interplatform concordance: the intersection (sharing) of significant m/z values and gene lists produced by two or more separate experiments.
The higher the concordance between independent experiments, the larger the number of common proteins and genes, and the more significant the results.

Mass to charge ratio (m/z): the ratio of a protein’s mass (m) to the total charge (z) it carries. The m/z is the value obtained from laser desorption mass spectrometry machines such as SELDI or MALDI.

Outgroup: a group of specimens used to polarize the ingroup values of m/z or gene expression into ancestral (plesiomorphic) and derived (apomorphic).

Parsimony: means simplicity; the preferred hypothesis is the one requiring the least number of explanations (Occam’s Razor). In the omics context, the preferred phylogenetic cladogram is the one that requires the least number of steps to construct it from the polarized data matrix.

Phenetic clustering: grouping specimens on the basis of similarity without a priori sorting of similarity into ancestral and derived. Phenetics does not provide any information about the evolutionary phylogenetic relationships among specimens.

Phylogenetic classification: a classification that uses synapomorphies to delimit clades (i.e. monophyletic groups); it provides evolutionary phylogenetic relationships among specimens.

Polarity assessment: also known as outgroup comparison. It is the basis of sorting out the data values (whether proteomic [m/z] or microarray expression values) into ancestral and derived. In large datasets, it transforms absolute numbers of data values into polarized binary numbers (0/1), where zero (0) signifies ancestral and one (1) signifies derived.

Predictive classification: a classification that reveals the characteristics of a specimen when its place in the classification is determined.

Synapomorphy & biomarker: a shared derived protein or gene expression value in comparison with a number of normal specimens (the outgroup).
A protein synapomorphy may have one of the following conditions: (i) a novel protein, (ii) a disappeared protein, (iii) an upregulated protein, (iv) a downregulated protein, or (v) a dichotomously asynchronous regulated protein (the m/z values are above and below the normals’ range but not within it). A gene synapomorphy may have one of the following conditions: (i) an over-expressed value above the normals’ range, (ii) an under-expressed value below the normals’ range, (iii) dichotomously asynchronous values, or (iv) an undetectable expression value. Biomarkers are unique proteins or gene expressions that could delimit (or characterize) a group of specimens sharing a physiological condition. Because synapomorphies also group together specimens sharing uniquely derived protein or gene expression into clades (i.e. every clade has its own set of synapomorphies), these synapomorphies are potential biomarkers for the clade they define.

Figure 1. Flowchart outlining the various stages of an evolutionary phylogenetic analysis of omics data, as well as the interpretation and translation of the analysis results into a clinical setting.

Figure 2. Phylogenetics vs. Phenetics. (A) Phylogenetic cladogram based on maximum parsimony analysis, and (B) phenetic dendrogram based on Pearson’s correlation for the same data set [21].
While the cladogram resolves the relationship between the leiomyoma and leiomyosarcoma specimens by finding 32 uniquely expressed synapomorphies shared by both groups and 20 synapomorphies distinguishing leiomyosarcomas from the leiomyomas, the dendrogram fails to resolve this relationship and clusters the leiomyomas with normal myometrium specimens. The cladogram has directionality for accumulated synapomorphies, and the dendrogram does not. For example, the cladogram indicates that the leiomyosarcoma specimen GSM11779 has the highest number of synapomorphies, and GSM11769 has the lowest. Dataset GDS533 available at http://www.ncbi.nlm.nih.gov/geo/.

Figure 3. Dichotomously asynchronous protein and gene-expression. Two-tailed distribution that occurs in a group of cancerous specimens. (A) Protein intensity at m/z 12 215 of 11 specimens of prostate cancer and 17 normals; six cancerous specimens show upregulation and five show downregulation. (B) RNA signal intensity of ten specimens of uterine leiomyosarcoma, showing four specimens overexpressing and six underexpressing Akt1.

Figure 4. A most parsimonious cladogram produced by MIX for MS serum proteomic data of 36 prostate cancer patients and 49 healthy men. Each specimen had 15 144 m/z data points; polarity assessment was carried out by UNIPAL. Each line that ends on the right side of the figure represents a specimen.
The red part of the cladogram indicates the cancerous specimens as diagnosed before the experiment; the green section indicates the healthy specimens; and the blue shows the presumed healthy specimens that seem to form a transitional zone between the healthy and cancerous clades.

Figure 5. Translating the parsimony phylogenetic analysis of omics into clinical practice. A schematic topology of a typical proteomic cladogram of a cancer analysis. There are two major cancerous clades at the upper section of the cladogram, transitional clades in the middle section, and the basal healthy clades. Adding an unknown specimen to an analysis has three possible outcomes: scenario A indicates the likely location of a healthy specimen within the healthy clades; scenario B places a specimen from a susceptible individual with the transitional clades between the healthy and cancerous clades; and scenario C would locate a cancerous specimen within one of the two major cancer clades. A post-treatment analysis may change the location from the cancerous to the transitional or healthy clades, depending on the treatment’s efficacy.
Table 1

A. MS Proteomic Data (intensity). Columns: m/z; controls A to E; Min and Max of the controls; experimentals F to J; polarized values of experimentals F to J.

m/z | A | B | C | D | E | Min | Max | F | G | H | I | J | F | G | H | I | J
928.70 | 18 | 70.1 | 21.2 | 11.6 | 19.7 | 11.6 | 70.1 | 93.4 | 29 | 9.1 | 34.4 | 89.6 | 1 | 0 | 1 | 0 | 1
979.43 | 371.6 | 418.8 | 375.9 | 447.5 | 353 | 353 | 447.5 | 239.4 | 178.8 | 260.1 | 267 | 270.2 | 1 | 1 | 1 | 1 | 1
1018.58 | 184.6 | 189 | 200.2 | 217 | 235.1 | 184.6 | 235.1 | 196.8 | 144.4 | 321.2 | 101.6 | 138.7 | 0 | 1 | 1 | 1 | 1
1110.48 | 446.4 | 458.2 | 493.2 | 556.5 | 372.1 | 372.1 | 556.5 | 324.3 | 340.7 | 338.8 | 437.2 | 501.2 | 1 | 1 | 1 | 0 | 0
1228.72 | 142.3 | 145.3 | 238.5 | 187.5 | 162.8 | 142.3 | 238.5 | 171.9 | 152.2 | 123.4 | 158.4 | 175.8 | 0 | 0 | 1 | 0 | 0
1304.16 | 22.8 | 28 | 28.7 | 32.1 | 25 | 22.8 | 32.1 | 41.6 | 43.4 | 11.1 | 28.5 | 23.8 | 1 | 1 | 1 | 0 | 0
1407.12 | 175 | 140.1 | 110.8 | 159.8 | 180.1 | 110.8 | 180.1 | 121.9 | 70.3 | 215.6 | 80 | 130.6 | 0 | 1 | 1 | 1 | 0
1511.54 | 44.8 | 29 | 30.8 | 27.3 | 41 | 27.3 | 44.8 | 147.4 | 65.8 | 53.9 | 18.6 | 35.1 | 1 | 1 | 1 | 1 | 0
1623.71 | 90.7 | 72.6 | 37.3 | 40.7 | 48.4 | 37.3 | 90.7 | 38.1 | 66.7 | 42.7 | 47.2 | 57.1 | 0 | 0 | 0 | 0 | 0
1711.73 | 419.4 | 356.7 | 427.3 | 332.5 | 399.4 | 332.5 | 427.3 | 297.7 | 565.6 | 169.7 | 287.4 | 387.7 | 1 | 1 | 1 | 1 | 0

B. Gene Expression Data. Columns: gene; controls A to E; Min and Max of the controls; experimentals F to J; polarized values of experimentals F to J.

Gene | A | B | C | D | E | Min | Max | F | G | H | I | J | F | G | H | I | J
PTAFR | 115 | 59 | 163 | 137 | 131 | 59 | 163 | 45 | 62 | 62 | 577 | 145 | 1 | 0 | 0 | 1 | 0
PRKCD | 222 | 145 | 273 | 263 | 256 | 145 | 273 | 97 | 111 | 111 | 379 | 238 | 1 | 1 | 1 | 1 | 0
ACAT1 | 145 | 76 | 225 | 200 | 196 | 76 | 225 | 72 | 101 | 75 | 665 | 208 | 1 | 0 | 1 | 1 | 0
MARCKS | 200 | 126 | 257 | 271 | 186 | 126 | 271 | 81 | 85 | 102 | 319 | 225 | 1 | 1 | 1 | 1 | 0
OGDH | 105 | 84 | 223 | 180 | 176 | 84 | 223 | 69 | 91 | 72 | 222 | 202 | 1 | 0 | 1 | 0 | 0
CRK | 142 | 94 | 211 | 201 | 134 | 94 | 211 | 79 | 90 | 66 | 123 | 186 | 1 | 1 | 1 | 0 | 0
CHKA | 114 | 70 | 217 | 196 | 150 | 70 | 217 | 62 | 96 | 72 | 109 | 204 | 1 | 0 | 0 | 0 | 0
GPR109B | 143 | 111 | 249 | 241 | 211 | 111 | 249 | 75 | 120 | 85 | 157 | 258 | 1 | 0 | 1 | 0 | 1
HTR1B | 245 | 94 | 263 | 214 | 183 | 94 | 263 | 73 | 115 | 95 | 151 | 200 | 1 | 0 | 0 | 0 | 0
IL2RG | 157 | 91 | 222 | 183 | 164 | 91 | 222 | 71 | 121 | 72 | 115 | 179 | 1 | 0 | 1 | 0 | 0

From omics to cladogram. The process of evolutionary polarity assessment is illustrated by using a sample of MS proteomic data, (A), and gene expression data, (B).
Each dataset consists of ten specimens: five controls and five experimentals. The minimum and maximum (i.e. the range) of the controls are determined for each m/z point and gene, and the intensity values of the experimental specimens are then transformed to 0 if the value falls within the controls’ range or 1 if it falls outside. The polarized values from the proteomic data, (A), and gene-expression data, (B), are pooled together and processed in an algorithmic parsimony analysis (MIX) to produce the consensus cladogram (C). The synapomorphies for each clade are listed at its node.

OMICS: A Journal of Integrative Biology
http://mc.manuscriptcentral.com/omics

Phylogenetic Modeling of Heterogeneous Gene-Expression Microarray Data from Cancerous Specimens

Journal: OMICS: A Journal of Integrative Biology
Manuscript ID: OMI-2008-0010
Manuscript Type: Original Article
Date Submitted by the Author: 26-Feb-2008
Complete List of Authors: Abu-Asab, Mones; NCI, Lab of Path
Chaouchi, Mohamed; Georgetown University, Department of Physiology and Biophysics
Amri, Hakima; Georgetown University, Department of Physiology and Biophysics
Keyword: Cancer, DNA Microarrays, Gene Expression, Data Analysis, Biological Databases

Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801

Phylogenetic Modeling of Heterogeneous Gene-Expression Microarray Data from Cancerous Specimens

Mones S.
Abu-Asab1*, Mohamed Chaouchi2*, and Hakima Amri2§

1 Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA. Phone 301-496-2164, Fax 301-480-9197, Email: [email protected]

2 Mohamed Chaouchi. Department of Physiology and Biophysics, School of Medicine, Georgetown University, Washington, DC 20007, USA. Phone 202-687-8594, Fax 202-687-7407, Email: [email protected].

2 Hakima Amri. Department of Physiology and Biophysics, School of Medicine, Georgetown University, Washington, DC 20007, USA. Phone 202-687-8594, Fax 202-687-7407. Email: [email protected].

*These authors contributed equally to this work
§ To whom correspondence should be addressed: [email protected]

ABSTRACT

The qualitative dimension of gene-expression data and its heterogeneous nature in cancerous specimens can be accounted for by phylogenetic modeling that incorporates the directionality of altered gene expressions, complex patterns of expressions among a group of specimens, and data-based rather than specimen-based gene linkage. Our phylogenetic modeling approach is a double algorithmic technique that includes polarity assessment, which brings out the qualitative value of the data, followed by maximum parsimony analysis, which is most suitable for the data heterogeneity of cancer gene-expression. We demonstrate that polarity assessment of expression values into derived and ancestral states, via outgroup comparison, reduces experimental noise; reveals dichotomously-expressed asynchronous genes; and allows data pooling and intra- and interplatform comparability.
Parsimony phylogenetic analysis of the polarized values produces a classification of specimens into clades that reveal shared derived gene expressions (the synapomorphies); provides better qualitative assessment of ontogenic linkage of genes and phyletic relatedness of specimens; efficiently utilizes dichotomously-expressed genes; produces highly predictive class recognition; illustrates gene linkage and multiple developmental pathways; provides higher concordance between gene lists; and projects the direction of change among specimens. A further implication of this phylogenetic approach is that it may transform microarray into a diagnostic, prognostic, and predictive tool.

INTRODUCTION

Gene microarray has been employed in studying comparative gene-expression in cancer, genetic disorders, infections, drug response and interactions, as well as other biological processes (Quackenbush, 2006), and its data have been used to generate cancer taxonomy (Bittner, Meltzer, Chen et al., 2000; Golub, Slonim, Tamayo et al., 1999; Lossos and Morgensztern, 2006), diagnosis, prognosis (Beer, Kardia, Huang et al., 2002), subtyping/class discovery (Alizadeh, Eisen, Davis et al., 2000; Beer, Kardia, Huang et al., 2002), and biomarker detection (Lossos and Morgensztern, 2006). However, more than a decade after its introduction and subsequent wide usage, microarray gene-expression still suffers from a number of problems that limit its usefulness and potential (Harrison, Johnston and Orengo, 2007; Millenaar, Okyere, May et al., 2006; Wang, He, Band et al., 2005).
These include the problems of reproducibility of measurements between runs, instruments, or laboratories; the inability to perform intra- and interplatform comparisons and pooling; insufficient concordance of gene lists; and the lack of an analytical paradigm that can transform microarray data into a multidimensional bioinformatic tool useful in a clinical setting. Current analytical paradigms such as phenetic clustering and maximum likelihood (including Bayesian) have not resolved these issues (Abu-Asab, Chaouchi and Amri, 2006; Abu-Asab, Chaouchi and Amri, 2008). In an attempt to resolve some of the above-listed problems and broaden the bioinformatic potential of microarray technology, we introduce a parsimony phylogenetic approach for microarray data analysis that is based on outgroup comparison (a.k.a. polarity assessment) and maximum parsimony. Our approach is a double-algorithmic procedure in which the data values are first polarized into ancestral or derived depending on whether they fall within or outside the range of the outgroup, which is usually composed of normal healthy specimens; the polarized data are then processed with a maximum parsimony algorithm. The analysis produces a phylogenetic classification of the specimens that recognizes monophyletic classes (clades) delimited by shared derived gene expressions (the synapomorphies).
Biologically meaningful interpretation of the data, and better correlation with clinical characteristics, diagnosis, and outcomes, are highly desired criteria in an analytical tool (Allison, Cui, Page et al., 2006; Beer, Kardia, Huang et al., 2002; Bittner, Meltzer, Chen et al., 2000; Golub, Slonim, Tamayo et al., 1999). Clustering specimens into discernible entities on the basis of overall quantitative gene-expression linkage similarities has some serious drawbacks (Allison, Cui, Page et al., 2006; Lyons-Weiler, Patel, Becich et al., 2004), and appears to be incongruent with the nature of disease development (Abu-Asab, Chaouchi and Amri, 2006; Abu-Asab, Chaouchi and Amri, 2008; Nesse and Stearns, 2008). In this report, we demonstrate that the use of parsimony phylogenetic analysis of microarray data resolves the issues of gene-ranking discrepancies, improves interplatform concordance, makes intra- and interplatform comparability possible, eliminates biases in the gene linkage criteria, and casts gene-expression profiles into a biologically relevant and predictive model of class discovery.

A superior classification is one that summarizes maximum knowledge about its specimens, reflects their true ontogenic relationships to one another, and offers predictivity (Farris, 1979; Golub, Slonim, Tamayo et al., 1999). The latter is especially significant when the classification will be applied in a clinical setting for diagnosis, prognosis, or post-treatment evaluation. We utilize parsimony phylogenetics because of its inherent ability to produce a robust classification of relationships (class discovery), and its forecasting power to reveal the characters of a specimen once its place in the classification is established (class prediction) (Albert, 2005b).
Parsimony models the heterogeneity of cancerous microarray data without any a priori assumptions (Goloboff and Pol, 2005; Siddall, 1998; Stefankovic and Vigoda, 2007). Additionally, a phylogenetic approach elucidates the direction of change among specimens that leads to their molecular and cellular diversity: the presence of one or more developmental pathways (Abu-Asab, Chaouchi and Amri, 2008), and novel expressions that are involved in the progression and maintenance of the disease.

A strict parsimony phylogenetic analysis uses only shared derived values, synapomorphies, to delimit a natural group of specimens within a clade (Wiley and Siegel-Causey, 1991). Shared derived values of a gene among several specimens constitute a synapomorphy; therefore, only a synapomorphy is indicative of their relatedness. Since synapomorphies define clades at various grouping levels, a parsimonious phylogenetic classification reflects hierarchical shared developmental pathways among a group of specimens and may reveal the presence of subclasses, each having its own uniquely derived gene-expression synapomorphies. In biological and clinical senses, class discovery and prediction should be based on shared derived gene expressions (i.e., synapomorphies). For example, a cancer class (a clade in phylogenetic terminology) is delimited by one or more synapomorphies, and a cancerous specimen will be placed in a class only if it shares the same synapomorphies with the members of the clade.
We describe a double-algorithmic analytical method for microarray gene-expression data based on a polarity assessment algorithm, UNIPAL (Abu-Asab, Chaouchi and Amri, 2006), whose polarized values can be used by a parsimony algorithm, MIX (Felsenstein, 1989), to produce a phylogenetic classification of specimens. This approach brings a systematic solution to class discovery through phylogenetic classification whereby every class is delimited by shared derived gene expressions, i.e., synapomorphy-delimited clades. Because such a classification reflects the shared aberrations of gene expression of the specimens, we expect it to have biological and clinical relevance, and to advance targeted treatments of disease.

MATERIALS AND METHODS

Gene-Expression Datasets

In order to demonstrate the applicability of parsimony phylogenetics to microarray gene-expression data, and to test the results of interplatform concordance and comparability, we downloaded three publicly available datasets of gene-expression comparative studies, GDS484 (Hoffman, Milliken, Gregg et al., 2004), GDS533 (Quade, Wang, Sornberger et al., 2004), and GDS1210 (Hippo, Taniguchi, Tsutsumi et al., 2002), from NCBI’s Gene-Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). The GDS484 study was conducted on GPL96 (Affymetrix GeneChip Human Genome U133 Array Set HG-U133A), and the other two studies on GPL80 (Affymetrix GeneChip Human Full Length Array HuGeneFL). GDS484 comprised normal myometrium (n= 5) and uterine leiomyomas (n= 5) obtained from fibroid-afflicted patients.
The GDS533 study encompassed normal myometrium (n= 4), benign uterine leiomyoma (n= 7), as well as malignant uterine (n= 9) and extra-uterine (n= 4) leiomyosarcoma specimens. The GDS1210 study included expression profiling of 22 primary advanced gastric cancer tissues and 8 normal specimens.

Polarity Assessment and Parsimony Analysis

Polarity assessment through outgroup comparison does not use comparison of means and folds; rather, it converts the continuous values into discontinuous ones by assessing each gene’s values against the normals’ range, and produces a matrix of polarized values (0s and 1s). Our polarity assessment program, UNIPAL, independently compares each gene’s value in the experimental specimens against its corresponding range within the outgroup, and scores each as either derived (1) or ancestral (0), so the matrix of gene-expression values is transformed into a matrix of polarized scores (0s & 1s).

We used all the expression data points of all specimens in the analysis. For polarity assessment (apomorphic [or derived] vs. plesiomorphic [or ancestral]), data were polarized with our customized algorithm (UNIPAL), which recognizes derived values of each gene when compared with the outgroups (Abu-Asab, Chaouchi and Amri, 2006). Outgroups here were composed of normal healthy specimens only. UNIPAL determines the polarity for every data point among the specimens via outgroup comparison, and then scores each value of the study group as derived (1) or ancestral (0). Ideally, the outgroup should be large enough to encompass the maximum variation within the normal healthy population.
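The scoring rule described above can be illustrated with a minimal sketch. This is not the UNIPAL code, which is not reproduced in this manuscript; it assumes that a gene's outgroup range is simply the inclusive minimum-to-maximum of the normal specimens' values, and the function name `polarize` and the toy values are hypothetical.

```python
from typing import List, Sequence


def polarize(outgroup: Sequence[Sequence[float]],
             ingroup: Sequence[Sequence[float]]) -> List[List[int]]:
    """Score each ingroup value as ancestral (0) or derived (1).

    Rows are specimens and columns are genes. Assumption: the outgroup
    range of a gene is the inclusive [min, max] of the normal values.
    """
    n_genes = len(outgroup[0])
    lows = [min(row[g] for row in outgroup) for g in range(n_genes)]
    highs = [max(row[g] for row in outgroup) for g in range(n_genes)]
    # A value inside the normals' range is ancestral; outside it is derived.
    return [
        [0 if lows[g] <= row[g] <= highs[g] else 1 for g in range(n_genes)]
        for row in ingroup
    ]


# Toy data (hypothetical): two normal specimens, two tumor specimens, two genes.
normals = [[1.0, 5.0], [2.0, 6.0]]
tumors = [[1.5, 7.0], [0.5, 5.5]]
polarized = polarize(normals, tumors)
print(polarized)
```

Each gene is scored independently of the others, so the rule needs no means, fold-changes, or variance estimates; only the outgroup's range per gene enters the comparison.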
The phylogenetic analysis was carried out with MIX, the maximum parsimony program of PHYLIP ver. 3.57c (Felsenstein, 1989), to produce separate parsimony phylogenetic analyses for each dataset, and for the inclusive matrix of the two sets (GDS533 & GDS1210) that included all their specimens. MIX was run with randomized and nonrandomized inputs, and no significant differences were observed between the two options. Phylogenetic trees were drawn using TreeView (Page, 1996).

Interplatform Concordance and Comparability

To test interplatform concordance when analyzed parsimoniously, we compared the synapomorphies of the two uterine leiomyoma datasets, GDS484 & GDS533, and recorded the percentage of concordance.

To test interplatform comparability (i.e., whether the datasets can be pooled together for a parsimony analysis), we combined the polarized matrices of the two identical-platform datasets, GDS533 & GDS1210, processed the combined matrix with MIX, and compared the result to their separate cladograms.

RESULTS

The implications of a parsimonious analysis of the gene-expression data are realized in several aspects: the recognition and utilization of partially asynchronous genes and dichotomously-expressed asynchronous genes; the importance of outgroup selection and its effect on gene listing; the multidimensionality of the cladograms; as well as interplatform concordance and comparability.
Dichotomously-Expressed Asynchronous (DEA) Genes

Our analysis identified a specific punctuated pattern of gene expression that seemed to occur only in a set of specimens where a gene’s expression values lay on both sides of the normals’ distribution (over- and underexpressed), but did not overlap with it (Tables 1-7). This pattern has been recognized only once in the literature but was not named (Lyons-Weiler, Patel, Becich et al., 2004); we termed this phenomenon dichotomous asynchronicity to reflect its two-tailed distribution and deviation from the normal expression range. While the t-statistic and fold-change may dismiss these asynchronous genes from the list of differentially-expressed genes, or misrepresent their significance (Lyons-Weiler, Patel, Becich et al., 2004), an outgroup polarity assessment will assess each value as derived and let the parsimony algorithm plot its significance in relation to the rest of the genes.

A parsimony phylogenetic algorithm uses the polarity distribution of all genes to produce the most parsimonious classification, one with the lowest number of reversals and parallelisms (i.e., it minimizes multiple origins of expression states in hypothesizing the relationships among the specimens) (Albert, 2005a; Felsenstein, 2004).

Through polarity assessment, a large number of asynchronous genes that exhibited dichotomous expression were recognized. All these genes had their expression values above and below the normal specimens’ range, i.e., derived in relation to outgroups. DEA genes were found in all three datasets studied here (Tables 1-7), and were included within all the analyses.
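As a hedged illustration of the pattern described in this section, the sketch below labels one gene according to how its ingroup values sit relative to the normals' range. The function name `classify_gene`, the use of the inclusive minimum-to-maximum range, and the toy values are assumptions made for this example; the labels OE, UE, and DE follow the abbreviations used in the tables.

```python
from typing import Optional, Sequence


def classify_gene(normal_values: Sequence[float],
                  specimen_values: Sequence[float]) -> Optional[str]:
    """Label a gene whose ingroup values are all derived.

    Returns "OE" if every specimen value lies above the normals' range,
    "UE" if every value lies below it, "DE" (dichotomously expressed) if
    every value lies outside the range with some above and some below,
    and None if any value falls inside the range.
    Assumption: the range is the inclusive [min, max] of the normals.
    """
    low, high = min(normal_values), max(normal_values)
    above = [v > high for v in specimen_values]
    below = [v < low for v in specimen_values]
    if all(above):
        return "OE"
    if all(below):
        return "UE"
    if all(a or b for a, b in zip(above, below)):
        return "DE"
    return None


# Hypothetical values: the normals span 4..6 for this gene.
normals = [4.0, 5.0, 6.0]
print(classify_gene(normals, [7.0, 8.0, 9.0]))  # all above the range
print(classify_gene(normals, [1.0, 9.0, 2.0]))  # split around the range
```

A gene of the second kind can have a specimen mean close to the normal mean, which is why a test on means may overlook it even though every single value is derived.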
Most Parsimonious Cladograms

Parsimony analysis produced one most parsimonious cladogram (having the least number of steps in constructing a classification of specimens) for the uterine GDS533 dataset (Fig. 1). The topology of the tree showed one large inclusive clade that encompassed all of the leiomyomas and leiomyosarcomas delimited by 32 synapomorphies (Table 1), a terminal clade with 9 sarcoma specimens, a middle sarcoma clade with 4 specimens, and 5 small basal leiomyoma clades in tandem arrangement followed by 4 basal normal clades.

The cladogram in Fig. 1 showed that the leiomyoma specimens did not form a natural group by themselves: they did not form their own clade separating them from the leiomyosarcomas, and there were no synapomorphies circumscribing them as a clade when the ingroup was composed of leiomyoma and leiomyosarcoma. However, as a paraphyletic group, the leiomyomas shared 146 synapomorphies distinguishing them from the normals (Table 2).

The 13 leiomyosarcoma specimens separated into a large terminal clade that was delimited by 20 synapomorphies in comparison with an outgroup composed of leiomyoma and normal specimens (Table 3), and by 29 synapomorphies derived in relation to leiomyomas only as an outgroup (Table 4). Extrauterine sarcoma specimens did not assemble together, but rather were scattered within the sarcoma clades (denoted by * on the cladogram in Fig. 1). When the leiomyomas were removed from the comparison, there were 156 synapomorphies delimiting the sarcomas (Table 5), a result that illustrates the effect of outgroup and ingroup selections on the results.
For the gastric dataset, GDS1210, parsimony analysis produced one most parsimonious cladogram (Fig. 2). The cladogram topology showed two terminal clades with 6 and 5 specimens, respectively, and a tandem arrangement of 6 small clades, the largest having 3 specimens. The inclusive gastric cancer clade was circumscribed by 34 synapomorphies (Table 6). In a list-by-list comparison, our 34 synapomorphies for the gastric cancer shared only one gene (CST4) with the gene list of the authors of the study (Hippo, Taniguchi, Tsutsumi et al., 2002).

Interplatform Concordance

Testing of interplatform concordance was carried out by comparing the two lists of synapomorphies of the two leiomyoma studies, GDS484 and GDS533 (comparison results are summarized in Tables 7 & 8). Out of the ~22,000 genes in the GDS484 dataset, our analysis produced a total of 1485 synapomorphic genes circumscribing the leiomyoma specimens, whereas the leiomyomas of GDS533 were delimited by 146 synapomorphies out of ~7000 gene probes. A comparison between the two sets of leiomyoma synapomorphies produced 45 shared ones (Tables 7 & 8), a 31% concordance in synapomorphies despite the sizable difference in the number of probes between the two datasets, which is still better than the 12% concordance between the statistically-produced gene lists of the two published studies (Hoffman, Milliken, Gregg et al., 2004; Quade, Wang, Sornberger et al., 2004).
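The percentage reported above is consistent with dividing the number of shared synapomorphies by the size of the smaller list (45/146). A minimal sketch of that arithmetic follows; the function name `concordance` and the placeholder gene identifiers are hypothetical, and taking the first list as the denominator is an assumption inferred from the 45/146 figure rather than a stated definition.

```python
from typing import Iterable


def concordance(reference: Iterable[str], other: Iterable[str]) -> float:
    """Percentage of genes in `reference` that also appear in `other`.

    Assumption: concordance is |reference ∩ other| / |reference| * 100,
    with the shorter gene list passed as `reference`.
    """
    ref, oth = set(reference), set(other)
    if not ref:
        raise ValueError("reference list is empty")
    return 100.0 * len(ref & oth) / len(ref)


# Placeholder identifiers, not genes from the tables.
list_a = ["g1", "g2", "g3", "g4"]
list_b = ["g2", "g4", "g9"]
print(concordance(list_a, list_b))

# The reported leiomyoma figure: 45 shared out of 146 synapomorphies.
reported = round(100.0 * 45 / 146)
print(reported)
```

Because the denominator is the shorter list, the result does not penalize the larger platform for carrying probes that the smaller one lacks.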
However, 48% concordance resulted when comparing the 32 synapomorphies of the leiomyomas and leiomyosarcomas clade (GDS533, Table 1) with the 1485 synapomorphies of the leiomyomas of GDS484 (Table 7); the clades’ synapomorphies overlapped as follows: 1/1 OE, 7/8 UE (except FOSB), & 8/23 DE, an 89% concordance within the OE & UE and 35% within the DE. Additionally, there was 45% concordance between the 32 synapomorphies of the leiomyomas and leiomyosarcomas clade and the gene list of Quade et al. (Table 8).

Furthermore, a lower concordance was obtained when comparing the phylogenetic synapomorphies against statistically-generated gene lists. The synapomorphies of leiomyomas (GDS533, Table 2) showed 18% concordance (4/25 OE, 8/42 UE) with the 78 significant genes of Hoffman et al. (2004, GDS484, gene list produced by fold-change), and 16.5% (5/25 OE, 6/42 UE) with the 146 genes of Quade et al. (2004, GDS533, gene list produced by F-statistic). This was higher than the concordance between the two gene lists of the published uterine studies, 12% (3/25 OE, 5/42 UE). The two studies made no mention of DE genes.

Data Pooling and Interplatform Comparability

Data pooling and interplatform comparability were carried out on the combined polarized matrices of the gastric (GDS1210) and uterine (GDS533) datasets. Their inclusive parsimony analysis produced one most parsimonious cladogram (Fig. 3). Its topology showed a total separation of the gastric cancer from the uterine leiomyoma and sarcoma specimens into two large clades.
However, the two types of cancers shared 16 synapomorphies that delimited a clade composed of all the gastric and uterine specimens (Table 9).

The resulting inclusive cladogram (Fig. 3) showed an almost total agreement with the single-type cladograms (Figs. 1 & 2), indicating a successful pooling of datasets. However, there was a slight variation in the topology of minor branches between the cladogram of Fig. 2 and the inclusive one of Fig. 3. These slight differences are most likely due to the increased number of normal specimens that were used in the outgroup of the inclusive cladogram. The outgroup size used here was by no means ideal; the larger the membership of the outgroup, the more stable the topology of the generated cladogram (Graybeal, 1998).

DISCUSSION

Microarray aims to identify differentially expressed genes, and subsequently to characterize genetic patterns, classify specimens accordingly, and point out potential biomarkers. However, most of the problems that are currently associated with microarray analysis arise from using only the quantitative aspect of the data (the absolute continuous data values of gene expression) to carry out parametric statistical analysis; forecasting gene linkage on the basis of quantitative correlation and not expression pattern; and lacking the power to recognize and utilize specific gene-expression patterns such as dichotomous expression and partial asynchronicities (Abu-Asab, Chaouchi and Amri, 2008; Allison, Cui, Page et al., 2006).
This results in discrepancies that affect which genes are considered differentially expressed by the two main ranking criteria for generating gene lists, the t-test and fold-change (Guo, Lobenhofer, Wang et al., 2006). Our double-algorithmic analysis supports a qualitative approach in which the directionality of expression is the first step in designating an expression value as significant, followed by a parsimony search to plot the classification of specimens with the smallest number of steps that explains the data’s distribution pattern.

The results of parsimonious analysis of microarray gene expression of three datasets show a total distinction of the sarcoma from the fibroid tissues (the leiomyomas), and of these two classes from gastric cancer. The analysis also identified a number of synapomorphies for gastric and uterine cancers, thus defining each as a natural disease entity with its unique shared derived expression; produced higher interplatform concordance than gene lists of t-test and fold-change (Tables 7 & 8); and allowed the pooling and comparability of two independent experiments. Such results confer reliability on a qualitative parsimonious approach to analyzing gene-expression data (Table 10).

Advantages of Polarity Assessment

There are several reasons for our preference for a combination of polarity assessment via outgroup comparison and parsimony over other methods for the analysis of gene-expression microarray data (Allison, Cui, Page et al., 2006; Kolaczkowski and Thornton, 2004).
Parsimony phylogenetic analysis requires polarity assessment for each data value to determine its novelty, that is, whether it represents a change from the normal state (Abu-Asab, Chaouchi and Amri, 2008). We advocate that qualitative, and not only quantitative, similarity is a better measure of common ontogenetic steps among specimens, and that a correlation of genes based on similar quantitative expression is not necessarily indicative of ontogenic relationships among genes or specimens. Polarity assessment converts the absolute continuous data values into fixed discontinuous binary states (0/1), where zero signifies no change in gene expression and one indicates a deviation from the range of normal specimens. The change of a state from zero to one conveys the direction of change in the diseased specimens, since a derived state (1) denotes a state that does not occur in normal specimens.

Polarity assessment does not set an arbitrary stringency on gene selection, especially where the distribution pattern is gene-specific within a set of specimens (e.g., DE and partially asynchronous genes) and the other transformation methods are not optimal for its assessment (Huang and Qu, 2006; Lyons-Weiler, Patel, Becich et al., 2004). Fold-change and t-test may dismiss from the gene list those genes with dichotomous expression, although they are indicative of a unique expression type and may account for some phenomena such as transitional clades, and dichotomous or multi-pathway development in some disease types (Abu-Asab, Chaouchi and Amri, 2006; Lyons-Weiler, Patel, Becich et al., 2004).
The gene lists of Tables 1C-7C show a large number of DE asynchronous genes that were mostly not considered significant by other methods (Hippo, Taniguchi, Tsutsumi et al., 2002; Hoffman, Milliken, Gregg et al., 2004; Quade, Wang, Sornberger et al., 2004), or whose dichotomous mode was not noticed by the authors.

Identifying synapomorphies is an important goal of a parsimony analysis since they are the basis for defining clades (Albert, 2005a; Wiley and Siegel-Causey, 1991). Polarity assessment identifies the genes with derived expressions in all of the ingroup specimens, i.e., it recognizes apomorphies, and thus allows us to carry out parsimony phylogenetic analysis and benefit from its unique implications (Felsenstein, 2004; Hennig, 1966). It is the parsimony algorithm that plots a hierarchical distribution of synapomorphies to produce a hypothesis of relationships among the specimens in the form of a cladogram. A synapomorphy can be traced back from the parsimony cladogram to the specimens that share it, thus permitting the determination of potential biomarkers; this tracing back is almost impossible in other analytical methods (Abu-Asab, Chaouchi and Amri, 2008).

Because polarity assessment transforms the quantitative data into a qualitative matrix, it reduces the data noise. The absolute quantitative nature of microarray data restricts their use and interpretation because of inconsistencies between runs, platforms, and laboratories.
By polarizing each dataset with its own set of outgroup specimens, the inconsistencies of the experiment are eliminated, since the polarization process is a comparison between equals: data values generated at the same time. The benefit here translates into the ability to pool a large number of experiments, carry out intra- and interplatform comparisons, and obtain better gene-list concordance between experiments. However, as discussed below, polarity assessment is sensitive to the choice and size of the outgroup specimens.

Selection and Size of the Outgroup

When conducting a polarity assessment, the outgroup’s selection and its effective size are very significant factors in correctly identifying synapomorphies, and therefore in delimiting the natural clades within the study group. The composition of the outgroup specimens affects the outcome of the analysis, as demonstrated by the different combinations of outgroups that we used to conduct polarity assessment (Tables 1-5). In our opinion, the outgroup should be composed of only healthy specimens when the goal is to find the genes involved in disease inception, progression, and maintenance. As Tables 1-5 show, variations of out/ingroup composition lead to variations in identifying synapomorphies, and therefore may generate erroneous conclusions. When the ingroup is a paraphyletic group (e.g., leiomyomas), the identified synapomorphies are different from those obtained when the ingroup is monophyletic (contains all the related uterine specimens, in this case the leiomyosarcoma as well).

In our combined analysis (Fig. 3), the increase in outgroup size did not affect the major topology of the cladogram, but rather the internal branching of some clades (normal and gastric cancer) when compared with their single analyses (Figs. 1-2).
Because increasing the number of genes in the study does not have the same effect as enlarging outgroup size (Graybeal, 1998), it is our conclusion that a successful analysis requires a good number of normal specimens to be used as the outgroup. For microarray experiments to be meaningful and provide high predictivity, the smallest number of normal specimens that incorporates the maximum variation per population should be established and used in the analysis.

Gene Linkage

Whereas gene linkage in a clustering dendrogram is based on quantitative correlations between differentially expressed genes, in a parsimony cladogram it is based on the most parsimonious distribution of derived and ancestral gene-expression states of all genes of all the specimens; it is a map of expression states, both ancestral and derived. For this process to be accurate, the most parsimonious cladogram is selected. It reflects the classification that has the lowest number of steps, as well as of parallelisms and reversals, to explain the distribution of expression states among specimens.

Gene linkage here is based on the location of genes on the cladogram as synapomorphies. The synapomorphies below a node on the cladogram are the linked genes that are shared among the specimens above that node. Because a parsimonious cladogram is hierarchical, every one of its nodes has its synapomorphy(ies). This characteristic of a cladogram presents it as a map of linked genetic alterations that produce the diversity/relatedness of its specimens, and may also permit the tracing of shared ontogenic pathways that are responsible for disease initiation and progression.
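To make the notion of "lowest number of steps" concrete, the sketch below counts the state changes that a given tree requires to explain a polarized (0/1) matrix, using Fitch's small-parsimony procedure. This is a stand-in chosen for brevity and is not the MIX program: it only scores candidate trees and does not search for the best one. The tree encoding (nested 2-tuples of specimen names), the function names, and the toy matrix are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def fitch_steps(tree, states: Dict[str, int]) -> int:
    """Minimum number of 0/1 state changes for one gene on a binary tree.

    `tree` is a specimen name (leaf) or a 2-tuple of subtrees.
    """
    def visit(node) -> Tuple[Set[int], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        common = left_set & right_set
        if common:  # children agree: no extra change is needed here
            return common, left_steps + right_steps
        # children disagree: one change must occur below this node
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def tree_length(tree, matrix: Dict[str, List[int]]) -> int:
    """Total steps over all genes (columns) of a polarized matrix."""
    n_genes = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {name: row[g] for name, row in matrix.items()})
        for g in range(n_genes)
    )


# Toy polarized matrix (hypothetical): one normal, three tumor specimens.
matrix = {"N1": [0, 0], "T1": [1, 0], "T2": [1, 1], "T3": [1, 1]}
tree_a = ("N1", ("T1", ("T2", "T3")))  # derived states nest hierarchically
tree_b = (("N1", "T2"), ("T1", "T3"))  # forces a parallel origin of gene 2
print(tree_length(tree_a, matrix), tree_length(tree_b, matrix))
```

On `tree_a` each gene changes once, so both derived states act as synapomorphies of nested clades; `tree_b` needs an extra step because the second gene's derived state must arise twice.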
Phylogenetic Implications on Disease Definition

Although it is assumed that each disease has its own unique developmental pathway(s) (Adsay, Merati, Andea et al., 2002; Chung, 2000; Hayashi, Yamashita and Watanabe, 2004), thus far omics data have not been used to prove this premise. Our analysis of two independently-generated datasets that represent uterine (GDS533) and gastric (GDS1210) cancers confirms that each of these two types of cancer is a natural class of specimens (a clade) that is circumscribed by its own set of synapomorphies. If this can be extended to other types of cancer, then each cancer can be considered a natural clade with its unique gene-expression identifiers, the synapomorphies.

There are several implications to this conclusion; the most obvious is its effect on the definition of biomarkers. If a type of cancer is a clade, then any suggested biomarker has to be a proven synapomorphy; otherwise it will not be a universal diagnostic test for all the specimens of this cancer. Some of the currently applied immunohistomarkers are not universal synapomorphies. For example, the memberships of all four clades of the gastric cancers (Fig. 2) did not correlate well with the specimens’ immunoreactivity to antibodies against p53, E-cadherin, and β-catenin, and a published two-way clustering did not correlate any better (Hippo, Taniguchi, Tsutsumi et al., 2002).
The discordance between molecular classifications and most of the currently used immunohistological markers is a problem that can be better addressed in a phylogenetic sense, to indicate whether a marker is a synapomorphy or has a random distribution among the subclades of a cancer. Most of the immunohistological markers only partially stain their tumors, and therefore are not expected to be synapomorphies.

A second implication is that a phylogenetic classification can be a diagnostic tool because it is a process of class discovery based on synapomorphy-defined clades. This can be realized either through a parsimony analysis, where the place of a specimen will indicate its pathologic status, or by using the synapomorphies as the biomarkers of a specimen, i.e., through class prediction.

A third implication is that a parsimonious classification of specimens may be used as a prognostic tool. Because the cladogram also indicates the direction of change in gene expression among the specimens, placing the specimens with the largest number of derived gene-expression patterns at the terminal end of the cladogram and those with the fewest gene-expression changes at the lower end, it may be developed for use in prognosis, targeted treatment, and post-treatment assessment. Additionally, the phylogenetic classification is a dynamic tool that will incorporate a novel specimen by placing it in the proximity of its sister groups, depending on the number of synapomorphies it shares with other members of a clade, without any radical alteration to the topology of the cladogram.
Improved Interplatform Concordance and Comparability

Improved interplatform concordance is a criterion that will bestow robustness and significance on microarray as a valid experimental and clinical platform. Using parsimony analysis, our tests of concordance, which compared the lists of synapomorphies produced by polarity assessment of two experiments, gave better results than those of fold-change and F-statistic, and better than the concordance between the latter two (Table 8). When comparing the synapomorphies of a clade composed of leiomyomas and leiomyosarcomas (GDS533) with the synapomorphies of leiomyomas (GDS484), we obtained a high concordance of 89% within over- and underexpressed genes and 35% within dichotomously expressed genes. The concordance between the two studies could have been higher if the number of probes of GDS533 had been closer to that of GDS484 (7,000 vs. 22,000) (Hoffman, Milliken, Gregg et al., 2004; Quade, Wang, Sornberger et al., 2004). Furthermore, even a comparison of the synapomorphies of two paraphyletic leiomyoma groups (GDS484 [1485 synapomorphies] and GDS533 [146 synapomorphies, Table 2]) produced 31% concordance between the two groups of leiomyoma (45/146, Table 7). This was a higher percentage than was produced by statistical methods (12%).
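The concordance figures above are overlaps between lists of synapomorphies. The sketch below shows one way such a comparison could be computed, under assumptions that are not spelled out in this section: polarity assessment is approximated by scoring a value as derived when it falls outside the range of the normal specimens, a synapomorphy is a gene that is derived in every tumor specimen, and concordance is the number of shared genes divided by the size of the smaller list (which is consistent with the 45/146 figure). The function names and toy values are hypothetical.

```python
from typing import Dict, List, Set


def polarize(normals: List[float], value: float) -> int:
    """Assumed rule: 1 (derived) if the value lies outside the normal range."""
    return int(value < min(normals) or value > max(normals))


def synapomorphy_list(normal: Dict[str, List[float]],
                      tumor: Dict[str, List[float]]) -> Set[str]:
    """Genes whose value is derived in every tumor specimen."""
    return {
        gene for gene, values in tumor.items()
        if gene in normal and values
        and all(polarize(normal[gene], v) for v in values)
    }


def concordance(list_a: Set[str], list_b: Set[str]) -> float:
    """Shared genes as a fraction of the smaller list (assumed definition)."""
    smaller = min(len(list_a), len(list_b))
    return len(list_a & list_b) / smaller if smaller else 0.0


# Toy expression values (hypothetical), one list of specimens per gene.
normal = {"geneA": [1.0, 1.2, 0.9], "geneB": [5.0, 6.0, 5.5], "geneC": [3.0, 3.2, 3.1]}
tumor = {"geneA": [2.0, 2.5], "geneB": [1.0, 7.0], "geneC": [3.1, 3.4]}

# geneA is above the normal range in all tumors; geneB is below it in one
# tumor and above it in the other; geneC overlaps the normal range.
print(sorted(synapomorphy_list(normal, tumor)))  # ['geneA', 'geneB']

# The reported leiomyoma comparison: 45 shared genes, smaller list of 146.
print(round(100 * 45 / 146))  # 31
```

Because the comparison is made on qualitative lists rather than on raw intensities, differences in scale between experiments do not enter the concordance calculation in this sketch.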
Interplatform comparability has been difficult to carry out on microarray data because of data inconsistencies between runs, experiments, and laboratories. However, with polarity assessment, which converts the quantitative gene-expression values of every experiment into a qualitative matrix, it is possible to combine several matrices and carry out intra- and interplatform comparisons in a parsimonious phylogenetic sense. A phylogenetic interplatform comparability of microarray data can be carried out if each dataset can be polarized separately to produce its polarized matrix. Furthermore, when their probes are identical, two or more polarized sets can be pooled together and analyzed as Fig. 3 shows. We have successfully pooled and analyzed two separately polarized datasets, GDS533 (uterine leiomyoma and leiomyosarcoma) and GDS1210 (gastric cancer), which were prepared separately but on an identical gene chip platform, GPL80; and previously we have pooled three mass spectrometry proteomic datasets for a phylogenetic analysis (Abu-Asab, Chaouchi and Amri, 2006).

Resolving Standing Questions Through Parsimony Phylogenetics: An Example

Our analysis of uterine tissues illustrates how a parsimony phylogenetic analysis may confront some of the unresolved issues in bioinformatics and medicine. For example, one of the persistent questions in pathology is the relationship between leiomyoma and leiomyosarcoma (Quade, Wang, Sornberger et al., 2004). It has been reported that approximately 1% of leiomyosarcoma may have arisen in pre-existing leiomyoma (Lee, Kong, Lee et al., 2005).
By analyzing data of normal uterus, leiomyoma, and leiomyosarcoma, we demonstrate that the latter two share a number of synapomorphies and form an inclusive clade (Table 1, Figs. 1 & 3), and that leiomyosarcomas have an additional number of synapomorphies distinguishing them from leiomyomas (Table 3). Although the leiomyoma specimens, when analyzed alone, without the leiomyosarcoma, appear to have a large number of synapomorphies (Table 2), these synapomorphies are not unique to leiomyoma, and the group appears to be paraphyletic (it contains some but not all of its developmental relatives). Leiomyoma as a group does not form a clade within a comprehensive ingroup that includes the leiomyosarcoma; in this context, not even one gene-expression state is unique to the group itself. Because it shares its synapomorphies with the sarcoma, leiomyoma should be considered an incipient form of leiomyosarcoma.

Conclusion

The application of phylogenetic analysis through polarity assessment and parsimony to several gene-expression microarray datasets provides the basis for a new paradigm for analyzing and interpreting microarray data (Table 10). It offers an alternative to F- and t-statistics and fold-change methods of generating differentially expressed gene listings and statistical gene linkage; brings out a higher interplatform concordance; resolves interplatform comparability problems; defines biomarkers as synapomorphies; circumscribes disease types as clades defined by synapomorphies; and possibly transforms microarray into a diagnostic, prognostic, and post-treatment evaluation tool.

ACKNOWLEDGEMENTS

Competing interests.
The authors have filed for a US patent for their analytical method.

REFERENCES

Abu-Asab, M., Chaouchi, M., and Amri, H. (2006). Phyloproteomics: What phylogenetic analysis reveals about serum proteomics. J Proteome Res 5, 2236-2240.
Abu-Asab, M., Chaouchi, M., and Amri, H. (2008). Evolutionary medicine: A meaningful connection between omics, disease, and treatment. Proteomics Clin Appl 2, 122-134.
Adsay, N. V., Merati, K., Andea, A., Sarkar, F., Hruban, R. H., Wilentz, R. E., et al. (2002). The dichotomy in the preinvasive neoplasia to invasive carcinoma sequence in the pancreas: differential expression of MUC1 and MUC2 supports the existence of two separate pathways of carcinogenesis. Mod Pathol 15, 1087-1095.
Albert, V. A. (2005a). Parsimony and phylogenetics in the genomic age. In: Albert, V. A. (ed). Parsimony, phylogeny, and genomics. (Oxford University Press, Oxford, New York).
Albert, V. A. (2005b). Parsimony, phylogeny, and genomics. (Oxford University Press, Oxford, New York).
Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503-511.
Allison, D. B., Cui, X., Page, G. P., and Sabripour, M. (2006). Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics 7, 55-65.
Beer, D. G., Kardia, S. L., Huang, C. C., Giordano, T. J., Levin, A. M., Misek, D. E., et al. (2002). Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine 8, 816-824.
Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., et al. (2000). Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406, 536-540.
Chung, D. C. (2000). The genetic basis of colorectal cancer: insights into critical pathways of tumorigenesis. Gastroenterology 119, 854-865.
Farris, J. S. (1979). The Information Content of the Phylogenetic System. Systematic Zoology 28, 483-519.
Felsenstein, J. (1989). PHYLIP: phylogeny inference package (version 3.2). Cladistics 5, 164-166.
Felsenstein, J. (2004). Inferring phylogenies. (Sinauer Associates, Sunderland, Mass.).
Goloboff, P. A., and Pol, D. (2005). Parsimony and Bayesian phylogenetics. In: Albert, V. A. (ed). Parsimony, phylogeny, and genomics. (Oxford University Press, Oxford, New York).
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
Graybeal, A. (1998). Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47, 9-17.
Guo, L., Lobenhofer, E. K., Wang, C., Shippy, R., Harris, S. C., Zhang, L., et al. (2006). Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotech 24, 1162-1169.
Harrison, A. P., Johnston, C. E., and Orengo, C. A. (2007). Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips. BMC Bioinformatics 8, 195.
Hayashi, Y., Yamashita, J., and Watanabe, T. (2004). Molecular genetic analysis of deep-seated glioblastomas. Cancer Genet Cytogenet 153, 64-68.
Hennig, W. (1966). Phylogenetic systematics. (University of Illinois Press, Urbana, IL).
Hippo, Y., Taniguchi, H., Tsutsumi, S., Machida, N., Chong, J. M., Fukayama, M., et al. (2002). Global gene expression analysis of gastric cancer by oligonucleotide microarrays. Cancer Research 62, 233-240.
Hoffman, P. J., Milliken, D. B., Gregg, L. C., Davis, R. R., and Gregg, J. P. (2004). Molecular characterization of uterine fibroids and its implication for underlying mechanisms of pathogenesis. Fertility and Sterility 82, 639-649.
Huang, S., and Qu, Y. (2006). The loss in power when the test of differential expression is performed under a wrong scale. J Comput Biol 13, 786-797.
Kolaczkowski, B., and Thornton, J. W. (2004). Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431, 980-984.
Lee, E. J., Kong, G., Lee, S. H., Rho, S. B., Park, C. S., Kim, B. G., et al. (2005). Profiling of differentially expressed genes in human uterine leiomyomas. Int J Gynecol Cancer 15, 146-154.
Lossos, I. S., and Morgensztern, D. (2006). Prognostic biomarkers in diffuse large B-cell lymphoma. J Clin Oncol 24, 995-1007.
Lyons-Weiler, J., Patel, S., Becich, M. J., and Godfrey, T. E. (2004). Tests for finding complex patterns of differential expression in cancers: towards individualized medicine. BMC Bioinformatics 5, 110.
Millenaar, F. F., Okyere, J., May, S. T., van Zanten, M., Voesenek, L. A., and Peeters, A. J. (2006). How to decide? Different methods of calculating gene expression from short oligonucleotide array data will give different results. BMC Bioinformatics 7, 137.
Nesse, R. M., and Stearns, S. C. (2008). The great opportunity: Evolutionary applications to medicine and public health. Evol Appl 1, 28-48.
Page, R. D. (1996). TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12, 357-358.
Quackenbush, J. (2006). Microarray analysis and tumor classification. The New England Journal of Medicine 354, 2463-2472.
Quade, B. J., Wang, T. Y., Sornberger, K., Dal Cin, P., Mutter, G. L., and Morton, C. C. (2004). Molecular pathogenesis of uterine smooth muscle tumors from transcriptional profiling. Genes, Chromosomes & Cancer 40, 97-108.
Siddall, M. E. (1998). Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the Farris zone. Cladistics 14, 209-220.
Stefankovic, D., and Vigoda, E. (2007). Phylogeny of mixture models: robustness of maximum likelihood and non-identifiable distributions. J Comput Biol 14, 156-189.
Wang, H., He, X., Band, M., Wilson, C., and Liu, L. (2005). A study of inter-lab and inter-platform agreement of DNA microarray data. BMC Genomics 6, 71.
Wiley, E. O., and Siegel-Causey, D. (1991). The Compleat cladist: a primer of phylogenetic procedures. (Museum of Natural History, Dyche Hall, University of Kansas, Lawrence, Kan.).
FIG. 1. A cladogram of a parsimony phylogenetic analysis of microarray gene-expression data representing normal myometrium (n = 4), leiomyoma (n = 7), leiomyosarcoma (n = 9), and extrauterine leiomyosarcoma (n = 4) specimens. The leiomyomas and leiomyosarcomas form a clade defined by 32 synapomorphies (Table 1). The leiomyosarcoma specimens form a terminal clade that is circumscribed by 20 synapomorphies (Table 3). Asterisk (*) denotes extrauterine leiomyosarcomas.

FIG. 2. A cladogram of a parsimony phylogenetic analysis of gastric cancer and noncancerous specimens. It shows a clade delineated by 34 synapomorphies (Table 6) encompassing all cancer specimens.

FIG. 3. A cladogram representing a comparability analysis of the gastric (GDS1210) and uterine (GDS533) datasets. The polarized matrices of the two datasets were pooled together and processed by the parsimony phylogenetic algorithm, MIX.
Each of the cancers (gastric and leiomyosarcoma) forms its own clade, and the inclusive clade encompassing the two cancers and leiomyomas is delimited by a set of synapomorphies (Table 9). 248x189mm (600 x 600 DPI) ew vi 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 OMICS: A Journal of Integrative Biology Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 OMICS: A Journal of Integrative Biology Table 1. Synapomorphies defining a clade of leiomyoma and leiomyosarcoma specimens in comparison to normal specimens (GDS533). Synapomorphies include: one OE gene, 8 UE genes, and 23 DE genes. Last column reports the status of the synapomorphies as described by [1] Hoffman et al.(2004) and [2] Quade et al. (2004) in their significant genes’ lists. DE= dichotomously-expressed; NS= not significant; OE= overexpressed; UN= underexpressed. A. Overexpressed synapomorphic genes: r Fo D00596 TYMS thymidylate synthetase OE[1, 2] B. Underexpressed synapomorphic genes: L19871 ATF3 activating transcription factor 3 UE[1, 2] U62015 CYR61 cysteine-rich, angiogenic inducer, 61 UE[1, 2] X68277 DUSP1 dual specificity phosphatase 1 UE[1, 2] V01512 FOS v-fos FBJ murine osteosarcoma viral oncogene homolog UE[1], NS[2] L49169 FOSB FBJ murine osteosarcoma viral oncogene homolog B NS[1], UE[2] J04111 JUN v-jun sarcoma virus 17 oncogene homolog (avian) UE[1, 2] Y00503 KRT19 keratin 19 UE[1], NS[2] U24488 TNXB tenascin XB er Pe ew vi Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 30 of 51 UE[1], UE,OE[2] C. 
Dichotomously-expressed synapomorphic genes: M31994 ALDH1A1 aldehyde dehydrogenase 1 family, member A1 UE[1], NS[2] X05409 ALDH2 aldehyde dehydrogenase 2 family (mitochondrial) NS[1, 2] D25304 ARHGEF6 Rac/Cdc42 guanine nucleotide exchange factor (GEF) 6 NS[1, 2] K03430 C1QB complement component 1, q subcomponent, B chain NS[1], OE[2] U60521 CASP9 caspase 9, apoptosis-related cysteine peptidase NS[1, 2] M73720 CPA3 carboxypeptidase A3 (mast cell) NS[1, 2] HG2663HT2759_at Cpg-Enriched DNA, Clone S19 (HG3995-HT4265) NS[1, 2] 1 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 Page 31 of 51 M14676 FYN oncogene related to SRC, FGR, YES NS[1, 2] M34677 F8A1 coagulation factor VIII-associated (intronic transcript) 1 OE,UE[2] U60061 FEZ2 fasciculation and elongation protein zeta 2 (zygin II) NS[1, 2] U86529 GSTZ1 glutathione transferase zeta 1 (maleylacetoacetate isomerase) NS[1, 2] HG358HT358_at Homeotic Protein 7, Notch Group (HG358-HT358) NS[2] AB002365 KIAA0367 BCH motif-containing molecule at the carboxyl terminal region 1 NS[1], OE,UE[2] U37283 MFAP5 microfibrillar associated protein 5 NS[1, 2] HG406HT406_at MFI2 antigen p97 (melanoma associated) identified by monoclonal antibodies 133.2 and 96.5 MMP2 matrix metallopeptidase 2 (gelatinase A, 72kDa gelatinase, 72kDa type IV collagenase) M55593 r Fo NS[1, 2] NS[1], OE,UE[2] M76732 MSX1 msh homeobox homolog 1 NS[1, 2] L48513 PON2 paraoxonase 2 NS[1, 2] U77594 RARRES2 retinoic acid receptor responder (tazarotene induced) 2 NS[1, 2] M11433 RBP1 retinol binding protein 1 NS[1, 2] L03411 RDBP RD RNA binding protein NS[1], OE[2] Z29083 TPBG trophoblast glycoprotein NS[1, 2] S73591 TXNIP thioredoxin interacting protein er Pe NS[1, 2] ew vi Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 OMICS: A Journal of Integrative Biology 2 Mary Ann Liebert, Inc., 140 Huguenot Street, New 
Rochelle, NY 10801 OMICS: A Journal of Integrative Biology Table 2. Synapomorphies of leiomyoma specimens in comparison to normal specimens (GDS533). These comprise: 25 OE genes, 42 UE genes, and 79 DE genes. Asterisk (*) indicates a synapomorphy for leiomyosarcoma as well. Last column reports the status of the synapomorphies as described by Hoffman et al. (2004) and Quade et al. (2004) in their significant genes lists. A. Overexpressed synapomorphic genes: D16469 ATP6AP1 ATPase, H+ transporting, lysosomal accessory protein 1 NS[1, 2] U07139 CACNB3 calcium channel, voltage-dependent, beta 3 subunit NS[1], OE[2] M11718 COL5A2 collagen, type V, alpha 2 NS[1, 2] U18300 DDB2 damage-specific DNA binding protein 2, 48kDa NS[1, 2] D38550 E2F3 E2F transcription factor 3 NS[1, 2] M34677 F8A1 coagulation factor VIII-associated (intronic transcript) 1 NS[1, 2] D89289 FUT8 fucosyltransferase 8 (alpha (1,6) fucosyltransferase) NS[1, 2] D86962 GRB10 growth factor receptor-bound protein 10 NS[1, 2] M32053 H19, imprinted maternally expressed untranslated mRNA NS[1, 2] U07664 HLXB9 homeobox HB9 OE[1, 2] D87452 IHPK1 inositol hexaphosphate kinase 1 NS[1, 2] U51336 ITPK1 inositol 1,3,4-triphosphate 5/6 kinase NS[1, 2] r Fo er Pe ew vi Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 32 of 51 AB002365 KIAA0367 NS[1], OE[2] D78611 MEST mesoderm specific transcript homolog (mouse) OE[1], NS[2] U19718 MFAP2 microfibrillar-associated protein 2 NS[1, 2] M55593 MMP2 matrix metallopeptidase 2 (gelatinase A, 72kDa gelatinase, 72kDa type IV collagenase) OE[1, 2] U79247 PCDH11X protocadherin 11 X-linked NS[1, 2] L24559 POLA2 polymerase (DNA directed), alpha 2 (70kD subunit) NS[1, 2] M65066 PRKAR1B protein kinase, cAMP-dependent, regulatory, type I, beta NS[1, 2] D14694 PTDSS1 phosphatidylserine synthase 1 NS[1, 2] U24186 RPA4 replication protein A4, 34kDa NS[1, 2] 3 
Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 Page 33 of 51 U85658 TFAP2C transcription factor AP-2 gamma (activating enhancer binding protein 2 gamma) NS[1, 2] D82345 TMSL8 thymosin-like 8 NS[1, 2] D85376 TRHR thyrotropin-releasing hormone receptor NS[1, 2] D00596 TYMS* thymidylate synthetase OE[1, 2] B. Underexpressed synapomorphic genes: X03350 ADH1B alcohol dehydrogenase IB (class I), beta polypeptide NS[1, 2] M31994 ALDH1A1* aldehyde dehydrogenase 1 family, member A1 UE[1], NS[2] X05409 ALDH2* aldehyde dehydrogenase 2 family (mitochondrial) NS[1, 2] L19871 ATF3* activating transcription factor 3 UE[1, 2] U60521 CASP9 caspase 9, apoptosis-related cysteine peptidase NS[1, 2] D49372 CCL11 chemokine (C-C motif) ligand 11 NS[1, 2] X05323 CD200 molecule NS[1, 2] M83667 CEBPD CCAAT/enhancer binding protein (C/EBP), delta NS[1, 2] U90716 CXADR coxsackie virus and adenovirus receptor M21186 er NS[1, 2] CYBA cytochrome b-245, alpha polypeptide NS[1, 2] U62015 CYR61* cysteine-rich, angiogenic inducer, 61 UE[1, 2] Z22865 DPT dermatopontin NS[1, 2] X56807 DSC2 desmocollin 2 X68277 DUSP1* dual specificity phosphatase 1 V01512 FOS* v-fos FBJ murine osteosarcoma viral oncogene homolog UE[1], NS[2] L49169 FOSB FBJ murine osteosarcoma viral oncogene homolog B NS[1], UE[2] L11238 GP5 glycoprotein V (platelet) NS[1, 2] M36284 GYPC glycophorin C (Gerbich blood group) NS[1, 2] M60750 HIST1H2BG histone cluster 1, H2bg NS[1, 2] X79200 Homo spaiens mRNA for SYT-SSX protein NS[1, 2] X92814 HRASLS3 HRAS-like suppressor 3 NS[1, 2] M62831 IER2 immediate early response 2 NS[1, 2] J04111 JUN* v-jun sarcoma virus 17 oncogene homolog (avian) UE[1, 2] r Fo Pe ew vi Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 OMICS: A Journal of Integrative Biology NS[1, 2] UE[1, 2] 4 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 OMICS: A 
Journal of Integrative Biology Y00503 KRT19* keratin 19 NS[1, 2] X89430 MECP2 methyl CpG binding protein 2 (Rett syndrome) NS[1, 2] U46499 MGST1 microsomal glutathione S-transferase 1 NS[1, 2] M93221 MRC1 mannose receptor, C type 1 NS[1, 2] M76732 MSX1* msh homeobox homolog 1 (Drosophila) NS[1, 2] S71824 NCAM1 neural cell adhesion molecule 1 OE[1], NS[2] X70218 PPP4C protein phosphatase 4 NS[1, 2] U02680 PTK9 protein tyrosine kinase 9 NS[1, 2] U79291 U77594 L20859 NS[1, 2] NS[1, 2] NS[1, 2] SLC20A1 solute carrier family 20 (phosphate transporter), member 1 STAT1 signal transducer and activator of transcription 1, 91kDa er M97935 RBP1* retinol binding protein 1, cellular Pe M11433, X07438 PTPN11 protein tyrosine phosphatase, non-receptor type 11 (Noonan syndrome 1) RARRES2* retinoic acid receptor responder (tazarotene induced) 2 r Fo J04152 TACSTD2 tumor-associated calcium signal transducer 2 X14787 THBS1 thrombospondin 1 U24488 TNXB* tenascin XB Z29083 TPBG* trophoblast glycoprotein X51521 VIL2 villin 2 (ezrin) D87716 WDR43 WD repeat domain 43 ew vi Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 34 of 51 NS[1, 2] NS[1, 2] NS[1, 2] NS[1, 2] UE[1, 2] NS[1, 2] UE[1], NS[2] NS[1, 2] C. 
Dichotomously-expressed synapomorphic genes: ABCB1; ADRM1; AIM1; ALDH1A3; AMDD; ARHGEF6; ARL4D; ATP5B; Atp8a2; C1QB; CA9; CALM2; CTSB; CCRL2; CD52; CD99; CPA3; DPYD; DSG2; Emx2; FEZ2; FLNA; FOXO1A; FYN; GAPDH; GNB3; GSTZ1; H1F0; H2-ALPHA; HBG2; Ubx, Notch1; Hox5.4; HTR2C; ICA1; IGF2; INSR; ITGA6; ITGA9; KCNK1; KIAA0152; MAP1D; MATK; MBP; MDM4; MFAP5; MFI2 antigen p97; MLH1; MPZ; NDUFS1; NELL2; NNAT; NOS3; NR4A1; OASL; ODC1; OLFM1; PKN2; PON2; PRMT2; PSMC3; PTR2; RANBP2; RBMX; RDBP; RHOG; SAFB2; SCRIB; SELP; SERPINF1; SMS; SPOCK2; ST3GAL1; THRA; TNXB; TTLL4; TXNIP; UPK2; XA; ZNF43 5 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 Page 35 of 51 Table 3. A clade of all leiomyosarcoma specimens defined by 20 synapomorphies in comparison to normal and leiomyoma specimens. Last column reports the status of the synapomorphies as described by Quade et al. (2004) in their significant genes list. A. Overexpressed synapomorphic genes: X54942 CKS2 CDC28 protein kinase regulatory subunit 2 NS U68566 HAX1 HCLS1 associated protein X-1 NS L03411 X59543 r Fo RDBP RD RNA binding protein OE RRM1 ribonucleotide reductase M1 polypeptide NS Pe B. Underexpressed synapomorphic genes: D13639 CCND2, cyclin D2 UE D21337 COL4A6 collagen, type IV, alpha 6 UE er HG2810-HT2921_at L36033 HG2810-HT2921_at AB002382 Homeotic Protein Emx2 ew HG2663-HT2759_at vi HG2663-HT2759_at Csh2 chorionic somatomammotropin hormone 2 [Rattus norvegicus] CXCL12 chemokine (C-X-C motif) ligand 12 (stromal cell-derived factor 1) EMX2 empty spiracles homolog 2 (Drosophila). 
Homeotic Protein Emx2 Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 OMICS: A Journal of Integrative Biology HOXA10 homeobox A10 Expressed in the adult human endometrium LOC284394 hypothetical gene supported by NM_001331 NS NS NS NS UE NS U69263 MATN2 matrilin 2 UE U85707 Meis1, myeloid ecotropic viral integration site 1 homolog (mouse) UE Z29678 MITF microphthalmia-associated transcription factor UE L35240 PDLIM7 PDZ and LIM domain 7 (enigma) NS 6 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 OMICS: A Journal of Integrative Biology D87735 RPL14 ribosomal protein L14 NS L14076 SFRS4 splicing factor, arginine/serine-rich 4 UE J05243 SPTAN1 spectrin, alpha, non-erythrocytic 1 (alphafodrin) NS C. Dichotomously-expressed synapomorphic gene: M33197 GAPDH glyceraldehyde-3-phosphate dehydrogenase r Fo er Pe ew vi Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 36 of 51 7 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 NS Page 37 of 51 Table 4. A clade of all leiomyosarcoma specimens defined by 29 synapomorphies in comparison to leiomyoma specimens only (GDS533). Last column reports the status of the synapomorphies as described by Quade et al. (2004) in their significant genes list. A. 
Overexpressed synapomorphic genes: X54941 CKS1B CDC28 protein kinase regulatory subunit 1B OE X54942 CKS2 CDC28 protein kinase regulatory subunit 2 NS J03060 GBAP glucosidase, beta; acid, pseudogene NS U78027 GLA galactosidase, alpha (associated w/ Fabry’s) RPL36A ribosomal protein L36a No 4 4922 GPX1 glutathione peroxidase 1 NS Y00433 GPX1 glutathione peroxidase 1 NS U68566 HAX1 HCLS1 associated protein X-1 NS X59543 RRM1 ribonucleotide reductase M1 polypeptide NS U12465 RPL35 ribosomal protein L35 OE r Fo er Pe SLC10A2 solute carrier family 10 (sodium/bile acid cotransporter family), member 2 B. Underexpressed synapomorphic: genes U67674 Re NS U87223 CNTNAP1 contactin associated protein 1 UE D30655 EIF4A2 eukaryotic translation initiation factor 4A, isoform 2 UE L20814 GRIA2 glutamate receptor, ionotropic, AMPA 2 UE M10051 INSR insulin receptor NS D79999 LOC221181 hypothetical gene supported by NM_006437 NS D14812 MORF4L2 mortality factor 4 like 2 UE L36151 PIK4CA phosphatidylinositol 4-kinase, catalytic, alpha polypeptide NS D42108 PLCL1 phospholipase C-like 1 NS L13434 RpL41 Ribosomal protein L41 NS HG921HT3995_at Serine/Threonine Kinase, Receptor 2-2, Alt. Splice 3 NS D31891 SETDB1 SET domain, bifurcated 1 UE ew vi 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 OMICS: A Journal of Integrative Biology 8 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 OMICS: A Journal of Integrative Biology AB002318 Talin2 NS U53209 TRA2A transformer-2 alpha NS D87292 TST thiosulfate sulfurtransferase (rhodanese) NS M15990 YES1 v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1 NS C. 
Dichotomously expressed synapomorphic genes: U56417 AGPAT1 1-acylglycerol-3-phosphate O-acyltransferase 1 (lysophosphatidic acid acyltransferase, alpha) NS M63167 AKT1 v-akt murine thymoma viral oncogene homolog 1 NS L27560 IGFBP5 insulin-like growth factor binding protein 5 NS U40223 P2RY4 pyrimidinergic receptor P2Y, G-protein coupled, 4 NS D76444 RNF103 ring finger protein 103 NS r Fo er Pe ew vi Re 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 38 of 51 9 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 Page 39 of 51 Table 5. A clade composed of all leiomyosarcoma specimens is defined in relation to normal specimens (GDS533). Last column reports the status of the synapomorphies as described by Quade et al. (2004) in their significant genes list. A. Overexpressed synapomorphic genes: S78187 CDC25B cell division cycle 25B NS U40343 CDKN2D cyclin-dependent kinase inhibitor 2D (p19, inhibits CDK4) NS X54942 CKS2 CDC28 protein kinase regulatory subunit 2 NS r Fo X79353 GDI1 GDP dissociation inhibitor 1 NS H2AFX H2A histone family, member X NS IRF5 interferon regulatory factor 5 NS U04209 MFAP1 microfibrillar-associated protein 1 NS U43177 MpV17 mitochondrial inner membrane protein NS U19796 MRPL28 mitochondrial ribosomal protein L28 OE X14850 U51127 er Pe POLR2L polymerase (RNA) II (DNA directed) polypeptide L, 7.6kDa PPGB protective protein for beta-galactosidase (galactosialidosis) SLC18A3 solute carrier family 18 (vesicular acetylcholine), member 3 STIP1 stress-induced-phosphoprotein 1 (Hsp70/Hsp90organizing protein) U37690 Re M22960 U09210 M86752 M26880 UBC ubiquitin C U43177 UCN urocortin B. 
Underexpressed synapomorphic genes: ew vi 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 OMICS: A Journal of Integrative Biology NS NS NS OE OE NS HG3638HT3849_s_at ADH1A alcohol dehydrogenase 1A (class I), alpha polypeptide Amyloid Beta (A4) Precursor Protein, Alt. Splice 2, A4(751) L28997 ARL1 ADP-ribosylation factor-like 1 NS Z49269 CCL14 chemokine (C-C motif) ligand 14 UE M92934 CTGF connective tissue growth factor UE M74099 CUTL1 cut-like 1, CCAAT displacement protein (Drosophila) NS M12963 10 Mary Ann Liebert, Inc., 140 Huguenot Street, New Rochelle, NY 10801 UE NS OMICS: A Journal of Integrative Biology M96859 DPP6 dipeptidyl-peptidase 6 UE U94855 EIF3S5 eukaryotic translation initiation factor 3, subunit 5 epsilon, 47kDa NS L25878 EPHX1 epoxide hydrolase 1, microsomal (xenobiotic) NS U60061U69140 FEZ2 fasciculation and elongation protein zeta 2 (zygin II) NS X67491 GLUDP5 glutamate dehydrogenase pseudogene 5 NS HG4334HT4604_s_at Glycogenin NS X53296 IL1RN interleukin 1 receptor antagonist NS X55740 NT5E 5'-nucleotidase, ecto (CD73) UE PCBP2 poly(rC) binding protein 2 UE r Fo X78136 PHLDA1 pleckstrin homology-like domain, family A, member 1 PPP2R1A protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65), alpha isoform PPP2CB protein phosphatase 2, catalytic subunit, beta isoform NS U25988 PSG11 pregnancy specific beta-1-glycoprotein 11 NS M98539 PTGDS prostaglandin D2 synthase 21kDa (brain) UE X54131 PTPRB protein tyrosine phosphatase, receptor type, B NS M12174 RHOB ras homolog gene family, member B NS HG1879HT1919 RHOQ ras homolog gene family, member Q M33493 TPSB2 tryptase beta 2 L14837 TJP1 tight junction protein 1 (zona occludens 1) UE HG3344HT3521_at UBE2D1 ubiquitin-conjugating enzyme E2D 1 (UBC4/5 homolog, yeast) NS X98534 VASP vasodilator-stimulated phosphoprotein NS X51630 WT1 Wilms tumor 1 UE HG3426HT3610_s_at Zinc 
Finger Protein Hzf-16, Kruppel-Like, Alt. Splice 1 NS M92843 ZFP36 zinc finger protein 36, C3H type, homolog (mouse) UE Z50194 J02902 J03805 NS NS NS NS

C. Dichotomously-expressed synapomorphic genes:

U80226 ABAT 4-aminobutyrate aminotransferase NS M14758 ABCB1 ATP-binding cassette, sub-family B (MDR/TAP), member 1 NS M95178 ACTN1 actinin, alpha 1 NS U76421 ADARB1 adenosine deaminase, RNA-specific, B1 (RED1 homolog rat) NS U46689 ALDH3A2 aldehyde dehydrogenase 3 family, member A2 NS L34820 ALDH5A1 aldehyde dehydrogenase 5 family, member A1 (succinate-semialdehyde dehydrogenase) NS M84332 ARF1 ADP-ribosylation factor 1 NS D14710 ATP5A1 ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit 1, cardiac muscle NS X84213 U23070 M33518 BAK1 BCL2-antagonist/killer 1 NS BAMBI BMP and activin membrane-bound inhibitor homolog (Xenopus laevis) NS BAT2 HLA-B associated transcript 2 NS X61123 BTG1 B-cell translocation gene 1, anti-proliferative NS S60415 CACNB2 calcium channel, voltage-dependent, beta 2 subunit NS M19878 CALB1 calbindin 1, 28kDa NS L76380 CALCRL calcitonin receptor-like NS M21121 CCL5 chemokine (C-C motif) ligand 5 NS D14664 CD302 CD302 molecule NS X72964 CETN2 centrin, EF-hand protein, 2 NS U66468 CGREF1 cell growth regulator with EF-hand domain 1 NS M63379 CLU clusterin NS X52022 COL6A3 collagen, type VI, alpha 3 L25286 COL15A1 collagen, type XV, alpha 1 NS S45630 CRYAB crystallin, alpha B NS X95325 CSDA cold shock domain protein A NS U03100 CTNNA1 catenin (cadherin-associated protein), alpha 1, 102kDa NS X52142 CTPS CTP synthase NS D38549 CYFIP1 cytoplasmic FMR1 interacting protein 1 NS X64229 DEK DEK oncogene (DNA binding) NS
UE M63391 DES desmin UE Z34918 EIF4G3 eukaryotic translation initiation factor 4 gamma, 3 NS U97018 EML1 echinoderm microtubule associated protein like 1 NS U12255 FCGRT Fc fragment of IgG, receptor, transporter, alpha NS U36922 FOXO1A forkhead box O1A (rhabdomyosarcoma) NS U91903 FRZB frizzled-related protein NS M33197 GAPDH glyceraldehyde-3-phosphate dehydrogenase NS U09587 GARS glycyl-tRNA synthetase NS U66075 D13988 U31176 GATA6 GATA binding protein 6 NS GDI2 GDP dissociation inhibitor 2 NS GFER growth factor, augmenter of liver regeneration (ERV1 homolog, S. cerevisiae) NS U28811 GLG1 golgi apparatus protein 1 NS U66578 GPR23 G protein-coupled receptor 23 NS L40027 GSK3A glycogen synthase kinase 3 alpha NS U77948 GTF2I general transcription factor II, i UE Z29481 HAAO 3-hydroxyanthranilate 3,4-dioxygenase NS D16480 HADHA hydroxyacyl-Coenzyme A dehydrogenase/3-ketoacyl-Coenzyme A thiolase/enoyl-Coenzyme A hydratase (trifunctional protein), alpha subunit NS U50079 HDAC1 histone deacetylase 1 NS U50078 HERC1 hect (homologous to the E6-AP (UBE3A) carboxyl terminus) domain and RCC1 (CHC1)-like domain (RLD) 1 NS M95623 HMBS hydroxymethylbilane synthase NS X79536 HNRPA1 heterogeneous nuclear ribonucleoprotein A1 NS L15189 HSPA9B heat shock 70kDa protein 9B (mortalin-2) NS U05875 IFNGR2 interferon gamma receptor 2 (interferon gamma transducer 1) NS X57025 IGF1 insulin-like growth factor 1 (somatomedin C) UE HG3543HT3739_at IGF2 insulin-like growth factor 2 (somatomedin A) NS U40282 ILK integrin-linked kinase NS
X74295 ITGA7 integrin, alpha 7 NS X57206 ITPKB inositol 1,4,5-trisphosphate 3-kinase B NS AB002365 KIAA0367 UE J00124 KRT14 keratin 14 (epidermolysis bullosa simplex, Dowling-Meara, Koebner) NS X05153 LALBA lactalbumin, alpha- NS X02152 LDHA lactate dehydrogenase A NS HG3527HT3721_f_at LHB luteinizing hormone beta polypeptide NS X86018 LRRC41 leucine rich repeat containing 41 NS MFAP4 microfibrillar-associated protein 4 NS MIA3 melanoma inhibitory activity family, member 3 NS MSN moesin NS L38486 D87742 M69066 AB003177 mRNA for proteasome subunit p27 NS U47742 MYST3 MYST histone acetyltransferase (monocytic leukemia) 3 NS M30269 NID1 nidogen 1 NS M10901 NKX3-1 NK3 transcription factor related, locus 1 (Drosophila) NR3C1 nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor) U80669 NS NS M16801 NR3C2 nuclear receptor subfamily 3, group C, member 2 NS U52969 PCP4 Purkinje cell protein 4 UE J03278 PDGFRB platelet-derived growth factor receptor, beta polypeptide NS D37965 PDGFRL platelet-derived growth factor receptor-like NS Z49835 PDIA3 protein disulfide isomerase family A, member 3 NS U78524 PIAS1 protein inhibitor of activated STAT, 1 NS U60644 PLD3 phospholipase D family, member 3 NS D11428 PMP22 peripheral myelin protein 22 NS U79294 PPAP2B phosphatidic acid phosphatase type 2B NS S71018 PPIC peptidylprolyl isomerase C (cyclophilin C) NS X07767 PRKACA protein kinase, cAMP-dependent, catalytic, alpha NS X83416 PRNP prion protein (p27-30) NS M55671 PROZ protein Z, vitamin K-dependent
plasma glycoprotein NS U72066 RBBP8 retinoblastoma binding protein 8 NS L25081 RHOC ras homolog gene family, member C NS U40369 SAT1 spermidine/spermine N1-acetyltransferase 1 NS M97287 SATB1 special AT-rich sequence binding protein 1 (binds to nuclear matrix/scaffold-associating DNA's) NS U83463 SDCBP syndecan binding protein (syntenin) NS U28369 SEMA3B sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3B NS SFTPA2 surfactant, pulmonary-associated protein A2 NS HG3925HT4195_at L31801 SLC16A1 solute carrier family 16, member 1 (monocarboxylic acid transporter 1) SLC2A4 solute carrier family 2 (facilitated glucose transporter), member 4 SMARCD1 SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily d, member 1 NS NS U50383 SMYD5 SMYD family member 5 NS D43636 SNRK SNF related kinase NS D87465 SPOCK2 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 NS M61199 SSFA2 sperm specific antigen 2 NS U15131 ST5 suppression of tumorigenicity 5 U95006 STRA13 stimulated by retinoic acid 13 homolog (mouse) NS M74719 TCF4 transcription factor 4 NS X14253 TDGF1 teratocarcinoma-derived growth factor 1 NS U52830 TERT telomerase reverse transcriptase NS U12471 THBS1 thrombospondin 1 NS U16296 TIAM1 T-cell lymphoma invasion and metastasis 1 NS L01042 TMF1 TATA element modulatory factor 1 NS U03397 TNFRSF9 tumor necrosis factor receptor superfamily, member 9 NS X05276 TPM4 tropomyosin 4 UE M91463 U66617 NS NS HG4683HT5108_s_at TRAF2 TNF receptor-associated factor 2 NS U64444 UFD1L ubiquitin fusion degradation 1 like (yeast) NS U39318 UBE2D3 ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast) NS X59739 ZFX
zinc finger protein, X-linked NS

Table 6. A list of 34 synapomorphies defining a clade composed of all gastric cancer specimens (GDS1210). Synapomorphies include: 8 OE genes, 24 UE genes, and 2 DE genes in comparison with the normal specimens. Last column reports the status of the synapomorphies as described by Hippo et al. (2002). Yes = listed; No = not listed.

A. Overexpressed synapomorphic genes:
X81817 BAP31 mRNA No
D50914 BOP1 block of proliferation 1 No
X54667 CST4: cystatin S MGC71923 Yes
L17131 HMGA1 high mobility group AT-hook 1 No
D63874 HMGB1 high-mobility group box 1 No
D26600 PSMB4 proteasome (prosome, macropain) subunit, beta type, 4 No
U36759 PTCRA pre T-cell antigen receptor alpha PT-ALPHA, PTA No
X89750 TGIF TGFB-induced factor (TALE family homeobox) No

B. Underexpressed synapomorphic genes:
X76342 ADH7 alcohol dehydrogenase 7 (class IV), mu or sigma polypeptide ADH-4 No
M63962 ATP4A ATPase, H+/K+ exchanging, alpha polypeptide ATP6A No
M75110 ATP4B ATPase, H+/K+ exchanging, beta polypeptide ATP6B No
J05401 CKMT2 creatine kinase, mitochondrial 2 (sarcomeric) No
L38025 CNTFR ciliary neurotrophic factor receptor No
M61855 CYP2C9: cytochrome P450, family 2, subfamily C, polypeptide 9 CPC9 No
D63479 DGKD: diacylglycerol kinase, delta 130kDa DGKdelta, KIAA0145, dgkd-2 No
X99101 ESR2 estrogen receptor 2 (ER beta) No
U21931 FBP1 fructose-1,6-bisphosphatase 1 No
HG3432HT3618_at Fibroblast Growth Factor Receptor K-Sam, Alt. Splice 1 No
M31328 GNB3 guanine nucleotide binding protein (G protein), beta polypeptide 3 No
D42047 GPD1L glycerol-3-phosphate dehydrogenase 1-like No
M62628 Human alpha-1 Ig germline C-region membrane-coding region, 3' end No
D29675 Human inducible nitric oxide synthase gene, promoter and exon 1 No
M63154 Human intrinsic factor mRNA No
Z29074 KRT9 keratin 9 (epidermolytic palmoplantar keratoderma) EPPK, K9 No
X05997 LIPF lipase, gastric No
U50136 LTC4S leukotriene C4 synthase MGC33147 No
X76223 MAL: mal, T-cell differentiation protein No
U19948 PDIA2 protein disulfide isomerase family A, member 2 No
L07592 PPARD peroxisome proliferative activated receptor, delta No
U57094 RAB27A, member RAS oncogene family No
AC002077 SLC38A3 solute carrier family 38, member 3 No
Z29574 TNFRSF17 tumor necrosis factor receptor superfamily, member 17 No

C. Dichotomously-expressed synapomorphic genes:
D00408 CYP3A7 cytochrome P450, family 3, subfamily A, polypeptide 7 CP37, P450HFLA No
U29091 SELENBP1 selenium binding protein 1 No

Table 7. Interplatform concordance. A list of overlapping identical (22) and homologous (23) synapomorphic genes in leiomyoma specimens of GDS484 & GDS533. These include: 9 OE, 24 UE, and 12 DE.

GDS533 GDS484

A. Overexpressed synapomorphic genes
a. Identical synapomorphies
DDB2 DDB2
FUT8 FUT8
MEST MEST
TMSL8 TMSL8
TYMS TYMS
b. Homologous synapomorphies
CACNB3 CACNA1C
COL5A2 COL4A5
KIAA0367 KIAA0101
PRKAR1B PRKACB

B. Underexpressed synapomorphic genes
a. Identical synapomorphies
ALDH1A1 ALDH1A1
ALDH2 ALDH2
ATF3 ATF3
CEBPD CEBPD
CXADR CXADR
CYR61 CYR61
DUSP1 DUSP1
FOS FOS
HRASLS3 HRASLS3
IER2 IER2
JUN JUN
KRT19 KRT19
RARRES2 RARRES2
TACSTD2 TACSTD2
TNXB TNXB
VIL2 VIL2
b. Homologous synapomorphies
CASP9 CASP4
CYBA CYB5R1
FOS FOSB
JUN JUNB
PPP1R10 PPP4C
SLC18A2 SLC20A1
THBD THBS1
WDR37 WDR43

C. Dichotomously-expressed synapomorphic genes
a. Identical synapomorphies
CTSB CTSB
b. Homologous synapomorphies
ARL4D ARL4C
FOXO1A FOXJ3
GNB3 GNB1L
ITGA6 ITGA2B
ITGA9 ITGA2B
KCNK1 KCNJ5
MFAP5 MFAP4
PSMC3 PSMC2
SELP SELL
TXNIP TXNDC13
ZNF43 ZNF259P

Table 8. Summary of concordance results between GDS484 and GDS533. The comparisons were carried out in various combinations: statistical v. statistical, phylogenetic v. statistical, and phylogenetic v. phylogenetic.

Columns, GDS533 (Fibroids and Leiomyosarcomas): Quad et al. Gene List; Abu-Asab et al. Synapomorphies for Fibroids (146); Abu-Asab et al. Synapomorphies for Fibroids and Leiomyosarcoma (32)
Rows: GDS484 (Fibroids) Hoffman et al. Gene List; GDS484 (Fibroids) Abu-Asab et al. Synapomorphies for Fibroids (1485); GDS533 Quad et al. Gene List
Concordance: 12% 18% 19.3% 20% 31% 48% 16.5% 45%

Table 9. Interplatform comparability. A list of 16 synapomorphies defining a clade composed of all gastric cancer (GDS1210) as well as uterine sarcoma and leiomyoma specimens (GDS533).

ID Gene
U52522 ARFIP2 ADP-ribosylation factor interacting protein 2 (arfaptin 2)
U51478 ATP1B3 ATPase, Na+/K+ transporting, beta 3 polypeptide
X66839 CA9 carbonic anhydrase IX
M60974 GADD45A growth arrest and DNA-damage-inducible, alpha
X01677, M33197 GAPDH glyceraldehyde-3-phosphate dehydrogenase [two readings]
X14850 H2AFX H2A histone family, member X
U52830 Homo sapiens Cri-du-chat region mRNA, clone CSC8
U25138 KCNMB1 potassium large conductance calcium-activated channel, subfamily M, beta member 1
D21063 MCM2 minichromosome maintenance deficient 2
L38486 MFAP4 microfibrillar-associated protein 4
D87463 PHYHIP phytanoyl-CoA 2-hydroxylase interacting protein
X02419 PLAU plasminogen activator, urokinase
L48513 PON2 paraoxonase 2
U29091 SELENBP1 selenium binding protein 1
Z19083 TPBG trophoblast glycoprotein
M25077 TROVE2 TROVE domain family, member 2

Table 10. Summary of the characteristics of a parsimonious phylogenetic analysis through polarity assessment of gene-expression values followed by a maximum parsimony analysis.
Offers a qualitative assessment of microarray gene-expression data; uses only shared derived states (synapomorphies) as the basis of similarity between specimens.
Efficiently models the heterogeneous expression profiles of diseased specimens, such as those with a fast mutation rate (e.g., cancer).
Incorporates gene expressions that violate normal distribution in a set of specimens (e.g., dichotomously expressed genes).
Identifies synapomorphies and uses them to delineate clades (class discovery). Synapomorphies are also the potential biomarkers.
Reduces experimental noise.
Permits pooling of multiple experiments.
Allows intra- and intercomparability of data.
Produces higher concordance between gene lists than statistical methods (F & t-statistics and fold-change).
Offers a non-parametric, data-based (not specimen-based) gene listing and gene linkage.

NIH Public Access Author Manuscript. J Proteome Res. Author manuscript; available in PMC 2008 March 20. Published in final edited form as: J Proteome Res. 2006 September; 5(9): 2236–2240.

Phyloproteomics: What Phylogenetic Analysis Reveals about Serum Proteomics

Mones Abu-Asab*,†, Mohamed Chaouchi‡, and Hakima Amri§

Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, National Oceanic and Atmospheric Administration, National Ocean Service, CO-OPS/Information Systems Division, Silver Spring, Maryland, and Department of Physiology and Biophysics, School of Medicine, Georgetown University, Washington, D.C.
Abstract

Phyloproteomics is a novel analytical tool that solves the issue of comparability between proteomic analyses, utilizes a total spectrum-parsing algorithm, and produces biologically meaningful classification of specimens. Phyloproteomics employs two algorithms: a new parsing algorithm (UNIPAL) and a phylogenetic algorithm (MIX). By outgroup comparison, the parsing algorithm identifies novel or vanished MS peaks and peaks signifying up- or down-regulated proteins and scores them as derived or ancestral. The phylogenetic algorithm uses these scores to produce a biologically meaningful classification of the specimens.

Keywords

Cancer; dichotomous development; mass spectrometry; phylogenetics; phyloproteomics; proteomics; serum; transitional clades

Introduction

The utilization of the serum proteome to accurately diagnose cancer has been challenging, and its future continues to be surrounded by uncertainties.1 Although statistical analysis of mass spectrometry (MS) profiles of serum proteins has gained enormous popularity and credibility,2-6 algorithmic analysis that produces biologically meaningful results with possible clinical diagnosis is still lacking. It now seems very simplistic to attempt to define cancer on the basis of statistical patterns, since cancer is a multifaceted, evolving, and adapting cellular condition with multiple proteomic profiles; some of these profiles cannot always be separated from noncancerous ones by narrowly defined statistical proteomic patterns on the basis of a limited number of spectral peaks.
Cancer's incipience is marked by mutations that cause the malfunction of the apoptotic apparatus of the cell, and its promotion is characterized by different phases, each having its distinct proteomic profile.7,8 Advanced progression of cancer is marked by cellular dedifferentiation, loss of apoptosis, and metamorphosis into a primordial status where survival, and not function, becomes the cell's primary mission.8 In this latter stage, many proteins responsible for differentiation are not produced, and therefore missing MS peaks are equally significant in defining the proteomic profiles of cancer.

* To whom correspondence should be addressed. †National Institutes of Health. ‡National Oceanic and Atmospheric Administration. §Georgetown University.

The multiphasic nature of cancer progression combined with possible multiple developmental pathways8-11 entails the presence of a large number of proteomic changes for each type of cancer and its phases. These factors suggest that the proteomic profile of a cancer type is a hierarchical and continuous accumulation of proteomic change over time rather than one or a few simple distinct proteomic patterns. For an analytical tool to be successful in producing a clinical diagnosis, it has to uncover the hierarchical profile of cancer and be able to place a specimen within this profile.

In the present study, we propose that cancer can be promptly diagnosed, even at early stages, by phylogenetic analysis of the serum proteome. Since cancer is an evolutionary condition that involves genetic modifications and clonal production, it requires an evolutionary method of analysis. Such an analysis is possible if an algorithm for sorting out the polarity (derived vs ancestral) of the MS values is available.
We demonstrate here through our polarity assessment algorithm (UNIPAL) that this task can be performed, and that MS data can be analyzed with an evolutionary algorithm (Figure 1). Phyloproteomics is an evolutionary analytical tool that sorts mass-to-charge (m/z) values into derived (apomorphic) or ancestral (plesiomorphic) and then classifies specimens according to the distribution pattern of their apomorphies into clades (a clade is a group composed of all the specimens sharing the same apomorphies). Phyloproteomics also illustrates the multiphasic nature of cancer by assigning cancer specimens to a hierarchical classification, with each hierarchy defined by the apomorphic protein changes that are present in its specimens. The classification is presented in a graphical display termed a cladogram or tree. The assumption that all cancerous specimens fit into well-defined proteomic models (patterns based on a few peaks) that distinguish them from noncancerous ones12-16 is replaced here by phylogenetically distinct clades of specimens, with each clade sharing unique protein changes (synapomorphies) among its specimens.

Methods

Proteomic Data

We used mass spectrometry (MS) data of serum proteins generated by surface-enhanced laser desorption–ionization time-of-flight (SELDI-TOF) of 460 specimens from three types of cancer: ovarian (143), pancreatic (70), and prostate (36), as well as from noncancerous specimens (211). All sets of data used here are available from the NCI–FDA Clinical Proteomics Program (http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp) and are described and referred to in a few publications.12,13,15,17,18 From the prostate cancer data set, we included only the confirmed cancerous specimens.

Polarity Assessment and Phylogenetic Analysis

We employed the continuous range of mass-to-charge ratio (m/z) values of all specimens for the analysis.
For polarity assessment (apomorphic [or derived] vs plesiomorphic [or ancestral]), data were polarized with a customized algorithm (UNIPAL) written by the authors that recognized novel and vanished MS peaks, as well as peaks signifying upregulated and downregulated proteins for each specimen. Each of these events was coded as equal; no standardization, normalization, or smoothing of the data was applied before or after polarity assessment, since UNIPAL does not require any of these processes to carry out the polarization. Outgroups used to carry out polarity for each cancer type were selected from the noncancerous specimens; each outgroup encompassed the total variability within the noncancerous specimens. UNIPAL requires a set of noncancerous specimens to be included in every separate data set in order to be used as an outgroup. It determines the polarity for every m/z value among the noncancerous specimens and then scores each value of the study group as derived or ancestral. The outgroup should be large enough to encompass all possible variations that exist within noncancerous specimens.

For phylogenetic analysis, we used MIX, the parsimony program of PHYLIP version 3.57c,19 to carry out a separate phylogenetic parsimony analysis for each cancer type and then pooled all the specimens of the three cancer types plus the noncancerous ones in a larger analysis that included all 460 specimens. Processing with MIX was carried out with randomized and nonrandomized inputs; however, no significant differences were observed between the two. Phylogenetic trees were drawn using TreeView.20

Results and Discussion

The results of a phylogenetic analysis are best illustrated by a phylogenetic tree, termed a cladogram, that shows the hierarchical classification in a graphical format.
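As a rough illustration of the outgroup-based polarity assessment described in the Methods, the sketch below codes each study-group intensity as derived (1) when it falls outside the range spanned by the noncancerous outgroup at the same m/z position, and as ancestral (0) otherwise. This range rule, the function name `polarize`, and the sample intensities are assumptions made for illustration; the source does not give UNIPAL's actual scoring rule.

```python
from typing import List, Sequence


def polarize(outgroup: Sequence[Sequence[float]],
             study: Sequence[Sequence[float]]) -> List[List[int]]:
    """Code each study-group intensity as derived (1) or ancestral (0).

    Assumed rule (not taken from UNIPAL): a value is ancestral when it lies
    within the outgroup's minimum-to-maximum range at that m/z position.
    """
    n_positions = len(outgroup[0])
    # Per-position minimum and maximum over all outgroup specimens.
    lows = [min(spec[j] for spec in outgroup) for j in range(n_positions)]
    highs = [max(spec[j] for spec in outgroup) for j in range(n_positions)]
    return [
        [0 if lows[j] <= spec[j] <= highs[j] else 1 for j in range(n_positions)]
        for spec in study
    ]


# Hypothetical intensities at three m/z positions.
outgroup = [[1.0, 5.0, 0.0], [2.0, 6.0, 0.0], [1.5, 5.5, 0.0]]
study = [
    [1.2, 9.0, 0.0],  # second peak above the outgroup range
    [0.1, 5.2, 3.0],  # first peak below the range, third peak novel
]
coded = polarize(outgroup, study)
print(coded)  # [[0, 1, 0], [1, 0, 1]]
```

Novel, vanished, upregulated, and downregulated peaks all receive the same code here, which mirrors the statement that each of these events was coded as equal. A wider outgroup widens the ancestral range, so fewer values are scored as derived.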
Parsimony analysis produced one most parsimonious cladogram (the one requiring the fewest steps to construct a classification of the specimens) for each of the pancreatic and prostate data sets (Figure 2a,b), five equally parsimonious cladograms for the ovarian specimens (Figure 2c shows only one), and about 100 equally parsimonious cladograms for the inclusive analysis (Figure 3 summarizes only one). We examined all of the equally parsimonious cladograms and found them to be fundamentally very similar in topology. They differed only in the internal arrangement of some minor branches where one or two specimens had equally plausible locations within their immediate clade.

A complete separation of the cancer specimens from noncancerous ones depended on the size of the noncancerous outgroup used to carry out polarity assessment. Polarizing the m/z values with the largest outgroups (ones encompassing the largest amount of variation) available for each cancer type produced cladograms with separate groupings of cancerous and noncancerous specimens; that is, no cancer specimens grouped with the healthy ones and vice versa (100% sensitivity and specificity). However, with the use of randomly selected smaller outgroups, sensitivity dropped to 96% and below; this illustrates the significance of using the largest number of specimens possible for outgroup polarity assessment.

Each of the cladograms (Figure 2a–c) showed an upper bifurcation composed of cancerous specimens, while the lower end of the cladogram was occupied by a number of basal clades composed of noncancerous specimens and a central assemblage of noncancerous clades adjacent to cancerous ones. The latter assemblage formed a distinct order of well-resolved and mostly single-specimen clades in the middle of the cladogram, nested between the cancer and healthy clades (bracketed arrows in Figure 2a–c); we termed them transitional clades (TC).
The transitional clades bordered their respective types (cancer or noncancer) in a tandem arrangement that formed a transitional zone (TZ) between the noncancer and cancer clades.

When data of all specimens of the three cancer types were pooled together with noncancerous ones and processed, each of the three cancers formed two large clades (the terminal and middle) and numerous small transitional clades adjacent to the noncancerous ones (Figure 3). The pancreatic and prostate clades formed sister groups in their terminal and middle clades, and their terminal clades were nested within the ovarian clades. The ovarian specimens formed two distinct clades at the upper part of the cladogram.

The cladograms revealed strong similarities in topology among cancer types. For each of the three cancer types, there were two large recognizable clades (the terminal and the middle) forming a major dichotomy that encompassed the majority of the specimens of each type (Figure 2a–c). This dichotomy persisted in the inclusive cladogram as well (Figure 3), with each of the cancers having two clades.

The use of mass spectrometry (MS) of serum proteins to produce clinically useful profiles has proved to be challenging and has generated some controversy.21-23 Although several methods have been published thus far,13-16 they all either had cancer type-specific sorting algorithms that produced below 95% specificity and did not apply well across other cancer types, did not utilize all potentially useful variability within the data, or were not widely tested.16,24 Furthermore, their relative success has been limited to diagnosis without any of the predictive conclusions potentially offered by phyloproteomics. Since cancer is an evolutionary condition produced by a set of mutations,7 its study should include evolutionarily sound methods of analysis.
Phylogenetics reveals both relatedness and diversity through a hypothesis of relationships among the specimens on the basis of the parsimonious distribution of novel m/z values of their proteomes.

This is the first report on the application of a phylogenetic algorithm to MS serum proteomic data for cancer analysis. By developing and applying an algorithm for polarity assessment and then using a parsimony phylogenetic algorithm for classifying specimens of three cancer types (ovarian, pancreatic, and prostate), we demonstrated that phylogenetics can successfully be applied to MS serum proteomic data for cancer analysis, diagnosis, typing, and possibly susceptibility assessment. Additionally, phyloproteomics points out the presence of distinct trends within cancer proteomic profiles.

Despite the good number of algorithms used for MS serum analysis,13-16 reproducibility and comparability of proteomic analyses are unattainable because of the lack of broadly acceptable universal methods of analysis. Phyloproteomics is composed of two algorithms that are applicable to MS data of any cancer (Figure 1). The first algorithm, UNIPAL, is a new polarity assessment program that we designed to work with MS data to produce a listing of novel derived values in a coded format, and the second algorithm is a popular phylogenetic parsimony program, MIX of the PHYLIP package,19 that uses the values generated by the first algorithm to classify the specimens. MIX is a robust analytical package that has been tested by scientists for the past 16 years and is probably the most cited in phylogenetic studies. An added benefit of this approach is that it makes possible the comparison of results from different data sets and the evaluation of competing analytical tools.

Phylogenetics has the intrinsic ability to reveal meaningful biological patterns by grouping together truly related specimens better than any other known methods (Table 1).
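To make the hand-off between the two algorithms concrete, the sketch below writes a 0/1 character matrix in the discrete-character layout that PHYLIP programs such as MIX read: a first line with the number of specimens and characters, then one line per specimen with a ten-column name followed by the character states. The function name `to_phylip_discrete` and the specimen labels are hypothetical, and the actual coded output of UNIPAL may differ.

```python
from typing import Dict, List


def to_phylip_discrete(matrix: Dict[str, List[int]]) -> str:
    """Format rows of 0/1 characters as PHYLIP discrete-character input."""
    lengths = {len(row) for row in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all specimens need the same number of characters")
    # Header line: number of specimens, then number of characters.
    lines = [f"{len(matrix):5d}{lengths.pop():5d}"]
    for name, row in matrix.items():
        # Names occupy exactly ten columns: truncated or padded with spaces.
        lines.append(f"{name[:10]:<10}" + "".join(str(state) for state in row))
    return "\n".join(lines) + "\n"


# Hypothetical polarized characters (1 = derived, 0 = ancestral).
coded_matrix = {
    "normal01": [0, 0, 0],
    "ovarian01": [0, 1, 0],
    "ovarian02": [1, 1, 1],
}
infile_text = to_phylip_discrete(coded_matrix)
print(infile_text)
```

Because names are cut to ten characters, longer specimen labels should be made unique within their first ten characters before the file is written.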
Proteomic variability encompasses ancestral and derived variations, and only derived m/z intensity values are useful in classifying cancer types and subtypes into a meaningful hierarchy that reflects the phylogeny and ontogeny of their proteomic profiles. While clustering techniques use the presence of common peaks (without resolving their polarity) in order to create distinct patterns and then fit a specimen within a pattern,12,14,16 phylogenetics requires polarity assessment to first sort m/z intensities into derived and ancestral and then uses the distribution pattern of derived values among the specimens to produce their classification (i.e., the cladogram). Using only common intensity peaks without polarity assessment for pattern modelling has not been the most reliable means of classification.12,14 This is because clustering usually involves ancestral values and does not resolve multiple origins of a character (parallelisms), and both result in polyphyletic grouping (groups containing unrelated specimens). Furthermore, phylogenetics can resolve the position of a novel specimen with new variations by placing it in a group that comprises its closest relatives on the basis of the number of apomorphic mutations it shares with them (Table 1).

Phyloproteomics has a potential for cancer predictivity. Predictivity here is defined as the capacity of the classification to predict the characteristics of a specimen by determining the specimen's location within a cladogram. By using an ample number of well-characterized cancer specimens in an analysis, the unknown characters of a new specimen will be forecasted when it assembles within a clade in the cladogram.
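The idea of locating a new specimen by the apomorphies it shares with existing groups can be illustrated with a simple count. The sketch below is only a simplified stand-in, since in an actual analysis the specimen would be added to the data matrix and the parsimony analysis rerun; the clade names, character indices, and the function `closest_clade` are hypothetical.

```python
from typing import Dict, Set


def closest_clade(specimen: Set[int], clades: Dict[str, Set[int]]) -> str:
    """Return the clade whose synapomorphies overlap most with `specimen`.

    `specimen` holds the indices of the characters scored as derived in the
    new specimen; each clade maps to the indices of its synapomorphies.
    """
    return max(clades, key=lambda name: len(specimen & clades[name]))


# Hypothetical synapomorphy sets for three groups on a cladogram.
clade_synapomorphies = {
    "terminal": {2, 5, 7, 9},
    "middle": {2, 5},
    "transitional": {2},
}
new_specimen = {2, 5, 7}
placement = closest_clade(new_specimen, clade_synapomorphies)
print(placement)  # terminal
```

When two clades share the same number of derived characters with the specimen, `max` returns whichever comes first in the dictionary, so ties need a separate rule or a full reanalysis.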
The specimen's location in a cladogram is always based on the type of mutations it carries and shares with the clade members, which will determine the diagnosis, cancer type, or possibly the susceptibility to developing cancer.

Cladogram topology shows a hierarchical accumulation of novel serum protein changes across a continuum spanning from the transitional noncancerous specimens to the cancerous ones, with the latter having the highest number of apomorphic mutations. Cladograms also revealed that the three types of cancer have fundamentally similar topologies; they all have one major dichotomy that indicates two lineages within each type (represented on the cladograms by the terminal clade and the middle clade [Figures 2–3]). If this typification holds true for additional cancer types, then it is possible that ontogenetically all types of cancers undergo two major common pathways in their development. There are only a few recent reports that support a dichotomous pattern of development [8] in colorectal cancer [9], glioblastomas [10], and pancreatic carcinoma [11]. Dichotomies may arise in cancer because of the selective advantages of cells harboring various mutations; the surviving mutations can be genetic or chromosomal [8,9], point mutation or amplification [10], or differential expression of alleles [11].

Noncancerous transitional clades, present in all cladograms and mostly composed of individual specimens, are the closest sister groups to cancer clades. Because of their proximity to cancer clades, we hypothesize that these specimens, assumed to be from cancer-free individuals, represent the early stages of cancer development that cannot yet be morphologically or microscopically diagnosed as cancerous. For diagnostic purposes, cancerous and noncancerous transitional specimens will always be challenging to classify by other techniques. Occasionally, these specimens are distinct from one another by only very few apomorphies.
The mostly single-specimen composition of the transitional clades attests to their uniqueness. Current diagnosis of cancer is not based on the number of mutations or synapomorphies; therefore, the determination of the status of a transitional specimen is still subjective unless a clear definition that is based on derived mutations is established by pathologists. Until then, we suggest that the position of a transitional specimen within the transitional zone determines its diagnosis; if a specimen is on the upper end of the transitional zone (i.e., bordering cancer clades), then it is a cancerous specimen, and those occurring in the middle and lower end of the transitional zone are to be called high-risk specimens. So far, we have not carried out any correlations between specimens on the cladograms and patients' survival. Therefore, it is uncertain at this stage of the analysis whether the terminal clades of cancers represent the advanced stages of cancer progression or whether the two major clades have any predictive value for prognosis.

Searching for biomarkers is a challenging process in biomedical research, and phyloproteomics offers the capacity to uncover many possible ones. The phylogenetic program, MIX, lists the shared derived m/z intensity values (synapomorphies) of each clade it produces, and each synapomorphy is a possible biomarker. In other words, the cladogram serves as a map showing the apomorphic m/z values of all potential biomarkers and their effective levels of groupings. A synapomorphy may represent a novel protein, a disappeared protein, or an up/down regulated protein; thus, the proteins corresponding to the apomorphic m/z values need to be identified if they are to be explored as biomarkers. Since the cladograms have a hierarchical arrangement (i.e., presenting various levels of groupings), one can look for biomarkers at various levels of the cladogram.
An apomorphic protein (we would like to call it apotein) that defines a clade will serve as a potential biomarker for the clade, while another apotein defining a nested subclade within the clade will be restricted as a biomarker to the subgroup within the clade.

Conclusion

Phyloproteomics offers a new paradigm in cancer analysis that reveals relatedness and diversity of cancer specimens in a phylogenetic sense; its predictive power is a useful tool for diagnosis, characterizing cancer types, discovering biomarkers, and identifying universal characteristics that transcend several types of cancer. The new paradigm has clinical, academic, and scientific value.

References

1. Hede K. $104 million proteomics initiative gets green light. J. Natl. Cancer Inst 2005;97(18):1324–1325. [PubMed: 16174850]
2. Issaq HJ, Conrads TP, Prieto DA, Tirumalai R, Veenstra TD. SELDI-TOF MS for diagnostic proteomics. Anal. Chem 2003;75(7):148A–155A.
3. Marvin LF, Roberts MA, Fay LB. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry in clinical chemistry. Clin. Chim. Acta 2003;337(1-2):11–21. [PubMed: 14568176]
4. Merchant M, Weinberger SR. Recent advancements in surface-enhanced laser desorption/ionization–time-of-flight-mass spectrometry. Electrophoresis 2000;21(6):1164–1177. [PubMed: 10786889]
5. Pusch W, Flocco MT, Leung SM, Thiele H, Kostrzewa M. Mass spectrometry-based clinical proteomics. Pharmacogenomics 2003;4(4):463–476. [PubMed: 12831324]
6. Srinivas PR, Srivastava S, Hanash S, Wright GL Jr. Proteomics in early detection of cancer. Clin. Chem 2001;47(10):1901–1911. [PubMed: 11568117]
7. Wyllie AH, Bellamy CO, Bubb VJ, Clarke AR, Corbet S, Curtis L, Harrison DJ, Hooper ML, Toft N, Webb S, Bird CC. Apoptosis and carcinogenesis. Br. J. Cancer 1999;80(Suppl 1):34–37. [PubMed: 10466759]
8. Loeb KR, Loeb LA. Significance of multiple mutations in cancer. Carcinogenesis 2000;21(3):379–385. [PubMed: 10688858]
9. Chung DC. The genetic basis of colorectal cancer: insights into critical pathways of tumorigenesis. Gastroenterology 2000;119(3):854–865. [PubMed: 10982779]
10. Hayashi Y, Yamashita J, Watanabe T. Molecular genetic analysis of deep-seated glioblastomas. Cancer Genet Cytogenet 2004;153(1):64–68. [PubMed: 15325097]
11. Adsay NV, Merati K, Andea A, Sarkar F, Hruban RH, Wilentz RE, Goggins M, Iocobuzio-Donahue C, Longnecker DS, Klimstra DS. The dichotomy in the preinvasive neoplasia to invasive carcinoma sequence in the pancreas: differential expression of MUC1 and MUC2 supports the existence of two separate pathways of carcinogenesis. Mod. Pathol 2002;15(10):1087–1095. [PubMed: 12379756]
12. Petricoin EE, Paweletz CP, Liotta LA. Clinical applications of proteomics: proteomic pattern diagnostics. J. Mammary Gland Biol. Neoplasia 2002;7(4):433–440. [PubMed: 12882527]
13. Alexe G, Alexe S, Liotta LA, Petricoin E, Reiss M, Hammer PL. Ovarian cancer detection by logical analysis of proteomic data. Proteomics 2004;4(3):766–783. [PubMed: 14997498]
14. Conrads TP, Fusaro VA, Ross S, Johann D, Rajapakse V, Hitt BA, Steinberg SM, Kohn EC, Fishman DA, Whitely G, Barrett JC, Liotta LA, Petricoin EF III, Veenstra TD. High-resolution serum proteomic features for ovarian cancer detection. Endocr.-Relat. Cancer 2004;11(2):163–178. [PubMed: 15163296]
15. Zhu W, Wang X, Ma Y, Rao M, Glimm J, Kovach JS. Detection of cancer-specific markers amid massive mass spectral data. Proc. Natl. Acad. Sci. U.S.A 2003;100(25):14666–14671. [PubMed: 14657331]
16. Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, Semmes OJ, Schellhammer PF, Yasui Y, Feng Z, Wright GL Jr. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 2002;62(13):3609–3614. [PubMed: 12097261]
17. Petricoin EF, Ornstein DK, Paweletz CP, Ardekani A, Hackett PS, Hitt BA, Velassco A, Trucco C, Wiegand L, Wood K, Simone CB, Levine PJ, Linehan WM, Emmert-Buck MR, Steinberg SM, Kohn EC, Liotta LA. Serum proteomic patterns for detection of prostate cancer. J. Natl. Cancer Inst 2002;94(20):1576–1578. [PubMed: 12381711]
18. Petricoin EF III, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone C, Fishman DA, Kohn EC, Liotta LA. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359(9306):572–577. [PubMed: 11867112]
19. Felsenstein, J. PHYLIP: Phylogeny Inference Package, version 3.2.; Cladistics. 1989. p. 164-166.
20. Page RD. TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci 1996;12(4):357–358. [PubMed: 8902363]
21. Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 2004;20(5):777–785. [PubMed: 14751995]
22. Sorace JM, Zhan M. A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 2003;4(1):24. [PubMed: 12795817]
23. Check E. Proteomics and cancer: running before we can walk? Nature 2004;429(6991):496–497. [PubMed: 15175721]
24. Ornstein DK, Rayford W, Fusaro VA, Conrads TP, Ross SJ, Hitt BA, Wiggins WW, Veenstra TD, Liotta LA, Petricoin EF III. Serum proteomic profiling can discriminate prostate cancer from benign prostates in men with total prostate specific antigen levels between 2.5 and 15.0 ng/mL. J. Urol 2004;172(4 Pt 1):1302–1305. [PubMed: 15371828]

Figure 1.
Schematic representation of phyloproteomic analysis. The process involves two steps. The first is the algorithmic sorting of the m/z values into derived (exists in some but not all specimens) and ancestral (in all specimens); the derived values are those signifying either novel, vanished, or up and down regulated peaks. The second step is a parsimony phylogenetic analysis that groups the specimens on the basis of the shared derived values.

Figure 2. Phyloproteomic cladograms of three cancers: (A) pancreatic, (B) prostate, and (C) ovarian. The nodes of major clades are marked as follows: •, terminal cancer clade; ○, middle cancer clade; □, middle healthy clade; and ■, basal healthy clade. Transitional zones (TZ) are marked by bracketed arrows.

Figure 3. A phyloproteomic analysis showing dichotomous distribution of cancers into two clades. A schematic cladogram of a comprehensive phyloproteomic analysis composed of 460 specimens representing ovarian, pancreatic, and prostate cancers as well as noncancerous specimens. Specimens of every cancer type are classified into two clades: a terminal and middle, as well as transitional clades. Healthy specimens are classified into a major healthy clade and transitional clades.
Table 1. The Advantages of Phylogenetic Analysis over Statistical Cluster Analysis

phylogenetic analysis | cluster analysis
produces a classification based on shared derived similarities and reflects phyletic relationships | produces a classification based on overall similarity and may not reflect phyletic relationship
uses one algorithm for the analysis of all types of cancers | may require a specific algorithm for each cancer type
discriminates between ancestral and derived states; uses only derived character states (apomorphies) | does not discriminate between ancestral and derived character states; uses both
resolves issues of parallelism (multiple independent origins) by parsimony or maximum likelihood | does not resolve issues of parallelism
offers predictivity | does not offer predictivity

J Proteome Res. Author manuscript; available in PMC 2008 March 20.

Department of Veterans Affairs Medical Center
50 Irving Street NW
Washington, DC 20422

May 13, 2008

John VanMeter, Ph.D.
Acting Director, Center for Functional and Molecular Imaging
Georgetown University Medical Center
3900 Reservoir Road NW, Suite LM-14
Washington, DC 20057-1488

SUBJECT: Defense Center of Excellence for Psychological Health (PH) and Traumatic Brain Injury (TBI) Military Psychological Health Research – Complementary and Alternative Strategies funding opportunity W81XWH-08-PH-TBI

Dear Dr. VanMeter:

I am writing to express my strong interest and willingness to participate in your proposed study, entitled “Distinguishing Responders from Non-responders in a Mind-Body Treatment for PTSD.” I believe that your plan to utilize phylogenetic methodology to distinguish treatment responders from non-responders based on their neuroendocrine and neuroimaging biomarkers is quite novel and deserves exploration.
As a neuroendocrinologist with extensive experience in the design and conduct of both allopathic and CAM-related clinical and laboratory research, I find the use of mind-body medicine as a potential treatment modality for patients with PTSD to be a promising area of research. Moreover, your project fits thematically with my consultation with Dr. Dutton and her colleagues on a newly received Concept Award from DOD to develop a mind-body intervention for use with primary care physicians in treating their veteran patients with PTSD. The study proposed herein is likely to provide further novel information regarding neural mechanisms that may help identify patients with PTSD who are likely to benefit from this type of treatment strategy.

I shall participate in your proposed study as a consultant focusing on research design and data analysis, neuroendocrinology, and CAM. I shall also help to recruit subjects within the Washington DC VA Medical Center. I enthusiastically support your proposed study and look forward to participating in it with you and Drs. Mary Ann Dutton and Hakima Amri. This study will expand our ongoing collaboration regarding war-related PTSD.

Sincerely,

Marc R. Blackman, M.D.
Associate Chief of Staff for Research & Development
Washington DC VAMC
Research Service (151)
50 Irving Street
Washington, DC 20422

Intellectual and Material Property Plan

Rights to scientific discoveries, new techniques, or algorithms resulting from the combined efforts of the PIs during the course of this study will be equally shared by all three and Georgetown University as dictated by policies of the Georgetown Office of Technology Transfer. Rights to the patent potential and/or commercial potential of the phylomics® algorithm were solely granted to Dr. Hakima Amri and her co-inventors as defined in her existing patent application.
Statement of Work

The proposed study will use a novel classification technique called ‘phylomics’ (patent pending) to distinguish PTSD treatment responders from non-responders based on their neurophysiological signature. Subjects will be randomized to one of two interventions: a CAM-based imagery modality called Guided Imagery (Naparstek 2004) or standard exposure therapy. This study will not only determine the efficacy of the CAM treatment for PTSD compared to an accepted therapy; it will also provide scientific evidence of the changes induced by each treatment in the neuroendocrine and neurobiological profile of the subjects. The treatment outcomes will be compared with the predictions generated by the phylomics algorithm. Thus, the proposed study has multiple endpoints, each of which independently will move the study and treatment of PTSD forward and which together represent a major advance in this field.

Methods

The experimental design centers on comparing the CAM-based Guided Imagery intervention to the more standard exposure therapy. Subjects entering the study will be randomized into one of the two arms. Baseline assessments will include administration of a number of clinical assessment instruments, collection of saliva and blood specimens, and fMRI. Follow-up assessment will be performed upon completion of the treatment. The saliva collection, blood draw, and fMRI scanning will be performed within two weeks of the last treatment session. PTSD symptom severity will be assessed using the CAPS no more than four weeks after the last treatment.

Outcomes of the Study

The major outcome of this study will be fourfold. First, we will determine the efficacy of Guided Imagery, a CAM-based treatment, relative to exposure therapy, which is part of the Veterans Administration clinical practice guidelines (Clinical Practice Guideline Workgroup 2004).
Second, we will be able to identify a set of biomarkers based on functional MRI, neuroendocrine, and genomic data that represents a signature of PTSD. Third, we will use the baseline measurements as input to a novel classification algorithm developed by Dr. Amri called Phylomics (patent pending). The phylomics algorithm separates groups of subjects based on the most parsimonious hierarchical separation on the basis of shared derived state(s). Using all of these biomarkers, we expect this algorithm will be able to distinguish subjects who will be treatment responders from non-responders, with the ultimate goal of identifying targeted treatments optimized to the individual PTSD sufferer.

Human Subjects Protections

The proposed project will involve studying humans at Georgetown University Medical Center. Institutional review will be obtained from Georgetown University. Additional IRB review will be required by the Washington, DC Veterans Administration and the DOD, as the proposed studies will include subjects recruited from the VA and DOD funding will support this study. We anticipate and have planned for nine months to complete review by all three IRBs.

Location of Researchers

Drs. VanMeter, Dutton, Amri, and Amdur: Georgetown Univ., Preclinical Sci Bldg, Suite LM-14, 3900 Reservoir Road NW, Washington, DC 20057-1488

Leadership Plan

This proposal is a collaboration between Drs. VanMeter (Neurology), Amri (Physiology and Biophysics), and Dutton (Psychiatry), all three of whom are PIs. All three will provide oversight of the entire study and development and implementation of all policies, procedures, and processes. In these roles, all three will be responsible for the implementation of the scientific agenda and the specific aims, and will ensure systems are in place to guarantee institutional compliance for the protection of human subjects, data analysis, and facilities. Specifically, Dr.
VanMeter will oversee Aim 3 (fMRI) and be responsible for all human subjects research approvals. Dr. Dutton is primarily responsible for Aim 1 (interventions). Dr. Amri will have primary responsibility for Aim 2 (neuroendocrine and proteomics/genomics) and Aim 4 (Phylomics). Dr. VanMeter will serve as contact PI and will assume fiscal and administrative management responsibility, including maintaining communication among PIs and key personnel through monthly meetings. He will be responsible for communication with the sponsor (Defense Center of Excellence) and submission of annual reports. Publication authorship will be based on the relative scientific contributions of the PIs and key personnel.

Project Phase | Task | Individual Responsible | Institution Name | Time | Outcome
Phase 1A | Submit IRB protocol and consent form to Georgetown’s IRB | Dr. VanMeter | Georgetown Univ | 2 mos. | IRB protocol and consent forms
Phase 1A | Incorporate any changes requested and resubmit | Dr. VanMeter | Georgetown Univ | 1 mo. | First level IRB approval
Phase 1A | Submit modified IRB protocol and consent form to DOD and VA IRBs | Dr. VanMeter | Georgetown Univ | 5 mos. | IRB approval for proposed study
Phase 1A | Incorporate any changes requested by DOD and VA and resubmit to all IRBs | Dr. VanMeter | Georgetown Univ | 1 mo. | Final IRB approval study
Phase 1B | Extension of the phylomics algorithm to work with fMRI and neuroendocrine data | Dr. Amri | Georgetown Univ | 9 mos. | Improved phylomics algorithm
Phase 1B | Development of manuals for the interventions. Recruit and train interventionists | Dr. Dutton | Georgetown Univ | 9 mos. | Manuals and Interventionists
Phase 1B | Development and testing of fMRI stimulation paradigms | Dr. VanMeter | Georgetown Univ | 9 mos. | fMRI paradigms
Phase 2 | Recruit and screen subjects for the first wave of intervention groups | All | Georgetown Univ | 3 mos. | 26 subjects enrolled in the study
Phase 2 | Collect and assess neuroendocrine and genomic specimens | Dr. Amri | Georgetown Univ | 3 mos. | Bio-samples on Wave 1
Phase 2 | Perform baseline fMRI scanning on Wave 1 | Dr. VanMeter | Georgetown Univ | 3 mos. | fMRI on Wave 1
Phase 3A | Wave 1 Interventions | Interventionists | Georgetown Univ | 4 mos. | Wave 1 Completed
Phase 3A | Follow-up of Wave 1 | Drs. VanMeter and Amri | Georgetown Univ | 1 mo. | Wave 1 Followup
Phase 3A | Review of Wave 1 results and identification of any problems | All | Georgetown Univ | 1 mo. | Wave 1 Review
Phase 3B | Waves 2-4 recruited, tested, and run through intervention | All | Georgetown Univ | 18 mos. | Waves 2-4 Completed
Phase 4 | Data analysis, paper writing, final report generation | All | Georgetown Univ | 12 mos. | Peer-reviewed journal papers and final report to DOD

Impact Statement

Previous studies of CAM (Complementary and Alternative Medicine) modalities to treat PTSD have demonstrated positive outcomes, including studies of victims of war-related trauma in Kosovo (Gordon, Staples et al. 2004). These studies were limited by the lack of comparison to an accepted treatment modality such as exposure therapy. Furthermore, the neurological and physiological mechanisms that underlie the treatment effects have not been identified. Some of the gaps that need to be addressed in future studies of PTSD identified by the IOM (Institute of Medicine) include testing treatment efficacy using randomized control trials, investigator independence, and investigating the factors related to outcome: loss of PTSD diagnosis and symptom improvement (Institute of Medicine: Committee on Treatment of Posttraumatic Stress Disorder 2007). Our study is designed to tackle each of these issues. We propose to compare a positive mental imagery technique called Guided Imagery (Naparstek 2004) to Prolonged Exposure, which uses mental imagery to revisit the traumatic event. We expect this CAM-based treatment to reduce PTSD symptoms with an effect size that is equivalent to exposure therapy.
This part of the study alone, if successful, will provide validation of a relatively new treatment for PTSD that is more readily implemented. A negative result for this part of the study would also be an important outcome regarding this treatment. In addition, we will collect a number of measures on each subject at baseline and immediately after the conclusion of the interventions. These will include stress hormones such as cortisol and DHEA/DHEA-s as well as genomic and proteomic data from peripheral blood samples. Further, we will investigate the neurobiology of PTSD and remission using functional MRI. Together these measures will provide a biomarker profile of PTSD from which we will be able to further our understanding of the neuronal, physiological, and genetic basis of PTSD. By examining these measures both at baseline and follow-up, we will be able to identify the factors that lead to remission of PTSD symptoms. Based on these factors, new treatments can be developed that target relevant neural and physiological mechanisms. Proteomic and genomic markers could be used to identify individual soldiers who would benefit from specialized pre-deployment inoculation strategies centered on stress management. Finally, this study will use a novel classification technique called phylomics, developed by one of the PIs, that is based on the techniques used in genomics to separate classes of species. This algorithm, which has a patent pending, separates groups of subjects using the most parsimonious hierarchical separation on the basis of shared derived ‘state(s)’. This algorithm has been successfully used to separate out different cancer specimens from healthy tissues. Using the biomarkers collected in this study as input, we expect this algorithm will not only identify subjects who will respond to treatment, but ultimately identify targeted treatments optimized to the individual PTSD sufferer.
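To make the phrase "most parsimonious hierarchical separation" concrete, the sketch below counts the minimum number of state changes a candidate grouping requires, using Fitch's small-parsimony count on 0/1 characters. This is a standard textbook method swapped in for illustration; it is not the phylomics algorithm, and the subject labels, tree shapes, and function names are hypothetical. The grouping with the smaller count is the more parsimonious of the two.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (possible root states, minimum changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Sum the minimum changes over all characters in the 0/1 matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {name: codes[j] for name, codes in matrix.items()})[1]
        for j in range(n_chars)
    )


if __name__ == "__main__":
    # Hypothetical derived-state codes for four subjects and three markers.
    matrix = {"s1": [1, 1, 0], "s2": [1, 1, 0], "s3": [0, 0, 1], "s4": [0, 0, 1]}
    grouping_a = (("s1", "s2"), ("s3", "s4"))
    grouping_b = (("s1", "s3"), ("s2", "s4"))
    print(tree_length(grouping_a, matrix), tree_length(grouping_b, matrix))
```

A full parsimony search would score many candidate trees this way and keep the shortest; only the scoring step is shown here.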
Thus, this study will produce three main endpoints: 1) a test of the efficacy of a CAM-based imagery treatment (Guided Imagery) against an established treatment (Prolonged Exposure); 2) further elucidation of the neurological/physiological mechanisms underlying PTSD and subsequent changes related to treatment; and 3) a test of the ability of phylomics, a novel classification algorithm, to distinguish PTSD treatment responders from non-responders. Each of these results on its own has the potential to make a significant impact on PTSD treatment and further our understanding of this debilitating disorder. Combined, they make this study a unique opportunity to fundamentally change our understanding of PTSD and how to optimize treatments of individual patients.

Innovation Statement

The proposed study includes two major innovative components. First, we will use a novel classification technique called phylomics, developed by one of the PIs, to distinguish treatment responders from non-responders. Using the phylomics algorithm, we will be able to classify subjects a priori using their baseline measures. Furthermore, this algorithm will identify sub-classes within the responder and non-responder groups (Figure 1). Second, we will collect both baseline and post-treatment data on each subject to assess their proteomic and genomic profile, neuropsychological assessments, and their underlying neurobiological and physiological state to provide a complete picture of the homeostatic state of the individual. Each of these measures has been used in isolation to provide a partial representation of the factors that contribute to PTSD. We will be combining all of these data to build a holistic framework for PTSD. Both of these components will leverage the results of the randomized controlled trial assessment of Guided Imagery (Naparstek 2004), a CAM-based treatment modality, in comparison to exposure therapy, a standard treatment.
Figure 1: Hypothesized output of phylomics to PTSD derived from the neuronal, physiological, and proteomic/genomic signature of individual subjects.

The phylomics algorithm, which has a patent pending, separates groups of subjects using the most parsimonious hierarchical separation on the basis of shared derived state(s). This algorithm has been successfully used to separate out different cancer specimens from healthy tissues. Using the biomarkers collected in this study as input, we expect this algorithm will be able to distinguish subjects who will be treatment responders from non-responders on an a priori basis, with the added ability to determine the combination of biomarkers needed to make that distinction. Beyond that high-level classification of subjects, this algorithm will generate sub-classes of subjects that are likely to have meaningful distinctions, such as PTSD with and without depression. Ultimately, we expect that application of this algorithm in the context of this study will lead to the ability to identify targeted treatments optimized to the individual PTSD sufferer. The measures collected on each subject at baseline and immediately after the conclusion of the interventions will include stress hormones such as cortisol and DHEA/DHEA-s as well as genomic and proteomic data from peripheral blood samples. Further, we will investigate the neurobiology of PTSD and remission using functional MRI. Together these measures will provide a biomarker profile of PTSD from which we will be able to further our understanding of the neuronal, physiological, and genetic basis of PTSD. By examining these measures both at baseline and follow-up, we will be able to identify the factors that lead to remission of PTSD symptoms. Based on these factors, new treatments can be developed that target those neural and physiological mechanisms.
Proteomic and genomic markers could be used to identify individual soldiers who would benefit from specialized pre-deployment inoculation strategies centered on stress management. Finally, this study will produce three main endpoints: 1) demonstrate the ability of phylomics, a novel classification algorithm, to distinguish PTSD treatment responders from non-responders; 2) combine the neuronal, physiological, and proteomic/genomic measures collected to derive a complete picture of the mechanisms underlying PTSD and subsequent changes related to treatment; and 3) test the efficacy of a CAM-based imagery treatment (Guided Imagery) against an established treatment (Prolonged Exposure). Each of these endpoints on its own represents a pioneering advance in our understanding of PTSD and its treatment. The combination of all three endpoints provides a unique opportunity to fundamentally change our understanding of PTSD and how to optimize treatments of individual patients.