Ontology for Biomedical Investigations (OBI)
Applying OBO Foundry ontologies to model, annotate and query longitudinal field studies on malaria

Jie Zheng1, San Emmanuel James2, Emmanuel Arinaitwe2, Bryan Greenhouse3, Edwin Charlebois3, Grant Dorsey3, Ja’Shon Cade1, Brian P. Brunk1, Omar S. Harb1, David S. Roos1, Christian J. Stoeckert1
1University of Pennsylvania, Philadelphia PA USA
2Infectious Disease Research Collaboration, Kampala Uganda
3University of California, San Francisco CA USA
4/26/2015 Biocuration 2015, Beijing, China

PRISM
• Program for Resistance, Immunology, Surveillance and Modeling of Malaria (PRISM) http://muucsf.org/projects/prism.html
• One of ten NIH-supported International Centers of Excellence for Malaria Research (ICEMR)
• Aim:
– To elucidate interactions between malaria parasites, their mosquito vectors, and human hosts using comprehensive surveillance data

PRISM-PlasmoDB Metadata Project
• Integrate PRISM cohort studies into the Plasmodium Genomics Resource (http://PlasmoDB.org) and make data accessible to PRISM project members and ultimately the broader international research community
• PlasmoDB: a component of the Eukaryotic Pathogen Database Resources (EuPathDB)
• EuPathDB: a NIAID Bioinformatics Resource Center covering eukaryotic parasites

PRISM Longitudinal Studies on Malaria
• Longitudinal cohort study following participants from over 300 households in three regions of Uganda with diverse demographics and transmission intensity:
– Jinja (low incidence of malaria)
– Kanungu (moderate incidence of malaria)
– Tororo (high incidence of malaria)
(Over 1000 participants in the study)
• Quarterly routine visits, plus additional sick visits
• Monthly mosquito collection in each dwelling

PRISM Datasets Contain Extensive Metadata
Dwelling data (over 80 fields) [location, construction, dwelling facilities, etc.]
Household member data (about 20 fields) [age, sex, genotype, etc.]
Clinical visits data (about 170 fields) [lab findings, clinical history, diagnoses, etc.]
Mosquito trapping data (about 10 fields) [mosquito abundances]
• Extensive metadata: over 280 different kinds in total

Questions Of Interest
• Asymptomatic infection? Identify children with high exposure but no clinical malaria symptoms.
– what is the impact of age?
– what is the impact of prior exposure?
– geographic correlates?
• Hyper-susceptibles? Children with low exposure but multiple bouts of malaria.
– human genotypes?
– parasite genotypes?
• Families with both non-malaria and malaria children? Are there clinical / behavioral correlations?
Provided by Dr. David Roos, EuPathDB Principal Investigator

Challenges
• Extensive metadata: hard to understand what they represent and how they are related to each other.
• How to represent metadata consistently?
• How to present the metadata for effective data mining?
Our solution is to use OBO Foundry ontologies.

OBO Foundry Ontologies
• Shared common upper level ontology, Basic Formal Ontology (BFO), and common relations
• Orthogonal interoperable ontologies
– reuse existing terms defined in OBO Foundry ontologies
• Over 100 reviewed and candidate ontologies available to cover various biological and clinical domains:
– Gene Ontology (GO): biological process, molecular function, cellular component
– Human Disease Ontology (DOID): disease (human)
– The Drug Ontology (DRON): drug product
– Ontology for Biomedical Investigations (OBI): all aspects of an investigation

Ontology for Biomedical Investigations
• OBI is about capturing all aspects of a biological and clinical investigation (investigation, assay, specimen, protocol, device, data, data analysis, etc.)
which provides a semantic framework to model an investigation
• Things to know
– a member of the OBO Foundry
– interoperable with other ontologies following OBO Foundry principles, such as the Gene Ontology (GO)
– uses the Basic Formal Ontology (BFO) as its top level ontology
– uses the Information Artifact Ontology (IAO) for general information entities
• Details on OBI can be found at:
– http://obi-ontology.org
– J Biomed Semantics. 2010. Modeling biomedical experimental processes with OBI. Ryan R Brinkman, Mélanie Courtot, Dirk Derom, Jennifer M Fostel, Yongqun He, Phillip Lord, James Malone, Helen Parkinson, Bjoern Peters, Philippe Rocca-Serra, Alan Ruttenberg, Susanna-Assunta Sansone, Larisa N Soldatova, Christian J Stoeckert, Jr., Jessica A Turner, Jie Zheng, and the OBI consortium

Longitudinal Field Studies On Malaria
[Diagram: each dataset and the entity it describes]
Household data (over 80 fields): Dwellings
Household member data (about 20 fields): Participants
Clinical visits data (about 170 fields): Clinical Visits
Mosquito trapping data (about 10 fields): Light Trap Assays

Applying OBI to Understand PRISM Data And Their Relations
[Diagram. Entity types: material entity, process, quality or information, information content entity. Entities: household, Dwelling, mosquitos, Participant (person), Light Trap Assay, Clinical Visit, data item. Relations: member of, located in, has specified input, participates in, has specified output, is about.]

OBI helped to understand metadata and relations between them (detailed modeling of dwelling and participants)

Applying OBO Foundry Ontologies To Annotate PRISM Data
• Multiple OBO Foundry ontologies are needed for PRISM data annotation
– Ontology for Biomedical Investigations (OBI): assay and its outputs
– Gene Ontology (GO): biological process
– Protein Ontology (PRO)
– Ontology for General Medical Science (OGMS)
– Phenotypic quality (PATO): quality
– Human Disease Ontology (DOID): disease
– Human Phenotype Ontology (HPO): symptom
– Drug Ontology (DRON): drug product
• Not all terms are available in the existing ontologies

Data Annotation Using EuPath Ontology
• An application ontology built to support standardized representation of data for EuPathDB
• Started with OBI and pulled in terms available in other OBO Foundry ontologies in a semantically consistent manner
– Only terms needed for annotation are extracted from OBO Foundry ontologies
• Add PRISM-specific terms to the ontology
– such as CDC light trap assay, modern house, malaria diagnosis, etc.
• Provide community-preferred labels and definitions
– enable user-friendly mining of PRISM data through PlasmoDB

EuPath Ontology Provides Structured User-Friendly Metadata
[Figure: three panels, "Terms in PRISM", "Terms in EuPath Ontology", "Terms on PlasmoDB website"]
Terms in PRISM: housetype, index, rooftype, walltype, floortype, eaves, airbrickcat, NUMPEOP, SWATER, TFACLTY, ELECTIRC, FUELTYPE, SENERGY, HHROOMS, NUMALAND, HHMEALS, HHNUMT, HHPSF, DHFCTY
Terms in EuPath Ontology: ontology term label
Terms on PlasmoDB website: user preferred term label (defined as EuPathDB alternative term in ontology)

Metadata Is Used As A Filter To Select Samples Of Interest
[Screenshot: participants who have between 1 and 57 clinical visits]

Complex Query: Find participants between 4 and 11 years old with either asymptomatic parasitemia or symptomatic malaria and not treated in the preceding 60 days with artemether-lumefantrine
Provided by Brian Brunk

Summary
• OBO Foundry ontologies help in metadata standardization and category organization by:
– providing a semantic framework to understand massive data and reveal inter-connections between them
– supporting consistent data representation
– helping in information retrieval and enabling complex queries
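The complex query shown earlier (participants aged 4 to 11 with asymptomatic parasitemia or symptomatic malaria, not treated with artemether-lumefantrine in the preceding 60 days) can be read as a filter over visit records. The following is a minimal Python sketch only, not the PlasmoDB implementation. The `Visit` and `Participant` classes, their field names, and the sample cohort are all hypothetical, and the sketch assumes that "preceding 60 days" excludes the day of the visit itself.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical record layout; the real PRISM/PlasmoDB schema is not shown in the slides.
@dataclass
class Visit:
    day: date
    age_years: float      # participant age at this visit
    diagnosis: str        # e.g. "asymptomatic parasitemia"
    drugs: tuple = ()     # drugs given at this visit

@dataclass
class Participant:
    pid: str
    visits: list = field(default_factory=list)

DIAGNOSES = {"asymptomatic parasitemia", "symptomatic malaria"}
DRUG = "artemether-lumefantrine"

def treated_before(p, visit, days=60):
    """True if DRUG was given in the `days` days before `visit` (visit day excluded)."""
    start = visit.day - timedelta(days=days)
    return any(DRUG in v.drugs and start <= v.day < visit.day for v in p.visits)

def matches(p):
    """True if any visit meets the age, diagnosis and no-recent-treatment criteria."""
    return any(
        4 <= v.age_years <= 11
        and v.diagnosis in DIAGNOSES
        and not treated_before(p, v)
        for v in p.visits
    )

def select(participants):
    return [p.pid for p in participants if matches(p)]

# Illustrative cohort (invented data).
cohort = [
    Participant("P1", [Visit(date(2012, 3, 1), 6.0, "asymptomatic parasitemia")]),
    Participant("P2", [  # treated 26 days before the qualifying visit: excluded
        Visit(date(2012, 1, 10), 3.9, "symptomatic malaria", (DRUG,)),
        Visit(date(2012, 2, 5), 4.0, "asymptomatic parasitemia"),
    ]),
    Participant("P3", [Visit(date(2012, 3, 1), 12.5, "symptomatic malaria")]),
    Participant("P5", [  # treated 91 days before the qualifying visit: included
        Visit(date(2012, 1, 1), 3.9, "symptomatic malaria", (DRUG,)),
        Visit(date(2012, 4, 1), 4.1, "asymptomatic parasitemia"),
    ]),
]

print(select(cohort))  # ['P1', 'P5']
```

In this sketch the diagnosis strings stand in for ontology term labels; annotating records with shared terms is what lets one filter apply uniformly across datasets.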
Acknowledgements
EuPathDB (PlasmoDB)
– Shon Cade
– Brian Brunk
– Omar Harb
– David Roos
– Christian Stoeckert
PRISM
– San Emmanuel James
– Emmanuel Arinaitwe
– Bryan Greenhouse
– Edwin Charlebois
– Grant Dorsey
OBI Consortium
Disease Ontology Developers
- Lynn Schriml
- Elvira Mitraka
Drug Ontology Developers
- Bill Hogan
- Josh Hanna