Ontology for Biomedical Investigations (OBI)
Applying OBO Foundry ontologies to
model, annotate and query longitudinal
field studies on malaria
Jie Zheng1, San Emmanuel James2, Emmanuel
Arinaitwe2, Bryan Greenhouse3, Edwin Charlebois3,
Grant Dorsey3, Ja’Shon Cade1, Brian P. Brunk1, Omar S.
Harb1, David S. Roos1, Christian J. Stoeckert1
1University of Pennsylvania, Philadelphia PA USA
2Infectious Disease Research Collaboration, Kampala Uganda
3University of California, San Francisco CA USA
4/26/2015
Biocuration 2015, Beijing, China
PRISM
• Program for Resistance, Immunology, Surveillance
and Modeling of Malaria (PRISM)
http://muucsf.org/projects/prism.html
• One of ten NIH-supported International Centers for
Excellence in Malaria Research (ICEMR)
• Aim:
– To elucidate interactions between malaria parasites, their
mosquito vectors, and human hosts using comprehensive
surveillance data
PRISM-PlasmoDB Metadata Project
• Integrate PRISM cohort studies into the Plasmodium
Genomics Resource (http://PlasmoDB.org) and make
data accessible to PRISM project members and
ultimately the broader international research
communities
• PlasmoDB: a component of the Eukaryotic Pathogen
Database Resources (EuPathDB)
• EuPathDB: a NIAID Bioinformatics Resource Center
covering Eukaryotic Parasites
PRISM Longitudinal Studies on Malaria
• Longitudinal cohort study following participants from
over 300 households in three regions of Uganda with
diverse demographics and transmission intensity:
– Jinja (low incidence of malaria)
– Kanungu (moderate incidence of malaria)
– Tororo (high incidence of malaria)
(Over 1000 participants in the study)
• Quarterly routine visits, plus additional sick visits
• Monthly mosquito collection in each dwelling
PRISM Datasets Contain Extensive Metadata
• Extensive metadata:
– Dwelling data (over 80 fields) [location, construction, dwelling facilities, etc.]
– Household member data (about 20 fields) [age, sex, genotype, etc.]
– Clinical visits data (about 170 fields) [lab findings, clinical history, diagnoses, etc.]
– Mosquito trapping data (about 10 fields) [mosquito abundances]
• In total, over 280 different kinds of metadata
Questions Of Interest
• Asymptomatic infection? Identify children with high
exposure but no clinical malaria symptoms.
– what is the impact of age?
– what is the impact of prior exposure?
– geographic correlates?
• Hyper-susceptibles? Children with low exposure but
multiple bouts of malaria.
– human genotypes?
– parasite genotypes?
• Families with both non-malaria and malaria children?
Are there clinical / behavioral correlations?
Provided by Dr. David Roos, EuPathDB Principal Investigator
Challenges
• Extensive metadata: hard to understand what they
represent and how they are related to each other.
• How to represent metadata consistently?
• How to present the metadata for effective data
mining?
Our solution is to use OBO Foundry ontologies.
OBO Foundry Ontologies
• Shared common upper level ontology, Basic Formal Ontology
(BFO) and common relations
• Orthogonal interoperable ontologies – reuse existing terms
defined in OBO Foundry ontologies
• Over 100 reviewed and candidate ontologies available to
cover various biological and clinical domains:
– Gene Ontology (GO): biological process, molecular function, cellular
component
– Human Disease Ontology (DOID): disease (human)
– The Drug Ontology (DRON): drug product
– Ontology for Biomedical Investigations (OBI): all aspects of an
investigation
Ontology for Biomedical Investigations
• OBI captures all aspects of a biological and clinical investigation
(investigation, assay, specimen, protocol, device, data, data analysis, etc.),
providing a semantic framework for modeling an investigation
• Things to know
– a member of the OBO Foundry
– interoperable with other ontologies following OBO Foundry principles, such as the Gene
Ontology (GO)
– uses the Basic Formal Ontology (BFO) as its top level ontology
– uses the Information Artifact Ontology (IAO) for general information entities
• Details on OBI can be found at:
– http://obi-ontology.org
– J Biomed Semantics. 2010. Modeling biomedical experimental processes with OBI, Ryan
R Brinkman, Mélanie Courtot, Dirk Derom, Jennifer M Fostel, Yongqun He, Phillip Lord,
James Malone, Helen Parkinson, Bjoern Peters, Philippe Rocca-Serra, Alan Ruttenberg,
Susanna-Assunta Sansone, Larisa N Soldatova, Christian J Stoeckert, Jr., Jessica A Turner,
Jie Zheng, and the OBI consortium
Longitudinal Field Studies On Malaria
• Dwellings: household data (over 80 fields)
• Participants: household member data (about 20 fields)
• Clinical Visits: clinical visits data (about 170 fields)
• Light Trap Assays: mosquito trapping data (about 10 fields)
Applying OBI to Understand
PRISM Data And Their Relations
[Figure: OBI-based model of PRISM data and their relations]
• Legend: material entity; process; quality or information (information content entity)
• Entities: household, dwelling, participant (person), mosquitos, light trap assay, clinical visit, data item
• Relations: member of, located in, has specified input, participates in, has specified output, is about
OBI helped to understand metadata and relations between them
(detailed modeling of dwelling and participants)
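The relations named in the model can be written down as subject-relation-object triples. The following is a minimal sketch in Python. The pairing of entities with relations is an assumed reading of the diagram, not the authors' published model, and `Triple`, `TRIPLES` and `related` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Assumed edges: the slide names these entities and relations,
# but which entity connects to which is a guess made for this sketch.
TRIPLES = [
    Triple("participant", "member of", "household"),
    Triple("household", "located in", "dwelling"),
    Triple("mosquitos", "located in", "dwelling"),
    Triple("participant", "participates in", "clinical visit"),
    Triple("light trap assay", "has specified input", "mosquitos"),
    Triple("light trap assay", "has specified output", "data item"),
    Triple("clinical visit", "has specified output", "data item"),
    Triple("data item", "is about", "participant"),
]


def related(entity: str, relation: str) -> List[str]:
    """Return every object linked to `entity` by `relation`."""
    return [t.obj for t in TRIPLES if t.subject == entity and t.relation == relation]


print(related("light trap assay", "has specified output"))  # ['data item']
```

Keeping the relations explicit in this way is what makes it possible to trace, for example, from a data item back to the participant or dwelling it is about.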
Applying OBO Foundry Ontologies To
Annotate PRISM Data
• Multiple OBO Foundry ontologies are needed for PRISM
data annotation:
– Ontology for Biomedical Investigations (OBI): assay and its outputs
– Gene Ontology (GO): biological process
– Protein Ontology (PRO)
– Ontology for General Medical Science (OGMS)
– Phenotypic quality (PATO): quality
– Human Disease Ontology (DOID): disease
– Human Phenotype Ontology (HPO): symptom
– Drug Ontology (DRON): drug product
• Not all terms are available in the existing ontologies
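Because annotations draw on several ontologies, it helps to know which ontology a given term identifier comes from. The sketch below assumes term identifiers are written as `PREFIX:local_id` and uses the standard OBO prefixes for the ontologies listed above. The function name `source_ontology` and the `EXAMPLE` prefix are hypothetical.

```python
from typing import Optional

# Standard OBO identifier prefixes for the ontologies listed on the slide.
PREFIX_TO_ONTOLOGY = {
    "OBI": "Ontology for Biomedical Investigations",
    "GO": "Gene Ontology",
    "PR": "Protein Ontology",
    "OGMS": "Ontology for General Medical Science",
    "PATO": "Phenotypic quality",
    "DOID": "Human Disease Ontology",
    "HP": "Human Phenotype Ontology",
    "DRON": "Drug Ontology",
}


def source_ontology(term_id: str) -> Optional[str]:
    """Return the ontology a term ID belongs to, or None if the prefix is unknown."""
    prefix, separator, local_id = term_id.partition(":")
    if not separator or not local_id:
        raise ValueError(f"not a PREFIX:local_id identifier: {term_id!r}")
    return PREFIX_TO_ONTOLOGY.get(prefix.upper())


print(source_ontology("GO:0008150"))       # Gene Ontology
print(source_ontology("EXAMPLE:0000001"))  # None
```

A `None` result corresponds to the case where a needed term is not covered by any of the existing ontologies.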
Data Annotation Using EuPath Ontology
• An application ontology built for supporting standardized
representation of data for EuPathDB
• Started with OBI and pulled terms available in other OBO
Foundry ontologies in a semantically consistent manner
– Only terms needed for annotation are extracted from OBO
Foundry ontologies
• Add PRISM-specific terms to the ontology
– e.g., CDC light trap assay, modern house, malaria
diagnosis
• Provide community-preferred labels and definitions
– enable user-friendly mining of PRISM data through
PlasmoDB
EuPath Ontology Provides Structured
User-Friendly Metadata
[Figure: mapping of PRISM variable names to EuPath Ontology terms and to the terms shown on the PlasmoDB website]
• Terms in PRISM: housetype, index, rooftype, walltype, floortype, eaves, airbrickcat, NUMPEOP, SWATER, TFACLTY, ELECTIRC, FUELTYPE, SENERGY, HHROOMS, NUMALAND, HHMEALS, HHNUMT, HHPSF, DHFCTY
• Terms in EuPath Ontology: ontology term label
• Terms on PlasmoDB website: user-preferred term label (defined as EuPathDB alternative term in the ontology)
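The three-way mapping from a PRISM variable name to an ontology term label and then to a user-preferred display label can be sketched as a small lookup table. This is an illustration only. The identifiers and labels below are made-up placeholders, not actual EuPath Ontology entries, and `OntologyTerm`, `VARIABLE_TO_TERM` and `display_name` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OntologyTerm:
    term_id: str        # placeholder identifier, not a real ontology ID
    label: str          # ontology term label
    display_label: str  # user-preferred label (the "EuPathDB alternative term")


# Hypothetical entries: the variable names come from the slide,
# the term IDs and labels are invented for this sketch.
VARIABLE_TO_TERM: Dict[str, OntologyTerm] = {
    "rooftype": OntologyTerm("EXAMPLE:0001", "roof material", "Roof type"),
    "walltype": OntologyTerm("EXAMPLE:0002", "wall material", "Wall type"),
    "NUMPEOP": OntologyTerm("EXAMPLE:0003", "household member count", "People in household"),
}


def display_name(variable: str) -> str:
    """Return the user-facing label, falling back to the raw variable name."""
    term = VARIABLE_TO_TERM.get(variable)
    return term.display_label if term else variable


print(display_name("rooftype"))  # Roof type
print(display_name("HHPSF"))     # HHPSF (no mapping defined in this sketch)
```

Storing the display label alongside the ontology label keeps the formal term stable while letting the website show wording that users recognize.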
Metadata Is Used As A Filter To Select Samples Of Interest
Participants who had clinical visits, with the number of visits
ranging from 1 to 57
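Filtering on a metadata value such as the number of clinical visits amounts to counting records per participant and keeping those within a range. The sketch below uses made-up records. The function name and the record layout are assumptions, not the PlasmoDB implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def participants_by_visit_count(
    visits: Iterable[Tuple[str, str]], minimum: int, maximum: int
) -> List[str]:
    """Select participant IDs whose number of visits lies in [minimum, maximum]."""
    counts = Counter(participant_id for participant_id, _visit_date in visits)
    return sorted(pid for pid, n in counts.items() if minimum <= n <= maximum)


# Made-up records: (participant ID, visit date)
visits = [
    ("P1", "2012-01-10"), ("P1", "2012-04-12"),
    ("P2", "2012-01-11"),
    ("P3", "2012-01-15"), ("P3", "2012-02-02"), ("P3", "2012-04-20"),
]

print(participants_by_visit_count(visits, 2, 3))  # ['P1', 'P3']
```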
Complex Query: Find participants between 4 and 11 years old with
either asymptomatic parasitemia or symptomatic malaria and
not treated in the preceding 60 days with artemether-lumefantrine
Provided by Brian Brunk
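The logic of this query can be sketched over a flat list of visit records. This is a minimal illustration on made-up data, not the PlasmoDB query implementation. The `Visit` fields, the function name, and the reading of "preceding 60 days" as "an earlier visit within 60 days before the qualifying visit" are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

DIAGNOSES_OF_INTEREST = {"asymptomatic parasitemia", "symptomatic malaria"}


@dataclass
class Visit:
    participant_id: str
    visit_date: date
    age_years: float
    diagnosis: str
    treatment: Optional[str] = None


def matching_participants(visits: List[Visit], window_days: int = 60) -> List[str]:
    """Participants aged 4-11 with a diagnosis of interest and no
    artemether-lumefantrine treatment in the window before that visit."""
    window = timedelta(days=window_days)
    matches = set()
    for v in visits:
        if not (4 <= v.age_years <= 11 and v.diagnosis in DIAGNOSES_OF_INTEREST):
            continue
        # Look for an earlier visit of the same participant, inside the window,
        # at which artemether-lumefantrine was given.
        treated_recently = any(
            other.participant_id == v.participant_id
            and other.treatment == "artemether-lumefantrine"
            and timedelta(0) < v.visit_date - other.visit_date <= window
            for other in visits
        )
        if not treated_recently:
            matches.add(v.participant_id)
    return sorted(matches)


# Made-up records for illustration only.
visits = [
    Visit("P1", date(2013, 3, 1), 6.0, "asymptomatic parasitemia"),
    Visit("P2", date(2013, 1, 10), 8.0, "uncomplicated malaria", "artemether-lumefantrine"),
    Visit("P2", date(2013, 2, 20), 8.1, "symptomatic malaria"),  # treated 41 days earlier
    Visit("P3", date(2013, 3, 5), 12.5, "symptomatic malaria"),  # outside the age range
]

print(matching_participants(visits))  # ['P1']
```

Each condition in the query (age, diagnosis, prior treatment) maps to one annotated metadata field, which is why consistent annotation is a prerequisite for queries of this kind.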
Summary
• OBO Foundry Ontologies help in metadata
standardization and category organization by:
– providing a semantic framework to understand massive
data and reveal inter-connections between them
– supporting consistent data representation
– helping in information retrieval and enabling complex
queries
Acknowledgements
EuPathDB (PlasmoDB)
– Shon Cade
– Brian Brunk
– Omar Harb
– David Roos
– Christian Stoeckert
PRISM
– San Emmanuel James
– Emmanuel Arinaitwe
– Bryan Greenhouse
– Edwin Charlebois
– Grant Dorsey
OBI Consortium
Disease Ontology Developers
– Lynn Schriml
– Elvira Mitraka
Drug Ontology Developers
– Bill Hogan
– Josh Hanna