Transforming Data into Action

UC CYBERINFRASTRUCTURE CONFERENCE
A longstanding objective of biomedical informatics has been to transform the wealth of clinical and research data into actionable knowledge and insights. New cyberinfrastructure is needed to achieve this objective, given both the vast amount of data and the new types of information.
What could we do if we had new cyberinfrastructure?
• Precision medicine
  • We could make the processing of clinical genomic tests faster (decreasing time from weeks to hours)
  • We could combine more electronic health record data to drive real-world, continually learning predictive models for individualized screening, diagnosis, therapeutic prediction, and prognosis
• New paradigms for healthcare delivery/research
  • We could create new pervasive, wireless monitoring models for patients outside of the clinical environment
  • We could support interdisciplinary sharing and reuse of large datasets across institutions
• Translational capacity
  • We could seamlessly transfer state-of-the-art algorithms and techniques involving large amounts of computational power from the research environment to the hospital (direct bench-to-bedside)
Supporting Data-driven Projects

Project | Objective | Computational Challenges
Athena Project | Predictive modeling and stratification for breast cancer screening | Machine learning algorithms; complex data, 15 TB of data (EHR, imaging, genomic, outcomes)
Depression | Determining early symptoms and genomic factors | Integrated data mining methods over large, heterogeneous datasets, 15 PB of data in ~4 years (EHR, imaging, genomic, sensors, etc.)
Autism | Uncovering genomic factors | Integrated data mining, 445 TB of data (retrospective)
Clinical genomics | Diagnostic classification | Data size; data processing (e.g., sequence alignment); machine learning methods; timely analysis
Wireless monitoring | Online real-time assessment for monitored environments/patients | High volume, low complexity data; network/streaming; machine learning algorithms
Text/image analysis projects | Biomarker identification and validation | Data size; data processing; machine learning algorithms; complex data
UCLA USE CASES
Both past and current projects are driving the need for high performance computing and infrastructure to support research and clinical applications.
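To make the scale of these use cases concrete, the largest figure quoted for them (15 PB in about 4 years for the depression project) can be turned into an average ingest rate. The following is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes) and a perfectly even accumulation over the period, neither of which is stated on the slides, and the function name is illustrative.

```python
# Back-of-the-envelope ingest rate for a project that accumulates a fixed
# data volume over a fixed period. Decimal units (1 PB = 10**15 bytes) and
# evenly spread accumulation are assumptions, not figures from the slides.
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def sustained_rate(total_bytes: float, years: float) -> dict:
    """Return the average rate needed to accumulate total_bytes in `years`."""
    days = years * DAYS_PER_YEAR
    per_day = total_bytes / days
    per_second = per_day / SECONDS_PER_DAY
    return {"tb_per_day": per_day / 1e12, "mb_per_second": per_second / 1e6}


depression = sustained_rate(15e15, 4)  # 15 PB in ~4 years
print(f"{depression['tb_per_day']:.1f} TB/day, "
      f"{depression['mb_per_second']:.0f} MB/s sustained")
```

Under these assumptions the depression project alone implies roughly 10 TB of new data per day, or about 119 MB/s sustained, before any processing or replication is counted.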
Translational Informatics Platform
COMMON NEEDS ACROSS THE USE CASES
The overarching purpose of our planned system is to support biomedical research in an environment that enables the secure sharing of clinical/research data and the seamless translation of developed computational methods back into the clinical environment.
Secure HPC environment
• Access separation between sensitive/secure data and non-sensitive research data
• Memory, storage, network
• ACL permissions
• Regular audits

Optimized research database
• Different types of queries to find data over integrated views
• Mixed environment needed (e.g., SQL vs. NoSQL systems) for retrieval and discovery

Workflow pipelines
• Data loaders (from underlying sources), batch loading
• Sophisticated user-designed sequences of pre- and post-processing of data (e.g., Kepler)

Production environment
• Clinical applications running off of the database, developed algorithms
• Runtime prioritization
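The access-separation, ACL, and audit requirements listed for the secure HPC environment could be combined along the following lines. This is a minimal sketch under stated assumptions, not the platform's design: the `Dataset` and `AccessController` classes, the rule that non-sensitive data is readable by any authenticated user, and the dataset and user names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitive: bool  # True for sensitive/secure data (hypothetical flag)


@dataclass
class AccessController:
    # Hypothetical ACL: dataset name -> set of user ids allowed to read it.
    acl: dict
    audit_log: list = field(default_factory=list)

    def can_read(self, user: str, dataset: Dataset) -> bool:
        # Assumed policy: non-sensitive research data is open to any
        # authenticated user; sensitive data needs an explicit ACL entry.
        allowed = (not dataset.sensitive) or user in self.acl.get(dataset.name, set())
        # Record every decision so that a regular audit can review it later.
        self.audit_log.append((datetime.now(timezone.utc), user, dataset.name, allowed))
        return allowed


controller = AccessController(acl={"ehr_notes": {"alice"}})
ehr_notes = Dataset("ehr_notes", sensitive=True)
print(controller.can_read("alice", ehr_notes))  # explicit ACL entry
print(controller.can_read("bob", ehr_notes))    # no ACL entry
```

Keeping the decision and the audit record in one method means a denied request is logged just like a granted one, which is what a periodic audit would need to see.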
Translational Informatics Platform
Data sources
• Clinical data sources
• Image archives
• Sequencing core data
• Other
HIGH-LEVEL BLOCK DIAGRAM
[Block diagram; the labeled components are listed below, connections not recoverable from the transcription]
• Uploaded raw data (e.g., BLOBs), data archives
• Pre- and post-processing algorithms (bioinformatics, NLP, image processing)
• Workflow execution
• Online analytical packages (e.g., Matlab, R, etc.)
• Secondary database (Cassandra)
• Machine learning algorithms
• Source-specific data loaders
• Unified, web-based user querying/data retrieval interface
• Primary database (MySQL)
• Secured cloud storage in restricted network address space
• Clinical user
• Clinical data (CareConnect)
• Analytics, predictive model execution
• HPC
• Hospital computing environment/network
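One way to picture how source-specific data loaders might feed the two databases in the diagram is a small loader registry. The sketch below is an assumption-laden illustration only: plain Python containers stand in for the primary (MySQL) and secondary (Cassandra) stores, the split of structured records versus raw BLOBs between them is a guess, and every function and field name (`mrn`, `study_id`, and so on) is hypothetical.

```python
from typing import Callable, Dict, List

# Stand-ins for the two stores in the diagram. Plain Python containers are
# used so the sketch runs anywhere; they are not MySQL or Cassandra clients.
primary_db: List[dict] = []          # structured, queryable records
secondary_db: Dict[str, bytes] = {}  # raw uploads (BLOBs) keyed by record id

LOADERS: Dict[str, Callable[[dict], dict]] = {}


def loader(source: str):
    """Register a source-specific loader under a source name."""
    def register(fn):
        LOADERS[source] = fn
        return fn
    return register


@loader("clinical")
def load_clinical(raw: dict) -> dict:
    # Hypothetical field names for a clinical record.
    return {"id": raw["mrn"], "source": "clinical", "summary": raw["diagnosis"]}


@loader("imaging")
def load_imaging(raw: dict) -> dict:
    # Hypothetical field names for an imaging study.
    return {"id": raw["study_id"], "source": "imaging", "summary": raw["modality"]}


def ingest(source: str, raw: dict, blob: bytes = b"") -> dict:
    """Normalize a raw record with its loader and store it."""
    record = LOADERS[source](raw)
    primary_db.append(record)
    if blob:  # keep the raw upload alongside the structured record
        secondary_db[record["id"]] = blob
    return record


def query(source: str) -> List[dict]:
    """Single query entry point over the integrated records."""
    return [r for r in primary_db if r["source"] == source]


ingest("clinical", {"mrn": "P001", "diagnosis": "example"}, blob=b"raw note")
ingest("imaging", {"study_id": "S001", "modality": "MR"})
print(query("imaging"))
```

Registering loaders by source name keeps each source's parsing logic separate while giving the querying interface a single, uniform record shape to work with.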