Karmann Mills1, Anthony Hickey1, Alexander Tropsha** 1RTI

Transcription

Karmann Mills1, Anthony Hickey1, Alexander Tropsha** 1RTI
Karmann Mills1, Anthony Hickey1, Alexander Tropsha**
1RTI International
**University of North Carolina at Chapel Hill
nanoHUB Users Meeting
September 1, 2015
Comprehensively
curated, validated data
on a scale suitable for
decision making
Web Address:
www.nanomaterialregistry.org
Funded by:
What you can expect to find in the Nanomaterial Registry
DATA DOMAIN
DOMAIN: CHARACTERIZATION + STUDY DATA
Property
Measurements
Reference
Materials
Material and
system
properties.
Benchmark
Data.
Meta Data
Ecological
Impact on the
natural
environment.
Medical
The data about
the data.
Toxicology
potential toxicity
of
nanomaterials.
Drug delivery,
nanomedicine
applications.
Validated both
programmatically
and by a team of
scientists
Integrated
through controlled
vocabulary and
data formatting
Relevant growing
body of data that
is available to the
public
LOOKING ACROSS THE CHARACTERIZATION DATA
MEASUREMENT
PROPERTIES
Reference
Materials
Medical
Applications
Meta Data
Toxicological
Studies
Environmental
Studies
•
•
•
•
•
•
•
Particle Size
•
Size distribution
•
Surface area
•
Shape
•
Composition
•
Aggregation/
Agglomeration state
Purity
Surface chemistry
Surface charge
Surface reactivity
Solubility
Stability
PCC Data
12 Physical and
Chemical
Characteristics
Protocal &
Parameters
Information about
instrumentation
settings
Best Practice
Questions
Meta data about
measurement
technique
A controlled vocabulary of PCC & measurands have been identified
(https://www.nanomaterialregistry.com/resources/Glossary.aspx)
Minimal Information About Nanomaterials
PCC
PCC
Composition
Measurement
Measurement Type
Type
DLS Protocols and
Technique
Parameters (for DLS)
Parameters
Best Practice
Questions
Shape
Size
Algorithm
used
Size
Distribution
Surface Area
Diameter
Medium
type
Viscosity
pH of
suspension
Aggregation/
Agglomeration
State
Surface
Chemistry
Size
Size
Surface
Reactivity
Surface
Charge
Mean Hydrodynamic
Diameter
Mean
Aerodynamic
Diameter
Molecular
Weight
Sample
conc.
Temperature
of
suspension
Dispersing
agent
Were replicates done?
Were protocols used?
Purity of
dispersing
agent
Mean
Hydrodynamic
Diameter
Number Avg.
Molecular
Weight
Dispersing
agent
Concentration
of dispersing
agent
Sample
concentration
Medium
type
Stability
Weight Avg.
Molecular
Weight
pH
Dispersing
agent
Were controls used?
Was instrument calibrated?
Solubility
Volume Avg.
Molecular
Weight
Ionic
Strength
Purity
Nanomaterial
State
… + 9 more
Parameters
Sonication/
Milling
power
Sonication/
Milling time
LOOKING ACROSS TECHNIQUES
(blank)
318
X-ray Diffraction
13
UV-Visible Spectroscopy
10
Transmission Electron Microscopy
274
Spectrophotometry
4
Small Angle X-Ray Scattering
3
Particle size is
frequently reported
without technique
(LOW DATA QUALITY RATING)
Scanning Electron Microscopy
86
Raman Spectroscopy
1
Nanoparticle Tracking Analysis
6
Matrix-assisted Laser Desorption/Ionization -… 4
Laser Diffraction Spectrometry
7
Field Flow Fractionation
1
Electrophoretic Light Scattering
1
Dynamic Light Scattering
360
Differential Mobility Analysis
10
Chromatography
3
Brunauer
4
Atomic Force Microscopy
31
0
Techniques used to measure
particle size
50
100
150
200
250
300
350
Dynamic light
scattering is the
most curated
technique
400
LOOKING ACROSS PARAMETERS
Viscosity
50
Temperature of Suspension
136
Sonication/Milling Time
53
Sonication/Milling Power
49
Solvent/Medium Type
336
Sample Concentration
Purity of Dispersing Agent
1
pH of Suspension
92
Dispersing Agent
79
Concentration of Dispersing Agent
57
Algorithm Used
111
0
Parameters reported for
Dynamic Light Scattering
Solvent Type is the
most frequently
reported parameter
for DLS
107
50
100
150
200
250
300
350
400
Other DLS parameters
that are collected, but
not shown here,
include
 scattering angle
 wave length
 index of refraction
Compliance Levels in the Nanomaterial Registry
DATA QUALITY
DATA QUALITY METRIC
The Nanomaterial Registry’s COMPLIANCE LEVEL FEATURE provides a METRIC on the
QUALITY of characterization of a nanomaterial entry
Compliance Level
Score
Gold
76-100
Silver
51-75
Bronze
26-50
Merit
0-25
Medal
COMPLIANCE LEVELS are
broken into MERIT,
BRONZE, SILVER, and
GOLD and represent
increasing quality of
characterization based on
our evaluation criteria
A COMPLIANCE LEVEL SCORE is a quantitative value
calculated by an algorithm
MORE INFORMATION:
https://www.nanomaterialregistry.com/about/HowIsComplianceCalculated.aspx
NANOMATERIAL REGISTRY @ nanoHUB
DATA EXPORT
• The Nanomaterial Registry offers
a data export that focuses on
nanomaterial characterization
data
• This is currently a limitation on
the questions users can ask of
the Registry’s data
NANOMATERIAL REGISTRY PORTAL
Valuable partnership with nanoHUB
1. The Registry’s own website is designed for use as a reference library
2. nanoHUB offers data handling that appeals to the predictive modeling community
nanoHUB received an NSF stipend to participate in case studies of data hosting with RTI
International (Nanomaterial Registry) and the Center for the Environmental Implications
of NanoTechnology (CEINT) at Duke University
Registry Portal is now available and offers the Nanomaterial Registry’s authoritativelycurated data in a format useful for sorting and filtering by modeling researchers
https://nanohub.org/groups/nanomaterialregistry
NANOMATERIAL REGISTRY PORTAL
Nanomaterial data records with associated bio/env assay studies were exported from the
Nanomaterial Registry and uploaded to the Registry Portal at nanoHUB
Assertion terms can be selected
to build your data subset.
NANOMATERIAL REGISTRY PORTAL
Nanomaterial data records with associated bio/env assay studies were exported from the
Nanomaterial Registry and uploaded to the Registry Portal at nanoHUB
• Data are presented in a spreadsheet style for easy sorting and filtering
• Data can also be exported
NANOMATERIAL REGISTRY PORTAL
Data visualizations available at
this time:
• Bar chart
• Heat map
Custom columns can be created with
the help of data visualizations
NANOMATERIAL REGISTRY PORTAL
The Registry portal can be reached from the Registry homepage or by searching for the
Nanomaterial Registry at nanoHUB’s website
www.nanomaterialregistry.org
DATA MODELING
Growth in publications on nanomaterials from 1981
(1 paper) to 2014 (16394 papers)*
number of papers
20000
18000
16000
14000
12000
10000
number of papers
8000
6000
4000
2000
*Data based on Pubmed analysis
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1986
0
The DATA and data analytics interplay:
informatics vs. nanomaterials (Ngram Viewer)
Informatics
Nanomaterials
2010 Oct 26;4(10): 5703-12.
Comb Chem High Throughput Screen. 2011 Mar 1;14(3):217-25.
Thousands of molecular descriptors are
available for organic compounds
constitutional, topological, structural, quantum
mechanics based, fragmental, steric, pharmacophoric,
geometrical, thermodynamical conformational, etc.
- Building of
models using
machine learning methods (NN,
SVM, etc.)
- Validation of
models
according to numerous statistical
procedures and their
applicability domains.
Fourches D, Pu D, Tropsha A. Comb Chem High Throughput Screen. 2011 Mar 1;14(3):217-25]
ALL PARTICLES HAVE THE SAME CORE
BUT DIFFERENT SURFACE MODIFIERS
Classical molecular descriptors (e.g.,
Dragon, MOE, SiRMS) can be computed
for a single molecule that represents the
surface modifiers of a particular
nanoparticle.
ALL PARTICLES HAVE DIFFERENT
CORES/ARCHITECTURES
Shaw et al., PNAS, 2008, 105, 7387-7392
Currently, only experimental characteristics
can be used as descriptors.
Need for developing new
computational descriptors
(composition, structure, shape,
electronic properties)
ACS Nano, 2010, 4(10), 5703-5712.
109 surface modifiers; 5-fold external cross-validation; kNN and 2D Dragon descriptors
Without Applicability Domain
MAE = 0.18
R02 = 0.72
2
R0 = 0.77
MAE = 0.17
Coverage = 80%
With Applicability Domain
Chau and Yap, RSC Adv., 2012, 2, 8489-8496.
105 surface modifiers
Naïve Bayes, Logistic Regression, kNN, SVM; 5-fold external cross-validation; 2D PaDEL descriptors
Classification Sensitivity = 86.7 - 98.2% Specificity = 67.3 - 76.6%
Ghorbanzadeh et al., Ind. Eng. Chem. Res., 2012, 51 (32), 10712–10718.
109 surface modifiers
MLR, Artificial Neural Networks; 2D Dragon descriptors; one external set of 10 surface modifiers
MLR: R=0.755, SE=0.30
ANN: R=0.943, SE=0.22
Chandana Epa et al., Nano Lett., 2012, 12 (11), 5808–5812.
108 surface modifiers
MLR, Artificial Neural Networks; 2D Dragon descriptors; one external set of 21 surface modifiers
MLR: R2=0.79, SE=0.24
ANN: R2=0.54, SE=0.28
Toropov et al., Chemosphere, 2013, 92 (1), 31–37.
109 surface modifiers
Coral/Monte Carlo approach; 2D Coral descriptors; five validation sets of 15-20 surface modifiers
Training: R2=0.69-0.76, MAE=0.17-0.21
Validation: R2=0.80-0.93, MAE=0.11-0.15
Modeling of NPs for Protein Binding*
Different surface
modifiers
were
introduced at the
R1, R1’ and R2
position
84 surface-modified CNTs
Protein
Binding
Bovine
Serum
Albumin
BSA
Carbonic
Anhydrase
CA
Cytotoxicity
Chymotrypsin
CT
Hemoglobin
Acute
Toxicity
HB
*Zhou et al. Nano Lett., Vol. 8, No. 3, 2008
Immune
Toxicity
18
Number of CNTs
16
14
12
10
8
Non-Binders
Threshold at 2.00
QNAR Modeling of Carbonic Anhydrase Binding
Binders
6
4
2
0
[0.53, [1.01, [1.48, [1.96, [2.43, [2.91, [3.39, [3.86, [4.34, [4.81,
1.01) 1.48) 1.96) 2.43) 2.91) 3.39) 3.86) 4.34) 4.81) 5.29)
Carbonic Anhydrase Binding
Each CNT is represented by a single copy of its surface molecule.
Consensus modeling approach combining different machine learning methods (k
Nearest Neighbors, Support Vector Machines and Random Forest) and different
types of chemical descriptors (Dragon and MOE).
QNAR Modeling of CA Binding
Sens.
F1
Spec.
Accr.
Sens.
F2
Spec.
Accr.
Sens.
F3
Spec.
Accr.
Sens.
F4
Spec.
Accr.
Sens.
F5
Spec.
Accr.
Sens.
Total
Spec.
Accr.
kNN-Dragon
SVM-Dragon
RF-Dragon
kNN-MOE
SVM-MOE
RF-MOE
0.70
0.83
0.75
0.80
1.00
0.88
0.88
0.63
0.75
0.86
0.67
0.75
0.63
0.64
0.63
0.77
0.73
0.75
0.70
0.83
0.75
0.60
1.00
0.75
0.75
0.44
0.63
0.86
0.56
0.69
0.63
0.64
0.63
0.70
0.68
0.69
0.70
0.83
0.75
0.80
1.00
0.88
0.75
0.75
0.75
0.86
0.67
0.75
0.63
0.55
0.58
0.74
0.73
0.73
0.70
0.83
0.75
0.80
1.00
0.88
0.50
0.63
0.56
0.86
0.67
0.75
0.63
0.45
0.53
0.70
0.68
0.69
0.70
0.67
0.69
0.70
0.67
0.69
0.63
0.50
0.56
0.86
0.44
0.63
0.50
0.64
0.58
0.67
0.58
0.63
0.70
0.83
0.75
0.80
1.00
0.88
0.75
0.50
0.63
0.43
0.67
0.56
0.63
0.55
0.58
0.67
0.68
0.67
Computer-aided design of novel carbon nanotubes with desired
biological properties
(in collaboration with Dr. Bing Yan, St. Jude St. Jude Children's Research Hospital)
Similarity Search
Consensus QSAR models
VIRTUAL
SCREENING
240,000 in silico
designed small
molecules which
are considered
attachable
to CNTs
~102
molecules
QNAR Predictions
Non-bind: 110 cpds
462
molecules
Binders: 352 cpds
Nontoxic: 189 cpds
Toxic: 273 cpds
Experimental
Validation
Experimental Validation (Cytotoxicity assay)
Hits that were predicted as non-toxic
Cell Viability (%)= 100
treatment / control (percentage)
ID
Average
STDEV
Rep1
Rep2
Rep3
Rep4
II-1 (1831)
55
53
60
63
58
5
II-2 (48660)
57
62
62
63
61
3
All rationally
synthesized,
3
61
65 prioritized,
59
60
61
3
53
54
61
57
56as non-toxic
and tested
CNTs
predicted
2
57 confirmed
58
61
57
58
were
experimentally.
II-3 (1860)
II-4 (13031)
II-5 (11236)
II-6 (153907)
79
65
61
56
65
10
II-7 (153852)
73
72
66
62
68
6
II-8 (39260)
67
67
70
82
72
7
II-9 (13860)
72
67
66
67
68
3
II-10 (48636)
54
53
63
65
59
6
Experimental Validation (Cytotoxicity assay)
Hits that were predicted as toxic
Cell Viability (%)= 100
treatment / control (percentage)
ID
Average
STDEV
Rep1
Rep2
Rep3
Rep4
II-11 (170243)
38
29
38
52
39
9
II-12 (170217)
49
50
38
58
49
8
7
39 1043rationally
48
54 prioritized,
46
6 out of
5
44
46
53
54CNTs
49 predicted
synthesized,
and
tested
8
50
43
63
51
52
as toxic were confirmed experimentally
II-13 (170226)
II-14 (141618)
II-15 (126818)
II-16 (154018)
44
49
47
24
41
11
II-17 (4218)
44
51
54
56
51
5
II-18 (135817)
41
44
53
60
49
9
II-19 (120618)
47
43
66
62
55
11
II-20 (135018)
40
43
59
60
50
10
DATA SHARING
DATA SHARING
“Beginning in January 2015, we will start asking authors of all life
sciences submissions that are sent out for peer review to complete
relevant portions of the same checklist (available at
http://www.nature.com/authors/policies/checklist.pdf), and will
make the document available to referees during review.”
BUILDING the D to K Continuum
Requirements/Standards/Needs
Taxonomies, Ontologies, Minimum Reporting,
Philosophies of Reporting/Sharing
OECD
ISO
CODATA/VAMAS – Uniform
Descriptor System
CoR’s
MinChar
Funders
Enforcement in Req’s/Std’s
NR-Dependent
Users
Journals
Convergence of Data and Knowledge
Tools
Minimal Information
Data Curation
Data Creation/Collection
BUILDING THE BRIDGE FOR NANOTECHNOLOGY
TOOLS
Structured Data
Management
ID and leverage
exsisting
cyberinfrastuture and
project efforts.
USERS
DOMAIN
Regulatory Risk
Assessors &
Researchers
OHS,
Theraputics,
Research
Generate questions
that inform tool
development
From multiple stakeholder
groups, e.g. government,
academia
BRIDGE
CHAOTIC
DATA
APPLICATIONS
CHALLENGE: STREAMLINING DATA COLLECTION
The current workflow for data sharing is not streamlined
PUBLICATION WORKFLOW
CURATION WORKFLOW
Primary Data Collection
PDF of Peer Reviewed Paper
Possible use electronic notebooks
From scientific journal
Data Formatting
Manual Data Extraction
For Publication in the literature
And conversion to electronic format
Publication Process
Data Curation
Submission and review
Using process and webbased tools
PDF of Peer Reviewed Paper
Data Available through Portal
For scientific journal
Data is validated and published
PURPOSE: GROW THE DATA REPOSITRY
Primary data generation funded by many federal
grants: endorse data sharing via NR!
37
Nanomaterial Registry as an important tool for
compliance with the NIH Data Sharing Policy
Department of Health and Human Services Part 1. Overview Information
Funding Opportunity Title
Centers of Cancer Nanotechnology Excellence (CCNE) (U54)
Funding Opportunity Announcement (FOA) Number: RFA-CA-14-013
The applicants are encouraged to consider using, as appropriate, various
relevant NCI-supported resources described below.
Nanotechnology-related informatics.
….
NCI supports the Nanomaterial Registry (https://www.nanomaterialregistry.org/),
which archives research data on nanomaterials and their biological and
environmental implications from a broad collection of publically available
nanomaterial resources.
THANK YOU!
www.nanomaterialregistry.org
www.nanomaterialregistry.com
[email protected]