Big Data - Internet2

OPEN & PREDICTIVE Biology with KBase
Adam Arkin
Water supply crises
Food shortage crises
Extreme volatility in energy and agriculture prices
Vulnerability to pandemics
Rising greenhouse gas emissions
Antibiotic-resistant bacteria
Land and waterway misuse
Rising rate of chronic disease
World Economic Forum Top 10 Emerging Technologies
Informatics for adding value to information
Synthetic biology and metabolic engineering
Green Revolution 2.0 – technologies for increased food and biomass
Nanoscale design of materials
Systems biology and computational modeling/simulation of chemical and biological systems
Utilization of carbon dioxide as a resource
Wireless power
High energy density power systems
Personalized medicine, nutrition and disease prevention
Enhanced education technology
AN INFLECTION POINT IN: DATA SIZE AND DIVERSITY
Biology Workflows Are Complex and Idiosyncratic
Difficult to Reproduce; Hard to SHARE; Data input and parameter …
The age of whole cell modeling has begun
Image credit: Covert Lab
DOE mission: predict, control and design the biological components of energetic processes and environmental balance.
Complex missions with rapidly expanding, intricately related, diverse data types require a means to augment scientists' ability to:
Filter information; Focus attention; Ask the right questions; Leverage other minds
DOE Systems Biology Knowledgebase
KBase: Data and modeling for predictive biology
An emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses that predict the functions and behaviors of biomolecules, cells, organisms and communities, together with the data that support them.
An open, extensible framework for secure sharing of data, tools, and scientific conclusions in predictive and systems biology.
A scalable cloud computing platform for data- and model-intensive computational biology, supported by a growing data model that captures evidence-based assertions about biological structure/function.
Distributed Development
Four major National Labs and more than 10 collaborating universities/research centers
…0 people with diverse cultures and expertise and with different institutional and group alliances
Enthusiasm to create a system to revolutionize understanding and application of biological systems
Deliver version 1 in 18 months and prove it can support DOE scientists!
Building an airplane in the …
KBase provides an open, extensible framework for secure sharing of data, tools, and scientific conclusions in predictive and systems biology.
KBase drives data through models to predictions and experimental design.
KBase accelerates reproducible, reusable, and transparent science.
KBase deeply enables scientists to work together to approach complex biological problems.
KBase gives credit where it is due and privacy where it is needed.
KBase is an open software and data environment to which others can contribute and with which others can build.
KBase democratizes access to scalable compute, network and storage.
KBase
Lowering Barriers to Scientific Revolution
Open access, sharing, and instant publishing
Leveraging minds and crowdsourcing solutions
Reproducible/transparent science and technology
Learn by watching, copying and doing
Control and provenance of one's data/tools
An increasing number of data warehouses, since biology is becoming a big data discipline (NCBI, Ensembl, etc.)
Specialized applications and databases for relatively generic analyses (e.g., MG-RAST and MicrobesOnline)
Evolving libraries of sophisticated computational biology algorithms for use in programming environments (e.g., Bioconductor)
Workflow tools that allow non-programmers to chain these algorithms together (e.g., Galaxy and Taverna)
Workflow sharing tools that allow people to use each other's work products
Open-access publication of journal articles, with increasing use of semantic tagging
Scientific social networks (e.g., ResearchGate, Epernicus, etc.)
Driving data towards dynamic models of function
[Figure: KBase workflow model, shown in three successive builds. Stages: Functional Inference in Genomes and Metagenomes (inference of gene structure and annotation by homology; direct measurement and guilt-by-association function inference); Comparative Analysis of …; Metabolic, Regulatory, and Community Network Inference (inference of networks and suggestions for hole-filling); Model-based Analysis & Eng.; Predictions (behavioral prediction and design). Later builds add Measures of Confidence and Quality, a Clearinghouse of formal Predictions/Hypotheses, User Communities and User Input, which feed the Best Tested Models, Most "Useful" Data, Most Successful Knowledge and Most Successful Protocols.]
Everything is a model with predictions, confidence …; information, weighted by quality metrics, propagates to update models; …ving to link molecular measurement to phenotypic outcome; ultimately …mble development and use…
…ess Services a…
Transparently access multiple heterogeneous datasets and bioinformatics tools.
…ently annotate new microbial genomes and infer metabolic and regulatory networks.
Transform network inferences into metabolic models and map missing reactions to genes with novel data reconciliation tools.
…n effective sequencing strategies for complex multi-sample metagenomic projects.
… microbial ecological hypotheses through …omic and functional analysis of quality-…sed metagenomic data.
Predict plant gene function and molecular phenotype via navigation and analysis of …-specific co-expression networks.
Discover genetic variations within plant populations and map these to complex organismal traits.
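The service list above includes predicting plant gene function from co-expression networks. Such networks are commonly built by linking two genes whose expression profiles are strongly correlated across experiments. The minimal sketch below makes several assumptions: it uses Pearson correlation, a hypothetical cutoff of |r| ≥ 0.9, and made-up gene names. The slides do not say which measure or threshold KBase uses.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_edges(profiles, cutoff=0.9):
    """Return (gene_a, gene_b, r) for gene pairs with |r| >= cutoff.

    profiles: dict mapping gene id -> expression values, one per experiment
    (hypothetical input format).
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= cutoff:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical expression values for three genes across four experiments
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 7.9],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
print(coexpression_edges(profiles))  # [('geneA', 'geneB', 0.999)]
```

Because the test uses the absolute value of r, strongly anti-correlated genes would also become edges. Whether to keep them is a modeling choice.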
The goals are to:
Construct and predict metabolic and gene expression regulatory networks to manipulate microbial function
…ly increase the capability of the scientific community to communicate and utilize their existing data
Enable the planning of effective experiments and maximize understanding of microbial system function
[Figure: microbial workflow. Compute: large-scale assembly computations; large-scale BLAST and sequence alignment; metabolic model gapfilling; metabolic model reconciliation. Steps: Genome sequence → Assembly → Genome Annotation → Metabolic model reconstruction → Phenotype simulation → Phenotype Reconciliation. Workspace objects: Genome sequence, Genome Annotation, Metabolic model, Predicted phenotypes, Uploaded exp. data, Reconciled model; Cen…]
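The workflow above ends with phenotype reconciliation, in which predicted phenotypes are compared with uploaded experimental data before the model is corrected. Below is a minimal sketch of only the comparison step. It assumes each phenotype is a simple grow/no-grow call per media condition; the condition names are hypothetical.

```python
def reconcile(predicted, observed):
    """Sort conditions by agreement between model predictions and experiments.

    predicted, observed: dict mapping condition name -> bool (True = growth).
    Conditions present in only one of the two dicts are ignored.
    """
    categories = {"true_pos": [], "true_neg": [], "false_pos": [], "false_neg": []}
    for cond in sorted(set(predicted) & set(observed)):
        p, o = predicted[cond], observed[cond]
        if p and o:
            categories["true_pos"].append(cond)
        elif not p and not o:
            categories["true_neg"].append(cond)
        elif p:
            categories["false_pos"].append(cond)  # model grows, organism does not
        else:
            categories["false_neg"].append(cond)  # organism grows, model does not
    return categories


def accuracy(categories):
    """Fraction of compared conditions where model and experiment agree."""
    total = sum(len(v) for v in categories.values())
    agree = len(categories["true_pos"]) + len(categories["true_neg"])
    return agree / total if total else 0.0


# Hypothetical grow/no-grow calls for three media conditions
predicted = {"glucose": True, "acetate": False, "no_serine": False}
observed = {"glucose": True, "acetate": False, "no_serine": True}
result = reconcile(predicted, observed)
print(result["false_neg"], round(accuracy(result), 2))  # ['no_serine'] 0.67
```

False negatives like `no_serine` are the usual starting point for gap-filling. They suggest the model is missing reactions that the organism actually has.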
[Figure: building tailored models from a unified template. Genome sequence → large-scale query for genomes by phylogenetic distance → extract annotations/models → construction and application of templates (UNIFIED-TEMPLATE-MODEL, biomass) → DNA sequence → tailored model with phenotype predictions. Example organisms: …o cholerae, Ruthia magnifica, Denitrovibrio acetiphilus, …la parvula, Sphingomonas sp., Nitrobacter winogradskyi, Chlamydophila felis, Flavobacteriales bacterium, Bifidobacterium bifidum, Neisseria meningitidis. A following slide on …g "completeness" of understanding adds statistical analysis of data to the same workspace workflow (Genome sequence, Genome Annotation, Metabolic model, Predicted phenotypes, Uploaded exp. data, Reconciled model).]
[Figure: predicting culture conditions. Goals: identify nutrients required for biosynthesis and potential nutrients that …it growth; … models and growth behavior of …cultivable microbes; …me of interest; microbe-microbe dependency or microbe-…dependency hypotheses. Steps: Genome sequence → Assembly → Annotation → Modeling → Regulatory network reconstruction → Prediction of culture conditions → Compare close cultivable organisms. Workspace objects: Genome sequence, Genome Annotation, Metabolic model, Regulatory network, Predict media.]
• Nathan Price's probabilistic regulation of metabolism (PROM) integrates metabolic models and functional data to make better predictions of growth under genetic or transcriptional variation.
• Difficult to access and for people to …
• Previously applied to just two organisms.
• Now a KBase service that can be applied to any genome for which there is expression or variation data.
• Testing on Shewanella oneidensis MR-1, with transcription, TF knockout and growth/fitness data easily available in KBase.
Automated processing and analysis of metagenomic data (16S/18S, shotgun metagenome, metatranscriptome)
Incorporation of MG-RAST and QIIME functionality
Novel sequence QC pipelines (DRISEE)
Evidence-based design of metagenomic experiments
Who are they? What are they doing?
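The PROM bullets above describe combining a metabolic model with expression data. In the published method, expression data are discretized. The probability that a target gene is on when its transcription factor is off is then used to restrict the flux of the reactions that gene controls. The sketch below shows only that probability-and-bound step, in simplified form. It is not the KBase prom_service, and it omits the flux balance optimization. The data are hypothetical binarized calls.

```python
def prob_on_given_tf_off(tf_on, gene_on):
    """Estimate P(gene ON | TF OFF) from paired binarized samples (1 = ON)."""
    target_when_tf_off = [g for t, g in zip(tf_on, gene_on) if not t]
    if not target_when_tf_off:
        return 1.0  # no evidence about the knockout state: leave bounds unchanged
    return sum(target_when_tf_off) / len(target_when_tf_off)


def scaled_bounds(lower, upper, p):
    """Scale a reaction's flux bounds by the probability its gene stays on."""
    return lower * p, upper * p


# Hypothetical binarized expression calls across six samples
tf = [1, 1, 0, 0, 0, 0]
gene = [1, 1, 1, 0, 0, 0]
p = prob_on_given_tf_off(tf, gene)
print(p, scaled_bounds(-10.0, 10.0, p))  # 0.25 (-2.5, 2.5)
```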
KBase Communities Help Hofmockel With Soil
Kirsten Hofmockel used KBase services to analyze and compare metagenomes from different-sized soil aggregates across different crop treatments. Community composition differs across aggregates. Gene abundance and enzyme activity correlate only in microaggregates for cellulases.
Sorghum
Miscanthus
Arabidopsis
Chlamydomonas Brachypodium
Switchgrass
KBase
[Figure: parallel pipeline: Fastq input split across three BWA → Filter → Novo branches, merged in Hydra]
Align & call SNPs from 35M 80bp (14 Gbp) reads with maize genome (zmb73v…)
Identified 372k high-confidence SNPs

Config      | Serial, 1 core (1 node) | Multicore, 44 cores (1 node) | KBase Cloud, 118 cores (15 nodes)
Bowtie2     | 45 h*                   | 1 h 10 m                     | 23 m
Sort        | 2 h                     | 2 h                          | N/A
SAMtools    | 2 h                     | 2 h                          | 12 m
End-to-End  | 50 h*                   | 5 h 10 m                     | 35 m

Align & call SNPs from 131 maize samples
1 TB fastq / 408 Gbp input data

Config      | Serial, 1 core (1 node) | KBase Cloud, 210 cores (15 nodes) | KBase Cloud, 854 cores (61 nodes)
Bowtie2     | 1311 h*                 | 19.5 h                            | …
Sort        | 58 h*                   | N/A                               | …
SAMtools    | 58 h*                   | 3.5 h                             | 1…
End-to-End  | 1427 h*                 | 23 h                              | 6…
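The end-to-end rows of the two tables can be turned into speedups relative to the serial run. The quick calculation below uses only the figures shown above, and the starred serial times are carried over exactly as given.

```python
def minutes(hours=0, mins=0):
    """Convert an hours/minutes pair into minutes."""
    return hours * 60 + mins


# Table 1: 35M reads, end-to-end wall-clock times
serial_1 = minutes(50)
multicore_44 = minutes(5, 10)
cloud_118 = minutes(0, 35)

# Table 2: 131 maize samples, end-to-end wall-clock times
serial_2 = minutes(1427)
cloud_210 = minutes(23)

speedups = {
    "44 cores, 1 node": serial_1 / multicore_44,
    "118 cores, 15 nodes": serial_1 / cloud_118,
    "210 cores, 15 nodes": serial_2 / cloud_210,
}
for config, s in speedups.items():
    print(f"{config}: {s:.1f}x")  # 9.7x, 85.7x, 62.0x
```

On 118 cores, a speedup of about 86x corresponds to roughly 0.73 of ideal linear scaling (85.7 / 118).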
(Small) Variation in Lignin Composition and Content
…me variation contained in native populations of Populus …cted in common garden experiments is linked to genes using association genetics.
[Figure: phenotype the population (glucose-xylose release) → create a SNP library → GWAS analysis]
Pink objectives are the lignin biosynthetic pathway genes.
Networks-based knowledge discovery
§ Gene ontologies matched to networks-based ontologies and datasets
§ Algorithms and UI tools to search and analyze the best-matched network components for a user-specified gene set
§ …ate building predictive models
§ Networks-based reliable orthologs: plant genomes, expression-based functional orthologs (…utwil et al., Plant Cell 2011); microbial genomes, localization-based reliable orthologs
III. Infrastructure for scientific social networks
§ Networks of scientific communities and …: support the narrative interface; social networks connecting users and joint projects
§ Networks of algorithm similarity: allow KBase to diversify its menu of algorithms and avoid algorithms producing nearly identical results
§ Networks quality control: assign quality measures to KBase networks to … healthy competition between algorithms and …
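One of the tools above finds the network components that best match a user-specified gene set. The slide does not name the scoring statistic. A common choice for this kind of matching is a hypergeometric overlap test, sketched here. The cluster names, cluster contents and the 4000-gene universe are hypothetical. The query genes are the fatty-acid degradation genes used in the network exploration example.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of at least k cluster genes when drawing n of N genes,
    given that K of the N genes belong to the cluster."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def rank_clusters(gene_set, clusters, universe_size):
    """Rank clusters by overlap p-value with gene_set, smallest first."""
    scored = []
    for name, members in clusters.items():
        k = len(gene_set & members)
        p = hypergeom_sf(k, universe_size, len(members), len(gene_set))
        scored.append((p, name, k))
    return sorted(scored)


query = {"fadB", "fadD", "fadI", "fadJ"}
clusters = {  # hypothetical network components
    "cluster_1": {"fadB", "fadD", "fadI", "fadJ", "fadE"},
    "cluster_2": {"fadB", "gltA", "acnB", "icd", "sucA"},
}
for p, name, k in rank_clusters(query, clusters, universe_size=4000):
    print(name, k, f"{p:.2e}")
```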
Datasets: collected, processed, classified
[Table: domains, dataset sources, datasets and …ities; counts shown: 7, 5099, 6, 46, 1, 1]
Network types: REGULATORY_NETWORK, CO_F…, PROT_PROT_INTERACTION, METABOLIC_SUBSYSTEM, CO_EXP…, FUNCTIONAL…, FUNCTIONAL_ASSOCIATION, PHYLOTYPE_…
Networks API: provides heterogeneous networks in a unified format
[Figure: network kinds: GENE-CLUSTER network, …ENE network, MIXED network, INTEGRATED network, PPI …]
Networks build methods
buildFirstNeighborNetwork(list<string> datasetIds, list<string> entityIds, …)
Network buildInternalNetwork(list<string> datasetIds, list<string> geneIds, …)
Datasets management methods
list<Dataset> allDatasets()
list<DatasetSource> allDatasetSources()
list<NetworkType> allNetworkTypes()
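The two build methods listed above differ in which edges they return. Judging only from their names, buildFirstNeighborNetwork returns edges that touch the query entities (their direct neighbors), and buildInternalNetwork keeps only edges among the query genes. The in-memory sketch below illustrates that reading. The function names, the edge-list format and the dataset IDs are hypothetical; this is not the KBase Networks API.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def first_neighbor_network(datasets: Dict[str, List[Edge]],
                           dataset_ids: Iterable[str],
                           entity_ids: Iterable[str]) -> Set[Edge]:
    """Edges from the selected datasets that touch at least one query entity."""
    query = set(entity_ids)
    return {(a, b) for ds in dataset_ids for a, b in datasets[ds]
            if a in query or b in query}


def internal_network(datasets: Dict[str, List[Edge]],
                     dataset_ids: Iterable[str],
                     gene_ids: Iterable[str]) -> Set[Edge]:
    """Edges from the selected datasets whose two endpoints are both query genes."""
    query = set(gene_ids)
    return {(a, b) for ds in dataset_ids for a, b in datasets[ds]
            if a in query and b in query}


# Hypothetical datasets stored as simple edge lists
datasets = {
    "ppi_demo": [("fadB", "fadJ"), ("fadJ", "fadI"), ("fadI", "atoB")],
    "coexp_demo": [("fadD", "fadB"), ("gltA", "acnB")],
}
print(first_neighbor_network(datasets, ["ppi_demo", "coexp_demo"], ["fadB"]))
print(internal_network(datasets, ["ppi_demo", "coexp_demo"], ["fadB", "fadD", "fadI", "fadJ"]))
```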
[Figure: network exploration and building components powered by the Networks API. Example query: three genes from the fatty acid degradation pathway (genes shown: fadB, fadD, fadI, fadJ). Associated gene clusters are retrieved from SEED Subsystems, RegPrecise regulons and PPI complexes; the members of a selected cluster (linked genes) are added to the dock panel and the exploration restarts.]
…erated infrastructure and 10 Gbit/s transfer capabilities.
Built for high-speed data transfer over ESnet at 100 Gbit/s rates.
…abytes of data storage and 2000 cores for data processing, including int… between high-performance computing and cloud computational resources.
KBase Magellan has 12,000 cores for data processing via both OpenStack Cloud and Cluster Services.
KBase has >3 petabytes of storage capacity.
Development of core Knowledgebase integrated data and workflow analysis management tools, including application programming interfaces and semantic … interfaces.
Integrated KBase API specified and operational; used by third parties to integrate and build apps.
Integrated data model aware of 925 data types encompassing sequence reads, c…, genomes, genome features, transcription data, fitness data and more.
…0 interface description documents leading to 821 functions that can be compiled for use in Perl, Python, Java, and R.
Prototype Search, Workflow and novel Narrative/Notebook interfaces for navigating, analyzing and building knowledge in KBase.
Microbial systems, from 100-1000 microbes: reconstruct and predict metabolic and gene expression regulatory networks to manipulate microbial function
• Metabolic and regulatory reconstructions for 5534 prokaryotic and 161 archaeal genomes
• 7830 genome annotations, 23,058,670 features predicted
• 12,620 regulons with 266,345 protein families inferred
• 4985 metabolic models including a total of 16,196 compounds and 13,428 reactions
• 6202 growth curves, 1,947,690 strain fitness measurements; 3227 gene expression data sets
• Services for assembly, annotation, phylogenomics, regulatory and metabolic network inference, FBA and PROM modeling of metabolism, and reconciliation and improvement of models against data
Plant systems, for 10 key plants related to DOE missions: integrate phenotypic and experimental data and metadata, …ass properties from genotype, and assemble regulatory data to enable analysis, cross-comparisons, and modeling
• Over 175 eukaryotic genomes, including many variants of Poplar, Arabidopsis, Sorghum, Chlamydomonas, Brachypodium and Switchgrass, as well as many other algae and fungi
• Phenotypes for genome variants of plants and services for calling the genetic variation among individuals
• Services for variation calling, mapping genotype to phenotype via GWAS-style analysis, and tools for candidate gene filtering, modeling, and pathway enrichment
• 731 gene expression experiments in Arabidopsis and Poplar; plant co-expression network analysis for all
• Initial plant metabolic modeling
Microbial communities: model metabolic processes within microbial communities and mine metagenomic data to I…nown genes
• Access to 11,000 metagenomes (>21 Tbp)
• Integrated KBase access to QIIME functionality
• New tools for metagenome sequence quality assessment and experimental design
• Services for taxonomic and gene identification, abundance, and a host of other functions
KBase Platform
• Abstractions & Models; User Interfaces; Programming APIs; Data Stores
• KBase Infrastructure: Compute Servers, Data Servers, Networking, Security, Cloud
[Figure: functional sitemap of the KBase website: Website, IRIS, KBase Labs, Search, Genome Browser, KBase Lab launch page, Core Model Viewer, iPython, …; connected by static and dynamic links]
Storing a diverse representation of biological data, ranging from highly structured data in relational databases, to frequently generated and changing user data, to large bulk data:
Structured Storage for Curated Data; Flexible Storage for Workspaces; Petabytes of Raw Data
[Figure: entity-relationship diagram of the KBase central data model. Entities include Metabolic Model, Role Set, Subsystem, Reaction, GPR Association, Role, Compound, Identifier, Feature, Publication, Protein sequence, DNA sequences, Atomic Regulon, Expression level and Expression Experiment. Relationships include Is Modeled By, Belongs To, Included In, Depends On, Named By, Identifies, Exists In, Is Located In, Concerns, Encodes Protein For, Reactant Of, Is In Class, Experimentally Observed For, Related To, Is Comprised Of, Consistent With and Includes.]
KBase Feb software build
• API now contains over 800 commands
• KBase Central Data Store with:
– 6,416 genomes, 16,430,057 features
– 22,367,646 proteins, 3,920,975 annotations
– 1,798 subsystems, 7,231 publications
– 12,620 regulons, 266,345 protein families
– 55,095 trees, 1,117,690 functional roles
– 16,996 compounds, 13,256 reactions
– 4985 models, 3191 experiments, 521 media types
• …0 Interface Description Documents (Modules)
• …21 Functions
• …25 Data types
• Over 70 repositories on the KBase git server
(as of Feb 17)
• Over 700 commands available in IRIS
Services: …thon Narrative, Auxiliary Store Service, Communities API, Metagenomics Analysis Tools, …R Service, Annotation Service, Central Store, Core Model Viewer, Experiment Data A…, …Modeling, Metabolic Map Viewer, microbes_model_builder, …spy, Probabilistic Annotation, prom_service, Protein Info Service, Regulation Service, Similarity Service, Translation Service, Workspace Service, KB Model Seed, Tree Service, Assembly Service, Authorization Service, Network Service, Genotype Phenotype Service, Genotyping Service, Ontology Service, Plant Expression Service, Authentication and Authorization Client…, Cluster Service, ERDB Service, File Type Service, ID Service, Registry, Type Compiler Service
Services (later version of the slide): …thon Narrative, Auxiliary Store Service, Metagenomics Analysis Tools, …R Service, Annotation Service, Central Store, Communities API Service, Core Model Viewer, Experiment Data A…, …Modeling, Metabolic Map Viewer, microbes_model_builder, …phage Service, Probabilistic Annotation, prom_service, Protein Info Service, Regulation Service, Similarity Service, Translation Service, Workspace Service, KB Model Seed, Tree Service, Assembly Service, Authorization Service, Network Service, Genotype Phenotype Service, Variation Service, Ontology Service, Plant Expression Service, Authentication and Authorization Client…, Cluster Service, ERDB Service, File Type Service, ID Service, Registry, Type Compiler Service
High-Throughput Sequencing Analysis
• Management Service: File System Operations; Job Submission; Workspace Management; Launch Analysis on Cluster; Utility Operations; Data Transfer; Job Status, Job Control; Upload/Download Data for Analysis
• Interfaces: KBase IRIS Interface; Command Line Interface; RPC-API
• Authentication: Globus Online Authentication
• Analyses: Bowtie2 Alignment; BWA Alignment; BLAST Alignment; SAMtools SNP; Read Metrics; Mapping Statistics; K-mer Profile; K-mer Counting; K-mer Histogram
• Data products: Sequence Alignments (SAM/BAM); Variations (VCF)
• Data Repository: Cloud Storage System
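The sequencing service lists k-mer counting, profiles and histograms among its analyses. Here is a minimal plain-Python illustration of what those terms usually mean. First, every overlapping substring of length k is counted across the reads. Then a histogram records how many distinct k-mers occur once, twice, and so on. This is not the service's implementation; production k-mer counters also merge reverse complements and handle far larger inputs.

```python
from collections import Counter
from typing import Dict, Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping substring of length k across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts: Counter) -> Dict[int, int]:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


reads = ["ACGTACGT", "CGTACG"]
counts = count_kmers(reads, 3)
print(dict(counts))            # {'ACG': 3, 'CGT': 3, 'GTA': 2, 'TAC': 2}
print(kmer_histogram(counts))  # {2: 2, 3: 2}
```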
Cluster Service
Distributed Compute Service
• Control server: resource control (register and monitor resources) and job control (submit, update, monitor, and cancel jobs)
• AMQP queue feeding workers
• Language interfaces and a RESTful API
• Authentication: Nexus Authentication Service
• Data repositories: SHOCK, WORKSPACE
• Compute resources: NERSC clusters, BNL cluster, Magellan
• Applications: BLAST, BLAT, FBA, KIKI, QC, SNP, …more to come
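The distributed compute service separates job control (submit, update, monitor, cancel) from the workers that pull work off an AMQP queue. The single-process sketch below mimics that split, with Python's standard `queue.Queue` standing in for the AMQP broker. The class, method and status names are hypothetical, and the sketch covers submit, monitor and cancel only.

```python
import queue
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Job:
    job_id: int
    task: Callable[[], Any]
    status: str = "queued"
    result: Any = None


class JobControl:
    """Hypothetical control server: tracks job state and hands job ids to workers."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[int]" = queue.Queue()  # stand-in for the AMQP queue
        self.jobs: Dict[int, Job] = {}
        self._next_id = 1

    def submit(self, task: Callable[[], Any]) -> int:
        job = Job(self._next_id, task)
        self._next_id += 1
        self.jobs[job.job_id] = job
        self._queue.put(job.job_id)
        return job.job_id

    def cancel(self, job_id: int) -> None:
        job = self.jobs[job_id]
        if job.status == "queued":  # running or finished jobs are left alone
            job.status = "cancelled"

    def status(self, job_id: int) -> str:
        return self.jobs[job_id].status

    def run_worker(self) -> None:
        """Drain the queue in this process; real workers would run on cluster nodes."""
        while not self._queue.empty():
            job = self.jobs[self._queue.get()]
            if job.status == "cancelled":
                continue
            job.status = "running"
            try:
                job.result = job.task()
                job.status = "done"
            except Exception:
                job.status = "failed"


ctl = JobControl()
a = ctl.submit(lambda: sum(range(10)))
b = ctl.submit(lambda: 1 / 0)
c = ctl.submit(lambda: "never runs")
ctl.cancel(c)
ctl.run_worker()
print(ctl.status(a), ctl.jobs[a].result, ctl.status(b), ctl.status(c))  # done 45 failed cancelled
```

Because job state lives in the controller rather than the queue, cancellation simply marks a queued job. Workers check that mark before running the job.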
Auxiliary Store
Distributed Storage
• Clients: Command Line Clients; User Interface (Upload Page, Data Downloader)
• Register new data sets; retrieve analysis results or raw data from KBase; store rich metadata
• RESTful API; client cache
• MongoDB; freeform key/value store; data set collection metadata
• Shock; computational provenance; data server component
• Storage backends: Data Object Storage, POSIX File Backend, OpenStack Volume Storage; provides subselect; optimized for large data
• Authentication: Nexus Authentication Service
• Wide-area transfer: Globus Transfer/GridFTP (coming soon)
• Deployment: ANL Magellan, ORNL KBase, JGI @ NERSC
IRIS
Object model supports provenance, meaning all previous versions of the objects can be …
Views may be used for all public and private models and FBA solutions: Simple Tabular Views; GLAMM; Model Viewer; CytoSEED Plugin
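The object model above keeps every previous version of an object together with its provenance. Below is a minimal append-only sketch of that idea. It assumes that versions are numbered from 1, that any earlier version can be retrieved, and that provenance is a free-form dictionary. All three are assumptions, since the slide does not describe the storage format, and the object name is hypothetical.

```python
import copy
from typing import Any, Dict, List, Optional


class VersionedStore:
    """Append-only object store: saving never overwrites an earlier version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, name: str, data: Any, provenance: Dict[str, Any]) -> int:
        history = self._versions.setdefault(name, [])
        history.append({"data": copy.deepcopy(data), "provenance": dict(provenance)})
        return len(history)  # version numbers start at 1

    def get(self, name: str, version: Optional[int] = None) -> Dict[str, Any]:
        history = self._versions[name]
        if version is None:
            entry = history[-1]
        elif 1 <= version <= len(history):
            entry = history[version - 1]
        else:
            raise KeyError(f"{name} has no version {version}")
        return copy.deepcopy(entry)  # callers cannot mutate the stored history

    def provenance_chain(self, name: str) -> List[Dict[str, Any]]:
        return [dict(entry["provenance"]) for entry in self._versions[name]]


ws = VersionedStore()
v1 = ws.save("toy_model", {"reactions": ["rxn1"]}, {"method": "reconstruction"})
v2 = ws.save("toy_model", {"reactions": ["rxn1", "rxn2"]},
             {"method": "gapfill", "input_version": v1})
print(v2, ws.get("toy_model", 1)["data"])  # 2 {'reactions': ['rxn1']}
```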
R is the most popular statistics language
R provides many existing packages
– Has 4300 contributed packages (CRAN)
– Ecology, machine learning, clustering, etc.
KBase API talks to R now (…g PCoA)
• …ython based
• Can be shared/published
– Private by default
– Can have a permanent URL
• You can share/publish your analyses (additional material for publications)
• Supports Shell, R, Perl, Python
• Has visual analysis builder
• Visual analysis results are displayed inline
– Users can interact with visuals
Service Level / Server Level
• Service monitoring on configurable intervals: WAN transit times, compute time
• Service logging
• Using Nagios and check_…
• Each site has its own monitoring server
• Includes: server health, VM availability, network connectivity between the monitoring server and monitored servers
[Figure: KBase infrastructure. DOE ASCR-originated …an cloud infrastructure: OpenStack Cloud @ Argonne; OpenStack Cloud @ Oak Ridge; cluster system @ Berkeley; cluster system @ Brookhaven; … petabytes of storage. Argonne KBase Magellan hardware: > 700 nodes (> 12,000 cores) for KBase.]
One Integrated KBase Infrastructure
• KBase sites trust each other
• Centralized security management at ANL
[Figure: sites ANL, ORNL, BNL and LBNL connected over the ESnet Production IP Network (address blocks shown include …21.43.0/24, 192.12.68.0/24, 128.55.0.0/16 and 198.124.2…). Step 1: submit jobs to the cluster service via web services and the KBase API. Resources: ANL KBase Magellan (…res for KBase); ORNL Hadoop cluster Kandinsky, entirely for KBase (1088 cores); BNL Torq… (320 cores, 3…); NERSC batch queue at LBNL (Hopper and clusters).]
Easy access to scalable compute and data transfer
Easy installation and scaling to use of commercial cloud solutions: Amazon and Google
Data and IP security, but with a flexible publishing model
Distributed development with central quality control and security team
Flexible use models… What next?
Building a starship in deep space?
• The data model must evolve to support the modeling mission
• Need to improve data import, quality assessment and metadata
• A framework for turning bioinformatics algorithms' output into models needs further development
• Efficient incorporation of new third-party algorithms and support with scalable compute
• Theory for integrative, cross-scale predictive biology under development
• Growing a strong external development community while maintaining quality, stability and vision
• Launching the KBase Foundation to ease licensing and growth of KBase participation
• Constant update and propagation of "assertions" based on distributed, large data
• Much better ontologies for nearly everything
• Building the social tools
A more concrete view of the "Narrative" interface
…SE: Professional Computational Biologists; Data generators and basic analysts; Knowledge Seekers; Knowledge Generators
Therefore we aim to:
• … instances of "minimum inventory/maximum diversity" systems, a term coined by … in his book Structure in Nature Is a … Design (MIT Press, 1978)
• Create a powerful framework for programmatic access to data and functions of KBase (Users A, B)
– Ultimately provide stubs for use in Perl, Python, R, MATLAB, Galaxy, etc.
• Create a set of packaged "widgets" that make placement and recognizable display of KBase "functions" on web pages (or perhaps within other apps) easy and identifiable (Users B)
• Create a "simplified" portal for search and aggregation of data for data consumers and Knowledge Seekers (Users C, D)
• Create an innovative platform for knowledge creation, evolution and sharing
– Users can share knowledge at multiple levels of granularity
– Cut and paste parts of narratives to reuse workflows
– Cross-citation and branching of narratives
• Project management
– Creation of teams allows management of projects
– Electronic lab notebook for computational researchers
– Projects can track progress
• Possible publication model
– Finalized Narratives are reviewed, assigned DOI numbers, and accepted in an appropriate journal
– Citation metrics
• Metrics of research efficiency
– Times to completion of narratives
– Times from hypothesis to confirming data/narratives
– …data, and algorithm ratings by how many times products appear in or are cited by narratives; can be aggregated by user, team, location, agency
– People networks can be inferred by looking at team, citation, and comment structures
Refining the Metabolic Model for Escherichia coli F11
I'll start by getting the genome of E. coli F11 from the KBase Central Data Store.
[Search: Escherichia coli F11]
Let's take a look at this genome. Now I'm going to run a metabolic reconstruction on this genome. Let's see how …
Comment: "I've grown this without serine, but the genes for serine are missing in this automated reconstruction. I think you'll need to add kb|g.362.peg.287, kb|g.382.peg.123, and k…898."
[Search: Escherichia coli F11 growth]
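The comment in this narrative reports growth without serine, even though the automated reconstruction lacks serine biosynthesis genes. One simple way to expose such a gap is network expansion. Starting from the media compounds, any reaction whose substrates are all available is repeatedly fired, and its products are added to the available set. The toy reactions follow the textbook serA/serC/serB route from 3-phosphoglycerate to serine, with cofactors omitted. They are hypothetical stand-ins for a real model. This is not KBase's gapfilling algorithm.

```python
def reachable(media, reactions):
    """Compounds producible from the media by repeatedly firing reactions.

    reactions: dict reaction id -> (substrates, products), both sets.
    Reactions are treated as irreversible, and cofactors are ignored.
    """
    available = set(media)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


model = {
    "R_glycolysis": ({"glucose"}, {"3pg"}),   # lumped glucose -> 3-phosphoglycerate
    "R_serA": ({"3pg"}, {"3php"}),            # -> 3-phosphohydroxypyruvate
    # serC and serB steps are missing from this toy reconstruction
}
candidates = {
    "R_serC": ({"3php"}, {"pser"}),           # -> phosphoserine
    "R_serB": ({"pser"}, {"serine"}),
}
media = {"glucose"}
print("serine" in reachable(media, model))                     # False: a gap
print("serine" in reachable(media, {**model, **candidates}))   # True
```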
[Figure: Narrative graphs provide measures of activity and influence. Panels: Narrative Code Versioning and Scenario Branching (an idea branches into Scenario A and Scenario B, with versions A.1 and A.2); Narrative Query Update; Narrative Data Change; Project Linkage and Citation. Function and data contents can be tracked to assess "use".]
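The narrative graph figure suggests tracking the use of functions and data through narratives and their citations. One way to illustrate this is to count the distinct narratives in which each product appears and then aggregate the counts by team. The records, field names and product names below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: one row per citation of a product inside a narrative
citations = [
    {"narrative": "N1", "team": "team_a", "product": "prom_service"},
    {"narrative": "N1", "team": "team_a", "product": "workspace"},
    {"narrative": "N2", "team": "team_a", "product": "prom_service"},
    {"narrative": "N3", "team": "team_b", "product": "prom_service"},
    {"narrative": "N3", "team": "team_b", "product": "prom_service"},  # cited twice in one narrative
    {"narrative": "N3", "team": "team_b", "product": "metabolic_model"},
]


def appearances(records):
    """Number of distinct narratives in which each product appears."""
    pairs = {(r["narrative"], r["product"]) for r in records}
    return Counter(product for _, product in pairs)


def appearances_by(records, key):
    """Same count, grouped by another field such as 'team'."""
    triples = {(r[key], r["narrative"], r["product"]) for r in records}
    grouped = defaultdict(Counter)
    for group, _, product in triples:
        grouped[group][product] += 1
    return {group: dict(counts) for group, counts in grouped.items()}


print(appearances(citations)["prom_service"])  # 3
print(appearances_by(citations, "team"))
```

Counting distinct narratives rather than raw citations keeps a single narrative from inflating a product's rating.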
Users can share knowledge at all granularity: text description, attached files and copied plots
• Cut-and-paste of parts of narratives to reuse custom workflows
• Cross-citation and branching of narratives
Project management
• Creation of teams allows management of projects
• Effectively an electronic laboratory notebook for computational researchers
• Principal investigators and project managers can track progress
• Time stamping aids in intellectual property protection
Possible publication model wherein finalized Narratives are reviewed, assigned DOI numbers, and accepted in an appropriate (or new) journal
• Citation metrics
Metrics of research efficiency
• Times to completion of narratives
• Times from hypothesis to confirming data/narratives
Narrative navigation
• Graphical views of Narratives, their branches, and their cross-citations allow both navigation of information about a topic and a way of rating the influence of narratives and the interlinked nature of their hypotheses
• …er, data, and algorithm ratings by how many times the products appear in or are cited by narratives; can be aggregated by user, team, location, agency
• People networks can be inferred by looking at team, citation, and comment structures
Transparent and Reproducible Science and Data
• Promulgation of active quality metrics for data, algorithms, models and (eek) users
• The framework in which to execute prediction "contests" for competing approaches
• Incorporation, sharing and propagation of both formal prediction and expert knowledge
• Driving towards design of experiments and intervention (e.g. engineering for increased production, inhibition for control of phenotype)
• Domain agnosticism
Try the Beta Yourself
http://kbase.us
Facebook: DOE-Systems-Biology-Knowledge…
Twitter: @DOEKBase
Contact us at outreach@kbase.…
Areas: Communities, Microbes, Infrastructure, Plants, Management, Outreach
Berkeley / Brookhaven: …ian Dehal, …mohl, …nner, …han Chandonia, …olia, Annette Greiner, Matt Henderson, Marcin Joachimiak, Keith Keller, Pavel Novichkov, Sarah Poon, Gavin Price, Bill Riehl, Michael Sneddon, Gwyneth Terry, Cary Whitney, Sergei Maslov, Fei He, Shinjae Yoo, Dantong Yu
Cold Spring Harbor: Doreen Ware, James Gurtowski, Sunita Kumari, Shiran Pasternak, Michael Schatz, James Thomason
Yale: Mark Gerstein, Gang Fang, Lucas Lochovsky, Daifeng Wang
UIUC: Gary Olsen
UC Davis: Pamela Ronald, Taeyun Oh
Argonne: Rick Stevens, Tom Brettin, Elizabeth Glass, Chris Henry, Folker Meyer, Jennifer Salazar, Jared Bischof, Neal Conrad, Narayan Desai, Scott Devoid, Terry Disz, Paul Frybarger, India Gordon, Travis Harrison, Adina H…, Kevin K…, Silvia M…, Bob Ols…, Dan Ols…, Ross Ov…, Tobias P…, Bruce P…, Sam Se…, Will Trim…, Andrea…, Jared W…, Fangfan…
Oak Ridge: Bob Cottingham, Brian Davison, Dave Weston, Meghan Drake, Guru Kora, Miriam Michael…, Steve M…, Tony Pa…, Mustafa…
Leads: Adam Arkin, …ehal, Rick Stevens; Communities: Folker Meyer; Plants: Doreen Ware; Bob Cottingham; Infrastructure: Tom Brettin; Sergei Mas…; Outreach: Elizabeth Glass; Project M…; No…; Pub…; Dylan Chivian, Dave Weston, Shane Canon, Brian Davison, Jen…
Current workflow types; formal comparison of algorithms (assembly…)
Add social outcomes
Future in whole cell simulation, first-order logics and probabilistic computing