Slides - Synthetic Biology @ BBN

Transcription

Slides - Synthetic Biology @ BBN
[email protected]
AIforSynthe-cBiology
FusunYaman,AaronAdler
fusun,[email protected]
BBNTechnologies,Cambridge,MA
July9,2016
Workpar2allysponsoredbyDARPAundercontractHR0011-10-C-0168.TheviewsandconclusionscontainedinthisdocumentarethoseoftheauthorsandnotDARPAortheU.S.Government.
ThisdocumentdoesnotcontaintechnologyorTechnicalDatacontrolledundereithertheU.S.Interna2onalTrafficinArmsRegula2onsortheU.S.ExportAdministra2onRegula2ons.
The Vision – Programming Biology
Like Computers
[email protected]
Inthe1950s,computers
wereprogrammedby
flippingswitchesandmoving
cables…
Synthe2cBiologyisat
thatpointtoday…
•  Manualcrea2onofsimple“parts”
•  Tedious,adhoccrea2onofsimple
systems
Many
iterations
>1 year/system
X
•  Manyopportuni2esinDesign–
Build–Testlooptoimprove
process
•  Goal:high-levelprogramto
func2oningcells
If detect explosives:!
emit signal!
If signal > threshold:!
glow red!
Simple
systems
2
Why is this Important?
[email protected]
•  DNA synthesis has been growing exponentially,
but circuit (behavior) complexity has plateaued:
DNA synthesis
1,080,000
583,000
Lengthinbasepairs
1,000,000
100,000
11
Circuit size
32,000
7,500
10,000
14,600
2,100 2,700
1,000
207
100
1975 1980 1985 1990 1995 2000 2005 2010
Year
2012
[Purnick&Weiss,‘09]
Designingcomplexcircuitsishard!
3
*Samplingofsystemsinpublica2onswithexperimentalcircuits
How to break the complexity barrier?
[email protected]
OrganismLevelDescrip-on
Thisgapistoobig
tocrosswitha
singlemethod!
Cells
4
Answer: Tool Chain
[email protected]
OrganismLevelDescrip-on
High level simulator
HighLevelDescrip-on
If detect explosives:
emit signal
If signal > threshold:
glow red
Coarse chemical simulator
AbstractGene-cRegulatoryNetwork
Detailed chemical simulator
DNAPartsSequence
Collaborators:
Ron
Weiss
Douglas
Densmore
AssemblyInstruc-ons
Testing
Cells
[Bealetal,ACSSyn.Bio.2012]
5
Desperately Needs AI
Op2miza2on
[email protected]
OrganismLevelDescrip-on
High level simulator
Constraint
Sa2sfac2on
Planning
&Scheduling
HighLevelDescrip-on
If detect explosives:
emit signal
If signal > threshold:
glow red
Coarse chemical simulator
AbstractGene-cRegulatoryNetwork
Detailed chemical simulator
Knowledge
Representa2on
Collaborators:
Ron
Weiss
Douglas
Densmore
Expert
Systems
DNAPartsSequence
Mul2-agent
Sytems
AssemblyInstruc-ons
Machine
Learning
Reasoning
Uncertainty
Testing
Cells
[Bealetal,ACSSyn.Bio.2012]
Robo2cs
6
Synthetic Biology Workflow
[email protected]
Map/behavior/specifica?on/
to/nucleic/acid/sequence(s)/
Design
Transfect/transform/
cells,/assay/behavior/
Test
Build
Synthesize/assemble/
nucleic/acid/sequence(s)/
7
Design
Knowledge Representation (KR)
[email protected]
•  Why:
–  Capture details about system designs for reproducibility
–  Curate useful part databases
Test
Build
•  Qualitative and quantitative information enabling automation
–  Data exchange formats for collaboration
•  Challenges
–  What is the minimum information required?
–  Detailed protocols descriptions required to reproduce
results
–  Comparable measurement units and techniques are just
emerging
–  Results are stochastic and vary with time
–  Many biological processes poorly understood
8
Test
Build
Design
Solutions: SBOL
[email protected]
•  Synthetic Biology Open Language (SBOL), standard
describing genetic parts, devices, modules, systems
– 
– 
– 
– 
Community effort much like W3C
Ontology based approach
Defines a standard visualization for designs
Slowly gaining recognition and acceptance
•  ACS Synthetic Biology adopt SBOL for depiction and
representation of genetic construct for recording and
sharing
9
Design
Solutions: Part Characterization
TASBEcharacteriza2on
Target system for
characterization
Build
[email protected]
Fluorescent color
experiments
FACS data
Cell autofluorescence
compensation
Fluorescent color
selection
Normalized Dox transfer curve, colored by plasmid bin
4
10
3
10
IFP MEFL/plasmid
Test
Fluorescent color
compensation
Segmentation and
subgroup statistics
2
10
1
10
0
10
−1
10
−1
10
0
10
1
2
10
10
3
10
4
10
[Dox]
Normalization to
per-plasmid behavior
!"#$%&'"()*"+'(
10
Design
•  Why
Build
Knowledge-Based Systems
•  Challenges
[email protected]
–  Design is knowledge intensive
–  Understand what might be going wrong
–  What to do and how to correct it
Test
–  Greater precision required for computation
–  Managing complexity
11
[email protected]
•  Formal grammars (GenoCad) for verification
–  Is this a valid DNA sequence ?
•  Black listed bad sequences (Eugene)
•  BioCompiler: design motifs
Test
Build
Design
Solutions
GenoCADCFG[Caietal.,‘07]
12
Design
Solutions : Bio Compiler
l 
[email protected]
Motif-Based compilation where operators are
translated to motifs:
IPTG
not
green
IPTG
A
outputs
LacI
arg0
B
outputs arg0
GFP outputs
IPTG
LacI
A
B
GFP
13
Design
Optimization
[email protected]
IPTG
LacI
A
A
A
GFP
GFP
DeadCodeElimina9on
IPTG
LacI
B
DeadCodeElimina9on
IPTG
LacI
GFP
CopyPropaga9on
IPTG
LacI
B
A
GFP
Complex Example: 4-bit Counter
[email protected]
Delta1
Mike
Homemade
Design
Kilo
Thorn
Slippery
Kilo
Mike
Homemade
Sour
GFP
Delta1
GFP
Jealous
Sour
Thorn
Jealous
Slippery
Nuisance
Alfa1
Ripen
Handwriting
Xray
Nuisance
Hotel
Kilo
Hotel
Handwriting
Hotel
Alfa1
Homework
Ripen
Mend
Multiplication
Whichever
mKate
Homework
Yankee1
Xray
mKate
Multiplication
Paw
Cowardice
Motherly
Paw
Motherly
Congratulate
Overflow
Yankee1
Congratulate
Bribery
Overflow
Mend
Businesslike
Whichever
Bribery
Homework
Bribe
Bribery
Bribe
Canal
Cowardice
Bicycle
Bicycle
Conqueror
EYFP
Businesslike
Mat
EYFP
Breadth
Omission
Conqueror
Donkey1
Amongst
Madden
Omission
Weed
Madden
Cowardice
Weed
Descendant
Madden
Donkey1
Descendant
Coward
Coward
Breadth
Mat
Avoidance
Avoidance
Baggage
Amongst
Companionship
Amongst
Reproduction
Dox
Baggage
Fry
rtTA
Companionship
Quart1
Reproduction
Companionship
rtTA
Quart1
Fry
Spoon
Disappearance
Disappearance
Deceive
Deceive
Spoon
Lengthen
EBFP2
Canal
EBFP2
Lengthen
Op9mizedcompiler
alreadyoutperforms
humandesigners
Design
Constraints and Satisfiability
[email protected]
•  Why
–  Transforming high-level organism descriptions to DNA
sequences involves solving several constraint satisfaction
problems
•  Which parts to use
–  Parts are not compatible with each other, interact and interfere
•  Which method to use
–  Better noise reduction at the cost of higher metabolosmic load
•  Challenges
–  The data is missing or incomplete
–  Formulation of biological requirements as constraints
–  Domain is complex: identify necessary and sufficient
conditions
16
Solutions: MatchMaker
[email protected]
Design
How can we map the abstract parts in an AGRN to real parts?
Dox
Dox
EYFP
CFP
rtTA
rtTA
pHef1a
pTre
EYFP
lacI
CFP
pTre
pHef1a-
LacO1Oid
Feature Mapping: Satisfy the
Signal Matching: Pick parts that
constraints on the edges of the AGRN
are signal compatible accounting for
noise and preserving digital behavior
AGRN
Transcription
Factors
a0
Promoters
P0
a1
P1
x
P2
y
Feature Database
Transcription
Factors
q0
Find
Match
[Y]
[Z]
[Z]
Promoters
P0
q1
P1
q2
P2
q3
P3
q4
P4
P5
✗
NOT signal
compatible
[X]
[Y]
[X]
[Y]
[Z]
[Z]
✔
Signal
compatible
[X]
[Y]
[X]
17
Design
Optimization and Heuristic Search
[email protected]
•  Why:
–  Most of the constraint satisfaction problems at the design
stage are at least NP-Hard.
Test
Build
•  Feature matching, Signal Matching, Part Matching
–  Optimization is necessary at every stage
•  To manage complexity and the viability of the design
•  To make the most of shared resources that degrade over time:
e.g., chemicals, DNA
•  Challenges:
–  Objective functions for optimization can be very complex
–  Domain specific heuristics require a lot of biology expertise
18
Solutions
[email protected]
Build
•  Assembly Planner: Optimize BioBrick assembly
across multiple designs (Densmore)
–  Determines subpart sharing across goal parts.
–  Utilize three different heuristics to find an assembly
sequence that minimize assembly steps while
maximizing the sharing of intermediate products
ab c
bc
abc
b cd
bc
bcd
ab c
d
bc
abc bcd
19
Build
Design
Machine Learning
[email protected]
•  Why:
–  Wide range of applications need small synthetic
biology circuit to classify cell state.
•  Infected or not?
–  Learning from experience – success as well as failure
•  Currently just eyeballing Excel spreadsheets
Test
•  Challenges:
–  Big data
–  Incomplete and noisy data
–  Synthetic circuits might not be viable
20
Design
Solutions
[email protected]
•  Goal: Logic formula to identify HeLa cancer cells
•  Sensors: Micro RNA
HeLa=miR-1Hᴧ(miR-2H+ miR-3H)ᴧ
̴miR-4Lᴧ ̴miR-5Lᴧ̴miR-6L
WeissLab&
BenensonLab2011
•  Supervised learning and information-based approach to
design of cell-state detectors (Beal & Yaman 2012) (and(miRNA<’hsa-let-7c5.0)
(miRNA>’hsa-miR-130a0.3)
–  Which of the 600 markers are more informative?
–  What is the threshold?
(miRNA<’hsa-miR-29b27.0)
(miRNA<’hsa-miR-1540.3)
(miRNA<’hsa-miR-1970.3))
•  Identified a smaller set of markers
•  No false positives & moderate false
negatives
21
Planning and Scheduling
[email protected]
Build
•  Why
–  Complex dependencies with temporal and resource
constraints, non-deterministic outcomes at assembly and
test
–  Effective use of shared wet-lab, shared supplies, shared
equipment
–  Increasing precision of executed protocols
Test
•  Challenges:
–  Reliance on existing paper lab notebooks, software
–  Closed source lab management software
–  Complex scheduling including living cells that must be
maintained and measured at certain times
22
Solutions
[email protected]
•  Puppeteer (Vasilev et al. 2011)
Suite of tools for defining
Build
–  robot specific resources
–  robot specific protocols
–  producing executable scripts.
a
b
c
a+b
d
c+d
ab
cd
ab+cd
StockPlate
Dilu2onPlate
abcd
a
c
d
b
1.Assignmenttowells
abcd
Reac2onPlate
ab cd
2.Dilutewithbuffer
3.Combineassembly
step
23
Build
Design
Reasoning under Uncertainty
[email protected]
•  Why
–  Many biological processes are inherently stochastic
and not synchronized
–  Still many unknowns and noise in the data/models
•  Challenges:
Test
–  Some data collection methods destroy the cells
•  Can’t be used for data collection at later time points
–  Different trials of the same experiment may have
different results
–  Modeling enough to guarantee predictability
24
Design
Solutions
[email protected]
•  Empirical Quantitative Incremental Prediction (EQuIP)
–  accurate prediction of genetic regulatory network behavior from
detailed characterizations of their components. (Davidsohn 2015)
ACS Synthetic Biology
Letter
25
Design
Multi-Agent Systems
[email protected]
•  Why
–  Each cell can be viewed as an “agent”
Build
•  Cells can communicate using small molecules
–  Complex applications rely on interaction between
cells
•  Tissue differentiation, multi-cellular
organisms
Test
•  Challenges
[Weiss]
–  Communication rate and reaction time are slow
–  Need to model cell-to-cell communication for complex
aggregate behavior
–  Test and validation of differentiated cells
26
Solutions: PROTO
[email protected]
Design
•  Spatial Computing/ Aggregate programming
–  Cells are seen as spatial computers: Executing the same
program in reaction to the local sensor information.
High-Level
Bio-FocusedLanguage
Gene-cRegulatory
Network
(def band-detector (signal lo hi)
(and (> signal lo) !
(< signal hi)))
(let ((v (diffuse (aTc) 0.8 0.05)))!
(green (band-detect v 0.2 1)))
Compile&Op-mize
Realize
Simulate
[Beal&Bachrach,‘08]
[Weiss]
27
Robotics
[email protected]
•  Why
–  Complex assembly techniques, error prone
Build
•  “Magic touch” by some practitioners
–  Schedule flexibility and continuous availability
–  Increase consistency of results for repetitive, errorprone, time-sensitive procedures
Test
•  Challenges:
–  Sampling cells and pipetting small volumes
–  Effective human-machine collaborations
–  Assembly techniques that will require more
automation
28
Solutions
[email protected]
•  Automated Assembly (Densmore Lab)
Robot
Build
AssemblyPlanningTools
• 
• 
• 
Biomek3000
1-wayand8-waypiperetools
grippertomoveplatestomagnet
heatblock,shakerand-20Condeck
29
Summary & Conclusions
[email protected]
•  Synthetic biology has exciting applications:
–  curing cancer, diabetes, neglected diseases
–  environmental remediation
–  biofuels, nanofabrication
•  Current techniques and tools are very limited
•  AI has many techniques that can be applied to
Synthetic Biology
•  Workshop Goal: Build connections between the AI
community and the Synthetic Biology community.
Hopefully the first in a series of workshops!
30