Slides - Synthetic Biology @ BBN
Transcription
Slides - Synthetic Biology @ BBN
[email protected] AIforSynthe-cBiology FusunYaman,AaronAdler fusun,[email protected] BBNTechnologies,Cambridge,MA July9,2016 Workpar2allysponsoredbyDARPAundercontractHR0011-10-C-0168.TheviewsandconclusionscontainedinthisdocumentarethoseoftheauthorsandnotDARPAortheU.S.Government. ThisdocumentdoesnotcontaintechnologyorTechnicalDatacontrolledundereithertheU.S.Interna2onalTrafficinArmsRegula2onsortheU.S.ExportAdministra2onRegula2ons. The Vision – Programming Biology Like Computers [email protected] Inthe1950s,computers wereprogrammedby flippingswitchesandmoving cables… Synthe2cBiologyisat thatpointtoday… • Manualcrea2onofsimple“parts” • Tedious,adhoccrea2onofsimple systems Many iterations >1 year/system X • Manyopportuni2esinDesign– Build–Testlooptoimprove process • Goal:high-levelprogramto func2oningcells If detect explosives:! emit signal! If signal > threshold:! glow red! Simple systems 2 Why is this Important? [email protected] • DNA synthesis has been growing exponentially, but circuit (behavior) complexity has plateaued: DNA synthesis 1,080,000 583,000 Lengthinbasepairs 1,000,000 100,000 11 Circuit size 32,000 7,500 10,000 14,600 2,100 2,700 1,000 207 100 1975 1980 1985 1990 1995 2000 2005 2010 Year 2012 [Purnick&Weiss,‘09] Designingcomplexcircuitsishard! 3 *Samplingofsystemsinpublica2onswithexperimentalcircuits How to break the complexity barrier? [email protected] OrganismLevelDescrip-on Thisgapistoobig tocrosswitha singlemethod! Cells 4 Answer: Tool Chain [email protected] OrganismLevelDescrip-on High level simulator HighLevelDescrip-on If detect explosives: emit signal If signal > threshold: glow red Coarse chemical simulator AbstractGene-cRegulatoryNetwork Detailed chemical simulator DNAPartsSequence Collaborators: Ron Weiss Douglas Densmore AssemblyInstruc-ons Testing Cells [Bealetal,ACSSyn.Bio.2012] 5 Desperately Needs AI Op2miza2on [email protected] OrganismLevelDescrip-on High level simulator Constraint Sa2sfac2on Planning &Scheduling HighLevelDescrip-on If detect explosives: emit signal If signal > threshold: glow red Coarse chemical simulator AbstractGene-cRegulatoryNetwork Detailed chemical simulator Knowledge Representa2on Collaborators: Ron Weiss Douglas Densmore Expert Systems DNAPartsSequence Mul2-agent Sytems AssemblyInstruc-ons Machine Learning Reasoning Uncertainty Testing Cells [Bealetal,ACSSyn.Bio.2012] Robo2cs 6 Synthetic Biology Workflow [email protected] Map/behavior/specifica?on/ to/nucleic/acid/sequence(s)/ Design Transfect/transform/ cells,/assay/behavior/ Test Build Synthesize/assemble/ nucleic/acid/sequence(s)/ 7 Design Knowledge Representation (KR) [email protected] • Why: – Capture details about system designs for reproducibility – Curate useful part databases Test Build • Qualitative and quantitative information enabling automation – Data exchange formats for collaboration • Challenges – What is the minimum information required? – Detailed protocols descriptions required to reproduce results – Comparable measurement units and techniques are just emerging – Results are stochastic and vary with time – Many biological processes poorly understood 8 Test Build Design Solutions: SBOL [email protected] • Synthetic Biology Open Language (SBOL), standard describing genetic parts, devices, modules, systems – – – – Community effort much like W3C Ontology based approach Defines a standard visualization for designs Slowly gaining recognition and acceptance • ACS Synthetic Biology adopt SBOL for depiction and representation of genetic construct for recording and sharing 9 Design Solutions: Part Characterization TASBEcharacteriza2on Target system for characterization Build [email protected] Fluorescent color experiments FACS data Cell autofluorescence compensation Fluorescent color selection Normalized Dox transfer curve, colored by plasmid bin 4 10 3 10 IFP MEFL/plasmid Test Fluorescent color compensation Segmentation and subgroup statistics 2 10 1 10 0 10 −1 10 −1 10 0 10 1 2 10 10 3 10 4 10 [Dox] Normalization to per-plasmid behavior !"#$%&'"()*"+'( 10 Design • Why Build Knowledge-Based Systems • Challenges [email protected] – Design is knowledge intensive – Understand what might be going wrong – What to do and how to correct it Test – Greater precision required for computation – Managing complexity 11 [email protected] • Formal grammars (GenoCad) for verification – Is this a valid DNA sequence ? • Black listed bad sequences (Eugene) • BioCompiler: design motifs Test Build Design Solutions GenoCADCFG[Caietal.,‘07] 12 Design Solutions : Bio Compiler l [email protected] Motif-Based compilation where operators are translated to motifs: IPTG not green IPTG A outputs LacI arg0 B outputs arg0 GFP outputs IPTG LacI A B GFP 13 Design Optimization [email protected] IPTG LacI A A A GFP GFP DeadCodeElimina9on IPTG LacI B DeadCodeElimina9on IPTG LacI GFP CopyPropaga9on IPTG LacI B A GFP Complex Example: 4-bit Counter [email protected] Delta1 Mike Homemade Design Kilo Thorn Slippery Kilo Mike Homemade Sour GFP Delta1 GFP Jealous Sour Thorn Jealous Slippery Nuisance Alfa1 Ripen Handwriting Xray Nuisance Hotel Kilo Hotel Handwriting Hotel Alfa1 Homework Ripen Mend Multiplication Whichever mKate Homework Yankee1 Xray mKate Multiplication Paw Cowardice Motherly Paw Motherly Congratulate Overflow Yankee1 Congratulate Bribery Overflow Mend Businesslike Whichever Bribery Homework Bribe Bribery Bribe Canal Cowardice Bicycle Bicycle Conqueror EYFP Businesslike Mat EYFP Breadth Omission Conqueror Donkey1 Amongst Madden Omission Weed Madden Cowardice Weed Descendant Madden Donkey1 Descendant Coward Coward Breadth Mat Avoidance Avoidance Baggage Amongst Companionship Amongst Reproduction Dox Baggage Fry rtTA Companionship Quart1 Reproduction Companionship rtTA Quart1 Fry Spoon Disappearance Disappearance Deceive Deceive Spoon Lengthen EBFP2 Canal EBFP2 Lengthen Op9mizedcompiler alreadyoutperforms humandesigners Design Constraints and Satisfiability [email protected] • Why – Transforming high-level organism descriptions to DNA sequences involves solving several constraint satisfaction problems • Which parts to use – Parts are not compatible with each other, interact and interfere • Which method to use – Better noise reduction at the cost of higher metabolosmic load • Challenges – The data is missing or incomplete – Formulation of biological requirements as constraints – Domain is complex: identify necessary and sufficient conditions 16 Solutions: MatchMaker [email protected] Design How can we map the abstract parts in an AGRN to real parts? Dox Dox EYFP CFP rtTA rtTA pHef1a pTre EYFP lacI CFP pTre pHef1a- LacO1Oid Feature Mapping: Satisfy the Signal Matching: Pick parts that constraints on the edges of the AGRN are signal compatible accounting for noise and preserving digital behavior AGRN Transcription Factors a0 Promoters P0 a1 P1 x P2 y Feature Database Transcription Factors q0 Find Match [Y] [Z] [Z] Promoters P0 q1 P1 q2 P2 q3 P3 q4 P4 P5 ✗ NOT signal compatible [X] [Y] [X] [Y] [Z] [Z] ✔ Signal compatible [X] [Y] [X] 17 Design Optimization and Heuristic Search [email protected] • Why: – Most of the constraint satisfaction problems at the design stage are at least NP-Hard. Test Build • Feature matching, Signal Matching, Part Matching – Optimization is necessary at every stage • To manage complexity and the viability of the design • To make the most of shared resources that degrade over time: e.g., chemicals, DNA • Challenges: – Objective functions for optimization can be very complex – Domain specific heuristics require a lot of biology expertise 18 Solutions [email protected] Build • Assembly Planner: Optimize BioBrick assembly across multiple designs (Densmore) – Determines subpart sharing across goal parts. – Utilize three different heuristics to find an assembly sequence that minimize assembly steps while maximizing the sharing of intermediate products ab c bc abc b cd bc bcd ab c d bc abc bcd 19 Build Design Machine Learning [email protected] • Why: – Wide range of applications need small synthetic biology circuit to classify cell state. • Infected or not? – Learning from experience – success as well as failure • Currently just eyeballing Excel spreadsheets Test • Challenges: – Big data – Incomplete and noisy data – Synthetic circuits might not be viable 20 Design Solutions [email protected] • Goal: Logic formula to identify HeLa cancer cells • Sensors: Micro RNA HeLa=miR-1Hᴧ(miR-2H+ miR-3H)ᴧ ̴miR-4Lᴧ ̴miR-5Lᴧ̴miR-6L WeissLab& BenensonLab2011 • Supervised learning and information-based approach to design of cell-state detectors (Beal & Yaman 2012) (and(miRNA<’hsa-let-7c5.0) (miRNA>’hsa-miR-130a0.3) – Which of the 600 markers are more informative? – What is the threshold? (miRNA<’hsa-miR-29b27.0) (miRNA<’hsa-miR-1540.3) (miRNA<’hsa-miR-1970.3)) • Identified a smaller set of markers • No false positives & moderate false negatives 21 Planning and Scheduling [email protected] Build • Why – Complex dependencies with temporal and resource constraints, non-deterministic outcomes at assembly and test – Effective use of shared wet-lab, shared supplies, shared equipment – Increasing precision of executed protocols Test • Challenges: – Reliance on existing paper lab notebooks, software – Closed source lab management software – Complex scheduling including living cells that must be maintained and measured at certain times 22 Solutions [email protected] • Puppeteer (Vasilev et al. 2011) Suite of tools for defining Build – robot specific resources – robot specific protocols – producing executable scripts. a b c a+b d c+d ab cd ab+cd StockPlate Dilu2onPlate abcd a c d b 1.Assignmenttowells abcd Reac2onPlate ab cd 2.Dilutewithbuffer 3.Combineassembly step 23 Build Design Reasoning under Uncertainty [email protected] • Why – Many biological processes are inherently stochastic and not synchronized – Still many unknowns and noise in the data/models • Challenges: Test – Some data collection methods destroy the cells • Can’t be used for data collection at later time points – Different trials of the same experiment may have different results – Modeling enough to guarantee predictability 24 Design Solutions [email protected] • Empirical Quantitative Incremental Prediction (EQuIP) – accurate prediction of genetic regulatory network behavior from detailed characterizations of their components. (Davidsohn 2015) ACS Synthetic Biology Letter 25 Design Multi-Agent Systems [email protected] • Why – Each cell can be viewed as an “agent” Build • Cells can communicate using small molecules – Complex applications rely on interaction between cells • Tissue differentiation, multi-cellular organisms Test • Challenges [Weiss] – Communication rate and reaction time are slow – Need to model cell-to-cell communication for complex aggregate behavior – Test and validation of differentiated cells 26 Solutions: PROTO [email protected] Design • Spatial Computing/ Aggregate programming – Cells are seen as spatial computers: Executing the same program in reaction to the local sensor information. High-Level Bio-FocusedLanguage Gene-cRegulatory Network (def band-detector (signal lo hi) (and (> signal lo) ! (< signal hi))) (let ((v (diffuse (aTc) 0.8 0.05)))! (green (band-detect v 0.2 1))) Compile&Op-mize Realize Simulate [Beal&Bachrach,‘08] [Weiss] 27 Robotics [email protected] • Why – Complex assembly techniques, error prone Build • “Magic touch” by some practitioners – Schedule flexibility and continuous availability – Increase consistency of results for repetitive, errorprone, time-sensitive procedures Test • Challenges: – Sampling cells and pipetting small volumes – Effective human-machine collaborations – Assembly techniques that will require more automation 28 Solutions [email protected] • Automated Assembly (Densmore Lab) Robot Build AssemblyPlanningTools • • • Biomek3000 1-wayand8-waypiperetools grippertomoveplatestomagnet heatblock,shakerand-20Condeck 29 Summary & Conclusions [email protected] • Synthetic biology has exciting applications: – curing cancer, diabetes, neglected diseases – environmental remediation – biofuels, nanofabrication • Current techniques and tools are very limited • AI has many techniques that can be applied to Synthetic Biology • Workshop Goal: Build connections between the AI community and the Synthetic Biology community. Hopefully the first in a series of workshops! 30