Wellness Chip - Bio
Cambridge Healthtech Media Group
Indispensable Technologies Driving Discovery, Development, and Clinical Trials
www.bio-itworld.com
JULY | AUGUST 2011 • VOL. 10, NO. 4

Larry Gold’s ‘Wellness Chip’
SomaLogic CEO targets protein biomarkers for common diseases
Page 25

NEW iDEAs IN DATA VISUALIZATION 10
TALKING BIO-IT AT BGI 8
THE CLINICAL BUZZ AT DIA 23
FUTURE OF CLINICAL TRIALS SURVEY 39
SPECIAL REPORT: 2011 BEST PRACTICES AWARDS 31

Contents
Special Report
Bio-IT World’s 2011 Best Practices Awards
31 The Select Six Best Practices
32 Enrollment Modeling Results in Productivity Gains for Merck
33 Novartis’ Open Source Clinical Imaging Platform
34 GSK’s Helium Rises to the Top
35 Accelrys Pipeline Pilot Guides ONT’s Nascent NGS Data Handling
36 CliniWorks Provides Patients as a Service
37 CDD’s Tuberculosis Collaboration Tool
38 2011 Best Practices Entries

Up Front
8 Journal, Cloud, and Tool News from BGI
9 Eric Schadt Leads ‘Multiscale’ Institute at Mount Sinai
10 Illumina Showcases New Visions in Genomic Interpretation
11 Briefs
12 Ignite Institute Finds a Match at Fox Chase

The Skeptical Outsider
14 Big-Bucks Biology’s Broken Business Model

The Bush Doctrine
15 Limits to Drug Discovery Collaboration?

Insights | Outlook
16 The Cloud and Next Generation Sequencing

Clinical Trials
22 Euphoria over EHR/EDC Interoperability May be Misplaced
23 DIA 2011: Compliance, Collaboration and the Cloud

Computational Biology
25 Larry Gold’s Wellness Chip Detects Disease Biomarkers in Blood
28 Open Source Solutions for Image Data Analysis

Next-Gen Data
40 Open Source Genome Analytics for All
41 Ion Torrent Offers Sequencing at ‘Biblical Proportions’
42 Charges Fly over Ion Torrent Licenses

IT / Workflow
45 Gordon Puts Flash into Supercomputing
47 NVIDIA Unveils New Flagship GPU Processor
48 Panasas ActiveStor Storage Goes to 11

The Russell Transcript, by John Russell
49 DREAM6 Breaks New Ground

In Every Issue
5 First Base, by Kevin Davies: Best of Best Practices; an Asian Engagement
6 Company and Advertiser Index
7 On Deck
50 Educational Opportunities

Special Advertising Section: BEST New Products & Services (begins on page 18)

Cover photograph by Matt Staver

First Base
Summer Heat
KEVIN DAVIES

An Asian Engagement
Following our successful foray into Europe in 2009, we are thrilled to announce that we will be holding our first full Bio-IT World conference in Asia next summer (June 5-8, 2012). We’ve selected Singapore as the destination, and the spectacular 57-floor Marina Bay Sands convention center (and casino) as the venue. (I’m still trying to persuade my colleagues to convene one of the pre-conference workshops in the rooftop Infinity pool. We’ll see...)

The move to Asia isn’t just a reflection of the gratifying growth in attendees, exhibitors and sponsors at our flagship conference in Boston and the European event. From talented software start-ups in India to the emerging power of BGI in China (see page 8), the Asian region is having an unprecedented impact on life sciences and biopharma, and is ripe with opportunities for partnerships and collaboration.

We’ve been hearing from many regular attendees at Bio-IT World Expo how much they would like to reach the Asian scientific community under the right conditions. We intend to provide that forum for genuine technological and scientific exchange. We have put together a very impressive advisory board and we are now accepting speaker proposals at our website: www.bio-itworldasia.com.
We welcome your input and contributions. In the meantime, we hope you’ll make plans to join us at Bio-IT World Europe this October (www.bio-itworldeurope.com).

Each year around this time, we like to showcase the winning entries in our annual Best Practices Awards competition, which has been held nearly every year since 2003. We invite a diverse group of judges to evaluate and rank dozens of entries from academia and industry highlighting best practices impacting data management in life sciences, however that might be defined. Our winners for 2011 were announced at the Bio-IT World Expo back in April.

Our six winners are discussed elsewhere in this issue (see pages 31-38): CliniWorks’ AccelFind; Collaborative Drug Discovery’s TB Database; GlaxoSmithKline’s delightfully named “Helium in Excel” (nominated by Ceiba Solutions); Merck’s Clinical Enrollment Optimization (nominated by DecisionView); Novartis’ ImagEDC solution; and Oxford Nanopore’s work with Accelrys on the Pipeline Pilot NGS Collection. It’s not feasible to pay tribute to every entry, but a few should be noted for making the judges’ task particularly difficult this year.

The strongest of the four main categories this year was Knowledge Management. Andrew Su (Scripps Institute, formerly at the Genomics Institute of the Novartis Research Foundation, San Diego) submitted the Gene Wiki, a true collaborative project attracting 4 million views/month to help annotate and disseminate genome data. Another highly praised entry was Pfizer’s Oyster Imaging Collaborative Portal, designed in partnership with Radiant Sage Ventures, which has significantly improved image sharing and data access.

In the IT Infrastructure category, judges also liked the UCLA Neuroimaging Lab’s unified storage infrastructure project, nominated by data storage vendor Isilon.
Partnering with Accelrys, the London School of Hygiene and Tropical Medicine presented a high-throughput parasite imaging tool, while the Smithsonian Institution offered a LIMS tool for the Moorea Biocode project, providing barcode sequencing for 40,000 tropical species.

Oxford Nanopore’s win in the Research and Discovery category is somewhat ironic, given how stealthy the British sequencing company has been. The Brits edged out some tough competition, including the Food and Drug Administration’s drug toxicity tool for animal testing, which one judge deemed “a dramatic step forward.”

Many other entries would have competed strongly for top honors if they had featured more real-world deployments and collaborations. We’ll be announcing further details on the scope and timing of the 2012 Best Practices Awards shortly.

Company Index
23andMe: 40
454 Life Sciences: 42
Abbott Laboratories: 38
Accelrys: 5, 11, 35, 38
Agilent: 26
Beijing Institutes of Life Science: 10
Biomatters: 38
BrainCells: 38
BrainLab: 29
Brigham and Women’s Hospital: 28
Bristol-Myers Squibb: 26
British Columbia Cancer Research Centre: 10
Broad Institute: 10, 28
Ceiba Solutions: 34, 38
ChemAxon: 34
Children’s Hospital Boston: 13
ClearCanvas: 29
ClearTrial: 38
Clinical Data Interchange Standards Consortium: 22
Clinical Ink: 22
CliniWorks: 36, 38
Collaborative Drug Discovery: 37, 38
Dana-Farber Cancer Institute: 10
DecisionView: 23, 32, 38
DIA: 38
Drug Safety Alliance: 23
Enlis Genomics: 10, 11
ePharmaSolutions: 38
ERT: 38
FastTrack: 22
FDA Division of Animal Research: 38
Food and Drug Administration: 23
Fox Chase Cancer Center: 12
Fred Hutchinson Cancer Center: 40
Fudan University: 10
Genomatix: 11
Genome Institute of Singapore: 40
GlaxoSmithKline: 23, 34, 38
Harvard University: 11
Helicos Biosciences: 13
IBM: 49
Ignite Institute for Individualized Health: 12
Illumina: 13, 9
ImmunoProfiles: 11
Insights: 14
Institute for Molecular Biosciences: 41
IO Informatics: 38
Ion Torrent: 13, 42
Isilon: 5
Janssen Pharmaceutica: 38
J. Craig Venter Institute: 40
Life Technologies: 12, 42
London School of Hygiene and Tropical Medicine: 38
Max Planck Institute: 47
Medidata: 22
Merck: 23, 9, 32, 36
National Cancer Institute: 22, 38
Navigenics: 40
New England Biolabs: 26
Nextrials: 22
Novartis: 33, 36, 38
Novartis Institute for Biomedical Research: 38
NVIDIA: 47
Ochsner Health System: 38
OpenEye: 47
Oracle: 38
Orion Health: 38
Otsuka: 26
Oxford Nanopore: 16
Oxford Nanopore Technologies: 35, 38
Pacific Biosciences: 9
Panasas: 48
Parexel: 36
Partek: 11
Pennsylvania State University: 10, 11
Pfizer: 5, 26, 38
Phlexglobal: 38
PHT Corporation: 11
PPD: 38
ProtonMedia: 23
Queensland Centre for Medical Genomics: 41
Quest Diagnostics: 25
Radiant Sage Ventures: 5
Recombinant Data Corp: 38
Roche: 23, 38
Rota Consortium, South Africa: 38
SAFE-BioPharma Association: 38
ScaleMP: 38
Scripps Institute: 5
Selventa: 38
Smithsonian Institution: 5, 38
SomaLogic: 25
Stanford University: 10, 42
Strand Life Sciences: 10, 38
Strand Scientific Intelligence: 11
Synexus Clinical Research: 38
The Centre for Proteomic and Genomic Research: 11
TIBCO: 34
UCLA: 5
UCSF: 9
University Hospitals of Geneva: 29
University of California, San Diego: 11
University of California, Santa Cruz: 42
University of Colorado: 26
University of Delaware: 11
University of Florida: 38
University of Georgia: 11
University of Maryland: 10
University of Texas in Austin: 42
University of Texas Southwestern Medical Center at Dallas: 38
University of Tübingen: 11
VIB: 11

EDITOR-IN-CHIEF: Kevin Davies, (781) 972-1341
MANAGING EDITOR: Allison Proffitt, (617) 233-8280
ART DIRECTOR: Mark Gabrenya, (781) 972-1349
VP BUSINESS DEVELOPMENT: Angela Parsons, (781) 972-5467
VP SALES, LEAD GENERATION PROGRAMS: Alan El Faye, (213) 300-3886
ACCOUNT MANAGER, ACCOUNTS A–K: John J. Kistner, (781) 972-1354
ACCOUNT MANAGER, ACCOUNTS L–Z: Tim McLucas, (781) 972-1342
CORPORATE MARKETING COMMUNICATIONS DIRECTOR: Lisa Scimemi, (781) 972-5446
PROJECT/MARKETING MANAGER: Lynn Cloonan, (781) 972-1352
ADVERTISING OPERATIONS COORDINATOR: Stephanie Cline, (781) 972-5465
DESIGN DIRECTOR: Tom Norton, (781) 972-5440

Contributing Editors
Michael Goldman, Karen Hopkin, Deborah Janssen, John Russell, Salvatore Salamone, Deborah Borfitz, Ann Neuer, Tracy Smith Schmidt

Advertiser Index
Bio-IT World & Bio-IT World Europe Conference & Expo: 2-3 (Bio-ITWorldExpo.com, bio-itworldexpoeurope.com)
Bio-IT World Asia: 17 (Bio-ITWorldAsia.com)
Bio-IT World’s Cloud Computing Summit: 52 (Bio-ITCloudSummit.com)
BioBase: 21 (biobase-international.com)
CHI Professional Marketing Services: 21 (www.bio-itworld.com/BioIT/WhitePapers.aspx)
Clinical Ink: 20 (www.clinicalink.com)
DiscoveRx: 18-19 (www.discoverx.com)
Educational Opportunities: 50-51 (Bio-ITWorld.com)
Insight Pharma Reports: 30 (InsightPharmaReports.com)

This index is provided as an additional service. The publisher does not assume any liability for errors or omissions.

VOLUME 10, NO. 4

Editorial, Advertising, and Business Offices: 250 First Avenue, Suite 300, Needham, MA 02494; (781) 972-5400

Bio-IT World (ISSN 1538-5728) is published bi-monthly by Cambridge Bio Collaborative, 250 First Avenue, Suite 300, Needham, MA 02494. Bio-IT World is free to qualified life science professionals. Periodicals postage paid at Boston, MA, and at additional post offices. The one-year subscription rate is $199 in the U.S., $240 in Canada, and $320 in all other countries (payable in U.S. funds on a U.S. bank only).

POSTMASTER: Send change of address to Bio-IT World, 250 First Avenue, Suite 300, Needham, MA 02494. Canadian Publications Agreement Number 41318023. CANADIAN POSTMASTER: Please return undeliverables to PBIMS, Station A, PO Box 54, Windsor, ON N9A 6J5.

Subscriptions: Address inquiries to Bio-IT World, 250 First Avenue, Suite 300, Needham, MA 02494, 888-999-6288.

Reprints: Copyright © 2011 by Bio-IT World. All rights reserved. Reproduction of material printed in Bio-IT World is forbidden without written permission.
For reprints and/or copyright permission, please contact John J. Kistner, (781) 972-1354, or Tim McLucas, (781) 972-1342.

Advisory Board
Jeffrey Augen, Mark Boguski, Steve Dickman, Kenneth Getz, Jim Golden, Andrew Hopkins, Caroline Kovac, Mark Murcko, John Reynders, Bernard P. Wess Jr.

Cambridge Healthtech Institute
PRESIDENT: Phillips Kuhl

Contact Information
250 First Avenue, Suite 300
Needham, MA 02494

Follow us on Twitter, LinkedIn, and Facebook
http://twitter.com/bioitworld
www.linkedin.com/groupRegistration?gid=3141702
www.facebook.com/bioitworld

On Deck
Coming in Bio-IT World’s September/October Issue

Special Report: Laying the Genome Informatics Pipeline
Bio-IT World editors present a series of compelling stories, features, and interviews on the latest advances in genome informatics and interpretation:
• How are bioinformaticians preparing for the onslaught of clinical genomics?
• What new commercial and open source tools and platforms are driving next-gen sequencing analysis?
• How are some of the largest tool companies and the nimblest software providers adapting to the NGS market?
• What inspires the architect of the world’s largest sequencing factory?

Also in the September/October issue:
• IT / Workflow: Putting an IT Infrastructure in the Cloud
• Clinical Trials: The evolution of a clinical CRO

For advertising and sponsorship opportunities in this exciting Special Report contact:
Accounts A–K: John J. Kistner, (781) 972-1354
Accounts L–Z: Tim McLucas, (781) 972-1342

And the Winner Is...
We had an overwhelming response to our reader survey in the last issue of Bio-IT World. Your responses will be very helpful as we craft our digital strategy in the coming months. The winner of the Apple® iPad® was Michael Chin from Sanofi-Aventis. Thank you for your time and feedback, and we invite you to take part in this issue’s survey on page 39.
Kevin Davies
Editor-In-Chief

*Apple is not a participant or supporter of this promotion.

Up Front News

Big News from BGI
BGI debuts new journal, cloud services and tools
BY ALLISON PROFFITT

SHENZHEN, CHINA - At the BioIT APAC conference hosted by BGI in Shenzhen in July*, researchers announced two new Cloud-based software-as-a-service offerings for next-gen data analysis and several new open source assembly tools, and launched a new journal for next-gen data.

Structural variation requires a “totally different scale of technology,” said Yingrui Li, explaining that structural variations are more unique to individuals than SNPs. He called for more de novo sequencing, suggesting that whole genome de novo assembly would offer a more comprehensive structural variant map.

Li’s message aligned well with updates to the SOAP algorithms. The Short Oligonucleotide Analysis Package gained a de novo short reads assembler (SOAPdenovo 2), an alignment tool (SOAP3-GPU/CPU), a graph-based indel finder (SOAPindel), and an assembly-based structural variation finder (SOAPsv). These updates join the existing alignment tool (SOAPaligner/soap2) and re-sequencing consensus sequence builder (SOAPsnp).

SOAP3 reflects improvements in two branches of the algorithm. GPU-accelerated alignment with BWT could take 2.6 seconds to perform exact matching for 1 million 100bp reads. SOAP3-CPU shows improved accuracy over SOAP2 with similar speed. SOAPdenovo 2 is designed to assemble human-sized genomes, and reflects algorithmic changes to contig construction, scaffolding, and gap closure. The SOAP toolkit is available at http://soap.genomics.org.cn/.

Flex Time
Hecate and Gaea (named for Greek gods) are two “flexible computing” solutions for de novo assembly and genome resequencing that make the most of the new SOAP algorithms.
These are “cloud-based services for genetic researchers” so users don’t need to “purchase your own cloud clusters,” said Evan Xiang, part of the flexible computing group at BGI Shenzhen (see “BGI Cloud on the Horizon,” Bio-IT World, Jan 2011). Hecate will do de novo assembly, and Gaea will run the SOAP2, BWA, Samtools, DIndel, and BGI’s realSFS algorithms. Xiang expects an updated version of Gaea to be released later this year with more algorithms available.

*Bio-IT APAC Conference & Expo, Shenzhen, China, July.

Flexible computing, explained Xiang, is a more efficient cluster architecture than traditional Cloud. Jobs of different types are grouped on the cluster to make the most of computing power and address scalability issues. For instance, CPU-intensive jobs are grouped; memory-intensive jobs are grouped; and input/output-intensive jobs are grouped. Both the Hecate and Gaea services will run on the BGI compute cluster because “Amazon is slow,” Xiang said. Running the services on an in-house cluster also alleviates any internet access issues.

Hecate is based on a series of distributed algorithms to recognize and simplify non-branching repeat-free regions of the genome, correct errors and resolve the ambiguous bubbles and short repeats, together with the distributed graph shrinkage algorithms to construct a linear DNA sequence. Based on BGI’s SOAPdenovo and SOAP2 algorithms, Hecate is more scalable than those algorithms alone.

Xiang presented results from speed comparisons showing significant cost and time savings using Hecate for de novo assembly. Running SOAPdenovo on a single server for 70 hours resulted in 80% genome coverage at a hardware price of $150,000. Using 96 Hecate cores, the genome coverage increased to 84% in 42 hours at a price of $60,000.
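The job grouping Xiang describes for flexible computing, where jobs with the same dominant resource need are placed together, can be illustrated with a small sketch. This is not BGI’s scheduler: the `Job` fields, the demand scores, and the job names below are hypothetical, chosen only to show the classification step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A hypothetical job description; the scores are relative demands in [0, 1]."""
    name: str
    cpu: float
    memory: float
    io: float


def dominant_resource(job: Job) -> str:
    """Return the resource ("cpu", "memory", or "io") the job needs most."""
    demands = {"cpu": job.cpu, "memory": job.memory, "io": job.io}
    return max(demands, key=demands.get)


def group_jobs(jobs: list[Job]) -> dict[str, list[str]]:
    """Group job names by dominant resource, preserving submission order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for job in jobs:
        groups[dominant_resource(job)].append(job.name)
    return dict(groups)


# Illustrative workload only; the scores are assumptions, not measurements.
jobs = [
    Job("alignment", cpu=0.9, memory=0.4, io=0.3),
    Job("assembly", cpu=0.5, memory=0.95, io=0.2),
    Job("format_conversion", cpu=0.2, memory=0.1, io=0.8),
    Job("snp_calling", cpu=0.8, memory=0.3, io=0.4),
]

groups = group_jobs(jobs)
for resource, names in groups.items():
    print(f"{resource}-intensive: {', '.join(names)}")
```

A real scheduler would also need to decide which nodes each group runs on; the sketch stops at the grouping itself, which is the part the article describes.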
Gaea is designed to distribute resequencing computation to a cluster of nodes based on the Hadoop Streaming framework with personalized algorithm interfaces for SOAP and BWA. For the current version of Gaea (v1.2), Xiang reported speed increases of 75x for SOAP2 and 90x for BWA using 100 cores. At 400 cores those numbers rose to 300x and 346x speed increases compared to running either algorithm on a single core. Xiang expects Gaea v2.0 to see further improvements. Gaea is also optimized for a biomarker analysis toolkit that includes SOAPsnp, DIndel and realSFS for SNP calling, indel calling, and gap alignment.

Data Citing
Also at the event BGI formally announced its new journal, GigaScience, which will launch in November 2011. Co-published by BGI and BioMed Central, GigaScience is an integrated journal and database, said Scott Edmunds, editor of the journal. GigaScience plans to stress usability and reproducibility in its review process. The journal will solicit “big data” studies and hopes to provide a forum for dealing with the difficulties of handling large-scale data from all areas of the life sciences.

In addition to traditionally peer-reviewed papers, GigaScience will publish citable datasets, each with permanent digital object identifiers (DOIs). Datasets will be hosted on the BGI cloud along with the SOAP toolkit and other BGI products. This will facilitate tool testing, Edmunds said, as the tools and data are in the same place.

Having DOIs for datasets will enable researchers to cite datasets used in their work and, Edmunds hopes, speed data release and dissemination. “Dealing with data is not just about storage,” he said, “but dissemination too.” To prime the pump, BGI released eight animal genomes, each with a DOI that enables the dataset to be freely used by researchers and then cited in publications.
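As a rough reading of the Gaea speed-ups Xiang reported earlier in this article, dividing each speed-up by the number of cores gives the parallel efficiency, that is, the fraction of ideal linear scaling achieved. The sketch below uses only the four figures quoted above; the function and variable names are illustrative assumptions and are not part of Gaea.

```python
# Reported Gaea v1.2 speed-ups relative to a single core, keyed by
# (algorithm, number of cores). Figures are taken from the article text.
reported_speedups = {
    ("SOAP2", 100): 75,
    ("BWA", 100): 90,
    ("SOAP2", 400): 300,
    ("BWA", 400): 346,
}


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling: 1.0 would mean a perfect N-fold speed-up."""
    return speedup / cores


efficiencies = {
    key: parallel_efficiency(speedup, key[1])
    for key, speedup in reported_speedups.items()
}

for (algorithm, cores), efficiency in efficiencies.items():
    print(f"{algorithm} on {cores} cores: {efficiency:.1%} of linear scaling")
```

On these figures, SOAP2 holds at 75% efficiency from 100 to 400 cores, while BWA slips from 90% to about 86.5%.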
Partnering on Multiscale Biology
PacBio CSO Eric Schadt to lead a ‘Multiscale’ Institute at Mount Sinai
BY KEVIN DAVIES

Pacific Biosciences has announced a partnership with Mount Sinai School of Medicine (MSSM) in New York to create the Institute for Genomics and Multiscale Biology. The director of the new institute will be PacBio’s chief science officer, Eric Schadt, who is retaining his non-operational position at PacBio while moving to New York this summer to run the center.

The institute will be the hub of genomics research at MSSM, collaborating with 13 other translational and core facilities at Mount Sinai, and incorporating a user facility featuring PacBio’s technology. The MSSM SMRT Biology facility will be equipped with R&D versions or “Astros” of the PacBio instruments. They will be available for use by institute researchers as well as other collaborators in the eastern half of the country.

MSSM Dean Dennis Charney says he expects to invest more than $100 million in the new institute over the next 5-6 years, as part of a $1-billion capital campaign for MSSM. “We’ve raised $750 million and it’s ahead of schedule,” said Charney, with the initial funding coming from philanthropy.

Charney believes that the large-scale generation and integration of multiple sources of biological data, integrated with clinical information, will expand MSSM’s ability to characterize disease and ultimately benefit patients. “The Institute for Genomics and Multiscale Biology will be at the forefront of the revolution in genetics and genomic sciences, which will fundamentally change the practice of medicine,” he added.

“It’s not an exclusive relationship with PacBio,” says Charney. “We have Illumina machines and so forth. You can’t stick with one technology platform. So we’re going to be active in all of those platforms. However, I can see that Illumina might say, ‘It’s fine if you buy our commercial machines, but we’re not going to share our latest next-gen machine.’ That may be a consequence of this, but we’re certainly going to be buying other commercial machines.”

Dual Role
The new institute is an expansion of MSSM’s Institute of Genomics, which was formed a couple of years ago. Charney had been recruiting a new leader for the institute to succeed medical geneticist Robert Desnick, who is stepping down as director and as the Chair of the MSSM genetics department. Schadt is taking on both positions.

Charney began recruiting Schadt as the new head of the department. “I went after him big time,” he says. It quickly became clear that “a partnership with PacBio would be good for Mount Sinai, apparently good for PacBio, and was something that Eric really endorsed. To me, it seemed like a win-win,” he continued. “We get access to PacBio technology, which we’re all very excited about. And they get, in a sense, access to the great research we’re doing here, the patient populations that could be like a testing site.”

Schadt waves off concerns that other next-generation sequencing and technology vendors might not be inclined to collaborate with the new MSSM multiscale institute team. “Other companies will be very hungry to work with the institute at MSSM,” Schadt told Bio-IT World, “because our vision is to become a dominant force in integrating data from many technologies and developing predictive models that impact physician and patient decision making. This will help grow the market and all will benefit, so there will be strong incentive for many companies to be part of that, or watch it from the outside!”

Patient Partner
For its part, PacBio had been seeking a potential academic partner for more than a year to help develop new applications for its technology and gain access to patients to move the technology into the clinic. MSSM stepped into the picture after talks with UCSF stalled.

Sounding themes around the integration of genomic, expression, and clinical data that have typified his earlier career at Rosetta Inpharmatics and Merck (see “Eric Schadt’s Integrative Approach to Predictive Biology,” Bio-IT World, Oct 2008), Schadt said: “Multiscale data integration, including genomic, expression, metabolite, protein, and clinical information, will ultimately define the future of patient care. With our intent to collaborate in areas such as newborn screening for rare genetic disorders, infectious diseases and cancer, we hope to accelerate this revolution, starting by integrating clinical data with previously untapped biological information to build new computational models for predicting human disease.”

“Multiscale Biology” is a term that Schadt coined. The way Charney understands the term, “we’re talking about systems genetics, integrative genetics, systems biology, which we’re very strong in at Mount Sinai. The idea of one gene/one disease or looking at genes in isolation from pathways doesn’t make sense. That’s totally in line with the way we’re doing things.”

Illumina Showcases New Visions in Genomic Interpretation
iDEA conference names data viz winners
BY ALLISON PROFFITT

SAN DIEGO - Illumina CEO Jay Flatley kicked off the iDEA (Illumina Data Excellence Awards) conference* with a striking prediction: we would be at the $1,000 genome, all in, within three to five years, and there was no need for any new technology. Then he passed the microphone to the dozen finalists competing in the inaugural iDEA awards challenge.

Scott Kahn, chief informatics officer at Illumina, assured me over lunch the next day that Flatley’s promises didn’t scare him. “Being the one responsible for the informatics part, I’m not shuddering,” he said. “Jay said $1,000 all in. ‘All in’ means everything to do with the sample, everything to do with the analysis.
Go back two years, what everyone did was take the images off the machine and they had this huge informatics pipeline. They spent probably more than $1,000 to $2,000 just in raw CPU cycles to do the analysis and image processing.”

Flatley insisted that the next-generation sequencing field needs improved technologies, not new ones; faster cameras, for instance, not different ones. “Jay’s alluding to the fact that… you’re going to see more of the downstream alignment and variant calling move onto the instrument because it can, because it reduces costs and the time to result,” said Kahn. He was adamant that Flatley’s “all in” doesn’t include the cost of interpretation. “It’s just the genome; that’s why I’m not shuddering.”

Even as the technical costs continue to fall and sequencing gets faster (Illumina’s mini MiSeq, which launches later this year, will generate more than one gigabase per run in about a day), Kahn acknowledges that there are still huge problems that need to be addressed in the interpretation and use of genomic data.

* iDEA Challenge, San Diego, June.

New Ideas
To help bridge the gap between data and interpretation, Illumina hosted its first iDEA challenge and conference. Announced in May 2010, the competition was designed to challenge both commercial and academic entrants to develop new and creative visualization and data analysis techniques. From 30 entries, judges selected 12 finalists based on technical merit—7 academic entries, 5 commercial (see, “Top Twelve”). Each finalist gave a presentation and answered questions at the conference.

Just before the winners were announced, Kahn said he was pleased with the quality of talks and the entries, and said he had learned a couple of new things. “Some of the methods are very cool, good ideas… I like the idea of a nonlinear representation of the genome, like the [Strand Life Sciences Avadis entry] elastic browser stuff. I liked the talk that Stephan Schuster [from Penn State] gave on inGAP… The challenge the judges are going to have is how to weight entries that explore very different variables.”

“There are clearly some entries that are scratching an itch that people didn’t know was there, and there are others that are very novel approaches to solving problems that other people have solved,” Kahn said. “You can solve an unmet problem incompletely, but at least it’s a partial solution. How do you weigh that against, ‘Here’s something that takes a capability and enhances it significantly?’”

Kahn had his own ideas, but the iDEA entries were evaluated by an independent group of judges, including Steven Jones (Genome Science Centre at the British Columbia Cancer Research Centre, Canada), Jared Maguire (Broad Institute), John Quackenbush (Dana-Farber Cancer Institute), Steven Salzberg (University of Maryland), Gavin Sherlock (Stanford University), and Bang Wong (Broad Institute).

[Figure: GenomeRing helps visualize indels and SNPs compared to a master genome, with each color representing a genome’s progression on a single chromosome or across several chromosomes.]

Top Twelve iDEAs
Enlis Genomics
Genomatix Software
Harvard University, Seqeyes
ImmunoProfiles
Partek
Pennsylvania State University, inGAP
Strand Scientific Intelligence, Avadis NGS
University of California, San Diego, STAR Genome Browser
University of Delaware
University of Georgia, DawgPack
University of Tübingen, GenomeRing
VIB, GenomeView

Briefs

PROTEIN-PROTEIN AGGREGATION PREDICTION
Accelrys is hoping to boost scientific collaboration and efficiency as well as address a major challenge in the development of biotherapeutics with the latest release of Discovery Studio. The new release incorporates what Accelrys says is the first commercially available software for predicting protein-protein aggregation to advance biotherapeutics research. It enables protein engineers to identify the location of regions on antibodies prone to aggregation and then predict substitutions to improve molecular stability.

SOUTH AFRICAN PHARMACOGENOMICS
The Centre for Proteomic and Genomic Research (CPGR) in South Africa has joined forces with the Division of Human Genetics at the University of Cape Town (UCT), and the Pharmacogenomics for Every Nation Initiative (PGENI), to map genetic traits underlying the efficacy of drug treatments in Southern African populations. The collaboration will form a regional PGENI Centre of Excellence and, following a pilot phase, hopes to conduct large-scale pharmacogenetic studies to correlate the prevalence of DNA polymorphic traits in local populations with the efficacy of commonly prescribed drugs.

PATIENT REPORTING PREDICTIONS
PHT Corporation is freely disseminating its ePRO Modality Tool, which enables sponsors and CROs to determine which of five methods—smartphone, tablet, Internet, digital pen and hand held device—is most effective for collecting ePRO data based on their specific study protocol and questionnaires. By publishing the tool and lifting all previous restrictions for its use, PHT hopes to promote further ePRO adoption.

The Envelope Please
The judges awarded sculptures from glass artist Barry Entner to the six winners. One of Kahn’s favorites, inGAP from Pennsylvania State University, in conjunction with Fudan University and the Beijing Institutes of Life Science (BioLS), Chinese Academy of Sciences, won the overall academic award and a $50,000 grant from Illumina to further develop the software. inGAP, an Integrated Next-gen Genome Analysis Pipeline, started in 2007 as a SNP calling tool for Sanger sequence data, and now includes aligners, detects SNPs, indels and structural variation, and does comparative genome assembly all with a graphic user interface. The award grant will be used to extend inGAP to metagenomics and transcriptomics studies, said Fangqing Zhao at BioLS.

Enlis Genomics received the overall award in the commercial category and a one-year co-marketing agreement with Illumina. Another entry that impressed Kahn, Enlis hopes to enable “point-and-click genomics” for biologists rather than bioinformaticians. “Existing software packages have been focused on the bioinformatic tasks of assembling a genome, but our software is the first commercial package to recognize that for many biologists, the work of connecting genomic data to biology starts after variants have been called,” said Devon Jensen, Enlis’ founder. Fast algorithms enable variation filtering and genome comparisons, and Enlis’ .genome file format wraps all genomic data into a single compact and efficient file.

GenomeRing from the University of Tübingen (Germany) and Partek won awards for the most creative algorithms. GenomeRing is an interactive tool to visualize indels, SNPs, and other changes in dozens of genomes in a circular, rather than linear, view by constructing a “SuperGenome” and using the structure to compare different genomes. “We feel really honored,” said Kay Nieselt, head of the integrative transcriptomics group. “We now hope to be able to apply for new funding in the area of Visual Analytics in Bioinformatics. We are very motivated to continue our work to create new innovative algorithms and visualizations in the area of next-generation sequencing technologies.”

Partek debuted Gene-Specific Modeling for the iDEA Challenge along with the company’s Flow, Genomics Suite, and Pathway products. Gene-Specific Modeling takes the position that one model will not fit all genes; for example, age affects some genes but not others. By using the algorithm to select the best model for each gene, users can identify which and how many genes are affected by which factors and make more accurate statistical analyses thanks to better model fit.
“For years Partek has been recognized as a leader in making powerful statistical methods easily accessible to medical researchers,” said Tom Downey, president. “So, to be recognized for doing that again by a panel of renowned scientists is very satisfying. I’m proud of our team and glad that their hard work paid off.” GenomeView from VIB (Flanders, Belgium) and Genomatix received the most creative visualization awards. GenomeView enables users to dynamically browse high volumes of aligned short read data, with dynamic navigation and semantic zooming, viewing whole genome alignments of dozens of genomes relative to a reference sequence. “There is still a lot of work to be done. Everybody agrees there is a clear need for visualization tools for genomics data and GenomeView has at least part of the solution. The iDEA award tells me that we’re doing something right. Now the trick is to convert that information into papers and grant money and we’re good to go,” said Thomas Abeel, a postdoctoral/Broad fellow at VIB. Finally, Genomatix presented several workflow tools that one judge called “very intuitive” to cover the complete analysis of the iDEA datasets from mapping to the generation of biological networks. These included Transcriptome Viewer to interactively inspect transcript expression, splicing graphs and paired-end coverages in one view; a one-step mapping approach; and ElDorado, Genomatix’ genomic annotation database. “The iDEA challenge really sparked our interest from the very first moment we heard about it,” said Jochen Supper, project manager. 
“We felt that getting our hands on a high quality, diverse dataset like the one Illumina provided would be ideally suited to try and test our approach of combining multiple lines of evidence to get from sequencing data to biological results.”

Ignite Institute Finds a Match at Fox Chase
Dietrich Stephan’s personalized medicine center finds a home in Philadelphia

BY KEVIN DAVIES

Following the widely publicized demise of plans to locate the Ignite Institute for Individualized Health in Northern Virginia, the institute has found a new home as part of a three-way partnership at the Fox Chase Cancer Center in Philadelphia. Only it won’t be called Ignite anymore.

Ignite has been rolled into pre-existing plans at Fox Chase to build a center for personalized medicine, says Jeff Boyd, senior vice president of Molecular Medicine at Fox Chase. “The Ignite Institute and Fox Chase are working together with Life Technologies to launch what is now the Cancer Genome Institute at Fox Chase,” says Boyd. Ignite’s founder, Dietrich Stephan, serves as consulting chief scientific officer of the new institute.

While searching for a home for Ignite, Stephan had forged a provisional deal with Life Technologies for 100 next-generation sequencing (NGS) instruments. “After the big Ignite deal in Northern Virginia went away, the relationship between Life and Ignite went with it,” says Boyd. A new partnership between Fox Chase and Life Technologies was announced in June, although the Ignite name was notable for its absence in the news release. “We didn’t mention Ignite [when that was announced]—that was intentional,” Boyd explains. “We got tired of negative reporters who want to dig into what happened to Ignite [in Northern Virginia] and dredge up that experience.”

“Dietrich’s grand plan was to do personalized medicine in any number of manifestations—pediatrics, neurological, cancer, all in one big institute. But he saw that it made a lot of sense to step back and silo things out. He’s landed here at Fox Chase with respect to the oncology piece of his vision. We had a similar vision.”

For his part, Stephan says he “crisscrossed the country multiple times, looking for a situation where we could land the whole shebang. It’s hard to build a $150-million research building.” Stephan says there were many organizations eager to get into the personalized medicine space, even if they couldn’t support a full-bore TGen or Broad Institute model. “So my idea was to break Ignite into five disease models and decentralize” (see, “Gene Partnership”).

‘Ome Coming
In 2009, Fox Chase had established its own self-funded nascent Institute for Personalized Medicine. “Most of the leaders in this space believe this is the future, especially in the cancer arena,” says Boyd. “We’re not going to get any further with combinations of cytotoxic drugs. Combination therapies are clearly what we need to be thinking about, hence analysis of the tumor, and at some point exomes, transcriptomes and whole genomes. Something with ‘ome!”

Fox Chase management hired PricewaterhouseCoopers as consultants to decide how to evolve the institute into the clinical arena. “They were helping us build a business plan, which required a lot of philanthropy to develop a larger institute of personalized medicine. We were introduced to Dietrich, and he introduced us to Life Technologies.”

Stephan says he felt “a lot of allegiance to Life Technologies. They wanted to stick with me. When Fox Chase started looking real, I brought Life Technologies in to bring closure to that deal.”

Boyd says Fox Chase had a lot to offer as an intellectual, medical, and technology partner. “We’re a free standing NCI-funded comprehensive cancer center, a northeast location, we have an incredible biosample repository, top Phase 1 clinical trial center, and brand new, state of the art lab space available.”

Details of the Life Technologies deal are confidential, says Boyd, but he does say it is a multidisciplinary partnership involving state-of-the-art technology, “both from deep sequencing as well as from an IT/bioinformatics standpoint. They’re an enormous company with a lot of depth. This project is quite complex, more than just grinding up tumors and looking for mutations in pathways.”

For genome analysis to become a routine part of clinical care, Boyd stresses that many issues still have to be worked out. “Patient flow, charging, informed consent, CLIA, etc. I think Life Tech is looking to us to represent how one would do that, how we’d create such an institute. We could establish a model for other institutions to follow that would benefit the field.”

Fox Chase currently uses some Illumina instrumentation, but Boyd says he is “satisfied that the SOLiD 4 has the sensitivity and specificity that is comparable to anything Illumina has to offer.” But he is also making a bet on the scalability of the Ion Torrent semiconductor sequencing technology.

The Fox Chase Cancer Institute is currently deploying six SOLiD instruments in an R&D setting, plus a couple of Ion Torrent machines. “Once we get plugged in, sign informed consent documents, and so on, we’ll have a Fall start for enrolling patients. We’ve sequenced dozens of exomes, transcriptomes, from all manner of samples—fresh tissue, frozen tissue, microdissection, paraffin-embedded, but haven’t embarked on patient care yet.”

“I think we’re going to leapfrog the 5500 XL and once the Ion Torrent has reached the stage where we can think about whole exomes and genomes, we’ll shift from SOLiD 4s to ultimately [Life Technologies’] third-generation instrument, based on the Ion Torrent technology. We’re quite optimistic that will become the industry standard.”

N=1
Boyd says the institute will create its own model of patient care, focusing on “a rigorous analysis” of druggable targets and genes in cancer-related signaling pathways. The institute will see patients with most kinds of cancer, although it does not care for patients with brain cancer or pediatric cases. “We won’t fiddle with the standard of care for new cancer patients,” says Boyd. “Pancreatic cancer, stage IV ovarian cancer, breast cancer, lung cancer, those might be examples where we utilize genome sequencing out of the gate.”

Initially, Boyd will offer transcriptome analysis in tandem with exome sequencing to provide insight into druggable pathways. “That comes as a package. We’re not offering full genome yet. But we don’t think it will be many years until we offer full genome. It’s a clinical decision: each patient will have to be considered uniquely in terms of life expectancy, cost, etc. It is expected to decrease substantially. There’ll be individual decisions made for each patient in consultation with their medical oncologist at the center.”

While Boyd hesitates to say when full genome sequencing will become routine for cancer care, he does believe there is promise in looking at exome sequencing clinically, rather than focusing on just a group of “hotspot” genes frequently mutated in cancer. He says his group will remain at the front edge of the technology bell curve.

Fox Chase admits 8,000 new patients/year, a number that will increase as the genome center unfolds. “It’s both extraordinarily exciting and a little terrifying at the same time. But we’ve chosen to devote a lot of energy and resources to it, and we’ve cast our lot,” says Boyd. Stephan expects the center to sequence a couple of hundred patients this year, ramping up to 2,500 patients annually.

Gene Partnership
The other four areas under Dietrich Stephan’s original Ignite umbrella were pediatrics, metabolic disease, cardiac disease and neurology. Stephan has found another northeast home for his interests in pediatrics, or “germline disease,” in Boston. “Children’s Hospital had been thinking about something similar [to me]. I spent time with that team. They’ll be putting […] million into a […]-year effort called The Gene Partnership.”

The Gene Partnership (TGP) is billed as “a cutting-edge research initiative that combines the innovation of genomic research and IT to create the richest longitudinal knowledge base of genetic and clinical pediatric data in the world.” Stephan has high hopes for TGP, of which he is the executive director. “I’m an emissary of sorts for this, focusing on the provider side. They’ll blow the doors off this at Children’s,” says Stephan. “If this ever takes root as an integral part of medicine, it has to be monetarily sustainable.”

Stephan says a number of Boston biotech veterans, including venture capitalists Noubar Afeyan and Stanley Lapidus (co-founders of Helicos Biosciences), were among “a true rock-star team” to discuss the concept. “Ultimately, the hospital had enough faith to make the investment.”

With engagements at Fox Chase and Children’s Hospital to manage, Stephan has no immediate plans to expand into the other three areas once targeted by Ignite. “I think I’ve got enough going on right now,” he says. K.D.

The Skeptical Outsider
Big-Bucks Biology’s Broken Business Model

BILL FREZZA

“Tell me how someone is compensated and I’ll tell you how they’ll behave,” goes the old adage.
If non-monetary rewards are considered alongside financial remuneration, this pretty much describes why federally funded research in the life sciences is producing less and less bang for more and more bucks. And why the scientific literature is at risk of becoming polluted with overreaching claims, obfuscated shortcomings, and non-reproducible results.

Scientists labor to discover nature’s truths, not design products. This makes it unfair to demand that they “cure cancer” in return for living on the public dole. Rather, we expect academic scientists to report on the fundamental rules that govern health and disease, passing their knowledge to commercial players to come up with products and services that improve our lives. We also realize it can take years, sometimes even decades, for scientific advances to find their way into pharmacopeia and physician practice. This makes the benefits taxpayers receive from supporting scientists both indirect and difficult to measure. Which makes it fair to ask: Is the $31 billion of taxpayer money funneled to scientists each year through the National Institutes of Health being spent wisely?

For an endeavor that consumes billions, academic research remains a cottage industry of individual practitioners called Principal Investigators (PIs). Their search for truth begins with the quest for grant money, the mother’s milk of modern science. PIs unlock the door to the treasury by having their grant applications reviewed by... fellow PIs.

PIs are business units unto themselves, employing laboratory slaves otherwise known as graduate students, who perform the bulk of the hands-on scientific work. As long as grant money keeps flowing, PIs answer to no one. They pay overhead to the universities that give them lab space and in return these universities confer upon PIs the singular ability to manumit their lab slaves by awarding them Ph.D.s. One cannot build a life as a tenured, taxpayer-supported scientist without one.

Lab slaves convert grant money into scientific papers that bear their PIs’ names. Because lab slaves are comparatively cheap, the level of automation and efficiency in most academic labs is appallingly low. Hand work and manual data collection are the rule, both prone to error and vulnerable to selection bias. Only data approved by the PI gets submitted for publication.

Chasing Impact Factor
Scientific journals exist within a status hierarchy. Publishing in a high-impact journal gives PIs the one thing they crave as much as grant money—academic fame. Useful collaboration between grad students working for different PIs is often discouraged, as too many PI names on the papers dilute fame. The most prestige goes to PIs who plant their flag first in a new area, whether they develop it or not. Like the race to the South Pole, no one cares who got there second.

Before papers can be published they must be reviewed by... fellow PIs. PIs that review each other’s papers are not tasked with reproducing results, though politics can certainly play a role in critiquing conclusions. For some this means turning peer review into pal review. For others, it might mean delaying a rival’s pending publication. If the whole system sounds like a medieval guild, that’s because it is.

We can be thankful that the vast majority of PIs operate with the highest degree of intellectual integrity. Such a small fraction of scientists engage in outright fabrication that when fraud is uncovered it makes national news. But the grey region short of fabrication covers a lot of ground, especially when pumping $31 billion a year through a medieval guild system.

How many times does an experiment have to be repeated before it is judged “successful?” What if that one-time “success” can’t be reproduced? How much inconvenient data gets discarded on the road to publication?

Lab slaves that give PIs the results that they want, especially results confirming pet theories, move one step closer to freedom. Lab slaves that displease their PIs can wash glassware for years, or wash out with a master’s degree. Or worse, as in the infamous case of the Harvard chemistry professor who had three grad students commit suicide before the administration stepped in to make changes.

Reforming our graduate education system by introducing more transparency, accountability, and efficiency would help ensure taxpayers get their money’s worth. Is that too much to ask of a “War on Cancer” that has gone on for 40 years?

Bill Frezza is a consultant and venture capitalist living in Boston. He is a regular contributor to RealClearMarkets and Forbes.com. Bill can be reached at [email protected].

The Bush Doctrine
Collaboration Limits?

ERNIE BUSH

Ernie Bush is VP and scientific director of Cambridge Health Associates. He can be reached at: [email protected].

In a recent brief for Nature Reviews Drug Discovery, John Arrowsmith of Thomson Reuters reported the following information regarding Phase II clinical failures in the period 2008-10. Of the 108 reported failures, 87 listed reasons for failure and they were distributed in four categories as follows: Efficacy (51%); Strategic (29%); Safety (19% both preclinical and clinical); and PK/BA (1%). When he combined these data with the biological target areas for the failed compounds, Arrowsmith made the following observation:

“Although it is difficult to draw conclusions from these data, the finding that a substantial proportion of Phase II failures were due to strategic reasons suggests that one important underlying factor could be overlapping R&D activity between companies with drugs in Phase II trials. This raises the question of whether an increase in collaborative efforts between companies up to the point of proof-of-concept for novel targets or mechanisms might be more cost- and time-effective.”

While I know many companies conduct joint development efforts on a single compound, I find the idea of multiple companies joining to conduct clinical target validation studies, potentially across a range of compounds, a very intriguing idea. It is one that has come up before, however, in a more limited context.

The Target Validation Consortium
Since the core of our business has always been about the “collaborative advantage,” we were approached a couple years ago by someone who wanted to build a collaborative organization of pharma companies for the purposes of establishing a “pharmacological targets” validation consortium. The central thesis of his proposal was that all large pharmaceutical companies have a constant need to discover and validate novel biological targets as a basis for developing new medical therapies. In particular, he felt that the target discovery field is dominated by the academic research community because they are focused on the basic biological sciences needed to uncover the functional activities of proteins and pathways. This basic biology is then supplemented by activities needing larger investments, which include large-scale protein synthesis, building and executing high-throughput functional screens, and animal testing capabilities. Of course, all of these supplemental activities are usually provided or funded by pharma companies.

Unfortunately, these larger investments are often replicated across multiple pharma companies as is the cost of conducting early safety studies to determine if the target has unwanted pharmacology. As it is very common for multiple companies to be chasing the same targets, the overall duplication of efforts and expense to achieve a validated target is wasteful and arguably not as thorough or comprehensive as could be achieved in a collaborative effort.
Or so he proposed... unfortunately, when we tried to formalize and build such a consortium, his employer decided they could not support the idea of taking collaboration to that level. Of course, what is interesting to me is that Arrowsmith’s suggestion takes this proposal one step further. He is basically asking, “Why don’t companies collaborate not only on the discovery target validation, but on the validation of the target all the way through to human clinical studies?” This has the possibility of greatly reducing the number of failures both due to lack of efficacy but also of reducing the number of failures due to “strategic” reasons. In a logical extension of Arrowsmith’s observation, one could also ask: Why would collaboration have to stop at target validation? What about collaborative Phase II and/or Phase III studies? At the very least, having a shared safety database on compounds hitting the same targets would be of value to the companies, to the regulators, and to the general public health. Or am I clear off the reservation? As such, the larger question becomes one of “what are the limits of collaboration within the pharmaceutical R&D space?” Recent years have seen many types of ‘collaborations’ introduced such as the Enlight Biosciences (see, “Big Pharma’s Road to Enlight(enment),” %LRv,7:RUOG, Sept 2008) co-investment collaboration, the Pistoia Alliance (see, “The Italian (Informatics) Job,” %LRv,7:RUOG, Jan 2010) open software standards collaboration and the Preclinical Safety Testing Consortia biomarker discovery/development collaboration. All of these share the common thread of multiple pharmas joining together to achieve an objective that would be difficult or expensive for any one company to achieve independently; but they have very different operational characteristics and an even wider diversity of missions. 
But I must say that the concept of companies actually collaborating on the clinical development space truly looks like a significant step beyond anything actively ongoing to date. If there is real synergy and leverage to be obtained through collaboration, then why not collaborate across the whole R&D space? Or is the fact that contract organizations are becoming increasingly dominate across the entire R&D space really just another expression of this concept? Up Front Insights | Outlook Cloud and the Next Generation T odd Smith is the senior leader of research and applications at Geospiza, now part of PerkinElmer (acquired May 2011). As senior leader, Smith helps develop the company’s research roadmap around high-performance computing and ensures Geospiza’s GeneSifter software scales to meet the future demands of highthroughput sequencing systems. Smith was interviewed by ,QVLJKW3KDUPD5HSRUWVIRULWVODWHVWUHSRUWRQQH[WJHQHUDWLRQ VHTXHQFLQJ+HUHDUHVRPHH[WUDFWVIURPWKDWLQWHUYLHZ CONTENTS On Geospiza’s cloud strategy: I guess we’ve always felt the cloud would be very important, so we’ve always had that as part of our strategy. Going forward it becomes more of a technical implementation of our strategy. So we’re not saying we have to do more of this or less of it. In our marketing, we probably stress the word “cloud” more than “application service provider” or other terms we used to employ. So in that sense, we’re going with the flow, but the cloud’s always been very important in our strategy, because in general IT costs certainly can be prohibitive in getting started with next-generation sequencing. So when people try out cloud services and do some experiments, I think they definitely find some scale issues. When they have a data center-size operation, they need to consider accessing someone’s hosted service center versus building their own. I think those are the kinds of things people consider, and we need to consider those things as we mature and increase our business. 
But I’m going to call them technical implementation issues. How do you offer more services at a lower cost? That’s something we focus a lot of energy on. There’s an appeal to being able to use cloud services for data storage, most importantly for backup and for the infrastructure that goes with maintaining the data. In our cost structure, the way the fees work is transaction-based, so it’s focused on the analysis. On third-generation sequencing systems and the way informatics deals with the data: There will be new problems to solve. I think at one level they will be incremental problems. One of the very interesting features of Pacific Biosciences’ system is the ability to produce very long sequences, and a lot of alignment algorithms that are now doing very high-throughput work are dealing with short sequences. So people have to adapt those tools to handle longer sequences, and they will. There will be strategies to deal with that. I’m a little less familiar with Oxford Nanopore in terms of the kinds of data that are coming out. But largely, they are producing bases like any other system. There is 20 or 30 years of alignment experience now in the collective community, and if you considered that by individuals it would be many hundreds of years of cumulative experience. People will solve those kinds of problems. Some interesting work is going on using MapReduce kinds of technologies to make these things super-scalable. Each new instrument is going to produce new varieties of the data that people will need to deal with. I don’t think any of these are going to limit adoption of the technology or be intractable given the vast amount of experience that now exists. What is a challenge is what to do with those alignments. How do you then go that next step and summarize and visualize the information contained in the large bodies of data?
I did a FinchTalk post in which I talked about Illumina’s new HiSeq instrument and recent articles about cloud computing. Often these conversations focus on the alignment challenges, and yet there’s a far greater challenge, once you’ve done those alignments, with using that body of information to understand what your data means. That’s where I think we’ve done a particularly good job, and people like what we’ve done. On library preparation for sequencing: The benefits of next-generation sequencing override the library preparation difficulties, and this has been demonstrated in literature. We certainly see it in plenty of examples. Compared to microarrays, you’re going to get a higher dynamic range in terms of the sensitivities, so with next-generation sequencing you get at genes that are less expressed. Also you don’t blow out your signal, if you will, so you can measure high levels of expression to a finer degree. But more importantly with microarrays you can only measure with the probes you have on that chip. With the next-generation sequencing we’re finding that there are many regions of the genome that aren’t annotated and are showing expression. These sequences are not on today’s microarrays so I have a [chance] to discover new genes, gene boundaries, and exons through next-generation sequencing. This is information which until now you couldn’t get in a microarray experiment. Having said that, there are artifacts that you can see in an RNA-Seq that you’d never measure in a microarray, and those can get in the way. Ribosomal RNA is an example. It’s very important to have good preparation methods to remove those contaminating molecules. So there are some trade-offs. One of the nice things is that since we have a LIMS product, we can start to capture laboratory information about the experimental process. Our strategy integrates that laboratory information with the analytical information so that people know more quickly whether their experiments are on track. 
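Smith's remark about MapReduce-style technologies, and his point that the harder problem is summarizing alignments once they exist, can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the read-to-gene pairs are invented placeholder data, the function names are hypothetical, and nothing here reflects how Geospiza's GeneSifter actually works. It splits aligned reads into chunks, counts reads per gene within each chunk (the map step), and merges the partial tallies (the reduce step).

```python
from collections import Counter
from functools import reduce

# Hypothetical aligned reads as (read_id, gene) pairs; placeholder data only.
ALIGNED_READS = [
    ("read1", "geneA"),
    ("read2", "geneB"),
    ("read3", "geneA"),
    ("read4", "geneC"),
    ("read5", "geneA"),
]


def map_chunk(chunk):
    """Map step: count reads per gene within a single chunk."""
    return Counter(gene for _read_id, gene in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial per-gene count tables."""
    return left + right


def count_reads_per_gene(reads, chunk_size=2):
    """Split reads into chunks, tally each chunk, then merge the tallies."""
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    partial_counts = map(map_chunk, chunks)
    return reduce(reduce_counts, partial_counts, Counter())


gene_counts = count_reads_per_gene(ALIGNED_READS)
print(gene_counts.most_common())
```

Because each chunk is tallied independently, the map step could in principle be spread across many machines, and because merging count tables is associative, the partial results can be combined in any order.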
Further reading: Next-Generation Sequencing Gains Momentum: Markets Respond to Technology and Innovation Advances (June). www.insightpharmareports.com
Cambridge Healthtech Institute’s Inaugural Bio-IT World Asia Conference
June 5-8, 2012 | Marina Bay Sands, Singapore
MARK YOUR CALENDARS
FOCUSED SESSION TRACKS x IT Infrastructure and the Cloud x Next-Generation Sequencing Data Management and Interpretation x Bioinformatics x Drug Discovery Informatics
Recent advances in scientific technology have left us confronted with the task of discovering scientific knowledge from enormous amounts of data generated in genomics, pharmaceutics, medicine and other life science areas. Cambridge Healthtech Institute’s Inaugural Bio-IT World Asia Conference, building upon the exciting momentum of the flagship Bio-IT World conference in Boston, will provide an ideal occasion for both the life science community and the information technology industry to meet and discuss the challenges and solutions on data management infrastructure, interoperability, and the complexity of data analysis in biomedical research and the drug development process.
Official Publication Bio-ITWorldAsia.com
ADVISORY BOARD x M. K. Bhan, M.B.B.S, M.D., D. Sc., Secretary, Government of India, Department of Biotechnology, Ministry of Science & Technology x Linh Hoang, Director of Genomic Medicine, Life Technologies x Chris Blessington, Senior Director, Marketing & Communications, Isilon x Krishan Kalra, Chairman & CEO, BioGenex Labs Inc.
x Stephen Rudd, Ph.D., CSO, Malaysian Genomics Resource Centre Berhad x Peter Little, Ph.D., Research Director, Life Science Institute, National University of Singapore x Parthiban Srinivasan, Ph.D., President & CEO, Parthys Reverse Informatics, India x Yusuke Nakamura, Ph.D., Professor, Laboratory of Molecular Medicine, Institute of Medical Science, The University of Tokyo; Secretary General, Office of Medical Innovation, Cabinet Secretariat, Government of Japan x Tin-Wee Tan, Ph.D., Deputy Head of Department of Biochemistry, National University of Singapore x Han Cao, Ph.D., Founder, CSO, BioNanomatrix, Inc. x Laurie Goodman, Ph.D., Editor-in-Chief, GigaScience, BGI-Shenzhen x Sean Grimmond, Ph.D., Professor, Molecular Bioscience, University of Queensland x Yike Guo, Ph.D., Professor in Computing Science, Imperial College London x Pauline Ng, Ph.D., Group Leader, Computational and Mathematical Biology, Genome Institute of Singapore x Alain Van Gool, Ph.D., Head Molecular Profiling, Translational Medicine Research Center, Merck Sharp & Dohme
Cambridge Healthtech Institute, 250 First Avenue, Suite 300 | Needham, MA 02494 | T: 781-972-5400 or Toll-free in the U.S. 888-999-6288 | F: 781-972-5425
SPECIAL ADVERTISING SECTION BEST New Products & Services
scanMODE™: Novel Tools to Explore & Exploit Activation State-Dependent Kinase Conformations for Structure-Guided Drug Design & Optimization
Daniel Jones, Daniel Treiber, Ph.D., Sailaja Kuchibhatla
scanMODE™ represents the next generation of activation state-specific kinase assays available on the KINOMEscan™ kinase assay platform that may be used to gain structural insights in the absence of cocrystal data and to collect in vitro data most predictive of inhibitor potency in cellular assays.
scanMODE employs a panel of phosphorylated/nonphosphorylated ABL assay pairs and autoinhibited/non-autoinhibited PDGFR family RTK assay pairs that facilitate understanding of how activation state-dependent conformational changes may affect inhibitor affinity, and it offers a novel approach to guide the strategic optimization of inhibitors best suited for specific disease indications.
Feature/Benefits
■ Classify inhibitors as having Type I or Type II binding modes without a requirement for cocrystal structures
■ Reports on the compatibility of an inhibitor’s binding mode with the autoinhibited conformation
■ Provides activation state-specific biochemical PDGFR family RTK inhibition data necessary to predict & interpret potency in cellular assays
■ Further differentiate inhibitors based on activation state-specific binding
■ Explore & exploit inhibitor binding to diverse activation state-dependent kinase conformations
Inhibitor Binding Mode Classification
The majority of ATP-competitive kinase inhibitors are classified as having either Type I or Type II binding modes. Although both Type I and II inhibitors generally contact the ATP binding site, only Type II inhibitors access an “allosteric” site unmasked in the inactive DFG-out conformation. Consequently, Type II inhibitor binding can be significantly more sensitive to the phosphorylation state of the A-loop than Type I inhibitor binding. An inhibitor’s binding mode can impact several key parameters in drug discovery, including enzyme inhibition kinetics, offsets between in vitro and cellular potency, nearest neighbor & kinome-wide selectivity, on-target residence time & pharmacodynamics, interactions with upstream and downstream signaling molecules, and intellectual property position. Since the optimal binding mode is likely to be target-specific, it is an essential parameter to characterize for multiple leads at program outset and during optimization. When the optimal binding mode is unknown a priori, a strategy to pursue two lead series with distinct binding modes can de-risk early lead selection decision making. However, binding mode determination can be difficult, time consuming, and expensive, often requiring the use of x-ray crystallography or in silico modeling.
Classify inhibitors as having Type I or Type II binding modes without a requirement for cocrystal structures
scanMODE capitalizes on several key observations enabling the use of these assay pairs to serve as surrogates to classify an inhibitor’s binding mode.
■ Type II inhibitors preferentially bind to the nonphosphorylated state of ABL, whereas Type I inhibitor binding is phosphorylation state-independent
■ Binding mode is generally maintained across kinases (e.g. imatinib is a Type II ABL inhibitor and a Type II LCK inhibitor)
■ Inhibitors that primarily target kinases other than ABL are correctly classified as Type I or Type II when tested against the differentially phosphorylated ABL assay pairs
■ A significant fraction of known kinase inhibitors have sufficient off-target affinity for ABL and/or ABL mutants to qualify for scanMODE analysis
Figure. scanMODE classifies inhibitor binding mode by measuring phosphorylation state-dependent affinity changes
Further differentiate the detailed binding modes of inhibitors within the Type I and Type II classes
scanMODE also includes a panel of PDGFR family RTK assay pairs (CSF1R, FLT3, KIT) in the autoinhibited (JM domain docked) and non-autoinhibited (JM domain not docked) states.
Unlike the case for ABL A-loop phosphorylation, both Type I and Type II inhibitor affinities are dependent on the PDGFR family RTK activation state, with large and often dramatic preferences for the non-autoinhibited state observed for all inhibitors tested (Table 2). These binding affinity preferences are inhibitor-specific and report on the compatibility of an inhibitor’s binding mode with the autoinhibited conformation. In the autoinhibited state, the docked JM domain can interfere with inhibitor binding in two ways: first, by sterically clashing with the inhibitor directly, and, second, by stabilizing an enzyme conformation incompatible with inhibitor binding. Whereas inhibitors such as sunitinib (Table 2) and dasatinib show relatively small affinity preferences and have binding modes compatible with the autoinhibited conformation, imatinib and nilotinib binding are sterically incompatible with JM domain docking and the affinity preferences are much larger (Table 2). Thus, structural insights are gained by measuring an inhibitor’s affinity preference for the non-autoinhibited state, the magnitude of which reports on the compatibility of an inhibitor’s binding mode with the autoinhibited conformation. Since a significant fraction of known kinase inhibitors have off-target affinity for PDGFR family RTKs, these data can provide structural insights for inhibitors targeting kinases outside of the PDGFR family as well.
Table 2. Activation state-dependent KIT inhibitor binding provides structural insights
Collect activation state-specific biochemical PDGFR family RTK inhibition data necessary to predict & interpret potency in cellular assays
Because Type I and Type II inhibitor affinities may depend on the PDGFR family RTK activation state, it is critical to know the activation state being queried in biochemical assays when predicting cellular potency and interpreting cellular data.
Figure 3 presents biochemical and cellular potency data for a panel of KIT inhibitors. The data show that both in vitro enzyme activity IC50s and the autoinhibited state Kds can greatly under-predict cellular potency, whereas the non-autoinhibited Kd data are most predictive and give the expected potency offsets (in vitro Kd < cellular IC50). They also illustrate how highly potent PDGFR family RTK inhibitors can be missed in biochemical assays using enzyme preparations for which the activation state is undefined.
Figure 3. Comparison of biochemical and cellular KIT inhibitor potency data
A next generation biochemical tool for kinase inhibitor drug discovery
Continued investment in the discovery, development and optimization of kinase inhibitor therapeutics exhibiting improved potency, selectivity and safety profiles requires a new generation of screening tools and solutions. These tools should provide insight about how inhibitors interact with kinases and about binding mode in order to facilitate structure-guided drug design and provide a strategic approach to the optimization of a next generation of potent, selective and efficacious therapeutics. scanMODE is a novel biochemical tool consisting of an expanding set of activation state-specific kinase assay pairs. It provides a facile, functional solution to inhibitor binding mode classification, enhances the predictive value and facilitates interpretation of downstream cellular data, and helps explore and exploit inhibitor binding to defined activation state-dependent kinase conformations.
To learn more about scanMODE visit www.kinomescan.com/scanmode
Tel | 1.800.644.5687 www.kinomescan.com
Tel | 1.866.448.4464 +44.121.260.6142 www.discoverx.com
Eliminate Paper Source and eCRFs
The single biggest cost driver of late-stage clinical development is the on-site monitoring of paper source documents.
Capturing source data on paper is costly, error-prone, and time consuming for both sites and sponsors.
SureSource® Tablet: maintains the natural workflow, ease of use, and mobility of a paper chart while simultaneously capturing and validating data in real time, with no need for a separate eCRF to be filled out later and validated against the original paper source document.
SureSource Portal: review source documents remotely immediately after a subject visit. Source Data Verification (SDV) is eliminated; monitors can focus review efforts on context, trends, and AEs.
Key Benefits
■ Reduce study costs 25-30%; eliminate SDV; fewer queries and on-site monitoring visits
■ Intuitive, paper-like interface; eliminate duplicate site work to re-enter subject data
■ Offline functionality and real-time edit checks
■ Leverage existing infrastructure investments
■ Import/Export to EMRs via HL7 standards
Can your EDC do this? www.clinicalink.com
Disease Mutations for Your Next Generation Sequencing (NGS) Analysis: Genome Trax™
Genome Trax is a collection of manually curated genome feature data that enables you to identify human genome variations of functional significance by mapping your NGS data to known elements such as disease mutations, transcription factor binding sites, drug target genes and more.
Key advantages of Genome Trax for NGS analysis:
■ Quickly and easily identify functionally relevant variations in genome data
■ Find and display functional non-coding regions in human genome sequence
■ Filter large numbers of variants by multiple types of mapped sequence features
Our database will help you understand the impact of human variation on disease risk. You can evaluate risk based on disease-linked mutations mapped to your human genome variations, as well as by mapping novel mutations to functional features such as regulatory sites, disease genes and more.
Genome Trax contains unique content:
■ 4,400+ regulatory sites
■ 90,000+ disease linked inherited mutations
■ 152,000+ COSMIC (Catalogue of Somatic Mutations in Cancer) mutations
■ 877,000+ ChIP-Seq fragments with best binding site predictions
For more information contact us: [email protected]
New Market Study Now Available: The Future of Next-Gen Sequencing (NGS)
This comprehensive CHI Research Group market study covers developments and predictions related to the Next-Gen Sequencing (NGS) market. It is compiled from over 1350 surveys submitted by current and future NGS users. It is being offered at no charge through underwriting by key NGS technology providers including DataDirect Networks, Illumina, Ingenuity and PerkinElmer. Areas covered include:
■ The evolving role of NGS in R&D
■ NGS production, analysis, visualization and storage
■ NGS cloud developments
■ NGS in the clinic
■ NGS outsourcing
■ Much more
Read this valuable study to derive a highly informed sense of where the NGS market is heading. We invite your continuing post-study comments at our www.ngsleaders.org website. Download now at www.bio-itworld.com/BioIT/WhitePapers.aspx!
CHI PROFESSIONAL MARKETING SERVICES Market Research Group
Clinical Trials
Euphoria over EHR/EDC Interoperability May be Misplaced
Budget woes and lack of clinical investigators still bigger problems
BY DEBORAH BORFITZ
Despite noble efforts by the Clinical Data Interchange Standards Consortium (CDISC) and others to write a rulebook for the exchange of patient-level clinical information between electronic health records (EHRs) and electronic data capture (EDC), interoperability between the two systems is largely a pipe dream. All this obsessing over data will not in any case remedy the budgetary woes of trial sponsors or the exodus of investigators from clinical research.
Answers to these most pressing concerns will remain elusive until the focus shifts to the realities of investigators and study coordinators at sites initially capturing the data. So says Edward Seguine, formerly an advisory board member of CDISC and CEO of clinical trial planning software FastTrack (acquired three years ago by Medidata) and now president of electronic source record creator Clinical Ink. Relying solely upon EHR data is impractical for clinical research focused on investigational new drugs, excepting oncology studies where all but the new treatment is customary medical care. The proposed clinical research interoperability standard (HITSP IS158), which is intended to serve as the basis for how data within EHR systems can be used to support clinical research, instead “makes the process even more complicated.” Critically, of the 37 “use case actions” mapped out for an interconnected environment, 22 have dependencies on systems that don’t exist, says Seguine. Data monitoring activities, the biggest cost driver of clinical research, are addressed merely by referencing a non-existent Reviewer System or EHRs that magically learn of protocol requirements via a standardized message from another non-existent Protocol Development [22 ]#*0t*5 803-%+6-:|"6(6452011 System. “The [nonsensical] HITSP day’s technology and business practices” IS158 standard completely misses the to export from EDC or Clinical Ink’s big picture.” SureSource solution an HL7 standard It’s no coincidence that the United document containing study visit data into Kingdom, with its nearly 20-year-old an EHR so other physicians are aware national database of patient informathe patient was involved in a trial, says tion, is still unable to tap the data for Seguine. Conversely, “already standardclinical studies or use its EHR capabiliized concepts”—e.g. 
patient demographics, prior medications, ties for direct entry of lab results, and medical clinical trial data, says history—could under Seguine. Interoperabilcertain conditions be ity proponents this side taged imported into research of the pond will point to results of recent Condata from an EHR. integration nectathons sponsored But data-related doesn’t fully by CDISC and IHE (Inactivities—data entry, tegrating the Healthcare database handling, and test all the Enterprise) as “proof data clean-up—account that EHR/EDC intefor less than 12% of pieces together gration is viable now.” clinical trial budgets, and simply Tellingly, Nextrials is the according to Medidata only participating EDC CRO Contractor calcu‘assumes’ data company and “that’s belations averaged across cause this type of staged phases. Meanwhile, site exists. integration doesn’t fully monitoring and site test all the pieces tomanagement consume gether and simply ‘assumes’ data exists a whopping 43% of the total. “Other research points out that monitoring [by from non-existent systems,” says Seguine. itself ] can be nearly 40% of a large phase Part of the holdup is CDISC’s close III study budget,” says Seguine. “As a ties to the National Cancer Institute (two result of the convoluted process, project current CDISC board members repremanagement is over 26% of the total sent NCI), which have resulted in “overly study budget. In any other industry that simplistic views about the best approach,” would be laughable.” says Seguine. 
“In contrast to oncology, most other therapeutic areas don’t manifest the same treatment dynamic.” About The Paperless Path two-thirds of procedures called for in All of this brings us to the ideals of Cliniclinical trial protocols over the last decade cal Ink, which include getting rid of all have no corresponding medical billing the paper that has made clinical research standard, including well-known research burdensome for sites and unnecessarily instruments such as the Hamilton Deexpensive for study sponsors. Online porpression Rating Scale, says Seguine, who tals used to disseminate newer versions recently co-authored research on increasof paper documents is tacit acknowledgment that “paper rules the roost,” says ing protocol complexity. Seguine. His goal with the newly minted Irrespective of therapeutic area, it’s (CONTINUED ON PAGE 24) “extremely valuable and easy with to- www.bio-itworld.com S DIA 2011—Compliance, Collaboration and the Cloud 5SFOETBSPVOEBWBUBSTTPDJBMNFEJBFOSPMMNFOU BY ANN NEUER Avatars in the ProtoSphere ProtonMedia is the Pennsylvania-based developer of ProtoSphere, a 3-D virtual collaboration environment for the highperformance workplace that has had success in the financial services and oil and gas industries. In ProtoSphere, a 3-D avatar is created for each member of a clinical team, enabling them to interact in a virtual conference room and hold face-to-face discussions. Text chat, voice over Internet protocol (VOIP), and application-sharing are all enabled, allowing people to connect in a socially relevant manner. CEO Ron Burns says the power of ProtoSphere is about humanizing interactions. “Collaboration is a human-tohuman interaction—not a document-todocument interaction. It’s about a higher level of engagement, a two-way discussion. 
ProtoSphere puts context around those documents, and makes it easier and more interesting to transfer knowledge.” ProtonMedia’s research indicates that doctors are willing to do an entire education session in ProtoSphere, instead of relying on conventional Web tools and slide presentations. “This results in efficiency gains and lower cost, as travel can %*""OOVBM.FFUJOH$IJDBHP+VOF be cut,” Burns notes. As reported last year, Merck has successfully used ProtoSphere to conduct a virtual poster session (see, “Drug Discovery in a Virtual Environment,” %LRv,7 :RUOG, August 2010). All attendees surveyed afterward said they would participate in another virtual event, while junior scientists said they felt more comfortable conversing with senior colleagues in the ProtoSphere than they would in person. “This is about collaboration and learning coming together in the clinical space, and allowing individuals to have their voices heard,” says Burns. Patient Enrollment Benchmarks Linda Drumright, president and CEO of California-based DecisionView, is committed to solving delays in clinical trial enrollment. “Cycle time is taking longer and costing more, and the biggest chunk of that problem is patient enrollment,” Drumright explains. Enrollment delays have long been an intractable problem, yet much enrollment planning and analysis proceeds with little access to historical data. What data do exist are often found in homegrown solutions such as Excel spreadsheets. “This is our main competition,” says Drumright. “Those spreadsheets are bursting at the seams and there are no consistencies from study to study or from department to department. There’s little visibility of data and therefore, an inability to set expectations.” StudyOptimizer, DecisionView’s Webbased solution (see p. 32), helps life sciences companies deliver clinical trials on time and on budget by automating four processes: planning, tracking patient enrollment, diagnosing problems, and optimizing enrollment. 
The solution leverages predictive analytics and data visualizations to help study teams monitor actual and projected enrollment in Safety and Social Media Elizabeth Garrard, chief safety officer of Drug Safety Alliance, a North Carolinabased provider of pharmacovigilance and risk management services, sees growing concern among sponsors regarding use of social media. In the age of Facebook and Twitter, there is little guidance from the Food and Drug Administration (FDA) as to what sponsors should do if they become aware of online postings of possible adverse events. “Sponsors want to know how they can harness the power of social media at a time when FDA has given no direction,” Garrard says. So far, the only agency guidance is a www.bio-itworld.com+6-:|"6(6452011 #*0t*5 803-% [ 23 ] CONTENTS CHICAGO—This year’s annual DIA meeting* featured hundreds of exhibitors and session topics ranging from patient recruitment to cloud computing to improving regulatory reporting. There were several overriding themes including the continuing march toward eClinical suites, the changing requirements for safety reporting, and an increased emphasis on strategic partnering. Three companies in particular stood out to this reporter: ProtonMedia, DecisionView, and Drug Safety Alliance. near real-time. “As actual data come in, our forecasting engine shows where you thought you were going to be, and where you actually are,” Drumright explains, adding that customers need industry benchmarking data to make realistic assumptions for rescuing trials and evaluating new therapeutic areas. To amass the needed information, DecisionView is creating an aggregated, anonymized dataset based on customerprovided information. The input will then be fed back to customers so they can apply it to different therapeutic areas. Once the dataset becomes robust, users of StudyOptimizer will have access to more granular information for planning and creating rescue strategies. 
DecisionView expects to have the first group of benchmarks available to Roche, Merck, and GlaxoSmithKline this summer, folding that information into a newly released version of StudyOptimizer later in the year. Several desired benchmarks have been identified, such as screening and randomization rates, recruitment cycle time, and drop out percentages. The new version of StudyOptimizer will incorporate the benchmarks into the application, enabling comparisons of these industry benchmarks with a company’s own historical data as users make planning and rescue decisions for their trials. Only those who contribute data will have access to the benchmark information. “We expect to refresh the dataset quarterly as studies complete and as new StudyOptimizer customers contribute their historical trial data,” she says. Clinical Trials CONTENTS lengthy draft guidance on post-marketing safety reporting dating back to 2001—the pre-Facebook era—which carried a single paragraph on a sponsor’s responsibility for reporting adverse events (AEs) from information gathered online. It does, however, list four criteria that determine whether something is reportable. There must be an identifiable patient; an identifiable reporter of the AE; a suspect drug or biological product; and a suspected adverse experience or fatal outcome. Ten years ago, companies could put up carefully crafted informational Web pages, with little or no ability for readers to post personal responses. But with today’s social interactive media, how does a sponsor respond to a tweet or blog post about a possible AE? If a patient posts a note on a company blog such as, “I lost consciousness after taking the drug,” or “I had to be hospitalized,” the sponsor might be unable to investigate without knowing many relevant facts. “This is fraught with Safety Alliance provides intake and case all kinds of unknowns. 
You don’t know if processing on AEs and the submission of this person is on concomitant meds, what appropriate cases to regulatory agencies. The company also offers dose he or she took or for global aggregate safety how long, or anything reporting services and else about that person’s handles risk managemedical history. The only ith today’s ment projects, such as thing you do know is that risk maps and risk evalthe patient took the drug, social media, uation and mitigation but you don’t know how how does a strategies (REMS) to furto follow up,” Garrard ther refine the risk/bensays. sponsor efit profile of products. Pharma wants to enA new guidance adgage its customers but respond to a dressing social media it must also consider if tweet or blog and post-marketing AE they have a regulatory reporting is expected obligation they are not post about a from FDA, but the date meeting. (Some research is not yet known. suggests that most of possible AE? In the interim, Garthe time, the AE is not rard says she advises cusreportable because it does not meet the four criteria.) To help tomers to work within existing regulatory customers navigate this process, Drug confines. x &)3&%$ cut total study costs by 25% or more and produce hundreds of millions of dollars in annual savings for large trial sponsors, says Seguine. The return on investment is calculable based on the number of monitoring visits ($2,500-$5,000 each) and queries related to SDV ($65-$100 per cleaning) that can be eliminated. With the untimely passing of Clinical Ink co-founder Tommy Littlejohn in March, the company lost some commercial acceleration as well as a respected peer who befriended everyone he met, says Seguine. During the development of SureSource, Littlejohn provided unfettered access to the 11 sites of WinstonSalem, NC-based PMG Research where he served as president and executive medical director. 
Field testing at the sites ensured data entry into a tablet computer happens in the most expeditious way possible—be it a drop-down list, yes/no checkbox, number scale, image, or handwriting—with the familiar feel of pen and paper. The e-source record complements existing data warehousing infrastructures, and ultimately could supplant existing EDC systems that output data statisticians want to analyze, says Seguine. SureSource (CONTINUED FROM PAGE 22) SureSource is to create a model for how sites collect information that reduces the need for paper-based source documents and thus source data verification (SDV). The paper-free world Seguine envisions won’t happen at the hand of Clinical Ink alone. SureSource provides a Web portal where study sponsors, monitors, and site users can review electronic source documents remotely as visits happen in real time. But sponsors and sites would also need access to other types of information electronically, including regulatory and informed consent documents as well as clinical trial results for individual participants. Seguine feels this broader information-sharing environment could be built now using Microsoft’s Sharepoint and Amalga platforms, but require interfaces that are geared toward sites—not data managers—to facilitate the gathering of data and documents. Clinical Ink is now working with small biotechnology companies and clinical research organizations to prove the SureSource concept which, if successful, could [24 ]#*0t*5 803-%+6-:|"6(6452011 8 www.bio-itworld.com collects the same information in addition to the source data investigators must document to demonstrate compliance with Good Clinical Practice and patient case history requirements. Importantly, within the source documents study monitors can immediately spot an adverse event that investigators may initially interpret as clinically insignificant. 
Indeed, the frequency and relevance of interactions between monitors and sites increase even as the number of face-to-face visits declines, he adds. Despite new e-source guidance from the U.S. Food and Drug Administration, sites as a rule are not doing direct data entry into EDC because those systems were created to address the needs of data managers and are thus sequentially out of sync with how patients get evaluated, says Seguine. “If EDC could be used to capture source data directly in front of a patient, enterprising researchers would have been doing so long ago. They haven’t because EDC doesn’t meet their needs.” And so long as doctors have to endlessly toggle between forms, setting off round after round of edit checks, they will continue documenting patient visits on paper.

Computational Biology

Turning Blood into Gold: The Wellness Chip

Larry Gold’s SomaLogic detects thousands of protein biomarkers with unprecedented sensitivity and specificity.

BY KEVIN DAVIES

Larry Gold has 1,100 proteins on a chip that he believes could be used to indicate disease.
MATT STAVER

BOULDER, CO—Fourteen years after conceiving a tool to discover and measure protein biomarkers, Larry Gold and his colleagues at SomaLogic are poised to see their first diagnostic—a lung cancer blood test licensed to Quest Diagnostics—reach the marketplace, perhaps before the end of the year. This would be the first of a potentially extensive list of diagnostic assays under development for various cancers, cardiovascular disease, neurological disorders and neglected diseases. Eventually, they could be brought together into a single, simple blood test: the Wellness Chip.

“We understand that longitudinal ‘omics is the ball game,” says Gold, the company’s chairman and CEO. “Whether it’s proteomics or lipidomics or transcriptomics, snapshots at time T are interesting but a series of snapshots at many times T are better for managing health.” Gold says his friends tease him, wondering how such a terrific hypothesis-driven scientist—he sold his former company NeXstar to Gilead in 1999 for about $550 million—is content to be a data-gathering ‘omics guy. Gold smiles and tells them: “I have a hypothesis: if we can measure more things than you, better than you, we will learn more than you know. That’s it! That’s what all ‘omics is about. That’s why I don’t dump on genomics. But I don’t think DNA sequencing or biopsies of numerous tissues are the best measurements to detect diseases in a way that is immediate and actionable.” Gold says SomaLogic aims to become “a longitudinal proteomic biomarker monitoring company.”

Blood Simple

What could be easier than monitoring an individual’s health over time via a blood test? In a few situations, screening blood biomarkers can be as simple as measuring a single protein, such as in pregnancy (HCG) or prostate cancer (PSA). But what if the early—and treatable—presence of cancer or heart disease could be gleaned in a similar blood test, measuring a critical subset or “signature” of circulating proteins unequivocally associated with the disease? The task begins by whittling down the total number of secreted proteins in blood—the number is around 3,400, or one-seventh of the human proteome—to the subset that represents a validated diagnostic.

Using proprietary reagents called SOMAmers, custom nucleic acids that target a specific protein, SomaLogic’s current technology can simultaneously detect and quantify 1,100 human proteins (see “The Strength of SOMAmers”). “We’re a quarter of the way there,” says Gold, noting that the total number of blood proteins (including intracellular proteins released after cell death) is probably closer to 4,000. “1,100 is already an awful lot. Nobody else can do more than 20-30 at a time. For the moment, we have an opportunity to learn a lot of medicine and biology quickly. Every time we’ve added proteins to the chip, the performance gets better.” The “Wellness Chip” (the term is trademarked) refers to measuring all 1,100 proteins in one assay, providing information on all diseases on the same chip. Of those 1,100 proteins, Gold says one third have already turned out to be markers in various diseases or indications. He shows me a wall chart in which all of the current biomarkers are laid out horizontally like a bar code, with diseases grouped vertically. The key markers are color coded for each indication. Interestingly, part of the blue oncology group overlaps with the red cardiovascular disease markers, which Gold says might be indicative of inflammatory pathways. The green markers at the bottom are what Gold calls “the horse****”—pre-analytic variation largely due to sample acquisition and handling differences.

The Strength of SOMAmers

Larry Gold and Craig Tuerk invented aptamers, short oligonucleotides that can bind proteins, at the University of Colorado in 1989. The first aptamer drug, produced by Gilead after acquiring Gold’s company NeXstar in 1999, was called Macugen, for the treatment of age-related macular degeneration. (The drug was successful, although Gold concedes that Genentech’s Lucentis, which targets the same receptor, is also a very good drug.)

Shortly after Gilead bought NeXstar for about $550 million in July 1999, Gold was furloughed. But he had already started researching new uses for aptamer reagents. Gilead allowed Gold to buy back the diagnostic rights to the technology, which formed the basis for SomaLogic.

SomaLogic developed a new class of aptamer reagents, which they call SOMAmers (the term stands for “Slow Off-rate Modified Aptamers”). SOMAmers are made of DNA containing modified nucleotides with unique chemical and kinetic properties. Each SOMAmer contains a unique stretch of about 40 modified nucleotides, with a total library size of about 10^15 different species. With so much variation to choose from, and a development process designed to select against non-specific binding, a single SOMAmer can combine the specificity of two antibodies. Gold explains:

“Why do people do ELISA assays with two antibodies instead of one? The reason is one antibody can grab a protein, but the binding affinity (Kd) is such that the antibody will also bind to other [more abundant] proteins with lower affinity. There are … logs difference in protein concentrations in blood and … logs difference in affinity. A monoclonal antibody’s (mAb) specificity is usually based on Kd; you might end up measuring albumin or ferritin [two prevalent proteins], which you don’t want. If you use two mAbs, you get to multiply the specificities.”

SomaLogic has finally been able to reproduce the specificity of two antibodies in a single SOMAmer reagent, in a way that allows for multiplexing literally thousands of SOMAmers on a single array. Gold admits solving that problem was hard, but “all the biomarkers are likely to be down in the weeds at very low concentrations,” he says. “When things weren’t working, we had to change. The previous aptamer technology wasn’t good enough. We didn’t lose heart. We kept funding coming, and it worked.”

SOMAmers can be generated in weeks to virtually any given target. After selecting the SOMAmer to detect a specific protein, protein levels can be measured by combining samples with all the specific SOMAmers. After the free SOMAmers are discarded, the bound SOMAmers are released, producing fluorescently tagged SOMAmers ready for high-throughput detection using microarray technologies (Agilent is used the most at SomaLogic), which in turn gives a readout of the identities and concentrations of the proteins in the original sample. K.D.

Big Business

Gold has assembled an experienced executive team to explore the full range of diagnostic and research applications for the SOMAmer platform. Among his key colleagues are two ex-Pfizer executives: Steve Williams (chief medical officer) and Nicholas Saccomano (chief technology officer). (Ed. Note: Saccomano was on the cover of Bio-IT World in April 2005 while Pfizer’s senior VP global research technology.) Mark Messenbaugh, SomaLogic’s director of corporate strategy, joined the firm three years ago, having previously worked as a lawyer and on Al Gore’s 2000 presidential election campaign. “My guy lost. I wrote a lot of those losing briefs,” he admits. While working in the non-profit world, Messenbaugh went to hear Gold speak at a local business meeting, and was instantly hooked by Gold’s vision for the future of health care. “I was smitten!” he admits. Messenbaugh followed Gold to the elevators and asked for a job, which he eventually took just as the SOMAmer technology was maturing. The deal with Quest Diagnostics, worth $15 million at the time, was signed in 2005. “We’ve raised a lot of money here without any revenues,” says Gold. (The first SOMAmer on the market is actually part of a “hot-start” PCR kit sold by New England Biolabs.) Now Messenbaugh and colleagues are laying out the longer-term vision. “How do we move toward the Wellness Chip?” Pharma customers clearly like the technology, using SOMAmers to study basic disease biology, drug effects, target discovery and selection. “We recognized that the tool is more powerful than we as a small company can ever make use of completely,” says Messenbaugh. “Pharmas can create value out of this, but how do we enable that without standing in our own way?” In one study with Bristol-Myers Squibb, Gold says analysis of blood samples before and after administration of an anti-angiogenesis drug candidate revealed some delayed responses. “We saw a pattern of what was coming within a month to help understand the mechanism of action,” he says.
SomaLogic has struck deals with Japan’s Otsuka to use SOMAmers for target validation in animal models, and NEC to deliver data analysis tools and, ultimately, health information via cloud-computing services, among others. “Discovery is quite easy here,” says Messenbaugh. “We can do broad-based discovery on virtually any clinical question. So far, the successes are outnumbering failures—by a lot.” Messenbaugh and Gold recognize the need to stay disciplined. “This tool will be great for basic science and understanding biology,” says Messenbaugh, “but our core function is driving diagnostic tests into the market. We have to ask: Is the clinical indication of value? If not, we have to think about our resources.” They see a broader impact of SOMAmers for the advancement of medicine. “Could some gene be overexpressed and thus be a good target for drug development? I think there’s enormous hope for neglected diseases,” says Gold. “Wouldn’t it be nice if we could do proteomics on people with single-gene mutations and find something that helps understand the biology or helps with ideas about therapeutics?” SomaLogic has programs looking at ALS and Duchenne muscular dystrophy, but as Gold says, “We have to be careful not to be drawn away from our end-goal of powerful, simple, and fast diagnostics.”

Rule of Four

Larry Gold views SomaLogic as a longitudinal proteomic biomarker monitoring company. “You have to partner with someone interested in IT,” he says. “You’re not going to sit around with protein measurements and expect the person to compare this year’s data to last year. It’s about handling the informatics around vectors as an aid to health and disease management.”

“Our bioinformatics guys have developed their own tool set necessary to do blood-based proteomics. We’re committed to developing a bio-IT tool set for our end users as well. It’s all going to be about decision support,” explains Mark Messenbaugh, SomaLogic’s head of corporate strategy. “It’s not just the array anymore. It’s the data set and the filter for the data set. We’ve got to take that framework into the health care
world.”

“Google thinks you can get there with a set of non-physical measurements. We think the key physical measurement is proteomics,” says Gold. “The algorithm part is actually not the thing that limits the enterprise. That’s figuring out how to make the measurements. We were ready to do algorithm development years ago. We just didn’t have the data.”

The SomaLogic informatics team is a small unit of four staff led by Dom Zichi. “A key strength is that we’re familiar with the measurement devices. We know the pitfalls. It’s an evolving technology,” says Zichi.

Ultimately it comes down to understanding the protein differences between cases and controls: what is real and what is an artifact resulting from how the samples were collected, handled, and stored, as well as what might be attributable to a comorbidity and not the disease in question. Zichi’s team builds the classifiers to distinguish the two groups using various machine-learning algorithms: Bayesian classifier, random forests, clustering and multi-dimensional scaling, or PCA (principal component analysis). “There’s no one right way,” he says. In most cases, a subset of markers “should be sufficient for most things we’re looking at.”

An ongoing challenge is understanding the relationship between protein levels and the manner in which the blood samples are collected. Major variations can hinge on the type of tube, needle gauge, and speed of sample collection. The rate of blood flow impacts the shear on the platelets, which in turn can result in a tenfold difference in some proteins.

“We hope this work will define a protocol for collection, a foolproof way to collect a sample. We’re still developing a best practice for sample collection,” Zichi says. “For example, PCA is helping to get a handle on which analytes move in tandem with abuses of the blood sample (cell lysis). We’re starting to understand certain signatures.”

“The Holy Grail is to recover what the analyte levels were prior to the abuse,” says Gold. “You don’t want to [discard] the valid markers [hidden in the sample handling variability]. You’ve got to figure it out.” K.D.
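For readers who want a feel for what "analytes that move in tandem" with sample abuse might look like in practice, here is a minimal Python sketch. It is an illustration under loud assumptions: the analyte names and numbers are invented, and it uses plain Pearson correlation as a simpler stand-in for the PCA that Zichi describes. Nothing here represents SomaLogic's actual tools or data.

```python
# Illustrative sketch only: synthetic numbers and hypothetical analyte names.
# Pearson correlation is a simple stand-in for the PCA mentioned in the
# sidebar; it is not SomaLogic's method.
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def tandem_analytes(levels, handling_marker, threshold=0.9):
    """Return analytes whose levels track the handling marker across samples."""
    reference = levels[handling_marker]
    return sorted(
        name
        for name, values in levels.items()
        if name != handling_marker
        and abs(pearson(values, reference)) >= threshold
    )


# Hypothetical relative levels across six samples (made-up values).
LEVELS = {
    "lysis_marker": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # rises with cell lysis
    "analyte_a": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],    # moves in tandem
    "analyte_b": [5.0, 4.8, 5.1, 4.9, 5.2, 5.0],     # roughly flat
    "analyte_c": [6.0, 5.1, 3.9, 3.0, 2.1, 0.9],     # moves inversely
}

flagged = tandem_analytes(LEVELS, "lysis_marker")
print(flagged)
```

In this toy data set the flat analyte is left alone, while the two that rise or fall with the hypothetical lysis marker are flagged as candidates for handling artifacts. A real analysis would involve far more analytes and samples, which is where a method such as PCA earns its keep.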
Information Model

The release of the lung cancer test is in Quest’s hands. Says Messenbaugh: “It’s an LDT [lab-developed test]. They’ll do it at the pace they consider right.” The test will enable early detection of lung cancer, providing an indication whether nodules are cancerous. Another test for pancreatic cancer has also been licensed to Quest. An ongoing challenge is to standardize the methods for blood sample collection and analysis to eliminate variability as much as possible (see “Rule of Four”).

“One would hope that as you add more markers, you get more perfect, but there’s an asymptotic plateau,” says Gold. Having studied some 12,000 blood samples to date, SomaLogic scientists have concluded there’s a lot of redundancy in biology—many markers simply shift up and down in tandem with others. To better understand the underlying biology, Gold is learning about KEGG, GO and other pathway tools.

Ultimately, Gold sees his business model as “an information model,” especially if one uses the term “longitudinal ‘omics.” He says: “You do your annual test, get your 1,100, 2,000, or 3,000 data points. The computer sends you a note, ‘Nice job, see you next year. We didn’t see anything.’ Or ‘Go see Dr. Finklestein, because you need your [whatever] examined.’

“Medicine will change over the next decade: people who get sick will be able to enter the medical system more effectively than they do today, because they’ll have early, even pre-symptomatic access to real information. Nobody’s got time to think the way you do about your own health.”

Anatomy of a SOMAmer. S: SOMAmer, Slow Off-Rate Modified (ssDNA) Aptamer (~40nt); F: Fluorophore; PC: Photocleavable group, o-Nitrobenzylether; B: Biotin.

Computational Biology

Open Source Solutions for Image Data Analysis

From neurons to nematodes, the challenges in data analysis remain pervasive.
BY OLIVIER MORTEAU

Ron Kikinis, who runs the Surgical Planning Laboratory (SPL) at Boston’s Brigham and Women’s Hospital, admits that his research field is privileged when it comes to the tools that have been developed in the past two decades for neuroimage analysis. “It’s probably the most advanced area in terms of image data analysis,” he says, which he attributes to a long-term effort by the NIH to fund projects in neuroimaging. Imaging analysis exists for clinical applications in many other organs, but there has not been as much funding to develop post-processing for those applications, which consequently tend to lag behind neuroimaging applications. But that doesn’t mean that neuronal image data analysis technologies cannot be improved. For example, most technologies have focused on group comparison in like healthy brains. “A lot of tools can do an incredible job in finalizing this type of data. However, as soon as you go into brain pathologies, the technology available is significantly less robust,” says Kikinis. Advances in bioimaging devices, which are producing larger volumes of data of ever greater complexity, mean “we’re drowning in data,” he says. Images generated by magnetic resonance imaging (MRI), CT scans and positron emission tomography (PET) are typically 3-D or 4-D, where the fourth dimension is time, contrast uptake, or some chemical parameter. “How do you process and analyze data to the point where you see the information that you are interested in?” he asks. “That usually means some form of processing that consists of throwing away a lot of data, until the only data left are what you are interested in.” The key is a combination of acquiring high-quality data by expert scientists and post-processing using relevant algorithms.
“The point of post-processing is not to decrease the storage requirements—although it typically reduces data files of several gigabytes to just a few kilobytes—but to expose the relevant information in the context of a particular task.”

Visualization of a brain tumor using 3D Slicer.

High-Throughput Imaging

Anne Carpenter, who directs the imaging platform at the Broad Institute, says that extracting key information is a task inherent to bioimage analysis. “That is just what image analysis is—converting a large amount of digital information into a more manageable amount of the most critical information,” says Carpenter. Because her focus is high-throughput screening (HTS), she uses microscopes that generate static 2-D high-throughput images. The data are usually less complex than those generated by medical imaging devices like MRI or CT scans. “In HTS, the goal is to take millions or hundreds of thousands of images and identify the small percentage of them that has the characteristics of interest. Conceptually, that’s very simple, but the challenge is actually in doing it,” says Carpenter. Bioimaging and medical imaging possess separate challenges. The structure of the human brain doesn’t vary much from patient to patient. But studies of the nematode (Caenorhabditis elegans), for example, might involve organisms that can curve upside down or backwards. The cardinal features in one image analysis project can vary from one experiment to another, says Carpenter. The same is true with cultured cells. “You can’t align them to each other in the same way that you can align a brain to another brain,” says Carpenter. From her viewpoint as a cell and computational biologist, the challenge of bioimaging merely reflects the level of physiological complexity of the biological system studied. Biologists are gravitating toward much more physiological systems than before, she says, preferring to work with whole organisms rather than cultured cells.
“However, many organisms do not yet have their own image analysis algorithms. C. elegans and zebrafish are two organisms we’ve been working on.” And cell biologists, who are often culturing two different types of cells together (because it keeps the cells in a more physiological environment), pose their own challenges. “Whenever you mix two cell types together, not only is it challenging to get the cells to grow happily, but it also presents image analysis challenges, because you are not tuning the algorithm just to fit one cell type,” she says.

‘Processing and analyzing data usually means throwing away a lot of data until the only data left are what you are interested in.’ Ron Kikinis, Brigham and Women’s Hospital

Seeing Solutions

As a tool for medical image analysis and post-processing, Kikinis and his colleagues at the SPL have been developing the 3D Slicer software package. “I’m a medical doctor, so I don’t write codes myself, but I’ve been working in interdisciplinary research with computer scientists for a quarter of a century,” says Kikinis. 3D Slicer has been developed with NIH funding with no restrictions over the past several years. “NIH wanted us to make this software available in a meaningful way, and from our point of view the most meaningful way was to go completely open source,” says Kikinis. “Think of 3D Slicer as a big chest of tools,” says Kikinis. For example, Kikinis and his colleagues rely on a proven imaging method called diffusion-weighted imaging (based on the local microstructural characteristics of water diffusion) that is used to study the organization of the brain’s white matter. 3D Slicer offers a suite of tools to do rapid post-processing of these images. “You would first filter for noise reduction,” he says, “then do an estimate of the diffusion tensor of the diffusion-weighted images, and finally do some form of phase streamline analysis inside the diffusion tensor file.” 3D Slicer offers a versatile solution for biomedical imaging analysis. Many software packages overlap various aspects of 3D Slicer, but none cover all of its applications, says Kikinis, and none are compatible with both Mac and PC. One offering, developed at the Digital Imaging Unit of the University Hospitals of Geneva, Switzerland, is OsiriX, which is the successor of Osiris on the Mac platform (Osiris for PC, still available for free, is no longer supported). Another software product, ClearCanvas PACS, was recently released by ClearCanvas. The 3D Slicer software package comes with a set of tutorials so as to be as user-friendly as possible. But 3D Slicer also targets developers using a plugin architecture. “We want to encourage people to develop their own things,” says Kikinis. Although designed for basic research applications, another interesting feature of the software is its potential to communicate with clinical devices via the Open Image Guided Therapy (IGT) Link. The connection enables 3D Slicer to receive and send information from a medical device, allowing it to control a scanner or a robot, for example. Specific clinical devices produced by companies such as BrainLab come with the Open IGT Link. Carpenter’s team built CellProfiler, a successful open-source software that won a Bio-IT World Best Practices Award in 2009 (see, “Carpenter Builds Open Source Imaging Software,” Bio-IT World, Jul 2009). The goal was to find an alternative to custom programs, such as MetaMorph (Molecular Devices) and Image-Pro Plus (Media Cybernetics), which can be challenging to adapt to a specific experiment, and to commercial software that is useful for screens in certain cells but otherwise limiting. “CellProfiler is the only high-throughput cell image analysis software in existence that is open source,” Carpenter says. Not only is it modular and therefore quite flexible for complicated assays, but it is also user-friendly; a beginner can mix and match modules and different image analysis functions. “We have users who do low-throughput experiments where they just count cells in a dozen or so images, and users who look for a very complicated phenotype and need to process images in a cluster and measure hundreds of thousands of images in a round-the-clock manner,” says Carpenter. Working with a number of nematode research groups, Carpenter is about to release a toolbox of robust algorithms for C. elegans analysis, and aims to do the same for the zebrafish. Her group has also completed a couple of screens in cocultured cells, using machine-learning to accomplish those projects. With two different cell types of different textures or size, it is easy to tune one algorithm to one cell type and a different algorithm to the other cell type. “But when you mix them together, both algorithms would have to work on the entire image, and an algorithm that’s very well fitted to one cell type might chop the other cell type into bits, and think that a portion of the large cell type might be a clump of a number of the other very small cell type,” Carpenter says. The group has developed an algorithm that “intentionally chops the cells into bits and then uses machine-learning algorithms to allow the biologist to train the computer to learn which pieces belong to which cell type. Then, optionally, you can piece the cells back together again using machine-learning.”

Olivier Morteau is a communication scientist at a Boston-based biopharma company.
One offering, P ‘ Expert Intelligence for Better Decisions Next-Generation Sequencing Generates Momentum: Markets Respond to Technology and Innovation Advances This report focuses on current and innovative NGS technologies, services and markets to answer such questions as; ) Which early NGS market entrants have been continually improving and updating their original systems? ) Who has introduced new scaled-down instruments to broaden the market? ) What Generation 2.5 systems featuring new detection technologies and single-molecule sequencing are now on the market? ) When will third generation of instruments led by nanopore technologies be entering the commercial feasibility stage? ) Why upstream sample handling is undergoing continual technological innovation? ) Which informatics providers are moving rapidly toward fully integrated systems to provide the rapid generation of actionable biological information? For more info & to order: InsightPharmaReports.com Insight Pharma Reports a division of Cambridge Healthtech Institute 250 First Ave., Suite 300, Needham, MA 02494 T: 781-972-5400 Toll-free in the U.S. 888-999-6288 F: 781-972-5425 Biot*58PSMET 2011 BEST S P E C I A L R E P O R T: PRACTICES Awards The Select Six Best Practices CONTENTS A s always, we are dedicating our summer issue to a showcase of the winners of our annual Best Practices Awards competition. The six winning entries—from CliniWorks, Collaborative Drug Discovery, GlaxoSmithKline, Merck, Novartis, and Oxford Nanopore Technologies—were introduced at the Bio-IT World Expo in April. Their stories are presented in the following pages. This year’s competition attracted 34 entries and prompted much frank deliberation among our judging panel, as they sought to identify the most important, novel, and potentially impactful collaborations and ideas from basic research and IT infrastructure to translational medicine. 
We believe that the winners of the 2011 Best Practices Awards offer some exciting stories that highlight the value of ingenuity and collaboration, impacting areas including drug discovery, diagnostics, and clinical research. We hope that some of these advances will have resonance across portions of the industry. As always, thanks to our panel of 13 guest judges for volunteering their time and insights. We congratulate not only our winners (and their nominating organizations) but everyone else who took time to enter this year’s competition. We will have news about the make-up and timing of the 2012 awards in our next issue. — The Editors

Best Practices Awards 2011

Enrollment Modeling Results in Productivity Gains for Merck

BY DEBORAH BORFITZ

Clinical and Health-IT Research
Winner: Merck
Nominator: DecisionView
Project: Clinical Enrollment Optimization

Pilot testing DecisionView’s StudyOptimizer provided Merck & Co. with ample evidence that the predictive analytics platform significantly improves the odds of clinical trials getting done on time and within budget. The clinical enrollment optimization and decision support tool has since become the standard for large phase II and III studies across legacy Merck, and will soon be the standard for big-investment studies across legacy Schering-Plough as well, says Christopher Heider, director of information technology at Merck. The two former rivals merged in late 2009.

Merck expects to receive the same return on investment across the entire portfolio of trials that it observed in the pilot studies, says Heider. These include a reduction in overall cycle time variance by approximately two to eight weeks, reduction in the time trial managers spend aggregating study data by roughly 50%, improvement in timelines and accuracy of reporting study data to management by 50%, and reduction in the time trial managers spend identifying recruitment and data cleanup issues by 20%.

The insights and productivity gains Merck has achieved using the enrollment modeling capabilities of StudyOptimizer were recognized in April with a Bio-IT World Best Practices Award in the category of clinical and health-IT research.

StudyOptimizer is a “leading-edge,” cloud-based application for planning, tracking, and optimizing clinical trial enrollment performance, says Heider. Eight of the ten top global pharmaceutical companies are now customers. Traditionally, patient enrollment projections and course correction strategies were based on the experience and intuition of study managers with inconsistent and often costly consequences. Merck previously used a custom solution that looked only at first patient enrolled/last patient enrolled and the number of sites. It had no way to model additional variables such as the impact of additional sites, vendor tactics, and screen failure ratio differences between geographies.

The tool automates the business process of enrollment through a collaborative platform, allowing headquarters and regional trial management teams to work together to create realistic enrollment plans, validate plan assumptions, test multiple scenarios, and approve a baseline against which performance will be monitored, says Linda Drumright, president and CEO of DecisionView. Underperforming sites can be quickly pinpointed and closed, and rescue sites identified to keep studies on track.

Trial Testing

Importantly, StudyOptimizer captures current enrollment plans and historical enrollment metrics in a single database, updated from the organization’s clinical trial management system nightly. Study managers no longer need to aggregate data into Excel files to see overall study enrollment progress. As part of the project with DecisionView, Merck centralized data loads from disparate sources—interactive voice recognition, electronic data capture, and central lab systems—enabling creation of this single “source of truth,” Heider says.

The “real value” of StudyOptimizer comes from underlying algorithms that produce usable charts and graphs, allowing the impact of various strategies to be visualized, Heider says. When several targeted countries dropped out of one diabetes study during enrollment with only seven months remaining, for example, Merck’s Global Trial Optimization (GTO) group was able to use StudyOptimizer to develop three recovery strategies using validated assumptions about new and existing countries: the number of additional sites that could be brought on board, site-ready ramp-up time, and fluctuations in screening rates during winter holidays. Seven months later, the diabetes trial finished enrollment within three weeks of projections.

Feedback from Merck’s GTO group, the primary users of StudyOptimizer, has been extremely positive, says Heider. StudyOptimizer gets a daily data feed from Merck, which is loaded into the application to update projections, estimate completion dates, display alerts, recalibrate the forecasting model, and execute many of the tasks once performed manually. Trial managers can thus concentrate their efforts on analyzing trends and spotting potential problems.

Chris Heider, director clinical trial operations IT, Merck; David Hilmer, director of sales, DecisionView
MARK GABRENYA

Clinical Imaging: Focus on Service

BY ALISSA POH

IT & Informatics

MARK GABRENYA

relevant health care standards, including DICOM and CDISC. When deployed at clinical trial sites, ImagEDC enables clients to produce images compatible with trial requirements, without additional processing from the responsible CRO. These clean, [patient] de-identified images are then stored in a local repository that includes a tracking service to record their receipt and other workflow events.
One might figure that physical bandwidth for image transfer between Novartis and a study partner could be a bottleneck, but Baumann and Snyder disagree. “Several generic and specialized solutions [to maximize bandwidth] work very well, and we don’t seek to supplant these with ImagEDC. Data format and quality issues are far more pertinent.” The approach is also cost-effective: with NIAI and ImagEDC replacing manual image reads, Novartis estimates that it has reduced the cost of each applicable clinical trial by about $80,000. “We were interested to share our success story with NIAI and encourage adoption of ImagEDC, to promote interoperability between sponsor and vendor infrastructures,” says Snyder of Novartis’ decision to participate in this year’s Best Practices competition. “Given the excellent work submitted by our competitors, it was definitely gratifying when the judges announced our win; it’s validation that we are working to a valuable purpose.” Snyder, Baumann, and their colleagues at Novartis are also organizing a Pharma Image Exchange interest group to help govern the rapidly evolving landscape of image processing, “We hope that SOA-based tools like ImagEDC will be increasingly adopted, and that we’ll see an encapsulation of more image processing and workflow functions as services,” Snyder says. “Service-based exchange of data quality requirements, for example, is an important next step.” x www.bio-itworld.com+6-:|"6(6452011 #*0t*5 803-% [ 33 ] CONTENTS ovartis’ open-source platform, Winner:/PWBSUJT*OTUJUVUFGPS ImagEDC, first rolled out in 2010 #JP.FEJDBM3FTFBSDI/*#3 (see, “Novel IT Platform Helps Project: Novartis Image Analysis Novartis Gain Control of Clinical *OUFSGBDFBOE*NBH&%$ Imaging Data,” %LRv,7:RUOG, Nov 2010), continues to make waves in the IT and implementing infrastructure.” informatics world. 
It garnered a Best Until recently, Novartis researchers Practices Award (IT & Informatics cathad to deal with a “closed box” workflow, egory) at this year’s Bio-IT World Expo, in where several different parties were inconjunction with Novartis Image Analysis volved in any one trial. Each—from the Interface (NIAI), the company’s fully auimaging CRO to core labs—had its own tomated image analysis workflow system. systems and processes, with no standards ImagEDC, a nifty combination of enabling decentralized data storage or service-oriented architecture (SOA) and removal of patient-identifying informagrid computing, gives researchers greater tion. This was an inefficient process, and control and ownership of imaging data potentially compromised research quality. across multiple clinical trials. Basically, it enables smooth data transfer between trial partners, using caGrid-enabled Transparent, Trackable Data Web services for high performance and The imaging team at Novartis developed security. a plan to manage clinical trial data that According to Stefan Baumann, head of would be transparent, trackable, and Novartis’ clinical imaging team, and Josh easily configurable, with real-time qualSnyder, an imaging infrastructure expert ity control enabled through faster image at the company, incorporating innovative transport between study partners. They image processing techniques—even into turned to caGrid, an open-source middlesmall, exploratory trials—is currently no ware product capable of supporting parteasy task. “If you look at other industries, ners with different IT proficiency levels such as travel or banking, interoperability and budgets, and compliant with the between data sources and consumers has grown tremendously in recent years, and the resulting ease of data exchange has created huge advances in data driven applications,” Snyder says. 
“Not so for imaging; we believe there is significant opportunity in this space.” Meanwhile, there are increasingly complex needs that come with this opportunity—data quality requirements, interface standards, and workflow modularization, to name several. ImagEDC offers the flexibility to adapt as these requirements change. “We’ve delivered not just a solution for interoperability, but a working, open source reference software package,” says Snyder. “Vendors Thierry Cladé, solution architect, Novartis; and other sponsors can use this Stefan Baumann, head of clinical imaging, NIBR to accelerate their own efforts at Best Practices Awards 2011 GSK’s Helium Rises to the Top F BY ALISSA POH ,OPXMFEHF.BOBHFNFOU Winner:(MBYP4NJUI,MJOF Nominator: $FJCB4PMVUJPOT Project: )FMJVNJO&YDFM"/FX1BSBEJHN GPS%BUB*OTJHIU that researchers would find Helium’s “wrapper” comfortingly familiar, while the tool retained Spotfire’s functionality. Helium mines and reveals relationships between integrated data stores. Based on a data’s “type”—say, a compound, gene, target, or project code— Helium suggests complimentary data from disparate sources. For instance, if a project code is entered, Helium prompts scientists to retrieve associated compound numbers and places these in a second column. If the scientist clicks on this column, Helium offers data relationships such as “Compound Number to Structure,” or “Compound Number to Biological Result.” Helium can thus generate a vast lexicon of data mashups, all via commands in plain English. Researchers need not know where data is stored, its format, or even the specifics of running a query. 
Helium’s advent enabled the retirement of Richard Bolton, strategic IT portfolio manager and Ashley George, two key systems within director of strategic IT portfolio for discovery, GlaxoSmithKline; discovery: a toolset for Tom Arneman, president, Ceiba Solutions retrieving biological data from GSK’s in-house systems; and probably overlapped in a lot of cases.” a bespoke chemistry spreadsheet. Both Unsurprisingly, the idea of creating a of these were “complex to maintain and “Swiss army knife” approach to all of the required significant training,” says a company’s SAR needs proved popular. company representative; scientists find The first version of Helium was based on Helium much more flexible and intuitive. TIBCO Spotfire, but the average bench Developing Helium involved plenty of researcher found Spotfire difficult to end user ownership and interaction, acnavigate. Then in 2009, GSK purchased cording to GSK. A senior researcher from ChemAxon’s suite of tools—JChem for discovery headed a group of 10 users covExcel, Instant JChem, and JChem Carering all disciplines and sites within the tridge—and modified Helium to utilize company’s R&D; the group met weekly Excel’s spreadsheet format. The idea was MARK GABRENYA CONTENTS amiliarity breeds contempt, it’s said, but not in the case of GlaxoSmithKline’s most recent tool for SAR analysis: Helium, which has the ubiquitous Microsoft Excel at its core, and received %LRv,7 :RUOG V Best Practices Award in the Knowledge Management category this year. Helium was first conceived several years ago as GSK sought to streamline data access for its scientists. Then, GSK managed data based on how it was stored, not how it was used within workflows. This resulted in “siloed” datasets requiring a slew of laborious steps before scientists could use the information. 
Researchers were left with a mishmash of tools and resources that, as a GSK representative puts it, “did some things really well but [34 ]#*0t*5 803-%+6-:|"6(6452011 www.bio-itworld.com to review Helium’s progress and put the product through its paces on “live” data. GSK opted for a gradual, “viral” release of Helium in 2010. Users passed Helium along to peers if they felt the product would be useful, and this word-of-mouth approach “actually worked very well.” Currently, Helium is employed by over 1,400 users in GSK’s discovery domain, a number expected to rise to 3,500. This tool has “dramatically increased” productivity and scientific knowledge interchange, the company says, besides eliminating over 30% of IT infrastructure. Branching Out Plans are already afoot to commercialize Helium’s functionality for a broader market, starting with GSK’s biopharma and preclinical spaces. The company is working with IT company Ceiba Solutions on this active expansion of Helium into new domains, which involves updating core functionalities such as security, and will also necessitate preliminary feedback from new user groups. “Ultimately, R&D IT leadership strives to provide their researchers with realtime, comprehensive insights across disparate data sources,” says Ceiba Solutions’ president Tom Arneman. 
“In developing Helium, GSK realized this goal, drastically reducing license fees for point search portals, and saving scientists valuable time and focus.” Ceiba nominated the product for Best Practices consideration this year, as “Helium is a visionary solution to a very real and growing need for the industry.” A GSK representative admits that they had few expectations with regards to Helium’s entry, as competition in their category was “very strong indeed; we were surprised and very pleased at the announcement [of Helium’s win].” Like many of its pharmaceutical peers, GSK is moving away from huge data stores, and adopting Web 2.0/3.0 for sleeker data integration, while requiring minimal specific training for its scientists. Helium, designed to be sophisticated yet user-friendly, will be “a crucial element in effecting this cultural change.” x Accelrys Pipeline Pilot Guides ONT’s Nascent NGS Data Handling A BY KEVIN DAVIES lthough still in stealth mode, Oxford Nanopore Technologies (ONT) recently revealed details of the GridION hardware that will form the basis of its next-generation sequencing technology as well as protein analysis and other applications. And as its Best Practices Award shows, it has been laying the groundwork for an effective and flexible informatics solution as well. 3FTFBSDIBOE%JTDPWFSZ “In the face of staggering estimates for the all-inclusive cost and complexity of NGS analysis, simply providing a new instrument is only half of the story,” says ONT senior scientist Richard Carter. The British company believes in offering simple ways for scientists to analyze NGS data while retaining the flexibility to adapt to “a rapidly shifting landscape of analysis methods and algorithms.” After assessing several commercial and public options, ONT elected to partner with Accelrys, agreeing to offer a version of the Pipeline Pilot NGS Collection as its recommended platform for NGS data analysis. 
Already deployed in some 1,300 institutions, the Pipeline Pilot workflow software appears to be a good choice. After all, “Pipeline Pilot is the computational underpinning for all Accelrys products,” says Clifford Baron, product marketing director. A bioinformatician himself, Carter collaborated with Accelrys to develop the NGS collection and created a series of workflows that reflect analyses performed on a broad range of publications. “It’s relatively simple even for a novice user of Pipeline Pilot to create useful and powerful applications using the NGS Collection,” says Carter. “In little time No Best Answer Launched in early 2011, the NGS Collection for Pipeline Pilot consists of some 150 components for analyzing NGS data, including quality assessment and processing, assembly and mapping, variant detection and profiling, and transcript and ChIP-Seq analysis. From ONT’s standpoint, Pipeline Pilot’s use of graphical application development and application integration provides the data management and algorithmic building blocks needed to develop customized NGS analyses in a relatively accessible environment. “It’s all about empowering your bench scientists,” says Carter. For example, one user of the system used the software to run an analysis of the publicly available German food poisoning Escherichia coli data. In a handful of mouse clicks and a couple of hours, a de novo assembly had been performed and the sequence compared with other strains in Genbank. Carter has created several NGS workflows using out-of-the-box Pipeline Pilot components. One calculates GC content in a genome and compares it to depth of coverage, helping scientists to spot outliers. Carter also integrated the popular Circos plot (now a standard component in the NGS collection) for visualizing genomic variation such as SNP prevalence Richard Carter, Oxford Nanopore or gene density. The software appears well suited to the properties of ONT’s technology when it is launched. 
ONT’s GridION is designed to acquire and analyze data in real time so that experiments can be monitored and adjusted as they are being performed. The range of analyses in the NGS collection facilitates the “Run until…” function, where users will choose to sequence until a pre-determined experimental outcome has been achieved. “Our customer surveys indicate that Pipeline Pilot saves 30-70% development time,” says Baron. Trevor Heritage, Accelrys’ senior VP, adds that he is “really excited” about the new NGS Collection. “We’re not prescribing an out-of-thebox packaged solution... We’re offering a workflow-oriented platform with the scientific brains to read the data in an intelligent way and do the analysis on top.” ONT believes that Pipeline Pilot can help address the rising challenges of data analysis and the level of expertise required to perform it. “That’s what makes it such a powerful and important tool,” says Carter. x www.bio-itworld.com+6-:|"6(6452011 #*0t*5 803-% [ 35 ] CONTENTS Winner:0YGPSE/BOPQPSF5FDIOPMPHJFT Nominator: Accelrys Project: %BUB1JQFMJOFTGPS/FYU(FOFSBUJPO 4FRVFODJOH"QQMJDBUJPOT and without requiring scientists to learn sophisticated analysis software, Pipeline Pilot helps scientists ask relevant, scientific questions about [NGS] data.” With the growing number of NGS software algorithms available, selecting and configuring the best tool is a tricky, even risky business. “There is no universal ‘best answer’ when it comes to NGS analysis algorithms,” says Carter. “Analysis of NGS data is far from a settled science.” What bioinformatics teams need, he says, are systems to compare analysis algorithms and organize data processing workflows for their various user groups quickly and efficiently, minimizing repetition. 
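One of the workflows Carter describes compares GC content with depth of coverage to help spot outliers. The article does not show how the Pipeline Pilot components do this, so the following is only a minimal Python sketch of the general idea, not the Accelrys implementation. The function names, the fixed window size, and the "more than `fold` times above or below the median depth" outlier rule are all assumptions made for illustration.

```python
from statistics import median


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def window_stats(genome: str, depth: list[int], window: int):
    """Split the genome into fixed windows and report (start, GC fraction, mean depth).

    `depth` is assumed to hold one coverage value per base of `genome`.
    """
    if len(genome) != len(depth):
        raise ValueError("genome and depth must have the same length")
    stats = []
    for start in range(0, len(genome), window):
        chunk = genome[start:start + window]
        cov = depth[start:start + window]
        stats.append((start, gc_fraction(chunk), sum(cov) / len(cov)))
    return stats


def coverage_outliers(stats, fold: float = 2.0):
    """Flag windows whose mean depth is more than `fold` times above or below the median.

    This threshold rule is a hypothetical stand-in for whatever test a real workflow uses.
    """
    typical = median(d for _, _, d in stats)
    if typical <= 0:
        return []
    return [s for s in stats if s[2] > typical * fold or s[2] < typical / fold]


if __name__ == "__main__":
    genome = "ATATATAT" + "GCGCGCGC" + "ATGCATGC"
    depth = [10] * 8 + [2] * 8 + [11] * 8
    for start, gc, mean_depth in coverage_outliers(window_stats(genome, depth, window=8)):
        print(f"window at {start}: GC={gc:.2f}, mean depth={mean_depth:.1f}")
```

In this toy input the GC-rich middle window is the only one flagged, which is the kind of GC-dependent coverage dip such a comparison is meant to surface.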
Providing Patients as a Service

BY ALLISON PROFFITT

Judges’ Prize
Winner: CliniWorks
Project: AccelFind

Nitzan Sneh, founder and CEO and Udi Meirav, executive chairman and co-founder, CliniWorks (MARK GABRENYA)

Cambridge, Mass.-based CliniWorks’ new software-as-a-service platform, AccelFind, allows real time clinical data mining and patient screening from medical records to streamline planning and recruitment of clinical trials. It is fully HIPAA compliant, protects patient privacy, and is capable of incorporating data of any source, format, structure and content. The platform’s promise caught the judges’ attention and earned it the 2011 Bio-IT World Best Practices Judges’ Prize.

AccelFind is a specialized natural-language processing platform with a vast conceptual terminology database, syntax and context analytics. “Our search engine could be compared to Google, but Google is looking for keywords… while in our case, what we’re looking for is relationship between words,” explains Nitzan Sneh, CEO.

Sponsors access the system to plan a recruiting strategy for a clinical trial. The platform converts existing medical records from any number of institutions (from databases, transcriptions, or scanned copies) and other notations (from doctors’ notes, nurses’ notes, or lab reports) into a unified and universally usable form using language rather than the structure of databases to decipher the meaning of medical data and place it accordingly. AccelFind then searches and analyzes the unified data against any set of inclusion/exclusion criteria, intelligently sifting through free text entries and accounting for context.

The system is “very sensitive to the meaning of vocabulary,” says Sneh. For example, AccelFind can distinguish effectively between the statements “patient has heart disease”, “patient expressed concern about heart disease”, or “patient has family history of heart disease”.

Researchers can screen millions of patients against a complex set of inclusion and exclusion criteria with instantaneous feedback. For example, Sneh says, “the system can screen the medical records of the entire population looking for a hemoglobin level between 7 and 9. Only 30% of the time is the result found in the lab results section, the rest of the time it’s everywhere: comments made by the physician, lab summaries, etc.”

AccelFind has put great emphasis on patient privacy, removing all data from HIPAA identification fields, not just patient name, date of birth and address from the structured fields of a medical record but also from any mention or reference that might be buried in any other part of a document. A site or sponsor can use AccelFind to scan the patient population before getting IRB approval, Sneh says, because the data is completely anonymous. “Users can freely search for anything they need because they won’t be exposed to any patient identifiers, so they can go and search before they go to the IRB. They only need IRB approval for those 30 [or so appropriate patients] that the system identified.”

Faster Feasibility Studies

This saves time and money, Sneh says. High quality, iterative feasibility studies can be done in a few days or even a few hours, rather than weeks or months. Recruiting can be compressed by 3-6 months, not only by shortening the front-end exploratory part but also by being able to target only the most promising sites with known and quantified availability of suitable candidates. The acceleration at each phase can accrue to a significant reduction in time to market, or faster decision to eliminate a drug candidate from the pipeline, leading to reduced clinical development costs (approximately $50,000 per day). For successful drug candidates the ROI is even higher: earlier revenue as well as longer patent protection (value can be in the vicinity of $1,000,000 per day).

The last year has been marked by rapid growth for CliniWorks. In December 2010, AccelFind successfully concluded a 140-study pilot of sponsored phase II or phase III studies. Sneh lists current customers including pharma companies like Novartis (in rare diseases), Merck (in oncology), CROs like Parexel, and hospitals including a health information exchange of 11 hospitals in Texas using the program for internal quality and safety studies.

With 10 employees in Cambridge and eight at a wholly-owned subsidiary in Tel Aviv, Israel, CliniWorks is small, but Sneh expects to hire five to six new staff by year end. He says that winning the Best Practices award was a very personal triumph for the team. “Many [employees] consider this prize as recognition for their own contribution. They are very proud.”

Tuberculosis Cloud Collaborations

BY ALLISON PROFFITT

Editors’ Choice Award
Winner: Collaborative Drug Discovery
Nominators: Tuberculosis Research Section, NIAID, NIH and The Global Alliance for TB Drug Development
Project: Collaborative Drug Discovery TB database

Sean Ekins, Collaborative Drug Discovery (MARK GABRENYA)

Collaborative Drug Discovery has developed a molecular library database to serve a network of over 100 tuberculosis researchers in the U.S. and Europe, helping their users mine and collaborate on tuberculosis data. Their efforts, nominated by the Tuberculosis Research Section, NIAID, NIH and the Global Alliance for TB Drug Development, earned them the 2011 Bio-IT World Best Practices Editors’ Choice Award.

With funding from the Bill and Melinda Gates Foundation and other investors, CDD collated at least 15 public datasets on Mycobacterium tuberculosis, representing well over 300,000 compounds derived from patents, literature and high throughput sequencing data.

“Any new chemoinformatic system should provide, at minimum, capabilities for fundamental data storage, retrieval, and analysis of diverse data originating from chemistry, biology, pharmacology, and toxicology activities,” explained the TB Alliance in its nomination. “Ideally such a system would be Web-based so that any participating laboratory could use it without further investment in hardware. The system should be intuitive so that new participants can learn the system with minimum training. In addition to fundamental chemoinformatic tools, such a system should be able to enhance collaboration among researchers in the same field, the community.”

Enter Collaborative Drug Discovery. “The Gates Foundation made us a grant to fund specific groups that needed to use the software for collaborations either within the institute or between institutes or between institutes and companies,” explains Sean Ekins, CDD’s collaborations director. CDD took their Cloud-based application and developed software specifically for the TB community. The initial grant was for two years awarded in 2008, but has been extended to five.

CDD’s database allows collaborators to share research data securely within and across organizations without the need to install and maintain complex software. CDD runs on a fault tolerant infrastructure providing redundant storage, compute nodes, power, HVAC, and backbone connections. The infrastructure is also redundantly secure, protected by multiple layers of host-based, network and physical security measures. CDD software runs on a MySQL database and was developed using the Ruby and Java programming languages. The tool was developed using an agile development process which uses an integrated design-build-test process.

CDD’s TB database fosters data archiving and selective sharing within the research community and enhances creation of computational models, said the Tuberculosis Research Section in its nomination.

TB Curation

Having public and private screening data available against M. tuberculosis enables researchers to analyze the biological activity vs. physicochemical properties of compounds in the database, said the TB Alliance. “Consequently, this database has also been used to build novel computational machine learning and pharmacophore models that could be used to filter other libraries of molecules to rapidly identify potential M. tuberculosis-active compounds.”

The software allows users to segment their data into vaults that enable sharing with specific collaborators, says Ekins. “But there’s another component of the database. There’s a public side where we have some datasets, and we’ve done annotations around TB, sort of curation of data from the literature around compounds,” he says.

“This award acknowledges two years of software development and support for TB research groups funded by the Gates Foundation, and is a credit to all CDD users and community members who have helped guide our technology over the past seven years in the cloud,” said Barry Bunin, CDD’s founder.

“This gives me a rare opportunity to publicly recognize the exceptional accomplishments of our software development and product team. We would like to thank our nominators and collaborators, as well as the editors of Bio-IT World for this prestigious award!”

Best Practices Awards 2011

Best Practices Entries

CLINICAL AND HEALTH-IT RESEARCH / KNOWLEDGE MANAGEMENT

Company | Nominator | Project

WINNER (Clinical and Health-IT Research): Merck | DecisionView | Clinical Enrollment Optimization
WINNER (Knowledge Management): GlaxoSmithKline | Ceiba Solutions | Helium in Excel: A New Paradigm for Data Insight
JUDGES’ PRIZE: CliniWorks | AccelFind
EDITORS’ CHOICE AWARD: Collaborative Drug Discovery | Tuberculosis Research Section, NIAID, NIH and TB Alliance (Global Alliance for TB Drug Development) | Collaborative Drug Discovery TB database
Abbott Laboratories | eLearning
Abbott Vascular | ClearTrial | Optimizing Budgets and Resource Demand Across the Clinical Trial Portfolio
Accelrys | Synthesis and Process Route Planning
Roche | ePharmaSolutions | Safety Letter Distribution (SLD) Application
Pfizer Business Information Systems, Business Operations, Pharmaceutical Sciences, Global R&D | Composite Software | Rapid Deployment Technology Program
Ochsner Health System | Orion Health | Implementing a System Wide HIE
Merck | Oracle | Enhanced ELN Performance via Oracle Exadata
DIA | Phlexglobal | TMF Reference Model
Harvard Catalyst, the Harvard Clinical and Translational Science Center | Recombinant Data Corp | Profiles Research Networking Software
National Cancer Institute’s Cancer Therapy Evaluation Program | SAFE-BioPharma Association | Research collaboration in the cloud: How NCI and Research Partners are using Digital Identities to Accelerate Drug
Rota Consortium, South Africa | Synexus Clinical Research | Effect of Human Rotavirus Vaccine on Severe Diarrhea in African Infants
PPD | PatientView
Pfizer Global Research and Development | Oyster Imaging Collaborative Portal
Genomics Institute of the Novartis Research Foundation | The Gene Wiki - community annotation of gene function

RESEARCH AND DISCOVERY / IT & INFORMATICS

Company | Nominator | Project

WINNER (Research and Discovery): Oxford Nanopore Technologies | Accelrys | Data Pipelines for Next Generation Sequencing Applications
WINNER (IT & Informatics): Novartis Institute for Biomedical Research | Novartis Image Analysis Interface and ImagEDC
FDA Division of Animal Research, Center for Veterinary Medicine (CVM) | IO Informatics | Species-independent drug toxicity and disease markers
BrainCells | Accelrys | CIVET: Cohort In/Ex Vivo Experiment Tracker
Strand Life Sciences | A global RNAi screen analysis leads to the identification of key regulators of heart function
London School of Hygiene and Tropical Medicine | Accelrys | Whole organism high-throughput drug screening of Schistosoma mansoni
Smithsonian Institution | Biomatters | Biocode LIMS
University of Texas Southwestern Medical Center at Dallas | Elucidation of evolutionarily stable, immunologically reactive regions of human H1N1 influenza viruses through integrative data analysis using the Influenza Research Database
UCLA Laboratory of Neuro Imaging | Isilon | Unified storage infrastructure
Janssen Pharmaceutica | I/NI-calls: a statistical search engine for the relevant genes in ’omics studies
University of Florida, Interdisciplinary Center for Biotechnology Research | ScaleMP | ICBR needed a local system that would allow scientists and researchers to submit large interactive jobs. However, the price point of large SMP systems was prohibitive to ICBR.
PPD | REMS Technology Solution (Risk Evaluation and Mitigation Strategy)
Janssen Pharmaceutica | Computer-based mechanistic disease model of schizophrenia to predict therapeutic effect of new investigative drugs
ERT | EXPERT
Selventa | Novel mechanism-based classifiers to predict patient response before availability of clinical treatment outcomes data

YOUR OPINION IS NEEDED!
The Cambridge Healthtech Market Research Group in conjunction with #JPt*58PSME, F$MJOJRVB and other partners is conducting an industry market research study on: “The Future of Clinical Trials” Findings will be released beginning in October 2011. If you are currently involved in, or will soon be engaged in clinical trials work then please join us in helping to design the survey by answering just 5 short questions. Click here to access this survey. For your time and input you will be provided results from the first study module released, and entered into a bonus prize drawing. Thank you. CHI PROFESSIONAL MARKETING SERVICES Market Research Group CLICK HERE TO ACCESS THIS SURVEY For vendor sponsorship information regarding this study contact: Alan El Faye $BNCSJEHF)FBMUIUFDI.FEJB(SPVQ#JP*58PSME BFMGBZF!IFBMUIUFDIDPN Next-Gen Data Genome Analytics for All 1BVMJOF/HJTQMBOOJOHPQFOTPVSDFPQFOBDDFTTBOBMZUJDTGPSUIFHFOPNFTUPDPNF S BY ALLISON PROFFITT CONTENTS INGAPORE—Pauline Ng’s office is the Genome building of the Biopolis science park in Singapore, a fitting home for one of the authors of the first published personal genome, that of J. Craig Venter, published in 2007 while Ng was a senior scientist at the J. Craig Venter Institute. Now Ng leads an expanding group of three bioinformaticists (she’s hiring!) at the Genome Institute of Singapore (GIS). Before her stint at the Venter Institute, Ng worked for Illumina as well as the Fred Hutchinson Cancer Center in Seattle, where she wrote the powerful SIFT algorithm (http://sift-dna.org), a widely used tool to predict the effect of a given amino acid substitution on protein function. “We put the algorithm on a Web server,” she said. “Ten years ago people would publish their algorithms, but they wouldn’t necessarily put them on a Web server. But my Ph.D. advisors were very emphatic, ‘You need to do this.’ That actually was very informative, because people used it. 
That opened it up for clinicians and geneticists to use the algorithm, instead of just pure bioinformaticists.” Ng believes that access is very important. “What’s happening is [sequencing is] accessible to academic institutions like GIS. We can sequence; we can analyze that data. The Broad Institute, University of Washington, Baylor—these are very highly regarded institutions with collaborations with a medical center. But if you’re anyone else, you may not have access to those types of resources.” In 2009, Ng co-authored a muchdiscussed Nature commentary outlining an agenda for personalized medicine in which they compared the results of two commercial consumer genomics tests. They found that the accuracy of raw data in both 23andMe and Navigenics tests was high, but one third of risk predictions (for five anonymous individuals) did not agree between the tests. A disappoint- [40 ]#*0t*5 803-%+6-:|"6(6452011 ing result for Ng. “At that time I though, wow, there’s something not quite right,” she said. “When you get a health diagnosis, you don’t consider it a prediction, you expect it to be correct. Just like you go to the doctor and he says, ‘Take this drug because you’re at risk for Pauline Ng heart disease,’ or something. But if you went to another doctor and they said something else, it would reduce the credibility overall.” GIS a Job Ng moved to Singapore in 2010, but hasn’t quite shaken her discomfort. “All of this together: working on individual genomes, making tools that are accessible to everybody, and just getting exposure to direct-to-consumer [tests]” has shaped what she now hopes to do at GIS: make bioinformatics accessible to everyone. Like SIFT, Ng’s next tools will be open source. “The plan is not to let just doctors access the software, but really anybody.” She acknowledges that bioinformatics is “a bit specialized,” but also believes that the patient is his own best advocate. 
She cites Hugh Reinhoff ’s work on his daughter’s DNA (see, “Hugh Reinhoff ’s Voyage Round his Daughter’s DNA,” %LRv,7 World, Sept 2010). “There’s someone with a huge self interest in finding out what is wrong with his daughter. That’s one example, but you can probably imagine all across the world there are families like this where doctors probably don’t have time or resources to do it. But if there truly is a $1,000 genome, that means that for $5,000 they can get the full family sequenced.” Affordable sequencing is still a limiting factor, but Ng is confident in that progression. And the types of diseases that Ng hopes to address need full genome sequencing. “The 23andMe data, www.bio-itworld.com they’ve squeezed as much as they can from it. But the applications—cancer, Mendelian disorders— they’re tailored toward the rare variants or somatic variants which you need [to get] from sequencing.” She expects that to be easy enough to outsource in about two years. But sequencing and analysis—today at least—cost the same. “The problem is that right now, companies like Knome are actually charging the same amount for bioinformatics as they are for sequencing. If you sequence more individuals, I’d expect the bioinformatics to go down, but it’s the same price. That means the price is double! If we can make these tools online, accessible for free or at least at cost, I think I can get it to a tenth of the cost.” Ng plans to do the computation on the Amazon Cloud and, at today’s rates, expects a genome analysis to cost $500. She hopes that these price points will enable doctors and individuals to use genomics. “If we could say, OK, outsource [the sequencing] to these companies. You’re going to get a hard disk. Mail it to Amazon and get your results in a week.” Ng is not promising a magic cure, and doesn’t even think that this model should be the only one. She just hopes to drive prices down and open the market. “There’s never a guarantee of an answer,” she says. 
“Even with the software we write, there may not be a guarantee of an answer, but at least…” she pauses and begins again, emphatically. “We can definitely give you the basic annotation and provide the tools that everyone uses. And if it doesn’t work, then you go to an expensive company that really uses the same tools as the academics but with a couple of more bells and whistles. If you try our stuff first, at least you’ve invested only $500 instead of $5,000.”

Sequencing at ‘Biblical Proportions’
The University of Queensland’s Sean Grimmond gets the first peek at Ion Torrent’s new technology
BY FIONA WYLIE
This story originally ran in Australian Life Sciences.

BRISBANE, AUSTRALIA—Sean Grimmond, director of the Queensland Centre for Medical Genomics at the Institute for Molecular Biosciences in Brisbane, leads the first lab in Australia to obtain (premarket) the Personal Genome Machine (PGM) from Ion Torrent.

Unlike second-generation sequencing platforms, Ion Torrent’s technology forgoes optics, lasers, and cameras and instead quantitatively measures changes in pH generated by hydrogen ions released during nucleotide incorporation. “Relative to how much of a particular base is added, you get a quantitative difference in the amount of hydrogen ions released,” says Grimmond. Those pH spikes are translated into base calls and nucleotide sequence within a matter of seconds.

The PGM is essentially a sophisticated pH meter. The chip inside comprises millions of tiny wells for the samples sitting on millions of tiny electrodes. The PGM offers several advantages, says Grimmond. “The data file sizes are small, and the way it actually analyzes and measures the nucleotide incorporation is quick. Generating reads of about 120 bases takes less than two hours.” Moreover, he says, “Converting changes in pH directly into a base call means much smaller file sizes” than other second-generation platforms. “You really could run those machines pretty well all year without emptying the hard drives, whereas we run the SOLiD machine twice and then we have to move data to make more room.”

Scaling Up

The early PGM machines come with a “314” chip, containing about 1.2 million wells and matching electrodes. A newer “316” chip (6-8 million wells) is about to be released, and within a year Ion Torrent is planning to release the “318” chip, which will comprise some 25 million wells (Ion Torrent says it is aiming for read lengths of 400 bases). Each new chip offers a theoretical tenfold increase in sequence throughput. “Using the exact same machine and sequencing platform, you can go from generating 1 million base reads to 25 million reads, and with that many wells, we are getting into the 1-gigabasepair range of data in around two hours,” says Grimmond.

The SOLiD instruments generate about 100 Gb over two weeks and are still Grimmond’s preferred choice for sequencing human genomes. “But for smaller and more tractable sequencing applications such as the transcriptome or microRNAs, or candidate DNA mutation analysis, or microbial genomes, the PGM is ideal,” he says.

Grimmond leads Australia’s effort as part of the International Cancer Genome Consortium (ICGC). In the ICGC program, the PGMs are being used to validate patient mutations initially detected using SOLiD instruments and to address questions of clinical significance. “For example, we can now do very deep sequencing on samples from the tumor margins using primers that will detect every mutation found in the parent cancer, and in this way more closely define the risk of metastases—this is particularly critical in the case of pancreatic cancer,” says Grimmond. The PGMs are also helping to validate every DNA variant found in the cancer genomes.
“We can cut some corners to pick up some of those variants, but for the novel ones we really need to validate around 200 mutations per individual. Ion Torrent allows us to automate our primers, hone in on the regions that we think will have mutations, PCR them up and sequence them all on the chip and then move on to the next one quickly and easily,” says Grimmond.

“If they can make a silicon chip that determines DNA sequences the size they are now and sell it for ~$200, and you can generate enough long reads, it would be very easy to make a bigger chip that could generate a human genome in two hours,” says Grimmond. “The detection system needed is virtually already built—they just have to work out how to get the molecular biology down to fit in with the more and more sophisticated chips.” Grimmond predicts, “we will be reaching data sizes of ‘Biblical’ proportions in the near future—then you really might start seeing one on every bench.”

Next-Gen Data
Charges Continue to Fly over Ion Torrent Sequencing Licenses
Researchers are unhappy with how the technology was licensed and who got credit
BY KEVIN DAVIES

Six years after publishing details of the first commercially available next-generation sequencing (NGS) system, by 454 Life Sciences, Jonathan Rothberg and his colleagues at Ion Torrent have published the first results from a new desktop NGS technology today, also in Nature.

All 44 co-authors are (or were) employees of Ion Torrent or its parent company, Life Technologies, including Kevin McKernan, one of the architects of Life Technologies’ second-generation SOLiD platform, who recently left the company. The new paper includes an overview of the sequence (at tenfold coverage) of Gordon Moore, the co-founder of Intel and the author of the famous Moore’s Law concerning the growth of compute processing capacity.

Meanwhile, charges continue to fly over the origins of some of the key technology that Ion Torrent licensed from Stanford University’s Office of Technology Licensing (OTL). Recently, two scientists, Stanford’s Ron Davis and his former colleague, Nader Pourmand (University of California, Santa Cruz), complained publicly that Stanford OTL undervalued their technology during negotiations of an exclusive license to Ion Torrent. Now those complaints have sparked a strong response by another former Stanford colleague, Arjang Hassibi.

In an exclusive interview with Bio•IT World, Hassibi, now an assistant professor at the University of Texas in Austin, says he and others played a key role in the development of “charge sequencing” technology. In 2001, Hassibi co-founded a biotech company called Xagros Genomics with Pourmand. But the two men fell out over the demise of the company and the attribution of credit—or lack thereof—for results published in an important paper co-authored by Pourmand and Davis in 2006 (and cited in the new Ion Torrent Nature paper).

Those feelings were resurrected in the past month after Pourmand turned to Bio•IT World and other media to express his frustration with the Stanford-Ion Torrent licensing deal.

“It is very surprising to me that Nader is claiming right now that ‘Hydrogen generation [during DNA sequencing] is my patent, my invention,’” said Hassibi, repeating a quote from Pourmand in a Bio•IT World story. “They want to erase any memory of what happened before that. This I have a problem with.”

They Did Great

For Hassibi, it is a matter of principle and seeking fair credit for past intellectual contributions. Like Davis and Pourmand, he is receiving licensing fees from Ion Torrent, albeit less than the paltry $2,300 that irked Pourmand. “Let’s be clear: Ion Torrent hasn’t done anything wrong. Stanford hasn’t done anything wrong. Some scientists came up with a good idea. We [Xagros] failed. Another company—Ion Torrent—picked it up and they did great.”

The issue of credit for what Ion Torrent calls semiconductor sequencing—and potentially financial compensation down the road—has become more acute following Ion Torrent’s acquisition by Life Technologies in 2010 for $375 million (potentially rising to $725 million).

Among dozens of patents it has in-licensed, Ion Torrent acquired exclusive licenses to two related Stanford patents. The first—Stanford docket S00-157 “Charge Sequencing: A New Technique for DNA Sequencing and SNP Detection” (priority date October 2001)—lists Hassibi and Pourmand as the inventors. The second—docket S04-291 “Charge Alternation DNA Detection System” (priority date November 2004)—has three inventors: Pourmand, Davis, and Miloslav Karhanek.

“Based on the rule of ‘Success has many fathers, failure is an orphan,’ there will be many ‘fathers’ for this technology, including Nader,” said Hassibi. “However, he is completely distorting the story right now by claiming all the credit for himself. I believe this to be neither ethical nor constructive if [as he claimed] he wants to improve Stanford OTL’s licensing processes.”

Signal Detection

The sequencing squabble dates back to 2000, when Hassibi, then a Ph.D. student in electrical engineering at Stanford University, first met Pourmand, who was a postdoc at the Stanford Genome Technology Center (SGTC). Pourmand and Mostafa Ronaghi (now Illumina’s chief technology officer) were working with Davis on pyrosequencing, the sequencing technology that underlies the Roche/454 next-gen sequencing platform.

“I remember clearly, one afternoon in the spring of 2000, talking with Nader in the second floor of Stanford Center for Integrated Systems (CIS). He asked me whether we can detect any electrical signal in the DNA structure and if yes, whether I could build the electronics for it,” Hassibi recalled. Pourmand argued that DNA must have an associated charge, because it migrates during electrophoresis (the basis of Sanger sequencing).

The next time the two met, Hassibi proposed to place some DNA near an electrode and connect it to a high-impedance voltage amplifier to see if a length difference resulted in a different charge (or voltage) signature. Hassibi designed a setup in the IC design lab of his then advisor, Thomas Lee, using iron needles as electrodes and micro-titer plates, while Pourmand prepared magnetic beads with primed DNA attached to them.

“We placed a small refrigerator magnet on top of one of the needles to immobilize the beads (and DNA) on one electrode. We placed the electrodes in the polymerization buffer and added dNTP. What we saw wasn’t conclusive, but there were distinct fluctuations when polymerization was supposed to happen,” said Hassibi.

Hassibi and Pourmand, who both hail from Iran, began a close collaboration. Based on those early data, they filed a provisional patent in October 2001, which became a full patent the following year. “We initially called the technology ‘charge-sequencing,’ inspired by pyrosequencing,” said Hassibi. [The Stanford docket S00-157 and provisional patent both had this title.]

But Hassibi argued that “what we are seeing is not the DNA charge but essentially some perturbation in the charge equilibrium near the DNA (and near the electrode).” This, he says, is why the title of Hassibi and Pourmand’s US patent 7,223,540 is “Transient electrical signal based methods and devices for characterizing molecular interaction and/or motion in a sample.” Hassibi later called the technology Charge Perturbation Signature (CPS).

The core of Hassibi and Pourmand’s original patent, said Hassibi, is this: “If polymerization happens near an electrode, you see an explosion of ions. ‘Ion Torrent’ is a perfect name for a company commercializing this specific technology, although I am not sure if they named it because of this. Now, to simplify it, one can say it is pH. Initially, when we were marketing Xagros, we said, ‘It is negative charge.’ But the explanation is more complicated than that. These ions move, you have diffusive processes, then you detect it if they get to an electrode.”

Fall Out

In 2001, Hassibi and Pourmand decided to jump on the biotech start-up bandwagon and form a company called Xagros Genomics (named after the Zagros mountains in Iran, but with the obligatory ‘X’ instead). Xagros exclusively licensed docket S00-157 from Stanford and got funding from Tempo Ventures. Hassibi’s task was to create the sequencing hardware, sensor and the semiconductor chip, while Pourmand was in charge of assay development. A strong advisory board included Davis, Lee, and Berkeley’s Richard Mathies. Hassibi said they obtained funding because “CPS was semiconductor-compatible... the story was music to the investors’ ears.”

Pourmand, a family man, remained affiliated with Stanford, but Hassibi, still a graduate student, opted to join Xagros full-time. Sia Ghazvini joined the company from Combimetrix as CEO, with Dean Hafeman, a founding scientist at Molecular Devices, becoming chief technologist.

According to Hassibi, as the assay development lagged other aspects of the technology, management decided to put Hafeman in charge of that project. “Nader first agreed and the project got on track partially and we all were very hopeful that we would pass this speed bump,” says Hassibi.

One day, however, Hassibi came to work to find that Pourmand had cleared out his desk. “He said that due to health reasons, he could not work in a start-up anymore,” Hassibi said.

Hassibi and Pourmand later met in person at Stanford, but Pourmand was unhappy with the way Xagros was operating, and had decided to rejoin Davis’ group at Stanford. Pourmand encouraged Hassibi to do the same, which infuriated Hassibi. “I told him I had quit my Ph.D. and put 2.5 years of my life without back-up plans or getting any academic credit—and now he wants me to come back? I also mentioned that the investors relied on us two and had put serious money into Xagros and we were responsible for the other employees.”

The two agreed to try to co-exist. “Our last handshake was that if we ever decide to publish this work, we would do it together,” said Hassibi.

But that did not happen. Hassibi said Pourmand refused to hand over government-funded projects on which he was listed as the PI, or to negotiate the status of his outstanding shares. A second round of company financing fell apart, and Xagros finally went out of business in 2004. Ultimately, the Stanford patents went back to Stanford, while other patents were abandoned (some related to CMOS chips), as there was no money to support them.

PNAS Envy

Hassibi eventually returned to Stanford to get his Ph.D. (designing CMOS chips for biosensing and sequencing), but had no contact with Pourmand and Davis. It was during his last semester in 2006 that a colleague showed him a paper by Pourmand and colleagues in the Proceedings of the National Academy of Sciences, contributed by Davis, entitled “Direct electrical detection of DNA synthesis.”

“I was appalled and could not believe what I was reading in the paper,” said Hassibi. “There was no reference to me or Xagros—that pissed me off. Much of the data and the methods were developed at Xagros by me and others and [Pourmand] simply published it without acknowledging any of it.
It seems that they came up with technology by themselves and they are trying to erase [our contribution].” Hassibi has not spoken to Pourmand since the publication of the PNAS paper.

Hassibi subsequently learned that Pourmand and Davis had filed an incremental patent “to overshadow” their 540 patent. “They should have involved us as co-inventors,” said Hassibi. Pourmand had also resubmitted Xagros’s SBIR grants to receive NIH funding, which further upset Hassibi, but as he was still an international graduate student on an F1 visa, he decided not to pursue any further action.

Pourmand Response

Contacted by Bio•IT World, Pourmand says he “highly respects” Hassibi’s work and downplays any talk of a disagreement. He points out that the original 2001 patent “didn’t mention anything about hydrogen [ions]. We saw the signal generation based on incorporation of nucleotides. We solved electrical detection. [The 2001 patent] showed that we could detect electrically dNTP incorporation—but still at that time we couldn’t understand where that comes from.”

Pourmand says he left Xagros because the company was increasingly interested in bioluminescence, prompting him to return to Stanford to work on electrical detection. “After Xagros went belly up, I came back to Stanford, and continued working on that without Arjang.”

“In 2004, we realized the signal we’re detecting is hydrogen [ions]. In the original work with Arjang, we thought it was pyrophosphate actually… Even the sensors, they say they originally designed, we didn’t use that in the 2006 [PNAS] paper. We used commercially available polarized electrodes, amplifiers and so forth. Basically, we started from fresh.”

Pourmand insists he was “not ignoring his [Hassibi’s] work, absolutely not,” in the PNAS paper.
“It’s completely different.” He adds he would “absolutely” have cited a paper that referred to his earlier collaboration with Hassibi if one existed—but it didn’t.

“I understand he feels I’m saying ‘it’s my patent,’ but I’m particularly referring to hydrogen detection and hydrogen release, not the electrical signals,” says Pourmand. “In the 2006 patent and paper, we’re clearly claiming it is hydrogen [ions]. I don’t really care which system you’re using to detect it, the release of hydrogen is important.”

Texas Shuffle

Hassibi is now based at the University of Texas at Austin, where his research focuses on building new semiconductor chips for life sciences applications. Hassibi was late learning about Ion Torrent’s interest in his intellectual property.

In September 2009, Stanford OTL sent a “conflict of interest” memo to the Stanford Dean’s office on the proposed licensing deal of the two aforementioned dockets to Ion Torrent. (Davis was already a member of Ion Torrent’s SAB at that time.) According to the OTL, the earlier S00-157 technology provided “a faster and cheaper alternative to current methods of DNA sequencing” by detecting variations in the charge of immobilized DNA. The S04-291 invention more specifically focused on sequencing “by detection of electric charge perturbations of polymerase-catalyzed reaction by the electrochemical detection sensor with immobilized DNA.”

Stanford’s OTL concluded that Ion was in “a strong position to successfully commercialize” both technologies. Despite widespread marketing to dozens of companies, only the 157 docket had been previously licensed (to Xagros). Several companies subsequently expressed interest in both technologies, but none took a license until the deal with Ion Torrent.

“I expected Stanford to take some equity, but I think they were convinced by Rothberg et al. that this is the maximum that they can get for it,” said Hassibi.
“Stanford OTL didn’t have an obligation to involve me in the negotiations. I was not the Stanford PI involved in this project at the time. Who would they talk to? It would be Ron, but he had a conflict of interest [as a member of Ion’s SAB]. My anger is not at OTL, although they should have got equity.”

“I want to give a lot of credit to Rothberg,” Hassibi added. “We never perceived, including Ron or Nader, putting things in microwells. This is very important. Everything that was done in Stanford, as far as I know, has been based on immobilizing DNA on a gold electrode. Ion’s embodiment in terms of sample loading and interfacing is quite different, because they came out of 454, which is a perfect match for this technology.”

Indeed, Hassibi may have reason to be especially grateful to Rothberg. Last year, he started raising money again. “This was music to my ears: Rothberg on cover of Forbes magazine was the best marketing we needed! Companies like Life Technologies are not going to have a division of integrated circuit designers creating these CMOS chips. They’re going to outsource it.”

Hassibi’s new company, Insilixa, is a fabless semiconductor company that might count the next generation of sequencing manufacturers among its future clients.

IT / Workflow
Gordon Puts Flash into Data Intensive Supercomputing
Calit2 director Larry Smarr offers solutions for high-throughput data management
BY KEVIN DAVIES

SAN DIEGO—A new supercomputer named Gordon at the San Diego Supercomputer Center (SDSC) of the University of California San Diego (UCSD), featuring a quarter of a petabyte of flash memory (hence the name) and dubbed “the world’s largest thumb drive,” earned raves from Larry Smarr in a wide-ranging talk about the future management of life sciences data.

Smarr has spent ten years building the California Institute for Telecommunications and Information Technology (Calit2)—a joint program between UCSD and UC Irvine. Speaking at CHI’s XGen Congress*, Smarr urged organizations and universities to radically retool their approaches to computing infrastructure to facilitate collaboration, data sharing, telecommunication, and next-generation sequencing (NGS) data.

* XGen Congress, Cambridge Healthtech Institute, San Diego, March.

Although he has long advocated the transformative nature of optical networking, he says, “the ability to have your own personal 10,000 megabit/second (mbps) optical link is what we really need to deal with NGS machines. We’re trying to do data-intensive science on an infrastructure—the shared Internet—that was never meant for that.”

But from the shared Internet to dedicated high-performance optical networks, much else has to change as well. “The last 100 feet aren’t there,” he said.

Many of the innovations Smarr and colleagues have deployed across the UCSD campus—featuring optical fiber—provide a shining example of the university campus of the future. High-definition video streams can be sent as live feeds from microscopes and tiled LCD walls (driven by PCs with NVIDIA graphics cards), allowing microscopy collages featuring 600 million pixels to be viewed.

Smarr says each optical fiber has independent infrared channels, each providing 100-1,000 times greater data throughput compared to the existing Internet. And instead of 200 university campuses going through one channel, “I’m saying you should have one yourself.” For example, the National Lambda Rail (with many 10GbE paths on their fibers) connects large data research centers in California and around the world.

Making the Switch

With Calit2 and the SDSC on its campus, it is no surprise that UCSD has jumped ahead in improving the campus cyberinfrastructure. UCSD now boasts 60 10GbE paths across campus, in parallel with the shared Internet, eliminating data bottlenecks. Users can choose which layer of the Internet to send data over using a simple 3-level switch (0.5 terabits/sec).

“Think about the clusters on campus and the space and energy they use,” says Smarr incredulously. “They’re completely isolated into islands, connected to the Internet at 10 megabits/sec. They’re 1,000:1 isolated from the rest of the world. You’re putting all your money into those instead of a fairly inexpensive optical switch? Whatever.”

UCSD has also brought optical fiber to the NGS facilities on the medical campus. “There’s nothing wrong with the shared Internet for email; it’s what it’s built for. But it’s not useful for where we’re going.” Trey Ideker, who heads the systems biology group, is starting to generate more NGS data (see “Groundbreaking Work”).

The UCSD campus has centralized data storage at SDSC that Smarr equated to the old library in the center of the campus. “Imagine a digital aquifer under the grass. All researchers get to use that data oasis. Then you plug in these 10 Gbps optical fibers.” Or imagine taking the output from an Illumina NGS instrument and putting it in RAM.

Smarr is no stranger to working with genomics data, having collaborated for years with Craig Venter on the CAMERA project, a global microbial metagenomics community research resource (see “CAMERA Database Snaps into Action,” Bio•IT World, Apr 2007). CAMERA’s IT infrastructure boasts 512 processors, 5 TeraFlops and 200 terabytes storage.

“You can take your genome and BLAST it against the entire dataset. We now have more than 4,000 users in 90 countries, all connected to Calit2’s CAMERA cluster.” If a researcher has a dedicated 10Gbps connection to Calit2, they can use uncompressed, high-def feeds at 1,500 megabits/second. “This avoids latency—the enemy of real-time collaboration. This is the kind of thing you can do once you have this infrastructure in place.”

Groundbreaking Work

The UCSD campus has a variety of NGS instruments connected to a Gigabit research backbone, where data are piped directly to servers administered by SDSC in conjunction with CalIT2.

According to UCSD systems biologist Trey Ideker, the commercial NGS out-of-the-box IT solution “simply wasn’t allowing a quick enough turnaround time.” Depending on the width of the pipe, Ideker says it could take a day to transmit data out of the core facility. “That’s a problem,” he says. “We’d like to get rid of this information so you can reboot the sequencer and start the next run. The whole goal is to keep these machines working 24/7. To do that, you have to get the data off the temporary location in the core facility and quickly onto something else.”

The new model dispenses with a reason to have the temporary location. “Even with a Gbit campus research network, it takes less than an hour to get NGS run data transferred. We can save that hour by writing data in real time to a remote location. The concerns are that if there’s a glitch in the network... you could lose a week’s worth of data. But that hasn’t really been a huge problem.”

Despite the use of optical fiber around the UCSD campus, “it’s often the last mile that is the problem,” says Ideker. “We may have this wide bandwidth in the ground for most of the necessary distance, but it’s getting it into the building... that is often the bottleneck.” In some cases, researchers have worked with campus networking staff to ensure that “last mile” of cable and switches are in place.

Ideker doesn’t claim that UCSD has found a unique solution to handling NGS data, but says, “I think what Larry is doing with getting all this digital genomic information piped directly into the SDSC machine room is groundbreaking.” (No pun intended.) K.D.

“The cost of electricity is becoming unbearable,” said Smarr. UCSD is already a 40MW campus and additional computers are becoming the most important driver of higher electricity demands.
Smarr is part of an NSF-funded effort, the GreenLight Project, that is adapting Sun modular data centers to measure a series of metrics including temperature, airflow, etc. on various applications running on various architectures—from multicores to GPUs, FPGAs, routers, and storage. “At the end of day, we have to know it costs this much for electricity or CO2 production. You’ll see this more and more. Universities have got to get on top of electricity costs.”

Flash Gordon

Smarr calls the SDSC’s new 245-TeraFlop supercomputer, Gordon, “the first high-performance data computer in the academic world. It has 256,000 GB [a quarter of a petabyte] of flash memory, that’s more flash in one place than anywhere in the world. We thank Steve Jobs for making flash memory cheap enough!” Smarr’s colleague Michael Norman, SDSC director, says Gordon “will do for scientific data analysis what Google does for Web search.”

In a normal computer, with tens of gigabytes of RAM, most data sits on the disk. “But disk is 100X slower than memory. You’re disk I/O limited, waiting for the disk to get data to the RAM. Now imagine you have terabytes of RAM. You can put all your data in there at once. Then algorithms completely change.”

Gordon has 32 nodes, each with 2 TB of RAM and 8 TB of flash SSD (solid-state drive), backed by a 4-PB parallel disk farm (file system). “There’s nothing like it in the world,” says Smarr. “When I think about next-gen sequencing, Gordon is the machine almost built for this. De novo assembly will benefit from large shared memory. This is not your father’s supercomputer. It’s a high-performance computer designed for data intensive science, just like supercomputers were optimized for solving differential equations. Federations of databases and interaction networks will benefit from low latency I/O from Flash.”

The construction of Gordon, funded by a $20-million NSF grant, has also benefited from the plummeting price of 10GbE switches.
In 2005, Smarr said, the cost of a 10GbE port was around $80,000. In 2011, a single port is less than $1,000. Gordon will have 128 parallel channels, each 10GbE. “We now use 10GbE paths in the back-end like they’re popcorn! 10G is the new 1G. Apple is shipping MacBooks with two 10-Gbps ports! People still act like 10GbE is a lot—I don’t get it.”

Smarr had some less positive views on Cloud computing, however. “The Cloud is not set up for terabyte or gigabyte files,” he said. “You can get there, but once you’re inside, there isn’t the SDSC 10GbE farm to move your data around. How much to get it back out? What do you pay for egress and exit? There are lots of developments necessary for commercial clouds to be useful for science.”

“You need to understand you have a problem,” Smarr continued. “I have data! It’s exponentially growing. It just boggles my mind how otherwise intelligent places aren’t dealing with it. It’s hard—you have to bring together experts who normally don’t talk to each other—biologists, computer scientists, engineers. You have to bring together... the School of Medicine, the campus, networking/storage, departments. Now it’s a collective problem. No one has enough money themselves.”

“People don’t think about exponentials, but they make the impossible routine as we go through the threshold you care about. It’s impossible to plan for. It cost a couple of billion dollars for first human genome. Now it’s $1,000?! That’s a factor of 1 million in ten years. Over that time, Moore’s Law is 1,000. It’s the square of Moore’s Law.”

“The 10GbE data superhighway is coming into being. NGS is its most important application for science, because of the democratization of sequencing.”

NVIDIA Unveils New Flagship GPU Processor
New Tesla is ‘fastest processor’ for HPC market
BY KEVIN DAVIES

NVIDIA has released the latest version of its flagship GPU (graphics processing unit) processor, the Tesla M2090. Company executives claim this to be the fastest processor for high-performance computing (HPC) in the market, accelerating applications and offering a 20-30% increase in speed and performance compared to its predecessor, the M2070.

According to Sumit Gupta, NVIDIA’s Tesla product line manager, “life sciences is our #1 vertical” in terms of widespread adoption and the number of users. Applications range from molecular dynamics to genome sequence analysis, with at least one next-generation sequencing company using GPUs in its instruments.

The Tesla M2090 GPU is equipped with 512 CUDA parallel processing cores, delivering 665 gigaflops of peak double-precision performance and providing application acceleration up to 10x compared to a CPU alone.

At the same time, HP is announcing the release of a new server featuring 8 NVIDIA GPUs, the HP ProLiant SL390 G7 4U server. The SL390 family is built for hybrid computing environments that combine GPUs and CPUs. The SL390 G7 4U server incorporates up to eight Tesla M2090 GPUs in a 4U chassis. With a configuration of 8 GPUs to two CPUs, this server has the highest GPU-to-CPU ratio currently available, says Gupta. (Just a few years ago, no server could take even 1 GPU.)

The ideal configuration would be 1 CPU core to 1 GPU. “We’re not there yet with this server, but getting closer,” says Gupta. (For the record, Gupta notes that Dell has an extension box that can take up to 16 GPUs, but this is not a single server—it has to connect to another machine.)

Gupta says most customers work with OEMs—NVIDIA doesn’t sell direct, but helps move applications to a GPU. “We’re trying to learn from users about the science they’re trying to do with GPUs,” says Gupta. Besides HP and Dell, NVIDIA also works with SGI, Supermicro, IBM, Tyan and others. “HP is a very high-volume OEM. They only build systems like this when they believe there’s a very wide market for them,” says Gupta.

While OEMs typically determine pricing, Gupta says it is possible to buy a GPU server with 4 GPUs for less than $10,000. “It’s essentially in the $5,000-$10,000 range to buy a server fully equipped,” says Gupta.

Goes to 11

The benefits of GPUs can be found both in enhanced performance and accessibility. Mark Berger, NVIDIA’s specialist in life and material sciences, recently joined the company after working in drug discovery with Cytokinetix. “I see huge momentum in GPUs, there’s a real wind in our back with a lot of people in academia and software development in national labs and software companies working on GPU versions,” says Berger.

To showcase the performance of the M2090, Gupta cites work using the popular AMBER 11 molecular dynamics software. “Using 4 GPUs, you can now simulate 69 nanoseconds [of molecular dynamics] per day,” says Gupta. Previously, this kind of simulation would require access to a supercomputer in a national laboratory, such as KRAKEN, the 192-quad-core CPU supercomputer at the Oak Ridge National Laboratory, which held the previous simulation record at 46 ns/day.

“This is the fastest result ever reported,” says Ross Walker, a researcher at the San Diego Supercomputer Center who did the AMBER benchmarking. “AMBER users from a university department can now accelerate their scientific work as if they had a supercomputer in their own lab.”

Other life sciences customers include Boston Scientific (magnetic resonance imaging), Max Planck Institute (3-D electron cryo-microscopy), Massachusetts General Hospital (imaging), and OpenEye.

Gupta adds: “It democratizes access to this software to every researcher around the world. You don’t have to write a grant proposal to get access to a supercomputer.” Similar analyses and results are being obtained by David E. Shaw and colleagues, but Gupta points out that their work is performed on a custom supercomputer, Anton.

There are several bioinformatics applications already running on GPUs, including BLAST, Hidden Markov Models, and MATLAB. “Users can get real performance and quite easily port their applications to the GPU,” says Gupta. “The toughest task is that most applications are written with a sequential mind frame—CPUs are inherently sequential. Users have to rethink some of the applications to take advantage of the GPU acceleration and parallel processor.”

A key question facing potential users is, do they have to modify the entire application? “The answer is no,” says Gupta. “When I open a photograph on my hard disk, this is a fairly sequential task, suitable for a CPU. Once a photo is open, you might want to do red eye reduction, autofocus etc. Those tasks modify each pixel mathematically. That’s extremely amenable to GPU. That’s the only part of Picasa you’d have to port to a GPU. Now take sequence search software. Reading the database, opening the sequences can continue to run on the CPU. But the search gets accelerated by GPUs.”

Panasas ActiveStor Storage Goes to 11
NGS storage product seeks to balance performance, capacity, and cost

Panasas unveiled the latest version of its ActiveStor storage product line at the International Supercomputing Conference in Germany in June. The ActiveStor 11 product features 3-terabyte (TB) enterprise drives without a price premium. The California company hopes it will prove

at a more attractive cost. We’re confident that will help us in life sciences,” says Noer. It represents “the lowest dollar/TB option for all three models.”

Until now, Panasas products carried up to 40 TB/chassis. But that has now expanded to 60 TB/chassis with the use of 3-TB drives, which has a substantial impact on scaling.
“We now scale to 6 an attractive offering for life sciences petabytes in a single file system.” customers in general, and next-genera“Performance has been in Panasas’ tion sequencing (NGS) applications in particular. Panasas’ background is in technical computing. It boasts five years of consecutive revenue growth (42% in 2010) as it pushes into new markets from its traditional strengths in energy, finance/ risk analysis, universities, and government/defense. “Our customers tend to start by buying 1-2 shelves of our storage, and then become loyal customers over time and expand that footprint. That’s one of the attributes that comes with having such a scalable system,” says Geoffrey Noer, Panasas’ senior director of Panasas ActiveStor 11 product marketing. DNA from the very beginning. That The introduction of ActiveStor 11, hasn’t been as much of a core need for a which nestles between the top-of-the-line competitor like Isilon in their prior mar12 (launched last year) and the more afkets,” says Noer. “Institutions are deployfordable 8 products, should appeal to life ing more and more NGS machines, with sciences organizations. Existing clients faster run times. Workloads are becominclude NIH, Yale University, BGI in ing about large-file throughput rather China, and Uppsala University in Swethan millions of small files. So NGS is a den. “We left a gap so we could introduce perfect application for ActiveStor stor11 after 12,” Noer explains, adding that age. You have a blade design that allows he expects ActiveStor 11 to represent the you to grow as needed—a single shelf can bulk of sales going forward. stand on its own, but it takes less than ten minutes to grow capacity as you need it Performance Issues to. And you can maintain a single global The ActiveStor 12 delivers 80 megabytes/ namespace.” sec per SATA drive. “Where performance Noer says several genome institutes is the top factor, this is the solution,” says are using Panasas for high-performance Noer. 
The ActiveStor 11 is some 20% less needs, but also using storage from Isilon expensive. “Some markets need more of a (which he admits offers a much lower dolbalance between performance and capaclar/TB footprint) for their bulk capacity ity, that’s where ActiveStor 11 is available BY KEVIN DAVIES CONTENTS [48 ]#*0t*5 803-%+6-:|"6(6452011 www.bio-itworld.com requirement. “It can make sense to have both installed,” he says. Private Clouds Panasas is actively looking at the private cloud to further its momentum, a move applauded by IDC analyst Earl Joseph, who says that the “ActiveStor 11 appliance is well positioned to capitalize on this important [HPC] trend.” But at least one top bio-IT industry consultant, BioTeam’s Chris Dagdigian, blasted private clouds as “empty hype” at Bio-IT World Conference and Expo a couple of months ago. “With all due respect to [Dagdigian], I see the public cloud as being more of an overhyped approach than private clouds,” responds Noer. “Our products are the best suited for big data workloads, typically hundreds of terabytes or petabytes of storage. That data is valuable and highly proprietary. Trying to leverage the public cloud fails on several reasons—the cost of the bandwidth is out of sight, and you have a lot of security and performance concerns having the data remote... We don’t see a lot of traction for big data workloads in public clouds.” Noer acknowledges that “private clouds” is a new marketing name for what was previously labeled utility computing or grid computing. “But the trend has been taking place for many years, before the term “private cloud” was invented. The desire to centralize is a very real direction and gaining momentum,” he says. He also admits that Panasas has been criticized in the past that its price/TB was unapproachable. 
“Now, with the 50% bump in capacity with the 3-TB drives in addition to cost reductions we’re announcing and ActiveStor 11, all those things make Panasas a very attractive option.”

The Russell Transcript

DREAM6 Breaks New Ground
JOHN RUSSELL

Roughly five years ago the organizers of DREAM (Dialogue for Reverse Engineering Assessment and Methods) set out to find the best algorithms for inferring biological networks from blinded data sets. Emulating the CASP* program, they created an annual competition in which researchers downloaded data for a set of challenges and used their favorite algorithms to solve the problem. Winners were announced at the annual DREAM conference, and the results published in an effort to create a valuable resource.

A funny thing happened along the way. Actually two important things. First, it turns out there is no such critter as the perfect algorithm. “Data itself is so high dimensional, the biology itself is so complex, that probably the notion of finding the best algorithm to analyze a data set was a little too simplistic,” says Gustavo Stolovitzky, DREAM chair and manager of functional genomics and systems biology at IBM’s Computational Biology Center.

Potentially much more important, it also turned out the aggregate prediction of competing groups was nearly always the best prediction or in the top three. This unexpected result highlighting the wisdom of the crowd is prompting DREAM to rethink its mission and begin seeking to put “collaboration by competition” to work solving real-world problems rather than chase down better algorithms.

It’s a fascinating finding, which might have great utility in unraveling thorny basic biology questions as well as use in early drug discovery.

“The big encompassing lesson is that without a doubt there is wisdom of crowds. Consistently the aggregate of the predictions is really robust with respect to any of the other individual predictions; very often it is better than the best, and when it’s not, it is among the best,” explains Stolovitzky.

Stolovitzky says, “There are many ways of aggregating predictions. The one we have been using is very robust.” It’s a little complicated, so with apologies to Gustavo, here’s an attempt to simplify and summarize it:

Each competing team produces an overall solution to a challenge (e.g. a fairly granular description of a particular signaling pathway or gene regulatory network). They rank each of their solution’s components (e.g. an interaction between two genes) in terms of their confidence it is correct. One interaction may be ranked high while another may be ranked low. A team’s overall solution may or may not perform well. Stolovitzky averages the confidence rankings for each interaction from all the teams and produces a new aggregate solution to the challenge by re-ranking the interactions according to their average rank. Obviously there’s a bit more to it, but you get the broad picture.

Advanced Aggregate

One interesting aspect of this aggregation approach is that even algorithms that perform poorly overall may get a particular interaction right and be captured in the aggregate prediction. In one DREAM4 challenge, 11 of 12 groups identified a new interaction inferred from the data that was included in the aggregate prediction although many of the teams’ overall predictions were poor. “Even suboptimal algorithms have a place in the zoo of algorithms,” he says.

It’s collaboration by competition, says Stolovitzky. “So people try to do something against each other but unbeknownst to them, when you aggregate their predictions on exactly the same data you are really making them collaborate. In that aggregate prediction is where the wisdom of the crowd can emerge.” All of a sudden, “It is not worth it to try to develop the best algorithm if the aggregate is always the best.”

Instead, perhaps, problem selection becomes more important. This doesn’t mean work on developing great algorithms is worthless; it’s not, emphasizes Stolovitzky, but it becomes secondary. Tackling real-world problems becomes more enticing and impactful. One issue is incentivizing the activity. “People like to be the best at something,” notes Stolovitzky.

Recently, the challenges for DREAM6 were posted (http://the-dream-project.org; submission deadline is August 22, winners will be announced October 14). This year, Stolovitzky says there is no specific network inference challenge while the DREAM team mulls over its future direction. However, there is one on diagnosing Acute Myeloid Leukemia from patient samples using flow cytometry data.

Change is in the wind for DREAM.

* Critical Assessment of Techniques for Protein Structure Prediction
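For readers who prefer code to prose, the rank-averaging idea summarized in the DREAM column can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DREAM organizers' actual procedure (which, as the column notes, involves a bit more): the function name, the interaction labels, and the assumption that every team ranks the same set of interactions are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_average_rank(team_rankings):
    """Combine per-team confidence rankings into one aggregate ranking.

    team_rankings: list of lists. Each inner list holds one team's predicted
    interactions, ordered from most confident (rank 1) to least confident.
    Assumption (hypothetical): every team ranks the same set of interactions.
    """
    rank_sums = defaultdict(float)
    for ranking in team_rankings:
        for position, interaction in enumerate(ranking, start=1):
            rank_sums[interaction] += position

    n_teams = len(team_rankings)
    average_rank = {name: total / n_teams for name, total in rank_sums.items()}

    # Re-rank by average rank (lowest first); ties are broken by label so the
    # result is deterministic.
    return sorted(average_rank, key=lambda name: (average_rank[name], name))


# Three hypothetical teams ranking three hypothetical gene-gene interactions.
teams = [
    ["A-B", "B-C", "A-C"],
    ["B-C", "A-B", "A-C"],
    ["A-B", "A-C", "B-C"],
]
aggregate = aggregate_by_average_rank(teams)
print(aggregate)
```

In this toy input the average ranks are 4/3 for A-B, 2 for B-C, and 8/3 for A-C, so the aggregate places A-B first even though one team ranked it second. That is the sense in which an individually imperfect ranking can still contribute to the crowd's answer.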
CAMBRIDGE HEALTHTECH INSTITUTE’S INAUGURAL September 19-20, 2011 LOOKING BEYOND THE CLOUD Boosting Life Science Researches and Drug Discovery with Ubiquitous High Performance Computing Focused Sessions on: Keynote Presentation: t High Performance Computing in the Cloud How We Got Here, Where We Are, and Where We Are Heading Jeff Barr, Web Services Evangelist, Amazon.com t Science-as-a-Service t Genomics in the Cloud The Hilton La Jolla Torrey Pines La Jolla, CA t Pharma Adopting the Cloud t Ubiquitous Personal Health Service Don’t Miss - Pre-Conference Events: t Cloud Computing Training: Amazon Web Services t Cloud Computing and Genome Content Management Driving Translational Bioinformatics through the Next Decade t Orchestrating Cloud Systems and Workflows with Opscode Chef t Ensuring Information Security and Compliance When Moving into the Cloud Premier Sponsor Corporate Sponsors Corporate Support Sponsor Official Publication Organized by: Cambridge Healthtech Institute 250 First Avenue, Suite 300, Needham, MA 02494 T: 781-972-5400 or toll-free in the U.S. 888-999-6288 'tXXXIFBMUIUFDIDPN Bio-ITCloudSummit.com